Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Single nucleotide polymorphisms (SNPs) have been proposed to be grouped into haplotype blocks harboring a limited number of haplotypes. Within each block, the portion of haplotypes is expected to be tagged by a selected subset of SNPs; however, none of the proposed selection algorithms have been definitive. To address this issue, we developed a tag SNP selection algorithm based on grouping of SNPs by the linkage disequilibrium (LD) coefficient r² and examined five genes in three ethnic populations: the Japanese, African Americans, and Caucasians. Additionally, we investigated ethnic diversity by characterizing 979 SNPs distributed throughout the genome. Our algorithm could spare 60% of SNPs required for genotyping and limit the imprecision in allele-frequency estimation of nontag SNPs to 2% on average. We discovered the presence of a mosaic pattern of LD plots within a conventionally inferred haplotype block. This emerged because multiple groups of SNPs with strong intragroup LD were mingled in their physical positions. The pattern of LD plots showed some similarity, but the details of tag SNPs were not entirely concordant among the three populations. Consequently, our algorithm utilizing LD grouping allows selection of a more faithful set of tag SNPs than do previous algorithms utilizing haplotype blocks.
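
The idea of grouping SNPs by pairwise r² and keeping one tag per group can be sketched as follows. This is only an illustrative greedy grouping, not the algorithm of the abstract above; the 0/1 haplotype coding, the 0.8 threshold, the function names, and the toy data are all assumptions made for the example.

```python
def r_squared(a, b):
    """LD coefficient r^2 between two biallelic SNPs, each given as a
    list of 0/1 alleles observed on the same phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def greedy_ld_groups(snps, threshold=0.8):
    """Greedy sketch: repeatedly pick the SNP linked (r^2 >= threshold)
    to the most remaining SNPs, use it as the tag of that group, and
    remove the group. Returns {tag: sorted group members}."""
    remaining = set(snps)
    groups = {}
    while remaining:
        def partners(s):
            return {t for t in remaining
                    if t == s or r_squared(snps[s], snps[t]) >= threshold}
        # sorted() makes tie-breaking deterministic (first name wins)
        tag = max(sorted(remaining), key=lambda s: len(partners(s)))
        members = partners(tag)
        groups[tag] = sorted(members)
        remaining -= members
    return groups


# Hypothetical toy data: six haplotypes, four SNPs in physical order.
# snp2 lies between members of the snp1 group but is not in LD with them.
toy = {
    "snp1": [0, 0, 1, 1, 0, 1],
    "snp2": [0, 1, 0, 1, 0, 1],
    "snp3": [0, 0, 1, 1, 0, 1],
    "snp4": [1, 1, 0, 0, 1, 0],
}
groups = greedy_ld_groups(toy)
print(groups)
```

In this toy input the group tagged by snp1 skips over snp2, which is a small-scale picture of SNP groups being mingled in their physical positions.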

2.
Efficient selective screening of haplotype tag SNPs   Total citations: 12 (self-citations: 0, citations by others: 12)
Haplotypes defined by common single nucleotide polymorphisms (SNPs) have important implications for mapping of disease genes and human traits. Often only a small subset of the SNPs is sufficient to capture the full haplotype information. Such subsets of markers are called haplotype tagging SNPs (htSNPs). Although htSNPs can be identified by eye, efficient computer algorithms and flexible interactive software tools are required for large datasets such as the human genome haplotype map. We describe a Java-based program, SNPtagger, which screens for minimal sets of SNP markers to represent given haplotypes according to various user requirements. The program offers several options for inclusion/exclusion of specific markers and presents alternative panels for final selection. AVAILABILITY: The www-based program is available at http://www.well.ox.ac.uk/~xiayi/haplotype/index.html.

3.
The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred to as "haplotype tagging SNPs" (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision.

4.
MOTIVATION: Missing data in genotyping single nucleotide polymorphism (SNP) spots are common. High-throughput genotyping methods usually have a high rate of missing data. For example, the published human chromosome 21 data by Patil et al. contains about 20% missing SNPs. Inferring missing SNPs using the haplotype block structure is promising but difficult because the haplotype block boundaries are not well defined. Here we propose a global algorithm to overcome this difficulty. RESULTS: First, we propose to use entropy as a measure of haplotype diversity. We show that the entropy measure combined with a dynamic programming algorithm produces better haplotype block partitions than other measures. Second, based on the entropy measure, we propose a two-step iterative partition-inference algorithm for the inference of missing SNPs. At the first step, we apply the dynamic programming algorithm to partition haplotypes into blocks. At the second step, we use an iterative process similar to the expectation-maximization algorithm to infer missing SNPs in each haplotype block so as to minimize the block entropy. The algorithm iterates these two steps until the total block entropy is minimized. We test our algorithm in several experimental data sets. The results show that the global approach significantly improves the accuracy of the inference. AVAILABILITY: Upon request.
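
As a minimal sketch of entropy used as a haplotype diversity measure, the snippet below computes the Shannon entropy of the haplotype frequencies observed in one block. The abstract above does not give its exact formula, so the base-2 Shannon form, the string encoding of haplotypes, and the function name are assumptions.

```python
from collections import Counter
from math import log2


def block_entropy(haplotypes):
    """Shannon entropy (in bits) of the haplotype distribution in a block.
    `haplotypes` is a list of equal-length allele strings, one per chromosome."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    # p * log2(1/p) summed over distinct haplotypes
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical blocks: lower entropy means lower haplotype diversity.
uniform_block = ["0011", "0011", "0011", "0011"]
two_hap_block = ["0011", "0011", "1100", "1100"]
diverse_block = ["0011", "1100", "0101", "1010"]

for name, block in [("uniform", uniform_block),
                    ("two haplotypes", two_hap_block),
                    ("four haplotypes", diverse_block)]:
    print(name, block_entropy(block))
```

A block with a single haplotype scores 0 bits, two equally frequent haplotypes score 1 bit, and four equally frequent haplotypes score 2 bits, so a partition that minimizes total block entropy favors blocks with few common haplotypes.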

5.
Inactivating mutations in the TSC2 gene, consisting of 41 coding exons in 40 kb on 16p13, cause the hamartoma syndrome tuberous sclerosis. During TSC2 mutational analysis we identified ten SNPs that occur within or close to exon boundaries at minor allele frequencies greater than 5%. We determined the haplotypes for six of these SNPs and the microsatellite marker kg8 in the 3' region of TSC2 in a set of 40 parent-child trios. The most common haplotypes accounted for 53%, 11%, 6%, and 5% of chromosomes. Thirty-eight TSC2 mutation-bearing haplotypes had a similar distribution, indicating that there was no haplotype that predisposed to mutation in this region of TSC2. Family analysis was possible in 12 sporadic cases, and indicated that the mother was the parent of origin in 7 cases (3 point mutations, 2 small deletions, 2 large deletions), while the father was in 5 cases (2 point mutations, 3 small deletions). We conclude that TSC2 mutations occur at substantial frequency on both the maternally and paternally derived TSC2 alleles, in contrast to many other genetic diseases including NF1. The observations have implications for genetic counseling in TSC.

6.
The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.

7.
8.
Single nucleotide polymorphisms (SNPs) are widely used when investigators try to map complex disease genes. Although biallelic SNP markers are less informative than microsatellite markers, one can increase their information content by using haplotypes. However, assigning haplotypes (i.e., assigning phase) correctly can be problematic in the presence of SNP heterozygosity. For example, a doubly heterozygous individual, with genotype 12, 12, could have haplotypes 1-1/2-2 or 1-2/2-1 with equal probability; in the absence of additional information, there is no way to determine which haplotype is correct. Thus an algorithm that assigns haplotypes to such an individual will assign the wrong one 50% of the time. We have studied the frequency of haplotype misassignments, i.e., haplotypes that are misassigned solely because of inherent marker ambiguity (not because of errors in genotyping or calculation). We examined both SNPs and microsatellite markers. We used the computer programs GENEHUNTER and SIMWALK to assign the haplotypes. We simulated (a) families with 1-5 children, (b) haplotypes involving different numbers of marker loci (3, 5, 7 and 10 loci, all in linkage equilibrium), and (c) different allele frequencies. Misassignment rates are highest (a) in small families, (b) with many SNP loci, and (c) for loci with the greatest heterozygosity (i.e., where both alleles have frequency 0.5). For example, for triads (i.e., one-child families with both parents genotyped), misassignment rates for SNPs can reach almost 50%. Family sizes of 4-5 children are required in order to ensure a misassignment frequency of ≤5% for ten-SNP haplotypes with allele frequencies of 0.25-0.5. For microsatellites, a family size of at least 2-3 children is necessary to keep haplotyping misassignments ≤5%. Finally, we point out that it is misleading for a computer program to yield haplotype assignments without indicating that they may have been misassigned, and we discuss the implications of these misassignments for association and linkage analysis.
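
The phase ambiguity of a doubly heterozygous genotype can be made concrete by enumerating every haplotype pair consistent with an unphased genotype. This is a small illustrative sketch, not the procedure of GENEHUNTER or SIMWALK; the genotype encoding (one unordered allele pair per locus) and the function name are assumptions.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with an
    unphased genotype given as a list of (allele, allele) per locus."""
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # Keep the first heterozygous locus fixed so each unordered pair is
    # generated once; every other heterozygous locus may be flipped.
    for flips in product([False, True], repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(frozenset([tuple(h1), tuple(h2)]))
    return pairs


# Genotype 12, 12 from the abstract: two equally plausible phasings.
double_het = compatible_haplotype_pairs([(1, 2), (1, 2)])
print(double_het)

# With k heterozygous loci there are 2**(k - 1) candidate pairs.
ten_het = compatible_haplotype_pairs([(1, 2)] * 10)
print(len(ten_het))
```

Without family or population information nothing distinguishes the candidates, which is why the number of heterozygous SNP loci drives the misassignment rate.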

9.
OBJECTIVES: Discrete blocks of low haplotype diversity exist within the human genome. The non-redundant subset of 'haplotype tagging' single nucleotide polymorphisms (htSNPs) in such blocks can distinguish a majority of the haplotypes. Several approaches have been proposed to determine htSNPs, ranging from visual inspection to formal analytic procedures. Optimal htSNPs can be estimated using a small subgroup of an association study population that have been genotyped for a dense SNP map, and it is just these htSNPs that are genotyped in the remainder of the samples. We investigated by simulation how the size of the subsample affects the power of association studies, and what type of subjects it should include. METHODS: We used the program tagSNPs [Stram et al., Hum Hered 2003;55:27-36], which selects htSNPs to minimize the uncertainty in predicting common haplotypes for individuals with unphased genotype data. RESULTS: On average, 27% of the SNPs were designated as htSNPs. Genotyping as few as 25 unphased individuals to select the htSNPs did not appear to reduce the power of an association study, as compared with using all SNPs. For the disease models considered, selecting htSNPs based on cases, controls, or a mixture of both gave similar results. CONCLUSIONS: These results suggest that the genotyping effort in an association study can be substantially reduced with little loss of power by identifying htSNPs in a small subsample of individuals.

10.
Based on EST sequences, fragments of 37 genes have been amplified and sequenced in two inbred lines of sugar beet. The rate of single nucleotide polymorphisms (SNP) corresponded to 1 every 130 bp, with an average (nucleotide diversity) value of 7.6×10⁻³. When extrapolated to the whole sugar beet genome, randomly compared lines differ at 5.4×10⁶ SNPs in the genetic pool considered. In a wider search for SNP-related polymorphisms, 96 fragments of expressed genes were scanned with SSCP (single-strand conformation polymorphism) and heteroduplex (HA) analyses in 8 inbred lines. One SSCP or HA polymorphism was found every 1,470 bp of amplified DNA, corresponding to 5×10⁵ SSCP or HA loci in the whole genome. This frequency, 11 times lower than the SNP rate, was attributed to the high frequency of base pair substitution along the amplified fragment analysed electrophoretically. Therefore nucleotide variability was further studied by sequencing fragments of 10 genes in the same 8 lines. The results indicate that sugar beet alleles of expressed genes are very frequently organized as robust intragene haplotypes. In the 8 lines analysed, two haplotypes were identified for each of three gene fragments, three haplotypes for six gene fragments and four haplotypes for one gene fragment, which is in good correspondence with the number of alleles detected by SSCP and HA analysis. In a cross between two lines, SSCP or HA alleles of expressed genes have 54% probability to be different.

11.
The genomic region surrounding the TNF locus on human chromosome 6 has previously been associated with typhoid fever in Vietnam (Dunstan et al. in J Infect Dis 183:261–268, 2001). We used a haplotypic approach to understand this association further. Eighty single nucleotide polymorphisms (SNPs) spanning a 150 kb region were genotyped in 95 Vietnamese individuals (typhoid case/mother/father trios). A subset of data from 33 SNPs with a minor allele frequency of >4.3% was used to construct haplotypes. Fifteen SNPs that tagged the 42 constructed haplotypes were selected. The haplotype tagging SNPs (T1–T15) were genotyped in 380 confirmed typhoid cases and 380 Vietnamese ethnically matched controls. Allelic frequencies of seven SNPs (T1, T2, T3, T5, T6, T7, T8) were significantly different between typhoid cases and controls. Logistic regression results support the hypothesis that there is just one signal associated with disease at this locus. Haplotype-based analysis of the tag SNPs provided positive evidence of association with typhoid (posterior probability 0.821). The analysis highlighted a low-risk cluster of haplotypes that each carry the minor allele of T1 or T7, but not both, and otherwise carry the combination of alleles *12122*1111 at T1–T11, further supporting the one associated signal hypothesis. Finally, individuals that carry the typhoid fever protective haplotype *12122*1111 also produce a relatively low TNF-α response to LPS.

12.
Adiponectin gene haplotype is associated with preeclampsia   Total citations: 2 (self-citations: 0, citations by others: 2)
We determined whether the polymorphism of the gene encoding adiponectin contributes to susceptibility to preeclampsia. The study involved 133 Finnish women with preeclampsia and 245 healthy control subjects. All women were genotyped for two single nucleotide polymorphisms (SNPs), SNP45 in exon 2 and SNP276 in intron 2, in the adiponectin gene. Chi-squared (χ²) analysis was used to assess genotype and allele frequency differences between the preeclamptic and control groups. In addition, pair-of-loci haplotype analysis, using the expectation-maximization (EM) algorithm, was used to examine the estimated haplotype frequencies of the two SNPs in the two groups. The TT genotype versus the pooled G genotypes in SNP276 was associated with protection against preeclampsia (p = 0.012) at an odds ratio of 0.27 (95% confidence interval [CI]: 0.09-0.80). Also the genotype and allele frequency distributions of SNP276 differed significantly between the preeclampsia group and the control group (p = 0.035 and p = 0.043, respectively). Single-point genotype and allele distributions in SNP45 of the adiponectin gene were not statistically different between the groups. In the haplotype estimation analysis, the pooled G haplotypes versus the TT haplotype were significantly overrepresented in the preeclampsia group (p = 0.042 ± 0.005). Polymorphisms of the adiponectin gene show a weak, but statistically significant, haplotype association with susceptibility to preeclampsia in Finnish women.
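
For readers unfamiliar with EM-based haplotype frequency estimation at a pair of loci, the following is a textbook-style sketch, not the software used in the study above. Genotypes are assumed to be coded as counts (0, 1, 2) of a reference allele at each of two biallelic SNPs; only the double heterozygote (1, 1) has ambiguous phase. The function name and the toy genotypes are hypothetical.

```python
def em_two_locus(genotypes, iterations=50):
    """Estimate frequencies of haplotypes (a, b), a, b in {0, 1}, from
    unphased two-SNP genotypes given as (count_at_A, count_at_B)."""
    freqs = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    n_hap = 2 * len(genotypes)
    for _ in range(iterations):
        counts = dict.fromkeys(freqs, 0.0)
        for ga, gb in genotypes:
            if ga == 1 and gb == 1:
                # E-step: split the double heterozygote between the two
                # possible phasings in proportion to current frequencies.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is unambiguous: 0 -> (0, 0), 1 -> (0, 1), 2 -> (1, 1)
                a = (ga // 2, (ga + 1) // 2)
                b = (gb // 2, (gb + 1) // 2)
                counts[(a[0], b[0])] += 1
                counts[(a[1], b[1])] += 1
        # M-step: re-estimate frequencies from expected haplotype counts.
        freqs = {h: c / n_hap for h, c in counts.items()}
    return freqs


# Hypothetical sample: homozygotes suggest only haplotypes 0-0 and 1-1,
# so EM assigns the two double heterozygotes to the 0-0/1-1 phasing.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimated = em_two_locus(sample)
print(estimated)
```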

13.
Knowledge of human haplotype structure has important implications for strategies of disease-gene mapping and for understanding human evolutionary history. Many attributes of SNPs and haplotypes appear to exhibit highly nonrandom behavior, suggesting past operation of selection or other nonneutral forces. We report the exceptional abundance of a particular haplotype pattern in which two high-frequency haplotypes have different alleles at every SNP site (hence the name "yin yang haplotypes"). Analysis of common haplotypes in 62 random genomic loci and 85 gene coding regions in humans shows that the proportion of the genome spanned by yin yang haplotypes is 75%-85%. Population data of 28 genomic loci in Drosophila melanogaster reveal a similar pattern. The high recurrence (≥85%) of these haplotype patterns in four distinct human populations suggests that the yin yang haplotypes are likely to predate the African diaspora. The pattern initially appeared to suggest deep population splitting or maintenance of ancient lineages by selection; however, coalescent simulation reveals that the yin yang phenomenon can be explained by strictly neutral evolution in a well-mixed population.
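
The definition of a yin yang pair (two high-frequency haplotypes that differ at every SNP site) translates directly into a check. The sketch below tests only the two most frequent haplotypes in a sample; the string encoding, the function names, and the toy sample are assumptions, and the abstract's own frequency criteria are not reproduced.

```python
from collections import Counter


def is_yin_yang(h1, h2):
    """True if two equal-length haplotypes differ at every SNP site."""
    return len(h1) == len(h2) > 0 and all(a != b for a, b in zip(h1, h2))


def top_two_are_yin_yang(haplotypes):
    """Check whether the two most frequent haplotypes form a yin yang pair."""
    common = Counter(haplotypes).most_common(2)
    if len(common) < 2:
        return False  # only one distinct haplotype observed
    (first, _), (second, _) = common
    return is_yin_yang(first, second)


# Hypothetical sample of ten chromosomes typed at four SNPs.
sample = ["ACGT"] * 5 + ["GTAC"] * 4 + ["ACGC"]
print(top_two_are_yin_yang(sample))
```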

14.
Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome with some regions of high LD interspersed with regions of low LD. Such LD patterns make it possible to select a set of single nucleotide polymorphisms (SNPs; tag SNPs) for genome-wide association studies. We have developed a suite of computer programs to analyze the block-like LD patterns and to select the corresponding tag SNPs. Compared to other programs for haplotype block partitioning and tag SNP selection, our program has several notable features. First, the dynamic programming algorithms implemented are guaranteed to find the block partition with minimum number of tag SNPs for the given criteria of blocks and tag SNPs. Second, both haplotype data and genotype data from unrelated individuals and/or from general pedigrees can be analyzed. Third, several existing measures/criteria for haplotype block partitioning and tag SNP selection have been implemented in the program. Finally, the programs provide flexibility to include specific SNPs (e.g. non-synonymous SNPs) as tag SNPs. AVAILABILITY: The HapBlock program and its supplemental documents can be downloaded from the website http://www.cmb.usc.edu/~msms/HapBlock.

15.

Background

The adequacy of association studies for complex diseases depends critically on the existence of linkage disequilibrium (LD) between functional alleles and surrounding SNP markers.

Results

We examined the patterns of LD and haplotype distribution in eight candidate genes for osteoporosis and/or obesity using 31 SNPs in 1,873 subjects. These eight genes are apolipoprotein E (APOE), type I collagen α1 (COL1A1), estrogen receptor-α (ER-α), leptin receptor (LEPR), parathyroid hormone (PTH)/PTH-related peptide receptor type 1 (PTHR1), transforming growth factor-β1 (TGF-β1), uncoupling protein 3 (UCP3), and vitamin D (1,25-dihydroxyvitamin D3) receptor (VDR). Yin yang haplotypes, two high-frequency haplotypes composed of completely mismatching SNP alleles, were examined. To quantify LD patterns, two common measures of LD, D' and r², were calculated for the SNPs within the genes. The haplotype distribution varied in the different genes. Yin yang haplotypes were observed only in PTHR1 and UCP3. D' ranged from 0.020 to 1.000 with an average of 0.475, whereas the average r² was 0.158 (ranging from 0.000 to 0.883). A decay of LD was observed as the intermarker distance increased; however, LD characteristics differed greatly between genes and even between regions within a gene.

Conclusion

The differences in haplotype distributions and LD patterns among the genes underscore the importance of characterizing genomic regions of interest prior to association studies.
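
The two LD measures named above can be computed from one haplotype frequency and two allele frequencies using the standard definitions. The sketch below is generic and is not taken from the study; the function name and the example frequencies are hypothetical, chosen to show how D' can be high while r² stays low.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci, given the frequency of
    haplotype A-B (p_ab) and the frequencies of alleles A and B."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Hypothetical case: allele B (frequency 0.1) occurs only together with
# allele A (frequency 0.5), so D' is 1 while r^2 is only about 0.11.
d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(round(d_prime, 3), round(r2, 3))
```

Because r² also depends on how well the allele frequencies match, it is typically lower than D' for the same pair of SNPs, which is consistent with the lower average r² reported above.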

16.
The haplotype block structure of SNP variation in human DNA has been demonstrated by several recent studies. The presence of haplotype blocks can be used to dramatically increase the statistical power of genetic mapping. Several criteria have already been proposed for identifying these blocks, all of which require haplotypes as input. We propose a comprehensive statistical model of haplotype block variation and show how the parameters of this model can be learned from haplotypes and/or unphased genotype data. Using real-world SNP data, we demonstrate that our approach can be used to resolve genotypes into their constituent haplotypes with greater accuracy than previously known methods.

17.
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variations amongst species. With the genome-wide SNP discovery, many genome-wide association studies are likely to identify multiple genetic variants that are associated with complex diseases. However, genotyping all existing SNPs for a large number of samples is still challenging even though SNP arrays have been developed to facilitate the task. Therefore, it is essential to select only informative SNPs representing the original SNP distributions in the genome (tag SNP selection) for genome-wide association studies. These SNPs are usually chosen from haplotypes and called haplotype tag SNPs (htSNPs). Accordingly, the scale and cost of genotyping are expected to be largely reduced. We introduce binary particle swarm optimization (BPSO) with local search capability to improve the prediction accuracy of STAMPA. The proposed method does not rely on block partitioning of the genomic region, and consistently identified tag SNPs with higher prediction accuracy than either STAMPA or SVM/STSA. We compared the prediction accuracy and time complexity of BPSO to STAMPA and an SVM-based (SVM/STSA) method using publicly available data sets. Compared with STAMPA and SVM/STSA, BPSO effectively improved prediction accuracy for both smaller and larger scale data sets. These results demonstrate that the BPSO method selects tag SNPs with higher accuracy regardless of the scale of the data sets used. © 2009 American Institute of Chemical Engineers Biotechnol. Prog., 2010

18.
Recent studies have shown that the human genome has a haplotype block structure such that it can be decomposed into large blocks with high linkage disequilibrium (LD) and relatively limited haplotype diversity, separated by short regions of low LD. One of the practical implications of this observation is that only a small fraction of all the single-nucleotide polymorphisms (SNPs) (referred to as "tag SNPs") can be chosen for mapping genes responsible for human complex diseases, which can significantly reduce genotyping effort, without much loss of power. Algorithms have been developed to partition haplotypes into blocks with the minimum number of tag SNPs for an entire chromosome. In practice, investigators may have limited resources, and only a certain number of SNPs can be genotyped. In the present article, we first formulate this problem as finding a block partition with a fixed number of tag SNPs that can cover the maximal percentage of the whole genome, and we then develop two dynamic programming algorithms to solve this problem. The algorithms are sufficiently flexible to permit knowledge of functional polymorphisms to be considered. We apply the algorithms to a data set of SNPs on human chromosome 21, combining the information of coding and noncoding regions. We study the density of SNPs in intergenic regions, introns, and exons, and we find that the SNP density in intergenic regions is similar to that in introns and is higher than that in exons, results that are consistent with previous studies. We also calculate the distribution of block break points in intergenic regions, genes, exons, and coding regions and do not find any significant differences.

19.
Phang BH, Chua HW, Li H, Linn YC, Sabapathy K. PLoS One 2011, 6(1): e15320
Multiple single nucleotide polymorphisms (SNPs) have been identified in the tumor suppressor gene p53, though the relevance of many of them is unclear. Some of them are also differentially distributed in various ethnic populations, suggesting selective functionality. We have therefore sequenced all exons and flanking regions of p53 from the Singaporean Chinese population and report here the characterization of some novel and uncharacterized SNPs: four in intron 1 (nucleotide positions 8759/10361/10506/11130), three in intron 3 (11968/11969/11974) and two in the 3'UTR (19168/19514). Allelic frequencies were determined for all these and some known SNPs, and were compared on a limited scale to leukemia and lung cancer patient samples. Intron 2 (11827) and 7 (14181/14201) SNPs were found to have a high minor allele frequency of between 26-47%, in contrast to the lower frequencies found in the US population, but similar in trend to the codon 72 polymorphism (SNP12139) that shows a distribution pattern correlative with latitude. Several of the SNPs were linked, such as those in introns 1, 3 and 7. Most interestingly, we noticed the co-segregation of the intron 2 and the codon 72 SNPs, the latter of which has been shown to be expressed in an allele-specific manner, suggesting possible regulatory cross-talk. Association analysis indicated that the T/G alleles in both the co-segregating intron 7 SNPs and a 4tagSNP haplotype were strongly associated with increased susceptibility to lung cancer in non-smoker females [OR: 1.97 (1.32, 3.394)]. These data together demonstrate high SNP diversity in p53 gene between different populations, highlighting ethnicity-based differences, and their association with cancer risk.

20.
Analysis of data on 1000 Holstein-Friesian bulls genotyped for 15,036 single-nucleotide polymorphisms (SNPs) has enabled genomewide identification of haplotype blocks and tag SNPs. A final subset of 9195 SNPs in Hardy-Weinberg equilibrium and mapped on autosomes on the bovine sequence assembly (release Btau 3.1) was used in this study. The average intermarker spacing was 251.8 kb. The average minor allele frequency (MAF) was 0.29 (0.05-0.5). Following recent precedents in human HapMap studies, a haplotype block was defined where 95% of combinations of SNPs within a region are in very high linkage disequilibrium. A total of 727 haplotype blocks consisting of ≥3 SNPs were identified. The average block length was 69.7 ± 7.7 kb, which is approximately 5-10 times larger than in humans. These blocks comprised a total of 2964 SNPs and covered 50,638 kb of the sequence map, which constitutes 2.18% of the length of all autosomes. A set of tag SNPs, which will be useful for further fine-mapping studies, has been identified. Overall, the results suggest that as many as 75,000-100,000 tag SNPs would be needed to track all important haplotype blocks in the bovine genome. This would require approximately 250,000 SNPs in the discovery phase.
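
The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The snippet below uses only numbers stated above; the variable names are illustrative, and the derived quantities are rough consistency checks rather than results reported by the authors.

```python
# Figures quoted in the abstract above.
n_blocks = 727
mean_block_kb = 69.7
covered_kb = 50_638
covered_fraction = 0.0218
snps_in_blocks = 2964
snps_total = 9195
reported_spacing_kb = 251.8

# Number of blocks times mean block length should be close to the coverage.
approx_covered_kb = n_blocks * mean_block_kb

# Coverage divided by its stated share gives the implied autosome length.
implied_autosome_kb = covered_kb / covered_fraction

# Implied average spacing if the SNPs were spread over that length.
implied_spacing_kb = implied_autosome_kb / snps_total

share_of_snps_in_blocks = snps_in_blocks / snps_total

print(f"blocks x mean length: {approx_covered_kb:,.0f} kb")
print(f"implied autosome length: {implied_autosome_kb:,.0f} kb")
print(f"implied marker spacing: {implied_spacing_kb:.1f} kb")
print(f"share of SNPs inside blocks: {share_of_snps_in_blocks:.1%}")
```

The product of block count and mean length (about 50,672 kb) agrees with the reported 50,638 kb, the implied autosome length is about 2.32 Gb, and the implied spacing of roughly 253 kb is close to the reported 251.8 kb; about 32% of the SNPs fall inside blocks.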
