Similar articles
 20 similar articles found
1.
Many investigators are now using haplotype-tagging single-nucleotide polymorphisms (htSNPs) as a way of screening regions of the genome for association with disease. A common approach is to genotype htSNPs in a study population and to use this information to draw inferences about each individual's haplotypic makeup, including SNPs that were not directly genotyped. To test the validity of this approach, we simulated the exercise of typing htSNPs in a large sample of individuals and compared the true and inferred haplotypes. The accuracy of haplotype inference varied, depending on the method of selecting htSNPs, the linkage-disequilibrium structure of the region, and the amount of missing data. At the stage of selection of htSNPs, haplotype-block-based methods required a larger number of htSNPs than did unstructured methods but gave lower levels of error in haplotype inference, particularly when there was a significant amount of missing data. We present a Web-based utility that allows investigators to compare the likely error rates of different sets of htSNPs and to arrive at an economical set of htSNPs that provides acceptable levels of accuracy in haplotype inference.

2.
To date, only the H1 MAPT haplotype has been consistently associated with risk of developing the neurodegenerative disease progressive supranuclear palsy (PSP). We hypothesized that additional genetic loci may be involved in conferring risk of PSP that could be identified through a pooling-based genomewide association study of >500,000 SNPs. Candidate SNPs with large differences in allelic frequency were identified by ranking all SNPs by their probe-intensity difference between cohorts. The MAPT H1 haplotype was strongly detected by this methodology, as was a second major locus on chromosome 11p12-p11 that showed evidence of association at allelic (P<.001), genotypic (P<.001), and haplotypic (P<.001) levels and was narrowed to a single haplotype block containing the DNA damage-binding protein 2 (DDB2) and lysosomal acid phosphatase 2 (ACP2) genes. Since DNA damage and lysosomal dysfunction have been implicated in aging and neurodegenerative processes, both genes are viable candidates for conferring risk of disease.

3.
Common genetic polymorphism may explain a portion of the heritable risk for common diseases, so considerable effort has been devoted to finding and typing common single-nucleotide polymorphisms (SNPs) in the human genome. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), suggesting that only a subset of all SNPs (known as tagging SNPs, or tagSNPs) need to be genotyped for disease association studies. Based on the genetic differences that exist among human populations, most tagSNP sets are defined in a single population and applied only in populations that are closely related. To improve the efficiency of multi-population analyses, we have developed an algorithm called MultiPop-TagSelect that finds a near-minimal union of population-specific tagSNP sets across an arbitrary number of populations. We present this approach as an extension of LD-select, a tagSNP selection method that uses a greedy algorithm to group SNPs into bins based on their pairwise association patterns, although the MultiPop-TagSelect algorithm could be used with any SNP tagging approach that allows choices between nearly equivalent SNPs. We evaluate the algorithm by considering tagSNP selection in candidate-gene resequencing data and lower density whole-chromosome data. Our analysis reveals that an exhaustive search is often intractable, while the developed algorithm can quickly and reliably find near-optimal solutions even for difficult tagSNP selection problems. Using populations of African, Asian, and European ancestry, we also show that an optimal multi-population set of tagSNPs can be substantially smaller (up to 44%) than a typical set obtained through independent or sequential selection.
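
The greedy binning idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the LD-select or MultiPop-TagSelect implementation: the function names, the genotype coding (allele counts 0/1/2), and the r² threshold of 0.8 are all assumptions made for illustration, and r² is approximated here by the squared correlation of genotype vectors.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def greedy_bins(genotypes, threshold=0.8):
    """Group SNPs into bins; each bin is (tag SNP, sorted member SNPs).

    genotypes: dict mapping a hypothetical SNP name to a list of allele counts.
    threshold: assumed r^2 cut-off for two SNPs to count as associated.
    """
    names = sorted(genotypes)
    # Each SNP is trivially associated with itself.
    partners = {s: {s} for s in names}
    for s, t in combinations(names, 2):
        if r_squared(genotypes[s], genotypes[t]) >= threshold:
            partners[s].add(t)
            partners[t].add(s)

    unbinned = set(names)
    bins = []
    while unbinned:
        # Greedy step: pick the SNP associated with the most unbinned SNPs
        # (ties go to the alphabetically first name).
        tag = max(sorted(unbinned), key=lambda s: len(partners[s] & unbinned))
        members = partners[tag] & unbinned
        bins.append((tag, sorted(members)))
        unbinned -= members
    return bins


example = {
    "snpA": [0, 1, 2, 0, 1, 2],
    "snpB": [0, 1, 2, 0, 1, 2],  # identical to snpA
    "snpC": [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with snpA
    "snpD": [0, 0, 1, 1, 2, 2],  # only weakly correlated with the others
}
bins = greedy_bins(example)
```

In this toy input, snpA, snpB, and snpC fall into one bin tagged by snpA, and snpD forms its own bin, so two tagSNPs would stand in for four SNPs. A multi-population extension would additionally have to choose among nearly equivalent tags across populations, which this sketch does not attempt.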

4.
Association of STAT4 with rheumatoid arthritis in the Korean population   (total citations: 3; self-citations: 0; citations by others: 3)
A recent study in the North American White population has documented the association of a common STAT4 haplotype (tagged by rs7574865) with risk for rheumatoid arthritis (RA) and systemic lupus erythematosus. To replicate this finding in the Korean population, we performed a case-control association study. We genotyped 67 single nucleotide polymorphisms (SNPs) within the STAT1 and STAT4 regions in 1123 Korean patients with RA and 1008 ethnicity-matched controls. The most significant four risk SNPs (rs11889341, rs7574865, rs8179673, and rs10181656 located within the third intron of STAT4) among 67 SNPs are identical with those in the North American study. All four SNPs have modest risk for RA susceptibility (odds ratio 1.21-1.27). A common haplotype defined by these markers (TTCG) carries significant risk for RA in Koreans [34 percent versus 28 percent, P=0.0027, OR (95 percent CI)=1.33 (1.10-1.60)]. By logistic regression analysis, this haplotype is an independent risk factor in addition to the classical shared epitope alleles at the HLA-DRB1 locus. There were no significant associations with age of disease onset, radiographic progression, or serologic status using either allelic or haplotypic analysis. Unlike several other risk genes for RA such as PTPN22, PADI4, and FCRL3, a haplotype of the STAT4 gene shows consistent association with RA susceptibility across Whites and Asians, suggesting that this risk haplotype predates the divergence of the major racial groups.

5.
Liu W, Zhao W, Chase GA. Human Heredity, 2006, 61(1): 31-44
OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotypes or genotyping errors on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotypes. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had a larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase. In both sets of simulations, with the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.

6.
The present study was undertaken to replicate an association between the PTGS2/PLA2G4A locus and schizophrenia among a Chinese population. We recruited 168 Chinese parent-offspring trios of Han descent, consisting of fathers, mothers and affected offspring with schizophrenia. Of 3 informative SNPs genotyped, none showed allelic association with schizophrenia; the haplotype analysis also failed to capture a haplotypic association with the illness. Because the frequencies of alleles and genotypes of the SNPs analyzed differ in the Chinese population as compared with the British population that initially showed the genetic association between the PTGS2/PLA2G4A locus and schizophrenia, the ethnic background may be a major reason for poor replication of the initial finding.

7.
Ex situ germplasm collections seek to conserve maximum genetic diversity in a small number of samples. Geographic and environmental information have long been treated as surrogate measures of genetic diversity, proposed to be useful for increasing allelic diversity of collections. We examine the effect of maximizing geographic and environmental diversity on the retention of distinct haplotype blocks in germplasm subsets, using three species with extensive genomewide genotypic data. We show that maximizing diversity in the surrogate measures produces subsets with uneven representation of haplotypic diversity across the genome. Some regions are well-conserved, exhibiting high haplotypic diversity, while others are poorly-conserved and contain significantly less haplotypic diversity than would be obtained via random sampling. In two of three species, poorly-conserved genomic regions were enriched in regulatory genes which, as a class, contribute to phenotypic variation. The specific genes affected varied by species but, overall, haplotypic diversity was poorly-conserved at genes controlling ~10% of major molecular functions and biological processes. While this study was limited to three exemplar species, we find little evidence to support continued use of geographic or environmental surrogates for ex situ conservation activities attempting to capture maximum genomewide allelic diversity. Although geographic and environmental diversity have proven to be reliable predictors of allele frequency differences and ecotypic differentiation across species ranges, they appear to be poor predictors of allelic diversity per se, offering little opportunity to enrich collections for haplotypic diversity overall, and ample opportunity to bias the conservation of important functional genetic variation. We propose a bioinformatic bridge between haplotypic diversity and the potential phenotypic diversity residing in collections using the Gene Ontology.

8.
Chronic obstructive pulmonary disease (COPD) is a complex human disease likely influenced by multiple genes, cigarette smoking, and gene-by-smoking interactions, but only severe alpha 1-antitrypsin deficiency is a proven genetic risk factor for COPD. Prior linkage analyses in the Boston Early-Onset COPD Study have demonstrated significant linkage to a key intermediate phenotype of COPD on chromosome 2q. We integrated results from murine lung development and human COPD gene-expression microarray studies with human COPD linkage results on chromosome 2q to prioritize candidate-gene selection, thus identifying SERPINE2 as a positional candidate susceptibility gene for COPD. Immunohistochemistry demonstrated expression of serpine2 protein in mouse and human adult lung tissue. In family-based association testing of 127 severe, early-onset COPD pedigrees from the Boston Early-Onset COPD Study, we observed significant association with COPD phenotypes and 18 single-nucleotide polymorphisms (SNPs) in the SERPINE2 gene. Association of five of these SNPs with COPD was replicated in a case-control analysis, with cases from the National Emphysema Treatment Trial and controls from the Normative Aging Study. Family-based and case-control haplotype analyses supported similar regions of association within the SERPINE2 gene. When significantly associated SNPs in these haplotypic regions were included as covariates in linkage models, LOD score attenuation was observed most markedly in a smokers-only linkage model (LOD 4.41, attenuated to 1.74). After the integration of murine and human microarray data to inform candidate-gene selection, we observed significant family-based association and independent replication of association in a case-control study, suggesting that SERPINE2 is a COPD-susceptibility gene and is likely influenced by gene-by-smoking interaction.

9.
We have implemented a technique combining allele-specific PCR (AS-PCR) and denaturing high-performance liquid chromatography (DHPLC) to identify new polymorphic variants within an intergenic region in the beta-globin cluster. This technique is applicable to the detection of new variants in genomic regions where variation is apportioned into distinct classes of haplotype. Duplexes for DHPLC analysis were created by denaturation and re-annealing of a mixture of two AS-PCR products of known and unknown sequence from the same haplotypic class, permitting detection of new haplotypes in each class. A 454-bp fragment 3.5 kb 5' to the human delta-globin gene, which may have a gene regulatory function, was analysed in 840 chromosomes from a global sampling of human populations using this method. Two divergent haplotypes were found to predominate in all populations studied, possibly as a result of balancing selection.

10.
Genome-wide association studies have focused on searching for genes responsible for several diseases. Admixture mapping studies have been proposed as a more efficient alternative capable of detecting polymorphisms that contribute a small effect to disease risk. This method relies on the higher values of linkage disequilibrium in admixed populations. To test this, we analyzed 10 genomic regions previously defined as related to colorectal cancer among nine populations and studied the variation pattern of haplotypic structures and heterozygosity values in seven categories of SNPs. Both analyses showed differences among chromosomal regions and studied populations. Admixed Latin American samples generally show intermediate values. Heterozygosity of the SNPs grouped in categories varies more within each gene than within each population. African-related populations have more blocks per chromosomal region, consistent with their antiquity. In sum, some similarities were found among Latin American populations, but each chromosomal region showed a particular behavior, despite the fact that the study refers to genes and regions related to one particular complex disease. This study strongly suggests the necessity of developing statistical methods to deal with di- or tri-hybrid populations, as well as of carefully analyzing the different historic and demographic scenarios, and the different characteristics of particular chromosomal regions and evolutionary forces.

11.
Effectiveness of computational methods in haplotype prediction   (total citations: 11; self-citations: 0; citations by others: 11)
Haplotype analysis has been used for narrowing down the location of disease-susceptibility genes and for investigating many population processes. Computational algorithms have been developed to estimate haplotype frequencies and to predict haplotype phases from genotype data for unrelated individuals. However, the accuracy of such computational methods needs to be evaluated before their applications can be advocated. We have experimentally determined the haplotypes at two loci, the N-acetyltransferase 2 gene (NAT2, 850 bp, n=81) and a 140-kb region on chromosome X (n=77), each consisting of five single nucleotide polymorphisms (SNPs). We empirically evaluated and compared the accuracy of the subtraction method, the expectation-maximization (EM) method, and the PHASE method in haplotype frequency estimation and in haplotype phase prediction. Where there was near complete linkage disequilibrium (LD) between SNPs (the NAT2 gene), all three methods provided effective and accurate estimates for haplotype frequencies and individual haplotype phases. For a genomic region in which marked LD was not maintained (the chromosome X locus), the computational methods were adequate in estimating overall haplotype frequencies. However, none of the methods was accurate in predicting individual haplotype phases. The EM and the PHASE methods provided better estimates for overall haplotype frequencies than the subtraction method for both genomic regions.
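
To make the EM approach named in this abstract concrete, here is a minimal sketch for the simplest case of two biallelic SNPs in unrelated individuals. It is a textbook-style illustration, not the software evaluated in the study; the function name, the genotype coding (number of copies of allele "1" at each SNP), and the fixed iteration count are assumptions.

```python
def em_two_snp(genotype_counts, iterations=100):
    """Estimate frequencies of haplotypes '00', '01', '10', '11' by EM.

    genotype_counts: dict mapping (g1, g2) to a number of individuals, where
    g1 and g2 are the copies (0, 1, or 2) of allele '1' at SNP 1 and SNP 2.
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    base = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    n_double_het = genotype_counts.get((1, 1), 0)
    n_people = sum(genotype_counts.values())

    # Haplotypes are unambiguous unless both SNPs are heterozygous.
    for (g1, g2), n in genotype_counts.items():
        if (g1, g2) == (1, 1):
            continue
        for a1, a2 in zip(alleles[g1], alleles[g2]):
            base[f"{a1}{a2}"] += n

    freq = {h: 0.25 for h in base}  # uniform starting point
    for _ in range(iterations):
        # E-step: probability that a double heterozygote is 00/11 (cis)
        # rather than 01/10 (trans), given the current frequencies.
        cis = freq["00"] * freq["11"]
        trans = freq["01"] * freq["10"]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: re-estimate frequencies from expected haplotype counts.
        expected = dict(base)
        expected["00"] += n_double_het * p_cis
        expected["11"] += n_double_het * p_cis
        expected["01"] += n_double_het * (1 - p_cis)
        expected["10"] += n_double_het * (1 - p_cis)
        freq = {h: c / (2 * n_people) for h, c in expected.items()}
    return freq


# Toy data in strong LD: only '00' and '11' are seen in unambiguous genotypes.
estimates = em_two_snp({(0, 0): 4, (2, 2): 4, (1, 1): 2})
```

With this toy input the double heterozygotes are resolved almost entirely as 00/11, which mirrors the abstract's observation that inference is easiest under near-complete LD. When LD is weak, cis and trans resolutions become similarly likely, so frequency estimates may still be reasonable while individual phase calls are not.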

12.
Psoriasis is a common skin disorder of multifactorial origin. Genomewide scans for disease susceptibility have repeatedly demonstrated the existence of a major locus, PSORS1 (psoriasis susceptibility 1), contained within the major histocompatibility complex (MHC), on chromosome 6p21. Subsequent refinement studies have highlighted linkage disequilibrium (LD) with psoriasis, along a 150-kb segment that includes at least three candidate genes (encoding human leukocyte antigen-C [HLA-C], alpha-helix-coiled-coil-rod homologue, and corneodesmosin), each of which has been shown to harbor disease-associated alleles. However, the boundaries of the minimal PSORS1 region remain poorly defined. Moreover, interpretations of allelic association with psoriasis are compounded by limited insight into LD conservation within the MHC class I interval. To address these issues, we have pursued a high-resolution genetic characterization of the PSORS1 locus. We resequenced genomic segments along a 220-kb region at chromosome 6p21 and identified a total of 119 high-frequency SNPs. Using 59 SNPs (18 coding and 41 noncoding SNPs) whose position was representative of the overall marker distribution, we genotyped a data set of 171 independently ascertained parent-affected offspring trios. Family-based association analysis of this cohort highlighted two SNPs (n.7 and n.9) respectively lying 7 and 4 kb proximal to HLA-C. These markers generated highly significant evidence of disease association (P<10⁻⁹), several orders of magnitude greater than the observed significance displayed by any other SNP that has previously been associated with disease susceptibility. This observation was replicated in a Gujarati Indian case/control data set. Haplotype-based analysis detected overtransmission of a cluster of chromosomes, which probably originated by ancestral mutation of a common disease-bearing haplotype. The only markers exclusive to the overtransmitted chromosomes are SNPs n.7 and n.9, which define a 10-kb PSORS1 core risk haplotype. These data demonstrate the power of SNP haplotype-based association analyses and provide high-resolution dissection of genetic variation across the PSORS1 interval, the major susceptibility locus for psoriasis.

13.
Haplotypic analysis of the TNF locus by association efficiency and entropy   (total citations: 5; self-citations: 0; citations by others: 5)

Background

To understand the causal basis of TNF associations with disease, it is necessary to understand the haplotypic structure of this locus. We genotyped 12 single-nucleotide polymorphisms (SNPs) distributed over 4.3 kilobases in 296 healthy, unrelated Gambian and Malawian adults. We generated 592 high-quality haplotypes by integrating family- and population-based reconstruction methods.

Results

We found 32 different haplotypes, of which 13 were shared between the two populations. Both populations were haplotypically diverse (gene diversity = 0.80, Gambia; 0.85, Malawi) and significantly differentiated (p < 10⁻⁵ by exact test). More than a quarter of marker pairs showed evidence of intragenic recombination (29% Gambia; 27% Malawi). We applied two new methods of analyzing haplotypic data: association efficiency analysis (AEA), which describes the ability of each SNP to detect every other SNP in a case-control scenario; and the entropy maximization method (EMM), which selects the subset of SNPs that most effectively dissects the underlying haplotypic structure. AEA revealed that many SNPs in TNF are poor markers of each other. The EMM showed that 8 of 12 SNPs (Gambia) and 7 of 12 SNPs (Malawi) are required to describe 95% of the haplotypic diversity.

Conclusions

The TNF locus in the Gambian and Malawian samples is haplotypically diverse and has a rich history of intragenic recombination. As a consequence, a large proportion of TNF SNPs must be typed to detect a disease-modifying SNP at this locus. The most informative subset of SNPs to genotype differs between the two populations.
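
The idea of choosing a SNP subset that captures most of the haplotypic diversity can be sketched with Shannon entropy. The abstract does not give the details of the authors' entropy maximization method, so the greedy search below, the function names, and the reading of "95% of the haplotypic diversity" as 95% of the full-haplotype entropy are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(haplotypes, sites):
    """Shannon entropy (bits) of the haplotypes restricted to the given sites."""
    counts = Counter(tuple(h[i] for i in sites) for h in haplotypes)
    n = len(haplotypes)
    return -sum(c / n * log2(c / n) for c in counts.values())


def greedy_entropy_subset(haplotypes, fraction=0.95):
    """Greedily add SNP positions until the chosen subset reaches the
    requested fraction of the entropy of the full haplotypes."""
    n_sites = len(haplotypes[0])
    target = fraction * entropy(haplotypes, range(n_sites))
    chosen = []
    while len(chosen) < n_sites and entropy(haplotypes, chosen) < target:
        remaining = [i for i in range(n_sites) if i not in chosen]
        # Add the site giving the largest entropy (ties go to the lowest index).
        best = max(remaining, key=lambda i: entropy(haplotypes, chosen + [i]))
        chosen.append(best)
    return sorted(chosen)


# Four equally frequent haplotypes; site 2 duplicates site 0.
sample = ["000", "101", "010", "111"]
subset = greedy_entropy_subset(sample)
```

Here the redundant third SNP is dropped and sites 0 and 1 together recover the full 2 bits of entropy. In a region with extensive recombination, as reported for TNF, fewer SNPs are redundant, so a selection of this kind would be expected to retain most of them.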

14.

Background

The selection of markers in association studies can be informed through the use of haplotype blocks. Recent reports have determined the genomic architecture of chromosomal segments through different haplotype block definitions based on linkage disequilibrium (LD) measures or haplotype diversity criteria. The relative applicability of distinct block definitions to association studies, however, remains unclear. We compared different block definitions in 6.1 Mb of chromosome 17q in 189 unrelated healthy individuals. Using 137 single nucleotide polymorphisms (SNPs), at a median spacing of 15.5 kb, we constructed haplotype block maps using published methods and additional methods we have developed. Haplotype tagging SNPs (htSNPs) were identified for each map.

Results

Blocks were found to be shorter and coverage of the region limited with methods based on LD measures, compared to the method based on haplotype diversity. Although the distribution of blocks was highly variable, the number of SNPs that needed to be typed in order to capture the maximum number of haplotypes was consistent.

Conclusion

For the marker spacing used in this study, choice of block definition is not important when used as an initial screen of the region to identify htSNPs. However, choice of block definition has consequences for the downstream interpretation of association study results.

15.
There has been great interest in the prospects of using single-nucleotide polymorphisms (SNPs) in the search for complex disease genes, and several initiatives devoted to the identification and mapping of SNPs throughout the human genome are currently underway. However, actual data investigating the use of SNPs for identification of complex disease genes are scarce. To begin to look at issues surrounding the use of SNPs in complex disease studies, we have initiated a collaborative SNP mapping study around APOE, the well-established susceptibility gene for late-onset Alzheimer disease (AD). Sixty SNPs in a 1.5-Mb region surrounding APOE were genotyped in samples of unrelated cases of AD, in controls, and in families with AD. Standard tests were conducted to look for association of SNP alleles with AD, in cases and controls. We also used family-based association analyses, including recently developed methods to look for haplotype association. Evidence of association (P…

16.
Positional cloning by linkage disequilibrium   (total citations: 6; self-citations: 0; citations by others: 6)
Recently, metric linkage disequilibrium (LD) maps that assign an LD unit (LDU) location for each marker have been developed (Maniatis et al. 2002). Here we present a multiple pairwise method for positional cloning by LD within a composite likelihood framework and investigate the operating characteristics of maps in physical units (kb) and LDU for two bodies of data (Daly et al. 2001; Jeffreys et al. 2001) on which current ideas of blocks are based. False-negative indications of a disease locus (type II error) were examined by selecting one single-nucleotide polymorphism (SNP) at a time as causal and taking its allelic count (0, 1, or 2, for the three genotypes) as a pseudophenotype, Y. By use of regression and correlation, association between every pseudophenotype and the allelic count of each SNP locus (X) was based on an adaptation of the Malecot model, which includes a parameter for location of the putative gene. By expressing locations in kb or LDU, greater power for localization was observed when the LDU map was fitted. The efficiency of the kb map, relative to the LDU map, to describe LD varied from a maximum of 0.87 to a minimum of 0.36, with a mean of 0.62. False-positive indications of a disease locus (type I error) were examined by simulating an unlinked causal SNP and the allele count was used as a pseudophenotype. The type I error was in good agreement with Wald's likelihood theorem for both metrics and all models that were tested. Unlike tests that select only the most significant marker, haplotype, or haploset, these methods are robust to large numbers of markers in a candidate region. Contrary to predictions from tagging SNPs that retain haplotype diversity, the sample with smaller size but greater SNP density gave less error. The locations of causal SNPs were estimated with the same precision in blocks and steps, suggesting that block definition may be less useful than anticipated for mapping a causal SNP. These results provide a guide to efficient positional cloning by SNPs and a benchmark against which the power of positional cloning by haplotype-based alternatives may be measured.

17.
Optimal selection of SNP markers for disease association studies   (total citations: 5; self-citations: 0; citations by others: 5)
Genetic association studies with population samples hold the promise of uncovering the susceptibility genes underlying the heritability of complex or common disease. Most association studies rely on the use of surrogate markers, single-nucleotide polymorphisms (SNPs) being the most suitable due to their abundance and ease of scoring. SNP marker selection aims to increase the chances that at least one typed SNP is in linkage disequilibrium (LD) with the disease causative variant, while at the same time controlling the cost of the study in terms of the number of markers genotyped and samples. Empirical studies reporting block-like segments in the genome with high LD and low haplotype diversity have motivated a marker selection strategy whereby subsets of SNPs that 'tag' the common haplotypes of a region are picked for genotyping, avoiding typing redundant SNPs. Based on these initial observations, a plethora of 'tagging' algorithms for selecting minimum informative subsets of SNPs has recently appeared in the literature. These differ mostly in two major aspects: the quality or correlation measure used to define tagging and the algorithm used for the minimization of the final number of tagging SNPs. In this review we describe the available tagging algorithms utilizing a 3-step unifying framework, point out their methodological and conceptual differences, and make an assessment of their assumptions, performance, and scalability.

18.
19.
The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.

20.
Spatially varying selection triggers differential adaptation of local populations. Here, we mined the determinants of local adaptation at the genomewide scale in the two closest maize wild relatives, the teosintes Zea mays ssp. parviglumis and ssp. mexicana. We sequenced 120 individuals from six populations: two lowland, two intermediate and two highland populations sampled along two altitudinal gradients. We detected 8 479 581 single nucleotide polymorphisms (SNPs) covered in the six populations with an average sequencing depth per site per population ranging from 17.0× to 32.2×. Population diversity varied from 0.10 to 0.15, and linkage disequilibrium decayed very rapidly. We combined two differentiation‐based methods, and correlation of allele frequencies with environmental variables to detect outlier SNPs. Outlier SNPs displayed significant clustering. From clusters, we identified 47 candidate regions. We further modified a haplotype‐based method to incorporate genotype uncertainties in haplotype calling, and applied it to candidate regions. We retrieved evidence for selection at the haplotype level in 53% of our candidate regions, and in 70% of the cases the same haplotype was selected in the two lowland or the two highland populations. We recovered a candidate region located within a previously characterized inversion on chromosome 1. We found evidence of a soft sweep at a locus involved in leaf macrohair variation. Finally, our results revealed frequent colocalization between our candidate regions and loci involved in the variation of traits associated with plant–soil interactions such as root morphology, aluminium and low phosphorus tolerance. Soil therefore appears to be a major driver of local adaptation in teosintes.
