Similar literature
 Found 20 similar articles; search took 23 ms
1.
For many genome-wide association (GWA) studies, individually genotyping one million or more SNPs provides a marginal increase in coverage at a substantial cost. Much of the information gained is redundant due to the correlation structure inherent in the human genome. Pooling-based GWA studies could benefit significantly by utilizing this redundancy to reduce noise, improve the accuracy of the observations and increase genomic coverage. We introduce a measure of correlation between individual genotyping and pooling, under the same framework in which r² provides a measure of linkage disequilibrium (LD) between pairs of SNPs. We then report a new non-haplotype multimarker multi-loci method that leverages the correlation structure between SNPs in the human genome to increase the efficacy of pooling-based GWA studies. We first give a theoretical framework and derivation of our multimarker method. Next, we evaluate simulations using this multimarker approach in comparison to single marker analysis. Finally, we experimentally evaluate our method using different pools of HapMap individuals on the Illumina 450S Duo, Illumina 550K and Affymetrix 5.0 platforms for a combined total of 1 333 631 SNPs. Our results show that use of multimarker analysis reduces noise specific to pooling-based studies, allows for efficient integration of multiple microarray platforms and provides more accurate measures of significance than single marker analysis. Additionally, this approach can be extended to allow for imputing the association significance for SNPs not directly observed using neighboring SNPs in LD. This multimarker method can now be used to cost-effectively complete pooling-based GWA studies with multiple platforms across over one million SNPs and to impute neighboring SNPs weighted for the loss of information due to pooling.
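The abstract above builds on r² as a measure of LD between pairs of SNPs. As a reminder of what that quantity is, here is a minimal Python sketch of the standard pairwise definition, r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. The function name `ld_r2` and the example frequencies are illustrative assumptions; this is not the pooling-specific correlation measure introduced by the authors.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: marginal frequencies of alleles A and B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Hypothetical example 1: independent SNPs, so the haplotype frequency
# equals the product of the allele frequencies and D is zero.
r2_independent = ld_r2(0.5 * 0.2, 0.5, 0.2)

# Hypothetical example 2: allele A always travels with allele B,
# so one SNP carries all the information about the other.
r2_perfect = ld_r2(0.3, 0.3, 0.3)

print(r2_independent, r2_perfect)
```

An r² near 1 means a second SNP adds almost no new information, which is the redundancy the abstract proposes to exploit.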

2.
Stringer S, Wray NR, Kahn RS, Derks EM. PLoS ONE, 2011, 6(11): e27964
Complex diseases are often highly heritable. However, for many complex traits only a small proportion of the heritability can be explained by observed genetic variants in traditional genome-wide association (GWA) studies. Moreover, for some of those traits few significant SNPs have been identified. Single SNP association methods test for association at a single SNP, ignoring the effect of other SNPs. We show using a simple multi-locus odds model of complex disease that moderate to large effect sizes of causal variants may be estimated as relatively small effect sizes in single SNP association testing. This underestimation effect is most severe for diseases influenced by numerous risk variants. We relate the underestimation effect to the concept of non-collapsibility found in the statistics literature. As described, continuous phenotypes generated with linear genetic models are not affected by this underestimation effect. Since many GWA studies apply single SNP analysis to dichotomous phenotypes, previously reported results potentially underestimate true effect sizes, thereby impeding identification of true effect SNPs. Therefore, when a multi-locus model of disease risk is assumed, a multi-SNP analysis may be more appropriate.
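The non-collapsibility effect described in this abstract can be illustrated with a small calculation. The sketch below assumes a two-locus logistic model with independent binary risk factors, which is a simplification chosen for illustration and not necessarily the model used in the paper. The function name `marginal_odds_ratio` and all parameter values are hypothetical.

```python
import math


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_odds_ratio(b0: float, b1: float, b2: float, f2: float) -> float:
    """Odds ratio for locus 1 when locus 2 is ignored.

    Disease risk follows logit P(D) = b0 + b1*g1 + b2*g2, where g1 and g2
    are independent 0/1 carrier indicators and f2 is the carrier frequency
    at locus 2.
    """
    def risk(g1: int) -> float:
        # Average the disease risk over the unobserved locus 2.
        return (1.0 - f2) * expit(b0 + b1 * g1) + f2 * expit(b0 + b1 * g1 + b2)

    p1, p0 = risk(1), risk(0)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical parameters: locus 1 has a true (conditional) odds ratio of 2,
# locus 2 has an odds ratio of 5 and is carried by half the population.
conditional_or = 2.0
marginal = marginal_odds_ratio(b0=-2.0, b1=math.log(conditional_or),
                               b2=math.log(5.0), f2=0.5)
print(conditional_or, marginal)
```

Under these assumptions the single-locus odds ratio comes out below the conditional value of 2 even though the two loci are independent, so the shrinkage is not confounding. Setting `b2=0.0` recovers the conditional odds ratio.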

3.
Multiple sclerosis is a chronic inflammatory demyelinating disease of the central nervous system with an important genetic component and strongest association driven by the HLA genes. We performed a pooling-based genome-wide association study of 500,000 SNPs in order to find new loci associated with the disease. After applying several criteria, 320 SNPs were selected from the microarrays and individually genotyped in a first and independent Spanish Caucasian replication cohort. The 8 most significant SNPs validated in this cohort were also genotyped in a second US Caucasian replication cohort for confirmation. The most significant association was obtained for SNP rs3129934, which neighbors the HLA-DRB/DQA loci and validates our pooling-based strategy. The second strongest association signal was found for SNP rs1327328, which resides in an unannotated region of chromosome 13 but is in linkage disequilibrium with nearby functional elements that may play important roles in disease susceptibility. This region of chromosome 13 has not been previously identified in MS linkage genome screens and represents a novel risk locus for the disease.

4.
The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in humans has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with median interval of 37 kb and maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNPs are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate quality draft sequence. Aspects of the approach can be exploited in species which lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.

5.
Genome-wide association (GWA) studies are currently one of the most powerful tools for identifying disease-associated genes or variants. In typical GWA studies, single-nucleotide polymorphisms (SNPs) are often used as genetic markers. Therefore, it is critical to estimate the percentage of genetic variation that can be covered by SNPs through linkage disequilibrium (LD). In this study, we use the concept of haplotype blocks to evaluate the coverage of five SNP sets, including the HapMap and four commercial arrays, for every exon in the human genome. We show that although some chips can reach coverage similar to that of the HapMap, only about 50% of exons are completely covered by haplotype blocks of HapMap SNPs. We suggest that further high-resolution genotyping methods are required to provide adequate genome-wide power for identifying variants.

6.
Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies.

7.
Although high-density SNP genotyping platforms generate a momentum for detailed genome-wide association (GWA) studies, an offshoot is a new insight into population genetics. Here, we present an example in one of the best-known founder populations by scrutinizing ten distinct Finnish early- and late-settlement subpopulations. By determining genetic distances, homozygosity, and patterns of linkage disequilibrium, we demonstrate that population substructure, and even individual ancestry, is detectable at a very high resolution and supports the concept of multiple historical bottlenecks resulting from consecutive founder effects. Given that genetic studies are currently aiming at identifying smaller and smaller genetic effects, recognizing and controlling for population substructure even at this fine level becomes imperative to avoid confounding and spurious associations. This study provides an example of the power of GWA data sets to demonstrate stratification caused by population history even within a seemingly homogeneous population, like the Finns. Further, the results provide interesting lessons concerning the impact of population history on the genome landscape of humans, as well as approaches to identify rare variants enriched in these subpopulations.

8.
The success of genome-wide association studies (GWAS) in identifying risk loci of complex diseases is now well-established. One persistent major hurdle is the cost of those studies, which puts them beyond the reach of most research groups. Performing GWAS on pools of DNA samples may be an effective strategy to reduce the costs of these studies. In this study, we performed pooling-based GWAS with more than 550,000 SNPs in two case-control cohorts consisting of patients with Type II diabetes (T2DM) and with chronic rhinosinusitis (CRS). In the T2DM study, the results of the pooling experiment were compared to individual genotypes obtained from a previously published GWAS. TCF7L2 and HHEX SNPs associated with T2DM by the traditional GWAS were among the top ranked SNPs in the pooling experiment. This dataset was also used to refine the best strategy to correctly identify SNPs that will remain significant based on individual genotyping. In the CRS study, the top hits from the pooling-based GWAS located within ten kilobases of known genes were validated by individual genotyping of 1,536 SNPs. Forty-one percent (598 out of the 1,457 SNPs that passed quality control) were associated with CRS at a nominal P value of 0.05, confirming the potential of pooling-based GWAS to identify SNPs that differ in allele frequencies between two groups of subjects. Overall, our results demonstrate that a pooling experiment on high-density genotyping arrays can accurately determine the minor allele frequency as compared to individual genotyping and produce a list of top ranked SNPs that captures genuine allelic differences between a group of cases and controls. The low cost associated with a pooling-based GWAS clearly justifies its use in screening for genetic determinants of complex diseases.

9.
Although they have demonstrated success in searching for common variants for complex diseases, genome-wide association (GWA) studies are less successful in detecting rare genetic variants because of the poor statistical power of most current methods. We developed a two-stage method that can be applied to GWA studies for detecting rare variants. Here we report the results of applying this two-stage method to the Wellcome Trust Case Control Consortium (WTCCC) dataset, which includes seven complex diseases: bipolar disorder, cardiovascular disease, hypertension (HT), rheumatoid arthritis, Crohn’s disease, type 1 diabetes and type 2 diabetes (T2D). We identified 24 genes or regions that reach genome-wide significance. Eight of them are novel and were not reported in the WTCCC study. The cumulative risk (or protective) haplotype frequency for each of the 8 genes or regions is small, being at most 11%. For each of the novel genes, the risk (or protective) haplotype set cannot be tagged by the common SNPs available in chips (r² < 0.32). The gene identified in HT was further replicated in the Framingham Heart Study, and is also significantly associated with T2D. Our analysis suggests that searching for rare genetic variants is feasible in current GWA studies and candidate gene studies, and the results can serve as guides to future resequencing studies to identify the underlying rare functional variants.

10.
Single nucleotide polymorphisms (SNPs) are single-base inheritable variations at a defined genetic location that occur in at least 1% of the population. SNPs are useful markers for genetic association studies of disease susceptibility or adverse drug reactions, in evolutionary studies and in forensic science. Given the potential impact of SNPs, the biotechnology industry has focused on the development of high-throughput methods for SNP genotyping. Many high-throughput SNP genotyping technologies are currently available and many others have recently been patented. Each offers a unique combination of scale, accuracy, throughput and cost. In this review, we describe some of the most important recent SNP genotyping methods and the recent patents associated with them.

11.
Dong C, Qian Z, Jia P, Wang Y, Huang W, Li Y. PLoS ONE, 2007, 2(12): e1262

Background

High-throughput genotyping chips have contributed greatly to genome-wide association (GWA) studies to identify novel disease susceptibility single nucleotide polymorphisms (SNPs). The high-density chips are designed using two different SNP selection approaches, the direct gene-centric approach, and the indirect quasi-random SNPs or linkage disequilibrium (LD)-based tagSNPs approaches. Although all these approaches can provide high genome coverage and ascertain variants in genes, it is not clear to what extent these approaches can capture the common genic variants. It is also important to characterize and compare the differences between these approaches.

Methodology/Principal Findings

In our study, by using both the Phase II HapMap data and the disease variants extracted from OMIM, a gene-centric evaluation was first performed to evaluate the ability of the approaches to capture the disease variants in the Caucasian population. Then the distribution patterns of SNPs were also characterized in genic regions, evolutionarily conserved introns and nongenic regions, ontologies and pathways. The results show that, no matter which SNP selection approach is used, the current high-density SNP chips provide very high coverage in genic regions and can capture most of the known common disease variants under the HapMap framework. The results also show that the differences between the direct and the indirect approaches are relatively small. Both have similar SNP distribution patterns in these gene-centric characteristics.

Conclusions/Significance

This study suggests that the indirect approaches not only have the advantage of high coverage but also are useful for studies focusing on various functional SNPs either in genes or in the conserved regions that the direct approach supports. The study and the annotation of characteristics will be helpful for designing and analyzing GWA studies that aim to identify genetic risk factors involved in common diseases, especially variants in genes and conserved regions.

12.
Scheet P, Stephens M. PLoS Genetics, 2008, 4(8): e1000147
Quality control (QC) is a critical step in large-scale studies of genetic variation. While, on average, high-throughput single nucleotide polymorphism (SNP) genotyping assays are now very accurate, the errors that remain tend to cluster into a small percentage of "problem" SNPs, which exhibit unusually high error rates. Because most large-scale studies of genetic variation are searching for phenomena that are rare (e.g., SNPs associated with a phenotype), even this small percentage of problem SNPs can cause important practical problems. Here we describe and illustrate how patterns of linkage disequilibrium (LD) can be used to improve QC in large-scale, population-based studies. This approach has the advantage over existing filters (e.g., HWE or call rate) that it can actually reduce genotyping error rates by automatically correcting some genotyping errors. Applying this LD-based QC procedure to data from The International HapMap Project, we identify over 1,500 SNPs that likely have high error rates in the CHB and JPT samples and estimate corrected genotypes. Our method is implemented in the software package fastPHASE, available from the Stephens Lab website (http://stephenslab.uchicago.edu/software.html).

13.

Background  

Since the introduction of large-scale genotyping methods that can be utilized in genome-wide association (GWA) studies for deciphering complex diseases, statistical genetics has faced the tremendous challenge of how to most appropriately analyze such data. A plethora of advanced model-based methods for genetic mapping of traits has been available for more than 10 years in animal and plant breeding. However, most such methods are computationally intractable in the context of genome-wide studies. Therefore, it is hardly surprising that GWA analyses have in practice been dominated by simple statistical tests concerned with a single marker locus at a time, while the more advanced approaches have appeared only relatively recently in the biomedical and statistical literature.

14.
Connelly CF, Akey JM. Genetics, 2012, 191(4): 1345-1353
Advances in sequencing technology have enabled whole-genome sequences to be obtained from multiple individuals within species, particularly in model organisms with compact genomes. For example, 36 genome sequences of Saccharomyces cerevisiae are now publicly available, and SNP data are available for even larger collections of strains. One potential use of these resources is mapping the genetic basis of phenotypic variation through genome-wide association (GWA) studies, with the benefit that associated variants can be studied experimentally with greater ease than in outbred populations such as humans. Here, we evaluate the prospects of GWA studies in S. cerevisiae strains through extensive simulations and a GWA study of mitochondrial copy number. We demonstrate that the complex and heterogeneous patterns of population structure present in yeast populations can lead to a high type I error rate in GWA studies of quantitative traits, and that methods typically used to control for population stratification do not provide adequate control of the type I error rate. Moreover, we show that while GWA studies of quantitative traits in S. cerevisiae may be difficult depending on the particular set of strains studied, association studies to map cis-acting quantitative trait loci (QTL) and Mendelian phenotypes are more feasible. We also discuss sampling strategies that could enable GWA studies in yeast and illustrate the utility of this approach in Saccharomyces paradoxus. Thus, our results provide important practical insights into the design and interpretation of GWA studies in yeast, and other model organisms that possess complex patterns of population structure.

15.
In this study, we describe the development and characterization of the first high-density single nucleotide polymorphism (SNP) genotyping array for rainbow trout. The SNP array is publicly available from a commercial vendor (Affymetrix). The SNP genotyping quality was high, and the validation rate was close to 90%. This is comparable to other farm animals and is much higher than in previous smaller scale SNP validation studies in rainbow trout. High quality and integrity of the genotypes are evident from sample reproducibility and from nearly 100% agreement in genotyping results from other methods. The array is very useful for rainbow trout aquaculture populations with more than 40 900 polymorphic markers per population. For wild populations that were confounded by a smaller sample size, the number of polymorphic markers was between 10 577 and 24 330. Comparison between genotypes from individual populations suggests good potential for identifying candidate markers for population traceability. Linkage analysis and mapping of the SNPs to the reference genome assembly provide strong evidence for a wide distribution throughout the genome with good representation in all 29 chromosomes. A total of 68% of the genome scaffolds and contigs were anchored through linkage analysis using the SNP array genotypes, including ~20% of the genome assembly that has not been previously anchored to chromosomes.

16.
Ahn MJ, Won HH, Lee J, Lee ST, Sun JM, Park YH, Ahn JS, Kwon OJ, Kim H, Shim YM, Kim J, Kim K, Kim YH, Park JY, Kim JW, Park K. Human Genetics, 2012, 131(3): 365-372
The proportion of never smoker non-small cell lung cancer (NSCLC) in Asia is about 30-40%. Despite the striking demographics and high prevalence of never smoker NSCLC, the exact causes still remain undetermined. Although several genome-wide association (GWA) studies were conducted to find susceptibility loci for lung cancer in never smokers, no regions were replicated except for 5p15.33, suggesting locus heterogeneity and different environmental toxic effects. To identify genetic loci associated with susceptibility to lung cancer in never smokers, we performed a GWA analysis using the Affymetrix 6.0 SNP array. For the discovery GWA set, we recruited 446 never smoking Korean patients with NSCLC and 497 normal subjects. We tested association of SNPs with lung cancer susceptibility using the Cochran-Armitage trend test. For validation, 39 SNPs were selected from the top 50 SNPs and five additional SNPs were selected in the DAB1 gene region which showed significant associations in the GWA analysis. The validation SNPs were genotyped in an independent sample including 434 patients and 1,000 controls. Among the 44 validation SNPs, two SNPs (rs11080466 and rs11663246) near the APCDD1, NAPG and FAM38B genes in the 18p11.22 region were replicated. The P value of rs11080466 was 1.08 × 10⁻⁶ in the combined sets (2.68 × 10⁻⁵ in the discovery set and 2.60 × 10⁻³ in the validation set) and the odds ratio was 0.68 (0.58-0.79). We observed a similar association for rs11663246. Our result suggests the 18p11.22 region as a novel lung cancer susceptibility locus in never smokers.
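The abstract names the Cochran-Armitage trend test as its association test. A minimal Python sketch of the textbook form of that test follows, assuming additive genotype weights (0, 1, 2) and a normal approximation for the two-sided P value. The function name `cochran_armitage_trend` and the genotype counts are made up for illustration and are not data from the study.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x k genotype table.

    cases, controls: counts per genotype class (e.g. AA, Aa, aa)
    weights: scores per class; (0, 1, 2) encodes an additive model
    Returns (z, two_sided_p) using the normal approximation.
    """
    r_tot, s_tot = sum(cases), sum(controls)
    col = [r + s for r, s in zip(cases, controls)]  # column totals
    n_tot = r_tot + s_tot

    # Trend statistic T = sum_i t_i * (r_i * S - s_i * R)
    t_stat = sum(w * (r * s_tot - s * r_tot)
                 for w, r, s in zip(weights, cases, controls))

    # Variance of T under the null hypothesis of no association
    var = sum(w * w * c * (n_tot - c) for w, c in zip(weights, col))
    var -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                   for i in range(len(col))
                   for j in range(i + 1, len(col)))
    var *= r_tot * s_tot / n_tot

    z = t_stat / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p


# Made-up counts: the risk allele is enriched among cases.
z, p = cochran_armitage_trend(cases=(20, 30, 50), controls=(50, 30, 20))
print(z, p)
```

The additive weights make the test sensitive to a dose-response in allele count without assuming Hardy-Weinberg equilibrium, which is one common reason it is preferred over an allelic chi-square test.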

17.
Resequencing is an emerging tool for identification of rare disease-associated mutations. Rare mutations are difficult to tag with SNP genotyping, as genotyping studies are designed to detect common variants. However, studies have shown that genetic heterogeneity is a probable scenario for common diseases, in which multiple rare mutations together explain a large proportion of the genetic basis for the disease. Thus, we propose a weighted-sum method to jointly analyse a group of mutations in order to test for groupwise association with disease status. For example, such a group of mutations may result from resequencing a gene. We compare the proposed weighted-sum method to alternative methods and show that it is powerful for identifying disease-associated genes, both on simulated and ENCODE data. Using the weighted-sum method, a resequencing study can identify a disease-associated gene with an overall population attributable risk (PAR) of 2%, even when each individual mutation has much lower PAR, using 1,000 to 7,000 affected and unaffected individuals, depending on the underlying genetic model. This study thus demonstrates that resequencing studies can identify important genetic associations, provided that specialised analysis methods, such as the weighted-sum method, are used.
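The abstract does not spell out the weights or the test statistic, so the following Python sketch is only one plausible reading of a weighted-sum groupwise test, under stated assumptions: each variant is down-weighted by the estimated standard deviation of its allele count in unaffected individuals, each individual gets a weighted mutation score, and the rank sum of the affected individuals is assessed by permuting labels. All function names and the toy data are hypothetical.

```python
import math
import random


def variant_weights(genotypes, affected):
    """genotypes[j][i] is the mutant-allele count (0/1/2) of individual j at variant i."""
    n = len(genotypes)
    n_unaff = sum(1 for a in affected if not a)
    weights = []
    for i in range(len(genotypes[0])):
        m_unaff = sum(g[i] for g, a in zip(genotypes, affected) if not a)
        q = (m_unaff + 1) / (2 * n_unaff + 2)  # smoothed frequency in unaffecteds
        weights.append(math.sqrt(n * q * (1 - q)))
    return weights


def rank_sum_affected(genotypes, affected):
    """Sum of the (tie-averaged) score ranks of the affected individuals."""
    w = variant_weights(genotypes, affected)
    scores = [sum(g_i / w_i for g_i, w_i in zip(g, w)) for g in genotypes]
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average rank within a block of ties
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return sum(r for r, a in zip(ranks, affected) if a)


def permutation_p(genotypes, affected, n_perm=1000, seed=0):
    """One-sided permutation P value; weights are recomputed for every shuffle."""
    rng = random.Random(seed)
    observed = rank_sum_affected(genotypes, affected)
    labels = list(affected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if rank_sum_affected(genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: six affected individuals each carry one rare mutation, spread
# over three variants; six unaffected individuals carry none.
toy_genotypes = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]] \
    + [[0, 0, 0] for _ in range(6)]
toy_affected = [True] * 6 + [False] * 6
toy_rank_sum = rank_sum_affected(toy_genotypes, toy_affected)
toy_p = permutation_p(toy_genotypes, toy_affected)
print(toy_rank_sum, toy_p)
```

In the toy data no single variant is carried by more than two individuals, yet the pooled score separates the groups, which is the situation the abstract describes for multiple rare mutations in one gene.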

18.

Background

High-throughput genotype (HTG) data has been used primarily in genome-wide association (GWA) studies; however, GWA results explain only a limited part of the complete genetic variation of traits. In systems genetics, network approaches have been shown to be able to identify pathways and their underlying causal genes to unravel the biological and genetic background of complex diseases and traits, e.g., the Weighted Gene Co-expression Network Analysis (WGCNA) method based on microarray gene expression data. The main objective of this study was to develop a scale-free weighted genetic interaction network method using whole genome HTG data in order to detect biologically relevant pathways and potential genetic biomarkers for complex diseases and traits.

Results

We developed the Weighted Interaction SNP Hub (WISH) network method that uses HTG data to detect genome-wide interactions between single nucleotide polymorphisms (SNPs) and their relationship with complex traits. Data dimensionality reduction was achieved by selecting SNPs based on their: 1) degree of genome-wide significance and 2) degree of genetic variation in a population. Network construction was based on pairwise Pearson's correlation between SNP genotypes or the epistatic interaction effect between SNP pairs. To identify modules, the Topological Overlap Measure (TOM) was calculated, reflecting the degree of overlap in shared neighbours between SNP pairs. Modules, clusters of highly interconnected SNPs, were defined using a tree-cutting algorithm on the SNP dendrogram created from the dissimilarity TOM (1-TOM). Modules were selected for functional annotation based on their association with the trait of interest, defined by the Genome-wide Module Association Test (GMAT). We successfully tested the established WISH network method using simulated and real SNP interaction data and GWA study results for carcass weight in a pig resource population; this resulted in detecting modules and key functional and biological pathways related to carcass weight.

Conclusions

We developed the WISH network method which is a novel 'systems genetics' approach to study genetic networks underlying complex trait variation. The WISH network method reduces data dimensionality and statistical complexity in associating genotypes with phenotypes in GWA studies and enables researchers to identify biologically relevant pathways and potential genetic biomarkers for any complex trait of interest.
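The network construction step described in the Results (pairwise Pearson correlation between SNP genotypes, followed by the Topological Overlap Measure) can be sketched in a few lines of Python. The sketch assumes the unsigned TOM formula commonly used in WGCNA, TOM_ij = (Σ_u a_iu·a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij), and a soft-threshold power `beta`; neither detail is confirmed by the abstract. The function names and the toy genotypes are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(genotypes, beta=2):
    """Unsigned soft-threshold adjacency a_ij = |cor(i, j)|^beta, zero diagonal."""
    m = len(genotypes)
    adj = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            adj[i][j] = adj[j][i] = abs(pearson(genotypes[i], genotypes[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM matrix; the dissimilarity used for clustering is 1 - TOM."""
    m = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each SNP
    tom = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(m) if u != i and u != j)
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (
                min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Toy genotypes (0/1/2 coding), one row per SNP, six individuals.
toy_snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0
    [2, 1, 0, 2, 1, 0],  # mirror image of SNP 0
    [0, 2, 1, 1, 0, 2],  # only moderately correlated with the others
]
toy_tom = topological_overlap(adjacency(toy_snps))
for row in toy_tom:
    print([round(v, 3) for v in row])
```

Because TOM adds shared neighbours to the direct correlation, two SNPs that correlate with the same third SNP score higher than their pairwise correlation alone would suggest, which is what makes the subsequent tree-cutting step group them into one module.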

19.
Single nucleotide polymorphism (SNP) genotyping has become a key technology for genetic studies. In recent years, matrix-assisted laser desorption/ionization (MALDI) time-of-flight mass spectrometry has emerged as a very powerful method for SNP genotyping. Here, we discuss our experience in implementing a high-throughput SNP genotyping facility based on MALDI, and the issues encountered in adapting this to large-scale genetic studies. Most of these issues are not specific to using MALDI approaches, and they will also serve as valuable pointers for establishing high-throughput genotyping with other methods.

20.
With the advent of genome-wide association (GWA) studies, researchers are hoping that reliable genetic associations of common human complex diseases/traits can be detected. Currently, there is increasing enthusiasm about GWA and a number of GWA studies have been published. A common practice in the field is to use replication as the gold standard to validate an association finding. In this article, based on empirical and theoretical data, we emphasize that replication of GWA findings can be quite difficult, and should not always be expected, even when true variants are identified. The probability of replication becomes smaller with the increasing number of independent GWA studies if the power of individual replication studies is less than 100% (which is usually the case), and even a finding that is replicated may not necessarily be true. We argue that the field may have unreasonably high expectations for the success of replication. We also wish to raise the question of whether it is sufficient or necessary to treat replication as the ultimate and gold standard for defining true variants. We finally discuss the usefulness of integrating evidence from multiple levels/sources such as genetic epidemiological studies (at the DNA level), gene expression studies (at the RNA level), proteomics (at the protein level), and follow-up molecular and cellular studies for eventual validation and illumination of the functional relevance of the genes uncovered.
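The claim that the probability of replication shrinks as the number of independent studies grows can be made concrete with one line of arithmetic. The Python sketch below assumes that every study has the same power, that studies are independent, and that "replication" means a true variant reaches significance in every one of them; these are simplifying assumptions for illustration, and the function name `prob_all_replicate` is hypothetical.

```python
def prob_all_replicate(power: float, n_studies: int) -> float:
    """Probability that a true variant is significant in all n independent studies."""
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must lie in [0, 1]")
    return power ** n_studies


# With 80% power per study, consistent replication quickly becomes unlikely.
for n in (1, 2, 3, 5, 10):
    print(n, round(prob_all_replicate(0.8, n), 3))
```

Only at 100% power does the probability stay at 1 regardless of the number of studies, which matches the condition stated in the abstract.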
