Similar articles
20 similar articles found.
1.
The number of variants that have a non-zero effect on a trait (i.e. polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: over one-third of regions harbor at least one causal variant for each of the traits analyzed. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by the variation in the number of causal SNPs.

2.
Molecular haplotyping at high throughput (total citations: 4; self-citations: 2; citations by others: 2)
Reconstruction of haplotypes, or the allelic phase, of single nucleotide polymorphisms (SNPs) is a key component of studies aimed at the identification and dissection of genetic factors involved in complex genetic traits. In humans, this often involves investigation of SNPs in case/control or other cohorts in which the haplotypes can only be partially inferred from genotypes by statistical approaches with resulting loss of power. Moreover, alternative statistical methodologies can lead to different evaluations of the most probable haplotypes present, and different haplotype frequency estimates when data are ambiguous. Given the cost and complexity of SNP studies, a robust and easy-to-use molecular technique that allows haplotypes to be determined directly from individual DNA samples would have wide applicability. Here, we present a reliable, automated and high-throughput method for molecular haplotyping in 2 kb, and potentially longer, sequence segments that is based on the physical determination of the phase of SNP alleles on either of the individual paternal haploids. We demonstrate that molecular haplotyping with this technique is not more complicated than SNP genotyping when implemented by matrix-assisted laser desorption/ionisation mass spectrometry, and we also show that the method can be applied using other DNA variation detection platforms. Molecular haplotyping is illustrated on the well-described β2-adrenergic receptor gene.

3.
Genome-wide association studies (GWAS) are routinely conducted for both quantitative and binary (disease) traits. We present two analytical tools for use in the experimental design of GWAS. Firstly, we present power calculations quantifying power in a unified framework for a range of scenarios. In this context we consider the utility of quantitative scores (e.g. endophenotypes) that may be available on cases only or on both cases and controls. Secondly, we consider the accuracy of prediction of genetic risk from genome-wide SNPs and derive an expression for genomic prediction accuracy using a liability threshold model for disease traits in a case-control design. The expected values based on our derived equations for both power and prediction accuracy agree well with observed estimates from simulations.

4.
GCTA: a tool for genome-wide complex trait analysis (total citations: 7; self-citations: 0; citations by others: 7)
For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.

5.
For most complex traits, results from genome-wide association studies show that the proportion of the phenotypic variance attributable to the additive effects of individual SNPs, that is, the heritability explained by the SNPs, is substantially less than the estimate of heritability obtained by standard methods using correlations between relatives. This difference has been called the "missing heritability". One explanation is that heritability estimates from family (including twin) studies are biased upwards. Zuk et al. revisited overestimation of narrow sense heritability from twin studies as a result of confounding with non-additive genetic variance. They propose a limiting pathway (LP) model that generates significant epistatic variation and its simple parametrization provides a convenient way to explore implications of epistasis. They conclude that over-estimation of narrow sense heritability from family data ("phantom heritability") may explain an important proportion of missing heritability. We show that for highly heritable quantitative traits large phantom heritability estimates from twin studies are possible only if a large contribution of common environment is assumed. The LP model is underpinned by strong assumptions that are unlikely to hold, including that all contributing pathways have the same mean and variance and are uncorrelated. Here, we relax the assumptions that underlie the LP model to be more biologically plausible. Together with theoretical, empirical, and pragmatic arguments we conclude that in outbred populations the contribution of additive genetic variance is likely to be much more important than the contribution of non-additive variance.

6.
Quantitative traits important to organismal function and fitness, such as brain size, are presumably controlled by many small-effect loci. Deciphering the genetic architecture of such traits with traditional quantitative trait locus (QTL) mapping methods is challenging. Here, we investigated the genetic architecture of brain size (and the size of five different brain parts) in nine-spined sticklebacks (Pungitius pungitius) with the aid of novel multilocus QTL-mapping approaches based on a de-biased LASSO method. Apart from having more statistical power to detect QTL and a lower rate of false positives than conventional QTL-mapping approaches, the developed methods can handle large marker panels and provide estimates of genomic heritability. Single-locus analyses of an F2 interpopulation cross with 239 individuals and 15,198 fully informative single nucleotide polymorphisms (SNPs) uncovered 79 QTL associated with variation in stickleback brain size traits. Many of these loci were in strong linkage disequilibrium (LD) with each other, and consequently, a multilocus mapping of individual SNPs, accounting for LD structure in the data, recovered only four significant QTL. However, a multilocus mapping of SNPs grouped by linkage group (LG) identified 14 LGs (1–6 depending on the trait) that influence variation in brain traits. For instance, 17.6% of the variation in relative brain size was explainable by cumulative effects of SNPs distributed over six LGs, whereas 42% of the variation was accounted for by all 21 LGs. Hence, the results suggest that variation in stickleback brain traits is influenced by many small-effect loci. Apart from suggesting moderately heritable (h² ≈ 0.15–0.42) multifactorial genetic architecture of brain traits, the results highlight the challenges in identifying the loci contributing to variation in quantitative traits. Nevertheless, the results demonstrate that the novel QTL-mapping approach developed here has distinctive advantages over the traditional QTL-mapping methods in analyses of dense marker panels.

7.
We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases) in particular when the traits (diseases) are not measured on the same samples.
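The proportionality stated in this abstract can be illustrated with a short Python sketch. It only encodes the relationship "sampling variance is inversely proportional to the number of pairwise contrasts and to the variance in genetic relationships"; the function name, the `scale` constant, and the example relationship variance are hypothetical and are not taken from the paper or its online tool.

```python
def relative_sampling_variance(n_individuals, var_relationship, scale=1.0):
    """Sampling variance up to an unspecified constant (hypothetical `scale`).

    Encodes only the stated proportionality:
        variance ~ scale / (n_pairs * var_relationship)
    where n_pairs is the number of pairwise contrasts among individuals.
    """
    if n_individuals < 2:
        raise ValueError("need at least two individuals to form a pair")
    if var_relationship <= 0:
        raise ValueError("variance of relationships must be positive")
    n_pairs = n_individuals * (n_individuals - 1) / 2
    return scale / (n_pairs * var_relationship)


# Illustrative value only: an assumed variance of off-diagonal relationships.
ASSUMED_VAR_REL = 2e-5

small = relative_sampling_variance(1000, ASSUMED_VAR_REL)
large = relative_sampling_variance(2000, ASSUMED_VAR_REL)
ratio = small / large  # doubling the sample roughly quarters the variance
```

Under this proportionality, doubling the sample size cuts the sampling variance by about a factor of four, so the standard error roughly halves.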

8.
Dreyer AP, Shingleton AW. PLoS ONE, 2011, 6(12): e28278
The genitalia of most male arthropods scale hypoallometrically with body size, that is, they are more or less the same size across large and small individuals in a population. Such scaling is expected to arise when genital traits show less variation than somatic traits in response to factors that generate size variation among individuals in a population. Nevertheless, there have been few studies directly examining the relative sensitivity of genital and somatic traits to factors that affect their size. Such studies are key to understanding genital evolution and the evolution of morphological scaling relationships more generally. Previous studies indicate that the size of genital traits in male Drosophila melanogaster shows a relatively low response to variation in environmental factors that affect trait size. Here we show that the size of genital traits in male fruit flies also exhibits a relatively low response to variation in genetic factors that affect trait size. Importantly, however, this low response is only to genetic factors that affect body and organ size systemically, not those that affect organ size autonomously. Further, we show that the genital traits do not show low levels of developmental instability, which is the response to stochastic developmental errors that also influence organ size autonomously. We discuss these results in the context of current hypotheses on the proximate and ultimate mechanisms that generate genital hypoallometry.

9.
Eucalyptus is characterized by high foliar concentrations of plant secondary metabolites with marked qualitative and quantitative variation within a single species. Secondary metabolites in eucalypts are important mediators of a diverse community of herbivores. We used a candidate gene approach to investigate genetic associations between 195 single nucleotide polymorphisms (SNPs) from 24 candidate genes and 33 traits related to secondary metabolites in the Tasmanian Blue Gum (Eucalyptus globulus). We discovered 37 significant associations (false discovery rate (FDR) Q < 0.05) across 11 candidate genes and 19 traits. The effects of SNPs on phenotypic variation were within the expected range (0.018 < r² < 0.061) for forest trees. Whereas most marker effects were nonadditive, two alleles from two consecutive genes in the methylerythritol phosphate pathway (MEP) showed additive effects. This study successfully links allelic variants to ecologically important phenotypes which can have a large impact on the entire community. It is one of very few studies to identify the genetic variants of a foundation tree that influence ecosystem function.

10.
Genome-Wide Association Studies shed light on the identification of genes underlying human diseases and agriculturally important traits. This potential has been shadowed by false positive findings. The Mixed Linear Model (MLM) method is flexible enough to simultaneously incorporate population structure and cryptic relationships to reduce false positives. However, its intensive computational burden is prohibitive in practice, especially for large samples. The newly developed algorithm, FaST-LMM, solved the computational problem, but requires that the number of SNPs be less than the number of individuals to derive a rank-reduced relationship. This restriction potentially leads to less statistical power when compared to using all SNPs. We developed a method to extract a small subset of SNPs and use them in FaST-LMM. This method not only retains the computational advantage of FaST-LMM, but also remarkably increases statistical power even when compared to using the entire set of SNPs. We named the method SUPER (Settlement of MLM Under Progressively Exclusive Relationship) and made it available within an implementation of the GAPIT software package.

11.
The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in individual strains as a reference. We also estimated allele frequencies of the SNPs using "pooled" data and compared them with "true" frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive.
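The statement that the error is "well approximated by binomial sampling" can be made concrete with a minimal Python sketch. It assumes the allele count at a site behaves like a binomial draw of `n_sampled` reads (a hypothetical parameter name) and ignores other noise sources mentioned in the abstract, such as unequal DNA contributions from individual strains.

```python
import math


def binomial_se(freq, n_sampled):
    """Standard error of an allele frequency estimated from n_sampled draws.

    Assumes pure binomial sampling: SE = sqrt(p * (1 - p) / n).
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if n_sampled <= 0:
        raise ValueError("number of sampled reads must be positive")
    return math.sqrt(freq * (1.0 - freq) / n_sampled)


# Hypothetical example: a SNP at frequency 0.5 covered by 100 reads.
se_mid = binomial_se(0.5, 100)
# A rarer allele at the same depth has a smaller absolute error.
se_rare = binomial_se(0.05, 100)
```

Under this assumption the error shrinks with the square root of the sampling depth, so quadrupling the depth halves the standard error.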

12.
For most common diseases with heritable components, neither a single single-nucleotide polymorphism (SNP) nor a small number of SNPs explains most of the variance in these disorders. Instead, much of the variance may be caused by interactions (epistasis) among multiple SNPs or interactions with environmental conditions. We present a new powerful statistical model for analyzing and interpreting genomic data that influence multifactorial phenotypic traits with a complex and likely polygenic inheritance. The new method is based on Markov chain Monte Carlo (MCMC) and allows for identification of sets of SNPs and environmental factors that when combined increase disease risk or change the distribution of a quantitative trait. Using simulations, we show that the MCMC method can detect disease association when multiple, interacting SNPs are present in the data. When applying the method to real large-scale data from a Danish population-based cohort, multiple interactions are identified that severely affect serum triglyceride levels in the study individuals. The method is designed for quantitative traits but can also be applied to qualitative traits. It is computationally feasible even for a large number of possible interactions and differs fundamentally from most previous approaches by entertaining nonlinear interactions and by directly addressing the multiple-testing problem.

13.
Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.

14.
Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals, 307,315 autosomal SNPs). Individual variation lies across a continuum with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans only represent a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of this data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that can remove redundancy from the selected SNP panels and show that we can effectively remove correlated markers, thus increasing genotyping savings. Only 150–200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied to sets of ancestry informative markers selected with any method in order to select the most uncorrelated SNPs, thereby significantly decreasing genotyping costs.

15.
In linkage studies, independent replication of positive findings is crucial in order to distinguish between true positives and false positives. Recently, the following question has arisen in linkage studies of complex traits: at what distance do we reject the hypothesis that two location estimates in a genomic region represent the same gene? Here we attempt to address this question. Sampling distributions for location estimates were constructed by computer simulation. The conditions for simulation were chosen to reflect features of "typical" complex traits, including incomplete penetrance, phenocopies, and genetic heterogeneity. Our findings, which bear on what is considered a replication in linkage studies of complex traits, suggest that, even with relatively large numbers of multiplex families, chance variation in the location estimate is substantial. In addition, we report evidence that, for the conditions studied here, the standard error of a location estimate is a function of the magnitude of the expected LOD score.

16.
Typically twin studies are used to investigate the aggregate effects of genetic and environmental influences on brain phenotypic measures. Although some phenotypic measures are highly heritable in twin studies, SNPs (single nucleotide polymorphisms) identified by genome-wide association studies (GWAS) account for only a small fraction of the heritability of these measures. We mapped the genetic variation (the proportion of phenotypic variance explained by variation among SNPs) of volumes of pre-defined regions across the whole brain, as explained by 512,905 SNPs genotyped on 747 adult participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We found that 85% of the variance of intracranial volume (ICV) (p = 0.04) was explained by considering all SNPs simultaneously, and after adjusting for ICV, total grey matter (GM) and white matter (WM) volumes had genetic variation estimates near zero (p = 0.5). We found varying estimates of genetic variation across 93 non-overlapping regions, with asymmetry in estimates between the left and right cerebral hemispheres. Several regions reported in previous studies to be related to Alzheimer's disease progression were estimated to have a large proportion of volumetric variance explained by the SNPs.

17.
Polymorphisms that affect complex traits or quantitative trait loci (QTL) often affect multiple traits. We describe two novel methods (1) for finding single nucleotide polymorphisms (SNPs) significantly associated with one or more traits using a multi-trait meta-analysis, and (2) for distinguishing between a single pleiotropic QTL and multiple linked QTL. The meta-analysis uses the effect of each SNP on each of n traits, estimated in single-trait genome-wide association studies (GWAS). These effects are expressed as a vector of signed t-values (t), and the error covariance matrix of these t-values is approximated by the correlation matrix of t-values among the traits calculated across the SNPs (V). Consequently, $t'V^{-1}t$ is approximately distributed as a chi-squared with n degrees of freedom. An attractive feature of the meta-analysis is that it uses estimated effects of SNPs from single-trait GWAS, so it can be applied to published data where individual records are not available. We demonstrate that the multi-trait method can be used to increase the power (numbers of SNPs validated in an independent population) of GWAS in a beef cattle data set including 10,191 animals genotyped for 729,068 SNPs with 32 traits recorded, including growth and reproduction traits. We can distinguish between a single pleiotropic QTL and multiple linked QTL because multiple SNPs tagging the same QTL show the same pattern of effects across traits. We confirm this finding by demonstrating that when one SNP is included in the statistical model the other SNPs have a non-significant effect. In the beef cattle data set, cluster analysis yielded four groups of QTL with similar patterns of effects across traits within a group. A linear index was used to validate SNPs having effects on multiple traits and to identify additional SNPs belonging to these four groups.
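The multi-trait statistic described in this abstract, the quadratic form of the signed t-values with the inverse of their correlation matrix, can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function names and example numbers are hypothetical, and the linear system is solved by ordinary Gaussian elimination so that no external library is needed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix V is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def multi_trait_statistic(t_values, corr_matrix):
    """Return t' V^{-1} t for one SNP.

    t_values: signed t-values of the SNP for each of n traits.
    corr_matrix: n x n correlation matrix of t-values among traits (V).
    The result is compared against a chi-squared with n degrees of freedom.
    """
    v_inv_t = solve_linear_system(corr_matrix, t_values)
    return sum(t * x for t, x in zip(t_values, v_inv_t))


# Hypothetical example: two traits whose t-values correlate at 0.5.
stat = multi_trait_statistic([2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
```

Solving the system `V x = t` instead of inverting V explicitly is a common numerical choice; with uncorrelated traits (V equal to the identity) the statistic reduces to the sum of squared t-values.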

18.
When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
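The conventional procedure that this abstract starts from, evaluating the largest test statistic by permuting sample units, can be sketched as follows. This shows only the unconditional permutation estimate that the authors caution can mislead under correlation; it does not implement their proposed conditional significance levels. The difference in group means as the per-hypothesis statistic, the function names, and the toy data are all assumptions made for illustration.

```python
import random


def max_abs_mean_difference(features, labels):
    """Largest absolute difference in group means across all features.

    features: list of features, each a list with one value per sample unit.
    labels: list of 0/1 group labels, one per sample unit.
    """
    n_one = sum(labels)
    n_zero = len(labels) - n_one
    best = 0.0
    for values in features:
        mean_one = sum(v for v, g in zip(values, labels) if g == 1) / n_one
        mean_zero = sum(v for v, g in zip(values, labels) if g == 0) / n_zero
        best = max(best, abs(mean_one - mean_zero))
    return best


def global_permutation_p(features, labels, n_perm=1000, seed=0):
    """Permutation p-value for the global null, based on the max statistic."""
    rng = random.Random(seed)
    observed = max_abs_mean_difference(features, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute sample units, keeping group sizes
        if max_abs_mean_difference(features, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Toy data: one feature separates the groups, one does not.
toy_features = [
    [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0],
    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
p_global = global_permutation_p(toy_features, toy_labels)
```

Permuting whole sample units keeps the correlation among features intact in each permuted data set, which is why this scheme is the usual starting point.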

19.
In this report, we present a simple and powerful way to incorporate individual-specific liability classes into linkage analysis. The proposed method is applicable to both quantitative and qualitative traits. In linkage studies, we may have information about different covariates. Incorporation of these covariates along with the estimates of residual familial effects, age-at-onset effects, and susceptibility in the definition of liability classes can increase the power to detect genetic linkage. In this study, we show how one can form individual-specific liability classes and use these classes in standard linkage-analysis programs, such as the widely used LINKAGE package, to perform more powerful genetic linkage analysis. Our simulation study shows that this approach yields higher LOD scores and more-accurate estimates of the recombination fraction in the families showing linkage. The proposed method is also applied to kindreds collected, at the M. D. Anderson Cancer Center, through probands with childhood soft-tissue sarcoma. Confirmed germ-line mutations in the p53 tumor-suppressor gene have been identified in these families. Application of our method to these families yielded significantly higher LOD scores and more-accurate recombination fractions than did analysis that did not account for individual-specific covariate information.

20.
We conducted a comprehensive study of copy number variants (CNVs) well-tagged by SNPs (r² ≥ 0.8) by analyzing their effect on gene expression and their association with disease susceptibility and other complex human traits. We tested whether these CNVs were more likely to be functional than frequency-matched SNPs as trait-associated loci or as expression quantitative trait loci (eQTLs) influencing phenotype by altering gene regulation. Our study found that CNV-tagging SNPs are significantly enriched for cis eQTLs; furthermore, we observed that trait associations from the NHGRI catalog show an overrepresentation of SNPs tagging CNVs relative to frequency-matched SNPs. We found that these SNPs tagging CNVs are more likely to affect multiple expression traits than frequency-matched variants. Given these findings on the functional relevance of CNVs, we created an online resource of expression-associated CNVs (eCNVs) using the most comprehensive population-based map of CNVs to inform future studies of complex traits. Although previous studies of common CNVs that can be typed on existing platforms and/or interrogated by SNPs in genome-wide association studies concluded that such CNVs appear unlikely to have a major role in the genetic basis of several complex diseases examined, our findings indicate that it would be premature to dismiss the possibility that even common CNVs may contribute to complex phenotypes and at least some common diseases.
