Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the "winner's curse." The actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters. The method produces a point estimate and confidence region for the parameter estimates. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power. The uncertainty of the estimate decreases with increasing sample size, independent of the power of the original test for association. Finally, we show that application of the method to case-control data can improve the design of replication studies considerably.
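The ascertainment bias described in this abstract can be illustrated with a small simulation. The sketch below is not the correction method the authors present; it only shows, under assumed values, why an estimate taken from the same data that passed a significance test tends to exceed the true effect. The function name, the effect size, the standard error, and the one-sided threshold are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean


def simulate_winners_curse(true_effect=0.10, se=0.10, alpha=0.05,
                           n_studies=20000, seed=1):
    """Return (power, mean estimate among significant studies).

    Assumption: each study's effect estimate is normally distributed
    around the true effect with a known standard error.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    # Draw one effect estimate per hypothetical study.
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Keep only the studies that would have been declared significant.
    significant = [b for b in estimates if b / se > z_crit]
    power = len(significant) / n_studies
    return power, mean(significant)


power, mean_sig = simulate_winners_curse()
print(f"true effect = 0.10, power = {power:.2f}, "
      f"mean estimate among significant studies = {mean_sig:.3f}")
```

Because every retained estimate must exceed the critical value times the standard error, the average of the retained estimates is necessarily larger than the true effect whenever power is low, which is the situation the abstract highlights.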

2.
GCTA: a tool for genome-wide complex trait analysis   Cited by: 7 (self-citations: 0; citations by others: 7)
For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which is based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.

3.
Genome-wide association studies are designed to discover SNPs that are associated with a complex trait. Employing strict significance thresholds when testing individual SNPs avoids false positives at the expense of increasing false negatives. Recently, we developed a method for quantitative traits that estimates the variation accounted for when fitting all SNPs simultaneously. Here we develop this method further for case-control studies. We use a linear mixed model for analysis of binary traits and transform the estimates to a liability scale by adjusting both for scale and for ascertainment of the case samples. We show by theory and simulation that the method is unbiased. We apply the method to data from the Wellcome Trust Case Control Consortium and show that a substantial proportion of variation in liability for Crohn disease, bipolar disorder, and type I diabetes is tagged by common SNPs.
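The transformation to the liability scale mentioned in this abstract can be sketched as follows. The abstract does not give the formula, so the expression used here is an assumption: it is the commonly cited form in which an observed-scale estimate is multiplied by K²(1−K)² / (z² P(1−P)), where K is the population prevalence, P is the proportion of cases in the sample, and z is the standard normal density at the liability threshold. The function and parameter names are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Rescale an observed-scale variance estimate to the liability scale.

    Assumed formula (not quoted from the abstract):
        h2_liability = h2_observed * K^2 (1-K)^2 / (z^2 * P * (1-P))
    K = population prevalence, P = fraction of cases in the sample,
    z = standard normal density at the threshold that cuts off the top K.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)
    z = std_normal.pdf(threshold)
    K, P = prevalence, case_fraction
    return h2_observed * (K * (1 - K)) ** 2 / (z ** 2 * P * (1 - P))


# Hypothetical inputs: 1% prevalence, a sample made up of 50% cases.
print(round(observed_to_liability(0.40, prevalence=0.01, case_fraction=0.5), 3))
```

When the sample is not ascertained (P equals K), the factor reduces to K(1−K)/z², which is the adjustment for scale alone; the remaining part of the factor accounts for the over-representation of cases.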

4.

Background

While the importance of record linkage is widely recognised, few studies have attempted to quantify how linkage errors may have impacted on their own findings and outcomes. Even where authors of linkage studies have attempted to estimate sensitivity and specificity based on subjects with known status, the effects of false negatives and positives on event rates and estimates of effect are not often described.

Methods

We present quantification of the effect of sensitivity and specificity of the linkage process on event rates and incidence, as well as the resultant effect on relative risks. Formulae to estimate the true number of events and estimated relative risk adjusted for given linkage sensitivity and specificity are then derived and applied to data from a prisoner mortality study. The implications of false positive and false negative matches are also discussed.

Discussion

Comparisons of the effect of sensitivity and specificity on incidence and relative risks indicate that it is more important for linkages to be highly specific than sensitive, particularly if true incidence rates are low. We recommend that, where possible, some quantitative estimates of the sensitivity and specificity of the linkage process be performed, allowing the effect of these quantities on observed results to be assessed.
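The kind of adjustment described in this abstract can be sketched with standard misclassification algebra. The paper's own formulae are not reproduced in the abstract, so what follows is an assumption: observed events are modelled as sensitivity × true events plus (1 − specificity) × true non-events, and the relation is inverted to recover the true count. The function names and the example cohort are hypothetical.

```python
def adjusted_events(observed, n, sensitivity, specificity):
    """Estimate the true number of events among n subjects.

    Assumed model: observed = Se * true + (1 - Sp) * (n - true),
    with the same linkage error rates for every subject.
    """
    false_positive_rate = 1 - specificity
    return (observed - false_positive_rate * n) / (sensitivity + specificity - 1)


def adjusted_relative_risk(obs_exposed, n_exposed, obs_unexposed, n_unexposed,
                           sensitivity, specificity):
    """Relative risk computed from the adjusted event counts of both groups."""
    true_exposed = adjusted_events(obs_exposed, n_exposed, sensitivity, specificity)
    true_unexposed = adjusted_events(obs_unexposed, n_unexposed, sensitivity, specificity)
    return (true_exposed / n_exposed) / (true_unexposed / n_unexposed)


# Hypothetical cohort: 10,000 subjects per group, true incidence 2% vs 1%,
# linkage sensitivity 0.90 and specificity 0.99.
observed_rr = (278 / 10000) / (189 / 10000)
adjusted_rr = adjusted_relative_risk(278, 10000, 189, 10000, 0.90, 0.99)
print(f"observed RR = {observed_rr:.2f}, adjusted RR = {adjusted_rr:.2f}")
```

In this made-up example a 1% false-positive rate adds roughly as many spurious events as there are true events in the unexposed group, pulling the observed relative risk from 2 down to about 1.47. That is consistent with the abstract's point that specificity matters most when true incidence is low.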

5.
Although many methods are available to test sequence variants for association with complex diseases and traits, methods that specifically seek to identify causal variants are less developed. Here we develop and evaluate a Bayesian hierarchical regression method that incorporates prior information on the likelihood of variant causality through weighting of variant effects. By simulation studies using both simulated and real sequence variants, we compared a standard single variant test for analyzing variant-disease association with the proposed method using different weighting schemes. We found that by leveraging linkage disequilibrium of variants with known GWAS signals and sequence conservation (phastCons), the proposed method provides a powerful approach for detecting causal variants while controlling false positives.

6.
High-density genotyping is extensively exploited in genome-wide association mapping studies and genomic selection in maize. By contrast, linkage mapping studies have until now mostly been based on low-density genetic maps, and theoretical results suggested this to be sufficient. This raises the question of whether an increase in marker density would be overkill for linkage mapping in biparental populations, or whether important QTL mapping parameters would benefit from it. In this study, we addressed this question using experimental data and a simulation based on linkage maps with marker densities of 1, 2, and 5 cM. QTL mapping was performed for six diverse traits in a biparental population with 204 doubled haploid maize lines and in a simulation study with varying QTL effects and closely linked QTL for different population sizes. Our results showed that high-density maps improved neither the QTL detection power nor the predictive power for the proportion of explained genotypic variance. By contrast, the precision of QTL localization, the precision of effect estimates of detected QTL, especially for small and medium-sized QTL, as well as the power to resolve closely linked QTL profited from an increase in marker density from 5 to 1 cM. In conclusion, the higher costs for high-density genotyping are compensated for by more precise estimates of parameters relevant for knowledge-based breeding, thus making an increase in marker density for linkage mapping attractive.

7.
Wu LY, Sun L, Bull SB. Human Heredity, 2006, 62(2): 84-96
BACKGROUND/AIMS: In genome-wide linkage analysis of quantitative trait loci (QTL), locus-specific heritability estimates are biased when the original data are used to both localize linkage and estimate effects, due to maximization of the LOD score over the genome. Positive bias is increased by adoption of stringent significance levels to control genome-wide type I error. We propose multi-locus bootstrap resampling estimators for bias reduction in the situation in which linkage peaks at more than one QTL are of interest. METHODS: Bootstrap estimates were based on repeated sample splitting in the original dataset. We conducted simulation studies in nuclear families with 0 to 5 QTLs and applied the methods in a genome-wide analysis of a blood pressure phenotype in extended pedigrees from the Framingham Heart Study (FHS). RESULTS: Compared to naïve estimates in the original simulation samples, bootstrap estimates had reduced bias and smaller mean squared error. In the FHS pedigrees, the bootstrap yielded heritability estimates as much as 70% smaller than in the original sample. CONCLUSIONS: Because effect estimates obtained in an initial study are typically inflated relative to those expected in an independent replication study, successful replication will be more likely when sample size requirements are based on bias-reduced estimates.

8.
A powerful approach to mapping the genes for complex traits is to study isolated founder populations, in which genetic heterogeneity and environmental noise are likely to be reduced and in which extended genealogical data are often available. Using graph theory, we applied an approach that involved sampling from the large number of pairwise relationships present in an extended genealogy to reconstruct sets of subpedigrees that maximize the useful information for linkage mapping while minimizing calculation burden. We investigated, through simulation, the properties of the different sets in terms of bias in identity-by-descent (IBD) estimation and power decrease under various genetic models. We applied this approach to a small isolated population from Sardinia, the village of Talana, consisting of a unique large and complex pedigree, and performed a genomewide search through variance-components linkage analysis for serum lipid levels. We identified a region of significant linkage on chromosome 2 for total serum cholesterol and low-density lipoprotein (LDL) cholesterol. Through higher-density mapping, we obtained an increased linkage for both traits on 2q21.2-q24.1, with a LOD score of 4.3 for total serum cholesterol and of 3.9 for LDL cholesterol. A replication study was performed in an independent and larger set from a genetically differentiated isolated population of the same region of Sardinia, the village of Perdasdefogu. We obtained consistent linkage to the region for total serum cholesterol (LOD score 1.4) and LDL cholesterol (LOD score 2.2), with a level of concordance uncommon for complex traits, and refined the location of the quantitative-trait locus. Interestingly, the 2q21.1-22 region has also been linked to premature coronary heart disease in Finns, and, in the adjacent 2q14 region, significant linkage with triglycerides has been reported in Hutterites.

9.
Variance-component (VC) methods are flexible and powerful procedures for the mapping of genes that influence quantitative traits. However, traditional VC methods make the critical assumption that the quantitative-trait data within a family either follow or can be transformed to follow a multivariate normal distribution. Violation of the multivariate normality assumption can occur if trait data are censored at some threshold value. Trait censoring can arise in a variety of ways, including assay limitation or confounding due to medication. Valid linkage analyses of censored data require the development of a modified VC method that directly models the censoring event. Here, we present such a model, which we call the "tobit VC method." Using simulation studies, we compare and contrast the performance of the traditional and tobit VC methods for linkage analysis of censored trait data. For the simulation settings that we considered, our results suggest that (1) analyses of censored data by using the traditional VC method lead to severe bias in parameter estimates and a modest increase in false-positive linkage findings, (2) analyses with the tobit VC method lead to unbiased parameter estimates and type I error rates that reflect nominal levels, and (3) the tobit VC method has a modest increase in linkage power as compared with the traditional VC method. We also apply the tobit VC method to censored data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics study and provide two examples in which the tobit VC method yields noticeably different results as compared with the traditional method.

10.
Meta-analysis is an important tool in linkage analysis. The pooling of results across primary linkage studies allows greater statistical power to detect quantitative-trait loci (QTLs) and more-precise estimation of their genetic effects and, hence, yields conclusions that are stronger relative to those of individual studies. Previous methods for the meta-analysis of linkage studies have been proposed, and, although some methods address the problem of between-study heterogeneity, most methods still require linkage analysis at the same marker or set of markers across studies, whereas others do not result in an estimate of genetic variance. In this study, we present a meta-analytic procedure to evaluate evidence from several studies that report Haseman-Elston statistics for linkage to a QTL at multiple, possibly distinct, markers on a chromosome. This technique accounts for between-study heterogeneity and estimates both the location of the QTL and the magnitude of the genetic effect more precisely than does an individual study. We also provide standard errors for the genetic effect and for the location (in cM) of the QTL, using a resampling method. The approach can be applied under other conditions, provided that the various studies use the same linkage statistic.

11.
We construct a mathematical model of the within-cell replication of poliovirus, a prototypic RNA virus, and use realistic parameter estimates to describe the increase of copy number of the viral genome. Our initial model is essentially an exponential growth model; we also consider modifications of this model to account for resource utilization. The saturation of viral replication dynamics observed in experimental systems can be explained in terms of heavy resource use by the virus. We then use our models to consider the conditions under which the growth of poliovirus is optimized. Intriguingly, if poliovirus has optimized its replication within cells, the predicted ratio of positive to negative strands is close to what is actually observed. We interpret our findings in terms of the evolution of life-history traits.

12.
The primary goal of a genomewide scan is to estimate the genomic locations of genes influencing a trait of interest. It is sometimes said that a secondary goal is to estimate the phenotypic effects of each identified locus. Here, it is shown that these two objectives cannot be met reliably by use of a single data set of a currently realistic size. Simulation and analytical results, based on variance-components linkage analysis as an example, demonstrate that estimates of locus-specific effect size at genomewide LOD score peaks tend to be grossly inflated and can even be virtually independent of the true effect size, even for studies on large samples when the true effect size is small. However, the bias diminishes asymptotically. The explanation for the bias is that the LOD score is a function of the locus-specific effect-size estimate, such that there is a high correlation between the observed statistical significance and the effect-size estimate. When the LOD score is maximized over the many pointwise tests being conducted throughout the genome, the locus-specific effect-size estimate is therefore effectively maximized as well. We argue that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way of obtaining reliable estimates of locus-specific effect, and then only if one does not condition on statistical significance being obtained. We further show that the same factors causing this bias are responsible for frequent failures to replicate initial claims of linkage or association for complex traits, even when the initial localization is, in fact, correct. The findings of this study have wide-ranging implications, as they apply to all statistical methods of gene localization. It is hoped that, by keeping this bias in mind, we will more realistically interpret and extrapolate from the results of genomewide scans.
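The mechanism described in this abstract, in which maximizing significance over the genome also maximizes the effect-size estimate, can be sketched with a toy scan. This is not the variance-components analysis used in the paper. It assumes one causal position and many null positions, each with an independent, normally distributed effect estimate expressed in standard-error units, whereas real linkage statistics are correlated along chromosomes. All names and values are hypothetical.

```python
import random
from statistics import mean


def mean_peak_estimate(true_effect, se=1.0, n_null=499, n_scans=200, seed=7):
    """Average effect estimate at the genomewide peak over repeated scans.

    Assumption: estimates at tested positions are independent and normal;
    one position carries the true effect, the rest carry none.
    """
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_scans):
        causal = rng.gauss(true_effect, se)
        best_null = max(rng.gauss(0.0, se) for _ in range(n_null))
        # The reported estimate is whatever sits at the highest peak.
        peaks.append(max(causal, best_null))
    return mean(peaks)


for effect in (0.0, 0.5, 1.0):
    print(f"true effect = {effect:.1f} SE, "
          f"mean peak estimate = {mean_peak_estimate(effect):.2f} SE")
```

With small true effects the peak is usually set by the largest of the null positions, so the reported estimate changes little as the true effect changes. That matches the abstract's statement that peak estimates can be virtually independent of the true effect size.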

13.
Sample-size guidelines for linkage studies of quantitative traits partially determined by a dominant major locus are needed to provide a rough estimate of the amount of pedigree material that should be sampled to map the loci that influence such traits. After pedigrees are sampled, a specific power calculation can be carried out to evaluate the linkage information provided by the sampled pedigrees. Using computer simulation, I provide sample-size guidelines for lod-score linkage studies of quantitative traits partially determined by a dominant major locus. I consider the effects of the trait model, marker characteristics, and sampling strategy, with particular attention to sampling strategy because it is the one factor that the investigator can fully control. My results suggest that linkage studies of quantitative traits are practical, particularly if the investigator chooses an efficient sampling design and an efficient strategy to select pedigrees for linkage analysis.

14.
Many genetic traits have complex modes of inheritance; they may exhibit incomplete or age-dependent penetrance or fail to show any clear Mendelian inheritance pattern. As primary linkage maps for the human genome near completion, it is becoming increasingly possible to map these traits. Prior to undertaking a linkage study, it is important to consider whether the pedigrees available for the proposed study are likely to provide sufficient information to demonstrate linkage, assuming a linked marker is tested. In the current paper, we describe a computer simulation method to estimate the power of a proposed study to detect linkage for a complex genetic trait, given a hypothesized genetic model for the trait. Our method simulates trait locus genotypes consistent with observed trait phenotypes, in such a way that the probability of detecting linkage can be estimated by sample statistics of the maximum lod score distribution. The method uses terms available when calculating the likelihood of the trait phenotypes for the pedigree and is applicable to any trait determined by one or a few genetic loci; individual-specific environmental effects can also be dealt with. Our method provides an objective answer to the question, Will these pedigrees provide sufficient information to map this complex genetic trait?

15.
16.
Family-based candidate gene and genome-wide association studies are a logical progression from linkage studies for the identification of genes and polymorphisms underlying complex traits. An efficient way to analyse phenotypic and genotypic data is to model linkage and association simultaneously. An important result from such an analysis is whether any evidence for linkage remains after fitting polymorphisms at candidate genes (residual linkage), because this may indicate locus and allelic heterogeneity in the population and will influence subsequent molecular strategies. Here we report that substantial residual linkage is to be expected, even under genetic homogeneity and when the underlying causal polymorphisms are genotyped and fitted in the model. We simulated a powerful design to detect linkage to quantitative trait loci, with 5, 10 or 20 causal SNPs spread throughout the genome. These SNPs were responsible for all genetic variation, and hence for both linkage and association. Residual linkage at the largest linkage peak from a genome-wide scan was substantial, with mean LOD scores of 0.4, 0.7, and 1.4 for the case of 5, 10 and 20 underlying causal SNPs, respectively. For less powerful designs, the proportion of the original LOD scores that remains after association will be even larger. All cases of ‘significant’ residual linkage are false positives. The reason for the apparent paradox of detecting residual linkage after fitting causal polymorphisms is that the linkage signals at the largest peaks in a genome-scan are severely inflated, even if all peaks correspond to true linkage. Our findings are general and apply to linkage mapping of any phenotype and to any pedigree structure.

17.
The commonly used "end diagnosis" phenotype that is adopted in linkage and association studies of complex traits is likely to represent an oversimplified model of the genetic background of a disease. This is also likely to be the case for common types of migraine, for which no convincingly associated genetic variants have been reported. In headache disorders, most genetic studies have used end diagnoses of the International Headache Society (IHS) classification as phenotypes. Here, we introduce an alternative strategy; we use trait components (individual clinical symptoms of migraine) to determine affection status in genomewide linkage analyses of migraine-affected families. We identified linkage between several traits and markers on chromosome 4q24 (highest LOD score under locus heterogeneity [HLOD] 4.52), a locus we previously reported to be linked to the end diagnosis migraine with aura. The pulsation trait identified a novel locus on 17p13 (HLOD 4.65). Additionally, a trait combination phenotype (IHS full criteria) revealed a locus on 18q12 (HLOD 3.29), and the age at onset trait revealed a locus on 4q28 (HLOD 2.99). Furthermore, suggestive or nearly suggestive evidence of linkage to four additional loci was observed with the traits phonophobia (10q22) and aggravation by physical exercise (12q21, 15q14, and Xp21), and, interestingly, these loci have been linked to migraine in previous studies. Our findings suggest that the use of symptom components of migraine instead of the end diagnosis provides a useful tool in stratifying the sample for genetic studies.

18.
Lin M, Lou XY, Chang M, Wu R. Genetics, 2003, 165(2): 901-913
Because of uncertainty about linkage phases of founders, linkage mapping in nonmodel, outcrossing systems using molecular markers presents one of the major statistical challenges in genetic research. In this article, we devise a statistical method for mapping QTL affecting a complex trait by incorporating all possible QTL-marker linkage phases within a mapping framework. The advantage of this model is the simultaneous estimation of linkage phases and QTL location and effect parameters. These estimates are obtained through maximum-likelihood methods implemented with the EM algorithm. Extensive simulation studies are performed to investigate the statistical properties of our model. In a case study from a forest tree, this model has successfully identified a significant QTL affecting wood density. Also, the probability of the linkage phase between this QTL and its flanking markers is estimated. The implications of our model and its extension to more general circumstances are discussed.

19.
Next-generation sequencing has led to many complex-trait rare-variant (RV) association studies. Although single-variant association analysis can be performed, it is grossly underpowered. Therefore, researchers have developed many RV association tests that aggregate multiple variant sites across a genetic region (e.g., gene), and test for the association between the trait and the aggregated genotype. After these aggregate tests detect an association, it is only possible to estimate the average genetic effect for a group of RVs. As a result of the "winner’s curse," such an estimate can be biased. Although for common variants one can obtain unbiased estimates of genetic parameters by analyzing a replication sample, for RVs it is desirable to obtain unbiased genetic estimates for the study where the association is identified. This is because there can be substantial heterogeneity of RV sites and frequencies even among closely related populations. In order to obtain an unbiased estimate for aggregated RV analysis, we developed bootstrap-sample-split algorithms to reduce the bias of the winner’s curse. The unbiased estimates are highly important for understanding the population-specific contribution of RVs to the heritability of complex traits. We also demonstrate both theoretically and via simulations that for aggregate RV analysis the genetic variance for a gene or region will always be underestimated, sometimes substantially, because of the presence of noncausal variants or because of the presence of causal variants with effects of different magnitudes or directions. Therefore, even if RVs play a major role in the complex-trait etiologies, a portion of the heritability will remain missing, and the contribution of RVs to the complex-trait etiologies will be underestimated.

20.
Zhang H, Wang X, Ye Y. Genetics, 2006, 172(1): 693-699
There is growing interest in genomewide association analysis using single-nucleotide polymorphisms (SNPs), because traditional linkage studies are not as powerful in identifying genes for common, complex diseases. Tests for linkage disequilibrium have been developed for binary and quantitative traits. However, since many human conditions and diseases are measured on an ordinal scale, methods need to be developed to investigate the association of genes and ordinal traits. Thus, in the current report we propose and derive a score test statistic that identifies genes that are associated with ordinal traits when gametic disequilibrium between a marker and trait loci exists. Through simulation, the performance of this new test is examined for both ordinal traits and quantitative traits. The proposed statistic not only accommodates and is more powerful for ordinal traits, but also has similar power to that of existing tests when the trait is quantitative. Therefore, our proposed statistic has the potential to serve as a unified approach to identifying genes that are associated with any trait, regardless of how the trait is measured. We further demonstrated the advantage of our test by revealing a significant association (P = 0.00067) between alcohol dependence and a SNP in the growth-associated protein 43.
