Similar literature
 20 similar articles found (search time: 78 ms)
1.
Although they have demonstrated success in searching for common variants for complex diseases, genome-wide association (GWA) studies are less successful in detecting rare genetic variants because of the poor statistical power of most current methods. We developed a two-stage method that can be applied to GWA studies for detecting rare variants. Here we report the results of applying this two-stage method to the Wellcome Trust Case Control Consortium (WTCCC) dataset, which includes seven complex diseases: bipolar disorder, cardiovascular disease, hypertension (HT), rheumatoid arthritis, Crohn’s disease, type 1 diabetes and type 2 diabetes (T2D). We identified 24 genes or regions that reach genome-wide significance. Eight of them are novel and were not reported in the WTCCC study. The cumulative risk (or protective) haplotype frequency for each of the 8 genes or regions is small, being at most 11%. For each of the novel genes, the risk (or protective) haplotype set cannot be tagged by the common SNPs available in chips (r² < 0.32). The gene identified in HT was further replicated in the Framingham Heart Study, and is also significantly associated with T2D. Our analysis suggests that searching for rare genetic variants is feasible in current GWA studies and candidate gene studies, and the results can serve as guides to future resequencing studies to identify the underlying rare functional variants.

2.
So HC  Sham PC 《PLoS genetics》2010,6(12):e1001230
An increasing number of genetic variants have been identified for many complex diseases. However, it is controversial whether risk prediction based on genomic profiles will be useful clinically. Appropriate statistical measures to evaluate the performance of genetic risk prediction models are required. Previous studies have mainly focused on the use of the area under the receiver operating characteristic (ROC) curve, or AUC, to judge the predictive value of genetic tests. However, AUC has its limitations and should be complemented by other measures. In this study, we develop a novel unifying statistical framework that connects a large variety of predictive indices. We show that, given the overall disease probability and the level of variance in total liability (or heritability) explained by the genetic variants, we can estimate analytically a large variety of prediction metrics, for example the AUC, the mean risk difference between cases and non-cases, the net reclassification improvement (ability to reclassify people into high- and low-risk categories), the proportion of cases explained by a specific percentile of population at the highest risk, the variance of predicted risks, and the risk at any percentile. We also demonstrate how to construct graphs to visualize the performance of risk models, such as the ROC curve, the density of risks, and the predictiveness curve (disease risk plotted against risk percentile). The results from simulations match very well with our theoretical estimates. Finally, we apply the methodology to nine complex diseases, evaluating the predictive power of genetic tests based on known susceptibility variants for each trait.
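The link between disease probability, explained liability variance and AUC described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' analytical framework: it assumes a standard liability-threshold model (liability = genetic score + independent normal noise, disease when liability exceeds a threshold set by prevalence) and estimates the AUC empirically. The function name `simulate_auc` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_auc(prevalence, h2_explained, n=20000, seed=0):
    """Empirical AUC of a genetic score under an assumed liability-threshold model.

    prevalence    -- overall disease probability K (0 < K < 1)
    h2_explained  -- share of total liability variance explained by the score (0 < h2 < 1)
    """
    rng = random.Random(seed)
    # Disease occurs when the standard-normal liability exceeds this threshold.
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    g_sd = h2_explained ** 0.5          # SD of the genetic score
    e_sd = (1.0 - h2_explained) ** 0.5  # SD of the residual liability

    scores, is_case = [], []
    for _ in range(n):
        g = rng.gauss(0.0, g_sd)
        liability = g + rng.gauss(0.0, e_sd)
        scores.append(g)
        is_case.append(liability > threshold)

    # AUC via the Mann-Whitney rank-sum statistic (scores are continuous, so ties are negligible).
    order = sorted(range(n), key=scores.__getitem__)
    rank_sum, n_cases = 0, 0
    for rank, idx in enumerate(order, start=1):
        if is_case[idx]:
            rank_sum += rank
            n_cases += 1
    n_controls = n - n_cases
    u_statistic = rank_sum - n_cases * (n_cases + 1) / 2
    return u_statistic / (n_cases * n_controls)
```

Calling `simulate_auc(0.1, 0.1)` and `simulate_auc(0.1, 0.5)` shows how, for a fixed prevalence, the AUC rises as the score explains more of the liability variance.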

3.
Non-replication and inconsistency had been common features in the search for common variants of candidate genes affecting the risk of complex diseases. They may continue to require attention in the current era, when massive hypothesis-free testing of genetic variants is feasible. An empirical evaluation of the early experience with genome-wide association (GWA) studies suggests several examples where proposed associations have failed to be replicated by subsequent investigations. Non-replication and inconsistency are defined here in the framework of cumulative meta-analysis. Ideally, associations exist, GWA finds them, and subsequent investigations should replicate them. However, a number of other possibilities need to be considered. No common genetic variants may associate with the phenotype of interest and GWA may find nothing; or associations may exist, but GWA may miss them. Associations that do not exist may be falsely selected by the GWA and subsequent studies may appropriately refute them or falsely replicate them. Finally, GWA may find true associations that are nevertheless falsely non-replicated in the subsequent studies; or associations may be genuinely inconsistent across study populations. A list of options is presented for consideration in each of these scenarios.

4.
Genetic information, such as single nucleotide polymorphism (SNP) data, has been widely recognized as useful in prediction of disease risk. However, modelling genetic data, which are often categorical, for disease class prediction is complex and challenging. In this paper, we propose a novel class of nonlinear threshold index logistic models to deal with the complex, nonlinear effects of categorical/discrete SNP covariates for schizophrenia class prediction. A maximum likelihood methodology is suggested to estimate the unknown parameters in the models. Simulation studies demonstrate that the proposed methodology works well for moderate-size samples. The suggested approach is therefore applied to the analysis of schizophrenia classification using a real set of SNP data from the Western Australian Family Study of Schizophrenia (WAFSS). Our empirical findings provide evidence that the proposed nonlinear models clearly outperform the widely used linear and tree-based logistic regression models in class prediction of schizophrenia risk with SNP data, in terms of both Type I/II error rates and ROC curves.

5.
MOTIVATION: The identification of risk-associated genetic variants in common diseases remains a challenge to the biomedical research community. It has been suggested that common statistical approaches that exclusively measure main effects are often unable to detect interactions between some of these variants. Detecting and interpreting interactions is a challenging open problem from the statistical and computational perspectives. Methods in computing science may improve our understanding of the mechanisms of genetic disease by detecting interactions even in the presence of very low heritabilities. RESULTS: We have implemented a method using Genetic Programming that is able to induce a Decision Tree to detect interactions in genetic variants. This method has a cross-validation strategy for estimating classification and prediction errors and tests for consistencies in the results. To have better estimates, a new consistency measure that takes into account interactions and can be used in a genetic programming environment is proposed. This method detected five different interaction models with heritabilities as low as 0.008 and with prediction errors similar to the generated errors. AVAILABILITY: Information on the generated data sets and executable code is available upon request.

6.
Current extensive genetic research into common complex diseases, especially with the completion of genome-wide association studies, is bringing to light many novel genetic risk loci. These new discoveries, along with previously known genetic risk variants, offer an important opportunity for researchers to improve health care. We describe a method of quick evaluation of these new findings for potential clinical practice by designing a new predictive genetic test, estimating its classification accuracy, and determining the sample size required for the verification of this accuracy. The proposed predictive test is asymptotically more powerful than tests built on any other existing method and can be extended to scenarios where loci are linked or interact. We illustrate the approach for the case of type 2 diabetes. We incorporate recently discovered risk factors into the proposed test and find a potentially better predictive genetic test. The area under the receiver operating characteristic (ROC) curve (AUC) of the proposed test is estimated to be higher (AUC = 0.671) than for the existing test (AUC = 0.580).

7.
Dong C  Qian Z  Jia P  Wang Y  Huang W  Li Y 《PloS one》2007,2(12):e1262

Background

High-throughput genotyping chips have contributed greatly to genome-wide association (GWA) studies that aim to identify novel disease susceptibility single nucleotide polymorphisms (SNPs). The high-density chips are designed using two different SNP selection approaches: the direct gene-centric approach, and the indirect quasi-random SNP or linkage disequilibrium (LD)-based tagSNP approaches. Although all these approaches can provide high genome coverage and ascertain variants in genes, it is not clear to what extent these approaches can capture the common genic variants. It is also important to characterize and compare the differences between these approaches.

Methodology/Principal Findings

In our study, by using both the Phase II HapMap data and the disease variants extracted from OMIM, a gene-centric evaluation was first performed to evaluate the ability of the approaches to capture the disease variants in the Caucasian population. Then the distribution patterns of SNPs were also characterized in genic regions, evolutionarily conserved introns and nongenic regions, ontologies and pathways. The results show that, no matter which SNP selection approach is used, the current high-density SNP chips provide very high coverage in genic regions and can capture most of the known common disease variants under the HapMap framework. The results also show that the differences between the direct and the indirect approaches are relatively small. Both have similar SNP distribution patterns in these gene-centric characteristics.

Conclusions/Significance

This study suggests that the indirect approaches not only have the advantage of high coverage but also are useful for studies focusing on various functional SNPs either in genes or in the conserved regions that the direct approach supports. The study and the annotation of characteristics will be helpful for designing and analyzing GWA studies that aim to identify genetic risk factors involved in common diseases, especially variants in genes and conserved regions.

8.

Objectives

Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.

Methods

In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.

Results

Our proposed data mining procedures outperformed the classic statistical method. The correct classification rate, sensitivity, specificity and area under the ROC curve for predicting a rotator cuff tear were statistically better in the ANN and decision tree models than in logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear).

Conclusions

Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears.
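The arithmetic behind Fagan's nomogram, which this abstract relies on, can be sketched as follows. This is only an illustration of the standard definitions (positive and negative likelihood ratios from a 2x2 table, then Bayes' rule on the odds scale); the function names and the example counts are hypothetical and are not taken from the study.

```python
def likelihood_ratios(tp, fn, fp, tn):
    """Return (LR+, LR-) from the counts of a 2x2 table against the reference standard."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' rule on the odds scale: post-test odds = pretest odds * likelihood ratio."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical counts: 90 true positives, 10 false negatives, 20 false positives, 80 true negatives.
lr_pos, lr_neg = likelihood_ratios(tp=90, fn=10, fp=20, tn=80)
p_if_tear_predicted = post_test_probability(0.5, lr_pos)
p_if_no_tear_predicted = post_test_probability(0.5, lr_neg)
```

A "tear" prediction raises the pretest probability and a "no tear" prediction lowers it, which is what reading the nomogram by hand accomplishes.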

9.
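Niu Dayan  Yan Weili 《Hereditas (Beijing)》2015,37(12):1204-1210
Complex diseases such as cardiovascular disease, type 2 diabetes, essential hypertension, asthma, obesity and cancer are prevalent worldwide and have become the leading causes of human death. Growing attention is being paid to the role of genetic susceptibility in the pathogenesis of complex diseases. To date, the susceptibility genes and gene sequence variants associated with complex diseases remain incompletely understood. Genetic association studies are expected to clarify the genetic basis of complex diseases. In recent years, genome-wide association studies and candidate gene studies have identified a large number of gene sequence variants related to complex diseases. The discovery of these variants, which have causal and/or associative relationships with complex diseases, has promoted the emergence and development of methods for predicting, preventing and treating complex diseases. The genetic risk score (GRS), an emerging method for exploring the relationship between single nucleotide polymorphisms (SNPs) and the clinical phenotypes of complex diseases, aggregates the weak effects of multiple SNPs and thereby substantially improves the ability of genetic polymorphisms to predict disease. The method has been applied successfully in genetic studies of many complex diseases. This article focuses on the calculation methods and evaluation criteria for the GRS, briefly lists a series of results obtained with the GRS, discusses the limitations encountered in its application, and finally looks ahead to future directions for the genetic risk score.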
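As a minimal sketch of how a genetic risk score is commonly calculated, the snippet below shows the two usual variants: a simple count of risk alleles and a weighted sum in which each SNP is weighted by an effect size such as the log odds ratio. It is an assumption that these are the variants the review has in mind; the function name, SNP order and odds ratios are hypothetical.

```python
import math


def genetic_risk_score(risk_allele_counts, weights=None):
    """Genetic risk score for one individual.

    risk_allele_counts -- number of risk alleles (0, 1 or 2) carried at each SNP
    weights            -- optional per-SNP effect sizes (e.g. log odds ratios);
                          if omitted, the simple count score is returned
    """
    if weights is None:
        return float(sum(risk_allele_counts))
    if len(weights) != len(risk_allele_counts):
        raise ValueError("weights and risk_allele_counts must have the same length")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


# Hypothetical individual genotyped at three SNPs with illustrative odds ratios.
counts = [0, 1, 2]
log_odds_ratios = [math.log(1.10), math.log(1.25), math.log(1.40)]
simple_grs = genetic_risk_score(counts)
weighted_grs = genetic_risk_score(counts, log_odds_ratios)
```

The weighted form lets SNPs with larger effects contribute more, at the cost of depending on effect estimates from an external study.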

10.

Background

The genome-wide association (GWA) study has recently become a powerful approach for detecting genetic variants for common diseases without prior knowledge of the variant's location or function. Generally, in GWA studies, the most significant single-nucleotide polymorphisms (SNPs) associated with top-ranked p values are selected in stage one, with follow-up in stage two. The value of selecting SNPs based on statistically significant p values is obvious. However, when minor allele frequencies (MAFs) are relatively low, less-significant p values can still correspond to higher odds ratios (ORs), which might be more useful for prediction of disease status. Therefore, if SNPs are selected using an approach based only on significant p values, some important genetic variants might be missed. We proposed a hybrid approach for selecting candidate SNPs from the discovery stage of a GWA study, based on both p values and ORs, and conducted a simulation study to demonstrate the performance of our approach.

Results

The simulation results showed that our hybrid ranking approach was more powerful than the existing ranked p value approach for identifying relatively less-common SNPs. Meanwhile, the type I error probability of the hybrid approach is well controlled at the end of the second stage of the two-stage GWA study.

Conclusions

In GWA studies, SNPs should be considered for inclusion based not only on ranked p values but also on ranked ORs.
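The abstract does not say how the p-value and OR rankings are combined, so the following is only one plausible sketch, not the authors' procedure: each SNP is ranked once by p value and once by effect size (|log OR|), and the two ranks are summed. The function name `hybrid_rank`, the SNP identifiers and the numbers are all hypothetical.

```python
import math


def hybrid_rank(snps, top_k):
    """Select top_k SNPs by the sum of their p-value rank and their |log OR| rank.

    snps -- list of (name, p_value, odds_ratio) tuples from the discovery stage
    """
    by_p = sorted(snps, key=lambda s: s[1])                       # smallest p first
    by_or = sorted(snps, key=lambda s: -abs(math.log(s[2])))      # largest effect first
    p_rank = {s[0]: rank for rank, s in enumerate(by_p, start=1)}
    or_rank = {s[0]: rank for rank, s in enumerate(by_or, start=1)}
    # Ties in the combined rank are broken by the smaller p value.
    combined = sorted(snps, key=lambda s: (p_rank[s[0]] + or_rank[s[0]], s[1]))
    return [s[0] for s in combined[:top_k]]


# Hypothetical discovery-stage results: rs_b has a modest p value but a large OR.
discovery = [
    ("rs_a", 1e-8, 1.2),
    ("rs_b", 1e-4, 3.0),
    ("rs_c", 1e-5, 1.1),
    ("rs_d", 0.5, 1.0),
]
selected = hybrid_rank(discovery, top_k=2)
```

With these illustrative values, ranking by p value alone would carry `rs_a` and `rs_c` forward, whereas the combined ranking carries `rs_a` and `rs_b`, the kind of low-frequency, high-OR variant the abstract is concerned about missing.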

11.
Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.

12.
Genome-wide association studies (GWAS) have been fruitful in identifying disease susceptibility loci for common and complex diseases. A remaining question is whether we can quantify individual disease risk based on genotype data, in order to facilitate personalized prevention and treatment for complex diseases. Previous studies have typically failed to achieve satisfactory performance, primarily due to the use of only a limited number of confirmed susceptibility loci. Here we propose that sophisticated machine-learning approaches with a large ensemble of markers may improve the performance of disease risk assessment. We applied a Support Vector Machine (SVM) algorithm on a GWAS dataset generated on the Affymetrix genotyping platform for type 1 diabetes (T1D) and optimized a risk assessment model with hundreds of markers. We subsequently tested this model on an independent Illumina-genotyped dataset with imputed genotypes (1,008 cases and 1,000 controls), as well as a separate Affymetrix-genotyped dataset (1,529 cases and 1,458 controls), resulting in area under ROC curve (AUC) of ∼0.84 in both datasets. In contrast, poor performance was achieved when limited to dozens of known susceptibility loci in the SVM model or logistic regression model. Our study suggests that improved disease risk assessment can be achieved by using algorithms that take into account interactions between a large ensemble of markers. We are optimistic that genotype-based disease risk assessment may be feasible for diseases where a notable proportion of the risk has already been captured by SNP arrays.

13.
Studies have argued that genetic testing will provide limited information for predicting the probability of common diseases, because of the incomplete penetrance of genotypes and the low magnitude of associated risks for the general population. Such studies, however, have usually examined the effect of one gene at a time. We argue that disease prediction for common multifactorial diseases is greatly improved by considering multiple predisposing genetic and environmental factors concurrently, provided that the model correctly reflects the underlying disease etiology. We show how likelihood ratios can be used to combine information from several genetic tests to compute the probability of developing a multifactorial disease. To show how concurrent use of multiple genetic tests improves the prediction of a multifactorial disease, we compute likelihood ratios by logistic regression with simulated case-control data for a hypothetical disease influenced by multiple genetic and environmental risk factors. As a practical example, we also apply this approach to venous thrombosis, a multifactorial disease influenced by multiple genetic and nongenetic risk factors. Under reasonable conditions, the concurrent use of multiple genetic tests markedly improves prediction of disease. For example, the concurrent use of a panel of three genetic tests (factor V Leiden, prothrombin variant G20210A, and protein C deficiency) increases the positive predictive value of testing for venous thrombosis at least eightfold. Multiplex genetic testing has the potential to improve the clinical validity of predictive testing for common multifactorial diseases.
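How several test results might be combined can be sketched on the log-odds scale, which mirrors the logistic regression used in this abstract. The sketch assumes the tests are conditionally independent given disease status, so their likelihood ratios multiply; that assumption, the function name `combine_tests` and the example values are illustrative and are not the paper's estimates for venous thrombosis.

```python
import math


def combine_tests(baseline_risk, likelihood_ratios):
    """Disease probability after several test results.

    Assumes conditional independence of the tests, so that
    logit(posterior) = logit(baseline) + sum(log LR_i).
    """
    log_odds = math.log(baseline_risk / (1.0 - baseline_risk))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical panel: baseline risk of 1% and two positive results with LRs of 2 and 3.
risk_single = combine_tests(0.01, [2.0])
risk_panel = combine_tests(0.01, [2.0, 3.0])
```

When tests are correlated, or when risk factors interact, the product of likelihood ratios no longer holds and the joint model has to be fitted directly, which is the caveat the abstract raises about reflecting the underlying etiology.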

14.
Analysing biological pathways in genome-wide association studies   Cited by: 1 (self-citations: 0, citations by others: 1)
Genome-wide association (GWA) studies have typically focused on the analysis of single markers, which often lacks the power to uncover the relatively small effect sizes conferred by most genetic variants. Recently, pathway-based approaches have been developed, which use prior biological knowledge on gene function to facilitate more powerful analysis of GWA study data sets. These approaches typically examine whether a group of related genes in the same functional pathway are jointly associated with a trait of interest. Here we review the development of pathway-based approaches for GWA studies, discuss their practical use and caveats, and suggest that pathway-based approaches may also be useful for future GWA studies with sequencing data.

15.
Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, encourage positive lifestyle changes and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine is here yet. We focus on the clinical validity of genetic testing with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration dataset. By using an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10⁻¹³, 10⁻¹³, and 10⁻³, respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for age groups 80 y, 65 y, and 40 y and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and two SNPs achieve an AUC of only 0.56 for prostate cancer. Nine SNPs were not sufficient to improve the discrimination power over that of nongenetic predictors for risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls. The scientific community should be cautious to avoid overstating the value of association findings in terms of personalized medicine before their time.
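The dependence of the "fraction of the high-risk group who are cases" on prevalence follows from the usual definition of positive predictive value. The sketch below uses hypothetical sensitivity and specificity values (the abstract does not report the operating point behind its 30%, 12% and 3% figures), so it shows the trend rather than reproducing those numbers.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of individuals classified as high risk who are true cases."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical operating point on the ROC curve, evaluated at the three prevalences in the abstract.
ppv_by_prevalence = {
    prevalence: positive_predictive_value(0.7, 0.7, prevalence)
    for prevalence in (0.15, 0.055, 0.015)
}
```

Even with a fixed sensitivity and specificity, the predictive value falls sharply as prevalence drops, which is why a respectable AUC can coexist with a high-risk group that consists mostly of non-cases.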

16.
Since the seminal work of Prentice and Pyke, the prospective logistic likelihood has become the standard method of analysis for retrospectively collected case‐control data, in particular for testing the association between a single genetic marker and a disease outcome in genetic case‐control studies. In the study of multiple genetic markers with relatively small effects, especially those with rare variants, various aggregated approaches based on the same prospective likelihood have been developed to integrate subtle association evidence among all the markers considered. Many of the commonly used tests are derived from the prospective likelihood under a common‐random‐effect assumption, which assumes a common random effect for all subjects. We develop the locally most powerful aggregation test based on the retrospective likelihood under an independent‐random‐effect assumption, which allows the genetic effect to vary among subjects. In contrast to the fact that disease prevalence information cannot be used to improve efficiency for the estimation of odds ratio parameters in logistic regression models, we show that it can be utilized to enhance the testing power in genetic association studies. Extensive simulations demonstrate the advantages of the proposed method over the existing ones. A real genome‐wide association study is analyzed for illustration.

17.
Combining diagnostic test results to increase accuracy   Cited by: 4 (self-citations: 0, citations by others: 4)
When multiple diagnostic tests are performed on an individual or multiple disease markers are available, it may be possible to combine the information to diagnose disease. We consider how to choose linear combinations of markers in order to optimize diagnostic accuracy. The accuracy index to be maximized is the area or partial area under the receiver operating characteristic (ROC) curve. We propose a distribution-free rank-based approach for optimizing the area under the ROC curve and compare it with logistic regression and with classic linear discriminant analysis (LDA). It has been shown that the latter method optimizes the area under the ROC curve when test results have a multivariate normal distribution for diseased and non-diseased populations. Simulation studies suggest that the proposed non-parametric method is efficient when data are multivariate normal. The distribution-free method is generalized to a smooth distribution-free approach to: (i) accommodate some reasonable smoothness assumptions; (ii) incorporate covariate effects; and (iii) yield optimized partial areas under the ROC curve. This latter feature is particularly important since it allows one to focus on a region of the ROC curve which is of most relevance to clinical practice. Neither logistic regression nor LDA necessarily maximizes partial areas. The approaches are illustrated on two cancer datasets, one involving serum antigen markers for pancreatic cancer and the other involving longitudinal prostate specific antigen data.
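The idea of choosing a linear combination of markers to maximize a rank-based AUC can be sketched for the two-marker case. The abstract does not describe the optimization procedure, so the grid search over directions used here is a simple stand-in and not the authors' algorithm; the function names and the toy data are hypothetical.

```python
import math


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate of the AUC: P(case score > control score), ties counted as 1/2."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def best_linear_combination(cases, controls, n_angles=180):
    """Grid search over unit vectors (a, b) to maximize the AUC of a*marker1 + b*marker2.

    cases, controls -- lists of (marker1, marker2) pairs
    Returns (best_auc, (a, b)); the first direction reaching the best AUC is kept.
    """
    best_auc, best_weights = -1.0, (1.0, 0.0)
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles   # full circle, so the sign of each weight can vary
        a, b = math.cos(theta), math.sin(theta)
        auc = empirical_auc(
            [a * x + b * y for x, y in cases],
            [a * x + b * y for x, y in controls],
        )
        if auc > best_auc:
            best_auc, best_weights = auc, (a, b)
    return best_auc, best_weights


# Toy data in which neither marker separates the groups alone, but their sum does.
toy_cases = [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0)]
toy_controls = [(2.5, 0.0), (0.0, 2.5), (1.0, 1.0)]
auc_marker1 = empirical_auc([x for x, _ in toy_cases], [x for x, _ in toy_controls])
auc_combined, weights = best_linear_combination(toy_cases, toy_controls)
```

Because the AUC depends only on ranks, it is unchanged by rescaling the weights, which is why searching over directions on the unit circle is enough for two markers.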

18.
With the advent of genome-wide association (GWA) studies, researchers are hoping that reliable genetic association of common human complex diseases/traits can be detected. Currently, there is an increasing enthusiasm about GWA and a number of GWA studies have been published. In the field a common practice is that replication should be used as the gold standard to validate an association finding. In this article, based on empirical and theoretical data, we emphasize that replication of GWA findings can be quite difficult, and should not always be expected, even when true variants are identified. The probability of replication becomes smaller with the increasing number of independent GWA studies if the power of individual replication studies is less than 100% (which is usually the case), and even a finding that is replicated may not necessarily be true. We argue that the field may have unreasonably high expectations for the success of replication. We also wish to raise the question whether it is sufficient or necessary to treat replication as the ultimate and gold standard for defining true variants. We finally discuss the usefulness of integrating evidence from multiple levels/sources such as genetic epidemiological studies (at the DNA level), gene expression studies (at the RNA level), proteomics (at the protein level), and follow-up molecular and cellular studies for eventual validation and illumination of the functional relevance of the genes uncovered.
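The claim that consistent replication becomes less likely as the number of studies grows can be made concrete with a short calculation. The sketch assumes a true association, independent studies and the same power in each study; those simplifications and the function names are assumptions for illustration, not the article's own model.

```python
from math import comb


def prob_all_replicate(power, n_studies):
    """Probability that every one of n independent studies detects a true association."""
    return power ** n_studies


def prob_at_least(power, n_studies, m):
    """Probability that at least m of n independent studies detect a true association."""
    return sum(
        comb(n_studies, k) * power ** k * (1.0 - power) ** (n_studies - k)
        for k in range(m, n_studies + 1)
    )


# With 80% power per study, the chance that all of them replicate shrinks as studies accumulate.
all_of_three = prob_all_replicate(0.8, 3)
all_of_five = prob_all_replicate(0.8, 5)
```

Under these assumptions a true variant studied five times at 80% power is replicated in every study only about a third of the time, so occasional failures are expected rather than disqualifying.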

19.
Mitochondrial dysfunction has been observed in skeletal muscle of people with diabetes and insulin-resistant individuals. Furthermore, inherited mutations in mitochondrial DNA can cause a rare form of diabetes. However, it is unclear whether mitochondrial dysfunction is a primary cause of the common form of diabetes. To date, common genetic variants robustly associated with type 2 diabetes (T2D) are not known to affect mitochondrial function. One possibility is that multiple mitochondrial genes contain modest genetic effects that collectively influence T2D risk. To test this hypothesis we developed a method named Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA; http://www.broadinstitute.org/mpg/magenta). MAGENTA, in analogy to Gene Set Enrichment Analysis, tests whether sets of functionally related genes are enriched for associations with a polygenic disease or trait. MAGENTA was specifically designed to exploit the statistical power of large genome-wide association (GWA) study meta-analyses whose individual genotypes are not available. This is achieved by combining variant association p-values into gene scores and then correcting for confounders, such as gene size, variant number, and linkage disequilibrium properties. Using simulations, we determined the range of parameters for which MAGENTA can detect associations likely missed by single-marker analysis. We verified MAGENTA's performance on empirical data by identifying known relevant pathways in lipid and lipoprotein GWA meta-analyses. We then tested our mitochondrial hypothesis by applying MAGENTA to three gene sets: nuclear regulators of mitochondrial genes, oxidative phosphorylation genes, and ∼1,000 nuclear-encoded mitochondrial genes. The analysis was performed using the most recent T2D GWA meta-analysis of 47,117 people and meta-analyses of seven diabetes-related glycemic traits (up to 46,186 non-diabetic individuals). This well-powered analysis found no significant enrichment of associations to T2D or any of the glycemic traits in any of the gene sets tested. These results suggest that common variants affecting nuclear-encoded mitochondrial genes have at most a small genetic contribution to T2D susceptibility.

20.
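Objective: Male pattern baldness (MPB), also known as androgenetic alopecia (AGA), is a common type of hair loss in men, and about 80% of its phenotypic variation can be explained by genetic factors. Current research on the genetic inference of MPB is based mainly on European populations, with few studies in East Asian populations. This study validated MPB-associated loci identified in European populations in a Chinese population and built a genetic inference model. Methods: We examined the association of 486 single nucleotide polymorphism (SNP) loci related to MPB in European populations in 312 Han Chinese men, and screened the associated loci using stepwise regression and Lasso regression, respectively. Prediction models were built with logistic regression and evaluated by 10-fold cross-validation. We then compared the accuracy of four commonly used classifiers (logistic regression, k-nearest neighbors, random forest and support vector machine) in predicting MPB. Results: A total of 174 SNP loci were significantly associated with MPB in Han Chinese men (P<0.05). The two screening methods yielded sets of 22 SNPs and 25 SNPs, respectively, on which 22-SNP and 25-SNP logistic regression prediction models were built. Measured by AUC (area under the ROC curve), the two models predicted MPB with accuracies of 0.85 and 0.84, respectively; after 10-fold cross-validation these fell to 0.81 and 0.77. When age was added as a predictor, the AUC of both models reached a maximum of 0.89. In our results, the logistic regression prediction model showed a clear advantage over the other classifier models examined in this study. Conclusion: Overall, although the accuracy of the prediction models has not yet reached the level expected for clinical use, SNPs still have great potential for the genetic prediction of MPB and can provide a reference for the early diagnosis, clinical intervention and forensic application of MPB.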
