Similar literature
20 similar documents were retrieved.
1.
We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project.
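For context, the SKAT-O family of statistics is usually written as a weighted combination of the burden and SKAT statistics, with the weight chosen adaptively from the data; in rough notation (mine, following the general form reported for SKAT-O, not copied from this abstract):

$$Q_\rho = \rho\,Q_{\text{burden}} + (1-\rho)\,Q_{\text{SKAT}}, \qquad 0 \le \rho \le 1,$$

$$p_{\text{SKAT-O}} \;\text{based on}\; \min_{\rho \in \{\rho_1,\dots,\rho_b\}} p_\rho,$$

where $p_\rho$ is the p-value of $Q_\rho$. Setting $\rho = 1$ recovers the burden test and $\rho = 0$ recovers SKAT, so minimizing over a grid of $\rho$ values adapts to either scenario; the minimum p-value is then itself recalibrated to give a valid overall test.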

2.
Pinpointing the small number of causal variants among the abundant naturally occurring genetic variation is a difficult challenge, but a crucial one for understanding precise molecular mechanisms of disease and follow-up functional studies. We propose and investigate two complementary statistical approaches for identification of rare causal variants in sequencing studies: a backward elimination procedure based on groupwise association tests, and a hierarchical approach that can integrate sequencing data with diverse functional and evolutionary conservation annotations for individual variants. Using simulations, we show that incorporation of multiple bioinformatic predictors of deleteriousness, such as PolyPhen-2, SIFT and GERP++ scores, can improve the power to discover truly causal variants. As proof of principle, we apply the proposed methods to VPS13B, a gene mutated in the rare neurodevelopmental disorder called Cohen syndrome, and recently reported with recessive variants in autism. We identify a small set of promising candidates for causal variants, including two loss-of-function variants and a rare, homozygous probably-damaging variant that could contribute to autism risk.

3.
Although whole-genome association studies using tagSNPs are a powerful approach for detecting common variants, they are underpowered for detecting associations with rare variants. Recent studies have demonstrated that common diseases can be due to functional variants with a wide spectrum of allele frequencies, ranging from rare to common. An effective way to identify rare variants is through direct sequencing. The development of cost-effective sequencing technologies enables association studies to use sequence data from candidate genes and, in the future, from the entire genome. Although methods used for analysis of common variants are applicable to sequence data, their performance might not be optimal. In this study, it is shown that the collapsing method, which involves collapsing genotypes across variants and applying a univariate test, is powerful for analyzing rare variants, whereas multivariate analysis is robust against inclusion of noncausal variants. Both methods are superior to analyzing each variant individually with univariate tests. In order to unify the advantages of both collapsing and multiple-marker tests, we developed the Combined Multivariate and Collapsing (CMC) method and demonstrated that the CMC method is both powerful and robust. The CMC method can be applied to either candidate-gene or whole-genome sequence data.
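The collapsing step of the CMC approach can be illustrated with a short sketch. The Python function below is a minimal illustration (function name and the 1% minor-allele-frequency cutoff are my assumptions, not from the paper); it builds the rare-variant indicator and keeps common variants as separate columns, after which the CMC test applies a multivariate test such as Hotelling's T2 to the resulting matrix.

```python
import numpy as np

def cmc_collapse(G, maf_threshold=0.01):
    """Sketch of the CMC collapsing step: collapse rare variants
    (sample MAF below the threshold) in a genotype matrix G
    (n individuals x m variants, coded 0/1/2) into a single 0/1
    carrier indicator, keeping common variants as separate columns."""
    maf = G.mean(axis=0) / 2.0              # sample allele frequency
    maf = np.minimum(maf, 1.0 - maf)        # fold to minor-allele frequency
    rare = maf < maf_threshold
    carries_rare = (G[:, rare].sum(axis=1) > 0).astype(float)
    return np.column_stack([carries_rare, G[:, ~rare]])
```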

4.
Yang F, Thomas DC. Human Heredity 2011, 71(4):209-220.
Multiple rare variants have been suggested as accounting for some of the associations with common single nucleotide polymorphisms identified in genome-wide association studies or possibly some of the as yet undiscovered heritability. We consider the power of various approaches to designing substudies aimed at using next-generation sequencing technologies to discover novel variants and to select some subsets that are possibly causal for genotyping in the original case-control study and testing for association using various weighted sum indices. We find that the selection of variants based on the statistical significance of the case-control difference in the subsample yields good power for testing rare variant indices in the main study, and that multivariate models including both the summary index of rare variants and the associated common single nucleotide polymorphisms can distinguish which is the causal factor. By simulation, we explore the effects of varying the size of the discovery subsample, choice of index, and true causal model.
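The "weighted sum indices" referred to here typically follow the Madsen-Browning idea of up-weighting the rarest variants by the inverse standard deviation of their allele frequency estimated in controls. A minimal sketch, assuming additively coded genotypes; the function name and smoothing constant are illustrative and the paper evaluates several index variants:

```python
import numpy as np

def weighted_sum_index(G, is_control, eps=1.0):
    """Madsen-Browning-style weighted sum index: estimate each variant's
    minor-allele frequency in controls (with add-one smoothing), weight
    each variant by 1/sqrt(q*(1-q)) (up to a constant factor), and sum
    the weighted minor-allele counts per individual."""
    n_ctrl = int(is_control.sum())
    q = (G[is_control].sum(axis=0) + eps) / (2 * n_ctrl + 2 * eps)
    w = 1.0 / np.sqrt(q * (1.0 - q))        # rarer variants get larger weights
    return G @ w                            # one summary score per individual
```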

5.
Next-generation sequencing data will soon become routinely available for association studies between complex traits and rare variants. Sequencing data, however, are characterized by the presence of sequencing errors at each individual genotype. This makes it especially challenging to perform association studies of rare variants, which, due to their low minor allele frequencies, can be easily perturbed by genotype errors. In this article, we develop the quality-weighted multivariate score association test (qMSAT), a new procedure that allows powerful association tests between complex traits and multiple rare variants under the presence of sequencing errors. Simulation results based on quality scores from real data show that the qMSAT often dominates over current methods that do not utilize quality information. In particular, the qMSAT can dramatically increase power over existing methods under moderate sample sizes and relatively low coverage. Moreover, in an obesity data study, we identified, using the qMSAT, two functional regions (MGLL promoter and MGLL 3'-untranslated region) where rare variants are associated with extreme obesity. Due to the high cost of sequencing data, the qMSAT is especially valuable for large-scale studies involving rare variants, as it can potentially increase power without additional experimental cost. qMSAT is freely available at http://qmsat.sourceforge.net/.

6.
Current family-based association tests for sequencing data were mainly developed for identifying rare variants associated with a complex disease. As the disease can be influenced by the joint effects of common and rare variants, common variants with modest effects may not be identified by the methods focusing on rare variants. Moreover, variants can have risk, neutral, or protective effects. Association tests that can effectively select groups of common and rare variants that are likely to be causal and consider the directions of effects have become important. We developed the Ordered Subset - Variable Threshold - Pedigree Disequilibrium Test (OVPDT), a combination of three algorithms, for association analysis in family sequencing data. The ordered subset algorithm is used to select a subset of common variants based on their relative risks, calculated using only parental mating types. The variable threshold algorithm is used to search for an optimal allele frequency threshold such that rare variants below the threshold are more likely to be causal. The PDT statistics from both rare and common variants selected by the two algorithms are combined as the OVPDT statistic. A permutation procedure is used in OVPDT to calculate the p-value. We used simulations to demonstrate that OVPDT has the correct type I error rates under different scenarios and compared the power of OVPDT with two other family-based association tests. The results suggested that OVPDT can have more power than the other tests if both common and rare variants have effects on the disease in a region.

7.
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels.
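To make the summary-statistic idea concrete: under homogeneous genetic effects across studies, the variant-level score statistics and their covariance (LD) matrices simply add, and the gene-level tests are rebuilt from the sums. In rough notation (mine, not the article's):

$$U_j = \sum_{k=1}^{K} U_{kj}, \qquad \mathbf{V} = \sum_{k=1}^{K} \mathbf{V}_k,$$

$$Q_{\text{burden}} = \frac{\big(\sum_j w_j U_j\big)^2}{\mathbf{w}^{\top}\mathbf{V}\,\mathbf{w}}, \qquad Q_{\text{SKAT}} = \sum_j w_j^2 U_j^2,$$

where $U_{kj}$ is the score statistic for variant $j$ in study $k$, $\mathbf{V}_k$ is the corresponding between-variant covariance (LD) matrix, and $w_j$ are variant weights; p-values follow from the usual chi-square or mixture-of-chi-squares reference distributions, so only summary statistics, not individual-level genotypes, need to be shared.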

8.
Linkage analysis was developed to detect excess co-segregation of the putative alleles underlying a phenotype with the alleles at a marker locus in family data. Many different variations of this analysis and corresponding study design have been developed to detect this co-segregation. Linkage studies have been shown to have high power to detect loci that have alleles (or variants) with a large effect size, i.e. alleles that make large contributions to the risk of a disease or to the variation of a quantitative trait. However, alleles with a large effect size tend to be rare in the population. In contrast, association studies are designed to have high power to detect common alleles which tend to have a small effect size for most diseases or traits. Although genome-wide association studies have been successful in detecting many new loci with common alleles of small effect for many complex traits, these common variants often do not explain a large proportion of disease risk or variation of the trait. In the past, linkage studies were successful in detecting regions of the genome that were likely to harbor rare variants with large effect for many simple Mendelian diseases and for many complex traits. However, identifying the actual sequence variant(s) responsible for these linkage signals was challenging because of difficulties in sequencing the large regions implicated by each linkage peak. Current 'next-generation' DNA sequencing techniques have made it economically feasible to sequence all exons or the whole genomes of a reasonably large number of individuals. Studies have shown that rare variants are quite common in the general population, and it is now possible to combine these new DNA sequencing methods with linkage studies to identify rare causal variants with a large effect size. A brief review of linkage methods is presented here with examples of their relevance and usefulness for the interpretation of whole-exome and whole-genome sequence data.

9.
Zou G, Zuo Y. Genetics 2006, 172(1):687-691.
With respect to the multiple-tests problem, recently an increasing amount of attention has been paid to control the false discovery rate (FDR), the positive false discovery rate (pFDR), and the proportion of false positives (PFP). The new approaches are generally believed to be more powerful than the classical Bonferroni one. This article focuses on the PFP approach. It demonstrates via examples in genetic association studies that the Bonferroni procedure can be more powerful than the PFP-control one and also shows the intrinsic connection between controlling the PFP and controlling the overall type I error rate. Since controlling the PFP does not necessarily lead to a desired power level, this article addresses the design issue and recommends the sample sizes that can attain the desired power levels when the PFP is controlled. The results in this article also provide rough guidance for the sample sizes to achieve the desired power levels when the FDR and especially the pFDR are controlled.
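For reference, the error rates contrasted here are conventionally defined as follows (standard definitions, not taken from this article), with $V$ false positives among $R$ rejections out of $m$ tests:

$$\mathrm{FWER} = \Pr(V \ge 1), \qquad \mathrm{FDR} = E\!\left[\frac{V}{R}\,\middle|\,R>0\right]\Pr(R>0),$$

$$\mathrm{pFDR} = E\!\left[\frac{V}{R}\,\middle|\,R>0\right], \qquad \mathrm{PFP} = \frac{E[V]}{E[R]}.$$

Bonferroni controls the FWER, while the PFP, unlike the FDR, does not condition on making at least one rejection, which is what drives the connection to the overall type I error rate discussed in the abstract.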

10.
A dozen genes/regions have been confirmed as genetic risk factors for oral clefts in human association and linkage studies, and animal models argue even more genes may be involved. Genomic sequencing studies should identify specific causal variants and may reveal additional genes as influencing risk to oral clefts, which have a complex and heterogeneous etiology. We conducted a whole exome sequencing (WES) study to search for potentially causal variants using affected relatives drawn from multiplex cleft families. Two or three affected second, third, and higher degree relatives from 55 multiplex families were sequenced. We examined rare single nucleotide variants (SNVs) shared by affected relatives in 348 recognized candidate genes. Exact probabilities that affected relatives would share these rare variants were calculated, given pedigree structures, and corrected for the number of variants tested. Five novel and potentially damaging SNVs shared by affected distant relatives were found and confirmed by Sanger sequencing. One damaging SNV in CDH1, shared by three affected second cousins from a single family, attained statistical significance (P = 0.02 after correcting for multiple tests). Family-based designs such as the one used in this WES study offer important advantages for identifying genes likely to be causing complex and heterogeneous disorders.

11.
12.
Implementing false discovery rate control: increasing your power
Popular procedures to control the chance of making type I errors when multiple statistical tests are performed come at a high cost: a reduction in power. As the number of tests increases, power for an individual test may become unacceptably low. This is a consequence of minimizing the chance of making even a single type I error, which is the aim of, for instance, the Bonferroni and sequential Bonferroni procedures. An alternative approach, control of the false discovery rate (FDR), has recently been advocated for ecological studies. This approach aims at controlling the proportion of significant results that are in fact type I errors. Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue. To encourage practical use of the approach, in this note we illustrate how the proposed procedure works, we compare it to more traditional methods that control the familywise error rate, and we discuss some recent useful developments in FDR control.
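For concreteness, the Benjamini-Hochberg step-up procedure that underlies most FDR control can be written in a few lines; this is a generic sketch of the standard procedure, not code from the note:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: with ordered p-values
    p_(1) <= ... <= p_(m), find the largest k with p_(k) <= k*alpha/m
    and reject the k hypotheses with the smallest p-values.
    Returns a boolean rejection mask in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])    # index of the largest passing p-value
        reject[order[: k + 1]] = True
    return reject
```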

13.
In recent years, genome and exome sequencing studies have implicated a plethora of new disease genes with rare causal variants. Here, I review 150 exome sequencing studies that claim to have discovered that a disease can be caused by different rare variants in the same gene, and I determine whether their methods followed the current best-practice guidelines in the interpretation of their data. Specifically, I assess whether studies appropriately assess controls for rare variants throughout the entire gene or implicated region as opposed to only investigating the specific rare variants identified in the cases, and I assess whether studies present sufficient co-segregation data for statistically significant linkage. I find that the proportion of studies performing gene-based analyses has increased with time, but that even in 2015 fewer than 40% of the reviewed studies used this method, and only 10% presented statistically significant co-segregation data. Furthermore, I find that the genes reported in these papers are explaining a decreasing proportion of cases as the field moves past most of the low-hanging fruit, with 50% of the genes from studies in 2014 and 2015 having variants in fewer than 5% of cases. As more studies focus on genes explaining relatively few cases, the importance of performing appropriate gene-based analyses is increasing. It is becoming increasingly important for journal editors and reviewers to require stringent gene-based evidence to avoid an avalanche of misleading disease gene discovery papers.

14.
In genome-wide genetic studies with a large number of markers, balancing the type I error rate and power is a challenging issue. Recently proposed false discovery rate (FDR) approaches are promising solutions to this problem. Using the 100 simulated datasets of a genome-wide marker map spaced about 3 cM and phenotypes from the Genetic Analysis Workshop 14, we studied the type I error rate and power of Storey's FDR approach, and compared it to the traditional Bonferroni procedure. We confirmed that Storey's FDR approach had a strong control of FDR. We found that Storey's FDR approach only provided weak control of family-wise error rate (FWER). For these simulated datasets, Storey's FDR approach only had slightly higher power than the Bonferroni procedure. In conclusion, Storey's FDR approach is more powerful than the Bonferroni procedure if strong control of FDR or weak control of FWER is desired. Storey's FDR approach has little power advantage over the Bonferroni procedure if there is low linkage disequilibrium among the markers. Further evaluation of the type I error rate and power of the FDR approaches for higher linkage disequilibrium and for haplotype analyses is warranted.
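Storey's approach differs from Benjamini-Hochberg mainly in estimating the proportion of true null hypotheses, pi0, from the observed p-value distribution. A single-lambda sketch is given below (Storey's own software smooths the pi0 estimate over a grid of lambda values; names and defaults here are illustrative):

```python
import numpy as np

def storey_qvalues(pvals, lam=0.5):
    """Sketch of Storey's q-values: estimate pi0 from the fraction of
    p-values above lambda, then convert each p-value to a q-value,
    the minimum FDR at which that test would be called significant."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))   # estimated null proportion
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    q_sorted = pi0 * m * p[order] / ranks
    # enforce monotonicity: q_(i) = min over j >= i of the raw values
    q_sorted = np.minimum.accumulate(q_sorted[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```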

15.
Sequencing studies have been discovering a large number of rare variants, allowing the identification of the effects of rare variants on disease susceptibility. As a method to increase the statistical power of studies on rare variants, several groupwise association tests that group rare variants in genes and detect associations between genes and diseases have been proposed. One major challenge in these methods is to determine which variants are causal in a group, and to overcome this challenge, previous methods used prior information that specifies how likely each variant is to be causal. Another source of information that can be used to determine causal variants is the observed data because case individuals are likely to have more causal variants than control individuals. In this article, we introduce a likelihood ratio test (LRT) that uses both data and prior information to infer which variants are causal and uses this finding to determine whether a group of variants is involved in a disease. We demonstrate through simulations that LRT achieves higher power than previous methods. We also evaluate our method on mutation screening data of the susceptibility gene for ataxia telangiectasia, and show that LRT can detect an association in real data. To increase the computational speed of our method, we show how we can decompose the computation of LRT, and propose an efficient permutation test. With this optimization, we can efficiently compute an LRT statistic and its significance at a genome-wide level. The software for our method is publicly available at http://genetics.cs.ucla.edu/rarevariants.

16.
Deep sequencing will soon generate comprehensive sequence information in large disease samples. Although the power to detect association with an individual rare variant is limited, pooling variants by gene or pathway into a composite test provides an alternative strategy for identifying susceptibility genes. We describe a statistical method for detecting association of multiple rare variants in protein-coding genes with a quantitative or dichotomous trait. The approach is based on the regression of phenotypic values on individuals' genotype scores subject to a variable allele-frequency threshold, incorporating computational predictions of the functional effects of missense variants. Statistical significance is assessed by permutation testing with variable thresholds. We used a rigorous population-genetics simulation framework to evaluate the power of the method, and we applied the method to empirical sequencing data from three disease studies.
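The variable-threshold logic can be sketched as follows: compute a genotype score at every candidate allele-frequency threshold, take the maximum association statistic over thresholds, and calibrate that maximum by permuting phenotypes. The sketch below uses a simple correlation statistic and omits the functional-prediction weights described in the abstract; function and parameter names are illustrative:

```python
import numpy as np

def variable_threshold_test(G, y, mafs, n_perm=1000, rng=None):
    """Sketch of a variable-threshold test: for each threshold T, score
    individuals by their minor-allele count at variants with MAF <= T,
    correlate the score with the phenotype y, keep the maximum |r| over
    thresholds, and assess it by a phenotype-permutation p-value."""
    if rng is None:
        rng = np.random.default_rng(0)
    thresholds = np.unique(mafs)

    def max_stat(phenotype):
        yc = phenotype - phenotype.mean()
        best = -np.inf
        for t in thresholds:
            score = G[:, mafs <= t].sum(axis=1)
            sc = score - score.mean()
            denom = np.sqrt((sc ** 2).sum()) * np.sqrt((yc ** 2).sum())
            if denom > 0:
                best = max(best, abs((sc * yc).sum()) / denom)
        return best

    observed = max_stat(y)
    perm = np.array([max_stat(rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, p_value
```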

17.
An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregated within the population and rare variants, inherited from recent ancestors, that segregated mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from remarkable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values, as well as the fact that most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure and a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based functional principal-component analysis (FPCA) with or without smoothing, a generalized T(2), combined multivariate and collapsing (CMC) method, and single-marker association test statistics. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has the correct type I error rates and much more power than other population-based or family-based association analysis methods to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA has a much smaller p value than other statistics.

18.
Common variants, such as those identified by genome-wide association scans, explain only a small proportion of trait variation. Growing evidence suggests that rare functional variants, which are usually missed by genome-wide association scans, play an important role in determining the phenotype. We used pooled multiplexed next-generation sequencing and a customized analysis workflow to detect mutations in five candidate genes for lignin biosynthesis in 768 pooled Populus nigra accessions. We identified a total of 36 non-synonymous single nucleotide polymorphisms, one of which causes a premature stop codon. The most common variant was estimated to be present in 672 of the 1536 tested chromosomes, while the rarest was estimated to occur only once in 1536 chromosomes. Comparison with individual Sanger sequencing in a selected sub-sample confirmed that variants are identified with high sensitivity and specificity, and that the variant frequency was estimated accurately. This proposed method for identification of rare polymorphisms allows accurate detection of variation in many individuals, and is cost-effective compared to individual sequencing.
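The chromosome counts quoted above come from scaling the pooled alternate-read fraction to the number of chromosomes in the pool (768 diploid accessions give 1536 chromosomes). A back-of-the-envelope sketch, ignoring the sequencing-error modeling in the actual workflow:

```python
def estimate_allele_count(alt_reads, total_reads, n_chromosomes=1536):
    """Rough pooled-sequencing estimate: scale the observed
    alternate-read fraction to the number of chromosomes in the pool.
    Illustrative only; the published workflow also models read errors."""
    return round(alt_reads / total_reads * n_chromosomes)

# An alternate-read fraction of 0.4375 in a 1536-chromosome pool
# corresponds to roughly 672 carrier chromosomes.
print(estimate_allele_count(4375, 10000))   # -> 672
```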

19.
Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 Genomes Project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 Genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

20.
Beyond Bonferroni: less conservative analyses for conservation genetics
Studies in conservation genetics often attempt to determine genetic differentiation between two or more temporally or geographically distinct sample collections. Pairwise p-values from Fisher’s exact tests or contingency Chi-square tests are commonly reported with a Bonferroni correction for multiple tests. While the Bonferroni correction controls the experiment-wise α, this correction is very conservative and results in greatly diminished power to detect differentiation among pairs of sample collections. An alternative is to control the false discovery rate (FDR) that provides increased power, but this method only maintains experiment-wise α when none of the pairwise comparisons are significant. Recent modifications to the FDR method provide a moderate approach to determining significance level. Simulations reveal that critical values of multiple comparison tests with both the Bonferroni method and a modified FDR method approach a minimum asymptote very near zero as the number of tests gets large, but the Bonferroni method approaches zero much more rapidly than the modified FDR method. I compared pairwise significance from three published studies using three critical values corresponding to Bonferroni, FDR, and modified FDR methods. Results suggest that the modified FDR method may provide the most biologically important critical value for evaluating significance of population differentiation in conservation genetics. Ultimately, more thorough reporting of statistical significance is needed to allow interpretation of biological significance of genetic differentiation among populations. An erratum to this article is available.
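The critical values being compared can be reproduced with standard formulas; the "modified FDR" is commonly taken to be the Benjamini-Yekutieli adjustment, although the exact variant used in the article may differ. A sketch, not the author's code:

```python
import numpy as np

def critical_values(m, alpha=0.05):
    """Per-test critical values for m pairwise comparisons under three
    schemes: Bonferroni (compare every p-value to alpha/m), the
    Benjamini-Hochberg FDR step-up (compare ordered p_(i) to i*alpha/m),
    and the Benjamini-Yekutieli variant, which divides further by
    sum_{k=1..m} 1/k to remain valid under arbitrary dependence."""
    i = np.arange(1, m + 1)
    bonferroni = np.full(m, alpha / m)
    bh = i * alpha / m
    by = i * alpha / (m * np.sum(1.0 / i))
    return bonferroni, bh, by

# For m = 45 pairwise tests at alpha = 0.05, the most stringent (i = 1)
# critical values are roughly 0.00111 (Bonferroni, B-H) and 0.00025 (B-Y).
```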
