Similar Documents
1.
In recent years, genome-wide association studies (GWAS) and gene-expression profiling have generated a large number of valuable datasets for assessing how genetic variations are related to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on biological knowledge. Genetic marker-set analyses have been advocated as more reliable and more powerful than the traditional marginal approaches (Curtis and others, 2005. Pathways to the analysis of microarray data. TRENDS in Biotechnology 23, 429-435; Efroni and others, 2007. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2, 425). Procedures for testing the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics 63, 1079-1088; Liu and others, 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics 9, 292; Wu and others, 2010. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929) have been proposed as powerful alternatives to the standard Rao score test (Rao, 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society 44, 50-57). The advantages of these EB-based tests are most apparent when the markers are correlated, because of the reduction in the degrees of freedom.
In this paper, we propose an adaptive score test which up- or down-weights the contributions from each member of the marker-set based on the Z-scores of their effects. Such an adaptive procedure gains power over the existing procedures when the signal is sparse and the correlation among the markers is weak. By combining evidence from both the EB-based score test and the adaptive test, we further construct an omnibus test that attains good power in most settings. The null distributions of the proposed test statistics can be approximated well either via simple perturbation procedures or via distributional approximations. Through extensive simulation studies, we demonstrate that the proposed procedures perform well in finite samples. We apply the tests to a breast cancer genetic study to assess the overall effect of the FGFR2 gene on breast cancer risk.

2.
This article focuses on conducting global testing for association between a binary trait and a set of rare variants (RVs), although the approach applies much more broadly: to other types of traits, to common variants (CVs), and to gene set or pathway analysis. We show that many of the existing tests have deteriorating performance in the presence of many nonassociated RVs: their power can drop dramatically as the proportion of nonassociated RVs in the group to be tested increases. We propose a class of so-called sum of powered score (SPU) tests, each of which is based on the score vector from a general regression model and hence can deal with different types of traits and adjust for covariates, e.g., principal components accounting for population stratification. The SPU tests generalize the sum test, a representative burden test based on pooling or collapsing genotypes of RVs, and a sum of squared score (SSU) test that is closely related to several other powerful variance component tests; a previous study (Basu and Pan 2011) has demonstrated good performance of one, but not both, of the Sum and SSU tests in many situations. The SPU tests are versatile in the sense that one of them is often powerful, although its identity varies with the unknown true association parameters. We propose an adaptive SPU (aSPU) test to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios. We conducted extensive simulations to show superior performance of the aSPU test over several state-of-the-art association tests in the presence of many nonassociated RVs. Finally, we applied the SPU and aSPU tests to the GAW17 mini-exome sequence data to compare their practical performance with that of some existing tests, demonstrating their potential usefulness.
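The SPU family and its adaptive version can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an intercept-only null model, so the score is U_j = Σ_i (y_i - ȳ) g_ij; it uses |SPU(γ)| for every γ; it lets `gamma=None` stand for SPU(∞) = max_j |U_j|; and it estimates all p-values by permuting the trait. The γ grid (1, 2, 4, 8, ∞), the permutation count, and the function names are illustrative choices.

```python
import random

def score_vector(G, y):
    """U_j = sum_i (y_i - ybar) * G[i][j] under an intercept-only null model."""
    ybar = sum(y) / len(y)
    return [sum((yi - ybar) * row[j] for yi, row in zip(y, G)) for j in range(len(G[0]))]

def spu_stat(U, gamma):
    """|SPU(gamma)| = |sum_j U_j**gamma|; gamma=None stands for SPU(inf) = max_j |U_j|."""
    if gamma is None:
        return max(abs(u) for u in U)
    return abs(sum(u ** gamma for u in U))

def aspu_test(G, y, gammas=(1, 2, 4, 8, None), n_perm=200, seed=1):
    """Permutation p-values for each SPU(gamma) and for aSPU = min over gamma of p_gamma."""
    rng = random.Random(seed)
    obs = [spu_stat(score_vector(G, y), g) for g in gammas]
    y_perm = list(y)
    perm = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permuting the trait breaks any trait-genotype association
        U = score_vector(G, y_perm)
        perm.append([spu_stat(U, g) for g in gammas])
    k_range = range(len(gammas))
    p_obs = [(1 + sum(row[k] >= obs[k] for row in perm)) / (n_perm + 1) for k in k_range]
    t_aspu = min(p_obs)
    # Null distribution of aSPU: score each permuted replicate against the other replicates.
    null_min = []
    for b, row in enumerate(perm):
        p_b = [sum(other[k] >= row[k] for c, other in enumerate(perm) if c != b) / (n_perm - 1)
               for k in k_range]
        null_min.append(min(p_b))
    p_aspu = (1 + sum(m <= t_aspu for m in null_min)) / (n_perm + 1)
    return dict(zip(gammas, p_obs)), p_aspu

# Example: a binary trait driven entirely by the first of ten SNPs.
rng = random.Random(0)
G = [[rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(80)]
y = [1 if row[0] >= 1 else 0 for row in G]
p_spu, p_aspu = aspu_test(G, y)
```

In this example SPU(∞) exceeds every permuted value, so its p-value is the smallest attainable, 1/(B + 1). Reusing the same permutations to score the minimum p-value avoids a second, nested layer of resampling.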

3.
The genetic basis of many common human diseases is expected to be highly heterogeneous, with multiple causative loci and multiple alleles at some of the causative loci. Analyzing the association of disease with one genetic marker at a time can have weak power, because of relatively small genetic effects and the need to correct for multiple testing. Testing the simultaneous effects of multiple markers by multivariate statistics might improve power, but such tests will not be very powerful either when there are many markers, because of the many degrees of freedom. To overcome some of the limitations of current statistical methods for case-control studies of candidate genes, we develop a new class of nonparametric statistics that can simultaneously test the association of multiple markers with disease, with only a single degree of freedom. Our approach, which is based on U-statistics, first measures a score over all markers for pairs of subjects and then compares the averages of these scores between cases and controls. Genetic scoring for a pair of subjects is measured by a "kernel" function, which we allow to be fairly general. However, we provide guidelines on how to choose a kernel for different types of genetic effects. Our global statistic has the advantage of having only one degree of freedom and achieves its greatest power advantage when the contrasts of average genotype scores between cases and controls are in the same direction across multiple markers. Simulations illustrate that our proposed methods have the anticipated type I error rate and that they can be more powerful than standard methods. Application of our methods to a study of candidate genes for prostate cancer illustrates their potential merits, and offers guidelines for interpretation.
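To make the pairwise-kernel idea concrete, here is a small sketch. The kernel below, the sum over markers of sign(case genotype - control genotype), is one simple choice assumed for illustration and is not necessarily one of the kernels the authors recommend. The p-value comes from permuting case/control labels rather than from the asymptotic variance of the U-statistic.

```python
import random

def sign(x):
    """Return 1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)

def pair_kernel(g_case, g_control):
    """Hypothetical kernel: sum over markers of sign(case genotype - control genotype)."""
    return sum(sign(a - b) for a, b in zip(g_case, g_control))

def u_statistic(cases, controls):
    """Average kernel score over all case-control pairs: a single 1-df summary."""
    total = sum(pair_kernel(a, b) for a in cases for b in controls)
    return total / (len(cases) * len(controls))

def u_permutation_test(cases, controls, n_perm=500, seed=0):
    """Two-sided permutation p-value obtained by reshuffling case/control labels."""
    rng = random.Random(seed)
    observed = u_statistic(cases, controls)
    pooled = cases + controls  # new list, so the inputs are not reordered
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(u_statistic(pooled[:n_cases], pooled[n_cases:])) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because each pair contributes one summary score across all markers, the statistic has one degree of freedom; it is largest when the case-control contrasts point the same way across markers, while opposite-direction contrasts cancel.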

4.
Di CZ, Liang KY. Biometrics 2011, 67(4):1249-1259
We consider likelihood ratio tests (LRT) and their modifications for homogeneity in admixture models. The admixture model is a two‐component mixture model, where one component is indexed by an unknown parameter while the parameter value for the other component is known. This model is widely used in genetic linkage analysis under heterogeneity, in which the kernel distribution is binomial. For such models, it has long been recognized that testing for homogeneity is nonstandard, and the LRT statistic does not converge to a conventional χ² distribution. In this article, we investigate the asymptotic behavior of the LRT for general admixture models and show that its limiting distribution is equivalent to the supremum of a squared Gaussian process. We also discuss the connection and comparison between the LRT and alternative approaches such as modifications of the LRT and score tests, including the modified LRT (Fu, Chen, and Kalbfleisch, 2006, Statistica Sinica 16, 805-823). The LRT is an omnibus test that is powerful to detect general alternative hypotheses. In contrast, alternative approaches may be slightly more powerful to detect certain types of alternatives, but much less powerful for others. Our results are illustrated by simulation studies and an application to a genetic linkage study of schizophrenia.
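A grid-search sketch of the LRT statistic for a binomial admixture model follows. It assumes counts k_i ~ (1 - α) Bin(n, θ0) + α Bin(n, θ) with θ0 known, and maximizes the log-likelihood ratio over a fixed (α, θ) grid; the grid resolution is an arbitrary choice. The code computes only the statistic, not the nonstandard null distribution (the supremum of a squared Gaussian process) described above.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the Bin(n, p) probability mass function at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def admixture_lrt(counts, n, theta0, grid=50):
    """2 * sup log-likelihood ratio of (1 - alpha) Bin(n, theta0) + alpha Bin(n, theta)
    against alpha = 0, maximized over an (alpha, theta) grid with theta in (0, 1)."""
    base = [binom_logpmf(k, n, theta0) for k in counts]
    best = 0.0  # alpha = 0 reproduces the null model, so the statistic is >= 0
    for a_i in range(1, grid + 1):
        alpha = a_i / grid
        for t_i in range(1, grid):
            theta = t_i / grid
            ll = sum(math.log(1 - alpha + alpha * math.exp(binom_logpmf(k, n, theta) - b))
                     for k, b in zip(counts, base))
            best = max(best, ll)
    return 2.0 * best
```

P-values would have to come from the limiting distribution or from simulation under θ0; comparing the statistic with a conventional χ² is exactly what the abstract warns against.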

5.
Pan W, Basu S, Shen X. Human Heredity 2011, 72(2):98-109
There has been increasing interest in detecting gene-gene and gene-environment interactions in genetic association studies. A major statistical challenge is how to deal with a large number of parameters measuring possible interaction effects, which reduces the power of any statistical test because of the large number of degrees of freedom or the high cost of adjusting for multiple testing. Hence, one popular idea is to first apply some dimension reduction technique before testing, while another is to apply only statistical tests that are developed for and robust to high-dimensional data. To combine both ideas, we propose applying an adaptive sum of squared score (SSU) test and several other adaptive tests. These adaptive tests are extensions of the adaptive Neyman test [Fan, 1996], which was originally proposed for high-dimensional data, providing a simple and effective way for dimension reduction. On the other hand, the original SSU test coincides with a version of a test specifically developed for high-dimensional data. We apply these adaptive tests and their original nonadaptive versions to simulated data to detect interactions between two groups of SNPs (e.g. multiple SNPs in two candidate regions). We found that for sparse models (i.e. with only a few non-zero interaction parameters), the adaptive SSU test and its close variant, an adaptive version of the weighted sum of squared score (SSUw) test, improved the power over their non-adaptive versions, and performed consistently well across various scenarios. The proposed adaptive tests are built in the general framework of regression analysis, and can thus be applied to various types of traits in the presence of covariates.
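The adaptive Neyman statistic that these tests extend can be written as the maximum over m of (1/√(2m)) Σ_{j≤m} (Z_j² - 1). A minimal sketch follows, assuming the input is a vector of standardized score components. Sorting the squared components in decreasing order before accumulating is an assumption aimed at sparse signals; `order_by_magnitude=False` keeps the supplied order, as in Fan's original test. P-values are not computed here and would typically come from permutation.

```python
import math

def adaptive_neyman(z, order_by_magnitude=True):
    """Adaptive Neyman-type statistic max_m (1/sqrt(2m)) * sum_{j<=m} (Z_(j)^2 - 1).

    z: standardized score components (approximately N(0, 1) under the null).
    Returns the statistic and the number of components m that attains it.
    """
    squares = [v * v for v in z]
    if order_by_magnitude:
        squares.sort(reverse=True)  # assumption: largest components enter first
    best, best_m, running = -math.inf, 0, 0.0
    for m, s in enumerate(squares, start=1):
        running += s - 1.0  # centered partial sum of squared components
        stat = running / math.sqrt(2 * m)
        if stat > best:
            best, best_m = stat, m
    return best, best_m
```

A single large component is picked up at m = 1, while many moderate components push the maximizing m toward the full length; this is how the test trades off sparse against dense alternatives.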

6.
Liu D, Lin X, Ghosh D. Biometrics 2007, 63(4):1079-1088
We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations.
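The variance-component score statistic for the pathway effect has the form Q = (y - Xβ̂)ᵀ K (y - Xβ̂) / (2σ̂²). The sketch below assumes an intercept-only covariate matrix, a Gaussian kernel K(z, z') = exp(-‖z - z'‖²/ρ) with ρ fixed rather than estimated by REML, and a permutation p-value in place of the analytic approximation; the function names are illustrative.

```python
import math
import random

def gaussian_kernel(Z, rho):
    """K[i][j] = exp(-||z_i - z_j||^2 / rho) for rows z_i of the expression matrix Z."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)) / rho) for zj in Z]
            for zi in Z]

def kernel_score_stat(y, K):
    """Q = r' K r / (2 sigma^2), with r the residuals from an intercept-only null model."""
    n = len(y)
    ybar = sum(y) / n
    r = [v - ybar for v in y]
    sigma2 = sum(v * v for v in r) / (n - 1)
    quad = sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))
    return quad / (2.0 * sigma2)

def kernel_score_test(y, Z, rho=1.0, n_perm=200, seed=0):
    """Observed Q and a permutation p-value (a stand-in for the analytic approximation)."""
    K = gaussian_kernel(Z, rho)
    observed = kernel_score_stat(y, K)
    rng = random.Random(seed)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # valid under the null when only an intercept is fitted
        if kernel_score_stat(y_perm, K) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With a trait that is a smooth nonlinear function of one expression value, the residuals are strongly correlated with the kernel structure, so the observed Q sits far above its permutation distribution.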

7.
Association mapping of complex traits typically employs tagSNP genotype data to identify a trait locus within a region of interest. However, considerable debate exists regarding the most powerful strategy for utilizing such tagSNP data for inference. A popular approach tests each tagSNP within the region individually, but such tests could lose power as a result of incomplete linkage disequilibrium between the genotyped tagSNP and the trait locus. Alternatively, one can jointly test all tagSNPs simultaneously within the region (by using genotypes or haplotypes), but such multivariate tests have large degrees of freedom that can also compromise power. Here, we consider a semiparametric model for quantitative-trait mapping that uses genetic information from multiple tagSNPs simultaneously in analysis but produces a test statistic with reduced degrees of freedom compared to existing multivariate approaches. We fit this model by using a dimension-reducing technique called least-squares kernel machines, which we show is identical to analysis using a specific linear mixed model (which we can fit by using standard software packages like SAS and R). Using simulated SNP data based on real data from the International HapMap Project, we demonstrate that our approach often has superior performance for association mapping of quantitative traits compared to the popular approach of single-tagSNP testing. Our approach is also flexible, because it allows easy modeling of covariates and, if interest exists, high-dimensional interactions among tagSNPs and environmental predictors.

8.
In biomedical studies, testing for homogeneity between two groups, where one group is modeled by mixture models, is often of great interest. This paper considers the semiparametric exponential family mixture model proposed by Hong et al. (2017) and studies the score test for homogeneity under this model. The score test is nonregular in the sense that nuisance parameters disappear under the null hypothesis. To address this difficulty, we propose a modification of the score test, so that the resulting test enjoys the Wilks phenomenon. In finite samples, we show that with fixed nuisance parameters the score test is locally most powerful. In large samples, we establish the asymptotic power functions under two types of local alternative hypotheses. Our simulation studies illustrate that the proposed score test is powerful and computationally fast. We apply the proposed score test to a UK ovarian cancer DNA methylation dataset to identify differentially methylated CpG sites.

9.
Multilocus association mapping using variable-length Markov chains
I propose a new method for association-based gene mapping that makes powerful use of multilocus data, is computationally efficient, and is straightforward to apply over large genomic regions. The approach is based on the fitting of variable-length Markov chain models, which automatically adapt to the degree of linkage disequilibrium (LD) between markers to create a parsimonious model for the LD structure. Edges of the fitted graph are tested for association with trait status. This approach can be thought of as haplotype testing with sophisticated windowing that accounts for extent of LD to reduce degrees of freedom and number of tests while maximizing information. I present analyses of two published data sets that show that this approach can have better power than single-marker tests or sliding-window haplotypic tests.

10.
Inference for detecting the existence of an association between a diallelic marker and a trait locus is based on the chi-squared statistic with one degree of freedom. For polymorphic markers with m alleles (m > 2), three approaches are mainly used in practice. First, one may use Pearson's chi-squared statistic with m-1 degrees of freedom (d.f.), but this leads to a loss of power. Second, one can select the allele that appears most associated and collapse the other allele categories into a single class. This reduces the locus, in a biased way, to a diallelic system. Third, one may use the Terwilliger [J.D. Terwilliger, Am. J. Hum. Genet. 56 (1995) 777] likelihood ratio statistic, which has a nonstandard, unknown limiting probability distribution. In this paper, we propose a new statistic, L(D), based on the second testing approach. We derive the asymptotic probability distribution of L(D) in a straightforward way. Simulation studies show that L(D) is more powerful than Pearson's chi-squared statistic with m-1 d.f.
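The second approach (pick the most associated allele and collapse the rest) can be sketched as the maximum, over alleles, of a 2×2 Pearson chi-square on allele counts. This is not the proposed L(D) statistic, whose definition is not given here; it only illustrates the collapsing step, and the function names are hypothetical. Because the allele is chosen by maximizing the statistic, comparing the maximum with a χ² on one degree of freedom would be anti-conservative, which is why a dedicated null distribution is needed.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def max_collapsed_chi2(case_counts, control_counts):
    """Collapse an m-allele marker to 'allele k vs. the rest' for each k; return the
    largest 2x2 chi-square and the allele index that attains it."""
    n_case, n_ctrl = sum(case_counts), sum(control_counts)
    best = (0.0, None)
    for allele, (a, c) in enumerate(zip(case_counts, control_counts)):
        stat = chi2_2x2(a, n_case - a, c, n_ctrl - c)
        if stat > best[0]:
            best = (stat, allele)
    return best
```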

11.
Statistical methods to test for effects of single nucleotide polymorphisms (SNPs) on exon inclusion exist but often rely on testing of associations between multiple exon–SNP pairs, with sometimes subsequent summarization of results at the gene level. Such approaches require heavy multiple testing corrections and detect mostly events with large effect sizes. We propose here a test to find spliceQTL (splicing quantitative trait loci) effects that takes all exons and all SNPs into account simultaneously. For any chosen gene, this score-based test looks for an association between the set of exon expressions and the set of SNPs, via a random-effects model framework. It is efficient to compute and can be used if the number of SNPs is larger than the number of samples. In addition, the test is powerful in detecting effects that are relatively small for individual exon–SNP pairs but are observed for many pairs. Furthermore, test results are more often replicated across datasets than pairwise testing results. This makes our test more robust to exon–SNP pair-specific effects, which do not extend to multiple pairs within the same gene. We conclude that the test we propose here offers more power and better replicability in the search for spliceQTL effects.

12.
We consider testing whether the nonparametric function in a semiparametric additive mixed model is a simple fixed degree polynomial, for example, a simple linear function. This test provides a goodness-of-fit test for checking parametric models against nonparametric models. It is based on the mixed-model representation of the smoothing spline estimator of the nonparametric function and the variance component score test by treating the inverse of the smoothing parameter as an extra variance component. We also consider testing the equivalence of two nonparametric functions in semiparametric additive mixed models for two groups, such as treatment and placebo groups. The proposed tests are applied to data from an epidemiological study and a clinical trial and their performance is evaluated through simulations.

13.
To detect the role of a candidate gene for a trait in a sample of individuals, we may test SNP haplotype or diplotype effects. For a limited sample size, many haplotype or diplotype categories may contain few individuals. This reduces power when testing the association between the trait and the haplotypes or diplotypes, because these categories provide little additional information while increasing the degrees of freedom. The present paper proposes a new strategy to group rare categories based on a measure of similarity between haplotypes or diplotypes and compares it with two other possible strategies for dealing with rare categories: a SNP selection strategy based on haplotype diversity, and a grouping strategy that pools all rare categories into a single baseline group. This comparison is performed by means of simulation under four scenarios. We show that the new strategy yields the largest increase in power irrespective of the model underlying the candidate gene in the studied trait. This strategy therefore provides a powerful alternative to currently used methods for reducing the number of rare categories.
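Two grouping rules are sketched below under stated assumptions. `pool_rare` is the comparison strategy that merges all rare categories into one baseline group; `group_rare_by_similarity` is a hypothetical stand-in for similarity-based grouping that relabels each rare haplotype as its nearest common haplotype by Hamming distance. The paper's actual similarity measure may differ, and `min_count` is an illustrative rarity threshold.

```python
from collections import Counter

def pool_rare(haplotypes, min_count, pooled_label="RARE"):
    """Baseline strategy: collapse every category seen fewer than min_count times into one group."""
    counts = Counter(haplotypes)
    return [h if counts[h] >= min_count else pooled_label for h in haplotypes]

def hamming(a, b):
    """Number of SNP positions at which two equal-length haplotype strings differ."""
    return sum(x != y for x, y in zip(a, b))

def group_rare_by_similarity(haplotypes, min_count):
    """Hypothetical similarity grouping: relabel each rare haplotype as its nearest common one
    (Hamming distance; ties broken by higher frequency, then lexicographically)."""
    counts = Counter(haplotypes)
    common = [h for h, n in counts.items() if n >= min_count]
    if not common:
        return list(haplotypes)  # nothing common to merge into
    def nearest(h):
        return min(common, key=lambda cand: (hamming(h, cand), -counts[cand], cand))
    return [h if counts[h] >= min_count else nearest(h) for h in haplotypes]
```

Unlike pooling, the similarity rule keeps a rare haplotype next to the common haplotype it most resembles, so the number of categories drops without mixing dissimilar haplotypes into one baseline group.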

14.
With the development of massively parallel sequencing technologies, there is a substantial need for powerful rare-variant association tests. Common approaches include burden and non-burden tests. Burden tests assume that all rare variants in the target region affect the phenotype in the same direction and with similar magnitude. The recently proposed sequence kernel association test (SKAT) (Wu, M. C., and others, 2011. Rare-variant association testing for sequencing data with the SKAT. The American Journal of Human Genetics 89, 82-93), an extension of the C-alpha test (Neale, B. M., and others, 2011. Testing for an unusual distribution of rare variants. PLoS Genetics 7, 161-165), provides a robust test that is particularly powerful in the presence of protective, deleterious, and null variants, but is less powerful than burden tests when a large number of variants in a region are causal and act in the same direction. As the underlying biological mechanisms are unknown in practice and vary from one gene to another across the genome, it is of substantial practical interest to develop a test that is optimal for both scenarios. In this paper, we propose a class of tests that includes burden tests and SKAT as special cases, and derive an optimal test within this class that maximizes power. We show that this optimal test outperforms burden tests and SKAT in a wide range of scenarios. The results are illustrated using simulation studies and triglyceride data from the Dallas Heart Study. In addition, we have derived sample size/power calculation formulas for SKAT with a new family of kernels to facilitate the design of new sequence association studies.
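The family of tests can be sketched from the score vector U. Assuming the parameterization Q_SKAT = Σ_j w_j² U_j² and Q_burden = (Σ_j w_j U_j)², the class is Q_ρ = (1 - ρ) Q_SKAT + ρ Q_burden, with ρ = 0 giving SKAT and ρ = 1 a weighted burden test. The code computes the statistics only: the optimal test additionally needs the null distribution of each Q_ρ (a mixture of χ² variables), which is not implemented here, and the ρ grid is an illustrative choice.

```python
def score_vector(G, y):
    """U_j = sum_i (y_i - ybar) * G[i][j] under an intercept-only null model."""
    ybar = sum(y) / len(y)
    return [sum((yi - ybar) * row[j] for yi, row in zip(y, G)) for j in range(len(G[0]))]

def skat_family(U, weights, rhos=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden for each rho in rhos."""
    q_skat = sum((w * u) ** 2 for w, u in zip(weights, U))   # sums squared, so signs do not cancel
    q_burden = sum(w * u for w, u in zip(weights, U)) ** 2   # sums first, so opposite signs cancel
    return {rho: (1 - rho) * q_skat + rho * q_burden for rho in rhos}

print(skat_family([1.0, -1.0], [1.0, 1.0])[1.0])  # 0.0: burden term cancels
print(skat_family([1.0, 1.0], [1.0, 1.0])[1.0])   # 4.0: same-direction effects add up
```

With effects in opposite directions the burden term vanishes while SKAT does not; with effects in the same direction the burden term dominates, which is why a data-driven choice of ρ can track whichever scenario holds.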

15.
Strauch K. Human Heredity 2007, 64(3):192-202
A MOD-score analysis, in which the parametric LOD score is maximized with respect to the trait-model parameters, can be a powerful method for the mapping of complex traits. With affected sib pairs, it has been shown before that MOD scores asymptotically follow a mixture of χ² distributions with 2, 1 and 0 degrees of freedom under the null hypothesis of no linkage. In that context, a MOD-score analysis yields some (albeit limited) information regarding the trait-model parameters, and there is a chance of increased power compared with a simple LOD-score analysis. Here, it is shown that with unilineal affected relative pairs, MOD scores asymptotically follow a mixture of χ² distributions with 1 and 0 degrees of freedom under the null hypothesis, that is, the same distribution as followed by simple LOD scores. No information regarding the trait model can be obtained in this setting, and no power is gained compared with a LOD-score analysis. An outlook on larger pedigrees is given. The number of degrees of freedom underlying the null distribution of MOD scores, which depends on the type of pedigrees studied, corresponds to the number of explored dimensions related to power and to the number of parameters that can be jointly estimated.
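For the 1-and-0 degree-of-freedom case, a LOD (or MOD) score converts to a likelihood-ratio statistic through T = 2 ln(10) · LOD, and under the null mixture ½χ²₀ + ½χ²₁ the tail probability is ½ P(χ²₁ ≥ T) for T > 0. The sketch below uses only the standard library, via P(χ²₁ ≥ t) = erfc(√(t/2)); the equal ½ weights are the usual assumption for this one-sided setting, and the function names are illustrative.

```python
import math

def lod_to_chi2(lod):
    """Likelihood-ratio statistic corresponding to a LOD score: 2 * ln(10) * LOD."""
    return 2.0 * math.log(10.0) * lod

def mixture_pvalue(lod):
    """Tail probability under 0.5 * chi2_0 + 0.5 * chi2_1 (assumed equal weights)."""
    if lod <= 0:
        return 1.0  # every draw from the mixture is >= 0
    t = lod_to_chi2(lod)
    return 0.5 * math.erfc(math.sqrt(t / 2.0))  # 0.5 * P(chi2_1 >= t)
```

Under this null, a LOD score of 3 corresponds to a pointwise p-value of about 1.0 × 10⁻⁴.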

16.
Evaluating the impact of potential uncontrolled confounding is an important component of causal inference based on observational studies. In this article, we introduce a general framework of sensitivity analysis that is based on inverse probability weighting. We propose a general methodology that allows both non‐parametric and parametric analyses, which are driven by two parameters that govern the magnitude of the variation of the multiplicative errors of the propensity score and their correlations with the potential outcomes. We also introduce a specific parametric model that offers a mechanistic view on how the uncontrolled confounding may bias the inference through these parameters. Our method can be readily applied to both binary and continuous outcomes and depends on the covariates only through the propensity score, which can be estimated by any parametric or non‐parametric method. We illustrate our method with two medical data sets.
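As a rough illustration of weighting-based sensitivity analysis, the sketch below computes a normalized (Hajek) inverse-probability-weighted estimate of the average treatment effect and lets the user multiply each propensity score by an error factor. The error-factor vector and the clipping bound are hypothetical stand-ins for the paper's two sensitivity parameters, not its actual model.

```python
def ipw_ate(y, treated, ps):
    """Normalized (Hajek) inverse-probability-weighted estimate of E[Y(1)] - E[Y(0)]."""
    w1 = [t / p for t, p in zip(treated, ps)]
    w0 = [(1 - t) / (1 - p) for t, p in zip(treated, ps)]
    mu1 = sum(w * v for w, v in zip(w1, y)) / sum(w1)
    mu0 = sum(w * v for w, v in zip(w0, y)) / sum(w0)
    return mu1 - mu0

def perturb_ps(ps, errors, eps=1e-6):
    """Hypothetical sensitivity step: multiply each propensity score by an error factor
    and clip the result back into (eps, 1 - eps) so the weights stay finite."""
    return [min(max(p * e, eps), 1 - eps) for p, e in zip(ps, errors)]
```

Re-running `ipw_ate` on `perturb_ps(ps, errors)` over a range of error vectors shows how far the estimate moves under a given magnitude of propensity-score misspecification.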

17.
Wang L, Zhou XH. Biometrics 2007, 63(4):1218-1225
Heteroscedastic data arise in many applications. In heteroscedastic regression analysis, the variance is often modeled as a parametric function of the covariates or the regression mean. We propose a kernel-smoothing-type nonparametric test for checking the adequacy of a given parametric variance structure. The test does not require specifying a parametric distribution for the random errors. It is shown that the test statistic is asymptotically normal under the null hypothesis and is powerful against a large class of alternatives. We suggest a simple bootstrap algorithm to approximate the distribution of the test statistic in finite samples. Numerical simulations demonstrate the satisfactory performance of the proposed test. We also illustrate the application by analyzing a radioimmunoassay data set.

18.
We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project.

19.
The coancestry coefficient, also known as the population structure parameter, is of great interest in population genetics. It can be thought of as the intraclass correlation of pairs of alleles within populations, and it can serve as a measure of genetic distance between populations. For a general class of evolutionary models, it determines the distribution of allele frequencies among populations. Under more restrictive models, it can be regarded as the probability of identity by descent of any pair of alleles at a locus within a random-mating population. In this paper we review estimation procedures that use the method of moments or are maximum likelihood under the assumption of normally distributed allele frequencies. We then consider the problem of testing hypotheses about this parameter. In addition to parametric and non-parametric bootstrap tests, we present an asymptotic chi-square test. This test reduces to the contingency-table test for equal sample sizes across populations. Our new test appears to be more powerful than previous tests, especially for loci with multiple alleles. We apply our methods to HapMap SNP data to confirm that the coancestry coefficient for humans is strictly positive.

20.

Background  

Single Nucleotide Polymorphism (SNP) analysis captures only a small proportion of associated genetic variants in Genome-Wide Association Studies (GWAS), partly because of small marginal effects. Pathway-level analysis incorporating prior biological information offers another way to analyze GWAS of complex diseases, and promises to reveal the mechanisms leading to complex diseases. Biologically defined pathways typically comprise numerous genes. If only a subset of genes in a pathway is associated with disease, then a joint analysis including all individual genes would result in a loss of power. To address this issue, we propose a pathway-based method that tests for joint effects using a pre-selected gene subset. In the proposed approach, each gene is considered the basic unit, which reduces the number of genetic variants considered and hence the degrees of freedom in the joint analysis. The proposed approach can also be used to investigate the joint effect of several genes in a candidate gene study.
