Similar articles
 20 similar articles found (search time: 546 ms)
1.
Generalized T² test for genome association studies   Cited by: 4 (self-citations: 0; citations by others: 4)
Recent progress in the development of single-nucleotide polymorphism (SNP) maps within genes and across the genome provides a valuable tool for fine-mapping and has led to the suggestion of genomewide association studies to search for susceptibility loci for complex traits. Test statistics for genome association studies that consider a single marker at a time, ignoring the linkage disequilibrium between markers, are inefficient. In this study, we present a generalized T² statistic for association studies of complex traits, which can utilize multiple SNP markers simultaneously and considers the effects of multiple disease-susceptibility loci. This generalized T² statistic is a corollary to that originally developed for multivariate analysis and has a close relationship to discriminant analysis and common measures of genetic distance. We evaluate the power of the generalized T² statistic and show that power to be greater than or equal to those of the traditional χ² test of association and a similar haplotype-test statistic. Finally, examples are given to evaluate the performance of the proposed T² statistic for association studies using simulated and real data.
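
As a rough illustration of how a T²-type statistic uses several markers at once, the sketch below computes the classical two-sample Hotelling T² for two markers, with each individual coded by genotype counts (0, 1, or 2). This is the textbook multivariate statistic, not necessarily the exact generalized form proposed in the abstract; the function names and the 0/1/2 coding are assumptions made for the example.

```python
def mean_vector(rows):
    """Column means of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_2x2(rows, mean):
    """Sums of squares and cross-products of deviations (two markers only)."""
    s11 = s12 = s22 = 0.0
    for x1, x2 in rows:
        d1, d2 = x1 - mean[0], x2 - mean[1]
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
    return s11, s12, s22


def hotelling_t2(cases, controls):
    """Classical two-sample Hotelling T² for two-marker genotype codes.

    cases, controls: lists of (marker1, marker2) pairs, hypothetical 0/1/2 coding.
    """
    n1, n2 = len(cases), len(controls)
    m1, m2 = mean_vector(cases), mean_vector(controls)
    a = scatter_2x2(cases, m1)
    b = scatter_2x2(controls, m2)
    dof = n1 + n2 - 2
    # Pooled covariance matrix [[s11, s12], [s12, s22]].
    s11, s12, s22 = [(u + v) / dof for u, v in zip(a, b)]
    det = s11 * s22 - s12 * s12
    if det <= 0:
        raise ValueError("pooled covariance matrix is singular")
    d1, d2 = m1[0] - m2[0], m1[1] - m2[1]
    # Quadratic form d' S^{-1} d, using the closed-form 2x2 inverse.
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return n1 * n2 / (n1 + n2) * quad
```

For example, `hotelling_t2([(2, 0), (2, 2), (0, 0), (0, 2)], [(0, 0), (0, 1), (1, 0), (1, 1)])` evaluates to about 1.2. Because the pooled covariance enters the quadratic form, correlation (linkage disequilibrium) between the two markers is taken into account rather than ignored.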

2.
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², collapsing method, multivariate and collapsing (CMC) method, individual χ² test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

3.
One way to perform linkage-disequilibrium (LD) mapping of genetic traits is to use single markers. Since dense marker maps, such as single-nucleotide polymorphism and high-resolution microsatellite maps, are available, it is natural and practical to generalize single-marker LD mapping to high-resolution haplotype or multiple-marker LD mapping. This article investigates high-resolution LD-mapping methods, for complex diseases, based on haplotype maps or microsatellite marker maps. The objective is to explore test statistics that combine information from haplotype blocks or multiple markers. Based on two coding methods, genotype coding and haplotype coding, Hotelling's T² statistics TG and TH are proposed to test the association between a disease locus and two haplotype blocks or two markers. The validity of the two T² statistics is proved by theoretical calculations. A statistic TC, an extension of the traditional χ² method of comparing haplotype frequencies, is introduced by simply adding the χ² test statistics of the two haplotype blocks together. The merit of the three methods is explored by calculation and comparison of power and of type I errors. In the presence of LD between the two blocks, the type I error of TC is higher than that of TH and TG, since TC ignores the correlation between the two blocks. For each of the three statistics, the power of using two haplotype blocks is higher than that of using only one haplotype block. By power comparison, we notice that TC has higher power than that of TH, and TH has higher power than that of TG. In the absence of LD between the two blocks, the power of TC is similar to that of TH and higher than that of TG. Hence, we advocate use of TH in the data analysis. In the presence of LD between the two blocks, TH takes into account the correlation between the two haplotype blocks and has a lower type I error and higher power than TG. Besides, the feasibility of the methods is shown by sample-size calculation.

4.
A test statistic to detect errors in sib-pair relationships.   Cited by: 4 (self-citations: 2; citations by others: 2)
Several authors have proposed algorithms to detect Mendelian errors in human genetic linkage data. Most currently available methods use likelihood-based methods on multiplex family data to identify typing or pedigree errors. These algorithms cannot be applied in many sib-pair collections, because of lack of parental-genotype information. Nonetheless, misspecifying the relationships between individuals has serious consequences for sib-pair linkage studies: false relationships bias the statistics designed to identify linkage with disease phenotypes. To test the hypothesis that two individuals are sibs, we propose a test statistic based on the summation, over a large number of genetic markers, of the number of alleles shared identical by state by a pair of individuals, for each marker. The test statistic has an approximately normal distribution under the null hypothesis, and extreme negative values correspond to nonsib pairs. Power and significance studies show that the test statistic calculated by use of 50 unlinked markers has 96% power to detect half-sibs and has 100% power to detect unrelated individuals as not full-sib pairs, with a 5% false-positive rate. Furthermore, extreme positive values of the test statistic identify sibs as MZ twins.
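
The summation described above can be sketched as follows: count the alleles a pair shares identical by state (IBS) at each marker, sum over markers, and standardize. The abstract does not give the per-marker null mean and variance under the full-sib hypothesis (they depend on allele frequencies), so in this sketch they are hypothetical inputs supplied by the caller, and independence across unlinked markers is assumed when the variances are added.

```python
import math


def ibs_count(g1, g2):
    """Number of alleles (0, 1, or 2) shared identical by state.

    g1, g2: unordered genotypes given as 2-tuples of allele labels.
    """
    (a, b), (c, d) = g1, g2
    # Try both ways of pairing the alleles and keep the better match.
    return max((a == c) + (b == d), (a == d) + (b == c))


def sib_test_statistic(pair_genotypes, null_means, null_variances):
    """Standardized sum of IBS counts over markers.

    pair_genotypes: list of (genotype_1, genotype_2), one entry per marker.
    null_means, null_variances: per-marker moments of the IBS count under the
    full-sib hypothesis (hypothetical inputs; not given in the abstract).
    """
    total = sum(ibs_count(g1, g2) for g1, g2 in pair_genotypes)
    # Unlinked markers are treated as independent, so variances add.
    return (total - sum(null_means)) / math.sqrt(sum(null_variances))
```

Large negative values of the returned statistic indicate less sharing than expected for full sibs, and large positive values indicate more, in line with the interpretation given in the abstract.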

5.
We have compared the power of several allele-sharing statistics for "nonparametric" linkage analysis of X-linked traits in nuclear families and extended pedigrees. Our rationale was that, although several of these statistics have been implemented in popular software packages, there has been no formal evaluation of their relative power. Here, we evaluate the relative performance of five test statistics, including two new test statistics. We considered sibships of sizes two through four, four different extended pedigrees, 15 different genetic models (12 single-locus models and 3 two-locus models), and varying recombination fractions between the marker and the trait locus. We analytically estimated the sample sizes required for 80% power at a significance level of 0.001 and also used simulation methods to estimate power for a sample size of 10 families. We tried to identify statistics whose power was robust over a wide variety of models, with the idea that such statistics would be particularly useful for detection of X-linked loci associated with complex traits. We found that a commonly used statistic, S(all), generally performed well under various conditions and had close to the optimal sample sizes in most cases but that there were certain cases in which it performed quite poorly. Our two new statistics did not perform any better than those already in the literature. We also note that, under dominant and additive models, regardless of the statistic used, pedigrees with all-female siblings have very little power to detect X-linked loci.

6.
We present here four nonparametric statistics for linkage analysis that test whether pairs of affected relatives share marker alleles more often than expected. These statistics are based on simulating the null distribution of a given statistic conditional on the unaffecteds' marker genotypes. Each statistic uses a different measure of marker sharing: the SimAPM statistic uses the simulation-based affected-pedigree-member measure based on identity-by-state (IBS) sharing. The SimKIN (kinship) measure is 1.0 for identity-by-descent (IBD) sharing, 0.0 for no IBD status sharing, and the kinship coefficient when the IBD status is ambiguous. The simulation-based IBD (SimIBD) statistic uses a recursive algorithm to determine the probability of two affecteds sharing a specific allele IBD. The SimISO statistic is identical to SimIBD, except that it also measures marker similarity between unaffected pairs. We evaluated our statistics on data simulated under different two-locus disease models, comparing our results to those obtained with several other nonparametric statistics. Use of IBD information produces dramatic increases in power over the SimAPM method, which uses only IBS information. The power of our best statistic in most cases meets or exceeds the power of the other nonparametric statistics. Furthermore, our statistics perform comparisons between all affected relative pairs within general pedigrees and are not restricted to sib pairs or nuclear families.

7.
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango’s statistic, to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled χ² distribution, making computation of p values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test. Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios.

8.
OBJECTIVE: The potential value of haplotypes has attracted widespread interest in the mapping of complex traits. Haplotype sharing methods take the linkage disequilibrium information between multiple markers into account, and may have good power to detect predisposing genes. We present a new approach based on Mantel statistics for space-time clustering, which is developed in order to improve the power of haplotype sharing analysis for gene mapping in complex disease. METHODS: The new statistic correlates genetic similarity and phenotypic similarity across pairs of haplotypes for case-only and case-control studies. The genetic similarity is measured as the shared length between haplotypes around a putative disease locus. The phenotypic similarity is measured as the mean-corrected cross-product based on the respective phenotypes. We analyzed two tests for statistical significance with respect to type I error: (1) assuming asymptotic normality, and (2) using a Monte Carlo permutation procedure. The results were compared to the χ² test for association based on 3-marker haplotypes. RESULTS: With respect to type I error, the Mantel statistics using the permutation procedure yielded pointwise valid tests. The approach based on the assumption of asymptotic normality was seriously liberal. CONCLUSION: Power comparisons showed that the Mantel statistics were better than or equal to the χ² test for all simulated disease models.
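
A minimal sketch of the general idea, assuming the genetic similarities (shared lengths) are already available as a symmetric matrix: the Mantel statistic sums the products of genetic and phenotypic similarity over all pairs of haplotypes, and significance is assessed by Monte Carlo permutation of the phenotype labels. The function names, the one-sided p-value, and the default number of permutations are illustrative choices, not details taken from the abstract.

```python
import random


def phenotype_similarity(y):
    """Mean-corrected cross-products (y_i - mean) * (y_j - mean) for all pairs."""
    n = len(y)
    mean = sum(y) / n
    return [[(y[i] - mean) * (y[j] - mean) for j in range(n)] for i in range(n)]


def mantel_statistic(gen, phen, order=None):
    """Sum over pairs i < j of gen[i][j] * phen[order[i]][order[j]]."""
    n = len(gen)
    if order is None:
        order = list(range(n))
    return sum(
        gen[i][j] * phen[order[i]][order[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def mantel_permutation_test(gen, phen, n_perm=999, seed=0):
    """Observed statistic and one-sided Monte Carlo permutation p-value."""
    rng = random.Random(seed)
    observed = mantel_statistic(gen, phen)
    order = list(range(len(gen)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel phenotypes, keep genetic similarities fixed
        if mantel_statistic(gen, phen, order) >= observed:
            hits += 1
    # Counting the observed arrangement keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Permuting only the phenotype labels preserves the dependence structure within the genetic similarity matrix, which is why a permutation reference distribution can remain valid where a normal approximation is liberal.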

9.
Sib-pair linkage analysis has been proposed for identifying genes that predispose to common diseases. We have shown that the presence of assortative mating and multiple disease-susceptibility loci (genetic heterogeneity) can increase the required sample size for affected-affected sib pairs several fold over the sample size required under random mating. We propose a new test statistic based on sib trios composed of either one unaffected and two affected siblings or one affected and two unaffected siblings. The sample-size requirements under assortative mating and multiple disease loci for these sib-trio statistics are much smaller, under most conditions, than the corresponding sample sizes for sib pairs. Study designs based on data from sib trios with one or two affected members are recommended whenever assortative mating and genetic heterogeneity are suspected.

10.
We analyze some aspects of scan statistics, which have been proposed to help detect weak signals in genetic linkage analysis. We derive approximate expressions for the power of a test based on moving averages of the identity-by-descent allele sharing proportions for pairs of relatives at several contiguous markers. We confirm these approximate formulae by simulation. The results show that when there is a single trait locus on a chromosome, the test based on the scan statistic is slightly less powerful than that based on the customary allele sharing statistic. On the other hand, if two genes having a moderate effect on a trait lie close to each other on the same chromosome, scan statistics improve power to detect linkage.
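
The moving-average construction can be illustrated with a short sketch: average the sharing proportions over each window of contiguous markers and take the largest window average as the scan statistic. The window width and the use of a plain maximum are assumptions made for the example; the abstract does not specify them.

```python
def moving_averages(values, window):
    """Averages of every run of `window` contiguous values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def scan_statistic(values, window):
    """Largest moving average of sharing proportions along the chromosome."""
    return max(moving_averages(values, window))
```

With sharing proportions `[0.5, 0.5, 0.7, 0.8, 0.6, 0.5]` and a window of 2, the window averages peak over the third and fourth markers. Averaging smooths noise at single markers, which helps when two nearby loci each raise sharing a little, but it dilutes a signal confined to one locus.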

11.
Both theoretical calculations and simulation studies have been used to compare and contrast the statistical power of methods for mapping quantitative trait loci (QTLs) in simple and complex pedigrees. A widely used approach in such studies is to derive or simulate the expected mean test statistic under the alternative hypothesis of a segregating QTL and to equate a larger mean test statistic with larger power. In the present study, we show that, even when the test statistic under the null hypothesis of no linkage follows a known asymptotic distribution (the standard being χ²), it cannot be assumed that the distribution under the alternative hypothesis is noncentral χ². Hence, mean test statistics cannot be used to indicate power differences, and a comparison between methods that are based on simulated average test statistics may lead to the wrong conclusion. We illustrate this important finding, through simulations and analytical derivations, for a recently proposed new regression method for the analysis of general pedigrees to map quantitative trait loci. We show that this regression method is not necessarily more powerful nor computationally more efficient than a maximum-likelihood variance-component approach. We advocate the use of empirical power to compare trait-mapping methods.
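
Empirical power, as advocated above, is simply the fraction of simulated data sets in which the test rejects, rather than a summary of the average test statistic. The sketch below shows the calculation for a toy one-sided z-test on normal data; it is not a QTL-mapping method, and the effect size, sample size, and critical value are placeholders chosen for illustration.

```python
import random


def simulate_z(effect, n, rng):
    """One replicate: z statistic for the mean of n unit-variance normal draws."""
    sample = [rng.gauss(effect, 1.0) for _ in range(n)]
    return (sum(sample) / n) * n ** 0.5


def empirical_power(effect, n, critical_value, n_reps=2000, seed=1):
    """Proportion of replicates whose statistic exceeds the critical value."""
    rng = random.Random(seed)
    rejections = sum(
        simulate_z(effect, n, rng) > critical_value for _ in range(n_reps)
    )
    return rejections / n_reps
```

Calling `empirical_power(0.0, 25, 1.645)` estimates the type I error rate of the toy test, and `empirical_power(0.5, 25, 1.645)` estimates its power. Two methods are compared by running the same loop with each method's own statistic and critical value, which avoids assuming any particular distribution under the alternative.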

12.
Summary: This article describes applications of extensions of bivariate rank sum statistics to the crossover design with four sequence groups for two treatments. A randomized clinical trial in ophthalmology provides motivating background for the discussion. The bilateral design for this study has four sequence groups, T:T, T:P, P:T, and P:P, where T denotes the test treatment and P the placebo, given in that order to the left and right eyes. This article describes how to use the average of the separate Wilcoxon rank sum statistics for the left and right eyes for the overall comparison between T and P, with the correlation between the two eyes taken into account. An extension of this criterion, with better sensitivity to potential differences between T and P through reduction of the applicable variance, is discussed in terms of a conceptual model with constraints for within-side homogeneity of groups with the same treatment and between-side homogeneity of the differences between T and P. Goodness of fit for this model can be assessed with test statistics for its corresponding constraints. Simulation studies for the conceptual model confirm better power for the extended test statistic with its full invocation than for other criteria without this property. The methods summarized here are illustrated for the motivating clinical trial in ophthalmology, but they are applicable to other situations with the crossover design with four sequence groups for either two locations for two treatments at the same time for a patient or two successive periods for the assigned treatments for a recurrent disorder. This article also notes that the methods based on its conceptual model can have unsatisfactory power for departures from that model where the difference between T and P via the T:T and P:P groups is not similar to that via the T:P and P:T groups, as might occur when T has a systemic effect in a bilateral trial. For this situation, more robust test statistics are identified, but it is recognized that the parallel groups design with only the T:T and P:P groups may be more useful than the bilateral design with four sequence groups.

13.
An entropy-based statistic for genomewide association studies   Cited by: 8 (self-citations: 0; citations by others: 8)
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard χ² statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard χ² statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard χ² statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard χ² statistic.
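
For orientation, the sketch below computes the two ingredients named in the abstract: the standard Pearson χ² statistic on a 2 × k table of allele (or haplotype) counts in cases and controls, and the Shannon entropy of a frequency vector, which is a nonlinear function of the frequencies. It does not reproduce the entropy-based test statistic itself, whose exact form is not given here; the function names are illustrative.

```python
import math


def chi_square_statistic(case_counts, control_counts):
    """Pearson chi-square for a 2 x k table of allele or haplotype counts."""
    rows = [case_counts, control_counts]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(case_counts, control_counts)]
    grand = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            if expected > 0:  # skip alleles absent from both groups
                stat += (observed - expected) ** 2 / expected
    return stat


def shannon_entropy(freqs):
    """Shannon entropy -sum(p * log p) in nats; zero frequencies contribute 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)
```

With 30 and 70 copies of two alleles in cases and 50 and 50 in controls, `chi_square_statistic([30, 70], [50, 50])` is about 8.33 on 1 degree of freedom. The term −p log p changes steeply for small p, which gives some intuition for how a nonlinear transformation could weight frequency differences differently from the linear contrast used by χ².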

14.
L R Muenz, S B Green, D P Byar. Biometrics, 1977, 33(4): 617-626
In comparing two survival distributions, a Mantel-Haenszel statistic can be computed after each death as a non-linear two-sample rank statistic. The distributions of both the maximum and terminal statistics in such a sequence are studied numerically, in the absence of censoring, and appropriate critical values are determined. The maximum statistic is applied to simultaneous inference, and both the maximum and terminal statistics are used as the basis for early stopping procedures (especially in the pseudo-sequential context). Procedures based on the two statistics are compared for power and for early decision properties such as stopping index and (for exponential distributions) stopping time.

15.
BACKGROUND: In recent years, GWA studies have successfully identified common SNPs associated with complex diseases. However, most of the variants found this way account for only a small portion of the trait variance. This fact leads researchers to focus on rare-variant mapping with large-scale sequencing, which can be facilitated by using linkage information. The question arises why linkage analysis often fails to identify genes when analyzing complex diseases. Using simulations, we have investigated the power of parametric and nonparametric linkage statistics (KC-LOD, NPL, LOD and MOD scores) to detect the effect of genes responsible for complex diseases using different pedigree structures. RESULTS: As expected, a small number of pedigrees with fewer than three affected individuals has low power to map disease genes with modest effect. Interestingly, the power decreases when unaffected individuals are included in the analysis, irrespective of the true mode of inheritance. Furthermore, we found that the best performing statistic depends not only on the type of pedigrees but also on the true mode of inheritance. CONCLUSIONS: When applied in a sensible way, linkage is an appropriate and robust technique to map genes for complex disease. Unlike association analysis, linkage analysis is not hampered by allelic heterogeneity. So, why does linkage analysis often fail with complex diseases? Evidently, when using an insufficient number of small pedigrees, one might miss a true genetic linkage even though a real effect exists. Furthermore, we show that the test statistic has an important effect on the power to detect linkage as well. Therefore, a linkage analysis might fail if an inadequate test statistic is employed. We provide recommendations regarding the most favorable test statistics, in terms of power, for a given mode of inheritance and type of pedigrees under study, in order to reduce the probability of missing a true linkage.

16.
Zhao J, Jin L, Xiong M. Genetics, 2006, 174(3): 1529-1538
As millions of single-nucleotide polymorphisms (SNPs) have been identified and high-throughput genotyping technologies have been rapidly developed, large-scale genomewide association studies are soon within reach. However, since a genomewide association study involves a large number of SNPs, it is nearly impossible to ensure a genomewide significance level of 0.05 using the available statistics, although the multiple-test problems can be alleviated, but not sufficiently, by the use of tagging SNPs. One strategy to circumvent the multiple-test problem associated with genomewide association tests is to develop novel test statistics with high power. In this report, we introduce several nonlinear tests, which are based on nonlinear transformation of allele or haplotype frequencies. We investigate the power of the nonlinear test statistics and demonstrate that under certain conditions, some nonlinear test statistics have much higher power than the standard χ²-test statistic. Type I error rates of the nonlinear tests are validated using simulation studies. We also show that a class of similarity measure-based test statistics is based on the quadratic function of allele or haplotype frequencies, and thus they belong to nonlinear tests. To evaluate their performance, the nonlinear test statistics are also applied to three real data sets. Our study shows that nonlinear test statistics have great potential in association studies of complex diseases.

17.
Wu Y, Genton MG, Stefanski LA. Biometrics, 2006, 62(3): 877-885
We develop a new statistic for testing the equality of two multivariate mean vectors. A scaled chi-squared distribution is proposed as an approximating null distribution. Because the test statistic is based on componentwise statistics, it has the advantage over Hotelling's T² test of being applicable to the case where the dimension of an observation exceeds the number of observations. An appealing feature of the new test is its ability to handle missing data by relying on only componentwise sample moments. Monte Carlo studies indicate good power compared to Hotelling's T² and a recently proposed test by Srivastava (2004, Technical Report, University of Toronto). The test is applied to drug discovery data.

18.
Despite the growing consensus on the importance of testing gene-gene interactions in genetic studies of complex diseases, the effect of gene-gene interactions has often been defined as a deviance from genetic additive effects, which is essentially treated as a residual term in genetic analysis and leads to low power in detecting the presence of interacting effects. To what extent the definition of gene-gene interaction at population level reflects the genes' biochemical or physiological interaction remains a mystery. In this article, we introduce a novel definition and a new measure of gene-gene interaction between two unlinked loci (or genes). We developed a general theory for studying linkage disequilibrium (LD) patterns in disease population under two-locus disease models. The properties of using the LD measure in a disease population as a function of the measure of gene-gene interaction between two unlinked loci were also investigated. We examined how interaction between two loci creates LD in a disease population and showed that the mathematical formulation of the new definition for gene-gene interaction between two loci was similar to that of the LD between two loci. This finding motivated us to develop an LD-based statistic to detect gene-gene interaction between two unlinked loci. The null distribution and type I error rates of the LD-based statistic for testing gene-gene interaction were validated using extensive simulation studies. We found that the new test statistic was more powerful than the traditional logistic regression under three two-locus disease models and demonstrated that the power of the test statistic depends on the measure of gene-gene interaction. We also investigated the impact of using tagging SNPs for testing interaction on the power to detect interaction between two unlinked loci. Finally, to evaluate the performance of our new method, we applied the LD-based statistic to two published data sets. Our results showed that the P values of the LD-based statistic were smaller than those obtained by other approaches, including logistic regression models.

19.
This paper considers four summary test statistics, including the one recently proposed by Bennett (1986, Biometrical Journal 28, 859–862), for hypothesis testing of association in a series of independent fourfold tables under inverse sampling. This paper provides a systematic and quantitative evaluation of the small-sample performance of these summary test statistics on the basis of a Monte Carlo simulation. This paper notes that the test statistic developed by Bennett (1986) can be conservative and thereby possibly lose power when the underlying disease is not rare. This paper also finds that, given a fixed total number of cases in each table, the conditional test statistic is the best in controlling type I error among all test statistics considered here.

20.
New tests for trend in proportions, in the presence of historical control data, are proposed. One such test is a simple score statistic based on a binomial likelihood for the "current" study and beta-binomial likelihoods for each historical control series. A closely related trend statistic based on estimating equations is also proposed. Trend statistics that allow overdispersed proportions in the current study are also developed, including a version of Tarone's (1982, Biometrics 38, 215-220) test that acknowledges sampling variation in the beta distribution parameters, and a trend statistic based on estimating equations. Each such trend test is evaluated with respect to size and power under both binomial and beta-binomial sampling conditions for the current study, and illustrations are provided.
