Related Articles
20 related articles found.
1.
In many case-control genetic association studies, a set of correlated secondary phenotypes that may share common genetic factors with disease status are collected. Examination of these secondary phenotypes can yield valuable insights about the disease etiology and supplement the main studies. However, due to unequal sampling probabilities between cases and controls, standard regression analysis that assesses the effect of SNPs (single nucleotide polymorphisms) on secondary phenotypes using cases only, controls only, or combined samples of cases and controls can yield inflated type I error rates when the test SNP is associated with the disease. To solve this issue, we propose a Gaussian copula-based approach that efficiently models the dependence between disease status and secondary phenotypes. Through simulations, we show that our method yields correct type I error rates for the analysis of secondary phenotypes under a wide range of situations. To illustrate the effectiveness of our method in the analysis of real data, we applied our method to a genome-wide association study on high-density lipoprotein cholesterol (HDL-C), where "cases" are defined as individuals with extremely high HDL-C level and "controls" are defined as those with low HDL-C level. We treated 4 quantitative traits with varying degrees of correlation with HDL-C as secondary phenotypes and tested for association with SNPs in LIPG, a gene that is well known to be associated with HDL-C. We show that when the correlation between the primary and secondary phenotypes is >0.2, the P values from the unadjusted combined case-control analysis are much more significant than those from methods that aim to correct for ascertainment bias. Our results suggest that to avoid false-positive associations, it is important to appropriately model secondary phenotypes in case-control genetic association studies.
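The ascertainment bias described here is easy to reproduce by simulation. The sketch below is illustrative only (arbitrary parameter values, not the authors' copula model): a SNP raises disease liability but has no direct effect on the secondary phenotype, yet a naive regression in the combined case-control sample rejects the null far too often.

```python
# Minimal simulation sketch of ascertainment bias for a secondary phenotype.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps, n_cases, n_controls = 200, 500, 500
rejections = 0
for _ in range(reps):
    g = rng.binomial(2, 0.3, size=100_000)            # SNP genotype (0/1/2), affects disease only
    u = rng.normal(size=g.size)                       # shared residual liability
    liability = 0.5 * g + u
    case = liability > np.quantile(liability, 0.95)   # roughly 5% "extreme" individuals
    y = 0.7 * u + rng.normal(scale=np.sqrt(1 - 0.49), size=g.size)  # secondary trait: no SNP effect
    idx = np.concatenate([rng.choice(np.flatnonzero(case), n_cases, replace=False),
                          rng.choice(np.flatnonzero(~case), n_controls, replace=False)])
    p = stats.linregress(g[idx], y[idx]).pvalue       # naive regression in the combined sample
    rejections += p < 0.05
print(f"empirical type I error of the naive analysis: {rejections / reps:.3f} (nominal 0.05)")
```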

2.
Li M, Boehnke M, Abecasis GR, Song PX. Genetics. 2006;173(4):2317-2327.
Mapping and identifying variants that influence quantitative traits is an important problem for genetic studies. Traditional QTL mapping relies on a variance-components (VC) approach with the key assumption that the trait values in a family follow a multivariate normal distribution. Violation of this assumption can lead to inflated type I error, reduced power, and biased parameter estimates. To accommodate nonnormally distributed data, we developed and implemented a modified VC method, which we call the "copula VC method," that directly models the nonnormal distribution using Gaussian copulas. The copula VC method allows the analysis of continuous, discrete, and censored trait data, and the standard VC method is a special case when the data are distributed as multivariate normal. Through the use of link functions, the copula VC method can easily incorporate covariates. We use computer simulations to show that the proposed method yields unbiased parameter estimates, correct type I error rates, and improved power for testing linkage with a variety of nonnormal traits as compared with the standard VC and the regression-based methods.
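For orientation, the sketch below shows only the Gaussian-copula construction that underlies this kind of method: correlated latent normals are pushed through the normal CDF and then through arbitrary inverse CDFs, yielding dependent traits with non-normal (even discrete) margins. It omits the kinship-structured covariance and link functions of the actual copula VC method.

```python
# Gaussian copula coupling non-normal margins through a latent normal correlation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])                       # latent normal correlation (e.g. a sib pair)
z = rng.multivariate_normal(np.zeros(2), corr, size=5000)
u = stats.norm.cdf(z)                               # probability integral transform -> Uniform(0,1)
trait_count = stats.poisson.ppf(u[:, 0], mu=3)      # discrete margin
trait_skewed = stats.expon.ppf(u[:, 1], scale=2.0)  # skewed continuous margin
rho, _ = stats.spearmanr(trait_count, trait_skewed)
print(f"Spearman correlation of the coupled non-normal traits: {rho:.3f}")
```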

3.
The recent development of sequencing technology allows identification of association between the whole spectrum of genetic variants and complex diseases. Over the past few years, a number of association tests for rare variants have been developed. Jointly testing for association between genetic variants and multiple correlated phenotypes may increase the power to detect causal genes in family-based studies, but familial correlation needs to be appropriately handled to avoid an inflated type I error rate. Here we propose a novel approach for multivariate family data using kernel machine regression (denoted as MF-KM) that is based on a linear mixed-model framework and can be applied to a large range of studies with different types of traits. In our simulation studies, the usual kernel machine test has inflated type I error rates when applied directly to familial data, while our proposed MF-KM method preserves the expected type I error rates. Moreover, the MF-KM method has increased power compared to methods that either analyze each phenotype separately while considering family structure or use only unrelated founders from the families. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.

4.
Variance component analysis provides an efficient method for performing linkage analysis for quantitative traits. However, type I error of variance components-based likelihood ratio testing may be affected when phenotypic data are non-normally distributed (especially with high values of kurtosis). This results in inflated LOD scores when the normality assumption does not hold. Even though different solutions have been proposed to deal with this problem with univariate phenotypes, little work has been done in the multivariate case. We present an empirical approach to adjust the inflated LOD scores obtained from a bivariate phenotype that violates the assumption of normality. Using the Collaborative Study on the Genetics of Alcoholism data available for the Genetic Analysis Workshop 14, we show how bivariate linkage analysis with leptokurtotic traits gives an inflated type I error. We perform a novel correction that achieves acceptable levels of type I error.

5.
The present study assesses the effects of genotyping errors on the type I error rate of a particular transmission/disequilibrium test (TDT_std), which assumes that data are errorless, and introduces a new transmission/disequilibrium test (TDT_ae) that allows for random genotyping errors. We evaluate the type I error rate and power of the TDT_ae under a variety of simulations and perform a power comparison between the TDT_std and the TDT_ae, for errorless data. Both the TDT_std and the TDT_ae statistics are computed as two times a log-likelihood difference, and both are asymptotically distributed as χ² with 1 df. Genotype data for trios are simulated under a null hypothesis and under an alternative (power) hypothesis. For each simulation, errors are introduced randomly via a computer algorithm with different probabilities (called "allelic error rates"). The TDT_std statistic is computed on all trios that show Mendelian consistency, whereas the TDT_ae statistic is computed on all trios. The results indicate that TDT_std shows a significant increase in type I error when applied to data in which inconsistent trios are removed. This type I error increases both with an increase in sample size and with an increase in the allelic error rates. TDT_ae always maintains correct type I error rates for the simulations considered. Factors affecting the power of the TDT_ae are discussed. Finally, the power of TDT_std is at least that of TDT_ae for simulations with errorless data. Because data are rarely error free, we recommend that researchers use methods, such as the TDT_ae, that allow for errors in genotype data.
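For reference, the familiar McNemar-type form of the TDT is shown below with made-up transmission counts; the paper's TDT_std and TDT_ae are likelihood-ratio versions of this statistic, both referred to a χ² distribution with 1 df.

```python
# Classic TDT statistic from transmission counts (illustrative counts only).
from scipy import stats

b, c = 62, 38                    # hypothetical transmissions / non-transmissions of allele "1"
tdt = (b - c) ** 2 / (b + c)     # approximately chi-square with 1 df under the null
p = stats.chi2.sf(tdt, df=1)
print(f"TDT = {tdt:.2f}, p = {p:.4g}")
```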

6.
The transmission/disequilibrium test was introduced to test for linkage disequilibrium between a marker and a putative disease locus using case-parent trios. However, parental genotypes may be incomplete in such a study. When parental information is non-randomly missing, due, for example, to death from the disease under study, the impact on type I error and power under dominant and recessive disease models has been reported. In this paper, we examine non-ignorable missingness by assigning missing values to the genotypes of affected parents. We used unrelated case-parent trios in the Genetic Analysis Workshop 14 simulated data for the Danacaa population. Our computer simulations revealed that the type I error of these tests using incomplete trios was not inflated over the nominal level under either recessive or dominant disease models. However, the power of these tests appears to be inflated over the complete information case due to an excess of heterozygous parents in dyads.

7.
OBJECTIVES: This is the first of two articles discussing the effect of population stratification on the type I error rate (i.e., false positive rate). This paper focuses on the confounding risk ratio (CRR). It is accepted that population stratification (PS) can produce false positive results in case-control genetic association studies. However, which values of population parameters lead to an increase in type I error rate is unknown. Some believe PS does not represent a serious concern, whereas others believe that PS may contribute to contradictory findings in genetic association. We used computer simulations to estimate the effect of PS on type I error rate over a wide range of disease frequencies and marker allele frequencies, and we compared the observed type I error rate to the magnitude of the confounding risk ratio. METHODS: We simulated two populations and mixed them to produce a combined population, specifying 160 different combinations of input parameters (disease prevalences and marker allele frequencies in the two populations). From the combined populations, we selected 5000 case-control datasets, each with either 50, 100, or 300 cases and controls, and determined the type I error rate. In all simulations, the marker allele and disease were independent (i.e., no association). RESULTS: The type I error rate is not substantially affected by changes in the disease prevalence per se. We found that the CRR provides a relatively poor indicator of the magnitude of the increase in type I error rate. We also derived a simple mathematical quantity, Δ, that is highly correlated with the type I error rate. In the companion article (part II, in this issue), we extend this work to multiple subpopulations and unequal sampling proportions. CONCLUSION: Based on these results, realistic combinations of disease prevalences and marker allele frequencies can substantially increase the probability of finding false evidence of marker-disease associations. Furthermore, the CRR does not indicate when this will occur.
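A stripped-down version of this kind of simulation is sketched below, assuming two subpopulations mixed in equal proportions and illustrative parameter values (not the authors' 160 scenarios): the marker is unassociated with disease within each subpopulation, yet the pooled case-control test rejects far more often than 5%.

```python
# Population stratification: type I error of a pooled allelic test under no association.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
freq = np.array([0.1, 0.5])      # marker allele frequency in subpopulations 1 and 2
prev = np.array([0.01, 0.10])    # disease prevalence in subpopulations 1 and 2
n_cases = n_controls = 300
reps, rejections = 2000, 0
w_case = prev / prev.sum()                 # P(subpopulation | case), equal mixing assumed
w_ctrl = (1 - prev) / (1 - prev).sum()     # P(subpopulation | control)
for _ in range(reps):
    case_pop = rng.choice(2, size=n_cases, p=w_case)
    ctrl_pop = rng.choice(2, size=n_controls, p=w_ctrl)
    case_alleles = rng.binomial(2, freq[case_pop]).sum()    # allele counts among cases
    ctrl_alleles = rng.binomial(2, freq[ctrl_pop]).sum()
    table = [[case_alleles, 2 * n_cases - case_alleles],
             [ctrl_alleles, 2 * n_controls - ctrl_alleles]]
    rejections += stats.chi2_contingency(table)[1] < 0.05   # allelic chi-square test p-value
print(f"type I error when the 50/50 population mixture is ignored: {rejections / reps:.3f}")
```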

8.
The discovery that microsatellite repeat expansions can cause clinical disease has fostered renewed interest in testing for age-at-onset anticipation (AOA). A commonly used procedure is to sample affected parent-child pairs (APCPs) from available data sets and to test for a difference in mean age at onset between the parents and the children. However, standard statistical methods fail to take into account the right truncation of both the parent and child age-at-onset distributions under this design, with the result that type I error rates can be inflated substantially. Previously, we had introduced a new test, based on the correct, bivariate right-truncated, age-at-onset distribution. We showed that this test has the correct type I error rate for random APCPs, even for quite small samples. However, in that paper, we did not consider two key statistical complications that arise when the test is applied to realistic data. First, affected pairs usually are sampled from pedigrees preferentially selected for the presence of multiple affected individuals. In this paper, we show that this will tend to inflate the type I error rate of the test. Second, we consider the appropriate probability model under the alternative hypothesis of true AOA due to an expanding microsatellite mechanism, and we show that there is good reason to believe that the power to detect AOA may be quite small, even for substantial effect sizes. When the type I error rate of the test is high relative to the power, interpretation of test results becomes problematic. We conclude that, in many applications, AOA tests based on APCPs may not yield meaningful results.

9.
Ueki M, Cordell HJ. PLoS Genetics. 2012;8(4):e1002625.
Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type I error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type I error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type I error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type I error. Although designed to test specifically for interaction, we show that some of these previously proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new "joint effects" statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally proposed statistics, on account of the inflated error rate that can result.

10.
Background: Cluster randomised trials (CRTs) are commonly analysed using mixed-effects models or generalised estimating equations (GEEs). However, these analyses do not always perform well with the small number of clusters typical of most CRTs. They can lead to increased risk of a type I error (finding a statistically significant treatment effect when it does not exist) if appropriate corrections are not used. Methods: We conducted a small simulation study to evaluate the impact of using small-sample corrections for mixed-effects models or GEEs in CRTs with a small number of clusters. We then reanalysed data from TRIGGER, a CRT with six clusters, to determine the effect of using an inappropriate analysis method in practice. Finally, we reviewed 100 CRTs previously identified by a search on PubMed in order to assess whether trials were using appropriate methods of analysis. Trials were classified as at risk of an increased type I error rate if they did not report using an analysis method which accounted for clustering, or if they had fewer than 40 clusters and performed an individual-level analysis without reporting the use of an appropriate small-sample correction. Results: Our simulation study found that using mixed-effects models or GEEs without an appropriate correction led to inflated type I error rates, even for as many as 70 clusters. Conversely, using small-sample corrections provided correct type I error rates across all scenarios. Reanalysis of the TRIGGER trial found that inappropriate methods of analysis gave much smaller P values (P ≤ 0.01) than appropriate methods (P = 0.04–0.15). In our review, of the 99 trials that reported the number of clusters, 64 (65%) were at risk of an increased type I error rate; 14 trials did not report using an analysis method which accounted for clustering, and 50 trials with fewer than 40 clusters performed an individual-level analysis without reporting the use of an appropriate correction. Conclusions: CRTs with a small or medium number of clusters are at risk of an inflated type I error rate unless appropriate analysis methods are used. Investigators should consider using small-sample corrections with mixed-effects models or GEEs to ensure valid results.
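The sketch below illustrates the problem with a deliberately simple contrast, not the Kenward-Roger or GEE corrections evaluated in the paper: with six clusters, an individual-level t-test that ignores clustering is badly anticonservative, whereas a t-test on cluster means (one simple small-sample-safe analysis) stays near the nominal level.

```python
# Type I error of naive vs cluster-level analysis in a toy 6-cluster CRT.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
clusters_per_arm, n_per_cluster, icc = 3, 50, 0.05      # six clusters in total
sigma_b, sigma_w = np.sqrt(icc), np.sqrt(1 - icc)
reps = 2000
naive_rej = cluster_rej = 0

def simulate_arm():
    cluster_effects = rng.normal(0, sigma_b, clusters_per_arm)
    return cluster_effects[:, None] + rng.normal(0, sigma_w, (clusters_per_arm, n_per_cluster))

for _ in range(reps):
    arm0, arm1 = simulate_arm(), simulate_arm()         # no true treatment effect
    naive_rej += stats.ttest_ind(arm0.ravel(), arm1.ravel()).pvalue < 0.05
    cluster_rej += stats.ttest_ind(arm0.mean(axis=1), arm1.mean(axis=1)).pvalue < 0.05

print(f"individual-level t-test, clustering ignored: type I error {naive_rej / reps:.3f}")
print(f"t-test on cluster means:                     type I error {cluster_rej / reps:.3f}")
```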

11.
The finite mixture model approach has attracted much attention in analyzing microarray data due to its robustness to the excessive variability which is common in microarray data. Pan (2003) proposed to use the normal mixture model method (MMM) to estimate the distribution of a test statistic and its null distribution. However, considering the fact that the test statistic is often of t-type, our studies find that the rejection region from MMM is often significantly larger than the correct rejection region, resulting in an inflated type I error. This motivates us to propose the t-mixture model (TMM) approach. In this paper, we demonstrate that TMM provides significantly more accurate control of the probability of making type I errors (hence of the familywise error rate) than MMM. Finally, TMM is applied to the well-known leukemia data of Golub et al. (1999). The results are compared with those obtained from MMM.
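The direction of the bias can be seen without any mixture fitting: a normal approximation to a t-type null has thinner tails, so its critical value is too small and the rejection region too large. Illustrative numbers only:

```python
# Why a normal approximation to a t-type null inflates type I error.
from scipy import stats

alpha, df = 0.05, 8                             # e.g. a two-sample t statistic with few arrays
z_crit = stats.norm.ppf(1 - alpha / 2)          # cutoff implied by a normal null
t_crit = stats.t.ppf(1 - alpha / 2, df)         # correct cutoff for the t-type statistic
actual_alpha = 2 * stats.t.sf(z_crit, df)       # true error rate when the normal cutoff is used
print(f"normal cutoff {z_crit:.3f} vs t cutoff {t_crit:.3f}")
print(f"actual type I error at the normal cutoff: {actual_alpha:.3f} (nominal {alpha})")
```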

12.
Computer simulation was used to test Smith's (1994) correction for phylogenetic nonindependence in comparative studies. Smith's method finds effective N, which is computed using nested analysis of variance, and uses this value in place of observed N as the baseline degrees of freedom (df) for calculating statistical significance levels. If Smith's formula finds the correct df, distributions of computer-generated statistics from simulations with observed N nonindependent species should match theoretical distributions (from statistical tables) with the df based on effective N. The computer program developed to test Smith's method simulates character evolution down user-specified phylogenies. Parameters were systematically varied to discover their effects on Smith's method. In simulations in which the phylogeny and taxonomy were identical (tests of narrow-sense validity), Smith's method always gave conservative statistical results when the taxonomy had fewer than five levels. This conservative departure gave way to a liberal deviation in type I error rates in simulations using more than five taxonomic levels, except when species values were nearly independent. Reducing the number of taxonomic levels used in the analysis, and thereby eliminating available information regarding evolutionary relationships, also increased type I error rates (broad-sense validity), indicating that this may be inappropriate under conditions shown to have high type I error rates. However, the use of taxonomic categories over more accurate phylogenies did not create a liberal bias in all cases in the analysis performed here. The effect of correlated trait evolution was ambiguous but, relative to other parameters, negligible.

13.
Chan IS, Tang NS, Tang ML, Chan PS. Biometrics. 2003;59(4):1170-1177.
Testing of noninferiority has become increasingly important in modern medicine as a means of comparing a new test procedure to a currently available test procedure. Asymptotic methods have recently been developed for analyzing noninferiority trials using rate ratios under the matched-pair design. In small samples, however, the performance of these asymptotic methods may not be reliable, and they are not recommended. In this article, we investigate alternative methods that are desirable for assessing noninferiority trials, using the rate ratio measure under small-sample matched-pair designs. In particular, we propose an exact and an approximate exact unconditional test, along with the corresponding confidence intervals based on the score statistic. The exact unconditional method guarantees the type I error rate will not exceed the nominal level. It is recommended when strict control of the type I error rate (protection against any inflated risk of accepting inferior treatments) is required. However, the exact method tends to be overly conservative (thus, less powerful) and computationally demanding. Via empirical studies, we demonstrate that the approximate exact score method, which is computationally simple to implement, controls the type I error rate reasonably well and has high power for hypothesis testing. On balance, the approximate exact method offers a very good alternative for analyzing correlated binary data from matched-pair designs with small sample sizes. We illustrate these methods using two real examples taken from a crossover study of soft lenses and a Pneumocystis carinii pneumonia study. We contrast the methods with a hypothetical example.

14.
Zang Y, Zhang H, Yang Y, Zheng G. Human Heredity. 2007;63(3-4):187-195.
The population-based case-control design is a powerful approach for detecting susceptibility markers of a complex disease. However, this approach may lead to spurious association when there is population substructure: population stratification (PS) or cryptic relatedness (CR). Two simple approaches to correct for the population substructure are genomic control (GC) and delta centralization (DC). GC uses the variance inflation factor to correct for the variance distortion of a test statistic, and the DC centralizes the non-central chi-square distribution of the test statistic. Both GC and DC have been studied for case-control association studies mainly under a specific genetic model (e.g. recessive, additive or dominant), under which an optimal trend test is available. The genetic model is usually unknown for many complex diseases. In this situation, we study the performance of three robust tests based on the GC and DC corrections in the presence of the population substructure. Our results show that, when the genetic model is unknown, the DC- (or GC-) corrected maximum and Pearson's association test are robust and have good control of Type I error and high power relative to the optimal trend tests in the presence of PS (or CR).
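For context, the standard GC recipe (not the DC correction or the robust maximum and Pearson tests studied here) rescales each 1-df test statistic by an inflation factor λ estimated from the genome-wide median; a minimal sketch with toy numbers:

```python
# Genomic-control correction of a 1-df association statistic (toy data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
null_chisq = 1.3 * rng.chisquare(1, size=10_000)        # toy genome-wide null statistics, inflated ~1.3x
lam = np.median(null_chisq) / stats.chi2.ppf(0.5, 1)    # chi2_1 median is about 0.455
candidate = 9.0                                         # uncorrected 1-df trend-test statistic
corrected = candidate / max(lam, 1.0)                   # GC never deflates
print(f"estimated lambda = {lam:.2f}")
print(f"p uncorrected = {stats.chi2.sf(candidate, 1):.4f}, GC-corrected = {stats.chi2.sf(corrected, 1):.4f}")
```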

15.
Aim: Variation partitioning based on canonical analysis is the most commonly used analysis to investigate community patterns according to environmental and spatial predictors. Ecologists use this method in order to understand the pure contribution of the environment independent of space, and vice versa, as well as to control for inflated type I error in assessing the environmental component under spatial autocorrelation. Our goal is to use numerical simulations to compare how different spatial predictors and model selection procedures perform in assessing the importance of the spatial component and in controlling for type I error while testing environmental predictors. Innovation: We determine for the first time how the ability of commonly used (polynomial regressors) and novel methods based on eigenvector maps compare in the realm of spatial variation partitioning. We introduce a novel forward selection procedure to select spatial regressors for community analysis. Finally, we point out a number of issues that have not been previously considered about the joint explained variation between environment and space, which should be taken into account when reporting and testing the unique contributions of environment and space in patterning ecological communities. Main conclusions: In tests of species-environment relationships, spatial autocorrelation is known to inflate the level of type I error and make the tests of significance invalid. First, one must determine if the spatial component is significant using all spatial predictors (Moran's eigenvector maps). If it is, consider a model selection for the set of spatial predictors (an individual-species forward selection procedure is to be preferred) and use the environmental and selected spatial predictors in a partial regression or partial canonical analysis scheme. This is an effective way of controlling for type I error in such tests. Polynomial regressors do not provide tests with a correct level of type I error.
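The arithmetic of variation partitioning itself is simple and is sketched below with plain OLS on synthetic data; a real analysis would use canonical (RDA) models, Moran's eigenvector maps as spatial predictors, and the forward selection procedure discussed above.

```python
# Variation partitioning into pure-environment [a], shared [b] and pure-space [c].
import numpy as np

rng = np.random.default_rng(6)
n = 200
space = rng.normal(size=(n, 2))                     # stand-in spatial predictors
env = 0.6 * space[:, :1] + rng.normal(size=(n, 1))  # environment partly spatially structured
y = 1.0 * env[:, 0] + 0.5 * space[:, 1] + rng.normal(size=n)

def r2(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    resid = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

r2_env, r2_spa, r2_both = r2(env, y), r2(space, y), r2(np.column_stack([env, space]), y)
a = r2_both - r2_spa            # pure environment
c = r2_both - r2_env            # pure space
b = r2_env + r2_spa - r2_both   # shared fraction (not independently testable)
print(f"[a]={a:.2f}  [b]={b:.2f}  [c]={c:.2f}  residual={1 - r2_both:.2f}")
```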

16.
Why does a method that fails continue to be used? The answer
It has been claimed that hundreds of researchers use nested clade phylogeographic analysis (NCPA) based on what the method promises rather than requiring objective validation of the method. The supposed failure of NCPA is based upon the argument that validating it by using positive controls ignored type I error, and that computer simulations have shown a high type I error. The first argument is factually incorrect: the previously published validation analysis fully accounted for both type I and type II errors. The simulations that indicate a 75% type I error rate have serious flaws and only evaluate outdated versions of NCPA. These outdated type I error rates fall precipitously when the 2003 version of single-locus NCPA is used or when the 2002 multilocus version of NCPA is used. It is shown that the tree-wise type I errors in single-locus NCPA can be corrected to the desired nominal level by a simple statistical procedure, and that multilocus NCPA reconstructs a simulated scenario used to discredit NCPA with 100% accuracy. Hence, NCPA is a not a failed method at all, but rather has been validated both by actual data and by simulated data in a manner that satisfies the published criteria given by its critics. The critics have come to different conclusions because they have focused on the pre-2002 versions of NCPA and have failed to take into account the extensive developments in NCPA since 2002. Hence, researchers can choose to use NCPA based upon objective critical validation that shows that NCPA delivers what it promises.

17.
The Mantel test, based on comparisons of distance matrices, is commonly employed in comparative biology, but its statistical properties in this context are unknown. Here, we evaluate the performance of the Mantel test for two applications in comparative biology: testing for phylogenetic signal, and testing for an evolutionary correlation between two characters. We find that the Mantel test has poor performance compared to alternative methods, including low power and, under some circumstances, inflated type I error. We identify a remedy for the inflated type I error of three-way Mantel tests using phylogenetic permutations; however, this test still has considerably lower power than independent contrasts. We recommend that use of the Mantel test should be restricted to cases in which data can only be expressed as pairwise distances among taxa.
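A bare-bones Mantel test is sketched below for reference (simple permutation of one matrix, synthetic distances); the paper's finding is that even with better permutation schemes this construction has considerably lower power than independent contrasts.

```python
# Simple Mantel test: matrix correlation with a row/column permutation null.
import numpy as np

def mantel(d1, d2, n_perm=999, rng=None):
    """Correlation of off-diagonal distances and one-tailed permutation p-value."""
    if rng is None:
        rng = np.random.default_rng()
    iu = np.triu_indices_from(d1, k=1)
    obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    exceed = 1                                       # count the observed statistic itself
    for _ in range(n_perm):
        perm = rng.permutation(len(d2))
        d2p = d2[np.ix_(perm, perm)]                 # permute rows and columns together
        exceed += np.corrcoef(d1[iu], d2p[iu])[0, 1] >= obs
    return obs, exceed / (n_perm + 1)

rng = np.random.default_rng(7)
x = rng.normal(size=(12, 3))                         # 12 taxa, 3 trait dimensions
d1 = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
d2 = d1 + rng.normal(scale=0.5, size=d1.shape)       # noisy copy of d1
d2 = (d2 + d2.T) / 2
np.fill_diagonal(d2, 0.0)
r, p = mantel(d1, d2, rng=rng)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```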

18.
We examined Type I error rates of Felsenstein's (1985; Am. Nat. 125:1-15) comparative method of phylogenetically independent contrasts when branch lengths are in error and the model of evolution is not Brownian motion. We used seven evolutionary models, six of which depart strongly from Brownian motion, to simulate the evolution of two continuously valued characters along two different phylogenies (15 and 49 species). First, we examined the performance of independent contrasts when branch lengths are distorted systematically, for example, by taking the square root of each branch segment. These distortions often caused inflated Type I error rates, but performance was almost always restored when branch length transformations were used. Next, we investigated effects of random errors in branch lengths. After the data were simulated, we added errors to the branch lengths and then used the altered phylogenies to estimate character correlations. Errors in the branches could be of two types: fixed, where branch lengths are either shortened or lengthened by a fixed fraction; or variable, where the error is a normal variate with mean zero and the variance is scaled to the length of the branch (so that expected error relative to branch length is constant for the whole tree). Thus, the error added is unrelated to the microevolutionary model. Without branch length checks and transformations, independent contrasts tended to yield extremely inflated and highly variable Type I error rates. Type I error rates were reduced, however, when branch lengths were checked and transformed as proposed by Garland et al. (1992; Syst. Biol. 41:18-32), and almost never exceeded twice the nominal P-value at alpha = 0.05. Our results also indicate that, if branch length transformations are applied, then the appropriate degrees of freedom for testing the significance of a correlation coefficient should, in general, be reduced to account for estimation of the best branch length transformation. These results extend those reported in Díaz-Uriarte and Garland (1996; Syst. Biol. 45:27-47), and show that, even with errors in branch lengths and evolutionary models different from Brownian motion, independent contrasts are a robust method for testing hypotheses of correlated evolution.
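For readers unfamiliar with the method being stress-tested here, the sketch below computes Felsenstein's standardized contrasts on a toy four-species tree with made-up branch lengths and trait values; the correlation of the contrasts (fitted through the origin) is the quantity whose type I error behavior is examined above.

```python
# Phylogenetically independent contrasts on a toy ((A,B),(C,D)) tree.
import numpy as np

def pic(node, out):
    """Post-order pass returning (ancestral value, extra branch length); tips are plain floats."""
    if not isinstance(node, tuple):
        return float(node), 0.0
    left, right, bl, br = node                  # (left subtree, right subtree, left/right branch lengths)
    xl, el = pic(left, out)
    xr, er = pic(right, out)
    vl, vr = bl + el, br + er                   # branch lengths inflated for uncertainty below the node
    out.append((xl - xr) / np.sqrt(vl + vr))    # standardized contrast
    x_anc = (xl / vl + xr / vr) / (1 / vl + 1 / vr)
    return x_anc, vl * vr / (vl + vr)

# made-up branch lengths and tip values for two traits
tree_x = ((1.2, 1.9, 0.3, 0.3), (0.4, 0.9, 0.5, 0.5), 0.2, 0.2)
tree_y = ((2.0, 2.6, 0.3, 0.3), (1.1, 1.4, 0.5, 0.5), 0.2, 0.2)
cx, cy = [], []
pic(tree_x, cx)
pic(tree_y, cy)
cx, cy = np.array(cx), np.array(cy)
r = (cx * cy).sum() / np.sqrt((cx ** 2).sum() * (cy ** 2).sum())   # correlation through the origin
print("contrasts X:", np.round(cx, 3))
print("contrasts Y:", np.round(cy, 3))
print("contrast correlation (through the origin):", round(float(r), 3))
```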

19.
A threshold of 3.3 for a genome-wide maximum LOD score (MAXLOD) has been demonstrated in human linkage studies as corresponding to a type I error rate of 5%. Generalization of this work to other species assumes the presence of an infinitely dense marker map. While this assumption is increasingly realistic for the human genome, it may be unrealistic for the dog genome. In this study we establish the analytic and empirical thresholds for MAXLOD in canine linkage studies corresponding to type I error rates of 5% and 1% for autosomal traits. Empirical thresholds are computed via simulation assuming a 10 cM map with no fine mapping performed. Pedigree structures for simulations were drawn from two canine disease studies. Five thousand replicates of genome-wide null genotype data were simulated and analyzed for each disease. We determined that MAXLOD thresholds of 3.2 and 2.7 correspond to analytic and empirical type I error rates of 5%, respectively. In all cases, the MAXLOD thresholds from simulations were always at least 0.5 LOD units below the corresponding analytic thresholds. We therefore recommend that a threshold of 3.2 be used for canine linkage studies when fine mapping is performed, and that researchers perform their own simulation studies to assess genome-wide empirical significance levels when no fine mapping is performed.
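The pointwise tail probability that sits behind a LOD threshold can be written down directly: the likelihood-ratio statistic is 2 ln(10) × LOD, and a classical one-sided LOD test has pointwise p = P(Z > √(2 ln(10) × LOD)); genome-wide thresholds such as 3.3 (human) or the analytic 3.2 above additionally account for multiple testing along the genome. A quick check:

```python
# Pointwise tail probabilities implied by a few LOD thresholds.
import numpy as np
from scipy import stats

for lod in (2.7, 3.2, 3.3):
    chisq = 2 * np.log(10) * lod                 # likelihood-ratio statistic implied by the LOD score
    pointwise_p = stats.norm.sf(np.sqrt(chisq))  # one-sided pointwise tail probability
    print(f"LOD {lod}: LRT statistic {chisq:.2f}, pointwise p {pointwise_p:.2e}")
```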

20.
The central question for Genetic Analysis Workshop 14 (GAW14) is which strategy is better for linkage analysis: the use of single-nucleotide polymorphisms (SNPs) or microsatellite markers? To answer this question we analyzed the simulated data using Duffy's SIB-PAIR program, which can incorporate parental genotypes, and our identity-by-state – identity-by-descent (IBS-IBD) transformation method of affected sib-pair linkage analysis, which uses the matrix transformation between IBS and IBD. The advantages of our method are as follows: the assumption of Hardy-Weinberg equilibrium is not necessary; the parental genotype information may be entirely unknown; both IBS and its related IBD transformation can be used in the linkage analysis; the determinant of the IBS-IBD transformation matrix provides a quantitative measure of the quality of the marker in linkage analysis. With the originally distributed simulated data, we found that 1) for microsatellite markers there are virtually no differences in type I and type II error rates when parental genotypes were or were not used; 2) on average, a microsatellite marker has more power than a SNP marker does in linkage detection; 3) if parental genotype information is used, SNP markers show lower type I error rates than microsatellite markers; and 4) if parental genotypes are not available, SNP markers show considerable variation in type I error rates for different methods.
