Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Statistical tests are needed to determine whether spatial structure has had a significant effect on the genetic differentiation of subpopulations. Here we introduce a new family of statistics based on a sum of an exponential function of the distances between individuals, which can be used with any genetic distance (e.g., nucleotide differences, number of nonshared alleles, or separation on a phylogenetic tree). The power of the tests to detect genetic differentiation in Wright-Fisher island models and stepping stone models was calculated for various sample sizes, rates of migration and mutation, and definitions of spatial neighborhoods. We found that our new test was in some cases more powerful than the Ks* statistic of Hudson et al. (Mol. Biol. Evol. 9, 138-151, 1992), but in all cases was slightly less powerful than both a traditional χ² test without lumping of rare haplotypes and the S(nn) test of Hudson (Genetics 155, 2011-2014, 2000). However, when we applied our new tests to three data sets, we found in some cases highly significant results that were missed by the other tests.

2.
Nixon J. Heredity, 2006, 96(4): 290-297
It is important that breeders have the means to assess genetic scoring data for segregation distortion because of its probable effect on the design of efficient breeding strategies. Scoring data are usually assessed for segregation distortion by separate nonindependent χ² tests at each locus in a set of marker loci. This analysis gives the loci most affected by selection if it exists, but it cannot give a statistically correct test for the presence or absence of selection in a linkage group as a whole. I have used a combined test, called the single locus test, whose statistic is the most significant P-value from the above tests. I have also derived mathematically a new combined statistical test, the overall test, for segregation distortion that requires genetic scoring data for a single linkage group. This test also takes genetic linkage into account. Using a range of marker densities and population sizes, simulations were carried out to compare the power of these two statistical tests to detect the effect of selection at one or two loci. The single locus test was always found to be more powerful than the overall test, but the single locus test required a more complicated P-value correction. For the single locus test, approximate correction factors for the P-values are given for a range of marker densities and genetic lengths.

3.
A test statistic that is valid for data collected according to a particular type of family study design is not necessarily valid when applied to data obtained from a different type of family study design. When this can occur, a different test that usually is valid is developed for each type of family study design. However, investigators might find that their data come from two (or more) different family study designs, each requiring a different test, yet they want an overall conclusion, essentially a valid hypothesis test that is as powerful as possible. When the underlying genetic model is unknown, it is not clear how to proceed, as several alternative approaches might appear feasible. By using as an example the development of a test of association for data concerning affected singletons and their parents and affected sib pairs and their parents, it is shown that it may not be possible to develop a universally optimal approach without knowledge of the underlying genetic model.

4.
Despite the growing consensus on the importance of testing gene-gene interactions in genetic studies of complex diseases, the effect of gene-gene interactions has often been defined as a deviance from genetic additive effects, which is essentially treated as a residual term in genetic analysis and leads to low power in detecting the presence of interacting effects. To what extent the definition of gene-gene interaction at population level reflects the genes' biochemical or physiological interaction remains a mystery. In this article, we introduce a novel definition and a new measure of gene-gene interaction between two unlinked loci (or genes). We developed a general theory for studying linkage disequilibrium (LD) patterns in disease population under two-locus disease models. The properties of using the LD measure in a disease population as a function of the measure of gene-gene interaction between two unlinked loci were also investigated. We examined how interaction between two loci creates LD in a disease population and showed that the mathematical formulation of the new definition for gene-gene interaction between two loci was similar to that of the LD between two loci. This finding motivated us to develop an LD-based statistic to detect gene-gene interaction between two unlinked loci. The null distribution and type I error rates of the LD-based statistic for testing gene-gene interaction were validated using extensive simulation studies. We found that the new test statistic was more powerful than the traditional logistic regression under three two-locus disease models and demonstrated that the power of the test statistic depends on the measure of gene-gene interaction. We also investigated the impact of using tagging SNPs for testing interaction on the power to detect interaction between two unlinked loci. Finally, to evaluate the performance of our new method, we applied the LD-based statistic to two published data sets. Our results showed that the P values of the LD-based statistic were smaller than those obtained by other approaches, including logistic regression models.

5.
Marker-trait association analysis is an important statistical tool for detecting DNA variants responsible for genetic traits. In such analyses, an analysis model of the mean genetic effects of the genotypes is often specified. For instance, the effect of the disease allele on the trait is often specified to be dominant, recessive, additive, or multiplicative. Although this model-based approach is powerful when the analysis model is correctly specified, it has been found to have low power sometimes when the specified model is incorrect. We introduce an approach that does not require the specification of a particular genetic model. This approach is built upon a constrained maximum likelihood in which the mean genetic effect of the heterozygous genotype is required to not exceed those of the two homozygous genotypes. The asymptotic distribution of the likelihood-ratio statistic is derived for two special cases. A simulation study suggests that this new approach has power comparable to that of the model-based method when the analysis model is correctly specified. This approach uses one marker at a time (i.e., it is a single-marker analysis). However, given the latest findings that powerful inferential procedures for haplotype analyses can be constructed from single-marker analyses, we expect this approach to be useful for haplotype analyses.

6.
Researchers conducting family-based association studies have a wide variety of transmission/disequilibrium (TD)-based methods to choose from, but few guidelines exist in the selection of a particular method to apply to available data. Using a simulation study design, we compared the power and type I error of eight popular TD-based methods under different family structures, frequencies of missing parental data, genetic models, and population stratifications. No method was uniformly most powerful under all conditions, but type I error was appropriate for nearly every test statistic under all conditions. Power varied widely across methods, with a 46.5% difference in power observed between the most powerful and the least powerful method when 50% of families consisted of an affected sib pair and one parent genotyped under an additive genetic model and a 35.2% difference when 50% of families consisted of a single affection-discordant sibling pair without parental genotypes available under an additive genetic model. Methods were generally robust to population stratification, although some slightly less so than others. The choice of a TD-based test statistic should be dependent on the predominant family structure ascertained, the frequency of missing parental genotypes, and the assumed genetic model.
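
For orientation, the classical transmission/disequilibrium test that TD-based methods build on can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say which eight methods were compared, the function name `tdt` and its arguments are hypothetical, and the p-value uses the asymptotic chi-square approximation with 1 degree of freedom.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical McNemar-type TDT statistic for one biallelic marker.

    transmitted   -- number of times heterozygous parents passed the
                     candidate allele to an affected child
    untransmitted -- number of times they passed the other allele
    Returns (chi-square statistic with 1 df, asymptotic p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        # No informative (heterozygous) parents: no evidence either way.
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(tdt(30, 10))
```

Only heterozygous parents contribute to the counts, which is why missing parental genotypes and family structure matter for power in designs of this kind.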

7.
In order to study family‐based association in the presence of linkage, we extend a generalized linear mixed model proposed for genetic linkage analysis (Lebrec and van Houwelingen (2007), Human Heredity 64, 5–15) by adding a genotypic effect to the mean. The corresponding score test is a weighted family‐based association test statistic, where the weight depends on the linkage effect and on other genetic and shared environmental effects. For testing of genetic association in the presence of gene–covariate interaction, we propose a linear regression method where the family‐specific score statistic is regressed on family‐specific covariates. Both statistics are straightforward to compute. Simulation results show that adjusting the weight for the within‐family variance structure may be a powerful approach in the presence of environmental effects. The test statistic for genetic association in the presence of gene–covariate interaction improved the power for detecting association. For illustration, we analyze the rheumatoid arthritis data from GAW15. Adjusting for smoking and anti‐cyclic citrullinated peptide increased the significance of the association with the DR locus.

8.
Recent developments such as the more powerful quasi-likelihood score (MQLS) test statistic have enabled efficient association analysis with related samples. Although those approaches are robust against a mis-specified phenotypic distribution and covariance structure, it has been shown that the MQLS statistic becomes invalid in the presence of population substructure if the level of population substructure depends on the genomic location. In this report, we propose a new statistical method which combines the EIGENSTRAT approach and the MQLS statistic. The proposed method was evaluated with simulated data under various scenarios, and we found that the proposed method performs better than traditional methods such as the transmission disequilibrium test. The proposed method was applied to genetic association analysis for body mass index in the Framingham Heart Study, and we found that rs1121980 and rs9940128 in the linkage block in the FTO gene are associated with body mass index.

9.
Anthony Almudevar. Biometrics, 2001, 57(4): 1080-1088
The problem of inferring kinship structure among a sample of individuals using genetic markers is considered with the objective of developing hypothesis tests for genetic relatedness with nearly optimal properties. The class of tests considered are those that are constrained to be permutation invariant, which in this context defines tests whose properties do not depend on the labeling of the individuals. This is appropriate when all individuals are to be treated identically from a statistical point of view. The approach taken is to derive tests that are probably most powerful for a permutation invariant alternative hypothesis that is, in some sense, close to a null hypothesis of mutual independence. This is analogous to the locally most powerful test commonly used in parametric inference. Although the resulting test statistic is a U-statistic, normal approximation theory is found to be inapplicable because of high skewness. As an alternative it is found that a conditional procedure based on the most powerful test statistic can calculate accurate significance levels without much loss in power. Examples are given in which this type of test proves to be more powerful than a number of alternatives considered in the literature, including Queller and Goodnight's (1989) estimate of genetic relatedness, the average number of shared alleles (Blouin, 1996), and the number of feasible sibling triples (Almudevar and Field, 1999).

10.

Background

The etiology of complex diseases is due to the combination of genetic and environmental factors, usually many of them, and each with a small effect. The identification of these small-effect contributing factors is still a demanding task. Clearly, there is a need for more powerful tests of genetic association, and especially for the identification of rare effects.

Results

We introduce a new genetic association test based on symbolic dynamics and symbolic entropy. Using freely available software, we have applied this entropy test, and a conventional test, to simulated and real datasets, to illustrate the method and estimate type I error and power. We have also compared this new entropy test to the Fisher exact test for assessment of association with low-frequency SNPs. The entropy test is generally more powerful than the conventional test, and can be significantly more powerful when the genotypic test is applied to low allele-frequency markers. We have also shown that both the Fisher and entropy methods are optimal to test for association with low-frequency SNPs (MAF around 1-5%), and both are conservative for very rare SNPs (MAF < 1%).

Conclusions

We have developed a new, simple, consistent and powerful test to detect genetic association of biallelic/SNP markers in case-control data, by using symbolic dynamics and symbolic entropy as a measure of gene dependence. We also provide a standard asymptotic distribution of this test statistic. Given that the test is based on entropy measures, it avoids smoothed nonparametric estimation. The entropy test is generally as good as or even more powerful than the conventional and Fisher tests. Furthermore, the entropy test is more computationally efficient than Fisher's exact test, especially for large numbers of markers. Therefore, this entropy-based test has the advantage of being optimal for most SNPs, regardless of their allele frequency (minor allele frequency (MAF) between 1% and 50%). This property is quite beneficial, since many researchers tend to discard low allele-frequency SNPs from their analysis. Now they can apply the same statistical test of association to all SNPs in a single analysis, which can be especially helpful to detect rare effects.

11.
We applied a new approach based on Mantel statistics to analyze the Genetic Analysis Workshop 14 simulated data with prior knowledge of the answers. The method was developed in order to improve the power of a haplotype sharing analysis for gene mapping in complex disease. The new statistic correlates genetic similarity and phenotypic similarity across pairs of haplotypes from case-control studies. The genetic similarity is measured as the shared length between haplotype pairs around a genetic marker. The phenotypic similarity is measured as the mean corrected cross-product based on the respective phenotypes. Cases with phenotype P1 and unrelated controls were drawn from the population of Danacaa. Power to detect main effects was compared to the χ²-test for association based on 3-marker haplotypes and a global permutation test for haplotype association to test for main effects. Power to detect gene × gene interaction was compared to unconditional logistic regression. The results suggest that the Mantel statistics might be more powerful than alternative tests.
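
The idea of correlating genetic and phenotypic similarity across pairs can be sketched with a generic Mantel-type cross-product statistic and a permutation p-value. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the genetic similarity matrix (for example, shared haplotype length around a marker) is assumed to be precomputed, and each sampled unit is assumed to carry one phenotype value.

```python
import random


def mantel_statistic(genetic_sim, phenotypes):
    """Sum over pairs i < j of genetic similarity times the
    mean-corrected cross-product of the two phenotypes."""
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    centered = [y - mean for y in phenotypes]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += genetic_sim[i][j] * centered[i] * centered[j]
    return total


def mantel_permutation_test(genetic_sim, phenotypes, n_perm=999, seed=0):
    """One-sided permutation p-value: phenotype labels are shuffled
    while the genetic similarity matrix is held fixed."""
    rng = random.Random(seed)
    observed = mantel_statistic(genetic_sim, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mantel_statistic(genetic_sim, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0]
    g = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)]
         for i in range(6)]
    print(mantel_permutation_test(g, y))
```

A large positive statistic indicates that pairs with similar phenotypes also tend to share more genetic material around the marker.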

12.
Zang Y, Zhang H, Yang Y, Zheng G. Human Heredity, 2007, 63(3-4): 187-195
The population-based case-control design is a powerful approach for detecting susceptibility markers of a complex disease. However, this approach may lead to spurious association when there is population substructure: population stratification (PS) or cryptic relatedness (CR). Two simple approaches to correct for the population substructure are genomic control (GC) and delta centralization (DC). GC uses the variance inflation factor to correct for the variance distortion of a test statistic, and the DC centralizes the non-central chi-square distribution of the test statistic. Both GC and DC have been studied for case-control association studies mainly under a specific genetic model (e.g. recessive, additive or dominant), under which an optimal trend test is available. The genetic model is usually unknown for many complex diseases. In this situation, we study the performance of three robust tests based on the GC and DC corrections in the presence of the population substructure. Our results show that, when the genetic model is unknown, the DC- (or GC-) corrected maximum and Pearson's association test are robust and have good control of Type I error and high power relative to the optimal trend tests in the presence of PS (or CR).
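
The GC correction described above can be sketched as follows. The abstract does not say how the variance inflation factor is estimated, so this sketch assumes the common median-based estimator computed from 1-df chi-square statistics at presumed null markers; the names `inflation_factor` and `gc_correct` are hypothetical. Delta centralization is not sketched here.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Median-based variance inflation factor (lambda), floored at 1
    so that the correction never makes a test more liberal."""
    return max(1.0, statistics.median(null_stats) / CHI2_1DF_MEDIAN)


def gc_correct(stat, lam):
    """Deflate a 1-df chi-square association statistic by lambda."""
    return stat / lam


if __name__ == "__main__":
    # Hypothetical statistics observed at markers assumed to be null.
    null_markers = [0.2, 0.7, 1.1, 0.9, 1.6, 0.4, 2.3]
    lam = inflation_factor(null_markers)
    print(lam, gc_correct(12.0, lam))
```

The corrected statistic is then referred to the usual chi-square distribution with 1 degree of freedom.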

13.
Combining information across genes in the statistical analysis of microarray data is desirable because of the relatively small number of data points obtained for each individual gene. Here we develop an estimator of the error variance that can borrow information across genes using the James-Stein shrinkage concept. A new test statistic (FS) is constructed using this estimator. The new statistic is compared with other statistics used to test for differential expression: the gene-specific F test (F1), the pooled-variance F statistic (F3), a hybrid statistic (F2) that uses the average of the individual and pooled variances, the regularized t-statistic, the posterior odds statistic B, and the SAM t-test. The FS-test shows best or nearly best power for detecting differentially expressed genes over a wide range of simulated data in which the variance components associated with individual genes are either homogeneous or heterogeneous. Thus FS provides a powerful and robust approach to test differential expression of genes that utilizes information not available in individual gene testing approaches and does not suffer from biases of the pooled variance approach.

14.
Lepage's test combines the Wilcoxon rank-sum and the Ansari-Bradley statistics. We propose to replace the latter statistic by a Wilcoxon rank-sum calculated after Levene's transformation. We use the medians for this transformation, i.e. absolute deviations from sample medians are calculated. The new location-scale test can be carried out as a permutation test based on permutations of the original observations; the Levene transformation has to be applied for each permutation in an intermediate step to calculate the test statistic. Simulations indicate that the new test can be more powerful than an O'Brien-type test and Lepage's test, the latter being the standard nonparametric location-scale test. The new test is illustrated using real data about colony sizes of yellow-eyed penguins, and an SAS program to perform the test is freely available.
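
The procedure described above can be sketched in Python rather than SAS. This is a hedged illustration, not the authors' program: the function names are hypothetical, and the two rank-sum components are assumed to be combined Lepage-style as a sum of squared standardized statistics, using the tie-corrected normal-theory mean and variance of the rank sum.

```python
import random
import statistics
from collections import Counter


def midranks(values):
    """Ranks 1..N with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def squared_rank_sum_z(x, y):
    """Squared standardized Wilcoxon rank sum of x within x + y."""
    m, n = len(x), len(y)
    total = m + n
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:m])
    mean = m * (total + 1) / 2.0
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    var = m * n / 12.0 * ((total + 1) - ties / (total * (total - 1)))
    return 0.0 if var == 0 else (w - mean) ** 2 / var


def levene(sample):
    """Absolute deviations from the sample median."""
    med = statistics.median(sample)
    return [abs(v - med) for v in sample]


def location_scale_statistic(x, y):
    """Location part on raw data plus scale part on Levene-transformed data."""
    return squared_rank_sum_z(x, y) + squared_rank_sum_z(levene(x), levene(y))


def permutation_test(x, y, n_perm=999, seed=0):
    """Permute the original observations; the Levene transformation is
    re-applied inside the statistic for every permutation."""
    rng = random.Random(seed)
    observed = location_scale_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = location_scale_statistic(pooled[:len(x)], pooled[len(x):])
        if stat >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Recomputing the medians inside each permutation matters because the transformed values depend on group membership, so permuting already-transformed data would not give the intended reference distribution.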

15.
Genetic interaction analysis, in which two mutations have a combined effect not exhibited by either mutation alone, is a powerful and widespread tool for establishing functional linkages between genes. In the yeast Saccharomyces cerevisiae, ongoing screens have generated >4,800 such genetic interaction data. We demonstrate that by combining these data with information on protein-protein, protein-DNA or metabolic networks, it is possible to uncover physical mechanisms behind many of the observed genetic effects. Using a probabilistic model, we found that 1,922 genetic interactions are significantly associated with either between- or within-pathway explanations encoded in the physical networks, covering approximately 40% of known genetic interactions. These models predict new functions for 343 proteins and suggest that between-pathway explanations are better than within-pathway explanations at interpreting genetic interactions identified in systematic screens. This study provides a road map for how genetic and physical interactions can be integrated to reveal pathway organization and function.

16.
Di CZ, Liang KY. Biometrics, 2011, 67(4): 1249-1259
Summary: We consider likelihood ratio tests (LRT) and their modifications for homogeneity in admixture models. The admixture model is a two‐component mixture model, where one component is indexed by an unknown parameter while the parameter value for the other component is known. This model is widely used in genetic linkage analysis under heterogeneity in which the kernel distribution is binomial. For such models, it has long been recognized that testing for homogeneity is nonstandard, and the LRT statistic does not converge to a conventional χ² distribution. In this article, we investigate the asymptotic behavior of the LRT for general admixture models and show that its limiting distribution is equivalent to the supremum of a squared Gaussian process. We also discuss the connection and comparison between LRT and alternative approaches such as modifications of LRT and score tests, including the modified LRT (Fu, Chen, and Kalbfleisch, 2006, Statistica Sinica 16, 805–823). The LRT is an omnibus test that is powerful to detect general alternative hypotheses. In contrast, alternative approaches may be slightly more powerful to detect certain types of alternatives, but much less powerful for others. Our results are illustrated by simulation studies and an application to a genetic linkage study of schizophrenia.

17.
OBJECTIVES: The association of a candidate gene with disease can be evaluated by a case-control study in which the genotype distribution is compared for diseased cases and unaffected controls. Usually, the data are analyzed with Armitage's test using the asymptotic null distribution of the test statistic. Since this test does not generally guarantee a type I error rate less than or equal to the significance level alpha, tests based on exact null distributions have been investigated. METHODS: An algorithm to generate the exact null distribution for both Armitage's test statistic and a recently proposed modification of the Baumgartner-Weiss-Schindler statistic is presented. I have compared the tests in a simulation study. RESULTS: The asymptotic Armitage test is slightly anticonservative whereas the exact tests control the type I error rate. The exact Armitage test is very conservative, but the exact test based on the modification of the Baumgartner-Weiss-Schindler statistic has a type I error rate close to alpha. The exact Armitage test is the least powerful test; the difference in power between the other two tests is often small and the comparison does not show a clear winner. CONCLUSION: Simulation results indicate that an exact test based on the modification of the Baumgartner-Weiss-Schindler statistic is preferable for the analysis of case-control studies of genetic markers.
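
As a point of reference, the asymptotic Armitage trend test mentioned above can be sketched for a 2 × 3 genotype table. This is a minimal sketch under stated assumptions: the additive scores (0, 1, 2) and the function name are illustrative choices, and neither the exact null distribution nor the modified Baumgartner-Weiss-Schindler statistic from the abstract is implemented here.

```python
import math


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Asymptotic Cochran-Armitage trend test.

    cases, controls -- genotype counts (e.g. aa, Aa, AA) per group
    weights         -- genotype scores; (0, 1, 2) assumes an additive model
    Returns (chi-square statistic with 1 df, asymptotic p-value).
    """
    col = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    k = len(weights)

    # Weighted difference between case and control genotype counts.
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, cases, controls))

    # Variance of t under the null hypothesis of no association.
    var = sum(w * w * c * (n - c) for w, c in zip(weights, col))
    var -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                   for i in range(k) for j in range(i + 1, k))
    var *= n_cases * n_controls / n

    if var == 0:
        return 0.0, 1.0
    chi2 = t * t / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(armitage_trend_test((10, 25, 15), (20, 22, 8)))
```

Because the p-value comes from a chi-square approximation, the actual type I error rate can exceed the nominal level in small samples, which is the motivation for exact versions.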

18.
Detecting gene-gene interaction in complex diseases has become an important priority for common disease genetics, but most current approaches to detecting interaction start with disease-marker associations. These approaches are based on population allele frequency correlations, not genetic inheritance, and therefore cannot exploit the rich information about inheritance contained within families. They are also hampered by issues of rigorous phenotype definition, multiple test correction, and allelic and locus heterogeneity. We recently developed, tested, and published a powerful gene-gene interaction detection strategy based on conditioning family data on a known disease-causing allele or a disease-associated marker allele. We successfully applied the method to disease data and used computer simulation to exhaustively test the method for some epistatic models. We knew that the statistic we developed to indicate interaction was less reliable when applied to more-complex interaction models. Here, we improve the statistic and expand the testing procedure. We computer-simulated multipoint linkage data for a disease caused by two interacting loci. We examined epistatic as well as additive models and compared them with heterogeneity models. In all our models, the at-risk genotypes are “major” in the sense that among affected individuals, a substantial proportion has a disease-related genotype. One of the loci (A) has a known disease-related allele (as would have been determined from a previous analysis). We removed (pruned) family members who did not carry this allele; the resultant dataset is referred to as “stratified.” This elimination step has the effect of raising the “penetrance” and detectability at the second locus (B). We used the lod scores for the stratified and unstratified data sets to calculate a statistic that either indicated the presence of interaction or indicated that no interaction was detectable. We show that the new method is robust and reliable for a wide range of parameters. Our statistic performs well both with the epistatic models (false negative rates, i.e., failing to detect interaction, ranging from 0 to 2.5%) and with the heterogeneity models (false positive rates, i.e., falsely detecting interaction, ≤1%). It works well with the additive model except when allele frequencies at the two loci differ widely. We explore those features of the additive model that make detecting interaction more difficult. All testing of this method suggests that it provides a reliable approach to detecting gene-gene interaction.

19.
An entropy-based statistic TPE has been proposed for genomic association studies of disease-susceptibility loci. The statistic TPE may be directly adopted and/or extended to quantitative-trait locus (QTL) mapping for quantitative traits. In this article, the statistic TPE was extended and applied to quantitative traits for association analysis of QTL by means of selective genotyping. The statistical properties (the type I error rate and the power) were examined under a range of parameters and population-sampling strategies (e.g., various genetic models, various heritabilities, and various sample-selection threshold values) by simulation studies. The results indicated that the statistic TPE is robust and powerful for genomic association studies of QTL. A simulation study based on the haplotype frequencies of 10 single nucleotide polymorphisms (SNPs) of angiotensin-I converting enzyme genes was conducted to evaluate the performance of the statistic TPE for genetic association studies.

20.
The neutral theory of molecular evolution predicts that the ratio of polymorphisms to fixed differences should be fairly uniform across a region of DNA sequence. Significant heterogeneity in this ratio can indicate the effects of balancing selection, selective sweeps, mildly deleterious mutations, or background selection. Comparing an observed heterogeneity statistic with simulations of the heterogeneity resulting from random phylogenetic and sampling variation provides a test of the statistical significance of the observed pattern. When simulated data sets containing heterogeneity in the polymorphism-to-divergence ratio are examined, different statistics are most powerful for detecting different patterns of heterogeneity. The number of runs is most powerful for detecting patterns containing several peaks of polymorphism; the Kolmogorov-Smirnov statistic is most powerful for detecting patterns in which one end of the gene has high polymorphism and the other end has low polymorphism; and a newly developed statistic, the mean sliding G statistic, is most powerful for detecting patterns containing one or two peaks of polymorphism with reduced polymorphism on either side. Nine out of 27 genes from the Drosophila melanogaster subgroup exhibit heterogeneity that is significant under at least one of these three tests, with five of the nine remaining significant after a correction for multiple comparisons, suggesting that detectable evidence for the effects of some kind of selection is fairly common.
