20 related references retrieved
1.
Statistical tests for detecting positive selection by utilizing high-frequency variants
By comparing the low-, intermediate-, and high-frequency parts of the frequency spectrum, we gain information on the evolutionary forces that influence the pattern of polymorphism in population samples. We emphasize high-frequency variants, on which positive selection and negative (background) selection exhibit different effects. We propose a new estimator of θ (the product of effective population size and neutral mutation rate), θL, which is sensitive to changes in high-frequency variants. The new θL allows us to revise Fay and Wu's H-test by normalization. To complement the existing statistics (the H-test and Tajima's D-test), we propose a new test, E, which relies on the difference between θL and Watterson's θW. We show that this test is most powerful in detecting the recovery phase after a loss of genetic diversity, which includes the post-selective-sweep phase. The sensitivities of these tests to (or robustness against) background selection and demographic changes are also considered. Overall, D and H in combination can be most effective in detecting positive selection while being insensitive to other perturbations. We thus propose a joint test, referred to as the DH test. Simulations indicate that DH is sensitive primarily to directional selection and not to other driving forces.
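As an illustration of the quantities named above, here is a minimal sketch (not the authors' code) of the point estimators θW, θπ, and θL computed from an unfolded site frequency spectrum; the spectrum values below are invented, and the variance normalization used by the published H and E tests is omitted.

```python
def theta_estimators(xi, n):
    """Return (theta_W, theta_pi, theta_L) from an unfolded SFS.

    xi[i-1] = number of derived variants observed in i of n sampled sequences.
    """
    assert len(xi) == n - 1
    S = sum(xi)                                   # number of segregating sites
    a_n = sum(1.0 / i for i in range(1, n))       # harmonic number
    theta_w = S / a_n                             # Watterson's estimator
    pairs = n * (n - 1) / 2.0
    theta_pi = sum(i * (n - i) * xi[i - 1] for i in range(1, n)) / pairs
    theta_l = sum(i * xi[i - 1] for i in range(1, n)) / (n - 1)   # weights high-frequency variants
    return theta_w, theta_pi, theta_l

# Unnormalized versions of the two contrasts discussed in the abstract:
#   H ~ theta_pi - theta_L  (excess of high-frequency derived variants)
#   E ~ theta_L  - theta_W  (post-sweep recovery phase)
tw, tp, tl = theta_estimators([6, 3, 2, 1, 0, 0, 0, 0, 2], n=10)
print("H (unnormalized):", tp - tl, " E (unnormalized):", tl - tw)
```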
2.
3.
Statistical methods for detecting molecular adaptation
The past few years have seen the development of powerful statistical methods for detecting adaptive molecular evolution. These methods compare synonymous and nonsynonymous substitution rates in protein-coding genes, and regard a nonsynonymous rate elevated above the synonymous rate as evidence for Darwinian selection. Numerous cases of molecular adaptation are being identified in various systems from viruses to humans. Although previous analyses averaging rates over sites and time have little power, recent methods designed to detect positive selection at individual sites and lineages have been successful. Here, we summarize recent statistical methods for detecting molecular adaptation, and discuss their limitations and possible improvements.
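To make the basic quantity concrete, the sketch below computes a pairwise ω = dN/dS with a Jukes-Cantor multiple-hit correction, assuming the numbers of synonymous/nonsynonymous sites and differences have already been counted (e.g. by a Nei-Gojobori-type procedure). It is a back-of-envelope illustration with invented counts, not the codon-based likelihood site and branch methods the review covers.

```python
import math

def jukes_cantor(p):
    """Correct an observed proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def omega(n_sites, n_diffs, s_sites, s_diffs):
    dN = jukes_cantor(n_diffs / n_sites)   # nonsynonymous substitutions per nonsynonymous site
    dS = jukes_cantor(s_diffs / s_sites)   # synonymous substitutions per synonymous site
    return dN / dS                         # omega > 1 suggests positive selection

print(omega(n_sites=700.0, n_diffs=35, s_sites=300.0, s_diffs=10))
```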
4.
Statistical tests for multivariate bioequivalence
5.
This paper presents a statistical method for testing whether a male mouse is a recessive lethal carrier. The analysis is based on a back-cross experiment in which the male mouse is mated with some of his daughters. The numbers of total implantations and intrauterine deaths in each litter are recorded. It is assumed that, conditional on the number of total implantations, the number of intrauterine deaths follows a binomial distribution. Using computer-simulated experiments, it is shown that the proposed statistical method, which is sensitive to the pattern of intrauterine death rates, is more powerful than a test based only on the total number of implant deaths. The proposed test requires relatively simple calculations and can be used for a wide range of values of total implantations and background implant mortality rates. For computer-simulated experiments, there was no practical difference between the empirical error rate and the nominal error rate.
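The sketch below is not the paper's test, only a rough illustration of why the pattern of litter-level death rates is informative: it compares a single-rate binomial model against a two-component binomial mixture (background litters vs. high-mortality litters) by a likelihood ratio, using invented litter data and a crude grid search.

```python
import math
from itertools import product

# Invented data: (implantations, intrauterine deaths) per litter.
litters = [(9, 0), (8, 1), (10, 3), (7, 0), (9, 4), (8, 0), (10, 1), (9, 3)]

def loglik_single(p):
    # All litters share one death rate p.
    return sum(math.log(math.comb(m, d)) + d * math.log(p) + (m - d) * math.log(1 - p)
               for m, d in litters)

def loglik_mixture(w, p0, p1):
    # A fraction w of litters has an elevated rate p1, the rest the background rate p0.
    total = 0.0
    for m, d in litters:
        c = math.comb(m, d)
        l0 = c * p0 ** d * (1 - p0) ** (m - d)
        l1 = c * p1 ** d * (1 - p1) ** (m - d)
        total += math.log((1 - w) * l0 + w * l1)
    return total

grid = [i / 100 for i in range(1, 100)]
ll0 = max(loglik_single(p) for p in grid)
ll1 = max(loglik_mixture(w, p0, p1)
          for w, p0, p1 in product(grid[::4], grid[::4], grid[::4]) if p0 < p1)
print("likelihood-ratio statistic:", 2 * (ll1 - ll0))  # calibrate by simulation
```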
6.
Risch and Teng [Genome Res 1998;8:1273-1288] and Teng and Risch [Genome Res 1999;9:234-241] proposed a class of transmission/disequilibrium test-like statistical tests based on the difference between the estimated allele frequencies in the affected and control populations. They evaluated the power of a variety of family-based and nonfamily-based designs for detecting an association between a candidate allele and disease. Because they were concerned with diseases with low penetrances, their power calculations assumed that unaffected individuals can be treated as a random sample from the population. They predicted that this assumption rendered their sample size calculations slightly conservative. We generalize their partial ascertainment conditioning by including the status of the unaffected sibs in the calculations of the distribution and power of the statistic used to compare the allele frequency in affected offspring to the estimated frequency in the parents, based on sibships with genotyped affected and unaffected sibs. Sample size formulas for our full ascertainment methods are presented. The sample sizes for our procedure are compared to those of Teng and Risch. The numerical results and simulations indicate that the simplifying assumption used in Teng and Risch can produce both conservative and anticonservative results. The magnitude of the difference between the sample sizes needed by their partial ascertainment approximation and the full ascertainment is small in the circumstances they focused on but can be appreciable in others, especially when the baseline penetrances are moderate. Two other statistics, using different estimators for the variance of the basic statistic comparing the allele frequencies in the affected and unaffected sibs, are introduced. One of them incorporates an estimate of the null variance obtained from an auxiliary sample and appears to noticeably decrease the sample sizes required to achieve a prespecified power.
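For orientation, the sketch below applies the generic large-sample formula for detecting a difference between two proportions (here, allele frequencies); it is not the partial- or full-ascertainment sample-size formula derived in the paper, and the example frequencies are invented.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Alleles needed per group to detect a difference p1 vs p2 (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)

# Counts are in alleles; each genotyped individual contributes two alleles.
print(n_per_group(0.30, 0.22))
```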
7.
There has been an increasing interest in detecting gene-gene and gene-environment interactions in genetic association studies. A major statistical challenge is how to deal with a large number of parameters measuring possible interaction effects, which leads to reduced power of any statistical test due to a large number of degrees of freedom or the high cost of adjusting for multiple testing. Hence, a popular idea is to first apply some dimension reduction technique before testing, while another is to apply only statistical tests that are developed for and robust to high-dimensional data. To combine both ideas, we propose applying an adaptive sum of squared score (SSU) test and several other adaptive tests. These adaptive tests are extensions of the adaptive Neyman test [Fan, 1996], which was originally proposed for high-dimensional data, providing a simple and effective way for dimension reduction. On the other hand, the original SSU test coincides with a version of a test specifically developed for high-dimensional data. We apply these adaptive tests and their original nonadaptive versions to simulated data to detect interactions between two groups of SNPs (e.g., multiple SNPs in two candidate regions). We found that for sparse models (i.e., with only a few non-zero interaction parameters), the adaptive SSU test and its close variant, an adaptive version of the weighted sum of squared score (SSUw) test, improved the power over their non-adaptive versions, and performed consistently well across various scenarios. The proposed adaptive tests are built in the general framework of regression analysis, and can thus be applied to various types of traits in the presence of covariates.
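A minimal sketch of the non-adaptive SSU statistic with a permutation p-value is given below; for brevity the score vector is taken under an intercept-only null on simulated genotypes, whereas the framework described above also adjusts for main effects and covariates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
snps_a = rng.integers(0, 3, size=(n, 4))     # genotypes coded 0/1/2, group A
snps_b = rng.integers(0, 3, size=(n, 3))     # group B
# All pairwise products as the interaction terms between the two SNP groups.
Z = np.einsum('ij,ik->ijk', snps_a, snps_b).reshape(n, -1)
y = rng.integers(0, 2, size=n).astype(float)  # binary trait (no real signal here)

def ssu(y, Z):
    u = Z.T @ (y - y.mean())      # score-type vector under the intercept-only null
    return float(u @ u)           # SSU = U'U

obs = ssu(y, Z)
perms = np.array([ssu(rng.permutation(y), Z) for _ in range(999)])
print("permutation p-value:", (1 + np.sum(perms >= obs)) / (1 + len(perms)))
```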
8.
Suzuki Y. Genes & Genetic Systems 2010, 85(6): 359-376
In the study of molecular and phenotypic evolution, understanding the relative importance of random genetic drift and positive selection as the mechanisms driving divergences between populations and maintaining polymorphisms within populations has been a central issue. A variety of statistical methods has been developed for detecting natural selection operating at the amino acid and nucleotide sequence levels. These methods may be broadly classified into those aimed at detecting recurrent and/or recent/ongoing natural selection, utilizing divergence and/or polymorphism data. Using these methods, pervasive positive selection has been identified for protein-coding and non-coding sequences in genomic analyses of some organisms. However, computer simulations and real data analyses have shown that many of these methods produce excessive false positives and are sensitive to various disturbing factors. Importantly, some of these methods have been invalidated experimentally. These facts indicate that many of the statistical methods for detecting natural selection are unreliable. In addition, the signals that have been taken as evidence for fixation of advantageous mutations due to positive selection may also be interpreted as evidence for fixation of deleterious mutations due to random genetic drift. Genomic diversity data are rapidly accumulating in various organisms, and detection of natural selection may play a critical role in clarifying the relative roles of random genetic drift and positive selection in molecular and phenotypic evolution. It is therefore important to develop reliable statistical methods for inferring natural selection that are unbiased as well as robust against various disturbing factors.
9.
We have determined the marker separations (genetic distances) that maximize the probability, or power, of detecting meiotic recombination deficiency when only a limited number of meiotic progeny can be assayed. We find that the optimal marker separation is as large as 30-100 cM in many cases. Provided the appropriate marker separation is used, small reductions in recombination potential (as little as 50%) can be detected by assaying a single interval in as few as 100 progeny. If recombination is uniformly altered across the genomic region of interest, the same sensitivity can be obtained by assaying multiple independent intervals in correspondingly fewer progeny. A reduction or abolition of crossover interference, with or without a reduction of recombination proficiency, can be detected with similar sensitivity. We present a set of graphs that display the optimal marker separation and the number of meiotic progeny that must be assayed to detect a given recombination deficiency in the presence of various levels of crossover interference. These results will aid the optimal design of experiments to detect meiotic recombination deficiency in any organism.
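The sketch below illustrates this kind of power calculation under simplifying assumptions of my own (it does not reproduce the published graphs): Haldane's no-interference map function converts a marker separation in cM into a recombination fraction, a reduction in recombination potential is modelled as a shortened map length, and a one-sided exact binomial test looks for a deficit of recombinant progeny. Crossover interference, which the paper also treats, is ignored.

```python
from math import comb, exp

def recomb_fraction(d_cM):
    return 0.5 * (1.0 - exp(-2.0 * d_cM / 100.0))      # Haldane mapping function

def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def power(d_cM, n_progeny, reduction=0.5, alpha=0.05):
    r_wt = recomb_fraction(d_cM)                        # wild-type recombination fraction
    r_mut = recomb_fraction(reduction * d_cM)           # reduced recombination potential
    # Largest recombinant count still rejected at level alpha under the wild type.
    k_crit = max((k for k in range(n_progeny + 1)
                  if binom_cdf(k, n_progeny, r_wt) <= alpha), default=-1)
    return 0.0 if k_crit < 0 else binom_cdf(k_crit, n_progeny, r_mut)

for d in (10, 30, 50, 100):
    print(f"{d} cM: power = {power(d, n_progeny=100):.2f}")
```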
10.
The w statistic introduced by Lockhart et al. (1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol Biol Evol. 15:1183-1188) is a simple and easily calculated statistic intended to detect heterotachy by comparing amino acid substitution patterns between two monophyletic groups of protein sequences. It is defined as the difference between the fraction of varied sites in both groups and the fraction of varied sites in each group. The w test has been used to distinguish a covarion process from equal-rates and rates-variation-across-sites processes. Using simulation we show that the w test is effective for small data sets and for data sets that have low substitution rates in the groups but can have difficulties when these conditions are not met. Using site entropy as a measure of variability of a sequence site, we modify the w statistic to a w' statistic by assigning as varied in one group those sites that are actually varied in both groups but have a large entropy difference. We show that the w' test has more power to detect two kinds of heterotachy processes (covarion and bivariate rate shifts) in large and variable data. We also show that a test of Pearson's correlation of the site entropies between two monophyletic groups can be used to detect heterotachy and has more power than the w' test. Furthermore, we demonstrate that there are settings where the correlation test as well as the w and w' tests do not detect heterotachy signals in data simulated under a branch length mixture model. In such cases, it is sometimes possible to detect heterotachy through subselection of appropriate taxa. Finally, we discuss the abilities of the three statistical tests to detect a fourth mode of heterotachy: lineage-specific changes in proportion of variable sites.
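The entropy-correlation idea mentioned at the end of the abstract can be sketched as follows on toy alignments: compute the Shannon entropy of each alignment column within each monophyletic group and correlate the two entropy profiles. A real analysis would assess the significance of r by permutation or parametric bootstrap rather than by inspection.

```python
import math
import numpy as np

def site_entropies(alignment):
    """alignment: list of equal-length sequences belonging to one group."""
    n_sites = len(alignment[0])
    out = []
    for j in range(n_sites):
        column = [seq[j] for seq in alignment]
        freqs = [column.count(a) / len(column) for a in set(column)]
        out.append(-sum(f * math.log(f) for f in freqs))   # Shannon entropy of the column
    return np.array(out)

group1 = ["ACGTACGT", "ACGTACCT", "ACGAACGT"]   # toy monophyletic group 1
group2 = ["ACGTTCGT", "ACGTTCGT", "TCGTTCGA"]   # toy monophyletic group 2
h1, h2 = site_entropies(group1), site_entropies(group2)
print("correlation of site entropies:", np.corrcoef(h1, h2)[0, 1])
```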
11.
12.
Mourier T, Ho SY, Gilbert MT, Willerslev E, Orlando L. Molecular Biology and Evolution 2012, 29(9): 2241-2251
Populations carry a genetic signal of their demographic past, providing an opportunity for investigating the processes that shaped their evolution. Our ability to infer population histories can be enhanced by including ancient DNA data. Using serial-coalescent simulations and a range of both quantitative and temporal sampling schemes, we test the power of ancient mitochondrial sequences and nuclear single-nucleotide polymorphisms (SNPs) to detect past population bottlenecks. Within our simulated framework, mitochondrial sequences have only limited power to detect subtle bottlenecks and/or fast post-bottleneck recoveries. In contrast, nuclear SNPs can detect bottlenecks followed by rapid recovery, although bottlenecks involving reduction of less than half the population are generally detected with low power unless extensive genetic information from ancient individuals is available. Our results provide useful guidelines for scaling sampling schemes and for optimizing our ability to infer past population dynamics. In addition, our results suggest that many ancient DNA studies may face power issues in detecting moderate demographic collapses and/or highly dynamic demographic shifts when based solely on mitochondrial information.
13.
Statistical methods of DNA sequence analysis: detection of intragenic recombination or gene conversion
Simple but exact statistical tests for detecting a cluster of associated nucleotide changes in DNA are presented. The tests are based on the linear distribution of a set of s sites among a total of n sites, where the s sites may be the variable sites, sites of insertion/deletion, or categorized in some other way. These tests are especially useful for detecting gene conversion and intragenic recombination in a sample of DNA sequences. In this case, the sites of interest are those that correspond to particular ways of splitting the sequences into two groups (e.g., sequences A and D vs. sequences B, C, and E-J). Each such split is termed a phylogenetic partition. Application of these methods to a well-documented case of gene conversion in human gamma-globin genes shows that sites corresponding to two of the three observed partitions are significantly clustered, whereas application to hominoid mitochondrial DNA sequences, among which no recombination is expected to occur, shows no evidence of such clustering. This indicates that clustering of partition-specific sites is largely due to intragenic recombination or gene conversion. Alternative hypotheses explaining the observed clustering of sites, such as biased selection or mutation, are discussed.
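A Monte Carlo analogue of the clustering idea (not the exact test derived in the paper) can be sketched as follows, using the span of the partition-specific sites as the clustering statistic and invented site positions.

```python
import random

def cluster_p_value(special_positions, all_positions, n_rep=10000, seed=1):
    """P(span of s random sites <= observed span of the partition-specific sites)."""
    rng = random.Random(seed)
    s = len(special_positions)
    obs_span = max(special_positions) - min(special_positions)
    hits = 0
    for _ in range(n_rep):
        draw = rng.sample(all_positions, s)          # s sites placed at random
        if max(draw) - min(draw) <= obs_span:
            hits += 1
    return (hits + 1) / (n_rep + 1)

# Toy example: 5 partition-specific sites packed into one end of 40 variable sites.
all_sites = list(range(40))
special = [2, 4, 5, 7, 9]
print(cluster_p_value(special, all_sites))
```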
14.
Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.
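A minimal two-condition example of the per-gene t test is sketched below on an invented expression matrix; multiple-testing correction and the mixed ANOVA model for multi-factor designs are not shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
expr_a = rng.normal(loc=8.0, scale=0.5, size=(100, 4))   # 100 genes, 4 replicate arrays
expr_b = rng.normal(loc=8.0, scale=0.5, size=(100, 4))
expr_b[:5] += 1.0                                         # 5 truly differential genes

t, p = stats.ttest_ind(expr_a, expr_b, axis=1)            # one two-sample t test per gene
print("genes with p < 0.01:", np.where(p < 0.01)[0])
```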
15.
Wang K. Biostatistics (Oxford, England) 2012, 13(4): 724-733
The central theme in case-control genetic association studies is to efficiently identify genetic markers associated with trait status. Powerful statistical methods are critical to accomplishing this goal. A popular method is the omnibus Pearson's chi-square test applied to genotype counts. To achieve increased power, tests based on an assumed trait model have been proposed. However, they are not robust to model misspecification. Much research has been carried out on enhancing the robustness of such model-based tests. An analysis framework that tests the equality of allele frequency while allowing for different deviations from Hardy-Weinberg equilibrium (HWE) between cases and controls is proposed. The proposed method requires specification of neither a trait model nor HWE. It involves only 1 degree of freedom. The likelihood ratio statistic, score statistic, and Wald statistic associated with this framework are introduced. Their performance is evaluated by extensive computer simulation in comparison with existing methods.
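One simple way to picture a 1-df allele-frequency comparison that does not assume HWE is the Wald-type sketch below, which inflates the variance of the estimated allele frequency by an estimated inbreeding coefficient (Var(p̂) ≈ p(1-p)(1+F)/(2n)). This only illustrates the idea; the likelihood-ratio, score, and Wald statistics introduced in the paper are derived within its own framework and need not coincide with this version. Genotype counts are invented.

```python
def allele_stats(n_aa, n_ab, n_bb):
    """Allele frequency and its variance from genotype counts, allowing HWE deviation."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    het_exp = 2 * p * (1 - p)
    f = 1 - (n_ab / n) / het_exp if het_exp > 0 else 0.0   # estimated inbreeding coefficient
    var = p * (1 - p) * (1 + f) / (2 * n)
    return p, var

def wald_chi2(case_counts, control_counts):
    p1, v1 = allele_stats(*case_counts)
    p2, v2 = allele_stats(*control_counts)
    return (p1 - p2) ** 2 / (v1 + v2)          # ~ chi-square with 1 df under the null

print(wald_chi2(case_counts=(60, 90, 50), control_counts=(40, 100, 60)))
```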
16.
Peery MZ, Kirby R, Reid BN, Stoelting R, Doucet-Bëer E, Robinson S, Vásquez-Carrillo C, Pauli JN, Palsbøll PJ. Molecular Ecology 2012, 21(14): 3403-3418
The identification of population bottlenecks is critical in conservation because populations that have experienced significant reductions in abundance are subject to a variety of genetic and demographic processes that can hasten extinction. Genetic bottleneck tests constitute an appealing and popular approach for determining if a population decline has occurred because they only require sampling at a single point in time, yet reflect demographic history over multiple generations. However, a review of the published literature indicates that, as typically applied, microsatellite-based bottleneck tests often do not detect bottlenecks in vertebrate populations known to have experienced declines. This observation was supported by simulations revealing that bottleneck tests can have limited statistical power, largely as a result of the limited sample sizes typically used in published studies. Moreover, commonly assumed values for mutation model parameters do not appear to encompass variation in microsatellite evolution observed in vertebrates and, on average, the proportion of multi-step mutations is underestimated by a factor of approximately two. As a result, bottleneck tests can have a higher probability of 'detecting' bottlenecks in stable populations than expected based on the nominal significance level. We provide recommendations that could add rigor to inferences drawn from future bottleneck tests and highlight new directions for the characterization of demographic history.
17.
The numerical data collected daily for the longest series of inorganic chemical tests, carried out in Florence (Piccardi and co-workers, 1951–1972) and in Brussels (Capel-Boute, 1956–1978), have been subjected to statistical analysis in order to address the questions that originally motivated the collection of long-term series of data with the Piccardi chemical tests in different places. The aim was to study how the various effects observed on aqueous systems in a chemical precipitation reaction vary over time, even under the most rigorously standardized conditions. Since significant long-term perturbations and an annual variation are present in all data sets, the observations cannot be regarded as purely random fluctuations. No common long-term pattern is observed, and the measurements are not unambiguously correlated with climatological effects or the solar cycle. The statistical information content of the chemical tests is time-dependent, which implies non-stationarity of the observations. These results suggest the need to search for disturbing geophysical and cosmological factors in order to understand the mechanisms of the interaction.
18.
We propose using a variant of logistic regression (LR) with L2 regularization to fit gene–gene and gene–environment interaction models. Studies have shown that many common diseases are influenced by interactions of certain genes. LR models with quadratic penalization not only correctly characterize the influential genes along with their interaction structures but also yield additional benefits in handling high-dimensional, discrete factors with a binary response. We illustrate the advantages of using an L2-regularization scheme and compare its performance with that of "multifactor dimensionality reduction" and "FlexTree," two recent tools for identifying gene–gene interactions. Through simulated and real data sets, we demonstrate that our method outperforms other methods in the identification of the interaction structures as well as in prediction accuracy. In addition, we validate the significance of the factors selected through bootstrap analyses.
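A generic illustration of quadratically (L2) penalized logistic regression on main effects plus all pairwise interaction terms is sketched below, using scikit-learn rather than the authors' fitting procedure; the data are simulated and the penalty strength C is arbitrary rather than tuned.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
n, n_snps = 400, 8
X = rng.integers(0, 3, size=(n, n_snps)).astype(float)    # genotypes coded 0/1/2
logit = -1.0 + 0.8 * X[:, 0] * X[:, 1]                     # one true gene-gene interaction
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Main effects plus all pairwise interaction products.
design = PolynomialFeatures(degree=2, interaction_only=True,
                            include_bias=False).fit_transform(X)
model = LogisticRegression(penalty="l2", C=0.5, max_iter=2000).fit(design, y)
coefs = model.coef_.ravel()
print("largest |coefficient| is term", int(np.argmax(np.abs(coefs))))
```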
19.
Admixture mapping is a promising new tool for discovering genes that contribute to complex traits. This mapping approach uses samples from recently admixed populations to detect susceptibility loci at which the risk alleles have different frequencies in the original contributing populations. Although the idea for admixture mapping has been around for more than a decade, the genomic tools are only now becoming available to make this a feasible and attractive option for complex-trait mapping. In this article, we describe new statistical methods for analyzing multipoint data from admixture-mapping studies to detect ancestry association. The new test statistics do not assume a particular disease model; instead, they are based simply on the extent to which the sample's ancestry proportions at a locus deviate from the genome average. Our power calculations show that, for loci at which the underlying risk-allele frequencies are substantially different in the ancestral populations, the power of admixture mapping can be comparable to that of association mapping but with a far smaller number of markers. We also show that, although ancestry informative markers (AIMs) are superior to random single-nucleotide polymorphisms (SNPs), random SNPs can perform quite well when AIMs are not available. Hence, researchers who study admixed populations in which AIMs are not available can perform admixture mapping with the use of modestly higher densities of random markers. Software to perform the gene-mapping calculations, MALDsoft, is freely available on the Pritchard Lab Web site.
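A toy version of the "deviation from the genome average" idea is sketched below on simulated case data; the statistics and the MALDsoft software described in the article are likelihood-based and operate on inferred local-ancestry probabilities, so this simple Z-score is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_loci = 300, 1000
genome_avg = 0.20                                       # genome-wide admixture proportion
# Locus-specific ancestry proportions per case (0, 0.5, or 1 copy-fraction from population 1).
ancestry = rng.binomial(2, genome_avg, size=(n_cases, n_loci)) / 2.0
ancestry[:, 500] = rng.binomial(2, 0.30, size=n_cases) / 2.0   # simulated risk locus

locus_mean = ancestry.mean(axis=0)
se = np.sqrt(genome_avg * (1 - genome_avg) / (2 * n_cases))    # SE under the toy binomial model
z = (locus_mean - genome_avg) / se
print("top locus:", int(np.argmax(np.abs(z))), "Z =", round(float(np.max(np.abs(z))), 2))
```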
20.
We discuss some theory concerning directional data and introduce a suite of statistical tools that researchers interested in the directional movement of animal groups can use to analyse results from their models. We illustrate these tools by analysing the results of a model of groups guided by a small number of informed individuals that are indistinguishable from the rest of the group, which arises in the context of honeybee (Apis mellifera) swarming behaviour. We modify an existing model of collective motion, based on inter-individual social interactions, allowing knowledgeable individuals to guide group members to the goal by travelling through the group in a direct line aligned with the goal direction.
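As a small example of the kind of directional-data tools referred to above, the sketch below computes the circular mean, the mean resultant length R, and a first-order Rayleigh test of the null hypothesis that headings are uniformly distributed (no preferred direction); the headings are invented, in radians.

```python
import cmath
import math

headings = [0.10, 0.35, -0.20, 0.05, 0.50, -0.10, 0.25, 0.15]

resultant = sum(cmath.exp(1j * a) for a in headings) / len(headings)
mean_direction = cmath.phase(resultant)       # circular mean heading
R = abs(resultant)                            # mean resultant length in [0, 1]
rayleigh_p = math.exp(-len(headings) * R**2)  # first-order approximation to the Rayleigh p-value

print(f"mean direction = {mean_direction:.3f} rad, R = {R:.3f}, p ~ {rayleigh_p:.4f}")
```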