Similar literature
1.
Rapid and inexpensive sequencing technologies are making it possible to collect whole genome sequence data on multiple individuals from a population. This type of data can be used to quickly identify genes that control important ecological and evolutionary phenotypes by finding the targets of adaptive natural selection, and we therefore refer to such approaches as "reverse ecology." To quantify the power gained in detecting positive selection using population genomic data, we compare three statistical methods for identifying targets of selection: the McDonald-Kreitman test, the mkprf method, and a likelihood implementation for detecting dN/dS > 1. Because the first two methods use polymorphism data, we expect them to have more power to detect selection. However, when applied to population genomic datasets from human, fly, and yeast, the tests using polymorphism data were actually weaker in two of the three datasets. We explore reasons why the simpler comparative method has identified more genes under selection, and suggest that the different methods may really be detecting different signals from the same sequence data. Finally, we find several statistical anomalies associated with the mkprf method, including an almost linear dependence between the number of positively selected genes identified and the prior distributions used. We conclude that interpreting the results produced by this method should be done with some caution.
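As an illustration of the first method named above, the following is a minimal sketch of a McDonald-Kreitman test on a 2x2 table of counts. It assumes the usual layout (nonsynonymous and synonymous fixed differences, `dn` and `ds`, against nonsynonymous and synonymous polymorphisms, `pn` and `ps`) and uses a two-sided Fisher's exact test. The function names and the counts are made up for the example; they are not taken from the abstract.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Return (neutrality index, p-value); assumes ds, ps and dn are nonzero."""
    # A neutrality index below 1 indicates an excess of nonsynonymous divergence.
    ni = (pn / ps) / (dn / ds)
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts for a single gene.
ni, p = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"NI = {ni:.3f}, p = {p:.4f}")
```

A gene-by-gene scan of this kind would also need a multiple-testing correction, which this sketch leaves out.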

2.
Detecting positive Darwinian selection at the DNA sequence level has been a subject of considerable interest. However, positive selection is difficult to detect because it often operates episodically on a few amino acid sites, and the signal may be masked by negative selection. Several methods have been developed to test positive selection that acts on given branches (branch methods) or on a subset of sites (site methods). Recently, Yang, Z., and R. Nielsen (2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917) developed likelihood ratio tests (LRTs) based on branch-site models to detect positive selection that affects a small number of sites along prespecified lineages. However, computer simulations suggested that the tests were sensitive to the model assumptions and were unable to distinguish between relaxation of selective constraint and positive selection (Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332-1339). Here, we describe a modified branch-site model and use it to construct two LRTs, called branch-site tests 1 and 2. We applied the new tests to reanalyze several real data sets and used computer simulation to examine the performance of the two tests by examining their false-positive rate, power, and robustness. We found that test 1 was unable to distinguish relaxed constraint from positive selection affecting the lineages of interest, while test 2 had acceptable false-positive rates and appeared robust against violations of model assumptions. As test 2 is a direct test of positive selection on the lineages of interest, it is referred to as the branch-site test of positive selection and is recommended for use in real data analysis. The test appeared conservative overall, but exhibited better power in detecting positive selection than the branch-based test. Bayes empirical Bayes identification of amino acid sites under positive selection along the foreground branches was found to be reliable, but lacked power.

3.
The selective pressure at the protein level is usually measured by the nonsynonymous/synonymous rate ratio (ω = dN/dS), with ω < 1, ω = 1, and ω > 1 indicating purifying (or negative) selection, neutral evolution, and diversifying (or positive) selection, respectively. The ω ratio is commonly calculated as an average over sites. As every functional protein has some amino acid sites under selective constraints, averaging rates across sites leads to low power to detect positive selection. Recently developed models of codon substitution allow the ω ratio to vary among sites and appear to be powerful in detecting positive selection in empirical data analysis. In this study, we used computer simulation to investigate the accuracy and power of the likelihood ratio test (LRT) in detecting positive selection at amino acid sites. The test compares two nested models: one that allows for sites under positive selection (with ω > 1), and another that does not, with the χ² distribution used for significance testing. We found that use of the χ² distribution makes the test conservative, especially when the data contain very short and highly similar sequences. Nevertheless, the LRT is powerful. Although the power can be low with only 5 or 6 sequences in the data, it was nearly 100% in data sets of 17 sequences. Sequence length, sequence divergence, and the strength of positive selection also were found to affect the power of the LRT. The exact distribution assumed for the ω ratio over sites was found not to affect the effectiveness of the LRT.
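The likelihood ratio test described above can be sketched in a few lines. The example assumes that the maximized log-likelihoods of the two nested models are already available from a codon-model fit, and that the χ² approximation is used with 1 or 2 degrees of freedom (2 is a common choice for site-model pairs that differ by two parameters). The log-likelihood values and function names are hypothetical.

```python
from math import erfc, exp, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for df = 1 or 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df == 2:
        return exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def lrt_p_value(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: null model without, alternative with, omega > 1.
stat, p = lrt_p_value(lnl_null=-2050.3, lnl_alt=-2044.1, df=2)
print(f"2*delta lnL = {stat:.2f}, p = {p:.4f}")
```

Because the abstract reports that the χ² approximation is conservative, a p-value computed this way would, if anything, understate the evidence for the alternative model.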

4.
We investigate the performance of tests of neutrality in admixed populations using plausible demographic models for African-American history as well as resequencing data from African and African-American populations. The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population. Furthermore, when simulating positive selection, Tajima's D, Fu and Li's D, and haplotype homozygosity have lower power to detect population-specific selection using individuals sampled from the admixed population than from the nonadmixed population. Fay and Wu's H test, however, has more power to detect selection using individuals from the admixed population than from the nonadmixed population, especially when the selective sweep ended long ago. Our results have implications for interpreting recent genome-wide scans for positive selection in human populations.

5.
Perspective: detecting adaptive molecular polymorphism: lessons from the MHC   Cited by: 13 (self-citations: 0, citations by others: 13)
Abstract. In the 1960s, when population geneticists first began to collect data on the amount of genetic variation in natural populations, balancing selection was invoked as a possible explanation for how such high levels of molecular variation are maintained. However, the predictions of the neutral theory of molecular evolution have since become the standard by which cases of balancing selection may be inferred. Here we review the evidence for balancing selection acting on the major histocompatibility complex (MHC) of vertebrates, a genetic system that defies many of the predictions of neutrality. We apply many widely used tests of neutrality to MHC data as a benchmark for assessing the power of these tests. These tests can be categorized as detecting selection in the current generation, over the history of populations, or over the histories of species. We find that selection is not detectable in MHC datasets in every generation, population, or every evolutionary lineage. This suggests either that selection on the MHC is heterogeneous or that many of the current neutrality tests lack sufficient power to detect the selection consistently. Additionally, we identify a potential inference problem associated with several tests of neutrality. We demonstrate that the signals of selection may be generated in a relatively short period of microevolutionary time, yet these signals may take exceptionally long periods of time to be erased in the absence of selection. This is especially true for the neutrality test based on the ratio of nonsynonymous to synonymous substitutions. Inference of the nature of the selection events that create such signals should be approached with caution. However, a combination of tests on different time scales may overcome such problems.

6.
Statistical properties of the branch-site test of positive selection   Cited by: 1 (self-citations: 0, citations by others: 1)
The branch-site test is a likelihood ratio test to detect positive selection along prespecified lineages on a phylogeny that affects only a subset of codons in a protein-coding gene, with positive selection indicated by accelerated nonsynonymous substitutions (with ω = dN/dS > 1). This test may have more power than earlier methods, which average nucleotide substitution rates over sites in the protein and/or over branches on the tree. However, a few recent studies questioned the statistical basis of the test and claimed that the test generated too many false positives. In this paper, we examine the null distribution of the test and conduct a computer simulation to examine the false-positive rate and the power of the test. The results suggest that the asymptotic theory is reliable for typical data sets, and indeed in our simulations, the large-sample null distribution was reliable with as few as 20-50 codons in the alignment. We examined the impact of sequence length, the strength of positive selection, and the proportion of sites under positive selection on the power of the branch-site test. We found that the test was far more powerful in detecting episodic positive selection than branch-based tests, which average substitution rates over all codons in the gene and thus miss the signal when most codons are under strong selective constraint. Recent claims of statistical problems with the branch-site test are due to misinterpretations of simulation results. Our results, as well as previous simulation studies that have demonstrated the robustness of the test, suggest that the branch-site test may be a useful tool for detecting episodic positive selection and for generating biological hypotheses for mutation studies and functional analyses. The test is sensitive to sequence and alignment errors and caution should be exercised concerning its use when data quality is in doubt.
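For orientation, the large-sample null distribution referred to above is commonly described as an equal mixture of a point mass at zero and a χ² distribution with one degree of freedom, because the null model fixes ω at the boundary value 1. The abstract itself does not state this form, so the expressions below are a sketch under that assumption, with ℓ₀ and ℓ₁ denoting the maximized log-likelihoods of the null and alternative models.

```latex
\[
  2\Delta\ell = 2(\ell_1 - \ell_0), \qquad
  2\Delta\ell \;\overset{H_0}{\sim}\; \tfrac{1}{2}\,\delta_0 + \tfrac{1}{2}\,\chi^2_1 ,
\]
\[
  p = \tfrac{1}{2}\,\Pr\!\left(\chi^2_1 \ge 2\Delta\ell\right)
  \quad \text{for } 2\Delta\ell > 0 .
\]
```

Under this mixture the 5% critical value is about 2.71, compared with 3.84 for χ² with one degree of freedom, so using the plain χ² distribution would make the test more conservative.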

7.
Selective phenotyping for increased efficiency in genetic mapping studies   Cited by: 3 (self-citations: 0, citations by others: 3)
Jin C, Lan H, Attie AD, Churchill GA, Bulutuglo D, Yandell BS. Genetics 2004, 168(4): 2285-2293
The power of a genetic mapping study depends on the heritability of the trait, the number of individuals included in the analysis, and the genetic dissimilarity among them. In experiments that involve microarrays or other complex physiological assays, phenotyping can be expensive and time-consuming and may impose limits on the sample size. A random selection of individuals may not provide sufficient power to detect linkage until a large sample size is reached. We present an algorithm for selecting a subset of individuals solely on the basis of genotype data that can achieve substantial improvements in sensitivity compared to a random sample of the same size. The selective phenotyping method involves preferentially selecting individuals to maximize their genotypic dissimilarity. Selective phenotyping is most effective when prior knowledge of genetic architecture allows us to focus on specific genetic regions. However, it can also provide modest improvements in efficiency when applied on a whole-genome basis. Importantly, selective phenotyping does not reduce the efficiency of mapping as compared to a random sample in regions that are not considered in the selection process. In contrast to selective genotyping, inferences based solely on a selectively phenotyped population of individuals are representative of the whole population. The substantial improvement introduced by selective phenotyping is particularly useful when phenotyping is difficult or costly and thus limits the sample size in a genetic mapping study.
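The idea of choosing individuals to maximize genotypic dissimilarity can be illustrated with a simple greedy selection. This is not the authors' algorithm, which the abstract does not spell out. It is a hypothetical sketch that scores dissimilarity as the Hamming distance between marker genotype strings and adds, at each step, the individual farthest in total from those already chosen.

```python
def hamming(g1, g2):
    """Number of markers at which two genotype strings differ."""
    return sum(a != b for a, b in zip(g1, g2))


def select_dissimilar(genotypes, k):
    """Greedily pick k indices so that the chosen individuals are mutually dissimilar."""
    n = len(genotypes)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of individuals")
    dist = [[hamming(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]
    # Seed with the individual that is most distant from everyone else overall.
    chosen = [max(range(n), key=lambda i: sum(dist[i]))]
    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen]
        # Add the candidate with the largest summed distance to the chosen set.
        chosen.append(max(remaining,
                          key=lambda i: sum(dist[i][j] for j in chosen)))
    return chosen


# Hypothetical F2 genotypes at four markers (A, H, B = two homozygotes and the heterozygote).
genotypes = ["AAAA", "AAAH", "BBBB", "HHHH", "AABB"]
print(select_dissimilar(genotypes, 3))
```

Restricting the genotype strings to markers in a region of interest would correspond to the targeted use mentioned in the abstract, while using all markers corresponds to the whole-genome case.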

8.
Taxanes are defensive metabolites produced by Taxus species (yews) and used in anticancer therapies. Despite their medical interest, patterns of natural diversity in taxane-related genes are unknown. We examined variation at five main genes of Taxus baccata in the Iberian Peninsula, a region where unique yew genetic resources are endangered. We looked at several gene features and applied complementary neutrality tests, including diversity/divergence tests, tests solely based on site frequency spectrum (SFS) and Zeng's compound tests. To account for specific demography, microsatellite data were used to infer historical changes in population size based on an Approximate Bayesian Computation (ABC) approach. Polymorphism-divergence tests pointed to positive selection for genes TBT and TAT and balancing selection for DBAT. In addition, neutrality tests based on SFS found that while a recent reduction in population size may explain most statistics' values, selection may still be in action in genes TBT and DBAT, at least in some populations. Molecular signatures on taxol genes suggest the action of frequent selective waves with different direction or intensity, possibly related to varying adaptive pressures produced by the host-enemy co-evolution on defence-related genes. Such natural selection processes may have produced taxane variants still undiscovered.

9.
Populations carry a genetic signal of their demographic past, providing an opportunity for investigating the processes that shaped their evolution. Our ability to infer population histories can be enhanced by including ancient DNA data. Using serial-coalescent simulations and a range of both quantitative and temporal sampling schemes, we test the power of ancient mitochondrial sequences and nuclear single-nucleotide polymorphisms (SNPs) to detect past population bottlenecks. Within our simulated framework, mitochondrial sequences have only limited power to detect subtle bottlenecks and/or fast post-bottleneck recoveries. In contrast, nuclear SNPs can detect bottlenecks followed by rapid recovery, although bottlenecks involving reduction of less than half the population are generally detected with low power unless extensive genetic information from ancient individuals is available. Our results provide useful guidelines for scaling sampling schemes and for optimizing our ability to infer past population dynamics. In addition, our results suggest that many ancient DNA studies may face power issues in detecting moderate demographic collapses and/or highly dynamic demographic shifts when based solely on mitochondrial information.

10.
Orengo DJ, Aguadé M. Genetics 2004, 167(4): 1759-1766
The effects on nucleotide variation of adaptations to temperate habitats and of the possible bottleneck associated with the origin of European populations of Drosophila melanogaster should be detectable in DNA sequences given the short time elapsed relative to the species population size. We surveyed nucleotide variation in 109 fragments distributed across the X chromosome in a European population of D. melanogaster to detect the footprint of positive selection. Fragments were located primarily in large noncoding regions. Multilocus tests based on Tajima's D statistic revealed a significant departure from neutral expectations in a stationary panmictic population, with an important contribution from both positive and negative D values. A positive relationship between Tajima's D values and distance to coding region was detected, with a comparative excess of significantly negative D values in the subset of fragments closer to coding regions. Also, there was a significant heterogeneity in the polymorphism to divergence ratio, with 12 fragments contributing 42% to the test statistic. Moreover, these fragments were comparatively closer to coding regions. These findings would imply positive selection events, and thus selective sweeps, during the species expansion to Europe.

11.
Zhu L, Bustamante CD. Genetics 2005, 170(3): 1411-1421
We present a novel composite-likelihood-ratio test (CLRT) for detecting genes and genomic regions that are subject to recurrent natural selection (either positive or negative). The method uses the likelihood functions of Hartl et al. (1994) for inference in a Wright-Fisher genic selection model and corrects for nonindependence among sites by application of coalescent simulations with recombination. Here, we (1) characterize the distribution of the CLRT statistic (Λ) as a function of the population recombination rate (R = 4N_e r); (2) explore the effects of bias in estimation of R on the size (type I error) of the CLRT; (3) explore the robustness of the model to population growth, bottlenecks, and migration; (4) explore the power of the CLRT under varying levels of mutation, selection, and recombination; (5) explore the discriminatory power of the test in distinguishing negative selection from population growth; and (6) evaluate the performance of maximum composite-likelihood estimation (MCLE) of the selection coefficient. We find that the test has excellent power to detect weak negative selection and moderate power to detect positive selection. Moreover, the test is quite robust to bias in the estimate of local recombination rate, but not to certain demographic scenarios such as population growth or a recent bottleneck. Last, we demonstrate that the MCLE of the selection parameter has little bias for weak negative selection and has downward bias for positively selected mutations.

12.
Anisimova M, Nielsen R, Yang Z. Genetics 2003, 164(3): 1229-1236
Maximum-likelihood methods based on models of codon substitution accounting for heterogeneous selective pressures across sites have proved to be powerful in detecting positive selection in protein-coding DNA sequences. Those methods are phylogeny based and do not account for the effects of recombination. When recombination occurs, such as in population data, no unique tree topology can describe the evolutionary history of the whole sequence. This violation of assumptions raises serious concerns about the likelihood method for detecting positive selection. Here we use computer simulation to evaluate the reliability of the likelihood-ratio test (LRT) for positive selection in the presence of recombination. We examine three tests based on different models of variable selective pressures among sites. Sequences are simulated using a coalescent model with recombination and analyzed using codon-based likelihood models ignoring recombination. We find that the LRT is robust to low levels of recombination (with fewer than three recombination events in the history of a sample of 10 sequences). However, at higher levels of recombination, the type I error rate can be as high as 90%, especially when the null model in the LRT is unrealistic, and the test often mistakes recombination for evidence of positive selection. The test that compares the more realistic models M7 (beta) against M8 (beta and ω) is more robust to recombination, where the null model M7 allows the selective pressure ω to vary between 0 and 1 (and so does not account for positive selection), and the alternative model M8 allows an additional discrete class with ω = dN/dS that could be estimated to be > 1 (and thus accounts for positive selection). Identification of sites under positive selection by the empirical Bayes method appears to be less affected than the LRT by recombination.

13.
The use of codon substitution models to compare synonymous and nonsynonymous substitution rates is a widely used approach to detecting positive Darwinian selection affecting protein evolution. However, in several recent papers, Hughes and colleagues claim that codon-based likelihood-ratio tests (LRTs) are logically flawed as they lack prior hypotheses and fail to accommodate random fluctuations in synonymous and nonsynonymous substitutions. Friedman and Hughes (2007) also used site-based LRTs to analyze 605 gene families consisting of human and mouse paralogues. They found that the outcome of the tests was largely determined by irrelevant factors such as the GC content at the third codon positions and the synonymous rate dS, but not by the nonsynonymous rate dN or the dN/dS ratio, factors that should be related to selection. Here, we reanalyze those data. Contra Friedman and Hughes, we found that the test results are related to sequence length and the average dN/dS ratio. We examine the criticisms of Hughes and suggest that they are based on misunderstandings of the codon models and on statistical errors. Our analyses suggest that codon-based tests are useful tools for comparative analysis of genomic data sets.

14.
Gompert Z, Buerkle CA. Genetics 2011, 187(3): 903-917
The demography of populations and natural selection shape genetic variation across the genome, and understanding the genomic consequences of these evolutionary processes is a fundamental aim of population genetics. We have developed a hierarchical Bayesian model to quantify genome-wide population structure and identify candidate genetic regions affected by selection. This model improves on existing methods by accounting for stochastic sampling of sequences inherent in next-generation sequencing (with pooled or indexed individual samples) and by incorporating genetic distances among haplotypes in measures of genetic differentiation. Using simulations we demonstrate that this model has a low false-positive rate for classifying neutral genetic regions as selected genes (i.e., Φ_ST outliers), but can detect recent selective sweeps, particularly when genetic regions in multiple populations are affected by selection. Nonetheless, selection affecting just a single population was difficult to detect and resulted in a high false-negative rate under certain conditions. We applied the Bayesian model to two large sets of human population genetic data. We found evidence of widespread positive and balancing selection among worldwide human populations, including many genetic regions previously thought to be under selection. Additionally, we identified novel candidate genes for selection, several of which have been linked to human diseases. This model will facilitate the population genetic analysis of a wide range of organisms on the basis of next-generation sequence data.

15.
Properties of Statistical Tests of Neutrality for DNA Polymorphism Data   Cited by: 5 (self-citations: 5, citations by others: 0)
A class of statistical tests based on molecular polymorphism data is studied to determine size and power properties. The class includes Tajima's D statistic as well as the D* and F* tests proposed by Fu and Li. A new method of constructing critical values for these tests is described. Simulations indicate that Tajima's test is generally most powerful against the alternative hypotheses of selective sweep, population bottleneck, and population subdivision, among tests within this class. However, even Tajima's test can detect a selective sweep or bottleneck only if it has occurred within a specific interval of time in the recent past or population subdivision only when it has persisted for a very long time. For greatest power against the particular alternatives studied here, it is better to sequence more alleles than more sites.
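Since Tajima's D recurs throughout these abstracts, a short sketch of how the statistic is computed may help. It follows the standard formula, which contrasts the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by a₁. The helper names and the toy alignment are made up for illustration, and critical values (the topic of the abstract above) are not computed.

```python
from itertools import combinations
from math import sqrt


def summarize(seqs):
    """Return (n, S, pi) for a list of equal-length aligned sequences."""
    n = len(seqs)
    # S: number of alignment columns containing more than one distinct base.
    s = sum(len(set(col)) > 1 for col in zip(*seqs))
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return n, s, pi


def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, segregating sites s, and pairwise diversity pi."""
    if n < 4 or s == 0:
        raise ValueError("D is undefined for n < 4 or when there are no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment of four sequences (hypothetical data).
n, s, pi = summarize(["AAAA", "AAAT", "AATT", "ATTT"])
print(n, s, round(pi, 4), round(tajimas_d(n, s, pi), 4))
```

Negative values reflect an excess of rare variants, as expected after a selective sweep or population expansion, and positive values reflect an excess of intermediate-frequency variants.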

16.
Mes TH. Molecular Ecology 2003, 12(6): 1555-1566
Mitochondrial ND4 sequences of populations of four species of parasitic nematodes of livestock were subjected to demographic analyses. Deviation from selective neutrality was detectable using the frequency spectrum of segregating sites and highly negative neutrality statistics. However, the mitochondrial data sets do not comply with the infinite-sites model that underlies these tests, and as a consequence, it was not established whether these features are solely a result of population expansion, or whether aspects of the molecular evolution of these mitochondrial regions are also involved. Coalescent analyses based on Fu's Fs neutrality test, which incorporated estimates of rate heterogeneity, the transition-transversion ratio and nucleotide bias, as well as analyses that are fairly robust to deviations from the infinite-sites model, supported population expansion. Analyses that do not depend on the infinite-sites model also suggested historical population expansion of these nematodes. The very similar time since expansion, the absence of signatures of positive selection in ND4 and the logical association with human demography imply that selective sweeps of mitochondrial variants are less probable, and that expansion is the most likely scenario for the parasitic nematodes of livestock. The methods used to characterize the expansion have different assumptions and emphasize different aspects of expansions. The resulting restrictions on the interpretation of expansions are outlined.

17.
It is important to detect population bottlenecks in threatened and managed species because bottlenecks can increase the risk of population extinction. Early detection is critical and can be facilitated by statistically powerful monitoring programs for detecting bottleneck-induced genetic change. We used Monte Carlo computer simulations to evaluate the power of the following tests for detecting genetic changes caused by a severe reduction in a population's effective size (N_e): a test for loss of heterozygosity, two tests for loss of alleles, two tests for change in the distribution of allele frequencies, and a test for small N_e based on variance in allele frequencies (the 'variance test'). The variance test was most powerful; it provided an 85% probability of detecting a bottleneck of size N_e = 10 when monitoring five microsatellite loci and sampling 30 individuals both before and one generation after the bottleneck. The variance test was almost 10 times more powerful than a commonly used test for loss of heterozygosity, and it allowed for detection of bottlenecks before 5% of a population's heterozygosity had been lost. The second most powerful tests were generally the tests for loss of alleles. However, these tests had reduced power for detecting genetic bottlenecks caused by skewed sex ratios. We provide guidelines for the number of loci and individuals needed to achieve high-power tests when monitoring via the variance test. We also illustrate how the variance test performs when monitoring loci that have widely different allele frequency distributions as observed in five wild populations of mountain sheep (Ovis canadensis).
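The 5% figure above can be put in context with the standard expectation for the loss of heterozygosity under drift, H_t = H_0 (1 - 1/(2N_e))^t. The sketch below evaluates that expectation. It is a textbook relation offered for orientation, not the simulation procedure used in the study, and the function name is hypothetical.

```python
def expected_heterozygosity_retained(ne, generations):
    """Expected fraction of heterozygosity remaining after drift at effective size ne."""
    if ne <= 0 or generations < 0:
        raise ValueError("ne must be positive and generations non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


# A bottleneck of effective size 10 is expected to remove 5% per generation.
for t in (1, 5, 10):
    print(t, round(expected_heterozygosity_retained(10, t), 3))
```

With N_e = 10, a single generation is expected to remove 5% of heterozygosity, which is why a test that detects the bottleneck after one generation acts before much variation has been lost.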

18.
Compound tests for the detection of hitchhiking under positive selection   Cited by: 2 (self-citations: 0, citations by others: 2)
Many statistical tests have been developed for detecting positive selection. Most of these tests draw conclusions based on significant deviations from the patterns of polymorphism predicted by the neutral model. However, many non-equilibrium forces may cause similar deviations, and thus the tests usually have low statistical specificity to positive selection. The main challenge is hence to construct test statistics that are reasonably powerful in detecting positive selection, but are relatively insensitive to other forces. Recently, Zeng et al. (2006) proposed a new test, DH, which is a compound of Tajima's D and Fay and Wu's H, and showed that DH has reasonably high statistical specificity to positive selection. In this report, we expand the idea of a compound test by combining Fay and Wu's H or DH with the Ewens-Watterson (EW) test. We refer to these 2 new tests as HEW and DHEW, respectively. Compared to the DH test, HEW and DHEW are more robust against the presence of recombination, and are also more powerful in detecting positive selection. Furthermore, the DHEW test, similar to DH, is also relatively insensitive to background selection and demography. The HEW test, on the other hand, tends to be somewhat less conservative than DH and DHEW in some cases.

19.
Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the familywise error rates (FWERs), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is useable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.
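The contrast between familywise and false-discovery control can be sketched with two well-known procedures. Bonferroni is named in the abstract. The Benjamini-Hochberg step-up procedure is used here only as a stand-in for an FDR method, since the abstract does not say which FDR procedures were evaluated, and Rom's procedure is not implemented. The per-branch p-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test when its p-value is at most alpha / m (controls the FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure that controls the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


# Hypothetical p-values from testing each of ten branches in turn.
branch_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(branch_p)), sum(benjamini_hochberg(branch_p)))
```

On these made-up values the FDR procedure rejects one more branch than Bonferroni, which matches the qualitative pattern reported above of slightly more power at the cost of a higher familywise error rate.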

20.
Li H, Stephan W. Genetics 2005, 171(1): 377-384
Two maximum-likelihood methods are proposed for detecting recent, strongly positive selection and for localizing the target of selection along a recombining chromosome. The methods utilize the compact mutation frequency spectrum at multiple neutral loci that are partially linked to the selected site. Using simulated data, we show that the power of the tests lies between 80 and 98% in most cases, and the false positive rate could be as low as approximately 10% when the number of sampled marker loci is sufficiently large (≥ 20). The confidence interval around the estimated position of selection is reasonably narrow. The methods are applied to X chromosome data of Drosophila melanogaster from a European and an African population. Evidence of selection was found for both populations (including a selective sweep that was shared between both populations).
