Similar literature
 20 similar documents found; search took 31 ms
1.
Lin K, Li H, Schlötterer C, Futschik A. Genetics, 2011, 187(1): 229-244
Summary statistics are widely used in population genetics, but they suffer from the drawback that no simple sufficient summary statistic exists, which captures all information required to distinguish different evolutionary hypotheses. Here, we apply boosting, a recent statistical method that combines simple classification rules to maximize their joint predictive performance. We show that our implementation of boosting has a high power to detect selective sweeps. Demographic events, such as bottlenecks, do not result in a large excess of false positives. A comparison shows that our boosting implementation performs well relative to other neutrality tests. Furthermore, we evaluated the relative contribution of different summary statistics to the identification of selection and found that for recent sweeps integrated haplotype homozygosity is very informative whereas older sweeps are better detected by Tajima's π. Overall, Watterson's θ was found to contribute the most information for distinguishing between bottlenecks and selection.

2.
Statistical properties of the branch-site test of positive selection   Total citations: 1, self-citations: 0, citations by others: 1
The branch-site test is a likelihood ratio test to detect positive selection along prespecified lineages on a phylogeny that affects only a subset of codons in a protein-coding gene, with positive selection indicated by accelerated nonsynonymous substitutions (with ω = dN/dS > 1). This test may have more power than earlier methods, which average nucleotide substitution rates over sites in the protein and/or over branches on the tree. However, a few recent studies questioned the statistical basis of the test and claimed that the test generated too many false positives. In this paper, we examine the null distribution of the test and conduct a computer simulation to examine the false-positive rate and the power of the test. The results suggest that the asymptotic theory is reliable for typical data sets, and indeed in our simulations, the large-sample null distribution was reliable with as few as 20-50 codons in the alignment. We examined the impact of sequence length, the strength of positive selection, and the proportion of sites under positive selection on the power of the branch-site test. We found that the test was far more powerful in detecting episodic positive selection than branch-based tests, which average substitution rates over all codons in the gene and thus miss the signal when most codons are under strong selective constraint. Recent claims of statistical problems with the branch-site test are due to misinterpretations of simulation results. Our results, as well as previous simulation studies that have demonstrated the robustness of the test, suggest that the branch-site test may be a useful tool for detecting episodic positive selection and for generating biological hypotheses for mutation studies and functional analyses. The test is sensitive to sequence and alignment errors and caution should be exercised concerning its use when data quality is in doubt.
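Illustrative sketch (not taken from the paper): a likelihood ratio test of this kind compares twice the log-likelihood difference between the alternative and null models against a reference distribution. The code below assumes a chi-square distribution with 1 degree of freedom, and optionally a 50:50 mixture of a point mass at zero and that chi-square, which is a common choice when the null value lies on the boundary of the parameter space. The function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_1df_sf(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def lrt_p_value(lnl_null: float, lnl_alt: float, mixture: bool = False) -> float:
    """P-value of a likelihood ratio test with one extra free parameter.

    lnl_null, lnl_alt: maximized log-likelihoods (hypothetical inputs).
    mixture: if True, use a 50:50 mixture of a point mass at 0 and
             chi-square(1) as the null distribution (an assumption here).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p = chi2_1df_sf(stat)
    if mixture:
        # Half of the null mass sits at zero, so the tail area is halved.
        return p / 2.0 if stat > 0 else 1.0
    return p


if __name__ == "__main__":
    # Hypothetical log-likelihoods from a null and an alternative model fit.
    print(lrt_p_value(-1000.0, -998.07927))
    print(lrt_p_value(-1000.0, -998.07927, mixture=True))
```

Using the plain chi-square reference is the more conservative of the two options, because it never yields a smaller p-value than the mixture.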

3.
We investigate the performance of tests of neutrality in admixed populations using plausible demographic models for African-American history as well as resequencing data from African and African-American populations. The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population. Furthermore, when simulating positive selection, Tajima's D, Fu and Li's D, and haplotype homozygosity have lower power to detect population-specific selection using individuals sampled from the admixed population than from the nonadmixed population. Fay and Wu's H test, however, has more power to detect selection using individuals from the admixed population than from the nonadmixed population, especially when the selective sweep ended long ago. Our results have implications for interpreting recent genome-wide scans for positive selection in human populations.

4.
Taxanes are defensive metabolites produced by Taxus species (yews) and used in anticancer therapies. Despite their medical interest, patterns of natural diversity in taxane-related genes are unknown. We examined variation at five main genes of Taxus baccata in the Iberian Peninsula, a region where unique yew genetic resources are endangered. We looked at several gene features and applied complementary neutrality tests, including diversity/divergence tests, tests solely based on site frequency spectrum (SFS) and Zeng's compound tests. To account for specific demography, microsatellite data were used to infer historical changes in population size based on an Approximate Bayesian Computation (ABC) approach. Polymorphism-divergence tests pointed to positive selection for genes TBT and TAT and balancing selection for DBAT. In addition, neutrality tests based on SFS found that while a recent reduction in population size may explain most statistics' values, selection may still be in action in genes TBT and DBAT, at least in some populations. Molecular signatures on taxol genes suggest the action of frequent selective waves with different direction or intensity, possibly related to varying adaptive pressures produced by the host-enemy co-evolution on defence-related genes. Such natural selection processes may have produced taxane variants still undiscovered.

5.
Recombination can negatively impact methods designed to detect divergent gene function that rely on explicit knowledge of a gene tree. However, we know little about how recombination detection methods perform under evolutionary scenarios encountered in studies of functional molecular divergence. We use simulation to evaluate false positive rates for six recombination detection methods (GENECONV, MaxChi, Chimera, RDP, GARD-SBP, GARD-MBP) under evolutionary scenarios that might increase false positives. Broadly, these scenarios address: (i) asymmetric tree topology and sequence divergence, (ii) non-stationary codon bias and selection pressure, and (iii) positive selection. We also evaluate power to detect recombination under truly recombinant history. As with previous studies, we find that power increases with sequence divergence. However, we also find that accuracy in inferring the number of breakpoints is extremely low. When recombination is absent, increased sequence divergence leads to increased false positives. Furthermore, one method (GARD-SBP) is sensitive to tree shape, with higher false positive rates under an asymmetric tree topology. Somewhat surprisingly, all methods are robust to the simulated heterogeneity in codon bias, shifts in selection pressure and presence of positive selection. Based on these findings, we recommend that studies of functional divergence in systems where recombination is plausible can, and should, include a pre-test for recombination. Application of all methods to the core genome of Prochlorococcus reveals a substantial lack of concordance among results. Based on analysis of both real and simulated datasets we present some guidelines for the investigation of recombination in genes that may have experienced functional divergence.

6.
In this report, we compare the differences between various site- and haplotype-frequency tests in their power to detect positive selection by doing computer simulations. Our results are the following. 1) Although haplotype-frequency tests that are conditional on the number of haplotypes (K) were developed for nonrecombining haplotypes, these tests are insensitive to recombination. Such tests, including the Ewens-Watterson (EW) test, can therefore be applied to recombining haplotypes. 2) Tests conditional on the number of segregating sites (S) become overly conservative in the presence of recombination. 3) The EW test is usually the most powerful test during the sweep phase, especially when the local recombination rate is high. 4) The "extended haplotype homozygosity" test relies heavily on the prior knowledge of the target of selection. With that knowledge, it is the most powerful test, whereas in the absence of this prior information, the test has little power. We also study the sensitivities of the haplotype-frequency tests to background selection and various demographic forces. We find that these tests are sensitive to some forces other than positive selection. To alleviate the problem of low specificity, compound tests, such as the DH test (Zeng et al. 2006), may be a solution. In the companion paper (Zeng K, Shi S, Wu C-I, in preparation), we use the EW test to devise 2 compound tests, which are more powerful in detecting positive selection than DH, but are also relatively insensitive to demography.

7.
Case-control association studies are widely used in the search for genetic variants that contribute to human diseases. It has long been known that such studies may suffer from high rates of false positives if there is unrecognized population structure. It is perhaps less widely appreciated that so-called “cryptic relatedness” (i.e., kinship among the cases or controls that is not known to the investigator) might also potentially inflate the false positive rate. Until now there has been little work to assess how serious this problem is likely to be in practice. In this paper, we develop a formal model of cryptic relatedness, and study its impact on association studies. We provide simple expressions that predict the extent of confounding due to cryptic relatedness. Surprisingly, these expressions are functions of directly observable parameters. Our analytical results show that, for well-designed studies in outbred populations, the degree of confounding due to cryptic relatedness will usually be negligible. However, in contrast, studies where there is a sampling bias toward collecting relatives may indeed suffer from excessive rates of false positives. Furthermore, cryptic relatedness may be a serious concern in founder populations that have grown rapidly and recently from a small size. As an example, we analyze the impact of excess relatedness among cases for six phenotypes measured in the Hutterite population.

8.
Thornton KR, Jensen JD. Genetics, 2007, 175(2): 737-750
Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.

9.
Genome scans with many genetic markers provide the opportunity to investigate local adaptation in natural populations and identify candidate genes under selection. In particular, SNPs are dense throughout the genome of most organisms and are commonly observed in functional genes making them ideal markers to study adaptive molecular variation. This approach has become commonly employed in ecological and population genetics studies to detect outlier loci that are putatively under selection. However, there are several challenges to address with outlier approaches including genotyping errors, underlying population structure and false positives, variation in mutation rate and limited sensitivity (false negatives). In this study, we evaluated multiple outlier tests and their type I (false positive) and type II (false negative) error rates in a series of simulated data sets. Comparisons included simulation procedures (FDIST2, ARLEQUIN v.3.5 and BAYESCAN) as well as more conventional tools such as global FST histograms. Of the three simulation methods, FDIST2 and BAYESCAN typically had the lowest type II error, BAYESCAN had the least type I error and ARLEQUIN had highest type I and II error. High error rates in ARLEQUIN with a hierarchical approach were partially because of confounding scenarios where patterns of adaptive variation were contrary to neutral structure; however, ARLEQUIN consistently had highest type I and type II error in all four simulation scenarios tested in this study. Given the results provided here, it is important that outlier loci are interpreted cautiously and error rates of various methods are taken into consideration in studies of adaptive molecular variation, especially when hierarchical structure is included.

10.
Compound tests for the detection of hitchhiking under positive selection   Total citations: 2, self-citations: 0, citations by others: 2
Many statistical tests have been developed for detecting positive selection. Most of these tests draw conclusions based on significant deviations from the patterns of polymorphism predicted by the neutral model. However, many non-equilibrium forces may cause similar deviations, and thus the tests usually have low statistical specificity to positive selection. The main challenge is hence to construct test statistics that are reasonably powerful in detecting positive selection, but are relatively insensitive to other forces. Recently, Zeng et al. (2006) proposed a new test, DH, which is a compound of Tajima's D and Fay and Wu's H, and showed that DH has reasonably high statistical specificity to positive selection. In this report, we expand the idea of a compound test by combining Fay and Wu's H or DH with the Ewens-Watterson (EW) test. We refer to these 2 new tests as HEW and DHEW, respectively. Compared to the DH test, HEW and DHEW are more robust against the presence of recombination, and are also more powerful in detecting positive selection. Furthermore, the DHEW test, similar to DH, is also relatively insensitive to background selection and demography. The HEW test, on the other hand, tends to be somewhat less conservative than DH and DHEW in some cases.
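Illustrative sketch (not taken from the paper): Tajima's D, one component of the DH compound test named above, contrasts the mean number of pairwise differences (π) with Watterson's estimator based on the number of segregating sites (S). The code below uses the standard textbook constants for the variance term. The input format (a list of '0'/'1' haplotype strings) and the function name are assumptions made for illustration, and the compound DH, HEW and DHEW tests themselves are not implemented here.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length '0'/'1' haplotype strings.

    Returns None when D is undefined (fewer than 4 sequences or no
    segregating sites).
    """
    n = len(haplotypes)
    length = len(haplotypes[0])
    # S: number of sites at which more than one allele is observed.
    seg = sum(1 for j in range(length) if len({h[j] for h in haplotypes}) > 1)
    if n < 4 or seg == 0:
        return None

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Standard constants for the variance of (pi - S / a1).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = seg / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


if __name__ == "__main__":
    # Hypothetical sample in which every variant is a singleton.
    print(tajimas_d(["10000", "01000", "00100", "00010", "00001"]))
```

An excess of rare variants, as in the example sample, pushes D below zero, while an excess of intermediate-frequency variants pushes it above zero.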

11.
Detecting positive Darwinian selection at the DNA sequence level has been a subject of considerable interest. However, positive selection is difficult to detect because it often operates episodically on a few amino acid sites, and the signal may be masked by negative selection. Several methods have been developed to test positive selection that acts on given branches (branch methods) or on a subset of sites (site methods). Recently, Yang, Z., and R. Nielsen (2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917) developed likelihood ratio tests (LRTs) based on branch-site models to detect positive selection that affects a small number of sites along prespecified lineages. However, computer simulations suggested that the tests were sensitive to the model assumptions and were unable to distinguish between relaxation of selective constraint and positive selection (Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332-1339). Here, we describe a modified branch-site model and use it to construct two LRTs, called branch-site tests 1 and 2. We applied the new tests to reanalyze several real data sets and used computer simulation to examine the performance of the two tests by examining their false-positive rate, power, and robustness. We found that test 1 was unable to distinguish relaxed constraint from positive selection affecting the lineages of interest, while test 2 had acceptable false-positive rates and appeared robust against violations of model assumptions. As test 2 is a direct test of positive selection on the lineages of interest, it is referred to as the branch-site test of positive selection and is recommended for use in real data analysis. The test appeared conservative overall, but exhibited better power in detecting positive selection than the branch-based test. Bayes empirical Bayes identification of amino acid sites under positive selection along the foreground branches was found to be reliable, but lacked power.

12.
Detection of natural selection operating at the amino acid sequence level is important in the study of molecular evolution. Single-site analysis and one-dimensional window analysis can be used to detect selection when the biological functions of amino acid sites are unknown. Single-site analysis is useful when selection operates more or less constantly over evolutionary time, but less so when selection operates temporarily. One-dimensional window analysis is more sensitive than single-site analysis when the functions of amino acid sites in close proximity in the linear sequence are similar, although this is not always the case. Here I present a three-dimensional window analysis method for detecting selection given the three-dimensional structure of the protein of interest. In the three-dimensional structure, the window is defined as the sphere centered on the α-carbon of an amino acid site. The window size is the radius of the sphere. The sites whose α-carbons are included in the window are grouped for the neutrality test. The window is moved within the three-dimensional structure by sequentially moving the central site along the primary amino acid sequence. To detect positive selection, it may also be useful to group the surface-exposed sites in the window separately. Three-dimensional window analysis appears not only to be more sensitive than single-site analysis and one-dimensional window analysis but also to provide similar specificity for inferring positive selection in the analyses of the hemagglutinin and neuraminidase genes of human influenza A viruses. This method, however, may fail to detect selection when it operates only on a particular site, in which case single-site analysis may be preferred, although a large number of sequences is required.

13.
In this article, we consider the probabilistic identification of amino acid positions that evolve under positive selection as a multiple hypothesis testing problem. The null hypothesis "H0,s: site s evolves under a negative selection or under a neutral process of evolution" is tested at each codon site of the alignment of homologous coding sequences. Standard hypothesis testing is based on the control of the expected proportion of falsely rejected null hypotheses or type-I error rate. As the number of tests increases, however, the power of an individual test may become unacceptably low. Recent advances in statistics have shown that the false discovery rate--in this case, the expected proportion of sites that do not evolve under positive selection among those that are estimated to evolve under this selection regime--is a quantity that can be controlled. Keeping the proportion of false positives low among the significant results generally leads to an increase in power. In this article, we show that controlling the false detection rate is relevant when searching for positively selected sites. We also compare this new approach to traditional methods using extensive simulations.
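Illustrative sketch (not taken from the paper): the abstract does not say which procedure is used to control the false discovery rate, so the code below assumes the widely used Benjamini-Hochberg step-up procedure as one concrete example. The per-site p-values and the function name are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-site p-values from a test for positive selection.
    site_p = [0.04, 0.001, 0.03, 0.5]
    print(benjamini_hochberg(site_p, q=0.05))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking meets its threshold.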

14.
Interleukin-13 (IL13) is believed to play an important role in the pathogenesis of atopy and allergic asthma. To better understand genetic variation at the IL13 locus, we resequenced a 5.1-kb genomic region spanning the entire locus and identified 26 single-nucleotide polymorphisms (SNPs) in 74 individuals from three major populations: Chinese, Caucasian, and African. Our survey suggests exceptionally high and significant geographic structure at the IL13 locus between African and non-African populations. This unusual pattern suggests that positive selection acting in some local populations may have played a role at the IL13 locus. In support of this suggestion, we found a significant excess of high-frequency derived SNPs in both the Chinese and the Caucasian populations, as expected after a recent episode of positive selection. Further, the unusual haplotype structure indicates that positive selection may have acted on the IL13 locus under different scenarios in different populations. In the Caucasian population, the skewed haplotype distribution dominated by one common haplotype supports the hypothesis of simple directional selection. In the Chinese population, by contrast, the two-round hitchhiking hypothesis may explain the skewed haplotype structure with three dominant haplotypes. These findings may provide insight into the likely relative roles of selection and population history in establishing present-day variation at the IL13 locus, and motivate further studies of this locus as an important candidate in common-disease association studies.

15.
Both genetic drift and natural selection cause the frequencies of alleles in a population to vary over time. Discriminating between these two evolutionary forces, based on a time series of samples from a population, remains an outstanding problem with increasing relevance to modern data sets. Even in the idealized situation when the sampled locus is independent of all other loci, this problem is difficult to solve, especially when the size of the population from which the samples are drawn is unknown. A standard χ²-based likelihood-ratio test was previously proposed to address this problem. Here we show that the χ²-test of selection substantially underestimates the probability of type I error, leading to more false positives than indicated by its P-value, especially at stringent P-values. We introduce two methods to correct this bias. The empirical likelihood-ratio test (ELRT) rejects neutrality when the likelihood-ratio statistic falls in the tail of the empirical distribution obtained under the most likely neutral population size. The frequency increment test (FIT) rejects neutrality if the distribution of normalized allele-frequency increments exhibits a mean that deviates significantly from zero. We characterize the statistical power of these two tests for selection, and we apply them to three experimental data sets. We demonstrate that both ELRT and FIT have power to detect selection in practical parameter regimes, such as those encountered in microbial evolution experiments. Our analysis applies to a single diallelic locus, assumed independent of all other loci, which is most relevant to full-genome selection scans in sexual organisms, and also to evolution experiments in asexual organisms as long as clonal interference is weak. Different techniques will be required to detect selection in time series of cosegregating linked loci.

16.
Perspective: detecting adaptive molecular polymorphism: lessons from the MHC   Total citations: 13, self-citations: 0, citations by others: 13
Abstract. In the 1960s, when population geneticists first began to collect data on the amount of genetic variation in natural populations, balancing selection was invoked as a possible explanation for how such high levels of molecular variation are maintained. However, the predictions of the neutral theory of molecular evolution have since become the standard by which cases of balancing selection may be inferred. Here we review the evidence for balancing selection acting on the major histocompatibility complex (MHC) of vertebrates, a genetic system that defies many of the predictions of neutrality. We apply many widely used tests of neutrality to MHC data as a benchmark for assessing the power of these tests. These tests can be categorized as detecting selection in the current generation, over the history of populations, or over the histories of species. We find that selection is not detectable in MHC datasets in every generation, population, or every evolutionary lineage. This suggests either that selection on the MHC is heterogeneous or that many of the current neutrality tests lack sufficient power to detect the selection consistently. Additionally, we identify a potential inference problem associated with several tests of neutrality. We demonstrate that the signals of selection may be generated in a relatively short period of microevolutionary time, yet these signals may take exceptionally long periods of time to be erased in the absence of selection. This is especially true for the neutrality test based on the ratio of nonsynonymous to synonymous substitutions. Inference of the nature of the selection events that create such signals should be approached with caution. However, a combination of tests on different time scales may overcome such problems.

17.
Genome-wide scanning for signals of recent positive selection is essential for a comprehensive and systematic understanding of human adaptation. Here, we present a genomic survey of recent local selective sweeps, especially aimed at those nearly or recently completed. A novel approach was developed for such signals, based on contrasting the extended haplotype homozygosity (EHH) profiles between populations. We applied this method to the genome single nucleotide polymorphism (SNP) data of both the International HapMap Project and Perlegen Sciences, and detected widespread signals of recent local selection across the genome, consisting of both complete and partial sweeps. A challenging problem of genomic scans of recent positive selection is to clearly distinguish selection from neutral effects, given the high sensitivity of the test statistics to departures from neutral demographic assumptions and the lack of a single, accurate neutral model of human history. We therefore developed a new procedure that is robust across a wide range of demographic and ascertainment models, one that indicates that certain portions of the genome clearly depart from neutrality. Simulations of positive selection showed that our tests have high power towards strong selection sweeps that have undergone fixation. Gene ontology analysis of the candidate regions revealed several new functional groups that might help explain some important interpopulation differences in phenotypic traits.

18.
FST outlier tests are a potentially powerful way to detect genetic loci under spatially divergent selection. Unfortunately, the extent to which these tests are robust to nonequilibrium demographic histories has been understudied. We developed a landscape genetics simulator to test the effects of isolation by distance (IBD) and range expansion on FST outlier methods. We evaluated the two most commonly used methods for the identification of FST outliers (FDIST2 and BayeScan, which assume samples are evolutionarily independent) and two recent methods (FLK and Bayenv2, which estimate and account for evolutionary nonindependence). Parameterization with a set of neutral loci (‘neutral parameterization’) always improved the performance of FLK and Bayenv2, while neutral parameterization caused FDIST2 to actually perform worse in the cases of IBD or range expansion. BayeScan was improved when the prior odds on neutrality was increased, regardless of the true odds in the data. On their best performance, however, the widely used methods had high false‐positive rates for IBD and range expansion and were outperformed by methods that accounted for evolutionary nonindependence. In addition, default settings in FDIST2 and BayeScan resulted in many false positives suggesting balancing selection. However, all methods did very well if a large set of neutral loci is available to create empirical P‐values. We conclude that in species that exhibit IBD or have undergone range expansion, many of the published FST outliers based on FDIST2 and BayeScan are probably false positives, but FLK and Bayenv2 show great promise for accurately identifying loci under spatially divergent selection.

19.
Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the familywise error rates (FWERs), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is useable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.

20.
Population differentiation (PD) and ecological association (EA) tests have recently emerged as prominent statistical methods to investigate signatures of local adaptation using population genomic data. Based on statistical models, these genomewide testing procedures have attracted considerable attention as tools to identify loci potentially targeted by natural selection. An important issue with PD and EA tests is that incorrect model specification can generate large numbers of false‐positive associations. Spurious association may indeed arise when shared demographic history, patterns of isolation by distance, cryptic relatedness or genetic background are ignored. Recent works on PD and EA tests have widely focused on improvements of test corrections for those confounding effects. Despite significant algorithmic improvements, there are still a number of open questions on how to check that false discoveries are under control and implement test corrections, or how to combine statistical tests from multiple genome scan methods. This tutorial study provides a detailed answer to these questions. It clarifies the relationships between traditional methods based on allele frequency differentiation and EA methods and provides a unified framework for their underlying statistical tests. We demonstrate how techniques developed in the area of genomewide association studies, such as inflation factors and linear mixed models, benefit genome scan methods and provide guidelines for good practice while conducting statistical tests in landscape and population genomic applications. Finally, we highlight how the combination of several well‐calibrated statistical tests can increase the power to reject neutrality, improving our ability to infer patterns of local adaptation in large population genomic data sets.
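Illustrative sketch (not taken from the paper): an inflation factor of the kind mentioned above is commonly computed as the median of the observed test statistics divided by the median of their assumed null distribution. The code below assumes the statistics follow a chi-square distribution with 1 degree of freedom under neutrality, whose median is about 0.4549. The function names and example values are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median statistic to the null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def recalibrate(chi2_stats):
    """Divide every statistic by the inflation factor so that the
    rescaled median matches the null median."""
    lam = genomic_inflation_factor(chi2_stats)
    return [x / lam for x in chi2_stats]


if __name__ == "__main__":
    # Hypothetical per-locus test statistics from a genome scan.
    stats = [0.2, 0.9098728, 5.0]
    print(genomic_inflation_factor(stats))
    print(recalibrate(stats))
```

A factor close to 1 suggests the statistics are well calibrated. Values clearly above 1 suggest that confounding such as unmodelled population structure is inflating them.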
