Similar literature
20 similar documents found
1.
Genetic adaptation to external stimuli occurs through the combined action of mutation and selection. A central problem in genetics is to identify loci responsive to specific selective constraints. Many tests have been proposed to identify the genomic signatures of natural selection by quantifying the skew in the site frequency spectrum (SFS) under selection relative to neutrality. We build upon recent work that connects many of these tests under a common framework, by describing how selective sweeps affect the scaled SFS. We show that the specific skew depends on many attributes of the sweep, including the selection coefficient and the time under selection. Using supervised learning on extensive simulated data, we characterize the features of the scaled SFS that best separate different types of selective sweeps from neutrality. We develop a test, SFselect, that consistently outperforms many existing tests over a wide range of selective sweeps. We apply SFselect to polymorphism data from a laboratory evolution experiment of Drosophila melanogaster adapted to hypoxia and identify loci that strengthen the role of the Notch pathway in hypoxia tolerance, but were missed by previous approaches. We further apply our test to human data and identify regions that are in agreement with earlier studies, as well as many novel regions.  相似文献   
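As a rough illustration of the scaled site frequency spectrum mentioned in this abstract, the sketch below counts derived alleles per site and multiplies each frequency class by its index. It assumes complete, polarized 0/1 haplotype data; the function names and the toy input are hypothetical and are not taken from SFselect. Under the standard neutral model the expected count in class i is θ/i, so the scaled spectrum is expected to be flat, and a sweep shows up as a departure from that flat line.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS as a dict {i: number of sites with i derived copies}.

    `haplotypes` is a list of equal-length 0/1 lists (1 = derived allele).
    Only segregating classes i = 1 .. n-1 are reported.
    """
    n = len(haplotypes)
    # zip(*haplotypes) walks the data column by column, i.e. site by site.
    derived_counts = Counter(sum(site) for site in zip(*haplotypes))
    return {i: derived_counts.get(i, 0) for i in range(1, n)}


def scaled_sfs(xi):
    """Scale each class by its index: i * xi_i (flat in expectation under neutrality)."""
    return {i: i * count for i, count in xi.items()}


# Toy data: 4 haplotypes, 4 sites (hypothetical values for illustration only).
toy = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
xi = site_frequency_spectrum(toy)
print(xi, scaled_sfs(xi))
```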

2.
Genome-wide scanning for signals of recent positive selection is essential for a comprehensive and systematic understanding of human adaptation. Here, we present a genomic survey of recent local selective sweeps, especially aimed at those nearly or recently completed. A novel approach was developed for such signals, based on contrasting the extended haplotype homozygosity (EHH) profiles between populations. We applied this method to the genome single nucleotide polymorphism (SNP) data of both the International HapMap Project and Perlegen Sciences, and detected widespread signals of recent local selection across the genome, consisting of both complete and partial sweeps. A challenging problem of genomic scans of recent positive selection is to clearly distinguish selection from neutral effects, given the high sensitivity of the test statistics to departures from neutral demographic assumptions and the lack of a single, accurate neutral model of human history. We therefore developed a new procedure that is robust across a wide range of demographic and ascertainment models, one that indicates that certain portions of the genome clearly depart from neutrality. Simulations of positive selection showed that our tests have high power towards strong selection sweeps that have undergone fixation. Gene ontology analysis of the candidate regions revealed several new functional groups that might help explain some important interpopulation differences in phenotypic traits.  相似文献   

3.
Ferretti L, Raineri E, Ramos-Onsins S. Genetics 2012, 191(4): 1397-1401
Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θ(W), Tajima's D, Fay and Wu's H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.  相似文献   
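For reference, the statistics named in this abstract can be sketched in their standard complete-data form. The code below computes the Watterson estimator and Tajima's D from the sample size n, the number of segregating sites S, and the mean pairwise difference π. It is only the textbook baseline, not the modified missing-data estimators the abstract proposes; the function names are hypothetical.

```python
import math


def segregating_sites_and_pi(haplotypes):
    """Return (S, pi) for complete 0/1 haplotype data (no missing calls)."""
    n = len(haplotypes)
    pairs = n * (n - 1) / 2
    s, pi = 0, 0.0
    for site in zip(*haplotypes):
        k = sum(site)
        if 0 < k < n:                   # site is polymorphic in the sample
            s += 1
            pi += k * (n - k) / pairs   # fraction of pairs that differ here
    return s, pi


def watterson_theta(n, s):
    """Watterson estimator: S divided by the harmonic number a1."""
    a1 = sum(1.0 / i for i in range(1, n))
    return s / a1


def tajimas_d(n, s, pi):
    """Tajima's D with the usual variance constants (Tajima 1989)."""
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(10, 16, 3.888889))
```

A negative value indicates an excess of rare variants relative to the neutral expectation; a positive value indicates an excess of intermediate-frequency variants.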

4.
Previous studies of immunity in wild populations have focused primarily on genes of the major histocompatibility complex (MHC); however, studies of model species have identified additional immune-related genes that also affect fitness. In this study, we sequenced five non-MHC immune genes in six greater prairie-chicken (Tympanuchus cupido) populations that have experienced varying degrees of genetic drift as a consequence of population bottlenecks and fragmentation. We compared patterns of geographic variation at the immune genes with six neutral microsatellite markers to investigate the relative effects of selection and genetic drift. Global F(ST) outlier tests identified positive selection on just one of five immune genes (IAP-1) in one population. In contrast, at other immune genes, standardized G'(ST) values were lower than those at microsatellites for a majority of pairwise population comparisons, consistent with balancing selection or with species-wide positive or purifying selection resulting in similar haplotype frequencies across populations. The effects of genetic drift were also evident as summary statistics (e.g., Tajima's D) did not differ from neutrality for the majority of cases, and immune gene diversity (number of haplotypes per gene) was correlated positively with population size. In summary, we found that both genetic drift and selection shaped variation at the five immune genes, and the strength and type of selection varied among genes. Our results caution that neutral forces, such as drift, can make it difficult to detect current selection on genes.  相似文献   

5.
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.  相似文献   

6.
The power of several neutrality tests to reject a simple bottleneck model is examined in a coalescent framework. Several tests are considered including some relying on the frequency spectrum of mutations and some reflecting the linkage disequilibrium structure of the data. We evaluate the effect of the age and of the strength of the bottleneck, and their interaction. We contrast two qualitatively different bottleneck effects depending on their strength. In genealogical terms, during severe bottlenecks, all lineages coalesce leading to a star-like gene genealogy of the sample. Some time after the bottleneck, once new mutations have arisen, they tend to show an excess of rare variants and a slight excess of haplotypes. On the contrary, more moderate bottlenecks allow several lineages to survive the demographic crash, leading to a balanced genealogy with long internal branches. Soon after the event, data tend to show an excess of intermediate frequency variants and a deficit of haplotypes. We show that for moderate sequencing efforts, severe bottlenecks can be detected only after an intermediate time period has allowed for mutations to occur, preferably by frequency spectrum statistics. Moderate bottlenecks can be more easily detected for more recent events, especially using haplotype statistics. Finally, for a single locus, the bottleneck results closely approximate those of a simple hitchhiking model. The main difference concerns the frequency distribution of mutations and haplotypes after moderate perturbations. Hitchhiking increases the number of rare ancestral mutations and leads to a more predominant major haplotype class. Thus, despite a number of common features between the two processes, hitchhiking cannot be strictly modeled by bottlenecks.  相似文献   

7.
Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution or about the number of adaptive variants currently segregating in nature. Historically, population geneticists have focused attention on the hard-sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population. Recently more attention has been given to soft-sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial. It remains an active and difficult problem, however, to tease apart the telltale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard- and soft-sweep models, here we show that indeed the two might not be separable through the use of simple summary statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We find that a very similar situation arises when using haplotype-based statistics that are aimed at detecting partial or ongoing selective sweeps, such that it is difficult to distinguish the shoulder of a hard sweep from the center of a partial sweep. While knowing the location of the selected site mitigates this problem slightly, we show that stochasticity in signatures of natural selection will frequently cause the signal to reach its zenith far from this site and that this effect is more severe for soft sweeps; thus inferences of the target as well as the mode of positive selection may be inaccurate. In addition, both the time since a sweep ends and biologically realistic levels of allelic gene conversion lead to errors in the classification and identification of selective sweeps. 
This general problem of “soft shoulders” underscores the difficulty in differentiating soft and partial sweeps from hard-sweep scenarios in molecular population genomics data. The soft-shoulder effect also implies that the more common hard sweeps have been in recent evolutionary history, the more prevalent spurious signatures of soft or partial sweeps may appear in some genome-wide scans.  相似文献   

8.
Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC’s performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.  相似文献   

9.
Thornton KR, Jensen JD. Genetics 2007, 175(2): 737-750
Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.  相似文献   

10.
Whether hard sweeps or soft sweeps dominate adaptation has been a matter of much debate. Recently, we developed haplotype homozygosity statistics that (i) can detect both hard and soft sweeps with similar power and (ii) can classify the detected sweeps as hard or soft. The application of our method to population genomic data from a natural population of Drosophila melanogaster (DGRP) allowed us to rediscover three known cases of adaptation at the loci Ace, Cyp6g1, and CHKov1 known to be driven by soft sweeps, and detected additional candidate loci for recent and strong sweeps. Surprisingly, all of the top 50 candidates showed patterns much more consistent with soft rather than hard sweeps. Recently, Harris et al. 2018 criticized this work, suggesting that all the candidate loci detected by our haplotype statistics, including the positive controls, are unlikely to be sweeps at all and that instead these haplotype patterns can be more easily explained by complex neutral demographic models. They also claim that these neutral non-sweeps are likely to be hard instead of soft sweeps. Here, we reanalyze the DGRP data using a range of complex admixture demographic models and reconfirm our original published results suggesting that the majority of recent and strong sweeps in D. melanogaster are first likely to be true sweeps, and second, that they do appear to be soft. Furthermore, we discuss ways to take this work forward given that most demographic models employed in such analyses are necessarily too simple to capture the full demographic complexity, while more realistic models are unlikely to be inferred correctly because they require a large number of free parameters.  相似文献   

11.
Mes TH. Molecular Ecology 2003, 12(6): 1555-1566
Mitochondrial ND4 sequences of populations of four species of parasitic nematodes of livestock were subjected to demographic analyses. Deviation from selective neutrality was detectable using the frequency spectrum of segregating sites and highly negative neutrality statistics. However, the mitochondrial data sets do not comply with the infinite-sites model that underlies these tests, and as a consequence, it was not established whether these features are solely a result of population expansion, or whether aspects of the molecular evolution of these mitochondrial regions are also involved. Coalescent analyses based on Fu's Fs neutrality test, which incorporated estimates of rate heterogeneity, the transition-transversion ratio and nucleotide bias, as well as analyses that are fairly robust to deviations from the infinite-sites model supported population expansion. Also analyses that do not depend on the infinite-sites model suggested historical population expansion of these nematodes. The very similar time since expansion, the absence of signatures of positive selection in ND4 and the logical association with human demography imply that selective sweeps of mitochondrial variants are less probable, and that expansion is the most likely scenario for the parasitic nematodes of livestock. The methods used to characterize the expansion have different assumptions and emphasize different aspects of expansions. The resulting restrictions on the interpretation of expansions are outlined.  相似文献   

12.
Current methods of identifying positively selected regions in the genome are limited in two key ways: the underlying models cannot account for the timing of adaptive events and the comparison between models of selective sweeps and sequence data is generally made via simple summaries of genetic diversity. Here, we develop a tractable method of describing the effect of positive selection on the genealogical histories in the surrounding genome, explicitly modeling both the timing and context of an adaptive event. In addition, our framework allows us to go beyond analyzing polymorphism data via the site frequency spectrum or summaries thereof and instead leverage information contained in patterns of linked variants. Tests on both simulations and a human data example, as well as a comparison to SweepFinder2, show that even with very small sample sizes, our analytic framework has higher power to identify old selective sweeps and to correctly infer both the time and strength of selection. Finally, we derived the marginal distribution of genealogical branch lengths at a locus affected by selection acting at a linked site. This provides a much-needed link between our analytic understanding of the effects of sweeps on sequence variation and recent advances in simulation and heuristic inference procedures that allow researchers to examine the sequence of genealogical histories along the genome.  相似文献   

13.
While it is well understood that the pace of evolution depends on the interplay between natural selection, random genetic drift, mutation, and gene flow, it is not always easy to disentangle the relative roles of these factors with data from natural populations. One popular approach to infer whether the observed degree of population differentiation has been influenced by local adaptation is the comparison of neutral marker gene differentiation (as reflected in FST) and quantitative trait divergence (as reflected in QST). However, this method may lead to compromised statistical power, because FST and QST are summary statistics which neglect information on specific pairs of populations, and because current multivariate tests of neutrality involve an averaging procedure over the traits. Further, most FST-QST comparisons actually replace QST by its expectation over the evolutionary process and are thus theoretically flawed. To overcome these caveats, we derived the statistical distribution of population means generated by random genetic drift and used the probability density of this distribution to test whether the observed pattern could be generated by drift alone. We show that our method can differentiate between genetic drift and selection as a cause of population differentiation even in cases with FST=QST and demonstrate with simulated data that it disentangles drift from selection more accurately than conventional FST-QST tests especially when data sets are small.  相似文献   

14.
Assessing the rate of evolution depends on our ability to detect selection at several genes simultaneously. We summarize DNA sequence variation data in three new and six previously published data sets from the left arm of the second chromosome of Drosophila melanogaster in a population from West Africa, the presumed area of origin of this species. Four loci [Acp26Aa, Fbp2, Vha68-1, and Su(H)] were previously found to deviate from a neutral mutation-drift equilibrium as a consequence of one or several selective sweeps. Polymorphism data from five loci from intervening regions (dpp, Acp26Ab, Acp29AB, GH10711, and Sos) did not show the characteristic deviation from neutrality caused by local selective sweeps. This genomic region is polymorphic for the In(2L)t inversion. Four loci located near inversion breakpoints [dpp, sos, GH10711, and Su(H)] showed significant structuring between the two arrangements or significant deviation from neutrality in the inverted class, probably as a result of a recent shift in inversion frequency. Overall, these patterns of variation suggest that the four selective events were independent. Six loci were observed with no a priori knowledge of selection, and independent selective sweeps were detected in three of them. This suggests that a large part of the D. melanogaster genome has experienced the effect of positive selection in its ancestral African range.  相似文献   

15.
Phenotypic divergences between modern human populations have developed as a result of genetic adaptation to local environments over the past 100,000 years. To identify genes involved in population-specific phenotypes, it is necessary to detect signatures of recent positive selection in the human genome. Although detection of elongated linkage disequilibrium (LD) has been a powerful tool in the field of evolutionary genetics, current LD-based approaches are not applicable to already fixed loci. Here, we report a method of scanning for population-specific strong selective sweeps that have reached fixation. In this method, genome-wide SNP data is used to analyze differences in the haplotype frequency, nucleotide diversity, and LD between populations, using the ratio of haplotype homozygosity between populations. To estimate the detection power of the statistics used in this study, we performed computer simulations and found that these tests are relatively robust against the density of typed SNPs and demographic parameters if the advantageous allele has reached fixation. Therefore, we could determine the threshold for maintaining high detection power, regardless of SNP density and demographic history. When this method was applied to the HapMap data, it was able to identify the candidates of population-specific strong selective sweeps more efficiently than the outlier approach that depends on the empirical distribution. This study, confirming strong positive selection on genes previously reported to be associated with specific phenotypes, also identifies other candidates that are likely to contribute to phenotypic differences between human populations.  相似文献   
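The abstract does not give the exact form of its statistic, so the following is only a generic sketch of a between-population haplotype homozygosity ratio. It assumes phased haplotypes in a fixed window, defines homozygosity as the sum of squared haplotype frequencies, and uses hypothetical function names. A population in which one haplotype has swept to fixation has homozygosity 1, so a large ratio against a comparison population flags a candidate population-specific sweep.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Sum of squared haplotype frequencies within one window."""
    n = len(haplotypes)
    counts = Counter(tuple(h) for h in haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def homozygosity_ratio(pop_a, pop_b):
    """Ratio of window homozygosity in pop_a to that in pop_b.

    Homozygosity is at least 1/n for any non-empty sample, so the
    denominator is never zero.
    """
    return haplotype_homozygosity(pop_a) / haplotype_homozygosity(pop_b)


# Hypothetical window: population A carries a single fixed haplotype,
# population B carries four distinct haplotypes.
pop_a = [[1, 1, 0]] * 4
pop_b = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(homozygosity_ratio(pop_a, pop_b))
```

In a real scan the ratio would be computed window by window along the genome and compared against a threshold calibrated by simulation, as the abstract describes.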

16.
In this report, we investigate the statistical power of several tests of selective neutrality based on patterns of genetic diversity within and between species. The goal is to compare tests based solely on population genetic data with tests using comparative data or a combination of comparative and population genetic data. We show that in the presence of repeated selective sweeps on a relatively neutral background, tests based on the d(N)/d(S) ratios in comparative data almost always have more power to detect selection than tests based on population genetic data, even if the overall level of divergence is low. Tests based solely on the distribution of allele frequencies or the site frequency spectrum, such as the Ewens-Watterson test or Tajima's D, have less power in detecting both positive and negative selection because of the transient nature of positive selection and the weak signal left by negative selection. The Hudson-Kreitman-Aguadé test is the most powerful test for detecting positive selection among the population genetic tests investigated, whereas the McDonald-Kreitman test typically has more power to detect negative selection. We discuss our findings in the light of the discordant results obtained in several recently published genomic scans.

17.
Herbeck JT, Funk DJ, Degnan PH, Wernegreen JJ. Genetics 2003, 165(4): 1651-1660
The obligate endosymbiotic bacterium Buchnera aphidicola shows elevated rates of sequence evolution compared to free-living relatives, particularly at nonsynonymous sites. Because Buchnera experiences population bottlenecks during transmission to the offspring of its aphid host, it is hypothesized that genetic drift and the accumulation of slightly deleterious mutations can explain this rate increase. Recent studies of intraspecific variation in Buchnera reveal patterns consistent with this hypothesis. In this study, we examine inter- and intraspecific nucleotide variation in groEL, a highly conserved chaperonin gene that is constitutively overexpressed in Buchnera. Maximum-likelihood estimates of nonsynonymous substitution rates across Buchnera species are strikingly low at groEL compared to other loci. Despite this evidence for strong purifying selection on groEL, our intraspecific analysis of this gene documents reduced synonymous polymorphism, elevated nonsynonymous polymorphism, and an excess of rare alleles relative to the neutral expectation, as found in recent studies of other Buchnera loci. Comparisons with Escherichia coli generally show patterns predicted by their differences in N(e). The sum of these observations is not expected under relaxed or balancing selection, selective sweeps, or increased mutation rate. Rather, they further support the hypothesis that drift is an important force driving accelerated protein evolution in this obligate mutualist.  相似文献   

18.
Identifying genomic targets of population‐specific positive selection is a major goal in several areas of basic and applied biology. However, it is unclear how often such selection should act on new mutations versus standing genetic variation or recurrent mutation, and furthermore, favoured alleles may either become fixed or remain variable in the population. Very few population genetic statistics are sensitive to all of these modes of selection. Here, we introduce and evaluate the Comparative Haplotype Identity statistic (χMD), which assesses whether pairwise haplotype sharing at a locus in one population is unusually large compared with another population, relative to genomewide trends. Using simulations that emulate human and Drosophila genetic variation, we find that χMD is sensitive to a wide range of selection scenarios, and for some very challenging cases (e.g. partial soft sweeps), it outperforms other two‐population statistics. We also find that, as with FST, our haplotype approach has the ability to detect surprisingly ancient selective sweeps. Particularly for the scenarios resembling human variation, we find that χMD outperforms other frequency‐ and haplotype‐based statistics for soft and/or partial selective sweeps. Applying χMD and other between‐population statistics to published population genomic data from D. melanogaster, we find both shared and unique genes and functional categories identified by each statistic. The broad utility and computational simplicity of χMD will make it an especially valuable tool in the search for genes targeted by local adaptation.  相似文献   

19.
Achaz G. Genetics 2008, 179(3): 1409-1424
Many data sets one could use for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We thus show that in the presence of those errors, estimators of θ can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of data sets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these errors, we propose two new estimators of θ that ignore singletons as well as two new tests Y and Y* that can be used to test neutrality despite sequencing errors. All in all, we show that even though singletons are ignored, these new tests show some power to detect deviations from a standard neutral model. We therefore advise the use of these new tests to strengthen conclusions in suspicious data sets.
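One way to see how an estimator can ignore singletons is to start from the neutral expectation E[ξ_i] = θ/i for the unfolded spectrum. Summing over classes i = 2 .. n-1 gives θ(a_n − 1), where a_n is the harmonic number used in the Watterson estimator. The sketch below applies that identity; it is an illustration under these assumptions and is not claimed to be the exact estimator defined in the paper. The function name and input format are hypothetical.

```python
def theta_without_singletons(xi, n):
    """Watterson-style estimate of theta that discards the singleton class.

    `xi` maps derived-allele count i (1 .. n-1) to the number of sites in
    that class (unfolded spectrum); `n` is the number of sampled sequences.
    """
    if n < 3:
        raise ValueError("need at least 3 sequences to have a non-singleton class")
    a_n = sum(1.0 / i for i in range(1, n))
    # Sites in classes 2 .. n-1; the singleton class (i = 1) is skipped
    # because it is where most sequencing errors are assumed to land.
    non_singleton_sites = sum(count for i, count in xi.items() if 2 <= i <= n - 1)
    return non_singleton_sites / (a_n - 1.0)


# Hypothetical spectrum for n = 4 with an inflated singleton class.
print(theta_without_singletons({1: 5, 2: 2, 3: 1}, 4))
```

Because the singleton class carries a large share of the neutral signal, dropping it costs some statistical power, which matches the abstract's remark that the new tests retain "some power" rather than full power.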

20.
There is currently large interest in distinguishing the signatures of genetic variation produced by demographic events from those produced by natural selection. We propose a simple multilocus statistical test to identify candidate sites of selective sweeps with high power. The test is based on the variability profile measured in an array of linked microsatellites. We also show that the analysis of flanking markers drastically reduces the number of false positives among the candidates that are identified in a genomewide survey of unlinked loci and find that this property is maintained in many population-bottleneck scenarios. However, for a certain range of intermediately severe population bottlenecks we find genomic signatures that are very similar to those produced by a selective sweep. While in these worst-case scenarios the power of the proposed test remains high, the false-positive rate reaches values close to 50%. Hence, selective sweeps may be hard to identify even if multiple linked loci are analyzed. Nevertheless, the integration of information from multiple linked loci always leads to a considerable reduction of the false-positive rate compared to a genome scan of unlinked loci. We discuss the application of this test to experimental data from Drosophila melanogaster.  相似文献   
