Similar articles
20 similar articles found.
1.
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

2.
Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169-SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.

3.

Background

Population differentiation has proved to be effective for identifying loci under geographically localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing high frequency differences. Here, we extend this dataset to include additional variants, survey sites with low levels of differentiation, and evaluate the extent to which highly differentiated sites are likely to result from selective or other processes.

Results

We demonstrate that while sites with low differentiation represent sampling effects rather than balancing selection, sites showing extremely high population differentiation are enriched for positive selection events and that one half may be the result of classic selective sweeps. Among these, we rediscover known examples, where we actually identify the established functional SNP, and discover novel examples including the genes ABCA12, CALD1 and ZNF804, which we speculate may be linked to adaptations in skin, calcium metabolism and defense, respectively.

Conclusions

We identify known and many novel candidate regions for geographically restricted positive selection, and suggest several directions for further research.

4.
Identifying adaptive genetic divergence among populations from genome scans (total citations: 26; self-citations: 0; citations by others: 26)
The identification of signatures of natural selection in genomic surveys has become an area of intense research, stimulated by the increasing ease with which genetic markers can be typed. Loci identified as subject to selection may be functionally important, and hence (weak) candidates for involvement in disease causation. They can also be useful in determining the adaptive differentiation of populations, and exploring hypotheses about speciation. Adaptive differentiation has traditionally been identified from differences in allele frequencies among different populations, summarised by an estimate of FST. Low outliers relative to an appropriate neutral population-genetics model indicate loci subject to balancing selection, whereas high outliers suggest adaptive (directional) selection. However, the problem of identifying statistically significant departures from neutrality is complicated by confounding effects on the distribution of FST estimates, and current methods have not yet been tested in large-scale simulation experiments. Here, we simulate data from a structured population at many unlinked, diallelic loci that are predominantly neutral but with some loci subject to adaptive or balancing selection. We develop a hierarchical-Bayesian method, implemented via Markov chain Monte Carlo (MCMC), and assess its performance in distinguishing the loci simulated under selection from the neutral loci. We also compare this performance with that of a frequentist method, based on moment-based estimates of FST. We find that both methods can identify loci subject to adaptive selection when the selection coefficient is at least five times the migration rate. Neither method could reliably distinguish loci under balancing selection in our simulations, even when the selection coefficient is twenty times the migration rate.
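
To make the FST quantity mentioned in the abstract above concrete, the following is a minimal sketch of the textbook heterozygosity-based definition, FST = (H_T - H_S) / H_T, for a single diallelic locus. It is not the hierarchical-Bayesian or the moment-based estimator described in the abstract: it assumes equally weighted populations and treats the supplied allele frequencies as exact (no sampling correction). The function name `fst_diallelic` is hypothetical.

```python
def fst_diallelic(freqs):
    """Textbook F_ST for one diallelic locus.

    freqs: per-population frequencies of one allele, each in [0, 1].
    Assumes equal population weights and no sampling error (illustrative only).
    """
    if not freqs:
        raise ValueError("need at least one population frequency")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    k = len(freqs)
    p_bar = sum(freqs) / k                            # mean allele frequency
    h_t = 2.0 * p_bar * (1.0 - p_bar)                 # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0                                    # locus is monomorphic overall
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k  # mean within-population heterozygosity
    return (h_t - h_s) / h_t
```

Under this definition, identical frequencies in all populations give 0 and alternately fixed alleles give 1. In an outlier scan of the kind the abstract describes, values that are unusually high or low relative to a neutral expectation would be the candidates for directional or balancing selection, respectively.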

5.
Uniparental reproduction in diploids, via asexual reproduction or selfing, reduces the independence with which separate loci are transmitted across generations. This is expected to increase the extent to which a neutral marker is affected by selection elsewhere in the genome. Such effects have previously been quantified in coalescent models involving selfing. Here we examine the effects of background selection and balancing selection in diploids capable of both sexual and asexual reproduction (i.e., partial asexuality). We find that the effect of background selection on reducing coalescent time (and effective population size) can be orders of magnitude greater when rates of sex are low than when sex is common. This is because asexuality enhances the effects of background selection through both a recombination effect and a segregation effect. We show that there are several reasons that the strength of background selection differs between systems with partial asexuality and those with comparable levels of uniparental reproduction via selfing. Expectations for reductions in Ne via background selection have been verified using stochastic simulations. In contrast to background selection, balancing selection increases the coalescence time for a linked neutral site. With partial asexuality, the effect of balancing selection is somewhat dependent upon the mode of selection (e.g., heterozygote advantage vs. negative frequency-dependent selection) in a manner that does not apply to selfing. This is because the frequency of heterozygotes, which are required for recombination onto alternative genetic backgrounds, is more dependent on the pattern of selection with partial asexuality than with selfing.

6.
Gompert Z, Buerkle CA. Genetics, 2011, 187(3): 903-917
The demography of populations and natural selection shape genetic variation across the genome and understanding the genomic consequences of these evolutionary processes is a fundamental aim of population genetics. We have developed a hierarchical Bayesian model to quantify genome-wide population structure and identify candidate genetic regions affected by selection. This model improves on existing methods by accounting for stochastic sampling of sequences inherent in next-generation sequencing (with pooled or indexed individual samples) and by incorporating genetic distances among haplotypes in measures of genetic differentiation. Using simulations we demonstrate that this model has a low false-positive rate for classifying neutral genetic regions as selected genes (i.e., Φ(ST) outliers), but can detect recent selective sweeps, particularly when genetic regions in multiple populations are affected by selection. Nonetheless, selection affecting just a single population was difficult to detect and resulted in a high false-negative rate under certain conditions. We applied the Bayesian model to two large sets of human population genetic data. We found evidence of widespread positive and balancing selection among worldwide human populations, including many genetic regions previously thought to be under selection. Additionally, we identified novel candidate genes for selection, several of which have been linked to human diseases. This model will facilitate the population genetic analysis of a wide range of organisms on the basis of next-generation sequence data.

7.
Perspective: detecting adaptive molecular polymorphism: lessons from the MHC (total citations: 13; self-citations: 0; citations by others: 13)
In the 1960s, when population geneticists first began to collect data on the amount of genetic variation in natural populations, balancing selection was invoked as a possible explanation for how such high levels of molecular variation are maintained. However, the predictions of the neutral theory of molecular evolution have since become the standard by which cases of balancing selection may be inferred. Here we review the evidence for balancing selection acting on the major histocompatibility complex (MHC) of vertebrates, a genetic system that defies many of the predictions of neutrality. We apply many widely used tests of neutrality to MHC data as a benchmark for assessing the power of these tests. These tests can be categorized as detecting selection in the current generation, over the history of populations, or over the histories of species. We find that selection is not detectable in MHC datasets in every generation, population, or every evolutionary lineage. This suggests either that selection on the MHC is heterogeneous or that many of the current neutrality tests lack sufficient power to detect the selection consistently. Additionally, we identify a potential inference problem associated with several tests of neutrality. We demonstrate that the signals of selection may be generated in a relatively short period of microevolutionary time, yet these signals may take exceptionally long periods of time to be erased in the absence of selection. This is especially true for the neutrality test based on the ratio of nonsynonymous to synonymous substitutions. Inference of the nature of the selection events that create such signals should be approached with caution. However, a combination of tests on different time scales may overcome such problems.

8.
An outstanding question in human genetics has been the degree to which adaptation occurs from standing genetic variation or from de novo mutations. Here, we combine several common statistics used to detect selection in an Approximate Bayesian Computation (ABC) framework, with the goal of discriminating between models of selection and providing estimates of the age of selected alleles and the selection coefficients acting on them. We use simulations to assess the power and accuracy of our method and apply it to seven of the strongest sweeps currently known in humans. We identify two genes, ASPM and PSCA, that are most likely affected by selection on standing variation; and we find three genes, ADH1B, LCT, and EDAR, in which the adaptive alleles seem to have swept from a new mutation. We also confirm evidence of selection for one further gene, TRPV6. In one gene, G6PD, neither neutral models nor models of selective sweeps fit the data, presumably because this locus has been subject to balancing selection.

9.
Balancing selection can maintain immunogenetic variation within host populations, but detecting its signal in a postbottlenecked population is challenging due to the potentially overriding effects of drift. Toll-like receptor genes (TLRs) play a fundamental role in vertebrate immune defence and are predicted to be under balancing selection. We previously characterized variation at TLR loci in the Seychelles warbler (Acrocephalus sechellensis), an endemic passerine that has undergone a historical bottleneck. Five of seven TLR loci were polymorphic, which is in sharp contrast to the low genomewide variation observed. However, standard population genetic statistical methods failed to detect a contemporary signature of selection at any TLR locus. We examined whether the observed TLR polymorphism could be explained by neutral evolution, simulating the population's demography in the software DIYABC. This showed that the posterior distributions of mutation rates had to be unrealistically high to explain the observed genetic variation. We then conducted simulations with an agent-based model using typical values for the mutation rate, which indicated that weak balancing selection has acted on the three TLR genes. The model was able to detect evidence of past selection elevating TLR polymorphism in the prebottleneck populations, but was unable to discern any effects of balancing selection in the contemporary population. Our results show drift is the overriding evolutionary force that has shaped TLR variation in the contemporary Seychelles warbler population, and the observed TLR polymorphisms might be merely the ‘ghost of selection past’. Forecast models predict immunogenetic variation in this species will continue to be eroded in the absence of contemporary balancing selection. Such ‘drift debt’ occurs when a gene pool has not yet reached its new equilibrium level of polymorphism, and this loss could be an important threat to many recently bottlenecked populations.

10.
Existing inference methods for estimating the strength of balancing selection in multi-locus genotypes rely on the assumption that there are no epistatic interactions between loci. Complex systems in which balancing selection is prevalent, such as sets of human immune system genes, are known to contain components that interact epistatically. Therefore, current methods may not produce reliable inference on the strength of selection at these loci. In this paper, we address this problem by presenting statistical methods that can account for epistatic interactions in making inference about balancing selection. A theoretical result due to Fearnhead (2006) is used to build a multi-locus Wright-Fisher model of balancing selection, allowing for epistatic interactions among loci. Antagonistic and synergistic types of interactions are examined. The joint posterior distribution of the selection and mutation parameters is sampled by Markov chain Monte Carlo methods, and the plausibility of models is assessed via Bayes factors. As a component of the inference process, an algorithm to generate multi-locus allele frequencies under balancing selection models with epistasis is also presented. Recent evidence on interactions among a set of human immune system genes is introduced as a motivating biological system for the epistatic model, and data on these genes are used to demonstrate the methods.

11.
There has been much speculation as to what role balancing selection has played in evolution. In an attempt to identify regions, such as HLA, at which polymorphism has been maintained in the human population for millions of years, we scanned the human genome for regions of high SNP density. We found 16 regions that, outside of HLA and ABO, are the most highly polymorphic regions yet described; however, evidence for balancing selection at these sites is notably lacking. Indeed, whole-genome simulations indicate that our findings are expected under neutrality. We propose that (i) because it is rarely stable, long-term balancing selection is an evolutionary oddity, and (ii) when a balanced polymorphism is ancient in origin, the requirements for detection by means of SNP data alone will rarely be met.

12.
Investigation of the diversity of malaria parasite antigens can help prioritize and validate them as vaccine candidates and identify the most common variants for inclusion in vaccine formulations. Studies of vaccine candidates of the most virulent human malaria parasite, Plasmodium falciparum, have focused on a handful of well-known antigens, while several others have never been studied. Here we examine the global diversity and population structure of leading vaccine candidate antigens of P. falciparum using the MalariaGEN Pf3K (version 5.1) resource, comprising more than 2600 genomes from 15 malaria endemic countries. A stringent variant calling pipeline was used to extract high quality antigen gene ‘haplotypes’ from the global dataset and a new R-package named VaxPack was used to streamline population genetic analyses. In addition, a newly developed algorithm that enables spatial averaging of selection pressure on 3D protein structures was applied to the dataset. We analysed the genes encoding 23 leading and novel candidate malaria vaccine antigens including csp, trap, eba175, ama1, rh5, and CelTOS. Our analysis shows that current malaria vaccine formulations are based on rare haplotypes and thus may have limited efficacy against natural parasite populations. High levels of diversity with evidence of balancing selection were detected for most of the erythrocytic and pre-erythrocytic antigens. Measures of natural selection were then mapped to 3D protein structures to predict targets of functional antibodies. For some antigens, geographical variation in the intensity and distribution of these signals on the 3D structure suggests adaptation to different human host or mosquito vector populations. This study provides an essential framework for the diversity of P. falciparum antigens to be considered in the design of the next generation of malaria vaccines.

13.
Genetic adaptation to different environmental conditions is expected to lead to large differences between populations at selected loci, thus providing a signature of positive selection. In contrast, balancing selection can maintain polymorphisms over long evolutionary periods and even across large geographic scales, and thus leads to low levels of divergence between populations at selected loci. However, little is known about the relative importance of these two selective forces in shaping genomic diversity, partly due to difficulties in recognizing balancing selection in species showing low levels of differentiation. Here we address this problem by studying genomic diversity in the European common vole (Microtus arvalis), which presents high levels of differentiation between populations (average FST = 0.31). We studied 3,839 Amplified Fragment Length Polymorphism (AFLP) markers genotyped in 444 individuals from 21 populations distributed across the European continent and hence over different environmental conditions. Our statistical approach to detect markers under selection is based on a Bayesian method specifically developed for AFLP markers, which treats AFLPs as a nearly codominant marker system, and therefore has increased power to detect selection. The high number of screened populations allowed us to detect the signature of balancing selection across a large geographic area. We detected 33 markers potentially under balancing selection, and hence strong evidence of stabilizing selection in 21 populations across Europe. However, our analyses identified four times more markers (138) under positive selection, and geographical patterns suggest that some of these markers are probably associated with alpine regions, which seem to have environmental conditions that favour adaptation. We conclude that despite favourable conditions in this study for the detection of balancing selection, this evolutionary force seems to play a relatively minor role in shaping the genomic diversity of the common vole, which is more influenced by positive selection and neutral processes like drift and demographic history.

14.
Polymorphism of genes in the major histocompatibility complex (MHC) is believed to be maintained by balancing selection. However, direct evidence of selection has proven difficult to demonstrate. In 1994, Satta and colleagues estimated the selection intensity of the human MHC (human leukocyte antigen (HLA)) loci; however, at that time the number of HLA sequences was limited. By comparing five different methods, this study demonstrated the best way to calculate the selection coefficient, through a computer simulation study. Since the study, many HLA nucleotide sequences have been made available. Our new analysis takes advantage of these newly available sequences and compares new estimates with those of the previous study. Generally, our new results are consistent with those of the 1994 study. Our results show that, even after 20 years of exhaustive sequencing of human HLA, the number of dominant HLA alleles, on which our original estimate of selection intensity depended, appears to be conserved. Indeed, according to the frequency distribution for each HLA allele, most sequences in the database were minor or private alleles; therefore, we conclude that the selection intensities of HLA loci are at most 4.4%, even though HLA is a prominent example of loci on which natural selection has been operating.

15.
The factors maintaining sex chromosome meiotic drive, or sex ratio (SR), in natural populations remain uncertain. Coevolution between segregation distortion and modifiers should produce transient SR distortion while selection can result in a stable polymorphism. We hypothesize that if SR is maintained by selection, then phylogenetically related populations should exhibit similar SR frequency and intensity. Furthermore, when drive is present, females should mate with multiple males more often both to ensure fertility and to increase the probability of producing male progeny. In this paper we report on variation in SR frequency and multiple mating among seven populations and three species of stalk-eyed flies, genus Cyrtodiopsis, from southeast Asia. Using a phylogenetic hypothesis based on 1100 bp of mtDNA sequence we find that while sex chromosome meiotic drive is present in all populations of C. whitei and C. dalmanni, the frequency and intensity of drive differ only between populations or species with greater than 4.8% sequence divergence. The frequency of females mating with multiple males is higher in populations with SR. In addition, SR males mate less often, possibly to compensate for sperm depletion. Our results suggest that sex chromosome drive is maintained by balancing selection in populations of C. whitei and C. dalmanni. Nevertheless, coevolution between drive and suppressors deserves further study.

16.
Ecological communities are structured by competitive, predatory, mutualistic and parasitic interactions combined with chance events. Separating deterministic from stochastic processes is possible, but finding statistical evidence for specific biological interactions is challenging. We attempt to solve this problem for ant communities nesting in epiphytic bird’s nest ferns (Asplenium nidus) in Borneo’s lowland rainforest. By recording the frequencies with which each and every single ant species occurred together, we were able to test statistically for patterns associated with interspecific competition. We found evidence for competition, but the resulting co-occurrence pattern was the opposite of what we expected. Rather than detecting species segregation, the classical hallmark of competition, we found species aggregation. Moreover, our approach of testing individual pairwise interactions mostly revealed spatially positive rather than negative associations. Significant negative interactions were only detected among large ants, and among species of the subfamily Ponerinae. Remarkably, the results from this study, and from a corroborating analysis of ant communities known to be structured by competition, suggest that competition within the ants leads to species aggregation rather than segregation. We believe this unexpected result is linked with the displacement of species following asymmetric competition. We conclude that analysing co-occurrence frequencies across complete species assemblages, separately for each species, and for each unique pairwise combination of species, represents a subtle yet powerful way of detecting structure and compartmentalisation in ecological communities.

17.
When selection favours rare alleles over common ones (balancing selection in the form of negative frequency-dependent selection), a locus may maintain a large number of alleles, each at similar frequency. To better understand how allelic richness is generated and maintained at such loci, we assessed 201 sequences of the complementary sex determiner (csd) of the Asian honeybee (Apis cerana), sampled from across its range. Honeybees are haplodiploid; hemizygotes at csd develop as males and heterozygotes as females, while homozygosity is lethal. Thus, csd is under strong negative frequency-dependent selection because rare alleles are less likely to end up in the lethal homozygous form. We find that in A. cerana, as in other Apis, just a few amino acid differences between csd alleles in the hypervariable region are sufficient to trigger female development. We then show that while allelic lineages are spread across geographical regions, allelic differentiation is high between populations, with most csd alleles (86.3%) detected in only one sample location. Furthermore, nucleotide diversity in the hypervariable region indicates an excess of recently arisen alleles, possibly associated with population expansion across Asia since the last glacial maximum. Only the newly invasive populations of the Austral-Pacific share most of their csd alleles. In all, the geographic patterns of csd diversity in A. cerana indicate that high mutation rates and balancing selection act together to produce high rates of allele genesis and turnover at the honeybee sex locus, which in turn leads to its exceptionally high local and global polymorphism.
Subject terms: Evolutionary genetics, Rare variants, Ecological genetics

18.
Identifying adaptively important loci in recently bottlenecked populations, be it natural selection acting on a population following the colonization of novel habitats in the wild or artificial selection during the domestication of a breed, remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.

19.
Genetic adaptation to external stimuli occurs through the combined action of mutation and selection. A central problem in genetics is to identify loci responsive to specific selective constraints. Many tests have been proposed to identify the genomic signatures of natural selection by quantifying the skew in the site frequency spectrum (SFS) under selection relative to neutrality. We build upon recent work that connects many of these tests under a common framework, by describing how selective sweeps affect the scaled SFS. We show that the specific skew depends on many attributes of the sweep, including the selection coefficient and the time under selection. Using supervised learning on extensive simulated data, we characterize the features of the scaled SFS that best separate different types of selective sweeps from neutrality. We develop a test, SFselect, that consistently outperforms many existing tests over a wide range of selective sweeps. We apply SFselect to polymorphism data from a laboratory evolution experiment of Drosophila melanogaster adapted to hypoxia and identify loci that strengthen the role of the Notch pathway in hypoxia tolerance, but were missed by previous approaches. We further apply our test to human data and identify regions that are in agreement with earlier studies, as well as many novel regions.
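
As a rough illustration of the site frequency spectrum that the abstract above builds on, the sketch below tabulates an unfolded SFS from per-site derived-allele counts and then scales entry i by i. The scaling is an assumption about what "scaled SFS" means here: it relies on the standard neutral, constant-size expectation E[xi_i] = theta / i, under which the scaled spectrum is flat, so departures from flatness indicate skew. This is not the SFselect implementation, and the function names are hypothetical.

```python
def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS for n sampled chromosomes.

    derived_counts: derived-allele count at each site (0..n).
    Returns a list of length n - 1 whose entry i - 1 is the number of
    segregating sites at which the derived allele appears i times.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if c < 0 or c > n:
            raise ValueError("derived count outside 0..n")
        if 0 < c < n:              # skip sites that are not segregating
            sfs[c - 1] += 1
    return sfs


def scaled_sfs(sfs):
    """Multiply the i-th class (1-based) by i; flat under the neutral expectation."""
    return [(i + 1) * x for i, x in enumerate(sfs)]
```

For example, with four sampled chromosomes and derived counts 1, 1, 2, 4, 0, 3, the fixed and absent sites are dropped and the remaining four sites fall into frequency classes 1, 2 and 3.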

20.
High genetic load in the Pacific oyster Crassostrea gigas (total citations: 12; self-citations: 0; citations by others: 12)
Launey S, Hedgecock D. Genetics, 2001, 159(1): 255-265
The causes of inbreeding depression and the converse phenomenon of heterosis or hybrid vigor remain poorly understood despite their scientific and agricultural importance. In bivalve molluscs, related phenomena, marker-associated heterosis and distortion of marker segregation ratios, have been widely reported over the past 25 years. A large load of deleterious recessive mutations could explain both phenomena, according to the dominance hypothesis of heterosis. Using inbred lines derived from a natural population of Pacific oysters and classical crossbreeding experiments, we compare the segregation ratios of microsatellite DNA markers at 6 hr and 2-3 months postfertilization in F(2) or F(3) hybrid families. We find evidence for strong and widespread selection against identical-by-descent marker homozygotes. The marker segregation data, when fit to models of selection against linked deleterious recessive mutations and extrapolated to the whole genome, suggest that the wild founders of inbred lines carried a minimum of 8-14 highly deleterious recessive mutations. This evidence for a high genetic load strongly supports the dominance theory of heterosis and inbreeding depression and establishes the oyster as an animal model for understanding the genetic and physiological causes of these economically important phenomena.
