Similar articles
 20 similar articles found (search time: 31 ms)
1.
An outstanding question in human genetics has been the degree to which adaptation occurs from standing genetic variation or from de novo mutations. Here, we combine several common statistics used to detect selection in an Approximate Bayesian Computation (ABC) framework, with the goal of discriminating between models of selection and providing estimates of the age of selected alleles and the selection coefficients acting on them. We use simulations to assess the power and accuracy of our method and apply it to seven of the strongest sweeps currently known in humans. We identify two genes, ASPM and PSCA, that are most likely affected by selection on standing variation; and we find three genes, ADH1B, LCT, and EDAR, in which the adaptive alleles seem to have swept from a new mutation. We also confirm evidence of selection for one further gene, TRPV6. In one gene, G6PD, neither neutral models nor models of selective sweeps fit the data, presumably because this locus has been subject to balancing selection.

2.
One of the key steps in positional cloning and marker-aided selection is to identify marker(s) tightly linked to the target gene (i.e., fine mapping). Selective genotyping such as selective recombinant genotyping (SRG) is commonly used in fine mapping for cost-saving. To further decrease genotyping effort and rapidly screen for tightly linked markers, we propose here a combined DNA pooling and SRG strategy. A two-stage pooled genotyping can be used for identifying recombinants between a pair of flanking markers more efficiently, and a joint use of bulked DNA analysis and two-stage pooling can also save cost for genotyping recombinants. The combined DNA pooling and SRG strategy can further be extended to fine mapping for polygenic traits. The numerical results based on hypothetical scenarios and an illustrative application to fine mapping of a mutant gene, called xl(t), in rice suggest that the proposed strategy can remarkably reduce genotyping amount compared with the conventional SRG.

3.
We have evaluated a pooling approach that can reduce the number of polymerase chain reactions in a screen for selective sweeps by more than an order of magnitude. We show that the complex peak pattern that results from pooling of all samples from a given population is a faithful reflection of the composite pattern of the individual alleles, although with an under‐representation of the larger alleles. Candidate loci for selective sweeps can be identified by visual inspection of the pool patterns. We have also implemented a software tool, which can find suitable microsatellite loci in the vicinity of annotated genes.

4.
Due to its cost effectiveness, next-generation sequencing of pools of individuals (Pool-Seq) is becoming a popular strategy for characterizing variation in population samples. Because Pool-Seq provides genome-wide SNP frequency data, it is possible to use them for demographic inference and/or the identification of selective sweeps. Here, we introduce a statistical method that is designed to detect selective sweeps from pooled data by accounting for statistical challenges associated with Pool-Seq, namely sequencing errors and random sampling among chromosomes. This allows for an efficient use of the information: all base calls are included in the analysis, but the higher credibility of regions with higher coverage and base calls with better quality scores is accounted for. Computer simulations show that our method efficiently detects sweeps even at very low coverage (0.5× per chromosome). Indeed, the power of detecting sweeps is similar to what we could expect from sequences of individual chromosomes. Since the inference of selective sweeps is based on the allele frequency spectrum (AFS), we also provide a method to accurately estimate the AFS provided that the quality scores for the sequence reads are reliable. Applying our approach to Pool-Seq data from Drosophila melanogaster, we identify several selective sweep signatures on chromosome X that include some previously well-characterized sweeps like the wapl region.

5.
Genome-wide genotyping of a cohort using pools rather than individual samples has long been proposed as a cost-saving alternative for performing genome-wide association (GWA) studies. However, successful disease gene mapping using pooled genotyping has thus far been limited to detecting common variants with large effect sizes, which tend not to exist for many complex common diseases or traits. Therefore, for DNA pooling to be a viable strategy for conducting GWA studies, it is important to determine whether commonly used genome-wide SNP array platforms such as the Affymetrix 6.0 array can reliably detect common variants of small effect sizes using pooled DNA. Taking obesity and age at menarche as examples of human complex traits, we assessed the feasibility of genome-wide genotyping of pooled DNA as a single-stage design for phenotype association. By individually genotyping the top associations identified by pooling, we obtained a 14- to 16-fold enrichment of SNPs nominally associated with the phenotype, but we likely missed the top true associations. In addition, we assessed whether genotyping pooled DNA can serve as an inexpensive screen as the second stage of a multi-stage design with a large number of samples by comparing the most cost-effective 3-stage designs with 80% power to detect common variants with genotypic relative risk of 1.1, with and without pooling. Given the current state of the specific technology we employed and the associated genotyping costs, we showed through simulation that a design involving pooling would be 1.07 times more expensive than a design without pooling. Thus, while a significant amount of information exists within the data from pooled DNA, our analysis does not support genotyping pooled DNA as a means to efficiently identify common variants contributing small effects to phenotypes of interest. 
While our conclusions were based on the specific technology and study design we employed, the approach presented here will be useful for evaluating the utility of other or future genome-wide genotyping platforms in pooled DNA studies.

6.
Identifying genomic targets of population‐specific positive selection is a major goal in several areas of basic and applied biology. However, it is unclear how often such selection should act on new mutations versus standing genetic variation or recurrent mutation, and furthermore, favoured alleles may either become fixed or remain variable in the population. Very few population genetic statistics are sensitive to all of these modes of selection. Here, we introduce and evaluate the Comparative Haplotype Identity statistic (χMD), which assesses whether pairwise haplotype sharing at a locus in one population is unusually large compared with another population, relative to genomewide trends. Using simulations that emulate human and Drosophila genetic variation, we find that χMD is sensitive to a wide range of selection scenarios, and for some very challenging cases (e.g. partial soft sweeps), it outperforms other two‐population statistics. We also find that, as with FST, our haplotype approach has the ability to detect surprisingly ancient selective sweeps. Particularly for the scenarios resembling human variation, we find that χMD outperforms other frequency‐ and haplotype‐based statistics for soft and/or partial selective sweeps. Applying χMD and other between‐population statistics to published population genomic data from D. melanogaster, we find both shared and unique genes and functional categories identified by each statistic. The broad utility and computational simplicity of χMD will make it an especially valuable tool in the search for genes targeted by local adaptation.

7.
Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest in determining the importance of selection from standing variation in adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and distinguishing sweeps from linked as well as neutrally evolving regions. Moreover, we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus, even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally, we apply S/HIC to the case of resequencing data from human chromosome 18 in a European population sample, and demonstrate that we can reliably recover selective sweeps that have been identified earlier using less specific and sensitive methods.

8.
Gompert Z, Buerkle CA. Genetics 2011, 187(3): 903-917
The demography of populations and natural selection shape genetic variation across the genome and understanding the genomic consequences of these evolutionary processes is a fundamental aim of population genetics. We have developed a hierarchical Bayesian model to quantify genome-wide population structure and identify candidate genetic regions affected by selection. This model improves on existing methods by accounting for stochastic sampling of sequences inherent in next-generation sequencing (with pooled or indexed individual samples) and by incorporating genetic distances among haplotypes in measures of genetic differentiation. Using simulations we demonstrate that this model has a low false-positive rate for classifying neutral genetic regions as selected genes (i.e., ΦST outliers), but can detect recent selective sweeps, particularly when genetic regions in multiple populations are affected by selection. Nonetheless, selection affecting just a single population was difficult to detect and resulted in a high false-negative rate under certain conditions. We applied the Bayesian model to two large sets of human population genetic data. We found evidence of widespread positive and balancing selection among worldwide human populations, including many genetic regions previously thought to be under selection. Additionally, we identified novel candidate genes for selection, several of which have been linked to human diseases. This model will facilitate the population genetic analysis of a wide range of organisms on the basis of next-generation sequence data.

9.
Identifying recent positive selection signatures in domesticated animals could provide information on genome response to strong directional selection from domestication and artificial selection and therefore could help in identifying mutations responsible for improved traits. We used genotyping data generated using Illumina's BovineSNP50 Genotyping BeadChips to identify selection signatures in the Blonde d'Aquitaine breed, a well‐muscled French beef breed. For this purpose, we employed a hidden Markov model‐based test, which detects selection by studying local variations in the allele frequency spectrum along the genome, within a single population. Three regions containing selective sweeps were identified. Annotation of genes located within these regions revealed interesting candidate genes. For example, myostatin (also known as GDF8), a known muscle growth factor inhibitor, is located within the selection signature region found on chromosome 2. In addition, we have identified chromosomal regions that show some evidence of selection within QTL regions for economically important traits. The results of this study could help to better understand the mechanisms related to the selection of the Blonde d'Aquitaine breed.

10.
We identified and examined a candidate gene for local directional selection in Europeans, TRPV6, and conclude that selection has acted on standing genetic variation at this locus, creating parallel soft sweep events in humans. A novel modification of the extended haplotype homozygosity (EHH) test was utilized, which compares EHH for a single allele across populations, to investigate the signature of selection at TRPV6 and neighboring linked loci in published data sets for Europeans, Asians and African-Americans, as well as in newly-obtained sequence data for additional populations. We find that all non-African populations carry a signature of selection on the same haplotype at the TRPV6 locus. The selective footprints, however, are significantly differentiated between non-African populations and estimated to be younger than an ancestral population of non-Africans. The possibility of a single selection event occurring in an ancestral population of non-Africans was tested by simulations and rejected. The putatively-selected TRPV6 haplotype contains three candidate sites for functional differences, namely derived non-synonymous substitutions C157R, M378V and M681T. Potential functional differences between the ancestral and derived TRPV6 proteins were investigated by cloning the ancestral and derived forms, transfecting cell lines, and carrying out electrophysiology experiments via patch clamp analysis. No statistically-significant differences in biophysical channel function were found, although one property of the protein, namely Ca2+-dependent inactivation, may show functionally relevant differences between the ancestral and derived forms. Although the reason for selection on this locus remains elusive, this is the first demonstration of a widespread parallel selection event acting on standing genetic variation in humans, and highlights the utility of between-population EHH statistics.

11.
Adaptation from standing genetic variation or recurrent de novo mutation in large populations should commonly generate soft rather than hard selective sweeps. In contrast to a hard selective sweep, in which a single adaptive haplotype rises to high population frequency, in a soft selective sweep multiple adaptive haplotypes sweep through the population simultaneously, producing distinct patterns of genetic variation in the vicinity of the adaptive site. Current statistical methods were expressly designed to detect hard sweeps and most lack power to detect soft sweeps. This is particularly unfortunate for the study of adaptation in species such as Drosophila melanogaster, where all three confirmed cases of recent adaptation resulted in soft selective sweeps and where there is evidence that the effective population size relevant for recent and strong adaptation is large enough to generate soft sweeps even when adaptation requires mutation at a specific single site at a locus. Here, we develop a statistical test based on a measure of haplotype homozygosity (H12) that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 candidates reveals that in all cases multiple haplotypes are present at high frequencies, consistent with signatures of soft sweeps. We further develop a second haplotype homozygosity statistic (H2/H1) that, in combination with H12, is capable of differentiating hard from soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are much more easily generated by soft rather than hard sweeps. We discuss the implications of these results for the study of adaptation in Drosophila and in species with large census population sizes.
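
The abstract above names H12 and H2/H1 without defining them. The following is a minimal sketch, assuming the commonly cited definitions: H1 is the sum of squared haplotype frequencies in a window, H12 is H1 with the two most frequent haplotypes pooled into one class, and H2 is H1 without the most frequent haplotype. The function name and the toy haplotype labels are hypothetical and are not taken from the authors' software.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes observed in one window.

    Assumed definitions (not stated in the abstract), with p1 >= p2 >= ...
    the sorted haplotype frequencies:
      H1  = sum_i p_i^2
      H12 = (p1 + p2)^2 + sum_{i>2} p_i^2 = H1 + 2 * p1 * p2
      H2  = H1 - p1^2
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Haplotype frequencies, most frequent first.
    freqs = sorted((count / n for count in Counter(haplotypes).values()),
                   reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)
    h12 = h1 + 2.0 * p1 * p2
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1


# Toy windows: one dominant haplotype versus two common haplotypes.
hard_like = ["A"] * 8 + ["B"] * 2
soft_like = ["A"] * 5 + ["B"] * 4 + ["C"]
print(haplotype_homozygosity(hard_like))
print(haplotype_homozygosity(soft_like))
```

In this toy comparison both windows have high H12, but H2/H1 is much larger when two haplotypes are common, which is the contrast the abstract relies on to separate hard from soft sweeps.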

12.
Although the uniparental (or maternal) inheritance of mitochondrial DNA (mtDNA) is widespread, the reasons for its evolution remain unclear. Two main hypotheses have been proposed: selection against individuals containing different mtDNAs (heteroplasmy) and selection against “selfish” mtDNA mutations. Recently, uniparental inheritance was shown to promote adaptive evolution in mtDNA, potentially providing a third hypothesis for its evolution. Here, we explore this hypothesis theoretically and ask if the accumulation of beneficial mutations provides a sufficient fitness advantage for uniparental inheritance to invade a population in which mtDNA is inherited biparentally. In a deterministic model, uniparental inheritance increases in frequency but cannot replace biparental inheritance if only a single beneficial mtDNA mutation sweeps through the population. When we allow successive selective sweeps of mtDNA, however, uniparental inheritance can replace biparental inheritance. Using a stochastic model, we show that a combination of selection and drift facilitates the fixation of uniparental inheritance (compared to a neutral trait) when there is only a single selective mtDNA sweep. When we consider multiple mtDNA sweeps in a stochastic model, uniparental inheritance becomes even more likely to replace biparental inheritance. Our findings thus suggest that selective sweeps of beneficial mtDNA haplotypes can drive the evolution of uniparental inheritance.

13.
Selective genotyping of one or both phenotypic extremes of a population can be used to detect linkage between markers and quantitative trait loci (QTL) in situations in which full-population genotyping is too costly or not feasible, or where the objective is to rapidly screen large numbers of potential donors for useful alleles with large effects. Data may be subjected to 'trait-based' analysis, in which marker allele frequencies are compared between classes of progeny defined based on trait values, or to 'marker-based' analysis, in which trait means are compared between progeny classes defined based on marker genotypes. Here, bidirectional and unidirectional selective genotyping were simulated, using population sizes and selection intensities relevant to cereal breeding. Control of Type I error was usually adequate with marker-based analysis of variance or trait-based testing using the normal approximation of the binomial distribution. Bidirectional selective genotyping was more powerful than unidirectional. Trait-based analysis and marker-based analysis of variance were about equally powerful. With genotyping of the best 30 out of 500 lines (6%), a QTL explaining 15% of the phenotypic variance could be detected with a power of 0.8 when tests were conducted at a marker 10 cM from the QTL. With bidirectional selective genotyping, QTL with smaller effects and (or) QTL farther from the nearest marker could be detected. Similar QTL detection approaches were applied to data from a population of 436 recombinant inbred rice lines segregating for a large-effect QTL affecting grain yield under drought stress. That QTL was reliably detected by genotyping as few as 20 selected lines (4.5%). In experimental populations, selective genotyping can reduce costs of QTL detection, allowing larger numbers of potential donors to be screened for useful alleles with effects across different backgrounds. 
In plant breeding programs, selective genotyping can make it possible to detect QTL using even a limited number of progeny that have been retained after selection.
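
The trait-based test with the normal approximation of the binomial, mentioned in the abstract above, can be sketched as follows for unidirectional selective genotyping. This is an illustration under stated assumptions, not the authors' simulation code: the selected lines are assumed to be inbred lines in which each marker allele is expected at frequency 0.5 when the marker is unlinked to any QTL, and the function name, parameter names, and the count of 24 carriers are hypothetical.

```python
from math import erfc, sqrt


def trait_based_z_test(n_selected, n_with_donor_allele, p0=0.5):
    """Two-sided z-test of the marker allele frequency in the selected tail.

    Uses the normal approximation of the binomial distribution. p0 is the
    allele frequency expected with no marker-QTL linkage (assumed 0.5 here).
    Returns (z, two-sided p-value).
    """
    if n_selected <= 0 or not 0 <= n_with_donor_allele <= n_selected:
        raise ValueError("counts are inconsistent")
    expected = n_selected * p0
    sd = sqrt(n_selected * p0 * (1.0 - p0))
    z = (n_with_donor_allele - expected) / sd
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value


# Hypothetical outcome: 24 of the best 30 lines carry the donor allele.
z, p = trait_based_z_test(30, 24)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For bidirectional selective genotyping the same idea extends to comparing the allele frequencies of the two tails with a two-sample version of the test.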

14.
Disentangling the effects of demography and selection in human history (cited 18 times: 0 self-citations, 18 citations by others)
Demographic events affect all genes in a genome, whereas natural selection has only local effects. Using publicly available data from 151 loci sequenced in both European-American and African-American populations, we attempt to distinguish the effects of demography and selection. To analyze large sets of population genetic data such as this one, we introduce "Perlymorphism," a Unix-based suite of analysis tools. Our analyses show that the demographic histories of human populations can account for a large proportion of effects on the level and frequency of variation across the genome. The African-American population shows both a higher level of nucleotide diversity and more negative values of Tajima's D statistic than does a European-American population. Using coalescent simulations, we show that the significantly negative values of the D statistic in African-Americans and the positive values in European-Americans are well explained by relatively simple models of population admixture and bottleneck, respectively. Working within these nonequilibrium frameworks, we are still able to show deviations from neutral expectations at a number of loci, including ABO and TRPV6. In addition, we show that the frequency spectrum of mutations, corrected for levels of polymorphism, is correlated with recombination rate only in European-Americans. These results are consistent with repeated selective sweeps in non-African populations, in agreement with recent reports using microsatellite data.
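
Since the abstract above turns on the sign of Tajima's D, a short sketch of the statistic may help. It assumes the standard formulation, which the abstract does not spell out: D compares the mean number of pairwise differences with the number of segregating sites scaled by the harmonic number a1, divided by an estimated standard deviation. The function name and the toy alignments are hypothetical, and this is not the Perlymorphism implementation.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (e.g. strings of 0/1)."""
    n = len(seqs)
    if n < 4:
        # For n < 4 the variance constants are zero and D is undefined.
        raise ValueError("need at least 4 sequences")
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(x, y))
             for x, y in combinations(seqs, 2)) / n_pairs
    # Variance constants of the standard formulation.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Intermediate-frequency variants give D > 0; singletons only give D < 0.
print(tajimas_d(["0011", "0011", "1100", "1100"]))
print(tajimas_d(["1000", "0100", "0010", "0001"]))
```

An excess of rare variants, as expected after population growth or a sweep, pushes D negative, while an excess of intermediate-frequency variants, as expected after a bottleneck, pushes it positive, matching the contrast the abstract draws between the two populations.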

15.
Thornton KR, Jensen JD. Genetics 2007, 175(2): 737-750
Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.

16.
Lin K, Li H, Schlötterer C, Futschik A. Genetics 2011, 187(1): 229-244
Summary statistics are widely used in population genetics, but they suffer from the drawback that no simple sufficient summary statistic exists that captures all information required to distinguish different evolutionary hypotheses. Here, we apply boosting, a recent statistical method that combines simple classification rules to maximize their joint predictive performance. We show that our implementation of boosting has a high power to detect selective sweeps. Demographic events, such as bottlenecks, do not result in a large excess of false positives. A comparison shows that our boosting implementation performs well relative to other neutrality tests. Furthermore, we evaluated the relative contribution of different summary statistics to the identification of selection and found that for recent sweeps integrated haplotype homozygosity is very informative whereas older sweeps are better detected by Tajima's π. Overall, Watterson's θ was found to contribute the most information for distinguishing between bottlenecks and selection.

17.
Inferring the mode and tempo of natural selection helps further our understanding of adaptation to past environmental changes. Here, we introduce McSwan, a method to detect and date past and recent natural selection events in the case of a hard sweep. The method is based on the comparison of site frequency spectra obtained under various demographic models that include selection. McSwan demonstrated high power (high sensitivity and specificity) in capturing hard selective sweep events without requiring haplotype phasing. It performed slightly better than SweeD when the recent effective population size was low and the genomic region was small. We then applied our method to a European (CEU) and an African (LWK) human re‐sequencing data set. Most hard sweeps were detected in the CEU population (96%). Moreover, hard sweeps in the African population were estimated to have occurred further back in time (mode: 43,625 years BP) compared to those of Europeans (mode: 24,850 years BP). Most of the estimated ages of hard sweeps in Europeans were associated with the Last Glacial Maximum and were enriched in immunity‐associated genes.

18.
Seed banking (or dormancy) is a widespread bet-hedging strategy, generating a form of population overlap, which decreases the magnitude of genetic drift. The methodological complexity of integrating this trait implies it is ignored when developing tools to detect selective sweeps. But, as dormancy lengthens the ancestral recombination graph (ARG), increasing times to fixation, it can change the genomic signatures of selection. To detect genes under positive selection in seed banking species it is important to (1) determine whether the efficacy of selection is affected, and (2) predict the patterns of nucleotide diversity at and around positively selected alleles. We present the first tree sequence-based simulation program integrating a weak seed bank to examine the dynamics and genomic footprints of beneficial alleles in a finite population. We find that seed banking does not affect the probability of fixation and confirm expectations of increased times to fixation. We also confirm earlier findings that, for strong selection, the times to fixation are not scaled by the inbreeding effective population size in the presence of seed banks, but are shorter than would be expected. As seed banking increases the effective recombination rate, footprints of sweeps appear narrower around the selected sites and due to the scaling of the ARG are detectable for longer periods of time. The developed simulation tool can be used to predict the footprints of selection and draw statistical inference of past evolutionary events in plants, invertebrates, or fungi with seed banks.

19.
Adaptation from de novo mutation can produce so-called soft selective sweeps, where adaptive alleles of independent mutational origin sweep through the population at the same time. Population genetic theory predicts that such soft sweeps should be likely if the product of the population size and the mutation rate toward the adaptive allele is sufficiently large, such that multiple adaptive mutations can establish before one has reached fixation; however, it remains unclear how demographic processes affect the probability of observing soft sweeps. Here we extend the theory of soft selective sweeps to realistic demographic scenarios that allow for changes in population size over time. We first show that population bottlenecks can lead to the removal of all but one adaptive lineage from an initially soft selective sweep. The parameter regime under which such “hardening” of soft selective sweeps is likely is determined by a simple heuristic condition. We further develop a generalized analytical framework, based on an extension of the coalescent process, for calculating the probability of soft sweeps under arbitrary demographic scenarios. Two important limits emerge within this analytical framework: In the limit where population-size fluctuations are fast compared to the duration of the sweep, the likelihood of soft sweeps is determined by the harmonic mean of the variance effective population size estimated over the duration of the sweep; in the opposing slow fluctuation limit, the likelihood of soft sweeps is determined by the instantaneous variance effective population size at the onset of the sweep. We show that as a consequence of this finding the probability of observing soft sweeps becomes a function of the strength of selection. Specifically, in species with sharply fluctuating population size, strong selection is more likely to produce soft sweeps than weak selection. 
Our results highlight the importance of accurate demographic estimates over short evolutionary timescales for understanding the population genetics of adaptation from de novo mutation.
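
The two limits described in the abstract above can be illustrated with a small sketch. In the fast-fluctuation limit the relevant quantity is the harmonic mean of the effective population size over the duration of the sweep; in the slow-fluctuation limit it is the size at the onset of the sweep. The function name and the per-generation sizes below are hypothetical and only show how strongly a brief bottleneck pulls the harmonic mean down.

```python
def harmonic_mean_ne(sizes):
    """Harmonic mean of per-generation effective population sizes."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty list of positive values")
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical trajectory over a 10-generation sweep with one bottleneck.
ne_during_sweep = [10_000] * 9 + [100]

fast_limit_ne = harmonic_mean_ne(ne_during_sweep)  # fast-fluctuation limit
slow_limit_ne = ne_during_sweep[0]                 # size at sweep onset
arithmetic_ne = sum(ne_during_sweep) / len(ne_during_sweep)

print(f"harmonic mean:   {fast_limit_ne:.1f}")
print(f"size at onset:   {slow_limit_ne}")
print(f"arithmetic mean: {arithmetic_ne:.1f}")
```

Because a strongly selected sweep spans fewer generations, it is less likely to overlap a bottleneck, which is one way to read the abstract's claim that strong selection is more likely to produce soft sweeps when population size fluctuates sharply.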

20.
Darvasi A, Soller M. Genetics 1994, 138(4): 1365-1373
Selective genotyping is a method to reduce costs in marker-quantitative trait locus (QTL) linkage determination by genotyping only those individuals with extreme, and hence most informative, quantitative trait values. The DNA pooling strategy (termed "selective DNA pooling") takes this one step further by pooling DNA from the selected individuals at each of the two phenotypic extremes, and basing the test for linkage on marker allele frequencies as estimated from the pooled samples only. This can reduce genotyping costs of marker-QTL linkage determination by up to two orders of magnitude. Theoretical analysis of selective DNA pooling shows that for experiments involving backcross, F2 and half-sib designs, the power of selective DNA pooling for detecting genes with large effect can be the same as that obtained by individual selective genotyping. Power for detecting genes with small effect, however, was found to decrease strongly with increase in the technical error of estimating allele frequencies in the pooled samples. The effect of technical error, however, can be markedly reduced by replication of technical procedures. It is also shown that a proportion selected of 0.1 at each tail will be appropriate for a wide range of experimental conditions.
