Similar articles
 20 similar articles found (search time: 31 ms)
1.
It remains a central problem in population genetics to infer the past action of natural selection, and these inferences pose a challenge because demographic events will also substantially affect patterns of polymorphism and divergence. Thus it is imperative to explicitly model the underlying demographic history of the population whenever making inferences about natural selection. In light of the considerable interest in adaptation in African populations of Drosophila melanogaster, which are considered ancestral to the species, we generated a large polymorphism data set representing 2.1 Mb from each of 20 individuals from a Ugandan population of D. melanogaster. In contrast to previous inferences of a simple population expansion in eastern Africa, our demographic modeling of this ancestral population reveals a strong signature of a population bottleneck followed by population expansion, which has significant implications for future demographic modeling of derived populations of this species. Taking this more complex underlying demographic history into account, we also estimate a mean X-linked region-wide rate of adaptation of 6 × 10⁻¹¹/site/generation and a mean selection coefficient of beneficial mutations of 0.0009. These inferences regarding the rate and strength of selection are largely consistent with most other estimates from D. melanogaster and indicate a relatively high rate of adaptation driven by weakly beneficial mutations.

2.
Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome data sets from natural populations of this species have been published in recent years. A major challenge is the integration of disparate data sets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic variant caller (SNAPE-pooled). We use this pipeline to generate the largest data repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in >20 countries on four continents. Several of these locations have been sampled at different seasons across multiple years. This data set, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP data set. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource which can be easily extended via future efforts for an even more extensive cosmopolitan data set. Our resource will enable population geneticists to analyze spatiotemporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.

3.
Assessing the extent of linkage disequilibrium (LD) in natural populations of a nonmodel species has been difficult due to the lack of available genomic markers. However, with advances in genotyping and genome sequencing, genomic characterization of natural populations has become feasible. Using sequence data and SNP genotypes, we measured LD and modeled the demographic history of wild canid populations and domestic dog breeds. In 11 gray wolf populations and one coyote population, we find that the extent of LD, as measured by the distance at which r² = 0.2, ranges from <10 kb in outbred populations to >1.7 Mb in populations that have experienced significant founder events and bottlenecks. This large range in the extent of LD parallels that observed in 18 dog breeds, where the distance at which r² = 0.2 varies from ~20 kb to >5 Mb. Furthermore, in modeling demographic history under a composite-likelihood framework, we find that two of five wild canid populations exhibit evidence of a historical population contraction. Five domestic dog breeds display evidence for a minor population contraction during domestication and a more severe contraction during breed formation. Only a 5% reduction in nucleotide diversity was observed as a result of domestication, whereas the loss of nucleotide diversity with breed formation averaged 35%.

4.
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.

5.
6.
Short read sequencing of diploid individuals does not permit the direct inference of the sequence on each of the two homologous chromosomes. Although various phasing software packages exist, they were primarily tailored for and tested on human data, which differ from other species in factors that influence phasing, such as SNP density, amounts of linkage disequilibrium (LD) and sample sizes. Although phasing is becoming increasingly popular for other species, its reliability in non‐human data has not been evaluated to a sufficient extent. We scrutinized the phasing accuracy for Drosophila melanogaster, a species with high polymorphism levels and reduced LD relative to humans. We phased two D. melanogaster populations and compared the results to the known haplotypes. The performance increased with size of the reference panel and was highest when the reference panel and phased individuals were from the same population. Full genomic SNP data and inclusion of sequence read information also improved phasing. Despite humans and Drosophila having similar switch error rates between polymorphic sites, the distances between switch errors were much shorter in Drosophila, with only fragments <300–1500 bp being correctly phased with ≥95% confidence. This suggests that the higher SNP density cannot compensate for the higher recombination rate in D. melanogaster. Furthermore, we show that populations that have gone through demographic events such as bottlenecks can be phased with higher accuracy. Our results highlight that statistically phased data are particularly error prone in species with large population sizes or populations lacking suitable reference panels.

7.
Linkage disequilibrium (LD) is the nonrandom association of alleles at two markers. Patterns of LD have biological implications as well as practical ones when designing association studies or conservation programs aimed at identifying the genetic basis of fitness differences within and among populations. However, the temporal dynamics of LD in wild populations has received little empirical attention. In this study, we examined the overall extent of LD, the effect of sample size on the accuracy and precision of LD estimates, and the temporal dynamics of LD in two populations of bighorn sheep (Ovis canadensis) with different demographic histories. Using over 200 microsatellite loci, we assessed two metrics of multi‐allelic LD, D′ and χ′². We found that both populations exhibited high levels of LD, although the extent was much shorter in a native population than one that was founded via translocation, experienced a prolonged bottleneck post founding, followed by recent admixture. In addition, we observed significant variation in LD in relation to the sample size used, with small sample sizes leading to depressed estimates of the extent of LD but inflated estimates of background levels of LD. In contrast, there was not much variation in LD among yearly cross‐sections within either population once sample size was accounted for. Lack of pronounced interannual variability suggests that researchers may not have to worry about interannual variation when estimating LD in a population and can instead focus on obtaining the largest sample size possible.

8.
Whole-genome resequencing (WGR) is a high-throughput way to determine genomic variations in breeding-related research. Accuracy and sensitivity are two of the most important issues in variant calling from WGR data, especially for samples with low-depth resequencing data, which are used to reduce cost and save time in studies such as surveys of core germplasm from natural populations or genome-based breeding selection in segregating populations. An approach called pooled mapping was developed to call variations from low-depth resequencing data of natural or segregating populations. It is highly accurate and sensitive. First, pooled mapping creates a library of confident polymorphic loci in the genomes of the population; then, the genotypes are called at these confident loci for each sample in an efficient manner. The reliability of this pooled mapping method was confirmed using simulated datasets, real resequencing data and experimental genotyping. With onefold simulated resequencing data, results showed that pooled mapping identified SNPs with high accuracy (99.59%) and sensitivity (93%), compared to the commonly used method (accuracy: 29%; sensitivity: 56%). For the real low-depth resequencing data (≈0.8×) of 281 B. oleracea accessions, four loci corresponding to 1063 sites were selected for KASP genotyping to confirm the performance of pooled mapping. We found that, for all 875 homozygous sites analyzed, pooled mapping achieved an accuracy of 98.24% and a sensitivity of 90.97%. In conclusion, pooled mapping is an efficient means of determining reliable genomic variations with limited resequencing data for population samples. It will be a valuable tool in population genomic analysis and genome-based breeding research.

9.
Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest in determining the importance of selection from standing variation in adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and distinguishing sweeps from linked as well as neutrally evolving regions. Moreover, we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus, even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally, we apply S/HIC to the case of resequencing data from human chromosome 18 in a European population sample, and demonstrate that we can reliably recover selective sweeps that have been identified earlier using less specific and sensitive methods.

10.

Key message

The number of SNPs required for QTL discovery is justified by the distance at which linkage disequilibrium has decayed. Simulations and real potato SNP data showed how to estimate and interpret LD decay.

Abstract

The magnitude of linkage disequilibrium (LD) and its decay with genetic distance determine the resolution of association mapping, and are useful for assessing the desired numbers of SNPs on arrays. To study LD and LD decay in tetraploid potato, we simulated autotetraploid genotypes and used them to explore the dependence on: (1) the number of haplotypes in the population (the amount of genetic variation) and (2) the percentage of haplotype specific SNPs (hs-SNPs). Several estimators for short-range LD were explored, such as the average r², median r², and other percentiles of r² (80, 90, and 95%). For LD decay, we looked at LD½,90, the distance at which the short-range LD is halved when using the 90th percentile of r² at short range as the estimator for LD. Simulations showed that the performance of various estimators for LD decay strongly depended on the number of haplotypes, although the real value of LD decay was not influenced very much by this number. The estimator LD½,90 was chosen to evaluate LD decay in 537 tetraploid varieties. LD½,90 values were 1.5 Mb for varieties released before 1945 and 0.6 Mb in varieties released after 2005. LD½,90 values within three different subpopulations ranged from 0.7 to 0.9 Mb. LD½,90 was 2.5 Mb for introgressed regions, indicating large haplotype blocks. In pericentromeric heterochromatin, LD decay was negligible. This study demonstrates that several related factors influencing LD decay could be disentangled, that no universal approach can be suggested, and that the estimation of LD decay has to be performed with great care and knowledge of the sampled material.
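
The LD½,90 idea described in this abstract can be illustrated with a minimal sketch. It assumes pairwise r² values have already been computed, bins marker pairs by distance, and reports the midpoint of the first bin whose 90th-percentile r² is at most half of the value in the shortest-distance bin. The function names, the fixed-width binning, and the midpoint convention are illustrative assumptions and are not taken from the paper, which may fit a smooth curve instead.

```python
from collections import defaultdict


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def ld_half_decay(distances, r2_values, bin_width, q=90):
    """Hypothetical helper: distance at which the q-th percentile of r2
    has dropped to half of its short-range value, or None if it never does.

    distances and r2_values are parallel, non-empty sequences (one entry
    per marker pair); bin_width uses the same unit as distances.
    """
    # Group r2 values into fixed-width distance bins.
    bins = defaultdict(list)
    for dist, r2 in zip(distances, r2_values):
        bins[int(dist // bin_width)].append(r2)

    ordered_bins = sorted(bins)
    # The shortest-distance bin defines the short-range LD level.
    short_range = percentile(bins[ordered_bins[0]], q)

    for b in ordered_bins[1:]:
        if percentile(bins[b], q) <= short_range / 2:
            return (b + 0.5) * bin_width  # midpoint of the first qualifying bin
    return None
```

With distances in Mb and `bin_width=0.5`, a return value of 1.25 would mean the 90th percentile of r² first falls to half its short-range level in the 1.0 to 1.5 Mb bin. The result is sensitive to the bin width, which is one reason the abstract cautions that no universal approach exists.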

11.
D Gianola, S Qanbari, H Simianer. Heredity 2013, 111(4): 275-285
The analysis of systems involving many loci is important in population and quantitative genetics. An important problem is the study of linkage disequilibrium (LD), a concept relevant in genome-enabled prediction of quantitative traits and in exploration of marker–phenotype associations. This article introduces a new estimator of an LD parameter (ρ²) that is much easier to compute than a maximum likelihood (or Bayesian) estimate of a tetrachoric correlation. We examined the conjecture that the sampling distribution of the estimator of ρ² could be less frequency dependent than that of the estimator of r², a widely used metric for assessing LD. This was done via an empirical evaluation of LD in 806 Holstein–Friesian cattle using 771 single-nucleotide polymorphism (SNP) markers and of HapMap III data on 21 991 SNPs (chromosome 3) observed in 88 unrelated individuals from Tuscany. Also, 1600 haplotypes over a region of 1 Mb simulated under the coalescent were used to estimate LD using the two measures. Subsequently, a simulation study compared the new estimator with that of r² using several scenarios of LD and allelic frequencies. From these studies, it is concluded that ρ² provides a useful metric for the study of LD as the distribution of its estimator is less frequency dependent than that of the standard estimator of r².
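
The frequency dependence of r² mentioned here can be made concrete with the standard two-locus, biallelic definitions: D = p_AB − p_A·p_B, D′ = D / D_max, and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The sketch below implements only these textbook quantities from known haplotype and allele frequencies; it is not the ρ² estimator proposed in the article, and the function name is a hypothetical one chosen for illustration.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b

    # D_max depends on the sign of D and on the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Equal allele frequencies, complete association: r2 can reach 1.
print(ld_stats(0.5, 0.5, 0.5))

# Unequal allele frequencies, B found only with A: D' is still 1,
# but r2 is capped well below 1 by the frequency mismatch.
print(ld_stats(0.1, 0.5, 0.1))
```

In the second call, allele B never occurs without A, yet r² is about 0.11 because the two loci have different allele frequencies. This is the kind of frequency dependence that motivates looking for alternative LD metrics.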

12.
The level of population structure and the extent of linkage disequilibrium (LD) can have large impacts on the power, resolution, and design of genome-wide association studies (GWAS) in plants. Until recently, the topics of LD and population structure have not been explored in oat due to the lack of a high-throughput, high-density marker system. The objectives of this research were to survey the level of population structure and the extent of LD in oat germplasm and determine their implications for GWAS. In total, 1,205 lines and 402 diversity array technology (DArT) markers were used to explore population structure. Principal component analysis and model-based cluster analysis of these data indicated that, for the lines used in this study, relatively weak population structure exists. To explore LD decay, map distances of 2,225 linked DArT marker pairs were compared with LD (estimated as r²). Results showed that LD between linked markers decayed rapidly to r² = 0.2 for marker pairs with a map distance of 1.0 centimorgan (cM). For GWAS, we suggest a minimum of one marker every cM, but higher densities of markers should increase marker-QTL association and therefore detection power. Additionally, it was found that LD was relatively consistent across the majority of germplasm clusters. These findings suggest that GWAS in oat can include germplasm with diverse origins and backgrounds. The results from this research demonstrate the feasibility of GWAS and related analyses in oat.

13.
The pattern of linkage disequilibrium in German Holstein cattle   Total citations: 1, self-citations: 0, citations by others: 1
This study presents a second generation of linkage disequilibrium (LD) map statistics for the whole genome of the Holstein–Friesian population, which has a four times higher resolution compared with that of the maps available so far. We used DNA samples of 810 German Holstein–Friesian cattle genotyped by the Illumina Bovine SNP50K BeadChip to analyse LD structure. A panel of 40 854 (75.6%) markers was included in the final analysis. The pairwise r² statistic of SNPs up to 5 Mb apart across the genome was estimated. A mean value of r² = 0.30 ± 0.32 was observed in pairwise distances of <25 kb and it dropped to 0.20 ± 0.24 at 50–75 kb, which is nearly the average inter‐marker space in this study. The proportion of SNPs in useful LD (r² ≥ 0.25) was 26% for distances of 50 to 75 kb between SNPs. We found a lower level of LD for SNP pairs at distances ≤100 kb than previously thought. Analysis revealed 712 haplo‐blocks spanning 4.7% of the genome and containing 8.0% of all SNPs. Mean and median block length were estimated as 164 ± 117 kb and 144 kb respectively. Allele frequencies of the SNPs have a considerable and systematic impact on the estimate of r². It is shown that minimizing the allele frequency difference between SNPs reduces the influence of frequency on r² estimates. Analysis of past effective population size based on the direct estimates of recombination rates from SNP data showed a decline in effective population size to Ne = 103 up to ~4 generations ago. Systematic effects of marker density and effective population size on observed LD and haplotype structure are discussed.
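
The abstract does not state how effective population size was derived from LD, so the following is only a sketch of one widely used approximation, not the authors' procedure. It assumes Sved's expectation E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the recombination distance between markers in Morgans, and the common rule of thumb that LD at distance c reflects Ne roughly 1 / (2c) generations ago. Corrections for sample size and mutation are ignored, and both function names are hypothetical.

```python
def ne_from_r2(mean_r2, c_morgans):
    """Effective population size implied by E[r2] = 1 / (1 + 4 * Ne * c).

    mean_r2: average r2 for marker pairs separated by c_morgans
    c_morgans: recombination distance in Morgans (must be > 0)
    """
    if not 0 < mean_r2 <= 1:
        raise ValueError("mean_r2 must be in (0, 1]")
    if c_morgans <= 0:
        raise ValueError("c_morgans must be positive")
    return (1.0 / mean_r2 - 1.0) / (4.0 * c_morgans)


def generations_ago(c_morgans):
    """Approximate age (in generations) of the Ne reflected at distance c."""
    if c_morgans <= 0:
        raise ValueError("c_morgans must be positive")
    return 1.0 / (2.0 * c_morgans)


# Illustrative numbers only (not from the study): mean r2 of 0.2 at 0.001 M.
print(ne_from_r2(0.2, 0.001), generations_ago(0.001))
```

Under these assumptions, short marker distances inform on older population sizes and long distances on recent ones, which is why LD measured out to several Mb can speak to Ne only a few generations back.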

14.
Abstract: The northern bobwhite (Colinus virginianus) is an economically important gamebird that is currently undergoing widespread population declines. Despite considerable research on the population ecology of bobwhites, there have been few attempts to model population dynamics of bobwhites to determine the contributions of different demographic parameters to variance of the finite rate of population change (λ). We conducted a literature review and compiled 405 estimates of 9 demographic parameters from 49 field studies of bobwhites. To identify demographic parameters that might be important for management, we used life-stage simulation analyses (LSA) to examine sensitivity of λ to simulated variation in 9 demographic parameters for female bobwhites. In a baseline LSA based on uniform distributions bounded by the range of estimates for each demographic parameter, bobwhite populations were predicted to decline (λ = 0.56) and winter survival of adults made the greatest contribution to variance of λ (r² = 0.453), followed by summer survival of adults (r² = 0.163), and survival of chicks (r² = 0.120). Population change was not sensitive to total clutch laid, nest survival, egg hatchability, or 3 parameters associated with the number of nesting attempts (r² < 0.06). Our conclusions were robust to alternative simulation scenarios, and parameter rankings changed only if we adjusted the lower bounds of winter survival upwards. Bobwhite populations were not viable with survival rates reported from most field studies. Survival rates may be depressed below sustainable levels by environmental conditions or possibly by impacts of capture and telemetry methods. Overall, our simulation results indicate that management practices that improve seasonal survival rates will have the greatest potential benefit for recovery of declining populations of bobwhites.

15.
Examples of clinal variation in phenotypes and genotypes across latitudinal transects have served as important models for understanding how spatially varying selection and demographic forces shape variation within species. Here, we examine the selective and demographic contributions to latitudinal variation through the largest comparative genomic study to date of Drosophila simulans and Drosophila melanogaster, with genomic sequence data from 382 individual fruit flies, collected across a spatial transect of 19 degrees latitude and at multiple time points over 2 years. Consistent with phenotypic studies, we find less clinal variation in D. simulans than D. melanogaster, particularly for the autosomes. Moreover, we find that clinally varying loci in D. simulans are less stable over multiple years than comparable clines in D. melanogaster. D. simulans shows a significantly weaker pattern of isolation by distance than D. melanogaster and we find evidence for a stronger contribution of migration to D. simulans population genetic structure. While population bottlenecks and migration can plausibly explain the differences in stability of clinal variation between the two species, we also observe a significant enrichment of shared clinal genes, suggesting that the selective forces associated with climate are acting on the same genes and phenotypes in D. simulans and D. melanogaster.

16.
A rapid and cost-effective method of sampling hemolymph from the model insect Drosophila melanogaster is needed for studies in several fields, including ionoregulatory physiology, metabolism, immunology and toxicology. Here, we describe the construction and use of a device that uses airflow and pressure to manipulate adult flies and extract high-volume hemolymph samples. This method is rapid and inexpensive, and does not require cold or CO₂ anesthesia at any point in the sampling process, thus avoiding the possible confounding effects of these treatments on the biochemical properties of the hemolymph sampled. To demonstrate one use for this method, we measure active concentrations of Na⁺ and K⁺ in isolated hemolymph droplets from individual adult D. melanogaster using an ion-selective microelectrode technique.

17.
The recurrent fixation of newly arising, beneficial mutations in a species reduces levels of linked neutral variability. Models positing frequent weakly beneficial substitutions or, alternatively, rare, strongly selected substitutions predict similar average effects on linked neutral variability, if the product of the rate and strength of selection is held constant. We propose an approximate Bayesian (ABC) polymorphism-based estimator that can be used to distinguish between these models, and apply it to multi-locus data from Drosophila melanogaster. We investigate the extent to which inference about the strength of selection is sensitive to assumptions about the underlying distributions of the rates of substitution and recombination, the strength of selection, heterogeneity in mutation rate, as well as the population's demographic history. We show that assuming fixed values of selection parameters in estimation leads to overestimates of the strength of selection and underestimates of the rate. We estimate parameters for an African population of D. melanogaster (ŝ∼2E−03, ) and compare these to previous estimates. Finally, we show that surveying larger genomic regions is expected to lend much more discriminatory power to the approach. It will thus be of great interest to apply this method to emerging whole-genome polymorphism data sets in many taxa.

18.
The nonrecombining Drosophila melanogaster Y chromosome is heterochromatic and has few genes. Despite these limitations, there remains ample opportunity for natural selection to act on the genes that are vital for male fertility and on Y factors that modulate gene expression elsewhere in the genome. Y chromosomes of many organisms have low levels of nucleotide variability, but a formal survey of D. melanogaster Y chromosome variation had yet to be performed. Here we surveyed Y-linked variation in six populations of D. melanogaster spread across the globe. We find surprisingly low levels of variability in African relative to Cosmopolitan (i.e., non-African) populations. While the low levels of Cosmopolitan Y chromosome polymorphism can be explained by the demographic histories of these populations, the staggeringly low polymorphism of African Y chromosomes cannot be explained by demographic history. An explanation that is entirely consistent with the data is that the Y chromosomes of Zimbabwe and Uganda populations have experienced recent selective sweeps. Interestingly, the Zimbabwe and Uganda Y chromosomes differ: in Zimbabwe, a European Y chromosome appears to have swept through the population.

19.
Kusakabe S, Mukai T. Genetics 1984, 108(2): 393-408
About 400 second chromosomes were extracted from the Aomori population, a northernmost population of D. melanogaster on Honshu in Japan, and the following experimental results were obtained. (1) The frequency of lethal chromosomes was 0.23. (2) The effective size of the population was estimated to be about 3000, from the allelism rate of lethal chromosomes and their frequency. (3) The detrimental and lethal loads for viability were 0.243 and 0.242, respectively, and the D/L ratio became 1.00. (4) The average degree of dominance for mildly deleterious genes was estimated to be 0.178 ± 0.056. (5) Additive (σ²_A) and dominance (σ²_D) variances of viability were estimated to be 0.00276 ± 0.00090 and 0.00011 ± 0.00014, respectively. (6) There was no significant difference in environmental variances between homozygotes and heterozygotes. Using these estimates, we discuss the maintenance mechanisms of genetic variability of viability in the population. The mutation-selection balance explained these experimental results.

20.
Knowing the distribution of fitness effects (DFE) of new mutations is important for several topics in evolutionary genetics. Existing computational methods with which to infer the DFE based on DNA polymorphism data have frequently assumed that the DFE can be approximated by a unimodal distribution, such as a lognormal or a gamma distribution. However, if the true DFE departs substantially from the assumed distribution (e.g., if the DFE is multimodal), this could lead to misleading inferences about its properties. We conducted simulations to test the performance of parametric and nonparametric discretized distribution models to infer the properties of the DFE for cases in which the true DFE is unimodal, bimodal, or multimodal. We found that lognormal and gamma distribution models can perform poorly in recovering the properties of the distribution if the true DFE is bimodal or multimodal, whereas discretized distribution models perform better. If there is a sufficient amount of data, the discretized models can detect a multimodal DFE and can accurately infer the mean effect and the average fixation probability of a new deleterious mutation. We fitted several models for the DFE of amino acid-changing mutations using whole-genome polymorphism data from Drosophila melanogaster and the house mouse subspecies Mus musculus castaneus. A lognormal DFE best explains the data for D. melanogaster, whereas we find evidence for a bimodal DFE in M. m. castaneus.
