Similar articles
20 similar articles found.
1.
Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin ("assignment tests"). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high F_ST), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower for simulated data sets compared to k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0-2.8% lower error rates). The performance of each machine learning classifier improved relative to likelihood estimators for empirical data sets, suggesting an ability to "learn" and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to the population of origin due to the intricacies in development and evaluation of artificial neural networks.
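
The likelihood-based assignment test mentioned in this abstract can be illustrated with a minimal sketch. It assumes Hardy-Weinberg proportions within each population and independent loci; the function names, the frequency floor for unseen alleles, and the toy allele frequencies are all hypothetical and are not taken from the study.

```python
import math


def genotype_log_likelihood(genotype, allele_freqs, floor=1e-3):
    """Log-likelihood of a diploid multilocus genotype in one population.

    genotype: list of (allele1, allele2) tuples, one per locus.
    allele_freqs: list of dicts (one per locus) mapping allele -> frequency.
    floor: hypothetical minimum frequency so unseen alleles do not give log(0).
    """
    total = 0.0
    for (a1, a2), freqs in zip(genotype, allele_freqs):
        p = max(freqs.get(a1, 0.0), floor)
        q = max(freqs.get(a2, 0.0), floor)
        # Hardy-Weinberg genotype probability: p^2 for homozygotes, 2pq otherwise.
        prob = p * p if a1 == a2 else 2.0 * p * q
        total += math.log(prob)
    return total


def assign(genotype, populations):
    """Return the name of the population with the highest genotype likelihood."""
    return max(
        populations,
        key=lambda name: genotype_log_likelihood(genotype, populations[name]),
    )


# Toy allele frequencies for two populations at two loci (illustrative only).
populations = {
    "north": [{"A": 0.9, "B": 0.1}, {"C": 0.8, "D": 0.2}],
    "south": [{"A": 0.2, "B": 0.8}, {"C": 0.3, "D": 0.7}],
}
individual = [("A", "A"), ("C", "D")]
assigned = assign(individual, populations)
```

Working in log space keeps the product over many loci from underflowing, which matters once the number of loci grows beyond the handful used here.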

2.
Important aspects of population evolution have been investigated using nucleotide sequences. Under the neutral Wright–Fisher model, the scaled mutation rate represents twice the average number of new mutations per generation and it is one of the key parameters in population genetics. In this study, we present various methods of estimation of this parameter, analytical studies of their asymptotic behavior as well as comparisons of the distributional behavior of these estimators through simulations. As knowledge of the genealogy is needed to compute the maximum likelihood estimator (MLE), an application with real data is also presented, using jackknife to correct the bias of the MLE, which can be generated by the estimation of the tree. We proved analytically that Watterson's estimator and the MLE are asymptotically equivalent with the same rate of convergence to normality. Furthermore, we showed that the MLE has a better rate of convergence than Watterson's estimator for values of the parameter greater than one and this relationship is reversed when the parameter is less than one.

3.
Molecular phylogenetics has revolutionized the study of not only evolution but also disparate fields such as genomics, bioinformatics, epidemiology, ecology, microbiology, molecular biology and biochemistry. Particularly significant are its achievements in population genetics as a result of the development of coalescent theory, which have contributed to more accurate model-based parameter estimation and explicit hypothesis testing. The study of the evolution of many microorganisms, and HIV in particular, has benefited from these new methodologies. HIV is well suited for such sophisticated population analyses because of its large population sizes, short generation times, high substitution rates and relatively small genomes. All these factors make HIV an ideal and fascinating model to study molecular evolution in real time. Here we review the significant advances made in HIV evolution through the application of phylogenetic approaches. We first examine the relative roles of mutation and recombination on the molecular evolution of HIV and its adaptive response to drug therapy and tissue allocation. We then review some of the fundamental questions in HIV evolution in relation to its origin and diversification and describe some of the insights gained using phylogenies. Finally, we show how phylogenetic analysis has advanced our knowledge of HIV dynamics (i.e., phylodynamics).

4.
Weissman DB, Feldman MW, Fisher DS. Genetics. 2010;186(4):1389-1410.
Biological traits result in part from interactions between different genetic loci. This can lead to sign epistasis, in which a beneficial adaptation involves a combination of individually deleterious or neutral mutations; in this case, a population must cross a "fitness valley" to adapt. Recombination can assist this process by combining mutations from different individuals or retard it by breaking up the adaptive combination. Here, we analyze the simplest fitness valley, in which an adaptation requires one mutation at each of two loci to provide a fitness benefit. We present a theoretical analysis of the effect of recombination on the valley-crossing process across the full spectrum of possible parameter regimes. We find that low recombination rates can speed up valley crossing relative to the asexual case, while higher recombination rates slow down valley crossing, with the transition between the two regimes occurring when the recombination rate between the loci is approximately equal to the selective advantage provided by the adaptation. In large populations, if the recombination rate is high and selection against single mutants is substantial, the time to cross the valley grows exponentially with population size, effectively meaning that the population cannot acquire the adaptation. Recombination at the optimal (low) rate can reduce the valley-crossing time by up to several orders of magnitude relative to that in an asexual population.

5.
Vasco DA. Genetics. 2008;179(2):951-963.
The estimation of ancestral and current effective population sizes in expanding populations is a fundamental problem in population genetics. Recently it has become possible to scan entire genomes of several individuals within a population. These genomic data sets can be used to estimate basic population parameters such as the effective population size and population growth rate. Full-data-likelihood methods potentially offer a powerful statistical framework for inferring population genetic parameters. However, for large data sets, computationally intensive methods based upon full-likelihood estimates may encounter difficulties. First, the computational method may be prohibitively slow or difficult to implement for large data. Second, estimation bias may markedly affect the accuracy and reliability of parameter estimates, as suggested from past work on coalescent methods. To address these problems, a fast and computationally efficient least-squares method for estimating population parameters from genomic data is presented here. Instead of modeling genomic data using a full likelihood, this new approach uses an analogous function, in which the full data are replaced with a vector of summary statistics. Furthermore, these least-squares estimators may show significantly less estimation bias for growth rate and genetic diversity than a corresponding maximum-likelihood estimator for the same coalescent process. The least-squares statistics also scale up to genome-sized data sets with many nucleotides and loci. These results demonstrate that least-squares statistics will likely prove useful for nonlinear parameter estimation when the underlying population genomic processes have complex evolutionary dynamics involving interactions between mutation, selection, demography, and recombination.

6.
In many instances, there are large sex differences in mutation rates, recombination rates, selection, rates of gene flow, and genetic drift. Mutation rates are often higher in males, a difference that has been estimated both directly and indirectly. The higher male mutation rate appears related to the larger number of cell divisions in male lineages but mutation rates also appear gene- and organism-specific. When there is recombination in only one sex, it is always the homogametic sex. When there is recombination in both sexes, females often have higher recombination but there are many exceptions. There are a number of hypotheses to explain the sex differences in recombination. Sex-specific differences in selection may result in stable polymorphisms or, for sex chromosomes, faster evolutionary change. In addition, sex-dependent selection may result in antagonistic pleiotropy or sexually antagonistic genes. There are many examples of sex-specific differences in gene flow (dispersal) and a number of adaptive explanations for these differences. The overall effective population size (genetic drift) is dominated by the lower sex-specific effective population size. The mean of the mutation, recombination, and gene flow rates over the two sexes can be used in a population genetics context unless there are sex-specific differences in selection or genetic drift. Sex-specific differences in these evolutionary factors appear to be unrelated to each other. The evolutionary explanations for sex-specific differences for each factor are multifaceted and, in addition, explanations may include chance, nonadaptive differences, or mechanistic, nonevolutionary factors.
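
The statement that the overall effective population size is dominated by the rarer sex can be made concrete with a short sketch. It uses the standard textbook formula for autosomal loci with unequal numbers of breeding males and females, N_e = 4·N_m·N_f / (N_m + N_f); the formula is an assumption brought in for illustration and is not quoted from the abstract, and the function name is hypothetical.

```python
def effective_size_two_sexes(n_males, n_females):
    """Autosomal effective population size with unequal numbers of breeders.

    Uses the textbook relation N_e = 4 * N_m * N_f / (N_m + N_f).
    """
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes need at least one breeder")
    return 4.0 * n_males * n_females / (n_males + n_females)


# Equal sex ratio: N_e equals the total number of breeders.
balanced = effective_size_two_sexes(50, 50)

# Strongly skewed sex ratio: N_e stays close to 4x the rarer sex,
# however many individuals of the common sex are added.
skewed = effective_size_two_sexes(10, 1000)
```

With 10 breeding males and 1000 breeding females, the result stays just under 40, which is four times the number of males, so adding more females barely changes it.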

7.
Here, I provide the first direct estimate of the spontaneous mutation rate in an Old World monkey, using a seven-individual, three-generation pedigree of African green monkeys. Eight de novo mutations were identified within ~1.5 Gbp of accessible genome, corresponding to an estimated point mutation rate of 0.94 × 10⁻⁸ per site per generation, suggesting an effective population size of ~12000 for the species. This estimation represents a significant improvement in our knowledge of the population genetics of the African green monkey, one of the most important nonhuman primate models in biomedical research. Furthermore, by comparing mutation rates in Old World monkeys with the only other direct estimates in primates to date (humans and chimpanzees), it is possible to uniquely address how mutation rates have evolved over longer time scales. While the estimated spontaneous mutation rate for African green monkeys is slightly lower than the rate of 1.2 × 10⁻⁸ per base pair per generation reported in chimpanzees, it is similar to the lower range of rates of 0.96 × 10⁻⁸ to 1.28 × 10⁻⁸ per base pair per generation recently estimated from whole genome pedigrees in humans. This result suggests a long-term constraint on mutation rate that is quite different from similar evidence pertaining to recombination rate evolution in primates.
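
The step from a per-generation mutation rate to an effective population size usually goes through the neutral diploid relation θ = 4·N_e·μ. The sketch below shows that conversion as an illustration only: the abstract does not report the diversity estimate it used, so the value of `theta_per_site` is a hypothetical placeholder chosen to land near the reported N_e, and the function name is likewise an assumption.

```python
def effective_size_from_diversity(theta_per_site, mutation_rate):
    """Solve theta = 4 * N_e * mu for N_e (diploid, neutral sites)."""
    if mutation_rate <= 0:
        raise ValueError("mutation rate must be positive")
    return theta_per_site / (4.0 * mutation_rate)


mu = 0.94e-8             # per site per generation, as reported in the abstract
theta_per_site = 4.5e-4  # hypothetical per-site diversity, NOT from the abstract

n_e = effective_size_from_diversity(theta_per_site, mu)
```

Because N_e scales inversely with μ, any uncertainty in a mutation rate estimated from only eight de novo events carries over proportionally to the inferred effective size.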

8.
Data from HIV and from human neoplastic cells can show substantial between-lineage mutation rate variation even within a single population. Such variation may affect estimators of population quantities such as Theta = 4N(e)mu. Using simulated DNA data, I measured the effect of rate variation on recovery of Theta by the summary-statistic estimator of Watterson (Watterson GA. 1975. On the number of segregating sites in genetical systems without recombination. Theor Popul Biol. 7:256-276) and the coalescent maximum likelihood algorithm LAMARC (Kuhner MK. 2006. LAMARC 2.0: maximum likelihood and Bayesian estimation of population parameters. Bioinformatics. Advance Access doi: 10.1093/bioinformatics/btk051). Watterson's estimator showed a downward bias, as expected, with high values of Theta. LAMARC's mean estimate was accurate for all tested values of Theta and rate variation except for a downward bias when rate variation was maximal (i.e., the slow rate was zero). LAMARC had consistently narrower confidence intervals (CIs) than Watterson's estimator. Both methods tended to reject the truth too often when rate variation was 8x or greater and independent among branches, as well as when variation was 4x or greater and correlated among branches. In the case of Watterson's estimate, this excess rejection was fully attributable to variation among genealogies in the amount of total branch length associated with the fast and slow rates. However, in the case of LAMARC, some excess rejection was still observed even when between-genealogy variation was taken into account. Both estimators are robust to modest rate variation; however, their use should be coupled with a statistical test to rule out extreme rate variation as the resulting CIs may not be reliable.
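
Watterson's summary-statistic estimator, referred to in this abstract, divides the number of segregating sites S by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences. The sketch below is a minimal illustration under the infinite-sites assumption; the function names and the toy alignment are hypothetical and unrelated to the simulations in the study.

```python
def count_segregating_sites(alignment):
    """Count alignment columns in which more than one nucleotide is present."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def watterson_theta(segregating_sites, sample_size):
    """Watterson's estimator: theta_W = S / a_n, a_n = sum_{i=1}^{n-1} 1/i."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / a_n


# Toy alignment of four sequences (illustrative only).
alignment = [
    "ACGTACGT",
    "ACGTACGA",
    "ACGAACGT",
    "ACGTACGT",
]
s = count_segregating_sites(alignment)
theta_w = watterson_theta(s, len(alignment))  # per-locus estimate
theta_w_per_site = theta_w / len(alignment[0])
```

The estimator uses only the count of variable sites and ignores allele frequencies, which is why it is cheap to compute compared with a full coalescent likelihood.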

9.
Nowadays, the population genetics analysis of autopolyploid species faces many difficulties due to (i) limited development of population genetics tools under polysomic inheritance, (ii) difficulties in assessing allelic dosage when genotyping individuals and (iii) a form of inbreeding resulting from the mechanism of ‘double reduction’. Consequently, few data analysis computer programs are applicable to autopolyploids. To help bridge this gap, this article first derives theoretical expectations for the inbreeding and identity disequilibrium coefficients under polysomic inheritance in a mixed mating model. Moment estimators of these coefficients are proposed when exact genotypes or just marker phenotypes (i.e. allelic dosage unknown) are available. This led to the development of estimators of the selfing rate based on adult genotypes or phenotypes and applicable to any even-ploidy level. Their statistical performances and robustness were assessed by numerical simulations. Contrary to inbreeding-based estimators, the identity disequilibrium-based estimator using phenotypes is robust (absolute bias generally < 0.05), even in the presence of double reduction, null alleles or biparental inbreeding due to isolation by distance. A fairly good precision of the selfing rate estimates (root mean squared error < 0.1) is already achievable using a sample of 30–50 individuals phenotyped at 10 loci bearing 5–10 alleles each, conditions reachable using microsatellite markers. Diallelic markers (e.g. SNP) can also perform satisfactorily in diploids and tetraploids but more polymorphic markers are necessary for higher ploidy levels. The method is implemented in the software SPAGeDi and should help reduce the lack of population genetics tools applicable to autopolyploids.

10.
Rates of recombination vary considerably between species. Despite the significance of this observation for evolutionary biology and genetics, the evolutionary mechanisms that contribute to these interspecific differences are unclear. On fine physical scales, recombination rates appear to evolve rapidly between closely related species, but the mode and tempo of recombination rate evolution on the broader scale is poorly understood. Here, we use phylogenetic comparative methods to begin to characterize the evolutionary processes underlying average genomic recombination rates in mammals. We document a strong phylogenetic effect in recombination rates, indicating that more closely related species tend to have more similar average rates of recombination. We demonstrate that this phylogenetic signal is not an artifact of errors in recombination rate estimation and show that it is robust to uncertainty in the mammalian phylogeny. Neutral evolutionary models present good fits to the data and we find no evidence for heterogeneity in the rate of evolution in recombination across the mammalian tree. These results suggest that observed interspecific variation in average genomic rates of recombination is largely attributable to the steady accumulation of neutral mutations over evolutionary time. Although single recombination hotspots may live and die on short evolutionary time scales, the strong phylogenetic signal in genomic recombination rates indicates that the pace of evolution on this scale may be considerably slower.

11.
Cutter AD. Genetics. 2008;178(3):1661-1672.
Natural selection and neutral processes such as demography, mutation, and gene conversion all contribute to patterns of polymorphism within genomes. Identifying the relative importance of these varied components in evolution provides the principal challenge for population genetics. To address this issue in the nematode Caenorhabditis remanei, I sampled nucleotide polymorphism at 40 loci across the X chromosome. The site-frequency spectrum for these loci provides no evidence for population size change, and one locus presents a candidate for linkage to a target of balancing selection. Selection for codon usage bias leads to the non-neutrality of synonymous sites, and despite its weak magnitude of effect (N(e)s approximately 0.1), is responsible for profound patterns of diversity and divergence in the C. remanei genome. Although gene conversion is evident for many loci, biased gene conversion is not identified as a significant evolutionary process in this sample. No consistent association is observed between synonymous-site diversity and linkage-disequilibrium-based estimators of the population recombination parameter, despite theoretical predictions about background selection or widespread genetic hitchhiking, but genetic map-based estimates of recombination are needed to rigorously test for a diversity-recombination relationship. Coalescent simulations also illustrate how a spurious correlation between diversity and linkage-disequilibrium-based estimators of recombination can occur, due in part to the presence of unbiased gene conversion. These results illustrate the influence that subtle natural selection can exert on polymorphism and divergence, in the form of codon usage bias, and demonstrate the potential of C. remanei for detecting natural selection from genomic scans of polymorphism.

12.
There has been considerable recent interest in understanding the way in which recombination rates vary over small physical distances, and the extent of recombination hotspots, in various genomes. Here we adapt, apply, and assess the power of recently developed coalescent-based approaches to estimating recombination rates from sequence polymorphism data. We apply full-likelihood estimation to study rate variation in and around a well-characterized recombination hotspot in humans, in the beta-globin gene cluster, and show that it provides similar estimates, consistent with those from sperm studies, from two populations deliberately chosen to have different demographic and selectional histories. We also demonstrate how approximate-likelihood methods can be used to detect local recombination hotspots from genomic-scale SNP data. In a simulation study based on 80 100-kb regions, these methods detect 43 out of 60 hotspots (ranging from 1 to 2 kb in size), with only two false positives out of 2000 subregions that were tested for the presence of a hotspot. Our study suggests that new computational tools for sophisticated analysis of population diversity data are valuable for hotspot detection and fine-scale mapping of local recombination rates.

13.
We consider the estimation of the scaled mutation parameter θ, which is one of the parameters of key interest in population genetics. We provide a general result showing when estimators of θ can be improved using shrinkage when taking the mean squared error as the measure of performance. As a consequence, we show that Watterson’s estimator is inadmissible, and propose an alternative shrinkage-based estimator that is easy to calculate and has a smaller mean squared error than Watterson’s estimator for all possible parameter values 0 < θ < ∞. This estimator is admissible in the class of all linear estimators. We then derive improved versions for other estimators of θ, including the MLE. We also investigate how an improvement can be obtained both when combining information from several independent loci and when explicitly taking into account recombination. A simulation study provides information about the amount of improvement achieved by our alternative estimators.

14.
Raynes Y, Sniegowski PD. Heredity. 2014;113(5):375-380.
Because genes that affect mutation rates are themselves subject to mutation, mutation rates can be influenced by natural selection and other evolutionary forces. The population genetics of mutation rate modifier alleles has been a subject of theoretical interest for many decades. Here, we review experimental contributions to our understanding of mutation rate modifier dynamics. Numerous evolution experiments have shown that mutator alleles (modifiers that elevate the genomic mutation rate) can readily rise to high frequencies via genetic hitchhiking in non-recombining microbial populations. Whereas these results certainly provide an explanatory framework for observations of sporadically high mutation rates in pathogenic microbes and in cancer lineages, it is nonetheless true that most natural populations have very low mutation rates. This raises the interesting question of how mutator hitchhiking is suppressed or its phenotypic effect reversed in natural populations. Very little experimental work has addressed this question; with this in mind, we identify some promising areas for future experimental investigation.

15.
McVean G, Awadalla P, Fearnhead P. Genetics. 2002;160(3):1231-1241.
Determining the amount of recombination in the genealogical history of a sample of genes is important to both evolutionary biology and medical population genetics. However, recurrent mutation can produce patterns of genetic diversity similar to those generated by recombination and can bias estimates of the population recombination rate. Hudson (2001) has suggested an approximate-likelihood method based on coalescent theory to estimate the population recombination rate, 4N(e)r, under an infinite-sites model of sequence evolution. Here we extend the method to the estimation of the recombination rate in genomes, such as those of many viruses and bacteria, where the rate of recurrent mutation is high. In addition, we develop a permutation-based method for detecting recombination that is both more powerful than other permutation-based methods and robust to misspecification of the model of sequence evolution. We apply the method to sequence data from viruses, bacteria, and human mitochondrial DNA. The extremely high level of recombination detected in both HIV-1 and HIV-2 sequences demonstrates that recombination cannot be ignored in the analysis of viral population genetic data.

16.
Recombination has the potential to facilitate adaptation. In spite of the substantial body of theory on the impact of recombination on the evolutionary dynamics of adapting populations, empirical evidence to test these theories is still scarce. We examined the effect of recombination on adaptation on a large-scale empirical fitness landscape in HIV-1 based on in vitro fitness measurements. Our results indicate that recombination substantially increases the rate of adaptation under a wide range of parameter values for population size, mutation rate and recombination rate. The accelerating effect of recombination is stronger for intermediate mutation rates but increases in a monotonic way with the recombination rates and population sizes that we examined. We also found that both fitness effects of individual mutations and epistatic fitness interactions cause recombination to accelerate adaptation. The estimated epistasis in the adapting populations is significantly negative. Our results highlight the importance of recombination in the evolution of HIV-1.

17.
Volz EM. Genetics. 2012;190(1):187-201.
Estimates of the coalescent effective population size N(e) can be poorly correlated with the true population size. The relationship between N(e) and the population size is sensitive to the way in which birth and death rates vary over time. The problem of inference is exacerbated when the mechanisms underlying population dynamics are complex and depend on many parameters. In instances where nonparametric estimators of N(e) such as the skyline struggle to reproduce the correct demographic history, model-based estimators that can draw on prior information about population size and growth rates may be more efficient. A coalescent model is developed for a large class of populations such that the demographic history is described by a deterministic nonlinear dynamical system of arbitrary dimension. This class of demographic model differs from those typically used in population genetics. Birth and death rates are not fixed, and no assumptions are made regarding the fraction of the population sampled. Furthermore, the population may be structured in such a way that gene copies reproduce both within and across demes. For this large class of models, it is shown how to derive the rate of coalescence, as well as the likelihood of a gene genealogy with heterochronous sampling and labeled taxa, and how to simulate a coalescent tree conditional on a complex demographic history. This theoretical framework encapsulates many of the models used by ecologists and epidemiologists and should facilitate the integration of population genetics with the study of mathematical population dynamics.

18.
The prevailing wisdom of the plant mitochondrial genome is that it has very low substitution rates, thus it is generally assumed that nucleotide diversity within species will also be low. However, recent evidence suggests plant mitochondrial genes may harbor variable and sometimes high levels of within-species polymorphism, a result attributed to variance in the influence of selection. However, insufficient attention has been paid to the effect of among-gene variation in mutation rate on varying levels of polymorphism across loci. We measured levels of polymorphism in seven mitochondrial gene regions across a geographically wide sample of the plant Silene vulgaris to investigate whether individual mitochondrial genes accumulate polymorphisms equally. We found that genes vary significantly in polymorphism. Tests based on coalescence theory show that the genes vary significantly in their scaled mutation rate, which, in the absence of differences among genes in effective population size, suggests these genes vary in their underlying mutation rate. Further evidence that among-gene variance in polymorphism is due to variation in the underlying mutation rate comes from a significant positive relationship between the number of segregating sites and silent site divergence from an outgroup. Contrary to recent studies, we found unconvincing evidence of recombination in the mitochondrial genome, and generally confirm the standard model of plant mitochondria characterized by low substitution rates and no recombination. We also show no evidence of significant variation in the strength or direction of selection among genes; this result may be expected if there is no recombination. The present study provides some of the most thorough data on plant mitochondrial polymorphism, and provides compelling evidence for mutation rate variation among genes. The study also demonstrates the difficulty in establishing a null model of mitochondrial genome polymorphism, and thus the difficulty, in the absence of a comparative approach, in testing the assumption that low substitution rates in plant mitochondria lead to low polymorphism.

19.
Extensive data from multilocus electrophoresis are available for many bacterial populations. In some cases, for example Neisseria gonorrhoeae, these data are consistent with the population being in linkage equilibrium. This raises the following question. What frequency of transformation, or other means of genetic recombination, is needed, relative to mutation, to produce apparent panmixis? Simulation of a finite-population model suggests that, if transformation is at least twenty times as frequent as mutation, the population structure will be indistinguishable from a panmictic one, using the best available data sets. That is, relatively infrequent transformation is sufficient to produce approximate linkage equilibrium.

20.
According to population genetics models, genomic regions with lower crossing-over rates are expected to experience less effective selection because of Hill-Robertson interference (HRi). The effect of genetic linkage is thought to be particularly important for selection of weak intensity such as selection affecting codon usage. Consistent with this model, codon bias correlates positively with recombination rate in Drosophila melanogaster and Caenorhabditis elegans. However, in these species, the G+C content of both noncoding DNA and synonymous sites correlates positively with recombination, which suggests that mutation patterns and recombination are associated. To remove this effect of mutation patterns on codon bias, we used the synonymous sites of lowly expressed genes that are expected to be effectively neutral sites. We measured the differences between codon biases of highly expressed genes and their lowly expressed neighbors. In D. melanogaster we find that HRi weakly reduces selection on codon usage of genes located in regions of very low recombination; but these genes only comprise 4% of the total. In C. elegans we do not find any evidence for the effect of recombination on selection for codon bias. Computer simulations indicate that HRi poorly enhances codon bias if the local recombination rate is greater than the mutation rate. This prediction of the model is consistent with our data and with the current estimate of the mutation rate in D. melanogaster. The case of C. elegans, which is highly self-fertilizing, is discussed. Our results suggest that HRi is a minor determinant of variations in codon bias across the genome.
