Found 20 similar articles (search time: 234 ms)
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescent and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes–Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.  相似文献   

The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods: concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full-likelihood methods of species tree estimation.
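
The probit relationship stated in the abstract above can be written compactly. This is only a notational sketch: the symbols $P_e(L)$ (probability of estimating the wrong species tree from $L$ loci), $a$, and $b$ are assumptions introduced here for illustration and are not taken from the paper, which may parameterize the approximation differently.

$$
\Phi^{-1}\bigl(P_e(L)\bigr) \approx a - b\sqrt{L}, \qquad b > 0,
$$

where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. Under this form, a method with a larger slope $b$ reaches a given error level with fewer loci, which is one way to read the efficiency comparison among methods.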

There are currently 25 recognized species of the chipmunk genus Tamias. In this study we sequenced the complete mitochondrial cytochrome b (cyt b) gene of 23 Tamias species. We analyzed the cyt b sequences and then analyzed a combined data set of cyt b along with a previous data set of cytochrome oxidase subunit II (COII) sequences. Maximum likelihood was used to further test the fit of models of evolution to the cyt b data. Cyt b sequences from other sciurids were added to examine the evolution of Tamias in the context of other sciurids. Relationships among Tamias species are discussed, particularly the possibility of a current sorting event among taxa of the southwestern United States and the extreme divergences among the three subgenera (Neotamias, Eutamias, and Tamias).

The Kingman coalescent, which has become the foundation for a wide range of theoretical as well as empirical studies, was derived as an approximation of the Wright-Fisher (WF) model. The approximation relies heavily on the assumption that the population size is large and the sample size is much smaller than the population size. Whether the sample size is too large compared to the population size is rarely questioned in practice when applying statistical methods based on the Kingman coalescent. Since the WF model is the most widely used population genetics model for reproduction, it is desirable to develop a coalescent framework for the WF model, which can be used whenever there are concerns about the accuracy of the Kingman coalescent as an approximation. This paper describes the exact coalescent theory for the WF model and develops a simulation algorithm, which is then used, together with an analytical approach, to study the properties of the exact coalescent as well as its differences from the Kingman coalescent. We show that the Kingman coalescent differs from the exact coalescent by: (1) shorter waiting time between successive coalescent events; (2) different probability of observing a topological relationship among sequences in a sample; and (3) slightly smaller tree length in the genealogy of a large sample. On the other hand, there is little difference in the age of the most recent common ancestor (MRCA) of the sample. The exact coalescent makes up for the longer waiting time between successive coalescent events by having multiple coalescences at the same time. The most significant difference among the various summary statistics of a coalescent examined is the sum of lengths of external branches, which can be more than 10% larger for the exact coalescent than for the Kingman coalescent. As a whole, the Kingman coalescent is a remarkably accurate approximation to the exact coalescent for sample and population sizes falling considerably outside the region that was originally anticipated.
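
The comparison described in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a haploid Wright-Fisher population of constant size `N`, tracks only the number of ancestral lineages, and the function names `wf_tmrca` and `kingman_expected_tmrca` are hypothetical names chosen for this example.

```python
import random


def wf_tmrca(n, N, rng):
    """Generations back to the MRCA of n sampled gene copies under the
    exact Wright-Fisher genealogy (assumed haploid, constant size N).
    Several lineages may pick the same parent in one generation, so
    multiple and simultaneous mergers occur naturally."""
    lineages = n
    generations = 0
    while lineages > 1:
        # Each lineage chooses its parent uniformly among the N copies
        # of the previous generation; distinct parents = surviving lineages.
        parents = {rng.randrange(N) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


def kingman_expected_tmrca(n, N):
    """Expected TMRCA in generations under the Kingman approximation,
    with time scaled by N generations for N haploid copies."""
    return 2.0 * N * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(2024)
    N, reps = 100, 2000
    for n in (2, 10, 50):
        mean_exact = sum(wf_tmrca(n, N, rng) for _ in range(reps)) / reps
        print(n, round(mean_exact, 1), kingman_expected_tmrca(n, N))
```

Running the script prints, for each sample size, the simulated mean TMRCA under the exact model next to the Kingman expectation, so the two can be compared as the sample size approaches the population size.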

Many East Asian human populations harbor a high-frequency deficiency allele for the aldehyde dehydrogenase 2 (ALDH2) enzyme, a critical protein involved in the metabolism of ethanol. Here we use resequencing and long-range SNP haplotype data from a Japanese sample to test whether patterns of nucleotide diversity and linkage disequilibrium at this locus are compatible with a standard neutral model of evolution. Examination of the pattern of polymorphism at a locus such as this, where the frequency of a common allele is known a priori, introduces an ascertainment bias that must be corrected for in analyses of the frequency spectrum of polymorphisms. We apply a flexible and generally applicable simulation approach to correct for this bias in our ALDH2 data and, also, to explore the effect of bias on the commonly used summary statistics Tajima’s D, Fu and Li’s D, and Fay and Wu’s H. Our study finds no evidence that the pattern of genetic variation at ALDH2 differs from that expected under a standard neutral model. However, our general examination of ascertainment bias indicates that a priori knowledge of segregating alleles greatly affects the expected distributions of summary statistics. Under many parameter combinations we find that ascertainment bias introduces an elevated rate of false positives when summary statistics are used to test for deviations from a standard neutral model. However, we also show that over a wide range of conditions the power of all summary statistics can be greatly increased by incorporating prior knowledge of segregating alleles. [Reviewing Editor: Dr. Martin Kreitman]  相似文献   
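
For readers unfamiliar with the summary statistics named in the abstract above, the following is a minimal sketch of Tajima's D computed from aligned sequences, using the standard definition of the statistic. It does not implement the ascertainment-bias correction the authors describe; the function name `tajimas_d` and the input format (a list of equal-length strings) are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences.
    Requires at least 4 sequences (the variance term is zero for n < 4)
    and at least one segregating site."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        raise ValueError("no segregating sites")

    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from the standard definition of the statistic
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of rare variants gives a negative value and an excess of intermediate-frequency variants gives a positive value, which is why conditioning on a known common allele can shift the expected distribution of the statistic.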

The enchytraeid genus Lumbricillus comprises about 80 described species of clitellate worms, which are up to a few centimetres long and mostly inhabit the littoral zone of non-tropical marine and brackish waters world-wide. The phylogeny of this genus is poorly studied, but previous work has suggested that Lumbricillus is a non-monophyletic group. In this study, species boundaries and the phylogeny of this genus are re-assessed using more than 300 DNA-barcoded specimens (using COI mtDNA), a subset of which were also sequenced for two additional mitochondrial and four nuclear molecular markers. Statistical and coalescent-based applications were used for the delimitation of a total of 24 species, of which 20 were identified as belonging to 17 described morphospecies; one morphospecies was found to be a complex of four delimited species, and another four delimited species could not be matched with any described species. Furthermore, gene trees, concatenation and multispecies coalescent-based species trees were estimated using Bayesian inference. The estimated phylogenies confirm a non-monophyletic Lumbricillus, as L. semifuscus is clearly excluded from the genus. Furthermore, the placement of a monophyletic clade consisting of L. arenarius, L. dubius, and an unidentified species varies between analyses; it is found either as the sister-group to the genus Grania or as the sister-group to the remaining Lumbricillus. The latter relationship is supported by the multispecies coalescent, which we consider the most reliable method.

Wu CH, Drummond AJ. Genetics, 2011, 188(1): 151-164
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of estimation and also identification of the true mutation model. Finally we apply our method to a red colobus monkey data set as an example.  相似文献   

The study of sequence diversity under phylogenetic models is now classic. Theoretical studies of diversity under the Kingman coalescent appeared shortly after the introduction of the coalescent. In this paper we revisit this topic under the multispecies coalescent, an extension of the single population model to multiple populations. We derive exact formulas for the sequence dissimilarity of two sequences drawn at random under a basic multispecies setup. The multispecies model uses three parameters: the species tree birth rate under the pure birth process (Yule), the species effective population size and the mutation rate. We also discuss the effects of relaxing some of the model assumptions.

Hybridization and introgression have important consequences in evolution, such as increasing the genetic diversity and adaptive potential of a species. One of their most conspicuous footprints is discordance among gene trees or between genes and phenotypes. However, most studies that report introgression fail to disprove the null hypothesis that genetic incongruence may result from stochastic sorting of ancestral allelic polymorphisms. In the case of ancient introgression, these two processes may be especially difficult to distinguish topologically, but they make different predictions about the patterns of coalescence among loci. Here we apply three methods, molecular dating, multispecies coalescent models, and gene tree simulation under coalescence, to compare these two hypotheses that explain the polyphyletic mtDNA of the butterfly peacock bass, Cichla orinocensis. In comparison with a species tree based on 20 unlinked nuclear loci, we determined that mtDNA divergences were too recent to be explained by ancestral polymorphism. Similarly, coalescent species tree branches were significantly shorter when putative introgressed mtDNA was incorporated, and simulations showed the mtDNA topology to be unlikely under lineage sorting only. We conclude that introgression approximately 1.5 million years ago resulted in capture by C. orinocensis of an mtDNA lineage ancestral to the modern subspecies C. oc. monoculus.  相似文献   

Assessing the relative role of evolutionary processes on genetic diversity is critical for understanding species response to climatic change. However, many processes, independent of climate, can lead to the same genetic pattern. Because effective population size and gene flow are affected directly by abundance and dispersal, population ecology has the potential to profoundly influence patterns of genetic variation over microevolutionary timescales. Here, we use aDNA data and simulations to explore the influence of population ecology and Holocene climate change on genetic diversity of the Uinta ground squirrel (Spermophilus armatus). We examined phylochronology from three modern and two ancient populations spanning the climate transitions of the last 3000 years. Population genetic analyses based on summary statistics suggest that changes in genetic diversity and structure coincided with the Medieval Warm Period (MWP), c. 1000 years ago. Serial coalescent simulations allowed us to move beyond correlation with climate to statistically compare the likelihoods of alternative population histories given the observed data. The data best fit source–sink models that include large, mid‐elevation populations that exchange many migrants and small populations at the elevational extremes. While the MWP is likely to have reduced genetic diversity, our model‐testing approach revealed that MWP‐driven changes in genetic structure were not better supported for the range of models explored. Our results point to the importance of species ecology in understanding responses to climate, and showcase the use of ancient genetic data and simulation‐based inference for unraveling the relative roles of microevolutionary processes.  相似文献   

Two monophyletic sister species of wall lizards inhabit the two main groups of Balearic Islands: Podarcis lilfordi from islets and small islands around Mallorca and Menorca and Podarcis pityusensis from Ibiza, Formentera and associated islets. Genetic diversity within the endangered P. lilfordi has been well characterized, but P. pityusensis has not been studied in depth. Here, 2430 bp of mtDNA and 15 microsatellite loci were analysed from P. pityusensis populations from across its natural range. Two main genetic groupings were identified, although geographical structuring differed slightly between the mtDNA and the nuclear loci. In general, individuals from islets/islands adjacent to the main island of Ibiza were genetically distinct from those from Formentera and the associated Freus islands for both mtDNA and the nuclear loci. However, most individuals from the island of Ibiza were grouped with neighbouring islets/islands for nuclear loci, but with Formentera and Freus islands for the mitochondrial locus. A time-calibrated Bayesian tree was constructed for the principal mitochondrial lineages within the Balearics, using the multispecies coalescent model, and provided statistical support for divergence of the two main P. pityusensis lineages 0.111–0.295 Ma. This suggests a mid-late Pleistocene intraspecific divergence, compared with an early Pleistocene divergence in P. lilfordi, and postdates some major increases in sea level between 0.4 and 0.6 Ma, which may have flooded Formentera. The program IMa2 provided a posterior divergence time of 0.089–0.221 Ma, which was similar to the multispecies coalescent tree estimate. More significantly, it indicated low but asymmetric effective gene copy migration rates, with higher migration from Formentera to Ibiza populations. Our findings suggest that much of the present-day diversity may have originated from a late Pleistocene colonization of one island group from the other, followed by allopatric divergence of these populations. Subsequent gene flow between these insular groups seems likely to be explained by recent human introductions. Two evolutionarily significant units can be defined for P. pityusensis, but these units would need to exclude the populations that have been the subjects of recent admixture.

The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.  相似文献   

Occupancy models (Ecology, 2002; 83: 2248) were developed to infer the probability that a species under investigation occupies a site. Bayesian analysis of these models can be undertaken using statistical packages such as WinBUGS, OpenBUGS, JAGS, and more recently Stan. However, since these packages were not developed specifically to fit occupancy models, one often experiences long run times when undertaking an analysis. Bayesian spatial single-season occupancy models can also be fit using the R package stocc. That approach assumes that the detection and occupancy regression effects are modeled using probit link functions. The logistic link function, however, is algebraically more tractable and allows one to easily interpret the coefficient effects of an estimated model by using odds ratios, which is not easily done for a probit link function for models that do not include spatial random effects. We develop a Gibbs sampler to obtain samples from the posterior distribution of the parameters of various occupancy models (nonspatial and spatial) when logit link functions are used to model the regression effects of the detection and occupancy processes. We apply our methods to data extracted from the 2nd Southern African Bird Atlas Project to produce species distribution maps of the Cape weaver (Ploceus capensis) and helmeted guineafowl (Numida meleagris) for South Africa. We found that the Gibbs sampling algorithm developed produces posterior samples that are identical to those obtained when using JAGS and Stan and that in certain cases the posterior chains mix much faster than those obtained when using JAGS, stocc, and Stan. Our algorithms are implemented in the R package Rcppocc. The software is freely available and stored on GitHub ( https://github.com/AllanClark/Rcppocc ).

We examine genetic statistics used in the study of structured populations. In a 1999 paper, Wakeley observed that the coalescent process associated with the finite island model can be decomposed into a scattering phase and a collecting phase. This decomposition becomes exact in the large population limit with the coalescent at the end of the scattering phase converging to the Ewens sampling formula and the coalescent during the collecting phase converging to the Kingman coalescent. In this paper we introduce a class of limiting models, which we refer to as G/KC models, that generalize Wakeley’s decomposition. G in G/KC represents a completely general limit for the scattering phase, while KC represents a Kingman coalescent limit for the collecting phase. We show that both the island and two-dimensional stepping stone models converge to G/KC models in the large population limit. We then derive the distribution of the statistic $F_{ST}$ for all G/KC models under a large sample limit for the cases of strong or weak mutation, thereby deriving the large population, large sample limiting distribution of $F_{ST}$ for the island and two-dimensional stepping stone models as a special case of a general formula. Our methods allow us to take the large population and large sample limits simultaneously. In the context of large population, large sample limits, we show that the variance of $F_{ST}$ in the presence of weak mutation collapses as $O\left(\frac{1}{\log d}\right)$, where $d$ is the number of demes sampled. Further, we show that this $O\left(\frac{1}{\log d}\right)$ is caused by a heavy tail in the distribution of $F_{ST}$. Our analysis of $F_{ST}$ can be extended to an entire class of genetic statistics, and we use our approach to examine homozygosity measures. Our analysis uses coalescent-based methods.

Genome-scale sequence data have become increasingly available in phylogenetic studies for understanding the evolutionary histories of species. However, it is challenging to develop probabilistic models to account for the heterogeneity of phylogenomic data. The multispecies coalescent model describes gene trees as independent random variables generated from a coalescence process occurring along the lineages of the species tree. Since the multispecies coalescent model allows gene trees to vary across genes, coalescent-based methods have been widely used to account for heterogeneous gene trees in phylogenomic data analysis. In this paper, we summarize and evaluate the performance of coalescent-based methods for estimating species trees from genome-scale sequence data. We investigate the effects of deep coalescence and mutation on the performance of species tree estimation methods. We found that the coalescent-based methods perform well in estimating species trees for a large number of genes, regardless of the degree of deep coalescence and mutation. The performance of the coalescent methods is negatively correlated with the lengths of internal branches of the species tree.



Several phylogenetic approaches have been developed to estimate species trees from collections of gene trees. However, maximum likelihood approaches for estimating species trees under the coalescent model are limited. Although the likelihood of a species tree under the multispecies coalescent model has already been derived by Rannala and Yang, it can be shown that the maximum likelihood estimate (MLE) of the species tree (topology, branch lengths, and population sizes) from gene trees under this formula does not exist. In this paper, we develop a pseudo-likelihood function of the species tree to obtain maximum pseudo-likelihood estimates (MPE) of species trees, with branch lengths of the species tree in coalescent units.

The genus Myotis (Vespertilionidae, Myotinae) comprises a diverse group of small to large-sized vespertilionid bats that present a worldwide distribution. Twelve South American species are currently recognized. In this paper we evaluate the morphological and morphometric variation observed in South American populations of the most widespread species, Myotis nigricans. Against this background, two forms can be morphologically distinguished from M. nigricans and other known South American species. We describe these new species, documenting their diagnostic external and cranial characters by comparing them to other sympatric and cryptic species of South American Myotis. In addition, we provide an emended diagnosis of Myotis nigricans.

Localized up-down altitudinal shifts and subsequent isolation–admixture of montane species in response to glacial cycles have been proposed as a mechanism for the high diversity along Anatolian mountains. However, specific predictions of the proposed mechanism (the elevation shift model) have yet to be tested. Here, we provide a first assessment of this model for promoting inter- and intraspecific genetic diversity in the bush-cricket genus Phonochorion, endemic to the West Lesser Caucasus hotspot. Mitochondrial genes were analysed by Bayesian Markov chain Monte Carlo inference and coalescent simulations. Timing of diversification was estimated using a multispecies coalescent model. Divergence with gene flow was tested using an isolation-with-migration model. Population genetic parameters and genetic structuring were determined using Bayesian coalescent methods and spatial analysis. Demographic history was assessed using mismatch distributions and extended Bayesian skyline plots. Speciation events corresponded to both the Miocene and the Pleistocene, while intraspecific divergence was Pleistocene based. There was evidence for moderate levels of gene flow between species during diversification; however, incomplete lineage sorting could explain the data as well as gene flow. Overall diversification patterns within the genus Phonochorion agree with the predictions of the elevation shift model. Genetic patterns of diversification were driven mainly by Pleistocene glacial cycles and reflected the nature and distribution of sky islands. There was also some, albeit weak, evidence of demographic expansions coinciding with glacial cooling. However, evidence for divergence with gene flow was inconclusive.

The Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared with the average (superspreaders). Here, we used an approximate Bayesian computation approach to test whether multiple-merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered 11 publicly available whole-genome sequence data sets sampled from local MTB populations and outbreaks and found that MMC had a better fit compared with the Kingman coalescent for 10 of the 11 data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed and that past findings based on the Kingman coalescent need to be revisited.
