Found 20 similar articles (search time: 0 ms)
1.
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.
2.
3.
Ori Sargsyan 《Genetics》2010,185(4):1355-1368
The general coalescent tree framework is a family of models for determining ancestries among random samples of DNA sequences at a nonrecombining locus. The ancestral models included in this framework can be derived under various evolutionary scenarios. Here, a computationally tractable full-likelihood-based inference method for neutral polymorphisms is presented, using the general coalescent tree framework and the infinite-sites model for mutations in DNA sequences. First, an exact sampling scheme is developed to determine the topologies of conditional ancestral trees. However, this scheme has some computational limitations and to overcome these limitations a second scheme based on importance sampling is provided. Next, these schemes are combined with Monte Carlo integrations to estimate the likelihood of full polymorphism data, the ages of mutations in the sample, and the time of the most recent common ancestor. In addition, this article shows how to apply this method for estimating the likelihood of neutral polymorphism data in a sample of DNA sequences completely linked to a mutant allele of interest. This method is illustrated using the data in a sample of DNA sequences at the APOE gene locus.
The interest in analyzing polymorphism data in contemporary samples of DNA sequences under various evolutionary scenarios creates a demand to design computationally tractable full-likelihood-based inference methods. For an evolutionary scenario of interest, an ancestral-mutation model can be used to design such a method. The ancestral-mutation model for a sample of DNA sequences at a nonrecombining locus is a combination of two processes: one is an ancestral process that traces the lineages of the sample back in time until the most recent common ancestor, constructing an ancestral tree for the sample. The second is a mutation process that is superimposed on the ancestral tree. The complexities of ancestral-mutation models make the design of such methods challenging. 
Full data are used instead of summary statistics, which can result in loss of important information in the data (see Felsenstein 1992; Donnelly and Tavaré 1995). In addition, current methods use specific features of the underlying ancestral-mutation models, so they lose flexibility to be applicable to other ancestral-mutation models.
More specifically, Griffiths and Tavaré (1994c, 1995) and Kuhner et al. (1995) developed full-likelihood-based inference methods for neutral polymorphisms at a nonrecombining locus. They used the combinations of the standard coalescent (Kingman 1982a,b,c; Hudson 1983; Tajima 1983) with the finite-sites or infinite-sites (Watterson 1975) models as ancestral-mutation models. Stephens and Donnelly (2000) designed an importance sampling method to estimate the full likelihood of the data using the same settings for the ancestral-mutation models. Hobolth et al. (2008) provided another importance sampling scheme restricted to the infinite-sites model. The last two methods are computationally more efficient than the first two methods, but they lose flexibility to be applicable to ancestral models without standard coalescent features with independent coalescence waiting times, such as the coalescent processes with exponential growth (Slatkin and Hudson 1991; Griffiths and Tavaré 1994b).
To incorporate the coalescent processes with exponential growth, Kuhner et al. (1998) and Griffiths and Tavaré (1994a, 1999) extended their previous methods. For example, the method of Griffiths and Tavaré (1994a, 1999) allows one to consider ancestral models based on coalescent processes with variable population sizes. Coop and Griffiths (2004) modified this inference method and made it applicable for analyzing full polymorphism data in a sample of DNA sequences from a nonrecombining locus completely linked to a mutant allele of interest, either neutral or under selection. 
Additionally, ancestral models have been developed for this type of sample, where the mutant allele is either neutral (Griffiths and Tavaré 1998, 2003; Wiuf and Donnelly 1999; Stephens 2000) or under selection (Slatkin and Rannala 1997; Stephens and Donnelly 2003). The ancestral model of Slatkin and Rannala (1997) is part of a family of ancestral models derived by Thompson (1975), Nee et al. (1994), and Rannala (1997), using a linear birth–death process as an evolutionary process in a population. Although all the ancestral models mentioned above differ in their properties and evolutionary scenarios, they are part of the general coalescent tree framework (Griffiths and Tavaré 1998). Therefore, a computationally tractable full-likelihood-based inference method based on this general framework is of great interest.
For a sample of n sequences, an ancestral model in the general coalescent tree framework is described as a bifurcating rooted tree with n − 1 internal nodes and n leaves, where the internal nodes are coalescent events that happen one at a time. The tree is a combination of two independent components: the topology and the branch lengths. The topology of the tree is constructed going backward in time by combining two randomly chosen ancestral lineages of the sample at each node; the branch lengths of the tree are defined by the joint distribution function of the coalescence waiting times. Note that any density function for coalescence waiting times can define an ancestral model in the general coalescent tree framework.
The n leaves (and the sequences in the sample) are labeled from 1 to n; and the n − 1 internal nodes of the ancestral tree are labeled from 1 to n − 1 (in order of occurrence of the coalescent events backward in time). Thus, the topology of an ancestral tree is a leaf-labeled bifurcating rooted tree with totally ordered interior vertices. 
These trees are called topological trees.
When using the general coalescent tree framework and the infinite-sites model, an evolutionary process that generates polymorphism data in a sample of DNA sequences can be described in the following way. An ancestral tree is constructed, as described above, and mutations are added independently on different branches of the ancestral tree as Poisson processes with equal rates, θ/2, in which θ is the mutation rate at the locus. Then, at the mutation events, the ancestral sequences of the sample are changed according to the infinite-sites model; that is, each mutation occurs at a site of an ancestral sequence at which no previous mutations occurred. Thus, these changes define polymorphism data.
Naively, this probabilistic framework can be used to estimate the likelihood of the full observed data in a sample of n sequences. That is, data sets are simulated independently as described above and each simulated data set is compared to the observed data. The proportion of the simulated data sets that match the observed data is an estimate of the likelihood of the observed data. Although this approach provides an estimate for the likelihood of the observed data, this method is computationally infeasible, because the topologies of the ancestral trees of the generated data sets are sampled from the space of all the possible topological trees with n leaves. This space has size n!(n − 1)!/2^(n − 1) (Edwards 1970), which is huge for moderate values of n. The topologies of the ancestral trees of the generated data sets that match the observed data represent a small portion of that space. 
Thus, designing a method that samples topologies of the ancestral trees from this subspace can make the method computationally tractable.
On the basis of this idea, I use the general coalescent tree framework with the infinite-sites model to develop a computationally tractable full-likelihood-based inference method for polymorphisms in DNA sequences at a nonrecombining locus. First, an exact sampling scheme for topologies of the conditional ancestral trees is developed. This method has some computational limitations, so to overcome these limitations a second scheme based on an importance sampling is provided. These sampling schemes are combined with Monte Carlo integrations to estimate the likelihood of the full data, the ages of the mutations in the sample, and the time of the most recent common ancestor of the sample. I describe an application of this method for neutral polymorphism data in a sample of DNA sequences at a nonrecombining locus that is completely linked to a mutant allele of interest, either neutral or under selection. The method is illustrated using the data in a sample of DNA sequences at the APOE gene locus from Fullerton et al. (2000).
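To give a feel for how quickly the space of topological trees in entry 3 grows, the following minimal sketch evaluates the count n!(n − 1)!/2^(n − 1) quoted from Edwards (1970). The function name is hypothetical and chosen only for illustration; it is not taken from the article.

```python
from math import factorial


def count_topological_trees(n: int) -> int:
    """Number of leaf-labeled bifurcating rooted trees on n leaves whose
    internal nodes are totally ordered: n! (n - 1)! / 2^(n - 1)."""
    if n < 1:
        raise ValueError("n must be a positive number of sequences")
    # Going backward in time, a pair is chosen among k lineages for
    # k = n, n-1, ..., 2; the product of these C(k, 2) choices equals
    # the closed form below, so the integer division is exact.
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


for n in (2, 3, 4, 10, 20):
    print(n, count_topological_trees(n))
```

Already at n = 10 the count exceeds two billion, which is consistent with the abstract's point that naive simulation over all topologies is infeasible for moderate n.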
4.
Jochen Blath, Adrián González Casanova, Bjarki Eldon, Noemi Kurt, Maite Wilke-Berenguer 《Genetics》2015,200(3):921-934
We analyze patterns of genetic variability of populations in the presence of a large seedbank with the help of a new coalescent structure called the seedbank coalescent. This ancestral process appears naturally as a scaling limit of the genealogy of large populations that sustain seedbanks, if the seedbank size and individual dormancy times are of the same order as those of the active population. Mutations appear as Poisson processes on the active lineages and potentially at reduced rate also on the dormant lineages. The presence of “dormant” lineages leads to qualitatively altered times to the most recent common ancestor and nonclassical patterns of genetic diversity. To illustrate this we provide a Wright–Fisher model with a seedbank component and mutation, motivated from recent models of microbial dormancy, whose genealogy can be described by the seedbank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seedbank, are compared. The effect of a seedbank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seedbank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect from genetic data the presence of a large seedbank in natural populations.
5.
Chaozhi Zheng, Mary K. Kuhner, Elizabeth A. Thompson 《Journal of molecular evolution》2014,78(5):279-292
We propose a genealogy-sampling algorithm, Sequential Markov Ancestral Recombination Tree (SMARTree), that provides an approach to estimation from SNP haplotype data of the patterns of coancestry across a genome segment among a set of homologous chromosomes. To enable analysis across longer segments of genome, the sequence of coalescent trees is modeled via the modified sequential Markov coalescent (Marjoram and Wall, Genetics 7:16, 2006). To assess performance in estimating these local trees, our SMARTree implementation is tested on simulated data. Our base data set is of the SNPs in 10 DNA sequences over 50 kb. We examine the effects of longer sequences and of more sequences, and of a recombination and/or mutational hotspot. The model underlying SMARTree is an approximation to the full recombinant-coalescent distribution. However, in a small trial on simulated data, recovery of local trees was similar to that of LAMARC (Kuhner et al. Genetics 156:1393-1401, 2000a), a sampler which uses the full model.
6.
The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. To give a concrete example, we calculate the likelihood for a model of isolation with migration (IM), assuming two diploid samples without phase and outgroup information. We demonstrate the new inference scheme with an analysis of two individual butterfly genomes from the sister species Heliconius melpomene rosina and H. cydno.
7.
The large amount and high quality of genomic data available today enable, in principle, accurate inference of evolutionary histories of observed populations. The Wright-Fisher model is one of the most widely used models for this purpose. It describes the stochastic behavior in time of allele frequencies and the influence of evolutionary pressures, such as mutation and selection. Despite its simple mathematical formulation, exact results for the distribution of allele frequency (DAF) as a function of time are not available in closed analytical form. Existing approximations build on the computationally intensive diffusion limit or rely on matching moments of the DAF. One of the moment-based approximations relies on the beta distribution, which can accurately describe the DAF when the allele frequency is not close to the boundaries (0 and 1). Nonetheless, under a Wright-Fisher model, the probability of being on the boundary can be positive, corresponding to the allele being either lost or fixed. Here we introduce the beta with spikes, an extension of the beta approximation that explicitly models the loss and fixation probabilities as two spikes at the boundaries. We show that the addition of spikes greatly improves the quality of the approximation. We additionally illustrate, using both simulated and real data, how the beta with spikes can be used for inference of divergence times between populations with comparable performance to an existing state-of-the-art method.
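The observation in entry 7 that a Wright-Fisher population can sit exactly on a boundary with positive probability can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the abstract: a haploid population, pure drift (no mutation or selection), and hypothetical function and parameter names. It does not implement the beta-with-spikes approximation itself; it only estimates the point masses at 0 and 1 that a continuous beta density cannot represent.

```python
import random


def wright_fisher_boundary_mass(pop_size, x0, generations, replicates, seed=0):
    """Simulate neutral Wright-Fisher drift and return the fractions of
    replicates that are lost (frequency 0), fixed (frequency 1), or still
    segregating after the given number of generations."""
    rng = random.Random(seed)
    lost = fixed = 0
    for _ in range(replicates):
        count = round(x0 * pop_size)
        for _ in range(generations):
            if count == 0 or count == pop_size:
                break  # without mutation, both boundaries are absorbing
            p = count / pop_size
            # Binomial(pop_size, p) resampling of the next generation.
            count = sum(rng.random() < p for _ in range(pop_size))
        lost += count == 0
        fixed += count == pop_size
    interior = replicates - lost - fixed
    return lost / replicates, fixed / replicates, interior / replicates


p_lost, p_fixed, p_interior = wright_fisher_boundary_mass(
    pop_size=50, x0=0.2, generations=40, replicates=1000)
print(f"lost={p_lost:.3f} fixed={p_fixed:.3f} segregating={p_interior:.3f}")
```

In a run like this a noticeable share of replicates ends at frequency 0 or 1, which is the mass that the two spikes are meant to capture.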
8.
The coalescent with recombination is a fundamental model to describe the genealogical history of DNA sequence samples from recombining organisms. Considering recombination as a process which acts along genomes and which creates sequence segments with shared ancestry, we study the influence of single recombination events upon tree characteristics of the coalescent. We focus on properties such as tree height and tree balance and quantify analytically the changes in these quantities incurred by recombination in terms of probability distributions. We find that changes in tree topology are often relatively mild under conditions of neutral evolution, while changes in tree height are on average quite large. Our results add to a quantitative understanding of the spatial coalescent and provide the neutral reference to which the impact by other evolutionary scenarios, for instance tree distortion by selective sweeps, can be compared.
9.
We observed behaviors and compiled activity budgets of adult Anoplophora glabripennis (Coleoptera: Cerambycidae) given a choice of different species of living trees as hosts in a greenhouse. Frequency of observation of beetles on different tree species provided a good overall indicator of host preference. Beetles were most often observed resting; they were least active early in the day and most active late in the day, but mating was observed with equal frequency during all 4-h time intervals between 0800 and 2400 h. Adults of both sexes were promiscuous, mating repeatedly and with different partners. Males engaged in mate guarding for periods of several hours or more to ensure paternity of the guarded female's progeny.
10.
Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingman’s coalescent theory. Here, we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent susceptible–infected–removed (SIR) tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the performance of the two coalescent models and also juxtapose results obtained with a recently published birth–death-sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. Additionally, we analyze influenza A (H1N1) sequence data sampled in the Canterbury region of New Zealand and HIV-1 sequence data obtained from known United Kingdom infection clusters. We show that both coalescent SIR models are effective at estimating epidemiological parameters from data with large fundamental reproductive number R0 and large population size S0. Furthermore, we find that the stochastic variant generally outperforms its deterministic counterpart in terms of error, bias, and highest posterior density coverage, particularly for smaller R0 and S0. However, each of these inference models is shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with R0 close to one or with small effective susceptible populations.
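As background for the R0 and S0 regimes discussed in entry 10, a deterministic SIR trajectory can be sketched with a simple forward-Euler integration. The parameterization below (mass-action infection rate beta·S·I, removal rate gamma, so that R0 = beta·S0/gamma) is a common textbook form and is an assumption here; the abstract does not specify the authors' parameterization, and this sketch is not their tree prior. All names and numeric values are hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0=1.0, dt=0.01, t_max=200.0):
    """Forward-Euler integration of dS/dt = -beta*S*I,
    dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.
    Returns the peak number infected and the number removed by t_max."""
    s, i, r = float(s0), float(i0), 0.0
    peak = i
    for _ in range(int(t_max / dt)):
        new_infections = beta * s * i * dt
        removals = gamma * i * dt
        s -= new_infections
        i += new_infections - removals
        r += removals
        peak = max(peak, i)
    return peak, r


gamma, s0 = 0.25, 1000.0
for r0 in (1.05, 2.0, 5.0):
    beta = r0 * gamma / s0  # assumed relation: R0 = beta * S0 / gamma
    peak, removed = simulate_sir(beta, gamma, s0)
    print(f"R0={r0}: peak infected ~ {peak:.1f}, removed by t_max ~ {removed:.1f}")
```

With R0 close to one the outbreak barely rises above its initial size, which gives some intuition for why such epidemics leave little signal for inference.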
11.
Strategy Space and the Disturbance Spectrum: A Life-History Model for Tree Species Coexistence
Loehle C 《The American naturalist》2000,156(1):14-33
The disturbance spectrum consists of disturbance patterns differing in type, size, intensity, and frequency. It is proposed that tree life-history traits are adaptations to particular disturbance regimes. Four independent axes are proposed to define the dominant dimensions of tree strategy space: shade tolerance, tree height, capacity for vegetative reproduction, and seed dispersal distance. A fitness model was developed to elucidate interactions between the proposed life-history traits. The model shows how alternate life-history sets can coexist when disturbance patterns fluctuate in space and time. Variable disturbance regimes were shown, based on data and simulation results, to enhance species coexistence, as predicted. The strategy space model accurately predicts the number of common tree species for the eastern United States, boreal Canada, and southwestern piñon-juniper woodlands. The model also provides an explanation for latitudinal gradients in tree species richness in North America and Europe. The proposed model predicts a relationship between disturbance characteristics and the species composition of a forest that allows for the coexistence of large numbers of species. The life-history traits of size, growth rate, life span, shade tolerance, age of reproduction, seed dispersal distance, and vegetative reproduction are all incorporated into the model.
12.
In this paper we develop a Bayesian approach to parameter estimation in a stochastic spatio-temporal model of the spread of invasive species across a landscape. To date, statistical techniques, such as logistic and autologistic regression, have outstripped stochastic spatio-temporal models in their ability to handle large numbers of covariates. Here we seek to address this problem by making use of a range of covariates describing the bio-geographical features of the landscape. Relative to regression techniques, stochastic spatio-temporal models are more transparent in their representation of biological processes. They also explicitly model temporal change, and therefore do not require the assumption that the species' distribution (or other spatial pattern) has already reached equilibrium as is often the case with standard statistical approaches. In order to illustrate the use of such techniques we apply them to the analysis of data detailing the spread of an invasive plant, Heracleum mantegazzianum, across Britain in the 20th Century using geo-referenced covariate information describing local temperature, elevation and habitat type. The use of Markov chain Monte Carlo sampling within a Bayesian framework facilitates statistical assessments of differences in the suitability of different habitat classes for H. mantegazzianum, and enables predictions of future spread to account for parametric uncertainty and system variability. Our results show that ignoring such covariate information may lead to biased estimates of key processes and implausible predictions of future distributions.
13.
Plants and animals have responded to past climate changes by migrating with habitable environments, sometimes shifting the boundaries of their geographic ranges by tens of kilometers per year or more. Species migrating in response to present climate conditions, however, must contend with landscapes fragmented by anthropogenic disturbance. We consider this problem in the context of wind-dispersed tree species. Mechanisms of long-distance seed dispersal make these species capable of rapid migration rates. Models of species-front migration suggest that even tree species with the capacity for long-distance dispersal will be unable to keep pace with future spatial changes in temperature gradients, exclusive of habitat fragmentation effects. Here we present a numerical model that captures the salient dynamics of migration by long-distance dispersal for a generic tree species. We then use the model to explore the possible effects of assisted colonization within a fragmented landscape under a simulated tree-planting scheme. Our results suggest that an assisted-colonization program could accelerate species-front migration rates enough to match the speed of climate change, but such a program would involve an environmental-sustainability intervention at a massive scale.
14.
15.
Parentage and Sibship Inference From Multilocus Genotype Data Under Polygamy
Likelihood methods have been developed to partition individuals in a sample into sibling clusters using genetic marker data without parental information. Most of these methods assume either both sexes are monogamous to infer full sibships only or only one sex is polygamous to infer full sibships and paternal or maternal (but not both) half sibships. We extend our previous method to the more general case of both sexes being polygamous to infer full sibships, paternal half sibships, and maternal half sibships and to the case of a two-generation sample of individuals to infer parentage jointly with sibships. The extension not only expands enormously the scope of application of the method, but also increases its statistical power. The method is implemented for both diploid and haplodiploid species and for codominant and dominant markers, with mutations and genotyping errors accommodated. The performance and robustness of the method are evaluated by analyzing both simulated and empirical data sets. Our method is shown to be much more powerful than pairwise methods in both parentage and sibship assignments because of the more efficient use of marker information. It is little affected by inbreeding in parents and is moderately robust to nonrandom mating and linkage of markers. We also show that individually much less informative markers, such as SNPs or AFLPs, can reach the same power for parentage and sibship inferences as the highly informative marker simple sequence repeats (SSRs), as long as a sufficient number of loci are employed in the analysis.
16.
Microbial Community Structure and Density Under Different Tree Species in an Acid Forest Soil (Morvan, France)
Overexploitation of forests to increase wood production has led to the replacement of native forest by large areas of monospecific tree plantations. In the present study, the effects of different monospecific tree cover plantations on density and composition of the indigenous soil microbial community are described. The experimental site of “Breuil-Chenue” in the Morvan (France) was the site of a comparison of a similar mineral soil under Norway spruce (Picea abies), Douglas fir (Pseudotsuga menziesii), oak (Quercus sessiliflora), and native forest [mixed stand dominated by oak and beech (Fagus sylvatica)]. Sampling was performed during winter (February) at three depths (0–5, 5–10, and 10–15 cm). Abundance of microorganisms was estimated via microbial biomass measurements, using the fumigation–extraction method. The genetic structure of microbial communities was investigated using the bacterial- and fungal-automated ribosomal intergenic spacer analysis (B-ARISA and F-ARISA, respectively) DNA fingerprint. Only small differences in microbial biomass were observed between tree species, the highest values being recorded under oak forest and the lowest under Douglas fir. B- and F-ARISA community profiles of the different tree covers clustered separately, but noticeable similarities were observed for soils under Douglas fir and oak. A significant stratification was revealed under each tree species by a decrease in microbial biomass with increasing depths and by distinct microbial communities for each soil layer. Differences in density and community composition according to tree species and depth were related to soil physicochemical characteristics and organic matter composition.
17.
18.
19.
Alan R. Templeton 《Environmental Biology of Fishes》2004,69(1-4):7-20
Genetic variation is now routinely screened at the DNA sequence level in many studies. If the DNA region being screened has not experienced excessive amounts of recombination, it is often possible to reconstruct the evolutionary history of the genetic variation in the form of a haplotype tree. This tree estimates the evolutionary pathway that interconnects all the different haplotypes (sequence variants) observed in the sample. This haplotype tree can be used to define a series of nested branches (clades) that reflects the relative temporal history of the haplotypes and groups of haplotypes. Geographical information can then be overlaid upon this temporal series to test for significant associations between geography and temporal position in the haplotype tree. This allows a reconstruction of how the genetic variation arose and spread in both space and time. Such reconstructions can yield many insights into the joint roles of recurrent events such as gene flow and of historical events such as fragmentation or range expansion. These points are illustrated with studies on the chub, Leuciscus cephalus. There is also a need to extend such nested phylogeographic analyses to a phylo/reticulate geographic analysis that incorporates both assortment and recombination between and within DNA regions. A preliminary phylo/reticulate geographic analysis is presented of the transferrin locus in the brown trout, Salmo trutta, species complex that reveals the importance of hybridization in the recent evolutionary history of this group. This example shows the inadequacy of a strictly phylogenetic approach and illustrates the need to incorporate reticulate evolution. The results of nested clade phylogeographic analysis and the new phylo/reticulate geographic analysis are then used for inferring species status of the marbled trout. The results indicate that an old hybridization event may have played a role in the origin of the marbled trout. 
Currently the marbled trout is primarily endangered by hybridization with introduced brown trout. These results show both the positive and negative impacts of hybridization upon biodiversity. Such phylo/reticulate geographic studies will challenge both our concepts of species and our conservation management strategies.