Similar articles
 Found 20 similar articles; search took 31 ms
1.
Epidemiological models have highlighted the importance of population structure in the transmission dynamics of infectious diseases. Using HIV-1 as an example of a model evolutionary system, we consider how population structure affects the shape and the structure of a viral phylogeny in the absence of strong selection at the population level. For structured populations, the number of lineages as a function of time is insufficient to describe the shape of the phylogeny. We develop deterministic approximations for the dynamics of tips of the phylogeny over evolutionary time, the number of ‘cherries’, tips that share a direct common ancestor, and Sackin's index, a commonly used measure of phylogenetic imbalance or asymmetry. We employ cherries both as a measure of asymmetry of the tree and as a measure of the association between sequences from different groups. We consider heterogeneity in infectiousness associated with different stages of HIV infection, and in contact rates between groups of individuals. In the absence of selection, we find that population structure may have relatively little impact on the overall asymmetry of a tree, especially when only a small fraction of infected individuals is sampled, but may have marked effects on how sequences from different subpopulations cluster and co-cluster.
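The two shape statistics named in this abstract can be illustrated on small trees. The sketch below is not the authors' deterministic approximation; it only counts cherries and computes Sackin's index (the sum of leaf depths) directly. The nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
# A tree is a nested tuple; anything that is not a tuple is a leaf label.
# This encoding is hypothetical and chosen only to keep the sketch short.

def leaf_depths(tree, depth=0):
    """Return the depth (number of edges from the root) of every leaf."""
    if not isinstance(tree, tuple):
        return [depth]
    depths = []
    for child in tree:
        depths.extend(leaf_depths(child, depth + 1))
    return depths

def sackin_index(tree):
    """Sackin's index: the sum of all leaf depths (larger = more imbalanced)."""
    return sum(leaf_depths(tree))

def count_cherries(tree):
    """Count internal nodes whose two children are both leaves."""
    if not isinstance(tree, tuple):
        return 0
    if len(tree) == 2 and all(not isinstance(c, tuple) for c in tree):
        return 1
    return sum(count_cherries(child) for child in tree)

balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")

print(sackin_index(balanced), count_cherries(balanced))        # 8 2
print(sackin_index(caterpillar), count_cherries(caterpillar))  # 9 1
```

For four tips, the balanced tree has the smaller Sackin's index and the larger number of cherries, which is why both statistics can serve as measures of asymmetry.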

2.
A central theme connecting macroevolutionary processes to macroecological patterns is the shaping of regional biodiversity over time through speciation, extinction, migration, and range shifts. The use of phylogenies to explore the dynamics of diversification due to variation in speciation and extinction rates is well developed, and there are established methods for inferring speciation times from phylogenies and generating their null distributions (as represented by node heights on molecular phylogenies). But inferring colonization events from phylogenies is more challenging. Unlike speciation events, represented by nodes, colonization events could occur at any point along a branch connecting species in the assemblage to the regional pool. We account for uncertainty in identification of colonization lineages and timing of colonization events by using an efficient analytical solution to inferring the distribution of colonization times from an assemblage phylogeny. Using the same solution, we efficiently derive the null distribution of colonization times, which provides us with a general approach to testing the adequacy of a model to describe colonization events into the assemblage. We illustrate this approach by demonstrating how the movement of squamate lineages into Madagascar has been uneven over time, peaking in the early Cenozoic when ocean conditions favored colonization.

3.
A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the ‘kernel trick’, a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly ‘star-like’ shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses.

4.
Yi Jin, Hong Qian. Ecography, 2019, 42(8): 1353-1359
We present V.PhyloMaker, a freely available package for R designed to generate phylogenies for vascular plants. The mega‐tree implemented in V.PhyloMaker (i.e. GBOTB.extended.tre), which was derived from two recently published mega‐trees and includes 74 533 species and all families of extant vascular plants, is the largest dated phylogeny for vascular plants. V.PhyloMaker can generate phylogenies for very large species lists (the largest species list that we tested included 314 686 species). V.PhyloMaker generates phylogenies at a fast speed, much faster than other phylogeny‐generating packages. Our tests of V.PhyloMaker show that generating a phylogeny for 60 000 species requires less than six hours. V.PhyloMaker includes an approach to attach genera or species to their close relatives in a phylogeny. We provide a simple example in this paper to show how to use V.PhyloMaker to generate phylogenies.

5.
Molecular phylogenies typically consist of only extant species, yet they allow inference of past rates of extinction, because recently originated species are less likely to be extinct than ancient species. Despite the simple structure of the assumed underlying speciation-extinction process, parametric functions to estimate extinction rates from phylogenies turned out to be complex and often difficult to derive. Moreover, these parametric functions are specific to a particular process (e.g. complete species level phylogeny with constant birth and death rates) and a particular type of data (e.g. times between bifurcations). Here, it is shown that artificial neural networks can substitute for parametric estimation functions once they have been sufficiently trained on simulated data. This technique can in principle be used for different processes and data types, and because it circumvents the time-consuming and difficult task of deriving parametric estimation functions, it may greatly extend the possibilities to make macro-evolutionary inferences from molecular phylogenies. This novel approach is explained, applied to estimate speciation and extinction rates from a molecular phylogeny of the reef fish genus Naso (Acanthuridae), and its performance is compared to that of maximum likelihood estimation.

6.
Golding GB. Genetics, 2002, 161(2): 889-896
In general when a phylogeny is reconstructed from DNA or protein sequence data, it makes use only of the probabilities of obtaining some phylogeny given a collection of data. It is also possible to determine the prior probabilities of different phylogenies. This information can be of use in analyzing the biological causes for the observed divergence of sampled taxa. Unusually "rare" topologies for a given data set may be indicative of different biological forces acting. A recursive algorithm is presented that calculates the prior probabilities of a phylogeny for different allelic samples and for different phylogenies. This method is a straightforward extension of Ewens' sample distribution. The probability of obtaining each possible sample according to Ewens' distribution is further subdivided into each of the possible phylogenetic topologies. These probabilities depend not only on the identity of the alleles and on 4Nμ (four times the effective population size times the neutral mutation rate) but also on the phylogenetic relationships among the alleles. Illustrations of the algorithm are given to demonstrate how different phylogenies are favored under different conditions.
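As background for the abstract above, the following sketch evaluates Ewens' sampling distribution itself, the starting point that the recursive algorithm extends. It does not implement the subdivision into phylogenetic topologies. The function name and the input encoding are assumptions; theta stands for 4Nμ.

```python
from math import factorial

def ewens_probability(counts, theta):
    """Probability of an allelic configuration under Ewens' sampling formula.

    counts[j - 1] is the number of alleles seen exactly j times in the sample
    (a hypothetical encoding); theta is 4N * mu.
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))  # sample size
    rising = 1.0
    for i in range(n):                                     # theta (theta+1) ... (theta+n-1)
        rising *= theta + i
    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob

# All configurations of a sample of size 3, with theta = 2.0:
configs = {
    "three distinct alleles": (3, 0, 0),
    "one singleton, one doubleton": (1, 1, 0),
    "one allele seen three times": (0, 0, 1),
}
for label, counts in configs.items():
    print(label, round(ewens_probability(counts, 2.0), 4))
```

The three probabilities sum to one, which is a quick sanity check on any implementation of the formula.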

7.
Case-control association studies are widely used in the search for genetic variants that contribute to human diseases. It has long been known that such studies may suffer from high rates of false positives if there is unrecognized population structure. It is perhaps less widely appreciated that so-called “cryptic relatedness” (i.e., kinship among the cases or controls that is not known to the investigator) might also potentially inflate the false positive rate. Until now there has been little work to assess how serious this problem is likely to be in practice. In this paper, we develop a formal model of cryptic relatedness, and study its impact on association studies. We provide simple expressions that predict the extent of confounding due to cryptic relatedness. Surprisingly, these expressions are functions of directly observable parameters. Our analytical results show that, for well-designed studies in outbred populations, the degree of confounding due to cryptic relatedness will usually be negligible. However, in contrast, studies where there is a sampling bias toward collecting relatives may indeed suffer from excessive rates of false positives. Furthermore, cryptic relatedness may be a serious concern in founder populations that have grown rapidly and recently from a small size. As an example, we analyze the impact of excess relatedness among cases for six phenotypes measured in the Hutterite population.

8.
Because phylogenies can be estimated without stratigraphic data and because estimated phylogenies also infer gaps in sampling, some workers have used phylogeny estimates as templates for evaluating sampling from the fossil record and for "correcting" historical diversity patterns. However, it is not known how sampling intensity (the probability of sampling taxa per unit time) and completeness (the proportion of taxa sampled) affect the accuracy of phylogenetic inferences, nor how phylogenetically inferred estimates of sampling and diversity respond to inaccurate estimates of phylogeny. Both issues are addressed with a series of simulations using simple models of character evolution, varying speciation patterns, and various rates of speciation, extinction, character change, and preservation. Parsimony estimates of simulated phylogenies become less accurate as sampling decreases, and inaccurate trees chronically underestimate sampling. Biotic factors such as rates of morphologic change and extinction both affect the accuracy of phylogenetic estimates and thus affect estimated gaps in sampling, indicating that differences in implied sampling need not reflect actual differences in sampling. Errors in inferred diversity are concentrated early in the history of a clade. This, coupled with failure to account for true extinction times (i.e., the Signor-Lipps effect), inflates relative diversity levels early in clade histories. Because factors other than differences in sampling predict differences in the numbers of gaps implied by phylogeny estimates, inferred phylogenies can be misleading templates for evaluating sampling or historical diversity patterns.

9.
Relaxed phylogenetics and dating with confidence   Cited 3 times (self-citations: 1, citations by others: 2)
In phylogenetics, the unrooted model of phylogeny and the strict molecular clock model are two extremes of a continuum. Despite their dominance in phylogenetic inference, it is evident that both are biologically unrealistic and that the real evolutionary process lies between these two extremes. Fortunately, intermediate models employing relaxed molecular clocks have been described. These models open the gate to a new field of “relaxed phylogenetics.” Here we introduce a new approach to performing relaxed phylogenetic analysis. We describe how it can be used to estimate phylogenies and divergence times in the face of uncertainty in evolutionary rates and calibration times. Our approach also provides a means for measuring the clocklikeness of datasets and comparing this measure between different genes and phylogenies. We find no significant rate autocorrelation among branches in three large datasets, suggesting that autocorrelated models are not necessarily suitable for these data. In addition, we place these datasets on the continuum of clocklikeness between a strict molecular clock and the alternative unrooted extreme. Finally, we present analyses of 102 bacterial, 106 yeast, 61 plant, 99 metazoan, and 500 primate alignments. From these we conclude that our method is phylogenetically more accurate and precise than the traditional unrooted model while adding the ability to infer a timescale to evolution.

10.
Studies of phylogenetic tree shape often concentrate on the balance of phylogenies of extant taxa. Paleontological phylogenies (which include extinct taxa) can contain additional useful information and can directly document changes in tree shape through evolutionary time. Unfortunately, the inclusion of extinct taxa lowers the power of direct examinations of tree balance because it increases the range of tree shapes expected under null models of evolution (with equal rates of speciation and extinction across lineages). A promising approach for the analysis of tree shape in paleontological phylogenies is to break the phylogeny down into time slices, examining the shape of the phylogeny of taxa alive at each time slice and changes in that shape between successive time slices. This method was illustrated with 57 time slices through a stratophenetic phylogeny of the Cretaceous planktonic foraminiferal superfamily Globotruncanacea. At 3 of 56 intervals between time slices, 93-92.5 million years ago (MYA), 89-88.5 MYA, and 85.5-84 MYA, the group showed steep increases in imbalance. Although none of these increases were significant after Bonferroni correction, these points in the history of the Globotruncanacea were nevertheless identified as deserving of further macroevolutionary investigation. The 84 MYA time slice coincides with a peak in species turnover for the superfamily. Time slices through phylogenies may prove useful for identifying periods of time when evolution was proceeding in a nonstochastic manner.
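The Bonferroni step mentioned above can be made concrete. With 56 intervals tested, each individual test must reach alpha / 56 to count as significant. The sketch below assumes a familywise alpha of 0.05, which the abstract does not state, and the p-values are hypothetical placeholders, not results from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def bonferroni_significant(p_values, alpha, n_tests):
    """Flag which p-values stay significant after correcting for n_tests tests."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p <= threshold for p in p_values]

# 56 intervals between 57 time slices; alpha = 0.05 is an assumed level.
threshold = bonferroni_threshold(0.05, 56)
print(f"per-test threshold: {threshold:.6f}")

# Hypothetical p-values for three intervals with steep increases in imbalance.
hypothetical_p = [0.01, 0.02, 0.04]
print(bonferroni_significant(hypothetical_p, 0.05, 56))
```

Under these assumed values, p-values that would pass an uncorrected 0.05 cutoff all fail the corrected threshold of roughly 0.0009, the same kind of outcome the abstract reports.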

11.
Mammalian phylogeny is far too asymmetric for all contemporaneous lineages to have had equal chances of diversifying. We consider this asymmetry or imbalance from four perspectives. First, we infer a minimal set of 'regime changes' (points at which net diversification rate has changed), identifying 15 significant radiations and 12 clades that may be 'downshifts'. We next show that mammalian phylogeny is similar in shape to a large set of published phylogenies of other vertebrate, arthropod and plant groups, suggesting that many clades may diversify under a largely shared set of 'rules'. Third, we simulate six simple macroevolutionary models, showing that those where speciation slows down as geographical or niche space is filled produce more realistic phylogenies than do models involving key innovations. Lastly, an analysis of the spatial scaling of imbalance shows that the phylogeny of species within an assemblage, ecoregion or larger area always tends to be more unbalanced than expected from the phylogeny of species at the next more inclusive spatial scale. We conclude with a verbal model of mammalian macroevolution, which emphasizes the importance to diversification of accessing new regions of geographical or niche space.

12.
Bayesian estimation of ancestral character states on phylogenies   Cited 17 times (self-citations: 0, citations by others: 17)
Biologists frequently attempt to infer the character states at ancestral nodes of a phylogeny from the distribution of traits observed in contemporary organisms. Because phylogenies are normally inferences from data, it is desirable to account for the uncertainty in estimates of the tree and its branch lengths when making inferences about ancestral states or other comparative parameters. Here we present a general Bayesian approach for testing comparative hypotheses across statistically justified samples of phylogenies, focusing on the specific issue of reconstructing ancestral states. The method uses Markov chain Monte Carlo techniques for sampling phylogenetic trees and for investigating the parameters of a statistical model of trait evolution. We describe how to combine information about the uncertainty of the phylogeny with uncertainty in the estimate of the ancestral state. Our approach does not constrain the sample of trees only to those that contain the ancestral node or nodes of interest, and we show how to reconstruct ancestral states of uncertain nodes using a most-recent-common-ancestor approach. We illustrate the methods with data on ribonuclease evolution in the Artiodactyla. Software implementing the methods (BayesMultiState) is available from the authors.

13.
The branching times of molecular phylogenies allow us to infer speciation and extinction dynamics even when fossils are absent. Troublingly, phylogenetic approaches usually return estimates of zero extinction, conflicting with fossil evidence. Phylogenies and fossils do agree, however, that there are often limits to diversity. Here, we present a general approach to evaluate the likelihood of a phylogeny under a model that accommodates diversity-dependence and extinction. We find, by likelihood maximization, that extinction is estimated most precisely if the rate of increase in the number of lineages in the phylogeny saturates towards the present or first decreases and then increases. We demonstrate the utility and limits of our approach by applying it to the phylogenies for two cases where a fossil record exists (Cetacea and Cenozoic macroperforate planktonic foraminifera) and to three radiations lacking fossil evidence (Dendroica, Plethodon and Heliconius). We propose that the diversity-dependence model with extinction be used as the standard model for macro-evolutionary dynamics because of its biological realism and flexibility.

14.
Statistical randomization tests in evolutionary biology often require a set of random, computer-generated trees. For example, earlier studies have shown how large numbers of computer-generated trees can be used to conduct phylogenetic comparative analyses even when the phylogeny is uncertain or unknown. These methods were limited, however, in that (in the absence of molecular sequence or other data) they allowed users to assume that no phylogenetic information was available or that all possible trees were known. Intermediate situations where only a taxonomy or other limited phylogenetic information (e.g., polytomies) are available are technically more difficult. The current study describes a procedure for generating random samples of phylogenies while incorporating limited phylogenetic information (e.g., four taxa belong together in a subclade). The procedure can be used to conduct comparative analyses when the phylogeny is only partially resolved or can be used in other randomization tests in which large numbers of possible phylogenies are needed.
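One simple way to generate random trees that respect a known subclade is sketched below. It is not the procedure of the study summarized above: it uses random joining of lineages, which does not sample all topologies with equal probability, and it handles a single non-empty constraint. The function names and the nested-tuple tree encoding are assumptions.

```python
import random

def random_tree(nodes, rng):
    """Join two randomly chosen lineages until a single tree remains."""
    nodes = list(nodes)
    while len(nodes) > 1:
        i, j = rng.sample(range(len(nodes)), 2)
        joined = (nodes[i], nodes[j])
        nodes = [x for k, x in enumerate(nodes) if k not in (i, j)]
        nodes.append(joined)
    return nodes[0]

def random_tree_with_clade(taxa, clade, rng):
    """Random tree in which the taxa in `clade` (non-empty) form one subclade."""
    inside = [t for t in taxa if t in clade]
    outside = [t for t in taxa if t not in clade]
    subtree = random_tree(inside, rng)           # resolve the constrained group first
    return random_tree(outside + [subtree], rng)  # then treat it as a single lineage

def leaf_set(tree):
    """Set of leaf labels below a node."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def contains_clade(tree, clade):
    """True if some node of the tree has exactly `clade` as its leaf set."""
    if leaf_set(tree) == frozenset(clade):
        return True
    return isinstance(tree, tuple) and any(contains_clade(c, clade) for c in tree)

rng = random.Random(1)
taxa = ["A", "B", "C", "D", "E", "F", "G"]
group = {"A", "B", "C", "D"}                     # e.g., four taxa known to belong together
sample = [random_tree_with_clade(taxa, group, rng) for _ in range(100)]
print(all(contains_clade(t, group) for t in sample))
```

Each sampled tree can then be fed to a comparative analysis, with the spread of results across trees reflecting the unresolved part of the phylogeny.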

15.
An important challenge in evolutionary biology is to understand how major changes in body form arise. The dramatic transition from a lizard-like to snake-like body form in squamate reptiles offers an exciting system for such research because this change is replicated dozens of times. Here, we use morphometric data for 258 species and a time-calibrated phylogeny to explore rates and patterns of body-form evolution across squamates. We also demonstrate how time-calibrated phylogenies may be used to make inferences about the time frame over which major morphological transitions occur. Using the morphometric data, we find that the transition from lizard-like to snake-like body form involves concerted evolution of limb reduction, digit loss, and body elongation. These correlations are similar across squamate clades, despite very different ecologies and >180 million years (My) of divergence. Using the time-calibrated phylogeny and ancestral reconstructions, we find that the dramatic transition between these body forms can occur in 20 My or less, but that seemingly intermediate morphologies can also persist for tens of millions of years. Finally, although loss of digits is common, we find statistically significant support for at least six examples of the re-evolution of lost digits in the forelimb and hind limb.

16.
The rapid increase in published genomic sequences for bacteria presents the first opportunity to reconstruct evolutionary events on the scale of entire genomes. However, extensive lateral gene transfer (LGT) may thwart this goal by preventing the establishment of organismal relationships based on individual gene phylogenies. The group for which cases of LGT are most frequently documented and for which the greatest density of complete genome sequences is available is the γ-Proteobacteria, an ecologically diverse and ancient group including free-living species as well as pathogens and intracellular symbionts of plants and animals. We propose an approach to multigene phylogeny using complete genomes and apply it to the case of the γ-Proteobacteria. We first applied stringent criteria to identify a set of likely gene orthologs and then tested the compatibilities of the resulting protein alignments with several phylogenetic hypotheses. Our results demonstrate phylogenetic concordance among virtually all (203 of 205) of the selected gene families, with each of the exceptions consistent with a single LGT event. The concatenated sequences of the concordant families yield a fully resolved phylogeny. This topology also received strong support in analyses aimed at excluding effects of heterogeneity in nucleotide base composition across lineages. Our analysis indicates that single-copy orthologous genes are resistant to horizontal transfer, even in ancient bacterial groups subject to high rates of LGT. This gene set can be identified and used to yield robust hypotheses for organismal phylogenies, thus establishing a foundation for reconstructing the evolutionary transitions, such as gene transfer, that underlie diversity in genome content and organization.

17.
Large-scale phylogenies provide a valuable source to study background diversification rates and investigate if the rates have changed over time. Unfortunately most large-scale, dated phylogenies are sparsely sampled (fewer than 5% of the described species) and taxon sampling is not uniform. Instead, taxa are frequently sampled to obtain at least one representative per subgroup (e.g. family) and thus to maximize diversity (diversified sampling). So far, such complications have been ignored, potentially biasing the conclusions that have been reached. In this study I derive the likelihood of a birth-death process with non-constant (time-dependent) diversification rates and diversified taxon sampling. Using simulations I test if the true parameters and the sampling method can be recovered when the trees are small or medium sized (fewer than 200 taxa). The results show that the diversification rates can be inferred and the estimates are unbiased for large trees but are biased for small trees (fewer than 50 taxa). Furthermore, model selection by means of Akaike's Information Criterion favors the true model if the true rates differ sufficiently from alternative models (e.g. the birth-death model is recovered if the extinction rate is large and compared to a pure-birth model). Finally, I applied six different diversification rate models – ranging from a constant-rate pure birth process to a decreasing speciation rate birth-death process but excluding any rate shift models – on three large-scale empirical phylogenies (ants, mammals and snakes with respectively 149, 164 and 41 sampled species). All three phylogenies were constructed by diversified taxon sampling, as stated by the authors. However only the snake phylogeny supported diversified taxon sampling. Moreover, a parametric bootstrap test revealed that none of the tested models provided a good fit to the observed data. The model assumptions, such as homogeneous rates across species or no rate shifts, appear to be violated.
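The model-selection step described above relies on Akaike's Information Criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood. The sketch below ranks two diversification models by AIC. The log-likelihood values are hypothetical placeholders, not results from the study, and the function names are assumptions.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def rank_by_aic(fits):
    """Sort models by AIC.

    fits maps a model name to (maximized log-likelihood, number of parameters).
    Returns (name, AIC, delta AIC relative to the best model) tuples.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    return sorted(((name, s, s - best) for name, s in scores.items()),
                  key=lambda row: row[1])

# Hypothetical fits: pure birth has one rate, birth-death has two.
fits = {
    "pure_birth": (-120.0, 1),
    "birth_death": (-118.5, 2),
}
for name, score, delta in rank_by_aic(fits):
    print(f"{name}: AIC = {score:.1f}, delta = {delta:.1f}")
```

With these made-up numbers the birth-death model is preferred, but only by one AIC unit, which illustrates the abstract's point that the true model is recovered only when the rates differ sufficiently between candidates.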

18.
As methods of molecular phylogeny have become more explicit and more biologically realistic following the pioneering work of Thomas Jukes, they have had to relax their initial assumption that rates of evolution were equal at all sites. Distance matrix and likelihood methods of inferring phylogenies make this assumption; parsimony, when valid, is less limited by it. Nucleotide sequences, including RNA sequences, can show substantial rate variation; protein sequences show rates that vary much more widely. Assuming a prior distribution of rates such as a gamma distribution or lognormal distribution has deservedly been popular, but for likelihood methods it leads to computational difficulties. These can be resolved using hidden Markov model (HMM) methods which approximate the distribution by one with a modest number of discrete rates. Generalized Laguerre quadrature can be used to improve the selection of rates and their probabilities so as to more nearly approach the desired gamma distribution. A model based on population genetics is presented predicting how the rates of evolution might vary from locus to locus. Challenges for the future include allowing rates at a given site to vary along the tree, as in the “covarion” model, and allowing them to have correlations that reflect three-dimensional structure, rather than position in the coding sequence. Markov chain Monte Carlo likelihood methods may be the only practical way to carry out computations for these models. Received: 8 February 2001 / Accepted: 20 May 2001

19.
Studies of shifts in diversification rates and adaptive radiations are difficult when there are no fossils because past events cannot be inferred. The phylogenies of recent species, however, allow one to infer the patterns of past diversifications. I present a new method for estimating the diversification rate of a lineage, provided that a phylogeny of recent species, constructed, for instance, with molecular data, is available. This method was inspired by survival models and takes into account species that are not included in detailed phylogenetic data, provided that approximate dates of origin of these species are known. Likelihood ratio tests and Akaike Information Criterion make it possible to test for differences in diversification among lineages or groups of lineages and, thus, to evaluate adaptive radiation hypotheses. The present modeling approach can easily be extended to include temporal variations in diversification rates. A simulation study showed that the method is statistically consistent, avoiding Type I and Type II errors, and that it is robust to periodic or random fluctuations in the speciation rate. An example is presented with a composite phylogeny of primates.

20.
Molecular phylogenies are increasingly being used to investigate the patterns and mechanisms of macroevolution. In particular, node heights in a phylogeny can be used to detect changes in rates of diversification over time. Such analyses rest on the assumption that node heights in a phylogeny represent the timing of diversification events, which in turn rests on the assumption that evolutionary time can be accurately predicted from DNA sequence divergence. But there are many influences on the rate of molecular evolution, which might also influence node heights in molecular phylogenies, and thus affect estimates of diversification rate. In particular, a growing number of studies have revealed an association between the net diversification rate estimated from phylogenies and the rate of molecular evolution. Such an association might, by influencing the relative position of node heights, systematically bias estimates of diversification time. We simulated the evolution of DNA sequences under several scenarios where rates of diversification and molecular evolution vary through time, including models where diversification and molecular evolutionary rates are linked. We show that commonly used methods, including metric‐based, likelihood and Bayesian approaches, can have a low power to identify changes in diversification rate when molecular substitution rates vary. Furthermore, the association between the rates of speciation and molecular evolution rate can cause the signature of a slowdown or speedup in speciation rates to be lost or misidentified. These results suggest that the multiple sources of variation in molecular evolutionary rates need to be considered when inferring macroevolutionary processes from phylogenies.
