Similar literature
 Found 20 similar documents; search took 31 ms
1.
Here, a new theory of molecular phylogeny is developed in a multidimensional vector space (MVS). The molecular evolution is represented as a successive splitting of branch vectors in the MVS. The end points of these vectors are the extant species and indicate the specific directions reflected by their individual histories of evolution in the past. This representation makes it possible to infer the phylogeny (evolutionary histories) from the spatial positions of the end points. Search vectors are introduced to draw out the groups of species distributed around them. These groups are classified according to the nearby order of branches with them. A law of physics is applied to determine the species positions in the MVS. The species are regarded as the particles moving in time according to the equation of motion, finally falling into the lowest-energy state in spite of their randomly distributed initial condition. This falling into the ground state results in the construction of an MVS in which the relative distances between two particles are equal to the substitution distances. The species positions are obtained prior to the phylogeny inference. Therefore, as the number of species increases, the species vectors can be more specific in an MVS of a larger size, such that the vector analysis gives a more stable and reliable topology. The efficacy of the present method was examined by using computer simulations of molecular evolution in which all the branch- and end-point sequences of the trees are known in advance. In the phylogeny inference from the end points with 100 multiple data sets, the present method consistently reconstructed the correct topologies, in contrast to standard methods. In applications to 185 vertebrates in the alpha-hemoglobin, the vector analysis drew out the two lineage groups of birds and mammals. A core member of the mammalian radiation appeared at the base of the mammalian lineage. 
Squamates were isolated from the bird lineage to compose the outgroup, while the other living reptilians were directly coupled with birds without forming any sister groups. This result is in contrast to the morphological phylogeny and is also different from those of recent molecular analyses.
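
The idea of letting randomly placed particles relax into a lowest-energy state whose pairwise distances match the substitution distances can be illustrated with a small sketch. This is not the authors' equation of motion: it assumes a simple squared-error energy and plain gradient descent, and the names `embed` and `energy` as well as the step size and iteration count are hypothetical choices made for illustration.

```python
import math
import random


def energy(pos, dist):
    """Sum of squared mismatches between embedded and target distances."""
    total = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = math.dist(pos[i], pos[j])
            total += (r - dist[i][j]) ** 2
    return total


def embed(dist, dim=2, steps=2000, lr=0.01, seed=0):
    """Relax random starting points until their distances approximate `dist`.

    Assumption: gradient descent on the energy above stands in for the
    physical dynamics described in the abstract.
    """
    rng = random.Random(seed)
    n = len(dist)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [pos[i][k] - pos[j][k] for k in range(dim)]
                r = math.sqrt(sum(c * c for c in diff)) or 1e-12
                # Gradient of (r - d_ij)^2 with respect to pos[i].
                coef = 2.0 * (r - dist[i][j]) / r
                for k in range(dim):
                    grad[i][k] += coef * diff[k]
                    grad[j][k] -= coef * diff[k]
        for i in range(n):
            for k in range(dim):
                pos[i][k] -= lr * grad[i][k]
    return pos


# Three "species" whose target distances form a 3-4-5 triangle.
target = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
points = embed(target)
print(round(energy(points, target), 8))
```

With more species, a larger `dim` gives the points more room to satisfy all pairwise distances, which is the intuition behind using a higher-dimensional space as the data set grows.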

2.
Modes and rates of molecular evolution, and congruence and combinability for phylogenetic reconstruction, of portions of the nuclear large ribosomal subunit (nLSU-rDNA) and mitochondrial small subunit (mtSSU-rDNA) genes were investigated in the mushroom genus Amanita. The AT content was higher in the mtSSU-rDNA than in the nLSU-rDNA. A transition bias in which AT substitutions were as frequent as transitions was present in the mtSSU-rDNA but not in the nLSU-rDNA. Among-sites rate variation in nucleotide substitutions at variable sites was present in the nLSU-rDNA but not in the mtSSU-rDNA. Likelihood ratio tests indicated very different models of evolution for the two molecules. A molecular clock could be rejected for both data sets. Rates of molecular evolution in the two molecules were uncoupled: faster evolutionary rates in the mtSSU-rDNA and nLSU-rDNA were not observed for the same taxa. In separate phylogenetic analyses, the nLSU-rDNA data set had higher phylogenetic resolution. The partition homogeneity test and statistical bootstrap support for branches indicated absence of conflict in the phylogenetic signal in the two data sets; however, tree topologies produced from the separate data sets were not congruent. Heterogeneity in modes and rates of evolution in the two molecules poses difficulties for a combined analysis of the two data sets: the use of equally weighted parsimony is not fully satisfactory when rate heterogeneity is present, and it is impractical to determine a model for maximum-likelihood analysis that simultaneously fits two heterogeneous data sets. Overall topologies produced from either the separate or the combined analyses using various tree reconstruction methods were identical for nearly all statistically significant branches.

3.
We study the phylogeny of the placental mammals using molecular data from all mitochondrial tRNAs and rRNAs of 54 species. We use probabilistic substitution models specific to evolution in base paired regions of RNA. A number of these models have been implemented in a new phylogenetic inference software package for carrying out maximum likelihood and Bayesian phylogenetic inferences. We describe our Bayesian phylogenetic method which uses a Markov chain Monte Carlo algorithm to provide samples from the posterior distribution of tree topologies. Our results show support for four primary mammalian clades, in agreement with recent studies of much larger data sets mainly comprising nuclear DNA. We discuss some issues arising when using Bayesian techniques on RNA sequence data.

4.
It has long been recognized that phylogenetic trees are more unbalanced than those generated by a Yule process. Recently, the degree of this imbalance has been quantified using the large set of phylogenetic trees available in the TreeBASE data set. In this article, a more precise analysis of imbalance is undertaken. Trees simulated under a range of models are compared with trees from TreeBASE and two smaller data sets. Several simple models can match the amount of imbalance measured in real data. Most of them also match the variance of imbalance among empirical trees to a remarkable degree. Statistics are developed to measure balance and to distinguish between trees with the same overall imbalance. The match between models and data for these statistics is investigated. In particular, age-dependent (Bellman-Harris) branching processes are studied in detail. It remains difficult to separate the process of macroevolution from biases introduced by sampling. The lessons for phylogenetic analysis are clearer. In particular, the use of the usual proportional to distinguishable arrangements (uniform) prior on tree topologies in Bayesian phylogenetic analysis is not recommended.
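
To make the comparison between simulated and empirical imbalance concrete, the sketch below grows trees under a Yule process (each split picks one current leaf uniformly at random) and scores them with the Colless index, the sum over internal nodes of the absolute difference in leaf counts between the two subtrees. The abstract does not say which imbalance statistics were used, so the Colless index is an assumption here, and `yule_tree` and `colless` are hypothetical helper names.

```python
import random


def yule_tree(n_leaves, rng):
    """Grow a binary tree by repeatedly splitting a uniformly chosen leaf.

    Returns a dict mapping node id -> (left, right), or None for a leaf.
    Node 0 is the root.
    """
    children = {0: None}
    leaves = [0]
    next_id = 1
    while len(leaves) < n_leaves:
        idx = rng.randrange(len(leaves))
        node = leaves[idx]
        left, right = next_id, next_id + 1
        next_id += 2
        children[node] = (left, right)
        children[left] = None
        children[right] = None
        # The split leaf is replaced by its two new daughter leaves.
        leaves[idx] = left
        leaves.append(right)
    return children


def colless(children, root=0):
    """Colless index: sum of |left leaves - right leaves| over internal nodes."""
    total = 0

    def count(node):
        nonlocal total
        kids = children[node]
        if kids is None:
            return 1
        left = count(kids[0])
        right = count(kids[1])
        total += abs(left - right)
        return left + right

    count(root)
    return total


rng = random.Random(1)
scores = [colless(yule_tree(30, rng)) for _ in range(200)]
print(sum(scores) / len(scores))  # mean imbalance of simulated Yule trees
```

The same `colless` function could be applied to empirical trees converted to this dictionary form, so that their scores can be set against the simulated distribution.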

5.
6.
The covarion hypothesis of molecular evolution proposes that selective pressures on an amino acid or nucleotide site change through time, thus causing changes of evolutionary rate along the edges of a phylogenetic tree. Several kinds of Markov models for the covarion process have been proposed. One model, proposed by Huelsenbeck (2002), has 2 substitution rate classes: the substitution process at a site can switch between a single variable rate, drawn from a discrete gamma distribution, and a zero invariable rate. A second model, suggested by Galtier (2001), assumes rate switches among an arbitrary number of rate classes but switching to and from the invariable rate class is not allowed. The latter model allows for some sites that do not participate in the rate-switching process. Here we propose a general covarion model that combines features of both models, allowing evolutionary rates not only to switch between variable and invariable classes but also to switch among different rates when they are in a variable state. We have implemented all 3 covarion models in a maximum likelihood framework for amino acid sequences and tested them on 23 protein data sets. We found significant likelihood increases for all data sets for the 3 models, compared with a model that does not allow site-specific rate switches along the tree. Furthermore, we found that the general model fit the data better than the simpler covarion models in the majority of the cases, highlighting the complexity in modeling the covarion process. The general covarion model can be used for comparing tree topologies, molecular dating studies, and the investigation of protein adaptation.

7.
Territorial song structures are often the most prominent characters for distinguishing closely related taxa among songbirds. Learning processes may cause convergent evolution of passerine songs, but phylogenetic information of acoustic traits can be investigated with the help of molecular phylogenies, which are not affected by cultural evolutionary processes. We used a phylogeny based on cytochrome b sequences to trace the evolution of territorial song within the genus Regulus. Five discrete song units are defined as basic components of regulid song via sonagraphic measurements. Traits of each unit are traced on a molecular tree and a mean acoustic character difference between taxon pairs is calculated. Acoustic divergence between regulid taxa correlates strongly with genetic distances. Syntax features of complete songs and of single units are most consistent with the molecular data, whereas the abundance of certain element types is not. Whether song characters are innate or learned was interpreted using hand-reared birds in aviary experiments. We found that convergent character evolution seems to be most probable for learned acoustic traits. We conclude that syntax traits of whole verses or subunits of territorial song, especially innate song structures, are the most reliable acoustic traits for phylogenetic reconstructions in Regulus.

8.
Molecular trees of trypanosomes have confirmed conventionally accepted genera, but often produce topologies that are incongruent with knowledge of the evolution, systematics, and biogeography of hosts and vectors. These distorted topologies result largely from incorrect assumptions about molecular clocks. A host-based phylogenetic tree could serve as a broad outline against which the reasonability of molecular phylogenies could be evaluated. The host-based tree of trypanosomes presented here supports the "invertebrate first" hypothesis of trypanosome evolution, supports the monophyly of Trypanosomatidae, and indicates the digenetic lifestyle arose three times. An area cladogram of Leishmania supports origination in the Palaearctic during the Palaeocene.

9.
Most phylogenetic tree estimation methods assume that there is a single set of hierarchical relationships among sequences in a data set for all sites along an alignment. Mosaic sequences produced by past recombination events will violate this assumption and may lead to misleading results from a phylogenetic analysis due to the imposition of a single tree along the entire alignment. Therefore, the detection of past recombination is an important first step in an analysis. A Bayesian model for the changes in topology caused by recombination events is described here. This model relaxes the assumption of one topology for all sites in an alignment and uses the theory of Hidden Markov models to facilitate calculations, the hidden states being the underlying topologies at each site in the data set. Changes in topology along the multiple sequence alignment are estimated by means of the maximum a posteriori (MAP) estimate. The performance of the MAP estimate is assessed by application of the model to data sets of four sequences, both simulated and real.
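
A hidden Markov model whose hidden state is the topology at each site can be decoded with the Viterbi algorithm, which returns the single most probable state path. The sketch below assumes that approach; the abstract does not state how its MAP estimate is computed. The per-site log-likelihoods are taken as given inputs, and the function name `map_topology_path` and the `stay` probability are hypothetical. With four sequences there are three unrooted topologies, hence three states in the example.

```python
import math


def map_topology_path(site_loglik, stay=0.99):
    """Most probable topology per site under a simple switching HMM.

    site_loglik[s][t] is the log-likelihood of site s under topology t.
    Assumptions: a uniform prior over topologies at the first site, and
    probability `stay` of keeping the same topology at the next site.
    """
    n_states = len(site_loglik[0])
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (n_states - 1))

    def step(prev, cur):
        return log_stay if prev == cur else log_switch

    score = [math.log(1.0 / n_states) + ll for ll in site_loglik[0]]
    back = []
    for row in site_loglik[1:]:
        new_score, pointers = [], []
        for cur in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + step(p, cur))
            new_score.append(score[best_prev] + step(best_prev, cur) + row[cur])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Five sites favouring topology 0, then five favouring topology 2.
sites = [[0.0, -5.0, -5.0]] * 5 + [[-5.0, -5.0, 0.0]] * 5
print(map_topology_path(sites, stay=0.9))
```

A position where the decoded path changes state marks a candidate recombination breakpoint; a high `stay` value suppresses switches that are supported by only a site or two.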

10.
Until recently, phylogenetic analyses have been routinely based on homologous sequences of a single gene. Given the vast number of gene sequences now available, phylogenetic studies are now based on the analysis of multiple genes. Thus, it has become necessary to devise statistical methods to combine multiple molecular data sets. Here, we compare several models for combining different genes for the purpose of evaluating the likelihood of tree topologies. Three methods of branch length estimation were studied: assuming all genes have the same branch lengths (concatenate model), assuming that branch lengths are proportional among genes (proportional model), or assuming that each gene has a separate set of branch lengths (separate model). We also compared three models of among-site rate variation: the homogenous model, a model that assumes one gamma parameter for all genes, and a model that assumes one gamma parameter for each gene. On the basis of two nuclear and one mitochondrial amino acid data sets, our results suggest that, depending on the data set chosen, either the separate model or the proportional model represents the most appropriate method for branch length analysis. For all the data sets examined, one gamma parameter for each gene represents the best model for among-site rate variation. Using these models we analyzed alternative mammalian tree topologies, and we describe the effect of the assumed model on the maximum likelihood tree. We show that the choice of the model has an impact on the best phylogeny obtained.

11.
A new view of phylogenetic estimation is presented where data sets, tree evolution models, and estimation methods are placed in a common geometric framework. Each of these objects is placed in a vector space where the character patterns are the basis vectors. This viewpoint allows intuitive understanding of various complex properties of the phylogenetic estimation problem structure. This is illustrated with examples discussing data set combinations, mixture models, consistency, and phylogenetic invariants.

12.
We use a likelihood-based statistical test to evaluate the extent to which the available molecular data sets can be used to falsify alternative phylogenetic hypotheses describing the inter-relationship among corbiculate bee tribes. Based on the results of this test, we explore three alternative models of behavioural character state evolution and evaluate the support each model has for single-origin versus dual-origin hypotheses for 'highly' eusocial behaviour. We show that only one of four data sets could statistically reject any of the 15 possible outgroup-rooted phylogenetic hypotheses. However, a cytochrome b data set rejected all but three alternative topologies. Using this information, a simple model of behavioural character state evolution, in which transitions between solitary/communal, 'primitively' eusocial, and 'highly' eusocial are unconstrained, supports single-origin hypotheses for 'highly' eusocial behaviour, in spite of phylogenetic uncertainty. By contrast, an ordered model, in which 'highly' eusocial is constrained to be an evolutionarily terminal state, supports a dual-origins hypothesis. Our results show that the molecular phylogenetic evidence favouring a dual-origins hypothesis for 'highly' eusocial behaviour is, at present, conditional on information from one gene (cyt b) and on specific, though likely realistic, assumptions regarding the nature of eusocial evolution.

13.
A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.

14.
Covarion processes allow changes in evolutionary rates at sites along the branches of a phylogenetic tree. Covarion-like evolution is increasingly recognized as an important mode of protein evolution. Several recent reports suggest that maximum likelihood estimation employing covarion models may support different optimal topologies than estimation using standard rates-across-sites (RAS) models. However, it remains to be demonstrated that ignoring covarion evolution will generally result in topological misestimation. In this study we performed analytical and theoretical studies of limiting distances under the covarion model and four-taxon tree simulations to investigate the extent to which the covarion process impacts on phylogenetic estimation. In particular, we assessed the limits of an RAS model-based maximum likelihood method to recover the phylogenies when the sequence data were simulated under the covarion processes. We find that, when ignored, covarion processes can induce systematic errors in phylogeny reconstruction. Surprisingly, when sequences are evolved under a covarion process but an RAS model is used for estimation, we find that a long branch repel bias occurs.

15.

Background

Whenever different data sets arrive at conflicting phylogenetic hypotheses, only testable causal explanations of sources of errors in at least one of the data sets allow us to critically choose among the conflicting hypotheses of relationships. The large (28S) and small (18S) subunit rRNAs are among the most popular markers for studies of deep phylogenies. However, some nodes supported by these data are suspected of being artifacts caused by peculiarities of the evolution of these molecules. Arthropod phylogeny is an especially controversial subject dotted with conflicting hypotheses which are dependent on data set and method of reconstruction. We assume that phylogenetic analyses based on these genes can be improved further by i) enlarging the taxon sample, ii) employing more realistic models of sequence evolution incorporating non-stationary substitution processes, and iii) considering covariation and pairing of sites in rRNA genes.

Results

We analyzed a large set of arthropod sequences, applied new tools for quality control of data prior to tree reconstruction, and increased the biological realism of substitution models. Although the split-decomposition network indicated a high noise content in the data set, our measures were able to both improve the analyses and give causal explanations for some incongruities mentioned from analyses of rRNA sequences. However, misleading effects did not completely disappear.

Conclusion

Analyses of data sets that yield ambiguous phylogenetic hypotheses call for methods that not only filter stochastic noise but also distinguish phylogenetic signal from systematic biases. Such methods can only rely on our findings regarding the evolution of the analyzed data. Analyses of independent data sets are then crucial to test the plausibility of the results. Our approach can also easily be extended to genomic data, setting up layers of quality assessment that are applicable to phylogenetic reconstructions in general.

16.
In phylogenetic analyses conducted on the ND4L gene and part of the ND4 gene from species of the genus Pimephales, maximum parsimony yielded four trees, with the strict consensus providing no resolution of relationships among species. Maximum likelihood and minimum evolution methods yielded identical tree topologies, which differed from previous hypotheses of relationships for these species. If this topology is correct, it implies independent evolution of morphological characters, possibly associated with convergent trophic specialization.

17.
All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. 
The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive morphological convergence among nectar‐feeding lineages, and incongruent gene trees. Applying methods to account for nucleotide sequence saturation reduces, but does not completely eliminate, phylogenetic conflict. We ruled out paralogy, lateral gene transfer, and poor taxon sampling and outgroup choices among the processes leading to incongruent gene trees in phyllostomid bats. Uncovering and countering the possible effects of introgression and lineage sorting of ancestral polymorphism on gene trees will require great leaps in genomic and allelic sequencing in this species‐rich mammalian family. We also found evidence for adaptive molecular evolution leading to convergence in mitochondrial proteins among nectar‐feeding lineages. In conclusion, the biological processes that generate phylogenetic conflict are ubiquitous, and overcoming incongruence requires better models and more data than have been collected even in well‐studied organisms such as phyllostomid bats.

18.
Different views of the pattern of social evolution among the highly eusocial bees have arisen as a result of discordance between past molecular and morphology-based phylogenies. Here we present new data and taxa for four molecular data sets and reassess the morphological characters available to date. We show there is no significant character incongruence between four molecular data sets (two nuclear and two mitochondrial), but highly significant character incongruence leads to topological incongruence between the molecular and morphological data. We investigate the effects of using different outgroup combinations to root the estimated tree. We also consider various ways in which biases in the sequence data could be misleading, using several maximum likelihood models, LogDet corrections, and spectral analyses. Ultimately, we concede there is strong discordance between the molecular and morphological data partitions and appropriately apply the conditional combination approach in this case. We also find two equally well supported placements of the root for the molecular trees, one supported by 16S and 28S sequences, the other supported by cytochrome b and opsin. The strength of the evidence leads us to accept two equally well supported hypotheses based on analyses of the molecular data sets. These are the most rigorously supported hypotheses of corbiculate bee relationships at this time, and frame our argument that highly eusocial behavior within the corbiculate bees evolved twice independently.

19.
MOTIVATION: A large, high-quality database of homologous sequence alignments with good estimates of their corresponding phylogenetic trees will be a valuable resource to those studying phylogenetics. It will allow researchers to compare current and new models of sequence evolution across a large variety of sequences. The large quantity of data may provide inspiration for new models and methodology to study sequence evolution and may allow general statements about the relative effect of different molecular processes on evolution. RESULTS: The Pandit 7.6 database contains 4341 families of sequences derived from the seed alignments of the Pfam database of amino acid alignments of families of homologous protein domains (Bateman et al., 2002). Each family in Pandit includes an alignment of amino acid sequences that matches the corresponding Pfam family seed alignment, an alignment of DNA sequences that contain the coding sequence of the Pfam alignment when they can be recovered (overall, 82.9% of sequences taken from Pfam) and the alignment of amino acid sequences restricted to only those sequences for which a DNA sequence could be recovered. Each of the alignments has an estimate of the phylogenetic tree associated with it. The tree topologies were obtained using the neighbor joining method based on maximum likelihood estimates of the evolutionary distances, with branch lengths then calculated using a standard maximum likelihood approach.
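
The neighbor joining step mentioned above can be sketched as follows. This is a generic textbook version that recovers only the topology from a precomputed distance matrix; it is not the Pandit pipeline, it does not estimate distances or branch lengths by maximum likelihood, and the function name `neighbor_joining` and the example distances are hypothetical.

```python
def neighbor_joining(names, dist):
    """Return an unrooted topology as a tuple of three nested subtrees.

    At each step the pair (i, j) minimising
        Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j)
    is joined, where r(i) is the sum of distances from i to all others.
    """
    nodes = list(names)
    d = [row[:] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to each remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]]
             for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [(nodes[i], nodes[j])]
    return tuple(nodes)


# Additive distances generated from the tree ((A,B),(C,D)).
taxa = ["A", "B", "C", "D"]
distances = [
    [0.0, 0.3, 0.6, 0.7],
    [0.3, 0.0, 0.7, 0.8],
    [0.6, 0.7, 0.0, 0.5],
    [0.7, 0.8, 0.5, 0.0],
]
print(neighbor_joining(taxa, distances))
```

For four taxa the pairs (A, B) and (C, D) have equal Q scores, so either may be joined first; both choices describe the same unrooted topology.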

20.
Phylogenetic reconstructions are a major component of many studies in evolutionary biology, but their accuracy can be reduced under certain conditions. Recent studies showed that the convergent evolution of some phenotypes resulted from recurrent amino acid substitutions in genes belonging to distant lineages. It has been suggested that these convergent substitutions could bias phylogenetic reconstruction toward grouping convergent phenotypes together, but such an effect has never been appropriately tested. We used computer simulations to determine the effect of convergent substitutions on the accuracy of phylogenetic inference. We show that, in some realistic conditions, even a relatively small proportion of convergent codons can strongly bias phylogenetic reconstruction, especially when amino acid sequences are used as characters. The strength of this bias does not depend on the reconstruction method but varies as a function of how much divergence had occurred among the lineages prior to any episodes of convergent substitutions. While the occurrence of this bias is difficult to predict, the risk of spurious groupings is strongly decreased by considering only 3rd codon positions, which are less subject to selection, as long as saturation problems are not present. Therefore, we recommend that, whenever possible, topologies obtained with amino acid sequences and 3rd codon positions be compared to identify potential phylogenetic biases and avoid evolutionarily misleading conclusions.

