Similar literature
1.
Since branch lengths provide important information about the timing and the extent of evolutionary divergence among taxa, accurate resolution of evolutionary history depends as much on branch length estimates as on recovery of the correct topology. However, the empirical relationship between the choice of genes to sequence and the quality of branch length estimation remains ill defined. To address this issue, we evaluated the accuracy of branch lengths estimated from subsets of the mitochondrial genome for a mammalian phylogeny with known subordinal relationships. Using maximum-likelihood methods, we estimated branch lengths from an 11-kb sequence of all 13 protein-coding genes and compared them with estimates from single genes (0.2-1.8 kb) and from 7 different combinations of genes (2-3.5 kb). For each sequence, we separated the component of the log-likelihood deviation due to branch length differences associated with alternative topologies from that due to those that are independent of the topology. Even among the sequences that recovered the same tree topology, some produced significantly better branch length estimates than others did. The combination of correct topology and significantly better branch length estimation suggests that these gene combinations may prove useful in estimating phylogenetic relationships for mammalian divergences below the ordinal level. Thus, the proper choice of genes to sequence is a critical factor for reliable estimation of evolutionary history from molecular data.

2.
Because of the stochastic way in which lineages sort during speciation, gene trees may differ in topology from each other and from species trees. Surprisingly, assuming that genetic lineages follow a coalescent model of within-species evolution, we find that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This counterintuitive result implies that in combining data on multiple loci, the straightforward procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We conclude with suggestions that can aid in overcoming this new obstacle to accurate genomic inference of species phylogenies.

3.
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals (each with many genes) splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.

4.
Yang Z. Systematic Biology, 1998, 47(1): 125-133
The effect of the evolutionary rate of a gene on the accuracy of phylogeny reconstruction was examined by computer simulation. The evolutionary rate is measured by the tree length, that is, the expected total number of nucleotide substitutions per site on the phylogeny. DNA sequence data were simulated using both fixed trees with specified branch lengths and random trees with branch lengths generated from a model of cladogenesis. The parsimony and likelihood methods were used for phylogeny reconstruction, and the proportion of correctly recovered branch partitions by each method was estimated. Phylogenetic methods including parsimony appear quite tolerant of multiple substitutions at the same site. The optimum levels of sequence divergence were even higher than upper limits previously suggested for saturation of substitutions, indicating that the problem of saturation may have been exaggerated. Instead, the lack of information at low levels of divergence should be seriously considered in evaluation of a gene's phylogenetic utility, especially when the gene sequence is short. The performance of parsimony, relative to that of likelihood, does not necessarily decrease with the increase of the evolutionary rate.

5.
The importance of accommodating the phylogenetic history of a group when performing a comparative analysis is now widely recognized. The typical approaches either assume the tree is known without error, or they base inferences on a collection of well-supported trees or on a collection of trees generated under a stochastic model of cladogenesis. However, these approaches do not adequately account for the uncertainty of phylogenetic trees in a comparative analysis, especially when data relevant to the phylogeny of a group are available. Here, we develop a method for performing comparative analyses that is based on an extension of Felsenstein's independent contrasts method. Uncertainties in the phylogeny, branch lengths, and other parameters are accommodated by averaging over all possible trees, weighting each by the probability that the tree is correct. We do this in a Bayesian framework and use Markov chain Monte Carlo to perform the high-dimensional summations and integrations required by the analysis. We illustrate the method using comparative characters sampled from Anolis lizards.
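
The building block that this abstract extends, Felsenstein's independent contrasts on a single fixed tree, can be sketched as follows. This is only an illustration, not the authors' Bayesian method: the nested-tuple tree encoding, the function name `independent_contrasts`, and the toy trait values are all assumptions made for the example. Averaging over tree uncertainty would amount to repeating this calculation for each tree sampled from the posterior.

```python
from math import sqrt


def independent_contrasts(node, traits):
    """Return (node_value, extra_branch_length, contrasts) for a subtree.

    A node is either a tip name (str) or a pair of (child, branch_length)
    tuples describing a bifurcation. This encoding is hypothetical.
    """
    if isinstance(node, str):
        # A tip contributes its observed trait value and no extra variance.
        return traits[node], 0.0, []

    (left, len_left), (right, len_right) = node
    x_left, extra_left, c_left = independent_contrasts(left, traits)
    x_right, extra_right, c_right = independent_contrasts(right, traits)

    # Branches below internal nodes are lengthened to reflect the
    # uncertainty in the reconstructed ancestral value.
    v_left = len_left + extra_left
    v_right = len_right + extra_right

    # Standardized contrast between the two daughter values.
    contrast = (x_left - x_right) / sqrt(v_left + v_right)
    # Ancestral value: average weighted by inverse branch length.
    value = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
    # Extra length to add to the branch below this node.
    extra = v_left * v_right / (v_left + v_right)
    return value, extra, c_left + c_right + [contrast]


# Toy tree ((A:1, B:1):1, C:2) with made-up trait values.
clade_ab = (("A", 1.0), ("B", 1.0))
tree = ((clade_ab, 1.0), ("C", 2.0))
traits = {"A": 1.0, "B": 3.0, "C": 5.0}

root_value, _, contrasts = independent_contrasts(tree, traits)
```

A tree with n tips yields n - 1 contrasts, which can then be used in place of the raw tip values in a regression or correlation.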

6.
Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree.
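
The idea of a minimum-based estimator and its upward bias can be sketched in a few lines. The code below is a simplified illustration, not the published iGLASS formula. It assumes one lineage per taxon, a constant ancestral population size, and times measured in coalescent units in which a pair of lineages coalesces at rate 1. Under those assumptions the minimum of L independent waiting times overshoots the divergence time by 1/L on average, so subtracting 1/L removes the bias. The function names and the input layout are hypothetical.

```python
def glass_divergence(coalescence_times_per_locus):
    """Minimum interspecific coalescence time across all loci.

    Each inner list holds the interspecific coalescence times observed
    at one locus for a given pair of taxa.
    """
    return min(min(times) for times in coalescence_times_per_locus)


def corrected_divergence(coalescence_times_per_locus):
    """Subtract the expected overshoot of the minimum.

    Assumption: one lineage per taxon and coalescent time units, so the
    minimum of L exponential(1) waiting times has mean 1/L. The general
    multi-lineage correction from the paper is not reproduced here.
    """
    n_loci = len(coalescence_times_per_locus)
    return glass_divergence(coalescence_times_per_locus) - 1.0 / n_loci


# Made-up coalescence times (coalescent units) at four loci.
times = [[2.5], [2.1], [3.0], [2.2]]
raw = glass_divergence(times)
adjusted = corrected_divergence(times)
```

The correction shrinks as loci are added, which is consistent with the abstract's statement that the estimator is consistent as the number of loci grows.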

7.
A biologically realistic method was used to simulate evolutionary trees. The method uses a real DNA coding sequence as the starting point, simulates mutation according to the mutational spectrum of Escherichia coli (including base substitutions, insertions, and deletions), and separates the processes of mutation and selection. Trees of 8, 16, 32, and 64 taxa were simulated with average branch lengths of 50, 100, 150, 200, and 250 changes per branch. The resulting sequences were aligned with ClustalX, and trees were estimated by Neighbor Joining, Parsimony, Maximum Likelihood, and Bayesian methods from both DNA sequences and the corresponding protein sequences. The estimated trees were compared with the true trees, and both topological and branch length accuracies were scored. Over the variety of conditions tested, Bayesian trees estimated from DNA sequences that had been aligned according to the alignment of the corresponding protein sequences were the most accurate, followed by Maximum Likelihood trees estimated from DNA sequences and Parsimony trees estimated from protein sequences.

8.
The maximum-likelihood (ML) solution to a simple phylogenetic estimation problem is obtained analytically. The problem is estimation of the rooted tree for three species using binary characters with a symmetrical rate of substitution under the molecular clock. ML estimates of branch lengths and log-likelihood scores are obtained analytically for each of the three rooted binary trees. Estimation of the tree topology is equivalent to partitioning the sample space (space of possible data outcomes) into subspaces, within each of which one of the three binary trees is the ML tree. Distance-based least squares and parsimony-like methods produce essentially the same estimate of the tree topology, although differences exist among methods even under this simple model. This seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogeny estimation. The solution to this real phylogeny estimation problem will be useful for studying the problem of significance evaluation.

9.
We would like to use maximum likelihood to estimate parameters such as the effective population size N_e or, if we do not know mutation rates, the product 4N_e μ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so is dependent on a theorem which is not proven, but seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, N_e or 4N_e μ. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.

10.
Operator metrics are explicitly designed to measure evolutionary distances from nucleic acid sequences when substitution rates differ greatly among the organisms being compared, or when substitutions have been extensive. Unlike lengths calculated by the distance matrix and parsimony methods, in which substitutions in one branch of a tree can alter the measured length of another branch, lengths determined by operator metrics are not affected by substitutions outside the branch. In the method, lengths (operator metrics) corresponding to each of the branches of an unrooted tree are calculated. The metric length of a branch reconstructs the number of (transversion) differences between sequences at a tip and a node (or between nodes) of a tree. The theory is general and is fundamentally independent of differences in substitution rates among the organisms being compared. Mathematically, the independence has been obtained because the metrics are eigenvectors of fundamental equations which describe the evolution of all unrooted trees. Even under conditions when both the distance matrix method and a simple parsimony length method are shown to indicate lengths that are an order of magnitude too large or too small, the operator metrics are accurate. Examples, using data calculated with evolutionary rates and branchings designed to confuse the measurement of branch lengths and to camouflage the topology of the true tree, demonstrate the validity of operator metrics. The method is robust. Operator metric distances are easy to calculate, can be extended to any number of taxa, and provide a statistical estimate of their variances. The utility of the method is demonstrated by using it to analyze the origins and evolution of chloroplasts, mitochondria, and eubacteria.

11.
Rannala B, Yang Z. Genetics, 2003, 164(4): 1645-1656
The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be approximately 20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates will have to await more data and more realistic models.

12.
Amplified fragment length polymorphisms (AFLPs) are widely used for phylogenetic inference, especially in non-model species. Frequently, trees obtained with other nuclear or mitochondrial markers or with morphological information need additional resolution, increased branch support, or independent data sources (i.e. unlinked loci). In such cases, the use of AFLPs is a quick and cheap option. Computer simulation has shown that dominant AFLP markers lead to less accurate tree topologies than bi-allelic codominant markers such as SNPs, but this difference becomes negligible for shallow trees when using AFLP data sets that include a sufficiently large number of characters. Thus, determining how many AFLP characters are required to recover a given phylogeny is a key issue regarding the appropriateness of AFLPs for phylogenetic reconstruction. Here, we present a user-friendly, Java-based graphical interface, AFLPMax, which executes an automatic pipeline of different programs providing the user with the optimal number of AFLP characters needed to recover a given phylogeny with high accuracy and support. Executables for Windows, Linux and MacOS X operating systems, source code and user manual are available from: http://webs.uvigo.es/acraaj/AFLPMax.htm.

13.
Pichia kluyveri, a sexual ascomycetous yeast from cactus necroses and acidic fruit, is divided into three varieties. We used physiological, RAPD, and AFLP data to compare 46 P. kluyveri strains collected worldwide to investigate relationships among varieties. Physiology did not place all strains into described varieties. Although the combined AFLP and RAPD data produced a single most parsimonious tree, separate analysis of AFLP and RAPD data resulted in significantly different trees (by the partition homogeneity test). We then compared the distribution of strains per band to an expected distribution. This suggested we could separate both the AFLP and RAPD datasets into bands from rapidly and slowly changing DNA regions. When only bands from slowly changing regions (from each dataset) were included in the analysis, both the RAPD and AFLP datasets supported a single tree. This second tree did not differ significantly from the cladogram based on all of the DNA data, which we accepted as the best estimate of the phylogeny of these yeast strains. Based on this phylogeny, we were able to demonstrate the strong influence of geography on the population structure of this yeast, confirm the monophyly of one variety, question the utility of maintaining another variety, and demonstrate that the physiological differences used to separate the varieties did not do so in all cases.

14.
Shorebirds (Charadriiformes) are a diverse assemblage of species renowned for their variation in behavior, morphology, and life-history traits, but comparative studies of trait variation remain limited by the lack of a well-supported phylogeny based on DNA sequences. In this study we build upon previous shorebird phylogenies to construct the first sequence-based species-level phylogeny for the Scolopaci, one of three shorebird suborders. We sampled 84 species in the Scolopaci, and collected data for five genes (one nuclear and four mitochondrial) via PCR and sequencing or from GenBank. The phylogeny was estimated using Bayesian inference on a partitioned dataset of 6365 aligned base pairs, and was well-supported except for the radiations within Tringa and Calidris. The shanks and phalaropes are sister to the snipes, woodcocks and dowitchers, which in turn are sister to the sandpipers. The godwits and curlews are successive sister-groups to these clades, and the morphologically disparate taxa (jacanas, painted snipes, seedsnipes, and the Plains-wanderer) are the basal sister-group in the Scolopaci. We show that Tringa, Gallinago, and Calidris are paraphyletic assemblages, and thus are in need of taxonomic revision. The clade of Calidridine sandpipers has very short internal branches indicative of a relatively recent rapid radiation, and will require a gene tree/species tree approach to resolve relationships among species.

15.
Under a coalescent model for within-species evolution, gene trees may differ from species trees to such an extent that the gene tree topology most likely to evolve along the branches of a species tree can disagree with the species tree topology. Gene tree topologies that are more likely to be produced than the topology that matches that of the species tree are termed anomalous, and the region of branch-length space that gives rise to anomalous gene trees (AGTs) is the anomaly zone. We examine the occurrence of anomalous gene trees for the case of five taxa, the smallest number of taxa for which every species tree topology has a nonempty anomaly zone. Considering all sets of branch lengths that give rise to anomalous gene trees, the largest value possible for the smallest branch length in the species tree is greater in the five-taxon case (0.1934 coalescent time units) than in the previously studied case of four taxa (0.1568). The five-taxon case demonstrates the existence of three phenomena that do not occur in the four-taxon case. First, anomalous gene trees can have the same unlabeled topology as the species tree. Second, the anomaly zone does not necessarily enclose a ball centered at the origin in branch-length space, in which all branches are short. Third, as a branch length increases, it is possible for the number of AGTs to increase rather than decrease or remain constant. These results, which help to describe how the properties of anomalous gene trees increase in complexity as the number of taxa increases, will be useful in formulating strategies for evading the problem of anomalous gene trees during species tree inference from multilocus data.
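
The four-taxon figure quoted above can be explored numerically. The sketch below rests on an assumption that is not stated in this abstract: that for the asymmetric four-taxon species tree with internal branch lengths x (deeper) and y (shallower), the anomaly zone is the region y < a(x), with the boundary a(x) as written in the code. That formula is recalled from the earlier four-taxon analysis and should be checked against the original paper before being relied on. If it holds, the largest possible value of the smallest branch is where a(x) = x, which the code finds by bisection. The function names are hypothetical.

```python
from math import exp, log


def anomaly_boundary(x):
    """Assumed four-taxon boundary a(x); anomalous gene trees when y < a(x)."""
    return log(2.0 / 3.0
               + (3.0 * exp(2.0 * x) - 2.0)
               / (18.0 * (exp(3.0 * x) - exp(2.0 * x))))


def largest_min_branch(lo=0.01, hi=1.0, tol=1e-10):
    """Solve a(x) = x by bisection.

    a(x) - x is positive at lo and negative at hi for the default
    bracket, and it decreases in between, so there is a single root.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if anomaly_boundary(mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


critical_length = largest_min_branch()
```

Under the stated assumption, `critical_length` can be compared with the four-taxon value of 0.1568 coalescent units reported in the abstract. The five-taxon value of 0.1934 needs the five-taxon gene tree probabilities, which are not sketched here.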

16.
Development of methods for estimating species trees from multilocus data is a current challenge in evolutionary biology. We propose a method for estimating the species tree topology and branch lengths using approximate Bayesian computation (ABC). The method takes as data a sample of observed rooted gene tree topologies, and then iterates through the following sequence of steps: First, a randomly selected species tree is used to compute the distribution of rooted gene tree topologies. This distribution is then compared to the observed gene topology frequencies, and if the fit between the observed and the predicted distributions is close enough, the proposed species tree is retained. Repeating this many times leads to a collection of retained species trees that are then used to form the estimate of the overall species tree. We test the performance of the method, which we call ST-ABC, using both simulated and empirical data. The simulation study examines both symmetric and asymmetric species trees over a range of branch lengths and sample sizes. The results from the simulation study show that the model performs very well, giving accurate estimates for both the topology and the branch lengths across the conditions studied, and that a sample size of 25 loci appears to be adequate for the method. Further, we apply the method to two empirical cases: a 4-taxon data set for primates and a 7-taxon data set for yeast. In both cases, we find that estimates obtained with ST-ABC agree with previous studies. The method provides efficient estimation of the species tree, and does not require sequence data, but rather the observed distribution of rooted gene topologies without branch lengths. Therefore, this method is a useful alternative to other currently available methods for species tree estimation.
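
The propose, compare, and retain loop described above can be illustrated on the smallest possible case. The sketch below is not ST-ABC itself; it is a toy rejection sampler for three taxa, where the rooted gene tree distribution has a simple closed form under the coalescent: the matching topology has probability 1 - (2/3)e^(-t) and each of the other two has probability (1/3)e^(-t), with t the internal branch length in coalescent units. The uniform priors, the L1 distance, the tolerance `epsilon`, and all names are assumptions chosen for illustration.

```python
import math
import random

TOPOLOGIES = ["((A,B),C)", "((A,C),B)", "((B,C),A)"]


def gene_tree_distribution(species_topology, t):
    """Rooted gene tree topology probabilities for a three-taxon species tree."""
    discordant = math.exp(-t) / 3.0
    return {g: (1.0 - 2.0 * discordant if g == species_topology else discordant)
            for g in TOPOLOGIES}


def abc_rejection(observed, n_draws=20000, epsilon=0.05, max_t=5.0, seed=0):
    """Keep proposed (topology, t) pairs whose predicted frequencies fit."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        # Step 1: propose a species tree from the (assumed uniform) prior.
        topology = rng.choice(TOPOLOGIES)
        t = rng.uniform(0.0, max_t)
        # Step 2: compute the predicted gene tree distribution.
        predicted = gene_tree_distribution(topology, t)
        # Step 3: compare with the observed frequencies; retain if close.
        distance = sum(abs(predicted[g] - observed.get(g, 0.0))
                       for g in TOPOLOGIES)
        if distance <= epsilon:
            kept.append((topology, t))
    return kept


# Made-up observed frequencies of rooted gene tree topologies.
observed = {"((A,B),C)": 0.8, "((A,C),B)": 0.1, "((B,C),A)": 0.1}
kept = abc_rejection(observed)
mean_t = sum(t for _, t in kept) / len(kept)
```

The retained proposals approximate the posterior. Here every retained tree has the topology ((A,B),C), and the retained branch lengths cluster around the value of t for which 1 - (2/3)e^(-t) equals 0.8.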

17.
Mitochondrial cytochrome b sequence data from 15 species of herons (Aves: Ardeidae), representing 13 genera, were compared with DNA hybridization data of single-copy nuclear DNA (scnDNA) from the same species in a taxonomic congruence assessment of heron phylogeny. The two data sets produced a partially resolved, completely congruent estimate of phylogeny with the following basic structure: (Tigrisoma, Cochlearius, (((Zebrilus, (Ixobrychus, Botaurus)), (((Ardea, Casmerodius), Bubulcus), ((Egretta thula, Egretta caerulea, Egretta tricolor), Syrigma), Butorides, Nycticorax, Nyctanassa)))). Because congruence indicated similar phylogenetic information in the two data sets, we used the relatively unsaturated DNA hybridization distances as surrogates of time to examine graphically the patterns and rates of change in cytochrome b distances. Cytochrome b distances were computed either from whole sequences or from partitioned sequences consisting of transitions, transversions, specific codon site positions, or specific protein-coding regions. These graphical comparisons indicated that unpartitioned cytochrome b has evolved at 5-10 times the rate of scnDNA. Third-position transversions appeared to offer the most useful sequence partition for phylogenetic analysis because of their relatively fast rate of substitution (two times that of scnDNA) and negligible saturation. We also examined lineage-based rates of evolution by comparing branch length patterns between the nuclear and cytochrome b trees. The degree of correlation in corresponding branch lengths between cytochrome b and DNA hybridization trees depended on DNA sequence partitioning. When cytochrome b sequences were not partitioned, branch lengths in the cytochrome b and DNA hybridization trees were not correlated. However, when cytochrome b sequences were reduced to third-position transversions (i.e., unsaturated, relatively fast changing data), branch lengths were correlated. This finding suggests that lineage-based rates of DNA evolution in nuclear and mitochondrial genomes are influenced by common causes.

18.
Phylogeny reconstruction is challenging when branch lengths vary and when different genetic loci show conflicting signals. The number of DNA sequence characters required to obtain robust support for all the nodes in a phylogeny becomes greater with denser taxon sampling. We test the usefulness of an approach mixing densely sampled, variable non-coding sequences (trnL-F; rpl16; atpB-rbcL; ITS) with sparsely sampled, more conservative protein coding and ribosomal sequences (matK; ndhF; rbcL; 26S), for the grass subfamily Danthonioideae. Previous phylogenetic studies of Danthonioideae revealed extensive generic paraphyly, but were often impeded by insufficient character and taxon sampling and apparent inter-gene conflict. Our variably-sampled supermatrix approach allowed us to represent 79% of the species with up to c. 9900 base pairs for taxa representing the major clades. A 'taxon duplication' approach for taxa with conflicting phylogenetic signals allowed us to combine the data whilst representing the differences between chloroplast and nuclear encoded gene trees. This approach efficiently improves resolution and support whilst maximising representation of taxa and their sometimes composite evolutionary histories, resulting in a phylogeny of the Danthonioideae that will be useful both for a wide range of evolutionary studies and to inform forthcoming realignment of generic delimitations in the subfamily.

19.
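Chloroplast DNA trnL intron sequences were used to analyze the phylogenetic relationships among the families of Santalales. Among the Santalales individuals sampled, the length of the trnL intron differed considerably between families (from 291 bp to 587 bp). The strict consensus tree produced by maximum parsimony analysis was broadly consistent with previously published molecular systematic studies of Santalales based on other genes. The genus Schoepfia (香芙木属; Olacaceae) was the earliest-diverging group. Loranthaceae and Viscaceae were each monophyletic, whereas Santalaceae was paraphyletic. Loranthaceae and Viscaceae were not closely related, and Viscaceae was derived from within Santalaceae. This study shows that chloroplast DNA trnL intron sequences, which have a relatively high nucleotide substitution rate, can provide more informative sites for studying phylogenetic relationships among higher-level taxa.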

20.
The effects of temporal (among different branches of a phylogeny) and spatial (among different nucleotide sites within a gene) nonuniformities of nucleotide substitution rates on the construction of phylogenetic trees from nucleotide sequences are addressed. Spatial nonuniformity may be estimated by using Shannon's (1948) entropy formula to measure the Relative Nucleotide Variability (RNV) at each nucleotide site in an aligned set of sequences; this is demonstrated by a comparative analysis of 5S rRNAs. New methods of constructing phylogenetic trees are proposed that augment the Unweighted Pair-Group Using Arithmetic Averages (UPGMA) algorithm by estimating and compensating for both spatial and temporal nonuniformity in substitution rates. These methods are evaluated by computer simulations of 5S rRNA evolution that include both kinds of nonuniformities. It was found that the proposed Reference Ratio Method improved both the ability to reconstruct the correct topology of a tree and also the estimation of branch lengths as compared to UPGMA. A previous method (Farris et al. 1970; Klotz et al. 1979; Li 1981) was found to be less successful in reconstructing topologies when there is high probability of multiple mutations at some sites. Phylogenetic analyses of 5S rRNA sequences support the endosymbiotic origins of both chloroplasts and mitochondria, even though the latter exhibit an accelerated rate of nucleotide substitution. Phylogenetic trees also reveal an adaptive radiation within the eubacteria and another within the eukaryotes for the origins of most major phyla within each group during the Precambrian era.
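
The per-site use of Shannon's entropy mentioned above can be sketched as follows. The abstract does not give the exact definition or normalization of RNV, so this example computes plain Shannon entropy in bits for each alignment column. The function name `site_entropies`, the toy alignment, and the choice to treat any gap character as just another symbol are assumptions.

```python
from collections import Counter
from math import log2


def site_entropies(alignment):
    """Shannon entropy (bits) of each column in a list of aligned sequences."""
    entropies = []
    for column in zip(*alignment):
        counts = Counter(column)
        n = len(column)
        # H = -sum(p * log2(p)) over the symbols observed at this site.
        entropies.append(-sum((c / n) * log2(c / n) for c in counts.values()))
    return entropies


# Toy alignment of four equal-length sequences (made up for illustration).
alignment = ["ACGT", "ACGA", "ACTC", "AGTG"]
entropies = site_entropies(alignment)
```

An invariant column scores 0 bits, and a column in which all four nucleotides are equally frequent scores 2 bits, so higher values flag the more variable sites.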
