Similar literature
 20 similar articles found (search time: 546 ms)
1.
Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.
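
The concatenation step and the recommended per-gene support count described in this abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' pipeline: the names `concatenate` and `genes_supporting`, the dictionary representation of an alignment, the gap padding for missing taxa, and the representation of each gene tree as a set of clades are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set

Alignment = Dict[str, str]   # taxon name -> aligned sequence (assumed layout)
Clade = FrozenSet[str]       # a clade is the set of taxa it contains


def concatenate(genes: List[Alignment]) -> Alignment:
    """Join per-gene alignments into one super-gene alignment.

    Taxa absent from a gene are padded with '-' (an assumption of this sketch).
    """
    taxa = sorted(set().union(*(g.keys() for g in genes)))
    supermatrix = {taxon: "" for taxon in taxa}
    for gene in genes:
        length = len(next(iter(gene.values())))  # all rows share one length
        for taxon in taxa:
            supermatrix[taxon] += gene.get(taxon, "-" * length)
    return supermatrix


def genes_supporting(clade: Clade, gene_tree_clades: List[Set[Clade]]) -> int:
    """Count the individual gene trees that contain the given clade."""
    return sum(1 for clades in gene_tree_clades if clade in clades)
```

Reporting `genes_supporting(...)` next to the bootstrap value for each clade of the concatenated tree is one simple way to follow the abstract's closing recommendation.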

2.
A new method is presented for inferring evolutionary trees using nucleotide sequence data. The birth-death process is used as a model of speciation and extinction to specify the prior distribution of phylogenies and branching times. Nucleotide substitution is modeled by a continuous-time Markov process. Parameters of the branching model and the substitution model are estimated by maximum likelihood. The posterior probabilities of different phylogenies are calculated and the phylogeny with the highest posterior probability is chosen as the best estimate of the evolutionary relationship among species. We refer to this as the maximum posterior probability (MAP) tree. The posterior probability provides a natural measure of the reliability of the estimated phylogeny. Two example data sets are analyzed to infer the phylogenetic relationship of human, chimpanzee, gorilla, and orangutan. The best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions. The results of the method are found to be insensitive to changes in the rate parameter of the branching process. Correspondence to: Z. Yang
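
The posterior probability of a phylogeny mentioned in this abstract follows the general form of Bayes' theorem. The notation below is an assumption made for illustration and is not taken from the paper: $X$ is the sequence data, $\tau_i$ is the $i$-th candidate tree, $P(\tau_i)$ is its prior (here derived from the birth-death process), and $P(X \mid \tau_i)$ is the likelihood under the substitution model, with branching times integrated out.

```latex
P(\tau_i \mid X) \;=\; \frac{P(X \mid \tau_i)\, P(\tau_i)}{\sum_{j} P(X \mid \tau_j)\, P(\tau_j)},
\qquad
\hat{\tau}_{\mathrm{MAP}} \;=\; \arg\max_{\tau_i} \, P(\tau_i \mid X)
```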

3.
Phylogenetic relationships of mushrooms and their relatives within the order Agaricales were addressed by using nuclear large subunit ribosomal DNA sequences. Approximately 900 bases of the 5' end of the nucleus-encoded large subunit RNA gene were sequenced for 154 selected taxa representing most families within the Agaricales. Several phylogenetic methods were used, including weighted and equally weighted parsimony (MP), maximum likelihood (ML), and distance methods (NJ). The starting tree for branch swapping in the ML analyses was the tree with the highest ML score among previously produced MP and NJ trees. A high degree of consensus was observed between phylogenetic estimates obtained through MP and ML. NJ trees differed according to the distance model that was used; however, all NJ trees still supported most of the same terminal groupings as the MP and ML trees did. NJ trees were always significantly suboptimal when evaluated against the best MP and ML trees, by both parsimony and likelihood tests. Our analyses suggest that weighted MP and ML provide the best estimates of Agaricales phylogeny. Similar support was observed between bootstrapping and jackknifing methods for evaluation of tree robustness. Phylogenetic analyses revealed many groups of agaricoid fungi that are supported by moderate to high bootstrap or jackknife values or are consistent with morphology-based classification schemes. Analyses also support separate placement of the boletes and russules, which are basal to the main core group of gilled mushrooms (the Agaricineae of Singer). 
Examples of monophyletic groups include the families Amanitaceae, Coprinaceae (excluding Coprinus comatus and subfamily Panaeolideae), Agaricaceae (excluding the Cystodermateae), and Strophariaceae pro parte (Stropharia, Pholiota, and Hypholoma); the mycorrhizal species of Tricholoma (including Leucopaxillus, also mycorrhizal); Mycena and Resinomycena; Termitomyces, Podabrella, and Lyophyllum; and Pleurotus with Hohenbuehelia. Several groups revealed by these data to be nonmonophyletic include the families Tricholomataceae, Cortinariaceae, and Hygrophoraceae and the genera Clitocybe, Omphalina, and Marasmius. This study provides a framework for future systematics studies in the Agaricales and suggestions for analyzing large molecular data sets.

4.
In phylogenetic analyses with combined multigene or multiprotein data sets, accounting for differing evolutionary dynamics at different loci is essential for accurate tree prediction. Existing maximum likelihood (ML) and Bayesian approaches are computationally intensive. We present an alternative approach that is orders of magnitude faster. The method, Distance Rates (DistR), estimates rates based upon distances derived from gene/protein sequence data. Simulation studies indicate that this technique is accurate compared with other methods and robust to missing sequence data. The DistR method was applied to a fungal mitochondrial data set, and the rate estimates compared well to those obtained using existing ML and Bayesian approaches. Inclusion of the protein rates estimated from the DistR method into the ML calculation of trees as a branch length multiplier resulted in a significantly improved fit as measured by the Akaike Information Criterion (AIC). Furthermore, bootstrap support for the ML topology was significantly greater when protein rates were used, and some evident errors in the concatenated ML tree topology (i.e., without protein rates) were corrected. [Bayesian credible intervals; DistR method; multigene phylogeny; PHYML; rate heterogeneity.]
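
The model comparison by AIC mentioned in this abstract rests on the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; a lower value indicates a better trade-off between fit and complexity. The short sketch below only illustrates that formula. The function name `aic` and the numbers in the usage note are hypothetical and are not results from the study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def prefers_second(aic_first: float, aic_second: float) -> bool:
    """True when the second model has the lower (better) AIC."""
    return aic_second < aic_first
```

For instance, with made-up values, a model with ln L = -1000 and 10 parameters gives an AIC of 2020, while a model with ln L = -990 and 12 parameters gives 2004, so the second model would be preferred despite its two extra parameters.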

5.
This study is a phylogenetic analysis of the avian family Ciconiidae, the storks, based on two molecular data sets: 1065 base pairs of sequence from the mitochondrial cytochrome b gene and a complete matrix of single-copy nuclear DNA–DNA hybridization distances. Sixteen of the nineteen stork species were included in the cytochrome b data matrix, and fifteen in the DNA–DNA hybridization matrix. Both matrices included outgroups from the families Cathartidae (New World vultures) and Threskiornithidae (ibises, spoonbills). Optimal trees based on the two data sets were congruent in those nodes with strong bootstrap support. In the best-fit tree based on DNA–DNA hybridization distances, nodes defining relationships among very recently diverged species had low bootstrap support, while nodes defining more distant relationships had strong bootstrap support. In the optimal trees based on the sequence data, nodes defining relationships among recently diverged species had strong bootstrap support, while nodes defining basal relationships in the family had weak support and were incongruent among analyses. A combinable-component consensus of the best-fit DNA–DNA hybridization tree and a consensus tree based on different analyses of the cytochrome b sequences provide the best estimate of relationships among stork species based on the two data sets.

6.
Comparative restriction site mapping of the chloroplast genome was performed to examine phylogenetic relationships among 27 species representing 16 genera of the Berberidaceae and two outgroups. Chloroplast genomes of the species included in this study showed no major structural rearrangements (i.e., they are collinear to tobacco cpDNA) except for the extension of the inverted repeat in species of Berberis and Mahonia. Excluding several regions that exhibited severe length variation, a total of 501 phylogenetically informative sites was mapped for ten restriction enzymes. The strict consensus tree of 14 equally parsimonious trees indicated that some berberidaceous genera (Berberis, Mahonia, Diphylleia) are not monophyletic. To explore phylogenetic utility of different parsimony methods phylogenetic trees were generated using Wagner, Dollo, and weighted parsimony for a reduced data set that included 18 species. One of the most significant results was the recognition of the four chromosomal groups, which were strongly supported regardless of the parsimony method used. The most notable difference among the trees produced by the three parsimony methods was the relationships among the four chromosomal groups. The cpDNA trees also strongly supported a close relationship of several generic pairs (e.g., Berberis-Mahonia, Epimedium-Vancouveria, etc.). Maximum likelihood values were computed for the four different tree topologies of the chromosomal groups, two Wagner, one Dollo, and one weighted topology. The results indicate that the weighted tree has the highest likelihood value. The lowest likelihood value was obtained for the Dollo tree, which had the highest bootstrap and decay values. Separate analyses using only the Inverted Repeat (IR) region resulted in a tree that is identical to the weighted tree. 
Poor resolution and/or support for the relationships among the four chromosomal lineages of the Berberidaceae indicate that they may have radiated from an ancestral stock in a relatively short evolutionary time.

7.
Several data partitions, including nuclear and mitochondrial gene sequences, chromosomes, isoenzymes, and morphological characters, were used to propose a new phylogeny and to test previously published hypotheses about the phylogenetic positions of basal clades of the lizard genus Sceloporus and the relationship of Sceloporus to the former genus "Sator". In accord with earlier studies, our results grouped "Sator" as internal to Sceloporus, and both support a hypothesis of transgulfian vicariance for the origin of the former genus "Sator" on islands in the Sea of Cortez. Robustness of support for internal nodes in our best tree was established through widely used indices (bootstrap proportions, decay values) but also through congruence among independent data partitions. Several deep nodes in the tree recovered by several methods, including equally weighted and differentially weighted parsimony and maximum likelihood models, are only weakly supported by the traditional indices. This methodological concordance is taken as evidence for insensitivity of the deep structure of the topology to alternative assumptions.

8.
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.

9.
Martin FN  Tooley PW 《Mycologia》2003,95(2):269-284
The phylogenetic relationships of 51 isolates representing 27 species of Phytophthora were assessed by sequence alignment of 568 bp of the mitochondrially encoded cytochrome oxidase II gene. A total of 1299 bp of the cytochrome oxidase I gene also were examined for a subset of 13 species. The cox II gene trees constructed by a heuristic search, based on maximum parsimony for a bootstrap 50% majority-rule consensus tree, revealed 18 species grouping into seven clades and nine species unaffiliated with a specific clade. The phylogenetic relationships among species observed on cox II gene trees did not exhibit consistent similarities in groupings for morphology, pathogenicity, host range or temperature optima. The topology of cox I gene trees, constructed by a heuristic search based on maximum parsimony for a bootstrap 50% majority-rule consensus tree for 13 species of Phytophthora, revealed 10 species grouping into three clades and three species unaffiliated with a specific clade. The groupings in general agreed with what was observed in the cox II tree. Species relationships observed for the cox II gene tree were in agreement with those based on ITS regions, with several notable exceptions. Some of these differences were noted in species in which the same isolates were used for both ITS and cox II analysis, suggesting either a differential rate of evolutionary divergence for these two regions or incorrect assumptions about alignment of ITS sequences. Analysis of combined data sets of ITS and cox II sequences generated a tree that did not differ substantially from analysis of ITS data alone; however, the results of a partition homogeneity test suggest that combining data sets may not be valid.

10.
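Yi Zhenzhen  Chen Zigui  Gao Shan  Song Weibo 《Acta Zoologica Sinica》2007,53(6):1031-1040
Using small subunit ribosomal RNA (SS rRNA) gene sequences from 36 species of ciliates belonging to higher spirotrich groups, we compared how different conditions affect the construction of molecular phylogenetic trees for ciliates. The conditions examined were the choice of outgroup and ingroup taxa, combinations of different sequence lengths of the same gene, and the use of different tree-building methods and analysis software. The results show that all of these factors can affect tree topology to varying degrees. They also suggest that, when working with limited data, and especially when analyzing the relationships of poorly known groups, the influence of differing tree-building conditions must be fully taken into account. The authors further recommend that, given that the molecular information currently available is insufficient, molecular systematic studies of any ciliate group should draw conclusions cautiously and, as far as possible, combine and consult morphological, ontogenetic, and other evidence; this remains the approach to be given priority.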

11.
We studied the factors affecting the accuracy of the neighbor-joining (NJ) method for estimating phylogenies by simulating character change under different evolutionary models applied to twenty different 8-OTU tree topologies that varied widely with respect to tree imbalance and stemminess. The models incorporated three evolutionary rates (constant, varying among lineages, and varying among characters) and three evolutionary contexts concerning patterns of character change relative to speciation events (phyletic, speciational, and punctuational). All combinations of the rate and context models were studied. In addition, three different absolute rates of change were investigated. To measure the accuracy, the strict consensus index was computed between the estimated tree and the tree topology along which the data had been generated. The results were analyzed by analysis of variance and compared to a previous study that evaluated UPGMA clustering and maximum parsimony (MP) as phylogenetic estimation techniques. We found evolutionary context and tree imbalance to be the most important factors affecting the accuracy of the NJ method. NJ was more accurate than UPGMA or MP in terms of the average strict consensus index over all treatments. However, no one method was more accurate than the other two for all combinations of treatments. Higher absolute rate of change generally resulted in higher accuracy for all three methods.

12.
Elongation factor 1 alpha (EF-1 alpha) is a highly conserved ubiquitous protein involved in translation that has been suggested to have desirable properties for phylogenetic inference. To examine the utility of EF-1 alpha as a phylogenetic marker for eukaryotes, we studied three properties of EF-1 alpha trees: congruency with other phylogenetic markers, the impact of species sampling, and the degree of substitutional saturation occurring between taxa. Our analyses indicate that the EF-1 alpha tree is congruent with some other molecular phylogenies in identifying both the deepest branches and some recent relationships in the eukaryotic line of descent. However, the topology of the intermediate portion of the EF-1 alpha tree, occupied by most of the protist lineages, differs for different phylogenetic methods, and bootstrap values for branches are low. Most problematic in this region is the failure of all phylogenetic methods to resolve the monophyly of two higher-order protistan taxa, the Ciliophora and the Alveolata. JACKMONO analyses indicated that the impact of species sampling on bootstrap support for most internal nodes of the eukaryotic EF-1 alpha tree is extreme. Furthermore, a comparison of observed versus inferred numbers of substitutions indicates that multiple overlapping substitutions have occurred, especially on the branch separating the Eukaryota from the Archaebacteria, suggesting that the rooting of the eukaryotic tree on the diplomonad lineage should be treated with caution. Overall, these results suggest that the phylogenies obtained from EF-1 alpha are congruent with other molecular phylogenies in recovering the monophyly of groups such as the Metazoa, Fungi, Magnoliophyta, and Euglenozoa. However, the interrelationships between these and other protist lineages are not well resolved. 
This lack of resolution may result from the combined effects of poor taxonomic sampling, relatively few informative positions, large numbers of overlapping substitutions that obscure phylogenetic signal, and lineage-specific rate increases in the EF-1 alpha data set. It is also consistent with the nearly simultaneous diversification of major eukaryotic lineages implied by the "big-bang" hypothesis of eukaryote evolution.

13.
The plastid-bearing members of the Cryptophyta contain two functional eukaryotic genomes of different phylogenetic origin, residing in the nucleus and in the nucleomorph, respectively. These widespread and diverse protists thus offer a unique opportunity to study the coevolution of two different eukaryotic genomes within one group of organisms. In this study, the SSU rRNA genes of both genomes were PCR-amplified with specific primers and phylogenetic analyses were performed on different data sets using different evolutionary models. The results show that the composition of the principal clades obtained from the phylogenetic analyses of both genes was largely congruent, but striking differences in evolutionary rates were observed. These affected the topologies of the nuclear and nucleomorph phylogenies differently, resulting in long-branch attraction artifacts when simple evolutionary models were applied. Deletion of long-branch taxa stabilized the internal branching order in both phylogenies and resulted in a completely resolved topology in the nucleomorph phylogeny. A comparison of the tree topologies derived from SSU rDNA sequences with characters previously used in cryptophyte systematics revealed that the biliprotein type was congruent, but the type of inner periplast component incongruent, with the molecular trees. The latter is indicative of a hidden cellular dimorphism (cells with two periplast types present in a single clonal strain) of presumably widespread occurrence throughout cryptophyte diversity, which, in consequence, has far-reaching implications for cryptophyte systematics as it is practiced today.

14.
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods. [Reviewing Editor: Dr. Nicolas Galtier]

15.
Nye TM 《Systematic biology》2008,57(5):785-794
Phylogenetic analysis very commonly produces several alternative trees for a given fixed set of taxa. For example, different sets of orthologous genes may be analyzed, or the analysis may sample from a distribution of probable trees. This article describes an approach to comparing and visualizing multiple alternative phylogenies via the idea of a "tree of trees" or "meta-tree." A meta-tree clusters phylogenies with similar topologies together in the same way that a phylogeny clusters species with similar DNA sequences. Leaf nodes on a meta-tree correspond to the original set of phylogenies given by some analysis, whereas interior nodes correspond to certain consensus topologies. The construction of meta-trees is motivated by analogy with construction of a most parsimonious tree for DNA data, but instead of using DNA letters, in a meta-tree the characters are partitions or splits of the set of taxa. An efficient algorithm for meta-tree construction is described that makes use of a known relationship between the majority consensus and parsimony in terms of gain and loss of splits. To illustrate these ideas meta-trees are constructed for two datasets: a set of gene trees for species of yeast and trees from a bootstrap analysis of a set of gene trees in ray-finned fish. A software tool for constructing meta-trees and comparing alternative phylogenies is available online, and the source code can be obtained from the author.
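
The idea of treating splits as the characters of a tree can be made concrete with a small sketch. It is not the meta-tree algorithm from the article. It only shows two standard building blocks that the abstract alludes to: the majority-rule consensus (splits present in more than half of the trees) and a split-based distance between two trees (the size of the symmetric difference of their split sets). The names `majority_splits` and `split_distance`, and the encoding of a split as the frozen set of taxa on one fixed side, are assumptions for illustration.

```python
from collections import Counter
from typing import FrozenSet, List, Set

Split = FrozenSet[str]  # taxa on one side of a split (assumed normalized by the caller)


def majority_splits(trees: List[Set[Split]]) -> Set[Split]:
    """Splits that occur in strictly more than half of the input trees."""
    counts = Counter(split for tree in trees for split in tree)
    return {split for split, n in counts.items() if n > len(trees) / 2}


def split_distance(tree_a: Set[Split], tree_b: Set[Split]) -> int:
    """Number of splits found in exactly one of the two trees."""
    return len(tree_a ^ tree_b)
```

Because every input tree is reduced to a set of splits, gaining or losing a split between neighbouring nodes plays the role that a nucleotide change plays in an ordinary parsimony analysis.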

16.
Relationships among the five groups of extant seed plants (cycads, Ginkgo, conifers, Gnetales, and angiosperms) remain uncertain. To explore relationships among groups of extant seed plants further and to attempt to explain the conflict among molecular data sets, we assembled a data set of four plastid (cpDNA) genes (rbcL, atpB, psaA, and psbB), three mitochondrial (mtDNA) genes (mtSSU, coxI, and atpA), and one nuclear gene (18S rDNA) for 19 exemplars representing the five groups of living seed plants. Analyses of the combined eight-gene data set (15,772 base pairs/taxon) with maximum parsimony (MP), maximum likelihood (ML), and Bayesian approaches reveal a gymnosperm clade that is sister to angiosperms. Within the gymnosperms, a conifer clade includes Gnetales as sister to Pinaceae. Cycads and Ginkgo are either successive sisters to this conifer clade (including Gnetales) or a clade that is sister to conifers and Gnetales. All analyses of the mtDNA partition and ML analyses of the nuclear partition yield very similar topologies. However, MP analyses of the combined cpDNA genes place Gnetales as sister to all other seed plants with strong bootstrap support, whereas ML and Bayesian analyses of the cpDNA data set place Gnetales as sister to Pinaceae. Maximum parsimony and ML analyses of first and second codon positions of the cpDNA partition also place Gnetales as sister to Pinaceae. In contrast, MP analyses of third codon positions place Gnetales as sister to other seed plants, although ML analyses of third codon positions place Gnetales with Pinaceae. Thus, most of the discrepancies in seed plant topologies involve third codon positions of cpDNA genes. The likelihood ratio (LR) and Shimodaira-Hasegawa (SH) tests were applied to the cpDNA data. The preferred topology based on the LR test is that Gnetales are sister to Pseudotsuga. 
The SH test based on first and second codon and all three codon positions indicated that there is no significant difference between the best topology (Gnetales sister to Pseudotsuga) and Gnetales sister to a conifer clade. However, there is a significant difference between the best topology and topologies in which Gnetales are sister to the rest of the seed plants or Gnetales sister to angiosperms.

17.
The phylogenetic relationships of multiple enterobacterial species were reconstructed based on 16S rDNA gene sequences to evaluate the robustness of this housekeeping gene in the taxonomic placement of the enteric plant pathogens Erwinia, Brenneria, Pectobacterium, and Pantoea. Four data sets were compiled, two of which consisted of previously published data. The data sets were designed in order to evaluate how 16S rDNA gene phylogenies are affected by the use of different plant pathogen accessions and varying numbers of animal pathogen and outgroup sequences. DNA data matrices were analyzed using maximum likelihood (ML) algorithms, and character support was determined by ML bootstrap and Bayesian analyses. As additional animal pathogen sequences were added to the phylogenetic analyses, taxon placement changed. Further, the phylogenies varied in their placement of the plant pathogen species, and only the genus Pantoea was monophyletic in all four trees. Finally, bootstrap and Bayesian support values were low for most of the nodes, and all nonterminal branches collapsed in strict consensus trees. Inspection of 16S rDNA nucleotide alignments revealed several highly variable blocks punctuated by regions of conserved sequence. These data suggest that 16S rDNA, while effective for both species-level and family-level phylogenetic reconstruction, may underperform for genus-level phylogenetic analyses in the Enterobacteriaceae.

18.
Revived interest in molluscan phylogeny has resulted in a torrent of molecular sequence data from phylogenetic, mitogenomic, and phylogenomic studies. Despite recent progress, basal relationships of the class Bivalvia remain contentious, owing to conflicting morphological and molecular hypotheses. Marked incongruity of phylogenetic signal in datasets heavily represented by nuclear ribosomal genes versus mitochondrial genes has also impeded consensus on the type of molecular data best suited for investigating bivalve relationships. To arbitrate conflicting phylogenetic hypotheses, we evaluated the utility of four nuclear protein-encoding genes (ATP synthase β, elongation factor-1α, myosin heavy chain type II, and RNA polymerase II) for resolving the basal relationships of Bivalvia. We sampled all five major lineages of bivalves (Archiheterodonta, Euheterodonta [including Anomalodesmata], Palaeoheterodonta, Protobranchia, and Pteriomorphia) and inferred relationships using maximum likelihood and Bayesian approaches. To investigate the robustness of the phylogenetic signal embedded in the data, we implemented additional datasets wherein length variability and/or third codon positions were eliminated. Results obtained include (a) the clade (Nuculanida+Opponobranchia), i.e., the traditionally defined Protobranchia; (b) the monophyly of Pteriomorphia; (c) the clade (Archiheterodonta+Palaeoheterodonta); (d) the monophyly of the traditionally defined Euheterodonta (including Anomalodesmata); and (e) the monophyly of Heteroconchia, i.e., (Palaeoheterodonta+Archiheterodonta+Euheterodonta). The stability of the basal tree topology to dataset manipulation is indicative of signal robustness in these four genes. The inferred tree topology corresponds closely to those obtained by datasets dominated by nuclear ribosomal genes (18S rRNA and 28S rRNA), controverting recent taxonomic actions based solely upon mitochondrial gene phylogenies.

19.
Interior-branch and bootstrap tests of phylogenetic trees   Total citations: 19 (self-citations: 3; citations by others: 16)
We have compared statistical properties of the interior-branch and bootstrap tests of phylogenetic trees when the neighbor-joining tree-building method is used. For each interior branch of a predetermined topology, the interior-branch and bootstrap tests provide the confidence values, PC and PB, respectively, that indicate the extent of statistical support of the sequence cluster generated by the branch. In phylogenetic analysis these two values are often interpreted in the same way, and if PC and PB are high (say, ≥ 0.95), the sequence cluster is regarded as reliable. We have shown that PC is in fact the complement of the P-value used in the standard statistical test, but PB is not. Actually, the bootstrap test usually underestimates the extent of statistical support of species clusters. The relationship between the confidence values obtained by the two tests varies with both the topology and expected branch lengths of the true (model) tree. The most conspicuous difference between PC and PB is observed when the true tree is starlike, and there is a tendency for the difference to increase as the number of sequences in the tree increases. The reason for this is that the bootstrap test tends to become progressively more conservative as the number of sequences in the tree increases. Unlike the bootstrap, the interior-branch test has the same statistical properties irrespective of the number of sequences used when a predetermined tree is considered. Therefore, the interior-branch test appears to be preferable to the bootstrap test as long as unbiased estimators of evolutionary distances are used. However, when the interior-branch test is applied to a tree estimated from a given data set, PC may give an overestimate of statistical confidence. For this case, we developed a method for computing a modified version (P'C) of the PC value and showed that this P'C tends to give a conservative estimate of statistical confidence, though it is not as conservative as PB. 
In this paper we have introduced a model in which evolutionary distances between sequences follow a multivariate normal distribution. This model allowed us to study the relationships between the two tests analytically.
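
The bootstrap confidence value PB discussed in this abstract is, in the usual nonparametric bootstrap, the fraction of resampled alignments whose inferred tree contains the cluster of interest. The sketch below illustrates only that resampling logic and is not the authors' implementation: the function names, the dictionary layout of the alignment, and the `build_clades` callback (a stand-in for any tree-building method, such as neighbor joining, that returns the clusters of the inferred tree) are hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, Set

Alignment = Dict[str, str]   # taxon name -> aligned sequence (assumed layout)
Clade = FrozenSet[str]


def bootstrap_alignment(aln: Alignment, rng: random.Random) -> Alignment:
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(aln.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in aln.items()}


def bootstrap_proportion(
    aln: Alignment,
    clade: Clade,
    build_clades: Callable[[Alignment], Set[Clade]],  # hypothetical tree builder
    n_reps: int = 100,
    seed: int = 0,
) -> float:
    """Fraction of bootstrap replicates whose tree contains the given clade."""
    rng = random.Random(seed)
    hits = sum(
        clade in build_clades(bootstrap_alignment(aln, rng)) for _ in range(n_reps)
    )
    return hits / n_reps
```

The same column indices are applied to every sequence, so each replicate column is a whole column of the original alignment; this is what distinguishes site resampling from shuffling individual characters.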

20.
The nucleotide substitution matrix inferred from avian data sets using cytochrome b differs considerably from the models commonly used in phylogenetic analyses. To analyze the possible effects of this particular pattern of change in phylogeny estimation we performed a computer simulation in which we started with a real sequence and used the inferred model of change to produce a tree of 10 species. Maximum parsimony (MP), maximum likelihood (ML), and various distance methods were then used to recover the topology and the branch lengths. We used two kinds of data with varying levels of variation. In addition, we tested with the removal of third positions and different weighting schemes. At low levels of variation, MP was outstanding in recovering the topology (90% correct), while unweighted pair-group method, arithmetic average (UPGMA), regardless of distances used, was poor (40%). At the higher level, most methods had a chance of around 40%-58% of finding the true tree. However, in most cases, the trees found were only slightly wrong, with only one or a few branches misplaced. On the other hand, the use of a "wrong" model had serious effects on the estimation of branch lengths (distances). Although precision was high, accuracy was poor with most methods, giving branch lengths that were biased downward. When seeded with the true distance matrix, Fitch and NJ always found the true tree, while UPGMA frequently failed to do so. The effect of removing third positions was dramatic at low levels of variation, because only one MP program was able to find a true tree at all, albeit rarely, while none of the others ever did so. At higher levels, the situation was better, but still much worse than with the whole data set.
