Similar literature
Found 20 similar documents; search time: 15 ms
1.
Given a collection of discrete characters (e.g., aligned DNA sites, gene adjacencies), a common measure of distance between taxa is the proportion of characters for which taxa have different character states. Tree reconstruction based on these (uncorrected) distances can be statistically inconsistent and can lead to trees different from those obtained using character-based methods such as maximum likelihood or maximum parsimony. However, in these cases the distance data often reveal their unreliability by some deviation from additivity, as indicated by conflicting support for more than one tree. We describe two results that show how uncorrected (and miscorrected) distance data can be simultaneously perfectly additive and misleading. First, multistate character data can be perfectly compatible and define one tree, and yet the uncorrected distances derived from these characters are perfectly treelike (and obey a molecular clock), only for a completely different tree. Second, under a Markov model of character evolution a similar phenomenon can occur; not only is there statistical inconsistency using uncorrected distances, but there is no evidence of this inconsistency because the distances look perfectly treelike (this does not occur in the classic two-parameter Felsenstein zone). We characterize precisely when uncorrected distances are additive on the true (and on a false) tree for four taxa. We also extend this result to a more general setting that applies to distances corrected according to an incorrect model.
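The two quantities this abstract relies on, the uncorrected distance and additivity for four taxa, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy alignment, and the tolerance `tol` are assumptions made for illustration. Additivity is checked with the standard four-point condition (of the three pairwise sums, the two largest are equal).

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected distance: proportion of aligned characters that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differing = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differing / len(seq_a)


def distance_matrix(alignment):
    """Pairwise uncorrected distances keyed by frozenset({taxon_1, taxon_2})."""
    return {frozenset((s, t)): p_distance(alignment[s], alignment[t])
            for s, t in combinations(alignment, 2)}


def pair_sums(dist, taxa):
    """The three sums d(x,y) + d(z,w), one per possible four-taxon split."""
    a, b, c, d = taxa

    def d_(x, y):
        return dist[frozenset((x, y))]

    return {
        ((a, b), (c, d)): d_(a, b) + d_(c, d),
        ((a, c), (b, d)): d_(a, c) + d_(b, d),
        ((a, d), (b, c)): d_(a, d) + d_(b, c),
    }


def is_additive(dist, taxa, tol=1e-9):
    """Four-point condition: the two largest pair sums are equal (within tol)."""
    smallest, middle, largest = sorted(pair_sums(dist, taxa).values())
    return abs(largest - middle) <= tol


def supported_split(dist, taxa):
    """Split with the smallest pair sum, i.e. the one the distances favor."""
    sums = pair_sums(dist, taxa)
    return min(sums, key=sums.get)


# Hypothetical toy alignment: every site changes on exactly one branch of ((A,B),(C,D)).
toy_alignment = {"A": "TAAAAA", "B": "ATAAAA", "C": "AATTTA", "D": "AATTAT"}
toy_dist = distance_matrix(toy_alignment)
print(is_additive(toy_dist, ("A", "B", "C", "D")))
print(supported_split(toy_dist, ("A", "B", "C", "D")))
```

A check of this kind only reports whether the distances look treelike; as the abstract stresses, perfectly additive distances may still point to the wrong tree.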

2.
The effects on phylogenetic accuracy of adding characters and/or taxa were explored using data generated by computer simulation. The conditions of this study were constrained but allowed for systematic investigation of certain parameters. The starting point for the study was a four-taxon tree in the "Felsenstein zone," representing a difficult phylogenetic problem with an extreme situation of long branch attraction. Taxa were added sequentially to this tree in a manner specifically designed to break up the long branches, and for each tree data matrices of different sizes were simulated. Phylogenetic trees were reconstructed from these data using the criteria of parsimony and maximum likelihood. Phylogenetic accuracy was measured in three ways: (1) proportion of trees that are completely correct, (2) proportion of correctly reconstructed branches in all trees, and (3) proportion of trees in which the original four-taxon statement is correctly reconstructed. Accuracy improved dramatically with the addition of taxa and much more slowly with the addition of characters. If taxa can be added to break up long branches, adding taxa is preferable to adding characters.
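To make the "Felsenstein zone" concrete, the following Python sketch computes expected site-pattern probabilities on a four-taxon tree. It assumes a two-state symmetric model and hypothetical per-branch change probabilities `p_long` and `q_short`; these are illustrative assumptions, not the simulation settings of the study. The true tree is ((A,B),(C,D)), with A and C on long branches.

```python
from itertools import product


def edge_prob(change_prob, x, y):
    """Probability of ending in state y from state x along one branch."""
    return change_prob if x != y else 1.0 - change_prob


def pattern_probability(tips, p_long, q_short):
    """Probability of tip states (a, b, c, d) on the true tree ((A,B),(C,D)).

    A and C sit on long branches (p_long); B, D and the interior
    branch are short (q_short). Interior states u, v are summed out.
    """
    a, b, c, d = tips
    total = 0.0
    for u, v in product((0, 1), repeat=2):
        total += (0.5                           # uniform state at interior node u
                  * edge_prob(p_long, u, a)
                  * edge_prob(q_short, u, b)
                  * edge_prob(q_short, u, v)    # interior branch
                  * edge_prob(p_long, v, c)
                  * edge_prob(q_short, v, d))
    return total


def informative_pattern_probs(p_long, q_short):
    """Probability of each parsimony-informative pattern, labeled by the split it supports."""
    return {
        "AB|CD": 2 * pattern_probability((0, 0, 1, 1), p_long, q_short),
        "AC|BD": 2 * pattern_probability((0, 1, 0, 1), p_long, q_short),
        "AD|BC": 2 * pattern_probability((0, 1, 1, 0), p_long, q_short),
    }


zone = informative_pattern_probs(p_long=0.4, q_short=0.05)
balanced = informative_pattern_probs(p_long=0.1, q_short=0.1)
print(zone)
print(balanced)
```

With the assumed values `p_long=0.4` and `q_short=0.05`, the pattern supporting the false grouping AC|BD is expected more often (about 0.14) than the pattern supporting the true split AB|CD (about 0.04), so parsimony favors the wrong tree more strongly as characters are added. With equal branch lengths the true split is the most frequent informative pattern.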

3.
The inconsistency of the maximum parsimony method is known to occur even when the rate of nucleotide substitution is constant. To understand why this inconsistency occurs, a mathematical study was conducted for the cases of five, six, and seven sequences. The results obtained indicate that this inconsistency occurs because the probability of occurrence of nucleotide configurations generated by one substitution on a short interior branch is often lower than that of configurations generated by more substitutions on other longer branches. The chance of occurrence of this event (that is, the inconsistency of the maximum parsimony method) apparently increases as the number of sequences increases. The inconsistency may occur even when the extent of sequence divergence is relatively small. Correspondence to: M. Nei

4.
Although long-branch attraction (LBA) is frequently cited as the cause of anomalous phylogenetic groupings, few examples of LBA involving real sequence data are known. We have found several cases of probable LBA by analyzing subsamples from an alignment of 18S rDNA sequences for 133 metazoans. In one example, maximum parsimony analysis of sequences from two rotifers, a ctenophore, and a polychaete annelid resulted in strong support for a tree grouping two "long-branch taxa" (a rotifer and the ctenophore). Maximum-likelihood analysis of the same sequences yielded strong support for a more biologically reasonable "rotifer monophyly" tree. Attempts to break up long branches for problematic subsamples through increased taxon sampling reduced, but did not eliminate, LBA problems. Exhaustive analyses of all quartets for a subset of 50 sequences were performed in order to compare the performance of maximum likelihood, equal-weights parsimony, and two additional variants of parsimony; these methods do differ substantially in their rates of failure to recover trees consistent with well-established, but highly unresolved, phylogenies. Power analyses using simulations suggest that some incorrect inferences by maximum parsimony are due to statistical inconsistency and that when estimates of central branch lengths for certain quartets are very low, maximum-likelihood analyses have difficulty recovering accepted phylogenies even with large amounts of data. These examples demonstrate that LBA problems can occur in real data sets, and they provide an opportunity to investigate causes of incorrect inferences.

5.
Taxon sampling may be critically important for phylogenetic accuracy because adding taxa can help to subdivide misleading long branches. Although the idea that added taxa can break up long branches was exemplified by a study of "incomplete" fossil taxa, the issue of taxon completeness (i.e., proportion of missing data) has been largely ignored in most subsequent discussions of taxon sampling and long-branch attraction. In this article, I use simulations to test the ability of incomplete taxa to subdivide long branches and improve phylogenetic accuracy in situations of potential long-branch attraction. The results show that for most methods and conditions examined, adding taxa that are only 50% complete may provide similar benefits to adding the same number of complete taxa (suggesting that the advantages of increased taxon sampling may be obtained with less data than previously considered). For parsimony, taxa that are less complete (5% to 25% complete) may often have limited ability to rescue analyses from long-branch attraction. In contrast, highly incomplete taxa can be surprisingly beneficial when using model-based methods. The results also suggest the importance of model-based methods in phylogenetic analyses that combine molecular and fossil data.

6.
Model‐based approaches (e.g. maximum likelihood, Bayesian inference) are widely used with molecular data, where they might be more appropriate than maximum parsimony for estimating phylogenies under various models of molecular evolution. Recently, there has been an increase in the application of model‐based approaches with morphological (mainly fossil) data; however, there is some doubt as to the effectiveness of the model of morphological evolution. The input parameters (prior probabilities) for the model are unclear, particularly when concerned with unobserved character states. Despite this, some systematists are suggesting superiority of these model‐based methods over maximum parsimony based on, for example, increased resolution or, in the current study, the preferred phylogenetic placement of an iconic taxon. Here, we revisit a recently published analysis implying such superiority and document the discrepancies between parsimony‐based and model‐based approaches to phylogeny estimation. We find that although some taxa are shifted back to their “traditional” phylogenetic placement, other clades are disturbed. The model‐based phylogenies are better resolved; however, due to the lack of an appropriate model of morphological evolution, the increase in resolving power is probably not meaningful. Similarly, some of the preferred phylogenetic positions of taxa, particularly of labile taxa such as Archaeopteryx, are based solely on analyses employing maximum parsimony as the optimality criterion. Poor resolution and labile taxa indicate a need for further examination of the morphology and not a change in method.

7.
We newly sequenced the nuclear-encoded small subunit (SSU) rDNA coding region for 21 taxa of the genus Closterium. The new sequences were integrated into an alignment with 13 known sequences of conjugating green algae representing six traditional families (i.e. Zygnemataceae, Mesotaeniaceae, Gonatozygaceae, Peniaceae, Closteriaceae, and Desmidiaceae) and five known charophycean sequences as outgroups. Both maximum likelihood and maximum parsimony analyses supported with high bootstrap values one large clade containing all placoderm desmids (Desmidiales). All the Closterium taxa formed one clade with 100% bootstrap support, indicating their monophyly rather than the paraphyly suggested earlier. As to the taxa within the genus Closterium, we found two clades of morphologically closely related taxa in both maximum likelihood and maximum parsimony trees. They corresponded to the C. calosporum species complex and the C. moniliferum-ehrenbergii species complex. It is of particular interest that the homothallic entity of C. moniliferum v. moniliferum was distinguished from and ancestral to all other entities of the C. moniliferum-ehrenbergii species complex. Superimposing all 50 charophycean sequences on the higher order SSU rRNA structure model of Closterium, we investigated degrees of nucleotide conservation at a given position in the nucleotide sequence. A characteristic "signature" structure to the genus Closterium was found as an additional helix at the tip of V1 region. In addition, eight base deletions at the tip of helix 10 were found to be characteristic of the C. calosporum species complex, C. gracile, C. incurvum, C. pleurodermatum, and C. pusillum v. maius. These taxa formed one clade with an 82% bootstrap value in maximum parsimony analysis.

8.
In this paper we investigate mathematical questions concerning the reliability (reconstruction accuracy) of Fitch's maximum parsimony algorithm for reconstructing the ancestral state given a phylogenetic tree and a character. In particular, we consider the question whether the maximum parsimony method applied to a subset of taxa can reconstruct the ancestral state of the root more accurately than when applied to all taxa, and we give an example showing that this indeed is possible. A surprising feature of our example is that ignoring a taxon closer to the root improves the reliability of the method. On the other hand, in the case of the two-state symmetric substitution model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that under a molecular clock the probability that the state at a single taxon is a correct guess of the ancestral state is a lower bound on the reconstruction accuracy of Fitch's method applied to all taxa.
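For readers unfamiliar with Fitch's algorithm, its bottom-up pass can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the tree encoding (nested 2-tuples of tip names), the function name, and the example tip states are assumptions made here. At each interior node the state set is the intersection of the children's sets if that is non-empty, otherwise their union, which costs one change.

```python
def fitch_root_states(tree, tip_states):
    """Bottom-up pass of Fitch's algorithm on a rooted binary tree.

    tree: a tip name (str) or a 2-tuple of subtrees (hypothetical encoding).
    tip_states: dict mapping tip name -> observed character state.
    Returns (state set at the root of `tree`, minimum number of changes).
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_root_states(left, tip_states)
    right_set, right_cost = fitch_root_states(right, tip_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: keep all candidate states and count one change.
    return left_set | right_set, left_cost + right_cost + 1


example_tree = ((("t1", "t2"), "t3"), "t4")
example_states = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
root_set, changes = fitch_root_states(example_tree, example_states)
print(root_set, changes)
```

When the root set contains a single state, that state is the parsimony reconstruction; when it contains several, the reconstruction is ambiguous. The reconstruction accuracy studied in the abstract concerns how often this set identifies the true ancestral state.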

9.
The method of evolutionary parsimony (or operator invariants) is a technique of nucleic acid sequence analysis related to parsimony analysis and explicitly designed for determining evolutionary relationships among four distantly related taxa. The method is independent of substitution rates because it is derived from consideration of the group properties of substitution operators rather than from an analysis of the probabilities of substitution in branches of a tree. In both parsimony and evolutionary parsimony, three patterns of nucleotide substitution are associated one-to-one with the three topologically linked trees for four taxa. In evolutionary parsimony, the three quantities are operator invariants. These invariants are the remnants of substitutions that have occurred in the interior branch of the tree and are analogous to the substitutions assigned to the central branch by parsimony. The two invariants associated with the incorrect trees must equal zero (statistically), whereas only the correct tree can have a nonzero invariant. The χ² test is used to ascertain the nonzero invariant and the statistically favored tree. Examples, obtained using data calculated with evolutionary rates and branchings designed to camouflage the true tree, show that the method accurately predicts the tree, even when substitution rates differ greatly in neighboring peripheral branches (conditions under which parsimony will consistently fail). As the number of substitutions in peripheral branches becomes fewer, the parsimony and the evolutionary-parsimony solutions converge. The method is robust and easy to use.

10.
The phylogeny of 31 autolytine taxa (Syllidae, Polychaeta, and Annelida) was estimated based on 16S rDNA and 18S rDNA sequences. Outgroups included 12 non-autolytine syllids and four other annelids from related groups. The phylogeny was used to trace the evolution of the various reproductive strategies (i.e., epigamy, anterior and posterior scissiparity, and gemmiparity) within the group, and it will also serve as a basis for a forthcoming revision of autolytine taxonomy. The two genes were analysed both separately and in combination using parsimony, maximum likelihood, and Bayesian inference. Regardless of the method used, the combined analysis supported a division of Autolytinae into three major clades: one with epigamous Autolytus; a second comprising Autolytus and Myrianida with posterior scissiparity and gemmiparity; and a third containing Proceraea, Procerastea, and Virchowia with anterior scissiparity. The relationship between these three groups is uncertain. Ancestral reproductive states were reconstructed with parsimony and maximum likelihood, and the results unequivocally support epigamy as the plesiomorphic reproductive mode in Syllidae and indicate that schizogamy arose separately in Syllinae and Autolytinae. The evolution of reproductive traits is ambiguous within Autolytinae, and either of the different reproductive modes could represent the ancestral state.

11.
Many phylogenetic analyses, particularly morphological studies, use higher taxa (e.g., genera, families) rather than species as terminal taxa. This general approach requires dealing with interspecific variation among the species that make up the higher taxon. In this paper, I review different parsimony methods for coding and sampling higher taxa and compare their relative accuracies using computer simulations. Despite their widespread use, methods that involve coding higher taxa as terminals perform poorly in simulations, relative to splitting up the higher taxa and using species as terminals. Among the methods that use higher taxa as terminals, coding a taxon based on the most common condition among the included species (majority or modal coding) is generally more accurate than other coding methods, such as coding taxa as missing or polymorphic. The success of the majority method, and results of further simulations, suggest that in many cases "common equals primitive" within variable taxa, at least for low and intermediate rates of character change. The fixed-only method (excluding variable characters) performs very poorly, a result that is indirectly supported by analyses of published data for squamate reptiles. Sampling only a single species per higher taxon also yields low accuracy under many conditions. Along with recent studies of intraspecific polymorphism, the results of this study show the general importance of (1) including characters despite variation within taxa and (2) using methods that incorporate detailed information on the distribution of states within variable taxa.

12.
The nuclear large subunit (LSU) rRNA gene is a rich source of phylogenetic characters because of its large size, mosaic of slowly and rapidly evolving regions, and complex secondary structure variation. Nevertheless, many studies have indicated that inconsistency, bias, and gene-specific error (e.g., within-individual gene family variation, cryptic sequence simplicity, and sequence coevolution) can complicate animal phylogenies based on LSU rDNA sequences. However, most of these studies sampled small gene fragments from expansion segments; among animals, only five nonchordate complete LSU sequences are published. In this study, we sequenced near-complete nuclear LSU genes from 11 representative daphniids (Crustacea). The daphniid expansion segment V6 was larger and showed more length variation (90-351 bp) than is found in all other reported LSU V6 sequences. Daphniid LSU (without the V6 region) phylogenies generally agreed with the existing phylogenies based on morphology and mtDNA sequences. Nevertheless, a major disagreement between the LSU and the expected trees involved a positively misleading association between the two taxa with the longest branches, Daphnia laevis and D. occidentalis. Both maximum parsimony (MP) and maximum likelihood (ML) optimality criteria recovered this association, but parametric simulations indicated that MP was markedly more sensitive to this bias than ML. Examination of data partitions indicated that the inconsistency was caused by increased nucleotide substitution rates in the branches leading to D. laevis and D. occidentalis rather than among-taxon differences in base composition or distribution of sites that are free to vary. These results suggest that lineage-specific rate acceleration can lead to long-branch attraction even in the conserved genes of animal species that are almost morphologically indistinguishable.

13.
Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern "anthophyte hypothesis," which places Gnetales as the sister group of flowering plants. A series of simulation studies was undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.

14.
Phylogenetic analysis using parsimony and likelihood methods (total citations: 1; self-citations: 0; citations by others: 1)
The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981, J. Mol. Evol. 17: 368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method. When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorated. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.

15.
Z. Yang, S. Kumar, M. Nei. Genetics, 1995, 141(4): 1641-1650
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.

16.
The nucleotide sequence of the complete mitochondrial cytochrome b gene has been determined and compared for 51 species of the family Bovidae and 10 potential pecoran and tragulid outgroups. A detailed saturation analysis at each codon position relative to the maximum parsimony procedure indicates that not all transitions on third codon positions accumulate in a similar fashion: C-T are more saturated than A-G substitutions. The same trend is observed for second positions but not for first positions where A-G and C-T transitions exhibit roughly the same levels of saturation. Maximum parsimony reconstructions were weighted according to these observations. Maximum parsimony, maximum likelihood, and distance phylogenetic reconstructions all depict a major split within Bovidae. The subfamily Bovinae includes four multifurcating tribes and subtribes: Boselaphini, Tragelaphini, cattle-Bovini (Bos and Bison), and buffalo-Bovini (Bubalus and Syncerus). Its sister group is the subfamily Antilopinae, i.e., all non-Bovinae taxa, represented by seven lineages: Antilopini (including Saiga), Caprini sensu lato (i.e., Caprinae including Pantholops), Hippotragini, Alcelaphini, Reduncini (including Pelea), Aepyceros possibly linked to Neotragus, and Cephalophini possibly linked to Oreotragus (the suni and the klipspringer being members of a polyphyletic Neotragini). These various tribes and major lineages were produced by two noteworthy explosive radiations, which occurred simultaneously between 12.0 and 15.3 MY (Middle Miocene) in the subfamilies Bovinae and Antilopinae.
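The first step of a saturation analysis of this kind is tallying observed A-G and C-T differences separately at each codon position. The Python sketch below shows only that tally. It is an illustration under stated assumptions, not the authors' procedure: the function name is hypothetical, the two sequences are assumed to be aligned, gap-free, and in frame starting at codon position 1, and the comparison of observed differences against inferred substitutions that a full saturation analysis requires is not shown.

```python
# Map each unordered pair of transition partners to a label.
TRANSITION_TYPES = {frozenset("AG"): "A-G", frozenset("CT"): "C-T"}


def transition_counts_by_codon_position(seq_a, seq_b):
    """Count observed A-G and C-T differences at codon positions 1, 2 and 3.

    Assumes both sequences are aligned, gap-free, and begin at codon position 1.
    Transversions and identical sites are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {pos: {"A-G": 0, "C-T": 0} for pos in (1, 2, 3)}
    for index, (x, y) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        kind = TRANSITION_TYPES.get(frozenset((x, y)))
        if kind is not None:
            counts[index % 3 + 1][kind] += 1
    return counts


# Hypothetical 9-bp example (three codons).
tally = transition_counts_by_codon_position("ACGTTCGAA", "ATGCTTAAG")
print(tally)
```

Tallies like these, accumulated over all sequence pairs, are the raw material for deciding whether a transition class at a given codon position should be down-weighted in parsimony.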

17.
Summary. A phylogenetic tree was constructed from 245 globin amino acid sequences. Of the six plant globins, five represented the Leguminosae and one the Ulmaceae. Among the invertebrate sequences, 7 represented the phylum Annelida, 13 represented Insecta and Crustacea of the phylum Arthropoda, and 6 represented the phylum Mollusca. Of the vertebrate globins, 4 represented the Agnatha and 209 represented the Gnathostomata. A common alignment was achieved for the 245 sequences using the parsimony principle, and a matrix of minimum mutational distances was constructed. The most parsimonious phylogenetic tree, i.e., the one having the lowest number of nucleotide substitutions that cause amino acid replacements, was obtained employing clustering and branch-swapping algorithms. Based on the available fossil record, the earliest split in the ancestral metazoan lineage was placed at 680 million years before present (Myr BP), the origin of vertebrates was placed at 510 Myr BP, and the separation of the Chondrichthyes and the Osteichthyes was placed at 425 Myr BP. Local molecular clock calculations were used to date the branch points on the descending branches of the various lineages within the plant and invertebrate portions of the tree. The tree divided the 245 sequences into five distinct clades that corresponded exactly to the five groups: plants, annelids, arthropods, molluscs, and vertebrates. Furthermore, the maximum parsimony tree, in contrast to the unweighted pair group and distance Wagner trees, was consistent with the available fossil record and supported the hypotheses that the primitive hemoglobin of metazoans was monomeric and that the multisubunit extracellular hemoglobins found among the Annelida and the Arthropoda represent independently derived states.

18.

Background

Long branch attraction (LBA) is a problem that afflicts both the parsimony and maximum likelihood phylogenetic analysis techniques. Research has shown that parsimony is particularly vulnerable to inferring the wrong tree in Felsenstein topologies. The long branch extraction method is a procedure to detect a data set suffering from this problem so that Maximum Likelihood could be used instead of Maximum Parsimony.

Results

The long branch extraction method has been well cited and used by many authors in their analysis but no strong validation has been performed as to its accuracy. We performed such an analysis by an extensive search of the branch length search space under two topologies of six taxa, a Felsenstein-like topology and Farris-like topology. We also examine a long branch shortening method.

Conclusions

The long branch extraction method seems to mask the majority of the search space rendering it ineffective as a detection method of LBA. A proposed alternative, the long branch shortening method, is also ineffective in predicting long branch attraction for all tree topologies.

19.

Background

It has been suggested that statistical parsimony network analysis could be used to get an indication of species represented in a set of nucleotide data, and the approach has been used to discuss species boundaries in some taxa.

Methodology/Principal Findings

Based on 635 base pairs of the mitochondrial protein-coding gene cytochrome c oxidase I (COI), we analyzed 152 nemertean specimens using statistical parsimony network analysis with the connection probability set to 95%. The analysis revealed 15 distinct networks together with seven singletons. Statistical parsimony yielded three networks supporting the species status of Cephalothrix rufifrons, C. major and C. spiralis as they currently have been delineated by morphological characters and geographical location. Many other networks contained haplotypes from nearby geographical locations. Cladistic structure by maximum likelihood analysis overall supported the network analysis, but indicated a false positive result where subnetworks should have been connected into one network/species. This probably is caused by undersampling of the intraspecific haplotype diversity.

Conclusions/Significance

Statistical parsimony network analysis provides a rapid and useful tool for detecting possible undescribed/cryptic species among cephalotrichid nemerteans based on COI gene. It should be combined with phylogenetic analysis to get indications of false positive results, i.e., subnetworks that would have been connected with more extensive haplotype sampling.

20.
The fern genus Dryopteris (Dryopteridaceae) is represented in the Hawaiian Islands by 18 endemic taxa and one non-endemic, native species. The goals of this study were to determine whether Dryopteris in Hawai'i is monophyletic and to infer the biogeographical origins of Hawaiian Dryopteris by determining the geographical distributions of their closest living relatives. We sequenced two chloroplast DNA fragments, rbcL and the trnL-F intergenic spacer (IGS), for 18 Hawaiian taxa, 45 non-Hawaiian taxa, and two outgroup species. For individual fragments, we estimated phylogenetic relationships using Bayesian inference and maximum parsimony. We performed a combined analysis of both cpDNA fragments employing Bayesian inference, maximum parsimony, and maximum likelihood. These analyses indicate that Hawaiian Dryopteris is not monophyletic, and that there were at least five separate colonizations of the Hawaiian Islands by different species of dryopteroid ferns, with most of the five groups having closest relatives in SE Asia. The results suggest that one colonizing ancestor, perhaps from SE Asia, gave rise to eight endemic taxa (the glabra group). Another colonizing ancestor, also possibly from SE Asia, gave rise to a group of five endemic taxa (the exindusiate group). Dryopteris fusco-atra and its two varieties, which are endemic to Hawai'i, most likely diversified from a SE Asian ancestor. The Hawaiian endemic Nothoperanema rubiginosum has its closest relatives in SE Asia, and while the remaining two species, D. wallichiana and D. subbipinnata, are sister species, their biogeographical origins could not be determined from these analyses due to the widespread distributions of D. wallichiana and its closest non-Hawaiian relative.
