Similar literature
 Found 20 similar articles (search took 15 ms)
1.
Notoriously slow rates of molecular evolution and convergent evolution among some morphological characters have limited phylogenetic resolution for the palm family (Arecaceae). This study adds nuclear DNA (18S SSU rRNA) and chloroplast DNA (cpDNA; atpB and rbcL) sequence data for 65 genera of palms and characterizes molecular variation for each molecule. Phylogenetic relationships were estimated with maximum likelihood and maximum parsimony techniques for the new data and for previously published molecular data for 45 palm genera. Maximum parsimony analysis was also used to compare molecular and morphological data for 33 palm genera. Incongruence among datasets was detected between cpDNA and 18S data and between molecular and morphological data. Most conflict between nuclear and cpDNA data was associated with the genus Nypa. Several taxa showed relatively long branches with 18S data, but phylogenetic resolution of these taxa was essentially the same for 18S and cpDNA data. Base composition bias for 18S that contributed to erroneous phylogenetic resolution in other taxa did not seem to be present in Palmae. Morphological data were incongruent with all molecular data due to apparent morphological homoplasy for Caryoteae, Ceroxyloideae, Iriarteae, and Thrinacinae. Both cpDNA and nuclear 18S data firmly resolved Caryoteae with Borasseae of Coryphoideae, suggesting that at least some morphological characters used to place Caryoteae in Arecoideae are homoplastic. In this study, increased character sampling seems to be more important than increased taxon sampling; a comparison of the full (65-taxon) and reduced (45- and 33-taxon) datasets suggests little difference in core topology but considerably more nodal support with the increased character sample sizes. These results indicate a general trend toward a stable estimate of phylogenetic relationships for the Palmae. 
Although the 33-taxon topologies are even better resolved, they lack several critical taxa and are affected by incongruence between molecular and morphological data. As such, a comparison of results from the 65- and 45-taxon trees offers the best available reference for phylogenetic inference on palms.

2.
Supertree methods construct trees on a set of taxa (species) by combining many smaller trees on overlapping subsets of the entire set of taxa. A ‘quartet’ is an unrooted tree over 4 taxa, hence the quartet-based supertree methods combine many 4-taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attention in recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets.
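
To make the quartet idea above concrete, here is a minimal sketch (not the method presented in the paper) that lists the three possible resolved topologies for one 4-taxon set and counts how many quartets an n-taxon set contains. The function names `quartet_topologies` and `all_quartets` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations
from math import comb


def quartet_topologies(a, b, c, d):
    """Return the three resolved unrooted topologies on four taxa.

    Each topology is written as a split: the two pairs of taxa that sit
    on opposite sides of the single internal branch.
    """
    return [
        (frozenset({a, b}), frozenset({c, d})),
        (frozenset({a, c}), frozenset({b, d})),
        (frozenset({a, d}), frozenset({b, c})),
    ]


def all_quartets(taxa):
    """Return every 4-taxon subset of the taxon set (hypothetical helper)."""
    return list(combinations(sorted(taxa), 4))


taxa = ["A", "B", "C", "D", "E", "F"]
quartets = all_quartets(taxa)

# The number of quartets grows as "n choose 4", which is why quartet-based
# methods have to be careful about scalability.
print(len(quartets), comb(len(taxa), 4))
```

A quartet-based supertree method would pick one topology per quartet and then try to assemble a tree that agrees with as many of those choices as possible; that assembly step is not shown here.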

3.
Comprehensive phylogenetic trees are essential tools to better understand evolutionary processes. For many groups of organisms or projects aiming to build the Tree of Life, comprehensive phylogenetic analysis implies sampling hundreds to thousands of taxa. For the tree of all life this task rises to a highly conservative 13 million. Here, we assessed the performances of methods to reconstruct large trees using Monte Carlo simulations with parameters inferred from four large angiosperm DNA matrices, containing between 141 and 567 taxa. For each data set, parameters of the HKY85+G model were estimated and used to simulate 20 new matrices for sequence lengths from 100 to 10,000 base pairs. Maximum parsimony and neighbor joining were used to analyze each simulated matrix. In our simulations, accuracy was measured by counting the number of nodes in the model tree that were correctly inferred. The accuracy of the two methods increased very quickly with the addition of characters before reaching a plateau around 1000 nucleotides for any sizes of trees simulated. An increase in the number of taxa from 141 to 567 did not significantly decrease the accuracy of the methods used, despite the increase in the complexity of tree space. Moreover, the distribution of branch lengths rather than the rate of evolution was found to be the most important factor for accurately inferring these large trees. Finally, a tree containing 13,000 taxa was created to represent a hypothetical tree of all angiosperm genera and the efficiency of phylogenetic reconstructions was tested with simulated matrices containing an increasing number of nucleotides up to a maximum of 30,000. Even with such a large tree, our simulations suggested that simple heuristic searches were able to infer up to 80% of the nodes correctly.
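
The accuracy measure described above (the number of model-tree nodes that are correctly inferred) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: trees are assumed to be rooted and written as nested tuples of taxon names, a node counts as "correct" when the same set of descendant taxa appears in the inferred tree, and the model tree is assumed to have at least one non-root internal node. The names `clades` and `node_accuracy` are hypothetical.

```python
def clades(tree):
    """Collect the set of taxa below every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is just a taxon name
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    root = walk(tree)
    found.discard(root)  # the root clade is shared trivially, so skip it
    return found


def node_accuracy(model_tree, inferred_tree):
    """Fraction of model-tree nodes that also occur in the inferred tree."""
    true_clades = clades(model_tree)
    return len(true_clades & clades(inferred_tree)) / len(true_clades)


model = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))

# The inferred tree recovers {A, B, C} and {D, E} but not {A, B}.
print(node_accuracy(model, inferred))
```

For unrooted trees the same idea is usually applied to bipartitions rather than clades; the rooted version is used here only to keep the sketch short.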

4.
Taxon sampling may be critically important for phylogenetic accuracy because adding taxa can help to subdivide misleading long branches. Although the idea that added taxa can break up long branches was exemplified by a study of "incomplete" fossil taxa, the issue of taxon completeness (i.e., proportion of missing data) has been largely ignored in most subsequent discussions of taxon sampling and long-branch attraction. In this article, I use simulations to test the ability of incomplete taxa to subdivide long branches and improve phylogenetic accuracy in situations of potential long-branch attraction. The results show that for most methods and conditions examined, adding taxa that are only 50% complete may provide similar benefits to adding the same number of complete taxa (suggesting that the advantages of increased taxon sampling may be obtained with less data than previously considered). For parsimony, taxa that are less complete (5% to 25% complete) may often have limited ability to rescue analyses from long-branch attraction. In contrast, highly incomplete taxa can be surprisingly beneficial when using model-based methods. The results also suggest the importance of model-based methods in phylogenetic analyses that combine molecular and fossil data.

5.
Effects of taxonomic sampling and conflicting signal on the inference of seed plant trees supported in previous molecular analyses were explored using 13 single-locus data sets. Changing the number of taxa in single-locus analyses had limited effects on log likelihood differences between the gnepine (Gnetales plus Pinaceae) and gnetifer (Gnetales plus conifers) trees. Distinguishing among these trees also was little affected by the use of different substitution parameters. The 13-locus combined data set was partitioned into nine classes based on substitution rates. Sites evolving at intermediate rates had the best likelihood and parsimony scores on gnepine trees, and those evolving at the fastest rates had the best parsimony scores on Gnetales-sister trees (Gnetales plus other seed plants). When the fastest evolving sites were excluded from parsimony analyses, well-supported gnepine trees were inferred from the combined data and from each genomic partition. When all sites were included, Gnetales-sister trees were inferred from the combined data, whereas a different tree was inferred from each genomic partition. Maximum likelihood trees from the combined data and from each genomic partition were well-supported gnepine trees. A preliminary stratigraphic test highlights the poor fit of Gnetales-sister trees to the fossil data.

6.
A major assumption of many molecular phylogenetic methods is the homogeneity of nucleotide frequencies among taxa, which refers to the equality of the nucleotide frequency bias among species. Changes in nucleotide frequency among different lineages in a data set are thought to lead to erroneous phylogenetic inference because unrelated clades may appear similar because of evolutionarily unrelated similarities in nucleotide frequencies. We tested the effects of the heterogeneity of nucleotide frequency bias on phylogenetic inference, along with the interaction between this heterogeneity and stratified taxon sampling, by means of computer simulations using evolutionary parameters derived from genomic databases. We found that the phylogenetic trees inferred from data sets simulated under realistic, observed levels of heterogeneity for mammalian genes were reconstructed with accuracy comparable to those simulated with homogeneous nucleotide frequencies; the results hold for Neighbor-Joining, minimum evolution, maximum parsimony, and maximum-likelihood methods. The LogDet distance method, specifically designed to deal with heterogeneous nucleotide frequencies, does not perform better than distance methods that assume substitution pattern homogeneity among sequences. In these specific simulation conditions, we did not find a significant interaction between phylogenetic accuracy and substitution pattern heterogeneity among lineages, even when the taxon sampling is increased.

7.
JJ Wiens, J Tiu. PLoS ONE 2012, 7(8): e42925

Background

Phylogenies are essential to many areas of biology, but phylogenetic methods may give incorrect estimates under some conditions. A potentially common scenario of this type is when few taxa are sampled and terminal branches for the sampled taxa are relatively long. However, the best solution in such cases (i.e., sampling more taxa versus more characters) has been highly controversial. A widespread assumption in this debate is that added taxa must be complete (no missing data) in order to save analyses from the negative impacts of limited taxon sampling. Here, we evaluate whether incomplete taxa can also rescue analyses under these conditions (empirically testing predictions from an earlier simulation study).

Methodology/Principal Findings

We utilize DNA sequence data from 16 vertebrate species with well-established phylogenetic relationships. In each replicate, we randomly sample 4 species, estimate their phylogeny (using Bayesian, likelihood, and parsimony methods), and then evaluate whether adding in the remaining 12 species (which have 50, 75, or 90% of their data replaced with missing data cells) can improve phylogenetic accuracy relative to analyzing the 4 complete taxa alone. We find that in those cases where sampling few taxa yields an incorrect estimate, adding taxa with 50% or 75% missing data can frequently (>75% of relevant replicates) rescue Bayesian and likelihood analyses, recovering accurate phylogenies for the original 4 taxa. Even taxa with 90% missing data can sometimes be beneficial.

Conclusions

We show that adding taxa that are highly incomplete can improve phylogenetic accuracy in cases where analyses are misled by limited taxon sampling. These surprising empirical results confirm those from simulations, and show that the benefits of adding taxa may be obtained with unexpectedly small amounts of data. These findings have important implications for the debate on sampling taxa versus characters, and for studies attempting to resolve difficult phylogenetic problems.
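
The experimental manipulation described in this abstract, replacing 50, 75, or 90% of a taxon's data with missing-data cells, can be sketched as below. This is only an illustration: the abstract does not say how cells were chosen, so picking positions uniformly at random is an assumption, as are the function name `mask_missing`, the toy sequence, and the use of "?" as the missing-data symbol.

```python
import random


def mask_missing(sequence, fraction, rng):
    """Replace a given fraction of characters with '?' (missing data).

    Assumption: positions are drawn uniformly at random without replacement.
    """
    n_missing = round(len(sequence) * fraction)
    positions = set(rng.sample(range(len(sequence)), n_missing))
    return "".join(
        "?" if i in positions else base for i, base in enumerate(sequence)
    )


rng = random.Random(42)        # fixed seed so the sketch is reproducible
sequence = "ACGTACGTACGTACGTACGT"  # toy 20-site sequence, not real data

masked = {level: mask_missing(sequence, level, rng) for level in (0.5, 0.75, 0.9)}
for level, text in masked.items():
    print(level, text)
```

Each masked sequence keeps its original length, so it can still sit in the same alignment as the complete taxa.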

8.
Among insects, eusocial behavior occurs in termites, ants, some bees and wasps. Isoptera and Hymenoptera convergently share social behavior, and for both taxa its evolution remains poorly understood. While dating analyses provide researchers with the opportunity to date the origin of eusociality, fossil calibration methodology may mislead subsequent ecological interpretations. Using a comprehensive termite dataset, we explored the effect of fossil placement and calibration methodology. A combined molecular and morphological dataset for 42 extant termite lineages was used, and a second dataset including these 42 taxa, plus an additional 39 fossil lineages for which we had only morphological data. MrBayes doublet-model analyses recovered similar topologies, with one minor exception (Stolotermitidae is sister to the Hodotermitidae, s.s., in the 42-taxon analysis but is in a polytomy with Hodotermitidae and (Kalotermitidae + Neoisoptera) in the 81-taxon analysis). Analyses using the r8s program on these topologies were run with either minimum/maximum constraints (analysis A = 42-taxon and analysis C = 81-taxon analyses) or with the fossil taxon ages fixed (ages fixed to be the geological age of the deposit from which they came, analysis B = 81-taxon analysis). Confidence intervals were determined for the resulting ultrametric trees, and for most major clades there was significant overlap between dates recovered for analyses A and C (with exceptions, such as the nodes Neoisoptera and Euisoptera). With the exception of isopteran and euisopteran node ages, however, none of the major clade ages overlapped when analysis B is compared with either analysis A or C. Future studies on Dictyoptera should note that the age of Kalotermitidae was underestimated in the absence of kalotermitid fossils with fixed ages.

9.
Supertree methods are used to assemble separate phylogenetic trees with shared taxa into larger trees (supertrees) in an effort to construct more comprehensive phylogenetic hypotheses. In spite of much recent interest in supertrees, there are still few methods for supertree construction. The flip supertree problem is an error correction approach that seeks to find a minimum number of changes (flips) to the matrix representation of the set of input trees to resolve their incompatibilities. A previous flip supertree algorithm was limited to finding exact solutions and was only feasible for small input trees. We developed a heuristic algorithm for the flip supertree problem suitable for much larger input trees. We used a series of 48- and 96-taxon simulations to compare supertrees constructed with the flip supertree heuristic algorithm with supertrees constructed using other approaches, including MinCut (MC), modified MC (MMC), and matrix representation with parsimony (MRP). Flip supertrees are generally far more accurate than supertrees constructed using MC or MMC algorithms and are at least as accurate as supertrees built with MRP. The flip supertree method is therefore a viable alternative to other supertree methods when the number of taxa is large.
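
The "matrix representation of the set of input trees" mentioned above can be illustrated with the standard coding used for matrix representation methods: one column per non-root clade of each input tree, with "1" for taxa inside the clade, "0" for taxa in that tree but outside the clade, and "?" for taxa absent from that tree. The sketch below only builds this matrix; it does not implement the flip search or the authors' heuristic. Trees are assumed to be rooted nested tuples, and the function names are hypothetical.

```python
def clades_and_leaves(tree):
    """Return (non-root clades, leaf set) for a rooted nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):  # a leaf is just a taxon name
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.append(members)
        return members

    leaves = walk(tree)
    return [clade for clade in found if clade != leaves], leaves


def matrix_representation(trees):
    """Build one row of '1', '0', '?' characters per taxon."""
    columns = []
    taxa = set()
    for tree in trees:
        clade_list, leaves = clades_and_leaves(tree)
        taxa |= leaves
        columns.extend((clade, leaves) for clade in clade_list)
    return {
        taxon: "".join(
            "1" if taxon in clade else "0" if taxon in leaves else "?"
            for clade, leaves in columns
        )
        for taxon in sorted(taxa)
    }


input_trees = [(("A", "B"), "C"), (("B", "C"), "D")]
rows = matrix_representation(input_trees)
for taxon, row in rows.items():
    print(taxon, row)
```

In the flip setting, incompatibilities between columns are resolved by changing individual 0 and 1 entries, and the goal is to make as few such changes as possible.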

10.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, poses complex problems for traditional phylogenetic analyses, and in particular, those that aim to successfully recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.

11.
MOTIVATION: The computation of large phylogenetic trees with statistical models such as maximum likelihood or Bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree or a tree which is topologically closer to the true tree more frequently than less elaborate methods such as parsimony or neighbor joining. Due to the combinatorial and computational complexity, the size of trees which can be computed on a biologist's PC workstation within reasonable time is limited to trees containing approximately 100 taxa. RESULTS: In this paper we present the latest release of our program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees, which allows for computation of 1,000-taxon trees in less than 24 hours on a single PC processor. We compare RAxML-III to the currently fastest implementations for maximum likelihood and Bayesian inference: PHYML and MrBayes. Whereas RAxML-III performs worse than PHYML and MrBayes on synthetic data, it clearly outperforms both programs on all real data alignments used in terms of speed and final likelihood values. AVAILABILITY AND SUPPLEMENTARY INFORMATION: RAxML-III including all alignments and final trees mentioned in this paper is freely available as open source code at http://wwwbode.cs.tum/~stamatak CONTACT: stamatak@cs.tum.edu.

12.
Although long-branch attraction (LBA) is frequently cited as the cause of anomalous phylogenetic groupings, few examples of LBA involving real sequence data are known. We have found several cases of probable LBA by analyzing subsamples from an alignment of 18S rDNA sequences for 133 metazoans. In one example, maximum parsimony analysis of sequences from two rotifers, a ctenophore, and a polychaete annelid resulted in strong support for a tree grouping two "long-branch taxa" (a rotifer and the ctenophore). Maximum-likelihood analysis of the same sequences yielded strong support for a more biologically reasonable "rotifer monophyly" tree. Attempts to break up long branches for problematic subsamples through increased taxon sampling reduced, but did not eliminate, LBA problems. Exhaustive analyses of all quartets for a subset of 50 sequences were performed in order to compare the performance of maximum likelihood, equal-weights parsimony, and two additional variants of parsimony; these methods do differ substantially in their rates of failure to recover trees consistent with well established, but highly unresolved phylogenies. Power analyses using simulations suggest that some incorrect inferences by maximum parsimony are due to statistical inconsistency and that when estimates of central branch lengths for certain quartets are very low, maximum-likelihood analyses have difficulty recovering accepted phylogenies even with large amounts of data. These examples demonstrate that LBA problems can occur in real data sets, and they provide an opportunity to investigate causes of incorrect inferences.

13.
The effects on phylogenetic accuracy of adding characters and/or taxa were explored using data generated by computer simulation. The conditions of this study were constrained but allowed for systematic investigation of certain parameters. The starting point for the study was a four-taxon tree in the "Felsenstein zone," representing a difficult phylogenetic problem with an extreme situation of long branch attraction. Taxa were added sequentially to this tree in a manner specifically designed to break up the long branches, and for each tree data matrices of different sizes were simulated. Phylogenetic trees were reconstructed from these data using the criteria of parsimony and maximum likelihood. Phylogenetic accuracy was measured in three ways: (1) proportion of trees that are completely correct, (2) proportion of correctly reconstructed branches in all trees, and (3) proportion of trees in which the original four-taxon statement is correctly reconstructed. Accuracy improved dramatically with the addition of taxa and much more slowly with the addition of characters. If taxa can be added to break up long branches, it is far preferable to add taxa than characters.

14.
Many phylogenetic analyses that include numerous terminals but few genes show high resolution and branch support for relatively recently diverged clades, but lack of resolution and/or support for "basal" clades of the tree. The various benefits of increased taxon and character sampling have been widely discussed in the literature, albeit primarily based on simulations rather than empirical data. In this study, we used a well-sampled gene-tree analysis (based on 100 mitochondrial genomes of higher teleost fishes) to test empirically the efficiency of different methods of data sampling and phylogenetic inference to "correctly" resolve the basal clades of a tree (based on congruence with the reference tree constructed using all 100 taxa and 7990 characters). By itself, increased character sampling was an inefficient method by which to decrease the likelihood of "incorrect" resolution (i.e., incongruence with the reference tree) for parsimony analyses. Although increased taxon sampling was a powerful approach to alleviate "incorrect" resolution for parsimony analyses, it had the general effect of increasing the number of, and support for, "incorrectly" resolved clades in the Bayesian analyses. For both the parsimony and Bayesian analyses, increased taxon sampling, by itself, was insufficient to help resolve the basal clades, making this sampling strategy ineffective for that purpose. For this empirical study, the most efficient of the six approaches considered to resolve the basal clades when adding nucleotides to a dataset that consists of a single gene sampled for a small, but representative, number of taxa, is to increase character sampling and analyze the characters using the Bayesian method.

15.
Analyzing Large Data Sets in Reasonable Times: Solutions for Composite Optima   Total citations: 19 (self-citations: 3; citations by others: 16)
New methods for parsimony analysis of large data sets are presented. The new methods are sectorial searches, tree-drifting, and tree-fusing. For Chase et al.'s 500-taxon data set these methods (on a 266-MHz Pentium II) find a shortest tree in less than 10 min (i.e., over 15,000 times faster than PAUP and 1000 times faster than PAUP*). Making a complete parsimony analysis requires hitting minimum length several times independently, but not necessarily all "islands"; for Chase et al.'s data set, this can be done in 4 to 6 h. The new methods also perform well in other cases analyzed (which range from 170 to 854 taxa).

16.
Direct optimization frameworks for simultaneously estimating alignments and phylogenies have recently been developed. One such method, implemented in the program POY, is becoming more common for analyses of variable length sequences (e.g., analyses using ribosomal genes) and for combined evidence analyses (morphology + multiple genes). Simulation of sequences containing insertion and deletion events was performed in order to directly compare a widely used method of multiple sequence alignment (ClustalW) and subsequent parsimony analysis in PAUP* with direct optimization via POY. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (clocklike, non-clocklike, and ultrametric). Alignment accuracy scores for the implied alignments from POY and the multiple sequence alignments from ClustalW were calculated and compared. In almost all cases (99.95%), ClustalW produced more accurate alignments than POY-implied alignments, judged by the proportion of correctly identified homologous sites. Topological accuracy (distance to the true tree) for POY topologies and topologies generated under parsimony in PAUP* from the ClustalW alignments were also compared. In 44.94% of the cases, Clustal alignment tree reconstructions via PAUP* were more accurate than POY, whereas in 16.71% of the cases POY reconstructions were more topologically accurate (38.38% of the time they were equally accurate). Comparisons between POY hypothesized alignments and the true alignments indicated that, on average, as alignment error increased, topological accuracy decreased.

17.
We newly sequenced the nuclear-encoded small subunit (SSU) rDNA coding region for 21 taxa of the genus Closterium. The new sequences were integrated into an alignment with 13 known sequences of conjugating green algae representing six traditional families (i.e. Zygnemataceae, Mesotaeniaceae, Gonatozygaceae, Peniaceae, Closteriaceae, and Desmidiaceae) and five known charophycean sequences as outgroups. Both maximum likelihood and maximum parsimony analyses supported with high bootstrap values one large clade containing all placoderm desmids (Desmidiales). All the Closterium taxa formed one clade with 100% bootstrap support, indicating their monophyly, but not paraphyly, as suggested earlier. As to the taxa within the genus Closterium, we found two clades of morphologically closely related taxa in both maximum likelihood and maximum parsimony trees. They corresponded to the C. calosporum species complex and the C. moniliferum-ehrenbergii species complex. It is of particular interest that the homothallic entity of C. moniliferum v. moniliferum was distinguished from and ancestral to all other entities of the C. moniliferum-ehrenbergii species complex. Superimposing all 39 charophycean sequences on the higher order SSU rRNA structure model of Closterium, we investigated degrees of nucleotide conservation at a given position in the nucleotide sequence. A characteristic "signature" structure to the genus Closterium was found as an additional helix at the tip of V1 region. In addition, eight base deletions at the tip of helix 10 were found to be characteristic of the C. calosporum species complex, C. gracile, C. incurvum, C. pleurodermatum, and C. pusillum v. maius. These taxa formed one clade with an 82% bootstrap value in maximum parsimony analysis.

18.
Recent studies have shown that addition or deletion of taxa from a data matrix can change the estimate of phylogeny. I used 29 data sets from the literature to examine the effect of taxon sampling on phylogeny estimation within data sets. I then used multiple regression to assess the effect of number of taxa, number of characters, homoplasy, strength of support, and tree symmetry on the sensitivity of data sets to taxonomic sampling. Sensitivity to sampling was measured by mapping characters from a matrix of culled taxa onto optimal trees for that reduced matrix and onto the pruned optimal tree for the entire matrix, then comparing the length of the reduced tree to the length of the pruned complete tree. Within-data-set patterns can be described by a second-order equation relating fraction of taxa sampled to sensitivity to sampling. Multiple regression analyses found number of taxa to be a significant predictor of sensitivity to sampling; retention index, number of informative characters, total support index, and tree symmetry were nonsignificant predictors. I derived a predictive regression equation relating fraction of taxa sampled and number of taxa potentially sampled to sensitivity to taxonomic sampling and calculated values for this equation within the bounds of the variables examined. The length difference between the complete tree and a subsampled tree was generally small (average difference of 0-2.9 steps), indicating that subsampling taxa is probably not an important problem for most phylogenetic analyses using up to 20 taxa.

19.
Synecological analyses are usually based on typological, phenetic and cladistic methods. The disadvantages of these techniques are shown. The application of the Wagner parsimony method to synecology is considered. All the methods need some prerequisites, viz. definitions of localities and characters (the most simple one being the presence/absence of taxa); the choice of taxonomic level of taxa; their autochthony. The application of Wagner parsimony needs a new terminology. The congruence of any environmental condition, including freshwater monitoring indices, can be tested on parsimonious trees. The Wagner parsimony method not only provides various indices (tree length, CI, HI, RC, RI) which allow the comparison of trees but also minimal trees which are direct tools in synecology.

20.
Many phylogenetic analyses, particularly morphological studies, use higher taxa (e.g., genera, families) rather than species as terminal taxa. This general approach requires dealing with interspecific variation among the species that make up the higher taxon. In this paper, I review different parsimony methods for coding and sampling higher taxa and compare their relative accuracies using computer simulations. Despite their widespread use, methods that involve coding higher taxa as terminals perform poorly in simulations, relative to splitting up the higher taxa and using species as terminals. Among the methods that use higher taxa as terminals, coding a taxon based on the most common condition among the included species (majority or modal coding) is generally more accurate than other coding methods, such as coding taxa as missing or polymorphic. The success of the majority method, and results of further simulations, suggest that in many cases "common equals primitive" within variable taxa, at least for low and intermediate rates of character change. The fixed-only method (excluding variable characters) performs very poorly, a result that is indirectly supported by analyses of published data for squamate reptiles. Sampling only a single species per higher taxon also yields low accuracy under many conditions. Along with recent studies of intraspecific polymorphism, the results of this study show the general importance of (1) including characters despite variation within taxa and (2) using methods that incorporate detailed information on the distribution of states within variable taxa.
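
Two of the coding schemes compared in this abstract, majority (modal) coding and the fixed-only method, can be sketched as follows. This is an illustration rather than the author's implementation: the function names are hypothetical, character states are assumed to be given as a non-empty list with one entry per sampled species, and coding ties as missing ("?") is an assumption made here, since the abstract does not say how ties are handled.

```python
from collections import Counter


def majority_code(states):
    """Code a higher taxon by the most common state among its species.

    Assumption: if two states tie for most common, the taxon is coded '?'.
    """
    counts = Counter(states).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "?"
    return counts[0][0]


def fixed_only_code(states):
    """Return the shared state if the taxon is invariant, else None.

    None signals that the character would be excluded under a
    fixed-only scheme because it varies within the higher taxon.
    """
    unique = set(states)
    return unique.pop() if len(unique) == 1 else None


genus_states = ["0", "0", "1", "0"]  # toy data: one state per species

print(majority_code(genus_states))    # most common state in the genus
print(fixed_only_code(genus_states))  # variable character, so excluded
```

The contrast shows why the fixed-only approach discards information: the variable character above still has a clear majority state, but it would be dropped entirely.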
