Similar Literature
1.
Proper taxon sampling is one of the greatest challenges to understanding phylogenetic relationships, perhaps as important as choice of optimality criterion or data type. This has been demonstrated in diatoms where centric diatoms may either be strongly supported as monophyletic or paraphyletic when analyzing SSU rDNA sequences using the same optimality criterion. The effect of ingroup and outgroup taxon sampling on relationships of diatoms is explored for diatoms as a whole and for the order Thalassiosirales. In the latter case, SSU rDNA and rbcL sequence data result in phylogenetic relationships that appear to be strongly incongruent with morphology and broadly incongruent with the fossil record. For example, Cyclotella stelligera Cleve & Grunow behaves like a rogue taxon, jumping from place to place throughout the tree. Morphological data place C. stelligera near the base of the freshwater group as sister to the extinct genus Mesodictyon Theriot and Bradbury, suggesting that it is an old, long branch that might be expected to "misbehave" in poorly sampled trees. Cyclotella stelligera and C. bodanica Grunow delimit the diameter of morphological diversity in Cyclotella, so increased sampling of intermediate taxa will be critical to resolving this part of the tree. Morphology is sampled for a much greater number of taxa and many transitional states of putative synapomorphies seem to suggest a robust morphological hypothesis. The Thalassiosirales are unstable with regard to taxon sampling in the genetic data, suggesting that perhaps the morphological hypothesis is (for now) preferable.

2.
A phylogeny for 21 species of spatangoid sea urchins is constructed using data from three genes, and the results are compared with morphology-based phylogenies derived for the same taxa and for a much larger sample of 88 Recent and fossil taxa. Different data sets and methods of analysis generate different phylogenetic hypotheses, although congruence tests show that all molecular approaches produce trees that are congruent with each other. By contrast, the trees generated from morphological data differ significantly according to taxon sampling density, and only those with dense sampling (after a posteriori weighting) are congruent with molecular estimates. With limited taxon sampling, secondary reversals in deep-water taxa are interpreted as plesiomorphies, pulling them to a basal position. The addition of fossil taxa with their unique character combinations reveals hidden homoplasy and generates a phylogeny that is compatible with molecular estimates. As homoplasy levels were found to be broadly similar across different anatomical structures in the echinoid test, no one suite of morphological characters can be considered to provide more reliable phylogenetic information. Some traditional groupings are supported, including the grouping of Loveniidae, Brissidae and Spatangidae within the Micrasterina, but the Asterostomatidae is shown to be polyphyletic with members scattered amongst at least five different clades. As these are mostly deep-sea taxa, this finding implies multiple independent invasions into the deep sea.

4.
The effect of taxonomic sampling on phylogenetic accuracy under parsimony is examined by simulating nucleotide sequence evolution. Random error is minimized by using very large numbers of simulated characters. This allows estimation of the consistency behavior of parsimony, even for trees with up to 100 taxa. Data were simulated on 8 distinct 100-taxon model trees and analyzed as stratified subsets containing either 25 or 50 taxa, in addition to the full 100-taxon data set. Overall accuracy decreased in a majority of cases when taxa were added. However, the magnitude of change in the cases in which accuracy increased was larger than the magnitude of change in the cases in which accuracy decreased, so, on average, overall accuracy increased as more taxa were included. A stratified sampling scheme was used to assess accuracy for an initial subsample of 25 taxa. The 25-taxon analyses were compared to 50- and 100-taxon analyses that were pruned to include only the original 25 taxa. On average, accuracy for the 25 taxa was improved by taxon addition, but there was considerable variation in the degree of improvement among the model trees and across different rates of substitution.
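Comparing a 25-taxon analysis with larger analyses "pruned to include only the original 25 taxa" can be illustrated with a minimal sketch. It assumes unrooted trees encoded as sets of bipartitions (splits) and scores accuracy as the fraction of model-tree splits recovered; the abstract does not state the exact accuracy measure, and the function names are hypothetical.

```python
def normalize(side, taxa):
    """Encode a bipartition of `taxa` by the side that excludes a fixed reference taxon."""
    ref = min(taxa)
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side


def prune_splits(splits, keep):
    """Restrict the splits of a tree to the taxon subset `keep`.

    Each split is given as one side of its bipartition. Restricting every split
    to `keep` yields the splits of the pruned tree; splits that become trivial
    (a side with fewer than two taxa) are dropped."""
    keep = frozenset(keep)
    pruned = set()
    for split in splits:
        side = frozenset(split) & keep
        if len(side) >= 2 and len(keep - side) >= 2:
            pruned.add(normalize(side, keep))
    return pruned


def split_accuracy(true_splits, estimated_splits):
    """Fraction of the model tree's nontrivial splits found in the estimate."""
    if not true_splits:
        return 1.0
    return len(true_splits & estimated_splits) / len(true_splits)


# Hypothetical 6-taxon model tree and estimate, each given as split sides.
model = [{"A", "B"}, {"C", "D"}, {"E", "F"}]
estimate = [{"A", "C"}, {"B", "D"}, {"E", "F"}]
subset = {"A", "B", "C", "D"}
print(split_accuracy(prune_splits(model, subset), prune_splits(estimate, subset)))
```

Both trees are pruned with the same function before scoring, so the model tree and the estimate are always compared on the same taxon subset. In this toy case the estimate recovers one of three splits on all six taxa but none once pruned to A to D.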

5.
Exploring taxon sampling and the accuracy of phylogenetic analysis   Total citations: 6, self-citations: 2, citations by others: 4
Appropriate and extensive taxon sampling is one of the most important determinants of accurate phylogenetic estimation. In addition, accuracy of inferences about evolutionary processes obtained from phylogenetic analyses is improved significantly by thorough taxon sampling efforts. Many recent efforts to improve phylogenetic estimates have focused instead on increasing sequence length or the number of overall characters in the analysis, and this often does have a beneficial effect on the accuracy of phylogenetic analyses. However, phylogenetic analyses of few taxa (but each represented by many characters) can be subject to strong systematic biases, which in turn produce high measures of repeatability (such as bootstrap proportions) in support of incorrect or misleading phylogenetic results. Thus, it is important for phylogeneticists to consider both the sampling of taxa, as well as the sampling of characters, in designing phylogenetic studies. Taxon sampling also improves estimates of evolutionary parameters derived from phylogenetic trees, and is thus important for improved applications of phylogenetic analyses. Analysis of sensitivity to taxon inclusion, the possible effects of long-branch attraction, and sensitivity of parameter estimation for model-based methods should be a part of any careful and thorough phylogenetic analysis. Furthermore, recent improvements in phylogenetic algorithms and in computational power have removed many constraints on analyzing large, thoroughly sampled data sets. Thorough taxon sampling is thus one of the most practical ways to improve the accuracy of phylogenetic estimates, as well as the accuracy of biological inferences that are based on these phylogenetic trees.

6.
Taxon sampling and seed plant phylogeny   Total citations: 2, self-citations: 0, citations by others: 2
We investigated the effects of taxon sampling on phylogenetic inference by exchanging terminals in two sizes of rbcL matrices for seed plants, applying parsimony and bayesian analyses to ten 38‐taxon matrices and ten 80‐taxon matrices. In comparing tree topologies we concentrated on the position of the Gnetales, an important group whose placement has long been disputed. With either method, trees obtained from different taxon samples could be mutually contradictory and even disagree on groups that seemed strongly supported. Adding terminals improved the consistency of results for unweighted parsimony, but not for parsimony with third positions excluded and not for bayesian analysis, particularly when the general time‐reversible model was employed. This suggests that attempting to resolve deep relationships using only a few taxa can lead to spurious conclusions, groupings unlikely to be repeatable with different taxon samplings or larger data sets. The effect of taxon sampling has not generally been recognized, and phylogenetic studies of seed plants have often been based on few taxa. Such insufficient sampling may help explain the variety of phylogenetic hypotheses for seed plants proposed in recent years. We recommend that restricted data sets such as single‐gene subsets of multigene studies should be reanalyzed with alternative selections of terminals to assess topological consistency.
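Reanalyzing a data set "with alternative selections of terminals" starts by drawing several taxon samples of the same size. A minimal sketch follows, assuming that a hypothetical set of focal terminals (for example, the Gnetales representatives whose placement is in question) is kept in every sample and that the remaining slots are filled at random; the abstract does not say how its ten 38- and 80-taxon matrices were chosen.

```python
import random
from math import comb


def alternative_samples(terminals, focal, size, n_samples, seed=0):
    """Draw `n_samples` distinct sets of `size` terminals (hypothetical helper).

    Every set contains all `focal` terminals; the remaining slots are filled
    at random from the other terminals."""
    focal = sorted(set(focal))
    others = sorted(set(terminals) - set(focal))
    free = size - len(focal)
    if free < 0 or free > len(others):
        raise ValueError("size is incompatible with the available terminals")
    if n_samples > comb(len(others), free):
        raise ValueError("fewer distinct samples exist than were requested")
    rng = random.Random(seed)  # seeded so the samples are reproducible
    samples = set()
    while len(samples) < n_samples:
        samples.add(frozenset(focal + rng.sample(others, free)))
    return sorted(samples, key=sorted)


# Ten hypothetical 38-taxon samples from 100 terminal names, keeping two focal taxa.
names = [f"taxon{i:03d}" for i in range(100)]
for sample in alternative_samples(names, ["taxon000", "taxon001"], 38, 10):
    print(len(sample), sorted(sample)[:3])
```

Each sample would then be analyzed separately and the resulting topologies compared, for example on the position of the focal group.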

7.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
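Two quantities in this abstract can be made concrete. The number of binary trees on n labeled taxa is (2n − 5)!! unrooted and (2n − 3)!! rooted; for 14 taxa these formulas give 316,234,143,225 unrooted and 7,905,853,580,625 rooted trees, so the "more than seven trillion" figure matches the rooted count for 14 taxa (equivalently, the unrooted count for 15). The second function is a minimal sketch of rank-proportional reproduction; the largest-remainder rounding to whole offspring is an assumption for illustration, not the published algorithm.

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 (returns 1 for n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def unrooted_trees(n_taxa):
    """Number of unrooted binary trees on n_taxa labeled leaves: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa):
    """Number of rooted binary trees on n_taxa labeled leaves: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


def rank_offspring(log_likelihoods, n_offspring):
    """Allocate offspring in proportion to likelihood rank (assumed scheme).

    The worst tree gets rank 1 and the best rank N; the expected share of
    tree i is rank_i / sum(ranks). Largest-remainder rounding makes the
    integer counts sum exactly to n_offspring."""
    n = len(log_likelihoods)
    order = sorted(range(n), key=lambda i: log_likelihoods[i])  # worst first
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    raw = [n_offspring * r / total for r in ranks]
    counts = [int(x) for x in raw]
    leftover = n_offspring - sum(counts)
    by_remainder = sorted(range(n), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


print(unrooted_trees(14), rooted_trees(14))
print(rank_offspring([-10.0, -5.0, -20.0], 6))
```

Ranking rather than using raw likelihoods keeps selection pressure steady even when likelihood differences between trees are very large or very small. Here the three trees receive 2, 3 and 1 offspring respectively.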

9.
The Galerucinae (Coleoptera: Chrysomelidae) sensu stricto (true galerucines) comprise a large assemblage of diverse phytophagous beetles containing over 5000 described species. Together with their sister taxon, the flea beetles, which differ from true galerucines by having the hind femora usually modified for jumping, the Galerucinae sensu lato comprises over 13 000 described species and is the largest natural group within the Chrysomelidae. Unlike the flea beetles, for which no robust hierarchical classification has been erected, the true galerucines already have a taxonomic structure, based mostly on the works of the late John Wilcox. In the most recent taxonomic list of the Galerucinae sensu stricto, five tribes were established, comprising 29 sections and 488 genera. The majority of the diversity within these tribes is found within the tribe Luperini, in which two genera, Monolepta and Diabrotica, are known to contain over 500 described species. Here, we extend the work from previous phylogenetic studies of the Galerucinae by analysing four amplicons from three gene regions (18S and 28S rRNA; COI) representing 249 taxa, providing the largest phylogenetic analysis of this taxon to date. Using two seven‐state RNA models, we combine five maximum likelihood models (RNA + DNA for the rRNAs; three separate DNA models for the COI codon positions) for these partitions and analyse the data under likelihood using Bayesian inference. The results of these two analyses are compared with those from equally weighted parsimony. Instead of choosing the results from one optimality criterion over another, either based on statistical support, tree topology or philosophical predisposition, we elect to draw attention to the similar results produced by all three analyses, illustrating the robustness of the data to these different analytical methods.
In general, the results from all three analyses are consistent with each other and with previous molecular phylogenetic reconstructions for Galerucinae, except that increased taxon sampling for several groups, namely the tribes Hylaspini and Oidini, has improved the estimated placement of these taxa. As with previous analyses, under‐sampled taxa, such as the Old World Metacyclini and all sections of the subtribe Luperina, continue to be unstable, with the few taxa representing these groups fluctuating in their positions depending on the implemented optimality criterion. Nonetheless, we report here the most comprehensive phylogenetic estimate for the Galerucinae to date.
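Fitting "three separate DNA models for the COI codon positions" requires splitting the COI columns by codon position. A minimal sketch follows, assuming the COI fragment is in frame and that a known column is a first codon position; the helper names are hypothetical, and the CHARSET lines use the NEXUS `start-end\3` interval notation accepted by programs such as MrBayes.

```python
def split_by_codon_position(alignment, first=0):
    """Split aligned sequences into three sub-alignments by codon position.

    `alignment` maps taxon name -> aligned sequence string; `first` is the
    0-based column assumed to be a first codon position."""
    return {
        pos: {taxon: seq[first + pos - 1::3] for taxon, seq in alignment.items()}
        for pos in (1, 2, 3)
    }


def codon_charsets(locus, first, last):
    """NEXUS-style CHARSET lines for columns first..last (1-based, inclusive).

    Assumes column `first` is a first codon position; every third column
    from each start belongs to the same position."""
    return [f"charset {locus}_pos{p} = {first + p - 1}-{last}\\3;" for p in (1, 2, 3)]


# Hypothetical two-taxon COI fragment of two codons.
parts = split_by_codon_position({"t1": "ATGGCC", "t2": "ATAGCT"})
print(parts[3])
print("\n".join(codon_charsets("COI", 1, 9)))
```

In a concatenated matrix the COI columns would usually follow the rRNA loci, so `first` and `last` would be offset accordingly.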

10.
A molecular phylogeny of annelids   Total citations: 6, self-citations: 0, citations by others: 6
We present parsimony analyses of annelids based on the largest taxon sample and most extensive molecular data set yet assembled, with two nuclear ribosomal genes (18S rDNA and the D1 region of 28S rDNA), one nuclear protein‐coding gene (Histone H3) and one mitochondrial ribosomal gene (16S rDNA) from 217 terminal taxa. Of the sequences used, 267 were newly sequenced, and the remainder were obtained from GenBank. Taxa were included only if they had 18S rDNA or at least two other loci. Our analyses show that 68% of annelid family‐ranked taxa represented by more than one taxon in our study are supported by a jackknife value > 50%. In spite of the size of our data set, the phylogenetic signal in the deepest part of the tree remains weak and the majority of the currently recognized major polychaete clades (except Amphinomida and Aphroditiformia) could not be recovered. Terebelliformia is monophyletic (with the exclusion of Pectinariidae, for which only 18S data were available), whereas members of taxa such as Phyllodocida, Cirratuliformia, Sabellida and Scolecida are scattered over the trees. Clitellata is monophyletic, although Dinophilidae should possibly be included, and Clitellata has a sister group within the polychaetes. One major problem is the current lack of knowledge on the closest relatives to annelids and the position of the annelid root. We suggest that the poor resolution in the basal parts of the trees presented here may be due to lack of signal connected to incomplete data sets both in terms of terminal and gene sampling, rapid radiation events and/or uneven evolutionary rates and long‐branch attraction. © The Willi Hennig Society 2006.

11.
Taxon sampling may be critically important for phylogenetic accuracy because adding taxa can help to subdivide misleading long branches. Although the idea that added taxa can break up long branches was exemplified by a study of "incomplete" fossil taxa, the issue of taxon completeness (i.e., proportion of missing data) has been largely ignored in most subsequent discussions of taxon sampling and long-branch attraction. In this article, I use simulations to test the ability of incomplete taxa to subdivide long branches and improve phylogenetic accuracy in situations of potential long-branch attraction. The results show that for most methods and conditions examined, adding taxa that are only 50% complete may provide similar benefits to adding the same number of complete taxa (suggesting that the advantages of increased taxon sampling may be obtained with less data than previously considered). For parsimony, taxa that are less complete (5% to 25% complete) may often have limited ability to rescue analyses from long-branch attraction. In contrast, highly incomplete taxa can be surprisingly beneficial when using model-based methods. The results also suggest the importance of model-based methods in phylogenetic analyses that combine molecular and fossil data.
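Making a simulated taxon "only 50% complete" amounts to replacing a fraction of its character states with a missing-data symbol. The sketch below assumes sites are masked uniformly at random and that `?` marks missing data; the abstract does not say whether missing data were random or clustered, and the function name is hypothetical.

```python
import random


def make_incomplete(sequence, completeness, seed=0, missing="?"):
    """Keep a fraction `completeness` of sites; mask the rest as missing.

    Masked sites are chosen uniformly at random (an assumption)."""
    if not 0.0 <= completeness <= 1.0:
        raise ValueError("completeness must be between 0 and 1")
    n = len(sequence)
    n_keep = round(n * completeness)
    rng = random.Random(seed)  # seeded for reproducible masking
    keep = set(rng.sample(range(n), n_keep))
    return "".join(ch if i in keep else missing for i, ch in enumerate(sequence))


# A hypothetical 20-site sequence made 25% complete.
print(make_incomplete("ACGTACGTACGTACGTACGT", 0.25))
```

Applying this to selected taxa in an otherwise complete simulated matrix yields data sets in which added taxa can break up long branches while carrying only part of the character data.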

12.
A major assumption of many molecular phylogenetic methods is the homogeneity of nucleotide frequencies among taxa, which refers to the equality of the nucleotide frequency bias among species. Changes in nucleotide frequency among different lineages in a data set are thought to lead to erroneous phylogenetic inference because unrelated clades may appear similar because of evolutionarily unrelated similarities in nucleotide frequencies. We tested the effects of the heterogeneity of nucleotide frequency bias on phylogenetic inference, along with the interaction between this heterogeneity and stratified taxon sampling, by means of computer simulations using evolutionary parameters derived from genomic databases. We found that the phylogenetic trees inferred from data sets simulated under realistic, observed levels of heterogeneity for mammalian genes were reconstructed with accuracy comparable to those simulated with homogeneous nucleotide frequencies; the results hold for Neighbor-Joining, minimum evolution, maximum parsimony, and maximum-likelihood methods. The LogDet distance method, specifically designed to deal with heterogeneous nucleotide frequencies, does not perform better than distance methods that assume substitution pattern homogeneity among sequences. In these specific simulation conditions, we did not find a significant interaction between phylogenetic accuracy and substitution pattern heterogeneity among lineages, even when the taxon sampling is increased.
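LogDet-type distances cope with unequal base composition because they are built from the determinant of the joint base-pair frequency matrix rather than from a single substitution model. A minimal sketch of one common variant, the paralinear distance d = −¼ [ln det J − ½ (ln det Π_x + ln det Π_y)], follows; J holds joint base-pair proportions and Π_x, Π_y are diagonal base-frequency matrices. Other LogDet variants differ by constants, and this is not necessarily the implementation used in the study.

```python
import math

BASES = "ACGT"


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def paralinear_distance(x, y):
    """Paralinear (LogDet-family) distance between two aligned DNA sequences.

    Sites with a non-ACGT symbol in either sequence are skipped."""
    index = {b: i for i, b in enumerate(BASES)}
    pairs = [(index[a], index[b]) for a, b in zip(x.upper(), y.upper())
             if a in index and b in index]
    if not pairs:
        raise ValueError("no comparable sites")
    counts = [[0] * 4 for _ in range(4)]
    for i, j in pairs:
        counts[i][j] += 1
    n = len(pairs)
    joint = [[c / n for c in row] for row in counts]  # J: joint proportions
    fx = [sum(row) for row in joint]                 # base frequencies in x
    fy = [sum(col) for col in zip(*joint)]           # base frequencies in y
    d_joint = det(joint)
    if d_joint <= 0 or min(fx) == 0 or min(fy) == 0:
        raise ValueError("distance undefined for these sequences")
    log_px = sum(math.log(f) for f in fx)  # ln det of a diagonal matrix
    log_py = sum(math.log(f) for f in fy)
    return -0.25 * (math.log(d_joint) - 0.5 * (log_px + log_py))


x = "ACGT" * 10
print(paralinear_distance(x, "C" + x[1:]))
```

For this one-difference example the distance works out to 0.125 ln(11/9), about 0.025; identical sequences give a distance of zero.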

13.
A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are chosen specifically to break up long branches. We explore this hypothesis by using empirical data from noctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes, elongation factor-1 alpha (EF-1 alpha) and dopa decarboxylase (DDC), have yielded similar gene trees and high concordance with morphological groupings for 49 exemplar species. However, support levels were quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed to break up long branches, generated greater disagreement between the two gene data sets and decreased support levels for deeper nodes. We appear to have inadvertently introduced new long branches, and breaking these up may require a yet larger taxon sample. Sampling additional characters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect of combining data from independent genes with collection of the same total number of characters from a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets. Support levels for combined data were at least as high as those for the bootstrap-augmented data set for DDC and were much higher than those for the augmented EF-1 alpha data set. This supports the view that in obtaining additional sequence data to solve a refractory systematic problem, it is prudent to take them from an independent gene.  
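"Bootstrap augmentation" of a single-gene data set can be sketched as lengthening the alignment with columns resampled with replacement from its own columns, so that it matches the length of the combined data without adding independent information. This is an assumed reading of the procedure (keeping the original columns and appending resampled ones), not the authors' code, and the function name is hypothetical.

```python
import random


def bootstrap_augment(alignment, target_length, seed=0):
    """Lengthen an alignment to `target_length` columns by appending columns
    drawn with replacement from the existing ones.

    `alignment` maps taxon -> aligned sequence; the same resampled column
    indices are used for every taxon, so whole columns are copied."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n_sites = lengths.pop()
    if target_length < n_sites:
        raise ValueError("target is shorter than the original alignment")
    rng = random.Random(seed)
    extra = [rng.randrange(n_sites) for _ in range(target_length - n_sites)]
    return {taxon: seq + "".join(seq[i] for i in extra)
            for taxon, seq in alignment.items()}


# Hypothetical two-taxon alignment padded from 4 to 10 columns.
print(bootstrap_augment({"a": "ACGT", "b": "ACGA"}, 10, seed=3))
```

Resampling whole columns preserves the site patterns already present in the gene, which is why such augmentation can inflate support without supplying independent evidence.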

14.
In this study, we assessed the ability of mitochondrial genome sequences to recover a test phylogeny of five hymenopteran taxa for which phylogenetic relationships are well accepted. Our analyses indicated that the test phylogeny was well recovered in all nucleotide Bayesian analyses when all the available holometabolan (i.e. outgroup) taxa were included, but only in Bayesian analyses excluding third codon positions when only the hymenopteran representatives and a single outgroup were included. This result suggests that taxon sampling of the outgroup might be as important as taxon sampling of the ingroup when recovering hymenopteran phylogenetic relationships using whole mitochondrial genomes. Parsimony analyses were more sensitive to both taxon sampling and the analytical model than Bayesian analyses, and analyses using the protein dataset did not recover the test phylogeny. In general, mitochondrial genomes did not resolve the position of the Hymenoptera within the Holometabola with confidence, suggesting that increased taxon sampling, both within the Holometabola and among outgroups, is necessary.

15.
Least-inclusive taxonomic unit: a new taxonomic concept for biology   Total citations: 2, self-citations: 0, citations by others: 2
Phylogenetic taxonomy has been introduced as a replacement for the Linnaean system. It differs from traditional nomenclature in defining taxon names with reference to phylogenetic trees and in not employing ranks for supraspecific taxa. However, 'species' are currently kept distinct. Within a system of phylogenetic taxonomy we believe that taxon names should refer to monophyletic groups only and that species should not be recognized as taxa. To distinguish the smallest identified taxa, we here introduce the least-inclusive taxonomic unit (LITU). LITUs are differentiated from more inclusive taxa by initial lower-case letters. LITUs imply nothing absolute about inclusiveness, only that subdivisions are not presently recognized.

16.
We illustrate how recently developed large sequence-length approximations to probabilities of correct phylogenetic reconstruction for maximum likelihood estimation can be used to evaluate experimental design strategies. The specific criterion of interest is the probability of correctly resolving an a priori defined split of interest in a phylogenetic tree. Design strategies considered include increased taxon sampling and increasing sequence length. Our analyses of specific examples strongly suggest that it is better to sample taxa that connect as close as possible to the split of interest. Assuming this can be done, these examples suggest it is better to sample additional taxa than to add a comparable number of sites for the existing taxa. If the rates of evolution in the added taxa are slow, it is better to choose taxa connecting to a long edge, but if rates are comparable to a sister lineage, it is not necessarily the best strategy to sample taxa connected to a long edge. We also examined deleting taxa while increasing the number of sites. Although deleting a small number of taxa distant from the split of interest can be beneficial, deleting too many or making poor choices as to what should be deleted can lead to smaller probabilities of correct reconstruction than for the original sequence data.

17.
DNA sequences from three mitochondrial genes and one nuclear gene were analyzed to determine the phylogeny of the Malagasy primate family Lemuridae. Whether analyzed separately or in combination, the data consistently indicate that Eulemur species comprise a clade that is sister to a Lemur catta plus Hapalemur clade. The genus Varecia is basal to both. Resolution of cladogenic events within Eulemur was found to be extremely problematic with a total of six alternative arrangements offered by various data sets and weighting regimes. We attempt to determine the best arrangement of Eulemur taxa through a variety of character and taxon sampling strategies. Because our study includes all but one Eulemur species, increased taxon sampling is probably not an option for enhancing phylogenetic accuracy. We find, however, that the combined genetic data set is more robust to changes in taxon sample than are any of the individual data sets, suggesting that increased character sampling stabilizes phylogenetic resolution. Nonetheless, due to the difficult nature of the problem, we may have to accept certain aspects of Eulemur interrelationships as uncertain.

18.
Many phylogenetic analyses that include numerous terminals but few genes show high resolution and branch support for relatively recently diverged clades, but lack of resolution and/or support for "basal" clades of the tree. The various benefits of increased taxon and character sampling have been widely discussed in the literature, albeit primarily based on simulations rather than empirical data. In this study, we used a well-sampled gene-tree analysis (based on 100 mitochondrial genomes of higher teleost fishes) to test empirically the efficiency of different methods of data sampling and phylogenetic inference to "correctly" resolve the basal clades of a tree (based on congruence with the reference tree constructed using all 100 taxa and 7990 characters). By itself, increased character sampling was an inefficient method by which to decrease the likelihood of "incorrect" resolution (i.e., incongruence with the reference tree) for parsimony analyses. Although increased taxon sampling was a powerful approach to alleviate "incorrect" resolution for parsimony analyses, it had the general effect of increasing the number of, and support for, "incorrectly" resolved clades in the Bayesian analyses. For both the parsimony and Bayesian analyses, increased taxon sampling, by itself, was insufficient to help resolve the basal clades, making this sampling strategy ineffective for that purpose. For this empirical study, the most efficient of the six approaches considered to resolve the basal clades when adding nucleotides to a dataset that consists of a single gene sampled for a small, but representative, number of taxa, is to increase character sampling and analyze the characters using the Bayesian method.
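Scoring "incorrect" resolution as incongruence with a reference tree can be sketched with rooted clades treated as taxon sets: two clades are compatible when they are disjoint or nested. In the sketch below, which is an assumption about how such a comparison might be coded rather than the authors' procedure, reference clades are first restricted to the taxa in the test analysis, and each test clade is labeled congruent, compatible (consistent but unresolved in the reference), or incongruent.

```python
def compatible(a, b):
    """Two clades are compatible if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def classify_clades(test_clades, reference_clades, taxa):
    """Label each clade of a rooted test tree against a rooted reference tree.

    Reference clades are restricted to `taxa` (those in the test analysis);
    trivial clades (single taxa or all taxa) are ignored."""
    taxa = frozenset(taxa)
    ref = {frozenset(c) & taxa for c in reference_clades}
    ref = {c for c in ref if 1 < len(c) < len(taxa)}
    labels = {}
    for clade in test_clades:
        clade = frozenset(clade)
        if clade in ref:
            labels[clade] = "congruent"
        elif all(compatible(clade, r) for r in ref):
            labels[clade] = "compatible"
        else:
            labels[clade] = "incongruent"
    return labels


# Hypothetical reference ((A,B),(C,(D,E))) versus test ((A,C),(B,(D,E))).
reference = [{"A", "B"}, {"C", "D", "E"}, {"D", "E"}]
test = [{"A", "C"}, {"B", "D", "E"}, {"D", "E"}]
for clade, label in classify_clades(test, reference, {"A", "B", "C", "D", "E"}).items():
    print(sorted(clade), label)
```

Counting the "incongruent" labels, optionally weighted by their support values, gives one way to compare sampling strategies against the reference tree.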

19.
All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences, has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny.
The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive morphological convergence among nectar‐feeding lineages, and incongruent gene trees. Applying methods to account for nucleotide sequence saturation reduces, but does not completely eliminate, phylogenetic conflict. We ruled out paralogy, lateral gene transfer, and poor taxon sampling and outgroup choices among the processes leading to incongruent gene trees in phyllostomid bats. Uncovering and countering the possible effects of introgression and lineage sorting of ancestral polymorphism on gene trees will require great leaps in genomic and allelic sequencing in this species‐rich mammalian family. We also found evidence for adaptive molecular evolution leading to convergence in mitochondrial proteins among nectar‐feeding lineages. In conclusion, the biological processes that generate phylogenetic conflict are ubiquitous, and overcoming incongruence requires better models and more data than have been collected even in well‐studied organisms such as phyllostomid bats.

20.
Ecologists are increasingly making use of molecular phylogenies, especially in the fields of community ecology and conservation. However, these phylogenies are often used without full appreciation of their underlying assumptions and uncertainties. A frequent practice in ecological studies is inferring a phylogeny with molecular data from taxa only within the community of interest. These "inferred community phylogenies" are inherently biased in their taxon sampling. Despite the importance of comprehensive sampling in constructing phylogenies, the implications of using inferred community phylogenies in ecological studies have not been examined. Here, we evaluate how taxon sampling affects the quantification and comparison of community phylogenetic diversity using both simulated and empirical data sets. We demonstrate that inferred community trees greatly underestimate phylogenetic diversity and that the probability of incorrectly ranking community diversity can reach up to 25%, depending on the dating methods employed. We argue that to reach reliable conclusions, ecological studies must improve their taxon sampling and generate the best phylogeny possible.
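Community phylogenetic diversity is often measured as Faith's PD, the total branch length linking a community's members. A minimal sketch of the rooted variant (branches on the paths from members to the root, each counted once) follows, together with a ranking of communities by PD; the parent-map tree encoding and the example tree are hypothetical, and the abstract does not specify which PD variant was used.

```python
def faith_pd(parent, community):
    """Rooted Faith's PD: total length of branches on member-to-root paths.

    `parent` maps node -> (parent_node, branch_length); the root is not a key.
    Each branch (identified by its child node) is counted once."""
    counted = set()
    total = 0.0
    for taxon in community:
        node = taxon
        # Stop early once we reach a branch already counted: everything
        # above it has been counted too.
        while node in parent and node not in counted:
            counted.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


def rank_communities(parent, communities):
    """Community names ordered from highest to lowest PD."""
    return sorted(communities, key=lambda name: faith_pd(parent, communities[name]),
                  reverse=True)


# Hypothetical tree: root R with internal nodes X and Y.
tree = {"X": ("R", 2.0), "Y": ("R", 1.0),
        "A": ("X", 1.0), "B": ("X", 3.0), "C": ("Y", 2.0), "D": ("Y", 2.0)}
communities = {"c1": {"A", "B"}, "c2": {"A"}, "c3": {"A", "B", "C", "D"}}
print({name: faith_pd(tree, members) for name, members in communities.items()})
print(rank_communities(tree, communities))
```

Computing the same rankings on a full reference tree and on a tree inferred only from community members is one way to check whether community-only sampling changes the conclusions.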


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417