Similar literature
1.
A review of long-branch attraction   Total citations: 24, self-citations: 1, citations by others: 24
The history of long‐branch attraction, and in particular methods suggested to detect and avoid the artifact to date, is reviewed. Methods suggested to avoid LBA‐artifacts include excluding long‐branch taxa, excluding faster evolving third codon positions, using inference methods less sensitive to LBA such as likelihood, the Aguinaldo et al. approach, sampling more taxa to break up long branches and sampling more characters especially of another kind, and the pros and cons of these are discussed. Methods suggested to detect LBA are numerous and include methodological disconcordance, RASA, separate partition analyses, parametric simulation, random outgroup sequences, long‐branch extraction, split decomposition and spectral analysis. Less than 10 years ago it was doubted if LBA occurred in real datasets. Today, examples are numerous in the literature and it is argued that the development of methods to deal with the problem is warranted. A 16 kbp dataset of placental mammals and a morphological and molecular combined dataset of gall wasps are used to illustrate the particularly common problem of LBA of problematic ingroup taxa to outgroups. The preferred methods of separate partition analysis, methodological disconcordance, and long branch extraction are used to demonstrate detection methods. It is argued that since outgroup taxa almost always represent long branches and are as such a hazard towards misplacing long branched ingroup taxa, phylogenetic analyses should always be run with and without the outgroups included. This will detect whether only the outgroup roots the ingroup or if it simultaneously alters the ingroup topology, in which case previous studies have shown that the latter is most often the worse. Apart from that LBA to outgroups is the major and most common problem; scanning the literature also detected the ill advised comfort of high support values from thousands of characters, but very few taxa, in the age of genomics.
Taxon sampling is crucial for an accurate phylogenetic estimate and trust cannot be put on whole mitochondrial or chloroplast genome studies with only a few taxa, despite their high support values. The placental mammal example demonstrates that parsimony analysis will be prone to LBA by the attraction of the tenrec to the distant marsupial outgroups. In addition, the murid rodents, creating the classic “the guinea‐pig is not a rodent” hypothesis in 1996, are also shown to be attracted to the outgroup by nuclear genes, although including the morphological evidence for rodents and Glires overcomes the artifact. The gall wasp example illustrates that Bayesian analyses with a partition‐specific GTR + Γ + I model give a conflicting resolution of clades, with a posterior probability of 1.0 when comparing ingroup alone versus outgroup rooted topologies, and this is due to long‐branch attraction to the outgroup. © The Willi Hennig Society 2005.

2.
A significant proportion of protein-encoding gene phylogenies in bacteria is inconsistent with the species phylogeny. It was usually argued that such inconsistencies resulted from lateral transfers. Here, by further studying the phylogeny of the oprF gene encoding the major surface protein in the bacterial Pseudomonas genus, we found that the incongruent tree topology observed results from a long-branch attraction (LBA) artifact and not from lateral transfers. LBA in the oprF phylogeny could be explained by the faster evolution in a lineage adapted to the rhizosphere, highlighting an unexpected adaptive radiation. We argue that analysis of such artifacts in other inconsistent bacterial phylogenies could be a valuable tool in molecular ecology to highlight cryptic adaptive radiations in microorganisms.

3.
In the context of exponential growing molecular databases, it becomes increasingly easy to assemble large multigene data sets for phylogenomic studies. The expected increase of resolution due to the reduction of the sampling (stochastic) error is becoming a reality. However, the impact of systematic biases will also become more apparent or even dominant. We have chosen to study the case of the long-branch attraction artefact (LBA) using real instead of simulated sequences. Two fast-evolving eukaryotic lineages, whose evolutionary positions are well established, microsporidia and the nucleomorph of cryptophytes, were chosen as model species. A large data set was assembled (44 species, 133 genes, and 24,294 amino acid positions) and the resulting rooted eukaryotic phylogeny (using a distant archaeal outgroup) is positively misled by an LBA artefact despite the use of a maximum likelihood-based tree reconstruction method with a complex model of sequence evolution. When the fastest evolving proteins from the fast lineages are progressively removed (up to 90%), the bootstrap support for the apparently artefactual basal placement decreases to virtually 0%, and conversely only the expected placement, among all the possible locations of the fast-evolving species, receives increasing support that eventually converges to 100%. The percentage of removal of the fastest evolving proteins constitutes a reliable estimate of the sensitivity of phylogenetic inference to LBA. This protocol confirms that both a rich species sampling (especially the presence of a species that is closely related to the fast-evolving lineage) and a probabilistic method with a complex model are important to overcome the LBA artefact. Finally, we observed that phylogenetic inference methods perform strikingly better with simulated as opposed to real data, and suggest that testing the reliability of phylogenetic inference methods with simulated data leads to overconfidence in their performance. 
Although phylogenomic studies can be affected by systematic biases, the possibility of discarding a large amount of data containing most of the nonphylogenetic signal allows recovering a phylogeny that is less affected by systematic biases, while maintaining a high statistical support.
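The data-reduction step described in this abstract (progressively discarding the fastest-evolving proteins before re-inferring the tree) can be sketched roughly as follows. This is only an illustration: the function name `retain_slowest`, the gene labels, and the per-gene rate values are hypothetical, and the abstract does not say how the rates were measured or how the trees were rebuilt.

```python
def retain_slowest(rates, fraction_removed):
    """Return the gene names kept after discarding the fastest-evolving fraction.

    rates: dict mapping a gene name to an (assumed) evolutionary rate estimate.
    fraction_removed: share of genes to drop, between 0.0 and 1.0.
    """
    if not 0.0 <= fraction_removed <= 1.0:
        raise ValueError("fraction_removed must be between 0 and 1")
    # Rank genes from slowest to fastest evolving.
    ranked = sorted(rates, key=rates.get)
    n_remove = round(len(ranked) * fraction_removed)
    return ranked[:len(ranked) - n_remove]


# Hypothetical rates for five genes; the values are made up for illustration.
example_rates = {"g1": 0.1, "g2": 0.9, "g3": 0.5, "g4": 0.3, "g5": 0.7}
for fraction in (0.0, 0.4, 0.8):
    print(fraction, retain_slowest(example_rates, fraction))
```

In a full analysis each retained gene set would be concatenated and passed to a tree-inference program, and the support for the competing placements compared across removal levels; that part is outside this sketch.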

4.

Background

Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the Long Branch Attraction (LBA) artefact, can be misleading, in particular when the taxon sampling is poor, or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification problems, which suggests that better models of sequence evolution should be devised, that would be more robust to tree reconstruction artefacts, even under the most challenging conditions.

Methods

We focus on a well characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree, in which two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria, or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard, site-homogeneous model, based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test, allowing one to measure how well a model acknowledges sequence saturation.

Results

Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences.

Conclusion

The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need be accounted for in order to obtain more reliable phylogenetic trees.

5.
Convergence has long been of interest to evolutionary biologists. Cave organisms appear to be ideal candidates for studying convergence in morphological, physiological, and developmental traits. Here we report apparent convergence in two cave-catfishes that were described on morphological grounds as congeners: Prietella phreatophila and Prietella lundbergi. We collected mitochondrial DNA sequence data from 10 species of catfishes, representing five of the seven genera in Ictaluridae, as well as seven species from a broad range of siluriform outgroups. Analysis of the sequence data under parsimony supports a monophyletic Prietella. However, both maximum-likelihood and Bayesian analyses support polyphyly of the genus, with P. lundbergi sister to Ictalurus and P. phreatophila sister to Ameiurus. The topological difference between parsimony and the other methods appears to result from long-branch attraction between the Prietella species. Similarly, the sequence data do not support several other relationships within Ictaluridae supported by morphology. We develop a new Bayesian method for examining variation in molecular rates of evolution across a phylogeny.

6.
Recent studies based on different types of data (i.e., morphology, molecules) have found strongly conflicting phylogenies for the genera of iguanid lizards but have been unable to explain the basis for this incongruence. We reanalyze published data from morphology and from the mitochondrial ND4, cytochrome b, 12S, and 16S genes to explore the sources of incongruence and resolve these conflicts. Much of the incongruence centers on the genus Cyclura, which is the sister taxon of Iguana, according to parsimony analyses of the morphology and the ribosomal genes, but is the sister taxon of all other Iguanini, according to the protein-coding genes. Maximum likelihood analyses show that there has been an increase in the rate of nucleotide substitution in Cyclura in the two protein-coding genes (ND4 and cytochrome b), although this increase is not as clear when parsimony is used to estimate branch lengths. Parametric simulations suggest that Cyclura may be misplaced by the protein-coding genes as a result of long-branch attraction; even when Cyclura and Iguana are sister taxa in a simulated phylogeny, Cyclura is still placed as the basal member of the Iguanini by parsimony analysis in 55% of the replicates. A similar long-branch attraction problem may also exist in the morphological data with regard to the placement of Sauromalus with the Galápagos iguanas (Amblyrhynchus and Conolophus). The results have many implications for the analysis of diverse data sets, the impact of long branches on parsimony and likelihood methods, and the use of certain protein-coding genes in phylogeny reconstruction.

7.
Microsporidia branch at the base of eukaryotic phylogenies inferred from translation elongation factor 1alpha (EF-1alpha) sequences. Because these parasitic eukaryotes are fungi (or close relatives of fungi), it is widely accepted that fast-evolving microsporidian sequences are artifactually "attracted" to the long branch leading to the archaebacterial (outgroup) sequences ("long-branch attraction," or "LBA"). However, no previous studies have explicitly determined the reason(s) why the artifactual allegiance of microsporidia and archaebacteria ("M + A") is recovered by all phylogenetic methods, including maximum likelihood, a method that is supposed to be resistant to classical LBA. Here we show that the M + A affinity can be attributed to those alignment sites associated with large differences in evolutionary site rates between the eukaryotic and archaebacterial subtrees. Therefore, failure to model the significant evolutionary rate distribution differences (covarion shifts) between the ingroup and outgroup sequences is apparently responsible for the artifactual basal position of microsporidia in phylogenetic analyses of EF-1alpha sequences. Currently, no evolutionary model that accounts for discrete changes in the site rate distribution on particular branches is available for either protein or nucleotide level phylogenetic analysis, so the same artifacts may affect many other "deep" phylogenies. Furthermore, given the relative similarity of the site rate patterns of microsporidian and archaebacterial EF-1alpha proteins ("parallel site rate variation"), we suggest that the microsporidian orthologs may have lost some eukaryotic EF-1alpha-specific nontranslational functions, exemplifying the extreme degree of reduction in this parasitic lineage.

8.
9.
Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on a monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern "anthophyte hypothesis," which places Gnetales as the sister group of flowering plants. A series of simulation studies were undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.

10.
The nuclear large subunit (LSU) rRNA gene is a rich source of phylogenetic characters because of its large size, mosaic of slowly and rapidly evolving regions, and complex secondary structure variation. Nevertheless, many studies have indicated that inconsistency, bias, and gene-specific error (e.g., within-individual gene family variation, cryptic sequence simplicity, and sequence coevolution) can complicate animal phylogenies based on LSU rDNA sequences. However, most of these studies sampled small gene fragments from expansion segments--among animals only five nonchordate complete LSU sequences are published. In this study, we sequenced near-complete nuclear LSU genes from 11 representative daphniids (Crustacea). The daphniid expansion segment V6 was larger and showed more length variation (90-351 bp) than is found in all other reported LSU V6 sequences. Daphniid LSU (without the V6 region) phylogenies generally agreed with the existing phylogenies based on morphology and mtDNA sequences. Nevertheless, a major disagreement between the LSU and the expected trees involved a positively misleading association between the two taxa with the longest branches, Daphnia laevis and D. occidentalis. Both maximum parsimony (MP) and maximum likelihood (ML) optimality criteria recovered this association, but parametric simulations indicated that MP was markedly more sensitive to this bias than ML. Examination of data partitions indicated that the inconsistency was caused by increased nucleotide substitution rates in the branches leading to D. laevis and D. occidentalis rather than among-taxon differences in base composition or distribution of sites that are free to vary. These results suggest that lineage-specific rate acceleration can lead to long-branch attraction even in the conserved genes of animal species that are almost morphologically indistinguishable.

11.
Whole-genome duplication (WGD) produces sets of gene pairs that are all of the same age. We therefore expect that phylogenetic trees that relate these pairs to their orthologs in other species should show a single consistent topology. However, a previous study of gene pairs formed by WGD in the yeast Saccharomyces cerevisiae found conflicting topologies among neighbor-joining (NJ) trees drawn from different loci and suggested that this conflict was the result of "asynchronous functional divergence" of duplicated genes (Langkjaer, R. B., P. F. Cliften, M. Johnston, and J. Piskur. 2003. Yeast genome duplication was followed by asynchronous differentiation of duplicated genes. Nature 421:848-852). Here, we test whether the conflicting topologies might instead be due to asymmetrical rates of evolution leading to long-branch attraction (LBA) artifacts in phylogenetic trees. We constructed trees for 433 pairs of WGD paralogs in S. cerevisiae with their single orthologs in Saccharomyces kluyveri and Candida albicans. We find a strong correlation between the asymmetry of evolutionary rates of a pair of S. cerevisiae paralogs and the topology of the tree inferred for that pair. Saccharomyces cerevisiae gene pairs with approximately equal rates of evolution tend to give phylogenies in which the WGD postdates the speciation between S. cerevisiae and S. kluyveri (B-trees), whereas trees drawn from gene pairs with asymmetrical rates tend to show WGD pre-dating this speciation (A-trees). Gene order data from throughout the genome indicate that the "A-trees" are artifacts, even though more than 50% of gene pairs are inferred to have this topology when the NJ method as implemented in ClustalW (i.e., with Poisson correction of distances) is used to construct the trees. This LBA artifact can be ameliorated, but not eliminated, by using gamma-corrected distances or by using maximum likelihood trees with robustness estimated by the Shimodaira-Hasegawa test. 
Tests for adaptive evolution indicated that positive selection might be the cause of rate asymmetry in a substantial fraction (19%) of the paralog pairs.
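The two distance corrections named in this abstract can be illustrated with their standard textbook forms for protein sequences. The sketch below is not taken from the paper or from ClustalW: the function names are hypothetical, `p` is assumed to be the observed proportion of differing sites, and the gamma shape parameter `alpha` is an assumption, since the abstract does not state the value used.

```python
import math


def poisson_distance(p):
    """Poisson-corrected distance: d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def gamma_distance(p, alpha):
    """Gamma-corrected distance: d = alpha * ((1 - p) ** (-1 / alpha) - 1).

    alpha is the shape parameter of the gamma distribution of rates across
    sites (an assumed input here); smaller alpha means stronger rate variation.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


# For the same observed difference, the gamma correction stretches the
# distance more than the Poisson correction does.
for p in (0.1, 0.3, 0.5):
    print(p, poisson_distance(p), gamma_distance(p, alpha=1.0))
```

Because the gamma correction lengthens large distances more than small ones, it counteracts some of the underestimation on long branches, which is consistent with the abstract's report that it ameliorates but does not eliminate the artifact.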

12.
13.
Study of structure/function relationships constitutes an important field of research, especially for modification of protein function and drug design. However, the fact that rational design (i.e. the modification of amino acid sequences by means of directed mutagenesis, based on knowledge of the three-dimensional structure) appears to be much less efficient than irrational design (i.e. random mutagenesis followed by in vitro selection) clearly indicates that we understand little about the relationships between primary sequence, three-dimensional structure and function. The use of evolutionary approaches and concepts will bring insights to this difficult question. The increasing availability of multigene family sequences that has resulted from genome projects has inspired the creation of novel in silico evolutionary methods to predict details of protein function in duplicated (paralogous) proteins. The underlying principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in paralogs. It has been proposed that the positions that show switches in substitution rate over time--i.e., 'heterotachous sites'--are good indicators of functional divergence. However, it appears that heterotachy is a much more general process, since most variable sites of homologous proteins with no evidence of functional shift are heterotachous. Similarly, it appears that switches in substitution rate are as frequent when paralogous sequences are compared as when orthologous sequences are compared. Heterotachy, instead of being indicative of functional shift, may more generally reflect a less specific process related to the many intra- and inter-molecular interactions compatible with a range of more or less equally viable protein conformations. These interactions will lead to different constraints on the nature of the primary sequences, consistently with theories suggesting the non-independence of substitutions in proteins. 
However, a specific type of amino acid variation might constitute a good indicator of functional divergence: substitutions occurring at positions that are generally slowly evolving. Such substitutions at constrained sites are indeed much more frequent soon after gene duplication. The identification and analysis of these sites by complementing structural information with evolutionary data may represent a promising direction to future studies dealing with the functional characterization of an ever increasing number of multi-gene families identified by complete genome analysis.

14.
Closure operations are a useful device in both the theory and practice of tree reconstruction in biology and other areas of classification. These operations take a collection of trees (rooted or unrooted) that classify overlapping sets of objects at their leaves, and infer further tree-like relationships. In this paper we investigate closure operations on phylogenetic trees, both rooted and unrooted, as well as on X-splits, and in a general abstract setting. We derive a number of new results, particularly concerning the completeness (and incompleteness) and complexity of various types of closure rules.

15.
RAPD problems in phylogenetics   Total citations: 1, self-citations: 0, citations by others: 1
This paper is intended to clarify some of the questions related to the application of RAPD for phylogenetic reconstruction purposes. Using different specimens of mammals selected across various taxonomic levels, we assessed the validity of RAPD to recover a known phylogeny, using four distance coefficients (simple matching, Russell & Rao, Jaccard, and Dice). We assessed the minimum number of primers required in the computations to obtain stable results in terms of distance estimates and/or topologies of the derived trees. These results based on distance methods were compared with those obtained with parsimony analyses of RAPD markers. Both approaches were shown to be equally problematic for comparing taxa above the family level. On the basis of these comparisons among various indices and methods, we recommend the use of Jaccard or Dice coefficients, with no less than twelve primers. We also suggest validation of any phylogeny based on RAPD data with a resampling procedure (i.e. the bootstrap or the jackknife) before any sound conclusion can be drawn.
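The four coefficients compared in this abstract have standard definitions for binary band data, sketched below. The band profiles are invented for illustration and the function names are hypothetical; the abstract itself gives no formulas. Here a is the number of bands present in both samples, b and c the bands present in only one of them, and d the bands absent from both.

```python
def band_counts(x, y):
    """Count shared and unshared bands for two presence/absence profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)          # present in both
    b = sum(1 for p, q in zip(x, y) if p and not q)      # only in x
    c = sum(1 for p, q in zip(x, y) if q and not p)      # only in y
    d = sum(1 for p, q in zip(x, y) if not p and not q)  # absent in both
    return a, b, c, d


def similarity_coefficients(x, y):
    """Return the four similarity coefficients as a dict."""
    a, b, c, d = band_counts(x, y)
    n = a + b + c + d
    if n == 0:
        raise ValueError("profiles must not be empty")
    return {
        "simple_matching": (a + d) / n,
        "russell_rao": a / n,
        "jaccard": a / (a + b + c) if (a + b + c) else 0.0,
        "dice": 2 * a / (2 * a + b + c) if (a + b + c) else 0.0,
    }


# Two made-up RAPD profiles scored over six band positions.
sample_1 = [1, 1, 0, 0, 1, 0]
sample_2 = [1, 0, 0, 1, 1, 0]
print(similarity_coefficients(sample_1, sample_2))
```

Simple matching and Russell & Rao both let shared absences (d) enter the calculation, whereas Jaccard and Dice ignore them, which is one plausible reason the latter two are preferred when a shared absent band is weak evidence of relatedness. A distance for tree building is commonly taken as one minus the similarity.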

16.
Heterotachy, an important process of protein evolution.   Total citations: 10, self-citations: 0, citations by others: 10
Because of functional constraints, substitution rates vary among the positions of a protein but are usually assumed to be constant at a given site during evolution. The distribution of the rates across the sequence positions generally fits a Gamma distribution. Models of sequence evolution were accordingly designed and led to improved phylogenetic reconstruction. However, it has been convincingly demonstrated that the evolutionary rate of a given position is not always constant throughout time. We called such within-site rate variations heterotachy (for "different speed" in Greek). Yet, heterotachy was found among homologous sequences of distantly related organisms, often with different functions. In such cases, the functional constraints are likely different, which would explain the different distribution of variable sites. To evaluate the importance of heterotachy, we focused on amino acid sequences of mitochondrial cytochrome b, for which the function is likely the same in all vertebrates. Using 2,038 sequences, we demonstrate that 95% of the variable positions are heterotachous, i.e., underwent dramatic variations of substitution rate among vertebrate lineages. Heterotachy even occurs at small evolutionary scale, and in these cases it is very unlikely to be related to functional changes. Since a large number of sequences are required to efficiently detect heterotachy, the extent of this phenomenon could not be estimated for all proteins yet. It could be as large as for cytochrome b, since this protein is not a peculiar case. The observations made here open several new avenues of research, such as the understanding of the evolution of functional constraints or the improvement of phylogenetic reconstruction methods.

17.
Concepts of species proposed within the phylogenetic paradigm are critically reviewed. Most so called phylogenetic species concepts rely heavily on factors immaterial to phylogenetic hypotheses. Thus, they have limited empirical content and offer weak bases on which to make decisions about real problems related to species. Any workable notion of species relies on an explicit character analysis, rather than on abstract properties of lineages, narrative predications and speculations on tokogenetic relationships. Species only exist conjecturally, as the smallest meaningful units for phylogenetic analysis, as based on character evidence. Such an idea considers species to be conjectures based on similarity, that are subsequently subject to testing by the results of analysis. Species, thus, are units of phylogenetic analysis in the same way as hypotheses of homology are units of comparable similarities, i.e. conjectures to be tested by congruence. Although monophyly need not be demonstrated for species-level taxa, hypotheses of relationships are the only basis to refute species limits and guide necessary rearrangements. The factor that leads to recognition of species is similarity in observed traits. The concept of life cycle is introduced as an important element in the discussion of species, as an efficient way to convey subsidiary notions of sexual dimorphism, polymorphism, polytypy and clusters of diagnosable semaphoronts. The notion of exemplars is used to expand the concept of species-as-individual-organisms into a more generally usable concept. Species are therefore proposed for a diagnosable sample of (observed or inferred) life cycles represented by exemplars all of which are hypothesized to attach to the same node in a cladogram, and which are not structured into other similarly diagnosable clusters.
This definition is character-based, potentially testable by reference to a branching diagram, and dispenses with reference to ancestor-descendant relationships or regression into population concepts. It provides a workable basis on which to proceed with phylogenetic analysis and a basis for that analysis to refute or refine species limits. A protocol is offered for testing hypotheses of species boundaries in cladograms.

18.
The nature of heterotachy at the center of recent controversy over the relative performance of tree-building methods is different from the form of heterotachy that has been inferred in empirical studies. The latter have suggested that proportions of variable sites (p(var)) vary among orthologues and among paralogues. However, the strength of this inference, describing what may be one of the most important evolutionary properties of sequence data, has remained weak. Consequently, other models of sequence evolution have been proposed to explain some long-branch attraction (LBA) problems that could be attributed to differences in p(var). For an empirical case with plastid and eubacterial RNA polymerase sequences, we confirm using capture-recapture estimates and simulations that p(var) can differ among orthologues in anciently diverged evolutionary lineages. We find that parsimony and a least squares distance method that implements an overly simple model of sequence evolution are susceptible to LBA induced by this form of heterotachy. Although homogeneous maximum likelihood inference was found to be robust to model misspecification in our specific example, we caution against assuming that it will always be so.

19.
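The significance of molecular systematics in biological conservation   Total citations: 8, self-citations: 1, citations by others: 7
Wang Wen (王文). 《生物多样性》 (Biodiversity), 1998, 6(2): 138-142
This paper reviews the principles and methods of molecular systematics developed in recent years and their application and development in biodiversity conservation. Molecular systematic methods can reliably identify the basic unit of species conservation, the evolutionarily significant unit, and can be used to infer the developmental state of populations, thereby providing a new and highly practical scientific tool for species conservation.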

20.
In two areas of phylogenetics, contrary predictions have been developed and maintained for character analysis and weighting. With regard to adaptation, many have argued that adaptive characters are poorly suited to phylogenetic analysis because of a propensity for homoplasy, while others have argued that complex adaptive characters should be given high weight because homoplasy in complex characters is unlikely. Similarly, with regard to correlated sets of characters, one point of view is that such sets should be collapsed into a single character-a single piece of phylogenetic evidence. Another point of view is that a suite of correlated characters should be emphasized in phylogenetics, again because recurrence of detailed similarity in the same suite of features is unlikely. In this paper, I discuss the theoretical background of adaptation and functional integration with respect to phylogenetic systematics of primates. Several character examples are reviewed with regard to their functional morphology and phylogenetic signal: postorbital structures, tympanic morphology, fusion of the mandibular symphysis, the tooth comb, strepsirrhine talar morphology, and the prehensile tail. It is clear when considering characters such as these that some characters are synapomorphic of major clades and at the same time functionally important. This appears particularly to be the case when characters are integrated into a complex and maintained as stable configurations. Rather than being simply a problem in character analysis, processes of integration may help to explain the utility of phylogenetically informative characters. On the other hand, the character examples also highlight the difficulty in forming a priori predictions about a character's phylogenetic signal. Explanations of patterns of character evolution are often clade-specific, which does not allow for a simple framework of character selection and/or weighting.
