Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Despite advances in understanding molecular evolution, current phylogenetic methods capture only a fraction of the complexity of evolution. We are chiefly constrained by our incomplete knowledge of molecular evolutionary processes and by the limits of computational power. These limitations lead either to biologically simplistic models that rarely account for more than a fraction of the complexity involved or to overfitted models that add little resolution to the problem. Such oversimplified models may lead us to assign high confidence to an incorrect tree (inconsistency). Rate-across-site (RAS) models are commonly used evolutionary models in phylogenetic studies. These account for heterogeneity in the evolutionary rates among sites but do not account for changing within-site rates across lineages (heterotachy). If heterotachy is common, using RAS models may lead to systematic errors in tree inference. In this work we show possible misleading effects in maximum likelihood tree inference when the assumption of constant within-site rates across lineages is violated. Using a simulation study, we explore the ways in which gamma stationary models can lead to a wrong topology or to deceptive bootstrap support values when the within-site rates change across lineages. More precisely, we show that different degrees of heterotachy mislead phylogenetic inference when the model assumed is stationary. Finally, we propose a geometry-based approach to visualize and to test for the possible existence of bias due to heterotachy.

2.
Likelihood, parsimony, and heterogeneous evolution   (cited 5 times in total; self-citations: 0; citations by others: 5)
Evolutionary rates vary among sites and across the phylogenetic tree (heterotachy). A recent analysis suggested that parsimony can be better than standard likelihood at recovering the true tree given heterotachy. The authors recommended that results from parsimony, which they consider to be nonparametric, be reported alongside likelihood results. They also proposed a mixture model, which was inconsistent but better than either parsimony or standard likelihood under heterotachy. We show that their main conclusion is limited to a special case for the type of model they study. Their mixture model was inconsistent because it was incorrectly implemented. A useful nonparametric model should perform well over a wide range of possible evolutionary models, but parsimony does not have this property. Likelihood-based methods are therefore the best way to deal with heterotachy.

3.
Heterotachy, an important process of protein evolution.   (cited 10 times in total; self-citations: 0; citations by others: 10)
Because of functional constraints, substitution rates vary among the positions of a protein but are usually assumed to be constant at a given site during evolution. The distribution of the rates across the sequence positions generally fits a Gamma distribution. Models of sequence evolution were accordingly designed and led to improved phylogenetic reconstruction. However, it has been convincingly demonstrated that the evolutionary rate of a given position is not always constant throughout time. We called such within-site rate variations heterotachy (for "different speed" in Greek). Yet, heterotachy was found among homologous sequences of distantly related organisms, often with different functions. In such cases, the functional constraints are likely different, which would explain the different distribution of variable sites. To evaluate the importance of heterotachy, we focused on amino acid sequences of mitochondrial cytochrome b, for which the function is likely the same in all vertebrates. Using 2,038 sequences, we demonstrate that 95% of the variable positions are heterotachous, i.e., underwent dramatic variations of substitution rate among vertebrate lineages. Heterotachy even occurs at small evolutionary scales, and in these cases it is very unlikely to be related to functional changes. Since a large number of sequences is required to detect heterotachy efficiently, the extent of this phenomenon cannot yet be estimated for all proteins. It could be as large as for cytochrome b, since this protein is not a peculiar case. The observations made here open several new avenues of research, such as the understanding of the evolution of functional constraints or the improvement of phylogenetic reconstruction methods.

4.
Heterotachy occurs when the relative evolutionary rates among sites are not the same across lineages. Sequence alignments are likely to exhibit heterotachy with varying severity because the intensity of purifying selection and adaptive forces at a given amino acid or DNA sequence position is unlikely to be the same in different species. In a recent study, the influence of heterotachy on the performance of different phylogenetic methods was examined using computer simulation for a four-species phylogeny. Maximum parsimony (MP) was reported to generally outperform maximum likelihood (ML). However, our comparisons of MP and ML methods using the methods and evaluation criteria employed in that study, but considering the possible range of proportions of sites involved in heterotachy, contradict their findings and indicate that, in fact, ML is significantly superior to MP even under heterotachy.

5.
In the context of exponentially growing molecular databases, it becomes increasingly easy to assemble large multigene data sets for phylogenomic studies. The expected increase of resolution due to the reduction of the sampling (stochastic) error is becoming a reality. However, the impact of systematic biases will also become more apparent or even dominant. We have chosen to study the case of the long-branch attraction artefact (LBA) using real instead of simulated sequences. Two fast-evolving eukaryotic lineages, whose evolutionary positions are well established, microsporidia and the nucleomorph of cryptophytes, were chosen as model species. A large data set was assembled (44 species, 133 genes, and 24,294 amino acid positions) and the resulting rooted eukaryotic phylogeny (using a distant archaeal outgroup) is positively misled by an LBA artefact despite the use of a maximum likelihood-based tree reconstruction method with a complex model of sequence evolution. When the fastest evolving proteins from the fast lineages are progressively removed (up to 90%), the bootstrap support for the apparently artefactual basal placement decreases to virtually 0%, and conversely only the expected placement, among all the possible locations of the fast-evolving species, receives increasing support that eventually converges to 100%. The percentage of removal of the fastest evolving proteins constitutes a reliable estimate of the sensitivity of phylogenetic inference to LBA. This protocol confirms that both a rich species sampling (especially the presence of a species that is closely related to the fast-evolving lineage) and a probabilistic method with a complex model are important to overcome the LBA artefact. Finally, we observed that phylogenetic inference methods perform strikingly better with simulated as opposed to real data, and suggest that testing the reliability of phylogenetic inference methods with simulated data leads to overconfidence in their performance.
Although phylogenomic studies can be affected by systematic biases, the possibility of discarding a large amount of data containing most of the nonphylogenetic signal allows recovering a phylogeny that is less affected by systematic biases, while maintaining a high statistical support.

6.
Evolutionary relationships are typically inferred from molecular sequence data using a statistical model of the evolutionary process. When the model accurately reflects the underlying process, probabilistic phylogenetic methods recover the correct relationships with high accuracy. There is ample evidence, however, that models commonly used today do not adequately reflect real-world evolutionary dynamics. Virtually all contemporary models assume that relatively fast-evolving sites are fast across the entire tree, whereas slower sites always evolve at relatively slower rates. Many molecular sequences, however, exhibit site-specific changes in evolutionary rates, called "heterotachy." Here we examine the accuracy of two phylogenetic methods for incorporating heterotachy: the mixed branch length model, which incorporates site-specific rate changes by summing likelihoods over multiple sets of branch lengths on the same tree, and the covarion model, which uses a hidden Markov process to allow sites to switch between variable and invariable as they evolve. Under a variety of simple heterogeneous simulation conditions, the mixed model was dramatically more accurate than homotachous models, which were subject to topological biases as well as biases in branch length estimates. When data were simulated with strong versions of the types of heterotachy observed in real molecular sequences, the mixed branch length model was more accurate than homotachous techniques. Analyses of empirical data sets confirmed that the mixed branch length model can improve phylogenetic accuracy under conditions that cause homotachous models to fail. In contrast, the covarion model did not improve phylogenetic accuracy compared with homotachous models and was sometimes substantially less accurate. We conclude that a mixed branch length approach, although not the solution to all phylogenetic errors, is a valuable strategy for improving the accuracy of inferred trees.

7.
Numerous models of molecular evolution have been formulated to describe the forces that shape sequence divergence among homologous proteins. These models have greatly enhanced our understanding of evolutionary processes. Rarely are such models empirically tested in the laboratory, and even more rarely are they exploited to generate novel molecules useful for synthetic biology. Here, we experimentally demonstrate that the heterotachy model of evolution captures signatures of functional divergence among homologous elongation factors (EFs) between bacterial EF-Tu and eukaryotic eEF1A. These EFs are GTPases that participate in protein translation by presenting aminoacylated-tRNAs to the ribosome. Upon release from the ribosome, the EFs are recharged by nucleotide exchange factors EF-Ts in bacteria or eEF1B in eukaryotes. The two nucleotide exchange factors perform analogous functions despite not being homologous proteins. The heterotachy model was used to identify a set of sites in eEF1A/EF-Tu associated with eEF1B binding in eukaryotes and another reciprocal set associated with EF-Ts binding in bacteria. Introduction of bacterial EF-Tu residues at these sites into eEF1A protein efficiently disrupted binding of cognate eEF1B as well as endowed eEF1A with the novel ability to bind bacterial EF-Ts. We further demonstrate that eEF1A variants, unlike yeast wild-type, can function in a reconstituted in vitro bacterial translation system.

8.
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, and that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
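
The summation described in this abstract (the likelihood at each site summed over several branch-length sets on one topology) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mixture_log_likelihood`, the fixed mixture `weights`, and the assumption that per-site likelihoods under each branch-length set have already been computed by some other routine are all hypothetical choices made for illustration.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a branch-length mixture.

    site_likelihoods[k][i] is the likelihood of site i computed on the
    same tree topology with the k-th set of branch lengths (assumed to
    be supplied by an external likelihood routine).
    weights[k] is the mixture weight of the k-th branch-length set.
    """
    if len(site_likelihoods) != len(weights):
        raise ValueError("need one weight per branch-length set")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")

    n_sites = len(site_likelihoods[0])
    total = 0.0
    for i in range(n_sites):
        # Sum over branch-length sets for this one site ...
        site_l = sum(w * comp[i] for w, comp in zip(weights, site_likelihoods))
        # ... then sites are treated as independent, so logs add up.
        total += math.log(site_l)
    return total


# Hypothetical numbers: site 0 fits branch-length set 0 better,
# site 1 fits set 1 better.
example = [[0.2, 0.01], [0.02, 0.1]]
print(mixture_log_likelihood(example, [0.5, 0.5]))
```

Because the sum is taken inside the logarithm, each site can be explained mostly by whichever branch-length set suits it, which is how different sites come to have different rates of change across the tree. In a real analysis the weights and branch lengths would presumably be estimated rather than fixed as they are here.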

9.
Variation in substitution rates among evolutionary lineages (among-lineage rate variation or ALRV) has been reported to negatively affect the estimation of phylogenies. When the substitution processes underlying ALRV are modeled inadequately, non-sister taxa with similar substitution rates are estimated incorrectly as sister species due to long-branch attraction. Recent advances in modeling site-specific rate variation (heterotachy) have reduced the impacts of ALRV on phylogeny estimation in several empirical and simulated datasets. However, the addition of parameters to the substitution model reduces power to estimate each parameter correctly, which can also lead to incorrect phylogeny estimation. A potential solution to this problem is to identify the levels of ALRV that negatively impact phylogeny estimation such that molecular markers with non-deleterious levels of ALRV can be identified. To this end, we used analyses of empirical and simulated gene datasets to evaluate whether levels of ALRV identified in a mitochondrial genomic dataset for salamanders negatively impacted phylogeny estimation. We simulated data with and without ALRV, holding all other evolutionary parameters constant, and compared the phylogenetic performance of both simulated and empirical datasets. Overall, we found limited, positive effects of ALRV on phylogeny estimation in this dataset, the majority of which resulted from an increase in substitution rate on short branches. We conclude that ALRV does not always negatively impact phylogeny estimation. Therefore, ALRV can likely be disregarded as a criterion for marker selection in comparable phylogenetic studies.

10.
Although probabilistic models of genotype (e.g., DNA sequence) evolution have been greatly elaborated, less attention has been paid to the effect of phenotype on the evolution of the genotype. Here we propose an evolutionary model and a Bayesian inference procedure that are aimed at filling this gap. In the model, RNA secondary structure links genotype and phenotype by treating the approximate free energy of a sequence folded into a secondary structure as a surrogate for fitness. The underlying idea is that a nucleotide substitution resulting in a more stable secondary structure should have a higher rate than a substitution that yields a less stable secondary structure. This free energy approach incorporates evolutionary dependencies among sequence positions beyond those that are reflected simply by jointly modeling change at paired positions in an RNA helix. Although there is not a formal requirement with this approach that secondary structure be known and nearly invariant over evolutionary time, computational considerations make these assumptions attractive and they have been adopted in a software program that permits statistical analysis of multiple homologous sequences that are related via a known phylogenetic tree topology. Analyses of 5S ribosomal RNA sequences are presented to illustrate and quantify the strong impact that RNA secondary structure has on substitution rates. Analyses on simulated sequences show that the new inference procedure has reasonable statistical properties. Potential applications of this procedure, including improved ancestral sequence inference and location of functionally interesting sites, are discussed.

11.
Sequence alignments of multiple genes are routinely used to infer phylogenetic relationships among species. The analysis of their concatenation is more likely to give correct results under an assumption of homotachy (i.e., the evolutionary rates within lineages in each of the concatenated genes are constant during evolution). Here, we examine how the violation of homotachy (i.e., the presence of within-site rate variation, called heterotachy) distorts species phylogenies. A theoretical examination has been conducted using a four-taxon case and the neighbor-joining (NJ) method, concluding that NJ recovers the incorrect tree when concatenated genes exhibit heterotachy. The application of average and weighted-average distance approaches, where gene boundaries are kept intact, overcomes the detrimental effect of heterotachy in multigene analysis using the NJ method.
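
The contrast between concatenation and a per-gene average distance can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which distance correction or which weights were used, so the Jukes-Cantor (JC69) correction and the helper names `jc69_distance`, `gene_distance_matrix`, and `average_distance_matrix` are choices made here, and every gene is assumed to contain the same taxa.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 0.75:
        raise ValueError("sequences are saturated; JC69 distance undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def gene_distance_matrix(gene):
    """Pairwise distances for one gene, given as {taxon: sequence}."""
    taxa = sorted(gene)
    return {
        (a, b): jc69_distance(gene[a], gene[b])
        for i, a in enumerate(taxa)
        for b in taxa[i + 1:]
    }


def average_distance_matrix(genes, weights=None):
    """Average the per-gene distance matrices instead of concatenating.

    Each gene keeps its own boundary: the correction is applied per gene
    and only the resulting distances are combined. Equal weights are the
    default; any other weighting scheme is an assumption of the caller.
    """
    if weights is None:
        weights = [1.0] * len(genes)
    total_weight = sum(weights)
    matrices = [gene_distance_matrix(g) for g in genes]
    return {
        pair: sum(w * m[pair] for w, m in zip(weights, matrices)) / total_weight
        for pair in matrices[0]
    }


# Two hypothetical genes for two taxa; one gene is fast, one is invariant.
genes = [
    {"A": "AAAA", "B": "AAAC"},
    {"A": "AAAA", "B": "AAAA"},
]
print(average_distance_matrix(genes))
```

The resulting dictionary could then be passed to any neighbor-joining routine. The design point is that the nonlinear correction is applied gene by gene, so a fast gene and a slow gene are not blended into one misleading intermediate rate before the distance is computed.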

12.

Background  

Model violations constitute the major limitation in inferring accurate phylogenies. Characterizing properties of the data that are not being correctly handled by current models is therefore of prime importance. One of the properties of protein evolution is the variation of the relative rate of substitutions across sites and over time; the latter phenomenon is called heterotachy. Its effect on phylogenetic inference has recently received considerable attention, which has led to the development of new models of sequence evolution. However, the focus thus far has been on the quantitative heterogeneity of the evolutionary process, thereby overlooking more qualitative variations.

13.
The w statistic introduced by Lockhart et al. (1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol Biol Evol. 15:1183-1188) is a simple and easily calculated statistic intended to detect heterotachy by comparing amino acid substitution patterns between two monophyletic groups of protein sequences. It is defined as the difference between the fraction of varied sites in both groups and the fraction of varied sites in each group. The w test has been used to distinguish a covarion process from equal-rates and rate-variation-across-sites processes. Using simulation we show that the w test is effective for small data sets and for data sets that have low substitution rates in the groups but can have difficulties when these conditions are not met. Using site entropy as a measure of variability of a sequence site, we modify the w statistic to a w' statistic by assigning as varied in one group those sites that are actually varied in both groups but have a large entropy difference. We show that the w' test has more power to detect two kinds of heterotachy processes (covarion and bivariate rate shifts) in large and variable data. We also show that a test of Pearson's correlation of the site entropies between two monophyletic groups can be used to detect heterotachy and has more power than the w' test. Furthermore, we demonstrate that there are settings where the correlation test as well as the w and w' tests do not detect heterotachy signals in data simulated under a branch length mixture model. In such cases, it is sometimes possible to detect heterotachy through subselection of appropriate taxa. Finally, we discuss the abilities of the three statistical tests to detect a fourth mode of heterotachy: lineage-specific changes in the proportion of variable sites.
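
The entropy-correlation idea in this abstract can be sketched in a few lines. This is a rough illustration rather than the published test: the abstract does not specify the entropy base, the treatment of gaps, or how significance is assessed, so the natural logarithm, the absence of gap handling, and the function names `site_entropy`, `column_entropies`, and `pearson` are assumptions made here.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def column_entropies(alignment):
    """Per-site entropies for one group, given as a list of aligned sequences."""
    return [site_entropy(col) for col in zip(*alignment)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one group has no variation in entropy")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy groups: site 0 varies only in group A,
# site 1 varies only in group B.
group_a = ["AA", "CA"]
group_b = ["AA", "AC"]
r = pearson(column_entropies(group_a), column_entropies(group_b))
print(r)
```

A correlation near 1 would mean that the same sites are variable in both groups, which is what a homotachous process predicts; a low or negative value means sites that are variable in one group are conserved in the other. How low the correlation must be to count as evidence of heterotachy depends on a null distribution that this sketch does not attempt to model.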

14.
Gene duplication is regarded as an important evolutionary mechanism creating genetic and phenotypic novelty. At the same time, the evolutionary mechanisms following gene duplication have been a subject of much debate. Here we analyze the sequence evolution of zonadhesin, a mammalian sperm ligand that binds to the oocyte zona pellucida in a species-specific manner. In pig, rabbit, and primates, precursor zonadhesin comprises, among others, one partial and four complete tandem repetitive D domains. The mouse precursor is distinguished by 20 additional partial D3 domains consisting of 120 amino acids each. This gene structure allows sequence comparison in both paralogues and orthologues. Detailed sequence analysis reveals that D domains evolve faster across paralogues than orthologues. Moreover, at the codon level, partial D3 paralogues of mouse show evidence of positive selection, whereas the corresponding orthologues do not. Individual posttranslational motif patterns and positive selection point to neofunctionalization of partial D3 paralogues of mouse, rather than subfunctionalization. However, as we found additional evidence for homogenization by partial gene conversion, sequence evolution of partial D3 paralogues of mouse might be better described as a combination of divergent and convergent evolution. So far, the divergence at the codon level has outbalanced the convergence at the level of smaller fragments. The probable driving force behind the evolutionary patterns observed is sexual selection. We finally discuss whether the functional determination influences the evolutionary regime acting on sperm ligands and egg receptors, respectively. [Reviewing Editor: Dr. Yves Van de Peer]

15.

Background  

Probabilistic methods have progressively supplanted the Maximum Parsimony (MP) method for inferring phylogenetic trees. One of the major reasons for this shift was that MP is much more sensitive to the Long Branch Attraction (LBA) artefact than is Maximum Likelihood (ML). However, recent work by Kolaczkowski and Thornton suggested, on the basis of simulations, that MP is less sensitive than ML to tree reconstruction artefacts generated by heterotachy, a phenomenon that corresponds to shifts in site-specific evolutionary rates over time. These results led these authors to recommend that the results of ML and MP analyses should both be reported and interpreted with the same caution. This specific conclusion revived the debate on the choice of the most accurate phylogenetic method for analysing real data in which various types of heterogeneities occur. However, variation of evolutionary rates across species was not explicitly incorporated in the original study of Kolaczkowski and Thornton, nor in most of the subsequent heterotachous simulations published to date, in which all terminal branch lengths were kept equal, an assumption that is biologically unrealistic.

16.
Assessment of the evolutionary process is crucial for understanding the effect of protein structure and function on sequence evolution and for many other analyses in molecular evolution. Here, we used simulations to study how taxon sampling affects accuracy of parameter estimation and topological inference in the absence of branch length asymmetry. With maximum-likelihood analysis, we find that adding taxa dramatically improves both support for the evolutionary model and accurate assessment of its parameters when compared with increasing the sequence length. Using a method we call "doppelgänger trees," we distinguish the contributions of two sources of improved topological inference: greater knowledge about internal nodes and greater knowledge of site-specific rate parameters. Surprisingly, highly significant support for the correct general model does not lead directly to improved topological inference. Instead, substantial improvement occurs only with accurate assessment of the evolutionary process at individual sites. Although these results are based on a simplified model of the evolutionary process, they indicate that in general, assuming processes are not independent and identically distributed among sites, more extensive sampling of taxonomic biodiversity will greatly improve analytical results in many current sequence data sets with moderate sequence lengths.

17.
Nestedness analysis has become increasingly popular in the study of biogeographic patterns of species occurrence. Nested patterns are those in which the species composition of small assemblages is a nested subset of larger assemblages. For species interaction networks such as plant–pollinator webs, nestedness analysis has also proven a valuable tool for revealing ecological and evolutionary constraints. Despite this popularity, there has been substantial controversy in the literature over the best methods to define and quantify nestedness, and how to test for patterns of nestedness against an appropriate statistical null hypothesis. Here we review this rapidly developing literature and provide suggestions and guidelines for proper analyses. We focus on the logic and the performance of different metrics and the proper choice of null models for statistical inference. We observe that traditional 'gap-counting' metrics are biased towards species loss among columns (occupied sites) and that many metrics are not invariant to basic matrix properties. The study of nestedness should be combined with an appropriate gradient analysis to infer possible causes of the observed presence–absence sequence. In our view, statistical inference should be based on a null model in which row and column sums are fixed. Under this model, only a relatively small number of published empirical matrices are significantly nested. We call for a critical reassessment of previous studies that have used biased metrics and unconstrained null models for statistical inference.
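
A null model with fixed row and column sums can be generated in several ways; one widely used option is the checkerboard swap, sketched below. The abstract does not say which algorithm the authors prefer, so this choice, the function name `swap_randomize`, and the parameters `n_attempts` and `seed` are assumptions made for illustration.

```python
import random


def swap_randomize(matrix, n_attempts, seed=0):
    """Randomize a presence-absence matrix while keeping all marginal sums.

    Repeatedly picks two rows and two columns; if the 2x2 submatrix is a
    checkerboard ([[1, 0], [0, 1]] or [[0, 1], [1, 0]]), it is flipped.
    Each flip leaves every row sum and column sum unchanged.
    The matrix needs at least two rows and two columns.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy; the input is not modified
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_attempts):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        is_checkerboard = (
            m[r1][c1] == m[r2][c2]
            and m[r1][c2] == m[r2][c1]
            and m[r1][c1] != m[r1][c2]
        )
        if is_checkerboard:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m


# Hypothetical 3 x 3 presence-absence matrix (rows and columns unlabeled).
observed = [[1, 1, 0], [1, 0, 1], [0, 1, 0]]
null_matrix = swap_randomize(observed, n_attempts=200, seed=1)
print(null_matrix)
```

To test an observed matrix, a nestedness metric would be computed on many such randomized matrices and compared with its value on the observed matrix. The number of swap attempts needed for adequate mixing is not addressed here and would have to be checked for the data at hand.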

18.
Abstract

Molecular sequence data have become prominent tools for phylogenetic relationship inference, particularly useful in the analysis of highly diverse taxonomic orders. Ribosomal RNA sequences provide markers that can be used in the study of phylogeny, because their function and structure have been conserved to a large extent throughout the evolutionary history of organisms. These sequences are inferred from cloned or enzymatically amplified gene sequences, or determined by direct RNA sequencing. The first step of the phylogenetic interpretation of nucleic acid sequence variations is the proper alignment of corresponding sequences from various organisms. Best alignment based on similarity criteria is greatly reinforced, in the case of ribosomal RNAs, by secondary structure homologies. Distance matrix methods to infer evolutionary trees are based on the assumption that the phylogenetic distance between each pair of organisms is proportional to the number of nucleotide substitution events. Computed tree inference methods usually take into consideration the possibility of unequal mutation rates among lineages. Divergence times can be estimated on the tree, provided that at least one lineage has been dated by fossil records. We have used this approach, based on ribosomal RNA sequence comparison, to investigate the phylogenetic relationship between dinoflagellates and other eukaryotic protists, and to refine controversial phylogenies of the class Dinophyceae.

19.
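An overview of the "long-branch attraction" phenomenon in phylogenetic studies   (cited 1 time in total; self-citations: 0; citations by others: 1)
Li Yiwei (黎一苇), Yu Li (于黎), Zhang Yaping (张亚平). 《遗传》 (Hereditas (Beijing)), 2007, 29(6): 659-667
Phylogenetic research not only helps to reconstruct the evolutionary history of all organisms on Earth but can also shed light on fundamental questions in evolutionary biology. A clear understanding of the evolutionary history of each species and of the evolutionary relationships among species is the foundation for further research in other branches of biology. However, all phylogenetic analysis methods in wide use today have limitations: to some extent they cannot effectively eliminate various kinds of error, so they cannot process and analyze data objectively, and therefore cannot successfully reconstruct evolutionary history or faithfully reflect the relationships among species. In phylogenetic studies, the long-branch attraction (LBA) artefact is the problem that troubles researchers most. This article gives a detailed overview of the LBA problem, covering its causes, detection methods, and strategies for eliminating it, and uses representative examples to illustrate ways of resolving it.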

20.
Akashi H, Goel P, John A. PLoS ONE, 2007, 2(10): e1065
Reliable inference of ancestral sequences can be critical to identifying both patterns and causes of molecular evolution. Robustness of ancestral inference is often assumed among closely related species, but tests of this assumption have been limited. Here, we examine the performance of inference methods for data simulated under scenarios of codon bias evolution within the Drosophila melanogaster subgroup. Genome sequence data for multiple, closely related species within this subgroup make it an important system for studying molecular evolutionary genetics. The effects of asymmetric and lineage-specific substitution rates (i.e., varying levels of codon usage bias and departures from equilibrium) on the reliability of ancestral codon usage were investigated. Maximum parsimony inference, which has been widely employed in analyses of Drosophila codon bias evolution, was compared to an approach that attempts to account for uncertainty in ancestral inference by weighting ancestral reconstructions by their posterior probabilities. The latter approach employs maximum likelihood estimation of rate and base composition parameters. For equilibrium and most non-equilibrium scenarios that were investigated, the probabilistic method appears to generate reliable ancestral codon bias inferences for molecular evolutionary studies within the D. melanogaster subgroup. These reconstructions are more reliable than parsimony inference, especially when codon usage is strongly skewed. However, inference biases are considerable for both methods under particular departures from stationarity (i.e., when adaptive evolution is prevalent). Reliability of inference can be sensitive to branch lengths, asymmetry in substitution rates, and the locations and nature of lineage-specific processes within a gene tree. Inference reliability, even among closely related species, can be strongly affected by (potentially unknown) patterns of molecular evolution in lineages ancestral to those of interest.