Similar articles
 Found 20 similar articles (search time: 933 ms)
1.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
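
The contrast between "best guess" reconstruction and sampling from the posterior can be illustrated with a toy sketch. It is not the simulation used in the study: the per-site posteriors, the residue labels, and the function name `reconstruct` are all hypothetical, chosen only to show how picking the most probable residue at every site removes a minority variant that posterior sampling retains.

```python
import random


def reconstruct(posteriors, mode, rng):
    """Pick one residue per site from per-site posterior distributions.

    mode == "best":   take the most probable residue (the "best guess").
    mode == "sample": draw a residue in proportion to its posterior probability.
    """
    seq = []
    for dist in posteriors:
        residues = list(dist)
        if mode == "best":
            seq.append(max(residues, key=dist.get))
        else:
            weights = [dist[r] for r in residues]
            seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(seq)


# Hypothetical posteriors: at every site a stabilizing residue "L" has
# probability 0.7 and a slightly destabilizing residue "G" has 0.3.
posteriors = [{"L": 0.7, "G": 0.3} for _ in range(1000)]
rng = random.Random(0)

best = reconstruct(posteriors, "best", rng)
sampled = reconstruct(posteriors, "sample", rng)

frac_g_best = best.count("G") / len(best)           # minority variant vanishes
frac_g_sampled = sampled.count("G") / len(sampled)  # stays near 0.3
```

Under these assumed posteriors the best-guess sequence contains no "G" at all, whereas the sampled sequence keeps it at roughly its posterior frequency, which is the kind of bias the abstract describes.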

2.
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.

3.
We constructed mutant genes of Caldococcus noboribetus isocitrate dehydrogenase containing ancestral amino acid residues that were inferred using the maximum likelihood method and a composite phylogenetic tree of isocitrate dehydrogenase and 3-isopropylmalate dehydrogenase. The mutant genes were expressed in Escherichia coli and the protein products purified. Thermostabilities, reported as the half-inactivation temperatures, for the purified enzymes were determined and compared with that of the wild-type enzyme. Four of the five mutant enzymes have greater thermal stabilities than wild-type isocitrate dehydrogenase. The results are compatible with the hyperthermophilic universal ancestor (commonote) hypothesis. Incorporation of ancestral residues into a modern-day protein sequence can be used to improve protein thermostability.

4.
Ancestral state reconstruction is a method used to study the evolutionary trajectories of quantitative characters on phylogenies. Although efficient methods for univariate ancestral state reconstruction under a Brownian motion model have been described for at least 25 years, to date no generalization has been described to allow more complex evolutionary models, such as multivariate trait evolution, non‐Brownian models, missing data, and within‐species variation. Furthermore, even for simple univariate Brownian motion models, most phylogenetic comparative R packages compute ancestral states via inefficient tree rerooting and full tree traversals at each tree node, making ancestral state reconstruction extremely time‐consuming for large phylogenies. Here, a computationally efficient method for fast maximum likelihood ancestral state reconstruction of continuous characters is described. The algorithm has linear complexity relative to the number of species and outperforms the fastest existing R implementations by several orders of magnitude. The described algorithm is capable of performing ancestral state reconstruction on a 1,000,000‐species phylogeny in fewer than 2 s using a standard laptop, whereas the next fastest R implementation would take several days to complete. The method is generalizable to more complex evolutionary models, such as phylogenetic regression, within‐species variation, non‐Brownian evolutionary models, and multivariate trait evolution. Because this method enables fast repeated computations on phylogenies of virtually any size, implementation of the described algorithm can drastically alleviate the computational burden of many otherwise prohibitively time‐consuming tasks requiring reconstruction of ancestral states, such as phylogenetic imputation of missing data, bootstrapping procedures, Expectation‐Maximization algorithms, and Bayesian estimation. 
The described ancestral state reconstruction algorithm is implemented in the Rphylopars functions anc.recon and phylopars.
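
To give a feel for why a single tree traversal can suffice, here is a minimal sketch of the classic pruning recursion for the maximum likelihood root state under Brownian motion. It is not the Rphylopars implementation: the tree encoding, the function name `prune`, and the example values are assumptions made for illustration, and the sketch estimates only the root (estimates at every internal node would need a second, root-to-tip pass).

```python
def prune(node):
    """Return (estimate, effective_branch_length) for a node.

    A node is a pair (payload, branch_length). The payload is a number for a
    tip, or a list of child nodes for an internal node. All branch lengths
    below the root are assumed to be strictly positive.
    """
    payload, length = node
    if not isinstance(payload, list):
        return float(payload), length
    precision = 0.0  # sum of 1 / v over children
    weighted = 0.0   # sum of x / v over children
    for child in payload:
        value, v = prune(child)
        precision += 1.0 / v
        weighted += value / v
    # Precision-weighted mean of the children; the uncertainty of that mean
    # is carried upward as extra branch length.
    return weighted / precision, length + 1.0 / precision


# Hypothetical tree ((A:1, B:1):1, C:2) with trait values A=1, B=3, C=5.
tree = ([([(1.0, 1.0), (3.0, 1.0)], 1.0), (5.0, 2.0)], 0.0)
root_estimate, _ = prune(tree)
```

Each node is visited once, so the cost grows linearly with the number of nodes. The recursive form is for readability only; a tree with very many tips would need an iterative post-order traversal to avoid Python's recursion limit.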

5.
Z. Yang, S. Kumar, M. Nei. Genetics, 1995, 141(4): 1641-1650
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.
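
The idea of comparing posterior probabilities of character states at an interior node can be sketched for the simplest possible case. The example below is only an illustration under strong assumptions that are not taken from the paper: a single ancestor with tips attached directly to it, the Jukes-Cantor (JC69) substitution model with equal base frequencies, and fixed branch lengths rather than maximum likelihood estimates. The names `jc69` and `root_posterior` are hypothetical.

```python
import math


def jc69(t, same):
    """JC69 transition probability over a branch of length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def root_posterior(tips):
    """Posterior probability of each ancestral nucleotide.

    tips: list of (nucleotide, branch_length) pairs, each tip attached
    directly to the single ancestor being reconstructed.
    """
    joint = {}
    for state in "ACGT":
        p = 0.25  # equal prior base frequencies under JC69
        for base, t in tips:
            p *= jc69(t, base == state)
        joint[state] = p
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}


# Hypothetical site: two tips show A and one shows G, all on short branches.
post = root_posterior([("A", 0.1), ("A", 0.1), ("G", 0.1)])
best = max(post, key=post.get)  # state with the highest posterior
```

The state with the highest posterior plays the role of the "best reconstruction" at the site, and the posterior itself gives a measure of how reliable that choice is.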

6.
A comparative approach was taken for identifying amino acid substitutions that may be under positive Darwinian selection and are correlated with spectral shifts among orthologous and paralogous lepidopteran long wavelength-sensitive (LW) opsins. Four novel LW opsin fragments were isolated, cloned, and sequenced from eye-specific cDNAs from two butterflies, Vanessa cardui (Nymphalidae) and Precis coenia (Nymphalidae), and two moths, Spodoptera exigua (Noctuidae) and Galleria mellonella (Pyralidae). These opsins were sampled because they encode visual pigments having a naturally occurring range of lambda(max) values (510-530 nm), which in combination with previously characterized lepidopteran opsins, provide a complete range of known spectral sensitivities (510-575 nm) among lepidopteran LW opsins. Two recent opsin gene duplication events were found within the papilionid but not within the nymphalid butterfly families through neighbor-joining, maximum parsimony, and maximum likelihood phylogenetic analyses of 13 lepidopteran opsin sequences. An elevated rate of evolution was detected in the red-shifted Papilio Rh3 branch following gene duplication, because of an increase in the amino acid substitution rate in the transmembrane domain of the protein, a region that forms the chromophore-binding pocket of the visual pigment. A maximum likelihood approach was used to estimate omega, the ratio of nonsynonymous to synonymous substitutions per site. Branch-specific tests of selection (free-ratio) identified one branch with omega = 2.1044, but the small number of substitutions involved was not significantly different from the expected number of changes under the neutral expectation of omega = 1. Ancestral sequences were reconstructed with a high degree of certainty from these data. 
Reconstructed ancestral sequences revealed several instances of convergence to the same amino acid between butterfly and vertebrate cone pigments, and between independent branches of the butterfly opsin tree that are correlated with spectral shifts.

7.
Gissi C, San Mauro D, Pesole G, Zardoya R. Gene, 2006, 366(2): 228-237
We explore whether phylogenetic analyses of the same sequence data set at the amino acid and nucleotide level are able to recover congruent topologies, as well as the advantages and limitations of both alternative approaches. As a case study, mitochondrial protein-coding genes were used to discern among competing hypotheses on the phylogenetic relationships of major anuran amphibian lineages. To properly address this phylogenetic question, the complete nucleotide sequences of the mitochondrial genomes of two archaeobatrachian species, Ascaphus truei and Pelobates cultripes, were determined anew. Bayesian and maximum likelihood phylogenetic inferences of the same sequence data set were performed based on both amino acid and nucleotide characters, with the latter analysed either as codons or as a reduced data set of first+second (P12) codon positions. In addition, likelihood-based ratio tests were performed to evaluate the support of alternative topologies. The different data sets arrived at congruent and highly supported topologies, suggesting a similar phylogenetic resolving power of the two character types provided that correctly selected sites and appropriate evolutionary models are used. The reconstructed anuran mitochondrial phylogeny supports the paraphyly of Archaeobatrachia, with Ascaphus as sister group to all the remaining anurans, and Pelobates as sister group of Neobatrachia. However, the employed tree reconstruction methods and likelihood-based ratio tests seemed to be negatively affected by the fast evolving sequences of neobatrachians, suggesting that the phylogeny of Anura here presented is not definitive, and needs further investigation using an extended taxon sampling.

8.
Nucleotide and inferred amino acid sequences from two nuclear protein-encoding genes, elongation factor-1α and RNA polymerase II, were obtained from 34 myriapods and 14 other arthropods to determine phylogenetic relationships among and within the myriapod classes. Phylogenetic analyses using maximum parsimony and maximum likelihood methods recovered all three represented myriapod classes (Chilopoda, Diplopoda, Symphyla) and all multiply sampled chilopod and diplopod orders, often with high node support. In contrast, relationships between classes and between orders were recovered less consistently and node support was typically lower. The temporal structure of phylogenetic diversification in Myriapoda may explain this apparent pattern of the phylogenetic recovery.

9.
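To investigate the effect of evolutionary models on DNA barcode-based classification, we obtained COI gene sequences from specimens of 44 species of Noctuidae collected on Wuling Mountain. Phylogenetic trees were constructed with neighbor-joining, maximum parsimony, maximum likelihood, and Bayesian inference, and the success rates of 12 models for neighbor-joining, 7 models for maximum likelihood, and 2 models for Bayesian inference were evaluated. The 12 neighbor-joining models differed little in success rate and were relatively stable; success rates differed markedly among models under maximum likelihood and Bayesian inference and were unstable; maximum parsimony is not model-based, and its success rate was relatively stable. Neighbor-joining and maximum likelihood shared six models, and the success rates of these six models differed between the two methods. In addition, species represented by only a single sequence in the molecular data significantly reduced model success rates, indicating that DNA barcoding studies need multiple samples per species.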

10.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) were evaluated. The tree-building methods examined were the neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon positions data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd4l showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. 
Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.

11.
Summary: Several forms of maximum likelihood models are applied to aligned amino acid sequence data coded for in the mitochondrial DNA of six species (chicken, frog, human, bovine, mouse, and rat). These models range in form from relatively simple models of the type currently used for inferring phylogenetic tree structure to models more complex than those that have been used previously. No major discrepancies between the optimal trees inferred by any of these methods are found, but there are huge differences in adequacy of fit. A very significant finding is that the fit of any of these models is vastly improved by allowing a certain proportion of the amino acid sites to be invariant. An even more important, although disquieting, finding is that none of these models fits well, as judged by standard statistical criteria. The primary reason for this is that amino acid sites undergo substitution according to a process that is very heterogeneous. Because most phylogenetic inference is accomplished by choosing the optimal tree under the assumption that a homogeneous process is acting on the sites, the potential invalidity of some such conclusions is raised by this article's results. The seriousness of this problem depends upon the robustness of the phylogenetic inferential procedure to departures from the underlying model.

12.
The reliabilities of parsimony-based and likelihood-based methods for inferring positive selection at single amino acid sites were studied using the nucleotide sequences of human leukocyte antigen (HLA) genes, in which positive selection is known to be operating at the antigen recognition site. The results indicate that the inference by parsimony-based methods is robust to the use of different evolutionary models and generally more reliable than that by likelihood-based methods. In contrast, the results obtained by likelihood-based methods depend on the models and on the initial parameter values used. It is sometimes difficult to obtain the maximum likelihood estimates of parameters for a given model, and the results obtained may be false negatives or false positives depending on the initial parameter values. It is therefore preferable to use parsimony-based methods as long as the number of sequences is relatively large and the branch lengths of the phylogenetic tree are relatively small.

13.
We used new 18S and 28S rRNA sequences analysed with parsimony, maximum likelihood and Bayesian methods of phylogenetic reconstruction to show that Nemertodermatida, generally classified as the sister group of Acoela within the recently proposed Phylum Acoelomorpha, are a separate basal bilaterian lineage. We used several analytical approaches to control for possible long branch attraction (LBA) artefacts in our results. Parsimony and the model based phylogenetic reconstruction methods that incorporate 'corrections' for substitution rate heterogeneities yielded concordant results. When putative long branch taxa were experimentally removed the resulting topologies were consistent with our total evidence analysis. Deletion of fast-evolving nucleotide sites decreased resolution and clade support, but did not support a topology conflicting with the total evidence analysis. Establishment of Acoela and Nemertodermatida as two early lineages facilitates reconstruction of ancestral bilaterian features. The ancestor of extant Bilateria was a small, benthic direct developer without coelom or a planktonic larval stage. The previously proposed Phylum Acoelomorpha is dismissed as paraphyletic.

14.
Procedures for performing cladistic analyses can provide powerful tools for understanding the evolution of neuropeptide and polypeptide hormone coding genes. These analyses can be done on either amino acid data sets or nucleotide data sets and can utilize several different algorithms that are dependent on distinct sets of operating assumptions and constraints. In some cases, the results of these analyses can be used to gauge phylogenetic relationships between taxa. Selecting the proper cladistic analysis strategy is dependent on the taxonomic level of analysis and the rate of evolution within the orthologous genes being evaluated. For example, previous studies have shown that the amino acid sequence of proopiomelanocortin (POMC), the common precursor for the melanocortins and beta-endorphin, can be used to resolve phylogenetic relationships at the class and order level. This study tested the hypothesis that POMC sequences could be used to resolve phylogenetic relationships at the family taxonomic level. Cladistic analyses were performed on amphibian POMC sequences characterized from the marine toad, Bufo marinus (family Bufonidae; this study), the spadefoot toad, Spea multiplicatus (family Pelobatidae), the African clawed frog, Xenopus laevis (family Pipidae) and the laughing frog, Rana ridibunda (family Ranidae). In these analyses the sequence of Australian lungfish POMC was used as the outgroup. The analyses were done at the amino acid level using the maximum parsimony algorithm and at the nucleotide level using the maximum likelihood algorithm. For the anuran POMC genes, analysis at the nucleotide level using the maximum likelihood algorithm generated a cladogram with higher bootstrap values than the maximum parsimony analysis of the POMC amino acid data set. 
For anuran POMC sequences, analysis of nucleotide sequences using the maximum likelihood algorithm would appear to be the preferred strategy for resolving phylogenetic relationships at the family taxonomic level.

15.
Several lines of evidence such as the basal location of thermophilic lineages in large-scale phylogenetic trees and the ancestral sequence reconstruction of single enzymes or large protein concatenations support the conclusion that the ancestors of the bacterial and archaeal domains were thermophilic organisms which were adapted to hot environments during the early stages of the Earth. A parsimonious reasoning would therefore suggest that the last universal common ancestor (LUCA) was also thermophilic. Various authors have used branch-wise non-homogeneous evolutionary models that better capture the variation of molecular compositions among lineages to accurately reconstruct the ancestral G + C contents of ribosomal RNAs and the ancestral amino acid composition of highly conserved proteins. They confirmed the thermophilic nature of the ancestors of Bacteria and Archaea but concluded that LUCA, their last common ancestor, was a mesophilic organism having a moderate optimal growth temperature. In this letter, we investigate the unknown nature of the phylogenetic signal that informs ancestral sequence reconstruction to support this non-parsimonious scenario. We find that rate variation across sites of molecular sequences provides information at different time scales by recording the oldest adaptation to temperature in slow-evolving regions and subsequent adaptations in fast-evolving ones.

16.
Granule-bound starch synthase: structure, function, and phylogenetic utility (cited 18 times: 2 self-citations, 16 citations by others)
Interest in the use of low-copy nuclear genes for phylogenetic analyses of plants has grown rapidly, because highly repetitive genes such as those commonly used are limited in number. Furthermore, because low- copy genes are subject to different evolutionary processes than are plastid genes or highly repetitive nuclear markers, they provide a valuable source of independent phylogenetic evidence. The gene for granule-bound starch synthase (GBSSI or waxy) exists in a single copy in nearly all plants examined so far. Our study of GBSSI had three parts: (1) Amino acid sequences were compared across a broad taxonomic range, including grasses, four dicotyledons, and the microbial homologs of GBSSI. Inferred structural information was used to aid in the alignment of these very divergent sequences. The informed alignments highlight amino acids that are conserved across all sequences, and demonstrate that structural motifs can be highly conserved in spite of marked divergence in amino acid sequence. (2) Maximum-likelihood (ML) analyses were used to examine exon sequence evolution throughout grasses. Differences in probabilities among substitution types and marked among-site rate variation contributed to the observed pattern of variation. Of the parameters examined in our set of likelihood models, the inclusion of among-site rate variation following a gamma distribution caused the greatest improvement in likelihood score. (3) We performed cladistic parsimony analyses of GBSSI sequences throughout grasses, within tribes, and within genera to examine the phylogenetic utility of the gene. Introns provide useful information among very closely related species, but quickly become difficult to align among more divergent taxa. Exons are variable enough to provide extensive resolution within the family, but with low bootstrap support. 
The combined results of amino acid sequence comparisons, maximum-likelihood analyses, and phylogenetic studies underscore factors that might affect phylogenetic reconstruction. In this case, accommodation of the variable rate of evolution among sites might be the first step in maximizing the phylogenetic utility of GBSSI.

17.
Ancestral sequence reconstruction has had recent success in decoding the origins and the determinants of complex protein functions. However, phylogenetic analyses of remote homologues must handle extreme amino acid sequence diversity resulting from extended periods of evolutionary change. We exploited the wealth of protein structures to develop an evolutionary model based on protein secondary structure. The approach follows the differences between discrete secondary structure states observed in modern proteins and those hypothesized in their immediate ancestors. We implemented maximum likelihood-based phylogenetic inference to reconstruct ancestral secondary structure. The predictive accuracy from the use of the evolutionary model surpasses that of comparative modeling and sequence-based prediction; the reconstruction extracts information not available from modern structures or the ancestral sequences alone. Based on a phylogenetic analysis of a sequence-diverse protein family, we showed that the model can highlight relationships that are evolutionarily rooted in structure and not evident in amino acid-based analysis.

18.
Stochastic models of nucleotide substitution are playing an increasingly important role in phylogenetic reconstruction through such methods as maximum likelihood. Here, we examine the behaviour of a simple substitution model, and establish some links between the methods of maximum parsimony and maximum likelihood under this model.
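
As a reminder of what the parsimony criterion counts for a single site, the following sketch implements the standard Fitch bottom-up pass on a rooted binary tree. It is not drawn from the paper, which concerns the relationship between parsimony and likelihood under a simple substitution model; the nested-tuple tree encoding and the example site pattern are assumptions made for illustration.

```python
def fitch(node):
    """Fitch bottom-up pass for one site.

    node is either a single-character string (a tip state) or a 2-tuple of
    child nodes. Returns (candidate_state_set, minimum_number_of_changes).
    """
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change is needed.
        return common, left_changes + right_changes
    # Children disagree: keep the union and count one substitution.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical site pattern on the tree ((A, A), (G, (A, G))).
states, changes = fitch((("A", "A"), ("G", ("A", "G"))))
```

For this pattern the pass reports two changes and leaves both A and G as equally parsimonious root states, the kind of tie that a likelihood method would instead resolve with probabilities.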

19.
It is widely assumed that high resource specificity predisposes lineages toward greater likelihood of extinction and lower likelihood of diversification than more generalized lineages. This suggests that host range evolution in parasitic organisms should proceed from generalist to specialist, and specialist lineages should be found at the 'tips' of phylogenies. To test these hypotheses, parsimony and maximum likelihood methods were used to reconstruct the evolution of host range on a phylogeny of parasitoid flies in the family Tachinidae. In contrast to predictions, most reconstructions indicated that generalists were repeatedly derived from specialist lineages and tended to occupy terminal branches of the phylogeny. These results are critically examined with respect to hypotheses concerning the evolution of specialization, the inherent difficulties in inferring host ranges, our knowledge of tachinid-host associations, and the methodological problems associated with ancestral character state reconstruction. Both parsimony and likelihood reconstructions are shown to provide misleading results and it is argued that independent evidence, in addition to phylogenetic trees, is needed to inform models of the evolution of host range and the evolutionary consequences of specialization.

20.
Phylogenetic analyses based on mitochondrial DNA have yielded widely differing relationships among members of the arthropod lineage Arachnida, depending on the nucleotide coding schemes and models of evolution used. We enhanced taxonomic coverage within the Arachnida greatly by sequencing seven new arachnid mitochondrial genomes from five orders. We then used all 13 mitochondrial protein-coding genes from these genomes to evaluate patterns of nucleotide and amino acid biases. Our data show that two of the six orders of arachnids (spiders and scorpions) have experienced shifts in both nucleotide and amino acid usage in all their protein-coding genes, and that these biases mislead phylogeny reconstruction. These biases are most striking for the hydrophobic amino acids isoleucine and valine, which appear to have evolved asymmetrical exchanges in response to shifts in nucleotide composition. To improve phylogenetic accuracy based on amino acid differences, we tested two recoding methods: (1) removing all isoleucine and valine sites and (2) recoding amino acids based on their physicochemical properties. We find that these methods yield phylogenetic trees that are consistent in their support of ancient intraordinal divergences within the major arachnid lineages. Further refinement of amino acid recoding methods may help us better delineate interordinal relationships among these diverse organisms.
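
The two recoding strategies can be sketched as simple alignment transformations. The abstract does not say which grouping of amino acids was used or exactly how isoleucine and valine sites were defined, so both choices below are assumptions: the grouping is the widely used six-category Dayhoff scheme, and a site is dropped if any sequence has I or V in that column. The names `recode` and `drop_iv_columns` are hypothetical.

```python
# Assumed grouping: the six Dayhoff categories (not confirmed by the abstract).
DAYHOFF6 = {
    "AGPST": "1",
    "DENQ": "2",
    "HKR": "3",
    "ILMV": "4",
    "FWY": "5",
    "C": "6",
}
RECODE = {aa: code for group, code in DAYHOFF6.items() for aa in group}


def recode(seq):
    """Replace each amino acid with its category code; anything else becomes '-'."""
    return "".join(RECODE.get(aa, "-") for aa in seq)


def drop_iv_columns(alignment):
    """Remove every alignment column in which any sequence has I or V.

    alignment: list of equal-length, upper-case amino acid strings.
    """
    keep = [
        i
        for i in range(len(alignment[0]))
        if not any(seq[i] in "IV" for seq in alignment)
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


recoded = recode("MIVK")                       # "4443"
trimmed = drop_iv_columns(["MIKA", "MLKV"])    # ["MK", "MK"]
```

Recoding merges isoleucine, leucine, methionine, and valine into one state, so exchanges among them no longer register as differences; column removal discards the affected sites altogether at the cost of a shorter alignment.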
