Similar Articles
1.
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.
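The contrast between optimization-based ("best guess") reconstruction and averaging over the posterior can be sketched in a few lines. This is an illustrative toy, not the authors' method: the per-site posterior probabilities below are made-up numbers, and averaging per-site marginals is assumed here to stand in for averaging over a Bayesian sample of credible ancestral sequences.

```python
from collections import Counter

# Hypothetical per-site posterior probabilities at one ancestral node
# (made-up numbers for illustration, not data from the study).
posteriors = [
    {"A": 0.6, "G": 0.4},
    {"A": 0.6, "G": 0.4},
    {"A": 0.6, "G": 0.4},
    {"A": 0.6, "G": 0.4},
    {"C": 1.0},
]

def map_frequencies(site_posteriors):
    """Base frequencies of the single best-guess (MAP) ancestral sequence."""
    best = [max(p, key=p.get) for p in site_posteriors]
    counts = Counter(best)
    n = len(site_posteriors)
    return {base: counts[base] / n for base in "ACGT"}

def expected_frequencies(site_posteriors):
    """Base frequencies averaged over the per-site posterior distributions."""
    n = len(site_posteriors)
    return {base: sum(p.get(base, 0.0) for p in site_posteriors) / n
            for base in "ACGT"}

print(map_frequencies(posteriors))
print(expected_frequencies(posteriors))
```

With these numbers the MAP sequence is 80% A and contains no G, while the posterior average gives about 48% A and 32% G: picking the most probable base at every ambiguous site systematically inflates whichever base is slightly favoured, which is the kind of deterministic bias the abstract describes.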

2.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.

3.
Akashi H, Goel P, John A. PLoS ONE, 2007, 2(10): e1065
Reliable inference of ancestral sequences can be critical to identifying both patterns and causes of molecular evolution. Robustness of ancestral inference is often assumed among closely related species, but tests of this assumption have been limited. Here, we examine the performance of inference methods for data simulated under scenarios of codon bias evolution within the Drosophila melanogaster subgroup. Genome sequence data for multiple, closely related species within this subgroup make it an important system for studying molecular evolutionary genetics. The effects of asymmetric and lineage-specific substitution rates (i.e., varying levels of codon usage bias and departures from equilibrium) on the reliability of ancestral codon usage inference were investigated. Maximum parsimony inference, which has been widely employed in analyses of Drosophila codon bias evolution, was compared to an approach that attempts to account for uncertainty in ancestral inference by weighting ancestral reconstructions by their posterior probabilities. The latter approach employs maximum likelihood estimation of rate and base composition parameters. For equilibrium and most non-equilibrium scenarios that were investigated, the probabilistic method appears to generate reliable ancestral codon bias inferences for molecular evolutionary studies within the D. melanogaster subgroup. These reconstructions are more reliable than parsimony inference, especially when codon usage is strongly skewed. However, inference biases are considerable for both methods under particular departures from stationarity (i.e., when adaptive evolution is prevalent). Reliability of inference can be sensitive to branch lengths, asymmetry in substitution rates, and the locations and nature of lineage-specific processes within a gene tree. Inference reliability, even among closely related species, can be strongly affected by (potentially unknown) patterns of molecular evolution in lineages ancestral to those of interest.

4.
Yang Z, Kumar S, Nei M. Genetics, 1995, 141(4): 1641-1650
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.
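The idea of scoring every assignment of states to the interior nodes and keeping the most probable one can be illustrated on a very small tree. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a hypothetical rooted four-taxon tree ((a,b)X,(c,d)Y)root, the Jukes-Cantor (JC69) nucleotide model, and a single fixed branch length t instead of maximum likelihood estimates.

```python
import itertools
import math

BASES = "ACGT"

def jc69(i, j, t):
    """Jukes-Cantor transition probability for branch length t (subs/site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def joint_posteriors(tips, t=0.1):
    """Posterior of every (root, X, Y) state assignment on the hypothetical
    tree ((a,b)X,(c,d)Y)root, with every branch of length t."""
    a, b, c, d = tips
    scores = {}
    for r, x, y in itertools.product(BASES, repeat=3):
        scores[(r, x, y)] = (
            0.25                             # equilibrium frequency of root state
            * jc69(r, x, t) * jc69(r, y, t)  # root -> interior nodes X, Y
            * jc69(x, a, t) * jc69(x, b, t)  # X -> tips a, b
            * jc69(y, c, t) * jc69(y, d, t)  # Y -> tips c, d
        )
    total = sum(scores.values())  # likelihood of the observed site pattern
    return {k: v / total for k, v in scores.items()}

post = joint_posteriors(("A", "A", "A", "G"))
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Exhaustive enumeration is only feasible for tiny trees (4 to the power of the number of interior nodes); it is used here purely because it mirrors the "posterior probability of each assignment" description directly.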

5.
The ability of the principle of parsimony to accurately reconstruct molecular evolutionary pathways from an analysis of amino acid or nucleic acid sequences from extant organisms is tested by direct comparison with a known pathway. Topological errors occur under specified conditions. Importantly, given no errors in the topology, and error-free experimental sequences, the ancestral sequences inferred by the parsimony principle err significantly, the magnitude of the error increasing with the distance of the nodal sequence from the present. These errors are irreducible as an inherent consequence of any evolutionary process in which chance processes operate within the constraints imposed by Darwinian selection. Formulae are derived which predict the errors in the ancestral sequences from a knowledge of only the internodal distances. The parsimony solution is not a reliably good solution. It is necessary to develop a detailed understanding of the interaction between chance processes and natural selection to further advance our understanding of molecular change in proteins and nucleic acids.
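For readers unfamiliar with how parsimony assigns ancestral states, the bottom-up pass of Fitch's algorithm for a single site is sketched below. This is a generic textbook sketch, not the procedure used in the study: it only computes candidate state sets and the minimum number of changes; a further top-down pass would be needed to pick states for every interior node.

```python
def fitch(tree):
    """Bottom-up Fitch pass for one site on a rooted binary tree.

    A leaf is a one-character string; an internal node is a 2-tuple of
    subtrees. Returns (candidate state set at this node, number of changes).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Disjoint sets: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1

print(fitch((("A", "A"), ("A", "G"))))     # ({'A'}, 1)
print(fitch((("A", "G"), ("C", "T")))[1])  # 3
```

The second example shows why parsimony reconstructions can be ambiguous: all four bases are equally parsimonious at the root, and nothing in the method weighs them by branch length or substitution probability.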

6.
The pattern of amino acid residue replacement in the components of the bursicon signaling system (involving the BURSα/BURSβ heterodimer and its receptor BURSrec) was reconstructed across a phylogeny of 17 insect species, in order to test for the co-occurrence of replacements at sets of individual sites. Sets of three or more branches with perfectly concordant changes occurred to a greater extent than expected by chance, given the observed level of amino acid change. The latter sites (SPC sites) were found to have distinctive characteristics: (1) the mean number of changes was significantly lower at SPC sites than that at other sites with multiple changes; (2) SPC sites had a significantly greater tendency toward parallel amino acid changes than other sites with multiple changes, but no greater tendency toward convergent changes; and (3) parallel changes tended to involve relatively similar amino acids, as indicated by relatively low mean chemical distances. The results implicated functional constraint, permitting only a limited subset of amino acids in a given site, as a major factor in causing both parallel amino acid replacement and coordinated amino acid changes in different sites of the same protein and of interacting proteins in this system.

7.
The Bryaceae are a large cosmopolitan moss family including genera of significant morphological and taxonomic complexity. Phylogenetic relationships within the Bryaceae were reconstructed based on DNA sequence data from all three genomic compartments. In addition, maximum parsimony and Bayesian inference were employed to reconstruct ancestral character states of 38 morphological plus four habitat characters and eight insertion/deletion events. The recovered phylogenetic patterns are generally in accord with previous phylogenies based on chloroplast DNA sequence data and three major clades are identified. The first clade comprises Bryum bornholmense, B. rubens, B. caespiticium, and Plagiobryum. This corroborates the hypothesis suggested by previous studies that several Bryum species are more closely related to Plagiobryum than to the core Bryum species. The second clade includes Acidodontium, Anomobryum, and Haplodontium, while the third clade contains the core Bryum species plus Imbribryum. Within the latter clade, B. subapiculatum and B. tenuisetum form the sister clade to Imbribryum. Reconstructions of ancestral character states under maximum parsimony and Bayesian inference suggest fourteen morphological synapomorphies for the ingroup and synapomorphies are detected for most clades within the ingroup. Maximum parsimony and Bayesian reconstructions of ancestral character states are mostly congruent although Bayesian inference shows that the posterior probability of ancestral character states may decrease dramatically when node support is taken into account. Bayesian inference also indicates that reconstructions may be ambiguous at internal nodes for highly polymorphic characters.

8.
9.
Phylogenetic analyses of three families of arthropod apyrases were used to reconstruct the evolutionary relationships of salivary-expressed apyrases, which have an anti-coagulant function in blood-feeding arthropods. Members of the 5′-nucleotidase family were recruited for salivary expression in blood-feeding species at least five separate times in the history of arthropods, while members of the Cimex-type apyrase family have been recruited at least twice. In spite of these independent events of recruitment for salivary function, neither of these families showed evidence of convergent amino acid sequence evolution in salivary-expressed members. On the contrary, in the 5′-nucleotidase family, salivary-expressed proteins conserved ancestral amino acid residues to a significantly greater extent than related proteins without salivary function, implying parallel evolution by conservation of ancestral characters. This unusual pattern of sequence evolution suggests the hypothesis that purifying selection favoring conservation of ancestral residues is particularly strong in salivary-expressed members of the 5′-nucleotidase family of arthropods because of constraints arising from expression within the vertebrate host.

10.
The complete nucleotide sequence of the mitochondrial (mt) genome was determined for three species of discoglossid frogs (Amphibia:Anura:Discoglossidae), representing three of the four recognized genera: Alytes obstetricans, Bombina orientalis, and Discoglossus galganoi. The organization and size of these newly determined mt genomes are similar to those previously reported for other vertebrates. Phylogenetic analyses (maximum likelihood, Bayesian inference, minimum evolution, and maximum parsimony) of mt protein-coding genes at the amino acid level were performed in combination with already published mt genome sequence data of three species of Neobatrachia, one of Pipoidea, and four of Caudata. Phylogenetic analyses based on the deduced amino acid sequences of all mt protein-coding genes arrived at the same topology. The monophyly of Discoglossidae is strongly supported. Within the Discoglossidae, Alytes is consistently recovered as sister group of Discoglossus, to the exclusion of Bombina. The three species representing Neobatrachia exhibited extremely long branches irrespective of the phylogenetic inference method used, and hence their relative position with respect to Discoglossidae and Xenopus may be artefactual due to a severe long branch attraction effect. To further investigate the phylogenetic intrarelationships of discoglossids, nucleotide sequences of four nuclear protein-coding genes (CXCR4, RAG1, RAG2, and Rhodopsin) with sequences available for the three discoglossid genera and Xenopus were retrieved from GenBank, and together with a concatenated nucleotide sequence data set containing all mt protein-coding genes except ND6 were subjected to separate and combined phylogenetic analyses. In all cases, a sister group relationship between Alytes and Discoglossus was recovered with high statistical support.

11.
12.
Charadrii (shorebirds, gulls, and alcids) have exceptional diversity in ecological, behavioral, and life-history traits. A phylogenetic framework is necessary to fully understand the relationships among these traits. Despite several attempts to resolve the phylogeny of the Charadrii, none have comprehensively utilized molecular sequence data. Complete and partial cytochrome-b gene sequences for 86 Charadrii and five Falconides species (as outgroup taxa) were obtained from GenBank and aligned. We analyzed the resulting matrices using parsimony, Bayesian inference, minimum evolution, and quartet puzzling methods. Posterior probabilities, decay indices, and bootstrapping provide strong support for four major lineages consisting of gulls, alcids, plovers, and sandpipers, respectively. The broad structure of the trees differs significantly from all previous hypotheses of Charadrii phylogeny in placing the plovers at the base of the tree below the sandpipers in a pectinate sequence towards a large clade of gulls and alcids. The parsimony, Bayesian, and minimum evolution models provide strong evidence for this phylogenetic hypothesis. This is further corroborated by non-tree based measures of support and conflict (Lento plots). The quartet puzzling trees are poorly resolved and inconclusive.

13.
Golding GB. Génome, 1988, 30(3): 341-346
The divergence of immunoglobulin genes due to somatic mutation provides a natural example of DNA sequence divergence. This divergence was examined to gain insight into the processes of evolution and the determinants of the variance-to-mean ratio of sequence divergence. Normally, this ratio is found to be larger than expected (1.0 under Poisson assumptions) for the evolutionary divergence of most genes. Although not significantly less than one, all seven groups of immunoglobulin amino acid sequences have ratios smaller than expected, contrary to the evolutionary pattern generally observed. The substitutions in the immunoglobulin genes appear to be highly nonrandom and an excess of parallel changes (the major nonrandom feature of these mutations) is shown to cause smaller ratios. Because convergent or parallel mutations are often observed in the evolutionary divergence of genes, this suggests that forces causing the large observed ratios may actually have to be more powerful than previously expected. Further, since selection is one of the likely causes of parallel mutations, it should be noted that selection could significantly decrease the variance-to-mean ratio. The high frequency of parallel mutations and their resulting effects, as observed in the immunoglobulin genes, suggest that only poor inferences of sequence divergence can be made without actual knowledge of the ancestral sequence.
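The variance-to-mean ratio discussed above is simple to compute from per-lineage substitution counts. The sketch below is generic, not the author's procedure: it assumes the counts are already available for each lineage and uses the sample (n - 1) variance; the counts in the usage line are hypothetical.

```python
import statistics

def dispersion_index(counts):
    """Variance-to-mean ratio R of substitution counts across lineages.

    Under a Poisson process R is expected to be about 1; R > 1 indicates
    overdispersion and R < 1 underdispersion.
    """
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return statistics.variance(counts) / mean  # sample variance (n - 1)

# Hypothetical counts of substitutions in four lineages.
print(dispersion_index([4, 5, 6, 5]))
```

For these made-up counts the mean is 5 and the sample variance is 2/3, so R is about 0.13, an underdispersed pattern of the kind the abstract reports for the immunoglobulin groups.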

14.
Computer simulations provide a flexible method for assessing the power and robustness of phylogenetic inference methods. Unfortunately, simulated data are often obviously atypical of data encountered in studies of molecular evolution. Unrealistic simulations can lead to conclusions that are irrelevant to real-data analyses or can provide a biased view of which methods perform well. Here, we present a software tool designed to generate data under a complex codon model that allows each residue in the protein sequence to have a different set of equilibrium amino acid frequencies. The software can obtain maximum-likelihood estimates of the parameters of the Halpern and Bruno model from empirical data and a fixed tree; given an arbitrary tree and a fixed set of parameters, the software can then simulate artificial datasets. We present the results of a simulation experiment using randomly generated tree shapes and substitution parameters estimated from 1610 mammalian cytochrome b sequences. We tested tree inference at the amino acid, nucleotide and codon levels and under parsimony, maximum-likelihood, Bayesian and distance criteria (for a total of more than 650 analyses on each dataset). Based on these simulations, nucleotide-level analyses seem to be more accurate than amino acid and codon analyses. The performance of distance-based phylogenetic methods appears to be quite sensitive to the choice of model and the form of rate heterogeneity used. Further studies are needed to assess the generality of these conclusions. For example, fitting parameters of the Halpern-Bruno model to sequences from other genes will reveal the extent to which our conclusions were influenced by the choice of cytochrome b. Incorporating codon bias and more sources of heterogeneity into the simulator will be crucial to determining whether the current results are caused by a bias in the current simulation study in favour of nucleotide analyses.
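The key feature of the Halpern-Bruno model is that a mutation rate matrix is rescaled so that substitution leads to site-specific equilibrium frequencies. The sketch below uses the commonly cited form of the Halpern-Bruno rate, Q_ij = M_ij * x * ln(x) / (x - 1) with x = (pi_j * M_ji) / (pi_i * M_ij); it is an illustration on a hypothetical three-state alphabet with made-up mutation rates and frequencies, not the software described in the abstract.

```python
import math

def halpern_bruno_rate(m_ij, m_ji, pi_i, pi_j):
    """Substitution rate i -> j from mutation rates and site frequencies."""
    x = (pi_j * m_ji) / (pi_i * m_ij)
    if math.isclose(x, 1.0):
        return m_ij  # limit of x * ln(x) / (x - 1) as x -> 1
    return m_ij * x * math.log(x) / (x - 1.0)

def rate_matrix(mutation, freqs):
    """Build a Halpern-Bruno rate matrix whose rows sum to zero."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = halpern_bruno_rate(
                    mutation[i][j], mutation[j][i], freqs[i], freqs[j])
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q

# Hypothetical symmetric mutation rates and site-specific frequencies.
mut = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]
freqs = [0.5, 0.3, 0.2]
q = rate_matrix(mut, freqs)
```

A useful property for checking such a simulator is that pi_i * Q_ij equals pi_j * Q_ji for every pair of states, so the site-specific frequencies are the stationary distribution of the resulting process.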

15.
Although probabilistic models of genotype (e.g., DNA sequence) evolution have been greatly elaborated, less attention has been paid to the effect of phenotype on the evolution of the genotype. Here we propose an evolutionary model and a Bayesian inference procedure that are aimed at filling this gap. In the model, RNA secondary structure links genotype and phenotype by treating the approximate free energy of a sequence folded into a secondary structure as a surrogate for fitness. The underlying idea is that a nucleotide substitution resulting in a more stable secondary structure should have a higher rate than a substitution that yields a less stable secondary structure. This free energy approach incorporates evolutionary dependencies among sequence positions beyond those that are reflected simply by jointly modeling change at paired positions in an RNA helix. Although there is not a formal requirement with this approach that secondary structure be known and nearly invariant over evolutionary time, computational considerations make these assumptions attractive and they have been adopted in a software program that permits statistical analysis of multiple homologous sequences that are related via a known phylogenetic tree topology. Analyses of 5S ribosomal RNA sequences are presented to illustrate and quantify the strong impact that RNA secondary structure has on substitution rates. Analyses on simulated sequences show that the new inference procedure has reasonable statistical properties. Potential applications of this procedure, including improved ancestral sequence inference and location of functionally interesting sites, are discussed.

16.
Recent years have witnessed a proliferation of quantitative methods for biogeographic inference. In particular, novel parametric approaches represent exciting new opportunities for the study of range evolution. Here, we review a selection of current methods for biogeographic analysis and discuss their respective properties. These methods include generalized parsimony approaches, weighted ancestral area analysis, dispersal-vicariance analysis, the dispersal-extinction-cladogenesis model and other maximum likelihood approaches, and Bayesian stochastic mapping of ancestral ranges, including a novel approach to inferring range evolution in the context of island biogeography. Some of these methods were developed specifically for problems of ancestral range reconstruction, whereas others were designed for more general problems of character state reconstruction and subsequently applied to the study of ancestral ranges. Methods for reconstructing ancestral history on a phylogenetic tree differ not only in the types of ancestral range states that are allowed, but also in the various historical events that may change the ancestral ranges. We explore how the form of allowed ancestral ranges and allowed transitions can both affect the outcome of ancestral range estimation. Finally, we mention some promising avenues for future work in the development of model-based approaches to biogeographic analysis.

17.
Using a maximum-likelihood formalism, we have developed a method with which to reconstruct the sequences of ancestral proteins. Our approach allows the calculation of not only the most probable ancestral sequence but also of the probability of any amino acid at any given node in the evolutionary tree. Because we consider evolution on the amino acid level, we are better able to include effects of evolutionary pressure and take advantage of structural information about the protein through the use of mutation matrices that depend on secondary structure and surface accessibility. The computational complexity of this method scales linearly with the number of homologous proteins used to reconstruct the ancestral sequence.
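Computing the probability of every amino acid at a node, with cost linear in the number of sequences, is what Felsenstein's pruning algorithm provides. The sketch below is a minimal illustration under loud assumptions: it uses a simple equal-rates (Poisson) amino acid model and a uniform root prior instead of the structure-dependent mutation matrices described in the abstract, and the tree encoding (a leaf is a one-letter string, an internal node is a list of (child, branch_length) pairs) is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def poisson_prob(i, j, t, k=len(AMINO_ACIDS)):
    """Equal-rates (Jukes-Cantor-like) transition probability over k states."""
    e = math.exp(-k * t / (k - 1))
    return 1 / k + (k - 1) / k * e if i == j else 1 / k - e / k

def partial_likelihood(node):
    """Pruning: entry s is P(data below node | node has amino acid s).

    Each node is visited once, so the work grows linearly with the
    number of sequences (leaves) in the tree.
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in AMINO_ACIDS]
    result = [1.0] * len(AMINO_ACIDS)
    for child, t in node:
        child_l = partial_likelihood(child)
        for si, s in enumerate(AMINO_ACIDS):
            result[si] *= sum(poisson_prob(s, c, t) * child_l[ci]
                              for ci, c in enumerate(AMINO_ACIDS))
    return result

def root_posterior(tree):
    """Marginal posterior of each amino acid at the root (uniform prior)."""
    lik = partial_likelihood(tree)
    total = sum(lik)
    return {aa: v / total for aa, v in zip(AMINO_ACIDS, lik)}

example_tree = [([("A", 0.1), ("A", 0.1)], 0.1), ([("A", 0.1), ("G", 0.1)], 0.1)]
post = root_posterior(example_tree)
print(max(post, key=post.get))
```

Swapping in a different substitution model only requires replacing `poisson_prob`; the recursion itself is unchanged, which is one reason this approach extends naturally to site-class-specific matrices.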

18.
One of the lasting controversies in phylogenetic inference is the degree to which specific evolutionary models should influence the choice of methods. Model‐based approaches to phylogenetic inference (likelihood, Bayesian) are defended on the premise that without explicit statistical models there is no science, and parsimony is defended on the grounds that it provides the best rationalization of the data, while refraining from assigning specific probabilities to trees or character‐state reconstructions. Authors who favour model‐based approaches often focus on the statistical properties of the methods and models themselves, but this is of only limited use in deciding the best method for phylogenetic inference; such a decision also requires considering the conditions of evolution that prevail in nature. Another approach is to compare the performance of parsimony and model‐based methods in simulations, which traditionally have been used to defend the use of models of evolution for DNA sequences. Some recent papers, however, have promoted the use of model‐based approaches to phylogenetic inference for discrete morphological data as well. These papers simulated data under models already known to be unfavourable to parsimony, and modelled morphological evolution as if it evolved just like DNA, with probabilities of change for all characters changing in concert along tree branches. The present paper discusses these issues, showing that under reasonable and less restrictive models of evolution for discrete characters, equally weighted parsimony performs as well or better than model‐based methods, and that parsimony under implied weights clearly outperforms all other methods.

19.
We propose two approximate methods (one based on parsimony and one on pairwise sequence comparison) for estimating the pattern of nucleotide substitution and a parsimony-based method for estimating the gamma parameter for variable substitution rates among sites. The matrix of substitution rates that represents the substitution pattern can be recovered through its relationship with the observable matrix of site pattern frequencies in pairwise sequence comparisons. In the parsimony approach, the ancestral sequences reconstructed by the parsimony algorithm were used, and the two sequences compared are those at the ends of a branch in the phylogenetic tree. The method for estimating the gamma parameter was based on a reinterpretation of the numbers of changes at sites inferred by parsimony. Three data sets were analyzed to examine the utility of the approximate methods compared with the more reliable likelihood methods. The new methods for estimating the substitution pattern were found to produce estimates quite similar to those obtained from the likelihood analyses. The new method for estimating the gamma parameter was effective in reducing the bias in conventional parsimony estimates, although it also overestimated the parameter. The approximate methods are computationally very fast and appear useful for analyzing large data sets, for which use of the likelihood method requires excessive computation.
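As background for the "conventional parsimony estimates" mentioned above, the sketch below shows a standard method-of-moments estimator of the gamma shape parameter from per-site change counts. It is not the authors' corrected method: it assumes the counts follow a negative binomial distribution (Poisson counts with gamma-distributed rates), so that variance = mean + mean^2 / alpha, and the counts used are hypothetical.

```python
import statistics

def gamma_shape_moments(changes_per_site):
    """Method-of-moments gamma shape (alpha) from per-site change counts.

    Assumes variance = mean + mean**2 / alpha (negative binomial), so
    alpha = mean**2 / (variance - mean).
    """
    m = statistics.mean(changes_per_site)
    v = statistics.variance(changes_per_site)  # sample variance (n - 1)
    if v <= m:
        raise ValueError("no overdispersion: alpha is not estimable (effectively infinite)")
    return m * m / (v - m)

# Hypothetical numbers of parsimony changes at four sites.
print(gamma_shape_moments([0, 0, 0, 4]))
```

Here the mean is 1 and the sample variance is 4, giving alpha = 1/3, a strongly skewed rate distribution. Because parsimony undercounts multiple changes at fast sites, counts fed into this formula are themselves biased, which is the problem the abstract's reinterpretation addresses.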

20.
We tested whether it is beneficial for the accuracy of phylogenetic inference to sample characters that are evolving under different sets of parameters, using both Bayesian MCMC (Markov chain Monte Carlo) and parsimony approaches. We examined differential rates of evolution among characters, differential character-state frequencies and character-state space, and differential relative branch lengths among characters. We also compared the relative performance of parsimony and Bayesian analyses by progressively incorporating more of these heterogeneous parameters and progressively increasing the severity of this heterogeneity. Bayesian analyses performed better than parsimony when heterogeneous simulation parameters were incorporated into the substitution model. However, parsimony outperformed Bayesian MCMC when heterogeneous simulation parameters were not incorporated into the Bayesian substitution model. The higher the rate of evolution simulated, the better parsimony performed relative to Bayesian analyses. Bayesian and parsimony analyses converged in their performance as the number of simulated heterogeneous model parameters increased. Up to a point, rate heterogeneity among sites was generally advantageous for phylogenetic inference using both approaches. In contrast, branch-length heterogeneity was generally disadvantageous for phylogenetic inference using both parsimony and Bayesian approaches. Parsimony was found to be more conservative than Bayesian analyses, in that it resolved fewer incorrect clades.
© The Willi Hennig Society 2006.

