Similar literature
 20 similar articles found (search time: 750 ms)
1.
The 567-terminal analysis of atpB, rbcL, and 18S rDNA was used as an empirical example to test the use of amino acid vs. nucleotide characters for protein-coding genes at deeper taxonomic levels. Nucleotides for atpB and rbcL had 6.5 times as much potential synapomorphy as amino acids. Based on parsimony analyses with unordered character states, nucleotides outperformed amino acids for all three measures of phylogenetic signal used (resolution, branch support, and congruence with independent evidence). The nucleotide tree was much more resolved than the amino acid tree, for both large and small clades. Nearly twice the percentage of well-supported clades resolved in the 18S rDNA tree were resolved using nucleotides (91.8%) relative to amino acids (49.2%). The well-supported clades resolved by both character types were much better supported by nucleotides (98.7% vs. 83.8% average jackknife support). The faster evolving nucleotides with a smaller average character-state space outperformed the slower evolving amino acids with a larger average character-state space. Nucleotides outperformed amino acids even with 90% of the terminals deleted. The lack of resolution on the amino acid trees appears to be caused by a lack of congruence among the amino acids, not a lack of replacement substitutions.

2.
We examined a broad selection of protein-coding loci from a diverse array of clades and genomes to quantify three factors that determine whether nucleotide or amino acid characters should be preferred for phylogenetic inference. First, we quantified the difference in observed character-state space between nucleotides and amino acids. Second, we quantified the loss of potential phylogenetic signal from silent substitutions when amino acids are used. Third, we used the disparity index to quantify the relative compositional heterogeneity of nucleotides and amino acids and then determined how commonly convergent (rather than unique) shifts in nucleotide and amino acid composition occur in a phylogenetic context. The greater potential phylogenetic signal for nucleotide characters was found to be enormous (on average 440% that of amino acids), whereas the greater observed character-state space for amino acids was less impressive (on average 150.4% that of nucleotides). While matrices of amino acid sequences had less compositional heterogeneity than their corresponding nucleotide sequences, heterogeneity in amino acid composition may be more homoplasious than heterogeneity in nucleotide composition. Given the ability of increased taxon sampling to better utilize the greater potential phylogenetic signal of nucleotide characters and decrease the potential for artifacts caused by heterogeneous nucleotide composition among taxa, we suggest that increased taxon sampling be performed whenever possible instead of restricting analyses to amino acid characters.

3.
The presence in proteins of amino acid residues that change in concert during evolution is associated with maintaining the protein's spatial structure and function. As with morphological features, correlated substitutions may cause homoplasies, that is, the independent evolution of identical non-homologous adaptations. Our data obtained on model phylogenetic trees and the corresponding sets of sequences show that the presence of correlated substitutions distorts the results of phylogenetic reconstructions. A method for accounting for co-evolving amino acid residues in phylogenetic analysis is proposed. According to this method, only a single site from each group of correlated amino acid positions is retained, and the other positions are excluded from further phylogenetic analysis. Simulations show that replacing, on average, 8% of the variable positions in a pair of model sequences with coordinately evolving amino acid residues is enough to change the tree topology. Removing such amino acid residues from the sequences before phylogenetic analysis restores the correct topology.
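The filtering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `drop_correlated_sites`, the toy alignment, and the choice to keep the first column of each group as the representative are all assumptions, and the groups of correlated columns are taken as given rather than detected.

```python
from typing import Dict, Iterable


def drop_correlated_sites(
    alignment: Dict[str, str], groups: Iterable[Iterable[int]]
) -> Dict[str, str]:
    """Keep one column per group of correlated columns and drop the rest."""
    to_drop = set()
    for group in groups:
        ordered = sorted(set(group))
        # Assumption: the first column of each group is kept as its representative.
        to_drop.update(ordered[1:])
    return {
        name: "".join(res for i, res in enumerate(seq) if i not in to_drop)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment; columns 1 and 4 (0-based) are assumed to co-vary.
toy = {"taxon_a": "MKVLAE", "taxon_b": "MRVLDE", "taxon_c": "MKILAE"}
filtered = drop_correlated_sites(toy, [[1, 4]])
print(filtered)
```

The filtered alignment would then be passed to whatever tree-building method is in use; detecting the correlated groups themselves is outside the scope of this sketch.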

4.
Miyazawa S 《PloS one》2011,6(12):e28892
BACKGROUND: A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. RESULTS: The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints. Their good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals. 
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths.

5.
Traditional methods and 10 more recent methods of coding characters from exons of protein‐coding genes are reviewed. The more recent methods collectively blur the distinction between nucleotide and amino‐acid coding and enable investigators to carefully quantify the effects of different sources of phylogenetic signal as well as their potential biases. Codon models, which explicitly model silent and replacement substitutions, are a major advance and are expected to be broadly useful for simultaneously inferring recent and ancient divergences, unlike amino‐acid coding. Degeneracy coding, wherein ambiguity codes are used to eliminate silent substitutions at the individual‐nucleotide level, has clear advantages over scoring amino‐acid characters. Nucleotide, codon, and amino‐acid models are now directly comparable with easy‐to‐use programs, and widely used phylogenetics programs can analyze partitioned supermatrices that incorporate all three types of model. Therefore, it should become standard practice to test among these alternative model types before conducting parametric phylogenetic analyses. An earlier study of 78 protein‐coding genes from 360 green‐plant plastid genomes is used as an empirical example with which to quantify the relative performance of alternative character‐coding methods using five quantification measures. Codon models were selected as having the best fit to the data, yet were outperformed by nucleotide models for all five quantification measures. Third‐codon positions were found to be an important source of phylogenetic signal and even outperformed analyses of first and second positions for some measures. Degeneracy coding generally performed at least as well as amino‐acid coding and is arguably a more effective alternative.
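The idea behind degeneracy coding can be illustrated with a deliberately crude sketch: each codon is replaced by the position-wise IUPAC ambiguity code covering all codons for the same amino acid in the standard genetic code, so that sequences differing only by silent substitutions become identical. The function names are hypothetical, the input is assumed to be in frame and to contain only A, C, G, and T, and published degeneracy-coding schemes treat the six-fold degenerate amino acids (Leu, Ser, Arg) more carefully than this union does.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G",
    frozenset("T"): "T", frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W", frozenset("GT"): "K",
    frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_codon(codon: str) -> str:
    """Position-wise IUPAC union of all codons synonymous with `codon`."""
    aa = CODON_TABLE[codon]
    synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
    return "".join(IUPAC[frozenset(c[i] for c in synonyms)] for i in range(3))


def degenerate_sequence(seq: str) -> str:
    """Degeneracy-code an in-frame coding sequence; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(
        degenerate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3)
    )


# Two sequences that differ only by silent third-position changes (Gly, Lys).
print(degenerate_sequence("GGTAAA"), degenerate_sequence("GGCAAG"))
```

Note that this crude union recodes serine as `WSN`, which also matches codons for other amino acids; that over-degeneration is one reason real schemes handle the six-fold amino acids separately.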

6.
The Molecular Evolution of the Small Heat-Shock Proteins in Plants
E. R. Waters 《Genetics》1995,141(2):785-795
The small heat-shock proteins have undergone a tremendous diversification in plants; whereas only a single small heat-shock protein is found in fungi and many animals, over 20 different small heat-shock proteins are found in higher plants. The small heat-shock proteins in plants have diversified in both sequence and cellular localization and are encoded by at least five gene families. In this study, 44 small heat-shock protein DNA and amino acid sequences were examined, using both phylogenetic analysis and analysis of nucleotide substitution patterns to elucidate the evolutionary history of the small heat-shock proteins. The phylogenetic relationships of the small heat-shock proteins, estimated using parsimony and distance methods, reveal that gene duplication, sequence divergence and gene conversion have all played a role in the evolution of the small heat-shock proteins. Analysis of nonsynonymous substitutions and conservative and radical replacement substitutions (in relation to hydrophobicity) indicates that the small heat-shock protein gene families are evolving at different rates. This suggests that the small heat-shock proteins may have diversified in function as well as in sequence and cellular localization.

7.
Ciliates provide a powerful system to analyze the evolution of duplicated alpha-tubulin genes in the context of single-celled organisms. Genealogical analyses of ciliate alpha-tubulin sequences reveal five apparently recent gene duplications. Comparisons of paralogs in different ciliates implicate differing patterns of substitutions (e.g., ratios of replacement/synonymous nucleotides and radical/conservative amino acids) following duplication. Most substitutions between paralogs in Euplotes crassus, Halteria grandinella and Paramecium tetraurelia are synonymous. In contrast, alpha-tubulin paralogs within Stylonychia lemnae and Chilodonella uncinata are evolving at significantly different rates and have higher ratios of both replacement substitutions to synonymous substitutions and radical amino acid changes to conservative amino acid changes. Moreover, the amino acid substitutions in C. uncinata and S. lemnae paralogs are limited to short stretches that correspond to functionally important regions of the alpha-tubulin protein. The topologies of ciliate alpha-tubulin genealogies are inconsistent with taxonomy based on morphology and other molecular markers, which may be due to taxonomic sampling, gene conversion, unequal rates of evolution, or asymmetric patterns of gene duplication and loss.

8.
Abstract— Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that the combination of nucleic acid and the translated amino acid coded character states into the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. 
Second, one character set may display no information concerning a phylogenetic hypothesis while the other character set may impart information to a hypothesis. These two possibilities are cases of non-independence; however, we argue that congruence in such cases can be thought of as increasing the weight of the particular phylogenetic hypothesis that is supported by those characters. In the third case, the two sources of character information for a particular codon may be entirely incongruent with respect to phylogenetic hypotheses concerning the taxa examined. In this last case the two character sets are independent in that information from neither can predict the character states of the other. Examples of these possibilities are discussed and the general applicability of combining these two sources of information for protein coding genes is presented using sequences from the homeobox region of 46 homeobox genes from Drosophila melanogaster to develop a hypothesis of genealogical relationship of these genes in this large multigene family.

9.
Protein-coding genes may be analyzed in phylogenetic analyses using nucleotide-sequence characters and/or amino-acid-sequence characters. Although amino-acid-sequence characters "correct" for saturation (parallelism), amino-acid-sequence characters are subject to convergence and ignore phylogenetically informative variation. When all nucleotide-sequence characters have a consistency index of 1, characters coded using the amino acid sequence may have a consistency index of less than 1. The reason for this is that most amino acids are specified by more than one codon. If two different codons that both code for the same amino acid are derived independently of one another in divergent lineages, nucleotide-sequence characters may not be homoplasious while amino-acid-sequence characters may be homoplasious. Not only may amino-acid-sequence characters support groupings that are not supported by nucleotide-sequence characters, they may support contradictory groupings. Because this convergence is a problem of character delimitation, it affects the results of all tree-construction methods (maximum likelihood, neighbor joining, parsimony, etc.). In effect, coding amino-acid-sequence characters instead of nucleotide-sequence characters putatively corrects for saturation and definitely causes a convergence problem. An empirical example from the Mhc locus is given.
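A small worked case may make the convergence argument concrete. The codons below are a hypothetical illustration chosen from the standard genetic code, not the Mhc example from the paper: two lineages each change a different nucleotide position of an ancestral threonine codon, yet both arrive at serine.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

ancestor = "ACT"   # Thr
lineage_1 = "TCT"  # Ser, reached by one change at codon position 1 (A -> T)
lineage_2 = "AGT"  # Ser, reached by one change at codon position 2 (C -> G)


def derived_states(anc: str, desc: str) -> set:
    """Return {(position, new base)} for every site where desc differs from anc."""
    return {(i, desc[i]) for i in range(3) if anc[i] != desc[i]}


# Nucleotide characters: the two lineages share no derived state.
shared_nucleotide_states = derived_states(ancestor, lineage_1) & derived_states(
    ancestor, lineage_2
)

# Amino acid character: both lineages show the same derived state.
same_derived_amino_acid = (
    CODON_TABLE[lineage_1] == CODON_TABLE[lineage_2] != CODON_TABLE[ancestor]
)

print(shared_nucleotide_states, same_derived_amino_acid)
```

Scored as nucleotides, each change is unique to its lineage and no homoplasy arises; scored as amino acids, serine appears to have evolved twice, which is the convergence the abstract describes.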

10.
Phylogenetic analyses based on mitochondrial DNA have yielded widely differing relationships among members of the arthropod lineage Arachnida, depending on the nucleotide coding schemes and models of evolution used. We enhanced taxonomic coverage within the Arachnida greatly by sequencing seven new arachnid mitochondrial genomes from five orders. We then used all 13 mitochondrial protein-coding genes from these genomes to evaluate patterns of nucleotide and amino acid biases. Our data show that two of the six orders of arachnids (spiders and scorpions) have experienced shifts in both nucleotide and amino acid usage in all their protein-coding genes, and that these biases mislead phylogeny reconstruction. These biases are most striking for the hydrophobic amino acids isoleucine and valine, which appear to have evolved asymmetrical exchanges in response to shifts in nucleotide composition. To improve phylogenetic accuracy based on amino acid differences, we tested two recoding methods: (1) removing all isoleucine and valine sites and (2) recoding amino acids based on their physicochemical properties. We find that these methods yield phylogenetic trees that are consistent in their support of ancient intraordinal divergences within the major arachnid lineages. Further refinement of amino acid recoding methods may help us better delineate interordinal relationships among these diverse organisms.

11.
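A preliminary exploration of using rRNA secondary-structure sequences in fungal systematics
This paper is the first to use nucleic acid secondary-structure features, rather than nucleotide bases, as the signal for inferring relationships among taxa, and it constructs a structure-based phylogenetic tree for selected groups of ascomycetes. The method uses the codes S (canonical base pair), Q (non-canonical base pair), I (single strand), B (bulge loop), M (multibranch loop), and H (hairpin) to divide secondary-structure features into six substructure types, converts the secondary structure into a structure sequence, and then analyzes the structure sequences. The method takes rRNA beyond base-by-base comparison, broadens its range of application, and offers clues to the relationship between molecular function and evolution. The results indicate that structure-sequence analysis can be used in ascomycete systematics; compared with nucleotide sequence analysis, the structural analysis appears to reflect the evolution of the ascocarp more clearly.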
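One way to picture the conversion from a secondary structure to a six-letter structure sequence is the sketch below. It is an assumption-laden illustration rather than the authors' procedure: the input is taken to be a dot-bracket string without pseudoknots, only Watson-Crick pairs are treated as canonical (so G-U counts as Q), unpaired bases outside every helix are coded I, and unpaired bases in a loop with exactly one inner helix (bulges and interior loops alike) are coded B. The function name `structure_sequence` is hypothetical.

```python
from collections import defaultdict

# Assumption: only Watson-Crick pairs count as canonical ("S").
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def structure_sequence(seq: str, dot_bracket: str) -> str:
    """Translate a sequence plus dot-bracket structure into S/Q/I/B/M/H codes."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    partner = {}                  # paired position -> its partner
    enclosing = {}                # unpaired position -> index of enclosing "("
    children = defaultdict(int)   # index of "(" -> helices directly inside it
    stack = []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            children[stack[-1] if stack else None] += 1
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        else:
            enclosing[i] = stack[-1] if stack else None
    if stack:
        raise ValueError("unbalanced structure")

    codes = []
    for i in range(len(seq)):
        if i in partner:
            pair = (seq[i], seq[partner[i]])
            codes.append("S" if pair in WATSON_CRICK else "Q")
        elif enclosing[i] is None:
            codes.append("I")     # outside every helix
        elif children[enclosing[i]] == 0:
            codes.append("H")     # loop closed by a single pair: hairpin
        elif children[enclosing[i]] == 1:
            codes.append("B")     # bulge or interior loop
        else:
            codes.append("M")     # multibranch loop
    return "".join(codes)


print(structure_sequence("AGGCGAAAGUCA", ".(((....)))."))
```

The resulting strings can be aligned and compared like any other character sequence, which is the sense in which the abstract speaks of "structure sequence analysis".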

12.
We have performed a large-scale analysis of amino acid sequence evolution after gene duplication by comparing evolution after gene duplication with evolution after speciation in over 1,800 phylogenetic trees constructed from manually curated alignments of protein domains downloaded from the PFAM database. The site-specific rate of evolution is significantly altered by gene duplication. A significant increase in the proportion of amino acid substitutions at constrained (slowly evolving) sites after duplication was observed. An increase in the proportion of replacements at normally constrained amino acid sites could result from relaxation of purifying selective pressure. However, the proportion of amino acid replacements involving radical changes in amino acid properties after duplication does not appear to be significantly increased by relaxed selective pressure. The increased proportion of replacements at constrained sites was observed over a relatively large range of protein change (up to 25% amino acid replacements per site). These findings have implications for our understanding of the nature of evolution after duplication and may help to shed light on the evolution of novel protein functions through gene duplication.

13.
Models of amino acid substitution were developed and compared using maximum likelihood. Two kinds of models are considered. "Empirical" models do not explicitly consider factors that shape protein evolution, but attempt to summarize the substitution pattern from large quantities of real data. "Mechanistic" models are formulated at the codon level and separate mutational biases at the nucleotide level from selective constraints at the amino acid level. They account for features of sequence evolution, such as transition-transversion bias and base or codon frequency biases, and make use of physicochemical distances between amino acids to specify nonsynonymous substitution rates. A general approach is presented that transforms a Markov model of codon substitution into a model of amino acid replacement. Protein sequences from the entire mitochondrial genomes of 20 mammalian species were analyzed using different models. The mechanistic models were found to fit the data better than empirical models derived from large databases. Both the mutational distance between amino acids (determined by the genetic code and mutational biases such as the transition-transversion bias) and the physicochemical distance are found to have strong effects on amino acid substitution rates. A significant proportion of amino acid substitutions appeared to have involved more than one codon position, indicating that nucleotide substitutions at neighboring sites may be correlated. Rates of amino acid substitution were found to be highly variable among sites.

14.
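Partial sequences of the mitochondrial cytochrome b gene (Cyt b) were studied in four species of the genus 露螽属, and the nucleotide composition and variation and the amino acid differences were analyzed. In the 432 bp sequence obtained, A+T accounted for about 66.9%, and 102 nucleotide sites were variable (about 23.8%). By codon position, the A+T content was relatively high at the third position, at 79.7%. The 144 amino acids encoded by the Cyt b fragment comprise 19 kinds of amino acids; 12 residues were variable, accounting for 8.33% of the total. Leucine (Leu) and phenylalanine (Phe) were relatively abundant, glutamic acid (Glu), lysine (Lys), and arginine (Arg) were relatively scarce, and cysteine (Cys) was absent. An NJ molecular phylogenetic tree constructed with 日本纺织娘 and 中华螽斯 as outgroups shows that 镰尾露螽 and 齿尾露螽 diverged relatively late, followed by 瘦露螽, while 黑角露螽 diverged relatively early.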
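The composition statistics reported here are simple to compute. The sketch below is a generic illustration with hypothetical function names and a made-up toy sequence, not the study's data; it assumes the fragment starts at the first position of a codon and ignores any character other than A, C, G, or T.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases of a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    return sum(base in "AT" for base in counted) / len(counted)


def at_content_by_codon_position(seq: str) -> tuple:
    """A+T fraction at codon positions 1, 2, and 3 (frame assumed to start at base 1)."""
    return tuple(at_content(seq[offset::3]) for offset in range(3))


toy_fragment = "ATGATAATT"  # hypothetical codons ATG, ATA, ATT
print(at_content(toy_fragment))
print(at_content_by_codon_position(toy_fragment))

# Proportions quoted in the abstract, recomputed from its own counts.
variable_amino_acids = 12 / 144 * 100   # about 8.33%
variable_nucleotides = 102 / 432 * 100  # about 23.6%
print(round(variable_amino_acids, 2), round(variable_nucleotides, 1))
```

Recomputing from the stated counts gives 12/144 = 8.33%, which matches the abstract, while 102/432 comes to roughly 23.6% rather than the 23.8% quoted; the reported figure may rest on a slightly different denominator.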

15.
As the number of sequenced genomes from diverse walks of life rapidly increases, phylogenetic analysis is entering a new era: reconstruction of the evolutionary history of organisms on the basis of full-scale comparison of their genomes. In addition to brute force, genome-wide analysis of alignments, rare genomic changes (RGCs) that are thought to comprise derived shared characters of individual clades are increasingly used in genome-wide phylogenetic studies. We propose a new type of RGC designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions), which are inferred using a genome-scale analysis of protein and underlying nucleotide sequence alignments. The RGC_CAM approach utilizes amino acid residues conserved in major eukaryotic lineages, with the exception of a few species comprising a putative clade, and selects for phylogenetic inference only those amino acid replacements that require 2 or 3 nucleotide substitutions, in order to reduce homoplasy. The RGC_CAM analysis was combined with a procedure for rigorous statistical testing of competing phylogenetic hypotheses. The RGC_CAM method is shown to be robust to branch length differences and taxon sampling. When applied to animal phylogeny, the RGC_CAM approach strongly supports the coelomate clade that unites chordates with arthropods as opposed to the ecdysozoan (molting animals) clade. This conclusion runs against the view of animal evolution that is currently prevailing in the evo-devo community. The final solution to the coelomate-ecdysozoa controversy will require a much larger set of complete genome sequences representing diverse animal taxa. It is expected that RGC_CAM and other RGC-based methods will be crucial for these future, definitive phylogenetic studies.
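The selection criterion, amino acid replacements that need 2 or 3 nucleotide substitutions, can be approximated from the genetic code alone. The sketch below takes the minimum number of differing positions over all codon pairs for two amino acids in the standard genetic code; this is an assumption made for illustration, since the published method works from the underlying nucleotide alignments, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nucleotide_changes(aa_from: str, aa_to: str) -> int:
    """Fewest nucleotide differences between any codon of aa_from and any of aa_to."""
    codons_from = [c for c, a in CODON_TABLE.items() if a == aa_from]
    codons_to = [c for c, a in CODON_TABLE.items() if a == aa_to]
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_from
        for c2 in codons_to
    )


def needs_multiple_substitutions(aa_from: str, aa_to: str) -> bool:
    """True when the replacement cannot be reached by a single nucleotide change."""
    return min_nucleotide_changes(aa_from, aa_to) >= 2


for pair in [("K", "R"), ("M", "W"), ("D", "W")]:
    print(pair, min_nucleotide_changes(*pair), needs_multiple_substitutions(*pair))
```

Under this approximation a Lys-to-Arg replacement would be excluded (one change suffices), whereas Met-to-Trp and Asp-to-Trp would qualify, needing two and three changes respectively.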

16.
This study examines the pattern of opsin nucleotide and amino acid substitution among mimetic species 'rings' of Heliconius butterflies that are characterized by divergent wing colour patterns. A long wavelength opsin gene, OPS1, was sequenced from each of seven species of Heliconius and one species of Dryas (Lepidoptera: Nymphalidae). A parsimony analysis of OPS1 nucleotide and amino acid sequences resulted in a phylogeny that was consistent with that presented by Brower & Egan in 1997, which was based on mitochondrial cytochrome oxidase I and II as well as nuclear wingless genes. Nodes in the OPS1 phylogeny were well supported by bootstrap analysis and decay indices. An analysis of specific sites within the gene indicates that the accumulation of amino acid substitutions has occurred independently of the morphological diversification of Heliconius wing colour patterns. Amino acid substitutions were examined with respect to their location within the opsin protein and their possible interactions with the chromophore and the G-protein. Of the 15 amino acid substitutions identified among the eight species, one nonconservative replacement (A226Q) was identified in a position that may be involved in binding with the G-protein.

17.
In the past, 2 kinds of Markov models have been considered to describe protein sequence evolution. Codon-level models have been mechanistic with a small number of parameters designed to take into account features, such as transition-transversion bias, codon frequency bias, and synonymous-nonsynonymous amino acid substitution bias. Amino acid models have been empirical, attempting to summarize the replacement patterns observed in large quantities of data and not explicitly considering the distinct factors that shape protein evolution. We have estimated the first empirical codon model (ECM). Previous codon models assume that protein evolution proceeds only by successive single nucleotide substitutions, but our results indicate that model accuracy is significantly improved by incorporating instantaneous doublet and triplet changes. We also find that the affiliations between codons, the amino acid each encodes, and the physicochemical properties of the amino acids are the main factors driving the process of codon evolution. Neither multiple nucleotide changes nor the strong influence of the genetic code nor amino acids' physicochemical properties form a part of standard mechanistic models and their views of how codon evolution proceeds. We have implemented the ECM for likelihood-based phylogenetic analysis, and an assessment of its ability to describe protein evolution shows that it consistently outperforms comparable mechanistic codon models. We point out the biological interpretation of our ECM and possible consequences for studies of selection.

18.
The phylogenetic relationships among the Drosophila melanogaster subgroup species were analyzed using approximately 1550-nucleotide-long sequences of the Cu,Zn SOD gene. Phylogenetic analysis was performed using separately the whole region and the intron sequences of the gene. The resulting phylogenetic trees reveal virtually the same topology, separating the species into distinct clusters. The inferred topology generally agrees with previously proposed classifications based on morphological and molecular data. The amino acid sequences of the Cu,Zn SOD of the D. melanogaster subgroup species reveal a high-conservation pattern. Only 3.9% of the total amino acid sites are variable, and none affects the major structural elements. Comparison of the Drosophila Cu,Zn SOD amino acid sequences with the Cu,Zn SOD of Bos taurus and Xenopus laevis (whose three-dimensional structure has been elucidated) reveals conservation of all the protein's functionally important amino acids and no substitutions that dramatically change the charge or the polarity of the amino acids.

19.
In an attempt to define the phylogenetic relationships among 17 phenotypically related species of the genera Enterobacter, Pantoea, Serratia, Klebsiella and Erwinia, we determined almost all of their groE operon sequences using the polymerase chain reaction direct sequencing method. The number of nucleotide substitutions per site was 0.12 ± 0.030. The value was 3.6-fold higher than that of 16S rDNA. As a result, we were successful in constructing molecular phylogenetic trees which had a finer resolution than that based on the 16S rDNA sequences. The phylogenetic trees based on the nucleotide sequences and deduced amino acid sequences of groE operons indicated that the members of the genera Enterobacter, Pantoea and Klebsiella were closely related to each other, while Serratia and Erwinia species, except Erwinia carotovora, formed distinct clades. The close relationship between Enterobacter aerogenes and Klebsiella pneumoniae, which had been suggested by biochemical tests and DNA hybridization, was also supported by our molecular phylogenetic trees.

20.
Understanding the patterns and causes of protein sequence evolution is a major challenge in evolutionary biology. One of the critical unresolved issues is the relative contribution of selection and genetic drift to the fixation of amino acid sequence differences between species. Molecular homoplasy, the independent evolution of the same amino acids at orthologous sites in different taxa, is one potential signature of selection; however, relatively little is known about its prevalence in eukaryotic proteomes. To quantify the extent and type of homoplasy among evolving proteins, we used phylogenetic methodology to analyze 8 genome-scale data matrices from clades of different evolutionary depths that span the eukaryotic tree of life. We found that the frequency of homoplastic amino acid substitutions in eukaryotic proteins was more than 2-fold higher than expected under neutral models of protein evolution. The overwhelming majority of homoplastic substitutions were parallelisms that involved the most frequently exchanged amino acids with similar physicochemical properties and that could be reached by a single-mutational step. We conclude that the role of homoplasy in shaping the protein record is much larger than generally assumed, and we suggest that its high frequency can be explained by both weak positive selection for certain substitutions and purifying selection that constrains substitutions to a small number of functionally equivalent amino acids.
