Similar articles
 Found 20 similar articles (search time: 531 ms)
1.
2.
Previously proposed methods for protein secondary structure prediction from multiple sequence alignments do not efficiently extract the evolutionary information that these alignments contain. The predictions of these methods are less accurate than they could be, because of their failure to consider explicitly the phylogenetic tree that relates aligned protein sequences. As an alternative, we present a hidden Markov model approach to secondary structure prediction that more fully uses the evolutionary information contained in protein sequence alignments. A representative example is presented, and three experiments are performed that illustrate how the appropriate representation of evolutionary relatedness can improve inferences. We explain why similar improvement can be expected in other secondary structure prediction methods and indeed any comparative sequence analysis method.

3.
Using a maximum-likelihood formalism, we have developed a method with which to reconstruct the sequences of ancestral proteins. Our approach allows the calculation of not only the most probable ancestral sequence but also the probability of any amino acid at any given node in the evolutionary tree. Because we consider evolution on the amino acid level, we are better able to include effects of evolutionary pressure and take advantage of structural information about the protein through the use of mutation matrices that depend on secondary structure and surface accessibility. The computational complexity of this method scales linearly with the number of homologous proteins used to reconstruct the ancestral sequence.
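
The per-node amino acid probabilities described in this abstract can be illustrated with a minimal sketch of Felsenstein's pruning recursion. It is not the authors' implementation: the equal-rates 20-state model in `transition_prob`, the uniform prior at the root, the nested-list tree encoding, and `example_tree` are all assumptions made for illustration, standing in for the structure-dependent mutation matrices the abstract mentions. Each node is visited once, which is consistent with the linear scaling in the number of sequences.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
K = len(ALPHABET)

def transition_prob(same: bool, t: float) -> float:
    """P(child state | parent state) along a branch of length t under an
    equal-rates model (an assumed stand-in for a real mutation matrix)."""
    decay = math.exp(-K * t / (K - 1))
    if same:
        return 1.0 / K + (1.0 - 1.0 / K) * decay
    return (1.0 - decay) / K

def partial_likelihood(node):
    """Pruning step: return L[s] = P(leaves below node | node is in state s).
    A node is a one-letter string (leaf) or a list of (child, branch_length)."""
    if isinstance(node, str):
        return [1.0 if aa == node else 0.0 for aa in ALPHABET]
    result = [1.0] * K
    for child, t in node:
        child_like = partial_likelihood(child)
        for i in range(K):
            result[i] *= sum(
                transition_prob(i == j, t) * child_like[j] for j in range(K)
            )
    return result

def root_posterior(tree):
    """Posterior probability of each amino acid at the root for one alignment
    column, assuming a uniform prior over the 20 states."""
    weighted = [like / K for like in partial_likelihood(tree)]
    total = sum(weighted)
    return {aa: w / total for aa, w in zip(ALPHABET, weighted)}

# Hypothetical column: two closely related sequences carry A, a distant one S.
example_tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("S", 0.4)]
posterior = root_posterior(example_tree)
most_probable = max(posterior, key=posterior.get)
```

Running `root_posterior` on every alignment column and taking the highest-probability residue at each gives a most probable ancestral sequence, while the full dictionary retains the uncertainty at that node.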

4.
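Ancestral sequence reconstruction of enzymes is a technique that uses computer algorithms to infer the amino acid sequences of ancestral enzymes from extinct organisms. It generally comprises six steps, in order: collection of the nucleic acid or amino acid sequences of modern enzymes, multiple sequence alignment, phylogenetic tree construction, computational inference of the ancestral enzyme sequences, gene cloning, and characterization of enzymatic properties. The method is widely used to study how molecules have adapted to changing environmental conditions on planetary time scales and the evolutionary mechanisms involved. As enzymes play an increasingly important role in biocatalysis, the method has become a powerful means of studying the relationships among enzyme sequence, structure, and function. In addition, most ancestral enzymes exhibit properties such as thermostability and mutational robustness, which make them ideal protein scaffolds for further directed evolution. This article reviews the computer algorithms, applications, and commonly used software for ancestral sequence reconstruction of enzymes and, in light of recent research progress, discusses its prospects in the directed evolution of enzymes.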

5.
Combining protein evolution and secondary structure   Total citations: 19 (self-citations: 9; citations by others: 10)
An evolutionary model that combines protein secondary structure and amino acid replacement is introduced. It allows likelihood analysis of aligned protein sequences and does not require the underlying secondary (or tertiary) structures of these sequences to be known. One component of the model describes the organization of secondary structure along a protein sequence and another specifies the evolutionary process for each category of secondary structure. A database of proteins with known secondary structures is used to estimate model parameters representing these two components. Phylogeny, the third component of the model, can be estimated from the data set of interest. As an example, we employ our model to analyze a set of sucrose synthase sequences. For the evolution of sucrose synthase, a parametric bootstrap approach indicates that our model is statistically preferable to one that ignores secondary structure.

6.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
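
The contrast drawn in this abstract, between a "best guess" residue at every position and occasional choice of less-probable residues, can be sketched as follows. This is an illustration only, not the authors' simulation code: the function names and the per-site posterior values in `example_posteriors` are hypothetical.

```python
import random

def ml_sequence(posteriors):
    """'Best guess' reconstruction: the most probable residue at each site."""
    return "".join(max(site, key=site.get) for site in posteriors)

def sampled_sequence(posteriors, rng):
    """Bayesian-style reconstruction: draw each residue from its posterior,
    so less-probable residues are sometimes chosen."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0]
        for site in posteriors
    )

# Hypothetical per-site posterior distributions for a three-residue ancestor.
example_posteriors = [
    {"L": 0.7, "M": 0.3},
    {"K": 0.55, "R": 0.45},
    {"A": 1.0},
]

best_guess = ml_sequence(example_posteriors)
rng = random.Random(0)  # fixed seed so the draws are reproducible
samples = [sampled_sequence(example_posteriors, rng) for _ in range(5)]
```

The best-guess sequence never contains the minority residues M or R, whereas a set of sampled sequences carries them at roughly their posterior frequencies. That systematic removal of less frequent variants is the effect the abstract identifies as a source of bias.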

7.
Protein structure is generally more conserved than sequence, but for regions that can adopt different structures in different environments, does this hold true? How structurally disordered regions evolve altered secondary structure element propensities, as well as altered conformational flexibility, among paralogs is a fundamental question for our understanding of protein structural evolution. We have investigated the evolutionary dynamics of structural disorder in protein families containing both orthologs and paralogs using phylogenetic tree reconstruction, protein structure disorder prediction, and secondary structure prediction in order to shed light upon these questions. Our results indicate that the extent and location of structurally disordered regions are not universally conserved. As structurally disordered regions often have high conformational flexibility, this is likely to have an effect on how protein structure evolves, as spatially altered conformational flexibility can also change the secondary structure propensities for homologous regions in a protein family.

8.
The antibiotic alaremycin has a structure that resembles that of 5-aminolevulinic acid (ALA), a universal precursor of porphyrins, and inhibits porphyrin biosynthesis. Genome sequencing of the alaremycin-producing bacterial strain and enzymatic analysis revealed that the first step of alaremycin biosynthesis is catalysed by the enzyme AlmA, which exhibits a high degree of similarity to 5-aminolevulinate synthase (ALAS) expressed by animals, protozoa, fungi, and α-proteobacteria. Site-directed mutagenesis of AlmA revealed that the substitution of two amino acid residues around the substrate binding pocket transformed its substrate specificity from that of alaremycin precursor synthesis to ALA synthesis. To estimate the evolutionary trajectory of AlmA and ALAS, we performed an ancestral sequence reconstruction analysis based on a phylogenetic tree of AlmA and ALAS. The reconstructed common ancestral enzyme of AlmA and ALAS exhibited alaremycin precursor synthetic activity, rather than ALA synthetic activity. These results suggest that ALAS evolved from an AlmA-like enzyme. We propose a new evolutionary hypothesis in which a non-essential secondary metabolic enzyme acts as an ‘evolutionary seed’ to generate an essential primary metabolic enzyme.

9.
Since the dynamic nature of protein structures is essential for enzymatic function, it is expected that functional evolution can be inferred from the changes in protein dynamics. However, dynamics can also diverge neutrally with sequence substitution between enzymes without changes of function. In this study, a phylogenetic approach is implemented to explore the relationship between enzyme dynamics and function through evolutionary history. Protein dynamics are described by normal mode analysis based on a simplified harmonic potential force field applied to the reduced Cα representation of the protein structure, while enzymatic function is described by Enzyme Commission numbers. Similarity of the binding pocket dynamics at each branch of the protein family's phylogeny was analyzed in two ways: (1) explicitly, by quantifying the normal mode overlap calculated for the reconstructed ancestral proteins at each end, and (2) implicitly, using a diffusion model to obtain the reconstructed lineage-specific changes in the normal modes. Both explicit and implicit ancestral reconstruction identified generally faster rates of change in dynamics compared with the expected change from neutral evolution at the branches of potential functional divergences for the α-amylase, D-isomer-specific 2-hydroxyacid dehydrogenase, and copper-containing amine oxidase protein families. Normal mode analysis provided additional information over just comparing the RMSD of static structures. However, the branch-specific changes were not statistically significant compared to background function-independent neutral rates of change of dynamic properties, and blind application of the analysis would not enable prediction of changes in enzyme specificity.

10.
Several lines of evidence such as the basal location of thermophilic lineages in large-scale phylogenetic trees and the ancestral sequence reconstruction of single enzymes or large protein concatenations support the conclusion that the ancestors of the bacterial and archaeal domains were thermophilic organisms which were adapted to hot environments during the early stages of the Earth. A parsimonious reasoning would therefore suggest that the last universal common ancestor (LUCA) was also thermophilic. Various authors have used branch-wise non-homogeneous evolutionary models that better capture the variation of molecular compositions among lineages to accurately reconstruct the ancestral G + C contents of ribosomal RNAs and the ancestral amino acid composition of highly conserved proteins. They confirmed the thermophilic nature of the ancestors of Bacteria and Archaea but concluded that LUCA, their last common ancestor, was a mesophilic organism having a moderate optimal growth temperature. In this letter, we investigate the unknown nature of the phylogenetic signal that informs ancestral sequence reconstruction to support this non-parsimonious scenario. We find that rate variation across sites of molecular sequences provides information at different time scales by recording the oldest adaptation to temperature in slow-evolving regions and subsequent adaptations in fast-evolving ones.

12.
Reconstructing the evolutionary history of protein sequences will provide a better understanding of divergence mechanisms of protein superfamilies and their functions. Long-term protein evolution often includes dynamic changes such as insertion, deletion, and domain shuffling. Such dynamic changes make reconstructing protein sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple alignments and phylogenetic methods. Unfortunately, currently available simulation methods are not sufficiently flexible and do not allow biologically realistic dynamic protein sequence evolution. We introduce a new method, indel-Seq-Gen (iSG), that can simulate realistic evolutionary processes of protein sequences with insertions and deletions (indels). Unlike other simulation methods, iSG allows the user to simulate multiple subsequences according to different evolutionary parameters, which is necessary for generating realistic protein families with multiple domains. iSG tracks all evolutionary events including indels and outputs the "true" multiple alignment of the simulated sequences. iSG can also generate a larger sequence space by allowing the use of multiple related root sequences. With all these functions, iSG can be used to test the accuracy of, for example, multiple alignment methods, phylogenetic methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein family classification methods. We empirically evaluated the performance of iSG against currently available methods by simulating the evolution of the G protein-coupled receptor and lipocalin protein families. We examined their true multiple alignments, reconstruction of the transmembrane regions and beta-strands, and the results of similarity search against a protein database using the simulated sequences. We also present an example of using iSG for examining how phylogenetic reconstruction is affected by high indel rates.

13.
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi‐nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.
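
Two of the genetic-code ingredients named in this abstract, the number of codons per amino acid and the transition/transversion distinction, can be made concrete with a short sketch. It uses the standard genetic code only and is not the authors' model: the variable and function names are chosen here for illustration, and no fitness or stability term is included.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid (or stop).
codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_CODE)
}

# Number of sense codons encoding each amino acid.
codons_per_amino_acid = Counter(aa for aa in codon_table.values() if aa != "*")

PURINES = {"A", "G"}

def is_transition(base_a: str, base_b: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes;
    any other change between two different bases is a transversion."""
    return base_a != base_b and (base_a in PURINES) == (base_b in PURINES)

def nucleotide_differences(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ (values above 1
    correspond to multi-nucleotide codon changes)."""
    return sum(a != b for a, b in zip(codon_a, codon_b))
```

Leucine, serine, and arginine each have six codons while methionine and tryptophan have one, so even before selection acts, mutation through the code makes some amino acid replacements more accessible than others.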

14.
On the Evolution of Structure in Aminoacyl-tRNA Synthetases   Total citations: 10 (self-citations: 0; citations by others: 10)
The aminoacyl-tRNA synthetases are one of the major protein components in the translation machinery. These essential proteins are found in all forms of life and are responsible for charging their cognate tRNAs with the correct amino acid. The evolution of the tRNA synthetases is of fundamental importance with respect to the nature of the biological cell and the transition from an RNA world to the modern world dominated by protein-enzymes. We present a structure-based phylogeny of the aminoacyl-tRNA synthetases. By using structural alignments of all of the aminoacyl-tRNA synthetases of known structure in combination with a new measure of structural homology, we have reconstructed the evolutionary history of these proteins. In order to derive unbiased statistics from the structural alignments, we introduce a multidimensional QR factorization which produces a nonredundant set of structures. Since protein structure is more highly conserved than protein sequence, this study has allowed us to glimpse the evolution of protein structure that predates the root of the universal phylogenetic tree. The extensive sequence-based phylogenetic analysis of the tRNA synthetases (Woese et al., Microbiol. Mol. Biol. Rev. 64:202-236, 2000) has further enabled us to reconstruct the complete evolutionary profile of these proteins and to make connections between major evolutionary events and the resulting changes in protein shape. We also discuss the effect of functional specificity on protein shape over the complex evolutionary course of the tRNA synthetases.

15.
Pax proteins play a diverse role in early animal development and contain the characteristic paired domain, consisting of two conserved helix-turn-helix motifs. In many Pax proteins the paired domain is fused to a second DNA binding domain of the paired-like homeobox family. By amino acid sequence alignments, secondary structure prediction, 3D-structure comparison, and phylogenetic reconstruction, we analyzed the relationship between Pax proteins and members of the Tc1 family of transposases, which possibly share a common ancestor with Pax proteins. We suggest that the DNA binding domain of an ancestral transposase (proto-Pax transposase) was fused to a homeodomain shortly after the emergence of metazoans about one billion years ago. Using the transposase sequences as an outgroup we reexamined the early evolution of the Pax proteins. Our novel evolutionary scenario features a single homeobox capturing event and an early duplication of Pax genes before the divergence of porifera, indicating a more diverse role of Pax proteins in primitive animals than previously expected. Received: 16 February 2000 / Accepted: 13 August 2000

16.
17.
The ancestors of the archosaurs, a major branch of the diapsid reptiles, originated more than 240 MYA near the dawn of the Triassic Period. We used maximum likelihood phylogenetic ancestral reconstruction methods and explored different models of evolution for inferring the amino acid sequence of a putative ancestral archosaur visual pigment. Three different types of maximum likelihood models were used: nucleotide-based, amino acid-based, and codon-based models. Where possible, within each type of model, likelihood ratio tests were used to determine which model best fit the data. Ancestral reconstructions of the ancestral archosaur node using the best-fitting models of each type were found to be in agreement, except for three amino acid residues at which one reconstruction differed from the other two. To determine if these ancestral pigments would be functionally active, the corresponding genes were chemically synthesized and then expressed in a mammalian cell line in tissue culture. The expressed artificial genes were all found to bind to 11-cis-retinal to yield stable photoactive pigments with λmax values of about 508 nm, which is slightly redshifted relative to that of extant vertebrate pigments. The ancestral archosaur pigments also activated the retinal G protein transducin, as measured in a fluorescence assay. Our results show that ancestral genes from ancient organisms can be reconstructed de novo and tested for function using a combination of phylogenetic and biochemical methods.

18.
Although probabilistic models of genotype (e.g., DNA sequence) evolution have been greatly elaborated, less attention has been paid to the effect of phenotype on the evolution of the genotype. Here we propose an evolutionary model and a Bayesian inference procedure that are aimed at filling this gap. In the model, RNA secondary structure links genotype and phenotype by treating the approximate free energy of a sequence folded into a secondary structure as a surrogate for fitness. The underlying idea is that a nucleotide substitution resulting in a more stable secondary structure should have a higher rate than a substitution that yields a less stable secondary structure. This free energy approach incorporates evolutionary dependencies among sequence positions beyond those that are reflected simply by jointly modeling change at paired positions in an RNA helix. Although there is not a formal requirement with this approach that secondary structure be known and nearly invariant over evolutionary time, computational considerations make these assumptions attractive and they have been adopted in a software program that permits statistical analysis of multiple homologous sequences that are related via a known phylogenetic tree topology. Analyses of 5S ribosomal RNA sequences are presented to illustrate and quantify the strong impact that RNA secondary structure has on substitution rates. Analyses on simulated sequences show that the new inference procedure has reasonable statistical properties. Potential applications of this procedure, including improved ancestral sequence inference and location of functionally interesting sites, are discussed.

19.
We describe a method for predicting the three-dimensional (3-D) structure of proteins from their sequence alone. The method is based on the electrostatic screening model for the stability of the protein main-chain conformation. The free energy of a protein as a function of its conformation is obtained from the potentials of mean force analysis of high-resolution x-ray protein structures. The free energy function is simple and contains only 44 fitted coefficients. The minimization of the free energy is performed by the torsion space Monte Carlo procedure using the concept of hierarchic condensation. The Monte Carlo minimization procedure is applied to predict the secondary, super-secondary, and native 3-D structures of 12 proteins with 28–110 amino acids. The 3-D structures of the majority of local secondary and super-secondary structures are predicted accurately. This result suggests that control in forming the native-like local structure is distributed along the entire protein sequence. The native 3-D structure is predicted correctly for 3 of 12 proteins composed mainly of α-helices. The method fails to predict the native 3-D structure of proteins with a predominantly β secondary structure. We suggest that the hierarchic condensation is not an appropriate procedure for simulating the folding of proteins made up primarily of β-strands. The method has been proved accurate in predicting the local secondary and super-secondary structures in the blind ab initio 3-D prediction experiment. Proteins 31:74–96, 1998. © 1998 Wiley-Liss, Inc.

20.