Found 20 similar articles (search time: 15 ms)
1.
Aravind L Mazumder R Vasudevan S Koonin EV 《Current opinion in structural biology》2002,12(3):392-399
Complementary developments in comparative genomics, protein structure determination and in-depth comparison of protein sequences and structures have provided a better understanding of the prevailing trends in the emergence and diversification of protein domains. The investigation of deep relationships among different classes of proteins involved in key cellular functions, such as nucleic acid polymerases and other nucleotide-dependent enzymes, indicates that a substantial set of diverse protein domains evolved within the primordial, ribozyme-dominated RNA world.
2.
Proteins are finicky molecules; they are barely stable and are prone to aggregate, but they must function in a crowded environment that is full of degradative enzymes bent on their destruction. It is no surprise that many common diseases are due to missense mutations that affect protein stability and aggregation. Here we review the literature on biophysics as it relates to molecular evolution, focusing on how protein stability and aggregation affect organismal fitness. We then advance a biophysical model of protein evolution that helps us to understand phenomena that range from the dynamics of molecular adaptation to the clock-like rate of protein evolution.
3.
Hall BG 《Molecular biology and evolution》2008,25(4):688-695
Phylogenetic reconstruction based upon multiple alignments of molecular sequences is important to most branches of modern biology and is central to molecular evolution. Understanding the historical relationships among macromolecules depends upon computer programs that implement a variety of analytical methods. Because it is impossible to know those historical relationships with certainty, assessment of the accuracy of methods and the programs that implement them requires the use of programs that realistically simulate the evolution of DNA sequences. EvolveAGene 3 is a realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions, including variable regions of selection intensity within the sequence and variation in intensity of selection over branches. Variation includes base substitutions, insertions, and deletions. To the best of my knowledge, it is the only program available that simulates the evolution of intact coding sequences. Output includes the true tree and true alignments of the resulting coding sequence and corresponding protein sequences. A log file reports the frequencies of each kind of base substitution, the ratio of transition to transversion substitutions, the ratio of indel to base substitution mutations, and the numbers of silent and amino acid replacement mutations. The realism of the data sets has been assessed by comparing the dN/dS ratio, the ratio of transition to transversion substitutions, and the ratio of indel to base substitution mutations of the simulated data sets with those parameters of real data sets from the "gold standard" BaliBase collection of structural alignments. Results show that the data sets produced by EvolveAGene 3 are very similar to real data sets, and EvolveAGene 3 is therefore a realistic simulation program that can be used to evaluate a variety of programs and methods in molecular evolution.
4.
Tuck Seng Wong - Both authors contributed equally to this work Danilo Roccatano Ulrich Schwaneberg 《Biocatalysis and Biotransformation》2007,25(2):229-241
Directed protein evolution is the most versatile method for studying protein structure-function relationships, and for tailoring a protein's properties to the needs of industrial applications. In this review, we performed a statistical analysis on the genetic code to study the extent and consequence of the organization of the genetic code on amino acid substitution patterns generated in directed evolution experiments. In detail, we analyzed amino acid substitution patterns caused by (a) a single nucleotide (nt) exchange at each position of all 64 codons, and (b) two subsequent nt exchanges (first and second nt, first and third nt, second and third nt). Additionally, transition and transversion mutations were compared at the level of amino acid substitution patterns. The latter analysis showed that a single nucleotide substitution in a codon generates only 39.5% of the natural diversity on the protein level, with 5.2-7 amino acid substitutions per codon. Transversions generate more complex amino acid substitution patterns (increased number and chemically more diverse amino acid substitutions) than transitions. Simultaneous nt exchanges at both the first and second nt of a codon generate very diverse amino acid substitution patterns, achieving 83.2% of the natural diversity. The statistical analysis described in this review sets the objectives for novel random mutagenesis methods that address the consequences of the organization of the genetic code. Random mutagenesis methods that favor transversions or introduce consecutive nt exchanges can contribute in this regard.
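The kind of enumeration described in this abstract (single nt exchanges at each codon position, split into transitions and transversions) can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis: it assumes the standard genetic code, ignores substitutions to stop codons, and the names `CODON_TABLE`, `single_exchanges` and `substitutions` are hypothetical. It is not claimed to reproduce the percentages quoted in the abstract, which depend on how the authors defined "natural diversity".

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Purine<->purine and pyrimidine<->pyrimidine exchanges are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def single_exchanges(codon):
    """Yield (mutant_codon, kind) for every single nt exchange in codon."""
    for pos, old in enumerate(codon):
        for new in BASES:
            if new == old:
                continue
            kind = "transition" if (old, new) in TRANSITIONS else "transversion"
            yield codon[:pos] + new + codon[pos + 1:], kind


def substitutions(codon, kind=None):
    """Amino acids reachable by one nt exchange, excluding the encoded
    residue itself and stop codons. kind restricts the mutation type."""
    wild_type = CODON_TABLE[codon]
    reachable = set()
    for mutant, mutation_kind in single_exchanges(codon):
        if kind is not None and mutation_kind != kind:
            continue
        aa = CODON_TABLE[mutant]
        if aa != wild_type and aa != "*":
            reachable.add(aa)
    return reachable


if __name__ == "__main__":
    sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
    for kind in (None, "transition", "transversion"):
        mean = sum(len(substitutions(c, kind)) for c in sense) / len(sense)
        print(kind or "all exchanges", round(mean, 2))
```

For example, `substitutions("ATG")` lists the residues reachable from the methionine codon, and passing `kind="transition"` or `kind="transversion"` restricts the count to one mutation type, which is the comparison the abstract draws.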
5.
Tuck Seng Wong Danilo Roccatano 《Biocatalysis and Biotransformation》2013,31(2-4):229-241
Directed protein evolution is the most versatile method for studying protein structure–function relationships, and for tailoring a protein's properties to the needs of industrial applications. In this review, we performed a statistical analysis on the genetic code to study the extent and consequence of the organization of the genetic code on amino acid substitution patterns generated in directed evolution experiments. In detail, we analyzed amino acid substitution patterns caused by (a) a single nucleotide (nt) exchange at each position of all 64 codons, and (b) two subsequent nt exchanges (first and second nt, first and third nt, second and third nt). Additionally, transition and transversion mutations were compared at the level of amino acid substitution patterns. The latter analysis showed that a single nucleotide substitution in a codon generates only 39.5% of the natural diversity on the protein level, with 5.2–7 amino acid substitutions per codon. Transversions generate more complex amino acid substitution patterns (increased number and chemically more diverse amino acid substitutions) than transitions. Simultaneous nt exchanges at both the first and second nt of a codon generate very diverse amino acid substitution patterns, achieving 83.2% of the natural diversity. The statistical analysis described in this review sets the objectives for novel random mutagenesis methods that address the consequences of the organization of the genetic code. Random mutagenesis methods that favor transversions or introduce consecutive nt exchanges can contribute in this regard.
6.
Goldstein RA 《Current opinion in structural biology》2008,18(2):170-177
The observed distribution of protein structures can give us important clues about the underlying evolutionary process, imposing important constraints on possible models. The availability of results from an increasing number of genome projects has made the development of these models an active area of research. Models explaining the observed distribution of structures have focused on the inherent functional capabilities and structural properties of different folds and on the evolutionary dynamics. Increasingly, these elements are being combined.
7.
8.
Following the original idea of Maynard Smith on evolution of the protein sequence space, a novel tool is developed that allows a "space walk" from one sequence to its likely evolutionary relative and further on. At a given threshold of identity between consecutive steps, walks of many steps are possible. The sequences at the ends of the walks may differ substantially from one another. In a sequence space of randomized (shuffled) sequences the walks are very short. The approach opens new perspectives for protein evolutionary studies and sequence annotation.
9.
We assess the variability of protein function in protein sequence and structure space. Various regions in this space exhibit considerable differences in the local conservation of molecular function. We analyze and capture local function conservation by means of logistic curves. Based on this analysis, we propose a method for predicting molecular function of a query protein with known structure but unknown function. The prediction method is rigorously assessed and compared with a previously published function predictor. Furthermore, we apply the method to 500 functionally unannotated PDB structures and discuss selected examples. The proposed approach provides a simple yet consistent statistical model for the complex relations between protein sequence, structure, and function. The GOdot method is available online (http://godot.bioinf.mpi-inf.mpg.de).
10.
To understand the physical and evolutionary determinants of protein folding, we map out the complete organization of thermodynamic and kinetic properties for protein sequences that share the same fold. The exhaustive nature of our study necessitates using simplified models of protein folding. We obtain a stability map and a folding rate map in sequence space. Comparison of the two maps reveals a common organizational principle: optimality decreases more or less uniformly with distance from the optimal sequence in the sequence space. This gives a funnel-shaped optimality surface. Evolutionary dynamics of a sequence population on these two maps reveal how the simple organization of sequence space affects the distributions of stability and folding rate preferred by evolution.
11.
From protein sequence space to elementary protein modules (total citations: 2; self-citations: 0; citations by others: 2)
The formatted protein sequence space is built from identical-size fragments of prokaryotic proteins (112 complete proteomes). Connecting sequence-wise similar fragments (points in the space) results in the formation of numerous networks that sometimes combine different types of proteins which nonetheless share fragments with similar or distantly related sequences. The networks are mapped on individual protein sequences, revealing distinct regions (modules) associated with prominent networks with well-defined functional identities. The presence of multiple sites of sequence conservation (modules) in a given protein sequence suggests that the annotated protein function may be decomposed into "elementary" subfunctions of the respective modules. The modules correspond to previously discovered conserved closed loop structures and their sequence prototypes.
12.
《Journal of Fermentation and Bioengineering》1995,79(2):107-118
A landscape in protein sequence space shows the relationship between the primary structure and the level of a property of each protein. We developed methods for observing local landscapes experimentally using catalase I from Bacillus stearothermophilus with respect to its catalatic activity, peroxidatic activity, and thermostability. The enzyme gene was randomly mutated and a mutant library composed of 2648 transformants was obtained. Based on the activity and productivity of these transformants, 82 were selected as a sample group for measuring the altitude of catalase I. The altitude of the wild-type enzyme is close to the highest level in the mutant population for the thermostability landscape, but is at the average level for the peroxidatic activity. As for the catalatic activity, its altitude lies in between the two positions. A positive correlation was found between the altitudes of the catalatic and the peroxidatic activities, indicating that the locations of the hills and valleys in the landscapes of the two activities roughly correspond with each other. In contrast, the thermostability landscape appeared quite different. The smoothness of the landscape was examined via the number of mutations in the structural genes of the mutant enzymes with different properties. The correlation between the number of mutations and the level of each property showed that the thermostability landscape is smooth, but not the two activity landscapes. Thus, the results show that even from a rough sketch of the landscapes based on the experimental data, the characteristic features of catalase I can be elucidated. The sketch of a landscape, therefore, provides a new view in understanding enzymes.
13.
Babajide A Farber R Hofacker IL Inman J Lapedes AS Stadler PF 《Journal of theoretical biology》2001,212(1):35-46
Knowledge-based potentials can be used to decide whether an amino acid sequence is likely to fold into a prescribed native protein structure. We use this idea to survey the sequence-structure relations in protein space. In particular, we test the following two propositions, which were found to be important for efficient evolution: (i) the sequences folding into a particular native fold form extensive neutral networks that percolate through sequence space, and (ii) the neutral networks of any two native folds approach each other to within a few point mutations. Computer simulations using two very different potential functions, M. Sippl's PROSA pair potential and a neural network based potential, are used to verify these claims.
14.
15.
Combining protein evolution and secondary structure (total citations: 10; self-citations: 9; citations by others: 10)
An evolutionary model that combines protein secondary structure and amino acid replacement is introduced. It allows likelihood analysis of aligned protein sequences and does not require the underlying secondary (or tertiary) structures of these sequences to be known. One component of the model describes the organization of secondary structure along a protein sequence and another specifies the evolutionary process for each category of secondary structure. A database of proteins with known secondary structures is used to estimate model parameters representing these two components. Phylogeny, the third component of the model, can be estimated from the data set of interest. As an example, we employ our model to analyze a set of sucrose synthase sequences. For the evolution of sucrose synthase, a parametric bootstrap approach indicates that our model is statistically preferable to one that ignores secondary structure.
16.
Alternative splicing is thought to be one of the major sources of functional diversity in higher eukaryotes. Interestingly, when mapping splicing events onto protein structures, about half of the events affect structured and even highly conserved regions, i.e., are non-trivial on the structure level. This has led to the controversial hypothesis that such splice variants result in nonsense-mediated mRNA decay or non-functional, unstructured proteins, which do not contribute to the functional diversity of an organism. Here we show in a comprehensive study on alternative splicing that proteins appear to be much more tolerant to structural deletions, insertions and replacements than previously thought. We find literature evidence that such non-trivial splicing isoforms exhibit different functional properties compared to their native counterparts and allow for interesting regulatory patterns on the protein network level. We provide examples that splicing events may represent transitions between different folds in the protein sequence–structure space and explain these links by a common genetic mechanism. Taken together, those findings hint at a more prominent role of splicing in protein structure evolution and at a different view of phenotypic plasticity of protein structures.
17.
Thorne JL 《Current opinion in genetics & development》2000,10(6):602-605
Homologous sequences are correlated due to their common ancestry. Probabilistic models of sequence evolution are employed routinely to properly account for these phylogenetic correlations. These increasingly realistic models provide a basis for studying evolution and for exploiting it to better understand protein structure and function. Notable recent advances have been made in the treatment of insertion and deletion events, the estimation of amino-acid replacement rates, and the detection of positive selection.
18.
Zakharia M. Frenkel Zeev M. Frenkel Edward N. Trifonov Sagi Snir 《Journal of theoretical biology》2009,260(3):438-444
A novel approach for evaluation of sequence relatedness via a network over the sequence space is presented. This relatedness is quantified by graph theoretical techniques. The graph is perceived as a flow network, and flow algorithms are applied. The number of independent pathways between nodes in the network is shown to reflect structural similarity of corresponding protein fragments. These results provide an appropriate parameter for quantitative estimation of such relatedness, as well as reliability of the prediction. They also demonstrate a new potential for sequence analysis and comparison by means of the flow network in the sequence space.
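One common way to count independent pathways between two nodes is to treat every edge as having unit capacity and compute a maximum flow, which by Menger's theorem equals the number of edge-disjoint paths. The sketch below does this with breadth-first augmenting paths. It is only an illustration under assumptions: the abstract does not say whether pathways are edge-disjoint or vertex-disjoint, nor which flow algorithm was used, and the function name `count_edge_disjoint_paths` is hypothetical.

```python
from collections import defaultdict, deque


def count_edge_disjoint_paths(edges, source, sink):
    """Number of edge-disjoint paths between source and sink in an
    undirected graph, computed as a unit-capacity maximum flow."""
    if source == sink:
        raise ValueError("source and sink must be different nodes")

    # Each undirected edge becomes two directed arcs of capacity 1.
    capacity = defaultdict(int)
    neighbors = defaultdict(set)
    for u, v in edges:
        capacity[(u, v)] += 1
        capacity[(v, u)] += 1
        neighbors[u].add(v)
        neighbors[v].add(u)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in parent and capacity[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left

        # Push one unit of flow back along the path that was found.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u
        flow += 1
```

Here nodes would stand for sequence fragments and edges for pairs judged related. A pair joined by many disjoint routes is supported by more independent evidence than a pair joined through a single chain.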
19.
Evolutionary networks in the formatted protein sequence space. (total citations: 4; self-citations: 0; citations by others: 4)
In our recent work, a new approach to establish sequence relatedness, by walking through the protein sequence space, was introduced. The sequence space is built from 20-amino-acid-long fragments of proteins from a very large collection of fully sequenced prokaryotic genomes. The fragments, points in the space, are connected if they are closely related (high sequence identity). The connected fragments form a variety of networks of sequence kinship. In this research the networks in the formatted sequence space and their topology are analyzed. For lower identity thresholds a huge network of complex structure is formed, involving up to 10% of the points in the space. When the threshold is increased, the major network splits into a set of smaller clusters with a wide diversity of sizes and topologies. Such "evolutionary networks" may serve as a powerful sequence annotation tool that allows one to reveal fine details in the evolutionary history of proteins.
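The construction described here (equal-length fragments as points, linked when their sequence identity reaches a threshold, with the linked groups forming networks) can be sketched as below. This is a toy illustration under assumptions, not the authors' tool: identity is taken as the fraction of matching positions, the all-pairs comparison would not scale to a real proteome collection, and the names `identity` and `networks` as well as the example fragments and thresholds are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of positions at which two equal-length fragments match."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def networks(fragments, threshold):
    """Group fragments into networks: two fragments are linked when their
    identity is at least threshold, and a network is a connected component."""
    adjacency = {fragment: set() for fragment in fragments}
    for a, b in combinations(adjacency, 2):
        if identity(a, b) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable by a "walk".
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


if __name__ == "__main__":
    # Toy 10-residue fragments (the abstract uses 20-residue fragments).
    toy = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHLKV", "WWWWWWWWWW"]
    for threshold in (0.9, 1.0):
        print(threshold, [sorted(c) for c in networks(toy, threshold)])
```

In the toy data the first and third fragments share only 80% identity, yet at a threshold of 0.9 they fall into the same network because each is linked to the second fragment. Raising the threshold to 1.0 breaks the network into single points, mirroring the splitting behavior the abstract reports for higher thresholds.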
20.
The sequence of a genome contains the plans of the possible life of an organism, but implementation of genetic information depends on the functions of the proteins and nucleic acids that it encodes. Many individual proteins of known sequence and structure present challenges to the understanding of their function. In particular, a number of genes responsible for diseases have been identified but their specific functions are unknown. Whole-genome sequencing projects are a major source of proteins of unknown function. Annotation of a genome involves assignment of functions to gene products, in most cases on the basis of amino-acid sequence alone. 3D structure can aid the assignment of function, motivating the challenge of structural genomics projects to make structural information available for novel uncharacterized proteins. Structure-based identification of homologues often succeeds where sequence-alone-based methods fail, because in many cases evolution retains the folding pattern long after sequence similarity becomes undetectable. Nevertheless, prediction of protein function from sequence and structure is a difficult problem, because homologous proteins often have different functions. Many methods of function prediction rely on identifying similarity in sequence and/or structure between a protein of unknown function and one or more well-understood proteins. Alternative methods include inferring conservation patterns in members of a functionally uncharacterized family for which many sequences and structures are known. However, these inferences are tenuous. Such methods provide reasonable guesses at function, but are far from foolproof. It is therefore fortunate that the development of whole-organism approaches and comparative genomics permits other approaches to function prediction when the data are available. 
These include the use of protein-protein interaction patterns, and correlations between occurrences of related proteins in different organisms, as indicators of functional properties. Even if it is possible to ascribe a particular function to a gene product, the protein may have multiple functions. A fundamental problem is that function is in many cases an ill-defined concept. In this article we review the state of the art in function prediction and describe some of the underlying difficulties and successes.