Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Tillier ER, Biro L, Li G, Tillo D. Proteins, 2006, 63(4):822-831
Approaches for determining interacting partners from different protein families (such as ligands and their receptors) have made use of the property that interacting proteins follow similar patterns and relative rates of evolution. Interacting protein partners can then be predicted from the similarity of their phylogenetic trees or evolutionary distance matrices. We present a novel method, called Codep, for the determination of interacting protein partners by maximizing co-evolutionary signals. The order of sequences in the multiple sequence alignments from two protein families is determined in such a manner as to maximize the similarity of substitution patterns at amino acid sites in the two alignments and, thus, phylogenetic congruency. This is achieved by maximizing the total number of interdependencies of amino acid sites between the alignments. Once ordered, the corresponding sequences in the two alignments indicate the predicted interacting partners. We demonstrate the efficacy of this approach with computer simulations and in analyses of several protein families. A program implementing our method, Codep, is freely available to academic users from our website: http://www.uhnresearch.ca/labs/tillier/.

2.
Tiana G, Broglia RA. Proteins, 2002, 49(1):82-94
Just as the folding of single-domain proteins provides an important test in the study of self-organization, the folding of homodimers constitutes a basic challenge in the quest for the mechanisms that underlie biological recognition. Dimerization is studied by following the evolution of two identical 20-letter amino acid chains within the framework of a lattice model and using Monte Carlo simulations. It is found that when design (evolution pressure) selects few, strongly interacting (conserved) amino acids to control the process, a three-state folding scenario follows, where the monomers first fold, forming the halves of the eventual dimeric interface independently of each other, and then dimerize ("lock and key" kind of association). On the other hand, if design distributes the control of the folding process over a large number of (conserved) amino acids, a two-state folding scenario ensues, where dimerization takes place at the beginning of the process, resulting in an "induced type" of association. Making use of conservation patterns of families of analogous dimers, it is possible to compare the model predictions with the behavior of real proteins. It is found that theory provides an overall account of the experimental findings.

3.
Oliveira L, Paiva PB, Paiva AC, Vriend G. Proteins, 2003, 52(4):544-552
We introduce sequence entropy-variability plots as a method of analyzing families of protein sequences, and demonstrate this for three well-known sequence families: globins, ras-like proteins, and serine-proteases. The location of an aligned residue position in the entropy-variability plot correlates with structural characteristics, and with known facts about the roles of individual amino acids in the function of these proteins. The large numbers of known sequences in these families allowed us to introduce new filtering methods for variability patterns. The results are discussed in terms of a simple evolutionary model for functional proteins.
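
The two coordinates of an entropy-variability plot can be sketched per alignment column as below. This is a minimal illustration, not the authors' implementation: it assumes Shannon entropy with the natural logarithm, takes variability to be the number of distinct residue types in the column, ignores gap characters, and applies none of the filtering the abstract mentions. The function name `column_entropy_variability` is hypothetical.

```python
import math
from collections import Counter


def column_entropy_variability(alignment):
    """Return one (entropy, variability) pair per column of an alignment.

    `alignment` is a list of equal-length strings. Gaps ("-") are skipped.
    Entropy is Shannon entropy in nats; variability is the number of
    distinct residue types seen in the column (an assumed definition).
    """
    results = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        entropy = 0.0
        for n in counts.values():
            p = n / total
            entropy -= p * math.log(p)
        results.append((entropy, len(counts)))
    return results


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "AGF", "AGG"]
    for position, (h, v) in enumerate(column_entropy_variability(toy_alignment)):
        print(position, round(h, 3), v)
```

A fully conserved column lands at the origin of the plot (entropy 0, variability 1), while a column in which every sequence carries a different residue lands at the upper right.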

4.
Inferring protein functions from structures is a challenging task, as a large number of orphan protein structures from structural genomics projects are now solved without their biochemical functions characterized. For proteins binding to similar substrates or ligands and carrying out similar functions, their binding surfaces are under similar physicochemical constraints, and hence the sets of allowed and forbidden residue substitutions are similar. However, it is difficult to isolate such selection pressure due to protein function from selection pressure due to protein folding, and evolutionary relationship reflected by global sequence and structure similarities between proteins is often unreliable for inferring protein function. We have developed a method, called pevoSOAR (pocket-based evolutionary search of amino acid residues), for predicting protein functions by solving the problem of uncovering the amino acid residue substitution pattern due to protein function and separating it from the amino acid substitution pattern due to protein folding. We incorporate evolutionary information specific to an individual binding region and match local surfaces on a large scale with millions of precomputed protein surfaces to identify those with similar functions. Our pevoSOAR method also generates a probabilistic model called the computed binding a profile that characterizes protein-binding activities that may involve multiple substrates or ligands. We show that our method can be used to predict enzyme functions with accuracy. Our method can also assess enzyme binding specificity and promiscuity. In an objective large-scale test of 100 enzyme families with thousands of structures, our predictions are found to be sensitive and specific: at the stringent specificity level of 99.98%, we can correctly predict enzyme functions for 80.55% of the proteins. The overall area under the receiver operating characteristic curve measuring the performance of our prediction is 0.955, close to the perfect value of 1.00. The best Matthews coefficient is 86.6%. Our method also works well in predicting the biochemical functions of orphan proteins from structural genomics projects.
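
The evaluation measures quoted in this abstract (sensitivity, specificity, Matthews coefficient) follow from a binary confusion matrix. The sketch below uses the standard textbook definitions; the counts in the usage example are invented for illustration and are not the paper's data, and the function names are hypothetical.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true positives that are recovered."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives that are correctly rejected."""
    return tn / (tn + fp)


def matthews_coefficient(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    tp, fp, tn, fn = 90, 10, 90, 10
    print(sensitivity(tp, fn), specificity(tn, fp), matthews_coefficient(tp, fp, tn, fn))
```

Unlike sensitivity or specificity alone, the Matthews coefficient uses all four cells of the confusion matrix, which is why it is often reported when positives and negatives are heavily imbalanced.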

5.

Background

Intrinsically disordered proteins (IDPs) or proteins with disordered regions (IDRs) do not have a well-defined tertiary structure, but perform a multitude of functions, often relying on their native disorder to achieve the binding flexibility through changing to alternative conformations. Intrinsic disorder is frequently found in all three kingdoms of life, and may occur in short stretches or span whole proteins. To date most studies contrasting the differences between ordered and disordered proteins focused on simple summary statistics. Here, we propose an evolutionary approach to study IDPs, and contrast patterns specific to ordered protein regions and the corresponding IDRs.

Results

Two empirical Markov models of amino acid substitutions were estimated, based on a large set of multiple sequence alignments with experimentally verified annotations of disordered regions from the DisProt database of IDPs. We applied new methods to detect differences in Markovian evolution and evolutionary rates between IDRs and the corresponding ordered protein regions. Further, we investigated the distribution of IDPs among functional categories, biochemical pathways and their preponderance to contain tandem repeats.

Conclusions

We find significant differences in the evolution between ordered and disordered regions of proteins. Most importantly, we find that disorder-promoting amino acids are more conserved in IDRs, indicating that in some cases not only amino acid composition but the specific sequence is important for function. This conjecture is also reinforced by the observation that for part of our data set IDRs evolve more slowly than the ordered parts of the proteins, while we still support the common view that IDRs in general evolve more quickly. The improvement in model fit indicates a possible improvement for various types of analyses; for example, de novo disorder prediction using a phylogenetic hidden Markov model based on our matrices showed performance similar to that of other disorder predictors.

6.
Aminoacyl-tRNA synthetases (aaRS) consist of several families of functionally conserved proteins essential for translation and protein synthesis. Like nearly all components of the translation machinery, most aaRS families are universally distributed across cellular life, being inherited from the time of the Last Universal Common Ancestor (LUCA). However, unlike the rest of the translation machinery, aaRS have undergone numerous ancient horizontal gene transfers, with several independent events detected between domains, and some possibly involving lineages diverging before the time of LUCA. These transfers reveal the complexity of molecular evolution at this early time, and the chimeric nature of genomes within cells that gave rise to the major domains. Additionally, given the role of these protein families in defining the amino acids used for protein synthesis, sequence reconstruction of their pre-LUCA ancestors can reveal the evolutionary processes at work in the origin of the genetic code. In particular, sequence reconstructions of the paralog ancestors of isoleucyl- and valyl- RS provide strong empirical evidence that at least for this divergence, the genetic code did not co-evolve with the aaRSs; rather, both amino acids were already part of the genetic code before their cognate aaRSs diverged from their common ancestor. The implications of this observation for the early evolution of RNA-directed protein biosynthesis are discussed.

7.
A sequence comparison of signal receptor proteins (SR) was carried out using computer techniques based on physicochemical characteristics of amino acids. A new method for determining conserved regions in a family of proteins is described. Visual pigments have four such regions, and all SR have three, in the cytoplasmic loops. The possible functional significance of these regions is discussed. We also report here that the family of SR is similar to the family of G-proteins involved in extracellular signal transduction. Both families have similar regions consisting of 7-8 amino acids and a number of identical amino acids distributed over a considerable part of the polypeptide chain of the proteins. These facts may indicate that the whole ensemble of the proteins participating in transmembrane signalling pathways (or some part of it) could have evolved from a common progenitor. At the same time, similar structural elements of members of the mentioned protein families may be functionally important for protein-protein interaction.

8.
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi-nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.

9.
Discovery of local packing motifs in protein structures   Total citations: 1, self-citations: 0, citations by others: 1
We present a language for describing structural patterns of residues in protein structures and a method for the discovery of such patterns that recur in a set of protein structures. The patterns impose restrictions on the spatial position of each residue, their order along the amino acid chain, and which amino acids are allowed in each position. Unlike other methods for comparing sets of protein structures, our method is not based on the use of pairwise structure comparisons, which is often time consuming and can produce inconsistent results. Instead, the method simultaneously takes into account information from all structures in the search for conserved structure patterns which are potential structure motifs. The method is based on describing the spatial neighborhoods of each residue in each structure as a string and applying a sequence pattern discovery method to find patterns common to subsets of these strings. Finally, it is checked whether the similarities between the neighborhood strings correspond to spatially similar substructures. We apply the method to analyze sets of very disparate proteins from four different protein families: serine proteases, cuprodoxins, cysteine proteinases, and ferredoxins. The motifs found by the method correspond well to the site and motif information given in the annotation of these proteins in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by using the motif data to constrain the structural alignment of the proteins obtained with the program SAP. This gave the best superposition/alignment of the proteins given the motif assignment.

10.
Here, we present a statistical analysis of conservation profiles in families of homologous sequences for nine proteins whose folding nucleus was determined by protein engineering methods. We show that in all but one protein (AcP) folding nucleus residues are significantly more conserved than the rest of the protein. Two aspects of our study are especially important: (i) grouping of amino acid residues into classes according to their physical-chemical properties and (ii) proper normalization of amino acid probabilities that reflects the fact that evolutionary pressure to conserve some amino acid types may itself affect the concentration of various amino acid types in protein families. Neglecting either of these two factors may make physical and biological "signals" from conservation profiles disappear.

11.
Most eubacteria, and all eukaryotes examined thus far, encode homologs of the DNA mismatch repair protein MutS. Although eubacteria encode only one or two MutS-like proteins, eukaryotes encode at least six distinct MutS homolog (MSH) proteins, corresponding to conserved (orthologous) gene families. This suggests evolution of individual gene family lines of descent by several duplication/specialization events. Using quantitative phylogenetic analyses (RASA, or relative apparent synapomorphy analysis), we demonstrate that comparison of complete MutS protein sequences, rather than highly conserved C-terminal domains only, maximizes information about evolutionary relationships. We identify a novel, highly conserved middle domain, and clearly delineate an N-terminal domain, previously implicated in mismatch recognition, that shows family-specific patterns of aromatic and charged amino acids. Our final analysis, in contrast to previous analyses of MutS-like sequences, yields a stable phylogenetic tree consistent with the known biochemical functions of MutS/MSH proteins, that now assigns all known eukaryotic MSH proteins to a monophyletic group, whose branches correspond to the respective specialized gene families. The rooted phylogenetic tree suggests their derivation from a mitochondrial MSH1-like protein, itself the descendant of the MutS of a symbiont in a primitive eukaryotic precursor.

12.
13.
Catalytic domains of several prokaryotic and eukaryotic protease families require dedicated N-terminal propeptide domains or "intramolecular chaperones" to facilitate correct folding. Amino acid sequence analysis of these families establishes three important characteristics: (i) propeptides are almost always less conserved than their cognate catalytic domains, (ii) they contain a large number of charged amino acids, and (iii) propeptides within different protease families display insignificant sequence similarity. The implications of these findings are, however, unclear. In this study, we have used subtilisin as our model to redesign a peptide chaperone using information databases. Our goal was to establish the minimum sequence requirements for a functional subtilisin propeptide, because such information could facilitate subsequent design of tailor-made chaperones. A decision-based computer algorithm that maintained conserved residues but varied all non-conserved residues from a multiple protein sequence alignment was developed and utilized to design a novel peptide sequence (ProD). Interestingly, despite a difference of 5 pH units between their isoelectric points and despite displaying only 16% sequence identity with the wild-type propeptide (ProWT), ProD chaperones folding and functions as a potent subtilisin inhibitor. The computed secondary structures and hydrophobic patterns within these two propeptides are similar. However, unlike ProWT, ProD adopts a well-defined alpha-beta conformation as an isolated peptide and forms a stoichiometric complex with mature subtilisin. The CD spectrum of this complex is similar to that of ProWT.subtilisin. Our results establish that despite low sequence identity and dramatically different charge distribution, both propeptides adopt similar structural scaffolds. Hence, conserved scaffolds and hydrophobic patterns, but not absolute charge, dictate propeptide function.

14.
15.
Information about the conformational properties of a protein is contained in the hydrophobicity values of the amino acids in its primary sequence. We have investigated the possibility of extracting meaningful evolutionary information from the comparison of the hydrophobicity values of the corresponding amino acids in the sequences of homologous proteins. Distance matrices for six families of homologous proteins were made on the basis of the differences in hydrophobicity values of the amino acids. The phylogenetic trees constructed from such matrices were at least as good (as judged from their faithful reflection of evolutionary relationships) as trees constructed from the usual minimum mutation distance matrix.
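
A hydrophobicity-based distance matrix of the kind described here can be sketched as follows. The abstract does not say which hydrophobicity scale or distance formula was used, so two assumptions are made and should be read as such: the Kyte-Doolittle hydropathy scale stands in for the scale, and the distance is the mean absolute hydrophobicity difference over aligned, non-gap positions. The names `hydrophobicity_distance` and `distance_matrix` are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_distance(seq_a, seq_b):
    """Mean absolute hydrophobicity difference over aligned positions.

    Positions where either sequence has a gap or a non-standard residue
    are skipped. This distance definition is an assumption.
    """
    pairs = [
        (a, b) for a, b in zip(seq_a, seq_b)
        if a in KYTE_DOOLITTLE and b in KYTE_DOOLITTLE
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    total = sum(abs(KYTE_DOOLITTLE[a] - KYTE_DOOLITTLE[b]) for a, b in pairs)
    return total / len(pairs)


def distance_matrix(aligned):
    """Pairwise distances for a dict mapping names to aligned sequences."""
    names = list(aligned)
    return {
        (p, q): hydrophobicity_distance(aligned[p], aligned[q])
        for p in names for q in names
    }


if __name__ == "__main__":
    toy = {"seq1": "AIL", "seq2": "AVL", "seq3": "ADL"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

In the toy input, the conservative I to V change yields a much smaller distance than the I to D change, which a plain mismatch count would score identically. A matrix of this form can be passed to any distance-based tree-building method.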

16.
Abstract

The number of naturally occurring primary sequences of proteins is an infinitesimally small subset of the possible number of primary sequences that can be synthesized using 20 amino acids. Prevailing views ascribe this to slow and incremental mutational/selection evolutionary mechanisms. However, considering the large number of avenues available in the form of the diversity of emerging/evolving and/or disappearing living systems for exploring the primary sequence space over the evolutionary time scale of ~3.5 billion years, this remains a conjecture. Therefore, to investigate primary sequence space limitations, we carried out a systematic study to find primary sequences absent in nature. We report the discovery of the smallest peptide sequence “Cysteine-Glutamine-Tryptophan-Tryptophan” that is not found in over half a million curated protein sequences in the UniProt (Swiss-Prot) database. Additionally, we report a library of 83605 pentapeptides that are not found in any of the known protein sequences. Compositional analyses of these absent primary sequences yield a remarkably strong power relationship between the percentage occurrence of individual amino acids in all known protein sequences and their respective frequency of occurrence in the absent peptides, regardless of their specific position in the sequences. If random evolutionary mechanisms were responsible for limitations to the primary sequence space, then one would not expect any relationship between compositions of available and absent primary sequences. Thus, we conclusively show that stoichiometric constraints on amino acids limit the primary sequence space of proteins in nature. We discuss the possibly profound implications of our findings in both evolutionary and synthetic biology.

Communicated by Ramaswamy H. Sarma
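
The search for absent peptides described in this abstract amounts to enumerating every possible k-mer over the 20 standard amino acids and discarding those observed in a sequence collection. The sketch below is a minimal illustration under that reading; it is not the authors' pipeline, the function names are hypothetical, and the toy input stands in for a real database such as Swiss-Prot.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def observed_kmers(sequences, k):
    """Collect every length-k substring that occurs in the sequences."""
    seen = set()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            seen.add(seq[start:start + k])
    return seen


def absent_kmers(sequences, k):
    """List all k-mers over the 20 standard residues that never occur."""
    seen = observed_kmers(sequences, k)
    absent = []
    for letters in product(AMINO_ACIDS, repeat=k):
        peptide = "".join(letters)
        if peptide not in seen:
            absent.append(peptide)
    return absent


if __name__ == "__main__":
    toy_database = ["ACDEFGHIKLMNPQRSTVWY"]
    missing = absent_kmers(toy_database, 2)
    print(len(missing), missing[:5])
```

For k = 5 the enumeration covers 20^5 = 3,200,000 candidates, which is still feasible in memory. Running the search for increasing k until the first non-empty result gives the length of the shortest absent peptide.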

17.
Amino acids do not occur randomly in proteins; rather, their occurrence at any given site is strongly influenced by the amino acid composition at other sites, the structural and functional aspects of the region of the protein in which they occur, and the evolutionary history of the protein. The goal of our research study is to identify networks of coevolving sites within the serpin proteins (serine protease inhibitors) and classify them as being caused by structural-functional constraints or by evolutionary history. To address this, a matrix of pairwise normalized mutual information (NMI) values was computed among amino acid sites for the serpin proteins. The NMI matrix was partitioned into orthogonal patterns of amino acid variability by factor analysis. Each common factor pattern was interpreted as having phylogenetic and/or structural-functional explanations. In addition, we used a bootstrap factor analysis technique to limit the effects of phylogenetic history on our factor patterns. Our results show an extensive network of correlations among amino acid sites in key functional regions (reactive center loop, shutter, and breach). Additionally, we have discovered long-range coevolution for packed amino acids within the serpin protein core. Lastly, we have discovered a group of serpin sites which coevolve in the hydrophobic core region (s5B and s4B) and appear to represent sites important for formation of the "native" instead of the "latent" serpin structure. This research provides a better understanding of how protein structure evolves; in particular, it elucidates the selective forces creating coevolution among protein sites.
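
The pairwise NMI matrix mentioned in this abstract can be sketched as below. The abstract does not state which normalization was used; this example assumes mutual information divided by the joint entropy of the two columns, which is one common choice, and skips sequences with a gap at either site. The factor analysis and bootstrap steps are not covered. The function names are hypothetical.

```python
import math
from collections import Counter


def _entropy(counts, total):
    """Shannon entropy (nats) of a Counter of observations."""
    h = 0.0
    for n in counts.values():
        p = n / total
        h -= p * math.log(p)
    return h


def normalized_mutual_information(col_x, col_y):
    """MI between two alignment columns divided by their joint entropy.

    The normalization is an assumption. Returns 0.0 when the joint
    entropy is zero (both columns fully conserved) or no pairs remain.
    """
    pairs = [(x, y) for x, y in zip(col_x, col_y) if x != "-" and y != "-"]
    total = len(pairs)
    if total == 0:
        return 0.0
    h_x = _entropy(Counter(x for x, _ in pairs), total)
    h_y = _entropy(Counter(y for _, y in pairs), total)
    h_xy = _entropy(Counter(pairs), total)
    if h_xy == 0:
        return 0.0
    return (h_x + h_y - h_xy) / h_xy


def nmi_matrix(alignment):
    """Symmetric matrix of NMI values between all pairs of columns."""
    columns = list(zip(*alignment))
    size = len(columns)
    return [
        [normalized_mutual_information(columns[i], columns[j]) for j in range(size)]
        for i in range(size)
    ]


if __name__ == "__main__":
    toy_alignment = ["AGG", "AGT", "CTG", "CTT"]
    for row in nmi_matrix(toy_alignment):
        print([round(value, 3) for value in row])
```

In the toy alignment, the first two columns vary in lockstep and score close to 1, while the third column varies independently of the first and scores close to 0.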

18.
Diverse proteins with similar structures are grouped into families of homologs and analogs, if their sequence similarity is higher or lower, respectively, than 20%–30%. It was suggested that protein homologs and analogs originate from a common ancestor and diverge in their distinct evolutionary time scales, emerging as a consequence of the physical properties of the protein sequence space. Although a number of studies have determined key signatures of protein family organization, the sequence-structure factors that differentiate the two evolution-related protein families remain unknown. Here, we stipulate that subtle structural changes, which appear due to accumulating mutations in the homologous families, lead to distinct packing of the protein core and, thus, novel compositions of core residues. The latter process leads to the formation of distinct families of homologs. We propose that such differentiation results in the formation of analogous families. To test our postulate, we developed a molecular modeling and design toolkit, Medusa, to computationally design protein sequences that correspond to the same fold family. We find that analogous proteins emerge when a backbone structure deviates only 1–2 Å root-mean-square deviation from the original structure. For close homologs, core residues are highly conserved. However, when the overall sequence similarity drops to ~25%–30%, the composition of core residues starts to diverge, thereby forming novel families of protein homologs. This direct observation of the formation of protein homologs within a specific fold family supports our hypothesis. The conservation of amino acids in designed sequences recapitulates that of the naturally occurring sequences, thereby validating our computational design methodology.

19.
Zou X, Pham TK, Wright PC, Noirel J. Genomics, 2012, 100(4):240-244
Although protein expression and regulation have been intensively studied, a complete picture of its mechanisms is still to be drawn. Analysis of high-throughput quantitative proteomics data provides a way to better understand protein regulation. Here, we introduce a bioinformatic analysis method to correlate protein regulation with individual amino acid patterns. We compare the amino acid composition between groups of regulated and unregulated proteins and investigate the correlation between codon usage patterns and protein regulation levels in two Sulfolobus species in "biofilm vs planktonic" experiments. The identified amino acids can then be associated with the regulation of specific gene functions. Strikingly, our analysis shows that functional categories of regulated proteins with similar composition and codon usage pattern of specific amino acids behave similarly. This finding can contribute to a better understanding of protein and gene expression regulation and could find applications in gene optimisation.

20.
Substitutions of individual amino acids in proteins may be under very different evolutionary restraints depending on their structural and functional roles. The Environment Specific Substitution Table (ESST) describes the pattern of substitutions in terms of amino acid location within elements of secondary structure, solvent accessibility, and the existence of hydrogen bonds between side chains and neighbouring amino acid residues. Clearly amino acids that have very different local environments in their functional state compared to those in the protein analysed will give rise to inconsistencies in the calculation of amino acid substitution tables. Here, we describe how the calculation of ESSTs can be improved by discarding the functional residues from the calculation of substitution tables. Four categories of functions are examined in this study: protein–protein interactions, protein–nucleic acid interactions, protein–ligand interactions, and catalytic activity of enzymes. Their contributions to residue conservation are measured and investigated. We test our new ESSTs using the program CRESCENDO, designed to predict functional residues by exploiting knowledge of amino acid substitutions, and compare the benchmark results with proteins whose functions have been defined experimentally. The new methodology increases the Z-score by 98% at the active site residues and finds 16% more active sites compared with the old ESST. We also find that discarding amino acids responsible for protein–protein interactions helps in the prediction of those residues although they are not as conserved as the residues of active sites. Our methodology can make the substitution tables better reflect and describe the substitution patterns of amino acids that are under structural restraints only.
