1.

Codep: maximizing co-evolutionary interdependencies to discover interacting proteins

Tillier ER Biro L Li G Tillo D 《Proteins》2006,63(4):822-831

Approaches for the determination of interacting partners from different protein families (such as ligands and their receptors) have made use of the property that interacting proteins follow similar patterns and relative rates of evolution. Interacting protein partners can then be predicted from the similarity of their phylogenetic trees or evolutionary distance matrices. We present a novel method, called Codep, for the determination of interacting protein partners by maximizing co-evolutionary signals. The order of sequences in the multiple sequence alignments from two protein families is determined in such a manner as to maximize the similarity of substitution patterns at amino acid sites in the two alignments and, thus, phylogenetic congruency. This is achieved by maximizing the total number of interdependencies of amino acid sites between the alignments. Once ordered, the corresponding sequences in the two alignments indicate the predicted interacting partners. We demonstrate the efficacy of this approach with computer simulations and in analyses of several protein families. A program implementing our method, Codep, is freely available to academic users from our website: http://www.uhnresearch.ca/labs/tillier/.
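
The distance-matrix idea mentioned in this abstract can be illustrated with a small sketch. This is not the Codep algorithm, which maximizes site interdependencies between alignments; it is a brute-force illustration of the older approach the abstract refers to, in which the ordering of one family is chosen so that its distance matrix correlates best with the other's. The function names and the toy matrices in the usage note are hypothetical, and exhaustive search is only feasible for a handful of sequences.

```python
from itertools import permutations
from math import sqrt


def upper_triangle(matrix, order):
    """Flatten the upper triangle of a distance matrix after reordering it."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_pairing(dist_a, dist_b):
    """Return the ordering of family B whose distances best match family A.

    Element i of the result is the index in B predicted to partner with
    sequence i of A. Every permutation is tried (assumption: small families).
    """
    n = len(dist_a)
    reference = upper_triangle(dist_a, list(range(n)))
    return max(
        permutations(range(n)),
        key=lambda perm: pearson(reference, upper_triangle(dist_b, perm)),
    )
```

For example, if `dist_b` is simply `dist_a` with its rows and columns shuffled, `best_pairing(dist_a, dist_b)` recovers the shuffle. Real data would need a heuristic search rather than enumeration, since the number of permutations grows factorially.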

2.

Design and folding of dimeric proteins

Tiana G Broglia RA 《Proteins》2002,49(1):82-94

Just as the folding of single-domain proteins provides an important test in the study of self-organization, the folding of homodimers constitutes a basic challenge in the quest for the mechanisms that are the basis of biological recognition. Dimerization is studied by following the evolution of two identical 20-letter amino acid chains within the framework of a lattice model and using Monte Carlo simulations. It is found that when design (evolution pressure) selects few, strongly interacting (conserved) amino acids to control the process, a three-state folding scenario follows, where the monomers first fold, forming the halves of the eventual dimeric interface independently of each other, and then dimerize ("lock and key" kind of association). On the other hand, if design distributes the control of the folding process over a large number of (conserved) amino acids, a two-state folding scenario ensues, where dimerization takes place at the beginning of the process, resulting in an "induced type" of association. Making use of conservation patterns of families of analogous dimers, it is possible to compare the model predictions with the behavior of real proteins. It is found that theory provides an overall account of the experimental findings.

3.

Identification of functionally conserved residues with the use of entropy-variability plots

Oliveira L Paiva PB Paiva AC Vriend G 《Proteins》2003,52(4):544-552

We introduce sequence entropy-variability plots as a method of analyzing families of protein sequences, and demonstrate this for three well-known sequence families: globins, ras-like proteins, and serine-proteases. The location of an aligned residue position in the entropy-variability plot correlates with structural characteristics, and with known facts about the roles of individual amino acids in the function of these proteins. The large numbers of known sequences in these families allowed us to introduce new filtering methods for variability patterns. The results are discussed in terms of a simple evolutionary model for functional proteins.
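
The two coordinates of an entropy-variability plot can be computed per alignment column as in the sketch below. The abstract does not give the exact definitions, so this assumes Shannon entropy with the natural logarithm and takes variability to be the number of distinct residue types observed in the column; ignoring gap characters is a further assumption, and the function name is hypothetical.

```python
from collections import Counter
from math import log


def entropy_variability(alignment):
    """Return one (entropy, variability) pair per alignment column.

    entropy: Shannon entropy (natural log) of the residue frequencies.
    variability: number of distinct residue types seen in the column.
    Gap characters ('-') are skipped (an assumption of this sketch).
    """
    results = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((c / total) * log(c / total) for c in counts.values())
        results.append((entropy, len(counts)))
    return results
```

Plotting the first value against the second for every column gives the scatter described in the abstract. A fully conserved column sits at entropy 0 with variability 1, while a column with four equally frequent residue types sits at entropy ln 4 with variability 4.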

4.

Predicting Protein Function and Binding Profile via Matching of Local Evolutionary and Geometric Surface Patterns (Total citations: 1; self-citations: 0; citations by others: 1)

Yan Yuan Tseng 《Journal of molecular biology》2009,387(2):451-1175

Inferring protein functions from structures is a challenging task, as a large number of orphan protein structures from structural genomics projects are now solved without their biochemical functions characterized. For proteins binding to similar substrates or ligands and carrying out similar functions, their binding surfaces are under similar physicochemical constraints, and hence the sets of allowed and forbidden residue substitutions are similar. However, it is difficult to isolate such selection pressure due to protein function from selection pressure due to protein folding, and evolutionary relationship reflected by global sequence and structure similarities between proteins is often unreliable for inferring protein function. We have developed a method, called pevoSOAR (pocket-based evolutionary search of amino acid residues), for predicting protein functions by solving the problem of uncovering the amino acid residue substitution pattern due to protein function and separating it from the amino acid substitution pattern due to protein folding. We incorporate evolutionary information specific to an individual binding region and match local surfaces on a large scale with millions of precomputed protein surfaces to identify those with similar functions. Our pevoSOAR method also generates a probabilistic model called the computed binding profile that characterizes protein-binding activities that may involve multiple substrates or ligands. We show that our method can be used to predict enzyme functions with accuracy. Our method can also assess enzyme binding specificity and promiscuity. In an objective large-scale test of 100 enzyme families with thousands of structures, our predictions are found to be sensitive and specific: At the stringent specificity level of 99.98%, we can correctly predict enzyme functions for 80.55% of the proteins. The overall area under the receiver operating characteristic curve measuring the performance of our prediction is 0.955, close to the perfect value of 1.00. The best Matthews coefficient is 86.6%. Our method also works well in predicting the biochemical functions of orphan proteins from structural genomics projects.
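
The performance measures quoted in this abstract (specificity, sensitivity, and the Matthews coefficient) all derive from a confusion matrix. The sketch below shows the standard formulas; the function names are hypothetical, and any counts passed in would be illustrative, since the abstract reports only the final percentages.

```python
from math import sqrt


def sensitivity(tp, fn):
    """Fraction of true positives that are predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives that are predicted negative."""
    return tn / (tn + fp)


def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention
    (an assumption of this sketch).
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

On this scale the reported value of 86.6% corresponds to a coefficient of 0.866, where 1 is perfect agreement, 0 is no better than chance, and -1 is total disagreement.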

5.

Markov models of amino acid substitution to study proteins with intrinsically disordered regions

Szalkowski AM Anisimova M 《PloS one》2011,6(5):e20488

Background

Intrinsically disordered proteins (IDPs) or proteins with disordered regions (IDRs) do not have a well-defined tertiary structure, but perform a multitude of functions, often relying on their native disorder to achieve the binding flexibility through changing to alternative conformations. Intrinsic disorder is frequently found in all three kingdoms of life, and may occur in short stretches or span whole proteins. To date most studies contrasting the differences between ordered and disordered proteins focused on simple summary statistics. Here, we propose an evolutionary approach to study IDPs, and contrast patterns specific to ordered protein regions and the corresponding IDRs.

Results

Two empirical Markov models of amino acid substitutions were estimated, based on a large set of multiple sequence alignments with experimentally verified annotations of disordered regions from the DisProt database of IDPs. We applied new methods to detect differences in Markovian evolution and evolutionary rates between IDRs and the corresponding ordered protein regions. Further, we investigated the distribution of IDPs among functional categories, biochemical pathways and their preponderance to contain tandem repeats.

Conclusions

We find significant differences in the evolution between ordered and disordered regions of proteins. Most importantly, we find that disorder-promoting amino acids are more conserved in IDRs, indicating that in some cases not only amino acid composition but the specific sequence is important for function. This conjecture is also reinforced by the observation that for part of our data set IDRs evolve more slowly than the ordered parts of the proteins, while we still support the common view that IDRs in general evolve more quickly. The improvement in model fit indicates a possible improvement for various types of analyses; for example, de novo disorder prediction using a phylogenetic Hidden Markov Model based on our matrices showed a performance similar to other disorder predictors.

6.

Molecular Evolution of Aminoacyl tRNA Synthetase Proteins in the Early History of Life

Fournier GP Andam CP Alm EJ Gogarten JP 《Origins of life and evolution of the biosphere》2011,41(6):621-632

Aminoacyl-tRNA synthetases (aaRS) consist of several families of functionally conserved proteins essential for translation and protein synthesis. Like nearly all components of the translation machinery, most aaRS families are universally distributed across cellular life, being inherited from the time of the Last Universal Common Ancestor (LUCA). However, unlike the rest of the translation machinery, aaRS have undergone numerous ancient horizontal gene transfers, with several independent events detected between domains, and some possibly involving lineages diverging before the time of LUCA. These transfers reveal the complexity of molecular evolution at this early time, and the chimeric nature of genomes within cells that gave rise to the major domains. Additionally, given the role of these protein families in defining the amino acids used for protein synthesis, sequence reconstruction of their pre-LUCA ancestors can reveal the evolutionary processes at work in the origin of the genetic code. In particular, sequence reconstructions of the paralog ancestors of isoleucyl- and valyl- RS provide strong empirical evidence that at least for this divergence, the genetic code did not co-evolve with the aaRSs; rather, both amino acids were already part of the genetic code before their cognate aaRSs diverged from their common ancestor. The implications of this observation for the early evolution of RNA-directed protein biosynthesis are discussed.

7.

The evolution of signal receptor proteins: conserved regions and the similarity to GTP-binding proteins

D I Frishman A L Berman 《Zhurnal evoliutsionnoi biokhimii i fiziologii》1990,26(1):14-29

A sequence comparison of signal receptor proteins (SR) was carried out using computer techniques based on physicochemical characteristics of amino acids. A new method for determining conserved regions in a family of proteins is described. Visual pigments have four such regions in the cytoplasmic loops, and all SR have three. The possible functional significance of these regions is discussed. We also report here that the family of SR is similar to the family of G-proteins involved in extracellular signal transduction. Both families have similar regions consisting of 7-8 amino acids and a number of identical amino acids distributed over a considerable part of the polypeptide chain of the proteins. These facts may indicate that the whole ensemble of the proteins participating in transmembrane signalling pathways (or some part of it) could have evolved from a common progenitor. At the same time, similar structural elements of members of the mentioned protein families may be functionally important for protein-protein interaction.

8.

Discovery of local packing motifs in protein structures (Total citations: 1; self-citations: 0; citations by others: 1)

Jonassen I Eidhammer I Taylor WR 《Proteins》1999,34(2):206-219

We present a language for describing structural patterns of residues in protein structures and a method for the discovery of such patterns that recur in a set of protein structures. The patterns impose restrictions on the spatial position of each residue, their order along the amino acid chain, and which amino acids are allowed in each position. Unlike other methods for comparing sets of protein structures, our method is not based on the use of pairwise structure comparisons which is often time consuming and can produce inconsistent results. Instead, the method simultaneously takes into account information from all structures in the search for conserved structure patterns which are potential structure motifs. The method is based on describing the spatial neighborhoods of each residue in each structure as a string and applying a sequence pattern discovery method to find patterns common to subsets of these strings. Finally it is checked whether the similarities between the neighborhood strings correspond to spatially similar substructures. We apply the method to analyze sets of very disparate proteins from the four different protein families: serine proteases, cuprodoxins, cysteine proteinases, and ferredoxins. The motifs found by the method correspond well to the site and motif information given in the annotation of these proteins in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by using the motif data to constrain the structural alignment of the proteins obtained with the program SAP. This gave the best superposition/alignment of the proteins given the motif assignment.

9.

A thermodynamic model of protein structure evolution explains empirical amino acid substitution matrices

Christoffer Norn Ingemar André Douglas L. Theobald 《Protein science : a publication of the Protein Society》2021,30(10):2057

Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi‐nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.

10.

Evolutionary conservation of the folding nucleus

Mirny L Shakhnovich E 《Journal of molecular biology》2001,308(2):123-129

Here, we present a statistical analysis of conservation profiles in families of homologous sequences for nine proteins whose folding nucleus was determined by protein engineering methods. We show that in all but one protein (AcP) folding nucleus residues are significantly more conserved than the rest of the protein. Two aspects of our study are especially important: (i) grouping of amino acid residues into classes according to their physical-chemical properties and (ii) proper normalization of amino acid probabilities that reflects the fact that evolutionary pressure to conserve some amino acid types may itself affect the concentration of various amino acid types in protein families. Neglecting either of these two factors may make physical and biological "signals" from conservation profiles disappear.

11.

Evolutionary origin, diversification and specialization of eukaryotic MutS homolog mismatch repair proteins (Total citations: 11; self-citations: 2; citations by others: 9)

Culligan KM Meyer-Gauen G Lyons-Weiler J Hays JB 《Nucleic acids research》2000,28(2):463-471

Most eubacteria, and all eukaryotes examined thus far, encode homologs of the DNA mismatch repair protein MutS. Although eubacteria encode only one or two MutS-like proteins, eukaryotes encode at least six distinct MutS homolog (MSH) proteins, corresponding to conserved (orthologous) gene families. This suggests evolution of individual gene family lines of descent by several duplication/specialization events. Using quantitative phylogenetic analyses (RASA, or relative apparent synapomorphy analysis), we demonstrate that comparison of complete MutS protein sequences, rather than highly conserved C-terminal domains only, maximizes information about evolutionary relationships. We identify a novel, highly conserved middle domain, and clearly delineate an N-terminal domain, previously implicated in mismatch recognition, that shows family-specific patterns of aromatic and charged amino acids. Our final analysis, in contrast to previous analyses of MutS-like sequences, yields a stable phylogenetic tree consistent with the known biochemical functions of MutS/MSH proteins, that now assigns all known eukaryotic MSH proteins to a monophyletic group, whose branches correspond to the respective specialized gene families. The rooted phylogenetic tree suggests their derivation from a mitochondrial MSH1-like protein, itself the descendant of the MutS of a symbiont in a primitive eukaryotic precursor.

12.

Multiple independent evolutionary solutions to core histone gene regulation

Mariño-Ramírez L Jordan IK Landsman D 《Genome biology》2006,7(12):R122-17

13.

Folding pathway mediated by an intramolecular chaperone. A functional peptide chaperone designed using sequence databases

Yabuta Y Subbian E Oiry C Shinde U 《The Journal of biological chemistry》2003,278(17):15246-15251

Catalytic domains of several prokaryotic and eukaryotic protease families require dedicated N-terminal propeptide domains or "intramolecular chaperones" to facilitate correct folding. Amino acid sequence analysis of these families establishes three important characteristics: (i) propeptides are almost always less conserved than their cognate catalytic domains, (ii) they contain a large number of charged amino acids, and (iii) propeptides within different protease families display insignificant sequence similarity. The implications of these findings are, however, unclear. In this study, we have used subtilisin as our model to redesign a peptide chaperone using information databases. Our goal was to establish the minimum sequence requirements for a functional subtilisin propeptide, because such information could facilitate subsequent design of tailor-made chaperones. A decision-based computer algorithm that maintained conserved residues but varied all non-conserved residues from a multiple protein sequence alignment was developed and utilized to design a novel peptide sequence (ProD). Interestingly, despite a difference of 5 pH units between their isoelectric points and despite displaying only 16% sequence identity with the wild-type propeptide (ProWT), ProD chaperones folding and functions as a potent subtilisin inhibitor. The computed secondary structures and hydrophobic patterns within these two propeptides are similar. However, unlike ProWT, ProD adopts a well-defined alpha-beta conformation as an isolated peptide and forms a stoichiometric complex with mature subtilisin. The CD spectrum of this complex is similar to that of ProWT.subtilisin. Our results establish that despite low sequence identity and dramatically different charge distribution, both propeptides adopt similar structural scaffolds. Hence, conserved scaffolds and hydrophobic patterns, but not absolute charge, dictate propeptide function.

14.

Evolution of function in the "two dinucleotide binding domains" flavoproteins

Ojha S Meng EC Babbitt PC 《PLoS computational biology》2007,3(7):e121

15.

Phylogenetic trees constructed from hydrophobicity values of protein sequences

J A Leunissen W W de Jong 《Journal of theoretical biology》1986,119(2):189-196

Information about conformational properties of a protein is contained in the hydrophobicity values of the amino acids in its primary sequence. We have investigated the possibility of extracting meaningful evolutionary information from the comparison of the hydrophobicity values of the corresponding amino acids in the sequences of homologous proteins. Distance matrices for six families of homologous proteins were made on the basis of the differences in hydrophobicity values of the amino acids. The phylogenetic trees constructed from such matrices were at least as good (as judged from their faithful reflection of evolutionary relationships), as trees constructed from the usual minimum mutation distance matrix.
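
A hydrophobicity-based distance matrix of the kind described here might be built as follows. The abstract does not say which hydrophobicity scale or distance formula was used, so this sketch assumes the Kyte-Doolittle scale and the mean absolute difference over aligned positions; both choices, and the function names, are illustrative rather than the authors' procedure.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_distance(seq_a, seq_b):
    """Mean absolute hydrophobicity difference over aligned positions.

    Positions where either sequence has a gap or unknown symbol are skipped.
    """
    diffs = [
        abs(KYTE_DOOLITTLE[a] - KYTE_DOOLITTLE[b])
        for a, b in zip(seq_a, seq_b)
        if a in KYTE_DOOLITTLE and b in KYTE_DOOLITTLE
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def distance_matrix(aligned_sequences):
    """Symmetric matrix of pairwise hydrophobicity distances."""
    return [
        [hydrophobicity_distance(a, b) for b in aligned_sequences]
        for a in aligned_sequences
    ]
```

The resulting matrix can be passed to any distance-based tree-building method. Note that a conservative exchange such as isoleucine for valine contributes very little distance, which is the property that distinguishes this measure from a plain mutation count.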

16.

What limits the primary sequence space of natural proteins?

Aditya Mittal Anandkumar Madhavjibhai Changani Sakshi Taparia 《Journal of biomolecular structure & dynamics》2020,38(15):4579-4583

Abstract

The number of naturally occurring primary sequences of proteins is an infinitesimally small subset of the possible number of primary sequences that can be synthesized using 20 amino acids. Prevailing views ascribe this to slow and incremental mutational/selection evolutionary mechanisms. However, considering the large number of avenues available, in the form of the diversity of emerging/evolving and/or disappearing living systems, for exploring the primary sequence space over the evolutionary time scale of ~3.5 billion years, this remains a conjecture. Therefore, to investigate primary sequence space limitations, we carried out a systematic study to find primary sequences absent in nature. We report the discovery of the smallest peptide sequence, “Cysteine-Glutamine-Tryptophan-Tryptophan”, that is not found in over half a million curated protein sequences in the UniProt (Swiss-Prot) database. Additionally, we report a library of 83605 pentapeptides that are not found in any of the known protein sequences. Compositional analyses of these absent primary sequences yield a remarkably strong power relationship between the percentage occurrence of individual amino acids in all known protein sequences and their respective frequency of occurrence in the absent peptides, regardless of their specific position in the sequences. If random evolutionary mechanisms were responsible for limitations to the primary sequence space, then one would not expect any relationship between compositions of available and absent primary sequences. Thus, we conclusively show that stoichiometric constraints on amino acids limit the primary sequence space of proteins in nature. We discuss the possibly profound implications of our findings in both evolutionary and synthetic biology.

Communicated by Ramaswamy H. Sarma
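
The search for absent peptides described in this abstract amounts to enumerating every possible peptide of a given length and removing those observed in the database. The sketch below shows that idea on sequences held in memory; the function name is hypothetical, and it assumes the standard 20-letter alphabet in one-letter code and ignores how the authors handled ambiguous residues or database parsing.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20 residues, one-letter code


def absent_peptides(sequences, k):
    """Return all length-k peptides that occur in none of the sequences."""
    observed = set()
    for seq in sequences:
        # Collect every overlapping window of length k.
        for start in range(len(seq) - k + 1):
            observed.add(seq[start:start + k])
    return [
        "".join(residues)
        for residues in product(AMINO_ACIDS, repeat=k)
        if "".join(residues) not in observed
    ]
```

With k = 4 there are 160,000 candidates and with k = 5 there are 3.2 million, so full enumeration stays tractable at the peptide lengths the abstract discusses. In one-letter code the reported absent tetrapeptide is CQWW.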

17.

Networks of coevolving sites in structural and functional domains of serpin proteins

Buck MJ Atchley WR 《Molecular biology and evolution》2005,22(7):1627-1634

Amino acids do not occur randomly in proteins; rather, their occurrence at any given site is strongly influenced by the amino acid composition at other sites, the structural and functional aspects of the region of the protein in which they occur, and the evolutionary history of the protein. The goal of our research study is to identify networks of coevolving sites within the serpin proteins (serine protease inhibitors) and classify them as being caused by structural-functional constraints or by evolutionary history. To address this, a matrix of pairwise normalized mutual information (NMI) values was computed among amino acid sites for the serpin proteins. The NMI matrix was partitioned into orthogonal patterns of amino acid variability by factor analysis. Each common factor pattern was interpreted as having phylogenetic and/or structural-functional explanations. In addition, we used a bootstrap factor analysis technique to limit the effects of phylogenetic history on our factor patterns. Our results show an extensive network of correlations among amino acid sites in key functional regions (reactive center loop, shutter, and breach). Additionally, we have discovered long-range coevolution for packed amino acids within the serpin protein core. Lastly, we have discovered a group of serpin sites which coevolve in the hydrophobic core region (s5B and s4B) and appear to represent sites important for formation of the "native" instead of the "latent" serpin structure. This research provides a better understanding of how protein structure evolves; in particular, it elucidates the selective forces creating coevolution among protein sites.
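
The pairwise NMI values mentioned in this abstract can be computed from two alignment columns as sketched below. The abstract does not state which normalization was used; this sketch assumes mutual information divided by the joint entropy of the two columns, which is one common choice, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(symbols):
    """Shannon entropy (natural log) of a sequence of hashable symbols."""
    total = len(symbols)
    return -sum((c / total) * log(c / total) for c in Counter(symbols).values())


def normalized_mutual_information(col_x, col_y):
    """Mutual information of two alignment columns, divided by joint entropy.

    Returns 0.0 when both columns are invariant, since the joint entropy is
    then zero (a convention assumed by this sketch).
    """
    h_x = entropy(col_x)
    h_y = entropy(col_y)
    h_xy = entropy(list(zip(col_x, col_y)))
    if h_xy == 0:
        return 0.0
    return (h_x + h_y - h_xy) / h_xy


def nmi_matrix(alignment):
    """NMI for every pair of columns in a list of equal-length sequences."""
    columns = list(zip(*alignment))
    return [[normalized_mutual_information(a, b) for b in columns] for a in columns]
```

Under this normalization, two columns that vary in perfect lockstep score 1 and two statistically independent columns score 0. The matrix returned by `nmi_matrix` is the kind of input that the factor analysis step in the abstract would then partition.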

18.

Emergence of protein fold families through rational design

Ding F Dokholyan NV 《PLoS computational biology》2006,2(7):e85

Diverse proteins with similar structures are grouped into families of homologs and analogs, if their sequence similarity is higher or lower, respectively, than 20%–30%. It was suggested that protein homologs and analogs originate from a common ancestor and diverge in their distinct evolutionary time scales, emerging as a consequence of the physical properties of the protein sequence space. Although a number of studies have determined key signatures of protein family organization, the sequence-structure factors that differentiate the two evolution-related protein families remain unknown. Here, we stipulate that subtle structural changes, which appear due to accumulating mutations in the homologous families, lead to distinct packing of the protein core and, thus, novel compositions of core residues. The latter process leads to the formation of distinct families of homologs. We propose that such differentiation results in the formation of analogous families. To test our postulate, we developed a molecular modeling and design toolkit, Medusa, to computationally design protein sequences that correspond to the same fold family. We find that analogous proteins emerge when a backbone structure deviates only 1–2 Å root-mean-square deviation from the original structure. For close homologs, core residues are highly conserved. However, when the overall sequence similarity drops to ~25%–30%, the composition of core residues starts to diverge, thereby forming novel families of protein homologs. This direct observation of the formation of protein homologs within a specific fold family supports our hypothesis. The conservation of amino acids in designed sequences recapitulates that of the naturally occurring sequences, thereby validating our computational design methodology.

19.

Bioinformatic study of the relationship between protein regulation and sequence properties

X Zou TK Pham PC Wright J Noirel 《Genomics》2012,100(4):240-244

Although protein expression and regulation have been intensively studied, a complete picture of their mechanisms is still to be drawn. Analysis of high-throughput quantitative proteomics data provides a way to better understand protein regulation. Here, we introduce a bioinformatic analysis method to correlate protein regulation with individual amino acid patterns. We compare the amino acid composition between groups of regulated and unregulated proteins and investigate the correlation between codon usage patterns and protein regulation levels in two Sulfolobus species in "biofilm vs planktonic" experiments. The identified amino acids can then be associated with the regulation of specific gene functions. Strikingly, our analysis shows that functional categories of regulated proteins with similar composition and codon usage pattern of specific amino acids behave similarly. This finding can contribute to a better understanding of protein and gene expression regulation and could find applications in gene optimisation.

20.

Frequent and widespread parallel evolution of protein sequences

Rokas A Carroll SB 《Molecular biology and evolution》2008,25(9):1943-1953

Understanding the patterns and causes of protein sequence evolution is a major challenge in evolutionary biology. One of the critical unresolved issues is the relative contribution of selection and genetic drift to the fixation of amino acid sequence differences between species. Molecular homoplasy, the independent evolution of the same amino acids at orthologous sites in different taxa, is one potential signature of selection; however, relatively little is known about its prevalence in eukaryotic proteomes. To quantify the extent and type of homoplasy among evolving proteins, we used phylogenetic methodology to analyze 8 genome-scale data matrices from clades of different evolutionary depths that span the eukaryotic tree of life. We found that the frequency of homoplastic amino acid substitutions in eukaryotic proteins was more than 2-fold higher than expected under neutral models of protein evolution. The overwhelming majority of homoplastic substitutions were parallelisms that involved the most frequently exchanged amino acids with similar physicochemical properties and that could be reached by a single-mutational step. We conclude that the role of homoplasy in shaping the protein record is much larger than generally assumed, and we suggest that its high frequency can be explained by both weak positive selection for certain substitutions and purifying selection that constrains substitutions to a small number of functionally equivalent amino acids.