Similar articles
1.
Armando D. Solis. Proteins 2015, 83(12): 2198-2216
To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20-letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much as possible of the structural information found in long-range (contact) interactions among amino acids in natively folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well-defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well-known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long-range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of fewer than 10 letters) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches (including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs) fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. Proteins 2015; 83:2198-2216. © 2015 Wiley Periodicals, Inc.
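The quantity discussed above, the mutual information carried by residue-residue contacts before and after amino acids are merged into groups, can be illustrated with a minimal Python sketch. The contact pairs and the group mappings are invented toy data and the function names are hypothetical. This is not the Information Maximization Device, only a way of measuring how much mutual information a given grouping retains.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information (in bits) between the two members of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = count * n / (left[a] * right[b])
        mi += p_ab * log2(ratio)
    return mi


def reduce_pairs(pairs, mapping):
    """Relabel both members of every contact pair with their group letter."""
    return [(mapping[a], mapping[b]) for a, b in pairs]


# Toy contact list (an assumption for illustration, not data from the paper).
toy_contacts = [("L", "V"), ("L", "V"), ("K", "E"), ("K", "E")]
# Hypothetical two-letter alphabet: h = hydrophobic, p = polar/charged.
two_groups = {"L": "h", "V": "h", "K": "p", "E": "p"}
# Degenerate one-letter alphabet that merges everything.
one_group = {aa: "x" for aa in "LVKE"}

full_mi = mutual_information(toy_contacts)
reduced_mi = mutual_information(reduce_pairs(toy_contacts, two_groups))
collapsed_mi = mutual_information(reduce_pairs(toy_contacts, one_group))
```

On this toy input the two-group alphabet keeps all of the contact information, while the one-letter alphabet keeps none, which is the kind of trade-off an alphabet search would have to weigh.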

2.
Huang JT, Tian J. Proteins 2006, 63(3): 551-554
The significant correlation between protein folding rates and the sequence-predicted secondary structure suggests that folding rates are largely determined by the amino acid sequence. Here, we present a method for predicting the folding rates of proteins from sequences using the intrinsic properties of amino acids, which does not require any information on secondary structure prediction or structural topology. The contribution of a residue to the folding rate is expressed by the residue's Omega value. For a given residue, its Omega depends on the amino acid properties (amino acid rigidity and the dislike of the amino acid for secondary structures). Our investigation achieves 82% correlation with folding rates determined experimentally for the simple, two-state proteins studied to date, suggesting that the amino acid sequence of a protein is an important determinant of the protein-folding rate and mechanism.

3.
Li T, Fan K, Wang J, Wang W. Protein engineering 2003, 16(5): 323-330
It is well known that there are some similarities among various naturally occurring amino acids. Thus, the complexity of protein systems can be reduced by sorting similar amino acids into groups, and protein sequences can then be simplified using reduced alphabets. This paper discusses how to group similar amino acids and whether there is a minimal amino acid alphabet by which proteins can be folded. Various reduced alphabets are obtained by preserving the maximal information in the simplified protein sequence compared with the parent sequence, using global sequence alignment. With these reduced alphabets and simplified similarity matrices, we achieve recognition of the protein fold based on the similarity score of the sequence alignment. The coverage in the dataset SCOP40 is obtained for various levels of reduction of the amino acid types, where coverage is the ratio of the number of homologous pairs detected by the program BLAST to the number marked by SCOP40. For the reduced alphabets containing 10 types of amino acids, the ability to detect distantly related folds remains almost at the same level as that of the alphabet of 20 types of amino acids, which implies that 10 types of amino acids may be the degree of freedom for characterizing the complexity in proteins.
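As a rough illustration of what simplifying a sequence with a reduced alphabet means in practice, the Python sketch below rewrites a sequence using one representative letter per group. The 10-group partition is an assumption chosen for illustration and is not claimed to be the alphabet obtained in the paper; the function names are hypothetical.

```python
# Illustrative 10-group partition of the 20 amino acids (an assumption for
# this sketch, not the grouping derived in the paper).
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every amino acid to the first letter of the group it belongs to."""
    mapping = {}
    for group in groups:
        for aa in group:
            mapping[aa] = group[0]
    return mapping


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(mapping[aa] for aa in seq.upper())


MAPPING = build_mapping(GROUPS)
simplified = reduce_sequence("MKVLAT", MAPPING)  # "LKLLAS"
```

A simplified sequence of this kind could then be aligned with a correspondingly simplified similarity matrix, which is the step the abstract describes but this sketch does not attempt.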

4.
Reduced amino acid alphabets are useful to understand molecular evolution as they reveal basal, shared properties of amino acids, which the structures and functions of proteins rely on. Several previous studies derived such reduced alphabets and linked them to the origin of life and biotechnological applications. However, all this previous work presupposes that only direct contacts of amino acids in native protein structures are relevant. We show in this work, using information-theoretical measures, that an appropriate alphabet reduction scheme is in fact a function of the maximum distance at which amino acids interact. Although for small distances our results agree with previous ones, we show how long-range interactions change the overall picture and prompt a revised understanding of the protein design process. Proteins 2010. © 2010 Wiley-Liss, Inc.

5.
Melo F, Marti-Renom MA. Proteins 2006, 63(4): 986-995
Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets, and their performance was assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using the standard alphabet of 20 amino acids as a reference frame. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may allow the accuracy of current substitution matrices and statistical potentials to be increased for the prediction of protein structure of remote homologs.

6.
Ma BG, Guo JX, Zhang HY. Proteins 2006, 65(2): 362-372
Discovering the mechanism of protein folding is a great challenge in molecular biology. A key step to this end is to find factors that correlate with protein folding rates. Over the past few years, many empirical parameters, such as contact order, long-range order, total contact distance, and secondary structure content, have been developed to reflect the correlation between folding rates and protein tertiary or secondary structures. However, the correlation between proteins' folding rates and their amino acid compositions has not been explored. In the present work, we examined systematically the correlation between proteins' folding rates and their amino acid compositions for two-state and multistate folders and found that different amino acids contributed differently to the folding progress. The relation between the amino acids' molecular weight and degeneracy and the folding rates was examined, and the role of hydrophobicity in the protein folding process was also inspected. As a consequence, a new indicator called composition index was derived, which takes no structure factors into account and is determined solely by the amino acid composition of a protein. Such an indicator is found to be highly correlated with the protein's folding rate (r > 0.7). Three conclusions emerge from the results of this work. (1) Two-state folders and multistate folders have different rate-determining amino acids. (2) The main determining information of a protein's folding rate is largely reflected in its amino acid composition. (3) Composition index may be the best predictor for an ab initio protein folding rate prediction directly from protein sequence from the standpoint of practical application.

7.
Proteins fold by either a two-state or a multistate kinetic mechanism. We observe that amino acids play different roles in the different mechanisms. Many residues that readily form regular secondary structures (α helices, β sheets and turns) can promote the two-state folding reactions of small proteins. Most hydrophilic residues can speed up the multistate folding reactions of large proteins. Folding rates of large proteins are equally responsive to the flexibility of certain amino acids. Other properties of amino acids (including volume, polarity, accessible surface, exposure degree, isoelectric point, and phase transfer energy) contribute little to the folding kinetics of the proteins. Cysteine is a special residue: it triggers the two-state folding reaction but inhibits the multistate folding reaction. These findings not only provide a new insight into protein structure prediction, but could also be used to direct point mutations that change the folding rate. Proteins 2014; 82:2375-2382. © 2014 Wiley Periodicals, Inc.

8.
A new computational approach optimizes searches for reduced protein folding alphabets that use fewer than 20 types of amino acids. The predicted optimal five-letter alphabet happens to be in agreement with the suggestive results of a recent experiment, but whether highly reduced alphabets are sufficient for truly protein-like properties remains an open experimental question.

9.
Algorithms predicting RNA secondary structures based on different folding criteria, namely minimum free energies (mfe), kinetic folding (kin), and maximum matching (mm), and on different parameter sets are studied systematically. Two base pairing alphabets were used: the binary GC and the natural four-letter AUGC alphabet. Computed structures and free energies depend strongly on both the algorithm and the parameter set. Statistical properties, such as mean number of base pairs, mean numbers of stacks, mean loop sizes, etc., are much less sensitive to the choice of parameter set and even of algorithm. Some features of RNA secondary structures, such as structure correlation functions, shape space covering and neutral networks, seem to depend only on the base pairing logic (GC or AUGC alphabet). Received: 16 May 1996 / Accepted: 10 July 1996

10.
What is the minimum number of letters required to fold a protein?   Total citations: 4 (self-citations: 0, citations by others: 4)
Experimental studies have shown that the full sequence complexity of naturally occurring proteins is not required to generate rapidly folding and functional proteins, i.e. proteins can be designed with fewer than 20 letters. This raises the question of what minimum number of amino acid types is required to encode complex protein folds. Here, we investigate this issue from three aspects. First, we study the minimum sequence complexity that can preserve the structural information necessary for detecting distantly related homologues. Second, we compare the ability to design foldable model sequences over a wide range of reduced amino acid alphabets, to find the minimum number of letters with a design ability similar to that of 20. Finally, we survey the lower bound of the alphabet size of globular proteins in a non-redundant protein database. These different approaches give a remarkably consistent view: the minimum number of letters required to fold a protein is around ten.

11.
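Predicting protein folding rates from amino acid sequences   Total citations: 1 (self-citations: 0, citations by others: 1)
Predicting protein folding rates is one of the most challenging problems in biophysics today. In recent years, researchers have carried out extensive work to explore the determinants of folding rates, and many parameters and methods have been proposed. However, the influence on folding rates of information such as the interactions between amino acid residues and the sequence order of amino acids has not been addressed. Here, the pseudo amino acid composition method is used to extract sequence-order information, a Monte Carlo method is used to select the best feature factors, and a linear regression model is built to predict folding rates. The method predicts folding rates directly from the amino acid sequence of a protein without requiring any (explicit) structural information. Under jackknife cross-validation on a dataset of 99 proteins, the predicted folding rates correlate well with the experimental values, with a correlation coefficient of 0.81 and a prediction error of only 2.54. This accuracy is clearly better than that of other sequence-based methods, which indicates that the sequence-order information of a protein is an important factor affecting its folding rate.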
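The jackknife (leave-one-out) validation of a linear regression model mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses a single numeric feature per protein instead of pseudo amino acid composition features, it omits the Monte Carlo feature selection, and the data and function names are invented for the example.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_predictions(xs, ys):
    """Predict each sample from a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Invented toy data: one sequence-derived feature and a log folding rate.
features = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7]
log_rates = [5.2, 4.1, 4.4, 2.9, 1.8, 0.6]
predicted = jackknife_predictions(features, log_rates)
correlation = pearson(predicted, log_rates)
```

Because every prediction comes from a model that never saw the protein being predicted, the resulting correlation is a more conservative estimate than one computed on the training data.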

12.
Folding type-specific secondary structure propensities of 20 naturally occurring amino acids have been derived from α-helical, β-sheet, α/β, and α+β proteins of known structures. These data show that each residue type of amino acids has intrinsic propensities in different regions of secondary structures for different folding types of proteins. Each of the folding types shows markedly different rank ordering, indicating folding type-specific effects on the secondary structure propensities of amino acids. Rigorous statistical tests have been made to validate the folding type-specific effects. It should be noted that α and β proteins have relatively small α-helix-forming and β-strand-forming propensities, respectively, compared with those of α+β and α/β proteins. This may suggest that, with more complex architectures than α and β proteins, α+β and α/β proteins require larger propensities to distinguish from interacting α-helices and β-strands. Our finding of folding type-specific secondary structure propensities suggests that sequence space accessible to each folding type may have differing features. Differing sequence space features might be constrained by topological requirements for each of the folding types. Almost all strong β-sheet forming residues are hydrophobic in character regardless of folding types, thus suggesting the hydrophobicities of side chains as a key determinant of β-sheet structures. In contrast, conformational entropy of side chains is a major determinant of the helical propensities of amino acids, although other interactions such as hydrophobicities and charged interactions cannot be neglected. These results will be helpful to protein design, class-based secondary structure prediction, and protein folding. © 1998 John Wiley & Sons, Inc. Biopoly 45: 35-49, 1998

13.
Discovering structural correlations in alpha-helices.   Total citations: 5 (self-citations: 2, citations by others: 3)
We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to alpha-helical and beta-sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely alpha-helices and beta-sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3-dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.

14.
Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279-284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids.

15.
Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.

16.
Principles of protein folding--a perspective from simple exact models.   Total citations: 32 (self-citations: 12, citations by others: 20)
General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics (fast hydrophobic collapse followed by slower annealing). These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse.
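To make the idea of a binary, polar/nonpolar folding code concrete, here is a minimal Python sketch of a two-letter (H/P) chain on a lattice. The two-dimensional square lattice, the move encoding, and the convention of scoring -1 for each non-bonded H-H contact are assumptions of this sketch, chosen as a common textbook form of such models; the abstract does not specify them, and the function names are hypothetical.

```python
# Unit steps on a 2D square lattice (an assumed geometry for this sketch).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coordinates(moves):
    """Turn a string of moves into lattice coordinates, one per monomer."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    return coords


def hp_energy(sequence, moves):
    """Energy = -1 for every pair of H monomers that touch without being bonded."""
    coords = fold_coordinates(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts each neighbouring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


compact = hp_energy("HPPH", "RUL")   # chain bent into a square, ends touch
extended = hp_energy("HPPH", "RRR")  # fully stretched chain, no contacts
```

For chains this short, every self-avoiding conformation can be enumerated, which is what allows the complete exploration of conformational space that the abstract refers to.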

17.
Huang JT, Xing DJ, Huang W. Amino acids 2012, 43(2): 567-572
The successful prediction of protein-folding rates based on the sequence-predicted secondary structure suggests that the folding rates might be predicted from sequence alone. To pursue this question, we directly predict the folding rates from amino acid sequences, which does not require any information on secondary or tertiary structure. Our work achieves 88% correlation with folding rates determined experimentally for proteins of all folding types and peptides, suggesting that almost all of the information needed to specify a protein's folding kinetics and mechanism is contained within its amino acid sequence. The influence of a residue on the folding rate is related to amino acid properties. The hydrophobic character of amino acids may be an important determinant of folding kinetics, whereas other properties of amino acids (size, flexibility, polarity, and isoelectric point) contribute little to the folding rate constant.

18.
Despite the large number of publications on three-helix protein folding, there is no study devoted to the influence of handedness on the rate of three-helix protein folding. From the experimental studies, we conclude that the left-handed three-helix proteins fold faster than the right-handed ones. What may explain this difference? An important question arising in this paper is whether the modeling of protein folding can capture the difference between the folding rates of proteins with similar structures but different folding mechanisms. To answer this question, the folding of eight three-helix proteins (four right-handed and four left-handed), which are similar in size, was modeled using the Monte Carlo and dynamic programming methods. The studies allowed us to determine the orders of folding of the secondary-structure elements in these domains and the amino acid residues which are important for the folding. The obtained data are in good correlation with each other and with the experimental data. Structural analysis of these proteins demonstrated that the left-handed domains have fewer contacts per residue and a smaller radius of cross section than the right-handed domains. This may be one of the explanations of the observed fact. The same tendency is observed for the large dataset consisting of 332 three-helix proteins (238 right-handed and 94 left-handed). From our analysis, we found that the left-handed three-helix proteins have somewhat less dense packing, which should result in faster folding for some proteins compared with right-handed proteins. Proteins 2013; © 2013 Wiley Periodicals, Inc.

19.
Protein design experiments have shown that the use of specific subsets of amino acids can produce foldable proteins. This prompts the question of whether there is a minimal amino acid alphabet which could be used to fold all proteins. In this work we make an analogy between sequence patterns which produce foldable sequences and those which make it possible to detect structural homologs by aligning sequences, and use it to suggest the possible size of such a reduced alphabet. We estimate that reduced alphabets containing 10-12 letters can be used to design foldable sequences for a large number of protein families. This estimate is based on the observation that there is little loss of the information necessary to pick out structural homologs in a clustered protein sequence database when a suitable reduction of the amino acid alphabet from 20 to 10 letters is made, but that this information is rapidly degraded when further reductions in the alphabet are made.

20.
Understanding the folding behavior of proteins is an important and challenging problem in modern molecular biology. In the present investigation, a large number of features representing protein sequences were developed based on sequence autocorrelation weighted by properties of amino acid residues. A genetic algorithm (GA) combined with multiple linear regression (MLR) was employed to select significant features related to protein folding rates and to build a global predictive model. Moreover, the local lazy regression (LLR) method was also used to predict the protein folding rates. The obtained results indicated that LLR performed much better than the global MLR model. The important properties of amino acid residues affecting protein folding rates were also analyzed. The results of this study will be helpful for understanding the mechanism of protein folding. Our results also demonstrate that features based on amino acid sequence autocorrelation are effective in representing the relationship between protein sequence and folding rates, and that the local method is a powerful tool for predicting protein folding rates.
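A property-weighted sequence autocorrelation feature of the kind described above can be sketched in Python as follows. The abstract does not give the formula or name the residue properties used, so both are assumptions here: the sketch uses a mean-centred autocovariance at a given lag, and the Kyte-Doolittle hydropathy scale as one example property. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property
# (an assumption; the abstract does not say which properties were used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(seq, lag, scale=KYTE_DOOLITTLE):
    """Mean-centred autocovariance of a residue property at a given lag."""
    values = [scale[aa] for aa in seq]
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(seq) - 1")
    mean = sum(values) / n
    total = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return total / (n - lag)


def autocorrelation_features(seq, max_lag, scale=KYTE_DOOLITTLE):
    """One feature per lag from 1 to max_lag, for use in a regression model."""
    return [autocorrelation(seq, lag, scale) for lag in range(1, max_lag + 1)]


features = autocorrelation_features("MKVLATGLLK", max_lag=3)
```

Repeating this for several properties and lags would yield a large feature pool, from which a selection step such as the genetic algorithm mentioned in the abstract could pick a subset.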
