Similar Articles
20 similar articles found.
1.
Armando D. Solis 《Proteins》2015,83(12):2198-2216
To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20‐letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much as possible of the structural information found in long‐range (contact) interactions among amino acids in natively folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well‐defined clusters. Ranging from 2 to 19 groups, these optimal clusters of amino acids, although generated automatically, embody well‐known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are shown to maintain the discriminative power of long‐range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of fewer than 10 letters) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches, including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs, fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely consistent with the observations that have arisen in this work. Proteins 2015; 83:2198–2216. © 2015 Wiley Periodicals, Inc.
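As a rough illustration of the quantity such an alphabet is meant to preserve, the sketch below computes the plug-in mutual information between the residue types at the two ends of a contact, first with the full 20-letter alphabet and then after mapping to a reduced alphabet. The contact list and the two-group hydrophobic/polar mapping are hypothetical, and this is not the Information Maximization Device itself, only a minimal version of the information measure the abstract refers to.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical two-group alphabet: hydrophobic (H) vs polar/other (P).
HP = {aa: ("H" if aa in "AVLIMFWYC" else "P") for aa in AMINO_ACIDS}

# Hypothetical residue-type pairs for contacting residues (not real data).
contacts = [("L", "I"), ("V", "F"), ("L", "L"), ("K", "E"),
            ("R", "D"), ("S", "T"), ("L", "V"), ("I", "F")]


def contact_mutual_information(pairs, mapping=None):
    """Plug-in mutual information (bits) between residue types at the two ends of a contact.

    If `mapping` is given, each amino acid is first replaced by its reduced-alphabet group.
    """
    pairs = list(pairs)
    if mapping is not None:
        pairs = [(mapping[a], mapping[b]) for a, b in pairs]
    # Contacts are unordered, so count each one in both orientations.
    sym = pairs + [(b, a) for a, b in pairs]
    n = len(sym)
    joint = Counter(sym)
    marginal = Counter(a for a, _ in sym)  # identical for both ends after symmetrizing
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((marginal[a] / n) * (marginal[b] / n)))
    return mi


if __name__ == "__main__":
    full = contact_mutual_information(contacts)
    reduced = contact_mutual_information(contacts, HP)
    print(f"20-letter MI: {full:.3f} bits, 2-letter MI: {reduced:.3f} bits")
```

Because the reduced labels are a deterministic function of the full ones, the reduced-alphabet value cannot exceed the full-alphabet value on the same data; in this framing, a good alphabet of a given size is one that loses as little as possible.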

2.
A number of investigators have addressed the question of why certain protein structures are especially common by considering structure designability, defined as the number of sequences that would successfully fold into a particular native structure. One such approach, based on foldability, suggested that structures could be classified according to their maximum possible foldability and that this optimal foldability would be highly correlated with structure designability. Other approaches have focused on computing the designability of lattice proteins written with reduced two-letter amino acid alphabets. These approaches suggested contrasting characteristics for the most designable structures. This report compares the designability of lattice proteins over a wide range of amino acid alphabets and foldability requirements. While all alphabets yield a wide distribution of protein designabilities, the form of the distribution depends on how protein "viability" is defined. Furthermore, under increasing foldability requirements, the change in designability for all alphabets is in good agreement with the previous conclusions of the foldability approach. Most importantly, structures that are highly designable with two-letter amino acid alphabets are not especially designable with larger alphabets.

3.
What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding the protein folding mechanism and guiding de novo protein design. The twenty naturally occurring amino acids and eight secondary structure types together form a 28‐letter alphabet for determining folding kinetics and mechanism. Here we predict protein folding rates using many reduced alphabets. We find that a reduced alphabet of 10 letters achieves a good correlation with folding rates, close to that achieved by the full 28‐letter alphabet, whereas many other reduced alphabets are not significantly correlated with folding rates. This finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment, and protein design. Proteins 2015; 83:631–639. © 2015 Wiley Periodicals, Inc.

4.
Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.
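One simple way to see where different simplified alphabets agree, offered only as an illustration and not necessarily the comparison method the authors developed, is to count how often each pair of amino acids falls into the same group across alphabets. The two alphabets below are made up for the example.

```python
from itertools import combinations

# Two hypothetical simplified alphabets, each a partition of the 20 amino acids.
ALPHABET_1 = ["LIVM", "FWY", "KR", "DE", "STNQ", "AGPCH"]
ALPHABET_2 = ["LIVMFWYC", "KRH", "DENQ", "STAGP"]


def cogrouping_frequency(alphabets):
    """Fraction of alphabets that place each amino acid pair in the same group.

    Pairs never grouped together are simply absent from the result (frequency 0).
    """
    counts = {}
    for groups in alphabets:
        for group in groups:
            # Sort so each unordered pair has a single canonical key, e.g. ("I", "L").
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return {pair: c / len(alphabets) for pair, c in counts.items()}


if __name__ == "__main__":
    freq = cogrouping_frequency([ALPHABET_1, ALPHABET_2])
    print(freq[("I", "L")], freq[("F", "L")], freq.get(("D", "K"), 0.0))
```

Pairs with frequency near 1 across many alphabets would form the core of a consensus grouping.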

5.
Melo F  Marti-Renom MA 《Proteins》2006,63(4):986-995
Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived from several reduced amino acid alphabets, and their performance was assessed on a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using the standard 20-amino-acid alphabet as the reference. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some alphabets with only a few residue types are able to encode most of the relevant sequence/structure information present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may help increase the accuracy of current substitution matrices and statistical potentials for predicting the structures of remote homologs.

6.
Li T  Fan K  Wang J  Wang W 《Protein engineering》2003,16(5):323-330
It is well known that many naturally occurring amino acids are similar to one another. The complexity of protein systems can therefore be reduced by sorting similar amino acids into groups, so that protein sequences can be rewritten in reduced alphabets. This paper discusses how to group similar amino acids and whether there is a minimal amino acid alphabet with which proteins can be folded. Various reduced alphabets are obtained by preserving the maximal information in the simplified protein sequence relative to the parent sequence, as measured by global sequence alignment. With these reduced alphabets and simplified similarity matrices, we recognize protein folds based on sequence alignment similarity scores. For various levels of reduction of the amino acid types, we compute the coverage on the SCOP40 dataset, defined as the ratio of the number of homologous pairs detected by BLAST to the number annotated in SCOP40. For reduced alphabets containing 10 types of amino acids, the ability to detect distantly related folds remains almost at the same level as with the 20-letter alphabet, which implies that 10 amino acid types may correspond to the effective degrees of freedom needed to characterize the complexity of proteins.
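To make the idea of a simplified sequence concrete, the sketch below maps each residue to a representative letter of its group and compares the percent identity of an already aligned pair before and after reduction. The nine-group alphabet is hypothetical and is not the 10-letter alphabet reported in the paper; the alignment is assumed to be given, so no BLAST search is involved.

```python
# Hypothetical reduced alphabet; each residue maps to the first letter of its group.
HYPOTHETICAL_GROUPS = ["LIVM", "FWY", "KR", "DE", "STNQ", "AG", "P", "C", "H"]


def reduce_sequence(seq, groups):
    """Rewrite a sequence in a reduced alphabet; gaps and unknown symbols pass through."""
    lookup = {aa: g[0] for g in groups for aa in g}
    return "".join(lookup.get(aa, aa) for aa in seq)


def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("no ungapped aligned columns")
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


if __name__ == "__main__":
    a, b = "LKDSFAV", "IRESYGL"  # hypothetical aligned fragments
    ra, rb = reduce_sequence(a, HYPOTHETICAL_GROUPS), reduce_sequence(b, HYPOTHETICAL_GROUPS)
    print(percent_identity(a, b), percent_identity(ra, rb))
```

Conservative substitutions (I/L, K/R, D/E, and so on) become identities after reduction, which is why a reduced alphabet can make distant homologs easier to detect, at the cost of also raising identity between unrelated sequences.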

7.
Discovering structural correlations in alpha-helices.
We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to alpha-helical and beta-sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely alpha-helices and beta-sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3-dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.

8.
Screening of functional proteins from a random‐sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random‐sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random‐sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random‐sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, its functionality may be limited because these proteins lack positively charged amino acids. Here, we constructed three libraries of 120‐amino acid random‐sequence proteins using alphabets of 5, 12, and 20 amino acids, with preselection by mRNA display (to eliminate sequences containing stop codons and frameshifts), and characterized and compared the structural properties of random‐sequence proteins arbitrarily chosen from these libraries. We found that random‐sequence proteins constructed with the 12‐member alphabet (comprising the five primitive amino acids plus positively charged amino acids) have higher solubility than those constructed with the 20‐member alphabet, although other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids.

9.
Dong Q  Wang X  Lin L  Wang Y 《Proteins》2008,72(1):163-172
In recent years, protein structure prediction using local structure information has made great progress, and many fragment libraries or structure alphabets have been developed. In this study, the entropies and correlations of local structures are first calculated. The results show that neighboring local structures are strongly correlated. A dual-layer model is then designed for protein local structure prediction. The position-specific score matrix generated by PSI-BLAST is fed into the first-layer classifier, whose output is further refined by a second-layer classifier. A neural network is used as the classifier. Two structure alphabets are explored, represented in Cartesian coordinate space and in torsion angle space, respectively. Testing on a nonredundant dataset shows that the dual-layer model is an efficient method for protein local structure prediction. The Q-scores are 0.456 and 0.585 for the two structure alphabets, a significant improvement over related work.
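The entropy and neighbour-correlation measurements, and the Q-score used for evaluation, can be sketched as follows. This assumes local structures have already been encoded as one letter per position in some structure alphabet, and that the Q-score is the fraction of positions predicted correctly; the paper's exact definitions and estimators may differ.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Entropy (bits) of the empirical distribution of the given symbols."""
    n = len(states)
    return -sum((c / n) * math.log2(c / n) for c in Counter(states).values())


def neighbour_mutual_information(states):
    """I(S_i; S_i+1) in bits, estimated from consecutive letters along one chain."""
    left, right = states[:-1], states[1:]
    pairs = list(zip(left, right))
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return shannon_entropy(left) + shannon_entropy(right) - shannon_entropy(pairs)


def q_score(predicted, observed):
    """Fraction of positions where the predicted letter matches the observed letter."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

A neighbour mutual information well above zero is what "neighboring local structures are strongly correlated" would look like numerically, and it is the kind of signal a second-layer classifier can exploit.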

10.
Weitao Sun  Jing He 《Biopolymers》2010,93(10):904-916
Residue clusters play an essential role in stabilizing protein structures in the form of complex networks. We show that cluster sizes in native proteins follow a log‐normal distribution for a dataset of 424 proteins. To our knowledge, this is the first time such a fit has been reported for native structures. Based on the log‐normal model, the asymptotically increasing mean cluster size yields a critical protein chain length of about 200 amino acids, beyond which most globular proteins have nearly the same mean cluster size. This suggests that larger proteins use a different packing mechanism than smaller proteins. We confirmed the scale‐free property of the residue contact network for most of the protein structures in the dataset, although violations were observed for tightly packed proteins. A residue cluster network wheel (RCNW) is proposed to visualize the relationships among multiple properties of the residue network, such as cluster size, residue types and contacts, and residue flexibility. We observed that residues with large cluster sizes have smaller Cα displacements, as measured by normal mode analysis. © 2010 Wiley Periodicals, Inc. Biopolymers 93: 904–916, 2010.
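A log-normal fit of this kind can be obtained by maximum likelihood: take logarithms of the cluster sizes and compute their mean and population standard deviation. The sketch assumes cluster sizes have already been extracted as positive numbers; how residue clusters are defined is not reproduced here.

```python
import math
import statistics


def fit_lognormal(sizes):
    """Maximum-likelihood log-normal fit; returns (mu, sigma) of ln(size)."""
    if any(s <= 0 for s in sizes):
        raise ValueError("cluster sizes must be positive")
    logs = [math.log(s) for s in sizes]
    mu = statistics.fmean(logs)
    # The MLE of sigma divides by n, i.e. the population standard deviation.
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def lognormal_mean(mu, sigma):
    """Expected cluster size under the fitted model: exp(mu + sigma**2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2)
```

Fitting each protein separately and plotting `lognormal_mean(*fit_lognormal(sizes))` against chain length is one way to look for the plateau the abstract describes.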

11.
The Ara h 2 proteins are major determinants of peanut allergens. These proteins have not been fully studied at the molecular level. It has been previously proposed that there are two isoforms of Ara h 2, based on primary structures that were deduced from two reported cDNA sequences. In this report, four isoforms have been purified and characterized individually. Mass spectrometric methods have been used to determine the protein sequences and to define post‐translational modifications for all four isoforms. Two pairs of isoforms have been identified, corresponding to a long‐chain form and a form that is shorter by 12 amino acids. Each pair is further differentiated by the presence or absence of a two amino acid sequence at the carboxyl terminus of the protein. Modifications that were characterized include site‐specific hydroxylation of proline residues, but no glycosylation was found, in contrast to previous reports.

12.
A new computational approach optimizes searches for reduced protein folding alphabets that use fewer than 20 types of amino acids. The predicted optimal five-letter alphabet happens to be in agreement with the suggestive results of a recent experiment, but whether highly reduced alphabets are sufficient for truly protein-like properties remains an open experimental question.

13.
Receptor‐like kinases (RLKs) represent the largest group of cell surface receptors in plants. The monophyletic leucine‐rich repeat (LRR)‐RLK subfamily II is considered to contain the somatic embryogenesis receptor kinases (SERKs) and NSP‐interacting kinases known to be involved in developmental processes and cellular immunity in plants. There are only a few published studies on the phylogenetics of LRR‐RLKII, and these suffer from poor taxon/gene sampling. Hence, it is not clear how many and which main clades this family contains, let alone what structure–function relationships exist. We used 1342 protein sequences annotated as ‘SERK’ and ‘SERK‐like’, plus related sequences, to estimate phylogeny within the LRR‐RLKII clade, using the nematode protein kinase Pelle as an outgroup. We reconstruct five main clades (LRR‐RLKII 1–5), in each of which the main pattern of land plant relationships recurs, confirming previous hypotheses that duplication events happened in this gene subfamily prior to divergence among land plant lineages. We show that domain structures and intron–exon boundaries within the five clades are well conserved in evolution. Furthermore, phylogenetic patterns based on the separate LRR and kinase parts of LRR‐RLKs are incongruent: whereas the LRR part supports an LRR‐RLKII 2/3 sister group relationship, the kinase part supports clades 1/2. We infer that the kinase part includes few ‘radical’ amino acid changes compared with the LRR part. Finally, our results confirm that amino acids involved in each LRR‐RLKII–receptor complex interaction are located at N‐capping residues, and that the short amino acid motifs of this interaction domain are highly conserved throughout evolution within the five LRR‐RLKII clades.

14.
Proline is an amino acid with a unique cyclic structure that facilitates the folding of many proteins, but also impedes the rate of peptide bond formation by the ribosome. As a ribosome substrate, proline reacts markedly more slowly than other amino acids, both as a donor and as an acceptor of the nascent peptide. Furthermore, synthesis of peptides with consecutive proline residues triggers ribosome stalling. Here, we report crystal structures of the eukaryotic ribosome bound to analogs of mono‐ and diprolyl‐tRNAs. These structures provide high‐resolution insight into the unique properties of proline as a ribosome substrate. They show that the cyclic structure of the proline residue prevents proline positioning in the amino acid binding pocket and affects the position of the nascent peptide chain in the ribosomal peptide exit tunnel. These observations extend current knowledge of the protein synthesis mechanism. They also revise an old dogma that amino acids bind the ribosomal active site in a uniform way by showing that proline has a binding mode distinct from other amino acids.

15.
The interactions of Met and Cys with other amino acid side chains have received little attention, in contrast to aromatic–aromatic, aromatic–aliphatic, and aliphatic–aliphatic interactions. Notably, these are the only amino acids that contain a sulfur atom, which is highly polarizable and thus likely to participate in strong van der Waals interactions. Analysis of the interactions present in membrane protein crystal structures, together with ab initio characterization of their strength in small‐molecule model systems, predicts the following ranking of interaction strengths: Met–Met > Met–Cys ≈ Met–Phe ≈ Cys–Phe > Phe–Phe ≈ Phe–Leu > Met–Leu > Leu–Leu ≈ Cys–Leu. These results show that sulfur‐containing amino acids form stronger interactions than aromatic or aliphatic amino acids. Thus, these amino acids may provide additional driving forces for maintaining the 3D structure of membrane proteins and may confer functional specificity.

16.
Because of the complexity of the Plasmodium falciparum genome, predicting secretory proteins of P. falciparum is more difficult than for other species. In this study, based on the measure of diversity, a new K-nearest neighbor method, K-minimum increment of diversity (K-MID), is introduced to predict secretory proteins. Using amino acid composition as the only input vector, K-MID achieves 88.89% accuracy with a Matthews correlation coefficient (MCC) of 0.78. Furthermore, several reduced amino acid alphabets are applied to predict secretory proteins, and the results show that accuracy improves to 90.67% with an MCC of 0.83 when the 169 dipeptide compositions of the reduced amino acid alphabet obtained from the Protein Blocks method are used as input.
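The figure of 169 dipeptide compositions presumably corresponds to a 13-letter reduced alphabet (13 × 13 = 169). The sketch below computes dipeptide composition over an arbitrary reduced alphabet, together with the Matthews correlation coefficient used to report performance. The 13-group partition is hypothetical and is not the Protein Blocks-derived alphabet, and the K-MID classifier itself is not reproduced.

```python
import math

# Hypothetical 13-group partition of the 20 amino acids (illustration only).
GROUPS_13 = ["A", "G", "P", "C", "ST", "NQ", "DE", "KR", "H", "ILV", "M", "FY", "W"]


def dipeptide_composition(seq, groups):
    """Normalized counts of consecutive group pairs; returns len(groups)**2 features."""
    lookup = {aa: i for i, g in enumerate(groups) for aa in g}
    k = len(groups)
    counts = [0] * (k * k)
    for a, b in zip(seq, seq[1:]):
        if a in lookup and b in lookup:  # skip pairs involving non-standard symbols
            counts[lookup[a] * k + lookup[b]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 to 1 and, unlike raw accuracy, is not inflated when one class (for example, non-secretory proteins) dominates the dataset.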

17.
Accurately estimating probabilities from observations is important for probabilistic approaches to problems in computational biology. In this paper we present a biologically motivated method for estimating probability distributions over discrete alphabets from observations using a mixture model of common ancestors. The method is an extension of substitution matrix-based probability estimation methods. In contrast to previous such methods, our method has a simple Bayesian interpretation and has the advantage over Dirichlet mixtures that it is both effective and simple to compute for large alphabets. The method is applied to estimate amino acid probabilities based on observed counts in an alignment and is shown to perform comparably to previous methods. The method is also applied to estimate probability distributions over protein families and improves protein classification accuracy.

18.
19.
Mehdi Mirzaie 《Proteins》2018,86(4):467-474
Evaluation of protein structures needs a trustworthy potential function. Although several knowledge‐based potential functions exist, the impact of different types of amino acids in the scoring functions has not yet been studied. Previously, we reported the importance of nonlocal interactions in a scoring function (based on Delaunay tessellation) for discriminating native structures. We then questioned the structural impact of hydrophobic amino acids in protein fold recognition. To this end, a Hydrophobic Reduced Model (HRM) was designed to reduce the full protein structure (FS) to a reduced structure (RS). RS consists of only seven hydrophobic amino acids (L, V, F, I, A, W, Y) and all their interactions. The model was evaluated using four performance metrics: the number of correctly identified natives, the Z‐score of the native energy, the RMSD of the minimum-score model, and the Pearson correlation coefficient between energy and model quality. Results indicated that nonlocal interactions between hydrophobic amino acids alone could be sufficient and accurate enough for protein fold recognition. Interestingly, the results of HRM are remarkably close to those of the model that considers all amino acids (20‐amino acid model) in discriminating the native structures of proteins on eleven decoy sets. This indicates that the power of knowledge‐based potential functions in protein fold recognition is mostly due to hydrophobic interactions. Hence, we suggest combining HRM with a separate, well-designed scoring function for non‐hydrophobic interactions to achieve better performance in fold recognition.
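The reduction step and the Z-score metric can be sketched as below, assuming a structure is represented by a residue-type string and a list of contacting index pairs (for example from a Delaunay tessellation, which is not computed here). The energy function is left abstract, and the Z-score uses the population standard deviation of decoy energies, which is one common convention rather than necessarily the authors' choice.

```python
import statistics

HYDROPHOBIC = set("LVFIAWY")  # the seven residues listed for the HRM


def reduce_contacts(contacts, residue_types):
    """Keep only contacts (i, j) in which both residues are hydrophobic."""
    return [(i, j) for i, j in contacts
            if residue_types[i] in HYDROPHOBIC and residue_types[j] in HYDROPHOBIC]


def native_z_score(native_energy, decoy_energies):
    """(E_native - mean(E_decoys)) / std(E_decoys); more negative means better discrimination."""
    mu = statistics.fmean(decoy_energies)
    sd = statistics.pstdev(decoy_energies)
    return (native_energy - mu) / sd
```

Any contact-based potential can then be evaluated on `reduce_contacts(...)` instead of the full contact list, which is the comparison between RS and FS described above.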

20.
The large number of macromolecular structures deposited in the Protein Data Bank (PDB) describing complexes between proteins and either physiological compounds or synthetic drugs has made possible a systematic analysis of the interactions between proteins and their ligands. In this work, the binding pockets of about 4000 PDB protein‐ligand complexes were investigated, and amino acid and interaction types were analyzed. The residues observed with the lowest frequency in protein sequences, Trp, His, Met, Tyr, and Phe, turned out to be the most abundant in binding pockets. Significant differences between drug‐like and physiological compounds were found. On average, physiological compounds form about twice as many hydrogen bonds with protein atoms as drugs do, whereas drugs rely more on hydrophobic interactions to achieve target selectivity. The large number of PDB structures of homologous proteins in complex with the same ligand made it possible to analyze the conservation of binding pocket residues among homologous protein structures bound to the same ligand, showing that Gly, Glu, Arg, Asp, His, and Thr are more conserved than other amino acids. Even when the same ligand is bound to unrelated proteins, the binding pockets showed significant conservation of residue types; in this case, the probability of co‐occurrence of the same amino acid type in the binding pockets could be up to thirteen times higher than expected by chance. The trends identified in this study may provide a useful guideline in drug design and lead optimization. Copyright © 2014 John Wiley & Sons, Ltd.
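Statements such as rare residues being most abundant in binding pockets are usually quantified as a propensity: a residue type's frequency in pockets divided by its frequency in the background set of protein sequences. A minimal sketch follows, with made-up inputs; how pocket residues are selected (for example, a distance cutoff around the ligand) is an assumption not covered here.

```python
from collections import Counter


def pocket_propensity(pocket_residues, background_residues):
    """Frequency of each residue type in binding pockets divided by its background frequency.

    Values above 1 mean the residue type is enriched in pockets.
    """
    pocket = Counter(pocket_residues)
    background = Counter(background_residues)
    n_p, n_b = len(pocket_residues), len(background_residues)
    return {aa: (pocket[aa] / n_p) / (background[aa] / n_b)
            for aa in pocket if background[aa] > 0}


if __name__ == "__main__":
    # Hypothetical pocket and background residue strings.
    print(pocket_propensity("WWHL", "LLLLWH"))
```

The same observed/expected logic underlies the "thirteen times higher than expected" co-occurrence figure, although the exact random baseline used in the study is not specified here.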


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd. (北京勤云科技发展有限公司), Beijing ICP Registration No. 09084417