Similar Literature
1.
Proteins are generally classified into four structural classes: all-alpha proteins, all-beta proteins, alpha + beta proteins, and alpha/beta proteins. In this article, a protein is expressed as a vector in a 20-dimensional space, whose 20 components are defined by the composition of its 20 amino acids. On this basis, a new method, the so-called maximum component coefficient method, is proposed for predicting the structural class of a protein from its amino acid composition. Compared with the existing methods, the new method yields a higher overall prediction accuracy. For the all-alpha proteins in particular, its rate of correct prediction is much higher than that of any existing method. For instance, for the 19 all-alpha proteins investigated previously by P.Y. Chou, the rate of correct prediction by his method was 84.2%, whereas the new method predicts all of them correctly (100%)! Furthermore, the new method has an explicable physical picture: the vector representing the protein to be predicted is decomposed into four component vectors, each corresponding to one of the norms of the four protein structural classes.
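The decomposition idea can be sketched in a few lines. This is a minimal sketch under stated assumptions: the four class norms are known composition vectors, and the coefficients are obtained by ordinary least squares via the normal equations (the abstract does not say how the decomposition is solved). The function names and the toy 3-dimensional vectors are hypothetical; real use would pass 20-dimensional compositions and four norms.

```python
from typing import Dict, List

Vector = List[float]


def solve_linear(a: List[List[float]], b: Vector) -> Vector:
    """Solve a @ x = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dot(u: Vector, v: Vector) -> float:
    return sum(p * q for p, q in zip(u, v))


def component_coefficients(x: Vector, norms: Dict[str, Vector]) -> Dict[str, float]:
    """Least-squares coefficients a_k with x ~ sum_k a_k * norms[k] (assumed solver)."""
    names = list(norms)
    vecs = [norms[k] for k in names]
    gram = [[dot(u, v) for v in vecs] for u in vecs]  # Gram matrix of the norms
    rhs = [dot(u, x) for u in vecs]
    return dict(zip(names, solve_linear(gram, rhs)))


def predict_max_component(x: Vector, norms: Dict[str, Vector]) -> str:
    """Predicted class = the norm with the largest component coefficient."""
    coeffs = component_coefficients(x, norms)
    return max(coeffs, key=coeffs.get)


# Hypothetical toy norms (not real class compositions).
norms = {"a": [1.0, 1.0, 0.0], "b": [0.0, 1.0, 1.0]}
x = [2.0, 2.5, 0.5]  # constructed as 2.0 * a + 0.5 * b
print(component_coefficients(x, norms), predict_max_component(x, norms))
```

Because the toy query is an exact combination of the two norms, least squares recovers the coefficients 2.0 and 0.5, and the larger one selects class "a".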

2.
A protein is usually classified into one of the following four structural classes: all alpha, all beta, (alpha + beta) and alpha/beta. In this paper, based on the maximum correlation-coefficient principle, a new formulation is proposed for predicting the structural class of a protein according to its amino acid composition. Calculations have been made for a development set of proteins from which the amino acid compositions for the standard structural classes were derived, and an independent set of proteins which are outside the development set. The former can test the self consistency of a method and the latter can test its extrapolating effectiveness. In both cases, the results showed that the new method gave a considerably higher rate of correct prediction than any of the previous methods, implying that a significant improvement has been achieved by implementing the maximum-correlation-coefficient principle in the new method.
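A minimal sketch of a maximum-correlation rule, assuming "correlation coefficient" means the Pearson correlation between the query composition and each standard class composition (the abstract does not give the exact formula). Names and toy vectors are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def predict_max_correlation(
    x: Sequence[float], norms: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the class whose standard composition correlates best with x, plus all scores."""
    scores = {name: pearson(x, norm) for name, norm in norms.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 3-component "compositions" for two classes.
norms = {"alpha": [0.3, 0.1, 0.6], "beta": [0.1, 0.6, 0.3]}
label, scores = predict_max_correlation([0.28, 0.12, 0.60], norms)
print(label, scores)
```

Unlike a distance, the correlation ignores overall shifts and scale of the vectors and compares only their pattern across amino acids.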

3.
Wang ZX, Yuan Z. Proteins, 2000, 38(2):165-175
Proteins of known structures are usually classified into four structural classes: all-alpha, all-beta, alpha+beta, and alpha/beta type of proteins. A number of methods for predicting the structural class of a protein based on its amino acid composition have been developed during the past few years. Recently, a component-coupled method was developed for predicting protein structural class according to amino acid composition. This method is based on the least Mahalanobis distance principle, and yields much better predicted results in comparison with the previous methods. However, the success rates reported for structural class prediction by different investigators are contradictory. The highest reported accuracies by this method are near 100%, but the lowest one is only about 60%. The goal of this study is to resolve this paradox and to determine the possible upper limit of prediction rate for structural classes. In this paper, based on the normality assumption and the Bayes decision rule for minimum error, a new method is proposed for predicting the structural class of a protein according to its amino acid composition. The detailed theoretical analysis indicates that if the four protein folding classes are governed by normal distributions, the present method will yield the optimum predictive result in a statistical sense. A non-redundant data set of 1,189 protein domains is used to evaluate the performance of the new method. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein. The apparently high accuracy (more than 90%) attained in previous studies was due to the preselection of test sets, which may not be adequately representative of all unrelated proteins.
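A minimal sketch of the Bayes minimum-error rule under a multivariate normal model: each class has a mean, a covariance and a prior, and the query is assigned to the class with the largest log posterior. The squared Mahalanobis distance appears inside the log density, which is how the rule relates to the least-Mahalanobis-distance principle. Class parameters here are hypothetical. Because the 20 fractions of a composition vector sum to one, a full 20 × 20 covariance is singular, so in practice one component would have to be dropped; this sketch does not handle that.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def gaussian_log_density(x: Sequence[float], mean: Sequence[float], cov: Matrix) -> float:
    """log N(x; mean, cov) computed through the Cholesky factor of cov."""
    n = len(x)
    L = cholesky(cov)
    # Forward substitution: solve L y = x - mean, so y . y is the squared Mahalanobis distance.
    y = [0.0] * n
    for i in range(n):
        y[i] = (x[i] - mean[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    maha2 = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + maha2)


def bayes_classify(
    x: Sequence[float], classes: Dict[str, Tuple[List[float], Matrix, float]]
) -> str:
    """classes maps name -> (mean, covariance, prior); returns the max-posterior class."""
    scores = {
        name: gaussian_log_density(x, mean, cov) + math.log(prior)
        for name, (mean, cov, prior) in classes.items()
    }
    return max(scores, key=scores.get)


# Hypothetical two-dimensional classes with equal priors.
classes = {
    "A": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    "B": ([3.0, 3.0], [[4.0, 2.0], [2.0, 3.0]], 0.5),
}
print(bayes_classify([0.5, 0.2], classes), bayes_classify([2.8, 3.1], classes))
```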

4.
The folding type of a protein is relevant to the amino acid composition (cited 36 times; self-citations: 0; citations by others: 36)
The folding types of 135 proteins, the three-dimensional structures of which are known, were analyzed in terms of the amino acid composition. The amino acid composition of a protein was expressed as a point in a multidimensional space spanned with 20 axes, on which the corresponding contents of 20 amino acids in the protein were represented. The distribution pattern of proteins in this composition space was examined in relation to five folding types, alpha, beta, alpha/beta, alpha + beta, and irregular type. The results show that amino acid compositions of the alpha, beta, and alpha/beta types are located in different regions in the composition space, thus allowing distinct separation of proteins depending on the folding types. The points representing proteins of the alpha + beta and irregular types, however, are widely scattered in the space, and the existing regions overlap with those of the other folding types. A simple method of utilizing the "distance" in the space was found to be convenient for classification of proteins into the five folding types. The assignment of the folding type with this method gave an accuracy of 70% in the coincidence with the experimental data.
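A minimal sketch of the composition space and a distance-based assignment. It assumes Euclidean distance to each folding type's centroid (the abstract only says "distance"), and the toy sequences are hypothetical, not real proteins.

```python
import math
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid; non-standard letters are ignored."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a set of composition vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_class(x: Sequence[float], centroids: Dict[str, Sequence[float]]) -> str:
    """Assign x to the folding type whose centroid is closest (Euclidean distance)."""
    dist = {name: math.dist(x, c) for name, c in centroids.items()}
    return min(dist, key=dist.get)


# Hypothetical single-member "training sets" for two folding types.
centroids = {
    "alpha": centroid([composition("AAAAELLLKK")]),
    "beta": centroid([composition("VVVTTTYYII")]),
}
print(nearest_class(composition("AAELLK"), centroids))
```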

5.
We present a new method for predicting the secondary structure of globular proteins based on non-linear neural network models. Network models learn from existing protein structures how to predict the secondary structure of local sequences of amino acids. The average success rate of our method on a testing set of proteins non-homologous with the corresponding training set was 64.3% on three types of secondary structure (alpha-helix, beta-sheet, and coil), with correlation coefficients of C alpha = 0.41, C beta = 0.31 and Ccoil = 0.41. These quality indices are all higher than those of previous methods. The prediction accuracy for the first 25 residues of the N-terminal sequence was significantly better. We conclude from computational experiments on real and artificial structures that no method based solely on local information in the protein sequence is likely to produce significantly better results for non-homologous proteins. The performance of our method on homologous proteins is much better than on non-homologous proteins, but is not as good as simply assuming that homologous sequences have identical structures.
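The quality indices quoted above can be computed from per-residue labels. A minimal sketch, assuming the per-state correlation coefficient is the Matthews coefficient computed on a one-state-versus-rest basis and that states are encoded as single letters ("H" helix, "E" strand, "C" coil); the label strings are hypothetical.

```python
import math


def matthews(pred: str, obs: str, state: str) -> float:
    """Matthews correlation coefficient for one state over per-residue labels."""
    tp = tn = fp = fn = 0
    for p, o in zip(pred, obs):
        if p == state and o == state:
            tp += 1
        elif p != state and o != state:
            tn += 1
        elif p == state:
            fp += 1  # predicted the state, observed something else
        else:
            fn += 1  # missed an observed residue in this state
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def q3(pred: str, obs: str) -> float:
    """Fraction of residues whose three-state label is predicted correctly."""
    return sum(p == o for p, o in zip(pred, obs)) / len(obs)


observed = "HHHECCC"
predicted = "HHCECCC"
print(q3(predicted, observed), matthews(predicted, observed, "H"))
```

The success rate (Q3) counts matches only, while the per-state coefficient also penalizes over-prediction, which is why papers of this kind report both.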

6.
A new method is proposed for predicting the folding type of a protein according to its amino acid composition based on the following physical picture: (1) a protein is characterized as a vector in a 20-dimensional space, whose 20 components are defined by the compositions of its 20 amino acids; and (2) the similarity of two proteins is proportional to the mutual projection of their characteristic vectors, and hence inversely proportional to the size of their correlation angle. Thus, the prediction is performed by calculating the correlation angles between the vector of the protein to be predicted and a set of standard vectors representing the norms of the four protein folding types (i.e., all-alpha, all-beta, alpha+beta, and alpha/beta). Compared with the existing methods, the new method yields a higher rate of correct prediction, offers a more intuitive physical picture, and is convenient to apply. For instance, in predicting the 64 proteins of the development set from which the standard vectors are derived, the average accuracy is 83.6%, higher than that obtained for the same set of proteins by any of the existing methods. The average accuracy for an independent set of 35 proteins of known X-ray structure is 91.4%, significantly higher than any accuracy reported so far, implying that the new method is of great practical value. Together, these results demonstrate that the proposed method improves both self-consistency and extrapolating effectiveness.
On sabbatical leave from Department of Physics, Tianjin University, Tianjin, China.
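A minimal sketch of the correlation-angle rule: the angle between two composition vectors follows from their normalized dot product, and the query is assigned to the standard vector with the smallest angle. The standard vectors below are hypothetical toy values.

```python
import math
from typing import Dict, Sequence


def correlation_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two vectors; a smaller angle means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.acos(cos)


def predict_min_angle(x: Sequence[float], norms: Dict[str, Sequence[float]]) -> str:
    """Assign x to the folding type whose standard vector makes the smallest angle with it."""
    angles = {name: correlation_angle(x, v) for name, v in norms.items()}
    return min(angles, key=angles.get)


# Hypothetical standard vectors.
norms = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]}
print(predict_min_angle([0.9, 0.1, 0.0], norms))
```

Minimizing the angle is equivalent to maximizing the cosine, i.e. the mutual projection of the unit vectors, so vector length (protein size does not enter a composition anyway) plays no role.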

7.
Deciphering the native conformation of proteins from their amino acid sequences is one of the most challenging problems in molecular biology. Information on the secondary structure of a protein can be helpful in understanding its native folded state. In our earlier work on molecular chaperones, we analyzed the hydrophobic and charged patches, short-, medium- and long-range contacts and residue distributions along the sequence. In this article, we attempt to predict the structural class of globular and chaperone proteins based on the information obtained from residue distributions. This method predicts the structural class with an accuracy of 93 and 96%, respectively, for the four- and three-state models in a training set of 120 globular proteins, and 90 and 96%, respectively, for a test set of 80 proteins. We have used this information and methodology to predict the structural classes of chaperones. Interestingly, most of the chaperone proteins are predicted to be of the alpha/beta or mixed folding type.

8.
The sequences of two Drosophila and one rabbit protein phosphatase (PP) 1 catalytic subunits were determined from their cDNA. The sequence of Drosophila PP1 alpha 1 was deduced from a 2.2-kb cDNA purified from an embryonic cDNA library, while that for Drosophila PP1 beta was obtained from overlapping clones isolated from both a head cDNA library and an eye imaginal disc cDNA library. The gene for Drosophila PP1 alpha 1 is at 96A2-5 on chromosome 3 and encodes a protein of 327 amino acids with a calculated molecular mass of 37.3 kDa. The gene for Drosophila PP1 beta is localized at 9C1-2 on the X chromosome and encodes a protein of 330 amino acids with a predicted molecular mass of 37.8 kDa. PP1 alpha 1 shows 96% amino acid sequence identity to PP1 alpha 2 (302 amino acids), an isoform whose gene is located in the 87B6-12 region of chromosome 3 [Dombrádi, V., Axton, J. M., Glover, D.M. & Cohen, P.T.W. (1989) Eur. J. Biochem. 183, 603-610]. PP1 beta shows 85% identity to PP1 alpha 1 and PP1 alpha 2 over the 302 homologous amino acids. These results demonstrate that at least three genes are present in Drosophila that encode different isoforms of PP1. Drosophila PP1 alpha 1 and PP1 beta show 89% amino acid sequence identity to rabbit PP1 alpha (330 amino acids) [Cohen, P.T.W. (1988) FEBS Lett. 232, 17-23] and PP1 beta (327 amino acids), respectively, demonstrating that both isoforms are among the most conserved proteins known throughout the evolution of the animal kingdom. The presence of characteristic structural differences between PP1 alpha and PP1 beta, which have been preserved from insects to mammals, implies that the alpha and beta isoforms may have distinct biological functions.

9.
The 60S ribosomal subunits from Saccharomyces cerevisiae contain a set of four acidic proteins named YP1alpha, YP1beta, YP2alpha, and YP2beta. The genes for each were PCR amplified from a yeast cDNA library, sequenced, and expressed in Escherichia coli cells using two expression systems. The first system, pLM1, was used for YP1beta, YP2alpha, and YP2beta. The second one, pT7-7, was used for YP1alpha. Expression in both cases was under the control of a strong inducible T7 promoter. The amount of induced recombinant proteins in the host cells was around 10 to 20% of the total soluble bacterial proteins. A new protocol for purification of all four recombinant proteins was established. The preliminary purification steps were ammonium sulfate precipitation (YP1alpha, YP1beta) or NH4Cl/ethanol extraction (YP2alpha, YP2beta). The recombinant proteins were then purified to apparent homogeneity by only two classical chromatography steps: ion exchange (DEAE-cellulose) and gel filtration (Sephacryl S-200). Isoelectrofocusing analysis of YP2alpha and YP2beta showed that the pIs of the recombinant proteins are the same as those of the native yeast ribosomal P2 proteins. The pI of YP1alpha is changed because five amino acids from the expression vector are attached to the N-terminus of the recombinant polypeptide. YP1beta was obtained as a truncated form of the polypeptide, similar to its ribosomal counterpart, YP1beta'. This was confirmed by isoelectrofocusing gel analysis.

10.
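Predicting the structural type of proteins with the measure of diversity (cited 14 times; self-citations: 2; citations by others: 12)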
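Based on the concept that the structural type of a protein determines its secondary-structure sequence, the secondary-structure sequence parameters Nα, Nβ, Nβaβ and N(βαβ) are used to form the diversity source, and the measures of diversity D(Xα), D(Xβ), D(Xα+β) are calculated. The increment of diversity is then used to predict the structural type of a protein: the type is determined by the minimum of the increments of diversity between the protein's measure of diversity D(Xn) and the four standard measures D(Xα), D(Xβ), D(Xα/β), D(Xα+β). The prediction accuracy reached 84.8% (standard set) and 83.3% (test set).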
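A minimal sketch of the minimum-increment-of-diversity rule. It assumes the commonly used definitions D(X) = N ln N - Σ n_i ln n_i over the counts n_i of a diversity source (N = Σ n_i) and ID(X, Y) = D(X+Y) - D(X) - D(Y), where X+Y merges the counts; the abstract does not restate these formulas. The four-component count vectors below are hypothetical stand-ins for the secondary-structure parameters.

```python
import math
from typing import Dict, Sequence


def xlogx(v: float) -> float:
    """v ln v with the convention 0 ln 0 = 0."""
    return v * math.log(v) if v > 0 else 0.0


def diversity(counts: Sequence[float]) -> float:
    """Measure of diversity D = N ln N - sum_i n_i ln n_i."""
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)


def increment_of_diversity(x: Sequence[float], y: Sequence[float]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); zero when the two sources are proportional."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def predict_min_id(x: Sequence[float], standards: Dict[str, Sequence[float]]) -> str:
    """Assign x to the standard source with the smallest increment of diversity."""
    ids = {name: increment_of_diversity(x, s) for name, s in standards.items()}
    return min(ids, key=ids.get)


# Hypothetical standard count vectors for two structural types.
standards = {"alpha": [30, 5, 5, 2], "beta": [5, 30, 8, 2]}
print(predict_min_id([6, 1, 1, 0], standards))
```

The increment is non-negative and grows as the two count distributions diverge, so the smallest increment picks the most similar standard source.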

11.
Prediction of protein structural class by discriminant analysis (cited 7 times; self-citations: 0; citations by others: 7)
Protein structural class--alpha, beta, mixed (alpha/beta or alpha + beta), irregular--can be predicted from the amino acid sequence by discriminant analysis. Discrimination is based on distributions, in the classes, of vectors of attributes characterizing the sequences. In this paper, two sets of attributes and two methods of estimating their distributions are compared using more than 100 proteins from the Protein Data Bank. The best results were obtained when canonical variates of the frequencies of occurrence of 20 amino acids and non-parametric estimates of their distributions were used. Three variates are sufficient to allocate proteins to one of four classes with 83% reliability (estimated by cross-validation) and four variates allowed allocation to one of five classes with 78% reliability.
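One way to use non-parametric class distributions is a kernel density estimate per class. A minimal sketch, assuming an isotropic Gaussian kernel with a hand-picked bandwidth h and equal class priors (neither choice is given in the abstract); it works on whatever feature vectors are passed in, such as a few canonical variates, but computing the canonical variates themselves is not shown. Training points are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def kde_log_density(x: Sequence[float], samples: List[Sequence[float]], h: float) -> float:
    """Log of a Gaussian-kernel density estimate at x (isotropic bandwidth h)."""
    d = len(x)
    # Per-sample log kernel values, combined with log-sum-exp to avoid underflow.
    logs = [-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * h * h) for s in samples]
    m = max(logs)
    lse = m + math.log(sum(math.exp(v - m) for v in logs))
    return lse - math.log(len(samples)) - (d / 2) * math.log(2 * math.pi * h * h)


def kde_classify(
    x: Sequence[float], training: Dict[str, List[Sequence[float]]], h: float = 0.5
) -> str:
    """Allocate x to the class with the highest estimated density (equal priors assumed)."""
    scores = {name: kde_log_density(x, pts, h) for name, pts in training.items()}
    return max(scores, key=scores.get)


# Hypothetical two-dimensional feature vectors for two classes.
training = {"alpha": [[0.0, 0.0], [0.2, 0.1]], "beta": [[3.0, 3.0], [3.1, 2.9]]}
print(kde_classify([0.1, 0.1], training), kde_classify([3.0, 3.0], training))
```

The bandwidth controls the smoothness of each estimated distribution; in a real study it would be tuned, for example by the same cross-validation used to estimate reliability.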

12.
A model is presented in which the formation of alpha-helices and beta-structures is determined by the joint action of three elements: the N-terminal, internal and C-terminal fragments. Based on this model, an algorithm was constructed for calculating their localization in a given amino acid sequence. The preference of fragments of the amino acid sequence for a definite type of secondary structure was estimated from the corresponding average values of the linear discriminant functions dsk (s = alpha, beta; k = N, in, C). These functions were constructed in the previous paper on the basis of the significant characteristics revealed there. These integral characteristics are used to calculate the localization of discrete secondary structures. The total prediction gave 71% correctly predicted residues for 3 states (alpha, beta, c) and 62% for 4 states (alpha, beta, c, t) on the training set of 72 proteins. For the control set (15 proteins), the prediction accuracy is about 65%. The essential advantages of this method are: 1) the possibility of localizing discrete secondary structures; 2) the high accuracy of prediction of long secondary structures (approximately 90% for alpha-helices and approximately 80% for beta-structures), which is important for determining protein folding. The influence of mutations on the secondary structure of proteins was investigated. An anomalously high stability of the secondary structures of immunoglobulins to mutations was revealed. This probably results from evolutionary selection of amino acid sequence variants that provide functional variability of the antigenic determinants while keeping the tertiary structure of the protein invariant.

13.
Adducin is a membrane-skeletal protein which is a candidate to promote assembly of a spectrin-actin network in erythrocytes and at sites of cell-cell contact in epithelial tissues. The complete sequence of both subunits of human adducin, alpha (737 amino acids), and beta (726 amino acids) has been deduced by analysis of the cDNAs. The two subunits have strikingly conserved amino acid sequences with 49% identity and 66% similarity, suggesting evolution by gene duplication. Each adducin subunit has three distinct domains: a 39-kD NH2-terminal globular protease-resistant domain, connected by a 9-kD domain to a 33-kD COOH-terminal protease-sensitive tail comprised almost entirely of hydrophilic amino acids. The tail is responsible for the high frictional ratio of adducin noted previously, and was visualized by EM. The head domains of both adducin subunits exhibit a limited sequence similarity with the NH2-terminal actin-binding motif present in members of the spectrin superfamily and actin gelation proteins. The COOH-termini of both subunits contain an identical, highly basic stretch of 22 amino acids with sequence similarity to the MARCKS protein. Predicted sites of phosphorylation by protein kinase C include the COOH-terminus and sites at the junction of the head and tail. Northern blot analysis of mRNA from rat tissues, K562 erythroleukemia cells and reticulocytes has shown that alpha adducin is expressed in all the tissues tested as a single message size of 4 kb. In contrast, beta adducin shows tissue specific variability in size of mRNA and level of expression. A striking divergence between alpha and beta mRNAs was noted in reticulocytes, where alpha adducin mRNA is present in at least 20-fold higher levels than that of beta adducin. The beta subunit thus is a candidate to perform a limiting role in assembly of functional adducin molecules.

14.
Ofran Y, Margalit H. Proteins, 2006, 64(1):275-279
It is well established that there is a relationship between the amino acid composition of a protein and its structural class (i.e., alpha, beta, alpha + beta, or alpha/beta). Several studies have even shown the power of amino acid composition in predicting the secondary structure class of a protein. Herein, we show that significant similarity in amino acid composition exists not only between proteins of the same class, but even between proteins of the same fold. To test conjectural explanations for this phenomenon, we analyzed a set of structurally similar proteins that are dissimilar in sequence. Based on this analysis, we suggest that specific residues that are involved in intramolecular interactions may account for this surprising relationship between composition and structure.

15.
An analysis of the environmental preference for the occurrence of noncanonical hydrogen bonding and cation-pi interactions in a data set of 71 nonredundant (alpha/beta)(8) barrel proteins, with respect to amino acid type, secondary structure, solvent accessibility, and stabilizing residues, has been performed. Our analysis reveals several important findings, including (a) a higher contribution of weak interactions mediated by main-chain atoms, irrespective of the amino acids involved; (b) dominance of aromatic amino acids among interactions involving side-chain atoms; (c) strands as the principal secondary structural unit, accommodating cross-strand ion-pair interactions and clustering of aromatic residues; (d) a significant contribution of weak interactions in solvent-exposed areas of the protein; (e) long-range contacts in the majority of interactions; (f) a higher preference of Arg than Lys for forming cation-pi interactions; and (g) a higher probability of theoretically predicted stabilizing residues being involved in weak interactions for polar amino acids such as Trp, Glu, and Gln. Overall, the present study reveals that weak interactions contribute to the global stability of (alpha/beta)(8) TIM-barrel proteins in an environment-specific manner, which could be exploited for protein engineering applications.

16.
The bulk hydrophobic character of the 20 natural amino acid residues has been obtained from a database of 60 protein structures, grouped into the four structural classes alpha alpha, beta beta, alpha + beta and alpha/beta. The hydrophobicity coefficients thus obtained are compared with Ponnuswamy's original values using scales normalized to average = 0.0 and standard deviation = 1.0. Even though most of the amino acid residues do not change their hydropathic character across the structural classes, their behaviour suggests that averaging methods should only consider proteins of the same structural class and that this information should be included in secondary structure methods.
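Normalizing a scale to average 0.0 and standard deviation 1.0 is a z-score transform over the 20 residue values. A minimal sketch, assuming the population standard deviation (dividing by the number of residues); the three-residue scale is hypothetical and only illustrates the arithmetic.

```python
import math
from typing import Dict


def standardize(scale: Dict[str, float]) -> Dict[str, float]:
    """Rescale a per-residue scale to mean 0.0 and population standard deviation 1.0."""
    vals = list(scale.values())
    mean = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {residue: (v - mean) / sd for residue, v in scale.items()}


# Hypothetical values; a real comparison would use all 20 residues of each scale.
print(standardize({"A": 1.0, "B": 2.0, "C": 3.0}))
```

Once two scales are standardized this way, their values can be compared residue by residue even if the original scales used different units or ranges.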

17.
18.
Hering JA, Innocent PR, Haris PI. Proteomics, 2003, 3(8):1464-1475
Fourier transform infrared (FTIR) spectroscopy is a very flexible technique for characterization of protein secondary structure. Measurements can be carried out rapidly in a number of different environments using only small quantities of protein. For this technique to become more widely used for protein secondary structure characterization, however, further developments in methods to accurately quantify protein secondary structure are necessary. Here we propose a neural network architecture specialized by structural classification of proteins (SCOP) class, combining an adaptive neuro-fuzzy inference system (ANFIS) with SCOP-class-specialized backpropagation neural networks for improved protein secondary structure prediction. Our study shows that proteins can be accurately classified into the two main classes "all alpha proteins" and "all beta proteins" based merely on the amide I band maximum position of their FTIR spectra. ANFIS is employed to perform the classification task to demonstrate the potential of this architecture with moderately complex problems. Based on studies using a reference set of 17 proteins and an evaluation set of 4 proteins, improved predictions were achieved compared to a conventional neural network approach, in which structure-specialized neural networks are trained on spectra of both "all alpha" and "all beta" proteins. The standard errors of prediction (SEPs) in % structure were improved by 4.05% for helix structure, by 5.91% for sheet structure, by 2.68% for turn structure, and by 2.15% for bend structure. For other structure, SEP increased by 2.43%. These results were confirmed by a "leave-one-out" run with the combined set of 21 FTIR spectra of proteins.
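The SEP figures above are errors in percentage points of structure content. A minimal sketch, assuming SEP is the root-mean-square difference between predicted and reference % structure (definitions vary; some variants subtract the bias or divide by n - 1). The numbers are hypothetical.

```python
import math
from typing import Sequence


def sep(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error of predicted vs reference % structure (one common SEP form)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(sq / len(predicted))


# Hypothetical % helix for three proteins: reference values and two models' predictions.
reference = [43.0, 26.0, 10.0]
model_a = [40.0, 30.0, 12.0]
model_b = [42.0, 27.0, 11.0]
print(sep(model_a, reference) - sep(model_b, reference))  # positive = model_b has lower SEP
```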

19.
Based on the 210 non-homologous proteins (domains) classified manually by Michie et al. (J. Mol. Biol. 262, 168-185, 1996), a new structure classification criterion for globular proteins relying on the content of helix/strand has been proposed, using a quadratic discriminant method. Each protein is classified into one of three classes, i.e. the alpha class, the beta class and the alphabeta class (including the alpha/beta and alpha+beta classes). According to the new structure classification criterion, of the 210 proteins in the training set, 207 are correctly classified, an accuracy of 207/210 = 98.57%. Multiple cross-validation tests are performed. The jackknife test shows that of the 210 proteins 207 are correctly classified, with an accuracy of 98.57%. To test the method further, of 3577 proteins (domains) extracted from SCOP, 91.39% are correctly reclassified by the new classification criterion. On average, the accuracy of the new criterion is about 8 percentage points higher than that of the criterion proposed by Nakashima et al. (J. Biochem. 99, 153-162, 1986). Our result shows that a classification based solely on structure is basically consistent with one combining both structural and evolutionary information. A fully automated classification scheme should consider both structure and evolutionary relationships. The methodology presented provides an appropriate mathematical format to reach this goal.
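The jackknife test holds out each protein in turn, fits the classifier on the rest, and counts how often the held-out protein is classified correctly (207 of 210 gives 98.57%). A minimal sketch of that loop on (helix content, strand content) pairs; for brevity it swaps in a nearest-centroid classifier rather than the quadratic discriminant used in the paper, and the data points are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (features, class label)


def nearest_centroid_fit(train: Sequence[Sample]) -> Dict[str, List[float]]:
    """Mean feature vector per class."""
    groups: Dict[str, List[Tuple[float, ...]]] = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def nearest_centroid_predict(model: Dict[str, List[float]], x: Sequence[float]) -> str:
    return min(model, key=lambda y: math.dist(x, model[y]))


def jackknife_accuracy(data: List[Sample]) -> float:
    """Leave-one-out: refit without sample i, then predict sample i."""
    correct = 0
    for i, (x, y) in enumerate(data):
        model = nearest_centroid_fit(data[:i] + data[i + 1:])
        correct += nearest_centroid_predict(model, x) == y
    return correct / len(data)


# Hypothetical (helix fraction, strand fraction) pairs.
data = [
    ((0.60, 0.05), "alpha"), ((0.55, 0.10), "alpha"), ((0.70, 0.00), "alpha"),
    ((0.05, 0.50), "beta"), ((0.10, 0.45), "beta"), ((0.00, 0.60), "beta"),
    ((0.30, 0.25), "alphabeta"), ((0.35, 0.20), "alphabeta"), ((0.25, 0.30), "alphabeta"),
]
print(jackknife_accuracy(data))
```

Any classifier with separate fit and predict steps can be dropped into the same loop; the key property is that the held-out protein never contributes to the model that classifies it.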

20.
Screening of functional proteins from a random‐sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random‐sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random‐sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random‐sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120‐amino acid, random‐sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random‐sequence proteins arbitrarily chosen from these libraries. We found that random‐sequence proteins constructed with the 12‐member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20‐member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417