共查询到20条相似文献,搜索用时 15 毫秒
1.
Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes 总被引:2,自引:1,他引:1
With the rapid increment of protein sequence data, it is indispensable to develop automated and reliable predictive methods for protein function annotation. One approach for facilitating protein function prediction is to classify proteins into functional families from primary sequence. Being the most important group of all proteins, the accurate prediction for enzyme family classes and subfamily classes is closely related to their biological functions. In this paper, for the prediction of enzyme subfamily classes, the Chou's amphiphilic pseudo-amino acid composition [Chou, K.C., 2005. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19] has been adopted to represent the protein samples for training the 'one-versus-rest' support vector machine. As a demonstration, the jackknife test was performed on the dataset that contains 2640 oxidoreductase sequences classified into 16 subfamily classes [Chou, K.C., Elrod, D.W., 2003. Prediction of enzyme family classes. J. Proteome Res. 2, 183-190]. The overall accuracy thus obtained was 80.87%. The significant enhancement in the accuracy indicates that the current method might play a complementary role to the exiting methods. 相似文献
2.
Cell membranes are vitally important to the life of a cell. Although the basic structure of biological membrane is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Membrane proteins are putatively classified into five different types. Identification of their types is currently an important topic in bioinformatics and proteomics. In this paper, based on the concept of representing protein samples in terms of their pseudo-amino acid composition (Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct. Funct. Genet. 43, 246-255), the fuzzy K-nearest neighbors (KNN) algorithm has been introduced to predict membrane protein types, and high success rates were observed. It is anticipated that, the current approach, which is based on a branch of fuzzy mathematics and represents a new strategy, may play an important complementary role to the existing methods in this area. The novel approach may also have notable impact on prediction of the other attributes, such as protein structural class, protein subcellular localization, and enzyme family class, among many others. 相似文献
3.
Support vector machines for predicting membrane protein types by using functional domain composition 总被引:9,自引:0,他引:9
下载免费PDF全文
![点击此处可从《Biophysical journal》网站下载免费的PDF全文](/ch/ext_images/free.gif)
Membrane proteins are generally classified into the following five types: 1), type I membrane protein; 2), type II membrane protein; 3), multipass transmembrane proteins; 4), lipid chain-anchored membrane proteins; and 5), GPI-anchored membrane proteins. In this article, based on the concept of using the functional domain composition to define a protein, the Support Vector Machine algorithm is developed for predicting the membrane protein type. High success rates are obtained by both the self-consistency and jackknife tests. The current approach, complemented with the powerful covariant discriminant algorithm based on the pseudo-amino acid composition that has incorporated quasi-sequence-order effect as recently proposed by K. C. Chou (2001), may become a very useful high-throughput tool in the area of bioinformatics and proteomics. 相似文献
4.
Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition 总被引:3,自引:0,他引:3
Apoptosis proteins are very important for understanding the mechanism of programmed cell death. The apoptosis protein localization can provide valuable information about its molecular function. The prediction of localization of an apoptosis protein is a challenging task. In our previous work we proposed an increment of diversity (ID) method using protein sequence information for this prediction task. In this work, based on the concept of Chou's pseudo-amino acid composition [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. (Erratum: Chou, K.C., 2001, vol. 44, 60) 43, 246-255, Chou, K.C., 2005. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19], a different pseudo-amino acid composition by using the hydropathy distribution information is introduced. A novel ID_SVM algorithm combined ID with support vector machine (SVM) is proposed. This method is applied to three data sets (317 apoptosis proteins, 225 apoptosis proteins and 98 apoptosis proteins). The higher predictive success rates than the previous algorithms are obtained by the jackknife tests. 相似文献
5.
A novel approach was developed for predicting the structural classes of proteins based on their sequences. It was assumed that proteins belonging to the same structural class must bear some sort of similar texture on the images generated by the cellular automaton evolving rule [Wolfram, S., 1984. Cellular automation as models of complexity. Nature 311, 419-424]. Based on this, two geometric invariant moment factors derived from the image functions were used as the pseudo amino acid components [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct., Funct., Genet. (Erratum: ibid., 2001, vol. 44, 60) 43, 246-255] to formulate the protein samples for statistical prediction. The success rates thus obtained on a previously constructed benchmark dataset are quite promising, implying that the cellular automaton image can help to reveal some inherent and subtle features deeply hidden in a pile of long and complicated amino acid sequences. 相似文献
6.
Membrane proteins are generally classified into the following five types: (1) type I membrane protein, (2) type II membrane protein, (3) multipass transmembrane proteins, (4) lipid chain-anchored membrane proteins, and (5) GPI-anchored membrane proteins. Given the sequence of an uncharacterized membrane protein, how can we identify which one of the above five types it belongs to? This is important because the biological function of a membrane protein is closely correlated with its type. Particularly, with the explosion of protein sequences entering into databanks, it is in high demand to develop an automated method to address this problem. To realize this, the key is to catch the statistical characteristics for each of the five types. However, it is not easy because they are buried in a pile of long and complicated sequences. In this paper, based on the concept of the pseudo amino acid composition (Chou, K. C. (2001). PROTEINS: Structure, Function, and Genetics 43: 246–255), the technique of Fourier spectrum analysis is introduced. By doing so, the sample of a protein is represented by a set of discrete components that can incorporate a considerable amount of the sequence order effects as well as its amino acid composition information. On the basis of such a statistical frame, the support vector machine (SVM) is introduced to perform predictions. High success rates were yielded by the self-consistency test, jackknife test, and independent dataset test, suggesting that the current approach holds a promising potential to become a high throughput tool for membrane protein type prediction as well as other related areas. 相似文献
7.
Summary. Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, it is vitally important to develop an automated method as a high-throughput tool to timely identify their subcellular location. Based on the concept of the pseudo amino acid composition by which a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), the complexity measure approach is introduced. The advantage by incorporating the complexity measure factor as one of the pseudo amino acid components for a protein is that it can more effectively reflect its overall sequence-order feature than the conventional correlation factors. With such a formulation frame to represent the samples of protein sequences, the covariant-discriminant predictor (Chou, K. C. and Elrod, D. W., Protein Engineering, 1999, 12: 107–118) was adopted to conduct prediction. High success rates were obtained by both the jackknife cross-validation test and independent dataset test, suggesting that introduction of the concept of the complexity measure into prediction of protein subcellular location is quite promising, and might also hold a great potential as a useful vehicle for the other areas of molecular biology. 相似文献
8.
Cell membranes are vitally important to living cells. Although the infrastructure of biological membrane is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Knowledge of membrane protein types often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences generated in the post-genomic era, it is highly demanded to develop a high throughput tool in identifying the type of newly found membrane proteins according to their primary sequences, so as to timely annotate them for reference usage in both basic research and drug discovery. To realize this, the key is to establish a powerful identifier that can catch their characteristic sequence patterns for different membrane protein types. However, it is not easy because they are buried in a pile of long and complicated sequences. In this paper, based on the concept of the pseudo-amino acid composition [K.C. Chou, PROTEINS: Struct., Funct., Genet. 43 (2001) 246-255], the low-frequency Fourier spectrum analysis is introduced. The merits by doing so are that the sequence pattern information can be more effectively incorporated into a set of discrete components, and that all the existing prediction algorithms can be straightforwardly used on such a formulation for protein samples. High success rates were observed by the re-substitution test, jackknife test, and independent dataset test, indicating that the low-frequency Fourier spectrum approach may become a very useful tool for membrane protein type prediction. The novel approach also holds a high potential for predicting many other attributes of proteins. 相似文献
9.
The membrane protein type is an important feature in characterizing the overall topological folding type of a protein or its domains therein. Many investigators have put their efforts to the prediction of membrane protein type. Here, we propose a new approach, the bootstrap aggregating method or bragging learner, to address this problem based on the protein amino acid composition. As a demonstration, the benchmark dataset constructed by K.C. Chou and D.W. Elrod was used to test the new method. The overall success rate thus obtained by jackknife cross-validation was over 84%, indicating that the bragging learner as presented in this paper holds a quite high potential in predicting the attributes of proteins, or at least can play a complementary role to many existing algorithms in this area. It is anticipated that the prediction quality can be further enhanced if the pseudo amino acid composition can be effectively incorporated into the current predictor. An online membrane protein type prediction web server developed in our lab is available at http://chemdata.shu.edu.cn/protein/protein.jsp. 相似文献
10.
11.
In this paper, based on the approach by combining the "functional domain composition" [K.C. Chou, Y. D. Cai, J. Biol. Chem. 277 (2002) 45765] and the pseudo-amino acid composition [K.C. Chou, Proteins Struct. Funct. Genet. 43 (2001) 246; Correction Proteins Struct. Funct. Genet. 2044 (2001) 2060], the Nearest Neighbour Algorithm (NNA) was developed for predicting the protein subcellular location. Very high success rates were observed, suggesting that such a hybrid approach may become a useful high-throughput tool in the area of bioinformatics and proteomics. 相似文献
12.
13.
Membrane transporters catalyze the transport of small solute molecules across biological barriers such as lipid bilayer membranes. As the experimental annotation of which proteins transport which substrates is incomplete it is highly desirable to develop computational methods that can assist in the classification and substrate annotation of putative membrane transport proteins. Here, we determined the similarity of membrane transporter sequences annotated in the Transport Classification Database (Saier et al., Nucleic Acids Res 2006, 34, D181-D186) and Arabidopsis thaliana membrane transporters annotated in the database Aramemnon (Schwacke et al., Plant Physiol 2003, 131, 16-26). The similarity measure was based on the amino acid composition either considering the full sequences or separately in the transmembrane (TM) and external parts of the sequences. We considered four different substrate sets and three different subfamilies and tried to classify the given proteins into these classes. Family or substrate prediction based on the simple amino acid frequency had an average accuracy of 76%. The differentiation between TM and non-TM regions led to an improved accuracy of 80% on average. 相似文献
14.
Hong-Liang Zou 《The Journal of membrane biology》2014,247(11):1141-1148
Membrane protein is an important composition of cell membrane. Given a membrane protein sequence, how can we identify its type(s) is very important because the type keeps a close correlation with its functions. According to previous studies, membrane protein can be divided into the following eight types: single-pass type I, single-pass type II, single-pass type III, single-pass type IV, multipass, lipid-anchor, GPI-anchor, peripheral membrane protein. With the avalanche of newly found protein sequences in the post-genomic age, it is urgent to develop an automatic and effective computational method to rapid and reliable prediction of the types of membrane proteins. At present, most of the existing methods were based on the assumption that one membrane protein only belongs to one type. Actually, a membrane protein may simultaneously exist at two or more different functional types. In this study, a new method by hybridizing the pseudo amino acid composition with multi-label algorithm called LIFT (multi-label learning with label-specific features) was proposed to predict the functional types both singleplex and multiplex animal membrane proteins. Experimental result on a stringent benchmark dataset of membrane proteins by jackknife test show that the absolute-true obtained was 0.6342, indicating that our approach is quite promising. It may become a useful high-through tool, or at least play a complementary role to the existing predictors in identifying functional types of membrane proteins. 相似文献
15.
16.
Membrane proteins can be classified among the following five types: (1) type I membrane protein. (2) type II membrane protein. (3) multipass transmembrane proteins. (4) lipid chain-anchored membrane proteins, and (5) GPI-anchored membrane proteins. T. Kohonen's self-organization model which is a typical neural network is applied for predicting the type of a given membrane protein based on its amino acid composition. As a result, the high rates of self-consistency (94.80%) and cross-validation (77.76%), and stronger fault-tolerant ability were obtained. 相似文献
17.
Summary. The subnuclear localization of nuclear protein is very important for in-depth understanding of the construction and function
of the nucleus. Based on the amino acid and pseudo amino acid composition (PseAA) as originally introduced by K. C. Chou can
incorporate much more information of a protein sequence than the classical amino acid composition so as to significantly enhance
the power of using a discrete model to predict various attributes of a protein, an algorithm of increment of diversity combined
with the improved quadratic discriminant analysis is proposed to predict the protein subnuclear location. The overall predictive
success rates and correlation coefficient are 75.4% and 0.629 for 504 single localization proteins in jackknife test, and
80.4% for an independent set of 92 multi-localization proteins, respectively. For 406 single localization nuclear proteins
with ≤25% sequence identity, the results of jackknife test show that the overall accuracy of prediction is 77.1%.
Authors’ address: Qian-Zhong Li, Laboratory of Theoretical Biophysics, Department of Physics, College of Sciences and Technology,
Inner Mongolia University, Hohhot 010021, China 相似文献
18.
A new algorithm to predict the types of membrane proteins is proposed. Besides the amino acid composition of the query protein, the information within the amino acid sequence is taken into account. A formulation of the autocorrelation functions based on the hydrophobicity index of the 20 amino acids is adopted. The overall predictive accuracy is remarkably increased for the database of 2054 membrane proteins studied here. An improvement of about 13% in the resubstitution test and 8% in the jackknife test is achieved compared with those of algorithms based merely on the amino acid composition. Consequently, overall predictive accuracy is as high as 94% and 82% for the resubstitution and jackknife tests, respectively, for the prediction of the five types. Since the proposed algorithm is based on more parameters than those in the amino acid composition approach, the predictive accuracy would be further increased for a larger and more class-balanced database. The present algorithm should be useful in the determination of the types and functions of new membrane proteins. The computer program is available on request. 相似文献
19.
The accuracy of predicting protein folding types can be significantly enhanced by a recently developed algorithm in which the coupling effect among different amino acid components is taken into account [Chou and Zhang (1994)J. Biol. Chem.
269, 22014-22020]. However, in practical calculations using this powerful algorithm, one may sometimes face illconditioned matrices. To overcome such a difficulty, an effective eigenvalue-eigenvector approach is proposed. Furthermore, the new approach has been used to predict a recently constructed set of 76 proteins not included in the training set, and the accuracy of prediction is also much higher than those of other methods. 相似文献