Similar literature
 20 similar documents found
1.
With the rapid growth of protein sequence data, it is indispensable to develop automated and reliable predictive methods for protein function annotation. One approach to facilitating protein function prediction is to classify proteins into functional families from their primary sequences. Enzymes are the most important group of all proteins, and the accurate prediction of enzyme family and subfamily classes is closely related to their biological functions. In this paper, for the prediction of enzyme subfamily classes, Chou's amphiphilic pseudo-amino acid composition [Chou, K.C., 2005. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19] has been adopted to represent the protein samples for training the 'one-versus-rest' support vector machine. As a demonstration, the jackknife test was performed on a dataset that contains 2640 oxidoreductase sequences classified into 16 subfamily classes [Chou, K.C., Elrod, D.W., 2003. Prediction of enzyme family classes. J. Proteome Res. 2, 183-190]. The overall accuracy thus obtained was 80.87%. The significant enhancement in accuracy indicates that the current method might play a complementary role to the existing methods.

2.
Cell membranes are vitally important to the life of a cell. Although the basic structure of biological membranes is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Membrane proteins are putatively classified into five different types. Identification of their types is currently an important topic in bioinformatics and proteomics. In this paper, based on the concept of representing protein samples in terms of their pseudo-amino acid composition (Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct. Funct. Genet. 43, 246-255), the fuzzy K-nearest neighbors (KNN) algorithm has been introduced to predict membrane protein types, and high success rates were observed. It is anticipated that the current approach, which is based on a branch of fuzzy mathematics and represents a new strategy, may play an important complementary role to the existing methods in this area. The novel approach may also have a notable impact on the prediction of other attributes, such as protein structural class, protein subcellular localization, and enzyme family class, among many others.
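As a rough illustration of the fuzzy KNN idea mentioned in this abstract, the sketch below assigns a query sample a graded membership in each class instead of a single hard label. It assumes crisp training labels, Euclidean distance, and the common inverse-distance weighting with fuzzifier m; the function name `fuzzy_knn`, the parameters, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def fuzzy_knn(train, query, k=3, m=2.0):
    """Return {label: membership} for `query`.

    train: list of (feature_vector, label) pairs with crisp labels (assumption).
    m:     fuzzifier (> 1); larger values flatten the distance weighting.
    """
    # Keep the k training samples closest to the query.
    neighbours = sorted((math.dist(vec, query), label) for vec, label in train)[:k]
    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    for distance, label in neighbours:
        # Inverse-distance weight; the floor avoids division by zero.
        weights[label] += 1.0 / max(distance, 1e-12) ** exponent
    total = sum(weights.values())
    # Normalise so that the memberships sum to 1.
    return {label: weight / total for label, weight in weights.items()}


# Toy two-dimensional feature vectors (hypothetical data).
toy_train = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((5.0, 5.0), "B")]
memberships = fuzzy_knn(toy_train, (0.0, 0.5))
```

The predicted type would be the label with the largest membership; the remaining memberships give a sense of how ambiguous the assignment is.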

3.
Cai YD  Zhou GP  Chou KC 《Biophysical journal》2003,84(5):3257-3263
Membrane proteins are generally classified into the following five types: (1) type I membrane proteins; (2) type II membrane proteins; (3) multipass transmembrane proteins; (4) lipid chain-anchored membrane proteins; and (5) GPI-anchored membrane proteins. In this article, based on the concept of using the functional domain composition to define a protein, the Support Vector Machine algorithm is developed for predicting the membrane protein type. High success rates are obtained by both the self-consistency and jackknife tests. The current approach, complemented with the powerful covariant discriminant algorithm based on the pseudo-amino acid composition that has incorporated quasi-sequence-order effect as recently proposed by K. C. Chou (2001), may become a very useful high-throughput tool in the area of bioinformatics and proteomics.

4.
Apoptosis proteins are very important for understanding the mechanism of programmed cell death. The localization of an apoptosis protein can provide valuable information about its molecular function, but predicting that localization is a challenging task. In our previous work we proposed an increment of diversity (ID) method using protein sequence information for this prediction task. In this work, based on the concept of Chou's pseudo-amino acid composition [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. 43, 246-255 (Erratum: Chou, K.C., 2001, vol. 44, 60); Chou, K.C., 2005. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19], a different pseudo-amino acid composition that uses the hydropathy distribution information is introduced. A novel ID_SVM algorithm that combines ID with the support vector machine (SVM) is proposed. This method is applied to three data sets (317 apoptosis proteins, 225 apoptosis proteins and 98 apoptosis proteins). Higher predictive success rates than those of the previous algorithms are obtained by the jackknife tests.
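The abstract does not spell out the increment of diversity measure. The sketch below uses one commonly quoted form, which is an assumption here: the diversity of a count vector is D(X) = N ln N - sum(n_i ln n_i) with N = sum(n_i), and the increment of diversity between two sources is ID(X, Y) = D(X + Y) - D(X) - D(Y). The function names and toy counts are hypothetical.

```python
import math


def diversity(counts):
    """D(X) = N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar sources."""
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)


# Identical count profiles give an increment of (numerically) zero.
id_same = increment_of_diversity([3, 1, 2], [3, 1, 2])
# Profiles with no shared categories give a positive increment.
id_disjoint = increment_of_diversity([1, 0], [0, 1])
```

Under this definition a query protein would be compared with the pooled counts of each location class and assigned to the class with the smallest increment, or the increments could be passed to an SVM as features, which is one plausible reading of the ID_SVM combination.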

5.
A novel approach was developed for predicting the structural classes of proteins based on their sequences. It was assumed that proteins belonging to the same structural class must bear some sort of similar texture on the images generated by the cellular automaton evolving rule [Wolfram, S., 1984. Cellular automata as models of complexity. Nature 311, 419-424]. Based on this, two geometric invariant moment factors derived from the image functions were used as the pseudo amino acid components [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct., Funct., Genet. 43, 246-255 (Erratum: ibid., 2001, vol. 44, 60)] to formulate the protein samples for statistical prediction. The success rates thus obtained on a previously constructed benchmark dataset are quite promising, implying that the cellular automaton image can help to reveal some inherent and subtle features deeply hidden in a pile of long and complicated amino acid sequences.

6.
Liu H  Yang J  Wang M  Xue L  Chou KC 《The protein journal》2005,24(6):385-389
Membrane proteins are generally classified into the following five types: (1) type I membrane protein, (2) type II membrane protein, (3) multipass transmembrane proteins, (4) lipid chain-anchored membrane proteins, and (5) GPI-anchored membrane proteins. Given the sequence of an uncharacterized membrane protein, how can we identify which one of the above five types it belongs to? This is important because the biological function of a membrane protein is closely correlated with its type. In particular, with the explosion of protein sequences entering databanks, there is a high demand for an automated method to address this problem. To realize this, the key is to capture the statistical characteristics of each of the five types. However, this is not easy because they are buried in a pile of long and complicated sequences. In this paper, based on the concept of the pseudo amino acid composition (Chou, K. C. (2001). PROTEINS: Structure, Function, and Genetics 43: 246–255), the technique of Fourier spectrum analysis is introduced. By doing so, the sample of a protein is represented by a set of discrete components that can incorporate a considerable amount of the sequence order effects as well as its amino acid composition information. On the basis of such a statistical frame, the support vector machine (SVM) is introduced to perform predictions. High success rates were yielded by the self-consistency test, jackknife test, and independent dataset test, suggesting that the current approach holds promising potential to become a high-throughput tool for membrane protein type prediction as well as other related areas.

7.
Xiao X  Shao S  Ding Y  Huang Z  Huang Y  Chou KC 《Amino acids》2005,28(1):57-61
Summary. Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, it is vitally important to develop an automated method as a high-throughput tool to identify their subcellular location in a timely manner. Based on the concept of the pseudo amino acid composition by which a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), the complexity measure approach is introduced. The advantage of incorporating the complexity measure factor as one of the pseudo amino acid components for a protein is that it can more effectively reflect its overall sequence-order feature than the conventional correlation factors. With such a formulation frame to represent the samples of protein sequences, the covariant-discriminant predictor (Chou, K. C. and Elrod, D. W., Protein Engineering, 1999, 12: 107–118) was adopted to conduct prediction. High success rates were obtained by both the jackknife cross-validation test and independent dataset test, suggesting that introducing the concept of the complexity measure into the prediction of protein subcellular location is quite promising, and might also hold great potential as a useful vehicle for other areas of molecular biology.

8.
Cell membranes are vitally important to living cells. Although the infrastructure of biological membranes is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Knowledge of membrane protein types often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences generated in the post-genomic era, it is highly desirable to develop a high-throughput tool for identifying the type of newly found membrane proteins according to their primary sequences, so that they can be annotated in a timely manner for reference in both basic research and drug discovery. To realize this, the key is to establish a powerful identifier that can capture the characteristic sequence patterns of different membrane protein types. However, this is not easy because they are buried in a pile of long and complicated sequences. In this paper, based on the concept of the pseudo-amino acid composition [K.C. Chou, PROTEINS: Struct., Funct., Genet. 43 (2001) 246-255], the low-frequency Fourier spectrum analysis is introduced. The merits of doing so are that the sequence pattern information can be more effectively incorporated into a set of discrete components, and that all the existing prediction algorithms can be straightforwardly used on such a formulation for protein samples. High success rates were observed by the re-substitution test, jackknife test, and independent dataset test, indicating that the low-frequency Fourier spectrum approach may become a very useful tool for membrane protein type prediction. The novel approach also holds high potential for predicting many other attributes of proteins.
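To make the low-frequency Fourier spectrum idea concrete, here is a minimal sketch under stated assumptions: the sequence is turned into a numeric signal using Kyte-Doolittle hydropathy values (the abstract does not say which encoding the authors used), a plain discrete Fourier transform is taken, and only the amplitudes of the first few non-zero frequencies are kept as discrete components. The function name and the `n_components` parameter are hypothetical.

```python
import cmath

# Kyte-Doolittle hydropathy values; using them as the numeric encoding
# is an assumption made for this illustration.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def low_frequency_spectrum(sequence, n_components=5):
    """Amplitudes of DFT components k = 1..n_components of the encoded sequence."""
    signal = [HYDROPATHY[aa] for aa in sequence]
    n = len(signal)
    if n_components >= n:
        raise ValueError("sequence too short for the requested components")
    amplitudes = []
    for k in range(1, n_components + 1):
        # Direct O(n) evaluation of one DFT coefficient.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        amplitudes.append(abs(coeff) / n)
    return amplitudes


# An alternating hydrophobic/charged toy sequence has all its energy
# at the highest frequency (k = 4 for length 8).
alternating = low_frequency_spectrum("IRIRIRIR", n_components=4)
# A constant sequence has no energy at any non-zero frequency.
constant = low_frequency_spectrum("AAAA", n_components=3)
```

The resulting amplitudes could be appended to the 20 amino acid composition values to form a fixed-length feature vector that any standard classifier accepts.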

9.
The membrane protein type is an important feature in characterizing the overall topological folding type of a protein or its domains therein. Many investigators have devoted effort to the prediction of membrane protein type. Here, we propose a new approach, the bootstrap aggregating method or bragging learner, to address this problem based on the protein amino acid composition. As a demonstration, the benchmark dataset constructed by K.C. Chou and D.W. Elrod was used to test the new method. The overall success rate thus obtained by jackknife cross-validation was over 84%, indicating that the bragging learner as presented in this paper holds considerable potential in predicting the attributes of proteins, or at least can play a complementary role to many existing algorithms in this area. It is anticipated that the prediction quality can be further enhanced if the pseudo amino acid composition can be effectively incorporated into the current predictor. An online membrane protein type prediction web server developed in our lab is available at http://chemdata.shu.edu.cn/protein/protein.jsp.
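Bootstrap aggregating (commonly called bagging) can be sketched in a few lines: draw several bootstrap resamples of the training set, train one base learner per resample, and take a majority vote. The abstract does not name the base learner, so a 1-nearest-neighbour rule stands in here as an assumption; the function names, the number of rounds, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def nearest_neighbour(train, query):
    """Label of the training sample closest to `query` (stand-in base learner)."""
    _, label = min((math.dist(vec, query), lab) for vec, lab in train)
    return label


def bagging_predict(train, query, n_rounds=25, seed=0):
    """Majority vote over base learners fitted to bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    votes = Counter()
    for _ in range(n_rounds):
        # Bootstrap resample: same size as the training set, drawn with replacement.
        resample = [rng.choice(train) for _ in train]
        votes[nearest_neighbour(resample, query)] += 1
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters (hypothetical feature vectors).
toy_train = (
    [((i * 0.1, 0.0), "A") for i in range(10)]
    + [((10.0 + i * 0.1, 10.0), "B") for i in range(10)]
)
prediction = bagging_predict(toy_train, (0.5, 0.5))
```

In practice the feature vectors would be 20-dimensional amino acid compositions, and the vote counts themselves can serve as a rough confidence estimate.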

10.
11.
In this paper, based on an approach that combines the "functional domain composition" [K.C. Chou, Y. D. Cai, J. Biol. Chem. 277 (2002) 45765] and the pseudo-amino acid composition [K.C. Chou, Proteins Struct. Funct. Genet. 43 (2001) 246; Correction Proteins Struct. Funct. Genet. 44 (2001) 60], the Nearest Neighbour Algorithm (NNA) was developed for predicting the protein subcellular location. Very high success rates were observed, suggesting that such a hybrid approach may become a useful high-throughput tool in the area of bioinformatics and proteomics.

12.
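Membrane proteins are important drug targets, and the study of membrane protein types contributes to successful drug design; accurate prediction of membrane protein types is therefore essential for drug development. Using a dataset of 274 mycobacterial membrane protein sequences with less than 40% sequence identity, and with optimized pseudo-amino acid composition as features, this paper applies the support vector machine classification algorithm to predict mycobacterial membrane protein types. In the jackknife test, an overall accuracy of 85.4% and an average accuracy of 72.2% were obtained. The results show that the method can be used to identify mycobacterial membrane protein types and will aid the development of anti-mycobacterial drugs.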
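The jackknife (leave-one-out) test and the two accuracy figures reported here can be illustrated with a short sketch. Each sample is held out in turn and predicted from all the others; overall accuracy is the fraction of all samples predicted correctly, while average accuracy is the mean of the per-class accuracies, which is why the two can differ on class-imbalanced data. A 1-nearest-neighbour rule replaces the SVM so that the example stays dependency-free; that swap, the function name, and the toy data are assumptions.

```python
import math
from collections import defaultdict


def jackknife_accuracies(samples):
    """Leave-one-out test; returns (overall accuracy, average per-class accuracy).

    samples: list of (feature_vector, label) pairs.
    """
    correct = defaultdict(int)
    totals = defaultdict(int)
    for i, (vec, label) in enumerate(samples):
        # Hold out sample i and predict it from the remaining samples.
        rest = samples[:i] + samples[i + 1:]
        _, predicted = min((math.dist(vec, v), lab) for v, lab in rest)
        totals[label] += 1
        correct[label] += int(predicted == label)
    overall = sum(correct.values()) / len(samples)
    average = sum(correct[c] / totals[c] for c in totals) / len(totals)
    return overall, average


# Imbalanced toy data: four samples of class A and a single sample of class B.
toy = [((0.0,), "A"), ((0.1,), "A"), ((0.2,), "A"), ((0.3,), "A"), ((1.0,), "B")]
overall, average = jackknife_accuracies(toy)
```

In this toy case the lone class B sample can never be predicted correctly once it is held out, so the overall accuracy (0.8) is well above the average accuracy (0.5), mirroring the gap between the two figures in the abstract.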

13.
Schaadt NS  Helms V 《Biopolymers》2012,97(7):558-567
Membrane transporters catalyze the transport of small solute molecules across biological barriers such as lipid bilayer membranes. As the experimental annotation of which proteins transport which substrates is incomplete, it is highly desirable to develop computational methods that can assist in the classification and substrate annotation of putative membrane transport proteins. Here, we determined the similarity of membrane transporter sequences annotated in the Transport Classification Database (Saier et al., Nucleic Acids Res 2006, 34, D181-D186) and Arabidopsis thaliana membrane transporters annotated in the database Aramemnon (Schwacke et al., Plant Physiol 2003, 131, 16-26). The similarity measure was based on the amino acid composition, considered either over the full sequences or separately over the transmembrane (TM) and external parts of the sequences. We considered four different substrate sets and three different subfamilies and tried to classify the given proteins into these classes. Family or substrate prediction based on the simple amino acid frequency had an average accuracy of 76%. The differentiation between TM and non-TM regions led to an improved accuracy of 80% on average.

14.
Membrane proteins are an important component of the cell membrane. Given a membrane protein sequence, identifying its type(s) is very important because the type is closely correlated with its functions. According to previous studies, membrane proteins can be divided into the following eight types: single-pass type I, single-pass type II, single-pass type III, single-pass type IV, multipass, lipid-anchor, GPI-anchor, and peripheral membrane proteins. With the avalanche of newly found protein sequences in the post-genomic age, it is urgent to develop an automatic and effective computational method for the rapid and reliable prediction of membrane protein types. At present, most existing methods are based on the assumption that one membrane protein belongs to only one type. In fact, a membrane protein may simultaneously belong to two or more different functional types. In this study, a new method that hybridizes the pseudo amino acid composition with a multi-label algorithm called LIFT (multi-label learning with label-specific features) is proposed to predict the functional types of both singleplex and multiplex animal membrane proteins. Experimental results on a stringent benchmark dataset of membrane proteins by the jackknife test show that the absolute-true rate obtained was 0.6342, indicating that our approach is quite promising. It may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying functional types of membrane proteins.
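The absolute-true rate is the strictest of the usual multi-label metrics: a sample counts as correct only when its predicted set of types is exactly the true set, with no missing and no extra labels. The sketch below assumes that definition; the function name and the toy label sets are hypothetical.

```python
def absolute_true(true_labels, predicted_labels):
    """Fraction of samples whose predicted label set equals the true set exactly."""
    pairs = list(zip(true_labels, predicted_labels))
    exact_matches = sum(set(t) == set(p) for t, p in pairs)
    return exact_matches / len(pairs)


# Toy example with four proteins (hypothetical labels).
truth = [
    {"multipass"},
    {"lipid-anchor", "GPI-anchor"},
    {"peripheral"},
    {"single-pass type I"},
]
predicted = [
    {"multipass"},                        # exact match
    {"lipid-anchor"},                     # one type missed
    {"peripheral"},                       # exact match
    {"single-pass type I", "multipass"},  # one extra type
]
rate = absolute_true(truth, predicted)
```

Because partial matches earn no credit, values such as 0.6342 are not directly comparable with single-label accuracies reported elsewhere in this list.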

15.
16.
Artificial neural network model for predicting membrane protein types   Total citations: 5 (self-citations: 0, citations by others: 5)
Membrane proteins can be classified into the following five types: (1) type I membrane protein, (2) type II membrane protein, (3) multipass transmembrane proteins, (4) lipid chain-anchored membrane proteins, and (5) GPI-anchored membrane proteins. T. Kohonen's self-organization model, a typical neural network, is applied to predict the type of a given membrane protein based on its amino acid composition. As a result, high rates of self-consistency (94.80%) and cross-validation (77.76%), as well as a stronger fault-tolerant ability, were obtained.

17.
Li FM  Li QZ 《Amino acids》2008,34(1):119-125
Summary. The subnuclear localization of nuclear proteins is very important for an in-depth understanding of the construction and function of the nucleus. The pseudo amino acid composition (PseAA), as originally introduced by K. C. Chou, can incorporate much more information about a protein sequence than the classical amino acid composition, and thus significantly enhances the power of a discrete model to predict various attributes of a protein. Based on the amino acid composition and PseAA, an algorithm of increment of diversity combined with the improved quadratic discriminant analysis is proposed to predict the protein subnuclear location. The overall predictive success rate and correlation coefficient are 75.4% and 0.629, respectively, for 504 single-localization proteins in the jackknife test, and the success rate is 80.4% for an independent set of 92 multi-localization proteins. For 406 single-localization nuclear proteins with ≤25% sequence identity, the jackknife test shows an overall prediction accuracy of 77.1%. Authors' address: Qian-Zhong Li, Laboratory of Theoretical Biophysics, Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021, China

18.
A new algorithm to predict the types of membrane proteins is proposed. Besides the amino acid composition of the query protein, the information within the amino acid sequence is taken into account. A formulation of the autocorrelation functions based on the hydrophobicity index of the 20 amino acids is adopted. The overall predictive accuracy is remarkably increased for the database of 2054 membrane proteins studied here. An improvement of about 13% in the resubstitution test and 8% in the jackknife test is achieved compared with those of algorithms based merely on the amino acid composition. Consequently, overall predictive accuracy is as high as 94% and 82% for the resubstitution and jackknife tests, respectively, for the prediction of the five types. Since the proposed algorithm is based on more parameters than those in the amino acid composition approach, the predictive accuracy would be further increased for a larger and more class-balanced database. The present algorithm should be useful in the determination of the types and functions of new membrane proteins. The computer program is available on request.
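A feature vector of the kind described here, amino acid composition extended with autocorrelation terms of a hydrophobicity index, might be built as in the sketch below. Two assumptions are made: the Kyte-Doolittle scale stands in for the unspecified hydrophobicity index, and the autocorrelation at lag n is taken as the mean of h(i) * h(i + n) over the sequence. The function names and the `max_lag` parameter are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed choice of hydrophobicity index).
HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Fraction of each of the 20 amino acids in the sequence."""
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def autocorrelation(sequence, max_lag):
    """Mean product of hydrophobicity values `lag` residues apart, lag = 1..max_lag."""
    h = [HYDROPHOBICITY[aa] for aa in sequence]
    return [
        sum(h[i] * h[i + lag] for i in range(len(h) - lag)) / (len(h) - lag)
        for lag in range(1, max_lag + 1)
    ]


def features(sequence, max_lag=3):
    """20 composition values followed by max_lag autocorrelation values."""
    return composition(sequence) + autocorrelation(sequence, max_lag)


vector = features("ACDEFGHIK", max_lag=3)
lag_one = autocorrelation("IRIR", 1)
```

The composition alone discards residue order; the autocorrelation terms reintroduce some of it, which is consistent with the accuracy gain over composition-only algorithms that the abstract reports.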

19.
An eigenvalue-eigenvector approach to predicting protein folding types   Total citations: 1 (self-citations: 0, citations by others: 1)
The accuracy of predicting protein folding types can be significantly enhanced by a recently developed algorithm in which the coupling effect among different amino acid components is taken into account [Chou and Zhang (1994) J. Biol. Chem. 269, 22014-22020]. However, in practical calculations using this powerful algorithm, one may sometimes face ill-conditioned matrices. To overcome this difficulty, an effective eigenvalue-eigenvector approach is proposed. Furthermore, the new approach has been used to predict a recently constructed set of 76 proteins not included in the training set, and the accuracy of prediction is also much higher than those of other methods.

20.
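Knowing where proteins are located within the nucleus of eukaryotic cells is important for the functional annotation of newly discovered proteins. With the rapid increase in the number of protein sequences in protein databases, using computational methods to predict protein subnuclear localization has become a hot research topic in protein science. Based on the discrete pseudo-amino acid composition model proposed by Chou, a new method for predicting protein subnuclear localization is presented. The approximate entropy of the protein sequence is computed as an additional feature to construct the pseudo-amino acid composition that represents the sequence, and the AdaBoost classification algorithm is used as the prediction tool. Compared with previously reported subnuclear localization predictors, this method achieves higher accuracy.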
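Approximate entropy measures how unpredictable a numeric series is: low values indicate regular, repetitive patterns. The sketch below follows the standard definition (embedding dimension m, tolerance r, Chebyshev distance, self-matches included). How the authors map residues to numbers and which m and r they use are not stated in the abstract, so the defaults and the toy series here are assumptions.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) = phi(m) - phi(m + 1) for a numeric series.

    r is an absolute tolerance here (assumption); it is often set
    relative to the standard deviation of the series instead.
    """
    def phi(length):
        n_vectors = len(series) - length + 1
        vectors = [series[i:i + length] for i in range(n_vectors)]
        log_sum = 0.0
        for a in vectors:
            # Count vectors within tolerance r of `a` (Chebyshev distance);
            # the self-match guarantees the count is at least 1.
            matches = sum(
                max(abs(p - q) for p, q in zip(a, b)) <= r for b in vectors
            )
            log_sum += math.log(matches / n_vectors)
        return log_sum / n_vectors

    return phi(m) - phi(m + 1)


# A constant series is perfectly regular, and a strictly alternating
# series is almost perfectly regular.
apen_constant = approximate_entropy([1.0] * 10)
apen_alternating = approximate_entropy([1.0, 0.0] * 5)
```

For a protein, the series could be the residues encoded by a physicochemical index, with the resulting value appended to the amino acid composition as one extra pseudo-amino acid component.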
