Similar Literature
1.
In this paper, by combining the "functional domain composition" approach [K.C. Chou, Y. D. Cai, J. Biol. Chem. 277 (2002) 45765] with the pseudo-amino acid composition approach [K.C. Chou, Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: Proteins Struct. Funct. Genet. 44 (2001) 60], a Nearest Neighbour Algorithm (NNA) predictor was developed for predicting protein subcellular location. Very high success rates were observed, suggesting that such a hybrid approach may become a useful high-throughput tool in bioinformatics and proteomics.
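The nearest-neighbour idea behind NNA can be sketched as follows: a query protein receives the location label of the closest training protein in feature space. This is a minimal sketch, assuming each protein is already encoded as a fixed-length numeric vector (for example, functional-domain plus pseudo-amino acid components) and assuming plain Euclidean distance; the feature values and labels are hypothetical, and the original predictor's distance measure may differ.

```python
import math


def nearest_neighbour_predict(train_X, train_y, query):
    """Return the label of the training vector closest to `query` (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for x, label in zip(train_X, train_y):
        dist = math.dist(x, query)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


# Hypothetical feature vectors and subcellular location labels
train_X = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.2, 0.2, 0.6]]
train_y = ["nuclear", "cytoplasmic", "mitochondrial"]
print(nearest_neighbour_predict(train_X, train_y, [0.75, 0.15, 0.1]))  # cytoplasmic
```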

2.
The function of a protein is closely correlated with its subcellular location. With the success of the Human Genome Project and the rapid increase in the number of newly found protein sequences entering databanks, it is highly desirable to develop an automated method for predicting the subcellular location of proteins. Such a predictor would expedite the functional characterization of newly found proteins and the prioritization of genes and proteins identified by genomics efforts as potential molecular targets for drug design. Based on the concept of pseudo amino acid composition originally proposed by K. C. Chou (Proteins: Struct. Funct. Genet. 43: 246–255, 2001), a digital signal processing approach has been introduced to partially incorporate the sequence-order effect. A notable advantage of this approach is that many existing tools from mathematics and engineering can be applied directly to predicting protein subcellular location. The results obtained are quite encouraging, and it is anticipated that digital signal processing may serve as a useful vehicle in many other areas of protein science as well.

3.
Knowledge of membrane protein type often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences emerging in the post-genomic era, it is highly desirable to develop an automated, high-throughput method that identifies the types of newly found membrane proteins from their primary sequences, so that they can be annotated promptly for use in both basic research and drug discovery. Based on the concept of pseudo-amino acid composition [K.C. Chou, Proteins: Struct. Funct. Genet. 43 (2001) 246-255; Erratum: Proteins: Struct. Funct. Genet. 44 (2001) 60], which makes it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample as a set of discrete numbers, a novel predictor, the so-called "optimized evidence-theoretic K-nearest neighbor" or "OET-KNN" classifier, was proposed. Self-consistency, jackknife, and independent dataset tests demonstrated that the new predictor yielded higher success rates than many previous ones in most cases. The new predictor can also be used to improve prediction quality for other protein attributes, including structural class, subcellular localization, enzyme family class, and G-protein coupled receptor type. The OET-KNN classifier will be available as a web server at http://www.pami.sjtu.edu.cn/kcchou.
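To make the "set of discrete numbers" concrete, the sketch below computes a simplified pseudo amino acid composition: 20 normalized amino acid frequencies followed by λ sequence-order correlation factors θ_k (mean squared property difference between residues k positions apart), weighted by w. It is an illustration only, assuming a single physicochemical property (Kyte-Doolittle hydropathy, standardized over the 20 residues) instead of the three properties in Chou's original formulation; the defaults `lam=3` and `w=0.05` are arbitrary choices.

```python
import statistics

# Kyte-Doolittle hydropathy values (the single property assumed in this sketch)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(KD)  # fixed order of the 20 standard residues

# Standardize the property over the 20 amino acids (zero mean, unit population std)
_mean = statistics.mean(KD.values())
_std = statistics.pstdev(KD.values())
H = {aa: (v - _mean) / _std for aa, v in KD.items()}


def pseaac(seq: str, lam: int = 3, w: float = 0.05) -> list:
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    L = len(seq)
    if L <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / L for aa in AMINO_ACIDS]
    # theta_k: sequence-order correlation factor at gap k
    thetas = [
        sum((H[seq[i + k]] - H[seq[i]]) ** 2 for i in range(L - k)) / (L - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
print(len(vec), round(sum(vec), 6))  # 23 1.0
```

The shared denominator makes all 20 + λ components sum to one, so the sequence-order terms are directly comparable to the composition terms.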

4.
Apoptosis proteins are very important for understanding the mechanism of programmed cell death. The subcellular localization of an apoptosis protein can provide valuable information about its molecular function, but predicting it is a challenging task. In our previous work we proposed an increment of diversity (ID) method that uses protein sequence information for this prediction task. In this work, based on the concept of Chou's pseudo-amino acid composition [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. (Erratum: Chou, K.C., 2001, vol. 44, 60) 43, 246-255; Chou, K.C., 2005. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19], a different pseudo-amino acid composition that uses hydropathy distribution information is introduced. A novel ID_SVM algorithm, which combines ID with a support vector machine (SVM), is proposed. The method is applied to three data sets (317, 225, and 98 apoptosis proteins). Jackknife tests show higher predictive success rates than those of previous algorithms.
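The increment of diversity can be sketched from its standard definition: for a count vector with total N, the diversity is D = N ln N - Σ n_i ln n_i, and ID(X, Y) = D(X + Y) - D(X) - D(Y), which is zero when X and Y have identical proportions. This minimal sketch assumes raw count vectors (for example, amino acid or coarse hydropathy-class counts) and assigns a query to the class with the smallest ID; the class names and counts are hypothetical, and the SVM stage of ID_SVM is not shown.

```python
import math


def diversity(counts):
    """Diversity measure D = N ln N - sum(n_i ln n_i), taking 0 ln 0 = 0."""
    def xlogx(n):
        return n * math.log(n) if n > 0 else 0.0
    return xlogx(sum(counts)) - sum(xlogx(n) for n in counts)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean X and Y are more alike."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def predict_by_id(query_counts, class_counts):
    """Assign the query to the class whose reference counts give the smallest ID."""
    return min(class_counts,
               key=lambda c: increment_of_diversity(query_counts, class_counts[c]))


# Hypothetical 4-category counts per subcellular location
class_counts = {"cytoplasm": [30, 10, 5, 5], "membrane": [5, 5, 10, 30]}
print(predict_by_id([6, 2, 1, 1], class_counts))  # cytoplasm
```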

5.
Cell membranes are vitally important to the life of a cell. Although the basic structure of a biological membrane is provided by the lipid bilayer, membrane proteins perform most of its specific functions. Membrane proteins are putatively classified into five different types, and identifying their types is currently an important topic in bioinformatics and proteomics. In this paper, based on the concept of representing protein samples by their pseudo-amino acid composition (Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct. Funct. Genet. 43, 246-255), the fuzzy K-nearest neighbors (KNN) algorithm has been introduced to predict membrane protein types, and high success rates were observed. It is anticipated that the current approach, which is based on a branch of fuzzy mathematics and represents a new strategy, may play an important complementary role to existing methods in this area. The approach may also have a notable impact on predicting other attributes, such as protein structural class, protein subcellular localization, and enzyme family class, among many others.
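The fuzzy KNN rule can be sketched using the common Keller-style formulation with crisp training labels: each of the k nearest neighbours contributes an inverse-distance weight 1/d^(2/(m-1)) to its own class, and the normalized weights serve as class memberships. The fuzzifier m=2, k=3, and the toy vectors and type labels are assumptions for illustration, not the paper's settings.

```python
import math


def fuzzy_knn(train_X, train_y, query, k=3, m=2.0):
    """Return (predicted_label, class_memberships) for a query vector."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))[:k]
    weights = {}
    for x, label in neighbours:
        d = max(math.dist(x, query), 1e-12)  # guard against division by zero
        weights[label] = weights.get(label, 0.0) + 1.0 / d ** (2.0 / (m - 1.0))
    total = sum(weights.values())
    memberships = {label: w / total for label, w in weights.items()}
    return max(memberships, key=memberships.get), memberships


# Hypothetical 2-D feature vectors for two membrane protein types
train_X = [[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]]
train_y = ["type I", "type I", "type II", "type II", "type II"]
label, memberships = fuzzy_knn(train_X, train_y, [0.2, 0.5], k=3)
print(label)  # type I
```

Unlike majority-vote KNN, the memberships also say how confidently the query belongs to each type.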

6.
Prediction of protein subcellular locations by GO-FunD-PseAA predictor
The localization of a protein in a cell is closely correlated with its biological function. With the explosion of protein sequences entering databanks, it is highly desirable to develop an automated method that can rapidly identify their subcellular locations. This will expedite the annotation process, providing timely and useful information for both basic research and industrial applications. In view of this, a powerful predictor has been developed by hybridizing the gene ontology approach [Nat. Genet. 25 (2000) 25], the functional domain composition approach [J. Biol. Chem. 277 (2002) 45765], and the pseudo-amino acid composition approach [Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: ibid. 44 (2001) 60]. As a showcase, a recently constructed dataset [Bioinformatics 19 (2003) 1656] was used for demonstration. The dataset contains 7589 proteins classified into 12 subcellular locations: chloroplast, cytoplasmic, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, and vacuolar. The overall success rate obtained by jackknife cross-validation was 92%, the highest success rate reported so far on this dataset under an objective and rigorous cross-validation procedure.
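The jackknife (leave-one-out) cross-validation used to report the 92% figure can be sketched as follows: each sample in turn is removed, predicted from all remaining samples, and the success rate is the fraction predicted correctly. This minimal sketch assumes a 1-nearest-neighbour classifier as a stand-in for the actual GO-FunD-PseAA predictor, with hypothetical toy vectors and labels.

```python
import math


def nn_predict(train_X, train_y, query):
    """1-nearest-neighbour prediction with Euclidean distance."""
    idx = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return train_y[idx]


def jackknife_accuracy(X, y, predict=nn_predict):
    """Leave each sample out, predict it from the rest, and return the overall success rate."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.3], [0.9, 1.0], [1.0, 0.9], [0.7, 0.7]]
y = ["nuclear", "nuclear", "nuclear", "plasma membrane", "plasma membrane", "nuclear"]
print(jackknife_accuracy(X, y))
```

Here the last sample, labelled "nuclear", lies nearer the two plasma-membrane points, so it is misclassified and the jackknife success rate is 5/6.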

7.
A novel approach was developed for predicting the structural classes of proteins from their sequences. It was assumed that proteins belonging to the same structural class share similar textures in the images generated by cellular automaton evolution rules [Wolfram, S., 1984. Cellular automata as models of complexity. Nature 311, 419-424]. On this basis, two geometric invariant moment factors derived from the image functions were used as pseudo amino acid components [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct., Funct., Genet. (Erratum: ibid., 2001, vol. 44, 60) 43, 246-255] to formulate the protein samples for statistical prediction. The success rates obtained on a previously constructed benchmark dataset are quite promising, implying that cellular automaton images can help reveal inherent and subtle features hidden in long and complicated amino acid sequences.
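The image pipeline can be sketched in three steps: encode the sequence as a binary row, evolve it with an elementary cellular automaton rule to obtain a 2-D image, and compute geometric moment invariants of that image. This sketch assumes a hypothetical 5-bit index encoding of residues, an arbitrary rule number (30), and Hu's first two invariant moments φ1 and φ2; the paper's encoding, rule, and exact moment factors may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_binary(seq):
    """Hypothetical encoding: each residue becomes the 5-bit binary form of its index."""
    bits = []
    for aa in seq:
        bits.extend(int(b) for b in format(AMINO_ACIDS.index(aa), "05b"))
    return bits


def evolve(cells, rule=30, steps=20):
    """Evolve an elementary cellular automaton with periodic boundaries; rows form the image."""
    n = len(cells)
    rows = [list(cells)]
    for _ in range(steps):
        prev = rows[-1]
        # (left, centre, right) neighbourhood read as a 3-bit index into the rule number
        rows.append([(rule >> (prev[(i - 1) % n] << 2 | prev[i] << 1 | prev[(i + 1) % n])) & 1
                     for i in range(n)])
    return rows


def hu_phi1_phi2(image):
    """First two Hu moment invariants of a 2-D image given as a list of rows."""
    pixels = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pixels)
    xbar = sum(x * v for x, _, v in pixels) / m00
    ybar = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Normalized central moment: mu_pq / m00^(1 + (p + q) / 2)
        mu = sum((x - xbar) ** p * (y - ybar) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2


image = evolve(encode_binary("MKVLA"), rule=30, steps=20)
phi1, phi2 = hu_phi1_phi2(image)
```

Because the invariants use central moments, padding or shifting the image leaves φ1 and φ2 unchanged, which is what makes them usable as position-independent texture descriptors.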

8.
Gao Y  Shao S  Xiao X  Ding Y  Huang Y  Huang Z  Chou KC 《Amino acids》2005,28(4):373-376
Summary. With the avalanche of new protein sequences in the post-genomic era, it is vitally important to develop an automated method for fast and accurate determination of the subcellular location of uncharacterized proteins. In this article, based on the concept of pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), three pseudo amino acid components are introduced via the Lyapunov index, the Bessel function, and the Chebyshev filter; these can deal more efficiently with the chaos and complexity in protein sequences, leading to a higher success rate in predicting protein subcellular location.

9.
Cell membranes are vitally important to living cells. Although the infrastructure of a biological membrane is provided by the lipid bilayer, membrane proteins perform most of its specific functions. Knowledge of membrane protein types often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences generated in the post-genomic era, there is strong demand for a high-throughput tool that identifies the types of newly found membrane proteins from their primary sequences, so that they can be annotated promptly for use in both basic research and drug discovery. The key is to establish a powerful identifier that can capture the characteristic sequence patterns of different membrane protein types. This is not easy, because those patterns are buried in long and complicated sequences. In this paper, based on the concept of pseudo-amino acid composition [K.C. Chou, PROTEINS: Struct., Funct., Genet. 43 (2001) 246-255], low-frequency Fourier spectrum analysis is introduced. The advantages are that sequence-pattern information can be incorporated more effectively into a set of discrete components, and that all existing prediction algorithms can be applied directly to this formulation of protein samples. High success rates were observed in the re-substitution test, jackknife test, and independent dataset test, indicating that the low-frequency Fourier spectrum approach may become a very useful tool for membrane protein type prediction. The approach also holds high potential for predicting many other protein attributes.
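A low-frequency Fourier feature can be sketched as follows: map each residue to a numeric property to obtain a signal, take its discrete Fourier transform, and keep the magnitudes of the first few non-zero frequencies as extra components appended to the 20 amino acid frequencies. The choice of Kyte-Doolittle hydropathy as the property and of five components is an assumption for illustration, not the paper's configuration.

```python
import cmath

# Kyte-Doolittle hydropathy values (property choice is an assumption of this sketch)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(KD)


def low_freq_spectrum(seq, n_components=5):
    """Magnitudes |X(k)| for k = 1..n_components of the DFT of the hydropathy signal."""
    signal = [KD[aa] for aa in seq]
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(1, n_components + 1)
    ]


def fourier_feature_vector(seq, n_components=5):
    """20 amino acid frequencies followed by the low-frequency spectrum magnitudes."""
    aac = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    return aac + low_freq_spectrum(seq, n_components)
```

As a check, the alternating sequence "IDIDIDID" puts all its spectral energy at k = n/2 = 4, so the first three magnitudes vanish (up to rounding) and the fourth equals 4 × (4.5 + 3.5) = 32.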

10.
Li ZC  Zhou XB  Dai Z  Zou XY 《Amino acids》2009,37(2):415-425
Prior knowledge of a protein's structural class provides useful information about its overall structure, so quick and accurate computational determination of structural class is very important in protein science. A key requirement for computational methods is an accurate representation of protein samples. Here, based on the concept of Chou's pseudo-amino acid composition (PseAAC; Chou, Proteins: structure, function, and genetics, 43:246–255, 2001), a novel feature-extraction method combining the continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for predicting protein structural classes. First, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Second, CWT was used to extract a new feature vector based on the wavelet power spectrum (WPS), which contains richer sequence-order information in both the frequency and time domains, and PCA was then used to reorganize the feature vector to reduce information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the amino acid composition (AAC) vector with the new WPS feature vector in the orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicate that prediction quality has improved, and that the current approach may serve as a useful complementary vehicle for classifying other protein attributes, such as enzyme family class, subcellular localization, membrane protein type, and protein secondary structure.
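The PCA step that reorganizes the wavelet features into an orthogonal space can be sketched without any library: centre the feature vectors, form the sample covariance matrix, and find its leading eigenvectors by power iteration, keeping each new vector orthogonal to those already found. This is a minimal, dependency-free sketch assuming small dense feature matrices; in practice a linear-algebra library's SVD would be used, and the toy data here is hypothetical.

```python
import math
import random


def pca_project(data, n_components=2, iters=200, seed=0):
    """Project the rows of `data` onto their leading principal components.

    Eigenvectors of the sample covariance matrix are found one at a time by power
    iteration, with each new vector kept orthogonal to those already found.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]

    def orthonormalize(v, basis):
        for u in basis:  # Gram-Schmidt against previously found components
            dot = sum(x * y for x, y in zip(v, u))
            v = [x - dot * y for x, y in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 1e-12 else None

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = orthonormalize([rng.random() for _ in range(d)], components)
        for _ in range(iters):
            w = orthonormalize([sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)],
                               components)
            if w is None:  # no variance left in the remaining directions
                break
            v = w
        components.append(v)
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centred]


# Toy features lying on a line: all variance should land on the first component
data = [[t, 2.0 * t, 0.0] for t in range(5)]
proj = pca_project(data, n_components=2)
```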

11.
Membrane proteins play an important role in biochemical processes such as signal transduction and transmembrane transport. Membrane proteins are usually classified into five types [Chou, K.C., Elrod, D.W., 1999. Prediction of membrane protein types and subcellular locations. Proteins: Struct. Funct. Genet. 34, 137-153] or six types [Chou, K.C., Cai, Y.D., 2005. J. Chem. Inf. Modelling 45, 407-413]. Designing in silico methods to identify and classify membrane proteins can help us understand the structure and function of unknown proteins. This paper introduces an integrative approach, IAMPC, that classifies membrane proteins based on protein sequences and protein profiles. Its modules extract the amino acid composition of the whole profile, the amino acid composition of the N-terminal and C-terminal profiles, the amino acid composition of profile segments, and the dipeptide composition of the whole profile. In computational experiments, the overall accuracy of the proposed approach is comparable to that of the functional-domain-based method. In addition, across different membrane protein types, the performance of the proposed approach is complementary to that of the functional-domain-based method.
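Two of the composition features listed above can be sketched on a plain sequence: the 400-dimensional dipeptide composition and the amino acid composition of the N- and C-terminal regions. IAMPC computes these on protein profiles rather than raw sequences, so this is only an illustration under that simplification; the terminal window length `n=25` is a hypothetical choice.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """20-dimensional amino acid composition."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dimensional composition of adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DIPEPTIDES]


def terminal_aac(seq, n=25):
    """Composition of the first and last n residues (hypothetical window length)."""
    return aac(seq[:n]) + aac(seq[-n:])
```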

12.
Apoptosis, or programmed cell death, plays an important role in the development of an organism. Information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the idea that the positional distribution of amino acids is closely related to the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. To capture local features, each protein sequence is divided into p parts of equal length. We then use this representation of protein sequences with a support vector machine to predict subcellular location. Jackknife tests show that the overall prediction accuracy is significantly improved.

13.
The function of a protein is closely correlated with its subcellular location. Predicting the subcellular location of apoptosis proteins is an important research area in the post-genomic era, because knowledge of apoptosis proteins helps in understanding the mechanism of programmed cell death. Compared with the conventional amino acid composition (AAC), the pseudo amino acid (PseAA) composition originally introduced by Chou can incorporate much more information from a protein sequence, and thus remarkably enhances the power of discrete models for predicting various protein attributes. In this study, a novel approach based on Chou's PseAA composition is presented to predict the subcellular location of apoptosis proteins solely from sequence. Approximate entropy (ApEn), a parameter denoting the complexity of a time series, is used to construct additional PseAA composition features. A fuzzy K-nearest neighbor (FKNN) classifier is selected as the prediction engine, and a particle swarm optimization (PSO) algorithm is adopted to optimize the weight factors, which are important in PseAA composition. Two datasets, covering six and four subcellular locations respectively, are used to validate the performance of the proposed approach. The jackknife test results are quite encouraging, indicating that the ApEn of a protein sequence can effectively represent information about the subcellular locations of apoptosis proteins. The approach can at least play a complementary role to many existing methods and may become a useful tool for protein function prediction. The Matlab software is freely available from the corresponding author.
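Approximate entropy can be sketched from Pincus' standard definition: ApEn(m, r) = Φ^m(r) - Φ^(m+1)(r), where Φ^m averages the logarithm of the fraction of length-m windows lying within Chebyshev distance r of each window. Regular series give values near zero and irregular series give larger ones. The sketch assumes the protein has already been mapped to a numeric series; the property used and the choices m=2 and r=0.2 are illustrative assumptions, and the PSO weight tuning and FKNN classifier are not shown.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Pincus' approximate entropy ApEn(m, r) of a numeric series."""
    n = len(series)

    def phi(mm):
        windows = [series[i:i + mm] for i in range(n - mm + 1)]
        total = 0.0
        for a in windows:
            # Fraction of windows within Chebyshev distance r of `a` (self-match included)
            c = sum(1 for b in windows
                    if max(abs(x - y) for x, y in zip(a, b)) <= r) / len(windows)
            total += math.log(c)
        return total / len(windows)

    return phi(m) - phi(m + 1)


print(approximate_entropy([1.0] * 30))  # 0.0 for a constant series
```

A strictly alternating series such as `[0, 1] * 20` also scores close to zero, reflecting its perfect regularity.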

14.
Three independent three-dimensional reconstructions of the spinach photosystem II-light-harvesting complex supercomplex were derived from single particle analyses of non-stained, vitrified samples imaged by electron microscopy. Each reconstruction was found to differ significantly in the composition of the lumenal oxygen-evolving complex extrinsic proteins. From difference mapping, aided by electron microscopy of negatively stained selectively washed samples, regions of density were assigned to the PsbO and PsbP/PsbQ proteins. Interpretation of the density assigned to the PsbO protein was explored using computer-aided structural predictions. PsbO is calculated to be mainly a beta-protein (38% beta) composed of two domains within an overall elongated shape (Pazos, F., Heredia, P., Valencia, A., and De Las Rivas, J. (2001) Proteins Struct. Funct. Genet. 45, 372-381). The positioning and fitting of the proposed structural model for the PsbO protein within the three-dimensional map indicated that there is a single copy per reaction center. Moreover, the structural model derived for PsbO, together with difference mapping, indicates that this protein stretches across the surface of the reaction center with its N- and C-terminal domains located toward the CP47 and CP43 side, respectively. This structural assignment is discussed in terms of the recent x-ray-derived cyanobacterial model of PSII (Zouni, A., Witt, H.-T., Kern, J., Fromme, P., Krauss, N., Saenger, W., and Orth, P. (2001) Nature 409, 739-743).

15.
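The subcellular localization of proteins is important information for studying protein function. After synthesis, proteins are transported to specific organelles, and only when they reach the correct location can they take part in the cell's various life activities and function effectively. We attempted to combine encoded information from conserved sequences and protein-protein interaction data with conventional amino acid composition encoding, and used a support vector machine to predict protein subcellular localization. In eukaryotes, the five-fold cross-validation accuracy reached 91.8%, a significant improvement.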

16.
Xiao X  Shao S  Ding Y  Huang Z  Huang Y  Chou KC 《Amino acids》2005,28(1):57-61
Summary. Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, it is vitally important to develop an automated, high-throughput method to identify their subcellular location promptly. Based on the concept of pseudo amino acid composition, by which a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), the complexity measure approach is introduced. The advantage of incorporating the complexity measure factor as one of the pseudo amino acid components of a protein is that it reflects the overall sequence-order feature more effectively than the conventional correlation factors. With protein sequences represented in this framework, the covariant-discriminant predictor (Chou, K. C. and Elrod, D. W., Protein Engineering, 1999, 12: 107–118) was adopted to conduct prediction. High success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing the complexity measure into protein subcellular location prediction is quite promising and may also serve as a useful vehicle in other areas of molecular biology.

17.
The location of a protein in a cell is closely correlated with its biological function. Based on the concept that protein subcellular location is mainly determined by amino acid and pseudo amino acid composition (PseAA), a new algorithm combining increment of diversity with a support vector machine is proposed to predict protein subcellular location. The subcellular locations of plant and non-plant proteins are investigated with this method. The overall prediction accuracies in the jackknife test are 88.3% for eukaryotic plant proteins and 92.4% for eukaryotic non-plant proteins. In order to estimate the effect of sequence identity on the prediction results, the proteins with sequence identity

18.
We have combined three mutations previously shown to stabilize lambda repressor against thermal denaturation. Two of these mutations are in helix 3, where Gly-46 and Gly-48 have been replaced by alanines [Hecht, M. H., et al. (1986) Proteins: Struct., Funct., Genet. 1, 43-46]. The other mutation, which replaces Tyr-88 with cysteine, allows the protein to form an intersubunit disulfide bond [Sauer, R. T., et al. (1986) Biochemistry 25, 5992-5998]. Calorimetric measurements show that the two alanine substitutions stabilize repressor by about 8 degrees C, that the disulfide bond stabilizes repressor by about 8 degrees C, and that the triple mutant is 16 degrees C more stable than wild-type repressor.

19.
Shi JY  Zhang SW  Pan Q  Zhou GP 《Amino acids》2008,35(2):321-327
In the post-genome age, there is an urgent need for reliable and effective computational methods to predict the subcellular localization of the flood of newly found proteins. Here, a novel pseudo amino acid (PseAA) composition method, the so-called "amino acid composition distribution" (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, the amino acid composition of each segment is calculated in turn, so that each protein sequence is represented by a feature vector. Finally, the feature vectors of all sequences are input into multi-class support vector machines to predict subcellular localization. The results show that AACD is quite effective in representing protein sequences for predicting protein subcellular localization.
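The AACD encoding described above can be sketched directly: split the sequence into segments, compute the 20-dimensional amino acid composition of each, and concatenate them. How the original method handles lengths not divisible by the segment count is not stated, so the boundary rule here (segment lengths differing by at most one) is an assumption, as is the default of three segments.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aacd(seq, n_segments=3):
    """Amino acid composition distribution: concatenated AAC of near-equal segments."""
    L = len(seq)
    if L < n_segments:
        raise ValueError("sequence shorter than the number of segments")
    # Floor-division boundaries: segment lengths differ by at most one residue
    bounds = [i * L // n_segments for i in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        segment = seq[start:end]
        features.extend(segment.count(aa) / len(segment) for aa in AMINO_ACIDS)
    return features
```

The resulting vector has 20 × `n_segments` components and can be passed to any multi-class classifier, such as the SVMs used in the paper.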

20.
Fast and proper assessment of the structural rigidity of biomacromolecular complexes, as a measure of structural stability, can be useful in systematic studies to predict molecular function, and can also enable the design of rapid scoring functions to rank automatically generated biomolecular complexes. Based on the graph-theoretical approach of Jacobs et al. [Jacobs DJ, Rader AJ, Kuhn LA, Thorpe MF (2001) Protein flexibility predictions using graph theory. Proteins: Struct Funct Genet 44:150–165] for expressing molecular flexibility, we propose a new scheme to analyze the structural stability of biomolecular complexes. The analysis identifies, in interacting subunits, clusters of flappy amino acids (those constituting regions of potential internal motion) that become more rigid upon complex formation. Gains in the structural rigidity of interacting subunits upon complex formation can be evaluated by expanding the network of intramolecular interatomic interactions to include intermolecular interatomic interaction terms. We propose two indices for quantifying this change: a local index, expressing structural rigidity at the amino acid level, and a global index, expressing the overall structural stability of the complex. The new system is validated on a series of protein complex structures reported in the Protein Data Bank. Finally, the indices are used as scoring coefficients to rank automatically generated protein complex decoys.
