Similar literature
 Found 20 similar documents; search took 15 ms
1.
Proteins are generally classified into the following 12 subcellular locations: 1) chloroplast, 2) cytoplasm, 3) cytoskeleton, 4) endoplasmic reticulum, 5) extracellular, 6) Golgi apparatus, 7) lysosome, 8) mitochondria, 9) nucleus, 10) peroxisome, 11) plasma membrane, and 12) vacuole. Because the function of a protein is closely correlated with its subcellular location, and because new protein sequences are entering databanks at a rapidly increasing rate, a high-throughput tool for predicting protein subcellular location is vitally important for both basic research and the pharmaceutical industry. In this paper, a new concept, the so-called "functional domain composition", is introduced. Based on this concept, a protein can be represented as a vector in a high-dimensional space, where each of the clustered functional domains derived from the protein universe serves as a vector base. With this representation, the support vector machine (SVM) algorithm is introduced for predicting protein subcellular location. High success rates are obtained by the self-consistency test, jackknife test, and independent dataset test. The current approach not only can play an important complementary role to the powerful covariant discriminant algorithm based on the pseudo amino acid composition representation (Chou, K. C. (2001) Proteins Struct. Funct. Genet. 43, 246-255; Correction (2001) Proteins Struct. Funct. Genet. 44, 60), but also may greatly stimulate the development of this area.
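The vector representation described in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not give the domain database or the encoding, so the tiny vocabulary `DOMAIN_VOCAB`, the domain identifiers, and the binary presence/absence encoding are all assumptions made here.

```python
# Hypothetical vocabulary of clustered functional domains. Each entry is one
# basis direction of the vector space; the identifiers are illustrative only.
DOMAIN_VOCAB = ["dom_A", "dom_B", "dom_C", "dom_D", "dom_E"]


def functional_domain_vector(domain_hits, vocab=DOMAIN_VOCAB):
    """Return a 0/1 vector with one component per domain in `vocab`.

    `domain_hits` is the collection of domain identifiers found in a protein
    (for example, from a search of the sequence against a domain database).
    Hits that are not in the vocabulary are ignored.
    """
    hits = set(domain_hits)
    return [1 if domain in hits else 0 for domain in vocab]


if __name__ == "__main__":
    # A protein with two known domains maps to a sparse binary vector.
    print(functional_domain_vector(["dom_B", "dom_E"]))  # [0, 1, 0, 0, 1]
```

Vectors of this kind have a fixed length regardless of sequence length, which is what allows a standard classifier such as an SVM to be trained on them.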

2.
In this paper, by combining the "functional domain composition" [K.C. Chou, Y. D. Cai, J. Biol. Chem. 277 (2002) 45765] with the pseudo-amino acid composition [K.C. Chou, Proteins Struct. Funct. Genet. 43 (2001) 246; Correction Proteins Struct. Funct. Genet. 44 (2001) 60], the Nearest Neighbour Algorithm (NNA) was developed for predicting protein subcellular location. Very high success rates were observed, suggesting that such a hybrid approach may become a useful high-throughput tool in bioinformatics and proteomics.

3.
Cell membranes are vitally important to the life of a cell. Although the basic structure of a biological membrane is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Membrane proteins are putatively classified into five different types. Identification of their types is currently an important topic in bioinformatics and proteomics. In this paper, based on the concept of representing protein samples in terms of their pseudo-amino acid composition (Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct. Funct. Genet. 43, 246-255), the fuzzy K-nearest neighbors (KNN) algorithm has been introduced to predict membrane protein types, and high success rates were observed. It is anticipated that the current approach, which is based on a branch of fuzzy mathematics and represents a new strategy, may play an important complementary role to the existing methods in this area. The novel approach may also have a notable impact on the prediction of other attributes, such as protein structural class, protein subcellular localization, and enzyme family class, among many others.
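As a rough illustration of the fuzzy KNN idea mentioned in this abstract, the sketch below uses the commonly cited inverse-distance weighting with a fuzzifier `m`. The abstract does not specify its exact variant, so the weighting formula, the crisp training labels, the default `k` and `m`, and the toy feature vectors are assumptions.

```python
import math


def fuzzy_knn(query, samples, labels, k=3, m=2.0, eps=1e-12):
    """Return {class: membership} for `query`, using crisp training labels.

    Each of the k nearest neighbours contributes to the membership of its own
    class with weight 1 / distance**(2 / (m - 1)); memberships sum to 1.
    """
    # Keep the k training samples closest to the query (Euclidean distance).
    neighbours = sorted(
        ((math.dist(query, s), lab) for s, lab in zip(samples, labels)),
        key=lambda pair: pair[0],
    )[:k]
    exponent = 2.0 / (m - 1.0)
    # eps guards against division by zero when the query equals a sample.
    weights = [(1.0 / (d ** exponent + eps), lab) for d, lab in neighbours]
    total = sum(w for w, _ in weights)
    memberships = {lab: 0.0 for lab in set(labels)}
    for w, lab in weights:
        memberships[lab] += w / total
    return memberships


if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for pseudo-amino acid compositions.
    train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
    types = ["type I", "type I", "type II", "type II"]
    print(fuzzy_knn((0.2, 0.1), train, types))
```

Unlike a plain KNN vote, the result is a graded membership per class, so a prediction can be reported together with how strongly the neighbours support it.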

4.
Cell membranes are vitally important to living cells. Although the infrastructure of a biological membrane is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Knowledge of membrane protein types often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences generated in the post-genomic era, there is a strong demand for a high-throughput tool that identifies the type of newly found membrane proteins from their primary sequences, so that they can be annotated in a timely manner for reference in both basic research and drug discovery. To realize this, the key is to establish a powerful identifier that can catch the characteristic sequence patterns of different membrane protein types. However, this is not easy because the patterns are buried in a pile of long and complicated sequences. In this paper, based on the concept of the pseudo-amino acid composition [K.C. Chou, PROTEINS: Struct., Funct., Genet. 43 (2001) 246-255], the low-frequency Fourier spectrum analysis is introduced. The merits of doing so are that the sequence pattern information can be more effectively incorporated into a set of discrete components, and that all the existing prediction algorithms can be straightforwardly used on such a formulation for protein samples. High success rates were observed by the re-substitution test, jackknife test, and independent dataset test, indicating that the low-frequency Fourier spectrum approach may become a very useful tool for membrane protein type prediction. The novel approach also holds a high potential for predicting many other attributes of proteins.
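One way to picture how a low-frequency Fourier spectrum turns a sequence into a small set of discrete components is sketched below. The abstract does not state which numerical encoding or how many components were used, so the Kyte-Doolittle hydropathy scale (used here only as a stand-in), the function name, and the default `n_components` are assumptions.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here only as a stand-in numerical
# encoding of the 20 standard amino acids (an assumption of this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def low_frequency_spectrum(sequence, n_components=5, scale=KD):
    """Return the amplitudes of the n lowest non-zero DFT frequencies.

    The sequence is first mapped to a numerical signal, then the discrete
    Fourier transform is evaluated at frequencies k = 1 .. n_components.
    """
    signal = [scale[residue] for residue in sequence]
    n = len(signal)
    amplitudes = []
    for k in range(1, n_components + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        amplitudes.append(abs(coeff) / n)
    return amplitudes


if __name__ == "__main__":
    print(low_frequency_spectrum("MKTLLVAGAVLLSACSSG", n_components=4))
```

The resulting amplitudes form a fixed-length vector, so they can be appended to the 20 amino acid frequencies and passed to any existing classifier.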

5.
Prediction of protein subcellular locations by GO-FunD-PseAA predictor   Total citations: 8, self-citations: 0, citations by others: 8
The localization of a protein in a cell is closely correlated with its biological function. With the explosion of protein sequences entering databanks, it is highly desirable to develop an automated method that can quickly identify their subcellular location. This will expedite the annotation process, providing timely and useful information for both basic research and industrial application. In view of this, a powerful predictor has been developed by hybridizing the gene ontology approach [Nat. Genet. 25 (2000) 25], the functional domain composition approach [J. Biol. Chem. 277 (2002) 45765], and the pseudo-amino acid composition approach [Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: ibid. 44 (2001) 60]. As a showcase, the recently constructed dataset [Bioinformatics 19 (2003) 1656] was used for demonstration. The dataset contains 7589 proteins classified into 12 subcellular locations: chloroplast, cytoplasmic, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, and vacuolar. The overall success rate of prediction obtained by the jackknife cross-validation was 92%. This is so far the highest success rate achieved on this dataset by following an objective and rigorous cross-validation procedure.
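The jackknife (leave-one-out) cross-validation quoted in this abstract can be sketched as below: each protein is set aside in turn and predicted from all the others. The nearest-neighbour classifier and the toy 2-D feature vectors are placeholders chosen for this sketch; they are not the GO-FunD-PseAA predictor.

```python
import math


def nearest_neighbour(query, train_x, train_y):
    """Placeholder classifier: return the label of the closest training sample."""
    idx = min(range(len(train_x)), key=lambda j: math.dist(query, train_x[j]))
    return train_y[idx]


def jackknife_success_rate(samples, labels, classify=nearest_neighbour):
    """Leave-one-out test: predict each sample from all remaining samples."""
    correct = 0
    for i in range(len(samples)):
        # Remove sample i from the training set before predicting it.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(samples[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    ys = ["nuclear"] * 3 + ["cytoplasmic"] * 3
    print(jackknife_success_rate(xs, ys))
```

Because every sample is excluded from the data used to predict it, the reported rate cannot be inflated by memorizing the test protein, which is why the abstracts in this list treat it as the more rigorous test.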

6.
The function of a protein is closely correlated with its subcellular location. With the success of the human genome project and the rapid increase in the number of newly found protein sequences entering data banks, it is highly desirable to develop an automated method for predicting the subcellular location of proteins. The establishment of such a predictor will no doubt expedite the functional characterization of newly found proteins and the process of prioritizing genes and proteins identified by genomics efforts as potential molecular targets for drug design. Based on the concept of pseudo amino acid composition originally proposed by K. C. Chou (Proteins: Struct. Funct. Genet. 43: 246–255, 2001), the digital signal processing approach has been introduced to partially incorporate the sequence order effect. One of the remarkable merits of doing so is that many existing tools in mathematics and engineering can be straightforwardly used in predicting protein subcellular location. The results thus obtained are quite encouraging. It is anticipated that digital signal processing may serve as a useful vehicle for many other areas of protein science as well.
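The pseudo amino acid composition that this abstract builds on can be sketched as follows: 20 normalized residue frequencies are extended by `lam` sequence-order correlation factors. This is a simplified illustration rather than the published formulation. It uses a single physicochemical property (the Kyte-Doolittle scale as a stand-in), whereas the cited paper should be consulted for the actual properties; the function name and the defaults `lam=3` and `weight=0.05` are also assumptions.

```python
import statistics

# Kyte-Doolittle hydropathy, used only as a stand-in property (assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pseudo_amino_acid_composition(sequence, lam=3, weight=0.05, scale=KD):
    """Return a (20 + lam)-dimensional vector whose components sum to 1."""
    n = len(sequence)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")
    amino_acids = sorted(scale)
    # Standardize the property to zero mean and unit standard deviation.
    values = list(scale.values())
    mean, std = statistics.fmean(values), statistics.pstdev(values)
    h = {aa: (v - mean) / std for aa, v in scale.items()}
    # Conventional amino acid composition (order-independent part).
    freqs = [sequence.count(aa) / n for aa in amino_acids]
    # k-th tier correlation factor: mean squared property difference between
    # residues that are k positions apart (order-dependent part).
    thetas = [
        sum((h[sequence[i + k]] - h[sequence[i]]) ** 2 for i in range(n - k))
        / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vec = pseudo_amino_acid_composition("MKTLLVAGAVLLSACSSG", lam=3)
    print(len(vec), round(sum(vec), 6))
```

The extra `lam` components are what "partially incorporate the sequence order effect": two sequences with identical residue counts but different orderings get different vectors.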

7.
A novel approach was developed for predicting the structural classes of proteins based on their sequences. It was assumed that proteins belonging to the same structural class must bear some sort of similar texture on the images generated by the cellular automaton evolving rule [Wolfram, S., 1984. Cellular automata as models of complexity. Nature 311, 419-424]. Based on this, two geometric invariant moment factors derived from the image functions were used as the pseudo amino acid components [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct., Funct., Genet. 43, 246-255 (Erratum: ibid., 2001, vol. 44, 60)] to formulate the protein samples for statistical prediction. The success rates thus obtained on a previously constructed benchmark dataset are quite promising, implying that the cellular automaton image can help to reveal some inherent and subtle features deeply hidden in a pile of long and complicated amino acid sequences.

8.
Apoptosis proteins are very important for understanding the mechanism of programmed cell death. The localization of an apoptosis protein can provide valuable information about its molecular function, but predicting that localization is a challenging task. In our previous work we proposed an increment of diversity (ID) method using protein sequence information for this prediction task. In this work, based on the concept of Chou's pseudo-amino acid composition [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. 43, 246-255 (Erratum: Chou, K.C., 2001, vol. 44, 60); Chou, K.C., 2005. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19], a different pseudo-amino acid composition that uses the hydropathy distribution information is introduced. A novel ID_SVM algorithm that combines ID with the support vector machine (SVM) is proposed. This method is applied to three data sets (317 apoptosis proteins, 225 apoptosis proteins and 98 apoptosis proteins). Higher predictive success rates than those of the previous algorithms are obtained by the jackknife tests.

9.
Membrane proteins play an important role in biochemical processes such as signal transduction and transmembrane transport. Membrane proteins are usually classified into five types [Chou, K.C., Elrod, D.W., 1999. Prediction of membrane protein types and subcellular locations. Proteins: Struct. Funct. Genet. 34, 137-153] or six types [Chou, K.C., Cai, Y.D., 2005. J. Chem. Inf. Modelling 45, 407-413]. Designing in silico methods to identify and classify membrane proteins can help us understand the structure and function of unknown proteins. This paper introduces an integrative approach, IAMPC, to classify membrane proteins based on protein sequences and protein profiles. Its modules extract the amino acid composition of the whole profiles, the amino acid composition of N-terminal and C-terminal profiles, the amino acid composition of profile segments and the dipeptide composition of the whole profiles. In the computational experiment, the overall accuracy of the proposed approach is comparable with that of the functional-domain-based method. In addition, the performance of the proposed approach is complementary to the functional-domain-based method for different membrane protein types.

10.
Facing the explosion of newly generated protein sequences in the post-genomic era, we are challenged to develop an automated method for annotating their subcellular locations quickly and reliably. Knowledge of the subcellular locations of proteins can provide useful hints for revealing their functions and understanding how they interact with each other in cellular networking. Unfortunately, it is both expensive and time-consuming to determine the localization of an uncharacterized protein in a living cell purely by experiment. To tackle the challenge, a novel hybridization classifier was developed by fusing many basic individual classifiers through a voting system. The "engine" of these basic classifiers was operated by the OET-KNN (Optimized Evidence-Theoretic K-Nearest Neighbor) rule. As a demonstration, predictions were performed with the fusion classifier for proteins among the following 16 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cyanelle, (5) cytoplasm, (6) cytoskeleton, (7) endoplasmic reticulum, (8) extracell, (9) Golgi apparatus, (10) lysosome, (11) mitochondria, (12) nucleus, (13) peroxisome, (14) plasma membrane, (15) plastid, and (16) vacuole. To get rid of redundancy and homology bias, none of the proteins investigated here had ≥25% sequence identity to any other in the same subcellular location. The overall success rates thus obtained via the jackknife cross-validation test and independent dataset test were 81.6% and 83.7%, respectively, which were 46-63% higher than those achieved by the other existing methods on the same benchmark datasets. 
Also, it is clearly elucidated that the overwhelmingly high success rates obtained by the fusion classifier are by no means a trivial utilization of the GO annotations, as is prone to be misinterpreted, because a huge number of proteins have accession numbers and corresponding GO numbers while their subcellular locations are still unknown, and because the percentage of proteins with GO annotations indicating their subcellular components is even lower than the percentage of proteins with known subcellular location annotation in the Swiss-Prot database. It is anticipated that the powerful fusion classifier may also become a very useful high-throughput tool for characterizing other attributes of proteins according to their sequences, such as enzyme class, membrane protein type, and nuclear receptor subfamily, among many others. A web server, called "Euk-OET-PLoc", has been made publicly available at http://202.120.37.186/bioinf/euk-oet for predicting the subcellular locations of eukaryotic proteins with the fusion OET-KNN classifier.

11.
Given an uncharacterized protein sequence, how can we identify whether it is a membrane protein or not? If it is, which membrane protein type does it belong to? These questions are important because they are closely relevant to the biological function of the query protein and to its interaction process with other molecules in a biological system. Particularly, with the avalanche of protein sequences generated in the post-genomic age and the much slower progress in using biochemical experiments to determine their functions, it is highly desirable to develop an automated method that can help address these questions. In this study, a 2-layer predictor, called MemType-2L, has been developed: the 1st-layer prediction engine identifies a query protein as membrane or non-membrane; if it is a membrane protein, the process automatically continues with the 2nd-layer prediction engine to further identify its type among the following eight categories: (1) type I, (2) type II, (3) type III, (4) type IV, (5) multipass, (6) lipid-chain-anchored, (7) GPI-anchored, and (8) peripheral. MemType-2L incorporates evolutionary information by representing the protein samples with Pse-PSSM (Pseudo Position-Specific Score Matrix) vectors, and contains an ensemble classifier formed by fusing many powerful individual OET-KNN (Optimized Evidence-Theoretic K-Nearest Neighbor) classifiers. The success rates obtained by MemType-2L on a newly constructed stringent dataset by both the jackknife test and the independent dataset test are quite high, indicating that MemType-2L may become a very useful high-throughput tool. As a Web server, MemType-2L is freely accessible to the public at http://chou.med.harvard.edu/bioinf/MemType.

12.
Fast and proper assessment of the structural rigidity of bio-macromolecular complexes as a measure of structural stability can be useful in systematic studies to predict molecular function, and can also enable the design of rapid scoring functions to rank automatically generated bio-molecular complexes. Based on the graph theoretical approach of Jacobs et al. [Jacobs DJ, Rader AJ, Kuhn LA, Thorpe MF (2001) Protein flexibility predictions using graph theory. Proteins: Struct Funct Genet 44:150–165] for expressing molecular flexibility, we propose a new scheme to analyze the structural stability of bio-molecular complexes. This analysis is performed by identifying, in the interacting subunits, clusters of floppy amino acids (those constituting regions of potential internal motion) that undergo an increase in rigidity upon complex formation. Gains in structural rigidity of the interacting subunits upon bio-molecular complex formation can be evaluated by expanding the network of intra-molecular inter-atomic interactions to include inter-molecular inter-atomic interaction terms. We propose two indices for quantifying this change: one local, which expresses localized (at the amino acid level) structural rigidity, and one global, which expresses the overall structural stability of the complex. The new system is validated with a series of protein complex structures reported in the Protein Data Bank. Finally, the indices are used as scoring coefficients to rank automatically generated protein complex decoys.

13.
We have combined three mutations previously shown to stabilize lambda repressor against thermal denaturation. Two of these mutations are in helix 3, where Gly-46 and Gly-48 have been replaced by alanines [Hecht, M. H., et al. (1986) Proteins: Struct., Funct., Genet. 1, 43-46]. The other mutation, which replaces Tyr-88 with cysteine, allows the protein to form an intersubunit disulfide bond [Sauer, R. T., et al. (1986) Biochemistry 25, 5992-5998]. Calorimetric measurements show that the two alanine substitutions stabilize repressor by about 8 °C, that the disulfide bond stabilizes repressor by about 8 °C, and that the triple mutant is 16 °C more stable than wild-type repressor.

14.
Membrane proteins are vitally important for many biological processes and have become an attractive target for both basic research and drug design. Knowledge of membrane protein types often provides useful clues in deducing the functions of uncharacterized membrane proteins. With the unprecedented increase in newly found protein sequences in the post-genomic era, there is a strong demand for an automated method that quickly and accurately identifies the types of membrane proteins from their amino acid sequences. Although quite a few identifiers have been developed in this regard through various approaches, such as the covariant discriminant (CD), support vector machine (SVM), artificial neural network (ANN), and K-nearest neighbor (KNN) classifiers, each of them performs the identification individually. As is well known, wise persons usually take into account the opinions of several experts rather than rely on only one when making critical decisions. Likewise, a sophisticated identifier should be trained by several different modes. In view of this, within the framework of the pseudo-amino acid composition, which can incorporate a considerable amount of sequence-order effects, a novel approach called "stacked generalization" or "stacking" has been introduced. Unlike the "bagging" and "boosting" approaches, which only combine classifiers of the same type, the stacking approach can combine several different types of classifiers through a meta-classifier to maximize the generalization accuracy. The results thus obtained were very encouraging. It is anticipated that the stacking approach may also hold a high potential to improve the identification quality for, among many other protein attributes, subcellular location, enzyme family class, protease type, and protein-protein interaction type. The stacked generalization classifier is available as a web server named "SG-MPt_Pred" at: http://202.120.37.186/bioinf/wangsq/service.htm.

15.
We have expanded our reference set of proteins used in the estimation of protein secondary structure by CD spectroscopy from 29 to 37 proteins by including 3 additional globular proteins with known X-ray structure and 5 denatured proteins. We have also modified the self-consistent method for analyzing protein CD spectra, SELCON3, by including a new selection criterion developed by W. C. Johnson, Jr. (Proteins Struct. Funct. Genet. 35, 307-312, 1999). The secondary structure corresponding to the denatured proteins was approximated to be 90% unordered, owing to the spectral similarity of the denatured proteins and unordered structures. We examined the thermal denaturation of ribonuclease T1 by CD using both the original and expanded sets of reference proteins and obtained more consistent results with the expanded set. The expanded set of reference proteins will be helpful for the determination of protein secondary structure from protein CD spectra with higher reliability, especially of proteins with significant unordered structure content and/or in the course of denaturation.

16.
Hemoglobin Ypsilanti (HbY) is a stable tetrameric hemoglobin that binds oxygen with little or no cooperativity and with high affinity [Doyle, M. L., et al. (1992) Proteins: Struct., Funct., Genet. 14, 351-362]. It displays an especially large quaternary enhancement effect. An X-ray crystallographic study [Smith, F. R., et al. (1991) Proteins: Struct., Funct., Genet. 10, 81-91] of the carboxy derivative of this hemoglobin (COHbY) revealed a new quaternary structure that partially resembles the recently described R2 structure [Silva, M. M., et al. (1992) J. Biol. Chem. 267, 17248-17256]. Very little is known about either the solution phase conformations of the liganded and deoxy forms of HbY or the molecular basis for the large quaternary enhancement effect (Doyle et al., 1992). In this study, near-IR absorption, Soret-enhanced Raman, and UV (229 nm) resonance Raman spectroscopies are used to probe the liganded and deoxy derivatives of HbY in solution. Nanosecond time-resolved near-IR absorption measurements are used to expose the relaxation properties of the photoproduct of COHbY. Time-resolved (Soret band) absorption is used to generate the geminate and solvent phase ligand rebinding curves for photodissociated COHbY. The spectroscopic results indicate that COHbY has an R-like conformation with respect to both the proximal heme pocket and the hinge region of the α1β2 interface. The deoxy derivative of HbY has spectroscopic features that are very similar to those observed for species assigned to the deoxy R or half-liganded R conformations of human adult hemoglobin (HbA). The 10 ns to 100 μs relaxation properties of the photoproduct of COHbY are distinctly different from those of HbA in that for HbY, little if any tertiary or quaternary relaxation is observed. The near-absence of relaxation in the HbY photoproduct explains the differences in the geminate and solvent phase CO recombination between HbA and HbY. 
The impact of the conformational and relaxation properties of HbY on the geminate rebinding process forms the basis of a model that accounts for the large quaternary enhancement effect reported for HbY (Doyle et al., 1992). In addition, the spectroscopic data and the X-ray crystallographic results explain the slow relaxation for HbY and the near-absence of cooperative ligand binding for this protein based on the behavior of the penultimate tyrosines.

17.
Three independent three-dimensional reconstructions of the spinach photosystem II-light-harvesting complex supercomplex were derived from single particle analyses of non-stained, vitrified samples imaged by electron microscopy. Each reconstruction was found to differ significantly in the composition of the lumenal oxygen-evolving complex extrinsic proteins. From difference mapping, aided by electron microscopy of negatively stained, selectively washed samples, regions of density were assigned to the PsbO and PsbP/PsbQ proteins. Interpretation of the density assigned to the PsbO protein was explored using computer-aided structural predictions. PsbO is calculated to be mainly a beta-protein (38% beta) composed of two domains within an overall elongated shape (Pazos, F., Heredia, P., Valencia, A., and De Las Rivas, J. (2001) Proteins Struct. Funct. Genet. 45, 372-381). The positioning and fitting of the proposed structural model for the PsbO protein within the three-dimensional map indicated that there is a single copy per reaction center. Moreover, the structural model derived for PsbO, together with difference mapping, indicates that this protein stretches across the surface of the reaction center with its N- and C-terminal domains located toward the CP47 and CP43 side, respectively. This structural assignment is discussed in terms of the recent X-ray-derived cyanobacterial model of PSII (Zouni, A., Witt, H.-T., Kern, J., Fromme, P., Krauss, N., Saenger, W., and Orth, P. (2001) Nature 409, 739-743).

18.
Shen HB, Chou KC. Amino Acids, 2007, 32(4): 483-488
Predicting membrane protein type is both an important and challenging topic in current molecular and cellular biology, because knowledge of membrane protein type often provides useful clues for determining, or sheds light upon, the function of an uncharacterized membrane protein. With the explosion of newly found protein sequences in the post-genomic era, there is a great demand for a computational method that quickly and reliably identifies the types of membrane proteins from their primary sequences. In this paper, a novel classifier, the so-called "ensemble classifier", was introduced. It is formed by fusing a set of nearest neighbor (NN) classifiers, each of which is defined in a different pseudo amino acid composition space. The type of a query protein is determined by the outcome of voting among these constituent individual classifiers. It was demonstrated through the self-consistency test, jackknife test, and independent dataset test that the ensemble classifier outperformed other existing classifiers widely used in the biological literature. It is anticipated that the idea of the ensemble classifier can also be used to improve the prediction quality in classifying other attributes of proteins according to their sequences.
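The voting scheme described in this abstract can be sketched as below: one nearest-neighbour classifier per feature space, with the majority label taken as the ensemble prediction. The plain (unweighted) majority vote, the tie-breaking behaviour, the function names, and the toy feature spaces are assumptions of this sketch; the abstract does not specify them.

```python
import math
from collections import Counter


def nn_predict(query, train_x, train_y):
    """Return the label of the training sample closest to `query`."""
    idx = min(range(len(train_x)), key=lambda j: math.dist(query, train_x[j]))
    return train_y[idx]


def ensemble_predict(query_views, train_views, train_y):
    """Majority vote over one NN classifier per feature space ("view").

    `query_views[v]` is the query protein represented in view v, and
    `train_views[v]` holds all training proteins represented in that view.
    On a tie, the label that was voted for first wins.
    """
    votes = [
        nn_predict(query, train_x, train_y)
        for query, train_x in zip(query_views, train_views)
    ]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    labels = ["type I", "type II"]
    # Three hypothetical feature spaces of different dimension.
    train_views = [[(0.0,), (1.0,)], [(0.0, 0.0), (1.0, 1.0)], [(0.0,), (1.0,)]]
    query_views = [(0.1,), (0.2, 0.1), (0.9,)]
    print(ensemble_predict(query_views, train_views, labels))  # type I
```

In the setting of the abstract, each view would correspond to a pseudo amino acid composition computed with a different number of sequence-order components, so that classifiers sensitive to different correlation ranges can outvote an individual mistake.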

19.
Konermann L. Proteins, 2006, 65(1): 153-163
It should take an astronomical time span for unfolded protein chains to find their native state based on an unguided conformational random search. The experimental observation that folding is fast can be rationalized by assuming that protein energy landscapes are sloped towards the native state minimum, such that rapid folding can proceed from virtually any point in conformational space. Folding transitions often exhibit two-state behavior, involving extensively disordered and highly structured conformers as the only two observable kinetic species. This study employs a simple Brownian dynamics model of "protein particles" moving in a spherically symmetrical potential. As expected, the presence of an overall slope towards the native state minimum is an effective means to speed up folding. However, the two-state nature of the transition is eradicated if a significant energetic bias extends too far into the non-native conformational space. The breakdown of two-state cooperativity under these conditions is caused by a continuous conformational drift of the unfolded proteins. Ideal two-state behavior can only be maintained on surfaces exhibiting large regions that are energetically flat, a result that is supported by other recent data in the literature (Kaya and Chan, Proteins: Struct Funct Genet 2003;52:510-523). Rapid two-state folding requires energy landscapes exhibiting the following features: (i) A large region in conformational space that is energetically flat, thus allowing for a significant degree of random sampling, such that unfolded proteins can retain a random coil structure; (ii) a trapping area that is strongly sloped towards the native state minimum.

20.
The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For an in-depth understanding of the biochemical processes of the nucleus, knowledge of the localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desirable to develop an automated method for quickly annotating the subnuclear locations of the numerous newly found nuclear protein sequences, so that they can be used in a timely manner for basic research and drug discovery. In view of this, a novel approach is developed for predicting protein subnuclear location. It features a powerful classifier, the optimized evidence-theoretic K-nearest neighbor (OET-KNN) classifier, and uses the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and the jackknife cross-validation test are significantly higher than those of existing classifiers on the same working dataset. It is anticipated that the powerful approach may also become a useful high-throughput vehicle to bridge the huge gap occurring in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号