Similar Articles
 20 similar articles found.
1.
Assigning subcellular localization (SL) to proteins is one of the major tasks of functional proteomics. Despite the impressive technical advances of the past decades, experimentally determining SL on a high-throughput scale is still time-consuming and laborious. Computational prediction is therefore the preferred method for large-scale assignment of protein SL, followed up by experimental studies where appropriate. In this report, using a machine learning approach, the Nearest Neighbor Algorithm (NNA), we developed a prediction system for protein SL that incorporates a protein functional domain profile. The overall accuracy achieved by this system is 93.96%. Furthermore, comparisons with other methods were conducted to demonstrate the validity and efficiency of our prediction system. We also provide an implementation of our Subcellular Location Prediction System (SLPS), which is available at http://pcal.biosino.org.
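
The nearest-neighbour step can be sketched as follows. This is a minimal illustration under stated assumptions, not the SLPS implementation: each protein is assumed to be a binary functional-domain profile (1 where the protein matches a domain), the distance is assumed to be one minus the cosine similarity, and the function names and location labels are hypothetical.

```python
import math


def cosine_distance(x, y):
    """1 - cos(x, y); a profile with no domain hits is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 1.0
    return 1.0 - dot / (nx * ny)


def nearest_neighbor_predict(query, training):
    """training is a list of (profile, label); return the label of the closest profile."""
    best_label, best_dist = None, float("inf")
    for profile, label in training:
        d = cosine_distance(query, profile)
        if d < best_dist:  # strict '<' keeps the first profile on ties
            best_dist, best_label = d, label
    return best_label


training = [([1, 1, 0, 0], "nucleus"), ([0, 0, 1, 1], "cytoplasm")]
print(nearest_neighbor_predict([1, 1, 0, 1], training))
```

Cosine distance ignores how many domains a protein has in total and focuses on which ones it shares, which suits sparse binary profiles.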

2.
MOTIVATION: A key goal of genomics is to assign function to genes, especially orphan sequences. RESULTS: We compared each protein sequence to the clustered functional domains in the SBASE database using BLASTP. Each protein is thus represented as a vector in which every non-zero entry indicates a significant match between the sequence of interest and an SBASE domain. Two machine learning methods, the nearest neighbour algorithm (NNA) and support vector machines (SVMs), are used to predict protein functional classes from this information. The best results were obtained with the SBASE-A database and the NNA: 72% accuracy at 79% coverage. We also tested assigning function by searching for InterPro sequence motifs and by taking the most significant BLAST match within the dataset. We applied the functional domain composition method to predict the functional class of 2018 currently unclassified yeast open reading frames. AVAILABILITY: A program implementing the NNA-based method, called Functional Class Prediction based on Functional Domains (FCPFD), can be obtained by contacting Y.D. Cai at y.cai@umist.ac.uk.
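
The vector representation and the accuracy/coverage figures can be sketched as below. Assumptions: BLASTP hits arrive as (domain ID, E-value) pairs, a hit counts as significant below a hypothetical E-value cutoff of 1e-3, coverage is the fraction of proteins that receive a prediction, and accuracy is measured only over those proteins. None of these names or thresholds come from the FCPFD program.

```python
def domain_vector(hits, domain_ids, evalue_cutoff=1e-3):
    """Binary vector over domain_ids; 1 where a hit passes the (hypothetical) cutoff."""
    index = {d: i for i, d in enumerate(domain_ids)}
    vec = [0] * len(domain_ids)
    for domain, evalue in hits:
        if domain in index and evalue <= evalue_cutoff:
            vec[index[domain]] = 1
    return vec


def accuracy_and_coverage(predictions, truths):
    """predictions may contain None where the method abstains (e.g. no domain hits)."""
    made = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    coverage = len(made) / len(truths)
    accuracy = sum(p == t for p, t in made) / len(made) if made else 0.0
    return accuracy, coverage


print(domain_vector([("SB1", 1e-10), ("SB3", 0.5)], ["SB1", "SB2", "SB3"]))
```

Abstaining on proteins with an all-zero vector is one way a method ends up reporting accuracy "for" a coverage below 100%.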

3.
4.
The functional domain composition is introduced to predict the structural class of a protein or domain according to the following classification: all-alpha, all-beta, alpha/beta, alpha+beta, mu (multi-domain), sigma (small protein), and rho (peptide). The advantage of doing so is that both sequence-order-related features and function-related features are naturally incorporated in the predictor. As a demonstration, the jackknife cross-validation test was performed on a dataset of proteins and domains sharing less than 20% sequence identity with each other, to eliminate homology bias. The overall success rate thus obtained was 98%. In contrast, the corresponding rates obtained by simple geometric approaches based on amino acid composition were only 36-39%. This indicates that using the functional domain composition to represent a protein sample for statistical prediction is very promising, and that the functional type of a domain is closely correlated with its structural class.
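
The jackknife (leave-one-out) test used here and in several abstracts below can be sketched as follows: each sample is predicted by a model built from all the other samples, and the success rate is the fraction predicted correctly. The Euclidean nearest-neighbour classifier is only a stand-in assumed for illustration; any predictor can be plugged into the same loop.

```python
import math


def nn_label(query, training):
    """Stand-in classifier: label of the Euclidean-nearest training sample."""
    return min(training, key=lambda item: math.dist(query, item[0]))[1]


def jackknife_success_rate(samples):
    """samples: list of (vector, label). Each one is left out in turn and predicted."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # training set without sample i
        if nn_label(vec, rest) == label:
            correct += 1
    return correct / len(samples)


samples = [([0, 0], "a"), ([0, 1], "a"), ([5, 5], "b"), ([5, 6], "b")]
print(jackknife_success_rate(samples))
```

Because no sample ever appears in its own training set, the jackknife rate is a stricter estimate than the self-consistency (resubstitution) rate.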

5.

Background  

The number and arrangement of the subunits that form a protein are referred to as its quaternary structure. Quaternary structure is an important protein attribute that is closely related to function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. It is therefore highly desirable to develop computational methods that automatically classify the quaternary structure of proteins from their sequences.

6.
As part of a continuing effort to use sequence-based approaches to identify enzymatic function at a deeper level, we extend our investigations from the main enzyme classes (Protein Sci. 2004, 13, 2857-2863) to their subclasses. This is indispensable for understanding the molecular mechanism of an enzyme at a deeper level. For each of the 6 main enzyme classes (i.e., oxidoreductase, transferase, hydrolase, lyase, isomerase, and ligase), a subclass training dataset is constructed. To reduce homology bias, a stringent cutoff was imposed so that all entries in the datasets share less than 40% sequence identity with each other. To capture the core features intimately related to biological function, each protein sample is represented by hybridizing the functional domain composition and the pseudo amino acid composition. On the basis of this hybrid representation, the FunD-PseAA predictor is established. Jackknife cross-validation tests show that the overall success rate in identifying the 21 subclasses of oxidoreductases is above 86%, and the corresponding rates in identifying the subclasses of the other 5 main enzyme classes are 94-97%. The high success rates imply that the FunD-PseAA predictor may become a useful tool in bioinformatics and proteomics of the post-genomic era.
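
The pseudo amino acid (PseAA) composition can be sketched as a 20-component amino acid composition extended by λ sequence-order correlation factors θ_j, each the mean squared difference of a standardized property between residues j positions apart; component u is f_u / (Σf + wΣθ) for the 20 amino acids and wθ_j / (Σf + wΣθ) for the correlation terms. Assumptions for this sketch: a single property (Kyte-Doolittle hydrophobicity) stands in for the several properties of the published formulation, and λ = 3 and weight w = 0.05 are illustrative defaults.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydrophobicity, used here as a single illustrative property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property to zero mean and unit standard deviation over the 20 amino acids."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(seq, lam=3, w=0.05, prop=KYTE_DOOLITTLE):
    """Return the (20 + lam)-dimensional pseudo amino acid composition of seq."""
    seq = seq.upper()
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("only the 20 standard amino acids are supported")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(prop)
    # Normalized occurrence frequencies f_1..f_20 (they sum to 1).
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    # Correlation factors theta_j: mean squared property difference at lag j.
    thetas = []
    for j in range(1, lam + 1):
        diffs = [(h[seq[i + j]] - h[seq[i]]) ** 2 for i in range(len(seq) - j)]
        thetas.append(sum(diffs) / len(diffs))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


print(len(pseaac("ACDEFGHIKLMNPQRSTVWY")))
```

All components sum to 1, so proteins of different lengths are directly comparable.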

7.
Given the sequence of a protein, how can we predict whether it is a membrane protein or a non-membrane protein? If it is a membrane protein, which type does it belong to? Since these questions are closely related to the function of an uncharacterized protein, their importance is self-evident. In particular, with the explosion of protein sequences entering databanks and the much slower progress in determining their functions by biochemical experiments, it is highly desirable to develop an automated method that can answer these questions quickly. By hybridizing the functional domain (FunD) composition and the pseudo-amino acid composition (PseAA), a new strategy called the FunD-PseAA predictor was introduced. To test the power of the predictor, a highly non-homologous data set was constructed in which no protein has 25% or greater sequence identity to any other. The overall success rates obtained with the FunD-PseAA predictor on this data set by the jackknife cross-validation test were 85% in distinguishing membrane from non-membrane proteins, and 91% in identifying the membrane protein type among the following 5 categories: (1) type-1 membrane protein, (2) type-2 membrane protein, (3) multipass transmembrane protein, (4) lipid chain-anchored membrane protein, and (5) GPI-anchored membrane protein. These rates are much higher than those obtained by other methods on the same stringent data set, indicating that the FunD-PseAA predictor may become a useful high-throughput tool in bioinformatics and proteomics.
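
One simple way to hybridize the two representations is sketched below; it is an assumption for illustration, not necessarily the published FunD-PseAA rule. A protein with at least one functional-domain hit is represented by its FunD vector, otherwise by its PseAA vector, and the nearest neighbour is searched only among training proteins represented in the same space. Function names and labels are hypothetical.

```python
import math


def hybrid_representation(fund_vec, pseaa_vec):
    """Hypothetical fallback rule: FunD if any domain hit, else PseAA."""
    if any(x != 0 for x in fund_vec):
        return ("FunD", tuple(fund_vec))
    return ("PseAA", tuple(pseaa_vec))


def predict(query, training):
    """query: (space, vec); training: list of ((space, vec), label).

    Euclidean nearest neighbour, restricted to training samples in the same space.
    Returns None if no training sample shares the query's representation.
    """
    space, vec = query
    same = [(v, label) for (s, v), label in training if s == space]
    if not same:
        return None
    return min(same, key=lambda item: math.dist(vec, item[0]))[1]


training = [
    (("FunD", (1, 0, 0)), "type-1"),
    (("FunD", (0, 1, 0)), "multipass"),
    (("PseAA", (0.9, 0.1)), "GPI-anchored"),
]
print(predict(hybrid_representation([0, 1, 0], [0.5, 0.5]), training))
```

Keeping the two spaces separate avoids comparing a sparse binary domain vector with a dense composition vector under one distance.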

8.
According to their main EC (Enzyme Commission) numbers, enzymes are classified into the following 6 main classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. A new method has been developed to predict the enzymatic attribute of proteins by using the functional domain composition to formulate a given protein sequence. The advantage of doing so is that both sequence-order-related features and function-related features are naturally incorporated in the predictor. As a demonstration, the jackknife cross-validation test was performed on a dataset of proteins sharing less than 20% sequence identity with each other, to eliminate homology bias. The overall success rate thus obtained was 85% in identifying the enzyme family classes (including the identification of non-enzyme protein sequences). This success rate is significantly higher than those obtained by other methods on such a stringent dataset. This indicates that using the functional domain composition to represent protein samples for statistical prediction is indeed very promising, and will become a powerful tool in bioinformatics and proteomics.
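
Building the class labels for such a dataset amounts to reading the first field of each EC number. The sketch below assumes EC numbers are dotted strings and that proteins without an EC number form the extra non-enzyme class; the function name is hypothetical. EC class 7 (translocases) was added to the nomenclature later and falls outside this six-class scheme, so it is rejected here.

```python
EC_MAIN_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


def main_class_label(ec_number):
    """Map an EC number such as '3.4.21.4' to its main-class label.

    None (no EC number) is labeled 'nonenzyme', giving 6 enzyme classes
    plus a non-enzyme class.
    """
    if ec_number is None:
        return "nonenzyme"
    first = int(ec_number.split(".")[0])
    if first not in EC_MAIN_CLASSES:
        raise ValueError(f"EC class {first} is outside the six main classes")
    return EC_MAIN_CLASSES[first]


print(main_class_label("3.4.21.4"))
```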

9.
Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Since the functions of these proteins are closely correlated with their subcellular localizations, many efforts have been made to develop methods for predicting protein subcellular location. In this study, based on the strategy of hybridizing the functional domain composition and the pseudo-amino acid composition (Cai and Chou [2003]: Biochem. Biophys. Res. Commun. 305:407-411), the Intimate Sorting Algorithm (ISort predictor) was developed for predicting protein subcellular location. As a demonstration, the plant and non-plant protein datasets studied by previous investigators were used. The overall success rate by the jackknife test was 85.4% for the plant protein dataset and 91.9% for the non-plant protein dataset. These are so far the highest success rates achieved for the two datasets under a rigorous cross-validation procedure, further confirming that such a hybrid approach may become a very useful high-throughput tool in bioinformatics, proteomics, and molecular cell biology.

10.
Intramembrane proteolysis is now firmly established as a prominent biological process, and structure elucidation is emerging as the new frontier in the understanding of these novel membrane-embedded enzymes. Reproducing this unusual hydrolysis within otherwise water-excluding transmembrane regions with purified proteins is a challenging prerequisite for such structural studies. Here we show the bacterial expression, purification, and reconstitution of proteolytically active signal peptide peptidase (SPP), a membrane-embedded enzyme in the presenilin family of aspartyl proteases. This finding formally proves that, unlike presenilin, SPP does not require any additional proteins for proteolysis. Surprisingly, the conserved C-terminal half of SPP is sufficient for proteolytic activity; purification and reconstitution of this engineered fragment of several SPP orthologues revealed that this region defines a functional domain for an intramembrane aspartyl protease. The discovery of minimal requirements for intramembrane proteolysis should facilitate mechanistic and structural analysis and help define general biochemical principles of hydrolysis in a hydrophobic environment.

11.
A new method based on the analysis of the oligopeptide composition of amino acid sequences from different protein families is presented. We assume that any protein family can be characterized by a set of oligopeptides (an oligopeptide vocabulary). We demonstrate that comparing oligopeptide vocabularies can distinguish different families from each other and from random sequences. Notably, this comparison can be performed successfully with a set of only 25 dipeptides and without prior alignment. We demonstrate that characteristic peptides are localized in regions of functional significance, as shown for the GTP-binding domain of translation elongation factors. We suggest how this method can be used to localize the boundaries of functional domains in amino acid sequences. Using several functional domains as examples, we show that the average prediction error does not exceed 3-4 amino acid residues.
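
An alignment-free vocabulary comparison can be sketched as follows. Assumptions, none taken from the paper: a family's vocabulary is its 25 most frequent overlapping dipeptides, and two vocabularies are compared by Jaccard overlap; the paper's own selection and comparison statistics may differ.

```python
from collections import Counter


def dipeptide_counts(seq):
    """Count overlapping dipeptides, e.g. 'AAC' contains 'AA' once and 'AC' once."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def family_vocabulary(seqs, size=25):
    """Hypothetical vocabulary: the `size` most frequent dipeptides across a family."""
    total = Counter()
    for s in seqs:
        total.update(dipeptide_counts(s))
    return {dp for dp, _ in total.most_common(size)}


def jaccard(vocab_a, vocab_b):
    """Overlap of two vocabularies (assumed comparison measure)."""
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 1.0


print(jaccard({"AA", "AC"}, {"AC", "CD"}))
```

No alignment is needed at any step: each sequence contributes only its dipeptide counts.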

12.
13.
In this paper, based on the approach of combining the "functional domain composition" [K.C. Chou, Y.D. Cai, J. Biol. Chem. 277 (2002) 45765] and the pseudo-amino acid composition [K.C. Chou, Proteins Struct. Funct. Genet. 43 (2001) 246; Correction Proteins Struct. Funct. Genet. 44 (2001) 60], the Nearest Neighbour Algorithm (NNA) was developed for predicting protein subcellular location. Very high success rates were observed, suggesting that such a hybrid approach may become a useful high-throughput tool in bioinformatics and proteomics.

14.
Prediction of protein (domain) structural classes based on amino-acid index.
A protein (domain) is usually classified into one of the following four structural classes: all-alpha, all-beta, alpha/beta and alpha + beta. In this paper, a new formulation is proposed to predict the structural class of a protein (domain) from its primary sequence. Instead of the amino-acid composition widely used in previous structural class prediction work, auto-correlation functions based on the profile of an amino-acid index along the primary sequence of the query protein (domain) are used for structural class prediction. As a result, the overall predictive accuracy is remarkably improved. For the same training database of 359 proteins (domains) and the same component-coupled algorithm [Chou, K.C. & Maggiora, G.M. (1998) Protein Eng. 11, 523-538], the overall predictive accuracy of the new method in the jackknife test is 5-7% higher than the accuracy based only on amino-acid composition. The overall predictive accuracy finally obtained in the jackknife test is as high as 90.5%, implying that a significant improvement has been achieved by making full use of the information contained in the primary sequence for class prediction. This improvement depends on the size of the training database, the auto-correlation functions selected and the amino-acid index used. We found that the amino-acid index proposed by Oobatake and Ooi, i.e. the average nonbonded energy per residue, gives the best predictive results for the database sets studied in this paper. This study may be considered an alternative step towards making structural class prediction more practical.
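
The auto-correlation features can be sketched with the standard lag-n form, assumed here: map each residue to its index value h_i, then r_n = (1 / (L - n)) Σ_{i=1}^{L-n} h_i h_{i+n} for n = 1..max_lag. The Oobatake-Ooi values are not reproduced; the two-letter toy index in the usage line is hypothetical, and in practice the resulting vector would be fed to a classifier such as the component-coupled algorithm.

```python
def autocorrelation(seq, index, max_lag):
    """Return [r_1, ..., r_max_lag] for the index profile of seq.

    r_n = (1 / (L - n)) * sum_{i=1}^{L-n} h_i * h_{i+n}
    """
    h = [index[a] for a in seq]
    L = len(h)
    if not 0 < max_lag < L:
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    return [sum(h[i] * h[i + n] for i in range(L - n)) / (L - n)
            for n in range(1, max_lag + 1)]


toy_index = {"A": 1.0, "G": -1.0}  # hypothetical index values
print(autocorrelation("AGAG", toy_index, 2))
```

An alternating profile gives a negative lag-1 and a positive lag-2 correlation, which is the kind of periodicity that plain amino-acid composition cannot see.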

15.
This study provides evidence for the organization of a bacterial cytoplasmic membrane into functional domains. The cytoplasmic membrane of Rhodospirillum rubrum was fractionated on the basis of affinity for the β-lactam agent 6-aminopenicillanic acid. Cytoplasmic membrane components were found to be differentially localized in membrane subpopulations, implying that these membranes arose from distinct domains in the laterally differentiated cytoplasmic membrane.

16.
Lantibiotics are ribosomally synthesized and post-translationally modified peptide antibiotics that contain unusual amino acids such as dehydro and lanthionine residues. Nukacin ISK-1 is a class II lantibiotic whose precursor peptide (NukA) is modified by NukM to form modified NukA. The ATP-binding cassette (ABC) transporter NukT is predicted to cleave off the N-terminal leader peptide of modified NukA and secrete the mature peptide. Multiple sequence alignments revealed that NukT has an N-terminal peptidase domain (PEP) and a C-terminal ATP-binding domain (ABD). Previously, in vitro reconstitution of NukT revealed that NukT peptidase activity depends on ATP hydrolysis. Here, we constructed a series of NukT mutants and investigated their transport activity in vivo and peptidase activity in vitro. Most mutations of conserved residues in PEP or ABD resulted in failure of nukacin ISK-1 production and accumulation of modified NukA inside the cells. NukT(N106D) was the only mutant capable of producing nukacin ISK-1; the residue corresponding to Asn(106) is conserved as Asp in other related ABC transporters. Additionally, an in vitro peptidase assay of the NukT mutants demonstrated that PEP is on the cytosolic side, and that none of the ABD mutants, and none of the PEP mutants except NukT(N106D), had peptidase activity in vitro. Taken together, these observations suggest that the leader peptide is cleaved off inside the cells before peptide secretion, that both PEP and ABD are important for NukT peptidase activity, and that cooperation between these two domains inside the cells is indispensable for proper functioning of NukT.

17.
rMuc3 is a typical transmembrane mucin that contains, in its C-terminal domain, a 174-amino-acid domain called an SEA module, which is cleaved in eukaryotic cells. However, the mechanism by which the rMuc3 SEA module is proteolyzed, and its biological significance, remain to be elucidated. In this study, we showed that the rMuc3 C-terminal domain was cleaved at the LSKGSIVV motif within the SEA module in prokaryotic cells, and time-dependent cleavage was observed in the purified rMuc3 C-terminal domain carrying a mutated LSKASIVV motif expressed in bacteria. Thus, cleavage of the rMuc3 SEA module depends on autoproteolysis. Autoproteolysis of the SEA module of the rMuc3 C-terminal domain played a critical role in the migration and invasion in vitro of LoVo human colon cancer cells expressing the rMuc3 C-terminal domain. The rMuc3 C-terminal domain induced significant activation of the phosphorylated form of HER/ErbB2 (pY1248) in LoVo cells. Inhibiting this phosphorylation with gefitinib (ZD1839) attenuated migration and invasion of LoVo cells expressing the rMuc3 C-terminal domain. Thus, the rMuc3 C-terminal domain undergoes autoproteolysis at its SEA module, which maintains its availability to potentiate the signaling process modulated by HER/ErbB2 phosphorylation, promoting the migration and invasion of LoVo cells.

18.
Cai YD, Zhou GP, Chou KC. Biophysical Journal, 2003, 84(5): 3257-3263.
Membrane proteins are generally classified into the following five types: (1) type I membrane proteins; (2) type II membrane proteins; (3) multipass transmembrane proteins; (4) lipid chain-anchored membrane proteins; and (5) GPI-anchored membrane proteins. In this article, based on the concept of using the functional domain composition to define a protein, a Support Vector Machine algorithm is developed for predicting the membrane protein type. High success rates are obtained by both the self-consistency and jackknife tests. The current approach, complemented by the powerful covariant discriminant algorithm based on the pseudo-amino acid composition, which incorporates the quasi-sequence-order effect, as recently proposed by K. C. Chou (2001), may become a very useful high-throughput tool in bioinformatics and proteomics.

19.
We have created a database of 452 two-domain proteins with less than 25% homology. Using one half of this set, we obtained statistics on the occurrence of amino acid residues at the domain boundaries of multi-domain proteins. Small and hydrophilic amino acids (proline, glycine, asparagine, glutamic acid, arginine and others) appear at domain boundaries more often than in the protein as a whole. Conversely, hydrophobic amino acid residues (tryptophan, methionine, phenylalanine and others) appear at domain boundaries more rarely. The occurrence scales obtained from these statistics were used to calculate domain boundaries in the proteins of the second half of the database. The probability scale obtained by averaging residue occurrence over a domain boundary region of 8 residues (±4 residues from the real domain boundary) gives the best result: for 57% of proteins the predicted boundary was within 40 residues of the boundary assigned from three-dimensional structures, and for 41% it was within 20 residues of the real boundary. The probability scale was used to predict domain boundaries for proteins of unknown three-dimensional structure in the CASP6 international competition.
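
The windowed scale can be sketched as follows: every candidate boundary position b (between residues b-1 and b) is scored by the mean scale value over the 8 residues from b-4 to b+3, and the highest-scoring position is predicted as the boundary. The scale values used in the usage line are hypothetical placeholders, not the published statistics, and residues missing from the scale are assumed to contribute 0.

```python
def boundary_scores(seq, scale, half_window=4):
    """Map each candidate boundary b to the mean scale value over seq[b-4:b+4].

    Residues missing from `scale` contribute 0.0 (an assumption of this sketch).
    """
    scores = {}
    for b in range(half_window, len(seq) - half_window + 1):
        window = seq[b - half_window:b + half_window]
        scores[b] = sum(scale.get(a, 0.0) for a in window) / len(window)
    return scores


def predict_boundary(seq, scale, half_window=4):
    """Return the candidate position with the highest window-averaged score."""
    scores = boundary_scores(seq, scale, half_window)
    return max(scores, key=scores.get)


toy_scale = {"G": 2.0, "L": -1.0}  # hypothetical: G favoured, L disfavoured at boundaries
print(predict_boundary("LLLL" + "G" * 8 + "LLLL", toy_scale))
```

Averaging over a window smooths out single residues, so one stray glycine inside a hydrophobic core does not produce a false boundary.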

20.

Background  

A metabolic pathway is a highly regulated network of metabolic reactions involving substrates, enzymes, and products, in which substrates are transformed into products by particular catalytic enzymes. Since experimentally determining the network of substrate-enzyme-product triads (i.e., whether a substrate can be transformed into a product by a given enzyme) is both time-consuming and expensive, it would be very useful to develop a computational approach for predicting this network.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417