Similar literature
20 similar articles found
1.
2.
Prediction of neurotoxins based on their function and source   Cited by: 1 (self-citations: 0, other citations: 1)
Saha S  Raghava GP 《In silico biology》2007,7(4-5):369-387
We have developed a method, NTXpred, for predicting neurotoxins and classifying them based on their function and origin. The dataset used in this study consists of 582 non-redundant, experimentally annotated neurotoxins obtained from Swiss-Prot. A number of modules have been developed for predicting neurotoxins using residue composition, based on a feed-forward neural network (FNN), a recurrent neural network (RNN) and a support vector machine (SVM), which achieved maximum accuracies of 84.19%, 92.75% and 97.72%, respectively. In addition, SVM modules have been developed for classifying neurotoxins based on their source (e.g., eubacteria, cnidarians, molluscs, arthropods and chordates) using amino acid composition and dipeptide composition, and achieved maximum overall accuracies of 78.94% and 88.07%, respectively. The overall accuracy increased to 92.10% when the evolutionary information obtained from PSI-BLAST was combined with the SVM module for source classification. We have also developed SVM modules for classifying neurotoxins based on function using amino acid and dipeptide composition, and achieved overall accuracies of 83.11% and 91.10%, respectively. The overall accuracy of function classification improved to 95.11% when PSI-BLAST output was combined with the SVM module. All the modules developed in this study were evaluated using a five-fold cross-validation technique. NTXpred is available at www.imtech.res.in/raghava/ntxpred/ and at a mirror site, http://bioinformatics.uams.edu/mirror/ntxpred.
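Amino acid composition and dipeptide composition recur as input features throughout these abstracts. The following is a minimal sketch of how such feature vectors are commonly computed, not the authors' implementation: the function names, the percentage scaling, and the choice to ignore non-standard residues are all assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def amino_acid_composition(seq):
    """Return a 20-dimensional vector: percentage of each standard residue."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [100.0 * residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector: percentage of each adjacent residue pair."""
    seq = seq.upper()
    keys = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [100.0 * counts[k] / total for k in keys]
```

Either vector can then be passed to a classifier; the fixed length (20 or 400) is what lets proteins of different lengths share one feature space.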

3.
Chen YL  Li QZ  Zhang LQ 《Amino acids》2012,42(4):1309-1316
Due to the complexity of the Plasmodium falciparum (PF) genome, predicting mitochondrial proteins of PF is more difficult than for other species. In this study, using the n-peptide composition of a reduced amino acid alphabet (RAAA) obtained from the structural alphabet named Protein Blocks as the feature parameter, the increment of diversity (ID) is developed for the first time to predict mitochondrial proteins. By choosing the 1-peptide compositions of the 20-residue N-terminal regions as the only input vector, the prediction achieves 86.86% accuracy with a Matthews correlation coefficient (MCC) of 0.69 in the jackknife test. Moreover, by combining the hydropathy distribution along the protein sequence with several reduced amino acid alphabets, we achieved a maximum MCC of 0.82 with 92% accuracy in the jackknife test using the developed ID model. When evaluated on an independent dataset, our method performs better than existing methods. The results indicate that the ID is a simple and efficient prediction method for mitochondrial proteins of the malaria parasite.
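Several abstracts in this list report the Matthews correlation coefficient (MCC) alongside accuracy. As a reminder of what that number measures, here is a small sketch of the standard binary MCC formula computed from confusion-matrix counts. The function name `mcc` and the convention of returning 0.0 when the denominator vanishes are assumptions for illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), so it is a plain coefficient rather than a percentage.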

4.
Most of the prediction methods for secretory proteins require the presence of a correct N-terminal end of the preprotein for correct classification. As large-scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many proteins are encoded without the correct N-terminus, leading to incorrect prediction. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (also known as classical and non-classical secreted proteins, respectively), using two machine-learning techniques: artificial neural network (ANN) and support vector machine (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using a five-fold cross-validation technique. First, ANN-based modules were developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition, and achieved accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid composition and dipeptide composition achieved accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed for predicting secretory proteins based on similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach by integrating the amino acid and dipeptide composition based SVM modules with the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than the individual modules. We also achieved a sensitivity of 60.4% at a low rate of 5% false positive predictions using the hybrid module. A web server, SRTpred, has been developed based on the above study for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins; it is available from http://www.imtech.res.in/raghava/srtpred/.
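Five-fold cross-validation is the evaluation protocol named in most of these abstracts. The sketch below shows one generic way to generate the five train/test index splits. It is an illustration only: the function name `k_fold_indices`, the seeded shuffle, and the round-robin fold assignment are assumptions, and the papers do not state how their folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

A model would be trained on `train` and scored on `test` for each of the five splits, with the reported accuracy typically being the average over the splits.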

5.
This study describes a method for predicting and classifying oxygen-binding proteins. Firstly, support vector machine (SVM) modules were developed using amino acid composition and dipeptide composition for predicting oxygen-binding proteins, and achieved maximum accuracy of 85.5% and 87.8%, respectively. Secondly, an SVM module was developed based on amino acid composition, classifying the predicted oxygen-binding proteins into six classes with accuracy of 95.8%, 97.5%, 97.5%, 96.9%, 99.4%, and 96.0% for erythrocruorin, hemerythrin, hemocyanin, hemoglobin, leghemoglobin, and myoglobin proteins, respectively. Finally, an SVM module was developed using dipeptide composition for classifying the oxygen-binding proteins, and achieved maximum accuracy of 96.1%, 98.7%, 98.7%, 85.6%, 99.6%, and 93.3% for the above six classes, respectively. All modules were trained and tested by five-fold cross validation. Based on the above approach, a web server Oxypred was developed for predicting and classifying oxygen-binding proteins (available from http://www.imtech.res.in/raghava/oxypred/).

8.
Saha S  Raghava GP 《In silico biology》2007,7(4-5):405-412
This paper describes a method developed for predicting bacterial toxins from their amino acid sequences. All the modules developed in this study were trained and tested on a non-redundant dataset of 150 bacterial toxins that included 77 exotoxins and 73 endotoxins. Firstly, support vector machine (SVM) based modules were developed for predicting bacterial toxins using amino acid and dipeptide composition, and achieved accuracies of 96.07% and 92.50%, respectively. Secondly, SVM-based modules were developed for discriminating endotoxins from exotoxins using amino acid and dipeptide composition, and achieved accuracies of 95.71% and 92.86%, respectively. In addition, modules have been developed for classifying the exotoxins (e.g. activate adenylate cyclase, activate guanylate cyclase, neurotoxins) using hidden Markov models (HMM), PSI-BLAST and a combination of the two, and achieved overall accuracies of 95.75%, 97.87% and 100%, respectively. Based on the above study, a web server called 'BTXpred' has been developed, which is available at http://www.imtech.res.in/raghava/btxpred/. Supplementary information is available at http://www.imtech.res.in/raghava/btxpred/supplementary.html.

9.
Here we report a systematic approach for predicting subcellular localization (cytoplasm, mitochondrial, nuclear, and plasma membrane) of human proteins. First, support vector machine (SVM)-based modules for predicting subcellular localization using traditional amino acid and dipeptide (i + 1) composition achieved overall accuracies of 76.6 and 77.8%, respectively. PSI-BLAST, when carried out using a similarity-based search against a nonredundant database of experimentally annotated proteins, yielded 73.3% accuracy. To gain further insight, a hybrid module (hybrid1) was developed based on amino acid composition, dipeptide composition, and similarity information, and attained a better accuracy of 84.9%. In addition, SVM modules based on different higher-order dipeptides, i.e. i + 2, i + 3, and i + 4, were also constructed for the prediction of subcellular localization of human proteins, and overall accuracies of 79.7, 77.5, and 77.1% were achieved, respectively. Furthermore, another SVM module, hybrid2, was developed using traditional dipeptide (i + 1) and higher-order dipeptide (i + 2, i + 3, and i + 4) compositions, which gave an overall accuracy of 81.3%. We also developed an SVM module, hybrid3, based on amino acid composition, traditional and higher-order dipeptide compositions, and PSI-BLAST output, and achieved an overall accuracy of 84.4%. A Web server, HSLPred (www.imtech.res.in/raghava/hslpred/ or bioinformatics.uams.edu/raghava/hslpred/), has been designed to predict subcellular localization of human proteins using the above approaches.
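The "higher-order dipeptide" features (i + 2, i + 3, i + 4) mentioned above pair each residue with the residue k positions further along, rather than with its immediate neighbour. A minimal sketch of that idea follows, assuming percentage scaling and skipping pairs that contain non-standard residues. The function name and these conventions are illustrative and not taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def gapped_dipeptide_composition(seq, k=1):
    """400-dim vector of residue pairs (i, i + k); k=1 is the traditional dipeptide."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    index = {a: i for i, a in enumerate(AMINO_ACIDS)}
    counts = [0] * 400
    total = 0
    for i in range(len(seq) - k):
        a, b = seq[i], seq[i + k]
        if a in index and b in index:  # skip pairs with non-standard symbols
            counts[index[a] * 20 + index[b]] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence too short for the requested gap")
    return [100.0 * c / total for c in counts]
```

Concatenating the vectors for k = 1 to 4 would give a 1600-dimensional input, one plausible reading of how a combined module such as hybrid2 could be fed, although the abstract does not spell out the encoding.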

10.
Mitochondria are considered one of the core organelles of eukaryotic cells; hence, prediction of mitochondrial proteins is one of the major challenges in the field of genome annotation. This study describes a method, MitPred, developed for predicting mitochondrial proteins with high accuracy. The data set used in this study was obtained from Guda, C., Fahy, E. & Subramaniam, S. (2004) Bioinformatics 20, 1785-1794. First, support vector machine-based modules were developed using amino acid and dipeptide composition of proteins and achieved accuracies of 78.37 and 79.38%, respectively. The accuracy of prediction further improved to 83.74% when split amino acid composition (25 N-terminal, 25 C-terminal, and remaining residues) of proteins was used. Then a BLAST search and the support vector machine-based method were combined to reach 88.22% accuracy. Finally, we developed a hybrid approach that combined hidden Markov model profiles of domains (exclusively found in mitochondrial proteins) and the support vector machine-based method. We were able to predict mitochondrial proteins with 100% specificity at a 56.36% sensitivity rate and with 80.50% specificity at 98.95% sensitivity. The method estimated 9.01, 6.35, 4.84, 3.95, and 4.25% of proteins as mitochondrial in the Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, mouse, and human proteomes, respectively. MitPred was developed on the above hybrid approach.
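The split amino acid composition described above computes a separate composition for the first 25 residues, the last 25 residues, and whatever remains in between. The sketch below illustrates that segmentation under stated assumptions: the function names, the percentage scaling, the N-terminal / C-terminal / remainder output order, and the rejection of sequences shorter than 50 residues are all choices made here, not details given in the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def composition(fragment):
    """Percentage of each standard residue in a fragment (zeros if empty)."""
    if not fragment:
        return [0.0] * 20
    return [100.0 * fragment.count(a) / len(fragment) for a in AMINO_ACIDS]


def split_amino_acid_composition(seq, n_term=25, c_term=25):
    """60-dim vector: N-terminal, C-terminal, then remaining-residue composition."""
    seq = seq.upper()
    if len(seq) < n_term + c_term:
        raise ValueError("sequence shorter than the two terminal windows")
    head = seq[:n_term]
    tail = seq[len(seq) - c_term:]
    middle = seq[n_term:len(seq) - c_term]
    return composition(head) + composition(tail) + composition(middle)
```

Keeping the terminal windows separate preserves signal that a whole-sequence average would dilute, which is a plausible reason the abstract reports a gain over plain amino acid composition.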

11.
The attainment of a complete map-based sequence for rice (Oryza sativa) is clearly a major milestone for the research community. Identifying the localization of encoded proteins is the key to understanding their functional characteristics and facilitating their purification. Our proposed method, RSLpred, is an effort in this direction for genome-scale subcellular prediction of encoded rice proteins. First, support vector machine (SVM)-based modules were developed using traditional amino acid, dipeptide (i+1) and four-part amino acid composition, and achieved overall accuracies of 81.43, 80.88 and 81.10%, respectively. Secondly, a similarity search-based module was developed using the position-specific iterated basic local alignment search tool (PSI-BLAST) and achieved 68.35% accuracy. Another module, developed using evolutionary information of a protein sequence extracted from the position-specific scoring matrix, achieved an accuracy of 87.10%. In this study, a large number of modules have been developed using various encoding schemes such as higher-order dipeptide composition, N- and C-terminal composition, split amino acid composition and hybrid information. In order to benchmark RSLpred, it was tested on an independent set of rice proteins, where it outperformed widely used prediction methods such as TargetP, Wolf-PSORT, PA-SUB, Plant-Ploc and ESLpred. To assist the plant research community, an online web tool, ‘RSLpred’, has been developed for subcellular prediction of query rice proteins, which is freely accessible at http://www.imtech.res.in/raghava/rslpred.

12.
13.

Background

Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help in eliminating these pathogens from our body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand the mechanism by which MBPs recognize pathogens.

Results

This paper describes modules developed for predicting MIRs in a protein. Support vector machine (SVM) based models have been developed on 120 mannose binding protein chains, where no two chains have more than 25% sequence similarity. SVM models were developed on two types of datasets: 1) a main dataset consisting of 1029 mannose interacting and 1029 non-interacting residues, and 2) a realistic dataset consisting of 1029 mannose interacting and 10320 non-interacting residues. In this study, firstly, we developed standard modules using binary and PSSM profiles of patterns and obtained a maximum MCC of around 0.32. Secondly, we developed SVM modules using the composition profile of patterns and achieved a maximum MCC of around 0.74 with an accuracy of 86.64% on the main dataset. Thirdly, we developed a model on the realistic dataset and achieved a maximum MCC of 0.62 with an accuracy of 93.08%. Based on this study, a standalone program and web server have been developed for predicting mannose interacting residues in proteins (http://www.imtech.res.in/raghava/premier/).

Conclusions

Compositional analysis of mannose interacting and non-interacting residues shows that certain types of residues are preferred in mannose interaction. It was also observed that the neighbourhood of mannose interacting residues has a preference for certain types of residues. The composition of patterns/peptides/segments has been used for predicting MIRs and achieved reasonably high accuracy. It is possible that this novel strategy may be effective for predicting other types of interacting residues. This study will be useful in annotating the function of proteins as well as in understanding the role of mannose in the immune system.

14.
Glutathione S-transferase (GST) proteins play a vital role in living organisms, including detoxification of exogenous and endogenous chemicals and survival under stress conditions. This paper describes a method developed for predicting GST proteins. We used a dataset of 107 GST and 107 non-GST proteins for training, and the performance of the method was evaluated with a five-fold cross-validation technique. First, an SVM-based method was developed using amino acid and dipeptide composition and achieved maximum accuracies of 91.59% and 95.79%, respectively. In addition, we developed an SVM-based method using tripeptide composition and achieved a maximum accuracy of 97.66%, which is better than the accuracy achieved by HMM-based searching (96.26%). Based on the above study, a web server, GSTPred, has been developed (http://www.imtech.res.in/raghava/gstpred/).

15.
Matrix metalloproteinases (MMPs) and a disintegrin and metalloproteases (ADAMs) belong to the zinc-dependent metalloproteinase family of proteins. These proteins participate in various physiological and pathological states. Thus, prediction of these proteins from amino acid sequence would be helpful. We have developed a method to predict these proteins based on features derived from Chou’s pseudo amino acid composition (PseAAC) server, with a support vector machine (SVM) as the machine learning approach. With this method, for the ADAM and MMP families, an overall accuracy of 95.89% and a Matthews correlation coefficient (MCC) of 0.90 were achieved. Furthermore, the method is able to predict two major subclasses of the MMP family, Furin-activated secreted MMPs and Type II trans-membrane MMPs, with MCC values of 0.89 and 0.91, respectively. The overall accuracy for Furin-activated secreted MMPs and Type II trans-membrane MMPs was 98.18% and 99.07%, respectively. Our data demonstrate an effective classification of the metalloproteinase family based on the concept of PseAAC and SVM.

16.

Background

One of the major challenges in the field of vaccine design is to predict conformational B-cell epitopes in an antigen. In the past, several methods have been developed for predicting conformational B-cell epitopes in an antigen from its tertiary structure. This is the first attempt in this area to predict conformational B-cell epitope in an antigen from its amino acid sequence.

Results

All support vector machine (SVM) models were trained and tested on 187 non-redundant protein chains consisting of 2261 antibody interacting residues of B-cell epitopes. Models were developed using the binary profile of patterns (BPP) and the physicochemical profile of patterns (PPP) and achieved maximum MCC values of 0.22 and 0.17, respectively. In this study, for the first time, an SVM model has been developed using the composition profile of patterns (CPP), achieving a maximum MCC of 0.73 with an accuracy of 86.59%. We compared our CPP-based model with existing structure-based methods and observed that our sequence-based model is as good as the structure-based methods.

Conclusion

This study demonstrates that prediction of conformational B-cell epitopes in an antigen is possible from its primary sequence. This study will be very useful in predicting conformational B-cell epitopes in antigens whose tertiary structures are not available. A web server, CBTOPE, has been developed for predicting B-cell epitopes (http://www.imtech.res.in/raghava/cbtope/).

17.
This study describes methods for predicting and classifying voltage-gated ion channels. Firstly, a standard support vector machine (SVM) method was developed for predicting ion channels by using amino acid composition and dipeptide composition, with accuracies of 82.89% and 85.56%, respectively. The accuracy of this SVM method improved from 85.56% to 89.11% when combined with PSI-BLAST similarity search. Then we developed an SVM method for classifying ion channels (potassium, sodium, calcium, and chloride) by using dipeptide composition and achieved an overall accuracy of 96.89%. We further achieved a classification accuracy of 97.78% by using a hybrid method that combines dipeptide-based SVM and hidden Markov model methods. A web server, VGIchan, has been developed for predicting and classifying voltage-gated ion channels using the above approaches. VGIchan is freely available at www.imtech.res.in/raghava/vgichan/.

18.
Natt NK  Kaur H  Raghava GP 《Proteins》2004,56(1):11-18
This article describes a method developed for predicting transmembrane beta-barrel regions in membrane proteins using machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). The ANN used in this study is a feed-forward neural network with a standard back-propagation training algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information was added to a single sequence as a multiple sequence alignment obtained from PSI-BLAST. We have also developed an SVM-based method using a primary sequence as input and achieved an accuracy of 77.4%. The SVM model was modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, ANN- and SVM-based methods were combined to utilize the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) value of SVM, ANN, and combined method are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using "leave one out cross-validation" (LOOCV). Based on this study, we have developed a Web server, TBBPred, for predicting transmembrane beta-barrel regions in proteins (available at http://www.imtech.res.in/raghava/tbbpred).

19.
Prediction of RNA binding sites in a protein using SVM and PSSM profile   Cited by: 1 (self-citations: 0, other citations: 1)
Kumar M  Gromiha MM  Raghava GP 《Proteins》2008,71(1):189-194

20.
Sethi D  Garg A  Raghava GP 《Amino acids》2008,35(3):599-605
The association of structurally disordered proteins with a number of diseases has engendered enormous interest and therefore demands a prediction method that would facilitate their expeditious study at the molecular level. The present study describes the development of a computational method for predicting disordered proteins using sequence and profile compositions as input features for the training of SVM models. First, we developed amino acid and dipeptide composition based SVM modules, which yielded sensitivities of 75.6 and 73.2% along with Matthews correlation coefficient (MCC) values of 0.75 and 0.60, respectively. In addition, the use of predicted secondary structure content (coil, sheet and helices) in the form of composition values attained a sensitivity of 76.8% and an MCC value of 0.77. Finally, the training of SVM models using evolutionary information hidden in the multiple sequence alignment profile improved the prediction performance, achieving a sensitivity of 78% and an MCC of 0.78. Furthermore, when evaluated on an independent dataset of partially disordered proteins, the same SVM module provided a correct prediction rate of 86.6%. Based on the above study, a web server (“DPROT”) was developed for the prediction of disordered proteins, which is available at .
