1.

Prediction of RNA-binding proteins from primary sequence by a support vector machine approach (Total citations: 3; self-citations: 0; citations by others: 3)

Han LY Cai CZ Lo SL Chung MC Chen YZ 《RNA (New York, N.Y.)》2004,10(3):355-368

Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. But insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins was used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show a prediction accuracy of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, indicating a need for a sufficient number of proteins to train SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.

2.

GISMO--gene identification using a support vector machine for ORF classification

Krause L McHardy AC Nattkemper TW Pühler A Stoye J Meyer F 《Nucleic acids research》2007,35(2):540-549

We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification based on a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, short genes and genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggests that these are biologically active genes. The source code for GISMO is freely available under the GPL license.

3.

Using recurrence quantification analysis descriptors for protein sequence classification with support vector machines

Mitra J Mundra P Kulkarni BD Jayaraman VK 《Journal of biomolecular structure & dynamics》2007,25(3):289-298

4.

Protein function classification via support vector machine approach (Total citations: 2; self-citations: 0; citations by others: 2)

Cai CZ Wang WL Sun LZ Chen YZ 《Mathematical biosciences》2003,185(2):111-122

Support vector machine (SVM) is introduced as a method for the classification of proteins into functionally distinguished classes. Studies are conducted on a number of protein classes including RNA-binding proteins, protein homodimers, proteins responsible for drug absorption, proteins involved in drug distribution and excretion, and drug metabolizing enzymes. Testing accuracy for the classification of these protein classes is found to be in the range of 84-96%. This suggests the usefulness of SVM in the classification of protein functional classes and its potential application in protein function prediction.

5.

PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine

Yongchao Dou Bo Yao Chi Zhang 《Amino acids》2014,46(6):1459-1469

6.

Identification of catalytic residues from protein structure using support vector machine with sequence and structural features

Pugalenthi G Kumar KK Suganthan PN Gangal R 《Biochemical and biophysical research communications》2008,367(3):630-634

Identification of catalytic residues can provide valuable insights into protein function. With the increasing number of protein 3D structures having been solved by X-ray crystallography and NMR techniques, it is highly desirable to develop an efficient method to identify their catalytic sites. In this paper, we present an SVM method for the identification of catalytic residues using sequence and structural features. The algorithm was applied to the 2096 catalytic residues derived from the Catalytic Site Atlas database. We obtained an overall prediction accuracy of 88.6% from 10-fold cross validation and 95.76% from a resubstitution test. Testing on the 254 catalytic residues shows our method can correctly predict all 254 residues. This result suggests the usefulness of our approach for facilitating the identification of catalytic residues from protein structures.

7.

Recognition and classification of histones using support vector machine.

Manoj Bhasin Ellis L Reinherz Pedro A Reche 《Journal of computational biology》2006,13(1):102-112

Histones are DNA-binding proteins found in the chromatin of all eukaryotic cells. They are highly conserved and can be grouped into five major classes: H1/H5, H2A, H2B, H3, and H4. Two copies of H2A, H2B, H3, and H4 bind to about 160 base pairs of DNA forming the core of the nucleosome (the repeating structure of chromatin) and H1/H5 bind to its DNA linker sequence. Overall, histones have a high arginine/lysine content that is optimal for interaction with DNA. This sequence bias can make the classification of histones difficult using standard sequence similarity approaches. Therefore, in this paper, we applied support vector machine (SVM) to recognize and classify histones on the basis of their amino acid and dipeptide composition. On evaluation through a five-fold cross-validation, the SVM-based method was able to distinguish histones from nonhistones (nuclear proteins) with an accuracy around 98%. Similarly, we obtained an overall >95% accuracy in discriminating the five classes of histones through the application of 1-versus-rest (1-v-r) SVM. Finally, we have applied this SVM-based method to the detection of histones from whole proteomes and found a comparable sensitivity to that accomplished by hidden Markov motifs (HMM) profiles.
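
The amino acid and dipeptide composition features mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact encoding, so the function names, the 20-letter alphabet, and the choice to skip non-standard residues are all assumptions made for illustration. The result is a 420-element vector (20 amino acid fractions plus 400 dipeptide fractions) of the kind that could be passed to an SVM.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def _clean(seq):
    """Upper-case the sequence and drop any non-standard residue letters."""
    return "".join(c for c in seq.upper() if c in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Fraction of each of the 20 amino acids (20 values)."""
    seq = _clean(seq)
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    seq = _clean(seq)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = max(len(seq) - 1, 0)
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def composition_features(seq):
    """Concatenate both encodings into one 420-element feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


features = composition_features("AKKA")
print(len(features))  # 420
```

Because both encodings are fractions, the vector length does not depend on sequence length, which is what lets proteins of different sizes share one classifier.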

8.

Identification of functionally diverse lipocalin proteins from sequence information using support vector machine

Ganesan Pugalenthi Krishna Kumar Kandaswamy P. N. Suganthan G. Archunan R. Sowdhamini 《Amino acids》2010,39(3):777-783

Lipocalins are functionally diverse proteins that are composed of 120–180 amino acid residues. Members of this family have several important biological functions including ligand transport, cryptic coloration, sensory transduction, endonuclease activity, stress response activity in plants, odorant binding, prostaglandin biosynthesis, cellular homeostasis regulation, immunity, immunotherapy and so on. Identification of lipocalins from protein sequence is challenging due to the poor sequence identity, which often falls below the twilight zone. So far, no specific method has been reported to identify lipocalins from primary sequence. In this paper, we report a support vector machine (SVM) approach to predict lipocalins from protein sequence using sequence-derived properties. LipoPred was trained using a dataset consisting of 325 lipocalin proteins and 325 non-lipocalin proteins, and evaluated by an independent set of 140 lipocalin proteins and 21,447 non-lipocalin proteins. LipoPred achieved 88.61% accuracy with 89.26% sensitivity, 85.27% specificity and 0.74 Matthews correlation coefficient (MCC). When applied on the test dataset, LipoPred achieved 84.25% accuracy with 88.57% sensitivity, 84.22% specificity and MCC of 0.16. LipoPred achieved a better performance rate when compared with PSI-BLAST, HMM and SVM-Prot methods. Out of 218 lipocalins, LipoPred correctly predicted 194 proteins including 39 lipocalins that are non-homologous to any protein in the SWISSPROT database. This result shows that LipoPred is potentially useful for predicting the lipocalin proteins that have no sequence homologs in the sequence databases. Further, successful prediction of nine hypothetical lipocalin proteins and five new members of the lipocalin family shows that LipoPred can be efficiently used to identify and annotate the new lipocalin proteins from sequence databases. The LipoPred software and dataset are available at .
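
The drop in MCC from 0.74 to 0.16 at similar sensitivity and specificity follows from the class imbalance of the test set, and a short calculation makes this concrete. The confusion-matrix counts below are not given in the abstract; they are an assumption, reconstructed by rounding the reported sensitivity and specificity against the stated 140 positives and 21,447 negatives. The function name is hypothetical.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Assumed counts: 124 of 140 lipocalins and 18,063 of 21,447 non-lipocalins
# classified correctly (reconstructed from the reported percentages).
test_set = confusion_metrics(tp=124, fn=16, tn=18063, fp=3384)
for name, value in test_set.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the 3,384 false positives far outnumber the 124 true positives, so the MCC stays low even though both per-class rates are above 84%.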

9.

Constructing support vector machine ensembles for cancer classification based on proteomic profiling

Mao Y Zhou XB Pi DY Sun YX 《Genomics, Proteomics & Bioinformatics》2005,3(4):238-241

In this study, we present a constructive algorithm for training cooperative support vector machine ensembles (CSVMEs). CSVME combines ensemble architecture design with cooperative training for individual SVMs in ensembles. Unlike most previous studies on training ensembles, CSVME puts emphasis on both accuracy and collaboration among individual SVMs in an ensemble. A group of SVMs selected on the basis of recursive classifier elimination is used in CSVME, and the number of the individual SVMs selected to construct CSVME is determined by 10-fold cross-validation. This kind of SVME has been tested on two ovarian cancer datasets previously obtained by proteomic mass spectrometry. By combining several individual SVMs, the proposed method achieves better performance than the SVME of all base SVMs.

10.

SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data

Pirooznia M Deng Y 《BMC bioinformatics》2006,7(Z4):S25

11.

ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data (Total citations: 1; self-citations: 0; citations by others: 1)

Huang HL Chang FL 《Bio Systems》2007,90(2):516-528

An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, parameter setting of SVM, and cross-validation methods. However, SVMs do not offer a mechanism for automatic internal detection of relevant features. The appropriate setting of their control parameters is often treated as another independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) by simultaneous optimization of automatic feature selection and parameter tuning using an intelligent genetic algorithm, combined with k-fold cross-validation regarded as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. By considering model uncertainty, a frequency-based technique by voting on multiple sets of potentially informative features is used to identify the most effective subset of genes. It is shown that ESVM can obtain a high accuracy of 96.88% with a small number of selected genes (10.0 on average) using 10-fold cross-validation across the 11 datasets. The merits of ESVM are three-fold: (1) automatic feature selection and parameter setting embedded into ESVM can advance prediction abilities, compared to traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of ESVM for bioinformatics problems.

12.

An efficient support vector machine approach for identifying protein S-nitrosylation sites

Li YX Shao YH Jing L Deng NY 《Protein and peptide letters》2011,18(6):573-587

Protein S-nitrosylation plays a key and specific role in many cellular processes. Detecting possible S-nitrosylated substrates and their corresponding exact sites is crucial for studying the mechanisms of these biological processes. Compared with expensive and time-consuming biochemical experiments, computational methods are attracting considerable attention due to their convenience and speed. Although some computational models have been developed to predict S-nitrosylation sites, their accuracy is still low. In this work, we use a support vector machine to predict protein S-nitrosylation sites. After a careful evaluation of six encoding schemes, we propose a new efficient predictor, CPR-SNO, using the coupling patterns based encoding scheme. The performance of our CPR-SNO is measured with the area under the ROC curve (AUC) of 0.8289 in 10-fold cross validation experiments, which is significantly better than the 0.685 achieved by the best existing method, GPS-SNO 1.0. In further annotating large-scale potential S-nitrosylated substrates, CPR-SNO also presents an encouraging predictive performance. These results indicate that CPR-SNO can serve the biological community as a competitive predictor of protein S-nitrosylation sites. Our CPR-SNO has been implemented as a web server and is available at http://math.cau.edu.cn/CPR-SNO/CPR-SNO.html.

13.

Local sequence information-based support vector machine to classify voltage-gated potassium channels (Total citations: 3; self-citations: 1; citations by others: 3)

Liu LX Li ML Tan FY Lu MC Wang KL Guo YZ Wen ZN Jiang L 《Acta biochimica et biophysica Sinica》2006,38(6):363-371

In our previous work, we developed a computational tool, PreK-ClassK-ClassKv, to predict and classify potassium (K+) channels. For K+ channel prediction (PreK) and classification at family level (ClassK), this method performs well. However, it does not perform so well in classifying voltage-gated potassium (Kv) channels (ClassKv). In this paper, a new method based on the local sequence information of Kv channels is introduced to classify Kv channels. Six transmembrane domains of a Kv channel protein are used to define a protein, and the dipeptide composition technique is used to transform an amino acid sequence to a numerical sequence. A Kv channel protein is represented by a vector with 2000 elements, and a support vector machine algorithm is applied to classify Kv channels. This method shows good performance with averages of total accuracy (Acc), sensitivity (SE), specificity (SP), reliability (R) and Matthews correlation coefficient (MCC) of 98.0%, 89.9%, 100%, 0.95 and 0.94 respectively. The results indicate that the local sequence information-based method is better than the global sequence information-based method to classify Kv channels.

14.

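Serial analysis of gene expression based on multi-class support vector machines

Su Hongquan Zhu Yisheng Jiang Yumei 《Chinese Journal of Bioinformatics (生物信息学)》2010,8(4):356-358,363

Serial analysis of gene expression (SAGE) is a type of gene expression data that reflects dynamic changes within the cell. Pattern recognition and visualization methods are the basic tools for analyzing SAGE data, but because they do not describe the statistical properties of the data, traditional clustering techniques are not suitable for SAGE data. This paper proposes a method for analyzing SAGE data based on multi-class classification and support vector machines. In analyses of simulated data and human cancer SAGE data, the one-against-one (OAO) multi-class support vector machine algorithm with a radial basis function kernel gave better classification results than PoissonC and PoissonS.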
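
The one-against-one (OAO) scheme named in this abstract can be sketched as a voting procedure over all class pairs. This is a generic illustration, not the authors' implementation: the pairwise decision function here is a hypothetical nearest-centroid rule standing in for a trained binary SVM, and the class names and centroid values are invented for the example.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(x, classes, pairwise_decide):
    """Run one binary decision per class pair and return the most-voted class.

    pairwise_decide(a, b, x) must return either a or b. Ties are broken in
    favour of the class listed first in `classes`.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    return max(classes, key=lambda c: votes[c])


# Hypothetical stand-in for trained binary SVMs: pick the nearer 1-D centroid.
CENTROIDS = {"normal": 0.0, "tumor_a": 5.0, "tumor_b": 10.0}


def nearest_centroid_decide(a, b, x):
    return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b


classes = list(CENTROIDS)
print(one_against_one_predict(4.2, classes, nearest_centroid_decide))  # tumor_a
```

With k classes the scheme needs k(k-1)/2 binary classifiers, each trained only on the samples of its two classes.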

15.

TANGLE: two-level support vector regression approach for protein backbone torsion angle prediction from primary sequences

Song J Tan H Wang M Webb GI Akutsu T 《PloS one》2012,7(2):e30361

Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the C(α)-N bond (Phi) and the C(α)-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
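
A mean absolute error over torsion angles has to account for periodicity, since 175° and -175° are only 10° apart. The abstract does not state how TANGLE computes its MAE, so the wrap-around convention below is an assumption (a common choice for angular data), and the function names and sample angles are hypothetical.

```python
def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angular_error(pred, true):
    """Mean of the wrap-around angular errors over paired predictions."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(angular_error(p, t) for p, t in zip(pred, true)) / len(pred)


# Invented example angles in degrees; the second pair crosses the +/-180 boundary.
predicted = [-60.0, 175.0, 120.0]
observed = [-65.0, -175.0, 90.0]
print(mean_absolute_angular_error(predicted, observed))  # 15.0
```

Without the wrap-around step, the second pair would contribute an error of 350° instead of 10° and dominate the average.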

16.

Identification of ATP binding residues of a protein from its primary sequence

Jagat S Chauhan Nitish K Mishra Gajendra PS Raghava 《BMC bioinformatics》2009,10(1):434

Background

One of the major challenges in the post-genomic era is to provide functional annotations for the large number of proteins arising from genome sequencing projects. The function of many proteins depends on their interaction with small molecules or ligands. ATP is one such important ligand that plays a critical role as a coenzyme in the functionality of many proteins. There is a need to develop a method for identifying ATP-interacting residues in ATP-binding proteins (ABPs), in order to understand the mechanism of protein-ligand interaction.

17.

Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure

Lewis DP Jebara T Noble WS 《Bioinformatics (Oxford, England)》2006,22(22):2753-2760

18.

Cirrhosis classification based on MRI with duplicative-feature support vector machine (DFSVM)

Liu Hui Guo Dong Mei Liu Xiang 《Biomedical signal processing and control》2013,8(4):346-353

19.

A novel representation for apoptosis protein subcellular localization prediction using support vector machine

Li Zhang Dachao Li 《Journal of theoretical biology》2009,259(2):361-99

Apoptosis, or programmed cell death, plays an important role in the development of an organism. Obtaining information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related to the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts with the same length in our paper. Then we use the novel representation of protein sequences and adopt support vector machine to predict subcellular location. The overall prediction accuracy is significantly improved by jackknife test.

20.

miTarget: microRNA target gene prediction using a support vector machine

Sung-Kyu Kim Jin-Wu Nam Je-Keun Rhee Wha-Jin Lee Byoung-Tak Zhang 《BMC bioinformatics》2006,7(1):411
