首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Identifying prokaryotes in silico is commonly based on DNA sequences. In experiments where DNA sequences may not be immediately available, we need to have a different approach to detect prokaryotes based on RNA or protein sequences. N-formylmethionine (fMet) is known as a typical characteristic of prokaryotes. A web tool has been implemented here for predicting prokaryotes through detecting the N-formylmethionine residues in protein sequences. The predictor is constructed using support vector machine. An online predictor has been implemented using Python. The implemented predictor is able to achieve the total prediction accuracy 80% with the specificity 80% and the sensitivity 81%.  相似文献   

2.
As a result of genome and other sequencing projects, the gap between the number of known protein sequences and the number of known protein structural classes is widening rapidly. In order to narrow this gap, it is vitally important to develop a computational prediction method for fast and accurately determining the protein structural class. In this paper, a novel predictor is developed for predicting protein structural class. It is featured by employing a support vector machine learning system and using a different pseudo-amino acid composition (PseAA), which was introduced to, to some extent, take into account the sequence-order effects to represent protein samples. As a demonstration, the jackknife cross-validation test was performed on a working dataset that contains 204 non-homologous proteins. The predicted results are very encouraging, indicating that the current predictor featured with the PseAA may play an important complementary role to the elegant covariant discriminant predictor and other existing algorithms.  相似文献   

3.
In the post-genome era, the prediction of protein function is one of the most demanding tasks in the study of bioinformatics. Machine learning methods, such as the support vector machines (SVMs), greatly help to improve the classification of protein function. In this work, we integrated SVMs, protein sequence amino acid composition, and associated physicochemical properties into the study of nucleic-acid-binding proteins prediction. We developed the binary classifications for rRNA-, RNA-, DNA-binding proteins that play an important role in the control of many cell processes. Each SVM predicts whether a protein belongs to rRNA-, RNA-, or DNA-binding protein class. Self-consistency and jackknife tests were performed on the protein data sets in which the sequences identity was < 25%. Test results show that the accuracies of rRNA-, RNA-, DNA-binding SVMs predictions are approximately 84%, approximately 78%, approximately 72%, respectively. The predictions were also performed on the ambiguous and negative data set. The results demonstrate that the predicted scores of proteins in the ambiguous data set by RNA- and DNA-binding SVM models were distributed around zero, while most proteins in the negative data set were predicted as negative scores by all three SVMs. The score distributions agree well with the prior knowledge of those proteins and show the effectiveness of sequence associated physicochemical properties in the protein function prediction. The software is available from the author upon request.  相似文献   

4.
Knowledge of structural class plays an important role in understanding protein folding patterns. In this study, a simple and powerful computational method, which combines support vector machine with PSI-BLAST profile, is proposed to predict protein structural class for low-similarity sequences. The evolution information encoding in the PSI-BLAST profiles is converted into a series of fixed-length feature vectors by extracting amino acid composition and dipeptide composition from the profiles. The resulting vectors are then fed to a support vector machine classifier for the prediction of protein structural class. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins) with sequence similarity lower than 40% and 25%, respectively. The overall accuracies attain 70.7% and 72.9% for 1189 and 25PDB datasets, respectively. Comparison of our results with other methods shows that our method is very promising to predict protein structural class particularly for low-similarity datasets and may at least play an important complementary role to existing methods.  相似文献   

5.
6.
Apoptosis, or programmed cell death, plays an important role in development of an organism. Obtaining information on subcellular location of apoptosis proteins is very helpful to understand the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related with the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts with the same length in our paper. Then we use the novel representation of protein sequences and adopt support vector machine to predict subcellular location. The overall prediction accuracy is significantly improved by jackknife test.  相似文献   

7.
8.
复杂疾病驱使的融合SDA-SVM集成基因挖掘方法   总被引:1,自引:0,他引:1  
提出了一种新颖的复杂疾病驱使的融合SDA-SVM(Stepwise Discriminant Analysis-Support Vector Machine,SDA-SVM)技术的集成基因挖掘方法。该集成方法融合逐步判别分析和支持向量机的优点,能够有效地进行复杂疾病相关基因的深度挖掘,使得挖掘出的基因能够较好地识别疾病类型和亚型。通过将该方法应用于一套弥散性大B细胞淋巴瘤DNA表达谱数据,并与其它基因挖掘方法对比,结果表明该方法挖掘出的基因具有较高的疾病相关性和较强的疾病类型识别能力。  相似文献   

9.
Chen C  Zhou X  Tian Y  Zou X  Cai P 《Analytical biochemistry》2006,357(1):116-121
Because a priori knowledge of a protein structural class can provide useful information about its overall structure, the determination of protein structural class is a quite meaningful topic in protein science. However, with the rapid increase in newly found protein sequences entering into databanks, it is both time-consuming and expensive to do so based solely on experimental techniques. Therefore, it is vitally important to develop a computational method for predicting the protein structural class quickly and accurately. To deal with the challenge, this article presents a dual-layer support vector machine (SVM) fusion network that is featured by using a different pseudo-amino acid composition (PseAA). The PseAA here contains much information that is related to the sequence order of a protein and the distribution of the hydrophobic amino acids along its chain. As a showcase, the rigorous jackknife cross-validation test was performed on the two benchmark data sets constructed by Zhou. A significant enhancement in success rates was observed, indicating that the current approach may serve as a powerful complementary tool to other existing methods in this area.  相似文献   

10.
A change in the normal concentration of essential trace elements in the human body might lead to major health disturbances. In this study, hair samples were collected from 115 human subject, including 55 healthy people and 60 patients with prostate cancer. The concentrations of 20 trace elements (TEs) in these samples were measured by inductively coupled plasma-mass spectrometry. A support vector machine was used to investigate the relationship between TEs and prostate cancer. It is found that, among the 20 TEs, 10 (Mg P, K, Ca, Cr, Mn, Fe. Cu, Zn, and Se) are related to the risk of prostate cancer. These 10 TEs were used to build the prediction model for prostate cancer. The model obtained can satisfactorily distinguish the healthy samples from the cancer samples. Furthermore, the cross-validation by leaving-one method proved that the prediction ability of this model reaches as high as 95.8%. It is practical to predict the risk of prostate cancer using this model in the clinics  相似文献   

11.
苏洪全  朱义胜  姜玉梅 《生物信息学》2010,8(4):356-358,363
基因表达系列分析(Serial analysis of gene expression,SAGE)是一种基因表达数据,反映了细胞内的动态变化。模式识别和可视化方法是分析SAGE数据的基本工具,但是由于缺乏描述数据的统计特性,传统的聚类分析技术不适用于SAGE数据的分析。本文提出了一种基于多分类和支持向量机的SAGE数据的分析法。经过对模拟数据和人类癌症SAGE数据的分析,基于径向基核函数的多分类支持向量机算法一对一(one-against-one,OAO)算法提供了比PoissonC和PoissonS更好的分类结果。  相似文献   

12.
We present here the recent update of AutoMotif Server (AMS 2.0) that predicts post-translational modification sites in protein sequences. The support vector machine (SVM) algorithm was trained on data gathered in 2007 from various sets of proteins containing experimentally verified chemical modifications of proteins. Short sequence segments around a modification site were dissected from a parent protein, and represented in the training set as binary or profile vectors. The updated efficiency of the SVM classification for each type of modification and the predictive power of both representations were estimated using leave-one-out tests for model of general phosphorylation and for modifications catalyzed by several specific protein kinases. The accuracy of the method was improved in comparison to the previous version of the service (Plewczynski et al., “AutoMotif server: prediction of single residue post-translational modifications in proteins”, Bioinformatics 21: 2525–7, 2005). The precision of the updated version reached over 90% for selected types of phosphorylation and was optimized in trade of lower recall value of the classification model. The AutoMotif Server version 2007 is freely available at . Additionally, the reference dataset for optimization of prediction of phosphorylation sites, collected from the UniProtKB was also provided and can be accessed at .  相似文献   

13.
With the rapid increment of protein sequence data, it is indispensable to develop automated and reliable predictive methods for protein function annotation. One approach for facilitating protein function prediction is to classify proteins into functional families from primary sequence. Being the most important group of all proteins, the accurate prediction for enzyme family classes and subfamily classes is closely related to their biological functions. In this paper, for the prediction of enzyme subfamily classes, the Chou's amphiphilic pseudo-amino acid composition [Chou, K.C., 2005. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19] has been adopted to represent the protein samples for training the 'one-versus-rest' support vector machine. As a demonstration, the jackknife test was performed on the dataset that contains 2640 oxidoreductase sequences classified into 16 subfamily classes [Chou, K.C., Elrod, D.W., 2003. Prediction of enzyme family classes. J. Proteome Res. 2, 183-190]. The overall accuracy thus obtained was 80.87%. The significant enhancement in the accuracy indicates that the current method might play a complementary role to the exiting methods.  相似文献   

14.
The study proposes a method for supervised classification of multi-channel surface electromyographic signals with the aim of controlling myoelectric prostheses. The representation space is based on the discrete wavelet transform (DWT) of each recorded EMG signal using unconstrained parameterization of the mother wavelet. The classification is performed with a support vector machine (SVM) approach in a multi-channel representation space. The mother wavelet is optimized with the criterion of minimum classification error, as estimated from the learning signal set. The method was applied to the classification of six hand movements with recording of the surface EMG from eight locations over the forearm. Misclassification rate in six subjects using the eight channels was (mean ± S.D.) 4.7 ± 3.7% with the proposed approach while it was 11.1 ± 10.0% without wavelet optimization (Daubechies wavelet). The DWT and SVM can be implemented with fast algorithms, thus, the method is suitable for real-time implementation.  相似文献   

15.
A new method is proposed for detecting fraudulent whiplash claims based on measurements of movement control of the neck. The method is noninvasive and inexpensive. The subjects track a slowly moving object on a computer screen with their head. The deviation between the measured and actual trajectory is quantified and used as input to an ensemble of support vector machine classifiers. The ensemble was trained on a group of 34 subjects with chronic whiplash disorder together with a group of 31 healthy subjects instructed to feign whiplash injury. The sensitivity of the proposed method was 86%, the specificity 84% and the area under curve (AUC) was 0.86. This suggests that the method can be of practical use for evaluating the validity of whiplash claims.  相似文献   

16.
17.
Penicillin-Binding Proteins are peptidases that play an important role in cell-wall biogenesis in bacteria and thus maintaining bacterial infections. A wide class of β-lactam drugs are known to act on these proteins and inhibit bacterial infections by disrupting the cell-wall biogenesis pathway. Penicillin-Binding proteins have recently gained importance with the increase in the number of multi-drug resistant bacteria. In this work, we have collected a dataset of over 700 Penicillin-Binding and non-Penicillin Binding Proteins and extracted various sequence-related features. We then created models to classify the proteins into Penicillin-Binding and non-binding using supervised machine learning algorithms such as Support Vector Machines and Random Forest. We obtain a good classification performance for both the models using both the methods.  相似文献   

18.
Cancers are regarded as malignant proliferations of tumor cells present in many tissues and organs, which can severely curtail the quality of human life. The potential of using plasma DNA for cancer detection has been widely recognized, leading to the need of mapping the tissue-of-origin through the identification of somatic mutations. With cutting-edge technologies, such as next-generation sequencing, numerous somatic mutations have been identified, and the mutation signatures have been uncovered across different cancer types. However, somatic mutations are not independent events in carcinogenesis but exert functional effects. In this study, we applied a pan-cancer analysis to five types of cancers: (I) breast cancer (BRCA), (II) colorectal adenocarcinoma (COADREAD), (III) head and neck squamous cell carcinoma (HNSC), (IV) kidney renal clear cell carcinoma (KIRC), and (V) ovarian cancer (OV). Based on the mutated genes of patients suffering from one of the aforementioned cancer types, patients they were encoded into a large number of numerical values based upon the enrichment theory of gene ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We analyzed these features with the Monte-Carlo Feature Selection (MCFS) method, followed by the incremental feature selection (IFS) method to identify functional alteration features that could be used to build the support vector machine (SVM)-based classifier for distinguishing the five types of cancers. Our results showed that the optimal classifier with the selected 344 features had the highest Matthews correlation coefficient value of 0.523. Sixteen decision rules produced by the MCFS method can yield an overall accuracy of 0.498 for the classification of the five cancer types. Further analysis indicated that some of these features and rules were supported by previous experiments. This study not only presents a new approach to mapping the tissue-of-origin for cancer detection but also unveils the specific functional alterations of each cancer type, providing insight into cancer-specific functional aberrations as potential therapeutic targets. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.  相似文献   

19.
Peng S  Xu Q  Ling XB  Peng X  Du W  Chen L 《FEBS letters》2003,555(2):358-362
Simultaneous multiclass classification of tumor types is essential for future clinical implementations of microarray-based cancer diagnosis. In this study, we have combined genetic algorithms (GAs) and all paired support vector machines (SVMs) for multiclass cancer identification. The predictive features have been selected through iterative SVMs/GAs, and recursive feature elimination post-processing steps, leading to a very compact cancer-related predictive gene set. Leave-one-out cross-validations yielded accuracies of 87.93% for the eight-class and 85.19% for the fourteen-class cancer classifications, outperforming the results derived from previously published methods.  相似文献   

20.
Conotoxins are disulfide rich small peptides that target a broad spectrum of ion-channels and neuronal receptors. They offer promising avenues in the treatment of chronic pain, epilepsy and cardiovascular diseases. Assignment of newly sequenced mature conotoxins into appropriate superfamilies using a computational approach could provide valuable preliminary information on the biological and pharmacological functions of the toxins. However, creation of protein sequence patterns for the reliable identification and classification of new conotoxin sequences may not be effective due to the hypervariability of mature toxins. With the aim of formulating an in silico approach for the classification of conotoxins into superfamilies, we have incorporated the concept of pseudo-amino acid composition to represent a peptide in a mathematical framework that includes the sequence-order effect along with conventional amino acid composition. The polarity index attribute, which encodes information such as residue surface buriability, polarity, and hydropathy, was used to store the sequence-order effect. Several methods like BLAST, ISort (Intimate Sorting) predictor, least Hamming distance algorithm, least Euclidean distance algorithm and multi-class support vector machines (SVMs), were explored for superfamily identification. The SVMs outperform other methods providing an overall accuracy of 88.1% for all correct predictions with generalized squared correlation of 0.75 using jackknife cross-validation test for A, M, O and T superfamilies and a negative set consisting of short cysteine rich sequences from different eukaryotes having diverse functions. The computed sensitivity and specificity for the superfamilies were found to be in the range of 84.0-94.1% and 80.0-95.5%, respectively, attesting to the efficacy of multi-class SVMs for the successful in silico classification of the conotoxins into their superfamilies.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号