Similar literature
1.
Advances in the prediction of protein targeting signals   Total citations: 5, self-citations: 0, citations by others: 5
Schneider G  Fechner U 《Proteomics》2004,4(6):1571-1580
Enlarged sets of reference data and special machine learning approaches have improved the accuracy of the prediction of protein subcellular localization. Recent approaches report over 95% correct predictions with low fractions of false positives for secretory proteins. A clear trend is to develop specifically tailored organism- and organelle-specific prediction tools rather than using one general method. The review focuses on machine learning systems, highlighting four concepts: the artificial neural feed-forward network, the self-organizing map (SOM), the hidden Markov model (HMM), and the support vector machine (SVM).

2.
Ubiquitin regulates protein turnover in a cell by closely regulating the degradation of specific proteins. Because this regulatory role is very important, I have analyzed ubiquitin-like proteins using an artificial neural network, support vector machines and a hidden Markov model (HMM). The methods were trained and tested on a set of 373 ubiquitin proteins and 373 non-ubiquitin proteins obtained from the Entrez protein database. The artificial neural network and support vector machine were trained and tested using both physicochemical properties and PSSM matrices generated from PSI-BLAST, while the HMM-based method used the sequences directly for training and testing. The performance measures of the methods (accuracy, specificity, sensitivity and Matthews correlation coefficient) were then calculated on the test sequences. The highest accuracy of 90.2%, with a specificity of 87.04% and a sensitivity of 94.08%, was achieved using the support vector machine model with PSSM matrices. Accuracies of 86.82%, 83.37%, 80.18% and 72.11% were obtained for the support vector machine with physicochemical properties, the neural network with PSSM matrices, the neural network with physicochemical properties, and the hidden Markov model, respectively. As the accuracy of the SVM model is better with both physicochemical properties and PSSM matrices, it is concluded that kernel methods such as SVM outperform neural networks and hidden Markov models.
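The four performance measures named in this abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity, mcc) from confusion counts.

    Assumes both classes are present, i.e. tp + fn > 0 and tn + fp > 0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical counts, chosen only to exercise the function.
acc, sens, spec, mcc = binary_metrics(tp=90, tn=80, fp=20, fn=10)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} "
      f"specificity={spec:.3f} mcc={mcc:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is one reason abstracts in this list report it alongside accuracy.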

3.
4.
Various attempts have been made to predict individual disease risk based on genotype data from genome-wide association studies (GWAS). However, most studies investigated only one or two classification algorithms and feature encoding schemes. In this study, we applied seven different classification algorithms to GWAS case-control data sets for seven different diseases to create models for disease risk prediction. Further, we used three different encoding schemes for the genotypes of single nucleotide polymorphisms (SNPs) and investigated their influence on the predictive performance of these models. Our study suggests that an additive encoding of the SNP data should be the preferred encoding scheme, as it yielded the best predictive performance for all algorithms and data sets. Furthermore, our results showed that the differences between most state-of-the-art classification algorithms are not statistically significant. Consequently, we recommend preferring algorithms with simple models, such as the linear support vector machine (SVM), as they allow for better subsequent interpretation without significant loss of accuracy.
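The abstract does not spell out its encoding schemes. As a rough illustration, additive encoding is commonly understood as counting the copies of one designated allele (often the minor allele) in each genotype, giving 0, 1 or 2. The sketch below assumes that interpretation; the function name, the two-letter genotype strings and the choice of "G" as the counted allele are all hypothetical.

```python
def additive_encode(genotype, counted_allele):
    """Return the number of copies (0, 1 or 2) of counted_allele in a genotype.

    Assumes a biallelic SNP written as a two-letter string such as "AG".
    """
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    return genotype.count(counted_allele)


# Hypothetical genotypes for one SNP across four individuals.
genotypes = ["AA", "AG", "GA", "GG"]
encoded = [additive_encode(g, "G") for g in genotypes]
print(encoded)
```

Each SNP then becomes a single numeric feature, which keeps the input dimension low for linear models such as the linear SVM the abstract recommends.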

5.
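The support vector machine is a new type of learning machine based on statistical learning theory. This article proposes a support-vector-machine-based method for feature extraction and recognition of epileptic EEG signals that takes full advantage of the method's strong generalization ability. Compared with neural network methods, it shows a lower miss rate and better robustness; it merits further study and has good application prospects.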

6.
We develop ways to predict the side chain orientations of residues within a protein structure by using several different statistical machine learning methods. Here the side chain orientation of a given residue i is measured by the angle Ω(i) between the vector pointing from the center of the protein structure to the Cα(i) atom and the vector pointing from the Cα(i) atom to the center of its side chain atoms. To predict the Ω(i) angles, we construct statistical models by using several different methods: general linear regression, a regression tree with bagging, a neural network, and a support vector machine. The root mean square errors for the different models range only from 36.67 to 37.60 degrees, and the correlation coefficients are all between 30% and 34%. The performances of the different models on the test set are thus quite similar, and show the predictive power of these models to be significant in comparison with random side chain orientations.
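The angle defined in this abstract is the ordinary angle between two 3D vectors, which can be computed from their dot product. The sketch below follows the two vector definitions given in the text; the function name and the coordinates are illustrative assumptions, and the paper's own way of computing the centers is not specified here.

```python
import math


def omega_angle(protein_center, c_alpha, side_chain_center):
    """Angle in degrees between (center -> C-alpha) and (C-alpha -> side chain center)."""
    v1 = [a - c for a, c in zip(c_alpha, protein_center)]
    v2 = [s - a for s, a in zip(side_chain_center, c_alpha)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("the angle is undefined for a zero-length vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_omega = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_omega))


# Hypothetical coordinates: the side chain points sideways relative to the radial direction.
print(omega_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
```

Under this definition, a small angle means the side chain points away from the protein center and an angle near 180 degrees means it points back toward the center.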

7.
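Prediction of protein solubility based on the support vector machine method   Total citations: 1, self-citations: 0, citations by others: 1
Residues in protein sequences are divided by their relative solubility into two classes (surface/interior) and three classes (surface/intermediate/interior) for prediction. Different window widths and parameters were used for training and prediction to obtain the best classification performance, and the results were compared with existing methods. Predictions on the same data set under different classification thresholds show that the support vector machine method outperforms neural network and information-theoretic methods in the overall prediction of protein solubility. The best classification result reached 79.0% for the two-class data and 67.5% for the three-class data, indicating that the support vector machine is an effective method for predicting the solubility of protein residues.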

8.
Protein solubility plays a major role in understanding the crystal growth and crystallization process of proteins. How to predict the propensity of a protein to be soluble or to form inclusion bodies is a long-standing problem that is not yet well resolved. After choosing almost 10,000 protein sequences from the NCBI database and eliminating sequences with 90% homologous similarity using CD-HIT, 5692 sequences remained. Using Chou's pseudo amino acid composition features, we predict soluble proteins with three methods: a support vector machine (SVM), a back propagation neural network (BP Neural Network) and a hybrid method based on SVM and BP Neural Network. Each method is evaluated by a re-substitution test and a 10-fold cross-validation test. In the re-substitution test, the BP Neural Network performs best, with an accuracy of 0.9288 and a Matthews correlation coefficient (MCC) of 0.8513. In the 10-fold cross-validation test, however, the other two methods are better than the BP Neural Network, and the hybrid method based on SVM and BP Neural Network is the best, with an average accuracy of 0.8678 and an average MCC of 0.7233. Although all three methods achieve good evaluation results, the hybrid method is deemed the best according to the performance comparison.

9.
Prediction of protein subcellular localization   Total citations: 6, self-citations: 0, citations by others: 6
Yu CS  Chen YC  Lu CH  Hwang JK 《Proteins》2006,64(3):643-651
Because a protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization, and have found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences; some data sets comprise sequences with up to 80-90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vector derived from sequences; the second-level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches for two benchmark data sets, one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization.
Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set of high homology levels will undoubtedly lead to biased assessment of the performances of the predictive approaches, especially those relying on homology search or sequence annotations. Our two-level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.

10.
Support vector machine applications in bioinformatics   Total citations: 14, self-citations: 0, citations by others: 14

11.
We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three-layer back-propagation network and (2) a learning vector quantization network. The results of these methods are compared to those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used to drive these algorithms consist of the normalized frequency of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values the structural class predictions for each protein (all-alpha, all-beta, or alpha-beta classes) are derived. Examples consisting of 64 previously classified proteins were randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best performing algorithm on the test sets was the learning vector quantization network using 17 inputs, obtaining a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes. The differences between algorithms are in general not statistically significant. These results show that information exists in protein primary sequences that is easily obtainable and useful for the prediction of protein structural class by neural networks as well as by standard statistical clustering algorithms.
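The "normalized frequency of up to 20 amino acid types" used as input here is, in its simplest form, the fraction of each residue type in the sequence. The sketch below shows that basic amino acid composition only; it does not attempt the six hydrophobic pattern features, and the function name and example sequence are illustrative assumptions.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of normalized residue frequencies.

    Characters outside the 20 standard one-letter codes are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical short sequence, used only to show the output shape.
composition = amino_acid_composition("AACD")
print(dict(zip(AMINO_ACIDS, composition)))
```

The resulting fixed-length vector can be fed to any classifier regardless of sequence length, which is why composition features are a common baseline in the methods listed here.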

12.
Li L  Zhang Y  Zou L  Li C  Yu B  Zheng X  Zhou Y 《PloS one》2012,7(1):e31057
With the rapid increase of protein sequences in the post-genomic age, it is challenging to develop accurate and automated methods for reliably and quickly predicting their subcellular localizations. Many approaches have been tried so far, but most used only a single algorithm. In this paper, we propose an ensemble classifier of KNN (k-nearest neighbor) and SVM (support vector machine) algorithms to predict the subcellular localization of eukaryotic proteins based on a voting system. The overall prediction accuracies by the one-versus-one strategy are 78.17%, 89.94% and 75.55% for three benchmark datasets of eukaryotic proteins. The improved prediction accuracies reveal that GO annotations and hydrophobicity of amino acids help to predict subcellular locations of eukaryotic proteins.

13.
Natt NK  Kaur H  Raghava GP 《Proteins》2004,56(1):11-18
This article describes a method developed for predicting transmembrane beta-barrel regions in membrane proteins using machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). The ANN used in this study is a feed-forward neural network with a standard back-propagation training algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information was added to a single sequence as a multiple sequence alignment obtained from PSI-BLAST. We have also developed an SVM-based method using a primary sequence as input and achieved an accuracy of 77.4%. The SVM model was modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, ANN- and SVM-based methods were combined to utilize the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) value of SVM, ANN, and combined method are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using "leave one out cross-validation" (LOOCV). Based on this study, we have developed a Web server, TBBPred, for predicting transmembrane beta-barrel regions in proteins (available at http://www.imtech.res.in/raghava/tbbpred).

14.
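Li GL  Huang W  Sun H  Li YD 《Acta Microbiologica Sinica》2021,61(9):2581-2593
With the arrival of the big data era, turning massive omics data into understandable, visualizable knowledge is one of the major challenges facing bioinformatics. To handle complex, high-dimensional microbiome data, machine learning algorithms have been applied to human microbiome research to reveal the complex mechanisms behind disease. This article first outlines microbiome data processing methods and commonly used machine learning algorithms, such as the support vector machine (SVM), random forest (RF), and artificial neural network (ANN). It then describes the machine learning workflow and its key points and discusses applications of machine learning to predicting host phenotypes from microbiome data. Finally, using the prediction of oral malodor from salivary microbiome data as an example, it builds and evaluates machine learning models and provides R/Python code for use in microbiome research (https://github.com/LiLabZSU/microbioML).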

15.
A support vector machine (SVM) modeling approach for short-term load forecasting is proposed. The SVM learning scheme is applied to the power load data, forcing the network to learn the inherent internal temporal property of power load sequence. We also study the performance when other related input variables such as temperature and humidity are considered. The performance of our proposed SVM modeling approach has been tested and compared with feed-forward neural network and cosine radial basis function neural network approaches. Numerical results show that the SVM approach yields better generalization capability and lower prediction error compared to those neural network approaches.

16.
Several QSAR (quantitative structure-activity relationship) models for predicting the inhibitory activity of 117 Aurora-A kinase inhibitors were developed. The whole dataset was split into a training set and a test set by two different methods: (1) random selection and (2) a Kohonen self-organizing map (SOM). The inhibitory activity of the 117 Aurora-A kinase inhibitors was then predicted using multilinear regression (MLR) analysis and support vector machine (SVM) methods. For both MLR models and both SVM models, correlation coefficients above 0.92 were achieved on the test sets.

17.
The leaf, which is a crucial indicator for evaluating crop status, plays an important role in plants' functions. Determining and monitoring leaf parameters can facilitate the detection and estimation of crop yield, which is essential for food security. Crop monitoring by remote sensing technology is critical to support crop production, especially over large scales. In this study, we developed a methodology to estimate leaf parameters based entirely on vegetation indices (VIs) from remotely sensed imagery in wheat under different management practices. The study examined the utility of VIs calculated from Sentinel-2 data in estimating the leaf area index (LAI) and leaf parameters at wheat farms using machine learning algorithms. Leaf parameters included leaf dry weight (LDW), specific leaf area (SLA) and specific leaf weight (SLW), and the machine learning algorithms were SVM (support vector machine), ANN (artificial neural network) and DNN (deep neural network). Leaf parameters were measured at several developmental stages of wheat in two contrasting environments in southern Iran. The results demonstrated that the DNN algorithm could efficiently predict leaf parameters in southern Iran with an overall precision of >72%, indicating the potential of employing DNN to obtain temporal and spatial distribution data for wheat from Sentinel-2 imagery. Validation of the DNN model generally showed high accuracy (R = 0.80, RMSE = 1.19, and MAE = 0.98) between observed and estimated LAI values. NDVI was also highly sensitive to wheat LDW and SLA parameters, with a good correlation between field measurements and those predicted by the DNN model from Sentinel-2 imagery, with R values of 0.66 and 0.85, respectively. Further, NDVI and PVI (Perpendicular Vegetation Index) were linearly correlated with SLW across both temporal and spatial scales (R = 0.79).
Among the VIs derived from Sentinel-2 imagery to predict wheat leaf parameters, NDVI was more sensitive than the other VIs. This research thus indicated that using Sentinel-2 data within a DNN model could provide a comparatively precise and robust prediction of leaf parameters and yield valuable insights into crop management with high temporal and spatial accuracy.
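For orientation, NDVI is conventionally computed as (NIR - Red) / (NIR + Red); with Sentinel-2 this usually means band 8 (near infrared) and band 4 (red). The sketch below shows that formula together with the RMSE and MAE error measures quoted in the abstract. The function names and all numeric values are illustrative assumptions, not data from the study.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    denom = nir + red
    if denom == 0:
        return 0.0  # assumption: report 0 when both reflectances are zero
    return (nir - red) / denom


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical reflectances and LAI values, for illustration only.
print(round(ndvi(nir=0.5, red=0.1), 4))
print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 4))
print(round(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 4))
```

Because RMSE squares each error before averaging, it penalizes large misses more than MAE does, so reporting both gives a sense of how uneven the prediction errors are.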

18.
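Based on known human Pol II promoter sequence data, sequence content and sequence signal features of promoters were combined to build a support vector machine classifier for promoters. The 6-mer frequencies of promoter sequences were used as discrete source parameters to construct the sequence content features, and the 3-mer frequencies at 24 sites were used as sequence signal features to construct a position weight matrix (PWM); the two resulting sets of parameters were fed into a support vector machine to predict human promoters. Predictive ability was measured with 10-fold cross-validation and an independent data set, and the correlation coefficient exceeded 95%. The results show that the increment-of-diversity algorithm combined with a support vector machine effectively improves the prediction success rate and is a very effective method for eukaryotic promoter prediction.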
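The 6-mer frequency features mentioned in this abstract can be obtained by sliding a window of length 6 along the sequence and counting each substring. The sketch below shows only that counting step, assuming overlapping windows; the function name and the example sequence are hypothetical, and the paper's increment-of-diversity and PWM steps are not reproduced.

```python
from collections import Counter


def kmer_counts(sequence, k=6):
    """Count overlapping k-mers in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    # range() is empty when the sequence is shorter than k, giving an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical 10-base sequence, which contains five overlapping 6-mers.
counts = kmer_counts("ACGTACGTAC", k=6)
print(counts.most_common(1))
```

With four bases there are 4^6 = 4096 possible 6-mers, so the full feature vector is sparse for any single promoter-length sequence.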

19.
A support vector machine (SVM) was applied to predict the vasorelaxation effect of structurally diverse molecules. A good classification model was established, and the prediction accuracy for the training, test, and overall datasets was 93.0%, 82.6%, and 89.5%, respectively. Furthermore, the model was used to predict the activity of a series of prenylated flavonoids. According to the predicted results, eleven molecules (1-11) were selected and synthesized. Their vasodilatory activities were determined experimentally in rat aorta rings that were pretreated with phenylephrine (PE). Structure-activity relationship (SAR) analysis revealed that flavanone derivatives showed the most potent activities, while flavone and chalcone derivatives exhibited medium activities.

20.