Similar articles
1.
Natt NK  Kaur H  Raghava GP 《Proteins》2004,56(1):11-18
This article describes a method developed for predicting transmembrane beta-barrel regions in membrane proteins using machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). The ANN used in this study is a feed-forward neural network with a standard back-propagation training algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information was added to a single sequence as a multiple sequence alignment obtained from PSI-BLAST. We have also developed an SVM-based method using a primary sequence as input and achieved an accuracy of 77.4%. The SVM model was modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, ANN- and SVM-based methods were combined to utilize the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) value of SVM, ANN, and combined method are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using "leave one out cross-validation" (LOOCV). Based on this study, we have developed a Web server, TBBPred, for predicting transmembrane beta-barrel regions in proteins (available at http://www.imtech.res.in/raghava/tbbpred).
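The abstract above reports a Matthews correlation coefficient alongside accuracy. As a rough illustration of how that statistic is usually computed from binary confusion-matrix counts, here is a minimal Python sketch. The function name and the convention of returning 0.0 when the denominator vanishes are assumptions for illustration, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike plain accuracy, the MCC stays informative when one class (for example, non-barrel residues) heavily outnumbers the other, which may be why it is reported next to accuracy here.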

2.
Prediction of neurotoxins based on their function and source   Cited by: 1 (self-citations: 0, by others: 1)
Saha S  Raghava GP 《In silico biology》2007,7(4-5):369-387
We have developed a method, NTXpred, for predicting neurotoxins and classifying them based on their function and origin. The dataset used in this study consists of 582 non-redundant, experimentally annotated neurotoxins obtained from Swiss-Prot. Several modules have been developed for predicting neurotoxins from residue composition using a feed-forward neural network (FNN), a recurrent neural network (RNN), and a support vector machine (SVM); they achieved maximum accuracies of 84.19%, 92.75%, and 97.72%, respectively. In addition, SVM modules have been developed for classifying neurotoxins based on their source (e.g., eubacteria, cnidarians, molluscs, arthropods, and chordates) using amino acid composition and dipeptide composition; they achieved maximum overall accuracies of 78.94% and 88.07%, respectively. The overall accuracy increased to 92.10% when the evolutionary information obtained from PSI-BLAST was combined with the SVM module for source classification. We have also developed SVM modules for classifying neurotoxins based on function using amino acid and dipeptide composition, which achieved overall accuracies of 83.11% and 91.10%, respectively. The overall accuracy of function classification improved to 95.11% when PSI-BLAST output was combined with the SVM module. All the modules developed in this study were evaluated using five-fold cross-validation. NTXpred is available at www.imtech.res.in/raghava/ntxpred/ and at a mirror site, http://bioinformatics.uams.edu/mirror/ntxpred.
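Amino acid composition and dipeptide composition, the two feature sets named in this abstract, are fixed-length vectors of 20 and 400 fractions. The sketch below shows one common way to compute them. The function names and the choice to skip non-standard residues are assumptions and may differ from what NTXpred actually does.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue, in AMINO_ACIDS order."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * 20
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent standard residues."""
    seq = seq.upper()
    valid = set(AMINO_ACIDS)
    # Overlapping adjacent pairs; pairs with a non-standard residue are skipped.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in valid and seq[i + 1] in valid]
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    if not pairs:
        return [0.0] * 400
    counts = Counter(pairs)
    return [counts[k] / len(pairs) for k in keys]
```

Both vectors have the same length for every protein regardless of sequence length, which is what lets them serve as direct input to an SVM or a neural network.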

3.
Machine learning (ML) models are a leading analytical technique used to monitor, map and quantify land use and land cover (LULC) and its change over time. Models such as k-nearest neighbour (kNN), support vector machines (SVM), artificial neural networks (ANN), and random forests (RF) have been used effectively to classify LULC types at a range of geographical scales. However, ML models have not been widely applied in African tropical regions due to methodological challenges that arise from relying on the coarse-resolution satellite images available for these areas. In this study, we compared the performance of four ML algorithms (kNN, SVM, ANN and RF) applied to LULC monitoring within the Mayo Rey department, North Province, Cameroon. We used satellite data from the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) combined with Landsat 8 Operational Land Imager (OLI) images of northern Cameroon for November 2000 and November 2020. Our results showed that all four classification algorithms produced relatively high accuracy (overall classification accuracy >80%), with the RF model (>90% classification accuracy) outperforming the kNN, SVM, and ANN models. We found that approximately 7% of all forested areas (dense forest and woody savanna) were converted to other land cover types between 2000 and 2020; this forest loss is particularly associated with an expansion of both croplands and built-up areas. Our study represents a novel application and comparison of statistical and ML approaches to LULC monitoring using coarse-resolution satellite images in an African tropical forest and savanna setting. The resulting land cover maps serve as an important baseline that will be useful to the Cameroon government for policy development, conservation planning, urban planning, and deforestation and agricultural monitoring.

4.
5.
We have introduced a new method of protein secondary structure prediction based on the theory of the support vector machine (SVM). SVM represents a new approach to supervised pattern classification that has been successfully applied to a wide range of pattern recognition problems, including object recognition, speaker identification, and gene function prediction from microarray expression profiles. In these cases, the performance of SVM either matches or is significantly better than that of traditional machine learning approaches, including neural networks. The first use of the SVM approach to predict protein secondary structure is described here. Unlike previous studies, we first constructed several binary classifiers, then assembled a tertiary classifier for three secondary structure states (helix, sheet and coil) based on these binary classifiers. The SVM method achieved a segment overlap accuracy of SOV = 76.2% through sevenfold cross-validation on a database of 513 non-homologous protein chains with multiple sequence alignments, which outperforms existing methods. The three-state overall per-residue accuracy Q3 reached 73.5%, which is at least comparable to existing single prediction methods. Furthermore, a useful "reliability index" for the predictions was developed. In addition, SVM has many attractive features, including effective avoidance of overfitting, the ability to handle large feature spaces, and information condensing of the given data set. The SVM method can be conveniently applied to many other pattern classification tasks in biology.
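Two ideas in this abstract lend themselves to a short sketch: combining per-state binary classifiers into a three-state call, and scoring the result with the per-residue accuracy Q3. The abstract does not say how the binary classifiers were combined, so the "largest decision value wins" rule below is an assumption, as are both function names.

```python
def combine_one_vs_rest(scores):
    """Pick the state whose binary classifier gives the largest decision value.

    `scores` maps a state label ("H", "E", "C") to a decision value for
    one residue. This argmax rule is an assumed combination scheme.
    """
    return max(scores, key=scores.get)


def q3_accuracy(predicted, observed):
    """Three-state per-residue accuracy (percent of residues predicted correctly)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(observed) == 0:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For instance, `q3_accuracy("HHHEECCC", "HHHEEECC")` compares eight residues, seven of which agree. The segment overlap score (SOV) also reported above is a different, segment-based measure and is not covered by this sketch.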

6.
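Diagnosis of hepatocellular carcinoma from ³¹P magnetic resonance spectroscopy using support vector machines   Cited by: 1 (self-citations: 1, by others: 0)
The support vector machine (SVM) is a machine learning method developed on the basis of statistical learning theory and is widely used in pattern recognition. SVM models were applied to ³¹P magnetic resonance spectroscopy data to classify liver tissue, distinguishing hepatocellular carcinoma, cirrhosis, and normal liver. SVM classifiers based on polynomial and radial basis function kernels were compared, and recognition rates for the three liver classes were obtained. The experiments show that an SVM classification model based on ³¹P magnetic resonance spectroscopy data can make diagnostic predictions for the liver in vivo.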
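The two kernel families compared in this abstract, polynomial and radial basis function (RBF), can be written down in a few lines. This is only a sketch of the standard textbook forms. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions, since the abstract gives no kernel settings.

```python
import math


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

Here `x` and `y` would be feature vectors derived from two spectra. The RBF kernel depends only on the distance between them, while the polynomial kernel depends on their inner product, so the two can rank the same pairs of samples quite differently.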

7.
In this paper, a new method based on artificial neural networks (ANN) is introduced for identifying pathogenic antibodies in systemic lupus erythematosus (SLE). dsDNA-binding antibodies have been implicated in the pathogenesis of this autoimmune disease. To identify these antibodies, the protein sequences of 42 dsDNA-binding and 608 non-dsDNA-binding antibodies were extracted from the Kabat database and encoded using a physicochemical property of their amino acids, namely hydrophilicity. The encoded antibodies were used as the training patterns of a general regression neural network (GRNN). Simulation results show that the accuracy of the proposed method in recognizing dsDNA-binding antibodies is 83.2%. We have also investigated the roles of the light and heavy chains of anti-dsDNA antibodies in binding to DNA. Simulation results concur with the published experimental findings that, in binding to DNA, the heavy chain of anti-dsDNA antibodies is more important than the light chain.

8.
9.
Ubiquitin regulates protein turnover in a cell by closely regulating the degradation of specific proteins. Because this regulatory role is very important, I have analyzed ubiquitin-like proteins using an artificial neural network, support vector machines and a hidden Markov model (HMM). The methods were trained and tested on a set of 373 ubiquitin proteins and 373 non-ubiquitin proteins obtained from the Entrez protein database. The artificial neural network and support vector machine were trained and tested using both physicochemical properties and PSSM matrices generated by PSI-BLAST, while the HMM-based method used the sequences directly for training and testing. The performance of each method on the test sequences was measured by accuracy, specificity, sensitivity and Matthews correlation coefficient. The highest accuracy of 90.2%, with a specificity of 87.04% and a sensitivity of 94.08%, was achieved using the support vector machine model with PSSM matrices. Accuracies of 86.82%, 83.37%, 80.18% and 72.11% were obtained for the support vector machine with physicochemical properties, the neural network with PSSM matrices, the neural network with physicochemical properties, and the hidden Markov model, respectively. As the accuracy of the SVM model is better with both physicochemical properties and PSSM matrices, it is concluded that kernel methods such as SVM outperform neural networks and hidden Markov models.
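Accuracy, sensitivity, and specificity, as reported in this abstract, all follow from the four cells of a binary confusion matrix. A minimal sketch, with a hypothetical function name and made-up counts in the usage note (the paper's own counts are not given in the abstract):

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from binary confusion-matrix counts."""
    return {
        # Fraction of all sequences classified correctly.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Fraction of true positives (e.g., ubiquitin proteins) recovered.
        "sensitivity": tp / (tp + fn),
        # Fraction of true negatives (e.g., non-ubiquitin proteins) recovered.
        "specificity": tn / (tn + fp),
    }
```

With illustrative counts `binary_metrics(90, 80, 20, 10)`, 90 of 100 positives and 80 of 100 negatives are recovered. On a balanced set such as the 373/373 split described above, accuracy is simply the average of sensitivity and specificity.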

10.
Iqbal S  Masood K  Jafer O 《Bioinformation》2011,6(6):237-239
Two types of antiviral treatment, namely interferon and nucleoside/nucleotide analogues, are available for hepatitis infections. Selecting the drug and dose on the basis of known pharmacokinetic and pharmacodynamic data is important, and a lack of sufficient pharmacokinetic information for a drug may prevent the desired results. An artificial neural network (ANN) provides a novel, model-independent approach to pharmacokinetic and pharmacodynamic data. An ANN model was created by supervised learning on a sample of 90 patients to predict the treatment strategy (lamivudine only or lamivudine plus interferon) on the basis of viral load, liver function test, visit number, treatment duration, ethnic area, sex, and age. The model was trained with 68 (77.3%) samples and tested with 20 (22.7%) samples. The model produced 92% accuracy with 92.8% sensitivity and 83.3% specificity.

11.
12.
《Theriogenology》2015,84(9):1445-1450
The freezing of bull semen significantly hampers sperm motility, which reduces the conception rate in dairy cattle. Predicting post-thaw motility (PTM) before freezing would help in deciding whether to discard or freeze the germplasm. The artificial neural network (ANN) methodology has been found useful in prediction and classification problems related to animal science; hence, the present study was undertaken to compare the efficiency of ANN in predicting PTM on the basis of the number of ejaculates, volume, and sperm concentration. The combined effect of Y-specific microsatellite alleles on the actual and predicted PTM was also studied. The results revealed that the prediction accuracy of PTM based on the semen quality parameters was comparatively low because of high variability in the data set. The ANN gave better prediction accuracy (34.88%) than the multiple regression analysis models (32.04%). The root mean square error was lower for ANN (8.4353) than for the multiple regression analysis (8.6168). The haplotype, or combined effect of microsatellite alleles, on actual and predicted PTM was found to be highly significant (P < 0.01). On the basis of these results, it was concluded that the ANN methodology can be used for prediction of PTM in crossbred bulls.
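The root mean square error used above to compare the ANN with multiple regression is easy to state in code. The sketch below uses a hypothetical function name and assumes the two inputs are equal-length lists of actual and predicted PTM values.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Because the errors are squared before averaging, a few badly mispredicted ejaculates weigh heavily on the RMSE. The statistic is in the same units as PTM itself.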

13.
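An SVM-based drug target prediction method and its application   Cited by: 1 (self-citations: 0, by others: 1)
Objective: To develop a new and effective drug target prediction method using SVM, based on the primary-structure similarity between known drug targets and potential drug target proteins. Methods: A training sample set was constructed, primary-structure features were extracted from the protein sequences, and the data were preprocessed. The optimal kernel function was selected, parameters were optimized and features were selected, and the optimal prediction model was trained and its predictive performance tested. Using proteins of the G protein-coupled receptor family as the prediction set, the optimal classification model was applied to mine potential drug targets. Results: The optimal SVM-based classification model achieved an average prediction accuracy of 81.03%. When the optimal classifier was applied to the constructed G protein prediction set, several of the 20 top-ranked proteins were found to be disease-related. In particular, two of these G proteins are listed in the Therapeutic Target Database (TTD) as drug targets in clinical trials. Conclusion: The drug target prediction method based on SVM and protein sequence features is effective, and the potential drug targets it predicts can serve as a reference for discovering new drug targets.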

14.
N. Bhaskar  M. Suchetha 《IRBM》2021,42(4):268-276
Objectives: In this paper, we propose a computationally efficient Correlational Neural Network (CorrNN) learning model and an automated diagnosis system for detecting Chronic Kidney Disease (CKD). A Support Vector Machine (SVM) classifier is integrated with the CorrNN model to improve the prediction accuracy. Material and methods: The proposed hybrid model is trained and tested with a novel sensing module. We monitored the concentration of urea in saliva samples to detect the disease. Experiments were carried out to test the model with real-time samples and to compare its performance with a conventional Convolutional Neural Network (CNN) and other traditional data classification methods. Results: The proposed method outperforms the conventional methods in terms of computational speed and prediction accuracy. The combined CorrNN-SVM network achieved a prediction accuracy of 98.67%. The experimental evaluations show a reduction in overall computation time of about 9.85% compared to the conventional CNN algorithm. Conclusion: The use of the SVM classifier has improved the capability of the network to make accurate predictions. The proposed framework substantially advances the current methodology and provides more precise results than other data classification methods.

15.
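Sleep staging based on cortical mutual information theory   Cited by: 4 (self-citations: 0, by others: 4)
Sleep staging is the prerequisite for, and a basic component of, sleep analysis and sleep quality assessment. The sleep staging methods currently in international use rely on EEG signals together with other physiological signals (such as the electromyogram and electro-oculogram) and require manual scoring. Cortical mutual information theory is a powerful tool for studying changes in brain function. The complexity of the mutual information time series among four sleep EEG leads was computed dynamically, and a three-layer artificial neural network was used to classify sleep into six stages. Tests on 720 sleep segments from different stages in 6 subjects showed an overall agreement of 90.83% between the system's staging and manual staging, and dynamic automatic sleep staging was achieved. The learning capability of the neural network can further improve the system's accuracy, so that it gradually approaches or reaches the level of manual staging.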
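Mutual information between two EEG leads, the quantity this abstract builds on, can be estimated from co-occurrence counts once the signals are discretized. The sketch below works on already-discretized sequences. The binning step, the function name, and the use of base-2 logarithms are all assumptions, since the abstract does not describe the estimator.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate mutual information (in bits) between two discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

Applied over a sliding window to each pair of leads, an estimator like this would yield the kind of mutual information time series whose complexity the study feeds to the neural network.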

16.
Predicting enzymes and non-enzymes from protein sequence information is still an open problem in bioinformatics, and it is becoming more important as the amount of sequence data grows exponentially over time. We describe a novel approach for predicting whether a protein is an enzyme or a non-enzyme from its amino acid sequence using an artificial neural network (ANN). Using 61 sequence-derived features alone, we achieved 79% correct prediction of enzymes/non-enzymes (in a set of 660 proteins). For the complete set of 61 parameters under 5-fold cross-validated classification, the ANN yields a superior model (accuracy = 78.79 ± 6.86%, Q(pred) = 74.734 ± 17.08%, sensitivity = 84.48 ± 6.73%, specificity = 77.13 ± 13.39%). The second ANN module is based on the PSSM matrix. Using the same 5-fold cross-validation set, this ANN model predicts enzymes/non-enzymes with higher accuracy (accuracy = 80.37 ± 6.59%, Q(pred) = 67.466 ± 12.41%, sensitivity = 0.9070 ± 3.37%, specificity = 74.66 ± 7.17%).
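Figures of the form "mean ± spread over 5 folds", as quoted in this abstract, come from splitting the data into folds, scoring each held-out fold, and summarizing the fold scores. The sketch below shows that bookkeeping only. The contiguous, unshuffled split and the use of the sample standard deviation are assumptions, and both function names are hypothetical.

```python
import statistics


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Distribute the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def summarize(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)
```

For a set of 660 proteins and `k=5`, each fold holds 132 test proteins and every protein is tested exactly once. In practice the data would normally be shuffled, and often stratified by class, before splitting.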

17.
Most prediction methods for secretory proteins require a correct N-terminal end of the preprotein for correct classification. As large-scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many proteins are encoded without the correct N-terminus, leading to incorrect prediction. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (also known as classical and non-classical secreted proteins, respectively), using two machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using five-fold cross-validation. First, ANN-based modules were developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition; they achieved accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid composition and dipeptide composition achieved accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed for predicting secretory proteins based on similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach by integrating the amino acid and dipeptide composition based SVM modules with the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than the individual modules. Using the hybrid module, we also achieved a sensitivity of 60.4% at a low false-positive rate of 5%. Based on this study, a web server, SRTpred, has been developed for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins; it is available from http://www.imtech.res.in/raghava/srtpred/.

18.
Lu H  Jiang W  Ghiassi M  Lee S  Nitin M 《PloS one》2012,7(1):e29704
Leaf characters have been successfully utilized to classify Camellia (Theaceae) species; however, leaf characters combined with supervised pattern recognition techniques have not been previously explored. We present results of using leaf morphological and venation characters of 93 species from five sections of genus Camellia to assess the effectiveness of several supervised pattern recognition techniques for classifications and compare their accuracy. Clustering approach, Learning Vector Quantization neural network (LVQ-ANN), Dynamic Architecture for Artificial Neural Networks (DAN2), and C-support vector machines (SVM) are used to discriminate 93 species from five sections of genus Camellia (11 in sect. Furfuracea, 16 in sect. Paracamellia, 12 in sect. Tuberculata, 34 in sect. Camellia, and 20 in sect. Theopsis). DAN2 and SVM show excellent classification results for genus Camellia with DAN2's accuracy of 97.92% and 91.11% for training and testing data sets respectively. The RBF-SVM results of 97.92% and 97.78% for training and testing offer the best classification accuracy. A hierarchical dendrogram based on leaf architecture data has confirmed the morphological classification of the five sections as previously proposed. The overall results suggest that leaf architecture-based data analysis using supervised pattern recognition techniques, especially DAN2 and SVM discrimination methods, is excellent for identification of Camellia species.

19.
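Remote sensing estimation of natural forest biomass based on artificial neural networks   Cited by: 5 (self-citations: 0, by others: 5)
Based on Landsat TM remote sensing images, and taking the Wangqing natural forest area in Jilin Province as an example, a back-propagation (B-P) neural network was used to build a nonlinear remote sensing model system for forest biomass. In addition to remote sensing data, the system introduced topographic factors (elevation, slope, aspect, site type, etc.) as independent model variables. The standard B-P neural network was enhanced by compressing the input data and strengthening the training algorithm. Simulation results show that the enhanced B-P neural network converges quickly and has strong self-learning and adaptive capabilities; it makes maximal use of the prior knowledge in the sample set, automatically extracts a reasonable model, and yields predictions that realistically reflect actual conditions. For coniferous, broad-leaved, and mixed coniferous and broad-leaved forests, the mean relative errors of the simulations were -1.47%, 2.38%, and 3.56%, and the mean absolute relative errors were 6.33%, 8.46%, and 8.91%, respectively, indicating satisfactory estimation. The model system was used to generate a quantitative forest biomass distribution map of the study area, with an overall accuracy of 88.04%.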
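This abstract reports two related statistics: the mean relative error, which can be negative because over- and under-estimates cancel, and the mean of the absolute relative errors, which cannot. The sketch below shows the distinction. The sign convention (estimated minus observed) and the function names are assumptions, since the abstract does not define them.

```python
def relative_errors(estimated, observed):
    """Per-sample relative error, (estimated - observed) / observed."""
    if len(estimated) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return [(e - o) / o for e, o in zip(estimated, observed)]


def mean_relative_error(estimated, observed):
    """Mean relative error in percent; positive and negative errors cancel."""
    errors = relative_errors(estimated, observed)
    return 100.0 * sum(errors) / len(errors)


def mean_absolute_relative_error(estimated, observed):
    """Mean of absolute relative errors in percent; errors do not cancel."""
    errors = relative_errors(estimated, observed)
    return 100.0 * sum(abs(err) for err in errors) / len(errors)
```

A model that overestimates one plot by 10% and underestimates another by 10% has a mean relative error near zero but a mean absolute relative error of 10%. This is why the abstract's signed errors (for example, -1.47%) are much smaller in magnitude than its absolute ones (6.33%).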

20.
Aims: To establish an identification system for probiotic Saccharomyces cerevisiae strains based on artificial neural network (ANN)-assisted Fourier-transform infrared (FTIR) spectroscopy to improve quality control of animal feed. Methods and Results: The ANN-based system for differentiating environmental from probiotic S. cerevisiae strains comprises five authorized feed additive strains plus environmental strains isolated from different habitats. A total of 108 isolates were used as reference strains to create the ANN. DHPLC analysis and δ-PCR were used as reference methods to type probiotic yeast isolates. The performance of the FTIR-ANN was tested in an internal validation using unknown spectra of each reference strain. This validation step yielded a classification rate of 99.1%. For an external validation, a test data set comprising 965 spectra of 63 probiotic and environmental S. cerevisiae isolates unknown to the ANN was used, resulting in a classification rate of 98.2%. Conclusions: Our results demonstrate that probiotic S. cerevisiae strains in feed can be differentiated successfully from environmental isolates using both genotypic approaches and ANN-based FTIR spectroscopy. Significance and Impact of the Study: FTIR-based artificial neural network analysis provides a rapid and inexpensive technique for yeast identification both at the species and at the strain level in routine diagnostic laboratories, using a single sample preparation.
