Similar literature
 20 similar documents found
1.
Mitochondria are considered one of the core organelles of eukaryotic cells; hence, prediction of mitochondrial proteins is one of the major challenges in the field of genome annotation. This study describes a method, MitPred, developed for predicting mitochondrial proteins with high accuracy. The data set used in this study was obtained from Guda, C., Fahy, E. & Subramaniam, S. (2004) Bioinformatics 20, 1785-1794. First, support vector machine-based modules were developed using amino acid and dipeptide composition of proteins, achieving accuracies of 78.37% and 79.38%, respectively. The accuracy of prediction further improved to 83.74% when split amino acid composition (25 N-terminal, 25 C-terminal, and remaining residues) of proteins was used. Then a BLAST search and the support vector machine-based method were combined to get 88.22% accuracy. Finally, we developed a hybrid approach that combined hidden Markov model profiles of domains (exclusively found in mitochondrial proteins) and the support vector machine-based method. We were able to predict mitochondrial proteins with 100% specificity at a 56.36% sensitivity rate and with 80.50% specificity at 98.95% sensitivity. The method estimated 9.01, 6.35, 4.84, 3.95, and 4.25% of proteins as mitochondrial in Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, mouse, and human proteomes, respectively. MitPred is based on this hybrid approach.
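The split amino acid composition feature mentioned above can be sketched as follows. This is only an illustration, not the MitPred implementation: the function names, the feature ordering, and the handling of sequences shorter than 50 residues are assumptions made here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Return the fraction of each of the 20 amino acids in a segment."""
    counts = Counter(segment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def split_composition(sequence, n_term=25, c_term=25):
    """Build a 60-dimensional vector: N-terminal, C-terminal, then remainder.

    Assumption: for short sequences the C-terminal part starts where the
    N-terminal part ends, so no residue is counted twice.
    """
    seq = sequence.upper()
    cut = max(n_term, len(seq) - c_term)
    head = seq[:n_term]        # first 25 residues
    tail = seq[cut:]           # last 25 residues (or fewer)
    middle = seq[n_term:cut]   # everything in between
    return composition(head) + composition(tail) + composition(middle)
```

Each of the three blocks sums to 1 when the corresponding segment is non-empty, so the vector can be passed to a classifier without further scaling.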

2.
MOTIVATION: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system. RESULTS: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known 'False Positives' problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs converge fast and lead to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training.
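Two ideas from the abstract above lend themselves to a small sketch: the number of pairwise classifiers an all-against-all scheme needs, and majority voting over several predictions. The sketch assumes that "all-against-all" means one binary classifier per pair of classes, which is the usual reading; the function names and the tie-breaking rule are hypothetical.

```python
from collections import Counter


def n_pairwise_classifiers(n_classes):
    """Number of binary classifiers if every pair of classes gets its own."""
    return n_classes * (n_classes - 1) // 2


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
```

For the 27 folds mentioned in the abstract, `n_pairwise_classifiers(27)` gives 351 pairwise classifiers, which shows why training cost grows quickly with the number of classes.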

3.
4.
Yuan Z, Burrage K, Mattick JS. Proteins, 2002, 48(3): 566-570
A Support Vector Machine learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes have been explored to find how they affect the prediction performance. Using a cut-off threshold of 15% that splits the dataset evenly (an equal number of exposed and buried residues), this method was able to achieve a prediction accuracy of 70.1% for single sequence input and 73.9% for multiple alignment sequence input, respectively. The prediction of three and more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, our results further suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis.
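A sliding-window input of the kind explored above can be sketched as a one-hot encoding of the residues around a central position. The abstract does not describe the actual encoding, so the window size, the zero padding at the sequence ends, and the function name are assumptions for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_window(sequence, center, window=7):
    """One-hot encode the residues in an odd-sized window around `center`.

    Positions outside the sequence, and non-standard residues, are encoded
    as all zeros (an assumption of this sketch).
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features
```

A window of 7 residues yields a 140-dimensional vector per position; larger windows add context at the cost of more features.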

5.
Sun XD, Huang RB. Amino Acids, 2006, 30(4): 469-475
Summary. The support vector machine, a machine-learning method, is used to predict the four structural classes, i.e. mainly α, mainly β, α–β and fss, from the topology level of the CATH protein structure database. For the binary classification, any two structural classes which do not share any secondary structure elements, such as α and β, could be classified with as high as 90% accuracy. The accuracy, however, decreases to less than 70% if the structural classes to be classified contain structure elements in common. Our study also shows that feature spaces of dimension 20^2 = 400 (for dipeptides) and 20^3 = 8000 (for tripeptides) give nearly the same prediction accuracy. Among these 4 structural classes, multi-class classification gives an overall accuracy of about 52%, indicating that the multi-class classification technique in support vector machines may still need to be further improved in future investigation.
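The dipeptide and tripeptide feature spaces discussed above have 400 and 8000 dimensions because they count every ordered pair or triple of the 20 amino acids. A minimal sketch of such a k-mer composition follows; the function name and the normalisation by total k-mer count are assumptions, not details taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, k=2):
    """Return the normalised frequency of every possible k-mer (20**k values)."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(sequence) - k + 1):
        j = index.get(sequence[i:i + k])
        if j is not None:  # skip k-mers containing non-standard residues
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

Calling it with `k=2` gives the 400-dimensional dipeptide vector and `k=3` the 8000-dimensional tripeptide vector.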

6.
Discrimination of outer membrane proteins using support vector machines   (cited by: 3 in total; 0 self-citations; 3 by others)
MOTIVATION: Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important task both for dissecting outer membrane proteins (OMPs) from genomic sequences and for the successful prediction of their secondary and tertiary structures. RESULTS: We have developed a method based on support vector machines using amino acid composition and residue pair information. Our approach with amino acid composition has correctly predicted the OMPs with a cross-validated accuracy of 94% in a set of 208 proteins. Further, this method has successfully excluded 633 of 673 globular proteins and 191 of 206 alpha-helical membrane proteins. We obtained an overall accuracy of 92% for correctly picking up the OMPs from a dataset of 1087 proteins belonging to all different types of globular and membrane proteins. Furthermore, residue pair information improved the accuracy from 92 to 94%. This accuracy of discriminating OMPs is higher than that of other methods in the literature, which could be used for dissecting OMPs from genomic sequences. AVAILABILITY: Discrimination results are available at http://tmbeta-svm.cbrc.jp.

7.
This paper presents a composite multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. The work is an extension of our previous predictor PProwler v1.1, which is itself built upon the series of predictors SignalP and TargetP. In this study we outline experiments conducted to improve the classifier design. The major improvement came from using support vector machines as a "smart gate" sorting the outputs of several different targeting peptide detection networks. Our final model (PProwler v1.2) gives MCC values of 0.873 for non-plant and 0.849 for plant proteins. The model improves upon the accuracy of our previous subcellular localization predictor (PProwler v1.1) by 2% for plant data (which represents a 7.5% improvement upon TargetP).
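The MCC values quoted above refer to the Matthews correlation coefficient. The sketch below computes the standard binary form from confusion-matrix counts; whether the paper used this binary form or a multi-class variant is not stated in the abstract, and returning 0.0 for a zero denominator is a convention assumed here.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect prediction), which makes it more informative than raw accuracy on unbalanced data.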

8.
Predicting hand and finger posture during grasping tasks is an important issue in biomechanics. In this paper, a technique based on neural networks is proposed to learn the inverse kinematics mapping between the fingertip 3D position and the corresponding joint angles. Finger movements are obtained by an instrumented glove and are mapped to a multichain model of the hand. From the desired fingertip position, the neural networks predict the corresponding finger joint angles while keeping the subject-specific coordination patterns. Two sets of movements are considered in this study. The first one, the training set, consisting of free finger movements, is used to construct the mapping between fingertip position and joint angles. The second one, constructed for testing purposes, is composed of a sequence of grasping tasks of everyday-life objects. The maximal mean error between the measured fingertip position and the fingertip position obtained from simulated joint angles and forward kinematics is 0.99 ± 0.76 mm for the training set and 1.49 ± 1.62 mm for the test set. Also, the maximal RMS error of joint angle prediction is 2.85 degrees and 5.10 degrees for the training and test sets, respectively, while the maximal mean joint angle prediction error is -0.11 ± 4.34 degrees and -2.52 ± 6.71 degrees for the training and test sets, respectively. Results relative to the learning and generalization capabilities of this architecture are also presented and discussed.

9.

Background  

Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene regulation. In this paper, we developed various SVM modules for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins.

10.
Recently, two different models have been developed for predicting gamma-turns in proteins by Kaur and Raghava [2002. An evaluation of beta-turn prediction methods. Bioinformatics 18, 1508-1514; 2003. A neural-network based method for prediction of gamma-turns in proteins from multiple sequence alignment. Protein Sci. 12, 923-929]. However, the major limitation of previous methods is their inability to predict gamma-turn types. Thus, there is a need to predict gamma-turn types using an approach which will be useful in overall tertiary structure prediction. In this work, support vector machines (SVMs), a powerful model, are proposed for predicting gamma-turn types in proteins. The high rates of prediction accuracy showed that the formation of gamma-turn types is evidently correlated with the sequence of tripeptides, and hence can be approximately predicted based on the sequence information of the tripeptides alone.

11.
An incorrect version of Figure 3 was published in the above article; the corrected version is reproduced below.

12.
Artificial neural networks are made up of highly interconnected layers of simple neuron-like nodes. The neurons act as non-linear processing elements within the network. An attractive property of artificial neural networks is that, given the appropriate network topology, they are capable of learning and characterising non-linear functional relationships. Furthermore, the structure of the resulting neural network based process model may be considered generic, in the sense that little prior process knowledge is required in its determination. The methodology therefore provides a cost-efficient and reliable process modelling technique. One area where such a technique could be useful is biotechnological systems. Here, for example, the use of a process model within an estimation scheme has long been considered an effective means of overcoming inherent on-line measurement problems. However, the development of an accurate process model is extremely time consuming and often results in a model of limited applicability. Artificial neural networks could therefore prove to be a useful model building tool when striving to improve bioprocess operability. Two large-scale industrial fermentation systems have been considered as test cases: a fed-batch penicillin fermentation and a continuous mycelial fermentation. Both systems serve to demonstrate the utility, flexibility and potential of the artificial neural network approach to process modelling.

13.
Prediction of beta-turns in proteins using neural networks   (cited by: 7 in total; 0 self-citations; 7 by others)
The use of neural networks to improve empirical secondary structure prediction is explored with regard to the identification of the position and conformational class of beta-turns, a four-residue chain reversal. Recently an algorithm was developed for beta-turn predictions based on the empirical approach of Chou and Fasman using different parameters for three classes (I, II and non-specific) of beta-turns. In this paper, using the same data, an alternative empirical prediction method is derived based on neural networks, a general learning algorithm extensively used in artificial intelligence. Thus the results of the two approaches can be compared. The most severe test of prediction accuracy is the percentage of turn predictions that are correct, and the neural network gives an overall improvement from 20.6% to 26.0%. The proportion of correctly predicted residues is 71%, compared to a chance level of about 58%. Thus neural networks provide a method of obtaining more accurate predictions from empirical data than a simpler method of deriving propensities.

14.
Artificial neural networks (ANNs) have been used for the recognition of non-linear patterns, a characteristic of bioprocesses like wine production. In this work, ANNs were tested to predict problems of wine fermentation. A database of about 20,000 data points from industrial fermentations of Cabernet Sauvignon and 33 variables was used. Two different ways of inputting data into the model were studied, by points and by fermentation. Additionally, different sub-cases were studied by varying the predictor variables (total sugar, alcohol, glycerol, density, organic acids and nitrogen compounds) and the time of fermentation (72, 96 and 256 h). The input of data by fermentations gave better results than the input of data by points. In fact, it was possible to predict 100% of normal and problematic fermentations using three predictor variables: sugars, density and alcohol at 72 h (3 days). Overall, ANNs were capable of achieving 80% correct predictions using only one predictor variable at 72 h; however, it is recommended to add more fermentations to confirm this promising result.

15.
16.

Background

β-turns are a secondary structure type that plays an essential role in molecular recognition, protein folding, and stability. They are the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated in them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and it can provide valuable insights and inputs for fold recognition and drug design.

Results

We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets, respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features.

Conclusions

In this paper, we present a comprehensive approach for β-turns prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.

17.
Glucose is a simple sugar that plays an essential role in many basic metabolic and signaling pathways. Many proteins have binding sites that are highly specific to glucose. The exponential increase of genomic data has revealed the identity of many proteins that seem to be central to biological processes, but whose exact functions are unknown. Many of these proteins seem to be associated with disease processes. Being able to predict glucose‐specific binding sites in these proteins will greatly enhance our ability to annotate protein function and may significantly contribute to drug design. We hereby present the first glucose‐binding site classifier algorithm. We consider the sugar‐binding pocket as a spherical spatio‐chemical environment and represent it as a vector of geometric and chemical features. We then perform Random Forests feature selection to identify key features and analyze them using support vector machine classification. Our work shows that glucose binding sites can be modeled effectively using a limited number of basic chemical and residue features. Using a leave‐one‐out cross‐validation method, our classifier achieves an 8.11% error, an 89.66% sensitivity and a 93.33% specificity over our dataset. From a biochemical perspective, our results support the relevance of ordered water molecules and ions in determining glucose specificity. They also reveal the importance of carboxylate residues in glucose binding and the high concentration of negatively charged atoms in direct contact with the bound glucose molecule. Proteins 2009. © 2009 Wiley‐Liss, Inc.
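The error, sensitivity and specificity figures above are all derived from the same confusion matrix. The sketch below shows the arithmetic. The counts used in the usage note are hypothetical: they were chosen here only because they reproduce the reported percentages, and the abstract does not state the actual dataset composition.

```python
def rates(tp, fn, tn, fp):
    """Return error rate, sensitivity and specificity as fractions."""
    total = tp + fn + tn + fp
    return {
        "error": (fp + fn) / total,       # share of all cases misclassified
        "sensitivity": tp / (tp + fn),    # share of true sites that are found
        "specificity": tn / (tn + fp),    # share of non-sites correctly rejected
    }
```

For example, `rates(tp=26, fn=3, tn=42, fp=3)` gives an error of 6/74, a sensitivity of 26/29 and a specificity of 42/45, which round to the 8.11%, 89.66% and 93.33% quoted in the abstract.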

18.
The lack of sensors for some relevant state variables in fermentation processes can be addressed by developing appropriate software sensors. In this work, NARX-ANN, NARMAX-ANN, NARX-SVM and NARMAX-SVM models are compared when acting as software sensors of biomass concentration for a solid substrate cultivation (SSC) process. Results show that NARMAX-SVM outperforms the other models, with an SMAPE index under 9 for 20% amplitude noise. In addition, NARMAX models perform better than NARX models under the same noise conditions because of their better predictive capabilities, as they include prediction errors as inputs. In the case of perturbation of the initial conditions of the autoregressive variable, NARX models exhibited better convergence capabilities. This work also confirms that a difficult-to-measure variable, like biomass concentration, can be estimated on-line from easy-to-measure variables like CO2 and O2 using an adequate software sensor based on computational intelligence techniques.
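The SMAPE index mentioned above is the symmetric mean absolute percentage error. Several variants exist and the abstract does not say which one was used, so the sketch below assumes one common definition, with the mean of the absolute actual and predicted values in the denominator, expressed in percent.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Assumed definition: 100/n * sum(|p - a| / ((|a| + |p|) / 2)).
    Terms where both values are zero contribute nothing.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:
            total += abs(p - a) / denom
    return 100.0 * total / len(actual)
```

Under this definition, an estimate that is consistently about 10% above the measured biomass concentration gives an SMAPE of roughly 9.5.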

19.
Cheng J, Randall A, Baldi P. Proteins, 2006, 62(4): 1125-1132
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations, leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.

20.

Background  

The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, highly correlated with secondary structures, provide crucial information about the local three-dimensional structure.
