Similar literature
1.
Intrinsically disordered proteins are an important class of proteins with unique functions and properties. Here, we have applied a support vector machine (SVM) trained on naturally occurring disordered and ordered proteins to examine the contribution of various parameters (vectors) to recognizing proteins that contain disordered regions. We find that an SVM that incorporates only amino acid composition has a recognition accuracy of 87 ± 2%. This result suggests that composition alone is sufficient to accurately recognize disorder. Interestingly, SVMs using reduced sets of amino acids based on chemical similarity preserve high recognition accuracy. A set as small as four retains an accuracy of 84 ± 2%; this suggests that general physicochemical properties rather than specific amino acids are important factors contributing to protein disorder.
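
The composition features described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the four-group reduction in REDUCED_GROUPS is a hypothetical grouping chosen only to show the idea of a reduced alphabet, and the function names are invented for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical four-group reduction by chemical similarity.
# This grouping is an assumption for illustration, not the one used in the study.
REDUCED_GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}


def composition(seq, alphabet=AMINO_ACIDS):
    """Return the fraction of each residue type, in the order of `alphabet`."""
    counts = Counter(ch for ch in seq.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[a] / total for a in alphabet]


def reduced_composition(seq, groups=REDUCED_GROUPS):
    """Return the fraction of residues falling in each reduced group."""
    lookup = {aa: name for name, members in groups.items() for aa in members}
    counts = Counter(lookup[ch] for ch in seq.upper() if ch in lookup)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in groups}
```

Either vector could then be passed to any SVM implementation; with the reduced alphabet the classifier sees only four numbers per protein.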

2.
Automating the analysis of haematopoietic cell cultures is a necessary step towards using this kind of measurement in toxicological assessment. The aim is to classify cell aggregates into three groups: micro-clusters, macro-clusters and colonies. This is classically done by human visual inspection. However, reproducibility is poor and comparisons between experts or laboratories remain very difficult. In this work, we propose a method based on machine learning: different parameters representative of these three kinds of clusters are tested in order to choose the most discriminative ones, then a learning database is created to parametrise the software before it is used on actual cluster images. The algorithm used is the support vector machine (SVM). Almost 90% of the test database is correctly classified using three geometric parameters: area, and minimum and maximum centroid distances.
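
One possible reading of the three geometric parameters is sketched below. It assumes each aggregate is available as a closed polygonal contour and that "centroid distance" means the distance from the polygon centroid to a contour point; the abstract does not define these terms, so both are assumptions, and shape_features is a hypothetical name.

```python
import math


def shape_features(contour):
    """Return (area, min centroid distance, max centroid distance).

    `contour` is a list of (x, y) vertices of a simple closed polygon.
    """
    n = len(contour)
    twice_area = cx = cy = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate contour with zero area")
    # Polygon centroid: sums divided by 6 * signed area = 3 * twice_area.
    cx /= 3 * twice_area
    cy /= 3 * twice_area
    dists = [math.hypot(x - cx, y - cy) for x, y in contour]
    return abs(twice_area) / 2, min(dists), max(dists)
```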

3.
Identifying prokaryotes in silico is commonly based on DNA sequences. In experiments where DNA sequences may not be immediately available, a different approach is needed to detect prokaryotes based on RNA or protein sequences. N-formylmethionine (fMet) is known as a typical characteristic of prokaryotes. A web tool has been implemented here for predicting prokaryotes by detecting N-formylmethionine residues in protein sequences. The predictor is constructed using a support vector machine, and the online predictor is implemented in Python. It achieves a total prediction accuracy of 80%, with a specificity of 80% and a sensitivity of 81%.
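
For reference, the three figures reported in this abstract are standard functions of the confusion-matrix counts. The sketch below uses the usual textbook definitions; binary_metrics is a hypothetical helper name and the counts in the usage note are made up, not taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    positives = tp + fn
    negatives = tn + fp
    total = positives + negatives
    sensitivity = tp / positives if positives else 0.0  # true positive rate
    specificity = tn / negatives if negatives else 0.0  # true negative rate
    accuracy = (tp + tn) / total if total else 0.0
    return sensitivity, specificity, accuracy
```

With illustrative counts such as binary_metrics(81, 19, 80, 20), sensitivity and specificity come out as 0.81 and 0.80; the overall accuracy then depends on how many positive and negative samples the test set contains.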

4.
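Xu Jia 《生物信息学》 (Bioinformatics) 2013,11(4):297-299
Antifreeze proteins are a class of proteins that enhance an organism's resistance to freezing. They bind specifically to ice crystals and thereby inhibit the formation and growth of ice nuclei in body fluids. Bioinformatics studies of antifreeze proteins can therefore play an important role in advancing bioengineering and improving the freezing tolerance of crops. In this work, a dataset of 400 antifreeze protein sequences and 400 non-antifreeze protein sequences is used; with pseudo-amino acid composition as features, a support vector machine classifier predicts antifreeze proteins with an accuracy of 91.3% on the training set and 78.8% on the test set. The results show that pseudo-amino acid composition reflects the characteristics of antifreeze proteins well and can be used to predict them.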

5.
As a result of genome and other sequencing projects, the gap between the number of known protein sequences and the number of known protein structural classes is widening rapidly. In order to narrow this gap, it is vitally important to develop a computational prediction method for quickly and accurately determining the protein structural class. In this paper, a novel predictor is developed for predicting protein structural class. It employs a support vector machine learning system and represents protein samples with a different pseudo-amino acid composition (PseAA), which was introduced to take sequence-order effects into account to some extent. As a demonstration, the jackknife cross-validation test was performed on a working dataset that contains 204 non-homologous proteins. The predicted results are very encouraging, indicating that the current predictor based on the PseAA may play an important complementary role to the elegant covariant discriminant predictor and other existing algorithms.
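
A pseudo-amino acid composition of the kind mentioned here can be sketched as below. This follows the general form of Chou's 2001 formulation (20 composition terms plus lam sequence-order correlation terms), but it is only an illustration: it uses a single property, the Kyte-Doolittle hydropathy scale, and the default lam and weight values are assumptions, not the settings of this paper.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. Using this single scale is an assumption;
# published PseAA variants combine several physicochemical properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale property values to zero mean and unit variance over the 20 residues."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def pseaac(seq, lam=3, weight=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector."""
    residues = [ch for ch in seq.upper() if ch in scale]
    if len(residues) <= lam:
        raise ValueError("sequence must be longer than lam")
    h = _standardize(scale)
    freqs = [residues.count(a) / len(residues) for a in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k positions apart.
    thetas = [
        mean((h[residues[i]] - h[residues[i + k]]) ** 2
             for i in range(len(residues) - k))
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

The first 20 components carry the conventional composition and the last lam components carry the sequence-order information; the components sum to one by construction.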

6.
Knowledge of structural class plays an important role in understanding protein folding patterns. In this study, a simple and powerful computational method, which combines a support vector machine with PSI-BLAST profiles, is proposed to predict protein structural class for low-similarity sequences. The evolutionary information encoded in the PSI-BLAST profiles is converted into a series of fixed-length feature vectors by extracting amino acid composition and dipeptide composition from the profiles. The resulting vectors are then fed to a support vector machine classifier for the prediction of protein structural class. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence similarity lower than 40% and 25%, respectively. The overall accuracies reach 70.7% and 72.9% for the 1189 and 25PDB datasets, respectively. Comparison of our results with other methods shows that our method is very promising for predicting protein structural class, particularly for low-similarity datasets, and may at least play an important complementary role to existing methods.
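
The conversion of a variable-length profile into a fixed-length vector can be sketched as follows. The formulas used here (column averages for the composition part, averaged products of adjacent rows for the dipeptide part) are one common way to do this and are an assumption about the details; the function names are hypothetical, and the profile is taken to be a plain list of rows with one score per amino acid.

```python
def pssm_aac(pssm):
    """Average each profile column over all sequence positions."""
    n, width = len(pssm), len(pssm[0])
    return [sum(row[j] for row in pssm) / n for j in range(width)]


def pssm_dpc(pssm):
    """Average product of column a at position k and column b at position k + 1."""
    n, width = len(pssm), len(pssm[0])
    if n < 2:
        raise ValueError("profile needs at least two rows")
    return [
        sum(pssm[k][a] * pssm[k + 1][b] for k in range(n - 1)) / (n - 1)
        for a in range(width)
        for b in range(width)
    ]


def pssm_features(pssm):
    """Concatenate both parts; a 20-column profile gives 20 + 400 = 420 values."""
    return pssm_aac(pssm) + pssm_dpc(pssm)
```

Because the output length depends only on the number of columns, proteins of any length map to vectors of the same size, which is what an SVM requires.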

7.
Ding S  Zhang S  Li Y  Wang T 《Biochimie》2012,94(5):1166-1171
Knowledge of structural classes plays an important role in understanding protein folding patterns. In this paper, features based on the predicted secondary structure sequence and the corresponding E–H sequence are extracted. Then, an 11-dimensional feature vector is selected based on a wrapper feature selection algorithm and a support vector machine (SVM). Among the 11 selected features, 4 novel features are newly designed to model the differences between the α/β class and the α + β class, and the other 7 rational features were proposed by previous researchers. To examine the performance of our method, a total of 5 datasets are used to design and test it. The results show that the proposed method achieves prediction accuracies competitive with existing methods (SCPRED, RKS-PPSC and MODAS), and that the 4 new features are essential for differentiating the α/β and α + β classes. A standalone version of the proposed method is written in Java and can be downloaded from http://web.xidian.edu.cn/slzhang/paper.html.

8.
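Membrane proteins are important drug targets, and studying membrane protein types aids successful drug design; correctly predicting membrane protein types is therefore essential for drug development. Using a dataset of 274 mycobacterial membrane protein sequences with less than 40% identity, and optimized pseudo-amino acid composition as features, a support vector machine classifier is used to predict mycobacterial membrane protein types. Under the jackknife test, an overall accuracy of 85.4% and an average accuracy of 72.2% are obtained. The results indicate that the method can be used to identify mycobacterial membrane protein types and will aid the development of anti-mycobacterial drugs.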

9.
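Predicting protein subcellular localization is of great significance for studying protein function, interactions and regulatory mechanisms. This paper proposes a new protein feature representation (NSBH) that combines a reduction of the amino acid alphabet based on physicochemical and structural properties, the "composition", "transition" and "distribution" features describing local and global sequence information, and numerical statistical features of amino acid hydropathy. Three classifiers, KNN, SVM and a BP neural network, are used to predict protein subcellular localization. A comparison of the individual methods with feature fusion shows that the fused feature representation combined with the SVM classifier achieves better prediction accuracy. The influence of different parameters on the experimental results is also discussed in detail, and the experiments and comparisons demonstrate the effectiveness of the method.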
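
The "composition", "transition" and "distribution" descriptors can be sketched for a single property as below. The three-class hydrophobicity grouping is the commonly used one for these descriptors and is assumed here; the paper's own reduced alphabets and the NSBH features themselves are not reproduced, and the rule for picking the 25%, 50% and 75% positions is one of several conventions in use.

```python
# Three hydrophobicity classes (assumed grouping): polar, neutral, hydrophobic.
GROUPS = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}
CLASS_OF = {aa: c for c, members in GROUPS.items() for aa in members}


def ctd(seq):
    """Return (composition, transition, distribution) lists for one property."""
    enc = [CLASS_OF[ch] for ch in seq.upper() if ch in CLASS_OF]
    n = len(enc)
    if n < 2:
        raise ValueError("need at least two recognised residues")
    # Composition: fraction of residues in each class.
    comp = [enc.count(c) / n for c in "123"]
    # Transition: fraction of adjacent pairs that switch between two given classes.
    pairs = list(zip(enc, enc[1:]))
    trans = [
        sum(1 for a, b in pairs if {a, b} == set(p)) / (n - 1)
        for p in ("12", "13", "23")
    ]
    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # occurrence of each class along the chain.
    dist = []
    for c in "123":
        pos = [i + 1 for i, e in enumerate(enc) if e == c]
        for q in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not pos:
                dist.append(0.0)
            else:
                k = max(1, int(q * len(pos)))
                dist.append(pos[k - 1] / n)
    return comp, trans, dist
```

For one property this yields 3 + 3 + 15 = 21 values; repeating it over several properties and concatenating gives the fused representation that is fed to the classifiers.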

10.
Apoptosis proteins are very important for understanding the mechanism of programmed cell death. The localization of an apoptosis protein can provide valuable information about its molecular function, but predicting it is a challenging task. In our previous work we proposed an increment of diversity (ID) method using protein sequence information for this prediction task. In this work, based on the concept of Chou's pseudo-amino acid composition [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. (Erratum: Chou, K.C., 2001, vol. 44, 60) 43, 246-255; Chou, K.C., 2005. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19], a different pseudo-amino acid composition that uses hydropathy distribution information is introduced. A novel ID_SVM algorithm combining ID with a support vector machine (SVM) is proposed. This method is applied to three data sets (317 apoptosis proteins, 225 apoptosis proteins and 98 apoptosis proteins). Higher predictive success rates than those of previous algorithms are obtained in jackknife tests.

11.
Prediction of protein classification is an important topic in molecular biology, because it not only provides useful information from the viewpoint of structure itself but also greatly stimulates the characterization of many other protein features that may be closely correlated with biological function. In this paper, LogitBoost, one of the recently developed boosting algorithms, is introduced for predicting protein structural classes. It performs classification using a regression scheme as the base learner, which can handle multi-class problems and copes particularly well with noisy data. LogitBoost was shown to outperform support vector machines in predicting the structural classes for a given dataset, indicating that the new classifier is very promising. It is anticipated that the power to predict protein structural classes, as well as many other bio-macromolecular attributes, will be further strengthened if LogitBoost and other existing algorithms can effectively complement each other.

12.
The thermostability of proteins is particularly relevant for enzyme engineering. Developing a computational method to identify mesophilic proteins would be helpful for protein engineering and design. In this work, we developed a support vector machine-based method to predict thermophilic proteins using information on amino acid distribution and selected amino acid pairs. A reliable benchmark dataset of 915 thermophilic proteins and 793 non-thermophilic proteins was constructed for training and testing the proposed models. Results showed that 93.8% of thermophilic proteins and 92.7% of non-thermophilic proteins were correctly predicted in jackknife cross-validation. This high prediction success rate suggests that the model can be applied to the design of stable proteins.
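
The jackknife test used in this and several other abstracts is leave-one-out cross-validation: each protein is removed in turn, the model is trained on the rest, and the held-out protein is predicted. The sketch below reports per-class success rates as in this abstract. To stay dependency-free it uses a nearest-centroid classifier as a stand-in for the SVM, which is an assumption for illustration only; any function with the same signature can be passed as predict.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: return the label whose class mean is closest to x."""
    sums, counts = {}, {}
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, v in enumerate(vec):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def jackknife(xs, ys, predict=nearest_centroid_predict):
    """Leave-one-out test; return the success rate for each class."""
    hits, totals = {}, {}
    for i in range(len(xs)):
        # Train on everything except sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predicted = predict(train_x, train_y, xs[i])
        totals[ys[i]] = totals.get(ys[i], 0) + 1
        hits[ys[i]] = hits.get(ys[i], 0) + (predicted == ys[i])
    return {label: hits[label] / totals[label] for label in totals}
```

The test is deterministic for a given dataset, which is why it is often preferred over random sub-sampling when comparing predictors.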

13.
Structural class characterizes the overall folding type of a protein or its domain, and the prediction of protein structural class has become both an important and a challenging topic in protein science. Moreover, the prediction itself can stimulate the development of novel predictors that may be straightforwardly applied to many other related areas. In this paper, 10 frequently used sequence-derived structural and physicochemical features, which can be easily computed by the PROFEAT (Protein Features) web server, were taken as inputs to support vector machines to develop statistical learning models for predicting the protein structural class. More importantly, a strategy for merging different features, called best-first search, was developed. The rigorous jackknife cross-validation test showed that the success rates of our method were significantly improved. We anticipate that the present method may also help boost the predictive accuracies for a series of other protein attributes, such as subcellular localization, membrane types, enzyme family and subfamily classes, among many others.

14.
Ion channels are integral membrane proteins that control movement of ions into or out of cells. They are key components in a wide range of biological processes. Different types of ion channels have different biological functions. With the appearance of vast proteomic data, it is highly desirable for both basic research and drug-target discovery to develop a computational method for the reliable prediction of ion channels and their types. In this study, we developed a support vector machine-based method to predict ion channels and their types using primary sequence information. A feature selection technique, analysis of variance (ANOVA), was introduced to remove feature redundancy and find an optimized feature set for improving predictive performance. Jackknife cross-validated results show that the proposed method can discriminate ion channels from non-ion channels with an overall accuracy of 86.6%, classify voltage-gated ion channels and ligand-gated ion channels with an overall accuracy of 92.6%, and predict four types (potassium, sodium, calcium and anion) of voltage-gated ion channels with an overall accuracy of 87.8%. These results indicate that the proposed method can correctly identify ion channels and provide important guidance for drug-target discovery. The predictor can be freely downloaded from http://cobi.uestc.edu.cn/people/hlin/tools/IonchanPred/.
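
ANOVA-based feature selection can be sketched as ranking each feature by its one-way F statistic, the ratio of between-class to within-class variance. The code below is a generic illustration with hypothetical function names; the cut-off used to pick the final feature set in the paper is not stated in the abstract and is not modelled.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand_mean) ** 2
                  for y, g in groups.items()) / (k - 1)
    within = sum((v - means[y]) ** 2
                 for y, g in groups.items() for v in g) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(xs, ys):
    """Return (feature indices sorted by decreasing F, list of F scores)."""
    n_features = len(xs[0])
    scores = [anova_f([row[j] for row in xs], ys) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

A typical use is to add features in ranked order and keep the subset that gives the best cross-validated accuracy.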

15.
《Biochimie》2013,95(9):1741-1744
In this study, a 12-dimensional feature vector is constructed to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. Among the 12 features, 6 novel features are specially designed to improve the prediction accuracies for the α/β and α + β classes based on the distributions of α-helices and β-strands and the characteristics of parallel and anti-parallel β-sheets. To evaluate our method, the jackknife cross-validation test is performed on two widely used datasets, 25PDB and 1189, with sequence similarity lower than 25% and 40%, respectively. Our method outperforms recently reported methods in most cases, and the 6 newly designed features have a significant positive effect on the prediction accuracies, especially for the α/β and α + β classes.

16.
This paper presents a new method for constructing phylogenetic trees from related amino acid sequences. The method is based on a new distance measure that describes sequence relationships by means of typical steric and physicochemical properties of the amino acids, and it offers advantages in several important respects. The method was applied to different sets of protein sequences and the results were compared with those of other well-established methods.

17.
To evaluate the possibility that an unknown protein is a resistance gene against Xanthomonas oryzae pv. oryzae, a different mode of pseudo amino acid composition (PseAAC) is proposed to formulate the protein samples by integrating the amino acid composition with the chaos game representation (CGR) method. Numerical comparisons of triangle, quadrangle and 12-vertex polygon CGR are carried out to evaluate the efficiency of using these fractal figures in classifiers. The results show that, among the three polygon methods, the triangle method gives a good fractal visualization and performs best in classifier construction. Using triangle + 12-vertex polygon CGR as the mathematical feature, the classifier achieves 98.13% accuracy in the jackknife test, with an MCC of 0.8462.
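
The MCC quoted here is the Matthews correlation coefficient, which ranges from -1 to 1 and, unlike plain accuracy, stays informative when the two classes are very unequal in size. A minimal sketch using the standard definition follows; the function name and the convention of returning 0 when the denominator vanishes are choices made for this example.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal count is zero
    return (tp * tn - fp * fn) / denom
```

This explains how a classifier can report an accuracy above 98% together with an MCC noticeably below 1: a handful of errors in a small class lowers the MCC far more than it lowers the accuracy.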

18.
One major problem with existing algorithms for the prediction of protein structural classes is their low accuracy for proteins from the α/β and α+β classes. In this study, three novel features were rationally designed to model the differences between proteins from these two classes. In combination with other rationally designed features, an 11-dimensional vector prediction method was proposed. With this method, the overall prediction accuracy on the 25PDB dataset was 1.5% higher than that of the previous best-performing method, MODAS. Furthermore, the prediction accuracy for proteins from the α+β class on the 25PDB dataset was 5% higher than that of the previous best-performing method, SCPRED. The prediction accuracies obtained with the D675 and FC699 datasets were also improved.

19.
Chen C  Zhou X  Tian Y  Zou X  Cai P 《Analytical biochemistry》2006,357(1):116-121
Because a priori knowledge of a protein structural class can provide useful information about its overall structure, the determination of protein structural class is a meaningful topic in protein science. However, with the rapid increase in newly found protein sequences entering databanks, it is both time-consuming and expensive to do so based solely on experimental techniques. Therefore, it is vitally important to develop a computational method for predicting the protein structural class quickly and accurately. To meet this challenge, this article presents a dual-layer support vector machine (SVM) fusion network that uses a different pseudo-amino acid composition (PseAA). The PseAA here contains much information related to the sequence order of a protein and the distribution of the hydrophobic amino acids along its chain. As a showcase, the rigorous jackknife cross-validation test was performed on the two benchmark data sets constructed by Zhou. A significant enhancement in success rates was observed, indicating that the current approach may serve as a powerful complementary tool to other existing methods in this area.

20.