Similar literature
 Found 18 similar documents (search took 656 ms)
1.
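Oligomeric proteins participate widely in many life processes, so predicting them is of considerable importance. Starting from the protein sequence, this article proposes a multi-strategy sliding scalable-window feature extraction method and uses a "one-versus-one" multi-class classification strategy to predict protein homo-oligomers. Under the jackknife test, a weighted feature set that combines the multi-strategy sliding scalable-window features with amino acid composition, classified with a support vector machine, reached a best overall classification accuracy of 75.37%, which is 10.05% higher than amino acid composition alone and 3.82% higher than BG_Zhang, the best feature in the reference literature. This indicates that the multi-strategy sliding scalable-window method is a very effective feature extraction method for classifying protein homo-oligomers.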

2.
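Classification of protein homo-oligomers based on support vector machines   Total citations: 14, self-citations: 1, citations by others: 13
Using support vector machines and a Bayesian method, protein homodimers, homotrimers, homotetramers and homohexamers are classified from the protein primary sequence. With support vector machines, the "one-versus-rest" and "one-versus-one" strategies give overall classification accuracies of 77.36% and 93.43%, respectively, which are 26.72 and 42.79 percentage points higher than the 50.64% obtained with the Bayesian covariance discriminant method. This shows that support vector machines can be used to classify protein homo-oligomers and are a very effective method for doing so. For multi-class homo-oligomer classification with the same machine learning method (such as a support vector machine), the "one-versus-one" strategy performs better than "one-versus-rest". The results also indicate that the primary sequence of a protein homo-oligomer contains quaternary structure information.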
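
The difference between the two multi-class strategies named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy one-dimensional "prototypes" and the tie-breaking rule are all assumptions, and a trained binary SVM would take the place of `toy_pairwise_decide`.

```python
from collections import Counter
from itertools import combinations


def n_binary_classifiers(k, strategy):
    """Number of binary classifiers needed to separate k classes."""
    if strategy == "one-vs-one":
        return k * (k - 1) // 2  # one classifier per unordered pair of classes
    if strategy == "one-vs-rest":
        return k  # one classifier per class
    raise ValueError(f"unknown strategy: {strategy}")


def one_vs_one_predict(classes, pairwise_decide, x):
    """Majority vote over all pairwise decisions.

    pairwise_decide(a, b, x) must return either a or b.
    Ties are broken by the order of `classes` (an assumption of this sketch).
    """
    votes = Counter(pairwise_decide(a, b, x) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])


# Hypothetical stand-in for trained binary classifiers: each class has a
# one-dimensional prototype and the nearer prototype wins the pairwise duel.
PROTOTYPES = {"dimer": 2.0, "trimer": 3.0, "tetramer": 4.0, "hexamer": 6.0}


def toy_pairwise_decide(a, b, x):
    return a if abs(x - PROTOTYPES[a]) <= abs(x - PROTOTYPES[b]) else b
```

With the four oligomer classes of the abstract, one-versus-one needs six binary classifiers and one-versus-rest needs four; `one_vs_one_predict(list(PROTOTYPES), toy_pairwise_decide, 3.9)` selects the class that wins the most pairwise duels.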

3.
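The subcellular localization of a protein is closely related to its function, and predicting that localization helps in understanding protein function. This article proposes a segmented pseudo amino acid composition feature extraction method and uses the support vector machine algorithm to classify two protein subcellular localization datasets constructed by Chou (C2129 and CS2423). The performance of the prediction system is evaluated with parameters such as the overall classification accuracy Q3 and the content-balanced accuracy index Q9. The results show that prediction based on segmented pseudo amino acid composition outperforms prediction based on pseudo amino acid composition computed over the complete protein sequence. For example, with the segmented moment-descriptor pseudo amino acid composition method, Q3 and Q9 on dataset C2129 are 84.7% and 60.8%, respectively, which are 1.8 and 2.2 percentage points higher than with the moment-descriptor pseudo amino acid composition of the complete sequence, and Q3 is 9.1 percentage points higher than with the existing method of Xiao et al. The feature vector built from segmented pseudo amino acid composition contains not only positional information between residues but also coupling information between protein subsequences. In addition, the segmented subsequences may be related to the functional domains of the protein, which allows this method to predict protein subcellular localization effectively.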

4.
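Research on protein-protein interactions helps to reveal many fundamental questions about life processes, supports disease prevention and diagnosis, and is a valuable reference for drug development. This article first constructs a protein interaction database and then proposes a segmented amino acid composition feature extraction method to predict protein interactions. Under 10-fold cross-validation (10CV), the three-segment amino acid composition method with a support vector machine achieves an overall prediction accuracy of 86.2%, which is 2.31 percentage points higher than the conventional amino acid composition method. Using Guo's database and test method, the three-segment amino acid composition method achieves an overall prediction accuracy of 90.11%, which is 2.75 percentage points higher than Guo's autocorrelation function feature extraction method. This shows that segmented amino acid composition can be applied effectively to protein interaction prediction.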

5.
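Using the nearest neighbor algorithm and starting from the protein primary sequence, protein monomers, dimers, trimers, tetramers, pentamers, hexamers and octamers are classified with amino acid composition, dipeptide composition and mixed composition encodings of the sequence. The results show that the dipeptide composition encoding gives the best predictions: the overall prediction accuracies in the jackknife test and the independent test set are 90.83% and 95.48%, respectively, which are 12 and 15 percentage points higher than methods based on pseudo amino acid composition and component-coupling prediction on the same dataset. For pentameric proteins in particular, prediction accuracy improves by 90 and 50 percentage points, respectively. This indicates that dipeptide composition is a very effective feature extraction method for classifying protein quaternary structure.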
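
A dipeptide composition encoding paired with a nearest neighbor rule might look like the sketch below. The abstract does not specify the distance measure or how non-standard residues are treated, so the Euclidean distance and the skipping of unknown residues are assumptions, as are all function names.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Return the 400-dimensional frequency vector of adjacent residue pairs."""
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:  # assumption: pairs with non-standard residues are skipped
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def nearest_neighbor_label(query, training):
    """Label of the training sequence closest to `query`.

    `training` is a list of (sequence, label) pairs. Euclidean distance
    is an assumption; the abstract does not name the metric.
    """
    q = dipeptide_composition(query)
    _, label = min(training, key=lambda item: math.dist(q, dipeptide_composition(item[0])))
    return label
```

For the sequence `"ACAC"` the adjacent pairs are AC, CA and AC, so the AC component of the vector is 2/3 and the CA component is 1/3.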

6.
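Predicting the structural class of non-homologous proteins from their primary sequences   Total citations: 8, self-citations: 1, citations by others: 7
Protein structural class prediction algorithms that extract features from amino acid composition, the autocorrelation function and the autocovariance function are analyzed and compared, and prediction algorithms that combine amino acid composition with the autocorrelation function, or with the autocovariance function, are studied. The results show that, for non-homologous proteins, the method combining amino acid composition with the autocorrelation function, using the hydrophobicity values of Miyazawa and Jernigan, achieves an overall accuracy of 95.34% in the self-consistency test on the training set, 81.92% in the jackknife test, and 86.61% in the independent test on the test set. The method combining amino acid composition with the autocovariance function, using the hydrophobicity values of Wold et al., achieves an overall accuracy of 96.71% in the self-consistency test on the training set, 82.18% in the jackknife test, and 86.88% in the independent test on the test set. This shows that both combined methods can effectively improve structural class prediction accuracy, and indicates that extracting more effective sequence information is the key to improving classification accuracy.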
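
The abstract does not give the formulas, so the sketch below assumes one commonly used definition: for a residue-index profile h_1..h_L, the lag-n autocorrelation is the mean of h_i * h_(i+n) over the L - n available positions, and the autocovariance is the same quantity computed after subtracting the profile mean. The toy scale is purely illustrative and is not the Miyazawa-Jernigan or Wold scale.

```python
def autocorrelation(values, max_lag):
    """Lag-1..max_lag autocorrelation of a numeric profile (requires max_lag < len(values))."""
    length = len(values)
    return [
        sum(values[i] * values[i + n] for i in range(length - n)) / (length - n)
        for n in range(1, max_lag + 1)
    ]


def autocovariance(values, max_lag):
    """Same as autocorrelation, but on the mean-centered profile."""
    mean = sum(values) / len(values)
    return autocorrelation([v - mean for v in values], max_lag)


def sequence_profile(seq, scale):
    """Map a sequence to residue-index values using a caller-supplied scale dict."""
    return [scale[r] for r in seq]


# Hypothetical placeholder scale for illustration only, not a published index.
TOY_SCALE = {"A": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}
```

The resulting lag features would be appended to the 20 amino acid composition values to form the combined feature vector; the number of lags to keep is a tuning choice that the abstract does not state.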

7.
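Prediction of membrane protein folding types based on fuzzy support vector machines   Total citations: 1, self-citations: 0, citations by others: 1
Existing methods that predict membrane protein folding types with a support vector machine (SVM) do not make full use of protein sequence features, and they leave unclassifiable regions when handling multi-class protein classification problems. To address these two problems, amino acid composition and dipeptide composition features are extracted from the protein sequence, weighted multi-order correlation coefficient features of amino acid residue indices are computed, and the three kinds of features are fused into the input feature vector of the classifier. A fuzzy SVM (FSVM) algorithm is used to classify data that a conventional SVM cannot separate. Tests on a non-redundant dataset show that the improved feature extraction method outperforms existing feature extraction methods under the same classification algorithm, and that FSVM outperforms the conventional SVM under the same feature extraction method. The classification strategy that combines the two reaches a prediction accuracy of 96.6% on an independent test dataset, which is better than several existing prediction methods, and it can serve as an effective tool for predicting the folding types of membrane proteins and other proteins.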

8.
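This paper proposes a simple and effective method for predicting protein subcellular localization, providing a theoretical basis for further understanding the function and properties of proteins. Sparse coding combined with amino acid composition information is used to extract protein sequence features. The resulting features are integrated by multi-level pooling based on different dictionary sizes and fed into a support vector machine for classification. Under the jackknife test, the prediction success rates on the datasets ZD98, CH317 and Gram1253 reach 95.9%, 93.4% and 94.7%, respectively. The experiments show that the classification algorithm based on multi-level sparse coding can significantly improve the prediction accuracy of protein subcellular localization.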

9.
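Building a fast and effective protein fold recognition method using image features   Total citations: 2, self-citations: 0, citations by others: 2
Automatic classification of protein structures is an important means of exploring the relationship between protein structure and function. First, the three-dimensional structure of a protein fold is mapped to a two-dimensional distance matrix, and the distance matrix is treated as a grayscale image. Then, based on the gray-level histogram and the gray-level co-occurrence matrix, a computationally simple method for extracting structural features of folds is proposed, yielding low-dimensional features that reflect the characteristics of the fold structure. The cause of the isolated peak at zero gray level in the histogram is explained, and the structural meaning of the gray-level distribution, the different angles and the pixel distances in the co-occurrence matrix features is analyzed in depth. Finally, the method is applied to the classification of 27 fold classes, reaching an accuracy of 71.95% on the independent test set and 78.94% in 10-fold cross-validation over all data. Comparison with several sequence-based and structure-based fold recognition methods shows that this method has low-dimensional, concise features, needs no complex classification system, and recognizes multiple fold classes effectively and efficiently.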
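
The first two steps of this pipeline (distance matrix, then gray-level histogram) can be sketched as below. The abstract does not state how distances are quantized, so the linear scaling by the maximum distance and the number of gray levels are assumptions, and the co-occurrence matrix features are omitted.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (for example C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def to_gray_image(matrix, levels=8):
    """Quantize distances linearly into gray levels 0..levels-1 (assumed scheme)."""
    max_dist = max(max(row) for row in matrix)
    if max_dist == 0:
        return [[0 for _ in row] for row in matrix]
    return [
        [min(levels - 1, int(d / max_dist * levels)) for d in row]
        for row in matrix
    ]


def gray_histogram(image, levels=8):
    """Normalized gray-level histogram, usable as a low-dimensional feature vector."""
    counts = [0] * levels
    for row in image:
        for g in row:
            counts[g] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

In this sketch the self-distances on the diagonal are all zero and therefore always fall into gray level 0, whatever the structure.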

10.
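Predicting protein subcellular localization with an ensemble of improved KNN algorithms   Total citations: 1, self-citations: 0, citations by others: 1
Protein subcellular localization is predicted by using the Adaboost algorithm to build an ensemble of several similarity-alignment K-nearest neighbor (KNN) classifiers. The similarity-alignment KNN algorithm uses amino acid composition, dipeptide composition and pseudo amino acid composition, respectively, as protein sequence features, and uses Blast alignment in the KNN decision stage to determine the subcellular localization of a protein. Under the jackknife test, the Adaboost ensemble classifier draws on the three kinds of protein sequence features, and the highest prediction success rates on the datasets CH317 and Gram1253 are 92.4% and 93.1%, respectively. The results show that the Adaboost ensemble of improved KNN classifiers is an effective method for predicting protein subcellular localization.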

11.
Shi JY  Zhang SW  Pan Q  Zhou GP 《Amino acids》2008,35(2):321-327
In the Post Genome Age, there is an urgent need to develop reliable and effective computational methods to predict the subcellular localization for the explosion of newly found proteins. Here, a novel method of pseudo amino acid (PseAA) composition, the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, the amino acid composition of each segment is calculated in series. After that, each protein sequence can be represented by a feature vector. Finally, the feature vectors of all sequences thus obtained are further input into the multi-class support vector machines to predict the subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.
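
The first three steps described above (equal segmentation, per-segment composition, concatenation) can be sketched as follows. The handling of lengths that do not divide evenly and of non-standard residues is not stated in the abstract and is an assumption here, as are the function names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(segment):
    """20-dimensional frequency vector of the standard amino acids in a segment."""
    counts = [segment.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # assumption: non-standard residues are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def aacd(seq, n_segments):
    """Concatenate the amino acid composition of n_segments nearly equal segments.

    Segment boundaries use integer division, so segment lengths differ by at
    most one residue when len(seq) is not a multiple of n_segments.
    """
    length = len(seq)
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(amino_acid_composition(seq[start:end]))
    return features
```

With `n_segments=1` the vector reduces to the ordinary 20-dimensional amino acid composition; each additional segment adds 20 dimensions, and the result would then be passed to a multi-class classifier.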

12.
13.
Zhang SW  Pan Q  Zhang HC  Shao ZC  Shi JY 《Amino acids》2006,30(4):461-468
Summary. The interaction of non-covalently bound monomeric protein subunits forms oligomers. The oligomeric proteins are superior to the monomers within the scope of functional evolution of biomacromolecules. Such complexes are involved in various biological processes, and play an important role. It is highly desirable to predict oligomer types automatically from their sequence. Here, based on the concept of pseudo amino acid composition, an improved feature extraction method of weighted auto-correlation function of amino acid residue index and Naive Bayes multi-feature fusion algorithm is proposed and applied to predict protein homo-oligomer types. We used the support vector machine (SVM) as base classifiers, in order to obtain better results. For example, the total accuracies of A, B, C, D and E sets based on this improved feature extraction method are 77.63, 77.16, 76.46, 76.70 and 75.06% respectively in the jackknife test, which are 6.39, 5.92, 5.22, 5.46 and 3.82% higher than that of G set based on conventional amino acid composition method with the same SVM. Compared with Chou’s feature extraction method of incorporating quasi-sequence-order effect, our method can increase the total accuracy at a level of 3.51 to 1.01%. The total accuracy improves from 79.66 to 80.83% by using the Naive Bayes Feature Fusion algorithm. These results show: 1) The improved feature extraction method is effective and feasible, and the feature vectors based on this method may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches that are buried in the interfaces of associated subunits; 2) Naive Bayes Feature Fusion algorithm and SVM can be regarded as a powerful computational tool for predicting protein homo-oligomer types.

14.
Predicting the cofactors of oxidoreductases plays an important role in inferring their catalytic mechanism. Feature extraction is a critical part of prediction systems, requiring raw sequence data to be transformed into appropriate numerical feature vectors while minimizing information loss. In this paper, we present an amino acid composition distribution method for extracting useful features from primary sequence, and the k-nearest neighbor was used as the classifier. The overall prediction accuracy evaluated by the 10-fold cross-validation reached 90.74%. Compared with eight other feature extraction methods, the improvement of the overall prediction accuracy ranged from 3.49% to 15.74%. Our experimental results confirm that the method we proposed is very useful and may be used for other bioinformatics predictions. Interestingly, when features extracted by our method and Chou's amphiphilic pseudo-amino acid composition were combined, the overall accuracy could reach 92.53%.

15.
16.
17.
Abstract

For high-accuracy classification of DNA sequences through Convolutional Neural Networks (CNNs), it is essential to use an efficient sequence representation that can accelerate similarity comparison between DNA sequences. In addition, CNNs can be improved by avoiding the dimensionality problem associated with multi-layer CNN features. This paper presents a new approach for classification of bacterial DNA sequences based on a custom layer. A CNN is used with Frequency Chaos Game Representation (FCGR) of DNA. The FCGR is adopted as a sequence representation method with a suitable choice of the frequency of k-length word occurrences in DNA sequences. The DNA sequence is mapped using FCGR, which produces an image of a gene sequence. This image displays both local and global patterns. A pre-trained CNN is built for image classification. First, the image is converted to feature maps through convolutional layers. This is sometimes followed by a down-sampling operation that reduces the spatial size of the feature map and removes redundant spatial information using the pooling layers. The Random Projection (RP) with an activation function, which carries data with a decent variety with some randomness, is suggested instead of the pooling layers. The feature reduction is achieved while keeping the high accuracy for classifying bacteria into taxonomic levels. The simulation results show that the proposed CNN based on RP has a trade-off between accuracy score and processing time.
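
A frequency chaos game representation of order k places every k-length word of the sequence in one cell of a 2^k by 2^k grid and stores its relative frequency there. The sketch below is a minimal illustration under stated assumptions: the corner layout (A, C, G, T) varies between publications and is chosen arbitrarily here, words containing other symbols are skipped, and the CNN and random projection stages are not shown.

```python
# Assumed corner layout as (x_bit, y_bit); other layouts are equally valid.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Return a 2**k x 2**k grid of normalized k-mer frequencies."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # assumption: words with ambiguous bases are skipped
        x = y = 0
        # As in the chaos game, the last base selects the coarsest quadrant,
        # so it contributes the most significant bit of each coordinate.
        for base in reversed(kmer):
            bx, by = CORNERS[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
        total += 1
    if total:
        grid = [[c / total for c in row] for row in grid]
    return grid
```

The grid can be rendered as a grayscale image and passed to an image classifier; larger k gives finer resolution at the cost of sparser cells.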

18.
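The literature reports that extracting features with amino acid composition distribution effectively improves classification accuracy. This paper uses that method to extract features and applies a newer ensemble classifier, random forest, to classify thermophilic and psychrophilic proteins from their primary structure. Evaluated by 10-fold cross-validation and by an independent test set, the results show that accuracy is best when the number of segments is 1, at 92.9% and 90.2%, respectively. This suggests that features based on amino acid composition distribution do not effectively improve recognition accuracy with this algorithm, which is inconsistent with the reported results, although the same extraction method does moderately improve recognition accuracy with SVM. After six new variables were introduced, accuracy rose to 93.2% and 92.2%, with areas under the ROC curve of 0.9771 and 0.9696, respectively, better than other ensemble classifiers.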
