Similar literature
 16 similar articles found (search time: 140 ms)
1.
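Predicting the subcellular localization of apoptosis proteins using grouped weight encoding   Cited by: 2 (self: 1, others: 1)
Starting from the physicochemical properties of amino acids and borrowing the ideas of "coarse-graining" and "grouping" from physics, a new method for extracting features from protein sequences is proposed: grouped weight encoding. With the component-coupled algorithm as the classifier, the subcellular localization of apoptosis proteins is studied from the primary sequence. On the dataset used by Zhou and Doctor, the overall prediction accuracies under the re-substitution and jackknife tests are 98.0% and 85.7%, respectively, 7.2% and 13.2% higher than those obtained with amino acid composition and the component-coupled algorithm. On the dataset used by Chen Yingli and Li Qianzhong, the overall accuracies under the re-substitution and jackknife tests are 94.0% and 80.1%, respectively, 5.9% and 2.0% higher than those obtained with dipeptide composition and the increment of diversity algorithm. On the latest dataset compiled by the authors, the overall accuracies under the re-substitution and jackknife tests are 97.33% and 75.11%, respectively. The results show that grouped weight encoding of protein sequences is an effective feature extraction method for apoptosis protein localization.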
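The gap between the re-substitution and jackknife figures above comes from how each test uses the data: re-substitution scores the training samples themselves, while the jackknife (leave-one-out) test retrains without each sample before predicting it. The following is a minimal sketch of the two protocols. The nearest-centroid classifier and the toy data are assumptions chosen for brevity; they stand in for, and are not, the component-coupled algorithm or the datasets of the paper.

```python
def fit_centroids(X, y):
    """Return {label: centroid} for feature vectors X with labels y."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(model, x):
    """Assign x to the label with the nearest centroid (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda lab: dist2(model[lab]))


def resubstitution_accuracy(X, y):
    """Train on all samples, then score the same samples."""
    model = fit_centroids(X, y)
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)


def jackknife_accuracy(X, y):
    """Leave each sample out in turn, retrain, and predict the held-out one."""
    hits = 0
    for i in range(len(y)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(y)


# Hypothetical one-dimensional toy data, for illustration only.
X = [[0.0], [0.1], [1.0], [1.1], [0.6]]
y = ["a", "a", "b", "b", "a"]
print(resubstitution_accuracy(X, y), jackknife_accuracy(X, y))
```

On this toy set the borderline sample is classified correctly only when it is part of the training data, which is why the jackknife score is the lower and more conservative of the two.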

2.
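Lin Hao 《生物信息学》 2009, 7(4): 252-254
Because the subcellular location of a protein is strongly correlated with its primary sequence, the increment of diversity is used to describe the similarity between proteins in amino acid composition and dipeptide composition, and a modified Mahalanobis discriminant (here called the IDQD method) is used to predict the subcellular location of mycobacterial proteins. Jackknife tests were run on protein datasets at different levels of sequence similarity. The results show that when the sequence similarity of the dataset is at or below 70%, the prediction accuracy stabilizes at about 75%. On the full set of 852 proteins, the prediction success rate reaches 87.7%, which is better than the accuracy of existing algorithms and indicates that IDQD is an effective method for predicting the subcellular location of mycobacterial proteins.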
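The abstract does not spell out the increment of diversity. A minimal sketch follows, assuming the definition commonly used in this line of work: for a count vector X with total N, D(X) = N log N - sum(n_i log n_i), and ID(X, Y) = D(X + Y) - D(X) - D(Y). The natural logarithm and the function names are assumptions; the modified Mahalanobis discriminant of IDQD is not reproduced here.

```python
import math


def diversity(counts):
    """Measure of diversity D(X) = N*log(N) - sum(n_i*log(n_i)); 0*log(0) is 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


# Identical distributions give an increment of zero; disjoint ones give a large value.
print(increment_of_diversity([1, 1], [1, 1]))
print(increment_of_diversity([2, 0], [0, 2]))
```

Under this definition a smaller increment means the two sources are more alike, so it can serve as a similarity score between a query protein's composition counts and those of a class.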

3.
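Predicting protein subcellular localization using the increment of diversity combined with support vector machine   Cited by: 3 (self: 0, others: 3)
Zhao Yu, Zhao Judong, Yao Long 《生物信息学》 2010, 8(3): 237-239, 244
Functional annotation of unknown proteins is a main goal of proteomics, and a key annotation is the prediction of protein subcellular localization. This paper applies the increment of diversity combined with support vector machine (ID_SVM) to predict five subcellular locations of Gram-positive bacterial proteins. In an independent test, the overall prediction success rate is 89.66%. The results show that the ID_SVM algorithm greatly improves the prediction success rate.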

4.
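Based on the view that the subcellular location of an apoptosis protein is determined mainly by its amino acid sequence, and using the n-peptide compositions of local amino acid sequences together with the hydropathy distribution of the sequence, the increment of diversity combined with support vector machine (ID_SVM) algorithm is used to predict the subcellular locations of six classes of apoptosis proteins. Under the re-substitution and jackknife tests, the overall prediction success rates of ID_SVM reach 94.6% and 84.2%, respectively; under 5-fold and 10-fold cross-validation, the overall success rates also exceed 83%. A comparison of the predictive power of ID and ID_SVM shows that combining the increment of diversity with a support vector machine improves the success rate, indicating that ID_SVM is a very effective method for predicting the subcellular location of apoptosis proteins.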

5.
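Predicting protein subcellular localization with an ensemble of improved KNN algorithms   Cited by: 1 (self: 0, others: 1)
Multiple similarity-alignment K-nearest neighbor (KNN) classifiers are combined with the Adaboost algorithm to predict protein subcellular localization. The similarity-alignment KNN algorithm takes amino acid composition, dipeptide composition, and pseudo-amino acid composition as protein sequence features, and uses Blast alignment in the KNN decision stage to determine the subcellular location of a protein. Under the jackknife test, the Adaboost ensemble classifier draws on the three kinds of sequence features, and the highest prediction success rates on datasets CH317 and Gram1253 are 92.4% and 93.1%, respectively. The results show that the Adaboost ensemble of improved KNN classifiers is an effective method for predicting protein subcellular localization.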

6.
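This paper proposes a simple and effective method for predicting protein subcellular compartment localization, providing a theoretical basis for further understanding of protein function and properties. Sparse coding is combined with amino acid composition information to extract protein sequence features; the resulting features are integrated by multi-level pooling over different dictionary sizes and fed to a support vector machine for classification. Under the jackknife test, the prediction success rates on datasets ZD98, CH317, and Gram1253 reach 95.9%, 93.4%, and 94.7%, respectively. The experiments show that the classification algorithm based on multi-level sparse coding significantly improves the accuracy of protein subcellular compartment prediction.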

7.
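Predicting protein subcellular compartments by similarity alignment   Cited by: 1 (self: 0, others: 1)
Wang Xiongfei, Zhang Liang, Xue Wei, Zhao Nan, Xu Huanliang 《微生物学通报》 2016, 43(10): 2298-2305
[Objective] To predict the subcellular compartment to which a protein belongs, providing a basis for further study of its biological function. [Methods] The amino acid composition, dipeptide composition, and pseudo-amino acid composition of the protein sequence are used as sequence features, and a K-nearest neighbor (KNN) classification algorithm improved with BLAST alignment is used to predict the subcellular compartment of each sequence. [Results] Under the jackknife test, the success rates of the three features are 91.5%, 91.5%, and 89.3% on dataset CH317, and 93.9%, 92.9%, and 89.8% on dataset ZD98. [Conclusion] The BLAST-improved KNN algorithm is an effective method for predicting protein subcellular compartments.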
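Two of the three sequence features named in the Methods can be sketched in a few lines. The code below assumes the usual conventions: a 20-dimensional amino acid composition and a 400-dimensional dipeptide composition over the standard residues, each normalized to frequencies. The function names are hypothetical, residues outside the 20 standard letters are simply skipped, and pseudo-amino acid composition is omitted because it requires physicochemical tables the abstract does not give.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return the 20 residue frequencies of seq, in AMINO_ACIDS order."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400 adjacent-pair frequencies of seq, in DIPEPTIDES order."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for a, b in zip(seq, seq[1:]):
        if a + b in counts:  # skip pairs containing a non-standard residue
            counts[a + b] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


aac = amino_acid_composition("AACD")
dpc = dipeptide_composition("AACD")
print(len(aac), len(dpc))
```

Either vector can then be passed to a distance-based classifier such as KNN; the BLAST-based decision step described above is a separate stage and is not shown.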

8.
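Predicting membrane protein types with the random forest method   Cited by: 2 (self: 0, others: 2)
The type of a membrane protein is closely related to its function, so predicting membrane protein types is an important means of studying their function, and prediction from the amino acid sequence is of great significance. Based on the amino acid sequence, this paper uses the combined increment of diversity and pseudo-amino acid composition information together as prediction parameters, with a random forest classifier, to predict eight types of membrane proteins. The prediction accuracy is 86.3% under the jackknife test and 93.8% in the independent test, better than previously reported results.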

9.
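Membrane proteins are the main carriers of biomembrane function and the material basis for the various functions of the cell, and they play a vital role in the cell. Classifying membrane proteins of unknown type guides related biological research and is important groundwork in the study of membrane protein structure and function. For the membrane protein classification problem, features are extracted from membrane protein sequences with the k-substring discrete source method, and the minimum increment of diversity method is fused with a weighted K-nearest neighbor algorithm to build a new classification model. Under three typical tests (self-consistency test, jackknife test, and independent test set), the prediction accuracies are 99.95%, 86.16%, and 98.36%, respectively. The results show that the k-substring discrete source method effectively extracts feature information from membrane protein sequences and that the model achieves a higher prediction success rate than existing methods.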

10.
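Predicting protein structural classes using the measure of diversity   Cited by: 14 (self: 2, others: 12)
Based on the concept that the structural class of a protein determines its secondary structure sequence, the secondary structure sequence parameters Nα, Nβ, Nβaβ, N(βαβ) are used to construct the source of diversity, and the measures of diversity D(Xα), D(Xβ), D(Xα+β) are calculated. The structural class of a protein is then predicted using the increment of diversity: it is determined by the minimum of the increments of diversity between the protein's measure of diversity D(Xn) and the four standard measures of diversity D(Xα), D(Xβ), D(Xα/β), D(Xα+β). The prediction accuracy reaches 84.8% on the standard set and 83.3% on the test set.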
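The decision rule described above, assigning a protein to the class whose standard source gives the smallest increment of diversity, can be sketched as follows. The measure of diversity is assumed to be D(X) = N log N - sum(n_i log n_i), the function names are hypothetical, and the count vectors in the usage lines are invented for illustration; they are not the secondary structure parameters or data of the paper.

```python
import math


def diversity(counts):
    """Measure of diversity D(X) = N*log(N) - sum(n_i*log(n_i)); 0*log(0) is 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def predict_by_min_id(query, standards):
    """Return the label whose standard count vector has the smallest ID with query."""
    def increment(x, s):
        merged = [a + b for a, b in zip(x, s)]
        return diversity(merged) - diversity(x) - diversity(s)
    return min(standards, key=lambda label: increment(query, standards[label]))


# Hypothetical standard sources for two classes (illustrative counts only).
standards = {"alpha": [90, 10], "beta": [10, 90]}
print(predict_by_min_id([9, 1], standards))
print(predict_by_min_id([1, 9], standards))
```

A query whose counts are proportional to a class's standard source yields an increment close to zero for that class, which is what makes the minimum a sensible selection criterion.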

11.
Prediction of the subcellular location of apoptosis proteins   Cited by: 4 (self: 0, others: 4)
Apoptosis proteins have a central role in the development and the homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. The function of an apoptosis protein is closely related to its subcellular location. Based on the concept that the subcellular location of an apoptosis protein is mainly determined by its amino acid sequence, a new algorithm for predicting the subcellular location of an apoptosis protein is proposed. Using a distinctive set of information parameters derived from the primary sequences of 317 apoptosis proteins, the increment of diversity (ID), the sole prediction parameter, is calculated. Jackknife tests on the expanded dataset give higher predictive success rates than previous algorithms. Our prediction results show that the local compositions of twin amino acids and the hydropathy distribution are very useful for predicting the subcellular location of a protein.

12.
The location of a protein in a cell is closely correlated with its biological function. Based on the concept that the protein subcellular location is mainly determined by its amino acid and pseudo amino acid composition (PseAA), a new algorithm of increment of diversity combined with support vector machine is proposed to predict the protein subcellular location. The subcellular locations of plant and non-plant proteins are investigated by our method. The overall prediction accuracies in jackknife test are 88.3% for the eukaryotic plant proteins and 92.4% for the eukaryotic non-plant proteins, respectively. In order to estimate the effect of the sequence identity on predictive result, the proteins with sequence identity

13.
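Zhang Zhenhui, Wang Yongxian, Wang Zhenghua 《激光生物学报》 2007, 16(2): 249-252, F0003
Apoptosis proteins are important for the development of organisms, for maintaining homeostasis, and for understanding the mechanism of apoptosis. This paper proposes a new method for extracting protein sequence features, the tripeptide discrete source method. The numbers of occurrences of adjacent triplets in the protein sequence are counted, and the localization of apoptosis proteins is predicted by minimizing the increment of diversity. In addition, the content-balancing accuracy index proposed by Zhang Chunting et al. is generalized so that it can evaluate classification problems with any number of classes. The experimental results show that, in apoptosis protein localization prediction, the tripeptide discrete source method improves overall prediction accuracy while handling the sample imbalance problem well, and that the content-balancing accuracy index evaluates the predictive power of an algorithm more accurately than the traditional overall accuracy and effectively reflects how well the algorithm copes with sample imbalance.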
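Counting adjacent triplets, as described above, amounts to sliding a window of length three along the sequence. A minimal sketch follows, assuming overlapping windows with a step of one residue; the function name and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping substring of length k in seq (k=3 gives tripeptides)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical short sequence, for illustration only.
counts = kmer_counts("MKKLMKK")
print(counts.most_common(1))
```

The resulting counts form the discrete source for one sequence. Pooling them over all training sequences of a class would give that class's standard source, against which a query could be scored by the increment of diversity.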

14.
The study of rat proteins is an indispensable task in experimental medicine and drug development. The function of a rat protein is closely related to its subcellular location. Based on this concept, we construct a benchmark dataset of rat proteins and develop a combined approach for predicting their subcellular localization. From the protein primary sequence, multiple sequential features are obtained using discrete Fourier analysis, a position conservation scoring function, and the increment of diversity, and these features are selected as input parameters of the support vector machine. In the jackknife test, the overall success rate of prediction is 95.6% on the rat proteins dataset. When our method is applied to the apoptosis proteins dataset and the Gram-negative bacterial proteins dataset with the jackknife test, the overall success rates are 89.9% and 96.4%, respectively. These results indicate that the proposed method is quite promising and may play a complementary role to the existing predictors in this area.

15.
Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Since the functions of these proteins are closely correlated with their subcellular localizations, many efforts have been made to develop a variety of methods for predicting protein subcellular location. In this study, based on the strategy of hybridizing the functional domain composition and the pseudo-amino acid composition (Cai and Chou [2003]: Biochem. Biophys. Res. Commun. 305:407-411), the Intimate Sorting Algorithm (ISort predictor) was developed for predicting the protein subcellular location. As a showcase, the same plant and non-plant protein datasets studied by previous investigators were used for demonstration. The overall success rate by the jackknife test was 85.4% for the plant protein dataset and 91.9% for the non-plant protein dataset. These are so far the highest success rates achieved for the two datasets by following a rigorous cross validation test procedure, further confirming that such a hybrid approach may become a very useful high-throughput tool in the area of bioinformatics, proteomics, as well as molecular cell biology.

16.
Cai J, Huang Y, Li F, Li Y 《Proteins》 2006, 62(3): 793-799
Alternative translation is an important cellular mechanism contributing to the generation of proteins and the diversity of protein functions. Instead of studying individual cases, we systematically analyzed the alteration of protein subcellular location and domain formation by alternative translational initiation in eukaryotes. The results revealed that 85.7% of alternative translation events generated biological diversity, attributed to different subcellular localizations and distinct domain contents in alternative isoforms. Analysis of isoelectric point values revealed that most N-terminal truncated isoforms significantly lowered their isoelectric point values targeted at different subcellular localizations, whereas they had conserved domain contents the same as the full-length isoforms. Furthermore, Fisher's exact test indicated that the two ways (targeting different cellular compartments and changing domain contents) were negatively associated. The N-terminal truncated isoforms should have only one way to diversify their functions distinct from the full-length ones. The peculiar consequence of subcellular relocation as well as change of domain contents reflected the very high level of biological complexity as alternative usage of initiation codons.
