Similar articles
 19 similar articles found (search time: 187 ms)
1.
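Predicting the subcellular location of apoptosis proteins using grouped weight encoding   Cited by: 2 (self-citations: 1, citations by others: 1)
Starting from the physicochemical properties of amino acids and borrowing the ideas of "coarse-graining" and "grouping" from physics, a new feature-extraction method for protein sequences, grouped weight encoding, is proposed. With the component-coupled algorithm as the classifier, the subcellular location of apoptosis proteins is studied from the primary sequence. On the dataset used by Zhou and Doctor, the overall prediction accuracies in the re-substitution and jackknife tests are 98.0% and 85.7%, respectively, improvements of 7.2% and 13.2% over the overall accuracies obtained with amino acid composition and the component-coupled algorithm. On the dataset used by Chen Yingli and Li Qianzhong, the overall prediction accuracies in the re-substitution and jackknife tests are 94.0% and 80.1%, respectively, improvements of 5.9% and 2.0% over the overall accuracies obtained with dipeptide composition and the increment of diversity algorithm. On the latest dataset compiled by the authors, the overall prediction accuracies in the re-substitution and jackknife tests are 97.33% and 75.11%, respectively. The results show that grouped weight encoding of protein sequences is an effective feature-extraction method for studying the location of apoptosis proteins.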

2.
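Based on the view that the subcellular location of an apoptosis protein is mainly determined by its amino acid sequence, and using the n-peptide composition of local amino acid sequences together with the hydropathy distribution of the sequence, the increment of diversity combined with support vector machine (ID_SVM) algorithm is applied to predict the subcellular locations of six classes of apoptosis proteins. In the re-substitution and jackknife tests, the overall prediction success rates of ID_SVM reach 94.6% and 84.2%, respectively; in the 5-fold and 10-fold tests, the overall success rates also exceed 83%. A comparison of the predictive power of ID and ID_SVM shows that combining the increment of diversity with a support vector machine improves the success rate, indicating that ID_SVM is a very effective method for predicting the subcellular location of apoptosis proteins.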

3.
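The subcellular location of a protein is closely related to its function, and predicting that location helps in understanding protein function. This paper proposes a segmented pseudo-amino acid composition feature-extraction method and uses a support vector machine to classify the two protein subcellular location datasets constructed by Chou (C2129 and CS2423). The performance of the prediction system is assessed with the overall classification accuracy Q3, the content-balancing accuracy index Q9, and other parameters. The results show that the segmented pseudo-amino acid composition method performs better than the pseudo-amino acid composition method based on the complete protein sequence. For example, with the segmented moment-descriptor pseudo-amino acid composition method, Q3 and Q9 on dataset C2129 are 84.7% and 60.8%, respectively, which is 1.8 and 2.2 percentage points higher than with the moment-descriptor pseudo-amino acid composition method based on the complete protein sequence, and Q3 is 9.1 percentage points higher than that of the existing method of Xiao et al. The feature vector built by the segmented pseudo-amino acid composition method contains not only position information between residues but also coupling information between protein subsequences. In addition, the protein segments may be related to the functional domains of the protein, which allows the method to predict protein subcellular location effectively.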

4.
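Recognition of protein subcellular location   Cited by: 5 (self-citations: 2, citations by others: 3)
Proteins are divided into 12 classes according to their subcellular location. Using the mathematical theory of the measure of diversity, the counts of the 400 amino acid dipeptides in a protein form the source of diversity, and the subcellular location is predicted by computing the increment of diversity. Both the self-consistency and jackknife tests give high prediction success rates: 84.5% for the self-consistency test and 81.1% for the jackknife test.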
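The increment of diversity computed from the 400 dipeptide counts can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the commonly used definitions D(X) = N log N - Σ n_i log n_i and ID(X, Y) = D(X + Y) - D(X) - D(Y), and the function names, the toy class names, and the rule "assign the class with the smallest ID" are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# The 400 possible dipeptides, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_counts(sequences):
    """Count overlapping dipeptides over a list of sequences (400 counts)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    # Dipeptides containing non-standard letters are ignored here.
    return [counts[d] for d in DIPEPTIDES]


def diversity(counts):
    """D(X) = N log N - sum(n_i log n_i), taking 0 log 0 as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar sources."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def predict_location(query_seq, class_sources):
    """Assign the class whose dipeptide source gives the smallest ID (assumed rule)."""
    q = dipeptide_counts([query_seq])
    return min(class_sources, key=lambda name: increment_of_diversity(q, class_sources[name]))
```

A hypothetical usage: build one count vector per location class from its training sequences with `dipeptide_counts`, collect them in a dictionary, and call `predict_location` on each query sequence.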

5.
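Predicting protein subcellular location with an ensemble of improved KNN classifiers   Cited by: 1 (self-citations: 0, citations by others: 1)
Several similarity-alignment K-nearest neighbor (KNN) classifiers are combined with the Adaboost algorithm to predict protein subcellular location. The similarity-alignment KNN algorithm uses amino acid composition, dipeptide composition, and pseudo-amino acid composition as protein sequence features, and uses BLAST alignment in the KNN decision step to determine the subcellular location. In the jackknife test, the Adaboost ensemble classifier draws on the three sequence features, and the highest prediction success rates on the CH317 and Gram1253 datasets are 92.4% and 93.1%, respectively. The results show that the Adaboost ensemble of improved KNN classifiers is an effective method for predicting protein subcellular location.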

6.
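Predicting protein subcellular location using the increment of diversity combined with support vector machine   Cited by: 3 (self-citations: 0, citations by others: 3)
Zhao Yu, Zhao Judong, Yao Long. 《生物信息学》, 2010, 8(3): 237-239, 244
Functional annotation of unknown proteins is a main goal of proteomics, and predicting protein subcellular location is a key part of that annotation. This paper applies the increment of diversity combined with support vector machine (ID_SVM) method to predict five subcellular locations of Gram-positive bacterial proteins. In the independent test, the overall prediction success rate is 89.66%. The ID_SVM algorithm is found to improve the prediction success rate considerably.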

7.
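Wang Wei, Zheng Xiaoqi, Dou Yongchao, Liu Taigang, Zhao Juan, Wang Jun. 《生物信息学》, 2011, 9(2): 171-175, 180
Information on the subcellular location of proteins helps in understanding their functions and the interactions between them, and can also support the development of new drugs. Current prediction methods are mainly based on N-terminal sorting signals or on amino acid composition, but studies show that methods relying on either of these alone lose the order information of the sequence. To overcome this shortcoming, this paper proposes a method for predicting protein subcellular location based on optimal splitting sites. First, each protein sequence is split into three parts: the N-terminal, middle, and C-terminal segments. Then the amino acid composition, dipeptide composition, and physicochemical properties are extracted from each subsequence and from the whole sequence. Finally, these features are fused to form the feature set of the whole sequence. In the jackknife test, the overall accuracies obtained on the NNPSL dataset are 87.8% and 92.1%, respectively.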
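The three-part split described above can be sketched as follows. This is only an illustration under stated assumptions: the split lengths `n_len` and `c_len` are hypothetical fixed values (the paper searches for optimal splitting sites, which is not reproduced here), and only amino acid composition is extracted, not the dipeptide composition or physicochemical properties.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the 20 amino acid frequencies of a segment (all zeros if empty)."""
    counts = [segment.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def split_three(seq, n_len, c_len):
    """Split a sequence into N-terminal, middle, and C-terminal parts.

    n_len and c_len are assumed split lengths; they are clipped so that
    the three parts never overlap on short sequences.
    """
    n_len = min(n_len, len(seq))
    c_len = min(c_len, len(seq) - n_len)
    n_part = seq[:n_len]
    middle = seq[n_len:len(seq) - c_len]
    c_part = seq[len(seq) - c_len:]
    return n_part, middle, c_part


def segmented_features(seq, n_len=25, c_len=25):
    """Concatenate the compositions of the three parts and of the whole sequence."""
    features = []
    for part in (*split_three(seq, n_len, c_len), seq):
        features.extend(composition(part))
    return features  # 4 x 20 = 80 values
```

The resulting 80-value vector could then be passed to any classifier; which classifier and which split lengths work best is outside what this sketch covers.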

8.
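Encoding of protein sequences is one of the key techniques in subcellular location prediction. This paper reviews the existing protein sequence encoding algorithms in some detail and points out open problems in sequence encoding and possible directions for development.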

9.
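Predicting protein subcellular compartments by similarity alignment   Cited by: 1 (self-citations: 0, citations by others: 1)
Wang Xiongfei, Zhang Liang, Xue Wei, Zhao Nan, Xu Huanliang. 《微生物学通报》, 2016, 43(10): 2298-2305
[Objective] To predict the subcellular compartment to which a protein belongs, providing a basis for further study of its biological function. [Methods] With the amino acid composition, dipeptide composition, and pseudo-amino acid composition of the protein sequence as sequence features, a K-nearest neighbor (KNN) classification algorithm improved by BLAST alignment is used to predict the subcellular compartment of protein sequences. [Results] In the jackknife test, the success rates for the three features are 91.5%, 91.5%, and 89.3% on dataset CH317, and 93.9%, 92.9%, and 89.8% on dataset ZD98. [Conclusion] The KNN algorithm improved by BLAST alignment is an effective method for predicting protein subcellular compartments.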
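The jackknife (leave-one-out) test reported in these abstracts can be sketched as below with a plain KNN classifier. Note the substitution: the abstract above uses BLAST alignment in the KNN decision step, whereas this sketch uses Euclidean distance between feature vectors, so it illustrates the test protocol only. All function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=1):
    """Majority vote among the k nearest training samples.

    Ties in the vote go to the label seen first, i.e. the nearer sample.
    """
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def jackknife_accuracy(X, y, k=1):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)
```

Because every sample is held out exactly once, the result does not depend on a random split, which is one reason the jackknife test is often preferred for small datasets such as those cited here.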

10.
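Locating T-cell epitopes by identifying the short peptide sequences on antigen molecules that T cells recognize is of great significance for the study of specific immune responses. This paper reviews the methods commonly used in recent years for the experimental determination and theoretical prediction of T-cell epitopes on protein antigens, as well as research methods for T-cell epitope analysis.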

11.
12.
Prediction of the subcellular location of apoptosis proteins   Cited by: 4 (self-citations: 0, citations by others: 4)
Apoptosis proteins have a central role in the development and the homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. The function of an apoptosis protein is closely related to its subcellular location. Based on the concept that the subcellular location of an apoptosis protein is mainly determined by its amino acid sequence, a new algorithm for predicting the subcellular location of an apoptosis protein is proposed. Using a distinctive set of information parameters derived from the primary sequences of 317 apoptosis proteins, the increment of diversity (ID), the sole prediction parameter, is calculated. In jackknife tests on the expanded dataset, higher predictive success rates are obtained than with previous algorithms. Our prediction results show that the local compositions of twin amino acids and the hydropathy distribution are very useful for predicting the subcellular location of proteins.

13.
The location of a protein in a cell is closely correlated with its biological function. Based on the concept that the protein subcellular location is mainly determined by its amino acid and pseudo amino acid composition (PseAA), a new algorithm that combines the increment of diversity with a support vector machine is proposed to predict the protein subcellular location. The subcellular locations of plant and non-plant proteins are investigated by our method. The overall prediction accuracies in the jackknife test are 88.3% for the eukaryotic plant proteins and 92.4% for the eukaryotic non-plant proteins. In order to estimate the effect of the sequence identity on the predictive result, the proteins with sequence identity

14.
Evaluation of gene-finding algorithms by a content-balancing accuracy index   Cited by: 2 (self-citations: 0, citations by others: 2)
A content-balancing accuracy index, called q(9), to evaluate gene-finding algorithms has been proposed. Here the concept of content-balancing means that the evaluation by this index is independent of the coding and non-coding composition of the sequence being evaluated. Since the coding and non-coding compositions are severely unbalanced in eukaryotic genomes, the performance of gene-finding algorithms is either over- or under-evaluated by the widely used accuracy indices, e.g., the correlation coefficient, due to the lack of content-balancing ability. Using the new accuracy index q(9), seven gene-finding algorithms (FGENES, Gene-Mark.hmm, Genie, Genescan, HMMgene, Morgan and MZEF) were compared and evaluated. It is shown that Genescan is still the best one, but with q(9) = 89%, averaged over the prediction for 195 sequences. In addition to the content-balancing ability, q(9) has the merit of being defined in all possible cases. It is also shown that the traditional specificity s(p) carries important information on the performance of the algorithm being evaluated. The set of sensitivity s(n), specificity s(p) and the accuracy q(9) constitutes a complete kit to evaluate gene-finding algorithms at the nucleotide level. In addition, a graphic method to compare and evaluate gene-finding algorithms has been proposed. Its major advantage is that the overall performance of algorithms can be grasped quickly in a perceivable form. The new accuracy index q(9) may also be applied to evaluate the performance of weather forecasting, clinical diagnosis, psychological examination, protein secondary structure prediction, etc.

15.
Given a raw protein sequence, knowing its subcellular location is an important step toward understanding its function and designing further experiments. A novel method is proposed for the prediction of protein subcellular locations from sequences. For four categories of eukaryotic proteins the overall predictive accuracy is 82.0%, 2.6% higher than that obtained with the SVM approach. For three subcellular locations of prokaryotic proteins, an overall accuracy of 89.9% is obtained. In accordance with the architecture of cells, a hierarchical prediction approach is designed. Based on amino acid composition, extracellular proteins and intracellular proteins can be identified with an accuracy of 97%.

16.
The study of rat proteins is an indispensable task in experimental medicine and drug development. The function of a rat protein is closely related to its subcellular location. Based on this concept, we construct a benchmark dataset of rat proteins and develop a combined approach for predicting the subcellular localization of rat proteins. From the protein primary sequence, multiple sequential features are obtained by using discrete Fourier analysis, a position conservation scoring function and the increment of diversity, and these sequential features are selected as input parameters of the support vector machine. In the jackknife test, the overall success rate of prediction is 95.6% on the rat proteins dataset. Our method is also applied to the apoptosis proteins dataset and the Gram-negative bacterial proteins dataset; in the jackknife test, the overall success rates are 89.9% and 96.4%, respectively. The above results indicate that our proposed method is quite promising and may play a complementary role to the existing predictors in this area.

17.
Apoptosis, or programmed cell death, plays an important role in the development of an organism. Obtaining information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related to the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts of the same length. Then we use the novel representation of protein sequences and adopt a support vector machine to predict subcellular location. The overall prediction accuracy in the jackknife test is significantly improved.

18.
A content-balancing accuracy index, called Q(9), has been proposed to evaluate algorithms of protein secondary structure prediction. Here content-balancing means that the evaluation is independent of the contents of helix, strand and coil in the protein being predicted. It is shown that Q(9) is much superior to the widely used index Q(3). Therefore, algorithms are more objectively evaluated by Q(9) than by Q(3). Based on 396 non-homologous proteins, five algorithms of secondary structure prediction were evaluated and compared by the new index Q(9). Of the five algorithms, PHD turned out to be the only algorithm with an average Q(9) better than 60%. Based on the new index, it is shown that the performance of the consensus method based on a jury decision from several algorithms is even worse than that of the best individual method. We believe that Q(9), rather than Q(3), should be used to evaluate algorithms of protein secondary structure prediction in future studies in order to improve prediction quality.

19.
Neural networks have been trained to predict the subcellular location of proteins in prokaryotic or eukaryotic cells from their amino acid composition. For three possible subcellular locations in prokaryotic organisms a prediction accuracy of 81% can be achieved. When a reliability index is assigned, 33% of the predictions can be made with an accuracy of 91%. For eukaryotic proteins (excluding plant sequences) an overall prediction accuracy of 66% for four locations was achieved, with 33% of the sequences being predicted with an accuracy of 82% or better. Since the subcellular location restricts a protein's possible function, this method should be a useful tool for the systematic analysis of genome data and is available via a server on the world wide web.
