首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 58 毫秒
1.
集成改进KNN算法预测蛋白质亚细胞定位   总被引:1,自引:0,他引:1  
基于Adaboost算法对多个相似性比对K最近邻(K-nearest neighbor,KNN)分类器集成实现蛋白质的亚细胞定位预测。相似性比对KNN算法分别以氨基酸组成、二肽、伪氨基酸组成为蛋白序列特征,在KNN的决策阶段使用Blast比对决定蛋白质的亚细胞定位。在Jackknife检验下,Adaboost集成分类算法提取3种蛋白序列特征,3种特征在数据集CH317和Gram1253的最高预测成功率分别为92.4%和93.1%。结果表明Adaboost集成改进KNN分类预测方法是一种有效的蛋白质亚细胞定位预测方法。  相似文献   

2.
蛋白质序列的编码是亚细胞定位预测问题中的关键技术之一。该文较为详细地介绍了目前已有的蛋白质序列编码算法;并指出了序列编码中存在的一些问题及可能的发展方向。  相似文献   

3.
王伟  郑小琪  窦永超  刘太岗  赵娟  王军 《生物信息学》2011,9(2):171-175,180
蛋白质的亚细胞位点信息有助于我们了解蛋白质的功能以及它们之间的相互作用,同时还可以为新药物的研发提供帮助。目前普遍采用的亚细胞位点预测方法主要是基于N端分选信号或氨基酸组分特征,但研究表明,单纯基于N端分选信号或氨基酸组分的方法都会丢失序列的序信息。为了克服此缺陷,本文提出了一种基于最优分割位点的蛋白质亚细胞位点预测方法。首先,把每条蛋白质序列分割为N端、中间和C端三部分,然后在每个子序列和整条序列中分别提取氨基酸组分、双肽组分和物理化学性质,最后我们把这些特征融合起来作为整条序列的特征。通过夹克刀检验,该方法在NNPSL数据集上得到的总体精度分别是87.8%和92.1%。  相似文献   

4.
文中提出了一种简单有效的蛋白质亚细胞区间定位预测方法,为进一步了解蛋白质的功能和性质提供理论基础。运用稀疏编码,结合氨基酸组成信息提取蛋白质序列特征,基于不同字典大小对得到的特征进行多层次池化整合,并送入支持向量机进行分类。经Jackknife检验,在数据集ZD98、CH317和Gram1253上的预测成功率分别达到95.9%、93.4%和94.7%。实验证明基于多层次稀疏编码的分类预测算法能显著提高蛋白质亚细胞区间定位的预测精度。  相似文献   

5.
研究真核蛋白质的亚细胞位点是了解真核蛋白质功能,深入研究蛋白质相关信号通路内在机制的基础。同时,可以为了解 疾病发病机制及为新药研发提供帮助。因此,研究真核蛋白质的亚细胞位点意义十分重大。随着基因组测序的完成,真核蛋白质 序列信息增长迅速,为真核蛋白质亚细胞位点的研究提出了更多的挑战。传统的实验法难以满足蛋白质信息量迅速增长的需求。 而采用生物信息学手段处理大规模数据的计算预测方法,可在较短时间内获得大量真核蛋白质亚细胞位点信息,弥补了实验法 的不足。因此,运用计算预测法预测真核蛋白质的亚细胞位点成为生物信息学领域的研究热点之一。本文主要从提取真核蛋白质 的特征信息、计算预测方法及预测效果的评价三个方面,介绍近年来真核蛋白质亚细胞位点预测的研究进展。  相似文献   

6.
7.
宋艳萍  姜颖  贺福初 《生命科学》2007,19(1):104-108
运用蛋白质组学方法对亚细胞组分进行研究是目前的研究热点。传统的亚细胞分离技术结合质谱鉴定技术为亚细胞蛋白质组学的研究提供了技术平台。本文对快速发展的亚细胞蛋白质组研究进展及其所揭示的生物学意义进行了综述。  相似文献   

8.
蛋白质的亚细胞定位是进行蛋白质功能研究的重要信息.蛋白质合成后被转运到特定的细胞器中,只有转运到正确的部位才能参与细胞的各种生命活动,有效地发挥功能.尝试了将保守序列及蛋白质相互作用数据的编码信息结合传统的氨基酸组成编码,采用支持向量机进行蛋白质亚细胞定位预测,在真核生物中5轮交叉验证精度达到91.8%,得到了显著的提高.  相似文献   

9.
《生物技术世界》2008,(3):81-81
我们知道,对蛋白质在细胞中的特异定位,可以增进我们对该蛋白质以及与该蛋白互作的其他蛋白功能的了解,确定蛋白质位置的一个方法是。在其上加一特异抗体。即对其贴上标签,然后加入能同此抗体特异结合的探针,最后用显微镜确定此探针在细胞中的位置。  相似文献   

10.
蛋白质亚细胞定位的识别   总被引:3,自引:2,他引:3  
根据蛋白质的亚细胞定位,将蛋白质分为12类,用离散量的数学理论,以蛋白质中400个氨基酸二联体数目构成离散源,通过计算离散增量预测蛋白质的亚细胞定位,用Self-consistency和Jackknife两种方法测试均获得较高的预测成功率。结果表明:Self-consistency方法预测成功率为84.5%,Jackknife方法预测成功率为81.1%。  相似文献   

11.
Prediction of protein subcellular locations by GO-FunD-PseAA predictor   总被引:8,自引:0,他引:8  
The localization of a protein in a cell is closely correlated with its biological function. With the explosion of protein sequences entering into DataBanks, it is highly desired to develop an automated method that can fast identify their subcellular location. This will expedite the annotation process, providing timely useful information for both basic research and industrial application. In view of this, a powerful predictor has been developed by hybridizing the gene ontology approach [Nat. Genet. 25 (2000) 25], functional domain composition approach [J. Biol. Chem. 277 (2002) 45765], and the pseudo-amino acid composition approach [Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: ibid. 44 (2001) 60]. As a showcase, the recently constructed dataset [Bioinformatics 19 (2003) 1656] was used for demonstration. The dataset contains 7589 proteins classified into 12 subcellular locations: chloroplast, cytoplasmic, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, and vacuolar. The overall success rate of prediction obtained by the jackknife cross-validation was 92%. This is so far the highest success rate performed on this dataset by following an objective and rigorous cross-validation procedure.  相似文献   

12.
Assigning subcellular localization (SL) to proteins is one of the major tasks of functional proteomics. Despite the impressive technical advances of the past decades, it is still time-consuming and laborious to experimentally determine SL on a high throughput scale. Thus, computational predictions are the preferred method for large-scale assignment of protein SL, and if appropriate, followed up by experimental studies. In this report, using a machine learning approach, the Nearest Neighbor Algorithm (NNA), we developed a prediction system for protein SL in which we incorporated a protein functional domain profile. The overall accuracy achieved by this system is 93.96%. Furthermore, comparisons with other methods have been conducted to demonstrate the validity and efficiency of our prediction system. We also provide an implementation of our Subcellular Location Prediction System (SLPS), which is available at http://pcal.biosino.org.  相似文献   

13.
In the last two decades, predicting protein subcellular locations has become a hot topic in bioinformatics. A number of algorithms and online services have been developed to computationally assign a subcellular location to a given protein sequence. With the progress of many proteome projects, more and more proteins are annotated with more than one subcellular location. However, multisite prediction has only been considered in a handful of recent studies, in which there are several common challenges. In this special report, the authors discuss what these challenges are, why these challenges are important and how the existing studies gave their solutions. Finally, a vision of the future of predicting multisite protein subcellular locations is given.  相似文献   

14.
How to incorporate the sequence order effect is a key and logical step for improving the prediction quality of protein subcellular location, but meanwhile it is a very difficult problem as well. This is because the number of possible sequence order patterns in proteins is extremely large, which has posed a formidable barrier to construct an effective training data set for statistical treatment based on the current knowledge. That is why most of the existing prediction algorithms are operated based on the amino-acid composition alone. In this paper, based on the physicochemical distance between amino acids, a set of sequence-order-coupling numbers was introduced to reflect the sequence order effect, or in a rigorous term, the quasi-sequence-order effect. Furthermore, the covariant discriminant algorithm by Chou and Elrod (Protein Eng. 12, 107-118, 1999) developed recently was augmented to allow the prediction performed by using the input of both the sequence-order-coupling numbers and amino-acid composition. A remarkable improvement was observed in the prediction quality using the augmented covariant discriminant algorithm. The approach described here represents one promising step forward in the efforts of incorporating sequence order effect in protein subcellular location prediction. It is anticipated that the current approach may also have a series of impacts on the prediction of other protein features by statistical approaches.  相似文献   

15.
In this study, the predictors are developed for protein submitochondria locations based on various features of sequences. Information about the submitochondria location for a mitochondria protein can provide much better understanding about its function. We use ten representative models of protein samples such as pseudo amino acid composition, dipeptide composition, functional domain composition, the combining discrete model based on prediction of solvent accessibility and secondary structure elements, the discrete model of pairwise sequence similarity, etc. We construct a predictor based on support vector machines (SVMs) for each representative model. The overall prediction accuracy by the leave-one-out cross validation test obtained by the predictor which is based on the discrete model of pairwise sequence similarity is 1% better than the best computational system that exists for this problem. Moreover, we develop a method based on ordered weighted averaging (OWA) which is one of the fusion data operators. Therefore, OWA is applied on the 11 best SVM-based classifiers that are constructed based on various features of sequence. This method is called Mito-Loc. The overall leave-one-out cross validation accuracy obtained by Mito-Loc is about 95%. This indicates that our proposed approach (Mito-Loc) is superior to the result of the best existing approach which has already been reported.  相似文献   

16.
Prediction of protein subcellular localization   总被引:6,自引:0,他引:6  
Yu CS  Chen YC  Lu CH  Hwang JK 《Proteins》2006,64(3):643-651
Because the protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization, and have found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences-some data sets comprising sequences up to 80-90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vectors derived from sequences; the second level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches for two benchmark data sets-one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization. Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set of high homology levels will undoubtedly lead to biased assessment of the performances of the predictive approaches-especially those relying on homology search or sequence annotations. Our two-level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.  相似文献   

17.
Prediction of protein subcellular locations using fuzzy k-NN method   总被引:7,自引:0,他引:7  
MOTIVATION: Protein localization data are a valuable information resource helpful in elucidating protein functions. It is highly desirable to predict a protein's subcellular locations automatically from its sequence. RESULTS: In this paper, fuzzy k-nearest neighbors (k-NN) algorithm has been introduced to predict proteins' subcellular locations from their dipeptide composition. The prediction is performed with a new data set derived from version 41.0 SWISS-PROT databank, the overall predictive accuracy about 80% has been achieved in a jackknife test. The result demonstrates the applicability of this relative simple method and possible improvement of prediction accuracy for the protein subcellular locations. We also applied this method to annotate six entirely sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana and a subset of all human proteins. AVAILABILITY: Supplementary information and subcellular location annotations for eukaryotes are available at http://166.111.30.65/hying/fuzzy_loc.htm  相似文献   

18.
Prediction of membrane protein types and subcellular locations.   总被引:12,自引:0,他引:12  
K C Chou  D W Elrod 《Proteins》1999,34(1):137-153
Membrane proteins are classified according to two different schemes. In scheme 1, they are discriminated among the following five types: (1) type I single-pass transmembrane, (2) type II single-pass transmembrane, (3) multipass transmembrane, (4) lipid chain-anchored membrane, and (5) GPI-anchored membrane proteins. In scheme 2, they are discriminated among the following nine locations: (1) chloroplast, (2) endoplasmic reticulum, (3) Golgi apparatus, (4) lysosome, (5) mitochondria, (6) nucleus, (7) peroxisome, (8) plasma, and (9) vacuole. An algorithm is formulated for predicting the type or location of a given membrane protein based on its amino acid composition. The overall rates of correct prediction thus obtained by both self-consistency and jackknife tests, as well as by an independent dataset test, were around 76-81% for the classification of five types, and 66-70% for the classification of nine cellular locations. Furthermore, classification and prediction were also conducted between inner and outer membrane proteins; the corresponding rates thus obtained were 88-91%. These results imply that the types of membrane proteins, as well as their cellular locations and other attributes, are closely correlated with their amino acid composition. It is anticipated that the classification schemes and prediction algorithm can expedite the functionality determination of new proteins. The concept and method can be also useful in the prioritization of genes and proteins identified by genomics efforts as potential molecular targets for drug design.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号