Similar literature
1.
2.
Fan GL  Li QZ 《Amino acids》2012,43(2):545-555
Knowledge of the submitochondria location of a protein is integral to understanding its function and is a necessity in the proteomics era. In this work, a new submitochondria data set is constructed, and an approach for predicting protein submitochondria locations is proposed that combines amino acid composition, dipeptide composition, reduced physicochemical properties, gene ontology, evolutionary information, and pseudo-average chemical shift. Using the increment of diversity algorithm combined with the support vector machine, the overall prediction accuracy is 93.57% for the submitochondria location and 97.79% for the three membrane protein types in the mitochondria inner membrane. The performance of the pseudo-average chemical shift is excellent. For comparison, the method is also used to predict submitochondria locations in the data set constructed by Du and Li; an accuracy of 94.95% is obtained, which is better than that of other existing methods.

3.
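This paper establishes an up-to-date dataset for protein submitochondrial localization, containing 1,293 sequences from 4 submitochondrial locations. Features are extracted from mitochondrial proteins by combining gene ontology (GO) information with homology information, and a classifier is built with the support vector machine algorithm. Under the jackknife test, the overall prediction accuracy is 93.27% for the 4 submitochondrial locations, and 94.73% for 3 of the submitochondrial locations.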

4.
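After synthesis, proteins are transported to specific organelles; only when delivered to the correct site can they take part in the cell's life activities and function effectively. Protein function is therefore closely tied to subcellular localization, and determining where a protein resides in the cell yields information about its function and structure. Over the past two decades, research on algorithms for predicting protein subcellular localization has made great progress. Building on this, predicting localization within organelle substructures, such as submitochondrial and subchloroplast localization, has become a deeper-level problem. This paper briefly reviews progress in China and abroad in predicting protein subchloroplast and submitochondrial localization.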

5.
The mitochondrion is a key organelle of the eukaryotic cell that provides the energy for cellular activities. Correctly identifying submitochondria locations of proteins can provide plentiful information for understanding their functions. However, using wet-experimental methods to recognize submitochondria locations of proteins is time-consuming and costly. Thus, it is highly desirable to develop a bioinformatics method to predict the submitochondria locations of mitochondrion proteins. In this work, a novel method based on the support vector machine was developed to predict the submitochondria locations of mitochondrion proteins by using over-represented tetrapeptides selected with the binomial distribution. A reliable and rigorous benchmark dataset including 495 mitochondrion proteins with sequence identity ≤25% was constructed for testing and evaluating the proposed model. Jackknife cross-validated results showed that 91.1% of the 495 mitochondrion proteins can be correctly predicted. Subsequently, our model was evaluated on three existing benchmark datasets. The overall accuracies are 94.0, 94.7 and 93.4%, respectively, suggesting that the proposed model is potentially useful in the realm of mitochondrion proteome research. Based on this model, we built a predictor called TetraMito which is freely available at http://lin.uestc.edu.cn/server/TetraMito.
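The abstract above does not give the selection formula, so the following is only a sketch of one plausible way to pick over-represented tetrapeptides with a binomial test. It assumes that a tetrapeptide counts as over-represented in a class when the upper-tail probability of its observed count, under a binomial model whose success probability is that class's share of all tetrapeptide occurrences, falls below a threshold. The function names, the `alpha` cutoff, and the toy sequences are all hypothetical and are not taken from TetraMito.

```python
from collections import Counter
from math import comb


def kmer_counts(seqs, k=4):
    """Count overlapping k-mers (tetrapeptides when k=4) across sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def binomial_upper_tail(n_obs, n_total, q):
    """P(X >= n_obs) for X ~ Binomial(n_total, q)."""
    return sum(
        comb(n_total, m) * q ** m * (1 - q) ** (n_total - m)
        for m in range(n_obs, n_total + 1)
    )


def overrepresented(class_seqs, k=4, alpha=0.05):
    """Return, per class, the k-mers whose upper-tail probability is < alpha.

    alpha is an assumed cutoff, not a value from the paper.
    """
    per_class = {c: kmer_counts(seqs, k) for c, seqs in class_seqs.items()}
    total = Counter()
    for cnt in per_class.values():
        total.update(cnt)
    grand = sum(total.values())
    selected = {}
    for c, cnt in per_class.items():
        q = sum(cnt.values()) / grand  # class share of all k-mer occurrences
        selected[c] = sorted(
            kmer for kmer, n in cnt.items()
            if binomial_upper_tail(n, total[kmer], q) < alpha
        )
    return selected


# Toy data (hypothetical): class "A" repeats one tetrapeptide, class "B" does not.
toy = {"A": ["AAAAAAAAAA"], "B": ["CDEFGHIKLM"]}
selected = overrepresented(toy)
```

The selected tetrapeptides would then serve as the feature vocabulary for a downstream classifier; the exact summation is kept here for clarity and would be replaced by a library routine on real datasets.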

6.

Background  

Knowing the submitochondria localization of a mitochondria protein is an important step toward understanding its function. We develop a method based on an extended version of pseudo-amino acid composition to predict protein localization within mitochondria. This work goes one step further than predicting protein subcellular location. We also try to predict the membrane protein type for mitochondrial inner membrane proteins.

7.
In this study, predictors are developed for protein submitochondria locations based on various features of sequences. Information about the submitochondria location of a mitochondria protein can provide a much better understanding of its function. We use ten representative models of protein samples such as pseudo amino acid composition, dipeptide composition, functional domain composition, the combining discrete model based on prediction of solvent accessibility and secondary structure elements, the discrete model of pairwise sequence similarity, etc. We construct a predictor based on support vector machines (SVMs) for each representative model. The overall prediction accuracy in the leave-one-out cross validation test obtained by the predictor based on the discrete model of pairwise sequence similarity is 1% better than that of the best existing computational system for this problem. Moreover, we develop a method based on ordered weighted averaging (OWA), which is a data fusion operator. OWA is applied to the 11 best SVM-based classifiers that are constructed from various sequence features. This method is called Mito-Loc. The overall leave-one-out cross validation accuracy obtained by Mito-Loc is about 95%. This indicates that our proposed approach (Mito-Loc) is superior to the best existing approach reported so far.
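As a rough illustration of the OWA step, the sketch below applies the standard ordered weighted averaging operator (sort the inputs in descending order, then take a weighted sum with weights that add up to 1) to per-class scores from several classifiers. How Mito-Loc derives its weights and scores is not stated in the abstract, so the weights, the score dictionaries, and the function names here are assumptions made for illustration.

```python
def owa(values, weights):
    """Ordered weighted averaging: weights apply to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def fuse(scores_by_classifier, weights):
    """Fuse per-class scores from several classifiers and pick the best class.

    scores_by_classifier: list of dicts mapping class label -> score.
    """
    classes = scores_by_classifier[0].keys()
    fused = {
        c: owa([scores[c] for scores in scores_by_classifier], weights)
        for c in classes
    }
    return max(fused, key=fused.get), fused


# Hypothetical scores from three classifiers for two locations.
scores = [
    {"inner membrane": 0.6, "matrix": 0.4},
    {"inner membrane": 0.3, "matrix": 0.7},
    {"inner membrane": 0.5, "matrix": 0.5},
]
label, fused = fuse(scores, [0.5, 0.3, 0.2])
```

Because the weights attach to rank positions rather than to particular classifiers, weights of `[1, 0, 0]` reduce OWA to the maximum and `[0, 0, 1]` to the minimum, with the arithmetic mean in between.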

8.
Jiang L  Li M  Wen Z  Wang K  Diao Y 《The protein journal》2006,25(4):241-249
A new method was proposed for prediction of mitochondrial proteins by the discrete wavelet transform, based on the sequence–scale similarity measurement. This sequence–scale similarity, revealing more information than other conventional methods, does not rely on subcellular location information and can directly predict protein sequences of different lengths. In our experiments, 499 mitochondrial protein sequences, constituting a mitochondria database, were used as the training dataset, and 681 non-mitochondrial protein sequences were tested. The system can predict these sequences with sensitivity, specificity, accuracy and MCC of 50.30%, 95.74%, 76.53% and 0.54, respectively. Source code of the new program is available on request from the authors.
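The four measures quoted above can be computed from a binary confusion matrix as in the sketch below, which assumes the conventional definitions (some papers in this list may instead use "specificity" to mean precision). The counts in the example are hypothetical: they were chosen only so that the rates round to the percentages in the abstract if 499 positives and 681 negatives are evaluated, and they do not come from the paper.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts: 499 positives (251 found), 681 negatives (652 rejected).
sens, spec, acc, mcc = binary_metrics(tp=251, fn=248, tn=652, fp=29)
```

With these assumed counts the sensitivity, specificity and accuracy round to 50.30%, 95.74% and 76.53%, and the MCC comes out slightly above 0.53.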

9.
MOTIVATION: Currently available methods for the prediction of subcellular location of mitochondrial proteins rely largely on the presence of mitochondrial targeting signals in the protein sequences. However, a large fraction of mitochondrial proteins lack such signals, making those tools ineffective for genome-scale prediction of mitochondria-targeted proteins. Here, we propose a method for genome-scale prediction of nucleus-encoded mitochondrial proteins. The new method, MITOPRED, is based on the Pfam domain occurrence patterns and the amino acid compositional differences between mitochondrial and non-mitochondrial proteins. RESULTS: MITOPRED could predict mitochondrial proteins with 100% specificity at a 44% sensitivity rate and with 67% specificity at 99% sensitivity. Additionally, it was sufficiently robust to predict mitochondrial proteins across different eukaryotic species with similar accuracy. Based on the Matthews correlation coefficient, the prediction performance of MITOPRED is clearly superior (0.73) to those of the two popular methods TargetP (0.51) and PSORT (0.53). Using this method, we predicted the nucleus-encoded mitochondrial proteins from six complete genomes (three invertebrate, two vertebrate and one plant species) and estimated the total number in each genome. In human, our method estimated the existence of 1362 mitochondrial proteins corresponding to 4.8% of the total proteome. AVAILABILITY: MITOPRED program is freely accessible at http://mitopred.sdsc.edu. Source code is available on request from the authors. SUPPLEMENTARY INFORMATION: Training data sets are also available at http://mitopred.sdsc.edu

10.
The chloroplast is a type of plant-specific subcellular organelle. It is of central importance in several biological processes such as photosynthesis and amino acid biosynthesis. Thus, understanding the function of chloroplast proteins is of significant value. Since the function of chloroplast proteins correlates with their subchloroplast locations, knowledge of their subchloroplast locations can be very helpful in understanding their role in these biological processes. In the current paper, by introducing the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm, we developed a method for predicting protein subchloroplast locations. This is the first algorithm for predicting protein subchloroplast locations. We have implemented our algorithm as an online service, SubChlo (http://bioinfo.au.tsinghua.edu.cn/subchlo). This service may be useful for chloroplast proteome research.

11.
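Predicting protein subcellular compartments by similarity alignment   Total citations: 1, self-citations: 0, citations by others: 1
Wang Xiongfei  Zhang Liang  Xue Wei  Zhao Nan  Xu Huanliang 《微生物学通报》 (Microbiology China) 2016,43(10):2298-2305
[Objective] To predict the subcellular compartment to which a protein belongs, providing a basis for further study of its biological function. [Methods] Using amino acid composition, dipeptide composition, and pseudo-amino acid composition as sequence features, the K-nearest neighbor (KNN) classification algorithm was improved with BLAST alignment to predict the subcellular compartment of protein sequences. [Results] Under the jackknife test, the success rates for the three features were 91.5%, 91.5%, and 89.3% on dataset CH317, and 93.9%, 92.9%, and 89.8% on dataset ZD98. [Conclusion] KNN improved with BLAST alignment is an effective method for predicting protein subcellular compartments.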

12.
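Predicting protein subcellular localization with an ensemble of improved KNN classifiers   Total citations: 1, self-citations: 0, citations by others: 1
Multiple similarity-alignment K-nearest neighbor (KNN) classifiers are combined with the AdaBoost algorithm to predict protein subcellular localization. The similarity-alignment KNN algorithm uses amino acid composition, dipeptide composition, and pseudo-amino acid composition, respectively, as protein sequence features, and uses BLAST alignment in the KNN decision stage to determine the subcellular localization of a protein. Under the jackknife test, the AdaBoost ensemble classifier draws on the 3 protein sequence features; the highest prediction success rates on datasets CH317 and Gram1253 were 92.4% and 93.1%, respectively. The results show that the AdaBoost ensemble of improved KNN classifiers is an effective method for predicting protein subcellular localization.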

13.
MOTIVATION: The subcellular location of a protein is closely correlated to its function. Thus, computational prediction of subcellular locations from the amino acid sequence information would help annotation and functional prediction of protein coding genes in complete genomes. We have developed a method based on support vector machines (SVMs). RESULTS: We considered 12 subcellular locations in eukaryotic cells: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular medium, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome, plasma membrane, and vacuole. We constructed a data set of proteins with known locations from the SWISS-PROT database. A set of SVMs was trained to predict the subcellular location of a given protein based on its amino acid, amino acid pair, and gapped amino acid pair compositions. The predictors based on these different compositions were then combined using a voting scheme. Results obtained through 5-fold cross-validation tests showed an improvement in prediction accuracy over the algorithm based on the amino acid composition only. This prediction method is available via the Internet.
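To make the "gapped amino acid pair composition" and "voting scheme" concrete, here is a small sketch. It assumes a gapped pair means two residues separated by a fixed number of intervening positions, and it uses plain majority voting; the abstract does not specify either detail, so both choices and all names below are illustrative assumptions.

```python
from collections import Counter


def gapped_pair_composition(seq, gap=1):
    """Relative frequency of residue pairs separated by `gap` positions.

    gap=0 gives ordinary (adjacent) amino acid pair composition.
    """
    step = gap + 1
    pairs = Counter(seq[i] + seq[i + step] for i in range(len(seq) - step))
    total = sum(pairs.values())
    return {p: n / total for p, n in pairs.items()} if total else {}


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Toy usage with hypothetical predictor outputs.
composition = gapped_pair_composition("ACACA", gap=1)
winner = majority_vote(["mitochondrion", "nucleus", "mitochondrion"])
```

In a full pipeline each composition would feed its own trained predictor, and `majority_vote` would be applied to their per-protein outputs.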

14.
Gao QB  Wang ZZ  Yan C  Du YH 《FEBS letters》2005,579(16):3444-3448
To understand the structure and function of a protein, an important task is to know where it occurs in the cell. Thus, a computational method for properly predicting the subcellular location of proteins would be significant in interpreting the original data produced by the large-scale genome sequencing projects. The present work explores an effective method for extracting features from protein primary sequence and seeks a novel measurement of similarity among proteins for classifying a protein to its proper subcellular location. We considered four locations in eukaryotic cells and three locations in prokaryotic cells, which have been investigated by several groups in the past. A combined feature of primary sequence defined as a 430D (dimensional) vector was utilized to represent a protein, including 20 amino acid compositions, 400 dipeptide compositions and 10 physicochemical properties. To evaluate the prediction performance of this encoding scheme, a jackknife test based on the nearest neighbor algorithm was employed. The prediction accuracies for cytoplasmic, extracellular, mitochondrial, and nuclear proteins in the former dataset were 86.3%, 89.2%, 73.5% and 89.4%, respectively, and the total prediction accuracy reached 86.3%. For cytoplasmic, extracellular, and periplasmic proteins in the latter dataset, the prediction accuracies were 97.4%, 86.0%, and 79.7%, respectively, and a total prediction accuracy of 92.5% was achieved. The results indicate that this method outperforms some existing approaches based on amino acid composition or amino acid composition and dipeptide composition.
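The encoding and evaluation described above can be sketched as follows. This is a simplified illustration, not the authors' method: it builds only the 20 amino acid and 400 dipeptide components (420 dimensions), omits the 10 physicochemical properties because the abstract does not list them, and uses squared Euclidean distance where the paper proposes its own similarity measure. The toy sequences and labels are hypothetical.

```python
from itertools import product

AMINO = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO, repeat=2)]


def encode(seq):
    """420-D vector: 20 amino acid frequencies + 400 dipeptide frequencies."""
    n = len(seq)
    aac = [seq.count(a) / n for a in AMINO]
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    total = max(len(pairs), 1)
    dpc = [pairs.count(d) / total for d in DIPEPTIDES]
    return aac + dpc


def sq_dist(u, v):
    """Squared Euclidean distance (an assumed stand-in similarity measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def jackknife_accuracy(seqs, labels):
    """Leave-one-out test: label each sample by its nearest other sample."""
    feats = [encode(s) for s in seqs]
    correct = 0
    for i, f in enumerate(feats):
        others = (k for k in range(len(feats)) if k != i)
        nearest = min(others, key=lambda k: sq_dist(f, feats[k]))
        correct += labels[nearest] == labels[i]
    return correct / len(feats)


# Toy data (hypothetical): two clearly separated composition classes.
toy_seqs = ["AAAAAC", "AAAACA", "WWWWWY", "WWWYWW"]
toy_labels = ["x", "x", "y", "y"]
accuracy = jackknife_accuracy(toy_seqs, toy_labels)
```

In the jackknife test each protein is held out in turn and classified using all the others, so every sample serves once as a test case.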

15.
Purification of mitochondria and mitochondrial protein complexes from green tissues is often severely impaired by the presence of chloroplasts and their proteins. Here we present a method which allows analysis of respiratory protein complexes from potato leaves. The procedure includes the preparation of an organellar fraction specifically enriched in mitochondria and the separation of organellar protein complexes by blue-native polyacrylamide gel electrophoresis (BN-PAGE). For the first time mitochondrial and chloroplast protein complexes have been resolved simultaneously in a native gel. BN-PAGE allowed the separation of eleven bands, including the mitochondrial NADH-dehydrogenase, the bc1 complex and the mitochondrial F1-ATP synthase as well as the chloroplast F1-ATP synthase, the cytochrome b6f complex, the two photosystems and the light harvesting complex. The resolution of the protein complexes in the first dimension was good enough to allow identification of all subunits of individual complexes in the second dimension under denaturing conditions. Thus, BN-PAGE offers an opportunity to analyze mitochondrial and chloroplast protein complexes from a single preparation from very small amounts of tissue. The implications of our findings, for studies on protein expression and turnover in different tissues and developmental stages, are discussed.

16.
Proteins may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or dynamic features of this kind are particularly interesting because they may have some very special biological functions intriguing to investigators in both basic research and drug discovery. For instance, among the 6408 human protein entries that have experimentally observed subcellular location annotations in the Swiss-Prot database (version 50.7, released 19-Sept-2006), 973 (approximately 15%) have multiple location sites. The number of total human protein entries (except those annotated with "fragment" or those with less than 50 amino acids) in the same database is 14,370, meaning a gap of (14,370-6408)=7962 entries for which no knowledge is available about their subcellular locations. Although one can use a computational approach to predict the desired information for the gap, so far all the existing methods for predicting human protein subcellular localization are limited to the case of a single location site. To overcome this barrier, a new ensemble classifier, named Hum-mPLoc, was developed that can also deal with the case of multiple location sites. Hum-mPLoc is freely accessible to the public as a web server at http://202.120.37.186/bioinf/hum-multi. Meanwhile, for the convenience of people working in the relevant areas, Hum-mPLoc has been used to identify all human protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited in a downloadable file prepared with Microsoft Excel and named "Tab_Hum-mPLoc.xls". This file is available at the same website and will be updated twice a year to include new entries of human proteins and reflect the continuous development of Hum-mPLoc.

17.
Many chloroplast proteins are synthesized in the cytoplasm as precursors which contain an amino terminal transit peptide. These precursors are subsequently imported into the chloroplast and targeted to one of several organellar locations. This import is mediated by the transit peptide, which is cleaved off during import. We have used the transit peptides of ferredoxin (chloroplast stroma) and plastocyanin (thylakoid lumen) to study chloroplast protein import and intra-organellar routing toward different compartments. Chimeric genes were constructed that encode precursor proteins in which the transit peptides are linked to yeast mitochondrial manganese superoxide dismutase. Chloroplast protein import and localization experiments show that both chimeric proteins are imported into the chloroplast stroma and processed. The plastocyanin transit sequence did not direct superoxide dismutase to the thylakoids; this protein was found in the stroma as an intermediate that still contains part of the plastocyanin transit peptide. The organelle specificity of these chimeric precursors reflected the transit peptide parts of the molecules, because neither the ferredoxin and plastocyanin precursors nor the chimeric proteins were imported into isolated yeast mitochondria.

18.
MOTIVATION: There is a scarcity of efficient computational methods for predicting protein subcellular localization in eukaryotes. Currently available methods have several limitations that make them inadequate for genome-scale predictions. Here, we present a new prediction method, pTARGET, that can predict proteins targeted to nine different subcellular locations in eukaryotic animal species. RESULTS: The nine subcellular locations predicted by pTARGET include cytoplasm, endoplasmic reticulum, extracellular/secretory, golgi, lysosomes, mitochondria, nucleus, plasma membrane and peroxisomes. Predictions are based on the location-specific protein functional domains and the amino acid compositional differences across different subcellular locations. Overall, this method can predict 68-87% of the true positives at accuracy rates of 96-99%. Comparison of the prediction performance against PSORT showed that pTARGET prediction rates are higher by 11-60% in 6 of the 8 locations tested. Moreover, the pTARGET method is robust enough for genome-scale prediction of protein subcellular localizations since it does not rely on the presence of signal or target peptides. AVAILABILITY: A public web server based on the pTARGET method is accessible at the URL http://bioinformatics.albany.edu/~ptarget. Datasets used for developing pTARGET can be downloaded from this web server. Source code will be available on request from the corresponding author.

19.
Chen YL  Li QZ  Zhang LQ 《Amino acids》2012,42(4):1309-1316
Due to the complexity of the Plasmodium falciparum (PF) genome, predicting mitochondrial proteins of PF is more difficult than for other species. In this study, using the n-peptide composition of the reduced amino acid alphabet (RAAA) obtained from the structural alphabet named Protein Blocks as the feature parameter, the increment of diversity (ID) is developed for the first time to predict mitochondrial proteins. By choosing the 1-peptide compositions on the N-terminal regions with 20 residues as the only input vector, the prediction performance achieves 86.86% accuracy with a Matthews correlation coefficient (MCC) of 0.69 in the jackknife test. Moreover, by combining the hydropathy distribution along the protein sequence with several reduced amino acid alphabets, we achieved a maximum MCC of 0.82 with 92% accuracy in the jackknife test using the developed ID model. When evaluated on an independent dataset, our method performs better than existing methods. The results indicate that the ID is a simple and efficient prediction method for mitochondrial proteins of the malaria parasite.
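The abstract does not define the increment of diversity, so the sketch below uses the formulation commonly given in the literature: for a count vector with total N, the diversity is D = N log N − Σ n_i log n_i, and ID(X, S) = D(X + S) − D(X) − D(S), with a query assigned to the class whose profile yields the smallest ID. The base-2 logarithm, the three-letter toy alphabet, and the class profiles are assumptions for illustration only.

```python
from math import log2


def diversity(counts):
    """D = N*log2(N) - sum(n_i*log2(n_i)), with 0*log(0) taken as 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * log2(n) - sum(c * log2(c) for c in counts if c > 0)


def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); smaller means more similar."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


def classify(x, class_profiles):
    """Assign x to the class whose count profile gives the smallest ID."""
    return min(
        class_profiles,
        key=lambda c: increment_of_diversity(x, class_profiles[c]),
    )


# Hypothetical count profiles over a 3-letter reduced alphabet.
profiles = {"mitochondrial": [30, 10, 1], "other": [1, 10, 30]}
predicted = classify([3, 1, 0], profiles)
```

An ID of zero arises when the two count vectors have identical proportions, and it grows as the compositions diverge.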

20.
The cleavable pre-sequences of imported chloroplast and mitochondrial proteins have several features in common. This structural similarity prompted us to test whether a chloroplast pre-sequence ('transit peptide') can also be decoded by the mitochondrial import machinery. In the green alga, Chlamydomonas reinhardtii, the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) (a chloroplast protein) is nuclear-encoded and synthesized in the cytosol with a transient pre-sequence of 45 residues. The 31 amino-terminal residues of this chloroplast pre-sequence were fused to mouse dihydrofolate reductase (a cytosolic protein) and to yeast cytochrome oxidase subunit IV (an imported mitochondrial protein) from which the authentic pre-sequence had been removed. The chloroplast pre-sequence transported both attached proteins into the yeast mitochondrial matrix or inner membrane, although it functioned less efficiently than an authentic mitochondrial pre-sequence. We conclude that mitochondrial and chloroplast pre-sequences perform their function by a similar mechanism.
