首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
One of the main challenges faced by biological applications is to predict protein subcellular localization in an automatic fashion accurately. To achieve this in these applications, a wide variety of machine learning methods have been proposed in recent years. Most of them focus on finding the optimal classification scheme and less of them take the simplifying the complexity of biological system into account. Traditionally such bio-data are analyzed by first performing a feature selection before classification. Motivated by CS (Compressive Sensing), we propose a method which performs locality preserving projection with a sparseness criterion such that the feature selection and dimension reduction are merged into one analysis. The proposed sparse method decreases the complexity of biological system, while increases protein subcellular localization accuracy. Experimental results are quite encouraging, indicating that the aforementioned sparse method is quite promising in dealing with complicated biological problems, such as predicting the subcellular localization of Gram-negative bacterial proteins.  相似文献   

2.
Shi JY  Zhang SW  Pan Q  Zhou GP 《Amino acids》2008,35(2):321-327
In the Post Genome Age, there is an urgent need to develop the reliable and effective computational methods to predict the subcellular localization for the explosion of newly found proteins. Here, a novel method of pseudo amino acid (PseAA) composition, the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, amino acid composition of each segment is calculated in series. After that, each protein sequence can be represented by a feature vector. Finally, the feature vectors of all sequences thus obtained are further input into the multi-class support vector machines to predict the subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.  相似文献   

3.
The subcellular location of a protein is highly related to its function. Identifying the location of a given protein is an essential step for investigating its related problems. Traditional experimental methods can produce solid determination. However, their limitations, such as high cost and low efficiency, are evident. Computational methods provide an alternative means to address these problems. Most previous methods constantly extract features from protein sequences or structures for building prediction models. In this study, we use two types of features and combine them to construct the model. The first feature type is extracted from a protein–protein interaction network to abstract the relationship between the encoded protein and other proteins. The second type is obtained from gene ontology and biological pathways to indicate the existing functions of the encoded protein. These features are analyzed using some feature selection methods. The final optimum features are adopted to build the model with recurrent neural network as the classification algorithm. Such model yields good performance with Matthews correlation coefficient of 0.844. A decision tree is used as a rule learning classifier to extract decision rules. Although the performance of decision rules is poor, they are valuable in revealing the molecular mechanism of proteins with different subcellular locations. The final analysis confirms the reliability of the extracted rules. The source code of the propose method is freely available at https://github.com/xypan1232/rnnloc  相似文献   

4.
Lee K  Kim DW  Na D  Lee KH  Lee D 《Nucleic acids research》2006,34(17):4655-4666
Subcellular localization is one of the key functional characteristics of proteins. An automatic and efficient prediction method for the protein subcellular localization is highly required owing to the need for large-scale genome analysis. From a machine learning point of view, a dataset of protein localization has several characteristics: the dataset has too many classes (there are more than 10 localizations in a cell), it is a multi-label dataset (a protein may occur in several different subcellular locations), and it is too imbalanced (the number of proteins in each localization is remarkably different). Even though many previous works have been done for the prediction of protein subcellular localization, none of them tackles effectively these characteristics at the same time. Thus, a new computational method for protein localization is eventually needed for more reliable outcomes. To address the issue, we present a protein localization predictor based on D-SVDD (PLPD) for the prediction of protein localization, which can find the likelihood of a specific localization of a protein more easily and more correctly. Moreover, we introduce three measurements for the more precise evaluation of a protein localization predictor. As the results of various datasets which are made from the experiments of Huh et al. (2003), the proposed PLPD method represents a different approach that might play a complimentary role to the existing methods, such as Nearest Neighbor method and discriminate covariant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted 138 proteins whose subcellular localizations could not be clearly observed by the experiments of Huh et al. (2003).  相似文献   

5.

Background  

Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, both of which can not estimate the individual discriminative abilities of the three aspects of gene ontology.  相似文献   

6.
The information of protein subcellular localization is vitally important for in-depth understanding the intricate pathways that regulate biological processes at the cellular level. With the rapidly increasing number of newly found protein sequence in the Post-Genomic Age, many automated methods have been developed attempting to help annotate their subcellular locations in a timely manner. However, very few of them were developed using the protein-protein interaction (PPI) network information. In this paper, we have introduced a new concept called "tethering potential" by which the PPI information can be effectively fused into the formulation for protein samples. Based on such a network frame, a new predictor called Yeast-PLoc has been developed for identifying budding yeast proteins among their 19 subcellular location sites. Meanwhile, a purely sequence-based approach, called the "hybrid-property" method, is integrated into Yeast-PLoc as a fall-back to deal with those proteins without sufficient PPI information. The overall success rate by the jackknife test on the 4,683 yeast proteins in the training dataset was 70.25%. Furthermore, it was shown that the success rate by Yeast- PLoc on an independent dataset was remarkably higher than those by some other existing predictors, indicating that the current approach by incorporating the PPI information is quite promising. As a user-friendly web-server, Yeast-PLoc is freely accessible at http://yeastloc.biosino.org/.  相似文献   

7.
Apoptosis proteins have a central role in the development and the homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. The function of an apoptosis protein is closely related to its subcellular location. It is crucial to develop powerful tools to predict apoptosis protein locations for rapidly increasing gap between the number of known structural proteins and the number of known sequences in protein databank. In this study, amino acids pair compositions with different spaces are used to construct feature sets for representing sample of protein feature selection approach based on binary particle swarm optimization, which is applied to extract effective feature. Ensemble classifier is used as prediction engine, of which the basic classifier is the fuzzy K-nearest neighbor. Each basic classifier is trained with different feature sets. Two datasets often used in prior works are selected to validate the performance of proposed approach. The results obtained by jackknife test are quite encouraging, indicating that the proposed method might become a potentially useful tool for subcellular location of apoptosis protein, or at least can play a complimentary role to the existing methods in the relevant areas. The supplement information and software written in Matlab are available by contacting the corresponding author.  相似文献   

8.
Shi JY  Zhang SW  Pan Q  Cheng YM  Xie J 《Amino acids》2007,33(1):69-74
As more and more genomes have been discovered in recent years, there is an urgent need to develop a reliable method to predict the subcellular localization for the explosion of newly found proteins. However, many well-known prediction methods based on amino acid composition have problems utilizing the sequence-order information. Here, based on the concept of Chou's pseudo amino acid composition (PseAA), a new feature extraction method, the multi-scale energy (MSE) approach, is introduced to incorporate the sequence-order information. First, a protein sequence was mapped to a digital signal using the amino acid index. Then, by wavelet transform, the mapped signal was broken down into several scales in which the energy factors were calculated and further formed into an MSE feature vector. Following this, combining this MSE feature vector with amino acid composition (AA), we constructed a series of MSEPseAA feature vectors to represent the protein subcellular localization sequences. Finally, according to a new kind of normalization approach, the MSEPseAA feature vectors were normalized to form the improved MSEPseAA vectors, named as IEPseAA. Using the technique of IEPseAA, C-support vector machine (C-SVM) and three multi-class SVMs strategies, quite promising results were obtained, indicating that MSE is quite effective in reflecting the sequence-order effects and might become a useful tool for predicting the other attributes of proteins as well.  相似文献   

9.
Subcellular localization of a protein is important to understand proteins’ functions and interactions. There are many techniques based on computational methods to predict protein subcellular locations, but it has been shown that many prediction tasks have a training data shortage problem. This paper introduces a new method to mine proteins with non-experimental annotations, which are labeled by non-experimental evidences of protein databases to overcome the training data shortage problem. A novel active sample selection strategy is designed, taking advantage of active learning technology, to actively find useful samples from the entire data pool of candidate proteins with non-experimental annotations. This approach can adequately estimate the “value” of each sample, automatically select the most valuable samples and add them into the original training set, to help to retrain the classifiers. Numerical experiments with for four popular multi-label classifiers on three benchmark datasets show that the proposed method can effectively select the valuable samples to supplement the original training set and significantly improve the performances of predicting classifiers.  相似文献   

10.
Predicting protein localization in budding yeast   总被引:4,自引:0,他引:4  
MOTIVATION: Most of the existing methods in predicting protein subcellular location were used to deal with the cases limited within the scope from two to five localizations, and only a few of them can be effectively extended to cover the cases of 12-14 localizations. This is because the more the locations involved are, the poorer the success rate would be. Besides, some proteins may occur in several different subcellular locations, i.e. bear the feature of 'multiplex locations'. So far there is no method that can be used to effectively treat the difficult multiplex location problem. The present study was initiated in an attempt to address (1) how to efficiently identify the localization of a query protein among many possible subcellular locations, and (2) how to deal with the case of multiplex locations. RESULTS: By hybridizing gene ontology, functional domain and pseudo amino acid composition approaches, a new method has been developed that can be used to predict subcellular localization of proteins with multiplex location feature. A global analysis of the proteins in budding yeast classified into 22 locations was performed by jack-knife cross-validation with the new method. The overall success identification rate thus obtained is 70%. In contrast to this, the corresponding rates obtained by some other existing methods were only 13-14%, indicating that the new method is very powerful and promising. Furthermore, predictions were made for the four proteins whose localizations could not be determined by experiments, as well as for the 236 proteins whose localizations in budding yeast were ambiguous according to experimental observations. However, according to our predicted results, many of these 'ambiguous proteins' were found to have the same score and ranking for several different subcellular locations, implying that they may simultaneously exist, or move around, in these locations. This finding is intriguing because it reflects the dynamic feature of these proteins in a cell that may be associated with some special biological functions.  相似文献   

11.
12.
研究真核蛋白质的亚细胞位点是了解真核蛋白质功能,深入研究蛋白质相关信号通路内在机制的基础。同时,可以为了解 疾病发病机制及为新药研发提供帮助。因此,研究真核蛋白质的亚细胞位点意义十分重大。随着基因组测序的完成,真核蛋白质 序列信息增长迅速,为真核蛋白质亚细胞位点的研究提出了更多的挑战。传统的实验法难以满足蛋白质信息量迅速增长的需求。 而采用生物信息学手段处理大规模数据的计算预测方法,可在较短时间内获得大量真核蛋白质亚细胞位点信息,弥补了实验法 的不足。因此,运用计算预测法预测真核蛋白质的亚细胞位点成为生物信息学领域的研究热点之一。本文主要从提取真核蛋白质 的特征信息、计算预测方法及预测效果的评价三个方面,介绍近年来真核蛋白质亚细胞位点预测的研究进展。  相似文献   

13.
Information of protein subcellular location plays an important role in molecular cell biology. Prediction of the subcellular location of proteins will help to understand their functions and interactions. In this paper, a different mode of pseudo amino acid composition was proposed to represent protein samples for predicting their subcellular localization via the following procedures: based on the optimal splice site of each protein sequence, we divided a sequence into sorting signal part and mature protein part, and extracted sequence features from each part separately. Then, the combined features were fed into the SVM classifier to perform the prediction. By the jackknife test on a benchmark dataset in which none of proteins included has more than 90% pairwise sequence identity to any other, the overall accuracies achieved by the method are 94.5% and 90.3% for prokaryotic and eukaryotic proteins, respectively. The results indicate that the prediction quality by our method is quite satisfactory. It is anticipated that the current method may serve as an alternative approach to the existing prediction methods.  相似文献   

14.
Zhang SW  Zhang YL  Yang HF  Zhao CH  Pan Q 《Amino acids》2008,34(4):565-572
The rapidly increasing number of sequence entering into the genome databank has called for the need for developing automated methods to analyze them. Information on the subcellular localization of new found protein sequences is important for helping to reveal their functions in time and conducting the study of system biology at the cellular level. Based on the concept of Chou’s pseudo-amino acid composition, a series of useful information and techniques, such as residue conservation scores, von Neumann entropies, multi-scale energy, and weighted auto-correlation function were utilized to generate the pseudo-amino acid components for representing the protein samples. Based on such an infrastructure, a hybridization predictor was developed for identifying uncharacterized proteins among the following 12 subcellular localizations: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracell, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome, plasma membrane, and vacuole. Compared with the results reported by the previous investigators, higher success rates were obtained, suggesting that the current approach is quite promising, and may become a useful high-throughput tool in the relevant areas.  相似文献   

15.
Protein subcellular localization is a major determinant of protein function. However, this important protein feature is often described in terms of discrete and qualitative categories of subcellular compartments, and therefore it has limited applications in quantitative protein function analyses. Here, we present Protein Localization Analysis and Search Tools (PLAST), an automated analysis framework for constructing and comparing quantitative signatures of protein subcellular localization patterns based on microscopy images. PLAST produces human-interpretable protein localization maps that quantitatively describe the similarities in the localization patterns of proteins and major subcellular compartments, without requiring manual assignment or supervised learning of these compartments. Using the budding yeast Saccharomyces cerevisiae as a model system, we show that PLAST is more accurate than existing, qualitative protein localization annotations in identifying known co-localized proteins. Furthermore, we demonstrate that PLAST can reveal protein localization-function relationships that are not obvious from these annotations. First, we identified proteins that have similar localization patterns and participate in closely-related biological processes, but do not necessarily form stable complexes with each other or localize at the same organelles. Second, we found an association between spatial and functional divergences of proteins during evolution. Surprisingly, as proteins with common ancestors evolve, they tend to develop more diverged subcellular localization patterns, but still occupy similar numbers of compartments. This suggests that divergence of protein localization might be more frequently due to the development of more specific localization patterns over ancestral compartments than the occupation of new compartments. PLAST enables systematic and quantitative analyses of protein localization-function relationships, and will be useful to elucidate protein functions and how these functions were acquired in cells from different organisms or species. A public web interface of PLAST is available at http://plast.bii.a-star.edu.sg.  相似文献   

16.
Xiao X  Shao S  Ding Y  Huang Z  Huang Y  Chou KC 《Amino acids》2005,28(1):57-61
Summary. Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, it is vitally important to develop an automated method as a high-throughput tool to timely identify their subcellular location. Based on the concept of the pseudo amino acid composition by which a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), the complexity measure approach is introduced. The advantage by incorporating the complexity measure factor as one of the pseudo amino acid components for a protein is that it can more effectively reflect its overall sequence-order feature than the conventional correlation factors. With such a formulation frame to represent the samples of protein sequences, the covariant-discriminant predictor (Chou, K. C. and Elrod, D. W., Protein Engineering, 1999, 12: 107–118) was adopted to conduct prediction. High success rates were obtained by both the jackknife cross-validation test and independent dataset test, suggesting that introduction of the concept of the complexity measure into prediction of protein subcellular location is quite promising, and might also hold a great potential as a useful vehicle for the other areas of molecular biology.  相似文献   

17.
Proteins are played key roles in different functionalities in our daily life. All functional roles of a protein are a bit enhanced in interaction compared to individuals. Identification of essential proteins of an organism is a time consume and costly task during observation in the wet lab. The results of observation in wet lab always ensure high reliability and accuracy in the biological ground. Essential protein prediction using computational approaches is an alternative choice in research. It proves its significance rapidly in day-to-day life as well as reduces the experimental cost of wet lab effectively. Existing computational methods were implemented using Protein interaction networks (PPIN), Sequence, Gene Expression Dataset (GED), Gene Ontology (GO), Orthologous groups, and Subcellular localized datasets. Machine learning has diverse categories of features that enable to model and predict essential macromolecules of understudied organisms. A novel methodology MEM-FET (membership feature) is predicted based on features, that is, edge clustering coefficient, Average clustering coefficient, subcellular localization, and Gene Ontology within a compartment of common neighbors. The accuracy (ACC) values of the predicted true positive (TP) essential proteins are 0.79, 0.74, 0.78, and 0.71 for YHQ, YMIPS, YDIP, and YMBD datasets. An enriched set of essential proteins are also predicted using the MEM-FET algorithm. Ensemble ML also validated the proposed model with an accuracy of 60%. It has been predicted that MEM-FET algorithms outperform other existing algorithms with an ACC value of 80% for the yeast dataset.  相似文献   

18.
19.
20.
Changes in membrane protein localization are critical to establishing cell polarity and regulating cell signaling. Fluorescence microscopy of labeled proteins allows visualization of these changes, but quantitative analysis is needed to study this aspect of cell signaling in full mechanistic detail. We have developed a novel approach for quantitative assessment of membrane protein redistribution based on four-dimensional video microscopy of fluorescently labeled proteins. Our analytic system provides robust automated methods for cell surface reconstruction, cell shape tracking, cell-surface distance measurement, and cluster formation analysis. These methods permit statistical analyses and testing of mechanistic hypotheses regarding cell signaling. We have used this approach to measure antigen-dependent clustering of signaling molecules in CD4+ T lymphocytes, obtaining clustering velocities consistent with single-particle tracking data. Our system captures quantitative differences in clustering between signaling proteins with distinct biological functions. Our methods can be generalized to a range of cell-signaling phenomena and enable novel applications not feasible with single-particle studies, such as analysis of subcellular protein localization in live organ culture.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号