Similar Articles
1.
Apoptosis proteins are very important for understanding the mechanism of programmed cell death, and the subcellular localization of an apoptosis protein can provide valuable information about its molecular function. Predicting the localization of apoptosis proteins is a challenging task. In our previous work, we proposed an increment of diversity (ID) method that uses protein sequence information for this prediction task. In this work, based on the concept of Chou's pseudo-amino acid composition [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. 43, 246-255 (Erratum: Chou, K.C., 2001, vol. 44, 60); Chou, K.C., 2005. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19], we introduce a different pseudo-amino acid composition based on hydropathy distribution information. We then propose a novel ID_SVM algorithm that combines ID with a support vector machine (SVM). Applied to three data sets (317, 225, and 98 apoptosis proteins), the method achieves higher predictive success rates than previous algorithms in jackknife tests.
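The increment of diversity (ID) that underlies this and several later entries can be illustrated with a short Python sketch. It assumes the usual definitions: a measure of diversity D(X) = N ln N − Σ n_i ln n_i over a count vector with total N, and ID(X, Y) = D(X + Y) − D(X) − D(Y). The function names and toy counts are hypothetical, and the hydropathy-based pseudo-amino acid composition from the abstract is not reproduced. In an ID_SVM-style pipeline, as we read it, the vector of ID values between a query and each class's standard source would become the SVM input.

```python
import math
from typing import Dict, List, Sequence


def diversity(counts: Sequence[float]) -> float:
    """Measure of diversity D(X) = N ln N - sum_i n_i ln n_i."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    # Zero counts are skipped because n ln n -> 0 as n -> 0.
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x: Sequence[float], y: Sequence[float]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); it is 0 when X and Y have identical proportions."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def id_features(query: Sequence[float], sources: Dict[str, Sequence[float]]) -> List[float]:
    """One ID value per class (classes in sorted order), usable as an SVM input vector."""
    return [increment_of_diversity(query, sources[label]) for label in sorted(sources)]


if __name__ == "__main__":
    # Hypothetical 3-symbol count vectors standing in for composition counts.
    sources = {"cytoplasm": [30, 5, 5], "membrane": [5, 5, 30]}
    print(id_features([9, 1, 2], sources))  # the first (cytoplasm) value is the smaller one
```

Because D(X) equals N times the Shannon entropy of X's proportions, ID is a weighted Jensen-Shannon-type divergence: it is never negative and grows as the two compositions diverge.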

2.
Apoptosis, or programmed cell death, plays an important role in the development of an organism. Information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the concept that the positional distribution of amino acids is closely related to protein structure and function, we adopt the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. To capture local features, each protein sequence is divided into p parts of equal length. We then use this novel representation of protein sequences with a support vector machine to predict subcellular location. Jackknife tests show that the overall prediction accuracy is significantly improved.
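The local-feature step, dividing each sequence into p parts of equal length, can be sketched as below. This minimal sketch computes the amino acid composition of each part as a simple stand-in; it does not reproduce the paper's distance-frequency representation, and the rule for leftover residues (the first parts receive one extra residue each) is an assumption.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_equal(seq: str, p: int) -> List[str]:
    """Split seq into p contiguous parts whose lengths differ by at most one."""
    if not 1 <= p <= len(seq):
        raise ValueError("p must be between 1 and len(seq)")
    base, extra = divmod(len(seq), p)
    parts, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)  # the first `extra` parts get one more residue
        parts.append(seq[start:start + size])
        start += size
    return parts


def composition(seq: str) -> List[float]:
    """Relative frequency of each of the 20 standard amino acids; other letters are ignored."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def local_features(seq: str, p: int) -> List[float]:
    """Concatenate the per-part compositions into a 20 * p feature vector."""
    features: List[float] = []
    for part in split_equal(seq.upper(), p):
        features.extend(composition(part))
    return features
```

Any per-part descriptor, such as a distance-frequency vector, could replace `composition` without changing the splitting logic.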

3.
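Prediction of protein subcellular localization using increment of diversity combined with support vector machine   Total citations: 3 (self-citations: 0; citations by others: 3)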
赵禹  赵巨东  姚龙 (Zhao Yu, Zhao Judong, Yao Long) 《生物信息学》(Chinese Journal of Bioinformatics) 2010,8(3):237-239,244
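Functional annotation of unknown proteins is a major goal of proteomics, and one key annotation is the prediction of protein subcellular localization. In this paper, the increment of diversity combined with support vector machine (ID_SVM) method is applied to predict five subcellular locations of Gram-positive bacterial proteins. In an independent test, the overall prediction success rate is 89.66%. The results show that the ID_SVM algorithm substantially improves the prediction success rate.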

4.
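王伟  郑小琪  窦永超  刘太岗  赵娟  王军 (Wang Wei, Zheng Xiaoqi, Dou Yongchao, Liu Taigang, Zhao Juan, Wang Jun) 《生物信息学》(Chinese Journal of Bioinformatics) 2011,9(2):171-175,180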
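Information on protein subcellular location helps us understand protein function and protein interactions, and it can also support the development of new drugs. Current subcellular location prediction methods are mainly based on N-terminal sorting signals or on amino acid composition features, but studies show that methods based solely on either lose sequence-order information. To overcome this shortcoming, this paper proposes a protein subcellular location prediction method based on optimal segmentation sites. First, each protein sequence is divided into three parts: N-terminal, middle, and C-terminal. Amino acid composition, dipeptide composition, and physicochemical properties are then extracted from each subsequence and from the whole sequence, and finally these features are fused to represent the whole sequence. In jackknife tests, the overall accuracies of this method on the NNPSL data sets are 87.8% and 92.1%, respectively.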
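The segmentation-and-fusion step can be sketched in Python as follows. This is a minimal sketch assuming fixed, hypothetical cut lengths `n_len` and `c_len` (the paper searches for optimal segmentation sites, which are not given here) and omitting the physicochemical properties; each of the four sequences contributes a 20-D amino acid composition and a 400-D dipeptide composition.

```python
from itertools import product
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> List[float]:
    """20-D amino acid composition."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def dpc(seq: str) -> List[float]:
    """400-D dipeptide composition over overlapping adjacent pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def segment(seq: str, n_len: int, c_len: int) -> Tuple[str, str, str]:
    """Cut seq into N-terminal, middle, and C-terminal parts at fixed lengths."""
    if n_len < 1 or c_len < 1 or n_len + c_len >= len(seq):
        raise ValueError("segment lengths must leave a non-empty middle part")
    return seq[:n_len], seq[n_len:len(seq) - c_len], seq[len(seq) - c_len:]


def fused_features(seq: str, n_len: int = 25, c_len: int = 25) -> List[float]:
    """AAC + DPC of the three segments and of the whole sequence (4 * 420 values)."""
    seq = seq.upper()
    features: List[float] = []
    for part in (*segment(seq, n_len, c_len), seq):
        features.extend(aac(part))
        features.extend(dpc(part))
    return features
```

Searching for the best cut points would amount to evaluating `fused_features` over a grid of `(n_len, c_len)` values under cross-validation.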

5.
Identification of protein subcellular localization   Total citations: 3 (self-citations: 2; citations by others: 3)
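Proteins were divided into 12 classes according to their subcellular localization. Based on the mathematical theory of the measure of diversity, the counts of the 400 amino acid dipeptides in a protein were used as the diversity source, and subcellular localization was predicted by calculating the increment of diversity. Both self-consistency and jackknife tests gave high prediction success rates: 84.5% in the self-consistency test and 81.1% in the jackknife test.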
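Using the 400 dipeptide counts as the diversity source and assigning the class with the smallest increment of diversity can be sketched as below. This is a minimal sketch assuming that a class source is the sum of the dipeptide counts of its training sequences; the class names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_counts(seq: str) -> List[int]:
    """Counts of the 400 dipeptides (overlapping adjacent pairs)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [pairs[d] for d in DIPEPTIDES]


def diversity(counts: List[int]) -> float:
    """D(X) = N ln N - sum_i n_i ln n_i."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x: List[int], y: List[int]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y)."""
    return diversity([a + b for a, b in zip(x, y)]) - diversity(x) - diversity(y)


def build_sources(training: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Diversity source of a class = dipeptide counts summed over its training sequences."""
    sources = {}
    for label, seqs in training.items():
        total = [0] * len(DIPEPTIDES)
        for seq in seqs:
            total = [t + n for t, n in zip(total, dipeptide_counts(seq))]
        sources[label] = total
    return sources


def predict(seq: str, sources: Dict[str, List[int]]) -> str:
    """Assign the location whose source gives the smallest increment of diversity."""
    x = dipeptide_counts(seq)
    return min(sources, key=lambda label: increment_of_diversity(x, sources[label]))
```

In a jackknife test, the query's own counts would be subtracted from its class source before calling `predict`, so that no sequence is scored against a source that contains it.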

6.
Several mutant hTNFα genes were constructed by deletion and stepwise reconstitution of the regions coding for C-terminal sequences. When expressed in Escherichia coli, the mutant hTNFα proteins behaved differently from native hTNFα: depending on the strains and conditions used for expression, they were either sensitive to proteolytic degradation or formed insoluble aggregates. By contrast, native hTNFα was always present in a soluble form and tended to associate with the cytoplasmic membrane; both cell fractionation and immunoelectron microscopy showed that it was even transported to the periplasmic space in E. coli. The different behaviour of the mutant hTNFα proteins probably results from disturbed protein folding.

7.
Revealing the subcellular location of newly discovered protein sequences can bring insight into their function and guide research at the cellular level. The rapidly increasing number of sequences entering genome databanks calls for automated analysis methods. Currently, most existing methods for predicting protein subcellular locations cover only one species or a very limited number of species. It is therefore necessary to develop reliable and effective computational approaches that further improve the performance of protein subcellular prediction while covering more species. The current study reports a novel predictor, MSLoc-DT, that predicts the subcellular locations of proteins from human, animal, plant, bacteria, virus, fungi, and archaea. It introduces a novel feature extraction approach termed Amino Acid Index Distribution (AAID) and then fuses gene ontology information, sequential evolutionary information, and sequence statistical information through four different modes of pseudo amino acid composition (PseAAC) with a decision template rule. In the jackknife test on seven stringent benchmark datasets, MSLoc-DT achieves overall accuracies of 86.5, 98.3, 90.3, 98.5, 95.9, 98.1, and 99.3% for human, animal, plant, bacteria, virus, fungi, and archaea, respectively. Compared with other predictors (e.g., Gpos-PLoc, Gneg-PLoc, Virus-PLoc, Plant-PLoc, Plant-mPLoc, ProLoc-Go, Hum-PLoc, GOASVM) on the gram-positive, gram-negative, virus, plant, eukaryotic, and human datasets, the new MSLoc-DT predictor is much more effective and robust. Although MSLoc-DT is designed to predict a single location per protein, the method can be extended to proteins with multiple locations by substituting multilabel machine learning approaches, such as support vector machines and deep learning, for the K-nearest neighbor (KNN) method.
As a user-friendly web server, MSLoc-DT is freely accessible at http://bioinfo.ibp.ac.cn/MSLOC_DT/index.html.
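A decision template rule, in the generic form described by Kuncheva and colleagues, fuses several base classifiers by comparing a sample's decision profile with one averaged template per class. The sketch below illustrates that generic rule under stated assumptions; it is not MSLoc-DT's code, and the number of base classifiers, their support values, and the class labels are hypothetical. A decision profile is a matrix with one row per base classifier and one column of class support per class.

```python
from typing import Dict, List, Sequence

Profile = List[List[float]]  # rows: base classifiers, columns: per-class support


def build_templates(profiles: Sequence[Profile], labels: Sequence[str]) -> Dict[str, Profile]:
    """Decision template of a class = element-wise mean of its training decision profiles."""
    sums: Dict[str, Profile] = {}
    counts: Dict[str, int] = {}
    for dp, y in zip(profiles, labels):
        if y not in sums:
            sums[y] = [[0.0] * len(row) for row in dp]
            counts[y] = 0
        for i, row in enumerate(dp):
            for j, value in enumerate(row):
                sums[y][i][j] += value
        counts[y] += 1
    return {y: [[v / counts[y] for v in row] for row in s] for y, s in sums.items()}


def classify(dp: Profile, templates: Dict[str, Profile]) -> str:
    """Pick the class whose template has the smallest squared Euclidean distance to dp."""
    def distance(template: Profile) -> float:
        return sum((a - b) ** 2 for r1, r2 in zip(dp, template) for a, b in zip(r1, r2))
    return min(templates, key=lambda y: distance(templates[y]))
```

Unlike majority voting, this rule can learn that one base classifier is systematically over- or under-confident for a given class, because the template records each classifier's typical output pattern.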

8.
Apoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. The function of an apoptosis protein is closely related to its subcellular location. Given the rapidly increasing gap between the number of proteins with known structures and the number of known sequences in protein databanks, it is crucial to develop powerful tools for predicting apoptosis protein locations. In this study, amino acid pair compositions with different spacings are used to construct feature sets representing protein samples, and a feature selection approach based on binary particle swarm optimization is applied to extract effective features. An ensemble classifier whose base classifier is the fuzzy K-nearest neighbor serves as the prediction engine, and each base classifier is trained on a different feature set. Two datasets often used in prior works were selected to validate the performance of the proposed approach. The jackknife test results are quite encouraging, indicating that the proposed method might become a useful tool for predicting the subcellular location of apoptosis proteins, or at least play a complementary role to existing methods in this area. Supplementary information and software written in Matlab are available from the corresponding author.
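Amino acid pair compositions with different spacings can be sketched as follows: for a gap k, count residue pairs separated by k intervening residues (k = 0 gives ordinary dipeptides) and normalize to frequencies. This minimal sketch assumes a hypothetical maximum gap `max_k`; the particle swarm feature selection and the fuzzy KNN ensemble are not reproduced.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def spaced_pair_composition(seq: str, k: int) -> List[float]:
    """400-D frequencies of pairs (seq[i], seq[i + k + 1]) with k residues in between."""
    counts = [0] * len(PAIRS)
    for i in range(len(seq) - k - 1):
        idx = INDEX.get(seq[i] + seq[i + k + 1])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def multi_spacing_features(seq: str, max_k: int = 3) -> List[float]:
    """One 400-D block per spacing k = 0..max_k; each block could train its own base classifier."""
    features: List[float] = []
    for k in range(max_k + 1):
        features.extend(spaced_pair_composition(seq.upper(), k))
    return features
```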

9.
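Based on the view that the subcellular location of an apoptosis protein is mainly determined by its amino acid sequence, the n-peptide composition of local amino acid sequences and the hydrophobicity/hydrophilicity distribution of the sequence were used with the increment of diversity combined with support vector machine (ID_SVM) algorithm to predict the subcellular locations of six classes of apoptosis proteins. Under re-substitution and jackknife tests, the overall success rates of ID_SVM reached 94.6% and 84.2%, respectively; under 5-fold and 10-fold cross-validation, the overall success rates also exceeded 83%. A comparison of the predictive power of ID and ID_SVM shows that combining the increment of diversity with a support vector machine improves the success rate, indicating that ID_SVM is a very effective method for predicting the subcellular location of apoptosis proteins.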
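The k-fold and jackknife tests mentioned here and throughout this list differ only in the number of folds: the jackknife is k-fold cross-validation with k equal to the number of samples. A minimal sketch follows; the `fit_predict` callback and the one-nearest-neighbour toy classifier are hypothetical stand-ins for a real predictor such as ID_SVM.

```python
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous (train, test) folds; k == n is the jackknife."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def cross_validated_accuracy(X: Sequence, y: Sequence, k: int,
                             fit_predict: Callable[[list, list, list], list]) -> float:
    """Overall success rate: fraction of samples predicted correctly when held out."""
    correct = 0
    for train, test in kfold_indices(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(y)


def nearest_neighbor(train_X: list, train_y: list, test_X: list) -> list:
    """Toy 1-NN classifier on scalar features."""
    return [train_y[min(range(len(train_X)), key=lambda j: abs(train_X[j] - x))] for x in test_X]
```

Contiguous folds assume the samples were shuffled beforehand; for 5-fold or 10-fold tests on data sorted by class, a shuffled or stratified split is needed, whereas the jackknife result does not depend on sample order.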

10.
Jia C  Liu T  Chang AK  Zhai Y 《Biochimie》2011,93(4):778-782
Mitochondrial proteins of Plasmodium falciparum are considered attractive targets for anti-malarial drugs, but the experimental identification of these proteins is difficult and time-consuming. Computational prediction of mitochondrial proteins offers an alternative approach. However, commonly used subcellular location prediction methods are unsuitable for P. falciparum mitochondrial proteins, and the organism- and organelle-specific methods were constructed on a rather small dataset. In this study, a novel dataset termed PfM233, comprising 108 mitochondrial and 125 non-mitochondrial proteins with sequence similarity below 25%, was established, and methods for predicting mitochondrial proteins of P. falciparum are described. Both bi-profile Bayes and split amino acid composition were applied to extract features from the N- and C-terminal sequences of these proteins, and these features were used to construct two SVM-based classifiers (PfMP-N25 and PfMP-30). On PfM233, PfMP-N25 and PfMP-30 achieved accuracies (MCCs) of 90.13% (0.80) and 90.99% (0.82), respectively. When tested on the 40 commonly used mitochondrial proteins in PfM175 and the 108 mitochondrial proteins in PfM233, both methods clearly outperformed existing general, organelle-specific, and organism- and organelle-specific methods.
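The accuracy and Matthews correlation coefficient (MCC) reported for binary mitochondrial vs. non-mitochondrial prediction follow from the four confusion-matrix counts. A minimal sketch, treating mitochondrial proteins as the positive class; any counts passed in are illustrative, not the paper's.

```python
import math
from typing import Tuple


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (accuracy, Matthews correlation coefficient) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the two classes are unbalanced.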

11.
Outer membrane proteins (OMPs) are β-barrel membrane proteins that perform many biological functions. Discriminating OMPs from non-OMPs is a very important task for understanding several biochemical processes. In this study, a method called IDQD, which combines increment of diversity with a modified Mahalanobis discriminant, is presented for discriminating among 208 OMPs, 206 transmembrane helical proteins (TMHPs), and 673 globular proteins (GPs), using Chou's pseudo amino acid composition as parameters. The overall jackknife cross-validation accuracy is 93.2% for the three-class problem (OMPs, TMHPs, and GPs) and 96.1% for the two-class problem (OMPs vs. non-OMPs). These results suggest that the method can effectively discriminate OMPs, TMHPs, and GPs. They also indicate that pseudo amino acid composition reflects the core features of membrane proteins better than the classical amino acid composition does.
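Chou's (type 1) pseudo amino acid composition, used as parameters here and in several other entries, appends λ sequence-order correlation factors to the 20 composition values. A minimal sketch under stated assumptions: property tables are supplied by the caller and standardized over the 20 amino acids, θ_j averages the mean squared property difference between residues j positions apart, and the weight `w = 0.05` and `lam = 3` are hypothetical defaults. The toy property in the test is not a real amino acid scale.

```python
import math
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop: Dict[str, float]) -> Dict[str, float]:
    """Convert a property table to zero mean and unit standard deviation over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / 20
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / 20)
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(seq: str, props: Sequence[Dict[str, float]], lam: int = 3, w: float = 0.05) -> List[float]:
    """Return the (20 + lam)-D pseudo amino acid composition; the components sum to 1."""
    seq = "".join(a for a in seq.upper() if a in AMINO_ACIDS)
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    tables = [standardize(p) for p in props]

    def correlation(a: str, b: str) -> float:
        # Mean squared difference of the standardized properties of two residues.
        return sum((h[b] - h[a]) ** 2 for h in tables) / len(tables)

    thetas = [sum(correlation(seq[i], seq[i + j]) for i in range(len(seq) - j)) / (len(seq) - j)
              for j in range(1, lam + 1)]
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

For a homopolymer every correlation factor is zero, so the vector reduces to the ordinary amino acid composition; sequence-order information enters only through the last `lam` components.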

12.
13.
Zhou GP  Doctor K 《Proteins》2003,50(1):44-48
Apoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. Many efforts in pharmaceutical research have been aimed at understanding their structure and function. Unfortunately, very few apoptosis protein structures have been determined thus far. In contrast, many apoptosis protein sequences are known, and many more are expected in the near future. Given this extreme imbalance, it would be worthwhile to develop a fast sequence-based method for identifying their subcellular location, so as to gain some insight into their biological function. In view of this, a study was initiated to identify the subcellular location of apoptosis proteins from their sequences by means of the covariant discriminant function, which was established based on the Mahalanobis distance and Chou's invariance theorem (Chou, Proteins 1995;21:319-344). The results were quite promising, indicating that the subcellular location of apoptosis proteins is predictable with considerable accuracy if a good training data set can be established. As the training data set is continuously improved by incorporating more new data, the current method may eventually become a useful tool in this area, because the function of an apoptosis protein is closely related to its subcellular location.
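A covariant discriminant of this kind can be sketched as F(X) = D²_M(X, X̄) + ln det(C) per class, with the query assigned to the class giving the smallest F. This minimal, dependency-free sketch assumes that form and uses hypothetical 2-D toy data. For 20-D composition vectors, one component would be dropped before fitting, because compositions sum to 1 and the full covariance matrix is singular; as we understand Chou's invariance theorem, the result does not depend on which component is dropped.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_cov(samples: Sequence[Vector]) -> Tuple[Vector, Matrix]:
    """Sample mean and unbiased sample covariance."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mu, cov


def inverse_and_logdet(m: Matrix) -> Tuple[Matrix, float]:
    """Invert m and compute ln|det m| by Gauss-Jordan elimination with partial pivoting."""
    d = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    logdet = 0.0
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        logdet += math.log(abs(p))
        a[col] = [v / p for v in a[col]]
        for r in range(d):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * u for v, u in zip(a[r], a[col])]
    return [row[d:] for row in a], logdet


def covariant_discriminant(x: Vector, mu: Vector, inv: Matrix, logdet: float) -> float:
    """Squared Mahalanobis distance from x to the class mean plus ln det(C)."""
    diff = [a - b for a, b in zip(x, mu)]
    d2 = sum(diff[i] * inv[i][j] * diff[j] for i in range(len(diff)) for j in range(len(diff)))
    return d2 + logdet


def classify(x: Vector, training: Dict[str, Sequence[Vector]]) -> str:
    """Assign x to the class with the smallest discriminant value."""
    scores = {}
    for label, samples in training.items():
        mu, cov = mean_and_cov(samples)
        inv, logdet = inverse_and_logdet(cov)
        scores[label] = covariant_discriminant(x, mu, inv, logdet)
    return min(scores, key=scores.get)
```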

14.
To investigate the basic mechanism by which earthworm activity enhances plant growth and heavy metal accumulation, a hydroponic experiment was carried out to examine the effects of earthworm mucus and of an amino acid solution mimicking earthworm mucus on the subcellular distribution and chemical forms of cadmium (Cd) in tomato seedlings. Earthworm mucus significantly increased the concentration of Cd stored in the soluble subcellular fraction and the concentrations of the inorganic and soluble chemical forms of Cd in tomato seedlings, which may explain the increased plant growth and Cd accumulation caused by earthworm mucus. The amino acid solution had a similar effect, but it was much weaker than that of earthworm mucus. These results indicate that earthworm mucus can increase the growth and Cd accumulation of tomato seedlings by changing the subcellular distribution and chemical forms of Cd in the plants.

15.
Tang SN  Sun JM  Xiong WW  Cong PS  Li TH 《Biochimie》2012,94(3):847-853
Mycobacterium, the most common disease-causing genus, infects billions of people and is notoriously difficult to treat. Understanding the subcellular localization of mycobacterial proteins can provide essential clues to protein function and drug discovery. In this article, we present a novel approach that focuses on local sequence information to identify localization motifs, which are generated by a merging algorithm and selected based on a binomial distribution model. These localization motifs are employed as features for identifying the subcellular localization of mycobacterial proteins. Our approach provides more accurate results than previous methods, and it was tested on an independent dataset recently obtained from an experimental study to provide a first and reasonably accurate prediction of subcellular localization. Our approach can also be used for large-scale prediction of new protein entries in the UniProtKB database and of experimentally obtained protein sequences. In addition, our approach identified many local motifs involved in subcellular localization that also interact with the environment. Thus, our method may have widespread applications both in studying the functions of mycobacterial proteins and in searching for potential vaccine targets and drug design.
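Selecting motifs with a binomial model can be sketched as a one-sided enrichment test: if a motif occurs in a fraction p of background sequences, the chance of seeing it in at least k of n sequences from one location is the binomial upper tail. This is a minimal sketch of that generic idea, not the paper's exact scoring; the motif strings, background frequencies, and the `alpha` cutoff are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(max(k, 0), n + 1))


def select_motifs(motif_counts: Dict[str, int], n_class: int,
                  background: Dict[str, float], alpha: float = 0.01) -> List[Tuple[str, float]]:
    """Keep motifs whose occurrence in the class is unlikely under the background rate."""
    selected = []
    for motif, k in motif_counts.items():
        p_value = binomial_tail(k, n_class, background[motif])
        if p_value < alpha:
            selected.append((motif, p_value))
    return sorted(selected, key=lambda item: item[1])  # most significant first
```

When many motifs are tested, a multiple-testing correction (for example Bonferroni, dividing `alpha` by the number of candidates) would normally be applied.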

16.
Gao QB  Wang ZZ  Yan C  Du YH 《FEBS letters》2005,579(16):3444-3448
To understand the structure and function of a protein, it is important to know where it occurs in the cell. A computational method that properly predicts the subcellular location of proteins would therefore be valuable for interpreting the raw data produced by large-scale genome sequencing projects. The present work explores an effective method for extracting features from the protein primary sequence, together with a novel similarity measure among proteins, for assigning a protein to its subcellular location. We considered four locations in eukaryotic cells and three locations in prokaryotic cells, which have been investigated by several groups in the past. Each protein was represented by a combined 430-dimensional (430D) vector derived from its primary sequence, comprising 20 amino acid compositions, 400 dipeptide compositions, and 10 physicochemical properties. To evaluate this encoding scheme, a jackknife test based on the nearest neighbor algorithm was employed. For the eukaryotic dataset, the prediction accuracies for cytoplasmic, extracellular, mitochondrial, and nuclear proteins were 86.3%, 89.2%, 73.5%, and 89.4%, respectively, with a total accuracy of 86.3%. For the prokaryotic dataset, the accuracies for cytoplasmic, extracellular, and periplasmic proteins were 97.4%, 86.0%, and 79.7%, respectively, with a total accuracy of 92.5%. The results indicate that this method outperforms some existing approaches based on amino acid composition alone or on amino acid and dipeptide composition.
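The 430D encoding (20 amino acid compositions, 400 dipeptide compositions, and 10 physicochemical values) can be assembled as in the sketch below. Because the abstract does not name the 10 physicochemical properties, they are passed in by the caller as a hypothetical precomputed list; the paper's similarity measure is not reproduced.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def combined_features(seq: str, physchem: Sequence[float]) -> List[float]:
    """20 AAC + 400 DPC + 10 caller-supplied physicochemical values = 430 features."""
    if len(physchem) != 10:
        raise ValueError("expected 10 physicochemical values")
    seq = seq.upper()
    aa_counts = [seq.count(a) for a in AMINO_ACIDS]
    n_aa = sum(aa_counts)
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    dp_counts = [pair_counts[d] for d in DIPEPTIDES]  # pairs with non-standard residues are ignored
    n_dp = sum(dp_counts)
    aac = [c / n_aa if n_aa else 0.0 for c in aa_counts]
    dpc = [c / n_dp if n_dp else 0.0 for c in dp_counts]
    return aac + dpc + list(physchem)
```

The three blocks live on different scales, so in practice each block is often normalized before a distance-based nearest neighbor search.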

17.
Guo J  Lin Y  Liu X 《Proteomics》2006,6(19):5099-5105
This paper proposes a new integrative system, GNBSL (Gram-negative bacteria subcellular localization), specialized for the subcellular localization of Gram-negative bacterial proteins. First, the system generates a position-specific frequency matrix (PSFM) and a position-specific scoring matrix (PSSM) for each protein sequence by searching the Swiss-Prot database. Four modules then extract different features from the PSFM and the PSSM: whole-sequence amino acid composition, N- and C-terminal amino acid composition, dipeptide composition, and segment composition. Four probabilistic neural network (PNN) classifiers classify the features from these modules. To further improve performance, two modules trained by support vector machines (SVMs) are added: one extracts the residue-couple distribution from the amino acid sequence, and the other applies a pairwise profile alignment kernel to measure the local similarity between every two sequences. Finally, an additional SVM fuses the outputs of the six modules. On a benchmark dataset, the overall success rate of GNBSL is higher than those of PSORT-B, CELLO, and PSLpred. The GNBSL web server is available at http://166.111.24.5/webtools/GNBSL/index.htm.
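A probabilistic neural network (PNN) in its textbook form averages a Gaussian kernel over the training samples of each class and predicts the class with the largest average. The sketch below shows that generic classifier, not GNBSL's configuration; the smoothing parameter `sigma`, the location labels, and the 2-D toy features are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pnn_scores(x: Sequence[float], train_X: Sequence[Sequence[float]],
               train_y: Sequence[str], sigma: float = 0.5) -> Dict[str, float]:
    """Per-class mean Gaussian kernel between x and that class's training samples."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for xi, yi in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def pnn_predict(x: Sequence[float], train_X: Sequence[Sequence[float]],
                train_y: Sequence[str], sigma: float = 0.5) -> str:
    """Predict the class with the highest kernel density estimate."""
    scores = pnn_scores(x, train_X, train_y, sigma)
    return max(scores, key=scores.get)
```

The per-class scores, rather than only the winning label, are the kind of module output a fusion SVM could take as input.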

18.
A novel alignment-free method for computing the functional similarity of membrane proteins based on features of hydropathy distribution is presented. These features are used to represent protein families as hydropathy profiles, which statistically summarize the hydropathy distribution of member proteins. The summary uses hydropathy features that numerically represent structurally or functionally significant portions of protein sequences. The hydropathy profiles are numerical vectors, that is, points in a high-dimensional "hydropathy" space, and their similarities are identified by projecting this space onto its principal axes. Here, the approach is applied to the secondary transporters, and the analysis is validated against the standard classification of the secondary transporters. The analysis allows function attributes to be predicted for proteins of uncharacterized families of secondary transporters, and its results may help characterize unknown function attributes of these transporters. They also show that analysis of hydropathy distribution can be used for function prediction of membrane proteins.
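A hydropathy distribution of the kind used here can be illustrated with a sliding-window average over a hydropathy scale. This sketch assumes the Kyte-Doolittle scale and a default window of 19 residues; the abstract specifies neither, and the family-level profile summary and principal-axis projection are not reproduced.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale; the source does not specify one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each full window along the sequence (non-standard residues skipped)."""
    values = [KYTE_DOOLITTLE[a] for a in seq.upper() if a in KYTE_DOOLITTLE]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```

Summary statistics of such profiles (for example, the number of strongly hydrophobic windows) could serve as the numerical features that are stacked into vectors and projected onto principal axes.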

19.
Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, many efforts have been made to develop methods for predicting protein subcellular location. In this study, based on a strategy that hybridizes the functional domain composition and the pseudo-amino acid composition (Cai and Chou [2003]: Biochem. Biophys. Res. Commun. 305:407-411), the Intimate Sorting Algorithm (ISort predictor) was developed for predicting protein subcellular location. As a showcase, the same plant and non-plant protein datasets studied by previous investigators were used for demonstration. The overall jackknife success rate was 85.4% for the plant protein dataset and 91.9% for the non-plant protein dataset. These are the highest success rates achieved so far for the two datasets under a rigorous cross-validation procedure, further confirming that such a hybrid approach may become a very useful high-throughput tool in bioinformatics, proteomics, and molecular cell biology.

20.
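The distribution of Escherichia coli sigma70 promoters across different intergenic regions was analyzed. For each position of 683 E. coli sigma70 promoters, the hexamer conservation value M6(l) and its fluctuation threshold were calculated. Using as parameters the hexamer frequencies at the 21 conserved positions whose values exceeded the fluctuation threshold of 7.2, promoters were searched for across the complete E. coli genome sequence with the increment of diversity theory. All 683 promoter sequences were correctly predicted, and 126 additional candidate sequences were obtained. After a second screening based on the distribution of promoters across different intergenic regions and on the distribution of distances from the TSS to the TIS, 84 of these candidates are inferred to be promoters not yet determined experimentally.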


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号 (Beijing ICP registration No. 09084417)