Similar literature
1.
Chen YL  Li QZ  Zhang LQ 《Amino acids》2012,42(4):1309-1316
Because of the complexity of the Plasmodium falciparum (PF) genome, predicting mitochondrial proteins of PF is more difficult than for other species. In this study, using the n-peptide composition of a reduced amino acid alphabet (RAAA) obtained from the structural alphabet named Protein Blocks as the feature parameter, the increment of diversity (ID) is developed for the first time to predict mitochondrial proteins. By choosing the 1-peptide compositions of the 20-residue N-terminal regions as the only input vector, the prediction achieves 86.86% accuracy with a Matthews correlation coefficient (MCC) of 0.69 in the jackknife test. Moreover, by combining the hydropathy distribution along the protein sequence with several reduced amino acid alphabets, we achieved a maximum MCC of 0.82 with 92% accuracy in the jackknife test using the developed ID model. When evaluated on an independent dataset, our method performs better than existing methods. The results indicate that the ID is a simple and efficient prediction method for mitochondrial proteins of the malaria parasite.
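The increment of diversity used in this abstract can be sketched as follows. This is a minimal illustration that assumes the commonly cited definitions D(X) = N log N - sum(n_i log n_i) and ID(X, Y) = D(X + Y) - D(X) - D(Y); the function names, the plain 20-letter alphabet (instead of the reduced alphabets from Protein Blocks) and the nearest-profile decision rule are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard alphabet; the paper uses reduced alphabets


def composition_counts(seq, n_term=20):
    """Count each standard residue in the first n_term positions of seq."""
    counts = Counter(seq[:n_term].upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def diversity(counts):
    """Measure of diversity D(X) = N log N - sum(n_i log n_i), taking 0 log 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def predict(query, class_profiles):
    """Assign the query to the class whose pooled count profile gives the smallest ID."""
    return min(class_profiles,
               key=lambda label: increment_of_diversity(query, class_profiles[label]))
```

Here `class_profiles` is a hypothetical dictionary mapping a class label to the summed count vector of its training sequences.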

2.
3.
Panwar B  Raghava GP 《Amino acids》2012,42(5):1703-1713
Since the endosymbiotic events occurred, all genes of mitochondrial aminoacyl tRNA synthetases (AARSs) have been lost or transferred from the ancestral mitochondrial genome into the nucleus. The canonical pattern is that both cytosolic and mitochondrial AARSs coexist in the nuclear genome. In the present scenario, all mitochondrial AARSs are nucleus-encoded, synthesized on cytosolic ribosomes and post-translationally imported from the cytosol into the mitochondria of the eukaryotic cell. Site-based discrimination between similar types of enzymes is very challenging because they have almost the same physico-chemical properties. Predicting the sub-cellular location of AARSs is important for understanding mitochondrial protein synthesis. We have analyzed and optimized the distinguishable patterns between cytosolic and mitochondrial AARSs. Firstly, support vector machine (SVM)-based modules were developed using amino acid and dipeptide compositions and achieved Matthews correlation coefficients (MCC) of 0.82 and 0.73, respectively. Secondly, we developed SVM modules using the position-specific scoring matrix and achieved a maximum MCC of 0.78. Thirdly, we developed SVM modules using N-terminal, intermediate residues, C-terminal and split amino acid composition (SAAC) and achieved MCC of 0.82, 0.70, 0.39 and 0.86, respectively. Finally, an SVM module was developed using the selected attributes of split amino acid composition (SA-SAAC) approach and achieved an MCC of 0.92 with an accuracy of 96.00%. All modules were trained and tested on a non-redundant data set and evaluated using the fivefold cross-validation technique. On the independent data sets, the SA-SAAC based prediction model achieved an MCC of 0.95 with an accuracy of 97.77%. The web-server 'MARSpred' based on the above study is available at http://www.imtech.res.in/raghava/marspred/.
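To make the split amino acid composition (SAAC) feature concrete, the sketch below computes separate compositions for the N-terminal segment, the intermediate residues and the C-terminal segment, and concatenates them into one 60-dimensional vector. The segment lengths of 25 residues and the function names are assumptions for illustration; the abstract does not state the lengths that were actually used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Percentage of each standard residue in a segment (all zeros if it is empty).

    Non-standard symbols such as X count toward the segment length only.
    """
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * segment.count(aa) / len(segment) for aa in AMINO_ACIDS]


def split_composition(seq, n_len=25, c_len=25):
    """Concatenate N-terminal, middle and C-terminal compositions (60 values).

    n_len and c_len are illustrative defaults, not values taken from the paper.
    """
    seq = seq.upper()
    if len(seq) < n_len + c_len:
        raise ValueError("sequence is shorter than the two terminal segments")
    n_part = seq[:n_len]
    c_part = seq[len(seq) - c_len:]
    middle = seq[n_len:len(seq) - c_len]
    return composition(n_part) + composition(middle) + composition(c_part)
```

Keeping the three segments apart lets a classifier see terminal signals, such as an N-terminal targeting sequence, that a whole-sequence composition would average away.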

4.
Afridi TH  Khan A  Lee YS 《Amino acids》2012,42(4):1443-1454
Mitochondria are vital organelles of eukaryotic cells, since they are involved in processes associated with cellular mortality and human diseases. Reliable techniques are therefore needed for the identification of new mitochondrial proteins. We propose the Mito-GSAAC system for the prediction of mitochondrial proteins. The aim of this work is to investigate an effective feature extraction strategy and to develop an ensemble approach that can better exploit the advantages of this strategy for mitochondria classification. We investigate four kinds of protein representations for the prediction of mitochondrial proteins: amino acid composition, dipeptide composition, pseudo amino acid composition, and split amino acid composition (SAAC). Individual classifiers such as support vector machine (SVM), k-nearest neighbor, multilayer perceptron, random forest, AdaBoost, and bagging are first trained. An ensemble classifier is then built using genetic programming (GP) to evolve a complex but effective decision space from the individual decision spaces of the trained classifiers. The highest prediction performance in the jackknife test is 92.62%, obtained with the GP-based ensemble classifier on SAAC features, which is the highest accuracy reported so far on the Mitochondria dataset used. On the Malaria Parasite Mitochondria dataset, the highest accuracy is obtained by SVM using SAAC, and it is further enhanced to 93.21% using the GP-based ensemble. SAAC is observed to have better discrimination power for mitochondria prediction than the other feature extraction strategies. Thus, the improved prediction performance is largely due to the better capability of SAAC for discriminating between mitochondrial and non-mitochondrial proteins at the N and C termini and to the effective combination capability of GP. Mito-GSAAC can be accessed at . It is expected that the novel approach and the accompanying predictor will have a major impact on Molecular Cell Biology, Proteomics, Bioinformatics, System Biology, and Drug Development.

5.
Because of the complexity of the Plasmodium falciparum genome, predicting secretory proteins of P. falciparum is more difficult than for other species. In this study, based on the definition of the measure of diversity, a new K-nearest neighbor method, K-minimum increment of diversity (K-MID), is introduced to predict secretory proteins. Using amino acid composition as the only input vector, K-MID achieves 88.89% accuracy with a Matthews correlation coefficient (MCC) of 0.78. Further, several reduced amino acid alphabets are applied to predict secretory proteins, and the results improve to 90.67% accuracy with an MCC of 0.83 when using the 169 dipeptide compositions of the reduced amino acid alphabets obtained from the Protein Blocks method.

6.
Prediction of RNA binding sites in a protein using SVM and PSSM profile   (cited 1 time in total: 0 self-citations, 1 by others)
Kumar M  Gromiha MM  Raghava GP 《Proteins》2008,71(1):189-194

7.
Jia C  Liu T  Chang AK  Zhai Y 《Biochimie》2011,93(4):778-782
Mitochondrial proteins of Plasmodium falciparum are considered as attractive targets for anti-malarial drugs, but the experimental identification of these proteins is a difficult and time-consuming task. Computational prediction of mitochondrial proteins offers an alternative approach. However, the commonly used subcellular location prediction methods are unsuited for P. falciparum mitochondrial proteins whereas the organism and organelle-specific methods were constructed on the basis of a rather small dataset. In this study, a novel dataset termed PfM233, which included 108 mitochondrial and 125 non-mitochondrial proteins with sequence similarity below 25%, was established and the methods for predicting mitochondrial proteins of P. falciparum were described. Both bi-profile Bayes and split amino acid composition were applied to extract the features from the N- and C-terminal sequences of these proteins, which were then used to construct two SVM based classifiers (PfMP-N25 and PfMP-30). Using PfM233 as the dataset, PfMP-N25 and PfMP-30 achieved accuracies (MCCs) of 90.13% (0.80) and 90.99% (0.82). When tested with the commonly used 40 mitochondrial proteins in PfM175 and the 108 mitochondrial proteins in PfM233, these two methods obviously outperformed the existing general, organelle-specific and organism and organelle-specific methods.  相似文献   

8.
Here we report a systematic approach for predicting subcellular localization (cytoplasm, mitochondrial, nuclear, and plasma membrane) of human proteins. First, support vector machine (SVM)-based modules for predicting subcellular localization using traditional amino acid and dipeptide (i + 1) composition achieved overall accuracy of 76.6 and 77.8%, respectively. PSI-BLAST, when carried out using a similarity-based search against a nonredundant data base of experimentally annotated proteins, yielded 73.3% accuracy. To gain further insight, a hybrid module (hybrid1) was developed based on amino acid composition, dipeptide composition, and similarity information and attained better accuracy of 84.9%. In addition, SVM modules based on a different higher order dipeptide i.e. i + 2, i + 3, and i + 4 were also constructed for the prediction of subcellular localization of human proteins, and overall accuracy of 79.7, 77.5, and 77.1% was accomplished, respectively. Furthermore, another SVM module hybrid2 was developed using traditional dipeptide (i + 1) and higher order dipeptide (i + 2, i + 3, and i + 4) compositions, which gave an overall accuracy of 81.3%. We also developed SVM module hybrid3 based on amino acid composition, traditional and higher order dipeptide compositions, and PSI-BLAST output and achieved an overall accuracy of 84.4%. A Web server HSLPred (www.imtech.res.in/raghava/hslpred/ or bioinformatics.uams.edu/raghava/hslpred/) has been designed to predict subcellular localization of human proteins using the above approaches.  相似文献   
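The traditional (i + 1) and higher-order (i + 2, i + 3, i + 4) dipeptide compositions mentioned above differ only in the distance between the two residues of each pair. A minimal sketch, in which the function name and the `gap` parameter are illustrative choices rather than anything taken from HSLPred:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(seq, gap=1):
    """Percentage of each residue pair (seq[i], seq[i + gap]) as a 400-value vector.

    gap=1 gives the traditional dipeptide composition; gap=2, 3 or 4 gives the
    higher-order variants. Pairs containing non-standard symbols are skipped.
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap):
        pair = seq[i] + seq[i + gap]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [100.0 * counts[p] / total for p in PAIRS]
```

A hybrid feature vector in the spirit of the abstract could be formed by concatenating the outputs for several values of `gap`, although the exact combination used by the authors is not specified here.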

9.
Hayat M  Khan A  Yeasin M 《Amino acids》2012,42(6):2447-2460
Knowledge of the types of membrane protein provides useful clues in deducing the functions of uncharacterized membrane proteins. An automatic method for efficiently identifying uncharacterized proteins is thus highly desirable. In this work, we have developed a novel method for predicting membrane protein types by exploiting the discrimination capability of the difference in amino acid composition at the N and C terminus through split amino acid composition (SAAC). We also show that the ensemble classification can better exploit this discriminating capability of SAAC. In this study, membrane protein types are classified using three feature extraction and several classification strategies. An ensemble classifier Mem-EnsSAAC is then developed using the best feature extraction strategy. Pseudo amino acid (PseAA) composition, discrete wavelet analysis (DWT), SAAC, and a hybrid model are employed for feature extraction. The nearest neighbor, probabilistic neural network, support vector machine, random forest, and Adaboost are used as individual classifiers. The predicted results of the individual learners are combined using genetic algorithm to form an ensemble classifier, Mem-EnsSAAC yielding an accuracy of 92.4 and 92.2% for the Jackknife and independent dataset test, respectively. Performance measures such as MCC, sensitivity, specificity, F-measure, and Q-statistics show that SAAC-based prediction yields significantly higher performance compared to PseAA- and DWT-based systems, and is also the best reported so far. The proposed Mem-EnsSAAC is able to predict the membrane protein types with high accuracy and consequently, can be very helpful in drug discovery. It can be accessed at http://111.68.99.218/membrane.  相似文献   

10.
Nicotinamide adenine dinucleotide (NAD) plays an important role in cellular metabolism and acts as a hydride-accepting and hydride-donating coenzyme in energy production. Identification of NAD-protein interacting sites can significantly aid in understanding NAD-dependent metabolism and pathways, and it could further contribute useful information for drug development. In this study, a computational method is proposed to predict NAD-protein interacting sites using sequence information and structure-based information. All models developed in this work are evaluated using the 7-fold cross-validation technique. Results show that using the position-specific scoring matrix (PSSM) as an input feature is quite encouraging for predicting NAD interacting sites. To account for the imbalanced dataset, an ensemble support vector machine (SVM), which is an assembly of many individual SVM classifiers, is developed to predict the NAD interacting sites. The overall accuracy (Acc) thus obtained was 87.31% with a Matthews correlation coefficient (MCC) of 0.56. In contrast, the corresponding accuracy of the single SVM approach was only 80.86% with an MCC of 0.38. These results indicate that the prediction accuracy can be remarkably improved via the ensemble SVM classifier approach.
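Several abstracts in this list report both accuracy and the Matthews correlation coefficient. The sketch below shows the standard formulas computed from a binary confusion matrix. The counts in the usage note are hypothetical and are not taken from any of the studies.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, from -1 to 1.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

With a hypothetical imbalanced test set of 100 interacting and 900 non-interacting sites, `accuracy(50, 880, 20, 50)` is 0.93 while `mcc(50, 880, 20, 50)` is only about 0.56, which is why MCC is usually reported alongside accuracy on imbalanced data.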

11.

Background

Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help in eliminating these pathogens from our body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand mechanism of recognition of pathogens by MBPs.

Results

This paper describes modules developed for predicting MIRs in a protein. Support vector machine (SVM) based models have been developed on 120 mannose binding protein chains, where no two chains have more than 25% sequence similarity. SVM models were developed on two types of datasets: 1) main dataset consists of 1029 mannose interacting and 1029 non-interacting residues, 2) realistic dataset consists of 1029 mannose interacting and 10320 non-interacting residues. In this study, firstly, we developed standard modules using binary and PSSM profile of patterns and got maximum MCC around 0.32. Secondly, we developed SVM modules using composition profile of patterns and achieved maximum MCC around 0.74 with accuracy 86.64% on main dataset. Thirdly, we developed a model on a realistic dataset and achieved maximum MCC of 0.62 with accuracy 93.08%. Based on this study, a standalone program and web server have been developed for predicting mannose interacting residues in proteins (http://www.imtech.res.in/raghava/premier/).

Conclusions

Compositional analysis of mannose interacting and non-interacting residues shows that certain types of residues are preferred in mannose interaction. It was also observed that residues around mannose interacting residues have a preference for certain types of residues. Composition of patterns/peptide/segment has been used for predicting MIRs and achieved reasonable high accuracy. It is possible that this novel strategy may be effective to predict other types of interacting residues. This study will be useful in annotating the function of protein as well as in understanding the role of mannose in the immune system.  相似文献   

12.
Palmitoylation is the post‐translational reversible addition of the acyl moiety, palmitate, to cysteine residues of proteins and is involved in regulating protein trafficking, localization, stability and function. The Aspartate‐Histidine‐Histidine‐Cysteine (DHHC) protein family, named for their highly conserved DHHC signature motif, is thought to be responsible for catalysing protein palmitoylation. Palmitoylation is widespread in all eukaryotes, including the malaria parasite, Plasmodium falciparum, where over 400 palmitoylated proteins are present in the asexual intraerythrocytic schizont stage parasites, including proteins involved in key aspects of parasite maturation and development. The P. falciparum genome includes 12 proteins containing the conserved DHHC motif. In this study, we adapted a palmitoyl‐transferase activity assay for use with P. falciparum proteins and demonstrated for the first time that P. falciparum DHHC proteins are responsible for the palmitoylation of P. falciparum substrates. This assay also reveals that multiple DHHCs are capable of palmitoylating the same substrate, indicating functional redundancy at least in vitro. To test whether functional redundancy also exists in vivo, we investigated the endogenous localization and essentiality of a subset of schizont‐expressed PfDHHC proteins. Individual PfDHHC proteins localized to distinct organelles, including parasite‐specific organelles such as the rhoptries and inner membrane complex. Knock‐out studies identified individual DHHCs that may be essential for blood‐stage growth and others that were functionally redundant in the blood stages but may have functions in other stages of parasite development. Supporting this hypothesis, disruption of PfDHHC9 had no effect on blood‐stage growth but reduced the formation of gametocytes, suggesting that this protein could be exploited as a transmission‐blocking target. 
The localization and stage‐specific expression of the DHHC proteins may be important for regulating their substrate specificity and thus may provide a path for inhibitor development.  相似文献   

13.
Sethi D  Garg A  Raghava GP 《Amino acids》2008,35(3):599-605
The association of structurally disordered proteins with a number of diseases has engendered enormous interest and therefore demands a prediction method that would facilitate their expeditious study at molecular level. The present study describes the development of a computational method for predicting disordered proteins using sequence and profile compositions as input features for the training of SVM models. First, we developed the amino acid and dipeptide compositions based SVM modules which yielded sensitivities of 75.6 and 73.2% along with Matthew’s Correlation Coefficient (MCC) values of 0.75 and 0.60, respectively. In addition, the use of predicted secondary structure content (coil, sheet and helices) in the form of composition values attained a sensitivity of 76.8% and MCC value of 0.77. Finally, the training of SVM models using evolutionary information hidden in the multiple sequence alignment profile improved the prediction performance by achieving a sensitivity value of 78% and MCC of 0.78. Furthermore, when evaluated on an independent dataset of partially disordered proteins, the same SVM module provided a correct prediction rate of 86.6%. Based on the above study, a web server (“DPROT”) was developed for the prediction of disordered proteins, which is available at .  相似文献   

14.
The parasite Plasmodium falciparum, responsible for the most deadly form of human malaria, is one of the extremely AT-rich genomes sequenced so far and known to possess many atypical characteristics. Using multivariate statistical approaches, the present study analyzes the amino acid usage pattern in 5038 annotated protein-coding sequences in P. falciparum clone 3D7. The amino acid composition of individual proteins, though dominated by the directional mutational pressure, exhibits wide variation across the proteome. The Asn content, expression level, mean molecular weight, hydropathy, and aromaticity are found to be the major sources of variation in amino acid usage. At all stages of development, frequencies of residues encoded by GC-rich codons such as Gly, Ala, Arg, and Pro increase significantly in the products of the highly expressed genes. Investigation of nucleotide substitution patterns in P. falciparum and other Plasmodium species reveals that the nonsynonymous sites of highly expressed genes are more conserved than those of the lowly expressed ones, though for synonymous sites, the reverse is true. The highly expressed genes are, therefore, expected to be closer to their putative ancestral state in amino acid composition, and a plausible reason for their sequences being GC-rich at nonsynonymous codon positions could be that their ancestral state was less AT-biased. Negative correlation of the expression level of proteins with respective molecular weights supports the notion that P. falciparum, in spite of its intracellular parasitic lifestyle, follows the principle of cost minimization. [Reviewing Editor : Dr. Richard Kliman]  相似文献   

15.
The reduced genomes of the apicoplast and mitochondrion of the malaria parasite Plasmodium falciparum are actively translated and antibiotic‐mediated translation inhibition is detrimental to parasite survival. In order to understand recycling of organellar ribosomes, a critical step in protein translation, we identified ribosome recycling factors (RRF) encoded by the parasite nuclear genome. Targeting of PfRRF1 and PfRRF2 to the apicoplast and mitochondrion respectively was established by localization of leader sequence–GFP fusions. Unlike any RRF characterized thus far, PfRRF2 formed dimers with disulphide interaction(s) and additionally localized in the cytoplasm, thus suggesting adjunct functions for the factor. PfRRF1 carries a large 108‐amino‐acid insertion in the functionally critical hinge region between the head and tail domains of the protein, yet complemented Escherichia coli RRF in the LJ14frrts mutant and disassembled surrogate E. coli 70S ribosomes in the presence of apicoplast‐targeted EF‐G. Recombinant PfRRF2 bound E. coli ribosomes and could split monosomes in the presence of the relevant mitochondrial EF‐G but failed to complement the LJ14frrts mutant. Although proteins comprising subunits of P. falciparum organellar ribosomes are predicted to differ from bacterial and mitoribosomal counterparts, our results indicate that the essential interactions required for recycling are conserved in parasite organelles.  相似文献   

16.
Erythrocyte invasion is a critical step for survival of Plasmodium parasites, the causative agents of malaria, in their host and recognition of the host cell receptors by Plasmodium erythrocyte-binding-like (EBL) proteins plays an important role. Although EBL subcellular localization was shown to be closely linked to parasite virulence in the rodent model of malaria, the trafficking of EBL to micronemes, the secretory organelle in the invasive parasite is not fully understood. In this study, we assessed the impact of the deletion and amino acid replacement of Plasmodium falciparum EBL (EBA-175) using transgenic P. falciparum lines expressing modified EBA-175. We found that, in addition to a signal peptide and a cysteine rich region (region 6) to the cytoplasmic tail, a previously unrecognized sequence segment in region 5 was required for correct microneme trafficking of EBA-175. Replacement of Arg or Phe residues in this segment altered microneme trafficking, suggesting that the sequence itself contained critical information. Based on these findings, we propose that the sequence segment in region 5 is also required for the recognition of EBA-175 by the trafficking machinery to direct this protein to the microneme. Our results provide key information to clarify an as yet unidentified EBA-175 trafficking mechanism.  相似文献   

17.
Thanks to the extensive use of recombinant DNA technology and immunological methods, much insight into the cellular functions of the human malaria parasite Plasmodium falciparum has been gained since it was learnt ten years ago how to grow this organism in culture. The amino acid sequences of over a dozen surface proteins of the parasite, and of several proteins the parasite excretes into its most important host cell, the erythrocyte, have been determined. Interestingly, many of these proteins show blocks of repeated amino acids. Several proteins have been shown to be involved in specific aspects of the complex host-parasite interaction, such as penetration of host cells or increased stickiness of infected red blood cells in the blood vessels. Some of the proteins described here may be protective antigens and may become important in vaccine development.

18.
Secretory proteins are of particular importance to apicomplexan parasites and comprise over 15% of the genomes of the human pathogens that cause diseases like malaria, toxoplasmosis and babesiosis as well as other diseases of agricultural significance. Here, we developed an approach that allows us to control the trafficking destination of secretory proteins in the human malaria parasite Plasmodium falciparum. Based on the unique structural requirements of apicoplast transit peptides, we designed three conditional localization domains (CLD1, 2 and 3) that can be used to control protein trafficking via the addition of a cell permeant ligand. Studies comparing the trafficking dynamics of each CLD show that CLD2 has the most optimal trafficking efficiency. To validate this system, we tested whether CLD2 could conditionally localize a biotin ligase called holocarboxylase synthetase 1 (HCS1) without interfering with the function of the enzyme. In a parasite line expressing CLD2‐HCS1, we were able to control protein biotinylation in the apicoplast in a ligand‐dependent manner, demonstrating the full functionality of the CLD tool. We have developed and validated a novel molecular tool that may be used in future studies to help elucidate the function of secretory proteins in malaria parasites.  相似文献   

19.
Wang  Cui-cui  Fang  Yaping  Xiao  Jiamin  Li  Menglong 《Amino acids》2011,40(1):239-248
RNA–protein interactions play a pivotal role in various biological processes, such as mRNA processing, protein synthesis, assembly, and function of ribosome. In this work, we have introduced a computational method for predicting RNA-binding sites in proteins based on support vector machines by using a variety of features from amino acid sequence information including position-specific scoring matrix (PSSM) profiles, physicochemical properties and predicted solvent accessibility. Considering the influence of the surrounding residues of an amino acid and the dependency effect from the neighboring amino acids, a sliding window and a smoothing window are used to encode the PSSM profiles. The outer fivefold cross-validation method is evaluated on the data set of 77 RNA-binding proteins (RBP77). It achieves an overall accuracy of 88.66% with the Matthew’s correlation coefficient (MCC) of 0.69. Furthermore, an independent data set of 39 RNA-binding proteins (RBP39) is employed to further evaluate the performance and achieves an overall accuracy of 82.36% with the MCC of 0.44. The result shows that our method has good generalization abilities in predicting RNA-binding sites for novel proteins. Compared with other previous methods, our method performs well on the same data set. The prediction results suggest that the used features are effective in predicting RNA-binding sites in proteins. The code and all data sets used in this article are freely available at .  相似文献   
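The sliding-window encoding of PSSM profiles described above can be sketched as follows: each residue is represented by the PSSM rows of itself and its neighbours, with zero padding beyond the sequence ends. The window size of 11, the zero padding and the function name are assumptions for illustration; the smoothing window mentioned in the abstract is not covered.

```python
def window_features(pssm, center, window=11):
    """Concatenate the PSSM rows in a window centred on residue `center`.

    pssm is a list of per-residue score rows (typically 20 values each).
    Positions outside the sequence are filled with zeros, so every residue
    gets a vector of window * row_width values.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    half = window // 2
    width = len(pssm[0])
    features = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0] * width)
    return features
```

Calling `window_features` once per residue yields one fixed-length vector per candidate binding site, which is the input shape that a per-residue classifier such as an SVM expects.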

20.
Plasmodium falciparum is responsible for the majority of life-threatening cases of human malaria. The global emergence of drug-resistant malarial parasites necessitates identification and characterization of novel drug targets. Carbonic anhydrase (CA) is present at high levels in human red cells and in P. falciparum. Existence of at least three isozymes of the α class was demonstrated in P. falciparum and a rodent malarial parasite Plasmodium berghei. The major isozyme CA1 was purified and partially characterized from P. falciparum (PfCA1). A search of the malarial genome database yielded an open reading frame similar to the α-CAs from various organisms, including human. The primary amino acid sequence of the PfCA1 has 60% identity with a rodent parasite Plasmodium yoelii enzyme (PyCA). The single open reading frames encoded 235 and 252 amino acid proteins for PfCA1 and PyCA, respectively. The highly conserved active site residues were also found among organisms having α-CAs. The PfCA1 gene was cloned, sequenced and expressed in Escherichia coli. The purified recombinant PfCA1 enzyme was catalytically active. It was sensitive to acetazolamide and sulfanilamide inhibition. Kinetic properties of the recombinant PfCA1 revealed the authenticity to the wild type enzyme purified from P. falciparum in vitro culture. Furthermore, the PfCA1 inhibitors acetazolamide and sulfanilamide showed good antimalarial effect on the in vitro growth of P. falciparum. Our molecular tools developed for the recombinant enzyme expression will be useful for developing potential antimalarials directed at P. falciparum carbonic anhydrase.  相似文献   
