
1.

Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel-Ziv complexity

Diao Y Ma D Wen Z Yin J Xiang J Li M 《Amino acids》2008,34(1):111-117

Summary. Transmembrane (TM) proteins represent about 20–30% of the protein sequences in higher eukaryotes and play important roles across a range of cellular functions. Moreover, knowledge of the topology of these proteins often provides crucial hints about their function. Because experimental structure determination of TM proteins is difficult, theoretical prediction methods are highly preferred for identifying the topology of newly found ones from their primary sequences, which is useful in both basic research and drug discovery. In this paper, based on the concept of pseudo amino acid composition (PseAA), which can incorporate sequence-order information of a protein sequence so as to remarkably enhance the power of discrete models (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), cellular automata and Lempel-Ziv complexity are introduced to predict the TM regions of integral membrane proteins, including both α-helical and β-barrel membrane proteins, with validation by the jackknife test. The results thus obtained are quite promising, which indicates that the current approach might be a promising high-throughput tool in the post-genomic era. The source code and dataset are available for academic users at liml@scu.edu.cn. Authors’ address: Menglong Li, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, P.R. China
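The abstract names Lempel-Ziv complexity but does not say how it is computed. As a minimal sketch, assuming the classical 1976 Lempel-Ziv parsing (the function name `lz76_complexity` is hypothetical and is not taken from the authors' source code), the complexity of a sequence can be counted as the number of components in its exhaustive history:

```python
def lz76_complexity(seq):
    """Count the components of the Lempel-Ziv (1976) parsing of seq.

    Each component is the shortest substring starting at the current
    position that has not occurred earlier; the search window is
    allowed to overlap the candidate, excluding only its last symbol.
    """
    n = len(seq)
    i = 0  # start of the current component
    components = 0
    while i < n:
        length = 1
        # Grow the candidate while it can still be copied from the past.
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


# Classic worked example: 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))  # 6
```

The same function accepts an amino acid string directly, so a sliding-window complexity profile along a protein could be built on top of it. How the paper turns such values into pseudo amino acid components is not stated in the abstract.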

2.

Using cellular automata images and pseudo amino acid composition to predict protein subcellular location (total citations: 6; self-citations: 0; citations by others: 6)

Xiao X Shao S Ding Y Huang Z Chou KC 《Amino acids》2006,30(1):49-54

Summary. The avalanche of newly found protein sequences in the post-genomic era has motivated and challenged us to develop an automated method that can rapidly and accurately predict the localization of an uncharacterized protein in cells, because the knowledge thus obtained can greatly speed up the process of finding its biological functions. However, it is very difficult to establish such a predictor by acquiring the key statistical information buried in a pile of extremely complicated and highly variable sequences. In this paper, based on the concept of the pseudo amino acid composition (Chou, K. C. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246–255), the cellular automata image approach is introduced to cope with this problem. Many important features that are originally hidden in the long amino acid sequences can be clearly displayed through their cellular automata images. One remarkable merit of doing so is that many image recognition tools can be straightforwardly applied to the problem addressed here. High success rates were observed in the self-consistency, jackknife, and independent dataset tests.

3.

Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition (total citations: 2; self-citations: 0; citations by others: 2)

Shen HB Chou KC 《Biochemical and biophysical research communications》2005,337(3):752-756

The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For an in-depth understanding of the biochemical processes of the nucleus, knowledge of the localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desirable to develop an automated method for rapidly annotating the subnuclear locations of numerous newly found nuclear protein sequences so that they can be used in a timely manner for basic research and drug discovery. In view of this, a novel approach is developed for predicting protein subnuclear location. It is characterized by introducing a powerful classifier, the optimized evidence-theoretic K-nearest classifier, and by using the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and the jackknife cross-validation test are significantly higher than those of existing classifiers on the same working dataset. It is anticipated that this approach may also become a useful high-throughput vehicle for bridging the huge gap in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.
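The jackknife (leave-one-out) test named in this abstract can be sketched as follows. This is only an illustration: a plain 1-nearest-neighbour rule with squared Euclidean distance stands in for the OET-KNN classifier, whose details the abstract does not give, and the function names and toy data are hypothetical.

```python
def nearest_label(query, samples):
    """Return the label of the sample closest to query (squared Euclidean)."""
    best = min(
        samples,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(query, s[0])),
    )
    return best[1]


def jackknife_accuracy(samples):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    if len(samples) < 2:
        raise ValueError("jackknife needs at least two samples")
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hold out sample i
        if nearest_label(features, rest) == label:
            correct += 1
    return correct / len(samples)


# Toy feature vectors with made-up location labels.
toy = [([0.0], "nucleolus"), ([0.1], "nucleolus"),
       ([1.0], "chromatin"), ([1.1], "chromatin")]
print(jackknife_accuracy(toy))  # 1.0
```

In a re-substitution test, by contrast, each sample would be predicted with itself still in the training set, which is why jackknife rates are generally the more conservative of the two.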

4.

Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion

Zhang SW Pan Q Zhang HC Shao ZC Shi JY 《Amino acids》2006,30(4):461-468

Summary. The interaction of non-covalently bound monomeric protein subunits forms oligomers. Oligomeric proteins are superior to monomers within the scope of the functional evolution of biomacromolecules. Such complexes are involved in various biological processes and play an important role. It is highly desirable to predict oligomer types automatically from their sequences. Here, based on the concept of pseudo amino acid composition, an improved feature extraction method using a weighted auto-correlation function of amino acid residue indices, together with a Naive Bayes multi-feature fusion algorithm, is proposed and applied to predict protein homo-oligomer types. Support vector machines (SVMs) were used as base classifiers in order to obtain better results. For example, the total accuracies of the A, B, C, D and E sets based on this improved feature extraction method are 77.63, 77.16, 76.46, 76.70 and 75.06%, respectively, in the jackknife test, which are 6.39, 5.92, 5.22, 5.46 and 3.82% higher than that of the G set based on the conventional amino acid composition method with the same SVM. Compared with Chou’s feature extraction method of incorporating the quasi-sequence-order effect, our method can increase the total accuracy by 3.51 to 1.01%. The total accuracy improves from 79.66 to 80.83% by using the Naive Bayes Feature Fusion algorithm. These results show that: 1) the improved feature extraction method is effective and feasible, and the feature vectors based on this method may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches that are buried in the interfaces of associated subunits; 2) the Naive Bayes Feature Fusion algorithm and SVM can be regarded as a powerful computational tool for predicting protein homo-oligomer types.

5.

Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition (total citations: 3; self-citations: 0; citations by others: 3)

Chen YL Li QZ 《Journal of theoretical biology》2007,248(2):377-381

Apoptosis proteins are very important for understanding the mechanism of programmed cell death. The localization of an apoptosis protein can provide valuable information about its molecular function. Predicting the localization of an apoptosis protein is a challenging task. In our previous work we proposed an increment of diversity (ID) method using protein sequence information for this prediction task. In this work, based on the concept of Chou's pseudo-amino acid composition [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. (Erratum: Chou, K.C., 2001, vol. 44, 60) 43, 246-255, Chou, K.C., 2005. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19], a different pseudo-amino acid composition that uses hydropathy distribution information is introduced. A novel ID_SVM algorithm that combines ID with a support vector machine (SVM) is proposed. This method is applied to three data sets (317 apoptosis proteins, 225 apoptosis proteins and 98 apoptosis proteins). Jackknife tests yield higher predictive success rates than the previous algorithms.
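The abstract does not define the increment of diversity. The sketch below assumes the definition usually given in the ID literature, which should be checked against the paper itself: for a vector of counts with total N, the diversity is D(X) = N ln N − Σ n_i ln n_i, and ID(X, Y) = D(X + Y) − D(X) − D(Y). A query is then assigned to the class whose reference counts give the smallest ID. The SVM stage of ID_SVM is not shown, and all function names are hypothetical.

```python
import math


def diversity(counts):
    """D(X) = N ln N - sum(n_i ln n_i); zero counts contribute nothing."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar sources."""
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)


def predict_class(query_counts, class_counts):
    """Pick the class name whose reference counts minimise the ID."""
    return min(
        class_counts,
        key=lambda name: increment_of_diversity(query_counts, class_counts[name]),
    )


# Toy two-feature count vectors; real inputs would be residue or
# hydropathy-pattern counts.
print(predict_class([3, 0], {"a": [10, 0], "b": [0, 10]}))  # a
```

Under this definition the ID is zero when the two count vectors are proportional and grows as their distributions diverge.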

6.

Using pseudo amino acid composition to predict protein subcellular location: Approached with Lyapunov index, Bessel function, and Chebyshev filter (total citations: 1; self-citations: 2; citations by others: 1)

Gao Y Shao S Xiao X Ding Y Huang Y Huang Z Chou KC 《Amino acids》2005,28(4):373-376

Summary. With the avalanche of new protein sequences we are facing in the post-genomic era, it is vitally important to develop an automated method for rapidly and accurately determining the subcellular location of uncharacterized proteins. In this article, based on the concept of pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), three pseudo amino acid components are introduced via the Lyapunov index, Bessel function, and Chebyshev filter. These components can deal more efficiently with the chaos and complexity in protein sequences, leading to a higher success rate in predicting protein subcellular location.

7.

Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant

Lin H Li QZ 《Biochemical and biophysical research communications》2007,354(2):548-551

Conotoxins are small, disulfide-rich peptides that target ion channels and G protein-coupled receptors. They show promise in treating some forms of chronic pain, epilepsy, cardiovascular diseases, and so on. Conotoxins may be classified into 11 superfamilies (A, D, I1, I2, J, L, M, O, P, S, and T) according to their disulfide connectivity, highly conserved N-terminal precursor sequence and similar modes of action. Successfully predicting the superfamily of mature conotoxin peptides is significant for understanding the biological and pharmacological functions of the toxins. In this study, a new algorithm that combines increment of diversity with a modified Mahalanobis discriminant is presented to predict five superfamilies by using the pseudo amino acid composition. The results of the jackknife cross-validation test show that the overall prediction sensitivity and specificity are 88% and 91%, respectively. The predictive algorithm is also used to predict three O-conotoxin families, yielding 72% sensitivity and 78% specificity. These results indicate that the conotoxin superfamily peptides correlate with their amino acid compositions.

8.

The modified Mahalanobis Discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition

Lin H 《Journal of theoretical biology》2008,252(2):350-356

Outer membrane proteins (OMPs) are β-barrel membrane proteins that perform many biological functions. Discriminating OMPs from non-OMPs is a very important task for understanding some biochemical processes. In this study, a method that combines increment of diversity with a modified Mahalanobis Discriminant, called IDQD, is presented to predict 208 OMPs, 206 transmembrane helical proteins (TMHPs) and 673 globular proteins (GPs) by using Chou's pseudo amino acid compositions as parameters. The overall accuracy of jackknife cross-validation is 93.2% and 96.1%, respectively, for three datasets (OMPs, TMHPs and GPs) and two datasets (OMPs and non-OMPs). These results suggest that the method can be effectively applied to discriminate OMPs, TMHPs and GPs. They also indicate that the pseudo amino acid composition reflects the core features of membrane proteins better than the classical amino acid composition.

9.

Using the augmented Chou's pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach

Yu-hong Zeng 《Journal of theoretical biology》2009,259(2):366-372


10.

Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach

Ruifeng Xu Jiyun Zhou Yulan He Quan Zou Xiaolong Wang 《Journal of biomolecular structure & dynamics》2013,31(8):1720-1730

DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desired to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Owing to the fact that all biological species have developed beginning from a very limited number of ancestral species, it is important to take into account the evolutionary information in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating the evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. It was observed by comparing the new predictor with the existing methods via both jackknife test and independent data-set test that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach to extract evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.

11.

Using pseudo amino acid composition to predict protein subcellular location: approached with amino acid composition distribution

Shi JY Zhang SW Pan Q Zhou GP 《Amino acids》2008,35(2):321-327

In the post-genome age, there is an urgent need to develop reliable and effective computational methods to predict the subcellular localization of the rapidly growing number of newly found proteins. Here, a novel form of pseudo amino acid (PseAA) composition, the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, the amino acid composition of each segment is calculated in series. After that, each protein sequence can be represented by a feature vector. Finally, the feature vectors of all sequences thus obtained are input into multi-class support vector machines to predict the subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.
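The segment-wise steps described in this abstract can be sketched as follows. This is a minimal illustration that assumes a fixed number of segments and the 20 standard residues; the function names and the default `n_segments=3` are hypothetical, and the paper may combine segmentations in ways the abstract does not describe. The SVM stage is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Fraction of each standard amino acid in a segment (20 values)."""
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    # Non-standard symbols count towards the length but get no feature.
    return [segment.count(a) / len(segment) for a in AMINO_ACIDS]


def aacd_features(sequence, n_segments=3):
    """Split the sequence into near-equal segments and concatenate
    the amino acid composition of each segment, in order."""
    length = len(sequence)
    bounds = [k * length // n_segments for k in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(composition(sequence[start:end]))
    return features


vec = aacd_features("AACC", n_segments=2)
print(len(vec), vec[0], vec[21])  # 40 1.0 1.0
```

With two segments, "AACC" splits into "AA" and "CC", so the first block of 20 features is all alanine and the second is all cysteine. A plain whole-sequence composition would report 50% of each and lose that positional information.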

12.

Using string kernel to predict signal peptide cleavage site based on subsite coupling model (total citations: 2; self-citations: 0; citations by others: 2)

Wang M Yang J Chou KC 《Amino acids》2005,28(4):395-402

Summary. Owing to the importance of signal peptides for studying the molecular mechanisms of genetic diseases, reprogramming cells for gene therapy, and finding new drugs for healing a specific defect, a fast and accurate method to identify signal peptides is in great demand. Introduction of the so-called {−3, −1, +1} coupling model (Chou, K. C.: Protein Engineering, 2001, 14–2, 75–79) has made it possible to take into account the coupling effect among some key subsites and hence can significantly enhance the prediction quality of the peptide cleavage site. Based on the subsite coupling model, a kind of string kernel for protein sequences is introduced. Because they integrate biologically relevant prior knowledge, the constructed string kernels can be used by any kernel-based method. A support vector machine (SVM) is thus built to predict the cleavage site of signal peptides from protein sequences. The current approach is compared with the classical weight matrix method. At small false positive ratios, our method outperforms the classical weight matrix method, indicating that the current approach may at least serve as a powerful complementary tool to other existing methods for predicting the signal peptide cleavage site. The software that generated the results reported in this paper is available upon request, and will appear at http://www.pami.sjtu.edu.cn/wm. An erratum to this article is available.

13.

Using radial basis function on the general form of Chou's pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites (total citations: 1; self-citations: 0; citations by others: 1)

Chao Huang Jingqi Yuan 《Bio Systems》2013

Prediction of protein subcellular location is a meaningful task that has attracted much attention in recent years. Many protein subcellular location predictors have been developed that can deal only with single-location proteins. However, some proteins may belong to two or even more subcellular locations. It is important to develop predictors that can deal with multiplex proteins, because these proteins have extremely useful implications for both basic biological research and drug discovery. Since the number of methods dealing with multiplex proteins is limited, it is meaningful to explore new methods that can predict the subcellular location of proteins with both single and multiple sites. Applying different feature extraction methods and different prediction algorithms to different benchmark datasets may yield some general conclusions. In this paper, two different feature extraction methods and two different neural network models were applied to three benchmark datasets of different kinds of proteins, i.e. datasets constructed specially for Gram-positive bacterial proteins, plant proteins and virus proteins. These benchmark datasets have different numbers of location sites. The results show that the RBF neural network is clearly superior to the BP neural network on these datasets, regardless of which type of feature extraction is chosen.

14.

Using complexity measure factor to predict protein subcellular location

Xiao X Shao S Ding Y Huang Z Huang Y Chou KC 《Amino acids》2005,28(1):57-61

Summary. Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, it is vitally important to develop an automated method as a high-throughput tool to identify their subcellular location in a timely manner. Based on the concept of the pseudo amino acid composition, by which a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), the complexity measure approach is introduced. The advantage of incorporating the complexity measure factor as one of the pseudo amino acid components for a protein is that it can reflect the protein's overall sequence-order feature more effectively than the conventional correlation factors. With such a formulation to represent the samples of protein sequences, the covariant-discriminant predictor (Chou, K. C. and Elrod, D. W., Protein Engineering, 1999, 12: 107–118) was adopted to conduct prediction. High success rates were obtained by both the jackknife cross-validation test and the independent dataset test, suggesting that introducing the concept of the complexity measure into the prediction of protein subcellular location is quite promising, and might also hold great potential as a useful vehicle for other areas of molecular biology.
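For orientation, the "conventional correlation factors" that this abstract contrasts with the complexity measure can be sketched as follows. The code assumes the pseudo amino acid composition as it is commonly formulated after Chou (2001): 20 normalised residue frequencies followed by λ weighted sequence-order correlation factors. As a simplification, a single residue property is used here, whereas the usual formulation is generally described as averaging several physicochemical properties. The property values, the name `pseaa`, and the parameter defaults are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseaa(sequence, prop, lam=2, weight=0.05):
    """Return 20 + lam pseudo amino acid components.

    prop maps each residue in the sequence to one numeric property
    value (assumption: a single property instead of several).
    """
    length = len(sequence)
    if lam >= length:
        raise ValueError("lam must be smaller than the sequence length")
    freqs = [sequence.count(a) / length for a in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        # k-th tier correlation: mean squared property difference
        # between residues that are k positions apart.
        total = sum(
            (prop[sequence[i + k]] - prop[sequence[i]]) ** 2
            for i in range(length - k)
        )
        thetas.append(total / (length - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


toy_prop = {"A": 0.0, "C": 1.0}  # made-up property scale
vec = pseaa("ACAC", toy_prop, lam=1, weight=0.5)
print(len(vec))  # 21
```

In this toy call the first 20 entries carry the composition and the last entry carries the order information. A complexity measure factor, as proposed in the abstract, would be appended or substituted as a further component of this kind.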

15.

Laccase-induced derivatization of unprotected amino acid L-tryptophan by coupling with p-hydroquinone 2,5-dihydroxy-N-(2-hydroxyethyl)-benzamide

Manda K Hammer E Mikolasch A Gördes D Thurow K Schauer F 《Amino acids》2006,31(4):409-419

Summary. We have studied the enzymatic derivatization of amino acids by use of the polyphenol oxidase laccase. Derivatization of L-tryptophan was achieved by enzymatic crosslinking with the laccase substrate 2,5-dihydroxy-N-(2-hydroxyethyl)-benzamide. The main product (yield up to 70%) was identified as the quinoid compound 2-[2-(2-hydroxy-ethylcarbamoyl)-3,6-dioxo-cyclohexa-1,4-dienylamino]-3-(1H-indol-3-yl)-propionic acid, demonstrating that laccase-catalyzed C–N-coupling occurred on the amino group of the aliphatic side chain. These enzyme-based reactions provide a simple and fast method for the derivatization of unprotected amino acids.

16.

Excitatory amino acid stimulation of the survival of rat cerebellar granule cells in culture is associated with an increase in SMN, the spinal muscular atrophy disease gene product

Andreassi C Patrizi AL Brahe C Eboli ML 《Amino acids》2000,18(3):299-304

Summary. Excitatory amino acids that promote the survival of cerebellar granule cells in culture also promote the expression of the survival of motor neuron (SMN) protein. Immunolocalization studies using an SMN monoclonal antibody showed that SMN is decreased in cultures grown in low K⁺ or chemically defined medium compared with cultures grown in high K⁺ medium, and that an increase in SMN can be induced by treating low K⁺ cultures with glutamate or N-methyl-D-aspartate. Received March 31, 1999