18 similar articles found.
1.
Summary. Transmembrane (TM) proteins represent about 20–30% of the protein sequences in higher eukaryotes, playing important roles
across a range of cellular functions. Moreover, knowledge about topology of these proteins often provides crucial hints toward
their function. Because experimental structure determination of TM proteins is difficult, theoretical prediction methods
are strongly preferred for identifying the topology of newly found ones from their primary sequences, which is useful in both
basic research and drug discovery. In this paper, based on the concept of pseudo amino acid composition (PseAA) that can incorporate
sequence-order information of a protein sequence so as to remarkably enhance the power of discrete models (Chou, K. C., Proteins:
Structure, Function, and Genetics, 2001, 43: 246–255), cellular automata and Lempel-Ziv complexity are introduced to predict
the TM regions of integral membrane proteins including both α-helical and β-barrel membrane proteins, validated by jackknife
test. The result thus obtained is quite promising, which indicates that the current approach might be a promising
high-throughput tool in the post-genomic era. The source code and dataset are available to academic users on request from liml@scu.edu.cn.
Authors’ address: Menglong Li, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, P.R. China
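The abstract above names Lempel-Ziv complexity as one of its sequence descriptors but does not say which variant is used. As a minimal sketch, assuming the classic 1976 exhaustive-history parsing, the complexity of a sequence can be taken as the number of phrases produced when each new phrase is the shortest substring that cannot be copied from the text seen so far. The function name `lempel_ziv_complexity` is hypothetical and is not taken from the paper.

```python
def lempel_ziv_complexity(seq: str) -> int:
    """Count phrases in an LZ76-style exhaustive-history parsing of seq.

    Assumption: this is the textbook 1976 parsing, not necessarily the
    exact variant used in the paper summarized above.
    """
    n = len(seq)
    i = 0          # start of the phrase currently being built
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        # The copy source may overlap the phrase itself, so the search
        # window is everything up to (but excluding) the phrase's last symbol.
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


if __name__ == "__main__":
    # Works on any alphabet, including one-letter amino acid codes.
    print(lempel_ziv_complexity("MKTAYIAKQRQISFVKSHFSRQ"))
```

A low phrase count indicates a repetitive sequence, a high count an irregular one; the count could then be appended to the composition vector as one extra component.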
2.
Using cellular automata images and pseudo amino acid composition to predict protein subcellular location (total citations: 6; self-citations: 0; citations by others: 6)
Summary. The avalanche of newly found protein sequences in the post-genomic era has motivated and challenged us to develop an automated
method that can rapidly and accurately predict the localization of an uncharacterized protein in cells because the knowledge
thus obtained can greatly speed up the process in finding its biological functions. However, it is very difficult to establish
such a desired predictor by acquiring the key statistical information buried in a pile of extremely complicated and highly
variable sequences. In this paper, based on the concept of the pseudo amino acid composition (Chou, K. C. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246–255), the approach of cellular automata image is introduced to cope with this problem. Many important features,
which are originally hidden in the long amino acid sequences, can be clearly displayed through their cellular automata images.
One notable merit of doing so is that many image recognition tools can be applied directly to this prediction
task. High success rates were observed through the self-consistency, jackknife, and independent dataset tests, respectively.
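The idea of turning a sequence into a cellular automaton image can be sketched as follows. The abstract gives neither the residue encoding nor the evolution rule, so both are assumptions here: each residue is mapped to the 5-bit binary form of its index in an alphabetical list (a hypothetical code), and the row of bits is evolved with an elementary cellular automaton whose rule number is a free parameter (rule 90 is only a placeholder default).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_sequence(seq):
    """Hypothetical encoding: residue -> 5-bit binary of its alphabetical index."""
    bits = []
    for residue in seq:
        index = AMINO_ACIDS.index(residue)
        bits.extend(int(b) for b in format(index, "05b"))
    return bits


def ca_step(cells, rule):
    """Apply one step of an elementary cellular automaton (periodic boundary)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighbourhood = 4 * left + 2 * centre + right
        # Bit number `neighbourhood` of the rule gives the new cell state.
        nxt.append((rule >> neighbourhood) & 1)
    return nxt


def ca_image(seq, rule=90, steps=8):
    """Return the image as a list of rows: the encoded sequence plus `steps` evolutions."""
    rows = [encode_sequence(seq)]
    for _ in range(steps):
        rows.append(ca_step(rows[-1], rule))
    return rows


if __name__ == "__main__":
    for row in ca_image("MKTAYIAK", steps=6):
        print("".join("#" if c else "." for c in row))
```

The resulting 0/1 matrix can be handed to ordinary image-analysis routines, which is the merit the abstract points to.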
3.
Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition (total citations: 2; self-citations: 0; citations by others: 2)
The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For an in-depth understanding of the biochemical processes of the nucleus, knowledge of the localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desirable to develop an automated method for rapidly annotating the subnuclear locations of the numerous newly found nuclear protein sequences, so that they can be used in a timely manner for basic research and drug discovery. In view of this, a novel approach is developed for predicting protein subnuclear location. It features a powerful classifier, the optimized evidence-theoretic K-nearest classifier, and uses the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and the jackknife cross-validation test are significantly higher than those of existing classifiers on the same working dataset. It is anticipated that this approach may also become a useful high-throughput vehicle for bridging the huge gap, arising in the post-genomic era, between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.
4.
Summary. The interaction of non-covalently bound monomeric protein subunits forms oligomers. The oligomeric proteins are superior to
the monomers within the scope of functional evolution of biomacromolecules. Such complexes are involved in various biological
processes, and play an important role. It is highly desirable to predict oligomer types automatically from their sequence.
Here, based on the concept of pseudo amino acid composition, an improved feature extraction method of weighted auto-correlation
function of amino acid residue index and Naive Bayes multi-feature fusion algorithm is proposed and applied to predict protein
homo-oligomer types. We used the support vector machine (SVM) as base classifiers, in order to obtain better results. For
example, the total accuracies of A, B, C, D and E sets based on this improved feature extraction method are 77.63, 77.16,
76.46, 76.70 and 75.06% respectively in the jackknife test, which are 6.39, 5.92, 5.22, 5.46 and 3.82% higher than that of
G set based on the conventional amino acid composition method with the same SVM. Compared with Chou’s feature extraction method
of incorporating quasi-sequence-order effect, our method can increase the total accuracy by 3.51 to 1.01%. The
total accuracy improves from 79.66 to 80.83% by using the Naive Bayes Feature Fusion algorithm. These results show: 1) The
improved feature extraction method is effective and feasible, and the feature vectors based on this method may contain more
protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity
of residues in the surface patches that are buried in the interfaces of associated subunits; 2) The Naive Bayes Feature Fusion algorithm
and SVM can be regarded as a powerful computational tool for predicting protein homo-oligomer types.
5.
Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition (total citations: 3; self-citations: 0; citations by others: 3)
Apoptosis proteins are very important for understanding the mechanism of programmed cell death. The localization of an apoptosis protein can provide valuable information about its molecular function, but predicting that localization is a challenging task. In our previous work we proposed an increment of diversity (ID) method using protein sequence information for this prediction task. In this work, based on the concept of Chou's pseudo-amino acid composition [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. (Erratum: Chou, K.C., 2001, vol. 44, 60) 43, 246-255, Chou, K.C., 2005. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19], a different pseudo-amino acid composition that uses hydropathy distribution information is introduced. A novel ID_SVM algorithm, which combines ID with a support vector machine (SVM), is proposed. This method is applied to three data sets (317 apoptosis proteins, 225 apoptosis proteins and 98 apoptosis proteins). In jackknife tests it achieves higher predictive success rates than the previous algorithms.
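The abstract refers to an increment of diversity (ID) measure without defining it. The sketch below assumes the standard definition from the diversity-measure literature: for a vector of counts with total N, the diversity is D = N log N minus the sum of n_i log n_i, and the increment between two sources is ID(X, Y) = D(X + Y) - D(X) - D(Y). Function names and the use of raw amino acid counts as features are illustrative assumptions, not details from the paper.

```python
import math


def diversity(counts):
    """D(X) = N*log(N) - sum(n_i*log(n_i)); zero counts contribute nothing."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


if __name__ == "__main__":
    query = [3, 1, 0, 2]        # hypothetical feature counts of a query protein
    class_a = [30, 12, 1, 18]   # hypothetical pooled counts of one location class
    class_b = [2, 25, 30, 4]    # hypothetical pooled counts of another class
    # A smaller increment means the query resembles that class more closely.
    print(increment_of_diversity(query, class_a))
    print(increment_of_diversity(query, class_b))
```

In an ID-based predictor the query would be assigned to the class with the smallest increment, or the increments would be passed on as features to another classifier such as an SVM.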
6.
Summary. With the avalanche of new protein sequences we are facing in the post-genomic era, it is vitally important to develop an automated method for quickly and accurately determining the subcellular location of uncharacterized proteins. In this article, based on the concept of pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), three pseudo amino acid components are introduced via the Lyapunov index, Bessel function, and Chebyshev filter that can be used more efficiently to deal with the chaos and complexity in protein sequences, leading to a higher success rate in predicting protein subcellular location.
7.
Conotoxins are small, disulfide-rich peptides that target ion channels and G protein coupled receptors. They show promise for treating some forms of chronic pain, epilepsy, cardiovascular diseases, and so on. Conotoxins may be classified into 11 superfamilies (A, D, I1, I2, J, L, M, O, P, S, and T) according to their disulfide connectivity, highly conserved N-terminal precursor sequence and similar mode of action. Successfully predicting the superfamily of a mature conotoxin peptide is important for understanding the biological and pharmacological functions of the toxins. In this study, a new algorithm that combines increment of diversity with a modified Mahalanobis discriminant is presented to predict five superfamilies by using the pseudo amino acid composition. The results of the jackknife cross-validation test show that the overall prediction sensitivity and specificity are 88% and 91%, respectively. The predictive algorithm is also used to predict three O-conotoxin families, yielding 72% sensitivity and 78% specificity. These results indicate that the conotoxin superfamily peptides correlate with their amino acid compositions.
8.
Lin H. Journal of Theoretical Biology, 2008, 252(2): 350-356
The outer membrane proteins (OMPs) are β-barrel membrane proteins that perform many biological functions. Discriminating OMPs from non-OMPs is a very important task for understanding some biochemical processes. In this study, a method that combines increment of diversity with a modified Mahalanobis discriminant, called IDQD, is presented to predict 208 OMPs, 206 transmembrane helical proteins (TMHPs) and 673 globular proteins (GPs) by using Chou's pseudo amino acid compositions as parameters. The overall accuracy of jackknife cross-validation is 93.2% and 96.1%, respectively, for three datasets (OMPs, TMHPs and GPs) and two datasets (OMPs and non-OMPs). These results suggest that the method can be effectively applied to discriminate OMPs, TMHPs and GPs. They also indicate that the pseudo amino acid composition reflects the core features of membrane proteins better than the classical amino acid composition.
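Several of the abstracts in this list report accuracy from a jackknife test, which is leave-one-out cross-validation: each sample in turn is removed, the predictor is built from the rest, and the removed sample is predicted. The sketch below illustrates only that protocol. It swaps in a plain nearest-neighbour rule on Euclidean distance as a stand-in classifier, which is an assumption and not the IDQD method described above; all names are hypothetical.

```python
import math


def nearest_label(query, samples, labels):
    """Return the label of the training sample closest to `query`."""
    best = min(range(len(samples)), key=lambda j: math.dist(query, samples[j]))
    return labels[best]


def jackknife_accuracy(samples, labels):
    """Leave each sample out once, predict it from the rest, report the hit rate."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if nearest_label(samples[i], train, train_labels) == labels[i]:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    # Toy feature vectors standing in for composition-based descriptors.
    samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    labels = ["OMP", "OMP", "GP", "GP"]
    print(jackknife_accuracy(samples, labels))
```

Because every sample is predicted by a model that never saw it, the jackknife gives a single, reproducible figure for a given dataset, unlike a random train/test split.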
9.
10.
11.
Ruifeng Xu, Jiyun Zhou, Yulan He, Quan Zou, Xiaolong Wang. Journal of Biomolecular Structure & Dynamics, 2013, 31(8): 1720-1730
DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Owing to the fact that all biological species have developed beginning from a very limited number of ancestral species, it is important to take into account the evolutionary information in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating the evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. It was observed by comparing the new predictor with the existing methods via both jackknife test and independent data-set test that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach to extract evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
12.
In the post-genome age, there is an urgent need to develop reliable and effective computational methods to predict the
subcellular localization of the rapidly growing number of newly found proteins. Here, a novel form of pseudo amino acid (PseAA) composition,
the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into
multiple segments. Then, amino acid composition of each segment is calculated in series. After that, each protein sequence
can be represented by a feature vector. Finally, the feature vectors of all sequences thus obtained are further input into
the multi-class support vector machines to predict the subcellular localization. The results show that AACD is quite effective
in representing protein sequences for the purpose of predicting protein subcellular localization.
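The AACD steps described above (split the sequence equally, compute the amino acid composition of each segment in order, concatenate) can be sketched as follows. The number of segments, the handling of lengths that do not divide evenly, and the function name `aacd_features` are assumptions, since the abstract does not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aacd_features(sequence, n_segments=3):
    """Concatenate the 20-dimensional composition of each segment.

    Assumptions: `n_segments` is a free parameter, and segment boundaries
    are placed by integer division so that lengths differ by at most one.
    """
    seq = sequence.upper()
    length = len(seq)
    features = []
    for k in range(n_segments):
        start = k * length // n_segments
        end = (k + 1) * length // n_segments
        segment = seq[start:end]
        # Count only the 20 standard residues; other symbols are ignored.
        total = sum(segment.count(a) for a in AMINO_ACIDS)
        for a in AMINO_ACIDS:
            features.append(segment.count(a) / total if total else 0.0)
    return features


if __name__ == "__main__":
    vector = aacd_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", n_segments=3)
    print(len(vector))  # 20 values per segment
```

The resulting vector has 20 × `n_segments` components and could be passed to any multi-class classifier; the abstract uses support vector machines for that step.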
13.
Using string kernel to predict signal peptide cleavage site based on subsite coupling model (total citations: 2; self-citations: 0; citations by others: 2)
Summary. Owing to the importance of signal peptides for studying the molecular mechanisms of genetic diseases, reprogramming cells
for gene therapy, and finding new drugs for healing a specific defect, it is in great demand to develop a fast and accurate
method to identify the signal peptides. Introduction of the so-called {−3,−1, +1} coupling model (Chou, K. C.: Protein Engineering, 2001, 14–2, 75–79) has made it possible to take into account the coupling effect among some key subsites and hence can significantly
enhance the prediction quality of the peptide cleavage site. Based on the subsite coupling model, a class of string kernels for
protein sequences is introduced. The constructed string kernels integrate biologically relevant prior knowledge and can
be used by any kernel-based method. A support vector machine (SVM) is then built to predict the cleavage site of signal
peptides from the protein sequences. The current approach is compared with the classical weight matrix method. At small false
positive ratios, our method outperforms the classical weight matrix method, indicating that the current approach may at least serve
as a powerful complementary tool to other existing methods for predicting the signal peptide cleavage site.
The software that generated the results reported in this paper is available upon request, and will appear at http://www.pami.sjtu.edu.cn/wm.
An erratum to this article is available at .
14.
Prediction of protein subcellular location is a meaningful task that has attracted much attention in recent years. Many protein subcellular location predictors have been developed that can only deal with single-location proteins. However, some proteins may belong to two or even more subcellular locations. It is important to develop predictors that can deal with such multiplex proteins, because these proteins have extremely useful implications for both basic biological research and drug discovery. Given that the number of methods dealing with multiplex proteins is limited, it is worthwhile to explore new methods that can predict the subcellular location of proteins with both single and multiple sites. Applying different feature extraction methods and different prediction models to different benchmark datasets may reveal some general results. In this paper, two different feature extraction methods and two different neural network models were applied to three benchmark datasets of different kinds of proteins, i.e. datasets constructed specially for Gram-positive bacterial proteins, plant proteins and virus proteins. These benchmark datasets have different numbers of location sites. The results show that the RBF neural network is clearly superior to the BP neural network on these datasets, regardless of which type of feature extraction is chosen.
15.
Summary. Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, it is vitally important to develop an automated method as a high-throughput tool to identify their subcellular location in a timely manner. Based on the concept of the pseudo amino acid composition by which a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), the complexity measure approach is introduced. The advantage of incorporating the complexity measure factor as one of the pseudo amino acid components for a protein is that it can more effectively reflect its overall sequence-order feature than the conventional correlation factors. With such a formulation frame to represent the samples of protein sequences, the covariant-discriminant predictor (Chou, K. C. and Elrod, D. W., Protein Engineering, 1999, 12: 107–118) was adopted to conduct prediction. High success rates were obtained by both the jackknife cross-validation test and independent dataset test, suggesting that introduction of the concept of the complexity measure into prediction of protein subcellular location is quite promising, and might also hold a great potential as a useful vehicle for the other areas of molecular biology.
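To make concrete how sequence-order effects can be folded into a set of discrete numbers, here is a minimal sketch of a pseudo amino acid composition built from the conventional correlation factors that the abstract contrasts with its complexity measure. It is a simplification under stated assumptions: the coupling between two residues is the squared difference of a single numeric property supplied by the caller (the dictionary `prop`, with placeholder values below), and the values of `lam` and `weight` are placeholders rather than settings taken from any of the cited papers.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseudo_aa_composition(seq, prop, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional vector: composition plus correlation factors.

    theta_k is the mean squared difference of `prop` between residues k apart.
    All components share one denominator, so the vector sums to 1.
    """
    length = len(seq)
    if lam >= length:
        raise ValueError("lam must be smaller than the sequence length")
    freqs = [seq.count(a) / length for a in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        total = sum((prop[seq[i]] - prop[seq[i + k]]) ** 2 for i in range(length - k))
        thetas.append(total / (length - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    # Placeholder property values (NOT a real physicochemical scale).
    prop = {a: i / 19 for i, a in enumerate(AMINO_ACIDS)}
    vector = pseudo_aa_composition("MKTAYIAKQRQISFVKSHFSRQ", prop, lam=3)
    print(len(vector), round(sum(vector), 6))
```

Under this framing, a complexity measure such as the one the abstract introduces would be one more extra component of the vector, used instead of or alongside the correlation factors.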
16.
A novel approach was developed for predicting the structural classes of proteins based on their sequences. It was assumed that proteins belonging to the same structural class must bear some sort of similar texture on the images generated by the cellular automaton evolving rule [Wolfram, S., 1984. Cellular automata as models of complexity. Nature 311, 419-424]. Based on this, two geometric invariant moment factors derived from the image functions were used as the pseudo amino acid components [Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct., Funct., Genet. (Erratum: ibid., 2001, vol. 44, 60) 43, 246-255] to formulate the protein samples for statistical prediction. The success rates thus obtained on a previously constructed benchmark dataset are quite promising, implying that the cellular automaton image can help to reveal some inherent and subtle features deeply hidden in a pile of long and complicated amino acid sequences.
17.
Summary. We have studied the enzymatic derivatization of amino acids by use of the polyphenol oxidase laccase. Derivatization of L-tryptophan
was achieved by enzymatic crosslinking with the laccase substrate 2,5-dihydroxy-N-(2-hydroxyethyl)-benzamide. The main product
(yield up to 70%) was identified as the quinoid compound 2-[2-(2-hydroxy-ethylcarbamoyl)-3,6-dioxo-cyclohexa-1,4-dienylamino]-3-(1H-indol-3-yl)-
propionic acid and demonstrates that laccase-catalyzed C–N-coupling occurred on the amino group of the aliphatic side chain.
These enzyme-based reactions provide a simple and fast method for the derivatization of unprotected amino acids.
18.
Summary. Excitatory amino acids, which promote the survival of cerebellar granule cells in culture, also promote the expression of
the survival of motor neuron (SMN) protein. Immunolocalization studies using SMN monoclonal antibody showed that SMN is decreased
in cultures grown in low K+ or chemically defined medium with respect to cultures grown in high K+ medium and that an increase of SMN can be induced by treatment of low K+ cultures with glutamate or N-methyl-D-aspartate.
Received March 31, 1999