Similar literature
1.
Naveed M  Khan A  Khan AU 《Amino acids》2012,42(5):1809-1823
G protein-coupled receptors (GPCRs) are transmembrane proteins, which transduce signals from extracellular ligands to intracellular G protein. Automatic classification of GPCRs can provide important information for the development of novel drugs in the pharmaceutical industry. In this paper, we propose an evolutionary approach, GPCR-MPredictor, which combines individual classifiers for predicting GPCRs. GPCR-MPredictor is a web predictor that can efficiently predict GPCRs at five levels. The first level determines whether a protein sequence is a GPCR or a non-GPCR. If the predicted sequence is a GPCR, then it is further classified into family, subfamily, sub-subfamily, and subtype levels. In this work, our aim is to analyze the discriminative power of different feature extraction and classification strategies for GPCR prediction and then to use an evolutionary ensemble approach for enhanced prediction performance. Features are extracted using amino acid composition, pseudo amino acid composition, and dipeptide composition of protein sequences. Different classification approaches, such as k-nearest neighbor (KNN), support vector machine (SVM), probabilistic neural networks (PNN), J48, Adaboost, and naive Bayes, have been used to classify GPCRs. The proposed hierarchical GA-based ensemble classifier exploits the prediction results of SVM, KNN, PNN, and J48 at each level. The GA-based ensemble yields an accuracy of 99.75, 92.45, 87.80, 83.57, and 96.17% at the five levels, on the first dataset. We further perform predictions on a dataset consisting of 8,000 GPCRs at the family, subfamily, and sub-subfamily level, and on two other datasets of 365 and 167 GPCRs at the second and fourth levels, respectively. In comparison with the existing methods, the results demonstrate the effectiveness of our proposed GPCR-MPredictor in classifying GPCR families. It is accessible at .
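Two of the feature types named in this abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. The function names and the choice to skip non-standard residues are assumptions made here for illustration; the abstract does not describe the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard amino acid (hypothetical helper)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * 20
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent residues."""
    residues = "".join(r for r in seq.upper() if r in AMINO_ACIDS)
    counts = {}
    for i in range(len(residues) - 1):
        pair = residues[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = max(len(residues) - 1, 0)
    if total == 0:
        return [0.0] * 400
    return [counts.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]
```

Both functions map a sequence of any length to a fixed-length vector (20 or 400 values), which is what allows classifiers such as KNN or SVM to be applied directly.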

2.
Family 3 G-protein-coupled receptors (GPCRs), which include metabotropic glutamate receptors (mGluRs), sweet and "umami" taste receptors (T1Rs), and the extracellular calcium-sensing receptor (CaR), represent a distinct group among the superfamily of GPCRs characterized by large amino-terminal extracellular ligand-binding domains (ECD) with homology to bacterial periplasmic amino acid-binding proteins that are responsible for signal detection and receptor activation through as yet unresolved mechanism(s) via the seven-transmembrane helical domain (7TMD) common to all GPCRs. To address the mechanism(s) by which ligand-induced conformational changes are conveyed from the ECD to the 7TMD for G-protein activation, we altered the length and composition of a 14-amino acid linker segment common to all family 3 GPCRs except GABA(B) receptor, in the CaR by insertion, deletion, and site-directed mutagenesis of specific highly conserved residues. Small alterations in the length and composition of the linker impaired cell surface expression and abrogated signaling of the chimeric receptors. The exchange of nine amino acids within the linker of CaR with the homologous sequence of mGluR1, however, preserved receptor function. Ala substitution for the four highly conserved residues within this amino acid sequence identified a Leu at position 606 of the CaR critical for cell surface expression and signaling. Substitution of Leu(606) for Ala resulted in impaired cell surface expression. However, Ile and Val substitutions displayed strong activating phenotypes. Disruption of the linker by insertion of nine amino acids of a random-coiled structure uncoupled the ECD from regulating the 7TMD. These data are consistent with a model of receptor activation in which the peptide linker, and particularly Leu(606), provides a critical interaction for the CaR signal transmission, a finding likely to be relevant for all family 3 GPCRs containing this conserved motif.

3.
G-protein-coupled receptors (GPCRs) constitute a remarkable protein family of receptors that are involved in a broad range of biological processes. A large number of clinically used drugs elicit their biological effect via a GPCR. Thus, developing a reliable computational method for predicting the functional roles of GPCRs would be very useful in the pharmaceutical industry. Nowadays, researchers are more interested in functional roles of GPCRs at the finest subtype level. However, with the accumulation of many new protein sequences, none of the existing methods can completely classify these GPCRs to their finest subtype level. In this paper, pioneering work was performed to address this problem by using a hierarchical classification method. The first level determines whether a query protein is a GPCR or a non-GPCR. If it is considered a GPCR, it is then classified down to its finest subtype level. GPCRs are characterized by 170 sequence-derived features encapsulating both amino acid composition and physicochemical features of proteins, and support vector machines are used as the classification engine. To test the performance of the present method, a non-redundant dataset was built, which is organized at seven levels and covers more functional classes of GPCRs than existing datasets. The number of protein sequences in each level is 5956, 2978, 8079, 8680, 6477, 1580 and 214, respectively. By 5-fold cross-validation test, overall accuracies of 99.56%, 93.96%, 82.81%, 85.93%, 94.1%, 95.38% and 92.06% were observed at the seven levels. When compared with some previous methods, the present method achieved a consistently higher overall accuracy. The results demonstrate the power and effectiveness of the proposed method to accomplish the classification of GPCRs to the finest subtype level.

4.
G protein-coupled receptors (GPCRs) are among the most frequent targets of therapeutic drugs. With the avalanche of newly generated protein sequences in the post genomic age, to expedite the process of drug discovery, it is highly desirable to develop an automated method to rapidly identify GPCRs and their types. A new predictor was developed by hybridizing two different modes of pseudo-amino acid composition (PseAAC): the functional domain PseAAC and the low-frequency Fourier spectrum PseAAC. The new predictor is called GPCR-2L, where "2L" means that it is a two-layer predictor: the 1st layer prediction engine is to identify a query protein as GPCR or not; if it is, the prediction will be automatically continued to further identify it as belonging to one of the following six types: (1) rhodopsin-like (Class A), (2) secretin-like (Class B), (3) metabotropic glutamate/pheromone (Class C), (4) fungal pheromone (Class D), (5) cAMP receptor (Class E), or (6) frizzled/smoothened family (Class F). The overall success rate of GPCR-2L in identifying proteins as GPCRs or non-GPCRs is over 97.2%, while identifying GPCRs among their six types is over 97.8%. Such high success rates were derived by the rigorous jackknife cross-validation on a stringent benchmark dataset, in which none of the included proteins had ≥40% pairwise sequence identity to any other protein in the same subset. As a user-friendly web-server, GPCR-2L is freely accessible to the public at http://icpr.jci.edu.cn/, by which one can obtain the 2-level results in about 20 s for a query protein sequence of 500 amino acids. The longer the sequence is, the more time it may usually need. The high success rates reported here indicate that it is quite an effective approach to identify GPCRs and their types with the functional domain information and the low-frequency Fourier spectrum analysis. It is anticipated that GPCR-2L may become a useful tool for both basic research and drug development in the areas related to GPCRs.
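The jackknife (leave-one-out) cross-validation mentioned in this abstract, and in several others on this page, can be sketched as follows. Each sample is held out in turn, the predictor is built from the remaining samples, and the held-out sample is then predicted. The 1-nearest-neighbor classifier below is only a stand-in chosen for brevity, not the GPCR-2L prediction engine, and all names are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Label of the training vector closest to `query` (squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def jackknife_accuracy(xs, ys):
    """Leave-one-out success rate: each sample is predicted from all others."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # every sample except the i-th
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

Because every sample is tested exactly once and never appears in its own training set, the result for a given dataset does not depend on a random split.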

5.
To evaluate the possibility of an unknown protein to be a resistant gene against Xanthomonas oryzae pv. oryzae, a different mode of pseudo amino acid composition (PseAAC) is proposed to formulate the protein samples by integrating the amino acid composition, as well as the chaos game representation (CGR) method. Some numerical comparisons of triangle, quadrangle and 12-vertex polygon CGR are carried out to evaluate the efficiency of using these fractal figures in classifiers. The numerical results show that among the three polygon methods, the triangle method gives a good fractal visualization and performs the best in the classifier construction. By using triangle + 12-vertex polygon CGR as the mathematical feature, the classifier achieves 98.13% in the jackknife test and MCC achieves 0.8462.

6.
Huang JH  Cao DS  Yan J  Xu QS  Hu QN  Liang YZ 《Biochimie》2012,94(8):1697-1704
As the most frequent drug target, G protein-coupled receptors (GPCRs) are a large family of seven trans-membrane receptors that sense molecules outside the cell and activate inside signal transduction pathways. The activity and lifetime of activated receptors are regulated by receptor phosphorylation. Therefore, investigating the exact positions of phosphorylation sites in GPCR sequences could provide useful clues for drug design and other biotechnology applications. Experimental identification of phosphorylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of phosphorylation sites from amino acid sequences. In this article, we present a simple and effective method to recognize phosphorylation sites of human GPCRs by combining amino acid hydrophobicity with a support vector machine. The prediction accuracy, sensitivity, specificity, Matthews correlation coefficient and area under the curve values for phosphoserine, phosphothreonine, and phosphotyrosine were 0.964, 0.790, 0.999, 0.866, 0.941; 0.954, 0.800, 0.985, 0.828, 0.958; and 0.976, 0.820, 0.993, 0.861, 0.959, respectively. The establishment of such a fast and accurate prediction method will speed up the pace of identifying proper GPCR sites to facilitate drug discovery.
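Four of the five measures reported in this abstract (accuracy, sensitivity, specificity, and the Matthews correlation coefficient) follow directly from the counts in a binary confusion matrix. The sketch below uses their standard definitions; the function name is hypothetical, and the area under the curve is omitted because it needs ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

The pattern in the reported values (specificity near 1 with sensitivity around 0.8) is typical when negative sites far outnumber positive ones; the Matthews correlation coefficient is reported alongside accuracy because it is less inflated by such imbalance.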

7.
We have developed an alignment-independent method for classification of G-protein coupled receptors (GPCRs) according to the principal chemical properties of their amino acid sequences. The method relies on a multivariate approach where the primary amino acid sequences are translated into vectors based on the principal physicochemical properties of the amino acids and transformation of the data into a uniform matrix by applying a modified autocross-covariance transform. The application of principal component analysis to a data set of 929 class A GPCRs showed a clear separation of the major classes of GPCRs. The application of partial least squares projection to latent structures created a highly valid model (cross-validated correlation coefficient, Q(2) = 0.895) that gave unambiguous classification of the GPCRs in the training set according to their ligand binding class. The model was further validated by external prediction of 535 novel GPCRs not included in the training set. Of the latter, only 14 sequences, confined to rapidly expanding GPCR classes, were mispredicted. Moreover, 90 orphan GPCRs out of 165 were tentatively assigned to a GPCR ligand binding class. The alignment-independent method could be used to assess the importance of the principal chemical properties of every single amino acid in the protein sequences for their contributions in explaining GPCR family membership. It was then revealed that all amino acids in the unaligned sequences contributed to the classifications, albeit to varying extents; the most important amino acids being those that could also be determined to be conserved by using traditional alignment-based methods.
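The way an auto-cross-covariance (ACC) transform turns sequences of different lengths into vectors of one fixed length can be sketched as follows. This is the plain textbook form, averaging products of descriptor values at a given lag; the abstract describes a "modified" transform whose details are not given here, so the normalization and the function name below are assumptions.

```python
def acc_transform(descriptors, max_lag):
    """Auto- and cross-covariance terms for lags 1..max_lag.

    `descriptors` holds one numeric property vector per residue, for example
    physicochemical scale values looked up for each amino acid.
    The output length is d * d * max_lag for d properties, regardless of
    how many residues the sequence has.
    """
    n = len(descriptors)
    d = len(descriptors[0])
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):        # property taken at position i
            for k in range(d):    # property taken at position i + lag
                total = sum(
                    descriptors[i][j] * descriptors[i + lag][k]
                    for i in range(n - lag)
                )
                features.append(total / (n - lag))
    return features
```

Terms with j equal to k are the auto-covariances of one property; terms with j different from k are the cross-covariances between two properties. No alignment is needed because only relative positions (lags) enter the calculation.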

8.
A new method has been developed to predict the enzymatic attribute of proteins by hybridizing the gene product composition and pseudo amino acid composition. As a demonstration, a working dataset was generated with a cutoff of 60% sequence identity to avoid redundancy and bias in statistical prediction. The dataset thus constructed contains 39989 protein sequences, of which 27469 are non-enzymes and 12520 enzymes that were further classified into 6 enzyme family classes according to their 6 main EC (Enzyme Commission) numbers (2314 are oxidoreductases, 3653 transferases, 3246 hydrolases, 1307 lyases, 676 isomerases, and 1324 ligases). The overall success rate by the jackknife test for the identification between enzyme and non-enzyme was 94%, and that for the identification among the 6 enzyme family classes was 98%. It is anticipated that, with the rapid increase of protein sequences entering into databanks, the current method will become a useful automated tool in identifying the enzymatic attribute of a newly found protein sequence.

9.
G protein-coupled receptors (GPCRs) are part of multi-protein networks called ‘receptosomes’. These GPCR interacting proteins (GIPs) in the receptosomes control the targeting, trafficking and signaling of GPCRs. PDZ domain proteins constitute the largest protein family among the GIPs, and the predominant function of the PDZ domain proteins is to assemble signaling pathway components into close proximity by recognition of the last four C-terminal amino acids of GPCRs. We present here a machine learning based approach for the identification of GPCR-binding PDZ domain proteins. In order to characterize the network of interactions between amino acid residues that contribute to the stability of the PDZ domain-ligand complex and to encode the complex into a feature vector, amino acid contact matrices and a physicochemical distance matrix were constructed and adopted. This novel machine learning based method displayed high performance for the identification of PDZ domain-ligand interactions and allowed the identification of novel GPCR-PDZ domain protein interactions.

10.
Gao Y  Shao S  Xiao X  Ding Y  Huang Y  Huang Z  Chou KC 《Amino acids》2005,28(4):373-376
Summary. With the avalanche of new protein sequences we are facing in the post-genomic era, it is vitally important to develop an automated method for quickly and accurately determining the subcellular location of uncharacterized proteins. In this article, based on the concept of pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), three pseudo amino acid components are introduced via the Lyapunov index, Bessel function, and Chebyshev filter that can be more efficiently used to deal with the chaos and complexity in protein sequences, leading to a higher success rate in predicting protein subcellular location.

11.
Involved in many diseases such as cancer, diabetes, neurodegenerative, inflammatory and respiratory disorders, G-protein-coupled receptors (GPCRs) are among the most frequent targets of therapeutic drugs. It is time-consuming and expensive to determine whether a drug and a GPCR are to interact with each other in a cellular network purely by means of experimental techniques. Although some computational methods were developed in this regard based on the knowledge of the 3D (dimensional) structure of proteins, unfortunately their usage is quite limited because the 3D structures for most GPCRs are still unknown. To overcome the situation, a sequence-based classifier, called “iGPCR-drug”, was developed to predict the interactions between GPCRs and drugs in cellular networking. In the predictor, the drug compound is formulated by a 2D (dimensional) fingerprint via a 256D vector, GPCR by the PseAAC (pseudo amino acid composition) generated with the grey model theory, and the prediction engine is operated by the fuzzy K-nearest neighbour algorithm. Moreover, a user-friendly web-server for iGPCR-drug was established at http://www.jci-bioinfo.cn/iGPCR-Drug/. For the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated math equations presented in this paper just for its integrity. The overall success rate achieved by iGPCR-drug via the jackknife test was 85.5%, which is remarkably higher than the rate by the existing peer method developed in 2010 although no web server was ever established for it. It is anticipated that iGPCR-Drug may become a useful high throughput tool for both basic research and drug development, and that the approach presented here can also be extended to study other drug – target interaction networks.

12.
The outer membrane proteins (OMPs) are β-barrel membrane proteins that perform many biological functions. Discriminating OMPs from non-OMPs is a very important task for understanding some biochemical processes. In this study, a method that combines the increment of diversity with a modified Mahalanobis discriminant, called IDQD, is presented to predict 208 OMPs, 206 transmembrane helical proteins (TMHPs) and 673 globular proteins (GPs) by using Chou's pseudo amino acid compositions as parameters. The overall accuracy of jackknife cross-validation is 93.2% and 96.1%, respectively, for three datasets (OMPs, TMHPs and GPs) and two datasets (OMPs and non-OMPs). These predicted results suggest that the method can be effectively applied to discriminate OMPs, TMHPs and GPs. They also indicate that the pseudo amino acid composition can better reflect the core feature of membrane proteins than the classical amino acid composition.

13.
It is a critical challenge to develop automated methods for quickly and accurately determining the structures of proteins because of the increasingly widening gap between the number of sequence-known proteins and that of structure-known proteins in the post-genomic age. The knowledge of protein structural class can provide useful information towards the determination of protein structure. Thus, it is highly desirable to develop computational methods for identifying the structural classes of newly found proteins based on their primary sequence. In this study, according to the concept of Chou's pseudo amino acid composition (PseAA), eight PseAA vectors are used to represent protein samples. Each of the PseAA vectors is a 40-D (dimensional) vector, which is constructed by the conventional amino acid composition (AA) and a series of sequence-order correlation factors as originally introduced by Chou. The difference among the eight PseAA representations is that different physicochemical properties are used to incorporate the sequence-order effects for the protein samples. Based on such a framework, a dual-layer fuzzy support vector machine (FSVM) network is proposed to predict protein structural classes. In the first layer of the FSVM network, eight FSVM classifiers trained by different PseAA vectors are established. The 2nd layer FSVM classifier is applied to reclassify the outputs of the first layer. The results thus obtained are quite promising, indicating that the new method may become a useful tool for predicting not only the structural classification of proteins but also their other attributes.
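A 40-D pseudo amino acid composition vector of the kind described in this abstract (20 composition terms plus 20 sequence-order correlation factors) can be sketched for a single property as follows. The structure follows the commonly cited form of Chou's definition, with the k-th correlation factor taken as the mean squared property difference between residues k positions apart. The property scale below is a placeholder with made-up, evenly spaced values, not real physicochemical data, and the weight of 0.05 is an assumed default.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder scale: evenly spaced toy values, NOT a real physicochemical index.
TOY_PROPERTY = {a: i / 19 for i, a in enumerate(AMINO_ACIDS)}


def pse_aac(seq, prop=TOY_PROPERTY, lam=20, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    residues = [r for r in seq.upper() if r in prop]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    # Conventional amino acid composition (20 terms).
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    # Sequence-order correlation factors theta_1 .. theta_lam.
    thetas = []
    for k in range(1, lam + 1):
        total = sum(
            (prop[residues[i]] - prop[residues[i + k]]) ** 2
            for i in range(n - k)
        )
        thetas.append(total / (n - k))
    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Substituting a different property scale for `prop` yields a different 40-D representation of the same sequence, which is how the eight PseAA vectors in the abstract differ from one another.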

14.
Being the largest family of cell surface receptors, G-protein-coupled receptors (GPCRs) are among the most frequent targets of therapeutic drugs. The functions of many GPCRs are unknown, and it is both time-consuming and expensive to determine their ligands and signaling pathways. This forces us to face a critical challenge: how to develop an automated method for classifying the family of GPCRs so as to help us in classifying drugs and expedite the process of drug discovery. Owing to their highly divergent nature, it is difficult to predict the classification of GPCRs by means of conventional sequence alignment approaches. To cope with such a situation, the CD (Covariant Discriminant) predictor was introduced to predict the families of GPCRs. The overall success rate thus obtained by jack-knife test for 1238 GPCRs classified into three main families, i.e., class A-"rhodopsin like", class B-"secretin like", and class C-"metabotrophic/glutamate/pheromone", was over 97%. The high success rate suggests that the CD predictor holds very high potential to become a useful tool for understanding the actions of drugs that target GPCRs and designing new medications with fewer side effects and greater efficacy.

15.
Translation is a key process for gene expression. Timely identification of the translation initiation site (TIS) is very important for conducting in-depth genome analysis. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying TIS. Although some computational methods were proposed in this regard, none of them considered the global or long-range sequence-order effects of DNA, and hence their prediction quality was limited. To account for such effects, a new predictor, called “iTIS-PseTNC,” was developed by incorporating the physicochemical properties into the pseudo trinucleotide composition, quite similar to the PseAAC (pseudo amino acid composition) approach widely used in computational proteomics. It was observed by the rigorous cross-validation test on the benchmark dataset that the overall success rate achieved by the new predictor in identifying TIS locations was over 97%. As a web server, iTIS-PseTNC is freely accessible at http://lin.uestc.edu.cn/server/iTIS-PseTNC. To maximize the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results without the need to go through detailed mathematical equations, which are presented in this paper just for the integrity of the new prediction method.

16.
Guo Y  Li M  Lu M  Wen Z  Huang Z 《Proteins》2006,65(1):55-60
Determining the coupling specificity of G-protein coupled receptors (GPCRs) is very important for further understanding the functions of receptors. A successful method in this area will benefit both basic research and drug discovery practice. Previously published methods rely on the transmembrane topology prediction at the training step, and even at the prediction step. However, the transmembrane topology predicted by even the best algorithm is not of high accuracy. In this study, we developed a new method, autocross-covariance (ACC) transform based support vector machine (SVM), to predict coupling specificity between GPCRs and G-proteins. The primary amino acid sequences are translated into vectors based on the principal physicochemical properties of the amino acids and the data are transformed into a uniform matrix by applying ACC transform. SVMs for nonpromiscuous coupled GPCRs and promiscuous coupled GPCRs were trained and validated by jackknife test and the results thus obtained are very promising. All classifiers were also evaluated by the test datasets with good performance. Besides the high prediction accuracy, the most important feature of this method is that it does not require any transmembrane topology prediction at either training or prediction step but only the primary sequences of proteins. The results indicate that this relatively simple method is applicable. Academic users can freely download the prediction program at http://www.scucic.net/group/database/Service.asp.

17.
18.
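Predicting lipase types based on pseudo amino acid composition with different scales   Total citations: 1, self-citations: 0, citations by others: 1
Predicting from its sequence whether a protein is a lipase, and which type of lipase it belongs to, has important theoretical and practical value. A pseudo amino acid composition method based on the Z-scale and T-scale is proposed to extract sequence features, and the k-nearest neighbor algorithm is used to answer these questions. After parameter selection, with each of the three methods run under its own optimal parameters, the 1-fold cross-validation results were as follows: prediction accuracies for lipase versus non-lipase were 92.8%, 91.4% and 91.3%, respectively; prediction accuracies for lipase type were 92.3%, 90.3% and 89.7%, respectively. The pseudo amino acid composition based on the Z-scale performed best, followed by that based on the T-scale, and both clearly outperformed six other common feature extraction methods; the possible reasons for this are discussed.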

19.
Shi JY  Zhang SW  Pan Q  Zhou GP 《Amino acids》2008,35(2):321-327
In the Post Genome Age, there is an urgent need to develop reliable and effective computational methods to predict the subcellular localization of the rapidly growing number of newly found proteins. Here, a novel method of pseudo amino acid (PseAA) composition, the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, amino acid composition of each segment is calculated in series. After that, each protein sequence can be represented by a feature vector. Finally, the feature vectors of all sequences thus obtained are further input into the multi-class support vector machines to predict the subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.
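The first three steps of the AACD procedure described in this abstract (split the sequence into equal segments, compute the amino acid composition of each, and concatenate the results) can be sketched as follows. How the authors handle sequence lengths that do not divide evenly is not stated here; the integer-boundary rule below is an assumption, as are the function name and the treatment of empty segments.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aacd(seq, n_segments):
    """Concatenate per-segment amino acid compositions (20 * n_segments values)."""
    seq = seq.upper()
    n = len(seq)
    # Segment boundaries; lengths differ by at most one residue (assumed rule).
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        segment = seq[start:end]
        length = len(segment)
        features.extend(
            segment.count(a) / length if length else 0.0 for a in AMINO_ACIDS
        )
    return features
```

Unlike the composition of the whole sequence, this vector distinguishes proteins whose residues are the same overall but concentrated in different regions, for example at the N-terminus versus the C-terminus.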

20.
Pattern recognition receptors (PRRs) play a key role in the innate immune response by recognizing pathogen associated molecular patterns derived from a diverse collection of microbial pathogens. PRRs form a superfamily of proteins related to host health and disease. Thus, prediction of PRR family might supply biologically significant information for functional annotation of PRRs and development of novel drugs. In this paper, a computational method is proposed for predicting the families of PRRs. The prediction was performed on the basis of amino acid composition and pseudo-amino acid composition (PseAAC) from primary sequences of proteins using support vector machines. A non-redundant dataset consisting of 332 PRRs in seven families was constructed for training and testing. It was demonstrated that different families of PRRs were quite closely correlated with amino acid composition as well as PseAAC. In the jackknife test, overall accuracies of amino acid composition-based and PseAAC-based classifiers reached 96.1% and 97.9%, respectively. The results indicate that families of PRRs are predictable with high accuracy. It is anticipated that this computational method might be a powerful tool for the automated assignment of families of PRRs.
