Similar literature
Found 20 similar documents (search time: 15 ms)
1.
A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to develop the SVM prediction models consist of statistically significant features selected from single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the difference in length distribution between GPCRs and non-GPCRs has been exploited to improve the prediction performance. Testing results with annotated human protein sequences demonstrate that this system performs well in both the prediction and the classification of human GPCRs.
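As a rough illustration of the composition features this abstract mentions, the sketch below computes single amino acid and dipeptide relative frequencies for one sequence. The function name `composition_features` and the choice to drop non-standard residues are assumptions made here, not details from the paper; tripeptide features, feature selection, and the SVM itself are omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(seq: str) -> list[float]:
    """Return 20 single-residue plus 400 dipeptide relative frequencies.

    Hypothetical helper: residues outside the 20 standard letters are dropped.
    """
    seq = "".join(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    n = len(seq)

    # Single amino acid composition: fraction of each residue type.
    mono = [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]

    # Dipeptide composition: fraction of each ordered residue pair.
    pair_counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(n - 1):
        pair_counts[seq[i:i + 2]] += 1
    n_pairs = max(n - 1, 1)
    di = [pair_counts[a + b] / n_pairs for a, b in product(AMINO_ACIDS, repeat=2)]

    return mono + di


features = composition_features("ACAC")
print(len(features))  # 420 values regardless of sequence length
```

Because the vector length is fixed at 420, sequences of different lengths can be fed to the same classifier.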

2.
We present a system for multi-class protein classification based on neural networks. The basic issue concerning the construction of neural network systems for protein classification is the sequence encoding scheme that must be used in order to feed the neural network. To deal with this problem, we propose a method that maps a protein sequence into a numerical feature space using the matching scores of the sequence against groups of conserved patterns (called motifs) found in protein families. We consider two alternative ways for identifying the motifs to be used for feature generation and provide a comparative evaluation of the two schemes. We also evaluate the impact of the incorporation of background features (2-grams) on the performance of the neural system. Experimental results on real datasets indicate that the proposed method is highly efficient and is superior to other well-known methods for protein classification.

3.
G-protein coupled receptors (GPCRs) are a protein family found only in eukaryotes. They interface the cell with the outside world and are involved in many physiological processes. Their role in drug development is evident; hence, the prediction of GPCRs is in high demand. Because 3D structures are unavailable for most GPCRs, statistical and machine learning based prediction of GPCRs is especially needed. In the proposed approach, GPCRs are classified at the family, subfamily, and sub-subfamily levels. We have extracted features using a hybrid combination of pseudo amino acid, fast Fourier transform, and split amino acid techniques. The overall feature vector is then reduced using principal component analysis. Most GPCRs are composed of two or more subunits. The arrangement and number of subunits forming a GPCR are referred to as its quaternary structure. The functions of GPCRs are closely related to their quaternary structure. Classification in the present research is performed using the grey incidence degree (GID) measure, which can efficiently analyze the numerical relation between various components of GPCRs. The GID measure based classification has shown remarkable improvement in predicting GPCRs.

4.
G-protein coupled receptors (GPCRs) are a membrane protein family that serves as an interface between the cell and the outside world. They are involved in various physiological processes and are the targets of more than 50% of marketed drugs. The function of GPCRs can be determined by conducting biological experiments. However, with the rapid increase of GPCR sequences entering databanks, it is very time consuming and expensive to determine their function using experimental techniques alone. Hence, the computational prediction of GPCRs is in high demand for both pharmaceutical and educational research. Feature extraction of GPCRs in the proposed research is performed using three techniques: pseudo amino acid composition, wavelet based multi-scale energy, and evolutionary information based feature extraction utilizing position specific scoring matrices. For classification, a majority voting based ensemble method is used, whose weights are optimized using a genetic algorithm. Four classifiers are used in the ensemble: Nearest Neighbor, Probabilistic Neural Network, Support Vector Machine, and Grey Incidence Degree. The performance of the proposed method is assessed using the jackknife test on a number of datasets. First, the individual performance of each classifier is assessed for each dataset using the jackknife test. After that, the performance for each dataset is improved by using weighted ensemble classification. The weights of the ensemble are optimized using various runs of the genetic algorithm. We have compared our method with various other methods. The performance of the proposed method indicates that it is useful for GPCR classification.
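The weighted majority voting step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the function `weighted_vote`, the labels, and the weight values are hypothetical, and the genetic algorithm that the authors use to optimize the weights is not shown.

```python
from collections import defaultdict


def weighted_vote(predictions: dict[str, str], weights: dict[str, float]) -> str:
    """Return the label whose supporting classifiers carry the most total weight."""
    totals: dict[str, float] = defaultdict(float)
    for clf_name, label in predictions.items():
        totals[label] += weights.get(clf_name, 0.0)
    # Iterating over sorted labels makes tie-breaking deterministic
    # (the alphabetically first label wins a tie).
    return max(sorted(totals), key=lambda label: totals[label])


# Hypothetical outputs of the four ensemble members for one sequence.
votes = {"NN": "classA", "PNN": "classB", "SVM": "classB", "GID": "classA"}
# Hypothetical weights, standing in for values a genetic algorithm might find.
weights = {"NN": 0.4, "PNN": 0.1, "SVM": 0.2, "GID": 0.35}

print(weighted_vote(votes, weights))  # classA (0.75 versus 0.30)
```

With equal weights this reduces to plain majority voting; unequal weights let a reliable classifier outvote two weaker ones.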

5.
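Numerical classification and ordination of tropical grasslands on Hainan Island   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper, several numerical classification and ordination methods are applied to the tropical grasslands of Yinggeling, Hainan Island. The methods include two polythetic hierarchical agglomerative classifications, the nearest-neighbor method (NN) and the farthest-neighbor method (FN), together with polar ordination (PO) and principal component analysis (PCA) ordination. The results show that the 19 sample plots fall into three major types and 9 communities, whose distribution pattern is closely related to slope, grazing intensity, and soil fertility. All four methods are applicable, to some degree, to the study of tropical grasslands.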
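The nearest-neighbor and farthest-neighbor methods named in this abstract differ only in how the distance between two clusters is defined. The sketch below shows that difference, assuming plots are represented as numeric vectors compared by Euclidean distance; the function name and the toy coordinates are hypothetical and are not taken from the study.

```python
import math

Point = tuple[float, ...]


def linkage_distance(cluster_a: list[Point], cluster_b: list[Point],
                     method: str = "nearest") -> float:
    """Distance between two clusters under nearest or farthest neighbor linkage."""
    dists = [math.dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "nearest":    # single linkage: closest pair of members
        return min(dists)
    if method == "farthest":   # complete linkage: most distant pair of members
        return max(dists)
    raise ValueError(f"unknown method: {method!r}")


# Toy clusters of sample plots described by two made-up attributes.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(linkage_distance(a, b, "nearest"))   # 2.0
print(linkage_distance(a, b, "farthest"))  # 5.0
```

An agglomerative procedure repeatedly merges the two clusters with the smallest such distance, so the choice of linkage changes which plots end up grouped together.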

6.
G-protein-coupled receptors (GPCRs) are the largest family of cell surface receptors that, via trimeric guanine nucleotide-binding proteins (G-proteins), initiate some signaling pathways in the eukaryotic cell. Many diseases involve malfunction of GPCRs, making their role evident in drug discovery. Thus, the automatic prediction of GPCRs can be very helpful in the pharmaceutical industry. However, prediction of GPCRs, their families, and their subfamilies is a challenging task. In this article, GPCRs are classified into families, subfamilies, and sub-subfamilies using pseudo-amino-acid composition and multiscale energy representation of different physicochemical properties of amino acids. The aim of the current research is to assess different feature extraction strategies and to develop a hybrid feature extraction strategy that can exploit the discrimination capability in both the spatial and transform domains for GPCR classification. Support vector machine, nearest neighbor, and probabilistic neural network are used for classification purposes. The overall performance of each classifier is computed individually for each feature extraction strategy. Using the jackknife test, the proposed GPCR–hybrid method is observed to provide the best results reported so far. The GPCR–hybrid web predictor, intended to help researchers working on GPCRs in the field of biochemistry and bioinformatics, is available at http://111.68.99.218/GPCR.

7.
8.
G protein-coupled receptors (GPCRs) are part of multi-protein networks called ‘receptosomes’. These GPCR interacting proteins (GIPs) in the receptosomes control the targeting, trafficking and signaling of GPCRs. PDZ domain proteins constitute the largest protein family among the GIPs, and the predominant function of the PDZ domain proteins is to assemble signaling pathway components into close proximity by recognition of the last four C-terminal amino acids of GPCRs. We present here a machine learning based approach for the identification of GPCR-binding PDZ domain proteins. In order to characterize the network of interactions between amino acid residues that contribute to the stability of the PDZ domain-ligand complex and to encode the complex into a feature vector, amino acid contact matrices and a physicochemical distance matrix were constructed and adopted. This novel machine learning based method displayed high performance for the identification of PDZ domain-ligand interactions and allowed the identification of novel GPCR-PDZ domain protein interactions.

9.
Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high dimensional input spaces for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, using dimensionality reduction techniques can be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by hashing the features into a low-dimensional space using a hash function, i.e., by mapping features into hash keys, where multiple features can be mapped (at random) to the same hash key, and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
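The hashing idea in this abstract can be sketched in a few lines: every k-gram is mapped to one of a fixed number of buckets, and counts of k-grams that collide in a bucket are added together. The function name, the use of SHA-256 as the hash function, and the bucket count are assumptions chosen for illustration; the paper's own hash function and settings are not given in the abstract.

```python
import hashlib


def hashed_kgram_counts(seq: str, k: int = 3, n_buckets: int = 16) -> list[int]:
    """Count k-grams of `seq` in a fixed-size vector via feature hashing."""
    counts = [0] * n_buckets
    for i in range(len(seq) - k + 1):
        kgram = seq[i:i + k]
        # A stable hash is used here; Python's built-in hash() for strings
        # changes between interpreter runs unless PYTHONHASHSEED is fixed.
        digest = hashlib.sha256(kgram.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        counts[bucket] += 1  # colliding k-grams are aggregated
    return counts


vec = hashed_kgram_counts("MKTAYIAKQR", k=3, n_buckets=16)
print(len(vec), sum(vec))  # 16 buckets holding 8 k-gram occurrences
```

A plain bag of 3-grams over 20 amino acids needs 8,000 dimensions, whereas the hashed vector keeps whatever size is chosen for `n_buckets`, at the cost of collisions.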

10.
11.
Agonist activation of a G protein-coupled receptor (GPCR) results in the redistribution of the receptor protein away from the cell surface into internal cellular compartments through a process of endocytosis known as internalization. Visualization of receptor internalization has become experimentally practicable by using fluorescent reagents such as green fluorescent protein (GFP). In this study, we examined whether the ligand-mediated internalization of a GPCR can be exploited for pharmacological evaluations. We acquired fluorescent images of cells expressing GFP-labeled GPCRs and evaluated the ligand-mediated internalization quantitatively by image processing. Using beta2-adrenoceptor and vasopressin V1a receptor as model GPCRs that couple to Gs and Gq, respectively, we first examined whether these GFP-tagged GPCRs exhibited appropriate pharmacology. The rank order of receptor internalization potency for a variety of agonists and antagonists specific to each receptor corresponded well with that previously observed in ligand binding studies. In addition to chemical ligand-induced internalization, this cell-based fluorescence imaging system successfully monitored the internalization of the proton-sensing GPCR TDAG8, and that of the free fatty acid-sensitive GPCR GPR120. The results show that monitoring receptor internalization can be a useful approach for pharmacological characterization of GPCRs and in fishing for ligands of orphan GPCRs.

12.
This paper mainly focuses on how to effectively and efficiently measure visual similarity for local feature based representation. Among existing methods, metrics based on Bag of Visual Words (BoV) techniques are efficient and conceptually simple, at the expense of effectiveness. By contrast, kernel based metrics are more effective, but at the cost of greater computational complexity and increased storage requirements. We show that a unified visual matching framework can be developed to encompass both BoV and kernel based metrics, in which the local kernel plays an important role between feature pairs or between features and their reconstruction. Generally, local kernels are defined using Euclidean distance or its derivatives, based either explicitly or implicitly on an assumption of Gaussian noise. However, local features such as SIFT and HoG often follow a heavy-tailed distribution, which tends to undermine the motivation behind Euclidean metrics. Motivated by recent advances in feature coding techniques, a novel efficient local coding based matching kernel (LCMK) method is proposed. This exploits the manifold structures in Hilbert space derived from local kernels. The proposed method combines advantages of both BoV and kernel based metrics, and achieves a linear computational complexity. This enables efficient and scalable visual matching to be performed on large scale image sets. To evaluate the effectiveness of the proposed LCMK method, we conduct extensive experiments with widely used benchmark datasets, including 15-Scenes, Caltech101/256, PASCAL VOC 2007 and 2011 datasets. Experimental results confirm the effectiveness of the relatively efficient LCMK method.

13.
Classifying G-protein coupled receptors with support vector machines   Cited by: 7 (self-citations: 0, citations by others: 7)
MOTIVATION: The enormous amount of protein sequence data uncovered by genome research has increased the demand for computer software that can automate the recognition of new proteins. We discuss the relative merits of various automated methods for recognizing G-Protein Coupled Receptors (GPCRs), a superfamily of cell membrane proteins. GPCRs are found in a wide range of organisms and are central to a cellular signalling network that regulates many basic physiological processes. They are the focus of a significant amount of current pharmaceutical research because they play a key role in many diseases. However, their tertiary structures remain largely unsolved. The methods described in this paper use only primary sequence information to make their predictions. We compare a simple nearest neighbor approach (BLAST), methods based on multiple alignments generated by a statistical profile Hidden Markov Model (HMM), and methods, including Support Vector Machines (SVMs), that transform protein sequences into fixed-length feature vectors. RESULTS: The last is the most computationally expensive method, but our experiments show that, for those interested in annotation-quality classification, the results are worth the effort. In two-fold cross-validation experiments testing recognition of GPCR subfamilies that bind a specific ligand (such as a histamine molecule), the errors per sequence at the Minimum Error Point (MEP) were 13.7% for multi-class SVMs, 17.1% for our SVMtree method of hierarchical multi-class SVM classification, 25.5% for BLAST, 30% for profile HMMs, and 49% for nearest-neighbor classification on feature vectors, Kernel Nearest Neighbor (kernNN). The percentage of true positives recognized before the first false positive was 65% for both SVM methods, 13% for BLAST, 5% for profile HMMs and 4% for kernNN.

14.
Naveed M, Khan A, Khan AU. Amino Acids 2012, 42(5): 1809-1823
G protein-coupled receptors (GPCRs) are transmembrane proteins that transduce signals from extracellular ligands to intracellular G proteins. Automatic classification of GPCRs can provide important information for the development of novel drugs in the pharmaceutical industry. In this paper, we propose an evolutionary approach, GPCR-MPredictor, which combines individual classifiers for predicting GPCRs. GPCR-MPredictor is a web predictor that can efficiently predict GPCRs at five levels. The first level determines whether a protein sequence is a GPCR or a non-GPCR. If the predicted sequence is a GPCR, then it is further classified into family, subfamily, sub-subfamily, and subtype levels. In this work, our aim is to analyze the discriminative power of different feature extraction and classification strategies for GPCR prediction and then to use an evolutionary ensemble approach for enhanced prediction performance. Features are extracted using amino acid composition, pseudo amino acid composition, and dipeptide composition of protein sequences. Different classification approaches, such as k-nearest neighbor (KNN), support vector machine (SVM), probabilistic neural networks (PNN), J48, AdaBoost, and Naive Bayes, have been used to classify GPCRs. The proposed hierarchical GA-based ensemble classifier exploits the prediction results of SVM, KNN, PNN, and J48 at each level. The GA-based ensemble yields an accuracy of 99.75, 92.45, 87.80, 83.57, and 96.17% at the five levels, on the first dataset. We further perform predictions on a dataset consisting of 8,000 GPCRs at the family, subfamily, and sub-subfamily level, and on two other datasets of 365 and 167 GPCRs at the second and fourth levels, respectively. In comparison with the existing methods, the results demonstrate the effectiveness of our proposed GPCR-MPredictor in classifying GPCR families. It is accessible at .

15.
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.

16.
Imai T, Fujita N. Proteins 2004, 56(4): 650-660
G-protein-coupled receptors (GPCRs) play a crucial role in signal transduction and receive a wide variety of ligands. GPCRs are a major target in drug design, as nearly 50% of all contemporary medicines act on GPCRs. GPCRs are membrane proteins possessing a common structural feature, seven transmembrane helices. In order to design an effective drug to act on a GPCR, knowledge of the three-dimensional (3D) structure of the target GPCR is indispensable. However, as GPCRs are membrane bound, their 3D structures are difficult to obtain. Thus we conducted statistical sequence analyses to find information about 3D structure and ligand binding using the receptors' primary sequences. We present statistical sequence analyses of 270 human GPCRs with regard to entropy (Shannon entropy in sequence alignment), hydrophobicity and volume, which are associated with the alpha-helical periodicity of the accessibility to the surrounding lipid. We found periodicity such that the phase changes once in the middle of each transmembrane region, both in the entropy plot and in the hydrophobicity plot. The phase shift in the entropy plot reflects the variety of ligands and the generality of the mechanism of signal transduction. The two periodic regions in the hydrophobicity plot indicate the regions facing the hydrophobic lipid chain and the polar phospholipid headgroup. We also found a simple periodicity in the plot of volume deviation, which suggests conservation of the stable structural packing among the transmembrane helices.
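The per-position Shannon entropy of a sequence alignment, which this abstract uses as one of its three measures, can be computed as sketched below. This is only an illustration: the function name and the toy alignment are made up, and treating a gap character as an ordinary symbol is an assumption that the authors may not have made.

```python
import math
from collections import Counter


def column_entropies(alignment: list[str]) -> list[float]:
    """Shannon entropy, in bits, of each column of an equal-length alignment."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*alignment):
        total = len(column)
        counts = Counter(column)
        # H = -sum(p * log2(p)) over the residue frequencies in this column.
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment: column 1 is fully conserved, column 2 is maximally varied.
print(column_entropies(["AC", "AD", "AE", "AF"]))
```

Low entropy marks conserved positions and high entropy marks variable ones, so plotting the values along a transmembrane helix is one way to look for the periodicity the abstract describes.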

17.
We have developed an alignment-independent method for classification of G-protein coupled receptors (GPCRs) according to the principal chemical properties of their amino acid sequences. The method relies on a multivariate approach where the primary amino acid sequences are translated into vectors based on the principal physicochemical properties of the amino acids and transformation of the data into a uniform matrix by applying a modified auto cross-covariance transform. The application of principal component analysis to a data set of 929 class A GPCRs showed a clear separation of the major classes of GPCRs. The application of partial least squares projection to latent structures created a highly valid model (cross-validated correlation coefficient, Q² = 0.895) that gave unambiguous classification of the GPCRs in the training set according to their ligand binding class. The model was further validated by external prediction of 535 novel GPCRs not included in the training set. Of the latter, only 14 sequences, confined to rapidly expanding GPCR classes, were mispredicted. Moreover, 90 orphan GPCRs out of 165 were tentatively assigned to a GPCR ligand binding class. The alignment-independent method could be used to assess the importance of the principal chemical properties of every single amino acid in the protein sequences for their contributions in explaining GPCR family membership. It was then revealed that all amino acids in the unaligned sequences contributed to the classifications, albeit to varying extent; the most important amino acids being those that could also be determined to be conserved by using traditional alignment-based methods.
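To show how an auto cross-covariance (ACC) transform turns sequences of different lengths into vectors of one fixed length, here is a minimal sketch of the plain, unmodified form: for each lag and each pair of descriptors, the products of descriptor values at positions separated by that lag are averaged. The abstract says the authors used a modified variant whose details are not given, so this is an assumption-laden illustration rather than their method; the function name and the toy descriptor values are hypothetical.

```python
def acc_transform(z: list[list[float]], max_lag: int) -> list[float]:
    """Plain auto cross-covariance of a descriptor matrix.

    z[i][j] is descriptor j of residue i (assumed already scaled).
    Returns max_lag * d * d values, independent of sequence length.
    """
    n = len(z)
    if n == 0 or max_lag >= n:
        raise ValueError("sequence must be longer than max_lag")
    d = len(z[0])
    out = []
    for lag in range(1, max_lag + 1):
        for j in range(d):          # descriptor at position i
            for k in range(d):      # descriptor at position i + lag
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                out.append(total / (n - lag))
    return out


# Three residues with two made-up descriptor values each.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(acc_transform(toy, max_lag=2))
# [0.0, 0.5, 0.5, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Terms with `j == k` are auto-covariances and terms with `j != k` are cross-covariances; because the output length depends only on `max_lag` and the number of descriptors, no alignment is needed before applying principal component analysis or a similar method.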

18.
Breakthroughs in G protein-coupled receptor structure determination based on crystallography have been mainly obtained from receptors occupied in their transmembrane domain core by low molecular weight ligands, and we have only recently begun to elucidate how the extracellular surface of G protein-coupled receptors (GPCRs) allows for the binding of larger peptide molecules. In the present study, we used a unique chemoselective photoaffinity labeling strategy, the methionine proximity assay, to directly identify under physiological conditions a total of 38 discrete ligand/receptor contact residues that form the extracellular peptide-binding site of an activated GPCR, the angiotensin II type 1 receptor. This experimental data set was used in homology modeling to guide the positioning of the angiotensin II (AngII) peptide within several GPCR crystal structure templates. We found that the CXC chemokine receptor type 4 accommodated the results better than the other templates evaluated; ligand/receptor contact residues were spatially grouped into defined interaction clusters with AngII. In the resulting receptor structure, a β-hairpin fold in extracellular loop 2 in conjunction with two extracellular disulfide bridges appeared to open and shape the entrance of the ligand-binding site. The bound AngII adopted a somewhat vertical binding mode, allowing concomitant contacts across the extracellular surface and deep within the transmembrane domain core of the receptor. We propose that such a dualistic nature of GPCR interaction could be well suited for diffusible linear peptide ligands and a common feature of other peptidergic class A GPCRs.

19.

Background:

We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks.

Results:

Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages.

Conclusion:

Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed.

20.
IRBM 2022, 43(6): 621-627
Objective: Steady-State Visual Evoked Potential based Brain-Computer Interface (SSVEP-based BCI) systems have been shown to be a promising technology due to their short response time and ease of use. SSVEP-based BCIs use brain responses to a flickering visual stimulus as an input command to an external application or device, and their performance can be influenced by stimulus properties, signal recording, and signal processing. We aim to investigate how system performance varies with the spatial proximity of the stimuli (a stimulus property). Material and methods: We performed a comparative analysis of two visual interface designs (named cross and square) for an SSVEP-based BCI. The power spectral density (PSD) was used for feature extraction and the Support Vector Machine (SVM) as the classification method. We also analyzed the effects of five flickering frequencies (6.67, 8.57, 10, 12, and 15 Hz) between and within interfaces. Results: We found higher accuracy rates for the flickering frequencies of 10, 12, and 15 Hz. The stimulus of 10 Hz presented the highest SSVEP amplitude response for both interfaces. The system presented the best performance (highest classification accuracy and information transfer rate) using the cross interface (lower visual angle). Conclusion: Our findings suggest that the system has the highest performance in the spatial proximity range from 4° to 13° (visual angle). In addition, we conclude that as the stimulus spatial proximity increases, the interference from other stimuli reduces, and the SSVEP amplitude response decreases, which reduces system accuracy. The inter-stimulus distance is a visual interface parameter that must be chosen carefully to increase the efficiency of an SSVEP-based BCI.
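The abstract reports information transfer rate (ITR) without stating how it was computed. A common definition in the BCI literature is the Wolpaw formula, B = log2(N) + P·log2(P) + (1 − P)·log2((1 − P)/(N − 1)) bits per selection, for N targets and accuracy P. The sketch below assumes that definition; the function names and the example numbers are hypothetical and are not results from the study.

```python
import math


def itr_bits_per_selection(n_targets: int, accuracy: float) -> float:
    """Wolpaw information transfer rate in bits per selection."""
    if n_targets < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_targets >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_targets)
    # The p * log2(p) terms are taken as 0 when p == 0.
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits


def itr_bits_per_minute(n_targets: int, accuracy: float,
                        seconds_per_selection: float) -> float:
    """Scale bits per selection by the number of selections per minute."""
    return itr_bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection


# Made-up example: 4 targets, perfect accuracy, one selection every 2 seconds.
print(itr_bits_per_minute(4, 1.0, 2.0))  # 60.0
```

At chance-level accuracy (P = 1/N) the formula gives zero bits, which is why ITR is often reported alongside raw classification accuracy.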
