Similar articles
 20 similar articles found (search time: 15 ms)
1.
Classifying G-protein coupled receptors with support vector machines   Cited by: 7 (self-citations: 0; citations by others: 7)
MOTIVATION: The enormous amount of protein sequence data uncovered by genome research has increased the demand for computer software that can automate the recognition of new proteins. We discuss the relative merits of various automated methods for recognizing G-Protein Coupled Receptors (GPCRs), a superfamily of cell membrane proteins. GPCRs are found in a wide range of organisms and are central to a cellular signalling network that regulates many basic physiological processes. They are the focus of a significant amount of current pharmaceutical research because they play a key role in many diseases. However, their tertiary structures remain largely unsolved. The methods described in this paper use only primary sequence information to make their predictions. We compare a simple nearest neighbor approach (BLAST), methods based on multiple alignments generated by a statistical profile Hidden Markov Model (HMM), and methods, including Support Vector Machines (SVMs), that transform protein sequences into fixed-length feature vectors. RESULTS: The last is the most computationally expensive method, but our experiments show that, for those interested in annotation-quality classification, the results are worth the effort. In two-fold cross-validation experiments testing recognition of GPCR subfamilies that bind a specific ligand (such as a histamine molecule), the errors per sequence at the Minimum Error Point (MEP) were 13.7% for multi-class SVMs, 17.1% for our SVMtree method of hierarchical multi-class SVM classification, 25.5% for BLAST, 30% for profile HMMs, and 49% for nearest-neighbor classification on feature vectors with Kernel Nearest Neighbor (kernNN). The percentage of true positives recognized before the first false positive was 65% for both SVM methods, 13% for BLAST, 5% for profile HMMs and 4% for kernNN.
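The two evaluation measures quoted in this abstract can be illustrated for a single binary classifier. The sketch below is not the authors' evaluation code: it assumes the Minimum Error Point is the score threshold that minimizes false positives plus false negatives, and the function names `tp_before_first_fp` and `min_error_rate` are hypothetical.

```python
def tp_before_first_fp(scores, labels):
    """Fraction of all positives ranked above the best-scoring negative."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    if total_pos == 0:
        return 0.0
    hits = 0
    for _, is_pos in ranked:
        if not is_pos:
            break  # first false positive reached
        hits += 1
    return hits / total_pos


def min_error_rate(scores, labels):
    """Errors per sequence at the threshold minimizing FP + FN (assumed MEP)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    if not ranked:
        return 0.0
    # Threshold above every score: nothing is called positive, so every
    # true positive is a false negative.
    errors = sum(1 for _, is_pos in ranked if is_pos)
    best = errors
    for _, is_pos in ranked:
        # Lowering the threshold past this item calls it positive:
        # one fewer FN if it is a positive, one more FP otherwise.
        errors += -1 if is_pos else 1
        best = min(best, errors)
    return best / len(ranked)
```

Tied scores are split in sort order here, which is a simplification; a careful evaluation would move the threshold past all tied items at once.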

2.
A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to develop the SVM prediction models consist of statistically significant features selected from the single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the difference in length distribution between GPCRs and non-GPCRs has been exploited to improve prediction performance. Tests on annotated human protein sequences demonstrate that the system performs well in both the prediction and the classification of human GPCRs.
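The kind of feature vector described here can be sketched as follows. This is a minimal illustration, not the published system: it computes only amino acid and dipeptide compositions plus sequence length, it leaves out tripeptides and the statistical feature selection step, and the names `composition` and `feature_vector` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq, k):
    """Frequencies of all 20**k k-peptides, in a fixed alphabetical order."""
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(keys, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-standard residues
            counts[kmer] += 1
            windows += 1
    return [counts[key] / windows if windows else 0.0 for key in keys]


def feature_vector(seq):
    """20 amino acid + 400 dipeptide frequencies + sequence length."""
    seq = seq.upper()
    return composition(seq, 1) + composition(seq, 2) + [float(len(seq))]
```

Every sequence maps to the same 421 dimensions regardless of its length, which is what lets an SVM consume the vectors directly.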

3.
Although the sequence information on G-protein coupled receptors (GPCRs) continues to grow, many GPCRs remain orphaned (i.e. ligand specificity unknown) or poorly characterized with little structural information available, so an automated and reliable method is badly needed to facilitate the identification of novel receptors. In this study, a fast Fourier transform-based support vector machine method has been developed for predicting GPCR subfamilies from protein hydrophobicity. In classifying Class B, C, D and F subfamilies, the method achieved an overall Matthews correlation coefficient of 0.95 and an accuracy of 93.3% when evaluated using the jackknife test. The method achieved an accuracy of 100% on the Class B independent dataset. The results show that this method can classify GPCR subfamilies as well as their functional classification with high accuracy. A web server implementing the prediction is available at http://chem.scu.edu.cn/blast/Pred-GPCR.
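The idea of turning a hydrophobicity profile into spectral features can be sketched as below. Several things here are assumptions rather than details from the abstract: the Kyte-Doolittle scale stands in for whichever hydrophobicity scale the authors used, and a direct discrete Fourier transform replaces the FFT (it yields the same coefficients, only more slowly). The names `hydrophobicity_profile` and `power_spectrum` are hypothetical.

```python
import cmath

# Kyte-Doolittle hydropathy index (assumed scale, for illustration only).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq):
    """Map a protein sequence to a numeric signal, skipping unknown residues."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]


def power_spectrum(signal):
    """Power at frequencies 0..n/2 of the mean-centered signal (direct DFT)."""
    n = len(signal)
    if n == 0:
        return []
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum
```

The spectrum length depends on the sequence length, so a further step (for example resampling or binning) would be needed to obtain fixed-length SVM inputs; the abstract does not say how this was done.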

4.
Yao Y  Zhang T  Xiong Y  Li L  Huo J  Wei DQ 《Biotechnology journal》2011,6(11):1367-1376
The support vector machine (SVM), an effective statistical learning method, has been widely used in mutation prediction. Two factors, feature selection and parameter setting, strongly influence the efficiency and accuracy of SVM classification. In this study, following the principles of the genetic algorithm (GA) and the SVM, we developed a GA-SVM program and applied it to human cytochrome P450s (CYP450s), which are important monooxygenases in phase I drug metabolism. The program optimizes features and parameters simultaneously, so that fewer features are used and the overall prediction accuracy is improved. We focus on non-synonymous single nucleotide polymorphisms (nsSNPs) whose mutations in protein sequences appear to exert a significant influence on drug metabolism. The final predictive model performs satisfactorily, with a prediction accuracy of 61% and a cross-validation accuracy of 73%. The results indicate that the GA-SVM program is a powerful tool for optimizing mutation predictive models of nsSNPs of human CYP450s.
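One common way to optimize features and SVM parameters simultaneously is to pack both into a single GA chromosome. The sketch below shows only that encoding and a bit-flip mutation; it is an assumption about how such a program might be organized, not the authors' implementation. The bit widths, the log2 search ranges, and the names `decode` and `mutate` are all hypothetical, and the fitness function (typically cross-validated SVM accuracy) is left out.

```python
import random


def decode(chromosome, n_features, bits_per_param=4,
           log2_c_range=(-5, 15), log2_gamma_range=(-15, 3)):
    """Split a bit list into (selected feature indices, C, gamma)."""
    selected = [i for i, bit in enumerate(chromosome[:n_features]) if bit]

    def to_value(bits, lo, hi):
        # Read the bits as an integer level, then map it onto a log2 grid.
        level = int("".join(str(b) for b in bits), 2)
        frac = level / (2 ** len(bits) - 1)
        return 2.0 ** (lo + frac * (hi - lo))

    c_bits = chromosome[n_features:n_features + bits_per_param]
    g_bits = chromosome[n_features + bits_per_param:
                        n_features + 2 * bits_per_param]
    return (selected,
            to_value(c_bits, *log2_c_range),
            to_value(g_bits, *log2_gamma_range))


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]
```

Because the feature mask and the parameters share one chromosome, a single fitness evaluation scores a feature subset together with the parameters it was trained with, which is what "simultaneous" optimization means in this setting.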

5.
G-protein coupled receptors (GPCRs) are involved in various physiological processes, so classification of amine-type GPCRs is important for a proper understanding of their functions. Although some effective methods have been developed, it remains unknown how many and which features are essential for this task. Empirical studies show that feature selection might address this problem and provide some biologically useful knowledge. In this paper, a feature selection technique is introduced to identify the relevant protein features that are potentially important for the prediction of amine-type GPCRs. The selected features are then used to characterize proteins in a more compact form. High prediction accuracy is observed on two data sets with different sequence similarity in a 5-fold cross-validation test. A comparison with a previous method demonstrates the efficiency and effectiveness of the proposed method.

6.
7.
Li ZC  Zhou XB  Lin YR  Zou XY 《Amino acids》2008,35(3):581-590
Structural class characterizes the overall folding type of a protein or its domain. Most existing methods for determining the structural class of a protein are based on a group of features that possesses only one kind of discriminative information for the prediction of protein structural class. Other types of discriminative information associated with the primary sequence have been missed entirely, which has undoubtedly reduced the success rate of prediction. We present a novel method for the prediction of protein structural class that couples an improved genetic algorithm (GA) with the support vector machine (SVM). The improved GA was applied to the selection of an optimized feature subset and to the optimization of the SVM parameters. Jackknife tests on the working datasets indicated that the prediction accuracies for the different classes were in the range of 97.8–100%, with an overall accuracy of 99.5%. The results indicate that the approach has a high potential to become a useful tool in bioinformatics.

8.
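Studies have shown that many neurodegenerative diseases are related to the localization of proteins in the Golgi apparatus; correctly identifying sub-Golgi proteins is therefore of some help in developing drugs for the related diseases. In this paper, a dataset of two classes of sub-Golgi proteins was constructed; feature information including amino acid composition, conjoint triad information, averaged chemical shifts, and Gene Ontology annotations was extracted; and the support vector machine algorithm was used for prediction. The overall prediction success rate under 5-fold cross-validation was 87.43%.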

9.
Bacteriorhodopsin is a light-driven hydrogen-ion pump whose structure is known to about 6.0 Å in three dimensions and 2.8 Å in projection. It consists of seven transmembrane helices surrounding the chromophore, retinal. Halorhodopsin is a second member of the same family of membrane proteins, both of them from the cell membrane of halobacteria. Halorhodopsin is a light-driven chloride-ion pump but has very close homology to bacteriorhodopsin, especially around the retinal. In contrast, the visual opsins, which are responsible for the primary step in visual transduction in all eukaryotes from Drosophila upwards, form a separate family with no direct sequence homology to the bacteriorhodopsin family. The visual opsin family now includes about 15 other receptor proteins, all of which activate G-protein cascades, including the beta-adrenergic receptor as well as several others. Despite the lack of clear relations at the level of amino acid sequence, there are topographical similarities between the bacteriorhodopsin and the visual opsin families in the nature and site of chromophore attachment, the number of transmembrane helices and the positions of the amino and carboxyl termini in the membrane. These similarities suggest that if the two were at one time closely related, they have diverged too far to have sequences that are detectably similar.

10.
Li L  Jiang W  Li X  Moser KL  Guo Z  Du L  Wang Q  Topol EJ  Wang Q  Rao S 《Genomics》2005,85(1):16-23
Development of a robust and efficient approach for extracting useful information from microarray data continues to be a significant and challenging task. Microarray data are characterized by a high dimension, high signal-to-noise ratio, and high correlations between genes, but with a relatively small sample size. Current methods for dimensional reduction can be further improved for the scenario in which a single (or a few) highly influential gene(s) are present whose effect in the feature subset would prohibit the inclusion of other important genes. We have formalized a robust gene selection approach based on a hybrid between a genetic algorithm and a support vector machine. The major goal of this hybridization was to exploit fully their respective merits (e.g., robustness to the size of the solution space and the capability of handling a very large dimension of feature genes) for the identification of key feature genes (or molecular signatures) for a complex biological phenotype. We have applied the approach to the microarray data of diffuse large B cell lymphoma to demonstrate its behaviors and properties for mining the high-dimension data of genome-wide gene expression profiles. The resulting classifier(s) (the optimal gene subset(s)) achieved the highest accuracy (99%) for prediction of independent microarray samples in comparisons with marginal filters and a hybrid between a genetic algorithm and K nearest neighbors.

11.
Summary: Insect octopamine receptors are G-protein coupled receptors. They can be coupled to second messenger pathways to mediate either increases or decreases in intracellular cyclic AMP levels or the generation of intracellular calcium signals. Insect octopamine receptors were originally classified on the basis of second messenger changes induced in a variety of intact tissue preparations. Such a classification system is problematic if more than one receptor subtype is present in the same tissue preparation. Recent progress on the cloning and characterization in heterologous cell systems of octopamine receptors from Drosophila and other insects is reviewed. A new classification system for insect octopamine receptors into “α-adrenergic-like octopamine receptors (OctαRs)”, “β-adrenergic-like octopamine receptors (OctβRs)” and “octopamine/tyramine (or tyraminergic) receptors” is proposed based on their similarities in structure and in signalling properties with vertebrate adrenergic receptors. In future studies on the molecular basis of octopamine signalling in individual tissues it will be essential to identify the relative expression levels of the different classes of octopamine receptor present. In addition, it will be essential to identify whether co-expression of such receptors in the same cells results in the formation of oligomeric receptors with specific emergent pharmacological and signalling properties.

12.
Yu CS  Lu CH 《PloS one》2011,6(5):e20445
For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. The identification of AFPs is difficult because they exist as evolutionarily divergent types, and because their sequences and structures are present in limited numbers in currently available databases. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.

13.
14.
This paper proposes a new power spectral-based hybrid genetic algorithm-support vector machine (SVMGA) technique to classify five types of electrocardiogram (ECG) beats, namely normal beats and four manifestations of heart arrhythmia. The method employs three modules: a feature extraction module, a classification module and an optimization module. The feature extraction module extracts spectral features and three timing interval features from the electrocardiogram. Non-parametric power spectral density (PSD) estimation methods are used to extract the spectral features. A support vector machine (SVM) is employed as the classifier to recognize the ECG beats. We investigate and compare two approaches to setting the classifier parameters. In the first, the parameters are specified experimentally by trial and error. In the second, the relevant parameters are optimized by an intelligent algorithm. These parameters are the Gaussian radial basis function (GRBF) kernel parameter σ and the penalty parameter C of the SVM classifier. The classification performance of both approaches is then evaluated on eight files obtained from the MIT–BIH arrhythmia database. The classification accuracy of the SVMGA approach proves superior to that of the SVM with constant, manually selected parameters.
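The kernel parameter σ mentioned in this abstract has a concrete meaning in the usual Gaussian RBF form, K(x, y) = exp(-||x - y||² / (2σ²)). The sketch below assumes that parametrization; the abstract does not state which one the authors used, and some SVM libraries instead expose gamma = 1 / (2σ²). The function names are hypothetical.

```python
import math


def grbf_kernel(x, y, sigma):
    """Gaussian RBF kernel value for two equal-length feature vectors."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma):
    """Convert sigma to the gamma used by exp(-gamma * ||x - y||**2)."""
    return 1.0 / (2.0 * sigma ** 2)
```

A small σ makes the kernel fall off quickly, so each training beat influences only its near neighbors, while the penalty C sets how heavily training errors are punished. The two interact, which is why they are usually tuned together rather than one at a time.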

15.
Microarrays are a new technology that allows biologists to better understand the interactions between diverse pathologic states at the gene level. However, the amount of data generated by these tools becomes problematic, even though the data are supposed to be analyzed automatically (e.g., for diagnostic purposes). The issue becomes more complex when the expression data involve multiple states. We present a novel approach to the gene selection problem in multi-class gene expression-based cancer classification, which combines support vector machines and genetic algorithms. This new method is able to select small subsets and still improve the classification accuracy.

16.
We have developed an alignment-independent method for classification of G-protein coupled receptors (GPCRs) according to the principal chemical properties of their amino acid sequences. The method relies on a multivariate approach in which the primary amino acid sequences are translated into vectors based on the principal physicochemical properties of the amino acids, and the data are transformed into a uniform matrix by applying a modified auto-cross-covariance transform. The application of principal component analysis to a data set of 929 class A GPCRs showed a clear separation of the major classes of GPCRs. The application of partial least squares projection to latent structures created a highly valid model (cross-validated correlation coefficient, Q² = 0.895) that gave unambiguous classification of the GPCRs in the training set according to their ligand binding class. The model was further validated by external prediction of 535 novel GPCRs not included in the training set. Of the latter, only 14 sequences, confined to rapidly expanding GPCR classes, were mispredicted. Moreover, 90 orphan GPCRs out of 165 were tentatively assigned to a GPCR ligand binding class. The alignment-independent method could be used to assess the importance of the principal chemical properties of every single amino acid in the protein sequences for their contributions in explaining GPCR family membership. It was then revealed that all amino acids in the unaligned sequences contributed to the classifications, albeit to varying extents; the most important amino acids were those that could also be determined to be conserved by using traditional alignment-based methods.
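An auto-cross-covariance (ACC) transform of the general kind named here can be sketched as follows. This shows the commonly cited form, in which each term sums products of property j at position i and property k at position i + lag and divides by (n - lag)^p. The authors' specific modification is not described in the abstract, so the normalization exponent `p` and the function name `acc_transform` are assumptions.

```python
def acc_transform(descriptors, max_lag, p=1.0):
    """Turn an n x m per-residue property matrix into max_lag * m * m features.

    descriptors: list of n rows, each a list of m physicochemical values
    for one residue (the caller supplies the amino acid property scale).
    """
    n = len(descriptors)
    if n == 0:
        return []
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    m = len(descriptors[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(m):
            for k in range(m):
                # j == k gives auto-covariance terms, j != k cross-covariance.
                total = sum(
                    descriptors[i][j] * descriptors[i + lag][k]
                    for i in range(n - lag)
                )
                features.append(total / (n - lag) ** p)
    return features
```

The output length depends only on `max_lag` and the number of properties, not on the sequence length, which is how unaligned sequences of different lengths end up in one uniform matrix.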

17.

Background  

MicroRNAs (miRNAs) are a group of short (~22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.

18.
In this study, we present a constructive algorithm for training cooperative support vector machine ensembles (CSVMEs). CSVME combines ensemble architecture design with cooperative training for individual SVMs in ensembles. Unlike most previous studies on training ensembles, CSVME puts emphasis on both accuracy and collaboration among individual SVMs in an ensemble. A group of SVMs selected on the basis of recursive classifier elimination is used in CSVME, and the number of the individual SVMs selected to construct CSVME is determined by 10-fold cross-validation. This kind of SVME has been tested on two ovarian cancer datasets previously obtained by proteomic mass spectrometry. By combining several individual SVMs, the proposed method achieves better performance than the SVME of all base SVMs.

19.
Apoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. As a result of genome and other sequencing projects, the gap between the number of known apoptosis protein sequences and the number of known apoptosis protein structures is widening rapidly. Because of this extremely unbalanced state, it would be worthwhile to develop a fast and reliable method to identify their subcellular locations so as to gain better insight into their biological functions. In view of this, a new method, in which the support vector machine combines with discrete wavelet transform, has been developed to predict the subcellular location of apoptosis proteins. The results obtained by the jackknife test were quite promising, and indicated that the proposed method can remarkably improve the prediction accuracy of subcellular locations, and might also become a useful high-throughput tool in characterizing other attributes of proteins, such as enzyme class, membrane protein type, and nuclear receptor subfamily according to their sequences.

20.
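Membrane proteins are a structurally distinctive class of proteins and form the material basis for the various functions that cells perform. According to the different ways in which they are associated with the cell membrane, they are mainly divided into six types. In this work, a reduced amino acid alphabet is used to compress the information in the original membrane protein sequences; amino acid composition and order features are then extracted from the compressed sequences; and finally a support vector machine is used to build the classification model. Five-fold cross-validation results show that, for classifying the six membrane protein types, the method reaches a highest accuracy of over 98% and an average prediction accuracy above 85%. It can effectively separate the six types of membrane protein and lays a foundation for further analysis of membrane protein structure and function.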
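Compressing a sequence with a reduced amino acid alphabet amounts to replacing each residue with a representative of its group. The six-group scheme below is hypothetical and chosen only for illustration, since the abstract does not give the grouping that was used; the names `GROUPS`, `REDUCE` and `reduce_sequence` are likewise assumptions.

```python
# Hypothetical reduction by side-chain character: representative -> members.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # small / conformationally special
}

# Invert the table so each of the 20 residues maps to its representative.
REDUCE = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet, dropping unknown residues."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)
```

With six letters instead of twenty, dipeptide-style order features shrink from 400 dimensions to 36, which is the main practical benefit of compressing before feature extraction.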
