Found 20 similar articles (search time: 0 ms)
1.
Apoptosis, or programmed cell death, plays an important role in the development of an organism. Obtaining information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related to the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts of the same length. We then use this novel representation of protein sequences and adopt a support vector machine to predict subcellular location. The overall prediction accuracy, evaluated by the jackknife test, is significantly improved.
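The step of separating a sequence into p parts can be sketched as follows. This is only an illustration, not the authors' implementation: the function name `split_into_parts` is hypothetical, and the handling of sequences whose length is not divisible by p (extra residues go to the earliest parts) is an assumption, since the abstract does not say how the remainder is treated.

```python
def split_into_parts(sequence: str, p: int) -> list[str]:
    """Split a protein sequence into p contiguous parts of near-equal length.

    Assumption: when len(sequence) is not divisible by p, the first
    (len(sequence) % p) parts each receive one extra residue.
    """
    if p <= 0 or p > len(sequence):
        raise ValueError("p must be between 1 and the sequence length")
    base, extra = divmod(len(sequence), p)
    parts = []
    start = 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        parts.append(sequence[start:start + size])
        start += size
    return parts
```

Local features such as distance frequencies could then be computed on each part separately and concatenated into one feature vector for the classifier.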
2.
Protein secondary structure prediction based on an improved support vector machines approach (total citations: 7; self-citations: 0; citations by others: 7)
The prediction of protein secondary structure is an important step in the prediction of protein tertiary structure. A new protein secondary structure prediction method, SVMpsi, was developed to improve the current level of prediction by incorporating new tertiary classifiers and their jury decision system, and the PSI-BLAST PSSM profiles. Additionally, efficient methods to handle unbalanced data and a new optimization strategy for maximizing the Q(3) measure were developed. The SVMpsi produces the highest published Q(3) and SOV94 scores on both the RS126 and CB513 data sets to date. For a new KP480 set, the prediction accuracy of SVMpsi was Q(3) = 78.5% and SOV94 = 82.8%. Moreover, the blind test results for 136 non-redundant protein sequences which do not contain homologues of training data sets were Q(3) = 77.2% and SOV94 = 81.8%. The SVMpsi results in CASP5 illustrate that it is another competitive method to predict protein secondary structure.
3.
4.
5.
Knowledge of structural classes plays an important role in understanding protein folding patterns. In this paper, features based on the predicted secondary structure sequence and the corresponding E–H sequence are extracted. Then, an 11-dimensional feature vector is selected based on a wrapper feature selection algorithm and a support vector machine (SVM). Among the 11 selected features, 4 novel features are newly designed to model the differences between the α/β class and the α + β class, and the other 7 rational features were proposed by previous researchers. To examine the performance of our method, a total of 5 datasets are used to design and test it. The results show that the proposed method achieves prediction accuracies competitive with existing methods (SCPRED, RKS-PPSC and MODAS), and the 4 new features are shown to be essential for differentiating the α/β and α + β classes. A standalone version of the proposed method is written in Java and can be downloaded from http://web.xidian.edu.cn/slzhang/paper.html.
6.
We present a new method for protein secondary structure prediction, based on the recognition of well-defined pentapeptides, in a large databank. Using a databank of 635 protein chains, we obtained a success rate of 68.6%. We show that progress is achieved when the databank is enlarged, when the 20 amino acids are adequately grouped in 10 sets and when more pentapeptides are attributed one of the defined conformations, alpha-helices or beta-strands. The analysis of the model indicates that the essential variable is the number of pentapeptides of well-defined structure in the database. Our model is simple, does not rely on arbitrary parameters and allows the analysis in detail of the results of each chosen hypothesis.
7.
A novel method for protein secondary structure prediction using dual-layer SVM and profiles (total citations: 2; self-citations: 0; citations by others: 2)
A high-performance method was developed for protein secondary structure prediction based on the dual-layer support vector machine (SVM) and position-specific scoring matrices (PSSMs). SVM is a new machine learning technology that has been successfully applied in solving problems in the field of bioinformatics. The SVM's performance is usually better than that of traditional machine learning approaches. The performance was further improved by combining PSSM profiles with the SVM analysis. The PSSMs were generated from PSI-BLAST profiles, which contain important evolution information. The final prediction results were generated from the second SVM layer output. On the CB513 data set, the three-state overall per-residue accuracy, Q3, reached 75.2%, while segment overlap (SOV) accuracy increased to 80.0%. On the CB396 data set, the Q3 of our method reached 74.0% and the SOV reached 78.1%. A web server utilizing the method has been constructed and is available at http://www.bioinfo.tsinghua.edu.cn/pmsvm.
8.
Amino acid sequence patterns have been used to identify the location of turns in globular proteins [Cohen et al. (1986) Biochemistry 25, 266-275]. We have developed sequence patterns that facilitate the prediction of helices in all helical proteins. Regular expression patterns recognize the component parts of a helix: the amino terminus (N-cap), the core of the helix (core), and the carboxy terminus (C-cap). These patterns recognize the core features of helices with a 95% success rate and the N- and C-capping features with success rates of 56% and 48%, respectively. A metapattern language, ALPPS, coordinates the recognition of turns and helical components in a scheme that predicts the location and extent of alpha-helices. On the basis of raw residue scoring, a 71% success rate is observed. By focusing on the recognition of core helical features, we achieve a 78% success rate. Amended scoring procedures are presented and discussed, and comparisons are made to other predictive schemes.
9.
MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
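The amino acid composition used as input in the abstract above is commonly taken to be the fraction of each of the 20 standard residues in a sequence. A minimal sketch under that assumption follows; the function name `amino_acid_composition` and the choice to ignore non-standard letters are illustrative and not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed order for the vector).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return a 20-dimensional vector of residue fractions.

    Assumption: letters outside the 20 standard codes (e.g. X, B, Z)
    are ignored rather than counted.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector of this kind, one per protein, is the sort of fixed-length input a support vector machine classifier can be trained on.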
10.
Background
Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).
11.
Secondary structure prediction with support vector machines (total citations: 8; self-citations: 0; citations by others: 8)
MOTIVATION: A new method that uses support vector machines (SVMs) to predict protein secondary structure is described and evaluated. The study is designed to develop a reliable prediction method using an alternative technique and to investigate the applicability of SVMs to this type of bioinformatics problem. METHODS: Binary SVMs are trained to discriminate between two structural classes. The binary classifiers are combined in several ways to predict multi-class secondary structure. RESULTS: The average three-state prediction accuracy per protein (Q(3)) is estimated by cross-validation to be 77.07 +/- 0.26% with a segment overlap (Sov) score of 73.32 +/- 0.39%. The SVM performs similarly to the 'state-of-the-art' PSIPRED prediction method on a non-homologous test set of 121 proteins despite being trained on substantially fewer examples. A simple consensus of the SVM, PSIPRED and PROFsec achieves significantly higher prediction accuracy than the individual methods.
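Several abstracts in this list report Q3, the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. The sketch below shows that calculation and the per-protein average mentioned above. The function names and the H/E/C string encoding are assumptions made for illustration.

```python
def q3_accuracy(observed: str, predicted: str) -> float:
    """Percentage of positions where the predicted state equals the observed one.

    Both arguments are strings over a three-letter alphabet, assumed here
    to be H (helix), E (strand) and C (coil).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def mean_q3_per_protein(pairs: list[tuple[str, str]]) -> float:
    """Average Q3 over proteins, each given as an (observed, predicted) pair."""
    if not pairs:
        raise ValueError("at least one protein is required")
    return sum(q3_accuracy(obs, pred) for obs, pred in pairs) / len(pairs)
```

Averaging per protein weights every protein equally, whereas pooling all residues weights long proteins more heavily, so the two figures can differ on the same test set.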
12.
13.
14.
A novel method for predicting the secondary structures of proteins from amino acid sequence has been presented. The protein secondary structure seqlets that are analogous to the words in natural language have been extracted. These seqlets capture the relationship between amino acid sequence and the secondary structures of proteins and further form the protein secondary structure dictionary. To be elaborate, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. The word lattice is used to represent the results of the word segmentation, and the maximum entropy model is used to calculate the probability of a seqlet being tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins of four organisms respectively and compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% Q3 accuracy and 4.6% SOV accuracy. Combining it with the local similarity protein sequences obtained by BLAST can give better prediction. The method is also tested on the 50 CASP5 target proteins, with Q3 accuracy 78.9% and SOV accuracy 77.1%. A web server for protein secondary structure prediction has been constructed, which is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
15.
DONG Qiwen, WANG Xiaolong, LIN Lei & GUAN Yi. School of Computer Science Technology, Harbin Institute of Technology, Harbin, China. Science in China: Life Sciences (English Edition), 2005, 48(4): 394-405
1 Introduction
The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences is explored as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…
16.
Protein S-nitrosylation plays a key and specific role in many cellular processes. Detecting possible S-nitrosylated substrates and their corresponding exact sites is crucial for studying the mechanisms of these biological processes. Compared with expensive and time-consuming biochemical experiments, computational methods are attracting considerable attention due to their convenience and speed. Although some computational models have been developed to predict S-nitrosylation sites, their accuracy is still low. In this work, we use a support vector machine to predict protein S-nitrosylation sites. After a careful evaluation of six encoding schemes, we propose a new efficient predictor, CPR-SNO, using the coupling patterns based encoding scheme. CPR-SNO achieves an area under the ROC curve (AUC) of 0.8289 in 10-fold cross-validation experiments, which is significantly better than the 0.685 of the best existing method, GPS-SNO 1.0. In further annotating large-scale potential S-nitrosylated substrates, CPR-SNO also presents an encouraging predictive performance. These results indicate that CPR-SNO can serve the biological community as a competitive predictor of protein S-nitrosylation sites. CPR-SNO has been implemented as a web server and is available at http://math.cau.edu.cn/CPR-SNO/CPR-SNO.html.
17.
18.
Pugalenthi G, Kandaswamy KK, Suganthan PN, Sowdhamini R, Martinetz T, Kolatkar PR. Journal of Biomolecular Structure & Dynamics, 2010, 28(3): 405-414
Knowledge of three-dimensional structure is essential to understand the function of a protein. Although the overall fold is determined by the whole of its sequence, a small group of residues, often called structural motifs, plays a crucial role in determining the protein fold and its stability. Identification of such structural motifs requires a sufficient number of sequence and structural homologs to define conservation and evolutionary information. Unfortunately, many structures in the protein structure databases have no homologous structures or sequences. In this work, we report an SVM method, SMpred, to identify structural motifs from a single protein structure without using sequence and structural homologs. SMpred was trained and tested using 132 protein domains containing 581 motifs. SMpred achieved 78.79% accuracy with 79.06% sensitivity and 78.53% specificity. The performance of SMpred was evaluated with MegaMotifBase using 188 proteins containing 1161 motifs. Out of 1161 motifs, SMpred correctly identified 1503 structural motifs reported in MegaMotifBase. Further, we showed that SMpred is a useful approach for the length-deviant superfamilies and single-member superfamilies. This result suggests the usefulness of our approach for facilitating the identification of structural motifs in protein structure in the absence of sequence and structural homologs. The dataset and executable for the SMpred algorithm are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/SMpred.htm.
19.
20.
We develop a knowledge-based approach (called PROSP) for protein secondary structure prediction. The knowledge base contains small peptide fragments together with their secondary structural information. A quantitative measure M, called match rate, is defined to measure the amount of structural information that a target protein can extract from the knowledge base. Our experimental results show that proteins with a higher match rate will likely be predicted more accurately based on PROSP. That is, there is roughly a monotone correlation between the prediction accuracy and the amount of structure matching with the knowledge base. To fully utilize the strength of our knowledge base, a hybrid prediction method is proposed as follows: if the match rate of a target protein is at least 80%, we use the extracted information to make the prediction; otherwise, we adopt a popular machine-learning approach. This comprises our hybrid protein structure prediction (HYPROSP) approach. We use the DSSP and EVA data as our datasets and PSIPRED as our underlying machine-learning algorithm. For target proteins with match rate at least 80%, the average Q3 of PROSP is 3.96 and 7.2 better than that of PSIPRED on DSSP and EVA data, respectively.
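The hybrid rule described above (use the knowledge base when the match rate is at least 80%, otherwise fall back to a machine-learning predictor) can be sketched as a simple dispatch. Only the 80% threshold comes from the abstract. The function name `hybrid_predict` and the two predictor callables are hypothetical placeholders, not the actual PROSP or PSIPRED interfaces.

```python
from typing import Callable

# Threshold taken from the abstract: match rate of at least 80%.
MATCH_RATE_THRESHOLD = 80.0

# A predictor maps an amino acid sequence to a secondary structure string.
Predictor = Callable[[str], str]


def hybrid_predict(
    sequence: str,
    match_rate: float,
    knowledge_based: Predictor,
    machine_learning: Predictor,
) -> str:
    """Choose a predictor by match rate (in percent) and return its output.

    Both predictors are caller-supplied stand-ins (hypothetical); how the
    match rate itself is computed is not specified here.
    """
    if match_rate >= MATCH_RATE_THRESHOLD:
        return knowledge_based(sequence)
    return machine_learning(sequence)
```

Passing the predictors in as arguments keeps the selection rule separate from either method, so the threshold logic can be exercised with simple stubs.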