首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A genetic algorithm (GA) for feature selection in conjunction with neural network was applied to predict protein structural classes based on single amino acid and all dipeptide composition frequencies. These sequence parameters were encoded as input features for a GA in feature selection procedure and classified with a three-layered neural network to predict protein structural classes. The system was established through optimization of the classification performance of neural network which was used as evaluation function. In this study, self-consistency and jackknife tests on a database containing 498 proteins were used to verify the performance of this hybrid method, and were compared with some of prior works. The adoption of a hybrid model, which encompasses genetic and neural technologies, demonstrated to be a promising approach in the task of protein structural class prediction.  相似文献   

2.
A neural network-based method has been developed for the prediction of beta-turns in proteins by using multiple sequence alignment. Two feed-forward back-propagation networks with a single hidden layer are used where the first-sequence structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position-specific scoring matrices. The initial predictions from the first network and PSIPRED-predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. A significant improvement in prediction accuracy has been achieved by using evolutionary information contained in the multiple sequence alignment. The final network yields an overall prediction accuracy of 75.5% when tested by sevenfold cross-validation on a set of 426 nonhomologous protein chains. The corresponding Q(pred), Q(obs), and Matthews correlation coefficient values are 49.8%, 72.3%, and 0.43, respectively, and are the best among all the previously published beta-turn prediction methods. The Web server BetaTPred2 (http://www.imtech.res.in/raghava/betatpred2/) has been developed based on this approach.  相似文献   

3.
A pair of neural network-based algorithms is presented for predicting the tertiary structural class and the secondary structure of proteins. Each algorithm realizes improvements in accuracy based on information provided by the other. Structural class prediction of proteins nonhomologous to any in the training set is improved significantly, from 62.3% to 73.9%, and secondary structure prediction accuracy improves slightly, from 62.26% to 62.64%. A number of aspects of neural network optimization and testing are examined. They include network overtraining and an output filter based on a rolling average. Secondary structure prediction results vary greatly depending on the particular proteins chosen for the training and test sets; consequently, an appropriate measure of accuracy reflects the more unbiased approach of “jackknife” cross-validation (testing each protein in the database individually).  相似文献   

4.
Prediction of lipoprotein signal peptides in Gram-negative bacteria   总被引:41,自引:0,他引:41  
A method to predict lipoprotein signal peptides in Gram-negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins. This predictor was able to predict 96.8% of the lipoproteins correctly with only 0.3% false positives in a set of SPaseI-cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram-positive lipoprotein signal peptides differ from Gram-negatives, the HMM was able to identify 92.9% of the lipoproteins included in a Gram-positive test set. A genome search was carried out for 12 Gram-negative genomes and one Gram-positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network-based predictor was developed for comparison, and it gave very similar results. LipoP is available as a Web server at www.cbs.dtu.dk/services/LipoP/.  相似文献   

5.
Kaur H  Raghava GP 《Proteins》2004,55(1):83-90
In this paper a systematic attempt has been made to develop a better method for predicting alpha-turns in proteins. Most of the commonly used approaches in the field of protein structure prediction have been tried in this study, which includes statistical approach "Sequence Coupled Model" and machine learning approaches; i) artificial neural network (ANN); ii) Weka (Waikato Environment for Knowledge Analysis) Classifiers and iii) Parallel Exemplar Based Learning (PEBLS). We have also used multiple sequence alignment obtained from PSIBLAST and secondary structure information predicted by PSIPRED. The training and testing of all methods has been performed on a data set of 193 non-homologous protein X-ray structures using five-fold cross-validation. It has been observed that ANN with multiple sequence alignment and predicted secondary structure information outperforms other methods. Based on our observations we have developed an ANN-based method for predicting alpha-turns in proteins. The main components of the method are two feed-forward back-propagation networks with a single hidden layer. The first sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position specific scoring matrices. The initial predictions obtained from the first network and PSIPRED predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. The final network yields an overall prediction accuracy of 78.0% and MCC of 0.16. A web server AlphaPred (http://www.imtech.res.in/raghava/alphapred/) has been developed based on this approach.  相似文献   

6.
Ubiquitin functions to regulate protein turnover in a cell by closely regulating the degradation of specific proteins. Such a regulatory role is very important, and thus I have analyzed the proteins that are ubiquitin-like, using an artificial neural network, support vector machines and a hidden Markov model (HMM). The methods were trained and tested on a set of 373 ubiquitin proteins and 373 non-ubiquitin proteins, obtained from Entrez protein database. The artificial neural network and support vector machine are trained and tested using both the physicochemical properties and PSSM matrices generated from PSI-BLAST, while in the HMM based method direct sequences are used for training-testing procedures. Further, the performance measures of the methods are calculated for test sequences, i.e. accuracy, specificity, sensitivity and Matthew's correlation coefficients of the methods are calculated. The highest accuracy of 90.2%, specificity of 87.04% and sensitivity of 94.08% was achieved using the support vector machine model with PSSM matrices. While accuracies of 86.82%, 83.37%, 80.18% and 72.11% were obtained for the support vector machine with physicochemical properties, neural network with PSSM matrices, neural networks with physicochemical properties, and hidden Markov model, respectively. As the accuracy for SVM model is better both using physicochemical properties and the PSSM matrices, it is concluded that kernel methods such as SVM outperforms neural networks and hidden Markov models.  相似文献   

7.
Gromiha MM  Suresh MX 《Proteins》2008,70(4):1274-1279
Discriminating thermophilic proteins from their mesophilic counterparts is a challenging task and it would help to design stable proteins. In this work, we have systematically analyzed the amino acid compositions of 3075 mesophilic and 1609 thermophilic proteins belonging to 9 and 15 families, respectively. We found that the charged residues Lys, Arg, and Glu as well as the hydrophobic residues, Val and Ile have higher occurrence in thermophiles than mesophiles. Further, we have analyzed the performance of different methods, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees and so forth for discriminating mesophilic and thermophilic proteins. We found that most of the machine learning techniques discriminate these classes of proteins with similar accuracy. The neural network-based method could discriminate the thermophiles from mesophiles at the five-fold cross-validation accuracy of 89% in a dataset of 4684 proteins. Moreover, this method is tested with 325 mesophiles in Xylella fastidosa and 382 thermophiles in Aquifex aeolicus and it could successfully discriminate them with the accuracy of 91%. These accuracy levels are better than other methods in the literature and we suggest that this method could be effectively used to discriminate mesophilic and thermophilic proteins.  相似文献   

8.
Faraggi E  Xue B  Zhou Y 《Proteins》2009,74(4):847-856
This article attempts to increase the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins through improved learning. Most methods developed for improving the backpropagation algorithm of artificial neural networks are limited to small neural networks. Here, we introduce a guided-learning method suitable for networks of any size. The method employs a part of the weights for guiding and the other part for training and optimization. We demonstrate this technique by predicting residue solvent accessibility and real-value backbone torsion angles of proteins. In this application, the guiding factor is designed to satisfy the intuitive condition that for most residues, the contribution of a residue to the structural properties of another residue is smaller for greater separation in the protein-sequence distance between the two residues. We show that the guided-learning method makes a 2-4% reduction in 10-fold cross-validated mean absolute errors (MAE) for predicting residue solvent accessibility and backbone torsion angles, regardless of the size of database, the number of hidden layers and the size of input windows. This together with introduction of two-layer neural network with a bipolar activation function leads to a new method that has a MAE of 0.11 for residue solvent accessibility, 36 degrees for psi, and 22 degrees for phi. The method is available as a Real-SPINE 3.0 server in http://sparks.informatics.iupui.edu.  相似文献   

9.
The goal of this study was to train an artificial neural network to generate accurate saccades in Listing's plane and then determine how the hidden units performed the visuomotor transformation. A three-layer neural network was successfully trained, using back-prop, to take in oculocentric retinal error vectors and three-dimensional eye orientation and to generate the correct head-centric motor error vector within Listing's plane. Analysis of the hidden layer of trained networks showed that explicit representations of desired target direction and eye orientation were not employed. Instead, the hidden-layer units consistently divided themselves into four parallel modules: a dominant "vector-propagation" class (approximately 50% of units) with similar visual and motor tuning but negligible position sensitivity and three classes with specific spatial relations between position, visual, and motor tuning. Surprisingly, the vector-propagation units, and only these, formed a highly precise and consistent orthogonal coordinate system aligned with Listing's plane. Selective "lesions" confirmed that the vector-propagation module provided the main drive for saccade magnitude and direction, whereas a balance between activity in the other modules was required for the correct eye-position modulation. Thus, contrary to popular expectation, error-driven learning in itself was sufficient to produce a "neural" algorithm with discrete functional modules and explicit coordinate systems, much like those observed in the real saccade generator.  相似文献   

10.
A hybrid system (hidden neural network) based on a hidden Markov model (HMM) and neural networks (NN) was trained to predict the bonding states of cysteines in proteins starting from the residue chains. Training was performed using 4136 cysteine-containing segments extracted from 969 non-homologous proteins of well-resolved 3D structure and without chain-breaks. After a 20-fold cross-validation procedure, the efficiency of the prediction scores as high as 80% using neural networks based on evolutionary information. When the whole protein is taken into account by means of an HMM, a hybrid system is generated, whose emission probabilities are computed using the NN output (hidden neural networks). In this case, the predictor accuracy increases up to 88%. Further, when tested on a protein basis, the hybrid system can correctly predict 84% of the chains in the data set, with a gain of at least 27% over the NN predictor.  相似文献   

11.
Transmembrane helices predicted at 95% accuracy.   总被引:27,自引:1,他引:27       下载免费PDF全文
We describe a neural network system that predicts the locations of transmembrane helices in integral membrane proteins. By using evolutionary information as input to the network system, the method significantly improved on a previously published neural network prediction method that had been based on single sequence information. The input data were derived from multiple alignments for each position in a window of 13 adjacent residues: amino acid frequency, conservation weights, number of insertions and deletions, and position of the window with respect to the ends of the protein chain. Additional input was the amino acid composition and length of the whole protein. A rigorous cross-validation test on 69 proteins with experimentally determined locations of transmembrane segments yielded an overall two-state per-residue accuracy of 95%. About 94% of all segments were predicted correctly. When applied to known globular proteins as a negative control, the network system incorrectly predicted fewer than 5% of globular proteins as having transmembrane helices. The method was applied to all 269 open reading frames from the complete yeast VIII chromosome. For 59 of these, at least two transmembrane helices were predicted. Thus, the prediction is that about one-fourth of all proteins from yeast VIII contain one transmembrane helix, and some 20%, more than one.  相似文献   

12.
Gromiha MM  Suwa M 《Proteins》2006,63(4):1031-1037
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying OMPs from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the performance of different methods, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees, etc. for discriminating OMPs. We found that most of the machine learning techniques discriminate OMPs with similar accuracy. The neural network-based method could discriminate the OMPs from other proteins [globular/transmembrane helical (TMH)] at the fivefold cross-validation accuracy of 91.0% in a dataset of 1,088 proteins. The accuracy of discriminating globular proteins is 88.8% and that of TMH proteins is 93.7%. Further, the neural network method is tested with globular proteins belonging to 30 different folding types and it could successfully exclude 95% of the considered proteins. The proteins with SAM domain such as knottins, rubredoxin, and thioredoxin folds are eliminated with 100% accuracy. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.  相似文献   

13.
This work presents a dynamic artificial neural network methodology, which classifies the proteins into their classes from their sequences alone: the lysosomal membrane protein classes and the various other membranes protein classes. In this paper, neural networks-based lysosomal-associated membrane protein type prediction system is proposed. Different protein sequence representations are fused to extract the features of a protein sequence, which includes seven feature sets; amino acid (AA) composition, sequence length, hydrophobic group, electronic group, sum of hydrophobicity, R-group, and dipeptide composition. To reduce the dimensionality of the large feature vector, we applied the principal component analysis. The probabilistic neural network, generalized regression neural network, and Elman regression neural network (RNN) are used as classifiers and compared with layer recurrent network (LRN), a dynamic network. The dynamic networks have memory, i.e. its output depends not only on the input but the previous outputs also. Thus, the accuracy of LRN classifier among all other artificial neural networks comes out to be the highest. The overall accuracy of jackknife cross-validation is 93.2% for the data-set. These predicted results suggest that the method can be effectively applied to discriminate lysosomal associated membrane proteins from other membrane proteins (Type-I, Outer membrane proteins, GPI-Anchored) and Globular proteins, and it also indicates that the protein sequence representation can better reflect the core feature of membrane proteins than the classical AA composition.  相似文献   

14.
O-GalNAc-glycosylation is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known, making prediction methods necessary to bridge the gap between the large number of known protein sequences and the small number of proteins experimentally investigated with regard to glycosylation status. From O-GLYCBASE a total of 86 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted. Mammalian protein homolog comparisons showed that a glycosylated serine or threonine is less likely to be precisely conserved than a nonglycosylated one. The Protein Data Bank was analyzed for structural information, and 12 glycosylated structures were obtained. All positive sites were found in coil or turn regions. A method for predicting the location for mucin-type glycosylation sites was trained using a neural network approach. The best overall network used as input amino acid composition, averaged surface accessibility predictions together with substitution matrix profile encoding of the sequence. To improve prediction on isolated (single) sites, networks were trained on isolated sites only. The final method combines predictions from the best overall network and the best isolated site network; this prediction method correctly predicted 76% of the glycosylated residues and 93% of the nonglycosylated residues. NetOGlyc 3.1 can predict sites for completely new proteins without losing its performance. The fact that the sites could be predicted from averaged properties together with the fact that glycosylation sites are not precisely conserved indicates that mucin-type glycosylation in most cases is a bulk property and not a very site-specific one. NetOGlyc 3.1 is made available at www.cbs.dtu.dk/services/netoglyc.  相似文献   

15.
Improved prediction of signal peptides: SignalP 3.0   总被引:63,自引:0,他引:63  
We describe improvements of the currently most popular method for prediction of classically secreted proteins, SignalP. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, where both components have been updated. Motivated by the idea that the cleavage site position and the amino acid composition of the signal peptide are correlated, new features have been included as input to the neural network. This addition, combined with a thorough error-correction of a new data set, have improved the performance of the predictor significantly over SignalP version 2. In version 3, correctness of the cleavage site predictions has increased notably for all three organism groups, eukaryotes, Gram-negative and Gram-positive bacteria. The accuracy of cleavage site prediction has increased in the range 6-17% over the previous version, whereas the signal peptide discrimination improvement is mainly due to the elimination of false-positive predictions, as well as the introduction of a new discrimination score for the neural network. The new method has been benchmarked against other available methods. Predictions can be made at the publicly available web server  相似文献   

16.
Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural 'network system. For the Asilomar meeting, predictions for 13 proteins were generated automatically using the publicly available prediction method PHD. The results confirm the estimate of 72% three-state prediction accuracy. The fairly accurate predictions of secondary structure segments made the tool useful as a starting point for modeling of higher dimensional aspects of protein structure. © 1995 Wiley-Liss, Inc.  相似文献   

17.
NetCGlyc 1.0: prediction of mammalian C-mannosylation sites   总被引:2,自引:0,他引:2  
Julenius K 《Glycobiology》2007,17(8):868-876
  相似文献   

18.
We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen''s unsupervised learning algorithm using, as inputs, the matrix patterns derived from the dipeptide composition of the proteins. We present here a large-scale application of that method to classify the 1,758 human protein sequences stored in the SwissProt database (release 19.0), whose lengths are greater than 50 amino acids. In the final 2-dimensional topologically ordered map of 15 x 15 neurons, proteins belonging to known families were associated with the same neuron or with neighboring ones. Also, as an attempt to reduce the time-consuming learning procedure, we compared 2 learning protocols: one of 500 epochs (100 SUN CPU-hours [CPU-h]), and another one of 30 epochs (6.7 CPU-h). A further reduction of learning-computing time, by a factor of about 3.3, with similar protein clustering results, was achieved using a matrix of 11 x 11 components to represent the sequences. Although network training is time consuming, the classification of a new protein in the final ordered map is very fast (14.6 CPU-seconds). We also show a comparison between the artificial neural network approach and conventional methods of biosequence analysis.  相似文献   

19.
Saha S  Raghava GP 《Proteins》2006,65(1):40-48
B-cell epitopes play a vital role in the development of peptide vaccines, in diagnosis of diseases, and also for allergy research. Experimental methods used for characterizing epitopes are time consuming and demand large resources. The availability of epitope prediction method(s) can rapidly aid experimenters in simplifying this problem. The standard feed-forward (FNN) and recurrent neural network (RNN) have been used in this study for predicting B-cell epitopes in an antigenic sequence. The networks have been trained and tested on a clean data set, which consists of 700 non-redundant B-cell epitopes obtained from Bcipep database and equal number of non-epitopes obtained randomly from Swiss-Prot database. The networks have been trained and tested at different input window length and hidden units. Maximum accuracy has been obtained using recurrent neural network (Jordan network) with a single hidden layer of 35 hidden units for window length of 16. The final network yields an overall prediction accuracy of 65.93% when tested by fivefold cross-validation. The corresponding sensitivity, specificity, and positive prediction values are 67.14, 64.71, and 65.61%, respectively. It has been observed that RNN (JE) was more successful than FNN in the prediction of B-cell epitopes. The length of the peptide is also important in the prediction of B-cell epitopes from antigenic sequences. The webserver ABCpred is freely available at www.imtech.res.in/raghava/abcpred/.  相似文献   

20.
In order to study how neurons in the primary motor cortex (MI) are dynamically linked together during skilled movement, we recorded simultaneously from many cortical neurons in cats trained to perform a reaching and retrieval task using their forelimbs. Analysis of task-related spike activity in the MI of the hemisphere contralateral to the reaching forelimb (in identified forelimb or hindlimb representations) recorded through chronically implanted microwires, was followed by pairwise evaluation of temporally correlated activity in these neurons during task performance using shuffle corrected cross-correlograms. Over many months of recording, a variety of task-related modulations of neural activities were observed in individual efferent zones. Positively correlated activity (mainly narrow peaks at zero or short latencies) was seen during task performance frequently between neurons recorded within the forelimb representation of MI, rarely within the hindlimb area of MI, and never between forelimb and hindlimb areas. Correlated activity was frequently observed between neurons with different patterns of task-related activity or preferential activity during different task elements (reaching, feeding, etc.), and located in efferent zones with dissimilar representation as defined by intracortical microstimulation. The observed synchronization of action potentials among selected but functionally varied groups of MI neurons possibly reflects dynamic recruitment of network connections between efferent zones during skilled movement.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号