首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A feed-forward neural network has been employed for protein secondary structure prediction. Attempts were made to improve on previous prediction accuracies using a hierarchical mixture of experts (HME). In this method input data are clustered and used to train a series of different networks. Application of an HME to the prediction of protein secondary structure is shown to provide no advantages over a single network. We have also tried various new input representations, chosen to incorporate the effect of residues a long distance away in the one-dimensional amino acid chain. Prediction accuracy using these methods is comparable to that achieved by other neural networks.1–4  相似文献   

2.
Local structure prediction can facilitate ab initio structure prediction, protein threading, and remote homology detection. However, the accuracy of existing methods is limited. In this paper, we propose a knowledge-based prediction method that assigns a measure called the local match rate to each position of an amino acid sequence to estimate the confidence of our method. Empirically, the accuracy of the method correlates positively with the local match rate; therefore, we employ it to predict the local structures of positions with a high local match rate. For positions with a low local match rate, we propose a neural network prediction method. To better utilize the knowledge-based and neural network methods, we design a hybrid prediction method, HYPLOSP (HYbrid method to Protein LOcal Structure Prediction) that combines both methods. To evaluate the performance of the proposed methods, we first perform cross-validation experiments by applying our knowledge-based method, a neural network method, and HYPLOSP to a large dataset of 3,925 protein chains. We test our methods extensively on three different structural alphabets and evaluate their performance by two widely used criteria, Maximum Deviation of backbone torsion Angle (MDA) and Q(N), which is similar to Q(3) in secondary structure prediction. We then compare HYPLOSP with three previous studies using a dataset of 56 new protein chains. HYPLOSP shows promising results in terms of MDA and Q(N) accuracy and demonstrates its alphabet-independent capability.  相似文献   

3.
The estimation of prediction quality is important because without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue predictions are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, utilizing the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, the assessment of ligand binding site predictions using such metrics requires the availability of solved structures with bound ligands. Thus, we have developed a ligand binding site quality assessment tool, FunFOLDQA, which utilizes protein feature analysis to predict ligand binding site quality prior to the experimental solution of the protein structures and their ligand interactions. The FunFOLDQA feature scores were combined using: simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better results for correlations to both the MCC and BDT scores, according to Kendall's τ, Spearman's ρ and Pearson's r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve score (AUC) when Receiver Operator Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network, is shown to add value to FunFOLD, when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset. To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can be used to assess ligand binding site prediction quality, in the absence of experimental data.  相似文献   

4.
Improved prediction of signal peptides: SignalP 3.0   总被引:63,自引:0,他引:63  
We describe improvements of the currently most popular method for prediction of classically secreted proteins, SignalP. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, where both components have been updated. Motivated by the idea that the cleavage site position and the amino acid composition of the signal peptide are correlated, new features have been included as input to the neural network. This addition, combined with a thorough error-correction of a new data set, have improved the performance of the predictor significantly over SignalP version 2. In version 3, correctness of the cleavage site predictions has increased notably for all three organism groups, eukaryotes, Gram-negative and Gram-positive bacteria. The accuracy of cleavage site prediction has increased in the range 6-17% over the previous version, whereas the signal peptide discrimination improvement is mainly due to the elimination of false-positive predictions, as well as the introduction of a new discrimination score for the neural network. The new method has been benchmarked against other available methods. Predictions can be made at the publicly available web server  相似文献   

5.
Our previous work applied neural network techniques to the problem of discriminating open reading frame (ORF) sequences taken from introns versus exons. The method counted the codon frequencies in an ORF of a specified length, and then used this codon frequency representation of DNA fragments to train a neural net (essentially a Perceptron with a sigmoidal, or "soft step function", output) to perform this discrimination. After training, the network was then applied to a disjoint "predict" set of data to assess accuracy. The resulting accuracy in our previous work was 98.4%, exceeding accuracies reported in the literature at that time for other algorithms. Here, we report even higher accuracies stemming from calculations of mutual information (a correlation measure) of spatially separated codons in exons, and in introns. Significant mutual information exists in exons, but not in introns, between adjacent codons. This suggests that dicodon frequencies of adjacent codons are important for intron/exon discrimination. We report that accuracies obtained using a neural net trained on the frequency of dicodons is significantly higher at smaller fragment lengths than even our original results using codon frequencies, which were already higher than simple statistical methods that also used codon frequencies. We also report accuracies obtained from including codon and dicodon statistics in all six reading frames, i.e. the three frames on the original and complement strand. Inclusion of six-frame statistics increases the accuracy still further. We also compare these neural net results to a Bayesian statistical prediction method that assumes independent codon frequencies in each position. The performance of the Bayesian scheme is poorer than any of the neural based schemes, however many methods reported in the literature either explicitly, or implicitly, use this method. Specifically, Bayesian prediction schemes based on codon frequencies achieve 90.9% accuracy on 90 codon ORFs, while our best neural net scheme reaches 99.4% accuracy on 60 codon ORFs. "Accuracy" is defined as the average of the exon and intron sensitivities. Achievement of sufficiently high accuracies on short fragment lengths can be useful in providing a computational means of finding coding regions in unannotated DNA sequences such as those arising from the mega-base sequencing efforts of the Human Genome Project. We caution that the high accuracies reported here do not represent a complete solution to the problem of identifying exons in "raw" base sequences. The accuracies are considerably lower from exons of small length, although still higher than accuracies reported in the literature for other methods. Short exon lengths are not uncommon.(ABSTRACT TRUNCATED AT 400 WORDS)  相似文献   

6.
This study investigated the use of neural networks in the identification of Escherichia coli ribosome binding sites. The recognition of these sites based on primary sequence data is difficult due to the multiple determinants that define them. Additionally, secondary structure plays a significant role in the determination of the site and this information is difficult to include in the models. Efforts to solve this problem have so far yielded poor results. A new compilation of E. coli ribosome binding sites was generated for this study. Feedforward backpropagation networks were applied to their identification. Perceptrons were also applied, since they have been the previous best method since 1982. Evaluation of performance for all the neural networks and perceptrons was determined by ROC analysis. The neural network provided significant improvement in the recognition of these sites when compared with the previous best method, finding less than half the number of false positives when both models were adjusted to find an equal number of actual sites. The best neural network used an input window of 101 nucleotides and a single hidden layer of 9 units. Both the neural network and the perceptron trained on the new compilation performed better than the original perceptron published by Stormo et al. in 1982.  相似文献   

7.
A neural network has been used to predict both the location and the type of beta-turns in a set of 300 nonhomologous protein domains. A substantial improvement in prediction accuracy compared with previous methods has been achieved by incorporating secondary structure information in the input data. The total percentage of residues correctly classified as beta-turn or not-beta-turn is around 75% with predicted secondary structure information. More significantly, the method gives a Matthews correlation coefficient (MCC) of around 0.35, compared with a typical MCC of around 0.20 using other beta-turn prediction methods. Our method also distinguishes the two most numerous and well-defined types of beta-turn, types I and II, with a significant level of accuracy (MCCs 0.22 and 0.26, respectively).  相似文献   

8.
应用神经网络和多元回归技术预测森林产量   总被引:16,自引:0,他引:16  
应用传统统计技术常会因样本小和测量数据不符某种分布而受到限制。本文评价一种前馈型神经网络算法以预测落叶阔叶林产量。另外,还介绍一种由定性变为定量的数据变换方法,以用相对小的样本建立多元回归预测模型。数据变换方法有助于改善多元回归模型的预测效果。在本实验的条件下,研究结果表明神经网络技术能够产生最好的预测效果.  相似文献   

9.
A priori knowledge of secondary structure content can be of great use in theoretical and experimental determination of protein structure. We present a method that uses two computer-simulated neural networks placed in "tandem" to predict the secondary structure content of water-soluble, globular proteins. The first of the two networks, NET1, predicts a protein's helix and strand content given information about the protein's amino acid composition, molecular weight and heme presence. Because NET1 contained more adjustable parameters (network weights) than learning examples, this network experienced problems with memorization, which is the inability to generalize onto new, never-seen-before examples. To overcome this problem, we designed a second network, NET2, which learned to determine when NET1 was in a state of generalization. Together, these two networks produce prediction errors as low as 5.0% and 5.6% for helix and strand content, respectively, on a set of protein crystal structures bearing little homology to those used in network training. A comparison between three other methods including a multiple linear regression analysis, a non-hidden-node network analysis and a secondary structure assignment analysis reveals that our tandem neural network scheme is, indeed, the best method for predicting secondary structure content. The results of our analysis suggest that the knowledge of sequence information is not necessary for highly accurate predictions of protein secondary structure content.  相似文献   

10.
MOTIVATION: beta-turn is an important element of protein structure. In the past three decades, numerous beta-turn prediction methods have been developed based on various strategies. For a detailed discussion about the importance of beta-turns and a systematic introduction of the existing prediction algorithms for beta-turns and their types, please see a recent review (Chou, Analytical Biochemistry, 286, 1-16, 2000). However at present, it is still difficult to say which method is better than the other. This is because of the fact that these methods were developed on different sets of data. Thus, it is important to evaluate the performance of beta-turn prediction methods. RESULTS: We have evaluated the performance of six methods of beta-turn prediction. All the methods have been tested on a set of 426 non-homologous protein chains. It has been observed that the performance of the neural network based method, BTPRED, is significantly better than the statistical methods. One of the reasons for its better performance is that it utilizes the predicted secondary structure information. We have also trained, tested and evaluated the performance of all methods except BTPRED and GORBTURN, on new data set using a 7-fold cross-validation technique. There is a significant improvement in performance of all the methods when secondary structure information is incorporated. Moreover, after incorporating secondary structure information, the Sequence Coupled Model has yielded better results in predicting beta-turns as compared with other methods. In this study, both threshold dependent and independent (ROC) measures have been used for evaluation.  相似文献   

11.
Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptide likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use this data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.  相似文献   

12.
Fuchs A  Kirschner A  Frishman D 《Proteins》2009,74(4):857-871
Despite rapidly increasing numbers of available 3D structures, membrane proteins still account for less than 1% of all structures in the Protein Data Bank. Recent high-resolution structures indicate a clearly broader structural diversity of membrane proteins than initially anticipated, motivating the development of reliable structure prediction methods specifically tailored for this class of molecules. One important prediction target capturing all major aspects of a protein's 3D structure is its contact map. Our analysis shows that computational methods trained to predict residue contacts in globular proteins perform poorly when applied to membrane proteins. We have recently published a method to identify interacting alpha-helices in membrane proteins based on the analysis of coevolving residues in predicted transmembrane regions. Here, we present a substantially improved algorithm for the same problem, which uses a newly developed neural network approach to predict helix-helix contacts. In addition to the input features commonly used for contact prediction of soluble proteins, such as windowed residue profiles and residue distance in the sequence, our network also incorporates features that apply to membrane proteins only, such as residue position within the transmembrane segment and its orientation toward the lipophilic environment. The obtained neural network can predict contacts between residues in transmembrane segments with nearly 26% accuracy. It is therefore the first published contact predictor developed specifically for membrane proteins performing with equal accuracy to state-of-the-art contact predictors available for soluble proteins. The predicted helix-helix contacts were employed in a second step to identify interacting helices. For our dataset consisting of 62 membrane proteins of solved structure, we gained an accuracy of 78.1%. Because the reliable prediction of helix interaction patterns is an important step in the classification and prediction of membrane protein folds, our method will be a helpful tool in compiling a structural census of membrane proteins.  相似文献   

13.
14.
Most biological networks are modular but previous work with small model networks has indicated that modularity does not necessarily lead to increased functional efficiency. Most biological networks are large, however, and here we examine the relative functional efficiency of modular and non-modular neural networks at a range of sizes. We conduct a detailed analysis of efficiency in networks of two size classes: ‘small’ and ‘large’, and a less detailed analysis across a range of network sizes. The former analysis reveals that while the modular network is less efficient than one of the two non-modular networks considered when networks are small, it is usually equally or more efficient than both non-modular networks when networks are large. The latter analysis shows that in networks of small to intermediate size, modular networks are much more efficient that non-modular networks of the same (low) connective density. If connective density must be kept low to reduce energy needs for example, this could promote modularity. We have shown how relative functionality/performance scales with network size, but the precise nature of evolutionary relationship between network size and prevalence of modularity will depend on the costs of connectivity.  相似文献   

15.
Kaur H  Raghava GP 《FEBS letters》2004,564(1-2):47-57
In this study, an attempt has been made to develop a neural network-based method for predicting segments in proteins containing aromatic-backbone NH (Ar-NH) interactions using multiple sequence alignment. We have analyzed 3121 segments seven residues long containing Ar-NH interactions, extracted from 2298 non-redundant protein structures where no two proteins have more than 25% sequence identity. Two consecutive feed-forward neural networks with a single hidden layer have been trained with standard back-propagation as learning algorithm. The performance of the method improves from 0.12 to 0.15 in terms of Matthews correlation coefficient (MCC) value when evolutionary information (multiple alignment obtained from PSI-BLAST) is used as input instead of a single sequence. The performance of the method further improves from MCC 0.15 to 0.20 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields an overall prediction accuracy of 70.1% and an MCC of 0.20 when tested by five-fold cross-validation. Overall the performance is 15.2% higher than the random prediction. The method consists of two neural networks: (i) a sequence-to-structure network which predicts the aromatic residues involved in Ar-NH interaction from multiple alignment of protein sequences and (ii) a structure-to structure network where the input consists of the output obtained from the first network and predicted secondary structure. Further, the actual position of the donor residue within the 'potential' predicted fragment has been predicted using a separate sequence-to-structure neural network. Based on the present study, a server Ar_NHPred has been developed which predicts Ar-NH interaction in a given amino acid sequence. The web server Ar_NHPred is available at and (mirror site).  相似文献   

16.
Saha S  Raghava GP 《Proteins》2006,65(1):40-48
B-cell epitopes play a vital role in the development of peptide vaccines, in diagnosis of diseases, and also for allergy research. Experimental methods used for characterizing epitopes are time consuming and demand large resources. The availability of epitope prediction method(s) can rapidly aid experimenters in simplifying this problem. The standard feed-forward (FNN) and recurrent neural network (RNN) have been used in this study for predicting B-cell epitopes in an antigenic sequence. The networks have been trained and tested on a clean data set, which consists of 700 non-redundant B-cell epitopes obtained from Bcipep database and equal number of non-epitopes obtained randomly from Swiss-Prot database. The networks have been trained and tested at different input window length and hidden units. Maximum accuracy has been obtained using recurrent neural network (Jordan network) with a single hidden layer of 35 hidden units for window length of 16. The final network yields an overall prediction accuracy of 65.93% when tested by fivefold cross-validation. The corresponding sensitivity, specificity, and positive prediction values are 67.14, 64.71, and 65.61%, respectively. It has been observed that RNN (JE) was more successful than FNN in the prediction of B-cell epitopes. The length of the peptide is also important in the prediction of B-cell epitopes from antigenic sequences. The webserver ABCpred is freely available at www.imtech.res.in/raghava/abcpred/.  相似文献   

17.
We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single-sequence data, this method achieves a three-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn specifics of a dataset than "black box" methods such as neural networks. It is also conceptually simpler and less computationally costly. We also introduce a novel method for representing and incorporating multiple-sequence alignment information within the prediction algorithm, achieving 72% accuracy over a dataset of 304 non-homologous proteins. This is accomplished by creating a statistical model of the evolutionarily derived correlations between patterns of amino acid substitution and local protein structure. This model consists of parameter vectors, termed "substitution schemata," which probabilistically encode the structure-based heterogeneity in the distributions of amino acid substitutions found in alignments of homologous proteins. The model is optimized for structure prediction by maximizing the mutual information between the set of schemata and the database of secondary structures. Unlike "expert heuristic" methods, this approach has been demonstrated to work well over large datasets. Unlike the opaque neural network algorithms, this approach is physicochemically intelligible. Moreover, the model optimization procedure, the formalism for predicting one-dimensional structural features and our previously developed method for tertiary structure recognition all share a common Bayesian probabilistic basis. This consistency starkly contrasts with the hybrid and ad hoc nature of methods that have dominated this field in recent years.  相似文献   

18.
In this paper we describe an improved neural network method to predict T-cell class I epitopes. A novel input representation has been developed consisting of a combination of sparse encoding, Blosum encoding, and input derived from hidden Markov models. We demonstrate that the combination of several neural networks derived using different sequence-encoding schemes has a performance superior to neural networks derived using a single sequence-encoding scheme. The new method is shown to have a performance that is substantially higher than that of other methods. By use of mutual information calculations we show that peptides that bind to the HLA A*0204 complex display signal of higher order sequence correlations. Neural networks are ideally suited to integrate such higher order correlations when predicting the binding affinity. It is this feature combined with the use of several neural networks derived from different and novel sequence-encoding schemes and the ability of the neural network to be trained on data consisting of continuous binding affinities that gives the new method an improved performance. The difference in predictive performance between the neural network methods and that of the matrix-driven methods is found to be most significant for peptides that bind strongly to the HLA molecule, confirming that the signal of higher order sequence correlation is most strongly present in high-binding peptides. Finally, we use the method to predict T-cell epitopes for the genome of hepatitis C virus and discuss possible applications of the prediction method to guide the process of rational vaccine design.  相似文献   

19.
Protein backbone angle prediction with machine learning approaches   总被引:2,自引:0,他引:2  
MOTIVATION: Protein backbone torsion angle prediction provides useful local structural information that goes beyond conventional three-state (alpha, beta and coil) secondary structure predictions. Accurate prediction of protein backbone torsion angles will substantially improve modeling procedures for local structures of protein sequence segments, especially in modeling loop conformations that do not form regular structures as in alpha-helices or beta-strands. RESULTS: We have devised two novel automated methods in protein backbone conformational state prediction: one method is based on support vector machines (SVMs); the other method combines a standard feed-forward back-propagation artificial neural network (NN) with a local structure-based sequence profile database (LSBSP1). Extensive benchmark experiments demonstrate that both methods have improved the prediction accuracy rate over the previously published methods for conformation state prediction when using an alphabet of three or four states. AVAILABILITY: LSBSP1 and the NN algorithm have been implemented in PrISM.1, which is available from www.columbia.edu/~ay1/. SUPPLEMENTARY INFORMATION: Supplementary data for the SVM method can be downloaded from the Website www.cs.columbia.edu/compbio/backbone.  相似文献   

20.
NETASA: neural network based prediction of solvent accessibility   总被引:3,自引:0,他引:3  
MOTIVATION: Prediction of the tertiary structure of a protein from its amino acid sequence is one of the most important problems in molecular biology. The successful prediction of solvent accessibility will be very helpful to achieve this goal. In the present work, we have implemented a server, NETASA for predicting solvent accessibility of amino acids using our newly optimized neural network algorithm. Several new features in the neural network architecture and training method have been introduced, and the network learns faster to provide accuracy values, which are comparable or better than other methods of ASA prediction. RESULTS: Prediction in two and three state classification systems with several thresholds are provided. Our prediction method achieved the accuracy level upto 90% for training and 88% for test data sets. Three state prediction results provide a maximum 65% accuracy for training and 63% for the test data. Applicability of neural networks for ASA prediction has been confirmed with a larger data set and wider range of state thresholds. Salient differences between a linear and exponential network for ASA prediction have been analysed. AVAILABILITY: Online predictions are freely available at: http://www.netasa.org. Linux ix86 binaries of the program written for this work may be obtained by email from the corresponding author.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号