Similar literature
Found 20 similar articles (search time: 31 ms)
1.
2.
Kuhn M, Meiler J, Baker D. Proteins, 2004, 54(2): 282-288
Beta-sheet proteins have been particularly challenging for de novo structure prediction methods, which tend to pair adjacent beta-strands into beta-hairpins and produce overly local topologies. To remedy this problem and facilitate de novo prediction of beta-sheet protein structures, we have developed a neural network that classifies strand-loop-strand motifs as local hairpins or nonlocal diverging turns, using the amino acid sequence as input. The neural network is trained with a representative subset of the Protein Data Bank and achieves a prediction accuracy of 75.9 ± 4.4% compared to a baseline prediction rate of 59.1%. Hairpins are predicted with an accuracy of 77.3 ± 6.1%, diverging turns with an accuracy of 73.9 ± 6.0%. Incorporation of the beta-hairpin/diverging turn classification into the ROSETTA de novo structure prediction method led to higher contact order models and somewhat improved tertiary structure predictions for a test set of 11 all-beta proteins and 3 alpha-beta proteins. The beta-hairpin/diverging turn classification from amino acid sequences is available online for academic use (Meiler and Kuhn, 2003; www.jens-meiler.de/turnpred.html).

3.
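By studying algorithms based on neural network weight matrices, this work explores the intrinsic relationship between protein secondary structure and amino acid sequence, with the aim of improving the accuracy of secondary structure prediction from primary sequence. Neural networks perform well at feature classification, and the matrix of neuron connection weights obtained after training captures the intrinsic features and regularities of the samples. The study scores predictions with the neural network weight matrix, uses a shifted-alignment method to find sensitive amino acid neighborhoods, and analyzes how the test set behaves under different window lengths. Experiments show that prediction performance changes markedly at a sliding-window length of L=7, and that the amino acid residue at neighborhood position P=4 strengthens prediction performance. The approach offers a new algorithm design for protein secondary structure prediction based on local sequence features.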
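The sliding-window scoring described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `-` padding character, and the layout of `weights` (one residue-to-score dictionary per window position) are assumptions made for illustration only.

```python
def sliding_windows(sequence, length=7, pad="-"):
    """Return one window of `length` residues centred on each position.

    The ends of the sequence are padded with `pad` (an assumed convention)
    so that every residue gets a full-length window.
    """
    if length % 2 == 0:
        raise ValueError("window length must be odd")
    half = length // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + length] for i in range(len(sequence))]


def score_window(window, weights):
    """Sum position-specific weights for a window.

    `weights[p]` is a hypothetical dict mapping a residue letter to its
    score at window position p; residues without an entry score 0.0.
    """
    return sum(weights[p].get(res, 0.0) for p, res in enumerate(window))
```

With the default `length=7`, each residue is scored from itself and its three neighbors on either side, which corresponds to the window length the abstract reports as significant.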

4.
Prediction of the disulfide-bonding state of cysteine in proteins   Cited by: 5 (self-citations: 0; citations by others: 5)
The bonding states of cysteine play important functional and structural roles in proteins. In particular, disulfide bond formation is one of the most important factors influencing the three-dimensional fold of proteins. Proteins of known structure were used to teach computer-simulated neural networks rules for predicting the disulfide-bonding state of a cysteine given only its flanking amino acid sequence. Resulting networks make accurate predictions on sequences different from those used in training, suggesting that local sequence greatly influences cysteines in disulfide bond formation. The average prediction rate after seven independent network experiments is 81.4% for disulfide-bonded and 80.0% for non-disulfide-bonded scenarios. Predictive accuracy is related to the strength of network output activities. Network weights reveal interesting position-dependent amino acid preferences and provide a physical basis for understanding the correlation between the flanking sequence and a cysteine's disulfide-bonding state. Network predictions may be used to increase or decrease the stability of existing disulfide bonds or to aid the search for potential sites to introduce new disulfide bonds.

5.
The architecture and weights of an artificial neural network model that predicts putative transmembrane sequences have been developed and optimized by the algorithm of structure evolution. The resulting filter is able to classify membrane/nonmembrane transition regions in sequences of integral human membrane proteins with high accuracy. Similar results have been obtained for both training and test set data, indicating that the network has focused on general features of transmembrane sequences rather than specializing on the training data. Seven physicochemical amino acid properties have been used for sequence encoding. The predictions are compared to hydrophobicity plots.

6.
Peptide ligands of G protein-coupled receptors constitute valuable natural lead structures for the development of highly selective drugs and high-affinity tools to probe ligand-receptor interaction. Currently, pharmacological and metabolic modification of natural peptides involves either an iterative trial-and-error process based on structure-activity relationships or screening of peptide libraries that contain many structural variants of the native molecule. Here, we present a novel neural network architecture for the improvement of metabolic stability without loss of bioactivity. In this approach the peptide sequence determines the topology of the neural network and each cell corresponds one-to-one to a single amino acid of the peptide chain. Using a training set, the learning algorithm calculated weights for each cell. The resulting network calculated the fitness function in a genetic algorithm to explore the virtual space of all possible peptides. The network training was based on gradient descent techniques, which rely on the efficient calculation of the gradient by back-propagation. After three consecutive cycles of sequence design by the neural network, peptide synthesis, and bioassay, this new approach yielded a ligand with 70-fold higher metabolic stability compared to the wild-type peptide without loss of the subnanomolar activity in the biological assay. Combining specialized neural networks with an exploration of the combinatorial amino acid sequence space by genetic algorithms represents a novel rational strategy for peptide design and optimization.

7.
Hamilton N, Burrage K, Ragan MA, Huber T. Proteins, 2004, 56(4): 679-684
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations.
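The "best L/2" and "best L/10" accuracy figures quoted in this abstract follow a common evaluation scheme, sketched below in Python. This is an illustration under stated assumptions rather than the authors' evaluation code: the function name, the dictionary of pair scores, and the rounding of L/divisor down to an integer (with a minimum of one prediction) are all hypothetical choices.

```python
def top_fraction_accuracy(scores, contacts, seq_len, divisor):
    """Accuracy of the best seq_len // divisor predicted contact pairs.

    scores   : dict mapping a residue pair (i, j) to a predicted score
    contacts : set of residue pairs that are true contacts
    seq_len  : sequence length L
    divisor  : 2 for the best L/2 predictions, 10 for the best L/10
    """
    n = max(1, seq_len // divisor)  # assumed rounding: floor, at least one
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in contacts)
    return hits / len(ranked)
```

Averaging this value over all test proteins would give a per-set figure comparable in form to the percentages reported above.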

8.
NETASA: neural network based prediction of solvent accessibility   Cited by: 3 (self-citations: 0; citations by others: 3)
MOTIVATION: Prediction of the tertiary structure of a protein from its amino acid sequence is one of the most important problems in molecular biology. The successful prediction of solvent accessibility will be very helpful to achieve this goal. In the present work, we have implemented a server, NETASA, for predicting solvent accessibility of amino acids using our newly optimized neural network algorithm. Several new features in the neural network architecture and training method have been introduced, and the network learns faster to provide accuracy values that are comparable to or better than those of other methods of ASA prediction. RESULTS: Predictions in two- and three-state classification systems with several thresholds are provided. Our prediction method achieved accuracy levels of up to 90% for training and 88% for test data sets. Three-state prediction results provide a maximum of 65% accuracy for training and 63% for the test data. Applicability of neural networks for ASA prediction has been confirmed with a larger data set and wider range of state thresholds. Salient differences between a linear and exponential network for ASA prediction have been analysed. AVAILABILITY: Online predictions are freely available at: http://www.netasa.org. Linux ix86 binaries of the program written for this work may be obtained by email from the corresponding author.

9.
10.
11.
An artificial neural network has been developed for the recognition and prediction of transmembrane regions in the amino acid sequences of human integral membrane proteins. It provides an additional prediction method besides the common hydrophobicity analysis by statistical means. Membrane/nonmembrane transition regions are predicted with 92% accuracy in both training and independent test data. The method used for the development of the neural filter is the algorithm of structure evolution. It subjects both the architecture and parameters of the system to a systematic optimization process and carries out local search in the respective structure and parameter spaces. The training technique of incomplete induction as part of the structure evolution provides for a comparatively general solution of the problem that is described by input-output relations only. Seven physicochemical side-chain properties were used to encode the amino acid sequences. It was found that geometric parameters like side-chain volume, bulkiness, or surface area are of minor importance. The properties polarity, refractivity, and hydrophobicity, however, turned out to support feature extraction. It is concluded that membrane transition regions in proteins are encoded in sequences as a characteristic feature based on the respective side-chain properties. The method of structure evolution is described in detail for this particular application and suggestions for further development of amino acid sequence filters are made. © 1996 John Wiley & Sons, Inc.

12.
We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three-layer back-propagation network and (2) a learning vector quantization network. The results of these methods are compared to those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used to drive these algorithms consist of the normalized frequency of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values the structural class predictions for each protein (all-alpha, all-beta, or alpha-beta classes) are derived. Examples consisting of 64 previously classified proteins were randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best performing algorithm on the test sets was the learning vector quantization network using 17 inputs, obtaining a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes. The differences between algorithms are in general not statistically significant. These results show that information exists in protein primary sequences that is easily obtainable and useful for the prediction of protein structural class by neural networks as well as by standard statistical clustering algorithms.
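The normalized amino acid frequencies used as network input in this abstract can be computed along the following lines. This is a minimal Python sketch, not the authors' code: the function name, the alphabetical ordering of the 20 standard residues, and the decision to ignore non-standard letters are assumptions, and the six hydrophobic pattern frequencies are not covered because the abstract does not define them.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order (assumed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the normalized frequency of each standard amino acid.

    Letters outside the 20 standard residues are ignored (an assumption),
    so the returned 20 values always sum to 1.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector of this kind, possibly truncated to the most informative residues, is the sort of fixed-length input that either network type described above could accept.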

13.
14.
We have compared a novel sequence-structure matching technique, FORESST, for detecting remote homologs to three existing sequence-based methods: local amino acid sequence similarity by BLASTP, hidden Markov models (HMMs) of sequences of protein families using SAM, and HMMs based on sequence motifs identified using meta-MEME. FORESST compares predicted secondary structures to a library of structural families of proteins, using HMMs. Altogether 45 proteins from nine structural families in the database CATH were used in a cross-validated test of the fold assignment accuracy of each method. Local sequence similarity of a query sequence to a protein family is measured by the highest segment pair (HSP) score. Each of the HMM-based approaches (FORESST, MEME, amino acid sequence-based HMM) yielded a log-odds score for the query sequence. In order to make a fair comparison among these methods, the scores for each method were converted to Z-scores in a uniform way by comparing the raw scores of a query protein with the corresponding scores for a set of unrelated proteins. Z-scores were analyzed as a function of the maximum pairwise sequence identity (MPSID) of the query sequence to sequences used in training the model. For MPSID above 20%, the Z-scores increase linearly with MPSID for the sequence-based methods but remain roughly constant for FORESST. Below 15%, average Z-scores are close to zero for the sequence-based methods, whereas the FORESST method yielded average Z-scores of 1.8 and 1.1, using observed and predicted secondary structures, respectively. This demonstrates the advantage of the sequence-structure method for detecting remote homologs.
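The conversion of raw scores to Z-scores against a background of unrelated proteins, as described in this abstract, can be sketched in Python as follows. The abstract does not give the exact formula, so this is an assumption: the function name is hypothetical, and the sample standard deviation is used where the authors may have used a different estimator.

```python
from statistics import mean, stdev


def z_score(raw_score, background_scores):
    """Express a query's raw score in standard deviations above background.

    background_scores holds the same method's scores for a set of
    unrelated proteins; the sample standard deviation is an assumed choice.
    """
    mu = mean(background_scores)
    sigma = stdev(background_scores)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (raw_score - mu) / sigma
```

Because each method's scores are rescaled against its own background, an HSP score and a log-odds score become comparable on the same scale, which is the point of the uniform conversion.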

15.
A model is presented in which the formation of alpha-helices and beta-structures is determined by the joint action of three elements: N-terminal, internal, and C-terminal fragments. An algorithm for calculating their localization in a given amino acid sequence was constructed on the basis of this model. The preference of fragments of the amino acid sequence for a definite type of secondary structure was estimated from the corresponding average values of linear discriminant functions dsk (s = alpha, beta; k = N, in, C). The latter were constructed in the previous paper on the basis of the significant characteristics revealed there. These integral characteristics are used for calculating the localization of discrete secondary structures. For the training set of 72 proteins, the overall prediction gives 71% correctly predicted residues for 3 states (alpha, beta, c) and 62% for 4 states (alpha, beta, c, t). For the control set (15 proteins) the accuracy of prediction is about 65%. The essential advantages of this method are: 1) the possibility of localizing the discrete secondary structures; 2) the high accuracy of prediction of long secondary structures (approximately 90% for alpha-helices and approximately 80% for beta-structures), which is important for determining the protein fold. The influence of mutations on the secondary structure of proteins was investigated. An abnormally high stability of the secondary structures of immunoglobulins to mutations was revealed. This probably results from selection during evolution of amino acid sequence variants that provide the functional variability of antigenic determinants while keeping the tertiary structure of the protein invariant.

16.
We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it is able to extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient. The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4-16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In a third stage of the SOM procedure, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling. The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.

17.
An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer-simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4α-helical bundles, (2) parallel (α/β)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class. © 1993 Wiley-Liss, Inc.

18.
Subcellular location is an important functional annotation of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is necessary for large-scale genome analysis. This paper describes a protein subcellular localization method which extracts features from protein profiles rather than from amino acid sequences. The protein profile represents a protein family, discards part of the sequence information that is not conserved throughout the family and therefore is more sensitive than the amino acid sequence. The amino acid compositions of the whole profile and of the N-terminus of the profile are extracted, respectively, to train and test the probabilistic neural network classifiers. On two benchmark datasets, the overall accuracies of the proposed method reach 89.1% and 68.9%, respectively. The prediction results show that the proposed method performs better than methods based on amino acid sequences. The prediction results of the proposed method are also compared with Subloc on two redundancy-reduced datasets.

19.
Structural genomics projects as well as ab initio protein structure prediction methods provide structures of proteins with no sequence or fold similarity to proteins with known functions. These are often low-resolution structures that may only include the positions of C alpha atoms. We present a fast and efficient method to predict DNA-binding proteins from just the amino acid sequences and low-resolution, C alpha-only protein models. The method uses the relative proportions of certain amino acids in the protein sequence, the asymmetry of the spatial distribution of certain other amino acids, and the dipole moment of the molecule. These quantities are used in a linear formula, with coefficients derived from logistic regression performed on a training set, and DNA-binding is predicted based on whether the result is above a certain threshold. We show that the method is insensitive to errors in the atomic coordinates and provides correct predictions even on inaccurate protein models. We demonstrate that the method is capable of predicting proteins with novel binding site motifs and structures solved in an unbound state. The accuracy of our method is close to that of another published method that uses all-atom structures, time-consuming calculations and information on conserved residues.

20.

One fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest. We find that the network can predict the wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely to be correct. Predictions of the consensus are less accurate and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mispredictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.

