Similar articles
Found 20 similar articles (search time: 0 ms)
1.
Prediction of protein structural class from the amino acid sequence   Total citations: 9, self-citations: 0, citations by others: 9
P Klein, C DeLisi. Biopolymers, 1986, 25(9): 1659-1672
The multidimensional statistical technique of discriminant analysis is used to allocate amino acid sequences to one of four secondary structural classes: high α content, high β content, mixed α and β, and low content of ordered structure. Discrimination is based on four attributes: estimates of the percentages of α and β structures, and regular variations in the hydrophobic values of residues along the sequence, occurring with periods of 2 and 3.6 residues. The reliability of the method, estimated by classifying 138 sequences from the Brookhaven Protein Data Bank, is 80%, with no misallocations between α-rich and β-rich classes. The reliability can be increased to 84% by making no allocation for proteins classified with odds close to 1. Classification using previously developed secondary structural prediction methods is considerably less reliable, the best result being 64% obtained using predictions based on the Delphi method.
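
The two periodicity attributes mentioned in this abstract can be illustrated with a short sketch. It is not the authors' procedure: the Kyte-Doolittle hydropathy scale, the Fourier-power measure, and the function name `periodicity_power` are all assumptions chosen for illustration, since the abstract names neither a scale nor a formula.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed scale; the abstract names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def periodicity_power(seq, period):
    """Fourier power of the mean-centred hydropathy profile at one period (in residues)."""
    h = [KD[a] for a in seq]
    mean = sum(h) / len(h)
    omega = 2 * math.pi / period
    re = sum((x - mean) * math.cos(omega * k) for k, x in enumerate(h))
    im = sum((x - mean) * math.sin(omega * k) for k, x in enumerate(h))
    return (re * re + im * im) / len(h)


# Hypothetical peptide with strict hydrophobic/polar alternation (strand-like).
alternating = "VKVKVKVKVKVK"
strand_signal = periodicity_power(alternating, 2.0)   # period expected for beta strands
helix_signal = periodicity_power(alternating, 3.6)    # period expected for alpha helices
print(strand_signal > helix_signal)
```

A strictly alternating sequence concentrates its power at period 2, which is the kind of signal a discriminant function could use to favour the β-rich class.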

2.
Hydrogen exchange experiments provide detailed information about the local stability and the solvent accessibility of different regions of the structures of folded proteins, protein complexes, and amyloid fibrils. We introduce an approach to predict protection factors from hydrogen exchange in proteins based on the knowledge of their amino acid sequences without the inclusion of any additional structural information. These results suggest that the propensity of different regions of the structures of globular proteins to undergo local unfolding events can be predicted from their amino acid sequences with an accuracy of 80% or better.

3.
4.
Prediction of RNA binding sites in proteins from amino acid sequence   Total citations: 3, self-citations: 0, citations by others: 3
RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.)

5.
Post-translational modifications (PTMs) occur on almost all proteins analyzed to date. The function of a modified protein is often strongly affected by these modifications and therefore increased knowledge about the potential PTMs of a target protein may increase our understanding of the molecular processes in which it takes part. High-throughput methods for the identification of PTMs are being developed, in particular within the fields of proteomics and mass spectrometry. However, these methods are still in their early stages, and it is indeed advantageous to cut down on the number of experimental steps by integrating computational approaches into the validation procedures. Many advanced methods for the prediction of PTMs exist and many are made publicly available. We describe our experiences with the development of prediction methods for phosphorylation and glycosylation sites and the development of PTM-specific databases. In addition, we discuss novel ideas for PTM visualization (exemplified by kinase landscapes) and improvements for prediction specificity (by using ESS, evolutionary stable sites). As an example, we present a new method for kinase-specific prediction of phosphorylation sites, NetPhosK, which extends our earlier and more general tool, NetPhos. The new server, NetPhosK, is made publicly available at the URL http://www.cbs.dtu.dk/services/NetPhosK/. The issues of underestimation, over-prediction and strategies for improving prediction specificity are also discussed.

6.
7.
Prediction of the location of structural domains in globular proteins   Total citations: 7, self-citations: 0, citations by others: 7
The location of structural domains in proteins is predicted from the amino acid sequence, based on the analysis of a computed contact map for the protein, the average distance map (ADM). Interactions between residues i and j in a protein are subdivided into several ranges, according to the separation |i-j| in the amino acid sequence. Within each range, average spatial distances between every pair of amino acid residues are computed from a database of known protein structures. Infrequently occurring pairs are omitted as being statistically insignificant. The average distances are used to construct a predicted ADM. The ADM is analyzed for the occurrence of regions with high densities of contacts (compact regions). Locations of rapid changes of density between various parts of the map are determined by the use of scanning plots of contact densities. These locations serve to pinpoint the distribution of compact regions. This distribution, in turn, is used to predict boundaries of domains in the protein. The technique provides an objective method for the location of domains both on a contact map derived from a known three-dimensional protein structure, the real distance map (RDM), and on an ADM. While most other published methods for the identification of domains locate them in the known three-dimensional structure of a protein, the technique presented here also permits the prediction of domains in proteins of unknown spatial structure, as the construction of the ADM for a given protein requires knowledge of only its amino acid sequence.

8.
An algorithm was derived to relate the amino acid sequence of a collagen triple helix to its thermal stability. This calculation is based on the triple helical stabilization propensities of individual residues and their intermolecular and intramolecular interactions, as quantitated by melting temperature values of host-guest peptides. Experimental melting temperature values of a number of triple helical peptides of varying length and sequence were successfully predicted by this algorithm. However, predicted T(m) values are significantly higher than experimental values when there are strings of oppositely charged residues or concentrations of like charges near the terminus. Application of the algorithm to collagen sequences highlights regions of unusually high or low stability, and these regions often correlate with biologically significant features. The prediction of stability from sequence indicates an understanding of the major forces maintaining this protein motif. The use of highly favorable KGE and KGD sequences is seen to complement the stabilizing effects of imino acids in modulating stability and may become dominant in the collagenous domains of bacterial proteins that lack hydroxyproline. The effect of single amino acid mutations in the X and Y positions can be evaluated with this algorithm. An interactive collagen stability calculator based on this algorithm is available online.

9.
The dynamic differential equation model developed and tested for bovine pancreatic trypsin inhibitor and tuna ferrocytochrome c in Ponnuswamy, P.K. & Bhaskaran, R. (Int. J. Peptide Protein Res. 24, 168-179, 1984) is extended for 17 more protein crystals in this work. Average displacements are computed for 20 amino acid residues observed in 19 proteins. Detailed information on the dynamic behaviour of the individual proteins and individual residues is presented. The effect of atomic packing on the fluctuations of the amino acid residues in alpha-chymotrypsin is illustrated. A number of general points on the dynamic characteristics of globular protein molecules are presented.

10.
The conformational parameters $P_k$ for each amino acid species ($j = 1$–$20$) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residues in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{\mathrm{av}} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{\mathrm{av}}$ to convert a multiplicative process to a summation, i.e., $\ln (P_k)_{\mathrm{av}} = (1/n) \sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. This differs from the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases, mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
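
The contrast this abstract draws between a geometric and an arithmetic window average can be made concrete with a small sketch. The parameter values in `window` are hypothetical and are not taken from any published Chou-Fasman table; the function names are likewise illustrative.

```python
import math


def arithmetic_mean(params):
    """Plain average of the per-residue parameters (the Chou-Fasman style average)."""
    return sum(params) / len(params)


def geometric_mean(params):
    """Geometric mean, computed as exp of the mean log to turn the product into a sum."""
    return math.exp(sum(math.log(p) for p in params) / len(params))


# Hypothetical helix parameters for a 4-residue window containing one strong breaker.
window = [1.4, 1.2, 0.5, 1.3]

print(arithmetic_mean(window))
print(geometric_mean(window))
```

For any window whose values are not all equal, the geometric mean is lower than the arithmetic mean, and the gap widens when a strong breaker (a value well below 1) is present. This matches the abstract's remark that the two approaches diverge most when strong formers and breakers occur.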

11.
An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4α-helical bundles, (2) parallel (α/β)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class. © 1993 Wiley-Liss, Inc.

12.
A new approach to predicting structural classes of protein domain sequences is presented in this paper. Besides the amino acid composition, the compositions of several dipeptides, tripeptides, tetrapeptides, pentapeptides and hexapeptides are taken into account based on stepwise discriminant analysis. The result of the jackknife test shows that this new approach can lead to higher predictive sensitivity and specificity for reduced sequence similarity datasets. Considering the dataset PDB40-B constructed by Brenner and colleagues, 75.2% of protein domain sequences are correctly assigned in the jackknife test for the four structural classes: all-alpha, all-beta, alpha/beta and alpha + beta, which is improved by 19.4% in the jackknife test and 25.5% in the resubstitution test, in contrast with the component-coupled algorithm using amino acid composition alone (AAC approach) for the same dataset. In the cross-validation test with dataset PDB40-J constructed by Park and colleagues, more than 80% predictive accuracy is obtained. Furthermore, for the dataset constructed by Chou and Maggiora, accuracies of 100% and 99.7% can be easily achieved, respectively, in the resubstitution test and in the jackknife test merely by taking the composition of dipeptides into account. Therefore, this new method provides an effective tool to extract valuable information from protein sequences, which can be used for the systematic analysis of small or medium size protein sequences. The computer programs used in this paper are available on request.
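
The k-peptide composition features this abstract relies on can be sketched as follows. The function name `kpeptide_composition` and the use of overlapping windows are assumptions; the abstract does not state how peptides were counted, and the stepwise discriminant analysis itself is not shown.

```python
from collections import Counter


def kpeptide_composition(seq, k):
    """Fraction of each overlapping k-residue peptide in seq (k=1 gives amino acid composition)."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {pep: n / total for pep, n in counts.items()}


# Hypothetical short sequence used only to show the output shape.
dipeptides = kpeptide_composition("AAGA", 2)
print(dipeptides)
```

With 20 amino acids there are 400 possible dipeptides and 8000 tripeptides, so a feature-selection step such as the stepwise procedure mentioned in the abstract is a natural way to keep the feature set manageable.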

13.

Background  

Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.

14.
The predictive limits of the amino acid composition for the secondary structural content (percentage of residues in the secondary structural states helix, sheet, and coil) in proteins are assessed quantitatively. For the first time, techniques for prediction of secondary structural content are presented which rely on the amino acid composition as the only information on the query protein. In our first method, the amino acid composition of an unknown protein is represented by the best (in a least square sense) linear combination of the characteristic amino acid compositions of the three secondary structural types computed from a learning set of tertiary structures. The second technique is a generalization of the first one and takes into account also possible compositional couplings between any two sorts of amino acids. Its mathematical formulation results in an eigenvalue/eigenvector problem of the second moment matrix describing the amino acid compositional fluctuations of secondary structural types in various proteins of a learning set. Possible correlations of the principal directions of the eigenspaces with physical properties of the amino acids were also checked. For example, the first two eigenvectors of the helical eigenspace correlate with the size and hydrophobicity of the residue types respectively. As learning and test sets of tertiary structures, we utilized representative, automatically generated subsets of Protein Data Bank (PDB) consisting of non-homologous protein structures at the resolution thresholds ≤1.8Å, ≤2.0Å, ≤2.5Å, and ≤3.0Å. We show that the consideration of compositional couplings improves prediction accuracy, albeit not dramatically. 
Whereas in the self-consistency test (learning with the protein to be predicted), a clear decrease of prediction accuracy with worsening resolution is observed, the jackknife test (leave the predicted protein out) yielded best results for the largest dataset (≤3.0 Å, almost no difference to the self-consistency test!), i.e., only this set, with more than 400 proteins, is sufficient for stable computation of the parameters in the prediction function of the second method. The average absolute errors in predicting the fraction of helix, sheet, and coil from amino acid composition of the query protein are 13.7, 12.6, and 11.4%, respectively, with r.m.s. deviations in the range of 8.6–11.8% for the 3.0 Å dataset in a jackknife test. The absolute precision of the average absolute errors is in the range of 1–3% as measured for other representative subsets of the PDB. Secondary structural content prediction methods found in the literature have been clustered in accordance with their prediction accuracies. To our surprise, much more complex secondary structure prediction methods utilized for the same purpose of secondary structural content prediction achieve prediction accuracies very similar to those of the present analytic techniques, implying that all the information beyond the amino acid composition is, in fact, mainly utilized for positioning the secondary structural state in the sequence but not for determination of the overall number of residues in a secondary structural type. This result implies that higher prediction accuracies cannot be achieved relying solely on the amino acid composition of an unknown query protein as prediction input. Our prediction program SSCP has been made available as a World Wide Web and E-mail service. © 1996 Wiley-Liss, Inc.
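
The first method of this abstract, expressing a query composition as the best least-squares linear combination of characteristic compositions, can be sketched in plain Python. This is an illustration under stated assumptions, not the SSCP program: the four-letter alphabet and the three characteristic compositions are invented, no constraints are placed on the weights, and `solve` and `fit_content` are hypothetical helper names.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_content(composition, basis):
    """Least-squares weights w minimising |composition - sum_k w[k] * basis[k]|^2 (normal equations)."""
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(p * q for p, q in zip(bi, composition)) for bi in basis]
    return solve(gram, rhs)


# Toy 4-residue alphabet; these characteristic compositions are invented.
helix = [0.5, 0.3, 0.1, 0.1]
sheet = [0.1, 0.2, 0.5, 0.2]
coil = [0.2, 0.2, 0.2, 0.4]

# Build a query that is exactly 60% helix-like, 10% sheet-like, 30% coil-like.
query = [0.6 * h + 0.1 * s + 0.3 * c for h, s, c in zip(helix, sheet, coil)]
weights = fit_content(query, [helix, sheet, coil])
print(weights)
```

Because the toy query is an exact mixture, the fit recovers the mixing fractions up to rounding. Real compositions would leave a residual, which is one source of the prediction errors the abstract reports.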

15.
The prediction of the secondary structure content (α-helix and β-strand content) of a globular protein may play an important complementary role in the prediction of the protein's structure. We propose a new prediction algorithm based on Chou's database [Chou (1995), Proteins Struct. Funct. Genet. 21, 319]. The new algorithm is an improved multiple linear regression method, taking the nonlinear and coupling terms of the frequencies of different amino acids into account. The prediction is also based on the structural classes of proteins. A resubstitution examination for the algorithm shows that the average errors are 0.040 and 0.033 for the prediction of α-helix content and β-strand content, respectively. The examination of cross-validation, the jackknife analysis, shows that the average errors are 0.051 and 0.044 for the prediction of α-helix content and β-strand content, respectively. Both examinations indicate the self-consistency and the extrapolative effectiveness of the new algorithm. Compared with the other methods available currently, our method has the merits of simplicity and convenience for use, as well as a high prediction accuracy. By incorporating the prediction of the structural classes, the only input of our method is the amino acid composition of the protein to be predicted.

16.
A neural network-based tool, TargetP, for large-scale subcellular location prediction of newly identified proteins has been developed. Using N-terminal sequence information only, it discriminates between proteins destined for the mitochondrion, the chloroplast, the secretory pathway, and other localizations with a success rate of 85% (plant) or 90% (non-plant) on redundancy-reduced test sets. From a TargetP analysis of the recently sequenced Arabidopsis thaliana chromosomes 2 and 4 and the Ensembl Homo sapiens protein set, we estimate that 10% of all plant proteins are mitochondrial and 14% chloroplastic, and that the abundance of secretory proteins, in both Arabidopsis and Homo, is around 10%. TargetP also predicts cleavage sites with levels of correctly predicted sites ranging from approximately 40% to 50% (chloroplastic and mitochondrial presequences) to above 70% (secretory signal peptides). TargetP is available as a web-server at http://www.cbs.dtu.dk/services/TargetP/.

17.
Probable models of the secondary structure of calmodulin from vertebrates, invertebrates, higher plants and fungi were obtained. The results support the idea that the secondary structure of calmodulin from different sources is broadly similar. Peculiarities of the secondary structure of calmodulin from invertebrates and fungi were analysed.

18.
We describe a novel presentation of the conformation of the backbone atoms for proteins of known structure. Given the Cα atom cartesian co-ordinates from X-ray crystallography, a matrix is calculated, where the ijth element of the matrix is the cosine of the angle between the direction of the chain at residue i and the direction of the chain at residue j. These “direction matrices” have distinctive patterns which correspond to α-helix, extended structure, straight or bent segments, “superhelix”, and many other important structural features. We discuss the direction matrices for a number of proteins, and make some generalizations on the basic principles of protein folding.
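
A direction matrix of the kind described in this abstract can be sketched from a list of Cα coordinates. One assumption is made loudly here: the chain direction at residue i is taken as the unit vector from Cα(i) to Cα(i+1), which the abstract does not specify. The coordinates and the function name `direction_matrix` are hypothetical.

```python
import math


def direction_matrix(ca_coords):
    """Matrix of cosines between chain directions; direction i is assumed to be CA(i) -> CA(i+1)."""
    dirs = []
    for (x0, y0, z0), (x1, y1, z1) in zip(ca_coords, ca_coords[1:]):
        d = (x1 - x0, y1 - y0, z1 - z0)
        norm = math.sqrt(sum(c * c for c in d))
        dirs.append(tuple(c / norm for c in d))
    # The dot product of two unit vectors is the cosine of the angle between them.
    return [[sum(a * b for a, b in zip(u, v)) for v in dirs] for u in dirs]


# Hypothetical coordinates: a straight segment followed by a right-angle turn.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
for row in direction_matrix(coords):
    print(row)
```

Under this definition a straight segment shows up as a block of values near 1, and a perpendicular turn as values near 0, which is the sort of pattern the abstract associates with straight and bent segments.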

19.
E V Barkovskiĭ. Biofizika, 1986, 31(6): 944-948
The distribution of amino acid pairs i, i + 1 in alpha-helical, beta-sheet and random coil regions of 46 globular proteins comprising 8115 amino acid residues was analyzed. Statistical analysis of the data rejects the null hypothesis of random pairing of amino acid residues i, i + 1 in beta-sheet and random coil configurations. The distribution of amino acid pairs i, i + 1 in alpha-helical configurations does not differ from random pairing.
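
A test of adjacent-pair counts against random pairing can be sketched as below. The abstract does not say which statistic was used, so the Pearson chi-square here is an assumption, as are the function name `pair_chi_square` and the toy segments.

```python
from collections import Counter


def pair_chi_square(segments):
    """Pearson chi-square of observed (i, i+1) pair counts against independent pairing."""
    pairs = Counter()
    for seg in segments:
        pairs.update(zip(seg, seg[1:]))
    total = sum(pairs.values())
    first = Counter()   # how often each residue occupies position i
    second = Counter()  # how often each residue occupies position i + 1
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n
    chi2 = 0.0
    for a in first:
        for b in second:
            expected = first[a] * second[b] / total
            chi2 += (pairs[(a, b)] - expected) ** 2 / expected
    return chi2


# Toy two-letter segments: strict alternation versus all pairs equally represented.
print(pair_chi_square(["ABABAB"]))
print(pair_chi_square(["AA", "AB", "BA", "BB"]))
```

A large statistic relative to the chi-square distribution with the appropriate degrees of freedom would argue against random pairing. With 20 residue types, many of the 400 cells would have small expected counts, so in practice rare pairs might need pooling.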

20.
In the native folded state of globular proteins, amino acid residues place themselves at various positions from the centroid of the molecule. Applying information theory to 19 protein crystals, the spatial preferences of residues were determined from their frequencies of occurrence within various concentric ellipsoidal zones of the proteins. The intrinsic spatial preferences of individual residues are related to their physical and chemical properties. The directing power of individual residues on the chain path and the spatial information contained in doublets of residues were also determined. The derived information is used to predict the spatial/zonal preference of residues in carp myogen using knowledge of the amino acid sequence. The implications of packing densities in different spatial zones are discussed.
