Similar Articles
1.
2.
3.

Background

The segment overlap score (SOV) has been used to evaluate predicted protein secondary structures (sequences composed of helix (H), strand (E), and coil (C)) by comparing them with native or reference secondary structures (also sequences of H, E, and C). SOV’s advantage is that it considers the size of continuous overlapping segments and assigns extra allowance to longer continuous overlapping segments, instead of judging only from the percentage of individual positions that overlap, as the Q3 score does. However, we found a drawback in its previous definition: it cannot ensure that the allowance increases when more residues in a segment are predicted accurately.

Results

A new way of assigning allowance has been designed that keeps all the advantages of the previous SOV score definitions and ensures that the amount of allowance assigned increases when more elements in a segment are predicted accurately. Furthermore, our improved SOV achieves a higher correlation with the quality of protein models measured by GDT-TS score and TM-score, indicating a better ability to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found threshold values for distinguishing two protein structures (SOV_refine > 0.19) and for indicating whether two proteins are in the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures, respectively). We provided two further example applications: using SOV as a machine learning feature for protein model quality assessment, and comparing different definitions of topologically associating domains. In both cases, our newly defined SOV score resulted in better performance.

Conclusions

The SOV score can be widely used in bioinformatics research and in other fields that need to compare two sequences of letters in which continuous segments have important meanings. We also generalized the previous SOV definitions so that they work for sequences composed of more than three states (e.g., the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl, with source code released. The software can be downloaded from http://dna.cs.miami.edu/SOV/.
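For readers unfamiliar with the score, the sketch below implements the classic per-state SOV formula (commonly called SOV'99), not the SOV_refine allowance described above, whose exact definition this abstract does not give. Segments are maximal runs of one state; for each observed segment s1 and each overlapping predicted segment s2 of the same state, the allowance is δ = min(maxov − minov, minov, ⌊len(s1)/2⌋, ⌊len(s2)/2⌋). The function names and the `states` parameter are illustrative assumptions; passing a longer state alphabet shows how the idea extends beyond three states.

```python
def segments(seq, state):
    """Return (start, end) index pairs (inclusive) of maximal runs of `state`."""
    runs, start = [], None
    for i, c in enumerate(seq):
        if c == state and start is None:
            start = i
        elif c != state and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(seq) - 1))
    return runs


def sov99(observed, predicted, states="HEC"):
    """Classic SOV (percent) of `predicted` against `observed`; a sketch, not SOV_refine."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    score, norm = 0.0, 0
    for state in states:
        pred_segs = segments(predicted, state)
        for s1 in segments(observed, state):
            len1 = s1[1] - s1[0] + 1
            hits = [s2 for s2 in pred_segs if s2[0] <= s1[1] and s1[0] <= s2[1]]
            if not hits:
                norm += len1  # observed segment with no overlapping prediction
                continue
            for s2 in hits:
                len2 = s2[1] - s2[0] + 1
                minov = min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1  # shared residues
                maxov = max(s1[1], s2[1]) - min(s1[0], s2[0]) + 1  # total extent
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                score += (minov + delta) / maxov * len1
                norm += len1  # len(s1) counted once per overlapping pair
    return 100.0 * score / norm if norm else 0.0


print(sov99("CCHHHHHHCC", "CCCHHHHCCC"))  # 100.0: the allowance absorbs the shifted ends
```

The last line shows why a bounded allowance matters: a helix predicted two residues short still scores perfectly here, which is the kind of behaviour a refined allowance definition is meant to control.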

4.
A measure of protein structure similarity is calculated from the matching of pairs of secondary structure elements between two proteins. The interaction of each pair was estimated from their axial line segments and combined with other geometric features to produce an optimal discrimination between intrafamily and interfamily relationships. The matching used a fast bipartite graph-matching algorithm that avoids the computational complexity of searching for the full subgraph isomorphism between the two sets of interactions. The main algorithm used was the "stable marriage" algorithm, which works on the ranked "preferences" of one interaction for another. The method takes 1/10 of a second for a typical comparison, making it suitable as a fast pre-filter for slower, more exhaustive approaches. An application to protein structure classification is described.
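The abstract names the "stable marriage" algorithm but not the authors' scoring or how preferences are built. The sketch below is the textbook Gale–Shapley procedure on ranked preference lists; the labels `h1`, `h2`, `s1`, `s2` standing in for secondary structure element interactions are hypothetical, and complete preference lists are assumed.

```python
def stable_marriage(proposer_prefs, acceptor_prefs):
    """Gale-Shapley matching; each dict maps an item to its ranked list of partners.

    Assumes complete preference lists (every acceptor ranks every proposer).
    """
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}  # acceptor -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has been rejected by everyone and stays unmatched
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p          # a prefers the new proposer
            free.append(current)    # the displaced proposer tries again
        else:
            free.append(p)          # rejected; p proposes to its next choice
    return {p: a for a, p in engaged.items()}


# Hypothetical interactions from two proteins, ranked by some similarity score
prefs_a = {"h1": ["s1", "s2"], "h2": ["s1", "s2"]}
prefs_b = {"s1": ["h2", "h1"], "s2": ["h1", "h2"]}
print(stable_marriage(prefs_a, prefs_b))
```

Each proposer makes at most one proposal per entry in its list, so the run time is quadratic in the number of interactions at worst, which fits the abstract's point about avoiding a full subgraph-isomorphism search.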

5.
Hanson RM, Kohler D, Braun SG. Proteins, 2011, 79(7):2172-2180
We describe here definitions of "local helical axis" and "straightness" that are developed using a simple quaternion-based analysis of protein structure without resorting to least-squares fitting. As part of this analysis, it is shown how quaternion differences can be visualized to accurately depict the local helical axis relating any two adjacent amino acid residues in standard, nonidealized proteins. Three different options for defining amino acid residue orientation in terms of quaternion frames are described. Two of these, the "C(α) frame" and the "P frame," are shown to be strongly correlated with a simple approximate measure derived solely from Ramachandran angles. The relationship between quaternion-based straightness and recognized DSSP-derived secondary structure motifs is discussed.
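As a rough illustration of a "quaternion difference", the sketch below computes the rotation relating two residue frames given as unit quaternions (w, x, y, z) and returns its axis and angle; the axis direction is what a local helical axis would point along. How frames are built from atoms (the C(α) or P frame) is not reproduced here, and the convention q2·q1* (rotation expressed in the global frame) is an assumption.

```python
import math


def q_mul(a, b):
    """Hamilton product of quaternions a and b, each (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def q_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def relative_rotation(q1, q2):
    """Axis (unit vector) and angle (degrees) of the rotation taking frame q1 to q2."""
    d = q_mul(q2, q_conj(q1))
    if d[0] < 0:  # q and -q encode the same rotation; pick the shorter one
        d = tuple(-c for c in d)
    w, x, y, z = d
    s = math.sqrt(x * x + y * y + z * z)
    angle = math.degrees(2 * math.atan2(s, w))
    axis = (x / s, y / s, z / s) if s > 1e-12 else (0.0, 0.0, 0.0)
    return axis, angle


# Two frames related by a 90-degree turn about z
half = math.radians(45)
axis, angle = relative_rotation((1.0, 0.0, 0.0, 0.0), (math.cos(half), 0.0, 0.0, math.sin(half)))
print(axis, angle)
```

Only the direction of the axis comes out of the quaternion; locating a point on the axis would also need the translation between the two frames.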

6.

Background  

A number of methods are now available to perform automatic assignment of periodic secondary structures from atomic coordinates, based on different characteristics of the secondary structures. In general, these methods exhibit a broad consensus as to the location of most helix and strand core segments in protein structures. However, the termini of the segments are often ill-defined, and it is difficult to decide unambiguously which residues at the edges of the segments should be included. In addition, there is a "twilight zone" where secondary structure segments depart significantly from the idealized models of Pauling and Corey. For these segments, one has to decide whether the observed structural variations are merely distortions or whether they constitute a break in the secondary structure.

7.
A simple approach for estimating the number of alpha-helical and beta-strand segments from protein circular dichroism spectra is described. The alpha-helix and beta-sheet conformations in globular protein structures, assigned by the DSSP and STRIDE algorithms, were divided into regular and distorted fractions by considering a certain number of terminal residues in a given alpha-helix or beta-strand segment to be distorted. The resulting secondary structure fractions for 29 reference proteins were used in the analyses of circular dichroism spectra by the SELCON method. From the performance indices of the analyses, we determined that, on average, four residues per alpha-helix and two residues per beta-strand may be considered distorted in proteins. The number of alpha-helical and beta-strand segments and their average length in a given protein were estimated from the fraction of distorted alpha-helix and beta-strand conformations determined from the analysis of circular dichroism spectra. The statistical test on the reference protein set shows that this classification of protein secondary structure is highly reliable. The method was used to analyze the circular dichroism spectra of four additional proteins, and the predicted structural characteristics agree with the crystal structure data.
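The counting step can be written out directly: if each helix contributes about four distorted residues and each strand about two, the number of segments is the distorted-residue count divided by that per-segment value, and the average length is the residue count of that conformation divided by the number of segments. The fractions below are made-up inputs for illustration, not values from the paper.

```python
def segment_estimate(n_residues, f_distorted, f_total, distorted_per_segment):
    """Estimate (number of segments, average segment length) for one conformation.

    f_distorted: fraction of residues in the distorted part of this conformation
    f_total:     fraction of residues in this conformation (regular + distorted)
    """
    n_segments = f_distorted * n_residues / distorted_per_segment
    if n_segments == 0:
        return 0.0, 0.0
    avg_length = f_total * n_residues / n_segments
    return n_segments, avg_length


# Hypothetical 200-residue protein
helices = segment_estimate(200, 0.10, 0.30, distorted_per_segment=4)  # ~5 helices, ~12 residues each
strands = segment_estimate(200, 0.06, 0.15, distorted_per_segment=2)  # ~6 strands, ~5 residues each
print(helices, strands)
```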

8.
Shestopalov BV. Tsitologiia, 2003, 45(7):707-713
In a previous paper (Shestopalov, 2003) we presented the amino acid code of protein secondary structure as a partial solution to the fundamental problem of calculating protein three-dimensional structure from the amino acid sequence. Here a statistical model of the code is described. The model is based on structural data from 2258 protein chains (417,112 amino acid residues). 60% and 61% of the secondary structure calculated using the model coincide with the observed secondary structure in the training subset and the test subset (104 protein chains, 21,166 residues), respectively. This equals the threshold value for all secondary structure calculations based on models that, as here, consider only nearest and middle-range interactions. The constructed model can therefore be applied to protein structure prediction from the amino acid sequence, especially when additional information is used along with expert analysis, as in the most successful prediction methods. The model can also be used to analyze secondary structure changes during protein folding by comparing calculated and observed secondary structures. Information about conformationally invariant segments can serve for simulating supersecondary structure formation. One can also try to obtain and examine the subset of proteins in which the calculated and observed secondary structures are very similar.

9.
Although most proteins conform to the classical one‐structure/one‐function paradigm, an increasing number of proteins with dual structures and functions have been discovered. In response to cellular stimuli, such proteins undergo structural changes sufficiently dramatic to remodel even their secondary structures and domain organization. This “fold‐switching” capability fosters protein multi‐functionality, enabling cells to establish tight control over various biochemical processes. Accurate predictions of fold‐switching proteins could both suggest underlying mechanisms for uncharacterized biological processes and reveal potential drug targets. Recently, we developed a prediction method for fold‐switching proteins using structure‐based thermodynamic calculations and discrepancies between predicted and experimentally determined protein secondary structure (Porter and Looger, Proc Natl Acad Sci U S A 2018; 115:5968–5973). Here we seek to leverage the negative information found in these secondary structure prediction discrepancies. To do this, we quantified secondary structure prediction accuracies of 192 known fold‐switching regions (FSRs) within solved protein structures found in the Protein Data Bank (PDB). We find that the secondary structure prediction accuracies for these FSRs vary widely. Inaccurate secondary structure predictions are strongly associated with fold‐switching proteins compared to equally long segments of non‐fold‐switching proteins selected at random. These inaccurate predictions are enriched in helix‐to‐strand and strand‐to‐coil discrepancies. Finally, we find that most proteins with inaccurate secondary structure predictions are underrepresented in the PDB compared with their alternatively folded cognates, suggesting that unequal representation of fold‐switching conformers within the PDB could be an important cause of inaccurate secondary structure predictions. 
These results demonstrate that inconsistent secondary structure predictions can serve as a useful preliminary marker of fold switching.
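One simple way to see which kinds of discrepancy dominate a region is to tally (experimental, predicted) state pairs at mismatched positions. The sketch below assumes three-state strings of equal length and reads "helix-to-strand" as an observed helix predicted as strand; that reading, the function names, and the example strings are assumptions.

```python
from collections import Counter


def region_accuracy(observed, predicted):
    """Fraction of positions where predicted matches observed."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def discrepancy_counts(observed, predicted):
    """Count observed->predicted state changes at mismatched positions, e.g. ('H', 'E')."""
    return Counter((o, p) for o, p in zip(observed, predicted) if o != p)


obs = "HHHHHHEEEECC"   # hypothetical fold-switching region
pred = "HHEEEECCCCCC"
print(region_accuracy(obs, pred), discrepancy_counts(obs, pred).most_common())
```

Comparing these tallies against randomly chosen segments of the same length would mirror the control described in the abstract.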

10.
Hydrophobic cluster analysis (HCA) [15] is a very efficient method for analysing and comparing protein sequences. Despite its effectiveness, the method is not widely used because it relies in part on the experience and training of the user. In this article, detailed guidelines on the use of HCA are presented, including discussions of: the definition of the hydrophobic clusters and their relationships with secondary and tertiary structures; the length of the clusters; the amino acid classification used for HCA; the HCA plot programs; and the working strategies. Various procedures for the analysis of a single sequence are presented: structural segmentation, structural domains and secondary structure evaluation. Like most sequence analysis methods, HCA is more efficient when several homologous sequences are compared. Procedures for the detection and alignment of distantly related proteins by HCA are described through several published examples, along with two previously unreported cases: the beta-glucosidase from Ruminococcus albus is clearly related to the beta-glucosidases from Clostridium thermocellum and Hansenula anomala, although they display a reversed organization of their constitutive domains; and the alignment of the sequence of human GTPase activating protein with that of the Crk oncogene is presented. Finally, the relevance of HCA for identifying residues important for structure/function, as well as for preparing homology modelling, is discussed.

11.
Knowledge-based potentials are used widely in protein folding and inverse folding algorithms. Two kinds of derivation methods are used. (1) The interactions in a database of known protein structures are assumed to obey a Boltzmann distribution. (2) The stability of the native folds relative to a manifold of misfolded structures is optimized. Here, a set of previously derived contact and secondary structure propensity potentials, taken as the "true" potentials, are employed to construct an artificial protein structural database from protein fragments. Then, new sets of potentials are derived to see how they are related to the true potentials. Using the Boltzmann distribution method, when the stability of the structures in the database lies within a certain range, both contact potentials and secondary structure propensities can be derived separately with remarkable accuracy. In general, the optimization method was found to be less accurate due to errors in the "excess energy" contribution. When the excess energy terms are kept as a constraint, the true potentials are recovered exactly.
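The first derivation route is usually written as an inverse Boltzmann relation, E = −kT ln(N_obs / N_exp), where N_obs is the observed count of a contact type in the structure database and N_exp the count expected under a reference state. The sketch below applies that formula to hypothetical counts with kT = 1; the choice of reference state, which the abstract does not specify, is left to the caller.

```python
import math


def boltzmann_potentials(observed, expected, kT=1.0):
    """Inverse-Boltzmann energies E = -kT * ln(N_obs / N_exp) per contact type.

    Types with a zero count in either table are skipped (the log is undefined).
    """
    energies = {}
    for pair, n_obs in observed.items():
        n_exp = expected.get(pair, 0)
        if n_obs > 0 and n_exp > 0:
            energies[pair] = -kT * math.log(n_obs / n_exp)
    return energies


# Hypothetical contact counts: hydrophobic pair enriched, charged pair depleted
observed = {("LEU", "ILE"): 240, ("LYS", "LYS"): 30}
expected = {("LEU", "ILE"): 120, ("LYS", "LYS"): 60}
print(boltzmann_potentials(observed, expected))  # negative = favorable
```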

12.
Cuff JA, Barton GJ. Proteins, 1999, 34(4):508-519
A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for a combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments, gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. The Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8 states, but most authors reduce these to 3 for prediction. Applying the different published 8- to 3-state reduction methods changes the apparent prediction accuracy by over 3%. This suggests that care should be taken to compare methods using the same reduction method. Two new sequence datasets (CB513 and CB251) are derived that are suitable for cross-validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/.
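The effect of the 8-to-3 reduction on apparent accuracy is easy to reproduce. The sketch below applies two mappings that are common in the literature (assumed here; the abstract does not list the schemes it compared): one maps H, G, I to helix and E, B to strand, the other keeps only H and E. It then computes Q3, the percentage of residues whose three-state labels agree. The DSSP and prediction strings are invented for illustration.

```python
# Two commonly used DSSP 8-to-3 reductions (assumed; other schemes exist)
REDUCE_HGI_EB = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCE_H_E = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map 8-state DSSP letters to H/E/C; anything not in `mapping` becomes coil."""
    return "".join(mapping.get(c, "C") for c in dssp)


def q3(reference, predicted):
    """Percentage of positions where the two three-state strings agree."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty strings of equal length")
    return 100.0 * sum(r == p for r, p in zip(reference, predicted)) / len(reference)


dssp = "CGGGHHHHTTEEEEBC"        # hypothetical DSSP assignment (8 states)
prediction = "CHHHHHHHCCEEEEEC"  # hypothetical 3-state prediction
for name, mapping in [("HGI/EB", REDUCE_HGI_EB), ("H/E only", REDUCE_H_E)]:
    print(name, q3(reduce_states(dssp, mapping), prediction))
```

The same prediction scores very differently depending only on the reduction applied to the reference, which is the reason the abstract recommends comparing methods under one reduction scheme.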

13.
A simple alternative method for obtaining "random coil" chemical shifts by intrinsic referencing, using the protein's own peptide sequence, is presented. These intrinsic random coil backbone shifts were then used to calculate secondary chemical shifts, which provide important information on the residual secondary structure elements in the acid-denatured state of an acyl-coenzyme A binding protein. This method reveals a clear correlation between the carbon secondary chemical shifts and the amide secondary chemical shifts 3-5 residues away in the primary sequence. These findings strongly suggest transient formation of short helix-like segments and identify unique sequence segments important for protein folding.
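Secondary chemical shifts are the observed shifts minus the random coil values, Δδ = δ_obs − δ_rc. The sketch below computes them and a Pearson correlation between one series and another shifted by a fixed number of residues, which is one way to look for the kind of 3-5 residue offset described above. The direction of the offset, the function names, and the synthetic numbers are assumptions.

```python
import math


def secondary_shifts(observed, random_coil):
    """Delta-delta = observed minus random coil shift, residue by residue (ppm)."""
    return [o - rc for o, rc in zip(observed, random_coil)]


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[i] and y[i + lag] for lag >= 0."""
    xs, ys = x[: len(x) - lag], y[lag:]
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)


# Synthetic data: the "amide" series repeats the carbon series three residues later
carbon = secondary_shifts([58.1, 59.0, 57.2, 56.0, 58.8, 59.5, 57.9, 56.4],
                          [56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5])
amide = [0.0, 0.0, 0.0] + carbon[:5]
print([round(lagged_correlation(carbon, amide, k), 2) for k in range(1, 6)])
```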

14.
An algorithm for determining protein domain structure is proposed. Domain structures obtained by applying the algorithm have been compared with available data. The method is based on an entirely physical model of van der Waals interactions that, as illustrated in this work, reflects the distribution of electron density. Various levels of hierarchy in the protein spatial structure are distinguished by analyzing the interaction energy between structural units of different scales. The level of the energy hierarchy thus plays the role of the sole parameter, and the method obviates the use of complicated geometrical criteria with numerous fitting parameters. The algorithm readily and accurately locates domains formed by continuous segments of the protein chain as well as those comprising non-sequential segments, and it sets no limit on the number of segments in a domain. We analyzed 309 protein structures. Among the 277 structures for which our results could be compared with domain definitions from other works, 243 showed complete or partial agreement, and in only 34 cases were the domain structures substantially different. The domains delineated with our approach may coincide with reference definitions at different levels of the globule hierarchy. Along with defining the domain structure, our approach allows one to consider the protein spatial structure in terms of the spatial distribution of interaction energy, in order to establish the correspondence between the hierarchy of energy distribution and the hierarchy of structural elements.

15.
Topology prediction of membrane proteins.
A new method is described for predicting protein membrane topology (intra- and extracellular sidedness) from multiply aligned amino acid sequences after determination of the membrane-spanning segments. The prediction technique relies on residue compositional differences in the protein segments exposed on each side of the membrane. Intra/extracellular ratios are calculated for the residue types Asn, Asp, Gly, Phe, Pro, Trp, Tyr, and Val, preferably found on the extracellular side, and for Ala, Arg, Cys, and Lys, mostly occurring on the intracellular side. The consensus over these 12 residue distributions is used for sidedness prediction. The method was developed with a test set of 42 protein families, all but one of which were correctly predicted by the new algorithm. This represents an improvement over predictions based on the widely used "positive-inside rule" and other techniques, for which at least six mispredictions were observed on the same data set. Further, applying this and other methods to 12 protein families not in the test set still showed the better performance of the present technique, which was subsequently applied to another set of membrane protein families whose topology has yet to be determined.
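A minimal sketch of the consensus step, assuming the loops on each side of the membrane have already been collected into two sequences for one candidate orientation: for each of the 12 residue types, compare its frequency on the two sides and let it vote for whichever orientation its preference supports. Using plain frequencies of a single sequence (rather than multiple alignments) and an unweighted majority vote are simplifications, not the published procedure; the function names are hypothetical.

```python
OUTSIDE_TYPES = set("NDGFPWYV")  # Asn, Asp, Gly, Phe, Pro, Trp, Tyr, Val
INSIDE_TYPES = set("ARCK")       # Ala, Arg, Cys, Lys


def frequency(seq, aa):
    return seq.count(aa) / len(seq) if seq else 0.0


def predict_side(loops_a, loops_b):
    """Return 'A inside', 'B inside', or 'undecided' from a 12-residue-type vote.

    loops_a / loops_b: concatenated loop residues on each side of the membrane.
    """
    votes_a_inside = votes_b_inside = 0
    for aa in OUTSIDE_TYPES | INSIDE_TYPES:
        fa, fb = frequency(loops_a, aa), frequency(loops_b, aa)
        if fa == fb:
            continue  # no information from this residue type
        a_richer = fa > fb
        # Inside-preferring types vote for the richer side being intracellular;
        # outside-preferring types vote for the richer side being extracellular.
        if (aa in INSIDE_TYPES) == a_richer:
            votes_a_inside += 1
        else:
            votes_b_inside += 1
    if votes_a_inside == votes_b_inside:
        return "undecided"
    return "A inside" if votes_a_inside > votes_b_inside else "B inside"


print(predict_side("KKRAACLKR", "NDGWYPVFS"))  # hypothetical loop sequences
```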

16.
A method is described to construct sets of decoy models that can be used to generate a background score distribution for protein structure comparison. The models are derived directly from the two proteins being compared and retain all the essential properties of the structures, including length, density, shape and secondary structure composition but have different folds. As each comparison involves a pair of proteins of the same length, no explicit normalisation is required to adjust for the length of the proteins being compared. This allows substructure (or domain) matches to score almost equally to the comparison of isolated domains. A normalised probability measure was derived that allows joint family/family comparison. The method was applied to some of the CASP6 models for targets with new folds.
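Once a background distribution has been generated from decoys, a comparison score can be placed against it. The sketch below reports a z-score and an empirical p-value with the common (k + 1)/(n + 1) correction; how the paper itself normalises its probability measure is not stated in the abstract, so this is only an illustrative assumption, as are the scores used.

```python
import statistics


def significance(true_score, decoy_scores):
    """Z-score and empirical p-value of a comparison score against decoy scores.

    The p-value counts decoys scoring at least as well, with a +1 correction so
    it is never exactly zero. The z-score is undefined (nan) if all decoys tie.
    """
    n = len(decoy_scores)
    mean = statistics.mean(decoy_scores)
    sd = statistics.stdev(decoy_scores)
    z = (true_score - mean) / sd if sd > 0 else float("nan")
    k = sum(s >= true_score for s in decoy_scores)
    return z, (k + 1) / (n + 1)


decoys = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7, 10.9, 12.6]  # hypothetical decoy scores
print(significance(21.4, decoys))
```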

17.
Using evolutionary information contained in multiple sequence alignments as input to neural networks, secondary structure can be predicted at significantly increased accuracy. Here, we extend our previous three-level system of neural networks by using additional input information derived from multiple alignments. Using a position-specific conservation weight as part of the input increases performance. Using the number of insertions and deletions reduces the tendency for overprediction and increases overall accuracy. Addition of the global amino acid content yields a further improvement, mainly in predicting structural class. The final network system has a sustained overall accuracy of 71.6% in a multiple cross-validation test on 126 unique protein chains. A test on a new set of 124 recently solved protein structures that have no significant sequence similarity to the learning set confirms the high level of accuracy. The average cross-validated accuracy for all 250 sequence-unique chains is above 72%. Using various data sets, the method is compared to alternative prediction methods, some of which also use multiple alignments: the performance advantage of the network system is at least 6 percentage points in three-state accuracy. In addition, the network estimates secondary structure content from multiple sequence alignments about as well as circular dichroism spectroscopy on a single protein and classifies 75% of the 250 proteins correctly into one of four protein structural classes. Of particular practical importance is the definition of a position-specific reliability index. For 40% of all residues the method has a sustained three-state accuracy of 88%, as high as the overall average for homology modelling. A further strength of the method is greatly increased accuracy in predicting the placement of secondary structure segments. © 1994 Wiley-Liss, Inc.
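The reliability index can be illustrated with per-residue network outputs. In the sketch below the index is taken as the integer part of 10 × (highest output − second-highest output), capped at 9; this is one published form of such an index, assumed here rather than stated in the abstract. Accuracy is then computed only over residues at or above a chosen index threshold. All inputs are hypothetical.

```python
def reliability_index(outputs):
    """0-9 index from per-state network outputs: 10 * (best - second best), truncated."""
    best, second = sorted(outputs, reverse=True)[:2]
    return min(9, int(10 * (best - second)))


def accuracy_at_threshold(predicted, observed, indices, threshold):
    """Three-state accuracy and coverage for residues with index >= threshold."""
    kept = [(p, o) for p, o, r in zip(predicted, observed, indices) if r >= threshold]
    if not kept:
        return None, 0.0
    accuracy = sum(p == o for p, o in kept) / len(kept)
    return accuracy, len(kept) / len(predicted)


# Hypothetical network outputs (H, E, C) for four residues
outputs = [(0.90, 0.05, 0.05), (0.45, 0.40, 0.15), (0.10, 0.80, 0.10), (0.30, 0.35, 0.35)]
states = "HEC"
predicted = "".join(states[o.index(max(o))] for o in outputs)
ri = [reliability_index(o) for o in outputs]
print(predicted, ri, accuracy_at_threshold(predicted, "HHEC", ri, threshold=5))
```

Raising the threshold trades coverage for accuracy, which is the trade-off behind the abstract's figure of 88% accuracy on 40% of residues.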

18.
Hayward S. Proteins, 1999, 36(4):425-435
With the use of a recently developed method, twenty-four proteins for which two or more X-ray conformers are known have been analyzed to reveal structural principles that govern domain motions in proteins. In all 24 cases, the domain motion is a rotation about a physical axis created through local interactions, both covalent and noncovalent. In many cases, two or more mechanical hinges separated in space create a stable hinge axis for precise control of the domain closure. The terminal regions of alpha-helices and beta-sheets have been found to act as mechanical hinges in a significant number of cases. In some cases, the two terminal regions of neighboring strands of a single beta-sheet can create a hinge axis, as can the two termini of a single alpha-helix. These two structures have been termed the "double-hinged beta-sheet" and "double-hinged alpha-helix," respectively. A flexible loop that attaches one domain to another and through which the effective hinge axis passes is another construct used to create a hinge. Noncovalent interactions between segments remote along the polypeptide chain can also form hinges. In addition, alpha-helices that preserve their hydrogen bonding structure when bent have been found to behave as mechanical hinges. It is suggested that these alpha-helices act as a store of elastic energy that drives the closing of domains for rapid capture of the substrate. If the repertoire of possible interdomain structures is as limited as this study suggests, the dynamic behavior of proteins could soon be predicted using bioinformatics techniques. Proteins 1999;36:425-435.

19.
Two geometrical parameters describing polypeptide structure, V (the dihedral angle between two sequential peptide bond planes) and R (the radius of curvature), are used for the structural classification of polypeptides in proteins. The relation between these two parameters was the basis for defining the conformational sub-space for early-stage structural forms. Cluster analysis of V and lnR, applied both to selected proteins with well-defined secondary structure (according to the DSSP classification) and to proteins without any prior classification, revealed that several of the discriminated groups agree with the assumed model of the early-stage conformational sub-space. This analysis shows that protein structures may be represented in VR space instead of Phi, Psi angle space, thus lowering the dimensionality of the conformational space. The VR model allows classification of traditional secondary structure elements as well as different random coil motifs, which broadens the range of recognized structural categories compared with standard secondary structure elements.

20.
Membrane proteins conduct many important biological functions essential to the survival of organisms. However, due to their inherent hydrophobic nature, it is very difficult to obtain structural information on membrane‐bound proteins using traditional biophysical techniques. We are developing a new approach to probe the secondary structure of membrane proteins using the pulsed EPR technique of Electron Spin Echo Envelope Modulation (ESEEM) Spectroscopy. This method has been successfully applied to model peptides made synthetically. However, in order for this ESEEM technique to be widely applicable to larger membrane protein systems with no size limitations, protein samples with deuterated residues need to be prepared via protein expression methods. For the first time, this study shows that the ESEEM approach can be used to probe the local secondary structure of a 2H‐labeled d8‐Val overexpressed membrane protein in a membrane mimetic environment. The membrane‐bound human KCNE1 protein was used with a known solution NMR structure to demonstrate the applicability of this methodology. Three different α‐helical regions of KCNE1 were probed: the extracellular domain (Val21), transmembrane domain (Val50), and cytoplasmic domain (Val95). These results indicated α‐helical structures in all three segments, consistent with the micelle structure of KCNE1. Furthermore, KCNE1 was incorporated into a lipid bilayer and the secondary structure of the transmembrane domain (Val50) was shown to be α‐helical in a more native‐like environment. This study extends the application of this ESEEM approach to much larger membrane protein systems that are difficult to study with X‐ray crystallography and/or NMR spectroscopy.
