Similar articles
 20 similar articles found; search time 250 ms
1.
We have developed an automatic algorithm, STRIDE, for protein secondary structure assignment from atomic coordinates based on the combined use of hydrogen bond energy and statistically derived backbone torsional angle information. Parameters of the pattern recognition procedure were optimized using designations provided by the crystallographers as a standard-of-truth. Comparison to the currently most widely used technique, DSSP by Kabsch and Sander (Biopolymers 22:2577-2637, 1983), shows that STRIDE and DSSP assign secondary structural states in 58 and 31% of 226 protein chains in our data sample, respectively, in greater agreement with the specific residue-by-residue definitions provided by the discoverers of the structures, while in 11% of the chains the assignments are the same. STRIDE delineates every 11th helix and every 32nd strand more in accord with published assignments. © 1995 Wiley-Liss, Inc.

2.
MOTIVATION: What constitutes a baseline level of success for protein fold recognition methods? As fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein, ranging from the length of the sequence to a knowledge of its secondary structure, to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods, and could these methods be used to construct viable hierarchical classifications? EXPERIMENTS PERFORMED: A number of rapid automatic methods which score similarities between protein domains were devised and tested. These methods ranged from those that incorporated no secondary structure information, such as measuring absolute differences in sequence lengths, to more complex alignments of secondary structure elements. Each method was assessed for accuracy by comparison with the Class Architecture Topology Homology (CATH) classification. Methods were rated against both a random baseline fold assignment method as a lower control and FSSP as an upper control. Similarity trees were constructed in order to evaluate the accuracy of optimum methods at producing a classification of structure. RESULTS: Using a rigorous comparison of methods with CATH, the random fold assignment method set a lower baseline of 11% true positives allowing for 3% false positives, and FSSP set an upper benchmark of 47% true positives at 3% false positives. The optimum secondary structure alignment method used here achieved 27% true positives at 3% false positives. Using a less rigorous Critical Assessment of Structure Prediction (CASP)-like sensitivity measurement, the random assignment achieved 6%, FSSP 59% and the optimum secondary structure alignment method 32%. Similarity trees produced by the optimum method illustrate that these methods cannot be used alone to produce a viable protein structural classification system. CONCLUSIONS: Simple methods that use perfect secondary structure information to assign folds cannot produce an accurate protein taxonomy; however, they do provide useful baselines for fold recognition. In terms of a typical CASP assessment, our results suggest that approximately 6% of targets with folds in the databases could be assigned correctly by randomly guessing, and as many as 32% could be recognised by trivial secondary structure comparison methods, given knowledge of their correct secondary structures.
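
The "true positives at 3% false positives" figures quoted above can be illustrated with a small sketch. It is not the authors' code: the function names, the pair-scoring convention (higher score means more similar) and the toy inputs are assumptions made for illustration. The length-difference score corresponds to the simplest method the abstract mentions, and the second function reads off the best true-positive rate reachable before the false-positive rate exceeds a limit.

```python
def length_difference_score(length_a, length_b):
    """Simplest similarity score mentioned in the abstract: the negated
    absolute difference in sequence length (higher means more similar)."""
    return -abs(length_a - length_b)


def true_positive_rate_at_fpr(scores, labels, max_fpr=0.03):
    """Highest true-positive rate reachable while the false-positive rate
    stays at or below max_fpr.

    scores: one similarity score per domain pair (higher = more similar)
    labels: True where the pair shares a fold in the reference classification
    Pairs with equal scores keep their input order (no special tie handling).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative pair")

    # Accept pairs one at a time, from the most to the least similar.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = false_pos = 0
    best_tpr = 0.0
    for _, same_fold in ranked:
        if same_fold:
            true_pos += 1
        else:
            false_pos += 1
        if false_pos / negatives > max_fpr:
            break  # the false-positive budget is spent
        best_tpr = true_pos / positives
    return best_tpr
```

Scoring every domain pair with a random number instead of a real similarity measure and passing the result to the same function gives a random lower control comparable in spirit to the one described in the abstract.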

3.
Toward consistent assignment of structural domains in proteins   (Cited by: 3; self-citations: 0; citations by others: 3)
The assignment of protein domains from three-dimensional structure is critically important in understanding protein evolution and function, yet little quality assurance has been performed. Here, the differences in the assignment of structural domains are evaluated using six common assignment methods. Three human expert methods (AUTHORS (authors' annotation), CATH and SCOP) and three fully automated methods (DALI, DomainParser and PDP) are investigated by analysis of individual methods against the authors' assignment as well as analysis based on the consensus among groups of methods (only expert, only automatic, combined). The results demonstrate that caution is recommended in using current domain assignments, and indicate where additional work is needed. Specifically, the major factors responsible for conflicting domain assignments between methods, both expert and automatic, are: (1) the definition of very small domains; (2) splitting secondary structures between domains; (3) the size and number of discontinuous domains; (4) closely packed or convoluted domain-domain interfaces; (5) structures with large and complex architectures; and (6) the level of significance placed upon structural, functional and evolutionary concepts in considering structural domain definitions. A web-based resource that focuses on the results of benchmarking and the analysis of domain assignments is available at

4.

Background

Secondary structures are elements of great importance in structural biology, biochemistry and bioinformatics. They are broadly composed of two repetitive structures, namely α-helices and β-sheets, apart from turns, and the rest is associated with coil. These repetitive secondary structures have specific and conserved biophysical and geometric properties. The PolyProline II (PPII) helix is yet another interesting repetitive structure, which is less frequent and not usually associated with stabilizing interactions. Recent studies have shown that PPII frequency is higher than expected, and that PPII helices could have an important role in protein-protein interactions.

Methodology/Principal Findings

A major factor that limits the study of PPII is that its assignment cannot be carried out with the most commonly used secondary structure assignment methods (SSAMs). The purpose of this work is to propose a PPII assignment methodology that can be defined in the frame of DSSP secondary structure assignment. Considering the ambiguity in PPII assignments by different methods, a consensus assignment strategy was utilized. To define the most consensual rule of PPII assignment, three SSAMs that can assign PPII were compared and analyzed. The assignment rule was defined to have a maximum coverage of all assignments made by these SSAMs. Not many constraints were added to the assignment, and only PPII helices of at least 2 residues in length are defined.

Conclusions/Significance

The simple rules designed in this study for characterizing PPII conformation lead to the assignment of 5% of all amino acids as PPII. Sequence-structure relationships associated with PPII, as defined by the different SSAMs, underline a few striking differences. A specific study of amino acid preferences in their N- and C-cap regions was carried out, as well as of their solvent accessibility and contact patterns. Thus the assignment of PPII can be coupled with DSSP, which opens a simple way for further analysis in this field.

5.
A computer program is used to analyse automatically and objectively the atomic co-ordinates of a large number of globular proteins in order to identify the regions of α-helix, β-sheet and reverse-turn secondary structure. Several different criteria for the assignment of secondary structure are tested for accuracy, reproducibility and efficiency. The most successful criterion, which is based on patterns of peptide hydrogen bonds, inter-Cα distances and inter-Cα torsion angles, is used to find the secondary structure of all the proteins studied. The accuracy of the derived assignments is assessed by comparing them with the secondary structure reported in the literature for each protein. The reliability of the methods is assessed by comparing the secondary structures derived from the independently determined sets of co-ordinates available for some proteins. We provide the first objective and consistent compilation of α-helix, β-sheet and reverse-turn secondary structure in almost all globular proteins of known tertiary structure. These data will be invaluable for analysing the relative tendencies of different amino acids to occur in different types of secondary structure, for analysing the regularity of the secondary structure itself, and for analysing how the pieces of secondary structure fit together to form the globular tertiary structure of each protein.

6.
SUMMARY: PONDEROSA (Peak-picking Of Noe Data Enabled by Restriction of Shift Assignments) accepts input information consisting of a protein sequence, backbone and sidechain NMR resonance assignments, and 3D-NOESY (13C-edited and/or 15N-edited) spectra, and returns assignments of NOESY crosspeaks, distance and angle constraints, and a reliable NMR structure represented by a family of conformers. PONDEROSA incorporates and integrates external software packages (TALOS+, STRIDE and CYANA) to carry out different steps in the structure determination. PONDEROSA implements internal functions that identify and validate NOESY peak assignments and assess the quality of the calculated three-dimensional structure of the protein. The robustness of the analysis results from PONDEROSA's hierarchical processing steps that involve iterative interaction among the internal and external modules. PONDEROSA supports a variety of input formats: SPARKY assignment table (.shifts) and spectrum file formats (.ucsf), XEASY proton file format (.prot), and NMR-STAR format (.star). To demonstrate the utility of PONDEROSA, we used the package to determine 3D structures of two proteins: human ubiquitin and Escherichia coli iron-sulfur scaffold protein variant IscU(D39A). The automatically generated structural constraints and ensembles of conformers were as good as or better than those determined previously by much less automated means. AVAILABILITY: The program, in the form of binary code along with tutorials and reference manuals, is available at http://ponderosa.nmrfam.wisc.edu/.

7.
A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms, and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains, for which domain assignments from four automated methods were known, and for which crystallographers' assignments had been reported in the literature. Accuracy was found to increase in this test from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was only possible for 52% of the dataset. The consensus approach [using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK)] was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis of the assignments showed that 55.7% of assignments could be made automatically, and of these, 13.5% were multi-domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
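
The idea of accepting a domain assignment only when independent methods agree can be sketched as follows. The abstract does not state how agreement is judged, so the criterion used here (same number of domain boundaries, each boundary within a residue tolerance) is an assumption, as are the function name, the `tolerance` parameter and the example boundary positions.

```python
def consensus_domains(assignments, tolerance=10):
    """Return consensus domain boundaries, or None if the methods disagree.

    assignments: dict mapping a method name to its sorted list of domain
                 boundary positions (an empty list means "single domain")
    tolerance:   hypothetical maximum spread, in residues, allowed between
                 the methods for one boundary to count as agreed
    """
    boundary_lists = list(assignments.values())
    if not boundary_lists:
        raise ValueError("at least one method is required")

    # All methods must report the same number of boundaries.
    if len({len(boundaries) for boundaries in boundary_lists}) != 1:
        return None

    consensus = []
    for positions in zip(*boundary_lists):
        if max(positions) - min(positions) > tolerance:
            return None  # this boundary is placed too differently
        consensus.append(round(sum(positions) / len(positions)))
    return consensus


# Hypothetical boundaries for one chain, keyed by the method names
# mentioned in the abstract.
example = {"PUU": [120], "DETECTIVE": [124], "DOMAK": [118]}
print(consensus_domains(example))
```

Chains for which the function returns `None` correspond to the cases that the consensus procedure leaves unassigned and that would need manual inspection.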

8.
Lee J. Proteins 2006, 65(2): 453-462
Many of the recent secondary structure prediction methods incorporate the idea of fuzzy set theory, where instead of assigning a definite secondary structure to a query residue, probability for the residue being in each of the conformational states is estimated. Moreover, continuous assignment of conformational states to the experimentally observed protein structures can be performed in order to reflect inherent flexibility. Although various measures have been developed for evaluating performances of secondary structure prediction methods, they depend only on the most probable secondary structures. They do not assess the accuracy of the probabilities produced by fuzzy prediction methods, and they cannot incorporate information contained in continuous assignments of conformational states to observed structures. Three important measures for evaluating performance of a secondary structure prediction algorithm, Q score, Segment OVerlap (SOV) measure, and the k-state correlation coefficient (Corr), are deformed into fuzzy measures F score, Fuzzy OVerlap (FOV) measure, and the fuzzy correlation coefficient (Forr), so that the new measures not only assess probabilistic outputs of fuzzy prediction methods, but also incorporate information from continuous assignments of secondary structure. As an example of application, prediction results of four fuzzy secondary structure prediction methods, PSIPRED, PROFking, SABLE, and PREDICT, are assessed using the new fuzzy measures.
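
As a rough illustration of the difference between a discrete and a probabilistic score, the sketch below computes the standard Q score (percentage of residues whose predicted state matches the observed one) and one possible probabilistic counterpart. The abstract does not give the definition of the F score, so `fuzzy_agreement` is an assumed stand-in based on the overlap of two probability distributions, not the measure from the paper; both function names are hypothetical.

```python
def q_score(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def fuzzy_agreement(predicted_probs, observed_probs):
    """Assumed probabilistic analogue of the Q score (not the paper's F score).

    Each argument is a list with one dict per residue, mapping a state
    (for example "H", "E", "C") to its probability. The per-residue
    agreement is the overlap sum(min(p_pred, p_obs)) over all states.
    """
    if len(predicted_probs) != len(observed_probs) or len(observed_probs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for pred, obs in zip(predicted_probs, observed_probs):
        states = set(pred) | set(obs)
        total += sum(min(pred.get(s, 0.0), obs.get(s, 0.0)) for s in states)
    return 100.0 * total / len(observed_probs)
```

When both inputs are one-hot (a single state with probability 1 per residue), this overlap reduces to the ordinary Q score, which is the property a fuzzy generalization would be expected to keep.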

9.
10.
Structure-based protein NMR assignments using native structural ensembles   (Cited by: 1; self-citations: 0; citations by others: 1)
An important step in NMR protein structure determination is the assignment of resonances and NOEs to corresponding nuclei. Structure-based assignment (SBA) uses a model structure ("template") for the target protein to expedite this process. Nuclear vector replacement (NVR) is an SBA framework that combines multiple sources of NMR data (chemical shifts, RDCs, sparse NOEs, amide exchange rates, TOCSY) and has high accuracy when the template is close to the target protein's structure (less than 2 Å backbone RMSD). However, a close template may not always be available. We extend the circle of convergence of NVR for distant templates by using an ensemble of structures. This ensemble corresponds to the low-frequency perturbations of the given template and is obtained using normal mode analysis (NMA). Our algorithm assigns resonances and sparse NOEs using each of the structures in the ensemble separately, and aggregates the results using a voting scheme based on maximum bipartite matching. Experimental results on human ubiquitin, using four distant template structures, show an increase in the assignment accuracy. Our algorithm also improves the robustness of NVR with respect to structural noise. We provide a confidence measure for each assignment using the percentage of the structures that agree on that assignment. We use this measure to assign a subset of the peaks with even higher accuracy. We further validate our algorithm on data for two additional proteins with NVR. We then show the general applicability of our approach by applying our NMA ensemble-based voting scheme to another SBA tool, MARS. For three test proteins with corresponding templates, including the 370-residue maltose binding protein, we increase the number of reliable assignments made by MARS. Finally, we show that our voting scheme is sound and optimal, by proving that it is a maximum likelihood estimator of the correct assignments.
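
The aggregation step described above (one assignment per ensemble member, combined by voting under a one-to-one matching constraint, with the percentage of agreeing structures as a confidence value) can be sketched on a toy scale. This is not the published algorithm: the function name and data layout are assumptions, and the matching is found by brute force over permutations, which is only practical for a handful of peaks; a real implementation would use a polynomial-time bipartite matching algorithm.

```python
from collections import Counter
from itertools import permutations


def vote_assignments(per_structure_assignments, peaks, residues):
    """Combine per-structure assignments into one voted assignment.

    per_structure_assignments: list of dicts (peak -> residue), one dict
                               per structure in the ensemble
    peaks, residues:           lists of labels; len(residues) >= len(peaks)
    Returns a dict peak -> (residue, confidence in percent).
    """
    if not per_structure_assignments or len(residues) < len(peaks):
        raise ValueError("need assignments and at least as many residues as peaks")

    # Count how many ensemble members support each (peak, residue) pair.
    votes = Counter()
    for assignment in per_structure_assignments:
        for peak, residue in assignment.items():
            votes[(peak, residue)] += 1

    # Brute-force search for the one-to-one matching with the most votes.
    best_total, best_matching = -1, None
    for chosen in permutations(residues, len(peaks)):
        total = sum(votes[(p, r)] for p, r in zip(peaks, chosen))
        if total > best_total:
            best_total, best_matching = total, dict(zip(peaks, chosen))

    n_structures = len(per_structure_assignments)
    return {
        peak: (residue, 100.0 * votes[(peak, residue)] / n_structures)
        for peak, residue in best_matching.items()
    }
```

Keeping only the entries whose confidence exceeds a chosen cut-off mirrors the abstract's remark that a subset of the peaks can be assigned with higher accuracy.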

11.
Eukaryotic proteins with important biological function can be partially unstructured, conformationally flexible, or heterogeneous. Crystallization trials often fail for such proteins. In NMR spectroscopy, parts of the polypeptide chain undergoing dynamics in unfavorable time regimes cannot be observed. De novo NMR structure determination is seriously hampered when missing signals lead to an incomplete chemical shift assignment, resulting in an information content of the NOE data insufficient to determine the structure ab initio. We developed a new protein structure determination strategy for such cases based on a novel NOE assignment strategy utilizing a number of model structures but no explicit reference structure, as is used in bootstrapping-like algorithms. The software distinguishes in detail between consistent and mutually exclusive pairs of possible NOE assignments on the basis of different precision levels of measured chemical shifts, searching for the largest set of consistent NOE assignments in agreement with 3D space. Validation of the method using the structure of the low-molecular-weight protein tyrosine phosphatase A (MptpA) showed robust results utilizing protein structures with 30–45% sequence identity and 70% of the chemical shift assignments. About 60% of the resonance assignments are sufficient to identify those structural models with highest conformational similarity to the real structure. The software was benchmarked by de novo solution structures of fibroblast growth factor 21 (FGF21) and the extracellular fibroblast growth factor receptor domain FGFR4 D2, which both failed in crystallization trials and in classical NMR structure determination. Proteins 2013; 81:2007–2022. © 2013 Wiley Periodicals, Inc.

12.
Protein fold recognition using sequence-derived predictions.   (Cited by: 18; self-citations: 9; citations by others: 9)
In protein fold recognition, one assigns a probe amino acid sequence of unknown structure to one of a library of target 3D structures. Correct assignment depends on effective scoring of the probe sequence for its compatibility with each of the target structures. Here we show that, in addition to the amino acid sequence of the probe, sequence-derived properties of the probe sequence (such as the predicted secondary structure) are useful in fold assignment. The additional measure of compatibility between probe and target is the level of agreement between the predicted secondary structure of the probe and the known secondary structure of the target fold. That is, we recommend a sequence-structure compatibility function that combines previously developed compatibility functions (such as the 3D-1D scores of Bowie et al. [1991] or sequence-sequence replacement tables) with the predicted secondary structure of the probe sequence. The effect on fold assignment of adding predicted secondary structure is evaluated here by using a benchmark set of proteins (Fischer et al., 1996a). The 3D structures of the probe sequences of the benchmark are actually known, but are ignored by our method. The results show that the inclusion of the predicted secondary structure improves fold assignment by about 25%. The results also show that, if the true secondary structure of the probe were known, correct fold assignment would increase by an additional 8-32%. We conclude that incorporating sequence-derived predictions significantly improves assignment of sequences to known 3D folds. Finally, we apply the new method to assign folds to sequences in the SWISSPROT database; six fold assignments are given that are not detectable by standard sequence-sequence comparison methods; for two of these, the fold is known from X-ray crystallography and the fold assignment is correct.

13.
We have developed a program for automatic identification of domains in protein three-dimensional structures. Performance of the program was assessed by three different benchmarks: (i) by comparison with the expert-curated SCOP database of structural domains; (ii) by comparison with a collection of manual domain assignments; and (iii) by comparison with a set of 55 proteins, frequently used as a benchmark for automatic domain assignment. In all these benchmarks PDP identified domains correctly in more than 80% of proteins. AVAILABILITY: http://123d.ncifcrf.gov/.

14.
Recently developed methods to measure distances in proteins with high accuracy by “exact” nuclear Overhauser effects (eNOEs) make it possible to determine stereospecific assignments, which are particularly important to fully exploit the accuracy of the eNOE distance measurements. Stereospecific assignments are determined by comparing the eNOE-derived distances to protein structure bundles calculated without stereospecific assignments, or an independently determined crystal structure. The absolute and relative CYANA target function difference upon swapping the stereospecific assignment of a diastereotopic group yields the respective stereospecific assignment. We applied the method to the eNOE data set that has recently been obtained for the third immunoglobulin-binding domain of protein G (GB3). The 884 eNOEs provide relevant data for 47 of the total of 75 diastereotopic groups. Stereospecific assignments could be established for 45 diastereotopic groups (96 %) using the X-ray structure, or for 27 diastereotopic groups (57 %) using structures calculated with the eNOE data set without stereospecific assignments, all of which are in agreement with those determined previously. The latter case is relevant for structure determinations based on eNOEs. The accuracy of the eNOE distance measurements is crucial for making stereospecific assignments because applying the same method to the traditional NOE data set for GB3 with imprecise upper distance bounds yields only 13 correct stereospecific assignments using the X-ray structure or 2 correct stereospecific assignments using NMR structures calculated without stereospecific assignments.

15.
We introduce a Python-based program that utilizes the large database of 13C and 15N chemical shifts in the Biological Magnetic Resonance Bank to rapidly predict the amino acid type and secondary structure from correlated chemical shifts. The program, called PACSYlite Unified Query (PLUQ), is designed to help assign peaks obtained from 2D 13C–13C, 15N–13C, or 3D 15N–13C–13C magic-angle-spinning correlation spectra. We show secondary-structure specific 2D 13C–13C correlation maps of all twenty amino acids, constructed from a chemical shift database of 262,209 residues. The maps reveal interesting conformation-dependent chemical shift distributions and facilitate searching of correlation peaks during amino-acid type assignment. Based on these correlations, PLUQ outputs the most likely amino acid types and the associated secondary structures from inputs of experimental chemical shifts. We test the assignment accuracy using four high-quality protein structures. Based on only the Cα and Cβ chemical shifts, the highest-ranked PLUQ assignments were 40–60 % correct in both the amino-acid type and the secondary structure. For three input chemical shifts (CO–Cα–Cβ or N–Cα–Cβ), the first-ranked assignments were correct for 60 % of the residues, while within the top three predictions, the correct assignments were found for 80 % of the residues. PLUQ and the chemical shift maps are expected to be useful at the first stage of sequential assignment, for combination with automated sequential assignment programs, and for highly disordered proteins for which secondary structure analysis is the main goal of structure determination.

16.
The DSSP program automatically assigns the secondary structure for each residue from the three-dimensional co-ordinates of a protein structure to one of eight states. However, discrete assignments are incomplete in that they cannot capture the continuum of thermal fluctuations. Therefore, DSSPcont (http://cubic.bioc.columbia.edu/services/DSSPcont) introduces a continuous assignment of secondary structure that replaces 'static' by 'dynamic' states. Technically, the continuum results from calculating weighted averages over 10 discrete DSSP assignments with different hydrogen bond thresholds. A DSSPcont assignment for a particular residue is a percentage likelihood of eight secondary structure states, derived from a weighted average of the ten DSSP assignments. The continuous assignments have two important features: (i) they reflect the structural variations due to thermal fluctuations as detected by NMR spectroscopy; and (ii) they reproduce the structural variation between many NMR models from one single model. Therefore, functionally important variation can be extracted from a single X-ray structure using the continuous assignment procedure.
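
The weighted-average step described above can be sketched in a few lines. The sketch assumes that the discrete assignments have already been computed (one string of state letters per hydrogen bond threshold); the abstract does not list the thresholds or the weights, so both are left as inputs, and the function name and toy values are hypothetical.

```python
def continuous_assignment(discrete_assignments, weights):
    """Turn several discrete per-residue assignments into percentages.

    discrete_assignments: list of equal-length strings, one per hydrogen
                          bond threshold, one state letter per residue
    weights:              one non-negative weight per assignment
                          (placeholder values, not the DSSPcont weights)
    Returns a list with one dict per residue, mapping each observed
    state to its weighted percentage.
    """
    if not discrete_assignments or len(discrete_assignments) != len(weights):
        raise ValueError("need one weight per discrete assignment")
    length = len(discrete_assignments[0])
    if any(len(a) != length for a in discrete_assignments):
        raise ValueError("all assignments must cover the same residues")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    profile = []
    for i in range(length):
        shares = {}
        for assignment, weight in zip(discrete_assignments, weights):
            state = assignment[i]
            shares[state] = shares.get(state, 0.0) + 100.0 * weight / total_weight
        profile.append(shares)
    return profile


# Four toy assignments for a three-residue fragment, equally weighted.
print(continuous_assignment(["HHE", "HHE", "HCE", "CCE"], [1, 1, 1, 1]))
```

A residue that keeps the same state at every threshold ends up with 100% for that state, while residues near the ends of helices or strands typically split their percentage between states.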

17.
Dupuis F, Sadoc JF, Mornon JP. Proteins 2004, 55(3): 519-528
We present a new automatic algorithm, named VoTAP (Voronoï Tessellation Assignment Procedure), which assigns secondary structures of a polypeptide chain using the list of α-carbon coordinates. This program uses three-dimensional Voronoï tessellation. This geometrical tool associates with each amino acid a Voronoï polyhedron, the faces of which unambiguously define contacts between residues. Using the face area, a distinction is made between strong and normal contacts among those close together along the primary structure (low-order contacts). This new definition yields new contact matrices, which are analyzed and used to assign secondary structures. This assignment is performed in two stages. The first one uses contacts between residues close together along the primary structure and is based on data collected on a bank of 282 well-refined nonredundant structures. In this bank, associations were made between the prints defined by these low-order contacts and the assignments performed by different automatic methods. The second step focuses on the strand assignment and uses contacts between distant residues. Comparisons with several other automatic assignment methods are presented, and the influence of resolution on the assignment is investigated. Proteins 2004. © 2004 Wiley-Liss, Inc.

18.
NMR studies of large proteins have gathered much interest in recent years, especially after methyl-transverse relaxation optimized spectroscopy was successfully applied to systems as large as ~1 MDa in molecular weight. However, to fully take advantage of these spectra, there is a need for convenient and robust methods for making resonance assignments rapidly. Here, we present an improved version of our program MAP-XS (methyl assignment prediction from X-ray structure) for the automatic assignment of methyl peaks, based on nuclear Overhauser effect (NOE) correlations and chemical shifts together with available structures. No manual analysis of the NOE data is needed in this new version, which helps to further accelerate the assignment process. With a refined algorithm and more efficient sampling, single runs of MAP-XSII using unanalyzed NOE data produce results comparable to those achieved by the old version using manually curated data in which every NOE peak is correctly attributed to the two related methyl peaks; in addition, checking the results from multiple parallel runs against each other provides an effective mechanism for getting rid of the wrong assignments while keeping the correct ones, which significantly improves the reliability of final assignments. The new program is tested against three different proteins and delivers ~95 % correct assignments; positive results are also achieved for tests using different cut-off distances for NOEs, structures of lower resolutions, and ambiguous residue types.

19.
The DSSP program assigns protein secondary structure to one of eight states. This discrete assignment cannot describe the continuum of thermal fluctuations. Hence, a continuous assignment is proposed. Technically, the continuum results from averaging over ten discrete DSSP assignments with different hydrogen bond thresholds. The final continuous assignment for a single NMR model successfully reflected the structural variations observed between all NMR models in the ensemble. The structural variations between NMR models were verified to correlate with thermal motion; these variations were captured by the continuous assignments. Because the continuous assignment reproduces the structural variation between many NMR models from one single model, functionally important variation can be extracted from a single X-ray structure. Thus, continuous assignments of secondary structure may affect future protein structure analysis, comparison, and prediction.

20.
Loops connect regular secondary structures. In many instances, they are known to play important biological roles. Analysis and prediction of loop conformations depend directly on the definition of repetitive structures. Nonetheless, the secondary structure assignment methods (SSAMs) often lead to divergent assignments. In this study, we analyzed, from both the structure and sequence points of view, how the divergence between different SSAMs affects boundary definitions of loops connecting regular secondary structures. The analysis of SSAMs underlines that no clear consensus between the different SSAMs can be easily found. Because these latter greatly influence the loop boundary definitions, important variations are indeed observed; that is, capping positions are shifted between different SSAMs. On the other hand, our results show that the sequence information in these capping regions is more stable than expected, and classical and equivalent sequence patterns were found for most of the SSAMs. This is, to our knowledge, the most exhaustive survey in this field, as (i) various databanks have been used, leading to similar results without implication of protein redundancy, and (ii) this is the first time various SSAMs have been used. This work hence gives new insights into the difficult question of assignment of repetitive structures and addresses the issue of loop boundary definition. Although SSAMs give very different local structure assignments, capping sequence patterns remain stable.
