Similar articles
1.
2.
Amino acid propensities for secondary structures have been used since the 1970s, when Chou and Fasman evaluated them within datasets of a few tens of proteins and developed a method to predict the secondary structure of proteins. That method is still in use, even though prediction methods have evolved to very different approaches and higher reliability. Propensity for secondary structures represents an intrinsic property of an amino acid, and it is used for generating new algorithms and prediction methods. Our work therefore aimed to investigate which protein dataset is best for evaluating amino acid propensities: larger but not homogeneous sets, or smaller but homogeneous sets, i.e., all-alpha, all-beta, and alpha-beta proteins. As a first analysis, we evaluated amino acid propensities for helix, beta-strand, and coil in more than 2000 proteins from the PDBselect dataset. With these propensities, secondary structure predictions performed with a method very similar to that of Chou and Fasman gave better results than the original one, which was based on propensities derived from the few tens of X-ray protein structures available in the 1970s. In a refined analysis, we subdivided the PDBselect dataset into three secondary structural classes, i.e., all-alpha, all-beta, and alpha-beta proteins. For each class, the amino acid propensities for helix, beta-strand, and coil were calculated and used to predict secondary structure elements for proteins belonging to the same class by using resubstitution and jackknife tests. This second round of predictions further improved the results of the first round. Therefore, amino acid propensities for secondary structures become more reliable as the homogeneity of the protein dataset used to evaluate them increases. Our results also indicate that all algorithms using propensities for secondary structure can still be improved to obtain better predictive results.
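As a rough illustration of what "evaluating propensities" on a dataset means, the sketch below computes a Chou-Fasman-style propensity as the frequency of a residue within one structural state divided by its overall frequency. The function name `propensities`, the three-letter state alphabet `HEC`, and the input format (parallel sequence and structure strings) are assumptions made for this example; the abstract does not specify the authors' exact procedure.

```python
from collections import Counter


def propensities(sequences, structures, states="HEC"):
    """Return {state: {residue: propensity}} for aligned sequence/structure strings.

    propensity(a, s) = (count of a in state s / residues in state s)
                       / (count of a overall / all residues)
    Input format and state alphabet are assumptions for this sketch.
    """
    aa_total = Counter()      # residue counts over the whole dataset
    state_total = Counter()   # residue counts per structural state
    joint = Counter()         # counts of (residue, state) pairs

    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            state_total[s] += 1
            joint[(aa, s)] += 1

    n = sum(aa_total.values())
    table = {}
    for s in states:
        if state_total[s] == 0:
            continue  # state never observed: no propensity can be estimated
        table[s] = {
            aa: (joint[(aa, s)] / state_total[s]) / (aa_total[aa] / n)
            for aa in aa_total
        }
    return table


if __name__ == "__main__":
    # Toy data only; real use would read a curated set such as one structural class.
    table = propensities(["AAGG"], ["HHCC"])
    print(table["H"]["A"], table["C"]["G"])
```

Restricting `sequences` and `structures` to a single structural class before calling the function would mirror the class-specific evaluation described in the abstract.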

3.
A method of graduating (i.e., least-squares fitting) a smooth polynomial curve through long elements of protein secondary structure is described. It uses the Chebyshev polynomials of a discrete (integer) variable with several restraints to prevent artifactual curvatures. A new recursion formula is given which allows the evaluation of the polynomials on rational-number points as well as on the integer points. High-order splines suitable for interpolation between integer points are also discussed. The new method finds applications in graphics and in structural analysis.

4.
A strong similarity between the major aspects of protein folding and protein recognition is one of the emerging fundamental principles in protein science. A crucial importance of steric complementarity in protein recognition is a well-established fact. The goal of this study was to assess the importance of the steric complementarity in protein folding, namely, in the packing of the secondary structure elements. Although the tight packing of protein structures, in general, is a well-known fact, a systematic study of the role of geometric complementarity in the packing of secondary structure elements has been lacking. To assess the role of the steric complementarity, we used a docking procedure to recreate the crystallographically determined packing of secondary structure elements in known protein structures by using the geometric match only. The docking results revealed a significant percentage of correctly predicted packing configurations. Different types of pairs of secondary structure elements showed different degrees of steric complementarity (from high to low: beta-beta, loop-loop, alpha-alpha, and alpha-beta). Interestingly, the relative contribution of the steric match in different types of pairs was correlated with the number of such pairs in known protein structures. This effect may indicate an evolutionary pressure to select tightly packed elements of secondary structure to maximize the packing of the entire structure. The overall conclusion is that the steric match plays an essential role in the packing of secondary structure elements. The results are important for better understanding of principles of protein structure and may facilitate development of better methods for protein structure prediction.

5.
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with these data to obtain better and faster convergence to more accurate secondary structure predictions. A five-fold cross-validated testing accuracy of 83.8% and a segment overlap (SOV) score of 78.3% are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements (α-helix, β-strand, and coil), but the results have rarely been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to describe in detail how different amino acid types behave in secondary structure prediction. We investigate the influence of the composition, physico-chemical properties, and position-specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The detailed results suggest several important ways in which secondary structure predictions can be improved in the future, which might lead to improved protein design and engineering.

6.
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure, composed of stems (stacked canonical base pairs) and the loops they enclose. While stems are precisely captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are the distant interactions linking those secondary structure elements (SSEs) together. Databases of conserved 3D geometries (a.k.a. modules) not captured by energetic models are leveraged for structure prediction and design, but the computational complexity has limited their study to local elements, i.e., loops. Representing the RNA structure as a graph has recently made it possible to extend this work to pairs of SSEs, uncovering a hierarchical organization of these 3D modules, at great computational cost. Systematically capturing recurrent patterns on a large scale is a main challenge in the study of RNA structures. In this paper, we present an efficient algorithm to compute maximal isomorphisms in edge-colored graphs. We extend this algorithm to a framework well suited to identifying RNA modules, and fast enough to considerably generalize previous approaches. To exhibit the versatility of our framework, we first reproduce results identifying all common modules spanning more than 2 SSEs, in a few hours instead of weeks. The efficiency of our new algorithm is demonstrated by computing the maximal modules between any pair of entire RNAs in the non-redundant corpus of known RNA 3D structures. We observe that the biggest modules our method uncovers compose large shared substructures spanning hundreds of nucleotides and base pairs between the ribosomes of Thermus thermophilus, Escherichia coli, and Pseudomonas aeruginosa.

7.
The language of RNA: a formal grammar that includes pseudoknots   (total citations: 9; self-citations: 0; citations by others: 9)
MOTIVATION: In a previous paper, we presented a polynomial time dynamic programming algorithm for predicting optimal RNA secondary structure including pseudoknots. However, a formal grammatical representation for RNA secondary structure with pseudoknots was still lacking. RESULTS: Here we show a one-to-one correspondence between that algorithm and a formal transformational grammar. This grammar class encompasses the context-free grammars and goes beyond to generate pseudoknotted structures. The pseudoknot grammar avoids the use of general context-sensitive rules by introducing a small number of auxiliary symbols used to reorder the strings generated by an otherwise context-free grammar. This formal representation of the residue correlations in RNA structure is important because it means we can build full probabilistic models of RNA secondary structure, including pseudoknots, and use them to optimally parse sequences in polynomial time.

8.
In recent years, small-world behavior has been extensively described for proteins represented by the undirected graph defined by their inter-residue contacts. By adopting this representation it was possible to compute the average clustering coefficient (C) and characteristic path length (L) of protein structures, and their values were found to be similar to those of graphs characterized by small-world topology. In this comment, we analyze a large set of non-redundant protein structures (1753) and show, by randomly mimicking the protein collapse, that the covalent structure of the protein chain significantly contributes to the small-world behavior of the inter-residue contact graphs. When protein graphs are generated under constraints similar to those induced by the backbone connectivity, their characteristic path lengths and clustering coefficients are indistinguishable from those computed using the real contact maps, showing that L and C values cannot be used for 'protein fingerprinting'. Moreover, we verified that these results are independent of the selected protein representations, residue composition, and protein secondary structures.
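The two graph measures named in this abstract can be sketched in a few lines. The example below is a minimal illustration, assuming the contact graph is supplied as an adjacency dictionary of sets. The function names and the convention that nodes with fewer than two neighbours contribute zero to C are choices made here, not details taken from the paper.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient C of an undirected graph.

    adj maps each node to the set of its neighbours. Nodes with fewer than
    two neighbours contribute 0 (a convention assumed for this sketch).
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        # Count edges that exist among the neighbours of this node.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length L over all connected ordered node pairs."""
    total = 0
    pairs = 0
    for source in adj:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Toy three-residue chain 0-1-2 where only sequence neighbours are in contact.
    chain = {0: {1}, 1: {0, 2}, 2: {1}}
    print(clustering_coefficient(chain), characteristic_path_length(chain))
```

Comparing C and L for a real contact map against graphs generated under backbone-like constraints is the kind of test the abstract describes. The generation of those constrained random graphs is not shown here.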

9.
MOTIVATION: A large body of evidence suggests that protein structural information is frequently encoded in local sequences; sequence-structure relationships derived from local structure/sequence analyses could significantly enhance the capacities of protein structure prediction methods. In this paper, the prediction capacity of a database (LSBSP2) that organizes local sequence-structure relationships encoded in local structures with two consecutive secondary structure elements is tested with two computational procedures for protein structure prediction. The goal is twofold: to test the folding hypothesis that local structures are determined by local sequences, and to enhance our capacity in predicting protein structures from their amino acid sequences. RESULTS: The LSBSP2 database contains a large set of sequence profiles derived from exhaustive pair-wise structural alignments for local structures with two consecutive secondary structure elements. One computational procedure makes use of the PSI-BLAST alignment program to predict local structures for test sequence fragments by matching them onto the sequence profiles in the LSBSP2 database. The results show that 54% of the test sequence fragments were predicted with local structures that match closely with their native local structures. The other computational procedure is a filter system that removes as many false positives as possible from a set of PSI-BLAST hits. An assessment with a large set of non-redundant protein structures shows that the PSI-BLAST + filter system improves the prediction specificity by up to two-fold over that of the PSI-BLAST program alone for distantly related protein pairs. Tests with the two computational procedures above demonstrate that local sequence-structure relationships can indeed enhance our capacity in protein structure prediction. The results also indicate that local sequences encoded with strong local structure propensities play an important role in determining the native state folding topology.

10.
11.
In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.

12.
Zhang W, Dunker AK, Zhou Y. Proteins 2008, 71(1):61-67
How to make an objective assignment of secondary structures based on a protein structure is an unsolved problem. Defining the boundaries between helix, sheet, and coil structures is arbitrary, and commonly accepted standard assignments do not exist. Here, we propose a criterion that assesses secondary structure assignment based on the similarity of the secondary structures assigned to pairwise sequence-alignment benchmarks, where these benchmarks are determined by prior structural alignments of the protein pairs. This criterion is used to rank six secondary structure assignment methods (STRIDE, DSSP, SECSTR, KAKSI, P-SEA, and SEGNO) with three established sequence-alignment benchmarks (PREFAB, SABmark, and SALIGN). STRIDE and KAKSI achieve comparable success rates in assigning the same secondary structure elements to structurally aligned residues in the three benchmarks. Their success rates are 1-4% higher than those of the other four methods. The consensus of STRIDE, KAKSI, SECSTR, and P-SEA, called SKSP, improves assignments over the best single method in each benchmark by an additional 1%. These results support the usefulness of the sequence-alignment benchmarks as a means to evaluate secondary structure assignment. The SKSP server and the benchmarks can be accessed at http://sparks.informatics.iupui.edu

13.
We have developed a holistic protein structure estimation technique using amide I band Raman spectroscopy. This technique combines the superposition of reference spectra for pure secondary structure elements with simultaneous aromatic, fluorescence, and solvent background subtraction, and is applicable to solution, suspension, and solid protein samples. A key component of this technique was the calculation of the reference spectra for ordered helix, unordered helix, and sheet, turns, and unordered structures from a series of well-characterized reference proteins. We accurately account for the overlap between the amide I and non-amide I regions and allow for different scattering efficiencies for different secondary structures. For hydrated samples, we allowed for the possibility that bound water spectra differ from the bulk water spectra. Our computed reference spectra compare well with previous experimental and theoretical results in the literature. We have demonstrated the use of these reference spectra for the estimation of secondary structures of proteins in solution, suspension, and dry solid forms. The agreement between our structure estimates and the corresponding determinations from X-ray crystallography is good.

14.
We review tools for structure identification and model-based refinement from three-dimensional electron microscopy implemented in our in-house software package, VOLROVER 2.0. For viral density maps with icosahedral symmetry, we segment the capsid, polymeric, and monomeric subunits using techniques based on automatic symmetry detection and multidomain fast marching. For large biomolecules without symmetry information, we again use our multidomain fast-marching method with manual or fit-based multiseeding to segment meaningful substructures. In either case, we subject the resulting segmented subunit to secondary structure detection when the EM resolution is sufficiently high, and rigid-body structure fitting when the corresponding X-ray structure is available. Secondary structure elements are identified by three techniques: our earlier volume-based and boundary-based skeletonization methods as well as a new method, currently in development, based on solving the grassfire flow equation. For rigid-body fitting, we adapt our earlier fast Fourier-based correlation scheme F2Dock. Our reported segmentation, secondary structure elements identification, and rigid-body fitting techniques, implemented in VOLROVER 2.0, are applied to the PSB 2011 cryo-EM modeling challenge data, and our results are briefly compared to similar results submitted from other research groups. The comparisons show that our techniques are equally capable of segmenting relatively accurate subunits from a viral or protein assembly, and that high segmentation quality leads in turn to higher-quality results of secondary structure elements identification and correlation-based rigid-body fitting. © 2012 Wiley Periodicals, Inc. Biopolymers 97: 709-731, 2012.

15.
In an attempt to assign secondary structure elements to protein primary structures with antibodies, we synthesized a model peptide (beta-peptide: TVTVTDPGQTVTY) with a putative beta-turn structure and analysed the anti-peptide antibodies for their specificity towards the turn sequence. At least 50% of the peptide fraction adopts the intended conformation of a beta-turn (DPGQ) inserted between the two segments of an antiparallel beta-sheet structure. The specific anti-beta-peptide antibodies of the hyperimmune response bind the beta-turn containing epitope of the immunogenic beta-peptide with a three orders of magnitude higher affinity than the synthetic control peptide (Gly-peptide: GGGGGDPGQGGGG). The affinity of the antibodies with specificity for the beta-turn region increases from the primary to the hyperimmune response. Therefore, probing of secondary structure elements, i.e., of individual beta-turn regions, by anti-peptide antibodies now seems feasible for proteins of known sequence and may result in sequence assignments of secondary structures.

16.
We present a novel topological classification of RNA secondary structures with pseudoknots. It is based on the topological genus of the circular diagram associated with the RNA base-pair structure. The genus is a non-negative integer whose value quantifies the topological complexity of the folded RNA structure. In such a representation, planar diagrams correspond to pure RNA secondary structures and have zero genus, whereas non-planar diagrams correspond to pseudoknotted structures and have higher genus. The topological genus allows for the definition of topological folding motifs, similar in spirit to those introduced and commonly used in protein folding. We analyze real RNA structures from the databases Worldwide Protein Data Bank and Pseudobase and classify them according to their topological genus. For simplicity, we limit our analysis to Watson-Crick complementary base pairs and G-U wobble base pairs. We compare the results of our statistical survey with existing theoretical and numerical models. We also discuss possible applications of this classification and show how it can be used for identifying new RNA structural motifs.
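To make the notion of genus concrete, the sketch below computes it for a set of base pairs drawn as chords on a single closed backbone. It treats the diagram as a ribbon graph, counts its boundary components b by tracing, and applies the Euler-characteristic relation g = (n + 1 - b) / 2 for n base pairs. This is a standard construction assumed here for illustration; the function name `genus` and the input format (a list of index pairs) are hypothetical, and the paper's own conventions may differ.

```python
def genus(pairs):
    """Genus of the chord diagram defined by base pairs (i, j) on one closed backbone.

    Unpaired positions do not affect the result, so only paired positions
    are kept and relabelled 0 .. 2n-1 in backbone order.
    """
    endpoints = sorted(p for pair in pairs for p in pair)
    n = len(pairs)
    if len(set(endpoints)) != 2 * n:
        raise ValueError("each position may appear in at most one base pair")
    if n == 0:
        return 0

    rank = {pos: k for k, pos in enumerate(endpoints)}
    partner = [0] * (2 * n)
    for i, j in pairs:
        partner[rank[i]] = rank[j]
        partner[rank[j]] = rank[i]

    # Backbone segment k runs from endpoint k to endpoint k+1 (cyclically).
    # Following a boundary: walk segment k, cross the chord attached at
    # endpoint k+1, and continue on the segment leaving the chord's partner.
    seen = [False] * (2 * n)
    boundaries = 0
    for start in range(2 * n):
        if seen[start]:
            continue
        boundaries += 1
        seg = start
        while not seen[seg]:
            seen[seg] = True
            seg = partner[(seg + 1) % (2 * n)]

    # Euler characteristic of the ribbon graph: 1 - n = 2 - 2g - boundaries.
    return (n + 1 - boundaries) // 2


if __name__ == "__main__":
    print(genus([(1, 10), (2, 9)]))   # nested pairs: a plain stem
    print(genus([(1, 5), (3, 8)]))    # crossing pairs: a simple pseudoknot
```

Nested or side-by-side pairs leave the genus at zero, while any crossing raises it, which matches the planar versus non-planar distinction drawn in the abstract.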

17.
Similarity search for protein 3D structures has become complex and computationally expensive because protein structure databases continue to grow tremendously. Fast structural similarity search systems are now required for practical use in protein structure classification, yet existing comparison systems do not deliver comparison results in time. Our approach uses multi-step processing that consists of a preprocessing step to represent the geometry of protein structures with spatial objects, a filter step to generate a small candidate set using approximate topological string matching, and a refinement step to compute a structural alignment. This paper describes the preprocessing and filtering for fast similarity search using the discovery of topological patterns of secondary structure elements based on spatial relations. Our system is fully implemented using Oracle 8i spatial. We have previously shown that our approach is faster than other approaches such as DALI. This work shows that discovering topological relations of secondary structure elements in protein structures by using the spatial relations of spatial databases is practical for fast structural similarity search for proteins.

18.
In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.

19.
The paper investigates the computational problem of predicting RNA secondary structures. The general belief is that allowing pseudoknots makes the problem hard. Existing polynomial-time algorithms are heuristic algorithms with no performance guarantee and can handle only limited types of pseudoknots. In this paper, we initiate the study of predicting RNA secondary structures with a maximum number of stacking pairs while allowing arbitrary pseudoknots. We obtain two approximation algorithms with worst-case approximation ratios of 1/2 and 1/3 for planar and general secondary structures, respectively. For an RNA sequence of n bases, the approximation algorithm for planar secondary structures runs in O(n^3) time while that for the general case runs in linear time. Furthermore, we prove that allowing pseudoknots makes it NP-hard to maximize the number of stacking pairs in a planar secondary structure. This result is in contrast with the recent NP-hard results on pseudoknots, which are based on optimizing some general and complicated energy functions.

20.
To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D, a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool, designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.
