Similar Literature
20 similar documents found.
1.
We present a thorough analysis of the relation between amino acid sequence and local three-dimensional structure in proteins. A library of overlapping local structural prototypes was built using an unsupervised clustering approach called the “hybrid protein model” (HPM). The HPM carries out a multiple structural alignment of local folds from a non-redundant protein structure databank encoded in a structural alphabet composed of 16 protein blocks (PBs). Building on previous research on the HPM protocol, we introduced gaps into the local structure prototypes, which allows variable-length fragments. This yielded 120 local structure prototypes, and 25% of the protein fragments learnt by the HPM contained gaps. An investigation of tight turns suggested that they derive mainly from three PB series with precise locations in the HPM. The amino acid information content of the conformational classes as a whole was examined with multivariate methods, e.g., canonical correlation analysis. This revealed seven amino acid equivalence classes with high propensities for particular local structures. Similarly, “contrast factors” defined from sequence-structure properties underline the specificity of certain structural prototypes, e.g., the dependence of Gly- or Asn-rich turns on a limited number of PBs, or the opposition between Pro-rich coils and coils enriched in Ser, Thr, Asn and Glu. These results are useful for analyzing sequence-structure relationships and could also be used to improve fragment-based methods for predicting protein structure from sequence.
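As background for the PB encoding that the HPM works on, here is a minimal Python sketch of assigning protein blocks. It assumes the commonly described scheme in which each five-residue window is represented by eight backbone dihedrals. The window is then assigned to the block with the smallest root-mean-square angular deviation (RMSDA). The two reference vectors (`m`, `d`) are hypothetical idealized placeholders, not the published 16-PB reference angles. The `Z` label for the unassignable chain ends is likewise an assumed convention.

```python
import math

# HYPOTHETICAL reference dihedral vectors (degrees) for two illustrative blocks.
# Order per 5-residue window: psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i),
# phi(i+1), psi(i+1), phi(i+2). Idealized placeholders, NOT the published PB angles.
REFERENCES = {
    "m": [-47.0, -57.0] * 4,   # helix-like placeholder
    "d": [135.0, -120.0] * 4,  # strand-like placeholder
}


def angle_diff(a, b):
    """Smallest signed difference between two angles, in (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def rmsda(window, reference):
    """Root-mean-square angular deviation between two dihedral vectors."""
    sq = [angle_diff(a, b) ** 2 for a, b in zip(window, reference)]
    return math.sqrt(sum(sq) / len(sq))


def assign_block(window, references=REFERENCES):
    """Return the label of the reference block with the smallest RMSDA."""
    return min(references, key=lambda label: rmsda(window, references[label]))


def encode(phi, psi, references=REFERENCES):
    """Encode per-residue phi/psi lists as a block string.

    The first two and last two residues lack a full window and get 'Z'
    (an assumed convention for unassignable positions).
    """
    n = len(phi)
    out = ["Z"] * n
    for i in range(2, n - 2):
        window = [psi[i - 2], phi[i - 1], psi[i - 1], phi[i],
                  psi[i], phi[i + 1], psi[i + 1], phi[i + 2]]
        out[i] = assign_block(window, references)
    return "".join(out)
```

Wrapping angle differences into (-180, 180] matters because dihedrals are periodic: 179° and -179° are only 2° apart.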

2.
3.
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of enlarging the databanks on the definition of PBs. The results show that the geometrical features of the different PBs are preserved (local RMSD of 0.41 Å on average) and that sequence-structure specificities are reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequence, revisiting the optimization procedure and exploring different local prediction strategies. Using a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8 percentage points (Q16 = 48.7%). Repetitive structures are recognized better without loss of prediction efficiency for the other local folds. Adding secondary structure prediction improved Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD between predicted PBs and true local structures, is proposed to estimate prediction quality. Neq is linearly correlated with the Q16 prediction rate distributions computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.
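The two evaluation quantities in this abstract can be sketched in Python. `q16` is the fraction of positions whose predicted PB matches the observed one. `neq` assumes the usual definition of the equivalent number of states: the exponential of the Shannon entropy of a per-position PB probability distribution. Under that definition it is 1 for a certain prediction and 16 for a uniform one. The function names are illustrative only.

```python
import math


def q16(predicted, observed):
    """Fraction of positions whose predicted block equals the observed block."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and equally long")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


def neq(probabilities):
    """Equivalent number of states: exp of the Shannon entropy (natural log).

    `probabilities` is one position's distribution over the PB alphabet;
    zero entries contribute nothing to the entropy.
    """
    h = -sum(p * math.log(p) for p in probabilities if p > 0)
    return math.exp(h)
```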

4.
5.
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit known protein 3D structures using a structural alphabet known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. This library is used to encode 3D protein structures into 1D PB sequences and to capture sequence-to-structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences onto protein folds encoded as PB sequences. It uses no information from residue contacts or sequence-search methods and does not explicitly incorporate the hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivities of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in the PDB: our approach identified 38 compatible templates, and 66% of these hits yielded correctly predicted structures. This method scales up well and offers promising perspectives for structural annotation at the genomic level. It has been implemented as a web server that is freely available at http://www.bo‐protscience.fr/forsa .
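Below is a toy Python version of the scoring idea: one amino acid occurrence profile per PB, used to score a query threaded onto a PB-encoded fold. It assumes a simple ungapped global threading with log-odds scores and a Z-score against shuffled queries. The probabilities, function names and shuffle count are hypothetical. This is not the published method's scoring function.

```python
import math
import random
import statistics


def threading_score(sequence, pb_string, occurrence, background):
    """Sum of log-odds log(P(aa | PB) / P(aa)) over aligned positions.

    Assumed ungapped global threading: query and PB-encoded fold must be the
    same length. occurrence[pb][aa] and background[aa] are probabilities.
    """
    if len(sequence) != len(pb_string):
        raise ValueError("global threading needs equal lengths")
    return sum(math.log(occurrence[pb][aa] / background[aa])
               for aa, pb in zip(sequence, pb_string))


def z_score(sequence, pb_string, occurrence, background, n_shuffles=200, seed=0):
    """Z-score of the native threading score against shuffled queries."""
    rng = random.Random(seed)
    native = threading_score(sequence, pb_string, occurrence, background)
    residues = list(sequence)
    decoys = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys order
        decoys.append(threading_score(residues, pb_string, occurrence, background))
    sd = statistics.pstdev(decoys)
    return (native - statistics.mean(decoys)) / sd if sd > 0 else 0.0
```

Shuffling preserves amino acid composition, so a high Z-score reflects the order of residues along the fold rather than composition alone.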

6.
7.
MOTIVATION: A large body of evidence suggests that protein structural information is frequently encoded in local sequences; sequence-structure relationships derived from local structure/sequence analyses could significantly enhance the capacities of protein structure prediction methods. In this paper, the predictive capacity of a database (LSBSP2) that organizes local sequence-structure relationships encoded in local structures with two consecutive secondary structure elements is tested with two computational procedures for protein structure prediction. The goal is twofold: to test the folding hypothesis that local structures are determined by local sequences, and to enhance our capacity to predict protein structures from their amino acid sequences. RESULTS: The LSBSP2 database contains a large set of sequence profiles derived from exhaustive pairwise structural alignments of local structures with two consecutive secondary structure elements. One computational procedure uses the PSI-BLAST alignment program to predict local structures for test sequence fragments by matching them onto the sequence profiles in the LSBSP2 database. The results show that 54% of the test sequence fragments were predicted with local structures that closely match their native local structures. The other computational procedure is a filter system that removes as many false positives as possible from a set of PSI-BLAST hits. An assessment with a large set of non-redundant protein structures shows that, for distantly related protein pairs, the PSI-BLAST + filter system improves prediction specificity by up to two-fold over that of PSI-BLAST alone. Tests with these two computational procedures demonstrate that local sequence-structure relationships can indeed enhance our capacity for protein structure prediction.
The results also indicate that local sequences encoded with strong local structure propensities play an important role in determining the native state folding topology.

8.
A statistical analysis of PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each defined by the (phi, psi) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used, with a Bayesian approach, to predict the local 3D structure of proteins from their sequence alone. LocPred is software that allows users to submit a protein sequence and obtain a prediction in terms of PBs. The prediction results are given both textually and graphically.
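The Bayesian step can be illustrated with a naive-Bayes sketch in Python. For each candidate PB, the prior is combined with per-offset amino acid probabilities over a window centred on the position, and the PB with the highest posterior is kept. The window half-width, the add-one pseudocount, the function names and the toy training pair are assumptions for illustration. LocPred's actual window size and estimation details may differ.

```python
import math
from collections import Counter, defaultdict


def train(pairs, half_window=2):
    """Count PB frequencies and residue counts per (PB, window offset).

    `pairs` is an iterable of (amino_acid_sequence, pb_string) of equal length.
    """
    prior = Counter()
    counts = defaultdict(Counter)  # (pb, offset) -> Counter of residues
    alphabet = set()
    for seq, pbs in pairs:
        alphabet.update(seq)
        for i, pb in enumerate(pbs):
            prior[pb] += 1
            for j in range(-half_window, half_window + 1):
                if 0 <= i + j < len(seq):
                    counts[(pb, j)][seq[i + j]] += 1
    return prior, counts, alphabet, half_window


def predict(seq, model, pseudocount=1.0):
    """Return the PB with the highest (naive-Bayes) posterior at each position."""
    prior, counts, alphabet, w = model
    total = sum(prior.values())
    k = len(alphabet)
    out = []
    for i in range(len(seq)):
        best, best_score = None, -math.inf
        for pb, n_pb in prior.items():
            score = math.log(n_pb / total)  # log prior
            for j in range(-w, w + 1):
                if 0 <= i + j < len(seq):
                    c = counts[(pb, j)]
                    # smoothed log P(residue at offset j | pb)
                    score += math.log((c[seq[i + j]] + pseudocount) /
                                      (sum(c.values()) + pseudocount * k))
            if score > best_score:
                best, best_score = pb, score
        out.append(best)
    return "".join(out)
```

The naive-Bayes assumption treats window positions as independent given the PB. That independence is what makes per-offset counting sufficient.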

9.
True positive hits of PROSITE sequence patterns are expected to have a characteristic three-dimensional structure. The combined sequence-structure attributes of PROSITE patterns can be used to predict the function of an uncharacterized protein with known primary and 3D structure, a situation that might arise in structural genomics projects. We have found specific examples of true hits of PROSITE patterns that display structural plasticity by adopting significantly different local conformations depending on the context. Our work highlights the importance of taking into account all the known distinct conformations of a PROSITE pattern when creating a sensitive 3D template for the pattern for use in functional annotation.
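Identifying the hits whose conformations are compared starts with matching a PROSITE pattern against sequences. The sketch below is a minimal Python translator from PROSITE pattern syntax to a regular expression. It covers the common elements:

- `x`, `[...]` and `{...}`;
- `(n)` and `(n,m)` repeats;
- terminal `<` / `>` anchors.

It does not cover every corner of the syntax, and `prosite_to_regex` is a hypothetical helper name.

```python
import re

# One pattern element: x, a single residue, [allowed], or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'C-x(2,4)-C' into a Python regex.

    '<' / '>' are handled only at the ends of the pattern; a '>' inside
    brackets (e.g. '[G>]') is not supported in this sketch.
    """
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix
```

For example, a C2H2 zinc-finger-style pattern `C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.` becomes `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`, which can be passed to `re.finditer` to locate hits.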

10.
11.
Rangwala H, Karypis G. Proteins, 2008, 72(3): 1005-1018.
The effectiveness of comparative modeling approaches for protein structure prediction can be substantially improved by incorporating predicted structural information into the initial sequence-structure alignment. Motivated by the approaches used to align protein structures, this article focuses on developing machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct an alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms that solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification. The framework incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel functions. Our comprehensive empirical study shows superior results compared with profile-to-profile scoring schemes. We also show results for protein pairs with low sequence similarity (less than 12% sequence identity). For these pairs, the new local structural features, alone or in conjunction with profile-based information, lead to alignments that are considerably more accurate than those obtained by schemes that use only profile and/or predicted secondary structure information.
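The quantity that these regression models estimate is the RMSD between two fragments after optimal superposition. For reference, the sketch below computes it directly from coordinates with the Kabsch algorithm in Python (NumPy). `kabsch_rmsd` is a hypothetical name. The function expects two equally long (n, 3) coordinate arrays, such as the Cα atoms of two fragments.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (n, 3)")
    # Remove translation by centring both fragments on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

In a supervised setting like the one described, values computed this way from known structures would serve as the regression targets.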

12.
We describe a hidden Markov model, HMMSTR, for general protein sequences based on the I-sites library of sequence-structure motifs. Unlike the linear hidden Markov models used to model individual protein families, HMMSTR has a highly branched topology and captures recurrent local features of protein sequences and structures that transcend protein family boundaries. The model extends the I-sites library by describing the adjacencies of different sequence-structure motifs as observed in the protein database. By representing overlapping motifs in a much more compact form, it also greatly reduces the number of parameters. The HMM assigns a considerably higher probability to coding sequence than an equivalent dipeptide model does. It predicts secondary structure with an accuracy of 74.3% and backbone torsion angles better than any previously reported method. It also predicts the structural context of beta strands and turns with an accuracy that should be useful for tertiary structure prediction.
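One common way to obtain per-residue predictions from an HMM is Viterbi decoding of the most probable hidden-state path; posterior decoding is another. The sketch below is a generic Viterbi decoder in Python on a hypothetical two-state toy model (`H`/`E` states over a two-letter residue alphabet). It does not reproduce HMMSTR's branched topology or parameters.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most probable hidden-state path and its log probability.

    log_start[s], log_trans[s][t] and log_emit[s][o] are natural-log
    probabilities; missing entries are treated as impossible (-inf).
    """
    neg_inf = -math.inf
    score = {s: log_start.get(s, neg_inf) + log_emit[s].get(observations[0], neg_inf)
             for s in states}
    back = []  # one dict of best predecessors per later position
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for t in states:
            best_s = max(states, key=lambda s: prev[s] + log_trans[s].get(t, neg_inf))
            score[t] = (prev[best_s] + log_trans[best_s].get(t, neg_inf)
                        + log_emit[t].get(obs, neg_inf))
            pointers[t] = best_s
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], score[last]
```

Working in log space avoids numerical underflow on long sequences.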

13.
14.
Comparative modeling methods can consistently produce reliable structural models for protein sequences with more than 25% sequence identity to proteins of known structure. However, sequences with lower sequence identity also have a good chance of having their structural components represented in structural databases. To exploit this, we present a novel fragment-based method using sets of structurally similar local protein fragments. Other fragment-based methods use only single backbone fragments. Instead, we use a library of groups, each containing a set of sequence fragments with geometrically similar local structures, and extract sequence-related properties to assign these specific geometrical conformations to target sequences. We test the ability of the approach to recognize the correct SCOP folds for 273 sequences from the 49 most popular folds. For 49% of these sequences the correct fold is the top prediction, and for 82% it is among the top five predictions. Moreover, the approach shows no loss of performance on a subset of target sequences with less than 10% sequence identity to any protein used to build the library.

15.
Here we perform a systematic exploration of the use of distance constraints derived from small-angle X-ray scattering (SAXS) measurements to filter candidate protein structures for protein structure prediction. This is an intrinsically more complex task than applying distance constraints derived from NMR data, where the identity of the pair of amino acid residues subject to a given distance constraint is known. SAXS, on the other hand, yields a histogram of pair distances (the pair distribution function), but the identities of the pairs contributing to a given bin of the histogram are not known. Our study is based on an extension of the Levitt-Hinds coarse-grained approach to ab initio protein structure prediction, which we use to generate a candidate set of Cα backbones. Despite the lack of residue-specific information in the SAXS data, our study shows that a SAXS filter can effectively purify the set of native structure candidates and thus substantially improves the reliability of protein structure prediction. We test the quality of our predicted Cα backbones by structural homology searches against the Dali domain library and find the results very encouraging. Despite the lack of local structural detail and the limited modeling accuracy at the Cα backbone level, useful information about fold classification can be extracted from this procedure. The approach thus allows a SAXS-based structure prediction algorithm to suggest potential structural homologs in cases where the lack of sequence homology prevents identification of candidate folds for a given protein. It therefore has the potential to help determine the biological function of a protein based on structural rather than sequence homology.
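The filtering step can be sketched in Python. The sketch computes a normalized pair-distance histogram for each candidate Cα model and keeps the candidates closest to the target histogram. This is a crude surrogate for a real SAXS pair distribution function, which involves all atoms and scattering weights. The bin width, distance range, squared-difference score and keep fraction are hypothetical choices, not the paper's.

```python
import math
from itertools import combinations


def pair_distance_histogram(coords, bin_width=2.0, max_distance=60.0):
    """Normalized histogram of all pairwise distances between C-alpha atoms.

    Distances beyond max_distance are clamped into the last bin.
    """
    n_bins = int(math.ceil(max_distance / bin_width))
    counts = [0] * n_bins
    for a, b in combinations(coords, 2):
        k = min(int(math.dist(a, b) / bin_width), n_bins - 1)
        counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def histogram_distance(h1, h2):
    """Sum of squared bin differences (an assumed discrepancy score)."""
    return sum((x - y) ** 2 for x, y in zip(h1, h2))


def saxs_filter(candidates, target_histogram, keep_fraction=0.5, **kwargs):
    """Keep the candidates whose histograms best match the target histogram.

    Extra keyword arguments are passed to pair_distance_histogram and should
    match the binning used for the target.
    """
    scored = sorted(candidates, key=lambda c: histogram_distance(
        pair_distance_histogram(c, **kwargs), target_histogram))
    n_keep = max(1, int(len(candidates) * keep_fraction))
    return scored[:n_keep]
```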

16.
One-dimensional (1D) structures of proteins, such as secondary structure and contact number, provide intuitive pictures for understanding how the native three-dimensional (3D) structure of a protein is encoded in its amino acid sequence. However, it is still not clear whether a given set of 1D structures contains sufficient information to recover the underlying 3D structure. Here we show that the 3D structure of a protein can be recovered from three types of 1D structures: secondary structure, contact number and residue-wise contact order, the last of which is introduced here for the first time. Using simulated annealing molecular dynamics simulations, we searched for structures satisfying the native 1D structural restraints. The test set comprised 16 proteins of various structural classes, ranging in size from 56 to 146 residues. When the structures best satisfying the restraints were selected, all the proteins showed a coordinate RMS deviation of <4 Å from the native structure, and for most of them the deviation was even <2 Å. This result opens new possibilities for protein structure prediction and for our understanding of the sequence-structure relationship.
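Two of the 1D descriptors, contact number and residue-wise contact order, can be computed from coordinates as sketched below in Python. The sketch makes four assumptions:

- a hard distance cutoff (12 Å by default) on one representative atom per residue;
- only pairs separated by more than two residues count as contacts;
- contact number is the count of such contacts per residue;
- residue-wise contact order sums sequence separations |i - j| over those contacts and divides by chain length L.

The original work's atom choice, cutoff and contact function may differ.

```python
import math


def contact_map(coords, cutoff=12.0, min_separation=3):
    """Hard-cutoff contacts (1/0) between residues at least min_separation apart.

    The cutoff and one-atom-per-residue representation are assumptions.
    """
    n = len(coords)
    return [[1 if abs(i - j) >= min_separation
             and math.dist(coords[i], coords[j]) < cutoff else 0
             for j in range(n)] for i in range(n)]


def contact_number(cmap):
    """Number of contacts made by each residue."""
    return [sum(row) for row in cmap]


def residue_wise_contact_order(cmap):
    """Per-residue sum of sequence separations of contacts, divided by chain length."""
    n = len(cmap)
    return [sum(abs(i - j) * c for j, c in enumerate(row)) / n
            for i, row in enumerate(cmap)]
```

Contact number says how buried a residue is. Residue-wise contact order adds how far along the chain its partners lie, which is the extra information the abstract relies on.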

17.
18.
MOTIVATION: Our aim is to develop a process that automatically defines a repertory of contiguous 3D protein structure fragments that can be used in homology modeling. We present here improvements to the method we introduced previously, the 'hybrid protein model' (de Brevern and Hazout, Theor. Chem. Acc., 106, 36-47, (2001)). The hybrid protein learns a non-redundant databank encoded in a structural alphabet composed of 16 Protein Blocks (PBs; de Brevern et al., Proteins, 41, 271-287, (2000)). Every local fold is learned by looking for the most similar pattern present in the hybrid protein and modifying it slightly. Finally, each position corresponds to a cluster of similar 3D local folds. RESULTS: In this paper, we describe two improvements to our method for building an optimal hybrid protein. The first is 'baby training,' defined as the introduction of large structure fragments followed by a progressive reduction in the size of training fragments. The second is the deletion of redundant parts of the hybrid protein. This repertory of contiguous 3D protein structure fragments should be a useful tool for molecular modeling.

19.
MOTIVATION: A large body of experimental and theoretical evidence suggests that local structural determinants are frequently encoded in short segments of protein sequence. Once recognized, local structural information is particularly useful in protein structural and functional analyses. However, identifying embedded local structural codes from sequence information alone remains difficult. RESULTS: In this paper, we describe a local structure prediction method that aims to predict the backbone structures of nine-residue sequence segments. Two elements are key to this procedure. The first is the LSBSP1 database, which contains a large number of non-redundant local structure-based sequence profiles for nine-residue structure segments. The second is the consensus approach, which identifies a consensus structure from a set of hit structures. The procedure starts by matching a query sequence segment of nine consecutive amino acid residues to all the sequence profiles in the local structure-based sequence profile database (LSBSP1). The consensus structure, which is at the center of the largest structural cluster of the hit structures, is predicted to be the native state structure adopted by the query sequence segment. The method is assessed with a large set of random test protein structures that were not used in constructing the LSBSP1 database. The benchmark results indicate that the new procedure outperforms local backbone structure prediction methods based on the I-sites library by a significant margin. AVAILABILITY: All the computational and assessment procedures have been implemented in the integrated computational system PrISM.1 (Protein Informatics System for Modeling).
The system and associated databases for LINUX systems can be downloaded from the website: http://www.columbia.edu/~ay1/.
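The consensus step chooses the hit at the centre of the largest structural cluster. It might be approximated as in the Python sketch below. Given a precomputed matrix of pairwise distances between hit structures (for example, RMSDs of the nine-residue segments), it picks the hit with the most neighbours within a threshold. The neighbour-count clustering, the tie-break and the function name are assumptions. PrISM.1's actual clustering procedure may differ.

```python
def consensus_index(distance_matrix, threshold):
    """Index of the hit with the most neighbours within `threshold`.

    distance_matrix[i][j] is a symmetric structural distance (e.g. RMSD)
    between hits i and j. Ties are broken by the smaller summed distance to
    those neighbours, i.e. the more central member of the cluster.
    """
    best, best_key = None, None
    for i, row in enumerate(distance_matrix):
        neighbours = [d for j, d in enumerate(row) if j != i and d <= threshold]
        key = (-len(neighbours), sum(neighbours))
        if best_key is None or key < best_key:
            best, best_key = i, key
    return best
```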

20.
We present a method to assess the reliability of local structure prediction from sequence. We introduce a greedy algorithm for filtering and enriching dynamic fragment libraries compiled with remote-homology detection methods such as HHfrag. After filtering false hits at each target position, we reduce the fragment library to a minimal set of representative fragments, which are guaranteed to have correct local structure in regions of detectable conservation. We demonstrate that the location of conserved motifs in a protein sequence can be predicted by examining the recurrence and structural homogeneity of the detected fragments. The resulting confidence score correlates with the local RMSD of the representative fragments and allows us to predict torsion angles from sequence more accurately than existing machine learning methods.
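Reducing a fragment library to a small representative set can be approximated with a greedy set-cover heuristic, sketched below in Python. It repeatedly keeps the fragment that lies within a distance threshold of the most still-unrepresented fragments. This is an assumed stand-in, not the paper's algorithm. Greedy selection also does not guarantee the smallest possible set. The distance matrix and threshold are hypothetical inputs.

```python
def greedy_representatives(distance_matrix, threshold):
    """Greedily pick fragments until every fragment is within `threshold` of one.

    distance_matrix[i][j] is a structural distance between fragments i and j
    (diagonal 0, threshold >= 0, so every fragment covers itself and the loop
    terminates). Returns indices of the chosen representatives in pick order.
    """
    n = len(distance_matrix)
    covers = [{j for j in range(n) if distance_matrix[i][j] <= threshold}
              for i in range(n)]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Fragment representing the most not-yet-represented fragments.
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen
```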


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417