Similar articles
1.
It is well established that sequence templates such as those in the PROSITE and PRINTS databases are powerful tools for predicting the biological function and tertiary structure for newly derived protein sequences. The number of X-ray and NMR protein structures is increasing rapidly and it is apparent that a 3D equivalent of the sequence templates is needed. Here, we describe an algorithm called TESS that automatically derives 3D templates from structures deposited in the Brookhaven Protein Data Bank. While a new sequence can be searched for sequence patterns, a new structure can be scanned against these 3D templates to identify functional sites. As examples, 3D templates are derived for enzymes with an O-His-O "catalytic triad" and for the ribonucleases and lysozymes. When these 3D templates are applied to a large data set of nonidentical proteins, several interesting hits are located. This suggests that the development of a 3D template database may help to identify the function of new protein structures, if unknown, as well as to design proteins with specific functions.  相似文献   

2.
It is well established that sequence templates (e.g., PROSITE) and databases are powerful tools for identifying biological function and tertiary structure for an unknown protein sequence. Here we describe a method for automatically deriving 3D templates from the protein structures deposited in the Brookhaven Protein Data Bank. As an example, we describe a template derived for the Ser-His-Asp catalytic triad found in the serine proteases and triacylglycerol lipases. We find that the resultant template provides a highly selective tool for automatically differentiating between catalytic and noncatalytic Ser-His-Asp associations. When applied to nonproteolytic proteins, the template picks out two "non-esterase" catalytic triads that may be of biological relevance. This suggests that the development of databases of 3D templates, such as those that currently exist for protein sequence templates, will help identify the functions of new protein structures as they are determined and pinpoint their functionally important regions.  相似文献   

3.
We assume that each class of protein has a core structure that is defined by internal residues, and that the external, solvent-contacting residues contribute to the stability of the structure, are of primary importance to function, but do not determine the architecture of the core portions of the polypeptide chain. An algorithm has been developed to supply a list of permitted sequences of internal residues compatible with a known core structure. This list is referred to as the tertiary template for that structure. In general the positions in the template are not sequentially adjacent and are distributed throughout the polypeptide chain. The template is derived using the fixed positions for the main-chain and beta-carbon atoms in the test structure and selected stereochemical rules. The focus of this paper is on the use of two packing criteria: avoidance of steric overlap and complete filling of available space. The program also notes potential polar group interactions and disulfide bonds as well as possible burial of formal charges. Central to the algorithm is the side-chain rotamer library. In an update of earlier studies by others, we show that 17 of the 20 amino acids (omitting Met, Lys and Arg) can be represented adequately by 67 side-chain rotamers. A list of chi angles and their standard deviations is given. The newer, high-resolution, refined structures in the Brookhaven Protein Data Bank show similar mean chi values, but have much smaller deviations than those of earlier studies. This suggests that a rotamer library may be a better structural approximation than was previously thought. In using packing constraints, it has been found essential to include all hydrogen atoms specifically. The "unified atom" representation is not adequate. The permitted rotamer sequences are severely restricted by the main-chain plus beta-carbon atoms of the test structure. 
Further restriction is introduced if the full set of atoms of the external residues are held fixed, the full-chain model. The space-filling requirement has a major role in restricting the template lists. The preliminary tests reported here make it appear likely that templates prepared from the currently known core structures will be able to discriminate between these structures. The templates should thus be useful in deciding whether a sequence of unknown tertiary structure fits any of the known core classes and, if a fit is found, how the sequence should be aligned in three dimensions to fit the core of that class.(ABSTRACT TRUNCATED AT 400 WORDS)  相似文献   

4.
Experimental residual dipolar couplings (RDCs) in combination with structural models have the potential for accelerating the protein backbone resonance assignment process because RDCs can be measured accurately and interpreted quantitatively. However, this application has been limited due to the need for very high-resolution structural templates. Here, we introduce a new approach to resonance assignment based on optimal agreement between the experimental and calculated RDCs from a structural template that contains all assignable residues. To overcome the inherent computational complexity of such a global search, we have adopted an efficient two-stage search algorithm and included connectivity data from conventional assignment experiments. In the first stage, a list of strings of resonances (CA-links) is generated via exhaustive searches for short segments of sequentially connected residues in a protein (local templates), and then ranked by the agreement of the experimental 13Cα chemical shifts and 15N-1H RDCs to the predicted values for each local template. In the second stage, the top CA-links for different local templates in stage I are combinatorially connected to produce CA-links for all assignable residues. The resulting CA-links are ranked for resonance assignment according to their measured RDCs and predicted values from a tertiary structure. Since the final RDC ranking of CA-links includes all assignable residues and the assignment is derived from a “global minimum”, our approach is far less reliant on the quality of experimental data and structural templates. The present approach is validated with the assignments of several proteins, including a 42 kDa maltose binding protein (MBP) using RDCs and structural templates of varying quality. 
Since backbone resonance assignment is an essential first step for most of biomolecular NMR applications and is often a bottleneck for large systems, we expect that this new approach will improve the efficiency of the assignment process for small and medium size proteins and will extend the size limits assignable by current methods for proteins with structural models.  相似文献   

5.
Measurements of protein sequence-structure correlations   Total citations: 1, self-citations: 0, citations by others: 1
Crooks GE  Wolfe J  Brenner SE 《Proteins》2004,57(4):804-810
Correlations between protein structures and amino acid sequences are widely used for protein structure prediction. For example, secondary structure predictors generally use correlations between a secondary structure sequence and corresponding primary structure sequence, whereas threading algorithms and similar tertiary structure predictors typically incorporate interresidue contact potentials. To investigate the relative importance of these sequence-structure interactions, we measured the mutual information among the primary structure, secondary structure and side-chain surface exposure, both for adjacent residues along the amino acid sequence and for tertiary structure contacts between residues distantly separated along the backbone. We found that local interactions along the amino acid chain are far more important than non-local contacts and that correlations between proximate amino acids are essentially uninformative. This suggests that knowledge-based contact potentials may be less important for structure prediction than is generally believed.
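The mutual-information measurement described in the abstract above can be illustrated with a small sketch. This is only a plug-in (frequency-count) estimate over paired per-residue labels, not the authors' procedure, which the abstract does not specify; the function name `mutual_information` and the toy label strings are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired observations.

    xs, ys: equal-length sequences of discrete labels, e.g. one-letter
    amino acid codes and secondary structure states (hypothetical inputs).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Toy usage: residue type versus secondary structure state.
print(mutual_information("AAGG", "HHEE"))  # labels fully dependent
print(mutual_information("AAGG", "HEHE"))  # labels independent
```

Plug-in estimates of this kind are biased upward for small samples, so a real analysis would likely need a correction or a large data set.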

6.
The annotation of protein function has not kept pace with the exponential growth of raw sequence and structure data. An emerging solution to this problem is to identify 3D motifs or templates in protein structures that are necessary and sufficient determinants of function. Here, we demonstrate the recurrent use of evolutionary trace information to construct such 3D templates for enzymes, search for them in other structures, and distinguish true from spurious matches. Serine protease templates built from evolutionarily important residues distinguish between proteases and other proteins nearly as well as the classic Ser-His-Asp catalytic triad. In 53 enzymes spanning 33 distinct functions, an automated pipeline identifies functionally related proteins with an average positive predictive power of 62%, including correct matches to proteins with the same function but with low sequence identity (the average identity for some templates is only 17%). Although these template building, searching, and match classification strategies are not yet optimized, their sequential implementation demonstrates a functional annotation pipeline which does not require experimental information, but only local molecular mimicry among a small number of evolutionarily important residues.  相似文献   

7.
We have used the occluded surface algorithm to estimate the packing of both buried and exposed amino acid residues in protein structures. This method works equally well for buried residues and solvent-exposed residues in contrast to the commonly used Voronoi method that works directly only on buried residues. The atomic packing of individual globular proteins may vary significantly from the average packing of a large data set of globular proteins. Here, we demonstrate that these variations in protein packing are due to a complex combination of protein size, secondary structure composition and amino acid composition. Differences in protein packing are conserved in protein families of similar structure despite significant sequence differences. This conclusion indicates that quality assessments of packing in protein structures should include a consideration of various parameters including the packing of known homologous proteins. Also, modeling of protein structures based on homologous templates should take into account the packing of the template protein structure.  相似文献   

8.
Elements of local tertiary structure in RNA molecules are important in understanding structure-function relationships. The loop E motif, first identified in several eukaryotic RNAs at functional sites which share an exceptional propensity for UV crosslinking between specific bases, was subsequently shown to have a characteristic tertiary structure. Common sequences and secondary structures have allowed other examples of the E-loop motif to be recognized in a number of RNAs at sites of protein binding or other biological function. We would like to know if more elements of local tertiary structure, in addition to the E-loop, can be identified by such common features. The highly structured circular RNA genome of the hepatitis D virus (HDV) provides an ideal test molecule because it has extensive internal structure, a UV-crosslinkable tertiary element, and specific sites for functional interactions with proteins including host PKR. We have now found a UV-crosslinkable element of local tertiary structure in antigenomic HDV RNA which, although differing from the E-loop, has a very similar pattern of sequence and secondary structure to the UV-crosslinkable element found in the genomic strand. Despite the fact that the two structures map close to one another, the sequences comprising them are not the templates for each other. Instead, the template regions for each element are additional sites for potential higher order structure on their respective complementary strands. This wealth of recurring sequences interspersed with base-paired stems provides a context to examine other RNA species for such features and their correlations with biological function.  相似文献   

9.
Lin TH  Tsai KC  Lo TC 《Protein engineering》2003,16(11):819-829
The tertiary structure of the central catalytic domain of insertion sequence ISLC3 isolated from Lactobacillus casei ATCC 393 was predicted using the homology modeling approach. The novel insertion sequence was isolated by us from the temperate bacteriophage phiA3 of L. casei ATCC 393. The ISLC3 central catalytic domain comprises 116 amino acid residues and was treated as the query sequence. Five Web-available threading methods were used to find primary structure templates for the query sequence. These primary templates were further screened using the SWISS-MODEL Protein Modeling Server and the default parameter settings therein to give six final structure templates. All of these final structure templates were the integrase (IN) protein of retroviruses. Multiple sequence alignment using these IN sequences against the query one revealed the signature DDE motif. Based on the structures of these final templates, the structure of the query sequence was constructed using the InsightII/Discover/Homology programs. A metal ion, Mg²⁺, was inserted into the center of the putative catalytic pocket formed by the DDE residues of the predicted structure in the final rounds of refinement by molecular dynamics (MD) simulations. The structure with a metal ion included was designated "with Mg" and that without a metal ion was designated "free Mg". The average exposed surface areas of some hydrophobic residues of both the predicted free Mg and with Mg structures were computed and compared with those computed for the six structure templates. Whereas the predicted with Mg structure was slightly more exposed than the predicted free Mg structure, the former appeared to be more stable than the latter, as revealed by the lower conformation energy recorded for the former during the structure refinement by MD simulations. To verify further the predicted structures, the coordinates of both predicted structures were fed into the ERRAT Protein Verification Server. 
It was found that the quality of the predicted with Mg structure was much better than that of the free Mg structure. The validation results also indicated that regions of the predicted with Mg structure that can be rejected at the 95% confidence level were approximately 20% whereas those which can be rejected at the same level for the six structure templates were approximately 10%. The predicted with Mg structure was also docked into a short oligonucleotide representing the substrate of the ISLC3 transposase using the DOCK_4.0.2 program. It was found that both Glu140 and Asp68 residues of the DDE motif of the predicted with Mg structure were able to form hydrogen bonds with the DNA substrate, which was similar to what was observed in a docking study using the retrovirus IN 1asu and its DNA substrate.  相似文献   

10.
Predicting the structural fold of a protein is an important and challenging problem. Available computer programs for determining whether a protein sequence is compatible with a known 3-dimensional structure fall into 2 categories: (1) structure-based methods, in which structural features such as local conformation and solvent accessibility are encoded in a template, and (2) sequence-based methods, in which aligned sequences of a set of related proteins are encoded in a template. In both cases, the programs use a static template based on a predetermined set of proteins. Here, we describe a computer-based method, called iterative template refinement (ITR), that uses templates combining structure-based and sequence-based information and employs an iterative search procedure to detect related proteins and sequentially add them to the templates. Starting from a single protein of known structure, ITR performs sequential cycles of database search to construct an expanding tree of templates with the aim of identifying subtle relationships among proteins. Evaluating the performance of ITR on 6 proteins, we found that the method automatically identified a variety of subtle structural similarities to other proteins. For example, the method identified structural similarity between arabinose-binding protein and phosphofructokinase, a relationship that has not been widely recognized.  相似文献   

11.
Added-value is the additional information that a model carries with respect to the template structure used for model building. Thousands of single-template models, corresponding to proteins of known structure, were analyzed. The accuracy of structure-derived properties, such as residue accessibility, surface area, electrostatic potential, and others, was determined as a function of template:target sequence identity by comparing the models with their corresponding experimental structures. Added-value was determined by comparing the accuracy in models with that from templates. Geometry-dependent properties such as neighborhood of buried residues and accessible surface area showed low added-value. Properties that also depend on the protein sequence, such as presence of polar areas and electrostatic potential, showed high added-value. In general added-value increases when template:target sequence identity decreases, but it is also affected by alignment errors. This study justifies the use of models instead of the use of templates to estimate structure-derived properties of a target protein.  相似文献   

12.
Rong Liu  Jianjun Hu 《Proteins》2013,81(11):1885-1899
Accurate prediction of DNA‐binding residues has become a problem of increasing importance in structural bioinformatics. Here, we presented DNABind, a novel hybrid algorithm for identifying these crucial residues by exploiting the complementarity between machine learning‐ and template‐based methods. Our machine learning‐based method was based on the probabilistic combination of a structure‐based and a sequence‐based predictor, both of which were implemented using support vector machines algorithms. The former included our well‐designed structural features, such as solvent accessibility, local geometry, topological features, and relative positions, which can effectively quantify the difference between DNA‐binding and nonbinding residues. The latter combined evolutionary conservation features with three other sequence attributes. Our template‐based method depended on structural alignment and utilized the template structure from known protein–DNA complexes to infer DNA‐binding residues. We showed that the template method had excellent performance when reliable templates were found for the query proteins but tended to be strongly influenced by the template quality as well as the conformational changes upon DNA binding. In contrast, the machine learning approach yielded better performance when high‐quality templates were not available (about 1/3 cases in our dataset) or the query protein was subject to intensive transformation changes upon DNA binding. Our extensive experiments indicated that the hybrid approach can distinctly improve the performance of the individual methods for both bound and unbound structures. DNABind also significantly outperformed the state‐of‐art algorithms by around 10% in terms of Matthews's correlation coefficient. The proposed methodology could also have wide application in various protein functional site annotations. DNABind is freely available at http://mleg.cse.sc.edu/DNABind/ . Proteins 2013; 81:1885–1899. © 2013 Wiley Periodicals, Inc.  
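The DNABind abstract above reports performance in terms of Matthews's correlation coefficient (MCC). As a reminder of what that figure measures, here is a minimal sketch of the standard MCC formula computed from confusion-matrix counts. The function name and the example counts are illustrative assumptions, not values from the paper.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    chosen here as an assumption) to avoid division by zero.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a binding-residue predictor.
print(matthews_corrcoef(tp=40, fp=10, tn=130, fn=20))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it stays informative when binding residues are a small minority of all residues.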

13.
A novel approach is proposed for modeling loop regions in proteins. In this approach, a prerequisite sequence-structure alignment is examined for regions where the target sequence is not covered by the structural template. These regions, extended with a number of residues from adjacent stem regions, are submitted to fold recognition. The alignments produced by fold recognition are integrated into the initial alignment to create an alignment between the target sequence and several structures, where gaps in the main structural template are covered by local structural templates. This one-to-many (1:N) alignment is used to create a protein model by existing protein-modeling techniques. Several alternative approaches were evaluated using a set of ten proteins. One approach was selected and evaluated using another set of 31 proteins. The most promising result was for gap regions not located at the C-terminus or N-terminus of a protein, where the method produced an average RMSD 12% lower than the loop modeling provided with the program MODELLER. This improvement is shown to be statistically significant. [Figure: The method derived from the training set applied to CASP target T0191.]

14.
The local environment of an amino acid in a folded protein determines the acceptability of mutations at that position. In order to characterize and quantify these structural constraints, we have made a comparative analysis of families of homologous proteins. Residues in each structure are classified according to amino acid type, secondary structure, accessibility of the side chain, and existence of hydrogen bonds from the side chains. Analysis of the pattern of observed substitutions as a function of local environment shows that there are distinct patterns, especially for buried polar residues. The substitution data tables are available on diskette with Protein Science. Given the fold of a protein, one is able to predict sequences compatible with the fold (profiles or templates) and potentially to discriminate between a correctly folded and misfolded protein. Conversely, analysis of residue variation across a family of aligned sequences in terms of substitution profiles can allow prediction of secondary structure or tertiary environment.  相似文献   

15.
To facilitate investigation of the molecular and biochemical functions of the adenovirus E4 Orf6 protein, we sought to derive three-dimensional structural information using computational methods, particularly threading and comparative protein modeling. The amino acid sequence of the protein was used for secondary structure and hidden Markov model (HMM) analyses, and for fold recognition by the ProCeryon program. Six alternative models were generated from the top-scoring folds identified by threading. These models were examined by 3D-1D analysis and evaluated in the light of available experimental evidence. The final model of the E4 protein derived from these and additional threading calculations was a chimera, with the tertiary structure of its C-terminal 226 residues derived from a TIM barrel template and a mainly alpha-nonbundle topology for its poorly conserved N-terminal 68 residues. To assess the accuracy of this model, additional threading calculations were performed with E4 Orf6 sequences altered as in previous experimental studies. The proposed structural model is consistent with the reported secondary structure of a functionally important C-terminal sequence and can account for the properties of proteins carrying alterations in functionally important sequences or of those that disrupt an unusual zinc-coordination motif.  相似文献   

16.
MOTIVATION: The quality of a model structure derived from a comparative modeling procedure is dictated by the accuracy of the predicted sequence-template alignment. As the sequence-template pairs are increasingly remote in sequence relationship, the prediction of the sequence-template alignments becomes increasingly problematic with sequence alignment methods. Structural information of the template, used in connection with the sequence relationship of the sequence-template pair, could significantly improve the accuracy of the sequence-template alignment. In this paper, we describe a sequence-template alignment method that integrates sequence and structural information to enhance the accuracy of sequence-template alignments for distantly related protein pairs. RESULTS: The structure-dependent sequence alignment (SDSA) procedure was optimized for coverage and accuracy on a training set of 412 protein pairs; the structures for each of the training pairs are similar (RMSD < ~4 Å) but the sequence relationship is undetectable (average pair-wise sequence identity = 8%). The optimized SDSA procedure was then applied to extend PSI-BLAST local alignments by calculating the global alignments under the constraint of the residue pairs in the local alignments. This composite alignment procedure was assessed with a testing set of 1421 protein pairs, of which the pair-wise structures are similar (RMSD < ~4 Å) but the sequences are marginally related at best in each pair (average pair-wise sequence identity = 13%). The assessment showed that the composite alignment procedure predicted more aligned residue pairs, with an average 27% increase in correctly aligned residues over the standard PSI-BLAST alignments for the protein pairs in the testing set.
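The abstract above scores alignments by the number of correctly aligned residues relative to a structure-based reference. One common way to compute such a score is sketched below. It assumes both alignments are given as equal-length gapped strings with `-` as the gap character; the helper names and the toy sequences are hypothetical and are not taken from the SDSA paper.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs aligned in two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(predicted, reference):
    """Fraction of reference residue pairs reproduced by the predicted alignment."""
    if not reference:
        return 0.0
    return len(predicted & reference) / len(reference)


# Toy example: same two sequences (ACDE and ADEF), two different alignments.
reference = aligned_pairs("ACDE-", "A-DEF")
predicted = aligned_pairs("ACDE-", "AD-EF")
print(fraction_correct(predicted, reference))
```

Dividing by the number of reference pairs rewards coverage as well as accuracy; dividing by the number of predicted pairs instead would measure precision only.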

17.
Fold recognition predicts protein three-dimensional structure by establishing relationships between a protein sequence and known protein structures. Most methods explicitly use information derived from the secondary and tertiary structure of the templates. Here we show that rigorous application of a sequence search method (PSI-BLAST) with no reference to secondary or tertiary structure information is able to perform as well as traditional fold recognition methods. Since the method, SENSER, does not require knowledge of the three-dimensional structure, it can be used to infer relationships that are not tractable by methods dependent on structural templates.  相似文献   

18.
Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template‐defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile‐based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa . Proteins 2015; 83:411–427. © 2014 Wiley Periodicals, Inc.  相似文献   

19.
We investigated the possible role of residues at the Ccap position in an alpha-helix on protein stability. A set of 431 protein alpha-helices containing a C'-Gly from the Protein Data Bank (PDB) was analyzed, and the normalized frequencies for finding particular residues at the Ccap position, the average fraction of buried surface area, and the hydrogen bonding patterns of the Ccap residue side-chain were calculated. We found that on average the Ccap position is 70% buried and noted a significant correlation (R=0.8) between the relative burial of this residue and its hydrophobicity as defined by the Gibbs energy of transfer from octanol or cyclohexane to water. Ccap residues with polar side-chains are commonly involved in hydrogen bonding. The hydrogen bonding pattern is such that, the longer side-chains of Glu, Gln, Arg, Lys, His form hydrogen bonds with residues distal (>+/-4) in sequence, while the shorter side-chains of Asp, Asn, Ser, Thr exhibit hydrogen bonds with residues close in sequence (<+/-4), mainly involving backbone atoms. Experimentally we determined the thermodynamic propensities of residues at the Ccap position using the protein ubiquitin as a model system. We observed a large variation in the stability of the ubiquitin variants depending on the nature of the Ccap residue. Furthermore, the measured changes in stability of the ubiquitin variants correlate with the hydrophobicity of the Ccap residue. The experimental results, together with the statistical analysis of protein structures from the PDB, indicate that the key hydrophobic capping interactions between a helical residue (C3 or C4) and a residue outside the helix (C", C3' or C4') are frequently enhanced by the hydrophobic interactions with Ccap residues.  相似文献   

20.
MOTIVATION: Structural templates consisting of a few atoms in a specific geometric conformation provide a powerful tool for studying the relationship between protein structure and function. Current methods for template searching constrain template syntax and semantics by their design. Hence there is a need for a more flexible core algorithm upon which to build more sophisticated tools. Statistical analysis of structural similarity is still in its infancy when compared with its analogue in sequence alignment. In the context of template matching, there is an urgent need for normalization of scores so that results from templates with differing sensitivity may be compared directly. RESULTS: We introduce Jess, a fast and flexible algorithm for searching protein structures for small groups of atoms under arbitrary constraints on geometry and chemistry. We apply the algorithm to a set of manually derived enzyme active site templates, and derive an empirical measure for estimating the relative significance of hits encountered using differing templates.  相似文献   
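To make the idea of searching a structure for a small group of atoms under geometric and chemical constraints more concrete, here is a deliberately naive sketch. It is not the Jess algorithm, whose details the abstract does not give; it is a brute-force search that accepts any ordered set of atoms whose atom names match the template and whose pairwise distances agree with the template's within a tolerance. All names, coordinates, and the tolerance are assumptions made for illustration.

```python
from itertools import permutations
from math import dist


def match_template(template, atoms, tol=1.0):
    """Brute-force 3D template search (illustrative only).

    template, atoms: lists of (atom_name, (x, y, z)) tuples.
    Returns a list of index tuples into `atoms`, one per hit, ordered
    like the template entries.
    """
    k = len(template)
    hits = []
    for combo in permutations(range(len(atoms)), k):
        # Chemistry constraint: atom names must match position by position.
        if any(atoms[i][0] != template[t][0] for t, i in enumerate(combo)):
            continue
        # Geometry constraint: every pairwise distance within tolerance.
        ok = all(
            abs(dist(atoms[combo[a]][1], atoms[combo[b]][1])
                - dist(template[a][1], template[b][1])) <= tol
            for a in range(k) for b in range(a + 1, k)
        )
        if ok:
            hits.append(combo)
    return hits


# Hypothetical three-atom template and a tiny "structure".
template = [("OG", (0, 0, 0)), ("NE2", (3, 0, 0)), ("OD1", (3, 4, 0))]
atoms = [
    ("CA", (10, 10, 10)),
    ("OG", (1, 1, 1)),
    ("NE2", (4, 1, 1)),
    ("OD1", (4, 5, 1)),
    ("OD1", (20, 20, 20)),
]
print(match_template(template, atoms))
```

The cost of this search grows rapidly with structure size, and distance-only matching cannot tell a site from its mirror image; practical tools prune candidates early and typically superpose the matched atoms to score the hit.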

