Similar articles
20 similar articles found
1.
Improving fold recognition without folds   Cited by: 4 (self-citations: 0, citations by others: 4)
The most reliable way to align two proteins of unknown structure is through sequence-profile and profile-profile alignment methods. If the structure for one of the two is known, fold recognition methods outperform purely sequence-based alignments. Here, we introduced a novel method that aligns generalised sequence and predicted structure profiles. Using predicted 1D structure (secondary structure and solvent accessibility) significantly improved performance over sequence-only methods, both in terms of correctly recognising pairs of proteins with different sequences and similar structures and in terms of correctly aligning the pairs. The scores obtained by our generalised scoring matrix followed an extreme value distribution; this yielded accurate estimates of the statistical significance of our alignments. We found that mistakes in 1D structure predictions correlated between proteins from different sequence-structure families. The impact of this surprising result was that our method succeeded in significantly out-performing sequence-only methods even without explicitly using structural information from either of the two. Since AGAPE also outperformed established methods that rely on 3D information, we made it available through. If we solved the problem of CPU-time required to apply AGAPE on millions of proteins, our results could also impact everyday database searches.
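
The abstract states that alignment scores followed an extreme value distribution, which is what makes significance estimates possible. A minimal sketch of that step, assuming a Gumbel form with a location parameter `mu` and a scale parameter `lam` that would have to be fitted to scores from unrelated pairs (the function name, parameter names, and example values are illustrative and are not taken from AGAPE):

```python
import math


def gumbel_p_value(score, mu, lam):
    """Probability of a chance score >= `score` under a Gumbel distribution.

    Uses P(S >= x) = 1 - exp(-exp(-lam * (x - mu))).
    `mu` and `lam` are assumed to be fitted elsewhere; they are hypothetical inputs.
    """
    exponent = -lam * (score - mu)
    if exponent > 700.0:
        # exp() would overflow; the probability is 1 to machine precision.
        return 1.0
    # -expm1(-t) equals 1 - exp(-t) and stays accurate when t is tiny.
    return -math.expm1(-math.exp(exponent))


if __name__ == "__main__":
    # Illustrative parameters only.
    for s in (20.0, 40.0, 60.0):
        print(s, gumbel_p_value(s, mu=25.0, lam=0.3))
```

A higher score gives a smaller p-value, so a hit can be ranked by significance rather than by raw score.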

2.
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not yet reliably enough. Each team correctly predicts a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments corresponding to correctly recognized folds is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading cannot presently be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure. © 1995 Wiley-Liss, Inc.

3.
We present an analysis of 10 blind predictions prepared for a recent conference, “Critical Assessment of Techniques for Protein Structure Prediction.” The sequences of these proteins are not detectably similar to those of any protein in the structure database then available, but we attempted, by a threading method, to recognize similarity to known domain folds. Four of the 10 proteins, as we subsequently learned, do indeed show significant similarity to then-known structures. For 2 of these proteins the predictions were accurate, in the sense that a similar structure was at or near the top of the list of threading scores, and the threading alignment agreed well with the corresponding structural alignment. For the best predicted model, the mean alignment error relative to the optimal structural alignment was 2.7 residues, arising entirely from small “register shifts” of strands or helices. In the analysis we attempt to identify factors responsible for these successes and failures. Since our threading method does not use gap penalties, we may readily distinguish between errors arising from our prior definition of the “cores” of known structures and errors arising from inherent limitations in the threading potential. It would appear from the results that successful substructure recognition depends most critically on accurate definition of the “fold” of a database protein. This definition must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution. © 1995 Wiley-Liss, Inc.
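
The "mean alignment error" quoted above can be read as the average register shift between the threading alignment and the structural alignment. The sketch below is one plausible way to compute such a figure, assuming each alignment is given as a mapping from query residue index to template residue index; the abstract does not give the authors' exact definition, so the function and its inputs are illustrative only.

```python
def mean_alignment_shift(predicted, reference):
    """Mean absolute register shift, in residues, between two alignments.

    `predicted` and `reference` map query residue index -> template residue index.
    Only query residues aligned in both mappings are compared (an assumption).
    """
    shared = [q for q in predicted if q in reference]
    if not shared:
        raise ValueError("the two alignments share no aligned query residues")
    return sum(abs(predicted[q] - reference[q]) for q in shared) / len(shared)


if __name__ == "__main__":
    threading_alignment = {1: 10, 2: 11, 3: 14}
    structural_alignment = {1: 10, 2: 12, 3: 12, 4: 13}
    print(mean_alignment_shift(threading_alignment, structural_alignment))
```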

4.
5.
    
Zhou H, Zhou Y. Proteins 2005, 58(2): 321-328
Recognizing structural similarity without significant sequence identity has proved to be a challenging task. Sequence-based and structure-based methods as well as their combinations have been developed. Here, we propose a fold-recognition method that incorporates structural information without the need for sequence-to-structure threading. This is accomplished by generating sequence profiles from protein structural fragments. The structure-derived sequence profiles allow a simple integration with evolution-derived sequence profiles and secondary-structural information for an optimized alignment by efficient dynamic programming. The resulting method (called SP(3)) is found to make a statistically significant improvement in both sensitivity of fold recognition and accuracy of alignment over the method based on evolution-derived sequence profiles alone (SP) and the method based on evolution-derived sequence profile and secondary structure profile (SP(2)). SP(3) was tested on the SALIGN benchmark for alignment accuracy and on the Lindahl, PROSPECTOR 3.0, and LiveBench 8.0 benchmarks for remote-homology detection and model accuracy. SP(3) is found to be the most sensitive and accurate single-method server in all benchmarks tested where other methods are available for comparison (although its results are statistically indistinguishable from the next best in some cases, and the comparison is subject to the limitation of the time-dependent sequence and/or structural libraries used by different methods). In LiveBench 8.0, its accuracy rivals some of the consensus methods such as ShotGun-INBGU, Pmodeller3, Pcons4, and ROBETTA. The SP(3) fold-recognition server is available at http://theory.med.buffalo.edu.
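
The alignment step described above, profiles combined with secondary-structure information and optimized by dynamic programming, can be sketched as a small global alignment. Everything specific here is an assumption made for illustration: the dot-product column score, the `shift`, `ss_weight`, and `gap` values, and the linear gap penalty are not the published SP(3) scoring function.

```python
def column_score(col_a, col_b, ss_a, ss_b, ss_weight=0.5, shift=0.1):
    """Score two profile columns: frequency dot product plus a secondary-structure bonus.

    `col_a` and `col_b` map amino acid -> frequency. All weights are hypothetical.
    """
    dot = sum(p * col_b.get(aa, 0.0) for aa, p in col_a.items())
    bonus = ss_weight if ss_a == ss_b else 0.0
    return dot - shift + bonus


def align_score(prof_a, ss_a, prof_b, ss_b, gap=0.5):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    n, m = len(prof_a), len(prof_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -gap * i
    for j in range(1, m + 1):
        dp[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + column_score(
                prof_a[i - 1], prof_b[j - 1], ss_a[i - 1], ss_b[j - 1]
            )
            dp[i][j] = max(match, dp[i - 1][j] - gap, dp[i][j - 1] - gap)
    return dp[n][m]


if __name__ == "__main__":
    query = [{"A": 1.0}, {"L": 1.0}]
    print(align_score(query, "HH", query, "HH"))
```

Because structure-derived and evolution-derived profiles share the same column representation, adding another information source only changes `column_score`; the dynamic programming itself is unchanged.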

6.
    
NMR offers the possibility of accurate secondary structure for proteins that would be too large for structure determination. In the absence of an X-ray crystal structure, this information should be useful as an adjunct to protein fold recognition methods based on low resolution force fields. The value of this information has been tested by adding varying amounts of artificial secondary structure data and threading a sequence through a library of candidate folds. Using a literature test set, the threading method alone has only a one-third chance of producing a correct answer among the top ten guesses. With realistic secondary structure information, one can expect a 60-80% chance of finding a homologous structure. The method has then been applied to examples with published estimates of secondary structure. This implementation is completely independent of sequence homology, and sequences are optimally aligned to candidate structures with gaps and insertions allowed. Unlike work using predicted secondary structure, we test the effect of differing amounts of relatively reliable data.

7.
The dispositions of 39 alpha helices of greater than 2.5 turns and four beta sheets in the major capsid protein (VP5, 149 kDa) of herpes simplex virus type 1 were identified by computational and visualization analysis from the 8.5 Å electron cryomicroscopy structure of the whole capsid. The assignment of helices in the VP5 upper domain was validated by comparison with the recently determined crystal structure of this region. Analysis of the spatial arrangement of helices in the middle domain of VP5 revealed that the organization of a tightly associated bundle of ten helices closely resembled that of a domain fold found in the annexin family of proteins. Structure-based sequence searches suggested that sequences in both the N- and C-terminal portions of the VP5 sequence contribute to this domain. The long helices seen in the floor domain of VP5 form an interconnected network within and across capsomeres. The combined structural and sequence-based informatics has led to an architectural model of VP5. This model, placed in the context of the capsid, provides insights into the strategies used to achieve viral capsid stability.

8.
    
In protein structure prediction, a central problem is defining the structure of a loop connecting two secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, an HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was made removing homologous loops using the SCOP superfamily definition and predicting afterwards against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio prediction. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.
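
The accuracies above are "top 20" figures: a prediction counts as correct when the true class appears among the 20 best-ranked classes. A minimal sketch of that evaluation, assuming each loop comes with a list of class labels sorted from best to worst (the data layout and names are illustrative, not ArchDB's):

```python
def top_k_accuracy(ranked_predictions, true_labels, k=20):
    """Fraction of cases whose true label appears among the first k ranked labels."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("one ranked list is required per true label")
    if not true_labels:
        raise ValueError("no cases to evaluate")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


if __name__ == "__main__":
    ranked = [["c1", "c2", "c3"], ["c2", "c3", "c1"]]
    truth = ["c2", "c1"]
    print(top_k_accuracy(ranked, truth, k=2))
```

With 451 classes, picking 20 at random would succeed about 4.4% of the time (20/451), which gives a rough baseline for reading the 78.5% figure.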

9.
Cited by: 1 (self-citations: 0, citations by others: 1)
The Profiles-3D application, an inverse-folding methodology appropriate for water-soluble proteins, has been modified to allow the determination of structural properties of integral-membrane proteins (IMPs) and for testing the validity of solved and model structures of IMPs. The modification, known as reverse-environment prediction of integral membrane protein structure (REPIMPS), takes into account the fact that exposed areas of side chains for many residues in IMPs are in contact with lipid and not the aqueous phase. This (1) allows lipid-exposed residues to be classified into the correct physicochemical environment class, (2) significantly improves compatibility scores for IMPs whose structures have been solved, and (3) reduces the possibility of rejecting a three-dimensional structure for an IMP because the presence of lipid was not included. Validation tests of REPIMPS showed that it (1) can locate the transmembrane domain of IMPs with single transmembrane helices more frequently than a range of other methodologies, (2) can rotationally orient transmembrane helices with respect to the lipid environment and surrounding helices in IMPs with multiple transmembrane helices, and (3) has the potential to accurately locate transmembrane domains in IMPs with multiple transmembrane helices. We conclude that correcting for the presence of the lipid environment surrounding the transmembrane segments of IMPs is an essential step for reasonable modeling and verification of the three-dimensional structures of these proteins.

10.
    
The expression of genes transcribed by the RNA polymerase with the alternative sigma factor σ54 (Eσ54) is absolutely dependent on activator proteins that bind to enhancer-like sites, located far upstream from the promoter. These unique prokaryotic proteins, known as enhancer-binding proteins (EBP), mediate open promoter complex formation in a reaction dependent on NTP hydrolysis. The best characterized proteins of this family of regulators are NtrC and NifA, which activate genes required for ammonia assimilation and nitrogen fixation, respectively. In a recent IRBM course (“Frontiers of protein structure prediction,” IRBM, Pomezia, Italy, 1995; see web site http://www.mrc-cpe.cam.uk/irbm-course95/), one of us (J.O.) participated in the elaboration of the proposal that the Central domain of the EBPs might adopt the classical mononucleotide-binding fold. This suggestion was based on the results of a new protein fold recognition algorithm (Map) and on the mapping of correlated mutations calculated for the sequence family on the same mononucleotide-binding fold topology. In this work, we present new data that support the previous conclusion. The results from a number of different secondary structure prediction programs suggest that the Central domain could adopt an α/β topology. The fold recognition programs ProFIT 0.9, 3D PROFILE combined with secondary structure prediction, and 123D suggest a mononucleotide-binding fold topology for the Central domain amino acid sequence. Finally, and most importantly, three of five reported residue alterations that impair the Central domain ATPase activity of the Eσ54 activators are mapped to polypeptide regions that might play roles equivalent to those involved in nucleotide binding in the mononucleotide-binding proteins. Furthermore, the known residue substitutions that alter the function of the Eσ54 activators, leaving intact the Central domain ATPase activity, are mapped on a region proposed to play a role equivalent to that of the effector region of the GTPase superfamily.

11.
    
Structures of proteins and protein–protein complexes are determined by the same physical principles and thus share a number of similarities. At the same time, there could be differences because, in order to function, proteins interact with other molecules, undergo conformational changes, and so forth, which might impose different restraints on the tertiary versus quaternary structures. This study focuses on structural properties of protein–protein interfaces in comparison with the protein core, based on the wealth of currently available structural data and new structure‐based approaches. The results showed that physicochemical characteristics, such as amino acid composition, residue–residue contact preferences, and hydrophilicity/hydrophobicity distributions, are similar in protein core and protein–protein interfaces. On the other hand, characteristics that reflect the evolutionary pressure, such as structural composition and packing, are largely different. The results provide important insight into fundamental properties of protein structure and function. At the same time, the results contribute to better understanding of the ways to dock proteins. Recent progress in predicting structures of individual proteins follows the advancement of deep learning techniques and new approaches to residue coevolution data. Protein core could potentially provide large amounts of data for application of deep learning to docking. However, our results showed that the core motifs are significantly different from those at protein–protein interfaces, and thus may not be directly useful for docking. At the same time, such difference may help to overcome a major obstacle in application of the coevolutionary data to docking: discrimination of the intramolecular information not directly relevant to docking.

12.
    
In the past few years, a new generation of fold recognition methods has been developed, in which the classical sequence information is combined with information obtained from secondary structure and, sometimes, accessibility predictions. The results are promising, indicating that this approach may compete with potential-based methods (Rost B et al., 1997, J Mol Biol 270:471-480). Here we present a systematic study of the different factors contributing to the performance of these methods, in particular when applied to the problem of fold recognition of remote homologues. Our results indicate that secondary structure and accessibility prediction methods have reached an accuracy level where they are not the major factor limiting the accuracy of fold recognition. The pattern degeneracy problem is confirmed as the major source of error of these methods. On the basis of these results, we study three different options to overcome these limitations: normalization schemes, mapping of the coil state into the different zones of the Ramachandran plot, and post-threading graphical analysis.

13.
Cited by: 1 (self-citations: 0, citations by others: 1)

14.
Protein structure prediction by using bioinformatics can involve sequence similarity searches, multiple sequence alignments, identification and characterization of domains, secondary structure prediction, solvent accessibility prediction, automatic protein fold recognition, constructing three-dimensional models to atomic detail, and model validation. Not all protein structure prediction projects involve the use of all these techniques. A central part of a typical protein structure prediction is the identification of a suitable structural target from which to extrapolate three-dimensional information for a query sequence. The way in which this is done defines three types of projects. The first involves the use of standard and well-understood techniques. If a structural template remains elusive, a second approach using nontrivial methods is required. If a target fold cannot be reliably identified because inconsistent results have been obtained from nontrivial data analyses, the project falls into the third type of project and will be virtually impossible to complete with any degree of reliability. In this article, a set of protocols to predict protein structure from sequence is presented and distinctions among the three types of project are given. These methods, if used appropriately, can provide valuable indicators of protein structure and function.

15.
    
The threading approach to protein structure prediction suffers from the limited number of substantially different folds available as templates. A method is presented for the generation of artificial protein structures, amenable to threading, by modification of native ones. The artificial structures so generated are compared to the native ones and it is shown that, within the accuracy of the pseudoenergy function or force field used, these two types of structures appear equally useful for threading. Since a multitude of pseudonative artificial structures can be generated per native structure, the pool of pseudonative template structures for threading can be enormously enlarged by the inclusion of the pseudonative artificial structures. Proteins 28:522–529, 1997. © 1997 Wiley-Liss, Inc.

16.
    
Structural characterization of protein–protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template‐free or template‐based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high‐resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have predefined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model‐to‐native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model‐like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. Proteins 2015; 83:891–897. © 2015 Wiley Periodicals, Inc.
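
The benchmark grades each model by its model-to-native Cα RMSD. The sketch below computes that quantity for two equally long lists of Cα coordinates. It assumes the two structures have already been superimposed and that atoms are listed in the same order; optimal superposition (for example with the Kabsch algorithm) is deliberately left out, and the function name is illustrative.

```python
import math


def ca_rmsd(model_coords, native_coords):
    """Root-mean-square deviation between matched C-alpha coordinates, in the input units.

    Assumes both coordinate lists are already superimposed and in the same atom order.
    """
    if len(model_coords) != len(native_coords):
        raise ValueError("coordinate lists must have the same length")
    if not model_coords:
        raise ValueError("no coordinates given")
    squared = sum(
        (xm - xn) ** 2 + (ym - yn) ** 2 + (zm - zn) ** 2
        for (xm, ym, zm), (xn, yn, zn) in zip(model_coords, native_coords)
    )
    return math.sqrt(squared / len(model_coords))


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    model = [(0.0, 0.0, 3.0), (3.8, 0.0, 3.0)]
    print(ca_rmsd(model, native))
```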

17.
The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that can better represent residue-residue and residue-solvent interactions is a crucial way to improve the prediction accuracy. The widely used contact energy functions mostly only consider the contact frequency between different types of residues; however, we find that the contact frequency also relates to the residue hydrophobic environment. Accordingly, we present an improved contact energy function to integrate the two factors, which can reflect the influence of hydrophobic interaction on the stabilization of protein 3D structure more effectively. Furthermore, a fold recognition (threading) approach based on this energy function is developed. The testing results obtained with 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function can improve the accuracy of the fold template prediction from 20% to 50%, and can also improve the accuracy of the sequence-template alignment from 35% to 65%.
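
One way to picture an energy that depends on both residue pair and hydrophobic environment is a log-odds (inverse Boltzmann) table keyed by residue pair and environment class. The sketch below uses that generic statistical-potential form with energies in units of kT. It is not the function proposed in the abstract, whose exact form is not given here, and the environment labels, counts, and pseudocount are hypothetical.

```python
import math


def contact_energy(observed, expected, pseudocount=1.0):
    """Log-odds contact energy in kT units: negative when a contact is over-represented."""
    return -math.log((observed + pseudocount) / (expected + pseudocount))


def threading_energy(contacts, table):
    """Sum table energies over the contacts implied by a sequence-template alignment.

    `contacts` holds (residue_a, residue_b, environment) triples; `table` is keyed by
    (residue_a, residue_b, environment) with the two residues in sorted order.
    """
    return sum(table[(min(a, b), max(a, b), env)] for a, b, env in contacts)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    table = {
        ("I", "L", "buried"): contact_energy(observed=9, expected=4),
        ("K", "L", "exposed"): contact_energy(observed=1, expected=3),
    }
    print(threading_energy([("L", "I", "buried"), ("L", "K", "exposed")], table))
```

Keying the table by environment lets the same residue pair score differently in a buried and an exposed position, which is the kind of dependence the abstract argues a frequency-only contact energy misses.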

18.
    
A systematic study of helix-helix packing in a comprehensive database of protein structures revealed that the side chains inside helix-helix interfaces on average are shorter than those in the noninterface parts of the helices. The study follows our earlier study of this effect in transmembrane helices. The results obtained on the entire database of protein structures are consistent with those obtained on the transmembrane helices. The difference in the length of interface and noninterface side chains is small but statistically significant. It indicates that helices, if viewed along their main axis, statistically are not circular, but have a flattened interface. This effect brings the helices closer to each other and creates a tighter structural packing. The results provide an interesting insight into the aspects of protein structure and folding.

19.
    
The caspase‐recruitment domain (CARD) is known to play an important role in apoptosis and inflammation as an essential protein–protein interaction domain. The CARD of the cytosolic pathogen receptor Nod1 was overexpressed in Escherichia coli and purified by affinity chromatography and gel filtration. The purified CARD was crystallized at 277 K using the microseeding method. X‐ray diffraction data were collected to 1.9 Å resolution. The crystals belong to space group P3₁ or P3₂, with unit‐cell parameters a = b = 79.1, c = 80.9 Å. Preliminary analysis indicates that there is one dimeric CARD molecule in the asymmetric unit.

20.
    
Caspase recruitment domain (CARD)-only proteins (COPs) regulate apoptosis, inflammation, and innate immunity. They inhibit the assembly of NOD-like receptor complexes such as the inflammasome and NODosome, which are molecular complexes critical for caspase-1 activation. COPs are known to interact with either caspase-1 CARD or RIP2 CARD via a CARD-CARD interaction, and inhibit caspase-1 activation or further downstream signaling. In addition to the human COPs, Pseudo-ICE, INCA, and ICEBERG, several viruses also contain viral COPs that help them escape the host immune system. To elucidate the molecular mechanism of host immunity inhibition by viral COPs, we solved the structure of a viral COP for the first time. Our structure showed that viral COP forms a structural transformation-mediated dimer, which is unique and has not been reported in any structural study of a CARD domain. Based on the current structure, and the previously solved structures of other death domain superfamily members, we propose that structural transformation-mediated dimerization might be a new strategy for dimer assembly in the death domain superfamily.
