In the past few years, a new generation of fold recognition methods has been developed, in which the classical sequence information is combined with information obtained from secondary structure and, sometimes, accessibility predictions. The results are promising, indicating that this approach may compete with potential-based methods (Rost B et al., 1997, J Mol Biol 270:471-480). Here we present a systematic study of the different factors contributing to the performance of these methods, in particular when applied to the problem of fold recognition of remote homologues. Our results indicate that secondary structure and accessibility prediction methods have reached an accuracy level where they are not the major factor limiting the accuracy of fold recognition. The pattern degeneracy problem is confirmed as the major source of error of these methods. On the basis of these results, we study three different options to overcome these limitations: normalization schemes, mapping of the coil state into the different zones of the Ramachandran plot, and post-threading graphical analysis.
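
The abstract above names normalization schemes as one remedy but does not say which ones were tested. As an illustration only, the sketch below shows one common choice, a Z-score over the fold library, which expresses each raw score relative to the distribution of scores the same query obtains against every candidate fold. The fold names and raw scores are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Map each fold's raw score to (score - library mean) / library std."""
    values = list(raw_scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All folds scored identically, so no fold stands out.
        return {fold: 0.0 for fold in raw_scores}
    return {fold: (s - mu) / sigma for fold, s in raw_scores.items()}

# Hypothetical raw scores of one query against three library folds.
raw = {"fold_A": 12.0, "fold_B": 15.0, "fold_C": 30.0}
ranked = sorted(z_scores(raw).items(), key=lambda kv: kv[1], reverse=True)
```

A Z-score makes results comparable across queries of different length or composition, which raw scores are not; whether that matches the schemes studied in the paper is an assumption.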

This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team predicts correctly a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure. © 1995 Wiley-Liss, Inc.

NMR offers the possibility of accurate secondary structure for proteins that would be too large for structure determination. In the absence of an X-ray crystal structure, this information should be useful as an adjunct to protein fold recognition methods based on low resolution force fields. The value of this information has been tested by adding varying amounts of artificial secondary structure data and threading a sequence through a library of candidate folds. Using a literature test set, the threading method alone has only a one-third chance of producing a correct answer among the top ten guesses. With realistic secondary structure information, one can expect a 60-80% chance of finding a homologous structure. The method has then been applied to examples with published estimates of secondary structure. This implementation is completely independent of sequence homology, and sequences are optimally aligned to candidate structures with gaps and insertions allowed. Unlike work using predicted secondary structure, we test the effect of differing amounts of relatively reliable data.

We present an analysis of 10 blind predictions prepared for a recent conference, “Critical Assessment of Techniques for Protein Structure Prediction.” The sequences of these proteins are not detectably similar to those of any protein in the structure database then available, but we attempted, by a threading method, to recognize similarity to known domain folds. Four of the 10 proteins, as we subsequently learned, do indeed show significant similarity to then-known structures. For 2 of these proteins the predictions were accurate, in the sense that a similar structure was at or near the top of the list of threading scores, and the threading alignment agreed well with the corresponding structural alignment. For the best predicted model, the mean alignment error relative to the optimal structural alignment was 2.7 residues, arising entirely from small “register shifts” of strands or helices. In the analysis we attempt to identify factors responsible for these successes and failures. Since our threading method does not use gap penalties, we may readily distinguish between errors arising from our prior definition of the “cores” of known structures and errors arising from inherent limitations in the threading potential. It would appear from the results that successful substructure recognition depends most critically on accurate definition of the “fold” of a database protein. This definition must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution. © 1995 Wiley-Liss, Inc.

Improving fold recognition without folds
The most reliable way to align two proteins of unknown structure is through sequence-profile and profile-profile alignment methods. If the structure for one of the two is known, fold recognition methods outperform purely sequence-based alignments. Here, we introduced a novel method that aligns generalised sequence and predicted structure profiles. Using predicted 1D structure (secondary structure and solvent accessibility) significantly improved over sequence-only methods, both in terms of correctly recognising pairs of proteins with different sequences and similar structures and in terms of correctly aligning the pairs. The scores obtained by our generalised scoring matrix followed an extreme value distribution; this yielded accurate estimates of the statistical significance of our alignments. We found that mistakes in 1D structure predictions correlated between proteins from different sequence-structure families. The impact of this surprising result was that our method succeeded in significantly out-performing sequence-only methods even without explicitly using structural information from any of the two. Since AGAPE also outperformed established methods that rely on 3D information, we made it available through. If we solved the problem of CPU-time required to apply AGAPE on millions of proteins, our results could also impact everyday database searches.
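
The link between an extreme value distribution of scores and a significance estimate can be made concrete. The sketch below assumes the scores follow a Gumbel (type I extreme value) distribution with location `mu` and scale `beta`, and returns the probability that an unrelated pair scores at least as high as the observed score. The function name and parameter values are illustrative; the abstract does not describe AGAPE's own fitting procedure.

```python
import math

def evd_p_value(score, mu, beta):
    """P(S >= score) when S follows a Gumbel distribution (location mu, scale beta)."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny p-values.
    return -math.expm1(-math.exp(-z))

# Hypothetical fit: background scores centred at 20 with scale 4.
p_typical = evd_p_value(20.0, mu=20.0, beta=4.0)  # a score at the mode
p_high = evd_p_value(60.0, mu=20.0, beta=4.0)     # a score far in the tail
```

In practice `mu` and `beta` would be estimated from scores of known unrelated pairs; the values used here are placeholders.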

Zhou H  Zhou Y 《Proteins》2004,55(4):1005-1013
An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential so that a highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score combined with sequence profile and secondary structure information leads to an algorithm called SPARKS (Sequence, secondary structure Profiles and Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in the ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting the family, superfamily, and fold similarities in the Lindahl benchmark, respectively. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server is established for academic users on http://theory.med.buffalo.edu.
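
A single-body potential permits dynamic programming because the score for placing a query residue at a template position does not depend on how the other residues are aligned. The sketch below is a generic Needleman-Wunsch recursion with a linear gap penalty, not the SPARKS implementation. The score matrix `pair_score` and the gap value are hypothetical inputs, standing in for whatever combination of energy, profile, and secondary structure terms a method uses.

```python
def align_score(pair_score, gap=-1.0):
    """Best global alignment score via dynamic programming.

    pair_score[i][j] is the score of placing query residue i on template
    position j; gap is the (negative) score of skipping one position.
    """
    n = len(pair_score)
    m = len(pair_score[0]) if n else 0
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # query prefix aligned to nothing
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # template prefix aligned to nothing
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score[i - 1][j - 1],  # match i with j
                dp[i - 1][j] + gap,                           # skip query residue
                dp[i][j - 1] + gap,                           # skip template position
            )
    return dp[n][m]

# Two query residues against three template positions (hypothetical scores).
best = align_score([[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]], gap=-1.0)
```

The recursion fills an (n+1) by (m+1) table, so the cost grows with the product of the two lengths. A pairwise contact term evaluated on the query sequence would break the independence that makes this recursion exact.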

胡始昌  江弋  林琛  邹权 《生物信息学》2012,10(2):112-115
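The protein folding problem has been named a major topic of "21st-century biophysics"; it is an important biological question within the central dogma of molecular biology that remains unsolved, so predicting protein folding patterns is complex, difficult, and challenging work. To address this problem, we introduce classifier ensembles: this paper combines three classifiers (LMT, RandomForest, and SMO) with 188-dimensional combined physicochemical features to predict protein classes. Experiments show that the method effectively characterizes the properties of protein folding patterns and classifies protein sequence data accurately; both cross-validation and independent testing show that the prediction accuracy exceeds 70%, nearly 10 percentage points higher than previous work.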
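
The abstract does not state how the three classifiers are combined. As one plausible illustration, the sketch below uses simple majority voting over the labels predicted by each classifier. The fold labels are hypothetical, and the tie rule (the earliest listed classifier wins) is a choice made here, not one taken from the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

# Hypothetical labels from three classifiers for one protein sequence.
label = majority_vote(["TIM barrel", "Ig-fold", "Ig-fold"])
```

With three voters and many fold classes, three-way disagreement is possible, so listing the most trusted classifier first gives that tie rule a sensible meaning.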

Yo Matsuo  Ken Nishikawa 《Proteins》1995,23(3):370-375
A protein fold recognition method was tested by the blind prediction of the structures of a set of proteins. The method evaluates the compatibility of an amino acid sequence with a three-dimensional structure using four evaluation functions: side-chain packing, solvation, hydrogen-bonding, and local conformation functions. The structures of 14 proteins containing 19 sequences were predicted. The predictions were compared with the experimental structures. The experimental results showed that 9 of the 19 target sequences have known folds or portions of known folds. Among them, the folds of Klebsiella aerogenes urease β subunit (KAUB) and pyruvate phosphate dikinase domain 4 (PPDK4) were successfully recognized; our method predicted that KAUB and PPDK4 would adopt the folds of macromomycin (Ig-fold) and phosphoribosylanthranilate isomerase:indoleglycerol-phosphate synthase (TIM barrel), respectively, and the experimental structure revealed that they actually adopt the predicted folds. The predictions for the other targets were not successful, but they often gave secondary structural patterns similar to those of the experimental structures. © 1995 Wiley-Liss, Inc.

Zhou H  Zhou Y 《Proteins》2005,58(2):321-328
Recognizing structural similarity without significant sequence identity has proved to be a challenging task. Sequence-based and structure-based methods as well as their combinations have been developed. Here, we propose a fold-recognition method that incorporates structural information without the need for sequence-to-structure threading. This is accomplished by generating sequence profiles from protein structural fragments. The structure-derived sequence profiles allow a simple integration with evolution-derived sequence profiles and secondary-structural information for an optimized alignment by efficient dynamic programming. The resulting method (called SP(3)) is found to make a statistically significant improvement in both sensitivity of fold recognition and accuracy of alignment over the method based on evolution-derived sequence profiles alone (SP) and the method based on evolution-derived sequence profile and secondary structure profile (SP(2)). SP(3) was tested in the SALIGN benchmark for alignment accuracy and in the Lindahl, PROSPECTOR 3.0, and LiveBench 8.0 benchmarks for remote-homology detection and model accuracy. SP(3) is found to be the most sensitive and accurate single-method server in all benchmarks tested where other methods are available for comparison (although its results are statistically indistinguishable from the next best in some cases and the comparison is subject to the limitation of the time-dependent sequence and/or structural library used by different methods). In LiveBench 8.0, its accuracy rivals some of the consensus methods such as ShotGun-INBGU, Pmodeller3, Pcons4, and ROBETTA. The SP(3) fold-recognition server is available on http://theory.med.buffalo.edu.

The expression of genes transcribed by the RNA polymerase with the alternative sigma factor σ54 (Eσ54) is absolutely dependent on activator proteins that bind to enhancer-like sites, located far upstream from the promoter. These unique prokaryotic proteins, known as enhancer-binding proteins (EBP), mediate open promoter complex formation in a reaction dependent on NTP hydrolysis. The best characterized proteins of this family of regulators are NtrC and NifA, which activate genes required for ammonia assimilation and nitrogen fixation, respectively. In a recent IRBM course (“Frontiers of protein structure prediction,” IRBM, Pomezia, Italy, 1995; see web site http://www.mrc-cpe.cam.uk/irbm-course95/), one of us (J.O.) participated in the elaboration of the proposal that the Central domain of the EBPs might adopt the classical mononucleotide-binding fold. This suggestion was based on the results of a new protein fold recognition algorithm (Map) and on the mapping of correlated mutations calculated for the sequence family on the same mononucleotide-binding fold topology. In this work, we present new data that support the previous conclusion. The results from a number of different secondary structure prediction programs suggest that the Central domain could adopt an α/β topology. The fold recognition programs ProFIT 0.9, 3D PROFILE combined with secondary structure prediction, and 123D suggest a mononucleotide-binding fold topology for the Central domain amino acid sequence. Finally, and most importantly, three of five reported residue alterations that impair the Central domain ATPase activity of the Eσ54 activators are mapped to polypeptide regions that might be playing equivalent roles as those involved in nucleotide-binding in the mononucleotide-binding proteins. Furthermore, the known residue substitutions that alter the function of the Eσ54 activators, leaving intact the Central domain ATPase activity, are mapped on a region proposed to play an equivalent role as the effector region of the GTPase superfamily.

Stephen H. Bryant 《Proteins》1996,26(2):172-185
Threading experiments with proteins from the globin family provide an indication of the nature of the structural similarity required for successful fold recognition and accurate sequence-structure alignment. Threading scores are found to rise above the noise of false positives whenever roughly 60% of residues from a sequence can be aligned with analogous sites in the structure of a remote homolog. Fold recognition specificity thus appears to be limited by the extent of structural similarity, regardless of the degree of sequence similarity. Threading alignment accuracy is found to depend more critically on the degree of structural similarity. Alignments are accurate, placing the majority of residues exactly as in structural alignment, only when superposition residuals are less than 2.5 Å. These criteria for successful recognition and sequence-structure alignment appear to be consistent with the successes and failures of threading methods in blind structure prediction. They also suggest a direct assay for improved threading methods: Potentials and alignment models should be tested for their ability to detect less extensive structural similarities, and to produce accurate alignments when superposition residuals for this conserved “core” fall in the range characteristic of remote homologs. © 1996 Wiley-Liss, Inc.
This article is a US Government work and, as such, is in the public domain in the United States of America.

Kinch LN  Baker D  Grishin NV 《Proteins》2003,52(3):323-331
Sequence- and structure-based searching strategies have proven useful in the identification of remote homologs and have facilitated both structural and functional predictions of many uncharacterized protein families. We implement these strategies to predict the structure of and to classify a previously uncharacterized cluster of orthologs (COG3019) in the thioredoxin-like fold superfamily. The results of each searching method indicate that thioltransferases are the closest structural family to COG3019. We substantiate this conclusion using the ab initio structure prediction method Rosetta, which generates a thioredoxin-like fold similar to that of the glutaredoxin-like thioltransferase (NrdH) for a COG3019 target sequence. This structural model contains the thiol-redox functional motif CYS-X-X-CYS in close proximity to other absolutely conserved COG3019 residues, defining a novel thioredoxin-like active site that potentially binds metal ions. Finally, the Rosetta-derived model structure assists us in assembling a global multiple-sequence alignment of COG3019 with two other thioredoxin-like fold families, the thioltransferases and the bacterial arsenate reductases (ArsC).

The detection of remote homolog pairs of proteins using computational methods is a pivotal problem in structural bioinformatics, aiming to compute protein folds on the basis of information in the database of known structures. In the last 25 years, several methods have been developed to tackle this problem, based on different approaches including sequence-sequence alignments and/or structure comparison. In this article, we will briefly discuss When, Why, Where and How (WWWH) to perform remote homology search, reviewing some of the most widely adopted computational approaches. The specific aim is to highlight the basic criteria implemented by different research groups and to comment on the state of the art as well as on still-open questions.

Chen H  Kihara D 《Proteins》2011,79(1):315-334
Computational protein structure prediction remains a challenging task in protein bioinformatics. In recent years, the importance of template-based structure prediction has been increasing because of the growing number of protein structures solved by the structural genomics projects. To capitalize on the significant efforts and investments made in the structural genomics projects, it is urgent to establish effective ways to use the solved structures as templates by developing methods for exploiting remotely related proteins that cannot be simply identified by homology. In this work, we examine the effect of using suboptimal alignments in template-based protein structure prediction. We showed that suboptimal alignments are often more accurate than the optimal one, and such accurate suboptimal alignments can occur even at a very low rank of the alignment score. Suboptimal alignments contain a significant number of correct amino acid residue contacts. Moreover, suboptimal alignments can improve template-based models when used as input to Modeller. Finally, we use suboptimal alignments for handling a contact potential in a probabilistic way in a threading program, SUPRB. The probabilistic contacts strategy outperforms the partly thawed approach, which only uses the optimal alignment in defining residue contacts, and also the re-ranking strategy, which uses the contact potential in re-ranking alignments. The comparison with existing methods in the template-recognition test shows that SUPRB is very competitive and outperforms existing methods.

The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that can better represent residue-residue and residue-solvent interactions is a crucial way to improve the prediction accuracy. The widely used contact energy functions mostly only consider the contact frequency between different types of residues; however, we find that the contact frequency also relates to the residue hydrophobic environment. Accordingly, we present an improved contact energy function to integrate the two factors, which can reflect the influence of hydrophobic interaction on the stabilization of protein 3D structure more effectively. Furthermore, a fold recognition (threading) approach based on this energy function is developed. The testing results obtained with 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function can improve the accuracy of the fold template prediction from 20% to 50%, and can also improve the accuracy of the sequence-template alignment from 35% to 65%.
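
The abstract does not give the functional form of its energy. As an illustration of the general idea, the sketch below uses the familiar log-odds (inverse Boltzmann) form, energy = -ln(observed / expected), and indexes contacts by residue pair and by an environment label, so that the same pair can receive different energies when buried and when exposed. The counts, the environment labels, and the pseudocount are all hypothetical.

```python
import math

def contact_energy(observed, expected, pseudocount=1.0):
    """Log-odds contact energy; negative when a contact is seen more often than expected."""
    return -math.log((observed + pseudocount) / (expected + pseudocount))

# Hypothetical contact counts keyed by (residue, residue, environment).
observed = {("LEU", "VAL", "buried"): 30, ("LEU", "VAL", "exposed"): 4}
expected = {("LEU", "VAL", "buried"): 10, ("LEU", "VAL", "exposed"): 10}
energies = {key: contact_energy(observed[key], expected[key]) for key in observed}
```

The pseudocount keeps the logarithm finite for pairs that are never observed, at the cost of pulling sparse entries toward zero.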

Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. However, there is a paucity of information as to why the ensemble methods are superior and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.
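
Figures such as "64.0% of correct relationships at 95% precision or higher" come from ranking all predicted query-template pairs by score and asking how many true pairs are recovered before the fraction of correct predictions falls below the threshold. The sketch below shows that bookkeeping on hypothetical data; it is not the benchmark code used for Phyre, and it assumes scores are distinct so that every rank corresponds to a score cutoff.

```python
def recall_at_precision(hits, total_true, min_precision=0.95):
    """Largest recall reached at any cutoff whose precision is >= min_precision.

    hits is a list of (score, is_correct) pairs; total_true is the number
    of correct relationships that exist in the benchmark.
    """
    best = 0.0
    true_positives = 0
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        true_positives += is_correct
        if true_positives / rank >= min_precision:
            best = max(best, true_positives / total_true)
    return best

# Hypothetical predictions: (score, whether the pair is truly homologous).
hits = [(9.0, True), (8.0, True), (7.0, False), (6.0, True)]
coverage = recall_at_precision(hits, total_true=4)
```

Filtering out a single high-scoring false positive can extend the cutoff much further down the list, which is consistent with the abstract's point that ensemble power comes from removing false positive matches.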

A thermodynamic model describing formation of α-helices by peptides and proteins in the absence of specific tertiary interactions has been developed. The model combines free energy terms defining α-helix stability in aqueous solution and terms describing immersion of every helix or fragment of coil into a micelle or a nonpolar droplet created by the rest of protein to calculate averaged or lowest energy partitioning of the peptide chain into helical and coil fragments. The α-helix energy in water was calculated with parameters derived from peptide substitution and protein engineering data and using estimates of nonpolar contact areas between side chains. The energy of nonspecific hydrophobic interactions was estimated considering each α-helix or fragment of coil as freely floating in the spherical micelle or droplet, and using water/cyclohexane (for micelles) or adjustable (for proteins) side-chain transfer energies. The model was verified for 96 and 36 peptides studied by 1H-NMR spectroscopy in aqueous solution and in the presence of micelles, respectively ([set 1] and [set 2]) and for 30 mostly α-helical globular proteins ([set 3]). For peptides, the experimental helix locations were identified from the published medium-range nuclear Overhauser effects detected by 1H-NMR spectroscopy. For sets 1, 2, and 3, respectively, 93, 100, and 97% of helices were identified with average errors in calculation of helix boundaries of 1.3, 2.0, and 4.1 residues per helix and an average percentage of correctly calculated helix-coil states of 93, 89, and 81%, respectively. Analysis of adjustable parameters of the model (the entropy and enthalpy of the helix-coil transition, the transfer energy of the helix backbone, and parameters of the bound coil), determined by minimization of the average helix boundary deviation for each set of peptides or proteins, demonstrates that, unlike micelles, the interior of the effective protein droplet has solubility characteristics different from that for cyclohexane, does not bind fragments of coil, and lacks interfacial area. © 1997 John Wiley & Sons, Inc. Biopoly 42: 239–269, 1997
