Similar literature
20 similar articles found; search time 0 ms
1.
To improve the prediction accuracy in the regime where template alignment quality is poor, an updated version of TASSER_2.0, namely TASSER_WT, was developed. TASSER_WT incorporates more accurate contact restraints from a new method, COMBCON. COMBCON uses confidence-weighted contacts from PROSPECTOR_3.5, the latest version, PROSPECTOR_4, and a new local structural fragment-based threading algorithm, STITCH, implemented in two variants depending on expected fragment prediction accuracy. TASSER_WT is tested on 622 Hard proteins, the most difficult targets (incorrect alignments and/or templates and incorrect side-chain contact restraints) in a comprehensive benchmark of 2591 nonhomologous, single domain proteins ≤200 residues that cover the PDB at 35% pairwise sequence identity. For 454 of 622 Hard targets, COMBCON provides contact restraints with higher accuracy and a larger number of contacts per residue. The higher the contact coverage with confidence weight ≥3 (Fwt≥3cov), the more the TASSER_WT models improve. When Fwt≥3cov > 1.0 and > 0.4, the average root mean-square deviation of TASSER_WT (TASSER_2.0) models is 4.11 Å (6.72 Å) and 5.03 Å (6.40 Å), respectively. Regarding a structure prediction as successful when a model has a TM-score to the native structure ≥0.4, when Fwt≥3cov > 1.0 and > 0.4, the success rate of TASSER_WT (TASSER_2.0) is 98.8% (76.2%) and 93.7% (81.1%), respectively.
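For reference, the TM-score used as the success criterion in this abstract is commonly defined as below. This is the standard definition from the general literature, given as background and not taken from the abstract itself. The symbols are assumptions of this note: $L_{\text{target}}$ is the target length, $L_{\text{ali}}$ the number of aligned residues, $d_i$ the distance between the $i$-th aligned residue pair after superposition, and the maximum is taken over superpositions.

```latex
\mathrm{TM\text{-}score} = \max\left[\frac{1}{L_{\text{target}}}\sum_{i=1}^{L_{\text{ali}}}\frac{1}{1+\left(d_i/d_0\right)^{2}}\right],
\qquad d_0 = 1.24\,\sqrt[3]{L_{\text{target}}-15}-1.8
```

Under this definition the score lies in (0, 1], so the ≥0.4 threshold quoted above is a length-normalized criterion, unlike a raw RMSD cutoff.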

2.
In a cell, it has been estimated that each protein on average interacts with roughly 10 others, resulting in tens of thousands of proteins known or suspected to have interaction partners; of these, only a tiny fraction have solved protein structures. To partially address this problem, we have developed M-TASSER, a hierarchical method to predict protein quaternary structure from sequence that involves template identification by multimeric threading, followed by multimer model assembly and refinement. The final models are selected by structure clustering. M-TASSER has been tested on a benchmark set comprising 241 dimers having templates with weak sequence similarity and 246 without multimeric templates in the dimer library. Of the total of 207 targets predicted to interact as dimers, 165 (80%) were correctly assigned as interacting with a true positive rate of 68% and a false positive rate of 17%. The initial best template structures have an average root mean-square deviation to native of 5.3, 6.7, and 7.4 Å for the monomer, interface, and dimer structures. The final model shows on average a root mean-square deviation improvement of 1.3, 1.3, and 1.5 Å over the initial template structure for the monomer, interface, and dimer structures, with refinement evident for 87% of the cases. Thus, we have developed a promising approach to predict full-length quaternary structure for proteins that have weak sequence similarity to proteins of solved quaternary structure.

3.
One of the main barriers to accurate computational protein structure prediction is searching the vast space of protein conformations. Distance restraints or inter‐residue contacts have been used to reduce this search space, easing the discovery of the correct folded state. It has been suggested that about 1 contact for every 12 residues may be sufficient to predict structure at fold level accuracy. Here, we use coarse‐grained structure‐based models in conjunction with molecular dynamics simulations to examine this empirical prediction. We generate sparse contact maps for 15 proteins of varying sequence lengths and topologies and find that given perfect secondary‐structural information, a small fraction of the native contact map (5%‐10%) suffices to fold proteins to their correct native states. We also find that different sparse maps are not equivalent and we make several observations about the type of maps that are successful at such structure prediction. Long range contacts are found to encode more information than shorter range ones, especially for α and αβ‐proteins. However, this distinction reduces for β‐proteins. Choosing contacts that are a consensus from successful maps gives predictive sparse maps as does choosing contacts that are well spread out over the protein structure. Additionally, the folding of proteins can also be used to choose predictive sparse maps. Overall, we conclude that structure‐based models can be used to understand the efficacy of structure‐prediction restraints and could, in future, be tuned to include specific force‐field interactions, secondary structure errors and noise in the sparse maps.
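The idea of drawing a sparse map at roughly one contact per 12 residues can be illustrated with a minimal sketch. It assumes a native contact list is already available as residue-index pairs; the function name `sparse_contact_map`, its parameters, and the uniform random selection are hypothetical and are not the authors' protocol.

```python
import random


def sparse_contact_map(native_contacts, n_residues,
                       residues_per_contact=12, min_separation=0, seed=0):
    """Randomly pick about n_residues / residues_per_contact native contacts.

    native_contacts: iterable of (i, j) residue-index pairs (assumed input).
    min_separation:  keep only contacts with |i - j| >= this value, e.g. to
                     restrict the map to long-range contacts.
    """
    rng = random.Random(seed)
    eligible = [(i, j) for i, j in native_contacts
                if abs(i - j) >= min_separation]
    # One contact per `residues_per_contact` residues, at least one contact,
    # and never more than the number of eligible contacts.
    k = min(len(eligible), max(1, n_residues // residues_per_contact))
    return sorted(rng.sample(eligible, k))
```

For a 60-residue chain this selects 5 contacts; raising `min_separation` is one simple way to probe the abstract's observation that long-range contacts carry more information.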

4.
Li W, Zhang Y, Skolnick J. Biophysical Journal, 2004, 87(2):1241-1248
The protein structure prediction algorithm TOUCHSTONEX, which uses sparse distance restraints derived from NMR nuclear Overhauser enhancement (NOE) data to predict protein structures at low-to-medium resolution, was evaluated as follows. First, a representative benchmark set of the Protein Data Bank library consisting of 1365 proteins up to 200 residues was employed. Using N/8 simulated long-range restraints, where N is the number of residues, 1023 (75%) proteins were folded to a Cα root-mean-square deviation (RMSD) from native <6.5 Å in one of the top five models. The average RMSD of the models for all 1365 proteins is 5.0 Å. Using N/4 simulated restraints, 1206 (88%) proteins were folded to an RMSD <6.5 Å and the average RMSD improved to 4.1 Å. Then, 69 proteins with experimental NMR data were used. Using long-range NOE-derived restraints, 47 proteins were folded to an RMSD <6.5 Å with N/8 restraints and 61 proteins were folded to an RMSD <6.5 Å with N/4 restraints. Thus, TOUCHSTONEX can be a tool for NMR-based rapid structure determination, and can also be used with other experimental methods that provide tertiary restraint information.

5.
One of the challenging problems in tertiary structure prediction of helical membrane proteins (HMPs) is the determination of rotation of α‐helices around the helix normal. Incorrect prediction of helix rotations substantially disrupts native residue–residue contacts while inducing only a relatively small effect on the overall fold. We previously developed a method for predicting residue contact numbers (CNs), which measure the local packing density of residues within the protein tertiary structure. In this study, we tested the idea of incorporating predicted CNs as restraints to guide the sampling of helix rotation. For a benchmark set of 15 HMPs with simple to rather complicated folds, the average contact recovery (CR) of best‐sampled models was improved for all targets, the likelihood of sampling models with CR greater than 20% was increased for 13 targets, and the average RMSD100 of best‐sampled models was improved for 12 targets. This study demonstrated that explicit incorporation of CNs as restraints improves the prediction of helix–helix packing. Proteins 2017; 85:1212–1221. © 2017 Wiley Periodicals, Inc.

6.
Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, drug development, and related fields. In this paper, three-dimensional structures of proteins are predicted based on the known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but sometimes becomes trapped in local minima. To address this weakness, Lévy flight and internal feedback information are introduced in ISFS. In the experiments, simulations are conducted with the ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results show that ISFS performs more efficiently and robustly in finding the global minimum and avoiding local minima.
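The Fibonacci test sequences mentioned in this abstract can be generated with a short sketch. The abstract does not state the recursion, so the convention below (S0 = "A", S1 = "B", S(i+1) = S(i-1) followed by S(i)), which is the one commonly used for AB-model benchmarks, is an assumption, as is the function name `fibonacci_sequence`.

```python
def fibonacci_sequence(n):
    """Return the n-th AB Fibonacci sequence.

    Assumed convention: S0 = "A", S1 = "B", S(i+1) = S(i-1) + S(i),
    where A and B stand for the two residue types of the AB model.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    prev, curr = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Concatenate the two previous sequences, older one first.
        prev, curr = curr, prev + curr
    return curr
```

Under this convention `fibonacci_sequence(6)` is the 13-residue chain `ABBABBABABBAB`, and the lengths of successive sequences follow the Fibonacci numbers.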

7.

Background

The analysis of correlation in alignments generates a matrix of predicted contacts between positions in the structure. While these can arise for many reasons, the simplest explanation is that the pair of residues are in contact in a three-dimensional structure and are affecting each other's selection pressure. To analyse these data, a dynamic programming algorithm was developed for parsing secondary structure interactions in predicted contact maps.

Results

The non-local nature of the constraints required an iterated approach (using a “frozen approximation”) but with good starting definitions, a single pass was usually sufficient. The method was shown to be effective when applied to the transmembrane class of protein and error tolerant even when the signal becomes degraded. In the globular class of protein, where the extent of interactions is more limited and more complex, the algorithm still behaved well, classifying most of the important interactions correctly in both a small and a large test case. For the larger protein, this involved examples of the algorithm apportioning parts of a single large secondary structure element between two different interactions.

Conclusions

It is expected that the method will be useful as a pre-processor to coarse-grained modelling methods to extend the range of protein tertiary structure prediction to larger proteins or to data that is currently too 'noisy' to be used by current residue-based methods.

8.
For many membrane proteins, the determination of their topology remains a challenge for methods like X‐ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Electron paramagnetic resonance (EPR) spectroscopy has evolved as an alternative technique to study structure and dynamics of membrane proteins. The present study demonstrates the feasibility of membrane protein topology determination using limited EPR distance and accessibility measurements. The BCL::MP‐Fold (BioChemical Library membrane protein fold) algorithm assembles secondary structure elements (SSEs) in the membrane using a Monte Carlo Metropolis (MCM) approach. Sampled models are evaluated using knowledge‐based potential functions and agreement with the EPR data. Twenty‐nine membrane proteins of up to 696 residues are used to test the algorithm. The RMSD100 value of the most accurate model is better than 8 Å for 27, better than 6 Å for 22, and better than 4 Å for 15 of the 29 proteins, demonstrating the algorithm's ability to sample the native topology. The average enrichment could be improved from 1.3 to 2.5, showing the improved discrimination power by using EPR data. Proteins 2015; 83:1947–1962. © 2015 Wiley Periodicals, Inc.
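The Monte Carlo Metropolis step named in this abstract can be illustrated with a generic acceptance test. This is a textbook sketch, not BCL::MP‐Fold's implementation: the function name `metropolis_accept`, the convention that lower scores are better, and the single `temperature` parameter are all assumptions.

```python
import math
import random


def metropolis_accept(score_old, score_new, temperature, rng=random):
    """Generic Metropolis criterion (lower score = better model, assumed).

    Improvements are always accepted; a worse model is accepted with
    probability exp(-(score_new - score_old) / temperature).
    """
    if score_new <= score_old:
        return True
    return rng.random() < math.exp(-(score_new - score_old) / temperature)
```

In a sampling loop, each proposed SSE move would be scored (for example by a knowledge‐based potential plus an EPR agreement term) and kept only if this test returns `True`.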

9.
J M Chandonia, M Karplus. Proteins, 1999, 35(3):293-306
A primary and a secondary neural network are applied to secondary structure and structural class prediction for a database of 681 non-homologous protein chains. A new method of decoding the outputs of the secondary structure prediction network is used to produce an estimate of the probability of finding each type of secondary structure at every position in the sequence. In addition to providing a reliable estimate of the accuracy of the predictions, this method gives a more accurate Q3 (74.6%) than the cutoff method which is commonly used. Use of these predictions in jury methods improves the Q3 to 74.8%, the best available at present. On a database of 126 proteins commonly used for comparison of prediction methods, the jury predictions are 76.6% accurate. An estimate of the overall Q3 for a given sequence is made by averaging the estimated accuracy of the prediction over all residues in the sequence. As an example, the analysis is applied to the target beta-cryptogein, which was a difficult target for ab initio predictions in the CASP2 study; it shows that the prediction made with the present method (62% of residues correct) is close to the expected accuracy (66%) for this protein. The larger database and use of a new network training protocol also improve structural class prediction accuracy to 86%, relative to 80% obtained previously. Secondary structure content is predicted with accuracy comparable to that obtained with spectroscopic methods, such as vibrational or electronic circular dichroism and Fourier transform infrared spectroscopy.
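The Q3 measure quoted throughout this abstract is the percentage of residues whose three-state secondary structure is predicted correctly. A minimal sketch follows, assuming predictions and observations are given as equal-length strings over the states H (helix), E (strand), and C (coil); the function name `q3` is hypothetical.

```python
def q3(predicted, observed):
    """Percentage of residues with the correct three-state assignment.

    Both arguments are equal-length strings over the assumed alphabet
    H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `q3("HHHEECCC", "HHHEEECC")` gives 87.5, since seven of the eight positions agree.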

10.
NMR residual dipolar couplings (RDCs), in the form of the projection angles between the respective internuclear bond vectors, are used as structural restraints in the ab initio structure prediction of a test set of six proteins. The restraints are applied using a recently developed SICHO (SIde-CHain-Only) lattice protein model that employs a replica exchange Monte Carlo (MC) algorithm to search conformational space. Using a small number of RDC restraints, the quality of the predicted structures is improved as reflected by lower RMSD/dRMSD (root mean square deviation/distance root mean square deviation) values from the corresponding native structures and by the higher correlation of the most cooperative mode of motion of each predicted structure with that of the native structure. The latter, in particular, has possible implications for the structure-based functional analysis of predicted structures.
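The dRMSD mentioned in this abstract compares intramolecular distances and therefore needs no superposition. A minimal sketch follows, assuming the common definition (root mean square difference over all residue pairs) and one 3D coordinate per residue; the abstract does not say which atoms or pairs the authors used, and the function name `drmsd` is hypothetical.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two structures with matching residue order.

    coords_a, coords_b: sequences of (x, y, z) tuples, one per residue.
    Uses all residue pairs i < j (assumed definition).
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally sized structures with >= 2 residues")
    pairs = list(combinations(range(len(coords_a)), 2))
    total = sum(
        (math.dist(coords_a[i], coords_a[j]) -
         math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(total / len(pairs))
```

Because only internal distances enter, translating or rotating either structure leaves the value unchanged.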

11.
SUMMARY: Porter is a new system for protein secondary structure prediction in three classes. Porter relies on bidirectional recurrent neural networks with shortcut connections, accurate coding of input profiles obtained from multiple sequence alignments, second stage filtering by recurrent neural networks, incorporation of long range information and large-scale ensembles of predictors. Porter's accuracy, tested by rigorous 5-fold cross-validation on a large set of proteins, exceeds 79%, significantly above a copy of the state-of-the-art SSpro server, better than any system published to date. AVAILABILITY: Porter is available as a public web server at http://distill.ucd.ie/porter/ CONTACT: gianluca.pollastri@ucd.ie.

12.
Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods intensely sample the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of [Formula: see text] proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All-atom decoys produced out of EdaFold's decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software/.

13.
We critically test and validate the CS‐Rosetta methodology for de novo structure prediction of α‐helical membrane proteins (MPs) from NMR data, such as chemical shifts and NOE distance restraints. By systematically reducing the number and types of NOE restraints, we focus on determining the regime in which MP structures can be reliably predicted and pinpoint the boundaries of the approach. Five MPs of known structure were used as test systems, phototaxis sensory rhodopsin II (pSRII), a subdomain of pSRII, disulfide binding protein B (DsbB), microsomal prostaglandin E2 synthase‐1 (mPGES‐1), and translocator protein (TSPO). For pSRII and DsbB, where NMR and X‐ray structures are available, resolution‐adapted structural recombination (RASREC) CS‐Rosetta yields structures that are as close to the X‐ray structure as the published NMR structures if all available NMR data are used to guide structure prediction. For mPGES‐1 and Bacillus cereus TSPO, where only X‐ray crystal structures are available, highly accurate structures are obtained using simulated NMR data. One main advantage of RASREC CS‐Rosetta is its robustness with respect to even a drastic reduction of the number of NOEs. Close‐to‐native structures were obtained with one randomly picked long‐range NOE for every 14, 31, 38, and 8 residues for full‐length pSRII, the pSRII subdomain, TSPO, and DsbB, respectively, in addition to using chemical shifts. For mPGES‐1, atomically accurate structures could be predicted even from chemical shifts alone. Our results show that atomic level accuracy for helical membrane proteins is achievable with CS‐Rosetta using very sparse NOE restraint sets to guide structure prediction. Proteins 2017; 85:812–826. © 2016 Wiley Periodicals, Inc.

14.
One major problem with the existing algorithms for the prediction of protein structural classes is low accuracy for proteins from the α/β and α+β classes. In this study, three novel features were rationally designed to model the differences between proteins from these two classes. In combination with other rationally designed features, an 11-dimensional vector prediction method was proposed. With this method, the overall prediction accuracy on the 25PDB dataset was 1.5% higher than that of the previous best-performing method, MODAS. Furthermore, the prediction accuracy for proteins from the α+β class on the 25PDB dataset was 5% higher than that of the previous best-performing method, SCPRED. The prediction accuracies obtained with the D675 and FC699 datasets were also improved.

15.
Although residue-residue contact maps dictate the topology of proteins, sequence-based ab initio contact predictions have found little use in actual structure prediction owing to their low accuracy. We developed a composite set of nine SVM-based contact predictors that are used in I-TASSER simulation in combination with sparse template contact restraints. When testing the strategy on 273 nonhomologous targets, remarkable improvements of I-TASSER models were observed for both easy and hard targets, with p values by Student's t test < 0.00001 and 0.001, respectively. In several cases, template modeling score increases by >30%, which essentially converts "nonfoldable" targets into "foldable" ones. In CASP9, I-TASSER employed ab initio contact predictions, and generated models for 26 FM targets with a GDT-score 16% and 44% higher than the second and third best servers from other groups, respectively. These findings demonstrate a new avenue to improve the accuracy of protein structure prediction, especially for free-modeling targets.

16.
The field of structural biology is becoming increasingly important as new technological developments facilitate the collection of data on the atomic structures of proteins and nucleic acids. The solid-state NMR method is a relatively new biophysical technique that holds particular promise for determining the structures of peptides and proteins that are located within the cell membrane. This method provides information on the orientation of the peptide planes relative to an external magnetic field. In this article, we discuss some of the mathematical methods and tools that are useful in deriving the atomic structure from these orientational data. We first discuss how the data are viewed as tensors, and how these tensors can be used to construct an initial atomic model, assuming ideal stereochemistry. We then discuss methods for refining the models using global optimization, with stereochemistry constraints treated as penalty functions. These two processes, initial model building followed by refinement, are the two crucial steps between data collection and the final atomic model.

17.
An algorithm has been developed to improve the success rate in the prediction of the secondary structure of proteins by taking into account the predicted class of the proteins. This method has been called the 'double prediction method' and consists of a first prediction of the secondary structure from a new algorithm which uses parameters of the type described by Chou and Fasman, and the prediction of the class of the proteins from their amino acid composition. These two independent predictions allow one to optimize the parameters calculated over the secondary structure database to provide the final prediction of secondary structure. This method has been tested on 59 proteins in the database (i.e. 10,322 residues) and yields 72% success in class prediction, 61.3% of residues correctly predicted for three states (helix, sheet and coil) and a good agreement between observed and predicted contents in secondary structure.

18.
Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing methods struggle with these goals because they rely on sequence-based multiple sequence alignments, which regularly misalign RNA structure and therefore do not support identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based on sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of alignments, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used ncRNA gene finder RNAz by a factor of 3 from a median deviation of 47 to 13 nt. Post-processing RNAz predictions, LocARNA-P's STAR score allows much stronger discrimination between true- and false-positive predictions than RNAz's own evaluation. The improved accuracy, in this scenario increased from AUC 0.71 to AUC 0.87, significantly reduces the cost of successive analysis steps. The ready-to-use software tool LocARNA-P produces structure-based multiple RNA alignments with associated columnwise STARs and predicts ncRNA boundaries. We provide additional results, a web server for LocARNA/LocARNA-P, and the software package, including documentation and a pipeline for refining screens for structural ncRNA, at http://www.bioinf.uni-freiburg.de/Supplements/LocARNA-P/.

19.
20.
Sun S, Zhao Y, Jiao Y, Yin Y, Cai L, Zhang Y, Lu H, Chen R, Bu D. FEBS Letters, 2006, 580(7):1891-1896
MOTIVATION: Predicting protein function accurately is an important issue in the post-genomic era. To achieve this goal, several approaches have been proposed to deduce the function of unclassified proteins through sequence similarity, co-expression profiles, and other information. Among these methods, the global optimization method (GOM) is an interesting and powerful tool that assigns functions to unclassified proteins based on their positions in a physical interactions network [Vazquez, A., Flammini, A., Maritan, A. and Vespignani, A. (2003) Global protein function prediction from protein-protein interaction networks, Nat. Biotechnol., 21, 697-700]. To boost both the accuracy and speed of GOM, a new prediction method, MFGO (modified and faster global optimization), is presented in this paper, which employs a local optimal repetition method to reduce calculation time and takes account of topological structure information to achieve a more accurate prediction. CONCLUSION: MFGO was tested on four protein interaction datasets (the Vazquez, YP, DIP-core, and SPK datasets) and compared with the popular MR (majority rule) and GOM methods. Experimental results confirm MFGO's improvement in both speed and accuracy. In particular, MFGO has a distinctive advantage in accurately predicting functions for proteins with few neighbors. Moreover, the robustness of the approach was validated both in a dataset containing a high percentage of unknown proteins and in a dataset disturbed through random insertion and deletion. The analysis shows that a moderate number of misplaced interactions does not preclude a reliable function assignment.
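The majority rule (MR) baseline against which MFGO is compared can be sketched in a few lines: an unclassified protein receives the function that occurs most often among its annotated interaction partners. The data layout (dictionaries keyed by protein name), the function name `majority_rule`, and the tie handling are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def majority_rule(network, known_functions, protein):
    """Assign the most frequent function among annotated neighbours.

    network:         dict mapping a protein to a list of interaction partners.
    known_functions: dict mapping annotated proteins to lists of functions.
    Returns None when no neighbour is annotated. Ties are broken by the
    order in which functions are first encountered (an arbitrary choice).
    """
    votes = Counter()
    for neighbor in network.get(protein, ()):
        votes.update(known_functions.get(neighbor, ()))
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

This also makes the abstract's point about proteins with few neighbors concrete: with one or two annotated partners the vote is easily dominated by a single, possibly misplaced, interaction.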


Copyright©北京勤云科技发展有限公司  京ICP备09084417号