Similar Articles
1.
NMR residual dipolar couplings (RDCs), expressed as projection angles between the respective internuclear bond vectors, are used as structural restraints in the ab initio structure prediction of a test set of six proteins. The restraints are applied using a recently developed SICHO (SIde-CHain-Only) lattice protein model that employs a replica exchange Monte Carlo (MC) algorithm to search conformational space. With only a small number of RDC restraints, the quality of the predicted structures improves, as reflected both by lower RMSD/dRMSD (root mean square deviation/distance root mean square deviation) values relative to the corresponding native structures and by a higher correlation between the most cooperative mode of motion of each predicted structure and that of the native structure. The latter result, in particular, has possible implications for the structure-based functional analysis of predicted structures.
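
The dRMSD quoted above compares all intramolecular distances of two conformations, so, unlike coordinate RMSD, it needs no superposition. The following is a minimal Python sketch, assuming one representative point per residue (for example Cα) and equal weights for all pairs; the function name `drmsd` is illustrative and not taken from the SICHO code.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two conformations of the same chain.

    coords_a, coords_b: equal-length lists of (x, y, z) points, assumed to be
    one representative point per residue. Only intramolecular distances are
    compared, so rigid translations and rotations do not change the result.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two conformations with the same number (>= 2) of points")
    sq_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        sq_sum += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(sq_sum / n_pairs)
```

Because every pair is visited, the cost grows as N², which is negligible for the small proteins discussed here.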

2.
TOUCHSTONEX, a new method for folding proteins that uses a small number of long-range contact restraints derived from NMR experimental NOE (nuclear Overhauser enhancement) data, is described. The method employs a new lattice-based, reduced protein model that explicitly represents Cα, Cβ, and the sidechain centers of mass. The force field consists of knowledge-based terms that produce protein-like behavior, including various short-range interactions, hydrogen bonding, and one-body, pairwise, and multibody long-range interactions. Contact restraints were incorporated into the force field as an NOE-specific pairwise potential. We evaluated the algorithm on a set of 125 proteins of various secondary structure types and lengths up to 174 residues. Using N/8 simulated long-range sidechain contact restraints, where N is the number of residues, 108 proteins were folded to a Cα root-mean-square deviation (RMSD) from native below 6.5 Å. The average RMSD of the lowest-RMSD structures for all 125 proteins (folded and unfolded) was 4.4 Å. The algorithm was also applied to limited experimental NOE data for three proteins. Using very few experimental sidechain contact restraints, plus a small number of sidechain/main-chain and main-chain/main-chain contact restraints, we folded all three proteins to low-to-medium resolution structures. The algorithm can be applied to NMR structure determination or to other experimental methods that provide tertiary restraint information, especially in the early stages of structure determination, when only limited data are available.
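
The N/8 restraint protocol can be mimicked by drawing simulated contacts from a known structure. This is a minimal sketch under stated assumptions: a contact is two side-chain centers of mass closer than a hypothetical 6.0 Å cutoff, "long-range" is taken to mean a sequence separation of at least 12, and contacts are sampled uniformly at random. None of these choices is claimed to match TOUCHSTONEX, and the function name is illustrative.

```python
import math
import random


def simulated_long_range_restraints(centers, cutoff=6.0, min_sep=12, seed=0):
    """Pick about N/8 long-range side-chain contacts from known coordinates.

    centers: list of (x, y, z) side-chain centers of mass, one per residue.
    cutoff (angstroms) and min_sep (sequence separation) are illustrative
    defaults, not values taken from the TOUCHSTONEX paper.
    """
    n = len(centers)
    contacts = [
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(centers[i], centers[j]) < cutoff
    ]
    k = min(n // 8, len(contacts))  # N/8 restraints, or fewer if not enough exist
    rng = random.Random(seed)       # seeded so the selection is reproducible
    return sorted(rng.sample(contacts, k))
```

With a fixed seed the same restraint set is drawn every time, which makes it easier to compare folding runs.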

3.
Predictions of protein secondary structure and of long-range amino acid contact maps from the primary sequence have been explored as aids to modelling protein tertiary structure. To evaluate how useful secondary structure and 3D residue-contact prediction methods are for modelling protein structures, we used known Q3 secondary structure information (α-helix, β-strand and irregular turns/loops), together with residue-residue contact information, as restraints for MODELLER. We present here the results of modelling 30 best-resolved single-domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of the residues in a sequence. The best models that we obtained for proteins of 37, 70, 118, 136 and 193 residues have RMSDs of 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. Better models can be obtained for proteins with a high α-helix content. This analysis further shows that MODELLER's restraint optimization is useful only when a truly homologous structure is available as a template, from which it derives numerous restraints almost identical to those of the template. It also clearly indicates that even when several true residue-residue contact distances (covering up to 30% of the sequence length) are satisfied together with fully known secondary structure, the resulting models remain far from their corresponding native structures.
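
The Q3 states named above are usually obtained by collapsing the eight DSSP classes. The sketch below uses one common convention (H, G, I to helix; E, B to strand; everything else to coil); other reductions exist, and the abstract does not say which one the authors used. The helper names are illustrative.

```python
# One common 8-to-3 state reduction of DSSP letters; other conventions exist.
DSSP_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",            # extended strands and isolated bridges -> strand
}


def to_q3(dssp_string):
    """Map an 8-state DSSP string to Q3 (H, E, C); any other symbol becomes C."""
    return "".join(DSSP_TO_Q3.get(s, "C") for s in dssp_string)


def q3_accuracy(predicted, observed):
    """Fraction of residues whose Q3 state is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

In the "100% accurate" setting of the study, the predicted and observed Q3 strings would be identical and `q3_accuracy` would return 1.0.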

5.
One of the main barriers to accurate computational protein structure prediction is searching the vast space of protein conformations. Distance restraints or inter-residue contacts have been used to reduce this search space, easing the discovery of the correct folded state. It has been suggested that about 1 contact for every 12 residues may be sufficient to predict structure at fold-level accuracy. Here, we use coarse-grained structure-based models together with molecular dynamics simulations to examine this empirical prediction. We generate sparse contact maps for 15 proteins of varying sequence lengths and topologies and find that, given perfect secondary-structural information, a small fraction of the native contact map (5%-10%) suffices to fold proteins to their correct native states. We also find that different sparse maps are not equivalent, and we make several observations about the types of maps that succeed at such structure prediction. Long-range contacts are found to encode more information than shorter-range ones, especially for α and αβ proteins; this distinction is weaker for β-proteins. Choosing contacts that form a consensus from successful maps yields predictive sparse maps, as does choosing contacts that are well spread out over the protein structure. The folding of proteins can also be used to choose predictive sparse maps. Overall, we conclude that structure-based models can be used to understand the efficacy of structure-prediction restraints and could, in the future, be tuned to include specific force-field interactions, secondary structure errors and noise in the sparse maps.
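
The short- versus long-range distinction, and the "one contact per 12 residues" density, can be made concrete with a small helper. As an assumption, it uses the CASP sequence-separation classes (short 6-11, medium 12-23, long 24 or more), which may differ from the definitions used in this study; the function names are illustrative.

```python
def contact_range(i, j):
    """Classify a contact by sequence separation (CASP-style classes, assumed):
    short 6-11, medium 12-23, long >= 24; closer pairs are labelled 'local'."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return "local"


def sparse_map_summary(contacts, n_residues):
    """Count contacts per range class and report residues per contact."""
    counts = {"local": 0, "short": 0, "medium": 0, "long": 0}
    for i, j in contacts:
        counts[contact_range(i, j)] += 1
    density = n_residues / len(contacts) if contacts else float("inf")
    return counts, density
```

A `density` of 12.0 corresponds to the "1 contact for every 12 residues" rule of thumb quoted above.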

6.
The problem of predicting protein tertiary structure from the primary sequence can be separated into two subproblems: generating a library of possible folds and selecting the best fold from that library. A distance geometry procedure based on random pairwise metrization, which has good sampling properties, was used to generate a library of 500 possible structures for each of 11 small helical proteins. The input to distance geometry consisted of restraint sets enforcing the predicted helical secondary structure and a generic range of 5 to 11 Å between predicted contact residues on all pairs of helices. For each of the 11 targets, the resulting library contained structures with low RMSD to the native structure. Near-native sampling was enhanced by at least three orders of magnitude compared with random sampling of compact folds. All library members were scored with a combination of an all-atom distance-dependent function, a residue pair potential, and a hydrophobicity function. In six of the 11 cases, the best-ranking fold was considered near native. Each library was also reduced to a final ab initio prediction via consensus distance geometry over the 50 best-ranking structures from the full set of 500. The consensus results were generally of higher quality, yielding six predictions within 6.5 Å of the native fold. These favorable predictions corresponded to the cases in which the correlation between RMSD and the scoring function was highest. The advantage of the reported methodology is its extreme simplicity and its potential for including other types of structural restraints.

7.
We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting Cα atoms, with attached Cβ atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of regularities in protein structures. The combination of these energy terms is optimized by maximizing the correlation between energy and root mean square deviation (RMSD) to native for 30 × 60,000 decoys, as well as the energy gap between the native structure and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulations. With this strategy we successfully fold 41/100 small proteins (36-120 residues), with predicted structures having an RMSD from native below 6.5 Å in the top five cluster centroids. To fold larger proteins and to improve the folding yield for small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR, with homologous proteins excluded from the database. With these threading-based restraints, the program folds 83/125 test proteins (36-174 residues) to structures with an RMSD to native below 6.5 Å in the top five cluster centroids. This shows that predicted tertiary restraints significantly improve folding, especially when the accuracy of side-chain contact prediction exceeds 20%. For native fold selection, we introduce quantities dependent on cluster density and on a combination of energy and free energy; these discriminate the native structure better than the previously used cluster energy or cluster size and can be used to identify native structures in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.

8.
Predicted protein residue–residue contacts can be used to build three-dimensional models and consequently to predict protein folds from scratch. Considerable effort is currently being spent on improving contact prediction accuracy, whereas few methods are available to construct protein tertiary structures from predicted contacts. Here, we present an ab initio protein folding method that builds three-dimensional models using predicted contacts and secondary structures. Our method first translates contacts and secondary structures into distance, dihedral angle, and hydrogen bond restraints according to a set of new conversion rules, and then provides these restraints as input to a distance geometry algorithm to build tertiary structure models. The initially reconstructed models are used to regenerate a set of physically realistic contact restraints and to detect secondary structure patterns, which are then used to reconstruct the final structural models. This unique two-stage modeling approach of integrating contacts and secondary structures improves the quality and accuracy of structural models and, in particular, generates better β-sheets than other algorithms. We validate our method on two standard benchmark datasets using true contacts and secondary structures. Our method improves the TM-score of reconstructed protein models by 45% and 42% over the existing method on the two datasets, respectively. On the dataset for benchmarking reconstruction methods with predicted contacts and secondary structures, the average TM-score of the best models reconstructed by our method is 0.59, 5.5% higher than that of the existing method. The CONFOLD web server is available at http://protein.rnet.missouri.edu/confold/ . Proteins 2015; 83:1436–1449. © 2015 Wiley Periodicals, Inc.

9.
A significant number of protein sequences in a given proteome have no obvious evolutionarily related protein in the database of solved protein structures, the PDB. Under these conditions, ab initio or template-free modeling methods are the only means of predicting protein structure. To assess its expected performance on proteomes, the TASSER structure prediction algorithm is benchmarked in the ab initio limit on a representative set of 1129 nonhomologous sequences of 40 to 200 residues that cover the PDB at 30% sequence identity and adopt α, α + β, and β secondary structures. For sequences in the 40-100 (100-200) residue range, as assessed by root mean square deviation from native (RMSD), the best of the top five ranked TASSER models has a global fold significantly close to the native structure for 25% (16%) of the sequences, and correctly identifies the structure of the protein core for 59% (36%). In the absence of a native structure, the structural similarity among the top five ranked models is a moderately reliable predictor of folding accuracy. Classifying the sequences by secondary structure content, 64% (36%) of α, 43% (24%) of α + β, and 20% (12%) of β sequences in the 40-100 (100-200) residue range have a significant TM-score (TM-score ≥ 0.4). TASSER performs best on helical proteins because a helical protein has fewer secondary structural elements to arrange than a β protein of equal length, since helices are on average longer than strands. In addition, helical proteins have shorter loops and dangling tails. If these flexible fragments are excluded, TASSER has similar accuracy for sequences containing the same number of secondary structural elements, irrespective of whether they are helices and/or strands. Thus, it is the effective configurational entropy of the protein that dictates the average likelihood of correctly arranging the secondary structure elements.
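
The TM-score threshold of 0.4 used above refers to a length-normalized similarity score. Below is a minimal sketch of the scoring function for one fixed superposition, using the published length-dependent scale d0 = 1.24·(L − 15)^(1/3) − 1.8 with a 0.5 Å floor (L is the native length). The real TM-score program additionally searches over superpositions to maximize the score, which this sketch omits; the function name is illustrative.

```python
def tm_score_fixed(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs after the
    model has been superposed onto the native structure. Unaligned native
    residues simply contribute nothing. No superposition search is done here.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    else:
        d0 = 0.5  # floor for very short chains, where the formula breaks down
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A perfect model scores 1.0; for a 100-residue target, d0 is roughly 3.65 Å, so pairs 10 Å apart contribute only about 0.12 each.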

10.
Predicting protein tertiary structure solely from the residue sequence (the so-called Protein Folding Problem) is one of the most challenging problems in Structural Bioinformatics. We focus on the protein residue contact map: once this map is given, the 3D structure of the protein backbone can be reconstructed. The general problem of recovering a set of 3D coordinates consistent with a given contact map is known as a unit-disk-graph realization problem, and it has recently been proven to be NP-hard. In this paper we describe a heuristic method (COMAR) that reconstructs, at an unprecedented speed (3-15 seconds), a 3D model that exactly matches the target contact map of a protein. Working with a non-redundant set of 1760 proteins, we find that the efficiency of finding a 3D model very close to the protein's native structure depends on the threshold value used to compute the residue contact map. Contact maps with threshold values from 10 to 18 Å allow reconstruction of 3D models that are very similar to the proteins' native structures.
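
A contact map at a given threshold, and the "exactly matches" test that COMAR aims to pass, can be written down directly. This sketch assumes one point per residue (for example Cα) and counts a pair as in contact when its distance is at most the threshold; whether COMAR uses a strict or non-strict comparison, and which atoms, is an assumption here. It only checks a candidate model and does not attempt the NP-hard reconstruction.

```python
import math


def contact_map(coords, threshold):
    """Set of residue pairs (i < j) whose points lie within `threshold` angstroms.

    This is the unit-disk-graph view: residues are vertices, and an edge
    joins every pair at most `threshold` apart.
    """
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(coords[i], coords[j]) <= threshold
    }


def matches_exactly(model_coords, target_map, threshold):
    """True if the model reproduces the target map: no missing, no extra contacts."""
    return contact_map(model_coords, threshold) == target_map
```

Raising the threshold (for example from 8 Å toward the 10-18 Å range reported above) adds more edges, so a model that matches the map is more tightly constrained.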

11.
A new, efficient method for assembling protein tertiary structure from known, loosely encoded secondary structure restraints and sparse information about exact side chain contacts is proposed and evaluated. The method is based on a new, very simple approach to the reduced modeling of protein structure and dynamics, in which the protein is described as a lattice chain connecting side chain centers of mass rather than Cαs. The model has implicit built-in multibody correlations that simulate short- and long-range packing preferences, hydrogen bonding cooperativity and a mean force potential describing hydrophobic interactions. Owing to the simplicity of the protein representation and of the model force field, the Monte Carlo algorithm is at least an order of magnitude faster than previously published Monte Carlo algorithms for structure assembly. In contrast to existing algorithms, the new method requires fewer tertiary restraints for successful fold assembly: on average, one for every seven residues, compared with one for every four residues. For example, for smaller proteins such as the B domain of protein G, the resulting structures have a coordinate root mean square deviation (cRMSD) of about 3 Å from the experimental structure; for myoglobin, structures with a backbone cRMSD of 4.3 Å are produced; and for a 247-residue TIM barrel, the cRMSD of the resulting folds is about 6 Å. As would be expected, increasing the number of tertiary restraints improves the accuracy of the assembled structures. The reliability and robustness of the new method should enable its routine application in model-building protocols based on various (very sparse) experimentally derived structural restraints. Proteins 32:475–494, 1998. © 1998 Wiley-Liss, Inc.

12.
The 3D solution structure of the polypeptide backbone of Neurospora crassa Cu₆-metallothionein (NcMT) was determined using homonuclear, multidimensional ¹H-NMR spectroscopy. It represents a new metallothionein (MT) fold, in which the N-terminal half of the protein chain is folded left-handedly and the C-terminal half right-handedly around a copper(I)-sulfur cluster. As with other MTs, the protein lacks definable secondary structural elements; however, the polypeptide fold is unique, and it is defined by the metal coordination and the cysteine spacing. NcMT is only the second MT in the copper-bound form to be structurally characterized and the first containing the -CxCxxxxxCxC- motif, which is found in a variety of mammalian MTs and metalloregulatory proteins. In vitro formation of Cu₆NcMT identical to the native Cu₆NcMT depended on prior formation of Zn₃NcMT and its titration with Cu(I). The enhanced sensitivity and resolution of the 800 MHz ¹H-NMR spectral data permitted determination of the 3D backbone structure without substituting NMR-active spin-1/2 metals such as ¹¹³Cd and ¹⁰⁹Ag. Such substitutions have been necessary to establish specific metal-to-cysteine restraints in 3D structural studies of this protein family when lower-field, less sensitive ¹H-NMR spectral data are used. The accuracy of a structure calculated without these restraints is, however, supported by the similarity between the 800 MHz structure of the α-domain of mouse MT1 and the same structure recalculated without metal-cysteine connectivities.

13.
Molecular Membrane Biology 2013, 30(5-8):156-178
Abstract

Solid-state NMR is unique in its ability to obtain three-dimensional structures and to measure atomic-resolution structural and dynamic information for membrane proteins in native lipid bilayers. A growing number of increasingly complex integral membrane protein structures have been determined by solid-state NMR using two main methods. Oriented-sample solid-state NMR uses macroscopically aligned lipid bilayers to obtain orientational restraints that define the secondary structure and global fold of embedded peptides and proteins, as well as their orientation and topology in lipid bilayers. Magic angle spinning (MAS) solid-state NMR uses unoriented, rapidly spinning samples to obtain distance and torsion angle restraints that define tertiary structure and helix packing arrangements. Details of all current protein structures are described, highlighting developments in experimental strategy and other technological advances. Some structures were obtained by combining solid- and solution-state NMR information, and some used solid-state NMR to refine X-ray crystal structures. Solid-state NMR has also validated the structures of proteins determined in different membrane mimetics by solution-state NMR and X-ray crystallography, and is therefore complementary to other structural biology techniques. With continued efforts in identifying membrane protein targets and in developing expression, isotope labelling and sample preparation strategies, probe technology, NMR experiments, calculation and modelling methods, and combination with other techniques, it should be feasible to determine the structures of many more membrane proteins of biological and biomedical importance using solid-state NMR. This will provide three-dimensional structures and atomic-resolution structural information for characterising ligand and drug interactions, dynamics and molecular mechanisms of membrane proteins under physiological lipid bilayer conditions.

14.
Alexandrescu AT. Proteins 2004, 56(1):117-129
Introductory biochemistry texts often note that the fold of a protein is completely defined when the dihedral angles φ and ψ are known for each amino acid. This assertion was examined with torsion angle dynamics and simulated annealing (TAD/SA) calculations of protein G using only dihedral angle restraints. When all dihedral angles were restrained to within 1° of the values in the X-ray structure, the TAD/SA structures had a backbone root mean square deviation of 4 Å from the target. Factors contributing to divergence from the correct solution include deviations of peptide bonds from planarity, internal conflicts arising from the nonuniform energies of different φ, ψ combinations, and relaxation to extended conformations in the absence of long-range constraints. Simulations including hydrogen-bond restraints showed that even a few long-range contacts constrain the fold better than a complete set of accurate dihedral restraints. A procedure is described for TAD/SA calculations using hydrogen-bond restraints, idealized dihedral restraints for residues in regular secondary structures, and "hydrophobic distance restraints" derived from the positions of hydrophobic residues in the amino acid sequence. The hydrogen-bond restraints are treated as inviolable, whereas violated hydrophobic restraints are removed after the restraint upper bounds are reduced from 2 to 1 times the predicted radius of gyration. The strategy was tested with simulated restraints from X-ray structures of proteins in different fold classes and with NMR data for cold shock protein A that included only backbone chemical shifts and hydrogen bonds obtained from a long-range HNCO experiment.

15.
The ability to determine the structure of a protein in solution is a critical tool for structural biology, as proteins in their native state are found in aqueous environments. Using a physical-chemistry-based prediction protocol, we demonstrate the ability to reproduce protein loop geometries in experimentally derived solution structures. Predictions were run on loops drawn from (1) NMR entries in the Protein Data Bank (PDB) and (2) the RECOORD database, in which NMR entries from the PDB have been standardized and re-refined in explicit solvent. The predicted structures are validated by comparison with experimental distance restraints, by structural quality as defined by the WHAT IF structure validation program, by the root mean square deviation (RMSD) of the predicted loops from the original structural models, and by comparing the precision of the original and predicted ensembles. For the RECOORD ensembles, the predicted loops are consistent with an average of 95%, 91%, and 87% of experimental restraints for the short, medium and long loops, respectively. Prediction accuracy is strongly affected by the quality of the original models: in the PDB-derived ensembles, the percentage of violated experimental restraints increases by 2% for the short loops and by 9% for both the medium and long loops. We anticipate applying our protocol to theoretical modeling of protein structures, such as fold recognition methods, as well as to experimental determination of protein structures, or segments, for which only sparse NMR restraint data are available.
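
Consistency with experimental restraints, as reported above, can be expressed as the percentage of upper-bound distance restraints that a model satisfies, averaged over an ensemble. In this sketch, restraints are (i, j, upper bound) triples on single points per residue, and the 0.5 Å tolerance is a hypothetical choice; real NOE restraints often involve atom groups and pseudo-atom corrections, which are ignored here. The function names are illustrative.

```python
import math


def restraint_consistency(coords, restraints, tolerance=0.5):
    """Percentage of upper-bound distance restraints satisfied by one model.

    coords: list of (x, y, z) points, one per residue (an assumption).
    restraints: list of (i, j, upper_bound) with bounds in angstroms.
    tolerance: slack (angstroms) before a restraint counts as violated.
    """
    if not restraints:
        return 100.0
    satisfied = sum(
        math.dist(coords[i], coords[j]) <= upper + tolerance
        for i, j, upper in restraints
    )
    return 100.0 * satisfied / len(restraints)


def ensemble_consistency(models, restraints, tolerance=0.5):
    """Average restraint consistency over an ensemble of models."""
    if not models:
        raise ValueError("ensemble is empty")
    return sum(restraint_consistency(m, restraints, tolerance) for m in models) / len(models)
```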

17.
Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Because the structures of large RNAs are difficult to determine experimentally, there is currently great demand for new high-resolution structure prediction methods. We present a novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is based on machine translation. The translation engine operates on the RNA FRABASE database, tailored to serve as a dictionary relating RNA secondary structure and tertiary structure elements. The translation algorithm is very fast: an initial 3D structure is composed within seconds on a single processor. The method ensures high-quality predictions of large RNA 3D structures. Our approach needs neither structural templates nor the RNA sequence alignments required by comparative methods. This makes it possible to build structures of native RNAs that have not yet been resolved, as well as of artificial RNAs. The method is implemented in the publicly available, user-friendly server RNAComposer, which works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues.
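
The user-defined secondary structure mentioned above is commonly written in dot-bracket notation, where matching brackets mark base pairs and dots mark unpaired nucleotides. The sketch below, under that assumption, converts such a string into a list of base pairs; treating `[]`, `{}` and `<>` as extra bracket types for pseudoknots is a common extension and is assumed here rather than taken from the RNAComposer documentation. The function name is illustrative.

```python
def parse_dot_bracket(structure):
    """Return base pairs (i, j), 0-based, from a dot-bracket string.

    '()' are ordinary pairs; '[]', '{}' and '<>' are treated as additional
    bracket types (for pseudoknots); '.' is an unpaired nucleotide.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)
```

For example, `parse_dot_bracket("((..))")` returns `[(0, 5), (1, 4)]`, and a pseudoknotted string such as `"((..[[))..]]"` yields pairs from both bracket types.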

18.
Chen J, Brooks CL. Proteins 2007, 67(4):922-930
Recent advances in the efficient and accurate treatment of solvent with the generalized Born approximation (GB) have made it possible to substantially refine protein structures generated by various prediction tools through detailed molecular dynamics simulations. As demonstrated in a recent CASPR experiment, improvement can be achieved quite reliably when the initial models are sufficiently close to the native basin (e.g., 3-4 Å Cα RMSD). A key element of effective refinement is incorporating reliable structural information into the simulation protocol. Without intimate knowledge of the target and of the prediction protocol used to generate the initial models, it can be assumed that the regular secondary structure elements (helices and strands) and the overall fold topology are largely correct to start with, so that the protocol limits itself to refinement and focuses sampling in the vicinity of the initial structure. The secondary structures can be enforced by dihedral restraints and the topology through structural contacts, implemented either as multiple pairwise Cα distance restraints or as a single sidechain distance matrix restraint. The restraints are weakly imposed with flat-bottom potentials to allow sufficient flexibility for structural rearrangement. Refinement is further facilitated by advanced enhanced-sampling techniques such as the replica exchange method (REX). In general, for single-domain proteins of small to medium size, 3-5 nanoseconds of REX/GB refinement simulation appears sufficient for reasonable convergence. Clustering the resulting structural ensembles can yield refined models more than 1.0 Å closer to the native structure in Cα RMSD. Substantial improvement of sidechain contacts and rotamer states can also be achieved in most cases. Additional improvement is possible with longer sampling and with knowledge of the robust structural features of the initial models for a given prediction protocol. Nevertheless, limitations remain in sampling as well as in force field accuracy, manifested as difficulty in refining long, flexible loops.
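
The flat-bottom restraints and replica exchange swaps described above reduce to two short formulas. In this minimal sketch the harmonic walls and the force constant `k` are assumptions (the abstract specifies only "flat-bottom"), and the swap test is the standard Metropolis criterion min(1, exp[(β_i − β_j)(E_i − E_j)]) for two replicas at inverse temperatures β_i and β_j. Function names are illustrative, not taken from the authors' code.

```python
import math
import random


def flat_bottom(d, lower, upper, k=1.0):
    """Flat-bottom restraint energy: zero for lower <= d <= upper and
    harmonic outside. The harmonic walls and k are illustrative choices."""
    if d < lower:
        return k * (lower - d) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0


def accept_swap(beta_i, energy_i, beta_j, energy_j, rng=random):
    """Metropolis test for exchanging configurations between two replicas.

    Accepts with probability min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return delta >= 0 or rng.random() < math.exp(delta)
```

Inside the flat region the restraint exerts no force, which is what leaves room for the structural rearrangements the abstract describes; the swap test always accepts when the colder replica holds the higher-energy configuration.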

19.
A rule-based automated method is presented for modeling the structures of the seven transmembrane helices of G-protein-coupled receptors. The structures are generated using a simulated annealing Monte Carlo procedure that positions and orients rigid helices to satisfy structural restraints. The restraints are derived from analysis of experimental information from biophysical studies on native and mutant proteins, from analysis of the sequences of related proteins, and from theoretical considerations of protein structure. Calculations are presented for two systems. The method was validated through calculations using appropriate experimental information for bacteriorhodopsin, which produced a model structure with a root mean square (rms) deviation of 1.87 Å from the structure determined by electron microscopy. Calculations are also presented using experimental and theoretical information available for bovine rhodopsin to assign the helices to a projection density map and to produce a model of bovine rhodopsin that can be used as a template for modeling other G-protein-coupled receptors.

20.
There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting accurate structural models with the computing power accessible to ordinary users. Toward more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from the Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse-grained model generation and evaluation at the Cα or backbone level. In this process, we construct models by multidimensional scaling using interresidue spatial restraints derived from alignments, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full-atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD using a benchmark of 200 proteins from the Astral database, in which no template with >25% sequence identity to any target protein is included. The average root-mean-square deviation of the best models from the native structures is 4.28 Å, a significant and systematic improvement over our previous methods. MUFOLD's computing time is much shorter than that of many other tools, such as Rosetta. MUFOLD demonstrated some success in CASP8, the 2008 community-wide experiment for protein structure prediction. Proteins 2010. © 2009 Wiley-Liss, Inc.
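
Multidimensional scaling turns a matrix of inter-residue distances into coordinates. The sketch below is classical (Torgerson) MDS on a complete, consistent distance matrix, assuming NumPy is available; MUFOLD's actual variant, its weighting, and its handling of missing or noisy restraints are not described in the abstract and are not reproduced here. The function name is illustrative.

```python
import numpy as np


def classical_mds(dist, dim=3):
    """Embed points in `dim` dimensions from a full pairwise distance matrix.

    Classical (Torgerson) MDS: double-center the squared distances to obtain a
    Gram matrix, then keep its largest eigenpairs. Coordinates are recovered
    only up to rotation, translation and reflection.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]         # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # clip tiny negatives from noise
    return eigvecs[:, order] * scale
```

Because the result is defined only up to a rigid motion and reflection, models built this way are normally compared with superposition-based RMSD or with distance-based measures such as dRMSD.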


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Science and Technology Development Co., Ltd.), Beijing ICP registration No. 09084417