Similar articles
1.
Predicted protein residue–residue contacts can be used to build three‐dimensional models and consequently to predict protein folds from scratch. A considerable amount of effort is currently being spent to improve contact prediction accuracy, whereas few methods are available to construct protein tertiary structures from predicted contacts. Here, we present an ab initio protein folding method to build three‐dimensional models using predicted contacts and secondary structures. Our method first translates contacts and secondary structures into distance, dihedral angle, and hydrogen bond restraints according to a set of new conversion rules, and then provides these restraints as input for a distance geometry algorithm to build tertiary structure models. The initially reconstructed models are used to regenerate a set of physically realistic contact restraints and detect secondary structure patterns, which are then used to reconstruct final structural models. This unique two‐stage modeling approach of integrating contacts and secondary structures improves the quality and accuracy of structural models and in particular generates better β‐sheets than other algorithms. We validate our method on two standard benchmark datasets using true contacts and secondary structures. Our method improves the TM‐score of reconstructed protein models by 45% and 42% over the existing method on the two datasets, respectively. On the dataset for benchmarking reconstruction methods with predicted contacts and secondary structures, the average TM‐score of the best models reconstructed by our method is 0.59, 5.5% higher than the existing method. The CONFOLD web server is available at http://protein.rnet.missouri.edu/confold/ . Proteins 2015; 83:1436–1449. © 2015 Wiley Periodicals, Inc.
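
The TM‐score figures quoted above can be made concrete with a short sketch. It evaluates the standard TM‐score sum for one fixed superposition, given the distances between aligned residue pairs; the full TM‐score takes the maximum of this value over superpositions, which is not done here. The function names and the 0.5 Å floor on d0 for very short chains are assumptions of this illustration, not details taken from the abstract.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) used by the TM-score."""
    if target_length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_for_superposition(distances, target_length):
    """TM-score sum for one given superposition.

    distances: distances (Å) between aligned residue pairs of model and native.
    target_length: number of residues in the native (target) structure.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect model (all distances zero) scores 1.0; a model whose aligned
# residues all sit exactly d0 away from their native positions scores 0.5.
perfect = tm_score_for_superposition([0.0] * 100, 100)
halfway = tm_score_for_superposition([tm_d0(100)] * 100, 100)
```

Because the sum is divided by the target length rather than by the number of aligned pairs, unaligned residues lower the score.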

2.
We describe a method that can thoroughly sample a protein conformational space given the protein primary sequence of amino acids and secondary structure predictions. Specifically, we target proteins with β‐sheets, which are particularly challenging for ab initio protein structure prediction owing to the complexity of sampling long‐range strand pairings. Using some basic packing principles, inverse kinematics (IK), and β‐pairing scores, this method creates all possible β‐sheet arrangements, including those that have the correct packing of β‐strands. It uses the IK algorithms of ProteinShop to move α‐helices and β‐strands as rigid bodies by rotating the dihedral angles in the coil regions. Our results show that our approach produces structures that are within 4–6 Å RMSD of the native one regardless of the protein size and β‐sheet topology, although this number may increase if the protein has long loops or complex α‐helical regions. Proteins 2010. © Published 2009 Wiley‐Liss, Inc.

3.
Ab initio protein structure prediction methods have improved dramatically in the past several years. Because these methods require only the sequence of the protein of interest, they are potentially applicable to the open reading frames in the many organisms whose sequences have been and will be determined. Ab initio methods cannot currently produce models of high enough resolution for use in rational drug design, but there is an exciting potential for using the methods for functional annotation of protein sequences on a genomic scale. Here we illustrate how functional insights can be obtained from low-resolution predicted structures using examples from blind ab initio structure predictions from the third and fourth critical assessment of structure prediction (CASP3, CASP4) experiments.

4.
The folding process defines three‐dimensional protein structures from their amino acid chains. A protein's structure determines its activity and properties; thus knowing such conformation on an atomic level is essential for both basic and applied studies of protein function and dynamics. However, the acquisition of such structures by experimental methods is slow and expensive, and current computational methods mostly depend on previously known structures to determine new ones. Here we present a new software package called GSAFold that applies the generalized simulated annealing (GSA) algorithm to ab initio protein structure prediction. The GSA is a stochastic search algorithm employed in energy minimization and used in global optimization problems, especially those that depend on long‐range interactions, such as gravity models and conformation optimization of small molecules. This new implementation applies, for the first time in ab initio protein structure prediction, an analytical inverse for the Visitation function of GSA. It also employs the broadly used NAMD Molecular Dynamics package to carry out energy calculations, allowing the user to select different force fields and parameterizations. Moreover, the software allows several simulations to be executed simultaneously. Applications that depend on protein structures include rational drug design and structure‐based protein function prediction. Applied to a test peptide, GSAFold predicted the structure of mastoparan‐X to a root mean square deviation of 3.00 Å. Proteins 2012; © 2012 Wiley Periodicals, Inc.
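
To illustrate the kind of stochastic energy minimization described above, here is a minimal sketch of classical simulated annealing with Metropolis acceptance and Gaussian trial moves. It is deliberately not GSA: it does not implement GSA's visiting distribution or the analytical inverse mentioned in the abstract, and it does not call NAMD. The function name, the cooling schedule, the toy one-dimensional energy, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(energy, x0, steps=5000, t_start=1.0, t_end=1e-3,
                        step_size=0.5, seed=0):
    """Minimize a 1-D energy function with classical simulated annealing."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / max(1, steps - 1))
        candidate = x + rng.gauss(0.0, step_size)
        e_new = energy(candidate)
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    """Hypothetical stand-in for a force-field energy: a single well at 0."""
    return x * x


best_x, best_e = simulated_annealing(toy_energy, 5.0)
```

In a real folding application the state would be a set of dihedral angles and the energy would come from a molecular mechanics package; only the accept/reject loop carries over from this sketch.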

5.

Background  

Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio.

6.
While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here.

7.
Kai Zhu, Tyler Day. Proteins 2013, 81(6):1081–1089
Antibodies have the capability of binding a wide range of antigens due to the diversity of the six loops constituting the complementarity determining region (CDR). Among the six loops, the H3 loop is the most diverse in structure, length, and sequence identity. Prediction of the three‐dimensional structures of antibodies, especially the CDR loops, is an important step in the computational design and engineering of novel antibodies for improved affinity and specificity. Although it has been demonstrated that the conformation of the five non‐H3 loops can be accurately predicted by comparing their sequences against databases of canonical loop conformations, no such connection has been established for H3 loops. In this work, we present the results for ab initio structure prediction of the H3 loop using conformational sampling and energy calculations with the program Prime on a dataset of 53 loops ranging in length from 4 to 22 residues. When the prediction is performed in the crystal environment with symmetry mates included, the median backbone root mean square deviation (RMSD) is 0.5 Å to the crystal structure, with 91% of cases having an RMSD of less than 2.0 Å. When the prediction is performed in a noncrystallographic environment, where the scaffold is constructed by swapping the H3 loops between homologous antibodies, 70% of cases have an RMSD below 2.0 Å. These results show promise for ab initio loop predictions applied to modeling of antibodies. © 2012 Wiley Periodicals, Inc.
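
The RMSD statistics reported above can be sketched as follows. The example computes the backbone RMSD between a predicted and a reference loop whose coordinates are assumed to be already in the same frame (for loop prediction, typically after superposing the surrounding scaffold), and the fraction of cases below a cutoff such as 2.0 Å. The function names and the toy coordinates are assumptions of this illustration; no optimal superposition step is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (Å) between two equal-length lists of
    (x, y, z) atom positions that are already in a common frame."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def fraction_below(values, cutoff):
    """Fraction of values strictly below the cutoff."""
    return sum(1 for v in values if v < cutoff) / len(values)


# Toy data: every atom of the prediction is displaced by 2 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
loop_rmsd = rmsd(predicted, reference)

# Hypothetical per-loop RMSD values, summarized against a 2.0 Å cutoff.
success_rate = fraction_below([0.5, 1.9, 2.5, 3.0], 2.0)
```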

8.
Dihedral angles of amino acids are of considerable importance in protein tertiary structure prediction as they define the backbone of a protein and hence almost define the protein's entire conformation. Most ab initio protein structure prediction methods predict the secondary structure of a protein before predicting the tertiary structure because a three-dimensional fold consists of repeating units of secondary structure. Hence, both dihedral angles and secondary structures are important in tertiary structure prediction of proteins. Here we describe a database called DASSD (Dihedral Angle and Secondary Structure Database of Short Amino acid Fragments) that contains dihedral angle values and secondary structure details of short amino acid fragments of lengths 1, 3 and 5. Information stored in this database was extracted from a set of 5,227 non-redundant high-resolution (better than 2 Å) protein structures. In total, DASSD stores details for about 733,000 fragments. This database finds application in the development of ab initio protein structure prediction methods using fragment libraries and fragment assembly techniques. It is also useful in protein secondary structure prediction.



9.
Abstract

A set of software tools designed to study protein structure and kinetics has been developed. The core of these tools is a program called Folding Machine (FM), which is able to generate low-resolution folding pathways using modest computational resources. The FM is based on a coarse-grained kinetic ab initio Monte-Carlo sampler that can optionally use information extracted from secondary structure prediction servers or from fragment libraries of local structure. The model underpinning this algorithm contains two novel elements: (a) the conformational space is discretized using the Ramachandran basins defined in the local φ-ψ energy maps; and (b) the solvent is treated implicitly by rescaling the pairwise terms of the non-bonded energy function according to the local solvent environments. The purpose of this hybrid ab initio/knowledge-based approach is threefold: to cover the long time scales of folding, to generate useful 3-dimensional models of protein structures, and to gain insight into protein folding kinetics. Even though the algorithm is not yet fully developed, it has been used in a recent blind test of protein structure prediction (CASP5). The FM generated models within 6 Å backbone rmsd for fragments of about 60–70 residues of α-helical proteins. For a CASP5 target that turned out to be natively unfolded, the trajectory obtained for this sequence uniquely failed to converge. Also, a new measure to evaluate structure predictions is presented and used alongside the standard CASP assessment methods. Finally, recent improvements in the prediction of β-sheet structures are briefly described.

10.

Background  

The problem of protein structure prediction consists of predicting the functional or native structure of a protein given its linear sequence of amino acids. This problem has played a prominent role in the fields of biomolecular physics and algorithm design for over 50 years. Additionally, its importance increases continually as a result of an exponential growth over time in the number of known protein sequences in contrast to a linear increase in the number of determined structures. Our work focuses on the problem of searching an exponentially large space of possible conformations as efficiently as possible, with the goal of finding a global optimum with respect to a given energy function. This problem plays an important role in the analysis of systems with complex search landscapes, and particularly in the context of ab initio protein structure prediction.

11.
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re‐evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity as quantified by environment‐specific substitution scores can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these; it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole structure quality did not affect the quality of loop predictions. Proteins 2010. © 2009 Wiley‐Liss, Inc.

12.
It has been known since the time when relatively few structures had been solved that longer protein chains often contain multiple domains, which may fold separately and play the role of reusable functional modules found in many contexts. In many structural biology tasks, in particular structure prediction, it is of great use to be able to identify domains within the structure and analyze these regions separately. However, when using sequence data alone this task has proven exceptionally difficult, with relatively little improvement over the naive method of choosing boundaries based on size distributions of observed domains. The recent significant improvement in contact prediction provides a new source of information for domain prediction. We test several methods for using this information, including a kernel smoothing‐based approach and methods based on building alpha‐carbon models, and compare performance with a length‐based predictor, a homology search method and four published sequence‐based predictors: DOMCUT, DomPRO, DLP‐SVM, and SCOOBY‐DOmain. We show that the kernel‐smoothing method is significantly better than the other ab initio predictors when both single‐domain and multidomain targets are considered and is not significantly different from the homology‐based method. Considering only multidomain targets, the kernel‐smoothing method outperforms all of the published methods except DLP‐SVM. The kernel smoothing method therefore represents a potentially useful improvement to ab initio domain prediction. Proteins 2013. © 2012 Wiley Periodicals, Inc.

13.
Abstract

The application of Molecular-Dynamics simulation in protein-crystallographic structure refinement has become common practice. In this paper, structure optimizations are described where the driving force is derived only from the crystallographic data and not from any physical potential energy function. Under this extreme condition, ab initio structure refinement and the application of structure-factor time averaging were investigated using a small 9-atom test system. Success in ab initio refinement, where the starting atomic positions are randomly distributed, depends on the resolution of the crystallographic data used in the optimization. The presence of high-resolution data introduces false minima in the X-ray energy profile, making the search problem significantly harder. On the same system, we also tested the method of time-averaged crystallographically restrained Molecular Dynamics, again in the absence of a physical force field. In this method, the diffraction data are modelled by an ensemble of structures instead of one single structure. In comparison to conventional single-structure refinement, more reflections were required to determine a correct atomic distribution. A time-averaging simulation at 0.2 nm resolution (40 reflections) yielded an incorrect distribution, although a low R-factor was obtained. Simulations at 0.1 nm resolution (248 reflections) gave both low R-factors, 3 to 4%, and correct atomic distributions. The scale factor between the observed and time-averaged calculated structure factor amplitudes appeared to be unstable when optimized during a time-averaging simulation. Tests of time-averaged restrained simulations with noise added to the observed structure-factor amplitudes indicated that noise is modelled when no information in the form of constraints or restraints is available to distinguish it from real data.
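
The R-factor and scale factor discussed above can be sketched with the conventional definitions: R is the sum of absolute differences between observed and scaled calculated structure-factor amplitudes divided by the sum of observed amplitudes, and a least-squares scale factor k minimizes the squared amplitude differences. The abstract does not say which exact expressions the authors used, so both formulas, the function names, and the toy amplitudes are assumptions of this illustration.

```python
def least_squares_scale(f_obs, f_calc):
    """Scale factor k minimizing sum((|F_obs| - k * |F_calc|)**2)."""
    numerator = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    denominator = sum(fc * fc for fc in f_calc)
    return numerator / denominator


def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R-factor: sum(| |F_obs| - k*|F_calc| |) / sum(|F_obs|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    residual = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return residual / sum(f_obs)


# Toy amplitudes for two reflections (hypothetical values).
observed = [10.0, 20.0]
calculated = [9.0, 22.0]
r_unscaled = r_factor(observed, calculated)           # (1 + 2) / 30
k = least_squares_scale(observed, [20.0, 40.0])       # calculated = 2 * observed
r_scaled = r_factor(observed, [20.0, 40.0], scale=k)  # scaling removes the mismatch
```

For time averaging, the calculated amplitudes would come from structure factors averaged over the trajectory rather than from a single structure; that averaging step is not shown here.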

14.

Background

Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.

Methodology/Principal Findings

The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.

Conclusions

The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.

15.

Background  

Modelling proteins with multiple domains is one of the central challenges in Structural Biology. Although homology modelling has successfully been applied for prediction of protein structures, very often domain-domain interactions cannot be inferred from the structures of homologues and their prediction requires ab initio methods. Here we present a new structural prediction approach for modelling two-domain proteins based on rigid-body domain-domain docking.

16.
Proteins encoded by newly-emerged genes (‘orphan genes’) share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates ab initio predictions of BRAKER and Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly-curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Each pipeline under-predicts orphan genes (as few as 11 percent, under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall, with BIND identifying 68% of annotated orphan genes and 99% of ancient genes, and giving the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.

17.
PROPAINOR is a new algorithm developed for ab initio prediction of the 3D structures of proteins using knowledge-based nonparametric multivariate statistical methods. This algorithm is found to be most efficient in terms of computational simplicity and prediction accuracy for single-domain proteins as compared to other ab initio methods. In this paper, we have used the algorithm for the atomic structure prediction of a multi-domain (two-domain) calcium-binding protein, whose solution structure has been deposited in the PDB recently (PDB ID: 1JFK). We have studied the sensitivity of the predicted structure to NMR distance restraints with their incorporation as an additional input. Further, we have compared the predicted structures in both these cases with the NMR derived solution structure reported earlier. We have also validated the refined structure for proper stereochemistry and favorable packing environment with good results and elucidated the role of the central linker.
Figure: The predicted 3D structure of EhCaBP with bound Ca²⁺ ions (CaBP-0). In the structure, α-helices are shown in pink and the β-strands in yellow. Ca²⁺ ions are depicted as fluorescent green balls. Some of the residues in the calcium-binding loops are depicted in space-fill representation.

18.

Background  

Predicting protein residue-residue contacts is an important 2D prediction task. It is useful for ab initio structure prediction and understanding protein folding. In spite of steady progress over the past decade, contact prediction remains largely unsolved.

19.
The ¹³Cα chemical shifts for 16,299 residues from 213 conformations of four proteins (experimentally determined by X-ray crystallography and Nuclear Magnetic Resonance methods) were computed by using a combination of approaches that includes, but is not limited to, the use of density functional theory. Initially, a validation test of this methodology was carried out by a detailed examination of the correlation between computed and observed ¹³Cα chemical shifts of 10,564 (of the 16,299) residues from 139 conformations of the human protein ubiquitin. The results of this validation test on ubiquitin show agreement with conclusions derived from computation of the chemical shifts at the ab initio Hartree–Fock level. Further, application of this methodology to 5,735 residues from 74 conformations of the three remaining proteins that differ in their number of amino acid residues, sequence and three-dimensional structure, together with a new scoring function, namely the conformationally averaged root-mean-square-deviation, enables us to: (a) offer a criterion for an accurate assessment of the quality of NMR-derived protein conformations; (b) examine whether X-ray or NMR-solved structures are better representations of the observed ¹³Cα chemical shifts in solution; (c) provide evidence indicating that the proposed methodology is more accurate than automated predictors for validation of protein structures; (d) shed light on whether the agreement between computed and observed ¹³Cα chemical shifts is influenced by the identity of an amino acid residue or its location in the sequence; and (e) provide evidence confirming the presence of dynamics for proteins in solution, and hence showing that an ensemble of conformations is a better representation of the structure in solution than any single conformation.
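
The conformationally averaged root-mean-square-deviation named above is not defined in the abstract. The sketch below assumes one plausible reading: for each residue, the computed chemical shift is averaged over all conformations of the ensemble, and the rmsd is then taken between these averages and the observed shifts. The function names and the toy shift values (in ppm) are hypothetical.

```python
import math


def shift_rmsd(observed, computed):
    """Rmsd (ppm) between observed and computed shifts of one conformation."""
    return math.sqrt(
        sum((o - c) ** 2 for o, c in zip(observed, computed)) / len(observed)
    )


def ca_rmsd(observed, computed_ensemble):
    """Assumed conformationally averaged rmsd: average the computed shift of
    each residue over the ensemble, then compare with the observed shifts."""
    n_conf = len(computed_ensemble)
    averaged = [
        sum(conformation[i] for conformation in computed_ensemble) / n_conf
        for i in range(len(observed))
    ]
    return shift_rmsd(observed, averaged)


# Toy example: two residues, two conformations whose errors cancel on average.
observed_shifts = [50.0, 60.0]
ensemble = [[49.0, 59.0], [51.0, 61.0]]
single = [shift_rmsd(observed_shifts, c) for c in ensemble]  # each is 1.0 ppm
averaged_score = ca_rmsd(observed_shifts, ensemble)          # 0.0 ppm
```

In this toy case the ensemble average reproduces the observed shifts better than either single conformation, which mirrors the abstract's conclusion (e) in spirit only; real data would not cancel this neatly.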

20.
Adhesion of the serotype M1 Streptococcus pyogenes strain SF370 to human tonsil explants and cultured keratinocytes requires extended polymeric surface structures called pili. In this important human pathogen, pili are assembled from three protein subunits: Spy0125, Spy0128 and Spy0130 through the action of sortase enzymes. For this study, the structural properties of these pili proteins have been investigated in solution. Spy0125 and Spy0128 display characteristics of globular, folded proteins. Circular dichroism suggests a largely β-sheet composition for Spy0128 and Spy0125; Spy0130 appears to contain little secondary structure. Each of the proteins adopts a monodisperse, monomeric state in solution as assessed by analytical ultracentrifugation. Further, small-angle X-ray scattering curves for Spy0125, Spy0128 and Spy0130 suggest each protein adopts an elongated shape, likely comprised of two domains, with similar maximal dimensions. Based on the scattering data, dummy atom models of each of the pili subunits have been reconstructed ab initio. This study provides the first insights into the structure of Streptococcus pyogenes minor pili subunits, and possible implications for protein function are discussed.
