Similar articles
20 similar articles found.
1.

Background  

The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, highly correlated with secondary structures, provide crucial information about the local three-dimensional structure.

2.
Tertiary structure prediction of a protein from its amino acid sequence is one of the major challenges in bioinformatics. The hierarchical approach is one of the more persuasive techniques for predicting protein tertiary structure, especially in the absence of homologous protein structures. In the hierarchical approach, intermediate states such as secondary structure, dihedral angles, and Cα-Cα distance bounds are predicted first. These intermediate states are then used to restrain the protein backbone and assist its correct folding. In recent years, several methods have been developed for predicting the dihedral angles of a protein, but it is difficult to conclude which method is better than the others. In this study, we benchmarked the performance of the dihedral prediction methods ANGLOR and SPINE X on various datasets, including independent datasets. The TANGLE dihedral prediction method was not benchmarked (because no standalone version was available); it was compared with SPINE X and ANGLOR only on the ANGLOR dataset, for which TANGLE results have been reported. SPINE X performed better than ANGLOR and TANGLE, especially in predicting the dihedral angles of glycine and proline residues. The analysis suggested that angle shifting was the foremost reason for the better performance of SPINE X. We further evaluated the methods on the independent ccPDB30 dataset and observed that SPINE X performed better than ANGLOR.
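
Benchmarks of dihedral angle predictors have to account for the periodicity of angles: a prediction of 179° against an observed -179° is off by 2°, not 358°. The sketch below shows one common way to score such predictions, a mean absolute error with wraparound. The abstract does not state which metric the authors used, so the function names and the choice of metric are assumptions made for illustration.

```python
def angular_error(predicted_deg, observed_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(predicted_deg - observed_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_error(predicted, observed):
    """Mean absolute angular error over paired lists of angles in degrees.

    Illustrative metric only; not taken from the benchmarked methods.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(angular_error(p, o) for p, o in zip(predicted, observed))
    return total / len(predicted)
```

Calling `mean_absolute_error([179.0, -60.0], [-179.0, -50.0])` averages errors of 2° and 10°.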

3.

Background

Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.

Methodology/Principal Findings

The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.

Conclusions

The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.

4.

Background  

Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio.

5.
Abstract

A set of software tools designed to study protein structure and kinetics has been developed. The core of these tools is a program called Folding Machine (FM), which is able to generate low-resolution folding pathways using modest computational resources. The FM is based on a coarse-grained kinetic ab initio Monte Carlo sampler that can optionally use information extracted from secondary structure prediction servers or from fragment libraries of local structure. The model underpinning this algorithm contains two novel elements: (a) the conformational space is discretized using the Ramachandran basins defined in the local φ-ψ energy maps; and (b) the solvent is treated implicitly by rescaling the pairwise terms of the non-bonded energy function according to the local solvent environments. The purpose of this hybrid ab initio/knowledge-based approach is threefold: to cover the long time scales of folding, to generate useful three-dimensional models of protein structures, and to gain insight into protein folding kinetics. Even though the algorithm is not yet fully developed, it has been used in a recent blind test of protein structure prediction (CASP5). The FM generated models within 6 Å backbone RMSD for fragments of about 60–70 residues of α-helical proteins. For a CASP5 target that turned out to be natively unfolded, the trajectory obtained for this sequence uniquely failed to converge. Also, a new measure to evaluate structure predictions is presented and used alongside the standard CASP assessment methods. Finally, recent improvements in the prediction of β-sheet structures are briefly described.
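
Discretizing conformational space by Ramachandran basins means replacing each residue's continuous (φ, ψ) pair with a basin label. The sketch below illustrates the idea with rectangular regions. The basin names and boundaries are hypothetical placeholders chosen for illustration; the abstract says the FM derives its basins from local φ-ψ energy maps, which are not reproduced here.

```python
# Hypothetical rectangular basins in degrees:
# (label, phi_min, phi_max, psi_min, psi_max), with half-open intervals.
# These boundaries are illustrative assumptions, not the FM's basins.
BASINS = [
    ("alpha_R", -180.0, 0.0, -120.0, 50.0),
    ("beta", -180.0, 0.0, 50.0, 180.0),
    ("beta", -180.0, 0.0, -180.0, -120.0),  # beta region continues past psi = -180
    ("alpha_L", 0.0, 180.0, -90.0, 90.0),
]


def basin_of(phi, psi):
    """Return the label of the first basin containing (phi, psi), else 'other'."""
    for label, phi_lo, phi_hi, psi_lo, psi_hi in BASINS:
        if phi_lo <= phi < phi_hi and psi_lo <= psi < psi_hi:
            return label
    return "other"


def discretize(angles):
    """Map a list of (phi, psi) pairs to a list of basin labels."""
    return [basin_of(phi, psi) for phi, psi in angles]
```

A sampler working on such labels explores a small discrete state space per residue rather than the full continuous torsion space, which is one plausible reason a coarse-grained model can cover long time scales with modest resources.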

6.
Predicted protein residue–residue contacts can be used to build three-dimensional models and consequently to predict protein folds from scratch. A considerable amount of effort is currently being spent to improve contact prediction accuracy, whereas few methods are available to construct protein tertiary structures from predicted contacts. Here, we present an ab initio protein folding method to build three-dimensional models using predicted contacts and secondary structures. Our method first translates contacts and secondary structures into distance, dihedral angle, and hydrogen bond restraints according to a set of new conversion rules, and then provides these restraints as input for a distance geometry algorithm to build tertiary structure models. The initially reconstructed models are used to regenerate a set of physically realistic contact restraints and detect secondary structure patterns, which are then used to reconstruct final structural models. This unique two-stage modeling approach of integrating contacts and secondary structures improves the quality and accuracy of structural models and in particular generates better β-sheets than other algorithms. We validate our method on two standard benchmark datasets using true contacts and secondary structures. Our method improves the TM-score of reconstructed protein models by 45% and 42% over the existing method on the two datasets, respectively. On the dataset for benchmarking reconstruction methods with predicted contacts and secondary structures, the average TM-score of the best models reconstructed by our method is 0.59, 5.5% higher than the existing method. The CONFOLD web server is available at http://protein.rnet.missouri.edu/confold/ . Proteins 2015; 83:1436–1449. © 2015 Wiley Periodicals, Inc.
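
The first step described above, turning predicted contacts into distance restraints, can be sketched as follows. The abstract does not give CONFOLD's conversion rules, so everything concrete here is an assumption for illustration: the `DistanceRestraint` type, the 3.5 to 8.0 Å bounds (8 Å is a widely used contact threshold), the minimum sequence separation, and the selection of the top-ranked contacts.

```python
from typing import List, NamedTuple, Tuple


class DistanceRestraint(NamedTuple):
    """Hypothetical restraint record: residue pair plus distance bounds in Å."""
    residue_i: int
    residue_j: int
    lower: float
    upper: float


def contacts_to_restraints(
    contacts: List[Tuple[int, int, float]],
    top_n: int,
    min_separation: int = 6,
    lower: float = 3.5,
    upper: float = 8.0,
) -> List[DistanceRestraint]:
    """Convert (i, j, probability) contacts into distance restraints.

    Keeps the top_n most confident contacts whose residues are at least
    min_separation apart in sequence. All thresholds are illustrative
    assumptions, not the published conversion rules.
    """
    eligible = [c for c in contacts if abs(c[0] - c[1]) >= min_separation]
    ranked = sorted(eligible, key=lambda c: c[2], reverse=True)
    return [
        DistanceRestraint(min(i, j), max(i, j), lower, upper)
        for i, j, _ in ranked[:top_n]
    ]
```

A restraint list of this shape is what a distance geometry program would then consume to build models; the second stage described in the abstract would recompute contacts from those models and repeat the conversion.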

7.
MOTIVATION: The Monte Carlo fragment insertion method for protein tertiary structure prediction (ROSETTA) of Baker and others has been merged with the I-SITES library of sequence-structure motifs and the HMMSTR model for local structure in proteins to form a new public server for the ab initio prediction of protein structure. The server performs several tasks in addition to tertiary structure prediction, including a database search, amino acid profile generation, fragment structure prediction, and backbone angle and secondary structure prediction. Meeting reasonable service goals required improvements in efficiency, in particular for the ROSETTA algorithm. RESULTS: The new server was used for blind predictions of 40 protein sequences as part of the CASP4 blind structure prediction experiment. The results for 31 of those predictions are presented here. Overall, 61% of the residues were found in topologically correct predictions, which are defined as fragments of 30 residues or more with a root-mean-square deviation in superimposed alpha carbons of less than 6 Å. HMMSTR 3-state secondary structure predictions were 73% correct overall. Tertiary structure predictions did not improve the accuracy of secondary structure prediction.

8.
We have investigated some of the basic principles that influence generation of protein structures using a fragment-based, random insertion method. We tested buildup methods and fragment library quality for accuracy in constructing a set of known structures. The parameters most influential in the construction procedure are bond and torsion angles, with minor inaccuracies in bond angles alone causing >6 Å Cα RMSD for a 150-residue protein. Idealization to a standard set of values corrects this problem, but changes the torsion angles and does not work for every structure. Alternatively, we found that using Cartesian coordinates instead of torsion angles did not reduce performance and can potentially increase speed and accuracy. Under conditions simulating ab initio structure prediction, fragment library quality can be suboptimal and still produce near-native structures. Using various clustering criteria, we created a number of libraries and used them to predict a set of native structures based on nonnative fragments. Local Cα RMSD fit of fragments, library size, and takeoff/landing angle criteria weakly influence the accuracy of the models. Based on a fragment's minimal perturbation upon insertion into a known structure, a seminative fragment library was created that produced more accurate structures with fragments that were less similar to native fragments than the other sets. These results suggest that fragments need only contain native-like subsections, which, when correctly overlapped, can recreate a native-like model. For fragment-based, random insertion methods used in protein structure prediction and design, our findings help to define the parameters this method needs to generate near-native structures.

9.
We describe a method that can thoroughly sample a protein conformational space given the protein primary sequence of amino acids and secondary structure predictions. Specifically, we target proteins with β-sheets because they are particularly challenging for ab initio protein structure prediction because of the complexity of sampling long-range strand pairings. Using some basic packing principles, inverse kinematics (IK), and β-pairing scores, this method creates all possible β-sheet arrangements including those that have the correct packing of β-strands. It uses the IK algorithms of ProteinShop to move α-helices and β-strands as rigid bodies by rotating the dihedral angles in the coil regions. Our results show that our approach produces structures that are within 4–6 Å RMSD of the native one regardless of the protein size and β-sheet topology although this number may increase if the protein has long loops or complex α-helical regions. Proteins 2010. © Published 2009 Wiley-Liss, Inc.

10.
Lee J, Kim SY, Joo K, Kim I, Lee J. Proteins 2004, 56(4):704-714
A novel method for ab initio prediction of protein tertiary structures, PROFESY (PROFile Enumerating SYstem), is proposed. This method utilizes the secondary structure prediction information of a query sequence and a fragment assembly procedure based on global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method, which enables one to sample diverse low-lying local minima of the energy. We apply PROFESY in benchmark tests to proteins with known structures to demonstrate its feasibility. In addition, we participated in CASP5 and applied PROFESY to four new-fold targets for blind prediction. The results are quite promising, despite the fact that PROFESY was in its early stages of development. In particular, PROFESY successfully provided us with the best model-one structure for the target T0161.

11.
Dong Xu, Yang Zhang. Proteins 2013, 81(2):229-239
Fragment assembly using structural motifs excised from other solved proteins has been shown to be an efficient method for ab initio protein-structure prediction. However, how to construct accurate fragments, how to derive optimal restraints from fragments, and what the best fragment length is are basic issues yet to be systematically examined. In this work, we developed a gapless-threading method to generate position-specific structure fragments. Distance profiles and torsion angle pairs are then derived from the fragments by statistical consistency analysis, which achieved accuracy comparable to that of machine-learning-based methods although the fragments were taken from unrelated proteins. When measured by the accuracies of both the derived distance profiles and the torsion angle pairs, we come to a consistent conclusion that the optimal fragment length for structural assembly is around 10, and at least 100 fragments at each location are needed to achieve optimal structure assembly. The distance profiles and torsion angle pairs derived from the fragments have been successfully used in QUARK for ab initio protein structure assembly and are provided by the QUARK online server at http://zhanglab.ccmb.med.umich.edu/QUARK/ . Proteins 2013. © 2012 Wiley Periodicals, Inc.

12.
A relational database of protein structure has been developed to enable rapid and flexible enquiries about the occurrence of many aspects of protein architecture. The coordinates of 294 proteins from the Brookhaven Data Bank have been processed by standard computer programs to generate many additional terms that quantify aspects of protein structure. These terms include solvent accessibility, main-chain and side-chain dihedral angles, and secondary structure. In a relational database, the information is stored in tables with columns holding the different terms and rows holding the different entries for the terms. The different relational base tables store the information about the protein coordinate set, the different chains in the protein, the amino acid residues and ligands, the atomic coordinates, the salt bridges, the hydrogen bonds, the disulphide bridges and the close tertiary contacts. The database was established under the ORACLE management system. Enquiries are constructed in ORACLE using SQL (structured query language), which is simple to use and alleviates the need for extensive computer programs. A single table can be searched for entries that meet various criteria, e.g. all proteins solved to better than a given resolution. The power of the database becomes apparent when several tables, or the entries in a single table, are cross-correlated. For example, the dihedral angles of proline in the fourth position in an alpha-helix in high-resolution structures can be rapidly obtained. The structural database provides a powerful tool to obtain empirical rules about protein conformation. This database of protein structures is part of a joint project between Birkbeck College and Leeds University to establish an integrated data resource of protein sequences and structures (ISIS) that encodes the complex patterns of residues and coordinates that define protein conformation. The entire data resource (ISIS) will provide a system to guide all areas of protein modelling including structure prediction, site-directed mutagenesis and de novo protein design. The availability of ISIS is described in the paper.
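
The cross-table enquiry mentioned in the abstract (dihedral angles of proline at the fourth position of an alpha-helix in high-resolution structures) might look like the sketch below. It uses Python's built-in `sqlite3` module in place of ORACLE, and the table names, column names, and demo rows are all invented for illustration; the paper's actual schema is not given in the abstract.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protein (protein_id TEXT PRIMARY KEY, resolution REAL);
        CREATE TABLE residue (
            protein_id TEXT REFERENCES protein(protein_id),
            seq_number INTEGER,
            name TEXT,
            phi REAL,
            psi REAL,
            helix_position INTEGER  -- 1-based position in a helix, NULL outside
        );
    """)
    # Made-up rows for demonstration only; not real structural data.
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?)",
        [("demoA", 1.5), ("demoB", 2.8)],
    )
    conn.executemany(
        "INSERT INTO residue VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("demoA", 14, "PRO", -58.0, -40.0, 4),
            ("demoA", 15, "ALA", -63.0, -42.0, 5),
            ("demoA", 30, "PRO", -70.0, 145.0, None),
            ("demoB", 8, "PRO", -61.0, -38.0, 4),
        ],
    )
    return conn


def proline_helix_dihedrals(conn, max_resolution=2.0, position=4):
    """Join residue and protein tables to find prolines at a helix position."""
    query = """
        SELECT r.protein_id, r.seq_number, r.phi, r.psi
        FROM residue AS r
        JOIN protein AS p ON p.protein_id = r.protein_id
        WHERE r.name = 'PRO' AND r.helix_position = ? AND p.resolution <= ?
        ORDER BY r.protein_id, r.seq_number
    """
    return conn.execute(query, (position, max_resolution)).fetchall()
```

With the demo rows, only the proline in `demoA` satisfies both the helix-position and the resolution criteria; the join is what lets a residue-level condition be filtered by a protein-level one.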

13.
In principle, structural information of protein sequences with no detectable homology to a protein of known structure could be obtained by predicting the arrangement of their secondary structural elements. Although some ab initio methods for protein structure prediction have been reported, the long-range interactions required to accurately predict tertiary structures of β-sheet containing proteins are still difficult to simulate. To remedy this problem and facilitate de novo prediction of β-sheet containing protein structures, we developed a support vector machine (SVM) approach that classified parallel and antiparallel orientation of β-strands by using the information of interstrand amino acid pairing preferences. Based on second-order statistics on the relative frequencies of each possible interstrand amino acid pair, we defined an average amino acid pairing encoding matrix (APEM) for encoding β-strands as input in the prediction model. As a result, a prediction accuracy of 86.89% and a Matthews correlation coefficient value of 0.71 have been achieved through 7-fold cross-validation on a non-redundant protein dataset from PISCES. Although several issues still remain to be studied, the method presented here indicates, to some extent, the important contribution of amino acid pairs to β-strand orientation, and offers a possible way to be combined with other algorithms for a full 'identification' of β-strands.
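
For a two-class problem such as parallel versus antiparallel strand orientation, accuracy and the Matthews correlation coefficient are both computed from the confusion matrix. The sketch below uses the standard formulas; treating "parallel" as the positive class is an assumption made here, and the counts in the usage note are arbitrary, not the paper's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions, as a percentage."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return 100.0 * (tp + tn) / total


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

For example, `matthews_corrcoef(25, 25, 25, 25)` is 0.0 even though `accuracy(25, 25, 25, 25)` is 50.0, which is why the coefficient is often reported alongside accuracy when the two classes may be unbalanced.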

14.
Gilis D, Rooman M. Proteins 2001, 42(2):164-176
The location of protein subunits that form early during folding, constituted of consecutive secondary structure elements with some intrinsic stability and favorable tertiary interactions, is predicted using a combination of threading algorithms and local structure prediction methods. Two folding units are selected among the candidates identified in a database of known protein structures: the fragment 15-55 of 434 cro, an all-alpha protein, and the fragment 1-35 of ubiquitin, an alpha/beta protein. These units are further analyzed by means of Monte Carlo simulated annealing using several database-derived potentials describing different types of interactions. Our results suggest that the local interactions along the chain dominate in the first folding steps of both fragments, and that the formation of some of the secondary structures necessarily occurs before structure compaction. These findings led us to define a prediction protocol that is effective in improving the accuracy of the predicted structures. It involves a first simulation with a local interaction potential only, whose final conformation is used as the starting structure of a second simulation that uses a combination of local interaction and distance potentials. The root mean square deviations between the coordinates of predicted and native structures are as low as 2-4 Å in most trials. The possibility of extending this protocol to the prediction of full proteins is discussed. Proteins 2001;42:164-176.

15.
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with these data to obtain better and faster convergence to more accurate secondary structure predictions. A five-fold cross-validated testing accuracy of 83.8% and a segment overlap (SOV) score of 78.3% are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements (α-helix, β-strand and coil), but the results have rarely been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position-specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.
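
Breaking prediction accuracy down by amino acid type, as the abstract describes, amounts to grouping per-residue correctness by the residue's one-letter code. The sketch below shows that bookkeeping for three-state labels (H, E, C). The function name and the example strings in the usage note are illustrative assumptions and have nothing to do with FLOPRED's actual output.

```python
from collections import defaultdict


def accuracy_by_residue(sequence, predicted, observed):
    """Percentage of correctly predicted states for each amino acid type.

    sequence:  one-letter amino acid codes
    predicted: predicted secondary structure states, same length
    observed:  reference secondary structure states, same length
    """
    if not (len(sequence) == len(predicted) == len(observed)):
        raise ValueError("all three inputs must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for amino_acid, pred, obs in zip(sequence, predicted, observed):
        totals[amino_acid] += 1
        if pred == obs:
            correct[amino_acid] += 1
    return {aa: 100.0 * correct[aa] / totals[aa] for aa in totals}
```

For the toy input `accuracy_by_residue("GAGAP", "CHCHC", "CHHHC")`, glycine is right in one of its two positions while alanine and proline are right in all of theirs, so the per-type breakdown exposes a difference that a single overall accuracy would hide.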

16.
We extend the effective fragment molecular orbital method (EFMO) to treat fragments connected by covalent bonds. The accuracy of EFMO is compared to FMO and conventional ab initio electronic structure methods for polypeptides including proteins. Errors in energy for RHF and MP2 are within 2 kcal/mol for neutral polypeptides and 6 kcal/mol for charged polypeptides, similar to FMO, but are obtained two to five times faster. For proteins, the errors are also within a few kcal/mol of the FMO results. We developed both the RHF and MP2 gradient for EFMO. Compared to ab initio, the EFMO optimized structures had an RMSD of 0.40 and 0.44 Å for RHF and MP2, respectively.

17.
The accurate prediction of protein structure, both secondary and tertiary, is an ongoing problem. Over the years, many approaches have been implemented and assessed. Most prediction algorithms start with the entire amino acid sequence and treat all residues in an identical fashion independent of sequence position. Here, we analyze blind prediction data to investigate whether predictive capability varies along the chain. Free modeling results from recent critical assessment of techniques for protein structure prediction (CASP) experiments are evaluated, as are the most up-to-date data from EVA, a fully automated blind test of secondary structure prediction servers. The results demonstrate that structure prediction accuracy is dependent on sequence position. Both secondary structure and tertiary structure predictions are more accurate in regions near the amino(N)-terminus when compared with analogous regions near the carboxy(C)-terminus. Eight of 10 secondary structure prediction algorithms assessed by EVA perform significantly better in regions at the N-terminus. CASP data show a similar bias, with N-terminal fragments being predicted more accurately than fragments from the C-terminus. Two analogous fragments are taken from each model: the N-terminal fragment begins at the start of the most N-terminal secondary structure element (SSE), whereas the C-terminal fragment finishes at the end of the most C-terminal SSE. Each fragment is locally superimposed onto its respective native fragment. The relative terminal prediction accuracy (RMSD) is calculated on an intramodel basis. At a fragment length of 20 residues, the N-terminal fragment is predicted with greater accuracy in 79% of cases. Proteins 2010. © 2009 Wiley-Liss, Inc.

18.
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity as quantified by environment-specific substitution scores can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these, and it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole structure quality did not affect the quality of loop predictions. Proteins 2010. © 2009 Wiley-Liss, Inc.

19.
In protein structure prediction, a central problem is defining the structure of a loop connecting two secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, an HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was made by removing homologous loops using the SCOP superfamily definition and then predicting against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.
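
The accuracies quoted above count a prediction as correct when the true class appears among the 20 best-scoring candidates. A minimal sketch of that top-k bookkeeping follows; the function name and input layout (one ranked list of class labels per loop) are assumptions for illustration, not the authors' evaluation code.

```python
def top_k_accuracy(ranked_predictions, true_classes, k=20):
    """Percentage of cases whose true class is among the first k ranked labels.

    ranked_predictions: one list of class labels per case, best first
    true_classes:       the correct class label for each case
    """
    if len(ranked_predictions) != len(true_classes) or not true_classes:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_classes)
        if truth in ranked[:k]
    )
    return 100.0 * hits / len(true_classes)
```

Raising k can only keep or increase the score, so a top-20 figure is an upper bound on the accuracy of the single best-ranked class.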

20.