Similar articles
1.
Lee J  Lee J  Sasaki TN  Sasai M  Seok C  Lee J 《Proteins》2011,79(8):2403-2417
Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of protein structure and an efficient conformational sampling method. In this article, we present an ab initio structure prediction method that combines dynamic fragment assembly (DFA), a recently proposed approach to fragment assembly, with the conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed from short- and long-range structural restraint information derived from a fragment library. Here, DFA is implemented with the full-atom CHARMM model, supplemented by the empirical DFIRE potential. The relative contributions of the various energy terms are optimized using linear programming. Conformational sampling was carried out with the CSA algorithm, which can find low-energy conformations more efficiently than the simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented in CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides prediction results that are comparable and complementary to those of existing top methods.

2.
Yang Y  Zhan J  Zhao H  Zhou Y 《Proteins》2012,80(8):2080-2088
A structure alignment program aligns two structures by optimizing a scoring function that measures structural similarity. It is highly desirable that such a scoring function be independent of the sizes of the proteins being compared, so that the significance of an alignment is comparable across aligned regions of different sizes. Here, we developed a new score called SP-score that fixes the cutoff distance at 4 Å and removes the size dependence using a normalization prefactor. We further built a program called SPalign that optimizes SP-score for structure alignment. SPalign was applied to recognize proteins that share the same structural fold and proteins that share the function of DNA or RNA binding. For fold discrimination, SPalign improves sensitivity over TMalign for the chain-level comparison by 12% and over DALI for the domain-level comparison by 13% at the same specificity of 99.6%. The difference between TMalign and SPalign at the chain level is due to the inability of TMalign to detect single-domain similarity between multidomain proteins. For recognizing nucleic acid binding proteins, SPalign consistently improves over TMalign by 12% and over DALI by 31% in the average Matthews correlation coefficient over four datasets. With default settings, SPalign is 14% faster than TMalign. SPalign is expected to be useful for function prediction and for comparing structures with or without defined domains. The source code for SPalign and the server are available at http://sparks.informatics.iupui.edu. Proteins 2012; © 2012 Wiley Periodicals, Inc.
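
The abstract above reports classification quality as Matthews correlation coefficients. As a reminder of what that metric measures, here is a minimal sketch of the standard formula computed from confusion-matrix counts. The function name and the zero-denominator convention are choices made for this illustration and are not taken from SPalign.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike plain accuracy, the coefficient stays near zero for a predictor that labels everything as the majority class, which is why it is often preferred for unbalanced binding/non-binding datasets.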

3.
A fast method to sample real protein conformational space   Total citations: 2 (self-citations: 0, citations by others: 2)
Feldman HJ  Hogue CW 《Proteins》2000,39(2):112-131
A fast computer program, FOLDTRAJ, to generate plausible random protein structures is reported. All-atom proteins are built directly in continuous three-dimensional space, starting from the primary sequence, with an N-to-C directed build-up method. The method uses a novel pipelined residue addition approach in which the leading edge of the protein is constructed three residues at a time for optimal protein geometry, including the placement of cis proline. Build-up methods represent a classic N-body problem, expected to scale as N^2. When proteins become more collapsed, build-up methods are susceptible to backtracking problems, which can scale exponentially with the number of residues required to back out of a trapped walk. We have provided solutions to both of these problems, using a multiway binary tree that makes the N-body problem of bump-checking scale as N log N, and speeding up backtracking by varying the number of tries before backtracking according to the available conformational space. FOLDTRAJ is independent of energy potentials, other than those implicit in the geometrical properties derived from statistical studies of known structures and in atomic van der Waals radii. WHAT-CHECK shows that the program generates chirally and physically valid proteins with all bond lengths, angles and dihedrals within allowable tolerances. Random structures built using sequences from PDB files 1SEM, 2HPR, and 1RTP typically have 5-15% alpha-helical content (according to DSSP) and on the order of 20% beta-strand/extended content. Ensembles of random structures are compared with polymer theory and with experimentally determined fluorescence resonance energy transfer distances. Reasonably sized structure ensembles do sample most of the conformational space available to proteins. The method is also capable of protein reconstruction using Cα-Cα direction vectors, and it compares favorably with methods that reconstruct protein backbones from alpha-carbon coordinates, with an average backbone and Cβ root mean square deviation of 0.63 Å for nine different protein folds. Proteins 2000;39:112-131.
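
The abstract above notes that naive bump-checking compares every pair of atoms and therefore scales as N^2. The sketch below illustrates the general idea of avoiding all-pairs comparison with a spatial index. It uses a uniform grid (cell list), which is a different and simpler technique than the multiway binary tree the abstract describes; the function name, the point representation, and the single distance cutoff are assumptions made for this illustration.

```python
import math
from collections import defaultdict
from itertools import product


def find_bumps(coords, min_dist):
    """Return index pairs (i, j), i < j, of points closer than min_dist.

    Illustration only: a uniform grid with cell size min_dist, not the
    tree structure used by FOLDTRAJ. Points are added one at a time,
    mimicking a build-up walk, and each new point is compared only with
    points in its own and the 26 neighbouring cells.
    """
    cells = defaultdict(list)  # cell key -> indices of points in that cell
    bumps = []
    for i, p in enumerate(coords):
        kx, ky, kz = (math.floor(c / min_dist) for c in p)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((kx + dx, ky + dy, kz + dz), ()):
                if math.dist(p, coords[j]) < min_dist:
                    bumps.append((j, i))
        cells[(kx, ky, kz)].append(i)
    return bumps
```

Because the cell edge equals the cutoff, any two points closer than `min_dist` must lie in the same or adjacent cells, so no clash is missed while most distant pairs are never examined.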

4.
MOTIVATION: Knots in polypeptide chains have been found in very few proteins, and consequently should be generally avoided in protein structure prediction methods. Most effective structure prediction methods do not model the protein folding process itself, but rather seek only to correctly obtain the final native state. Consequently, the mechanisms that prevent knots from occurring in native proteins are not relevant to the modeling process, and as a result, knots can occur with significantly higher frequency in protein models. Here we describe Knotfind, a simple algorithm for knot detection that is fast enough for structure prediction, where tens or hundreds of thousands of conformations may be sampled during the course of a prediction. We have used this algorithm to characterize knots in large populations of model structures generated for targets in CASP5 and CASP6 using the Rosetta homology-based modeling method. RESULTS: Analysis of CASP5 models suggested several possible avenues for the introduction of knots into these models, and these insights were applied to structure prediction in CASP6, resulting in a significant decrease in the proportion of knotted models generated. Additionally, using the knot detection algorithm on structures in the Protein Data Bank, a previously unreported deep trefoil knot was found in acetylornithine transcarbamylase. AVAILABILITY: The Knotfind algorithm is available in the Rosetta structure prediction program at http://www.rosettacommons.org.

5.
MOTIVATION: Because the newly developed Grid platform is considered a powerful tool for sharing resources over the Internet, it is of interest to demonstrate an efficient, low-cost methodology for processing massive biological data in Grid environments. This paper presents an efficient and economical Grid-based method for predicting the secondary structures of all proteins in a given organism by processing a large amount of protein sequence data simultaneously, a task that normally requires a long computation time under sequential execution. From the prediction results, a genome-scale protein fold space can be derived. RESULTS: Using the improved Grid platform, we present genome-scale secondary structure prediction and the protein topology derived from a new scoring scheme for four different model proteomes. This protein fold space was compared with structures from the Protein Data Bank and showed a similar distribution. Therefore, the fold space approach based on this new scoring scheme could serve as a guideline for predicting folding families in a given organism.

6.
Protein function largely depends on the acquisition of specific structures through folding at physiological time scales. Under both equilibrium and non-equilibrium conditions, proteins form partially structured molecules that, being intermediates in the process, usually resemble the structure of the fully folded protein. These intermediates, known as molten globules, can adopt a large variety of conformations, mainly supported by changes in their side chains. Because obtaining a fully packed structure is considered energetically more difficult than forming partially “disordered” folding intermediates, evolution might have conferred upon a substantial number of proteins the capability to first fold partially and then, depending on the presence of specific partner ligands, undergo disorder-to-order transitions to adopt a highly ordered, well-folded state and reach the lowest-energy conformation possible. Disorder in this context can refer to segments of proteins or to complete proteins that might exist in the native state. Moreover, because this type of disorder-to-order transition has been found to be reversible, it has frequently been associated with important signaling events in the cell. Owing to the central role of this phenomenon in cell biology, protein misfolding and aberrant disorder-to-order transitions have now been associated with a substantial number of diseases.

7.
Parallel cascade identification is a method for modeling dynamic systems with possibly high-order nonlinearities and lengthy memory, given only input/output data for the system gathered in an experiment. While the method was originally proposed for nonlinear system identification, two recent papers have illustrated its utility for protein family prediction. One strength of this approach is its capability to train effective parallel cascade classifiers from very little training data. Indeed, when the number of training exemplars is limited, and when distinctions between a small number of categories suffice, parallel cascade identification can outperform some state-of-the-art techniques. Moreover, the unusual approach taken by this method enables it to be combined effectively with other techniques to significantly improve accuracy. In this paper, parallel cascade identification is first reviewed, and its use in a variety of different fields is surveyed. Then protein family prediction via this method is considered in detail, and some particularly useful applications are pointed out.

8.
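A new mathematical model of precipitation distribution and its application   Total citations: 3 (self-citations: 0, citations by others: 3)
In distributed hydrological models, the precipitation input within each grid cell is a key factor in accurately simulating hydrological processes, and finding methods to generate distributed precipitation data is one of the active topics in hydrological modeling research. Based on an analysis of precipitation models developed in China and abroad, we consider the actual precipitation distribution over a watershed to be the combined result of weather-system precipitation and the influence of the underlying terrain. In the absence of terrain effects, the isohyets of weather-system precipitation are distributed in the plane approximately as a set of concentric ellipses. On this principle, we established a new mathematical model of precipitation distribution that simulates the distribution of weather-system precipitation and corrects the simulated results for terrain effects using Newton interpolation, and we propose a model-based simulation of the location of the precipitation center and its central precipitation amount. The model was tested with measured data from the Xichuan River watershed on the Loess Plateau, and the results show that it has relatively high accuracy. Because the model is conceptually simple and clear, and can indicate the location of the precipitation center and its central precipitation amount, it has some value for watershed rainstorm analysis and flood forecasting.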

9.
10.
The folding process defines three-dimensional protein structures from their amino acid chains. A protein's structure determines its activity and properties; thus, knowing this conformation at the atomic level is essential for both basic and applied studies of protein function and dynamics. However, the acquisition of such structures by experimental methods is slow and expensive, and current computational methods mostly depend on previously known structures to determine new ones. Here we present new software called GSAFold that applies the generalized simulated annealing (GSA) algorithm to ab initio protein structure prediction. GSA is a stochastic search algorithm employed in energy minimization and used in global optimization problems, especially those that depend on long-range interactions, such as gravity models and conformation optimization of small molecules. This new implementation applies, for the first time in ab initio protein structure prediction, an analytical inverse for the visitation function of GSA. It also employs the widely used NAMD molecular dynamics package to carry out energy calculations, allowing the user to select different force fields and parameterizations. Moreover, the software allows several simulations to be executed simultaneously. Applications that depend on protein structures include rational drug design and structure-based protein function prediction. Applying GSAFold to a test peptide, it was possible to predict the structure of mastoparan-X to a root mean square deviation of 3.00 Å. Proteins 2012; © 2012 Wiley Periodicals, Inc.

11.
Protein chemical shifts encode detailed structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches have been used to infer correlations between chemical shifts and secondary structure from experimental chemical shifts. These methods range from simple statistics such as the chemical shift index to complex methods using neural networks. Notwithstanding their higher accuracy, more complex approaches tend to obscure the relationship between secondary structure and chemical shift and often involve many parameters that need to be trained. We present hidden Markov models (HMMs) with Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. Using these distributions as outputs of first- and second-order HMMs, we achieve a prediction accuracy of 82.3%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts. Incorporation of sequence-based secondary structure prediction into our HMM improves the prediction accuracy to 84.0%. Our findings suggest that an HMM with correlated Gaussian distributions conditioned on the secondary structure provides an adequate generative model of chemical shifts. Proteins 2013; © 2012 Wiley Periodicals, Inc.
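
To make the idea of an HMM with Gaussian emissions concrete, the following is a minimal sketch of Viterbi decoding for a first-order model with one scalar observation per residue. It is not the authors' model: the three states, the emission means and standard deviations, and the transition probabilities are hypothetical values chosen for this illustration, and the emissions here are univariate and independent of amino acid type, whereas the abstract describes correlated Gaussians conditioned on amino acid and secondary structure.

```python
import math

STATES = ("H", "E", "C")  # helix, strand, coil
# Hypothetical (mean, std) of a secondary chemical shift per state.
EMIT = {"H": (2.5, 1.0), "E": (-1.5, 1.0), "C": (0.0, 1.0)}
# Hypothetical transitions: stay with probability 0.8, switch with 0.1 each.
LOG_TRANS = {r: {s: math.log(0.8 if r == s else 0.1) for s in STATES}
             for r in STATES}
LOG_START = {s: math.log(1.0 / len(STATES)) for s in STATES}


def log_gauss(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi(obs, states=STATES, log_start=LOG_START,
            log_trans=LOG_TRANS, emit=EMIT):
    """Most probable state path for a first-order Gaussian-emission HMM."""
    score = {s: log_start[s] + log_gauss(obs[0], *emit[s]) for s in states}
    back = []  # one back-pointer table per observation after the first
    for x in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            score[s] = prev[best] + log_trans[best][s] + log_gauss(x, *emit[s])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


shifts = [2.4, 2.7, 2.2, -1.4, -1.6, -1.3]
print("".join(viterbi(shifts)))
```

Working in log space avoids numerical underflow on long sequences, and the high self-transition probability is what smooths isolated noisy observations into contiguous secondary structure segments.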

12.

Background  

One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.

13.
A conformational triangle method is presented to analyze the secondary structure contents of 1028 structurally known proteins in the non-redundant data set of the recent 25% PDB_SELECT. The secondary structure contents of each protein are mapped onto a point in the triangle. It was found that the distribution of the 1028 points is strongly skewed in the triangle and that about 42% of the whole area, called the forbidden area, is empty. The detailed border between the allowable and forbidden areas was calculated. A possible explanation of the skewed distribution is discussed. The distributions of the mapping points for enzymes and non-enzymes in this non-redundant data set are compared. It was found that a necessary, rather than sufficient, condition for an enzyme molecule is that its coil content must be ≥0.223. It is hoped that the skewed distribution observed here could be used to test secondary structure and threading predictions.
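
Mapping three fractions that sum to one onto a point in a triangle is a barycentric (ternary-plot) construction. The sketch below shows one way to do it, assuming an equilateral triangle with helix, strand and coil at the three vertices; the vertex placement and the function names are assumptions made for this illustration, since the abstract does not specify the geometry. The 0.223 coil threshold is the necessary condition quoted in the abstract.

```python
import math

# Assumed vertex layout of an equilateral triangle with unit side.
VERTICES = {
    "helix": (0.0, 0.0),
    "strand": (1.0, 0.0),
    "coil": (0.5, math.sqrt(3) / 2),
}


def to_triangle(helix: float, strand: float, coil: float):
    """Map secondary structure contents to (x, y) inside the triangle."""
    total = helix + strand + coil
    if total <= 0:
        raise ValueError("contents must sum to a positive value")
    # Normalise so the three weights sum to 1 (barycentric coordinates).
    weights = {"helix": helix / total, "strand": strand / total, "coil": coil / total}
    x = sum(w * VERTICES[k][0] for k, w in weights.items())
    y = sum(w * VERTICES[k][1] for k, w in weights.items())
    return x, y


def meets_enzyme_coil_condition(coil: float, threshold: float = 0.223) -> bool:
    """Necessary (not sufficient) condition for enzymes quoted in the abstract."""
    return coil >= threshold
```

A protein with only helix maps to the helix vertex, and one with equal thirds of each class maps to the centroid, so empty regions of the plot correspond directly to combinations of contents that are not observed.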

14.
15.
By considering how polymer structures are distributed in conformation space, we show that it is possible to quantify the difficulty of structural prediction and to provide a measure of progress for prediction calculations. The critical issue is the probability that a conformation is found within a specified distance of another conformer. We address this question by constructing a cumulative distribution function (CDF) for the average probability from observations about its limiting behavior at small displacements and numerical simulations of polyalanine chains. We can use the CDF to estimate the likelihood that a structure prediction is better than random chance. For example, the chance of randomly predicting the native backbone structure of a 150-amino-acid protein to low resolution, say within 6 Å, is 10^-14. A high-resolution structural prediction, say to 2 Å, is immensely more difficult (10^-57). With additional assumptions, the CDF yields the conformational entropy of protein folding from native-state coordinate variance. Or, using values of the conformational entropy change on folding, we can estimate the native state's conformational span. For example, for a 150-mer protein, equilibrium alpha-carbon displacements in the native ensemble would be 0.3-0.5 Å based on TΔS of 1.42 kcal/(mol residue).

16.
A segment-based approach to protein secondary structure prediction.   Total citations: 4 (self-citations: 0, citations by others: 4)
Amino acid sequence patterns have been used to identify the location of turns in globular proteins [Cohen et al. (1986) Biochemistry 25, 266-275]. We have developed sequence patterns that facilitate the prediction of helices in all helical proteins. Regular expression patterns recognize the component parts of a helix: the amino terminus (N-cap), the core of the helix (core), and the carboxy terminus (C-cap). These patterns recognize the core features of helices with a 95% success rate and the N- and C-capping features with success rates of 56% and 48%, respectively. A metapattern language, ALPPS, coordinates the recognition of turns and helical components in a scheme that predicts the location and extent of alpha-helices. On the basis of raw residue scoring, a 71% success rate is observed. By focusing on the recognition of core helical features, we achieve a 78% success rate. Amended scoring procedures are presented and discussed, and comparisons are made to other predictive schemes.
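
The abstract above describes regular expression patterns for the N-cap, core and C-cap of a helix. The sketch below shows what such a pattern could look like in principle. The residue classes and the minimum core length are hypothetical and were invented for this illustration; they are not the patterns from the paper and are not part of ALPPS.

```python
import re

# Hypothetical pattern, for illustration only:
#   N-cap: one of S, T, D, N
#   core:  six or more residues from an assumed helix-favouring set
#   C-cap: G or N
HELIX_PATTERN = re.compile(
    r"(?P<ncap>[STDN])(?P<core>[AELMQKR]{6,})(?P<ccap>[GN])"
)


def find_helix_candidates(sequence: str):
    """Return (start, end, core) for each match; end is exclusive."""
    return [
        (m.start("core"), m.end("core"), m.group("core"))
        for m in HELIX_PATTERN.finditer(sequence.upper())
    ]


print(find_helix_candidates("GPSAEELLKKAGPP"))
```

Named groups keep the three components separately addressable, which mirrors the way the abstract reports separate success rates for core and capping features.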

17.
A molecular theory of protein secondary structure is presented that takes account of both local interactions inside each chain region and long-range interactions between different regions, incorporating all these interactions in a single Ising-like model. Local interactions are evaluated from the stereochemical theory describing the relative stabilities of α- and β-structures for different residues in synthetic polypeptides, while long-range effects are approximated by the interaction of each chain region with the averaged hydrophobic template. Based on this theory, an algorithm of protein secondary structure prediction is proposed and examples are given of “blind” predictions made before the x-ray structural data became available.

18.
We have developed an iterative hybrid algorithm (HA) to predict the 3D structure of peptides starting from their amino acid sequence. The HA consists of a modified genetic algorithm (GA) coupled to a local optimizer. Each HA iteration is carried out in two phases. In the first phase, several GA runs are performed over the entire peptide conformational space. In the second phase, we use what we call conformational memories, which emerge at the end of the first phase, to reduce the peptide conformational space in subsequent HA iterations. The use of conformational memories speeds up and refines the localization of the structure at the putative Global Energy Minimum (GEM), since conformational barriers are avoided. The algorithm has been used to successfully predict the putative GEM for Met- and Leu-enkephalin, and to obtain useful information regarding the 3D structure of the 8-mer of polyglycine and the 16-residue (AAQAA)3Y peptide. The number of fitness function evaluations needed to locate the putative GEMs is smaller than that reported for other heuristic methods. This study opens the possibility of using genetic algorithms in high-level predictions of the secondary structure of polypeptides.

19.
Bhardwaj N  Lu H 《FEBS Letters》2007,581(5):1058-1066
Protein-DNA interactions are crucial to many cellular activities such as expression control and DNA repair. These interactions between amino acids and nucleotides are highly specific, and any aberration at the binding site can render the interaction completely incompetent. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach to harness the features of DNA-binding residues and distinguish them from non-binding residues. The features used include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity not only among its pairs but also with the proteins in the training set, thus removing any undesired redundancy due to homology. This set also contains proteins of a DNA-binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post-processing scheme to improve the prediction using the relative location of the predicted residues. Balanced success is then achieved, with average sensitivity, specificity and accuracy of 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%. Finally, we show that the number of predicted DNA-binding residues can be used to differentiate DNA-binding proteins from non-DNA-binding proteins with an accuracy of 78%. The results presented here demonstrate that machine learning can be applied to the automated identification of DNA-binding residues and that the success rate can be improved as more features are added. Such functional site prediction protocols can be useful in guiding subsequent work such as site-directed mutagenesis and macromolecular docking.
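
The abstract above reports sensitivity, specificity, accuracy and net prediction. The sketch below computes these from confusion-matrix counts using their standard definitions. Treating net prediction as the mean of sensitivity and specificity is an assumption of this illustration; it is consistent with the reported values (71.3% and 69.3% average to 70.3%), but the abstract does not define the term. The function name is hypothetical.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true binding residues recovered
    specificity = tn / (tn + fp)   # fraction of non-binding residues rejected
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        # Assumed definition: mean of sensitivity and specificity.
        "net_prediction": (sensitivity + specificity) / 2,
    }
```

Reporting sensitivity and specificity together matters here because binding residues are a minority class, so accuracy alone could be high for a predictor that rarely labels any residue as binding.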

20.