Similar literature
 20 similar documents found (search time: 15 ms)
1.
Homaeian L  Kurgan LA  Ruan J  Cios KJ  Chen K 《Proteins》2007,69(3):486-498
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. A significant majority of successful methods for prediction of the secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, when it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. 
At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure.
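As a rough illustration of the idea in this abstract (a feature vector computed from the sequence, fed to a linear regression model), the sketch below computes only the amino acid composition vector and applies a generic linear model. The function names, the clipping to [0, 1], and any weights passed in are hypothetical; the actual PSSC-core feature set and fitted coefficients are not given in the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def predict_content(features, weights, intercept):
    """Generic linear model: intercept + sum(w_i * x_i), clipped to [0, 1].

    The weights and intercept are placeholders; a real model would fit them
    by multiple linear regression on sequences with known structure content.
    """
    y = intercept + sum(w * x for w, x in zip(weights, features))
    return min(1.0, max(0.0, y))
```

A fitted model would use one weight vector for helix content and a separate one for strand content, which matches the abstract's remark that the representation is designed separately for each target.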

2.
Adamczak R  Porollo A  Meller J 《Proteins》2005,59(3):467-475
Owing to the use of evolutionary information and advanced machine learning protocols, secondary structures of amino acid residues in proteins can be predicted from the primary sequence with more than 75% per-residue accuracy for the 3-state (i.e., helix, beta-strand, and coil) classification problem. In this work we investigate whether further progress may be achieved by incorporating the relative solvent accessibility (RSA) of an amino acid residue as a fingerprint of the overall topology of the protein. Toward that goal, we developed a novel method for secondary structure prediction that uses predicted RSA in addition to attributes derived from evolutionary profiles. Our general approach follows the 2-stage protocol of Rost and Sander, with a number of Elman-type recurrent neural networks (NNs) combined into a consensus predictor. The RSA is predicted using our recently developed regression-based method that provides real-valued RSA, with the overall correlation coefficients between the actual and predicted RSA of about 0.66 in rigorous tests on independent control sets. Using the predicted RSA, we were able to improve the performance of our secondary structure prediction by up to 1.4% and achieved the overall per-residue accuracy between 77.0% and 78.4% for the 3-state classification problem on different control sets comprising, together, 603 proteins without homology to proteins included in the training. The effects of including solvent accessibility depend on the quality of RSA prediction. In the limit of perfect prediction (i.e., when using the actual RSA values derived from known protein structures), the accuracy of secondary structure prediction increases by up to 4%. We also observed that projecting real-valued RSA into 2 discrete classes with the commonly used threshold of 25% RSA decreases the classification accuracy for secondary structure prediction. 
While the level of improvement of secondary structure prediction may be different for prediction protocols that implicitly account for RSA in other ways, we conclude that an increase in the 3-state classification accuracy may be achieved when combining RSA with a state-of-the-art protocol utilizing evolutionary profiles. The new method is available through a Web server at http://sable.cchmc.org.
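The per-residue accuracy for the 3-state problem quoted in this abstract (often called Q3 elsewhere in this list) is simply the percentage of residues whose predicted state matches the observed one. A minimal sketch, assuming both assignments are given as equal-length strings over a three-letter alphabet such as H/E/C (the function name and the letter coding are illustrative assumptions):

```python
def q3_accuracy(observed, predicted):
    """Percentage of positions where the predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("empty assignment")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# One mismatch out of eight residues.
example = q3_accuracy("HHHEECCC", "HHHECCCC")
```

Segment-based measures such as SOV need the segment boundaries as well and are not covered by this sketch.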

3.
We have revisited the protein coarse-grained optimized potential for efficient structure prediction (OPEP). The training and validation sets consist of 13 and 16 protein targets. Because optimization depends on details of how the ensemble of decoys is sampled, trial conformations are generated by molecular dynamics, threading, greedy, and Monte Carlo simulations, or taken from publicly available databases. The OPEP parameters are varied by a genetic algorithm using a scoring function which requires that the native structure has the lowest energy, and the native-like structures have energy higher than the native structure but lower than the remote conformations. Overall, we find that OPEP correctly identifies 24 native or native-like states for 29 targets and has very similar capability to the all-atom discrete optimized protein energy model (DOPE), found recently to outperform five currently used energy models.

4.
Protein chemical shifts encode detailed structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches have been used to infer correlations between chemical shifts and secondary structure from experimental chemical shifts. These methods range from simple statistics such as the chemical shift index to complex methods using neural networks. Notwithstanding their higher accuracy, more complex approaches tend to obscure the relationship between secondary structure and chemical shift and often involve many parameters that need to be trained. We present hidden Markov models (HMMs) with Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. Using these distributions as outputs of first‐ and second‐order HMMs, we achieve a prediction accuracy of 82.3%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts. Incorporation of sequence‐based secondary structure prediction into our HMM improves the prediction accuracy to 84.0%. Our findings suggest that an HMM with correlated Gaussian distributions conditioned on the secondary structure provides an adequate generative model of chemical shifts. Proteins 2013; © 2012 Wiley Periodicals, Inc.
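To make the HMM idea concrete, here is a deliberately simplified sketch: a first-order HMM whose hidden states are secondary structure types and whose emissions are one-dimensional Gaussians, decoded with the Viterbi algorithm. It is not the model from the abstract, which conditions on the amino acid, uses correlated (multivariate) Gaussians, and also considers second-order chains. All names and any parameter values supplied by a caller are hypothetical.

```python
import math


def log_gauss(x, mean, std):
    """Log density of a 1-D normal distribution."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def viterbi(obs, states, log_start, log_trans, emit):
    """Most probable state path for a first-order HMM with Gaussian emissions.

    emit[s] is a (mean, std) pair; log_trans[r][s] is log P(s | r).
    """
    if not obs:
        return []
    score = {s: log_start[s] + log_gauss(obs[0], *emit[s]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            score[s] = prev[best] + log_trans[best][s] + log_gauss(x, *emit[s])
            ptr[s] = best
        back.append(ptr)
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Working in log space avoids underflow on long chains, and the "sticky" transition probabilities a caller would typically supply are what let the model prefer contiguous helix or strand segments over isolated state flips.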

5.
6.
The conditional probability, P(sigma/x), is a statement of the probability that the value of sigma will be found given the prior information that a value of x has been observed. Here sigma represents any one of the secondary structure types, alpha, beta, tau, and rho for helix, sheet, turn, and random, respectively, and x represents a sequence attribute, including, but not limited to: (1) hydropathy; (2) hydrophobic moments assuming helix and sheet; (3) Richardson and Richardson helical N-cap and C-cap values; (4) Chou-Fasman conformational parameters for helix, P alpha, for sheet, P beta, and for turn, P tau; and (5) Garnier, Osguthorpe, and Robson (GOR) information values for helix, I alpha, for sheet, I beta, for turn, I tau, and for random structure, I rho. Plots of P(sigma/x) vs. x are demonstrated to provide information about the correlation between structure and attribute, sigma and x. The separations between different P(sigma/x) vs. x curves indicate the capacity of a given attribute to discriminate between different secondary structural types and permit comparison of different attributes. P(alpha/x), P(beta/x), P(tau/x) and P(rho/x) vs. x plots show that the most useful attributes for discriminating helix are, in order: hydrophobic moment assuming helix greater than P alpha much greater than N-cap greater than C-cap approximately I alpha approximately I tau. The information value for turns, I tau, was found to discriminate helix better than turns. Discrimination for sheet was found to be in the following order: I beta much greater than P beta approximately hydropathy greater than I rho approximately hydrophobic moment assuming sheet. Three attributes, at their low values, were found to give significant discrimination for the absence of helix: I alpha approximately P alpha approximately hydrophobic moment assuming helix. Also, three other attributes were found to indicate the absence of sheet: P beta much greater than I rho approximately hydropathy. 
Indications of the absence of sigma could be as useful for some applications as the indication of the presence of sigma.
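The curves of P(sigma/x) vs. x described in this abstract can be estimated empirically by binning the attribute x and counting how often each structure type occurs in each bin. The sketch below assumes the data arrive as (x, sigma) pairs, one per residue, and uses fixed-width bins; the function name, the bin width, and the binning scheme are assumptions, since the abstract does not say how the probabilities were computed.

```python
import math
from collections import Counter, defaultdict


def conditional_probabilities(pairs, bin_width):
    """Estimate P(sigma | x) per bin of x.

    pairs: iterable of (x, sigma), with x numeric and sigma a structure label.
    Returns {bin_index: {sigma: probability}}, where bin_index = floor(x / bin_width).
    """
    counts = defaultdict(Counter)
    for x, sigma in pairs:
        counts[math.floor(x / bin_width)][sigma] += 1
    return {
        b: {s: n / sum(c.values()) for s, n in c.items()}
        for b, c in counts.items()
    }
```

Plotting the per-bin probability of each label against the bin centers gives one curve per structure type; wide separation between curves corresponds to the discriminating power the abstract discusses.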

7.
A novel method for predicting the secondary structures of proteins from amino acid sequence is presented. Protein secondary structure seqlets, which are analogous to the words in natural language, have been extracted. These seqlets capture the relationship between amino acid sequence and the secondary structures of proteins and together form the protein secondary structure dictionary. More specifically, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. The word-lattice is used to represent the results of the word segmentation, and the maximum entropy model is used to calculate the probability of a seqlet being tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins of four organisms and compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining the method with locally similar protein sequences obtained by BLAST can give better predictions. The method is also tested on the 50 CASP5 target proteins, with Q3 accuracy of 78.9% and SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed, which is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.

8.
Jia M  Luo L  Liu C 《Biopolymers》2004,73(1):16-26
A new integrated sequence-structure database, called IADE (Integrated ASTRAL-DSSP-EMBL), incorporating matching mRNA sequence, amino acid sequence, and protein secondary structural data, is constructed. It includes 648 protein domains. Based on the IADE database, we studied the relation between RNA stem-loop frequencies and protein secondary structure. It was found that the alpha-helices and beta-strands on proteins tend to be preferably "coded" by mRNA stem region, while the coils on proteins tend to be preferably "coded" by mRNA loop region. These tendencies are more obvious if we observe the structural words (SWs). An SW is defined as a four-amino-acid fragment that shows a pronounced secondary structural (alpha-helix or beta-strand) propensity. It is demonstrated that the deduced correlation between protein and mRNA structure can hardly be explained as a stochastic fluctuation effect.

9.
1 Introduction The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences is explored as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…

10.
Vries JK  Liu X  Bahar I 《Proteins》2007,68(4):830-838
An n-gram pattern (NP{n,m}) in a protein sequence is a set of n residues and m wildcards in a window of size n+m. Each window of n+m amino acids is associated with a collection of NP{n,m} patterns based on the combinatorics of n+m objects taken m at a time. NP{n,m} patterns that are shared between sequences reflect evolutionary relationships. Recently the authors developed an alignment-independent protein classification algorithm based on shared NP{4,2} patterns that compared favorably to PSI-BLAST. Theoretically, NP{4,2} patterns should also reflect secondary structure propensity since they contain all possible n-grams for 1 ≤ n ≤ 4 and a window of 6 residues is wide enough to capture periodicities in the 2 ≤ n ≤ 5 range. This sparked interest in differentiating the information content in NP{4,2} patterns related to evolution from the content related to local propensity. The probability of alpha-, beta-, and coil components was determined for every NP{4,2} pattern over all the chains in the Protein Data Bank (PDB). An algorithm exclusively based on the Z-values of these distributions was developed, which accurately predicted 71-76% of alpha-helical segments and 62-67% of beta-sheets in rigorous jackknife tests. This provided evidence for the strong correlation between NP{4,2} patterns and secondary structure. By grouping PDB chains into subsets with increasing levels of sequence identity, it was also possible to separate the evolutionary and local propensity contributions to the classification process. The results showed that information derived from evolutionary relationships was more important for beta-sheet prediction than alpha-helix prediction.
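The combinatorial definition of NP{n,m} patterns can be illustrated with a short enumeration: for a window of n+m residues, choose every set of m positions to turn into wildcards. The sketch below assumes that any placement of wildcards is allowed (including at the window ends) and uses "*" as the wildcard symbol; both choices, and the function name, are assumptions not confirmed by the abstract.

```python
from itertools import combinations


def ngram_patterns(window, m, wildcard="*"):
    """All patterns obtained by replacing m positions of window with a wildcard."""
    patterns = []
    for positions in combinations(range(len(window)), m):
        chars = list(window)
        for p in positions:
            chars[p] = wildcard
        patterns.append("".join(chars))
    return patterns


# A 6-residue window with m = 2 corresponds to NP{4,2}.
np42 = ngram_patterns("ACDEFG", 2)
```

Under these assumptions a 6-residue window yields C(6,2) = 15 patterns, which is the "n+m objects taken m at a time" count from the abstract.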

11.
The Automated Protein Structure Analysis (APSA) method, which describes the protein backbone as a smooth line in three‐dimensional space and characterizes it by curvature κ and torsion τ as a function of arc length s, was applied to 77 proteins to determine all secondary structural units via specific κ(s) and τ(s) patterns. A total of 533 α‐helices and 644 β‐strands were recognized by APSA, whereas DSSP gives 536 and 651 units, respectively. Kinks and distortions were quantified and the boundaries (entry and exit) of secondary structures were classified. Similarity between proteins can be easily quantified using APSA, as was demonstrated for the roll architecture of the proteins ubiquitin and spinach ferredoxin. A twenty‐by‐twenty comparison of all α domains showed that the curvature‐torsion patterns generated by APSA provide an accurate and meaningful similarity measurement for secondary, super secondary, and tertiary protein structure. APSA is shown to accurately reflect the conformation of the backbone, effectively reducing three‐dimensional structure information to two‐dimensional representations that are easy to interpret and understand. Proteins 2009. © 2008 Wiley‐Liss, Inc.

12.
A thermodynamic model describing formation of α-helices by peptides and proteins in the absence of specific tertiary interactions has been developed. The model combines free energy terms defining α-helix stability in aqueous solution and terms describing immersion of every helix or fragment of coil into a micelle or a nonpolar droplet created by the rest of protein to calculate averaged or lowest energy partitioning of the peptide chain into helical and coil fragments. The α-helix energy in water was calculated with parameters derived from peptide substitution and protein engineering data and using estimates of nonpolar contact areas between side chains. The energy of nonspecific hydrophobic interactions was estimated considering each α-helix or fragment of coil as freely floating in the spherical micelle or droplet, and using water/cyclohexane (for micelles) or adjustable (for proteins) side-chain transfer energies. The model was verified for 96 and 36 peptides studied by 1H-NMR spectroscopy in aqueous solution and in the presence of micelles, respectively ([set 1] and [set 2]) and for 30 mostly α-helical globular proteins ([set 3]). For peptides, the experimental helix locations were identified from the published medium-range nuclear Overhauser effects detected by 1H-NMR spectroscopy. For sets 1, 2, and 3, respectively, 93, 100, and 97% of helices were identified with average errors in calculation of helix boundaries of 1.3, 2.0, and 4.1 residues per helix and an average percentage of correctly calculated helix-coil states of 93, 89, and 81%, respectively. 
Analysis of adjustable parameters of the model (the entropy and enthalpy of the helix-coil transition, the transfer energy of the helix backbone, and parameters of the bound coil), determined by minimization of the average helix boundary deviation for each set of peptides or proteins, demonstrates that, unlike micelles, the interior of the effective protein droplet has solubility characteristics different from those of cyclohexane, does not bind fragments of coil, and lacks interfacial area. © 1997 John Wiley & Sons, Inc. Biopoly 42: 239–269, 1997

13.
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.

14.
Chengcheng Hu  Patrice Koehl 《Proteins》2010,78(7):1736-1747
The three‐dimensional structure of a protein is organized around the packing of its secondary structure elements. Although much is known about the packing geometry observed between α‐helices and between β‐sheets, there has been little progress on characterizing helix–sheet interactions. We present an analysis of the conformation of αβ2 motifs in proteins, corresponding to all occurrences of helices in contact with two strands that are hydrogen bonded. The geometry of the αβ2 motif is characterized by the azimuthal angle θ between the helix axis and an average vector representing the two strands, the elevation angle ψ between the helix axis and the plane containing the two strands, and the distance D between the helix and the strands. We observe that the helix tends to align to the two strands, with a preference for an antiparallel orientation if the two strands are parallel; this preference is diminished for other topologies of the β‐sheet. Side‐chain packing at the interface between the helix and the strands is mostly hydrophobic, with a preference for aliphatic amino acids in the strand and aromatic amino acids in the helix. From the knowledge of the geometry and amino acid propensities of αβ2 motifs in proteins, we have derived different statistical potentials that are shown to be efficient in picking native‐like conformations among a set of non‐native conformations in well‐known decoy datasets. The information on the geometry of αβ2 motifs as well as the related statistical potentials have applications in the field of protein structure prediction. Proteins 2010. © 2010 Wiley‐Liss, Inc.

15.
The secondary structures of proteins (alpha-helical, beta-sheet, beta-turn, and random coil) in the solid state and when bound to polymer beads containing immobilized phenyl and butyl ligands, such as those commonly employed in hydrophobic interaction chromatography, have been investigated using FTIR-ATR spectroscopy and partial least squares (PLS) methods. Proteins with known structural features were used as models, including 12 proteins in the solid state and 7 proteins adsorbed onto the hydrophobic surfaces. A strong PLS correlation was achieved between predictions derived from the experimental data for 4 proteins adsorbed onto the phenyl-modified beads and reference data obtained from the X-ray crystallographic structures, with r² values of 0.9974, 0.9864, 0.9924, and 0.9743 for alpha-helical, beta-sheet, beta-turn, and random coiled structures, respectively. On the other hand, proteins adsorbed onto the butyl sorbent underwent greater secondary structural changes compared to the phenyl sorbent, as evidenced by the poorer PLS r² values (0.9658, 0.9106, 0.9571, and 0.9340). The results thus indicate that the secondary structures for these proteins were more affected by the butyl sorbent, whereas the secondary structure remains relatively unchanged for the proteins adsorbed onto the phenyl sorbent. This study has important ramifications for understanding the nature of protein secondary structural changes following adsorption onto hydrophobic sorbent surfaces. This knowledge could also enable the development of useful protocols for enhancing the chromatographic purification of proteins in their native bioactive states. (c) 2008 Wiley Periodicals, Inc. Biopolymers 89: 895-905, 2008. This article was originally published online as an accepted preprint. The "Published Online" date corresponds to the preprint version. You can request a copy of the preprint by emailing the Biopolymers editorial office at biopolymers@wiley.com.

16.
17.
In this study we present an accurate secondary structure prediction procedure by using a query and related sequences. The most novel aspect of our approach is its reliance on local pairwise alignment of the sequence to be predicted with each related sequence rather than utilization of a multiple alignment. The residue-by-residue accuracy of the method is 75% in three structural states after jack-knife tests. The gain in prediction accuracy compared with the existing techniques, which are at best 72%, is achieved by secondary structure propensities based on both local and long-range effects, utilization of similar sequence information in the form of carefully selected pairwise alignment fragments, and reliance on a large collection of known protein primary structures. The method is especially appropriate for large-scale sequence analysis efforts such as genome characterization, where precise and significant multiple sequence alignments are not available or achievable. Proteins 27:329–335, 1997. © 1997 Wiley-Liss, Inc.

18.
The solution conformations of three peptides corresponding to the two beta-hairpins and the alpha-helix of the protein L B1 domain have been analyzed by circular dichroism (CD) and nuclear magnetic resonance spectroscopy (NMR). In aqueous solution, the three peptides show low populations of native and non-native locally folded structures, but no well-defined hairpin or helix structures are formed. In 30% aqueous trifluoroethanol (TFE), the peptide corresponding to the alpha-helix adopts a highly populated helical conformation three residues longer than in the protein. The hairpin peptides aggregate in TFE, and no significant conformational change occurs in the NMR observable fraction of molecules. These results indicate that the helical peptide has a significant intrinsic tendency to adopt its native structure and that the hairpin sequences seem to be selected as non-helical. This suggests that these sequences favor the structure finally attained in the protein, but the contribution of the local interactions alone is not enough to drive the formation of a detectable population of native secondary structures. This pattern of secondary structure tendencies is different from those observed in two structurally related proteins: ubiquitin and the protein G B1 domain. The only common feature is a certain propensity of the helical segments to form the native structure. These results indicate that for a protein to fold, there is no need for large native-like secondary structure propensities, although a minimum tendency to avoid non-native structures and to favor native ones could be required.

19.
The most popular algorithms employed in the pairwise alignment of protein primary structures (Smith-Waterman (SW) algorithm, FASTA, BLAST, etc.) only analyze the amino acid sequence. The SW algorithm is the most accurate, yielding alignments that agree best with superimpositions of the corresponding spatial structures of proteins. However, even the SW algorithm fails to reproduce the spatial structure alignment when the sequence identity is lower than 30%. The objective of this work was to develop a new and more accurate algorithm taking the secondary structure of proteins into account. The alignments generated by this algorithm and having the maximal weight with the secondary structure considered proved to be more accurate than SW alignments. With sequences having less than 30% identity, the accuracy (i.e., the portion of reproduced positions of a reference alignment obtained by superimposing the protein spatial structures) of the new algorithm is 58% vs. 35% for the SW algorithm. The accuracy of the new algorithm is much the same with secondary structures established experimentally or predicted theoretically. Hence, the algorithm is applicable to proteins with unknown spatial structures. The program is available at ftp://194.149.64.196/STRUSWER/.
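One simple way to picture "taking the secondary structure into account" is a Smith-Waterman recurrence whose substitution score gets an extra term when the aligned residues carry the same secondary structure label. The sketch below does exactly that and returns only the best local score. It is an illustration under stated assumptions, not the algorithm from the abstract: the identity-based scoring, the linear gap penalty, the additive bonus, and every default value are hypothetical.

```python
def local_alignment_score(seq_a, ss_a, seq_b, ss_b,
                          match=2, mismatch=-1, gap=-2, ss_bonus=1):
    """Best Smith-Waterman local alignment score with a secondary-structure bonus.

    seq_a/seq_b are residue strings; ss_a/ss_b are per-residue structure
    labels of the same lengths. All scoring parameters are illustrative.
    """
    if len(seq_a) != len(ss_a) or len(seq_b) != len(ss_b):
        raise ValueError("each sequence needs one structure label per residue")
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best score ending at (i, j)
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if ss_a[i - 1] == ss_b[j - 1]:
                s += ss_bonus  # reward agreement of structure labels
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

Setting ss_bonus to 0 recovers a plain sequence-only local alignment score, which makes it easy to compare the two scoring schemes on the same pair of sequences. Because the labels can come from experiment or from a predictor, the same function applies to proteins without a solved structure, in line with the abstract's closing remark.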

20.
To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprised of 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins.
