Similar articles
1.
To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprising 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantages, especially its sole reliance on amino acid substitutions within structurally related proteins.

2.
A graphic approach to evaluate algorithms of secondary structure prediction (cited by: 3; self-citations: 0; citations by others: 3)
Algorithms for secondary structure prediction have been under development for nearly 30 years. However, the problem of how to evaluate and compare them appropriately has not yet been completely solved. A graphic method for evaluating algorithms of secondary structure prediction is proposed here. Traditionally, the performance of an algorithm is evaluated by a single number, i.e., accuracy under various definitions. Instead of a number, we use a graph to evaluate an algorithm completely, in which the mapping points are distributed in a three-dimensional space. Each point represents the predictive result for the secondary structure of one protein. Because the distribution of mapping points in the 3D space generally contains more information than a number or a set of numbers, it is expected that algorithms may be evaluated and compared more objectively by the proposed graphic method. Based on the point distribution, six evaluation parameters are proposed, which describe the overall performance of the algorithm evaluated. Furthermore, the graphic method is simple and intuitive. As an example of application, two advanced algorithms, i.e., the PHD and NNpredict methods, are evaluated and compared. It is shown that there is still much room for improvement in both algorithms. In particular, the accuracy of predicting the alpha-helix or beta-strand in proteins with higher alpha-helix or beta-strand content, respectively, should be greatly improved for both algorithms.

3.
Machine learning approach for the prediction of protein secondary structure (cited by: 8; self-citations: 0; citations by others: 8)
PROMIS (protein machine induction system), a program for machine learning, was used to generalize rules that characterize the relationship between primary and secondary structure in globular proteins. These rules can be used to predict an unknown secondary structure from a known primary structure. The symbolic induction method used by PROMIS was specifically designed to produce rules that are meaningful in terms of chemical properties of the residues. The rules found were compared with existing knowledge of protein structure: some features of the rules were already recognized (e.g. amphipathic nature of alpha-helices). Other features are not understood, and are under investigation. The rules produced a prediction accuracy for three states (alpha-helix, beta-strand and coil) of 60% for all proteins, 73% for proteins of known alpha domain type, 62% for proteins of known beta domain type and 59% for proteins of known alpha/beta domain type. We conclude that machine learning is a useful tool in the examination of the large databases generated in molecular biology.

4.
Matsuo K, Watanabe H, Gekko K. Proteins, 2008, 73(1): 104-112
Synchrotron-radiation vacuum-ultraviolet circular dichroism (VUVCD) spectroscopy can significantly improve the predictive accuracy of the contents and segment numbers of protein secondary structures by extending the short-wavelength limit of the spectra. In the present study, we combined VUVCD spectra down to 160 nm with a neural-network (NN) method to improve the sequence-based prediction of protein secondary structures. The secondary structures of 30 target proteins (test set) were assigned into alpha-helices, beta-strands, and others by the DSSP program based on their X-ray crystal structures. Combining the alpha-helix and beta-strand contents estimated from the VUVCD spectra of the target proteins improved the overall sequence-based predictive accuracy Q(3) for three secondary-structure components from 59.5 to 60.7%. Incorporating the position-specific scoring matrix in the NN method improved the predictive accuracy from 70.9 to 72.1% when combining the secondary-structure contents, to 72.5% when combining the numbers of segments, and finally to 74.9% when filtering the VUVCD data. Improvement in the sequence-based prediction of secondary structures was also apparent in two other indices of the overall performance: the correlation coefficient (C) and the segment overlap value (SOV). These results suggest that VUVCD data could enhance the predictive accuracy to over 80% when combined with the currently best sequence-prediction algorithms, greatly expanding the applicability of VUVCD spectroscopy to protein structural biology.

5.
In this paper we present a novel approach to membrane protein secondary structure prediction based on statistical stepwise discriminant analysis. A new aspect of our approach is the possibility of deriving physicochemical properties that may affect the formation of membrane protein secondary structure. Certain physicochemical properties of protein chains can be used to clarify the formation of the secondary structure types under consideration. Another aspect of our approach is that the method does not use the results of multiple sequence alignment or of any other kind of sequence alignment. Using our approach, we predicted the formation of three main secondary structure types (alpha-helix, beta-structure and coil) with high accuracy, namely Q(3) = 76%. When predicting alpha-helix versus non-alpha-helix states, we reached an accuracy of Q(2) = 86%. We also identified certain protein chain properties that affect the formation of membrane protein secondary structure. These properties include the hydrophobicity of amino acid residues, the presence of Gly, Ala and Val, and the location of the protein chain end.

6.
Secondary structure prediction: combination of three different methods (cited by: 13; self-citations: 0; citations by others: 13)
A combination of three complementary secondary structure prediction methods is presented. The methods used are the GOR III method, the Homologue method and a new method, the bit pattern method, which is based on hydrophilic/hydrophobic residue patterns. For this purpose a hydropathy scale was developed and is presented here. The combination algorithm (Combine method) was designed to take the best results of each method and use their differences in order to improve the prediction. The combination yields 65.5% correctly predicted residues in three states: alpha-helix (H), beta-strand (E) and aperiodic structure (C) which is an improvement ranging from 2.5 to 6.5% compared with the individual methods when tested with a 67-polypeptide chain database. Seventy-five per cent of the regular secondary structure (H and E) runs are correctly located and beta-sheet runs are much better located by the Combine method in comparison to the other methods.
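
The abstract does not spell out the rules of the Combine method, so the following is only a minimal sketch of a simpler baseline: per-residue majority voting over three predicted state strings. It is not the Combine algorithm itself. The function name `majority_vote`, the `fallback_index` parameter, and the tie-breaking rule (fall back to one designated predictor when no state has a strict majority) are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions: list[str], fallback_index: int = 0) -> str:
    """Combine equal-length state strings (H, E, C) by per-residue majority.

    Assumption for illustration: when no state has a strict majority at a
    position, the prediction of predictions[fallback_index] is used.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for column in zip(*predictions):
        state, count = Counter(column).most_common(1)[0]
        if count * 2 > len(column):      # strict majority of the predictors
            consensus.append(state)
        else:                            # no majority: defer to the fallback
            consensus.append(column[fallback_index])
    return "".join(consensus)


# Three hypothetical predictions for a four-residue fragment.
print(majority_vote(["HHEC", "HCEE", "CCEH"]))
```

A learned or rule-based combiner can use the disagreements between methods, which plain voting ignores; that is the kind of gain the abstract attributes to the Combine method.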

7.
About 200 mRNA sequences of Escherichia coli and human with matching protein secondary structure data were studied. The mRNA folding for each native sequence and for corresponding randomized sequences was calculated through free energy minimization. We have found that the folding energy of mRNA segments in different protein secondary structures is significantly different. The average Z score is more negative for regular secondary structure (alpha-helix and beta-strand) than that for coil. This suggests that the codon choice in native mRNA sequence coding for protein regular structure contributes more to the mRNA folding stability.
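
As a rough illustration of the Z score mentioned in this abstract, the sketch below assumes the common definition: the native folding energy minus the mean energy of the randomized sequences, divided by their standard deviation. The abstract does not state the exact formula, the folding step itself is not shown, and the function name and energy values are hypothetical.

```python
from statistics import mean, stdev


def folding_z_score(native_energy: float, randomized_energies: list[float]) -> float:
    """Z score of a native folding energy against randomized controls.

    Assumed definition: (E_native - mean(E_random)) / sample_std(E_random).
    A negative value means the native sequence folds more stably than its
    randomized counterparts.
    """
    if len(randomized_energies) < 2:
        raise ValueError("need at least two randomized energies")
    spread = stdev(randomized_energies)
    if spread == 0:
        raise ValueError("randomized energies have zero variance")
    return (native_energy - mean(randomized_energies)) / spread


# Made-up energies in kcal/mol, for illustration only.
print(folding_z_score(-16.0, [-10.0, -12.0, -14.0]))
```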

8.
Improving the prediction of secondary structure of 'TIM-barrel' enzymes. (cited by: 1; self-citations: 0; citations by others: 1)
The information contained in aligned sets of homologous protein sequences should improve the score of secondary structure prediction. Seven different enzymes having the (beta/alpha)8 or TIM-barrel fold were used to optimize the prediction with regard to this class of enzymes. The alpha-helix, beta-strand and loop propensities of the Garnier-Osguthorpe-Robson method were averaged at aligned residue positions, leading to a significant improvement over the average score obtained from single sequences. The increased accuracy correlates with the average sequence variability of the aligned set. Further improvements were obtained by using the following averaged properties as weights for the averaged state propensities: amphipathic moment and alpha-helix; hydropathy and beta-strand; chain flexibility and loop. The clustering of conserved residues at the C-terminal ends of the beta-strands was used as an additional positive weight for beta-strand propensity and increased the prediction of otherwise unpredicted beta-strands decisively. The automatic weighted prediction method identifies greater than 95% of the secondary structure elements of the set of seven TIM-barrel enzymes.
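
The averaging step described in this abstract can be sketched as follows. The code assumes that per-residue state scores (H, E, L) have already been computed for each sequence by some single-sequence method; it does not reproduce the Garnier-Osguthorpe-Robson calculation or the additional weights. The names `average_aligned` and `predict_from_alignment`, the data layout, and the use of `None` for gaps are assumptions for illustration.

```python
from typing import Optional

Scores = dict[str, float]  # hypothetical layout, e.g. {"H": 0.4, "E": -0.1, "L": 0.2}


def average_aligned(columns: list[list[Optional[Scores]]]) -> list[Scores]:
    """Average state scores over the sequences present in each alignment column.

    columns[i][j] holds the scores of sequence j at column i, or None for a gap.
    """
    averaged = []
    for column in columns:
        present = [scores for scores in column if scores is not None]
        if not present:
            raise ValueError("alignment column contains only gaps")
        averaged.append({
            state: sum(scores[state] for scores in present) / len(present)
            for state in present[0]
        })
    return averaged


def predict_from_alignment(columns: list[list[Optional[Scores]]]) -> str:
    """Assign each column the state with the highest averaged score."""
    return "".join(max(avg, key=avg.get) for avg in average_aligned(columns))


# Two aligned sequences, two columns; the first sequence has a gap in column 2.
example = [
    [{"H": 1.0, "E": 0.0, "L": 0.0}, {"H": 0.0, "E": 0.5, "L": 0.0}],
    [None, {"H": 0.0, "E": 0.2, "L": 0.9}],
]
print(predict_from_alignment(example))
```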

9.
For a long time, NMR chemical shifts have been used to identify protein secondary structures. Currently, this is accomplished through comparing the observed ¹Hα, ¹³Cα, ¹³Cβ, or ¹³C′ chemical shifts with the random coil values. Here, we present a new protocol, which is based on the joint probability of each of the three secondary structural types (beta-strand, alpha-helix, and random coil) derived from chemical-shift data, to identify the secondary structure. In combination with empirical smooth filters/functions, this protocol shows significant improvements in the accuracy and the confidence of identification. Updated chemical-shift statistics are reported, on the basis of which the reliability of using chemical shift to identify protein secondary structure is evaluated for each nucleus. The reliability varies greatly among the 20 amino acids, but, on average, is in the order of: ¹³Cα > ¹³C′ > ¹Hα > ¹³Cβ > ¹⁵N > ¹HN to distinguish an alpha-helix from a random coil; and ¹Hα > ¹³Cβ > ¹HN ≈ ¹³Cα ≈ ¹³C′ ≈ ¹⁵N for a beta-strand from a random coil. Amide ¹⁵N and ¹HN chemical shifts, which are generally excluded from the application, in fact, were found to be helpful in distinguishing a beta-strand from a random coil. In addition, the chemical-shift statistical data are compared with those reported previously, and the results are discussed. A JAVA User Interface program has been developed to make the entire procedure fully automated and is available via http://ccsr3150-p3.stanford.edu.

10.
As more and more protein structures are determined, it has become clear that there is only a limited number of protein folds in nature. To explore whether the protein folds found in nature are the only solutions to the protein folding problem, or whether a lack of evolutionary pressure causes the paucity of different protein folds found, we set out to construct protein libraries without any restriction on topology. We generated different libraries (all alpha-helix, all beta-strand and alpha-helix plus beta-strand) with an average length of 100 amino acid residues, composed of designed secondary structure modules (alpha-helix, beta-strand and beta-turn) in various proportions, based primarily on the patterning of polar and non-polar residues. From the analysis of proteins chosen randomly from the libraries, we found that a substantial portion of pure alpha-helical proteins show properties similar to native proteins. Using these libraries as a starting point, we aim to establish a selection system which allows us to enrich proteins with favorable folding properties (non-aggregating, compactly folded) from the libraries. We have developed such a method based on ribosome display. This selection is based on two concepts: (1) misfolded proteins are more sensitive to proteolysis, (2) misfolded and/or aggregated proteins are more hydrophobic. We show that by applying each of these selection criteria, proteins that are compactly folded and soluble can be enriched over insoluble and random coil proteins.

11.
Hybrid system for protein secondary structure prediction. (cited by: 13; self-citations: 0; citations by others: 13)
We have developed a hybrid system to predict the secondary structures (alpha-helix, beta-sheet and coil) of proteins and achieved 66.4% accuracy, with correlation coefficients of C(coil) = 0.429, C(alpha) = 0.470 and C(beta) = 0.387. This system contains three subsystems ("experts"): a neural network module, a statistical module and a memory-based reasoning module. First, the three experts independently learn the mapping between amino acid sequences and secondary structures from the known protein structures, then a Combiner learns to combine automatically the outputs of the experts to make final predictions. The hybrid system was tested with 107 protein structures through k-way cross-validation. Its performance was better than each expert and all previously reported methods with greater than 0.99 statistical significance. It was observed that for 20% of the residues, all three experts produced the same but wrong predictions. This may suggest an upper bound on the accuracy of secondary structure predictions based on local information from the currently available protein structures, and indicate places where non-local interactions may play a dominant role in conformation. For 64% of the residues, at least two experts were the same and correct, which shows that the Combiner performed better than majority vote. For 77% of the residues, at least one expert was correct, thus there may still be room for improvement in this hybrid approach. Rigorous evaluation procedures were used in testing the hybrid system, and statistical significance measures were developed in analyzing the differences among different methods. When measured in terms of the number of secondary structures (rather than the number of residues) that were predicted correctly, the prediction produced by the hybrid system was also better than those of individual experts.
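
The two kinds of figures quoted in this abstract, overall accuracy and per-state correlation coefficients, can be illustrated with a minimal sketch. It assumes the usual definitions: three-state accuracy (Q3) as the fraction of residues with a correctly predicted state, and a Matthews-style correlation coefficient for one state against the rest. The abstract does not say which formulas the authors used, and the function names and example strings are hypothetical.

```python
import math


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose state (H, E or C) is predicted correctly."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def state_correlation(observed: str, predicted: str, state: str) -> float:
    """Matthews correlation coefficient for one state versus all others."""
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if o == state and p == state:
            tp += 1          # state observed and predicted
        elif o != state and p != state:
            tn += 1          # state neither observed nor predicted
        elif p == state:
            fp += 1          # predicted but not observed
        else:
            fn += 1          # observed but not predicted
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


observed = "HHHHCCEEEE"
predicted = "HHHCCCEEEC"
print(q3(observed, predicted))
print(state_correlation(observed, predicted, "H"))
```

Reporting a coefficient per state alongside Q3 guards against a predictor that scores well overall simply by favoring the most frequent state.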

12.
Comparison of the primary structures of pancreatic colipases from man, pig, horse and rat shows a high degree of homology between proteins. Fifty-two out of the 95 residues of the polypeptide are identical. All colipases contain 10 half-cystines which are located at invariant positions. The secondary structure of colipases has been predicted from the sequence using the statistical method of Chou and Fasman and the method of Gibrat, Garnier and Robson based on information theory. Predictions indicate that colipases have a low content of alpha-helix and beta-strand structure. The two segments at positions 7-10 and 56-59, assumed to be part of the lipid binding domain, have predicted beta-sheet conformation and should be in close spatial vicinity to each other in the proteins. Four beta-turns are predicted in all colipases at positions 3-6, 46-49, 61-64, and 81-84. They might contribute, with the five disulfide bridges, to a tight packing of the protein molecule. Surface residues and major sequential antigenic determinants of mammalian colipases have been predicted using methods based either on hydrophilicity/hydropathy scales or amino acid mutability. From these studies, it appears that colipases exhibit large conformational homologies. In the absence of data on the tertiary structure of colipase, predictive methods, together with physico-chemical and immunological studies, provide valuable information on the conformation of the protein in relation to the topology of residues involved in the functional and antigenic sites.

13.
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit these data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted alpha-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.

14.
A simple approach to estimate the number of alpha-helical and beta-strand segments from protein circular dichroism spectra is described. The alpha-helix and beta-sheet conformations in globular protein structures, assigned by DSSP and STRIDE algorithms, were divided into regular and distorted fractions by considering a certain number of terminal residues in a given alpha-helix or beta-strand segment to be distorted. The resulting secondary structure fractions for 29 reference proteins were used in the analyses of circular dichroism spectra by the SELCON method. From the performance indices of the analyses, we determined that, on average, four residues per alpha-helix and two residues per beta-strand may be considered distorted in proteins. The number of alpha-helical and beta-strand segments and their average length in a given protein were estimated from the fraction of distorted alpha-helix and beta-strand conformations determined from the analysis of circular dichroism spectra. The statistical test for the reference protein set shows the high reliability of such a classification of protein secondary structure. The method was used to analyze the circular dichroism spectra of four additional proteins and the predicted structural characteristics agree with the crystal structure data.
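
The arithmetic implied by this abstract can be sketched as follows, under the assumption that each segment contributes a fixed number of distorted residues (four per alpha-helix, two per beta-strand, as stated above). The number of segments is then the count of distorted residues divided by that number, and the average length is the total count of residues in that conformation divided by the number of segments. The function name and the example fractions are hypothetical, and the abstract does not give the authors' exact formula.

```python
def estimate_segments(
    distorted_fraction: float,
    total_fraction: float,
    n_residues: int,
    distorted_per_segment: int,
) -> tuple[float, float]:
    """Return (number of segments, average segment length).

    Assumes every segment contributes exactly `distorted_per_segment`
    distorted residues; fractions are relative to the whole protein.
    """
    n_segments = distorted_fraction * n_residues / distorted_per_segment
    if n_segments == 0:
        return 0.0, 0.0
    average_length = total_fraction * n_residues / n_segments
    return n_segments, average_length


# Hypothetical 200-residue protein: 30% helix in total, 10% distorted helix.
helices, helix_length = estimate_segments(0.10, 0.30, 200, distorted_per_segment=4)
print(helices, helix_length)
```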

15.
Only a minute fraction of all possible protein sequences can exist in the genomes of all life forms. To explore whether physicochemical constraints or a lack of need causes the paucity of different protein folds, we set out to construct protein libraries without any restriction of topology. We generated different libraries (all alpha-helix, all beta-strand, and alpha-helix plus beta-strand) with an average length of 100 amino acid residues, composed of designed secondary structure modules (alpha-helix, beta-strand, and beta-turn) in various proportions, based primarily on the patterning of polar and nonpolar residues. We wished to explore that part of sequence space that is rich in secondary structure. The analysis of randomly chosen clones from each of the libraries showed that, despite the low sequence homology to known protein sequences, a substantial proportion of the library members containing alpha-helix modules were indeed helical, possessed a defined oligomerization state, and showed cooperative chemical unfolding behavior. On the other hand, proteins composed of mainly beta-strand modules tended to form amyloid-like fibrils and were among the least soluble proteins ever reported. We found that a large fraction of members in non-beta-strand-containing protein libraries that are distant from natural proteins in sequence space possess unexpectedly favorable properties. These results reinforce the efficacy of applying binary patterning to design proteins with native-like properties despite lack of restriction in topology. Because of the intrinsic tendency of beta-strand modules to aggregate, their presence requires precise topologic arrangement to prevent fibril formation.

16.
The structure of PsbQ, one of the three main extrinsic proteins associated with the oxygen-evolving complex (OEC) of higher plants and green algae, is examined by Fourier transform infrared (FTIR) and circular dichroic (CD) spectroscopy and by computational structural prediction methods. This protein, together with two other lumenally bound extrinsic proteins, PsbO and PsbP, is essential for the stability and full activity of the OEC in plants. The FTIR spectra obtained in both H2O and D2O suggest a mainly alpha-helix structure on the basis of the relative areas of the constituents of the amide I and I' bands. The FTIR quantitative analyses indicate that PsbQ contains about 53% alpha-helix, 7% turns, 14% nonordered structure, and 24% beta-strand plus other beta-type extended structures. CD analyses indicate that PsbQ is a mainly alpha-helix protein (about 64%), presenting a small percentage assigned to beta-strand (approximately 7%) and a larger amount assigned to turns and nonregular structures (approximately 29%). Independent of the spectroscopic analyses, computational methods for protein structure prediction of PsbQ were utilized. First, a multiple alignment of 12 sequences of PsbQ was obtained after an extensive search in the public databases for protein and EST sequences. Based on this alignment, computational prediction of the secondary structure and the solvent accessibility suggest the presence of two different structural domains in PsbQ: a major C-terminal domain containing four alpha-helices and a minor N-terminal domain with a poorly defined secondary structure enriched in proline and glycine residues. The search for PsbQ analogues by fold recognition methods, not based on the secondary structure, also indicates that PsbQ is a four alpha-helix protein, most probably folding as an up-down bundle. The results obtained by both the spectroscopic and computational methods are in agreement, all indicating that PsbQ is mainly an alpha protein, and show the value of using both methodologies for protein structure investigation.

17.
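We independently developed a method for predicting protein secondary structure with artificial neural networks. By analyzing a distribution matrix that we propose (a matrix expressing the likelihood that each conformational class is predicted as each of the classes), we examined the errors of this method and their possible causes in greater depth than previous work. On this basis we propose a modified learning method, which improves both the prediction accuracy and the correlation coefficients for regular secondary structures (alpha-helix and beta-sheet).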
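
The distribution matrix described in this abstract reads like a row-normalized confusion matrix: for each observed class, the fraction of its residues predicted as each class. The sketch below is written under that assumption; the abstract gives no formula, and the function name and example strings are hypothetical.

```python
def distribution_matrix(
    observed: str, predicted: str, states: str = "HEC"
) -> dict[str, dict[str, float]]:
    """Row-normalized confusion matrix.

    matrix[o][p] is the fraction of residues observed in state o that were
    predicted as state p; rows with no observed residues are all zeros.
    """
    if len(observed) != len(predicted):
        raise ValueError("sequences must have equal length")
    counts = {o: {p: 0 for p in states} for o in states}
    for o, p in zip(observed, predicted):
        counts[o][p] += 1
    matrix = {}
    for o in states:
        total = sum(counts[o].values())
        matrix[o] = {p: (counts[o][p] / total if total else 0.0) for p in states}
    return matrix


matrix = distribution_matrix("HHHHEECC", "HHHCEHCC")
for state, row in matrix.items():
    print(state, row)
```

The off-diagonal entries show which classes are confused with which, information that a single accuracy figure hides.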

18.
Secondary structure prediction with support vector machines (cited by: 8; self-citations: 0; citations by others: 8)
MOTIVATION: A new method that uses support vector machines (SVMs) to predict protein secondary structure is described and evaluated. The study is designed to develop a reliable prediction method using an alternative technique and to investigate the applicability of SVMs to this type of bioinformatics problem. METHODS: Binary SVMs are trained to discriminate between two structural classes. The binary classifiers are combined in several ways to predict multi-class secondary structure. RESULTS: The average three-state prediction accuracy per protein (Q(3)) is estimated by cross-validation to be 77.07 ± 0.26% with a segment overlap (Sov) score of 73.32 ± 0.39%. The SVM performs similarly to the 'state-of-the-art' PSIPRED prediction method on a non-homologous test set of 121 proteins despite being trained on substantially fewer examples. A simple consensus of the SVM, PSIPRED and PROFsec achieves significantly higher prediction accuracy than the individual methods.
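
The abstract says the binary classifiers are combined "in several ways" without listing them. One common scheme, shown here only as an illustrative sketch and not necessarily one the authors used, is one-versus-rest: each residue receives the state whose binary classifier returns the largest signed decision value. The SVMs themselves are not trained here; the decision values, the data layout, and the function name are hypothetical.

```python
def combine_one_vs_rest(decision_values: list[dict[str, float]]) -> str:
    """Pick, per residue, the state with the largest one-versus-rest output.

    decision_values[i][s] is the signed output of the hypothetical
    "state s versus the rest" classifier at residue i.
    """
    return "".join(max(scores, key=scores.get) for scores in decision_values)


# Made-up decision values for three residues.
values = [
    {"H": 1.2, "E": -0.3, "C": 0.1},
    {"H": -0.8, "E": -0.2, "C": -0.1},
    {"H": -1.0, "E": 0.7, "C": 0.2},
]
print(combine_one_vs_rest(values))
```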

19.
The analysis of protein structure using secondary structure line segments has been widely used in many structure analysis and prediction methods over the past 20 years. Its use in methods that compare protein structures at this level of representation is becoming more important as an increasing number of protein structures are determined through structural genomic programmes. The standard method used to define line segments is to fit an axis through each secondary structure element. This approach has difficulties, however, both with inconsistent definitions of secondary structure and the problem of fitting a single straight line to a bent structure. The procedure described here avoids these problems by finding a set of line segments independently of any external secondary structure definition. This allows the segments to be used as a novel basis for secondary structure definition by taking the average rise/residue along each axis to characterise the segment. This practice has the advantage that secondary structures are described by a single (continuous) value that is not restricted to the conventional classes of alpha-helix, 3(10)-helix and beta-strand. This latter property allows structures without "classic" secondary structures to be encoded as line segments that can be used in comparison algorithms. When compared over a large number of pairs of homologous proteins, the current method was found to be slightly more consistent than a widely used method based on hydrogen bonds.

20.
Amino acid propensities for secondary structures have been used since the 1970s, when Chou and Fasman evaluated them within datasets of a few tens of proteins and developed a method to predict the secondary structure of proteins that is still in use, even though prediction methods have since evolved toward very different approaches and higher reliability. Propensity for secondary structures is an intrinsic property of an amino acid, and it is used in developing new algorithms and prediction methods. Our work therefore aimed to investigate which kind of protein dataset is best for evaluating amino acid propensities: larger but non-homogeneous sets, or smaller but homogeneous sets, i.e., all-alpha, all-beta, and alpha-beta proteins. As a first analysis, we evaluated amino acid propensities for helix, beta-strand, and coil in more than 2000 proteins from the PDBselect dataset. With these propensities, secondary structure predictions performed with a method very similar to that of Chou and Fasman gave better results than the original method, which was based on propensities derived from the few tens of X-ray protein structures available in the 1970s. In a refined analysis, we subdivided the PDBselect dataset into three secondary structural classes, i.e., all-alpha, all-beta, and alpha-beta proteins. For each class, the amino acid propensities for helix, beta-strand, and coil were calculated and used to predict secondary structure elements for proteins belonging to the same class by using resubstitution and jackknife tests. This second round of predictions further improved the results of the first round. Therefore, amino acid propensities for secondary structures become more reliable as the homogeneity of the protein dataset used to evaluate them increases. Our results also indicate that all algorithms using propensities for secondary structure can still be improved to obtain better predictive results.
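
A propensity of the kind discussed in this abstract is commonly defined as the frequency of an amino acid within one secondary structure state divided by its frequency in the whole dataset, so that values above 1 indicate a preference for that state. The sketch below assumes this definition; the abstract does not give the authors' exact formula, and the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def propensities(sequence: str, structure: str) -> dict[tuple[str, str], float]:
    """Propensity of each amino acid for each state.

    P(aa, state) = (n(aa, state) / n(state)) / (n(aa) / N), where the counts
    are taken over one-letter residues and their per-residue state labels.
    """
    if len(sequence) != len(structure) or not sequence:
        raise ValueError("need two non-empty strings of equal length")
    n_total = len(sequence)
    residue_counts = Counter(sequence)
    state_counts = Counter(structure)
    pair_counts = Counter(zip(sequence, structure))
    return {
        (aa, state): (pair_counts[(aa, state)] / state_counts[state])
        / (residue_counts[aa] / n_total)
        for aa in residue_counts
        for state in state_counts
    }


# Toy data: six residues with their observed states.
table = propensities("AAGGAG", "HHHCCC")
print(table[("A", "H")], table[("G", "H")])
```

Computing a separate table for each structural class (all-alpha, all-beta, alpha-beta) amounts to calling the same function on each subset of the dataset.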
