Similar literature
 Found 20 similar documents; search took 31 ms
1.
We have modified and improved the GOR algorithm for the protein secondary structure prediction by using the evolutionary information provided by multiple sequence alignments, adding triplet statistics, and optimizing various parameters. We have expanded the database used to include the 513 non-redundant domains collected recently by Cuff and Barton (Proteins 1999;34:508-519; Proteins 2000;40:502-511). We have introduced a variable size window that allowed us to include sequences as short as 20-30 residues. A significant improvement over the previous versions of GOR algorithm was obtained by combining the PSI-BLAST multiple sequence alignments with the GOR method. The new algorithm will form the basis for the future GOR V release on an online prediction server. The average accuracy of the prediction of secondary structure with multiple sequence alignment and full jack-knife procedure was 73.5%. The accuracy of the prediction increases to 74.2% by limiting the prediction to 375 (of 513) sequences having at least 50 PSI-BLAST alignments. The average accuracy of the prediction of the new improved program without using multiple sequence alignments was 67.5%. This is approximately a 3% improvement over the preceding GOR IV algorithm (Garnier J, Gibrat JF, Robson B. Methods Enzymol 1996;266:540-553; Kloczkowski A, Ting K-L, Jernigan RL, Garnier J. Polymer 2002;43:441-449). We have discussed alternatives to the segment overlap (Sov) coefficient proposed by Zemla et al. (Proteins 1999;34:220-223).

2.
A procedure to recognize super-secondary structure in protein sequences is described. An idealized template, derived from known super-secondary structures, is used to locate probable sites by matching with secondary structure probability profiles. We applied the method to the identification of βαβ units in β/α type proteins with 75% accuracy. The location of super-secondary structure was then used to refine the original (Garnier et al., 1978) secondary structure prediction resulting in an 8.8% improvement, which correctly assigned 83% of secondary structure elements in 14 proteins. Slight modifications to the Garnier et al. method are suggested, producing a more accurate identification of protein class and a better prediction for β/α type proteins. A method for the incorporation of hydrophobic information into the prediction is also described.

3.
MOTIVATION: How critical is the sequence order information in predicting protein secondary structure segments? We tried to get a rough insight on it from a theoretical approach using both a prediction algorithm and structural fragments from Protein Databank (PDB). RESULTS: Using reverse protein sequences and PDB structural fragments, we theoretically estimated the significance of the order for protein secondary structure and prediction. On average: (1) 79% of protein sequence segments resulted in the same prediction in both normal and reverse directions, which indicated a relatively high conservation of secondary structure propensity in the reverse direction; (2) the reversed sequence prediction alone performed less accurately than the normal forward sequence prediction, but comparably high (2% difference); (3) the commonly predicted regions showed a slightly higher prediction accuracy (4%) than the normal sequences prediction; and (4) structural fragments which have counterparts in reverse direction in the same protein showed a comparable degree of secondary structure conservation (73% identity with reversed structures on average for pentamers). CONTACT: jong@biosophy.org; dietmann@ebi.ac.uk; heger@ebi.ac.uk; holm@ebi.ac.uk
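The forward-versus-reverse comparison in result (1) can be illustrated with a short sketch. This is not the authors' pipeline: `toy_predictor` is a hypothetical, direction-dependent stand-in for a real secondary structure predictor, and its residue sets are arbitrary assumptions chosen only so that reversing the sequence can change the output.

```python
from typing import Callable

HELIX_FORMERS = set("AELMQKRH")   # assumed for illustration only
STRAND_FORMERS = set("VIYFWT")    # assumed for illustration only


def toy_predictor(sequence: str) -> str:
    """Hypothetical predictor: labels residue i from residues i and i+1."""
    labels = []
    for i, residue in enumerate(sequence):
        # The last residue has no successor, so it is paired with itself.
        nxt = sequence[i + 1] if i + 1 < len(sequence) else residue
        if residue in HELIX_FORMERS and nxt in HELIX_FORMERS:
            labels.append("H")
        elif residue in STRAND_FORMERS and nxt in STRAND_FORMERS:
            labels.append("E")
        else:
            labels.append("C")
    return "".join(labels)


def reversal_agreement(sequence: str, predict: Callable[[str], str]) -> float:
    """Fraction of residues given the same label in both reading directions."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    forward = predict(sequence)
    # Predict on the reversed sequence, then flip back to the original order.
    backward = predict(sequence[::-1])[::-1]
    same = sum(f == b for f, b in zip(forward, backward))
    return same / len(sequence)


if __name__ == "__main__":
    print(reversal_agreement("MKALEVIVTGG", toy_predictor))
```

Any predictor with the same string-in, string-out shape could be passed in place of `toy_predictor`.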

4.
Protein structure prediction   Total citations: 4, self-citations: 0, citations by others: 4
J Garnier 《Biochimie》1990,72(8):513-524
Current methods developed for predicting protein structure are reviewed. The most widely used algorithms of Chou and Fasman and Garnier et al. for predicting secondary structure are compared to the most recent ones including sequence similarity methods, neural network, pattern recognition or joint prediction methods. The best of these methods correctly predict 63-65% of the residues in the database with cross-validation for 3 conformations, helix, beta strand and coil, with a standard deviation of 6-8% per protein. However, when a homologous protein is already in the database, the accuracy of prediction by the similarity peptide method of Levin and Garnier reaches about 90%. Some conclusions can be drawn on the mechanism of protein folding. As all the prediction methods only use the local sequence for prediction (+/- 8 residues maximum) one can infer that 65% of the conformation of a residue is dictated on average by the local sequence, the rest is brought by the folding. The best predicted proteins or peptide segments are those for which the folding has less effect on the conformation. Presently, prediction of tertiary structure is only of practical use when the structure of a homologous protein is already known. Amino acid alignment to define residues of equivalent spatial position is critical for modelling of the protein. We showed for serine proteases that secondary structure prediction can help to define a better alignment. For non-homologous segments of the polypeptide chain, such as loops, libraries of known loops and/or energy minimization with various force fields are used, without yet giving satisfactory solutions. An example of modelling by homology, aided by secondary structure prediction on 2 regulatory proteins, Fnr and FixK, is presented.

5.
A method is presented for predicting the secondary structure of globular proteins from their amino acid sequence. It is based on a rigorous statistical exploitation of the well-known biological fact that the amino acid compositions of each secondary structure are different. We also propose an evaluation process that allows us to estimate the capacity of a method to predict the secondary structure of a new protein which does not have any homologous proteins whose structure is already known. This evaluation process shows that our method has a prediction accuracy of 58.7% over three states for the 62 proteins of the Kabsch and Sander (1983a) data bank. This result is better than that obtained by the most widely used methods, those of Lim (1974), Chou and Fasman (1978) and Garnier et al. (1978), and also than that obtained by a recent method based on local homologies (Levin et al., 1986). Our prediction method is very simple and may be implemented on any microcomputer and even on programmable pocket calculators. A simple Pascal implementation of the method's prediction algorithm is given. The interpretation of our results in terms of protein folding and directions for further work are discussed. Received on December 15, 1987; accepted on April 12, 1988

6.
The secondary and tertiary structure of recombinant human acidic fibroblast growth factor (aFGF) has been characterized by a variety of spectroscopic methods. Native aFGF consists of ca. 55% beta-sheet, 20% turn, 10% alpha-helix, and 15% disordered polypeptide as determined by laser Raman, circular dichroism, and Fourier transform infrared spectroscopy; the experimentally determined secondary structure content is in agreement with that calculated by the semi-empirical methods of Chou and Fasman (Chou, P. Y., and Fasman, G. C., 1974, Biochemistry 13, 222-244) and Garnier et al. (Garnier, J. O., et al., 1978, J. Mol. Biol. 120, 97-120). Using the Garnier et al. algorithm, the major secondary structure components of aFGF have been assigned to specific regions of the polypeptide chain. The fluorescence spectrum of native aFGF is unusual in that it is dominated by tyrosine fluorescence despite the presence of a tryptophan residue in the protein. However, tryptophan fluorescence is resolved upon excitation above 295 nm. The degree of tyrosine and tryptophan solvent exposure has been assessed by a combination of ultraviolet absorption, laser Raman, and fluorescence spectroscopy; the results suggest that seven of the eight tyrosine residues are solvent exposed while the single tryptophan is partially inaccessible to solvent in native aFGF, consistent with recent crystallographic data. Denaturation of aFGF by extremes of temperature or pH leads to spectroscopically distinct conformational states in which contributions of tyrosine and tryptophan to the fluorescence spectrum of the protein vary. The protein is unstable at physiological temperatures. Addition of heparin or other sulfated polysaccharides does not affect the spectroscopic characteristics of native aFGF. These polymers do, however, dramatically stabilize the native protein against thermal and acid denaturation as determined by differential scanning calorimetry, circular dichroism, and fluorescence spectroscopy. 
The interaction of aFGF with such polyanions may play a role in controlling the activity of this growth factor in vivo.

7.
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit this data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted alpha-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.

8.
The integrins are α/β heterodimeric proteins which mediate cell-matrix and cell-cell interactions. Current data indicate that the N-terminal moiety of the α subunit is involved in ligand binding. This region of the receptor is made up of a seven-fold repeated sequence of unknown structure which contains EF-hand-like putative divalent cation-binding sites. Recent studies have shown that multiple sequence alignments can be analysed to yield secondary structure predictions. Therefore, to obtain a model structure for the integrin α subunit N-terminal domain repeat, a large alignment of the seven repeats from sixteen integrin sequences was generated. Two methods of analysis were used: First, Chou and Fasman and Garnier, Osguthorpe and Robson predictions were carried out for individual sequences and the consensus predictions derived. Consensus hydrophobicity and chain flexibility data were also used to provide additional data. Second, sites of conservation and variation were analysed by a computer program STAMA (STructure After Multiple Alignment) to yield a secondary structure prediction. The two analyses gave essentially the same predicted structure: undefined region, loop, α-helix, β-strand, divalent cation-binding loop, β-strand, putative turn, loop, β-strand. This is the first model structure to be presented for an integrin domain. Its implications for integrin function are discussed.

9.
10.
The tertiary structure of the alpha-subunit of tryptophan synthase was proposed using a combination of experimental data and computational methods. The vacuum-ultraviolet circular dichroism spectrum was used to assign the protein to the alpha/beta-class of supersecondary structures. The two-domain structure of the alpha-subunit (Miles et al.: Biochemistry 21:2586, 1982; Beasty and Matthews: Biochemistry 24:3547, 1985) eliminated consideration of a barrel structure and focused attention on a beta-sheet structure. An algorithm (Cohen et al.: Biochemistry 22:4894, 1983) was used to generate a secondary structure prediction that was consistent with the sequence data of the alpha-subunit from five species. Three potential secondary structures were then packed into tertiary structures using other algorithms. The assumption of nearest neighbors from second-site revertant data eliminated 97% of the possible tertiary structures; consideration of conserved hydrophobic packing regions on the beta-sheet eliminated all but one structure. The native structure is predicted to have a parallel beta-sheet flanked on both sides by alpha-helices, and is consistent with the available data on chemical cross-linking, chemical modification, and limited proteolysis. In addition, an active site region containing appropriate residues could be identified as well as an interface for beta 2-subunit association. The ability of experimental data to facilitate the prediction of protein structure is discussed.

11.
The sequences of several members of the myosin family of molecular motors are evaluated using ASP (Ambivalent Structure Predictor), a new computational method. ASP predicts structurally ambivalent sequence elements by analyzing the output from a secondary structure prediction algorithm. These ambivalent sequence elements form secondary structures that are hypothesized to function as switches by undergoing conformational rearrangement. For chicken skeletal muscle myosin, 13 discrete structurally ambivalent sequence elements are identified. All 13 are located in the heavy chain motor domain. When these sequence elements are mapped into the myosin tertiary structure, they form two compact regions that connect the actin binding site to the adenosine 5'-triphosphate (ATP) site, and the ATP site to the fulcrum site for the force-producing bending of the motor domain. These regions, predicted by the new algorithm to undergo conformational rearrangements, include the published known and putative switches of the myosin motor domain, and they form plausible allosteric connections between the three main functional sites of myosin. The sequences of several other members of the myosin I and II families are also analyzed.

12.
Protein sequence design is a natural inverse problem to protein structure prediction: given a target structure in three dimensions, we wish to design an amino acid sequence that is likely to fold to it. A model of Sun, Brem, Chan, and Dill casts this problem as an optimization on a space of sequences of hydrophobic (H) and polar (P) monomers; the goal is to find a sequence that achieves a dense hydrophobic core with few solvent-exposed hydrophobic residues. Sun et al. developed a heuristic method to search the space of sequences, without a guarantee of optimality or near-optimality; Hart subsequently raised the computational tractability of constructing an optimal sequence in this model as an open question. Here we resolve this question by providing an efficient algorithm to construct optimal sequences; our algorithm has a polynomial running time, and performs very efficiently in practice. We illustrate the implementation of our method on structures drawn from the Protein Data Bank. We also consider extensions of the model to larger amino acid alphabets, as a way to overcome the limitations of the binary H/P alphabet. We show that for a natural class of arbitrarily large alphabets, it remains possible to design optimal sequences efficiently. Finally, we analyze some of the consequences of this sequence design model for the study of evolutionary fitness landscapes. A given target structure may have many sequences that are optimal in the model of Sun et al.; following a notion raised by the work of J. Maynard Smith, we can ask whether these optimal sequences are "connected" by successive point mutations. We provide a polynomial-time algorithm to decide this connectedness property, relative to a given target structure. We develop the algorithm by first solving an analogous problem expressed in terms of submodular functions, a fundamental object of study in combinatorial optimization.

13.
Computational tools for prediction of the secondary structure of two or more interacting nucleic acid molecules are useful for understanding mechanisms for ribozyme function, determining the affinity of an oligonucleotide primer to its target, and designing good antisense oligonucleotides, novel ribozymes, DNA code words, or nanostructures. Here, we introduce new algorithms for prediction of the minimum free energy pseudoknot-free secondary structure of two or more nucleic acid molecules, and for prediction of alternative low-energy (sub-optimal) secondary structures for two nucleic acid molecules. We provide a comprehensive analysis of our predictions against secondary structures of interacting RNA molecules drawn from the literature. Analysis of our tools on 17 sequences of up to 200 nucleotides that do not form pseudoknots shows that they have 79% accuracy, on average, for the minimum free energy predictions. When the best of 100 sub-optimal foldings is taken, the average accuracy increases to 91%. The accuracy decreases as the sequences increase in length and as the number of pseudoknots and tertiary interactions increases. Our algorithms extend the free energy minimization algorithm of Zuker and Stiegler for secondary structure prediction, and the sub-optimal folding algorithm by Wuchty et al. Implementations of our algorithms are freely available in the package MultiRNAFold.

14.
MOTIVATION: Membrane-bound proteins are a special class of proteins. The regions that insert into the cell-membrane have a profoundly different hydrophobicity pattern compared with soluble proteins. Multiple alignment techniques use scoring schemes tailored for sequences of soluble proteins and are therefore in principle not optimal to align membrane-bound proteins. RESULTS: Transmembrane (TM) regions in protein sequences can be reliably recognized using state-of-the-art sequence prediction techniques. Furthermore, membrane-specific scoring matrices are available. We have developed a new alignment method, called PRALINETM, which integrates these two features to enhance multiple sequence alignment. We tested our algorithm on the TM alignment benchmark set by Bahr et al. (2001), and showed that the quality of TM alignments can be significantly improved compared with the quality produced by a standard multiple alignment technique. The results clearly indicate that the incorporation of these new elements into current state-of-the-art alignment methods is crucial for optimizing the alignment of TM proteins. AVAILABILITY: A webserver is available at http://www.ibi.vu.nl/programs/pralinewww.

15.
We have predicted the secondary structures of four beta-lactamases (Bacillus cereus, Bacillus licheniformis, Staphylococcus aureus, and Escherichia coli R-TEM) by the statistical method of Chou & Fasman as well as by the information theory method of Garnier et al. The secondary structures of all four beta-lactamases are of the alpha/beta type (Levitt & Chothia's nomenclature), with helices at N- and C-termini. There are about eight short regions each of alpha-helical (30-50%) and beta-strand (10-20%) structure separated by about 20 reverse turns. The conformations of the Gram-positive and Gram-negative beta-lactamases are generally similar although a few differences are predicted between the S. aureus and E. coli structures. Surprisingly, the two bacilli structures differ significantly in three short regions. In all four enzymes the region near the catalytically-implicated tyrosine has similar secondary structure. The secondary structure of hen egg white lysozyme, a penicillin-binding enzyme, as well as T4 phage lysozyme, has similarities to the N-terminal half of the penicillin-destroying beta-lactamases.

16.
Gao J  Faraggi E  Zhou Y  Ruan J  Kurgan L 《PloS one》2012,7(6):e40104
Accurate identification of immunogenic regions in a given antigen chain is a difficult and actively pursued problem. Although accurate predictors for T-cell epitopes are already in place, the prediction of the B-cell epitopes requires further research. We overview the available approaches for the prediction of B-cell epitopes and propose a novel and accurate sequence-based solution. Our BEST (B-cell Epitope prediction using Support vector machine Tool) method predicts epitopes from antigen sequences, in contrast to some methods that predict only from short sequence fragments, using a new architecture based on averaging selected scores generated from sliding 20-mers by a Support Vector Machine (SVM). The SVM predictor utilizes a comprehensive and custom designed set of inputs generated by combining information derived from the chain, sequence conservation, similarity to known (training) epitopes, and predicted secondary structure and relative solvent accessibility. Empirical evaluation on benchmark datasets demonstrates that BEST outperforms several modern sequence-based B-cell epitope predictors including ABCPred, the method by Chen et al. (2007), BCPred, COBEpro, BayesB, and CBTOPE, when considering the predictions from antigen chains and from the chain fragments. Our method obtains a cross-validated area under the receiver operating characteristic curve (AUC) for the fragment-based prediction at 0.81 and 0.85, depending on the dataset. The AUCs of BEST on the benchmark sets of full antigen chains equal 0.57 and 0.6, which is significantly and slightly better than the next best method we tested. We also present case studies to contrast the propensity profiles generated by BEST and several other methods.
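The idea of turning scores for sliding 20-mers into per-residue scores can be sketched as follows. This is only an illustration under stated assumptions, not the BEST implementation: the abstract says "selected" window scores are averaged without giving the selection rule, so this sketch simply averages every window that covers a residue, and `toy_window_score` is a hypothetical placeholder for the SVM.

```python
from typing import Callable, List


def toy_window_score(window: str) -> float:
    """Hypothetical stand-in for an SVM score: fraction of charged residues."""
    return sum(residue in "DEKR" for residue in window) / len(window)


def per_residue_scores(
    sequence: str,
    score_window: Callable[[str], float],
    window: int = 20,
) -> List[float]:
    """Average, for each residue, the scores of all windows that cover it."""
    n = len(sequence)
    if window <= 0 or n < window:
        raise ValueError("sequence must be at least one window long")
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - window + 1):
        score = score_window(sequence[start:start + window])
        # Credit this window's score to every position it spans.
        for pos in range(start, start + window):
            totals[pos] += score
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


if __name__ == "__main__":
    chain = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    profile = per_residue_scores(chain, toy_window_score)
    print([round(value, 3) for value in profile])
```

Residues near the chain ends are covered by fewer windows, so their averages rest on fewer scores than those in the middle.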

17.
A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is superior for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu.

18.
A novel algorithm is proposed for predicting transmembrane protein secondary structure from two-dimensional vector trajectories consisting of a hydropathy index and formal charge of a test amino acid sequence using stochastic dynamical system models. Two prediction problems are discussed. One is the prediction of transmembrane region counts; another is that of transmembrane regions, i.e. predicting whether or not each amino acid belongs to a transmembrane region. The prediction accuracies, using a collection of well-characterized transmembrane protein sequences and benchmarking sequences, suggest that the proposed algorithm performs reasonably well. An experiment was performed with a glutamate transporter homologue from Pyrococcus horikoshii. The predicted transmembrane regions of the five human glutamate transporter sequences and observations based on the computed likelihood are reported.
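The input representation described here, a two-dimensional trajectory of hydropathy index and formal charge along the sequence, might be built roughly as below. The abstract does not say which hydropathy scale or charge convention the authors used, so this sketch assumes the Kyte-Doolittle scale and a simple convention (D and E as -1, K and R as +1, everything else, including histidine, as 0). The stochastic dynamical system model itself is not reproduced.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed formal charges; residues not listed are treated as neutral.
FORMAL_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def trajectory(sequence: str) -> List[Tuple[float, int]]:
    """Map each residue to a (hydropathy, formal charge) point."""
    points = []
    for residue in sequence.upper():
        if residue not in KYTE_DOOLITTLE:
            raise ValueError(f"unsupported residue: {residue!r}")
        points.append((KYTE_DOOLITTLE[residue], FORMAL_CHARGE.get(residue, 0)))
    return points


if __name__ == "__main__":
    for point in trajectory("MKLIVAD"):
        print(point)
```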

19.
A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequence, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation are new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of > 80%. Existing high-accuracy prediction methods are "black-box" predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥ 90 residues and < 170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with the PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.
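The per-residue three-state accuracy quoted in this abstract is commonly called Q3: the fraction of residues whose predicted state (helix H, strand E, coil C) matches the observed one. A minimal sketch of that calculation follows; the example strings are invented for illustration and are not data from the paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("structure strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    observed = "CCHHHHHHCCEEEECC"   # made-up reference assignment
    predicted = "CCHHHHCCCCEEEEEC"  # made-up prediction
    print(q3_accuracy(predicted, observed))
```

Averaging this value over proteins and pooling all residues before dividing give different summary figures, so reported accuracies are only comparable when the same convention is used.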

20.
P Ponte  S Y Ng  J Engel  P Gunning    L Kedes 《Nucleic acids research》1984,12(3):1687-1696
We report the complete nucleotide sequence of a human beta actin cDNA. Both the 5' and 3' untranslated regions of the sequence are similar (greater than 80%) to the analogous regions of the rat beta-actin gene reported by Nudel et al (1983). When a segment of the 3' untranslated region is used as a radiolabelled probe, strong hybridization to chick beta actin mRNA is seen. This conservation of sequences suggests that strong selective pressures operate on non-translated segments of beta actin mRNA.
