1.

Easy method to predict solvent accessibility from multiple protein sequence alignments

Stefano Pascarella Roldano De Persio Francesco Bossa Patrick Argos 《Proteins》1998,32(2):190-199

An easy and uncomplicated method to predict the solvent accessibility state of a site in a multiple protein sequence alignment is described. The approach is based on amino acid exchange and compositional preference matrices for each of three accessibility states: buried, exposed, and intermediate. Calculations utilized a modified version of the 3D_ali databank, a collection of multiple sequence alignments anchored through protein tertiary structural superpositions. The technique achieves the same accuracy as much more complex methods and thus provides such advantages as computational affordability, facile updating, and easily understood residue substitution patterns useful to biochemists involved in protein engineering, design, and structural prediction. The program is available from the authors and, due to its simplicity, the algorithm can be readily implemented on any system. For a given alignment site, a hand calculation can yield a comparative prediction. Proteins 32:190–199, 1998. © 1998 Wiley-Liss, Inc.

2.

Predicting protein secondary structure and solvent accessibility with an improved multiple linear regression method

Qin S He Y Pan XM 《Proteins》2005,61(3):473-480

We have improved the multiple linear regression (MLR) algorithm for protein secondary structure prediction by combining it with the evolutionary information provided by multiple sequence alignment of PSI-BLAST. On the CB513 dataset, the three-state average overall per-residue accuracy, Q(3), reached 76.4%, while segment overlap accuracy, SOV99, reached 73.2%, using a rigorous jackknife procedure and the strictest reduction of the eight-state DSSP definition to three states. This represents an improvement of approximately 5% on overall per-residue accuracy compared with previous work. The relative solvent accessibility prediction also benefited from this combination of methods. The system achieved 77.7% average jackknifed accuracy for two-state prediction based on a 25% relative solvent accessibility mode, with a Matthews correlation coefficient of 0.548. The improved MLR secondary structure and relative solvent accessibility prediction server is available at http://spg.biosci.tsinghua.edu.cn/.
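The two-state accuracy and Matthews correlation coefficient quoted in this abstract can be computed from a confusion matrix. The following is a minimal sketch, not the authors' evaluation code; the function name `two_state_metrics` and the toy labels are assumptions made purely for illustration.

```python
import math

def two_state_metrics(observed, predicted):
    """Return (accuracy, MCC) for paired lists of booleans (True = exposed)."""
    pairs = list(zip(observed, predicted))
    tp = sum(o and p for o, p in pairs)              # exposed, predicted exposed
    tn = sum((not o) and (not p) for o, p in pairs)  # buried, predicted buried
    fp = sum((not o) and p for o, p in pairs)        # buried, predicted exposed
    fn = sum(o and (not p) for o, p in pairs)        # exposed, predicted buried
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the confusion matrix is empty;
    # returning 0.0 in that case is a convention chosen for this sketch.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Hypothetical toy labels, not data from the paper.
observed = [True, True, False, False, True, False]
predicted = [True, False, False, False, True, True]
accuracy, mcc = two_state_metrics(observed, predicted)
print(f"accuracy={accuracy:.3f}  MCC={mcc:.3f}")
```

Unlike plain accuracy, the MCC stays near zero for a predictor that always outputs the majority class, which is presumably why both figures are reported.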

3.

Real value prediction of solvent accessibility in proteins using multiple sequence alignment and secondary structure

Garg A Kaur H Raghava GP 《Proteins》2005,61(2):318-324

The present study is an attempt to develop a neural network-based method for predicting the real value of solvent accessibility from the sequence using evolutionary information in the form of multiple sequence alignment. In this method, two feed-forward networks with a single hidden layer have been trained with standard back-propagation as a learning algorithm. The Pearson's correlation coefficient increases from 0.53 to 0.63, and mean absolute error decreases from 18.2% to 16% when multiple-sequence alignment obtained from PSI-BLAST is used as input instead of a single sequence. The performance of the method further improves from a correlation coefficient of 0.63 to 0.67 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields a mean absolute error value of 15.2% between the experimental and predicted values, when tested on two different nonhomologous and nonredundant datasets of varying sizes. The method consists of two steps: (1) in the first step, a sequence-to-structure network is trained with the multiple alignment profiles in the form of PSI-BLAST-generated position-specific scoring matrices, and (2) in the second step, the output obtained from the first network and PSIPRED-predicted secondary structure information is used as an input to the second structure-to-structure network. Based on the present study, a server SARpred (http://www.imtech.res.in/raghava/sarpred/) has been developed that predicts the real value of solvent accessibility of residues for a given protein sequence. We have also evaluated the performance of SARpred on 47 proteins used in CASP6 and achieved a correlation coefficient of 0.68 and a MAE of 15.9% between predicted and observed values.
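The two evaluation measures used in this abstract, Pearson's correlation coefficient and mean absolute error (MAE), can be sketched as follows. This is an illustrative implementation only; the function names and the toy accessibility values are assumptions and are not taken from SARpred.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def mean_absolute_error(x, y):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical relative solvent accessibility values in percent.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r = pearson_r(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(f"r={r:.3f}  MAE={mae:.1f}%")
```

The two measures are complementary: the correlation ignores any constant offset between predicted and observed values, whereas the MAE penalizes it directly.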

4.

Transmembrane helices predicted at 95% accuracy. (Total citations: 27; self-citations: 1; citations by others: 27)

B. Rost R. Casadio P. Fariselli C. Sander 《Protein science : a publication of the Protein Society》1995,4(3):521-533

We describe a neural network system that predicts the locations of transmembrane helices in integral membrane proteins. By using evolutionary information as input to the network system, the method significantly improved on a previously published neural network prediction method that had been based on single sequence information. The input data were derived from multiple alignments for each position in a window of 13 adjacent residues: amino acid frequency, conservation weights, number of insertions and deletions, and position of the window with respect to the ends of the protein chain. Additional input was the amino acid composition and length of the whole protein. A rigorous cross-validation test on 69 proteins with experimentally determined locations of transmembrane segments yielded an overall two-state per-residue accuracy of 95%. About 94% of all segments were predicted correctly. When applied to known globular proteins as a negative control, the network system incorrectly predicted fewer than 5% of globular proteins as having transmembrane helices. The method was applied to all 269 open reading frames from the complete yeast VIII chromosome. For 59 of these, at least two transmembrane helices were predicted. Thus, the prediction is that about one-fourth of all proteins from yeast VIII contain one transmembrane helix, and some 20%, more than one.
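The windowed alignment input described in this abstract can be illustrated with a small sketch that covers only the amino acid frequency part of the input. It assumes a window centered on the residue of interest and zero-padding beyond the chain ends; both choices, along with the function names and the toy alignment, are assumptions for illustration and not details confirmed by the abstract.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 13  # window width stated in the abstract

def column_frequencies(alignment):
    """Per-column amino acid frequencies for equal-length aligned sequences."""
    profile = []
    for column in zip(*alignment):
        # Gap characters and unknown symbols are ignored when counting.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile

def window_features(profile, center):
    """Concatenate the frequency vectors of WINDOW columns around `center`."""
    half = WINDOW // 2
    blank = [0.0] * len(AMINO_ACIDS)  # padding for positions outside the chain
    features = []
    for pos in range(center - half, center + half + 1):
        features.extend(profile[pos] if 0 <= pos < len(profile) else blank)
    return features

# Hypothetical three-sequence alignment.
alignment = ["MKTLLVA", "MRTLIVA", "MKSLLV-"]
profile = column_frequencies(alignment)
features = window_features(profile, center=3)
print(len(features))  # 13 positions x 20 amino acids
```

The conservation weights, insertion and deletion counts, and global composition terms mentioned in the abstract would be appended to this vector in a fuller implementation.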

5.

Rapid protein domain assignment from amino acid sequence using predicted secondary structure (Total citations: 8; self-citations: 0; citations by others: 8)

Marsden RL McGuffin LJ Jones DT 《Protein science : a publication of the Protein Society》2002,11(12):2814-2824

The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.

6.

Predicting protein flexibility through the prediction of local structures

Bornot A Etchebest C de Brevern AG 《Proteins》2011,79(3):839-852

7.

Spatial features of proteins related to their phosphorylation and associated structural changes

Dmitry A. Karasev Darya A. Veselova Alexander V. Veselovsky Boris N. Sobolev Victor G. Zgoda Alexander I. Archakov 《Proteins》2018,86(1):13-20

Protein phosphorylation is widely used in biological regulatory processes. The study of spatial features related to phosphorylation sites is necessary to increase the efficacy of recognition of phosphorylation patterns in protein sequences. Using the data on phosphosites found in amino acid sequences, we mapped these sites onto 3D structures and studied the structural variability of the same sites in different PDB entries related to the same proteins. Solvent accessibility was calculated for the residues known to be phosphorylated. A significant change in accessibility was shown for many sites, but several were found to be buried in all the structures considered. Most phosphosites were found in coil regions. However, a significant portion was located in the structurally stable ordered regions. Comparison of structures with the same sites in modified and unmodified states showed that the region surrounding a site could be significantly shifted due to phosphorylation. Comparison between non-modified structures (as well as between the modified ones) suggested that phosphorylation stabilizes one of the possible conformations. The local structure around the site could be changed due to phosphorylation, but often the initial conformation of the site's surroundings is not altered within the bounds of a rather large substructure. In this case, we can observe an extensive displacement within a protein domain. Phosphorylation without structural alteration seems to provide the interface for domain-domain or protein-protein interactions. Accounting for structural features is important for revealing more specific patterns of phosphorylation. It is also necessary for explaining structural changes as a basis for regulatory processes.

8.

Protein rigidity and thermophilic adaptation

Radestock S Gohlke H 《Proteins》2011,79(4):1089-1108

We probe the hypothesis of corresponding states, according to which homologues from mesophilic and thermophilic organisms are in corresponding states of similar rigidity and flexibility at their respective optimal temperatures. For this, the local distribution of flexible and rigid regions in 19 pairs of homologous proteins from meso- and thermophilic organisms is analyzed and related to activity characteristics of the enzymes by constraint network analysis (CNA). Two pairs of enzymes are considered in more detail: 3-isopropylmalate dehydrogenase and thermolysin-like protease. By comparing microscopic stability features of homologues with the help of stability maps, introduced for the first time, we show that adaptive mutations in enzymes from thermophilic organisms maintain the balance between overall rigidity, important for thermostability, and local flexibility, important for activity, at the appropriate working temperature. Thermophilic adaptation in general leads to an increase of structural rigidity but conserves the distribution of functionally important flexible regions between homologues. This finding provides direct evidence for the hypothesis of corresponding states. CNA thereby implicitly captures and unifies many different mechanisms that contribute to increased thermostability and to activity at high temperatures. This allows one to qualitatively relate changes in the flexibility of active site regions, induced either by a temperature change or by the introduction of mutations, to experimentally observed losses of the enzyme function. As for applications, the results demonstrate that exploiting the principle of corresponding states not only allows for successful thermostability optimization but also for guiding experiments in order to improve enzyme activity in protein engineering.

9.

Protein folding in mode space: a collective coordinate approach to structure prediction

Abseher R Nilges M 《Proteins》2002,49(3):365-377

10.

Beyond the Twilight Zone: Automated prediction of structural properties of proteins by recursive neural networks and remote homology information

Catherine Mooney Gianluca Pollastri 《Proteins》2009,77(1):181-190

The prediction of 1D structural properties of proteins is an important step toward the prediction of protein structure and function, not only in the ab initio case but also when homology information to known structures is available. Despite this, the vast majority of 1D predictors do not incorporate homology information into the prediction process. We develop a novel structural alignment method, SAMD, which we use to build alignments of putative remote homologues that we compress into templates of structural frequency profiles. We use these templates as additional input to ensembles of recursive neural networks, which we specialise for the prediction of query sequences that show only remote homology to any Protein Data Bank structure. We predict four 1D structural properties – secondary structure, relative solvent accessibility, backbone structural motifs, and contact density. Secondary structure prediction accuracy, tested by five-fold cross-validation on a large set of proteins allowing less than 25% sequence identity between training and test set and query sequences and templates, exceeds 82%, outperforming its ab initio counterpart, other state-of-the-art secondary structure predictors (Jpred 3 and PSIPRED) and two other systems based on PSI-BLAST and COMPASS templates. We show that structural information from homologues improves prediction accuracy well beyond the Twilight Zone of sequence similarity, even below 5% sequence identity, for all four structural properties. Significant improvement over the extraction of structural information directly from PDB templates suggests that the combination of sequence and template information is more informative than templates alone. Proteins 2009. © 2009 Wiley-Liss, Inc.

11.

Protein local structure prediction from sequence

Hunter CG Subramaniam S 《Proteins》2003,50(4):572-579

A basis set of protein canonical fragments, or centroids, represents the range of local structure found in globular proteins. We develop a methodology to predict centroids from the amino acid sequence. The predictor gives the probability of each centroid in the basis set, at each locus along the backbone. The predictor selects the best-fit centroid at about 40% of the loci. The predicted probabilities are accurate and can be used to judge the confidence of each centroid prediction. For example, when filtering out centroids with <0.50 probability, the predictor is 65% accurate, although such high-probability centroids occur at only 28% of the loci. Centroids with high probability can be interpreted as segments that are highly influenced by the amino acid sequence, whereas centroids with low probability can be interpreted as segments that are more likely influenced by tertiary contacts. Low-resolution starting-point structures can be generated by fitting the predicted centroids together.
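The trade-off this abstract describes, where discarding low-probability predictions raises accuracy but lowers the fraction of loci covered, can be sketched as below. The function name, the data layout, and the toy predictions are assumptions for illustration; they are not the authors' predictor or data.

```python
def accuracy_and_coverage(predictions, threshold=0.50):
    """Accuracy and coverage after keeping predictions at or above `threshold`.

    `predictions` is a list of (probability, is_correct) pairs, one per locus.
    """
    kept = [correct for prob, correct in predictions if prob >= threshold]
    coverage = len(kept) / len(predictions)
    # With nothing kept, accuracy is undefined; 0.0 is a convention for this sketch.
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return accuracy, coverage

# Hypothetical per-locus predictions: (predicted probability, was it correct?).
predictions = [
    (0.90, True), (0.60, True), (0.55, False), (0.45, False),
    (0.40, True), (0.30, False), (0.20, False), (0.10, False),
]
unfiltered = accuracy_and_coverage(predictions, threshold=0.0)
filtered = accuracy_and_coverage(predictions, threshold=0.50)
print(unfiltered, filtered)
```

Sweeping the threshold over a range of values gives an accuracy-versus-coverage curve, which is one way to check that the predicted probabilities are well calibrated.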

12.

Prediction of Membrane Protein Topology Utilizing Multiple Sequence Alignments

Bengt Persson Patrick Argos 《Journal of Protein Chemistry》1997,16(5):453-457

A technique for prediction of protein membrane topology (intra- and extracellular sidedness) has been developed. Membrane-spanning segments are first predicted using an algorithm based upon multiply aligned amino acid sequences. The compositional differences in the protein segments exposed at each side of the membrane are then investigated. The ratios are calculated for Asn, Asp, Gly, Phe, Pro, Trp, Tyr, and Val, mostly found on the extracellular side, and for Ala, Arg, Cys, and Lys, mostly occurring on the intracellular side. The consensus over these 12 residue distributions is used for sidedness prediction. The method was developed with a set of 42 protein families for which all but one were correctly predicted with the new algorithm. This represents an improvement over previous techniques. The new method, applied to a set of 12 membrane protein families different from the test set and with recently determined topologies, performed well, with 11 of 12 sidedness assignments agreeing with experimental results. The method has also been applied to several membrane protein families for which the topology has yet to be determined. An electronic prediction service is available at the E-mail address tmap@embl-heidelberg.de and on WWW via http://www.emblheidelberg.de.

13.

Progress of 1D protein structure prediction at last

Burkhard Rost Chris Sander 《Proteins》1995,23(3):295-300

Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural network system. For the Asilomar meeting, predictions for 13 proteins were generated automatically using the publicly available prediction method PHD. The results confirm the estimate of 72% three-state prediction accuracy. The fairly accurate predictions of secondary structure segments made the tool useful as a starting point for modeling of higher dimensional aspects of protein structure. © 1995 Wiley-Liss, Inc.

14.

Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction

《Structure (London, England : 1993)》2022,30(8):1169-1177.e4

15.

Prediction and evolutionary information analysis of protein solvent accessibility using multiple linear regression

Wang JY Lee HM Ahmad S 《Proteins》2005,61(3):481-491

A multiple linear regression method was applied to predict real values of solvent accessibility from the sequence and evolutionary information. This method allowed us to obtain coefficients of regression and correlation between the occurrence of an amino-acid residue at a specific target and its sequence neighbor positions on the one hand, and the solvent accessibility of that residue on the other. Our linear regression model based on sequence information and evolutionary models was found to predict residue accessibility with 18.9% and 16.2% mean absolute error, respectively, which is better than or comparable to the best available methods. A correlation matrix for several neighbor positions to examine the role of evolutionary information at these positions has been developed and analyzed. As expected, the effective frequency of hydrophobic residues at target positions shows a strong negative correlation with solvent accessibility, whereas the reverse is true for charged and polar residues. The correlation of solvent accessibility with effective frequencies at neighboring positions falls abruptly with distance from target residues. Longer protein chains have been found to be more accurately predicted than their smaller counterparts.

16.

Alignments grow, secondary structure prediction improves. (Total citations: 12; self-citations: 0; citations by others: 12)

Dariusz Przybylski Burkhard Rost 《Proteins》2002,46(2):197-205

Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Recently, various groups have shown that accuracy can be improved significantly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influences of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (i) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states: helix, strand, and other. Using larger databases and PSI-BLAST raised accuracy to 75%. (ii) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, and gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (iii) It is of interest that we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (iv) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth.

17.

Combining prediction of secondary structure and solvent accessibility in proteins

Adamczak R Porollo A Meller J 《Proteins》2005,59(3):467-475

Owing to the use of evolutionary information and advanced machine learning protocols, secondary structures of amino acid residues in proteins can be predicted from the primary sequence with more than 75% per-residue accuracy for the 3-state (i.e., helix, beta-strand, and coil) classification problem. In this work we investigate whether further progress may be achieved by incorporating the relative solvent accessibility (RSA) of an amino acid residue as a fingerprint of the overall topology of the protein. Toward that goal, we developed a novel method for secondary structure prediction that uses predicted RSA in addition to attributes derived from evolutionary profiles. Our general approach follows the 2-stage protocol of Rost and Sander, with a number of Elman-type recurrent neural networks (NNs) combined into a consensus predictor. The RSA is predicted using our recently developed regression-based method that provides real-valued RSA, with the overall correlation coefficients between the actual and predicted RSA of about 0.66 in rigorous tests on independent control sets. Using the predicted RSA, we were able to improve the performance of our secondary structure prediction by up to 1.4% and achieved the overall per-residue accuracy between 77.0% and 78.4% for the 3-state classification problem on different control sets comprising, together, 603 proteins without homology to proteins included in the training. The effects of including solvent accessibility depend on the quality of RSA prediction. In the limit of perfect prediction (i.e., when using the actual RSA values derived from known protein structures), the accuracy of secondary structure prediction increases by up to 4%. We also observed that projecting real-valued RSA into 2 discrete classes with the commonly used threshold of 25% RSA decreases the classification accuracy for secondary structure prediction. 
While the level of improvement of secondary structure prediction may be different for prediction protocols that implicitly account for RSA in other ways, we conclude that an increase in the 3-state classification accuracy may be achieved when combining RSA with a state-of-the-art protocol utilizing evolutionary profiles. The new method is available through a Web server at http://sable.cchmc.org.
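The projection of real-valued RSA into two discrete classes at the 25% threshold, which this abstract reports as lowering secondary structure accuracy, can be sketched as follows. The function name, the class labels, and the treatment of values exactly at the threshold as exposed are assumptions for illustration.

```python
RSA_THRESHOLD = 25.0  # percent; the commonly used cutoff mentioned in the abstract

def to_two_state(rsa_values, threshold=RSA_THRESHOLD):
    """Map real-valued RSA (percent) to 'E' (exposed) or 'B' (buried).

    Treating rsa == threshold as exposed is an assumption of this sketch.
    """
    return ["E" if rsa >= threshold else "B" for rsa in rsa_values]

# Hypothetical RSA values in percent.
rsa = [3.0, 24.9, 25.0, 61.5]
states = to_two_state(rsa)
print(states)
```

The example makes the information loss visible: residues at 3.0% and 24.9% fall into the same class, while 24.9% and 25.0% are separated, even though the latter pair is far more similar.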

18.

Using genetic algorithms to select most predictive protein features

Kernytsky A Rost B 《Proteins》2009,75(1):75-88

Many important characteristics of proteins such as biochemical activity and subcellular localization present a challenge to machine-learning methods: it is often difficult to encode the appropriate input features at the residue level for the purpose of making a prediction for the entire protein. The problem is usually that the biophysics of the connection between a machine-learning method's input (sequence feature) and its output (observed phenomenon to be predicted) remains unknown; in other words, we may only know that a certain protein is an enzyme (output) without knowing which region may contain the active site residues (input). The goal then becomes to dissect a protein into a vast set of sequence-derived features and to correlate those features with the desired output. We introduce a framework that begins with a set of global sequence features and then vastly expands the feature space by generically encoding the coexistence of residue-based features. It is this combination of individual features, that is, the step from the fractions of serine and buried (input space 20 + 2) to the fraction of buried serine (input space 20 * 2), that implicitly shifts the search space from global feature inputs to features that can capture very local evidence such as the individual residues of a catalytic triad. The vast feature space created is explored by a genetic algorithm (GA) paired with neural networks and support vector machines. We find that the GA is critical for selecting combinations of features that are neither too general, resulting in poor performance, nor too specific, leading to overtraining. The final framework manages to effectively sample a feature space that is far too large for exhaustive enumeration. We demonstrate the power of the concept by applying it to prediction of protein enzymatic activity.
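The step from separate global fractions (input space 20 + 2) to co-occurrence fractions (input space 20 * 2) can be illustrated with a short sketch. The two-state buried/exposed labelling, the function names, and the toy sequence are assumptions for illustration; the sketch does not reproduce the authors' framework or the genetic algorithm that searches the expanded space.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("buried", "exposed")  # two-state labelling assumed for illustration

def global_features(sequence, states):
    """20 + 2 features: fraction of each amino acid and of each state."""
    n = len(sequence)
    features = {aa: sequence.count(aa) / n for aa in AMINO_ACIDS}
    features.update({s: states.count(s) / n for s in STATES})
    return features

def combined_features(sequence, states):
    """20 * 2 features: fraction of residues that are amino acid X in state Y.

    Assumes every residue is one of the 20 standard amino acids.
    """
    n = len(sequence)
    features = {(aa, s): 0.0 for aa in AMINO_ACIDS for s in STATES}
    for aa, s in zip(sequence, states):
        features[(aa, s)] += 1 / n
    return features

# Hypothetical four-residue protein with per-residue accessibility states.
sequence = "SSAS"
states = ["buried", "exposed", "buried", "buried"]
separate = global_features(sequence, states)
crossed = combined_features(sequence, states)
print(separate["S"], separate["buried"], crossed[("S", "buried")])
```

The separate fractions report that 75% of residues are serine and 75% are buried, but only the crossed feature reveals that half of all residues are buried serines.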

19.

Prediction of protein secondary structure from amino acid sequence

Jen Tsi Yang 《Journal of Protein Chemistry》1996,15(2):185-191

The conformational parameters $P_k$ for each amino acid species ($j = 1$–$20$) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residues in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{av} = \prod_{i=1}^{n} (P_{i,k})^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{av}$ to convert a multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n) \sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. However, this is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
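The difference between the geometric mean used in this abstract and the arithmetic mean attributed to the Chou-Fasman algorithm can be made concrete with a small sketch. The function names and the parameter values are hypothetical and are not taken from either method's published tables.

```python
import math

def geometric_mean_parameter(params):
    """Geometric mean computed in log space: exp of the mean of ln(P_i)."""
    return math.exp(sum(math.log(p) for p in params) / len(params))

def arithmetic_mean_parameter(params):
    """Plain arithmetic mean of the per-residue parameters."""
    return sum(params) / len(params)

# Hypothetical per-residue parameters for a four-residue segment that
# contains one strong breaker (0.5) among formers.
segment = [1.4, 1.4, 0.5, 1.2]
geo = geometric_mean_parameter(segment)
ari = arithmetic_mean_parameter(segment)
print(f"geometric={geo:.3f}  arithmetic={ari:.3f}")
```

The geometric mean never exceeds the arithmetic mean, and the gap widens as the values become more unequal, which is consistent with the abstract's remark that the two approaches diverge when strong formers and breakers are present.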

20.

Discrete analyses of protein dynamics

Tarun Jairaj Narwani Pierrick Craveur Nicolas K. Shinada Aline Floch Hubert Santuz Akhila Melarkode Vattekatte 《Journal of biomolecular structure & dynamics》2020,38(10):2988-3002

Abstract

Protein structures are highly dynamic macromolecules. This dynamics is often analysed through experimental and/or computational methods only for an isolated or a limited number of proteins. Here, we explore large-scale protein dynamics simulation to observe dynamics of local protein conformations using different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf and solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strands, β-turns, and bends evolve during molecular simulations. We underlined an interesting specific bias between β-turns and bends, which are considered as the same category, while their dynamics show differences. Second, we used a structural alphabet that is able to approximate every part of the protein structure conformations, namely protein blocks (PBs), to analyse (i) how each initial local protein conformation evolves during dynamics and (ii) whether some exchange can exist among these PBs. Interestingly, the results are far more complex than a simple regular/rigid and coil/flexible exchange.

Abbreviations

N_eq number of equivalent

PB Protein Blocks

PDB Protein DataBank

RMSf root mean square fluctuations

Communicated by Ramaswamy H. Sarma