Similar articles
 Found 20 similar articles (search time: 125 ms)
1.
Protein chemical shifts encode detailed structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches have been used to infer correlations between chemical shifts and secondary structure from experimental chemical shifts. These methods range from simple statistics such as the chemical shift index to complex methods using neural networks. Notwithstanding their higher accuracy, more complex approaches tend to obscure the relationship between secondary structure and chemical shift and often involve many parameters that need to be trained. We present hidden Markov models (HMMs) with Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. Using these distributions as outputs of first- and second-order HMMs, we achieve a prediction accuracy of 82.3%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts. Incorporation of sequence-based secondary structure prediction into our HMM improves the prediction accuracy to 84.0%. Our findings suggest that an HMM with correlated Gaussian distributions conditioned on the secondary structure provides an adequate generative model of chemical shifts. Proteins 2013; © 2012 Wiley Periodicals, Inc.
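To make the idea of an HMM with Gaussian emissions concrete, the following is a minimal sketch of Viterbi decoding over three secondary structure states. It is a simplification and not the authors' model: it uses a single shift value per residue, does not condition on amino acid type, and every number in `START`, `TRANS` and `EMIT` is a hypothetical value chosen for illustration.

```python
import math

STATES = ["H", "E", "C"]  # helix, strand, coil

# Hypothetical parameters for illustration only (not taken from the abstract).
START = {"H": 1 / 3, "E": 1 / 3, "C": 1 / 3}
TRANS = {
    "H": {"H": 0.90, "E": 0.01, "C": 0.09},
    "E": {"H": 0.01, "E": 0.85, "C": 0.14},
    "C": {"H": 0.10, "E": 0.10, "C": 0.80},
}
# Assumed (mean, standard deviation) of one secondary shift in ppm per state.
EMIT = {"H": (2.5, 1.0), "E": (-1.5, 1.0), "C": (0.0, 1.0)}


def log_gauss(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(shifts):
    """Return the most probable state string for a sequence of shift values."""
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + log_gauss(shifts[0], *EMIT[s]) for s in STATES}
    back = []
    for x in shifts[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            ptr[s] = best
            score[s] = prev[best] + math.log(TRANS[best][s]) + log_gauss(x, *EMIT[s])
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

Calling `viterbi([2.5] * 4 + [-1.5] * 4)` decodes four helix-like shifts followed by four strand-like shifts. A model closer to the abstract would presumably replace `log_gauss` with a multivariate density over several nuclei, indexed by both residue type and state.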

2.
The combination of the wide availability of protein backbone and side-chain NMR chemical shifts with advances in understanding of their relationship to protein structure makes these parameters useful for the assessment of structural-dynamic protein models. A new chemical shift predictor (PPM) is introduced, which is solely based on physical–chemical contributions to the chemical shifts for both the protein backbone and methyl-bearing amino-acid side chains. To explicitly account for the effects of protein dynamics on chemical shifts, PPM was directly refined against 100 ns long molecular dynamics (MD) simulations of 35 proteins with known experimental NMR chemical shifts. It is found that the prediction of methyl-proton chemical shifts by PPM from MD ensembles is improved over other methods, while backbone Cα, Cβ, C′, N, and HN chemical shifts are predicted at an accuracy comparable to the latest generation of chemical shift prediction programs. PPM is particularly suitable for the rapid evaluation of large protein conformational ensembles on their consistency with experimental NMR data and the possible improvement of protein force fields from chemical shifts.

3.
Accurate determination of protein secondary structure from chemical shift information is a key step in NMR tertiary structure determination. Relatively little work has been done on this subject. There needs to be a systematic investigation of algorithms that are (a) robust for large datasets; (b) easily extendable to new (dynamic) databases; and (c) approaching the limit of accuracy. We introduce new approaches that use the k-nearest neighbor algorithm for the basic prediction and the BCJR algorithm to smooth the predictions and to combine predictions based on chemical shifts with predictions based on sequence information only. Our new system, SUCCES, improves on the accuracy of all existing methods on a large dataset of 805 proteins (86% Q3 accuracy, and 92.6% accuracy when the boundary residues are ignored), and it is easily extendable to any new dataset without requiring any new training. The software is publicly available at http://monod.uwaterloo.ca/nmr/succes.
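The two accuracy figures quoted above differ only in which residues are scored. The sketch below computes a Q3-style per-residue accuracy with an optional switch to skip boundary residues. The definition of a boundary residue used here (any residue whose observed state differs from that of an adjacent residue) is an assumption made for illustration; the abstract does not say how SUCCES defines it.

```python
def q3_accuracy(predicted, observed, ignore_boundaries=False):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    kept = []
    for i, state in enumerate(observed):
        # Assumed definition: a boundary residue has a neighbour in a different state.
        at_boundary = (i > 0 and observed[i - 1] != state) or (
            i + 1 < len(observed) and observed[i + 1] != state
        )
        if not (ignore_boundaries and at_boundary):
            kept.append(i)
    if not kept:
        raise ValueError("no residues left to score")
    correct = sum(predicted[i] == observed[i] for i in kept)
    return correct / len(kept)
```

With `observed = "HHHHCCCC"` and `predicted = "HHHCCCCC"`, the only error sits at the helix/coil junction, so the score rises from 7/8 to 1.0 once boundary residues are excluded.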

4.
Fan GL, Li QZ. Amino Acids 2012, 43(2): 545-555
Knowledge of a protein's submitochondrial location is integral to understanding its function and is a necessity in the proteomics era. In this work, a new submitochondria data set is constructed, and an approach for predicting protein submitochondria locations is proposed by combining the amino acid composition, dipeptide composition, reduced physicochemical properties, gene ontology, evolutionary information, and pseudo-average chemical shift. Using the increment of diversity algorithm combined with a support vector machine, the overall prediction accuracy is 93.57% for the submitochondria location and 97.79% for the three membrane protein types in the mitochondrial inner membrane. The performance of the pseudo-average chemical shift is excellent. For comparison, the method is also used to predict submitochondria locations in the data set constructed by Du and Li; it obtains an accuracy of 94.95%, which is better than that of other existing methods.

5.
Information about the secondary and tertiary structure of a protein sequence can greatly assist biologists in the generation and testing of hypotheses, as well as design of experiments. The PROTINFO server enables users to submit a protein sequence and request a prediction of the three-dimensional (tertiary) structure based on comparative modeling, fold generation and de novo methods developed by the authors. In addition, users can submit NMR chemical shift data and request protein secondary structure assignment that is based on using neural networks to combine the chemical shifts with secondary structure predictions. The server is available at http://protinfo.compbio.washington.edu.

6.
Bock C, Hesser J. In Silico Biology 2006, 6(1-2): 131-145
High sequence identity between two proteins (e.g. > 60%) is strong evidence for high structural similarity. However, internal shifts in one of the two proteins can sometimes give rise to unexpectedly high structural differences. This, in turn, causes unreliable structure predictions when two such proteins are used in homology modeling. Here, we perform a computational analysis of helix shifts and we show that their occurrence can be predicted with statistical learning methods. Our results indicate that helix shifts increase the RMS error by a factor of 2.6 compared to protein pairs without a helix shift. Although helix shifts are rare (1.6% of helices and a commensurately higher number of proteins are affected), they therefore pose a significant problem for reliable structure prediction systems. In this paper, we prototype a new approach for model quality assessment and demonstrate that it can successfully warn against helix shifts. A support vector machine trained on a wide range of sequence and structure properties predicts the occurrence of helix shifts with a sensitivity of 74.2% and a specificity of 83.6%. On an equalized test dataset, this corresponds to an accuracy of 78.9%. Projected to the full dataset, it translates to an accuracy of 83.4%. Our analysis shows that helix shift detection is a valuable building block for highly reliable structure prediction systems. Furthermore, the statistical learning based approach to helix shift detection that we employ here is orthogonal to well-established model quality assessment methods (which use geometric constraint checking or mean force potentials). Therefore, a further increase of prediction accuracy is expected from the combination of these methods.
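The two accuracy figures in this abstract follow from the sensitivity and specificity once the fraction of positive cases is fixed. The short calculation below is a sketch of that relationship. It assumes that "equalized" means equal numbers of positives and negatives, and that the 1.6% of helices affected can be used as the positive fraction of the full dataset; neither assumption is stated explicitly in the abstract.

```python
def accuracy(sensitivity, specificity, positive_fraction):
    """Overall accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return positive_fraction * sensitivity + (1 - positive_fraction) * specificity


# Equalized test set: half of the cases are helix shifts (assumed meaning).
equalized = accuracy(0.742, 0.836, 0.5)

# Full dataset: 1.6% of helices are shifted (assumed positive fraction).
projected = accuracy(0.742, 0.836, 0.016)
```

Under these assumptions `equalized` evaluates to 0.789 and `projected` to about 0.834, in line with the 78.9% and 83.4% reported. The projected figure is higher because negatives dominate the full dataset and specificity exceeds sensitivity.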

7.
8.
By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (alpha/beta protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling.

9.
Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often < 10 min/structure) and to significantly outperform other shift-based or threading-based structure determination methods (in terms of top template model accuracy), with an average TM-score performance of 0.68 (vs. 0.50–0.62 for other methods). Coupled with recent developments in chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca.

10.
MOTIVATION: Disulfide bonds play an important role in protein folding. A precise prediction of disulfide connectivity can strongly reduce the conformational search space and increase the accuracy in protein structure prediction. Conventional disulfide connectivity predictions use sequence information, and prediction accuracy is limited. Here, by using an alternative scheme with global information for disulfide connectivity prediction, higher performance is obtained with respect to other approaches. RESULT: Cysteine separation profiles have been used to predict the disulfide connectivity of proteins. The separations among oxidized cysteine residues on a protein sequence have been encoded into vectors named cysteine separation profiles (CSPs). Through comparisons of their CSPs, the disulfide connectivity of a test protein is inferred from a non-redundant template set. For non-redundant proteins in SwissProt 39 (SP39) sharing less than 30% sequence identity, the prediction accuracy of a fourfold cross-validation is 49%. The prediction accuracy of disulfide connectivity for proteins in SwissProt 43 (SP43) is even higher (53%). The relationship between the similarity of CSPs and the prediction accuracy is also discussed. The method proposed in this work is relatively simple and can generate higher accuracies compared to conventional methods. It may be also combined with other algorithms for further improvements in protein structure prediction. AVAILABILITY: The program and datasets are available from the authors upon request. CONTACT: cykao@csie.ntu.edu.tw.
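One plausible reading of the CSP idea can be sketched in a few lines: encode the gaps between consecutive oxidized cysteines as a vector, then copy the connectivity of the template whose vector is closest. The sum-of-absolute-differences measure, the function names and the template format below are assumptions made for illustration; the abstract does not specify how profiles are compared.

```python
def cysteine_separation_profile(cys_positions):
    """Sequence separations between consecutive oxidized cysteines."""
    ordered = sorted(cys_positions)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def profile_divergence(profile_a, profile_b):
    """Assumed similarity measure: sum of absolute differences between profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of separations")
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def predict_connectivity(query_positions, templates):
    """Return the connectivity of the closest template, or None if none is comparable.

    templates is a list of (cys_positions, connectivity) pairs, where connectivity
    is a list of index pairs into the sorted cysteine positions (hypothetical format).
    """
    query = cysteine_separation_profile(query_positions)
    candidates = [
        (profile_divergence(query, cysteine_separation_profile(positions)), connectivity)
        for positions, connectivity in templates
        if len(positions) == len(query_positions)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda item: item[0])[1]
```

For a query with cysteines at positions 8, 14, 33 and 45 the profile is `[6, 19, 12]`, and a template with profile `[5, 20, 10]` would be preferred over one with profile `[17, 5, 35]`.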

11.
Chemical shift prediction has an unappreciated power to guide backbone resonance assignment in cases where protein structure is known. Here we describe Resonance Assignment by chemical Shift Prediction (RASP), a method that exploits this power to derive protein backbone resonance assignments from chemical shift predictions. Robust assignments can be obtained from a minimal set of only the most sensitive triple-resonance experiments, even for spectroscopically challenging proteins. Over a test set of 154 proteins RASP assigns 88 % of residues with an accuracy of 99.7 %, using only information available from HNCO and HNCA spectra. Applied to experimental data from a challenging 34 kDa protein, RASP assigns 90 % of manually assigned residues using only 40 % of the experimental data required for the manual assignment. RASP has the potential to significantly accelerate the backbone assignment process for a wide range of proteins for which structural information is available, including those for which conventional assignment strategies are not feasible.

12.
SimShift: identifying structural similarities from NMR chemical shifts   Cited by: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: An important quantity that arises in NMR spectroscopy experiments is the chemical shift. The interpretation of these data is mostly done by human experts; to our knowledge there are no algorithms that predict protein structure from chemical shift sequences alone. One approach to facilitate this process could be to compare two such sequences, where the structure of one protein has already been resolved. Our claim is that similarity of chemical shifts thereby found implies structural similarity of the respective proteins. RESULTS: We present an algorithm to identify structural similarities of proteins by aligning their associated chemical shift sequences. To evaluate the correctness of our predictions, we propose a benchmark set of protein pairs that have high structural similarity, but low sequence similarity (because with high sequence similarity the structural similarities could easily be detected by a sequence alignment algorithm). We compare our results with those of HHsearch and SSEA and show that our method outperforms both in >50% of all cases.

13.
The subcellular location of a protein is useful information for determining its function, screening for drug candidates, vaccine design, annotation of gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used effectively together with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and the prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100%, 82.47% and 88.81% for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and enhances the prediction accuracy.

14.
The construction of a consistent protein chemical shift database is an important step toward making more extensive use of this data in structural studies. Unfortunately, progress in this direction has been hampered by the quality of the available data, particularly with respect to chemical shift referencing, which is often either inaccurate or inconsistently annotated. Preprocessing of the data is therefore required to detect and correct referencing errors. In an earlier study we developed CheckShift, a program for performing this task automatically. We have now invested substantial effort in improving the running time of the CheckShift algorithm, reducing it by 90% while achieving quality equivalent to that of the former version of CheckShift. The reason for the running time decrease is twofold. Firstly, we considerably improved the search for the optimal re-referencing offset. Secondly, as CheckShift is based on a secondary structure prediction from the amino acid sequence (formerly PsiPred was used), we evaluated a wide range of available secondary structure prediction programs, focusing on the special needs of the CheckShift algorithm. The results of this evaluation show empirically that we can use faster secondary structure prediction programs than PsiPred without sacrificing CheckShift's accuracy. Very recently Wang and Markley (2009) gave a small list of extreme outliers of the former version of the CheckShift web-server. Those were due to the empirical reduction of the search space implemented in the old version. The new version of CheckShift now gives very similar results to RefDB and LACS for all outliers mentioned in Table 1 of Wang and Markley (2009).

15.
Peak overlap is one of the major factors complicating the analysis of biomolecular NMR spectra. We present a general method for predicting the extent of peak overlap in multidimensional NMR spectra and its validation using both experimental data sets and Monte Carlo simulation. The method is based on knowledge of the magnetization transfer pathways of the NMR experiments and chemical shift statistics from the Biological Magnetic Resonance Data Bank. Assuming a normal distribution with characteristic mean value and standard deviation for the chemical shift of each observable atom, an analytic expression was derived for the expected overlap probability of the cross peaks. The analytical approach was verified to agree with the average peak overlap in a large number of individual peak lists simulated using the same chemical shift statistics. The method was applied to eight proteins, including an intrinsically disordered one, for which the prediction results could be compared with the actual overlap based on the experimentally measured chemical shifts. The extent of overlap predicted using only statistical chemical shift information was in good agreement with the overlap that was observed when the measured shifts were used in the virtual spectrum, except for the intrinsically disordered protein. Since the spectral complexity of a protein NMR spectrum is a crucial factor for protein structure determination, analytical overlap prediction can be used to identify potentially difficult proteins before conducting NMR experiments. Overlap predictions can be tailored to particular classes of proteins by preparing statistics from corresponding protein databases. The method is also suitable for optimizing recording parameters and labeling schemes for NMR experiments and improving the reliability of automated spectra analysis and protein structure determination.
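The normal-distribution assumption makes a pairwise overlap probability easy to write down, because the difference of two independent normal shifts is itself normal. The sketch below illustrates that reasoning and is not the expression derived in the paper: it treats two peaks as overlapping when their shifts differ by less than a tolerance in every dimension, and it assumes the dimensions are independent. The function names and the tolerance parameter are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def overlap_probability_1d(mean_a, sd_a, mean_b, sd_b, tolerance):
    """P(|X_a - X_b| < tolerance) for independent normal shifts X_a and X_b."""
    # The difference X_a - X_b is normal with these parameters.
    diff_mean = mean_a - mean_b
    diff_sd = math.sqrt(sd_a ** 2 + sd_b ** 2)
    upper = normal_cdf((tolerance - diff_mean) / diff_sd)
    lower = normal_cdf((-tolerance - diff_mean) / diff_sd)
    return upper - lower


def overlap_probability(peak_a, peak_b, tolerances):
    """Multiply per-dimension probabilities, assuming independent dimensions.

    peak_a and peak_b are lists of (mean, sd) pairs, one pair per spectral dimension.
    """
    probability = 1.0
    for (mean_a, sd_a), (mean_b, sd_b), tolerance in zip(peak_a, peak_b, tolerances):
        probability *= overlap_probability_1d(mean_a, sd_a, mean_b, sd_b, tolerance)
    return probability
```

For example, two atoms with identical mean shifts and unit standard deviations overlap within a tolerance of `math.sqrt(2)` with probability about 0.683, the familiar one-sigma mass of the difference distribution. Summing such pairwise probabilities over all peak pairs of an experiment would give an expected overlap count under the same assumptions.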

16.
A computer program (ORB) has been developed to predict 1H, 13C and 15N NMR chemical shifts of previously unassigned proteins. The program makes use of the information contained in a chemical shift database of previously assigned proteins supplemented by a statistically derived averaged chemical shift database in which the shifts are categorized according to their residue, atom and secondary structure type [Wishart et al. (1991) J. Mol. Biol., 222, 311–333]. The prediction process starts with a multiple alignment of all previously assigned proteins with the unassigned query protein. ORB uses the sequence and secondary structure alignment program XALIGN for this task [Wishart et al. (1994) CABIOS, 10, 121–132; 687–688]. The prediction algorithm in ORB is based on a scoring of the known shifts for each sequence. The scores depend on global sequence similarity, local sequence similarity, structural similarity and residue similarity and determine how much weight one particular shift is given in the prediction process. In situations where no applicable previously assigned chemical shifts are available, the shifts derived from the averaged database are used. In addition to supplying the user with predicted chemical shifts, ORB calculates a confidence value for every prediction. These confidence values enable the user to judge which predictions are the most accurate and they are particularly useful when ORB is incorporated into a complete autoassignment package. The usefulness of ORB was tested on three medium-sized proteins: an interleukin-8 analog, a troponin C synthetic peptide heterodimer and cardiac troponin C. Excellent results are obtained if ORB is able to use the chemical shifts of at least one highly homologous sequence. ORB performs well as long as the sequence identity between proteins with known chemical shifts and the new sequence is not less than 30%.

17.
18.
Structural genomics and its importance for gene function analysis   Cited by: 8 (self-citations: 0, citations by others: 8)
Structural genomics projects aim to solve the experimental structures of all possible protein folds. Such projects entail a conceptual shift from traditional structural biology in which structural information is obtained on known proteins to one in which the structure of a protein is determined first and the function assigned only later. Whereas the goal of converting protein structure into function can be accomplished by traditional sequence motif-based approaches, recent studies have shown that assignment of a protein's biochemical function can also be achieved by scanning its structure for a match to the geometry and chemical identity of a known active site. Importantly, this approach can use low-resolution structures provided by contemporary structure prediction methods. When applied to genomes, structural information (either experimental or predicted) is likely to play an important role in high-throughput function assignment.

19.
Knowledge of a protein's structural class can provide important information about its folding patterns. Many approaches have been developed for the prediction of protein structural classes. However, the information used by these approaches is primarily based on amino acid sequences. In this study, a novel method is presented to predict protein structural classes by use of chemical shift (CS) information derived from nuclear magnetic resonance spectra. Firstly, a data set of 399 non-homologous proteins (about 15% identity) was constructed to investigate the distribution of averaged CS values of six nuclei (¹³CO, ¹³Cα, ¹³Cβ, ¹HN, ¹Hα and ¹⁵N) in three protein structural classes. Subsequently, a support vector machine was proposed to predict the three protein structural classes by using averaged CS information of the six nuclei. The overall accuracy in jackknife cross-validation reaches 87.0%. Finally, a feature selection technique is applied to exclude redundant information and find an optimized feature set. Results show that the overall accuracy increased to 88.0% by using the averaged CSs of ¹³CO, ¹Hα and ¹⁵N. The proposed approach outperformed other state-of-the-art methods in terms of predictive accuracy, in particular for low-similarity protein data. We expect that our proposed approach will be an excellent alternative to traditional methods for protein structural class prediction.

20.
Lee S, Lee BC, Kim D. Proteins 2006, 62(4): 1107-1114
Knowing protein structure and inferring function from structure are among the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure contents. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the method using support vector regression is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where the amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
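The "20 numbers" used as regression input can be illustrated with a short sketch that computes an amino acid composition and averages it over a set of homologous sequences. This is an assumption-laden simplification: it takes plain sequence strings rather than a PSI-BLAST alignment, gives every homolog equal weight, and ignores gap and non-standard characters. The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(sequence):
    """Fraction of each standard amino acid; other characters (e.g. gaps) are ignored."""
    counts = [sequence.upper().count(residue) for residue in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in counts]


def homolog_averaged_composition(sequences):
    """Unweighted mean of the per-sequence compositions (assumed averaging scheme)."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    compositions = [composition(sequence) for sequence in sequences]
    return [sum(column) / len(compositions) for column in zip(*compositions)]
```

The resulting 20-element vector could then be passed to any regression model to estimate, for example, the helix fraction; the abstract compares multiple linear regression, an artificial neural network and support vector regression for that step.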
