Similar articles
Found 20 similar articles; search took 15 ms
1.
2.
Lee CF  Makhatadze GI  Wong KB 《Biochemistry》2005,44(51):16817-16825
The ability to rationally engineer a protein with altered stability depends upon a detailed understanding of the role of noncovalent interactions in defining the thermodynamic properties of proteins. In this paper, we used T. celer L30e as a model to address the role of charge-charge interactions in defining the stability of this protein. A total of 26 single-site charge-to-alanine variants of this protein were generated, and the stability of these proteins was determined using thermal- and denaturant-induced unfolding. It was found that, although L30e is isolated from a thermophilic organism and is highly thermostable, some of the substitutions lead to a further increase in the transition temperature. Analysis of the effects of high ionic strength on the stabilities of L30e variants shows that the long-range charge-charge interactions are as important as the short-range (salt bridge) interactions. The changes in stabilities of the T. celer L30e protein variants were compared with the changes in the energy of charge-charge interactions calculated using different computational models. It was found that there is a good qualitative agreement between experimental and calculated data: for 70-80% (19-21 of 26, confidence p < 0.003) of the variants, computational models correctly predict the sign of the stability changes. In particular, computational models correctly identify those charged amino acid residues whose substitution led to enhanced thermostability. Thus, optimization of the charge-charge interactions might be a useful approach for the rational increase in protein stability.

3.
This paper introduces a new subcellular localization system (TSSub) for eukaryotic proteins. This system extracts features from both profiles and amino acid sequences. Four different features are extracted from profiles by four probabilistic neural network (PNN) classifiers, respectively (the amino acid composition from whole profiles; the amino acid composition from the N-terminus of profiles; the dipeptide composition from whole profiles; and the amino acid composition from fragments of profiles). In addition, a support vector machine (SVM) classifier is added to implement the residue-couple feature extracted from amino acid sequences. The results from the five classifiers are fused by an additional SVM classifier. The overall accuracy of TSSub reaches 93.0% and 77.4% on Reinhardt and Hubbard's eukaryotic protein dataset and Huang and Li's eukaryotic protein dataset, respectively. Comparison with existing methods shows that TSSub provides better prediction performance. AVAILABILITY: The web server is available from http://166.111.24.5/webtools/TSSub/index.html.

4.

Background

Protein destabilization is a common mechanism by which amino acid substitutions cause human diseases. Although several machine learning methods have been reported for predicting protein stability changes upon amino acid substitutions, the previous studies did not utilize relevant sequence features representing biological knowledge for classifier construction.

Results

In this study, a new machine learning method has been developed for sequence feature-based prediction of protein stability changes upon amino acid substitutions. Support vector machines were trained with data from experimental studies on the free energy change of protein stability upon mutations. To construct accurate classifiers, twenty sequence features were examined for input vector encoding. It was shown that classifier performance varied significantly by using different sequence features. The most accurate classifier in this study was constructed using a combination of six sequence features. This classifier achieved an overall accuracy of 84.59% with 70.29% sensitivity and 90.98% specificity.

Conclusions

Relevant sequence features can be used to accurately predict protein stability changes upon amino acid substitutions. Predictive results at this level of accuracy may provide useful information to distinguish between deleterious and tolerant alterations in disease candidate genes. To make the classifier accessible to the genetics research community, we have developed a new web server, called MuStab (http://bioinfo.ggc.org/mustab/).
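
The accuracy, sensitivity, and specificity reported above follow from a binary confusion matrix. As a minimal sketch, assuming one class of stability change is treated as the positive class (the abstract does not say which), the three figures can be computed as below. The function name and the counts are hypothetical and are not the paper's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total      # fraction of all predictions that are correct
    sensitivity = tp / (tp + fn)      # fraction of true positives recovered
    specificity = tn / (tn + fp)      # fraction of true negatives recovered
    return accuracy, sensitivity, specificity


# Hypothetical counts chosen only for illustration (not taken from the study).
acc, sens, spec = classification_metrics(tp=70, fn=30, tn=180, fp=20)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

When the negative class is larger than the positive class, as in these made-up counts, accuracy sits closer to specificity than to sensitivity, which is the same pattern seen in the reported figures.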

5.
Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulating nsSNP data allow us to explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either the original or the substituted amino acid type at the nsSNP site. Using a support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9%, depending on which of the two partition criteria was used, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, when the dataset was instead randomly divided into 20 subsets, the corresponding accuracy was only 73.2%. Our results demonstrate that properly partitioning the whole training dataset into subsets, i.e., according to the residue type at the nsSNP site, significantly improves the performance of the trained classifiers, which should be valuable in developing better tools for predicting the disease association of nsSNPs.
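
The partitioning idea can be sketched as follows: group the training samples by the residue type at the variant site, then fit one model per group. This is only an illustration under stated assumptions. The field names (`original`, `substituted`, `label`) and the toy samples are hypothetical, and a trivial majority-label rule stands in for the SVM used in the study.

```python
from collections import Counter, defaultdict


def partition_by_residue(samples, key):
    """Group samples by the amino acid stored under `key` (e.g. 'original')."""
    subsets = defaultdict(list)
    for sample in samples:
        subsets[sample[key]].append(sample)
    return dict(subsets)


def train_majority(subset):
    """Stand-in 'classifier': returns the most common label in the subset.
    The study trained an SVM per subset; this placeholder only shows the flow."""
    counts = Counter(sample["label"] for sample in subset)
    return counts.most_common(1)[0][0]


# Hypothetical toy data, not taken from any real nsSNP dataset.
samples = [
    {"original": "R", "substituted": "W", "label": "disease"},
    {"original": "R", "substituted": "Q", "label": "disease"},
    {"original": "R", "substituted": "K", "label": "neutral"},
    {"original": "A", "substituted": "S", "label": "neutral"},
    {"original": "A", "substituted": "T", "label": "neutral"},
]

subsets = partition_by_residue(samples, "original")
models = {residue: train_majority(subset) for residue, subset in subsets.items()}
print(models)
```

Passing `"substituted"` instead of `"original"` as the key gives the second partition criterion described in the abstract.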

6.
A simple theoretical model for increasing the protein stability by adequately redesigning the distribution of charged residues on the surface of the native protein was tested experimentally. Using the molecule of ubiquitin as a model system, we predicted possible amino acid substitutions on the surface of this protein which would lead to an increase in its stability. Experimental validation for this prediction was achieved by measuring the stabilities of single-site-substituted ubiquitin variants using urea-induced unfolding monitored by far-UV CD spectroscopy. We show that the generated variants of ubiquitin are indeed more stable than the wild-type protein, in qualitative agreement with the theoretical prediction. As a positive control, theoretical predictions for destabilizing amino acid substitutions on the surface of the ubiquitin molecule were considered as well. These predictions were also tested experimentally using correspondingly designed variants of ubiquitin. We found that these variants are less stable than the wild-type protein, again in agreement with the theoretical prediction. These observations provide guidelines for rational design of more stable proteins and suggest a possible mechanism of structural stability of proteins from thermophilic organisms.

7.
Thomas ST  Makhatadze GI 《Biochemistry》2000,39(33):10275-10283
The contribution of the hydrophobic contact in the C-capping motif of the alpha-helix to the thermodynamic stability of the ubiquitin molecule has been analyzed. For this, 16 variants of ubiquitin containing the full combinatorial set of four nonpolar residues Val, Ile, Leu, and Phe at the C4 (Ile30) and C'' (Ile36) positions were generated. The secondary structure content, as estimated using far-UV circular dichroism (CD) spectroscopy, of all but the Phe variants at position 30 did not show notable changes upon substitution. The thermodynamic stability of these ubiquitin variants was measured using differential scanning calorimetry, and it was shown that all variants have lower stability as measured by decreases in the Gibbs energy. Because in some cases the decrease in stability was so dramatic that it rendered the protein unfolded, it was concluded that, despite apparent preservation of the secondary structure, the 30/36 hydrophobic contact is essential for the stability of the ubiquitin molecule. The decrease in the Gibbs energy in many cases was found to be accompanied by a large (up to 25%) decrease in the enthalpy of unfolding, particularly significant in the variants containing Ile to Leu substitutions. This decrease in enthalpy of unfolding is proposed to be primarily the result of the perturbed packing interactions in the native state of the Ile → Leu variants. The analysis of these data and comparison with effects of similar amino acid substitutions on the stability of other model systems suggest that Ile → Leu substitutions cannot be isoenergetic at the buried site.

8.
Barenboim M  Masso M  Vaisman II  Jamison DC 《Proteins》2008,71(4):1930-1939
There is substantial interest in methods designed to predict the effect of nonsynonymous single nucleotide polymorphisms (nsSNPs) on protein function, given their potential relationship to heritable diseases. Current state-of-the-art supervised machine learning algorithms, such as random forest (RF), train models that classify single amino acid mutations in proteins as either neutral or deleterious to function. However, it is frequently the case that the functional effect of a polymorphism on a protein resides between these two extremes. The utilization of classifiers that incorporate fuzzy logic provides a natural extension in order to account for the spectrum of possible functional consequences. We generated a dataset of single amino acid substitutions in human proteins having known three-dimensional structures. Each variant was uniquely represented as a feature vector that included computational geometry and knowledge-based statistical potential predictors obtained through application of Delaunay tessellation of protein structures. Additional attributes consisted of physicochemical properties of the native and replacement amino acids as well as topological location of the mutated residue position in the solved structure. Classification performance of the RF algorithm was evaluated on a training set consisting of the disease-associated and neutral nsSNPs taken from our dataset, and attributes were ranked according to their relative importance. Similarly, we evaluated the performance of an adaptive neuro-fuzzy inference system (ANFIS). The utility of statistical geometry predictors was compared with that of traditional structural and evolutionary attributes employed by other researchers, revealing an equally effective yet complementary methodology. Among all attributes in our feature set, the statistical geometry predictors were found to be the most highly ranked.
On the basis of the AUC (area under the ROC curve) measure of performance, the ANFIS and RF models were equally effective when only statistical geometry features were utilized. Tenfold cross-validation studies evaluating AUC, balanced error rate (BER), and Matthews correlation coefficient (MCC) showed that our RF model was at least comparable with the well-established methods of SIFT and PolyPhen. The trained RF and ANFIS models were each subsequently used to predict the disease potential of human nsSNPs in our dataset that are currently unclassified (http://rna.gmu.edu/FuzzySnps/).
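
Two of the measures named above, BER and MCC, have simple closed forms in terms of confusion-matrix counts. The sketch below uses the standard textbook definitions; the function names and the counts are hypothetical and do not come from the study.

```python
import math


def matthews_correlation(tp, fn, tn, fp):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def balanced_error_rate(tp, fn, tn, fp):
    """BER = mean of the false-negative rate and the false-positive rate."""
    return 0.5 * (fn / (tp + fn) + fp / (tn + fp))


# Hypothetical counts for illustration only.
counts = dict(tp=40, fn=10, tn=30, fp=20)
mcc_value = matthews_correlation(**counts)
ber_value = balanced_error_rate(**counts)
print(f"MCC={mcc_value:.3f} BER={ber_value:.3f}")
```

Both measures are less sensitive to class imbalance than plain accuracy, which is a common reason for reporting them alongside AUC.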

9.
The simplest approximation of the interaction potential between amino acid residues in proteins is the contact potential, which defines the effective free energy of a protein conformation by the set of amino acid contacts formed in this conformation. Finding a contact potential capable of predicting free energies of protein states across a variety of protein families will aid protein folding and engineering in silico on a computationally tractable time-scale. We test the ability of contact potentials to accurately and transferably (across various protein families) predict stability changes of proteins upon mutations. We develop a new methodology to determine the contact potentials in proteins from experimental measurements of changes in protein thermodynamic stability (ΔΔG) upon mutations. We apply our methodology to derive sets of contact interaction parameters for a hierarchy of interaction models including solvation and multi-body contact parameters. We test how well our models reproduce experimental measurements by statistical tests. We evaluate the maximum accuracy of predictions obtained by using contact potentials and the correlation between parameters derived from different datasets of experimental ΔΔG values. We argue that it is impossible to reach experimental accuracy and derive fully transferable contact parameters using contact models of potentials. However, contact parameters may yield reliable predictions of ΔΔG for datasets of mutations confined to the same amino acid positions in the sequence of a single protein.
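
A pairwise contact potential of the kind described here can be sketched in a few lines: the energy of a conformation is the sum of a residue-pair parameter over all contacts, and the effect of a substitution is the difference between mutant and wild-type sums. This is a simplified illustration, not the authors' model. It assumes the contact map is unchanged by the mutation and ignores the unfolded state, so the result is a native-state energy change rather than a full ΔΔG. The parameter table, sequence, and contact list are hypothetical.

```python
def contact_energy(sequence, contacts, epsilon):
    """Sum pairwise contact parameters over all (i, j) contacts of a conformation."""
    total = 0.0
    for i, j in contacts:
        # Sort the residue pair so that (A, L) and (L, A) share one table entry.
        pair = tuple(sorted((sequence[i], sequence[j])))
        total += epsilon[pair]
    return total


def mutation_energy_change(sequence, position, new_residue, contacts, epsilon):
    """Energy(mutant) - Energy(wild type), keeping the same contact map."""
    mutant = sequence[:position] + new_residue + sequence[position + 1:]
    return contact_energy(mutant, contacts, epsilon) - contact_energy(sequence, contacts, epsilon)


# Hypothetical parameters in arbitrary energy units (more negative = more favorable).
epsilon = {("A", "A"): -0.2, ("A", "L"): -0.5, ("L", "L"): -1.0}
sequence = "LAL"
contacts = [(0, 2), (0, 1)]  # residue index pairs that are in contact

change = mutation_energy_change(sequence, 2, "A", contacts, epsilon)
print(f"native-state energy change for L3A: {change:+.2f}")
```

Here the L-to-A substitution replaces a favorable L-L contact with a weaker A-L contact, so the computed change is positive (destabilizing) under these made-up parameters.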

10.
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search of causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
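
The core idea of an alignment-based delta score, the change in similarity to a homolog before and after introducing a variation, can be sketched as below. This is a toy illustration and not the PROVEAN implementation: it uses a plain global alignment with hypothetical match, mismatch, and linear gap scores instead of a substitution matrix, and a single made-up homolog rather than a set of related sequences.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


def delta_score(query, variant, homolog):
    """Similarity of the variant to the homolog minus that of the original query."""
    return global_alignment_score(variant, homolog) - global_alignment_score(query, homolog)


# Hypothetical sequences for illustration only.
query = "MKTAYIAK"
homolog = "MKTAYIAK"
substitution = "MKTAWIAK"  # Y5W, a single amino acid substitution
deletion = "MKTYIAK"       # in-frame deletion of A4

print(delta_score(query, substitution, homolog))
print(delta_score(query, deletion, homolog))
```

Because the score is defined on whole-sequence alignments rather than on a single column, the same function handles substitutions, insertions, and deletions, which is the property the abstract emphasizes.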

11.
An empirical method for estimating the effects of single amino acid substitutions on the structural stability of proteins with known spatial structure is developed. Twenty physical and chemical properties of amino acids and characteristics of protein tertiary structure were analysed to determine those most involved in producing instability. We used data on 330 mutant variants of the alpha- and beta-subunits of human haemoglobin to choose the parameters of the method, which yielded a prediction accuracy of 81% for stability estimates of human mutant haemoglobins.

12.
Identification and characterization of antigenic determinants on proteins has received considerable attention using both experimental and computational methods. Computational routines have mostly used structural and physicochemical parameters for predicting the antigenic propensity of protein sites. However, the performance of computational routines has been low compared with experimental alternatives. Here we describe the construction of machine learning based classifiers to enhance the prediction quality for identifying linear B-cell epitopes on proteins. Our approach combines several parameters previously associated with antigenicity, and includes novel parameters based on frequencies of amino acids and amino acid neighborhood propensities. We utilized machine learning algorithms for deriving antigenicity classification functions assigning antigenic propensities to each amino acid of a given protein sequence. We compared the prediction quality of the novel classifiers with that of established routines for epitope scoring, and tested prediction accuracy on experimental data available for HIV proteins. The major finding is that machine learning classifiers clearly outperform the reference classification systems on the HIV epitope validation set.

13.
Synonymous variations, which are defined as codon substitutions that do not change the encoded amino acid, were previously thought to have no effect on the properties of the synthesized protein(s). However, mounting evidence shows that these "silent" variations can have a significant impact on protein expression and function and should no longer be considered "silent". Here, the effects of six synonymous and six non-synonymous variations, previously found in the gene of ADAMTS13, the von Willebrand Factor (VWF) cleaving hemostatic protease, have been investigated using a variety of approaches. The ADAMTS13 mRNA and protein expression levels, as well as the conformation and activity of the variants, have been compared with those of wild-type ADAMTS13. Interestingly, not only the non-synonymous variants but also the synonymous variants have been found to change the protein expression levels, conformation and function. Bioinformatic analysis of ADAMTS13 mRNA structure, amino acid conservation and codon usage allowed us to establish correlations between mRNA stability, RSCU, and intracellular protein expression. This study demonstrates that variants, and more specifically synonymous variants, can have a substantial and definite effect on ADAMTS13 function and that bioinformatic analysis may allow development of predictive tools to identify variants that will have significant effects on the encoded protein.

14.
We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by estimating probability distributions over subsequences of amino acids from the protein. Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wild-cards in the conditioning sequences. Since substitutions of amino acids are common in protein families, incorporating wild-cards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. As protein databases become larger, data-driven learning algorithms for probabilistic models such as SMTs will require vast amounts of memory. We therefore describe and use efficient data structures to improve the memory usage of SMTs. We evaluate SMTs by building protein family classifiers using the Pfam and SCOP databases and compare our results to previously published results and state-of-the-art protein homology detection methods. SMTs outperform previous probabilistic suffix tree methods and under certain conditions perform comparably to state-of-the-art protein homology methods.

15.
One of the unique traits of membrane proteins is that a significant fraction of their hydrophobic amino acids is exposed to the hydrophobic core of lipid bilayers rather than being embedded in the protein interior, which is often not explicitly considered in protein structure and function predictions. Here, we propose a characteristic and predictive quantity, the membrane contact probability (MCP), to describe the likelihood of the amino acids of a given sequence being in direct contact with the acyl chains of lipid molecules. We show that MCP is complementary to solvent accessibility in characterizing the outer surface of membrane proteins, and it can be predicted for any given sequence with a machine learning-based method by utilizing a training dataset extracted from MemProtMD, a database generated from molecular dynamics simulations of membrane proteins with a known structure. As the first of many potential applications, we demonstrate that MCP can be used to systematically improve the prediction precision of protein contact maps and structures.

16.
The twin arginine transport (Tat) system transports folded proteins across the prokaryotic cytoplasmic membrane and the plant thylakoid membrane. TatC is the largest and most conserved component of the Tat machinery. It forms a multisubunit complex with TatB and binds the signal peptides of Tat substrates. Here we have taken a random mutagenesis approach to identify substitutions in Escherichia coli TatC that inactivate protein transport. We identify 32 individual amino acid substitutions that abolish or severely compromise TatC activity. The majority of the inactivating substitutions fall within the first two periplasmic loops of TatC. These regions are predicted to have conserved secondary structure and results of extensive amino acid insertion and deletion mutagenesis are consistent with these conserved elements being essential for TatC function. Three inactivating substitutions were identified in the fifth transmembrane helix of TatC. The inactive M205R variant could be suppressed by mutations affecting amino acids in the transmembrane helix of TatB. A physical interaction between TatC helix 5 and the TatB transmembrane helix was confirmed by the formation of a site-specific disulphide bond between TatC M205C and TatB L9C variants. This is the first molecular contact site mapped to single amino acid level between these two proteins.

17.
Some amino acid substitutions in phage P22 coat protein cause a temperature-sensitive folding (tsf) phenotype. In vivo, these tsf amino acid substitutions cause coat protein to aggregate and form intracellular inclusion bodies when folded at high temperatures, but at low temperatures the proteins fold properly. Here the effects of tsf amino acid substitutions on folding and unfolding kinetics and the stability of coat protein in vitro have been investigated to determine how the substitutions change the ability of coat protein to fold properly. The equilibrium unfolding transitions of the tsf variants were best fit to a three-state model, N ↔ I ↔ U, where all species concerned were monomeric, a result confirmed by velocity sedimentation analytical ultracentrifugation. The primary effect of the tsf amino acid substitutions on the equilibrium unfolding pathway was to decrease the stability (ΔG) and the solvent accessibility (m-value) of the N ↔ I transition. The kinetics of folding and unfolding of the tsf coat proteins were investigated using tryptophan fluorescence and circular dichroism (CD) at 222 nm. The tsf amino acid substitutions increased the rate of unfolding by 8-14-fold, with little effect on the rate of folding, when monitored by tryptophan fluorescence. In contrast, when folding or unfolding reactions were monitored by CD, the reactions were too fast to be observed. The tsf coat proteins are natural substrates for the molecular chaperones, GroEL/S. When native tsf coat protein monomers were incubated with GroEL, they bound efficiently, indicating that a folding intermediate was significantly populated even without denaturant. Thus, the tsf coat proteins aggregate in vivo because of an increased propensity to populate this unfolding intermediate.
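
A three-state equilibrium model of the kind mentioned above (native, intermediate, unfolded) is commonly written with the linear extrapolation assumption, in which each transition's free energy falls linearly with denaturant concentration: ΔG([D]) = ΔG(H2O) − m·[D]. The sketch below computes the equilibrium populations under that assumption. The function name and every parameter value are hypothetical and are not fitted values from the study.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)


def three_state_fractions(denaturant, dg_ni, m_ni, dg_iu, m_iu, temperature=298.15):
    """Return equilibrium fractions (f_N, f_I, f_U) for N <-> I <-> U.

    dg_* are free energies in water (kJ/mol); m_* are m-values (kJ/mol per M).
    """
    rt = GAS_CONSTANT * temperature
    k_ni = math.exp(-(dg_ni - m_ni * denaturant) / rt)  # [I]/[N]
    k_iu = math.exp(-(dg_iu - m_iu * denaturant) / rt)  # [U]/[I]
    partition = 1.0 + k_ni + k_ni * k_iu
    return 1.0 / partition, k_ni / partition, k_ni * k_iu / partition


# Hypothetical parameters for illustration only.
params = dict(dg_ni=20.0, m_ni=5.0, dg_iu=15.0, m_iu=5.0)
for concentration in (0.0, 3.5, 8.0):
    f_n, f_i, f_u = three_state_fractions(concentration, **params)
    print(f"[D]={concentration:.1f} M  N={f_n:.3f}  I={f_i:.3f}  U={f_u:.3f}")
```

In this framework, lowering ΔG or the m-value of the N ↔ I transition, as reported for the tsf variants, raises the intermediate population at low denaturant concentrations.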

18.
When theoretical methods are used to predict the properties of a given system, such as the effects of the substitution of a specific amino acid on the activity or stability of a protein as a whole, the accuracy of the prediction is directly dependent on the validity of the underlying model. A common error, however, is to attempt to improve a basically crude model by performing one aspect of the calculation in a rigorous manner. The accuracy of the model as a whole will remain limited by the crudest approximation or weakest assumption. To demonstrate the principle that nothing can be gained by performing extensive calculations using a basically crude underlying model, we compare the predictive power of three models in relation to activity and stability data for 78 triple-site sequence variants of the lambda-repressor protein. This system has recently been analysed in terms of a conceptually simple, but computationally elaborate model for the prediction of the energy of a protein in which amino acid residues in the core of the protein have been mutated. We show that comparable, if not better, agreement with the experimental data can be reached using either of two much simpler models, based on straightforward structural considerations, which do not require elaborate calculations on a computer.

19.
Reliable prediction of free energy changes upon amino acid substitutions (ΔΔGs) is crucial to investigate their impact on protein stability and protein–protein interaction. Advances in experimental mutational scans allow high-throughput studies thanks to multiplex techniques. On the other hand, genomics initiatives provide a large amount of data on disease-related variants that can benefit from analyses with structure-based methods. Therefore, the computational field should keep the same pace and provide new tools for fast and accurate high-throughput ΔΔG calculations. In this context, the Rosetta modeling suite implements effective approaches to predict folding/unfolding ΔΔGs in a protein monomer upon amino acid substitutions and calculate the changes in binding free energy in protein complexes. However, their application can be challenging to users without extensive experience with Rosetta. Furthermore, Rosetta protocols for ΔΔG prediction are designed considering one variant at a time, making the setup of high-throughput screenings cumbersome. For these reasons, we devised RosettaDDGPrediction, a customizable Python wrapper designed to run free energy calculations on a set of amino acid substitutions using Rosetta protocols with little intervention from the user. Moreover, RosettaDDGPrediction assists with checking completed runs, aggregates raw data for multiple variants, and generates publication-ready graphics. We showed the potential of the tool in four case studies, including variants of uncertain significance in childhood cancer, proteins with known experimental unfolding ΔΔG values, interactions between target proteins and disordered motifs, and phosphomimetics. RosettaDDGPrediction is available, free of charge and under GNU General Public License v3.0, at https://github.com/ELELAB/RosettaDDGPrediction.

20.
Pim-1 kinase, a serine/threonine protein kinase encoded by the pim proto-oncogene, is involved in several signalling pathways such as the regulation of cell cycle progression and apoptosis. Many cancer types show high expression levels of Pim kinases, and Pim-1 in particular has been linked to the initiation and progression of the malignant phenotype. In several cancer tissues somatic Pim-1 mutants have been identified. These natural variants are nonsynonymous single nucleotide polymorphisms, variations of a single nucleotide occurring in the coding region and leading to amino acid substitutions. In this study we investigated the effect of amino acid substitution on the structural stability and on the activity of Pim-1 kinase. We expressed and purified some of the mutants of Pim-1 kinase that are expressed in cancer tissues and reported in the single nucleotide polymorphisms database. The point mutations in the variants significantly affect the conformation of the native state of Pim-1. All the mutants, expressed as soluble recombinant proteins, show decreased thermal and thermodynamic stability and lower activation energy values for kinase activity. The decreased stability accompanied by an increased flexibility suggests that Pim-1 variants may be involved in a wider network of protein interactions. All mutants bound ATP and ATP mimetic inhibitors with comparable IC50 values, suggesting that the studied Pim-1 kinase mutants can be efficiently targeted with inhibitors developed for the wild-type protein.
