Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Cheng J, Randall A, Baldi P. Proteins 2006, 62(4):1125-1132
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations, leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods, which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.

2.
The study of the evolution of compensatory mechanisms among amino acids is paramount to our understanding of intramolecular epistatic interactions. It has been addressed from different points of view; for example, much effort has been devoted to establishing the number of compensatory mutations required per deleterious mutation. However, we still do not know how the nature of the compensated mutation determines the existence of compensatory mutations. Within this context, recent studies have produced several instances of an interesting phenomenon: human disease-associated residues may sometimes appear as wild-type residues in non-human proteins. This can be explained in terms of compensatory mutations, present in the non-human protein, which would neutralize the damage caused by the disease-associated residue. Therefore, comparison between these compensated mutations and non-compensated pathological mutations provides a simple approach to understanding how the nature of the compensated deleterious mutation determines the existence of compensatory mutations. To address this issue, we have obtained a large set of compensated mutations and characterised them with a series of different properties. When comparing the resulting distributions with those from pathological mutations, we find that in general compensated mutations are milder than pathological mutations. More precisely, we find that the probability that a compensatory mutation will evolve is directly related (i) to the location in the protein structure and (ii) to changes in physico-chemical properties (e.g. amino acid volume or hydrophobicity) of the compensated mutation.

3.
The development of methods to assess the impact of amino acid mutations on human health has become an important goal in biomedical research, due to the growing number of nonsynonymous SNPs identified. Within this context, computational methods constitute a valuable tool, because they can easily process large numbers of mutations and give useful, almost cost-free information on their pathological character. In this paper we present a computational approach to the prediction of disease-associated amino acid mutations, using only sequence-based information (amino acid properties, evolutionary information, secondary structure and accessibility predictions, and database annotations) and neural networks as a model-building tool. Mutations are predicted to be either pathological or neutral. Our results show that the method has a good overall success rate, 83%, that can reach 95% when trained for specific proteins. The methodology is fast and flexible enough to provide good estimates of the pathological character of large sets of nonsynonymous SNPs, but can also be easily adapted to give more precise predictions for proteins of special biomedical interest.

4.
Although mutation analysis is a key part of making a definitive diagnosis of a genetic disease, interpreting the biological implications of mutations by integrating various lines of archived information about the genes in question remains a time-consuming step. To expedite this evaluation step for disease-causing genetic variations, we developed Mutation@A Glance (http://rapid.rcai.riken.jp/mutation/), a highly integrated web-based analysis tool for analysing human disease mutations; it implements a user-friendly graphical interface to visualize about 40 000 known disease-associated mutations and genetic polymorphisms from more than 2600 protein-coding human disease-causing genes. Mutation@A Glance locates already known genetic variation data individually on the nucleotide and the amino acid sequences and makes it possible to cross-reference them with tertiary and/or quaternary protein structures and various functional features associated with specific amino acid residues in the proteins. We showed that the disease-associated missense mutations had a stronger tendency to reside in positions relevant to the structure/function of proteins than neutral genetic variations. From a practical viewpoint, Mutation@A Glance could certainly function as a ‘one-stop’ analysis platform for newly determined DNA sequences, which enables us to readily identify and evaluate new genetic variations by integrating multiple lines of information about the disease-causing candidate genes.

5.
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi-nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.

6.
7.
8.
9.
Predicting the effect of a single amino acid substitution on the stability of a protein structure is a fundamental task in macromolecular modeling. It has relevance to drug design and understanding of disease-causing protein variants. We present KINARI-Mutagen, a web server for performing in silico mutation experiments on protein structures from the Protein Data Bank. Our rigidity-theoretical approach permits fast evaluation of the effects of mutations that may not be easy to perform in vitro, because it is not always possible to express a protein with a specific amino acid substitution. We use KINARI-Mutagen to identify critical residues, and we show that our predictions correlate with destabilizing mutations to glycine. In two in-depth case studies we show that the mutated residues identified by KINARI-Mutagen as critical correlate with experimental data, and would not have been identified by other methods such as Solvent Accessible Surface Area measurements or residue ranking by contributions to stabilizing interactions. We also generate 48 mutants for 14 proteins, and compare our rigidity-based results against experimental mutation stability data. KINARI-Mutagen is available at http://kinari.cs.umass.edu.

10.
Predicting protein sequences that fold into specific native three-dimensional structures is a problem of great potential complexity. Although the complete solution is ultimately rooted in understanding the physical chemistry underlying the complex interactions between amino acid residues that determine protein stability, recent work shows that empirical information about these first principles is embedded in the statistics of protein sequence and structure databases. This review focuses on the use of 'knowledge-based' potentials derived from these databases in designing proteins. In addition, the data suggest how the study of these empirical potentials might impact our fundamental understanding of the energetic principles of protein structure.

11.
Lee S, Lee BC, Kim D. Proteins 2006, 62(4):1107-1114
Knowing protein structure and inferring its function from the structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure content. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment by running PSI-BLAST, those 20 numbers are learned by multiple linear regression, an artificial neural network, and support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
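As a rough illustration of the "homolog-averaged amino acid composition" mentioned in this abstract, the sketch below computes the 20 composition fractions of each aligned sequence and averages them over the alignment. The function names, the gap handling, and the equal weighting of homologs are assumptions made for illustration; the abstract does not specify these details, and the alignment itself would come from a tool such as PSI-BLAST.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Return the 20 amino acid fractions of one sequence.

    Gap characters and non-standard letters are ignored (an assumption).
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]

def homolog_averaged_composition(aligned_seqs):
    """Average the per-sequence compositions over all rows of an alignment.

    Every homolog gets equal weight here; this is a hypothetical choice.
    """
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    comps = [composition(s) for s in aligned_seqs]
    n = len(comps)
    return [sum(col) / n for col in zip(*comps)]

# Toy alignment rows (made up for illustration, not real data).
features = homolog_averaged_composition(["ACD-", "ACDE", "A-DE"])
```

The resulting 20-element vector is the kind of input that a regression model could then be trained on.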

12.
The vast majority of our knowledge on protein stability arises from the study of simple two-state models. However, proteins displaying equilibrium intermediates under certain conditions abound and it is unclear whether the energetics of native/intermediate equilibria is well represented in current knowledge. We consider here that the overall conformational stability of three-state proteins is made of a "relevant" term and a "residual" one, corresponding to the free energy differences of the native to intermediate (N-to-I) and intermediate to denatured (I-to-D) equilibria, respectively. The N-to-I free energy difference is considered to be the relevant stability because protein-unfolding intermediates are likely devoid of biological activity. We use surface charge optimisation to first increase the overall (N-to-D) stability of a model three-state protein (apoflavodoxin) and then investigate whether the stabilisation obtained is realised as relevant or as residual stability. Most of the mutations designed from electrostatic calculations or from simple sequence conservation analysis produce large increases in the overall stability of the protein. However, in most cases, this simply leads to similarly large increases of the residual stability. Two mutations, nevertheless, show a different trend and increase the relevant stability of the protein substantially. When all the mutations are mapped onto the structure of the apoflavodoxin thermal-unfolding intermediate (obtained independently by equilibrium phi-analysis and NMR) they cluster perfectly so that the mutations increasing the relevant stability appear in the small unstructured region of the intermediate and the others in the native-like region. This illustrates the need for specific investigation of N-to-I equilibria and the structure of protein intermediates, and indicates that it is possible to rationally stabilise a protein against partial unfolding once the structure of the intermediate conformation is known, even if at low resolution.
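The decomposition described in this abstract can be written compactly as follows. The symbols are notation introduced here for illustration and are not taken from the abstract; N, I and D stand for the native, intermediate and denatured states.

```latex
\Delta G_{\mathrm{N \to D}}
  = \underbrace{\Delta G_{\mathrm{N \to I}}}_{\text{relevant stability}}
  + \underbrace{\Delta G_{\mathrm{I \to D}}}_{\text{residual stability}}
```

Under this reading, a mutation that raises the overall N-to-D stability only through the I-to-D term leaves the relevant stability unchanged, which matches the behaviour the abstract reports for most of the designed mutations.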

13.
14.
Over the last several years, we have demonstrated that mutations are more likely to occur at randomly unpredictable amino acid pairs in a protein. We can therefore, in principle, predict the amino acid pairs sensitive to future mutations in a protein. However, we still need to predict the positions at which the sensitive amino acid pairs are located in a protein. In this study, we use a probabilistic approach to analyze the effect of 191 mutations in the human p53 protein and approximately estimate the positions sensitive to mutations in this protein.

15.
We develop an approximate maximum likelihood method to estimate flanking nucleotide context-dependent mutation rates and amino acid exchange-dependent selection in orthologous protein-coding sequences and use it to analyze genome-wide coding sequence alignments from mammals and yeast. Allowing context-dependent mutation provides a better fit to coding sequence data than simpler (context-independent or CpG "hotspot") models and significantly affects selection parameter estimates. Allowing asymmetric (nonreciprocal) selection on amino acid exchanges gives a better fit than simple dN/dS or symmetric selection models. Relative selection strength estimates from our models show good agreement with independent estimates derived from human disease-causing and engineered mutations. Selection strengths depend on local protein structure, showing expected biophysical trends in helical versus nonhelical regions and increased asymmetry on polar-hydrophobic exchanges with increased burial. The more stringent selection that has previously been observed for highly expressed proteins is primarily concentrated in buried regions, supporting the notion that such proteins are under stronger than average selection for stability. Our analyses indicate that a highly parameterized model of mutation and selection is computationally tractable and is a useful tool for exploring a variety of biological questions concerning protein and coding sequence evolution.

16.
The simplest approximation of the interaction potential between amino acid residues in proteins is the contact potential, which defines the effective free energy of a protein conformation by the set of amino acid contacts formed in this conformation. Finding a contact potential capable of predicting free energies of protein states across a variety of protein families will aid protein folding and engineering in silico on a computationally tractable time-scale. We test the ability of contact potentials to accurately and transferably (across various protein families) predict stability changes of proteins upon mutations. We develop a new methodology to determine the contact potentials in proteins from experimental measurements of changes in a protein's thermodynamic stability (ΔΔG) upon mutations. We apply our methodology to derive sets of contact interaction parameters for a hierarchy of interaction models including solvation and multi-body contact parameters. We test how well our models reproduce experimental measurements by statistical tests. We evaluate the maximum accuracy of predictions obtained by using contact potentials and the correlation between parameters derived from different datasets of experimental ΔΔG values. We argue that it is impossible to reach experimental accuracy and derive fully transferable contact parameters using contact models of potentials. However, contact parameters may yield reliable predictions of ΔΔG for datasets of mutations confined to the same amino acid positions in the sequence of a single protein.
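To make the idea of a contact potential concrete, here is a minimal sketch in which the energy of a conformation is the sum of pairwise parameters over its residue contacts, and a mutation's stability change is approximated as the difference in that sum. The parameter values, the contact list, and the assumptions that the unfolded state forms no contacts and that the contact map is unchanged by the mutation are all hypothetical; this is not the fitting methodology of the abstract.

```python
def contact_energy(seq, contacts, eps):
    """Sum the pair parameters eps over all residue contacts (i, j)."""
    total = 0.0
    for i, j in contacts:
        # Keys are stored with residues in alphabetical order, so the
        # lookup is symmetric in the two contacting residues.
        pair = tuple(sorted((seq[i], seq[j])))
        total += eps[pair]
    return total

def ddg_estimate(seq, pos, new_res, contacts, eps):
    """Approximate the stability change of a point mutation.

    Assumes the contact map is unchanged by the mutation and that the
    unfolded state contributes nothing (both are simplifications).
    With more negative energy meaning more stable, a positive result
    suggests a destabilizing mutation.
    """
    mutant = seq[:pos] + new_res + seq[pos + 1:]
    return contact_energy(mutant, contacts, eps) - contact_energy(seq, contacts, eps)

# Toy parameters in arbitrary units (made up for illustration).
eps = {("A", "A"): -0.1, ("A", "L"): -0.3, ("L", "L"): -1.0}
contacts = [(0, 2)]  # residues 0 and 2 touch in the folded state
change = ddg_estimate("LAL", 2, "A", contacts, eps)
```

In this toy case, replacing the leucine at position 2 with alanine swaps an L-L contact for a weaker A-L contact, so the estimate comes out positive.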

17.
The most common form of systemic amyloidosis originates from antibody light chains. The large number of amino acid variations that distinguish amyloidogenic from nonamyloidogenic light chain proteins has impeded our understanding of the structural basis of light-chain fibril formation. Moreover, even among the subset of human light chains that are amyloidogenic, many primary structure differences are found. We compared the thermodynamic stabilities of two recombinant kappa4 light-chain variable domains (V(L)s) derived from amyloidogenic light chains with a V(L) from a benign light chain. The amyloidogenic V(L)s were significantly less stable than the benign V(L). Furthermore, only the amyloidogenic V(L)s formed fibrils under native conditions in an in vitro fibril formation assay. We used site-directed mutagenesis to examine the consequences of individual amino acid substitutions found in the amyloidogenic V(L)s on stability and fibril formation capability. Both stabilizing and destabilizing mutations were found; however, only destabilizing mutations induced fibril formation in vitro. We found that fibril formation by the benign V(L) could be induced by low concentrations of a denaturant. This indicates that there are no structural or sequence-specific features of the benign V(L) that are incompatible with fibril formation, other than its greater stability. These studies demonstrate that the V(L) beta-domain structure is vulnerable to destabilizing mutations at a number of sites, including complementarity determining regions (CDRs), and that loss of variable domain stability is a major driving force in fibril formation.

18.
Understanding and predicting how amino acid substitutions affect proteins is key to our basic understanding of protein function and evolution. Amino acid changes may affect protein function in a number of ways, including direct perturbations of activity or indirect effects on protein folding and stability. We have analyzed 6,749 experimentally determined variant effects from multiplexed assays on abundance and activity in two proteins (NUDT15 and PTEN) to quantify these effects and find that a third of the variants cause loss of function, and about half of loss-of-function variants also have low cellular abundance. We analyze the structural and mechanistic origins of loss of function and use the experimental data to find residues important for enzymatic activity. We performed computational analyses of protein stability and evolutionary conservation and show how we may predict positions where variants cause loss of activity or abundance. In this way, our results link thermodynamic stability and evolutionary conservation to experimental studies of different properties of protein fitness landscapes.

19.
A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, the rapid pace at which sequence and structural information is being accumulated for proteins greatly exceeds our ability to determine their biochemical roles experimentally. As a result, computational methods are required which allow for the efficient processing of the evolutionary information contained in this wealth of data, in particular that related to the nature and location of functionally important sites and residues. The method presented here, referred to as conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis can fully or partially predict the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other methods available, it is able to tolerate wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical, and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein needs to be validated experimentally, this example illustrates the scope of CFG analysis as a general tool for the identification of residues likely to play an important role in a protein's biochemical function. Thus, our method offers a convenient solution to rapidly and automatically process the vast amounts of data that are beginning to emerge from structural genomics projects.

20.
The structure of a protein molecule consists of both rigid and flexible sections to satisfy the demands for stability and catalysis. Because the flexibility of a protein segment is indispensable for a proteolytic attack, limited proteolysis is a superb tool to analyse both confined local fluctuations and global unfolding events in proteins. While the identification of the primary cleavage products allows the assignment of the flexible regions to the primary structure, the kinetics of proteolytic degradation enables differentiation between local fluctuations in the native protein molecule and the global unfolding process during denaturation. Modifications of the amino acid sequence in the concerned regions can tune proteolytic susceptibility and alter protein stability. In the present paper, we summarise our results on native-state and unfolded-state proteolysis of ribonuclease A (RNase A) and the effect of mutations in the detected flexible regions on the stability and unfolding of the RNase A molecule.
