Similar articles
 20 similar articles found.
1.
Methods for automated prediction of deleterious protein mutations have utilized both structural and evolutionary information but the relative contribution of these two factors remains unclear. To address this, we have used a variety of structural and evolutionary features to create simple deleterious mutation models that have been tested on both experimental mutagenesis and human allele data. We find that the most accurate predictions are obtained using a solvent-accessibility term, the C(beta) density, and a score derived from homologous sequences, SIFT. A classification tree using these two features has a cross-validated prediction error of 20.5% on an experimental mutagenesis test set when the prior probability for deleterious and neutral cases is equal, whereas this prediction error is 28.8% and 22.2% using either the C(beta) density or SIFT alone. The improvement imparted by structure increases when fewer homologs are available: when restricted to three homologs the prediction error improves from 26.9% using SIFT alone to 22.4% using SIFT and the C(beta) density, or 24.8% using SIFT and a noisy C(beta) density term approximating the inaccuracy of ab initio structures modeled by the Rosetta method. We conclude that methods for deleterious mutation prediction should include structural information when fewer than five to ten homologs are available, and that ab initio predicted structures may soon be useful in such cases when high-resolution structures are unavailable.

2.
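Rapid site-directed mutagenesis of full-length plasmids using the DREAM technique   Cited by: 2 (self-citations: 1, citations by others: 1)
Rapid site-directed mutagenesis of full-length plasmids was performed using Designed Restriction Enzyme Assisted Mutagenesis (DREAM). Starting from the target amino acid sequence near the mutation site, the coding sequence is reverse-translated with degenerate codons, which yields a very large number of silent mutants without changing the amino acid sequence. These silent mutants contain many restriction enzyme sites. A suitable site is chosen for primer design, the full-length plasmid DNA is amplified with Phusion ultra-high-fidelity DNA polymerase, the PCR product is 5′-phosphorylated with T4 polynucleotide kinase and blunt-end ligated, and, after transformation into E. coli recipient cells, transformants are rapidly screened using the designed restriction site. In this study the method was used to correct mutated bases in the approximately 8 kb plasmid pcDNA3.1-pIgR, thereby obtaining the wild-type amino acid sequence of the polymeric immunoglobulin receptor (pIgR). These results show that introducing a restriction site into the target gene with DREAM, without altering the amino acid sequence of the target protein, simplifies mutant screening; that, combined with the high-fidelity and high-efficiency Phusion DNA polymerase, full-length plasmids of up to 8 kb can be mutated rapidly; and that the method requires neither a site-directed mutagenesis kit nor special recipient strains, and avoids nucleic acid hybridization and the use of isotopes.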
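The reverse-translation step of DREAM can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function names (`translate`, `silent_sites`), the four-enzyme table, and the example peptides are hypothetical choices made for the sketch; only the standard genetic code and the listed recognition sequences are taken as known facts.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG base ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Illustrative subset of enzymes (assumption: a real search would use a full catalogue).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC",
           "HindIII": "AAGCTT", "XhoI": "CTCGAG"}


def translate(dna):
    """Translate an in-frame DNA string into a one-letter protein string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def silent_sites(peptide, enzymes=ENZYMES):
    """Enumerate every synonymous coding sequence for `peptide` and
    return {enzyme: sorted coding sequences that contain its site}."""
    hits = {}
    for codons in product(*(SYNONYMS[aa] for aa in peptide)):
        dna = "".join(codons)
        for name, site in enzymes.items():
            if site in dna:
                hits.setdefault(name, []).append(dna)
    return {name: sorted(seqs) for name, seqs in hits.items()}
```

For example, `silent_sites("EF")` finds that the dipeptide Glu-Phe can be encoded as `GAATTC`, which carries an EcoRI site. Exhaustive enumeration grows quickly with peptide length, so a sketch like this is only practical for the short windows around a mutation site.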

3.
Proteins that need to be structured in their native state must be stable both against the unfolded ensemble and against incorrectly folded (misfolded) conformations with low free energy. Positive design targets the first type of stability by strengthening native interactions. The second type of stability is achieved by destabilizing interactions that occur frequently in the misfolded ensemble, a strategy called negative design. Here, we investigate negative design adopting a statistical mechanical model of the misfolded ensemble, which improves the usual Gaussian approximation by taking into account the third moment of the energy distribution and contact correlations. Applying this model, we detect and quantify selection for negative design in most natural proteins, and we analytically design protein sequences that are stable both against unfolding and against misfolding. Proteins 2013; 81:1102–1112. © 2013 Wiley Periodicals, Inc.

4.
T Palzkill  D Botstein 《Proteins》1992,14(1):29-44
A new analytical mutagenesis technique is described that involves randomizing the DNA sequence of a short stretch of a gene (3-6 codons) and determining the percentage of all possible random sequences that produce a functional protein. A low percentage of functional random sequences in a complete library of random substitutions indicates that the region mutagenized is important for the structure and/or function of the protein. Repeating the mutagenesis over many regions throughout a protein gives a global perspective of which amino acid sequences in a protein are critical. We applied this method to 66 codons of the gene encoding TEM-1 beta-lactamase in 19 separate experiments. We found that TEM-1 beta-lactamase is extremely tolerant of amino acid substitutions: on average, 44% of all mutants with random substitutions function and 20% of the substitutions are expressed, secreted, and fold well enough to function at levels similar to those for the wild-type enzyme. We also found a few exceptional regions where only a few random sequences function. Examination of the X-ray structures of homologous beta-lactamases indicates that the regions most sensitive to substitution are in the vicinity of the active site pocket or buried in the hydrophobic core of the protein. DNA sequence analysis of functional random sequences has been used to obtain more detailed information about the amino acid sequence requirements for several regions and this information has been compared to sequence conservation among several related beta-lactamases.

5.
In predicting the protein main-chain structure into which a query amino acid sequence folds, one evaluates the relative stability of a candidate structure against reference structures. We developed a statistical theory for calculating the energy distribution over a main-chain structure ensemble, with only the amino acid composition given as a single argument. We then obtained statistical formulae for the ensemble mean and ensemble variance V[E] of the reference structural energies, as explicit functions of the amino acid composition. The mean and the variance V[E] calculated from the formulae were well or roughly consistent with those resulting from a gapless threading simulation. We can use the formulae not only to perform high-throughput screening of sequences in the inverse folding problem, but also to handle the problem analytically.

6.
Although there have been recent transformative advances in the area of protein structure prediction, prediction of point mutations that improve protein stability remains challenging. It is possible to construct and screen large mutant libraries for improved activity or ligand binding. However, reliable screens for mutants that improve protein stability do not yet exist, especially for proteins that are well folded and relatively stable. Here, we demonstrate that incorporation of a single, specific, destabilizing mutation termed parent inactivating mutation into each member of a single-site saturation mutagenesis library, followed by screening for suppressors, allows for robust and accurate identification of stabilizing mutations. We carried out fluorescence-activated cell sorting of such a yeast surface display, saturation suppressor library of the bacterial toxin CcdB, followed by deep sequencing of sorted populations. We found that multiple stabilizing mutations could be identified after a single round of sorting. In addition, multiple libraries with different parent inactivating mutations could be pooled and simultaneously screened to further enhance the accuracy of identification of stabilizing mutations. Finally, we show that individual stabilizing mutations could be combined to result in a multi-mutant that demonstrated an increase in thermal melting temperature of about 20 °C, and that displayed enhanced tolerance to high temperature exposure. We conclude that as this method is robust and employs small library sizes, it can be readily extended to other display and screening formats to rapidly isolate stabilized protein mutants.

7.
Designating amino-acid sequences that fold into a common main-chain structure as "neutral sequences" for the structure, regardless of their function or stability, we investigated the distribution of neutral sequences in protein sequence space. For four distinct target structures (alpha, beta, alpha/beta and alpha+beta types) with the same chain length of 108, we generated the respective neutral sequences by using the inverse folding technique with a knowledge-based potential function. We assumed that neutral sequences for a protein structure have Z scores higher than or equal to fixed thresholds, where thresholds are defined as the Z score for the corresponding native sequence (case 1) or a much greater Z score (case 2). An exploring walk simulation suggested that the neutral sequences mapped into the sequence space were connected with each other through straight neutral paths and formed an inherent neutral network over the sequence space. Through another exploring walk simulation, we investigated contiguous regions between or among the neutral networks for the distinct protein structures and obtained the following results. The closest approach distance between the two neutral networks ranged from 5 to 29 on the Hamming distance scale, showing a linear increase against the threshold values. The sequences located at the "interchange" regions between the two neutral networks have intermediate sequence-profile-scores for both corresponding structures. Introducing a "ball" in the sequence space that contains at least one neutral sequence for each of the four structures, we found that the minimal radius of the ball that is centered at an arbitrary position ranged from 35 to 50, while the minimal radius of the ball that is centered at a certain special position ranged from 20 to 30, on the Hamming distance scale. The relatively small Hamming distances (5-30) may support an evolution mechanism by transferring from a network for a structure to another network for a more beneficial structure via the interchange regions.
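As a small illustration of the distance measure used in this abstract, the Hamming distance between two equal-length sequences, and the closest approach between two sets of sequences, can be computed as below. This is a sketch only: the function names are hypothetical, and brute-force pairwise comparison is an assumption made for clarity, not the exploring-walk simulation the authors describe.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_approach(network_a, network_b):
    """Smallest Hamming distance between any sequence in one set
    and any sequence in the other (brute force, O(|A| * |B|))."""
    return min(hamming(a, b) for a in network_a for b in network_b)
```

With toy four-residue "networks" such as `["AAAA", "AAAC"]` and `["CCCC", "AACC"]`, the closest approach is the single substitution separating `AAAC` from `AACC`.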

8.
Structure prediction methods often generate a large number of models for a target sequence. Even if the correct fold for the target sequence is sampled in this dataset, it is difficult to distinguish it from other decoy structures. An attempt to solve this problem using experimental mutational sensitivity data for the CcdB protein was described previously by exploiting the correlation of residue depth with mutational sensitivity (r ~ 0.6). We now show that such a correlation extends to four other proteins with localized active sites, and for which saturation mutagenesis datasets exist. We also examine whether incorporation of predicted secondary structure information and the DOPE model quality assessment score, in addition to mutational sensitivity, improves the accuracy of model discrimination using a decoy dataset of 163 targets from CASP. Although most CASP models would have been subjected to model quality assessment prior to submission, we find that the DOPE score makes a substantial contribution to the observed improvement. We therefore also applied the approach to CcdB and four other proteins for which reliable experimental mutational data exist and observe that inclusion of experimental mutational data results in a small qualitative improvement in model discrimination relative to that seen with just the DOPE score. This is largely because of our limited ability to quantitatively predict effects of point mutations on in vivo protein activity. Further improvements in the methodology are required to facilitate improved utilization of single mutant data.

9.
Multistate computational protein design (MSD) with backbone ensembles approximating conformational flexibility can predict higher quality sequences than single‐state design with a single fixed backbone. However, it is currently unclear what characteristics of backbone ensembles are required for the accurate prediction of protein sequence stability. In this study, we aimed to improve the accuracy of protein stability predictions made with MSD by using a variety of backbone ensembles to recapitulate the experimentally measured stability of 85 Streptococcal protein G domain β1 sequences. Ensembles tested here include an NMR ensemble as well as those generated by molecular dynamics (MD) simulations, by Backrub motions, and by PertMin, a new method that we developed involving the perturbation of atomic coordinates followed by energy minimization. MSD with the PertMin ensembles resulted in the most accurate predictions by providing the highest number of stable sequences in the top 25, and by correctly binning sequences as stable or unstable with the highest success rate (≈90%) and the lowest number of false positives. The performance of PertMin ensembles is due to the fact that their members closely resemble the input crystal structure and have low potential energy. Conversely, the NMR ensemble as well as those generated by MD simulations at 500 or 1000 K reduced prediction accuracy due to their low structural similarity to the crystal structure. The ensembles tested herein thus represent on‐ or off‐target models of the native protein fold and could be used in future studies to design for desired properties other than stability. Proteins 2014; 82:771–784. © 2013 Wiley Periodicals, Inc.

10.
We have developed a new method for the prediction of peptide sequences that bind to a protein, given a three-dimensional structure of the protein in complex with a peptide. By applying a recently developed sequence prediction algorithm and a novel ensemble averaging calculation, we generate a diverse collection of peptide sequences that are predicted to have significant affinity for the protein. Using output from the simulations, we create position-specific scoring matrices, or virtual interaction profiles (VIPs). Comparison of VIPs for a collection of binding motifs to sequences determined experimentally indicates that the prediction algorithm is accurate and applicable to a diverse range of structures. With these VIPs, one can scan protein sequence databases rapidly to seek binding partners of potential biological significance. Overall, this method can significantly enhance the information contained within a protein-peptide crystal structure, and enrich the data obtained by experimental selection methods such as phage display.
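The idea of turning a set of predicted peptides into a position-specific scoring matrix and scanning a protein sequence with it can be sketched as follows. This is a generic log-odds PSSM, not the authors' VIP construction: the uniform background frequency, the pseudocount of 1, the function names, and the toy peptides in the usage note are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(peptides, pseudocount=1.0):
    """Build a log2-odds matrix (list of {residue: score}) from
    equal-length peptides, assuming a uniform background."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(ALPHABET)
    total = len(peptides) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in ALPHABET})
    return pssm


def scan(sequence, pssm):
    """Score every window of `sequence` and return (best_score, start_index)."""
    width = len(pssm)
    windows = [(sum(pssm[i][sequence[start + i]] for i in range(width)), start)
               for start in range(len(sequence) - width + 1)]
    return max(windows)
```

For instance, a matrix built from the invented peptides `["PPLP", "PPVP", "PPLP"]` places the best-scoring window of `"MKTAPPLPGG"` at index 4, the `PPLP` stretch. Scanning a whole database would simply repeat `scan` over each sequence and rank the results.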

11.
Frenz CM 《Proteins》2005,59(2):147-151
Protein-based therapeutics are playing an increasingly important role in the treatment of diseases, including diabetes and cancer. The viability of these treatments, however, is highly dependent on the stability of the therapeutic, since stability affects both the shelf life of the therapeutic and its active life in the body. Stability engineering can, therefore, be used to increase the effectiveness of protein-based therapeutics. Computational methods of protein stability prediction have been under development for about a decade, but complex molecular interactions make stability prediction difficult and computationally intensive. A rapid computational method of protein stability prediction is developed using feed-forward neural networks and used to predict mutation-induced stability changes in Staphylococcal nuclease. The input to the neural network consisted of sequences of evolutionarily based amino acid similarity scores that were obtained through the comparison of the amino acids in a mutation-containing sequence to their positional counterparts in the baseline wild-type amino acid sequence. A training set was created which consisted of similarity score sequences, for which the stabilities of the corresponding amino acid sequences were known, paired with the relative stabilities of the sequences to that of the baseline. Back-propagation of error was used to train the network to output accurate relative stability scores for the sequences in the training set. Neural network-based relative stability predictions for 55 sequences containing mutation combinations not found in the training set had an accuracy of 92.8%.

12.
13.
Protein trafficking or protein sorting in eukaryotes is a complicated process and is carried out based on the information contained in the protein. Many methods have been reported for predicting the subcellular location of proteins from sequence information. However, most of these prediction methods use a flat structure or parallel architecture to perform prediction. In this work, we introduce ensemble classifiers with features that are extracted directly from full length protein sequences to predict locations in the protein-sorting pathway hierarchically. Sequence driven features, sequence mapped features and sequence autocorrelation features were tested with ensemble learners and their performances were compared. When evaluated by independent data testing, ensemble-based bagging algorithms with the sequence feature composition, transition and distribution (CTD) successfully classified two datasets with accuracies greater than 90%. We compared our results with similar published methods, and our method performed on par with the others at two levels in the secretory pathway. This study shows that the feature CTD extracted from protein sequences is effective in capturing biological features among compartments in secretory pathways.

14.
The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identifies highly conserved sites in a set of orthologous sequences using a weighted substitution‐matrix‐based conservation score and then filters these conserved sites based on the pattern of conservation present in a wider alignment of sequences from the same family and structural information to identify surface‐exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be generally important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site‐directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins. Proteins 2015; 83:135–152. © 2014 Wiley Periodicals, Inc.

15.
Pentapeptide scanning mutagenesis is a facile transposon-based procedure for the random insertion of a variable five amino acid cassette into a target protein. The analysis of a library of proteins harbouring pentapeptide insertions can provide invaluable information on the essential and inessential regions of a target protein, as well as revealing surprising aspects of target protein function and activity.

16.
The leucine-specific binding protein (LS-BP), a periplasmic component of the Escherichia coli high-affinity leucine transport system, is initially synthesized in a precursor form with a 23 amino acid N-terminal leader sequence that is removed during secretion of the protein into the periplasm. Using in vitro mutagenesis, deletion mutants of the LS-BP gene have been constructed with altered or missing amino acid sequences in the C-terminal portion of the protein. These altered binding proteins exhibited normal processing and secretion but were rapidly degraded in the periplasmic space. In the presence of an uncoupler of the transmembrane potential (CCCP) the precursor forms accumulated in the membrane and were protected from degradation. The altered binding proteins also were secreted by spheroplasts of E. coli, after which they were easily detected.

17.
The thermodynamic stability of a protein provides an experimental metric for the relationship of protein sequence and native structure. We have investigated an approach based on an analysis of the structural database for stability engineering of an immunoglobulin variable domain. The most frequently occurring residues in specific positions of beta-turn motifs were predicted to increase the folding stability of mutants that were constructed by site-directed mutagenesis. Even in positions in which different residues are conserved in immunoglobulin sequences, the predictions were confirmed. Frequently, mutants with increased beta-turn propensities display increased folding cooperativities, suggesting pronounced effects on the unfolded state independent of the expected effect on conformational entropy. We conclude that structural motifs with predominantly local interactions can serve as templates with which patterns of sequence preferences can be extracted from the database of protein structures. Such preferences can predict the stability effects of mutations for protein engineering and design.

18.
A computer program for the generation and analysis of in silico random point mutagenesis libraries is described. The program operates by mutagenizing an input nucleic acid sequence according to mutation parameters specified by the user for each sequence position and type of point mutation. The program can mimic almost any type of random mutagenesis library, including those produced via error-prone PCR (ep-PCR), mutator Escherichia coli strains, chemical mutagenesis, and doped or random oligonucleotide synthesis. The program analyzes the generated nucleic acid sequences and/or the associated protein library to produce several estimates of library diversity (number of unique sequences, point mutations, and single point mutants) and the rate of saturation of these diversities during experimental screening or selection of clones. This information allows one to select the optimal screen size for a given mutagenesis library, necessary to efficiently obtain a certain coverage of the sequence-space. The program also reports the abundance of each specific protein mutation at each sequence position, which is useful as a measure of the level and type of mutation bias in the library. Alternatively, one can use the program to evaluate the relative merits of preexisting libraries, or to examine various hypothetical mutation schemes to determine the optimal method for creating a library that serves the screen/selection of interest. Simulated libraries of at least 10^9 sequences are accessible by the numerical algorithm with currently available personal computers; an analytical algorithm is also available which can rapidly calculate a subset of the numerical statistics in libraries of arbitrarily large size. A multi-type double-strand stochastic model of ep-PCR is developed in an appendix to demonstrate the applicability of the algorithm to amplifying mutagenesis procedures. Estimators of DNA polymerase mutation-type-specific error rates are derived using the model. Analyses of an alpha-synuclein ep-PCR library and NNS synthetic oligonucleotide libraries are given as examples.
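A toy version of the numerical approach described in this abstract might look like the sketch below. It is not the published program: the single uniform per-base mutation rate, the equal chance of each alternative base, and the names `mutate`, `simulate_library`, and `diversity` are simplifying assumptions, whereas the abstract describes user-specified parameters per position and per mutation type.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of `seq` in which each base is replaced, with
    probability `rate`, by one of the three other bases (chosen uniformly)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_library(seq, rate, size, seed=0):
    """Generate `size` independently mutagenized clones of `seq`."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [mutate(seq, rate, rng) for _ in range(size)]


def diversity(library, parent):
    """Simple diversity estimates: clone count, unique sequences,
    and unique sequences carrying exactly one point mutation."""
    unique = set(library)
    single = {s for s in unique
              if sum(a != b for a, b in zip(s, parent)) == 1}
    return {"clones": len(library),
            "unique_sequences": len(unique),
            "unique_single_mutants": len(single)}
```

Calling `diversity(simulate_library("ATGGCC", 0.1, 200), "ATGGCC")` reports how many of the 18 possible single point mutants of this six-base parent were sampled in 200 clones; repeating the call with increasing library sizes gives a rough picture of how quickly diversity saturates.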

19.
Increasing the conformational stability of proteins is an important goal for both basic research and industrial applications. In vitro selection has been used successfully to increase protein stability, but more often site‐directed mutagenesis is used to optimize the various forces that contribute to protein stability. In previous studies, we showed that improving electrostatic interactions on the protein surface and improving the β‐turn sequences were good general strategies for increasing protein stability, and used them to increase the stability of RNase Sa. By incorporating seven of these mutations in RNase Sa, we increased the stability by 5.3 kcal/mol. Adding one more mutation, D79F, gave a total increase in stability of 7.7 kcal/mol, and a melting temperature 28°C higher than the wild‐type enzyme. Surprisingly, the D79F mutation lowers the change in heat capacity for folding, ΔCp, by 0.6 kcal/mol/K. This suggests that this mutation stabilizes structure in the denatured state ensemble. We made other mutants that give some insight into the structure present in the denatured state. Finally, the thermodynamics of folding of these stabilized variants of RNase Sa are compared with those observed for proteins from thermophiles.

20.