Similar Articles
20 similar articles found.
1.
The development of the EGAD program and energy function for protein design is described. In contrast to most protein design methods, which require several empirical parameters or heuristics such as patterning of residues or rotamers, EGAD has a minimalist philosophy: it uses very few empirical factors to account for inaccuracies resulting from the use of fixed backbones and discrete rotamers in protein design calculations, and it describes the unfolded state, aggregates, and alternative conformers explicitly with physical models instead of fitted parameters. This approach unveils important issues in protein design that are often camouflaged by heuristic-emphasizing methods. Inter-atom energies are modeled with the OPLS-AA all-atom force field, electrostatics with the generalized Born continuum model, and the hydrophobic effect with a solvent-accessible surface area-dependent term. Experimental characterization of proteins designed with an unmodified version of the energy function revealed problems with under-packing, stability, aggregation, and structural specificity. Under-packing was addressed by modifying the van der Waals function. By optimizing only three parameters, the effects of >400 mutations on protein-protein complex formation were predicted to within 1.0 kcal mol⁻¹. As an independent test, this modified energy function was used to predict the stabilities of >1500 mutants to within 1.0 kcal mol⁻¹; this required a physical model of the unfolded state that includes more interactions than traditional tripeptide-based models. Solubility and structural specificity were addressed with simple physical approximations of aggregation and conformational equilibria. The complete energy function can design protein sequences that have high levels of identity with their natural counterparts and have predicted structural properties more consistent with soluble, uniquely folded proteins than the initial designs.
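The abstract names the energy terms but not their functional forms. The following is a minimal sketch of an additive energy of this kind, assuming a plain 12-6 Lennard-Jones pair term with Lorentz-Berthelot mixing, a hypothetical `radius_scale` knob standing in for "modifying the van der Waals function", and a hydrophobic term equal to an assumed coefficient `gamma` times a precomputed nonpolar SASA. It omits OPLS-AA bonded terms, 1-2/1-3 exclusions, and generalized Born electrostatics; none of the parameter values are EGAD's.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    x: float
    y: float
    z: float
    epsilon: float  # LJ well depth, kcal/mol
    sigma: float    # LJ diameter, angstrom


def lj_pair(a: Atom, b: Atom, radius_scale: float = 1.0) -> float:
    """12-6 Lennard-Jones energy with Lorentz-Berthelot mixing.
    radius_scale < 1 softens repulsion (hypothetical knob, not EGAD's actual change)."""
    r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    sigma = radius_scale * 0.5 * (a.sigma + b.sigma)
    eps = math.sqrt(a.epsilon * b.epsilon)
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def total_energy(atoms, nonpolar_sasa: float, gamma: float = 0.025,
                 radius_scale: float = 1.0) -> float:
    """All-pairs vdW energy plus gamma * nonpolar SASA (gamma in kcal/mol/A^2, assumed)."""
    e_vdw = sum(lj_pair(atoms[i], atoms[j], radius_scale)
                for i in range(len(atoms)) for j in range(i + 1, len(atoms)))
    return e_vdw + gamma * nonpolar_sasa
```

Two atoms placed at the LJ minimum, r = 2^(1/6)·σ, contribute exactly −ε. This is a quick sanity check that the pair term is wired correctly before any parameters are tuned.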

2.
Proteins are typically represented by discrete atomic coordinates, which provide an accessible framework for describing different conformations. However, in some fields proteins are more accurately represented as near-continuous surfaces, as these are imprinted with the geometric (shape) and chemical (electrostatic) features of the underlying protein structure. Protein surfaces depend on their chemical composition and ultimately determine protein function, acting as the interface that engages in interactions with other molecules. In the past, such representations were used to compare protein structures on global and local scales and have shed light on functional properties of proteins. Here we describe RosettaSurf, a surface-centric computational design protocol that focuses on molecular surface shape and electrostatic properties as a means for protein engineering, offering a unique approach to the design of proteins and their functions. The RosettaSurf protocol combines the explicit optimization of molecular surface features with a global scoring function during the sequence design process, diverging from typical design approaches that rely solely on an energy scoring function. With this computational approach, we attempt to address a fundamental problem in protein design: the design of functional sites in proteins even when structurally similar templates are absent from the characterized structural repertoire. Surface-centric design exploits the premise that molecular surfaces are, to a certain extent, independent of the underlying sequence and backbone configuration, meaning that different sequences in different proteins may present similar surfaces. We benchmarked RosettaSurf on various sequence recovery datasets and showcased its design capabilities by generating epitope mimics that were biochemically validated. Overall, our results indicate that the explicit optimization of surface features may lead to new routes for the design of functional proteins.

3.
Protein disorder prediction: implications for structural proteomics (Total citations: 26; self-citations: 0; other citations: 26)
A great challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Disordered regions in proteins often contain short linear peptide motifs (e.g., SH3 ligands and targeting signals) that are important for protein function. We present here DisEMBL, a computational tool for prediction of disordered/unstructured regions within a protein sequence. As no clear definition of disorder exists, we have developed parameters based on several alternative definitions and introduced a new one based on the concept of "hot loops," i.e., coils with high temperature factors. Avoiding potentially disordered segments in protein expression constructs can increase expression, foldability, and stability of the expressed protein. DisEMBL is thus useful for target selection and the design of constructs as needed for many biochemical studies, particularly structural biology and structural genomics projects. The tool is freely available via a web interface (http://dis.embl.de) and can be downloaded for use in large-scale studies.
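To make the "hot loops" definition concrete, here is a sketch of how such a label could be derived from a solved structure. It assumes per-residue Cα B-factors and a DSSP-style secondary-structure string in which `H`/`G`/`I` (helix) and `E`/`B` (strand) are ordered and every other symbol counts as coil. The z-score cutoff and minimum segment length are hypothetical illustration values, not DisEMBL's. DisEMBL itself predicts such regions from sequence with trained neural networks; this only shows the structural label.

```python
from statistics import mean, pstdev


def hot_loops(ss: str, bfactors: list[float], z_cut: float = 1.0,
              min_len: int = 3) -> list[tuple[int, int]]:
    """Return 0-based inclusive (start, end) segments of coil residues whose
    B-factor z-score exceeds z_cut (thresholds are illustrative assumptions)."""
    if len(ss) != len(bfactors):
        raise ValueError("ss and bfactors must have equal length")
    mu = mean(bfactors)
    sd = pstdev(bfactors) or 1.0  # avoid division by zero for flat profiles
    hot = [c not in "HGIEB" and (b - mu) / sd > z_cut
           for c, b in zip(ss, bfactors)]
    segments, start = [], None
    for i, flag in enumerate(hot + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    return segments
```

Normalizing B-factors per chain, as done here with a z-score, is one common way to make thresholds comparable across structures refined at different resolutions.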

4.
GlobPlot: Exploring protein sequences for globularity and disorder (Total citations: 2; self-citations: 0; other citations: 2)
A major challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Non-globular sequence segments often contain short linear peptide motifs (e.g. SH3-binding sites) which are important for protein function. We present here a new tool for discovery of such unstructured, or disordered, regions within proteins. GlobPlot (http://globplot.embl.de) is a web service that allows the user to plot the tendency within the query protein for order/globularity and disorder. We show examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, and also apparently ordered regions that do not contain any recognised domain. GlobPlot may be useful in domain hunting efforts. The plots indicate that instances of known domains may often contain additional N- or C-terminal segments that appear ordered. Thus GlobPlot may be of use in the design of constructs corresponding to globular proteins, as needed for many biochemical studies, particularly structural biology. GlobPlot has a pipeline interface, GlobPipe, for the advanced user to perform whole-proteome analysis. GlobPlot can also be used as a generic infrastructure package for graphically displaying any possible propensity.
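As we understand it, GlobPlot-style plots show a running sum of a per-residue disorder propensity, so rising stretches suggest disorder and falling stretches suggest order. The sketch below illustrates that idea. The `PROPENSITY` table holds hypothetical placeholder values chosen only to rank residues plausibly; it is not GlobPlot's scale, and the window size is likewise an assumption.

```python
from itertools import accumulate

# Hypothetical propensities (positive = disorder-prone). Placeholder values only,
# NOT the scale GlobPlot actually uses.
PROPENSITY = {
    "P": 0.6, "E": 0.5, "S": 0.4, "Q": 0.4, "K": 0.3, "G": 0.2, "D": 0.2,
    "N": 0.1, "A": 0.0, "R": 0.0, "T": 0.0, "H": -0.1, "M": -0.2,
    "C": -0.4, "Y": -0.4, "L": -0.5, "V": -0.5, "I": -0.6, "F": -0.6, "W": -0.6,
}


def cumulative_profile(seq: str) -> list[float]:
    """Running sum of propensities; an upward slope suggests disorder."""
    return list(accumulate(PROPENSITY.get(aa, 0.0) for aa in seq.upper()))


def disordered_windows(seq: str, window: int = 5) -> list[int]:
    """Start indices of windows whose mean propensity (the local slope of the
    cumulative curve) is positive."""
    vals = [PROPENSITY.get(aa, 0.0) for aa in seq.upper()]
    return [i for i in range(len(vals) - window + 1)
            if sum(vals[i:i + window]) / window > 0]
```

Plotting the cumulative curve, rather than raw per-residue values, smooths noise and makes domain boundaries show up as changes in slope.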

5.
The construction of fitness landscapes has broad implications for understanding molecular evolution, cellular epigenetic states, and protein structures. We studied the problem of constructing the fitness landscape of inverse protein folding, or protein design, with the aim of generating amino acid sequences that fold into an a priori determined structural fold, which would enable the engineering of novel or enhanced biochemistry. For this task, an effective fitness function should allow identification of the correct sequences that fold into the desired structure. In this study, we showed that a nonlinear fitness function for protein design can be constructed using a rectangular kernel with a basis set of proteins and decoys chosen a priori. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys via a finite Newton method. A blind test of a simplified version of the fitness function for sequence design was carried out to discriminate simultaneously 428 native sequences, not homologous to any training protein, from 11 million challenging protein-like decoys. This simplified function correctly classified 408 native sequences (20 misclassifications, a 95% correct rate), outperforming several other statistical linear scoring functions and an optimized linear function. Our results further suggest that, for the task of global sequence design of the 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a basis set of only about 3,680 carefully chosen proteins and decoys, and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to construct other types of fitness landscapes.
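A kernel fitness function of this type scores an encoded sequence-structure pair x as f(x) = Σⱼ wⱼ K(x, bⱼ) + bias. Here the bⱼ are the a priori basis set (in the abstract, 480 + 3,200 = 3,680 proteins and decoys). Evaluating f on N training examples touches an N × m kernel matrix, which is rectangular when m ≪ N. The sketch below assumes a Gaussian kernel and toy two-dimensional features with hand-picked weights. The authors' feature encoding, kernel, and finite Newton training are not reproduced.

```python
import math


def rbf(x, y, gamma: float = 0.5) -> float:
    """Gaussian kernel; the kernel choice is an assumption for this sketch."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fitness(x, basis, weights, bias: float = 0.0, gamma: float = 0.5) -> float:
    """f(x) = sum_j w_j K(x, b_j) + bias; f(x) > 0 is read as native-like."""
    return sum(w * rbf(x, b, gamma) for w, b in zip(weights, basis)) + bias


def accuracy(correct: int, total: int) -> float:
    return correct / total
```

With the reported blind-test counts, `accuracy(408, 428)` is about 0.953, which matches the quoted 95% correct rate.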

6.
A number of structural genomics/proteomics initiatives are focused on bacterial or viral pathogens. In this article, we will review the progress of structural proteomics initiatives targeting the SARS coronavirus (SARS-CoV), the etiological agent of the 2003 worldwide epidemic that culminated in approximately 8,000 cases and 800 deaths. The SARS-CoV genome encodes 28 proteins in three distinct classes, many of them with unknown function and sharing low similarity to other proteins. The structures of 16 SARS-CoV proteins or functional domains have been determined to date. Remarkably, eight of these 16 proteins or functional domains have novel folds, indicating the uniqueness of the coronavirus proteins. The results of SARS-CoV structural proteomics initiatives will have several profound biological impacts, including elucidation of the structure-function relationships of coronavirus proteins; identification of targets for the design of anti-viral compounds against SARS-CoV and other coronaviruses; and addition of new protein folds to the fold space, with further understanding of the structure-function relationships for several new protein families. We discuss the use of structural proteomics in response to emerging infectious diseases such as SARS-CoV and to increase preparedness against future emerging coronaviruses.

7.
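Domains in protein structure and function (Total citations: 5; self-citations: 1; other citations: 4)
A domain is a compact, globular region within the structure of a protein subunit. As a structural level between secondary and tertiary structure, the domain acts as an independent structural unit, functional unit, and folding unit in proteins. In complex proteins, domains serve as structural and functional modules and as genetic units. Research at the domain level will advance studies of protein structure-function relationships, protein folding mechanisms, and protein design.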

8.
Finding small molecules that modulate protein function is of primary importance in drug development and in the emerging field of chemical genomics. To facilitate the identification of such molecules, we developed a novel strategy making use of structural conservatism found in protein domain architecture and natural product inspired compound library design. Domains and proteins identified as being structurally similar in their ligand-sensing cores are grouped in a protein structure similarity cluster (PSSC). Natural products can be considered as evolutionary pre-validated ligands for multiple proteins and therefore natural products that are known to interact with one of the PSSC member proteins are selected as guiding structures for compound library synthesis. Application of this novel strategy for compound library design provided enhanced hit rates in small compound libraries for structurally similar proteins.

9.
Rahul Kaushik, Kam Y. J. Zhang. Proteins, 2020, 88(10): 1271-1284
The infinitesimally small fraction of sequence space naturally explored over millions of years of evolution suggests that natural proteins are constrained by functional prerequisites and should differ from randomly generated sequences. We have developed a protein sequence fitness scoring function that uses sequence and corresponding secondary structure information at the tripeptide level to differentiate natural from nonnatural proteins. The proposed fitness function is extensively validated on a dataset of about 210,000 natural and nonnatural protein sequences and benchmarked against existing methods for differentiating natural and nonnatural proteins. The high sensitivity, specificity, and accuracy (0.81, 0.95, and 91%, respectively) of the fitness function demonstrate its potential for sampling protein sequences with a higher probability of mimicking natural proteins. Moreover, the four major classes of proteins (α proteins, β proteins, α/β proteins, and α + β proteins) are analyzed separately, and β proteins are found to score slightly lower than the other classes. Further, an analysis of about 250 designed proteins (adopted from previously reported cases) helped to define the boundaries for sampling ideal protein sequences. The protein sequence characterization aided by the proposed fitness function could open new perspectives in the design of novel functional proteins.
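Sensitivity and specificity are fractions between 0 and 1, while accuracy is a class-weighted average of the two, so it depends on the ratio of natural to nonnatural sequences. The sketch below shows both pieces: tripeptide tokens that pair sequence with secondary structure, and the three metrics computed from confusion-matrix counts. The (sequence, 3-state SS) token encoding is an assumption about how such features could look. The counts in the test are hypothetical, not the paper's data.

```python
def tripeptides(seq: str, ss: str) -> list[tuple[str, str]]:
    """Overlapping (sequence, secondary-structure) tripeptide tokens.
    Pairing residues with 3-state SS letters is an assumed encoding."""
    if len(seq) != len(ss):
        raise ValueError("seq and ss must have equal length")
    return [(seq[i:i + 3], ss[i:i + 3]) for i in range(len(seq) - 2)]


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts
    (positives = natural proteins, negatives = nonnatural sequences)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }
```

With balanced hypothetical counts of 100 natural and 100 nonnatural sequences, a sensitivity of 0.81 and a specificity of 0.95 give an accuracy of 0.88. A different class balance shifts the accuracy toward whichever metric covers the larger class.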

10.
The rational/structure-based design and/or combinatorial development of molecules capable of structurally and functionally mimicking the binding sites of proteins is a promising strategy for exploring and understanding protein structure and function. The ultimate goal of using such molecules is to modulate protein function through controlled interference with the underlying binding events. Beyond their basic significance, such protein mimetics are also useful tools for a range of biomedical applications, in particular the inhibition of disease-associated protein-ligand interactions. Owing to their chemical and structural relation to proteins, as well as the relative simplicity of their chemical or recombinant synthesis, peptides have emerged as suitable molecules for mimicking protein binding sites and for inhibiting protein-protein interactions.

12.
Escherichia coli (E. coli) is the most widely used expression system for producing recombinant proteins for structural and functional studies. However, obtaining milligrams of soluble protein is still challenging, since many proteins are expressed in an insoluble form without optimization. Therefore, when working with tens of proteins or protein domains, it is recommended to carry out high-throughput expression screening at a small scale (1-4 ml of culture) to identify the optimal conditions for soluble protein production. Once determined, these culture conditions can be applied at a large scale to produce sufficient protein for structural or functional studies. We describe a procedure that has enabled the systematic screening of culture conditions or fusion tags on hundreds of cultures per week. The analysis of the optimal conditions for the soluble production of these proteins helped us to design a simple and efficient protocol for soluble protein expression screening. This protocol has since been used on hundreds of proteins and is illustrated with the genome-wide production of proteins containing the DNA-binding domains of Ciona intestinalis.
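Small-scale screens like this are usually organized as a full factorial of constructs × fusion tags × culture conditions laid out on multiwell plates. The sketch below assigns every combination to a well of a standard 96-well (8 × 12) plate in row-major order. The construct names, tag list, and temperatures are hypothetical placeholders, not the conditions used in the described protocol.

```python
from itertools import product
from string import ascii_uppercase


def plate_layout(constructs, tags, temperatures, rows: int = 8, cols: int = 12):
    """Map well IDs (A1, A2, ..., row-major) to (construct, tag, temperature)
    tuples. Raises ValueError if the screen does not fit on one plate."""
    combos = list(product(constructs, tags, temperatures))
    if len(combos) > rows * cols:
        raise ValueError(f"{len(combos)} conditions exceed {rows * cols} wells")
    wells = [f"{ascii_uppercase[r]}{c + 1}" for r in range(rows) for c in range(cols)]
    return dict(zip(wells, combos))
```

For example, 8 constructs × 3 tags × 4 induction temperatures is exactly 96 conditions, which fills one plate. Larger screens would be split across plates by the caller.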

13.
Protein families and RNA recognition (Total citations: 1; self-citations: 0; other citations: 1)
Chen Y, Varani G. The FEBS Journal, 2005, 272(9): 2088-2097
This minireview series examines the structural principles underlying the biological function of RNA-binding proteins. The structural work of the last decade has elucidated the structures of essentially all the major RNA-binding protein families and has demonstrated how RNA recognition takes place. The ribosome structures have further integrated this knowledge into principles for the assembly of complex ribonucleoproteins. Structural and biochemical work has revealed, unexpectedly, that several RNA-binding proteins bind to other proteins in addition to, or instead of, RNA. This tremendous increase in structural knowledge has not only expanded our understanding of RNA recognition principles but also provided new insight into the biological function of these proteins and helped to design better experiments to understand their biological roles.

14.
X-ray solution scattering in both the small-angle (SAXS) and wide-angle (WAXS) regimes is making an increasing impact on our understanding of biomolecular complexes. The accurate calculation of WAXS patterns from atomic coordinates has positioned the approach for rapid growth and integration with existing structural genomics efforts. WAXS data are sensitive to small structural changes in proteins, are useful for calculating the pair-distribution function at relatively high resolution, provide a means to characterize the breadth of the structural ensemble in solution, and can be used to identify proteins with similar folds. WAXS data are often used to test structural models, identify structural similarities, and characterize structural changes. WAXS is highly complementary to crystallography and NMR. It holds great potential for testing structural models of proteins, identifying proteins that may exhibit novel folds, characterizing unfolded or natively disordered proteins, and detecting structural changes associated with protein function.
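Scattering patterns are classically computed from atomic coordinates with the Debye formula, I(q) = Σᵢ Σⱼ fᵢ fⱼ sin(q rᵢⱼ)/(q rᵢⱼ), where the rᵢⱼ are the same interatomic distances whose histogram gives the pair-distribution function. This is a minimal sketch that assumes constant, q-independent form factors. Practical WAXS calculators also use q-dependent atomic form factors and model the excluded solvent and hydration layer, both of which are omitted here.

```python
import math


def debye_intensity(q: float, coords, form_factors) -> float:
    """Debye formula with constant form factors (a simplifying assumption).
    q in 1/angstrom, coordinates in angstrom."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        total += form_factors[i] ** 2  # i == j terms, sinc(0) = 1
        for j in range(i + 1, n):
            qr = q * math.dist(coords[i], coords[j])
            sinc = math.sin(qr) / qr if qr > 1e-12 else 1.0
            total += 2.0 * form_factors[i] * form_factors[j] * sinc  # i<j and j<i
    return total
```

For two identical atoms 10 Å apart, I(q) tends to 4 as q approaches 0 and drops to about 2 at q = π/10 Å⁻¹, where the cross term vanishes. The double loop is O(N²), which is why production codes use binning or multipole expansions for large proteins.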

15.
Structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally uncharacterized. In parallel with experimental methods, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins are of little help for these uncharacterized structures, because they lack apparent structural or sequence similarity to known proteins. Here, we briefly review two avenues of computational function prediction, namely structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, Protein Function Prediction (PFP) and Extended Similarity Group (ESG), make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. We hope that computational methods, combined with experimental methods, will make a leading contribution to the functional elucidation of protein structures.

16.
Protein aggregation, an outcome of improper protein folding, largely depends on the folding kinetics of a protein. Previous studies have reported a positive correlation between the stability of a protein's secondary structural elements and their rate of folding/unfolding. In this in silico study, the secondary and tertiary structures of proteins that (a) form inclusion bodies on overexpression in Escherichia coli, (b) form amyloid fibrils, and (c) are soluble on overexpression in E. coli are analyzed for features known to be associated with structural stability. The study revealed that the soluble proteins seem to have a higher rate of folding (based on contact order) and a lower percentage of exposed hydrophobic residues than the inclusion body-forming or amyloidogenic proteins. The soluble proteins also seem to have a more favorable helix and strand composition (based on the known secondary structural propensities of amino acids). The secondary structure analyses also reveal that evolutionary pressure is directed against protein aggregation. This understanding of the positive correlation between structural stability and solubility, along with other parameters known to influence aggregation, could be exploited in designing mutations aimed at reducing the aggregation propensity of proteins.
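Contact order measures how far apart in sequence the residues in native contacts are. A lower relative contact order (RCO) is associated with faster folding, which is why it serves as a folding-rate proxy above. The standard definition is RCO = (1/(L·N)) Σ |i − j| over the N contacts of a chain of length L. The original definition counts heavy-atom pairs within 6 Å. This sketch substitutes a residue-level Cα proxy, and the 8 Å cutoff and minimum sequence separation are assumptions.

```python
import math


def contacts_from_ca(coords, cutoff: float = 8.0, min_sep: int = 3):
    """Residue pairs (i, j) with C-alpha atoms within `cutoff` angstrom and at
    least `min_sep` apart in sequence (a C-alpha proxy for heavy-atom contacts)."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + min_sep, n)
            if math.dist(coords[i], coords[j]) <= cutoff]


def relative_contact_order(contacts, length: int) -> float:
    """RCO = (1 / (L * N)) * sum over contacts of |i - j|."""
    if not contacts:
        return 0.0
    return sum(abs(i - j) for i, j in contacts) / (length * len(contacts))
```

Because RCO normalizes by chain length, it allows folding-rate comparisons across the differently sized proteins in the three groups studied.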

17.
A long-standing goal in biology is to establish the link between the function, structure, and dynamics of proteins. Because protein function at the molecular level is understood through the ability of proteins to bind other molecules, the limited structural data on proteins in association with other biomolecules represent a major hurdle to understanding protein function at the structural level. Recent reports show that protein function can be linked to protein structure and dynamics through network centrality analysis, suggesting that the structures of proteins bound to natural ligands may be inferred computationally. In the present work, a new method is described to discriminate protein conformations relevant to the specific recognition of a ligand. The method relies on a scoring system that matches critical residues with central residues in different structures of a given protein. Central residues are the most traversed residues with the same frequency in networks derived from protein structures. We tested our method on a set of 24 different proteins and more than 260,000 structures of these proteins, either in the absence of a ligand or bound to it. To illustrate the usefulness of our method in the study of the structure/dynamics/function relationship of proteins, we analyzed mutants of the yeast TATA-binding protein with impaired DNA binding. Our results indicate that critical residues for an interaction are preferentially found as central residues of protein structures in complex with a ligand. Thus, our scoring system effectively distinguishes protein conformations relevant to the function of interest.
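"Most traversed" residues correspond to high betweenness centrality: nodes that lie on many shortest paths in a residue contact network. Below is a sketch using Brandes' algorithm for unweighted, undirected graphs. Edges come from a residue contact list, such as pairs within a distance cutoff. The `critical_in_central` score, the fraction of known critical residues among the top-k central residues, is a hypothetical stand-in, not the authors' scoring system.

```python
from collections import deque


def contact_graph(n: int, contacts) -> dict[int, set[int]]:
    """Undirected residue network from (i, j) contact pairs."""
    adj = {i: set() for i in range(n)}
    for i, j in contacts:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def betweenness(adj: dict[int, set[int]]) -> dict[int, float]:
    """Brandes' algorithm: unnormalized betweenness for an unweighted graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2.0 for v, c in bc.items()}  # each pair was counted twice


def critical_in_central(adj, critical, top_k: int) -> float:
    """Hypothetical score: fraction of critical residues among the top_k most central."""
    bc = betweenness(adj)
    central = set(sorted(bc, key=bc.get, reverse=True)[:top_k])
    return len(central & set(critical)) / len(critical)
```

Running the same score over many conformers of one protein would then rank conformations by how well their central residues overlap the residues known to matter for binding.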

18.
Proteomics is the study of the protein complement of a genome and employs a number of newly emerging tools. One such tool is chemical proteomics, a branch of proteomics devoted to exploring protein function using both in vitro and in vivo chemical probes. Chemical proteomics aims to define protein function and mechanism at the level of directly observed protein–ligand interactions, whereas chemical genomics aims to define the biological role of a protein by using chemical knockouts and observing phenotypic changes. Chemical proteomics is therefore traditional mechanistic biochemistry performed in a systems-based manner, using either activity- or affinity-based probes that target proteins related by chemical reactivity or by binding-site shape/properties, respectively. Systems are groups of proteins related by metabolic pathway, regulatory pathway, or binding to the same ligand. Studies can be based on two main types of proteome samples: pooled proteins (one mixture of N proteins) or the isolated proteins of a given system studied in parallel (N single-protein samples). Although the field of chemical proteomics originated with covalent labeling strategies such as isotope-coded affinity tagging, it is expanding to include chemical probes that bind proteins noncovalently, as well as more methods for observing protein–ligand interactions. This review presents an emerging role for nuclear magnetic resonance (NMR) spectroscopy in chemical proteomics, both in vitro and in vivo. Applications include functional proteomics using cofactor fingerprinting to assign proteins to gene families; gene family-based structural characterization of protein–ligand complexes; gene family-focused design of drug leads; chemical proteomic probes designed using NMR-SOLVE; and studies of protein–ligand interactions in vivo.

19.
Protein structure can provide new insight into the biological function of a protein and can enable the design of better experiments to learn its biological roles. Moreover, deciphering the interactions of a protein with other molecules can contribute to understanding the protein's function within cellular processes. In this study, we apply a machine learning approach to classify RNA-binding proteins based on their three-dimensional structures. The method is based on characterizing unique properties of electrostatic patches on the protein surface. Using an ensemble of general protein features and specific properties extracted from the electrostatic patches, we trained a support vector machine (SVM) to distinguish RNA-binding proteins from other positively charged proteins that do not bind nucleic acids. Specifically, the method was applied to proteins possessing the RNA recognition motif (RRM) and successfully distinguished RNA-binding proteins from RRM domains involved in protein-protein interactions. Overall, the method achieves 88% accuracy in classifying RNA-binding proteins, yet it cannot distinguish RNA-binding from DNA-binding proteins. Nevertheless, by applying a multiclass SVM approach we were able to classify the RNA-binding proteins by their RNA targets, specifically whether they bind ribosomal RNA (rRNA), transfer RNA (tRNA), or messenger RNA (mRNA). Finally, the approach presented here does not rely on sequence or structural homology and could be applied to identify novel RNA-binding proteins with unique folds and/or binding motifs.
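One simple way to define "electrostatic patches" is to take surface points whose computed potential exceeds a threshold and join nearby points into connected groups. Patch size, mean potential, and similar properties could then feed a classifier such as an SVM. The sketch below does this with single linkage via union-find. It assumes precomputed surface points and potentials (e.g., from a Poisson-Boltzmann solver), and the threshold and linkage distance are hypothetical. The paper's exact patch definition may differ, and the SVM step is not shown.

```python
import math


def positive_patches(points, potentials, threshold: float = 0.0, link: float = 3.0):
    """Group surface points with potential > threshold into patches; two points
    share a patch if connected by a chain of neighbors within `link` angstrom.
    Returns lists of point indices, largest patch first."""
    idx = [i for i, p in enumerate(potentials) if p > threshold]
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, a in enumerate(idx):
        for b in idx[pos + 1:]:
            if math.dist(points[a], points[b]) <= link:
                parent[find(a)] = find(b)  # merge the two patches
    groups = {}
    for i in idx:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

The pairwise loop is quadratic in the number of positive points. A spatial grid or k-d tree would be the usual replacement for full molecular surfaces.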

20.
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structural and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition, including structure-based ligand design and understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
