Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Protein–protein interactions (PPIs) are crucial for protein function. There exist many techniques to identify PPIs experimentally, but determining the interactions in molecular detail is still difficult and very time‐consuming. The fact that the number of PPIs is vastly larger than the number of individual proteins makes it practically impossible to characterize all interactions experimentally. Computational approaches that can bridge this gap and predict PPIs and model the interactions in molecular detail are greatly needed. Here we present InterPred, a fully automated pipeline that predicts and models PPIs from sequence using structural modeling combined with massive structural comparisons and molecular docking. A key component of the method is the use of a novel random forest classifier that integrates several structural features to distinguish correct from incorrect protein–protein interaction models. We show that InterPred represents a major improvement in protein–protein interaction detection, with a performance comparable to or better than experimental high‐throughput techniques. We also show that our full‐atom protein–protein complex modeling pipeline performs better than state-of-the-art protein docking methods on a standard benchmark set. In addition, InterPred was also one of the top predictors in the latest CAPRI37 experiment. InterPred source code can be downloaded from http://wallnerlab.org/InterPred. Proteins 2017; 85:1159–1170. © 2017 Wiley Periodicals, Inc.

2.
The mechanism of autophagy relies on complex cell signaling and regulatory processes. Each cell contains many proteins that lack a rigid 3-dimensional structure under physiological conditions. These dynamic proteins, called intrinsically disordered proteins (IDPs) and protein regions (IDPRs), are predominantly involved in cell signaling and regulation. Yet, very little is known about their presence among proteins of the core autophagy machinery. In this work, we characterized the autophagy protein Atg3 from yeast and human along with 2 variants to show that Atg3 is an IDPR-containing protein and that disorder/order predicted for these proteins from their amino acid sequence corresponds to their experimental characteristics. Based on this consensus, we applied the same prediction methods to all known Atg proteins from Saccharomyces cerevisiae. The data presented here provide an insight into the structural dynamics of each Atg protein. They also show that intrinsic disorder at various levels has to be taken into consideration for about half of the Atg proteins. This work should become a useful tool that will facilitate and encourage exploration of protein intrinsic disorder in autophagy.

3.
Autophagy 2013, 9(6):1093-1104
The mechanism of autophagy relies on complex cell signaling and regulatory processes. Each cell contains many proteins that lack a rigid 3-dimensional structure under physiological conditions. These dynamic proteins, called intrinsically disordered proteins (IDPs) and protein regions (IDPRs), are predominantly involved in cell signaling and regulation. Yet, very little is known about their presence among proteins of the core autophagy machinery. In this work, we characterized the autophagy protein Atg3 from yeast and human along with 2 variants to show that Atg3 is an IDPR-containing protein and that disorder/order predicted for these proteins from their amino acid sequence corresponds to their experimental characteristics. Based on this consensus, we applied the same prediction methods to all known Atg proteins from Saccharomyces cerevisiae. The data presented here provide an insight into the structural dynamics of each Atg protein. They also show that intrinsic disorder at various levels has to be taken into consideration for about half of the Atg proteins. This work should become a useful tool that will facilitate and encourage exploration of protein intrinsic disorder in autophagy.

4.
Morra G, Colombo G. Proteins 2008, 72(2):660-672
Most proteins must fold to a well-defined structure with a minimal stability to perform their function. Here we use a simple, molecular dynamics-based, energy decomposition approach to map the principal energetic interactions in a set of proteins representative of different folds. This work involves the all-atom simulation and analysis of the native structures and mutants of five different proteins representative of an all-alpha (yACPB, Protein A), all-beta (SH3), and a mixed alpha/beta fold (Proteins G and L). Given a certain structure, a native sequence and a set of mutants, we show that our model discriminates the ability of a mutation to yield a more or less stable protein, in agreement with experimental data, capturing the principal energetic determinants of protein stabilization. Our approach identifies the interaction determinants responsible for defining a fold and shows that mutations can either modulate the strength of pair-wise coupling between residues important for folding, or modify the profile of the principal interactions. Furthermore, we address the question of how to evaluate the fitness of a sequence to a given structure by comparing the information contained in the energy map, which recapitulates the chemistry of the sequence, to that contained in the contact map, which recapitulates the fold topology. The results show that a better fit between the energetic properties of the sequence and the fold topology corresponds to a higher stabilization of the protein. We discuss the relevance of these observations to the analysis of protein designability and to the rational evolution of new sequences.

5.
Xia Y, Levitt M. Proteins 2004, 55(1):107-114
To understand the physical and evolutionary determinants of protein folding, we map out the complete organization of thermodynamic and kinetic properties for protein sequences that share the same fold. The exhaustive nature of our study necessitates using simplified models of protein folding. We obtain a stability map and a folding rate map in sequence space. Comparison of the two maps reveals a common organizational principle: optimality decreases more or less uniformly with distance from the optimal sequence in the sequence space. This gives a funnel-shaped optimality surface. Evolutionary dynamics of a sequence population on these two maps reveal how the simple organization of sequence space affects the distributions of stability and folding rate preferred by evolution.

6.
Tompa P, Prilusky J, Silman I, Sussman JL. Proteins 2008, 71(2):903-909
Targeted turnover of proteins is a key element in the regulation of practically all basic cellular processes. The underlying physicochemical and/or sequential signals, however, are not fully understood. This issue is particularly pertinent in light of the recent recognition that intrinsically unstructured/disordered proteins, common in eukaryotic cells, are extremely susceptible to proteolytic degradation in vitro. The in vivo half-lives of proteins were determined recently in a high-throughput study encompassing the entire yeast proteome; here we examine whether these half-lives correlate with the presence of classical degradation motifs (PEST region, destruction-box, KEN-box, or the N-terminal residue) or with various physicochemical characteristics, such as the size of the protein, the degree of structural disorder, or the presence of low-complexity regions. Our principal finding is that, in general, the half-life of a protein does not depend on the presence of degradation signals within its sequence, even of ubiquitination sites, but correlates mainly with the length of its polypeptide chain and with various measures of structural disorder. Two distinct modes of involvement of disorder in degradation are proposed. Susceptibility to degradation of longer proteins, containing larger numbers of residues in conformational disorder, suggests an extensive function, whereby the effect of disorder can be ascribed to its mere physical presence. However, after normalization for protein length, the only signal that correlates with half-life is disorder, which indicates that it also acts in an intensive manner, that is, as a specific signal, perhaps in conjunction with the recognition of classical degradation motifs. The significance of correlation is rather low; thus protein degradation is not determined by a single characteristic, but is a multi-factorial process that shows large protein-to-protein variations. Protein disorder, nevertheless, plays a key signalling role in many cases.

7.
Barbany M, Morata J, Meyer T, Lois S, Orozco M, de la Cruz X. Proteins 2012, 80(9):2235-2249
Recent studies have shown how alternative splicing (AS), the process by which eukaryotic genes express more than one product, affects protein sequence and structure. However, little information is available on the impact of AS on protein dynamics, a property fundamental for protein function. In this work, we have addressed this issue using molecular dynamics simulations of the isoforms of two model proteins: glutathione S-transferase and ectodysplasin-A. We have found that AS does not have a noticeable impact on global or local structure fluctuations. We have also found that, quite interestingly, AS has a significant effect on the coupling between key structural elements such as surface cavities. Our results provide the first atom-level view of the impact of AS on protein dynamics, as far as we know. They can help refine our present view of the relationship between AS and protein disorder and, more importantly, they reveal how AS may modify structural dynamic couplings in proteins.

8.
The intracellular fate of T cell antigen receptor (TCR) subunits (alpha beta gamma delta epsilon zeta 2) is determined by their assembly in the endoplasmic reticulum (ER). To study the structural bases for this tight correlation between assembly and intracellular fate, we sought to define the nature of determinants for both ER degradation and subunit assembly within the TCR-alpha chain. We found that a 9 amino acid transmembrane sequence of the TCR-alpha chain, containing 2 critical charged residues, was sufficient to cause ER degradation when placed in the context of the Tac antigen, used here as a reporter protein. CD3-delta assembled with chimeric proteins containing this short transmembrane sequence, and this assembly resulted in abrogation of targeting for ER degradation. Thus, the colocalization of determinants for ER degradation and sites of subunit interactions explains how the fate of some newly synthesized TCR chains can be decided on the basis of their assembly status.

9.
DeWeese-Scott C, Moult J. Proteins 2004, 55(4):942-961
Experimental protein structures often provide extensive insight into the mode and specificity of small molecule binding, and this information is useful for understanding protein function and for the design of drugs. We have performed an analysis of the reliability with which ligand-binding information can be deduced from computer model structures, as opposed to experimentally derived ones. Models produced as part of the CASP experiments are used. The accuracy of contacts between protein model atoms and experimentally determined ligand atom positions is the main criterion. Only comparative models are included (i.e., models based on a sequence relationship between the protein of interest and a known structure). We find that, as expected, contact errors increase with decreasing sequence identity used as a basis for modeling. Analysis of the causes of errors shows that sequence alignment errors between model and experimental template have the most deleterious effect. In general, good, but not perfect, insight into ligand binding can be obtained from models based on a sequence relationship, provided there are no alignment errors in the model. The results support a structural genomics strategy based on experimental sampling of structure space so that all protein domains can be modeled on the basis of 30% or higher sequence identity.

10.
Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins due to their complex structures and properties that make them difficult to analyze experimentally. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based feature generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, the use of PLMs such as BERT has shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with feature generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications in drug discovery and personalized medicine.
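The "traditional" composition and pair-composition features mentioned in this abstract can be illustrated with a short sketch. This is not code from the review; the function names and the restriction to the 20 standard amino acids are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted; other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid (20 features)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def pair_composition(seq):
    """Return the fraction of each adjacent residue pair (400 features)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    total = sum(pairs[k] for k in keys)
    return [pairs[k] / total if total else 0.0 for k in keys]
```

Both vectors have a fixed length regardless of sequence length, which is what makes them convenient classifier inputs, and also why they discard the higher-order sequence patterns the abstract refers to.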

11.
Targeted protein degradation plays an important regulatory role in the cell, but only a few protein degradation signals have been characterized in plants. Here we describe three instability determinants in the termini of the cauliflower mosaic virus (CaMV) capsid protein precursor, of which one is still present in the mature capsid protein p44. A modified ubiquitin protein reference technique was used to show that these motifs are still active when fused to a heterologous reporter gene. The N-terminus of p44 contains a degradation motif characterized by proline, glutamate, aspartate, serine and threonine residues (PEST), which can be inactivated by mutation of three glutamic acid residues to alanines. The signals from the precursor do not correspond to known degradation motifs, although they confer high instability on proteins expressed in plant protoplasts. All three instability determinants were also active in mammalian cells. The PEST signal had a significantly higher degradation activity in HeLa cells, whereas the precursor signals were less active. Inhibition studies suggest that only the signal within the N-terminus of the precursor targets the proteasome in plants. This implies that the other two signals may target a novel degradation pathway.

12.
Proteolytic resistance, as conferred by protein aggregation into inclusion bodies, has not been explored in detail. We have investigated the eventual digestion of several closely related proteins, namely six insertional and two fusion mutants of the homotrimeric bacteriophage P22 tailspike (TSP) protein. When over-produced in E. coli, all these polypeptides form inclusion bodies accompanied by only traces of soluble protein. The mutations introduced in TSP impaired its degradation and enhanced its half-life up to ten-fold, without affecting protein solubility. This indicates that protein properties other than solubility are the main determinants of susceptibility to proteolysis. In addition, the analysis of the degradation fragments strongly suggests that the aggregated TSP polypeptides undergo a site-limited proteolytic attack, and that their complete digestion occurs through an in situ cascade cleavage process.

13.
Recombinant human Acid Alpha Glucosidase (GAA) is the therapeutic enzyme used for the treatment of Pompe disease, a rare genetic disorder characterized by GAA deficiency in the cell lysosomes (Raben et al., Curr Mol Med. 2002; 2:145–166). The manufacturing process for GAA can be challenging, in part due to protease degradation. The overall goal of this study was to understand the effects of GAA overexpression on cell lysosomal phenotype and host cell protein (HCP) release, and any resultant consequences for protease levels and ease of manufacture. To do this we first generated a stable CHO cell line producing recombinant human GAA and designed the anion exchange (IEX) capture chromatography step. We then collected images of cell lysosomes via transmission electron microscopy (TEM) and compared the resulting data with that from a null CHO cell line. TEM imaging revealed that 72% of all lysosomes in the GAA cell line were engorged, indicating extensive cell stress; by comparison, only 8% of lysosomes in the null CHO cell line had a similar phenotype. Furthermore, comparison of the HCP profiles of capture eluates from the three cell lines (GAA, mAb, and null) showed that while most HCPs released were common across them, some were unique to the GAA producer, implying that cell stress caused by overexpression of GAA has a molecule-specific effect on HCP release. Protease analysis via zymograms showed an overall reduction in proteolytic activity after the capture step but also revealed the presence of co‐eluting proteases at approximately 80 kDa, which MS analysis putatively identified as dipeptidyl peptidase 3 and prolyl endopeptidase. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:666–676, 2017

14.
Although protein synthesis and protein degradation are two independent processes that are tightly regulated, how they maintain a balance of protein in the non-growing cell remains to be established. In work in the 1980s, the author suggested a self-regulating mechanism. However, experimental work on this interesting and fundamental problem is needed for a better understanding of 'protein balance' in cells.

15.
Traditionally, proteins have been viewed as a construct based on elements of secondary structure and their arrangement in three-dimensional space. In a departure from this perspective we show that protein structures can be modelled as network systems that exhibit small-world, single-scale, and to some degree, scale-free properties. The phenomenological network concept of degrees of separation is applied to three-dimensional protein structure networks and reveals how amino acid residues can be connected to each other within six degrees of separation. This work also illuminates the unique features of protein networks in comparison to other networks currently studied. Recognising that proteins are networks provides a means of rationalising the robustness in the overall three-dimensional fold of a protein against random mutations and suggests an alternative avenue to investigate the determinants of protein structure, function and folding.
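To make the "degrees of separation" idea concrete, the following minimal sketch builds a residue contact network from 3D coordinates and counts the fewest contacts linking two residues with a breadth-first search. The abstract does not state how contacts were defined, so the single-point-per-residue representation, the 7.0 Å cutoff, and the function names are all assumptions for illustration.

```python
from collections import deque
from math import dist


def contact_network(coords, cutoff=7.0):
    """Link residues whose representative points lie within `cutoff`.

    Assumption: one (x, y, z) point per residue and a hypothetical cutoff.
    """
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degrees_of_separation(adj, start, goal):
    """Breadth-first search: fewest contacts linking two residues, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return depth
        for neighbour in adj[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return None
```

In a fully extended chain only sequence neighbours are in contact, so the separation grows linearly with sequence distance; in a folded structure, long-range contacts act as shortcuts, which is the kind of small-world behaviour the abstract describes.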

16.
During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins, to propose stochastic models of experimental protein evolution. These models predict quantitatively important features of experimentally evolved sequence libraries, like fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters like sequence divergence, selection strength, and library size. We showcase the potential of the approach in reanalyzing two recent experiments to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, opening thereby a way to computationally optimize experimental protocols.

17.
Deep mutational scanning provides an unprecedented wealth of quantitative data regarding the functional outcome of mutations in proteins. A single experiment may measure properties (e.g., structural stability) of numerous protein variants. Leveraging the experimental data to gain insights about unexplored regions of the mutational landscape is a major computational challenge. Such insights may facilitate further experimental work and accelerate the development of novel protein variants with beneficial therapeutic or industrially relevant properties. Here we present a novel machine learning approach for the prediction of functional mutation outcome in the context of deep mutational screens. Using sequence (one-hot) features of variants with known properties, as well as structural features derived from models thereof, we train predictive statistical models to estimate the unknown properties of other variants. The utility of the new computational scheme is demonstrated using five sets of mutational scanning data, denoted “targets”: (a) protease specificity of APPI (amyloid precursor protein inhibitor) variants; (b-d) three stability related properties of IGBPG (immunoglobulin G-binding β1 domain of streptococcal protein G) variants; and (e) fluorescence of GFP (green fluorescent protein) variants. Performance is measured by the overall correlation of the predicted and observed properties, and enrichment—the ability to predict the most potent variants and presumably guide further experiments. Despite the diversity of the targets, the statistical models can generalize from example variants and predict the properties of test variants with both single and multiple mutations.
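The sequence (one-hot) features named in this abstract can be sketched as follows. This shows only the encoding step under stated assumptions (equal-length variants, the 20 standard amino acids, and a hypothetical function name); it is not the authors' feature pipeline and omits the structural features entirely.

```python
# Assumption: variants have equal length and use only the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Flatten a sequence into a 0/1 vector with 20 slots per position."""
    vec = [0] * (len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        vec[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vec
```

With this encoding, two variants that differ by a single substitution differ in exactly two vector entries, so a model fitted on such vectors can assign weights to individual residue identities at individual positions.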

18.
The ability to predict protein function from structure is becoming increasingly important as the number of structures resolved is growing more rapidly than our capacity to study function. Current methods for predicting protein function are mostly reliant on identifying a similar protein of known function. For proteins that are highly dissimilar or are only similar to proteins also lacking functional annotations, these methods fail. Here, we show that protein function can be predicted as enzymatic or not without resorting to alignments. We describe 1178 high-resolution proteins in a structurally non-redundant subset of the Protein Data Bank using simple features such as secondary-structure content, amino acid propensities, surface properties and ligands. The subset is split into two functional groupings, enzymes and non-enzymes. We use the support vector machine-learning algorithm to develop models that are capable of assigning the protein class. Validation of the method shows that the function can be predicted to an accuracy of 77% using 52 features to describe each protein. An adaptive search of possible subsets of features produces a simplified model based on 36 features that predicts at an accuracy of 80%. We compare the method to sequence-based methods that also avoid calculating alignments and predict a recently released set of unrelated proteins. The most useful features for distinguishing enzymes from non-enzymes are secondary-structure content, amino acid frequencies, number of disulphide bonds and size of the largest cleft. This method is applicable to any structure as it does not require the identification of sequence or structural similarity to a protein of known function.

19.
In the post-genome era, the prediction of protein function is one of the most demanding tasks in the study of bioinformatics. Machine learning methods, such as support vector machines (SVMs), greatly help to improve the classification of protein function. In this work, we integrated SVMs, protein sequence amino acid composition, and associated physicochemical properties into the prediction of nucleic-acid-binding proteins. We developed binary classifiers for rRNA-, RNA-, and DNA-binding proteins, which play an important role in the control of many cell processes. Each SVM predicts whether a protein belongs to the rRNA-, RNA-, or DNA-binding protein class. Self-consistency and jackknife tests were performed on protein data sets in which the sequence identity was < 25%. Test results show that the accuracies of the rRNA-, RNA-, and DNA-binding SVM predictions are approximately 84%, 78%, and 72%, respectively. The predictions were also performed on the ambiguous and negative data sets. The results demonstrate that the scores predicted for proteins in the ambiguous data set by the RNA- and DNA-binding SVM models were distributed around zero, while most proteins in the negative data set received negative scores from all three SVMs. The score distributions agree well with the prior knowledge of those proteins and show the effectiveness of sequence-associated physicochemical properties in protein function prediction. The software is available from the author upon request.

20.
Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e. GDT‐TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross‐validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template‐based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top‐ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking. Proteins 2009. © 2008 Wiley‐Liss, Inc.
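The two evaluation measures quoted in this abstract, the correlation between predicted and true scores and the loss of the top-ranked model, can be sketched per target as below. The abstract does not say which correlation coefficient was used, so Pearson's is an assumption here, as is the reading of "loss" as the true-score gap between the best model and the model ranked first by predicted score; the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranking_loss(predicted, true):
    """True score of the best model minus that of the model ranked first."""
    top = max(range(len(predicted)), key=predicted.__getitem__)
    return max(true) - true[top]
```

Averaging `ranking_loss` over targets would give a single figure of the kind reported for CASP7; a loss of zero for a target means the model with the highest predicted score is also the best one by true score.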
