Similar literature
20 similar documents found (search time: 31 ms)
1.
We have performed a statistical analysis of unstructured amino acid residues in protein structures available in the databank of protein structures. Data on the occurrence of disordered regions at the ends and in the middle part of protein chains have been obtained: the regions near the ends (less than 30 residues from the N- or C-terminus) contain 66% of the unstructured residues (38% near the N-terminus and 28% near the C-terminus), although these terminal regions include only 23% of the amino acid residues. The frequencies of occurrence of unstructured residues have been calculated for each of the 20 amino acid types at different positions in the protein chain. It has been shown that the relative frequencies of occurrence of unstructured residues of the 20 types at the termini of protein chains differ from those in the middle part of the chain; amino acid residues of the same type have different probabilities of being unstructured in the terminal regions and in the middle part of the chain. The obtained frequencies of occurrence of unstructured residues in the middle part of the protein chain have been used as a scale for predicting disordered regions from amino acid sequence using the method (FoldUnfold) previously developed by us. This scale of frequencies of occurrence of unstructured residues correlates with the contact scale (previously developed by us and used for the same purpose) at a level of 95%. Testing the new scale on a database of 427 unstructured proteins and 559 completely structured proteins has shown that this scale can be successfully used for the prediction of disordered regions in protein chains.
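As a rough reading of the percentages quoted above, the enrichment of unstructured residues near the termini can be worked out directly. This is only a sketch: it assumes that the 23% figure is a fraction of all residues in the data set and that the remaining 34% of unstructured residues fall in the remaining 77% of residues.

```latex
\[
\frac{0.66 / 0.23}{0.34 / 0.77} \approx \frac{2.87}{0.44} \approx 6.5
\]
```

Under these assumptions, a residue in a terminal region would be roughly 6.5 times more likely to be unstructured than a residue in the middle part of the chain.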

2.
Intrinsically unstructured proteins (IUPs) are proteins lacking a fixed three-dimensional structure or containing long disordered regions. IUPs play an important role in biology and disease. Identifying disordered regions in protein sequences can provide useful information on protein structure and function, and can assist high-throughput protein structure determination. In this paper we present a system for predicting disordered regions in proteins based on decision trees and reduced amino acid composition. Concise rules based on biochemical properties of amino acid side chains are generated for prediction. Coarser information extracted from the composition of amino acids can not only improve the prediction accuracy but also increase the learning efficiency. In cross-validation tests, with four groups of reduced amino acid composition, our system can achieve a recall of 80% at a 13% false positive rate for predicting disordered regions, and the overall accuracy can reach 83.4%. This prediction accuracy is comparable to most, and better than some, existing predictors. Advantages of our approach are high prediction accuracy for long disordered regions and efficiency for large-scale sequence analysis. Our software is freely available for academic use upon request.
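To make the idea of a reduced amino acid composition concrete, the following minimal Python sketch maps the 20 residue types onto four groups and computes the group fractions that a decision tree could take as input features. The grouping in `GROUPS` is hypothetical (the abstract does not list the four groups actually used), and `recall_and_fpr` simply shows how the two quoted metrics are defined from a confusion matrix.

```python
from collections import Counter

# Hypothetical four-group reduction by side-chain chemistry.
# This is an assumption for illustration, not the grouping from the paper.
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
AA_TO_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}


def reduced_composition(seq):
    """Return the fraction of residues falling into each group.

    Characters outside the 20 standard residue letters are ignored.
    """
    counts = Counter(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)
    total = sum(counts.values())
    return {group: (counts[group] / total if total else 0.0) for group in GROUPS}


def recall_and_fpr(tp, fn, fp, tn):
    """Recall = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    return tp / (tp + fn), fp / (fp + tn)
```

For example, `reduced_composition("AAKD")` yields a four-number feature vector in place of a 20-number one, which is the kind of coarsening the abstract credits with faster learning.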

3.
Until quite recently, the prevailing view in the literature was that a uniquely folded tertiary structure is required for protein function. Lately, however, it has been found that many proteins in a cell have no such structure in an isolated state, although they have a well-defined function under physiological conditions. These proteins were named proteins with natural or internal disorder. The portion of disordered regions in such proteins may vary from a stretch of several amino acids to a completely disordered sequence of tens to hundreds of amino acids. The main difference between these proteins and structured (globular) ones is that they have no unique tertiary structure in an isolated state and acquire it only after interaction with their partners. Their conformation in such a complex depends on the interacting partner and not only on their own amino acid sequence, as is the case for structured (globular) proteins. The problem of structure-function relationships in structured proteins and proteins with internal disorder is discussed in this review. The complexity of the problem and its potential solutions are illustrated by the example of the elongation factors EF1A.

4.
Programs exist for searching protein sequences for potential membrane-penetrating segments (hydrophobic regions) and for lipid-binding sites with highly defined tertiary structures, such as PH, FERM, C2, ENTH, and other domains. However, a rapidly growing number of membrane-associated proteins (including cytoskeletal proteins, kinases, GTP-binding proteins, and their effectors) bind lipids through less structured regions. Here, we describe the development and testing of a simple computer search program that identifies unstructured potential membrane-binding sites. Initially, we found that both basic and hydrophobic amino acids, irrespective of sequence, contribute to the binding to acidic phospholipid vesicles of synthetic peptides that correspond to the putative membrane-binding domains of Acanthamoeba class I myosins. Based on these results, we modified a hydrophobicity scale, giving Arg and Lys positive, rather than negative, values. Using this basic and hydrophobic scale with a standard search algorithm, we successfully identified previously determined unstructured membrane-binding sites in all 16 proteins tested. Importantly, basic and hydrophobic searches identified previously unknown potential membrane-binding sites in class I myosins, PAKs and CARMIL (capping protein, Arp2/3, myosin I linker; a membrane-associated cytoskeletal scaffold protein), and synthetic peptides and protein domains containing these newly identified sites bound to acidic phospholipids in vitro.
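The search described above can be sketched as a sliding-window average over a modified hydrophobicity scale. The abstract does not say which scale was modified or which values were assigned, so everything specific here is an assumption: the base scale is Kyte-Doolittle, the `+2.0` given to Arg and Lys is a placeholder, and the window length and threshold are illustrative defaults.

```python
# Kyte-Doolittle hydropathy values (assumed base scale; the abstract does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical "basic and hydrophobic" scale: Arg and Lys become positive.
# The value 2.0 is a placeholder, not the published one.
BH = dict(KD, R=2.0, K=2.0)


def window_scores(seq, scale=BH, window=19):
    """Mean scale value of every full window, indexed by window start.

    Raises KeyError for characters that are not standard residue letters.
    """
    values = [scale[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_sites(seq, threshold=0.6, window=19, scale=BH):
    """Half-open (start, end) windows whose mean score exceeds the threshold."""
    scores = window_scores(seq, scale=scale, window=window)
    return [(i, i + window) for i, s in enumerate(scores) if s > threshold]
```

With the unmodified scale a Lys-rich stretch scores strongly negative, whereas with the modified scale it contributes positively, which is the change the authors describe.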

5.
Abstract: Proteins are often classified in a binary fashion as either structured or disordered. However, this approach has several deficits. Firstly, protein folding is always conditional on the physicochemical environment. A protein which is structured in some circumstances will be disordered in others. Secondly, it hides a fundamental asymmetry in behavior. While all structured proteins can be unfolded through a change in environment, not all disordered proteins have the capacity for folding. Failure to accommodate these complexities confuses the definition of both protein structural domains and intrinsically disordered regions. We illustrate these points with an experimental study of a family of small binding domains, drawn from the RNA polymerase of mumps virus and its closest relatives. Assessed at face value the domains fall on a structural continuum, with folded, partially folded, and near unstructured members. Yet the disorder present in the family is conditional, and these closely related polypeptides can access the same folded state under appropriate conditions. Any heuristic definition of the protein domain emphasizing conformational stability divides this domain family in two, in a way that makes no biological sense. Structural domains would be better defined by their ability to adopt a specific tertiary structure: a structure that may or may not be realized, dependent on the circumstances. This explicitly allows for the conditional nature of protein folding, and more clearly demarcates structural domains from intrinsically disordered regions that may function without folding.

6.
The structural stability of a protein requires a large number of interresidue interactions. The energetic contribution of these can be approximated by low-resolution force fields extracted from known structures, based on observed amino acid pairing frequencies. The summation of such energies, however, cannot be carried out for proteins whose structure is not known or for intrinsically unstructured proteins. To overcome these limitations, we present a novel method for estimating the total pairwise interaction energy, based on a quadratic form in the amino acid composition of the protein. This approach is validated by the good correlation of the estimated and actual energies of proteins of known structure and by a clear separation of folded and disordered proteins in the energy space it defines. As the novel algorithm has not been trained on unstructured proteins, it substantiates the concept of protein disorder, i.e. that the inability to form a well-defined 3D structure is an intrinsic property of many proteins and protein domains. This property is encoded in their sequence, because their biased amino acid composition does not allow sufficient stabilizing interactions to form. By limiting the calculation to a predefined sequential neighborhood, the algorithm was turned into a position-specific scoring scheme that characterizes the tendency of a given amino acid to fall into an ordered or disordered region. This application we term IUPred and compare its performance with three generally accepted predictors, PONDR VL3H, DISOPRED2 and GlobPlot on a database of disordered proteins.
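The "quadratic form in the amino acid composition" can be illustrated with a small Python sketch. It is not the IUPred implementation: the matrix `P` passed in is a placeholder for a fitted 20 x 20 energy predictor matrix, the normalization (energy per residue written as f^T P f, with f the composition fractions) is one plausible choice, and the simple symmetric window in `positional_scores` only mimics the idea of restricting the calculation to a sequential neighborhood.

```python
from collections import Counter


def composition(seq, alphabet):
    """Fraction of each letter of `alphabet` in `seq`, in alphabet order."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in alphabet]


def quadratic_energy(seq, alphabet, P):
    """Per-residue energy estimate f^T P f, where f is the composition vector.

    P is a square matrix (list of lists) indexed like `alphabet`; its values
    are assumed to come from a fit to known structures.
    """
    f = composition(seq, alphabet)
    n = len(f)
    return sum(f[i] * P[i][j] * f[j] for i in range(n) for j in range(n))


def positional_scores(seq, alphabet, P, half_window=2):
    """Score each position from the composition of its sequence neighborhood."""
    scores = []
    for k in range(len(seq)):
        lo = max(0, k - half_window)
        hi = min(len(seq), k + half_window + 1)
        scores.append(quadratic_energy(seq[lo:hi], alphabet, P))
    return scores
```

As a toy usage example with a two-letter alphabet, `quadratic_energy("HP", "HP", [[-1.0, 0.0], [0.0, 0.5]])` evaluates to `-0.125`; more negative values would correspond to compositions able to form more stabilizing contacts.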

7.
Defining the role of intrinsic disorder in proteins in the myriad of biological processes with which it is involved represents a significant goal in modern biophysics. Toward this end, NMR is uniquely suited for molecular studies of dynamic and disordered regions, but studying these regions in concert with their more structured domains and binding partners presents spectroscopic challenges. Here, we investigate the interactions between the structured and disordered regions of the human glucocorticoid receptor (GR). To do this, we developed an NMR strategy that relies on a novel relaxation filter for the simultaneous study of structured and unstructured regions. Using this approach, we conducted a comparative analysis of three translational isoforms of GR containing a folded DNA-binding domain (DBD) and two disordered regions that flank the DBD, one of which varies in size in the different isoforms. Notably, we were able to assign resonances that had previously been inaccessible because of the spectral complexity of the translational isoforms, which in turn allowed us to 1) identify a region of the structured DBD that undergoes significant changes in the local chemical environment in the presence of the disordered region and 2) determine differences in the conformational ensembles of the disordered regions of the translational isoforms. Furthermore, an ensemble-based thermodynamic analysis of the isoforms reveals conserved patterns of stability within the N-terminal domain of GR that persist despite low sequence conservation. These studies provide an avenue for further investigations of the mechanistic underpinnings of the functional relevance of the translational isoforms of GR while also providing a general NMR strategy for studying systems containing both structured and disordered regions.

8.
Intrinsically unstructured/disordered proteins and domains (IUPs) lack a well-defined three-dimensional structure under native conditions. The IUPred server presents a novel algorithm for predicting such regions from amino acid sequences by estimating their total pairwise interresidue interaction energy, based on the assumption that IUP sequences do not fold because they cannot form sufficient stabilizing interresidue interactions. The prediction offers optional built-in parameter sets optimized for predicting short or long disordered regions and structured domains.

9.
10.
Serine/arginine-rich (SR) splicing factors play an important role in constitutive and alternative splicing as well as during several steps of RNA metabolism. Despite the wealth of functional information about SR proteins accumulated to date, structural knowledge about the members of this family is very limited. To gain a better insight into structure-function relationships of SR proteins, we performed extensive sequence analysis of SR protein family members and combined it with ordered/disordered structure predictions. We found that SR proteins have properties characteristic of intrinsically disordered (ID) proteins. The amino acid composition and sequence complexity of SR proteins were very similar to those of disordered protein regions. More detailed analysis showed that the SR proteins, and their RS domains in particular, are enriched in disorder-promoting residues and depleted in order-promoting residues as compared to the entire human proteome. Moreover, disorder predictions indicated that RS domains of SR proteins were completely unstructured. Two different classification methods, the charge-hydropathy measure and the cumulative distribution function (CDF) of the disorder scores, were in agreement with each other, and they both strongly predicted members of the SR protein family to be disordered. This study emphasizes the importance of the disordered structure for several functions of SR proteins, such as for spliceosome assembly and for interaction with multiple partners. In addition, it demonstrates the usefulness of order/disorder predictions for inferring protein structure from sequence.
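A charge-hydropathy classification of the kind mentioned above can be sketched in a few lines of Python. The abstract gives no parameters, so the details are assumptions: hydropathy is taken from the Kyte-Doolittle scale rescaled to 0..1, net charge counts Lys/Arg as +1 and Asp/Glu as -1, and the linear boundary (slope 2.785, intercept -1.151) is the one commonly quoted for the charge-hydropathy plot rather than a value confirmed by this study.

```python
# Kyte-Doolittle hydropathy values (assumed scale for the hydropathy axis).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge per residue (K, R = +1; D, E = -1)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def ch_predicts_disorder(seq, slope=2.785, intercept=-1.151):
    """True if the sequence lies on the high-charge, low-hydropathy side
    of the assumed linear boundary: charge > slope * hydropathy + intercept."""
    return mean_net_charge(seq) > slope * mean_hydropathy(seq) + intercept
```

An idealized RS-like repeat such as `"RSRSRSRS"` combines high net charge with low hydropathy, so it falls on the disordered side of this boundary, consistent with the prediction the abstract reports for RS domains.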

11.
Several algorithms have been developed that use amino acid sequences to predict whether or not a protein or a region of a protein is disordered. These algorithms make accurate predictions for disordered regions that are 30 amino acids or longer, but it is unclear whether the predictions can be directly related to the backbone dynamics of individual amino acid residues. The nuclear Overhauser effect between the amide nitrogen and hydrogen (NHNOE) provides an unambiguous measure of backbone dynamics at single-residue resolution and is an excellent tool for characterizing the dynamic behavior of disordered proteins. In this report, we show that the NHNOE values for several members of a family of disordered proteins are highly correlated with the output from three popular algorithms used to predict disordered regions from amino acid sequence. This is the first test between an experimental measure of residue-specific backbone dynamics and disorder predictions. The results suggest that some disorder predictors can accurately estimate the backbone dynamics of individual amino acids in a long disordered region.

12.
More than a hundred proteins in yeast reversibly aggregate and phase-separate in response to various stressors, such as nutrient depletion and heat shock. We know little about the protein sequence and structural features behind this ability, which has not been characterized on a proteome-wide level. To identify the distinctive features of aggregation-prone protein regions, we apply machine learning algorithms to genome-scale limited proteolysis-mass spectrometry (LiP-MS) data from yeast proteins. LiP-MS data reveal that 96 proteins show significant structural changes upon heat shock. We find that in these proteins the propensity to phase separate cannot be solely driven by disordered regions, because their aggregation-prone regions (APRs) are not significantly disordered. Instead, the phase separation of these proteins requires contributions from both disordered and structured regions. APRs are significantly enriched in aliphatic residues and depleted in positively charged amino acids. Aggregator proteins with longer APRs show a greater propensity to aggregate, a relationship that can be explained by equilibrium statistical thermodynamics. Altogether, our observations suggest that proteome-wide reversible protein aggregation is mediated by sequence-encoded properties. We propose that aggregating proteins resemble supra-molecular amphiphiles, where APRs are the hydrophobic parts, and non-APRs are the hydrophilic parts.

13.
Until recently, the point of view that the unique tertiary structure is necessary for protein function has prevailed. However, recent data have demonstrated that many cell proteins do not possess such structure in isolation, although displaying a distinct function under physiological conditions. These proteins were named the naturally, or intrinsically, disordered proteins. The fraction of intrinsically disordered regions in such proteins may vary from several amino acid residues to a completely unordered sequence of several tens or even several hundreds of residues. The main distinction of these proteins from structured (globular) proteins is that they have no unique tertiary structure in isolation and acquire it only upon interaction with their partners. The conformation of these proteins in a complex is determined not only by their own amino acid sequence (as is typical of structured, or globular, proteins) but also by the interacting partner. This review discusses the structure-function relationships in structured and intrinsically disordered proteins. The intricateness of this problem and the possible ways to solve it are illustrated by the example of the EF1A elongation factor family.

14.
Gir2 is a highly acidic cytoplasmic protein of Saccharomyces cerevisiae of unknown function that shows an anomalous migration on SDS-PAGE. Based on its large Stokes radius and thermostability, we have previously suggested that Gir2 lacks extensive secondary structure. Here we report that Gir2 is extremely sensitive to proteolysis when compared to glutathione-S-transferase, a highly structured protein, further indicating its unfolded nature. Prediction based on the FoldIndex program also indicates that Gir2 is a disordered protein. Using truncated forms of Gir2 we show that the N-terminal half of this protein, with its high content of acidic amino acid residues, is responsible for the anomalous electrophoretic behavior of Gir2. Because all these features are hallmarks of intrinsically unstructured proteins (IUPs), we propose that Gir2 is another representative of the IUP group of proteins. Additionally, we show that the endogenous yeast Gir2 has heterogeneous electrophoretic mobility, which is not due to proteolytic cleavage.

15.
Numerous studies have demonstrated that the propensity of a protein to form amyloids or amorphous aggregates is encoded by its amino acid sequence. This led to the emergence of several computational programs to predict amyloidogenicity from amino acid sequences. However, a growing number of studies indicate that an accurate prediction of the protein aggregation can only be achieved when also accounting for the overall structural context of the protein, and the likelihood of transition between the initial state and the aggregate. Here, we describe a computational pipeline called TAPASS, which was designed to do just that. The pipeline assigns each residue of a protein as belonging to a structured region or an intrinsically disordered region (IDR). For this purpose, TAPASS uses either several state-of-the-art programs for prediction of IDRs, of transmembrane regions and of structured domains or the artificial intelligence program AlphaFold. In the next step, this assignment is crossed with amyloidogenicity prediction. As a result, TAPASS allows the detection of Exposed Amyloidogenic Regions (EARs) located within intrinsically disordered regions (IDRs) and carrying high amyloidogenic potential. TAPASS can substantially improve the prediction of amyloids and be used in proteome-wide analysis to discover new amyloid-forming proteins. Its results, combined with clinical data, can create individual risk profiles for different amyloidoses, opening up new opportunities for personalised medicine. The architecture of the pipeline is designed so that it makes it easy to add new individual predictors as they become available. TAPASS can be used through the web interface (https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=32).
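The step in which the structural assignment "is crossed with amyloidogenicity prediction" can be pictured as intersecting two per-residue annotations. The Python sketch below is an illustration only, not the TAPASS code: the function name, the boolean per-residue inputs, and the `min_len` filter are all hypothetical.

```python
def exposed_amyloidogenic_regions(is_disordered, is_amyloidogenic, min_len=1):
    """Return half-open (start, end) runs where a residue is flagged both as
    disordered and as amyloidogenic.

    Both inputs are per-residue sequences of booleans of equal length
    (hypothetical outputs of upstream predictors).
    """
    if len(is_disordered) != len(is_amyloidogenic):
        raise ValueError("annotations must cover the same residues")

    regions = []
    start = None
    for i, (d, a) in enumerate(zip(is_disordered, is_amyloidogenic)):
        if d and a:
            if start is None:
                start = i  # a new overlapping run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None

    # Close a run that extends to the end of the sequence.
    n = len(is_disordered)
    if start is not None and n - start >= min_len:
        regions.append((start, n))
    return regions
```

Keeping the two annotations separate until this final step matches the modular design the abstract describes, since either predictor could be swapped without touching the intersection logic.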

16.
Local structural disorder imparts plasticity on linear motifs   Cited by: 5 (self-citations: 0; citations by others: 5)
MOTIVATION: The dynamic nature of protein interaction networks requires fast and transient molecular switches. The underlying recognition motifs (linear motifs, LMs) are usually short and evolutionarily variable segments, which in several cases, such as phosphorylation sites or SH3-binding regions, fall into locally disordered regions. We probed the generality of this phenomenon by predicting the intrinsic disorder of all LM-containing proteins listed in the Eukaryotic Linear Motif (ELM) database. RESULTS: We demonstrated that LMs on average are embedded in locally unstructured regions, while their amino acid composition and charge/hydropathy properties exhibit a mixture characteristic of folded and disordered proteins. Overall, LMs are constructed by grafting a few specificity-determining residues favoring structural order on a highly flexible carrier region. These results establish a connection between LMs and molecular recognition elements of intrinsically unstructured proteins (IUPs), which realize a non-conventional mode of partner binding mostly in regulatory functions.

17.
Length-dependent prediction of protein intrinsic disorder   Cited by: 2 (self-citations: 0; citations by others: 2)

Background  

Due to the functional importance of intrinsically disordered proteins or protein regions, prediction of intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed in the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is the length dependence of the amino acid composition and sequence properties of disordered regions.

18.
We analyzed the mouse forebrain cytosolic phosphoproteome using sequential (protein and peptide) IMAC purifications, enzymatic dephosphorylation, and targeted tandem mass spectrometry analysis strategies. In total, using complementary phosphoenrichment and LC-MS/MS strategies, 512 phosphorylation sites on 540 non-redundant phosphopeptides from 162 cytosolic phosphoproteins were characterized. Analysis of protein domains and amino acid sequence composition of this data set of cytosolic phosphoproteins revealed that it is significantly enriched in intrinsic sequence disorder, and this enrichment is associated with both cellular location and phosphorylation status. The majority of phosphorylation sites found by MS were located outside of structural protein domains (97%) but were mostly located in regions of intrinsic sequence disorder (86%). 368 phosphorylation sites were located in long regions of disorder (over 40 amino acids long), and 94% of proteins contained at least one such long region of disorder. In addition, we found that 58 phosphorylation sites in this data set occur in 14-3-3 binding consensus motifs, linear motifs that are associated with unstructured regions in proteins. These results demonstrate that in this data set protein phosphorylation is significantly depleted in protein domains and significantly enriched in disordered protein sequences and that enrichment of intrinsic sequence disorder may be a common feature of phosphoproteomes. This supports the hypothesis that disordered regions in proteins allow kinases, phosphatases, and phosphorylation-dependent binding proteins to gain access to target sequences to regulate local protein conformation and activity.

19.
Intrinsically disordered proteins (IDPs) do not adopt stable three-dimensional structures in physiological conditions, yet these proteins play crucial roles in biological phenomena. In most cases, intrinsic disorder manifests itself in segments or domains of an IDP, called intrinsically disordered regions (IDRs), but fully disordered IDPs also exist. Although IDRs can be detected as missing residues in protein structures determined by X-ray crystallography, no protocol has been developed to identify IDRs from structures obtained by Nuclear Magnetic Resonance (NMR). Here, we propose a computational method to assign IDRs based on NMR structures. We compared missing residues of X-ray structures with residue-wise deviations of NMR structures for identical proteins, and derived a threshold deviation that gives the best correlation of ordered and disordered regions of both structures. The obtained threshold of 3.2 Å was applied to proteins whose structures were only determined by NMR, and the resulting IDRs were analyzed and compared to those of X-ray structures with no NMR counterpart in terms of sequence length, IDR fraction, protein function, cellular location, and amino acid composition, all of which suggest distinct characteristics. The structural knowledge of IDPs is still inadequate compared with that of structured proteins. Our method can collect and utilize IDRs from structures determined by NMR, potentially enhancing the understanding of IDPs.
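The thresholding step can be sketched as follows. Only the 3.2 Å cutoff comes from the abstract; the definition of "residue-wise deviation" used here (root-mean-square distance of each residue's C-alpha atom from its mean position across already superimposed models) is an assumption made for illustration, as are the function names and the input layout.

```python
import math


def residue_deviations(models):
    """Per-residue RMS deviation from the mean position across models.

    `models` is a list of NMR models, each a list of (x, y, z) C-alpha
    coordinates in the same residue order, assumed to be superimposed.
    """
    n_models = len(models)
    deviations = []
    for coords in zip(*models):  # one residue's coordinates in every model
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        mean_sq = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_models
        deviations.append(math.sqrt(mean_sq))
    return deviations


def assign_idr(models, threshold=3.2):
    """Flag residues whose deviation exceeds the threshold (in angstroms)."""
    return [d > threshold for d in residue_deviations(models)]
```

Consecutive flagged residues could then be merged into regions and compared with the missing-residue segments of X-ray structures, as the abstract outlines.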

20.
A practical overview of protein disorder prediction methods   Cited by: 1 (self-citations: 0; citations by others: 1)
In the past few years there has been a growing awareness that a large number of proteins contain long disordered (unstructured) regions that often play a functional role. However, these disordered regions are still poorly detected. Recognition of disordered regions in a protein is important for two main reasons: reducing bias in sequence similarity analysis by avoiding alignment of disordered regions against ordered ones, and helping to delineate boundaries of protein domains to guide structural and functional studies. As none of the available methods for disorder prediction can be taken as fully reliable on its own, we present an overview of the methods currently employed, highlighting their advantages and drawbacks. We show a few practical examples of how they can be combined to avoid pitfalls and to achieve more reliable predictions.
