Similar articles
 20 similar articles found (search time: 46 ms)
1.
Protein disorder prediction: implications for structural proteomics   Cited by: 26 (self-citations: 0, citations by others: 26)
A great challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Disordered regions in proteins often contain short linear peptide motifs (e.g., SH3 ligands and targeting signals) that are important for protein function. We present here DisEMBL, a computational tool for prediction of disordered/unstructured regions within a protein sequence. As no clear definition of disorder exists, we have developed parameters based on several alternative definitions and introduced a new one based on the concept of "hot loops," i.e., coils with high temperature factors. Avoiding potentially disordered segments in protein expression constructs can increase expression, foldability, and stability of the expressed protein. DisEMBL is thus useful for target selection and the design of constructs as needed for many biochemical studies, particularly structural biology and structural genomics projects. The tool is freely available via a web interface (http://dis.embl.de) and can be downloaded for use in large-scale studies.

2.
Prediction of short linear protein binding regions   Cited by: 1 (self-citations: 0, citations by others: 1)
Short linear motifs in proteins (typically 3-12 residues in length) play key roles in protein-protein interactions by frequently binding specifically to peptide binding domains within interacting proteins. Their tendency to be found in disordered segments of proteins has meant that they have often been overlooked. Here we present SLiMPred (short linear motif predictor), the first general de novo method designed to computationally predict such regions in protein primary sequences independent of experimentally defined homologs and interactors. The method applies machine learning techniques to predict new motifs based on annotated instances from the Eukaryotic Linear Motif database, as well as structural, biophysical, and biochemical features derived from the protein primary sequence. We have integrated these data sources and benchmarked the predictive accuracy of the method, and found that it performs equivalently to a predictor of protein binding regions in disordered regions, in addition to having predictive power for other classes of motif sites such as polyproline II helix motifs and short linear motifs lying in ordered regions. It will be useful in predicting peptides involved in potential protein associations and will aid in the functional characterization of proteins, especially of proteins lacking experimental information on structures and interactions. We conclude that, despite the diversity of motif sequences and structures, SLiMPred is a valuable tool for prioritizing potential interaction motifs in proteins.

3.
Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in their distinction. Benchmarks on previously used and new experimental data of unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions. 
The comparative analysis between NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks.
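The core idea in this abstract, flagging very long contiguous stretches without regular secondary structure, can be illustrated with a small sketch. This is not NORSnet itself (which the abstract describes as a neural network trained on predicted information); the three-state string encoding (H = helix, E = strand, L = loop), the function name find_long_loops, and the min_length default are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_long_loops(ss: str, min_length: int = 30) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs for runs of 'L' that are
    at least min_length residues long.

    ss is a per-residue secondary-structure string (assumed encoding:
    'H' helix, 'E' strand, 'L' loop / nonregular).
    """
    segments: List[Tuple[int, int]] = []
    start = None  # start index of the loop run currently being scanned
    for i, state in enumerate(ss):
        if state == "L":
            if start is None:
                start = i
        else:
            # A regular element ends the current loop run, if any.
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(ss) - start >= min_length:
        segments.append((start, len(ss)))
    return segments
```

The threshold is left as a parameter because the abstract does not state how long a segment must be to count as "very long".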

4.
Local structural disorder imparts plasticity on linear motifs   Cited by: 5 (self-citations: 0, citations by others: 5)
MOTIVATION: The dynamic nature of protein interaction networks requires fast and transient molecular switches. The underlying recognition motifs (linear motifs, LMs) are usually short and evolutionarily variable segments, which in several cases, such as phosphorylation sites or SH3-binding regions, fall into locally disordered regions. We probed the generality of this phenomenon by predicting the intrinsic disorder of all LM-containing proteins listed in the Eukaryotic Linear Motif (ELM) database. RESULTS: We demonstrated that LMs on average are embedded in locally unstructured regions, while their amino acid composition and charge/hydropathy properties exhibit a mixture characteristic of folded and disordered proteins. Overall, LMs are constructed by grafting a few specificity-determining residues favoring structural order on a highly flexible carrier region. These results establish a connection between LMs and molecular recognition elements of intrinsically unstructured proteins (IUPs), which realize a non-conventional mode of partner binding mostly in regulatory functions.

5.
Protein–protein interactions are thought to be mediated by domains, which are autonomous folding units of proteins. Recently, a second type of interaction has been suggested, mediated by short segments termed linear motifs, which are related to recognition elements of intrinsically disordered regions. Here, we propose a third kind of protein–protein recognition mechanism, mediated by disordered regions longer than 20–30 residues. Bioinformatics predictions and well‐characterized examples, such as the kinase‐inhibitory domain of Cdk inhibitors and the Wiskott–Aldrich syndrome protein (WASP)‐homology domain 2 of actin‐binding proteins, show that these disordered regions conform to the definition of domains rather than motifs, i.e., they represent functional, evolutionary, and structural units. Their functions are distinct from those of short motifs and ordered domains, and establish a third kind of interaction principle. With these points, we argue that these long disordered regions should be recognized as a distinct class of biologically functional protein domains.

6.
The structural stability of a protein requires a large number of interresidue interactions. The energetic contribution of these can be approximated by low-resolution force fields extracted from known structures, based on observed amino acid pairing frequencies. The summation of such energies, however, cannot be carried out for proteins whose structure is not known or for intrinsically unstructured proteins. To overcome these limitations, we present a novel method for estimating the total pairwise interaction energy, based on a quadratic form in the amino acid composition of the protein. This approach is validated by the good correlation of the estimated and actual energies of proteins of known structure and by a clear separation of folded and disordered proteins in the energy space it defines. As the novel algorithm has not been trained on unstructured proteins, it substantiates the concept of protein disorder, i.e., that the inability to form a well-defined 3D structure is an intrinsic property of many proteins and protein domains. This property is encoded in their sequence, because their biased amino acid composition does not allow sufficient stabilizing interactions to form. By limiting the calculation to a predefined sequential neighborhood, the algorithm was turned into a position-specific scoring scheme that characterizes the tendency of a given amino acid to fall into an ordered or disordered region. We term this application IUPred and compare its performance with that of three generally accepted predictors (PONDR VL3H, DISOPRED2, and GlobPlot) on a database of disordered proteins.
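The two ingredients named in this abstract, a quadratic form in the amino acid composition and its restriction to a local sequence neighborhood, can be sketched as below. This is a simplified illustration under stated assumptions, not the published method: TOY_MATRIX is a hypothetical two-letter matrix (not the published parameters), the per-window score simply re-evaluates the same quadratic form on each window, and the function names and half_window default are invented for the example.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical two-letter interaction matrix; NOT the published parameters.
TOY_MATRIX: Dict[str, Dict[str, float]] = {
    "A": {"A": -1.0, "K": 0.5},
    "K": {"A": 0.5, "K": 1.0},
}


def composition(seq: str) -> Dict[str, float]:
    """Fraction of each residue type present in seq."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def quadratic_energy(seq: str, matrix: Dict[str, Dict[str, float]]) -> float:
    """Evaluate f^T M f, where f is the composition vector of seq."""
    freq = composition(seq)
    return sum(freq[a] * matrix[a][b] * freq[b] for a in freq for b in freq)


def windowed_scores(
    seq: str, matrix: Dict[str, Dict[str, float]], half_window: int = 2
) -> List[float]:
    """One score per position: the quadratic form evaluated on the window
    of residues within half_window of that position (truncated at the ends)."""
    scores = []
    for i in range(len(seq)):
        lo = max(0, i - half_window)
        hi = min(len(seq), i + half_window + 1)
        scores.append(quadratic_energy(seq[lo:hi], matrix))
    return scores
```

With this toy matrix, windows rich in "A" score low (favorable) and windows rich in "K" score high, which mirrors the separation of ordered and disordered regions the abstract describes only in a qualitative sense.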

7.
The amyloid precursor protein (APP), which plays a critical role in the development of senile plaques in Alzheimer disease (AD), and the gp41 envelope protein of the human immunodeficiency virus (HIV), the causative agent of the acquired immunodeficiency syndrome (AIDS), are single-spanning type-1 transmembrane (TM) glycoproteins with the ability to form homo-oligomers. In this review we describe similarities, in both structural terms and sequence determinants, of their TM and juxtamembrane regions. The TM domains not only are essential for anchoring the proteins in membranes but also have functional roles. Both TM segments contain GxxxG motifs that drive TM associations within the lipid bilayer. They also each possess similar sequence motifs, positioned at the membrane interface preceding their TM domains. These motifs are known as the cholesterol recognition/interaction amino acid consensus (CRAC) motif in gp41 and the CRAC-like motif in APP. Moreover, in the cytoplasmic domain of both proteins other α-helical membranotropic regions with functional implications have been identified. Recent drug developments targeting both diseases are reviewed and the potential use of TM interaction modulators as therapeutic targets is discussed.
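A GxxxG motif (glycine, any three residues, glycine) is simple enough to locate with a regular expression. The sketch below is illustrative only; the function name find_gxxxg and the example sequence are made up, and a pattern match says nothing about whether a given site actually drives TM association.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping motifs (sharing a glycine) are all reported.
GXXXG = re.compile(r"(?=(G[A-Z]{3}G))")


def find_gxxxg(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start index, matched 5-mer) for every GxxxG in seq."""
    return [(m.start(), m.group(1)) for m in GXXXG.finditer(seq.upper())]


if __name__ == "__main__":
    # Artificial sequence, chosen only to show two overlapping motifs.
    print(find_gxxxg("AAGLLLGVVVGAA"))
```

The lookahead form is used because consecutive GxxxG motifs often share a glycine, and a plain non-overlapping search would report only the first of each such pair.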

8.
Alternative inclusion of exons increases the functional diversity of proteins. Among alternatively spliced exons, tissue-specific exons play a critical role in maintaining tissue identity. This raises the question of how tissue-specific protein-coding exons influence protein function. Here we investigate the structural, functional, interaction, and evolutionary properties of constitutive, tissue-specific, and other alternative exons in human. We find that tissue-specific protein segments often contain disordered regions, are enriched in posttranslational modification sites, and frequently embed conserved binding motifs. Furthermore, genes containing tissue-specific exons tend to occupy central positions in interaction networks, display distinct interaction partners in the respective tissues, and are enriched in signaling, development, and disease genes. Based on these findings, we propose that tissue-specific inclusion of disordered segments that contain binding motifs rewires interaction networks and signaling pathways. In this way, tissue-specific splicing may contribute to the functional versatility of proteins and increase the diversity of interaction networks across tissues.

9.
Many protein functions can be traced to linear sequence motifs of less than five residues, which are often found within intrinsically disordered domains. In spite of their prevalence, their role in protein evolution is only beginning to be understood. The study of papillomaviruses has provided many insights into the evolution of protein structure and function. We have chosen the papillomavirus E7 oncoprotein as a model system for the evolution of functional linear motifs. The multiple functions of E7 proteins from paradigmatic papillomavirus types can be explained to a large extent in terms of five linear motifs within the intrinsically disordered N-terminal domain and two linear motifs within the globular homodimeric C-terminal domain. We examined the motif inventory of E7 proteins from over 200 known papillomavirus types and found that the motifs reported for paradigmatic papillomavirus types are absent from many uncharacterized E7 proteins. Several motif pairs occur more often than expected, suggesting that linear motifs may evolve and function in a cooperative manner. The E7 linear motifs have appeared or disappeared multiple times during papillomavirus evolution, confirming the evolutionary plasticity of short functional sequences. Four of the motifs appeared several times during papillomavirus evolution, providing direct evidence for convergent evolution. Interestingly, the evolution pattern of a motif is independent of its location in a globular or disordered domain. The correlation between the presence of some motifs and virus host specificity and tissue tropism suggests that linear motifs play a role in the adaptive evolution of papillomaviruses.

10.
Structural genomics initiatives aim to elucidate representative 3D structures for the majority of protein families over the next decade, but many obstacles must be overcome. The correct design of constructs is extremely important since many proteins will be too large or contain unstructured regions and will not be amenable to crystallization. It is therefore essential to identify regions in protein sequences that are likely to be suitable for structural study. Scooby-Domain is a fast and simple method to identify globular domains in protein sequences. Domains are compact units of protein structure and their correct delineation will aid structural elucidation through a divide-and-conquer approach. Scooby-Domain predictions are based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method employs an A*-search to identify sequence regions that form a globular structure and those that are unstructured. On a test set of 173 proteins with consensus CATH and SCOP domain definitions, Scooby-Domain has a sensitivity of 50% and an accuracy of 29%, which is better than current state-of-the-art methods. The method does not rely on homology searches and, therefore, can identify previously unknown domains.

11.
MOTIVATION: Characterization of a protein family by its distinct sequence domains is crucial for functional annotation and correct classification of newly discovered proteins. Conventional Multiple Sequence Alignment (MSA) based methods run into difficulties when faced with heterogeneous groups of proteins. However, even many families of proteins that do share a common domain contain instances of several other domains, without any common underlying linear ordering. Ignoring this modularity may lead to poor or even false classification results. An automated method that can decompose a group of proteins into the sequence domains it contains is therefore highly desirable. RESULTS: We apply a novel method to the problem of protein domain detection. The method takes as input an unaligned group of protein sequences. It segments them and clusters the segments into groups sharing the same underlying statistics. A Variable Memory Markov (VMM) model is built using a Prediction Suffix Tree (PST) data structure for each group of segments. Refinement is achieved by letting the PSTs compete over the segments, and a deterministic annealing framework infers the number of underlying PST models while avoiding many inferior solutions. We show that regions of similar statistics correlate well with protein sequence domains, by matching a unique signature to each domain. This is done in a fully automated manner, and does not require or attempt an MSA. Several representative cases are analyzed. We identify a protein fusion event, refine an HMM superfamily classification into the underlying families the HMM cannot separate, and detect all 12 instances of a short domain in a group of 396 sequences. CONTACT: jill@cs.huji.ac.il; tishby@cs.huji.ac.il.

12.
The abundance and potential functional roles of intrinsically disordered regions in aquaporin-4, Kir4.1, the dystrophin isoform Dp71, α-1 syntrophin, and α-dystrobrevin, i.e., the proteins constituting the functional core of the astrocytic dystrophin-associated protein complex (DAPC), are analyzed by a wealth of computational tools. The correlation between protein intrinsic disorder, single nucleotide polymorphisms (SNPs) and protein function is also studied together with the peculiarities of structural and functional conservation of these proteins. Our study revealed that the DAPC members are typical hybrid proteins that contain both ordered and intrinsically disordered regions. Both ordered and disordered regions are important for the stabilization of this complex. Many disordered binding regions of these five proteins are highly conserved among vertebrates. Conserved eukaryotic linear motifs and molecular recognition features found in the disordered regions of the five proteins constituting the DAPC likely enhance protein-protein interactions that are required for the cellular functions of this complex. Curiously, the disorder-based binding regions are rarely affected by SNPs, suggesting that these regions are crucial for the biological functions of their corresponding proteins.

13.
A practical overview of protein disorder prediction methods   Cited by: 1 (self-citations: 0, citations by others: 1)
In the past few years there has been a growing awareness that a large number of proteins contain long disordered (unstructured) regions that often play a functional role. However, these disordered regions are still poorly detected. Recognition of disordered regions in a protein is important for two main reasons: reducing bias in sequence similarity analysis by avoiding alignment of disordered regions against ordered ones, and helping to delineate boundaries of protein domains to guide structural and functional studies. As none of the available methods for disorder prediction can be taken as fully reliable on its own, we present an overview of the methods currently employed, highlighting their advantages and drawbacks. We show a few practical examples of how they can be combined to avoid pitfalls and to achieve more reliable predictions.

14.
Intrinsically unstructured proteins (IUPs) exist in a disordered conformational state, often considered to be equivalent to the random-coil structure. We challenge this simplifying view by limited proteolysis, circular dichroism (CD) spectroscopy, and solid-state ¹H NMR, to show short- and long-range structural organization in two IUPs, the first inhibitory domain of calpastatin (CSD1) and microtubule-associated protein 2c (MAP2c). Proteases of either narrow (trypsin, chymotrypsin, and plasmin) or broad (subtilisin and proteinase K) substrate specificity, applied at very low concentrations, preferentially cleaved both proteins in regions, i.e., subdomains A, B, and C in CSD1 and the proline-rich region (PRR) in MAP2c, that are destined to form contacts with their targets. For CSD1, nonadditivity of the CD spectra of its two halves and suboptimal hydration of the full-length protein measured by solid-state NMR demonstrate that long-range tertiary interactions provide the structural background of this structural feature. In MAP2c, such tertiary interactions are absent, which points to the importance of local structural constraints. In fact, urea and temperature dependence of the CD spectrum of its PRR reveals the presence of the extended and rather stiff polyproline II helix conformation that keeps the interaction site exposed. These data suggest that functionally significant residual structure exists in both of these IUPs. This structure, manifest as transient local and/or global organization, ensures the spatial exposure of short contact segments on the surface. Pertinent data from other IUPs suggest that the presence of such recognition motifs may be a general feature of disordered proteins. To emphasize the possible importance of this structural trait, we propose that these motifs be called primary contact sites in IUPs.

15.
Natively unfolded proteins: a point where biology waits for physics   Cited by: 17 (self-citations: 0, citations by others: 17)
The experimental material accumulated in the literature on the conformational behavior of intrinsically unstructured (natively unfolded) proteins was analyzed. Results of this analysis showed that these proteins do not possess uniform structural properties, as expected for members of a single thermodynamic entity. Rather, these proteins may be divided into two structurally different groups: intrinsic coils, and premolten globules. Proteins from the first group have hydrodynamic dimensions typical of random coils in poor solvent and do not possess any (or almost any) ordered secondary structure. Proteins from the second group are essentially more compact, exhibiting some amount of residual secondary structure, although they are still less dense than native or molten globule proteins. An important feature of the intrinsically unstructured proteins is that they undergo disorder-order transition during or prior to their biological function. In this respect, the Protein Quartet model, with function arising from four specific conformations (ordered forms, molten globules, premolten globules, and random coils) and transitions between any two of the states, is discussed.

16.
Nuclear-encoded, chloroplast-destined proteins are synthesized with transit sequences that contain all information to get them inside the organelle. Different proteins are imported via a general protein import machinery, but their transit sequences do not share amino acid homology. It has been suggested that interactions between transit sequence and chloroplast envelope membrane lipids give rise to recognizable, structural motifs. In this study a detailed investigation of the structural, dynamical, and topological features of an isolated transit peptide associated with mixed micelles is described. The structure of the preferredoxin transit peptide in these micelles was studied by circular dichroism (CD) and multidimensional NMR techniques. CD experiments indicated that the peptide, which is unstructured in aqueous solution, obtained helical structure in the presence of the micelles. By NMR it is shown that the micelles introduced ill-defined helical structures in the transit peptide. Heteronuclear relaxation experiments showed that the whole peptide backbone is very flexible. The least dynamic segments are two N- and C-terminal helical regions flanking an unstructured proline-rich amino acid stretch. Finally, the insertion of the peptide backbone in the hydrophobic interior of the micelle was investigated by use of hydrophobic spin-labels. The combined data result in a model of the transit peptide structure, backbone dynamics, and insertion upon its interaction with mixed micelles.

17.
We have used GRATH, a graph-based structure comparison algorithm, to map the similarities between the different folds observed in the CATH domain structure database. Statistical analysis of the distributions of the fold similarities has allowed us to assess the significance for any similarity. Therefore we have examined whether it is best to represent folds as discrete entities or whether, in fact, a more accurate model would be a continuum wherein folds overlap via common motifs. To do this we have introduced a new statistical measure of fold similarity, termed gregariousness. For a particular fold, gregariousness measures how many other folds have a significant structural overlap with that fold, typically comprising 40% or more of the larger structure. Gregarious folds often contain commonly occurring super-secondary structural motifs, such as beta-meanders, greek keys, alpha-beta plait motifs or alpha-hairpins, which match similar motifs in other folds. Apart from one example, all of the most gregarious folds, those matching 20% or more of the other folds in the database, are alpha-beta proteins. They also occur in highly populated architectural regions of fold space, adopting sandwich-like arrangements containing two or more layers of alpha-helices and beta-strands. Domains that exhibit a low gregariousness are those that have very distinctive folds, with few common motifs or motifs that are packed in unusual arrangements. Most of the superhelices exhibit low gregariousness despite containing some commonly occurring super-secondary structural motifs. In these folds, these common motifs are combined in an unusual way and represent a small proportion of the fold (<10%). Our results suggest that fold space may be considered as continuous for some architectural arrangements (e.g. alpha-beta sandwiches), in that super-secondary motifs can be used to link neighbouring fold groups. However, in other regions of fold space much more discrete topologies are observed with little similarity between folds.
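As a rough illustration of the gregariousness measure described above, the sketch below computes, for each fold, the fraction of other folds with which it shares a significant overlap. It assumes the significant pairs have already been determined; the GRATH comparison and its significance statistics are not reproduced here, and the function name and fold labels are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def gregariousness(
    folds: List[str], significant_pairs: Iterable[Tuple[str, str]]
) -> Dict[str, float]:
    """Fraction of the other folds that significantly overlap each fold.

    significant_pairs lists unordered fold pairs judged significant by some
    upstream comparison (assumed to be precomputed).
    """
    partners: Dict[str, set] = {fold: set() for fold in folds}
    for a, b in significant_pairs:
        if a != b and a in partners and b in partners:
            partners[a].add(b)
            partners[b].add(a)
    others = len(folds) - 1
    return {
        fold: (len(p) / others if others > 0 else 0.0)
        for fold, p in partners.items()
    }
```

Expressing the count as a fraction makes it directly comparable with the "20% or more of the other folds" threshold quoted in the abstract.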

18.
Compactness has been used to locate discontinuous structural units containing one or more polypeptide chains in proteins of known structure. Rather than exhaustively calculating the compactness of all possible units, our procedure uses a screening algorithm to find discontinuous regions that are potentially compact. Precise calculations of compactness are restricted only to units in these regions. With our procedure, compactness can be used to discover discontinuous domains with virtually any number of disjoint peptides. Small, single-domain proteins may contain several compact regions: thus, compact regions do not always correspond to folding domains. Because a domain is an independent folding unit and should contain a hydrophobic core, compact units were further examined for the presence of hydrophobic clusters (Zehfus MH, 1995, Protein Sci 4:1188-1202). This added constraint limits the number of acceptable units and helps greatly in the location of the true structural domains. The larger hydrophobically stabilized compact units correspond to domains, while the smaller units may correspond to folding intermediates.

19.
Suppressors of cytokine signaling (SOCS) proteins function as negative regulators of cytokine signaling and are involved in fine tuning the immune response. The structure and role of the SH2 domains and C‐terminal SOCS box motifs of the SOCS proteins are well characterized, but the long N‐terminal domains of SOCS4–7 remain poorly understood. Here, we present bioinformatic analyses of the N‐terminal domains of the mammalian SOCS proteins, which indicate that these domains of SOCS4, 5, 6, and 7 are largely disordered. We have also identified a conserved region of about 70 residues in the N‐terminal domains of SOCS4 and 5 that is predicted to be more ordered than the surrounding sequence. The conservation of this region can be traced as far back as lower vertebrates. As conserved regions with increased structural propensity that are located within long disordered regions often contain molecular recognition motifs, we expressed the N‐terminal conserved region of mouse SOCS4 for further analysis. This region, mSOCS4 (residues 86–155), has been characterized by circular dichroism and nuclear magnetic resonance spectroscopy, both of which indicate that it is predominantly unstructured in aqueous solution, although it becomes helical in the presence of trifluoroethanol. The high degree of sequence conservation of this region across different species and between SOCS4 and SOCS5 nonetheless implies that it has an important functional role, and presumably this region adopts a more ordered conformation in complex with its partners. The recombinant protein will be a valuable tool in identifying these partners and defining the structures of these complexes. Proteins 2011. © 2012 Wiley Periodicals, Inc.

20.
Eukaryotic cells are known to contain a wide variety of RNA–protein assemblies, collectively referred to as RNP granules. RNP granules form from a combination of RNA–RNA, protein–RNA, and protein–protein interactions. In addition, RNP granules are enriched in proteins with intrinsically disordered regions (IDRs), which are frequently appended to a well-folded domain of the same protein. This structural organization of RNP granule components allows for a diverse set of protein–protein interactions including traditional structured interactions between well-folded domains, interactions of short linear motifs in IDRs with the surface of well-folded domains, interactions of short motifs within IDRs that weakly interact with related motifs, and weak interactions involving at most transient ordering of IDRs and folded domains with other components. In addition, both well-folded domains and IDRs in granule components frequently interact with RNA and thereby can contribute to RNP granule assembly. We discuss the contribution of these interactions to liquid–liquid phase separation and the possible role of phase separation in the assembly of RNP granules. We expect that these principles also apply to other non-membrane bound organelles and large assemblies in the cell.
