Similar articles
1.
All proteomes contain both proteins and polypeptide segments that don’t form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase (“UniProt features”: active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.

2.
3.
Accurate predictions of the three-dimensional structures of proteins from their amino acid sequences have come of age. AlphaFold, a deep learning-based approach to protein structure prediction, shows remarkable success in independent assessments of prediction accuracy. A significant epoch in structural bioinformatics was the structural annotation of over 98% of protein sequences in the human proteome. Interestingly, many predictions feature regions of very low confidence, and these regions largely overlap with intrinsically disordered regions (IDRs). That over 30% of regions within the proteome are disordered is congruent with estimates that have been made over the past two decades, as intense efforts have been undertaken to generalize the structure–function paradigm to include the importance of conformational heterogeneity and dynamics. With structural annotations from AlphaFold in hand, there is the temptation to draw inferences regarding the “structures” of IDRs and their interactomes. Here, we offer a cautionary note regarding the misinterpretations that might ensue and highlight efforts that provide concrete understanding of sequence-ensemble-function relationships of IDRs. This perspective is intended to emphasize the importance of IDRs in sequence-function relationships (SERs) and to highlight how one might go about extracting quantitative SERs to make sense of how IDRs function.

4.
Behura SK, Severson DW. Gene 2012, 504(2):226-232
We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p < 0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r² = 0.79) among all the species. Results further show that perfect and imperfect repeats have significant association with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show significant association between the numbers of perfect/imperfect hexamers and repeat coding for single amino acid/amino acid pair runs. Our data further suggest that genes containing simple sequence coding repeats may be under negative selection as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.

5.
Intrinsically disordered regions in autophagy proteins (total citations: 1; self-citations: 0; citations by others: 1)
Autophagy is an essential eukaryotic pathway required for cellular homeostasis. Numerous key autophagy effectors and regulators have been identified, but the mechanism by which they carry out their function in autophagy is not fully understood. Our rigorous bioinformatic analysis shows that the majority of key human autophagy proteins include intrinsically disordered regions (IDRs), which are sequences lacking stable secondary and tertiary structure, suggesting that IDRs play an important, yet hitherto uninvestigated, role in autophagy. Available crystal structures corroborate the absence of structure in some of these predicted IDRs. Regions of orthologs equivalent to the IDRs predicted in the human autophagy proteins are poorly conserved, indicating that these regions may have diverse functions in different homologs. We also show that IDRs predicted in human proteins contain several regions predicted to facilitate protein–protein interactions, and delineate the network of proteins that interact with each predicted IDR-containing autophagy protein, suggesting that many of these interactions may involve IDRs. Lastly, we experimentally show that a BCL2 homology 3 domain (BH3D) within the key autophagy effector BECN1 is an IDR. This BH3D undergoes a dramatic conformational change from coil to α-helix upon binding to BCL2s, with the C-terminal half of this BH3D constituting a binding motif, which serves to anchor the interaction of the BH3D to BCL2s. The information presented here will help inform future in-depth investigations of the biological role and mechanism of IDRs in autophagy proteins. Proteins 2014; 82:565–578. © 2013 Wiley Periodicals, Inc.

6.
Intrinsically disordered regions have been associated with various cellular processes and are implicated in several human diseases, but their exact roles remain unclear. We previously defined two classes of conserved disordered regions in budding yeast, referred to as “flexible” and “constrained” conserved disorder. In flexible disorder, the property of disorder has been positionally conserved during evolution, whereas in constrained disorder, both the amino acid sequence and the property of disorder have been conserved. Here, we show that flexible and constrained disorder are widespread in the human proteome, and are particularly common in proteins with regulatory functions. Both classes of disordered sequences are highly enriched in regions of proteins that undergo tissue-specific (TS) alternative splicing (AS), but not in regions of proteins that undergo general (i.e., not tissue-regulated) AS. Flexible disorder is more highly enriched in TS alternative exons, whereas constrained disorder is more highly enriched in exons that flank TS alternative exons. These latter regions are also significantly more enriched in potential phosphosites and other short linear motifs associated with cell signaling. We further show that cancer driver mutations are significantly enriched in regions of proteins associated with TS and general AS. Collectively, our results point to distinct roles for TS alternative exons and flanking exons in the dynamic regulation of protein interaction networks in response to signaling activity, and they further suggest that alternatively spliced regions of proteins are often functionally altered by mutations responsible for cancer.

7.
A substantial portion of the proteome consists of intrinsically disordered regions (IDRs) that do not fold into well-defined 3D structures yet perform numerous biological functions and are associated with a broad range of diseases. It has been a long-standing enigma how different IDRs successfully execute their specific functions. Further putting a spotlight on IDRs are recent discoveries of functionally relevant biomolecular assemblies, which in some cases form through liquid-liquid phase separation. At the molecular level, the formation of biomolecular assemblies is largely driven by weak, multivalent, but selective IDR-IDR interactions. Emerging experimental and computational studies suggest that the primary amino acid sequences of IDRs encode a variety of their interaction behaviors. In this review, we focus on findings and insights that connect sequence-derived features of IDRs to their conformations, propensities to form biomolecular assemblies, selectivity of interaction partners, functions in the context of physiology and disease, and regulation of function. We also discuss directions of future research to facilitate establishing a comprehensive sequence-function paradigm that will eventually allow prediction of selective interactions and specificity of function mediated by IDRs.

8.

Background

Intrinsically disordered proteins (IDPs) or proteins with disordered regions (IDRs) do not have a well-defined tertiary structure, but perform a multitude of functions, often relying on their native disorder to achieve the binding flexibility through changing to alternative conformations. Intrinsic disorder is frequently found in all three kingdoms of life, and may occur in short stretches or span whole proteins. To date most studies contrasting the differences between ordered and disordered proteins focused on simple summary statistics. Here, we propose an evolutionary approach to study IDPs, and contrast patterns specific to ordered protein regions and the corresponding IDRs.

Results

Two empirical Markov models of amino acid substitutions were estimated, based on a large set of multiple sequence alignments with experimentally verified annotations of disordered regions from the DisProt database of IDPs. We applied new methods to detect differences in Markovian evolution and evolutionary rates between IDRs and the corresponding ordered protein regions. Further, we investigated the distribution of IDPs among functional categories and biochemical pathways, and their propensity to contain tandem repeats.

Conclusions

We find significant differences in the evolution between ordered and disordered regions of proteins. Most importantly, we find that disorder-promoting amino acids are more conserved in IDRs, indicating that in some cases not only amino acid composition but also the specific sequence is important for function. This conjecture is also reinforced by the observation that for part of our data set IDRs evolve more slowly than the ordered parts of the proteins, while we still support the common view that IDRs in general evolve more quickly. The improvement in model fit indicates a possible improvement for various types of analyses; for example, de novo disorder prediction using a phylogenetic Hidden Markov Model based on our matrices showed a performance similar to other disorder predictors.

9.
Background

Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3-10 amino acids in length, that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation.

Results

The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins, and masked out poorly conserved regions of disordered proteins, the performance of MEME was equivalent to that of regular expression methods. However, the two approaches returned different subsets within both a benchmark dataset and a more realistic discovery dataset.

Conclusions

Profile-based motif discovery methods complement regular expression based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.

10.
We present a system for multi-class protein classification based on neural networks. The basic issue concerning the construction of neural network systems for protein classification is the sequence encoding scheme that must be used in order to feed the neural network. To deal with this problem we propose a method that maps a protein sequence into a numerical feature space using the matching scores of the sequence to groups of conserved patterns (called motifs) in protein families. We consider two alternative ways for identifying the motifs to be used for feature generation and provide a comparative evaluation of the two schemes. We also evaluate the impact of the incorporation of background features (2-grams) on the performance of the neural system. Experimental results on real datasets indicate that the proposed method is highly efficient and is superior to other well-known methods for protein classification.
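As a rough illustration of the background 2-gram features mentioned in this abstract, the sketch below maps a sequence to a fixed-length vector of 2-gram frequencies. The function name, the normalisation, and the handling of non-standard residues are assumptions made for illustration; the abstract does not specify how the authors computed these features.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; any other symbol is treated as unknown.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def two_gram_features(sequence: str) -> list[float]:
    """Return the frequencies of all 400 overlapping 2-grams (hypothetical helper).

    Frequencies are divided by the total number of overlapping 2-grams in the
    sequence, so 2-grams containing non-standard symbols lower the sum below 1.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoid division by zero for very short input
    return [counts[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


features = two_gram_features("ACAC")
```

A vector of this kind could be concatenated with the motif matching scores before being fed to a classifier, which is one plausible reading of "incorporation of background features".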

11.
Calsequestrin (CASQ) exists as two distinct isoforms, CASQ1 and CASQ2, in all vertebrates. Although the isoforms exhibit unique functional characteristics, the structural basis for these differences is yet to be fully defined. Interestingly, the C-terminal regions of the two isoforms exhibit significant differences both in length and amino acid composition, forming a Dn-motif and a DEXn-motif in CASQ1 and CASQ2, respectively. Here, we investigated whether the unique C-terminal motifs possess Ca²⁺ sensitivity and affect protein function. Sequence analysis shows that both the Dn- and DEXn-motifs are intrinsically disordered regions (IDRs) of the protein, a feature that is conserved from fish to man. Using purified synthetic peptides, we show that these motifs undergo distinctive Ca²⁺-mediated folding, suggesting that these disordered motifs are Ca²⁺-sensitive. We generated chimeric proteins by swapping the C-terminal portions between CASQ1 and CASQ2. Our studies show that the C-terminal portions do not play a significant role in protein folding. An interesting finding of the current study is that switching the C-terminal portion completely reverses the polymerization kinetics. Collectively, these data suggest that these Ca²⁺-sensitive IDRs located at the back-to-back dimer interface influence isoform-specific Ca²⁺-dependent polymerization properties of CASQ. © 2014 Wiley Periodicals, Inc. Biopolymers 103: 15–22, 2015.

12.
13.
Intrinsically disordered regions (IDRs) are prevalent in the eukaryotic proteome. Common functional roles of IDRs include forming flexible linkers or undergoing allosteric folding-upon-binding. Recent studies have suggested an additional functional role for IDRs: generating steric pressure on the plasma membrane during endocytosis, via molecular crowding. However, in order to accomplish useful functions, such crowding needs to be regulated in space (e.g., endocytic hotspots) and time (e.g., during vesicle formation). In this work, we explore binding-induced regulation of IDR steric volume. We simulate the IDRs of two proteins from clathrin-mediated endocytosis (CME) to see if their conformational spaces are regulated via binding-induced expansion. Using Monte Carlo computational modeling of excluded volumes, we generate large conformational ensembles (3 million) for the IDRs of Epsin and Eps15 and dock the conformers to the alpha subunit of Adaptor Protein 2 (AP2α), their CME binding partner. Our results show that as more molecules of AP2α are bound, the Epsin-derived ensemble shows a significant increase in global dimensions, measured as the radius of gyration (RG) and the end-to-end distance (EED). Unlike Epsin, Eps15-derived conformers that permit AP2α binding at one motif were found to be more likely to accommodate binding of AP2α at other motifs, suggesting a tendency toward co-accessibility of binding motifs. Co-accessibility was not observed for any pair of binding motifs in Epsin. Thus, we speculate that the disordered regions of Epsin and Eps15 perform different roles during CME, with accessibility in Eps15 allowing it to act as a recruiter of AP2α molecules, while binding-induced expansion of the Epsin disordered region could impose steric pressure and remodel the plasma membrane during vesicle formation.
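The two global dimensions named in this abstract can be computed for a single conformer as in the minimal sketch below. It assumes one equally weighted coordinate per residue (for example, Cα positions) and is not the authors' code; the function names are illustrative.

```python
import math

Point = tuple[float, float, float]


def radius_of_gyration(coords: list[Point]) -> float:
    """Root-mean-square distance of the points from their centroid.

    Assumes every point carries the same weight (no mass weighting).
    """
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


def end_to_end_distance(coords: list[Point]) -> float:
    """Euclidean distance between the first and last points of the chain."""
    return math.dist(coords[0], coords[-1])


# A fully extended three-residue toy chain along the x axis.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(chain)
eed = end_to_end_distance(chain)
```

Averaging these two quantities over an ensemble, grouped by the number of bound partners, would give the kind of comparison the abstract describes.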

14.
We have identified four repeats and five domains that are novel in proteins encoded by the Pyrobaculum aerophilum str. IM2 proteome using automated in silico methods. A "repeat" corresponds to a region comprising less than 55 amino acid residues that occurs more than once in the protein sequence and is sometimes present in tandem. A "domain" corresponds to a conserved region comprising greater than 55 amino acid residues and may be present as single or multiple copies in the protein sequence. These are (1) the AAG domain (85 amino acid residues), (2) the GFGN domain (72 residues), (3) the KGG repeat (43 residues), (4) the RWE repeat (25 residues), (5) the RID repeat (25 residues), (6) the NDFA domain (108 residues), (7) the VxY domain (140 residues), (8) the LLPN repeat (35 residues) and (9) the GxY domain (98 residues). A repeat or domain is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure.
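The length-based definitions in this abstract can be written down as a small rule, sketched below. The function name and the "unclassified" fallback are assumptions: the abstract does not say how a region of exactly 55 residues, or a short region occurring only once, is treated.

```python
def classify_region(length: int, copies: int) -> str:
    """Apply the abstract's length rule to a conserved region (hypothetical helper).

    length: number of amino acid residues in the region
    copies: number of times the region occurs in the protein sequence
    """
    if length < 55 and copies > 1:
        return "repeat"
    if length > 55 and copies >= 1:
        return "domain"
    # Cases the abstract leaves undefined, e.g. exactly 55 residues.
    return "unclassified"


kgg = classify_region(length=43, copies=2)   # KGG repeat length from the abstract
aag = classify_region(length=85, copies=1)   # AAG domain length from the abstract
```

The copy numbers in the usage lines are illustrative, since the abstract reports lengths but not how many copies each region has.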

15.
Journal of Molecular Biology 2019, 431(8):1650-1670
Intrinsically disordered proteins (IDPs) or regions (IDRs) perform diverse cellular functions, but are also prone to forming promiscuous and potentially deleterious interactions. We investigate the extent to which the properties of, and content in, IDRs have adapted to enable functional diversity while limiting interference from promiscuous interactions in the crowded cellular environment. Information on protein sequences, their predicted intrinsic disorder, and 3D structure contents is related to data on protein cellular concentrations, gene co-expression, and protein–protein interactions in the well-studied yeast Saccharomyces cerevisiae. Results reveal that both the protein IDR content and the frequency of “sticky” amino acids in IDRs (those more frequently involved in protein interfaces) decrease with increasing protein cellular concentration. This implies that the IDR content and the amino acid composition of IDRs experience negative selection as the protein concentration increases. In the S. cerevisiae protein–protein interaction network, the higher a protein's IDR content, the more frequently it interacts with IDR-containing partners, and the more functionally diverse the partners are. Employing a clustering analysis of Gene Ontology terms, we newly identify ~600 putative multifunctional proteins in S. cerevisiae. Strikingly, these proteins are enriched in IDRs and contribute significantly to all the observed trends. In particular, IDRs of multifunctional proteins feature more sticky amino acids than IDRs of their non-multifunctional counterparts, or the surfaces of structured yeast proteins. This property likely affords sufficient binding affinity for the functional interactions, commonly mediated by short IDR segments, thereby counterbalancing the loss in overall IDR conformational entropy upon binding.

16.
Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and a low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio sequence-only prediction method, based on reduced amino acid alphabets and convolutional neural networks (CNNs), that tries to overcome the challenge of accurate prediction posed by IDRs. We experiment with six different 3-letter reduced alphabets. We argue that the dimensional reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs at the same level as or outperforms other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Therefore, our method is suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.
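To make the idea of a 3-letter reduced alphabet concrete, the sketch below one-hot encodes a sequence over three residue classes, giving a 3-channel input instead of a 20-channel one. The grouping used here (hydrophobic, polar, charged) is purely illustrative and is not claimed to be one of the six alphabets tested in the article; all names are hypothetical.

```python
# Illustrative 3-class grouping of the 20 standard amino acids (an assumption,
# not taken from the article): hydrophobic, polar, charged.
GROUPS = ("AVLIMFWC", "STNQGPYH", "DEKR")
RESIDUE_TO_CLASS = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def one_hot_reduced(sequence: str) -> list[list[int]]:
    """Encode each residue as a one-hot vector over the reduced alphabet.

    Unknown symbols (e.g. 'X') are encoded as an all-zero row.
    """
    rows = []
    for aa in sequence.upper():
        row = [0] * len(GROUPS)
        idx = RESIDUE_TO_CLASS.get(aa)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


encoded = one_hot_reduced("AKS")
```

The resulting L x 3 matrix is the kind of input a 1D convolutional layer could consume directly, without computing sequence profiles first.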

17.
18.
We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it is able to extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient. The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4-16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In a third stage of the SOM procedure, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling. The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.

19.
The pre-translational modification of messenger ribonucleic acids (mRNAs) by alternative promoter usage and alternative splicing is an important source of pleiotropy. Despite intensive efforts, our understanding of the functional implications of this dynamically created diversity is still incomplete. Using the available knowledge of interaction modules, particularly within intrinsically disordered regions (IDRs), we analysed the occurrences of protein modules within alternative exons. We find that regions removed or included by pre-translational variation are enriched in linear motifs, suggesting that the removal or inclusion of exons containing these interaction modules is an important regulatory mechanism. In particular, we observe that PDZ-, PTB-, SH2- and WW-domain binding motifs are more likely to occur within alternative exons. We also determine that regions removed or included by alternative promoter usage are enriched in IDRs, suggesting that protein isoform diversity is tightly coupled to the modulation of IDRs. This study, therefore, demonstrates that short linear motifs are key components for establishing protein diversity between splice variants.

20.
Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, advocating the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E.
