Similar literature
20 similar records found.
2.
Short and long disordered regions of proteins have different preferences for different amino acid residues. Different methods often have to be trained to predict them separately. In this study, we developed a single neural-network-based technique called SPINE-D that makes a three-state prediction first (ordered residues and disordered residues in short and long disordered regions) and reduces it into a two-state prediction afterwards. SPINE-D was tested on various sets composed of different combinations of DisProt-annotated proteins and proteins directly from the PDB annotated for disorder by missing coordinates in X-ray determined structures. While disorder annotations are different according to DisProt and X-ray approaches, SPINE-D's prediction accuracy and ability to predict disorder are relatively independent of how the method was trained and what type of annotation was employed but strongly depend on the balance in the relative populations of ordered and disordered residues in short and long disordered regions in the test set. With greater than 85% overall specificity for detecting residues in both short and long disordered regions, the residues in long disordered regions are easier to predict at 81% sensitivity in a balanced test dataset with 56.5% ordered residues but more challenging (at 65% sensitivity) in a test dataset with 90% ordered residues. Compared to eleven other methods, SPINE-D yields the highest area under the curve (AUC), the highest Matthews correlation coefficient for residue-based prediction, and the lowest mean square error in predicting disorder contents of proteins for an independent test set with 329 proteins. In particular, SPINE-D is comparable to a meta predictor in predicting disordered residues in long disordered regions and superior in short disordered regions. SPINE-D participated in the CASP9 blind prediction and is one of the top servers according to the official ranking. In addition, SPINE-D was examined for prediction of functional molecular recognition motifs in several case studies. The server and databases are available at http://sparks.informatics.iupui.edu/.
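The evaluation measures named in this abstract (sensitivity, specificity, Matthews correlation coefficient) and the reduction from three states to two can be illustrated with a minimal Python sketch. The label alphabet (`O` ordered, `S` short disorder, `L` long disorder) and both function names are assumptions made for illustration; this is not SPINE-D code.

```python
import math


def collapse_three_state(labels):
    """Map hypothetical three-state labels to two states.

    'O' (ordered) becomes 0; 'S' and 'L' (short/long disorder) become 1.
    """
    return [0 if label == "O" else 1 for label in labels]


def binary_metrics(truth, predicted):
    """Residue-level sensitivity, specificity and MCC for 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal is empty; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    truth = collapse_three_state("OOSLLO")
    predicted = collapse_three_state("OSSLOO")
    print(binary_metrics(truth, predicted))
```

Because sensitivity and specificity are computed per class, they do not change with class balance by construction, whereas MCC does; this is one reason the ordered/disordered ratio of a test set matters when comparing predictors.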

3.
We have performed a statistical analysis of unstructured amino acid residues in protein structures available in the databank of protein structures. Data on the occurrence of disordered regions at the ends and in the middle part of protein chains have been obtained: the regions near the ends (less than 30 residues from the N- or C-terminus) contain 66% of unstructured residues (38% near the N-terminus and 28% near the C-terminus), although these terminal regions include only 23% of all amino acid residues. The frequencies of occurrence of unstructured residues have been calculated for each of the 20 residue types in different positions in the protein chain. It has been shown that the relative frequencies of occurrence of unstructured residues of the 20 types at the termini of protein chains differ from those in the middle part of the protein chain; amino acid residues of the same type have different probabilities of being unstructured in the terminal regions and in the middle part of the protein chain. The obtained frequencies of occurrence of unstructured residues in the middle part of the protein chain have been used as a scale for predicting disordered regions from amino acid sequence using the method (FoldUnfold) previously developed by us. This scale of frequencies of occurrence of unstructured residues correlates with the contact scale (previously developed by us and used for the same purpose) at a level of 95%. Testing the new scale on a database of 427 unstructured proteins and 559 completely structured proteins has shown that this scale can be successfully used for the prediction of disordered regions in protein chains.
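The terminal-versus-middle statistic described above can be sketched in Python as follows. The input encoding (one string per chain, `D` for an unstructured residue and `O` otherwise) and the function name are hypothetical, and assigning a residue of a short chain to its nearer terminus is an assumption, not something stated in the abstract.

```python
def terminal_disorder_shares(chains, window=30):
    """Share of unstructured residues near the N-terminus, near the
    C-terminus, and in the middle part of the chains.

    chains: iterable of strings, 'D' = unstructured, anything else = structured.
    window: a residue is "terminal" if fewer than `window` residues
            separate it from a chain end.
    """
    counts = {"N": 0, "C": 0, "middle": 0}
    for chain in chains:
        n = len(chain)
        for i, state in enumerate(chain):
            if state != "D":
                continue
            dist_n, dist_c = i, n - 1 - i
            if dist_n < window and dist_n <= dist_c:
                counts["N"] += 1
            elif dist_c < window:
                counts["C"] += 1
            else:
                counts["middle"] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


if __name__ == "__main__":
    # Toy chain with a 2-residue window, for illustration only.
    print(terminal_disorder_shares(["DOODOOOD"], window=2))
```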

4.
Identification of disordered regions in polypeptide chains is very important because such regions are essential for protein function. A new parameter, the mean packing density of residues, has been introduced to detect disordered regions in a protein sequence. We have demonstrated that regions with weak expected packing density would be responsible for the appearance of disordered regions. Our method (FoldUnfold) has been tested on datasets of globular proteins (559 proteins) and long disordered protein segments (129 proteins) and showed improved performance over some other widely used methods, such as DISOPRED, PONDR VL3H, IUPred and GlobPlot. AVAILABILITY: The FoldUnfold server is available for users at http://skuld.protres.ru/~mlobanov/ogu/ogu.cgi. There is a link to our server through the web site of DisProt (http://www.disprot.org/predictors.php).

5.
A new criterion proposed for classification of the living world is based on the ability of the protein amino acid sequence to form disordered regions, appearing as loops in the 3D structure. The approach used fundamentally differs from the approaches based on comparisons of certain RNA or protein sequences of different organisms. Introduction of any new structural-functional criterion that could resolve the evolutionary relationships between the main groups of organisms is of interest in itself, as megasystematics and macrophylogeny lack informative criteria despite the apparent abundance of molecular characteristics. The specialized program FoldUnfold was used to search for disordered regions in the elongation factors EF1A (EFs). The reliability of loop prediction was verified against five EFs with structures known from X-ray analysis. It was demonstrated with the example of several dozen typical representatives of the living world that the program predicts extra loops in addition to two linkers between three structural domains in EFs. Besides the effector loop, contained in all EFs, six loops were detected at maximum. Of them, three loops (A, B, and C) are in domain I, one (D) is in domain II, and two (E and F) are in domain III. Moreover, all six loops are never present in the same EF. The EF signatures were determined for each of the superkingdoms of life. Each superkingdom displayed variations in the number of loops and their location within the EF domains. Not only the presence of a particular loop was important in the analysis, but also the specificity of its amino acid sequence. As the total number of predicted loops in EFs increases with the increasing complexity of organisms, the following evolutionary role was postulated for the loops. Following the principle of thrifty inventiveness, nature operates with different universal inserts (loops), adapting their number, location within the EF domains, and amino acid composition so that the protein performs specialized functions: a single one in protozoa and several in higher organisms.

6.
Protein flexibility and intrinsic disorder   Total citations: 6, self-citations: 0, citations by others: 6
Comparisons were made among four categories of protein flexibility: (1) low-B-factor ordered regions, (2) high-B-factor ordered regions, (3) short disordered regions, and (4) long disordered regions. Amino acid compositions of the four categories were found to be significantly different from each other, with high-B-factor ordered and short disordered regions being the most similar pair. The high-B-factor (flexible) ordered regions are characterized by a higher average flexibility index, higher average hydrophilicity, higher average absolute net charge, and higher total charge than disordered regions. The low-B-factor regions are significantly enriched in hydrophobic residues and depleted in the total number of charged residues compared to the other three categories. We examined the predictability of the high-B-factor regions and developed a predictor that discriminates between regions of low and high B-factors. This predictor achieved an accuracy of 70% and a correlation of 0.43 with experimental data, outperforming the 64% accuracy and 0.32 correlation of predictors based solely on flexibility indices. To further clarify the differences between short disordered regions and ordered regions, a predictor of short disordered regions was developed. Its relatively high accuracy of 81% indicates considerable differences between ordered and disordered regions. The distinctive amino acid biases of high-B-factor ordered regions, short disordered regions, and long disordered regions indicate that the sequence determinants for these flexibility categories differ from one another, whereas the significantly-greater-than-chance predictability of these categories from sequence suggests that flexible ordered regions, short disorder, and long disorder are, to a significant degree, encoded at the primary structure level.

7.
Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted in SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRF prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.

9.
Intrinsic disorder in the Protein Data Bank   Total citations: 2, self-citations: 0, citations by others: 2
The Protein Data Bank (PDB) is the preeminent source of protein structural information. PDB contains over 32,500 experimentally determined 3-D structures solved using X-ray crystallography or nuclear magnetic resonance spectroscopy. Intrinsically disordered regions fail to form a fixed 3-D structure under physiological conditions. In this study, we compare the amino-acid sequences of proteins whose structures are determined by X-ray crystallography with the corresponding sequences from the Swiss-Prot database. The analyzed dataset includes 16,370 structures, which represent 18,101 PDB chains and 5,434 different proteins from 910 different organisms (2,793 eukaryotic, 2,109 bacterial, 288 viral, and 244 archaeal). In this dataset, on average, each Swiss-Prot protein is represented by 7 PDB chains with 76% of the crystallized regions being represented by more than one structure. Intriguingly, the complete sequences of only approximately 7% of proteins are observed in the corresponding PDB structures, and only approximately 25% of the total dataset have >95% of their lengths observed in the corresponding PDB structures. This suggests that the vast majority of PDB proteins are shorter than their corresponding Swiss-Prot sequences and/or contain numerous residues that are not observed in electron density maps. To determine the prevalence of disordered regions in PDB, the residues in the Swiss-Prot sequences were grouped into four general categories, "Observed" (which correspond to structured regions), "Not observed" (regions with missing electron density, potentially disordered), "Uncharacterized," and "Ambiguous," depending on their appearance in the corresponding PDB entries. This non-redundant set of residues can be viewed as a 'fragment' or empirical domain database that contains a set of experimentally determined structured regions or domains and a set of experimentally verified disordered regions or domains.
We studied the propensities and properties of residues in these four categories and analyzed their relations to the predictions of disorder using several algorithms. "Not observed," "Ambiguous," and "Uncharacterized" regions were shown to possess the amino acid compositional biases typical of intrinsically disordered proteins. The application of four different disorder predictors (PONDR® VL-XT, VL3-BA, VSL1P, and IUPred) revealed that the vast majority of residues in the "Observed" dataset are ordered, and that the "Not observed" regions are mostly disordered. The "Uncharacterized" regions possess some tendency toward order, whereas the predictions for the short "Ambiguous" regions are really ambiguous. Long "Ambiguous" regions (>70 amino acid residues) are mostly predicted to be ordered, suggesting that they are likely to be "wobbly" domains. Overall, we showed that completely ordered proteins are not highly abundant in PDB and many PDB sequences have disordered regions. In fact, in the analyzed dataset approximately 10% of the PDB proteins contain regions of consecutive missing or ambiguous residues longer than 30 amino acids and approximately 40% of the proteins possess short regions (≥10 and <30 amino acids long) of missing and ambiguous residues.
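The length classes quoted at the end of this abstract (runs of missing or ambiguous residues longer than 30, and runs of at least 10 but fewer than 30) can be sketched in Python. The per-residue mask encoding (`1` observed, `0` missing or ambiguous) and the function names are assumptions for illustration. The abstract does not say how a run of exactly 30 residues is counted, so the sketch labels it `other`.

```python
from itertools import groupby


def missing_run_lengths(mask):
    """Lengths of consecutive missing residues in a hypothetical mask string
    where '1' = observed and '0' = missing or ambiguous."""
    return [sum(1 for _ in group) for key, group in groupby(mask) if key == "0"]


def classify_run(length):
    """Apply the length classes quoted in the abstract."""
    if length > 30:
        return "long"
    if 10 <= length < 30:
        return "short"
    return "other"  # shorter than 10, or exactly 30 (not specified in the text)


if __name__ == "__main__":
    mask = "1" * 5 + "0" * 12 + "1" * 3 + "0" * 31 + "1" * 2 + "0" * 4
    for length in missing_run_lengths(mask):
        print(length, classify_run(length))
```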

10.
Many protein regions have been shown to be intrinsically disordered, lacking unique structure under physiological conditions. These intrinsically disordered regions are not only very common in proteomes, but also crucial to the function of many proteins, especially those involved in signaling, recognition, and regulation. The goal of this work was to identify the prevalence, characteristics, and functions of conserved disordered regions within protein domains and families. A database was created to store the amino acid sequences of nearly one million proteins and their domain matches from the InterPro database, a resource integrating eight different protein family and domain databases. Disorder prediction was performed on these protein sequences. Regions of sequence corresponding to domains were aligned using a multiple sequence alignment tool. From this initial information, regions of conserved predicted disorder were found within the domains. The methodology for this search consisted of finding regions of consecutive positions in the multiple sequence alignments in which 90% or more of the sequences were predicted to be disordered. This procedure was constrained to find such regions of conserved disorder prediction that were at least 20 amino acids in length. The results of this work included 3,653 regions of conserved disorder prediction, found within 2,898 distinct InterPro entries. Most regions of conserved predicted disorder detected were short, with fewer than 10% of those found exceeding 30 residues in length.
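The search procedure described above (consecutive alignment positions where at least 90% of sequences are predicted disordered, spanning at least 20 positions) can be sketched in Python. The input encoding (equal-length strings, `D` for predicted disorder, `O` for predicted order, `-` for a gap) and the function name are hypothetical. Counting gaps as "not disordered" and measuring region length in alignment columns are assumptions the abstract does not settle.

```python
def conserved_disorder_regions(aligned_states, min_fraction=0.9, min_length=20):
    """Return half-open (start, end) column ranges of conserved predicted disorder.

    aligned_states: list of equal-length strings, one per aligned sequence.
    A column is conserved if the fraction of 'D' among all sequences
    is at least `min_fraction`.
    """
    if not aligned_states:
        return []
    n_seqs = len(aligned_states)
    n_cols = len(aligned_states[0])
    regions = []
    start = None
    # Iterate one past the last column so an open run is closed at the end.
    for col in range(n_cols + 1):
        conserved = False
        if col < n_cols:
            n_disordered = sum(1 for row in aligned_states if row[col] == "D")
            conserved = n_disordered / n_seqs >= min_fraction
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                regions.append((start, col))
            start = None
    return regions


if __name__ == "__main__":
    rows = ["OOODDDDDO"] * 9 + ["OOOOOOOOO"]
    # Toy alignment; min_length lowered so the short example yields a region.
    print(conserved_disorder_regions(rows, min_length=3))
```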

12.
Single amino acid substitutions were generated in predicted hydrophilic loop regions of the human tumour necrosis factor beta (TNF-beta) molecule, and the mutant proteins were expressed in Escherichia coli and purified. Mutants with single amino acid changes at either of two distinct loop regions, at positions aspartic acid 50 or tyrosine 108, were found to have greatly reduced receptor binding and cytotoxic activity. These two regions in TNF-beta correspond to known loop regions where mutations also result in loss of biological activity of TNF-alpha, a related cytokine which shares the same cellular receptors with TNF-beta. The two distinct loops at positions 31-34 and 84-89 in the known three-dimensional structure of TNF-alpha (equivalent to positions 46-50 and 105-110, respectively, in TNF-beta) lie on opposite sides of the TNF-alpha monomer. When the TNF-alpha monomer forms a trimer, the two loops, each from a different subunit of the trimer, come together and lie in a cleft between adjacent subunits. Together, these findings suggest that a TNF receptor binds to a cleft between subunits via surface loops at amino acid residues 31-34 and 84-89 in TNF-alpha, and similarly via surface loops including amino acids aspartic acid 50 and tyrosine 108 in TNF-beta.

13.
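This paper studies the characteristics of residues at the sites where intrinsically disordered proteins (IDPs) interact with other proteins. First, 109 IDP chains that met the selection criteria, together with the 299 IDP-protein complexes they form with other ligand proteins, were selected from the database, and the IDP residues acting as interaction sites in these complexes were then extracted. The 109 IDP chains contain 50,031 amino acid residues in total, of which 4,822 lie in interaction sites. The analysis shows that the 20 amino acids differ in their propensity to form IDP-protein interaction-site residues. According to this propensity, the 20 amino acids can be divided into three classes: favored (ILE, LEU, ARG, PHE, TYR, MET, TRP), intermediate (GLN, GLU, THR, LYS, VAL, ASP, HIS), and disfavored (PRO, SER, GLY, ALA, ASN, CYS). The results further show that amino acids differ in their propensity to form IDP-protein interaction-site residues in ordered versus disordered regions. The difference between ordered and disordered regions is especially pronounced for TRP, LEU, ILE, and CYS, whereas GLU, PHE, HIS, and ALA show essentially no difference. Analysis of the physicochemical properties of IDP-protein interaction-site residues shows that amino acids with strong hydrophobicity, low side-chain net charge, low polarity, large solvent-accessible surface area, large side-chain volume, and high polarizability tend to form interaction-site residues. Principal component analysis shows that polarizability, side-chain volume, and solvent-accessible surface area have the greatest influence on interaction-site residues.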

14.
The intradiskal surface of the transmembrane protein, rhodopsin, consists of the amino terminal domain and three loops connecting six of the seven transmembrane helices. This surface corresponds to the extracellular surface of other G-protein receptors. Peptides that represent each of the extramembranous domains on this surface (three loops and the amino terminus) were synthesized. These peptides also included residues which, based on a hydrophobic plot, could be expected to be part of the transmembrane helix. The structure of each of these peptides in solution was then determined using two-dimensional 1H nuclear magnetic resonance. All peptide domains showed ordered structures in solution. The structures of each of the peptides from intradiskal loops of rhodopsin exhibited a turn in the central region of the peptide. The ends of the peptides show an unwinding of the transmembrane helices to form this turn. The amino terminal domain peptide exhibited alpha-helical regions with breaks and bends at proline residues. This region forms a compact domain. Together, the structures for the loop and amino terminus domains indicate that the intradiskal surface of rhodopsin is ordered. These data further suggest a structural motif for short loops in transmembrane proteins. The ordered structures of these loops, in the absence of the transmembrane helices, indicate that the primary sequences of these loops are sufficient to code for the turn.

15.
Both the ordered and disordered solvent networks of vitamin B12 coenzyme crystal hydrate have been generated by Monte Carlo simulation techniques. Several different potential functions have been used to model both water-water and water-solute (i.e., water-coenzyme) interactions. The results have been analysed in terms of the structural properties of the water networks, such as mean water oxygen and hydrogen positions, coordination of each water molecule, and maxima of probability density maps in all four asymmetric units of this crystal. The following results were found: (I) Within each asymmetric unit only one hydrogen bonding network was predicted although there were several hydrogen atom positions for any one solvent molecule (defined as maxima in probability density). (II) Reasonable agreement was obtained between predicted and experimental positions in the ordered solvent region, independent of the potential function used. (III) The positions of the calculated probability density maxima for the disordered channel region were different in different asymmetric units; this led to different simulated hydrogen bond networks which were not always consistent with the experimentally determined alternative (lower occupancy) sites. The results suggest that it is advisable to simulate more than one asymmetric unit if one wishes to look at disorder in the solvent regions. Probability density maps were qualitatively very useful for picturing these disordered regions. However, there were no significant differences between quantitative results predicted using either average atomic positions or maxima of the probability density distributions. Problems in quantifying agreement between experimental and predicted disordered solvent networks are discussed. The potential which included hydrogen atoms explicitly (EMPWI) seemed to give the best overall agreement, mainly because it was successful in predicting the unusually short hydrogen bonds which are found in this crystal.

16.
Local structural disorder imparts plasticity on linear motifs   Total citations: 5, self-citations: 0, citations by others: 5
MOTIVATION: The dynamic nature of protein interaction networks requires fast and transient molecular switches. The underlying recognition motifs (linear motifs, LMs) are usually short and evolutionarily variable segments, which in several cases, such as phosphorylation sites or SH3-binding regions, fall into locally disordered regions. We probed the generality of this phenomenon by predicting the intrinsic disorder of all LM-containing proteins listed in the Eukaryotic Linear Motif (ELM) database. RESULTS: We demonstrated that LMs on average are embedded in locally unstructured regions, while their amino acid composition and charge/hydropathy properties exhibit a mixture characteristic of folded and disordered proteins. Overall, LMs are constructed by grafting a few specificity-determining residues favoring structural order on a highly flexible carrier region. These results establish a connection between LMs and molecular recognition elements of intrinsically unstructured proteins (IUPs), which realize a non-conventional mode of partner binding mostly in regulatory functions.

17.
Intrinsically disordered regions serve as molecular recognition elements, which play an important role in the control of many cellular processes and signaling pathways. It is useful to be able to predict positions of disordered residues and disordered regions in protein chains using protein sequence alone. A new method (IsUnstruct) based on the Ising model for prediction of disordered residues from protein sequence alone has been developed. According to this model, each residue can be in one of two states: ordered or disordered. The model is an approximation of the Ising model in which the interaction term between neighbors has been replaced by a penalty for changing between states (the energy of border). IsUnstruct has been compared with other available methods and found to perform well. The method correctly finds 77% of disordered residues as well as 87% of ordered residues in the CASP8 database, and 72% of disordered residues as well as 85% of ordered residues in the DisProt database.
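The idea of a two-state chain with a penalty for each ordered/disordered border can be illustrated with a small dynamic-programming sketch in Python. This is not the IsUnstruct algorithm: it only finds the single lowest-energy state assignment, and the per-residue energies, the penalty value, and the function name are all hypothetical.

```python
def min_energy_states(disorder_energy, border_penalty):
    """Lowest-energy assignment of 0 (ordered) / 1 (disordered) per residue.

    disorder_energy[i]: hypothetical energy of residue i in the disordered
                        state, relative to 0.0 for the ordered state.
    border_penalty:     energy added at every switch between states.
    """
    n = len(disorder_energy)
    if n == 0:
        return []
    # cost[s] = best total energy of a prefix that ends in state s.
    cost = [0.0, disorder_energy[0]]
    back = []  # back[i - 1][s] = best previous state when residue i is in s
    for i in range(1, n):
        new_cost = [0.0, 0.0]
        pointers = [0, 0]
        for s in (0, 1):
            local = disorder_energy[i] if s == 1 else 0.0
            stay = cost[s]
            switch = cost[1 - s] + border_penalty
            if stay <= switch:
                new_cost[s], pointers[s] = stay + local, s
            else:
                new_cost[s], pointers[s] = switch + local, 1 - s
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheaper final state.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    energies = [-1, -1, 0.5, -1, -1, 2, 2, 2]
    print(min_energy_states(energies, border_penalty=1.0))
```

In the toy input, the single residue with an unfavorable disorder energy (0.5) stays disordered because leaving and re-entering the disordered state would cost two border penalties; this smoothing effect is what a border penalty contributes.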

18.
Intrinsically disordered regions serve as molecular recognition elements, which play an important role in the control of many cellular processes and signaling pathways. It is useful to be able to predict positions of disordered regions in protein chains. The statistical analysis of disordered residues was done considering 34,464 unique protein chains taken from the PDB database. In this database, 4.95% of residues are disordered (i.e. invisible in X-ray structures). The statistics were obtained separately for the N- and C-termini as well as for the central part of the protein chain. It has been shown that frequencies of occurrence of disordered residues of the 20 types at the termini of protein chains differ from those in the middle part of the protein chain. Our systematic analysis of disordered regions in PDB revealed 109 disordered patterns of different lengths. Each of them has disordered occurrences in at least five protein chains with identity less than 20%. The vast majority of all occurrences of each disordered pattern are disordered. This allows one to use the library of disordered patterns for predicting whether a residue of a given protein is ordered or disordered. We analyzed the occurrence of the selected patterns in three eukaryotic and three bacterial proteomes.

19.
Prediction of short linear protein binding regions   Total citations: 1, self-citations: 0, citations by others: 1
Short linear motifs in proteins (typically 3-12 residues in length) play key roles in protein-protein interactions by frequently binding specifically to peptide binding domains within interacting proteins. Their tendency to be found in disordered segments of proteins has meant that they have often been overlooked. Here we present SLiMPred (short linear motif predictor), the first general de novo method designed to computationally predict such regions in protein primary sequences independent of experimentally defined homologs and interactors. The method applies machine learning techniques to predict new motifs based on annotated instances from the Eukaryotic Linear Motif database, as well as structural, biophysical, and biochemical features derived from the protein primary sequence. We have integrated these data sources and benchmarked the predictive accuracy of the method, and found that it performs equivalently to a predictor of protein binding regions in disordered regions, in addition to having predictive power for other classes of motif sites such as polyproline II helix motifs and short linear motifs lying in ordered regions. It will be useful in predicting peptides involved in potential protein associations and will aid in the functional characterization of proteins, especially of proteins lacking experimental information on structures and interactions. We conclude that, despite the diversity of motif sequences and structures, SLiMPred is a valuable tool for prioritizing potential interaction motifs in proteins.

20.
Serine/arginine-rich (SR) splicing factors play an important role in constitutive and alternative splicing as well as during several steps of RNA metabolism. Despite the wealth of functional information about SR proteins accumulated to date, structural knowledge about the members of this family is very limited. To gain a better insight into structure-function relationships of SR proteins, we performed extensive sequence analysis of SR protein family members and combined it with ordered/disordered structure predictions. We found that SR proteins have properties characteristic of intrinsically disordered (ID) proteins. The amino acid composition and sequence complexity of SR proteins were very similar to those of the disordered protein regions. More detailed analysis showed that the SR proteins, and their RS domains in particular, are enriched in the disorder-promoting residues and are depleted in the order-promoting residues as compared to the entire human proteome. Moreover, disorder predictions indicated that RS domains of SR proteins were completely unstructured. Two different classification methods, the charge-hydropathy measure and the cumulative distribution function (CDF) of the disorder scores, were in agreement with each other, and they both strongly predicted members of the SR protein family to be disordered. This study emphasizes the importance of the disordered structure for several functions of SR proteins, such as for spliceosome assembly and for interaction with multiple partners. In addition, it demonstrates the usefulness of order/disorder predictions for inferring protein structure from sequence.
