1.

Markov models of amino acid substitution to study proteins with intrinsically disordered regions

Szalkowski AM Anisimova M 《PloS one》2011,6(5):e20488

Background

Intrinsically disordered proteins (IDPs) or proteins with disordered regions (IDRs) do not have a well-defined tertiary structure, but perform a multitude of functions, often relying on their native disorder to achieve the binding flexibility through changing to alternative conformations. Intrinsic disorder is frequently found in all three kingdoms of life, and may occur in short stretches or span whole proteins. To date most studies contrasting the differences between ordered and disordered proteins focused on simple summary statistics. Here, we propose an evolutionary approach to study IDPs, and contrast patterns specific to ordered protein regions and the corresponding IDRs.

Results

Two empirical Markov models of amino acid substitutions were estimated, based on a large set of multiple sequence alignments with experimentally verified annotations of disordered regions from the DisProt database of IDPs. We applied new methods to detect differences in Markovian evolution and evolutionary rates between IDRs and the corresponding ordered protein regions. Further, we investigated the distribution of IDPs among functional categories, biochemical pathways and their preponderance to contain tandem repeats.

Conclusions

We find significant differences in evolution between ordered and disordered regions of proteins. Most importantly, we find that disorder-promoting amino acids are more conserved in IDRs, indicating that in some cases not only amino acid composition but also the specific sequence is important for function. This conjecture is also reinforced by the observation that for of our data set IDRs evolve more slowly than the ordered parts of the proteins, while we still support the common view that IDRs in general evolve more quickly. The improvement in model fit indicates a possible improvement for various types of analyses; for example, de novo disorder prediction using a phylogenetic Hidden Markov Model based on our matrices showed a performance similar to other disorder predictors.

2.

Using Bayesian multinomial classifier to predict whether a given protein sequence is intrinsically disordered

Bulashevska A Eils R 《Journal of theoretical biology》2008,254(4):799-803

Intrinsically disordered proteins (IDPs) lack a well-defined three-dimensional structure under physiological conditions. Intrinsic disorder is a common phenomenon, particularly in multicellular eukaryotes, and is responsible for important protein functions including regulation and signaling. Many disease-related proteins are likely to be intrinsically disordered or to have disordered regions. In this paper, a new predictor model based on the Bayesian classification methodology is introduced to predict whether a given protein or protein region is intrinsically disordered or ordered using only its primary sequence. The method allows the incorporation of length-dependent amino acid compositional differences of disordered regions by including separate statistical representations for short, middle and long disordered regions. The predictor was trained on a constructed data set of protein regions with known structural properties. In a jackknife test, the predictor achieved a sensitivity of 89.2% for disordered and 81.4% for ordered regions. Our method outperformed several reported predictors when evaluated on the previously published data set of Prilusky et al. [2005. FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics 21 (16), 3435-3438]. A further strength of our approach is the ease of implementation.
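
As a rough illustration of the kind of classifier this abstract describes, the sketch below trains a multinomial naive Bayes model on residue counts and labels a sequence as disordered or ordered. It is not the authors' implementation: the function names, the Laplace smoothing parameter `alpha`, and the toy training sequences are all assumptions made for the example. Separate classes for short, middle and long disordered regions could be added as extra labels in the same dictionary.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_multinomial(sequences_by_class, alpha=1.0):
    """Estimate a log prior and smoothed log residue probabilities per class."""
    total_seqs = sum(len(seqs) for seqs in sequences_by_class.values())
    model = {}
    for label, seqs in sequences_by_class.items():
        counts = Counter()
        for seq in seqs:
            counts.update(c for c in seq if c in AMINO_ACIDS)
        # Laplace smoothing keeps unseen residues from getting zero probability.
        total = sum(counts.values()) + alpha * len(AMINO_ACIDS)
        log_probs = {aa: math.log((counts[aa] + alpha) / total) for aa in AMINO_ACIDS}
        log_prior = math.log(len(seqs) / total_seqs)
        model[label] = (log_prior, log_probs)
    return model


def classify(model, sequence):
    """Return the class label with the highest posterior log score."""
    scores = {}
    for label, (log_prior, log_probs) in model.items():
        scores[label] = log_prior + sum(
            log_probs[c] for c in sequence if c in AMINO_ACIDS
        )
    return max(scores, key=scores.get)


# Hypothetical toy data, invented for illustration only.
TOY_TRAINING = {
    "disordered": ["PPSEEKKSPEQ", "SSGEEPKKQQ"],
    "ordered": ["VILFWCAVLI", "LIVFYWCMAV"],
}
MODEL = train_multinomial(TOY_TRAINING)
print(classify(MODEL, "PEKSQPEK"))
print(classify(MODEL, "VLIFWV"))
```

A real predictor would be trained on curated regions with known structural properties and evaluated by cross-validation, as the abstract reports.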

3.

Flavors of protein disorder (total citations: 1; self-citations: 0; citations by others: 1)

Vucetic S Brown CJ Dunker AK Obradovic Z 《Proteins》2003,52(4):573-584

Intrinsically disordered proteins are characterized by long regions lacking 3-D structure in their native states, yet they have been so far associated with 28 distinguishable functions. Previous studies showed that protein predictors trained on disorder from one type of protein often achieve poor accuracy on disorder of proteins of a different type, thus indicating significant differences in sequence properties among disordered proteins. Important biological problems are identifying different types, or flavors, of disorder and examining their relationships with protein function. Innovative use of computational methods is needed in addressing these problems due to relative scarcity of experimental data and background knowledge related to protein disorder. We developed an algorithm that partitions protein disorder into flavors based on competition among increasing numbers of predictors, with prediction accuracy determining both the number of distinct predictors and the partitioning of the individual proteins. Using 145 variously characterized proteins with long (>30 amino acids) disordered regions, 3 flavors, called V, C, and S, were identified by this approach, with the V subset containing 52 segments and 7743 residues, C containing 39 segments and 3402 residues, and S containing 54 segments and 5752 residues. The V, C, and S flavors were distinguishable by amino acid compositions, sequence locations, and biological function. For the sequences in SwissProt and 28 genomes, their protein functions exhibit correlations with the commonness and usage of different disorder flavors, suggesting different flavor-function sets across these protein groups. Overall, the results herein support the flavor-function approach as a useful complement to structural genomics as a means for automatically assigning possible functions to sequences.

4.

Location of disorder in coiled coil proteins is influenced by its biological role and subcellular localization: a GO-based study on human proteome

Anurag M Singh GP Dash D 《Molecular bioSystems》2012,8(1):346-352

Intrinsic disorder in proteins has been explored to study lack of structure-function aspects of many proteins. The current study focuses on coiled coils, which are often linked to intrinsic disorder. We present a sequence-level analysis of human coiled coils to find out if this is universally true for all coiled coils. When annotated coiled-coil regions were collected from UniProt and investigated with the disorder prediction tools IUPred and DISpro, three patterns were commonly observed: disordered coiled coils (DisCCs), ordered coiled coils (OCCs), and coiled coils with a disordered region outside the coiled-coil region (DOCCs). Differential enrichment in the gene ontology was seen in these three categories. We found that OCCs are enriched in structural components of the extracellular space including the fibrinogen complex and laminin complex. On the contrary, DisCCs were found to be exclusively over-represented in proteins involved in actin filament, lamellipodium, cell junction, macromolecule complexes, ciliary rootlet and nucleolus. DOCCs are found to be associated with many regulatory and adaptor functions including positive regulation of calcium ion transport via store-operated calcium channel activity, cytoskeletal adaptor activity etc. Other than the GO-based analysis, sequence-level analysis showed that disordered coiled-coil regions bear a high proportion of low-complexity regions as compared to ordered coiled coils. The former also have a higher probability of forming a dimer as compared to their ordered counterparts. Our study shows that the in silico approach of mapping disorder in or around coiled coils can be applied to other biological systems or organisms to understand and rationalize the mode of action of these dynamic motifs.

5.

The twilight zone between protein order and disorder

Szilágyi A Györffy D Závodszky P 《Biophysical journal》2008,95(4):1612-1626

The amino acid composition of intrinsically disordered proteins and protein segments characteristically differs from that of ordered proteins. This observation forms the basis of several disorder prediction methods. These, however, usually perform worse for smaller proteins (or segments) than for larger ones. We show that the regions of amino acid composition space corresponding to ordered and disordered proteins overlap with each other, and the extent of the overlap (the “twilight zone”) is larger for short than for long chains. To explain this finding, we used two-dimensional lattice model proteins containing hydrophobic, polar, and charged monomers and revealed the relation among chain length, amino acid composition, and disorder. Because the number of chain configurations exponentially grows with chain length, a larger fraction of longer chains can reach a low-energy, ordered state than do shorter chains. The amount of information carried by the amino acid composition about whether a protein or segment is (dis)ordered grows with increasing chain length. Smaller proteins rely more on specific interactions for stability, which limits the possible accuracy of disorder prediction methods. For proteins in the “twilight zone”, size can determine order, as illustrated by the example of two-state homodimers.

6.

Sequence patterns associated with disordered regions in proteins

Lise S Jones DT 《Proteins》2005,58(1):144-150

The relationship between amino acid sequence and intrinsic disorder in proteins is investigated. Two databases, one of disordered proteins and the other of globular proteins, are analyzed and compared in order to extract simple sequence patterns of a few amino acids or amino acid properties that characterize disordered segments. It is found that a number of reliable, nonrandom associations exist. In particular, two types of patterns appear to be recurrent: a proline-rich pattern and a (positively or negatively) charged pattern. These results indicate that local sequence information can determine disordered regions in proteins. The derived patterns provide some insights into the physical reasons for disordered structures. They should also be helpful in improving currently available prediction methods.

7.

Length-dependent prediction of protein intrinsic disorder (total citations: 2; self-citations: 0; citations by others: 2)

Kang Peng Predrag Radivojac Slobodan Vucetic A Keith Dunker Zoran Obradovic 《BMC bioinformatics》2006,7(1):208-17

Background

Due to the functional importance of intrinsically disordered proteins or protein regions, prediction of intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed in the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is the length dependence of the amino acid compositions and sequence properties of disordered regions.

8.

Evolutionary rate heterogeneity in proteins with long disordered regions

Brown CJ Takayama S Campen AM Vise P Marshall TW Oldfield CJ Williams CJ Dunker AK 《Journal of molecular evolution》2002,55(1):104-110

The dominant view in protein science is that a three-dimensional (3-D) structure is a prerequisite for protein function. In contrast to this dominant view, there are many counterexample proteins that fail to fold into a 3-D structure, or that have local regions that fail to fold, and yet carry out function. Protein without fixed 3-D structure is called intrinsically disordered. Motivated by anecdotal accounts of higher rates of sequence evolution in disordered protein than in ordered protein we are exploring the molecular evolution of disordered proteins. To test whether disordered protein evolves more rapidly than ordered protein, pairwise genetic distances were compared between the ordered and the disordered regions of 26 protein families having at least one member with a structurally characterized region of disorder of 30 or more consecutive residues. For five families, there were no significant differences in pairwise genetic distances between ordered and disordered sequences. The disordered region evolved significantly more rapidly than the ordered region for 19 of the 26 families. The functions of these disordered regions are diverse, including binding sites for protein, DNA, or RNA and also including flexible linkers. The functions of some of these regions are unknown. The disordered regions evolved significantly more slowly than the ordered regions for the two remaining families. The functions of these more slowly evolving disordered regions include sites for DNA binding. More work is needed to understand the underlying causes of the variability in the evolutionary rates of intrinsically ordered and disordered protein.

9.

Analyses of the general rule on residue pair frequencies in local amino acid sequences of soluble, ordered proteins

Matsuyuki Shirota Kengo Kinoshita 《Protein science : a publication of the Protein Society》2013,22(6):725-733

The amino acid sequences of soluble, ordered proteins with stable structures have evolved due to biological and physical requirements, thus distinguishing them from random sequences. Previous analyses have focused on extracting the features that frequently appear in protein substructures, such as α‐helix and β‐sheet, but the universal features of protein sequences have not been addressed. To clarify the differences between native protein sequences and random sequences, we analyzed 7368 soluble, ordered protein sequences, by inspecting the observed and expected occurrences of 400 amino acid pairs in local proximity, up to 10 residues along the sequence in comparison with their expected occurrence in random sequence. We found the trend that the hydrophobic residue pairs and the polar residue pairs are significantly decreased, whereas the pairs between a hydrophobic residue and a polar residue are increased. This trend was universally observed regardless of the secondary structure content but was not observed in protein sequences that include intrinsically disordered regions, indicating that it can be a general rule of protein foldability. The possible benefits of this rule are discussed from the viewpoints of protein aggregation and disorder, which are both caused by low‐complexity regions of hydrophobic or polar residues.
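
The observed-versus-expected comparison of residue pairs described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the expected count of a pair comes from treating residues as independent draws from the overall composition, and the function name `pair_ratios` and the toy input are hypothetical.

```python
from collections import Counter


def pair_ratios(sequences, distance):
    """Observed/expected ratio for each ordered residue pair (a, b),
    where b occurs `distance` positions after a in the same sequence."""
    single = Counter()
    pairs = Counter()
    for seq in sequences:
        single.update(seq)
        for i in range(len(seq) - distance):
            pairs[(seq[i], seq[i + distance])] += 1

    n_residues = sum(single.values())
    n_pairs = sum(pairs.values())
    ratios = {}
    for (a, b), observed in pairs.items():
        # Assumption: under randomness, a pair's frequency is the product
        # of the two single-residue frequencies.
        expected = n_pairs * (single[a] / n_residues) * (single[b] / n_residues)
        ratios[(a, b)] = observed / expected
    return ratios


# Toy input: leucine (hydrophobic) alternating with lysine (polar).
example = pair_ratios(["LKLK"], distance=1)
print(example)
```

A ratio above 1 means the pair occurs more often than composition alone would predict, and a ratio below 1 means it occurs less often. Repeating the call for distances 1 through 10 would cover the local range mentioned in the abstract.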

10.

Comparison of sequence masking algorithms and the detection of biased protein sequence regions

Kreil DP Ouzounis CA 《Bioinformatics (Oxford, England)》2003,19(13):1672-1681

MOTIVATION: Separation of protein sequence regions according to their local information complexity and subsequent masking of low complexity regions has greatly enhanced the reliability of function prediction by sequence similarity. Comparisons with alternative methods that focus on compositional sequence bias rather than information complexity measures have shown that removal of compositional bias yields at least as sensitive and much more specific results. Besides the application of sequence masking algorithms to sequence similarity searches, the study of the masked regions themselves is of great interest. Traditionally, however, these have been neglected despite evidence of their functional relevance. RESULTS: Here we demonstrate that compositional bias seems to be a more effective measure for the detection of biologically meaningful signals. Typical results on proteins are compared to results for sequences that have been randomized in various ways, conserving composition and local correlations for individual proteins or the entire set. It is remarkable that low-complexity regions have the same form of distribution in proteins as in randomized sequences, and that the signal from randomized sequences with conserved local correlations and amino acid composition almost matches the signal from proteins. This is not the case for sequence bias, which hence seems to be a genuinely biological phenomenon in contrast to patches of low complexity.
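
To make the notion of local information complexity concrete, the sketch below computes Shannon entropy over a sliding window, so that windows with low entropy mark candidate low-complexity regions. It is a generic illustration only and does not reproduce any of the masking algorithms compared in the paper; the function name `window_entropy` and the default window length are assumptions.

```python
import math
from collections import Counter


def window_entropy(sequence, window=12):
    """Shannon entropy (in bits) of the residue composition in each
    sliding window of length `window` along the sequence."""
    entropies = []
    for start in range(len(sequence) - window + 1):
        counts = Counter(sequence[start:start + window])
        entropy = -sum(
            (n / window) * math.log2(n / window) for n in counts.values()
        )
        entropies.append(entropy)
    return entropies


# A homopolymer window carries no information; a two-letter
# alternating window carries exactly one bit.
print(window_entropy("AAAA", window=4))
print(window_entropy("ACAC", window=4))
```

A compositional-bias measure, by contrast, would compare each window's residue counts against a background composition rather than looking at the window's entropy alone.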