Proteome-wide Structural Analysis of PTM Hotspots Reveals Regulatory Elements Predicted to Impact Biological Function and Disease
Authors: Matthew P. Torres, Henry Dewhurst, Niveda Sundararaman
Institution: School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332
Abstract: Post-translational modifications (PTMs) regulate protein behavior by modulating protein-protein interactions, enzymatic activity, and protein stability, processes that are essential for translating genotype into phenotype in eukaryotes. Currently, fewer than 4% of all eukaryotic PTMs are reported to have a biological function, and this fraction continues to shrink as the rate of PTM detection increases. We previously developed SAPH-ire (Structural Analysis of PTM Hotspots), a method for prioritizing the function potential of PTMs that has been used effectively to reveal novel PTM regulatory elements in discrete protein families (Dewhurst et al., 2015). Here, we apply SAPH-ire to the set of eukaryotic protein families that have both experimental PTM data and 3D structure data. This set captures 1,325 protein families with 50,839 unique PTM sites organized into 31,747 modified alignment positions (MAPs), of which 2,010 (∼6%) have a known biological function. We show that an artificial neural network model (SAPH-ire NN), trained to identify MAP hotspots with biological function, produces predictions that far surpass those based on single hotspot features, including nearest-neighbor PTM clustering methods. We find the greatest improvement in prediction for positions with five or fewer PTM counts, which represent 98% of all MAPs in the eukaryotic proteome and 90% of all MAPs found to have biological function. Analysis of the top 1,092 MAP hotspots revealed 267 of truly unknown function (containing 5,443 distinct PTMs). Of these, 165 hotspots could be mapped to human KEGG pathways for normal and/or disease physiology. Many high-ranking hotspots were also found to be disease-associated pathogenic sites of amino acid substitution, despite the lack of an observable PTM on the human protein family member.
Taken together, these experiments demonstrate that neural network models can predict the functional relevance of a PTM very effectively. They reveal a large but testable body of potential regulatory elements that affect hundreds of different biological processes important in eukaryotic biology and human health.

Since the discovery of phosphorylation in 1954 (1), post-translational modifications (PTMs) have emerged as a broad class of protein features that expand the functional proteome in eukaryotes. Improvements in the detection of PTMs by mass spectrometry have produced an exponential increase in our knowledge of the number and types of PTMs that make up the landscape of the modified eukaryotic proteome. As a result, PTMs are now discovered far faster than they can be experimentally tested for biological function. Biological function is specific to each PTM and is likely not shared equally by all of the PTMs observed so far (2–4). Effective prioritization methods are therefore essential for quantifying how likely a site is to be regulatory and/or to affect biological function, which we refer to as the function potential of a PTM.

Several distinct features have been identified as predictors of the biological impact of a PTM. Determining these features relies on placing each PTM in the context of a multiple sequence alignment for a discrete protein or domain family. We refer to each aligned position that carries at least one PTM as a Modified Alignment Position (MAP). For example, MAPs that are evolutionarily well conserved are more likely to exhibit biological function (3, 4). Similarly, functional PTMs are more commonly found within MAPs that have a higher PTM observation frequency, are dynamic across biological conditions, lie at protein interaction interfaces, and are more solvent-accessible within a folded protein structure (5–7).
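The following minimal Python sketch makes the MAP concept concrete. It groups PTM sites by (family, alignment column) and computes one single feature, observation frequency. Several parts are assumptions made only for illustration: the `PTMSite` record layout, the helper names (`build_maps`, `observation_frequency`, `fraction_at_or_below`), the toy accessions, and the rule that counts distinct (protein, residue, PTM type) records as the observation frequency. SAPH-ire's actual data model and counting rule may differ. The last helper shows the kind of tally behind the abstract's statement that positions with five or fewer PTM counts make up most MAPs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PTMSite:
    """One experimentally observed PTM site (hypothetical record layout)."""
    family: str     # protein or domain family identifier
    column: int     # column in the family's multiple sequence alignment
    protein: str    # accession of the family member carrying the PTM
    residue: int    # residue number within that protein
    ptm_type: str   # e.g. "phosphorylation"


def build_maps(sites):
    """Group PTM sites into modified alignment positions (MAPs).

    Sites from different family members that align to the same column of
    the same family alignment are placed in the same MAP.
    """
    maps = defaultdict(list)
    for site in sites:
        maps[(site.family, site.column)].append(site)
    return dict(maps)


def observation_frequency(map_sites):
    """Count distinct PTM observations at one MAP (assumed counting rule)."""
    return len({(s.protein, s.residue, s.ptm_type) for s in map_sites})


def fraction_at_or_below(freqs, max_count=5):
    """Fraction of MAPs whose observation frequency is <= max_count."""
    return sum(1 for f in freqs.values() if f <= max_count) / len(freqs)


# Toy input (hypothetical): two family members share a phosphosite at column 10.
sites = [
    PTMSite("FAM1", 10, "PROT_A", 12, "phosphorylation"),
    PTMSite("FAM1", 10, "PROT_B", 15, "phosphorylation"),
    PTMSite("FAM1", 10, "PROT_B", 15, "phosphorylation"),  # duplicate report
    PTMSite("FAM1", 42, "PROT_A", 50, "ubiquitination"),
]
maps = build_maps(sites)
freqs = {key: observation_frequency(group) for key, group in maps.items()}
print(freqs)  # {('FAM1', 10): 2, ('FAM1', 42): 1}
print(fraction_at_or_below(freqs))  # 1.0
```

In this sketch the duplicate report at column 10 collapses into one observation, so that MAP scores 2. Used alone, a frequency like this is the kind of single-feature score that the integrative approach is meant to improve on.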
Although efforts to identify the features associated with functional PTMs are longstanding, few, if any, have established an integrative approach that quantitatively prioritizes the function potential of PTMs using more than single features. Our lab was the first to demonstrate that integrating multiple features can improve functional prioritization. To do so, we built Structural Analysis of PTM Hotspots (SAPH-ire), an algorithm that integrates multiple predictors of PTM function into a single, quantitative function potential (FP) score that rank-orders each hotspot within or between protein families (6) (Fig. 1). We previously used SAPH-ire to predict novel PTM regulatory elements in G protein families, including heterotrimeric G proteins. In these families we discovered and experimentally confirmed a novel PTM regulatory element that is critical for cell signaling (6, 8). We propose that a similar analysis of PTMs across the entire eukaryotic proteome is likely to uncover several novel regulatory elements that have yet to be recognized.

Fig. 1. Schematic diagram of SAPH-ire. A, A theoretical segment of the multiple sequence alignment for a protein family (IPR000276; G protein-coupled receptor, rhodopsin-like), used here to illustrate the concept of SAPH-ire. Circled amino acid residues represent PTM sites experimentally observed on the respective protein family members. Circle and arrow color represent the PTM observation frequency at each aligned position, called a MAP (modified alignment position): green indicates 1 observation, blue 2, orange 3, and red 5 or more. B, Cartoon rendering of bovine rhodopsin (P02699, RHO; PDB 2PED, chain A) showing the side chains of projected PTM hotspots. Side chains are colored according to the number of observations within the family at each position aligned with the structural sequence. PDB coordinate data from the structurally projected PTM hotspots are used to calculate the solvent-accessible surface area (SASA) and to determine protein interface residence (PPI). C, Hotspot features derived from the sequence and structural data are extracted for each protein family. Each hotspot corresponds to a precise family alignment position containing at least one PTM observation. D, Comparison of the comprehensive and SAPH-ire datasets, which represent all known experimental PTM data and the PTM data included in this study, respectively. E, Values calculated and derived from the extracted hotspot features are analyzed by logistic regression or neural network models to produce a probability score for each hotspot.

Here we apply SAPH-ire to protein families for which both PTM data and protein structures are currently available. This yields function potential predictions for 50,839 experimental PTM sites distributed across 31,747 MAPs. We used a neural network model (SAPH-ire NN), trained to predict the identity of embedded known-function MAPs, to derive a probability score. This score rank-orders all MAPs by their likelihood of function, including MAPs of unknown function. We show that the SAPH-ire NN model significantly outperforms all other single- or multi-feature predictive models. We also show that its predictive power increases in proportion to how frequently a known-function hotspot has been studied (and therefore published). Using a strictly conservative probability threshold, we characterized the top-ranked 1,092 MAPs, which we term “function potential hotspots.” Among them we found 267 with truly unknown function. A striking fraction of these are also mutated in human disease, irrespective of whether the human protein itself contains an observed PTM.
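The scoring step in panel E can be sketched with a plain logistic regression, which is one of the two model types the caption names. The SAPH-ire NN itself is a neural network and is not reproduced here. As in the text, MAPs without known function serve as the negative class during training, and all MAPs are then rank-ordered by predicted probability. Much of the sketch is hypothetical: the four feature columns, the toy values, the helper names, and the 0.5 cutoff. The published model's exact features, training procedure, and "strictly conservative" threshold are not given in this section.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(rows):
    """Rescale each feature column to zero mean and unit variance."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def predict(X, w, b):
    """Probability that each MAP belongs to the known-function class."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def log_loss(probs, y, eps=1e-12):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    total = 0.0
    for p, yi in zip(probs, y):
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        # Residuals (predicted minus label) under the current parameters.
        errors = [p - yi for p, yi in zip(predict(X, w, b), y)]
        w = [w[j] - lr * sum(e * x[j] for e, x in zip(errors, X)) / n
             for j in range(k)]
        b -= lr * sum(errors) / n
    return w, b


# Toy MAP feature table (hypothetical values). Columns:
# observation count, conservation (0-1), relative SASA (0-1), at interface (0/1)
names = ["MAP1", "MAP2", "MAP3", "MAP4", "MAP5", "MAP6", "MAP7", "MAP8"]
raw = [
    [9, 0.95, 0.70, 1],
    [6, 0.90, 0.55, 1],
    [4, 0.80, 0.60, 0],
    [5, 0.85, 0.65, 1],  # unknown function, but resembles the known set
    [1, 0.30, 0.20, 0],
    [2, 0.40, 0.10, 0],
    [1, 0.50, 0.35, 0],
    [3, 0.20, 0.25, 1],
]
known = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = MAP with documented function

X = standardize(raw)
w, b = fit_logistic(X, known)
probs = predict(X, w, b)

# Rank every MAP by probability; unknown-function MAPs at or above the
# (hypothetical) cutoff become candidate hotspots.
threshold = 0.5
ranked = sorted(zip(names, probs, known), key=lambda t: t[1], reverse=True)
candidates = [name for name, p, is_known in ranked
              if is_known == 0 and p >= threshold]
for name, p, is_known in ranked:
    print(f"{name}  p={p:.3f}  known={is_known}")
print("candidates:", candidates)
```

Because unknown-function MAPs are treated as negatives during training, any unknown-function MAP that still scores above the cutoff resembles the known-function set. That makes it the kind of candidate the ranking is meant to surface. Replacing `fit_logistic` with a small neural network would change the model but not the ranking and thresholding logic around it.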
This article is indexed in ScienceDirect and other databases.