Bayesian Markov Random Field Analysis for Protein Function Prediction Based on Network Data
Authors: Yiannis A. I. Kourmpetis, Aalt D. J. van Dijk, Marco C. A. M. Bink, Roeland C. H. J. van Ham, Cajo J. F. ter Braak
Affiliation: 1. Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands; 2. Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands; 3. Laboratory of Bioinformatics, Wageningen University, Wageningen, The Netherlands; Miami University, United States of America
Abstract: Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions, but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that a protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Field method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but it can also be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we evaluate the predictions using the available literature.
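The abstract builds on the standard Markov Random Field idea that whether an unannotated protein has a given function depends on how many of its network neighbours have it. A minimal sketch of that idea follows, assuming a simple autologistic form, logit P(x_i = 1 | neighbours) = alpha + beta*n1_i + gamma*n0_i, where n1_i and n0_i count the neighbours with and without the function. This parameterisation, the function names gibbs_mrf_predict and logistic, the parameter values and the toy network are all hypothetical illustrations, not the paper's exact model. Unlike the Bayesian method described above, the sketch holds alpha, beta and gamma fixed and only Gibbs-samples the unknown labels; it does not implement the adaptive MCMC parameter estimation.

```python
import math
import random


def logistic(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def gibbs_mrf_predict(adjacency, labels, alpha, beta, gamma,
                      n_iter=2000, burn_in=500, seed=0):
    """Estimate P(x_i = 1) for each unannotated protein i (labels[i] is None).

    adjacency: dict mapping each protein to a list of its network neighbours.
    labels: dict mapping each protein to 1, 0, or None (unannotated).
    alpha, beta, gamma: hypothetical FIXED parameters of the assumed
    autologistic model; unlike the paper's method, they are not estimated here.
    """
    if not 0 <= burn_in < n_iter:
        raise ValueError("need 0 <= burn_in < n_iter")
    rng = random.Random(seed)
    state = dict(labels)
    unknown = [p for p, x in labels.items() if x is None]
    for p in unknown:                      # random initial labels for unknowns
        state[p] = rng.randint(0, 1)
    hits = {p: 0 for p in unknown}
    for it in range(n_iter):
        for p in unknown:                  # one Gibbs sweep over unknown labels
            n1 = sum(state[q] for q in adjacency[p])  # neighbours with the function
            n0 = len(adjacency[p]) - n1               # neighbours without it
            prob = logistic(alpha + beta * n1 + gamma * n0)
            state[p] = 1 if rng.random() < prob else 0
        if it >= burn_in:                  # accumulate post-burn-in samples
            for p in unknown:
                hits[p] += state[p]
    kept = n_iter - burn_in
    return {p: hits[p] / kept for p in unknown}


# Toy network (hypothetical): B and C have the function, D does not.
adjacency = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "E": ["D"],
}
labels = {"A": None, "B": 1, "C": 1, "D": 0, "E": None}
probs = gibbs_mrf_predict(adjacency, labels, alpha=-1.0, beta=2.0, gamma=-1.0)
print({p: round(v, 2) for p, v in probs.items()})
```

In this toy network A and E have only annotated neighbours, so their sampled frequencies should approach logistic(2) ≈ 0.88 and logistic(-2) ≈ 0.12. In a real network, unannotated proteins are often neighbours of each other, so each update depends on the current labels of other unknowns; that coupling is why the labels are sampled iteratively rather than computed in a single pass.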