Bayesian Markov Random Field Analysis for Protein Function Prediction
Based on Network Data |
| |
Authors: | Yiannis A I Kourmpetis; Aalt D J van Dijk; Marco C A M Bink; Roeland C H J van Ham; Cajo J F ter Braak |
| |
Institution: | 1. Biometris, Wageningen University and Research Centre, Wageningen, The
Netherlands; 2. Applied Bioinformatics, Plant Research International, Wageningen, The
Netherlands; 3. Laboratory of Bioinformatics, Wageningen University, Wageningen, The
Netherlands; Miami University, United States of America |
| |
Abstract: | Inference of protein functions is one of the most important aims of modern
biology. To fully exploit the large volumes of genomic data typically produced
in modern-day genomic experiments, automated computational methods for protein
function prediction are urgently needed. Established methods use sequence or
structure similarity to infer functions, but those types of data do not suffice
to determine the biological context in which proteins act. Current
high-throughput biological experiments produce large amounts of data on the
interactions between proteins. Such data can be used to infer interaction
networks and to predict the biological process that the protein is involved in.
Here, we develop a probabilistic approach for protein function prediction using
network data, such as protein-protein interaction measurements. We take a
Bayesian approach to an existing Markov Random Field method by performing
simultaneous estimation of the model parameters and prediction of protein
functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to
more accurate parameter estimates and consequently to improved prediction
performance compared to the standard Markov Random Fields method. We tested our
method using a high-quality S. cerevisiae validation network
with 1622 proteins against 90 Gene Ontology terms of different levels of
abstraction. Compared to three other protein function prediction methods, our
approach shows very good prediction performance. Our method can be directly
applied to protein-protein interaction or coexpression networks, but can also be
extended to use multiple data sources. We apply our method to physical protein
interaction data from S. cerevisiae and provide novel
predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we
evaluate the predictions using the available literature. |
| |