Institution: | 1.Molecular and Computational Biology Program,University of Southern California,Los Angeles,USA;2.Zilkha Neurogenetic Institute, Keck School of Medicine,University of Southern California,Los Angeles,USA;3.Department of Psychiatry and Behavioral Sciences, Keck School of Medicine,University of Southern California,Los Angeles,USA;4.Centre for Computational Systems Biology, School of Mathematical Sciences,Fudan University,Shanghai,China |
Abstract: | BackgroundProtein domains can be viewed as portable units of biological function that defines the functional properties of proteins. Therefore, if a protein is associated with a disease, protein domains might also be associated and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Thus, identification of domains associated with human diseases would greatly improve our understandingof the mechanism of human complex diseases and further improve the prevention, diagnosis and treatment of these diseases.MethodsBased on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, as well as proteins and disease modules. Different methods including Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE) approaches are developed to predict domain-disease associations.ResultsWe demonstrate the effectiveness of all the five approaches via a series of validation experiments, and show the robustness of the MLE, Bayesian and PE approaches to the involved parameters. We also study the effects of disease modularization in inferring novel domain-disease associations. Through validation, the AUC (Area Under the operating characteristic Curve) scores for Bayesian, MLE, DPEA, PE, and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating the usefulness of these approaches for predicting domain-disease relationships. Finally, we choose the Bayesian approach to infer domains associated with two common diseases, Crohn’s disease and type 2 diabetes.ConclusionsThe Bayesian approach has the best performance for the inference of domain-disease relationships. The predicted landscape between domains and diseases provides a more detailed view about the disease mechanisms. |