Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks


Authors:	Alaa Abi-Haidar, Jasleen Kaur, Ana Maguitman, Predrag Radivojac, Andreas Rechtsteiner, Karin Verspoor, Zhiping Wang, Luis M Rocha

Institution:	School of Informatics, Indiana University, Bloomington, IN 47405, USA.

Abstract:	Background: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. Results: Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of the performance measures used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Conclusion: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed.
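
Illustration:	The abstract does not say how the word proximity networks are built or how they expand features, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes that proximity between two words is the Jaccard overlap of the sets of passages in which they occur, and that expansion adds every word whose proximity to a seed word reaches a threshold. The function names, the threshold value, and the toy passages are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_proximity_network(documents):
    """Map each co-occurring word pair to a Jaccard weight (assumed measure)."""
    occurrences = defaultdict(set)  # word -> indices of passages containing it
    for idx, text in enumerate(documents):
        for word in set(text.lower().split()):
            occurrences[word].add(idx)
    network = {}
    # Pairs are keyed in alphabetical order, e.g. ("binds", "rad51").
    for w1, w2 in combinations(sorted(occurrences), 2):
        shared = occurrences[w1] & occurrences[w2]
        if shared:
            union = occurrences[w1] | occurrences[w2]
            network[(w1, w2)] = len(shared) / len(union)
    return network


def expand_features(seed_words, network, threshold=0.6):
    """Return the seeds plus every word linked to a seed at or above threshold."""
    expanded = set(seed_words)
    for (w1, w2), weight in network.items():
        if weight >= threshold:
            if w1 in seed_words:
                expanded.add(w2)
            if w2 in seed_words:
                expanded.add(w1)
    return expanded


# Hypothetical toy passages, not taken from the challenge data.
passages = [
    "rad51 binds brca2",
    "brca2 binds rad51 directly",
    "kinase activity measured",
]
network = build_proximity_network(passages)
expanded = expand_features({"binds"}, network)
```

With these toy passages, the seed word "binds" pulls in "brca2" and "rad51", which co-occur with it in every passage where either appears, while "directly" stays out because it shares only one of two passages. A real system would need proper tokenization and a pruning step, since the all-pairs loop grows quadratically with vocabulary size.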

Indexed in:	PubMed, SpringerLink, and other databases.
