A Multistage Gene Normalization System Integrating Multiple Effective Methods
Authors: Lishuang Li, Shanshan Liu, Lihua Li, Wenting Fan, Degen Huang, Huiwei Zhou
Affiliation: 1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, China; 2. School of Mathematics and Information Science and Technology, Hebei Normal University of Science and Technology, Qinhuangdao, Hebei, China; Technische Universität Dresden, Germany
Abstract:
Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system consisting of four major subtasks: pre-processing, dictionary matching, ambiguity resolution, and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM test set. In the dictionary matching stage, exact matching and approximate matching between gene names and the EntrezGene lexicon are combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. In the last step, a filter built on Wikipedia removes false positives. Experimental results show that the presented system achieves an F-score of 90.1%, outperforming most state-of-the-art systems.
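The dictionary matching stage described above combines exact and approximate lookup. A minimal sketch of that idea follows; it is not the authors' implementation. The toy lexicon, its placeholder identifiers, the `normalize` and `match_mention` helpers, and the use of `difflib` string similarity with a 0.85 cutoff are all assumptions made for illustration, since the abstract does not specify how approximate matching is performed.

```python
import difflib
import re

# Hypothetical toy lexicon: normalized gene name -> set of identifiers.
# The identifiers are placeholders, not real EntrezGene records.
LEXICON = {
    "tumor protein p53": {"ID_A"},
    "p53": {"ID_A", "ID_B"},
    "insulin receptor": {"ID_C"},
}


def normalize(name):
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def match_mention(mention, lexicon=LEXICON, cutoff=0.85):
    """Return (candidate identifiers, match type) for one gene mention.

    Exact lookup on the normalized string is tried first; only if it
    fails is the closest lexicon entry above `cutoff` accepted.
    The cutoff value is an assumption, not taken from the paper.
    """
    key = normalize(mention)
    if key in lexicon:
        return set(lexicon[key]), "exact"
    close = difflib.get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    if close:
        return set(lexicon[close[0]]), "approximate"
    return set(), "none"


if __name__ == "__main__":
    for text in ["Tumor-protein P53", "insulin receptors", "p53", "hemoglobin"]:
        print(text, "->", match_mention(text))
```

In this sketch, a mention such as "p53" maps to more than one candidate identifier; such cases are what the ambiguity resolution stage would then have to settle.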