Annokey: an annotation tool based on key term search of the NCBI Entrez Gene database
Authors: Daniel J Park, Tú Nguyen-Dumont, Sori Kang, Karin Verspoor, Bernard J Pope (bjpope@unimelb.edu.au)
Affiliation: 1. Genetic Epidemiology Laboratory, Department of Pathology, Medical Building, The University of Melbourne, Melbourne, Australia; 2. Department of Computing and Information Systems, Doug McDonell Building, The University of Melbourne, Melbourne, Australia; 3. Victorian Life Sciences Computation Initiative, The University of Melbourne, Melbourne, Australia
Abstract:

Background

The NCBI Entrez Gene and PubMed databases contain a wealth of high-quality information about genes for many different organisms. The NCBI Entrez online web-search interface is convenient for simple manual searches of a small number of genes but impractical for the large gene lists produced by typical genomics projects.

Results

We have developed an efficient open source tool implemented in Python called Annokey, which annotates gene lists with the results of a keyword search of the NCBI Entrez Gene database and linked PubMed article information. The user steers the search by specifying a ranked list of keywords (including multi-word phrases and regular expressions) that are correlated with their topic of interest. Rank information of matched terms allows the user to guide further investigation. We applied Annokey to the entire human Entrez Gene database using the key-term “DNA repair” and assessed its performance in identifying the 176 members of a published “gold standard” list of genes established to be involved in this pathway. For this test case we observed a sensitivity and specificity of 97% and 96%, respectively.
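
The ranked key-term idea described above can be illustrated with a minimal Python sketch. This is not Annokey's actual implementation or interface: the function `rank_matches`, the `Hit` record, the gene identifiers, and the summary texts are all hypothetical and chosen only for illustration. The sketch assumes each key term is treated as a case-insensitive regular expression and that a term's rank is its position in the user's list.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    rank: int   # position of the term in the user's list (1 = most important)
    term: str   # the key term (phrase or regular expression) that matched
    count: int  # number of non-overlapping matches in the text


def rank_matches(text: str, ranked_terms: list[str]) -> list[Hit]:
    """Return one Hit per matching key term, ordered by rank."""
    hits = []
    for rank, term in enumerate(ranked_terms, start=1):
        # Assumption: every term is compiled as a case-insensitive regex,
        # so a plain phrase without metacharacters matches literally.
        count = sum(1 for _ in re.finditer(term, text, flags=re.IGNORECASE))
        if count:
            hits.append(Hit(rank, term, count))
    return hits


# Hypothetical gene summaries standing in for Entrez Gene record text.
genes = {
    "GENE_A": "This protein participates in DNA repair and homologous recombination.",
    "GENE_B": "A structural component of the cytoskeleton.",
}

# Ranked key terms: a phrase, another phrase, and a regular expression.
terms = ["DNA repair", "homologous recombination", r"mismatch(es)?"]

annotations = {gene: rank_matches(text, terms) for gene, text in genes.items()}

for gene, hits in annotations.items():
    best = hits[0].rank if hits else None
    print(gene, "best rank:", best, "hits:", hits)
```

In this sketch a gene's best (lowest) rank gives a simple way to prioritize it for follow-up, and genes with no hits keep an empty annotation rather than being dropped from the list.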

Conclusions

Annokey facilitates the identification of genes related to an area of interest, a task which can be onerous if performed manually on a large number of genes. Annokey provides a way to capitalize on the high-quality information provided by the Entrez Gene database, allowing both scalability and compatibility with automated analysis pipelines, thus offering the potential to significantly enhance research productivity.
This article is indexed in SpringerLink and other databases.