The language of gene ontology: a Zipf's law analysis
Authors: Kalankesh Leila R, Stevens Robert, Brass Andy
Abstract: BACKGROUND: Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. RESULTS: Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. CONCLUSIONS: Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
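As an illustration of what measuring a power law exponent involves, the following is a minimal sketch that estimates the exponent s in frequency ∝ rank^(-s) from a list of annotation terms. The abstract does not state the fitting procedure used by the authors, so the ordinary least-squares fit on log-log values is an assumption made here for simplicity. The function name `zipf_exponent` and the placeholder term labels (`TERM_A`, ...) are hypothetical and do not come from the paper or from GO.

```python
import math
from collections import Counter


def zipf_exponent(terms):
    """Estimate s in frequency ~ rank**(-s) by least squares on log-log values.

    `terms` is any iterable of hashable labels, e.g. one GO term ID per
    annotation. This is an illustrative estimator, not the paper's method.
    """
    # Term frequencies sorted from most to least frequent (rank 1 first).
    counts = sorted(Counter(terms).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying rank-frequency curve,
    # so the exponent is reported as its negation.
    return -sxy / sxx


# Toy corpus with placeholder labels whose counts are exactly 120 / rank,
# so the fitted exponent should be very close to 1.
toy_counts = {"TERM_A": 120, "TERM_B": 60, "TERM_C": 40, "TERM_D": 30, "TERM_E": 24}
corpus = [term for term, n in toy_counts.items() for _ in range(n)]
print(zipf_exponent(corpus))
```

In a setting like the one described above, the same function could be applied separately to the annotations of each sub-ontology, or to subsets filtered by evidence code, and the resulting exponents compared. A least-squares fit on log-log data is known to be a rough estimator for power laws, so it should be read as a starting point only.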
Keywords:
This article is indexed in PubMed and other databases.