Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution


Authors:	Ramon Ferrer-i-Cancho, Brita Elvevåg

Institution:	1. Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain; 2. Clinical Brain Disorders Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America; University of East Piedmont, Italy

Abstract:	Background: Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, …) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random characters, including blanks that behave as word delimiters - exhibit a Zipf's law-like word rank distribution. Methodology/Principal Findings: In this article, we examine the flaws of such putative good fits of random texts. We demonstrate - by means of three different statistical tests - that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text. Our findings are valid both for the simplest random texts, composed of equally likely characters, and for more elaborate and realistic versions in which character probabilities are borrowed from a real text. Conclusions/Significance: The good fit of random texts to real Zipf's law-like rank distributions has not yet been established. Therefore, we suggest that Zipf's law might in fact be a fundamental law in natural languages.
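
As a minimal sketch of the construction described in the abstract, the Python below builds a simple random text from equally likely characters plus a blank that acts as word delimiter, then ranks the resulting words by frequency. The alphabet, the blank probability `p_space`, the text length, the seed, and all function names are illustrative assumptions, not values taken from the article. The least-squares slope on the double logarithmic scale is only a descriptive summary here; it is not one of the three statistical tests the authors use.

```python
import math
import random
from collections import Counter


def random_text(n_chars, alphabet="abcde", p_space=0.2, seed=0):
    """Concatenate random characters; a blank acts as the word delimiter."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    chars = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))  # equally likely letters
    return "".join(chars)


def rank_frequency(text):
    """Return word frequencies in decreasing order (index 0 is rank 1)."""
    counts = Counter(text.split())  # split() ignores runs of blanks
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


freqs = rank_frequency(random_text(200_000))
slope = loglog_slope(freqs)
print(f"distinct words: {len(freqs)}, log-log slope: {slope:.2f}")
```

Replacing the uniform `rng.choice(alphabet)` with a weighted draw would give the more elaborate variant the abstract mentions, in which character probabilities are borrowed from a real text.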

