A model of the statistical power of comparative genome sequence analysis
Authors: Eddy, Sean R.
Institution: Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri, United States of America; Pennsylvania State University, United States of America
Abstract:Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of “phylogenetic shadowing” methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features.
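The two scaling rules stated in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it only rescales a known reference point, assuming the short-distance regime in which the required number of genomes is proportional to 1 / (feature size × evolutionary distance). The function name, its parameters, and the reference values in the usage lines are hypothetical and chosen only for illustration.

```python
def rescale_genomes_needed(n_ref, feature_len_ref, distance_ref,
                           feature_len, distance):
    """Rescale a reference genome count with the abstract's rules of thumb.

    Assumption (short evolutionary distances only): the number of
    comparative genomes needed at constant statistical stringency is
    proportional to 1 / (feature_len * distance).
    """
    values = (n_ref, feature_len_ref, distance_ref, feature_len, distance)
    if min(values) <= 0:
        raise ValueError("all arguments must be positive")
    # Inverse scaling with feature size, then inverse scaling with distance.
    return n_ref * (feature_len_ref / feature_len) * (distance_ref / distance)


# Hypothetical reference point: 4 genomes suffice for a 50-nt feature
# at distance 0.1 (illustrative numbers, not taken from the paper).
half_feature = rescale_genomes_needed(4, 50, 0.1, feature_len=25, distance=0.1)
half_distance = rescale_genomes_needed(4, 50, 0.1, feature_len=50, distance=0.05)
```

Under these assumptions, halving either the feature size or the distance doubles the estimated number of genomes. The sketch says nothing about larger distances, where the abstract does not claim inverse scaling with distance.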
Source: PLoS Biology. This article is indexed in PubMed and other databases.