Extracting DNA words based on the sequence features: non-uniform distribution and integrity


Authors:	Zhi Li, Hongyan Cao, Yuehua Cui, Yanbo Zhang

Institution:	1. Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; 2. Department of Statistics and Probability, Michigan State University, East Lansing, USA

Abstract:	Background: A DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms, such as motif discovery algorithms, depend on the quality of background information about the sequences, it is necessary to develop an ab initio algorithm that extracts the “words” based only on the DNA sequences. Methods: We considered non-uniform distribution and integrity to be two important features of a word, and on that basis developed an ab initio algorithm to extract “DNA words” that have potential functional meaning. A Kolmogorov-Smirnov test was used to test DNA sequences for consistency with a uniform distribution, and integrity was judged by sequence and position alignment. Two random base sequences were adopted as negative controls, and an English book was used as a positive control to verify the algorithm. We applied the algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the method. Results: The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. Conclusions: Our method provides a fast way to screen for important DNA elements on a large scale and offers potential insights into the understanding of a genome.
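The abstract does not spell out how the Kolmogorov-Smirnov test is applied, so the following is only a minimal sketch under one plausible reading: the start positions of a candidate word are scaled by the sequence length and compared with a Uniform(0, 1) distribution. The function names (`kmer_positions`, `ks_uniform_statistic`, `ks_asymptotic_pvalue`) are hypothetical and are not taken from the paper, and the p-value uses the standard asymptotic Kolmogorov series, which is only rough for small samples.

```python
import math


def kmer_positions(seq, word):
    """Return start positions of every (possibly overlapping) occurrence of word."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def ks_uniform_statistic(positions, seq_len):
    """One-sample KS statistic D of scaled positions against Uniform(0, 1)."""
    n = len(positions)
    if n == 0:
        raise ValueError("the word does not occur in the sequence")
    scaled = sorted(p / seq_len for p in positions)
    d = 0.0
    for i, x in enumerate(scaled):
        # Largest gap between the empirical CDF and the uniform CDF F(x) = x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def ks_asymptotic_pvalue(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov series (approximate for small n)."""
    # Small-sample correction of the effective argument (Stephens' approximation).
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the p-value is ~1
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    # Toy sequences (assumed for illustration, not data from the paper).
    clustered = "ACGT" * 5 + "T" * 180          # word packed at the start
    spread = ("ACGT" + "T" * 36) * 5            # word evenly spaced
    for name, seq in [("clustered", clustered), ("spread", spread)]:
        pos = kmer_positions(seq, "ACGT")
        d = ks_uniform_statistic(pos, len(seq))
        p = ks_asymptotic_pvalue(d, len(pos))
        print(name, pos, round(d, 3), p)
```

Under this reading, a small p-value means the word's occurrences are unlikely to be uniformly spread along the sequence, which is the "non-uniform distribution" feature the abstract names. The integrity check (sequence and position alignment) is not sketched here because the abstract gives too little detail to reconstruct it.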

Keywords:
This article is indexed in SpringerLink and other databases.
