Using Markov model to improve word normalization algorithm for biological sequence comparison |
| |
Authors: | Qi Dai, Xiaoqing Liu, Yuhua Yao, Fukun Zhao |
| |
Institution: | (1) College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou, 310018, People’s Republic of China; (2) School of Science, Hangzhou Dianzi University, Hangzhou, 310018, People’s Republic of China |
| |
Abstract: | Statistical measures for sequence comparison face two crucial problems: the overlapping structures and the background information
of words in biological sequences. Word normalization in the improved composition vector method takes these problems into account
and achieves better performance in evolutionary analysis. This word normalization is desirable but not sufficient, because
it assumes that the four bases A, C, T, and G occur randomly with equal probability. This paper proposes an improved word normalization
that uses a Markov model to estimate the exact k-word distribution from the observed biological sequence, and can therefore adjust for the background information
of the k-word frequencies in biological sequences. The improved word normalization was tested in three experiments and compared
with the existing word normalization. The experimental results confirm that the improved word normalization, which uses a Markov model
to estimate the exact k-word distribution in biological sequences, is more efficient. |
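The abstract does not give the estimation formula, so the following is only a minimal sketch of the general idea: fitting a Markov model to the observed sequence and using it to obtain background probabilities for k-words, in place of the uniform assumption that each base has probability 1/4. The function names `kword_counts` and `markov_kword_probs`, the default model order of 1, and the maximum-likelihood estimates from overlapping word counts are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kword_counts(seq, k):
    """Count overlapping k-words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_kword_probs(seq, k, order=1):
    """Background probability of every k-word under a Markov model fitted to seq.

    Hypothetical helper: the order and the maximum-likelihood estimates
    are illustrative assumptions, not taken from the paper.
    """
    if not 1 <= order < k:
        raise ValueError("need 1 <= order < k")
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")

    ctx_counts = kword_counts(seq, order)      # words of length `order`
    ext_counts = kword_counts(seq, order + 1)  # context plus one following base
    n_ctx = sum(ctx_counts.values())

    # followers[ctx] = how often ctx is followed by any base at all
    followers = Counter()
    for word, count in ext_counts.items():
        followers[word[:order]] += count

    probs = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        # Probability of the first `order` bases ...
        p = ctx_counts[word[:order]] / n_ctx
        # ... times the transition probability of each following base.
        for i in range(order, k):
            ctx = word[i - order:i]
            p *= ext_counts[ctx + word[i]] / followers[ctx] if followers[ctx] else 0.0
        probs[word] = p
    return probs


probs = markov_kword_probs("ACGTACGTAC", k=3)
uniform = 0.25 ** 3  # background probability if all four bases were equally likely
```

For this toy sequence the Markov background concentrates on the words that the sequence composition actually supports, whereas the uniform model assigns 0.015625 to every 3-word regardless of the sequence.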
| |
Keywords: | |
This article is indexed in PubMed, SpringerLink, and other databases. |
|