首页 | 本学科首页   官方微博 | 高级检索  
   检索      


Use of runs statistics for pattern recognition in genomic DNA sequences.
Authors:Leo Wang-Kit Cheung
Institution:Epidemiology Section, Cancer Etiology Program, Cancer Research Center of Hawaii, University of Hawaii, Honolulu, HI 96813-2479, USA. lcheung@crch.hawaii.edu
Abstract:In this article, the use of the finite Markov chain imbedding (FMCI) technique to study patterns in DNA under a hidden Markov model (HMM) is introduced. With a vision of studying multiple runs-related statistics simultaneously under an HMM through the FMCI technique, this work establishes an investigation of a bivariate runs statistic under a binary HMM for DNA pattern recognition. An FMCI-based recursive algorithm is derived and implemented for the determination of the exact distribution of this bivariate runs statistic under an independent identically distributed (IID) framework, a Markov chain (MC) framework, and a binary HMM framework. With this algorithm, we have studied the distributions of the bivariate runs statistic under different binary HMM parameter sets; probabilistic profiles of runs are created and shown to be useful for trapping HMM maximum likelihood estimates (MLEs). This MLE-trapping scheme offers good initial estimates to jump-start the expectation-maximization (EM) algorithm in HMM parameter estimation and helps prevent the EM estimates from landing on a local maximum or a saddle point. Applications of the bivariate runs statistic and the probabilistic profiles in conjunction with binary HMMs for pattern recognition in genomic DNA sequences are illustrated via case studies on DNA bendability signals using human DNA data.
Keywords:
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号