1.
Song YS 《Bulletin of mathematical biology》2006,68(2):361-384
In hidden Markov models, the probability of observing a set of strings can be computed using recursion relations. We construct a sufficient condition for simplifying the recursion relations for a certain class of hidden Markov models. If the condition is satisfied, then one can construct a reduced recursion in which the dependence on Markov states completely disappears. We discuss a specific example, namely statistical multiple alignment based on the TKF model, in which the sufficient condition is satisfied.
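As a point of reference for the recursion relations mentioned above, here is a minimal sketch of the standard forward recursion for a single observed string in a discrete HMM. It is not the reduced recursion from the paper, and it does not implement the TKF model. The function name `forward_probability` and the toy parameter values are assumptions made for illustration, and the sequence is assumed to be non-empty.

```python
def forward_probability(obs, init, trans, emit):
    """Return P(obs) for a discrete HMM by summing over hidden state paths."""
    n_states = len(init)
    # alpha[i] = P(obs[0..t], hidden state at time t is i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        # Recursion step: propagate through the transition matrix,
        # then weight by the emission probability of the next symbol.
        alpha = [
            emit[j][symbol] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Toy two-state, two-symbol model (illustrative values only).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

print(forward_probability([0, 1, 0], init, trans, emit))
```

Each step costs a number of operations quadratic in the number of hidden states, which is the dependence that a reduced recursion of the kind described in the abstract would remove.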
2.
Iain L. Macdonald, David Raubenheimer 《Biometrical journal. Biometrische Zeitschrift》1995,37(6):701-712
This paper proposes the use of hidden Markov time series models for the analysis of the behaviour sequences of one or more animals under observation. These models have advantages over the Markov chain models commonly used for behaviour sequences, as they can allow for time-trend or expansion to several subjects without sacrificing parsimony. Furthermore, they provide an alternative to higher-order Markov chain models if a first-order Markov chain is unsatisfactory as a model. To illustrate the use of such models, we fit multivariate and univariate hidden Markov models allowing for time-trend to data from an experiment investigating the effects of feeding on the locomotory behaviour of locusts (Locusta migratoria).
3.
4.
5.
The phenomenon of interference in genetic recombination is well known and has been studied in a wide variety of organisms. Multilocus linkage analysis, which makes use of recombination patterns among all genetic markers simultaneously, is routinely used with data on humans and experimental organisms to build genetic maps. It is also used to try to determine the genes involved in traits of interest, such as common diseases. Most linkage analyses performed today ignore the occurrence of genetic interference. We present an extension to the Lander-Green algorithm for experimental crosses (backcross and intercross) to incorporate crossover interference according to the chi-square (χ²) model. Simulation results show the impact of using this model on the accuracy of estimated genetic maps.
6.
This paper examines recent developments and applications of Hidden Markov Models (HMMs) to various problems in computational biology, including multiple sequence alignment, homology detection, protein sequence classification, and genomic annotation.
7.
Stacia M. DeSantis, E. Andrés Houseman, Brent A. Coull, David N. Louis, Gayatry Mohapatra, Rebecca A. Betensky 《Biometrics》2009,65(4):1296-1305
Summary: Array CGH (aCGH) is a high‐throughput technique designed to detect genomic alterations linked to the development and progression of cancer. The technique yields fluorescence ratios that characterize DNA copy number change in tumor versus healthy cells. Classification of tumors based on aCGH profiles is of scientific interest, but the analysis of these data is complicated by the large number of highly correlated measures. In this article, we develop a supervised Bayesian latent class approach for classification that relies on a hidden Markov model to account for the dependence in the intensity ratios. Supervision means that classification is guided by a clinical endpoint. Posterior inferences are made about class‐specific copy number gains and losses. We demonstrate our technique on a study of brain tumors, for which our approach is capable of identifying subsets of tumors with different genomic profiles and differentiates classes by survival much better than unsupervised methods.
8.
9.
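Applications of hidden Markov models in bioinformatics. Total citations: 3, self-citations: 0, citations by others: 3
The hidden Markov model (HMM) is a statistical method proposed in the 1970s that was previously used mainly for speech recognition. In 1989, Churchill introduced it into computational biology. Today the HMM is one of the more widely used statistical methods in bioinformatics, applied mainly to linear sequence analysis, model analysis, and gene finding. This paper gives a concise description of the HMM and a brief overview of its applications in these areas.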
10.
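Detection of deletions and insertions in coding sequences based on a hidden Markov model (in English). Total citations: 2, self-citations: 0, citations by others: 2
Since the completion of genome sequencing projects, gene identification and gene structure prediction using computational tools have attracted growing attention. Many software tools have been developed, such as GenScan, Genemark, and GRAIL, and they provide important clues for finding new genes. However, the problem of gene identification and prediction is still not fully solved: when the coding sequence of a target gene contains deletions or insertions, the predicted structure differs greatly from the actual gene structure. To eliminate the effect of sequencing errors on prediction, the aim is to locate sequencing errors in coding regions. To this end, the authors use a hidden Markov model (HMM) built on statistical properties of DNA sequences, introduce deletion and insertion states, and apply the Viterbi algorithm to find exon fragments that contain deletions and insertions. On the commonly used Burset/Guigo test set, sensitivity (Sn) and specificity (Sp) both exceed 84% at the exon level.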
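The Viterbi step mentioned above can be sketched generically as follows. This is a minimal log-space Viterbi decoder on a toy two-state model. The state names (`match`, `error`), the symbols (`ok`, `odd`), and all probabilities are hypothetical and are not taken from the paper, whose model with deletion and insertion states is considerably richer.

```python
import math


def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return the most probable hidden state path and its log-probability."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for symbol in obs[1:]:
        prev = best
        best, pointers = {}, {}
        for s in states:
            r = max(states, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[r] + log_trans[r][s] + log_emit[s][symbol]
            pointers[s] = r
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


def to_log(table):
    """Convert a nested dict of strictly positive probabilities to logs."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy model: "error" stands in for an insertion/deletion state.
states = ("match", "error")
log_init = {"match": math.log(0.9), "error": math.log(0.1)}
log_trans = to_log({"match": {"match": 0.9, "error": 0.1},
                    "error": {"match": 0.5, "error": 0.5}})
log_emit = to_log({"match": {"ok": 0.9, "odd": 0.1},
                   "error": {"ok": 0.2, "odd": 0.8}})

path, logp = viterbi(["ok", "ok", "odd", "odd", "ok"], states, log_init, log_trans, log_emit)
print(path, math.exp(logp))
```

Working in log space avoids numerical underflow on long sequences, which matters for DNA-length inputs. The sketch assumes every probability is strictly positive so that the logarithms are defined.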
11.
12.
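The generalized hidden Markov model (GHMM) is an important model for gene identification, but its computational cost is much higher than that of the conventional hidden Markov model, so it cannot be applied directly to gene identification. Based on the structural features of prokaryotic genes, an efficient simplified algorithm is proposed whose computational cost is a linear function of sequence length. On this basis, a gene identification program for prokaryotes, GeneMiner, was built. Tests on real data show that the algorithm is effective.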
13.
Summary: Continuous‐time multistate models are widely used for categorical response data, particularly in the modeling of chronic diseases. However, inference is difficult when the process is only observed at discrete time points, with no information about the times or types of events between observation times, unless a Markov assumption is made. This assumption can be limiting as rates of transition between disease states might instead depend on the time since entry into the current state. Such a formulation results in a semi‐Markov model. We show that the computational problems associated with fitting semi‐Markov models to panel‐observed data can be alleviated by considering a class of semi‐Markov models with phase‐type sojourn distributions. This allows methods for hidden Markov models to be applied. In addition, extensions to models where observed states are subject to classification error are given. The methodology is demonstrated on a dataset relating to development of bronchiolitis obliterans syndrome in post‐lung‐transplantation patients.
14.
S. Michalek, M. Wagner, J. Timmer, W. Vach 《Biometrical journal. Biometrische Zeitschrift》2001,43(7):863-879
Hidden Markov models have been successfully applied in various fields of time series analysis, especially for analyzing ion channel recordings. The maximum likelihood estimator (MLE) has recently been proven to be asymptotically normally distributed. Here, we investigate finite sample properties of the MLE and of different types of likelihood ratio tests (LRTs) by means of simulation studies. The MLE is shown to reach the asymptotic behavior within sample sizes that are common for various applications. Thus, reliable estimates and confidence intervals can be obtained. We give an approximate scaling function for the estimation error for finite samples, and investigate the power of different LRTs suitable for applications to ion channels, including tests for superimposed hidden Markov processes. Our results are applied to physiological sodium channel data.
15.
Khan RN, Martinac B, Madsen BW, Milne RK, Yeo GF, Edeson RO 《Mathematical biosciences》2005,193(2):139-158
Patch clamp data from the large conductance mechanosensitive channel (MscL) in E. coli were studied with the aim of developing a strategy for statistical analysis based on hidden Markov models (HMMs) and determining the number of conductance levels of the channel, together with mean current, mean dwell time and equilibrium probability of occupancy for each level. The models incorporated state-dependent white noise and moving average adjustment for filtering, with maximum likelihood parameter estimates obtained using an EM (expectation-maximisation) based iteration. Adjustment for filtering was included as it could be expected that the electronic filter used in recording would have a major effect on obviously brief intermediate conductance level sojourns. Preliminary data analysis revealed that the brevity of intermediate level sojourns caused difficulties in assignment of data points to levels as a result of over-estimation of noise variances. When reasonable constraints were placed on these variances using the better determined noise variances for the closed and fully open levels, idealisation anomalies were eliminated. Nevertheless, simulations suggested that mean sojourn times for the intermediate levels were still considerably over-estimated, and that recording bandwidth was a major limitation; improved results were obtained with higher bandwidth data (10 kHz sampled at 25 kHz). The simplest model consistent with these data had four open conductance levels, intermediate levels being approximately 20%, 51% and 74% of fully open. The mean lifetime at the fully open level was about 1 ms; estimates for the three intermediate levels were 54-92 µs, probably still over-estimates.
16.
We present a method for parameter estimation in a two-compartment hidden Markov model of the first two stages of hematopoiesis. Hematopoiesis is the specialization of stem cells into mature blood cells. As stem cells are not distinguishable in bone marrow, little is known about their behavior, although it is known that they have the ability to self-renew or to differentiate to more specialized (progenitor) cells. We observe progenitor cells in samples of bone marrow taken from hybrid cats whose cells contain a natural binary marker. With data consisting of the changing proportions of this binary marker over time from several cats, estimates for stem cell self-renewal and differentiation parameters are obtained using an estimating equations approach.
17.
This paper discusses a two‐state hidden Markov Poisson regression (MPR) model for analyzing longitudinal data of epileptic seizure counts, which allows for the rate of the Poisson process to depend on covariates through an exponential link function and to change according to the states of a two‐state Markov chain with its transition probabilities associated with covariates through a logit link function. This paper also considers a two‐state hidden Markov negative binomial regression (MNBR) model, as an alternative, by using the negative binomial instead of Poisson distribution in the proposed MPR model when there exists extra‐Poisson variation conditional on the states of the Markov chain. The two proposed models in this paper relax the stationary requirement of the Markov chain, allow for overdispersion relative to the usual Poisson regression model and for correlation between repeated observations. The proposed methodology provides a plausible analysis for the longitudinal data of epileptic seizure counts, and the MNBR model fits the data much better than the MPR model. Maximum likelihood estimation using the EM and quasi‐Newton algorithms is discussed. A Monte Carlo study for the proposed MPR model investigates the reliability of the estimation method, the choice of probabilities for the initial states of the Markov chain, and some finite sample behaviors of the maximum likelihood estimates, suggesting that (1) the estimation method is accurate and reliable as long as the total number of observations is reasonably large, and (2) the choice of probabilities for the initial states of the Markov process has little impact on the parameter estimates.
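To make the likelihood behind such a model concrete, the following is a minimal sketch of the log-likelihood of a count series under a Poisson hidden Markov model, computed with a scaled forward recursion. It is a deliberate simplification and not the paper's MPR model: the rates and transition probabilities are fixed constants rather than functions of covariates, and the names `poisson_pmf` and `poisson_hmm_loglik` as well as the example values are assumptions made for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of count k, evaluated via logs for stability."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))


def poisson_hmm_loglik(counts, init, trans, rates):
    """Log-likelihood of counts under a Poisson HMM with fixed parameters."""
    n = len(rates)
    loglik = 0.0
    alpha = list(init)  # state distribution before the first observation
    for t, k in enumerate(counts):
        if t > 0:
            # Advance the hidden chain by one step.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Weight each state by the probability of emitting the observed count.
        alpha = [alpha[j] * poisson_pmf(k, rates[j]) for j in range(n)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik


# Hypothetical two-state example: a low-rate and a high-rate state.
counts = [0, 1, 7, 9, 2, 0]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.3, 0.7]]
rates = [1.0, 8.0]

print(poisson_hmm_loglik(counts, init, trans, rates))
```

A function of this shape is what a quasi-Newton optimizer would maximize over the parameters. In the covariate-dependent setting described in the abstract, `rates` and `trans` would be recomputed at each time point from the link functions.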
18.
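Existing methods for predicting protein subcellular localization were designed for water-soluble proteins and are not suitable for transmembrane proteins, while dedicated transmembrane topology predictors were not designed for subcellular localization. This paper improves the model structure of the transmembrane topology predictor TMPHMMLoc and designs a new second-order hidden Markov model. Model parameters are estimated with the Baum-Welch algorithm extended to second-order models, and the models built for the individual subcellular locations are integrated into a single predictor. Test results on the data set show that the method performs significantly better than the support vector machine and fuzzy k-nearest-neighbor methods designed for soluble proteins, and also better than the hidden Markov model method proposed in TMPHMMLoc, making it an effective method for predicting the subcellular localization of transmembrane proteins.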
19.
We developed a computer program, GeneHackerTL, which predicts the most probable translation initiation site for a given nucleotide sequence. The program requires that information be extracted from the nucleotide sequence data surrounding the translation initiation sites according to the framework of the Hidden Markov Model. Since the translation initiation sites of 72 highly abundant proteins have already been assigned on the genome of Synechocystis sp. strain PCC6803 by amino-terminal analysis, we extracted necessary information for GeneHackerTL from the nucleotide sequence data. The prediction rate of the GeneHackerTL for these proteins was estimated to be 86.1%. We then used GeneHackerTL for prediction of the translation initiation sites of 24 other proteins, of which the initiation sites were not assigned experimentally, because of the lack of a potential initiation codon at the amino-terminal position. For 20 out of the 24 proteins, the initiation sites were predicted in the upstream of their amino-terminal positions. According to this assignment, the processed regions represent a typical feature of signal peptides. We could also predict multiple translation initiation sites for a particular gene for which at least two initiation sites were experimentally detected. This program would be effective for the prediction of translation initiation sites of other proteins, not only in this species but also in other prokaryotes as well.