首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
SUMMARY: Hidden Markov models (HMMs) are widely used for biological sequence analysis because of their ability to incorporate biological information in their structure. An automatic means of optimizing the structure of HMMs would be highly desirable. However, this raises two important issues; first, the new HMMs should be biologically interpretable, and second, we need to control the complexity of the HMM so that it has good generalization performance on unseen sequences. In this paper, we explore the possibility of using a genetic algorithm (GA) for optimizing the HMM structure. GAs are sufficiently flexible to allow incorporation of other techniques such as Baum-Welch training within their evolutionary cycle. Furthermore, operators that alter the structure of HMMs can be designed to favour interpretable and simple structures. In this paper, a training strategy using GAs is proposed, and it is tested on finding HMM structures for the promoter and coding region of the bacterium Campylobacter jejuni. The proposed GA for hidden Markov models (GA-HMM) allows, HMMs with different numbers of states to evolve. To prevent over-fitting, a separate dataset is used for comparing the performance of the HMMs to that used for the Baum-Welch training. The GA-HMM was capable of finding an HMM comparable to a hand-coded HMM designed for the same task, which has been published previously.  相似文献   

2.
3.
Hidden Markov models (HMMs) are a class of stochastic models that have proven to be powerful tools for the analysis of molecular sequence data. A hidden Markov model can be viewed as a black box that generates sequences of observations. The unobservable internal state of the box is stochastic and is determined by a finite state Markov chain. The observable output is stochastic with distribution determined by the state of the hidden Markov chain. We present a Bayesian solution to the problem of restoring the sequence of states visited by the hidden Markov chain from a given sequence of observed outputs. Our approach is based on a Monte Carlo Markov chain algorithm that allows us to draw samples from the full posterior distribution of the hidden Markov chain paths. The problem of estimating the probability of individual paths and the associated Monte Carlo error of these estimates is addressed. The method is illustrated by considering a problem of DNA sequence multiple alignment. The special structure for the hidden Markov model used in the sequence alignment problem is considered in detail. In conclusion, we discuss certain interesting aspects of biological sequence alignments that become accessible through the Bayesian approach to HMM restoration.  相似文献   

4.
5.
《Biophysical journal》2023,122(2):433-441
Potential energy landscapes are useful models in describing events such as protein folding and binding. While single-molecule fluorescence resonance energy transfer (smFRET) experiments encode information on continuous potentials for the system probed, including rarely visited barriers between putative potential minima, this information is rarely decoded from the data. This is because existing analysis methods often model smFRET output assuming, from the onset, that the system probed evolves in a discretized state space to be analyzed within a hidden Markov model (HMM) paradigm. By contrast, here, we infer continuous potentials from smFRET data without discretely approximating the state space. We do so by operating within a Bayesian nonparametric paradigm by placing priors on the family of all possible potential curves. As our inference accounts for a number of required experimental features raising computational cost (such as incorporating discrete photon shot noise), the framework leverages a structured-kernel-interpolation Gaussian process prior to help curtail computational cost. We show that our structured-kernel-interpolation priors for potential energy reconstruction from smFRET analysis accurately infers the potential energy landscape from a smFRET binding experiment. We then illustrate advantages of structured-kernel-interpolation priors for potential energy reconstruction from smFRET over standard HMM approaches by providing information, such as barrier heights and friction coefficients, that is otherwise inaccessible to HMMs.  相似文献   

6.
Complex sequencing rules observed in birdsongs provide an opportunity to investigate the neural mechanism for generating complex sequential behaviors. To relate the findings from studying birdsongs to other sequential behaviors such as human speech and musical performance, it is crucial to characterize the statistical properties of the sequencing rules in birdsongs. However, the properties of the sequencing rules in birdsongs have not yet been fully addressed. In this study, we investigate the statistical properties of the complex birdsong of the Bengalese finch (Lonchura striata var. domestica). Based on manual-annotated syllable labeles, we first show that there are significant higher-order context dependencies in Bengalese finch songs, that is, which syllable appears next depends on more than one previous syllable. We then analyze acoustic features of the song and show that higher-order context dependencies can be explained using first-order hidden state transition dynamics with redundant hidden states. This model corresponds to hidden Markov models (HMMs), well known statistical models with a large range of application for time series modeling. The song annotation with these models with first-order hidden state dynamics agreed well with manual annotation, the score was comparable to that of a second-order HMM, and surpassed the zeroth-order model (the Gaussian mixture model; GMM), which does not use context information. Our results imply that the hierarchical representation with hidden state dynamics may underlie the neural implementation for generating complex behavioral sequences with higher-order dependencies.  相似文献   

7.
Summary Continuous‐time multistate models are widely used for categorical response data, particularly in the modeling of chronic diseases. However, inference is difficult when the process is only observed at discrete time points, with no information about the times or types of events between observation times, unless a Markov assumption is made. This assumption can be limiting as rates of transition between disease states might instead depend on the time since entry into the current state. Such a formulation results in a semi‐Markov model. We show that the computational problems associated with fitting semi‐Markov models to panel‐observed data can be alleviated by considering a class of semi‐Markov models with phase‐type sojourn distributions. This allows methods for hidden Markov models to be applied. In addition, extensions to models where observed states are subject to classification error are given. The methodology is demonstrated on a dataset relating to development of bronchiolitis obliterans syndrome in post‐lung‐transplantation patients.  相似文献   

8.
We present a new method for inferring hidden Markov models from noisy time sequences without the necessity of assuming a model architecture, thus allowing for the detection of degenerate states. This is based on the statistical prediction techniques developed by Crutchfield et al. and generates so called causal state models, equivalent in structure to hidden Markov models. The new method is applicable to any continuous data which clusters around discrete values and exhibits multiple transitions between these values such as tethered particle motion data or Fluorescence Resonance Energy Transfer (FRET) spectra. The algorithms developed have been shown to perform well on simulated data, demonstrating the ability to recover the model used to generate the data under high noise, sparse data conditions and the ability to infer the existence of degenerate states. They have also been applied to new experimental FRET data of Holliday Junction dynamics, extracting the expected two state model and providing values for the transition rates in good agreement with previous results and with results obtained using existing maximum likelihood based methods. The method differs markedly from previous Markov-model reconstructions in being able to uncover truly hidden states.  相似文献   

9.
Profile hidden Markov models (HMMs) are used to model protein families and for detecting evolutionary relationships between proteins. Such a profile HMM is typically constructed from a multiple alignment of a set of related sequences. Transition probability parameters in an HMM are used to model insertions and deletions in the alignment. We show here that taking into account unrelated sequences when estimating the transition probability parameters helps to construct more discriminative models for the global/local alignment mode. After normal HMM training, a simple heuristic is employed that adjusts the transition probabilities between match and delete states according to observed transitions in the training set relative to the unrelated (noise) set. The method is called adaptive transition probabilities (ATP) and is based on the HMMER package implementation. It was benchmarked in two remote homology tests based on the Pfam and the SCOP classifications. Compared to the HMMER default procedure, the rate of misclassification was reduced significantly in both tests and across all levels of error rate.  相似文献   

10.
It is shown that hidden Markov models (HMMs) are a powerful tool in the analysis of multielectrode data. This is demonstrated for a 30-electrode measurement of neuronal spike activity in the monkey's visual cortex during the application of different visual stimuli. HMMs with optimized parameters code the information contained in the spatiotemporal discharge patterns as a probabilistic function of a Markov process and thus provide abstract dynamical models of the pattern-generating process. We compare HMMs obtained from vector-quantized data with models in which parametrized output processes such as multivariate Poisson or binomial distributions are assumed. In the latter cases the visual stimuli are recognized at rates of more than 90% from the neuronal spike patterns. An analysis of the models obtained reveals important aspects of the coding of information in the brain. For example, we identify relevant time scales and characterize the degree and nature of the spatiotemporal variations on these scales.  相似文献   

11.
In this article, we introduce the drifting Markov models (DMMs) which are inhomogeneous Markov models designed for modeling the heterogeneities of sequences (in our case DNA or protein sequences) in a more flexible way than homogeneous Markov chains or even hidden Markov models (HMMs). We focus here on the polynomial drift: the transition matrix varies in a polynomial way. To show the reliability of our models on DNA, we exhibit high similarities between the probability distributions of nucleotides obtained by our models and the frequencies of these nucleotides computed by using a sliding window. In a further step, these DMMs can be used as the states of an HMM: on each of its segments, the observed process can be modeled by a drifting Markov model. Search of rare words in DNA sequences remains possible with DMMs and according to the fits provided, DMMs turn out to be a powerful tool for this purpose. The software is available on request from the author. It will soon be integrated on seq++ library (http://stat.genopole.cnrs.fr/seqpp/).  相似文献   

12.
1.  Linking the movement and behaviour of animals to their environment is a central problem in ecology. Through the use of electronic tagging and tracking (ETT), collection of in situ data from free-roaming animals is now commonplace, yet statistical approaches enabling direct relation of movement observations to environmental conditions are still in development.
2.  In this study, we examine the hidden Markov model (HMM) for behavioural analysis of tracking data. HMMs allow for prediction of latent behavioural states while directly accounting for the serial dependence prevalent in ETT data. Updating the probability of behavioural switches with tag or remote-sensing data provides a statistical method that links environmental data to behaviour in a direct and integrated manner.
3.  It is important to assess the reliability of state categorization over the range of time-series lengths typically collected from field instruments and when movement behaviours are similar between movement states. Simulation with varying lengths of times series data and contrast between average movements within each state was used to test the HMMs ability to estimate movement parameters.
4.  To demonstrate the methods in a realistic setting, the HMMs were used to categorize resident and migratory phases and the relationship between movement behaviour and ocean temperature using electronic tagging data from southern bluefin tuna ( Thunnus maccoyii ). Diagnostic tools to evaluate the suitability of different models and inferential methods for investigating differences in behaviour between individuals are also demonstrated.  相似文献   

13.
As hidden Markov models (HMMs) become increasingly more important in the analysis of biological sequences, so too have databases of HMMs expanded in size, number and importance. While the standard paradigm a short while ago was the analysis of one or a few sequences at a time, it has now become standard procedure to submit an entire microbial genome. In the future, it will be common to submit large groups of completed genomes to run simultaneously against a dozen public databases and any number of internally developed targets. This paper looks at some of the readily available HMM (or HMM-like) algorithms and several publicly available HMM databases, and outlines methods by which the reader may develop custom HMM targets.  相似文献   

14.
The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains given observations, to a dynamic programming problem for corresponding directed acyclic graphs. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevetheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute probabilities for predicted exons and gene structures being correct under the assumed model. Finally, we describe a new memory efficient sampling algorithm for certain classes of HMMs which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context free grammars and RNA structure prediction.  相似文献   

15.
16.
Hidden Markov models (HMMs) provide an excellent analysis of recordings with very poor signal/noise ratio made from systems such as ion channels which switch among a few states. This method has also recently been used for modeling the kinetic rate constants of molecular motors, where the observable variable—the position—steadily accumulates as a result of the motor's reaction cycle. We present a new HMM implementation for obtaining the chemical-kinetic model of a molecular motor's reaction cycle called the variable-stepsize HMM in which the quantized position variable is represented by a large number of states of the Markov model. Unlike previous methods, the model allows for arbitrary distributions of step sizes, and allows these distributions to be estimated. The result is a robust algorithm that requires little or no user input for characterizing the stepping kinetics of molecular motors as recorded by optical techniques.  相似文献   

17.
Single-molecule fluorescence resonance energy transfer (smFRET) measurement is a powerful technique for investigating dynamics of biomolecules, for which various efforts have been made to overcome significant stochastic noise. Time stamp (TS) measurement has been employed experimentally to enrich information within the signals, while data analyses such as the hidden Markov model (HMM) have been successfully applied to recover the trajectories of molecular state transitions from time-binned photon counting signals or images. In this article, we introduce the HMM for TS-FRET signals, employing the variational Bayes (VB) inference to solve the model, and demonstrate the application of VB-HMM-TS-FRET to simulated TS-FRET data. The same analysis using VB-HMM is conducted for other models and the previously reported change point detection scheme. The performance is compared to other analysis methods or data types and we show that our VB-HMM-TS-FRET analysis can achieve the best performance and results in the highest time resolution. Finally, an smFRET experiment was conducted to observe spontaneous branch migration of Holliday-junction DNA. VB-HMM-TS-FRET was successfully applied to reconstruct the state transition trajectory with the number of states consistent with the nucleotide sequence. The results suggest that a single migration process frequently involves rearrangement of multiple basepairs.  相似文献   

18.
《IRBM》2019,40(3):157-166
ObjectiveFetal Electro Cardiogram (fECG) provides critical information on the wellbeing of a foetus heart in its developing stages in the mother's womb. The objective of this work is to extract fECG which is buried in a composite signal consisting of itself, maternal ECG (mECG) and noises contributed from various unavoidable sources. In the past, the challenge of extracting fECG from the composite signal was dealt with by Stochastic Weiner filter, model-based Kalman filter and other adaptive filtering techniques. Blind Source Separation (BSS) based Independent Component Analysis (ICA) has shown an edge over the adaptive filtering techniques as the former does not require a reference signal. Recently, data-driven machine learning techniques e.g., adaptive neural networks, adaptive neuro-fuzzy inference system, support vector machine (SVM) are also applied.MethodThis work pursues hidden Markov model (HMM)-based supervised machine learning frame-work for the determination of the location of fECG QRS complex from the composite abdominal signal. HMM is used to model the underlying hidden states of the observable time series of the extracted and separated fECG data with its QRS peak location as one of the hidden states. The state transition probabilities are estimated in the training phase using the annotated data sets. Afterwards, using the estimated HMM networks, fQRS locations are detected in the testing phase. To evaluate the proposed technique, the accuracy of the correct detection of QRS complex with respect to the correct annotation of QRS complex location is considered and quantified by the sensitivity, probability of false alarm, and accuracy.ResultsThe best results that have been achieved using the proposed method are: accuracy – 97.1%, correct detection rate (translated to sensitivity) – 100%, and false alarm rate – 2.89%.ConclusionTwo primary challenges in these methods are finding the right reference threshold for the normalization of the extracted fECG signal during the initial trials and limitation of discrete frame work of HMM signal (converted from continuous time) which only offers a countable number of levels in observations. By feeding the posterior probabilities, obtained from SVM, into HMM, as emission probabilities, can further improve the accuracy of fQRS location detection.  相似文献   

19.
20.
Level or jump detectors generate the reconstructed time series from a noisy record of patch-clamp current. The reconstructed time series is used to create dwell-time histograms for the kinetic analysis of the Markov model of the investigated ion channel. It is shown here that some additional lines in the software of such a detector can provide a powerful new means of patch-clamp analysis. For each current level that can be recognized by the detector, an array is declared. The new software assigns every data point of the original time series to the array that belongs to the actual state of the detector. From the data sets in these arrays distributions-per-level are generated. Simulated and experimental time series analyzed by Hinkley detectors are used to demonstrate the benefits of these distributions-per-level. First, they can serve as a test of the reliability of jump and level detectors. Second, they can reveal beta distributions as resulting from fast gating that would usually be hidden in the overall amplitude histogram. Probably the most valuable feature is that the malfunctions of the Hinkley detectors turn out to depend on the Markov model of the ion channel. Thus, the errors revealed by the distributions-per-level can be used to distinguish between different putative Markov models of the measured time series. Abbreviations: AMFE = anomalous mole fraction effect; C, Z = closed state; DHD = sublevel Hinkley detector; HMM = Hidden Markov model; O, F = openstate; S = subconductance level  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号