Similar Literature
20 similar documents found; search took 31 ms
1.
Hidden Markov models (HMMs) are a class of stochastic models that have proven to be powerful tools for the analysis of molecular sequence data. A hidden Markov model can be viewed as a black box that generates sequences of observations. The unobservable internal state of the box is stochastic and is determined by a finite-state Markov chain. The observable output is stochastic, with a distribution determined by the state of the hidden Markov chain. We present a Bayesian solution to the problem of restoring the sequence of states visited by the hidden Markov chain from a given sequence of observed outputs. Our approach is based on a Markov chain Monte Carlo (MCMC) algorithm that allows us to draw samples from the full posterior distribution of the hidden Markov chain paths. We also address the problem of estimating the probability of individual paths and the associated Monte Carlo error of these estimates. The method is illustrated with a problem of multiple alignment of DNA sequences, and the special structure of the hidden Markov model used in the sequence alignment problem is considered in detail. In conclusion, we discuss certain interesting aspects of biological sequence alignments that become accessible through the Bayesian approach to HMM restoration.
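
One standard way to draw whole hidden paths from their posterior is forward filtering, backward sampling. The sketch below shows that step for a discrete-output HMM, assuming the initial, transition, and emission probabilities are known; the abstract does not say this is the authors' exact sampler, and a full MCMC scheme would also update the parameters. The function name `ffbs` and its arguments are illustrative.

```python
import random

def ffbs(obs, init, trans, emit, rng):
    """Draw one hidden-state path from P(path | obs) by forward filtering,
    backward sampling. Parameters are treated as known (an assumption)."""
    n_states = len(init)
    # Forward pass: alphas[t][k] is proportional to P(state_t = k | obs[0..t]).
    alphas = []
    prev = None
    for t, o in enumerate(obs):
        if t == 0:
            a = [init[k] * emit[k][o] for k in range(n_states)]
        else:
            a = [emit[k][o] * sum(prev[j] * trans[j][k] for j in range(n_states))
                 for k in range(n_states)]
        total = sum(a)
        a = [x / total for x in a]  # normalize to avoid underflow
        alphas.append(a)
        prev = a
    # Backward pass: sample the last state, then each earlier state given its successor.
    path = [0] * len(obs)
    path[-1] = rng.choices(range(n_states), weights=alphas[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        w = [alphas[t][j] * trans[j][path[t + 1]] for j in range(n_states)]
        path[t] = rng.choices(range(n_states), weights=w)[0]
    return path
```

Repeating the draw many times gives Monte Carlo estimates of individual path probabilities, whose sampling error shrinks with the number of draws.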

2.
Tree growth is assumed to result mainly from three components: (i) an endogenous component, assumed to be structured as a succession of roughly stationary phases separated by marked change points that are asynchronous among individuals; (ii) a time-varying environmental component, assumed to take the form of fluctuations that are synchronous among individuals; and (iii) an individual component corresponding mainly to the local environment of each tree. To identify and characterize these three components, we propose to use semi-Markov switching linear mixed models, i.e., models that combine linear mixed models in a semi-Markovian manner. The underlying semi-Markov chain represents the succession of growth phases and their lengths (endogenous component), whereas the linear mixed models attached to each state of the underlying semi-Markov chain represent, within the corresponding growth phase, both the influence of time-varying climatic covariates (environmental component) as fixed effects and interindividual heterogeneity (individual component) as random effects. In this article, we address the estimation of Markov and semi-Markov switching linear mixed models in a general framework. We propose a Monte Carlo expectation–maximization-like algorithm whose iterations decompose into three steps: (i) sampling of state sequences given random effects, (ii) prediction of random effects given state sequences, and (iii) maximization. The proposed statistical modeling approach is illustrated by the analysis of successive annual shoots along Corsican pine trunks influenced by climatic covariates.
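
To make the three components concrete, here is a minimal generative sketch for one tree. It assumes left-to-right phases with uniform sojourn times, one climatic covariate, and one Gaussian random intercept per phase. These choices and all dictionary keys are hypothetical, and the Monte Carlo EM estimation procedure itself is not shown.

```python
import random

def simulate_individual(n_years, climate, phases, rng):
    """Simulate one individual's annual growth series.

    phases: list of dicts with hypothetical keys
      'min_len', 'max_len'    -> uniform sojourn-time bounds (semi-Markov duration)
      'intercept', 'slope'    -> fixed effects (slope multiplies the climatic covariate)
      'sd_random', 'sd_noise' -> std. dev. of the individual random effect and residual
    Phases are visited left to right; the last one is absorbing.
    """
    # Individual random effects, one per phase (individual component).
    effects = [rng.gauss(0.0, p['sd_random']) for p in phases]
    states, values = [], []
    phase = 0
    while len(states) < n_years:
        p = phases[phase]
        if phase == len(phases) - 1:
            length = n_years - len(states)  # absorbing final phase fills the rest
        else:
            length = rng.randint(p['min_len'], p['max_len'])  # explicit sojourn time
        for _ in range(min(length, n_years - len(states))):
            t = len(states)
            # Fixed effects (environmental component) plus random effect.
            mean = p['intercept'] + p['slope'] * climate[t] + effects[phase]
            states.append(phase)
            values.append(mean + rng.gauss(0.0, p['sd_noise']))
        phase = min(phase + 1, len(phases) - 1)
    return states, values
```

Estimation runs in the opposite direction: given only `values` and `climate` for many trees, it recovers the phase sequence, fixed effects, and random effects.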

3.
Techniques for extracting small single-channel ion currents from background noise are described and tested. It is assumed that single-channel currents are generated by a first-order, finite-state, discrete-time Markov process to which 'white' background noise from the recording apparatus (electrode, amplifiers, etc.) is added. Given the observations and the statistics of the background noise, the techniques described here yield a posteriori estimates of the most likely signal statistics, including the Markov model state transition probabilities, duration (open- and closed-time) probabilities, histograms, signal levels, and the most likely state sequence. Using variations of several algorithms previously developed for solving digital estimation problems, we have demonstrated that: (1) artificial, small, first-order, finite-state Markov model signals embedded in simulated noise can be extracted with a high degree of accuracy; (2) the processing can detect signals that do not conform to a first-order Markov model, but the method is less accurate when the background noise is not white; and (3) the techniques can be used to extract single-channel currents in neuronal membranes from the baseline noise. Some studies have been included to test the validity of assuming a first-order Markov model for biological signals. This method can be used to obtain channel characteristics such as amplitude distributions, transition matrices, and open- and closed-time durations directly from digitized data.
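
The "most likely state sequence" for a Markov signal in white Gaussian noise is commonly computed with the Viterbi algorithm. The sketch below shows this for known levels, noise standard deviation, and transition probabilities. It does not attempt the parameter estimation described in the abstract, and the function and argument names are illustrative.

```python
import math

def viterbi_gaussian(obs, levels, sigma, trans, init):
    """Most likely state sequence of a finite-state Markov chain observed in
    additive white Gaussian noise (log domain to avoid underflow).
    All entries of `init` must be positive."""
    n = len(levels)

    def log_emit(k, y):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((y - levels[k]) / sigma) ** 2

    log_trans = [[math.log(p) if p > 0 else -math.inf for p in row] for row in trans]
    score = [math.log(init[k]) + log_emit(k, obs[0]) for k in range(n)]
    back = []
    for y in obs[1:]:
        ptr, new = [], []
        for k in range(n):
            # Best predecessor of state k at this time step.
            best_j = max(range(n), key=lambda j: score[j] + log_trans[j][k])
            ptr.append(best_j)
            new.append(score[best_j] + log_trans[best_j][k] + log_emit(k, y))
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With levels `[0.0, 1.0]` (closed, open), the decoded path gives the idealized record from which open- and closed-time histograms can be built.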

4.
Chen PC, Chen JW. Bio Systems, 2007, 90(2): 535-545
This paper presents an approach for controlling gene networks based on a Markov chain model, where the state of a gene network is represented as a probability distribution, while state transitions are considered to be probabilistic. An algorithm is proposed to determine a sequence of control actions that drives (without state feedback) the state of a given network to within a desired state set with a prescribed minimum or maximum probability. A heuristic is proposed and shown to improve the efficiency of the algorithm for a class of genetic networks.
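
The open-loop idea can be illustrated with the network state as a probability vector pushed through one transition matrix per control action. The sketch below uses exhaustive search over short action sequences as a simple stand-in. It is not the paper's algorithm or heuristic, it only handles the minimum-probability case, and the action names are hypothetical.

```python
from itertools import product

def step(dist, matrix):
    """One open-loop step: row vector `dist` times a row-stochastic matrix."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def find_controls(p0, actions, target, min_prob, horizon):
    """Exhaustively search action sequences (no state feedback) of length
    `horizon`. Return (sequence, final distribution) for the first sequence
    whose probability of ending in `target` is at least `min_prob`, else None."""
    for seq in product(sorted(actions), repeat=horizon):
        dist = p0
        for a in seq:
            dist = step(dist, actions[a])
        if sum(dist[s] for s in target) >= min_prob:
            return seq, dist
    return None
```

The search cost grows as (number of actions)^horizon. That growth is why pruning heuristics matter for larger networks.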

5.
We present a statistical method, and its accompanying algorithms, for selecting a mathematical model of the gating mechanism of an ion channel and for estimating the parameters of this model. The method assumes a hidden Markov model that incorporates filtering, colored noise, and state-dependent white excess noise for the recorded data. Model selection and parameter estimation are performed via a Bayesian approach using Markov chain Monte Carlo. The method is illustrated by its application to single-channel recordings of the K+ outward rectifier in barley leaf.
Acknowledgement: The authors thank Sake Vogelzang, Bert van Duijn and Bert de Boer for their helpful advice and useful comments and suggestions.

6.
We describe a new algorithm for protein classification and the detection of remote homologs. The rationale is to exploit both the vertical and the horizontal information of a multiple alignment in a well-balanced manner. This contrasts with established methods such as profiles and profile hidden Markov models, which focus on vertical information because they model the columns of the alignment independently, and with family pairwise search, which focuses on horizontal information because it treats the given sequences separately. In our setting, we want to select, from a given database of "candidate sequences", those proteins that belong to a given superfamily. To do so, each candidate sequence is separately tested against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith-Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multiple alignment. Rather, at each position the candidate sequence is aligned to one sequence of the multiple alignment, called the "reference sequence." In addition, the reference sequence may change within the alignment, with each such jump penalized. To evaluate the discriminative quality of the jumping alignment algorithm, we compare it with profiles, profile hidden Markov models, and family pairwise search on a subset of the SCOP database of protein domains. Discriminative quality is assessed by median false-positive counts (med-FP-counts). For moderate med-FP-counts, the number of successful searches with our method is considerably higher than with the competing methods.
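
The recurrence behind a jumping alignment can be sketched as Smith-Waterman with an extra index for the current reference row, where changing rows costs a jump penalty. This minimal version assumes an ungapped multiple alignment, a linear gap cost, and a simple match/mismatch score. All scoring values are illustrative, not the published parameters.

```python
def jumping_alignment(query, msa, match=2, mismatch=-1, gap=2, jump=3):
    """Best local alignment score of `query` against an ungapped multiple
    alignment `msa` (list of equal-length rows). At every aligned column the
    query is matched to one reference row; switching rows costs `jump`."""
    n, m, rows = len(query), len(msa[0]), len(msa)
    # H[i][j][r]: best score of a local alignment ending at query[i-1] and
    # column j-1, with row r as the current reference sequence.
    H = [[[0] * rows for _ in range(m + 1)] for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for r in range(rows):
                s = match if query[i - 1] == msa[r][j - 1] else mismatch
                # Diagonal move, possibly jumping from another reference row q.
                diag = max(H[i - 1][j - 1][q] - (jump if q != r else 0)
                           for q in range(rows))
                H[i][j][r] = max(0,
                                 diag + s,
                                 H[i - 1][j][r] - gap,   # gap in the alignment
                                 H[i][j - 1][r] - gap)   # gap in the query
                best = max(best, H[i][j][r])
    return best
```

A chimeric query such as `"AAAATTTT"` against rows `"AAAACCCC"` and `"GGGGTTTT"` scores 13 with one jump, but only 8 when jumps are effectively forbidden. This is the horizontal information a column summary would blur.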

7.
The maximum-likelihood technique for directly estimating rate constants from the measured patch-clamp current is extended to the analysis of multi-channel recordings, including channels with subconductance levels. The algorithm uses a simplified approach for calculating the matrix exponentials of the probability matrix from the rate constants of the Markov model of the channel(s) involved, making use of the Kronecker sum and product. The extension to multi-channel analysis is tested by applying it to simulated data. For these tests, three different channel models were selected: a two-state model, a three-state model with two open states of different conductance, and a three-state model with two closed states. For the simulations, time series of these models were calculated from the corresponding first-order, finite-state, continuous-time Markov processes. Blue background noise was added, and the signals were filtered by a digital filter similar to the anti-aliasing low-pass filter. The tests showed that the fit algorithm yielded good estimates of the original rate constants from simulated records with up to four independent and identical channels, even at signal-to-noise ratios as low as 2. The number of channels in a record can be determined from the dependence of the likelihood on the assumed channel number: for large enough data sets, the likelihood reaches a maximum when the assumed channel number equals the "true" channel number.
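
The role of the Kronecker sum is that the rate (generator) matrix of independent channels on the product state space is Q ⊕ Q ⊕ ... ⊕ Q. The minimal pure-Python sketch below assumes identical channels and a small, dense generator. The matrix exponential step and the lumping of equivalent product states are not shown, and the function names are illustrative.

```python
def kron(A, B):
    """Kronecker product of two square matrices given as lists of lists."""
    m = len(B)
    size = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(size)]
            for i in range(size)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron_sum(A, B):
    """Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B: the generator of two independent
    continuous-time Markov chains with generators A and B."""
    left = kron(A, identity(len(B)))
    right = kron(identity(len(A)), B)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(left, right)]

def multi_channel_generator(Q, n_channels):
    """Generator of n independent, identical channels (product state space)."""
    G = Q
    for _ in range(n_channels - 1):
        G = kron_sum(G, Q)
    return G
```

For a two-state channel with closed-to-open rate 2 and open-to-closed rate 3, the first row of the two-channel generator is `[-4, 2, 2, 0]`: from "both closed", either channel can open. The state space doubles with each added channel.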

8.
Molecular motors, such as kinesin, myosin, or dynein, convert chemical energy into mechanical energy by hydrolyzing ATP. The mechanical energy is used to move in discrete steps along the cytoskeleton while carrying a molecular load. High-resolution single-molecule recordings of motor steps appear as a stochastic sequence of dwells, resembling a staircase. Staircase data can also be obtained from other molecular machines such as F1-ATPase, RNA polymerase, or topoisomerase. We developed a maximum likelihood algorithm that estimates the rate constants between different conformational states of the protein, including motor steps. We model the motor with a periodic Markov model that reflects the repetitive chemistry of the motor step. We estimated the kinetics from the idealized dwell sequence by numerical maximization of the likelihood function for discrete-time Markov models. This approach eliminates the need for missed-event correction. The algorithm can fit kinetic models of arbitrary complexity, such as uniform or alternating step chemistry, reversible or irreversible kinetics, and ATP-concentration- and mechanical-force-dependent rates. The method allows global fitting across stationary and nonstationary experimental conditions, as well as user-defined a priori constraints on rate constants. The algorithm was tested with simulated data and implemented in the free QuB software.
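
For an idealized (noise-free) state sequence, the discrete-time Markov likelihood is a product of transition probabilities. Its unconstrained maximum is given by normalized transition counts. The sketch below shows that baseline. The periodic model and rate-constant constraints described above require numerical maximization instead, which this example does not attempt.

```python
import math
from collections import Counter

def fit_transition_matrix(states, n_states):
    """Unconstrained maximum-likelihood transition probabilities of a
    discrete-time Markov chain from one idealized state sequence."""
    counts = Counter(zip(states, states[1:]))
    P = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        P.append([counts[(i, j)] / row_total if row_total else 0.0
                  for j in range(n_states)])
    return P

def log_likelihood(states, P):
    """Log-likelihood of the sequence given P, conditioning on the first state."""
    return sum(math.log(P[a][b]) for a, b in zip(states, states[1:]))
```

Any other transition matrix, for example one imposed by a periodic model, can be compared through `log_likelihood`. The count-based estimate is the upper bound that a constrained fit can at best reach.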

9.
The activity of transmembrane proteins such as ion channels is the essence of neuronal transmission. Currently, the most accurate method for determining ion channel kinetic mechanisms is single-channel recording and analysis. Yet the limitations and complexities of interpreting single-channel recordings discourage many physiologists from using them. Here we show that a genetic search algorithm combined with a gradient descent algorithm can be used to fit whole-cell voltage-clamp data to kinetic models with a high degree of accuracy. Previously, ion channel stimulation traces were analyzed one at a time, and the results of these analyses were combined to produce a picture of channel kinetics. Here, the entire set of traces from all stimulation protocols is analyzed simultaneously. The algorithm was initially tested on simulated current traces produced by several Hodgkin-Huxley-like and Markov chain models of voltage-gated potassium and sodium channels. Currents were also produced with simulated noise at the levels expected from actual patch recordings. Finally, the algorithm was used to find the kinetic parameters of several voltage-gated sodium and potassium channel models by matching its results to data recorded from layer 5 pyramidal neurons of the rat cortex in the nucleated outside-out patch configuration. The minimization scheme gives electrophysiologists a tool for reproducing and simulating voltage-gated ion channel kinetics at the cellular level.

10.
The alpha-helical coiled coil can adopt a variety of topologies, among the most common of which are parallel and antiparallel dimers and trimers. We present Multicoil2, an algorithm that predicts both the location and the oligomerization state (two versus three helices) of coiled coils in protein sequences. Multicoil2 combines the pairwise correlations of the previous Multicoil method with the flexibility of Hidden Markov Models (HMMs) in a Markov Random Field (MRF). The resulting algorithm integrates sequence features, including pairwise interactions, through multinomial logistic regression to devise an optimized scoring function for distinguishing dimer, trimer and non-coiled-coil oligomerization states; this scoring function is used to produce Markov Random Field potentials that incorporate pairwise correlations localized in sequence. Multicoil2 significantly improves both coiled-coil detection and dimer-versus-trimer state prediction over the original Multicoil algorithm retrained on a newly constructed database of coiled-coil sequences. The new database, comprising 2,105 sequences containing 124,088 residues, includes reliable structural annotations based on experimental data in the literature. Notably, the enhanced performance of Multicoil2 is evident in stringent leave-family-out cross-validation on the new database, which reflects the expected performance on challenging new prediction targets that have minimal sequence similarity to known coiled-coil families. The Multicoil2 program and training database are available for download from http://multicoil2.csail.mit.edu.

11.
Profile Hidden Markov Models (pHMMs) are widely used to model nucleotide or protein sequence families. In many applications, a sequence family classified into several subfamilies is given, and each subfamily is modeled separately by one pHMM. A major drawback of this approach is the difficulty of coping with subfamilies composed of very few sequences. Correct subtyping of human immunodeficiency virus-1 (HIV-1) sequences is one of the most crucial bioinformatic tasks affected by this problem of small subfamilies, i.e., HIV-1 subtypes with a small number of known sequences. To deal with small samples for particular subfamilies of HIV-1, we employ a machine learning approach. More precisely, we make use of an existing HMM architecture and its associated inference engine, while replacing the unsupervised estimation of emission probabilities by a supervised method. For that purpose, we use regularized linear discriminant learning together with a balancing scheme to account for the widely varying sample sizes. After training the multiclass linear discriminants, the corresponding weights are transformed to valid probabilities using a softmax function. We apply this modified algorithm to classify HIV-1 sequence data (in the form of partial-length HIV-1 sequences and semi-artificial recombinants) and show that the performance of pHMMs can be significantly improved by the proposed technique.
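
The last training step maps real-valued discriminant weights onto a valid emission distribution. A softmax over the alphabet does this. The minimal sketch below assumes a nucleotide alphabet and one weight per symbol for a single match state. The function names and the zero default for missing weights are illustrative assumptions.

```python
import math

def softmax(weights):
    """Map real-valued weights to a probability distribution."""
    top = max(weights)  # subtract the maximum for numerical stability
    exps = [math.exp(w - top) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def emission_row(weights_by_symbol, alphabet="ACGT"):
    """Turn one match state's per-symbol weights into emission probabilities;
    symbols without a weight are assumed to have weight 0."""
    probs = softmax([weights_by_symbol.get(a, 0.0) for a in alphabet])
    return dict(zip(alphabet, probs))
```

Because softmax preserves the ordering of the weights, the most discriminative symbol also receives the highest emission probability. The HMM inference engine can then use the row unchanged.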

12.
Fractal and Markov behavior in ion channel kinetics   Total citations: 1, self-citations: 0, citations by others: 1
Kinetic analysis of ion channel recordings attempts to distinguish the number and lifetimes of channel molecular states. Most kinetic analysis assumes that the lifetime of each state is independent of previous channel history, so that open and closed durations are Markov processes whose probability densities are sums of exponential decays. An alternative approach assumes that channel molecules have many configurations with widely varying lifetimes. Rates of opening and closing then vary with the time scale of observation, leading to fractal kinetics. We examined kinetic behavior in two types of channels from human and avian fibroblasts, using a maximum likelihood method to test the dependence of rates on the observational time scale. For both channels, openings showed mixed fractal and Markov behavior, while closings showed mainly fractal kinetics.
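
Under the Markov assumption, the dwell-time density is a weighted sum of exponentials, so a likelihood can be computed directly from measured durations. The sketch below evaluates that log-likelihood. It does not implement the authors' time-scale test or a fractal model, and the argument names are illustrative.

```python
import math

def mixture_exp_loglik(dwells, weights, rates):
    """Log-likelihood of dwell times under a Markov (sum-of-exponentials)
    density f(t) = sum_i w_i * k_i * exp(-k_i * t), with weights summing to 1."""
    total = 0.0
    for t in dwells:
        f = sum(w * k * math.exp(-k * t) for w, k in zip(weights, rates))
        total += math.log(f)
    return total
```

For a single exponential, the likelihood peaks at rate = 1 / (mean dwell time). With dwells 0.5, 1.0, 1.5, and 2.0, a rate of 0.8 fits better than 0.6 or 1.0. In a fractal description, by contrast, the effective rate itself depends on the time scale at which it is measured.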

13.
Hidden Markov models have recently been used to model single ion channel currents recorded from cell membranes with the patch-clamp technique. Hidden Markov model parameters can be estimated with the forward-backward and Baum-Welch algorithms at signal-to-noise ratios that are too low for conventional single-channel kinetic analysis; however, these algorithms rely on the assumptions that the background noise is white and that the underlying state transitions occur at discrete times. To address these issues, we present an "H-noise" algorithm that accounts for correlated background noise and for the randomness of sampling relative to transitions. We also discuss three issues that arise in the practical application of the algorithm to single-channel data. First, we describe a digital inverse filter that removes the effects of the analog anti-aliasing filter and yields a sharp frequency roll-off. This enhances performance while reducing the computational intensity of the algorithm. Second, the data may be contaminated with baseline drift or deterministic interference such as 60-Hz pickup; we propose an extension of previous results to account for baseline drift. Finally, we describe the extension of the algorithm to multiple data sets.
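
The forward-backward and Baum-Welch algorithms build on the forward recursion. The recursion computes the likelihood of a record in time linear in its length rather than summing over every hidden path. The sketch below shows the scaled forward recursion under the white-noise assumption that the H-noise algorithm relaxes, with a brute-force check for very short records. The Gaussian emission model and function names are assumptions for illustration.

```python
import math
from itertools import product

def gaussian_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def forward_loglik(obs, init, trans, levels, sd):
    """log P(obs) for an HMM with white Gaussian noise, via the scaled
    forward recursion."""
    n = len(init)
    alpha = [init[k] * gaussian_pdf(obs[0], levels[k], sd) for k in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [gaussian_pdf(obs[t], levels[k], sd) *
                     sum(alpha[j] * trans[j][k] for j in range(n))
                     for k in range(n)]
        scale = sum(alpha)
        loglik += math.log(scale)           # product of scales equals P(obs)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

def brute_force_loglik(obs, init, trans, levels, sd):
    """Sum over every hidden path; only feasible for very short records."""
    total = 0.0
    for path in product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * gaussian_pdf(obs[0], levels[path[0]], sd)
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * gaussian_pdf(obs[t], levels[path[t]], sd)
        total += p
    return math.log(total)
```

The two functions agree on short records, but only the forward recursion scales to real recordings with many thousands of samples.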

14.
Techniques for characterizing very small single-channel currents buried in background noise are described and tested on simulated data, to give confidence when they are applied to real data. Single-channel currents are represented as a discrete-time, finite-state, homogeneous Markov process, and the noise that obscures the signal is assumed to be white and Gaussian. The signal model parameters, such as the Markov state levels and transition probabilities, are unknown. In addition to white Gaussian noise, the signal can be corrupted by deterministic interferences of known form but unknown parameters, such as the sinusoidal disturbance stemming from AC interference and a drift of the baseline owing to the slow development of liquid-junction potentials. To characterize the signal buried in such stochastic and deterministic interferences, the problem is first formulated in the framework of a Hidden Markov Model, and the Expectation Maximization algorithm is then applied to obtain maximum likelihood estimates of the model parameters (state levels, transition probabilities), the signals, and the parameters of the deterministic disturbances. Using fictitious channel currents embedded in idealized noise, we first show that the signal processing technique can estimate the signal characteristics quite accurately even when the current amplitude is as small as 5-10 fA. The signal statistics estimated by the technique include the amplitude, mean open and closed durations, open-time and closed-time histograms, dwell-time probabilities, and the transition probability matrix.
With a periodic interference composed, for example, of 50 Hz and 100 Hz components, or a linear drift of the baseline added to the segment containing channel currents and white noise, the parameters of the deterministic interference, such as the amplitude and phase of the sinusoidal wave, or the rate of linear drift, as well as all the relevant statistics of the signal, are accurately estimated with the algorithm we propose. Also, if the frequencies of the periodic interference are unknown, they can be accurately estimated. Finally, we provide a technique by which channel currents originating from the sum of two or more independent single channels are decomposed so that each process can be separately characterized. This process is also formulated as a Hidden Markov Model problem and solved by applying the Expectation Maximization algorithm. The scheme relies on the fact that the transition matrix of the summed Markov process can be construed as a tensor product of the transition matrices of individual processes.
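
The tensor-product construction can be shown directly. For two independent discrete-time channels, the joint transition matrix is the Kronecker product of the individual matrices, and the observed current of each joint state is the sum of the individual levels. The sketch below assumes square matrices given as lists. The EM decomposition itself is not shown, and the function names are illustrative.

```python
def kron(A, B):
    """Kronecker (tensor) product of two square matrices (lists of lists)."""
    m = len(B)
    size = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(size)]
            for i in range(size)]

def combined_channels(P1, levels1, P2, levels2):
    """Transition matrix and current levels of the summed signal of two
    independent discrete-time Markov channels. Joint state index
    i * len(P2) + j means channel 1 is in state i and channel 2 in state j."""
    P = kron(P1, P2)
    levels = [a + b for a in levels1 for b in levels2]
    return P, levels
```

The joint process is again a Markov chain, so the single-channel HMM machinery applies unchanged. The only difference is that several joint states may share a summed level when channels are identical.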

15.
Hidden Markov modeling (HMM) can be applied to extract single channel kinetics at signal-to-noise ratios that are too low for conventional analysis. There are two general HMM approaches: traditional Baum's reestimation and direct optimization. The optimization approach has the advantage that it optimizes the rate constants directly. This allows setting constraints on the rate constants, fitting multiple data sets across different experimental conditions, and handling nonstationary channels where the starting probability of the channel depends on the unknown kinetics. We present here an extension of this approach that addresses the additional issues of low-pass filtering and correlated noise. The filtering is modeled using a finite impulse response (FIR) filter applied to the underlying signal, and the noise correlation is accounted for using an autoregressive (AR) process. In addition to correlated background noise, the algorithm allows for excess open channel noise that can be white or correlated. To maximize the efficiency of the algorithm, we derive the analytical derivatives of the likelihood function with respect to all unknown model parameters. The search of the likelihood space is performed using a variable metric method. Extension of the algorithm to data containing multiple channels is described. Examples are presented that demonstrate the applicability and effectiveness of the algorithm. Practical issues such as the selection of appropriate noise AR orders are also discussed through examples.
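
To illustrate what an AR noise model captures, the sketch below simulates AR(1) background noise and estimates its coefficient from the lag-1 autocorrelation (a Yule-Walker estimate). It then whitens the noise with that estimate. In the approach above, the AR parameters are fitted jointly with the kinetics by likelihood maximization. This standalone first-order estimate is only an assumed simplification, and the function names are illustrative.

```python
import random

def simulate_ar1(phi, sd, n, rng):
    """Correlated background noise from an AR(1) process x_t = phi*x_{t-1} + e_t."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sd)
        out.append(x)
    return out

def yule_walker_ar1(x):
    """Estimate the AR(1) coefficient as the lag-1 sample autocorrelation."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    c1 = sum(a * b for a, b in zip(d, d[1:]))
    return c1 / c0

def whiten(x, phi):
    """Residuals e_t = x_t - phi*x_{t-1}; white if the AR(1) model is adequate."""
    return [b - phi * a for a, b in zip(x, x[1:])]
```

Checking whether the whitened residuals still show autocorrelation is one informal way to judge whether a higher AR order is needed.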

16.
Zheng X, Liu T, Wang J. Amino Acids, 2009, 37(2): 427-433
A complexity-based approach is proposed to predict subcellular location of proteins. Instead of extracting features from protein sequences as done previously, our approach is based on a complexity decomposition of symbol sequences. In the first step, distance between each pair of protein sequences is evaluated by the conditional complexity of one sequence given the other. Subcellular location of a protein is then determined using the k-nearest neighbor algorithm. Using three widely used data sets created by Reinhardt and Hubbard, Park and Kanehisa, and Gardy et al., our approach shows an improvement in prediction accuracy over those based on the amino acid composition and Markov model of protein sequences.
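
The two steps can be sketched with a common stand-in for sequence complexity, namely compressed length. Conditional complexity is then approximated as C(x | y) ≈ C(yx) - C(y), and a k-nearest-neighbor vote follows. Using zlib is an assumption, not the paper's complexity decomposition. The location labels in the usage note are hypothetical.

```python
import zlib
from collections import Counter

def complexity(s):
    """Approximate complexity by compressed length in bytes (a stand-in)."""
    return len(zlib.compress(s.encode("ascii"), 9))

def conditional_complexity(x, y):
    """Approximate C(x | y) as C(y + x) - C(y)."""
    return complexity(y + x) - complexity(y)

def knn_location(query, training, k=1):
    """Predict a label by majority vote among the k training (sequence, label)
    pairs with the smallest conditional complexity of the query given them."""
    ranked = sorted(training, key=lambda item: conditional_complexity(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with training pairs `("ACDEFGHIK" * 20, "nucleus")` and `("LMNPQRSTVW" * 20, "membrane")`, a query that repeats the first motif is cheap to describe given the first sequence. It is therefore assigned `"nucleus"`.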

17.
A method to test the Markov nature of ion channel gating is proposed. It makes use of singly and doubly conditional distributions. The application of this method to recordings from single BK channels provides evidence that at least two states of the underlying kinetic scheme are left at a constant rate. Moreover, the probabilities, when leaving a state, of reaching another given state are shown to be constant for all the states of the system.
Offprint requests to: D. Petracchi
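
The idea behind comparing singly and doubly conditional distributions can be illustrated with a discrete-state analogue. In a first-order Markov chain, P(next | current, previous) must not depend on the previous state. The paper works with conditional dwell-time distributions of gating records. This state-sequence version, and the function names, are a simplified assumption.

```python
import random

def simulate_chain(P, n, rng, start=0):
    """Simulate a first-order discrete-time Markov chain with transition matrix P."""
    s, seq = start, [start]
    for _ in range(n - 1):
        s = rng.choices(range(len(P)), weights=P[s])[0]
        seq.append(s)
    return seq

def single_conditional(seq, cur, nxt):
    """Estimated P(next = nxt | current = cur)."""
    from_cur = [b for a, b in zip(seq, seq[1:]) if a == cur]
    return sum(1 for b in from_cur if b == nxt) / len(from_cur)

def double_conditional(seq, prev, cur, nxt):
    """Estimated P(next = nxt | current = cur, previous = prev)."""
    matching = [c for a, b, c in zip(seq, seq[1:], seq[2:]) if a == prev and b == cur]
    return sum(1 for c in matching if c == nxt) / len(matching)
```

For data generated by a Markov chain, the doubly conditional estimates match the singly conditional one for every previous state, up to sampling error. A systematic difference would indicate memory beyond the current state.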

18.
Hidden Markov models have been applied successfully in various fields of time series analysis, especially to the analysis of ion channel recordings. The maximum likelihood estimator (MLE) has recently been proven to be asymptotically normally distributed. Here, we investigate the finite-sample properties of the MLE and of different types of likelihood ratio tests (LRTs) by means of simulation studies. The MLE is shown to reach its asymptotic behavior at sample sizes that are common in various applications, so reliable estimates and confidence intervals can be obtained. We give an approximate scaling function for the estimation error in finite samples and investigate the power of different LRTs suitable for applications to ion channels, including tests for superimposed hidden Markov processes. Our results are applied to physiological sodium channel data.
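
As a minimal illustration of a likelihood ratio test, the sketch below uses a fully observed chain rather than the hidden Markov setting studied above. It tests i.i.d. states against first-order Markov dependence, where the statistic is asymptotically chi-square with (k-1)^2 degrees of freedom. For the two-state case shown, that is 1 degree of freedom. LRTs on hidden processes, such as tests for superimposed channels, can have non-standard null distributions, which is one reason finite-sample simulation studies are needed.

```python
import math
from collections import Counter

CHI2_1DF_95 = 3.841  # 95% quantile of the chi-square distribution with 1 d.f.

def lrt_markov_vs_independent(seq):
    """Likelihood ratio statistic 2*(loglik_H1 - loglik_H0) for
    H0: i.i.d. states versus H1: first-order Markov chain."""
    pairs = Counter(zip(seq, seq[1:]))
    from_state = Counter(seq[:-1])
    to_state = Counter(seq[1:])
    n = len(seq) - 1
    stat = 0.0
    for (a, b), n_ab in pairs.items():
        p_markov = n_ab / from_state[a]  # MLE of P(b | a) under H1
        p_iid = to_state[b] / n          # MLE of P(b) under H0
        stat += 2 * n_ab * math.log(p_markov / p_iid)
    return stat
```

A strictly alternating two-state record gives a statistic far above 3.841, so independence is rejected. A constant record gives exactly 0.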

19.
A Markov analysis of DNA sequences   Total citations: 12, self-citations: 0, citations by others: 12
We present a model in which we treat the DNA sequence as a Markov process. It has been suggested by several workers that some basic biological or chemical features of nucleic acids underlie the frequencies of dinucleotides (doublets) in these chains. Comparing patterns of doublet frequencies in the DNA of different organisms has been shown to be a fruitful approach to some phylogenetic questions (Russel & Subak-Sharpe, 1977). Grantham (1978) formulated mRNA sequence indices, some of which involve certain doublet frequencies. He suggested that using these indices may provide indications of the molecular constraints existing during gene evolution. Nussinov (1981) has shown that a set of dinucleotide preference rules holds consistently for eukaryotes, and suggested a strong correlation between these rules and degenerate codon usage. Gruenbaum, Cedar & Razin (1982) found that methylation in eukaryotic DNA occurs exclusively at C-G sites. Important biological information thus seems to be contained in the doublet frequencies. One of the basic questions to be asked (the "correlation question") is to what extent the 64 trinucleotide (triplet) frequencies measured in a sequence are determined by the 16 doublet frequencies in the same sequence. The DNA is described here as a Markov process, with the nucleotides being outcomes of a sequence generator. Answering the correlation question means finding the order of the Markov process. The difficulty is that natural sequences are of finite length, and statistical noise is quite strong. We show that even for a 16,000-nucleotide-long sequence (like that of the human mitochondrial genome), the finite-length effect cannot be neglected. Using the Markov chain model, the correlation between doublet and triplet frequencies can, however, be determined even for finite sequences, taking proper account of the finite length.
Two natural DNA sequences, the human mitochondrial genome and the SV40 DNA, are analysed as examples of the method.
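
The correlation question can be made concrete as follows. If the sequence is a first-order Markov chain, triplet counts are predicted from doublet counts as E[N(xyz)] = N(xy) N(yz) / N(y). The sketch below computes observed and predicted triplet counts. Counting N(y) only at positions that can be the middle of a triplet is a simple convention assumed here. It is not the paper's finite-length treatment.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Overlapping k-mer counts."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def expected_triplets(seq):
    """Triplet counts predicted from doublets under a first-order Markov model:
    E[N(xyz)] = N(xy) * N(yz) / N(y), with N(y) counted at positions that can
    be the middle of a triplet (a simple convention, not a finite-length correction)."""
    doublets = kmer_counts(seq, 2)
    middles = Counter(seq[1:-1])
    expected = {}
    for xy, n_xy in doublets.items():
        for yz, n_yz in doublets.items():
            if xy[1] == yz[0] and middles[xy[1]]:
                expected[xy + yz[1]] = n_xy * n_yz / middles[xy[1]]
    return expected
```

Comparing `expected_triplets(seq)` with `kmer_counts(seq, 3)` over a long sequence shows how far the triplets go beyond what the doublets explain. On short sequences, edge effects of this kind dominate the comparison.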

20.
This paper proposes a graphical method for detecting interspecies recombination in multiple alignments of DNA sequences. A fixed-size window is moved along a given DNA sequence alignment. For every position, the marginal posterior probability over tree topologies is determined by means of a Markov chain Monte Carlo simulation. Two probabilistic divergence measures are plotted along the alignment, and are used to identify recombinant regions. The method is compared with established detection methods on a set of synthetic benchmark sequences and two real-world DNA sequence alignments.
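
The window-and-divergence idea can be sketched as follows. Windows slide along the alignment, each window has a posterior distribution over topologies (in the paper, from MCMC, which is not implemented here), and a divergence between consecutive windows is plotted. The abstract does not name the two measures. Kullback-Leibler divergence is used below as an assumed example, and the topology strings in the usage note are hypothetical.

```python
import math

def windows(length, size, step):
    """Start/end positions of fixed-size windows moved along an alignment."""
    return [(s, s + size) for s in range(0, length - size + 1, step)]

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two topology posteriors (dicts);
    eps guards against topologies missing from q."""
    keys = set(p) | set(q)
    return sum(p.get(t, 0.0) * math.log((p.get(t, 0.0) + eps) / (q.get(t, 0.0) + eps))
               for t in keys if p.get(t, 0.0) > 0)

def divergence_profile(posteriors):
    """Divergence between topology posteriors of consecutive windows; peaks
    suggest a change of topology, i.e. a candidate recombination breakpoint."""
    return [kl_divergence(a, b) for a, b in zip(posteriors, posteriors[1:])]
```

If the posterior mass moves from `"((A,B),C)"` to `"((A,C),B)"` between two windows, the profile peaks at that window boundary and stays near zero elsewhere.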


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417