Similar literature
1.
The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains given observations to a dynamic programming problem for corresponding directed acyclic graphs. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevertheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute probabilities for predicted exons and gene structures being correct under the assumed model. Finally, we describe a new memory-efficient sampling algorithm for certain classes of HMMs which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context-free grammars and RNA structure prediction.
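The path sampling mentioned in this abstract can be illustrated with a minimal sketch of forward filtering followed by backward sampling for a small discrete HMM. This is a generic textbook version, not the memory-efficient algorithm the abstract announces; the function name `sample_state_path` and its arguments are hypothetical, and the observation sequence is assumed to have non-zero probability under the model.

```python
import random

def sample_state_path(init, trans, emit, obs, rng=random):
    """Draw one hidden-state path from P(path | obs) for a discrete HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    (All names are illustrative assumptions, not taken from the paper.)
    """
    n_states = len(init)

    # Forward pass: fwd[t][i] is proportional to P(obs[0..t], state_t = i).
    fwd = []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[i] * emit[i][o] for i in range(n_states)]
        else:
            prev = fwd[-1]
            row = [emit[j][o] * sum(prev[i] * trans[i][j] for i in range(n_states))
                   for j in range(n_states)]
        total = sum(row)
        fwd.append([x / total for x in row])  # rescale each step to avoid underflow

    # Backward sampling: draw the last state from the filtered distribution,
    # then draw each earlier state conditioned on the state to its right.
    path = [0] * len(obs)
    path[-1] = rng.choices(range(n_states), weights=fwd[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[t + 1]
        weights = [fwd[t][i] * trans[i][nxt] for i in range(n_states)]
        path[t] = rng.choices(range(n_states), weights=weights)[0]
    return path
```

Calling the function repeatedly and counting how often a given state (for instance, an exon state) appears at a position gives a Monte Carlo estimate of its posterior probability under the assumed model.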

2.
A central problem in ecology is relating the interactions of individuals (described in terms of competition, predation, interference, etc.) to the dynamics of the populations of these individuals (in terms of change in numbers of individuals over time). Here, we address this problem for a class of site-based ecological models, where local interactions between individuals take place at a finite number of discrete resource sites over non-overlapping generations and, between generations, individuals move randomly between sites over the entire system. Such site-based models have previously been applied to a wide range of ecological systems: from those involving contest or scramble competition for resources to host-parasite interactions and meta-populations. We show how the population dynamics of site-based models can be accurately approximated by and understood through deterministic and stochastic difference equations. Conversely, we use the inverse of this approximation to show what implicit assumptions are made about individual interactions by modelling of population dynamics in terms of difference equations. To this end, we prove a useful and general theorem: that any model in our class of site-based models has a corresponding stochastic difference equation population model, by which it can be approximated. This theorem allows us to calculate long-term population dynamics, evolutionarily stable strategies and, by extending our theory to account for large deviations, extinction probabilities for a wide range of site-based systems. Our methodology is then illustrated with various examples of between-species competition, predator-prey interactions and co-operation.

3.
Any mechanism of language acquisition can only learn a restricted set of grammars. The human brain contains a mechanism for language acquisition which can learn a restricted set of grammars. The theory of this restricted set is universal grammar (UG). UG has to be sufficiently specific to induce linguistic coherence in a population. This phenomenon is known as the "coherence threshold". Previously, we have calculated the coherence threshold for deterministic dynamics and infinitely large populations. Here, we extend the framework to stochastic processes and finite populations. If there is selection for communicative function (selective language dynamics), then the analytic results for infinite populations are excellent approximations for finite populations; as expected, finite populations need a slightly higher accuracy of language acquisition to maintain coherence. If there is no selection for communicative function (neutral language dynamics), then linguistic coherence is only possible for finite populations.

4.
Summary. We present a Bayesian approach to modeling dynamic smoking addiction behavior processes when cure is not directly observed due to censoring. Subject-specific probabilities model the stochastic transitions among three behavioral states: smoking, transient quitting, and permanent quitting (absorbing state). A multivariate normal distribution for random effects is used to account for the potential correlation among the subject-specific transition probabilities. Inference is conducted using a Bayesian framework via Markov chain Monte Carlo simulation. This framework provides various measures of subject-specific predictions, which are useful for policy-making, intervention development, and evaluation. Simulations are used to validate our Bayesian methodology and assess its frequentist properties. Our methods are motivated by, and applied to, the Alpha-Tocopherol, Beta-Carotene Lung Cancer Prevention study, a large (29,133 individuals) longitudinal cohort study of smokers from Finland.

5.
Coalescent theory is routinely used to estimate past population dynamics and demographic parameters from genealogies. While early work in coalescent theory only considered simple demographic models, advances in theory have allowed for increasingly complex demographic scenarios to be considered. The success of this approach has led to coalescent-based inference methods being applied to populations with rapidly changing population dynamics, including pathogens like RNA viruses. However, fitting epidemiological models to genealogies via coalescent models remains a challenging task, because pathogen populations often exhibit complex, nonlinear dynamics and are structured by multiple factors. Moreover, it often becomes necessary to consider stochastic variation in population dynamics when fitting such complex models to real data. Using recently developed structured coalescent models that accommodate complex population dynamics and population structure, we develop a statistical framework for fitting stochastic epidemiological models to genealogies. By combining particle filtering methods with Bayesian Markov chain Monte Carlo methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. We demonstrate our framework using two structured epidemiological models: a model with disease progression between multiple stages of infection and a two-population model reflecting spatial structure. We apply the multi-stage model to HIV genealogies and show that the proposed method can be used to estimate the stage-specific transmission rates and prevalence of HIV. Finally, using the two-population model we explore how much information about population structure is contained in genealogies and what sample sizes are necessary to reliably infer parameters like migration rates.

6.
Markov chain models are frequently used for studying event histories that include transitions between several states. An empirical transition matrix for nonhomogeneous Markov chains has previously been developed, including a detailed statistical theory based on counting processes and martingales. In this article, we show how to estimate transition probabilities dependent on covariates. This technique may, e.g., be used for making estimates of individual prognosis in epidemiological or clinical studies. The covariates are included through nonparametric additive models on the transition intensities of the Markov chain. The additive model allows for estimation of covariate-dependent transition intensities, and again a detailed theory exists based on counting processes. The martingale setting now allows for a very natural combination of the empirical transition matrix and the additive model, resulting in estimates that can be expressed as stochastic integrals, and hence their properties are easily evaluated. Two medical examples will be given. In the first example, we study how the lung cancer mortality of uranium miners depends on smoking and radon exposure. In the second example, we study how the probability of being in response depends on patient group and prophylactic treatment for leukemia patients who have had a bone marrow transplantation. A program in R and S-PLUS that can carry out the analyses described here has been developed and is freely available on the Internet.

7.
This paper considers the utility of a new class of stochastic branching processes with non-homogeneous immigration in modeling complex renewing cell systems. Such systems typically include the population of stem cells that provides an inexhaustible supply of cells necessary for maintaining the cellular composition of a tissue. A stem cell may be induced to transform (differentiate) into a progenitor cell. Progenitor cells retain the ability to proliferate and their function is believed to provide a quick proliferative response to an increased demand for cells in the population. There may be several sub-types of progenitor cells. Terminally differentiated cells do not divide under normal conditions; they are responsible for maintaining tissue-specific functions. Recent advancements in experimental techniques offer considerable scope for quantitative studies of in vivo cell kinetics based on stochastic modeling of renewing cell populations. However, no ready-made theory is currently available to take full advantage of these advancements. This paper introduces such a theory with a special focus on its feasibility in biological applications.

8.
This paper is a sequel to a paper by the author entitled “Restricted Transition Probabilities and Their Applications to Some Problems in the Dynamics of Biological Populations” (Bull. Math. Biophysics, 1966, 28, 315–331). The paper is divided into two parts. In part one some aspects of the maximum size attained by the population during a finite time interval are studied for the case in which the stochastic process underlying the evolution of the population is a birth process. Two interesting by-products emerge from the study presented in part one: first, a combinatorial method of finding solutions to the Kolmogorov differential equations in special cases, and second, a set of criteria for the optimum allocation of genotypes in the host population of a host-pathogen system. The optimum allocation of genotypes in the host population is a problem of practical importance in controlling plant pathogens. In part two the theory of restricted transition probabilities developed in the companion paper is applied in finding the distribution of the time to the appearance of the first mutation for the case of a two-dimensional birth process. The distribution of the time to the appearance of the first mutation is of importance in understanding the role mutation plays in the evolution of a population, particularly in the pathogen population of a host-pathogen system. The research reported in this paper was supported by the United States Atomic Energy Commission, Division of Biology and Medicine Project AT(45-1)-1729.

9.
This paper addresses the problem of modelling heterogeneous individual characteristics in a population. A flexible unified approach for stochastic parametrization dynamics of the distribution in population data is proposed. To approximate data with multiple observations per individual, models based on Markov processes are constructed. The method can be applied to scalar or multivariate characteristics, and its application to growth and allometry data is considered. Different stochastic versions of known growth and allometry functions are developed, which enable wide applicability. Simple informative growth indices are calculated as the moments of distribution. The three-parameter Gompertz growth model for size-at-age data was reparametrized to a size-increment data model with two parameters. An erratum to this article is available at .

10.
Techniques for characterizing very small single-channel currents buried in background noise are described and tested on simulated data to give confidence when applied to real data. Single channel currents are represented as a discrete-time, finite-state, homogeneous, Markov process, and the noise that obscures the signal is assumed to be white and Gaussian. The various signal model parameters, such as the Markov state levels and transition probabilities, are unknown. In addition to white Gaussian noise, the signal can be corrupted by deterministic interferences of known form but unknown parameters, such as the sinusoidal disturbance stemming from AC interference and a drift of the base line owing to a slow development of liquid-junction potentials. To characterize the signal buried in such stochastic and deterministic interferences, the problem is first formulated in the framework of a Hidden Markov Model and then the Expectation Maximization algorithm is applied to obtain the maximum likelihood estimates of the model parameters (state levels, transition probabilities), signals, and the parameters of the deterministic disturbances. Using fictitious channel currents embedded in the idealized noise, we first show that the signal processing technique is capable of characterizing the signal characteristics quite accurately even when the amplitude of currents is as small as 5-10 fA. The statistics of the signal estimated from the processing technique include the amplitude, mean open and closed duration, open-time and closed-time histograms, probability of dwell-time and the transition probability matrix. 
With a periodic interference composed, for example, of 50 Hz and 100 Hz components, or a linear drift of the baseline added to the segment containing channel currents and white noise, the parameters of the deterministic interference, such as the amplitude and phase of the sinusoidal wave, or the rate of linear drift, as well as all the relevant statistics of the signal, are accurately estimated with the algorithm we propose. Also, if the frequencies of the periodic interference are unknown, they can be accurately estimated. Finally, we provide a technique by which channel currents originating from the sum of two or more independent single channels are decomposed so that each process can be separately characterized. This process is also formulated as a Hidden Markov Model problem and solved by applying the Expectation Maximization algorithm. The scheme relies on the fact that the transition matrix of the summed Markov process can be construed as a tensor product of the transition matrices of individual processes.
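The closing remark about tensor products can be made concrete with a small sketch: for two independent channels, the joint chain over pairs of states has the Kronecker product of the two transition matrices as its transition matrix. The helper `kron` and the example matrices `A` and `B` are illustrative assumptions, not values from the paper.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows.

    Joint state (i, k) is stored at index i * len(b) + k, so the entry for
    (i, k) -> (j, l) equals a[i][j] * b[k][l].
    """
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

# Hypothetical two-state (closed/open) channels with different kinetics.
A = [[0.9, 0.1],
     [0.2, 0.8]]
B = [[0.7, 0.3],
     [0.4, 0.6]]

joint = kron(A, B)  # 4 x 4 transition matrix over (state of A, state of B)
```

Because the channels are assumed independent, each row of `joint` still sums to one, and the observed current of a joint state is the sum of the two individual state levels.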

11.
Complex biological dynamics often generate sequences of discrete events which can be described as a Markov process. The order of the underlying Markovian stochastic process is fundamental for characterizing statistical dependencies within sequences. As an example for this class of biological systems, we investigate the Markov order of sequences of microsaccadic eye movements from human observers. We calculate the integrated likelihood of a given sequence for various orders of the Markov process and use this in a Bayesian framework for statistical inference on the Markov order. Our analysis shows that data from most participants are best explained by a first-order Markov process. This is compatible with recent findings of a statistical coupling of subsequent microsaccade orientations. Our method might prove to be useful for a broad class of biological systems.
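One common way to compute an integrated likelihood for a given Markov order is to place a symmetric Dirichlet prior on every row of the transition table and integrate the transition probabilities out in closed form. The sketch below follows that route; the prior, the conditioning on the first `max_order` symbols, and the function name are assumptions made for illustration and may differ from the authors' exact formulation.

```python
from collections import Counter
from math import lgamma

def log_integrated_likelihood(seq, order, alphabet, max_order, alpha=1.0):
    """Log marginal likelihood of seq[max_order:] under a Markov chain of the
    given order, with a symmetric Dirichlet(alpha) prior on each row of the
    transition table. The first max_order symbols are conditioned on, so all
    orders up to max_order score the same set of transitions."""
    counts = Counter()
    for t in range(max_order, len(seq)):
        context = tuple(seq[t - order:t])  # empty tuple when order == 0
        counts[(context, seq[t])] += 1

    k = len(alphabet)
    contexts = {c for c, _ in counts}
    total = 0.0
    for c in contexts:
        row = [counts[(c, s)] for s in alphabet]
        # Dirichlet-multinomial normalising terms for this context.
        total += lgamma(k * alpha) - lgamma(k * alpha + sum(row))
        total += sum(lgamma(alpha + n) - lgamma(alpha) for n in row)
    return total
```

With a uniform prior over candidate orders, the posterior probability of each order is proportional to the exponential of the value returned here, so comparing the returned values across orders is enough to rank them.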

12.
Markov processes were introduced into the biomedical sciences some time ago in order to study disease history events. Homogeneous and non-homogeneous Markov processes are an important field of research into stochastic processes, especially when exact transition times are unknown and interval-censored observations are present in the analysis. Non-homogeneous Markov processes should be used when the homogeneous assumption is too strong. However, these sorts of models increase the complexity of the analysis and standard software is limited. In this paper, some methods for fitting non-homogeneous Markov models are reviewed and an algorithm is proposed for biomedical data analysis. The method has been applied to analyse breast cancer data. Specific software for this purpose has been implemented.

13.
The hidden Markov model (HMM) is a framework for time series analysis widely applied to single-molecule experiments. Although initially developed for applications outside the natural sciences, the HMM has traditionally been used to interpret signals generated by physical systems, such as single molecules, evolving in a discrete state space observed at discrete time levels dictated by the data acquisition rate. Within the HMM framework, transitions between states are modeled as occurring at the end of each data acquisition period and are described using transition probabilities. Yet, whereas measurements are often performed at discrete time levels in the natural sciences, physical systems evolve in continuous time according to transition rates. It then follows that the modeling assumptions underlying the HMM are justified if the transition rates of a physical process from state to state are small as compared to the data acquisition rate. In other words, HMMs apply to slow kinetics. The problem is, because the transition rates are unknown in principle, it is unclear, a priori, whether the HMM applies to a particular system. For this reason, we must generalize HMMs for physical systems, such as single molecules, because these switch between discrete states in “continuous time”. We do so by exploiting recent mathematical tools developed in the context of inferring Markov jump processes and propose the hidden Markov jump process. We explicitly show in what limit the hidden Markov jump process reduces to the HMM. Resolving the discrete time discrepancy of the HMM has clear implications: we no longer need to assume that processes, such as molecular events, must occur on timescales slower than data acquisition and can learn transition rates even if these are on the same timescale or otherwise exceed data acquisition rates.
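The link between continuous-time rates and discrete-time transition probabilities can be seen in the simplest case, a two-state jump process, where the transition matrix over one acquisition period has a closed form. The sketch below uses that textbook relation only; it is not the hidden Markov jump process inference scheme of the paper, and the names `k12`, `k21` and `dt` are illustrative.

```python
from math import exp

def two_state_transition_matrix(k12, k21, dt):
    """Transition probabilities over a time step dt for a two-state Markov
    jump process with rates k12 (state 1 -> 2) and k21 (state 2 -> 1).
    Assumes k12 + k21 > 0."""
    total = k12 + k21
    relaxed = 1.0 - exp(-total * dt)  # fraction of relaxation completed in dt
    p12 = k12 / total * relaxed
    p21 = k21 / total * relaxed
    return [[1.0 - p12, p12],
            [p21, 1.0 - p21]]
```

When `(k12 + k21) * dt` is small, `p12` is close to `k12 * dt`, which is the slow-kinetics regime in which a per-frame transition probability is a reasonable stand-in for a rate. When the rates are comparable to or faster than the acquisition rate, the rows approach the stationary distribution and that simple correspondence no longer holds.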

14.
A stochastic Markov chain model for metastatic progression is developed for primary lung cancer based on a network construction of metastatic sites with dynamics modeled as an ensemble of random walkers on the network. We calculate a transition matrix, with entries (transition probabilities) interpreted as random variables, and use it to construct a circular bi-directional network of primary and metastatic locations based on postmortem tissue analysis of 3827 autopsies on untreated patients documenting all primary tumor locations and metastatic sites from this population. The resulting 50 potential metastatic sites are connected by directed edges with distributed weightings, where the site connections and weightings are obtained by calculating the entries of an ensemble of transition matrices so that the steady-state distribution obtained from the long-time limit of the Markov chain dynamical system corresponds to the ensemble metastatic distribution obtained from the autopsy data set. We condition our search for a transition matrix on an initial distribution of metastatic tumors obtained from the data set. Through an iterative numerical search procedure, we adjust the entries of a sequence of approximations until a transition matrix with the correct steady-state is found (up to a numerical threshold). Since this constrained linear optimization problem is underdetermined, we characterize the statistical variance of the ensemble of transition matrices calculated using the means and variances of their singular value distributions as a diagnostic tool. We interpret the ensemble averaged transition probabilities as (approximately) normally distributed random variables. The model allows us to simulate and quantify disease progression pathways and timescales of progression from the lung position to other sites and we highlight several key findings based on the model.
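The steady-state distribution referred to in this abstract is the long-time limit of repeatedly applying the transition matrix to a distribution over sites. A minimal sketch of that forward computation, using plain power iteration on a row-stochastic matrix, is given below; it does not reproduce the authors' inverse search for a matrix with a prescribed steady state, and the function name and tolerances are assumptions.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100000):
    """Approximate the steady-state distribution of a row-stochastic matrix P
    (list of rows) by iterating pi <- pi P from a uniform starting vector.
    Convergence is assumed, which holds for irreducible aperiodic chains."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi  # best estimate if the tolerance was not reached
```

For the two-state example `[[0.9, 0.1], [0.5, 0.5]]` the balance condition gives a steady state of 5/6 and 1/6, which the iteration approaches geometrically.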

15.
Hidden Markov models (HMMs) are a class of stochastic models that have proven to be powerful tools for the analysis of molecular sequence data. A hidden Markov model can be viewed as a black box that generates sequences of observations. The unobservable internal state of the box is stochastic and is determined by a finite state Markov chain. The observable output is stochastic with distribution determined by the state of the hidden Markov chain. We present a Bayesian solution to the problem of restoring the sequence of states visited by the hidden Markov chain from a given sequence of observed outputs. Our approach is based on a Monte Carlo Markov chain algorithm that allows us to draw samples from the full posterior distribution of the hidden Markov chain paths. The problem of estimating the probability of individual paths and the associated Monte Carlo error of these estimates is addressed. The method is illustrated by considering a problem of DNA sequence multiple alignment. The special structure for the hidden Markov model used in the sequence alignment problem is considered in detail. In conclusion, we discuss certain interesting aspects of biological sequence alignments that become accessible through the Bayesian approach to HMM restoration.

16.
Modelling forest dynamics: a perspective from point process methods (total citations: 1; self-citations: 0; citations by others: 1)
This paper reviews the main applications of (marked) point process theory in forestry, including functions to analyse spatial variability and the main (marked) point process models. Although correlation functions do describe spatial variability at distinct ranges of scale, they are clearly restricted to the analysis of a few dominant species since they are based on pairwise analysis. This has over-simplified the spatial analysis of complex forest dynamics involving a "large" number of species. Moreover, although process models can reproduce, to some extent, real forest spatial patterns of trees, the forest-ecological interpretation of the resulting spatial structures is difficult since these models usually lack biological realism. This problem gains in strength as most of these point process models are defined in terms of purely spatial relationships, though in real life, forests develop through time. We thus aim to discuss the applicability of such formulations to analyse and simulate "real" forest dynamics and to expose their shortcomings. We present a unified approach to modern spatially explicit forest growth models. Finally, we focus on a continuous space-time stochastic process as an alternative approach to generate marked point patterns evolving through space and time.

17.
Multi-state stochastic models are useful tools for studying complex dynamics such as chronic diseases. Semi-Markov models explicitly define distributions of waiting times, giving an extension of continuous time and homogeneous Markov models based implicitly on exponential distributions. This paper develops a parametric model adapted to complex medical processes. (i) We introduced a hazard function of waiting times with a U or inverse U shape. (ii) These distributions were specifically selected for each transition. (iii) The vector of covariates was also selected for each transition. We applied this method to the evolution of HIV infected patients. We used a sample of 1244 patients followed up at the hospital in Nice, France.

18.
Conway-Cranos LL, Doak DF. Oecologia, 2011, 167(1): 199-207
Repeated, spatially explicit sampling is widely used to characterize the dynamics of sessile communities in both terrestrial and aquatic systems, yet our understanding of the consequences of errors made in such sampling is limited. In particular, when Markov transition probabilities are calculated by tracking individual points over time, misidentification of the same spatial locations will result in biased estimates of transition probabilities, successional rates, and community trajectories. Nonetheless, to date, all published studies that use such data have implicitly assumed that resampling occurs without error when making estimates of transition rates. Here, we develop and test a straightforward maximum likelihood approach, based on simple field estimates of resampling errors, to arrive at corrected estimates of transition rates between species in a rocky intertidal community. We compare community Markov models based on raw and corrected transition estimates using data from Endocladia muricata-dominated plots in a California intertidal assemblage, finding that uncorrected predictions of succession consistently overestimate recovery time. We tested the precision and accuracy of the approach using simulated datasets and found good performance of our estimation method over a range of realistic sample sizes and error rates.

19.
20.
An importance-sampling method is presented for computing the likelihood of the configuration of population genetic data under general assumptions about population history and transitions among states. The configuration of the data is the number of chromosomes sampled that are in each of a finite set of states. Transitions among states are governed by a Markov chain with transition probabilities dependent on one or more parameters. The method assumes that the joint distribution of coalescence times of the underlying gene genealogy is independent of the genetic state of each lineage. Given a set of coalescence times, the probability that a pair of lineages is chosen to coalesce in each replicate is proportional to the contribution that the coalescence event makes to the probability of the data. This method can be applied to gene genealogies generated by the neutral coalescent process and to genealogies generated by other processes, such as a linear birth-death process which provides a good approximation to the dynamics of low-frequency alleles. Two applications are described. In the first, the fit of allele frequencies at two microsatellite loci sampled in a Sardinian population to the one-step mutation model is tested. The one-step model is rejected for one locus but not for the other. The second application is to low-frequency alleles in a geographically subdivided population. The geographic location is the allelic state, and the alleles are assumed to be sufficiently rare that their dynamics can be approximated by a linear birth-death process in which the birth and death rates are independent of geographic location. The analysis of eight low-frequency allozyme alleles found in the glaucous-winged gull, Larus glaucescens, illustrates how geographically restricted dispersal can be detected.
