Similar Articles
20 similar articles found (search time: 15 ms)
1.
Sewall Wright's threshold model has been used to model discrete traits that may be underlain by a continuous trait, but it has proven difficult to make efficient statistical inferences with it. The availability of Markov chain Monte Carlo (MCMC) methods makes likelihood and Bayesian inference possible under this model. This paper discusses prospects for the use of the threshold model in morphological systematics to model the evolution of discrete all-or-none traits. There the threshold model has an advantage over 0/1 Markov process models in that it not only accommodates polymorphism within species, but also allows for correlated evolution of traits with far fewer parameters to be inferred. The MCMC importance sampling methods needed to evaluate likelihood ratios for the threshold model are introduced and described in some detail.
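The threshold idea itself is simple to sketch: a binary trait is "present" exactly when a continuous latent liability exceeds a cutoff. The following toy simulation (not the paper's inference machinery; parameter values are arbitrary) shows how within-species polymorphism falls out of the model naturally.

```python
import random

# Toy sketch of Wright's threshold model: each individual carries a
# continuous latent "liability"; the discrete all-or-none trait is present
# whenever the liability exceeds a threshold. Values here are illustrative.

def simulate_trait(n, mean_liability, threshold=0.0, rng=None):
    """Return the proportion of n individuals expressing the trait."""
    rng = rng or random.Random(0)
    return sum(rng.gauss(mean_liability, 1.0) > threshold for _ in range(n)) / n

# A species whose mean liability sits exactly at the threshold is ~50%
# polymorphic for the trait.
print(round(simulate_trait(100_000, 0.0), 1))  # prints 0.5
```

Shifting the mean liability well past the threshold drives the trait toward fixation, which is how the model accommodates both polymorphic and monomorphic species with one continuous parameter.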

2.
Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only due to large sample sizes. These methods partition big data sets into subsets and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulation data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications and will assist future progress in this rapidly developing field.
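One of the simplest subset-combination strategies is precision-weighted draw averaging: each subposterior draw is weighted by the inverse of that subset's marginal sample variance. The sketch below illustrates this idea only; the function name and weighting scheme are illustrative and are not the parallelMCMCcombine API.

```python
import numpy as np

# Hypothetical sketch of combining independent subposterior samples by
# precision-weighted averaging. Each subset's draws are weighted by the
# inverse of its per-dimension sample variance; names are illustrative.

def consensus_combine(subposteriors):
    """subposteriors: list of M arrays, each of shape (n_draws, d)."""
    subs = [np.asarray(s, dtype=float) for s in subposteriors]
    # Per-dimension weight for each subset: inverse sample variance.
    weights = np.stack([1.0 / s.var(axis=0, ddof=1) for s in subs])   # (M, d)
    weighted = np.stack([w * s for w, s in zip(weights, subs)])       # (M, n, d)
    return weighted.sum(axis=0) / weights.sum(axis=0)                 # (n, d)

rng = np.random.default_rng(0)
# Two subposteriors centred near the same underlying parameter value.
a = rng.normal(1.0, 0.2, size=(5000, 1))
b = rng.normal(1.2, 0.2, size=(5000, 1))
combined = consensus_combine([a, b])
print(round(float(combined.mean()), 1))  # near 1.1
```

With equal subset variances this reduces to a plain draw-by-draw average, which is why the combined mean lands between the two subset means.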

3.
A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study of the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders; (ii) careful programming techniques that allow orders as large as sixteen; (iii) adequate inverted-repeat handling; and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence we use the negative logarithm of the probability estimate at that position. The measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as, or even better than, state-of-the-art DNA compression methods, such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), contrasting with the statistical models underlying other methods, which exploit the extensive data repetitions in DNA sequences and therefore have a non-local character.
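The per-position score described above is concrete enough to sketch. Below is a minimal, adaptive order-k model over {A, C, G, T} with a Laplace-style smoothed probability estimate, scored by the negative log2-probability per symbol (bits/base). It is a toy illustration, not the authors' implementation (which uses multiple competing orders and inverted-repeat handling).

```python
import math
from collections import defaultdict

# Minimal adaptive finite-context model: predict each symbol from its
# length-k context, score it in bits, then update the context counts.

def avg_bits_per_base(seq, k=2, alpha=1.0):
    """Average code length in bits/base under an adaptive order-k model."""
    counts = defaultdict(lambda: defaultdict(float))
    bits, scored = 0.0, 0
    for i in range(k, len(seq)):
        ctx, sym = seq[i - k:i], seq[i]
        total = sum(counts[ctx].values())
        # Laplace-smoothed estimate over the 4-letter DNA alphabet.
        p = (counts[ctx][sym] + alpha) / (total + 4 * alpha)
        bits += -math.log2(p)
        scored += 1
        counts[ctx][sym] += 1  # adapt only after scoring (no lookahead)
    return bits / scored

# A perfectly periodic "genome" compresses to a small fraction of a bit/base.
print(round(avg_bits_per_base("ACGT" * 200, k=2), 2))
```

On a highly repetitive sequence the model quickly learns each context's successor, so the average cost falls far below the 2 bits/base of a uniform model; the running per-position scores are exactly the "information profile" the abstract refers to.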

4.
Realistic power calculations for large cohort studies and nested case-control studies are essential for successfully answering important and complex research questions in epidemiology and clinical medicine. For this, we provide a methodical framework for general realistic power calculations via simulations that we put into practice by means of an R-based template. We consider staggered recruitment and individual hazard rates, competing risks, interaction effects, and the misclassification of covariates. The study cohort is assembled with respect to given age, gender, and community distributions. Nested case-control analyses with a varying number of controls enable comparisons of power with a full cohort analysis. Time-to-event generation under competing risks, including delayed study-entry times, is realized on the basis of a six-state Markov model. Incidence rates, prevalence of risk factors and prefixed hazard ratios allow for the assignment of age-dependent transition rates given in the form of Cox models. These provide the basis for a central simulation algorithm, which is used for the generation of sample paths of the underlying time-inhomogeneous Markov processes. With the inclusion of frailty terms in the Cox models, the Markov property is deliberately relaxed: an "individual Markov process given frailty" creates unobserved heterogeneity between individuals. Different left-truncation and right-censoring patterns call for the use of Cox models for data analysis. p-values are recorded over repeated simulation runs to allow for the desired power calculations. For illustration, we consider scenarios with a "testing" character as well as realistic scenarios. This enables the validation of a correct implementation of theoretical concepts, and concrete sample size recommendations against an actual epidemiological background, here given with possible substudy designs within the German National Cohort.

5.
This paper discusses a two-state hidden Markov Poisson regression (MPR) model for analyzing longitudinal data of epileptic seizure counts, which allows for the rate of the Poisson process to depend on covariates through an exponential link function and to change according to the states of a two-state Markov chain with its transition probabilities associated with covariates through a logit link function. This paper also considers a two-state hidden Markov negative binomial regression (MNBR) model, as an alternative, by using the negative binomial instead of Poisson distribution in the proposed MPR model when there exists extra-Poisson variation conditional on the states of the Markov chain. The two proposed models in this paper relax the stationary requirement of the Markov chain, allow for overdispersion relative to the usual Poisson regression model and for correlation between repeated observations. The proposed methodology provides a plausible analysis for the longitudinal data of epileptic seizure counts, and the MNBR model fits the data much better than the MPR model. Maximum likelihood estimation using the EM and quasi-Newton algorithms is discussed. A Monte Carlo study for the proposed MPR model investigates the reliability of the estimation method, the choice of probabilities for the initial states of the Markov chain, and some finite sample behaviors of the maximum likelihood estimates, suggesting that (1) the estimation method is accurate and reliable as long as the total number of observations is reasonably large, and (2) the choice of probabilities for the initial states of the Markov process has little impact on the parameter estimates.
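The generative side of such a model is easy to picture: a hidden two-state chain switches between a low-rate and a high-rate regime, and the observed counts are Poisson given the current state. The sketch below is a toy simulation with arbitrary rates and transition probabilities, not the paper's regression model (it omits the covariate link functions entirely).

```python
import numpy as np

# Toy two-state hidden Markov Poisson process: counts are Poisson with a
# state-specific rate; the hidden state follows a two-state Markov chain.
# Rates and transition probabilities are arbitrary illustrative values.

rng = np.random.default_rng(3)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # hidden-state transition matrix
rates = np.array([1.0, 6.0])        # Poisson rate in each hidden state

def simulate(T, rng):
    s = 0
    states, counts = [], []
    for _ in range(T):
        states.append(s)
        counts.append(rng.poisson(rates[s]))
        s = rng.choice(2, p=P[s])   # move to the next hidden state
    return np.array(states), np.array(counts)

states, counts = simulate(5000, rng)
# Counts emitted in the high-rate state should average near rates[1] = 6.
print(round(float(counts[states == 1].mean()), 1))
```

The resulting count series is overdispersed relative to a single Poisson model, which is exactly the feature the hidden regime-switching structure is meant to capture.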

6.
Performing causal inference in observational studies requires the assumption that confounding variables are correctly adjusted for. In settings with few discrete-valued confounders, standard models can be employed. However, as the number of confounders increases, these models become less feasible, as fewer observations are available for each unique combination of confounding variables. In this paper, we propose a new model for estimating treatment effects in observational studies that incorporates both parametric and nonparametric outcome models. By conceptually splitting the data, we can combine these models while maintaining a conjugate framework, allowing us to avoid the use of Markov chain Monte Carlo (MCMC) methods. Approximations using the central limit theorem and random sampling allow our method to be scaled to high-dimensional confounders. Through simulation studies we show our method can be competitive with benchmark models while maintaining efficient computation, and we illustrate the method on a large epidemiological health survey.

7.
We investigate models for animal feeding behaviour, with the aim of improving understanding of how animals organise their behaviour in the short term. We consider three classes of model: hidden Markov, latent Gaussian and semi-Markov. Each can predict the typical 'clustered' feeding behaviour that is generally observed; however, they differ in the extent to which 'memory' of previous behaviour is allowed to affect future behaviour. The hidden Markov model has 'lack of memory', the current behavioural state being dependent on the previous state only. The latent Gaussian model assumes feeding/non-feeding periods to occur by the thresholding of an underlying continuous variable, thereby incorporating some 'short-term memory'. The semi-Markov model, by taking into account the duration of time spent in the previous state, can be said to incorporate 'longer-term memory'. We fit each of these models to a dataset of cow feeding behaviour. We find the semi-Markov model (longer-term memory) to have the best fit to the data and the hidden Markov model (lack of memory) the worst. We argue that, in view of the effects of satiety on the short-term feeding behaviour of animal species in general, biologically suitable models should allow 'memory' to play a role. We conclude that our findings are equally relevant for the analysis of other types of short-term behaviour that are governed by satiety-like principles.
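The 'lack of memory' property is concrete: in a (hidden) Markov model the dwell time in a state is geometric, so the probability of leaving is the same at every time step regardless of how long the bout has lasted. The toy simulation below (illustrative only, not the paper's fitted models) shows geometric bout lengths with a constant leaving hazard; a semi-Markov state would instead draw its dwell time from an arbitrary distribution.

```python
import random

# In a discrete-time Markov model, the time spent in a state is geometric:
# the chance of leaving is p_leave at every step (memoryless). A semi-Markov
# model replaces this with an explicit, possibly non-geometric, dwell
# distribution. Parameter values here are illustrative.

def geometric_bout(p_leave, rng):
    """Length of one bout when the leaving hazard is constant."""
    t = 1
    while rng.random() > p_leave:
        t += 1
    return t

rng = random.Random(1)
bouts = [geometric_bout(0.2, rng) for _ in range(20_000)]
mean_bout = sum(bouts) / len(bouts)
print(4.7 < mean_bout < 5.3)  # mean bout length is near 1 / p_leave = 5
```

Satiety implies the opposite of memorylessness: the longer an animal has been feeding, the more likely it is to stop, which is why a dwell-time-aware (semi-Markov) state fits feeding bouts better than a constant hazard.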

8.
The flux of ions and molecules in and out of the cell is vital for maintaining the basis of various biological processes. The permeation of substrates across the cellular membrane is mediated through the function of specialized integral membrane proteins commonly known as membrane transporters. These proteins undergo a series of structural rearrangements that allow a primary substrate binding site to be accessed from either side of the membrane at a given time. Structural insights provided by experimentally resolved structures of membrane transporters have aided in the biophysical characterization of these important molecular drug targets. However, characterizing the transitions between conformational states remains challenging to achieve both experimentally and computationally. Though molecular dynamics simulations are a powerful approach for providing atomistic resolution of protein dynamics, a recurring challenge is their ability to efficiently reach the timescales relevant to the large conformational transitions exhibited by transporters. One approach to overcoming this difficulty is to adaptively guide the simulation to favor exploration of the conformational landscape, otherwise known as adaptive sampling. Furthermore, such sampling benefits greatly from the statistical analysis of Markov state models. Historically, the use of Markov state models has been effective in quantifying slow dynamics or long-timescale behaviors such as protein folding. Here, we review recent implementations of adaptive sampling and Markov state models to not only address current limitations of molecular dynamics simulations, but also to highlight how Markov state modeling can be applied to investigate the structure–function mechanisms of large, complex membrane transporters.

9.
Essential to applying a mathematical model to a real-world application is calibrating the model to data. Methods for calibrating population models often become computationally infeasible when the population size (more generally the size of the state space) becomes large, or other complexities such as time-dependent transition rates, or sampling error, are present. Continuing previous work in this series on the use of diffusion approximations for efficient calibration of continuous-time Markov chains, I present efficient techniques for time-inhomogeneous chains and accounting for observation error. Observation error (partial observability) is accounted for by joint estimation using a scaled unscented Kalman filter for state-space models. The methodology will be illustrated with respect to models of disease dynamics incorporating seasonal transmission rate and in the presence of observation error, including application to two influenza outbreaks and measles in London in the pre-vaccination era.

10.
Late-onset familial Alzheimer disease (LOFAD) is a genetically heterogeneous and complex disease for which only one locus, APOE, has been definitively identified. Difficulties in identifying additional loci are likely to stem from inadequate linkage analysis methods. Nonparametric methods suffer from low power because of limited use of the data, and traditional parametric methods suffer from limitations in the complexity of the genetic model that can be feasibly used in analysis. Alternative methods that have recently been developed include Bayesian Markov chain Monte Carlo (MCMC) methods. These methods allow multipoint linkage analysis under oligogenic trait models in pedigrees of arbitrary size; at the same time, they allow for inclusion of covariates in the analysis. We applied this approach to an analysis of LOFAD on five chromosomes with previous reports of linkage. We identified strong evidence of a second LOFAD gene on chromosome 19p13.2, which is distinct from APOE on 19q. We also obtained weak evidence of linkage to chromosome 10 at the same location as a previous report of linkage, but found no evidence for linkage of LOFAD age-at-onset loci to chromosomes 9, 12, or 21.

11.
Methods for Bayesian inference of phylogeny using DNA sequences based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process, and other aspects of evolution. This has increased the realism of models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that these models have, in several cases, become overparameterized. In such cases, some parameters of the model are not identifiable; different combinations of nonidentifiable parameters lead to the same likelihood, making it impossible to decide among the potential parameter values based on the data. Overparameterized models can also slow the rate of convergence of MCMC algorithms due to large negative correlations among parameters in the posterior probability distribution. Functions of parameters can sometimes be found, in overparameterized models, that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.

12.
We examine memory models for multisite capture–recapture data. This is an important topic, as animals may exhibit behavior that is more complex than simple first-order Markov movement between sites, in which case appropriate models must be devised and fitted to the data. We consider the Arnason–Schwarz model for multisite capture–recapture data, which incorporates just first-order Markov movement, and also two alternative models that allow for memory, the Brownie model and the Pradel model. We use simulation to compare two alternative tests which may be undertaken to determine whether models for multisite capture–recapture data need to incorporate memory. Increasing the complexity of models runs the risk of introducing parameters that cannot be estimated, irrespective of how much data are collected, a feature which is known as parameter redundancy. Rouan et al. (JABES, 2009, pp 338–355) suggest a constraint that may be applied to overcome parameter redundancy when it is present in multisite memory models. For this case, we apply symbolic methods to derive a simpler constraint, which allows more parameters to be estimated, and give general results not limited to a particular configuration. We also consider the effect sparse data can have on parameter redundancy and recommend minimum sample sizes. Memory models for multisite capture–recapture data can be highly complex and difficult to fit to data. We emphasize the importance of a structured approach to modeling such data, by considering a priori which parameters can be estimated, which constraints are needed in order for estimation to take place, and how much data need to be collected. We also give guidance on the amount of data needed to use two alternative families of tests for whether models for multisite capture–recapture data need to incorporate memory.

13.
14.
Emerging ecological time series from long-term ecological studies and remote sensing provide excellent opportunities for ecologists to study the dynamic patterns and governing processes of ecological systems. However, signal extraction from long-term time series often requires system learning (e.g., estimation of true system state) to process the large amount of information, to reconstruct system state, to account for measurement error, and to handle missing data. State-space models (SSMs) are a natural choice for these tasks and thus have received increasing attention in ecological and environmental studies. Data-based learning using SSMs that connect ecological processes to the measurement of system state becomes a useful technique in the ecological informatics toolkit. The present study illustrates the use of the Kalman filter (KF), an estimator of SSMs, with case studies of population dynamics. The examples of the SSM applications include the reconstruction of system state using the KF method and Markov chain Monte Carlo methods, estimation of measurement-error variances in the estimates of animal population abundance using basic structural models (BSMs), and estimation of missing values using the KF and Kalman smoother. Estimation of measurement-error variances by BSMs does not require knowledge of the functional form that generates the time series data. Instead, BSMs approximate the trajectory or deterministic skeleton of a system dynamics in a semi-parametric fashion, and provide a robust estimator of measurement-error variances. The present study also compares Bayesian SSMs with non-Bayesian SSMs. The joint use of the KF method or its extensions and Markov chain Monte Carlo (MCMC) methods is a promising approach to the parameter estimation of SSMs.
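The KF recursion for the simplest case is short enough to show in full. Below is a minimal local-level filter, x_t = x_{t-1} + w_t with observation y_t = x_t + v_t, the kind of SSM one might use for a noisy abundance index. It is a generic textbook sketch with assumed known variances q and r, not the study's case-study code.

```python
# Minimal Kalman filter for the local-level state-space model
#   x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (state: latent abundance)
#   y_t = x_t + v_t,      v_t ~ N(0, r)   (observation with measurement error)
# q and r are assumed known here; a diffuse prior p0 handles initialization.

def kalman_filter(ys, q, r, x0=0.0, p0=1e6):
    """Return the filtered state estimates for observations ys."""
    x, p = x0, p0
    filtered = []
    for y in ys:
        p = p + q                  # predict: state variance grows by q
        k = p / (p + r)            # Kalman gain balances prior vs observation
        x = x + k * (y - x)        # update toward the observation
        p = (1.0 - k) * p          # posterior variance shrinks
        filtered.append(x)
    return filtered

ys = [10.0, 10.4, 9.8, 10.2, 10.1]
states = kalman_filter(ys, q=0.01, r=1.0)
print(round(states[-1], 1))  # → 10.1
```

With small q and large r the filter heavily smooths the noisy observations; missing values can be handled by skipping the update step (prediction only), which is the basis of the KF/Kalman-smoother imputation mentioned above.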

15.
The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of the nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods relying either on hypothesis testing or on penalized log-likelihood criteria are presented, as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes makes it possible to highlight differences between intronic sequences and gene untranslated region sequences.
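The classical (strong) lumpability condition the abstract contrasts itself with is easy to state and check: a partition of the state space yields a lumped chain that is still Markov if and only if, within each block, every state has the same total transition probability into each block. The check below is a generic sketch of that classical condition (not the paper's two-level probabilistic construction); the example matrix is made up.

```python
import numpy as np

# Strong lumpability check: for each block of the partition, all states in
# the block must have identical block-summed transition probabilities.

def is_lumpable(P, partition):
    """P: transition matrix; partition: iterable of tuples of state indices."""
    P = np.asarray(P, dtype=float)
    for block in partition:
        # Row of block-summed probabilities for each state in this block.
        rows = [np.array([P[s, list(b)].sum() for b in partition])
                for s in block]
        if any(not np.allclose(rows[0], r) for r in rows[1:]):
            return False
    return True

# A 3-state chain: lumping states 1 and 2 works (both rows give 0.2 / 0.8
# into the two blocks), as nucleotide groupings such as purine/pyrimidine
# might in a DNA application.
P = [[0.5, 0.25, 0.25],
     [0.2, 0.4,  0.4 ],
     [0.2, 0.1,  0.7 ]]
print(is_lumpable(P, [(0,), (1, 2)]))  # → True
```

When the condition fails, the lumped process is no longer Markov, which is precisely the limitation motivating the paper's alternative where the state space stays fixed and the partition is made probabilistic.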

16.
Yang HC, Chao A. Biometrics 2005, 61(4):1010–1017
A bivariate Markov chain approach that includes both enduring (long-term) and ephemeral (short-term) behavioral effects in models for capture-recapture experiments is proposed. The capture history of each animal is modeled as a Markov chain with a bivariate state space with states determined by the capture status (capture/noncapture) and marking status (marked/unmarked). In this framework, a conditional-likelihood method is used to estimate the population size and the transition probabilities. The classical behavioral model that assumes only an enduring behavioral effect is included as a special case of the bivariate Markovian model. Another special case that assumes only an ephemeral behavioral effect reduces to a univariate Markov chain based on capture/noncapture status. The model with the ephemeral behavioral effect is extended to incorporate time effects; in this model, in contrast to extensions of the classical behavioral model, all parameters are identifiable. A data set is analyzed to illustrate the use of the Markovian models in interpreting animals' behavioral response. Simulation results are reported to examine the performance of the estimators.

17.
In capture–recapture modeling, parameters are usually dependent on time or on time since first capture. However, the case where the probability of staying in one state depends on the time spent in that particular state is not rare. Hidden Markov models are not well suited to such situations. A more convenient approach is to consider models that incorporate semi-Markovian states, which explicitly define the waiting-time distribution and have been used in previous biological studies as a convenient framework for modeling the time spent in a given physiological state. Here, we propose hidden Markovian models that combine several nonhomogeneous Markovian states with one semi-Markovian state and which (i) are well adapted to imperfect and variable detection and (ii) allow us to consider time, time since first capture, and time spent in one state effects. Implementation details depending on the number of semi-Markovian states are discussed. From a user's perspective, the present approach enhances the toolbox for analyzing capture–recapture data. We then show the potential of this framework by means of two ecological examples: (i) stopover duration and (ii) breeding success dynamics.

18.
Continuous-time multistate models are widely used for categorical response data, particularly in the modeling of chronic diseases. However, inference is difficult when the process is only observed at discrete time points, with no information about the times or types of events between observation times, unless a Markov assumption is made. This assumption can be limiting, as rates of transition between disease states might instead depend on the time since entry into the current state. Such a formulation results in a semi-Markov model. We show that the computational problems associated with fitting semi-Markov models to panel-observed data can be alleviated by considering a class of semi-Markov models with phase-type sojourn distributions. This allows methods for hidden Markov models to be applied. In addition, extensions to models where observed states are subject to classification error are given. The methodology is demonstrated on a dataset relating to development of bronchiolitis obliterans syndrome in post-lung-transplantation patients.

19.
Stochastic models of ion channels have been based largely on Markov theory where individual states and transition rates must be specified, and sojourn-time densities for each state are constrained to be exponential. This study presents an approach based on random-sum methods and alternating-renewal theory, allowing individual states to be grouped into classes provided the successive sojourn times in a given class are independent and identically distributed. Under these conditions Markov models form a special case. The utility of the approach is illustrated by considering the effects of limited time resolution (modelled by using a discrete detection limit, xi) on the properties of observable events, with emphasis on the observed open-time (xi-open-time). The cumulants and Laplace transform for a xi-open-time are derived for a range of Markov and non-Markov models; several useful approximations to the xi-open-time density function are presented. Numerical studies show that the effects of limited time resolution can be extreme, and also highlight the relative importance of the various model parameters. The theory could form a basis for future inferential studies in which parameter estimation takes account of limited time resolution in single channel records. Appendixes include relevant results concerning random sums and a discussion of the role of exponential distributions in Markov models.

20.
A class of discrete-time models of infectious disease spread, referred to as individual-level models (ILMs), are typically fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. These models quantify probabilistic outcomes regarding the risk of infection of susceptible individuals due to various susceptibility and transmissibility factors, including their spatial distance from infectious individuals. The infectious pressure from infected individuals exerted on susceptible individuals is intrinsic to these ILMs. Unfortunately, quantifying this infectious pressure for data sets containing many individuals can be computationally burdensome, leading to a time-consuming likelihood calculation and, thus, computationally prohibitive MCMC-based analysis. This problem worsens when using data augmentation to allow for uncertainty in infection times. In this paper, we develop sampling methods that can be used to calculate a fast, approximate likelihood when fitting such disease models. A simple random sampling approach is initially considered, followed by various spatially stratified schemes. We test and compare the performance of our methods with both simulated data and data from the 2001 foot-and-mouth disease (FMD) epidemic in the U.K. Our results indicate that substantial computational savings can be obtained—albeit, of course, with some information loss—suggesting that such techniques may be of use in the analysis of very large epidemic data sets.
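The simple-random-sampling idea can be sketched directly: the infectious pressure on a susceptible individual is a sum of a distance-kernel term over all infectious individuals, and a uniform subsample of size m, rescaled by N_inf / m, gives an unbiased estimate of that sum at a fraction of the cost. The kernel, parameter values, and function names below are illustrative assumptions, not the paper's specification.

```python
import math
import random

# Infectious pressure on susceptible individual i: a sum over all infectious
# individuals j of a distance kernel. An assumed power-law kernel is used
# here purely for illustration.

def pressure_exact(xi, infectious, beta=1.0, alpha=2.0):
    return beta * sum((math.dist(xi, xj) + 1.0) ** -alpha for xj in infectious)

def pressure_sampled(xi, infectious, m, rng, beta=1.0, alpha=2.0):
    """Unbiased estimate: subsample m infectious individuals, rescale."""
    sample = rng.sample(infectious, m)
    return (len(infectious) / m) * pressure_exact(xi, sample, beta, alpha)

rng = random.Random(42)
infectious = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(2000)]
xi = (5.0, 5.0)
exact = pressure_exact(xi, infectious)
approx = pressure_sampled(xi, infectious, m=500, rng=rng)
print(abs(approx - exact) / exact)  # small relative error at 1/4 the cost
```

Spatial stratification refines this by sampling nearby (high-kernel) individuals more heavily than distant ones, reducing the variance of the estimate for the same m.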


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) · 京ICP备09084417号