Similar Literature
20 similar documents found
1.
A mixture Markov regression model is proposed to analyze heterogeneous time series data. Mixture quasi‐likelihood is formulated to model time series with mixture components and exogenous variables. The parameters are estimated by quasi‐likelihood estimating equations. A modified EM algorithm is developed for the mixture time series model. The model and proposed algorithm are tested on simulated data and applied to mosquito surveillance data in Peel Region, Canada.
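As a rough illustration of the mixture idea (though not of the paper's Markov regression structure or quasi‐likelihood equations), the following sketch fits a two-component Poisson mixture by EM; the counts, starting values, and component rates are all made up.

```python
# A minimal EM sketch for a two-component Poisson mixture -- a heavily
# simplified stand-in for the mixture Markov regression model above
# (no Markov dependence or exogenous covariates here).
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
# Simulated counts from two hypothetical latent regimes (rates 2 and 9).
y = np.concatenate([rng.poisson(2.0, 150), rng.poisson(9.0, 50)])

pi, lam = 0.5, np.array([1.0, 5.0])   # initial mixing weight and rates
for _ in range(200):
    # E-step: posterior probability that each count came from component 1.
    p1 = pi * poisson.pmf(y, lam[0])
    p2 = (1 - pi) * poisson.pmf(y, lam[1])
    w = p1 / (p1 + p2)
    # M-step: weighted updates of the mixing weight and component rates.
    pi = w.mean()
    lam = np.array([np.sum(w * y) / np.sum(w),
                    np.sum((1 - w) * y) / np.sum(1 - w)])
print(pi, lam)   # should recover roughly (0.75, [2, 9])
```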

2.

Background  

The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations.
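The ABC idea fits in a few lines. The sketch below runs plain rejection ABC for a Poisson rate with a flat prior, a sample-mean summary statistic, and an arbitrary tolerance; all of these choices are illustrative, not taken from the source.

```python
# A minimal rejection-ABC sketch: approximate the posterior of a Poisson
# rate without ever evaluating the likelihood.
import numpy as np

rng = np.random.default_rng(1)
observed = rng.poisson(4.0, size=100)        # pretend field data
s_obs = observed.mean()                      # summary statistic

accepted = []
for _ in range(100_000):
    theta = rng.uniform(0, 10)               # draw from a flat prior
    sim = rng.poisson(theta, size=100)       # simulate data under theta
    if abs(sim.mean() - s_obs) < 0.1:        # keep theta if summaries match
        accepted.append(theta)

post = np.array(accepted)
print(post.mean(), post.std())               # approximate posterior mean/sd
```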

3.
Interval‐censored recurrent event data arise when the event of interest is not readily observed but the cumulative event count can be recorded at periodic assessment times. In some settings, chronic disease processes may resolve, and individuals will cease to be at risk of events at the time of disease resolution. We develop an expectation‐maximization algorithm for fitting a dynamic mover‐stayer model to interval‐censored recurrent event data under a Markov model with a piecewise‐constant baseline rate function given a latent process. The model is motivated by settings in which the event times and the resolution time of the disease process are unobserved. The likelihood and algorithm are shown to yield estimators with small empirical bias in simulation studies. Data are analyzed on the cumulative number of damaged joints in patients with psoriatic arthritis where individuals experience disease remission.

4.
Large amounts of longitudinal health records are now available for dynamic monitoring of the underlying processes governing the observations. However, the health status progression across time is not typically observed directly: records are observed only when a subject interacts with the system, yielding irregular and often sparse observations. This suggests that the observed trajectories should be modeled via a latent continuous‐time process potentially as a function of time‐varying covariates. We develop a continuous‐time hidden Markov model to analyze longitudinal data accounting for irregular visits and different types of observations. By employing a specific missing data likelihood formulation, we can construct an efficient computational algorithm. We focus on Bayesian inference for the model: this is facilitated by an expectation‐maximization algorithm and Markov chain Monte Carlo methods. Simulation studies demonstrate that these approaches can be implemented efficiently for large data sets in a fully Bayesian setting. We apply this model to a real cohort where patients suffer from chronic obstructive pulmonary disease with the outcome being the number of drugs taken, using health care utilization indicators and patient characteristics as covariates.
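A minimal sketch of the central computation in such a model: a two-state continuous-time hidden Markov chain with Poisson emissions, where the forward pass bridges the irregular gaps between visits with the matrix exponential of the generator. The generator, rates, visit times, and counts below are all hypothetical.

```python
# A continuous-time HMM forward pass for irregularly timed observations:
# transitions over a gap of length dt are governed by expm(Q * dt).
import numpy as np
from scipy.linalg import expm
from scipy.stats import poisson

Q = np.array([[-0.3,  0.3],
              [ 0.2, -0.2]])                  # hypothetical generator
rates = np.array([1.0, 6.0])                  # per-state Poisson rates
times = np.array([0.0, 0.4, 1.7, 2.1, 5.0])   # irregular visit times
counts = np.array([1, 0, 7, 5, 8])            # observed counts at visits

alpha = np.array([0.5, 0.5]) * poisson.pmf(counts[0], rates)
c = alpha.sum()
loglik = np.log(c)
alpha = alpha / c                             # scale to avoid underflow
for dt, y in zip(np.diff(times), counts[1:]):
    P = expm(Q * dt)                          # transition matrix over gap dt
    alpha = (alpha @ P) * poisson.pmf(y, rates)
    c = alpha.sum()
    loglik += np.log(c)
    alpha = alpha / c
print(loglik)
```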

5.
The genetic analysis of characters that change as a function of some independent and continuous variable has received increasing attention in the biological and statistical literature. Previous work in this area has focused on the analysis of normally distributed characters that are directly observed. We propose a framework for the development and specification of models for a quantitative genetic analysis of function-valued characters that are not directly observed, such as genetic variation in age-specific mortality rates or complex threshold characters. We employ a hybrid Markov chain Monte Carlo algorithm involving a Monte Carlo EM algorithm coupled with a Markov chain approximation to the likelihood, which is quite robust and provides accurate estimates of the parameters in our models. The methods are investigated using simulated data and are applied to a large data set measuring mortality rates in the fruit fly, Drosophila melanogaster.

6.
Phylogenetic comparative methods (PCMs) have been used to test evolutionary hypotheses at phenotypic levels. The evolutionary modes commonly included in PCMs are Brownian motion (genetic drift) and the Ornstein–Uhlenbeck process (stabilizing selection), whose likelihood functions are mathematically tractable. More complicated models of evolutionary modes, such as branch‐specific directional selection, have not been used because calculations of likelihood and parameter estimates in the maximum‐likelihood framework are not straightforward. To solve this problem, we introduced a population genetics framework into a PCM, and here, we present a flexible and comprehensive framework for estimating evolutionary parameters through simulation‐based likelihood computations. The method does not require analytic likelihood computations, and evolutionary models can be used as long as simulation is possible. Our approach has many advantages: it incorporates different evolutionary modes for phenotypes into phylogeny, it takes intraspecific variation into account, it evaluates the full likelihood instead of using summary statistics, and it can be used to estimate ancestral traits. We present a successful application of the method to the evolution of brain size in primates. Our method can be easily implemented in more computationally effective frameworks such as approximate Bayesian computation (ABC), which will enhance the use of computationally intensive methods in the study of phenotypic evolution.
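As a toy version of the simulation step such a framework relies on, the sketch below simulates trait evolution under an Ornstein–Uhlenbeck process along a single branch using Euler–Maruyama steps; embedding this in a phylogeny and in a likelihood computation is omitted, and every parameter value is invented.

```python
# Simulate an OU trait path along one branch: dx = alpha*(theta - x)dt + sigma dW.
import numpy as np

rng = np.random.default_rng(2)

def simulate_ou(x0, alpha, theta, sigma, t, n_steps=1000):
    """Euler-Maruyama simulation of an OU process over time t."""
    dt = t / n_steps
    x = x0
    for _ in range(n_steps):
        x += alpha * (theta - x) * dt + sigma * np.sqrt(dt) * rng.normal()
    return x

# Many replicate tip values are what a simulation-based likelihood uses.
tips = np.array([simulate_ou(0.0, 1.5, 2.0, 0.5, t=3.0) for _ in range(5000)])
print(tips.mean(), tips.std())   # near theta=2 and stationary sd sigma/sqrt(2*alpha)
```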

7.
MOTIVATION: Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees. Yet the computational complexity of ML was open for over 20 years, and only recently resolved by the authors for the Jukes-Cantor model of substitution and its generalizations. It was proved that reconstructing the ML tree is computationally intractable (NP-hard). In this work we explore three directions, which extend that result. RESULTS: (1) We show that ML under the assumption of molecular clock is still computationally intractable (NP-hard). (2) We show that not only is it computationally intractable to find the exact ML tree, even approximating the logarithm of the ML for any multiplicative factor smaller than 1.00175 is computationally intractable. (3) We develop an algorithm for approximating log-likelihood under the condition that the input sequences are sparse. It employs any approximation algorithm for parsimony, and asymptotically achieves the same approximation ratio. We note that ML reconstruction for sparse inputs is still hard under this condition, and furthermore many real datasets satisfy it.
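For context on the Jukes-Cantor model mentioned above: while ML tree reconstruction is NP-hard, pairwise ML distance estimation under JC69 has a simple closed form, d = -(3/4)·ln(1 - 4p/3), with p the observed mismatch proportion. The toy sequences below are invented.

```python
# Closed-form ML distance between two aligned sequences under JC69.
import math

def jc69_distance(seq1, seq2):
    assert len(seq1) == len(seq2)
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("mismatch fraction too high for JC69 correction")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)

print(jc69_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 2/10 sites differ -> ~0.233
```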

8.
In many applications involving geographically indexed data, interest focuses on identifying regions of rapid change in the spatial surface, or the related problem of the construction or testing of boundaries separating regions with markedly different observed values of the spatial variable. This process is often referred to in the literature as boundary analysis or wombling. Recent developments in hierarchical models for point‐referenced (geostatistical) and areal (lattice) data have led to corresponding statistical wombling methods, but there does not appear to be any literature on the subject in the point‐process case, where the locations themselves are assumed to be random and likelihood evaluation is notoriously difficult. We extend existing point‐level and areal wombling tools to this case, obtaining full posterior inference for multivariate spatial random effects that, when mapped, can help suggest spatial covariates still missing from the model. In the areal case we can also construct wombled maps showing significant boundaries in the fitted intensity surface, while the point‐referenced formulation permits testing the significance of a postulated boundary. In the computationally demanding point‐referenced case, our algorithm combines Monte Carlo approximants to the likelihood with a predictive process step to reduce the dimension of the problem to a manageable size. We apply these techniques to an analysis of colorectal and prostate cancer data from the northern half of Minnesota, where a key substantive concern is possible similarities in their spatial patterns, and whether they are affected by each patient's distance to facilities likely to offer helpful cancer screening options.

9.
P. S. Albert, Biometrics, 1991, 47(4), 1371–1381.
This paper discusses a model for a time series of epileptic seizure counts in which the mean of a Poisson distribution changes according to an underlying two-state Markov chain. The EM algorithm (Dempster, Laird, and Rubin, 1977, Journal of the Royal Statistical Society, Series B 39, 1-38) is used to compute maximum likelihood estimators for the parameters of this two-state mixture model and extensions are made allowing for nonstationarity. The model is illustrated using daily seizure counts for patients with intractable epilepsy and results are compared with a simple Poisson distribution and Poisson regressions. Some simulation results are also presented to demonstrate the feasibility of this model.
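A minimal sketch of the likelihood computation underlying this two-state model: the scaled forward recursion for a Poisson hidden Markov chain. The transition matrix, state rates, and counts are illustrative stand-ins, not estimates from the seizure data; an EM fit would wrap such recursions in its E-step.

```python
# Scaled forward algorithm for a discrete-time two-state Poisson HMM.
import numpy as np
from scipy.stats import poisson

A = np.array([[0.95, 0.05],
              [0.10, 0.90]])            # state transition matrix
rates = np.array([0.5, 4.0])            # Poisson mean in each state
init = np.array([0.5, 0.5])             # initial state distribution
y = np.array([0, 1, 0, 5, 6, 4, 0, 1])  # toy daily seizure counts

alpha = init * poisson.pmf(y[0], rates)
c = alpha.sum()
loglik = np.log(c)
alpha = alpha / c
for obs in y[1:]:
    alpha = (alpha @ A) * poisson.pmf(obs, rates)
    c = alpha.sum()
    loglik += np.log(c)                 # accumulate log-likelihood
    alpha = alpha / c                   # rescale to prevent underflow
print(loglik)
```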

10.
F. Bartolucci and F. Pennoni, Biometrics, 2007, 63(2), 568–578.
We propose an extension of the latent class model for the analysis of capture-recapture data which allows us to take into account the effect of a capture on the behavior of a subject with respect to future captures. The approach is based on the assumption that the variable indexing the latent class of a subject follows a Markov chain with transition probabilities depending on the previous capture history. Several constraints are allowed on these transition probabilities and on the parameters of the conditional distribution of the capture configuration given the latent process. We also allow for the presence of discrete explanatory variables, which may affect the parameters of the latent process. To estimate the resulting models, we rely on the conditional maximum likelihood approach and for this aim we outline an EM algorithm. We also give some simple rules for point and interval estimation of the population size. The approach is illustrated by applying it to two data sets concerning small mammal populations.

11.
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
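The core trick can be shown in miniature: replace an exact log-likelihood with its second-order Taylor expansion around a reference point (here the MLE of a Poisson rate) and observe that the approximation is good near the peak and degrades away from it. The Poisson example and all numbers are illustrative, not from the dating context.

```python
# Quadratic Taylor approximation of a log-likelihood around the MLE.
import numpy as np

rng = np.random.default_rng(3)
y = rng.poisson(3.0, size=200)

def loglik(lam):
    # Exact log-likelihood up to constants in y (log y! dropped).
    return np.sum(y * np.log(lam) - lam)

lam0 = y.mean()                    # expand around the MLE
g = np.sum(y / lam0 - 1)           # first derivative at lam0 (zero at the MLE)
h = -np.sum(y) / lam0**2           # second derivative at lam0

def loglik_taylor(lam):
    d = lam - lam0
    return loglik(lam0) + g * d + 0.5 * h * d**2

for lam in (2.8, 3.0, 3.5, 5.0):
    # Approximation degrades as lam moves away from lam0.
    print(lam, loglik(lam), loglik_taylor(lam))
```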

12.
We explore the estimation of uncertainty in evolutionary parameters using a recently devised approach for resampling entire additive genetic variance–covariance matrices (G). Large‐sample theory shows that maximum‐likelihood estimates (including restricted maximum likelihood, REML) asymptotically have a multivariate normal distribution, with covariance matrix derived from the inverse of the information matrix, and mean equal to the estimated G. This suggests that sampling estimates of G from this distribution can be used to assess the variability of estimates of G, and of functions of G. We refer to this as the REML‐MVN method. This has been implemented in the mixed‐model program WOMBAT. Estimates of sampling variances from REML‐MVN were compared to those from the parametric bootstrap and from a Bayesian Markov chain Monte Carlo (MCMC) approach (implemented in the R package MCMCglmm). We apply each approach to evolvability statistics previously estimated for a large, 20‐dimensional data set for Drosophila wings. REML‐MVN and MCMC sampling variances are close to those estimated with the parametric bootstrap. Both slightly underestimate the error in the best‐estimated aspects of the G matrix. REML analysis supports the previous conclusion that the G matrix for this population is full rank. REML‐MVN is computationally very efficient, making it an attractive alternative to both data resampling and MCMC approaches to assessing confidence in parameters of evolutionary interest.
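A minimal REML-MVN-style sketch under invented numbers: draw replicate G matrices from a multivariate normal on the unique (vech) elements, with mean at the point estimate and a covariance standing in for the inverse information matrix, and read off the sampling spread of a derived quantity (here the leading eigenvalue rather than an evolvability statistic).

```python
# Sample G matrices from an asymptotic MVN and propagate to a function of G.
import numpy as np

rng = np.random.default_rng(4)
g_hat = np.array([1.0, 0.3, 0.8])        # vech of a 2x2 G: [G11, G12, G22] (made up)
cov_hat = np.diag([0.02, 0.01, 0.02])    # stand-in for the inverse information matrix

def to_matrix(v):
    """Rebuild a symmetric 2x2 matrix from its vech elements."""
    return np.array([[v[0], v[1]],
                     [v[1], v[2]]])

draws = rng.multivariate_normal(g_hat, cov_hat, size=5000)
lead_eig = np.array([np.linalg.eigvalsh(to_matrix(v))[-1] for v in draws])
print(lead_eig.mean(), lead_eig.std())   # approx. sampling error of the eigenvalue
```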

13.
Zero‐truncated data arises in various disciplines where counts are observed but the zero count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, due to its nonstandard form it cannot be easily implemented using well‐known software packages, and additional programming is often required. Motivated by the Rao–Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero‐truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models, and allows for applying readily available software. We evaluate the efficiency for this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.
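For reference, the direct maximum likelihood fit that this method is compared against is itself short in the zero-truncated Poisson case; the sketch below simulates truncated counts and maximizes the truncated log-likelihood numerically. Data, bounds, and the true rate are illustrative.

```python
# Direct MLE for a zero-truncated Poisson rate:
# log P(Y=y | Y>0) = y*log(lam) - lam - log(y!) - log(1 - exp(-lam)).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(5)
full = rng.poisson(2.5, size=2000)
y = full[full > 0]                       # zeros are unobservable by design

def neg_loglik(lam):
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1)
                   - np.log1p(-np.exp(-lam)))

res = minimize_scalar(neg_loglik, bounds=(0.01, 20), method="bounded")
print(res.x)                             # should land near the true rate 2.5
```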

14.
Hidden Markov modeling (HMM) can be applied to extract single channel kinetics at signal-to-noise ratios that are too low for conventional analysis. There are two general HMM approaches: traditional Baum's reestimation and direct optimization. The optimization approach has the advantage that it optimizes the rate constants directly. This allows setting constraints on the rate constants, fitting multiple data sets across different experimental conditions, and handling nonstationary channels where the starting probability of the channel depends on the unknown kinetics. We present here an extension of this approach that addresses the additional issues of low-pass filtering and correlated noise. The filtering is modeled using a finite impulse response (FIR) filter applied to the underlying signal, and the noise correlation is accounted for using an autoregressive (AR) process. In addition to correlated background noise, the algorithm allows for excess open channel noise that can be white or correlated. To maximize the efficiency of the algorithm, we derive the analytical derivatives of the likelihood function with respect to all unknown model parameters. The search of the likelihood space is performed using a variable metric method. Extension of the algorithm to data containing multiple channels is described. Examples are presented that demonstrate the applicability and effectiveness of the algorithm. Practical issues such as the selection of appropriate noise AR orders are also discussed through examples.

15.
Longitudinal data usually consist of a number of short time series. One or more groups of subjects are followed over time, with observations often taken at unequally spaced time points that may differ across subjects. When the errors and random effects are Gaussian, the likelihood of these unbalanced linear mixed models can be directly calculated, and nonlinear optimization used to obtain maximum likelihood estimates of the fixed regression coefficients and parameters in the variance components. For binary longitudinal data, a two state, non-homogeneous continuous time Markov process approach is used to model serial correlation within subjects. Formulating the model as a continuous time Markov process allows the observations to be equally or unequally spaced. Fixed and time varying covariates can be included in the model, and the continuous time model allows the estimation of the odds ratio for an exposure variable based on the steady state distribution. Exact likelihoods can be calculated. The initial probability distribution on the first observation on each subject is estimated using logistic regression that can involve covariates, and this estimation is embedded in the overall estimation. These models are applied to an intervention study designed to reduce children's sun exposure.
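A minimal sketch of the two-state continuous-time Markov chain piece: closed-form transition probabilities over an arbitrary gap, and the log-likelihood of one subject's irregularly spaced binary panel. For brevity the first observation is handled with the steady-state distribution rather than the logistic-regression initial model described above, and all rates and times are invented.

```python
# Two-state CTMC likelihood for unequally spaced binary observations.
import numpy as np

def trans_matrix(a, b, t):
    """P(t) for a 2-state CTMC with rates a (0->1) and b (1->0)."""
    s = a + b
    e = np.exp(-s * t)
    return np.array([[b/s + a/s * e, a/s * (1 - e)],
                     [b/s * (1 - e), a/s + b/s * e]])

a, b = 0.4, 0.9
times = np.array([0.0, 0.5, 2.0, 2.3, 6.0])    # irregular visit times
states = np.array([0, 0, 1, 1, 0])             # observed binary states

pi = np.array([b/(a+b), a/(a+b)])              # steady-state distribution
loglik = np.log(pi[states[0]])
for dt, s0, s1 in zip(np.diff(times), states[:-1], states[1:]):
    loglik += np.log(trans_matrix(a, b, dt)[s0, s1])
print(loglik)
```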

16.
The models of nucleotide substitution used by most maximum likelihood-based methods assume that the evolutionary process is stationary, reversible, and homogeneous. We present an extension of the Barry and Hartigan model, which can be used to estimate parameters by maximum likelihood (ML) when the data contain invariant sites and there are violations of the assumptions of stationarity, reversibility, and homogeneity. Unlike most ML methods for estimating invariant sites, we estimate the nucleotide composition of invariant sites separately from that of variable sites. We analyze a bacterial data set where problems due to lack of stationarity and homogeneity have been previously well noted and use the parametric bootstrap to show that the data are consistent with our general Markov model. We also show that estimates of invariant sites obtained using our method are fairly accurate when applied to data simulated under the general Markov model.

17.
This article develops a latent model and likelihood‐based inference to detect temporal clustering of events. The model mimics typical processes generating the observed data. We apply model selection techniques to determine the number of clusters, and develop likelihood inference and a Monte Carlo expectation–maximization algorithm to estimate model parameters, detect clusters, and identify cluster locations. Our method differs from the classical scan statistic in that we can simultaneously detect multiple clusters of varying sizes. We illustrate the methodology with two real data applications and evaluate its efficiency through simulation studies. For the typical data‐generating process, our methodology is more efficient than a competing procedure that relies on least squares.

18.
This paper discusses a two‐state hidden Markov Poisson regression (MPR) model for analyzing longitudinal data of epileptic seizure counts, which allows for the rate of the Poisson process to depend on covariates through an exponential link function and to change according to the states of a two‐state Markov chain with its transition probabilities associated with covariates through a logit link function. This paper also considers a two‐state hidden Markov negative binomial regression (MNBR) model, as an alternative, by using the negative binomial instead of Poisson distribution in the proposed MPR model when there exists extra‐Poisson variation conditional on the states of the Markov chain. The two proposed models in this paper relax the stationary requirement of the Markov chain, allow for overdispersion relative to the usual Poisson regression model and for correlation between repeated observations. The proposed methodology provides a plausible analysis for the longitudinal data of epileptic seizure counts, and the MNBR model fits the data much better than the MPR model. Maximum likelihood estimation using the EM and quasi‐Newton algorithms is discussed. A Monte Carlo study for the proposed MPR model investigates the reliability of the estimation method, the choice of probabilities for the initial states of the Markov chain, and some finite sample behaviors of the maximum likelihood estimates, suggesting that (1) the estimation method is accurate and reliable as long as the total number of observations is reasonably large, and (2) the choice of probabilities for the initial states of the Markov process has little impact on the parameter estimates.

19.
R. J. Cook, Biometrics, 1999, 55(3), 915–920.
Many chronic medical conditions can be meaningfully characterized in terms of a two-state stochastic process. Here we consider the problem in which subjects make transitions between two such states in continuous time but are only observed at discrete, irregularly spaced time points that are possibly unique to each subject. Data arising from such an observation scheme are called panel data, and methods for related analyses are typically based on Markov assumptions. The purpose of this article is to present a conditionally Markov model that accommodates subject-to-subject variation in the model parameters by the introduction of random effects. We focus on a particular random effects formulation that generates a closed-form expression for the marginal likelihood. The methodology is illustrated by application to a data set from a parasitic field infection survey.

20.
In this paper, our aim is to analyze geographical and temporal variability of disease incidence when spatio‐temporal count data have excess zeros. To that end, we consider random effects in zero‐inflated Poisson models to investigate geographical and temporal patterns of disease incidence. Spatio‐temporal models that employ conditionally autoregressive smoothing across the spatial dimension and B‐spline smoothing over the temporal dimension are proposed. The analysis of these complex models is computationally difficult from the frequentist perspective. On the other hand, the advent of the Markov chain Monte Carlo algorithm has made the Bayesian analysis of complex models computationally convenient. The recently developed data cloning method provides a frequentist approach to mixed models that is also computationally convenient. We propose to use data cloning, which yields maximum likelihood estimates, to conduct frequentist analysis of zero‐inflated spatio‐temporal modeling of disease incidence. One of the advantages of the data cloning approach is that predictions and corresponding standard errors (or prediction intervals) of the smoothed disease incidence over space and time are easily obtained. We illustrate our approach using a real dataset of monthly children's asthma visits to hospital in the province of Manitoba, Canada, during the period April 2006 to March 2010. Performance of our approach is also evaluated through a simulation study.
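A minimal data-cloning sketch, with the model pared down so it fits in a few lines: a zero-inflated Poisson rate with the zero-inflation probability held fixed, a flat prior, and a random-walk Metropolis sampler on the K-fold cloned likelihood. As K grows the cloned posterior mean approaches the MLE and its standard deviation scaled by sqrt(K) stabilizes at the MLE's standard error. Everything here is illustrative; the spatio-temporal model above is far richer.

```python
# Data cloning for a zero-inflated Poisson rate via random-walk Metropolis.
import numpy as np

rng = np.random.default_rng(6)
psi, lam_true, n = 0.3, 2.0, 300                # true ZIP parameters (invented)
y = np.where(rng.random(n) < psi, 0, rng.poisson(lam_true, n))

def loglik(lam):
    # ZIP log-likelihood with psi known; constants in y (log y!) dropped.
    p0 = psi + (1 - psi) * np.exp(-lam)
    return np.sum(np.where(y == 0, np.log(p0),
                           np.log(1 - psi) + y * np.log(lam) - lam))

def clone_posterior(K, n_iter=20000, step=0.1):
    lam, ll, samples = 1.0, loglik(1.0), []
    for _ in range(n_iter):
        prop = lam + step * rng.normal()
        if prop > 0:
            ll_prop = loglik(prop)
            # Flat prior: acceptance ratio is K times the log-likelihood difference.
            if np.log(rng.random()) < K * (ll_prop - ll):
                lam, ll = prop, ll_prop
        samples.append(lam)
    return np.array(samples[n_iter // 4:])       # discard burn-in

for K in (1, 10, 50):
    s = clone_posterior(K)
    print(K, s.mean(), s.std() * np.sqrt(K))     # mean -> MLE; scaled sd stabilizes
```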

