Found 20 similar articles (search time: 15 ms)
We consider longitudinal studies in which the outcome observed over time is binary and the covariates of interest are categorical. With no missing responses or covariates, one specifies a multinomial model for the responses given the covariates and uses maximum likelihood to estimate the parameters. Unfortunately, incomplete data in the responses and covariates are a common occurrence in longitudinal studies. Here we assume the missing data are missing at random (Rubin, 1976, Biometrika 63, 581-592). Since all of the missing data (responses and covariates) are categorical, a useful technique for obtaining maximum likelihood parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). In using the EM algorithm with missing responses and covariates, one specifies the joint distribution of the responses and covariates. Here we consider the parameters of the covariate distribution as a nuisance. In data sets where the percentage of missing data is high, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modeling advantages for the EM algorithm and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples.
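As a rough illustration of the weighting idea in the abstract above, the following Python sketch runs EM for one binary response y and one binary covariate x that is missing for some subjects. This is a deliberately reduced, cross-sectional setting and not the longitudinal model of the article. The function name `em_method_of_weights`, the record layout, and the starting values are assumptions made for this example. Each record with a missing x is treated as both possible values of x, weighted by the current posterior probability of that value. Because the model is saturated in this toy case, the weighted M-step reduces to weighted proportions rather than a weighted regression fit.

```python
def em_method_of_weights(records, n_iter=1000, tol=1e-12):
    """EM for P(y=1 | x) and P(x=1) when some x values are missing (None).

    records: list of (y, x) pairs with y in {0, 1} and x in {0, 1, None}.
    Returns (alpha, p) with alpha = P(x=1) and p[v] = P(y=1 | x=v).
    Hypothetical helper, for illustration only.
    """
    alpha = 0.5      # nuisance parameter of the covariate distribution
    p = [0.5, 0.5]   # parameters of interest
    n = len(records)
    for _ in range(n_iter):
        n_x = [0.0, 0.0]   # weighted number of subjects with x = v
        n_xy = [0.0, 0.0]  # weighted number with x = v and y = 1
        for y, x in records:
            if x is None:
                # E-step: weight each possible x by P(x | y) under the current fit
                joint = []
                for v in (0, 1):
                    p_y = p[v] if y == 1 else 1.0 - p[v]
                    p_x = alpha if v == 1 else 1.0 - alpha
                    joint.append(p_y * p_x)
                total = joint[0] + joint[1]
                weights = [joint[0] / total, joint[1] / total]
            else:
                # Observed covariate: all weight on the observed value
                weights = [1.0 - x, float(x)]
            for v in (0, 1):
                n_x[v] += weights[v]
                n_xy[v] += weights[v] * y
        # M-step: weighted proportions (closed form because the model is saturated)
        new_alpha = n_x[1] / n
        new_p = [n_xy[v] / n_x[v] for v in (0, 1)]
        change = abs(new_alpha - alpha) + sum(abs(a - b) for a, b in zip(new_p, p))
        alpha, p = new_alpha, new_p
        if change < tol:
            break
    return alpha, p


# Made-up data: 100 complete records plus 50 records with x missing
records = ([(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 20 + [(0, 0)] * 40
           + [(1, None)] * 20 + [(0, None)] * 30)
alpha_hat, p_hat = em_method_of_weights(records)
print(alpha_hat, p_hat)
```

With more covariates or a non-saturated model, the M-step would typically need an iterative weighted fit, and the number of nuisance parameters in the covariate distribution grows quickly, which is the instability the abstract refers to.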

Pan W  Lin X  Zeng D 《Biometrics》2006,62(2):402-412
We propose a new class of models, transition measurement error models, to study the effects of covariates and the past responses on the current response in longitudinal studies when one of the covariates is measured with error. We show that the response variable conditional on the error-prone covariate follows a complex transition mixed effects model. The naive model obtained by ignoring the measurement error correctly specifies the transition part of the model, but misspecifies the covariate effect structure and ignores the random effects. We next study the asymptotic bias in the naive estimator obtained by ignoring the measurement error for both continuous and discrete outcomes. We show that the naive estimator of the regression coefficient of the error-prone covariate is attenuated, while the naive estimators of the regression coefficients of the past responses are generally inflated. We then develop a structural modeling approach for parameter estimation using the maximum likelihood estimation method. In view of the multidimensional integration required by full maximum likelihood estimation, an EM algorithm is developed to calculate maximum likelihood estimators, in which Monte Carlo simulations are used to evaluate the conditional expectations in the E-step. We evaluate the performance of the proposed method through a simulation study and apply it to a longitudinal social support study for elderly women with heart disease. An additional simulation study shows that the Bayesian information criterion (BIC) performs well in choosing the correct transition orders of the models.

This article investigates maximum likelihood estimation with saturated and unsaturated models for correlated exchangeable binary data, when a sample of independent clusters of varying sizes is available. We discuss various parameterizations of these models, and propose using the EM algorithm to obtain maximum likelihood estimates. The methodology is illustrated by applications to a study of familial disease aggregation and to the design of a proposed group randomized cancer prevention trial.

Stubbendick AL  Ibrahim JG 《Biometrics》2003,59(4):1140-1150
This article analyzes quality of life (QOL) data from an Eastern Cooperative Oncology Group (ECOG) melanoma trial that compared treatment with ganglioside vaccination to treatment with high-dose interferon. The analysis of this data set is challenging due to several difficulties, namely, nonignorable missing longitudinal responses and baseline covariates. Hence, we propose a selection model for estimating parameters in the normal random effects model with nonignorable missing responses and covariates. Parameters are estimated via maximum likelihood using the Gibbs sampler and a Monte Carlo expectation maximization (EM) algorithm. Standard errors are calculated using the bootstrap. The method allows for nonmonotone patterns of missing data in both the response variable and the covariates. We model the missing data mechanism and the missing covariate distribution via a sequence of one-dimensional conditional distributions, allowing the missing covariates to be either categorical or continuous, as well as time-varying. We apply the proposed approach to the ECOG quality-of-life data and conduct a small simulation study evaluating the performance of the maximum likelihood estimates. Our results indicate that a patient treated with the vaccine has a higher QOL score on average at a given time point than a patient treated with high-dose interferon.

Lin H  Guo Z  Peduzzi PN  Gill TM  Allore HG 《Biometrics》2008,64(4):1032-1042
SUMMARY: We propose a general multistate transition model. The model is developed for the analysis of repeated episodes of multiple states representing different health status. Transitions among multiple states are modeled jointly using multivariate latent traits with factor loadings. Different types of state transition are described by flexible transition-specific nonparametric baseline intensities. A state-specific latent trait is used to capture individual tendency of the sojourn in the state that cannot be explained by covariates and to account for correlation among repeated sojourns in the same state within an individual. Correlation among sojourns across different states within an individual is accounted for by the correlation between the different latent traits. The factor loadings for a latent trait accommodate the dependence of the transitions to different competing states from the same state. We obtain the semiparametric maximum likelihood estimates through an expectation-maximization (EM) algorithm. The method is illustrated by studying repeated transitions between independence and disability states of activities of daily living (ADL) with death as an absorbing state in a longitudinal aging study. The performance of the estimation procedure is assessed by simulation studies.

Titman AC 《Biometrics》2011,67(3):780-787
Methods for fitting nonhomogeneous Markov models to panel-observed data using direct numerical solution to the Kolmogorov Forward equations are developed. Nonhomogeneous Markov models occur most commonly when baseline transition intensities depend on calendar time, but may also occur with deterministic time-dependent covariates such as age. We propose transition intensities based on B-splines as a smooth alternative to piecewise constant intensities and also as a generalization of time transformation models. An expansion of the system of differential equations allows first derivatives of the likelihood to be obtained, which can be used in a Fisher scoring algorithm for maximum likelihood estimation. The method is evaluated through a small simulation study and demonstrated on data relating to the development of cardiac allograft vasculopathy in posttransplantation patients.

Horton NJ  Laird NM 《Biometrics》2001,57(1):34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.

We propose a novel connectionist method for the use of different feature sets in pattern classification. Unlike traditional methods, e.g., combination of multiple classifiers and use of a composite feature set, our method copes with the problem based on an idea of soft competition on different feature sets developed in our earlier work. An alternative modular neural network architecture is proposed to provide a more effective implementation of soft competition on different feature sets. The proposed architecture is interpreted as a generalized finite mixture model and, therefore, parameter estimation is treated as a maximum likelihood problem. An EM algorithm is derived for parameter estimation and, moreover, a model selection method is proposed to fit the proposed architecture to a specific problem. Comparative results are presented for the real world problem of speaker identification.

We consider the problem of estimating segregation ratios in families based on ascertainment through affected children, formulate it as an incomplete-data problem and work out the EM algorithm for maximum likelihood estimation of segregation ratios. We treat both the cases of known and unknown ascertainment probability. We also derive expressions for the covariance matrix of the estimators suitable for computing along with the EM algorithm. We illustrate the method with an example, compare the computational effort with that required in using the scoring method and argue that the EM algorithm is simpler.

Standard optimization algorithms for maximizing likelihood may not be applicable to the estimation of those flexible multivariable models that are nonlinear in their parameters. For applications where the model's structure permits separating estimation of mutually exclusive subsets of parameters into distinct steps, we propose the alternating conditional estimation (ACE) algorithm. We validate the algorithm, in simulations, for estimation of two flexible extensions of Cox's proportional hazards model where the standard maximum partial likelihood estimation does not apply, with simultaneous modeling of (1) nonlinear and time‐dependent effects of continuous covariates on the hazard, and (2) nonlinear interaction and main effects of the same variable. We also apply the algorithm in real‐life analyses to estimate nonlinear and time‐dependent effects of prognostic factors for mortality in colon cancer. Analyses of both simulated and real‐life data illustrate good statistical properties of the ACE algorithm and its ability to yield new potentially useful insights about the data structure.

A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.

Roy J  Lin X 《Biometrics》2005,61(3):837-846
We consider estimation in generalized linear mixed models (GLMM) for longitudinal data with informative dropouts. At the time a unit drops out, time-varying covariates are often unobserved in addition to the missing outcome. However, existing informative dropout models typically require covariates to be completely observed. This assumption is not realistic in the presence of time-varying covariates. In this article, we first study the asymptotic bias that would result from applying existing methods, where missing time-varying covariates are handled using naive approaches, which include: (1) using only baseline values; (2) carrying forward the last observation; and (3) assuming the missing data are ignorable. Our asymptotic bias analysis shows that these naive approaches yield inconsistent estimators of model parameters. We next propose a selection/transition model that allows covariates to be missing in addition to the outcome variable at the time of dropout. The EM algorithm is used for inference in the proposed model. Data from a longitudinal study of human immunodeficiency virus (HIV)-infected women are used to illustrate the methodology.

Many different methods for evaluating diagnostic test results in the absence of a gold standard have been proposed. In this paper, we discuss how one common method, a maximum likelihood estimate for a latent class model found via the Expectation-Maximization (EM) algorithm, can be applied to longitudinal data where test sensitivity changes over time. We also propose two simplified and nonparametric methods which use data-based indicator variables for disease status and compare their accuracy to the maximum likelihood estimation (MLE) results. We find that with high specificity tests, the performance of simpler approximations may be just as high as the MLE.
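To make the latent class approach mentioned above more concrete, here is a minimal Python sketch of EM for a two-class model with several binary tests that are assumed conditionally independent given the true disease status. It does not model sensitivity that changes over time, so it covers only the basic building block rather than the longitudinal method of the article. The function name `fit_latent_class_em`, the input layout (a dict from test-result patterns to counts), and the starting values are assumptions for this example.

```python
import math


def fit_latent_class_em(pattern_counts, n_iter=2000, tol=1e-9):
    """EM for prevalence, sensitivities and specificities without a gold standard.

    pattern_counts: dict mapping a tuple of 0/1 test results to its count.
    Assumes tests are independent given disease status (hypothetical helper).
    Returns (prevalence, sensitivities, specificities, log-likelihood trace).
    """
    n_tests = len(next(iter(pattern_counts)))
    total = sum(pattern_counts.values())
    prev = 0.5
    sens = [0.8] * n_tests
    spec = [0.8] * n_tests
    trace = []
    for _ in range(n_iter):
        diseased = 0.0              # expected number of diseased subjects
        pos_d = [0.0] * n_tests     # expected diseased subjects testing positive
        neg_nd = [0.0] * n_tests    # expected non-diseased subjects testing negative
        loglik = 0.0
        for pattern, count in pattern_counts.items():
            lik_d, lik_nd = prev, 1.0 - prev
            for j, t in enumerate(pattern):
                lik_d *= sens[j] if t else 1.0 - sens[j]
                lik_nd *= 1.0 - spec[j] if t else spec[j]
            # E-step: posterior probability of disease for this pattern
            post = lik_d / (lik_d + lik_nd)
            loglik += count * math.log(lik_d + lik_nd)
            diseased += count * post
            for j, t in enumerate(pattern):
                if t:
                    pos_d[j] += count * post
                else:
                    neg_nd[j] += count * (1.0 - post)
        trace.append(loglik)
        # M-step: update parameters from expected counts
        prev = diseased / total
        sens = [pos_d[j] / diseased for j in range(n_tests)]
        spec = [neg_nd[j] / (total - diseased) for j in range(n_tests)]
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
    return prev, sens, spec, trace
```

At least three tests are generally needed for this model to be identifiable, and the labels "diseased" and "non-diseased" can swap if the starting values place sensitivity and specificity below 0.5.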

Huiping Xu  Bruce A. Craig 《Biometrics》2009,65(4):1145-1155
Summary: Traditional latent class modeling has been widely applied to assess the accuracy of dichotomous diagnostic tests. These models, however, assume that the tests are independent conditional on the true disease status, which is rarely valid in practice. Alternative models using probit analysis have been proposed to incorporate dependence among tests, but these models consider restricted correlation structures. In this article, we propose a probit latent class (PLC) model that allows a general correlation structure. When combined with some helpful diagnostics, this model provides a more flexible framework from which to evaluate the correlation structure and model fit. Our model encompasses several other PLC models but uses a parameter‐expanded Monte Carlo EM algorithm to obtain the maximum‐likelihood estimates. The parameter‐expanded EM algorithm was designed to accelerate the convergence rate of the EM algorithm by expanding the complete‐data model to include a larger set of parameters and it ensures a simple solution in fitting the PLC model. We demonstrate our estimation and model selection methods using a simulation study and two published medical studies.

This article focuses on parameter estimation of multilevel nonlinear mixed-effects models (MNLMEMs). These models are used to analyze data presenting multiple hierarchical levels of grouping (cluster data, clinical trials with several observation periods, ...). The variability of the individual parameters of the regression function is thus decomposed as a between-subject variability and higher levels of variability (e.g. within-subject variability). We propose maximum likelihood estimates of parameters of those MNLMEMs with 2 levels of random effects, using an extension of the stochastic approximation version of expectation–maximization (SAEM)–Monte Carlo Markov chain algorithm. The extended SAEM algorithm is split into an explicit direct expectation–maximization (EM) algorithm and a stochastic EM part. Compared to the original algorithm, additional sufficient statistics have to be approximated by relying on the conditional distribution of the second level of random effects. This estimation method is evaluated on pharmacokinetic crossover simulated trials, mimicking theophylline concentration data. Results obtained on those data sets with either the SAEM algorithm or the first-order conditional estimates (FOCE) algorithm (implemented in the nlme function of R software) are compared: biases and root mean square errors of almost all the SAEM estimates are smaller than the FOCE ones. Finally, we apply the extended SAEM algorithm to analyze the pharmacokinetic interaction of tenofovir on atazanavir, a novel protease inhibitor, from the Agence Nationale de Recherche sur le Sida 107-Puzzle 2 study. A significant decrease of the area under the curve of atazanavir is found in patients receiving both treatments.

Hsieh F  Tseng YK  Wang JL 《Biometrics》2006,62(4):1037-1043
The maximum likelihood approach to jointly modeling the survival time and its longitudinal covariates has been successful in modeling both processes in longitudinal studies. Random effects in the longitudinal process are often used to model the survival times through a proportional hazards model, and this invokes an EM algorithm to search for the maximum likelihood estimates (MLEs). Several intriguing issues are examined here, including the robustness of the MLEs against departure from the normal random effects assumption, and difficulties with the profile likelihood approach to provide reliable estimates for the standard error of the MLEs. We provide insights into the robustness property and suggest overcoming the difficulty of obtaining reliable estimates for the standard errors by using bootstrap procedures. Numerical studies and data analysis illustrate our points.

In many biometrical applications, the count data encountered often contain extra zeros relative to the Poisson distribution. Zero‐inflated Poisson regression models are useful for analyzing such data, but parameter estimates may be seriously biased if the nonzero observations are over‐dispersed and simultaneously correlated due to the sampling design or the data collection procedure. In this paper, a zero‐inflated negative binomial mixed regression model is presented to analyze a set of pancreas disorder length of stay (LOS) data that comprised mainly same‐day separations. Random effects are introduced to account for inter‐hospital variations and the dependency of clustered LOS observations. Parameter estimation is achieved by maximizing an appropriate log‐likelihood function using an EM algorithm. Alternative modeling strategies, namely the finite mixture of Poisson distributions and the non‐parametric maximum likelihood approach, are also considered. The determination of pertinent covariates would assist hospital administrators and clinicians to manage LOS and expenditures efficiently.
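The role of EM in zero-inflated models can be seen in a much simpler case than the one treated above. The Python sketch below fits a plain zero-inflated Poisson distribution with no covariates, no random effects, and no negative binomial over-dispersion, so it is a simplified stand-in and not the model of the article. The function name `fit_zip_em` and the starting values are assumptions for this example. The latent variable is whether each observed zero is a structural zero or an ordinary Poisson zero.

```python
import math


def fit_zip_em(counts, n_iter=1000, tol=1e-12):
    """EM for a zero-inflated Poisson: P(structural zero) = pi, else Poisson(lam).

    counts: list of non-negative integers. Hypothetical helper for illustration.
    Returns (pi, lam).
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5, max(total / n, 1e-8)  # crude starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = n_zero * w
        # M-step: mixing proportion, and Poisson mean over the non-structural part
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)
        change = abs(new_pi - pi) + abs(new_lam - lam)
        pi, lam = new_pi, new_lam
        if change < tol:
            break
    return pi, lam


# Made-up data dominated by zeros, loosely in the spirit of same-day separations
data = [0] * 60 + [1] * 10 + [2] * 15 + [3] * 10 + [4] * 5
pi_hat, lam_hat = fit_zip_em(data)
print(pi_hat, lam_hat)
```

In a regression version, the two closed-form updates would be replaced by a weighted logistic fit for the zero-inflation part and a weighted count-model fit for the other part.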

Maximum-likelihood approaches to phylogenetic estimation have the potential of great flexibility, even though current implementations are highly constrained. One such constraint has been the limitation to one-parameter models of substitution. A general implementation of Newton's maximization procedure was developed that allows the maximum likelihood method to be used with multiparameter models. The Estimate and Maximize (EM) algorithm was also used to obtain a good approximation to the maximum likelihood for a certain class of multiparameter models. The condition for which a multiparameter model will only have a single maximum on the likelihood surface was identified. Two- and three-parameter models of substitution in base-paired regions of RNA sequences were used as examples for computer simulations to show that these implementations of the maximum likelihood method are not substantially slower than one-parameter models. Newton's method is much faster than the EM method but may be subject to divergence in some circumstances. In these cases the EM method can be used to restore convergence.

S G Baker 《Biometrics》1990,46(4):1193-7, Discussion 1198-200
A simple EM algorithm is proposed for obtaining maximum likelihood estimates when fitting a loglinear model to data from k capture-recapture samples with categorical covariates. The method is used to analyze data on screening for the early detection of breast cancer.

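The EM algorithm is a general method for obtaining maximum likelihood estimates of parameters from incomplete data. This paper derives EM algorithms for estimating the genetic recombination fraction between two loci under different marker types, covering three patterns (codominant-codominant, codominant-dominant, and dominant-dominant), together with a bootstrap method for obtaining the sampling variance of the recombination fraction; these are then extended to estimation of the recombination fraction when some individuals have missing marker genotypes (no electrophoretic band detected). Extensive Monte Carlo simulations show that: (1) under tight linkage, sample size has little effect on the estimate of the recombination fraction, whereas under loose linkage a larger sample size is needed to detect linkage and to estimate the recombination fraction with reasonable precision; (2) estimating the recombination fraction from all individuals, including those with missing markers, is more accurate than using only the individuals without missing markers, and significantly improves the statistical power of linkage detection.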
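The abstract above does not give the update formulas, so the following Python sketch shows one textbook-style instance under stated assumptions: an F2 intercross from a coupling-phase F1, two codominant markers, and no missing genotypes. The population type, the four-class summary of the nine genotype classes, and the function names `em_recombination` and `bootstrap_se` are all assumptions for this example, not details taken from the paper. The only incomplete information is in the double heterozygotes, which carry either zero or two recombinant gametes.

```python
import random


def em_recombination(n_nonrec, n_single, n_double, n_dhet, n_iter=1000, tol=1e-12):
    """EM estimate of the recombination fraction r in an assumed F2 intercross.

    n_nonrec: individuals with 0 recombinant gametes (both parental homozygotes)
    n_single: individuals with exactly 1 recombinant gamete
    n_double: individuals with 2 recombinant gametes (recombinant homozygotes)
    n_dhet:   double heterozygotes (0 or 2 recombinant gametes, phase unobserved)
    """
    n = n_nonrec + n_single + n_double + n_dhet
    r = 0.25  # starting value
    for _ in range(n_iter):
        # E-step: P(both gametes recombinant | double heterozygote)
        w = r * r / (r * r + (1.0 - r) ** 2)
        expected_recombinants = n_single + 2 * n_double + 2 * w * n_dhet
        # M-step: recombinant gametes as a fraction of all 2n gametes
        new_r = expected_recombinants / (2 * n)
        done = abs(new_r - r) < tol
        r = new_r
        if done:
            break
    return r


def bootstrap_se(counts, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error of r from the four class counts."""
    rng = random.Random(seed)
    n = sum(counts)
    classes = list(range(4))
    estimates = []
    for _ in range(n_boot):
        draw = rng.choices(classes, weights=counts, k=n)  # resample individuals
        resampled = [draw.count(c) for c in classes]
        estimates.append(em_recombination(*resampled))
    mean = sum(estimates) / n_boot
    variance = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return variance ** 0.5


# Made-up counts for 1000 F2 individuals
counts = [320, 320, 20, 340]
print(em_recombination(*counts), bootstrap_se(counts))
```

Dominant markers and missing genotypes would add further classes whose recombinant-gamete counts are unobserved; each would need its own posterior weight in the E-step, in the same way as the double heterozygotes here.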
