A serially correlated gamma frailty model for longitudinal count data   Cited by: 3 (self-citations: 0, citations by others: 3)

Zhang Y  Jamshidian M 《Biometrics》2003,59(4):1099-1106
In this article, we study nonparametric estimation of the mean function of a counting process with panel observations. We introduce the gamma frailty variable to account for the intracorrelation between the panel counts of the counting process and construct a maximum pseudo-likelihood estimate with the frailty variable. Three simulated examples are given to show that this estimation procedure, while preserving the robustness and simplicity of the computation, improves the efficiency of the nonparametric maximum pseudo-likelihood estimate studied in Wellner and Zhang (2000, Annals of Statistics 28, 779-814). A real example from a bladder tumor study is used to illustrate the method.

A state space model for multivariate longitudinal count data   Cited by: 1 (self-citations: 0, citations by others: 1)

We propose a likelihood-based model for correlated count data that display under- or overdispersion within units (e.g. subjects). The model is capable of handling correlation due to clustering and/or serial correlation, in the presence of unbalanced, missing or unequally spaced data. A family of distributions based on birth-event processes is used to model within-subject underdispersion. A computational approach is given to overcome a parameterization difficulty with this family, and this allows use of common Markov Chain Monte Carlo software (e.g. WinBUGS) for estimation. Application of the model to daily counts of asthma inhaler use by children shows substantial within-subject underdispersion, between-subject heterogeneity and correlation due to both clustering of measurements within subjects and serial correlation of longitudinal measurements. The model provides a major improvement over Poisson longitudinal models, and diagnostics show that the model fits well.

This paper presents new methods, using a Bayesian approach, for analyzing longitudinal count data with excess zeros and nonlinear effects of continuously valued covariates. Two problems can make a zero-inflated Poisson (ZIP) model ineffective for longitudinal count data: unobserved heterogeneity and nonlinear effects of continuously valued covariates. Our proposed semiparametric model handles both problems simultaneously in a unified framework. The framework accounts for heterogeneity by incorporating random effects and has two components: a parametric component, which deals with the linear effects of time-invariant covariates, and a nonparametric component, which uses an arbitrary smooth function to model the effect of time or time-varying covariates on the logarithm of the mean count. The proposed methods are illustrated by analyzing longitudinal count data from an assessment of the efficacy of pesticides in controlling the reproduction of whitefly.

Some covariance models for longitudinal count data with overdispersion   Cited by: 9 (self-citations: 0, citations by others: 9)
P F Thall  S C Vail 《Biometrics》1990,46(3):657-671
A family of covariance models for longitudinal counts with predictive covariates is presented. These models account for overdispersion, heteroscedasticity, and dependence among repeated observations. The approach is a quasi-likelihood regression similar to the formulation given by Liang and Zeger (1986, Biometrika 73, 13-22). Generalized estimating equations for both the covariate parameters and the variance-covariance parameters are presented. Large-sample properties of the parameter estimates are derived. The proposed methods are illustrated by an analysis of epileptic seizure count data arising from a study of progabide as an adjuvant therapy for partial seizures.

P F Thall 《Biometrics》1988,44(1):197-209
In many longitudinal studies it is desired to estimate and test the rate over time of a particular recurrent event. Often only the event counts corresponding to the elapsed time intervals between each subject's successive observation times, and baseline covariate data, are available. The intervals may vary substantially in length and number between subjects, so that the corresponding vectors of counts are not directly comparable. A family of Poisson likelihood regression models incorporating a mixed random multiplicative component in the rate function of each subject is proposed for this longitudinal data structure. A related empirical Bayes estimate of random-effect parameters is also described. These methods are illustrated by an analysis of dyspepsia data from the National Cooperative Gallstone Study.

We propose a state space model for analyzing equally or unequally spaced longitudinal count data with serial correlation. With a log link function, the mean of the Poisson response variable is a nonlinear function of the fixed and random effects. The random effects are assumed to be generated from a Gaussian first-order autoregression (AR(1)). In this case, the mean of the observations has a lognormal distribution. We use a combination of linear and nonlinear methods to take advantage of the Gaussian process embedded in a nonlinear function. The state space model uses a modified Kalman filter recursion to estimate the mean and variance of the AR(1) random error given the previous observations. The marginal likelihood is approximated by numerically integrating out the AR(1) random error. Simulation studies with different sets of parameters show that the state space model performs well. The model is applied to epileptic seizure data and primary care visits data. Missing and unequally spaced observations are handled naturally with this model.
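
The data-generating structure described in this abstract (Poisson counts whose log mean contains a latent Gaussian AR(1) term, observed at possibly unequal spacings) can be sketched as a small simulation. This is a minimal illustration only, not the authors' estimation procedure: it does not implement the modified Kalman filter or the numerical integration of the likelihood. The function names, the parameters `beta0`, `phi` and `sigma`, and the convention that the correlation over a gap of length `dt` is `phi ** dt` are all assumptions made for this sketch.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (suitable for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(times, beta0, phi, sigma, seed=0):
    """Simulate counts y_t ~ Poisson(exp(beta0 + a_t)) with a latent AR(1) state a_t.

    Assumption: sigma is the stationary standard deviation of a_t, and the
    correlation between states separated by dt time units is phi ** dt, which
    lets the observation times be unequally spaced.
    """
    rng = random.Random(seed)
    counts, state, prev_time = [], None, None
    for t in times:
        if state is None:
            # Start the latent process from its stationary distribution.
            state = rng.gauss(0.0, sigma)
        else:
            rho = phi ** (t - prev_time)
            innovation_sd = sigma * math.sqrt(1.0 - rho ** 2)
            state = rho * state + rng.gauss(0.0, innovation_sd)
        counts.append(poisson_draw(rng, math.exp(beta0 + state)))
        prev_time = t
    return counts


if __name__ == "__main__":
    # Unequally spaced visit times for one hypothetical subject.
    print(simulate_counts([0.0, 1.0, 2.5, 3.0, 7.0], beta0=1.0, phi=0.8, sigma=0.5))
```

Because the exponent of a Gaussian state enters the Poisson mean, the conditional mean is lognormally distributed, which matches the statement in the abstract.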

Overdispersed count data are very common in ecology. The negative binomial model has been used widely to represent such data. Ecological data often vary considerably, and traditional approaches are likely to be inefficient or incorrect due to underestimation of uncertainty and poor predictive power. We propose a new statistical model to account for excessive overdispersion. It is the combination of two negative binomial models, where the first determines the number of clusters and the second the number of individuals in each cluster. Simulations show that this model often performs better than the negative binomial model. This model also fitted catch and effort data for southern bluefin tuna better than other models according to AIC. A model that explicitly and properly accounts for overdispersion should contribute to robust management and conservation for wildlife and plants.
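
The two-stage structure described in this abstract, one negative binomial for the number of clusters and a second for the number of individuals in each cluster, can be sketched as a simulation. This is a hedged illustration rather than the authors' model: the mean/size parameterization, the gamma-Poisson construction of the negative binomial, the possibility of empty clusters, and all function and parameter names are assumptions of this sketch.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (suitable for modest means)."""
    if mean <= 0.0:
        return 0
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def neg_binomial_draw(rng, mean, size):
    """Negative binomial as a gamma-Poisson mixture, with variance mean + mean**2 / size."""
    rate = rng.gammavariate(size, mean / size)  # shape=size, scale=mean/size
    return poisson_draw(rng, rate)


def compound_draw(rng, cluster_mean, cluster_size, indiv_mean, indiv_size):
    """Total count: a negative binomial number of clusters, each holding a negative binomial count."""
    n_clusters = neg_binomial_draw(rng, cluster_mean, cluster_size)
    return sum(neg_binomial_draw(rng, indiv_mean, indiv_size) for _ in range(n_clusters))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [compound_draw(rng, 3.0, 1.0, 4.0, 1.0) for _ in range(2000)]
    mean = sum(sample) / len(sample)
    variance = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
    # The variance-to-mean ratio indicates how overdispersed the totals are.
    print(mean, variance / mean)
```

Under these assumptions the expected total is `cluster_mean * indiv_mean`, while the variance compounds the dispersion of both stages, which is why such a model can exceed the overdispersion a single negative binomial allows.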

Cognition is not directly measurable. It is assessed using psychometric tests, which can be viewed as quantitative measures of cognition with error. The aim of this article is to propose a model to describe the evolution in continuous time of unobserved cognition in the elderly and assess the impact of covariates directly on it. The latent cognitive process is defined using a linear mixed model including a Brownian motion and time-dependent covariates. The observed psychometric tests are considered as the results of parameterized nonlinear transformations of the latent cognitive process at discrete occasions. Estimation of the parameters contained both in the transformations and in the linear mixed model is achieved by maximizing the observed likelihood, and graphical methods are used to assess the goodness of fit of the model. The method is applied to data from PAQUID, a French prospective cohort study of ageing.



High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data.


We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset.


Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can widely be applied to other types of tag count data and to microarray data.
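
The key concept in this abstract, removing potential DEGs before calculating the normalization factor, can be sketched for two samples as follows. This is a deliberately simplified stand-in: the packages named in the abstract (edgeR, DESeq, baySeq, NBPSeq) each use their own normalization and gene-ranking methods, whereas this sketch ranks genes by absolute log fold change and scales by total counts. The function names and the `drop_fraction` parameter are hypothetical.

```python
import math


def size_factor(counts_a, counts_b, keep):
    """Ratio of total counts (sample b over sample a) using only the gene indices in keep."""
    total_a = sum(counts_a[i] for i in keep)
    total_b = sum(counts_b[i] for i in keep)
    return total_b / total_a


def normalize_without_degs(counts_a, counts_b, drop_fraction=0.2):
    """Return (normalization factor, dropped gene indices) after excluding potential DEGs."""
    genes = list(range(len(counts_a)))

    # Step 1: provisional factor from all genes.
    provisional = size_factor(counts_a, counts_b, genes)

    # Step 2: rank genes by absolute log fold change after provisional scaling.
    # The +1 pseudocount avoids taking the log of zero.
    def abs_log_fold_change(i):
        return abs(math.log((counts_b[i] + 1) / (counts_a[i] * provisional + 1)))

    ranked = sorted(genes, key=abs_log_fold_change, reverse=True)
    dropped = set(ranked[: int(len(genes) * drop_fraction)])

    # Step 3: recompute the factor from the genes that were not flagged.
    kept = [i for i in genes if i not in dropped]
    return size_factor(counts_a, counts_b, kept), sorted(dropped)


if __name__ == "__main__":
    # Nine genes differ only by sequencing depth (2x); the last gene is strongly up in sample b.
    a = [100] * 10
    b = [200] * 9 + [2000]
    print(size_factor(a, b, range(10)))            # factor distorted by the last gene
    print(normalize_without_degs(a, b, drop_fraction=0.1))
```

In the toy data, the single strongly changed gene inflates the all-gene factor, and excluding it recovers the depth ratio shared by the remaining genes.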

In this paper, the panel count data analysis for recurrent events is considered. Such analysis is useful for studying tumor or infection recurrences in both clinical trial and observational studies. A bivariate Gaussian Cox process model is proposed to jointly model the observation process and the recurrent event process. Bayesian nonparametric inference is proposed for simultaneously estimating regression parameters, bivariate frailty effects, and baseline intensity functions. Inference is done through Markov chain Monte Carlo, with fully developed computational techniques. Predictive inference is also discussed under the Bayesian setting. The proposed method is shown to be efficient via simulation studies. A clinical trial dataset on skin cancer patients is analyzed to illustrate the proposed approach.

Albert PS 《Biometrics》1999,55(4):1252-1257
Studies of chronic disease often focus on estimating prevalence and incidence in which the presence of active disease is based on dichotomizing a continuous marker variable measured with error. Examples include hypertension, asthma, and depression, where active disease is defined by setting a threshold on a continuous measure of blood pressure, respiratory function, and mood, respectively. This paper proposes a model for inference about prevalence and incidence when active disease is determined by dichotomizing a continuous marker variable in a population-based study. In this formulation, it is postulated that there are three groups of people: those who are not susceptible to the disease, those who are always in the disease state, and those who have the potential to transition between the disease and the disease-free states over time. The model is used to estimate the prevalence and incidence of the disease in the population while accounting for measurement error in the marker. An EM algorithm is used for parameter estimation and the methodology is illustrated on Framingham heart study hypertension data. A simulation study is conducted in order to demonstrate the importance of accounting for measurement error in estimating prevalence and incidence for this example.

Semiparametric regression for count data   Cited by: 3 (self-citations: 0, citations by others: 3)

Analog forecasting is a mechanism-free nonlinear method that forecasts a system forward in time by examining how past states deemed similar to the current state moved forward. Previous applications of analog forecasting have been successful at producing robust forecasts for a variety of ecological and physical processes, but the method has typically been presented as an empirical or heuristic procedure, rather than as a formal statistical model. The methodology presented here extends the model-based analog method of McDermott and Wikle (Environmetrics, 27, 2016, 70) by placing analog forecasting within a fully hierarchical statistical framework that can accommodate count observations. Using a Bayesian approach, the hierarchical analog model is able to quantify rigorously the uncertainty associated with forecasts. Forecasting waterfowl settling patterns in the northwestern United States and Canada is conducted by applying the hierarchical analog model to a breeding population survey dataset. Sea surface temperature (SST) in the Pacific Ocean is used to help identify potential analogs for the waterfowl settling patterns.
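
The basic heuristic that this abstract starts from (find past states similar to the current one and see how they moved forward) can be sketched for a univariate series as below. This illustrates only the empirical procedure, not the hierarchical Bayesian analog model the abstract proposes, and it gives no uncertainty estimate. The function name, the squared-distance similarity measure, and the `embed` and `k` parameters are assumptions of this sketch.

```python
def analog_forecast(series, embed=2, k=2):
    """One-step forecast: average the successors of the k past windows closest to the current window."""
    current = series[-embed:]
    candidates = []
    # Only windows that have an observed successor are eligible analogs,
    # which also excludes the current window itself.
    for start in range(len(series) - embed):
        window = series[start:start + embed]
        distance = sum((a - b) ** 2 for a, b in zip(window, current))
        candidates.append((distance, series[start + embed]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


if __name__ == "__main__":
    # A repeating pattern: the two best analogs of the final window [1, 2] are both followed by 3.
    print(analog_forecast([1, 2, 3, 1, 2, 3, 1, 2], embed=2, k=2))  # → 3.0
```

Averaging the successors of the nearest analogs is one simple choice; distance-weighted averages are a common alternative.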

A semiparametric pseudolikelihood estimation method for panel count data   Cited by: 1 (self-citations: 0, citations by others: 1)
Zhang  Ying 《Biometrika》2002,89(1):39-48

