Similar Documents
1.
We consider models for hierarchical count data subject to overdispersion and/or excess zeros. Molenberghs et al. (2007) and Molenberghs et al. (2010) extend the Poisson‐normal generalized linear mixed model by including gamma random effects to accommodate overdispersion. Excess zeros are handled using either a zero‐inflation or a hurdle component. These models were studied by Kassahun et al. (2014). While flexible, they are quite elaborate in their parametric specification, so model assessment is imperative. We derive local influence measures to detect and examine influential subjects, that is, subjects who have undue influence either on the fit of the model as a whole or on specific important sub‐vectors of the parameter vector. The latter include the fixed effects for the Poisson and excess‐zeros components, the variance components of the normal random effects, and the parameters of the gamma random effects that accommodate overdispersion. Interpretable influence components are derived. The method is applied to data from a longitudinal clinical trial involving patients with epileptic seizures. Even though these data were extensively analyzed in earlier work, the statistical and clinical insight gained from the proposed diagnostics is considerable: a small but important subgroup of patients has possibly been identified.
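The combined model that these diagnostics target can be simulated in a few lines, which helps when checking what each parameter controls. This is a minimal sketch, not the authors' code: it assumes a single constant linear predictor `eta`, a normal random intercept with variance `d` shared by a subject's measurements, gamma random effects with shape `alpha` and mean 1, and a constant structural-zero probability `p`; all values are hypothetical. Under these assumptions the marginal mean is (1 - p) exp(eta + d/2).

```python
import math
import random

def rpois(lam: float, rng: random.Random) -> int:
    """Poisson draw by Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_combined(n_subj, n_obs, eta, d, alpha, p, seed=1):
    """Zero-inflated Poisson-gamma-normal counts (hypothetical parameter values).

    b_i ~ N(0, d) is shared by all measurements of subject i,
    theta_ij ~ Gamma(shape=alpha, scale=1/alpha) has mean 1,
    and each count is a structural zero with probability p.
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n_subj):
        b = rng.gauss(0.0, math.sqrt(d))
        for _ in range(n_obs):
            if rng.random() < p:
                ys.append(0)  # excess (structural) zero
                continue
            theta = rng.gammavariate(alpha, 1.0 / alpha)
            ys.append(rpois(theta * math.exp(eta + b), rng))
    return ys

eta, d, alpha, p = 0.5, 0.4, 2.0, 0.2
ys = simulate_combined(20000, 5, eta, d, alpha, p)
mc_mean = sum(ys) / len(ys)
theory_mean = (1 - p) * math.exp(eta + d / 2)  # E[theta] = 1 and E[exp(b)] = exp(d/2)
```

Comparing `mc_mean` with `theory_mean` is a quick sanity check before fitting; a hurdle version would replace the structural-zero step with a zero-truncated count draw.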

2.
Ghosh S, Gelfand AE, Zhu K, Clark JS. Biometrics, 2012, 68(3): 878-885
Many applications involve count data from a process that yields an excess number of zeros. Zero-inflated count models, in particular zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, along with Poisson hurdle models, are commonly used to address this problem. However, these models struggle to explain an extreme incidence of zeros (say, more than 80%), especially when the goal is to find important covariates. In fact, the ZIP may struggle even when the proportion of zeros is not extreme. To redress this problem, we propose the class of k-ZIG models. These models allow more flexible modeling of both the zero-inflation and the nonzero counts, permitting interplay between the two components. We develop the properties of this new class of models, including a reparameterization to a natural link function. The models are straightforwardly fitted within a Bayesian framework. The methodology is illustrated with simulated data examples as well as a forest seedling dataset obtained from the USDA Forest Service's Forest Inventory and Analysis program.

3.
Hall DB. Biometrics, 2000, 56(4): 1030-1039
In a 1992 Technometrics paper, Lambert (1992, 34, 1-14) described zero-inflated Poisson (ZIP) regression, a class of models for count data with excess zeros. In a ZIP model, a count response variable is assumed to be distributed as a mixture of a Poisson(lambda) distribution and a degenerate distribution with all of its probability mass at zero, with mixing probability p. Both p and lambda are allowed to depend on covariates through canonical-link generalized linear models. In this paper, we adapt Lambert's methodology to an upper-bounded count situation, thereby obtaining a zero-inflated binomial (ZIB) model. In addition, we increase the flexibility of these fixed-effects models by incorporating random effects so that, e.g., the within-subject correlation and between-subject heterogeneity typical of repeated measures data can be accommodated. We motivate, develop, and illustrate the methods with an example from horticulture, where both upper-bounded (binomial-type) and unbounded (Poisson-type) count data with excess zeros were collected in a designed repeated-measures experiment.
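The ZIP and ZIB distributions described above can be written down directly. A minimal sketch, assuming p is the probability of a structural zero (as in Lambert's formulation) and using hypothetical coefficients for the canonical links; the random effects that the paper adds are omitted.

```python
import math

def inv_logit(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def poisson_pmf(y: int, lam: float) -> float:
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zip_pmf(y: int, lam: float, p: float) -> float:
    """ZIP: a structural zero with probability p, otherwise Poisson(lam)."""
    return (p if y == 0 else 0.0) + (1.0 - p) * poisson_pmf(y, lam)

def zib_pmf(y: int, n: int, pi: float, p: float) -> float:
    """ZIB: a structural zero with probability p, otherwise Binomial(n, pi)."""
    binom = math.comb(n, y) * pi ** y * (1.0 - pi) ** (n - y)
    return (p if y == 0 else 0.0) + (1.0 - p) * binom

# Covariates enter through canonical links (hypothetical coefficients):
x = 1.2
lam = math.exp(0.3 + 0.5 * x)   # log link for the Poisson mean
pi = inv_logit(-0.4 + 0.8 * x)  # logit link for the binomial probability
p = inv_logit(-1.0 + 0.2 * x)   # logit link for the zero-inflation probability
```

Note that the zero probability mixes both sources: P(Y = 0) = p + (1 - p) exp(-lam) for the ZIP.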

4.
Count data are very common in health services research, and the basic Poisson regression model often has to be extended to accommodate several sources of heterogeneity: (i) an excess number of zeros relative to a Poisson distribution, (ii) hierarchical structures and correlated data, and (iii) remaining “unexplained” sources of overdispersion. In this paper, we propose hierarchical zero‐inflated and overdispersed models with independent, correlated, and shared random effects for both components of the mixture model. We show that all of these extensions of the Poisson model can be based on the concept of mixture models, and that they can be combined to account for all sources of heterogeneity simultaneously. Expressions for the first two moments are derived and discussed. The models are applied to data on maternal deaths and related risk factors within health facilities in Mozambique. The final model shows that the maternal mortality rate depends mainly on the geographical location of the health facility, the percentage of women admitted with HIV, and the percentage of referrals from the health facility.
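For the simplest member of this family, a ZIP with no random effects, the first two moments are E(Y) = (1 - p) lam and Var(Y) = (1 - p) lam (1 + p lam), so zero-inflation alone already produces overdispersion (Var/E = 1 + p lam > 1). The sketch below checks these closed forms against direct summation of the pmf; the parameter values are hypothetical, and the paper's hierarchical models add further variance terms from the random effects.

```python
import math

def zip_pmf(y, lam, p):
    pois = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return (p if y == 0 else 0.0) + (1.0 - p) * pois

def zip_moments_closed_form(lam, p):
    mean = (1.0 - p) * lam
    var = (1.0 - p) * lam * (1.0 + p * lam)
    return mean, var

def moments_by_summation(pmf, ymax=300):
    """Mean and variance of a count distribution by truncated summation."""
    probs = [pmf(y) for y in range(ymax + 1)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, var

lam, p = 3.0, 0.25
closed = zip_moments_closed_form(lam, p)
numeric = moments_by_summation(lambda y: zip_pmf(y, lam, p))
dispersion_index = closed[1] / closed[0]  # equals 1 + p * lam
```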

5.
The statistical modelling of count data permeates the discipline of ecology. Such data often exhibit overdispersion compared with a standard Poisson distribution, meaning that the variance of the counts is greater than their mean. Whereas modelling to reveal the effects of explanatory variables on the mean is commonplace, overdispersion is generally regarded as a nuisance parameter to be accounted for and subsequently ignored. Instead, we propose a method that models the overdispersion as a biologically interesting property of a data set, and we show how novel inference is provided as a result. We adapted the double hierarchical generalized linear model approach to create an easily extendible model structure that quantifies the influence of explanatory variables on the overdispersion of count data, and we apply it to farmland birds. These data came from a study of Irish agricultural ecosystems in which total bird species abundance and the abundance of farmland indicator species were compared on dairy and non‐dairy farms in the winter and breeding seasons. In general, overdispersion in bird counts was greater on dairy farms than on non‐dairy farms, and for total bird numbers, overdispersion was greatest on dairy farms in winter. Our models are fitted using the Bayesian package Rstan, and we make all code and data available in a GitHub repository. Within a Bayesian framework, this approach enables meaningful quantification of the effects of categorical explanatory variables on the overdispersion of any response variable whose tendency to overdispersion has a biological or ecological explanation.
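The core idea, letting the overdispersion itself differ between groups, can be illustrated without the full double hierarchical GLM. This is a simplified sketch, not the authors' Stan model: it simulates negative binomial counts (variance mu + mu^2/k) with a group-specific size parameter k and recovers k per group by the method of moments. The group labels, mean, and k values are hypothetical and chosen only so that the "dairy" group is more overdispersed (smaller k).

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rnegbin(mu, size, rng):
    """Gamma-Poisson mixture: mean mu, variance mu + mu**2 / size."""
    return rpois(rng.gammavariate(size, mu / size), rng)

def nb_size_mom(counts):
    """Method-of-moments NB size k = m**2 / (s2 - m); smaller k means more overdispersion."""
    n = len(counts)
    m = sum(counts) / n
    s2 = sum((c - m) ** 2 for c in counts) / (n - 1)
    if s2 <= m:
        return math.inf  # no overdispersion detected
    return m * m / (s2 - m)

rng = random.Random(42)
true_size = {"dairy": 0.8, "non_dairy": 3.0}  # hypothetical values
counts = {g: [rnegbin(5.0, k, rng) for _ in range(30000)] for g, k in true_size.items()}
est_size = {g: nb_size_mom(c) for g, c in counts.items()}
```

A DHGLM goes further by placing a regression (with its own link) on the dispersion parameter, so that covariates such as farm type and season enter the variance model directly.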

6.
7.
When analyzing Poisson count data, a high frequency of extra zeros is sometimes observed. The zero‐inflated Poisson (ZIP) model is a popular approach to handling zero‐inflation. In this paper, we generalize the ZIP model and its regression counterpart to accommodate the extent of individual exposure. Empirical evidence drawn from an occupational injury data set confirms that incorporating exposure information can have a substantial impact on the model fit. Tests for zero‐inflation are also considered, and their finite‐sample properties are examined in a Monte Carlo study.
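One common way to bring exposure into a ZIP regression is a log-exposure offset in the Poisson mean. The sketch below writes that log-likelihood, assuming exposure enters only the count part (lam_i = t_i exp(x_i' beta)) with a logistic model for the zero-inflation probability; the paper's generalization may also let exposure affect the zero process, so treat this as one hypothetical variant. The data and coefficients are made up for illustration.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def zip_loglik(beta, gamma, X, Z, y, exposure):
    """ZIP log-likelihood with log(exposure) as an offset in the Poisson mean.

    Assumption: exposure t_i enters only the count part, lam_i = t_i * exp(x_i' beta);
    the zero-inflation probability is p_i = logistic(z_i' gamma).
    """
    ll = 0.0
    for xi, zi, yi, ti in zip(X, Z, y, exposure):
        lam = math.exp(math.log(ti) + dot(xi, beta))
        p = 1.0 / (1.0 + math.exp(-dot(zi, gamma)))
        if yi == 0:
            ll += math.log(p + (1.0 - p) * math.exp(-lam))
        else:
            ll += math.log(1.0 - p) - lam + yi * math.log(lam) - math.lgamma(yi + 1)
    return ll

# Hypothetical data: intercept plus one covariate; intercept-only zero model.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
Z = [[1.0], [1.0], [1.0]]
y = [0, 2, 5]
exposure = [0.5, 1.0, 2.0]
beta = [0.2, 0.4]
ll_zip = zip_loglik(beta, [0.0], X, Z, y, exposure)
```

Driving the zero-inflation coefficient strongly negative makes p close to 0, so the function then reduces to the ordinary Poisson log-likelihood with offset; that limit is what a test for zero-inflation compares against.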

8.
This paper reviews the generalized Poisson regression model, the restricted generalized Poisson regression model, and the mixed Poisson regression models (negative binomial regression and Poisson inverse Gaussian regression), all of which can be used for regression analysis of counts. The aim of this study is to demonstrate that the quasi-likelihood/moment method, which is used to estimate the parameters of mixed Poisson regression models, is also applicable to estimating the parameters of the generalized Poisson and restricted generalized Poisson regression models. Finally, an application of this method to zoological data is given.
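For reference, the sketch below evaluates Consul's generalized Poisson pmf, P(Y = y) = theta (theta + delta y)^(y-1) exp(-theta - delta y) / y!, and checks its known moments, mean theta/(1 - delta) and variance theta/(1 - delta)^3, so that Var/E = 1/(1 - delta)^2 exceeds 1 when delta > 0. It is restricted to 0 <= delta < 1, and the regression and restricted parameterizations reviewed in the paper are not reproduced; parameter values are hypothetical.

```python
import math

def gp_pmf(y: int, theta: float, delta: float) -> float:
    """Consul's generalized Poisson pmf, used here for theta > 0 and 0 <= delta < 1."""
    log_p = (math.log(theta) + (y - 1) * math.log(theta + delta * y)
             - theta - delta * y - math.lgamma(y + 1))
    return math.exp(log_p)

theta, delta = 2.0, 0.3
probs = [gp_pmf(y, theta, delta) for y in range(400)]
total = sum(probs)
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
```

Because the moments are available in closed form, matching them to sample moments (or using them in a quasi-likelihood variance function) is straightforward.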

9.
10.
Two-part regression models are frequently used to analyze longitudinal count data with excess zeros, where the same set of subjects is repeatedly observed over time. In this context, several sources of heterogeneity may arise at the individual level that affect the observed process. Further, longitudinal studies often suffer from missing values: individuals drop out of the study before its completion and thus present incomplete data records. In this paper, we propose a finite mixture of hurdle models to address the heterogeneity problem, which is handled by introducing random effects with a discrete distribution; a pattern-mixture approach is specified to deal with non-ignorable missing values. This approach lets us accommodate overdispersed counts while allowing for association between the two parts of the model and for non-ignorable dropouts. The effectiveness of the proposal is tested through a simulation study. Finally, an application to real data on skin cancer is provided.
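The discrete random-effects idea can be sketched as a finite mixture over latent classes, each with its own (logit zero probability, log count rate) pair shared across a subject's visits. This is a minimal illustration under stated assumptions: no covariates, hypothetical support points and weights, and no pattern-mixture component for dropout (a dropout simply contributes fewer factors here, which corresponds to ignorable missingness rather than the paper's approach).

```python
import math

def hurdle_poisson_pmf(y, pi0, lam):
    """P(Y=0) = pi0; positive counts follow a zero-truncated Poisson(lam)."""
    if y == 0:
        return pi0
    log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return (1.0 - pi0) * math.exp(log_pois) / (1.0 - math.exp(-lam))

def subject_likelihood(ys, support, weights):
    """Marginal likelihood of one subject's repeated counts.

    support: list of (logit_pi0, log_lam) pairs, one per latent class. The class
    is shared by all of the subject's observations, which induces within-subject
    correlation and links the binary and count parts.
    """
    total = 0.0
    for (a, b), w in zip(support, weights):
        pi0 = 1.0 / (1.0 + math.exp(-a))
        lam = math.exp(b)
        prod = 1.0
        for y in ys:
            prod *= hurdle_poisson_pmf(y, pi0, lam)
        total += w * prod
    return total

support = [(1.5, 0.0), (-0.5, 1.2)]  # hypothetical class locations
weights = [0.6, 0.4]
observed = [0, 0, 3, 1]              # a subject observed at 4 visits only
lik = subject_likelihood(observed, support, weights)
```

Summing the log of `subject_likelihood` over subjects gives the mixture log-likelihood that an EM algorithm for this kind of model would maximize.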

11.
Dark spots in the fleece area are often associated with dark fibres in wool, which limits its competitiveness with other textile fibres. Field data from a sheep experiment in Uruguay revealed an excess number of zeros for dark spots. We compared the performance of four Poisson and zero-inflated Poisson (ZIP) models under four simulation scenarios. Each model performed reasonably well when the data were simulated under its own scenario. The deviance information criterion favoured a Poisson model with a residual, whereas the ZIP model with a residual gave estimates closer to the true values under all simulation scenarios. Both Poisson and ZIP models with an error term at the regression level performed better than their counterparts without such an error term. Field data from Corriedale sheep were analysed with Poisson and ZIP models with residuals. Parameter estimates were similar for both models. Although the posterior distribution of the sire variance was skewed because of the small number of rams in the dataset, the median of this variance suggested scope for genetic selection. The main environmental factor was the age of the sheep at shearing. In summary, age-related processes seem to drive the number of dark spots in this breed of sheep.
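A "Poisson model with a residual" is commonly read as a Poisson model with an observation-level normal error on the log scale, i.e. a Poisson-lognormal. Assuming that reading, the sketch below computes the marginal pmf by numerical integration and checks the known moments E(Y) = exp(eta + sigma^2/2) and Var(Y) = E(Y) + E(Y)^2 (exp(sigma^2) - 1), which shows how the residual term absorbs overdispersion. The values of `eta` and `sigma` are hypothetical; sire effects are not modelled.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def pln_pmf(y, eta, sigma, m=800, zmax=8.0):
    """P(Y=y) for Y | e ~ Poisson(exp(eta + e)), e ~ N(0, sigma**2).

    The integral over e = sigma * z is approximated by the midpoint rule on
    z in [-zmax, zmax], weighted by the standard normal density.
    """
    h = 2.0 * zmax / m
    total = 0.0
    for i in range(m):
        z = -zmax + (i + 0.5) * h
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += poisson_pmf(y, math.exp(eta + sigma * z)) * phi * h
    return total

eta, sigma = 0.5, 0.5
probs = [pln_pmf(y, eta, sigma) for y in range(120)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
mu = math.exp(eta + sigma ** 2 / 2)
```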

12.
Count data sets are traditionally analyzed using the ordinary Poisson distribution. However, this model has limited applicability, as it can be too restrictive to handle specific data structures. This creates the need for alternative models that accommodate, for example, (a) zero‐modification (inflation or deflation in the frequency of zeros), (b) overdispersion, and (c) individual heterogeneity arising from clustering or repeated (correlated) measurements made on the same subject. Cases (a)–(b) and (b)–(c) are often treated together in the statistical literature, with several practical applications, but models supporting all three at once are less common. Hence, this paper's primary goal was to address these issues jointly by deriving a mixed‐effects regression model based on the hurdle version of the Poisson–Lindley distribution. In this framework, zero‐modification is incorporated by assuming that a binary probability model determines which outcomes are zero‐valued, while a zero‐truncated process generates the positive observations. Approximate posterior inferences for the model parameters were obtained from a fully Bayesian approach based on the Adaptive Metropolis algorithm. Intensive Monte Carlo simulation studies were performed to assess the empirical properties of the Bayesian estimators. The proposed model was applied to a real data set, and its competitiveness relative to some well‐established mixed‐effects models for count data was evaluated. A sensitivity analysis based on standard divergence measures was performed to detect observations that may affect the parameter estimates. The Bayesian p‐value and the randomized quantile residuals were used for model diagnostics.
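The hurdle construction described above, combined with the Poisson-Lindley pmf theta^2 (y + theta + 2) / (theta + 1)^(y + 3), is short to write out, and the randomized quantile (Dunn-Smyth) residuals used for diagnostics follow from its CDF. A minimal sketch without random effects or covariates; `pi0` and `theta` are hypothetical values, not estimates from the paper.

```python
import math
import random
from statistics import NormalDist

def pl_pmf(y, theta):
    """Poisson-Lindley pmf: theta^2 (y + theta + 2) / (theta + 1)^(y + 3)."""
    return theta ** 2 * (y + theta + 2) / (theta + 1) ** (y + 3)

def hurdle_pl_pmf(y, pi0, theta):
    """Zero with probability pi0; positives from a zero-truncated Poisson-Lindley."""
    if y == 0:
        return pi0
    return (1.0 - pi0) * pl_pmf(y, theta) / (1.0 - pl_pmf(0, theta))

def hurdle_pl_cdf(y, pi0, theta):
    return sum(hurdle_pl_pmf(k, pi0, theta) for k in range(y + 1))

def randomized_quantile_residual(y, pi0, theta, rng):
    """Dunn-Smyth residual: Phi^{-1}(u) with u ~ Uniform(F(y - 1), F(y))."""
    lo = hurdle_pl_cdf(y - 1, pi0, theta) if y > 0 else 0.0
    hi = hurdle_pl_cdf(y, pi0, theta)
    u = min(max(rng.uniform(lo, hi), 1e-12), 1.0 - 1e-12)  # keep inv_cdf finite
    return NormalDist().inv_cdf(u)

rng = random.Random(0)
resid = randomized_quantile_residual(3, 0.4, 0.8, rng)
```

If the fitted model is adequate, these residuals should look approximately standard normal, which is what makes them convenient for discrete outcomes.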

13.
This paper discusses a two‐state hidden Markov Poisson regression (MPR) model for analyzing longitudinal data of epileptic seizure counts. The model allows the rate of the Poisson process to depend on covariates through an exponential link function and to change according to the states of a two‐state Markov chain, whose transition probabilities are associated with covariates through a logit link function. As an alternative, this paper also considers a two‐state hidden Markov negative binomial regression (MNBR) model, which replaces the Poisson with the negative binomial distribution in the MPR model when there is extra‐Poisson variation conditional on the states of the Markov chain. The two proposed models relax the stationarity requirement of the Markov chain and allow for overdispersion relative to the usual Poisson regression model as well as for correlation between repeated observations. The proposed methodology provides a plausible analysis of the longitudinal seizure count data, and the MNBR model fits the data much better than the MPR model. Maximum likelihood estimation using the EM and quasi‐Newton algorithms is discussed. A Monte Carlo study of the proposed MPR model investigates the reliability of the estimation method, the choice of probabilities for the initial states of the Markov chain, and some finite‐sample properties of the maximum likelihood estimates. It suggests that (1) the estimation method is accurate and reliable as long as the total number of observations is reasonably large, and (2) the choice of probabilities for the initial states of the Markov process has little impact on the parameter estimates.
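The likelihood that both EM and quasi-Newton maximization work with is computed by the forward algorithm. The sketch below implements a scaled forward recursion for a hidden Markov chain with Poisson emissions and checks it against brute-force summation over all hidden paths. For simplicity it assumes constant rates and time-homogeneous transitions; in the MPR model the rates would come from an exponential link on covariates and the transition matrix from a logit link at each time point. The counts and parameters are toy values, not the trial data.

```python
import itertools
import math

def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def hmm_poisson_loglik(ys, init, trans, rates):
    """Scaled forward algorithm for a hidden Markov chain with Poisson emissions.

    init[s]: P(state s at time 0); trans[r][s]: P(r -> s); rates[s]: Poisson mean in state s.
    """
    k = len(init)
    alpha = [init[s] * poisson_pmf(ys[0], rates[s]) for s in range(k)]
    c = sum(alpha)
    loglik = math.log(c)
    alpha = [a / c for a in alpha]
    for y in ys[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(k)) * poisson_pmf(y, rates[s])
                 for s in range(k)]
        c = sum(alpha)
        loglik += math.log(c)
        alpha = [a / c for a in alpha]  # rescale to avoid underflow on long series
    return loglik

def brute_force_loglik(ys, init, trans, rates):
    """Sum over every hidden path; only feasible for short series (checks the above)."""
    k = len(init)
    total = 0.0
    for path in itertools.product(range(k), repeat=len(ys)):
        p = init[path[0]] * poisson_pmf(ys[0], rates[path[0]])
        for t in range(1, len(ys)):
            p *= trans[path[t - 1]][path[t]] * poisson_pmf(ys[t], rates[path[t]])
        total += p
    return math.log(total)

seizures = [2, 0, 5, 7, 1, 0, 3]  # toy counts, not the trial data
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
rates = [1.0, 5.0]
```

Swapping `poisson_pmf` for a negative binomial pmf gives the analogous MNBR likelihood.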

14.
In biomedical applications, time series of count data often exhibit correlated measurements, clustering, and excess zeros simultaneously. Ignoring such effects may lead to misleading conclusions about treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero‐altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally developed for modelling positive continuous data and was later extended to handle count data. These models are motivated by the evaluation of trends in new tumour counts for bladder cancer patients and by the identification of useful covariates that affect the count level. The models are implemented using Bayesian methods with Markov chain Monte Carlo (MCMC) algorithms and are assessed using the deviance information criterion (DIC).
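The DIC used to compare such models is computed from posterior draws as DIC = Dbar + pD, where Dbar is the posterior mean deviance and pD = Dbar - D(posterior mean) is the effective number of parameters. A minimal sketch for a plain i.i.d. Poisson model, with a handful of hypothetical draws standing in for MCMC output; the GMPGP and ZMPGP deviances would replace `poisson_deviance`.

```python
import math

def poisson_deviance(ys, lam):
    """-2 x log-likelihood of i.i.d. Poisson(lam) counts."""
    return -2.0 * sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in ys)

def dic(ys, lam_draws):
    """DIC = Dbar + pD, with pD = Dbar - D(posterior mean of lam)."""
    dbar = sum(poisson_deviance(ys, lam) for lam in lam_draws) / len(lam_draws)
    lam_bar = sum(lam_draws) / len(lam_draws)
    p_d = dbar - poisson_deviance(ys, lam_bar)
    return dbar + p_d, p_d

ys = [1, 3, 2, 0, 4]
draws = [1.5, 2.0, 2.5, 1.8, 2.2]  # stand-in for MCMC output
total, p_d = dic(ys, draws)
```

Lower DIC indicates a better trade-off between fit and complexity; because this Poisson deviance is convex in lam, pD is non-negative here.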

15.
Analysis of longitudinal data with excessive zeros has gained increasing attention in recent years; however, current approaches have primarily focused on balanced data. Dropouts are common in longitudinal studies, and the analysis of the resulting unbalanced data is complicated by the missing-data mechanism. Our study is motivated by the analysis of the longitudinal skin cancer count data presented by Greenberg, Baron, Stukel, Stevens, Mandel, Spencer, Elias, Lowe, Nierenberg, Bayrd, Vance, Freeman, Clendenning, Kwan, and the Skin Cancer Prevention Study Group [New England Journal of Medicine 323, 789–795]. The data contain a large number of zero responses (83% of the observations) as well as a substantial amount of dropout (about 52% of the observations). To account for both the excessive zeros and the dropout patterns, we propose a pattern‐mixture zero‐inflated model with compound Poisson random effects for the unbalanced longitudinal skin cancer data. We also incorporate a first-order autoregressive (AR(1)) correlation structure in the model to capture the longitudinal correlation of the count responses. A quasi‐likelihood approach is developed to estimate the model. We illustrate the method with an analysis of the longitudinal skin cancer data.
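An AR(1) structure sets corr(Y_s, Y_t) = rho^|s - t|, and with unbalanced data each subject uses only the block of the matrix for the visits actually observed. The sketch below builds the matrix, extracts the block for a subject who drops out, and confirms positive definiteness with a plain Cholesky factorization. The value of `rho`, the number of visits, and the dropout time are hypothetical, and how the structure enters the paper's quasi-likelihood is not reproduced.

```python
def ar1_corr(n_times, rho):
    """AR(1) correlation: corr(Y_s, Y_t) = rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n_times)] for s in range(n_times)]

def cholesky(a):
    """Plain Cholesky factorization; raises ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return low

full = ar1_corr(6, 0.5)   # six scheduled visits
dropout_after = 3         # a subject seen at visits 1-3 only
observed_block = [row[:dropout_after] for row in full[:dropout_after]]
chol = cholesky(full)
```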

16.
On occasion, generalized linear models for counts based on Poisson or overdispersed count distributions may show lack of fit due to disproportionately large frequencies of zeros. Three alternative types of regression models that use all of the information and explicitly account for excess zeros are examined and given general formulations. A simple mechanism for added zeros is assumed that directly motivates one type of model, here called the added-zero type, particular forms of which have been proposed independently by D. Lambert (1992) and in unpublished work by the author. An original regression formulation (the zero-altered model) is presented as a reduced form of the two-part model for count data, which is also discussed. It is suggested that two-part models be used to aid in the development of an added-zero model when the latter is thought to be appropriate.
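The relationship between the added-zero (ZIP) and two-part (hurdle) forms can be checked numerically: a ZIP with mixing probability p is exactly a hurdle model with zero probability pi0 = p + (1 - p) exp(-lam). The converse only works when pi0 >= exp(-lam); otherwise the implied p is negative, which is why the two-part form can also represent zero deflation. A short sketch with hypothetical parameter values and no covariates:

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zip_pmf(y, lam, p):
    """Added-zero (ZIP) model: structural zero with probability p."""
    return (p if y == 0 else 0.0) + (1.0 - p) * poisson_pmf(y, lam)

def hurdle_pmf(y, lam, pi0):
    """Two-part (hurdle) model: P(0) = pi0, positives from a zero-truncated Poisson."""
    if y == 0:
        return pi0
    return (1.0 - pi0) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

def zip_to_hurdle(lam, p):
    return p + (1.0 - p) * math.exp(-lam)

def hurdle_to_zip(lam, pi0):
    """Implied ZIP mixing probability; negative when zeros are deflated."""
    e0 = math.exp(-lam)
    return (pi0 - e0) / (1.0 - e0)
```

For example, with lam = 2, a hurdle zero probability of 0.05 is below exp(-2) (about 0.135), so `hurdle_to_zip` returns a negative value and no added-zero model reproduces it.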

17.
Bayesian hierarchical models usually model the risk surface on the same arbitrary geographical units for all data sources. Poisson/gamma random field models overcome this restriction, as the underlying risk surface can be specified independently of the resolution of the data. Moreover, covariates may be considered as either excess or relative risk factors. We compare the performance of the Poisson/gamma random field model with that of the Markov random field (MRF)‐based ecologic regression model and the Bayesian Detection of Clusters and Discontinuities (BDCD) model, in both a simulation study and a real data example. We find that the BDCD model has advantages in situations dominated by abruptly changing risk, while the Poisson/gamma random field model stands out for its flexibility in estimating random field structures and in incorporating covariates. The MRF‐based ecologic regression model is inferior. WinBUGS code for Poisson/gamma random field models is provided.
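The Poisson/gamma pairing rests on gamma-Poisson conjugacy, which is easiest to see for a single area: with observed count y ~ Poisson(E theta) and a Gamma(a, rate b) prior on the relative risk theta, the posterior is Gamma(a + y, b + E). This sketch shows only that building block, not the random field model, which spreads gamma-distributed risk over space independently of the area boundaries. The area counts and prior are hypothetical.

```python
def gamma_poisson_update(a, b, y, expected):
    """Posterior of relative risk theta for one area.

    Prior theta ~ Gamma(shape=a, rate=b); likelihood y ~ Poisson(expected * theta).
    Posterior: Gamma(shape=a + y, rate=b + expected).
    """
    return a + y, b + expected

areas = {"A": (5, 3.0), "B": (40, 24.0)}  # hypothetical (observed, expected) counts
a, b = 2.0, 2.0                          # prior with mean a / b = 1
posterior_mean = {}
for name, (y, e) in areas.items():
    shape, rate = gamma_poisson_update(a, b, y, e)
    posterior_mean[name] = shape / rate
```

Both areas have the same raw ratio y / E = 5/3, but the small area is pulled further toward the prior mean of 1, the usual smoothing effect in disease mapping.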

18.
Bivariate time series of counts with excess zeros relative to the Poisson process are common in many bioscience applications. Failure to account for the extra zeros in the analysis may result in biased parameter estimates and misleading inferences. A class of bivariate zero-inflated Poisson autoregression models is presented to accommodate the zero-inflation and the inherent serial dependency between successive observations. An autoregressive correlation structure is assumed in the random component of the compound regression model. Parameter estimation is achieved via an EM algorithm, by maximizing an appropriate log-likelihood function to obtain residual maximum likelihood estimates. The proposed method is applied to a bivariate series from an occupational health study, in which the zero-inflated injury count events are classified as either musculoskeletal or non-musculoskeletal. The approach enables evaluation of the effectiveness of a participatory ergonomics intervention at the population level, in terms of reducing the overall incidence of lost-time injury and simultaneously lowering the two mean injury rates.
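The EM step for zero-inflation is easiest to see in the univariate, intercept-only case: the E-step computes the posterior probability that each zero is structural, and the M-step updates p and lam in closed form. A minimal sketch; the bivariate structure, covariates, and autoregressive random effects of the paper are omitted, and the data are hypothetical. At convergence the estimates satisfy (1 - p) lam = mean(y) and p + (1 - p) exp(-lam) = proportion of zeros.

```python
import math

def zip_em(ys, p=0.5, lam=None, tol=1e-12, max_iter=10000):
    """EM for an intercept-only ZIP model (no covariates, single series).

    E-step: posterior probability that each zero is structural.
    M-step: closed-form updates of p and lam given those weights.
    """
    n = len(ys)
    n0 = sum(1 for y in ys if y == 0)
    total = sum(ys)
    if lam is None:
        lam = max(total / n, 1e-3)
    for _ in range(max_iter):
        z0 = p / (p + (1.0 - p) * math.exp(-lam))  # same weight for every zero
        p_new = n0 * z0 / n
        lam_new = total / (n - n0 * z0)            # positive counts have weight 0
        if abs(p_new - p) < tol and abs(lam_new - lam) < tol:
            return p_new, lam_new
        p, lam = p_new, lam_new
    return p, lam

ys = [0] * 40 + [1] * 10 + [2] * 15 + [3] * 20 + [4] * 10 + [5] * 5
p_hat, lam_hat = zip_em(ys)
```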

19.
In this paper, our aim is to analyze the geographical and temporal variability of disease incidence when spatio‐temporal count data have excess zeros. To that end, we consider random effects in zero‐inflated Poisson models to investigate geographical and temporal patterns of disease incidence. We propose spatio‐temporal models that employ conditionally autoregressive smoothing across the spatial dimension and B‐spline smoothing over the temporal dimension. The analysis of these complex models is computationally difficult from the frequentist perspective. On the other hand, the advent of Markov chain Monte Carlo algorithms has made the Bayesian analysis of complex models computationally convenient. The recently developed data cloning method provides a frequentist approach to mixed models that is also computationally convenient. We propose to use data cloning, which yields maximum likelihood estimates, to conduct a frequentist analysis of zero‐inflated spatio‐temporal models of disease incidence. One advantage of the data cloning approach is that predictions of smoothed disease incidence over space and time, together with their standard errors (or prediction intervals), are easily obtained. We illustrate our approach using a real dataset of monthly children's asthma visits to hospital in the province of Manitoba, Canada, from April 2006 to March 2010. The performance of our approach is also evaluated through a simulation study.
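Data cloning replicates the data K times and runs a Bayesian analysis: as K grows, the posterior mean approaches the maximum likelihood estimate and K times the posterior variance approaches its asymptotic variance. In practice this is done with MCMC on a complex model; the sketch below uses a conjugate Poisson-gamma model (hypothetical data and prior) so that the cloned posterior is available in closed form and the limits can be seen directly.

```python
def cloned_posterior(ys, k_clones, a=1.0, b=1.0):
    """Gamma(shape a, rate b) prior, Poisson data replicated k_clones times.

    Returns the cloned posterior mean and k_clones * posterior variance.
    """
    shape = a + k_clones * sum(ys)
    rate = b + k_clones * len(ys)
    mean = shape / rate
    var = shape / rate ** 2
    return mean, k_clones * var

ys = [2, 0, 3, 1, 4]
mle = sum(ys) / len(ys)     # 2.0
inv_fisher = mle / len(ys)  # lam / n evaluated at the MLE: 0.4
results = {k: cloned_posterior(ys, k) for k in (1, 10, 100, 10000)}
```

As K increases, the influence of the prior fades, which is why the method delivers frequentist estimates despite using Bayesian machinery.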

20.
Although advances have recently been made in modeling multivariate count data, existing models still have several limitations: (i) the multivariate Poisson log‐normal model (Aitchison and Ho, 1989) cannot be used to fit multivariate count data with excess zero‐vectors; (ii) the multivariate zero‐inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero‐truncated/deflated count data, and it is difficult to apply to high‐dimensional cases; (iii) the Type I multivariate zero‐adjusted Poisson (ZAP) distribution (Tian et al., 2017) can only model multivariate count data with a special correlation structure in which the correlations between random components are all positive or all negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, that allows a more flexible dependency structure between components; that is, some of the correlation coefficients can be positive while others are negative. We then develop its important distributional properties and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417