Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Hall DB 《Biometrics》2000,56(4):1030-1039
In a 1992 Technometrics paper, Lambert (1992, 34, 1-14) described zero-inflated Poisson (ZIP) regression, a class of models for count data with excess zeros. In a ZIP model, a count response variable is assumed to be distributed as a mixture of a Poisson(lambda) distribution and a distribution with point mass of one at zero, with mixing probability p. Both p and lambda are allowed to depend on covariates through canonical link generalized linear models. In this paper, we adapt Lambert's methodology to an upper bounded count situation, thereby obtaining a zero-inflated binomial (ZIB) model. In addition, we add to the flexibility of these fixed effects models by incorporating random effects so that, e.g., the within-subject correlation and between-subject heterogeneity typical of repeated measures data can be accommodated. We motivate, develop, and illustrate the methods described here with an example from horticulture, where both upper bounded count (binomial-type) and unbounded count (Poisson-type) data with excess zeros were collected in a repeated measures designed experiment.
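The mixture described in this abstract can be written down directly. The following is a minimal sketch, not code from the paper: the function names `zip_pmf` and `zip_mean_var` are hypothetical, and the sketch covers only the fixed-parameter ZIP distribution (no covariates, no random effects, no binomial variant).

```python
import math


def zip_pmf(y: int, lam: float, p: float) -> float:
    """P(Y = y) under a zero-inflated Poisson (illustrative helper).

    With probability p the response is a structural zero; otherwise it is
    drawn from Poisson(lam). Assumes lam > 0 and 0 <= p <= 1.
    """
    # Poisson probability computed on the log scale for numerical stability.
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        # Zeros come from both mixture components.
        return p + (1.0 - p) * poisson
    return (1.0 - p) * poisson


def zip_mean_var(lam: float, p: float) -> tuple[float, float]:
    """Mean and variance implied by the ZIP mixture."""
    mean = (1.0 - p) * lam
    var = (1.0 - p) * lam * (1.0 + p * lam)
    return mean, var
```

For any p > 0 the variance exceeds the mean, which is why zero inflation on its own already produces overdispersion relative to a plain Poisson.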

2.
3.
4.
We analyze a real data set pertaining to reindeer fecal pellet‐group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects for modeling of spatial correlation in a Poisson generalized linear mixed model (GLMM), quasi‐Poisson hierarchical generalized linear model (HGLM), zero‐inflated Poisson (ZIP), and hurdle models. The quasi‐Poisson HGLM allows for both under‐ and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi‐Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R code for fitting these models using a unified algorithm for the HGLMs. Spatial count responses with an extremely high proportion of zeros and underdispersion can be successfully modeled using the quasi‐Poisson HGLM with spatial random effects.

5.
Dark spots in the fleece area are often associated with dark fibres in wool, which limits its competitiveness with other textile fibres. Field data from a sheep experiment in Uruguay revealed an excess number of zeros for dark spots. We compared the performance of four Poisson and zero-inflated Poisson (ZIP) models under four simulation scenarios. All models performed reasonably well under the same scenario for which the data were simulated. The deviance information criterion favoured a Poisson model with residual, while the ZIP model with a residual gave estimates closer to their true values under all simulation scenarios. Both Poisson and ZIP models with an error term at the regression level performed better than their counterparts without such an error. Field data from Corriedale sheep were analysed with Poisson and ZIP models with residuals. Parameter estimates were similar for both models. Although the posterior distribution of the sire variance was skewed due to a small number of rams in the dataset, the median of this variance suggested a scope for genetic selection. The main environmental factor was the age of the sheep at shearing. In summary, age related processes seem to drive the number of dark spots in this breed of sheep.

6.
This paper presents new methods, using a Bayesian approach, for analyzing longitudinal count data with excess zeros and nonlinear effects of continuously valued covariates. Two problems in longitudinal count data can make the use of a zero-inflated Poisson (ZIP) model ineffective: unobserved heterogeneity and nonlinear effects of continuously valued covariates. Our proposed semiparametric model can simultaneously handle these problems in a unified framework. The framework accounts for heterogeneity by incorporating random effects and has two components: a parametric component, which deals with the linear effects of time-invariant covariates, and a non-parametric component, which uses an arbitrary smooth function to model the effect of time or time-varying covariates on the logarithm of the mean count. The proposed methods are illustrated by analyzing longitudinal count data on the assessment of the efficacy of pesticides in controlling the reproduction of whitefly.

7.
Count data often exhibit more zeros than predicted by common count distributions like the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have a few drawbacks, including interpretability of regression coefficients and numerical instability of fitting algorithms even when the data arise from the assumed model. To address these issues, we propose a model that parameterizes the marginal associations between the count outcome and the covariates as easily interpretable log relative rates, while including random effects to account for correlation among observations. One of the main advantages of this marginal model is that it allows a basis upon which we can directly compare the performance of standard methods that ignore zero inflation with that of a method that explicitly takes zero inflation into account. We present simulations of these various model formulations in terms of bias and variance estimation. Finally, we apply the proposed approach to analyze toxicological data of the effect of emissions on cardiac arrhythmias.

8.
Bivariate time series of counts with excess zeros relative to the Poisson process are common in many bioscience applications. Failure to account for the extra zeros in the analysis may result in biased parameter estimates and misleading inferences. A class of bivariate zero-inflated Poisson autoregression models is presented to accommodate the zero-inflation and the inherent serial dependency between successive observations. An autoregressive correlation structure is assumed in the random component of the compound regression model. Parameter estimation is achieved via an EM algorithm, by maximizing an appropriate log-likelihood function to obtain residual maximum likelihood estimates. The proposed method is applied to analyze a bivariate series from an occupational health study, in which the zero-inflated injury count events are classified as either musculoskeletal or non-musculoskeletal in nature. The approach enables the evaluation of the effectiveness of a participatory ergonomics intervention at the population level, in terms of reducing the overall incidence of lost-time injury and a simultaneous decline in the two mean injury rates.

9.
When analyzing Poisson count data, a high frequency of extra zeros is sometimes observed. The Zero‐Inflated Poisson (ZIP) model is a popular approach to handle zero‐inflation. In this paper we generalize the ZIP model and its regression counterpart to accommodate the extent of individual exposure. Empirical evidence drawn from an occupational injury data set confirms that the incorporation of exposure information can exert a substantial impact on the model fit. Tests for zero‐inflation are also considered. Their finite sample properties are examined in a Monte Carlo study.

10.
11.
Analysis of count data is required in many areas of biometric interest. Often the simple Poisson distribution is not appropriate, since an excess number of zero counts occurs in the count data. Some current approaches to this problem are reviewed. It will be argued that these situations can often be easily modeled using the zero-inflated Poisson distribution. A variety of applications are considered in which this occurs. Possibilities are outlined for assessing the validity of the zero-inflated Poisson, including a comparison with the nonparametric Poisson mixture maximum likelihood estimator.

12.
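Zero-inflated models and hurdle models, both widely used in economics and the social sciences, were selected to model forest fire occurrence in the Daxing'anling (Greater Khingan) region. The Akaike information criterion (AIC), likelihood ratio (LR) tests, and the residual sum of squares (SSR) were used to assess the fit of four regression models from the two classes: the zero-inflated Poisson model (ZIP), the zero-inflated negative binomial model (ZINB), the Poisson hurdle model (PH), and the negative binomial hurdle model (NBH), in order to select a prediction model suited to the characteristics of forest fire occurrence in this region. The AIC and SSR values indicated that the ZINB model fit the local forest fire data best. LR tests on the nested models (ZINB vs. ZIP, NBH vs. PH) showed that ZINB and NBH each outperformed the model nested within them, indicating that the negative binomial (NB) model can capture and explain the overdispersion in the data well. Judging from the actual pattern of forest fire occurrence in the study area and the assumptions underlying the two classes of models, zero-inflated models are better suited to the forest fire characteristics of the Tahe area.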
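The model comparison described in this abstract rests on two standard quantities, which the sketch below computes from fitted log-likelihoods. It is an illustration only: the helper names `aic` and `lr_test_1df` are hypothetical, the abstract reports no log-likelihood values, and the one-degree-of-freedom chi-square reference is an assumption. When the null value lies on the boundary of the parameter space, as for the NB dispersion parameter in ZINB vs. ZIP, that reference distribution is usually considered conservative.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion; smaller values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_lik


def lr_test_1df(log_lik_null: float, log_lik_alt: float) -> tuple[float, float]:
    """Likelihood ratio test of a nested model with one extra parameter.

    Returns the LR statistic and a p-value from a chi-square(1) reference
    distribution (an assumption of this sketch, see the lead-in).
    """
    stat = max(2.0 * (log_lik_alt - log_lik_null), 0.0)
    # Survival function of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

In use, `log_lik_null` would be the maximized log-likelihood of the simpler model (for example ZIP) and `log_lik_alt` that of the model containing it (for example ZINB), both fitted to the same data.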

13.
The environmental legislation of many countries increasingly requires the continuous monitoring of fish assemblages to evaluate the success of river and stream restorations. Predicting species–environment relationships on the basis of monitoring data is central in the evaluation of ecological integrity and planning of rehabilitation strategies. Monitoring data are, however, often plagued by a substantial proportion of zeros (no catch at single sampling points) which are caused by relevant ecological processes, but complicate the use of commonly used statistical methods. This study compares plain count regression models, mixture and hurdle models based on Poisson and negative binomial distributions, and logistic regressions with respect to their ability to cope with large zero-inflated data sets obtained by point abundance sampling of young-of-the-year fish from three large German rivers. Only mixture and hurdle models based on the negative binomial distribution could satisfactorily be fitted to the zero-inflated and overdispersed count data. The logistic regression models applied to catch data converted to presence/absence simplified the computational procedure and yielded qualitatively similar results to the count regression models, indicating that the use of more complex count data did not generally provide better predictions. Therefore, presence/absence sampling may be a suitable and less costly alternative to abundance surveys for identifying environmental factors which affect the spatial distribution of fish populations, at least if information on subtle abundance fluctuations is not needed. Mixture or hurdle models are particularly worth the additional effort if it is reasonable to distinguish between those environmental factors influencing the occurrence probability and others affecting the abundance. All models showed low sensitivity to rare guilds, pointing to the need for a further development of statistical models for rare species whose management is a matter of growing environmental concern.

14.
Phenotypes measured in counts are commonly observed in nature. Statistical methods for mapping quantitative trait loci (QTL) underlying count traits are documented in the literature. The majority of them assume that the count phenotype follows a Poisson distribution with appropriate techniques being applied to handle data dispersion. When a count trait has a genetic basis, “naturally occurring” zero status also reflects the underlying gene effects. Simply ignoring or mishandling the zero data may lead to wrong QTL inference. In this article, we propose an interval mapping approach for mapping QTL underlying count phenotypes containing many zeros. The effects of QTLs on the zero-inflated count trait are modelled through the zero-inflated generalized Poisson regression mixture model, which can handle the zero inflation and Poisson dispersion in the same distribution. We implement the approach using the EM algorithm with the Newton-Raphson algorithm embedded in the M-step, and provide a genome-wide scan for testing and estimating the QTL effects. The performance of the proposed method is evaluated through extensive simulation studies. Extensions to composite and multiple interval mapping are discussed. The utility of the developed approach is illustrated through a mouse F2 intercross data set. Significant QTLs are detected to control mouse cholesterol gallstone formation.

15.
Smooth tests for the zero-inflated Poisson distribution   Cited by: 1 (self-citations: 0, citations by others: 1)
Thas O  Rayner JC 《Biometrics》2005,61(3):808-815
In this article we construct three smooth goodness-of-fit tests for testing for the zero-inflated Poisson (ZIP) distribution against general smooth alternatives in the sense of Neyman. We apply our tests to a data set previously claimed to be ZIP distributed, and show that the ZIP is not a good model to describe the data. At rejection of the null hypothesis of ZIP, the individual components of the test statistic, which are directly related to interpretable parameters in a smooth model, may be used to gain insight into an alternative distribution.

16.
We prove that the generalized Poisson distribution GP(theta, eta) (eta ≥ 0) is a mixture of Poisson distributions; this is a new property for a distribution which is the topic of the book by Consul (1989). Because we find that the fits to count data of the generalized Poisson and negative binomial distributions are often similar, to understand their differences, we compare the probability mass functions and skewnesses of the generalized Poisson and negative binomial distributions with the first two moments fixed. They have slight differences in many situations, but their zero-inflated distributions, with masses at zero, means and variances fixed, can differ more. These probabilistic comparisons are helpful in selecting a better fitting distribution for modelling count data with long right tails. Through a real example of count data with a large zero fraction, we illustrate how the generalized Poisson and negative binomial distributions as well as their zero-inflated distributions can be discriminated.

17.
Qihuang Zhang  Grace Y. Yi 《Biometrics》2023,79(2):1089-1102
Zero-inflated count data arise frequently from genomics studies. Analysis of such data is often based on a mixture model which accommodates excess zeros in combination with a Poisson distribution, and various inference methods have been proposed under such a model. Those analysis procedures, however, are challenged by the presence of measurement error in count responses. In this article, we propose a new measurement error model to describe error-contaminated count data. We show that ignoring the measurement error effects in the analysis may generally lead to invalid inference results, and meanwhile, we identify situations where ignoring measurement error can still yield consistent estimators. Furthermore, we propose a Bayesian method to address the effects of measurement error under the zero-inflated Poisson model and discuss the identifiability issues. We develop a data-augmentation algorithm that is easy to implement. Simulation studies are conducted to evaluate the performance of the proposed method. We apply our method to analyze the data arising from a prostate adenocarcinoma genomic study.

18.
Ridout M  Hinde J  Demétrio CG 《Biometrics》2001,57(1):219-223
Count data often show a higher incidence of zero counts than would be expected if the data were Poisson distributed. Zero-inflated Poisson regression models are a useful class of models for such data, but parameter estimates may be seriously biased if the nonzero counts are overdispersed in relation to the Poisson distribution. We therefore provide a score test for testing zero-inflated Poisson regression models against zero-inflated negative binomial alternatives.

19.
Count phenotypes with excessive zeros are often observed in the biological world. Researchers have studied many statistical methods for mapping the quantitative trait loci (QTLs) of zero-inflated count phenotypes. However, most of the existing methods consist of finding the approximate positions of the QTLs on the chromosome by genome-wide scanning. Additionally, most of the existing methods use the EM algorithm for parameter estimation. In this paper, we propose a Bayesian interval mapping scheme of QTLs for zero-inflated count data. The method takes advantage of a zero-inflated generalized Poisson (ZIGP) regression model to study the influence of QTLs on the zero-inflated count phenotype. The MCMC algorithm is used to estimate the effects and position parameters of QTLs. We use the Haldane map function to realize the conversion between recombination rate and map distance. Monte Carlo simulations are conducted to test the applicability and advantage of the proposed method. The effects of QTLs on the formation of mouse cholesterol gallstones were demonstrated by analyzing a mouse data set.
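The Haldane map function mentioned in this abstract has a simple closed form, sketched below for reference. The function names are hypothetical and the code is not taken from the paper; distances are assumed to be in Morgans (1 Morgan = 100 cM).

```python
import math


def haldane_recombination(distance_morgans: float) -> float:
    """Recombination fraction r for a map distance d: r = (1 - exp(-2d)) / 2."""
    if distance_morgans < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def haldane_distance(recombination: float) -> float:
    """Inverse map: d = -ln(1 - 2r) / 2, defined for 0 <= r < 0.5."""
    if not 0.0 <= recombination < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * recombination)
```

The recombination fraction approaches 0.5 as the distance grows, which corresponds to unlinked loci.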

20.
Pooling the relative risk (RR) across studies investigating rare events, for example, adverse events, via meta-analytical methods still presents a challenge to researchers. The main reason for this is the high probability of observing no events in treatment or control group or both, resulting in an undefined log RR (the basis of standard meta-analysis). Other technical challenges ensue, for example, the violation of normality assumptions, or bias due to exclusion of studies and application of continuity corrections, leading to poor performance of standard approaches. In the present simulation study, we compared three recently proposed alternative models (random-effects [RE] Poisson regression, RE zero-inflated Poisson [ZIP] regression, binomial regression) to the standard methods in conjunction with different continuity corrections and to different versions of beta-binomial regression. Based on our investigation of the models' performance in 162 different simulation settings informed by meta-analyses from the Cochrane database and distinguished by different underlying true effects, degrees of between-study heterogeneity, numbers of primary studies, group size ratios, and baseline risks, we recommend the use of the RE Poisson regression model. The beta-binomial model recommended by Kuss (2015) also performed well. Decent performance was also exhibited by the ZIP models, but they also had considerable convergence issues. We stress that these recommendations are only valid for meta-analyses with larger numbers of primary studies. All models are applied to data from two Cochrane reviews to illustrate differences between and issues of the models. Limitations as well as practical implications and recommendations are discussed; a flowchart summarizing recommendations is provided.
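The problem this abstract starts from, an undefined log RR when a study arm has no events, can be seen in a few lines. The sketch below shows the conventional per-study computation with a constant continuity correction, which is the kind of standard approach the abstract reports as performing poorly, not the recommended RE Poisson model. The function name `log_rr` and the default correction of 0.5 are assumptions for illustration.

```python
import math


def log_rr(events_t: int, n_t: int, events_c: int, n_c: int,
           correction: float = 0.5) -> tuple[float, float]:
    """Log relative risk and its standard error for a single two-arm study.

    If either arm has zero events, `correction` is added to every cell of
    the 2x2 table (events and non-events), so each group size grows by
    2 * correction. With correction <= 0 the log RR is undefined.
    """
    a, c = float(events_t), float(events_c)
    n1, n2 = float(n_t), float(n_c)
    if a == 0.0 or c == 0.0:
        if correction <= 0.0:
            raise ValueError("log RR is undefined when an arm has zero events")
        a += correction
        c += correction
        n1 += 2.0 * correction
        n2 += 2.0 * correction
    estimate = math.log((a / n1) / (c / n2))
    std_err = math.sqrt(1.0 / a - 1.0 / n1 + 1.0 / c - 1.0 / n2)
    return estimate, std_err
```

The estimate for a zero-event study depends directly on the chosen correction value, which is one source of the bias the abstract mentions.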
