Similar Articles
20 similar articles found
1.
Ridout M  Hinde J  Demétrio CG 《Biometrics》2001,57(1):219-223
Count data often show a higher incidence of zero counts than would be expected if the data were Poisson distributed. Zero-inflated Poisson regression models are a useful class of models for such data, but parameter estimates may be seriously biased if the nonzero counts are overdispersed in relation to the Poisson distribution. We therefore provide a score test for testing zero-inflated Poisson regression models against zero-inflated negative binomial alternatives.  相似文献   
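As a sketch of what the alternative adds (standard NB2 notation, not taken from the paper itself): the count component of the zero-inflated model is given an extra dispersion parameter α, and the score test evaluates the Poisson null α = 0 against the negative binomial alternative.

```latex
% Variance of the count component under the ZINB alternative (NB2 form);
% alpha = 0 recovers the zero-inflated Poisson null.
\[
  \operatorname{Var}(Y_i \mid \text{count component}) = \mu_i + \alpha\,\mu_i^{2},
  \qquad
  H_0:\ \alpha = 0 \quad \text{vs.} \quad H_1:\ \alpha > 0 .
\]
```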

2.
Hall DB 《Biometrics》2000,56(4):1030-1039
In a 1992 Technometrics paper, Lambert (1992, 34, 1-14) described zero-inflated Poisson (ZIP) regression, a class of models for count data with excess zeros. In a ZIP model, a count response variable is assumed to be distributed as a mixture of a Poisson(lambda) distribution and a distribution with point mass of one at zero, with mixing probability p. Both p and lambda are allowed to depend on covariates through canonical link generalized linear models. In this paper, we adapt Lambert's methodology to an upper bounded count situation, thereby obtaining a zero-inflated binomial (ZIB) model. In addition, we add to the flexibility of these fixed effects models by incorporating random effects so that, e.g., the within-subject correlation and between-subject heterogeneity typical of repeated measures data can be accommodated. We motivate, develop, and illustrate the methods described here with an example from horticulture, where both upper bounded count (binomial-type) and unbounded count (Poisson-type) data with excess zeros were collected in a repeated measures designed experiment.  相似文献   
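For reference, a minimal statement of the ZIP mixture summarized above (the covariate vectors g_i and x_i and the coefficient symbols are illustrative notation, not Lambert's or Hall's):

```latex
\[
  P(Y_i = 0) = p_i + (1 - p_i)\,e^{-\lambda_i},
  \qquad
  P(Y_i = y) = (1 - p_i)\,\frac{e^{-\lambda_i}\lambda_i^{\,y}}{y!}, \quad y = 1, 2, \ldots,
\]
\[
  \operatorname{logit}(p_i) = \mathbf{g}_i^{\top}\boldsymbol{\gamma},
  \qquad
  \log(\lambda_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta}
  \quad \text{(canonical-link GLMs for both components).}
\]
```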

3.
In many biometrical applications, the count data encountered often contain extra zeros relative to the Poisson distribution. Zero‐inflated Poisson regression models are useful for analyzing such data, but parameter estimates may be seriously biased if the nonzero observations are over‐dispersed and simultaneously correlated due to the sampling design or the data collection procedure. In this paper, a zero‐inflated negative binomial mixed regression model is presented to analyze a set of pancreas disorder length of stay (LOS) data that comprised mainly same‐day separations. Random effects are introduced to account for inter‐hospital variations and the dependency of clustered LOS observations. Parameter estimation is achieved by maximizing an appropriate log‐likelihood function using an EM algorithm. Alternative modeling strategies, namely the finite mixture of Poisson distributions and the non‐parametric maximum likelihood approach, are also considered. The determination of pertinent covariates would assist hospital administrators and clinicians to manage LOS and expenditures efficiently.  相似文献   
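A minimal fixed-effects sketch of a ZINB fit in Python with statsmodels (simulated data for illustration only; the paper's random hospital effects and EM algorithm are not reproduced here):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(0)
n_obs = 500
x = rng.normal(size=n_obs)                  # one illustrative covariate
X = sm.add_constant(x)

# Simulate zero-inflated, overdispersed counts: structural zeros plus an NB count part
lam = np.exp(0.5 + 0.4 * x)
structural_zero = rng.random(n_obs) < 0.3   # 30% structural zeros
nb_counts = rng.negative_binomial(n=2, p=2 / (2 + lam))   # NB with mean lam, size 2
y = np.where(structural_zero, 0, nb_counts)

# Fixed-effects ZINB: logit model for the zero part, NB2 for the count part
zinb = ZeroInflatedNegativeBinomialP(y, X, exog_infl=X, p=2)
res = zinb.fit(method="bfgs", maxiter=500, disp=False)
print(res.summary())
```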

4.
This paper presents the zero-truncated negative binomial regression model to estimate the population size in the presence of a single registration file. The model is an alternative to the zero-truncated Poisson regression model and it may be useful if the data are overdispersed due to unobserved heterogeneity. Horvitz–Thompson point and interval estimates for the population size are derived, and the performance of these estimators is evaluated in a simulation study. To illustrate the model, the size of the population of opiate users in the city of Rotterdam is estimated. In comparison to the Poisson model, the zero-truncated negative binomial regression model fits these data better and yields a substantially higher population size estimate.
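A sketch of the Horvitz–Thompson idea in this setting (standard NB2 notation; the paper's exact parameterization may differ): each of the n registered individuals is weighted by the inverse of its estimated probability of appearing in the register at all, i.e. of having a nonzero count.

```latex
\[
  \hat{N} = \sum_{i=1}^{n} \frac{1}{1 - \hat{P}(Y_i = 0)},
  \qquad
  \hat{P}(Y_i = 0) = \bigl(1 + \hat{\alpha}\,\hat{\mu}_i\bigr)^{-1/\hat{\alpha}},
  \qquad
  \log \hat{\mu}_i = \mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}} .
\]
```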

5.
Research progress on forest fire occurrence prediction models in China
Based on a literature review, this paper summarizes the current state of research on forest fire occurrence prediction models in China, covering the driving factors of fire occurrence, occurrence probability models, fire frequency models, and model validation methods. The main conclusions are: 1) weather, topography, vegetation, fuel, and human activity are the principal drivers of forest fire occurrence and of model prediction accuracy; 2) among occurrence probability models, geographically weighted logistic regression accounts for spatial correlation among variables, the Gompit regression model suits fire data with an asymmetric structure, and the random forest model requires no multicollinearity test and improves prediction accuracy while avoiding overfitting, making it one of the preferred methods for predicting fire occurrence probability; 3) among fire frequency models, negative binomial regression is better suited to overdispersed data, while zero-inflated and hurdle models can handle the large number of zero counts in fire data; 4) ROC, AIC, likelihood ratio, and Wald tests are the common validation methods for fire probability and frequency models. Research on fire occurrence prediction models remains a priority of current forest fire management in China, and the choice of model should be based on the characteristics of the fire data in each region. In addition, more influencing factors should be considered when building fire prediction models to improve their accuracy, and the application of other mathematical models to fire occurrence prediction deserves further exploration so that the accuracy of prediction models continues to improve.

6.
We discuss the problem of estimating the number of nests of different species of seabirds on North East Herald Cay based on the data from a 1996 survey of quadrats along transects and data from similar past surveys. We consider three approaches based on different plausible models, namely a conditional negative binomial model that allows for additional zeroes in the data, a weighting approach (based on a heteroscedastic regression model), and a transform-both-sides regression approach. We find that the conditional negative binomial approach and a linear regression approach work well but that the transform-both-sides approach should not be used. We apply the conditional negative binomial and linear regression approaches with poststratification based on data quality and availability to estimate the number of frigatebird nests on North East Herald Cay.  相似文献   

7.
Overdispersed count data are very common in ecology. The negative binomial model has been used widely to represent such data. Ecological data often vary considerably, and traditional approaches are likely to be inefficient or incorrect due to underestimation of uncertainty and poor predictive power. We propose a new statistical model to account for excessive overdispersion. It combines two negative binomial models, where the first determines the number of clusters and the second the number of individuals in each cluster. Simulations show that this model often performs better than the negative binomial model. This model also fitted catch and effort data for southern bluefin tuna better than other models according to AIC. A model that explicitly and properly accounts for overdispersion should contribute to robust management and conservation for wildlife and plants.
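A toy simulation of the two-stage structure described above (parameter values are illustrative, not taken from the paper): the number of clusters is drawn from one negative binomial, the size of each cluster from another, and the observed count is the total.

```python
import numpy as np

rng = np.random.default_rng(1)

def nb_of_nb_sample(n_obs, size1, mean1, size2, mean2):
    """Total count = sum, over an NB number of clusters, of NB cluster sizes."""
    p1 = size1 / (size1 + mean1)            # numpy NB uses (size, prob)
    p2 = size2 / (size2 + mean2)
    n_clusters = rng.negative_binomial(size1, p1, n_obs)
    totals = np.array([
        rng.negative_binomial(size2, p2, k).sum() if k > 0 else 0
        for k in n_clusters
    ])
    return totals

y = nb_of_nb_sample(10_000, size1=1.5, mean1=3.0, size2=0.8, mean2=5.0)
# Compounding typically inflates the variance well beyond a single NB with this mean
print("mean:", y.mean(), "variance:", y.var())
```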

8.
Lightning-caused fire occurrence is closely related to weather factors. This paper applied two models suited to the structure of fire occurrence data in the Daxing'anling region, the negative binomial (NB) and the zero-inflated negative binomial (ZINB), to model the relationship between lightning-caused fires and weather factors in the Daxing'anling forest region from 1980 to 2005, and compared them with the ordinary least squares (OLS) regression used in earlier studies. The models were fitted with the SAS and R-Project statistical software and parameter estimates were obtained. The results show that both the NB and ZINB models fit the data well, the weather covariates in each model are highly significant, and both models predict the number of lightning-caused fires well. The fit and predictive performance of the NB and ZINB models were further compared using the AIC and Vuong tests, which showed that the ZINB model outperforms the NB model in both data fitting and prediction. An optimal model for the relationship between fire occurrence and weather factors in the Daxing'anling region is proposed.
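As a hedged illustration of the NB-versus-ZINB comparison by AIC (synthetic data in Python/statsmodels, not the Daxing'anling fire data; the Vuong test is not included):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(2)
n_obs = 800
temp = rng.normal(20, 5, n_obs)             # illustrative "weather" covariate
X = sm.add_constant(temp)

# Simulate fire-count-like data with extra zeros
lam = np.exp(-2.0 + 0.08 * temp)
y = np.where(rng.random(n_obs) < 0.4, 0,
             rng.negative_binomial(1.0, 1.0 / (1.0 + lam)))

nb = sm.NegativeBinomial(y, X).fit(disp=False)
zinb = ZeroInflatedNegativeBinomialP(y, X, exog_infl=X, p=2).fit(
    method="bfgs", maxiter=500, disp=False)

print("NB   AIC:", round(nb.aic, 1))
print("ZINB AIC:", round(zinb.aic, 1))      # smaller AIC indicates the better fit
```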

9.
Count data are very common in health services research, and very commonly the basic Poisson regression model has to be extended in several ways to accommodate several sources of heterogeneity: (i) an excess number of zeros relative to a Poisson distribution, (ii) hierarchical structures, and correlated data, (iii) remaining “unexplained” sources of overdispersion. In this paper, we propose hierarchical zero‐inflated and overdispersed models with independent, correlated, and shared random effects for both components of the mixture model. We show that all different extensions of the Poisson model can be based on the concept of mixture models, and that they can be combined to account for all different sources of heterogeneity. Expressions for the first two moments are derived and discussed. The models are applied to data on maternal deaths and related risk factors within health facilities in Mozambique. The final model shows that the maternal mortality rate mainly depends on the geographical location of the health facility, the percentage of women admitted with HIV and the percentage of referrals from the health facility.  相似文献   
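For orientation, the first two moments of the basic zero-inflated Poisson building block (without the random effects discussed above; the paper's derived expressions extend these):

```latex
\[
  \mathbb{E}(Y_i) = (1 - p_i)\,\lambda_i,
  \qquad
  \operatorname{Var}(Y_i) = (1 - p_i)\,\lambda_i\,\bigl(1 + p_i\,\lambda_i\bigr),
\]
```

so that Var(Y_i) exceeds E(Y_i) whenever p_i > 0: zero inflation alone already induces overdispersion.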

10.
On occasion, generalized linear models for counts based on Poisson or overdispersed count distributions may encounter lack of fit due to disproportionately large frequencies of zeros. Three alternative types of regression models that utilize all the information and explicitly account for excess zeros are examined and given general formulations. A simple mechanism for added zeros is assumed that directly motivates one type of model, here called the added-zero type, particular forms of which have been proposed independently by D. LAMBERT (1992) and in unpublished work by the author. An original regression formulation (the zero-altered model) is presented as a reduced form of the two-part model for count data, which is also discussed. It is suggested that two-part models be used to aid in development of an added-zero model when the latter is thought to be appropriate.  相似文献   
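A compact statement of the two-part (zero-altered) form referred to above, with a Poisson count part as one concrete choice (a standard formulation; the notation is illustrative, not the author's):

```latex
\[
  P(Y_i = 0) = \pi_i,
  \qquad
  P(Y_i = y) = (1 - \pi_i)\,\frac{f(y;\lambda_i)}{1 - f(0;\lambda_i)}, \quad y = 1, 2, \ldots,
\]
```

where f(·; λ_i) is, for example, the Poisson pmf; the added-zero (zero-inflated) model instead mixes f with a point mass at zero, P(Y_i = 0) = p_i + (1 − p_i) f(0; λ_i).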

11.
In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero‐inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider maximum likelihood function plus a penalty including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation‐maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinated descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, but also is more robust than the traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using the open‐source R package mpath .  相似文献   
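Schematically, the penalized objective being maximized (a generic form with the LASSO penalty shown; SCAD and MCP replace p_λ with nonconvex penalties, intercepts are typically left unpenalized, and the notation is illustrative rather than the paper's):

```latex
\[
  \max_{\boldsymbol{\beta},\,\boldsymbol{\gamma},\,\alpha}\;
  \ell_{\mathrm{ZINB}}(\boldsymbol{\beta}, \boldsymbol{\gamma}, \alpha)
  \;-\; \sum_{j} p_{\lambda}\bigl(|\beta_j|\bigr)
  \;-\; \sum_{k} p_{\lambda}\bigl(|\gamma_k|\bigr),
  \qquad
  p_{\lambda}(t) = \lambda\,t \ \ \text{(LASSO)},
\]
```

where β and γ are the coefficients of the negative binomial (count) part and the logistic (zero) part, respectively.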

12.
A common design for a falls prevention trial is to assess falling at baseline, randomize participants into an intervention or control group, and ask them to record the number of falls they experience during a follow‐up period of time. This paper addresses how best to include the baseline count in the analysis of the follow‐up count of falls in negative binomial (NB) regression. We examine the performance of various approaches in simulated datasets where both counts are generated from a mixed Poisson distribution with shared random subject effect. Including the baseline count after log‐transformation as a regressor in NB regression (NB‐logged) or as an offset (NB‐offset) resulted in greater power than including the untransformed baseline count (NB‐unlogged). Cook and Wei's conditional negative binomial (CNB) model replicates the underlying process generating the data. In our motivating dataset, a statistically significant intervention effect resulted from the NB‐logged, NB‐offset, and CNB models, but not from NB‐unlogged, and large, outlying baseline counts were overly influential in NB‐unlogged but not in NB‐logged. We conclude that there is little to lose by including the log‐transformed baseline count in standard NB regression compared to CNB for moderate to larger sized datasets.  相似文献   
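A minimal sketch of the NB-logged and NB-offset variants in Python/statsmodels (simulated falls-like data; the NB dispersion parameter is held fixed at the statsmodels default here rather than estimated, and Cook and Wei's conditional NB model is not shown):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n_obs = 400
group = rng.integers(0, 2, n_obs)                   # 0 = control, 1 = intervention
frailty = rng.gamma(1.0, 1.0, n_obs)                # shared subject effect
baseline = rng.poisson(2.0 * frailty)               # baseline fall count
followup = rng.poisson(2.0 * frailty * np.exp(-0.3 * group))

# NB-logged: log(baseline + 1) enters as a covariate with a free coefficient
X_logged = sm.add_constant(np.column_stack([group, np.log(baseline + 1)]))
nb_logged = sm.GLM(followup, X_logged,
                   family=sm.families.NegativeBinomial()).fit()

# NB-offset: log(baseline + 1) enters as an offset (coefficient fixed at 1)
X = sm.add_constant(group)
nb_offset = sm.GLM(followup, X, family=sm.families.NegativeBinomial(),
                   offset=np.log(baseline + 1)).fit()

print(nb_logged.params, nb_offset.params)
```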

13.
For time series of count data, correlated measurements, clustering as well as excessive zeros occur simultaneously in biomedical applications. Ignoring such effects might contribute to misleading treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero‐altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally developed for modelling positive continuous data and was extended to handle count data. These models are motivated by evaluating the trend development of new tumour counts for bladder cancer patients as well as by identifying useful covariates which affect the count level. The models are implemented using Bayesian method with Markov chain Monte Carlo (MCMC) algorithms and are assessed using deviance information criterion (DIC).  相似文献   

14.
A dose-response model establishes the relationship between a dose and the magnitude of the response it produces. A common complication in dose-response models for jejunal crypt cell survival data is overdispersion, where the observed variation exceeds that predicted from the binomial distribution. In this study, two different methods for analyzing jejunal crypt cell survival after regimens of several fractions are contrasted and compared. One is the logistic regression approach, where the number of surviving crypts is predicted by a logistic function of a single dose of radiation. The other is the transform-both-sides approach, where the arcsine transformation family is applied based on the first-order variance-stabilizing transformation. This family includes the square root, arcsine, and hyperbolic arcsine transformations, which have been used for Poisson, binomial, and negative binomial count data, as special cases. These approaches are applied to a data set from radiobiology. A simulation study indicates that the arcsine transformation family is more efficient than logistic regression when there is moderate overdispersion.
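For concreteness, the three special cases of the transformation family mentioned above (a plain numpy sketch; the count values, the number at risk, and the negative binomial size parameter k are all illustrative assumptions, and which transform stabilizes the variance depends on the assumed count distribution):

```python
import numpy as np

counts = np.array([0, 3, 8, 15, 40])        # illustrative surviving-crypt counts
n_at_risk = 160                              # illustrative crypts at risk per subject
k = 2.0                                      # illustrative NB size parameter

sqrt_t   = np.sqrt(counts)                           # square root: Poisson-type counts
arcsin_t = np.arcsin(np.sqrt(counts / n_at_risk))    # arcsine: binomial proportions
asinh_t  = np.arcsinh(np.sqrt(counts / k))           # hyperbolic arcsine: negative binomial
print(sqrt_t, arcsin_t, asinh_t)
```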

15.
Jung BC  Jhun M  Lee JW 《Biometrics》2005,61(2):626-628
Ridout, Hinde, and Demétrio (2001, Biometrics 57, 219-223) derived a score test for testing a zero-inflated Poisson (ZIP) regression model against zero-inflated negative binomial (ZINB) alternatives. They noted that, in small samples, the actual significance level of the score test based on the normal approximation may fall below the nominal level. To remedy this problem, a parametric bootstrap method is proposed. It is shown that the bootstrap method keeps the significance level close to the nominal one and has uniformly greater power than the existing normal approximation for testing the hypothesis.
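A generic skeleton of the parametric bootstrap idea (the helpers fit_zip, simulate_zip, and score_statistic are hypothetical placeholders standing in for a ZIP fit under the null, simulation from that fitted null, and the Ridout-Hinde-Demétrio score statistic; they are not a real library API):

```python
import numpy as np

def bootstrap_pvalue(y, X, fit_zip, simulate_zip, score_statistic, n_boot=999):
    """Parametric bootstrap p-value for a score test of ZIP against ZINB.

    fit_zip(y, X)             -> fitted null (ZIP) model
    simulate_zip(model, X)    -> new response vector drawn from the fitted null
    score_statistic(y, X, m)  -> test statistic evaluated at a null fit m
    (All three are hypothetical helpers supplied by the user.)
    """
    null_fit = fit_zip(y, X)
    t_obs = score_statistic(y, X, null_fit)

    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        y_b = simulate_zip(null_fit, X)          # data generated under H0
        t_boot[b] = score_statistic(y_b, X, fit_zip(y_b, X))

    # Bootstrap p-value with the usual +1 correction
    return (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)
```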

16.
In recent years there have been a series of advances in the field of dynamic prediction. Among these is the development of methods for dynamic prediction of the cumulative incidence function in a competing risks setting. These models enable the predictions to be updated as time progresses and more information becomes available; for example, when a patient comes back for a follow-up visit after completing a year of treatment, the risk of death and of adverse events may have changed since treatment initiation. One approach to modeling the cumulative incidence function in competing risks is direct binomial regression, where right censoring of the event times is handled by inverse probability of censoring weights. We extend the approach by combining it with landmarking to enable dynamic prediction of the cumulative incidence function. The proposed models are very flexible, as they allow the covariates to have complex time-varying effects, and we illustrate how to investigate possible time-varying structures using Wald tests. The models are fitted using generalized estimating equations. The method is applied to bone marrow transplant data and the performance is investigated in a simulation study.

17.
Studies of citrus leafminer in a coastal orchard in NSW, Australia indicated that an increase in abundance to about one mine per flush was followed during the midseason flush by a rapid increase in population that was related to an increase in the percentage of leaves infested within flushes and the number of mines per leaf. The fits of frequency distributions and Iwao's patchiness regression indicated that populations were highly contagious initially, and as the exponent k of the negative binomial distribution increased with increasing population density, the distribution approached random. Concurrently, the coefficient of variation of mines per flush (which was strongly related to the proportion of un-infested flushes) decreased to about unity as the proportion of un-infested flushes reached zero and fell further as the number of mines per flush increased. Both numerative and binomial sequential sampling plans were developed using a decision threshold based on 1.2 mines per flush. The binomial sampling plan was based on a closely fitting model of the functional relationship between mean density and proportion of infested flushes. Functional relationships using the parameters determined from Iwao's patchiness regression and Taylor's power law were equally satisfactory, and one based on the negative binomial model also fitted well, but the Poisson model did not. The three best fitting models indicated that a decision threshold of 1.2 mines per flush was equivalent to 50% of flushes infested. From a practical point of view, the transition from 25% infestation of flushes through 50% is so rapid that it may be prudent to take action when the 25% level is reached; otherwise, the 50% may be passed before the crop is checked again. For valuable nursery stock, should infestation be detected in spring, it may be advisable to apply prophylactic treatment as the midseason flush starts.
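The functional relationships underlying such binomial sampling plans take the following standard forms (m is mean mines per flush, P_I the proportion of infested flushes, k the negative binomial exponent; these are the generic formulas, not the fitted models from the study):

```latex
\[
  \text{Poisson: } P_I = 1 - e^{-m},
  \qquad
  \text{negative binomial: } P_I = 1 - \Bigl(1 + \frac{m}{k}\Bigr)^{-k},
  \qquad
  \text{Taylor's power law: } s^{2} = a\,m^{\,b}.
\]
```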

18.
Modeling branch number distribution in Korean pine plantations in Heilongjiang Province
郑杨  董利虎  李凤日 《生态学杂志》2016,27(7):2172-2180
Based on 955 branch-analysis records from 65 sample trees of planted Korean pine in 12 plots at the Mengjiagang Forest Farm, Jiamusi, Heilongjiang Province, Poisson regression and negative binomial regression were used as candidate models to construct a model for the number of second-order branches, and goodness of fit was compared using AIC, pseudo-R², root mean square error (RMSE), and the Vuong test. The results show that the number of first-order branches per whorl is concentrated between 3 and 5, with a mean of 4, and is related to the branch attributes of the tree itself. The number of second-order branches on a first-order sample branch is highly dispersed. Using an all-subsets regression technique, the negative binomial model E(Y) = exp(β0 + β1·lnRDINC + β2·RDINC² + β3·HT/DBH + β4·CL + β5·DBH) was selected as the optimal prediction model for second-order branch number (β: parameters; RDINC: relative depth into the crown; HT: tree height; DBH: diameter at breast height; CL: crown length). The optimal model had a pseudo-R² of 0.79, a mean bias close to 0, and a mean absolute bias of less than 7. In the fitted model, the parameters for lnRDINC, CL, and DBH were positive and those for RDINC² and HT/DBH were negative, so the number of second-order branches within the crown reaches a maximum as RDINC increases. Overall, the prediction accuracy of the second-order branch number model was 96.4%; it predicts the second-order branch numbers of planted Korean pine in the study region well and provides a theoretical basis for later studies of branch photosynthesis and biomass.

19.
The complementary log-log link was originally introduced by R. A. Fisher in 1922, long before the logit and probit links. While the latter two links are symmetric, the complementary log-log link is asymmetric and has no parameter associated with it. Several asymmetric links with an extra parameter have been proposed in the literature over the last few years to deal with imbalanced data in binomial regression (when one of the classes is much smaller than the other); however, these do not necessarily have the cloglog link as a special case, with the exception of the link based on the generalized extreme value distribution. In this paper, we introduce flexible cloglog links for binomial regression models that include an extra parameter associated with the link, which accounts for some of the imbalance in binomial outcomes. In all cases, the cloglog link, or its reciprocal version the loglog link, is obtained as a special case. A Bayesian Markov chain Monte Carlo inference approach is developed. A simulation study is conducted to evaluate the performance of the proposed algorithm, and a prior sensitivity analysis for the extra parameter shows that a uniform prior is the most convenient for all models. Additionally, two applications to medical data (age at menarche and pulmonary infection) illustrate the advantages of the proposed models.
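For reference, the complementary log-log link and its reciprocal (loglog) version in standard form (the paper's flexible links add an extra parameter that recovers these as special cases):

```latex
\[
  \text{cloglog: } g(\mu) = \log\bigl(-\log(1-\mu)\bigr)
  \quad\Longleftrightarrow\quad
  \mu = 1 - \exp\bigl(-e^{\eta}\bigr),
  \qquad
  \text{loglog: } g(\mu) = -\log\bigl(-\log(\mu)\bigr).
\]
```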

20.
We analyze a real data set pertaining to reindeer fecal pellet‐group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects for modeling of spatial correlation in a Poisson generalized linear mixed model (GLMM), quasi‐Poisson hierarchical generalized linear model (HGLM), zero‐inflated Poisson (ZIP), and hurdle models. The quasi‐Poisson HGLM allows for both under‐ and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi‐Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R codes for fitting these models using a unified algorithm for the HGLMs. Spatial count response with an extremely high proportion of zeros, and underdispersion can be successfully modeled using the quasi‐Poisson HGLM with spatial random effects.  相似文献   

