Dealing with overdispersed count data in applied ecology   Total citations: 1, self-citations: 0, citations by others: 1

ABSTRACT Count data with means <2 are often assumed to follow a Poisson distribution. However, in many cases these kinds of data, such as number of young fledged, are more appropriately considered to be multinomial observations due to naturally occurring upper truncation of the distribution. We evaluated the performance of several versions of multinomial regression, plus Poisson and normal regression, for analysis of count data with means <2 through Monte Carlo simulations. Simulated data mimicked observed counts of number of young fledged (0, 1, 2, or 3) by California spotted owls (Strix occidentalis occidentalis). We considered size and power of tests to detect differences among 10 levels of a categorical predictor, as well as tests for trends across 10-year periods. We found regular regression and analysis of variance procedures based on a normal distribution to perform satisfactorily in all cases we considered, whereas failure rate of multinomial procedures was often excessively high, and the Poisson model demonstrated inappropriate test size for data where the variance/mean ratio was <1 or >1.2. Thus, managers can use simple statistical methods with which they are likely already familiar to analyze the kinds of count data we described here.
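
The variance/mean ratio mentioned in the abstract above can be checked before choosing a model. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, and the thresholds 1 and 1.2 are simply the values quoted in the abstract.

```python
import numpy as np


def variance_mean_ratio(counts):
    """Sample variance divided by sample mean (dispersion index)."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


def outside_quoted_poisson_range(ratio, low=1.0, high=1.2):
    """True when the ratio is below `low` or above `high`.

    The defaults are the thresholds quoted in the abstract; treating them
    as a hard decision rule is an assumption made for illustration.
    """
    return ratio < low or ratio > high


# Hypothetical counts of young fledged (0, 1, 2, or 3) per nest
fledged = [0, 1, 2, 3]
ratio = variance_mean_ratio(fledged)
print(ratio, outside_quoted_poisson_range(ratio))
```

A ratio close to 1 is what a Poisson distribution implies; values well outside the quoted range suggest looking at another model.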

Overdispersed count data are very common in ecology. The negative binomial model has been used widely to represent such data. Ecological data often vary considerably, and traditional approaches are likely to be inefficient or incorrect due to underestimation of uncertainty and poor predictive power. We propose a new statistical model to account for excessive overdispersion. It is the combination of two negative binomial models, where the first determines the number of clusters and the second the number of individuals in each cluster. Simulations show that this model often performs better than the negative binomial model. This model also fitted catch and effort data for southern bluefin tuna better than other models according to AIC. A model that explicitly and properly accounts for overdispersion should contribute to robust management and conservation for wildlife and plants.
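
The two-stage structure described above (one negative binomial for the number of clusters, a second for the individuals in each cluster) can be simulated roughly as follows. This is a sketch under stated assumptions, not the authors' model: the function name and all parameter values are hypothetical, NumPy's `(n, p)` parameterisation is used, and empty clusters are allowed.

```python
import numpy as np


def simulate_compound_nb(size, n_clusters, p_clusters, n_within, p_within, seed=0):
    """Draw `size` totals from a negative binomial compounded with itself.

    Stage 1: number of clusters K ~ NB(n_clusters, p_clusters).
    Stage 2: each cluster holds NB(n_within, p_within) individuals.
    The sum of K independent NB(n, p) variables is NB(K * n, p), so the
    total can be drawn in one call instead of looping over clusters.
    """
    rng = np.random.default_rng(seed)
    clusters = rng.negative_binomial(n_clusters, p_clusters, size=size)
    totals = np.zeros(size, dtype=np.int64)
    nonzero = clusters > 0  # zero clusters means a total of zero
    totals[nonzero] = rng.negative_binomial(clusters[nonzero] * n_within, p_within)
    return totals


counts = simulate_compound_nb(5000, n_clusters=2, p_clusters=0.5,
                              n_within=1, p_within=0.25)
print("variance/mean ratio:", counts.var(ddof=1) / counts.mean())
```

With these illustrative parameters the variance/mean ratio comes out far above 1, which is the kind of heavy overdispersion the abstract is concerned with.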

Conditional logistic regression models for correlated binary data   Total citations: 1, self-citations: 0, citations by others: 1

Abstract: We perceive a need for more complete interpretation of regression models published in the wildlife literature to minimize the appearance of poor models and to maximize the extraction of information from good models. Accordingly, we offer this primer on interpretation of parameters in single- and multi-variable regression models. Using examples from the wildlife literature, we illustrate how to interpret linear zero-intercept, simple linear, semi-log, log-log, and polynomial models based on intercepts, coefficients, and shapes of relationships. We show how intercepts and coefficients have biological and management interpretations. We examine multiple linear regression models and show how to use the signs (+, -) of coefficients to assess the merit and meaning of a derived model. We discuss 3 methods of viewing the output of 3-dimensional models (y, x1, x2) in 2-dimensional space (sheet of paper) and illustrate graphical model interpretation with a 4-dimensional logistic regression model. Statistical significance or Akaike best-ness does not prevent the appearance of implausible regression models. We recommend that members of the peer review process be sensitive to full interpretation of regression models to forestall bad models and maximize information retrieval from good models.

ABSTRACT Ecologists often develop complex regression models that include multiple categorical and continuous variables, interactions among predictors, and nonlinear relationships between the response and predictor variables. Nomograms, which are graphical devices for presenting mathematical functions and calculating output values, can aid biologists in interpreting and presenting these complex models. To illustrate benefits of nomograms, we developed a logistic regression model of elk (Cervus elaphus) resource selection. With this model, we demonstrated how a nomogram helps scientists and managers interpret interactions among variables, compare the relative biological importance of variables, and examine predicted shapes of relationships (e.g., linear vs. nonlinear) between response and predictor variables. Although our example focused on logistic regression, nomograms are equally useful for other linear and nonlinear models. Regardless of the approach used for model development, nomograms and other graphical summaries can help scientists and managers develop, interpret, and apply statistical models.

Ma S  Kosorok MR  Fine JP 《Biometrics》2006,62(1):202-210
As a useful alternative to Cox's proportional hazard model, the additive risk model assumes that the hazard function is the sum of the baseline hazard function and the regression function of covariates. This article is concerned with estimation and prediction for the additive risk models with right censored survival data, especially when the dimension of the covariates is comparable to or larger than the sample size. Principal component regression is proposed to give unique and numerically stable estimators. Asymptotic properties of the proposed estimators, component selection based on the weighted bootstrap, and model evaluation techniques are discussed. This approach is illustrated with analysis of the primary biliary cirrhosis clinical data and the diffuse large B-cell lymphoma genomic data. It is shown that this methodology is numerically stable and effective in dimension reduction, while still being able to provide satisfactory prediction and classification results.
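
As a rough illustration of the model structure described above, and assuming the regression function is linear in the covariates (a common choice that the abstract does not state explicitly), the additive risk model can be set beside Cox's proportional hazards model. The symbols $\lambda_0$ (baseline hazard), $\beta$ (regression coefficients), and $Z$ (covariate vector) are notation introduced here for illustration.

```latex
\begin{align*}
\lambda(t \mid Z) &= \lambda_0(t) + \beta^{\top} Z
  && \text{(additive risk)} \\
\lambda(t \mid Z) &= \lambda_0(t)\,\exp\!\left(\beta^{\top} Z\right)
  && \text{(Cox proportional hazards)}
\end{align*}
```

In the additive form the covariates shift the hazard by an absolute amount, whereas in the Cox form they scale it by a multiplicative factor.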

Yi GY  He W 《Biometrics》2009,65(2):618-625
Summary. Recently, median regression models have received increasing attention. When continuous responses follow a distribution that is quite different from a normal distribution, usual mean regression models may fail to produce efficient estimators whereas median regression models may perform satisfactorily. In this article, we discuss using median regression models to deal with longitudinal data with dropouts. Weighted estimating equations are proposed to estimate the median regression parameters for incomplete longitudinal data, where the weights are determined by modeling the dropout process. Consistency and the asymptotic distribution of the resultant estimators are established. The proposed method is used to analyze a longitudinal data set arising from a controlled trial of HIV disease (Volberding et al., 1990, The New England Journal of Medicine 322, 941–949). Simulation studies are conducted to assess the performance of the proposed method under various situations. An extension to estimation of the association parameters is outlined.

The equivalence of two models for ordinal data   Total citations: 1, self-citations: 0, citations by others: 1
LAARA  E.; MATTHEWS  J. N. S. 《Biometrika》1985,72(1):206-207

We propose a generalization of the varying coefficient model for longitudinal data to cases where not only current but also recent past values of the predictor process affect current response. More precisely, the targeted regression coefficient functions of the proposed model have sliding window supports around current time t. A variant of a recently proposed two-step estimation method for varying coefficient models is proposed for estimation in the context of these generalized varying coefficient models, and is found to lead to improvements, especially for the case of additive measurement errors in both response and predictors. The proposed methodology for estimation and inference is also applicable for the case of additive measurement error in the common versions of varying coefficient models that relate only current observations of predictor and response processes to each other. Asymptotic distributions of the proposed estimators are derived, and the model is applied to the problem of predicting protein concentrations in a longitudinal study. Simulation studies demonstrate the efficacy of the proposed estimation procedure.

Lloyd CJ 《Biometrics》2000,56(3):862-867
The performance of a diagnostic test is summarized by its receiver operating characteristic (ROC) curve. Under quite natural assumptions about the latent variable underlying the test, the ROC curve is convex. Empirical data on a test's performance often comes in the form of observed true positive and false positive relative frequencies under varying conditions. This paper describes a family of regression models for analyzing such data. The underlying ROC curves are specified by a quality parameter delta and a shape parameter mu and are guaranteed to be convex provided delta > 1. Both the position along the ROC curve and the quality parameter delta are modeled linearly with covariates at the level of the individual. The shape parameter mu enters the model through the link functions log(p^mu) - log(1 - p^mu) of a binomial regression and is estimated either by search or from an appropriate constructed variate. One simple application is to the meta-analysis of independent studies of the same diagnostic test, illustrated on some data of Moses, Shapiro, and Littenberg (1993). A second application, to so-called vigilance data, is given, where ROC curves differ across subjects and modeling of the position along the ROC curve is of primary interest.

Resource selection functions (RSFs) are typically estimated by comparing covariates at a discrete set of “used” locations to those from an “available” set of locations. This RSF approach treats the response as binary and does not account for intensity of use among habitat units where locations were recorded. Advances in global positioning system (GPS) technology allow animal location data to be collected at fine spatiotemporal scales and have increased the size and correlation of data used in RSF analyses. We suggest that a more contemporary approach to analyzing such data is to model intensity of use, which can be estimated for one or more animals by relating the relative frequency of locations in a set of sampling units to the habitat characteristics of those units with count‐based regression and, in particular, negative binomial (NB) regression. We demonstrate this NB RSF approach with location data collected from 10 GPS‐collared Rocky Mountain elk (Cervus elaphus) in the Starkey Experimental Forest and Range enclosure. We discuss modeling assumptions and show how RSF estimation with NB regression can easily accommodate contemporary research needs, including: analysis of large GPS data sets, computational ease, accounting for among‐animal variation, and interpretation of model covariates. We recommend the NB approach because of its conceptual and computational simplicity, and the fact that estimates of intensity of use are unbiased in the face of temporally correlated animal location data.

Semiparametric Regression in Size-Biased Sampling   Total citations: 1, self-citations: 0, citations by others: 1
Ying Qing Chen 《Biometrics》2010,66(1):149-158
Summary. Size-biased sampling arises when a positive-valued outcome variable is sampled with selection probability proportional to its size. In this article, we propose a semiparametric linear regression model to analyze size-biased outcomes. In our proposed model, the regression parameters of covariates are of major interest, while the distribution of random errors is unspecified. Under the proposed model, we discover that regression parameters are invariant regardless of size-biased sampling. Following this invariance property, we develop a simple estimation procedure for inferences. Our proposed methods are evaluated in simulation studies and applied to two real data analyses.

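Forest fire prediction and forecasting is a prerequisite for scientific and effective forest fire management, and an area of broad interest to forestry management agencies and researchers. Logistic regression (LR) is widely used in China and abroad to model forest fire occurrence; in recent years, however, some researchers have found that it does not adequately account for the spatial correlation and heterogeneity of fire drivers, which biases the model fit. The geographically weighted logistic regression (GWLR) model accounts for spatial correlation among model variables and effectively improves model fit. To assess the applicability of GWLR to forest fire prediction in Fujian, this study used LR and GWLR to build separate models relating forest fires in Fujian Province to meteorological factors, and compared their goodness of fit to judge the applicability of GWLR. The study was based on satellite fire-point data and daily meteorological factors for Fujian from 2000 to 2005. The full sample was split into 60% model-building data and 40% validation data, and the split was repeated 5 times to produce 5 sample groups. Variables that were significant in 3 or more of the 5 sample groups entered the final model. The results show that GWLR outperformed LR in goodness of fit, model residuals, spatial autocorrelation, and prediction accuracy, indicating that fully accounting for the spatial heterogeneity of model variables helps improve predictive accuracy, and confirming the suitability of GWLR for forest fire prediction in Fujian. In addition, the model parameter estimates show that 8 factors had a significant effect on forest fire occurrence in Fujian Province: "daily maximum surface temperature", "daily minimum surface temperature", "daily mean wind speed", "24-hour precipitation", "daily maximum station pressure", "sunshine hours", "daily maximum air temperature", and "daily minimum relative humidity". These findings provide a new method for forest fire prediction and forecasting in Fujian.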
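
The resampling and variable-selection rule in the abstract above (a 60%/40% split repeated 5 times, keeping variables that are significant in at least 3 of the 5 groups) can be sketched as follows. This is an illustration only: the function names and the example variable names are hypothetical, and the model fitting that decides which variables are significant in each group is left out.

```python
from collections import Counter

import numpy as np


def repeated_splits(n_samples, train_frac=0.6, repeats=5, seed=0):
    """Return `repeats` random (train_idx, test_idx) index pairs."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_samples))
    splits = []
    for _ in range(repeats):
        order = rng.permutation(n_samples)
        splits.append((order[:n_train], order[n_train:]))
    return splits


def stable_variables(significant_per_group, min_groups=3):
    """Keep variables that were significant in at least `min_groups` groups."""
    counts = Counter(v for group in significant_per_group for v in set(group))
    return sorted(v for v, c in counts.items() if c >= min_groups)


splits = repeated_splits(100)
# Hypothetical per-group results; in practice each list would come from
# fitting LR or GWLR on the corresponding training subset.
significant = [["wind", "rain"], ["wind"], ["wind", "rain"],
               ["rain", "sunshine"], ["sunshine"]]
print(stable_variables(significant))
```

Requiring agreement across several random splits is a simple guard against keeping a variable that is significant in only one particular subsample.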

A goodness-of-fit test for multinomial logistic regression   Total citations: 1, self-citations: 0, citations by others: 1
Goeman JJ  le Cessie S 《Biometrics》2006,62(4):980-985
This article presents a score test to check the fit of a logistic regression model with two or more outcome categories. The null hypothesis that the model fits well is tested against the alternative that residuals of samples close to each other in covariate space tend to deviate from the model in the same direction. We propose a test statistic that is a sum of squared smoothed residuals, and show that it can be interpreted as a score test in a random effects model. By specifying the distance metric in covariate space, users can choose the alternative against which the test is directed, making it either an omnibus goodness-of-fit test or a test for lack of fit of specific model variables or outcome categories.

1.  The construction of a predictive metapopulation model includes three steps: the choice of factors affecting metapopulation dynamics, the choice of model structure, and finally parameter estimation and model testing.
2.  Unless the assumption is made that the metapopulation is at stochastic quasi-equilibrium and unless the method of parameter estimation of model parameters uses that assumption, estimates from a limited amount of data will usually predict a trend in metapopulation size.
3.  This implicit estimation of a trend occurs because extinction-colonization stochasticity, possibly amplified by regional stochasticity, leads to unequal numbers of observed extinction and colonization events during a short study period.
4.  Metapopulation models, such as those based on the logistic regression model, that rely on observed population turnover events in parameter estimation are sensitive to the implicit estimation of a trend.
5.  A new parameter estimation method, based on Monte Carlo inference for statistically implicit models, allows an explicit decision about whether metapopulation quasi-stability is assumed or not.
6. Our confidence in metapopulation model parameter estimates that have been produced from only a few years of data is decreased by the need to know before parameter estimation whether the metapopulation is in quasi-stable state or not.
7. The choice of whether metapopulation stability is assumed or not in parameter estimation should be done consciously. Typical data sets cover only a few years and rarely allow a statistical test of a possible trend. While making the decision about stability one should consider any information about the landscape history and species and metapopulation characteristics.

