Similar literature
 20 similar articles found (search time: 31 ms)
1.
Robert M. Dorazio 《Biometrics》2012,68(4):1303-1312
Summary: Several models have been developed to predict the geographic distribution of a species by combining measurements of covariates of occurrence at locations where the species is known to be present with measurements of the same covariates at other locations where species occurrence status (presence or absence) is unknown. In the absence of species detection errors, spatial point‐process models and binary‐regression models for case‐augmented surveys provide consistent estimators of a species’ geographic distribution without prior knowledge of species prevalence. In addition, these regression models can be modified to produce estimators of species abundance that are asymptotically equivalent to those of the spatial point‐process models. However, if species presence locations are subject to detection errors, neither class of models provides a consistent estimator of covariate effects unless the covariates of species abundance are distinct and independently distributed from the covariates of species detection probability. These analytical results are illustrated using simulation studies of data sets that contain a wide range of presence‐only sample sizes. Analyses of presence‐only data of three avian species observed in a survey of landbirds in western Montana and northern Idaho are compared with site‐occupancy analyses of detections and nondetections of these species.

2.
Sexton J  Laake P 《Biometrics》2007,63(2):586-592
In this article, we consider nonparametric regression when covariates are measured with error. Estimation is performed using boosted regression trees, with the sum of the trees forming the estimate of the conditional expectation of the response. Both binary and continuous response regression are investigated. An approach to fitting regression trees when covariates are measured with error is described, and the boosting algorithms consist of its repeated application. The main feature of the approach is that it handles situations where multiple covariates are measured with error. Some simulation results are given as well as its application to data from the Framingham Heart Study.  相似文献   

3.
We have developed a new general approach for handling misclassification in discrete covariates or responses in regression models. The simulation and extrapolation (SIMEX) method, which was originally designed for handling additive covariate measurement error, is applied to the case of misclassification. The statistical model for characterizing misclassification is given by the transition matrix Pi from the true to the observed variable. We exploit the relationship between the size of misclassification and bias in estimating the parameters of interest. Assuming that Pi is known or can be estimated from validation data, we simulate data with higher misclassification and extrapolate back to the case of no misclassification. We show that our method is quite general and applicable to models with misclassified response and/or misclassified discrete regressors. In the case of a binary response with misclassification, we compare our method to the approach of Neuhaus, and to the matrix method of Morrissey and Spiegelman in the case of a misclassified binary regressor. We apply our method to a study on caries with a misclassified longitudinal response.  相似文献   
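The simulate-then-extrapolate idea in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation. It assumes a single binary regressor, a known symmetric misclassification matrix Pi = [[p, 1-p], [1-p, p]], a difference in group means as the effect estimate, and a quadratic extrapolant through three points. All function names (`mean_diff`, `extra_flip_prob`, `mc_simex`, `mc_simex_demo`) and parameter values are hypothetical.

```python
import random


def mean_diff(w, y):
    """Effect estimate: mean of y where w == 1 minus mean of y where w == 0."""
    y1 = [yi for wi, yi in zip(w, y) if wi == 1]
    y0 = [yi for wi, yi in zip(w, y) if wi == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def extra_flip_prob(p, lam):
    """Off-diagonal entry of Pi**lam for the symmetric matrix
    Pi = [[p, 1-p], [1-p, p]], whose eigenvalues are 1 and 2p - 1."""
    return (1.0 - (2.0 * p - 1.0) ** lam) / 2.0


def mc_simex(w, y, p, rng, n_rep=20):
    """Return (extrapolated estimate, naive estimate).

    lam = 0 is the observed data; lam = 1, 2 add extra misclassification
    Pi**lam on top of it. The quadratic through lam = 0, 1, 2 is then
    evaluated at lam = -1, the 'no misclassification' case.
    """
    est = {0: mean_diff(w, y)}
    for lam in (1, 2):
        q = extra_flip_prob(p, lam)
        reps = []
        for _ in range(n_rep):
            # Simulation step: flip each observed label with probability q.
            w_sim = [1 - wi if rng.random() < q else wi for wi in w]
            reps.append(mean_diff(w_sim, y))
        est[lam] = sum(reps) / n_rep
    # Lagrange weights of the quadratic through 0, 1, 2 evaluated at -1.
    simex = 3.0 * est[0] - 3.0 * est[1] + est[2]
    return simex, est[0]


def mc_simex_demo(n=20000, beta=1.0, p=0.9, seed=1):
    """Simulated data (assumed setup): true binary x, observed label w."""
    rng = random.Random(seed)
    x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    w = [1 - xi if rng.random() < 1.0 - p else xi for xi in x]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return mc_simex(w, y, p, rng)
```

Calling `mc_simex_demo()` returns the extrapolated and the naive estimate. In this assumed setup the naive estimate is attenuated toward zero, and the extrapolated one should land closer to the true effect `beta`. The quadratic extrapolant is only an approximation here, since the bias decays geometrically in `lam`.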

4.
Abstract: I provide a brief introduction to the concept of spatial autocorrelation and its incorporation into regression-type models. Spatial autocorrelation occurs when the response variable is correlated with itself at other locations in the region of interest. The autocorrelation usually takes a specific form where observations close in space are more correlated than those farther apart, and the rate of decay of the correlation is a function of the distance separating 2 locations. I present 2 commonly used models: 1) geostatistical modeling in which data are collected at points in the study region and 2) conditional autoregression (lattice) models in which data are aggregated over small nonoverlapping sub-areas of the study region. I also describe incorporation of explanatory covariates, such as habitat or physico-chemical attributes. I emphasize frequentist methods, but I briefly describe Bayesian approaches. I also provide some advantages, such as obtaining correct standard errors for estimators, and disadvantages, such as requirements for larger sample sizes, of incorporating spatial autocorrelation into the modeling effort. This information can aid researchers in designing and analyzing models of the relationships between species distributions and habitat. As a result, more informative models can be developed which further aid in management of wildlife.

5.
Exposure measurement error can result in a biased estimate of the association between an exposure and outcome. When the exposure–outcome relationship is linear on the appropriate scale (e.g. linear, logistic) and the measurement error is classical, that is the result of random noise, the result is attenuation of the effect. When the relationship is non‐linear, measurement error distorts the true shape of the association. Regression calibration is a commonly used method for correcting for measurement error, in which each individual's unknown true exposure in the outcome regression model is replaced by its expectation conditional on the error‐prone measure and any fully measured covariates. Regression calibration is simple to execute when the exposure is untransformed in the linear predictor of the outcome regression model, but less straightforward when non‐linear transformations of the exposure are used. We describe a method for applying regression calibration in models in which a non‐linear association is modelled by transforming the exposure using a fractional polynomial model. It is shown that taking a Bayesian estimation approach is advantageous. By use of Markov chain Monte Carlo algorithms, one can sample from the distribution of the true exposure for each individual. Transformations of the sampled values can then be performed directly and used to find the expectation of the transformed exposure required for regression calibration. A simulation study shows that the proposed approach performs well. We apply the method to investigate the relationship between usual alcohol intake and subsequent all‐cause mortality using an error model that adjusts for the episodic nature of alcohol consumption.  相似文献   
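The attenuation effect and the basic regression calibration step described in this abstract can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the fractional polynomial method of the paper: the outcome model is linear, the error is classical, the true exposure has mean 0 and variance 1, and the error variance is treated as known. The names `ols_slope` and `attenuation_demo` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def attenuation_demo(n=20000, beta=2.0, sigma_u=1.0, seed=0):
    """Return (naive slope, regression-calibrated slope)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]          # true exposure
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]       # error-prone measure
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]    # outcome

    # Reliability ratio var(x) / (var(x) + var(u)), assumed known here.
    lam = 1.0 / (1.0 + sigma_u ** 2)

    naive = ols_slope(w, y)              # attenuated toward zero by lam
    x_hat = [lam * wi for wi in w]       # E[x | w] for mean-zero normal x
    calibrated = ols_slope(x_hat, y)     # regression calibration estimate
    return naive, calibrated
```

With the default values the reliability ratio is 0.5, so the naive slope should come out near `beta * 0.5` and the calibrated slope near `beta`. When the exposure enters through a non-linear transformation, the expectation of the transformed exposure is no longer a simple rescaling of `w`, which is the difficulty the abstract addresses.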

6.
Resource selection functions (RSFs) are typically estimated by comparing covariates at a discrete set of “used” locations to those from an “available” set of locations. This RSF approach treats the response as binary and does not account for intensity of use among habitat units where locations were recorded. Advances in global positioning system (GPS) technology allow animal location data to be collected at fine spatiotemporal scales and have increased the size and correlation of data used in RSF analyses. We suggest that a more contemporary approach to analyzing such data is to model intensity of use, which can be estimated for one or more animals by relating the relative frequency of locations in a set of sampling units to the habitat characteristics of those units with count‐based regression and, in particular, negative binomial (NB) regression. We demonstrate this NB RSF approach with location data collected from 10 GPS‐collared Rocky Mountain elk (Cervus elaphus) in the Starkey Experimental Forest and Range enclosure. We discuss modeling assumptions and show how RSF estimation with NB regression can easily accommodate contemporary research needs, including: analysis of large GPS data sets, computational ease, accounting for among‐animal variation, and interpretation of model covariates. We recommend the NB approach because of its conceptual and computational simplicity, and the fact that estimates of intensity of use are unbiased in the face of temporally correlated animal location data.  相似文献   

7.
We introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables.  相似文献   

8.
We propose a conditional scores procedure for obtaining bias-corrected estimates of log odds ratios from matched case-control data in which one or more covariates are subject to measurement error. The approach involves conditioning on sufficient statistics for the unobservable true covariates that are treated as fixed unknown parameters. For the case of Gaussian nondifferential measurement error, we derive a set of unbiased score equations that can then be solved to estimate the log odds ratio parameters of interest. The procedure successfully removes the bias in naive estimates, and standard error estimates are obtained by resampling methods. We present an example of the procedure applied to data from a matched case-control study of prostate cancer and serum hormone levels, and we compare its performance to that of regression calibration procedures.  相似文献   

9.
Errors‐in‐variables models in high‐dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation‐SELection‐EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors‐in‐variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline‐based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.  相似文献   

10.
Stratified Cox regression models with large number of strata and small stratum size are useful in many settings, including matched case-control family studies. In the presence of measurement error in covariates and a large number of strata, we show that extensions of existing methods fail either to reduce the bias or to correct the bias under nonsymmetric distributions of the true covariate or the error term. We propose a nonparametric correction method for the estimation of regression coefficients, and show that the estimators are asymptotically consistent for the true parameters. Small sample properties are evaluated in a simulation study. The method is illustrated with an analysis of Framingham data.  相似文献   

11.
Borchers DL  Efford MG 《Biometrics》2008,64(2):377-385
Live-trapping capture-recapture studies of animal populations with fixed trap locations inevitably have a spatial component: animals close to traps are more likely to be caught than those far away. This is not addressed in conventional closed-population estimates of abundance and without the spatial component, rigorous estimates of density cannot be obtained. We propose new, flexible capture-recapture models that use the capture locations to estimate animal locations and spatially referenced capture probability. The models are likelihood-based and hence allow use of Akaike's information criterion or other likelihood-based methods of model selection. Density is an explicit parameter, and the evaluation of its dependence on spatial or temporal covariates is therefore straightforward. Additional (nonspatial) variation in capture probability may be modeled as in conventional capture-recapture. The method is tested by simulation, using a model in which capture probability depends only on location relative to traps. Point estimators are found to be unbiased and standard error estimators almost unbiased. The method is used to estimate the density of Red-eyed Vireos (Vireo olivaceus) from mist-netting data from the Patuxent Research Refuge, Maryland, U.S.A. Estimates agree well with those from an existing spatially explicit method based on inverse prediction. A variety of additional spatially explicit models are fitted; these include models with temporal stratification, behavioral response, and heterogeneous animal home ranges.  相似文献   

12.
This paper introduces a statistical approach for high-level spatial analysis when there is little prior information about the shape or location of the region of interest in the underlying image and limited spatial resolution of the available data. Our work was motivated by a functional brain mapping technique called direct cortical electrical interference (DCEI) that gives binary observations at multiple sites throughout the brain. We estimate an underlying, binary spatial response function using a mixture of an unknown number of simple geometrical shapes (e.g. circles) with unknown centers and sizes to be estimated. Inference is made using reversible jump Markov chain Monte Carlo. The approach is illustrated with simulated examples and a real example with DCEI data.  相似文献   

13.
In many environmental epidemiology studies, the locations and/or times of exposure measurements and health assessments do not match. In such settings, health effects analyses often use the predictions from an exposure model as a covariate in a regression model. Such exposure predictions contain some measurement error as the predicted values do not equal the true exposures. We provide a framework for spatial measurement error modeling, showing that smoothing induces a Berkson-type measurement error with nondiagonal error structure. From this viewpoint, we review the existing approaches to estimation in a linear regression health model, including direct use of the spatial predictions and exposure simulation, and explore some modified approaches, including Bayesian models and out-of-sample regression calibration, motivated by measurement error principles. We then extend this work to the generalized linear model framework for health outcomes. Based on analytical considerations and simulation results, we compare the performance of all these approaches under several spatial models for exposure. Our comparisons underscore several important points. First, exposure simulation can perform very poorly under certain realistic scenarios. Second, the relative performance of the different methods depends on the nature of the underlying exposure surface. Third, traditional measurement error concepts can help to explain the relative practical performance of the different methods. We apply the methods to data on the association between levels of particulate matter and birth weight in the greater Boston area.  相似文献   

14.
A Bayesian procedure for misclassified binary data was developed. An animal breeding simulation indicated that, when error of classification was ignored, the variance between clusters was inferred incorrectly. Data were reanalyzed assuming that the probability of misclassification was either known or unknown. In the first case, input parameter values were recovered in the analysis. When the probability was unknown, there was a slight bias; the true probability of misclassification and the true number of miscoded observations appeared within high credibility regions. An analysis of fertility in dairy cows is presented.  相似文献   

15.
J M Neuhaus  N P Jewell 《Biometrics》1990,46(4):977-990
Recently a great deal of attention has been given to binary regression models for clustered or correlated observations. The data of interest are of the form of a binary dependent or response variable, together with independent variables X1, ..., Xk, where sets of observations are grouped together into clusters. A number of models and methods of analysis have been suggested to study such data. Many of these are extensions in some way of the familiar logistic regression model for binary data that are not grouped (i.e., each cluster is of size 1). In general, the analyses of these clustered data models proceed by assuming that the observed clusters are a simple random sample of clusters selected from a population of clusters. In this paper, we consider the application of these procedures to the case where the clusters are selected randomly in a manner that depends on the pattern of responses in the cluster. For example, we show that ignoring the retrospective nature of the sample design, by fitting standard logistic regression models for clustered binary data, may result in misleading estimates of the effects of covariates and the precision of estimated regression coefficients.

16.
Ko H  Davidian M 《Biometrics》2000,56(2):368-375
The nonlinear mixed effects model is used to represent data in pharmacokinetics, viral dynamics, and other areas where an objective is to elucidate associations among individual-specific model parameters and covariates; however, covariates may be measured with error. For additive measurement error, we show substitution of mismeasured covariates for true covariates may lead to biased estimators for fixed effects and random effects covariance parameters, while regression calibration may eliminate bias in fixed effects but fail to correct that in covariance parameters. We develop methods to take account of measurement error that correct this bias and may be implemented with standard software, and we demonstrate their utility via simulation and application to data from a study of HIV dynamics.  相似文献   

17.
Ecologists are increasingly using statistical models to predict animal abundance and occurrence in unsampled locations. The reliability of such predictions depends on a number of factors, including sample size, how far prediction locations are from the observed data, and similarity of predictive covariates in locations where data are gathered to locations where predictions are desired. In this paper, we propose extending Cook’s notion of an independent variable hull (IVH), developed originally for application with linear regression models, to generalized regression models as a way to help assess the potential reliability of predictions in unsampled areas. Predictions occurring inside the generalized independent variable hull (gIVH) can be regarded as interpolations, while predictions occurring outside the gIVH can be regarded as extrapolations worthy of additional investigation or skepticism. We conduct a simulation study to demonstrate the usefulness of this metric for limiting the scope of spatial inference when conducting model-based abundance estimation from survey counts. In this case, limiting inference to the gIVH substantially reduces bias, especially when survey designs are spatially imbalanced. We also demonstrate the utility of the gIVH in diagnosing problematic extrapolations when estimating the relative abundance of ribbon seals in the Bering Sea as a function of predictive covariates. We suggest that ecologists routinely use diagnostics such as the gIVH to help gauge the reliability of predictions from statistical models (such as generalized linear, generalized additive, and spatio-temporal regression models).  相似文献   
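The hull idea this abstract builds on can be shown for the simplest case. The sketch below covers only the original linear-regression notion with a single covariate and an intercept, where a prediction point counts as inside the hull if its leverage does not exceed the largest leverage among the observed points. It does not implement the generalized version (gIVH) proposed in the paper, and the function names `leverage` and `inside_ivh` are hypothetical.

```python
def leverage(x0, xs):
    """Leverage of covariate value x0 in a simple linear regression
    (intercept plus one slope) fitted at the design points xs."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return 1.0 / n + (x0 - xbar) ** 2 / sxx


def inside_ivh(x0, xs):
    """True if x0 lies inside the independent variable hull of xs,
    i.e. its leverage is no larger than the maximum observed leverage."""
    max_h = max(leverage(x, xs) for x in xs)
    return leverage(x0, xs) <= max_h
```

For design points `[1, 2, 3, 4, 5]`, a prediction at 3.5 falls inside the hull (an interpolation), while a prediction at 6 falls outside it (an extrapolation). With one covariate the hull is just the observed range; the criterion becomes more informative with several covariates, where leverage is computed from the full design matrix.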

18.
Species data held in museums and herbaria, survey data and opportunistically observed data are a substantial information resource. A key challenge in using these data is the uncertainty about where an observation is located. This is important when the data are used for species distribution modelling (SDM), because the coordinates are used to extract the environmental variables and thus, positional error may lead to inaccurate estimation of the species–environment relationship. The magnitude of this effect is related to the level of spatial autocorrelation in the environmental variables. Using local spatial association can be relevant because it can lead to the identification of the specific occurrence records that cause the largest drop in SDM accuracy. Therefore, in this study, we tested whether the SDM predictions are more affected by positional uncertainty originating from locations that have lower local spatial association in their predictors. We performed this experiment for Spain and the Netherlands, using simulated datasets derived from well known species distribution models (SDMs). We used the K statistic to quantify the local spatial association in the predictors at each species occurrence location. A probabilistic approach using Monte Carlo simulations was employed to introduce the error in the species locations. The results revealed that positional uncertainty in species occurrence data at locations with low local spatial association in predictors reduced the prediction accuracy of the SDMs. We propose that local spatial association is a way to identify the species occurrence records that require treatment for positional uncertainty. We also developed and present a tool in the R environment to target observations that are likely to create error in the output from SDMs as a result of positional uncertainty.

19.
Scale for resource selection functions (total citations: 3; self-citations: 0; citations by others: 3)
Resource selection functions (RSFs) are statistical models defined to be proportional to the probability of use of a resource unit. My objective with this review is to identify how RSFs can be used to unravel the influence of scale in habitat selection. In wildlife habitat studies, including radiotelemetry, RSFs can be estimated using a variety of statistical methods, all of which can be used to explore the role of scale. All RSFs are bounded by the resolution of data and the spatial extent of the study area, but also allow predictor covariates to be measured at a variety of scales. Conditional logistic regression permits designs (e.g. matched case) that relate the process of habitat selection to a limited domain of resource units that might better characterize what is truly ‘available’ to the animal. Scale influences the process of habitat selection, e.g. food resources are often selected at fine spatial scales, whereas landscape patterns at much larger scales typically influence the location of home ranges. Scale also influences appropriate sampling in many ways: (1) heterogeneity might be obliterated (transmutation) if resolution or grain size is too large, (2) variance of habitat characteristics might be undersampled if extent or domain is too small, (3) timing and duration of observations can influence RSF models, and (4) both spatial and temporal autocorrelations can vary directly with the intensity of sampling. Using RSFs, researchers can examine habitat selection at multiple scales, and predictive models that bridge scales can be estimated. Using Geographical Information Systems, predictor covariates in RSF models can be measured at different scales easily so that the predictive ability of models at alternative spatial and temporal domains can be explored by the investigator. Identification of the scale that best explains the data can be evaluated by comparing alternative models using information‐theoretic metrics such as Akaike Information Criteria, and predictive capability of the models can be assessed using k‐fold cross validation.

20.
Missing data is a common issue in research using observational studies to investigate the effect of treatments on health outcomes. When missingness occurs only in the covariates, a simple approach is to use missing indicators to handle the partially observed covariates. The missing indicator approach has been criticized for giving biased results in outcome regression. However, recent papers have suggested that the missing indicator approach can provide unbiased results in propensity score analysis under certain assumptions. We consider assumptions under which the missing indicator approach can provide valid inferences, namely, (1) no unmeasured confounding within missingness patterns; either (2a) covariate values of patients with missing data were conditionally independent of treatment or (2b) these values were conditionally independent of outcome; and (3) the outcome model is correctly specified: specifically, the true outcome model does not include interactions between missing indicators and fully observed covariates. We prove that, under the assumptions above, the missing indicator approach with outcome regression can provide unbiased estimates of the average treatment effect. We use a simulation study to investigate the extent of bias in estimates of the treatment effect when the assumptions are violated and we illustrate our findings using data from electronic health records. In conclusion, the missing indicator approach can provide valid inferences for outcome regression, but the plausibility of its assumptions must first be considered carefully.  相似文献   

