Similar literature
 20 similar articles found; search took 31 ms
1.
T R Fears, C C Brown. Biometrics, 1986, 42(4): 955-960
There are a number of possible designs for case-control studies. The simplest uses two separate simple random samples, but an actual study may use more complex sampling procedures. Typically, stratification is used to control for the effects of one or more risk factors in which we are interested. It has been shown (Anderson, 1972, Biometrika 59, 19-35; Prentice and Pyke, 1979, Biometrika 66, 403-411) that the unconditional logistic regression estimators apply under stratified sampling, so long as the logistic model includes a term for each stratum. We consider the case-control problem with stratified samples and assume a logistic model that does not include terms for strata, i.e., for fixed covariates the (prospective) probability of disease does not depend on stratum. We assume knowledge of the proportion sampled in each stratum as well as the total number in the stratum. We use this knowledge to obtain the maximum likelihood estimators for all parameters in the logistic model including those for variables completely associated with strata. The approach may also be applied to obtain estimators under probability sampling.

2.
Ko H, Davidian M. Biometrics, 2000, 56(2): 368-375
The nonlinear mixed effects model is used to represent data in pharmacokinetics, viral dynamics, and other areas where an objective is to elucidate associations among individual-specific model parameters and covariates; however, covariates may be measured with error. For additive measurement error, we show substitution of mismeasured covariates for true covariates may lead to biased estimators for fixed effects and random effects covariance parameters, while regression calibration may eliminate bias in fixed effects but fail to correct that in covariance parameters. We develop methods to take account of measurement error that correct this bias and may be implemented with standard software, and we demonstrate their utility via simulation and application to data from a study of HIV dynamics.

3.
Suitability of trees as hosts for epiphytic lichens is studied in a forest stand of size 25 ha. Suitability is measured as occupation probabilities, which are modelled using a hierarchical Bayesian approach. These probabilities are useful for an ecologist. They give a smoothed spatial distribution map of suitability for each of the species and can be used in detecting high‐ and low‐probability areas. In addition, suitability is explained by tree‐level covariates. Spatial dependence, which is due to unobserved spatially structured covariates, is modelled through an unobserved Markov random field. A Markov chain Monte Carlo method has been applied in Bayesian computation. The extensive spatial data consist of the occurrences of eight lichen species and one bryophyte on all of the 1253 potential host trees. In addition, coordinates of the trees and several tree characteristics have been recorded. The data have been analysed for the four most abundant species: Lobaria pulmonaria, Nephroma bellum, Nephroma parile and Peltigera praetextata. The tree‐level parameters, subject to estimation, consist of the occurrence probabilities for each tree and for each lichen species. Model validation is discussed in detail and, in addition to Bayesian validation tools, the autologistic model and case‐control design based on logistic regression have been suggested for validation of covariate effects. As a result we present suitability maps for the four lichen species. We observed that, among the observed tree covariates, the diameter at breast height (DBH) correlates with lichen occurrence. Our modelling approach has close connections to disease mapping in spatial epidemiology.

4.
Georeferencing error is prevalent in datasets used to model species distributions, inducing uncertainty in covariate values associated with species occurrences that results in biased probability of occurrence estimates. Traditionally, this error has been dealt with at the data‐level by using only records with an acceptable level of error (filtering) or by summarizing covariates at sampling units by using measures of central tendency (averaging). Here we compare those previous approaches to a novel implementation of a Bayesian logistic regression with measurement error (ME), a seldom used method in species distribution modeling. We show that the ME model outperforms data‐level approaches 1) for specialist species and 2) when sample sizes are small, the georeferencing error is large, or all georeferenced occurrences have a fixed level of error. Thus, for certain types of species and datasets the ME model is an effective method to reduce biases in probability of occurrence estimates and account for the uncertainty generated by georeferencing error. Our approach may be expanded for its use with presence‐only data as well as to include other sources of uncertainty in species distribution models.

5.
In this paper we develop a Bayesian approach to parameter estimation in a stochastic spatio-temporal model of the spread of invasive species across a landscape. To date, statistical techniques, such as logistic and autologistic regression, have outstripped stochastic spatio-temporal models in their ability to handle large numbers of covariates. Here we seek to address this problem by making use of a range of covariates describing the bio-geographical features of the landscape. Relative to regression techniques, stochastic spatio-temporal models are more transparent in their representation of biological processes. They also explicitly model temporal change, and therefore do not require the assumption that the species' distribution (or other spatial pattern) has already reached equilibrium as is often the case with standard statistical approaches. In order to illustrate the use of such techniques we apply them to the analysis of data detailing the spread of an invasive plant, Heracleum mantegazzianum, across Britain in the 20th Century using geo-referenced covariate information describing local temperature, elevation and habitat type. The use of Markov chain Monte Carlo sampling within a Bayesian framework facilitates statistical assessments of differences in the suitability of different habitat classes for H. mantegazzianum, and enables predictions of future spread to account for parametric uncertainty and system variability. Our results show that ignoring such covariate information may lead to biased estimates of key processes and implausible predictions of future distributions.

6.
Two-phase designs can reduce the cost of epidemiological studies by limiting the ascertainment of expensive covariates and/or exposures to an efficiently selected subset (phase-II) of a larger (phase-I) study. Efficient analysis of the resulting data set combining disparate information from phase-I and phase-II, however, can be complex. Most of the existing methods, including the semiparametric maximum-likelihood estimator, require the information in phase-I to be summarized into a fixed number of strata. In this paper, we describe a novel method for the analysis of two-phase studies where information from phase-I is summarized by parameters associated with a reduced logistic regression model of the disease outcome on available covariates. We then set up estimating equations for parameters associated with the desired extended logistic regression model, based on information on the reduced model parameters from phase-I and complete data available at phase-II after accounting for nonrandom sampling design. We use the generalized method of moments to solve the over-identified estimating equations and develop the resulting asymptotic theory for the proposed estimator. Simulation studies show that the use of reduced parametric models, as opposed to summarizing data into strata, can lead to more efficient utilization of phase-I data. An application of the proposed method is illustrated using the data from the U.S. National Wilms Tumor Study.

7.
Robert M. Dorazio. Biometrics, 2012, 68(4): 1303-1312
Several models have been developed to predict the geographic distribution of a species by combining measurements of covariates of occurrence at locations where the species is known to be present with measurements of the same covariates at other locations where species occurrence status (presence or absence) is unknown. In the absence of species detection errors, spatial point‐process models and binary‐regression models for case‐augmented surveys provide consistent estimators of a species’ geographic distribution without prior knowledge of species prevalence. In addition, these regression models can be modified to produce estimators of species abundance that are asymptotically equivalent to those of the spatial point‐process models. However, if species presence locations are subject to detection errors, neither class of models provides a consistent estimator of covariate effects unless the covariates of species abundance are distinct and independently distributed from the covariates of species detection probability. These analytical results are illustrated using simulation studies of data sets that contain a wide range of presence‐only sample sizes. Analyses of presence‐only data of three avian species observed in a survey of landbirds in western Montana and northern Idaho are compared with site‐occupancy analyses of detections and nondetections of these species.

8.
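Research progress on forest fire occurrence prediction models in China   Total citations: 2 (self-citations: 0, citations by others: 2)
Through a literature review, we summarize the current state of research on forest fire occurrence prediction models in China and analyze it in terms of the driving factors of fire occurrence, models for predicting the probability of fire occurrence, models for predicting the frequency of fire occurrence, and model validation methods. We reach the following conclusions: 1) weather, topography, vegetation, fuels, and human activity are the main driving factors affecting fire occurrence and the prediction accuracy of the models; 2) among fire occurrence probability models, the geographically weighted logistic regression model accounts for spatial correlation among variables, the Gompit regression model suits fire data with an asymmetric structure, and the random forest model requires no multicollinearity test and improves prediction accuracy while avoiding overfitting, which makes it one of the preferred methods for predicting fire occurrence probability; 3) among fire occurrence frequency models, the negative binomial regression model is better suited to modeling overdispersed data, while zero-inflated and hurdle models can handle fire data that contain a large number of zeros; 4) the ROC test, AIC test, likelihood ratio test, and Wald test are the methods commonly used to evaluate fire probability and frequency models. Research on fire occurrence prediction models remains a focus of forest fire management in China, and the choice of prediction model should depend on the characteristics of the fire data in each region. In addition, more influencing factors should be considered when building fire prediction models in order to improve prediction accuracy; in the future, the application of other mathematical models to fire occurrence prediction should be explored further to keep improving the accuracy of these models.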

9.
Horton NJ, Laird NM. Biometrics, 2001, 57(1): 34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The new method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.

10.
In many biometrical applications, the count data encountered often contain extra zeros relative to the Poisson distribution. Zero‐inflated Poisson regression models are useful for analyzing such data, but parameter estimates may be seriously biased if the nonzero observations are over‐dispersed and simultaneously correlated due to the sampling design or the data collection procedure. In this paper, a zero‐inflated negative binomial mixed regression model is presented to analyze a set of pancreas disorder length of stay (LOS) data that comprised mainly same‐day separations. Random effects are introduced to account for inter‐hospital variations and the dependency of clustered LOS observations. Parameter estimation is achieved by maximizing an appropriate log‐likelihood function using an EM algorithm. Alternative modeling strategies, namely the finite mixture of Poisson distributions and the non‐parametric maximum likelihood approach, are also considered. The determination of pertinent covariates would assist hospital administrators and clinicians to manage LOS and expenditures efficiently.
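As a small illustration of what "extra zeros relative to the Poisson distribution" means, the sketch below evaluates the probability mass function of a plain zero‐inflated Poisson distribution. It is not the zero‐inflated negative binomial mixed model of the abstract (no over‐dispersion, no random effects, no covariates); the function name `zip_pmf` and the parameter values are assumptions chosen only for illustration.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """P(Y = k) for a zero-inflated Poisson distribution.

    lam     -- Poisson rate of the count component
    pi_zero -- probability of a structural (extra) zero
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero arises either structurally or from the Poisson component.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


# Illustrative (hypothetical) parameter values.
lam, pi_zero = 2.0, 0.3

p0_poisson = math.exp(-lam)            # zero probability without inflation
p0_zip = zip_pmf(0, lam, pi_zero)      # zero probability with inflation

# Truncated sums; the Poisson(2) tail beyond k = 59 is negligible.
total = sum(zip_pmf(k, lam, pi_zero) for k in range(60))
zip_mean = sum(k * zip_pmf(k, lam, pi_zero) for k in range(60))
```

The mixture puts more mass at zero than a Poisson distribution with the same rate, and its mean shrinks to `(1 - pi_zero) * lam`, which is why fitting an ordinary Poisson model to such data can mislead.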

11.
In ecology, as in other research fields, efficient sampling for population estimation often drives sample designs toward unequal probability sampling, such as in stratified sampling. Design based statistical analysis tools are appropriate for seamless integration of sample design into the statistical analysis. However, it is also common and necessary, after a sampling design has been implemented, to use datasets to address questions that, in many cases, were not considered during the sampling design phase. Questions may arise requiring the use of model based statistical tools such as multiple regression, quantile regression, or regression tree analysis. However, such model based tools may require, for ensuring unbiased estimation, data from simple random samples, which can be problematic when analyzing data from unequal probability designs. Despite numerous method specific tools available to properly account for sampling design, too often in the analysis of ecological data, sample design is ignored and consequences are not properly considered. We demonstrate here that violation of this assumption can lead to biased parameter estimates in ecological research. In addition to the set of tools available for researchers to properly account for sampling design in model based analysis, we introduce inverse probability bootstrapping (IPB). Inverse probability bootstrapping is an easily implemented method for obtaining equal probability re-samples from a probability sample, from which unbiased model based estimates can be made. We demonstrate the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias, using both simulated and actual ecological data. For illustration, we considered three model based analysis tools: linear regression, quantile regression, and boosted regression tree analysis. In all models, using both simulated and actual ecological data, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.
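The resampling idea can be sketched in a few lines. This is one plausible reading of the abstract, not the authors' implementation: units are redrawn with replacement with probability proportional to the inverse of their inclusion probability, so that over-sampled strata are down-weighted. The function name, the toy data, and the two inclusion probabilities are assumptions made for illustration.

```python
import random


def inverse_probability_bootstrap(values, inclusion_probs, size, rng):
    """Draw `size` units with replacement, each chosen with probability
    proportional to 1 / (its inclusion probability)."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    weights = [1.0 / p for p in inclusion_probs]
    return rng.choices(values, weights=weights, k=size)


# Hypothetical probability sample from two equally large strata:
# stratum A (y = 0) was sampled at rate 0.9, stratum B (y = 1) at rate 0.1,
# so the raw sample over-represents y = 0.
values = [0] * 90 + [1] * 10
inclusion_probs = [0.9] * 90 + [0.1] * 10

rng = random.Random(1)  # fixed seed so the sketch is reproducible
resample = inverse_probability_bootstrap(values, inclusion_probs, 10_000, rng)

naive_mean = sum(values) / len(values)      # ignores the sampling design
ipb_mean = sum(resample) / len(resample)    # approximates the population mean
```

In this toy setup the population mean is 0.5; the naive sample mean is far below it, while the mean of the inverse probability resample lands close to it. Any model based tool could then be fitted to `resample` in place of the raw sample.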

12.
13.
Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y,X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.

14.
Managing forest ecosystems for sustainable, multiple use requires forest resource managers to understand and predict how plant species composition and distribution vary across environmental gradients and respond to landscape scale disturbances. This study demonstrates predictive vegetation modeling and mapping for a Northeast Oregon forest using non-parametric Multiplicative Regression (NPMR) with presence/absence data for the species Clintonia uniflora (CLUN) and a set of stand structural and raster-based predictor variables. NPMR is a flexible probability modeling system that can find the best subset of habitat factors influencing species occurrence. NPMR was compared with logistic regression (LR) by building reduced models from variables selected as best by NPMR and full models from variables identified as significant with a forward stepwise process and further manual testing. log β was used to select models with the highest predictive capability. NPMR models were less complex and had higher predictive capability than LR for all modeling approaches. Spatial coordinates were among the most powerful predictors and the modeling approach with physiographic and stand structural variables together was the most improved relative to the average frequency of occurrence. GIS probability maps produced with the application of the physiographic models showed good spatial congruence between high probability values and plots that contained CLUN. NPMR proved to be a reliable probability modeling and mapping tool that could be used as the analytical link between monitoring and quantifying the status and trends of vegetation resources.

15.
Aerial distance sampling of bears to estimate population size has been used throughout many parts of Alaska. The distance sampling models are complex since they need to account for undetected bears and differences in detection probabilities. This will require covariates and mark‐recapture data. The models proposed by Schmidt et al. do not use covariates or mark‐recapture data and are inappropriate for these surveys.

16.
  1. Close‐kin mark–recapture (CKMR) is a method for estimating abundance and vital rates from kinship relationships observed in genetic samples. CKMR inference only requires animals to be sampled once (e.g., lethally), potentially widening the scope of population‐level inference relative to traditional monitoring programs.
  2. One assumption of CKMR is that, conditional on individual covariates like age, all animals have an equal probability of being sampled. However, if genetic data are collected opportunistically (e.g., via hunters or fishers), there is potential for spatial variation in sampling probability that can bias CKMR estimators, particularly when genetically related individuals stay in close proximity.
  3. We used individual‐based simulation to investigate consequences of dispersal limitation and spatially biased sampling on performance of naive (nonspatial) CKMR estimators of abundance, fecundity, and adult survival. Population dynamics approximated those of a long‐lived mammal species subject to lethal sampling.
  4. Naive CKMR abundance estimators were relatively unbiased when dispersal was unconstrained (i.e., complete mixing) or when sampling was random or subject to moderate levels of spatial variation. When dispersal was limited, extreme variation in spatial sampling probabilities negatively biased abundance estimates. Reproductive schedules and survival were well estimated, except for survival when adults could emigrate out of the sampled area. Incomplete mixing was readily detected using Kolmogorov–Smirnov tests.
  5. Although CKMR appears promising for estimating abundance and vital rates with opportunistically collected genetic data, care is needed when dispersal limitation is coupled with spatially biased sampling. Fortunately, incomplete mixing is easily detected with adequate sample sizes. In principle, it is possible to devise and fit spatially explicit CKMR models to avoid bias under dispersal limitation, but development of such models necessitates additional complexity (and possibly additional data). We suggest using simulation studies to examine potential bias and precision of proposed modeling approaches prior to implementing a CKMR program.
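Kolmogorov–Smirnov tests of the kind mentioned in point 4 are built on the largest gap between two empirical distribution functions. The sketch below computes only that two-sample statistic, without a p-value. The abstract does not say which two distributions the authors compare, so the function name and the input lists here are hypothetical placeholders.

```python
def ks_two_sample_statistic(xs, ys):
    """Largest absolute gap between the empirical CDFs of xs and ys."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both samples past every observation equal to v,
        # so ties are handled before the CDFs are compared.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


# Hypothetical measurements for two groups (illustrative numbers only).
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [3.0, 4.0, 5.0, 6.0]
d_stat = ks_two_sample_statistic(sample_a, sample_b)
```

A statistic near 0 means the two samples have similar distributions; a value near 1 means they barely overlap. Turning the statistic into a test requires its null distribution, which this sketch leaves out.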

17.
In Western Europe, habitat loss and landscape fragmentation have led to significant population decline in various animal groups, including amphibians. The extinction of the last natural populations of the yellow-bellied toad in Belgium, Luxembourg and several regions of southern and western France suggests a widespread decline. By using site-occupancy models and adding covariates corresponding to the human-influenced features of the landscape, we tried to identify the relative effects of different land-use types on the species’ distribution pattern in a man-made environment (the Alsatian Rhine floodplain in France). We recorded presence–absence data in 150 forest sample plots (300 × 300 m) and then modeled species distribution while taking into account detection errors in the field. Land-use was recorded on two spatial scales: within the forest sample plots and in a 1500 m radius buffer area around the forest plots. In the forest plots, toad occurrence was negatively correlated with loss of forest cover to agricultural land. In contrast, occurrence was positively correlated with the density of human-made rutted dirt paths and tracks, which provide semi-natural breeding sites. In the 1500 m radius buffer zones around forest plots, toad occurrence was negatively correlated with the density of urbanization and road networks. These results can be used to plan conservation strategies for amphibians in human-dominated landscapes.

18.
Sutradhar BC, Das K. Biometrics, 2000, 56(2): 622-625
Liang and Zeger (1986, Biometrika 73, 13-22) introduced a generalized estimating equation (GEE) approach based on a working correlation matrix to obtain efficient estimators of regression parameters in the class of generalized linear models for repeated measures data. As demonstrated by Crowder (1995, Biometrika 82, 407-410), because of uncertainty of the definition of the working correlation matrix, the Liang-Zeger approach may, in some cases, lead to a complete breakdown of the estimation of the regression parameters. After taking this comment of Crowder into account, recently Sutradhar and Das (1999, Biometrika 86, 459-465) examined the loss of efficiency of the regression estimators due to misspecification of the correlation structures. But their study was confined to the regression estimation with cluster-level covariates, as in the original paper of Liang and Zeger. In this paper, we study this efficiency loss problem for the generalized regression models with within-cluster covariates by utilizing the approach of Sutradhar and Das (1999).

19.
Errors‐in‐variables models in high‐dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation‐SELection‐EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors‐in‐variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline‐based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.

20.
Mosses and lichens are the dominant macrophytes of the Antarctic terrestrial ecosystem. Using occurrence data from existing databases and additional published records, we analyzed patterns of moss and lichen species diversity on the Antarctic Peninsula at both a regional scale (1° latitudinal bands) and a local scale (52 and 56 individual snow‐ and ice‐free coastal areas for mosses and lichens, respectively) to test hypothesized relationships between species diversity and environmental factors, and to identify locations whose diversity may be particularly poorly represented by existing collections and online databases. We found significant heterogeneity in sampling frequency, number of records collected, and number of species found among analysis units at the two spatial scales, and estimated species richness using projected species accumulation curves to account for potential biases stemming from sample heterogeneity. Our estimates of moss and lichen richness for the entire Antarctic Peninsula region were within 20% of the total number of known species. Area, latitude, spatial isolation, mean summer temperature, and penguin colony size were considered as potential covariates of estimated species richness. Moss richness was correlated with isolation and latitude at the local scale, while lichen richness was correlated with summer mean temperature and, for 17 sites where penguins were present with <20 000 breeding pairs, penguin colony size. At the regional scale, moss richness was correlated with temperature and latitude. Lichen richness, by contrast, was not significantly correlated with any of the variables considered at the regional scale. With the exception of temperature, which explained 91% of the variation in regional moss diversity, explained variance was very low. Our results show that patterns of moss and lichen biodiversity are highly scale‐dependent and largely unexplained by the biogeographic variables found important in other systems.
