Similar literature
Found 20 similar articles; search took 15 ms
1.
Kulldorff M  Fang Z  Walsh SJ 《Biometrics》2003,59(2):323-331
Many databases exist with which it is possible to study the relationship between health events and various potential risk factors. Among these databases, some have variables that naturally form a hierarchical tree structure, such as pharmaceutical drugs and occupations. It is of great interest to use such databases for surveillance purposes in order to detect unsuspected relationships to disease risk. We propose a tree-based scan statistic, by which the surveillance can be conducted with a minimum of prior assumptions about the group of occupations/drugs that increase risk, and which adjusts for the multiple testing inherent in the many potential combinations. The method is illustrated using data from the National Center for Health Statistics Multiple Cause of Death Database, looking at the relationship between occupation and death from silicosis.

2.
Ecologists often develop complex regression models that include multiple categorical and continuous variables, interactions among predictors, and nonlinear relationships between the response and predictor variables. Nomograms, which are graphical devices for presenting mathematical functions and calculating output values, can aid biologists in interpreting and presenting these complex models. To illustrate benefits of nomograms, we developed a logistic regression model of elk (Cervus elaphus) resource selection. With this model, we demonstrated how a nomogram helps scientists and managers interpret interactions among variables, compare the relative biological importance of variables, and examine predicted shapes of relationships (e.g., linear vs. nonlinear) between response and predictor variables. Although our example focused on logistic regression, nomograms are equally useful for other linear and nonlinear models. Regardless of the approach used for model development, nomograms and other graphical summaries can help scientists and managers develop, interpret, and apply statistical models.

3.
《PloS one》2013,8(6)

Background

Caesarean delivery (CD) rates are commonly used as an indicator of quality in obstetric care and risk adjustment evaluation is recommended to assess inter-institutional variations. The aim of this study was to evaluate whether the Ten Group classification system (TGCS) can be used in case-mix adjustment.

Methods

Standardized data on 15,255 deliveries from 11 different regional centers were prospectively collected. Crude Risk Ratios of CDs were calculated for each center. Two multiple logistic regression models were herein considered by using: Model 1- maternal (age, Body Mass Index), obstetric variables (gestational age, fetal presentation, single or multiple, previous scar, parity, neonatal birth weight) and presence of risk factors; Model 2- TGCS either with or without maternal characteristics and presence of risk factors. Receiver Operating Characteristic (ROC) curves of the multivariate logistic regression analyses were used to assess the diagnostic accuracy of each model. The null hypothesis that Areas under ROC Curve (AUC) were not different from each other was verified with a Chi Square test and post hoc pairwise comparisons by using a Bonferroni correction.

Results

Crude evaluation of CD rates showed all centers had significantly higher Risk Ratios than the referent. Both multiple logistic regression models reduced these variations. However the two methods ranked institutions differently: model 1 and model 2 (adjusted for TGCS) identified respectively nine and eight centers with significantly higher CD rates than the referent with slightly different AUCs (0.8758 and 0.8929 respectively). In the adjusted model for TGCS and maternal characteristics/presence of risk factors, three centers had CD rates similar to the referent with the best AUC (0.9024).

Conclusions

The TGCS might be considered as a reliable variable to adjust CD rates. The addition of maternal characteristics and risk factors to TGCS substantially increases the predictive discrimination of the risk-adjusted model.

4.
We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike's Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx (Lynx canadensis) as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans.
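
The three error measures named in this abstract can be computed from a 2x2 contingency table of observed versus predicted presence. The sketch below is only an illustration: the function name, the counts, and the choice of denominators (omission error as a share of observed presences, commission error as a share of observed absences) are assumptions, since the abstract does not give formulas.

```python
def error_measures(tp, fn, fp, tn):
    """Return (overall accuracy, omission error, commission error).

    tp: presence points predicted as present
    fn: presence points predicted as absent
    fp: absence points predicted as present
    tn: absence points predicted as absent
    Denominators are an assumption; other conventions exist.
    """
    total = tp + fn + fp + tn
    overall_accuracy = (tp + tn) / total
    omission_error = fn / (tp + fn)    # observed presences that were missed
    commission_error = fp / (fp + tn)  # observed absences predicted as present
    return overall_accuracy, omission_error, commission_error


# Hypothetical counts, not taken from the study.
acc, omission, commission = error_measures(tp=30, fn=10, fp=20, tn=40)
print(f"accuracy={acc:.2f} omission={omission:.2f} commission={commission:.2f}")
```

A model that over-predicts presence lowers the omission error at the cost of a higher commission error, which is why the abstract recommends reporting all three measures together.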

5.
Logistic regression was explored as a way to investigate the relationship between a binary or ordinal response probability and one or more explanatory variables. The main objective of this study was to investigate the prediction of risk factors for Coronary Heart Disease (CHD) using a logit model, with the aim of reducing risk factors and increasing public and professional awareness. The logit model was used to evaluate the probability that a person develops CHD, considering factors such as age, gender, high low-density lipoprotein (LDL) cholesterol, low high-density lipoprotein (HDL) cholesterol, high blood pressure, family history of CHD younger than 45, diabetes, smoking, being post-menopausal for women and being older than 45 for men. The statistical concepts behind the logit model are described briefly, with slight modification, covering parameter estimation, significance tests for the coefficients, and confidence intervals for simple and multiple logit models. The interpretation of the fitted logit regression model is also introduced. Variables that gave the best results within the scientific context and explained the data well were assessed in order to fit a logit model containing the chosen variables. The statistical inference procedures used were the chi-square distribution, the likelihood ratio, score, and Wald tests, and goodness-of-fit. Health promotion that starts with increased public and professional awareness can improve early detection of CHD and reduce the risk of mortality, in line with the Saudi Vision 2030.

6.
Motivated by a clinical prediction problem, a simulation study was performed to compare different approaches for building risk prediction models. Robust prediction models for hospital survival in patients with acute heart failure were to be derived from three highly correlated blood parameters measured up to four times, with predictive ability having explicit priority over interpretability. Methods that relied only on the original predictors were compared with methods using an expanded predictor space including transformations and interactions. Predictors were simulated as transformations and combinations of multivariate normal variables which were fitted to the partly skewed and bimodally distributed original data in such a way that the simulated data mimicked the original covariate structure. Different penalized versions of logistic regression as well as random forests and generalized additive models were investigated using classical logistic regression as a benchmark. Their performance was assessed based on measures of predictive accuracy, model discrimination, and model calibration. Three different scenarios using different subsets of the original data with different numbers of observations and events per variable were investigated. In the investigated setting, where a risk prediction model should be based on a small set of highly correlated and interconnected predictors, Elastic Net and also Ridge logistic regression showed good performance compared to their competitors, while other methods did not lead to substantial improvements or even performed worse than standard logistic regression. Our work demonstrates how simulation studies that mimic relevant features of a specific data set can support the choice of a good modeling strategy.

7.
For this study a simulation is conducted to investigate the accuracy of neural networks and logistic regression in identifying populations at high risk for occupational back injury. In contrast to most standard regression techniques, neural networks do not rely on linearity or explicitly specifying the nature of the association. Because the underlying relationships between work exposures, personal risk factors, and injury are often not well defined, neural networks may prove useful for injury risk assessment. Accuracy was assessed by comparing the injury status to the predicted level of risk in each worker. In simulations of a non-linear association, workers (used in the training data) were correctly classified 85% of the time with neural networks, 74% of the time with the main effects logistic model, and 79% of the time with the fully-specified logistic model. Using the test data, however, workers were correctly classified 67% of the time with neural networks, and 71% and 69% of the time with the main effects and fully specified logistic models, respectively. Simulations of a null association indicated that neural networks may be more likely to overfit random associations. These findings provide a valuable guide concerning statistical methodology for identifying high-risk worker populations.

8.
Unintentional injuries cause much of the global mortality burden, with the workplace being a common accident setting. Even in high-income economies, occupational injury figures remain remarkably high. Because risk factors for occupational injuries are prone to confounding, the present research takes a comprehensive approach. To better understand the occurrence of occupational injuries, sociodemographic factors and work- and health-related factors are tested simultaneously. Thus, the present analysis aims to develop a comprehensive epidemiological model that facilitates the explanation of varying injury rates in the workplace. The representative phone survey German Health Update 2010 provides information on medically treated occupational injuries sustained in the year prior to the interview. Data were collected on sociodemographics, occupation, working conditions, health-related behaviors, and chronic diseases. For the economically active population (18–70 years, n = 14,041), the 12-month prevalence of occupational injuries was calculated with a 95% confidence interval (CI). Blockwise multiple logistic regression was applied to successively include different groups of variables. Overall, 2.8% (95% CI 2.4–3.2) of the gainfully employed population report at least one occupational injury (women: 0.9%; 95% CI 0.7–1.2; men: 4.3%; 95% CI 3.7–5.0). In the fully adjusted model, male gender (OR 3.16) and age 18–29 (OR 1.54), as well as agricultural (OR 5.40), technical (OR 3.41), skilled service (OR 4.24) or manual (OR 5.12), and unskilled service (OR 3.13) or manual (OR 4.97) occupations are associated with higher chances of occupational injuries. The same holds for frequent stressors such as heavy carrying (OR 1.78), working in awkward postures (OR 1.46), environmental stress (OR 1.48), and working under pressure (OR 1.41). Among health-related variables, physical inactivity (OR 1.47) and obesity (OR 1.73) present a significantly higher chance of occupational injuries. 
While the odds for most work-related factors were as expected, the associations for health-related factors such as smoking, drinking, and chronic diseases were rather weak. In part, this may be due to context-specific factors such as safety and workplace regulations in high-income countries like Germany. This assumption could guide further research, taking a multi-level approach to international comparisons.

9.
Variations in spatio-temporal patterns of Human Monocytic Ehrlichiosis (HME) infection in the state of Kansas, USA were examined, and the relationship between HME relative risk and various environmental, climatic and socio-economic variables was evaluated. HME data used in the study were reported to the Kansas Department of Health and Environment between 2005 and 2012, and geospatial variables representing the physical environment [National Land cover/Land use, NASA Moderate Resolution Imaging Spectroradiometer (MODIS)], climate [NASA MODIS, Prediction of Worldwide Renewable Energy (POWER)], and socio-economic conditions (US Census Bureau) were derived from publicly available sources. Following univariate screening of candidate variables using logistic regressions, two Bayesian hierarchical models were fit: a partial spatio-temporal model with random effects and a spatio-temporal interaction term, and a second model that included additional covariate terms. The best fitting model revealed that spatio-temporal autocorrelation in Kansas increased steadily from 2005–2012, and identified poverty status, relative humidity, and an interactive factor, ‘diurnal temperature range x mixed forest area’, as significant county-level risk factors for HME. The identification of significant spatio-temporal patterns and new risk factors is important in the context of HME prevention, for future research in the areas of ecology and evolution of HME, as well as for climate change impacts on tick-borne diseases.

10.
Large-scale surveys, such as national forest inventories and vegetation monitoring programs, usually have complex sampling designs that include geographical stratification and units organized in clusters. When models are developed using data from such programs, a key question is whether or not to utilize design information when analyzing the relationship between a response variable and a set of covariates. Standard statistical regression methods often fail to account for complex sampling designs, which may lead to severely biased estimators of model coefficients. Furthermore, ignoring that data are spatially correlated within clusters may underestimate the standard errors of regression coefficient estimates, with a risk for drawing wrong conclusions. We first review general approaches that account for complex sampling designs, e.g. methods using probability weighting, and stress the need to explore the effects of the sampling design when applying logistic regression models. We then use Monte Carlo simulation to compare the performance of the standard logistic regression model with two approaches to model correlated binary responses, i.e. cluster-specific and population-averaged logistic regression models. As an example, we analyze the occurrence of epiphytic hair lichens in the genus Bryoria; an indicator of forest ecosystem integrity. Based on data from the National Forest Inventory (NFI) for the period 1993–2014 we generated a data set on hair lichen occurrence on  >100,000 Picea abies trees distributed throughout Sweden. The NFI data included ten covariates representing forest structure and climate variables potentially affecting lichen occurrence. Our analyses show the importance of taking complex sampling designs and correlated binary responses into account in logistic regression modeling to avoid the risk of obtaining notably biased parameter estimators and standard errors, and erroneous interpretations about factors affecting e.g. hair lichen occurrence. 
We recommend comparisons of unweighted and weighted logistic regression analyses as an essential step in development of models based on data from large-scale surveys.

11.
An estimate of the risk or prevalence ratio, adjusted for confounders, can be obtained from a log binomial model (binomial errors, log link) fitted to binary outcome data. We propose a modification of the log binomial model to obtain relative risk estimates for nominal outcomes with more than two attributes (the "log multinomial model"). Extensive data simulations were undertaken to compare the performance of the log multinomial model with that of an expanded data multinomial logistic regression method based on the approach proposed by Schouten et al. (1993) for binary data, and with that of separate fits of a Poisson regression model based on the approach proposed by Zou (2004) and Carter, Lipsitz and Tilley (2005) for binary data. Log multinomial regression resulted in "inadmissible" solutions (out-of-bounds probabilities) exceeding 50% in some data settings. Coefficient estimates by the alternative methods produced out-of-bounds probabilities for the log multinomial model in up to 27% of samples to which a log multinomial model had been successfully fitted. The log multinomial coefficient estimates generally had lesser relative bias and mean squared error than the alternative methods. The practical utility of the log multinomial regression model was demonstrated with a real data example. The log multinomial model offers a practical solution to the problem of obtaining adjusted estimates of the risk ratio in the multinomial setting, but must be used with some care and attention to detail.

12.
For modelling dose-response relationships in case-control studies the multiplicative logistic regression model, assuming the relative risk to be an exponential function of the dose, is widely known. If the relative risk is assumed to be a linear function of the dose, several authors (see e.g. BERRY (1980)) have proposed an additive (linear) model. This model has a better fit with the data if such a linear relation holds. Confidence limits for the relative risk derived from the information matrix, however, appear to be rather inaccurate. Therefore, use of the ‘standard’ logistic model in two different ways was studied: extension with a quadratic term or a logarithmic transformation of the dose. By applying the methods both to an empirical data set and in a simulation experiment, it is shown that appropriate transformation (often logarithmic) of the dosage and then applying the ‘standard’ logistic model is a useful approach if a linear dose-response relationship holds.
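
The dose-response forms contrasted in this abstract can be written side by side. The notation is an assumption made for illustration ($d$ for dose, $\beta$ for the dose coefficient, $\mathrm{RR}(d)$ for the relative risk at dose $d$, with the odds ratio treated as an approximation to the relative risk); the abstract itself gives no formulas.

```latex
\begin{align*}
\text{multiplicative (standard logistic):}\quad & \mathrm{RR}(d) = \exp(\beta d) \\
\text{additive (linear):}\quad & \mathrm{RR}(d) = 1 + \beta d \\
\text{logistic with log-transformed dose:}\quad & \mathrm{RR}(d) = \exp(\beta \log d) = d^{\beta}
\end{align*}
```

In the third form the reference dose is $d = 1$, and $\beta = 1$ gives a relative risk proportional to dose, which is one way to see why a logarithmic transformation can approximate a linear relation within the standard logistic model.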

13.
Fatigue has been linked to adverse safety outcomes, and poor quality or decreased sleep has been associated with obesity (higher body mass index, BMI). Additionally, higher BMI is related to an increased risk for injury; however, it is unclear whether BMI modifies the effect of short sleep or has an independent effect on work-related injury risk. To answer this question, the authors examined the risk of a work-related injury as a function of total daily sleep time and BMI using the US National Health Interview Survey (NHIS). The NHIS is an in-person household survey using a multistage, stratified, clustered sample design representing the US civilian population. Data were pooled for the 7-yr survey period from 2004 to 2010 for 101 891 "employed" adult subjects (51.7%; 41.1 ± yrs of age [mean ± SEM]) with data on both sleep and BMI. Weighted annualized work-related injury rates were estimated across a priori defined categories of BMI: healthy weight (BMI: <25), overweight (BMI: 25-29.99), and obese (BMI: ≥30) and also categories of usual daily sleep duration: <6, 6-6.99, 7-7.99, 8-8.99, and ≥9 h. To account for the complex sampling design, including stratification, clustering, and unequal weighting, weighted multiple logistic regression was used to estimate the risk of a work-related injury. The initial model examined the interaction among daily sleep duration and BMI, controlling for weekly working hours, age, sex, race/ethnicity, education, type of pay, industry, and occupation. No significant interaction was found between usual daily sleep duration and BMI (p = .72); thus, the interaction term of the final logistic model included these two variables as independent predictors of injury, along with the aforementioned covariates. Statistically significant covariates (p ≤ .05) included age, sex, weekly work hours, occupation, and if the worker was paid hourly. 
The lowest categories of usual sleep duration (<6 and 6-6.9 h) showed significantly (p ≤ .05) higher injury risks than the referent category (7-8 h sleep), whereas sleeping >7-8 h did not significantly elevate risk. The adjusted injury risk odds ratio (OR) for a worker with a usual daily sleep of <6 h was 1.86 (95% confidence interval [CI]: 1.37-2.52), and for 6-6.9 h it was 1.46 (95% CI: 1.18-1.80). With regards to BMI, the adjusted injury risk OR comparing workers who were obese (BMI: ≥30) to healthy weight workers (BMI: <25) was 1.34 (95% CI: 1.09-1.66), whereas the risk in comparing overweight workers (BMI: 25-29.99) to healthy weight risk was elevated, but not statistically significant (OR = 1.08; 95% CI: .88-1.33). These results from a large representative sample of US workers suggest increase in work-related injury risk for reduced sleep regardless of worker's body mass. However, being an overweight worker also increases work-injury risk regardless of usual daily sleep duration. The independent additive risk of these factors on work-related injury suggests a substantial, but at least partially preventable, risk.
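
Odds ratios and confidence intervals like those reported above are usually obtained by exponentiating a logistic regression coefficient and its interval. The sketch below shows that arithmetic and backs out an approximate standard error from the reported interval for <6 h of sleep. It assumes a symmetric Wald interval on the log-odds scale with z = 1.96; the abstract does not state how the intervals were computed, and the function names are hypothetical.

```python
import math


def log_scale_se(lower, upper, z=1.96):
    """Approximate SE of log(OR) from a CI, assuming a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a log-odds coefficient and its Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported in the abstract for usual daily sleep of <6 h.
se = log_scale_se(1.37, 2.52)
or_value, ci_low, ci_high = odds_ratio_with_ci(math.log(1.86), se)
print(f"SE(log OR) ~ {se:.3f}; OR {or_value:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The reconstructed interval matches the reported one only approximately, because the published values are rounded and the survey-weighted analysis may not use exactly this form.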

14.
Classical paper-and-pencil based risk assessment questionnaires are often accompanied by the online versions of the questionnaire to reach a wider population. This study focuses on the loss, especially in risk estimation performance, that can be inflicted by direct transformation from the paper to online versions of risk estimation calculators by ignoring the possibilities of more complex and accurate calculations that can be performed using the online calculators. We empirically compare the risk estimation performance between four major diabetes risk calculators and two, more advanced, predictive models. National Health and Nutrition Examination Survey (NHANES) data from 1999–2012 was used to evaluate the performance of detecting diabetes and pre-diabetes. The American Diabetes Association risk test achieved the best predictive performance in category of classical paper-and-pencil based tests with an Area Under the ROC Curve (AUC) of 0.699 for undiagnosed diabetes (0.662 for pre-diabetes) and 47% (47% for pre-diabetes) persons selected for screening. Our results demonstrate a significant difference in performance with additional benefits for a lower number of persons selected for screening when statistical methods are used. The best AUC overall was obtained in diabetes risk prediction using logistic regression with AUC of 0.775 (0.734) and an average 34% (48%) persons selected for screening. However, generalized boosted regression models might be a better option from the economical point of view as the number of selected persons for screening of 30% (47%) lies significantly lower for diabetes risk assessment in comparison to logistic regression (p < 0.001), with a significantly higher AUC (p < 0.001) of 0.774 (0.740) for the pre-diabetes group. Our results demonstrate a serious lack of predictive performance in four major online diabetes risk calculators. 
Therefore, one should take great care and consider optimizing the online versions of questionnaires that were primarily developed as classical paper questionnaires.

15.
The purpose of the present study was to examine the relative influence of different habitat factors on otters Lutra lutra (Linnaeus, 1758) and to develop a predictive model to better understand the distribution of the otter in Denmark. During the National Otter Survey in 1991 data were collected on 19 variables which reflected aspects of habitat structure, composition, organic pollution and human disturbance. Multiple logistic regression analysis was used to estimate probabilities of the presence of otters as a function of one or more explanatory variables. Six variables (county, pH, water depth, presence of trees, bottom substrate and Saprobien-Index) were identified. In Denmark, otter habitat typically consists of water courses with depths > 1 m over a varied bottom, with pH > 7.0, Saprobien-Index on II–III to III (indicating slight organic pollution) with no trees on the banks. Some of these variables reflect highly productive waters. The use of the otter as an indicator of good water quality and/or aquatic habitat should be used with care.

16.
The objective was to examine BMI of working‐age Canadian adults in relation to occupational prestige, adjusting for other aspects of social class including household income and respondent's education. We analyzed data from 49,252 adults (age 25–64) from Cycle 2.1 of the Canadian Community Health Survey, a cross‐sectional self‐report survey conducted in 2003. Multiple linear regression was used to examine the relation between BMI and occupational prestige, adjusting for other sociodemographic variables. For women, higher ranking occupations showed lower average BMI relative to the lowest ranking occupations, but this effect was largely eliminated when adjusting for education. For men, occupation effects endured in adjusted models and we detected some evidence of a pattern whereby men in occupations characterized by management/supervisory responsibilities were heavier than those in the lowest ranking occupations (i.e., elemental sales and service). Results are interpreted in light of the symbolic value of body size in western culture, which differs for men and women. Men in positions of management/supervision may benefit from the physical dominance conveyed by a larger body size, and thus occupational prestige rankings may help us to understand the gender differences in the patterning of BMI by different indicators of social class.

17.
The odds ratio is known to closely approximate the relative risk when the disease is rare. Logistic regression models are often used to estimate such odds ratios, but here a different model is used which avoids the assumptions implicit in logistic modelling; it also has the advantage of providing a test of homogeneity for odds ratios in situations where the logistic model cannot.
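
The rare-disease approximation mentioned in this abstract is easy to check numerically. The sketch below compares the odds ratio with the relative risk in a 2x2 table for a rare and a common outcome; the function names and all counts are hypothetical and chosen only for illustration.

```python
def risk_ratio(a, b, c, d):
    """Relative risk from a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


rare = (2, 998, 1, 999)        # risks of 0.2% vs 0.1%
common = (400, 600, 200, 800)  # risks of 40% vs 20%

for label, table in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(*table), 3), round(odds_ratio(*table), 3))
```

Both tables have a relative risk of 2, but the odds ratio stays close to 2 only for the rare outcome and moves further from it as the outcome becomes common.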

18.
Pepe MS  Cai T  Longton G 《Biometrics》2006,62(1):221-229
No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically, the objective function that is optimized for combining markers is the likelihood function. In this article, we consider an alternative objective function: the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression, it yields consistent estimation with case-control or cohort data. Simulation studies suggest that AUC-based classification scores have performance comparable with logistic likelihood-based scores when the logistic regression model holds. Analysis of data from a proteomics biomarker study shows that performance can be far superior to logistic regression derived scores when the logistic regression model does not hold. Model fitting by maximizing the AUC rather than the likelihood should be considered when the goal is to derive a marker combination score for classification or prediction.
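
The idea of using the empirical AUC as an objective function can be sketched for two markers. The example below computes the empirical AUC as the share of case-control pairs that a score ranks correctly, then picks the weight of the second marker by a simple grid search. The grid search, the function names, and the marker values are assumptions for illustration and are not claimed to be the authors' fitting procedure.

```python
def empirical_auc(case_scores, control_scores):
    """Share of case-control pairs ranked correctly; ties count one half."""
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_for_weight(beta, cases, controls):
    """AUC of the combined score y1 + beta * y2 (first weight fixed at 1)."""
    case_scores = [y1 + beta * y2 for y1, y2 in cases]
    control_scores = [y1 + beta * y2 for y1, y2 in controls]
    return empirical_auc(case_scores, control_scores)


# Hypothetical (marker 1, marker 2) values.
cases = [(2.0, 1.0), (1.0, 3.0), (3.0, 2.0)]
controls = [(1.5, 0.5), (2.5, 0.0), (0.5, 1.5)]

grid = [0.0, 0.5, 1.0, 1.5, 2.0]
best_beta = max(grid, key=lambda b: auc_for_weight(b, cases, controls))
print(best_beta, auc_for_weight(best_beta, cases, controls))
```

Because the AUC depends only on the ranking of scores, the overall scale of the weights is not identifiable, which is why one weight is fixed at 1 in this sketch.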

19.
Logistic regression is often used to help make medical decisions with binary outcomes. Here we evaluate the use of several methods for selection of variables in logistic regression. We use a large dataset to predict the diagnosis of myocardial infarction in patients reporting to an emergency room with chest pain. Our results indicate that some of the examined methods are well suited for variable selection in logistic regression and that our model, and our myocardial infarction risk calculator, can be an additional tool to aid physicians in myocardial infarction diagnosis.

20.
Stochastic compartmental modeling techniques have been employed to simulate coronary heart disease morbidity and mortality. In the current paper, polychotomous logistic models are used to describe the relationship between risk of disease and multiple risk factors, effect modification and confounding variables. The process of estimating the parameters for two risk factors and three types of outcomes is described for a population followed for five years. A Statistical Analysis System (SAS) procedure was used to estimate risk factor coefficients based on two partial periods and on the entire five year epoch. Most of the estimated coefficients were found to be statistically significant. The model performance was evaluated by comparing the observational data with simulated outcomes using a micropopulation and Monte Carlo techniques. Two different tests of goodness of fit were used. Satisfactory fits were obtained both for the risk coefficients based on two partial periods and those based on the entire epoch. This indicates that the model is suitable for simulation of the effects of intervention strategies. The use of the entire epoch involved estimates of one half as many parameters as did the use of two partial periods. Accordingly, it is concluded that only the entire epoch need be considered for future studies of this population.
