Similar Articles
20 similar articles found (search time: 31 ms)
1.
An estimate of the risk, adjusted for confounders, can be obtained from a fitted logistic regression model, but it substantially over-estimates the risk when the outcome is not rare. The log binomial model (binomial errors and log link) is increasingly being used for this purpose. However, this model's performance, goodness of fit tests and case-wise diagnostics have not been studied. Extensive simulations are used to compare the performance of the log binomial model, a logistic regression based method proposed by Schouten et al. (1993) and a Poisson regression approach proposed by Zou (2004) and Carter, Lipsitz, and Tilley (2005). Log binomial regression resulted in "failure" rates (non-convergence, out-of-bounds predicted probabilities) as high as 59%. Estimates by the method of Schouten et al. (1993) produced fitted log binomial probabilities greater than unity in up to 19% of samples to which a log binomial model had been successfully fit and in up to 78% of samples when the log binomial model fit failed. Similar percentages were observed for the Poisson regression approach. Coefficient and standard error estimates from the three models were similar. Rejection rates for goodness of fit tests for log binomial fit were around 5%. Power of goodness of fit tests was modest when an incorrect logistic regression model was fit. Examples demonstrate the use of the methods. Uncritical use of the log binomial regression model is not recommended.
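The out-of-bounds failure mode described above is easy to see directly: under a log link nothing constrains a fitted probability to stay below one. A minimal Python sketch with illustrative (hypothetical) coefficients, not estimates from the study:

```python
import math

# Hypothetical log-binomial fit: log p(x) = b0 + b1 * x
# (coefficients are illustrative, not from the study)
b0 = math.log(0.5)   # baseline risk of 0.5: a common (non-rare) outcome
b1 = math.log(1.5)   # risk ratio of 1.5 per unit increase in x

def predicted_risk(x):
    """Predicted probability under the log link; nothing caps it at 1."""
    return math.exp(b0 + b1 * x)

print(predicted_risk(0))  # ~0.5, in range
print(predicted_risk(2))  # ~1.125, an out-of-bounds "probability"
```

This is exactly why log binomial fitting must constrain the linear predictor, which in turn is the source of the non-convergence the abstract reports.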

2.
We assessed complementary log–log (CLL) regression as an alternative statistical model for estimating multivariable-adjusted prevalence ratios (PR) and their confidence intervals. Using the delta method, we derived an expression for approximating the variance of the PR estimated using CLL regression. Then, using simulated data, we examined the performance of CLL regression in terms of the accuracy of the PR estimates, the width of the confidence intervals, and the empirical coverage probability, and compared it with results obtained from log–binomial regression and stratified Mantel–Haenszel analysis. Within the range of values of our simulated data, CLL regression performed well, with only slight bias of point estimates of the PR and good confidence interval coverage. In addition, and importantly, the computational algorithm did not have the convergence problems occasionally exhibited by log–binomial regression. The technique is easy to implement in SAS (SAS Institute, Cary, NC), and it does not have the theoretical and practical issues associated with competing approaches. CLL regression is an alternative method of binomial regression that warrants further assessment.
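The CLL link underlying this approach is eta = log(-log(1 - p)). A small Python sketch, with illustrative prevalences, showing the link, its inverse, and the resulting prevalence ratio for a single binary exposure (where a saturated fit reproduces the observed proportions):

```python
import math

def cloglog(p):
    """Complementary log-log link: eta = log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    """Inverse link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# With a single binary exposure the model is saturated, so fitted
# probabilities equal the observed proportions (values are illustrative).
p0, p1 = 0.20, 0.35
eta0, eta1 = cloglog(p0), cloglog(p1)   # linear predictors on the CLL scale
pr = inv_cloglog(eta1) / inv_cloglog(eta0)
print(round(pr, 3))  # 1.75
```

The link is unbounded in both directions, so unlike the log link it never produces fitted probabilities outside (0,1), which is the convergence advantage the abstract notes.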

3.
The risk difference is an intelligible measure for comparing disease incidence in two exposure or treatment groups. Despite its convenience in interpretation, it is less prevalent in epidemiological and clinical areas where regression models are required in order to adjust for confounding. One major barrier to its popularity is that standard linear binomial or Poisson regression models can provide estimated probabilities outside the range (0,1), resulting in possible convergence issues. For estimating adjusted risk differences, we propose a general framework covering various constraint approaches based on binomial and Poisson regression models. The proposed methods span the areas of ordinary least squares, maximum likelihood estimation, and Bayesian inference. Compared to existing approaches, our methods prevent estimates and confidence intervals of predicted probabilities from falling out of the valid range. Through extensive simulation studies, we demonstrate that the proposed methods solve the issue of estimates or confidence limits of predicted probabilities falling outside (0,1), while offering performance comparable to existing alternatives in terms of bias, variability, and coverage rates in point and interval estimation of the risk difference. An application study is performed using data from the Prospective Registry Evaluating Myocardial Infarction: Event and Recovery (PREMIER) study.
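For the simplest two-group case the constraint idea can be sketched as a Wald interval for the risk difference truncated to the valid range [-1, 1]. The counts are illustrative and this is not the authors' estimator, only the boundary issue it addresses:

```python
import math

def risk_difference_ci(x1, n1, x0, n0, z=1.96):
    """Wald interval for p1 - p0, truncated to the valid range [-1, 1]."""
    p1, p0 = x1 / n1, x0 / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    lo, hi = rd - z * se, rd + z * se
    return rd, max(lo, -1.0), min(hi, 1.0)   # clamp to the parameter space

# Illustrative counts: 40/50 events vs 10/50 events
rd, lo, hi = risk_difference_ci(40, 50, 10, 50)
print(round(rd, 2), round(lo, 2), round(hi, 2))
```

Truncating only the reported interval is the crude version of what the paper does in a principled way inside the estimation itself.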

4.
In the quantitative analysis of behaviour, choice data are most often plotted and analyzed as logarithmic transforms of ratios of responses and of ratios of reinforcers according to the generalized-matching relation, or its derivatives such as conditional-discrimination models. The relation between log choice ratios and log reinforcer ratios has normally been found using ordinary linear regression, which minimizes the sums of the squares of the y deviations from the fitted line. However, linear regression of this type requires that the log choice data be normally distributed, of equal variance for each log reinforcer ratio, and that the x (log reinforcer ratio) measures be fixed with no variance. We argue that, while log-transformed choice data may be normally distributed, log reinforcer ratios do have variance, and because these measures derive from a binomial process, log reinforcer ratio distributions will be non-normal and skewed towards more extreme values. These effects result in ordinary linear regression systematically underestimating generalized-matching sensitivity values, and in faulty parameter estimates from non-linear regressions that assume hyperbolic and exponential decay processes. They also render incorrect any model comparisons that assume equal, normally distributed error around every data point. We describe an alternative approach that can be used if the variance in choice is measured.
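The systematic underestimation of sensitivity when the x variable (the log reinforcer ratio) carries sampling error is the classical errors-in-variables attenuation, and it can be demonstrated with a short simulation; all parameter values below are arbitrary:

```python
import random

random.seed(1)
true_slope = 0.9          # "sensitivity" used to generate the data
n = 20000
x_true = [random.gauss(0.0, 1.0) for _ in range(n)]
# Observed log reinforcer ratios carry sampling error of their own:
x_obs  = [x + random.gauss(0.0, 0.7) for x in x_true]
y      = [true_slope * x + random.gauss(0.0, 0.3) for x in x_true]

# Ordinary least squares of y on the error-contaminated x
mx = sum(x_obs) / n
my = sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x_obs, y))
sxx = sum((a - mx) ** 2 for a in x_obs)
ols_slope = sxy / sxx
print(round(ols_slope, 2))  # noticeably below 0.9 (attenuation)
```

Under these assumptions OLS recovers roughly true_slope times the ratio of the x variance to the total (x plus error) variance, i.e. a systematically shrunken sensitivity estimate.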

5.
This paper discusses a two-state hidden Markov Poisson regression (MPR) model for analyzing longitudinal data of epileptic seizure counts, which allows for the rate of the Poisson process to depend on covariates through an exponential link function and to change according to the states of a two-state Markov chain with its transition probabilities associated with covariates through a logit link function. This paper also considers a two-state hidden Markov negative binomial regression (MNBR) model, as an alternative, by using the negative binomial instead of Poisson distribution in the proposed MPR model when there exists extra-Poisson variation conditional on the states of the Markov chain. The two proposed models in this paper relax the stationary requirement of the Markov chain, allow for overdispersion relative to the usual Poisson regression model and for correlation between repeated observations. The proposed methodology provides a plausible analysis for the longitudinal data of epileptic seizure counts, and the MNBR model fits the data much better than the MPR model. Maximum likelihood estimation using the EM and quasi-Newton algorithms is discussed. A Monte Carlo study for the proposed MPR model investigates the reliability of the estimation method, the choice of probabilities for the initial states of the Markov chain, and some finite sample behaviors of the maximum likelihood estimates, suggesting that (1) the estimation method is accurate and reliable as long as the total number of observations is reasonably large, and (2) the choice of probabilities for the initial states of the Markov process has little impact on the parameter estimates.
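The likelihood of such a hidden Markov count model is computed with the forward algorithm. A self-contained Python sketch, simplified relative to the paper (no covariates, a fixed transition matrix, illustrative data), comparing a two-rate model against a degenerate one-rate alternative:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def hmm_loglik(counts, rates, trans, init):
    """Forward-algorithm log-likelihood of a 2-state hidden Markov Poisson model."""
    alpha = [init[s] * poisson_pmf(counts[0], rates[s]) for s in range(2)]
    ll = 0.0
    for y in counts[1:]:
        scale = sum(alpha)           # rescale to avoid underflow
        ll += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[r] * trans[r][s] for r in range(2)) * poisson_pmf(y, rates[s])
                 for s in range(2)]
    return ll + math.log(sum(alpha))

counts = [1, 0, 2, 8, 11, 9, 1, 0, 2, 10]        # bursty seizure-like series
trans = [[0.9, 0.1], [0.2, 0.8]]
good = hmm_loglik(counts, rates=[1.0, 10.0], trans=trans, init=[0.5, 0.5])
bad  = hmm_loglik(counts, rates=[5.0, 5.0],  trans=trans, init=[0.5, 0.5])
print(good > bad)  # True: two distinct state rates explain the bursts far better
```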

6.
Paulino CD, Soares P, Neuhaus J. Biometrics 2003;59(3):670–675
Motivated by a study of human papillomavirus infection in women, we present a Bayesian binomial regression analysis in which the response is subject to an unconstrained misclassification process. Our iterative approach provides inferences for the parameters that describe the relationships of the covariates with the response and for the misclassification probabilities. Furthermore, our approach applies to any meaningful generalized linear model, making model selection possible. Finally, it is straightforward to extend it to multinomial settings.
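The core of any misclassification process is the mapping from the true response probability to the observed one via sensitivity and specificity. A small Python illustration (all values hypothetical; the paper's Bayesian machinery estimates these quantities jointly rather than plugging them in):

```python
def observed_prevalence(p, sens, spec):
    """Map the true response probability through a misclassification process."""
    return p * sens + (1.0 - p) * (1.0 - spec)

def corrected_prevalence(p_obs, sens, spec):
    """Invert the mapping (valid when sens + spec > 1)."""
    return (p_obs + spec - 1.0) / (sens + spec - 1.0)

p_true, sens, spec = 0.30, 0.90, 0.95    # illustrative values
p_obs = observed_prevalence(p_true, sens, spec)
print(round(p_obs, 3))                                    # 0.305
print(round(corrected_prevalence(p_obs, sens, spec), 3))  # 0.3
```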

7.
A statistical model for jointly analysing the spatial variation of incidences of three (or more) diseases, with common and uncommon risk factors, is introduced. Deaths for different diseases are described by a logit model for multinomial responses (multinomial logit or polytomous logit model). For each area and confounding strata population (i.e. age-class, sex, race) the probabilities of death for each cause (the response probabilities) are estimated. A specific disease, the one having a common risk factor only, acts as the baseline category. The log odds are decomposed additively into shared (common to diseases other than the reference disease) and specific structured spatial variability terms, unstructured unshared spatial terms and confounder terms (such as age, race and sex) to adjust the crude observed data for their effects. Disease-specific spatially structured effects are estimated; these are considered as latent variables denoting disease-specific risk factors. The model is presented with reference to a specific application. We considered the mortality data (from 1990 to 1994) relative to oral cavity, larynx and lung cancers in 13 age groups of males, in the 287 municipalities of the Region of Tuscany (Italy). All these pathologies share smoking as a common risk factor; furthermore, two of them (oral cavity and larynx cancer) share alcohol consumption as a risk factor. All studies suggest that smoking and alcohol consumption are the major known risk factors for oral cavity and larynx cancers; nevertheless, in this paper, we investigate the possibility of other, different risk factors for these diseases, or even the presence of an interaction effect (between smoking and alcohol risk factors) but with different spatial patterns for oral and larynx cancer. For each municipality and age-class the probabilities of death for each cause (the response probabilities) are estimated. Lung cancer acts as the baseline category.
The log odds are decomposed additively into shared (common to oral cavity and larynx diseases) and specific structured spatial variability terms, unstructured unshared spatial terms and an age-group term. It turns out that oral cavity and larynx cancer have different spatial patterns for residual risk factors which are not the typical ones such as smoking habits and alcohol consumption. Possibly, these patterns are due to different spatial interactions between smoking habits and alcohol consumption for the two diseases.

8.
For a prospective randomized clinical trial with two groups, the relative risk can be used as a measure of treatment effect and is directly interpretable as the ratio of success probabilities in the new treatment group versus the placebo group. For a prospective study with many covariates and a binary outcome (success or failure), relative risk regression may be of interest. If we model the log of the success probability as a linear function of covariates, the regression coefficients are log-relative risks. However, using such a log-linear model with a Bernoulli likelihood can lead to convergence problems in the Newton-Raphson algorithm. This is likely to occur when the success probabilities are close to one. A constrained likelihood method proposed by Wacholder (1986, American Journal of Epidemiology 123, 174-184) also has convergence problems. We propose a quasi-likelihood method of moments technique in which we naively assume the Bernoulli outcome is Poisson, with the mean (success probability) following a log-linear model. We use the Poisson maximum likelihood equations to estimate the regression coefficients without constraints. Using method of moments ideas, one can show that the estimates using the Poisson likelihood will be consistent and asymptotically normal. We apply these methods to a double-blinded randomized trial in primary biliary cirrhosis of the liver (Markus et al., 1989, New England Journal of Medicine 320, 1709-1713).
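For a single binary covariate the Poisson working-model estimate of the log relative risk has a closed form, and its robust standard error reduces to a familiar 2x2 expression. A Python sketch with illustrative counts (a sketch of the idea, not the paper's general covariate-adjusted estimator):

```python
import math

# Illustrative 2x2 summary: successes / totals in treated and control arms
x1, n1 = 30, 100
x0, n0 = 10, 100

# With one binary covariate the Poisson "working model" MLE of the
# log relative risk is simply the log ratio of observed proportions:
p1, p0 = x1 / n1, x0 / n0
log_rr = math.log(p1 / p0)

# Robust (sandwich-type) standard error of the log RR in this simple case:
se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p0) / (n0 * p0))

rr = math.exp(log_rr)
ci = (math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se))
print(round(rr, 2))  # 3.0
```

No constraint is needed because the Poisson score equations are solved on the unrestricted log scale; that is exactly the convergence advantage over the constrained Bernoulli likelihood.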

9.
Diagnostic studies in ophthalmology frequently involve binocular data where pairs of eyes are evaluated, through some diagnostic procedure, for the presence of certain diseases or pathologies. The simplest approach of estimating measures of diagnostic accuracy, such as sensitivity and specificity, treats eyes as independent, consequently yielding incorrect estimates, especially of the standard errors. Approaches that account for the inter-eye correlation include regression methods using generalized estimating equations and likelihood techniques based on various correlated binomial models. The paper proposes a simple alternative statistical methodology of jointly estimating measures of diagnostic accuracy for binocular tests based on a flexible model for correlated binary data. Method-of-moments estimation of model parameters is outlined and asymptotic inference is discussed. The resulting estimates are straightforward and easy to obtain, requiring no special statistical software but only elementary calculations. Results of simulations indicate that large-sample and bootstrap confidence intervals based on the estimates have relatively good coverage properties when the model is correctly specified. The computation of the estimates and their standard errors are illustrated with data from a study on diabetic retinopathy.
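The practical consequence of ignoring inter-eye correlation is an understated standard error. A hedged Python sketch using the simple pair design effect 1 + ICC (an assumption for illustration, not the paper's specific correlated-binomial model):

```python
import math

def sensitivity_se(x, n_eyes, icc):
    """SE of sensitivity from eye-level data, inflated for inter-eye correlation.
    Each subject contributes two eyes; the design effect for pairs is 1 + icc."""
    p = x / n_eyes
    naive = math.sqrt(p * (1 - p) / n_eyes)
    return naive * math.sqrt(1.0 + icc)

# Illustrative data: 160 of 200 diseased eyes test positive
naive    = sensitivity_se(160, 200, icc=0.0)   # eyes treated as independent
adjusted = sensitivity_se(160, 200, icc=0.4)   # correlated eyes within subjects
print(adjusted > naive)  # True: independence understates the SE
```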

10.
Three bivariate generalizations of the Poisson binomial distribution are introduced. The probabilities, moments, conditional distributions and regression functions for these distributions are obtained in terms of bipartitional polynomials. Recurrences for the probabilities and moments are also given. Parameter estimators are derived using the methods of moments and zero frequencies, and the three distributions are fitted to some ecological data.

11.
A binomial (presence–absence) sampling plan has been developed based on the relationship between the proportion of cauliflower plants having visible cabbage root fly (Delia radicum L.) eggs exposed on the soil surface around the plant stem and the mean density of eggs per plant. The Kono–Sugino model was fitted to a total of 125 population estimates, each based on 10 plant samples collected from cauliflower fields in 1994 and 1995 (P = 0.001; R2 = 0.64). When the model was compared with an independent data set consisting of 39 population estimates collected in 1995, an analysis of covariance showed no significant differences between the regression lines. The efficiency of the binomial method was compared with absolute sampling in terms of relative precision and cost efficiency. The binomial method had a high coefficient of variation, RV ≈ 0.85, due to large biological error. In spite of this, binomial sampling was more cost efficient than the applied soil sampling when between 10 and 30 plants were examined for the presence of visible eggs.
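The Kono–Sugino relationship ln(-ln(1 - P0)) = a + b ln(mean) can be inverted to turn an observed infested proportion into an estimated mean density. A Python sketch with made-up coefficients (not the published fit):

```python
import math

# Hypothetical Kono-Sugino fit: ln(-ln(1 - P0)) = a + b * ln(mean),
# where P0 is the proportion of plants with visible eggs.
a, b = -0.4, 0.8   # illustrative coefficients, not the study's estimates

def mean_density(p0):
    """Back-calculate mean eggs per plant from the infested proportion p0."""
    return math.exp((math.log(-math.log(1.0 - p0)) - a) / b)

for p0 in (0.2, 0.5, 0.8):
    print(round(mean_density(p0), 2))   # increases with the infested proportion
```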

12.
This paper presents the zero-truncated negative binomial regression model to estimate the population size in the presence of a single registration file. The model is an alternative to the zero-truncated Poisson regression model and it may be useful if the data are overdispersed due to unobserved heterogeneity. Horvitz–Thompson point and interval estimates for the population size are derived, and the performance of these estimators is evaluated in a simulation study. To illustrate the model, the size of the population of opiate users in the city of Rotterdam is estimated. In comparison to the Poisson model, the zero-truncated negative binomial regression model fits these data better and yields a substantially higher population size estimate. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
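The Horvitz–Thompson idea behind the estimator weights each observed unit by its probability of being observed at all, i.e. 1 - P(Y = 0) under the fitted count model. A minimal Python sketch with hypothetical fitted means and dispersion:

```python
def nb_p_zero(mu, k):
    """P(Y = 0) for a negative binomial with mean mu and dispersion k."""
    return (k / (k + mu)) ** k

def ht_population_size(mus, k):
    """Horvitz-Thompson estimate: each observed unit counts 1 / P(observed)."""
    return sum(1.0 / (1.0 - nb_p_zero(mu, k)) for mu in mus)

# Illustrative regression-fitted means for five observed registry entries
fitted_means = [0.5, 0.8, 1.2, 2.0, 3.5]
n_hat = ht_population_size(fitted_means, k=1.5)
print(n_hat > len(fitted_means))  # True: unseen zero-count users are added back
```

Smaller dispersion k (more overdispersion) raises P(Y = 0) for the same means, which is why the negative binomial model can yield a substantially higher population size than the Poisson one.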

13.
The household secondary attack risk (SAR), often called the secondary attack rate or secondary infection risk, is the probability of infectious contact from an infectious household member A to a given household member B, where we define infectious contact to be a contact sufficient to infect B if he or she is susceptible. Estimation of the SAR is an important part of understanding and controlling the transmission of infectious diseases. In practice, it is most often estimated using binomial models such as logistic regression, which implicitly attribute all secondary infections in a household to the primary case. In the simplest case, the number of secondary infections in a household with m susceptibles and a single primary case is modeled as a binomial(m, p) random variable where p is the SAR. Although it has long been understood that transmission within households is not binomial, it is thought that multiple generations of transmission can be neglected safely when p is small. We use probability generating functions and simulations to show that this is a mistake. The proportion of susceptible household members infected can be substantially larger than the SAR even when p is small. As a result, binomial estimates of the SAR are biased upward and their confidence intervals have poor coverage probabilities even if adjusted for clustering. Accurate point and interval estimates of the SAR can be obtained using longitudinal chain binomial models or pairwise survival analysis, which account for multiple generations of transmission within households, the ongoing risk of infection from outside the household, and incomplete follow-up. We illustrate the practical implications of these results in an analysis of household surveillance data collected by the Los Angeles County Department of Public Health during the 2009 influenza A (H1N1) pandemic.
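The upward bias can be reproduced with a Reed–Frost-style chain binomial simulation: even with a small per-contact probability p, later generations push the infected proportion above p. A Python sketch (household size and p are illustrative):

```python
import random

random.seed(7)

def reed_frost_final_infected(m, p):
    """Chain binomial (Reed-Frost) outbreak in a household with one primary
    case and m susceptibles; returns the number of secondary infections."""
    susceptible, infectious, total_infected = m, 1, 0
    while infectious > 0 and susceptible > 0:
        escape = (1.0 - p) ** infectious          # P(escape all current cases)
        new = sum(1 for _ in range(susceptible) if random.random() > escape)
        total_infected += new
        susceptible -= new
        infectious = new                           # next generation
    return total_infected

m, p, reps = 5, 0.2, 20000
mean_prop = sum(reed_frost_final_infected(m, p) for _ in range(reps)) / (reps * m)
print(mean_prop > p)  # True: the attack proportion exceeds the SAR
```

A binomial(m, p) fit to these final counts would therefore estimate a "SAR" larger than the true p, which is the bias the abstract describes.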

14.

Background

Marriage is a significant event in the life-course of individuals, and creates a system that characterizes societal and economic structures. Marital patterns and dynamics have changed considerably over the years, with decreasing proportions of marriage and increased levels of divorce and co-habitation in developing countries. Although such changes have been reported in African societies including Namibia, they have largely remained unexplained.

Objectives and Methods

In this paper, we examined trends and patterns of marital status of women of marriageable age (15 to 49 years) in Namibia using the 1992, 2000 and 2006 Demographic and Health Survey (DHS) data. Trends were established for selected demographic variables. Two binary logistic regression models, for ever-married versus never-married and cohabitation versus married, were fitted to establish factors associated with such nuptial systems. Further, multinomial logistic regression models, adjusted for bio-demographic and socio-economic variables, were fitted separately for each year to establish determinants of type of union (never married, married or cohabiting).

Results and Conclusions

Findings indicate a general change away from marriage, with a shift in the singulate mean age at marriage. Cohabitation was prevalent among those less than 30 years of age; the odds were higher in urban areas and increased since 1992. Be that as it may, marriage remained a persistent nuptiality pattern, common among the less educated and the employed, but with lower odds in urban areas. Results from the multinomial model suggest that marital status was associated with age at marriage, total children born, region, place of residence, education level and religion. We conclude that marital patterns have undergone significant transformation over the past two decades in Namibia, with the traditional marriage framework coexisting with co-habitation, and a sizeable proportion remaining unmarried into their late 30s. A shift in the singulate mean age at marriage is becoming distinctive in Namibian society.

15.
This study developed models to predict lactic acid concentration, dipping time, and storage temperature combinations determining growth/no-growth interfaces of Listeria monocytogenes at desired probabilities on bologna and frankfurters. L. monocytogenes was inoculated on bologna and frankfurters, and 75 combinations of lactic acid concentrations, dipping times, and storage temperatures were tested. Samples were stored in vacuum packages for up to 60 days, and bacterial populations were enumerated on tryptic soy agar plus 0.6% yeast extract and Palcam agar on day zero and at the end point of storage. The combinations that allowed L. monocytogenes increases of ≥1 log CFU/cm2 were assigned the value of 1 (growth), and the combinations that had increases of <1 log CFU/cm2 were given the value of 0 (no growth). These binary growth response data were fitted to logistic regression to develop a model predicting probabilities of growth. Validation with existing data and various indices showed acceptable model performance. Thus, the models developed in this study may be useful in determining probabilities of growth and in selecting lactic acid concentrations and dipping times to control L. monocytogenes growth on bologna and frankfurters, while the procedures followed may also be used to develop models for other products, conditions, or pathogens.
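A growth/no-growth interface of this kind is simply a fitted logistic surface in the three factors. A Python sketch with invented coefficients (not the study's fitted model) showing how the predicted growth probability responds to the treatment combination:

```python
import math

def p_growth(acid_pct, dip_s, temp_c,
             b0=-2.0, b_acid=-1.5, b_dip=-0.01, b_temp=0.35):
    """Hypothetical logistic growth/no-growth model: probability of a
    >= 1 log CFU/cm2 increase. Coefficients are illustrative only."""
    eta = b0 + b_acid * acid_pct + b_dip * dip_s + b_temp * temp_c
    return 1.0 / (1.0 + math.exp(-eta))

p_treated   = p_growth(3.0, 120, 4)   # high acid, long dip, refrigeration
p_untreated = p_growth(0.0, 0, 12)    # no treatment, abuse temperature
print(p_untreated > p_treated)  # True under these illustrative coefficients
```

Solving this surface for a target probability (say 0.1) yields the treatment combinations on the desired growth/no-growth interface.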

16.
Habitats in the Wadden Sea, a world heritage area, are affected by land subsidence resulting from natural gas extraction and by sea level rise. Here we describe a method to monitor changes in habitat types by producing sequential maps based on point information, followed by mapping using a multinomial logit regression model with abiotic variables, of which maps are available, as predictors.
In a 70 ha study area a total of 904 vegetation samples has been collected in seven sampling rounds with an interval of 2–3 years. Half of the vegetation plots were permanent, violating the assumption of independent data in multinomial logistic regression. This paper shows how this dependency can be accounted for by adding a random effect to the multinomial logit (MNL) model, thus becoming a mixed multinomial logit (MMNL) model. In principle all regression coefficients can be taken as random, but in this study only the intercepts are treated as location-specific random variables (random intercepts model). With six habitat types we have five intercepts, so that the number of extra model parameters becomes 15: 5 variances and 10 covariances.
The likelihood ratio test showed that the MMNL model fitted significantly better than the MNL model with the same fixed effects. McFadden-R2 for the MMNL model was 0.467, versus 0.395 for the MNL model. The estimated coefficients of the MMNL and MNL models were comparable; those of altitude, the most important predictor, differed most. The MMNL model accounts for pseudo-replication at the permanent plots, which explains the larger standard errors of the MMNL coefficients. The habitat type at a given location-year combination was predicted by the habitat type with the largest predicted probability.
The series of maps shows local trends in habitat types most likely driven by sea-level rise, soil subsidence, and a restoration project. We conclude that in environmental modeling of categorical variables using panel data, the dependency of repeated observations at permanent plots should be accounted for. This will affect the estimated probabilities of the categories, and even more strongly the standard errors of the regression coefficients.
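Prediction from a (mixed) multinomial logit model reduces to a softmax over the category linear predictors, with the location's random intercepts added in. A Python sketch with illustrative values (six habitat types, five predictors plus a reference category, as in the paper's setup; all numbers invented):

```python
import math

def mmnl_probs(eta_fixed, random_intercepts):
    """Category probabilities for a multinomial logit with location-specific
    random intercepts; the reference category gets linear predictor 0."""
    etas = [0.0] + [f + u for f, u in zip(eta_fixed, random_intercepts)]
    m = max(etas)
    exps = [math.exp(e - m) for e in etas]   # numerically stabilized softmax
    total = sum(exps)
    return [e / total for e in exps]

# 6 habitat types -> 5 linear predictors plus the reference category
fixed = [0.4, -0.2, 1.1, 0.0, -0.8]   # illustrative fixed-effect contributions
u_loc = [0.3, -0.1, -0.5, 0.2, 0.0]   # illustrative random-intercept draws
probs = mmnl_probs(fixed, u_loc)
print(round(sum(probs), 6))       # 1.0
print(probs.index(max(probs)))    # index of the predicted habitat type
```

Mapping then assigns each location-year the category with the largest predicted probability, exactly as described above.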

17.
Binomial sampling based on the proportion of samples infested was investigated for estimating mean densities of citrus rust mite, Phyllocoptruta oleivora (Ashmead), and Aculops pelekassi (Keifer) (Acari: Eriophyidae), on oranges, Citrus sinensis (L.) Osbeck. Data for the investigation were obtained by counting the number of motile mites within 600 sample units (each unit a 1-cm2 surface area per fruit) across a 4-ha block of trees (32 blocks total): five areas per 4 ha, five trees per area, 12 fruit per tree, and two samples per fruit. A significant (r2 = 0.89) linear relationship was found between ln(-ln(1 - P0)) and ln(mean), where P0 is the proportion of samples with more than zero mites. The fitted binomial parameters adequately described a validation data set from a sampling plan consisting of 192 samples. Projections indicated the fitted parameters would apply to sampling plans with as few as 48 samples, but reducing sample size resulted in an increase of bootstrap estimates falling outside expected confidence limits. Although mite count data fit the binomial model, confidence limits for mean arithmetic predictions increased dramatically as the proportion of samples infested increased. Binomial sampling using a tally threshold of 0 therefore has less value when proportions of samples infested are large. Increasing the tally threshold to two mites marginally improved estimates at larger densities. Overall, binomial sampling for a general estimate of mite densities seemed to be a viable alternative to absolute counts of mites per sample for a grower using a low management threshold such as two or three mites per sample.

18.
Understanding causes of nest loss is critical for the management of endangered bird populations. Available methods for estimating nest loss probabilities to competing sources do not allow for random effects and covariation among sources, and there are few data simulation methods or goodness-of-fit (GOF) tests for such models. We developed a Bayesian multinomial extension of the widely used logistic exposure (LE) nest survival model which can incorporate multiple random effects and fixed-effect covariates for each nest loss category. We investigated the performance of this model and the accompanying GOF test by analysing simulated nest fate datasets with and without age-biased discovery probability, and by comparing the estimates with those of traditional fixed-effects estimators. We then exemplify the use of the multinomial LE model and GOF test by analysing Piping Plover Charadrius melodus nest fate data (n = 443) to explore the effects of wire cages (exclosures) constructed around nests, which are used to protect nests from predation but can lead to increased nest abandonment rates. Mean parameter estimates of the random-effects multinomial LE model were all within 1 sd of the true values used to simulate the datasets. Age-biased discovery probability did not result in biased parameter estimates. Traditional fixed-effects models provided estimates with bias as high as 43% and standard deviations that were on average 71% smaller. The GOF test identified models that were a poor fit to the simulated data. For the Piping Plover dataset, the fixed-effects model was less well-supported than the random-effects model and underestimated the risk of exclosure use by 16%. The random-effects model estimated a range of 1–6% probability of abandonment for nests not protected by exclosures across sites and 5–41% probability of abandonment for nests with exclosures, suggesting that the magnitude of exclosure-related abandonment is site-specific.
Our results demonstrate that unmodelled heterogeneity can result in biased estimates potentially leading to incorrect management recommendations. The Bayesian multinomial LE model offers a flexible method of incorporating random effects into an analysis of nest failure and is robust to age-biased nest discovery probability. This model can be generalized to other staggered-entry, time-to-hazard situations.

19.
The accuracy of a single diagnostic test for binary outcome can be summarized by the area under the receiver operating characteristic (ROC) curve. Volume under the surface and hypervolume under the manifold have been proposed as extensions for multiple class diagnosis (Scurfield, 1996, 1998). However, the lack of simple inferential procedures for such measures has limited their practical utility. Part of the difficulty is that calculating such quantities may not be straightforward, even with a single test. The decision rule used to generate the ROC surface requires class probability assessments, which are not provided by the tests. We develop a method based on estimating the probabilities via some procedure, for example, multinomial logistic regression. Bootstrap inferences are proposed to account for variability in estimating the probabilities and perform well in simulations. The ROC measures are compared to the correct classification rate, which depends heavily on class prevalences. An example of tumor classification with microarray data demonstrates that this property may lead to substantially different analyses. The ROC-based analysis yields notable decreases in model complexity over previous analyses.
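For three ordered classes, the volume under the ROC surface can be estimated as the proportion of cross-class score triples that are correctly ordered (chance level 1/6). A Python sketch with illustrative scores standing in for the estimated class probabilities:

```python
from itertools import product

def vus_three_class(scores1, scores2, scores3):
    """Volume under the ROC surface for three ordered classes, estimated as
    the proportion of triples (one score per class) that are correctly ordered."""
    correct = sum(1 for a, b, c in product(scores1, scores2, scores3) if a < b < c)
    return correct / (len(scores1) * len(scores2) * len(scores3))

# Illustrative probability-based scores for three tumour classes
low  = [0.1, 0.2, 0.3]
mid  = [0.25, 0.4, 0.5]
high = [0.45, 0.6, 0.9]
print(round(vus_three_class(low, mid, high), 3))  # well above the 1/6 chance level
```

Bootstrap resampling of the subjects, re-estimating the probabilities each time, then gives the interval inference the paper proposes.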

20.
Generalized relative and absolute risk models are fitted to the latest Japanese atomic bomb survivor solid cancer and leukemia mortality data (through 2000), with the latest (DS02) dosimetry, by classical (regression calibration) and Bayesian techniques, taking account of errors in dose estimates and other uncertainties. Linear-quadratic and linear-quadratic-exponential models are fitted and used to assess risks for contemporary populations of China, Japan, Puerto Rico, the U.S. and the UK. Many of these models are the same as or very similar to models used in the UNSCEAR 2006 report. For a test dose of 0.1 Sv, the solid cancer mortality for a UK population using the generalized linear-quadratic relative risk model is estimated as 5.4% Sv(-1) [90% Bayesian credible interval (BCI) 3.1, 8.0]. At 0.1 Sv, leukemia mortality for a UK population using the generalized linear-quadratic relative risk model is estimated as 0.50% Sv(-1) (90% BCI 0.11, 0.97). Risk estimates varied little between populations; at 0.1 Sv the central estimates ranged from 3.7 to 5.4% Sv(-1) for solid cancers and from 0.4 to 0.6% Sv(-1) for leukemia. Analyses using regression calibration techniques yield central estimates of risk very similar to those for the Bayesian approach. The central estimates of population risk were similar for the generalized absolute risk model and the relative risk model. Linear-quadratic-exponential models predict lower risks (at least at low test doses) and appear to fit as well, although for other (theoretical) reasons we favor the simpler linear-quadratic models.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司) · 京ICP备09084417号