期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Beyond mean modelling: Bias due to misspecification of dispersion in Poisson‐inverse Gaussian regression

Gillian Z. Heller Dominique‐Laurent Couturier Stephane R. Heritier 《Biometrical journal. Biometrische Zeitschrift》2019,61(2):333-342

In clinical trials one traditionally models the effect of treatment on the mean response. The underlying assumption is that treatment affects the response distribution through a mean location shift on a suitable scale, with other aspects of the distribution (shape/dispersion/variance) remaining the same. This work is motivated by a trial in Parkinson's disease patients in which one of the endpoints is the number of falls during a 10‐week period. Inspection of the data reveals that the Poisson‐inverse Gaussian (PiG) distribution is appropriate, and that the experimental treatment reduces not only the mean, but also the variability, substantially. The conventional analysis assumes a treatment effect on the mean, either adjusted or unadjusted for covariates, and a constant dispersion parameter. On our data, this analysis yields a non‐significant treatment effect. However, if we model a treatment effect on both mean and dispersion parameters, both effects are highly significant. A simulation study shows that if a treatment effect exists on the dispersion and is ignored in the modelling, estimation of the treatment effect on the mean can be severely biased. We show further that if we use an orthogonal parametrization of the PiG distribution, estimates of the mean model are robust to misspecification of the dispersion model. We also discuss inferential aspects that are more difficult than anticipated in this setting. These findings have implications in the planning of statistical analyses for count data in clinical trials. 相似文献

2.

Estimating marginal and incremental effects on health outcomes using flexible link and variance function models 总被引：1，自引：0，他引：1

Basu A Rathouz PJ 《Biostatistics (Oxford, England)》2005,6(1):93-109

We propose an extension to the estimating equations in generalized linear models to estimate parameters in the link function and variance structure simultaneously with regression coefficients. Rather than focusing on the regression coefficients, the purpose of these models is inference about the mean of the outcome as a function of a set of covariates, and various functionals of the mean function used to measure the effects of the covariates. A commonly used functional in econometrics, referred to as the marginal effect, is the partial derivative of the mean function with respect to any covariate, averaged over the empirical distribution of covariates in the model. We define an analogous parameter for discrete covariates. The proposed estimation method not only helps to identify an appropriate link function and to suggest an underlying distribution for a specific application but also serves as a robust estimator when no specific distribution for the outcome measure can be identified. Using Monte Carlo simulations, we show that the resulting parameter estimators are consistent. The method is illustrated with an analysis of inpatient expenditure data from a study of hospitalists. 相似文献

3.

Inference of haplotype effects in case-control studies using unphased genotype and environmental data

Chen X Li Z 《Biometrical journal. Biometrische Zeitschrift》2008,50(2):270-282

A retrospective likelihood-based approach was proposed to test and estimate the effect of haplotype on disease risk using unphased genotype data with adjustment for environmental covariates. The proposed method was also extended to handle the data in which the haplotype and environmental covariates are not independent. Likelihood ratio tests were constructed to test the effects of haplotype and gene-environment interaction. The model parameters such as haplotype effect size was estimated using an Expectation Conditional-Maximization (ECM) algorithm developed by Meng and Rubin (1993). Model-based variance estimates were derived using the observed information matrix. Simulation studies were conducted for three different genetic effect models, including dominant effect, recessive effect, and additive effect. The results showed that the proposed method generated unbiased parameter estimates, proper type I error, and true beta coverage probabilities. The model performed well with small or large sample sizes, as well as short or long haplotypes. 相似文献

4.

Smooth individual level covariates adjustment in disease mapping

下载免费PDF全文

Md Hamidul Huque Craig Anderson Richard Walton Samuel Woolford Louise Ryan 《Biometrical journal. Biometrische Zeitschrift》2018,60(3):597-615

Spatial models for disease mapping should ideally account for covariates measured both at individual and area levels. The newly available “indiCAR” model fits the popular conditional autoregresssive (CAR) model by accommodating both individual and group level covariates while adjusting for spatial correlation in the disease rates. This algorithm has been shown to be effective but assumes log‐linear associations between individual level covariates and outcome. In many studies, the relationship between individual level covariates and the outcome may be non‐log‐linear, and methods to track such nonlinearity between individual level covariate and outcome in spatial regression modeling are not well developed. In this paper, we propose a new algorithm, smooth‐indiCAR, to fit an extension to the popular conditional autoregresssive model that can accommodate both linear and nonlinear individual level covariate effects while adjusting for group level covariates and spatial correlation in the disease rates. In this formulation, the effect of a continuous individual level covariate is accommodated via penalized splines. We describe a two‐step estimation procedure to obtain reliable estimates of individual and group level covariate effects where both individual and group level covariate effects are estimated separately. This distributed computing framework enhances its application in the Big Data domain with a large number of individual/group level covariates. We evaluate the performance of smooth‐indiCAR through simulation. Our results indicate that the smooth‐indiCAR method provides reliable estimates of all regression and random effect parameters. We illustrate our proposed methodology with an analysis of data on neutropenia admissions in New South Wales (NSW), Australia. 相似文献

5.

A Mixture Model for Quantum Dot Images of Kinesin Motor Assays

John Hughes John Fricks 《Biometrics》2011,67(2):588-595

Summary We introduce a nearly automatic procedure to locate and count the quantum dots in images of kinesin motor assays. Our procedure employs an approximate likelihood estimator based on a two‐component mixture model for the image data; the first component has a normal distribution, and the other component is distributed as a normal random variable plus an exponential random variable. The normal component has an unknown variance, which we model as a function of the mean. We use B‐splines to estimate the variance function during a training run on a suitable image, and the estimate is used to process subsequent images. Parameter estimates are generated for each image along with estimates of standard errors, and the number of dots in the image is determined using an information criterion and likelihood ratio tests. Realistic simulations show that our procedure is robust and that it leads to accurate estimates, both of parameters and of standard errors. 相似文献

6.

Clustered restricted mean survival time regression

Xinyuan Chen Michael O. Harhay Fan Li 《Biometrical journal. Biometrische Zeitschrift》2023,65(6):2200002

For multicenter randomized trials or multilevel observational studies, the Cox regression model has long been the primary approach to study the effects of covariates on time-to-event outcomes. A critical assumption of the Cox model is the proportionality of the hazard functions for modeled covariates, violations of which can result in ambiguous interpretations of the hazard ratio estimates. To address this issue, the restricted mean survival time (RMST), defined as the mean survival time up to a fixed time in a target population, has been recommended as a model-free target parameter. In this article, we generalize the RMST regression model to clustered data by directly modeling the RMST as a continuous function of restriction times with covariates while properly accounting for within-cluster correlations to achieve valid inference. The proposed method estimates regression coefficients via weighted generalized estimating equations, coupled with a cluster-robust sandwich variance estimator to achieve asymptotically valid inference with a sufficient number of clusters. In small-sample scenarios where a limited number of clusters are available, however, the proposed sandwich variance estimator can exhibit negative bias in capturing the variability of regression coefficient estimates. To overcome this limitation, we further propose and examine bias-corrected sandwich variance estimators to reduce the negative bias of the cluster-robust sandwich variance estimator. We study the finite-sample operating characteristics of proposed methods through simulations and reanalyze two multicenter randomized trials. 相似文献

7.

Effects of variance-function misspecification in analysis of longitudinal data

Wang YG Lin X 《Biometrics》2005,61(2):413-421

The approach of generalized estimating equations (GEE) is based on the framework of generalized linear models but allows for specification of a working matrix for modeling within-subject correlations. The variance is often assumed to be a known function of the mean. This article investigates the impacts of misspecifying the variance function on estimators of the mean parameters for quantitative responses. Our numerical studies indicate that (1) correct specification of the variance function can improve the estimation efficiency even if the correlation structure is misspecified; (2) misspecification of the variance function impacts much more on estimators for within-cluster covariates than for cluster-level covariates; and (3) if the variance function is misspecified, correct choice of the correlation structure may not necessarily improve estimation efficiency. We illustrate impacts of different variance functions using a real data set from cow growth. 相似文献

8.

Methods for estimating the parameters of a linear model for ordered categorical data.

S R Lipsitz 《Biometrics》1992,48(1):271-281

In many empirical analyses, the response of interest is categorical with an ordinal scale attached. Many investigators prefer to formulate a linear model, assigning scores to each category of the ordinal response and treating it as continuous. When the covariates are categorical, Haber (1985, Computational Statistics and Data Analysis 3, 1-10) has developed a method to obtain maximum likelihood (ML) estimates of the parameters of the linear model using Lagrange multipliers. However, when the covariates are continuous, the only method we found in the literature is ordinary least squares (OLS), performed under the assumption of homogeneous variance. The OLS estimates are unbiased and consistent but, since variance homogeneity is violated, the OLS estimates of variance can be biased and may not be consistent. We discuss a variance estimate (White, 1980, Econometrica 48, 817-838) that is consistent for the true variance of the OLS parameter estimates. The possible bias encountered by using the naive OLS variance estimate is discussed. An estimated generalized least squares (EGLS) estimator is proposed and its efficiency relative to OLS is discussed. Finally, an empirical comparison of OLS, EGLS, and ML estimators is made. 相似文献

9.

Variance Component Estimation for Mixed Model Analysis of cDNA Microarray Data

Barbara Sarholz Hans‐Peter Piepho 《Biometrical journal. Biometrische Zeitschrift》2008,50(6):927-939

Microarrays provide a valuable tool for the quantification of gene expression. Usually, however, there is a limited number of replicates leading to unsatisfying variance estimates in a gene‐wise mixed model analysis. As thousands of genes are available, it is desirable to combine information across genes. When more than two tissue types or treatments are to be compared it might be advisable to consider the array effect as random. Then information between arrays may be recovered, which can increase accuracy in estimation. We propose a method of variance component estimation across genes for a linear mixed model with two random effects. The method may be extended to models with more than two random effects. We assume that the variance components follow a log‐normal distribution. Assuming that the sums of squares from the gene‐wise analysis, given the true variance components, follow a scaled χ²‐distribution, we adopt an empirical Bayes approach. The variance components are estimated by the expectation of their posterior distribution. The new method is evaluated in a simulation study. Differentially expressed genes are more likely to be detected by tests based on these variance estimates than by tests based on gene‐wise variance estimates. This effect is most visible in studies with small array numbers. Analyzing a real data set on maize endosperm the method is shown to work well. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim) 相似文献

10.

A diagnostic for Cox regression with discrete failure-time models

Parker CB Delong ER 《Biometrics》2000,56(4):996-1001

Changes in maximum likelihood parameter estimates due to deletion of individual observations are useful statistics, both for regression diagnostics and for computing robust estimates of covariance. For many likelihoods, including those in the exponential family, these delete-one statistics can be approximated analytically from a one-step Newton-Raphson iteration on the full maximum likelihood solution. But for general conditional likelihoods and the related Cox partial likelihood, the one-step method does not reduce to an analytic solution. For these likelihoods, an alternative analytic approximation that relies on an appropriately augmented design matrix has been proposed. In this paper, we extend the augmentation approach to explicitly deal with discrete failure-time models. In these models, an individual subject may contribute information at several time points, thereby appearing in multiple risk sets before eventually experiencing a failure or being censored. Our extension also allows the covariates to be time dependent. The new augmentation requires no additional computational resources while improving results. 相似文献

11.

Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models 总被引：2，自引：0，他引：2

Reich BJ Hodges JS Zadnik V 《Biometrics》2006,62(4):1197-1206

Disease-mapping models for areal data often have fixed effects to measure the effect of spatially varying covariates and random effects with a conditionally autoregressive (CAR) prior to account for spatial clustering. In such spatial regressions, the objective may be to estimate the fixed effects while accounting for the spatial correlation. But adding the CAR random effects can cause large changes in the posterior mean and variance of fixed effects compared to the nonspatial regression model. This article explores the impact of adding spatial random effects on fixed effect estimates and posterior variance. Diagnostics are proposed to measure posterior variance inflation from collinearity between the fixed effect covariates and the CAR random effects and to measure each region's influence on the change in the fixed effect's estimates by adding the CAR random effects. A new model that alleviates the collinearity between the fixed effect covariates and the CAR random effects is developed and extensions of these methods to point-referenced data models are discussed. 相似文献

12.

Using density surface models to estimate spatio-temporal changes in population densities and trend

Richard J. Camp David L. Miller Len Thomas Stephen T. Buckland Steve J. Kendall 《Ecography》2020,43(7):1079-1089

Precise measures of population abundance and trend are needed for species conservation; these are most difficult to obtain for rare and rapidly changing populations. We compare uncertainty in densities estimated from spatio–temporal models with that from standard design-based methods. Spatio–temporal models allow us to target priority areas where, and at times when, a population may most benefit. Generalised additive models were fitted to a 31-year time series of point-transect surveys of an endangered Hawaiian forest bird, the Hawai‘i ‘ākepa Loxops coccineus. This allowed us to estimate bird densities over space and time. We used two methods to quantify uncertainty in density estimates from the spatio–temporal model: the delta method (which assumes independence between detection and distribution parameters) and a variance propagation method. With the delta method we observed a 52% decrease in the width of the design-based 95% confidence interval (CI), while we observed a 37% decrease in CI width when propagating the variance. We mapped bird densities as they changed across space and time, allowing managers to evaluate management actions. Integrating detection function modelling with spatio–temporal modelling exploits survey data more efficiently by producing finer-grained abundance estimates than are possible with design-based methods as well as producing more precise abundance estimates. Model-based approaches require switching from making assumptions about the survey design to assumptions about bird distribution. Such a switch warrants consideration. In this case the model-based approach benefits conservation planning through improved management efficiency and reduced costs by taking into account both spatial shifts and temporal changes in population abundance and distribution. 相似文献

13.

Accelerated failure time modeling via nonparametric mixtures

Byungtae Seo Sangwook Kang 《Biometrics》2023,79(1):165-177

An accelerated failure time (AFT) model assuming a log-linear relationship between failure time and a set of covariates can be either parametric or semiparametric, depending on the distributional assumption for the error term. Both classes of AFT models have been popular in the analysis of censored failure time data. The semiparametric AFT model is more flexible and robust to departures from the distributional assumption than its parametric counterpart. However, the semiparametric AFT model is subject to producing biased results for estimating any quantities involving an intercept. Estimating an intercept requires a separate procedure. Moreover, a consistent estimation of the intercept requires stringent conditions. Thus, essential quantities such as mean failure times might not be reliably estimated using semiparametric AFT models, which can be naturally done in the framework of parametric AFT models. Meanwhile, parametric AFT models can be severely impaired by misspecifications. To overcome this, we propose a new type of the AFT model using a nonparametric Gaussian-scale mixture distribution. We also provide feasible algorithms to estimate the parameters and mixing distribution. The finite sample properties of the proposed estimators are investigated via an extensive stimulation study. The proposed estimators are illustrated using a real dataset. 相似文献

14.

Monte Carlo EM for missing covariates in parametric regression models

Ibrahim JG Chen MH Lipsitz SR 《Biometrics》1999,55(2):591-596

We propose a method for estimating parameters for general parametric regression models with an arbitrary number of missing covariates. We allow any pattern of missing data and assume that the missing data mechanism is ignorable throughout. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). We extend this method to continuous or mixed categorical and continuous covariates, and for arbitrary parametric regression models, by adapting a Monte Carlo version of the EM algorithm as discussed by Wei and Tanner (1990, Journal of the American Statistical Association 85, 699-704). In addition, we discuss the Gibbs sampler for sampling from the conditional distribution of the missing covariates given the observed data and show that the appropriate complete conditionals are log-concave. The log-concavity property of the conditional distributions will facilitate a straightforward implementation of the Gibbs sampler via the adaptive rejection algorithm of Gilks and Wild (1992, Applied Statistics 41, 337-348). We assume the model for the response given the covariates is an arbitrary parametric regression model, such as a generalized linear model, a parametric survival model, or a nonlinear model. We model the marginal distribution of the covariates as a product of one-dimensional conditional distributions. This allows us a great deal of flexibility in modeling the distribution of the covariates and reduces the number of nuisance parameters that are introduced in the E-step. We present examples involving both simulated and real data. 相似文献

15.

ROC curve estimation when covariates affect the verification process

Rodenberg C Zhou XH 《Biometrics》2000,56(4):1256-1262

A receiver operating characteristic (ROC) curve is commonly used to measure the accuracy of a medical test. It is a plot of the true positive fraction (sensitivity) against the false positive fraction (1-specificity) for increasingly stringent positivity criterion. Bias can occur in estimation of an ROC curve if only some of the tested patients are selected for disease verification and if analysis is restricted only to the verified cases. This bias is known as verification bias. In this paper, we address the problem of correcting for verification bias in estimation of an ROC curve when the verification process and efficacy of the diagnostic test depend on covariates. Our method applies the EM algorithm to ordinal regression models to derive ML estimates for ROC curves as a function of covariates, adjusted for covariates affecting the likelihood of being verified. Asymptotic variance estimates are obtained using the observed information matrix of the observed data. These estimates are derived under the missing-at-random assumption, which means that selection for disease verification depends only on the observed data, i.e., the test result and the observed covariates. We also address the issues of model selection and model checking. Finally, we illustrate the proposed method on data from a two-phase study of dementia disorders, where selection for verification depends on the screening test result and age. 相似文献

16.

Outlier robust model‐assisted small area estimation

下载免费PDF全文

Enrico Fabrizi Nicola Salvati Monica Pratesi Nikos Tzavidis 《Biometrical journal. Biometrische Zeitschrift》2014,56(1):157-175

Small area estimation with M‐quantile models was proposed by Chambers and Tzavidis ( 2006 ). The key target of this approach to small area estimation is to obtain reliable and outlier robust estimates avoiding at the same time the need for strong parametric assumptions. This approach, however, does not allow for the use of unit level survey weights, making questionable the design consistency of the estimators unless the sampling design is self‐weighting within small areas. In this paper, we adopt a model‐assisted approach and construct design consistent small area estimators that are based on the M‐quantile small area model. Analytic and bootstrap estimators of the design‐based variance are discussed. The proposed estimators are empirically evaluated in the presence of complex sampling designs. 相似文献

17.

Confidence Interval of a Proportion with Over‐dispersion

Cong Chen Robert W. Tipping 《Biometrical journal. Biometrische Zeitschrift》2002,44(7):877-886

A new approach that extends the classical Clopper‐Pearson procedure is proposed for the estimation of the (1–α)% confidence interval of a proportion with over‐dispersion. Over‐dispersion occurs when a proportion of interest shows more variation (variance inflation) than predicted by the binomial distribution. There are two steps in the approach. The first step consists of the estimation of the variance inflation factor. In the second step, an extended Clopper‐Pearson procedure is applied to calculate the confidence interval after the effective sample size is obtained by adjusting with the estimated variance inflation factor. The performance of the extended Clopper‐Pearson procedure is evaluated via a Monte Carlo study under the setup motivated from head lice studies. It is demonstrated that the 95% confidence intervals constructed from the new approach generally have the closest coverage rate to target (95%) when compared with those constructed from competing procedures. 相似文献

18.

Fitting Mixed Poisson Regression Models Using Quasi-Likelihood Methods

James J. Chen Hongshik Ahn 《Biometrical journal. Biometrische Zeitschrift》1996,38(1):81-96

This paper is to investigate the use of the quasi-likelihood, extended quasi-likelihood, and pseudo-likelihood approach to estimating and testing the mean parameters with respect to two variance models, M1: φ μ^θ(1+μphis;) and M2: φ μ^θ(1+τ). Simulation was conducted to compare the bias and standard deviation, and type I error of the Wald tests, based on the model-based and robust variance estimates, using the three semi-parametric approaches under four mixed Poisson models, two variance structures, and two sample sizes. All methods perform reasonably well in terms of bias. Type I error of the Wald test, based on either the model-based or robust estimate, tends to be larger than the nominal level when over-dispersion is moderate. The extended quasi-likelihood method with the variance model M1 performs more consistently in terms of the efficiency and controlling the type I error than with the model M2, and better than the pseudo-likelihood approach with either the M1 or M2 model. The model-based estimate seems to perform better than the robust estimate when the sample size is small. 相似文献

19.

突变积累实验中极大似然法和距法的比较

邓红文李剑《生命科学研究》2001,5(3):189-201

最近,人们突变积累实验（MA）中测定有害基因突变（DGM）的兴趣大增。在MA实验中有两种常见的DGM估计方法（极大似然法ML和距法MM）,依靠计算机模拟和处理真实数据的应用软件来比较这两种方法。结论是：ML法难于得到最大似然估计（MLEs),所以ML法不如MM法估计有效;即使MLEs可得,也因其具严重的微样误差（据偏差和抽样差异）而产生估计偏差;似然函数曲线较平坦而难于区分高峰态和低峰态的分布。相似文献

20.

Marginalized zero‐inflated Poisson models with missing covariates

下载免费PDF全文

Habtamu K. Benecha John S. Preisser Kimon Divaris Amy H. Herring Kalyan Das 《Biometrical journal. Biometrische Zeitschrift》2018,60(4):845-858

Unlike zero‐inflated Poisson regression, marginalized zero‐inflated Poisson (MZIP) models for counts with excess zeros provide estimates with direct interpretations for the overall effects of covariates on the marginal mean. In the presence of missing covariates, MZIP and many other count data models are ordinarily fitted using complete case analysis methods due to lack of appropriate statistical methods and software. This article presents an estimation method for MZIP models with missing covariates. The method, which is applicable to other missing data problems, is illustrated and compared with complete case analysis by using simulations and dental data on the caries preventive effects of a school‐based fluoride mouthrinse program. 相似文献