1.

Model-checking techniques based on cumulative residuals

Lin DY Wei LJ Ying Z 《Biometrics》2002,58(1):1-12

Residuals have long been used for graphical and numerical examinations of the adequacy of regression models. Conventional residual analysis based on the plots of raw residuals or their smoothed curves is highly subjective, whereas most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this paper, we develop objective and informative model-checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate and the link function. Illustrations with several medical studies are provided.
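
To make the idea in the abstract above concrete, here is a minimal sketch for the simplest case only: an ordinary least-squares fit with one continuous covariate. The function name `cumres_test`, the Gaussian multipliers, and the supremum statistic are illustrative assumptions, not code from the paper. The simulated processes include a correction term for the estimated regression coefficients, which for least squares is obtained by projecting out the design matrix.

```python
import numpy as np

def cumres_test(x, y, n_sim=1000, seed=0):
    """Cumulative-residual check of the working model y = b0 + b1*x + error.

    Hypothetical helper for illustration; assumes x has no ties.
    Returns the observed process, simulated null realizations, and a
    p-value based on the supremum of the absolute process.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    order = np.argsort(x)
    x, y = x[order], y[order]                  # sort by the covariate
    Z = np.column_stack([np.ones(n), x])       # design matrix
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ beta                           # raw residuals

    # Observed process: W(x_k) = n^{-1/2} * sum of residuals with x_i <= x_k
    W_obs = np.cumsum(e) / np.sqrt(n)

    # ind[k, i] = 1 if x_i <= x_k (valid because the data are sorted)
    ind = np.tril(np.ones((n, n)))
    # Correction for estimating beta: ind Z (Z'Z)^{-1} Z'
    H = (ind @ Z) @ np.linalg.solve(Z.T @ Z, Z.T)
    M = (ind - H) / np.sqrt(n)

    # Each column of W_sim is one realization: residuals times N(0, 1) multipliers
    G = rng.standard_normal((n, n_sim))
    W_sim = M @ (e[:, None] * G)

    sup_obs = np.abs(W_obs).max()
    sup_sim = np.abs(W_sim).max(axis=0)
    p_value = float(np.mean(sup_sim >= sup_obs))
    return W_obs, W_sim, p_value

# Usage: a linear fit to data that are actually quadratic in x
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 300)
y = 1 + x + 2 * x**2 + rng.standard_normal(300)
W_obs, W_sim, p_value = cumres_test(x, y)
print(p_value)
```

Plotting `W_obs` against the sorted covariate, overlaid with a handful of columns of `W_sim`, gives the graphical comparison the abstract describes; the p-value is its numerical counterpart.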

2.

Checking semiparametric transformation models with censored data

Chen L Lin DY Zeng D 《Biostatistics (Oxford, England)》2012,13(1):18-31

Semiparametric transformation models provide a very general framework for studying the effects of (possibly time-dependent) covariates on survival time and recurrent event times. Assessing the adequacy of these models is an important task because model misspecification affects the validity of inference and the accuracy of prediction. In this paper, we introduce appropriate time-dependent residuals for these models and consider the cumulative sums of the residuals. Under the assumed model, the cumulative sum processes converge weakly to zero-mean Gaussian processes whose distributions can be approximated through Monte Carlo simulation. These results enable one to assess, both graphically and numerically, how unusual the observed residual patterns are in reference to their null distributions. The residual patterns can also be used to determine the nature of model misspecification. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. Three medical studies are provided for illustrations.

3.

A Robust Alternative to the Schemper–Henderson Estimator of Prediction Error

Matthias Schmid Thomas Hielscher Thomas Augustin Olaf Gefeller 《Biometrics》2011,67(2):524-535

Summary. In clinical applications, the prediction error of survival models has to be taken into consideration to assess the practical suitability of conclusions drawn from these models. Different approaches to evaluate the predictive performance of survival models have been suggested in the literature. In this article, we analyze the properties of the estimator of prediction error developed by Schemper and Henderson (2000, Biometrics 56, 249–255), which quantifies the absolute distance between predicted and observed survival functions. We provide a formal proof that the estimator proposed by Schemper and Henderson is not robust against misspecification of the survival model, that is, the estimator will only be meaningful if the model family used for deriving predictions has been specified correctly. To remedy this problem, we construct a new estimator of the absolute distance between predicted and observed survival functions. We show that this modified Schemper–Henderson estimator is robust against model misspecification, allowing its practical application to a wide class of survival models. The properties of the Schemper–Henderson estimator and its new modification are illustrated by means of a simulation study and the analysis of two clinical data sets.

4.

Estimating Treatment Effects of Longitudinal Designs using Regression Models on Propensity Scores

Aristide C. Achy‐Brou Constantine E. Frangakis Michael Griswold 《Biometrics》2010,66(3):824-833

Summary. We derive regression estimators that can compare longitudinal treatments using only the longitudinal propensity scores as regressors. These estimators, which assume knowledge of the variables used in the treatment assignment, are important for reducing the large dimension of covariates for two reasons. First, if the regression models on the longitudinal propensity scores are correct, then our estimators share advantages of correctly specified model‐based estimators, a benefit not shared by estimators based on weights alone. Second, if the models are incorrect, the misspecification can be more easily limited through model checking than with models based on the full covariates. Thus, our estimators can also be better when used in place of the regression on the full covariates. We use our methods to compare longitudinal treatments for type II diabetes mellitus.

5.

Model checking in regression via dimension reduction

Xia Yingcun 《Biometrika》2009,96(1):133-148

Lack-of-fit checking for parametric and semiparametric models is essential in reducing misspecification. The efficiency of most existing model-checking methods drops rapidly as the dimension of the covariates increases. We propose to check a model by projecting the fitted residuals along a direction that adapts to the systematic departure of the residuals from the desired pattern. Consistency of the method is proved for parametric and semiparametric regression models. A bootstrap implementation is also discussed. Simulation comparisons with several existing methods are made, suggesting that the proposed methods are more efficient than the existing methods when the dimension increases. Air pollution data from Chicago are used to illustrate the procedure.

6.

Doubly robust multiple imputation using kernel‐based techniques

Chiu‐Hsieh Hsu Yulei He Yisheng Li Qi Long Randall Friese 《Biometrical journal. Biometrische Zeitschrift》2016,58(3):588-606

We consider the problem of estimating the marginal mean of an incompletely observed variable and develop a multiple imputation approach. Using fully observed predictors, we first establish two working models: one predicts the missing outcome variable, and the other predicts the probability of missingness. The predictive scores from the two models are used to measure the similarity between the incomplete and observed cases. Based on the predictive scores, we construct a set of kernel weights for the observed cases, with higher weights indicating more similarity. Missing data are imputed by sampling from the observed cases with probability proportional to their kernel weights. The proposed approach can produce reasonable estimates for the marginal mean and has a double robustness property, provided that one of the two working models is correctly specified. It also shows some robustness against misspecification of both models. We demonstrate these patterns in a simulation study. In a real‐data example, we analyze the total helicopter response time from injury in the Arizona emergency medical service data.

7.

On inconsistency of the neighbor-joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled

Susko E Inagaki Y Roger AJ 《Molecular biology and evolution》2004,21(9):1629-1642

Using analytical methods, we show that under a variety of model misspecifications, Neighbor-Joining, minimum evolution, and least squares estimation procedures are statistically inconsistent. Failure to correctly account for differing rates-across-sites processes, failure to correctly model rate matrix parameters, and failure to adjust for parallel rates-across-sites changes (a rates-across-subtrees process) are all shown to lead to a "long branch attraction" form of inconsistency. In addition, failure to account for rates-across-sites processes is also shown to result in underestimation of evolutionary distances for a wide variety of substitution models, generalizing an earlier analytical result for the Jukes-Cantor model reported in Golding and a similar bias result for the GTR or REV model in Kelly and Rice (1996). Although standard rates-across-sites models can be employed in many of these cases to restore consistency, current models cannot account for other kinds of misspecification. We examine an idealized but biologically relevant case, where parallel changes in rates at sites across subtrees are shown to give rise to inconsistency. This changing rates-across-subtrees type model misspecification cannot be adjusted for with conventional methods or without carefully considering the rate variation in the larger tree. The results are presented for four-taxon trees, but the expectation is that they have implications for larger trees as well. To illustrate this, a simulated 42-taxon example is given in which the microsporidia, an enigmatic group of eukaryotes, are incorrectly placed at the archaebacteria-eukaryotes split because of incorrectly specified pairwise distances. The analytical nature of the results lends insight into the reasons that long branch attraction tends to be a common form of inconsistency and reasons that other forms of inconsistency like "long branches repel" can arise in some settings.
In many of the cases of inconsistency presented, a particular incorrect topology is estimated with probability converging to one, the implication being that measures of uncertainty like bootstrap support will be unable to detect that there is a problem with the estimation. The focus is on distance methods, but previous simulation results suggest that the zones of inconsistency for distance methods contain the zones of inconsistency for maximum likelihood methods as well.

8.

On latent-variable model misspecification in structural measurement error models for binary response

Huang X Tebbs JM 《Biometrics》2009,65(3):710-718

Summary. We consider structural measurement error models for a binary response. We show that likelihood-based estimators obtained from fitting structural measurement error models with pooled binary responses can be far more robust to covariate measurement error in the presence of latent-variable model misspecification than the corresponding estimators from individual responses. Furthermore, despite the loss in information, pooling can provide improved parameter estimators in terms of mean-squared error. Based on these and other findings, we create a new diagnostic method to detect latent-variable model misspecification in structural measurement error models with individual binary response. We use simulation and data from the Framingham Heart Study to illustrate our methods.

9.

Bias analysis and the simulation‐extrapolation method for survival data with covariate measurement error under parametric proportional odds models

Grace Y. Yi Wenqing He 《Biometrical journal. Biometrische Zeitschrift》2012,54(3):343-360

It has been well known that ignoring measurement error may result in substantially biased estimates in many contexts including linear and nonlinear regressions. For survival data with measurement error in covariates, there has been extensive discussion in the literature with the focus on proportional hazards (PH) models. Recently, research interest has extended to accelerated failure time (AFT) and additive hazards (AH) models. However, the impact of measurement error on other models, such as the proportional odds model, has received relatively little attention, although these models are important alternatives when PH, AFT, or AH models are not appropriate to fit data. In this paper, we investigate this important problem and study the bias induced by the naive approach of ignoring covariate measurement error. To adjust for the induced bias, we describe the simulation‐extrapolation method. The proposed method enjoys a number of appealing features. Its implementation is straightforward and can be accomplished with minor modifications of existing software. More importantly, the proposed method does not require modeling the covariate process, which is quite attractive in practice. As the precise values of error‐prone covariates are often not observable, any modeling assumption on such covariates has the risk of model misspecification, hence yielding invalid inferences if this happens. The proposed method is carefully assessed both theoretically and empirically. Theoretically, we establish the asymptotic normality for resulting estimators. Numerically, simulation studies are carried out to evaluate the performance of the estimators as well as the impact of ignoring measurement error, along with an application to a data set arising from the Busselton Health Study. Sensitivity of the proposed method to misspecification of the error model is studied as well.
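
The simulation-extrapolation (SIMEX) idea mentioned in the abstract above can be sketched generically. The example below is an assumption-laden illustration, not the paper's procedure: it uses a simple linear regression rather than a proportional odds survival model, assumes the measurement-error standard deviation `sigma_u` is known, and uses a quadratic extrapolant. The names `ols_slope` and `simex_slope` are hypothetical.

```python
import numpy as np

def ols_slope(w, y):
    """Slope from regressing y on w with an intercept (the naive estimator)."""
    w_c = w - w.mean()
    return float(w_c @ (y - y.mean()) / (w_c @ w_c))

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=100, seed=0):
    """SIMEX estimate of the slope when w = x + u, with u ~ N(0, sigma_u^2).

    Simulation step: add extra noise with variance lambda * sigma_u^2 and
    re-fit the naive estimator, averaging over n_rep replicates.
    Extrapolation step: fit a quadratic in lambda and evaluate at lambda = -1,
    which corresponds to no measurement error.
    """
    rng = np.random.default_rng(seed)
    lam_grid = [0.0]
    estimates = [ols_slope(w, y)]              # lambda = 0 is the naive fit
    for lam in lambdas:
        reps = [
            ols_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(len(w)), y)
            for _ in range(n_rep)
        ]
        lam_grid.append(lam)
        estimates.append(float(np.mean(reps)))
    coef = np.polyfit(lam_grid, estimates, deg=2)
    return float(np.polyval(coef, -1.0)), estimates[0]

# Usage: true slope 1.0, covariate observed with error variance 0.5
rng = np.random.default_rng(42)
n = 2000
x = rng.standard_normal(n)
sigma_u = np.sqrt(0.5)
w = x + sigma_u * rng.standard_normal(n)       # error-prone covariate
y = 1.0 + 1.0 * x + 0.5 * rng.standard_normal(n)
simex_est, naive_est = simex_slope(w, y, sigma_u)
print(naive_est, simex_est)
```

Only the naive fitting routine is called repeatedly, which matches the abstract's remark that the method needs minor modifications of existing software and no model for the true covariate. The quadratic extrapolant reduces the attenuation bias but generally does not remove it completely.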

10.

Where will species go? Incorporating new advances in climate modelling into projections of species distributions

LINDA J. BEAUMONT A. J. PITMAN† MICHAEL POULSEN‡ LESLEY HUGHES 《Global Change Biology》2007,13(7):1368-1385

Bioclimatic models are the primary tools for simulating the impact of climate change on species distributions. Part of the uncertainty in the output of these models results from uncertainty in projections of future climates. To account for this, studies often simulate species responses to climates predicted by more than one climate model and/or emission scenario. One area of uncertainty, however, has remained unexplored: internal climate model variability. By running a single climate model multiple times, but each time perturbing the initial state of the model slightly, different but equally valid realizations of climate will be produced. In this paper, we identify how ongoing improvements in climate models can be used to provide guidance for impacts studies. In doing so we provide the first assessment of the extent to which this internal climate model variability generates uncertainty in projections of future species distributions, compared with variability between climate models. We obtained data on 13 realizations from three climate models (three from CSIRO Mark2 v3.0, four from GISS AOM, and six from MIROC v3.2) for two time periods: current (1985–1995) and future (2025–2035). Initially, we compared the simulated values for each climate variable (P, T_max, T_min, and T_mean) for the current period to observed climate data. This showed that climates simulated by realizations from the same climate model were more similar to each other than to realizations from other models. However, when projected into the future, these realizations followed different trajectories and the values of climate variables differed considerably within and among climate models. These had pronounced effects on the projected distributions of nine Australian butterfly species when modelled using the BIOCLIM component of DIVA-GIS. 
Our results show that internal climate model variability can lead to substantial differences in the extent to which the future distributions of species are projected to change. These can be greater than differences resulting from between-climate model variability. Further, different conclusions regarding the vulnerability of species to climate change can be reached due to internal model variability. Clearly, several climate models, each represented by multiple realizations, are required if we are to adequately capture the range of uncertainty associated with projecting species distributions in the future.

11.

Diagnosis of random-effect model misspecification in generalized linear mixed models for binary response

Huang X 《Biometrics》2009,65(2):361-368

Summary. Generalized linear mixed models (GLMMs) are widely used in the analysis of clustered data. However, the validity of likelihood-based inference in such analyses can be greatly affected by the assumed model for the random effects. We propose a diagnostic method for random-effect model misspecification in GLMMs for clustered binary response. We provide a theoretical justification of the proposed method and investigate its finite sample performance via simulation. The proposed method is applied to data from a longitudinal respiratory infection study.

12.

A general framework of nonparametric feature selection in high-dimensional data

Hang Yu Yuanjia Wang Donglin Zeng 《Biometrics》2023,79(2):951-963

Nonparametric feature selection for high-dimensional data is an important and challenging problem in the fields of statistics and machine learning. Most of the existing methods for feature selection focus on parametric or additive models which may suffer from model misspecification. In this paper, we propose a new framework to perform nonparametric feature selection for both regression and classification problems. Under this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel, which depends on a set of parameters that determines the importance of the features. Computationally, we minimize the empirical risk with a penalty to estimate the prediction and kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical property of the kernel feature space and prove the oracle selection property and Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and applications to two real studies.

13.

A robust method using propensity score stratification for correcting verification bias for binary tests

He H McDermott MP 《Biostatistics (Oxford, England)》2012,13(1):32-47

Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.

14.

Differential equation modeling of HIV viral fitness experiments: model identification, model selection, and multimodel inference

Miao H Dykes C Demeter LM Wu H 《Biometrics》2009,65(1):292-300

Summary. Many biological processes and systems can be described by a set of differential equation (DE) models. However, literature in statistical inference for DE models is very sparse. We propose statistical estimation, model selection, and multimodel averaging methods for HIV viral fitness experiments in vitro that can be described by a set of nonlinear ordinary differential equations (ODE). The parameter identifiability of the ODE models is also addressed. We apply the proposed methods and techniques to experimental data of viral fitness for HIV-1 mutant 103N. We expect that the proposed modeling and inference approaches for the DE models can be widely used for a variety of biomedical studies.

15.

Semiparametric models for missing covariate and response data in regression models

Chen Q Ibrahim JG 《Biometrics》2006,62(1):177-184

We consider a class of semiparametric models for the covariate distribution and missing data mechanism for missing covariate and/or response data for general classes of regression models including generalized linear models and generalized linear mixed models. Ignorable and nonignorable missing covariate and/or response data are considered. The proposed semiparametric model can be viewed as a sensitivity analysis for model misspecification of the missing covariate distribution and/or missing data mechanism. The semiparametric model consists of a generalized additive model (GAM) for the covariate distribution and/or missing data mechanism. Penalized regression splines are used to express the GAMs as a generalized linear mixed effects model, in which the variance of the corresponding random effects provides an intuitive index for choosing between the semiparametric and parametric model. Maximum likelihood estimates are then obtained via the EM algorithm. Simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial is analyzed using the proposed methods.

16.

Estimating equations with nonignorably missing response data

Wang YG 《Biometrics》1999,55(3):984-989

Troxel, Lipsitz, and Brennan (1997, Biometrics 53, 857-869) considered parameter estimation from survey data with nonignorable nonresponse and proposed weighted estimating equations to remove the biases in the complete-case analysis that ignores missing observations. This paper suggests two alternative modifications for unbiased estimation of regression parameters when a binary outcome is potentially observed at successive time points. The weighting approach of Robins, Rotnitzky, and Zhao (1995, Journal of the American Statistical Association 90, 106-121) is also modified to obtain unbiased estimating functions. The suggested estimating functions are unbiased only when the missingness probability is correctly specified, and misspecification of the missingness model will result in biases in the estimates. Simulation studies are carried out to assess the performance of different methods when the covariate is binary or normal. For the simulation models used, the relative efficiency of the two new methods to the weighting methods is about 3.0 for the slope parameter and about 2.0 for the intercept parameter when the covariate is continuous and the missingness probability is correctly specified. All methods produce substantial biases in the estimates when the missingness model is misspecified or underspecified. Analysis of data from a medical survey illustrates the use and possible differences of these estimating functions.

17.

Resampling-based multiple testing methods with covariate adjustment: application to investigation of antiretroviral drug susceptibility

Yang Y Degruttola V 《Biometrics》2008,64(2):329-336

Summary. Identifying genetic mutations that cause clinical resistance to antiretroviral drugs requires adjustment for potential confounders, such as the number of active drugs in a HIV-infected patient's regimen other than the one of interest. Motivated by this problem, we investigated resampling-based methods to test equal mean response across multiple groups defined by HIV genotype, after adjustment for covariates. We consider construction of test statistics and their null distributions under two types of model: parametric and semiparametric. The covariate function is explicitly specified in the parametric but not in the semiparametric approach. The parametric approach is more precise when models are correctly specified, but suffers from bias when they are not; the semiparametric approach is more robust to model misspecification, but may be less efficient. To help preserve type I error while also improving power in both approaches, we propose resampling approaches based on matching of observations with similar covariate values. Matching reduces the impact of model misspecification as well as imprecision in estimation. These methods are evaluated via simulation studies and applied to a data set that combines results from a variety of clinical studies of salvage regimens. Our focus is on relating HIV genotype to viral susceptibility to abacavir after adjustment for the number of active antiretroviral drugs (excluding abacavir) in the patient's regimen.

18.

Empirical likelihood for cumulative hazard ratio estimation with covariate adjustment

Dong B Matthews DE 《Biometrics》2012,68(2):408-418

In medical studies, it is often of scientific interest to evaluate the treatment effect via the ratio of cumulative hazards, especially when those hazards may be nonproportional. To deal with nonproportionality in the Cox regression model, investigators usually assume that the treatment effect has some functional form. However, to do so may create a model misspecification problem because it is generally difficult to justify the specific parametric form chosen for the treatment effect. In this article, we employ empirical likelihood (EL) to develop a nonparametric estimator of the cumulative hazard ratio with covariate adjustment under two nonproportional hazard models, one that is stratified, as well as a less restrictive framework involving group-specific treatment adjustment. The asymptotic properties of the EL ratio statistic are derived in each situation and the finite-sample properties of EL-based estimators are assessed via simulation studies. Simultaneous confidence bands for all values of the adjusted cumulative hazard ratio in a fixed interval of interest are also developed. The proposed methods are illustrated using two different datasets concerning the survival experience of patients with non-Hodgkin's lymphoma or ovarian cancer.

19.

Residual-based diagnostics for structural equation models

Sánchez BN Houseman EA Ryan LM 《Biometrics》2009,65(1):104-115

Summary. Classical diagnostics for structural equation models are based on aggregate forms of the data and are ill suited for checking distributional or linearity assumptions. We extend recently developed goodness-of-fit tests for correlated data based on subject-specific residuals to structural equation models with latent variables. The proposed tests lend themselves to graphical displays and are designed to detect misspecified distributional or linearity assumptions. To complement graphical displays, test statistics are defined; the null distributions of the test statistics are approximated using computationally efficient simulation techniques. The properties of the proposed tests are examined via simulation studies. We illustrate the methods using data from a study of in utero lead exposure.

20.

Functional mixed effects models

Guo W 《Biometrics》2002,58(1):121-128

In this article, a new class of functional models in which smoothing splines are used to model fixed effects as well as random effects is introduced. The linear mixed effects models are extended to nonparametric mixed effects models by introducing functional random effects, which are modeled as realizations of zero-mean stochastic processes. The fixed functional effects and the random functional effects are modeled in the same functional space, which guarantees that the population-average and subject-specific curves have the same smoothness property. These models inherit the flexibility of the linear mixed effects models in handling complex designs and correlation structures, can include continuous covariates as well as dummy factors in both the fixed and random design matrices, and include the nested curves models as special cases. Two estimation procedures are proposed. The first estimation procedure exploits the connection between linear mixed effects models and smoothing splines and can be fitted using existing software. The second procedure is a sequential estimation procedure using Kalman filtering. This algorithm avoids inversion of large dimensional matrices and therefore can be applied to large data sets. A generalized maximum likelihood (GML) ratio test is proposed for inference and model selection. An application to comparison of cortisol profiles is used as an illustration.