20 similar documents were retrieved (search time: 15 ms)
1.
An asymptotic theory for model selection inference in general semiparametric problems (cited by 2: 0 self-citations, 2 by others)
Hjort & Claeskens (2003) developed an asymptotic theory for model selection, model averaging and subsequent inference using likelihood methods in parametric models, along with associated confidence statements. In this article, we consider a semiparametric version of this problem, wherein the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model. We show that all the results of Hjort & Claeskens hold in the semiparametric context, if the Fisher information matrix for parametric models is replaced by the semiparametric information bound for semiparametric models, and if maximum likelihood estimators for parametric models are replaced by semiparametric efficient profile estimators. Our methods of proof employ Le Cam's contiguity lemmas, leading to transparent results. The results also describe the behaviour of semiparametric model estimators when the parametric component is misspecified, and have implications for pointwise-consistent model selectors.
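For orientation, a hedged sketch (in generic notation, not taken from the article) of the Hjort–Claeskens style model-average estimator of a focus parameter μ that this semiparametric theory extends: each candidate submodel S supplies an estimate \hat\mu_S, and the final estimator mixes them with data-dependent weights,
\[
\hat\mu = \sum_{S} c(S \mid \mathrm{data})\, \hat\mu_S, \qquad \sum_{S} c(S \mid \mathrm{data}) = 1 .
\]
In the semiparametric version described above, the limit distribution of such estimators is governed by the semiparametric information bound (via efficient profile estimators) in place of the parametric Fisher information.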
2.
3.
We address the important practical problem of how to select the random effects component in a linear mixed model. A hierarchical Bayesian model is used to identify any random effect with zero variance. The proposed approach reparameterizes the mixed model so that functions of the covariance parameters of the random effects distribution are incorporated as regression coefficients on standard normal latent variables. We allow random effects to effectively drop out of the model by choosing mixture priors with point mass at zero for the random effects variances. Due to the reparameterization, the model enjoys a conditionally linear structure that facilitates the use of normal conjugate priors. We demonstrate that posterior computation can proceed via a simple and efficient Markov chain Monte Carlo algorithm. The methods are illustrated using simulated data and real data from a study relating prenatal exposure to polychlorinated biphenyls and psychomotor development of children.
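As an illustrative, hedged sketch of the kind of point-mass mixture prior described (the half-normal slab and the notation here are assumptions, not taken from the abstract), one can place on each random-effect standard deviation λ_l a prior
\[
\lambda_l \sim \pi_0\, \delta_0 + (1 - \pi_0)\, \mathrm{N}^{+}(0, v_l),
\]
where δ_0 is a point mass at zero, so that with prior probability π_0 the l-th random effect has zero variance and effectively drops out of the model.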
4.
In radiation epidemiology, it is often necessary to use mathematical models in the absence of direct measurements of individual doses. When complex models are used as surrogates for direct measurements to estimate individual doses that occurred almost 50 years ago, dose estimates will be associated with considerable error, this error being a mixture of (a) classical measurement error due to individual data such as diet histories and (b) Berkson measurement error associated with various aspects of the dosimetry system. In the Nevada Test Site (NTS) Thyroid Disease Study, the Berkson measurement errors are correlated within strata. This article concerns the development of statistical methods for inference about the effect of radiation dose on thyroid disease risk, methods that account for the complex error structure inherent in the problem. Bayesian methods using Markov chain Monte Carlo and Monte Carlo expectation-maximization methods are described, with both sharing a key Metropolis-Hastings step. Regression calibration is also considered, but we show that regression calibration does not use the correlation structure of the Berkson errors. Our methods are applied to the NTS Study, where we find a strong dose-response relationship between dose and thyroiditis. We conclude that full consideration of mixtures of Berkson and classical uncertainties in reconstructed individual doses is important for quantifying the dose response and its credibility/confidence interval. Using regression calibration and expectation values for individual doses can lead to a substantial underestimation of the excess relative risk per gray and its 95% confidence intervals.
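For reference, the two error components mentioned above are conventionally written, in their simplest additive forms (the NTS dosimetry may use multiplicative or more elaborate versions), as
\[
\text{classical:}\quad W = X + U_{\mathrm{c}}, \qquad\qquad \text{Berkson:}\quad X = W + U_{\mathrm{b}},
\]
where X is the true individual dose, W the reconstructed (assigned) dose, U_c is independent of the true dose X, and U_b is independent of the assigned dose W; in the NTS Study the Berkson components are additionally correlated within strata.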
5.
6.
Information theory was applied to select the best model fitting total length (L_T)-at-age data, compiled from published literature, and to calculate the averaged model for the Japanese eel Anguilla japonica, and the differences in growth between sexes were examined. The five candidate growth models were the von Bertalanffy, generalized von Bertalanffy, Gompertz, logistic and power models. The von Bertalanffy growth model with sex-specific coefficients was best supported by the data and nearly overlapped the averaged growth model based on Akaike weights, indicating a similar fit to the data. The Gompertz, generalized von Bertalanffy and power growth models were also substantially supported by the data. The L_T at age of A. japonica was larger in females than in males according to the averaged growth model, suggesting sexual dimorphism in growth. Model inference based on information theory, which deals with uncertainty in model selection and yields robust parameter estimates, is recommended for modelling the growth of A. japonica.
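For orientation, two standard ingredients named above, written in common (assumed) parameterizations: the von Bertalanffy growth curve and the Akaike weights used to form the averaged model,
\[
L_T(t) = L_\infty \bigl\{ 1 - e^{-K (t - t_0)} \bigr\}, \qquad
w_i = \frac{\exp(-\Delta_i/2)}{\sum_j \exp(-\Delta_j/2)}, \quad \Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min},
\]
where L_∞ is the asymptotic length, K the growth coefficient and t_0 the theoretical age at zero length.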
7.
Application of classical model selection methods such as Akaike's information criterion (AIC) becomes problematic when observations are missing. In this article we propose some variations on the AIC, which are applicable to missing-covariate problems. The method is based directly on the expectation-maximization (EM) algorithm and is readily available for EM-based estimation methods without much additional computational effort. The missing-data AIC criteria are formally derived and shown to work in a simulation study and by application to data on diabetic retinopathy.
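For context, the baseline criterion being adapted is the usual
\[
\mathrm{AIC} = -2 \log L(\hat\theta) + 2k,
\]
with L the likelihood at the maximum likelihood estimate and k the number of free parameters; the missing-data variants proposed in the article replace the log-likelihood term with quantities available from the EM algorithm (their exact form is not reproduced here).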
8.
In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.
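For comparison, the classic criterion referred to above is, in its usual form,
\[
\mathrm{BIC} = -2 \log L(\hat\theta) + k \log n,
\]
with k parameters and n observations; the modified BIC keeps the log-likelihood term but, because change-point locations are irregular parameters, replaces the k log n penalty with terms that depend on the estimated change-point configuration (the exact penalty is not reproduced here).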
9.
Thomas R. Stanley & Kenneth P. Burnham, Biometrical Journal (Biometrische Zeitschrift), 1998, 40(4): 475-494
Specification of an appropriate model is critical to valid statistical inference. Given that the “true model” for the data is unknown, the goal of model selection is to select a plausible approximating model that balances model bias and sampling variance. Model selection based on information criteria such as AIC or its variant AICc, or criteria like CAIC, has proven useful in a variety of contexts including the analysis of open-population capture-recapture data. These criteria have not been intensively evaluated for closed-population capture-recapture models, which are integer parameter models used to estimate population size (N), and there is concern that they will not perform well. To address this concern, we evaluated AIC, AICc, and CAIC model selection for closed-population capture-recapture models by empirically assessing the quality of inference for the population size parameter N. We found that AIC-, AICc-, and CAIC-selected models had smaller relative mean squared errors than randomly selected models, but that confidence interval coverage on N was poor unless unconditional variance estimates (which incorporate model uncertainty) were used to compute confidence intervals. Overall, AIC and AICc outperformed CAIC, and are preferred to CAIC for selection among the closed-population capture-recapture models we investigated. A model averaging approach to estimation, using AIC, AICc, or CAIC to estimate weights, was also investigated and proved superior to estimation using AIC-, AICc-, or CAIC-selected models. Our results suggested that, for model averaging, AIC or AICc should be favored over CAIC for estimating weights.
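Two standard quantities referenced above, written in their commonly used forms (as in Burnham & Anderson's treatment; the notation here is assumed, with \hat N_i the estimate under model M_i, w_i the model weights and \hat{\bar N} the model-averaged estimate):
\[
\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad
\widehat{\operatorname{var}}\bigl(\hat{\bar N}\bigr) = \Bigl[ \sum_i w_i \sqrt{\widehat{\operatorname{var}}\bigl(\hat N_i \mid M_i\bigr) + \bigl(\hat N_i - \hat{\bar N}\bigr)^2} \,\Bigr]^2 ,
\]
the second being the unconditional (model-averaged) variance estimator that incorporates model-selection uncertainty and underlies the improved confidence interval coverage reported above.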
10.
We construct Bayesian methods for semiparametric modeling of a monotonic regression function when the predictors are measured with classical error, Berkson error, or a mixture of the two. Such methods require a distribution for the unobserved (latent) predictor, a distribution we also model semiparametrically. Such combinations of semiparametric methods for the dose response as well as the latent variable distribution have not been considered in the measurement error literature for any form of measurement error. In addition, our methods represent a new approach to those problems where the measurement error combines Berkson and classical components. While the methods are general, we develop them around a specific application, namely, the study of thyroid disease in relation to radiation fallout from the Nevada test site. We use these data to illustrate our methods, which suggest a point estimate (posterior mean) of relative risk at high doses nearly double that of previous analyses but that also suggest much greater uncertainty in the relative risk.
11.
Victor Kipnis, Laurence S. Freedman, Raymond J. Carroll & Douglas Midthune, Biometrics, 2016, 72(1): 106-115
12.
Wang CY, Biometrics, 2000, 56(1): 106-112
Consider the problem of estimating the correlation between two nutrient measurements, such as the percent energy from fat obtained from a food frequency questionnaire (FFQ) and that from repeated food records or 24-hour recalls. Under a classical additive model for repeated food records, it is known that there is an attenuation effect on the correlation estimation if the sample average of repeated food records for each subject is used to estimate the underlying long-term average. This paper considers the case in which the selection probability of a subject for participation in the calibration study, in which repeated food records are measured, depends on the corresponding FFQ value, and the repeated longitudinal measurement errors have an autoregressive structure. This paper investigates a normality-based estimator and compares it with a simple method of moments. Both methods are consistent if the first two moments of nutrient measurements exist. Furthermore, joint estimating equations are applied to estimate the correlation coefficient and related nuisance parameters simultaneously. This approach provides a simple sandwich formula for the covariance estimation of the estimator. Finite sample performance is examined via a simulation study, and the proposed weighted normality-based estimator performs well under various distributional assumptions. The methods are applied to real data from a dietary assessment study.
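Under the classical additive model mentioned above, with F_{ij} = T_i + ε_{ij} for the j-th of k food records and errors independent across records (an assumed simplification; the paper allows autoregressive errors and FFQ-dependent selection), the attenuation takes the familiar form
\[
\operatorname{corr}\bigl(Q_i, \bar F_i\bigr) = \operatorname{corr}\bigl(Q_i, T_i\bigr)\, \sqrt{\frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2 / k}},
\]
where Q_i is the FFQ value and T_i the long-term average, so averaging more records reduces, but does not remove, the attenuation.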
13.
The conventional model selection criterion, the Akaike information criterion, AIC, has been applied to choose candidate models in mixed-effects models by the consideration of marginal likelihood. Vaida & Blanchard (2005) demonstrated that such a marginal AIC and its small sample correction are inappropriate when the research focus is on clusters. Correspondingly, these authors suggested the use of conditional AIC. Their conditional AIC is derived under the assumption that the variance-covariance matrix or scaled variance-covariance matrix of random effects is known. This note provides a general conditional AIC but without these strong assumptions. Simulation studies show that the proposed method is promising.
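For context, the Vaida & Blanchard (2005) conditional criterion that this note generalizes can be sketched (up to conventions and the treatment of the variance parameters) as
\[
\mathrm{cAIC} = -2 \log f\bigl(y \mid \hat b, \hat\theta\bigr) + 2(\rho + 1),
\]
where the likelihood is conditional on the predicted random effects \hat b and ρ is an effective number of parameters (the trace of the matrix mapping the observations to the fitted values); the general version above avoids assuming that the (scaled) random-effects covariance matrix is known when computing the penalty.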
14.
A method of constructing trees for correlated failure times is put forward. It adopts the backfitting idea of classification and regression trees (CART) (Breiman et al., 1984, in Classification and Regression Trees). The tree method is developed based on the maximized likelihoods associated with the gamma frailty model and standard likelihood-related techniques are incorporated. The proposed method is assessed through simulations conducted under a variety of model configurations and illustrated using the chronic granulomatous disease (CGD) study data.
15.
In many areas of medical research, such as psychiatry and gerontology, latent class variables are used to classify individuals into disease categories, often with the intention of hierarchical modeling. Problems arise when it is not clear how many disease classes are appropriate, creating a need for model selection and diagnostic techniques. Previous work has shown that the Pearson χ2 statistic and the log-likelihood ratio G2 statistic are not valid test statistics for evaluating latent class models. Other methods, such as information criteria, provide decision rules without providing explicit information about where discrepancies occur between a model and the data. Identifiability issues further complicate these problems. This paper develops procedures for assessing Markov chain Monte Carlo convergence and model diagnosis and for selecting the number of categories for the latent variable based on evidence in the data using Markov chain Monte Carlo techniques. Simulations and a psychiatric example are presented to demonstrate the effective use of these methods.
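The two fit statistics noted above as problematic are, in standard notation with observed and model-expected counts O_c and E_c over the response-pattern cells,
\[
X^2 = \sum_{c} \frac{(O_c - E_c)^2}{E_c}, \qquad G^2 = 2 \sum_{c} O_c \log\frac{O_c}{E_c};
\]
their nominal chi-squared reference distributions are unreliable for latent class models (sparse tables and identifiability problems are typical culprits), which is the difficulty the MCMC-based diagnostics above are designed to address.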
16.
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigorous search of the tree space is run under that model to find the maximum-likelihood estimate of the tree (topology and branch lengths) and the maximum-likelihood estimates of the model parameters. In this paper, we propose two extensions to the decision-theoretic (DT) approach that relax the fixed-topology restriction. We also relax the fixed-topology restriction for the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) methods. We compare the performance of the different methods (the relaxed, restricted, and the likelihood-ratio test [LRT]) using simulated data. This comparison is done by evaluating the relative complexity of the models resulting from each method and by comparing the performance of the chosen models in estimating the true tree. We also compare the methods relative to one another by measuring the closeness of the estimated trees corresponding to the different chosen models under these methods. We show that varying the topology does not have a major impact on model choice. We also show that the outcome of the two proposed extensions is identical and is comparable to that of the BIC, Extended-BIC, and DT. Hence, using the simpler methods in choosing a model for analyzing the data is more computationally feasible, with results comparable to the more computationally intensive methods. Another outcome of this study is that earlier conclusions about the DT approach are reinforced. That is, LRT, Extended-AIC, and AIC result in more complicated models that do not contribute to the performance of the phylogenetic inference, yet cause a significant increase in the time required for data analysis.
17.
A primary objective of current air pollution research is the assessment of health effects related to specific sources of air particles or particulate matter (PM). Quantifying source-specific risk is a challenge because most PM health studies do not directly observe the contributions of the pollution sources themselves. Instead, given knowledge of the chemical characteristics of known sources, investigators infer pollution source contributions via a source apportionment or multivariate receptor analysis applied to a large number of observed elemental concentrations. Although source apportionment methods are well established for exposure assessment, little work has been done to evaluate the appropriateness of characterizing unobservable sources in this way in health effects analyses. In this article, we propose a structural equation framework to assess source-specific health effects using speciated elemental data. This approach corresponds to fitting a receptor model and the health outcome model jointly, such that inferences on the health effects account for the fact that uncertainty is associated with the source contributions. Since the structural equation model (SEM) typically involves a large number of parameters, for small-sample settings we propose a fully Bayesian estimation approach that leverages historical exposure data from previous related exposure studies. We compare via simulation the performance of our approach in estimating source-specific health effects to that of 2 existing approaches, a tracer approach and a 2-stage approach. Simulation results suggest that the proposed informative Bayesian SEM is effective in eliminating the bias incurred by the 2 existing approaches, even when the number of exposures is limited. We employ the proposed methods in the analysis of a concentrator study investigating the association between ST-segment, a cardiovascular outcome, and major sources of Boston PM and discuss the implications of our findings with respect to the design of future PM concentrator studies.
18.
In biostatistical practice, it is common to use information criteria as a guide for model selection. We propose new versions of the focused information criterion (FIC) for variable selection in logistic regression. The FIC gives, depending on the quantity to be estimated, possibly different sets of selected variables. The standard version of the FIC measures the mean squared error of the estimator of the quantity of interest in the selected model. In this article, we propose more general versions of the FIC, allowing other risk measures such as the one based on L_p error. When prediction of an event is important, as is often the case in medical applications, we construct an FIC using the error rate as a natural risk measure. The advantages of using an information criterion which depends on both the quantity of interest and the selected risk measure are illustrated by means of a simulation study and application to a study on diabetic retinopathy.
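In its standard mean-squared-error version, the criterion can be summarized, in hedged and generic notation, as an estimate of the risk of the focus estimator \hat\mu_S under each candidate submodel S,
\[
\mathrm{FIC}(S) \approx \widehat{\mathrm{bias}}{}^2\bigl(\hat\mu_S\bigr) + \widehat{\operatorname{var}}\bigl(\hat\mu_S\bigr),
\]
and the generalization proposed in the article replaces this squared-error risk with other losses, such as L_p error or, for event prediction, the misclassification (error) rate.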
19.
20.
Wei Pan, Biometrics, 2001, 57(2): 529-534
Model selection is a necessary step in many practical regression analyses. But for methods based on estimating equations, such as the quasi-likelihood and generalized estimating equation (GEE) approaches, there seem to be few well-studied model selection techniques. In this article, we propose a new model selection criterion that minimizes the expected predictive bias (EPB) of estimating equations. A bootstrap smoothed cross-validation (BCV) estimate of EPB is presented and its performance is assessed via simulation for overdispersed generalized linear models. For illustration, the method is applied to a real data set taken from a study of the development of ewe embryos.
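One way to formalize the idea, as a hedged sketch in generic notation rather than the paper's exact definitions: for a candidate model M with estimating function g and estimate \hat\beta_M(Y) solving the estimating equation on the observed sample Y, the expected predictive bias measures how far that estimate is from solving the equation on an independent replicate Y*,
\[
\mathrm{EPB}(\mathcal{M}) = \mathrm{E}\Bigl[\, g\bigl(Y^{*};\, \hat\beta_{\mathcal{M}}(Y)\bigr) \Bigr],
\]
and the bootstrap smoothed cross-validation estimate replaces the outer expectation with repeated resampled training/validation splits.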