Similar Articles (20 results)
1.
Marginal methods have been widely used for the analysis of longitudinal ordinal and categorical data. These models do not require full parametric assumptions on the joint distribution of repeated response measurements but specify only the marginal or association structures. However, inference results obtained from these methods often incur serious bias when variables are subject to error. In this paper, we tackle the problem that misclassification exists in both response and categorical covariate variables. We develop a marginal method for misclassification adjustment, which utilizes second-order estimating functions and a functional modeling approach, and can yield consistent estimates and valid inference for mean and association parameters. We propose a two-stage estimation approach for cases in which validation data are available. Our simulation studies show good performance of the proposed method under a variety of settings. Although the proposed method is presented for data with a longitudinal design, it also applies to correlated data arising from clustered and family studies, in which association parameters may be of scientific interest. The proposed method is applied to analyze a dataset from the Framingham Heart Study as an illustration.

2.
O'Brien SM, Dunson DB. Biometrics 2004;60(3):739-746
Bayesian analyses of multivariate binary or categorical outcomes typically rely on probit or mixed effects logistic regression models that do not have a marginal logistic structure for the individual outcomes. In addition, difficulties arise when simple noninformative priors are chosen for the covariance parameters. Motivated by these problems, we propose a new type of multivariate logistic distribution that can be used to construct a likelihood for multivariate logistic regression analysis of binary and categorical data. The model for individual outcomes has a marginal logistic structure, simplifying interpretation. We follow a Bayesian approach to estimation and inference, developing an efficient data augmentation algorithm for posterior computation. The method is illustrated with application to a neurotoxicology study.

3.
Logistic probability models—models linear in the log odds of the outcome event—have found extensive application in modelling of unordered categorical responses. This paper illustrates some extensions of logistic models to the modelling of probabilities of ordinal responses. The extensions arise naturally from discrete probability models for the conditional distribution of the ordinal response, as well as from linear modelling of the log odds of response. Methods of estimation and examination of fit developed for the binary logistic model extend in a straightforward manner to the ordinal models. The models and methods are illustrated in an analysis of the dependence of chronic obstructive respiratory disease prevalence on smoking and age.
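The cumulative-logit (proportional-odds) form that underlies such ordinal extensions can be sketched directly; the cutpoints and slope below are illustrative values, not taken from the paper.

```python
import math

def cumulative_logit_probs(x, alphas, beta):
    """Category probabilities under a proportional-odds model:
    P(Y <= j | x) = expit(alpha_j - beta * x), with alphas increasing."""
    expit = lambda z: 1.0 / (1.0 + math.exp(-z))
    # Cumulative probabilities for the first J-1 categories, then 1.0 for the last.
    cum = [expit(a - beta * x) for a in alphas] + [1.0]
    # Differences of adjacent cumulative probabilities give category probabilities.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical cutpoints and slope for a 3-category response.
p = cumulative_logit_probs(x=1.0, alphas=[-1.0, 1.0], beta=0.5)
```

Because each category probability is a difference of adjacent cumulative logistic probabilities, the probabilities sum to one by construction.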

4.
McNemar's test is popular for assessing the difference between proportions when two observations are taken on each experimental unit. It is useful under a variety of epidemiological study designs that produce correlated binary outcomes. In studies involving outcome ascertainment, cost or feasibility concerns often lead researchers to employ error-prone surrogate diagnostic tests. Assuming an available gold standard diagnostic method, we address point and confidence interval estimation of the true difference in proportions and the paired-data odds ratio by incorporating external or internal validation data. We distinguish two special cases, depending on whether it is reasonable to assume that the diagnostic test properties remain the same for both assessments (e.g., at baseline and at follow-up). Likelihood-based analysis yields closed-form estimates when validation data are external and requires numeric optimization when they are internal. The latter approach offers important advantages in terms of robustness and efficient odds ratio estimation. We consider internal validation study designs geared toward optimizing efficiency given a fixed cost allocated for measurements. Two motivating examples are presented, using gold standard and surrogate bivariate binary diagnoses of bacterial vaginosis (BV) on women participating in the HIV Epidemiology Research Study (HERS).
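For reference, the error-free versions of the quantities being estimated follow directly from the discordant cells of the paired 2x2 table; the counts below are hypothetical, and the paper's contribution is adjusting such estimates for surrogate misclassification using validation data.

```python
import math

def mcnemar_and_paired_or(b, c):
    """McNemar statistic and paired-data odds ratio from the discordant
    cell counts b and c of a 2x2 table of paired binary outcomes."""
    chi2 = (b - c) ** 2 / (b + c)         # McNemar's chi-square (1 df)
    or_hat = b / c                        # conditional ML estimate of the paired OR
    se_log_or = math.sqrt(1 / b + 1 / c)  # SE of log OR for a Wald interval
    return chi2, or_hat, se_log_or

chi2, or_hat, se = mcnemar_and_paired_or(b=30, c=10)
```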

5.
Meta-analysis of binary data is challenging when the event under investigation is rare, and standard models for random-effects meta-analysis perform poorly in such settings. In this simulation study, we investigate the performance of different random-effects meta-analysis models in terms of point and interval estimation of the pooled log odds ratio in rare events meta-analysis. First and foremost, we evaluate the performance of a hypergeometric-normal model from the family of generalized linear mixed models (GLMMs), which has been recommended, but has not yet been thoroughly investigated for rare events meta-analysis. Performance of this model is compared to performance of the beta-binomial model, which yielded favorable results in previous simulation studies, and to the performance of models that are frequently used in rare events meta-analysis, such as the inverse variance model and the Mantel–Haenszel method. In addition to considering a large number of simulation parameters inspired by real-world data settings, we study the comparative performance of the meta-analytic models under two different data-generating models (DGMs) that have been used in past simulation studies. The results of this study show that the hypergeometric-normal GLMM is useful for meta-analysis of rare events when moderate to large heterogeneity is present. In addition, our study reveals important insights with regard to the performance of the beta-binomial model under different DGMs from the binomial-normal family. In particular, we demonstrate that although misalignment of the beta-binomial model with the DGM affects its performance, it shows more robustness to the DGM than its competitors.
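One of the comparison methods, the Mantel–Haenszel pooled odds ratio, is simple enough to sketch; the two studies below are hypothetical, and no continuity correction is applied (a 0.5 correction is common in rare-event settings).

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio over 2x2 tables (a, b, c, d),
    where a/b are treatment events/non-events and c/d control events/non-events."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Two illustrative rare-event studies of 200 subjects each.
or_mh = mantel_haenszel_or([(2, 98, 1, 99), (3, 97, 2, 98)])
```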

6.
Newton MA, Lee Y. Biometrics 2000;56(4):1088-1097
Cancerous tumor growth creates cells with abnormal DNA. Allelic-loss experiments identify genomic deletions in cancer cells, but sources of variation and intrinsic dependencies complicate inference about the location and effect of suppressor genes; such genes are the target of these experiments and are thought to be involved in tumor development. We investigate properties of an instability-selection model of allelic-loss data, including likelihood-based parameter estimation and hypothesis testing. By considering a special complete-data case, we derive an approximate calibration method for hypothesis tests of sporadic deletion. Parametric bootstrap and Bayesian computations are also developed. Data from three allelic-loss studies are reanalyzed to illustrate the methods.

7.
A generalized case-control (GCC) study, like the standard case-control study, leverages outcome-dependent sampling (ODS), but extends to nonbinary responses. We develop a novel, unifying approach for analyzing GCC study data using the recently developed semiparametric extension of the generalized linear model (GLM), which is substantially more robust to model misspecification than existing approaches based on parametric GLMs. For valid estimation and inference, we use a conditional likelihood to account for the biased sampling design. We describe analysis procedures for estimation and inference for the semiparametric GLM under a conditional likelihood, and we discuss problems with estimation and inference under a conditional likelihood when the response distribution is misspecified. We demonstrate the flexibility of our approach over existing ones through extensive simulation studies, and we apply the methodology to an analysis of the Asset and Health Dynamics Among the Oldest Old study, which motivates our research. The proposed approach yields a simple yet versatile solution for handling ODS in a wide variety of possible response distributions and sampling schemes encountered in practice.

8.
We are interested in the estimation of average treatment effects based on right-censored data of an observational study. We focus on causal inference of differences between t-year absolute event risks in a situation with competing risks. We derive doubly robust estimation equations and implement estimators for the nuisance parameters based on working regression models for the outcome, censoring, and treatment distribution conditional on auxiliary baseline covariates. We use the functional delta method to show that these estimators are regular asymptotically linear estimators and estimate their variances based on estimates of their influence functions. In empirical studies, we assess the robustness of the estimators and the coverage of confidence intervals. The methods are further illustrated using data from a Danish registry study.

9.
Weibin Zhong, Guoqing Diao. Biometrics 2023;79(3):1959-1971
Two-phase studies such as case-cohort and nested case-control studies are widely used cost-effective sampling strategies. In the first phase, the observed failure/censoring time and inexpensive exposures are collected. In the second phase, a subgroup of subjects is selected for measurements of expensive exposures based on the information from the first phase. One challenging issue is how to utilize all the available information to conduct efficient regression analyses of the two-phase study data. This paper proposes a joint semiparametric modeling of the survival outcome and the expensive exposures. Specifically, we assume a class of semiparametric transformation models and a semiparametric density ratio model for the survival outcome and the expensive exposures, respectively. The class of semiparametric transformation models includes the proportional hazards model and the proportional odds model as special cases. The density ratio model is flexible in modeling multivariate mixed-type data. We develop efficient likelihood-based estimation and inference procedures and establish the large sample properties of the nonparametric maximum likelihood estimators. Extensive numerical studies reveal that the proposed methods perform well under practical settings. The proposed methods also appear to be reasonably robust under various model misspecifications. An application to the National Wilms Tumor Study is provided.

10.
Interpreting parameters in the logistic regression model with random effects
Logistic regression with random effects is used to study the relationship between explanatory variables and a binary outcome in cases with nonindependent outcomes. In this paper, we examine in detail the interpretation of both fixed effects and random effects parameters. As heterogeneity measures, the random effects parameters included in the model are not easily interpreted. We discuss different alternative measures of heterogeneity and suggest using a median odds ratio measure that is a function of the original random effects parameters. The measure allows a simple interpretation, in terms of well-known odds ratios, that greatly facilitates communication between the data analyst and the subject-matter researcher. Three examples from different subject areas, mainly taken from our own experience, serve to motivate and illustrate different aspects of parameter interpretation in these models.
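For a logistic random-intercept model with normally distributed random effects, the median odds ratio the authors advocate has the closed form MOR = exp(sqrt(2 * sigma^2) * Phi^{-1}(0.75)), where sigma^2 is the between-cluster variance; a minimal sketch:

```python
import math
from statistics import NormalDist

def median_odds_ratio(var_u):
    """Median odds ratio for a logistic random-intercept model with
    between-cluster variance var_u (normal random effects):
    MOR = exp(sqrt(2 * var_u) * Phi^{-1}(0.75))."""
    return math.exp(math.sqrt(2.0 * var_u) * NormalDist().inv_cdf(0.75))

mor = median_odds_ratio(var_u=0.5)  # roughly 1.96 for a variance of 0.5
```

A variance of zero gives MOR = 1 (no cluster heterogeneity), which is what makes the measure directly comparable to ordinary covariate odds ratios.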

11.
Variable Selection for Semiparametric Mixed Models in Longitudinal Studies
We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: a roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on the linear coefficients to achieve model sparsity. Compared to existing estimating-equation-based approaches, our procedure provides valid inference for data that are missing at random, and will be more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimation for the estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method to the existing ones. We then apply the new method to a real data set from a lactation study.

12.
In inter-laboratory studies, a fundamental problem of interest is inference concerning the consensus mean, when the measurements are made by several laboratories which may exhibit different within-laboratory variances, apart from the between-laboratory variability. A heteroscedastic one-way random model is very often used to model this scenario. Under such a model, a modified signed log-likelihood ratio procedure is developed for the interval estimation of the common mean. Furthermore, simulation results are presented to show the accuracy of the proposed confidence interval, especially for small samples. The results are illustrated using an example on the determination of selenium in non-fat milk powder by combining the results of four methods. Here, the sample size is small, and the confidence limits for the common mean obtained by different methods produce very different results. The confidence interval based on the modified signed log-likelihood ratio procedure appears to be quite satisfactory.
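As a point of comparison for the interval procedures discussed, the classical Graybill-Deal point estimate of the common mean weights each laboratory by n_i / s_i^2 (it ignores between-laboratory variance, which the paper's one-way random model accounts for); the figures below are hypothetical, not the selenium data.

```python
def graybill_deal_mean(means, variances, sizes):
    """Graybill-Deal estimate of a common mean across laboratories with
    heteroscedastic errors: weighted mean with weights n_i / s_i^2."""
    weights = [n / s2 for n, s2 in zip(sizes, variances)]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

# Hypothetical summary statistics from three measurement methods.
mu_hat = graybill_deal_mean(means=[1.10, 1.15, 1.08],
                            variances=[0.01, 0.04, 0.02],
                            sizes=[10, 10, 10])
```

The estimate is pulled toward the laboratories with the smallest within-laboratory variances, which is why interval methods that reflect the extra between-laboratory uncertainty matter for small samples.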

13.
A Bayesian missing value estimation method for gene expression profile data
MOTIVATION: Gene expression profile analyses have been used in numerous studies covering a broad range of areas in biology. When unreliable measurements are excluded, missing values are introduced in gene expression profiles. Although existing multivariate analysis methods have difficulty with the treatment of missing values, this problem has received little attention. There are many options for dealing with missing values, each of which reaches drastically different results. Ignoring missing values is the simplest method and is frequently applied. This approach, however, has its flaws. In this article, we propose an estimation method for missing values, which is based on Bayesian principal component analysis (BPCA). Although estimating a probabilistic model and latent variables simultaneously within a Bayesian framework is not new in principle, a BPCA implementation that makes it possible to estimate arbitrary missing variables is new in terms of statistical methodology. RESULTS: When applied to DNA microarray data from various experimental conditions, the BPCA method exhibited markedly better estimation ability than other recently proposed methods, such as singular value decomposition and K-nearest neighbors. While the estimation performance of existing methods depends on model parameters whose determination is difficult, our BPCA method is free from this difficulty. Accordingly, the BPCA method provides accurate and convenient estimation for missing values. AVAILABILITY: The software is available at http://hawaii.aist-nara.ac.jp/~shige-o/tools/.
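A much simpler, non-Bayesian relative of BPCA is iterative low-rank SVD imputation, which conveys the core idea of filling missing entries from a low-rank reconstruction; this sketch is not the authors' algorithm, and note that the rank must be chosen by the user, which is precisely the tuning difficulty BPCA is designed to avoid.

```python
import numpy as np

def svd_impute(X, rank=2, n_iter=50):
    """Iterative low-rank SVD imputation: repeatedly fit a rank-k SVD to the
    current matrix and overwrite missing cells with the low-rank fit."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[miss] = approx[miss]                    # update missing cells only
    return filled
```

On exactly low-rank data this converges to the true values; on real microarray data the choice of rank drives the accuracy, as the abstract notes for SVD-based competitors.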

14.
Recently, there has been a great deal of interest in the analysis of multivariate survival data. In most epidemiological studies, survival times of the same cluster are related because of some unobserved risk factors such as environmental or genetic factors. Therefore, modelling of dependence between events of correlated individuals is required to ensure correct inference on the effects of treatments or covariates on the survival times. In the past decades, extension of the proportional hazards model has been widely considered for modelling multivariate survival data by incorporating a random effect which acts multiplicatively on the hazard function. In this article, we consider the proportional odds model, an alternative to the proportional hazards model in which the hazard ratio between individuals eventually converges to unity. This is a reasonable property particularly when the treatment effect fades out gradually and the homogeneity of the population increases over time. The objective of this paper is to assess the influence of the random effect on the within-subject correlation and the population heterogeneity. We are particularly interested in the properties of the proportional odds model with univariate random effect and correlated random effect. The correlations between survival times are derived explicitly for both choices of mixing distributions and are shown to be independent of the covariates. The time path of the odds function among the survivors is also examined to study the effect of the choice of mixing distribution. Modelling multivariate survival data using a univariate mixing distribution may be inadequate, as the random effect not only characterises the dependence of the survival times but also the conditional heterogeneity among the survivors. A robust estimate for the correlation of the logarithm of the survival times within a cluster is obtained regardless of the choice of mixing distribution. The sensitivity of the estimate of the regression parameter under a misspecification of the mixing distribution is studied through simulation. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

15.
Roy J, Lin X. Biometrics 2005;61(3):837-846
We consider estimation in generalized linear mixed models (GLMM) for longitudinal data with informative dropouts. At the time a unit drops out, time-varying covariates are often unobserved in addition to the missing outcome. However, existing informative dropout models typically require covariates to be completely observed. This assumption is not realistic in the presence of time-varying covariates. In this article, we first study the asymptotic bias that would result from applying existing methods, where missing time-varying covariates are handled using naive approaches, which include: (1) using only baseline values; (2) carrying forward the last observation; and (3) assuming the missing data are ignorable. Our asymptotic bias analysis shows that these naive approaches yield inconsistent estimators of model parameters. We next propose a selection/transition model that allows covariates to be missing in addition to the outcome variable at the time of dropout. The EM algorithm is used for inference in the proposed model. Data from a longitudinal study of human immunodeficiency virus (HIV)-infected women are used to illustrate the methodology.

16.
We propose a generalization of the varying coefficient model for longitudinal data to cases where not only current but also recent past values of the predictor process affect current response. More precisely, the targeted regression coefficient functions of the proposed model have sliding window supports around current time t. A variant of a recently proposed two-step estimation method for varying coefficient models is proposed for estimation in the context of these generalized varying coefficient models, and is found to lead to improvements, especially for the case of additive measurement errors in both response and predictors. The proposed methodology for estimation and inference is also applicable for the case of additive measurement error in the common versions of varying coefficient models that relate only current observations of predictor and response processes to each other. Asymptotic distributions of the proposed estimators are derived, and the model is applied to the problem of predicting protein concentrations in a longitudinal study. Simulation studies demonstrate the efficacy of the proposed estimation procedure.

17.
With advances in modern medicine and clinical diagnosis, case–control data with characterization of finer subtypes of cases are often available. In matched case–control studies, missingness in exposure values often leads to deletion of the entire stratum, and thus entails a significant loss of information. When subtypes of cases are treated as categorical outcomes, the data are further stratified and deletion of observations becomes even more expensive in terms of precision of the category-specific odds-ratio parameters, especially using the multinomial logit model. The stereotype regression model for categorical responses lies intermediate between the proportional odds and the multinomial or baseline category logit model. The use of this class of models has been limited as the structure of the model implies certain inferential challenges with nonidentifiability and nonlinearity in the parameters. We illustrate how to handle missing data in matched case–control studies with finer disease subclassification within the cases under a stereotype regression model. We present both a Monte Carlo based full Bayesian approach and an expectation/conditional maximization algorithm for the estimation of model parameters in the presence of a completely general missingness mechanism. We illustrate our methods by using data from an ongoing matched case–control study of colorectal cancer. Simulation results are presented under various missing data mechanisms and departures from modeling assumptions.

18.
BACKGROUND: The representation of a biochemical system as a network is the precursor of any mathematical model of the processes driving the dynamics of that system. Pharmacokinetics uses mathematical models to describe the interactions between a drug, its metabolites, and their targets, and through the simulation of these models predicts drug levels and/or dynamic behaviors of drug entities in the body. Therefore, the development of computational techniques for inferring the interaction network of the drug entities and its kinetic parameters from observational data is raising great interest in the scientific community of pharmacologists. Network inference is a set of mathematical procedures deducing the structure of a model from the experimental data associated with the nodes of the network of interactions. In this paper, we deal with the inference of a pharmacokinetic network from the concentrations of the drug and its metabolites observed at discrete time points. RESULTS: The method of network inference presented in this paper is inspired by the theory of time-lagged correlation inference with regard to the deduction of the interaction network, and by a maximum likelihood approach with regard to the estimation of the kinetic parameters of the network. Both network inference and parameter estimation have been designed specifically to identify systems of biotransformations, at the biochemical level, from noisy time-resolved experimental data. We use our inference method to deduce the metabolic pathway of gemcitabine. The inputs to our inference algorithm are the experimental time series of the concentrations of gemcitabine and its metabolites. The output is the set of reactions of the metabolic network of gemcitabine.
CONCLUSIONS: Time-lagged correlation based inference, paired with a probabilistic model of parameter inference from metabolite time series, allows the identification of the microscopic pharmacokinetics and pharmacodynamics of a drug with minimal a priori knowledge. In fact, the inference model presented in this paper is completely unsupervised: it takes as input only the time series of the concentrations of the parent drug and its metabolites. The method, applied to the case study of gemcitabine pharmacokinetics, shows good accuracy and sensitivity.

19.
Background: Mathematical models for revealing the dynamics and interaction properties of biological systems play an important role in computational systems biology. The inference of model parameter values from time-course data can be considered a "reverse engineering" process and is still one of the most challenging tasks. Many parameter estimation methods have been developed, but no single method is effective in all cases or dominates all other approaches; rather, each method has its own advantages and disadvantages. It is therefore worthwhile to develop parameter estimation methods that are robust against noise, computationally efficient, and flexible enough to meet different constraints.

20.
This paper introduces adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN) models to predict the apparent and complex viscosity values of model system meat emulsions. The constructed models were compared with multiple linear regression (MLR) models on the basis of estimation performance. The root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2) statistics were used to evaluate the accuracy of the models tested. Comparison of the models showed that the ANFIS model performed better than the ANN and MLR models in estimating the apparent and complex viscosity values of the model system meat emulsions. Coefficients of determination (R2) calculated for the estimation performance of ANFIS modeling in predicting the apparent and complex viscosity of the emulsions were 0.996 and 0.992, respectively. Similar R2 values (0.991 and 0.985) were obtained for the ANN model. In the present study, the constructed ANFIS models can be recommended for effectively predicting the apparent and complex viscosity values of model system meat emulsions.
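The three accuracy criteria used to compare the models (RMSE, MAE, R2) can be computed directly from observed and predicted values; the toy values below are illustrative, not the emulsion data.

```python
def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R^2 -- the criteria used to compare ANFIS, ANN, and MLR."""
    n = len(y_true)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    rmse = (sum(r * r for r in resid) / n) ** 0.5
    mae = sum(abs(r) for r in resid) / n
    mean_t = sum(y_true) / n
    ss_res = sum(r * r for r in resid)              # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```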
