Similar literature
 20 similar articles found (search time: 406 ms)
1.
Larsen K 《Biometrics》2004,60(1):85-92
Multiple categorical variables are commonly used in medical and epidemiological research to measure specific aspects of human health and functioning. To analyze such data, models have been developed considering these categorical variables as imperfect indicators of an individual's "true" status of health or functioning. In this article, the latent class regression model is used to model the relationship between covariates, a latent class variable (the unobserved status of health or functioning), and the observed indicators (e.g., variables from a questionnaire). The Cox model is extended to encompass a latent class variable as predictor of time-to-event, while using information about latent class membership available from multiple categorical indicators. The expectation-maximization (EM) algorithm is employed to obtain maximum likelihood estimates, and standard errors are calculated based on the profile likelihood, treating the nonparametric baseline hazard as a nuisance parameter. A sampling-based method for model checking is proposed. It allows for graphical investigation of the assumption of proportional hazards across latent classes. It may also be used for checking other model assumptions, such as no additional effect of the observed indicators given latent class. The usefulness of the model framework and the proposed techniques is illustrated in an analysis of data from the Women's Health and Aging Study concerning the effect of severe mobility disability on time-to-death for elderly women.

2.
Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates through a linear regression model. This article proposes a flexible Bayesian alternative in which the unknown latent variable density can change dynamically in location and shape across levels of a predictor. Scale mixtures of underlying normals are used in order to model flexibly the measurement errors and allow mixed categorical and continuous scales. A dynamic mixture of Dirichlet processes is used to characterize the latent response distributions. Posterior computation proceeds via a Markov chain Monte Carlo algorithm, with predictive densities used as a basis for inferences and evaluation of model fit. The methods are illustrated using data from a study of DNA damage in response to oxidative stress.

3.
Miglioretti DL 《Biometrics》2003,59(3):710-720
Health status is a complex outcome, often characterized by multiple measures. When assessing changes in health status over time, multiple measures are typically collected longitudinally. Analytic challenges posed by these multivariate longitudinal data are further complicated when the outcomes are combinations of continuous, categorical, and count data. To address these challenges, we propose a fully Bayesian latent transition regression approach for jointly analyzing a mixture of longitudinal outcomes from any distribution. Health status is assumed to be a categorical latent variable, and the multiple outcomes are treated as surrogate measures of the latent health state, observed with error. Using this approach, both baseline latent health state prevalences and the probabilities of transitioning between the health states over time are modeled as functions of covariates. The observed outcomes are related to the latent health states through regression models that include subject-specific effects to account for residual correlation among repeated measures over time, and covariate effects to account for differential measurement of the latent health states. We illustrate our approach with data from a longitudinal study of back pain.

4.
Roy J  Lin X 《Biometrics》2000,56(4):1047-1054
Multiple outcomes are often used to properly characterize an effect of interest. This paper proposes a latent variable model for the situation where repeated measures over time are obtained on each outcome. These outcomes are assumed to measure an underlying quantity of main interest from different perspectives. We relate the observed outcomes using regression models to a latent variable, which is then modeled as a function of covariates by a separate regression model. Random effects are used to model the correlation due to repeated measures of the observed outcomes and the latent variable. An EM algorithm is developed to obtain maximum likelihood estimates of model parameters. Unit-specific predictions of the latent variables are also calculated. This method is illustrated using data from a national panel study on changes in methadone treatment practices.

5.
Finite mixture modeling with mixture outcomes using the EM algorithm   Total citations: 10, self-citations: 0, citations by others: 10
Muthén B  Shedden K 《Biometrics》1999,55(2):463-469
This paper discusses the analysis of an extended finite mixture model where the latent classes corresponding to the mixture components for one set of observed variables influence a second set of observed variables. The research is motivated by a repeated measurement study using a random coefficient model to assess the influence of latent growth trajectory class membership on the probability of a binary disease outcome. More generally, this model can be seen as a combination of latent class modeling and conventional mixture modeling. The EM algorithm is used for estimation. As an illustration, a random-coefficient growth model for the prediction of alcohol dependence from three latent classes of heavy alcohol use trajectories among young adults is analyzed.

6.
Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM‐MI) and full conditional specification multiple imputation (FCS‐MI). While JM‐MI is arguably a preferable approach, because it involves specification of an explicit imputation model, FCS‐MI is pragmatically appealing, because of its flexibility in handling different types of variables. JM‐MI has developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to handle categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application on data from the German Breast Cancer Study Group, comparing the results with FCS‐MI. We divide our investigation into four sections, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that in all but one extreme general location model setting JM‐MI works very well, and sometimes outperforms FCS‐MI. We conclude the latent normal model, implemented in the R package jomo, can be used with confidence by researchers, both for single and multilevel multiple imputation.

7.
Ecologists often develop complex regression models that include multiple categorical and continuous variables, interactions among predictors, and nonlinear relationships between the response and predictor variables. Nomograms, which are graphical devices for presenting mathematical functions and calculating output values, can aid biologists in interpreting and presenting these complex models. To illustrate benefits of nomograms, we developed a logistic regression model of elk (Cervus elaphus) resource selection. With this model, we demonstrated how a nomogram helps scientists and managers interpret interactions among variables, compare the relative biological importance of variables, and examine predicted shapes of relationships (e.g., linear vs. nonlinear) between response and predictor variables. Although our example focused on logistic regression, nomograms are equally useful for other linear and nonlinear models. Regardless of the approach used for model development, nomograms and other graphical summaries can help scientists and managers develop, interpret, and apply statistical models.

8.
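Ecologists are generally interested in explaining ecological relationships, describing patterns and processes, and making spatial or temporal predictions. These tasks can be accomplished by modeling the relationship between an output value (the response) and a set of features (the explanatory variables). Modeling ecological data is challenging, however, because the response and predictor variables may be either continuous or discrete. The ecological relationships to be explained are usually nonlinear, and the explanatory variables interact in complex ways. Missing values in the response and explanatory variables are not uncommon, and outliers also occur frequently in ecological data. In addition, ecologists usually want models that are both easy to build and easy to interpret. A variety of statistical methods is typically used to handle the distinct ecological problems that arise in different settings, including (multivariate) logistic regression, linear models, survival models, and analysis of variance. Random forests are an effective method that can handle all of these problems. They can be used for classification, clustering, regression, and survival analysis, for assessing variable importance, for detecting outliers, and for imputing missing data. In view of these algorithmic advantages, this paper summarizes applications of random forests in ecology, outlines the modeling process, and demonstrates the main features of the method with a case study that models the distribution of Yunnan pine. Introducing the general terminology, concepts, and modeling ideas of random forests helps readers grasp the essentials of applying the method, and random forests can be expected to see wider application and further development in ecological research.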

9.
Houseman EA  Coull BA  Betensky RA 《Biometrics》2006,62(4):1062-1070
Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We first consider a latent class model where, conditional on unobserved membership in one of k classes, the variables are independent with probabilities determined by a regression model of low dimension q. Using a family of penalties including the ridge and LASSO, we extend this model to address higher-dimensional problems. Finally, we present an orthogonal map that transforms marker space to a space of "features" for which the constrained model has better predictive power. We demonstrate these methods on LOH data collected at 19 markers from 93 brain tumor patients. For this data set, the existing unpenalized latent class methodology does not produce estimates. Additionally, we show that posterior classes obtained from this method are associated with survival for these patients.

10.
The debate on the causal association between vitamin D status, measured as serum concentration of 25-hydroxyvitamin D (25[OH]D), and various health outcomes warrants investigation in large-scale health surveys. Measuring the 25(OH)D concentration for each participant is not always feasible, because of the logistics of blood collection and the costs of vitamin D testing. To address this problem, past research has used predicted 25(OH)D concentration, based on multivariable linear regression, as a proxy for unmeasured vitamin D status. We restate this approach in a mathematical framework, to deduce its possible pitfalls. Monte Carlo simulation and real data from the National Health and Nutrition Examination Survey 2005–06 are used to confirm the deductions. The results indicate that variables that are used in the prediction model (for 25[OH]D concentration) but not in the model for the health outcome (called instrumental variables) play an essential role in the identification of an effect. Such variables should be unrelated to the health outcome other than through vitamin D; otherwise the estimate of interest will be biased. The approach of predicted 25(OH)D concentration derived from multivariable linear regression may be valid. However, careful verification that the instrumental variables are unrelated to the health outcome is required.
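The identification argument in this abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' code: the data-generating values, the helper names `cov` and `predicted_exposure_estimate`, and the simplification that the exposure is observed for every subject are all assumptions made for illustration. The sketch predicts the exposure from a single variable `z` that is left out of the outcome model, then regresses the outcome on that prediction.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def predicted_exposure_estimate(direct_effect, n=20000, seed=0):
    """Estimate the exposure effect using a predicted exposure (hypothetical setup).

    direct_effect is the assumed effect of z on the outcome that does NOT
    pass through the exposure; the approach is only valid when it is zero.
    """
    rng = random.Random(seed)
    true_effect = 1.0  # assumed causal effect of the exposure on the outcome
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)            # used only in the prediction model
        ui = rng.gauss(0, 1)            # unmeasured confounder
        xi = zi + ui + rng.gauss(0, 1)  # exposure (e.g. a biomarker level)
        yi = true_effect * xi + ui + direct_effect * zi + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    # Stage 1: linear prediction of the exposure from z.
    slope = cov(z, x) / cov(z, z)
    x_hat = [slope * zi for zi in z]
    # Stage 2: regress the outcome on the predicted exposure.
    return cov(x_hat, y) / cov(x_hat, x_hat)


valid = predicted_exposure_estimate(direct_effect=0.0)    # roughly the true effect
invalid = predicted_exposure_estimate(direct_effect=0.5)  # biased upward
```

Under these assumed values, the estimate stays near the true effect of 1.0 when `z` affects the outcome only through the exposure, and it shifts by about the size of the direct effect otherwise.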

11.
Burgette LF  Reiter JP 《Biometrics》2012,68(1):92-100
We describe a Bayesian quantile regression model that uses a confirmatory factor structure for part of the design matrix. This model is appropriate when the covariates are indicators of scientifically determined latent factors, and it is these latent factors that analysts seek to include as predictors in the quantile regression. We apply the model to a study of birth weights in which the effects of latent variables representing psychosocial health and actual tobacco usage on the lower quantiles of the response distribution are of interest. The models can be fit using an R package called factorQR.

12.
This article addresses modeling and inference for ordinal outcomes nested within categorical responses. We propose a mixture of normal distributions for latent variables associated with the ordinal data. This mixture model allows us to fix without loss of generality the cutpoint parameters that link the latent variable with the observed ordinal outcome. Moreover, the mixture model is shown to be more flexible in estimating cell probabilities when compared to the traditional Bayesian ordinal probit regression model with random cutpoint parameters. We extend our model to take into account possible dependence among the outcomes in different categories. We apply the model to a randomized phase III study to compare treatments on the basis of toxicities recorded by type of toxicity and grade within type. The data include the different (categorical) toxicity types exhibited in each patient. Each type of toxicity has an (ordinal) grade associated with it. The dependence among the different types of toxicity exhibited by the same patient is modeled by introducing patient‐specific random effects.

13.
Monitoring health care quality involves combining continuous and discrete outcomes measured on subjects across health care units over time. This article describes a Bayesian approach to jointly modeling multilevel multidimensional continuous and discrete outcomes with serial dependence. The overall goal is to characterize trajectories of traits of each unit. Underlying normal regression models for each outcome are used and dependence among different outcomes is induced through latent variables. Serial dependence is accommodated through modeling the pairwise correlations of the latent variables. Methods are illustrated to assess trends in quality of health care units using continuous and discrete outcomes from a sample of adult veterans discharged from 1 of 22 Veterans Integrated Service Networks with a psychiatric diagnosis between 1993 and 1998.

14.
Objective: To explore the relationship between public trust in scientific experts on obesity and public attention to nutrition recommendations, to investigate trust as a predictor of weight‐related behaviors, and to identify the sociodemographic characteristics associated with high and low trust in scientific experts on obesity. Research Methods and Procedures: This analysis used survey data from two sources: 1) a 2005 Harvard School of Public Health Obesity Survey (N = 2033), and 2) the 2004 General Social Survey (N = 2812). Five outcome measures were used. Three were used to explore trust as a predictor of attention and weight‐related behaviors. Two were used to identify the sociodemographic predictors of trust. Logistic regression analysis was used to model the outcome variables. Results: Trust in scientific experts was the strongest predictor of public attention to nutritional recommendations from scientific experts, but it was not directly related to weight‐related behaviors. Public attention was significantly associated with two weight‐related behaviors: tracking fruit and vegetable intake and exercise. Women and more educated individuals had significantly higher odds of trusting scientific experts. Characteristics associated with distrust in scientific experts included Hispanic race and older age (over 50). Discussion: Public health experts should work toward building trust as an important step in stemming the obesity epidemic. Further, more research is necessary to better understand the factors driving trust in scientific experts on obesity. A deeper insight in this area will certainly be of great benefit to obesity‐related risk communication and potentially lead to positive behavior change.

15.
In biomedical or public health research, it is common for both survival time and longitudinal categorical outcomes to be collected for a subject, along with the subject’s characteristics or risk factors. Investigators are often interested in finding important variables for predicting both survival time and longitudinal outcomes which could be correlated within the same subject. Existing approaches for such joint analyses deal with continuous longitudinal outcomes. New statistical methods need to be developed for categorical longitudinal outcomes. We propose to simultaneously model the survival time with a stratified Cox proportional hazards model and the longitudinal categorical outcomes with a generalized linear mixed model. Random effects are introduced to account for the dependence between survival time and longitudinal outcomes due to unobserved factors. The Expectation–Maximization (EM) algorithm is used to derive the point estimates for the model parameters, and the observed information matrix is adopted to estimate their asymptotic variances. Asymptotic properties for our proposed maximum likelihood estimators are established using the theory of empirical processes. The method is demonstrated to perform well in finite samples via simulation studies. We illustrate our approach with data from the Carolina Head and Neck Cancer Study (CHANCE) and compare the results based on our simultaneous analysis and the separately conducted analyses using the generalized linear mixed model and the Cox proportional hazards model. Our proposed method identifies more predictors than the separate analyses do.

16.
Classification and regression tree (CART) modelling was used to determine infectious hypodermal and haematopoietic necrosis virus (IHHNV) resistance and susceptibility in Penaeus stylirostris. In a previous study, eight random amplified polymorphic DNA (RAPD) markers and viral load values using real-time quantitative PCR were obtained and used as the training data set in order to create numerous regression tree models. Specifically, the genetic markers were used as categorical predictor variables and viral load values as the dependent response variable. To determine which model has the highest predictive accuracy for future samples, RAPD fingerprint data was generated from new Penaeus stylirostris IHHNV resistant and susceptible individuals and used to test the regression models. The best performing tree was a four terminal node tree with three genetic markers as significant variables. Marker-assisted breeding practices may benefit from the creation of regression tree models that apply genetic markers as predictive factors. To our knowledge this is the first study to use RAPD markers as predictors within a CART prediction model to determine viral susceptibility.
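The core step of a regression tree with binary marker predictors is choosing the marker whose presence or absence best separates the response. The sketch below shows only that single step, under stated assumptions: the function names `sse` and `best_marker_split` are hypothetical, the toy marker matrix and viral-load values are invented for illustration and are not data from the study, and a full CART fit would apply the search recursively and prune the tree.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_marker_split(markers, response):
    """Return (marker_index, total_sse) for the best single binary split.

    markers  -- list of rows, each a list of 0/1 marker indicators
    response -- list of numeric responses, one per row
    Returns None if no marker separates the rows into two non-empty groups.
    """
    best = None
    for j in range(len(markers[0])):
        present = [y for row, y in zip(markers, response) if row[j] == 1]
        absent = [y for row, y in zip(markers, response) if row[j] == 0]
        if not present or not absent:
            continue  # this marker does not split the data
        score = sse(present) + sse(absent)
        if best is None or score < best[1]:
            best = (j, score)
    return best


# Invented toy data: two markers, four individuals.
toy_markers = [[1, 0], [1, 1], [0, 0], [0, 1]]
toy_viral_load = [5.0, 5.2, 1.0, 1.1]
split = best_marker_split(toy_markers, toy_viral_load)
```

In this toy data the first marker is chosen, because splitting on it leaves far less within-group variation in the response than splitting on the second.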

17.
18.
Data from the first wave of the Irish Longitudinal Study on Ageing are used to examine the relationship between fatness and obesity and employment status among older Irish adults. Employment status is regressed on one of the following measures of fatness: BMI and waist circumference entered linearly as continuous variables and obesity as a categorical variable defined using both BMI and waist circumference. Controls for demographic and socioeconomic characteristics, socioeconomic characteristics in childhood and physical, mental and behavioural health are also included. The regression results for women indicate that all measures of fatness are negatively associated with the probability of being employed and that the employment elasticity associated with waist circumference is larger than the elasticity associated with BMI. The results for men indicate that employment is not significantly associated with BMI and waist circumference when these are entered linearly in the regression, but it is significantly and negatively associated with obesity defined either using BMI or waist circumference as categorical variables. The results also indicate that the negative association between obesity and employment status is larger among women. For example, the probability of being employed for the obese category defined using BMI is around 8 percentage points lower for women and 5 percentage points lower for men.

19.
Daniel R. Kowal  Bohan Wu 《Biometrics》2023,79(2):1520-1533
“For how many days during the past 30 days was your mental health not good?” The responses to this question measure self-reported mental health and can be linked to important covariates in the National Health and Nutrition Examination Survey (NHANES). However, these count variables present major distributional challenges: The data are overdispersed, zero-inflated, bounded by 30, and heaped in 5- and 7-day increments. To address these challenges, which are especially common for health questionnaire data, we design a semiparametric estimation and inference framework for count data regression. The data-generating process is defined by simultaneously transforming and rounding (star) a latent Gaussian regression model. The transformation is estimated nonparametrically and the rounding operator ensures the correct support for the discrete and bounded data. Maximum likelihood estimators are computed using an expectation-maximization (EM) algorithm that is compatible with any continuous data model estimable by least squares. star regression includes asymptotic hypothesis testing and confidence intervals, variable selection via information criteria, and customized diagnostics. Simulation studies validate the utility of this framework. Using star regression, we identify key factors associated with self-reported mental health and demonstrate substantial improvements in goodness-of-fit compared to existing count data regression models.
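The "transform and round" data-generating process can be sketched as a short simulation. This is an illustration under stated assumptions rather than the authors' implementation: the abstract says the transformation is estimated nonparametrically, whereas the sketch fixes it to a logarithm, and the function name `star_sample`, the coefficient values, and the single covariate are all hypothetical.

```python
import math
import random


def star_sample(x, beta0, beta1, sigma, y_max, rng):
    """Draw one bounded count from a transformed-and-rounded Gaussian model.

    Assumes a log transformation, so the latent continuous value is exp(z).
    """
    # Latent Gaussian regression on the transformed scale.
    z = beta0 + beta1 * x + rng.gauss(0.0, sigma)
    # Undo the (assumed) log transformation to get a positive continuous value.
    y_star = math.exp(z)
    # Round down to a count and enforce the upper bound, e.g. 30 days.
    return min(math.floor(y_star), y_max)


rng = random.Random(0)
draws = [
    star_sample(x=rng.uniform(0, 1), beta0=0.5, beta1=1.0,
                sigma=1.5, y_max=30, rng=rng)
    for _ in range(1000)
]
```

Rounding maps every latent value below 1 to a count of 0 and the cap maps every large value to 30, which is how this construction can place extra mass at both ends of the support while the regression itself stays Gaussian.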

20.
We propose a model for high dimensional mediation analysis that includes latent variables. We describe our model in the context of an epidemiologic study for incident breast cancer with one exposure and a large number of biomarkers (i.e., potential mediators). We assume that the exposure directly influences a group of latent, or unmeasured, factors which are associated with both the outcome and a subset of the biomarkers. The biomarkers associated with the latent factors linking the exposure to the outcome are considered “mediators.” We derive the likelihood for this model and develop an expectation‐maximization algorithm to maximize an L1‐penalized version of this likelihood to limit the number of factors and associated biomarkers. We show that the resulting estimates are consistent and that the estimates of the nonzero parameters have an asymptotically normal distribution. In simulations, procedures based on this new model can have significantly higher power for detecting the mediating biomarkers compared with the simpler approaches. We apply our method to a study that evaluates the relationship between body mass index, 481 metabolic measurements, and estrogen‐receptor positive breast cancer.
