Similar literature
 20 similar articles found (search time: 15 ms)
1.
Latent class regression on latent factors   (cited 1 time in total: 0 self-citations, 1 citation by others)
In the research of public health, psychology, and social sciences, many research questions investigate the relationship between a categorical outcome variable and continuous predictor variables. The focus of this paper is to develop a model to build this relationship when both the categorical outcome and the predictor variables are latent (i.e. not observable directly). This model extends the latent class regression model so that it can include regression on latent predictors. Maximum likelihood estimation is used and two numerical methods for performing it are described: the Monte Carlo expectation and maximization algorithm and Gaussian quadrature followed by quasi-Newton algorithm. A simulation study is carried out to examine the behavior of the model under different scenarios. A data example involving adolescent health is used for demonstration where the latent classes of eating disorders risk are predicted by the latent factor body satisfaction.

2.
This paper proposes a two-part model for studying transitions between health states over time when multiple, discrete health indicators are available. The model includes a measurement model positing underlying latent health states and a transition model between latent health states over time. Full maximum likelihood estimation procedures are computationally complex in this latent variable framework, making only a limited class of models feasible and estimation of standard errors problematic. For this reason, an estimating equations analogue of the pseudo-likelihood method for the parameters of interest, namely the transition model parameters, is considered. The finite sample properties of the proposed procedure are investigated through a simulation study and the importance of choosing strong indicators of the latent variable is demonstrated. The applicability of the methodology is illustrated with health survey data measuring disability in the elderly from the Longitudinal Study of Aging.

3.
Miglioretti DL. Biometrics 2003;59(3):710-720
Health status is a complex outcome, often characterized by multiple measures. When assessing changes in health status over time, multiple measures are typically collected longitudinally. Analytic challenges posed by these multivariate longitudinal data are further complicated when the outcomes are combinations of continuous, categorical, and count data. To address these challenges, we propose a fully Bayesian latent transition regression approach for jointly analyzing a mixture of longitudinal outcomes from any distribution. Health status is assumed to be a categorical latent variable, and the multiple outcomes are treated as surrogate measures of the latent health state, observed with error. Using this approach, both baseline latent health state prevalences and the probabilities of transitioning between the health states over time are modeled as functions of covariates. The observed outcomes are related to the latent health states through regression models that include subject-specific effects to account for residual correlation among repeated measures over time, and covariate effects to account for differential measurement of the latent health states. We illustrate our approach with data from a longitudinal study of back pain.

4.

Background

Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample.

Methods

Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996–1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects.

Results

The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size.

Conclusions

SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.

5.
Larsen K. Biometrics 2005;61(4):1049-1055
This article is motivated by the Women's Health and Aging Study, where information about physical functioning was recorded along with death information in a group of elderly women. The focus is on determining whether having difficulties in daily living tasks is accompanied by a higher mortality rate. To this end, a two-parameter logistic regression model is used for the modeling of binary questionnaire data assuming an underlying continuous latent variable, difficulty in daily living. The Cox model is used for the survival information, and the continuous latent variable is included as an explanatory variable along with other observed variables. Parameters are estimated by maximizing the likelihood for the joint distribution of the items and the time-to-event information. In addition to presenting a new statistical model, this article also illustrates the use of the model in a real data setting and addresses the more practical issues of model building, diagnostics, and parameter interpretation.
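
As a rough illustration of the measurement part described in this abstract, the sketch below evaluates a two-parameter logistic item response curve. The parameterization (a `discrimination` slope and a `location` shift) and all names are assumptions chosen for illustration; the abstract does not state which form the article uses, and the Cox survival part is not sketched here.

```python
import math


def item_probability(theta: float, discrimination: float, location: float) -> float:
    """Probability that a binary questionnaire item is endorsed, given latent trait theta.

    Hypothetical parameterization: logistic(discrimination * (theta - location)).
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - location)))


# A subject whose latent trait equals the item location endorses it with probability 0.5;
# with a positive discrimination, a higher latent trait raises the endorsement probability.
p_at_location = item_probability(0.0, 1.5, 0.0)
p_above_location = item_probability(1.0, 1.5, 0.0)
```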

6.
Houseman EA, Coull BA, Betensky RA. Biometrics 2006;62(4):1062-1070
Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We first consider a latent class model where, conditional on unobserved membership in one of k classes, the variables are independent with probabilities determined by a regression model of low dimension q. Using a family of penalties including the ridge and LASSO, we extend this model to address higher-dimensional problems. Finally, we present an orthogonal map that transforms marker space to a space of "features" for which the constrained model has better predictive power. We demonstrate these methods on LOH data collected at 19 markers from 93 brain tumor patients. For this data set, the existing unpenalized latent class methodology does not produce estimates. Additionally, we show that posterior classes obtained from this method are associated with survival for these patients.

7.
Roy J, Lin X. Biometrics 2000;56(4):1047-1054
Multiple outcomes are often used to properly characterize an effect of interest. This paper proposes a latent variable model for the situation where repeated measures over time are obtained on each outcome. These outcomes are assumed to measure an underlying quantity of main interest from different perspectives. We relate the observed outcomes using regression models to a latent variable, which is then modeled as a function of covariates by a separate regression model. Random effects are used to model the correlation due to repeated measures of the observed outcomes and the latent variable. An EM algorithm is developed to obtain maximum likelihood estimates of model parameters. Unit-specific predictions of the latent variables are also calculated. This method is illustrated using data from a national panel study on changes in methadone treatment practices.

8.
Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM‐MI) and full conditional specification multiple imputation (FCS‐MI). While JM‐MI is arguably a preferable approach, because it involves specification of an explicit imputation model, FCS‐MI is pragmatically appealing, because of its flexibility in handling different types of variables. JM‐MI has developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to handle categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application on data from the German Breast Cancer Study Group, comparing the results with FCS‐MI. We divide our investigation into four sections, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that in all but one extreme general location model setting JM‐MI works very well, and sometimes outperforms FCS‐MI. We conclude that the latent normal model, implemented in the R package jomo, can be used with confidence by researchers, both for single and multilevel multiple imputation.

9.
When observing data on a patient-reported outcome measure in, for example, clinical trials, the variables observed are often correlated and intended to measure a latent variable. In addition, such data are also often characterized by a hierarchical structure, meaning that the outcome is repeatedly measured within patients. To analyze such data, it is important to use an appropriate statistical model, such as structural equation modeling (SEM). However, researchers may rely on simpler statistical models that are applied to an aggregated data structure. For example, correlated variables are combined into one sum score that approximates a latent variable. This may have implications when, for example, the sum score consists of indicators that relate differently to the latent variable being measured. This study compares three models that can be applied to analyze such data: the multilevel multiple indicators multiple causes (ML-MIMIC) model, a univariate multilevel model, and a mixed analysis of variance (ANOVA) model. The focus is on the estimation of a cross-level interaction effect that presents the difference over time on the patient-reported outcome between two treatment groups. The ML-MIMIC model is an SEM-type model that considers the relationship between the indicators and the latent variable in a multilevel setting, whereas the univariate multilevel and mixed ANOVA model rely on sum scores to approximate the latent variable. In addition, the mixed ANOVA model uses aggregated second-level means as outcome. This study showed that the ML-MIMIC model produced unbiased cross-level interaction effect estimates when the relationships between the indicators and the latent variable being measured varied across indicators. In contrast, under similar conditions, the univariate multilevel and mixed ANOVA model underestimated the cross-level interaction effect.

10.
Dynamic Model for Multivariate Markers of Fecundability   (cited 1 time in total: 0 self-citations, 1 citation by others)
Summary: Dynamic latent class models provide a flexible framework for studying biologic processes that evolve over time. Motivated by studies of markers of the fertile days of the menstrual cycle, we propose a discrete‐time dynamic latent class framework, allowing change points to depend on time, fixed predictors, and random effects. Observed data consist of multivariate categorical indicators, which change dynamically in a flexible manner according to latent class status. Given the flexibility of the framework, which incorporates semi‐parametric components using mixtures of betas, identifiability constraints are needed to define the latent classes. Such constraints are most appropriately based on the known biology of the process. The Bayesian method is developed particularly for analyzing mucus symptom data from a study of women using natural family planning.

11.
Lee SY, Song XY. Biometrics 2004;60(3):624-636
A general two-level latent variable model is developed to provide a comprehensive framework for model comparison of various submodels. Nonlinear relationships among the latent variables in the structural equations at both levels, as well as the effects of fixed covariates in the measurement and structural equations at both levels, can be analyzed within the framework. Moreover, the methodology can be applied to hierarchically mixed continuous, dichotomous, and polytomous data. A Monte Carlo EM algorithm is implemented to produce the maximum likelihood estimate. The E-step is completed by approximating the conditional expectations through observations that are simulated by Markov chain Monte Carlo methods, while the M-step is completed by conditional maximization. A procedure is proposed for computing the complicated observed-data log likelihood and the BIC for model comparison. The methods are illustrated by using a real data set.

12.
This paper considers the regression analysis of categorical variables when the response variable is incompletely observed and the non‐response mechanism is assumed to be non‐ignorable. Maximum likelihood estimation of the model parameters can lead to substantively implausible boundary solutions where the estimated proportion of non‐respondents taking certain values of the response variable is zero. A geometric explanation of why boundary solutions occur was given in a previous paper for a simple model. By extending this explanation, it is possible to define the sub‐class of non‐ignorable models whose parameters are identified, and to show all models not in this sub‐class are non‐identified. The conditions under which a model is a member of this class are easily established.

13.

Interval-censored failure times arise when the status with respect to an event of interest is only determined at intermittent examination times. In settings where there exists a sub-population of individuals who are not susceptible to the event of interest, latent variable models accommodating a mixture of susceptible and nonsusceptible individuals are useful. We consider such models for the analysis of bivariate interval-censored failure time data with a model for bivariate binary susceptibility indicators and a copula model for correlated failure times given joint susceptibility. We develop likelihood, composite likelihood, and estimating function methods for model fitting and inference, and assess asymptotic-relative efficiency and finite sample performance. Extensions dealing with higher-dimensional responses and current status data are also described.


14.
We used latent class analysis (LCA) to identify heterogeneous subgroups with respect to behavioral obesity risk factors in a sample of 4th grade children (n = 997) residing in Southern California. Multiple dimensions assessing physical activity, eating and sedentary behavior, and weight perceptions were explored. A set of 11 latent class indicators were used in the analysis. The final model yielded a five-class solution: "High-sedentary, high-fat/high-sugar (HF/HS) snacks, not weight conscious," "dieting without exercise, weight conscious," "high-sedentary, HF/HS snacks, weight conscious," "active, healthy eating," and "low healthy, snack food, inactive, not weight conscious." The results suggested distinct subtypes of children with respect to obesity-related risk behaviors. Ethnicity, gender, and a socioeconomic status proxy variable significantly predicted the above latent classes. Overweight or obese weight status was determined based on the Centers for Disease Control and Prevention BMI (kg/m²)-for-age-and-sex percentile (overweight, 85th percentile ≤ BMI < 95th percentile; obese, 95th percentile ≤ BMI). The identified latent subgroup membership, in turn, was associated with the children's weight categories. The results suggest that intervention programs could be refined or targeted based on children's characteristics to promote effective pediatric obesity interventions.
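
The weight-status cut-offs quoted in this abstract can be written as a small classifier. This is a minimal sketch that assumes the BMI-for-age-and-sex percentile has already been looked up from the CDC reference; the function name and the label `"not overweight"` for children below the 85th percentile are illustrative assumptions, not terms from the study.

```python
def weight_category(bmi_percentile: float) -> str:
    """Map a BMI-for-age-and-sex percentile (0 to 100) to a weight category.

    Cut-offs follow the abstract: overweight is 85th <= percentile < 95th,
    obese is percentile >= 95th. The third label is a hypothetical placeholder.
    """
    if not 0 <= bmi_percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if bmi_percentile >= 95:
        return "obese"
    if bmi_percentile >= 85:
        return "overweight"
    return "not overweight"
```

Both boundaries are inclusive on the lower side, so a child exactly at the 85th percentile counts as overweight and one exactly at the 95th as obese.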

15.
Learning causality from data is known as the causal discovery problem, an important and relatively new field. In many applications latent variables exist, and ignoring them completely can seriously bias the estimation results. In this paper, a method combining exploratory factor analysis and path analysis (EFA-PA) is proposed to infer causality in the presence of latent variables. Our method expands latent variables as well as their linear causal relationships with observed variables, which enhances the accuracy of causal models. Such a model can be thought of as the simplest possible causal model for continuous data. EFA-PA is very similar to the structural equation model, but the theoretical model established by the structural equation model needs to be modified in the process of data fitting until the ideal model is established. The model obtained by EFA-PA not only avoids subjectivity but also reduces estimation complexity. It is found that the EFA-PA estimation model is superior to the other models. EFA-PA can provide a basis for the correct estimation of the causal relationship between the observed variables in the presence of latent variables. The experiment shows that EFA-PA is better than the structural equation model.

16.
Many different methods for evaluating diagnostic test results in the absence of a gold standard have been proposed. In this paper, we discuss how one common method, a maximum likelihood estimate for a latent class model found via the Expectation-Maximization (EM) algorithm can be applied to longitudinal data where test sensitivity changes over time. We also propose two simplified and nonparametric methods which use data-based indicator variables for disease status and compare their accuracy to the maximum likelihood estimation (MLE) results. We find that with high specificity tests, the performance of simpler approximations may be just as high as the MLE.
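
For orientation, the following is a minimal sketch of EM for the textbook two-class latent class model with binary tests. It assumes conditional independence of tests given disease status and sensitivities that are constant over time, so it is a simplification of the longitudinal setting the abstract describes, not the authors' method. The function name, starting values, and toy data are all hypothetical.

```python
def em_latent_class(data, n_iter=200, prev=0.5, sens=0.8, spec=0.8):
    """Fit a two-class latent class model to binary test results by EM.

    data: list of equal-length tuples of 0/1 test results, one tuple per subject.
    Returns (prevalence, sensitivities, specificities).
    """
    n_subjects = len(data)
    n_tests = len(data[0])
    se = [sens] * n_tests
    sp = [spec] * n_tests
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is diseased.
        post = []
        for row in data:
            p_dis, p_non = prev, 1.0 - prev
            for j, y in enumerate(row):
                p_dis *= se[j] if y else 1.0 - se[j]
                p_non *= 1.0 - sp[j] if y else sp[j]
            post.append(p_dis / (p_dis + p_non))
        # M-step: weighted proportions, using the posteriors as weights.
        total = sum(post)
        prev = total / n_subjects
        for j in range(n_tests):
            se[j] = sum(w * row[j] for w, row in zip(post, data)) / total
            sp[j] = sum((1.0 - w) * (1 - row[j])
                        for w, row in zip(post, data)) / (n_subjects - total)
    return prev, se, sp


# Toy data: three tests that always agree, positive in 30 of 100 subjects.
toy = [(1, 1, 1)] * 30 + [(0, 0, 0)] * 70
prevalence, sensitivities, specificities = em_latent_class(toy)
```

The sketch has no guards against degenerate fits (for example, a posterior mass of zero in one class), which a real analysis would need.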

17.
Dunson DB, Perreault SD. Biometrics 2001;57(1):302-308
This article describes a general class of factor analytic models for the analysis of clustered multivariate data in the presence of informative missingness. We assume that there are distinct sets of cluster-level latent variables related to the primary outcomes and to the censoring process, and we account for dependency between these latent variables through a hierarchical model. A linear model is used to relate covariates and latent variables to the primary outcomes for each subunit. A generalized linear model accounts for covariate and latent variable effects on the probability of censoring for subunits within each cluster. The model accounts for correlation within clusters and within subunits through a flexible factor analytic framework that allows multiple latent variables and covariate effects on the latent variables. The structure of the model facilitates implementation of Markov chain Monte Carlo methods for posterior estimation. Data from a spermatotoxicity study are analyzed to illustrate the proposed approach.

18.
A method for analysing dependent agreement data with categorical responses is proposed. A generalized estimating equation approach is developed with two sets of equations. The first set models the marginal distribution of categorical ratings, and the second set models the pairwise association of ratings with the kappa coefficient (kappa) as a metric. Covariates can be incorporated into both sets of equations. This approach is compared with a latent variable model that assumes an underlying multivariate normal distribution in which the intraclass correlation coefficient is used as a measure of association. Examples are from a cervical ectopy study and the National Heart, Lung, and Blood Institute Veteran Twin Study.
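
As background for the kappa metric named in this abstract, the sketch below computes the plain sample Cohen's kappa for two raters. It is not the generalized estimating equation approach the abstract proposes, and it handles neither covariates nor dependence between pairs; the function name and example ratings are made up for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Sample Cohen's kappa for two equal-length sequences of categorical ratings."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    # Undefined (division by zero) when both raters use a single identical category.
    return (observed - expected) / (1.0 - expected)


kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In the example the raters agree on 3 of 4 subjects and chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.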

19.
Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates through a linear regression model. This article proposes a flexible Bayesian alternative in which the unknown latent variable density can change dynamically in location and shape across levels of a predictor. Scale mixtures of underlying normals are used in order to model flexibly the measurement errors and allow mixed categorical and continuous scales. A dynamic mixture of Dirichlet processes is used to characterize the latent response distributions. Posterior computation proceeds via a Markov chain Monte Carlo algorithm, with predictive densities used as a basis for inferences and evaluation of model fit. The methods are illustrated using data from a study of DNA damage in response to oxidative stress.

20.
Albert PS, Dodd LE. Biometrics 2004;60(2):427-435
Modeling diagnostic error without a gold standard has been an active area of biostatistical research. In a majority of the approaches, model-based estimates of sensitivity, specificity, and prevalence are derived from a latent class model in which the latent variable represents an individual's true unobserved disease status. For simplicity, initial approaches assumed that the diagnostic test results on the same subject were independent given the true disease status (i.e., the conditional independence assumption). More recently, various authors have proposed approaches for modeling the dependence structure between test results given true disease status. This note discusses a potential problem with these approaches. Namely, we show that when the conditional dependence between tests is misspecified, estimators of sensitivity, specificity, and prevalence can be biased. Importantly, we demonstrate that with small numbers of tests, likelihood comparisons and other model diagnostics may not be able to distinguish between models with different dependence structures. We present asymptotic results that show the generality of the problem. Further, data analysis and simulations demonstrate the practical implications of model misspecification. Finally, we present some guidelines about the use of these models for practitioners.
