Found 20 similar articles (search time: 15 ms)
1.
In many areas of medical research, such as psychiatry and gerontology, latent class variables are used to classify individuals into disease categories, often with the intention of hierarchical modeling. Problems arise when it is not clear how many disease classes are appropriate, creating a need for model selection and diagnostic techniques. Previous work has shown that the Pearson χ² statistic and the log-likelihood ratio G² statistic are not valid test statistics for evaluating latent class models. Other methods, such as information criteria, provide decision rules without providing explicit information about where discrepancies occur between a model and the data. Identifiability issues further complicate these problems. This paper develops procedures for assessing Markov chain Monte Carlo convergence and model diagnosis and for selecting the number of categories for the latent variable based on evidence in the data using Markov chain Monte Carlo techniques. Simulations and a psychiatric example are presented to demonstrate the effective use of these methods.
2.
Summary: Dynamic latent class models provide a flexible framework for studying biologic processes that evolve over time. Motivated by studies of markers of the fertile days of the menstrual cycle, we propose a discrete-time dynamic latent class framework, allowing change points to depend on time, fixed predictors, and random effects. Observed data consist of multivariate categorical indicators, which change dynamically in a flexible manner according to latent class status. Given the flexibility of the framework, which incorporates semi-parametric components using mixtures of betas, identifiability constraints are needed to define the latent classes. Such constraints are most appropriately based on the known biology of the process. The Bayesian method is developed particularly for analyzing mucus symptom data from a study of women using natural family planning.
3.
Liang Li, Huzhang Mao, Hemant Ishwaran, Jeevanantham Rajeswaran, John Ehrlinger, Eugene H. Blackstone 《Biometrical journal. Biometrische Zeitschrift》 2017, 59(2): 331-343
Atrial fibrillation (AF) is an abnormal heart rhythm characterized by rapid and irregular heartbeat, with or without perceivable symptoms. In clinical practice, the electrocardiogram (ECG) is often used for diagnosis of AF. Since AF often arrives as recurrent episodes of varying frequency and duration and only the episodes that occur at the time of ECG can be detected, AF is often underdiagnosed when a limited number of repeated ECGs are used. In studies evaluating the efficacy of AF ablation surgery, each patient undergoes multiple ECGs and the AF status at the time of ECG is recorded. The objective of this paper is to estimate the marginal proportions of patients with or without AF in a population, which are important measures of the efficacy of the treatment. The underdiagnosis problem is addressed by a three-class mixture regression model in which a patient's probability of having no AF, paroxysmal AF, and permanent AF is modeled by auxiliary baseline covariates in a nested logistic regression. A binomial regression model is specified conditional on a subject being in the paroxysmal AF group. The model parameters are estimated by the Expectation-Maximization (EM) algorithm. These parameters are themselves nuisance parameters for the purpose of this research, but the estimators of the marginal proportions of interest can be expressed as functions of the data and these nuisance parameters, and their variances can be estimated by the sandwich method. We examine the performance of the proposed methodology in simulations and two real data applications.
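As an illustration of the three-class mixture idea in this abstract, the following is a minimal sketch of an EM fit, assuming a much simpler setting than the paper's: no baseline covariates, a single detection probability `q` shared by all paroxysmal patients, and every patient having at least one ECG. The function name `em_three_class` and the toy data are hypothetical and are not taken from the article.

```python
def em_three_class(data, n_iter=200):
    """EM for a covariate-free three-class mixture (illustrative assumption).

    data: list of (y, n) pairs, y = AF-positive ECGs out of n ECGs for a patient.
    Classes: 0 = no AF (y is always 0), 1 = paroxysmal AF (y ~ Binomial(n, q)),
             2 = permanent AF (y is always n).
    Assumes n >= 1 and at least one patient with 0 < y < n.
    """
    pi = [1 / 3, 1 / 3, 1 / 3]  # class proportions
    q = 0.5                     # per-ECG detection probability in class 1
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each patient.
        # The binomial coefficient is omitted: it equals 1 when y is 0 or n,
        # and otherwise only class 1 has nonzero likelihood.
        resp = []
        for y, n in data:
            lik = [
                pi[0] * (1.0 if y == 0 else 0.0),
                pi[1] * q ** y * (1 - q) ** (n - y),
                pi[2] * (1.0 if y == n else 0.0),
            ]
            total = sum(lik)
            resp.append([value / total for value in lik])
        # M-step: update class proportions and the detection probability.
        pi = [sum(r[k] for r in resp) / len(data) for k in range(3)]
        weighted_pos = sum(r[1] * y for r, (y, n) in zip(resp, data))
        weighted_ecgs = sum(r[1] * n for r, (y, n) in zip(resp, data))
        q = weighted_pos / weighted_ecgs
    return pi, q


# Toy data (made up for illustration): 100 patients with 5 ECGs each.
toy = ([(0, 5)] * 40 + [(5, 5)] * 20 + [(1, 5)] * 10
       + [(2, 5)] * 15 + [(3, 5)] * 10 + [(4, 5)] * 5)
pi_hat, q_hat = em_three_class(toy)
print("class proportions:", [round(p, 3) for p in pi_hat])
print("detection probability:", round(q_hat, 3))
```

In this sketch, patients with all-negative ECGs are split between the no-AF and paroxysmal classes, which is how the mixture accounts for underdiagnosis; the article's covariate-dependent class probabilities and sandwich variance estimates are not reproduced here.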
4.
Stacia M. DeSantis, E. Andrés Houseman, Brent A. Coull, David N. Louis, Gayatry Mohapatra, Rebecca A. Betensky 《Biometrics》 2009, 65(4): 1296-1305
Summary: Array CGH is a high-throughput technique designed to detect genomic alterations linked to the development and progression of cancer. The technique yields fluorescence ratios that characterize DNA copy number change in tumor versus healthy cells. Classification of tumors based on aCGH profiles is of scientific interest but the analysis of these data is complicated by the large number of highly correlated measures. In this article, we develop a supervised Bayesian latent class approach for classification that relies on a hidden Markov model to account for the dependence in the intensity ratios. Supervision means that classification is guided by a clinical endpoint. Posterior inferences are made about class-specific copy number gains and losses. We demonstrate our technique on a study of brain tumors, for which our approach is capable of identifying subsets of tumors with different genomic profiles, and differentiates classes by survival much better than unsupervised methods.
5.
6.
7.
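Analysis of high-dimensional SNPs based on a Bayesian network latent class model   Total citations: 1, self-citations: 0, citations by others: 1
A Bayesian network latent class model was used to analyze the high-dimensional SNP data from GAW17, providing new methodological support for research on the genetics of complex-trait diseases and on gene mapping. From the more than ten thousand SNPs on the 22 autosomes of 697 individuals provided by GAW17, 29 SNPs in 12 genes on chromosome 1 were randomly selected for study. Following the criterion that the cumulative information contribution rate reach 95%, the Bayesian network latent variable model was applied to select 15 SNP loci with high mutual information with X0, including C1S11408, C1S3201, and C1S1786, to classify and interpret the study population. The 697 individuals were divided into two latent classes, with class probabilities of 0.68 and 0.32. Analysis of the disease distribution showed that the two classes differed: the prevalence in the second class (38.64%) was significantly higher than in the first class (25.99%) (χ² = 11.46, P = 0.001). The authors therefore attribute the difference in disease prevalence between the two classes to the 15 selected SNPs, regard these SNPs as suspected disease-associated loci, and suggest that this provides a clear direction for further research.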
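The comparison of prevalence between the two latent classes can be sketched as a Pearson chi-square test on a 2×2 table. The cell counts below are an assumption: the abstract reports only the total of 697, class probabilities, and percentages, so the counts (124 of 477 and 85 of 220) were back-calculated to match those figures approximately and are not given in the source.

```python
import math

# Assumed counts (back-calculated, not stated in the abstract):
# class 1: 124 affected of 477; class 2: 85 affected of 220; 477 + 220 = 697.
table = [[124, 477 - 124],
         [85, 220 - 85]]


def pearson_chi2_2x2(t):
    """Pearson chi-square statistic for a 2x2 table, without continuity correction."""
    (a, b), (c, d) = t
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


chi2 = pearson_chi2_2x2(table)
# Upper tail of a chi-square distribution with 1 degree of freedom:
# P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Under these assumed counts the statistic agrees with the reported χ² = 11.46 to two decimal places, which suggests, but does not confirm, that the reported test was an uncorrected Pearson test on a table of this form.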
8.
Shirley Pledger 《Biometrics》 2005, 61(3): 868-873
Summary: Dorazio and Royle (2003, Biometrics 59, 351–364) investigated the behavior of three mixture models for closed population capture–recapture analysis in the presence of individual heterogeneity of capture probability. Their simulations were from the beta-binomial distribution, with analyses from the beta-binomial, the logit-normal, and the finite mixture (latent class) models. In this response, simulations from many different distributions give a broader picture of the relative value of the beta-binomial and the finite mixture models, and provide some preliminary insights into the situations in which these models are useful.
9.
The probability of conception in a given menstrual cycle is closely related to the timing of intercourse relative to ovulation. Although commonly used markers of time of ovulation are known to be error prone, most fertility models assume the day of ovulation is measured without error. We develop a mixture model that allows the day to be misspecified. We assume that the measurement errors are i.i.d. across menstrual cycles. Heterogeneity among couples in the per cycle likelihood of conception is accounted for using a beta mixture model. Bayesian estimation is straightforward using Markov chain Monte Carlo techniques. The methods are applied to a prospective study of couples at risk of pregnancy. In the absence of validation data or multiple independent markers of ovulation, the identifiability of the measurement error distribution depends on the assumed model. Thus, the results of studies relating the timing of intercourse to the probability of conception should be interpreted cautiously.
10.
Coupled with environmental factors, genes contribute to numerous human diseases and traits. While there are many epidemiological methods to assess the familial clustering of traits, few are flexible enough to accommodate interactions between covariates and familial factors. In this paper, we propose and develop a frailty model that establishes an integrated framework to evaluate familial transmission of a disease by controlling for covariate effects and conveniently testing the interactions between covariates and familial factors. We also present a peeling algorithm that dramatically reduces the computational burden. This frailty model is employed to examine the familial transmission of major subtypes of alcoholism, namely, alcohol abuse and dependence. We conclude that alcohol dependence is strongly familial whereas alcohol abuse expresses a marginally significant pattern of familial transmission. Moreover, females manifest alcoholism at a lower threshold, and there is no sex-specific familial transmission of alcoholism after adjustment for the threshold effect.
11.
With the prevalence of gene expression studies and the relatively low reproducibility caused by insufficient sample sizes, it is natural to consider joint analysis that could combine data from different experiments effectively to achieve improved accuracy. We present in this article a model-based approach for better identification of differentially expressed genes by incorporating data from different studies. The model can seamlessly accommodate a wide range of studies, including those performed on different platforms (by fitting each data set with a different set of parameters) and/or under different but overlapping biological conditions. Model-based inferences can be done in an empirical Bayes fashion. Because of the information sharing among studies, the joint analysis dramatically improves on inferences based on individual analyses. Simulation studies and real data examples are presented to demonstrate the effectiveness of the proposed approach under a variety of complications that often arise in practice.
12.
On a logistic mixture autoregressive model   Total citations: 6, self-citations: 0, citations by others: 6
13.
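A structural equation mixture model (SEMM) was used to analyze real SNP data, providing a new and effective analytical method for statistical genetics. The data were provided by GAW17 and comprise more than ten thousand SNPs on the 22 autosomes of 697 individuals, together with trait characteristics of those 697 individuals simulated from these SNPs. Four SNPs on chromosome 1 and three quantitative traits were randomly selected as study variables, and latent class analysis and structural equation mixture modeling were carried out in turn. Based on the four SNPs, the population was divided into three latent classes, with probabilities 0.53, 0.34, and 0.13. The factor means Q in latent classes 1, 2, and 3 were -4.029, -2.052, and 0, respectively; the factor means of latent classes 1 and 2 were both lower than that of class 3 (P < 0.001). The study indicates that the SEMM combines the ideas of the structural equation model and the latent class model, has advantages of its own, and can be used to handle data that contain both categorical and continuous latent variables.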
14.
15.
16.
Summary: A variety of flexible approaches have been proposed for functional data analysis, allowing both the mean curve and the distribution about the mean to be unknown. Such methods are most useful when there is limited prior information. Motivated by applications to modeling of temperature curves in the menstrual cycle, this article proposes a flexible approach for incorporating prior information in semiparametric Bayesian analyses of hierarchical functional data. The proposed approach is based on specifying the distribution of functions as a mixture of a parametric hierarchical model and a nonparametric contamination. The parametric component is chosen based on prior knowledge, while the contamination is characterized as a functional Dirichlet process. In the motivating application, the contamination component allows unanticipated curve shapes in unhealthy menstrual cycles. Methods are developed for posterior computation, and the approach is applied to data from a European fecundability study.
17.
Albert PS 《Biometrics》 2007, 63(2): 593-602
Estimating diagnostic accuracy without a gold standard is an important problem in medical testing. Although there is a fairly large literature on this problem for the case of repeated binary tests, there is substantially less work for the case of ordinal tests. A noted exception is the work by Zhou, Castelluccio, and Zhou (2005, Biometrics 61, 600-609), which proposed a methodology for estimating receiver operating characteristic (ROC) curves without a gold standard from multiple ordinal tests. A key assumption in their work was that the test results are independent conditional on the true test result. I propose random effects modeling approaches that incorporate dependence between the ordinal tests, and I show through asymptotic results and simulations the importance of correctly accounting for the dependence between tests. These modeling approaches, along with the importance of accounting for the dependence between tests, are illustrated by analyzing the uterine cancer pathology data analyzed by Zhou et al. (2005).
18.
Geoffrey Jones, Wesley O. Johnson, Timothy E. Hanson, Ronald Christensen 《Biometrics》 2010, 66(3): 855-863
Summary: We discuss the issue of identifiability of models for multiple dichotomous diagnostic tests in the absence of a gold standard (GS) test. Data arise as multinomial or product-multinomial counts depending upon the number of populations sampled. Models are generally posited in terms of population prevalences, test sensitivities and specificities, and test dependence terms. It is commonly believed that if the degrees of freedom in the data meet or exceed the number of parameters in a fitted model then the model is identifiable. Goodman (1974, Biometrika 61, 215–231) established long ago that this is not the case. We discuss currently available models for multiple tests and argue in favor of an extension of a model that was developed by Dendukuri and Joseph (2001, Biometrics 57, 158–167). Subsequently, we further develop Goodman's technique, and make geometric arguments to give further insight into the nature of models that lack identifiability. We present illustrations using simulated and real data.
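The counting rule mentioned in this abstract can be sketched as follows, assuming the simplest parametrization: a conditional-independence model with one prevalence per population and one sensitivity and one specificity per test, with no dependence terms. The function names are hypothetical. As the abstract stresses, passing this count does not establish identifiability.

```python
def data_degrees_of_freedom(n_tests, n_populations=1):
    """Free cell probabilities: each population has 2**n_tests outcome cells summing to 1."""
    return n_populations * (2 ** n_tests - 1)


def ci_model_parameters(n_tests, n_populations=1):
    """Conditional-independence model (assumed here): one prevalence per population,
    plus a sensitivity and a specificity for each test."""
    return n_populations + 2 * n_tests


counts = {}
for n_tests, n_populations in [(2, 1), (2, 2), (3, 1), (4, 1)]:
    df = data_degrees_of_freedom(n_tests, n_populations)
    n_par = ci_model_parameters(n_tests, n_populations)
    counts[(n_tests, n_populations)] = (df, n_par)
    verdict = "passes the count" if df >= n_par else "fails the count"
    print(f"{n_tests} tests, {n_populations} population(s): "
          f"df = {df}, parameters = {n_par} -> {verdict}")
```

Two tests in one population fail even this necessary check (3 degrees of freedom for 5 parameters), whereas two tests in two populations and three tests in one population meet it exactly; whether a model that passes is actually identifiable is the question the article addresses.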
19.
Summary: We consider a Markov structure for partially unobserved time-varying compliance classes in the Imbens–Rubin (1997, The Annals of Statistics 25, 305–327) compliance model framework. The context is a longitudinal randomized intervention study where subjects are randomized once at baseline, outcomes and patient adherence are measured at multiple follow-ups, and patient adherence to their randomized treatment could vary over time. We propose a nested latent compliance class model where we use time-invariant subject-specific compliance principal strata to summarize longitudinal trends of subject-specific time-varying compliance patterns. The principal strata are formed using Markov models that relate current compliance behavior to compliance history. Treatment effects are estimated as intent-to-treat effects within the compliance principal strata.
20.
On the geometry of measurement error models   Total citations: 2, self-citations: 0, citations by others: 2