Similar articles (20 found)
1.
A frequently encountered problem in longitudinal studies is data that are missing due to missed visits or dropouts. In the statistical literature, interest has primarily focused on monotone missing data (dropout), with much less work on intermittent missing data, in which a subject may return after one or more missed visits. Intermittent missing data have broader applicability that can include the frequent situation in which subjects do not have common sets of visit times or they visit at nonprescheduled times. In this article, we propose a latent pattern mixture model (LPMM), where the mixture patterns are formed from latent classes that link the longitudinal response and the missingness process. This allows us to handle arbitrary patterns of missing data embodied by subjects' visit process, and avoids the need to specify the mixture patterns a priori. Our model assumes that the missingness process is conditionally independent of the longitudinal outcomes given the latent classes. We propose a noniterative approach to assess this key assumption. The LPMM is illustrated with a data set from a health service research study in which homeless people with mental illness were randomized to three different service packages and measures of homelessness were recorded at multiple time points. Our model suggests the presence of four latent classes linking subject visit patterns to homeless outcomes.

2.
Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.

3.
Encouragement design studies are particularly useful for estimating the effect of an intervention that cannot itself be randomly administered to some and not to others. They require that a randomly selected group receive extra encouragement to undertake the treatment of interest, where the encouragement typically takes the form of additional information or incentives. We consider a "clustered encouragement design" (CED), where the randomization is at the level of the clusters (e.g. physicians), but the compliance with assignment is at the level of the units (e.g. patients) within clusters. Noncompliance and missing data are particular problems in encouragement design studies, where encouragement to take the treatment, rather than the treatment itself, is randomized. The motivating study looks at whether computer-based care suggestions can improve patient outcomes in veterans with chronic heart failure. Since physician adherence has been inadequate, the original study focused on methods to improve physician adherence, although an equally important question is whether physician adherence improves patient outcomes. Here, we reanalyze the data to determine the effect of physician adherence on patient outcomes. We propose causal inference methodology for the effect of a treatment versus a control in a randomized CED study with all-or-none compliance at the unit level. These methods extend the current approaches to account for nonignorable missing data and base inference on multiple imputation methods, which have been successfully applied to a wide variety of missing data problems and have recently been applied to the potential outcomes framework of causal inference (Taylor and Zhou, 2009b).

4.
In this article, we develop a latent class model with class probabilities that depend on subject-specific covariates. One of our major goals is to identify important predictors of latent classes. We consider methodology that allows estimation of latent classes while allowing for variable selection uncertainty. We propose a Bayesian variable selection approach and implement a stochastic search Gibbs sampler for posterior computation to obtain model-averaged estimates of quantities of interest such as marginal inclusion probabilities of predictors. Our methods are illustrated through simulation studies and application to data on weight gain during pregnancy, where it is of interest to identify important predictors of latent weight gain classes.

5.
Ma Y, Tang W, Feng C, Tu XM. Biometrics, 2008, 64(3): 781-789
Summary. Analysis of instrument reliability and rater agreement is used in a wide range of behavioral, medical, psychosocial, and health-care-related research to assess psychometric properties of instruments, consensus in disease diagnoses, fidelity of psychosocial intervention, and accuracy of proxy outcomes. For categorical outcomes, Cohen's kappa is the most widely used index of agreement and reliability. In many modern-day applications, data are often clustered, making inference difficult to perform using existing methods. In addition, as longitudinal study designs become increasingly popular, missing data have become a serious issue, and the lack of methods to systematically address this problem has hampered the progress of research in the aforementioned fields. In this article, we develop a novel approach based on a new class of kappa estimates to tackle the complexities involved in addressing missing data and other related issues arising from a general multirater and longitudinal data setting. The approach is illustrated with real data in sexual health research.

6.
Neocortical neurons show UP-DOWN state (UDS) oscillations under a variety of conditions. These UDS have been extensively studied because of the insight they can yield into the functioning of cortical networks, and their proposed role in putative memory formation. A key element in these studies is determining the precise duration and timing of the UDS. These states are typically determined from the membrane potential of one or a small number of cells, which is often not sufficient to reliably estimate the state of an ensemble of neocortical neurons. The local field potential (LFP) provides an attractive method for determining the state of a patch of cortex with high spatio-temporal resolution; however, current methods for inferring UDS from LFP signals lack the robustness and flexibility to be applicable when UDS properties may vary substantially within and across experiments. Here we present an explicit-duration hidden Markov model (EDHMM) framework that is sufficiently general to allow statistically principled inference of UDS from different types of signals (membrane potential, LFP, EEG), combinations of signals (e.g., multichannel LFP recordings) and signal features over long recordings where substantial non-stationarities are present. Using cortical LFPs recorded from urethane-anesthetized mice, we demonstrate that the proposed method allows robust inference of UDS. To illustrate the flexibility of the algorithm, we show that it also performs well on EEG recordings. We then validate these results using simultaneous recordings of the LFP and membrane potential (MP) of nearby cortical neurons, showing that our method offers significant improvements over standard methods. These results could be useful for determining functional connectivity of different brain regions, as well as understanding network dynamics.

7.
Large amounts of longitudinal health records are now available for dynamic monitoring of the underlying processes governing the observations. However, the health status progression across time is not typically observed directly: records are observed only when a subject interacts with the system, yielding irregular and often sparse observations. This suggests that the observed trajectories should be modeled via a latent continuous‐time process potentially as a function of time‐varying covariates. We develop a continuous‐time hidden Markov model to analyze longitudinal data accounting for irregular visits and different types of observations. By employing a specific missing data likelihood formulation, we can construct an efficient computational algorithm. We focus on Bayesian inference for the model: this is facilitated by an expectation‐maximization algorithm and Markov chain Monte Carlo methods. Simulation studies demonstrate that these approaches can be implemented efficiently for large data sets in a fully Bayesian setting. We apply this model to a real cohort where patients suffer from chronic obstructive pulmonary disease with the outcome being the number of drugs taken, using health care utilization indicators and patient characteristics as covariates.

8.
Bayesian inference has emerged as a general framework that captures how organisms make decisions under uncertainty. Recent experimental findings reveal disparate mechanisms for how the brain generates behaviors predicted by normative Bayesian theories. Here, we identify two broad classes of neural implementations for Bayesian inference: a modular class, where each probabilistic component of Bayesian computation is independently encoded, and a transform class, where uncertain measurements are converted to Bayesian estimates through latent processes. Many recent experimental neuroscience findings studying probabilistic inference broadly fall into these classes. We identify potential avenues for synthesis across these two classes and the disparities that, at present, cannot be reconciled. We conclude that to distinguish among implementation hypotheses for Bayesian inference, we require greater engagement among theoretical and experimental neuroscientists in an effort that spans different scales of analysis, circuits, tasks, and species.

9.
Chen H, Geng Z, Zhou XH. Biometrics, 2009, 65(3): 675-682
Summary. In this article, we first study parameter identifiability in randomized clinical trials with noncompliance and missing outcomes. We show that under certain conditions the parameters of interest are identifiable even under different types of completely nonignorable missing data: that is, the missing mechanism depends on the outcome. We then derive their maximum likelihood and moment estimators and evaluate their finite-sample properties in simulation studies in terms of bias, efficiency, and robustness. Our sensitivity analysis shows that the assumed nonignorable missing-data model has an important impact on the estimated complier average causal effect (CACE) parameter. Our new method provides some new and useful alternative nonignorable missing-data models over the existing latent ignorable model, which guarantees parameter identifiability, for estimating the CACE in a randomized clinical trial with noncompliance and missing data.

10.
Why environmental scientists are becoming Bayesians (cited 11 times: 0 self-citations, 11 by others)
Advances in computational statistics provide a general framework for the high‐dimensional models typically needed for ecological inference and prediction. Hierarchical Bayes (HB) represents a modelling structure with capacity to exploit diverse sources of information, to accommodate influences that are unknown (or unknowable), and to draw inference on large numbers of latent variables and parameters that describe complex relationships. Here I summarize the structure of HB and provide examples for common spatiotemporal problems. The flexible framework means that parameters, variables and latent variables can represent broader classes of model elements than are treated in traditional models. Inference and prediction depend on two types of stochasticity: (1) uncertainty, which describes our knowledge of fixed quantities, applies to all ‘unobservables’ (latent variables and parameters), and declines asymptotically with sample size; and (2) variability, which applies to fluctuations that are not explained by deterministic processes and does not decline asymptotically with sample size. Examples demonstrate how different sources of stochasticity impact inference and prediction and how allowance for stochastic influences can guide research.

11.
Mapping quantitative trait loci with censored observations (cited 2 times: 0 self-citations, 2 by others)
Diao G, Lin DY, Zou F. Genetics, 2004, 168(3): 1689-1698
The existing statistical methods for mapping quantitative trait loci (QTL) assume that the phenotype follows a normal distribution and is fully observed. These assumptions may not be satisfied when the phenotype pertains to the survival time or failure time, which has a skewed distribution and is usually subject to censoring due to random loss of follow-up or limited duration of the experiment. In this article, we propose an interval-mapping approach for censored failure time phenotypes. We formulate the effects of QTL on the failure time through parametric proportional hazards models and develop efficient likelihood-based inference procedures. In addition, we show how to assess genome-wide statistical significance. The performance of the proposed methods is evaluated through extensive simulation studies. An application to a mouse cross is provided.

12.
In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset (the observed data together with the imputed unobserved data) using standard procedures for complete-data inference. Here, we extend this approach to model checking by demonstrating the advantages of the use of completed-data model diagnostics on imputed completed datasets. The approach is set in the theoretical framework of Bayesian posterior predictive checks (but, as with missing-data imputation, our methods of missing-data model checking can also be interpreted as "predictive inference" in a non-Bayesian context). We consider the graphical diagnostics within this framework. Advantages of the completed-data approach include: (1) One can often check model fit in terms of quantities that are of key substantive interest in a natural way, which is not always possible using observed data alone. (2) In problems with missing data, checks may be devised that do not require modeling the missingness or inclusion mechanism; the latter is useful for the analysis of ignorable but unknown data collection mechanisms, such as are often assumed in the analysis of sample surveys and observational studies. (3) In many problems with latent data, it is possible to check qualitative features of the model (for example, independence of two variables) that can be naturally formalized with the help of the latent data. We illustrate with several applied examples.

13.
Browning SR. Human Genetics, 2008, 124(5): 439-450
Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance.

14.
Summary. Fetal growth restriction is a leading cause of perinatal morbidity and mortality that could be reduced if high‐risk infants are identified early in pregnancy. We propose a Bayesian model for aggregating 18 longitudinal ultrasound measurements of fetal size and blood flow into three underlying, continuous latent factors. Our procedure is more flexible than typical latent variable methods in that we relax the normality assumptions by allowing the latent factors to follow finite mixture distributions. Using mixture distributions also permits us to cluster individuals with similar observed characteristics and identify latent classes of subjects who are more likely to be growth or blood flow restricted during pregnancy. We also use our latent variable mixture distribution model to identify a clinically meaningful latent class of subjects with low birth weight and early gestational age. We then examine the association of latent classes of intrauterine growth restriction with latent classes of birth outcomes as well as observed maternal covariates including fetal gender and maternal race, parity, body mass index, and height. Our methods identified a latent class of subjects who have increased blood flow restriction and below average intrauterine size during pregnancy. These subjects were more likely to be growth restricted at birth than a class of individuals with typical size and blood flow.

15.
Pathogenic Aeromonas hydrophila (strain VB21), a multiple-drug-resistant strain, contains a plasmid of about 21 kb. After plasmid curing, the isolates became sensitive to antimicrobials to which they were earlier resistant. The cured bacteria exhibited significant alterations in their surface structure, growth profile and virulence properties, and failed to cause ulcerative disease syndrome (UDS) when injected into the Indian catfish Clarias batrachus. Routine biochemical studies revealed that plasmid curing did not alter the biochemical properties of the bacteria. After transformation of the plasmid into cured A. hydrophila, the bacterium regained its virulence properties and induced all the characteristic symptoms of UDS when injected into fish. Thus, the plasmid plays a pivotal role in the phenotype, growth and virulence of A. hydrophila and in the pathogenesis of aeromonad UDS.

16.
Summary. Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this article, we consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the expectation‐maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less‐efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often have precision comparable to that of the ML estimates. We apply our methods to the analysis of comorbid symptoms in the obsessive compulsive disorder study. Our models' random effects structure has a more straightforward interpretation than those of competing methods, and thus should usefully augment tools available for LCA of multilevel data.

17.
Houseman EA, Coull BA, Betensky RA. Biometrics, 2006, 62(4): 1062-1070
Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We first consider a latent class model where, conditional on unobserved membership in one of k classes, the variables are independent with probabilities determined by a regression model of low dimension q. Using a family of penalties including the ridge and LASSO, we extend this model to address higher-dimensional problems. Finally, we present an orthogonal map that transforms marker space to a space of "features" for which the constrained model has better predictive power. We demonstrate these methods on LOH data collected at 19 markers from 93 brain tumor patients. For this data set, the existing unpenalized latent class methodology does not produce estimates. Additionally, we show that posterior classes obtained from this method are associated with survival for these patients.

18.
19.
High-dimensional biomarker data are often collected in epidemiological studies when assessing the association between biomarkers and human disease is of interest. We develop a latent class modeling approach for joint analysis of high-dimensional semicontinuous biomarker data and a binary disease outcome. To model the relationship between complex biomarker expression patterns and disease risk, we use latent risk classes to link the two modeling components. We characterize complex biomarker-specific differences through biomarker-specific random effects, so that different biomarkers can have different baseline (low-risk) values as well as different between-class differences. The proposed approach also accommodates data features that are common in environmental toxicology and other biomarker exposure data, including a large number of biomarkers, numerous zero values, and complex mean-variance relationships in the biomarker levels. A Monte Carlo EM (MCEM) algorithm is proposed for parameter estimation. Both the MCEM algorithm and model selection procedures are shown to work well in simulations and applications. In applying the proposed approach to an epidemiological study that examined the relationship between environmental polychlorinated biphenyl (PCB) exposure and the risk of endometriosis, we identified a highly significant overall effect of PCB concentrations on the risk of endometriosis.

20.
Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.
