首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 859 毫秒
1.
High-dimensional gene expression data often exhibit intricate correlation patterns as the result of coordinated genetic regulation. In practice, however, it is difficult to directly measure these coordinated underlying activities. Analysis of breast cancer survival data with gene expressions motivates us to use a two-stage latent factor approach to estimate these unobserved coordinated biological processes. Compared to existing approaches, our proposed procedure has several unique characteristics. In the first stage, an important distinction is that our procedure incorporates prior biological knowledge about gene-pathway membership into the analysis and explicitly model the effects of genetic pathways on the latent factors. Second, to characterize the molecular heterogeneity of breast cancer, our approach provides estimates specific to each cancer subtype. Finally, our proposed framework incorporates sparsity condition due to the fact that genetic networks are often sparse. In the second stage, we investigate the relationship between latent factor activity levels and survival time with censoring using a general dimension reduction model in the survival analysis context. Combining the factor model and sufficient direction model provides an efficient way of analyzing high-dimensional data and reveals some interesting relations in the breast cancer gene expression data.  相似文献   

2.
Summary Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this article, we consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the expectation‐maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less‐efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy comparable precision as the ML estimates. We apply our methods to the analysis of comorbid symptoms in the obsessive compulsive disorder study. Our models' random effects structure has more straightforward interpretation than those of competing methods, thus should usefully augment tools available for LCA of multilevel data.  相似文献   

3.
Health researchers are often interested in assessing the direct effect of a treatment or exposure on an outcome variable, as well as its indirect (or mediation) effect through an intermediate variable (or mediator). For an outcome following a nonlinear model, the mediation formula may be used to estimate causally interpretable mediation effects. This method, like others, assumes that the mediator is observed. However, as is common in structural equations modeling, we may wish to consider a latent (unobserved) mediator. We follow a potential outcomes framework and assume a generalized structural equations model (GSEM). We provide maximum‐likelihood estimation of GSEM parameters using an approximate Monte Carlo EM algorithm, coupled with a mediation formula approach to estimate natural direct and indirect effects. The method relies on an untestable sequential ignorability assumption; we assess robustness to this assumption by adapting a recently proposed method for sensitivity analysis. Simulation studies show good properties of the proposed estimators in plausible scenarios. Our method is applied to a study of the effect of mother education on occurrence of adolescent dental caries, in which we examine possible mediation through latent oral health behavior.  相似文献   

4.
5.
Existing methods for joint modeling of longitudinal measurements and survival data can be highly influenced by outliers in the longitudinal outcome. We propose a joint model for analysis of longitudinal measurements and competing risks failure time data which is robust in the presence of outlying longitudinal observations during follow‐up. Our model consists of a linear mixed effects sub‐model for the longitudinal outcome and a proportional cause‐specific hazards frailty sub‐model for the competing risks data, linked together by latent random effects. Instead of the usual normality assumption for measurement errors in the linear mixed effects sub‐model, we adopt a t ‐distribution which has a longer tail and thus is more robust to outliers. We derive an EM algorithm for the maximum likelihood estimates of the parameters and estimate their standard errors using a profile likelihood method. The proposed method is evaluated by simulation studies and is applied to a scleroderma lung study (© 2009 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

6.
High-dimensional biomarker data are often collected in epidemiological studies when assessing the association between biomarkers and human disease is of interest. We develop a latent class modeling approach for joint analysis of high-dimensional semicontinuous biomarker data and a binary disease outcome. To model the relationship between complex biomarker expression patterns and disease risk, we use latent risk classes to link the 2 modeling components. We characterize complex biomarker-specific differences through biomarker-specific random effects, so that different biomarkers can have different baseline (low-risk) values as well as different between-class differences. The proposed approach also accommodates data features that are common in environmental toxicology and other biomarker exposure data, including a large number of biomarkers, numerous zero values, and complex mean-variance relationship in the biomarkers levels. A Monte Carlo EM (MCEM) algorithm is proposed for parameter estimation. Both the MCEM algorithm and model selection procedures are shown to work well in simulations and applications. In applying the proposed approach to an epidemiological study that examined the relationship between environmental polychlorinated biphenyl (PCB) exposure and the risk of endometriosis, we identified a highly significant overall effect of PCB concentrations on the risk of endometriosis.  相似文献   

7.
Roy J  Lin X 《Biometrics》2000,56(4):1047-1054
Multiple outcomes are often used to properly characterize an effect of interest. This paper proposes a latent variable model for the situation where repeated measures over time are obtained on each outcome. These outcomes are assumed to measure an underlying quantity of main interest from different perspectives. We relate the observed outcomes using regression models to a latent variable, which is then modeled as a function of covariates by a separate regression model. Random effects are used to model the correlation due to repeated measures of the observed outcomes and the latent variable. An EM algorithm is developed to obtain maximum likelihood estimates of model parameters. Unit-specific predictions of the latent variables are also calculated. This method is illustrated using data from a national panel study on changes in methadone treatment practices.  相似文献   

8.

Summary

We consider a functional linear Cox regression model for characterizing the association between time‐to‐event data and a set of functional and scalar predictors. The functional linear Cox regression model incorporates a functional principal component analysis for modeling the functional predictors and a high‐dimensional Cox regression model to characterize the joint effects of both functional and scalar predictors on the time‐to‐event data. We develop an algorithm to calculate the maximum approximate partial likelihood estimates of unknown finite and infinite dimensional parameters. We also systematically investigate the rate of convergence of the maximum approximate partial likelihood estimates and a score test statistic for testing the nullity of the slope function associated with the functional predictors. We demonstrate our estimation and testing procedures by using simulations and the analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data. Our real data analyses show that high‐dimensional hippocampus surface data may be an important marker for predicting time to conversion to Alzheimer's disease. Data used in the preparation of this article were obtained from the ADNI database ( adni.loni.usc.edu ).  相似文献   

9.
This research is motivated by a pilot colorectal adenoma study, where the outcome of interest is the presence of colorectal adenoma representing risk for colorectal cancer, and the predictors of interest are protein biomarkers that are repeatedly measured with errors along the length of a microscopic structure in the human colon, the colon crypt. Biomarkers of this type are referred to as functional biomarkers. The investigators are interested in identifying features of functional biomarkers that are associated with risk for colorectal cancer. In this paper, we investigate a joint modeling approach, where the binary clinical outcome is modeled using a logistic regression model with the unobserved true functional biomarkers as the predictors. Most existing methods are developed either for linear models or for functional biomarkers measured without errors and cannot be directly applied to our data. The applicable methods include a two-step method and a maximum likelihood method, which have some limitations. We propose a robust semiparametric method to overcome the limitations of the existing methods. We study the properties of the proposed method, and show in simulations that it compares favorably with other methods and also offers significant savings in CPU time. We analyze the pilot colorectal adenoma data and show that expression levels of AFC, a tumor suppressor gene, in the transitional area from the proliferation zone to the differentiation zone of colon crypts are likely to be associated with risk for colorectal cancer. Given the relatively small sample size in the pilot study, our results need to be validated in the future full-scale studies.  相似文献   

10.
Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low, however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.  相似文献   

11.
The development of molecular diagnostic tools to achieve individualized medicine requires identifying predictive biomarkers associated with subgroups of individuals who might receive beneficial or harmful effects from different available treatments. However, due to the large number of candidate biomarkers in the large‐scale genetic and molecular studies, and complex relationships among clinical outcome, biomarkers, and treatments, the ordinary statistical tests for the interactions between treatments and covariates have difficulties from their limited statistical powers. In this paper, we propose an efficient method for detecting predictive biomarkers. We employ weighted loss functions of Chen et al. to directly estimate individual treatment scores and propose synthetic posterior inference for effect sizes of biomarkers. We develop an empirical Bayes approach, namely, we estimate unknown hyperparameters in the prior distribution based on data. We then provide efficient screening methods for the candidate biomarkers via optimal discovery procedure with adequate control of false discovery rate. The proposed method is demonstrated in simulation studies and an application to a breast cancer clinical study in which the proposed method was shown to detect the much larger numbers of significant biomarkers than existing standard methods.  相似文献   

12.
Large amounts of longitudinal health records are now available for dynamic monitoring of the underlying processes governing the observations. However, the health status progression across time is not typically observed directly: records are observed only when a subject interacts with the system, yielding irregular and often sparse observations. This suggests that the observed trajectories should be modeled via a latent continuous‐time process potentially as a function of time‐varying covariates. We develop a continuous‐time hidden Markov model to analyze longitudinal data accounting for irregular visits and different types of observations. By employing a specific missing data likelihood formulation, we can construct an efficient computational algorithm. We focus on Bayesian inference for the model: this is facilitated by an expectation‐maximization algorithm and Markov chain Monte Carlo methods. Simulation studies demonstrate that these approaches can be implemented efficiently for large data sets in a fully Bayesian setting. We apply this model to a real cohort where patients suffer from chronic obstructive pulmonary disease with the outcome being the number of drugs taken, using health care utilization indicators and patient characteristics as covariates.  相似文献   

13.
Substantial gaps exist in our ability to accurately predict prognosis, and these gaps limit our understanding of the complex mechanisms that contribute to the greatest cancer epidemic of our time, prostate cancer. This review addresses contemporary epidemiologic and biostatistical issues in prostate cancer. It covers the science of outcome prediction and biomarker evaluation, recognition of the need to combine biomarkers to improve the accuracy of our outcome estimates and an analysis of current outcome assessment methods, including the TNM staging system and multivariate regression models. The simplicity and intuitive ease of the current TNM staging system must be balanced against its serious limitations in predictive accuracy and its loss of clinical utility. Statistical regression methods are required as we move to the new era of personalized medicine. We must implement statistical approaches that integrate the new molecular biomarkers with existing prognostic biomarkers to accurately predict which patients require treatment and to determine the optimal therapy.  相似文献   

14.
Recent technological advances continue to provide noninvasive and more accurate biomarkers for evaluating disease status. One standard tool for assessing the accuracy of diagnostic tests is the receiver operating characteristic (ROC) curve. Few statistical methods exist to accommodate multiple continuous‐scale biomarkers in the framework of ROC analysis. In this paper, we propose a method to integrate continuous‐scale biomarkers to optimize classification accuracy. Specifically, we develop semiparametric transformation models for multiple biomarkers. We assume that unknown and marker‐specific transformations of biomarkers follow a multivariate normal distribution. Our models accommodate biomarkers subject to limits of detection and account for the dependence among biomarkers by including a subject‐specific random effect. We also propose a diagnostic measure using an optimal linear combination of the transformed biomarkers. Our diagnostic rule does not depend on any monotone transformation of biomarkers and is not sensitive to extreme biomarker values. Nonparametric maximum likelihood estimation (NPMLE) is used for inference. We show that the parameter estimators are asymptotically normal and efficient. We illustrate our semiparametric approach using data from the Endometriosis, Natural History, Diagnosis, and Outcomes (ENDO) study.  相似文献   

15.
Little attention has been paid to the use of multi‐sample batch‐marking studies, as it is generally assumed that an individual's capture history is necessary for fully efficient estimates. However, recently, Huggins et al. ( 2010 ) present a pseudo‐likelihood for a multi‐sample batch‐marking study where they used estimating equations to solve for survival and capture probabilities and then derived abundance estimates using a Horvitz–Thompson‐type estimator. We have developed and maximized the likelihood for batch‐marking studies. We use data simulated from a Jolly–Seber‐type study and convert this to what would have been obtained from an extended batch‐marking study. We compare our abundance estimates obtained from the Crosbie–Manly–Arnason–Schwarz (CMAS) model with those of the extended batch‐marking model to determine the efficiency of collecting and analyzing batch‐marking data. We found that estimates of abundance were similar for all three estimators: CMAS, Huggins, and our likelihood. Gains are made when using unique identifiers and employing the CMAS model in terms of precision; however, the likelihood typically had lower mean square error than the pseudo‐likelihood method of Huggins et al. ( 2010 ). When faced with designing a batch‐marking study, researchers can be confident in obtaining unbiased abundance estimators. Furthermore, they can design studies in order to reduce mean square error by manipulating capture probabilities and sample size.  相似文献   

16.
We propose a parametric regression model for the cumulative incidence functions (CIFs) commonly used for competing risks data. The model adopts a modified logistic model as the baseline CIF and a generalized odds‐rate model for covariate effects, and it explicitly takes into account the constraint that a subject with any given prognostic factors should eventually fail from one of the causes such that the asymptotes of the CIFs should add up to one. This constraint intrinsically holds in a nonparametric analysis without covariates, but is easily overlooked in a semiparametric or parametric regression setting. We hence model the CIF from the primary cause assuming the generalized odds‐rate transformation and the modified logistic function as the baseline CIF. Under the additivity constraint, the covariate effects on the competing cause are modeled by a function of the asymptote of the baseline distribution and the covariate effects on the primary cause. The inference procedure is straightforward by using the standard maximum likelihood theory. We demonstrate desirable finite‐sample performance of our model by simulation studies in comparison with existing methods. Its practical utility is illustrated in an analysis of a breast cancer dataset to assess the treatment effect of tamoxifen, adjusting for age and initial pathological tumor size, on breast cancer recurrence that is subject to dependent censoring by second primary cancers and deaths.  相似文献   

17.
We are studying variable selection in multiple regression models in which molecular markers and/or gene-expression measurements as well as intensity measurements from protein spectra serve as predictors for the outcome variable (i.e., trait or disease state). Finding genetic biomarkers and searching genetic–epidemiological factors can be formulated as a statistical problem of variable selection, in which, from a large set of candidates, a small number of trait-associated predictors are identified. We illustrate our approach by analyzing the data available for chronic fatigue syndrome (CFS). CFS is a complex disease from several aspects, e.g., it is difficult to diagnose and difficult to quantify. To identify biomarkers we used microarray data and SELDI-TOF-based proteomics data. We also analyzed genetic marker information for a large number of SNPs for an overlapping set of individuals. The objectives of the analyses were to identify markers specific to fatigue that are also possibly exclusive to CFS. The use of such models can be motivated, for example, by the search for new biomarkers for the diagnosis and prognosis of cancer and measures of response to therapy. Generally, for this we use Bayesian hierarchical modeling and Markov Chain Monte Carlo computation.  相似文献   

18.
Using biomarkers to model disease course effectively and make early prediction is a challenging but critical path to improving diagnostic accuracy and designing preventive trials for neurological disorders. Leveraging the domain knowledge that certain neuroimaging biomarkers may reflect the disease pathology, we propose a model inspired by the neural mass model from cognitive neuroscience to jointly model nonlinear dynamic trajectories of the biomarkers. Under a nonlinear mixed‐effects model framework, we introduce subject‐ and biomarker‐specific random inflection points to characterize the critical time of underlying disease progression as reflected in the biomarkers. A latent liability score is shared across biomarkers to pool information. Our model allows assessing how the underlying disease progression will affect the trajectories of the biomarkers, and, thus, is potentially useful for individual disease management or preventive therapeutics. We propose an EM algorithm for maximum likelihood estimation, where in the E step, a normal approximation is used to facilitate numerical integration. We perform extensive simulation studies and apply the method to analyze data from a large multisite natural history study of Huntington's Disease (HD). The results show that some neuroimaging biomarker inflection points are early signs of the HD onset. Finally, we develop an online tool to provide the individual prediction of the biomarker trajectories given the medical history and baseline measurements.  相似文献   

19.
Elashoff RM  Li G  Li N 《Biometrics》2008,64(3):762-771
Summary .   In this article we study a joint model for longitudinal measurements and competing risks survival data. Our joint model provides a flexible approach to handle possible nonignorable missing data in the longitudinal measurements due to dropout. It is also an extension of previous joint models with a single failure type, offering a possible way to model informatively censored events as a competing risk. Our model consists of a linear mixed effects submodel for the longitudinal outcome and a proportional cause-specific hazards frailty submodel ( Prentice et al., 1978 , Biometrics 34, 541–554) for the competing risks survival data, linked together by some latent random effects. We propose to obtain the maximum likelihood estimates of the parameters by an expectation maximization (EM) algorithm and estimate their standard errors using a profile likelihood method. The developed method works well in our simulation studies and is applied to a clinical trial for the scleroderma lung disease.  相似文献   

20.
Lyles RH  MacFarlane G 《Biometrics》2000,56(2):634-639
When repeated measures of an exposure variable are obtained on individuals, it can be of epidemiologic interest to relate the slope of this variable over time to a subsequent response. Subject-specific estimates of this slope are measured with error, as are corresponding estimates of the level of exposure, i.e., the intercept of a linear regression over time. Because the intercept is often correlated with the slope and may also be associated with the outcome, each error-prone covariate (intercept and slope) is a potential confounder, thereby tending to accentuate potential biases due to measurement error. Under a familiar mixed linear model for the exposure measurements, we present closed-form estimators for the true parameters of interest in the case of a continuous outcome with complete and equally timed follow-up for all subjects. Generalizations to handle incomplete follow-up, other types of outcome variables, and additional fixed covariates are illustrated via maximum likelihood. We provide examples using data from the Multicenter AIDS Cohort Study. In these examples, substantial adjustments are made to uncorrected parameter estimates corresponding to the health-related effects of exposure variable slopes over time. We illustrate the potential impact of such adjustments on the interpretation of an epidemiologic analysis.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号