Similar articles
1.
Summary: We consider the problem of estimating the effect of exposure on multiple continuous outcomes, when the outcomes are measured on different scales and are nested within multiple outcome classes, or “domains.” Our Bayesian model extends the linear mixed models approach to allow the exposure effect to differ across domains and across outcomes within domains. Our model can be parameterized to allow shrinkage of the effects within the different levels of nesting, or to allow fixed domain‐specific effects with no shrinkage. Our model also allows covariate effects to differ across outcomes and domains. Our methodology is applied to data on prenatal methylmercury exposure and multiple outcomes in four domains measured at 9 years of age on children enrolled in the Seychelles Child Development Study. We used three different priors and found that our main conclusions were not sensitive to the choice of prior. Simulation studies examine the model performance under alternative scenarios. Our results demonstrate that a sizeable increase in power is possible.

2.
Dunson DB, Chen Z, Harry J. Biometrics 2003, 59(3): 521-530
In applications that involve clustered data, such as longitudinal studies and developmental toxicity experiments, the number of subunits within a cluster is often correlated with outcomes measured on the individual subunits. Analyses that ignore this dependency can produce biased inferences. This article proposes a Bayesian framework for jointly modeling cluster size and multiple categorical and continuous outcomes measured on each subunit. We use a continuation ratio probit model for the cluster size and underlying normal regression models for each of the subunit-specific outcomes. Dependency between cluster size and the different outcomes is accommodated through a latent variable structure. The form of the model facilitates posterior computation via a simple and computationally efficient Gibbs sampler. The approach is illustrated with an application to developmental toxicity data, and other applications, to joint modeling of longitudinal and event time data, are discussed.

3.
Bayesian hierarchical models have been applied in clinical trials to allow for information sharing across subgroups. Traditional Bayesian hierarchical models do not have subgroup classifications; thus, information is shared across all subgroups. When the difference between subgroups is large, it suggests that the subgroups belong to different clusters. In that case, placing all subgroups in one pool and borrowing information across all subgroups can result in substantial bias for the subgroups with strong borrowing, or a lack of efficiency gain with weak borrowing. To resolve this difficulty, we propose a hierarchical Bayesian classification and information sharing (BaCIS) model for the design of multigroup phase II clinical trials with binary outcomes. We introduce subgroup classification into the hierarchical model. Subgroups are classified into two clusters on the basis of their outcomes mimicking the hypothesis testing framework. Subsequently, information sharing takes place within subgroups in the same cluster, rather than across all subgroups. This method can be applied to the design and analysis of multigroup clinical trials with binary outcomes. Compared to the traditional hierarchical models, better operating characteristics are obtained with the BaCIS model under various scenarios.
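The idea of borrowing information only within a cluster of similar subgroups can be illustrated with a minimal sketch. This is not the BaCIS model: the hard threshold rule, the `prior_strength` parameter, and the function name are all assumptions made for illustration. Each subgroup's response rate is shrunk toward the pooled rate of its own cluster through a Beta prior centered on that pooled rate.

```python
def shrink_within_clusters(responses, sizes, threshold=0.3, prior_strength=10.0):
    """Classify subgroups into two clusters by observed response rate
    (hypothetical rule), then shrink each rate toward its cluster's pooled rate."""
    rates = [r / n for r, n in zip(responses, sizes)]
    # Cluster 1 = observed rate above the threshold, cluster 0 = otherwise.
    labels = [int(rate > threshold) for rate in rates]
    estimates = []
    for i, (r, n) in enumerate(zip(responses, sizes)):
        same = [j for j in range(len(sizes)) if labels[j] == labels[i]]
        pooled = sum(responses[j] for j in same) / sum(sizes[j] for j in same)
        # Posterior mean under a Beta(prior_strength * pooled,
        # prior_strength * (1 - pooled)) prior and a binomial likelihood.
        estimates.append((r + prior_strength * pooled) / (n + prior_strength))
    return labels, estimates


# Made-up counts for four subgroups of 20 patients each.
labels, estimates = shrink_within_clusters([2, 3, 12, 14], [20, 20, 20, 20])
print(labels, [round(e, 3) for e in estimates])
```

In this sketch the two low-response subgroups never pull the high-response estimates downward, which is the behavior the abstract contrasts with pooling all subgroups together.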

4.
We present a Bayesian approach to analyze matched "case-control" data with multiple disease states. The probability of disease development is described by a multinomial logistic regression model. The exposure distribution depends on the disease state and could vary across strata. In such a model, the number of stratum effect parameters grows in direct proportion to the sample size leading to inconsistent MLEs for the parameters of interest even when one uses a retrospective conditional likelihood. We adopt a semiparametric Bayesian framework instead, assuming a Dirichlet process prior with a mixing normal distribution on the distribution of the stratum effects. We also account for possible missingness in the exposure variable in our model. The actual estimation is carried out through a Markov chain Monte Carlo numerical integration scheme. The proposed methodology is illustrated through simulation and an example of a matched study on low birth weight of newborns (Hosmer, D. A. and Lemeshow, S., 2000, Applied Logistic Regression) with two possible disease groups matched with a control group.

5.
We propose a model-based clustering method for high-dimensional longitudinal data via regularization in this paper. This study was motivated by the Trial of Activity in Adolescent Girls (TAAG), which aimed to examine multilevel factors related to the change of physical activity by following up a cohort of 783 girls over 10 years from adolescence to early adulthood. Our goal is to identify the intrinsic grouping of subjects with similar patterns of physical activity trajectories and the most relevant predictors within each group. The previous analyses conducted clustering and variable selection in two steps, while our new method can perform the tasks simultaneously. Within each cluster, a linear mixed-effects model (LMM) is fitted with a doubly penalized likelihood to induce sparsity for parameter estimation and effect selection. The large-sample joint properties are established, allowing the dimensions of both fixed and random effects to increase at an exponential rate of the sample size, with a general class of penalty functions. Assuming subjects are drawn from a Gaussian mixture distribution, model effects and cluster labels are estimated via a coordinate descent algorithm nested inside the Expectation-Maximization (EM) algorithm. Bayesian Information Criterion (BIC) is used to determine the optimal number of clusters and the values of tuning parameters. Our numerical studies show that the new method has satisfactory performance and is able to accommodate complex data with multilevel and/or longitudinal effects.
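Choosing the number of clusters by BIC, as mentioned above, can be sketched as follows. The snippet uses the standard definition BIC = -2 log L + k log n; the candidate log-likelihoods and parameter counts are made-up placeholders rather than results from the paper, and `select_by_bic` is a hypothetical helper name.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian Information Criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(candidates, n_obs):
    """candidates maps a cluster count to (maximized log-likelihood, n_params)."""
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative inputs only: fits with 1, 2 and 3 clusters on 200 subjects.
fits = {1: (-520.0, 3), 2: (-480.0, 7), 3: (-478.0, 11)}
best_k, scores = select_by_bic(fits, n_obs=200)
print(best_k, {k: round(v, 1) for k, v in scores.items()})
```

With these placeholder values the third cluster improves the likelihood too little to offset its extra parameters, so two clusters are preferred. The same comparison could be run over a grid of tuning-parameter values.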

6.
In longitudinal studies and in clustered situations, binary and continuous response variables are often observed and need to be modeled together. In a recent publication Dunson, Chen, and Harry (2003, Biometrics 59, 521-530) (DCH) propose a Bayesian approach for joint modeling of cluster size and binary and continuous subunit-specific outcomes and illustrate this approach with a developmental toxicity data example. In this note we demonstrate how standard software (PROC NLMIXED in SAS) can be used to obtain maximum likelihood estimates in an alternative parameterization of the model with a single cluster-level factor considered by DCH for that example. We also suggest that a more general model with additional cluster-level random effects provides a better fit to the data set. An apparent discrepancy between the estimates obtained by DCH and the estimates obtained earlier by Catalano and Ryan (1992, Journal of the American Statistical Association 87, 651-658) is also resolved. The issue of bias in inferences concerning the dose effect when cluster size is ignored is discussed. The maximum-likelihood approach considered herein is applicable to general situations with multiple clustered or longitudinally measured outcomes of different types and does not require prior specification and extensive programming.

7.
Summary: This article addresses modeling and inference for ordinal outcomes nested within categorical responses. We propose a mixture of normal distributions for latent variables associated with the ordinal data. This mixture model allows us to fix without loss of generality the cutpoint parameters that link the latent variable with the observed ordinal outcome. Moreover, the mixture model is shown to be more flexible in estimating cell probabilities when compared to the traditional Bayesian ordinal probit regression model with random cutpoint parameters. We extend our model to take into account possible dependence among the outcomes in different categories. We apply the model to a randomized phase III study to compare treatments on the basis of toxicities recorded by type of toxicity and grade within type. The data include the different (categorical) toxicity types exhibited in each patient. Each type of toxicity has an (ordinal) grade associated with it. The dependence among the different types of toxicity exhibited by the same patient is modeled by introducing patient‐specific random effects.

8.
In observational studies, subjects are often nested within clusters. In medical studies, patients are often treated by doctors and therefore patients are regarded as nested or clustered within doctors. A concern that arises with clustered data is that cluster-level characteristics (e.g., characteristics of the doctor) are associated with both treatment selection and patient outcomes, resulting in cluster-level confounding. Measuring and modeling cluster attributes can be difficult, and statistical methods exist to control for all unmeasured cluster characteristics. An assumption of these methods, however, is that characteristics of the cluster and the effects of those characteristics on the outcome (as well as probability of treatment assignment when using covariate balancing methods) are constant over time. In this paper, we consider methods that relax this assumption and allow for estimation of treatment effects in the presence of unmeasured time-dependent cluster confounding. The methods are based on matching with the propensity score and incorporate unmeasured time-specific cluster effects by performing matching within clusters or using fixed- or random-cluster effects in the propensity score model. The methods are illustrated using data to compare the effectiveness of two total hip devices with respect to survival of the device, and a simulation study is performed that compares the proposed methods. One method that was found to perform well is matching within surgeon clusters partitioned by time. Considerations in implementing the proposed methods are discussed.

9.
Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose 2 model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.

10.
Herring AH, Yang J. Biometrics 2007, 63(2): 381-388
An individual's health condition can affect the frequency and intensity of episodes that can occur repeatedly and that may be related to an event time of interest. For example, bleeding episodes during pregnancy may indicate problems predictive of preterm delivery. Motivated by this application, we propose a joint model for a multiple episode process and an event time. The frequency of occurrence and severity of the episodes are characterized by a latent variable model, which allows an individual's episode intensity to change dynamically over time. This latent episode intensity is then incorporated as a predictor in a discrete time model for the terminating event. Time-varying coefficients are used to distinguish among effects earlier versus later in gestation. Formulating the model within a Bayesian framework, prior distributions are chosen so that conditional posterior distributions are conjugate after data augmentation. Posterior computation proceeds via an efficient Gibbs sampling algorithm. The methods are illustrated using bleeding episode and gestational length data from a pregnancy study.

11.

Background  

With the increasing amount of data generated in molecular genetics laboratories, it is often difficult to make sense of results because of the vast number of different outcomes or variables studied. Examples include expression levels for large numbers of genes and haplotypes at large numbers of loci. It is then natural to group observations into smaller numbers of classes that allow for an easier overview and interpretation of the data. This grouping is often carried out in multiple steps with the aid of hierarchical cluster analysis, each step leading to a smaller number of classes by combining similar observations or classes. At each step, either implicitly or explicitly, researchers tend to interpret results and eventually focus on that set of classes providing the "best" (most significant) result. While this approach makes sense, the overall statistical significance of the experiment must include the clustering process, which modifies the grouping structure of the data and often removes variation.

12.
Peak lists derived from nuclear magnetic resonance (NMR) spectra are commonly used as input data for a variety of computer assisted and automated analyses. These include automated protein resonance assignment and protein structure calculation software tools. Prior to these analyses, peak lists must be aligned to each other and sets of related peaks must be grouped based on common chemical shift dimensions. Even when programs can perform peak grouping, they require the user to provide uniform match tolerances or use default values. However, peak grouping is further complicated by multiple sources of variance in peak position limiting the effectiveness of grouping methods that utilize uniform match tolerances. In addition, no method currently exists for deriving peak positional variances from single peak lists for grouping peaks into spin systems, i.e. spin system grouping within a single peak list. Therefore, we developed a complementary pair of peak list registration analysis and spin system grouping algorithms designed to overcome these limitations. We have implemented these algorithms into an approach that can identify multiple dimension-specific positional variances that exist in a single peak list and group peaks from a single peak list into spin systems. The resulting software tools generate a variety of useful statistics on both a single peak list and pairwise peak list alignment, especially for quality assessment of peak list datasets. We used a range of low and high quality experimental solution NMR and solid-state NMR peak lists to assess performance of our registration analysis and grouping algorithms. Analyses show that an algorithm using a single iteration and uniform match tolerances approach is only able to recover from 50 to 80% of the spin systems due to the presence of multiple sources of variance. Our algorithm recovers additional spin systems by reevaluating match tolerances in multiple iterations. 
To facilitate evaluation of the algorithms, we developed a peak list simulator within our nmrstarlib package that generates user-defined assigned peak lists from a given BMRB entry or database of entries. In addition, over 100,000 simulated peak lists with one or two sources of variance were generated to evaluate the performance and robustness of these new registration analysis and peak grouping algorithms.
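The limitation of uniform match tolerances described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm and does not use nmrstarlib: the function name, the greedy seed-based grouping rule, the two-dimensional (1H, 15N) peaks, and all numeric values are assumptions chosen for illustration.

```python
def group_peaks(peaks, tolerances):
    """Greedily group peaks: a peak joins the first group whose seed peak
    it matches within the given tolerance in every dimension."""
    groups = []
    for peak in peaks:
        for group in groups:
            seed = group[0]
            if all(abs(p - s) <= t for p, s, t in zip(peak, seed, tolerances)):
                group.append(peak)
                break
        else:
            # No existing group matched, so this peak seeds a new group.
            groups.append([peak])
    return groups


# Hypothetical (1H ppm, 15N ppm) peak positions from two spin systems.
peaks = [(8.01, 120.1), (8.02, 120.4), (7.50, 115.0), (7.51, 115.3)]

uniform = group_peaks(peaks, tolerances=(0.05, 0.05))
per_dimension = group_peaks(peaks, tolerances=(0.05, 0.5))
print(len(uniform), len(per_dimension))
```

Here the uniform tolerance is adequate for the proton dimension but too tight for the nitrogen dimension, so every peak ends up alone, while dimension-specific tolerances recover the two intended groups. The abstract's approach goes further by estimating such tolerances from the peak list itself over multiple iterations.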

13.

Background

A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure.

Results

We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.

Conclusion

We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

14.
Lin X, Ryan L, Sammel M, Zhang D, Padungtod C, Xu X. Biometrics 2000, 56(2): 593-601
We propose a scaled linear mixed model to assess the effects of exposure and other covariates on multiple continuous outcomes. The most general form of the model allows a different exposure effect for each outcome. An important special case is a model that represents the exposure effects using a common global measure that can be characterized in terms of effect sizes. Correlations among different outcomes within the same subject are accommodated using random effects. We develop two approaches to model fitting, including the maximum likelihood method and the working parameter method. A key feature of both methods is that they can be easily implemented by repeatedly calling software for fitting standard linear mixed models, e.g., SAS PROC MIXED. Compared to the maximum likelihood method, the working parameter method is easier to implement and yields fully efficient estimators of the parameters of interest. We illustrate the proposed methods by analyzing data from a study of the effects of occupational pesticide exposure on semen quality in a cohort of Chinese men.

15.
Computational, systems-based approaches can provide a quantitative construct for evaluating risk in the context of mechanistic data. Previously, we developed computational models for the rat, mouse, rhesus monkey, and human, describing the acquisition of adult neuron number in the neocortex during the key neurodevelopmental processes of neurogenesis and synaptogenesis. Here we apply mechanistic data from the rat describing ethanol-induced toxicity in the developing neocortex to evaluate the utility of these models for analyzing neurodevelopmental toxicity across species. Our model can explain long-term neocortical neuronal loss in the rodent model after in utero exposure to ethanol based on inhibition of proliferation during neurogenesis. Our human model predicts a significant neuronal deficit after daily peak BECs reaching 10-20 mg/dl, which is the approximate BEC reached after drinking one standard drink within one hour. In contrast, peak daily BECs of 100 mg/dl are necessary to predict similar deficits in the rat. Our model prediction of increased sensitivity of primate species to ethanol-induced inhibition of proliferation is based on application of in vivo experimental data from primates showing a prolonged rapid growth period in the primate versus rodent neuronal progenitor population. To place our predictions into a broader context, we evaluate the evidence for functional low-dose effects across rats, monkeys, and humans. Results from this critical evaluation suggest subtle effects are evident at doses causing peak BECs of approximately 20 mg/dl daily, corroborating our model predictions. Our example highlights the utility of a systems-based modeling approach in risk assessment.

16.
We explore the problem of variable selection in a case‐control setting with mass spectrometry proteomic data consisting of paired measurements. Each pair corresponds to a distinct isotope cluster and each component within a pair represents a summary of isotopic expression based on either the intensity or the shape of the cluster. Our objective is to identify a collection of isotope clusters associated with the disease outcome and at the same time assess the predictive added‐value of shape beyond intensity while maintaining predictive performance. We propose a Bayesian model that exploits the paired structure of our data and utilizes prior information on the relative predictive power of each source by introducing multiple layers of selection. This allows us to make simultaneous inference on which are the most informative pairs and for which, and to what extent, shape has a complementary value in separating the two groups. We evaluate the Bayesian model on pancreatic cancer data. Results from the fitted model show that most predictive potential is achieved with a subset of just six (out of 1289) pairs while the contribution of the intensity components is much higher than the shape components. To demonstrate how the method behaves under a controlled setting we consider a simulation study. Results from this study indicate that the proposed approach can successfully select the truly predictive pairs and accurately estimate the effects of both components although, in some cases, the model tends to overestimate the inclusion probability of the second component.

17.
Biomedical studies often collect multivariate event time data from multiple clusters (either subjects or groups) within each of which event times for individuals are correlated and the correlation may vary in different classes. In such survival analyses, heterogeneity among clusters for shared and specific classes can be accommodated by incorporating parametric frailty terms into the model. In this article, we propose a Bayesian approach to relax the parametric distribution assumption for shared and specific‐class frailties by using a Dirichlet process prior while also allowing for the uncertainty of heterogeneity for different classes. Multiple cluster‐specific frailty selections rely on variable selection‐type mixture priors by applying mixtures of point masses at zero and inverse gamma distributions to the variance of log frailties. This selection allows frailties with zero variance to effectively drop out of the model. A reparameterization of log‐frailty terms is performed to reduce the potential bias of fixed effects due to variation of the random distribution and dependence among the parameters resulting in easy interpretation and faster Markov chain Monte Carlo convergence. Simulated data examples and an application to a lung cancer clinical trial are used for illustration.

18.
Dunson DB, Perreault SD. Biometrics 2001, 57(1): 302-308
This article describes a general class of factor analytic models for the analysis of clustered multivariate data in the presence of informative missingness. We assume that there are distinct sets of cluster-level latent variables related to the primary outcomes and to the censoring process, and we account for dependency between these latent variables through a hierarchical model. A linear model is used to relate covariates and latent variables to the primary outcomes for each subunit. A generalized linear model accounts for covariate and latent variable effects on the probability of censoring for subunits within each cluster. The model accounts for correlation within clusters and within subunits through a flexible factor analytic framework that allows multiple latent variables and covariate effects on the latent variables. The structure of the model facilitates implementation of Markov chain Monte Carlo methods for posterior estimation. Data from a spermatotoxicity study are analyzed to illustrate the proposed approach.

19.
Association Models for Clustered Data with Binary and Continuous Responses
Summary: We consider analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a binary and a continuous outcome. We propose a new bivariate random effects model that induces associations among the binary outcomes within a cluster, among the continuous outcomes within a cluster, between a binary outcome and a continuous outcome from different subjects within a cluster, as well as the direct association between the binary and continuous outcomes within the same subject. For ease of interpretation of the regression effects, the marginal model of the binary response probability integrated over the random effects preserves the logistic form and the marginal expectation of the continuous response preserves the linear form. We implement maximum likelihood estimation of our model parameters using standard software such as PROC NLMIXED of SAS. Our simulation study demonstrates the robustness of our method with respect to the misspecification of the regression model as well as the random effects model. We illustrate our methodology by analyzing a developmental toxicity study of ethylene glycol in mice.

20.
Summary: We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and exposure variable while the true values of the exposure are never observed. Motivated by nutritional epidemiological data, we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method where not only is the relationship between the disease and exposure variable treated semiparametrically, but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B‐splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH‐AARP Diet and Health Study and examine its performance in a simulation study.
