Similar Documents
20 similar documents found (search time: 15 ms)
1.
Dropouts are common in longitudinal studies. If the dropout probability depends on the missing observations at or after dropout, the dropout is called informative (or nonignorable) dropout (ID). Failure to accommodate such a dropout mechanism in the model biases the parameter estimates. We propose a conditional autoregressive model for longitudinal binary data with an ID model such that the probabilities of positive outcomes as well as the dropout indicator at each occasion are logit-linear in some covariates and outcomes. This model, which adopts a marginal model for outcomes and a conditional model for dropouts, is called a selection model. To allow for heterogeneity and clustering effects, the outcome model is extended to incorporate mixture and random effects. Lastly, the model is further extended to a novel model that models the outcome and dropout jointly, with their dependency formulated through an odds ratio function. Parameters are estimated by a Bayesian approach implemented in the user-friendly Bayesian software WinBUGS. A methadone clinic dataset is analyzed to illustrate the proposed models. The results show that the treatment-time effect remains significant, but is weaker, after allowing for an ID process in the data. Finally, the effect of dropout on parameter estimates is evaluated through simulation studies.
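A minimal simulation sketch (not the authors' WinBUGS implementation) can illustrate why an informative dropout process matters: outcomes are autoregressive on the logit scale, dropout depends on the current outcome, and the naive complete-case mean at the final occasion is biased. All parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 20000, 4
Y = np.zeros((n, T), dtype=int)
obs = np.ones((n, T), dtype=bool)   # observation indicator (False after dropout)
alive = np.ones(n, dtype=bool)

for t in range(T):
    # conditional autoregressive outcome: logit linear in time and previous outcome
    eta = -1.0 + 0.4 * t + (1.2 * Y[:, t - 1] if t > 0 else 0.0)
    Y[:, t] = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(int)
    obs[:, t] = alive
    # informative dropout: leaving the study is more likely when Y[:, t] == 1
    p_drop = 1 / (1 + np.exp(-(-2.0 + 1.5 * Y[:, t])))
    alive &= rng.random(n) >= p_drop

true_mean = Y[:, -1].mean()          # mean if everyone were observed
cc_mean = Y[obs[:, -1], -1].mean()   # naive complete-case mean (biased downward here)
print(true_mean, cc_mean)
```

Because subjects with positive outcomes drop out more often, and outcomes are serially correlated, the complete-case mean underestimates the true marginal mean.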

2.
Logistic regression for dependent binary observations (cited by 3: 0 self-citations, 3 by others)
G. E. Bonney, Biometrics, 1987, 43(4): 951-973
The likelihood of a set of binary dependent outcomes, with or without explanatory variables, is expressed as a product of conditional probabilities each of which is assumed to be logistic. The models are called regressive logistic models. They provide a simple but relatively unknown parametrization of the multivariate distribution. They have the theoretical and practical advantage that they can be analyzed and fitted as in logistic regression for independent outcomes, and with the same computer programs. The paper is largely expository and is intended to motivate the development and usage of the regressive logistic models. The discussion includes serially dependent outcomes, equally predictive outcomes, more specialized patterns of dependence, multidimensional tables, and three examples.
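The key practical point — each conditional probability is logistic, so the whole model can be fitted with ordinary logistic-regression software by treating earlier outcomes as covariates — can be sketched as follows. The simulation design and parameter values are hypothetical, and the small Newton-Raphson fitter stands in for any standard logistic regression routine.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton-Raphson MLE for logistic regression; X includes an intercept column."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        W = p * (1 - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return b

rng = np.random.default_rng(1)
n, T = 2000, 6
x = rng.normal(size=n)                      # one explanatory variable per subject
alpha, beta, gamma = -0.5, 0.8, 1.0         # hypothetical true parameters

Y = np.zeros((n, T), dtype=int)
for t in range(T):
    prev = Y[:, t - 1] if t > 0 else np.zeros(n)
    eta = alpha + beta * x + gamma * prev   # regressive logistic conditional
    Y[:, t] = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(int)

# Stack one record per conditional term (t >= 1): the previous outcome is a covariate
X = np.column_stack([np.ones(n * (T - 1)),
                     np.repeat(x, T - 1),
                     Y[:, :-1].ravel()])
y = Y[:, 1:].ravel()
b_hat = fit_logit(X, y)
print(b_hat)    # close to (alpha, beta, gamma)
```

The product-of-conditionals likelihood factorizes exactly into these stacked records, which is why no special software is needed.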

3.
Despite major methodological developments, Bayesian inference in Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing, which bypasses the exploration of the model space. Specifically, we introduce closed‐form Bayes factors under the Gaussian conjugate model to evaluate the null hypotheses of marginal and conditional independence between variables. Their computation for all pairs of variables is shown to be extremely efficient, thereby allowing us to address large problems with thousands of nodes as required by modern applications. Moreover, we derive exact tail probabilities from the null distributions of the Bayes factors. These allow the use of any multiplicity correction procedure to control error rates for incorrect edge inclusion. We demonstrate the proposed approach on various simulated examples as well as on a large gene expression data set from The Cancer Genome Atlas.
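The general strategy — test every pair of variables for independence and apply a multiplicity correction rather than search the model space — can be illustrated with a frequentist stand-in. The sketch below uses Fisher's z test for marginal independence with Benjamini-Hochberg correction in place of the paper's closed-form Bayes factors; the data-generating setup is hypothetical.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n, p = 500, 5
Z = rng.normal(size=(n, p))
Z[:, 1] = 0.7 * Z[:, 0] + sqrt(1 - 0.49) * Z[:, 1]   # the only true dependence: (0, 1)

R = np.corrcoef(Z, rowvar=False)
pairs, pvals = [], []
for i in range(p):
    for j in range(i + 1, p):
        z = np.arctanh(R[i, j]) * sqrt(n - 3)         # Fisher z statistic
        pvals.append(2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2)))))
        pairs.append((i, j))

# Benjamini-Hochberg step-up at level 0.05 to control the false discovery rate
order = np.argsort(pvals)
m = len(pvals)
passed = np.array(pvals)[order] <= 0.05 * np.arange(1, m + 1) / m
k = int(np.max(np.nonzero(passed)[0]) + 1) if passed.any() else 0
edges = {pairs[o] for o in order[:k]}
print(edges)    # includes the true edge (0, 1)
```

The point of the pairwise approach is the same as in the paper: all O(p^2) tests are cheap, so the procedure scales to thousands of variables without exploring graph space.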

4.
In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of pre-specified occasions. Instead, individuals may be measured at irregular intervals, with those having a history of poorer health outcomes being measured with somewhat greater frequency and regularity. In this paper, we consider likelihood-based estimation of the regression parameters in marginal models for longitudinal binary data when the follow-up times are not fixed by design, but can depend on previous outcomes. In particular, we consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome measurement process. The practical implication of this separation is that the follow-up time process can be ignored when making likelihood-based inferences about the marginal regression model parameters. That is, maximum likelihood (ML) estimation of the regression parameters relating the probability of success at a given time to covariates does not require that a model for the distribution of follow-up times be specified. However, to obtain consistent parameter estimates, the multinomial distribution for the vector of repeated binary outcomes must be correctly specified. In general, ML estimation requires specification of all higher-order moments and the likelihood for a marginal model can be intractable except in cases where the number of repeated measurements is relatively small. To circumvent these difficulties, we propose a pseudolikelihood for estimation of the marginal model parameters. The pseudolikelihood uses a linear approximation for the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only. 
When the follow-up times depend only on the previous outcome, the pseudolikelihood requires correct specification of the conditional distribution of the current outcome given the outcome at the previous occasion only. Results from a simulation study and a study of asymptotic bias are presented. Finally, we illustrate the main results using data from a longitudinal observational study that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children.

5.
An important assumption in observational studies is that sampled individuals are representative of some larger study population. Yet, this assumption is often unrealistic. Notable examples include online public‐opinion polls, publication biases associated with statistically significant results, and in ecology, telemetry studies with significant habitat‐induced probabilities of missed locations. This problem can be overcome by modeling selection probabilities simultaneously with other predictor–response relationships or by weighting observations by inverse selection probabilities. We illustrate the problem and a solution when modeling mixed migration strategies of northern white‐tailed deer (Odocoileus virginianus). Captures occur on winter yards where deer migrate in response to changing environmental conditions. Yet, not all deer migrate in all years, and captures during mild years are more likely to target deer that migrate every year (i.e., obligate migrators). Characterizing deer as conditional or obligate migrators is also challenging unless deer are observed for many years and under a variety of winter conditions. We developed a hidden Markov model where the probability of capture depends on each individual's migration strategy (conditional versus obligate migrator), a partially latent variable that depends on winter severity in the year of capture. In a 15‐year study, involving 168 white‐tailed deer, the estimated probability of migrating for conditional migrators increased nonlinearly with an index of winter severity. We estimated a higher proportion of obligates in the study cohort than in the population, except during a span of 3 years surrounding back‐to‐back severe winters. These results support the hypothesis that selection biases occur as a result of capturing deer on winter yards, with the magnitude of bias depending on the severity of winter weather. 
Hidden Markov models offer an attractive framework for addressing selection biases due to their ability to incorporate latent variables and model direct and indirect links between state variables and capture probabilities.
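The basic machinery behind such models — filtering a latent state (here, migration strategy) from observations via the forward algorithm — can be sketched generically. All probabilities below are hypothetical, and the actual model additionally links capture probability and winter severity to the latent state.

```python
import numpy as np

# Forward algorithm for a 2-state HMM (a generic sketch, not the deer model itself):
# states could be 'conditional' vs 'obligate' migrator; observations are binary.
pi0 = np.array([0.6, 0.4])          # initial state probabilities (assumed)
A = np.array([[1.0, 0.0],           # migration strategy is fixed over an
              [0.0, 1.0]])          # individual's life, so A is the identity
B = np.array([[0.7, 0.3],           # P(obs | state): rows = states, cols = obs
              [0.2, 0.8]])
obs = [1, 1, 0, 1]                  # a hypothetical observation sequence

alpha = pi0 * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]   # propagate, then weight by the emission
loglik = np.log(alpha.sum())
posterior = alpha / alpha.sum()     # filtered state probabilities
print(loglik, posterior)            # state 1 is favored by this sequence
```

With a general transition matrix the same recursion handles time-varying latent states; here the identity matrix encodes a strategy fixed for life.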

6.
Logistic regression in capture-recapture models (cited by 6: 1 self-citation, 5 by others)
J. M. Alho, Biometrics, 1990, 46(3): 623-635
The effect of population heterogeneity in capture-recapture, or dual registration, models is discussed. An estimator of the unknown population size based on a logistic regression model is introduced. The model allows different capture probabilities across individuals and across capture times. The probabilities are estimated from the observed data using conditional maximum likelihood. The resulting population estimator is shown to be consistent and asymptotically normal. A variance estimator under population heterogeneity is derived. The finite-sample properties of the estimators are studied via simulation. An application to Finnish occupational disease registration data is presented.
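The population-size estimator can be sketched as a Horvitz-Thompson-type sum of inverse detection probabilities. For brevity the sketch plugs in the true logistic capture probabilities; Alho's method instead estimates them from the observed data by conditional maximum likelihood. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10000                                   # true population size (unknown in practice)
x = rng.normal(size=N)                      # individual covariate driving heterogeneity
p = 1 / (1 + np.exp(-(-1.0 + 0.5 * x)))     # logistic capture probability per occasion
caught = (rng.random((N, 2)) < p[:, None]).any(axis=1)   # seen on either of 2 occasions

pi = 1 - (1 - p) ** 2                       # detection prob: caught at least once
N_hat = float(np.sum(1.0 / pi[caught]))     # Horvitz-Thompson-type estimate
print(N_hat)                                # close to N
```

Ignoring heterogeneity (using a single pooled capture probability) would understate the number of hard-to-catch individuals, which is exactly the bias the logistic model corrects.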

7.
This work is motivated by clinical trials in chronic heart failure disease, where treatment has effects both on morbidity (assessed as recurrent non-fatal hospitalisations) and on mortality (assessed as cardiovascular death, CV death). Recently, a joint frailty proportional hazards model has been proposed for these kinds of efficacy outcomes to account for a potential association between the risk rates for hospital admissions and CV death. However, clinical trial results are more often presented as treatment effect estimates derived from marginal proportional hazards models, that is, a Cox model for mortality and an Andersen–Gill model for recurrent hospitalisations. We show how these marginal hazard ratios and their estimates depend on the association between the risk processes when these are actually linked by shared or dependent frailty terms. First, we derive the marginal hazard ratios as a function of time. Then, applying least false parameter theory, we show that the marginal hazard ratio estimate for the hospitalisation rate depends on study duration and on parameters of the underlying joint frailty model. In particular, we identify parameters, for example the treatment effect on mortality, that determine whether the marginal hazard ratio estimate for hospitalisations is smaller than, equal to, or larger than the conditional one. How this affects rejection probabilities is further investigated in simulation studies. Our findings can be used to interpret marginal hazard ratio estimates in heart failure trials and are illustrated by the results of the CHARM-Preserved trial (where CHARM is the 'Candesartan in Heart failure Assessment of Reduction in Mortality and morbidity' programme).

8.
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts.

9.
The risk difference is an intelligible measure for comparing disease incidence in two exposure or treatment groups. Despite its convenience in interpretation, it is less prevalent in epidemiological and clinical areas where regression models are required in order to adjust for confounding. One major barrier to its popularity is that standard linear binomial or Poisson regression models can produce estimated probabilities outside the range (0,1), resulting in possible convergence issues. For estimating adjusted risk differences, we propose a general framework covering various constraint approaches based on binomial and Poisson regression models. The proposed methods span the areas of ordinary least squares, maximum likelihood estimation, and Bayesian inference. Compared to existing approaches, our methods prevent estimates and confidence intervals of predicted probabilities from falling outside the valid range. Through extensive simulation studies, we demonstrate that the proposed methods eliminate estimates and confidence limits of predicted probabilities outside (0,1), while offering performance comparable to that of existing alternatives in terms of bias, variability, and coverage rates in point and interval estimation of the risk difference. An application study is performed using data from the Prospective Registry Evaluating Myocardial Infarction: Event and Recovery (PREMIER) study.
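One constrained approach in this spirit can be sketched as maximum likelihood for a linear binomial model with inequality constraints that keep every fitted probability inside (0, 1), solved with SLSQP. The covariates and true parameter values below are hypothetical, and this is only one member of the framework the paper develops.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 2000
treat = rng.integers(0, 2, n)
z = rng.uniform(size=n)
p_true = 0.15 + 0.10 * treat + 0.20 * z        # true adjusted risk difference: 0.10
y = (rng.random(n) < p_true).astype(int)

X = np.column_stack([np.ones(n), treat, z])
eps = 1e-6

def negloglik(b):
    # linear binomial model: P(y=1) is linear in the covariates
    p = np.clip(X @ b, eps, 1 - eps)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# inequality constraints: every fitted probability stays in (eps, 1 - eps)
cons = [{"type": "ineq", "fun": lambda b: X @ b - eps},
        {"type": "ineq", "fun": lambda b: 1 - eps - X @ b}]
fit = minimize(negloglik, x0=np.array([0.3, 0.0, 0.0]),
               constraints=cons, method="SLSQP")
rd_hat = fit.x[1]                              # adjusted risk difference estimate
print(rd_hat)                                  # close to 0.10
```

The coefficient on the treatment indicator is directly interpretable as the covariate-adjusted risk difference, which is the whole appeal of the linear binomial parameterization.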

10.
Cluster randomized trials (CRTs) frequently recruit a small number of clusters, therefore necessitating the application of small-sample corrections for valid inference. A recent systematic review indicated that CRTs reporting right-censored, time-to-event outcomes are not uncommon and that the marginal Cox proportional hazards model is one of the common approaches used for primary analysis. While small-sample corrections have been studied under marginal models with continuous, binary, and count outcomes, no prior research has been devoted to the development and evaluation of bias-corrected sandwich variance estimators when clustered time-to-event outcomes are analyzed by the marginal Cox model. To improve current practice, we propose nine bias-corrected sandwich variance estimators for the analysis of CRTs using the marginal Cox model and report on a simulation study to evaluate their small-sample properties. Our results indicate that the optimal choice of bias-corrected sandwich variance estimator for CRTs with survival outcomes can depend on the variability of cluster sizes and can also differ slightly depending on whether it is evaluated according to relative bias or type I error rate. Finally, we illustrate the new variance estimators in a real-world CRT where the conclusion about intervention effectiveness differs depending on the use of small-sample bias corrections. The proposed sandwich variance estimators are implemented in the R package CoxBcv.

11.
Bird ring‐recovery data have been widely used to estimate demographic parameters such as survival probabilities since the mid‐20th century. However, while the total number of birds ringed each year is usually known, historical information on age at ringing is often not available. A standard ring‐recovery model, for which information on age at ringing is required, cannot be used when historical data are incomplete. We develop a new model to estimate age‐dependent survival probabilities from such historical data when age at ringing is not recorded; we call this the historical data model. This new model provides an extension to the model of Robinson, 2010, Ibis, 152, 651–795 by estimating the proportion of the ringed birds marked as juveniles as an additional parameter. We conduct a simulation study to examine the performance of the historical data model and compare it with other models including the standard and conditional ring‐recovery models. Simulation studies show that the approach of Robinson, 2010, Ibis, 152, 651–795 can cause bias in parameter estimates. In contrast, the historical data model yields similar parameter estimates to the standard model. Parameter redundancy results show that the newly developed historical data model is comparable to the standard ring‐recovery model, in terms of which parameters can be estimated, and has fewer identifiability issues than the conditional model. We illustrate the new proposed model using Blackbird and Sandwich Tern data. The new historical data model allows us to make full use of historical data and estimate the same parameters as the standard model with incomplete data, and in doing so, detect potential changes in demographic parameters further back in time.

12.
An evaluation of randomization models for nested species subsets analysis (cited by 5: 0 self-citations, 5 by others)
Randomization models, often termed “null” models, have been widely used since the 1970s in studies of species community and biogeographic patterns. More recently they have been used to test for nested species subset patterns (or nestedness) among assemblages of species occupying spatially subdivided habitats, such as island archipelagoes and terrestrial habitat patches. Nestedness occurs when the species occupying small or species-poor sites have a strong tendency to form proper subsets of richer species assemblages. In this paper, we examine the ability of several published simulation models to detect, in an unbiased way, nested subset patterns from a simple matrix of site-by-species presence-absence data. Each approach attempts to build in biological realism by following the assumption that the ecological processes that generated the patterns observed in nature would, if they could be repeated many times over using the same species and landscape configuration, produce islands with the same number of species and species present on the same number of islands as observed. In mathematical terms, the mean marginal totals (column and row sums) of many simulated matrices would match those of the observed matrix. Results of model simulations suggest that the true probability of a species occupying any given site cannot be estimated unambiguously. Nearly all of the models tested were shown to bias simulation matrices toward low levels of nestedness, increasing the probability of a Type I statistical error. Further, desired marginal totals could be obtained only through ad-hoc manipulation of the calculated probabilities. Paradoxically, when such results are achieved, the model is shown to have little statistical power to detect nestedness. This is because nestedness is determined largely by the marginal totals of the matrix themselves, as suggested earlier by Wright and Reeves. 
We conclude that at the present time, the best null model for nested subset patterns may be one based on equal probabilities of occurrence for all species. Examples of such models are readily available in the literature. Received: 3 February 1997 / Accepted: 21 September 1997
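The equiprobable null model the authors favor can be sketched directly: hold the total matrix fill fixed, fill cells with equal probability, and compare an observed nestedness statistic against the simulated null distribution. The discrepancy count below is a simple stand-in for published nestedness metrics, and the observed matrix is a toy example.

```python
import numpy as np

rng = np.random.default_rng(5)

def nestedness_discrepancy(M):
    """Count occurrences in a species-poor site that are missing from a richer site."""
    M = M[np.argsort(-M.sum(axis=1))]           # order sites from richest to poorest
    d = 0
    for i in range(len(M)):
        for j in range(i + 1, len(M)):          # site j is poorer than site i
            d += int(np.sum((M[j] == 1) & (M[i] == 0)))
    return d

# A perfectly nested observed matrix (rows = sites, columns = species)
obs = np.array([[1, 1, 1, 1, 1],
                [1, 1, 1, 1, 0],
                [1, 1, 1, 0, 0],
                [1, 1, 0, 0, 0],
                [1, 0, 0, 0, 0]])
d_obs = nestedness_discrepancy(obs)             # 0 for perfect nestedness

# Equiprobable null: same total fill, every cell equally likely to be occupied
fill = int(obs.sum())
null = []
for _ in range(500):
    flat = np.zeros(obs.size, dtype=int)
    flat[rng.choice(obs.size, size=fill, replace=False)] = 1
    null.append(nestedness_discrepancy(flat.reshape(obs.shape)))

p_value = float(np.mean([d <= d_obs for d in null]))  # small => more nested than chance
print(d_obs, p_value)
```

Note that this null model constrains only the total fill, not the row and column sums; the paper's point is precisely that constraining marginal totals makes the test nearly powerless.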

13.
In longitudinal studies of disease, patients may experience several events through a follow‐up period. In these studies, the sequentially ordered events are often of interest and lead to problems that have received much attention recently. Issues of interest include the estimation of bivariate survival, marginal distributions, and the conditional distribution of gap times. In this work, we consider the estimation of the survival function conditional to a previous event. Different nonparametric approaches will be considered for estimating these quantities, all based on the Kaplan–Meier estimator of the survival function. We explore the finite sample behavior of the estimators through simulations. The different methods proposed in this article are applied to a dataset from a German Breast Cancer Study. The methods are used to obtain predictors for the conditional survival probabilities as well as to study the influence of recurrence in overall survival.
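The simplest conditional survival quantity can be sketched with a plain Kaplan-Meier estimator: the probability of surviving past time t given survival past s is estimated by the ratio S(t)/S(s). The simulated data (exponential event and censoring times) are illustrative only, not the breast cancer dataset.

```python
import numpy as np

def kaplan_meier(times, events, grid):
    """Kaplan-Meier survival estimate S(t) evaluated on the requested time grid."""
    S, surv_t, surv_v = 1.0, [], []
    for u in np.unique(times[events == 1]):     # distinct event times, ascending
        at_risk = np.sum(times >= u)
        d = np.sum((times == u) & (events == 1))
        S *= 1 - d / at_risk
        surv_t.append(u); surv_v.append(S)
    surv_t, surv_v = np.array(surv_t), np.array(surv_v)
    return np.array([surv_v[surv_t <= g][-1] if np.any(surv_t <= g) else 1.0
                     for g in grid])

rng = np.random.default_rng(6)
n = 4000
T = rng.exponential(scale=10.0, size=n)         # true survival: S(t) = exp(-t/10)
C = rng.exponential(scale=25.0, size=n)         # independent censoring
times = np.minimum(T, C)
events = (T <= C).astype(int)

S = kaplan_meier(times, events, grid=np.array([5.0, 10.0]))
cond = S[1] / S[0]           # estimate of P(T > 10 | T > 5); true value exp(-0.5)
print(S, cond)
```

In the gap-time setting of the paper the conditioning event is a previous recurrence rather than a time threshold, but the ratio-of-survivals idea is the common starting point.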

14.
The joint analysis of spatial and temporal processes poses computational challenges due to the data's high dimensionality. Furthermore, such data are commonly non‐Gaussian. In this paper, we introduce a copula‐based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distribution and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions.
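The separation the abstract describes — nonparametric margins handled by ranks, dependence handled by a copula — can be sketched for a Gaussian copula in a plain iid (non-spatiotemporal) setting: transform each margin to normal scores via its empirical CDF, then estimate the copula correlation from the scores. The lognormal data are illustrative.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(7)
n = 3000
# Skewed, dependent data: exponentiate a correlated bivariate normal sample,
# so the margins are lognormal and the Gaussian copula correlation is 0.6.
z = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
x = np.exp(z)

# 1) Nonparametric margins: map each column to normal scores via its ranks
u = rankdata(x, axis=0) / (n + 1)      # pseudo-observations in (0, 1)
g = norm.ppf(u)                        # normal scores

# 2) Estimate the Gaussian copula correlation from the normal scores
rho_hat = np.corrcoef(g, rowvar=False)[0, 1]
print(rho_hat)                         # close to 0.6 despite the skewed margins
```

Because the rank transform is invariant to the monotone marginal distortion, the dependence estimate is unaffected by the skewness that would bias a correlation computed on the raw data.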

15.
Bayesian segmentation of protein secondary structure (cited by 12: 0 self-citations, 12 by others)
We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for alpha-helices, beta-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
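The "Markovian in the segments" property is what makes exact summation over all 2^(L-1) segmentations of a length-L sequence tractable by dynamic programming. The toy recursion below uses a made-up per-segment score in place of the paper's structural-class likelihoods; only the shape of the DP is the point.

```python
import numpy as np

def seg_score(x):
    """Toy per-segment score (assumed): rewards homogeneous segments."""
    p = (np.sum(x) + 1) / (len(x) + 2)                    # Laplace-smoothed frequency
    return float(np.prod(np.where(x == 1, p, 1 - p))) * 0.5  # 0.5 = toy segment prior

x = np.array([1, 1, 1, 0, 0, 0, 1, 1])
L = len(x)

# F[j] = total (unnormalized) probability of x[0:j] summed over all segmentations;
# because scores factor over segments, an O(L^2) recursion sums all 2^(L-1) of them.
F = np.zeros(L + 1); F[0] = 1.0
for j in range(1, L + 1):
    F[j] = sum(F[i] * seg_score(x[i:j]) for i in range(j))

# Sanity check: the same recursion with score 1 counts the segmentations exactly.
count = np.zeros(L + 1); count[0] = 1.0
for j in range(1, L + 1):
    count[j] = sum(count[i] for i in range(j))
print(F[L], count[L])    # count[L] == 2 ** (L - 1)
```

The paper's marginal posterior modes come from running this forward pass together with the matching backward pass, exactly as in HMM forward-backward.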

16.
With the increasing use of survival models in animal breeding to address the genetic aspects of longevity and disease traits of livestock, methods to infer genetic correlations and to perform multivariate evaluations of survival traits together with other types of traits have become increasingly important. In this study we derived and implemented a bivariate quantitative genetic model for a linear Gaussian trait and a survival trait that are genetically and environmentally correlated. For the survival trait, we considered the Weibull log-normal animal frailty model. A Bayesian approach using Gibbs sampling was adopted. Model parameters were inferred from their marginal posterior distributions. The required fully conditional posterior distributions were derived, and issues of implementation are discussed. The two Weibull baseline parameters were updated jointly using a Metropolis–Hastings step. The remaining model parameters with non-normalized fully conditional distributions were updated univariately using adaptive rejection sampling. Simulation results showed that the estimated marginal posterior distributions covered the true parameter values used in the simulation of data well and placed high density on them. In conclusion, the proposed method allows inferring additive genetic and environmental correlations, and performing multivariate genetic evaluation of a linear Gaussian trait and a survival trait.

17.
The multilevel item response theory (MLIRT) models have been increasingly used in longitudinal clinical studies that collect multiple outcomes. The MLIRT models account for all the information from multiple longitudinal outcomes of mixed types (e.g., continuous, binary, and ordinal) and can provide valid inference for the overall treatment effects. However, the continuous outcomes and the random effects in the MLIRT models are often assumed to be normally distributed. The normality assumption can sometimes be unrealistic and thus may produce misleading results. The normal/independent (NI) distributions have been increasingly used to handle the outlier and heavy tail problems in order to produce robust inference. In this article, we developed a Bayesian approach that implemented the NI distributions on both continuous outcomes and random effects in the MLIRT models and discussed different strategies of implementing the NI distributions. Extensive simulation studies were conducted to demonstrate the advantage of our proposed models, which provided parameter estimates with smaller bias and more reasonable coverage probabilities. Our proposed models were applied to a motivating Parkinson's disease study, the DATATOP study, to investigate the effect of deprenyl in slowing down the disease progression.

18.
Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.

19.
J. Roy and M. J. Daniels, Biometrics, 2008, 64(2): 538-545
Summary. In this article we consider the problem of fitting pattern mixture models to longitudinal data when there are many unique dropout times. We propose a marginally specified latent class pattern mixture model. The marginal mean is assumed to follow a generalized linear model, whereas the mean conditional on the latent class and random effects is specified separately. Because the dimension of the parameter vector of interest (the marginal regression coefficients) does not depend on the assumed number of latent classes, we propose to treat the number of latent classes as a random variable. We specify a prior distribution for the number of classes, and calculate (approximate) posterior model probabilities. In order to avoid the complications with implementing a fully Bayesian model, we propose a simple approximation to these posterior probabilities. The ideas are illustrated using data from a longitudinal study of depression in HIV-infected women.

20.
I. Ostrovnaya, V. E. Seshan, and C. B. Begg, Biometrics, 2008, 64(4): 1018-1022
Summary: In a recent article Begg et al. (2007, Biometrics 63, 522-530) proposed a statistical test to determine whether or not a diagnosed second primary tumor is biologically independent of the original primary tumor, by comparing patterns of allelic losses at candidate genetic loci. The proposed concordant mutations test is a conditional test, an adaptation of Fisher's exact test, that requires no knowledge of the marginal mutation probabilities. The test was shown to have generally good properties, but is susceptible to anticonservative bias if there is wide variation in mutation probabilities between loci, or if the individual mutation probabilities of the parental alleles for individual patients differ substantially from each other. In this article, a likelihood ratio test is derived in an effort to address these validity issues. This test requires prespecification of the marginal mutation probabilities at each locus, parameters for which some information will typically be available in the literature. In simulations this test is shown to be valid, but to be considerably less efficient than the concordant mutations test for sample sizes (numbers of informative loci) typical of this problem. Much of the efficiency deficit can be recovered, however, by restricting the allelic imbalance parameter estimate to a prespecified range, assuming that this parameter is in the prespecified range.
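The conditional test the article builds on is an adaptation of Fisher's exact test applied to the table of concordant and discordant allelic losses across informative loci. A plain Fisher's exact test on a hypothetical 2x2 table (not the authors' adapted statistic) looks like this:

```python
import numpy as np
from scipy.stats import fisher_exact

# Hypothetical cross-classification of 20 informative loci for one patient:
# rows = loss in tumor 1 (yes/no), columns = loss in tumor 2 (yes/no).
table = np.array([[9, 2],
                  [1, 8]])
odds, p = fisher_exact(table, alternative="greater")
print(odds, p)   # strong concordance => evidence against independent origin
```

A small one-sided p-value indicates more concordant losses than expected under independent tumor origins; the concordant mutations test conditions similarly so that the marginal mutation probabilities drop out.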

