Similar literature (20 results)
1.
Marginalized models (Heagerty, 1999, Biometrics 55, 688-698) permit likelihood-based inference when interest lies in marginal regression models for longitudinal binary response data. Two such models are the marginalized transition and marginalized latent variable models. The former captures within-subject serial dependence among repeated measurements with transition model terms while the latter assumes exchangeable or nondiminishing response dependence using random intercepts. In this article, we extend the class of marginalized models by proposing a single unifying model that describes both serial and long-range dependence. This model will be particularly useful in longitudinal analyses with a moderate to large number of repeated measurements per subject, where both serial and exchangeable forms of response correlation can be identified. We describe maximum likelihood and Bayesian approaches toward parameter estimation and inference, and we study the large sample operating characteristics under two types of dependence model misspecification. Data from the Madras Longitudinal Schizophrenia Study (Thara et al., 1994, Acta Psychiatrica Scandinavica 90, 329-336) are analyzed.

2.
Summary. Growth curve data consist of repeated measurements of a continuous growth process over time in a population of individuals. These data are classically analyzed by nonlinear mixed models. However, the standard growth functions used in this context prescribe monotone increasing growth and can fail to model unexpected changes in growth rates. We propose to model these variations using stochastic differential equations (SDEs) that are deduced from the standard deterministic growth function by adding random variations to the growth dynamics. A Bayesian inference of the parameters of these SDE mixed models is developed. In the case when the SDE has an explicit solution, we describe an easily implemented Gibbs algorithm. When the conditional distribution of the diffusion process has no explicit form, we propose to approximate it using the Euler–Maruyama scheme. Finally, we suggest validating the SDE approach via criteria based on the predictive posterior distribution. We illustrate the efficiency of our method using the Gompertz function to model data on chicken growth, the modeling being improved by the SDE approach.
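The Euler–Maruyama approximation mentioned in this abstract can be illustrated with a minimal Python sketch. The SDE form used here (a Gompertz drift `b * c * exp(-c * t) * x` with multiplicative noise `gamma * x`), the function names, and all parameter values are assumptions made for illustration only; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, t0, t1, n_steps, rng):
    """Simulate one path of dX = drift(X, t) dt + diffusion(X, t) dW
    on [t0, t1] with n_steps equal Euler-Maruyama steps."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    path = [x0]
    for _ in range(n_steps):
        # Brownian increment over one step: Normal(0, dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
        t += dt
        path.append(x)
    return path


def gompertz_sde_path(a, b, c, gamma, t1, n_steps, seed=0):
    """Hypothetical Gompertz-type SDE (an assumed form, for illustration).

    With gamma = 0 the path approximates the deterministic Gompertz
    curve x(t) = a * exp(-b * exp(-c * t)).
    """
    rng = random.Random(seed)
    x0 = a * math.exp(-b)  # value of the Gompertz curve at t = 0

    def drift(x, t):
        return b * c * math.exp(-c * t) * x

    def diffusion(x, t):
        return gamma * x

    return euler_maruyama(drift, diffusion, x0, 0.0, t1, n_steps, rng)


if __name__ == "__main__":
    path = gompertz_sde_path(a=3000.0, b=5.0, c=0.1, gamma=0.05,
                             t1=40.0, n_steps=4000, seed=1)
    print(f"simulated final value: {path[-1]:.1f}")
```

Setting `gamma` to zero recovers a plain Euler discretisation of the deterministic curve, which gives a simple check that the drift term is coded consistently with the growth function.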

3.
Longitudinal data are frequently treated with the classic analysis of variance and regression models. However, these models assume independence of observations. Hoel (1964) demonstrated that the use of least-squares methods on intercorrelated serial observations results in the rejection of the null hypothesis much too frequently. Although appropriate models for analyzing longitudinal data have been available for quite some time, they have remained inaccessible due to cumbersome matrix manipulations. We implement Rao's (1959) one-sample polynomial growth curve model using the programming capability and matrix language of SAS, which involves testing the goodness-of-fit and calculation of confidence bands for polynomial growth curves fit to data at equally spaced time points. Confidence intervals for the parameters themselves are also computed. The method and program (presented in the Appendix) are illustrated with examples involving mandibular ramus height in 12 young male rhesus monkeys. The data set, which spans a 4 year period (yearly observations), is fit adequately by a quadratic equation. The data spanning a 2 year period (half-year observations) are fit adequately by the linear equation. These examples illustrate the considerable widening of confidence bands that occurs when polynomial equations having more terms than are needed to meet the goodness-of-fit requirement are considered.

4.
Huang X, Tebbs JM. Biometrics 2009, 65(3):710-718
Summary. We consider structural measurement error models for a binary response. We show that likelihood-based estimators obtained from fitting structural measurement error models with pooled binary responses can be far more robust to covariate measurement error in the presence of latent-variable model misspecification than the corresponding estimators from individual responses. Furthermore, despite the loss in information, pooling can provide improved parameter estimators in terms of mean-squared error. Based on these and other findings, we create a new diagnostic method to detect latent-variable model misspecification in structural measurement error models with individual binary response. We use simulation and data from the Framingham Heart Study to illustrate our methods.

5.
A working guide to boosted regression trees (cited 33 times in total: 0 self-citations, 33 citations by others)
1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions. 2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion. 3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods. 4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. 
We provide code and a tutorial to enable the wider use of BRT by ecologists.
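The description of BRT as an additive model whose terms are simple trees fitted in a forward, stagewise fashion can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: a single predictor, squared-error loss, one-split trees (stumps), and hypothetical function names and settings (`n_trees`, `learning_rate`). It is not the code or tutorial the authors provide.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to residuals by least squares.

    Assumes x contains at least two distinct values.
    Returns (split_point, left_mean, right_mean).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # no valid split between tied values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = (sum((r - mean_l) ** 2 for r in left)
               + sum((r - mean_r) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, mean_l, mean_r)
    _, split, mean_l, mean_r = best
    return split, mean_l, mean_r


def predict_stump(stump, xi):
    split, mean_l, mean_r = stump
    return mean_l if xi <= split else mean_r


def boost(x, y, n_trees=200, learning_rate=0.1):
    """Forward stagewise fitting: each stump is fitted to the current
    residuals and added to the model after shrinkage."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return base, stumps, learning_rate


def predict(model, xi):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


if __name__ == "__main__":
    xs = [i / 10 for i in range(50)]
    ys = [(v - 2.5) ** 2 for v in xs]  # a nonlinear toy response
    model = boost(xs, ys)
    print(round(predict(model, 2.5), 3), round(predict(model, 0.0), 3))
```

Many small, shrunken trees are combined here instead of one large tree, which is the trade-off the abstract points to: each term is simple, but the sum can follow a nonlinear response.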

6.
7.
Albert PS, Follmann DA, Wang SA, Suh EB. Biometrics 2002, 58(3):631-642
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data are subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.

8.
In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of pre-specified occasions. Instead, individuals may be measured at irregular intervals, with those having a history of poorer health outcomes being measured with somewhat greater frequency and regularity. In this paper, we consider likelihood-based estimation of the regression parameters in marginal models for longitudinal binary data when the follow-up times are not fixed by design, but can depend on previous outcomes. In particular, we consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome measurement process. The practical implication of this separation is that the follow-up time process can be ignored when making likelihood-based inferences about the marginal regression model parameters. That is, maximum likelihood (ML) estimation of the regression parameters relating the probability of success at a given time to covariates does not require that a model for the distribution of follow-up times be specified. However, to obtain consistent parameter estimates, the multinomial distribution for the vector of repeated binary outcomes must be correctly specified. In general, ML estimation requires specification of all higher-order moments and the likelihood for a marginal model can be intractable except in cases where the number of repeated measurements is relatively small. To circumvent these difficulties, we propose a pseudolikelihood for estimation of the marginal model parameters. The pseudolikelihood uses a linear approximation for the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only. 
When the follow-up times depend only on the previous outcome, the pseudolikelihood requires correct specification of the conditional distribution of the current outcome given the outcome at the previous occasion only. Results from a simulation study and a study of asymptotic bias are presented. Finally, we illustrate the main results using data from a longitudinal observational study that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children.

9.
In the field of pharmaceutical drug development, there have been extensive discussions on the establishment of statistically significant results that demonstrate the efficacy of a new treatment with multiple co‐primary endpoints. When designing a clinical trial with such multiple co‐primary endpoints, it is critical to determine the appropriate sample size for indicating the statistical significance of all the co‐primary endpoints while preserving the desired overall power, because the type II error rate increases with the number of co‐primary endpoints. We consider overall power functions and sample size determinations with multiple co‐primary endpoints that consist of mixed continuous and binary variables, and provide numerical examples to illustrate the behavior of the overall power functions and sample sizes. In formulating the problem, we assume that response variables follow a multivariate normal distribution, where binary variables are observed in a dichotomized normal distribution with a certain point of dichotomy. Numerical examples show that the sample size decreases as the correlation increases when the individual powers of each endpoint are approximately and mutually equal.
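The remark that overall power depends on both the number of co‐primary endpoints and their correlation can be made concrete with two elementary bounds. The sketch below is an illustration, not the power functions derived in the paper: it assumes the endpoint test statistics are jointly normal with non-negative correlations, in which case the probability that all tests reject is at least the product of the individual powers (the independence case) and never exceeds the smallest individual power. The function name is hypothetical.

```python
def overall_power_bounds(individual_powers):
    """Return (lower, upper) bounds on the probability that every
    co-primary endpoint reaches significance.

    lower: product of individual powers (attained under independence;
           a lower bound when correlations are non-negative).
    upper: smallest individual power (the joint event cannot be more
           likely than any single one of its components).
    """
    if not individual_powers:
        raise ValueError("at least one endpoint is required")
    product = 1.0
    for p in individual_powers:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each power must lie in [0, 1]")
        product *= p
    return product, min(individual_powers)


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        lower, upper = overall_power_bounds([0.9] * k)
        print(k, round(lower, 4), upper)
```

With equal individual powers, the lower bound shrinks as endpoints are added while the upper bound stays fixed, which is consistent with the abstract's observation that higher correlation reduces the required sample size.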

10.
11.
Nonadherence to assigned treatment is common in randomized controlled trials (RCTs). Recently, there has been increased interest in estimating causal effects of treatment received, for example, the so‐called local average treatment effect (LATE). Instrumental variables (IV) methods can be used for identification, with estimation proceeding either via fully parametric mixture models or two‐stage least squares (TSLS). TSLS is popular but can be problematic for binary outcomes where the estimand of interest is a causal odds ratio. Mixture models are rarely used in practice, perhaps because of their perceived complexity and need for specialist software. Here, we propose using multiple imputation (MI) to impute the latent compliance class appearing in the mixture models. Since such models include an interaction term between the latent compliance class and randomized treatment, we use “substantive model compatible” MI (SMC MIC), which can additionally handle missing data in outcomes and other variables in the model, before fitting the mixture models via maximum likelihood to the MI data sets and combining results via Rubin's rules. We use simulations to compare the performance of SMC MIC to existing approaches and also illustrate the methods by reanalyzing an RCT in UK primary health. We show that SMC MIC can be more efficient than full Bayesian estimation when auxiliary variables are incorporated, and is superior to two‐stage methods, especially for binary outcomes.

12.
The weights used in iterative weighted least squares (IWLS) regression are usually estimated parametrically using a working model for the error variance. When the variance function is misspecified, the IWLS estimates of the regression coefficients β are still asymptotically consistent but there is some loss in efficiency. Since second moments can be quite hard to model, it makes sense to estimate the error variances nonparametrically and to employ weights inversely proportional to the estimated variances in computing the WLS estimate for β. Surprisingly, this approach had not received much attention in the literature. The aim of this note is to demonstrate that such a procedure can be implemented easily in S-plus using standard functions with default options making it suitable for routine applications. The particular smoothing method that we use is local polynomial regression applied to the logarithm of the squared residuals but other smoothers can be tried as well. The proposed procedure is applied to data on the use of two different assay methods for a hormone. Efficiency calculations based on the estimated model show that the nonparametric IWLS estimates are more efficient than the parametric IWLS estimates based on three different plausible working models for the variance function. The proposed estimators also perform well in a simulation study using both parametric and nonparametric variance functions as well as normal and gamma errors.

13.
Neuhaus JM, Scott AJ, Wild CJ. Biometrics 2006, 62(2):488-494
Case-control studies augmented by the values of responses and covariates from family members allow investigators to study the association between the response and genetics and environment by relating differences in the response directly to within-family differences in covariates. However, existing approaches for case-control family data parameterize covariate effects in terms of the marginal probability of response, the same effects that one estimates from standard case-control studies. This article focuses on the estimation of family-specific covariate effects and develops efficient methods to fit family-specific models such as binary mixed-effects models. We also extend the approach to cover any setting where one has a fully specified model for the vector of responses in a family. We illustrate our approach using data from a case-control family study of brain cancer and consider the use of weighted and conditional likelihood methods as alternatives.

14.
The relationship between the modern univariate mixed model for analyzing longitudinal data, popularized by Laird and Ware (1982, Biometrics 38, 963-974), and its predecessor, the classical multivariate growth curve model, summarized by Grizzle and Allen (1969, Biometrics 25, 357-381), has never been clearly established. Here, the link between the two methodologies is derived, and balanced polynomial and cosinor examples cited in the literature are analyzed with both approaches. Relating the two models demonstrates that classical covariance adjustment for higher-order terms is analogous to including them as random effects in the mixed model. The polynomial example clearly illustrates the relationship between the methodologies and shows their equivalence when all matrices are properly defined. The cosinor example demonstrates how results from each method may differ when the total variance-covariance matrix is positive definite, but that the between-subjects component of that matrix is not so constrained by the growth curve approach. Additionally, advocates of each approach tend to consider different covariance structures. Modern mixed model analysts consider only those terms in a model's expectation (or linear combinations), and preferably the most parsimonious subset, as candidates for random effects. Classical growth curve analysts automatically consider all terms in a model's expectation as random effects and then investigate whether "covariance adjusting" for higher-order terms improves the model. We apply mixed model techniques to cosinor analyses of a large, unbalanced data set to demonstrate the relevance of classical covariance structures that were previously conceived for use only with completely balanced data.

15.

Background  

In dynamical models with feedback and sigmoidal response functions, some or all variables have thresholds around which they regulate themselves or other variables. A mathematical analysis has shown that when the dose-response functions approach binary or on/off responses, any variable with an equilibrium value close to one of its thresholds is very robust to parameter perturbations of a homeostatic state. We denote this threshold robustness. To check the empirical relevance of this phenomenon with response function steepnesses ranging from a near on/off response down to Michaelis-Menten conditions, we have performed a simulation study to investigate the degree of threshold robustness in models for a three-gene system with one downstream gene, using several logical input gates, but excluding models with positive feedback to avoid multistationarity. Varying parameter values representing functional genetic variation, we have analysed the coefficient of variation (CV) of the gene product concentrations in the stable state for the regulating genes in absolute terms and compared to the CV for the unregulating downstream gene. The sigmoidal or binary dose-response functions in these models can be considered as phenomenological models of the aggregated effects on protein or mRNA expression rates of all cellular reactions involved in gene expression.

16.
Binary regression models for spatial data are commonly used in disciplines such as epidemiology and ecology. Many spatially referenced binary data sets suffer from location error, which occurs when the recorded location of an observation differs from its true location. When location error occurs, values of the covariates associated with the true spatial locations of the observations cannot be obtained. We show how a change of support (COS) can be applied to regression models for binary data to provide coefficient estimates when the true values of the covariates are unavailable, but the unknown locations of the observations are contained within nonoverlapping arbitrarily shaped polygons. The COS accommodates spatial and nonspatial covariates and preserves the convenient interpretation of methods such as logistic and probit regression. Using a simulation experiment, we compare binary regression models with a COS to naive approaches that ignore location error. We illustrate the flexibility of the COS by modeling individual-level disease risk in a population using a binary data set where the locations of the observations are unknown but contained within administrative units. Our simulation experiment and data illustration corroborate that conventional regression models for binary data that ignore location error are unreliable, but that the COS can be used to eliminate bias while preserving model choice.

17.
Data arising from social systems are often highly complex, involving non-linear relationships between the macro-level variables that characterize these systems. We present a method for analyzing this type of longitudinal or panel data using differential equations. We identify the best non-linear functions that capture interactions between variables, employing Bayes factors to decide how many interaction terms should be included in the model. This method punishes overly complicated models and identifies models with the most explanatory power. We illustrate our approach on the classic example of relating democracy and economic growth, identifying non-linear relationships between these two variables. We show how multiple variables and variable lags can be accounted for and provide a toolbox in R to implement our approach.

18.
Wang YG. Biometrics 1999, 55(3):900-903
James (1991, Biometrics 47, 1519-1530) constructed unbiased estimating functions for estimating the two parameters in the von Bertalanffy growth curve from tag-recapture data. This paper provides unbiased estimating functions for a class of growth models that incorporate stochastic components and explanatory variables. A simulation study using seasonal growth models indicates that the proposed method works well while the least-squares methods that are commonly used in the literature may produce substantially biased estimates. The proposed model and method are also applied to real data from tagged rock lobsters to assess the possible seasonal effect on growth.

19.
Researchers usually estimate benchmark dose (BMD) for dichotomous experimental data using a binomial model with a single response function. Several forms of response function have been proposed to fit dose–response models to estimate the BMD and the corresponding benchmark dose lower bound (BMDL). However, if the assumed response function is not correct, then the estimated BMD and BMDL from the fitted model may not be accurate. To account for model uncertainty, model averaging (MA) methods are proposed to estimate BMD averaging over a model space containing a finite number of standard models. Usual model averaging focuses on a pre-specified list of parametric models leading to pitfalls when none of the models in the list is the correct model. Here, an alternative which augments an initial list of parametric models with an infinite number of additional models having varying response functions has been proposed to estimate BMD for dichotomous response data. In addition, different methods for estimating BMDL based on the family of response functions are derived. The proposed approach is compared with MA in a simulation study and applied to a real dataset. Simulation studies are also conducted to compare the four methods of estimating BMDL.

20.
The Cox proportional hazards model has become the standard for the analysis of survival time data in cancer and other chronic diseases. In most studies, proportional hazards (PH) are assumed for covariate effects. With long-term follow-up, the PH assumption may be violated, leading to poor model fit. To accommodate non-PH effects, we introduce a new procedure, MFPT, an extension of the multivariable fractional polynomial (MFP) approach, to do the following: (1) select influential variables; (2) determine a sensible dose-response function for continuous variables; (3) investigate time-varying effects; (4) model such time-varying effects on a continuous scale. Assuming PH initially, we start with a detailed model-building step, including a search for possible non-linear functions for continuous covariates. Sometimes a variable with a strong short-term effect may appear weak or non-influential if 'averaged' over time under the PH assumption. To protect against omitting such variables, we repeat the analysis over a restricted time-interval. Any additional prognostic variables identified by this second analysis are added to create our final time-fixed multivariable model. Using a forward-selection algorithm we search for possible improvements in fit by adding time-varying covariates. The first part to create a final time-fixed model does not require the use of MFP. A model may be given from 'outside' or a different strategy may be preferred for this part. This broadens the scope of the time-varying part. To motivate and illustrate the methodology, we create prognostic models from a large database of patients with primary breast cancer. Non-linear time-fixed effects are found for progesterone receptor status and number of positive lymph nodes. Highly statistically significant time-varying effects are present for progesterone receptor status and tumour size.
