Found 20 similar articles (search time: 0 ms)
1.
A latent autoregressive model for longitudinal binary data subject to informative missingness (Total citations: 1; self-citations: 0; citations by others: 1)
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data are subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.
2.
3.
Marginal regression analysis of a multivariate binary response (Total citations: 2; self-citations: 0; citations by others: 2)
We propose the use of the mean parameter for regression analysis of a multivariate binary response. We model the association using dependence ratios defined in terms of the mean parameter, the components of which are the joint success probabilities of all orders. This permits flexible modelling of higher-order associations, using maximum likelihood estimation. We reanalyse two data sets, one with variable cluster size and the other a longitudinal data set with constant cluster size.
4.
5.
Models for a multivariate binary response are parameterized by univariate marginal probabilities and dependence ratios of all orders. The w-order dependence ratio is the joint success probability of w binary responses divided by the joint success probability assuming independence. This parameterization supports likelihood-based inference for both regression parameters, relating marginal probabilities to explanatory variables, and association model parameters, relating dependence ratios to simple and meaningful mechanisms. Five types of association models are proposed, where responses are (1) independent given a necessary factor for the possibility of a success, (2) independent given a latent binary factor, (3) independent given a latent beta distributed variable, (4) follow a Markov chain, and (5) follow one of two first-order Markov chains depending on the realization of a binary latent factor. These models are illustrated by reanalyzing three data sets, foremost a set of binary time series on auranofin therapy against arthritis. Likelihood-based approaches are contrasted with approaches based on generalized estimating equations. Association models specified by dependence ratios are contrasted with other models for a multivariate binary response that are specified by odds ratios or correlation coefficients.
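The dependence ratio defined in this abstract can be illustrated with a small empirical calculation. The following is a minimal sketch, not the authors' likelihood-based estimator: the function name `dependence_ratio` and the toy data are hypothetical, and the ratio is computed from raw sample proportions only.

```python
from itertools import combinations
from math import prod


def dependence_ratio(data, idx):
    """Empirical dependence ratio for the responses at positions idx.

    data: list of equal-length 0/1 tuples, one tuple per subject.
    idx:  tuple of response positions (order w = len(idx), w >= 2).
    """
    n = len(data)
    # Joint success probability: every selected response equals 1.
    joint = sum(all(row[j] == 1 for j in idx) for row in data) / n
    # Joint success probability under independence: product of marginals.
    marginals = [sum(row[j] for row in data) / n for j in idx]
    return joint / prod(marginals)


# Hypothetical toy data: 4 subjects, 3 binary responses each.
toy = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0)]
ratios = {idx: dependence_ratio(toy, idx) for idx in combinations(range(3), 2)}
print(ratios)
```

A ratio above 1 indicates that joint successes occur more often than independence would predict. The sketch assumes every marginal proportion is nonzero; otherwise the division fails.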
6.
This paper highlights the consequences of incomplete observations in the analysis of longitudinal binary data, in particular non-monotone missing data patterns. Sensitivity analysis is advocated and a method is proposed based on a log-linear model. A sensitivity parameter that represents the relationship between the response mechanism and the missing data mechanism is introduced. It is shown that although this parameter is identifiable, its estimation is highly questionable. A far better approach is to consider a range of plausible values and to estimate the parameters of interest conditionally upon each value of the sensitivity parameter. This allows us to assess the sensitivity of the study's conclusions to assumptions regarding the missing data mechanism. The method is applied to a randomized clinical trial comparing the efficacy of two treatment regimens in patients with persistent asthma.
7.
Marginalized models (Heagerty, 1999, Biometrics 55, 688-698) permit likelihood-based inference when interest lies in marginal regression models for longitudinal binary response data. Two such models are the marginalized transition and marginalized latent variable models. The former captures within-subject serial dependence among repeated measurements with transition model terms while the latter assumes exchangeable or nondiminishing response dependence using random intercepts. In this article, we extend the class of marginalized models by proposing a single unifying model that describes both serial and long-range dependence. This model will be particularly useful in longitudinal analyses with a moderate to large number of repeated measurements per subject, where both serial and exchangeable forms of response correlation can be identified. We describe maximum likelihood and Bayesian approaches toward parameter estimation and inference, and we study the large sample operating characteristics under two types of dependence model misspecification. Data from the Madras Longitudinal Schizophrenia Study (Thara et al., 1994, Acta Psychiatrica Scandinavica 90, 329-336) are analyzed.
8.
A transitional model for longitudinal binary data subject to nonignorable missing data (Total citations: 1; self-citations: 0; citations by others: 1)
Albert PS. Biometrics, 2000, 56(2): 602-608
Binary longitudinal data are often collected in clinical trials when interest is on assessing the effect of a treatment over time. Our application is a recent study of opiate addiction that examined the effect of a new treatment on repeated urine tests to assess opiate use over an extended follow-up. Drug addiction is episodic, and a new treatment may affect various features of the opiate-use process such as the proportion of positive urine tests over follow-up and the time to the first occurrence of a positive test. Complications in this trial were the large amounts of dropout and intermittent missing data and the large number of observations on each subject. We develop a transitional model for longitudinal binary data subject to nonignorable missing data and propose an EM algorithm for parameter estimation. We use the transitional model to derive summary measures of the opiate-use process that can be compared across treatment groups to assess treatment effect. Through analyses and simulations, we show the importance of properly accounting for the missing data mechanism when assessing the treatment effect in our example.
9.
Bayesian informative dropout model for longitudinal binary data with random effects using conditional and joint modeling approaches
Jennifer S. K. Chan. Biometrical Journal (Biometrische Zeitschrift), 2016, 58(3): 549-569
Dropouts are common in longitudinal studies. If the dropout probability depends on the missing observations at or after dropout, this type of dropout is called informative (or nonignorable) dropout (ID). Failure to accommodate such a dropout mechanism in the model will bias the parameter estimates. We propose a conditional autoregressive model for longitudinal binary data with an ID model such that the probabilities of positive outcomes as well as the drop-out indicator in each occasion are logit linear in some covariates and outcomes. This model, adopting a marginal model for outcomes and a conditional model for dropouts, is called a selection model. To allow for the heterogeneity and clustering effects, the outcome model is extended to incorporate mixture and random effects. Lastly, the model is further extended to a novel model that models the outcome and dropout jointly such that their dependency is formulated through an odds ratio function. Parameters are estimated by a Bayesian approach implemented using the user-friendly Bayesian software WinBUGS. A methadone clinic dataset is analyzed to illustrate the proposed models. Results show that the treatment time effect is still significant but weaker after allowing for an ID process in the data. Finally, the effect of drop-out on parameter estimates is evaluated through simulation studies.
10.
Summary. Many longitudinal studies generate both the time to some event of interest and repeated measures data. This article is motivated by a study on patients with a renal allograft, in which interest lies in the association between longitudinal proteinuria (a dichotomous variable) measurements and the time to renal graft failure. An interesting feature of the sample at hand is that nearly half of the patients never tested positive for proteinuria (≥1g/day) during follow-up, which introduces a degenerate part in the random-effects density for the longitudinal process. In this article we propose a two-part shared parameter model framework that effectively takes this feature into account, and we investigate sensitivity to the various dependence structures used to describe the association between the longitudinal measurements of proteinuria and the time to renal graft failure.
11.
When novel scientific questions arise after longitudinal binary data have been collected, the subsequent selection of subjects from the cohort for whom further detailed assessment will be undertaken is often necessary to efficiently collect new information. Key examples of additional data collection include retrospective questionnaire data, novel data linkage, or evaluation of stored biological specimens. In such cases, all data required for the new analyses are available except for the new target predictor or exposure. We propose a class of longitudinal outcome-dependent sampling schemes and detail a design corrected conditional maximum likelihood analysis for highly efficient estimation of time-varying and time-invariant covariate coefficients when resource limitations prohibit exposure ascertainment on all participants. Additionally, we detail an important study planning phase that exploits available cohort data to proactively examine the feasibility of any proposed substudy as well as to inform decisions regarding the most desirable study design. The proposed designs and associated analyses are discussed in the context of a study that seeks to examine the modifying effect of an interleukin-10 cytokine single nucleotide polymorphism on asthma symptom regression in adolescents participating in the Childhood Asthma Management Program Continuation Study. Using this example we assume that all data necessary to conduct the study are available except subject-specific genotype data. We also assume that these data would be ascertained by analyzing stored blood samples, the cost of which limits the sample size.
12.
13.
14.
A simulation was carried out to investigate the methods of analyzing uncertain binary responses for success or failure at first insemination. A linear mixed model that included herd, year, and month of mating as fixed effects, and unrelated service sire, sire, and residual as random effects, was used to generate binary data. Binary responses were assigned using the difference between days to calving and average gestation length. Females deviating from average gestation length lead to uncertain binary responses. Thus, the methods investigated were the following: (1) a threshold model fitted to certain (no uncertainty) binary data (M1); (2) a threshold model fitted to uncertain binary data ignoring uncertainty (M2); and (3) analysis of uncertain binary data, accounting for uncertainty from day 16 to 26 (M3) or from day 14 to 28 (M4) after introduction of the bull, using a threshold model with fuzzy logic classification. There was virtually no difference between point estimates obtained from M1, M3, and M4 and the true values. When uncertain binary data were analyzed ignoring uncertainty (M2), sire variance and heritability were underestimated by 22 and 24%, respectively. Thus, for noisy binary data, a threshold model contemplating uncertainty is needed to avoid bias when estimating genetic parameters.
15.
Reilly C. Biostatistics (Oxford, England), 2005, 6(2): 271-278
Here we develop a completely nonparametric method for comparing two groups on a set of longitudinal measurements. No assumptions are made about the form of the mean response function, the covariance structure or the distributional form of disturbances around the mean response function. The solution proposed here is based on the realization that every longitudinal data set can also be thought of as a collection of survival data sets where the events of interest are level crossings. The method for testing for differences in the longitudinal measurements then is as follows: for an arbitrarily large set of levels, for each subject determine the first time the subject has an upcrossing and a downcrossing for each level. For each level one then computes the log rank statistic and uses the maximum in absolute value of all these statistics as the test statistic. By permuting group labels we obtain a permutation test of the hypothesis that the joint distribution of the measurements over time does not depend on group membership. Simulations are performed to investigate the power, and the method is applied to the area that motivated it: the analysis of microarrays. In this area small sample sizes, few time points and far too many genes to consider genuine gene level longitudinal modeling have created a need for a simple, model free test to screen for interesting features in the data.
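The first step of the level-crossing procedure described in this abstract, turning one subject's trajectory into first-crossing times, can be sketched as follows. This is a hedged illustration only: the function name `first_crossings` is hypothetical, the crossing time is assumed to be the time of the later of the two observations that straddle the level, and the log rank and permutation steps are not shown.

```python
def first_crossings(times, values, level):
    """Return (first upcrossing time, first downcrossing time) for one subject.

    times, values: equal-length sequences of observation times and measurements.
    level:         the threshold being crossed.
    A crossing that never occurs is reported as None (a censored event).
    """
    first_up = first_down = None
    for k in range(1, len(values)):
        prev, curr = values[k - 1], values[k]
        # Upcrossing: below the level, then at or above it.
        if first_up is None and prev < level <= curr:
            first_up = times[k]
        # Downcrossing: at or above the level, then below it.
        if first_down is None and prev >= level > curr:
            first_down = times[k]
    return first_up, first_down


# Hypothetical trajectory for one subject.
times = [0, 1, 2, 3]
values = [0.2, 1.5, 0.7, 2.0]
print(first_crossings(times, values, 1.0))
print(first_crossings(times, values, 3.0))
```

Repeating this for every subject and every level yields one survival data set per level, to which a log rank statistic could then be applied.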
16.
Reliable estimates of past land cover are critical for assessing potential effects of anthropogenic land-cover changes on past earth surface-climate feedbacks and landscape complexity. Fossil pollen records from lakes and bogs have provided important information on past natural and human-induced vegetation cover. However, those records provide only point estimates of past land cover, and not the spatially continuous maps at regional and sub-continental scales needed for climate modelling. We propose a set of statistical models that create spatially continuous maps of past land cover by combining two data sets: 1) pollen-based point estimates of past land cover (from the REVEALS model) and 2) spatially continuous estimates of past land cover, obtained by combining simulated potential vegetation (from LPJ-GUESS) with an anthropogenic land-cover change scenario (KK10). The proposed models rely on statistical methodology for compositional data and use Gaussian Markov Random Fields to model spatial dependencies in the data. Land-cover reconstructions are presented for three time windows in Europe: 0.05, 0.2, and 6 ka years before present (BP). The models are evaluated through cross-validation, deviance information criteria and by comparing the reconstruction of the 0.05 ka time window to the present-day land-cover data compiled by the European Forest Institute (EFI). For 0.05 ka, the proposed models provide reconstructions that are closer to the EFI data than either the REVEALS- or LPJ-GUESS/KK10-based estimates; thus the statistical combination of the two estimates improves the reconstruction. The reconstruction by the proposed models for 0.2 ka is also good. For 6 ka, however, the large differences between the REVEALS- and LPJ-GUESS/KK10-based estimates reduce the reliability of the proposed models. Possible reasons for the increased differences between REVEALS and LPJ-GUESS/KK10 for older time periods and further improvement of the proposed models are discussed.
17.
Inference of population structure from genetic data plays an important role in population and medical genetics studies. With the advancement and decreasing cost of sequencing technology, the increasingly available whole genome sequencing data provide much richer information about the underlying population structure. The traditional method originally developed for array-based genotype data for computing and selecting top principal components (PCs) that capture population structure may not perform well on sequencing data for two reasons. First, the number of genetic variants p is much larger than the sample size n in sequencing data such that the sample-to-marker ratio is nearly zero, violating the assumption of the Tracy-Widom test used in their method. Second, their method might not be able to handle the linkage disequilibrium well in sequencing data. To resolve those two practical issues, we propose a new method called ERStruct to determine the number of top informative PCs based on sequencing data. More specifically, we propose to use the ratio of consecutive eigenvalues as a more robust test statistic, and then we approximate its null distribution using modern random matrix theory. Both simulation studies and applications to two public data sets from the HapMap 3 and the 1000 Genomes Projects demonstrate the empirical performance of our ERStruct method.
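The statistic named in this abstract, the ratio of consecutive eigenvalues, is simple to compute once the eigenvalues are available. The sketch below is an illustration under stated assumptions and is not the ERStruct procedure: the function names and toy eigenvalues are hypothetical, and the number of PCs is picked by a naive "smallest ratio" rule instead of the null distribution from random matrix theory that the abstract describes.

```python
def eigenvalue_ratios(eigenvalues):
    """Ratios of consecutive eigenvalues, after sorting in descending order."""
    eigs = sorted(eigenvalues, reverse=True)
    return [eigs[i + 1] / eigs[i] for i in range(len(eigs) - 1)]


def largest_gap_index(eigenvalues):
    """Naive choice of the number of top PCs: position of the smallest ratio.

    This heuristic stands in for a formal test and is an assumption of this
    sketch, not part of the published method.
    """
    ratios = eigenvalue_ratios(eigenvalues)
    return min(range(len(ratios)), key=ratios.__getitem__) + 1


# Hypothetical eigenvalues: two large "signal" values, then a flat bulk.
toy_eigs = [50.0, 40.0, 2.0, 1.8, 1.7]
print(eigenvalue_ratios(toy_eigs))
print(largest_gap_index(toy_eigs))
```

A small ratio marks a sharp drop between neighbouring eigenvalues, which is why the position of the minimum is used here as a rough cut-off.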
18.
An earlier analysis of the trnL intron in the Colletieae (Rhamnaceae) showed polyphyly of the genus Discaria. Polyphyly of Discaria is supported only by an AT-rich region of ambiguous alignment within the trnL intron. Polyphyly of the genus relies on extracting the information of the AT-rich region correctly. Ambiguously aligned regions are commonly excluded from phylogenetic analysis. In the present study the question was raised whether random or noisy data could generate a pattern like the one found in the AT-rich region of ambiguous alignment. The original pattern was resistant to changes in alignment parameter cost when submitted to a sensitivity analysis using direct optimization. Artificially generated random or noisy data gave well-resolved trees but these were found to be extremely sensitive to changes in parameter costs. However, information from additional data, such as conserved regions, restricts the influence of random data. It is here suggested that the information in ambiguously aligned regions need not be dismissed, provided that an appropriate method that finds all possible optimal alignments is used to extract the information. In addition to commonly used support measures, some information of robustness to changes in alignment parameter costs is needed in order to make the most reliable conclusions.
19.
Several different methods of analysis are applied to data consisting of weight measurements, taken at specified post-treatment times, of harvested thyroids from rats given one of four treatments. Previous studies of this type of data indicated that the growth is initially rapid, and that a second phase of less rapid growth is followed by a final phase in which little additional growth occurs. The data are further characterized by increasing variance through time. The primary purpose of the analysis is to study the effect of the treatments at the end of the study period. One-way analysis of variance tests among groups are performed on each day, but the results are not particularly helpful. However, results from two-way analyses of variance (over subsets of days and groups) are consistent with the three phase model and accordingly indicate significant group differences during each. Finally, maximum likelihood methods are used to fit a three part segmented linear regression model.
20.
The Extracellular signal Regulated Kinase (ERK) pathway is one of the most well-studied signaling pathways in cell cycle regulation. Disruption in the normal functioning of this pathway is linked to many forms of cancer. In a previous study [D.K. Pant, A. Ghosh, Automated oncogene detection in complex protein networks, with applications to the MAPK signal transduction pathway, Biophys. Chem. 113 (2005) 275-288.], we developed a novel approach to predict single point mutations that are likely to cause cellular transformation in signal transduction networks. We have extended this method to study disparate pair mutations in enzyme/protein interactions and in expression levels in signal transduction pathways, and have applied it to the MAPK signaling pathway to study how synergistic or cooperative mutations within signaling networks act in unison to cause malignant transformation. The method provides a quantitative ranking of the modifier pairs of ERK activation. It is seen that the highest ranking single point mutations comprise the highest ranking pair mutations. We validate some of our results with experimental literature on multiple mutations. A second order sensitivity analysis scheme is additionally used to determine the effect of correlations among mutations at different sites in the pathways.