1.

Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations

Huang Y, Leroux B 《Biometrics》2011,67(3):843-851

Williamson, Datta, and Satten's (2003, Biometrics 59, 36–42) cluster‐weighted generalized estimating equations (CWGEEs) are effective in adjusting for bias due to informative cluster sizes for cluster‐level covariates. We show that CWGEE may not perform well, however, for covariates that can take different values within a cluster if the numbers of observations at each covariate level are informative. On the other hand, inverse probability of treatment weighting accounts for informative treatment propensity but not for informative cluster size. Motivated by evaluating the effect of a binary exposure in the presence of such types of informativeness, we propose several weighted GEE estimators, with weights related to the size of a cluster as well as the distribution of the binary exposure within the cluster. Choice of the weights depends on the population of interest and the nature of the exposure. Through simulation studies, we demonstrate the superior performance of the new estimators compared to existing estimators such as those from GEE, CWGEE, and inverse probability of treatment‐weighted GEE. We demonstrate the use of our method using an example examining covariate effects on the risk of dental caries among small children.

2.

Marginal analysis of multiple outcomes with informative cluster size

A. A. Mitani, E. K. Kaye, K. P. Nelson 《Biometrics》2021,77(1):271-282

In surveillance studies of periodontal disease, the relationship between disease and other health and socioeconomic conditions is of key interest. To determine whether a patient has periodontal disease, multiple clinical measurements (e.g., clinical attachment loss, alveolar bone loss, and tooth mobility) are taken at the tooth level. Researchers often create a composite outcome from these measurements or analyze each outcome separately. Moreover, patients have varying numbers of teeth, with those who are more prone to the disease having fewer teeth compared to those with good oral health. Such dependence between the outcome of interest and cluster size (number of teeth) is called informative cluster size, and results obtained from fitting conventional marginal models can be biased. We propose a novel method to jointly analyze multiple correlated binary outcomes for clustered data with informative cluster size using the class of generalized estimating equations (GEE) with cluster‐specific weights. We compare our proposed multivariate outcome cluster‐weighted GEE results to those from the conventional GEE using the baseline data from the Veterans Affairs Dental Longitudinal Study. In an extensive simulation study, we show that our proposed method yields estimates with minimal relative biases and excellent coverage probabilities.

3.

Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models

Jingxia Liu, Graham A. Colditz 《Biometrical journal. Biometrische Zeitschrift》2018,60(3):616-638

There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, the cluster sizes are assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyze a set of correlated data is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which the “working correlation structure” is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two‐group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of the variance of the estimator of the treatment effect for equal cluster sizes to that for unequal cluster sizes. We discuss a structure commonly used in CRTs, the exchangeable structure, and derive a simpler formula for RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size due to efficiency loss. Additionally, we also propose an optimal sample size estimation based on the GEE models under a fixed budget for known and unknown association parameter (ρ) in the working correlation structure within the cluster.
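
The RE definition in this abstract can be illustrated with a minimal sketch. It assumes a continuous outcome, an exchangeable correlation ρ, and the textbook result that a cluster of size n contributes information proportional to n / (1 + (n − 1)ρ); this is not necessarily the formula derived in the paper, and the function name `relative_efficiency` is hypothetical.

```python
import numpy as np

def relative_efficiency(cluster_sizes, rho):
    """Var(equal sizes) / Var(unequal sizes) for a cluster-level treatment effect.

    Assumption (not taken from the paper): continuous outcome, exchangeable
    correlation rho, and the same cluster-size distribution in both arms.
    """
    n = np.asarray(cluster_sizes, dtype=float)
    # Information from the observed (possibly unequal) cluster sizes
    info_unequal = np.sum(n / (1.0 + (n - 1.0) * rho))
    # Information if every cluster had the mean size
    n_bar = n.mean()
    info_equal = n.size * n_bar / (1.0 + (n_bar - 1.0) * rho)
    # Variance is inversely proportional to information
    return info_unequal / info_equal

re_equal = relative_efficiency([20, 20, 20], rho=0.05)
re_unequal = relative_efficiency([5, 20, 35], rho=0.05)
```

Under these assumptions the ratio is 1 for equal sizes and falls below 1 as cluster sizes become more variable, which is the efficiency loss an adjusted sample size would compensate for.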

4.

Marginal analysis of correlated failure time data with informative cluster sizes

Cong XJ, Yin G, Shen Y 《Biometrics》2007,63(3):663-672

We consider modeling correlated survival data when cluster sizes may be informative to the outcome of interest based on a within-cluster resampling (WCR) approach and a weighted score function (WSF) method. We derive the large sample properties for the WCR estimators under the Cox proportional hazards model. We establish consistency and asymptotic normality of the regression coefficient estimators, and the weak convergence property of the estimated baseline cumulative hazard function. The WSF method incorporates the inverse of cluster sizes as weights in the score function. We conduct simulation studies to assess and compare the finite-sample behaviors of the estimators and apply the proposed methods to a dental study as an illustration.
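
The idea of weighting a score function by the inverse of cluster size can be sketched on a much simpler problem than the Cox model treated in the paper: estimating a marginal mean. The function names below are hypothetical, and the example only illustrates the weighting, not the authors' estimator.

```python
import numpy as np

def weighted_mean_estimate(clusters):
    """Solve sum_i (1/n_i) * sum_j (y_ij - mu) = 0 for mu.

    With inverse-cluster-size weights the solution is the mean of the
    cluster means, so every cluster counts equally.
    """
    cluster_means = [np.mean(y) for y in clusters]
    return float(np.mean(cluster_means))

def unweighted_mean_estimate(clusters):
    """Solve sum_i sum_j (y_ij - mu) = 0: large clusters dominate."""
    return float(np.mean(np.concatenate(clusters)))

# Toy data in which the large cluster has systematically higher outcomes
clusters = [np.array([1.0, 1.0, 1.0, 1.0]), np.array([0.0])]
mu_weighted = weighted_mean_estimate(clusters)      # 0.5
mu_unweighted = unweighted_mean_estimate(clusters)  # 0.8
```

When cluster size is related to the outcome, the two estimates target different quantities: the weighted version describes a typical member of a typical cluster, while the unweighted version describes a typical observation.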

5.

K‐Sample Test and Sample Size Calculation for Comparing Slopes in Data with Repeated Measurements

Sin‐Ho Jung, Chul Ahn 《Biometrical journal. Biometrische Zeitschrift》2004,46(5):554-564

Sample size calculations based on two‐sample comparisons of slopes in repeated measurements have been reported by many investigators. In contrast, the literature has paid relatively little attention to the design and analysis of K‐sample trials in repeated measurements studies where K is 3 or greater. Jung and Ahn (2003) derived a closed-form sample size formula for two‐sample comparisons of slopes by taking into account the impact of missing data. We extend their method to compare K‐sample slopes in repeated measurement studies using the generalized estimating equation (GEE) approach based on an independent working correlation structure. Because the sample size formula is based on asymptotic theory, we investigate its performance. The proposed sample size formula is illustrated using a clinical trial example. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

6.

A mixed model-based variance estimator for marginal model analyses of cluster randomized trials

Braun TM 《Biometrical journal. Biometrische Zeitschrift》2007,49(3):394-405

Generalized estimating equations (GEE) are used in the analysis of cluster randomized trials (CRTs) because: 1) the resulting intervention effect estimate has the desired marginal or population-averaged interpretation, and 2) most statistical packages contain programs for GEE. However, GEE tends to underestimate the standard error of the intervention effect estimate in CRTs. In contrast, penalized quasi-likelihood (PQL) estimates the standard error of the intervention effect in CRTs much better than GEE but is used less frequently because: 1) it generates an intervention effect estimate with a conditional, or cluster-specific, interpretation, and 2) PQL is not a part of most statistical packages. We propose taking the variance estimator from PQL and re-expressing it as a sandwich-type estimator that could be easily incorporated into existing GEE packages, thereby making GEE useful for the analysis of CRTs. Using numerical examples and data from an actual CRT, we compare the performance of this variance estimator to others proposed in the literature, and we find that our variance estimator performs as well as or better than its competitors.

7.

Estimating marginal proportions and intraclass correlations with clustered binary data

Josep L. Carrasco, Yi Pan, Rosa Abellana 《Biometrical journal. Biometrische Zeitschrift》2019,61(3):574-599

A logistic regression with random effects model is commonly applied to analyze clustered binary data, and every cluster is assumed to have a different proportion of success. However, it could be of interest to obtain the proportion of success over clusters (i.e. the marginal proportion of success). Furthermore, the degree of correlation among data of the same cluster (intraclass correlation) is also a relevant concept to assess, but when using logistic regression with random effects it is not possible to get an analytical expression of the estimators for marginal proportion and intraclass correlation. In our paper, we assess and compare approaches using different kinds of approximations: based on the logistic‐normal mixed effects model (LN), linear mixed model (LMM), and generalized estimating equations (GEE). The comparisons are completed by using two real data examples and a simulation study. The results show that the performance of the approaches strongly depends on the magnitude of the marginal proportion, the intraclass correlation, and the sample size. In general, the reliability of the approaches worsens with low marginal proportion and large intraclass correlation. The LMM and GEE approaches emerge as reliable when the sample size is large.

8.

Variable selection for marginal longitudinal generalized linear models

Cantoni E, Flemming JM, Ronchetti E 《Biometrics》2005,61(2):507-514

Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's C_p (GC_p) suitable for use with both parametric and nonparametric models. GC_p provides an estimate of a measure of a model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GC_p.

9.

Extended generalized estimating equations for binary familial data with incomplete families

FitzGerald PE 《Biometrics》2002,58(4):718-726

In this article, we assess the performance of two standard, but naive, methods for handling incomplete familial data in GEE2 analyses when the outcome is binary. We also propose a new method for analyzing such data using GEE2 when explanatory variables are discrete. Unlike the naive methods, the new method does not require the missing data process to be ignorable. We illustrate our method with an example that examines the familial aggregation of obesity.

10.

Optimal design of longitudinal data analysis using generalized estimating equation models

Jingxia Liu, Graham A. Colditz 《Biometrical journal. Biometrische Zeitschrift》2017,59(2):315-330

Longitudinal studies are often applied in biomedical research and clinical trials to evaluate the treatment effect. The association pattern within the subject must be considered in both sample size calculation and the analysis. One of the most important approaches to analyze such a study is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which a “working correlation structure” is introduced and the association pattern within the subject depends on a vector of association parameters denoted by ρ. The explicit sample size formulas for two‐group comparison in linear and logistic regression models are obtained based on the GEE method by Liu and Liang. For cluster randomized trials (CRTs), researchers proposed the optimal sample sizes at both the cluster and individual level as a function of sampling costs and the intracluster correlation coefficient (ICC). In these approaches, the optimal sample sizes depend strongly on the ICC. However, the ICC is usually unknown for CRTs and multicenter trials. To overcome this shortcoming, Van Breukelen et al. consider a range of possible ICC values identified from literature reviews and present Maximin designs (MMDs) based on relative efficiency (RE) and efficiency under budget and cost constraints. In this paper, the optimal sample size and number of repeated measurements using GEE models with an exchangeable working correlation matrix are proposed under a fixed budget, where “optimal” refers to maximum power for a given sampling budget. The equations of sample size and number of repeated measurements for a known parameter value ρ are derived and a straightforward algorithm for unknown ρ is developed. Applications in practice are discussed. We also discuss the existence of the optimal design when an AR(1) working correlation matrix is assumed. Our proposed method can be extended to scenarios in which the true and working correlation matrices differ.

11.

Power calculation for cross-sectional stepped wedge cluster randomized trials with variable cluster sizes

Linda J Harrison, Tom Chen, Rui Wang 《Biometrics》2020,76(3):951-962

Standard sample size calculation formulas for stepped wedge cluster randomized trials (SW-CRTs) assume that cluster sizes are equal. When cluster sizes vary substantially, ignoring this variation may lead to an under-powered study. We investigate the relative efficiency of an SW-CRT with varying cluster sizes to equal cluster sizes, and derive variance estimators for the intervention effect that account for this variation under a mixed effects model, a commonly used approach for analyzing data from cluster randomized trials. When cluster sizes vary, the power of an SW-CRT depends on the order in which clusters receive the intervention, which is determined through randomization. We first derive a variance formula that corresponds to any particular realization of the randomized sequence and propose efficient algorithms to identify upper and lower bounds of the power. We then obtain an “expected” power based on a first-order approximation to the variance formula, where the expectation is taken with respect to all possible randomization sequences. Finally, we provide a variance formula for more general settings where only the cluster size arithmetic mean and coefficient of variation, instead of exact cluster sizes, are known in the design stage. We evaluate our methods through simulations and illustrate that the average power of an SW-CRT decreases as the variation in cluster sizes increases, and the impact is largest when the number of clusters is small.

12.

A covariance estimator for GEE with improved small-sample properties (total citations: 2; self-citations: 0; citations by others: 2)

Mancl LA, DeRouen TA 《Biometrics》2001,57(1):126-134

In this paper, we propose an alternative covariance estimator to the robust covariance estimator of generalized estimating equations (GEE). Hypothesis tests using the robust covariance estimator can have inflated size when the number of independent clusters is small. Resampling methods, such as the jackknife and bootstrap, have been suggested for covariance estimation when the number of clusters is small. A drawback of the resampling methods when the response is binary is that the methods can break down when the number of subjects is small due to zero or near-zero cell counts caused by resampling. We propose a bias-corrected covariance estimator that avoids this problem. In a small simulation study, we compare the bias-corrected covariance estimator to the robust and jackknife covariance estimators for binary responses for situations involving 10-40 subjects with equal and unequal cluster sizes of 16-64 observations. The bias-corrected covariance estimator gave tests with sizes close to the nominal level even when the number of subjects was 10 and cluster sizes were unequal, whereas the robust and jackknife covariance estimators gave tests with sizes that could be 2-3 times the nominal level. The methods are illustrated using data from a randomized clinical trial on treatment for bone loss in subjects with periodontal disease.
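
The abstract does not state the estimator itself. As a hedged sketch, a bias-corrected sandwich estimator of this kind is usually written as below; the notation is assumed here rather than taken from the listing (D_i is the derivative of the mean of cluster i with respect to the regression parameters, V_i the working covariance, r_i = y_i - \mu_i the residual vector, and K the number of clusters).

```latex
\[
\widehat{V}_{\mathrm{BC}}
  = M^{-1}
    \left[\sum_{i=1}^{K} D_i^{\top} V_i^{-1} (I - H_{ii})^{-1}
          r_i r_i^{\top}
          (I - H_{ii}^{\top})^{-1} V_i^{-1} D_i \right]
    M^{-1},
\qquad
M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i ,
\]
\[
H_{ii} = D_i \, M^{-1} D_i^{\top} V_i^{-1}.
\]
```

Setting H_{ii} = 0 recovers the usual robust (sandwich) estimator. The factor (I - H_{ii})^{-1} inflates each cluster's residuals to offset their tendency to be too small when K is small, without requiring any resampling.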

13.

Multiple outputation: inference for complex clustered data by averaging analyses from independent data

Follmann D, Proschan M, Leifer E 《Biometrics》2003,59(2):420-429

This article applies a simple method for settings where one has clustered data, but statistical methods are only available for independent data. We assume the statistical method provides us with a normally distributed estimate, theta, and an estimate of its variance, sigma^2. We randomly select a data point from each cluster and apply our statistical method to this independent data. We repeat this multiple times, and use the average of the associated theta's as our estimate. An estimate of the variance is given by the average of the sigma^2's minus the sample variance of the theta's. We call this procedure multiple outputation, as all "excess" data within each cluster is thrown out multiple times. Hoffman, Sen, and Weinberg (2001, Biometrika 88, 1121-1134) introduced this approach for generalized linear models when the cluster size is related to outcome. In this article, we demonstrate the broad applicability of the approach. Applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed. In addition, asymptotic normality of estimates based on all possible outputations, as well as a finite number of outputations, is proven given weak conditions. Multiple outputation provides a simple and broadly applicable method for analyzing clustered data. It is especially suited to settings where methods for clustered data are impractical, but can also be applied generally as a quick and simple tool.
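
The procedure described in this abstract can be sketched in a few lines. The sketch assumes the "statistical method for independent data" is simply the sample mean with its usual variance estimate; the function name `multiple_outputation`, the number of repetitions, and the toy data are illustrative choices, not taken from the paper.

```python
import numpy as np

def multiple_outputation(clusters, n_rep=2000, seed=0):
    """Average an independent-data analysis over random one-per-cluster samples.

    Assumption: the analysis applied to each outputted data set is the
    sample mean (theta) with variance estimate s^2 / n (sigma^2).
    """
    rng = np.random.default_rng(seed)
    thetas = np.empty(n_rep)
    sigma2s = np.empty(n_rep)
    for r in range(n_rep):
        # Keep one randomly chosen observation per cluster, discard the rest
        sample = np.array([rng.choice(y) for y in clusters])
        thetas[r] = sample.mean()
        sigma2s[r] = sample.var(ddof=1) / sample.size
    theta_mo = thetas.mean()
    # Average of the sigma^2's minus the sample variance of the theta's
    var_mo = sigma2s.mean() - thetas.var(ddof=1)
    return theta_mo, var_mo

clusters = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
theta_mo, var_mo = multiple_outputation(clusters)
```

With a finite number of repetitions the variance estimate can occasionally be small or even negative, so in practice the number of outputations is taken large.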

14.

Sample Size Considerations for GEE Analyses of Three‐Level Cluster Randomized Trials

Steven Teerenstra, Bing Lu, John S. Preisser, Theo van Achterberg, George F. Borm 《Biometrics》2010,66(4):1230-1237

Cluster randomized trials in health care may involve three instead of two levels, for instance, in trials where different interventions to improve quality of care are compared. In such trials, the intervention is implemented in health care units (“clusters”) and aims at changing the behavior of health care professionals working in this unit (“subjects”), while the effects are measured at the patient level (“evaluations”). Within the generalized estimating equations approach, we derive a sample size formula that accounts for two levels of clustering: that of subjects within clusters and that of evaluations within subjects. The formula reveals that sample size is inflated, relative to a design with completely independent evaluations, by a multiplicative term that can be expressed as a product of two variance inflation factors, one that quantifies the impact of within‐subject correlation of evaluations on the variance of subject‐level means and the other that quantifies the impact of the correlation between subject‐level means on the variance of the cluster means. Power levels as predicted by the sample size formula agreed well with the simulated power for more than 10 clusters in total, when data were analyzed using bias‐corrected estimating equations for the correlation parameters in combination with the model‐based covariance estimator or the sandwich estimator with a finite sample correction.
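
The product-of-two-inflation-factors structure can be checked numerically under a simple assumed model: equal numbers of evaluations per subject (m) and subjects per cluster (k), correlation `rho_eval` between evaluations of the same subject, and correlation `rho_subj` between evaluations of different subjects in the same cluster. These symbols and the function name are hypothetical and may differ from the paper's notation.

```python
def three_level_design_effect(m, k, rho_eval, rho_subj):
    """Variance inflation relative to k*m independent evaluations.

    Assumes a nested exchangeable correlation structure with equal
    m (evaluations per subject) and k (subjects per cluster).
    """
    # Inflation of the variance of a subject-level mean
    vif_within_subject = 1.0 + (m - 1) * rho_eval
    # Implied correlation between two subject-level means in one cluster
    corr_subject_means = m * rho_subj / vif_within_subject
    # Inflation of the variance of a cluster mean of subject-level means
    vif_between_subjects = 1.0 + (k - 1) * corr_subject_means
    return vif_within_subject * vif_between_subjects

deff = three_level_design_effect(m=5, k=4, rho_eval=0.3, rho_subj=0.05)
# Expanded form of the same quantity under the same assumptions
deff_direct = 1.0 + (5 - 1) * 0.3 + 5 * (4 - 1) * 0.05
```

Under these assumptions the product equals the expanded form 1 + (m − 1)·rho_eval + m(k − 1)·rho_subj, so the two levels of clustering separate cleanly into the two factors the abstract describes.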

15.

A signed-rank test for clustered data

Datta S, Satten GA 《Biometrics》2008,64(2):501-507

We consider the problem of comparing two outcome measures when the pairs are clustered. Using the general principle of within-cluster resampling, we obtain a novel signed-rank test for clustered paired data. We show by a simple informative cluster size simulation model that only our test maintains the correct size under a null hypothesis of marginal symmetry compared to four other existing signed rank tests; further, our test has adequate power when cluster size is noninformative. In general, cluster size is informative if the distribution of pair-wise differences within a cluster depends on the cluster size. An application of our method to testing radiation toxicity trend is presented.

16.

The Determination of Sample Sizes in the Comparison of Two Multinomial Proportions from Ordered Categories

Myoung‐Keun Lee, Hae‐Hiang Song, Seung‐Ho Kang, Chul W. Ahn 《Biometrical journal. Biometrische Zeitschrift》2002,44(4):395-409

We consider sample size determination for ordered categorical data when the alternative assumption is the proportional odds model. In this paper the sample size formula proposed by Whitehead (Statistics in Medicine, 12, 2257–2271, 1993) is compared with the methods based on exact and asymptotic linear rank tests with Wilcoxon and trend scores. We show that Whitehead's formula, which is based on a normal approximation, works well when the sample size is moderate to large but recommend the exact method with Wilcoxon scores for small sample sizes. The consequences of misspecification in models are also investigated.

17.

Characterizing monoclonal antibody structure by carbodiimide/GEE footprinting

Parminder Kaur, Sara Tomechko, Janna Kiselar, Wuxian Shi, Galahad Deperalta, Aaron T Wecksler, Giridharan Gokulrangan, Victor Ling, Mark R Chance 《MABS-AUSTIN》2014,6(6):1486-1499

Amino acid-specific covalent labeling is well suited to probe protein structure and macromolecular interactions, especially for macromolecules and their complexes that are difficult to examine by alternative means, due to size, complexity, or instability. Here we present a detailed account of carbodiimide-based covalent labeling (with GEE tagging) applied to a glycosylated monoclonal antibody therapeutic, which represents an important class of biologic drugs. Characterization of such proteins and their antigen complexes is essential to development of new biologic-based medicines. In this study, the experiments were optimized to preserve the structural integrity of the protein, and experimental conditions were varied and replicated to establish the reproducibility and precision of the technique. Homology-based models were generated and used to compare the solvent accessibility of the labeled residues, which include D, E, and the C-terminus, against the experimental surface accessibility data in order to understand the accuracy of the approach in providing an unbiased assessment of structure. Data from the protein were also compared to reactivity measures of several model peptides to explain sequence or structure-based variations in reactivity. The results highlight several advantages of this approach. These include: the ease of use at the bench top, the linearity of the dose response plots at high levels of labeling (indicating that the label does not significantly perturb the structure of the protein), the high reproducibility of replicate experiments (<2 % variation in modification extent), the similar reactivity of the 3 target probe residues (as suggested by analysis of model peptides), and the overall positive and significant correlation of reactivity and solvent accessible surface area (the latter values predicted by the homology modeling). 
Attenuation of reactivity, in otherwise solvent accessible probes, is documented as arising from the effects of positive charge or bond formation between adjacent amine and carboxyl groups, the latter accompanied by observed water loss. The results are also compared with data from hydroxyl radical-mediated oxidative footprinting on the same protein, showing that complementary information is gained from the 2 approaches, although the number of target residues in carbodiimide/GEE labeling is fewer. Overall, this approach is an accurate and precise method for assessing protein structure of biologic drugs.

18.

Regularized Sandwich Estimators for Analysis of High‐Dimensional Data Using Generalized Estimating Equations

David I. Warton 《Biometrics》2011,67(1):116-123

A modification of generalized estimating equations (GEEs) methodology is proposed for hypothesis testing of high‐dimensional data, with particular interest in multivariate abundance data in ecology, an important application of interest in thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds number of clusters, n>K) and over‐dispersion relative to the Poisson distribution. Usual GEE methods cannot be applied in this setting primarily because sandwich estimators become numerically unstable as n increases. We propose instead using a regularized sandwich estimator that assumes a common correlation matrix R, and shrinks the sample estimate of R toward the working correlation matrix to improve its numerical stability. It is shown via theory and simulation that this substantially improves the power of Wald statistics when cluster size is not small. We apply the proposed approach to study the effects of nutrient addition on nematode communities, and in doing so discuss important issues in implementation, such as using statistics that have good properties when parameter estimates approach the boundary, and using resampling to enable valid inference that is robust to high dimensionality and to possible model misspecification.
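
Why shrinking the sample correlation matrix toward a working correlation matrix stabilizes it when n > K can be shown with a minimal sketch. It assumes a convex combination with a hypothetical shrinkage parameter `lam` and an independence (identity) working correlation; the paper's exact parameterization and its way of choosing the amount of shrinkage are not given in the abstract.

```python
import numpy as np

def shrink_correlation(r_hat, r_working, lam):
    """Convex combination of the sample and working correlation matrices.

    lam = 1 returns the sample estimate; lam = 0 returns the working matrix.
    (lam is a hypothetical tuning parameter used only for illustration.)
    """
    return lam * r_hat + (1.0 - lam) * r_working

rng = np.random.default_rng(1)
K, n = 5, 12                       # 5 clusters, 12 responses per cluster (n > K)
data = rng.normal(size=(K, n))
r_hat = np.corrcoef(data, rowvar=False)   # n x n matrix of rank at most K - 1
r_reg = shrink_correlation(r_hat, np.eye(n), lam=0.5)

min_eig_sample = np.linalg.eigvalsh(r_hat).min()  # numerically zero: singular
min_eig_shrunk = np.linalg.eigvalsh(r_reg).min()  # bounded away from zero
```

Because the sample matrix is singular whenever cluster size exceeds the number of clusters, it cannot be inverted reliably; the shrunken matrix is positive definite for any `lam` below 1 when the working matrix is.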

19.

Characterizing monoclonal antibody structure by carbodiimide/GEE footprinting

《MABS-AUSTIN》2013,5(6):1486-1499

20.

Efficiency of regression estimates for clustered data (total citations: 1; self-citations: 0; citations by others: 1)

Mancl LA, Leroux BG 《Biometrics》1996,52(2):500-511

Statistical methods for clustered data, such as generalized estimating equations (GEE) and generalized least squares (GLS), require selecting a correlation or covariance structure to specify the dependence between observations within a cluster. Valid regression estimates can be obtained that do not depend on correct specification of the true correlation, but inappropriate specifications can result in a loss of efficiency. We derive general expressions for the asymptotic relative efficiency of GEE and GLS estimators under nested correlation structures. Efficiency is shown to depend on the covariate distribution, the cluster sizes, the response variable correlation, and the regression parameters. The results demonstrate that efficiency is quite sensitive to the between- and within-cluster variation of the covariates, and provide useful characterizations of models for which upper and lower efficiency bounds are attained. Efficiency losses for simple working correlation matrices, such as independence, can be large even for small to moderate correlations and cluster sizes.