Similar Articles
20 similar articles found.
1.
Generalized estimating equations (Liang and Zeger, 1986) is a widely used, moment-based procedure for estimating marginal regression parameters. However, a subtle and often overlooked point is that valid inference requires the mean for the response at time t to be expressed properly as a function of the complete past, present, and future values of any time-varying covariate. For example, with environmental exposures it may be necessary to express the response as a function of multiple lagged values of the covariate series. Although multiple lagged covariates may be predictive of outcomes, researchers often focus on parameters in a 'cross-sectional' model, where the response is expressed as a function of a single lag in the covariate series. Cross-sectional models yield parameters with simple interpretations and avoid the collinearity issues associated with multiple lagged values of a covariate. Pepe and Anderson (1994) showed that parameter estimates for time-varying covariates may be biased unless the mean, given all past, present, and future covariate values, is equal to the cross-sectional mean, or unless independence estimating equations are used. Although working independence avoids potential bias, many authors have shown that a poor choice of response correlation model can lead to highly inefficient parameter estimates. The purpose of this paper is to study the bias-efficiency trade-off associated with working correlation choices for binary response data. We investigate the data characteristics and design features (e.g., cluster size, overall response association, functional form of the response association, and covariate distribution) that influence the small- and large-sample behavior of parameter estimates obtained from several different weighting schemes or, equivalently, 'working' covariance models. We find that the impact of the covariance model choice depends strongly on the specific structure of the data, and that these key aspects should be examined before choosing a weighting scheme.
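A minimal sketch of the bias-efficiency comparison the abstract describes, using statsmodels' GEE to fit the same simulated clustered binary data under independence and exchangeable working correlations; the data-generating values and variable names are illustrative, not taken from the paper.

```python
# Compare GEE fits under independence vs. exchangeable working correlation
# on simulated clustered binary data (all parameter values illustrative).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_clusters, cluster_size = 200, 4

rows = []
for i in range(n_clusters):
    u = rng.normal()                     # shared cluster effect -> response association
    x = rng.normal(size=cluster_size)    # time-varying covariate
    p = 1 / (1 + np.exp(-(-0.5 + 0.8 * x + u)))
    y = rng.binomial(1, p)
    rows += [(i, x[t], y[t]) for t in range(cluster_size)]
df = pd.DataFrame(rows, columns=["cluster", "x", "y"])

X = sm.add_constant(df["x"])
for cov in (sm.cov_struct.Independence(), sm.cov_struct.Exchangeable()):
    fit = sm.GEE(df["y"], X, groups=df["cluster"],
                 family=sm.families.Binomial(), cov_struct=cov).fit()
    print(type(cov).__name__, "beta_x =", fit.params["x"], "se =", fit.bse["x"])
```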

2.
Datta S, Satten GA. Biometrics 2008, 64(2):501–507
We consider the problem of comparing two outcome measures when the pairs are clustered. Using the general principle of within-cluster resampling, we obtain a novel signed-rank test for clustered paired data. Using a simple simulation model with informative cluster size, we show that only our test maintains the correct size under a null hypothesis of marginal symmetry, compared with four other existing signed-rank tests; further, our test has adequate power when cluster size is noninformative. In general, cluster size is informative if the distribution of pairwise differences within a cluster depends on the cluster size. An application of our method to testing a radiation toxicity trend is presented.
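A minimal sketch of the within-cluster resampling principle: draw one pair per cluster, apply an ordinary signed-rank test to the resampled data, and average over replicates. This illustrates the idea only; Datta and Satten's actual test combines the resampled statistics with a specific variance construction, and the toy data below are invented.

```python
# Within-cluster resampling (WCR) for clustered paired data: one pairwise
# difference is drawn per cluster, so informative cluster size cannot
# distort the marginal null distribution.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
# toy clusters: each holds a variable number of within-pair differences
clusters = [rng.normal(0.2, 1.0, size=rng.integers(1, 6)) for _ in range(30)]

stats = []
for _ in range(2000):                                    # WCR replicates
    resample = np.array([rng.choice(d) for d in clusters])  # one pair per cluster
    stats.append(wilcoxon(resample).statistic)
print("averaged WCR signed-rank statistic:", np.mean(stats))
```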

3.
Exact tests for one-sample correlated binary data
In this paper we develop exact tests for one-sample correlated binary data whose cluster sizes are at most two. Although significant progress has been made in the development and implementation of exact tests for uncorrelated data, exact tests for correlated data are rare. The lack of a tractable likelihood function has made it difficult to develop exact tests for correlated binary data. However, when the cluster sizes of binary data are at most two, only three parameters are needed to characterize the problem. One parameter is fixed under the null hypothesis, while the other two can be removed by conditional and unconditional approaches, respectively, to construct exact tests. We compare the exact and asymptotic p-values in several cases. The proposed method is applied to real-life data.

4.
Dunson DB, Chen Z, Harry J. Biometrics 2003, 59(3):521–530
In applications that involve clustered data, such as longitudinal studies and developmental toxicity experiments, the number of subunits within a cluster is often correlated with outcomes measured on the individual subunits. Analyses that ignore this dependency can produce biased inferences. This article proposes a Bayesian framework for jointly modeling cluster size and multiple categorical and continuous outcomes measured on each subunit. We use a continuation ratio probit model for the cluster size and underlying normal regression models for each of the subunit-specific outcomes. Dependency between cluster size and the different outcomes is accommodated through a latent variable structure. The form of the model facilitates posterior computation via a simple and computationally efficient Gibbs sampler. The approach is illustrated with an application to developmental toxicity data, and other applications, such as the joint modeling of longitudinal and event-time data, are discussed.

5.
In surveillance studies of periodontal disease, the relationship between disease and other health and socioeconomic conditions is of key interest. To determine whether a patient has periodontal disease, multiple clinical measurements (e.g., clinical attachment loss, alveolar bone loss, and tooth mobility) are taken at the tooth level. Researchers often create a composite outcome from these measurements or analyze each outcome separately. Moreover, patients have varying numbers of teeth, with those more prone to the disease having fewer teeth than those with good oral health. Such dependence between the outcome of interest and cluster size (number of teeth) is called informative cluster size, and results obtained from fitting conventional marginal models can be biased. We propose a novel method to jointly analyze multiple correlated binary outcomes for clustered data with informative cluster size using the class of generalized estimating equations (GEE) with cluster-specific weights. We compare our proposed multivariate-outcome cluster-weighted GEE results with those from the conventional GEE using baseline data from the Veterans Affairs Dental Longitudinal Study. In an extensive simulation study, we show that our proposed method yields estimates with minimal relative biases and excellent coverage probabilities.
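A minimal sketch of cluster-weighted GEE under informative cluster size: each observation is weighted by the inverse of its cluster size so that every cluster, rather than every tooth, contributes equally. It relies on statsmodels' GEE `weights` argument; the simulated data, in which sicker clusters are smaller, and all parameter values are illustrative.

```python
# Cluster-weighted GEE: weight each observation by 1 / (cluster size) under
# an independence working correlation (simulated, illustrative data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
rows = []
for i in range(150):
    b = rng.normal()                                   # cluster disease propensity
    n_i = max(1, int(8 - 2 * b + rng.integers(0, 3)))  # informative: sicker -> fewer teeth
    x = rng.normal(size=n_i)
    p = 1 / (1 + np.exp(-(-1.0 + 0.5 * x + b)))
    y = rng.binomial(1, p)
    rows += [(i, n_i, x[t], y[t]) for t in range(n_i)]
df = pd.DataFrame(rows, columns=["cluster", "n", "x", "y"])

X = sm.add_constant(df["x"])
fit = sm.GEE(df["y"], X, groups=df["cluster"], weights=1.0 / df["n"],
             family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Independence()).fit()
print(fit.summary())
```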

6.
David I. Warton. Biometrics 2011, 67(1):116–123
A modification of generalized estimating equations (GEE) methodology is proposed for hypothesis testing of high-dimensional data, with particular interest in multivariate abundance data in ecology, an application of interest in thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds the number of clusters, n > K) and over-dispersion relative to the Poisson distribution. Usual GEE methods cannot be applied in this setting, primarily because sandwich estimators become numerically unstable as n increases. We propose instead a regularized sandwich estimator that assumes a common correlation matrix R and shrinks the sample estimate of R toward the working correlation matrix to improve its numerical stability. It is shown via theory and simulation that this substantially improves the power of Wald statistics when cluster size is not small. We apply the proposed approach to study the effects of nutrient addition on nematode communities, and in doing so discuss important issues in implementation, such as using statistics that have good properties when parameter estimates approach the boundary, and using resampling to enable valid inference that is robust to high dimensionality and to possible model misspecification.
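A minimal numeric sketch of the proposed stabilization: when cluster size n exceeds the number of clusters K, the sample correlation matrix is rank-deficient, and shrinking it toward a working correlation matrix restores positive definiteness. The fixed shrinkage weight `lam` is an assumption for illustration; the paper derives a data-driven choice.

```python
# Shrink an unstable sample correlation matrix toward a working correlation.
import numpy as np

rng = np.random.default_rng(3)
K, n = 20, 50                                # fewer clusters than variables: n > K
Y = rng.normal(size=(K, n))
R_hat = np.corrcoef(Y, rowvar=False)         # rank-deficient, numerically unstable
R0 = np.eye(n)                               # working correlation (independence)

lam = 0.5                                    # illustrative shrinkage intensity
R_shrunk = (1 - lam) * R_hat + lam * R0      # positive definite for any lam > 0

print("min eigenvalue, raw:   ", np.linalg.eigvalsh(R_hat).min())
print("min eigenvalue, shrunk:", np.linalg.eigvalsh(R_shrunk).min())
```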

7.
A method is proposed that aims at identifying clusters of individuals showing similar patterns when observed repeatedly. We consider linear mixed models, which are widely used for modeling longitudinal data. In contrast to the classical assumption of a normal distribution for the random effects, a finite mixture of normal distributions is assumed. Typically, the number of mixture components is unknown and has to be chosen, ideally by data-driven tools. For this purpose, an EM-algorithm-based approach is considered that uses a penalized normal mixture as the random-effects distribution. The penalty term shrinks the pairwise distances of cluster centers based on the group lasso and fused lasso methods. The effect is that individuals with similar time trends are merged into the same cluster. The strength of regularization is determined by a single penalization parameter, and a new model-choice criterion is proposed for finding its optimal value.

8.
There is growing interest in conducting cluster randomized trials (CRTs). For simplicity, sample size calculations typically assume that cluster sizes are identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyzing correlated data is the generalized estimating equation (GEE) approach proposed by Liang and Zeger, in which a "working correlation structure" is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different outcome types. RE is defined as the ratio of the variance of the treatment-effect estimator under equal cluster sizes to that under unequal cluster sizes. We focus on a correlation structure commonly used in CRTs, the exchangeable structure, and derive simpler formulas for the RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size to compensate for the efficiency loss, as well as an optimal sample size estimation based on the GEE models under a fixed budget, for both known and unknown association parameter ρ in the working correlation structure.
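A minimal sketch of the RE computation under an exchangeable working correlation, where each cluster of size n_i contributes information proportional to n_i / (1 + (n_i − 1)ρ) for a cluster-level treatment effect; the cluster-size distribution and ρ below are illustrative, and this is a generic version of the calculation rather than the paper's exact formulas.

```python
# Relative efficiency of equal vs. unequal cluster sizes under an
# exchangeable correlation structure (illustrative size distribution).
import numpy as np

def info(sizes, rho):
    sizes = np.asarray(sizes, dtype=float)
    return np.sum(sizes / (1.0 + (sizes - 1.0) * rho))

rng = np.random.default_rng(4)
rho = 0.05
unequal = rng.poisson(20, size=30) + 1       # varying cluster sizes
equal = np.full(30, unequal.mean())          # same total size, split equally

# RE = Var(equal) / Var(unequal) = info(unequal) / info(equal) <= 1
print("RE:", info(unequal, rho) / info(equal, rho))
```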

9.
Rosner B, Glynn RJ, Lee ML. Biometrics 2006, 62(1):185–192
The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit-within-cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different numbers of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with 20 or more clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets: (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients, where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.
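A minimal sketch of the cluster-level randomization idea behind the adjusted test: all paired differences are ranked together, but signs flip per cluster under the null, giving Var(T) = Σᵢ Sᵢ², where Sᵢ is the sum of signed ranks in cluster i. This illustrates the construction only and is not the full published variance formula (e.g., it ignores ties).

```python
# Clustered signed-rank test with the cluster as the unit of randomization.
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(5)
clusters = [rng.normal(0.3, 1.0, size=rng.integers(1, 4)) for _ in range(25)]

d = np.concatenate(clusters)
signed_ranks = np.sign(d) * rankdata(np.abs(d))   # rank all subunits jointly

S, start = [], 0
for c in clusters:                                # per-cluster signed-rank sums
    S.append(signed_ranks[start:start + len(c)].sum())
    start += len(c)
S = np.array(S)

T = S.sum()
z = T / np.sqrt((S ** 2).sum())                   # cluster-level sign-flip variance
print("z =", z, "two-sided p =", 2 * norm.sf(abs(z)))
```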

10.
The frequency of cluster-randomized trials (CRTs) in the peer-reviewed literature has increased exponentially over the past two decades. CRTs are a valuable tool for studying interventions that cannot be effectively implemented or randomized at the individual level. However, some aspects of the design and analysis of data from CRTs are more complex than those for individually randomized controlled trials. One of the key components in designing a successful CRT is calculating the sample size (i.e., number of clusters) needed to attain an acceptable level of statistical power. To do this, a researcher must make assumptions about several quantities, including a fixed mean cluster size. In practice, cluster sizes often vary dramatically. Few studies account for the effect of cluster size variation when assessing the statistical power of a given trial. We conducted a simulation study to investigate how the statistical power of CRTs changes with variable cluster sizes. In general, we observed that increases in cluster size variability lead to a decrease in power.
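A minimal simulation sketch of the power loss described above, comparing a two-arm CRT analyzed by an unweighted t-test on cluster means as the coefficient of variation (CV) of cluster sizes grows; all design values are illustrative, and the analysis model is a simplification of what the study used.

```python
# Estimate CRT power by simulation for several cluster-size CVs.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(6)

def power(cv, n_sims=2000, k=15, mbar=20, delta=0.3, icc=0.05):
    hits = 0
    for _ in range(n_sims):
        means = []
        for arm_effect in (0.0, delta):
            sizes = np.maximum(2, rng.normal(mbar, cv * mbar, size=k).astype(int))
            mu = arm_effect + rng.normal(0, np.sqrt(icc), size=k)  # cluster effects
            means.append([rng.normal(m, np.sqrt(1 - icc), size=s).mean()
                          for m, s in zip(mu, sizes)])
        hits += ttest_ind(means[0], means[1]).pvalue < 0.05
    return hits / n_sims

for cv in (0.0, 0.4, 0.8):
    print(f"CV={cv}: power ~ {power(cv):.2f}")
```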

11.
Standard sample size calculation formulas for stepped wedge cluster randomized trials (SW-CRTs) assume that cluster sizes are equal. When cluster sizes vary substantially, ignoring this variation may lead to an under-powered study. We investigate the relative efficiency of a SW-CRT with varying cluster sizes relative to equal cluster sizes, and derive variance estimators for the intervention effect that account for this variation under a mixed effects model, a commonly used approach for analyzing data from cluster randomized trials. When cluster sizes vary, the power of a SW-CRT depends on the order in which clusters receive the intervention, which is determined through randomization. We first derive a variance formula that corresponds to any particular realization of the randomized sequence and propose efficient algorithms to identify upper and lower bounds of the power. We then obtain an "expected" power based on a first-order approximation to the variance formula, where the expectation is taken with respect to all possible randomization sequences. Finally, we provide a variance formula for more general settings where only the arithmetic mean and coefficient of variation of the cluster sizes, rather than the exact cluster sizes, are known at the design stage. We evaluate our methods through simulations and illustrate that the average power of a SW-CRT decreases as the variation in cluster sizes increases, with the impact being largest when the number of clusters is small.
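A minimal sketch (assumptions: random cluster intercepts, saturated period effects, and GLS on cluster-period means, in the spirit of standard SW-CRT variance calculations, not this paper's exact derivation) of computing Var(θ̂) for one realized randomization sequence with varying cluster sizes.

```python
# Var(theta_hat) for a stepped wedge design via GLS on cluster-period means:
# Var(Ybar_ij) = sigma2/m_i + tau2, covariance tau2 across periods within a
# cluster. All sizes and variance components are illustrative.
import numpy as np

K, T = 8, 5                                     # clusters, periods
sigma2, tau2 = 1.0, 0.1
m = np.array([5, 10, 15, 20, 25, 30, 35, 40])   # varying cluster sizes

# one realized sequence: cluster i switches on at period (i % (T-1)) + 1
switch = np.tile(np.arange(1, T), K)[:K]

info = np.zeros((T + 1, T + 1))                 # T period effects + treatment
for i in range(K):
    X = np.zeros((T, T + 1))
    X[:, :T] = np.eye(T)                        # saturated period effects
    X[:, T] = (np.arange(T) >= switch[i])       # on-treatment indicator
    V = (sigma2 / m[i]) * np.eye(T) + tau2 * np.ones((T, T))
    info += X.T @ np.linalg.solve(V, X)

var_theta = np.linalg.inv(info)[T, T]
print("Var(theta_hat) =", var_theta)
```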

12.
MOTIVATION: High-dimensional data such as microarrays have created new challenges for traditional statistical methods. One such example is class prediction with high-dimension, low-sample-size data. Due to the small sample size, the sample mean estimates are usually unreliable. As a consequence, the performance of class prediction methods using the sample mean may also be unsatisfactory. To obtain more accurate parameter estimates, statistical techniques such as regularization through shrinkage are often desirable. RESULTS: In this article, we investigate the family of shrinkage estimators for the mean value under the quadratic loss function. The optimal shrinkage parameter is proposed for the scenario in which the sample size is fixed and the dimension is large. We then construct a shrinkage-based diagonal discriminant rule by replacing the sample mean with the proposed shrinkage mean. Finally, we demonstrate via simulation studies and real data analysis that the proposed shrinkage-based rule outperforms its original competitor in a wide range of settings.
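A minimal sketch of a shrinkage-based diagonal discriminant rule: class means are shrunk toward the grand mean before plugging into diagonal LDA. The fixed weight `lam` is an assumption for illustration; the article derives the optimal shrinkage parameter under quadratic loss.

```python
# Diagonal discriminant rule with shrunken class means (high p, small n).
import numpy as np

rng = np.random.default_rng(7)
p, n_per_class = 500, 10                          # high dimension, small n
X0 = rng.normal(0.0, 1.0, size=(n_per_class, p))
X1 = rng.normal(0.25, 1.0, size=(n_per_class, p))

grand = np.vstack([X0, X1]).mean(axis=0)
lam = 0.3                                          # illustrative shrinkage weight
mu0 = (1 - lam) * X0.mean(axis=0) + lam * grand
mu1 = (1 - lam) * X1.mean(axis=0) + lam * grand
s2 = np.vstack([X0 - X0.mean(0), X1 - X1.mean(0)]).var(axis=0, ddof=2)

def classify(x):
    d0 = np.sum((x - mu0) ** 2 / s2)              # diagonal discriminant scores
    d1 = np.sum((x - mu1) ** 2 / s2)
    return int(d1 < d0)

x_new = rng.normal(0.25, 1.0, size=p)             # a draw from class 1
print("predicted class:", classify(x_new))
```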

13.
Postnatal growth is an important life-history trait that varies widely across avian species, and several equations with a sigmoidal shape have been used to model it. Classical three-parameter models have an inflection point fixed at a set percentage of the upper asymptote, which can be an unrealistic assumption that generates biased fits. The Richards model emerged as an interesting alternative because it includes an extra parameter that determines the location of the inflection point, which can move freely along the growth curve. Recently, nonlinear mixed models (NLMM) have been used in modeling avian growth because they can deal with the lack of independence among data that typically occurs with multiple measurements on the same individual or on groups of related individuals. Here, we evaluated the usefulness of the von Bertalanffy, Gompertz, logistic, U4, and Richards equations for modeling chick growth in the imperial shag Phalacrocorax atriceps. We modeled growth in commonly used morphological traits, including body mass, bill length, head length, and tarsus length, and compared the performance of the models using NLMM. Estimated adult size, age at maximum growth, and maximum growth rates differed markedly across models. Overall, the most consistent performance in estimated adult size was obtained by the Richards model, which showed deviations from mean adult size within 5%. Based on AICc values, the Richards equation was the best model for all traits analyzed. For tarsus length, the Richards and U4 models provided indistinguishable fits because the relative inflection value estimated from the Richards model was very close to that assumed by the U4 model. Our results highlight the bias incurred by three-parameter models when the assumed inflection placement deviates from that supported by the data. Thus, the application of the Richards equation in the NLMM framework represents a flexible and powerful tool for the analysis of avian growth.
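A minimal sketch fitting one common Richards parameterization, y(t) = A / (1 + d·exp(−k(t − T)))^(1/d), which reduces to the logistic curve at d = 1. The data are simulated, not the shag measurements, and a full analysis would fit nonlinear mixed models with per-chick random effects rather than a single curve.

```python
# Fit a Richards growth curve to simulated chick-growth data.
import numpy as np
from scipy.optimize import curve_fit

def richards(t, A, k, T, d):
    # A: asymptote; k: growth rate; T: location; d: shape (d=1 -> logistic)
    return A / (1.0 + d * np.exp(-k * (t - T))) ** (1.0 / d)

rng = np.random.default_rng(8)
t = np.linspace(0, 60, 40)                          # age in days (illustrative)
y = richards(t, 2000, 0.15, 20, 0.7) + rng.normal(0, 40, size=t.size)

popt, _ = curve_fit(richards, t, y, p0=[1800, 0.1, 25, 1.0], maxfev=10000)
print("A, k, T, d =", popt)
```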

14.
Weitao Sun, Jing He. Biopolymers 2010, 93(10):904–916
Residue clusters play an essential role in stabilizing protein structures in the form of complex networks. We show that the cluster sizes in a native protein follow a log-normal distribution for a dataset consisting of 424 proteins. To our knowledge, this is the first such fit reported for native structures. Based on the log-normal model, the asymptotically increasing mean cluster sizes produce a critical protein chain length of about 200 amino acids, beyond which most globular proteins have nearly the same mean cluster size. This suggests that larger proteins use a different packing mechanism than smaller proteins. We confirmed the scale-free property of the residue contact network for most of the protein structures in the dataset, although violations were observed for tightly packed proteins. A residue cluster network wheel (RCNW) is proposed to visualize the relationship between multiple properties of the residue network, such as the cluster size, the residue types and contacts, and the flexibility of the residue. We noticed that residues with large cluster sizes have smaller Cα displacements as measured by normal mode analysis.
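A minimal sketch of the distributional claim: fit a log-normal to a set of cluster sizes and check the fit with a Kolmogorov–Smirnov test. The sizes below are synthetic stand-ins, not the 424-protein dataset.

```python
# Fit and test a log-normal model for cluster sizes (synthetic data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
sizes = rng.lognormal(mean=1.5, sigma=0.6, size=400)   # stand-in cluster sizes

shape, loc, scale = stats.lognorm.fit(sizes, floc=0)   # fix location at zero
ks = stats.kstest(sizes, "lognorm", args=(shape, loc, scale))
print(f"sigma={shape:.2f}, median={scale:.2f}, KS p={ks.pvalue:.3f}")
```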

15.
Cluster randomized studies are common in community trials. The standard method for estimating sample size in such studies assumes a common cluster size, but cluster sizes often vary in practice. In this paper, we derive sample size estimates for continuous outcomes in cluster randomized studies while accounting for the variability in cluster size. It is shown that the proposed formula for the total required size can be obtained by adding a correction term to the traditional formula based on the average cluster size. Application of these results to the design of a health promotion educational intervention study is discussed.
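A minimal worked example of the correction idea using one widely used approximation (not necessarily this paper's exact correction term): with varying cluster sizes, the design effect inflates from 1 + (m̄ − 1)ρ to 1 + ((CV² + 1)m̄ − 1)ρ.

```python
# Design-effect-based sample size with a CV correction for unequal clusters.
import math

rho, mbar, cv = 0.02, 30, 0.5        # ICC, mean cluster size, size CV (illustrative)
n_individual = 788                   # individually randomized n (illustrative)

deff_equal = 1 + (mbar - 1) * rho
deff_unequal = 1 + ((cv ** 2 + 1) * mbar - 1) * rho
print("equal-size total n:  ", math.ceil(n_individual * deff_equal))
print("unequal-size total n:", math.ceil(n_individual * deff_unequal))
```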

16.
We examine the effects of changing plot size on parameter estimation efficiency in multivariate (community-level) ecological studies, where estimation efficiency is defined in terms relating to the statistical precision of estimates of all variables (e.g. species) in a data set. Three ‘efficiency criteria’ for multivariate estimation are developed, and the relationship between estimation efficiency and plot size examined using three field data sets (deciduous understory, coniferous understory, and mire vegetation) from central Canada. For all three communities, estimation efficiency was found to increase monotonically with increasing plot size. However, relative gains in efficiency at larger plot sizes were offset by substantial increases in sampling effort (enumeration time per plot). Our results indicate that the largest plot size possible, given the constraints of time, should be used for parameter estimation in plant communities. Also, plots that are larger than the mean patch size should be utilized when sampling heterogeneous vegetation.

17.
The fine-scale spatial genetic structure (SGS) of alpine plants is receiving increasing attention, since seed and pollen dispersal can be inferred from it. However, estimates of SGS may depend strongly on the sampling strategy, including the sample size and the spatial sampling scheme. Here, we examined the effects of sample size and of three spatial schemes, simple-random, line-transect, and random-cluster sampling, on the estimation of SGS in Androsace tapete, an alpine cushion plant endemic to the Qinghai-Tibetan Plateau. Using both real and simulated data for dominant molecular markers, we show that: (i) SGS is highly sensitive to the sampling strategy, especially when the sample size is small (e.g., below 100); (ii) the commonly used SGS parameter (the intercept of the autocorrelogram) is more susceptible to sampling error than the newly developed Sp statistic; and (iii) the random-cluster scheme is subject to obvious bias in parameter estimation even when the sample size is relatively large (e.g., above 200). Overall, the line-transect scheme is recommended: it performs slightly better than the simple-random scheme in parameter estimation and is more efficient at encompassing broad spatial scales. The consistency between simulated and real data implies that these findings may hold in other alpine plants, and more species should be examined in future work.

18.
Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as the maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of known clusters and are thus inapplicable to data sets without them. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study for selecting the maximum spatial cluster size. Results on other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios when the maximum spatial cluster size is selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selecting the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.

19.
P Smolen, J Rinzel, A Sherman. Biophysical Journal 1993, 64(6):1668–1680
Previous mathematical modeling of beta-cell electrical activity has involved single cells or, more recently, clusters of identical cells. Here we model clusters of heterogeneous cells that differ in size, channel density, and other parameters. We use gap-junctional electrical coupling, with conductances determined by an experimental histogram. We find that, for reasonable parameter distributions, only a small proportion of isolated beta cells will burst when uncoupled, at any given value of a glucose-sensing parameter. However, a coupled, heterogeneous cluster of such cells, if sufficiently large (approximately 125 cells), will burst synchronously. Small clusters of such cells will burst only with low probability. In large clusters, the dynamics of intracellular calcium compare well with experiments. Also, these clusters possess a dose-response curve of increasing average electrical activity with respect to a glucose-sensing parameter that is sharp when the cluster is coupled, but shallow when the cluster is decoupled into individual cells. This agrees with comparative experiments on cells in suspension and islets.

20.
Motivated by the spatial modeling of aberrant crypt foci (ACF) in colon carcinogenesis, we consider binary data with probabilities modeled as the sum of a nonparametric mean plus a latent Gaussian spatial process that accounts for short-range dependencies. The mean is modeled in a general way using regression splines. The mean function can be viewed as a fixed effect and is estimated with a penalty for regularization. With the latent process viewed as another random effect, the model becomes a generalized linear mixed model. In our motivating data set and other applications, the sample size is too large to easily accommodate maximum likelihood or restricted maximum likelihood (REML) estimation, so pairwise likelihood, a special case of composite likelihood, is used instead. We develop an asymptotic theory for models that are sufficiently general to be used in a wide variety of applications, including, but not limited to, the problem that motivated this work. The splines have penalty parameters that must converge to zero asymptotically; we derive theory for this along with a data-driven method for selecting the penalty parameter, a method shown in simulations to improve greatly upon standard devices such as likelihood cross-validation. Finally, we apply the methods to the ACF data from our experiment and discover an unexpected location for peak ACF formation.
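A minimal sketch of pairwise (composite) likelihood for spatially correlated binary probit data: only bivariate margins of the latent Gaussian process are needed, so each neighboring pair contributes log P(Y_s, Y_t) via the bivariate normal CDF. The exponential correlation function, the nonspatial toy data, and all names are illustrative assumptions, and the mean here is linear rather than the paper's penalized spline.

```python
# Pairwise likelihood for a spatial binary probit model with a latent
# Gaussian process; summed over short-range neighboring pairs only.
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize

rng = np.random.default_rng(10)
n = 40
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)

def pair_ll(params, y, pairs):
    b0, b1, phi = params[0], params[1], np.exp(params[2])   # phi > 0 via log
    a = b0 + b1 * x
    ll = 0.0
    for s, t in pairs:
        r = np.exp(-np.linalg.norm(coords[s] - coords[t]) / phi)
        p11 = multivariate_normal(cov=[[1, r], [r, 1]]).cdf([a[s], a[t]])
        p1s, p1t = norm.cdf(a[s]), norm.cdf(a[t])
        probs = {(1, 1): p11, (1, 0): p1s - p11,
                 (0, 1): p1t - p11, (0, 0): 1 - p1s - p1t + p11}
        ll += np.log(max(probs[(y[s], y[t])], 1e-12))
    return -ll

# neighboring pairs within distance 2 (short-range dependence only)
pairs = [(s, t) for s in range(n) for t in range(s + 1, n)
         if np.linalg.norm(coords[s] - coords[t]) < 2.0]
y = rng.binomial(1, norm.cdf(0.2 + 0.5 * x))    # toy data, no real spatial effect
res = minimize(pair_ll, x0=[0.0, 0.0, 0.0], args=(y, pairs), method="Nelder-Mead")
print("b0, b1, log phi:", res.x)
```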
