Similar literature (20 results)
1.
This article applies a simple method for settings where one has clustered data, but statistical methods are only available for independent data. We assume the statistical method provides us with a normally distributed estimate, theta, and an estimate of its variance, sigma2. We randomly select a data point from each cluster and apply our statistical method to this independent data. We repeat this multiple times, and use the average of the associated theta's as our estimate. An estimate of the variance is given by the average of the sigma2's minus the sample variance of the theta's. We call this procedure multiple outputation, as all "excess" data within each cluster is thrown out multiple times. Hoffman, Sen, and Weinberg (2001, Biometrika 88, 1121-1134) introduced this approach for generalized linear models when the cluster size is related to outcome. In this article, we demonstrate the broad applicability of the approach. Applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed. In addition, asymptotic normality of estimates based on all possible outputations, as well as a finite number of outputations, is proven given weak conditions. Multiple outputation provides a simple and broadly applicable method for analyzing clustered data. It is especially suited to settings where methods for clustered data are impractical, but can also be applied generally as a quick and simple tool.
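
As a minimal illustration of the procedure described above (a sketch, not the authors' implementation), the following R code draws one observation per cluster, applies a routine built for independent data, averages the resulting estimates, and estimates the variance as the mean of the reported variances minus the between-outputation variance of the estimates; the data and function names are hypothetical.

```r
# Illustrative sketch of multiple outputation (not the authors' code).
# fit_fun: any routine for independent data returning list(theta = ..., var = ...).
multiple_outputation <- function(data, cluster, fit_fun, M = 200) {
  res <- replicate(M, {
    keep <- tapply(seq_along(cluster), cluster,
                   function(i) i[sample.int(length(i), 1)])  # one row per cluster
    f <- fit_fun(data[unlist(keep), , drop = FALSE])
    c(theta = f$theta, v = f$var)
  })
  # point estimate: average of the outputation-specific estimates;
  # variance: mean of the sigma^2's minus the sample variance of the theta's
  list(estimate = mean(res["theta", ]),
       variance = mean(res["v", ]) - var(res["theta", ]))
}

# Toy example: estimate a mean from clustered data with an ordinary one-sample fit
set.seed(1)
cl  <- rep(1:50, each = 4)
dat <- data.frame(y = rnorm(200) + rnorm(50)[cl])
fit_mean <- function(d) list(theta = mean(d$y), var = var(d$y) / nrow(d))
multiple_outputation(dat, cl, fit_mean)
```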

2.
David I. Warton. Biometrics 2011, 67(1): 116–123
Summary A modification of generalized estimating equations (GEEs) methodology is proposed for hypothesis testing of high-dimensional data, with particular interest in multivariate abundance data in ecology, an important application of interest in thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds number of clusters, n>K) and over-dispersion relative to the Poisson distribution. Usual GEE methods cannot be applied in this setting primarily because sandwich estimators become numerically unstable as n increases. We propose instead using a regularized sandwich estimator that assumes a common correlation matrix R, and shrinks the sample estimate of R toward the working correlation matrix to improve its numerical stability. It is shown via theory and simulation that this substantially improves the power of Wald statistics when cluster size is not small. We apply the proposed approach to study the effects of nutrient addition on nematode communities, and in doing so discuss important issues in implementation, such as using statistics that have good properties when parameter estimates approach the boundary, and using resampling to enable valid inference that is robust to high dimensionality and to possible model misspecification.
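
To make the shrinkage step concrete, here is a hedged R sketch of the idea (not the paper's implementation; the choice of shrinkage parameter and the surrounding Wald test are omitted): the sample correlation of residuals is shrunk toward a working correlation matrix, here the identity, so the estimate remains well conditioned even when the number of variables exceeds the number of clusters.

```r
# Sketch only: regularize a sample correlation matrix by shrinking it toward
# a working correlation matrix R0 (working independence by default).
shrink_correlation <- function(resid_mat, lambda = 0.5, R0 = NULL) {
  R_sample <- cor(resid_mat)                    # sample correlation across the columns
  if (is.null(R0)) R0 <- diag(ncol(resid_mat))  # identity working correlation
  (1 - lambda) * R_sample + lambda * R0         # regularized estimate
}

# Example: 30 clusters (rows) and 100 variables (columns), i.e. n > K
set.seed(2)
res   <- matrix(rnorm(30 * 100), nrow = 30)
R_reg <- shrink_correlation(res, lambda = 0.7)
range(eigen(R_reg, only.values = TRUE)$values)  # eigenvalues bounded away from 0
```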

3.
Summary Many time-to-event studies are complicated by the presence of competing risks and by nesting of individuals within a cluster, such as patients in the same center in a multicenter study. Several methods have been proposed for modeling the cumulative incidence function with independent observations. However, when subjects are clustered, one needs to account for the presence of a cluster effect either through frailty modeling of the hazard or subdistribution hazard, or by adjusting for the within-cluster correlation in a marginal model. We propose a method for modeling the marginal cumulative incidence function directly. We compute leave-one-out pseudo-observations from the cumulative incidence function at several time points. These are used in a generalized estimating equation to model the marginal cumulative incidence curve, and obtain consistent estimates of the model parameters. A sandwich variance estimator is derived to adjust for the within-cluster correlation. The method is easy to implement using standard software once the pseudovalues are obtained, and is a generalization of several existing models. Simulation studies show that the method works well to adjust the SE for the within-cluster correlation. We illustrate the method on a dataset looking at outcomes after bone marrow transplantation.
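
The leave-one-out pseudo-observation construction used here is generic, so a brief illustrative R sketch follows. In the paper the functional is the cumulative incidence at fixed time points; the sample mean is used below only to keep the sketch self-contained (in that special case the pseudo-values reduce to the observations themselves).

```r
# Generic leave-one-out pseudo-observations (illustrative, not the authors' code):
# pseudo_i = n * theta_hat(all) - (n - 1) * theta_hat(all except i).
pseudo_obs <- function(x, statistic = mean) {
  n <- length(x)
  theta_all <- statistic(x)
  sapply(seq_len(n), function(i) n * theta_all - (n - 1) * statistic(x[-i]))
}

set.seed(3)
y  <- rexp(10)
po <- pseudo_obs(y)   # for the mean, the pseudo-values equal the data
all.equal(po, y)      # TRUE
# In the clustered competing-risks setting the pseudo-values would then be
# modeled with a GEE (e.g., geepack::geeglm) using a sandwich variance.
```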

4.
Wang YG, Zhao Y. Biometrics 2008, 64(1): 39–45
Summary. We consider rank-based regression models for clustered data analysis. A weighted Wilcoxon rank method is proposed to take account of within-cluster correlations and varying cluster sizes. The asymptotic normality of the resulting estimators is established. A method to estimate the covariance of the estimators is also given, which can bypass estimation of the density function. Simulation studies are carried out to compare different estimators under a number of scenarios for the correlation structure, the presence or absence of outliers, and different correlation values. The proposed methods appear to perform well; in particular, the one incorporating the correlation in the weighting achieves the highest efficiency and robustness against misspecification of the correlation structure and against outliers. A real example is provided for illustration.
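
For orientation, the unweighted Jaeckel/Wilcoxon rank dispersion that such weighted methods generalize can be minimized directly; the R sketch below is illustrative only and does not reproduce the paper's cluster-dependent weights or its covariance estimator.

```r
# Unweighted Wilcoxon (Jaeckel) rank dispersion for a linear model; the
# intercept is not identified by this criterion and is usually estimated
# separately from the residuals.
wilcoxon_dispersion <- function(beta, X, y) {
  e <- as.vector(y - X %*% beta)
  a <- sqrt(12) * (rank(e) / (length(e) + 1) - 0.5)  # Wilcoxon scores
  sum(a * e)
}

set.seed(4)
X <- cbind(rnorm(100), rnorm(100))
y <- X %*% c(1, -2) + rt(100, df = 3)                # heavy-tailed errors
fit <- optim(c(0, 0), wilcoxon_dispersion, X = X, y = y)
fit$par                                              # roughly (1, -2)
```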

5.
Fu R, Dey DK, Holsinger KE. Biometrics 2011, 67(3): 1073–1082
Summary An important fraction of recently generated molecular data consists of dominant markers. They contain substantial information about genetic variation, but dominance makes it impossible to apply standard techniques to calculate measures of genetic differentiation, such as F-statistics. In this article, we propose a new Bayesian beta-mixture model that more accurately describes the genetic structure from dominant markers and estimates multiple FSTs from the sample. The model also has important applications for codominant markers and single-nucleotide polymorphism (SNP) data. The number of FSTs is assumed unknown beforehand and is assigned a prior distribution. The reversible jump algorithm is used to estimate the unknown number of FSTs. We evaluate the performance of three split proposals and the overall performance of the proposed model based on simulated dominant marker data. The model could reliably identify and estimate a spectrum of degrees of genetic differentiation present in multiple loci. The estimates of FSTs also incorporate uncertainty about the magnitude of the within-population inbreeding coefficient. We illustrate the method with two examples, one using dominant marker data from a rare orchid and the other using codominant marker data from human populations.

6.
A nonparametric estimator of a joint distribution function F0 of a d-dimensional random vector with interval-censored (IC) data is the generalized maximum likelihood estimator (GMLE), where d ≥ 2. The GMLE of F0 with univariate IC data is uniquely defined at each follow-up time. However, this is no longer true in general with multivariate IC data as demonstrated by a data set from an eye study. How to estimate the survival function and the covariance matrix of the estimator in such a case is a new practical issue in analyzing IC data. We propose a procedure in such a situation and apply it to the data set from the eye study. Our method always results in a GMLE with a nonsingular sample information matrix. We also give a theoretical justification for such a procedure. Extension of our procedure to Cox's regression model is also mentioned.

7.
Menggang Yu, Bin Nan. Biometrics 2010, 66(2): 405–414
Summary In large cohort studies, it often happens that some covariates are expensive to measure and hence only measured on a validation set. On the other hand, relatively cheap but error-prone measurements of the covariates are available for all subjects. The regression calibration (RC) estimation method (Prentice, 1982, Biometrika 69, 331–342) is a popular approach for analyzing such data and has been applied to the Cox model by Wang et al. (1997, Biometrics 53, 131–145) under normal measurement error and rare disease assumptions. In this article, we consider the RC estimation method for the semiparametric accelerated failure time model with covariates subject to measurement error. Asymptotic properties of the proposed method are investigated under a two-phase sampling scheme for validation data that are selected via stratified random sampling, resulting in neither independent nor identically distributed observations. We show that the estimates converge to some well-defined parameters. In particular, unbiased estimation is feasible under additive normal measurement error models for normal covariates and under Berkson error models. The proposed method performs well in finite-sample simulation studies. We also apply the proposed method to a depression mortality study.
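
A minimal sketch of the regression-calibration step on simulated toy data (assumptions: a validation subset with the true covariate observed and an error-prone surrogate available for everyone; this is not the authors' estimator): the true covariate is regressed on its surrogate in the validation set, the fitted values replace the covariate for all subjects, and the calibrated covariate enters the outcome model, here a Weibull AFT fit with survival::survreg purely for illustration.

```r
library(survival)

set.seed(5)
n      <- 1000
x      <- rnorm(n)                        # expensive covariate
w      <- x + rnorm(n, sd = 0.5)          # cheap, error-prone measurement
valid  <- sample(n, 200)                  # validation subset where x is observed
time   <- rexp(n, rate = exp(0.5 * x))    # failure times depending on x
status <- rbinom(n, 1, 0.8)               # some censoring

# Calibration model fitted on the validation set only, then used to impute
# E[x | w] for all subjects.
calib <- lm(x ~ w, data = data.frame(x, w), subset = valid)
x_hat <- predict(calib, newdata = data.frame(w = w))

# Plug the calibrated covariate into an accelerated failure time model.
fit <- survreg(Surv(time, status) ~ x_hat, dist = "weibull")
summary(fit)
```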

8.
Summary In this article, we propose a positive stable shared frailty Cox model for clustered failure time data where the frailty distribution varies with cluster-level covariates. The proposed model accounts for covariate-dependent intracluster correlation and permits both conditional and marginal inferences. We obtain marginal inference directly from a marginal model, then use a stratified Cox-type pseudo-partial likelihood approach to estimate the regression coefficient for the frailty parameter. The proposed estimators are consistent and asymptotically normal, and a consistent estimator of the covariance matrix is provided. Simulation studies show that the proposed estimation procedure is appropriate for practical use with a realistic number of clusters. Finally, we present an application of the proposed method to kidney transplantation data from the Scientific Registry of Transplant Recipients.

9.
A multivariate nonlinear method for analyzing the population genetic structure of allelic polymorphism
Multivariate statistical analyses of multidimensional gene-polymorphism data (for example, the cluster analysis used to compute genetic distances, and the principal component analysis, factor analysis, and canonical correlation analysis used to study population genetic structure) have long relied on classical multivariate linear methods designed for unconstrained data, with little attention to the problems caused by the "closure effect" of gene-polymorphism data. Starting from the distributional and structural features of gene-polymorphism data, this article points out that gene-polymorphism distributions have the character of "closed data" and analyzes the difficulties that the closure effect creates when classical multivariate linear methods are applied to population genetic structure. Drawing on the theory and methods of compositional data analysis, a basic multivariate nonlinear framework for analyzing the population genetic structure of allelic polymorphism is proposed. Taking principal component analysis as an example, the results of classical linear PCA and "log-ratio" nonlinear PCA are compared on real data, showing that log-ratio nonlinear PCA is a sound method for studying population genetic structure from gene-polymorphism data, with the advantages of specificity and sensitivity, and that its results are consistent with population-genetics principles.
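
One standard way to implement such a log-ratio analysis is to apply a centered log-ratio (clr) transform to the allele-frequency rows before an ordinary principal component analysis; the R sketch below is illustrative, and the article may use a different log-ratio variant (for example, the additive log-ratio).

```r
# Centered log-ratio transform for compositional rows (frequencies summing to 1),
# followed by an ordinary PCA on the transformed data.
clr <- function(P, eps = 1e-6) {
  P <- P + eps                   # guard against zero frequencies
  L <- log(P)
  sweep(L, 1, rowMeans(L))       # subtract each row's mean log (log geometric mean)
}

# Example: 20 populations x 5 allele frequencies, each row summing to 1
set.seed(6)
raw  <- matrix(rgamma(20 * 5, shape = 2), nrow = 20)
freq <- raw / rowSums(raw)
pc   <- prcomp(clr(freq))
summary(pc)                      # variance explained on the log-ratio scale
```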

10.
The quantification of complex morphological patterns typically involves comprehensive shape and size analyses, usually obtained by gathering morphological data from all the structures that capture the phenotypic diversity of an organism or object. Articulated structures are a critical component of overall phenotypic diversity, but data gathered from these structures are difficult to incorporate into modern analyses because of the complexities associated with jointly quantifying 3D shape in multiple structures. While there are existing methods for analyzing shape variation in articulated structures in two-dimensional (2D) space, these methods do not work in 3D, a rapidly growing area of capability and research. Here, we describe a simple geometric rigid rotation approach that removes the effect of random translation and rotation, enabling the morphological analysis of 3D articulated structures. Our method is based on Cartesian coordinates in 3D space, so it can be applied to any morphometric problem that also uses 3D coordinates (e.g., spherical harmonics). We demonstrate the method by applying it to a landmark-based dataset for analyzing shape variation using geometric morphometrics. We have developed an R tool (ShapeRotator) so that the method can be easily implemented in the commonly used R package geomorph and MorphoJ software. This method will be a valuable tool for 3D morphological analyses in articulated structures by allowing an exhaustive examination of shape and size diversity.
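
The generic operation of removing random translation and rotation from a 3D landmark configuration can be written as a rigid alignment onto a reference using the Kabsch/SVD construction; the sketch below illustrates that generic operation and is not the ShapeRotator implementation.

```r
# Rigid alignment of 3D landmark matrix A (k x 3) onto reference B (k x 3):
# center both, compute the optimal rotation by SVD, and guard against reflection.
rigid_align <- function(A, B) {
  Ac <- scale(A, scale = FALSE)            # remove translation
  Bc <- scale(B, scale = FALSE)
  s  <- svd(t(Ac) %*% Bc)                  # 3 x 3 cross-covariance
  d  <- sign(det(s$v %*% t(s$u)))          # +1 rotation, -1 would be a reflection
  R  <- s$v %*% diag(c(1, 1, d)) %*% t(s$u)
  list(rotation = R, aligned = Ac %*% t(R))
}

# Example: recover a configuration after a random rotation and translation
set.seed(7)
ref   <- matrix(rnorm(15), ncol = 3)       # 5 landmarks in 3D
theta <- 0.7
Rz    <- rbind(c(cos(theta), -sin(theta), 0),
               c(sin(theta),  cos(theta), 0),
               c(0,           0,          1))
moved <- ref %*% t(Rz) + matrix(rep(c(2, -1, 3), each = 5), ncol = 3)
out   <- rigid_align(moved, ref)
max(abs(out$aligned - scale(ref, scale = FALSE)))   # ~ 0: alignment recovered
```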

11.
Estimating population density as precisely as possible is a key premise for managing wild animal species. This can be a challenging task if the species in question is elusive or, due to high abundance, hard to count. We present a new, mathematically derived estimator for population size, where the estimation is based solely on the frequency of genetically assigned parent–offspring pairs within a subsample of an ungulate population. By use of molecular markers such as microsatellites, the number of these parent–offspring pairs can be determined. The study's aim was to clarify whether a classical capture–mark–recapture (CMR) method can be adapted or extended by this genetic element to a genetic-based capture–mark–recapture (g-CMR). We numerically validate the presented estimator (and the corresponding variance estimates) and provide R code for computing estimates of population size, including confidence intervals. The presented method offers a new framework to precisely estimate population size based on the genetic analysis of a one-time subsample. This is especially valuable where traditional CMR methods or other DNA-based (fecal or hair) capture–recapture methods fail or are too difficult to apply. The DNA source used is essentially irrelevant, but in the present case an annual hunting bag serves as the data basis. In addition to high-quality muscle tissue samples, hunting bags provide essential additional information for wildlife management, such as age, weight, or sex. Where a g-CMR method is ecologically appropriate and compatible with hunting practice, it is widely applicable, not least because of its species-independent use.
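
The parent–offspring-pair estimator itself is not given in this abstract and is not reproduced here; purely for reference, the sketch below shows the classical Lincoln–Petersen capture–mark–recapture estimator (in Chapman's bias-corrected form) that the genetic g-CMR approach extends.

```r
# Classical Lincoln-Petersen / Chapman estimator of population size, shown only
# as the CMR baseline; it is not the parent-offspring-pair estimator of the paper.
chapman_estimate <- function(M, C, R) {
  # M animals marked in the first sample, C caught in the second, R recaptured
  (M + 1) * (C + 1) / (R + 1) - 1
}
chapman_estimate(M = 120, C = 150, R = 30)   # about 588 animals
```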

12.
A weighted quantile sum (WQS) regression has been used to assess the associations between environmental exposures and health outcomes. However, the currently available WQS approach, which is based on additive effects, does not allow exploring potential interactions of exposures with other covariates in relation to a health outcome. In addition, the current WQS cannot account for clustering, so it may not be valid for the analysis of clustered data. We propose a generalized WQS approach that can assess interactions by estimating stratum-specific weights of exposures in a mixture, while accounting for the potential clustering effect of matched pairs of cases and controls as well as exposure data censored below the limits of detection. The performance of the proposed method in identifying interactions is evaluated through simulations based on various scenarios of correlation structures among the exposures and with an outcome. We also assess how well the proposed method performs in the presence of varying levels of censoring in the exposures. Our findings from the simulation study show that the proposed method outperforms the traditional WQS, as indicated by higher power for detecting interactions. We also find no strong evidence that the proposed method falsely identifies interactions when there are no true interactive effects. We demonstrate application of the proposed method to real data from the Epidemiological Research on Autism Spectrum Disorder (ASD) in Jamaica (ERAJ) by examining interactions between exposure to manganese and the glutathione S-transferase family gene GSTP1 in relation to ASD.
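
For context, the additive WQS index that this proposal generalizes can be written in a few lines of R; the sketch below uses fixed illustrative weights and omits the weight estimation, the stratum-specific weights, the clustering adjustment, and the censoring handling described above.

```r
# Basic weighted quantile sum index: quantile-score each exposure, combine the
# scores with non-negative weights summing to 1, and regress the outcome on it.
set.seed(8)
n <- 300
X <- matrix(rexp(n * 4), ncol = 4)                  # 4 exposures in a mixture
q <- apply(X, 2, function(x)
  as.numeric(cut(x, quantile(x, 0:4 / 4), include.lowest = TRUE)) - 1)
w   <- c(0.4, 0.3, 0.2, 0.1)                        # illustrative fixed weights
wqs <- q %*% w                                      # weighted quantile sum index
y   <- rbinom(n, 1, plogis(-1 + 0.5 * wqs))         # simulated binary outcome
summary(glm(y ~ wqs, family = binomial))            # association with the index
```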

13.
Although advances have recently been made in modeling multivariate count data, existing models still have several limitations: (i) the multivariate Poisson log-normal model (Aitchison and Ho, 1989) cannot be used to fit multivariate count data with excess zero-vectors; (ii) the multivariate zero-inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero-truncated/deflated count data and is difficult to apply to high-dimensional cases; (iii) the Type I multivariate zero-adjusted Poisson (ZAP) distribution (Tian et al., 2017) can only model multivariate count data with a special correlation structure in which the correlations among random components are either all positive or all negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows a more flexible dependency structure among components; that is, some of the correlation coefficients can be positive while others are negative. We then develop its important distributional properties and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods.

14.
Malka Gorfine, Li Hsu. Biometrics 2011, 67(2): 415–426
Summary In this work, we provide a new class of frailty-based competing risks models for clustered failure time data. This class is based on expanding the competing risks model of Prentice et al. (1978, Biometrics 34, 541–554) to incorporate frailty variates, with the use of cause-specific proportional hazards frailty models for all the causes. Parametric and nonparametric maximum likelihood estimators are proposed. The main advantages of the proposed class of models, in contrast to the existing models, are: (1) the inclusion of covariates; (2) the flexible structure of the dependency among the various types of failure times within a cluster; and (3) the unspecified within-subject dependency structure. The proposed estimation procedures produce the most efficient parametric and semiparametric estimators and are easy to implement. Simulation studies show that the proposed methods perform very well in practical situations.

15.
Auxiliary covariate data are often collected in biomedical studies when the primary exposure variable is only assessed on a subset of the study subjects. In this study, we investigate a semiparametric estimated-likelihood method for generalized linear mixed models (GLMMs) in the presence of a continuous auxiliary variable. We use a kernel smoother to handle continuous auxiliary data. The method can be used to deal with missing or mismeasured covariate data problems in a variety of applications when an auxiliary variable is available and cluster sizes are not too small. Simulation study results show that the proposed method performs better than methods that ignore the random effects in the GLMM or that use only the data in the validation set. We illustrate the proposed method with a real data set from a recent environmental epidemiology study on maternal serum 1,1-dichloro-2,2-bis(p-chlorophenyl) ethylene levels in relation to preterm births.
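
The kernel-smoothing building block can be illustrated with a simple Nadaraya–Watson estimate of the expensive covariate given the continuous auxiliary variable, computed from the validation set; the R sketch below shows only that component, not the estimated-likelihood GLMM fit.

```r
# Nadaraya-Watson smoother: estimate E[x | a] from the validation set and
# evaluate it at the auxiliary values of the non-validation subjects.
nw_smooth <- function(a0, a_valid, x_valid, h) {
  w <- dnorm((a0 - a_valid) / h)            # Gaussian kernel weights
  sum(w * x_valid) / sum(w)
}

set.seed(9)
a <- runif(500)                              # auxiliary variable, all subjects
x <- sin(2 * pi * a) + rnorm(500, sd = 0.3)  # expensive covariate
valid <- sample(500, 150)                    # validation subset with x observed
x_hat <- sapply(a[-valid], nw_smooth,
                a_valid = a[valid], x_valid = x[valid], h = 0.05)
cor(x_hat, sin(2 * pi * a[-valid]))          # the smoother tracks E[x | a]
```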

16.
Johnson BA, Long Q, Chung M. Biometrics 2011, 67(4): 1379–1388
Summary Dimension reduction and model and variable selection are ubiquitous concepts in modern statistical science, and deriving new methods beyond the scope of current methodology is noteworthy. This article briefly reviews existing regularization methods for penalized least squares and likelihood for survival data and their extension to a certain class of penalized estimating functions. We show that if one's goal is to estimate the entire regularized coefficient path using the observed survival data, then all current strategies fail for the Buckley–James estimating function. We propose a novel two-stage method to estimate and restore the entire Dantzig-regularized coefficient path for censored outcomes in a least-squares framework. We apply our methods to a microarray study of lung adenocarcinoma with sample size n = 200 and p = 1036 gene predictors and find 10 genes that are consistently selected across different criteria and an additional 14 genes that merit further investigation. In simulation studies, we found that the proposed path restoration and variable selection technique has the potential to perform as well as existing methods that begin with a proper convex loss function at the outset.

17.
Sheets, H.D., Mitchell, C.E., Izard, Z.T., Willis, J.M., Melchin, M.J. & Holmden, C. 2012: Horizon annealing: a collection-based approach to automated sequencing of the fossil record. Lethaia, Vol. 45, pp. 532–547. A number of different approaches to quantitative biochronology have been proposed and used to construct high-resolution time-scales for a range of uses. We present a new approach, horizon annealing, which uses simulated annealing to optimize the sequencing of collection horizons. Temporal sequences of events produced by this method are compared with those produced by graphic correlation, CONOP and RASC for a series of previously studied exemplar data sets. Horizon annealing produces results similar to other methods, but it does have properties (the ordination of collections and the avoidance of some local minima) that make it useful for high-resolution studies, particularly those based on capture-mark-recapture methods requiring detailed presence–absence data for individual collections and taxa. Keywords: chronostratigraphy, graphic correlation, graptolite, rate of evolution, CONOP9.
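
As an illustration of the simulated-annealing machinery only (the horizon-annealing objective itself is not reproduced), the R sketch below reorders horizons to minimize a simple stand-in penalty, the total implied taxon range length from a presence–absence matrix, accepting worse orderings with a probability that shrinks as the temperature cools.

```r
# Stand-in penalty: total implied range length of each taxon, given an ordering
# of the horizons (rows of the presence/absence matrix P).
range_penalty <- function(ord, P) {
  sum(apply(P[ord, , drop = FALSE], 2, function(v) {
    occ <- which(v > 0)
    if (length(occ) < 2) 0 else diff(range(occ))
  }))
}

# Generic simulated annealing over orderings: propose a swap of two horizons,
# always accept downhill moves, accept uphill moves with Metropolis probability.
anneal_order <- function(P, n_iter = 5000, temp = 5, cool = 0.999) {
  ord <- sample(nrow(P)); cur <- range_penalty(ord, P)
  best <- ord; best_val <- cur
  for (i in seq_len(n_iter)) {
    cand <- ord
    j <- sample(nrow(P), 2)
    cand[j] <- cand[rev(j)]                 # swap two horizons
    val <- range_penalty(cand, P)
    if (val < cur || runif(1) < exp((cur - val) / temp)) {
      ord <- cand; cur <- val
      if (val < best_val) { best <- ord; best_val <- val }
    }
    temp <- temp * cool                     # cooling schedule
  }
  list(order = best, penalty = best_val)
}

set.seed(10)
P <- matrix(rbinom(40 * 12, 1, 0.3), nrow = 40)   # 40 horizons, 12 taxa
anneal_order(P)$penalty                            # lower than a random ordering
```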

18.
Ralstonia solanacearum, a phytopathogenic bacterium, uses an environmentally sensitive and complex regulatory network to control expression of multiple virulence genes. Part of this network is an unusual autoregulatory system that produces and senses 3-hydroxypalmitic acid methyl ester. In culture, this autoregulatory system ensures that expression of virulence genes, such as those of the eps operon encoding biosynthesis of the acidic extracellular polysaccharide, occurs only at high cell density (>10^7 cells/ml). To determine if regulation follows a similar pattern within tomato plants, we first developed a quantitative immunofluorescence (QIF) method that measures the relative amount of a target protein within individual bacterial cells. For R. solanacearum, QIF was used to determine the amount of beta-galactosidase protein within wild-type cells containing a stable eps-lacZ reporter allele. When cultured cells were examined to test the method, QIF accurately detected both low and high levels of eps gene expression. QIF analysis of R. solanacearum cells recovered from stems of infected tomato plants showed that expression of eps during pathogenesis was similar to that in culture. These results suggest that there are no special signals or conditions within plants that override or short-circuit the regulatory processes observed in R. solanacearum in culture. Because QIF is a robust, relatively simple procedure that uses generally accessible equipment, it should be useful in many situations where gene expression in single bacterial cells must be determined.

19.
Existing methods for joint modeling of longitudinal measurements and survival data can be highly influenced by outliers in the longitudinal outcome. We propose a joint model for analysis of longitudinal measurements and competing risks failure time data which is robust in the presence of outlying longitudinal observations during follow-up. Our model consists of a linear mixed effects sub-model for the longitudinal outcome and a proportional cause-specific hazards frailty sub-model for the competing risks data, linked together by latent random effects. Instead of the usual normality assumption for measurement errors in the linear mixed effects sub-model, we adopt a t-distribution, which has a longer tail and thus is more robust to outliers. We derive an EM algorithm for the maximum likelihood estimates of the parameters and estimate their standard errors using a profile likelihood method. The proposed method is evaluated by simulation studies and is applied to a scleroderma lung study.

20.

Thanks to advances in high-throughput sequencing technologies, the importance of the microbiome to human health and disease has been increasingly recognized. Analyzing microbiome data from sequencing experiments is challenging due to their unique features, such as compositionality, excessive zero observations, overdispersion, and complex relations among microbial taxa. Clustered microbiome data have become prevalent in recent years from designs such as longitudinal studies, family studies, and matched case–control studies. The within-cluster dependence compounds the challenge of microbiome data analysis. Methods that properly accommodate intra-cluster correlation and the features of microbiome data are needed. We develop robust and powerful differential composition tests for clustered microbiome data. The methods do not rely on any distributional assumptions on the microbial compositions, which provides flexibility to model various correlation structures among taxa and among samples within a cluster. By leveraging the adjusted sandwich covariance estimate, the methods properly accommodate sample dependence within a cluster. The two-part version of the test can further improve power in the presence of excessive zero observations. Different types of confounding variables can be easily adjusted for in the methods. We perform extensive simulation studies under commonly adopted clustered data designs to evaluate the methods. We demonstrate that the methods properly control the type I error under all designs and are more powerful than existing methods in many scenarios. The usefulness of the proposed methods is further demonstrated with two real datasets from longitudinal microbiome studies on pregnant women and inflammatory bowel disease patients. The methods have been incorporated into the R package "miLineage" publicly available at https://tangzheng1.github.io/tanglab/software.html.

