Similar documents (20 results)
1.
A covariance estimator for GEE with improved small-sample properties
Mancl LA, DeRouen TA. Biometrics 2001, 57(1):126-134
In this paper, we propose an alternative covariance estimator to the robust covariance estimator of generalized estimating equations (GEE). Hypothesis tests using the robust covariance estimator can have inflated size when the number of independent clusters is small. Resampling methods, such as the jackknife and bootstrap, have been suggested for covariance estimation when the number of clusters is small. A drawback of the resampling methods when the response is binary is that the methods can break down when the number of subjects is small due to zero or near-zero cell counts caused by resampling. We propose a bias-corrected covariance estimator that avoids this problem. In a small simulation study, we compare the bias-corrected covariance estimator to the robust and jackknife covariance estimators for binary responses for situations involving 10-40 subjects with equal and unequal cluster sizes of 16-64 observations. The bias-corrected covariance estimator gave tests with sizes close to the nominal level even when the number of subjects was 10 and cluster sizes were unequal, whereas the robust and jackknife covariance estimators gave tests with sizes that could be 2-3 times the nominal level. The methods are illustrated using data from a randomized clinical trial on treatment for bone loss in subjects with periodontal disease.
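To make the construction concrete, here is a minimal numpy sketch contrasting the usual robust ("sandwich") covariance with a bias-corrected version in which each cluster's residuals are inflated by the inverse of a leverage-type matrix, which is one common reading of the correction described above; the function name, input layout, and exact algebra are my assumptions, not code from the paper.

```python
import numpy as np

def sandwich_estimators(D, V, e):
    """Robust and bias-corrected covariance estimates for a fitted GEE.

    D : list of (n_i x p) per-cluster derivative/design matrices
    V : list of (n_i x n_i) per-cluster working covariance matrices
    e : list of length-n_i residual vectors (y_i - mu_i)
    Returns (robust, bias_corrected), both p x p arrays.
    """
    p = D[0].shape[1]
    B = sum(Di.T @ np.linalg.solve(Vi, Di) for Di, Vi in zip(D, V))  # "bread"
    B_inv = np.linalg.inv(B)

    meat_rob = np.zeros((p, p))
    meat_bc = np.zeros((p, p))
    for Di, Vi, ei in zip(D, V, e):
        Vi_inv = np.linalg.inv(Vi)
        # Leverage-type matrix H_i = D_i B^{-1} D_i' V_i^{-1}
        Hi = Di @ B_inv @ Di.T @ Vi_inv
        ei_adj = np.linalg.solve(np.eye(len(ei)) - Hi, ei)  # (I - H_i)^{-1} e_i
        u_rob = Di.T @ Vi_inv @ ei
        u_bc = Di.T @ Vi_inv @ ei_adj
        meat_rob += np.outer(u_rob, u_rob)
        meat_bc += np.outer(u_bc, u_bc)

    return B_inv @ meat_rob @ B_inv, B_inv @ meat_bc @ B_inv
```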

2.
Xu R, Adak S. Biometrics 2002, 58(2):305-315
Nonproportional hazards often arise in survival analysis, as is evident in the data from the International Non-Hodgkin's Lymphoma Prognostic Factors Project. A tree-based method to handle such survival data is developed for the assessment and estimation of time-dependent regression effects under a Cox-type model. The tree method approximates the time-varying regression effects as piecewise constants and is designed to estimate change points in the regression parameters. A fast algorithm that relies on maximized score statistics is used in recursive segmentation of the time axis. Following the segmentation, a pruning algorithm with optimal properties similar to those of classification and regression trees (CART) is used to determine a sparse segmentation. Bootstrap resampling is used in correcting for overoptimism due to split point optimization. The piecewise constant model is often more suitable for clinical interpretation of the regression parameters than the more flexible spline models. The utility of the algorithm is shown on the lymphoma data, where we further develop the published International Risk Index into a time-varying risk index for non-Hodgkin's lymphoma.

3.
Grant and Kluge (2003) associated resampling measures of group support with the aim of evaluating statistical stability, confidence, or the probability of recovering a true phylogenetic group. This interpretation is not necessary to methods such as jackknifing or bootstrapping, which are better interpreted as measures of support from the current dataset. Grant and Kluge only accepted the absolute Bremer value as a measure of group support, and considered resampling methods as irrelevant to phylogenetic inference. It is shown that under simple circumstances resampling indices better reflect the degree of support than Bremer values. Grant and Kluge associated the resampling methods (and the use of measures of group support in general) with what they call a "verificationist agenda", where strongly supported groups are first detected, and then protected against additional testing. They propose that identifying weakly supported groups, and then concentrating additional tests on them, will better serve science. Both programs are actually equivalent, and inert as to the selection of methods to estimate group support. The ranking of groups under a range of resampling strengths is proposed as an additional criterion to evaluate resampling methods. A reexamination of the slope of symmetric resampling frequency as a function of resampling strength suggests that slopes can also be problematic as a measure of group support. © The Willi Hennig Society 2005.

4.
The clade size effect refers to a bias that causes middle‐sized clades to be less supported than small or large‐sized clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected. Homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, which performed better than resampling measures of support and comparably to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony, but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect: it increases the number of random trees generated during resampling, which, together with the higher chance that medium‐sized clades have of being contradicted, produces the bias when the original matrix is perturbed and makes it stronger in resampling measures of support.

5.
Aim: Phytosociological databases often contain unbalanced samples of real vegetation, which should be carefully resampled before any analyses. We propose a new resampling method based on species composition, called heterogeneity‐constrained random (HCR) resampling. Method: Many subsets of the source vegetation database are selected randomly. These subsets are sorted by decreasing mean dissimilarity between pairs of the vegetation plots, and then sorted again by increasing variance of these dissimilarities. Ranks from both sortings are summed for each subset, and the subset with the lowest summed rank is considered as the most representative. The performance of this method was tested using simulated point patterns that represented different levels of aggregation of vegetation plots within a database. The distributions of points in the subsets resulting from different resampling methods, both with and without database stratification, were compared using Ripley's K function. The mean of random selections from an unbiased sample was used as a reference in these comparisons. The efficiency of the method was also demonstrated with real phytosociological data. Results: Both stratified and HCR resampling yielded selection patterns more similar to the reference than resampling without these tools. Outcomes from the resampling that combined these two methods were the most similar to the reference. The efficiency of the HCR resampling method varied with different levels of aggregation in the database. Conclusions: This new method is efficient for resampling phytosociological databases. As it only uses information on species occurrences/abundances, it does not require the definition of strata, thereby avoiding the effect of subjective decisions on the selection outcome. Nevertheless, this method can also be applied to stratified databases.
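The selection rule described above translates almost directly into code. The sketch below is a simplified reading of the procedure; the subset size, number of random trials, and the Jaccard dissimilarity on presence/absence data are placeholder choices, not the authors' settings.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata

def hcr_resample(plots, n_select, n_trials=1000, metric="jaccard", rng=None):
    """Heterogeneity-constrained random (HCR) selection, roughly as described:
    draw many random subsets, rank them by decreasing mean pairwise
    dissimilarity and by increasing variance of those dissimilarities,
    then keep the subset with the lowest summed rank.

    plots : (n_plots x n_species) presence/absence matrix
    """
    rng = np.random.default_rng(rng)
    subsets, mean_d, var_d = [], [], []
    for _ in range(n_trials):
        idx = rng.choice(len(plots), size=n_select, replace=False)
        d = pdist(plots[idx], metric=metric)   # pairwise dissimilarities
        subsets.append(idx)
        mean_d.append(d.mean())
        var_d.append(d.var())
    rank_mean = rankdata([-m for m in mean_d])  # rank 1 = highest mean dissimilarity
    rank_var = rankdata(var_d)                  # rank 1 = lowest variance
    best = int(np.argmin(rank_mean + rank_var))
    return subsets[best]
```

For example, `hcr_resample(plots > 0, n_select=100)` would pick a heterogeneity-constrained subset of 100 plots from a plot-by-species table.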

6.
We compared general behaviour trends of resampling methods (bootstrap, bootstrap with Poisson distribution, jackknife, and jackknife with symmetric resampling) and different ways to summarize the results for resampling (absolute frequency, F, and frequency difference, GC') for real data sets under variable resampling strengths in three weighting schemes. We propose an equivalence between bootstrap and jackknife in order to make bootstrap variable across different resampling strengths. Specifically, for each method we evaluated the number of spurious groups (groups not present in the strict consensus of the unaltered data set), of real groups, and of inconsistencies in ranking of groups under variable resampling strengths. We found that GC' always generated more spurious groups and recovered more groups than F. Bootstrap methods generated more spurious groups than jackknife methods; and jackknife is the method that recovered more real groups. We consistently obtained a higher proportion of spurious groups for GC' than for F; and for bootstrap than for jackknife. Finally, we evaluated the ranking of groups under variable resampling strengths qualitatively in the trajectories of "support" against resampling strength, and quantitatively with Kendall coefficient values. We found fewer ranking inconsistencies for GC' than for F, and for bootstrap than for jackknife.
© The Willi Hennig Society 2009.
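As a rough illustration of how the four perturbation schemes compared above differ, the sketch below generates per-character weights for one pseudoreplicate. The parameterization of resampling strength (independent deletion with probability p for the jackknife, deletion or duplication each with probability p for symmetric resampling, default p ≈ 0.36) reflects common usage and is my assumption rather than the paper's exact setup.

```python
import numpy as np

def resample_weights(n_chars, scheme, p=0.36, rng=None):
    """Per-character weights for one pseudoreplicate under four schemes.

    scheme: 'bootstrap' -- multinomial resampling with replacement
            'poisson'   -- bootstrap with independent Poisson(1) counts
            'jackknife' -- each character deleted independently with prob p
            'symmetric' -- each character deleted with prob p,
                           duplicated with prob p, otherwise kept once
    """
    rng = np.random.default_rng(rng)
    if scheme == "bootstrap":
        return rng.multinomial(n_chars, np.full(n_chars, 1.0 / n_chars))
    if scheme == "poisson":
        return rng.poisson(1.0, size=n_chars)
    if scheme == "jackknife":
        return (rng.random(n_chars) >= p).astype(int)
    if scheme == "symmetric":
        u = rng.random(n_chars)
        return np.where(u < p, 0, np.where(u < 2 * p, 2, 1))
    raise ValueError(f"unknown scheme: {scheme}")
```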

7.
Rieger RH, Weinberg CR. Biometrics 2002, 58(2):332-341
Conditional logistic regression (CLR) is useful for analyzing clustered binary outcome data when interest lies in estimating a cluster-specific exposure parameter while treating the dependency arising from random cluster effects as a nuisance. CLR aggregates unmeasured cluster-specific factors into a cluster-specific baseline risk and is invalid in the presence of unmodeled heterogeneous covariate effects or within-cluster dependency. We propose an alternative, resampling-based method for analyzing clustered binary outcome data, within-cluster paired resampling (WCPR), which allows for within-cluster dependency not solely due to baseline heterogeneity. For example, dependency may be in part caused by heterogeneity in response to an exposure across clusters due to unmeasured cofactors. When both CLR and WCPR are valid, our simulations suggest that the two methods perform comparably. When CLR is invalid, WCPR continues to have good operating characteristics. For illustration, we apply both WCPR and CLR to a periodontal data set where there is heterogeneity in response to exposure across clusters.

8.
Informative drop-out arises in longitudinal studies when the subject's follow-up time depends on the unobserved values of the response variable. We specify a semiparametric linear regression model for the repeatedly measured response variable and an accelerated failure time model for the time to informative drop-out. The error terms from the two models are assumed to have a common, but completely arbitrary joint distribution. Using a rank-based estimator for the accelerated failure time model and an artificial censoring device, we construct an asymptotically unbiased estimating function for the linear regression model. The resultant estimator is shown to be consistent and asymptotically normal. A resampling scheme is developed to estimate the limiting covariance matrix. Extensive simulation studies demonstrate that the proposed methods are suitable for practical use. Illustrations with data taken from two AIDS clinical trials are provided.

9.
We propose new resampling‐based approaches to construct asymptotically valid time‐simultaneous confidence bands for cumulative hazard functions in multistate Cox models. In particular, we exemplify the methodology in detail for the simple Cox model with time‐dependent covariates, where the data may be subject to independent right‐censoring or left‐truncation. We use simulations to investigate their finite sample behavior. Finally, the methods are utilized to analyze two empirical examples with survival and competing risks data.

10.
Collings and Hamilton (1988) described a uniform bootstrap method that is applied to observed or pilot data in order to approximate the power of the two-sample Wilcoxon test for location shift alternatives. In this paper we demonstrate how importance and antithetic resampling can be used to substantially reduce the amount of computation needed to approximate the power of two-sample tests for location shift and scale alternatives. Importance and antithetic bootstrap resampling methods are applied to simulated data of different sample sizes from a variety of distributions as well as to data from the Iowa 65+ Rural Health Study. Also, a suggestion is given for using a combination of importance and antithetic resampling for approximating the power of two-sample tests.
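For orientation, the plain (uniform) bootstrap power approximation that importance and antithetic resampling refine can be sketched as follows; the shift, significance level, and replicate count are arbitrary placeholders, and the variance-reduction devices discussed in the paper are not implemented here.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bootstrap_power(x_pilot, y_pilot, shift=1.0, alpha=0.05, n_boot=2000, rng=None):
    """Approximate the power of the two-sample Wilcoxon (Mann-Whitney) test for
    a location-shift alternative: resample pilot data with replacement, add the
    hypothesized shift to one sample, and count how often the test rejects."""
    rng = np.random.default_rng(rng)
    rejections = 0
    for _ in range(n_boot):
        xb = rng.choice(x_pilot, size=len(x_pilot), replace=True)
        yb = rng.choice(y_pilot, size=len(y_pilot), replace=True) + shift
        _, pval = mannwhitneyu(xb, yb, alternative="two-sided")
        if pval < alpha:
            rejections += 1
    return rejections / n_boot
```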

11.
In model building and model evaluation, cross‐validation is a frequently used resampling method. Unfortunately, this method can be quite time consuming. In this article, we discuss an approximation method that is much faster and can be used in generalized linear models and Cox's proportional hazards model with a ridge penalty term. Our approximation method is based on a Taylor expansion around the estimate of the full model. In this way, all cross‐validated estimates are approximated without refitting the model. The tuning parameter can now be chosen based on these approximations and can be optimized in less time. The method is most accurate when approximating leave‐one‐out cross‐validation results for large data sets, which is otherwise the most computationally demanding situation. To demonstrate the method's performance, it is applied to several microarray data sets. An R package, penalized, which implements the method, is available on CRAN.
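To illustrate the flavor of such shortcuts, the snippet below uses the closed-form leave-one-out identity for linear ridge regression (each residual divided by one minus its leverage), which avoids refitting entirely; the paper's Taylor-expansion approach extends this kind of idea to penalized GLMs and the Cox model, which this sketch does not attempt.

```python
import numpy as np

def ridge_loo_rss(X, y, lam):
    """Leave-one-out residual sum of squares for linear ridge regression
    (all coefficients penalized), computed from a single full-data fit via
    the hat-matrix identity e_loo_i = e_i / (1 - h_ii), with no refitting."""
    n, p = X.shape
    A = X.T @ X + lam * np.eye(p)
    H = X @ np.linalg.solve(A, X.T)      # hat matrix X (X'X + lam I)^{-1} X'
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))
    return float(np.sum(loo_resid ** 2))

# The tuning parameter could then be chosen by scanning a grid, e.g.:
# best_lam = min(np.logspace(-3, 3, 25), key=lambda lam: ridge_loo_rss(X, y, lam))
```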

12.
Model checking for ROC regression analysis
Cai T, Zheng Y. Biometrics 2007, 63(1):152-163
The receiver operating characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may produce invalid and misleading results. To date, practical model-checking techniques suitable for validating existing ROC regression models are not yet available. In this article, we develop cumulative residual-based procedures to graphically and numerically assess the goodness of fit for some commonly used ROC regression models, and show how specific components of these models can be examined within this framework. We derive asymptotic null distributions for the residual processes and discuss resampling procedures to approximate these distributions in practice. We illustrate our methods with a dataset from the cystic fibrosis registry.

13.
The success of resampling approaches to branch support depends on the effectiveness of the underlying tree searches. Two primary factors are identified as key: the depth of tree search and the number of trees saved per resampling replicate. Two datasets were explored for a range of search parameters using jackknifing. Greater depth of tree search tends to increase support values because shorter trees conflict less with each other, while increasing numbers of trees saved tends to reduce support values because of conflict that reduces structure in the replicate consensus. Although a relatively small amount of branch swapping will achieve near‐accurate values for a majority of clades, some clades do not yield accurate values until more extensive searches are performed. This means that in order to maximize the accuracy of resampling analyses, one should employ as extensive a search strategy as possible, and save as many trees per replicate as possible. Strict consensus summary of resampling replicates is preferable to frequency‐within‐replicates summary because it is a more conservative approach to the reporting of replicate results. Jackknife analysis is preferable to bootstrap because of its closer relationship to the original data. © The Willi Hennig Society 2010.

14.
Pvclust: an R package for assessing the uncertainty in hierarchical clustering
Pvclust is an add-on package for the statistical software R to assess the uncertainty in hierarchical cluster analysis. Pvclust can be used easily for general statistical problems, such as DNA microarray analysis, to perform bootstrap analysis of clustering, which has been popular in phylogenetic analysis. Pvclust calculates probability values (p-values) for each cluster using bootstrap resampling techniques. Two types of p-values are available: the approximately unbiased (AU) p-value and the bootstrap probability (BP) value. Multiscale bootstrap resampling is used to calculate the AU p-value, which is less biased than the BP value calculated by ordinary bootstrap resampling. In addition, the computation time can be greatly decreased with the parallel computing option.
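Pvclust itself is an R package; as a language-agnostic illustration of the ordinary bootstrap probability (BP) part of what it reports, the Python sketch below resamples columns with replacement, reclusters, and counts how often each original cluster reappears. The linkage method, metric, and replicate count are placeholders, and the multiscale bootstrap behind the AU p-value is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def clusters_from_linkage(Z, n):
    """All non-singleton clusters of a scipy linkage matrix, as frozensets of leaf indices."""
    members = {i: frozenset([i]) for i in range(n)}
    clusters = set()
    for k, (a, b, _, _) in enumerate(Z):
        merged = members[int(a)] | members[int(b)]
        members[n + k] = merged
        clusters.add(merged)
    return clusters

def bootstrap_probabilities(data, n_boot=1000, method="average", metric="euclidean", rng=None):
    """Ordinary bootstrap probabilities for every cluster in a hierarchical
    clustering of the rows of `data`, resampling the columns with replacement."""
    rng = np.random.default_rng(rng)
    n = data.shape[0]
    target = clusters_from_linkage(linkage(data, method=method, metric=metric), n)
    counts = {c: 0 for c in target}
    for _ in range(n_boot):
        cols = rng.integers(0, data.shape[1], size=data.shape[1])
        boot = clusters_from_linkage(linkage(data[:, cols], method=method, metric=metric), n)
        for c in target:
            if c in boot:
                counts[c] += 1
    return {c: counts[c] / n_boot for c in target}
```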

15.
We propose a conditional scores procedure for obtaining bias-corrected estimates of log odds ratios from matched case-control data in which one or more covariates are subject to measurement error. The approach involves conditioning on sufficient statistics for the unobservable true covariates that are treated as fixed unknown parameters. For the case of Gaussian nondifferential measurement error, we derive a set of unbiased score equations that can then be solved to estimate the log odds ratio parameters of interest. The procedure successfully removes the bias in naive estimates, and standard error estimates are obtained by resampling methods. We present an example of the procedure applied to data from a matched case-control study of prostate cancer and serum hormone levels, and we compare its performance to that of regression calibration procedures.

16.
Adrian E. Raftery, Le Bao. Biometrics 2010, 66(4):1162-1173
The Joint United Nations Programme on HIV/AIDS (UNAIDS) has decided to use Bayesian melding as the basis for its probabilistic projections of HIV prevalence in countries with generalized epidemics. This combines a mechanistic epidemiological model, prevalence data, and expert opinion. Initially, the posterior distribution was approximated by sampling‐importance‐resampling, which is simple to implement, easy to interpret, transparent to users, and gave acceptable results for most countries. For some countries, however, this is not computationally efficient because the posterior distribution tends to be concentrated around nonlinear ridges and can also be multimodal. We propose instead incremental mixture importance sampling (IMIS), which iteratively builds up a better importance sampling function. This retains the simplicity and transparency of sampling importance resampling, but is much more efficient computationally. It also leads to a simple estimator of the integrated likelihood that is the basis for Bayesian model comparison and model averaging. In simulation experiments and on real data, it outperformed both sampling importance resampling and three publicly available generic Markov chain Monte Carlo algorithms for this kind of problem.
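For context, the sampling-importance-resampling step that IMIS improves on fits in a few lines; the prior sampler and log-likelihood below are hypothetical placeholders, and IMIS itself would go on to add Gaussian mixture components around high-weight draws over several iterations rather than stopping after one pass.

```python
import numpy as np

def sir(log_likelihood, prior_sampler, n_samples=100_000, n_resample=3_000, rng=None):
    """One-pass sampling-importance-resampling: draw from the prior, weight by
    the likelihood, then resample proportionally to the normalized weights."""
    rng = np.random.default_rng(rng)
    theta = prior_sampler(n_samples, rng)              # (n_samples, dim) prior draws
    logw = np.array([log_likelihood(t) for t in theta])
    w = np.exp(logw - logw.max())                      # stabilize before normalizing
    w /= w.sum()
    idx = rng.choice(n_samples, size=n_resample, replace=True, p=w)
    return theta[idx]                                  # approximate posterior draws
```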

17.
The computer program delrious analyses molecular marker data and calculates delta and relatedness estimates. A computer simulation is presented in which delrious is used to determine relations between relatedness estimate confidence and locus number. The results obtained suggest that many kinship studies probably have been conducted at significance levels less than 95%. Confidence measures provide a means of assessing reliability of calculated parameters and, therefore, would be beneficial to kinship hypothesis testing. Consequently, resampling procedures should be conducted routinely to determine delta and relatedness estimate confidence. delrious can implement bootstrap and jackknife resampling procedures for this purpose.
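A generic sketch of the kind of resampling delrious performs is shown below: loci are resampled with replacement and a relatedness-type estimator is recomputed to give a percentile confidence interval. The estimator argument is a placeholder for any function mapping a genotype matrix to a scalar; this is not delrious's own code or file format.

```python
import numpy as np

def bootstrap_ci_over_loci(genotypes, estimator, n_boot=1000, level=0.95, rng=None):
    """Percentile bootstrap confidence interval for a relatedness-type statistic,
    resampling loci (columns) with replacement and recomputing the estimator.

    genotypes : (n_individuals x n_loci) array of marker scores
    estimator : callable taking such an array and returning a scalar estimate
    """
    rng = np.random.default_rng(rng)
    n_loci = genotypes.shape[1]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        cols = rng.integers(0, n_loci, size=n_loci)   # sample loci with replacement
        reps[b] = estimator(genotypes[:, cols])
    lo, hi = np.quantile(reps, [(1 - level) / 2, 1 - (1 - level) / 2])
    return estimator(genotypes), (lo, hi)
```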

18.
Populations evolving under the joint influence of recombination and resampling (traditionally known as genetic drift) are investigated. First, we summarize and adapt a deterministic approach, as valid for infinite populations, which assumes continuous time and single crossover events. The corresponding nonlinear system of differential equations permits a closed solution, both in terms of the type frequencies and via linkage disequilibria of all orders. To include stochastic effects, we then consider the corresponding finite-population model, the Moran model with single crossovers, and examine it both analytically and by means of simulations. Particular emphasis is on the connection with the deterministic solution. If there is only recombination and every pair of recombined offspring replaces their pair of parents (i.e., there is no resampling), then the expected type frequencies in the finite population, of arbitrary size, equal the type frequencies in the infinite population. If resampling is included, the stochastic process converges, in the infinite-population limit, to the deterministic dynamics, which turns out to be a good approximation already for populations of moderate size.

19.
Exploring the relationship between landscape ecological risk and ecosystem service value is of great significance for constructing ecological security patterns and promoting human well-being. Taking Fujian Province as the study area, we used three periods of land-use remote-sensing monitoring data (1980, 2000, and 2020), resampled onto a 5 km × 5 km evaluation grid, to quantitatively assess landscape ecological risk and ecosystem service value and to analyze the spatiotemporal characteristics of their changes; bivariate spatial autocorrelation analysis and spatial regression models were then used to examine the spatial relationship between the two. The results show that from 1980 to 2020, the landscape ecological risk of Fujian Province shifted from the medium-risk to the relatively low-risk class and the overall risk situation improved, with risk classes generally distributed higher in the east and lower in the west. The total ecosystem service value declined overall, while the structure of the individual ecosystem functions remained relatively stable; value classes, from high to low, spread outward successively from high-value cores. Landscape ecological risk and ecosystem service value showed a significant negative spatial correlation: landscape ecological risk had a negative effect on total ecosystem service value, with the largest impact on the value of the provisioning function.
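As a minimal sketch of the bivariate spatial autocorrelation statistic used in analyses of this kind, the function below computes one common formulation of the global bivariate Moran's I (z-scored variables, row-standardized weights); the 5 km grid construction, the weight specification, and the significance testing used in the study are not reproduced, and the formulation is an assumption on my part.

```python
import numpy as np

def bivariate_morans_i(x, y, W):
    """Global bivariate Moran's I between two variables on the same spatial units.

    x, y : length-n arrays (e.g. landscape ecological risk and ecosystem
           service value on the same grid cells)
    W    : (n x n) spatial weights matrix; rows are standardized to sum to 1
    Returns the cross-correlation of x with the spatial lag of y.
    """
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    Wr = W / W.sum(axis=1, keepdims=True)   # row-standardize the weights
    return float(zx @ (Wr @ zy) / len(x))
```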

20.
Quantifying branch support using the bootstrap and/or jackknife is generally considered to be an essential component of rigorous parsimony and maximum likelihood phylogenetic analyses. Previous authors have described how application of the frequency-within-replicates approach to treating multiple equally optimal trees found in a given bootstrap pseudoreplicate can provide apparent support for otherwise unsupported clades. We demonstrate how a similar problem may occur when a non-representative subset of equally optimal trees is held per pseudoreplicate, which we term the undersampling-within-replicates artifact. We illustrate the frequency-within-replicates and undersampling-within-replicates bootstrap and jackknife artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, and show that the artifacts occur in outputs from multiple different phylogenetic-inference programs. Based on our results, we make the following five recommendations, which are particularly relevant to supermatrix analyses, but apply to all phylogenetic analyses. First, when two or more optimal trees are found in a given pseudoreplicate they should be summarized using the strict-consensus rather than the frequency-within-replicates approach. Second, jackknife resampling should be used rather than bootstrap resampling. Third, multiple tree searches holding multiple trees per search should be conducted in each pseudoreplicate rather than conducting only a single search and holding only a single tree. Fourth, branches with a minimum possible optimized length of zero should be collapsed within each tree search rather than collapsing branches only if their maximum possible optimized length is zero. Fifth, resampling values should be mapped onto the strict consensus of all optimal trees found rather than simply presenting the ≥ 50% bootstrap or jackknife tree or mapping the resampling values onto a single optimal tree.
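The contrast between the strict-consensus and frequency-within-replicates summaries can be made concrete with a toy sketch in which each optimal tree is represented simply as the set of clades it contains; the data structure and function are illustrative only, not taken from any phylogenetics program.

```python
from collections import Counter

def support_values(replicates, summary="strict"):
    """Clade support from resampling pseudoreplicates.

    replicates : list of pseudoreplicates; each is a list of equally optimal
                 trees, and each tree is a frozenset of clades (each clade a
                 frozenset of taxon labels).
    summary    : 'strict' -- a replicate counts toward a clade only if the clade
                             appears in every optimal tree (strict consensus);
                 'fwr'    -- frequency-within-replicates: the clade receives the
                             fraction of that replicate's trees containing it.
    Returns a dict mapping clade -> support as a fraction of replicates.
    """
    totals = Counter()
    all_clades = {c for rep in replicates for tree in rep for c in tree}
    for rep in replicates:
        for clade in all_clades:
            if summary == "strict":
                totals[clade] += all(clade in tree for tree in rep)
            else:  # 'fwr'
                totals[clade] += sum(clade in tree for tree in rep) / len(rep)
    return {c: totals[c] / len(replicates) for c in all_clades}
```

Under 'fwr', a clade found in only some of a replicate's equally optimal trees still accrues partial support in every replicate, which is how apparent support for otherwise unsupported clades can arise; 'strict' credits a replicate only when all of its optimal trees agree.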
