Similar Literature (20 matching documents)
1.
Cross-validation based point estimates of prediction accuracy are frequently reported in microarray class prediction problems. However, these point estimates can be highly variable, particularly for small sample sizes, and it would be useful to provide confidence intervals of prediction accuracy. We performed an extensive study of existing confidence interval methods and compared their performance in terms of empirical coverage and width. We developed a bootstrap case cross-validation (BCCV) resampling scheme and defined several confidence interval methods using BCCV with and without bias-correction. The widely used approach of basing confidence intervals on an independent binomial assumption of the leave-one-out cross-validation errors results in serious under-coverage of the true prediction error. Two split-sample based methods previously proposed in the literature tend to give overly conservative confidence intervals. Using BCCV resampling, the percentile confidence interval method was also found to be overly conservative without bias-correction, while the bias-corrected and accelerated (BCa) interval method of Efron returns substantially anti-conservative confidence intervals. We propose a simple bias reduction on the BCCV percentile interval. The method provides mildly conservative inference under all circumstances studied and outperforms the other methods in microarray applications with small to moderate sample sizes.
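The percentile mechanism the abstract evaluates can be sketched on case-level cross-validation outcomes. This is a minimal illustration on hypothetical data, not the paper's BCCV scheme: it resamples per-case correctness indicators and takes empirical quantiles of the resampled accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-case cross-validation outcomes (1 = case classified correctly).
cv_correct = rng.binomial(1, 0.8, size=40)

def percentile_ci(x, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI for the mean of x (here, CV accuracy)."""
    stats = np.array([rng.choice(x, size=len(x), replace=True).mean()
                      for _ in range(n_boot)])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

lo, hi = percentile_ci(cv_correct)
```

As the abstract notes, treating CV errors as independent binomial draws understates the variability; a case-level resampling scheme such as this at least reflects sampling of subjects.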

2.
C M Lebreton, P M Visscher. Genetics, 1998, 148(1): 525-535.
Several nonparametric bootstrap methods are tested to obtain better confidence intervals for the quantitative trait loci (QTL) positions, i.e., with minimal width and unbiased coverage probability. Two selective resampling schemes are proposed as a means of conditioning the bootstrap on the number of genetic factors in our model inferred from the original data. The selection is based on criteria related to the estimated number of genetic factors, and only the retained bootstrapped samples will contribute a value to the empirically estimated distribution of the QTL position estimate. These schemes are compared with a nonselective scheme across a range of simple configurations of one QTL on a one-chromosome genome. In particular, the effect of the chromosome length and the relative position of the QTL are examined for a given experimental power, which determines the confidence interval size. With the test protocol used, it appears that the selective resampling schemes are either unbiased or least biased when the QTL is situated near the middle of the chromosome. When the QTL is closer to one end, the likelihood curve of its position along the chromosome becomes truncated, and the nonselective scheme then performs better inasmuch as the percentage of estimated confidence intervals that actually contain the real QTL's position is closer to expectation. The nonselective method, however, produces larger confidence intervals. Hence, we advocate use of the selective methods, regardless of the QTL position along the chromosome (to reduce confidence interval sizes), but we leave the problem open as to how the method should be altered to take into account the bias of the original estimate of the QTL's position.

3.
The epidemiologic concept of the adjusted attributable risk is a useful approach to quantitatively describe the importance of risk factors on the population level. It measures the proportional reduction in disease probability when a risk factor is eliminated from the population, accounting for effects of confounding and effect-modification by nuisance variables. The computation of asymptotic variance estimates for estimates of the adjusted attributable risk is often done by applying the delta method. Investigations have shown, however, that the delta method generally tends to underestimate the standard error, leading to biased confidence intervals. We compare confidence intervals for the adjusted attributable risk derived by applying computer-intensive methods like the bootstrap or jackknife to confidence intervals based on asymptotic variance estimates, using an extensive Monte Carlo simulation and a real data example from a cohort study in cardiovascular disease epidemiology. Our results show that confidence intervals based on bootstrap and jackknife methods outperform intervals based on asymptotic theory. Best variants of computer-intensive confidence intervals are indicated for different situations.
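A bootstrap interval for an attributable risk can be sketched by resampling subjects and recomputing the estimate. The cohort below is hypothetical and the estimator is the crude (unadjusted) attributable risk, used only to show the resampling step; the paper's adjusted estimator would replace it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cohort: binary exposure and disease indicators.
n = 1000
exposed = rng.binomial(1, 0.3, size=n)
disease = rng.binomial(1, np.where(exposed == 1, 0.20, 0.10))

def attributable_risk(e, d):
    """Crude attributable risk: (P(D) - P(D | unexposed)) / P(D)."""
    return (d.mean() - d[e == 0].mean()) / d.mean()

# Nonparametric bootstrap: resample subjects, recompute the AR each time.
ars = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    ars.append(attributable_risk(exposed[idx], disease[idx]))
lo, hi = np.quantile(ars, [0.025, 0.975])
```

Unlike the delta method, this percentile interval needs no closed-form variance and, per the abstract's findings, tends to cover better in finite samples.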

4.
Diversity indices might be used to assess the impact of treatments on the relative abundance patterns in species communities. When several treatments are to be compared, simultaneous confidence intervals for the differences of diversity indices between treatments may be used. The simultaneous confidence interval methods described until now are either constructed or validated under the assumption of the multinomial distribution for the abundance counts. Motivated by four example data sets with background in agricultural and marine ecology, we focus on the situation when available replications show that the count data exhibit extra-multinomial variability. Based on simulated overdispersed count data, we compare previously proposed methods assuming multinomial distribution, a method assuming normal distribution for the replicated observations of the diversity indices, and three different bootstrap methods to construct simultaneous confidence intervals for multiple differences of Simpson and Shannon diversity indices. The focus of the simulation study is on comparisons to a control group. The severe failure of asymptotic multinomial methods in overdispersed settings is illustrated. Among the bootstrap methods, the widely known Westfall–Young method performs best for the Simpson index, while for the Shannon index, two methods based on stratified bootstrap and summed count data are preferable. The methods' application is illustrated with an example.
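The two indices and a multinomial-resampling interval for their between-group difference can be sketched as follows. The counts are invented, and this is exactly the multinomial model the abstract warns against under overdispersion; it is shown only to fix the quantities being compared.

```python
import numpy as np

rng = np.random.default_rng(2)

def simpson(counts):
    """Simpson diversity index, 1 - sum(p_i^2)."""
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def shannon(counts):
    """Shannon diversity index, -sum(p_i log p_i)."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical abundance counts for a control and a treated community.
control = np.array([30, 25, 20, 15, 10])
treated = np.array([60, 20, 10, 6, 4])

def multinomial_diff_ci(a, b, index, n_boot=2000, alpha=0.05):
    """Percentile CI for index(b) - index(a) under multinomial resampling;
    with overdispersed replicates a stratified bootstrap is preferable."""
    diffs = [index(rng.multinomial(b.sum(), b / b.sum()))
             - index(rng.multinomial(a.sum(), a / a.sum()))
             for _ in range(n_boot)]
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

lo, hi = multinomial_diff_ci(control, treated, simpson)
```

When replicated plots are available, resampling replicates within each treatment (stratified bootstrap) captures the extra-multinomial variability that this sketch ignores.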

5.
When predicting population dynamics, the value of the prediction is not enough and should be accompanied by a confidence interval that integrates the whole chain of errors, from observations to predictions via the estimates of the parameters of the model. Matrix models are often used to predict the dynamics of age- or size-structured populations. Their parameters are vital rates. This study aims (1) at assessing the impact of the variability of observations on vital rates, and then on the model's predictions, and (2) at comparing three methods for computing confidence intervals for values predicted from the models. The first method is the bootstrap. The second method is analytic and approximates the standard error of predictions by their asymptotic variance as the sample size tends to infinity. The third method combines use of the bootstrap to estimate the standard errors of vital rates with the analytical method to then estimate the errors of predictions from the model. Computations are done for an Usher matrix model that predicts the asymptotic (as time goes to infinity) stock recovery rate for three timber species in French Guiana. Little difference is found between the hybrid and the analytic method. Their estimates of bias and standard error converge towards the bootstrap estimates when the error on vital rates becomes small enough, which corresponds in the present case to a number of observations greater than 5000 trees.
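The bootstrap route can be sketched for a toy stage-structured model: resample individual fate records, re-estimate vital rates, rebuild the matrix, and recompute the predicted quantity. The three-stage matrix, the fate data, and the recruitment value are all hypothetical, and the prediction here is the asymptotic growth rate rather than the paper's stock recovery rate.

```python
import numpy as np

rng = np.random.default_rng(3)

def usher_lambda(p_stay, p_up, recruitment=0.5):
    """Asymptotic rate: dominant eigenvalue of a 3-stage Usher-type matrix."""
    A = np.array([
        [p_stay[0], 0.0,       recruitment],
        [p_up[0],   p_stay[1], 0.0        ],
        [0.0,       p_up[1],   p_stay[2]  ],
    ])
    return np.max(np.abs(np.linalg.eigvals(A)))

# Hypothetical individual fates per stage: 0 = die, 1 = stay, 2 = move up.
fates = [rng.choice(3, size=400, p=[0.05, 0.85, 0.10]) for _ in range(2)]
fates.append(rng.choice(2, size=400, p=[0.05, 0.95]))  # top stage: die or stay

def vital_rates(f):
    p_stay = [np.mean(s == 1) for s in f]
    p_up = [np.mean(s == 2) for s in f[:2]]
    return p_stay, p_up

# Bootstrap individuals within each stage, propagate to the prediction.
lams = []
for _ in range(1000):
    boot = [rng.choice(s, size=len(s), replace=True) for s in fates]
    lams.append(usher_lambda(*vital_rates(boot)))
lo, hi = np.quantile(lams, [0.025, 0.975])
```

The hybrid method in the abstract would instead use these bootstrap standard errors of the vital rates inside an analytic (delta-method) propagation to the prediction.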

6.
We use the Genetic Analysis Workshop 14 simulated data to explore the effectiveness of a two-stage strategy for mapping complex disease loci consisting of an initial genome scan with confidence interval construction for gene location, followed by fine mapping with family-based tests of association on a dense set of single-nucleotide polymorphisms. We considered four types of intervals: the 1-LOD interval, a basic percentile bootstrap confidence interval based on the position of the maximum Zlr score, and asymptotic and bootstrap confidence intervals based on a generalized estimating equations method. For fine mapping we considered two family-based tests of association: a test based on a likelihood ratio statistic and a transmission-disequilibrium-type test implemented in the software FBAT. In two of the simulation replicates, we found that the bootstrap confidence intervals based on the peak Zlr and the 1-LOD support interval always contained the true disease loci and that the likelihood ratio test provided further strong confirmatory evidence of the presence of disease loci in these regions.

7.
Zhou XH, Tu W. Biometrics, 2000, 56(4): 1118-1125.
In this paper, we consider the problem of interval estimation for the mean of diagnostic test charges. Diagnostic test charge data may contain zero values, and the nonzero values can often be modeled by a log-normal distribution. Under such a model, we propose three different interval estimation procedures: a percentile-t bootstrap interval based on sufficient statistics and two likelihood-based confidence intervals. For theoretical properties, we show that the two likelihood-based one-sided confidence intervals are only first-order accurate and that the bootstrap-based one-sided confidence interval is second-order accurate. For two-sided confidence intervals, all three proposed methods are second-order accurate. A simulation study in finite-sample sizes suggests all three proposed intervals outperform a widely used minimum variance unbiased estimator (MVUE)-based interval except for the case of one-sided lower end-point intervals when the skewness is very small. Among the proposed one-sided intervals, the bootstrap interval has the best coverage accuracy. For the two-sided intervals, when the sample size is small, the bootstrap method still yields the best coverage accuracy unless the skewness is very small, in which case the bias-corrected ML method has the best accuracy. When the sample size is large, all three proposed intervals have similar coverage accuracy. Finally, we use the proposed methods to analyze one real example assessing diagnostic test charges among older adults with depression.
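The percentile-t (studentized) bootstrap idea can be sketched on simulated zero-inflated log-normal data. The data-generating parameters are invented, and this generic version studentizes by the plain sample standard error rather than by the sufficient-statistic construction the paper uses.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical charge data: point mass at zero plus log-normal positive costs.
n = 200
is_zero = rng.random(n) < 0.25
charges = np.where(is_zero, 0.0, rng.lognormal(mean=5.0, sigma=1.0, size=n))

def boot_t_ci(x, n_boot=2000, alpha=0.05):
    """Percentile-t bootstrap CI for the mean: bootstrap the t-statistic,
    then invert its empirical quantiles around the observed mean."""
    m = x.mean()
    se = x.std(ddof=1) / np.sqrt(len(x))
    ts = []
    while len(ts) < n_boot:
        xb = rng.choice(x, size=len(x), replace=True)
        seb = xb.std(ddof=1) / np.sqrt(len(xb))
        if seb > 0:  # skip degenerate resamples
            ts.append((xb.mean() - m) / seb)
    q_lo, q_hi = np.quantile(ts, [alpha / 2, 1 - alpha / 2])
    return m - q_hi * se, m - q_lo * se

lo, hi = boot_t_ci(charges)
```

Studentizing is what buys the second-order accuracy claimed in the abstract: the bootstrap approximates the skewed distribution of the t-statistic instead of assuming symmetry.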

8.
Bootstrap confidence intervals for adaptive cluster sampling
Consider a collection of spatially clustered objects where the clusters are geographically rare. Of interest is estimation of the total number of objects on the site from a sample of plots of equal size. Under these spatial conditions, adaptive cluster sampling of plots is generally useful in improving efficiency in estimation over simple random sampling without replacement (SRSWOR). In adaptive cluster sampling, when a sampled plot meets some predefined condition, neighboring plots are added to the sample. When populations are rare and clustered, the usual unbiased estimators based on small samples are often highly skewed and discrete in distribution. Thus, confidence intervals based on asymptotic normal theory may not be appropriate. We investigated several nonparametric bootstrap methods for constructing confidence intervals under adaptive cluster sampling. To perform bootstrapping, we transformed the initial sample in order to include the information from the adaptive portion of the sample yet maintain a fixed sample size. In general, coverages of bootstrap percentile methods were closer to nominal coverage than the normal approximation.

9.
Duval S, Tweedie R. Biometrics, 2000, 56(2): 455-463.
We study recently developed nonparametric methods for estimating the number of missing studies that might exist in a meta-analysis and the effect that these studies might have had on its outcome. These are simple rank-based data augmentation techniques, which formalize the use of funnel plots. We show that they provide effective and relatively powerful tests for evaluating the existence of such publication bias. After adjusting for missing studies, we find that the point estimate of the overall effect size is approximately correct and coverage of the effect size confidence intervals is substantially improved, in many cases recovering the nominal confidence levels entirely. We illustrate the trim-and-fill method on existing meta-analyses of studies in clinical trials and psychometrics.

10.
Dinh P, Zhou XH. Biometrics, 2006, 62(2): 576-588.
Two measures often used in a cost-effectiveness analysis are the incremental cost-effectiveness ratio (ICER) and the net health benefit (NHB). Inferences on these two quantities are often hindered by highly skewed cost data. In this article, we derive the Edgeworth expansions for the studentized t-statistics for the two measures and show how they could be used to guide inferences. In particular, we use the expansions to study the theoretical performance of existing confidence intervals based on normal theory and to derive new confidence intervals for the ICER and the NHB. We conduct a simulation study to compare our new intervals with several existing methods. The methods evaluated include Taylor's interval, Fieller's interval, the bootstrap percentile interval, and the bootstrap bias-corrected and accelerated (BCa) interval. We found that our new intervals give good coverage accuracy and are narrower than the currently recommended intervals.

11.
Seaman SR, White IR, Copas AJ, Li L. Biometrics, 2012, 68(1): 129-137.
Two approaches commonly used to deal with missing data are multiple imputation (MI) and inverse-probability weighting (IPW). IPW is also used to adjust for unequal sampling fractions. MI is generally more efficient than IPW but more complex. Whereas IPW requires only a model for the probability that an individual has complete data (a univariate outcome), MI needs a model for the joint distribution of the missing data (a multivariate outcome) given the observed data. Inadequacies in either model may lead to important bias if large amounts of data are missing. A third approach combines MI and IPW to give a doubly robust estimator. A fourth approach (IPW/MI) combines MI and IPW but, unlike doubly robust methods, imputes only isolated missing values and uses weights to account for remaining larger blocks of unimputed missing data, such as would arise, e.g., in a cohort study subject to sample attrition, and/or unequal sampling fractions. In this article, we examine the performance, in terms of bias and efficiency, of IPW/MI relative to MI and IPW alone and investigate whether Rubin's rules variance estimator is valid for IPW/MI. We prove that Rubin's rules variance estimator is valid for IPW/MI for linear regression with an imputed outcome, we present simulations supporting the use of this variance estimator in more general settings, and we demonstrate that IPW/MI can have advantages over alternatives. IPW/MI is applied to data from the National Child Development Study.
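Rubin's rules, the variance estimator under discussion, combine the M imputation-specific estimates into one pooled estimate and total variance. A minimal sketch, with invented example numbers:

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Pool M imputation-specific estimates and variances via Rubin's rules:
    total variance = mean within-variance + (1 + 1/M) * between-variance."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                      # pooled point estimate
    u_bar = u.mean()                      # within-imputation variance
    b = q.var(ddof=1)                     # between-imputation variance
    total_var = u_bar + (1.0 + 1.0 / m) * b
    return q_bar, total_var

# Hypothetical estimates from M = 5 imputed data sets.
q_bar, t_var = rubins_rules([1.02, 0.97, 1.05, 1.00, 0.99], [0.04] * 5)
```

The paper's question is whether this total-variance formula remains valid when weights handle part of the missingness (IPW/MI) rather than everything being imputed.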

12.
The intraclass correlation coefficient (ICC) is a classical index of measurement reliability. With the advent of new and complex types of data for which the ICC is not defined, there is a need for new ways to assess reliability. To meet this need, we propose a new distance‐based ICC (dbICC), defined in terms of arbitrary distances among observations. We introduce a bias correction to improve the coverage of bootstrap confidence intervals for the dbICC, and demonstrate its efficacy via simulation. We illustrate the proposed method by analyzing the test‐retest reliability of brain connectivity matrices derived from a set of repeated functional magnetic resonance imaging scans. The Spearman‐Brown formula, which shows how more intensive measurement increases reliability, is extended to encompass the dbICC.

13.
14.
The methods described here make it possible to use data on sporophytic genotype frequencies to estimate the frequency of gametophytic self-fertilization in populations of homosporous plants. Bootstrap bias reduction is effective in reducing or eliminating the bias of the maximum likelihood estimate of the gametophytic selfing rate. The bias-corrected percentile method provides the most reliable confidence intervals for allele frequencies. The percentile method gives the most reliable confidence intervals for the gametophytic selfing rate when selfing is common. The maximum likelihood intervals, the percentile intervals, the bias-corrected percentile intervals, and the bootstrap t intervals are all overly conservative in their construction of confidence intervals for the gametophytic selfing rate when self-fertilization is rare. Application of the recommended methods indicates that gametophytic self-fertilization is quite rare in two sexually reproducing populations of Pellaea andromedifolia studied by Gastony and Gottlieb (1985).

15.
We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), originating from two types differing in body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested by using univariate and multinomial logistic regression models, applying odds ratios and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, in hypothesis testing we could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying confidence intervals of the odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests with different approaches for calculating moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.
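One standard way to modify an odds-ratio interval for a sparse table is a continuity correction before the Woolf log-OR interval. A minimal sketch on an invented 2x2 table; the paper's exact modification may differ.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96, cc=0.5):
    """Woolf log-OR interval with a Haldane-Anscombe continuity
    correction (add cc to every cell), a common fix for sparse tables."""
    a, b, c, d = (x + cc for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical sparse 2x2 table: marker allele count vs. body-size type.
lo, hi = odds_ratio_ci(12, 3, 5, 10)
```

The correction keeps the log-OR and its standard error finite even when a cell count is zero, at the cost of a small bias toward the null.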

16.
This paper considers statistical inference for the receiver operating characteristic (ROC) curve in the presence of missing biomarker values by utilizing estimating equations (EEs) together with smoothed empirical likelihood (SEL). Three approaches are developed to estimate the ROC curve and construct its SEL-based confidence intervals, based on the kernel-assisted EE imputation, multiple imputation, and hybrid imputation combining the inverse probability weighted imputation and multiple imputation. Under some regularity conditions, we show asymptotic properties of the proposed maximum SEL estimators for the ROC curve. Simulation studies are conducted to investigate the performance of the proposed SEL approaches. An example illustrates the proposed methodologies. Empirical results show that the hybrid imputation method behaves better than the kernel-assisted and multiple imputation methods, and the proposed three SEL methods outperform the existing nonparametric method.

17.
Lu Xia, Bin Nan, Yi Li. Biometrics, 2023, 79(1): 344-357.
Modeling and drawing inference on the joint associations between single-nucleotide polymorphisms and a disease has sparked interest in genome-wide association studies. In the motivating Boston Lung Cancer Survival Cohort (BLCSC) data, the presence of a large number of single nucleotide polymorphisms of interest, though smaller than the sample size, challenges inference on their joint associations with the disease outcome. In similar settings, we find that neither the debiased lasso approach (van de Geer et al., 2014), which assumes sparsity on the inverse information matrix, nor the standard maximum likelihood method can yield confidence intervals with satisfactory coverage probabilities for generalized linear models. Under this “large n, diverging p” scenario, we propose an alternative debiased lasso approach by directly inverting the Hessian matrix without imposing the matrix sparsity assumption, which further reduces bias compared to the original debiased lasso and ensures valid confidence intervals with nominal coverage probabilities. We establish the asymptotic distributions of any linear combinations of the parameter estimates, which lays the theoretical ground for drawing inference. Simulations show that the proposed refined debiased estimating method performs well in removing bias and yields honest confidence interval coverage. We use the proposed method to analyze the aforementioned BLCSC data, a large-scale hospital-based epidemiology cohort study investigating the joint effects of genetic variants on lung cancer risks.

18.
When comparing two competing interventions, confidence intervals for cost‐effectiveness ratios (CERs) provide information on the uncertainty in their point estimates. Techniques for constructing these confidence intervals are much debated. We provide a formal comparison of the Fieller, symmetric and Bonferroni methods for constructing confidence intervals for the CER using only the joint asymptotic distribution of the incremental cost and incremental effectiveness of the two interventions being compared. We prove the existence of a finite interval under the Fieller method when the incremental effectiveness is statistically significant. When this difference is not significant the Fieller method yields an unbounded confidence interval. The Fieller interval is always wider than the symmetric interval, but the latter is an approximation to the Fieller interval when the incremental effectiveness is highly significant. The Bonferroni method is shown to produce the widest interval. Because it accounts for the likely correlation between cost and effectiveness measures, and the intuitively appealing relationship between the existence of a bounded interval and the significance of the incremental effectiveness, the Fieller interval is to be preferred in reporting a confidence interval for the CER.
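The Fieller interval solves the quadratic obtained by setting the studentized numerator dc - R*de equal to its critical value. A minimal sketch with invented increments and variances; the bounded/unbounded dichotomy the abstract proves corresponds to the sign of the leading coefficient.

```python
import math

def fieller_ci(dc, de, var_c, var_e, cov_ce, z=1.96):
    """Fieller interval for the ratio R = dc / de, using only the joint
    asymptotic normality of incremental cost dc and effectiveness de.
    Returns None (unbounded interval) when de is not significant."""
    a = de ** 2 - z ** 2 * var_e
    if a <= 0:
        return None  # incremental effectiveness not significant
    b = -2.0 * (dc * de - z ** 2 * cov_ce)
    c = dc ** 2 - z ** 2 * var_c
    root = math.sqrt(b ** 2 - 4.0 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)

# Hypothetical increments: cost 1000 (var 100^2), effect 2.0 (var 0.25),
# zero cost-effect covariance.
ci = fieller_ci(1000.0, 2.0, 100.0 ** 2, 0.25, 0.0)
```

With a nonzero cov_ce the interval tilts to reflect cost-effect correlation, which is the property the abstract cites in favor of Fieller over the symmetric and Bonferroni intervals.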

19.
Many confidence intervals calculated in practice are potentially not exact, either because the requirements for the interval estimator to be exact are known to be violated, or because the (exact) distribution of the data is unknown. If a confidence interval is approximate, the crucial question is how well its true coverage probability approximates its intended coverage probability. In this paper we propose to use the bootstrap to calculate an empirical estimate for the (true) coverage probability of a confidence interval. In the first instance, the empirical coverage can be used to assess whether a given type of confidence interval is adequate for the data at hand. More generally, when planning the statistical analysis of future trials based on existing data pools, the empirical coverage can be used to study the coverage properties of confidence intervals as a function of type of data, sample size, and analysis scale, and thus inform the statistical analysis plan for the future trial. In this sense, the paper proposes an alternative to the problematic pretest of the data for normality, followed by selection of the analysis method based on the results of the pretest. We apply the methodology to a data pool of bioequivalence studies, and in the selection of covariance patterns for repeated measures data.
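The core idea, estimating empirical coverage by treating an existing data pool as a pseudo-population, can be sketched as follows. The pool, trial size, and skewed distribution are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical existing data pool (skewed), treated as a pseudo-population.
pool = rng.lognormal(0.0, 1.0, size=2000)
pseudo_truth = pool.mean()

def normal_ci(x, z=1.96):
    """Nominal 95% normal-theory interval for the mean."""
    se = x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - z * se, x.mean() + z * se

# Resample trial-sized data sets from the pool and count how often the
# interval covers the pseudo-population mean.
n_trial, n_rep, hits = 20, 1000, 0
for _ in range(n_rep):
    sample = rng.choice(pool, size=n_trial, replace=True)
    lo, hi = normal_ci(sample)
    hits += lo <= pseudo_truth <= hi
coverage = hits / n_rep  # empirical coverage of the nominal 95% interval
```

A coverage estimate well below 0.95 here would flag the normal-theory interval as inadequate for this type of data at this sample size, which is exactly the planning use the abstract describes.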

20.
Bertail P, Tressou J. Biometrics, 2006, 62(1): 66-74.
This article proposes statistical tools for quantitative evaluation of the risk due to the presence of some particular contaminants in food. We focus on the estimation of the probability of the exposure to exceed the so-called provisional tolerable weekly intake (PTWI), when both consumption data and contamination data are independently available. A Monte Carlo approximation of the plug-in estimator, which may be seen as an incomplete generalized U-statistic, is investigated. We obtain the asymptotic properties of this estimator and propose several confidence intervals, based on two estimators of the asymptotic variance: (i) a bootstrap-type estimator and (ii) an approximate jackknife estimator relying on the Hoeffding decomposition of the original U-statistic. As an illustration, we present an evaluation of the exposure to Ochratoxin A in France.
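The jackknife variance estimator mentioned in (ii) can be sketched in its generic leave-one-out form, applied here to a simple statistic on invented data rather than to the paper's incomplete U-statistic.

```python
import numpy as np

def jackknife_var(x, stat):
    """Leave-one-out jackknife estimate of the variance of stat(x):
    ((n - 1) / n) * sum over i of (stat(x without i) - loo mean)^2."""
    n = len(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)

# Hypothetical observations; for the sample mean, the jackknife variance
# reproduces the classical s^2 / n exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
jv = jackknife_var(x, np.mean)
```

For nonlinear statistics such as a U-statistic the two no longer coincide, and the Hoeffding decomposition mentioned in the abstract is what justifies the jackknife approximation there.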
