Similar literature
 20 similar articles found (search time: 62 ms)
1.
Quantifying branch support using the bootstrap and/or jackknife is generally considered to be an essential component of rigorous parsimony and maximum likelihood phylogenetic analyses. Previous authors have described how application of the frequency-within-replicates approach to treating multiple equally optimal trees found in a given bootstrap pseudoreplicate can provide apparent support for otherwise unsupported clades. We demonstrate how a similar problem may occur when a non-representative subset of equally optimal trees is held per pseudoreplicate, which we term the undersampling-within-replicates artifact. We illustrate the frequency-within-replicates and undersampling-within-replicates bootstrap and jackknife artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, and show that the artifacts occur in outputs from multiple different phylogenetic-inference programs. Based on our results, we make the following five recommendations, which are particularly relevant to supermatrix analyses, but apply to all phylogenetic analyses. First, when two or more optimal trees are found in a given pseudoreplicate, they should be summarized using the strict-consensus rather than the frequency-within-replicates approach. Second, jackknife resampling should be used rather than bootstrap resampling. Third, multiple tree searches while holding multiple trees per search should be conducted in each pseudoreplicate rather than conducting only a single search and holding only a single tree. Fourth, branches with a minimum possible optimized length of zero should be collapsed within each tree search rather than collapsing branches only if their maximum possible optimized length is zero. Fifth, resampling values should be mapped onto the strict consensus of all optimal trees found rather than simply presenting the ≥ 50% bootstrap or jackknife tree or mapping the resampling values onto a single optimal tree.
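
The difference between the two summary approaches named in this abstract can be illustrated with a small sketch. It is not taken from the paper: the representation of a tree as a set of clades (each clade a frozenset of taxon names) and the function names `strict_consensus_clades` and `support` are assumptions made for illustration only.

```python
from collections import Counter

def strict_consensus_clades(trees):
    """Clades shared by every equally optimal tree of one pseudoreplicate."""
    return set.intersection(*(set(tree) for tree in trees))

def support(replicates, mode="strict"):
    """Return {clade: support in 0..1} over all pseudoreplicates.

    replicates: list of pseudoreplicates, each a list of trees,
    each tree a set of clades (hypothetical data layout).
    """
    totals = Counter()
    for trees in replicates:
        if mode == "strict":
            # A clade counts only if all optimal trees of the replicate contain it.
            for clade in strict_consensus_clades(trees):
                totals[clade] += 1.0
        else:
            # Frequency-within-replicates: each tree contributes 1/len(trees).
            for tree in trees:
                for clade in tree:
                    totals[clade] += 1.0 / len(trees)
    return {clade: total / len(replicates) for clade, total in totals.items()}

AB, AC, BC, ABC = (frozenset(s) for s in ("AB", "AC", "BC", "ABC"))
replicates = [
    [{AB, ABC}, {AC, ABC}],  # two equally optimal trees in replicate 1
    [{AB, ABC}, {BC, ABC}],  # two equally optimal trees in replicate 2
]
print(support(replicates, "strict"))
print(support(replicates, "fwr"))
```

In this contrived input, clade AB never appears in every optimal tree of a replicate, so the strict-consensus summary gives it no support, whereas the frequency-within-replicates summary credits it with 50%.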

2.

Background  

Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap.

3.
Cross-validation based point estimates of prediction accuracy are frequently reported in microarray class prediction problems. However, these point estimates can be highly variable, particularly for small sample numbers, and it would be useful to provide confidence intervals of prediction accuracy. We performed an extensive study of existing confidence interval methods and compared their performance in terms of empirical coverage and width. We developed a bootstrap case cross-validation (BCCV) resampling scheme and defined several confidence interval methods using BCCV with and without bias-correction. The widely used approach of basing confidence intervals on an independent binomial assumption of the leave-one-out cross-validation errors results in serious under-coverage of the true prediction error. Two split-sample based methods previously proposed in the literature tend to give overly conservative confidence intervals. Using BCCV resampling, the percentile confidence interval method was also found to be overly conservative without bias-correction, while the bias-corrected and accelerated (BCa) interval method of Efron returns substantially anti-conservative confidence intervals. We propose a simple bias reduction on the BCCV percentile interval. The method provides mildly conservative inference under all circumstances studied and outperforms the other methods in microarray applications with small to moderate sample sizes.

4.
Difference density maps are commonly used in structural biology for identifying conformational changes in macromolecular complexes. For interpretation of the results, it is essential to estimate the variance or standard deviation of the difference density and the distribution of errors in space. In order to compare three-dimensional density maps of gap junction channels with and without the C-terminal regulatory domain, we developed a bootstrap resampling method for estimation of the voxel-wise standard deviation. The bootstrap approach has been successfully used for estimating the sampling distribution from a limited data set and for estimating the statistical properties of the derived quantities [Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Stat. 7, 1-26]. In our application, the standard deviation map can be estimated by bootstrapping the images. Our results show that, apart from the symmetry axes and small regions bordering the lumen of the extracellular vestibule, difference maps normalized by the mean of the standard deviation map can be used as a good approximation of the t-test map of the gap junction crystals.
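
A rough sketch of what a voxel-wise bootstrap standard deviation might look like is given below. The abstract does not spell out the procedure, so everything here is an assumption: images are taken to be flat lists of voxel values, the derived quantity is taken to be the mean map, and the name `bootstrap_voxel_sd` is hypothetical.

```python
import random
import statistics

def bootstrap_voxel_sd(images, n_boot=200, seed=0):
    """Estimate the standard deviation of the mean map at every voxel.

    images: list of equally long lists of voxel values (assumed layout).
    Each bootstrap replicate redraws len(images) images with replacement.
    """
    rng = random.Random(seed)
    n_voxels = len(images[0])
    boot_means = []
    for _ in range(n_boot):
        sample = [rng.choice(images) for _ in images]
        boot_means.append(
            [statistics.fmean([img[v] for img in sample]) for v in range(n_voxels)]
        )
    # Spread of the replicate means, voxel by voxel.
    return [statistics.stdev([m[v] for m in boot_means]) for v in range(n_voxels)]

# Voxel 0 is identical in all images; voxel 1 varies between images.
images = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
print(bootstrap_voxel_sd(images))
```

A difference map could then be divided by such a standard deviation map, or by its mean as the abstract suggests, to obtain a t-like statistic.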

5.
Aim: Phytosociological databases often contain unbalanced samples of real vegetation, which should be carefully resampled before any analyses. We propose a new resampling method based on species composition, called heterogeneity-constrained random (HCR) resampling. Method: Many subsets of the source vegetation database are selected randomly. These subsets are sorted by decreasing mean dissimilarity between pairs of the vegetation plots, and then sorted again by increasing variance of these dissimilarities. Ranks from both sortings are summed for each subset, and the subset with the lowest summed rank is considered as the most representative. The performance of this method was tested using simulated point patterns that represented different levels of aggregation of vegetation plots within a database. The distributions of points in the subsets resulting from different resampling methods, both with and without database stratification, were compared using Ripley's K function. The mean of random selections from an unbiased sample was used as a reference in these comparisons. The efficiency of the method was also demonstrated with real phytosociological data. Results: Both stratified and HCR resampling yielded selection patterns more similar to the reference than resampling without these tools. Outcomes from the resampling that combined these two methods were the most similar to the reference. The efficiency of the HCR resampling method varied with different levels of aggregation in the database. Conclusions: This new method is efficient for resampling phytosociological databases. As it only uses information on species occurrences/abundances, it does not require the definition of strata, thereby avoiding the effect of subjective decisions on the selection outcome. Nevertheless, this method can also be applied to stratified databases.
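
The selection procedure described under "Method" can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: plots are assumed to be sets of species names, Jaccard dissimilarity is assumed as the dissimilarity measure, ties in rank are broken arbitrarily, and the names `jaccard_dissimilarity` and `hcr_resample` are hypothetical.

```python
import random
import statistics
from itertools import combinations

def jaccard_dissimilarity(a, b):
    """1 - |a & b| / |a | b| for two species sets (assumed measure)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def hcr_resample(plots, subset_size, n_trials=100, seed=0):
    """Pick the random subset with the lowest summed rank.

    Requires subset_size >= 2 so that at least one pair exists.
    Returns the sorted plot indices of the chosen subset.
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_trials):
        subset = rng.sample(range(len(plots)), subset_size)
        d = [jaccard_dissimilarity(plots[i], plots[j])
             for i, j in combinations(subset, 2)]
        candidates.append((subset, statistics.fmean(d), statistics.pvariance(d)))
    # Rank 0 = highest mean dissimilarity; rank 0 = lowest variance.
    by_mean = sorted(range(n_trials), key=lambda k: -candidates[k][1])
    by_var = sorted(range(n_trials), key=lambda k: candidates[k][2])
    rank_sum = [0] * n_trials
    for rank, k in enumerate(by_mean):
        rank_sum[k] += rank
    for rank, k in enumerate(by_var):
        rank_sum[k] += rank
    best = min(range(n_trials), key=rank_sum.__getitem__)
    return sorted(candidates[best][0])

# Three identical plots plus two distinct ones: an oversampled vegetation type.
plots = [{"a"}, {"a"}, {"a"}, {"b"}, {"c"}]
print(hcr_resample(plots, subset_size=3, n_trials=200, seed=1))
```

With this toy input the chosen subset contains both distinct plots and only one of the three duplicates, which is the balancing behaviour the abstract describes.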

6.
The success of resampling approaches to branch support depends on the effectiveness of the underlying tree searches. Two primary factors are identified as key: the depth of tree search and the number of trees saved per resampling replicate. Two datasets were explored for a range of search parameters using jackknifing. Greater depth of tree search tends to increase support values because shorter trees conflict less with each other, while increasing numbers of trees saved tends to reduce support values because of conflict that reduces structure in the replicate consensus. Although a relatively small amount of branch swapping will achieve near-accurate values for a majority of clades, some clades do not yield accurate values until more extensive searches are performed. This means that in order to maximize the accuracy of resampling analyses, one should employ as extensive a search strategy as possible, and save as many trees per replicate as possible. Strict consensus summary of resampling replicates is preferable to frequency-within-replicates summary because it is a more conservative approach to the reporting of replicate results. Jackknife analysis is preferable to bootstrap because of its closer relationship to the original data. © The Willi Hennig Society 2010.

7.
Several research fields frequently deal with the analysis of diverse classification results of the same entities. This should imply an objective detection of overlaps and divergences between the formed clusters. The congruence between classifications can be quantified by clustering agreement measures, including pairwise agreement measures. Several measures have been proposed and the importance of obtaining confidence intervals for the point estimate in the comparison of these measures has been highlighted. A broad range of methods can be used for the estimation of confidence intervals. However, evidence is lacking about which methods are appropriate for calculating confidence intervals for most clustering agreement measures. Here we evaluate the resampling techniques of bootstrap and jackknife for the calculation of the confidence intervals for clustering agreement measures. Contrary to what has been shown for some statistics, simulations showed that the jackknife performs better than the bootstrap at accurately estimating confidence intervals for pairwise agreement measures, especially when the agreement between partitions is low. The coverage of the jackknife confidence interval is robust to changes in cluster number and cluster size distribution.
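
As a minimal sketch of a jackknife confidence interval for a pairwise agreement measure, the code below uses the Rand index with delete-one pseudo-values and a normal approximation. The abstract names neither the specific measures nor the jackknife variant, so the choice of the Rand index, the pseudo-value construction, the critical value z = 1.96 and the function names are all assumptions.

```python
import math
import statistics
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of entity pairs on which two partitions agree (together/apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def jackknife_ci(labels_a, labels_b, z=1.96):
    """Delete-one jackknife interval for the Rand index (needs >= 3 entities)."""
    n = len(labels_a)
    full = rand_index(labels_a, labels_b)
    pseudo = []
    for k in range(n):
        a = labels_a[:k] + labels_a[k + 1:]
        b = labels_b[:k] + labels_b[k + 1:]
        # Pseudo-value for entity k.
        pseudo.append(n * full - (n - 1) * rand_index(a, b))
    centre = statistics.fmean(pseudo)
    se = statistics.stdev(pseudo) / math.sqrt(n)
    return centre - z * se, centre + z * se

part_a = [0, 0, 0, 1, 1, 1, 2, 2]
part_b = [0, 0, 1, 1, 1, 2, 2, 2]
print(rand_index(part_a, part_b), jackknife_ci(part_a, part_b))
```

The interval is not clipped to [0, 1]; a real analysis would need to decide how to handle bounds.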

8.

Background  

For parsimony analyses, the most common way to estimate confidence is by resampling plans (nonparametric bootstrap, jackknife), and Bremer support (Decay indices). The recent literature reveals that parameter settings that are quite commonly employed are not those that are recommended by theoretical considerations and by previous empirical studies. The optimal search strategy to be applied during resampling was previously addressed solely via standard search strategies available in PAUP*. The question of a compromise between search extensiveness and improved support accuracy for Bremer support received even less attention. A set of experiments was conducted on different datasets to find an empirical cut-off point at which increased search extensiveness does not significantly change Bremer support and jackknife or bootstrap proportions any more.

9.
A covariance estimator for GEE with improved small-sample properties   Cited by: 2 (self-citations: 0, citations by others: 2)
Mancl LA, DeRouen TA. Biometrics, 2001, 57(1): 126-134
In this paper, we propose an alternative covariance estimator to the robust covariance estimator of generalized estimating equations (GEE). Hypothesis tests using the robust covariance estimator can have inflated size when the number of independent clusters is small. Resampling methods, such as the jackknife and bootstrap, have been suggested for covariance estimation when the number of clusters is small. A drawback of the resampling methods when the response is binary is that the methods can break down when the number of subjects is small due to zero or near-zero cell counts caused by resampling. We propose a bias-corrected covariance estimator that avoids this problem. In a small simulation study, we compare the bias-corrected covariance estimator to the robust and jackknife covariance estimators for binary responses for situations involving 10-40 subjects with equal and unequal cluster sizes of 16-64 observations. The bias-corrected covariance estimator gave tests with sizes close to the nominal level even when the number of subjects was 10 and cluster sizes were unequal, whereas the robust and jackknife covariance estimators gave tests with sizes that could be 2-3 times the nominal level. The methods are illustrated using data from a randomized clinical trial on treatment for bone loss in subjects with periodontal disease.

10.
Molecular phylogenies for the fungi in the Ascomycota rely heavily on 18S rRNA gene sequences but this gene alone does not answer all questions about relationships. Particularly problematical are the relationships among the first ascomycetes to diverge, the Archiascomycetes, and the branching order among the basal filamentous ascomycetes, the Euascomycetes. Would more data resolve branching order? We used the jackknife and bootstrapping resampling approach that constitutes the "pattern of resolved nodes" method to address the relationship between number of variable sites in a DNA sequence alignment and support for taxonomic clusters. We graphed the effect of increasing sizes of subsamples of the 18S rRNA gene sequences on bootstrap support for nodes in the Ascomycota tree. Nodes responded differently to increasing data. Some nodes, those uniting the filamentous ascomycetes for example, would still have been well supported with only two thirds of the 18S rRNA gene. Other nodes, like the one uniting the Archiascomycetes as a monophyletic group, would require about double the number of variable sites available in the 18S gene for 95% neighbor-joining bootstrap support. Of the several groups emerging at the base of the filamentous ascomycetes, the Pezizales receive the most support as the first to diverge. Our analysis suggests that we would also need almost three times as much sequence data as that provided by the 18S gene to confirm the basal position for the Pezizales and more than seven times as much data to resolve the next group to diverge. If more data from other genes show the same pattern, the lack of resolution for the filamentous ascomycetes may indicate rapid radiation within this clade.

11.
ANOTHER MONOPHYLY INDEX: REVISITING THE JACKKNIFE   Cited by: 1 (self-citations: 0, citations by others: 1)
Abstract: Randomization routines have quickly gained wide usage in phylogenetic systematics. Introduced a decade ago, the jackknife has rarely been applied in cladistic methodology. This data resampling technique was re-investigated here as a means to discover the effect that taxon removal may have on the stability of the results obtained from parsimony analyses. This study shows that the removal of even a single taxon can turn a solution of relatively few equally parsimonious trees for the inclusive matrix into hundreds of equally parsimonious trees. On the other hand, removal of other taxa can stabilize the results to fewer trees. An index of clade stability, the Jackknife Monophyly Index (JMI), is developed which, like the bootstrap, applies a value to each clade according to its frequency of occurrence in jackknife pseudoreplicates. Unlike the bootstrap and earlier applications of the jackknife, alternative suboptimal hypotheses are not forwarded by this method. Only those clades in the most parsimonious tree(s) are given JMI values. The behaviour of this index is investigated in relation to both a hypothetical and a real data set, as well as how it performs in comparison to the bootstrap. The JMI is found not to be influenced by uninformative characters or relative synapomorphy number, unlike the bootstrap.

12.
We propose a simple and general resampling strategy to estimate variances for parameter estimators derived from nonsmooth estimating functions. This approach applies to a wide variety of semiparametric and nonparametric problems in biostatistics. It does not require solving estimating equations and is thus much faster than the existing resampling procedures. Its usefulness is illustrated with heteroscedastic quantile regression and censored data rank regression. Numerical results based on simulated and real data are provided.

13.
The clade size effect refers to a bias that causes middle-sized clades to be less supported than small or large-sized clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected. Homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, presenting a better performance than resampling measures of support and comparable to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony, but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect, increasing the number of random trees during the resampling, which together with the higher chances that medium-sized clades have of being contradicted generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.

14.
Grant and Kluge (2003) associated resampling measures of group support with the aim of evaluating statistical stability, confidence, or the probability of recovering a true phylogenetic group. This interpretation is not necessary to methods such as jackknifing or bootstrapping, which are better interpreted as measures of support from the current dataset. Grant and Kluge only accepted the absolute Bremer value as a measure of group support, and considered resampling methods as irrelevant to phylogenetic inference. It is shown that under simple circumstances resampling indices better reflect the degree of support than Bremer values. Grant and Kluge associated the resampling methods (and the use of measures of group support in general) with what they call a "verificationist agenda", where strongly supported groups are first detected, and then protected against additional testing. They propose that identifying weakly supported groups, and then concentrating additional tests on them, will better serve science. Both programs are actually equivalent, and inert as to the selection of methods to estimate group support. The ranking of groups under a range of resampling strength is proposed as an additional criterion to evaluate resampling methods. A reexamination of the slope of symmetric resampling frequency as a function of resampling strength suggests that slopes can also be problematic as a measure of group support. © The Willi Hennig Society 2005.

15.
Improvements to resampling measures of group support   Cited by: 4 (self-citations: 0, citations by others: 4)
Several aspects of current resampling methods to assess group support are reviewed. When the characters have different prior weights or some state transformation costs are different, the frequencies under either bootstrapping or jackknifing can be distorted, producing either under- or overestimations of the actual group support. This is avoided by symmetric resampling, where the probability p of increasing the weight of a character equals the probability of decreasing it. Problems with interpreting absolute group frequencies as a measure of the support are discussed; group support does not necessarily vary with the frequency itself, since in some cases groups with positive support may have much lower frequencies than groups with no support at all. Three possible solutions for this problem are suggested. The first is measuring the support as the difference in frequency between the group and its most frequent contradictory group. The second is calculating frequencies for values of p below the threshold under which the frequency ranks the groups in the right order of support (this threshold may vary from data set to data set). The third is estimating the support by using the slope of the frequency as a function of different (low) values of p; when p is low, groups with actual support have negative slopes (closer to 0 when the support is higher), and groups with no support have positive slopes (larger when evidence for and against the group is more abundant).
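
The first of the three suggested solutions, support measured as the difference in frequency between a group and its most frequent contradictory group, can be sketched in a few lines. The representation of groups as frozensets of taxa, the overlap-based definition of "contradictory", and the function names are assumptions for illustration.

```python
def contradicts(a, b):
    """Two groups conflict if they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)

def frequency_difference(group, frequencies):
    """Frequency of `group` minus that of its most frequent contradictory group.

    frequencies: {group: resampling frequency in 0..1} (hypothetical input).
    """
    rivals = [f for g, f in frequencies.items() if contradicts(group, g)]
    return frequencies.get(group, 0.0) - max(rivals, default=0.0)

AB, BC, CD = frozenset("AB"), frozenset("BC"), frozenset("CD")
freqs = {AB: 0.75, BC: 0.25, CD: 0.5}
for g in (AB, BC, CD):
    print(sorted(g), frequency_difference(g, freqs))
```

A negative value, as for group BC here, flags a group that is found less often than a group it conflicts with.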

16.
The computer program delrious analyses molecular marker data and calculates delta and relatedness estimates. A computer simulation is presented in which delrious is used to determine relations between relatedness estimate confidence and locus number. The results obtained suggest that many kinship studies probably have been conducted at significance levels less than 95%. Confidence measures provide a means of assessing reliability of calculated parameters and, therefore, would be beneficial to kinship hypothesis testing. Consequently, resampling procedures should be conducted routinely to determine delta and relatedness estimate confidence. delrious can implement bootstrap and jackknife resampling procedures for this purpose.

17.
Prediction error estimation: a comparison of resampling methods   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection. RESULTS: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of specimens available increases. SUPPLEMENTARY INFORMATION: A complete compilation of results and R code for simulations and analyses are available in Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm).

18.
While hierarchical experimental designs are near-ubiquitous in neuroscience and biomedical research, researchers often do not take the structure of their datasets into account while performing statistical hypothesis tests. Resampling-based methods are a flexible strategy for performing these analyses but are difficult due to the lack of open-source software to automate test construction and execution. To address this, we present Hierarch, a Python package to perform hypothesis tests and compute confidence intervals on hierarchical experimental designs. Using a combination of permutation resampling and bootstrap aggregation, Hierarch can be used to perform hypothesis tests that maintain nominal Type I error rates and generate confidence intervals that maintain the nominal coverage probability without making distributional assumptions about the dataset of interest. Hierarch makes use of the Numba JIT compiler to reduce p-value computation times to under one second for typical datasets in biomedical research. Hierarch also enables researchers to construct user-defined resampling plans that take advantage of Hierarch's Numba-accelerated functions.

19.
RAPD problems in phylogenetics   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper is intended to clarify some of the questions related to the application of RAPD for phylogenetic reconstruction purposes. Using different specimens of mammals selected across various taxonomic levels, we assessed the validity of RAPD to recover a known phylogeny, using four distance coefficients (simple matching, Russell & Rao, Jaccard, and Dice). We assessed the minimum number of primers required in the computations to obtain stable results in terms of distance estimates and/or topologies of the derived trees. These results based on distance methods were compared with those obtained with parsimony analyses of RAPD markers. Both approaches have proven to be equally problematic for comparing taxa above the family level. On the basis of these comparisons among various indices and methods, we recommend the use of Jaccard or Dice coefficients, with no less than twelve primers. We also suggest validation of any phylogeny based on RAPD data with a resampling procedure (i.e. the bootstrap or the jackknife) before any sound conclusion can be drawn.
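
The four coefficients named in this abstract have standard definitions for binary presence/absence data, which the sketch below computes for two band profiles. With a = bands present in both, b and c = bands present in only one, and d = bands absent from both: simple matching = (a + d)/n, Russell & Rao = a/n, Jaccard = a/(a + b + c), Dice = 2a/(2a + b + c). The example profiles and the function names are made up for illustration; the paper's own data are not reproduced here.

```python
def band_counts(x, y):
    """Count shared presences (a), mismatches (b, c) and shared absences (d)."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d

def similarity(x, y):
    """Return the four similarity coefficients for two 0/1 band profiles."""
    a, b, c, d = band_counts(x, y)
    n = a + b + c + d
    present = a + b + c  # bands seen in at least one profile
    return {
        "simple_matching": (a + d) / n,
        "russell_rao": a / n,
        "jaccard": a / present if present else 0.0,
        "dice": 2 * a / (2 * a + b + c) if present else 0.0,
    }

# Hypothetical RAPD band profiles for two specimens (1 = band present).
x = [1, 1, 0, 0, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 0, 0]
print(similarity(x, y))
```

Jaccard and Dice ignore shared absences (d), which is one common argument for preferring them with dominant markers such as RAPD; distances are typically taken as one minus the similarity.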

20.
The epidemiologic concept of the adjusted attributable risk is a useful approach to quantitatively describe the importance of risk factors on the population level. It measures the proportional reduction in disease probability when a risk factor is eliminated from the population, accounting for effects of confounding and effect-modification by nuisance variables. The computation of asymptotic variance estimates for estimates of the adjusted attributable risk is often done by applying the delta method. Investigations on the delta method have shown, however, that the delta method generally tends to underestimate the standard error, leading to biased confidence intervals. We compare confidence intervals for the adjusted attributable risk derived by applying computer intensive methods like the bootstrap or jackknife to confidence intervals based on asymptotic variance estimates using an extensive Monte Carlo simulation and within a real data example from a cohort study in cardiovascular disease epidemiology. Our results show that confidence intervals based on bootstrap and jackknife methods outperform intervals based on asymptotic theory. Best variants of computer intensive confidence intervals are indicated for different situations.
