Similar articles
 20 similar articles found
1.
The bootstrap is a time-honoured, distribution-free approach for attaching a standard error to any statistic of interest, but it has received little attention for data with missing values, especially when imputation techniques are used to replace those values. We propose a proportional bootstrap method that allows effective use of imputation techniques for all bootstrap samples. Five deterministic imputation techniques are examined, with particular emphasis on estimating the standard error of the correlation coefficient. Some real data examples are presented. Other possible applications of the proposed bootstrap method are discussed.

2.
Bennewitz J  Reinsch N  Kalm E 《Genetics》2002,160(4):1673-1686
The nonparametric bootstrap approach is known to be suitable for calculating central confidence intervals for the locations of quantitative trait loci (QTL). However, the distribution of the bootstrap QTL position estimates along the chromosome is peaked at the positions of the markers and is not tailed equally. This results in conservativeness and large width of the confidence intervals. In this study three modified methods are proposed to calculate nonparametric bootstrap confidence intervals for QTL locations, which compute noncentral confidence intervals (uncorrected method I), correct for the impact of the markers (weighted method I), or both (weighted method II). Noncentral confidence intervals were computed with an analog of the highest posterior density method. The correction for the markers is based on the distribution of QTL estimates along the chromosome when the QTL is not linked with any marker, and it can be obtained with a permutation approach. In a simulation study the three methods were compared with the original bootstrap method. The results showed that it is useful, first, to compute noncentral confidence intervals and, second, to correct the bootstrap distribution of the QTL estimates for the impact of the markers. The weighted method II, combining these two properties, produced the shortest and least biased confidence intervals in a large number of simulated configurations.

3.
The recently-developed statistical method known as the “bootstrap” can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.
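The character-weighting idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's program: the toy alignment, the function names `bootstrap_character_weights` and `expand_by_weights`, and the fixed seed are all hypothetical. Each replicate keeps every taxon and draws characters (columns) with replacement; the weight of a character is the number of times it was drawn.

```python
import random
from collections import Counter


def bootstrap_character_weights(n_characters, rng):
    """Draw n_characters column indices with replacement; return one weight per column."""
    draws = [rng.randrange(n_characters) for _ in range(n_characters)]
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(n_characters)]


def expand_by_weights(matrix, weights):
    """Build the replicate data set: each column is repeated `weight` times (0 drops it)."""
    return ["".join(state * w for state, w in zip(row, weights)) for row in matrix]


rng = random.Random(0)                      # fixed seed, for reproducibility only
matrix = ["ACGTAC", "ACGTTC", "TCGAAC"]     # hypothetical toy data: 3 taxa x 6 characters
weights = bootstrap_character_weights(len(matrix[0]), rng)
replicate = expand_by_weights(matrix, weights)
```

Passing `weights` to a tree-building program that accepts character weights is equivalent to analysing `replicate` directly; repeating this many times and counting how often a group appears gives its bootstrap proportion.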

4.
Abstract. A method is described to determine the number of significant dimensions in metric ordination of a sample. The method is probabilistic, based on bootstrap resampling. An iterative algorithm takes bootstrap samples with replacement from the sample. It finds in each bootstrap sample ordination coordinates and computes, after Procrustean adjustments, the correlation between observed and bootstrap ordination scores. It compares this correlation to the same parameter generated in a parallel bootstrapped ordination of randomly permuted data, which upon many iterations will generate a probability. The method is assessed in principal coordinates analysis of simulated data sets that have varying numbers of variables and correlation levels, and uniform or patterned correlation structure. The results suggest the method is more reliable than other available methods in recovering the true intrinsic dimensionality. Examples with grassland data illustrate utility.

5.
For independent data, the non-parametric bootstrap is realised by resampling the data with replacement. This approach fails for dependent data such as time series. If the data generating process is at least stationary and mixing, the blockwise bootstrap, which draws subsamples or blocks of the data, saves the concept. For the blockwise bootstrap a blocklength has to be selected. We propose a method for selecting the optimal blocklength. To improve the finite-sample properties of the blockwise bootstrap, studentised statistics are considered. If the statistic can be represented as a smooth function model, this studentisation can be approximated efficiently. The studentised blockwise bootstrap method is applied to testing hypotheses on medical time series.
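To make the blockwise idea concrete, here is a minimal sketch of a plain moving-block bootstrap in Python. It takes the block length as given, so it does not implement the blocklength selection or the studentisation proposed in the abstract; the function name `moving_block_bootstrap` and the toy series are assumptions made for illustration.

```python
import random


def moving_block_bootstrap(series, block_length, rng):
    """Resample a series by concatenating randomly chosen contiguous blocks."""
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    n_blocks = -(-n // block_length)         # ceiling division
    resampled = []
    for _ in range(n_blocks):
        start = rng.randrange(n - block_length + 1)   # any admissible block start
        resampled.extend(series[start:start + block_length])
    return resampled[:n]                     # trim to the original length


rng = random.Random(42)                      # fixed seed, for reproducibility only
series = list(range(20))                     # hypothetical stand-in for a time series
replicate = moving_block_bootstrap(series, block_length=5, rng=rng)
```

Within each block the original ordering, and hence the short-range dependence, is preserved; only the joins between blocks break the dependence structure, which is why the choice of block length matters.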

6.
The bootstrap error estimation method is investigated in comparison with the known π-method and with a combined error estimation suggested by us, using simulated and normally distributed “populations” in 15 and 30 characters, respectively. For small sample sizes (below two to three times the number of characters per class) the estimates resulting from the bootstrap method are on average too small and can no longer be accepted. Significantly better results (with substantially less computation) are obtained for the π-method and the combined estimation. The variability is essentially the same for all three methods. This applies both in the case of rather badly separated and in the case of very well separated populations. A bootstrap estimation modified by us also gives unsatisfactory results.

7.
The bootstrap is an important tool for estimating the confidence interval of monophyletic groups within phylogenies. Although bootstrap analyses are used in most evolutionary studies, there is no clear consensus as to how best to interpret bootstrap probability values. To study the bootstrap method further, nine small subunit ribosomal DNA (SSU rDNA) data sets were submitted to bootstrapped maximum parsimony (MP) analyses using unweighted and weighted sequence positions. Analyses of the lengths (i.e., parsimony steps) of the bootstrap trees show that the shape and mean of the bootstrap tree distribution may provide important insights into the evolutionary signal within the sequence data. With complex phylogenies containing nodes defined by short internal branches (multifurcations), the mean of the bootstrap tree distribution may differ by 2 standard deviations from the length of the best tree found from the original data set. Weighting sequence positions significantly increases the bootstrap values at internal nodes. There may, however, be strong bootstrap support for conflicting species groupings among different data sets. This phenomenon appears to result from a correlation between the topology of the tree used to create the weights and the topology of the bootstrap consensus tree inferred from the MP analysis of these weighted data. The analyses also show that characteristics of the bootstrap tree distribution (e.g., skewness) may be used to choose between alternative weighting schemes for phylogenetic analyses.

8.
Usually, genetic correlations are estimated from breeding designs in the laboratory or greenhouse. However, estimates of the genetic correlation for natural populations are lacking, mostly because pedigrees of wild individuals are rarely known. Recently Lynch (1999) proposed a formula to estimate the genetic correlation in the absence of data on pedigree. This method has been shown to be particularly accurate provided a large sample size and a minimum (20%) proportion of relatives. Lynch (1999) proposed the use of the bootstrap to estimate standard errors associated with genetic correlations, but did not test the reliability of such a method. We tested the bootstrap and showed that the jackknife can provide valid estimates of the genetic correlation calculated with the Lynch formula. The occurrence of undefined estimates, combined with the high number of replicates involved in the bootstrap, means there is a high probability of obtaining an upwardly biased, incomplete bootstrap, even when there is a high fraction of related pairs in a sample. It is easier to obtain complete jackknife estimates for which all the pseudovalues have been defined. We therefore recommend the use of the jackknife to estimate the genetic correlation with the Lynch formula. Provided data can be collected for more than two individuals at each location, we propose a group sampling method that produces low standard errors associated with the jackknife, even when there is a low fraction of relatives in a sample.
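For readers unfamiliar with the jackknife, the following Python sketch shows the standard delete-one jackknife standard error for a generic statistic. The Lynch formula is not given in this abstract, so an ordinary Pearson correlation is used as a stand-in; the names `pearson` and `jackknife_se` and the toy data are assumptions made here for illustration.

```python
import math


def pearson(pairs):
    """Pearson correlation of (x, y) pairs; a stand-in statistic, not the Lynch formula."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)        # undefined (division by zero) if a variance is 0


def jackknife_se(pairs, stat):
    """Delete-one jackknife standard error of stat(pairs)."""
    n = len(pairs)
    leave_one_out = [stat(pairs[:i] + pairs[i + 1:]) for i in range(n)]
    mean = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in leave_one_out))


# Hypothetical toy data: six paired measurements.
pairs = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.4), (4.0, 3.8), (5.0, 5.3), (6.0, 5.9)]
se = jackknife_se(pairs, pearson)
```

The jackknife needs only n evaluations of the statistic, one per deleted observation, which is one reason a complete set of defined estimates is easier to obtain than with a bootstrap that uses many more replicates.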

9.
In this paper the detection of rare-variant association with continuous phenotypes of interest is investigated via the likelihood-ratio based variance component test under the framework of linear mixed models. The hypothesis testing is challenging and nonstandard, since under the null the variance component is located on the boundary of its parameter space. In this situation the usual asymptotic chi-square distribution of the likelihood ratio statistic does not necessarily hold. To circumvent the derivation of the null distribution we resort to the bootstrap method because it is generically applicable and easy to implement. Both parametric and nonparametric bootstrap likelihood ratio tests are studied. Numerical studies are implemented to evaluate the performance of the proposed bootstrap likelihood ratio test and to compare it with some existing methods for the identification of rare variants. To reduce the computational time of the bootstrap likelihood ratio test we propose an effective approximation mixture for the bootstrap null distribution. The GAW17 data are used to illustrate the proposed test.

10.
The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences are studied by using model trees of three taxa with an outgroup and by assuming a constant rate of nucleotide substitution. The maximum-parsimony method of tree reconstruction is used. An analytic formula is derived for estimating the sequence length that is required if P, the probability of obtaining the true tree from the sampled sequences, is to be equal to or higher than a given value. Bootstrap estimation is formulated as a two-step sampling procedure: (1) sampling of sequences from the evolutionary process and (2) resampling of the original sequence sample. The probability that a bootstrap resampling of an original sequence sample will support the true tree is found to depend on the model tree, the sequence length, and the probability that a randomly chosen nucleotide site is an informative site. When a trifurcating tree is used as the model tree, the probability that one of the three bifurcating trees will appear in ≥ 95% of the bootstrap replicates is < 5%, even if the number of bootstrap replicates is only 50; therefore, the probability of accepting an erroneous tree as the true tree is < 5% if that tree appears in ≥ 95% of the bootstrap replicates and if more than 50 bootstrap replications are conducted. However, if a particular bifurcating tree is observed in, say, < 75% of the bootstrap replicates, then it cannot be claimed to be better than the trifurcating tree even if ≥ 1,000 bootstrap replications are conducted. When a bifurcating tree is used as the model tree, the bootstrap approach tends to overestimate P when the sequences are very short, but it tends to underestimate that probability when the sequences are long. Moreover, simulation results show that, if a tree is accepted as the true tree only if it has appeared in ≥ 95% of the bootstrap replicates, then the probability of failing to accept any bifurcating tree can be as large as 58% even when P = 95%, i.e., even when 95% of the samples from the evolutionary process will support the true tree. Thus, if the rate-constancy assumption holds, bootstrapping is a conservative approach for estimating the reliability of an inferred phylogeny for four taxa.

11.
Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the four-taxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the two-branch corner, and (3) deep in the two-branch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations; in particular, there are more posterior probability values in the range 0.85-1 than expected by theory. Therefore, our results corroborate the findings of others that posterior probability values are excessively high. Our results also suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probability values.

12.
Majority-rule reduced consensus trees and their use in bootstrapping   Cited by: 3 (self-citations: 0, by others: 3)
Bootstrap analyses are usually summarized with majority-rule component consensus trees. This consensus method is based on replicated components and, like all component consensus methods, it is insensitive to other kinds of agreement between trees. Recently developed reduced consensus methods can be used to summarize much additional agreement on hypothesised phylogenetic relationships among multiple trees. The new methods are "strict" in the sense that they require agreement among all the trees being compared for any relationships to be represented in a consensus tree. Majority-rule reduced consensus methods are described and their use in bootstrap analyses is illustrated with a hypothetical and a real example. The new methods provide summaries of the bootstrap proportions of all n-taxon statements/partitions and facilitate the identification of hypotheses of relationships that are supported by high bootstrap proportions, in spite of a lack of support for particular components or clades. In practice majority-rule reduced consensus profiles may contain many trees. The size of the profile can be reduced by constraints on minimal bootstrap proportions and/or cardinality of the included trees. Majority-rule reduced consensus trees can also be selected a posteriori from the profile. Surrogates to the majority-rule reduced consensus methods using partition tables or tree pruning options provided by widely used phylogenetic inference software are also described. The methods are designed to produce more informative summaries of bootstrap analyses and thereby foster more informed assessment of the strengths and weaknesses of complex phylogenetic hypotheses.

13.
Chang Xuan Mao  Jun Li 《Biometrics》2009,65(4):1063-1067
Summary: Comparing species assemblages given incidence-based data is of importance in ecological studies, often done by a visual inspection of estimated species accumulation curves or by an ad hoc use of 95% pointwise confidence bands of these curves. It is shown that comparing species assemblages is a challenging problem. A χ² test is proposed. An adjustment using an eigenvalue decomposition is proposed to overcome computational difficulties. The bootstrap method is also suggested to approximate the distribution of the proposed test statistic. The eigenvalue adjusted (Eva) χ² test and the Eva-bootstrap test are assessed by a simulation study. Both the Eva-χ² and the Eva-bootstrap tests are applied to a study that involves two woody seedling species assemblages.

14.
Automated variable selection procedures, such as backward elimination, are commonly employed to perform model selection in the context of multivariable regression. The stability of such procedures can be investigated using a bootstrap-based approach. The idea is to apply the variable selection procedure on a large number of bootstrap samples successively and to examine the obtained models, for instance, in terms of the inclusion of specific predictor variables. In this paper, we aim to investigate a particularly important problem affecting this method in the case of categorical predictor variables with different numbers of categories and to give recommendations on how to avoid it. For this purpose, we systematically assess the behavior of automated variable selection based on the likelihood ratio test using either bootstrap samples drawn with replacement or subsamples drawn without replacement from the original dataset. Our study consists of extensive simulations and a real data example from the NHANES study. Our main result is that if automated variable selection is conducted on bootstrap samples, variables with more categories are substantially favored over variables with fewer categories and over metric variables even if none of them have any effect. Importantly, variables with no effect and many categories may be (wrongly) preferred to variables with an effect but few categories. We suggest the use of subsamples instead of bootstrap samples to bypass these drawbacks.

15.
In studies of morphology, methods for comparing amounts of variability are often important. Three different ways of utilizing determinants of covariance matrices for testing for surplus variability in a hypothesis sample compared to a reference sample are presented: an F-test based on standardized generalized variances, a parametric bootstrap based on draws on Wishart matrices, and a nonparametric bootstrap. The F-test based on standardized generalized variances and the Wishart-based bootstrap are applicable when multivariate normality can be assumed. These methods can be applied with only summary data available. However, the nonparametric bootstrap can be applied with multivariate nonnormally distributed data as well as multivariate normally distributed data, and small sample sizes. Therefore, this method is preferable when raw data are available. Three craniometric samples are used to present the methods. A Hungarian Zalavár sample and an Austrian Berg sample are compared to a Norwegian Oslo sample, the latter employed as reference sample. In agreement with a previous study, it is shown that the Zalavár sample does not represent surplus variability, whereas the Berg sample does represent such surplus variability.

16.
S M Snapinn  J D Knoke 《Biometrics》1989,45(1):289-299
Accurate estimation of misclassification rates in discriminant analysis with selection of variables by, for example, a stepwise algorithm, is complicated by the large optimistic bias inherent in standard estimators such as those obtained by the resubstitution method. Application of a bootstrap adjustment can reduce the bias of the resubstitution method; however, the bootstrap technique requires the variable selection procedure to be repeated many times and is therefore difficult to compute. In this paper we propose a smoothed estimator that requires relatively little computation and which, on the basis of a Monte Carlo sampling study, is found to perform generally at least as well as the bootstrap method.

17.
Several analyses of the geographic variation of mortality rates have been proposed in the literature. Poisson models allowing the incorporation of random effects to model extra-variability are widely used. The typical modelling approach uses normal random effects to accommodate local spatial autocorrelation. When spatial autocorrelation is absent but overdispersion persists, a discrete mixture model is an alternative approach. However, a technique for identifying regions which have significantly high or low risk in any given area has not yet been developed for the discrete mixture model. Given the importance of this information to epidemiologists in formulating hypotheses about the potential risk factors affecting the population, different procedures for obtaining confidence intervals for relative risks are derived in this paper. These methods are the standard information-based method and four others, all based on bootstrap techniques, namely the asymptotic-bootstrap, the percentile-bootstrap, the BC-bootstrap and the modified information-based method. All of them are compared empirically by their application to mortality data due to cardiovascular diseases in women from Navarra, Spain, during the period 1988–1994. In the small area example considered here, we find that the information-based method is sensible at estimating standard errors of the component means in the discrete mixture model but it is not appropriate for providing standard errors of the estimated relative risks and hence, for constructing confidence intervals for the relative risk associated with each region. Therefore, the bootstrap-based methods are recommended for this purpose. More specifically, the BC method seems to provide better coverage probabilities in the case studied, according to a small scale simulation study that has been carried out using a scenario as encountered in the analysis of the real data.

18.
When predicting population dynamics, the value of the prediction is not enough and should be accompanied by a confidence interval that integrates the whole chain of errors, from observations to predictions via the estimates of the parameters of the model. Matrix models are often used to predict the dynamics of age- or size-structured populations. Their parameters are vital rates. This study aims (1) at assessing the impact of the variability of observations on vital rates, and then on the model's predictions, and (2) at comparing three methods for computing confidence intervals for values predicted from the models. The first method is the bootstrap. The second method is analytic and approximates the standard error of predictions by their asymptotic variance as the sample size tends to infinity. The third method combines use of the bootstrap to estimate the standard errors of vital rates with the analytical method to then estimate the errors of predictions from the model. Computations are done for a Usher matrix model that predicts the asymptotic (as time goes to infinity) stock recovery rate for three timber species in French Guiana. Little difference is found between the hybrid and the analytic method. Their estimates of bias and standard error converge towards the bootstrap estimates when the error on vital rates becomes small enough, which corresponds in the present case to a number of observations greater than 5000 trees.

19.
Wood SN 《Biometrics》2001,57(1):240-244
Objective functions that arise when fitting nonlinear models often contain local minima that are of little significance except for their propensity to trap minimization algorithms. The standard methods for attempting to deal with this problem treat the objective function as fixed and employ stochastic minimization approaches in the hope of randomly jumping out of local minima. This article suggests a simple trick for performing such minimizations that can be employed in conjunction with most conventional nonstochastic fitting methods. The trick is to stochastically perturb the objective function by bootstrapping the data to be fit. Each bootstrap objective shares the large-scale structure of the original objective but has different small-scale structure. Minimizations of bootstrap objective functions are alternated with minimizations of the original objective function starting from the parameter values with which minimization of the previous bootstrap objective terminated. An example is presented, fitting a nonlinear population dynamic model to population dynamic data and including a comparison of the suggested method with simulated annealing. Convergence diagnostics are discussed.
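The alternation described in this abstract can be sketched as follows. This is a toy illustration under several assumptions, not the article's implementation: the one-parameter model y = sin(θx), the simple pattern-search minimizer `local_minimize`, the rule of restarting each bootstrap fit from the best parameter found so far, and keeping only improving fits are all choices made here for the example.

```python
import math
import random


def sse(theta, data):
    """Sum of squared errors for the toy model y = sin(theta * x)."""
    return sum((y - math.sin(theta * x)) ** 2 for x, y in data)


def local_minimize(f, theta, step=0.1, tol=1e-6, max_iter=10_000):
    """Deterministic 1-D pattern search: step while it improves, else halve the step."""
    best = f(theta)
    for _ in range(max_iter):
        if step <= tol:
            break
        for candidate in (theta - step, theta + step):
            value = f(candidate)
            if value < best:
                theta, best = candidate, value
                break
        else:                                 # neither neighbour improved
            step /= 2
    return theta, best


def bootstrap_restart(data, theta0, n_rounds, rng):
    """Alternate fits to bootstrap objectives with fits to the original objective."""
    theta, best = local_minimize(lambda t: sse(t, data), theta0)
    for _ in range(n_rounds):
        boot = [rng.choice(data) for _ in data]           # resample the data
        theta_boot, _ = local_minimize(lambda t: sse(t, boot), theta)
        candidate, value = local_minimize(lambda t: sse(t, data), theta_boot)
        if value < best:                                   # keep improvements only
            theta, best = candidate, value
    return theta, best


rng = random.Random(1)                        # fixed seed, for reproducibility only
data = [(0.5 * i, math.sin(2.0 * 0.5 * i) + rng.gauss(0, 0.1)) for i in range(20)]
theta_local, obj_local = local_minimize(lambda t: sse(t, data), 0.5)
theta_best, obj_best = bootstrap_restart(data, 0.5, n_rounds=20, rng=rng)
```

Because only improving fits are kept, `obj_best` can never exceed `obj_local`; whether the restarts actually escape a given local minimum depends on the data, the starting value and the number of rounds.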

20.
Interior-branch and bootstrap tests of phylogenetic trees   Cited by: 19 (self-citations: 3, by others: 16)
We have compared statistical properties of the interior-branch and bootstrap tests of phylogenetic trees when the neighbor-joining tree-building method is used. For each interior branch of a predetermined topology, the interior-branch and bootstrap tests provide the confidence values, PC and PB, respectively, that indicate the extent of statistical support of the sequence cluster generated by the branch. In phylogenetic analysis these two values are often interpreted in the same way, and if PC and PB are high (say, ≥ 0.95), the sequence cluster is regarded as reliable. We have shown that PC is in fact the complement of the P-value used in the standard statistical test, but PB is not. Actually, the bootstrap test usually underestimates the extent of statistical support of species clusters. The relationship between the confidence values obtained by the two tests varies with both the topology and expected branch lengths of the true (model) tree. The most conspicuous difference between PC and PB is observed when the true tree is starlike, and there is a tendency for the difference to increase as the number of sequences in the tree increases. The reason for this is that the bootstrap test tends to become progressively more conservative as the number of sequences in the tree increases. Unlike the bootstrap, the interior-branch test has the same statistical properties irrespective of the number of sequences used when a predetermined tree is considered. Therefore, the interior-branch test appears to be preferable to the bootstrap test as long as unbiased estimators of evolutionary distances are used. However, when the interior-branch test is applied to a tree estimated from a given data set, PC may give an overestimate of statistical confidence. For this case, we developed a method for computing a modified version (P'C) of the PC value and showed that this P'C tends to give a conservative estimate of statistical confidence, though it is not as conservative as PB. In this paper we have introduced a model in which evolutionary distances between sequences follow a multivariate normal distribution. This model allowed us to study the relationships between the two tests analytically.
