Similar literature
20 similar documents found.
1.
Collings and Hamilton (1988) described a uniform bootstrap method that is applied to observed or pilot data in order to approximate the power of the two-sample Wilcoxon test for location shift alternatives. In this paper we demonstrate how importance and antithetic resampling can be used to substantially reduce the amount of computation needed to approximate the power of two-sample tests for location shift and scale alternatives. Importance and antithetic bootstrap resampling methods are applied to simulated data of different sample sizes from a variety of distributions as well as to data from the Iowa 65+ Rural Health Study. A suggestion is also given for using a combination of importance and antithetic resampling to approximate the power of two-sample tests.
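
The uniform bootstrap power approximation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two pilot samples have been pooled into one list, a one-sided test at the 5% level, and a normal approximation to the rank-sum statistic without tie correction. The names `rank_sum_z` and `bootstrap_power` are hypothetical, and the importance and antithetic resampling variants are not shown.

```python
import math
import random


def rank_sum_z(x, y):
    """Normal-approximation z statistic for the Wilcoxon rank-sum test.

    Tied values receive midranks; the variance carries no tie correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    total = len(pooled)
    rank_sum_y = 0.0
    i = 0
    while i < total:
        j = i
        while j < total and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1..j
        rank_sum_y += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 1)
        i = j
    m, n = len(x), len(y)
    mean = n * (m + n + 1) / 2.0
    sd = math.sqrt(m * n * (m + n + 1) / 12.0)
    return (rank_sum_y - mean) / sd


def bootstrap_power(pilot, m, n, shift, n_boot=2000, z_crit=1.645, seed=0):
    """Estimate power against a location shift by uniform resampling of pilot data."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_boot):
        x_star = rng.choices(pilot, k=m)                       # sample under F
        y_star = [v + shift for v in rng.choices(pilot, k=n)]  # sample under F shifted
        if rank_sum_z(x_star, y_star) > z_crit:
            rejections += 1
    return rejections / n_boot
```

Calling `bootstrap_power(pilot, 10, 10, shift=0.0)` should return a value near the nominal level, which is a quick sanity check before trusting estimates at nonzero shifts.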

2.
Adaptive two‐stage designs allow a data‐driven change of design characteristics during the ongoing trial. One of the available options is an adaptive choice of the test statistic for the second stage of the trial based on the results of the interim analysis. Since there is often only a vague knowledge of the distribution shape of the primary endpoint in the planning phase of a study, a change of the test statistic may then be considered if the data indicate that the assumptions underlying the initial choice of the test are not correct. Collings and Hamilton proposed a bootstrap method for the estimation of the power of the two‐sample Wilcoxon test for shift alternatives. We use this approach for the selection of the test statistic. By means of a simulation study, we show that the gain in terms of power may be considerable when the initial assumption about the underlying distribution was wrong, whereas the loss is relatively small when in the first instance the optimal test statistic was chosen. The results also hold true for comparison with a one‐stage design. Application of the method is illustrated by a clinical trial example.

3.
Power calculations of a statistical test require that the underlying population distribution(s) be completely specified. In practice, statisticians may not have complete knowledge of the underlying distribution(s) and are then unable to calculate the exact power of the test. In such cases, an estimate of the power provides a suitable substitute. In this paper, we are interested in estimating the power of the Kruskal-Wallis one-way analysis of variance by ranks test for a location shift. We investigated an extension of a data-based power estimation method presented by Collings and Hamilton (1988), which requires no prior knowledge of the underlying population distributions other than what is necessary to perform the Kruskal-Wallis test for a location shift. This method utilizes bootstrapping techniques to produce a power estimate based on the empirical cumulative distribution functions of the sample data. We performed a simulation study of the extended power estimator under the conditions of k = 3 and k = 5 samples of equal sizes m = 10 and m = 20, with four underlying continuous distributions that possessed various location configurations. Our simulation study demonstrates that the Extended Average X & Y power estimation method is a reliable estimator of the power of the Kruskal-Wallis test for k = 3 samples, and ranges from a more conservative estimator to a mild overestimator of the true power for k = 5 samples.

4.
The bootstrap is an important tool for estimating the confidence interval of monophyletic groups within phylogenies. Although bootstrap analyses are used in most evolutionary studies, there is no clear consensus as to how best to interpret bootstrap probability values. To study the bootstrap method further, nine small subunit ribosomal DNA (SSU rDNA) data sets were submitted to bootstrapped maximum parsimony (MP) analyses using unweighted and weighted sequence positions. Analyses of the lengths (i.e., parsimony steps) of the bootstrap trees show that the shape and mean of the bootstrap tree distribution may provide important insights into the evolutionary signal within the sequence data. With complex phylogenies containing nodes defined by short internal branches (multifurcations), the mean of the bootstrap tree distribution may differ by 2 standard deviations from the length of the best tree found from the original data set. Weighting sequence positions significantly increases the bootstrap values at internal nodes. There may, however, be strong bootstrap support for conflicting species groupings among different data sets. This phenomenon appears to result from a correlation between the topology of the tree used to create the weights and the topology of the bootstrap consensus tree inferred from the MP analysis of these weighted data. The analyses also show that characteristics of the bootstrap tree distribution (e.g., skewness) may be used to choose between alternative weighting schemes for phylogenetic analyses.

5.
Bootstrap method of interior-branch test for phylogenetic trees (cited by: 7; self-citations: 2; citations by others: 5)
Statistical properties of the bootstrap test of interior branch lengths of phylogenetic trees have been studied and compared with those of the standard interior-branch test in computer simulations. Examination of the properties of the tests under the null hypothesis showed that both tests for an interior branch of a predetermined topology are quite reliable when the distribution of the branch length estimate approaches a normal distribution. Unlike the standard interior-branch test, the bootstrap test appears to retain this property even when the substitution rate varies among sites. In this case, the distribution of the branch length estimate deviates from a normal distribution, and the standard interior-branch test gives conservative confidence probability values. A simple correction method was developed for both interior-branch tests to be applied for testing the reliability of tree topologies estimated from sequence data. This correction for the standard interior-branch test appears to be as effective as that obtained in our previous study, though it is much simpler. The bootstrap and standard interior-branch tests for estimated topologies become conservative as the number of sequence groups in a star-like tree increases.

6.
A method is described to compare two evoked potential scalp fields in order to decide if the two fields are the same or different. The method uses Efron's bootstrap technique which avoids potential errors due to assumptions about the underlying stochastic process. It is configured to focus only on the shape of the evoked potential scalp field. The method is applied to a simple visually evoked potential paradigm and results are compared to the chi-square test using data from 7 normal subjects.

7.
Baggerly KA 《Cytometry》2001,45(2):141-150
BACKGROUND: A key problem in immunohistochemistry is assessing when two sample histograms are significantly different. One test that is commonly used for this purpose in the univariate case is the chi-squared test. Comparing multivariate distributions is qualitatively harder, as the "curse of dimensionality" means that the number of bins can grow exponentially. For the chi-squared test to be useful, data-dependent binning methods must be employed. An example of how this can be done is provided by the "probability binning" method of Roederer et al. (1,2,3). METHODS: We derive the theoretical distribution of the probability binning statistic, giving it a more rigorous foundation. We show that the null distribution is a scaled chi-square, and show how it can be related to the standard chi-squared statistic. RESULTS: A small simulation shows how the theoretical results can be used to (a) modify the probability binning statistic to make it more sensitive and (b) suggest variant statistics which, while still exploiting the data-dependent strengths of the probability binning procedure, may be easier to work with. CONCLUSIONS: The probability binning procedure effectively uses adaptive binning to locate structure in high-dimensional data. The derivation of a theoretical basis provides a more detailed interpretation of its behavior and renders the probability binning method more flexible.

8.
In this paper we describe and test a new method for characterizing the space use patterns of individual animals on the basis of successive locations of marked individuals. Existing methods either do not describe space use in probabilistic terms, e.g. the maximum distance between locations or the area of the convex hull of all locations, or they assume a priori knowledge of the probabilistic shape of each individual's use pattern, e.g. bivariate or circular normal distributions. We develop a method for calculating a probability of location distribution for an average individual member of a population that requires no assumptions about the shape of the distribution (we call this distribution the population utilization distribution or PUD). Using nine different sets of location data, we demonstrate that these distributions accurately characterize the space use patterns of the populations from which they were derived. The assumption of normality is found to result in a consistent and significant overestimate of the area of use. We then describe a function which relates probability of location to area (termed the MAP index) which has a number of advantages over existing size indices. Finally, we show how any quantities such as the MAP index derived from our average distributions can be subjected to standard statistical tests of significance.

9.
The recently-developed statistical method known as the “bootstrap” can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.
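
The character resampling and majority-rule steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: tree inference itself is left out, each bootstrap replicate is assumed to have already been reduced to a collection of monophyletic groups (each a `frozenset` of species names), and the function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_character_weights(n_characters, rng):
    """Draw characters with replacement; weight = number of times each was drawn."""
    draws = rng.choices(range(n_characters), k=n_characters)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(n_characters)]


def group_support(replicate_groups):
    """Fraction of bootstrap replicates in which each group appears."""
    tally = Counter()
    for groups in replicate_groups:
        tally.update(set(groups))  # count each group at most once per replicate
    n_replicates = len(replicate_groups)
    return {group: count / n_replicates for group, count in tally.items()}


def majority_rule_groups(replicate_groups, threshold=0.5):
    """Groups occurring in more than `threshold` of the replicates."""
    support = group_support(replicate_groups)
    return {group for group, frac in support.items() if frac > threshold}
```

The weight vector from `bootstrap_character_weights` always sums to the number of characters, so it can be passed to any tree-building program that accepts integer character weights; a group would then be reported as significant when its support from `group_support` reaches 0.95.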

10.
Usually, genetic correlations are estimated from breeding designs in the laboratory or greenhouse. However, estimates of the genetic correlation for natural populations are lacking, mostly because pedigrees of wild individuals are rarely known. Recently Lynch (1999) proposed a formula to estimate the genetic correlation in the absence of data on pedigree. This method has been shown to be particularly accurate provided a large sample size and a minimum (20%) proportion of relatives. Lynch (1999) proposed the use of the bootstrap to estimate standard errors associated with genetic correlations, but did not test the reliability of such a method. We tested the bootstrap and showed that the jackknife can provide valid estimates of the genetic correlation calculated with the Lynch formula. The occurrence of undefined estimates, combined with the high number of replicates involved in the bootstrap, means there is a high probability of obtaining an upward-biased, incomplete bootstrap, even when there is a high fraction of related pairs in a sample. It is easier to obtain complete jackknife estimates for which all the pseudovalues have been defined. We therefore recommend the use of the jackknife to estimate the genetic correlation with the Lynch formula. Provided data can be collected for more than two individuals at each location, we propose a group sampling method that produces low standard errors associated with the jackknife, even when there is a low fraction of relatives in a sample.
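
The jackknife pseudovalues mentioned in this abstract follow a standard construction, sketched below for a generic estimator. The Lynch formula itself is not reproduced; `estimator` is a hypothetical user-supplied function, and the sketch assumes every leave-one-out estimate is defined (the abstract notes that this may fail in practice and would need separate handling).

```python
import math
import statistics


def jackknife(data, estimator):
    """Delete-one jackknife: returns (estimate, standard error, pseudovalues).

    `data` is a list of observations; `estimator` maps a list to a number.
    """
    n = len(data)
    theta_full = estimator(data)
    pseudovalues = []
    for i in range(n):
        theta_minus_i = estimator(data[:i] + data[i + 1:])  # leave observation i out
        pseudovalues.append(n * theta_full - (n - 1) * theta_minus_i)
    estimate = statistics.mean(pseudovalues)
    std_error = statistics.stdev(pseudovalues) / math.sqrt(n)
    return estimate, std_error, pseudovalues
```

When the estimator is the sample mean, the pseudovalues reduce to the original observations, which gives a simple way to check the routine before applying it to a more complicated statistic.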

11.
Recently, empirical evidence was presented that the permutation tail probability (PTP) test has extremely low discriminatory power when assessing character covariance in phylogenetic data based on bootstrap measures of confidence. Here we are concerned with the problem of using one statistical approach, especially when applied to empirical data, to judge the performance of another. Applying an appropriate statistical approach, we statistically demonstrated that the PTP test is extremely weak in detecting the absence of character covariation. In addition, we show that PTP is highly dependent on the number of terminals and the proportion of character states in phylogenetic matrices. In conclusion, we advocate the use of simulation studies when testing the performance of statistical tools applied to phylogenetic data.

12.
Summary: As the nonparametric generalization of the one‐way analysis of variance model, the Kruskal–Wallis test applies when the goal is to test the difference between multiple samples and the underlying population distributions are nonnormal or unknown. Although the Kruskal–Wallis test has been widely used for data analysis, power and sample size methods for this test have been investigated to a much lesser extent. This article proposes new power and sample size calculation methods for the Kruskal–Wallis test based on the pilot study in either a completely nonparametric model or a semiparametric location model. No assumption is made on the shape of the underlying population distributions. Simulation results show that, in terms of sample size calculation for the Kruskal–Wallis test, the proposed methods are more reliable and preferable to some more traditional methods. A mouse peritoneal cavity study is used to demonstrate the application of the methods.

13.
In microarray studies it is common that the number of replications (i.e. the sample size) is small and that the distribution of expression values differs from normality. In this situation, permutation and bootstrap tests may be appropriate for the identification of differentially expressed genes. However, unlike bootstrap tests, permutation tests are not suitable for very small sample sizes, such as three per group. A variety of different bootstrap tests exists. For example, it is possible to adjust the data to have a common mean before the bootstrap samples are drawn. For small significance levels, which can occur when a large number of genes is investigated, the original bootstrap test, as well as a bootstrap test suggested for the Behrens-Fisher problem, have no power in cases of very small sample sizes. In contrast, the modified test based on adjusted data is powerful. Using a Monte Carlo simulation study, we demonstrate that the difference in power can be huge. In addition, the different tests are illustrated using microarray data.
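
The idea of adjusting the data to a common mean before drawing bootstrap samples can be sketched as below. The abstract does not state the test statistic or the resampling details, so this sketch makes its own assumptions: the statistic is the absolute difference in group means, resampling is done within each adjusted group, and the p-value uses the add-one convention. The function name is hypothetical.

```python
import random


def mean_adjusted_bootstrap_test(x, y, n_boot=5000, seed=0):
    """Two-sample bootstrap test on data shifted to a common mean (null hypothesis)."""
    rng = random.Random(seed)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    observed = abs(mean_x - mean_y)
    grand_mean = (sum(x) + sum(y)) / (len(x) + len(y))
    # Shift each group so that both share the grand mean, enforcing the null.
    x_adj = [v - mean_x + grand_mean for v in x]
    y_adj = [v - mean_y + grand_mean for v in y]
    extreme = 0
    for _ in range(n_boot):
        xb = rng.choices(x_adj, k=len(x))
        yb = rng.choices(y_adj, k=len(y))
        stat = abs(sum(xb) / len(xb) - sum(yb) / len(yb))
        if stat >= observed:
            extreme += 1
    return (extreme + 1) / (n_boot + 1)  # add-one p-value, never exactly zero
```

Because the add-one p-value cannot fall below 1 / (n_boot + 1), the number of bootstrap replicates limits how small a significance level can be reached, which matters when many genes are tested at once.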

14.
Overdispersion is a common phenomenon in Poisson modeling, and the negative binomial (NB) model is frequently used to account for overdispersion. Testing approaches (Wald test, likelihood ratio test (LRT), and score test) for overdispersion in the Poisson regression versus the NB model are available. Because the generalized Poisson (GP) model is similar to the NB model, we consider the former as an alternate model for overdispersed count data. The score test has an advantage over the LRT and the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis. This paper proposes a score test for overdispersion based on the GP model and compares the power of the test with the LRT and Wald tests. A simulation study indicates that the score test based on the asymptotic standard normal distribution is more appropriate in practical applications because of its higher empirical power; however, it underestimates the nominal significance level, especially in small-sample situations. Examples illustrate the results of comparing the candidate tests between the Poisson and GP models. A bootstrap test is also proposed to adjust for the underestimation of the nominal level in the score statistic when the sample size is small. The simulation study indicates that the bootstrap test has a significance level closer to the nominal size and has uniformly greater power than the score test based on the asymptotic standard normal distribution. From a practical perspective, if the score test gives even a weak indication that the Poisson model is inappropriate, say at the 0.10 significance level, we advise using the more accurate bootstrap procedure to assess whether the GP model is more appropriate than the Poisson model. Finally, the Vuong test is illustrated to choose between GP and NB2 models for the same dataset.

15.
Population multiple components is a statistical tool useful for the analysis of time-dependent hybrid data. With a small number of parameters, it is possible to model and to predict the periodic behavior of a population. In this article, we propose two methods to compare rhythmometric parameters obtained by multiple component analysis among populations. The first is a parametric method based on the usual statistical techniques for comparison of mean vectors in multivariate normal populations. The method, through MANOVA analysis, allows comparison of the MESOR and amplitude-acrophase pair of each component among two or more populations. The second is a nonparametric method, based on bootstrap techniques, to compare parameters from two populations. This test allows one to compare the MESOR, the amplitude, and the acrophase of each fitted component, as well as the global amplitude, orthophase, and bathyphase estimated when all fitted components are harmonics of a fundamental period. The idea is to calculate a confidence interval for the difference of the parameters of interest. If this interval does not contain zero, it can be concluded that the parameters from the two models are different with high probability. An estimate of the p-value for the corresponding test can also be calculated. Both methods are illustrated with an example based on clinical data. The nonparametric test can also be applied to paired data, a special situation of great interest in practice. By the use of similar bootstrap techniques, we illustrate how to construct confidence intervals for any rhythmometric parameter estimated from population multiple components models, including the orthophase, bathyphase, and global amplitude. These tests for comparison of parameters among populations are a needed tool when modeling the nonsinusoidal rhythmic behavior of hybrid data by population multiple component analysis.
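
The confidence-interval idea in this abstract (conclude that two populations differ when the interval for the parameter difference excludes zero) can be sketched with a percentile bootstrap. This is a generic illustration, not the authors' procedure: `estimator` is a hypothetical stand-in for whatever fits a rhythmometric parameter such as the MESOR, each population is assumed to be a list of independent observations, and the percentile interval is one choice among several bootstrap intervals.

```python
import math
import random


def bootstrap_difference_ci(sample_a, sample_b, estimator,
                            n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(A) - estimator(B)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a_star = rng.choices(sample_a, k=len(sample_a))
        b_star = rng.choices(sample_b, k=len(sample_b))
        diffs.append(estimator(a_star) - estimator(b_star))
    diffs.sort()
    alpha = 1.0 - level
    lower = diffs[math.floor(alpha / 2 * (n_boot - 1))]
    upper = diffs[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


def excludes_zero(interval):
    """True when the interval lies entirely on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0
```

For paired data, the same routine could be applied to the list of within-pair differences with a single resampling step instead of two; that variant is not shown here.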

16.
Population multiple components is a statistical tool useful for the analysis of time-dependent hybrid data. With a small number of parameters, it is possible to model and to predict the periodic behavior of a population. In this article, we propose two methods to compare rhythmometric parameters obtained by multiple component analysis among populations. The first is a parametric method based on the usual statistical techniques for comparison of mean vectors in multivariate normal populations. The method, through MANOVA analysis, allows comparison of the MESOR and amplitude-acrophase pair of each component among two or more populations. The second is a nonparametric method, based on bootstrap techniques, to compare parameters from two populations. This test allows one to compare the MESOR, the amplitude, and the acrophase of each fitted component, as well as the global amplitude, orthophase, and bathyphase estimated when all fitted components are harmonics of a fundamental period. The idea is to calculate a confidence interval for the difference of the parameters of interest. If this interval does not contain zero, it can be concluded that the parameters from the two models are different with high probability. An estimate of the p-value for the corresponding test can also be calculated. Both methods are illustrated with an example based on clinical data. The nonparametric test can also be applied to paired data, a special situation of great interest in practice. By the use of similar bootstrap techniques, we illustrate how to construct confidence intervals for any rhythmometric parameter estimated from population multiple components models, including the orthophase, bathyphase, and global amplitude. These tests for comparison of parameters among populations are a needed tool when modeling the nonsinusoidal rhythmic behavior of hybrid data by population multiple component analysis.

17.
A bootstrap method suggested by COLLINGS and HAMILTON (1988) to estimate the power of the two-sample Wilcoxon test is adapted for estimating the power of the Gehan test. A Monte Carlo simulation study is done to determine how well the method works in this case.

18.
The subcellular location of a protein is constructive information for determining its function, screening for drug candidates, vaccine design, annotation of gene products, and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used effectively together with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system achieves accuracies of 100, 82.47, and 88.81 for the self-consistency test, jackknife test, and independent data test, respectively. Our results provide evidence that prediction based on the biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail, enhancing the prediction accuracy.

19.
The paper deals with the classical two-sample testing problem for the equality of two populations, one of the most fundamental problems in biomedical experiments and case–control studies. The most familiar alternatives are the difference in location parameters or the difference in scale parameters or in both the parameters of the population density. All the tests designed for classical location or scale or location–scale alternatives assume that there is no change in the shape of the distribution. Some authors also consider the Lehmann-type alternative that addresses the change in shape. Two-sample tests under the Lehmann alternative assume that the location and scale parameters are invariant. In real life, when a shift in the distribution occurs, one or more of the location, scale, and shape parameters may change simultaneously. We refer to a change of one or more of the three parameters as a versatile alternative. Noting the dearth of literature on testing the equality of two populations against such versatile alternatives, we introduce two distribution-free tests based on the Euclidean and Mahalanobis distance. We obtain the asymptotic distributions of the two test statistics and study asymptotic power. We also discuss approximating p-values of the proposed tests in real applications with small samples. We compare the power performance of the two tests with several popular existing distribution-free tests against various fixed alternatives using Monte Carlo simulation. We provide two illustrations based on biomedical experiments. Unlike existing tests, which are suitable only in certain situations, the proposed tests offer very good power in almost all types of shifts.

20.
Susko E 《Systematic biology》2008,57(4):602-612
Several authors have recently noted that when data are generated from a star topology, posterior probabilities can often be very large, even with arbitrarily large sequence lengths. This is counter to intuition, which suggests convergence to the limit of equal probability for each topology. Here the limiting distributions of bootstrap support and posterior probabilities are obtained for a four-taxon star tree. Theoretical results are given, providing confirmation that this counterintuitive phenomenon holds for both posterior probabilities and bootstrap support. For large samples the limiting results for posterior probabilities are the same regardless of the prior. With equal-length terminal edges, the limiting distribution is similar but not the same across different choices for the lengths of the edges. In contrast to previous results, the case of unequal lengths of terminal edges is considered. With two long edges, the posterior probability of the tree with long edges together tends to be much larger. Using the neighbor-joining algorithm, with equal edge lengths, the distribution of bootstrap support tends to be qualitatively comparable to posterior probabilities. As with posterior probabilities, when two of the edges are long, bootstrap support for the tree with long branches together tends to be large. The bias is less pronounced, however, as the distribution of bootstrap support gets close to uniform for this tree, whereas posterior probabilities are much more likely to be large. Our findings for maximum likelihood estimation are based entirely on simulation and in contrast suggest that bootstrap support tends to be fairly constant across edge-length choices.
