首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Bilder CR  Loughin TM 《Biometrics》2004,60(1):241-248
Questions that ask respondents to "choose all that apply" from a set of items occur frequently in surveys. Categorical variables that summarize this type of survey data are called both pick any/c variables and multiple-response categorical variables. It is often of interest to test for independence between two categorical variables. When both categorical variables can have multiple responses, traditional Pearson chi-square tests for independence should not be used because of the within-subject dependence among responses. An intuitively constructed version of the Pearson statistic is proposed to perform the test using bootstrap procedures to approximate its sampling distribution. First- and second-order adjustments to the proposed statistic are given in order to use a chi-square distribution approximation. A Bonferroni adjustment is proposed to perform the test when the joint set of responses for individual subjects is unavailable. Simulations show that the bootstrap procedures hold the correct size more consistently than the other procedures.  相似文献   

2.
Yin G  Cai J 《Biometrics》2005,61(1):151-161
As an alternative to the mean regression model, the quantile regression model has been studied extensively with independent failure time data. However, due to natural or artificial clustering, it is common to encounter multivariate failure time data in biomedical research where the intracluster correlation needs to be accounted for appropriately. For right-censored correlated survival data, we investigate the quantile regression model and adapt an estimating equation approach for parameter estimation under the working independence assumption, as well as a weighted version for enhancing the efficiency. We show that the parameter estimates are consistent and asymptotically follow normal distributions. The variance estimation using asymptotic approximation involves nonparametric functional density estimation. We employ the bootstrap and perturbation resampling methods for the estimation of the variance-covariance matrix. We examine the proposed method for finite sample sizes through simulation studies, and illustrate it with data from a clinical trial on otitis media.  相似文献   

3.
The bootstrap is an important tool for estimating the confidence interval of monophyletic groups within phylogenies. Although bootstrap analyses are used in most evolutionary studies, there is no clear consensus as how best to interpret bootstrap probability values. To study further the bootstrap method, nine small subunit ribosomal DNA (SSU rDNA) data sets were submitted to bootstrapped maximum parsimony (MP) analyses using unweighted and weighted sequence positions. Analyses of the lengths (i.e., parsimony steps) of the bootstrap trees show that the shape and mean of the bootstrap tree distribution may provide important insights into the evolutionary signal within the sequence data. With complex phylogenies containing nodes defined by short internal branches (multifurcations), the mean of the bootstrap tree distribution may differ by 2 standard deviations from the length of the best tree found from the original data set. Weighting sequence positions significantly increases the bootstrap values at internal nodes. There may, however, be strong bootstrap support for conflicting species groupings among different data sets. This phenomenon appears to result from a correlation between the topology of the tree used to create the weights and the topology of the bootstrap consensus tree inferred from the MP analysis of these weighted data. The analyses also show that characteristics of the bootstrap tree distribution (e.g., skewness) may be used to choose between alternative weighting schemes for phylogenetic analyses.  相似文献   

4.
Due to its speed, the distance approach remains the best hope for building phylogenies on very large sets of taxa. Recently (R. Desper and O. Gascuel, J. Comp. Biol. 9:687-705, 2002), we introduced a new "balanced" minimum evolution (BME) principle, based on a branch length estimation scheme of Y. Pauplin (J. Mol. Evol. 51:41-47, 2000). Initial simulations suggested that FASTME, our program implementing the BME principle, was more accurate than or equivalent to all other distance methods we tested, with running time significantly faster than Neighbor-Joining (NJ). This article further explores the properties of the BME principle, and it explains and illustrates its impressive topological accuracy. We prove that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates. We show that the BME principle is statistically consistent. We demonstrate that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods) that may produce trees with branches with biologically meaningless negative lengths. Finally, we consider a large simulated data set, with 5,000 100-taxon trees generated by the Aldous beta-splitting distribution encompassing a range of distributions from Yule-Harding to uniform, and using a covarion-like model of sequence evolution. FASTME produces trees faster than NJ, and much faster than WEIGHBOR and the weighted least-squares implementation of PAUP*. Moreover, FASTME trees are consistently more accurate at all settings, ranging from Yule-Harding to uniform distributions, and all ranges of maximum pairwise divergence and departure from molecular clock. Interestingly, the covarion parameter has little effect on the tree quality for any of the algorithms. FASTME is freely available on the web.  相似文献   

5.
We show empirically that the PTP test has very little discriminatory power, with highly significant PTP test probabilities often being associated with parsimony data that produce trees with low confidence (as measured by bootstrapping) and resolution. Because of this, we argue that the PTP test is useful only in the following, very limited way: if a data set fails the PTP test, it should not be used in a phylogenetic analysis. More conservative methods of measuring confidence such as the bootstrap or decay index are preferable.  相似文献   

6.
年龄-龄期两性生命表(age-stage, two-sex life table)简称两性生命表,是种群生态学研究与害虫治理中常用的重要理论与分析工具。根据两性生命表理论而设计的方便用户的软件TWOSEX-MSChart近年来被越来越多国内外学者用于昆虫种群研究的数据分析。两性生命表软件的分析功能是由许多的统计技术与计算机模拟方法作为数据分析的支撑,其中自我重复取样(bootstrap)是其重要技术之一。本文详述了bootstrap技术的基本原理、方法、优缺点及其在两性生命表分析中的应用,并介绍了其理论基础多项式定理(multinomial theorem)在生命表研究中的应用。与常用统计方法相比,bootstrap不需要数据分布假设就可以对数据总体的分布特性进行统计和推断。在两性生命表分析中,bootstrap不仅可以估算种群参数或一般统计值的方差和标准误,同时利用paired bootstrap test还可以比较不同处理间的差异,准确显示种群的变异性。利用相同的自我重复取样样本(same bootstrap samples)可以正确计算昆虫的孵化率与不同繁殖型对种群参数的贡献,并...  相似文献   

7.
In studies of morphology, methods for comparing amounts of variability are often important. Three different ways of utilizing determinants of covariance matrices for testing for surplus variability in a hypothesis sample compared to a reference sample are presented: an F-test based on standardized generalized variances, a parametric bootstrap based on draws on Wishart matrices, and a nonparametric bootstrap. The F-test based on standardized generalized variances and the Wishart-based bootstrap are applicable when multivariate normality can be assumed. These methods can be applied with only summary data available. However, the nonparametric bootstrap can be applied with multivariate nonnormally distributed data as well as multivariate normally distributed data, and small sample sizes. Therefore, this method is preferable when raw data are available. Three craniometric samples are used to present the methods. A Hungarian Zalavár sample and an Austrian Berg sample are compared to a Norwegian Oslo sample, the latter employed as reference sample. In agreement with a previous study, it is shown that the Zalavár sample does not represent surplus variability, whereas the Berg sample does represent such a surplus variability.  相似文献   

8.
The data used in studies of bivariate interspecific allometry usually violate the assumption of statistical independence. Although the traits of each species are commonly treated as independent, the expression of a trait among species within a genus may covary because of shared common ancestry. The same effect exists for genera within a family and so on up the phylogenetic hierarchy. Determining sample size by counting data points overestimates the effective sample size, which then leads to overestimating the degrees of freedom that should be used in calculating probabilities and confidence intervals. This results in an inflated Type 1 error rate. Although some workers (e.g., Felsenstein [1985] Am. Nat. 125:1–15) have suggested that this issue may invalidate interspecific allometry as a comparative method, a correction for the problem can be approximated with variance components from a nested analysis of variance. Variance components partition the total variation in the data set among the levels of the nested hierarchy. If the variance component for each nested level is weighted by the number of groups at that level, the sum of these values is an estimate of an effective sample size for the data set which reflects the effects of phylogenetic constraint. Analysis of two data sets, using taxonomy to define levels of the nested hierarchy, suggests that it has been common for published studies of interspecific allometry to severely overestimate the number of degrees of freedom. Interspecific allometry remains an important comparative method for evaluating questions concerning individual species that are not similarly addressed by the format of most of the newer comparative methods. With the correction proposed here for estimating degrees of freedom, the major statistical weakness of the procedure is substantially reduced. © 1994 Wiley-Liss, Inc.  相似文献   

9.
Whole-genome or multiple gene phylogenetic analysis is of interest since single gene analysis often results in poorly resolved trees. Here, the use of spectral techniques for analyzing multigene data sets is explored. The protein sequences are treated as categorical time series, and a measure of similarity between a pair of sequences, the spectral covariance, is based on the common periodicity between these two sequences. Unlike the other methods, the spectral covariance method focuses on the relationship between the sites of genetic sequences. By properly scaling the dissimilarity measures derived from different genes between a pair of species, we can use the mean of these scaled dissimilarity measures as a summary statistic to measure the taxonomic distances across multiple genes. The methods are applied to three different data sets, one noncontroversial and two with some dispute over the correct placement of the taxa in the tree. Trees are constructed using two distance-based methods, BIONJ and FITCH. A variation of block bootstrap sampling method is used for inference. The methods are able to recover all major clades in the corresponding reference trees with moderate to high bootstrap support. Through simulations, we show that the covariance-based methods effectively capture phylogenetic signal even when structural information is not fully retained. Comparisons of simulation results with the bootstrap permutation results indicate that the covariance-based methods are fairly robust under perturbations in sequence similarity but more sensitive to perturbations in structural similarity.  相似文献   

10.
Interior-branch and bootstrap tests of phylogenetic trees   总被引:19,自引:3,他引:16  
We have compared statistical properties of the interior-branch and bootstrap tests of phylogenetic trees when the neighbor-joining tree- building method is used. For each interior branch of a predetermined topology, the interior-branch and bootstrap tests provide the confidence values, PC and PB, respectively, that indicate the extent of statistical support of the sequence cluster generated by the branch. In phylogenetic analysis these two values are often interpreted in the same way, and if PC and PB are high (say, > or = 0.95), the sequence cluster is regarded as reliable. We have shown that PC is in fact the complement of the P-value used in the standard statistical test, but PB is not. Actually, the bootstrap test usually underestimates the extent of statistical support of species clusters. The relationship between the confidence values obtained by the two tests varies with both the topology and expected branch lengths of the true (model) tree. The most conspicuous difference between PC and PB is observed when the true tree is starlike, and there is a tendency for the difference to increase as the number of sequences in the tree increases. The reason for this is that the bootstrap test tends to become progressively more conservative as the number of sequences in the tree increases. Unlike the bootstrap, the interior-branch test has the same statistical properties irrespective of the number of sequences used when a predetermined tree is considered. Therefore, the interior-branch test appears to be preferable to the bootstrap test as long as unbiased estimators of evolutionary distances are used. However, when the interior-branch is applied to a tree estimated from a given data set, PC may give an overestimate of statistical confidence. For this case, we developed a method for computing a modified version (P'C) of the PC value and showed that this P'C tends to give a conservative estimate of statistical confidence, though it is not as conservative as PB. In this paper we have introduced a model in which evolutionary distances between sequences follow a multivariate normal distribution. This model allowed us to study the relationships between the two tests analytically.   相似文献   

11.
Susko E 《Systematic biology》2011,60(5):668-675
Generalized least squares (GLS) methods provide a relatively fast means of constructing a confidence set of topologies. Because they utilize information about the covariances between distances, it is reasonable to expect additional efficiency in estimation and confidence set construction relative to other least squares (LS) methods. Difficulties have been found to arise in a number of practical settings due to estimates of covariance matrices being ill conditioned or even noninvertible. We present here new ways of estimating the covariance matrices for distances that are much more likely to be positive definite, as the actual covariance matrices are. A thorough investigation of performance is also conducted. An alternative to GLS that has been proposed for constructing confidence sets of topologies is weighted least squares (WLS). As currently implemented, this approach is equivalent to the use of GLS but with covariances set to zero rather than being estimated. In effect, this approach assumes normality of the estimated distances and zero covariances. As the results here illustrate, this assumption leads to poor performance. A 95% confidence set is almost certain to contain the true topology but will contain many more topologies than are needed. On the other hand, the results here also indicate that, among LS methods, WLS performs quite well at estimating the correct topology. It turns out to be possible to improve the performance of WLS for confidence set construction through a relatively inexpensive normal parametric bootstrap that utilizes the same variances and covariances of GLS. The resulting procedure is shown to perform at least as well as GLS and thus provides a reasonable alternative in cases where covariance matrices are ill conditioned.  相似文献   

12.
Two methods are commonly employed for evaluating the extent of the uncertainty of evolutionary distances between sequences: either some estimator of the variance of the distance estimator, or the bootstrap method. However, both approaches can be misleading, particularly when the evolutionary distance is small. We propose using another statistical method which does not have the same defect: interval estimation. We show how confidence intervals may be constructed for the Jukes and Cantor (1969) and Kimura two-parameter (1980) estimators. We compare the exact confidence intervals thus obtained with the approximate intervals derived by the two previous methods, using artificial and biological data. The results show that the usual methods clearly underestimate the variability when the substitution rate is low and when sequences are short. Moreover, our analysis suggests that similar results may be expected for other evolutionary distance estimators.   相似文献   

13.
An approximately unbiased (AU) test that uses a newly devised multiscale bootstrap technique was developed for general hypothesis testing of regions in an attempt to reduce test bias. It was applied to maximum-likelihood tree selection for obtaining the confidence set of trees. The AU test is based on the theory of Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434; 1996), but the new method provides higher-order accuracy yet simpler implementation. The AU test, like the Shimodaira-Hasegawa (SH) test, adjusts the selection bias overlooked in the standard use of the bootstrap probability and Kishino-Hasegawa tests. The selection bias comes from comparing many trees at the same time and often leads to overconfidence in the wrong trees. The SH test, though safe to use, may exhibit another type of bias such that it appears conservative. Here I show that the AU test is less biased than other methods in typical cases of tree selection. These points are illustrated in a simulation study as well as in the analysis of mammalian mitochondrial protein sequences. The theoretical argument provides a simple formula that covers the bootstrap probability test, the Kishino-Hasegawa test, the AU test, and the Zharkikh-Li test. A practical suggestion is provided as to which test should be used under particular circumstances.  相似文献   

14.
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.  相似文献   

15.
Tests for equal relative variation are valuable and frequently used tools for evaluating hypotheses about taxonomic heterogeneity in fossil hominids. In this study, Monte Carlo methods and simulated data are used to evaluate and compare 11 tests for equal relative variation. The tests evaluated include CV-based parametric bootstrap tests, modifications of Levene's test, and modified weighted scores tests. The results of these simulations show that a modified version of the weighted scores test developed by Fligner and Killeen ([1976] J. Am. Stat. Assoc. 71:210-213) is the only test that maintains an acceptable balance of type I and type II errors, even under conditions where all other tests have extraordinarily high type I error rates or little power.  相似文献   

16.
When primary endpoints of randomized trials are continuous variables, the analysis of covariance (ANCOVA) with pre-treatment measurements as a covariate is often used to compare two treatment groups. In the ANCOVA, equal slopes (coefficients of pre-treatment measurements) and equal residual variances are commonly assumed. However, random allocation guarantees only equal variances of pre-treatment measurements. Unequal covariances and variances of post-treatment measurements indicate unequal slopes and, usually, unequal residual variances. For non-normal data with unequal covariances and variances of post-treatment measurements, it is known that the ANCOVA with equal slopes and equal variances using an ordinary least-squares method provides an asymptotically normal estimator for the treatment effect. However, the asymptotic variance of the estimator differs from the variance estimated from a standard formula, and its property is unclear. Furthermore, the asymptotic properties of the ANCOVA with equal slopes and unequal variances using a generalized least-squares method are unclear. In this paper, we consider non-normal data with unequal covariances and variances of post-treatment measurements, and examine the asymptotic properties of the ANCOVA with equal slopes using the variance estimated from a standard formula. Analytically, we show that the actual type I error rate, thus the coverage, of the ANCOVA with equal variances is asymptotically at a nominal level under equal sample sizes. That of the ANCOVA with unequal variances using a generalized least-squares method is asymptotically at a nominal level, even under unequal sample sizes. In conclusion, the ANCOVA with equal slopes can be asymptotically justified under random allocation.  相似文献   

17.
When ecological traits differ seasonally among biological groups, environmental adaptation is expected. However, current analytical methods for such seasonal data ignore the circular nature of data and therefore are likely to be flawed. We propose a simple bootstrap hypothesis test which statistically quantifies the presence/absence of differences in peak months for multiple groups taking the circular nature of the data into account. The test is based on a robust distribution-free method. Simulations showed that the test gives a satisfactory performance. The test is illustrated using data for anchovy and sardine egg abundances in the western North Pacific.  相似文献   

18.
Typically, in many studies in ecology, epidemiology, biomedicine and others, we are confronted with panels of short time-series of which we are interested in obtaining a biologically meaningful grouping. Here, we propose a bootstrap approach to test whether the regression functions or the variances of the error terms in a family of stochastic regression models are the same. Our general setting includes panels of time-series models as a special case. We rigorously justify the use of the test by investigating its asymptotic properties, both theoretically and through simulations. The latter confirm that for finite sample size, bootstrap provides a better approximation than classical asymptotic theory. We then apply the proposed tests to the mink-muskrat data across 81 trapping regions in Canada. Ecologically interpretable groupings are obtained, which serve as a necessary first step before a fuller biological and statistical analysis of the food chain interaction.  相似文献   

19.
Analyzing gene expression data in terms of gene sets: methodological issues   总被引:3,自引:0,他引:3  
MOTIVATION: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferential methodology of gene set testing. RESULTS: We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.  相似文献   

20.
Pan W  Chappell R 《Biometrics》2002,58(1):64-70
We show that the nonparametric maximum likelihood estimate (NPMLE) of the regression coefficient from the joint likelihood (of the regression coefficient and the baseline survival) works well for the Cox proportional hazards model with left-truncated and interval-censored data, but the NPMLE may underestimate the baseline survival. Two alternatives are also considered: first, the marginal likelihood approach by extending Satten (1996, Biometrika 83, 355-370) to truncated data, where the baseline distribution is eliminated as a nuisance parameter; and second, the monotone maximum likelihood estimate that maximizes the joint likelihood by assuming that the baseline distribution has a nondecreasing hazard function, which was originally proposed to overcome the underestimation of the survival from the NPMLE for left-truncated data without covariates (Tsai, 1988, Biometrika 75, 319-324). The bootstrap is proposed to draw inference. Simulations were conducted to assess their performance. The methods are applied to the Massachusetts Health Care Panel Study data set to compare the probabilities of losing functional independence for male and female seniors.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号