Similar articles
Found 20 similar articles (search time: 15 ms)
1.
A variety of analytical methods is available for branch testing in distance-based phylogenies. However, these methods are rarely used, possibly because the estimation of some of their statistics, especially the covariances, is not always feasible. We show that these difficulties can be overcome if some simplifying assumptions are made, namely distance independence. The weighted least-squares likelihood ratio test (WLS-LRT) we propose is easy to perform, using only the distances and some of their associated variances. If no variances are known, the use of the Felsenstein F-test, also based on weighted least squares, is discussed. Using simulated data and a data set of 43 mammalian mitochondrial sequences, we demonstrate that the WLS-LRT performs as well as the generalized least-squares test, and indeed better for data sets with a large number of taxa. We thus show that the assumption of independence does not negatively affect the reliability or the accuracy of the least-squares approach. The results of the WLS-LRT are no worse than the results of the bootstrap methods, such as the Felsenstein bootstrap selection probability test and the Dopazo test. We also show that WLS-LRT can be applied in instances where other analytical methods are inappropriate. This point is illustrated by analyzing the relationships between human immunodeficiency virus type 1 (HIV-1) sequences isolated from various organs of different individuals.

2.

Background  

The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited.
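As a rough illustration of the simplest of the four methods, the sketch below computes a one-sided Fisher's exact (hypergeometric) p-value for the overlap between a gene set and a list of differentially expressed genes. The function name and argument names are hypothetical, and how the abstract's study thresholds genes as "differentially expressed" is not stated, so that step is assumed to have been done already.

```python
from math import comb


def fisher_enrichment_pvalue(n_universe, n_in_set, n_de, n_overlap):
    """One-sided Fisher's exact p-value for gene-set enrichment.

    n_universe: number of genes measured (assumed background)
    n_in_set:   number of those genes that belong to the gene set
    n_de:       number of genes called differentially expressed
    n_overlap:  number of differentially expressed genes inside the set

    Returns P(overlap >= n_overlap) under the hypergeometric null.
    """
    max_overlap = min(n_in_set, n_de)
    total = comb(n_universe, n_de)
    # Sum the hypergeometric probabilities of the observed overlap or larger.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

Because the test only sees the 2x2 table of counts, it discards the magnitude of each gene's expression change, which may be one reason it can fare worse than rank-based or predictive approaches.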

3.
Diversity indices might be used to assess the impact of treatments on the relative abundance patterns in species communities. When several treatments are to be compared, simultaneous confidence intervals for the differences of diversity indices between treatments may be used. The simultaneous confidence interval methods described until now are either constructed or validated under the assumption of the multinomial distribution for the abundance counts. Motivated by four example data sets with background in agricultural and marine ecology, we focus on the situation when available replications show that the count data exhibit extra-multinomial variability. Based on simulated overdispersed count data, we compare previously proposed methods assuming multinomial distribution, a method assuming normal distribution for the replicated observations of the diversity indices and three different bootstrap methods to construct simultaneous confidence intervals for multiple differences of Simpson and Shannon diversity indices. The focus of the simulation study is on comparisons to a control group. The severe failure of asymptotic multinomial methods in overdispersed settings is illustrated. Among the bootstrap methods, the widely known Westfall–Young method performs best for the Simpson index, while for the Shannon index, two methods based on stratified bootstrap and summed count data are preferable. The application of the methods is illustrated with an example.
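The following minimal sketch shows the two indices and one way a stratified bootstrap on summed counts could look for a single treatment-versus-control comparison. It is an assumption-laden illustration, not the authors' procedure: it resamples whole replicates within each group, uses the Simpson index in its 1 - sum(p^2) form, and returns a plain percentile interval with no adjustment for multiple comparisons. All function names are hypothetical.

```python
import math
import random


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the form 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pooled(replicates):
    """Sum species counts over replicate samples (one list of counts per replicate)."""
    return [sum(col) for col in zip(*replicates)]


def bootstrap_diff_ci(control, treatment, index, n_boot=2000, alpha=0.05, seed=1):
    """Percentile interval for index(treatment) - index(control).

    Replicates are resampled with replacement within each group (stratified),
    then summed before the index is computed. This is a single-comparison
    sketch; it is NOT a simultaneous interval.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        c = [rng.choice(control) for _ in control]
        t = [rng.choice(treatment) for _ in treatment]
        diffs.append(index(pooled(t)) - index(pooled(c)))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Resampling replicates rather than individual organisms is what lets the bootstrap reflect between-replicate (extra-multinomial) variability.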

4.
Whole-genome or multiple gene phylogenetic analysis is of interest since single gene analysis often results in poorly resolved trees. Here, the use of spectral techniques for analyzing multigene data sets is explored. The protein sequences are treated as categorical time series, and a measure of similarity between a pair of sequences, the spectral covariance, is based on the common periodicity between these two sequences. Unlike the other methods, the spectral covariance method focuses on the relationship between the sites of genetic sequences. By properly scaling the dissimilarity measures derived from different genes between a pair of species, we can use the mean of these scaled dissimilarity measures as a summary statistic to measure the taxonomic distances across multiple genes. The methods are applied to three different data sets, one noncontroversial and two with some dispute over the correct placement of the taxa in the tree. Trees are constructed using two distance-based methods, BIONJ and FITCH. A variation of the block bootstrap sampling method is used for inference. The methods are able to recover all major clades in the corresponding reference trees with moderate to high bootstrap support. Through simulations, we show that the covariance-based methods effectively capture phylogenetic signal even when structural information is not fully retained. Comparisons of simulation results with the bootstrap permutation results indicate that the covariance-based methods are fairly robust under perturbations in sequence similarity but more sensitive to perturbations in structural similarity.

5.
Zhou XH, Tu W. Biometrics, 2000, 56(4): 1118-1125
In this paper, we consider the problem of interval estimation for the mean of diagnostic test charges. Diagnostic test charge data may contain zero values, and the nonzero values can often be modeled by a log-normal distribution. Under such a model, we propose three different interval estimation procedures: a percentile-t bootstrap interval based on sufficient statistics and two likelihood-based confidence intervals. For theoretical properties, we show that the two likelihood-based one-sided confidence intervals are only first-order accurate and that the bootstrap-based one-sided confidence interval is second-order accurate. For two-sided confidence intervals, all three proposed methods are second-order accurate. A simulation study in finite-sample sizes suggests all three proposed intervals outperform a widely used minimum variance unbiased estimator (MVUE)-based interval except for the case of one-sided lower end-point intervals when the skewness is very small. Among the proposed one-sided intervals, the bootstrap interval has the best coverage accuracy. For the two-sided intervals, when the sample size is small, the bootstrap method still yields the best coverage accuracy unless the skewness is very small, in which case the bias-corrected ML method has the best accuracy. When the sample size is large, all three proposed intervals have similar coverage accuracy. Finally, we analyze with the proposed methods one real example assessing diagnostic test charges among older adults with depression.
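To make the model concrete: if a fraction δ of charges is zero and the nonzero charges are log-normal with log-scale mean μ and variance σ², the overall mean is (1 - δ)·exp(μ + σ²/2). The sketch below computes only the plug-in maximum-likelihood point estimate of that quantity; it does not implement any of the three interval procedures in the abstract, and the function name is hypothetical.

```python
import math


def delta_lognormal_mean(charges):
    """Plug-in ML estimate of the mean when data mix zeros and log-normal values.

    Estimates (1 - delta) * exp(mu + sigma^2 / 2), where delta is the
    proportion of zeros and mu, sigma^2 are the ML estimates of the mean
    and variance of the log of the nonzero values.
    """
    logs = [math.log(x) for x in charges if x > 0]
    n, n_pos = len(charges), len(logs)
    if n_pos == 0:
        return 0.0  # all observations are zero
    mu = sum(logs) / n_pos
    sigma2 = sum((y - mu) ** 2 for y in logs) / n_pos  # ML (divide by n) variance
    return (n_pos / n) * math.exp(mu + sigma2 / 2.0)
```

An interval around this estimate would then come from a bootstrap or likelihood-based procedure such as those the abstract compares.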

6.
Bayesian inference is becoming a common statistical approach to phylogenetic estimation because, among other reasons, it allows for rapid analysis of large data sets with complex evolutionary models. Conveniently, Bayesian phylogenetic methods use currently available stochastic models of sequence evolution. However, as with other model-based approaches, the results of Bayesian inference are conditional on the assumed model of evolution: inadequate models (models that poorly fit the data) may result in erroneous inferences. In this article, I present a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions. By evaluating a model's posterior predictive performance, an adequate model can be selected for a Bayesian phylogenetic study. Although I present a single test statistic that assesses the overall (global) performance of a phylogenetic model, a variety of test statistics can be tailored to evaluate specific features (local performance) of evolutionary models to identify sources of failure. The method presented here, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.

7.
Efficient measurement error correction with spatially misaligned data (total citations: 1; self-citations: 0; citations by others: 1)
Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem for the exposure model parameters in each bootstrap sample. We propose a less computationally intensive alternative termed the "parameter bootstrap" that only requires solving one nonlinear optimization problem, and we also compare bootstrap methods to other recently proposed methods. We illustrate our methodology in simulations and with publicly available data from the Environmental Protection Agency.

8.
Bilder CR, Loughin TM. Biometrics, 2004, 60(1): 241-248
Questions that ask respondents to "choose all that apply" from a set of items occur frequently in surveys. Categorical variables that summarize this type of survey data are called both pick any/c variables and multiple-response categorical variables. It is often of interest to test for independence between two categorical variables. When both categorical variables can have multiple responses, traditional Pearson chi-square tests for independence should not be used because of the within-subject dependence among responses. An intuitively constructed version of the Pearson statistic is proposed to perform the test using bootstrap procedures to approximate its sampling distribution. First- and second-order adjustments to the proposed statistic are given in order to use a chi-square distribution approximation. A Bonferroni adjustment is proposed to perform the test when the joint set of responses for individual subjects is unavailable. Simulations show that the bootstrap procedures hold the correct size more consistently than the other procedures.

9.
Genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low-frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze collective frequency differences between cases and controls have shifted the variant-by-variant analysis paradigm used in GWAS of common variants toward collective tests of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole-genome low-coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², the collapsing method, the combined multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

10.
Abstract. A method is described to determine the number of significant dimensions in metric ordination of a sample. The method is probabilistic, based on bootstrap resampling. An iterative algorithm takes bootstrap samples with replacement from the sample. It finds in each bootstrap sample ordination coordinates and computes, after Procrustean adjustments, the correlation between observed and bootstrap ordination scores. It compares this correlation to the same parameter generated in a parallel bootstrapped ordination of randomly permuted data, which upon many iterations will generate a probability. The method is assessed in principal coordinates analysis of simulated data sets that have varying numbers of variables and correlation levels, and uniform or patterned correlation structure. The results suggest the method is more reliable than other available methods in recovering the true intrinsic dimensionality. Examples with grassland data illustrate utility.

11.
Traditional methods for calculating the power of a statistical test for location shift require knowledge of the shape of the underlying probability distribution. The distribution shape, however, may be unknown. This paper describes a bootstrap method for using observed data (or pilot data) to approximate the power. No assumptions need be made about the shape of the underlying continuous probability distribution. Simulation evidence shows that, when applied to the Wilcoxon two-sample test for location shift, the suggested method is reliable. The evidence also shows that it is more accurate than a benchmark traditional approach. The bootstrap method is applied to a real-data example. The analysis demonstrates how the method can be used to determine sample sizes and how to choose the more powerful of two alternative tests for location shift.
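One plausible reading of such a bootstrap power calculation is sketched below: both samples are drawn with replacement from the pilot data, a hypothesized shift is added to one of them, and the proportion of resamples in which the Wilcoxon two-sample test rejects is taken as the estimated power. The details are assumptions rather than the paper's exact algorithm; in particular, the test uses the normal approximation without a tie correction, and the function names and the default critical value (two-sided 5% level) are illustrative.

```python
import math
import random


def wilcoxon_rank_sum_z(x, y):
    """z statistic for the Wilcoxon two-sample test in its Mann-Whitney U form.

    Uses the large-sample normal approximation with no tie correction.
    """
    m, n = len(x), len(y)
    # U counts pairs with y > x; ties contribute one half.
    u = sum((yj > xi) + 0.5 * (yj == xi) for xi in x for yj in y)
    mean_u = m * n / 2.0
    sd_u = math.sqrt(m * n * (m + n + 1) / 12.0)
    return (u - mean_u) / sd_u


def bootstrap_power(pilot, shift, m, n, n_boot=1000, z_crit=1.959964, seed=1):
    """Approximate the power to detect a location shift using pilot data.

    pilot: observed values standing in for the unknown distribution
    shift: hypothesized location shift added to the second sample
    m, n:  planned sample sizes of the two groups
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_boot):
        x = [rng.choice(pilot) for _ in range(m)]
        y = [rng.choice(pilot) + shift for _ in range(n)]
        if abs(wilcoxon_rank_sum_z(x, y)) > z_crit:
            rejections += 1
    return rejections / n_boot
```

Running `bootstrap_power` over a grid of `m` and `n` values gives a rough way to pick the smallest sample sizes that reach a target power, and swapping in a different test statistic allows two tests to be compared on the same pilot data.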

12.
This study presents two efficient algorithms – combinatorial and probabilistic combinatorial methods (CM and PCM) – for estimating the number of precise patterns of discharges that occur by chance in records of multiple single-unit spike trains. The confidence limits estimated by these methods are in good agreement with different sets of simulated test data as well as with the ad-hoc method. Both combinatorial methods provided better accuracy than the bootstrap algorithm, and in most cases of nonstationary data PCM provided better estimations than the ad-hoc method. Introduction of a jitter for searching patterns with a precision of a few milliseconds and burst filtering may introduce biases in the estimations. Comparison of a new filtering procedure based upon a filtering frequency with previously described schemes of filtering indicates the possibility of using a simple setting which remains accurate over a wide range of parameters. We aim to implement a combination of PCM for estimations of the number of patterns formed by three to seven spikes and CM for higher-order complexities for estimations during experiments in progress. Received: 12 June 1995 / Accepted in revised form: 5 February 1997

13.
Indirect gradient analysis, or ordination, is primarily a method of exploratory data analysis. However, to support biological interpretations of resulting axes as vegetation gradients, or later confirmatory analyses and statistical tests, these axes need to be stable or at least robust to minor sampling effects. We develop a computer-intensive bootstrap (resampling) approach to estimate sampling effects on solutions from nonlinear ordination. We apply this approach to simulated data and to three forest data sets from North Carolina, USA and examine the resulting patterns of local and global instability in detrended correspondence analysis (DCA) solutions. We propose a bootstrap coefficient, scaled rank variance (SRV), to estimate remaining instability in species ranks after rotating axes to a common global orientation. In analysis of simulated data, bootstrap SRV was generally consistent with an equivalent estimate from repeated sampling. In an example using field data SRV, bootstrapped DCA showed good recovery of the order of common species along the first two axes, but poor recovery of later axes. We also suggest some criteria to use with the SRV to decide how many axes to retain and attempt to interpret. Abbreviations: DCA = detrended correspondence analysis; SRV = scaled rank variance

14.
Bennewitz J, Reinsch N, Kalm E. Genetics, 2002, 160(4): 1673-1686
The nonparametric bootstrap approach is known to be suitable for calculating central confidence intervals for the locations of quantitative trait loci (QTL). However, the distribution of the bootstrap QTL position estimates along the chromosome is peaked at the positions of the markers and is not tailed equally. This results in conservativeness and large width of the confidence intervals. In this study three modified methods are proposed to calculate nonparametric bootstrap confidence intervals for QTL locations, which compute noncentral confidence intervals (uncorrected method I), correct for the impact of the markers (weighted method I), or both (weighted method II). Noncentral confidence intervals were computed with an analog of the highest posterior density method. The correction for the markers is based on the distribution of QTL estimates along the chromosome when the QTL is not linked with any marker, and it can be obtained with a permutation approach. In a simulation study the three methods were compared with the original bootstrap method. The results showed that it is useful, first, to compute noncentral confidence intervals and, second, to correct the bootstrap distribution of the QTL estimates for the impact of the markers. The weighted method II, combining these two properties, produced the shortest and least biased confidence intervals in a large number of simulated configurations.
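The difference between central and noncentral intervals can be illustrated with a small sketch. The central interval trims equal tail fractions from the sorted bootstrap estimates, while the noncentral version here takes the shortest window that still covers the required fraction, which is one simple analog of a highest-density interval. This is an assumption about the general idea only: it omits the marker-weighting correction described in the abstract, and the function names are hypothetical.

```python
import math


def central_interval(estimates, level=0.95):
    """Equal-tailed percentile interval from bootstrap estimates."""
    s = sorted(estimates)
    n = len(s)
    k = int(round((1 - level) / 2 * n))  # points trimmed from each tail
    return s[k], s[n - 1 - k]


def shortest_interval(estimates, level=0.95):
    """Shortest interval covering at least `level` of the bootstrap estimates."""
    s = sorted(estimates)
    n = len(s)
    # Number of sorted points the interval must span (guarded against float noise).
    width = max(1, math.ceil(level * n - 1e-9))
    # Slide a window of that many points and keep the narrowest one.
    best = min(range(n - width + 1), key=lambda i: s[i + width - 1] - s[i])
    return s[best], s[best + width - 1]
```

When the bootstrap distribution is skewed or piled up at particular positions, the shortest window can be noticeably narrower than the equal-tailed interval at the same nominal coverage.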

15.
The bootstrap error estimation method is investigated in comparison with the known π-method and with a combined error estimation suggested by us, using simulated, normally distributed “populations” with 15 and 30 characters, respectively. For small sample sizes (below two to three times the number of characters per class) the estimates resulting from the bootstrap method are on average too small and can no longer be accepted. Significantly better results (at an essentially lower computational cost) are obtained for the π-method and the combined estimation. The variability is essentially the same for all three methods. This applies both in the case of rather badly separated and in the case of very well separated populations. A bootstrap estimation modified by us also gives unsatisfactory results.

16.
MOTIVATION: An important goal in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Various parametric tests, such as the two-sample t-test, have been used, but their possibly too strong parametric assumptions or large sample justifications may not hold in practice. As alternatives, a class of three nonparametric statistical methods, including the empirical Bayes method of Efron et al. (2001), the significance analysis of microarray (SAM) method of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2001), have been proposed. All the three methods depend on constructing a test statistic and a so-called null statistic such that the null statistic's distribution can be used to approximate the null distribution of the test statistic. However, relatively little effort has been directed toward assessment of the performance or the underlying assumptions of the methods in constructing such test and null statistics. RESULTS: We point out a problem of a current method to construct the test and null statistics, which may lead to largely inflated Type I errors (i.e. false positives). We also propose two modifications that overcome the problem. In the context of MMM, the improved performance of the modified methods is demonstrated using simulated data. In addition, our numerical results also provide evidence to support the utility and effectiveness of MMM.

17.
Summary: Methods for performing multiple tests of paired proportions are described. A broadly applicable method using McNemar's exact test and the exact distributions of all test statistics is developed; the method controls the familywise error rate in the strong sense under minimal assumptions. A closed form (not simulation-based) algorithm for carrying out the method is provided. A bootstrap alternative is developed to account for correlation structures. Operating characteristics of these and other methods are evaluated via a simulation study. Applications to multiple comparisons of predictive models for disease classification and to postmarket surveillance of adverse events are given.
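For readers unfamiliar with the building block, the sketch below computes the two-sided exact McNemar p-value for a single pair of proportions from the two discordant counts. It covers one test only; the multiplicity adjustment that controls the familywise error rate in the abstract's method is not reproduced here, and the function name is hypothetical.

```python
from math import comb


def mcnemar_exact_pvalue(b, c):
    """Two-sided exact McNemar p-value.

    b, c: numbers of discordant pairs in each direction. Under the null
    hypothesis of equal paired proportions, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence either way
    k = min(b, c)
    # Probability of a count at least as extreme in the smaller tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Because the p-value can take only a finite set of values determined by b + c, the test is discrete, which is why working with the exact distributions of the statistics can matter when several such tests are combined.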

18.
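The age-stage, two-sex life table (two-sex life table for short) is an important theoretical and analytical tool commonly used in population ecology research and pest management. TWOSEX-MSChart, a user-friendly program designed according to two-sex life table theory, has in recent years been used by a growing number of researchers in China and abroad for data analysis in insect population studies. The analytical functions of the two-sex life table software are supported by many statistical techniques and computer simulation methods, of which the bootstrap is one of the most important. This paper describes in detail the basic principles, methods, advantages, and disadvantages of the bootstrap technique and its application in two-sex life table analysis, and introduces the application of its theoretical basis, the multinomial theorem, in life table research. Compared with commonly used statistical methods, the bootstrap can estimate and make inferences about the distributional characteristics of a population without assumptions about the data distribution. In two-sex life table analysis, the bootstrap can not only estimate the variances and standard errors of population parameters or general statistics; the paired bootstrap test can also be used to compare differences among treatments and accurately show the variability of the population. Using the same bootstrap samples, one can correctly calculate the hatch rate of insects and the contributions of different reproductive types to population parameters, and...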

19.
In this paper, we focus discussion on testing the homogeneity of risk difference for sparse data, in which we have few patients in each stratum, but a moderate or large number of strata. When the number of patients per treatment within strata is small (2 to 5 patients), none of the test procedures proposed previously for testing the homogeneity of risk difference for sparse data can really perform well. On the basis of bootstrap methods, we develop a simple test procedure that can improve the power of the previous test procedures. Using Monte Carlo simulations, we demonstrate that the test procedure developed here can perform reasonably well with respect to Type I error even when the number of patients per stratum for each treatment is as small as two patients. We evaluate and study the power of the proposed test procedure in a variety of situations. We also include a comparison of the performance between the test statistics proposed elsewhere and the test procedure developed here. Finally, we briefly discuss the limitation of using the proposed test procedure. We use the data comparing two chemotherapy treatments in patients with multiple myeloma to illustrate the use of the proposed test procedure.

20.
We use the Genetic Analysis Workshop 14 simulated data to explore the effectiveness of a two-stage strategy for mapping complex disease loci consisting of an initial genome scan with confidence interval construction for gene location, followed by fine mapping with family-based tests of association on a dense set of single-nucleotide polymorphisms. We considered four types of intervals: the 1-LOD interval, a basic percentile bootstrap confidence interval based on the position of the maximum Zlr score, and asymptotic and bootstrap confidence intervals based on a generalized estimating equations method. For fine mapping we considered two family-based tests of association: a test based on a likelihood ratio statistic and a transmission-disequilibrium-type test implemented in the software FBAT. In two of the simulation replicates, we found that the bootstrap confidence intervals based on the peak Zlr and the 1-LOD support interval always contained the true disease loci and that the likelihood ratio test provided further strong confirmatory evidence of the presence of disease loci in these regions.
