Similar articles (20 results)
1.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the data significance. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which pooled design becomes preferable to non-pooled design can then be derived given the unit cost associated with a microarray and that with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved.
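A minimal sketch of the trade-off being analyzed, using only a normal approximation rather than the paper's exact formulas: each array measures a pool of r subjects, so the per-array variance is sigma_b²/r + sigma_t² (biological plus technical variance), and power for a per-gene two-class comparison follows from the standard error of the group-mean difference. All numbers (delta, variances, array counts) are illustrative assumptions.

```python
# Hedged sketch (normal approximation, not the paper's exact formulas): power of a
# two-class microarray comparison when each array hybridizes a pool of r subjects.
# sigma_b2 = biological variance, sigma_t2 = technical variance, delta = true
# difference in mean log-expression; all values are illustrative assumptions.
from scipy.stats import norm

def pooled_design_power(delta, sigma_b2, sigma_t2, n_arrays, r, alpha=0.001):
    """Approximate per-gene power for a pooled two-class design."""
    var_per_array = sigma_b2 / r + sigma_t2        # pooling averages biological noise
    se = (2 * var_per_array / n_arrays) ** 0.5     # SE of the group-mean difference
    return norm.cdf(delta / se - norm.ppf(1 - alpha / 2))

# Compare a non-pooled design (r = 1) with pooling 3 subjects per array.
for r in (1, 3):
    print(r, round(pooled_design_power(delta=1.0, sigma_b2=0.5,
                                       sigma_t2=0.1, n_arrays=6, r=r), 3))
```

Plugging per-array and per-subject unit costs into such a power function is one way to compare the cost-effectiveness of pooled versus non-pooled designs, which is the question the exact formulas in the paper address.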

2.
In the Georgia Centenarian Study (Poon et al., Exceptional Longevity, 2006), centenarian cases and young controls are classified according to three categories (age, ethnic origin, and single nucleotide polymorphisms [SNPs] of candidate longevity genes), where each factor has two possible levels. Here we provide methodologies to determine the minimum sample size needed to detect dependence in 2 × 2 × 2 tables based on Fisher's exact test evaluated exactly or by Markov chain Monte Carlo (MCMC), assuming only the case total L and the control total N are known. While our MCMC method uses serial computing, parallel computing techniques are employed to solve the exact sample size problem. These tools will allow researchers to design efficient sampling strategies and to select informative SNPs. We apply our tools to 2 × 2 × 2 tables obtained from a pilot study of the Georgia Centenarian Study, and the sample size results provided important information for the subsequent major study. A comparison between the results of an exact method and those of an MCMC method showed that the MCMC method studied needed much less computation time on average (10.16 times faster on average for situations examined, with S.E. = 2.60), but its sample size results were only valid as a rule for larger sample sizes (in the hundreds).

3.
Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error and power precisely.
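To illustrate the resampling idea, the sketch below implements a generic Westfall–Young-style step-down maxT adjustment by permuting class labels, which preserves the dependency structure among genes while controlling the family-wise error rate. The permutation scheme, t-statistic, and simulated data are assumptions for illustration, not the authors' exact procedure.

```python
# Hedged sketch of a step-down maxT permutation adjustment (Westfall-Young style).
# X: genes x samples expression matrix, y: binary class labels (simulated here).
import numpy as np

def tstats(X, y):
    a, b = X[:, y == 0], X[:, y == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def stepdown_maxT(X, y, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = np.abs(tstats(X, y))
    order = np.argsort(-t_obs)                          # most significant gene first
    exceed = np.zeros(X.shape[0])
    for _ in range(n_perm):
        t_perm = np.abs(tstats(X, rng.permutation(y)))
        # successive (suffix) maxima over the ordered hypotheses = step-down maxT
        succ_max = np.maximum.accumulate(t_perm[order][::-1])[::-1]
        exceed[order] += succ_max >= t_obs[order]
    p_adj = exceed / n_perm
    p_adj[order] = np.maximum.accumulate(p_adj[order])  # enforce monotonicity
    return p_adj

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 12)); y = np.repeat([0, 1], 6)
X[:5, y == 1] += 2.0                                    # five truly changed genes
print(np.round(stepdown_maxT(X, y)[:8], 3))
```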

4.
Information on statistical power is critical when planning investigations and evaluating empirical data, but actual power estimates are rarely presented in population genetic studies. We used computer simulations to assess and evaluate power when testing for genetic differentiation at multiple loci through combining test statistics or P values obtained by four different statistical approaches, viz. Pearson's chi-square, the log-likelihood ratio G-test, Fisher's exact test, and an FST-based permutation test. Factors considered in the comparisons include the number of samples, their size, and the number and type of genetic marker loci. It is shown that power for detecting divergence may be substantial for frequently used sample sizes and sets of markers, also at quite low levels of differentiation. The choice of statistical method may be critical, though. For multi-allelic loci such as microsatellites, combining exact P values using Fisher's method is robust and generally provides a high resolving power. In contrast, for few-allele loci (e.g. allozymes and single nucleotide polymorphisms) and when making pairwise sample comparisons, this approach may yield a remarkably low power. In such situations chi-square typically represents a better alternative. The G-test without Williams's correction frequently tends to provide an unduly high proportion of false significances, and results from this test should be interpreted with great care. Our results are not confined to population genetic analyses but applicable to contingency testing in general.
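The two multi-locus strategies compared above can be sketched directly: combine per-locus exact-test P values with Fisher's method (-2·Σ log p follows a chi-square distribution with 2k degrees of freedom under the null), or sum per-locus chi-square statistics and degrees of freedom. The 2 × 2 allele-count tables below are made up for illustration.

```python
# Hedged sketch: combining evidence over loci with Fisher's method versus a pooled
# chi-square. The per-locus allele-count tables (population A vs B) are illustrative.
import numpy as np
from scipy import stats

tables = [np.array([[42, 58], [55, 45]]),   # locus 1
          np.array([[12, 88], [20, 80]]),   # locus 2
          np.array([[70, 30], [61, 39]])]   # locus 3

exact_p = [stats.fisher_exact(t)[1] for t in tables]
stat, p_fisher = stats.combine_pvalues(exact_p, method='fisher')

chi2_stats = [stats.chi2_contingency(t, correction=False)[0] for t in tables]
df_total = sum((t.shape[0] - 1) * (t.shape[1] - 1) for t in tables)
p_pooled_chi2 = stats.chi2.sf(sum(chi2_stats), df_total)

print('Fisher combination P =', round(p_fisher, 4))
print('summed chi-square  P =', round(p_pooled_chi2, 4))
```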

5.
In the field of pharmaceutical drug development, there have been extensive discussions on the establishment of statistically significant results that demonstrate the efficacy of a new treatment with multiple co-primary endpoints. When designing a clinical trial with such multiple co-primary endpoints, it is critical to determine the appropriate sample size for indicating the statistical significance of all the co-primary endpoints while preserving the desired overall power, because the type II error rate increases with the number of co-primary endpoints. We consider overall power functions and sample size determinations with multiple co-primary endpoints that consist of mixed continuous and binary variables, and provide numerical examples to illustrate the behavior of the overall power functions and sample sizes. In formulating the problem, we assume that response variables follow a multivariate normal distribution, where binary variables are observed as a dichotomized normal distribution with a certain point of dichotomy. Numerical examples show that the sample size decreases as the correlation increases when the individual powers of the endpoints are approximately equal.
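A simplified, hedged sketch of the core calculation, restricted to two continuous co-primary endpoints (the paper also covers binary endpoints via a dichotomized normal): overall power is the probability that both one-sided z-tests are simultaneously significant, computed from a bivariate normal CDF, and the sample size is found by search. The standardized effect sizes, correlation, and alpha are assumptions.

```python
# Hedged sketch (continuous endpoints only, not the paper's full method): overall
# power for two co-primary endpoints, each tested one-sided at level alpha, with
# correlation rho between the endpoint test statistics.
import numpy as np
from scipy.stats import norm, multivariate_normal

def overall_power(n_per_group, delta=(0.4, 0.35), rho=0.5, alpha=0.025):
    c = norm.ppf(1 - alpha)
    means = np.sqrt(n_per_group / 2) * np.array(delta)
    cov = [[1.0, rho], [rho, 1.0]]
    # P(Z1 > c and Z2 > c) = Phi_2(mu1 - c, mu2 - c; rho) for bivariate normal Z
    return multivariate_normal(mean=[0, 0], cov=cov).cdf(means - c)

def required_n(target=0.8, **kwargs):
    n = 2
    while overall_power(n, **kwargs) < target:
        n += 1
    return n

for rho in (0.0, 0.5, 0.8):
    print('rho =', rho, ' n per group =', required_n(rho=rho))
```

Running the search for increasing correlations reproduces the qualitative finding of the abstract: the required sample size falls as the correlation between endpoints rises.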

6.
Significance levels obtained from a chi-square contingency test are suspect when sample sizes are small. Traditionally this has meant that data must be combined. However, such an approach may obscure heterogeneity and hence potentially reduce the power of the statistical test. In this paper, we present a Monte Carlo solution to this problem: by this method, no lumping of data is required, and the accuracy of the estimate of α (i.e., the type I error rate) depends only on the number of randomizations of the original data set. We illustrate this technique with data from mtDNA studies, where numerous genotypes are often observed and sample sizes are relatively small.
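The randomization idea can be sketched as follows: expand the observed table into individual records, repeatedly shuffle the column classification (which keeps both sets of marginal totals fixed), and estimate the P value as the fraction of randomized chi-square statistics at least as large as the observed one. The genotype-by-locality table is invented for illustration.

```python
# Hedged sketch of a Monte Carlo contingency test for small samples; the
# genotype-by-locality counts are illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

def monte_carlo_chi2(table, n_rand=5000, seed=0):
    rng = np.random.default_rng(seed)
    rows = np.repeat(np.arange(table.shape[0]), table.sum(axis=1))
    cols = np.concatenate([np.repeat(np.arange(table.shape[1]), row) for row in table])
    obs_stat = chi2_contingency(table, correction=False)[0]
    exceed = 0
    for _ in range(n_rand):
        rand_table = np.zeros_like(table)
        np.add.at(rand_table, (rows, rng.permutation(cols)), 1)
        exceed += chi2_contingency(rand_table, correction=False)[0] >= obs_stat
    return (exceed + 1) / (n_rand + 1)      # randomization estimate of alpha

# e.g. four mtDNA genotypes sampled at three localities, many small cells
table = np.array([[5, 1, 0], [2, 3, 1], [0, 2, 4], [1, 0, 3]])
print(monte_carlo_chi2(table))
```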

7.
In this article, we describe a conditional score test for detecting a monotone dose-response relationship with ordinal response data. We consider three different versions of this test: asymptotic, conditional exact, and mid-P conditional score test. Exact and asymptotic power formulae based on these tests will be studied. Asymptotic sample size formulae based on the asymptotic conditional score test will be derived. The proposed formulae are applied to a vaccination study and a developmental toxicity study for illustrative purposes. Actual significance level and exact power properties of these tests are compared in a small empirical study. The mid-P conditional score test is observed to be the most powerful test with actual significance level close to the pre-specified nominal level.

8.
Genetic studies of sexual isolation in Drosophila have generally failed to fully evaluate the effects of their sample size and recombination between markers on their conclusions. In this study we evaluate recombinational distances between markers in Drosophila pseudoobscura and D. persimilis, a species pair in which numerous genetic mapping studies have been performed. We conclude that, contrary to assertions, the inversions that distinguish these two species still allow for much recombination within most of their chromosome arms in F1 hybrid females. Such recombination may have caused previous mapping studies in these species to miss (or grossly underestimate) the effects of several genomic regions. We also evaluate the effects of sample size and recombination on genetic studies of sexual isolation in other Drosophila species groups. We conclude that some of these studies may have been heavily biased toward detecting only genes of large effect. Future studies of sexual isolation should be preceded by detailed statistical power analyses that determine the effects of recombination and sample size in the species pair being studied to avoid these complications.

9.
A statistical test for detecting geographic subdivision.
A statistical test for detecting genetic differentiation of subpopulations is described that uses molecular variation in samples of DNA sequences from two or more localities. The statistical significance of the test is determined with Monte Carlo simulations. The power of the test to detect genetic differentiation in a selectively neutral Wright-Fisher island model depends on both sample size and the rates of migration, mutation, and recombination. It is found that the power of the test is substantial with samples of size 50 when 4Nm < 10, where N is the subpopulation size and m is the fraction of migrants in each subpopulation each generation. More powerful tests are obtained with genes with recombination than with genes without recombination.

10.
Power and sample size calculations are critical parts of any research design for genetic association. We present a method that utilizes haplotype frequency information and average marker-marker linkage disequilibrium on SNPs typed in and around all genes on a chromosome. The test statistic used is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed through specification of genetic model parameters. Power is determined by computation of the test's non-centrality parameter. Power per gene is computed as a weighted average of the power assuming each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three different human populations (African-American, Caucasian, Chinese), three different models of disease (additive, dominant, and multiplicative) and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. Also, as a 'proof of principle' calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Results of our regression analysis indicate that four highly significant factors determine average power to detect association: disease model, average marker-marker disequilibrium, haplotype diversity, and the trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies. Our power and sample size calculations for the PSORS1 gene appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.
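A hedged sketch of the non-centrality step only: given assumed case and control haplotype frequency vectors, approximate the likelihood-ratio-test non-centrality parameter and read power from a non-central chi-square distribution. The frequency vectors, sample sizes, and significance level are illustrative assumptions, not values from the study, and the expression for the non-centrality is a standard approximation rather than the authors' exact derivation.

```python
# Hedged sketch: approximate power of a case/control haplotype likelihood ratio
# test via its non-centrality parameter; frequencies below are illustrative only.
import numpy as np
from scipy.stats import chi2, ncx2

def lrt_power(p_case, p_ctrl, n_case, n_ctrl, alpha=0.01):
    p_case, p_ctrl = np.asarray(p_case), np.asarray(p_ctrl)
    p_pool = (n_case * p_case + n_ctrl * p_ctrl) / (n_case + n_ctrl)
    # non-centrality ~ 2 * expected log-likelihood ratio under the alternative
    nc = 2 * (n_case * np.sum(p_case * np.log(p_case / p_pool)) +
              n_ctrl * np.sum(p_ctrl * np.log(p_ctrl / p_pool)))
    df = len(p_case) - 1
    return ncx2.sf(chi2.ppf(1 - alpha, df), df, nc)

print(round(lrt_power(p_case=[0.45, 0.35, 0.20],
                      p_ctrl=[0.35, 0.40, 0.25],
                      n_case=500, n_ctrl=500), 3))
```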

11.
Liu Q, Chi GY (2001). Biometrics 57(1), 172-177.
Proschan and Hunsberger (1995, Biometrics 51, 1315-1324) proposed a two-stage adaptive design that maintains the Type I error rate. For practical applications, a two-stage adaptive design is also required to achieve a desired statistical power while limiting the maximum overall sample size. In our proposal, a two-stage adaptive design comprises a main stage and an extension stage, where the main stage has sufficient power to reject the null hypothesis under the anticipated effect size and the extension stage allows increasing the sample size in case the true effect size is smaller than anticipated. For statistical inference, methods for obtaining the overall adjusted p-value, point estimate and confidence intervals are developed. An exact two-stage test procedure is also outlined for robust inference.

12.
Multiple diagnostic tests and risk factors are commonly available for many diseases. This information can be either redundant or complementary. Combining them may improve the diagnostic/predictive accuracy, but also unnecessarily increase complexity, risks, and/or costs. The improved accuracy gained by including additional variables can be evaluated by the increment of the area under the receiver-operating characteristic curve (AUC) with and without the new variable(s). In this study, we derive a new test statistic to accurately and efficiently determine the statistical significance of this incremental AUC under a multivariate normality assumption. Our test links the AUC difference to a quadratic form of a standardized mean shift in units of the inverse covariance matrix through a proper linear transformation of all diagnostic variables. The distribution of the quadratic estimator is related to the multivariate Behrens-Fisher problem. We provide explicit mathematical solutions of the estimator and its approximate non-central F-distribution, type I error rate, and sample size formula. We use simulation studies to show that our new test maintains prespecified type I error rates as well as reasonable statistical power under practical sample sizes. We use data from the Study of Osteoporotic Fractures as an application example to illustrate our method.
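To make the quantity being tested concrete: under multivariate normality with a common covariance matrix, the AUC of the optimal linear combination of markers is Phi(sqrt(D²/2)), where D² is the Mahalanobis distance between cases and controls, so the incremental AUC from adding variables is the difference of two such terms. The sketch below computes that population quantity for made-up numbers; it is not the paper's test statistic or its sampling distribution.

```python
# Hedged illustration (population quantity only, not the paper's test statistic):
# incremental AUC from adding a marker under a common-covariance normal model.
import numpy as np
from scipy.stats import norm

def combination_auc(mean_diff, cov):
    d2 = mean_diff @ np.linalg.solve(cov, mean_diff)   # Mahalanobis distance
    return norm.cdf(np.sqrt(d2 / 2))

mu_diff = np.array([0.5, 0.3, 0.2])                    # case-control mean shift (assumed)
cov = np.array([[1.0, 0.3, 0.2],
                [0.3, 1.0, 0.4],
                [0.2, 0.4, 1.0]])

auc_full = combination_auc(mu_diff, cov)
auc_reduced = combination_auc(mu_diff[:2], cov[:2, :2])  # without the new marker
print('incremental AUC =', round(auc_full - auc_reduced, 4))
```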

13.
MOTIVATION: Human clinical projects typically require a priori statistical power analyses. Towards this end, we sought to build a flexible and interactive power analysis tool for microarray studies integrated into our public domain HCE 3.5 software package. We then sought to determine if probe set algorithms or organism type strongly influenced power analysis results. RESULTS: The HCE 3.5 power analysis tool was designed to import any pre-existing Affymetrix microarray project, and interactively test the effects of user-defined definitions of alpha (significance), beta (1-power), sample size and effect size. The tool generates a filter for all probe sets or more focused ontology-based subsets, with or without noise filters, that can be used to limit analyses of a future project to appropriately powered probe sets. We studied projects from three organisms (Arabidopsis, rat, human), and three probe set algorithms (MAS5.0, RMA, dChip PM/MM). We found large differences in power results based on probe set algorithm selection and noise filters. RMA provided high sensitivity for low numbers of arrays, but this came at a cost of high false positive results (24% false positive in the human project studied). Our data suggest that a priori power calculations are important for both experimental design in hypothesis testing and hypothesis generation, as well as for the selection of optimized data analysis parameters. AVAILABILITY: The Hierarchical Clustering Explorer 3.5 with the interactive power analysis functions is available at www.cs.umd.edu/hcil/hce or www.cnmcresearch.org/bioinformatics. CONTACT: jseo@cnmcresearch.org

14.
Conditional exact tests of homogeneity of two binomial proportions are often used in small samples, because the exact tests guarantee that the test size stays at or below the nominal level. Fisher's exact test, the exact chi-squared test and the exact likelihood ratio test are popular and can be implemented in the software StatXact. In this paper we investigate which test is best in small samples in terms of the unconditional exact power. In equal-sample cases it is proved that the three tests produce the same unconditional exact power. A symmetry of the unconditional exact power is also found. In unequal-sample cases the unconditional exact powers of the three tests are computed and compared. In most cases Fisher's exact test turns out to be best, but we characterize some cases in which the exact likelihood ratio test has the highest unconditional exact power.
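The unconditional exact power being compared can be computed by brute force for small samples: enumerate every possible pair of outcomes (x1, x2), weight each by the product of binomial probabilities under the alternative, and sum the weight of outcomes for which the test rejects. The sketch below does this for Fisher's exact test only; the proportions and unequal sample sizes are illustrative.

```python
# Hedged sketch: brute-force unconditional exact power of Fisher's exact test for
# comparing two binomial proportions; n1, n2, p1, p2 are illustrative assumptions.
from itertools import product
from scipy.stats import binom, fisher_exact

def fisher_unconditional_power(n1, n2, p1, p2, alpha=0.05):
    power = 0.0
    for x1, x2 in product(range(n1 + 1), range(n2 + 1)):
        p_value = fisher_exact([[x1, n1 - x1], [x2, n2 - x2]])[1]
        if p_value <= alpha:
            power += binom.pmf(x1, n1, p1) * binom.pmf(x2, n2, p2)
    return power

print(round(fisher_unconditional_power(n1=15, n2=25, p1=0.2, p2=0.7), 3))
```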

15.
Although phylogenetic hypotheses can provide insights into mechanisms of evolution, their utility is limited by our inability to differentiate simultaneous speciation events (hard polytomies) from rapid cladogenesis (soft polytomies). In the present paper, we tested the potential for statistical power analysis to differentiate between hard and soft polytomies in molecular phylogenies. Classical power analysis typically is used a priori to determine the sample size required to detect a particular effect size at a particular level of significance (α) with a certain power (1 - β). A posteriori, power analysis is used to infer whether failure to reject a null hypothesis results from lack of an effect or from insufficient data (i.e., low power). We adapted this approach to molecular data to infer whether polytomies result from simultaneous branching events or from insufficient sequence information. We then used this approach to determine the amount of sequence data (sample size) required to detect a positive branch length (effect size). A worked example is provided based on the auklets (Charadriiformes: Alcidae), a group of seabirds among which relationships are represented by a polytomy, despite analyses of over 3000 bp of sequence data. We demonstrate the calculation of effect sizes and sample sizes from sequence data using a normal curve test for difference of a proportion from an expected value and a t-test for a difference of a mean from an expected value. Power analyses indicated that the data for the auklets should be sufficient to differentiate speciation events that occurred at least 100,000 yr apart (the duration of the shortest glacial and interglacial events of the Pleistocene), 2.6 million years ago.

16.
Beyond Bonferroni: less conservative analyses for conservation genetics
Studies in conservation genetics often attempt to determine genetic differentiation between two or more temporally or geographically distinct sample collections. Pairwise P values from Fisher's exact tests or contingency chi-square tests are commonly reported with a Bonferroni correction for multiple tests. While the Bonferroni correction controls the experiment-wise α, this correction is very conservative and results in greatly diminished power to detect differentiation among pairs of sample collections. An alternative is to control the false discovery rate (FDR), which provides increased power, but this method only maintains the experiment-wise α when none of the pairwise comparisons are significant. Recent modifications to the FDR method provide a moderate approach to determining the significance level. Simulations reveal that critical values of multiple comparison tests with both the Bonferroni method and a modified FDR method approach a minimum asymptote very near zero as the number of tests gets large, but the Bonferroni method approaches zero much more rapidly than the modified FDR method. I compared pairwise significance from three published studies using three critical values corresponding to the Bonferroni, FDR, and modified FDR methods. Results suggest that the modified FDR method may provide the most biologically important critical value for evaluating the significance of population differentiation in conservation genetics. Ultimately, more thorough reporting of statistical significance is needed to allow interpretation of the biological significance of genetic differentiation among populations.
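The practical difference between the three correction strategies is easy to demonstrate on a set of pairwise P values. In the sketch below the P values are invented, and the Benjamini-Yekutieli procedure is used as one common "modified FDR" variant; whether it matches the specific modification used in the paper is an assumption.

```python
# Hedged sketch: significance calls for pairwise differentiation P values under
# Bonferroni, Benjamini-Hochberg FDR, and the Benjamini-Yekutieli modification.
# The P values are illustrative only.
from statsmodels.stats.multitest import multipletests

pairwise_p = [0.0004, 0.0021, 0.0080, 0.0120, 0.0290, 0.0410, 0.0650, 0.2100]

for method in ('bonferroni', 'fdr_bh', 'fdr_by'):
    reject, p_adj, _, _ = multipletests(pairwise_p, alpha=0.05, method=method)
    print(f'{method:>10}: {reject.sum()} of {len(pairwise_p)} comparisons significant')
```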

17.
In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon-Mann-Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science to analyze whether two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Generally, solving the WMW test exactly takes a high combinatorial effort for large sample cohorts containing a significant number of ties. Hence, the P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programming to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we present different optimization strategies and a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly generated data, benchmark it against 13 other commonly applied approaches, and moreover evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at http://www.ccb.uni-saarland.de/software/wtest/.
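A hedged sketch of the dynamic-programming idea for the tie-free case (EDISON-WMW itself additionally handles ties and is far more optimized): count how many of the C(m+n, m) label arrangements give each Mann-Whitney U value with the classic recursion N(u; m, n) = N(u - n; m - 1, n) + N(u; m, n - 1), then compare the exact two-sided P value (doubled smaller tail) with the normal approximation. The data are simulated for illustration.

```python
# Hedged sketch: exact Mann-Whitney U P value via a counting recursion (no ties),
# compared with the normal approximation; not the EDISON-WMW implementation.
import math
from functools import lru_cache
import numpy as np
from scipy.stats import norm

def exact_wmw_pvalue(x, y):
    m, n = len(x), len(y)

    @lru_cache(maxsize=None)
    def count(u, mm, nn):
        if u < 0 or u > mm * nn:
            return 0
        if mm == 0 or nn == 0:
            return 1 if u == 0 else 0
        return count(u - nn, mm - 1, nn) + count(u, mm, nn - 1)

    u_obs = sum(xi > yj for xi in x for yj in y)     # observed U statistic
    u_tail = min(u_obs, m * n - u_obs)               # smaller of the two tails
    tail = sum(count(u, m, n) for u in range(u_tail + 1)) / math.comb(m + n, m)
    return min(1.0, 2 * tail)                        # double the smaller tail

rng = np.random.default_rng(2)
x, y = rng.normal(0.9, 1, 9), rng.normal(0.0, 1, 11)
u = sum(xi > yj for xi in x for yj in y)
z = (u - len(x) * len(y) / 2) / math.sqrt(len(x) * len(y) * (len(x) + len(y) + 1) / 12)
print('exact  P =', round(exact_wmw_pvalue(x, y), 4))
print('normal P =', round(2 * norm.sf(abs(z)), 4))
```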

18.
It was shown recently using experimental data that it is possible under certain conditions to determine whether a person with known genotypes at a number of markers was part of a sample from which only allele frequencies are known. Using population genetic and statistical theory, we show that the power of such identification is, approximately, proportional to the number of independent SNPs divided by the size of the sample from which the allele frequencies are available. We quantify the limits of identification and propose likelihood and regression analysis methods for the analysis of data. We show that these methods have similar statistical properties and have more desirable properties, in terms of type-I error rate and statistical power, than test statistics suggested in the literature.

19.
MOTIVATION: There is no widely applicable method to determine the sample size for experiments basing statistical significance on the false discovery rate (FDR). RESULTS: We propose and develop the anticipated FDR (aFDR) as a conceptual tool for determining sample size. We derive mathematical expressions for the aFDR and the anticipated average statistical power. These expressions are used to develop a general algorithm to determine sample size. We provide specific details on how to implement the algorithm for k-group (k ≥ 2) comparisons. The algorithm performs well for k-group comparisons in a series of traditional simulations and in a real-data simulation conducted by resampling from a large, publicly available dataset. AVAILABILITY: Documented S-plus and R code libraries are freely available from www.stjuderesearch.org/depts/biostats.
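A simplified, hedged reconstruction of the general idea for a two-group comparison (not the authors' exact expressions or their k-group algorithm): anticipate the FDR as expected false positives over expected total positives at a fixed per-test threshold, with average power from a normal approximation, then increase the per-group sample size until the anticipated FDR falls below the target. The gene count, proportion of true nulls, effect size, and threshold are assumptions.

```python
# Hedged, simplified sketch of choosing a sample size from an anticipated FDR.
from scipy.stats import norm

def avg_power(n_per_group, effect_size, alpha):
    """Normal-approximation power of a two-sided two-sample test per true positive."""
    z = effect_size * (n_per_group / 2) ** 0.5 - norm.ppf(1 - alpha / 2)
    return norm.cdf(z)

def anticipated_fdr(n_per_group, m_genes=10000, pi0=0.95, effect_size=1.0, alpha=0.001):
    m0, m1 = pi0 * m_genes, (1 - pi0) * m_genes
    expected_tp = m1 * avg_power(n_per_group, effect_size, alpha)
    expected_fp = m0 * alpha
    return expected_fp / (expected_fp + expected_tp)

def sample_size_for_afdr(target_afdr=0.05, **kwargs):
    n = 2
    while anticipated_fdr(n, **kwargs) > target_afdr:
        n += 1
    return n

print('n per group =', sample_size_for_afdr(target_afdr=0.05))
```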

20.
Two statistical tests for meiotic breakpoint analysis.
Meiotic breakpoint analysis (BPA), a statistical method for ordering genetic markers, is increasing in importance as a method for building genetic maps of human chromosomes. Although BPA does not provide estimates of genetic distances between markers, it efficiently locates new markers on already defined dense maps, when likelihood analysis becomes cumbersome or the sample size is small. However, until now no assessments of statistical significance have been available for evaluating the possibility that the results of a BPA were produced by chance. In this paper, we propose two statistical tests to determine whether the size of a sample and its genetic information content are sufficient to distinguish between "no linkage" and "linkage" of a marker mapped by BPA to a certain region. Both tests are exact and should be conducted after a BPA has assigned the marker to an interval on the map. Applications of the new tests are demonstrated by three examples: (1) a synthetic data set, (2) a data set of five markers on human chromosome 8p, and (3) a data set of four markers on human chromosome 17q.
