Similar Literature
20 similar documents found
1.
A common statistical method for assessing bioequivalence of two formulations of a chemical substance is the symmetric confidence interval of WESTLAKE (1972). As noted by WESTLAKE (1981) and SCHUIRMANN (1981), a more powerful method consists of two one-sided t-tests. A (1 − α)-confidence interval consistent with the two one-sided t-tests procedure is given by [min(a, 0), max(0, b)], where [a, b] is the conventional (1 − 2α)-confidence interval of the t-test. This “central” confidence interval is always a strict subset of the symmetric confidence interval and thus has more power for establishing bioequivalence. The central confidence interval has properties comparable to those of conventional one-sided confidence intervals.
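A minimal sketch of this construction (Python; the paired-differences setup, function name, and simulated data are illustrative assumptions, not from the paper):

```python
# Sketch: the "central" (1 - alpha) confidence interval consistent with
# the two one-sided t-tests procedure, built from the (1 - 2*alpha) t-interval.
import numpy as np
from scipy import stats

def central_ci(diff, alpha=0.05):
    """[min(a, 0), max(0, b)] where [a, b] is the (1 - 2*alpha) t-interval
    for the mean of the paired differences `diff`."""
    diff = np.asarray(diff, dtype=float)
    n = diff.size
    mean, se = diff.mean(), diff.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(1 - alpha, df=n - 1)   # gives a (1 - 2*alpha) two-sided interval
    a, b = mean - t_crit * se, mean + t_crit * se
    return min(a, 0.0), max(0.0, b)

# Example: assumed log-differences between two formulations
rng = np.random.default_rng(1)
diff = rng.normal(0.02, 0.1, size=20)
print(central_ci(diff, alpha=0.05))
```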

2.
A genetic model with genotype×environment (GE) interactions for controlling systematic errors in the field can be used to predict genotypic values by an adjusted unbiased prediction (AUP) method. Mahalanobis distance, calculated from the genotypic values, is then applied to measure the genetic distance among accessions. The unweighted pair-group average, Ward’s and complete linkage methods of hierarchical clustering, combined with three sampling strategies, are proposed for constructing core collections in a stepwise clustering procedure. A homogeneity test and t-tests are suggested for testing variances and means, respectively. The coincidence rate (CR%) for the range and the variable rate (VR%) for the coefficient of variation are designed to evaluate the properties of core collections. A worked example of constructing core collections in cotton with 21 traits is presented. Random sampling can represent the genetic diversity structure of the initial collection. Preferred sampling can keep the accessions with special or valuable characteristics in the initial collection. Deviation sampling can retain the larger genetic variability of the initial collection. For better representation of the core collection, cluster methods should be combined with different sampling strategies. The core collections based on genotypic values retained larger genetic variability and were more representative than those based on phenotypic values. Received: 15 October 1999 / Accepted: 24 November 1999
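A rough sketch of the clustering-and-sampling step (Python; the random data, UPGMA cut, and one-accession-per-cluster random sampling are illustrative assumptions, not the authors' exact AUP-based procedure):

```python
# Illustrative sketch: hierarchical clustering on Mahalanobis distances,
# then random sampling of one accession per cluster to form a core subset.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
G = rng.normal(size=(100, 21))        # stand-in for genotypic values (100 accessions, 21 traits)

VI = np.linalg.inv(np.cov(G, rowvar=False))   # inverse covariance for Mahalanobis distance
D = pdist(G, metric="mahalanobis", VI=VI)
Z = linkage(D, method="average")      # unweighted pair-group average (UPGMA)

labels = fcluster(Z, t=20, criterion="maxclust")   # cut the tree into 20 clusters
core = [rng.choice(np.flatnonzero(labels == c)) for c in np.unique(labels)]
print(sorted(core))                   # indices of the sampled core accessions
```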

3.
Testing for differentially expressed genes with microarray data
This paper compares the type I error and power of the one- and two-sample t-tests, and the one- and two-sample permutation tests for detecting differences in gene expression between two microarray samples with replicates using Monte Carlo simulations. When data are generated from a normal distribution, type I errors and powers of the one-sample parametric t-test and one-sample permutation test are very close, as are the two-sample t-test and two-sample permutation test, provided that the number of replicates is adequate. When data are generated from a t-distribution, the permutation tests outperform the corresponding parametric tests if the number of replicates is at least five. For data from a two-color dye swap experiment, the one-sample test appears to perform better than the two-sample test since expression measurements for control and treatment samples from the same spot are correlated. For data from independent samples, such as the one-channel array or two-channel array experiment using reference design, the two-sample t-tests appear more powerful than the one-sample t-tests.
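A compact sketch of the comparison studied here, for a single gene (Python; the replicate counts, t3-distributed data, and permutation count are assumptions):

```python
# Two-sample t-test vs. a two-sample permutation test on one gene's
# replicated expression values, as compared in the simulations above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.standard_t(df=3, size=5)            # heavy-tailed data, 5 replicates
treated = rng.standard_t(df=3, size=5) + 1.0      # shifted alternative

t_stat, t_p = stats.ttest_ind(control, treated)   # parametric two-sample t-test

# Permutation test: re-randomize the group labels and recompute the t-statistic
pooled = np.concatenate([control, treated])
n = control.size
perm_t = []
for _ in range(10_000):
    perm = rng.permutation(pooled)
    perm_t.append(stats.ttest_ind(perm[:n], perm[n:]).statistic)
perm_p = np.mean(np.abs(perm_t) >= abs(t_stat))

print(f"t-test p = {t_p:.4f}, permutation p = {perm_p:.4f}")
```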

4.
A simple testing procedure “control versus k treatments” for one-sided ordered alternatives with univariate, continuous variables is given. A simulation study shows both the first-kind risk α and the power behaviour under several distributions, expected-value profiles, sample sizes, and α levels.

5.
This work focuses on differential expression analysis of microarray datasets. One way to improve such statistical analyses is to integrate biological information into their design. In this paper, we use the relationship between the level of gene expression and its variability. Using this biological information, we propose to integrate information from multiple genes to obtain a better estimate of the individual gene variance when only a small number of replicates is available, thereby increasing the power of the statistical analysis. We describe a strategy named the “Window t test” that uses multiple genes sharing a similar expression level to compute the variance, which is then incorporated into a classic t test. The performance of this new method is evaluated by comparison with classic and widely used methods for differential expression analysis (the classic Student t test, the Regularized t test (reg t test), SAM, Limma, LPE and Shrinkage t). In each case tested, the results obtained were at least equivalent to those of the best-performing method and, in most cases, outperformed it. Moreover, the Window t test relies on a very simple procedure requiring little computing power compared with other methods designed for microarray differential expression analysis.
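A minimal sketch of the window idea (Python; the window size, degrees of freedom, and data layout are simplifying assumptions, not the authors' exact implementation):

```python
# "Window t test" sketch: for each gene, estimate the variance from the genes
# whose mean expression is closest to it, then plug that window-based variance
# into an ordinary one-sample t statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_genes, n_rep, window = 1000, 3, 50
data = rng.normal(size=(n_genes, n_rep))      # log-ratios, 3 replicates per gene

means = data.mean(axis=1)
variances = data.var(axis=1, ddof=1)
order = np.argsort(means)                     # sort genes by expression level

t_window = np.empty(n_genes)
for rank, g in enumerate(order):
    lo = max(0, min(rank - window // 2, n_genes - window))
    neighbors = order[lo:lo + window]         # genes with similar expression level
    var_w = variances[neighbors].mean()       # window-based variance estimate
    t_window[g] = means[g] / np.sqrt(var_w / n_rep)

p = 2 * stats.t.sf(np.abs(t_window), df=n_rep - 1)   # df choice is a simplification
print(p[:5])
```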

6.
The distribution of mean heterozygosities under an infinite allele model with constant mutation rate was examined through simulation studies. It was found that, although the variance of the distribution decreases with increasing numbers of loci examined as expected, the shape of the distribution may remain skewed or bimodal. The distribution becomes symmetrical for increasing mean heterozygosity levels and numbers of loci. As a result, parametric statistical tests may not be valid for making comparisons among populations or species. Independent sample t-tests were examined in detail to determine the frequency of rejection of the null hypothesis when pairs of samples are drawn from populations with the same mean heterozygosity. Differing numbers of loci and levels of mean heterozygosity were examined. For mean heterozygosity levels above 7.5%, t-tests provide the proper rejection rate, with as few as five loci. When mean heterozygosity is as low as 2.5%, the t-test is conservative even when 40 loci are examined in each population. Independent sample t-tests were then examined for their power to detect true differences between populations as the degree of difference and number of loci vary. Although large differences can be found with high certainty, differences on the order of 5% heterozygosity may require that large numbers of loci (>40) be examined in order to be 80% or more certain of detecting them. In addition, it is emphasized that, for small numbers of loci (<25), the statistical detection of differences of interesting magnitude requires that relatively rare sampling events occur and that much larger differences be observed among the samples than exist for the population means. Two reasons exist for the lack of sensitivity of the test procedures. First, when mean heterozygosity levels are low, the non-normality of the sample means is perhaps most important. Second, even when mean heterozygosity levels are high or when sample sizes are large enough so sample means are approximately normally distributed, the intrinsically high interlocus variance of heterozygosity estimates makes the tests insensitive to the presence of heterozygosity differences that might be biologically meaningful. Finally, the implications of the results of this study are discussed with regard to observed low levels of correlation between heterozygosity and other explanatory variables.

7.
This work discusses how two-sample t-tests behave when applied to data that may violate the classical statistical assumptions of independence, homoscedasticity and Gaussianity. The usual two-sample t-statistic based on a pooled variance estimate and the Welch-Aspin statistic are treated in detail. Practical “rules of thumb” are given along with applications to various examples so that readers can easily use such tests on their own data sets.
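Both statistics are available in scipy; a short illustration on simulated heteroscedastic samples (the data are an assumption):

```python
# Pooled-variance two-sample t-test vs. the Welch-Aspin statistic on
# heteroscedastic samples of unequal size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.0, 3.0, size=10)      # same mean, much larger variance

pooled = stats.ttest_ind(x, y, equal_var=True)    # classical pooled-variance t
welch = stats.ttest_ind(x, y, equal_var=False)    # Welch-Aspin

print(f"pooled: t = {pooled.statistic:.3f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```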

8.
This paper proposes a procedure for testing and classifying data with multiple factors. A two-way analysis of covariance is used to classify the differences among batches as well as another factor, such as package type and/or product strength. In the test procedure, slopes and intercepts of the main effects are tested using a combination of simultaneous and sequential F-tests. Based on the test procedure results, the data are classified into one of four groups, and shelf life can be calculated accordingly for each group. We examine whether the procedure provides satisfactory control of the Type I error probability and adequate power for detecting differences in degradation rates and intercepts at different nominal levels. The method is evaluated with a Monte Carlo simulation study, and the proposed procedure is compared with the current FDA procedure using real data.

9.
Motivation: In searching for differentially expressed (DE) genes in microarray data, we often observe a fraction of the genes to have unequal variability between groups. This is not an issue in large samples, where a valid test exists that uses individual variances separately. The problem arises in the small-sample setting, where the approximately valid Welch test lacks sensitivity, while the more sensitive moderated t-test assumes equal variance.
Methods: We introduce a moderated Welch test (MWT) that allows unequal variance between groups. It is based on (i) weighting of pooled and unpooled standard errors and (ii) improved estimation of the gene-level variance that exploits the information from across the genes.
Results: When a non-trivial proportion of genes has unequal variability, false discovery rate (FDR) estimates based on the standard t and moderated t-tests are often too optimistic, while the standard Welch test has low sensitivity. The MWT is shown to (i) perform better than the standard t, the standard Welch and the moderated t-tests when the variances are unequal between groups and (ii) perform similarly to the moderated t, and better than the standard t and Welch tests, when the group variances are equal. These results mean that MWT is more reliable than other existing tests over a wider range of data conditions.
Availability: An R package to perform MWT is available at http://www.meb.ki.se/~yudpaw
Contact: yudi.pawitan@ki.se
Supplementary information: Supplementary data are available at Bioinformatics online.
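A loose sketch of the two ingredients named above (Python; the fixed shrinkage weight and the simple across-genes average are simplifying assumptions, not the published MWT estimator):

```python
# Sketch: Welch-type statistic with gene-level variances shrunk toward the
# across-genes average, so each gene borrows strength from all the others.
import numpy as np

rng = np.random.default_rng(5)
n_genes, n1, n2 = 500, 4, 4
grp1 = rng.normal(size=(n_genes, n1))
grp2 = rng.normal(scale=2.0, size=(n_genes, n2))   # unequal group variances

v1, v2 = grp1.var(axis=1, ddof=1), grp2.var(axis=1, ddof=1)
w = 0.5                                            # assumed shrinkage weight
v1_mod = w * v1.mean() + (1 - w) * v1              # moderated gene-level variances
v2_mod = w * v2.mean() + (1 - w) * v2

se = np.sqrt(v1_mod / n1 + v2_mod / n2)            # unpooled (Welch-type) SE
t_mwt = (grp1.mean(axis=1) - grp2.mean(axis=1)) / se
print(t_mwt[:5])
```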

10.
Consider k independent exponential populations with location parameters μ1,…, μk and a common scale parameter (standard deviation) θ. Let μ(k) be the largest of the μ's and define a population to be good if its location parameter exceeds μ(k) − δ1. A selection procedure is proposed to select a subset of the k populations that includes the good populations with probability at least P*, a pre-assigned value. Simultaneous confidence intervals that can be derived with the proposed selection procedure are discussed. Moreover, if populations with locations below μ(k) − δ2 (δ2 > δ1) are “bad”, a selection procedure is proposed and a sample size is determined so that the probability of omitting a “good” population or selecting a “bad” population is at most 1 − P*.

11.
For J dependent groups, let θj, j = 1, …, J, be some measure of location associated with the jth group. A common goal is computing confidence intervals for the pairwise differences θj − θk, j < k, such that the simultaneous probability coverage is 1 − α. If means are used, it is well known that slight departures from normality (as measured by the Kolmogorov distance) toward a heavy-tailed distribution can substantially inflate the standard error of the sample mean, which in turn can result in relatively low power. Also, when distributions differ in shape, or when sampling from skewed distributions with relatively light tails, practical problems arise when the goal is to obtain confidence intervals with simultaneous probability coverage reasonably close to the nominal level. Extant theoretical and simulation results suggest replacing means with trimmed means. The Tukey-McLaughlin method is easily adapted to the problem at hand via the Bonferroni inequality, but this paper illustrates that practical concerns remain. Here, the main result is that the percentile t bootstrap method, used in conjunction with trimmed means, gives improved probability coverage and substantially better power. A method based on a one-step M-estimator is also considered but found to be less satisfactory.
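A bare-bones sketch of the recommended method for one pairwise difference (Python; the 20% trimming, bootstrap size, and Tukey-McLaughlin standard error are stated assumptions):

```python
# Percentile-t bootstrap confidence interval for the difference of two
# 20% trimmed means.
import numpy as np
from scipy import stats

def tm_se(x, prop=0.2):
    # Tukey-McLaughlin SE: winsorized SD / ((1 - 2*prop) * sqrt(n))
    s_w = stats.mstats.winsorize(x, limits=prop).std(ddof=1)
    return s_w / ((1 - 2 * prop) * np.sqrt(len(x)))

def boot_ci(x, y, prop=0.2, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    d_hat = stats.trim_mean(x, prop) - stats.trim_mean(y, prop)
    se_hat = np.sqrt(tm_se(x, prop) ** 2 + tm_se(y, prop) ** 2)
    t_star = []
    for _ in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)
        yb = rng.choice(y, size=len(y), replace=True)
        d_b = stats.trim_mean(xb, prop) - stats.trim_mean(yb, prop)
        se_b = np.sqrt(tm_se(xb, prop) ** 2 + tm_se(yb, prop) ** 2)
        t_star.append((d_b - d_hat) / se_b)
    lo, hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    return d_hat - hi * se_hat, d_hat - lo * se_hat   # percentile-t interval

x, y = np.random.default_rng(1).normal(size=(2, 25))
print(boot_ci(x, y))
```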

12.
The classical normal-theory tests for the null hypothesis of common variance, and the classical estimates of scale, have long been known to be quite nonrobust to even mild deviations from normality for moderate sample sizes. Levene (1960) suggested a one-way ANOVA-type statistic as a robust test. Brown and Forsythe (1974) considered a modified version of Levene's test, replacing the sample means with sample medians as estimates of population locations; their test is computationally the simplest among the three tests recommended by Conover, Johnson, and Johnson (1981) in terms of robustness and power. In this paper a new robust and powerful test for homogeneity of variances is proposed, based on a modification of Levene's test using the weighted likelihood estimates (Markatou, Basu, and Lindsay, 1996) of the population means. For two and three populations, the proposed test using the Hellinger-distance-based weighted likelihood estimates is observed to achieve better empirical level and power than the Brown-Forsythe test in symmetric distributions with thicker tails than the normal, and higher empirical power in skewed distributions when F-distribution critical values are used.
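For reference, the Brown-Forsythe variant discussed above is available directly in scipy as Levene's test centered at the median (the simulated data are an assumption; the paper's weighted-likelihood version is not in scipy):

```python
# Brown-Forsythe test for homogeneity of variances = Levene's test with
# group medians in place of group means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.standard_t(df=3, size=40)            # heavy-tailed groups
g2 = 1.5 * rng.standard_t(df=3, size=40)      # inflated scale
g3 = rng.standard_t(df=3, size=40)

stat, p = stats.levene(g1, g2, g3, center="median")   # Brown-Forsythe
print(f"W = {stat:.3f}, p = {p:.4f}")
```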

13.
In scientific research, many hypotheses relate to the comparison of two independent groups. Usually, it is of interest to use a design (i.e., the allocation of sample sizes m and n for fixed total sample size m + n) that maximizes the power of the applied statistical test. It is known that the two-sample t-tests for homogeneous and heterogeneous variances may lose substantial power when variances are unequal but equally large samples are used. We demonstrate that this is not the case for the nonparametric Wilcoxon–Mann–Whitney test, whose application in biometrical research fields is motivated by two examples from cancer research. We prove the optimality of the balanced design in the case of symmetric and identically shaped distributions using normal approximations, and show that this design generally offers power only negligibly lower than the optimal design for a wide range of distributions.
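A quick Monte Carlo sketch (Python) of the effect described: Wilcoxon–Mann–Whitney power under balanced versus unbalanced allocation with unequal variances; the shift, scales, and splits are assumptions:

```python
# Monte Carlo power of the Wilcoxon-Mann-Whitney test for a fixed total
# sample size N = 60, split as balanced (30/30) vs. unbalanced (45/15).
import numpy as np
from scipy import stats

def power(m, n, shift=0.8, n_sim=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, size=m)          # small-variance group
        y = rng.normal(shift, 3.0, size=n)        # large-variance group
        p = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        hits += p < alpha
    return hits / n_sim

print("balanced 30/30:  ", power(30, 30))
print("unbalanced 45/15:", power(45, 15))
```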

14.

Background  

When analyzing microarray data, a primary objective is often to find differentially expressed genes. With empirical Bayes and penalized t-tests, the sample variances are adjusted towards a global estimate, producing more stable results than ordinary t-tests. However, for Affymetrix-type data a clear dependency between variability and intensity level generally exists, even for logged intensities, most clearly for data at the probe level but also for probe-set summaries such as the MAS5 expression index. As a consequence, adjustment towards a global estimate results in an intensity-level-dependent false positive rate.

15.
A generalization of the Behrens‐Fisher problem for two samples is examined in a nonparametric model. It is not assumed that the underlying distribution functions are continuous so that data with arbitrary ties can be handled. A rank test is considered where the asymptotic variance is estimated consistently by using the ranks over all observations as well as the ranks within each sample. The consistency of the estimator is derived in the appendix. For small samples (n1, n2 ≥ 10), a simple approximation by a central t‐distribution is suggested where the degrees of freedom are taken from the Satterthwaite‐Smith‐Welch approximation in the parametric Behrens‐Fisher problem. It is demonstrated by means of a simulation study that the Wilcoxon‐Mann‐Whitney‐test may be conservative or liberal depending on the ratio of the sample sizes and the variances of the underlying distribution functions. For the suggested approximation, however, it turns out that the nominal level is maintained rather accurately. The suggested nonparametric procedure is applied to a data set from a clinical trial. Moreover, a confidence interval for the nonparametric treatment effect is given.
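The rank procedure described here is closely related to the Brunner-Munzel test, for which scipy ships an implementation with the t-approximation; the tied, heteroscedastic data below are an assumed example:

```python
# Nonparametric Behrens-Fisher: Brunner-Munzel rank test with a
# t-approximation (Satterthwaite-type degrees of freedom).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.round(rng.normal(0.0, 1.0, size=15), 1)    # rounding induces ties
y = np.round(rng.normal(0.5, 2.0, size=25), 1)    # unequal variance, unequal n

res = stats.brunnermunzel(x, y, distribution="t")  # t-approximation (default)
print(f"W = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```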

16.

Background  

Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically.

17.
A model of pre-planned single-joint movements performed without feedback is considered. Modifications of this movement result from transformation of a trajectory pattern f(t) in space and time. The control system adjusts the movement to concrete external conditions, specifying values of the transform parameters before the movement is performed. The pre-planned movement is considered to be a simple one if the transform can be approximated by an affine transform of the movement space and time. In this case, the trajectory of the movement is x(t) = A f(t/τ + s) + p, where A and 1/τ are the space and time scales and s and p are the time and space translations. The variability of movements is described by time profiles of variances and covariances of the trajectory x(t), velocity v(t), and acceleration a(t). It is assumed that the variability is defined only by parameter variations. From this assumption follows the main finding of this work: the variability time profiles can be expanded on a special system of basic functions corresponding to the established movement parameters. In particular, the basic functions of the variance time profiles, reflecting spatial and temporal scaling, are x²(t) and t²v²(t) for the trajectory, v²(t) and (v(t) + t·a(t))² for the velocity, and a²(t) and (2a(t) + t·j(t))², where j(t) = d³x(t)/dt³, for the acceleration. The variability of a model of a reaching movement was studied analytically. The model predicts certain peculiarities of the form of the time profiles (e.g., the variance time profile of velocity is bimodal, that of acceleration is trimodal, etc.). Experimental measurements confirmed the predictions. Their consistency allows them to be considered invariant properties of reaching movements. The conclusion can be made that reaching movement belongs to the type of simple pre-planned movements. For a more complex movement, time profiles of variability were also measured and explained by the model of movements of this type. Thus, a movement can be attributed to the type of simple pre-planned ones by testing its variability.
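A small numerical sketch of these basis functions (Python), using an assumed minimum-jerk-style trajectory in place of the paper's pattern f(t):

```python
# Compute the variance-profile basis functions x^2, t^2 v^2, v^2,
# (v + t*a)^2, a^2 and (2a + t*j)^2 for a smooth reaching trajectory.
import numpy as np

t = np.linspace(0.0, 1.0, 501)                 # movement of duration 1 s
x = 10 * t**3 - 15 * t**4 + 6 * t**5           # minimum-jerk position profile

v = np.gradient(x, t)                          # velocity
a = np.gradient(v, t)                          # acceleration
j = np.gradient(a, t)                          # jerk, d^3 x / dt^3

basis = {
    "trajectory":   (x**2, (t * v) ** 2),
    "velocity":     (v**2, (v + t * a) ** 2),
    "acceleration": (a**2, (2 * a + t * j) ** 2),
}
for name, (spatial, temporal) in basis.items():
    print(name, "peak of temporal-scaling basis at t =", t[np.argmax(temporal)])
```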

18.
Paired data arise in a wide variety of applications where the underlying distribution of the paired differences is often unknown. When the differences are normally distributed, the t-test is optimal. On the other hand, if the differences are not normal, the t-test can have substantially less power than the appropriate optimal test, which depends on the unknown distribution. In textbooks, when the normality of the differences is questionable, the nonparametric Wilcoxon signed-rank test is typically suggested. An adaptive procedure that uses the Shapiro-Wilk test of normality to decide whether to use the t-test or the Wilcoxon signed-rank test has been employed in several studies. Faced with heavy-tailed data, the U.S. Environmental Protection Agency (EPA) introduced another approach: it applies both the sign and t-tests to the paired differences, and the alternative hypothesis is accepted if either test is significant. This paper investigates the statistical properties of a currently used adaptive test and the EPA's method, and suggests an alternative technique. The new procedure is easy to use and generally has higher empirical power than currently used methods, especially when the differences are heavy-tailed.
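A sketch of the two decision rules under study (Python; the Cauchy-distributed differences and the naive use of the full level α in both branches of the composite rule are assumptions):

```python
# Adaptive rule: a Shapiro-Wilk pretest chooses between the paired t-test
# and the Wilcoxon signed-rank test. EPA-style rule: run both the sign test
# and the t-test, accept the alternative if either is significant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
d = rng.standard_cauchy(size=30) + 0.5       # heavy-tailed paired differences

alpha = 0.05
if stats.shapiro(d).pvalue > alpha:          # normality plausible -> t-test
    p_adaptive = stats.ttest_1samp(d, 0.0).pvalue
else:                                        # otherwise -> signed-rank test
    p_adaptive = stats.wilcoxon(d).pvalue

p_sign = stats.binomtest((d > 0).sum(), n=len(d), p=0.5).pvalue   # sign test
p_t = stats.ttest_1samp(d, 0.0).pvalue
epa_reject = (p_sign < alpha) or (p_t < alpha)

print(f"adaptive p = {p_adaptive:.4f}, EPA composite rejects: {epa_reject}")
```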

19.
The two‐sided Simes test is known to control the type I error rate with bivariate normal test statistics. For one‐sided hypotheses, control of the type I error rate requires that the correlation between the bivariate normal test statistics is non‐negative. In this article, we introduce a trimmed version of the one‐sided weighted Simes test for two hypotheses which rejects if (i) the one‐sided weighted Simes test rejects and (ii) both p‐values are below one minus the respective weighted Bonferroni adjusted level. We show that the trimmed version controls the type I error rate at nominal significance level α if (i) the common distribution of test statistics is point symmetric and (ii) the two‐sided weighted Simes test at level 2α controls the level. These assumptions apply, for instance, to bivariate normal test statistics with arbitrary correlation. In a simulation study, we compare the power of the trimmed weighted Simes test with the power of the weighted Bonferroni test and the untrimmed weighted Simes test. An additional result of this article ensures type I error rate control of the usual weighted Simes test under a weak version of the positive regression dependence condition for the case of two hypotheses. This condition is shown to apply to the two‐sided p‐values of one‐ or two‐sample t‐tests for bivariate normal endpoints with arbitrary correlation and to the corresponding one‐sided p‐values if the correlation is non‐negative. The Simes test for such types of bivariate t‐tests has not been considered before. According to our main result, the trimmed version of the weighted Simes test then also applies to the one‐sided bivariate t‐test with arbitrary correlation.
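A direct transcription of the trimmed rule as stated above (Python; the two-hypothesis weighted-Simes form and the pairing of each p-value with its own weight are read from the abstract, so treat them as assumptions):

```python
# Trimmed one-sided weighted Simes test for two hypotheses: reject iff
# (i) the weighted Simes test rejects and (ii) each p_i <= 1 - w_i * alpha,
# i.e. both p-values lie below one minus the weighted Bonferroni level.
def weighted_simes_reject(p1, p2, w1, w2, alpha=0.025):
    # Order the p-values; accumulate the weights along the ordering.
    (pa, wa), (pb, wb) = sorted([(p1, w1), (p2, w2)])
    return pa <= wa * alpha or pb <= (wa + wb) * alpha

def trimmed_weighted_simes_reject(p1, p2, w1, w2, alpha=0.025):
    simes = weighted_simes_reject(p1, p2, w1, w2, alpha)
    trim = (p1 <= 1 - w1 * alpha) and (p2 <= 1 - w2 * alpha)  # trimming condition
    return simes and trim

print(trimmed_weighted_simes_reject(0.010, 0.990, w1=0.5, w2=0.5))  # trimmed away
print(trimmed_weighted_simes_reject(0.010, 0.400, w1=0.5, w2=0.5))  # rejected
```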

20.
Microarray experiments are being increasingly used in molecular biology. A common task is to detect genes with differential expression across two experimental conditions, such as two different tissues or the same tissue at two time points of biological development. To take proper account of statistical variability, some statistical approaches based on the t-statistic have been proposed. In constructing the t-statistic, one needs to estimate the variance of gene expression levels. With a small number of replicated array experiments, the variance estimation can be challenging. For instance, although the sample variance is unbiased, it may have large variability, leading to a large mean squared error. For duplicated array experiments, a new approach based on simple averaging has recently been proposed in the literature. Here we consider two more general approaches based on nonparametric smoothing. Our goal is to assess the performance of each method empirically. The three methods are applied to a colon cancer data set containing 2,000 genes. Using two arrays, we compare the variance estimates obtained from the three methods. We also consider their impact on the t-statistics. Our results indicate that the three methods give variance estimates close to each other. Due to its simplicity and generality, we recommend the use of the smoothed sample variance for data with a small number of replicates.
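A simple sketch of the smoothing idea (Python; a running-mean smoother over intensity-ordered genes stands in for the paper's nonparametric smoothers, and the data are simulated):

```python
# Smooth gene-wise sample variances against mean intensity so each gene
# borrows variance information from genes of similar expression level.
import numpy as np

rng = np.random.default_rng(6)
n_genes, n_rep = 2000, 2
mu = rng.uniform(4, 12, size=n_genes)
data = mu[:, None] + rng.normal(scale=0.05 * mu[:, None], size=(n_genes, n_rep))

m = data.mean(axis=1)
s2 = data.var(axis=1, ddof=1)            # only 1 df with two arrays: very noisy

order = np.argsort(m)                    # sort genes by mean intensity
window = 101
kernel = np.ones(window) / window
s2_smooth = np.empty(n_genes)
s2_smooth[order] = np.convolve(s2[order], kernel, mode="same")  # running mean

mid = order[1000:1005]                   # a few mid-intensity genes
print(np.c_[s2[mid], s2_smooth[mid]])    # raw vs. smoothed variance estimates
```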
