Similar Documents
20 similar documents found (search time: 15 ms)
1.
Combining information across genes in the statistical analysis of microarray data is desirable because of the relatively small number of data points obtained for each individual gene. Here we develop an estimator of the error variance that can borrow information across genes using the James-Stein shrinkage concept. A new test statistic (FS) is constructed using this estimator. The new statistic is compared with other statistics used to test for differential expression: the gene-specific F test (F1), the pooled-variance F statistic (F3), a hybrid statistic (F2) that uses the average of the individual and pooled variances, the regularized t-statistic, the posterior odds statistic B, and the SAM t-test. The FS test shows the best or nearly the best power for detecting differentially expressed genes over a wide range of simulated data in which the variance components associated with individual genes are either homogeneous or heterogeneous. Thus, FS provides a powerful and robust approach to testing differential expression that uses information not available to gene-by-gene testing approaches and does not suffer from the biases of the pooled-variance approach.
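A minimal Python sketch of the shrinkage idea, assuming a positive-part James-Stein estimator applied to gene-wise log sample variances and shrunk toward their grand mean. It is not the paper's FS estimator, which adds its own corrections and feeds the result into an F statistic; the function names are hypothetical. The sketch relies on the standard fact that, under normality, log(s²) on `df` degrees of freedom has sampling variance trigamma(df/2).

```python
import math


def trigamma(x: float) -> float:
    """Trigamma function psi_1(x) for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    # Recurrence psi_1(x) = psi_1(x + 1) + 1 / x**2 moves x into the asymptotic range
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    result += inv + inv2 / 2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return result


def js_shrink_variances(s2: list[float], df: int) -> list[float]:
    """Positive-part James-Stein shrinkage of gene-wise log variances toward their mean."""
    k = len(s2)
    if k < 4:
        raise ValueError("need at least 4 genes to shrink toward an estimated mean")
    logs = [math.log(v) for v in s2]
    grand = sum(logs) / k
    spread = sum((z - grand) ** 2 for z in logs)
    # Under normality, Var(log s^2) = trigamma(df / 2) for every gene
    v = trigamma(df / 2)
    shrink = 1.0 if spread == 0 else min(1.0, (k - 3) * v / spread)
    return [math.exp(grand + (1 - shrink) * (z - grand)) for z in logs]


raw = [0.2, 1.5, 0.9, 3.0, 0.6, 1.1]
print([round(v, 3) for v in js_shrink_variances(raw, df=4)])
```

Each shrunken variance lies between the gene's own estimate and the geometric mean across genes; with few degrees of freedom (large sampling noise) the pull toward the common value is stronger.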

2.
Testing for unequal variances is usually performed to check the validity of the assumptions that underlie standard tests for differences between means (the t-test and ANOVA). However, existing methods for testing for unequal variances (Levene's test and Bartlett's test) are notoriously non-robust to departures from normality, especially for small sample sizes. Moreover, although these methods were designed to deal with one hypothesis at a time, modern applications (such as microarray and fMRI experiments) often involve parallel testing over a large number of levels (genes or voxels). In these settings, a shift in variance may be biologically relevant, perhaps even more so than a change in the mean. This paper proposes a parsimonious model for parallel testing of the equal-variance hypothesis. It is designed to work well when the number of tests is large, typically much larger than the sample sizes. The tests are implemented using an empirical Bayes estimation procedure which 'borrows information' across levels. The method is shown to be quite robust to deviations from normality and to substantially increase the power to detect differences in variance over more traditional approaches, even when the normality assumption is valid.
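For context, a minimal Python sketch of Bartlett's statistic, one of the classical one-hypothesis-at-a-time tests the abstract contrasts with its empirical Bayes approach (the proposed method itself is not shown). Under equal variances and normality the statistic is approximately chi-square with k - 1 degrees of freedom; the function name is illustrative.

```python
import math
from statistics import variance


def bartlett_statistic(groups: list[list[float]]) -> float:
    """Bartlett's test statistic for H0: all k group variances are equal."""
    k = len(groups)
    ns = [len(g) for g in groups]
    n_total = sum(ns)
    s2 = [variance(g) for g in groups]  # unbiased sample variances
    pooled = sum((n - 1) * v for n, v in zip(ns, s2)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(ns, s2)
    )
    # Bartlett's small-sample correction factor
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


print(bartlett_statistic([[1.0, 2.0, 3.0, 4.0], [0.0, 10.0, 20.0, 30.0]]))
```

Running this separately for every gene or voxel is the "parallel testing" setting the abstract describes, where each test sees only its own few observations.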

3.
This paper investigates homogeneity tests of rate ratios in stratified matched-pair studies on the basis of asymptotic and bootstrap-resampling methods. Based on the efficient score approach, we develop a simple and computationally tractable score test statistic. Several other homogeneity test statistics are also proposed on the basis of the weighted least-squares estimate and logarithmic transformation. Sample size formulae are derived to guarantee a pre-specified power for the proposed tests at a pre-specified significance level. Empirical results confirm that (i) the modified score statistic based on the bootstrap-resampling method performs better, in the sense that its empirical type I error rate is much closer to the nominal level than those of the other tests and its power is greater, and is hence recommended, whilst the statistics based on the weighted least-squares estimate and logarithmic transformation are slightly conservative under some of the settings considered; and (ii) the derived sample size formulae are rather accurate, in the sense that the empirical powers obtained with the estimated sample sizes are very close to the pre-specified nominal powers. A real example is used to illustrate the proposed methodologies.
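A minimal Python sketch of a weighted least-squares homogeneity statistic on the log scale, in the spirit of the statistics based on "the weighted least-squares estimate and logarithmic transformation" mentioned above. It assumes each stratum's rate ratio and the variance of its logarithm have already been estimated; the matched-pair variance formula and the paper's score and bootstrap tests are not reproduced, and the function name is hypothetical.

```python
import math


def log_ratio_homogeneity(ratios: list[float], log_variances: list[float]) -> tuple[float, float]:
    """Cochran-type weighted least-squares homogeneity statistic on log ratios.

    log_variances[j] is an (assumed available) variance estimate of log(ratios[j]).
    Returns (Q, pooled ratio); under H0, Q is approximately chi-square with J - 1 df.
    """
    logs = [math.log(r) for r in ratios]
    weights = [1.0 / v for v in log_variances]  # inverse-variance weights
    pooled_log = sum(w * z for w, z in zip(weights, logs)) / sum(weights)
    q = sum(w * (z - pooled_log) ** 2 for w, z in zip(weights, logs))
    return q, math.exp(pooled_log)


print(log_ratio_homogeneity([0.8, 1.5, 3.0], [0.04, 0.04, 0.04]))
```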

4.
The classical normal-theory tests of the null hypothesis of common variance and the classical estimates of scale have long been known to be quite nonrobust to even mild deviations from normality for moderate sample sizes. Levene (1960) suggested a one-way ANOVA-type statistic as a robust test. Brown and Forsythe (1974) considered a modified version of Levene's test that replaces the sample means with sample medians as estimates of population location, and their test is computationally the simplest of the three tests recommended by Conover, Johnson, and Johnson (1981) in terms of robustness and power. In this paper, a new robust and powerful test for homogeneity of variances is proposed, based on a modification of Levene's test that uses the weighted likelihood estimates (Markatou, Basu, and Lindsay, 1996) of the population means. For two and three populations, the proposed test using the Hellinger-distance-based weighted likelihood estimates achieves better empirical level and power than the Brown-Forsythe test for symmetric distributions with heavier tails than the normal, and higher empirical power for skewed distributions, when F-distribution critical values are used.
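A minimal Python sketch of the Levene/Brown-Forsythe statistic. The `center` argument is where the location estimate enters: `median` gives Brown-Forsythe and `mean` gives Levene (1960). The paper's weighted likelihood estimates would plug into the same slot, but they are not implemented here; the function name is illustrative.

```python
from collections.abc import Callable
from statistics import mean, median


def levene_w(groups: list[list[float]],
             center: Callable[[list[float]], float] = median) -> float:
    """Levene-type W statistic, referred to an F(k - 1, N - k) distribution under H0."""
    z = []
    for g in groups:
        c = center(g)  # location estimate for this group
        z.append([abs(x - c) for x in g])  # absolute deviations from it
    k = len(z)
    n_total = sum(len(g) for g in z)
    group_means = [mean(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((x - m) ** 2 for g, m in zip(z, group_means) for x in g)
    return (n_total - k) / (k - 1) * between / within


print(levene_w([[1.0, 2.0, 3.0, 4.0], [-10.0, 0.0, 10.0, 20.0]]))
```

Because W is a one-way ANOVA on absolute deviations, two groups that are pure shifts of each other give W = 0 regardless of the location estimator.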

5.
The Cochran-Armitage trend test is commonly used as a genotype-based test for candidate-gene association. For each underlying genetic model there is a particular set of scores assigned to the genotypes that maximizes the test's power. When the variance of the test statistic is known, formulas for approximate power and the associated sample size are readily obtained. In practice, however, the variance of the test statistic must be estimated. We present formulas for the sample size required to achieve a prespecified power that account for the need to estimate this variance. When the underlying genetic model is unknown, one can incur a substantial loss of power if a test suited to one mode of inheritance is used when another mode is the true one. Thus, tests with good power relative to the optimal test for each model are useful. These tests are called efficiency robust, and we study two of them: the maximin efficiency robust test, a linear combination of the standardized optimal tests that has high efficiency, and the MAX test, the maximum of the standardized optimal tests. Simulation results on the robustness of these two tests indicate that the more computationally involved MAX test is preferable.
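A minimal Python sketch of the trend test and the MAX statistic, assuming the commonly used genotype scores (0, x, 1) with x = 0, 1/2 and 1 for the recessive, additive and dominant models. Genotype counts are ordered (non-carrier, heterozygote, risk homozygote). The sample-size formulas from the abstract are not reproduced, and the function names are illustrative.

```python
import math

# Genotype scores commonly associated with each mode of inheritance
MODEL_SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def catt_z(cases: list[int], controls: list[int], scores: tuple[float, ...]) -> float:
    """Cochran-Armitage trend test Z for genotype counts in cases and controls."""
    totals = [r + s for r, s in zip(cases, controls)]
    r_sum, s_sum = sum(cases), sum(controls)
    n = r_sum + s_sum
    sx_r = sum(x * r for x, r in zip(scores, cases))
    sx_n = sum(x * m for x, m in zip(scores, totals))
    sxx_n = sum(x * x * m for x, m in zip(scores, totals))
    numerator = math.sqrt(n) * (n * sx_r - r_sum * sx_n)
    denominator = math.sqrt(r_sum * s_sum * (n * sxx_n - sx_n ** 2))
    return numerator / denominator


def max_statistic(cases: list[int], controls: list[int]) -> float:
    """MAX test statistic: the largest |Z| over the model-specific trend tests."""
    return max(abs(catt_z(cases, controls, s)) for s in MODEL_SCORES.values())


print(max_statistic([30, 50, 20], [50, 40, 10]))
```

Each Z is approximately standard normal under the null, but MAX is not: its significance level must account for the correlation between the three Z statistics (for example by simulation), so it should not be compared with ordinary normal critical values.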

6.
MOTIVATION: The desire to compare molecular phylogenies has stimulated the design of numerous tests. Most of these tests are formulated in a frequentist framework, and it is not known how they compare with Bayes procedures. I propose here two new Bayes tests that either compare pairs of trees (Bayes hypothesis test, BHT) or test each tree against an average of the trees included in the analysis (Bayes significance test, BST). RESULTS: The algorithm, based on a standard Metropolis-Hastings sampler, integrates out nuisance parameters and estimates the probability of the data under each topology. These quantities are used to estimate Bayes factors for composite vs. composite hypotheses. Based on two data sets, the BHT and BST are shown to construct confidence sets similar to those of the bootstrap and the Shimodaira-Hasegawa test, respectively. This suggests that the known differences among previous tests are mainly due to the null hypothesis considered.

7.
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim to control both Type I and Type II error rates; however, real microarray data do not always fit their distributional assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays (SAM), another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, its fold change criteria are problematic and can critically alter the conclusion of a study as a result of compositional changes of the control data set in the analysis. We propose a novel approach that combines resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but is also impervious to the fold change threshold, since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, SAM, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control between the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates than Smyth's parametric method when data are not normally distributed.
The Resampling-based empirical Bayes Methods also offer higher statistical power than SAM when the proportion of significantly differentially expressed genes is large, for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next-generation sequencing (RNA-seq) data analysis.
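The resampling ingredient can be illustrated with a plain exact permutation test for a single gene. This is a minimal Python sketch of that building block only, not the Resampling-based empirical Bayes Methods, which additionally combine resampling with empirical Bayes estimation across genes; the function name is illustrative. Enumerating every split is feasible only for small groups; larger designs typically draw random permutations instead.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(x: list[float], y: list[float]) -> float:
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = x + y
    n_x = len(x)
    observed = abs(mean(x) - mean(y))
    count = total = 0
    # Enumerate every way of relabelling n_x of the pooled values as group x
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so splits tying the observed difference are counted
        if abs(mean(gx) - mean(gy)) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total


print(permutation_pvalue([1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]))
```

No distributional assumption is made about the expression values, which is why permutation-based methods tolerate non-normal data.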

8.
The importance of variance modelling is now widely recognized in the analysis of microarray data. In particular, the power and accuracy of statistical tests for differential gene expression are highly dependent on variance modelling. The aim of this paper is to use a structural model for the variances, which includes a condition effect and a random gene effect, and to propose a simple procedure for estimating these parameters by working on the empirical variances. The proposed variance model was compared with various methods on both real and simulated data. It proved to be more powerful than gene-by-gene analysis and more robust, in terms of the number of false positives, than the homogeneous variance model. It performed well compared with recently proposed approaches such as SAM and VarMixt, even for a small number of replicates, and performed similarly to Limma. The main advantage of the structural model is that, thanks to the use of a linear mixed model on the logarithm of the variances, various factors of variation can easily be incorporated into the model, which is not the case for previously proposed empirical Bayes methods. It is also very fast to compute and is suited to the comparison of more than two conditions.

9.
There is growing interest in understanding how the brain uses synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronization and elucidating its role in cognitive tasks. However, the high dimensionality of PLV data creates a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer a severe loss of power because they fail to exploit the complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimensions. Previously, we showed that hierarchical FDR and optimal discovery procedures could be effectively applied to PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new empirical Bayes perspective and propose applying the local FDR method (locFDR; Efron, 2001) to PLV synchrony analysis, computing the FDR as the posterior probability that an observed statistic belongs to the null hypothesis. We demonstrate the application of Efron's empirical Bayes approach to PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR, and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that locFDR can effectively control false positives without compromising the power of PLV synchrony inference. Applied to the experimental data, locFDR detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries.
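A minimal Python sketch of the local FDR idea, fdr(z) = pi0 * f0(z) / f(z). To keep it short, it assumes the theoretical N(0, 1) null, a default pi0 = 1 (conservative), and a Gaussian kernel density estimate for the mixture density f; Efron's own implementation instead fits f by regression on a histogram and can estimate an empirical null and pi0. The function name and data are illustrative.

```python
from statistics import NormalDist, stdev


def local_fdr(z: list[float], pi0: float = 1.0) -> list[float]:
    """Simplified local FDR: fdr(z_i) = pi0 * f0(z_i) / f(z_i), capped at 1.

    f0 is the theoretical N(0, 1) null density; the mixture density f is a
    Gaussian kernel density estimate of all z-scores (Silverman's bandwidth).
    """
    n = len(z)
    bandwidth = 1.06 * stdev(z) * n ** (-1 / 5)
    null = NormalDist(0.0, 1.0)
    kernel = NormalDist(0.0, bandwidth)
    out = []
    for zi in z:
        f_hat = sum(kernel.pdf(zi - zj) for zj in z) / n
        out.append(min(1.0, pi0 * null.pdf(zi) / f_hat))
    return out


# Mostly null-looking z-scores plus a few large ones (illustrative data only)
scores = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)] + [4.0, 4.2, 4.5, 5.0]
fdr = local_fdr(scores)
print(round(fdr[-1], 4), round(fdr[100], 4))
```

Each value is read directly as an estimated posterior probability that the corresponding statistic is null, so thresholding it (e.g. fdr < 0.2) selects discoveries without a separate step-up procedure.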

10.
MOTIVATION: One of the recently developed statistics for identifying differentially expressed genetic networks is Hotelling's T² statistic, which is a quadratic form of the difference in linear functions of the means of gene expression between two types of tissue samples, and so its power is limited. RESULTS: To improve the power of test statistics, a general statistical framework for constructing non-linear tests is presented, and two specific non-linear test statistics that use non-linear transformations of the means are developed. Asymptotic distributions of the non-linear test statistics under the null and alternative hypotheses are derived. It is proved that, under some conditions, the power of the non-linear test statistics is higher than that of the T² statistic. Beyond the theory, to evaluate their performance in practice, the non-linear test statistics are applied to two real datasets. The preliminary results demonstrate that the P-values of the non-linear statistics for testing differential expression of the genetic networks are much smaller than those of the T² statistic. Furthermore, simulations show that the Type I error rates of the non-linear statistics agree with the threshold used and that the statistics follow the χ² distribution. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
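For reference, a minimal Python sketch of the two-sample Hotelling's T² baseline that the proposed non-linear statistics are compared against (the non-linear statistics themselves are not shown). It assumes a pooled covariance matrix and uses a small Gaussian-elimination solver instead of an explicit inverse; the function names are illustrative.

```python
def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_t2(x: list[list[float]], y: list[list[float]]) -> float:
    """Two-sample Hotelling's T^2 with a pooled covariance matrix (rows = samples)."""
    n1, n2, p = len(x), len(y), len(x[0])
    mx = [sum(row[j] for row in x) / n1 for j in range(p)]
    my = [sum(row[j] for row in y) / n2 for j in range(p)]
    s = [[0.0] * p for _ in range(p)]
    for data, mu in ((x, mx), (y, my)):
        for row in data:
            for i in range(p):
                for j in range(p):
                    s[i][j] += (row[i] - mu[i]) * (row[j] - mu[j])
    s = [[v / (n1 + n2 - 2) for v in row] for row in s]
    d = [a - b for a, b in zip(mx, my)]
    w = _solve(s, d)  # w = S^{-1} d
    return n1 * n2 / (n1 + n2) * sum(di * wi for di, wi in zip(d, w))


print(hotelling_t2([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]],
                   [[3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]))
```

With a single gene (p = 1) the statistic reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check.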

11.
To fully elucidate the functional relationship between DNA methylation and histone hypoacetylation in gene silencing, we have developed an integrated "triple" microarray system that allows us to begin to decipher the influence of epigenetic hierarchies on the regulation of gene expression in cancer cells. Our hypothesis is that, in the promoter region of a silenced gene, reversal of two epigenetic factors (i.e., DNA demethylation and/or histone hyperacetylation) is highly correlated with gene re-expression after treatment of the human epithelial ovarian cancer cell line CP70 with the drug combination of 5-aza-2'-deoxycytidine (DAC), a demethylating agent, and trichostatin A (TSA), an inhibitor of histone deacetylases. To estimate the posterior probabilities of altered expression, DNA methylation, and histone acetylation status for genes measured with the triple-microarray system, we employed an established empirical Bayes model. Two methods are proposed to test our hypothesis that DNA demethylation and histone hyperacetylation are highly correlated among the up-regulated genes. One method follows a weighted least squares regression, while the other is derived from a chi-square statistic. The results of these approaches, which were further verified through bootstrap analyses, support the proposed epigenetic correlation (p < 0.001). Further simulations suggest that the power of the two tests is robust even if the constant-variance and normality assumptions do not hold.
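As a generic illustration of a chi-square test of association, here is a minimal Python sketch of the Pearson statistic for a 2 x 2 table. The layout is a hypothetical one (rows: demethylated yes/no; columns: hyperacetylated yes/no, among up-regulated genes); the paper's chi-square-derived statistic may be constructed differently.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value) using the chi-square distribution with 1 df.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


print(pearson_chi2_2x2(20, 5, 5, 20))
```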

12.
The availability of cheap and abundant molecular markers has led to plant-breeding methods that rely on the prediction of genotypic value from marker data, but published information is lacking on the accuracy of genotypic value predictions with empirical data in plants. Our objectives were to (1) determine the accuracy of genotypic value predictions from multiple linear regression (MLR) and genomewide selection via best linear unbiased prediction (BLUP) in biparental plant populations; (2) assess the accuracy of predictions for different numbers of markers (N_M) and progenies (N_P) used in estimation; and (3) determine whether an empirical Bayes approach for modeling the variances of individual markers and of epistatic effects leads to more accurate predictions in empirical data. We divided each of four maize (Zea mays L.) datasets, one Arabidopsis dataset, and two barley (Hordeum vulgare L.) datasets into an estimation set, in which marker effects were calculated, and a test set, in which genotypic values were predicted from markers. Predictions were more accurate with BLUP than with MLR. Predictions became more accurate as N_P and N_M increased, until sufficient genome coverage was reached. Modeling marker variances with the empirical Bayes method sometimes led to slightly better predictions, but the accuracy of the different variants of the empirical Bayes method was often inconsistent. In nearly all cases, the accuracy with BLUP was not significantly different from the highest accuracy across all methods. Accounting for epistasis in the empirical Bayes procedure led to poorer predictions. We concluded that, among the methods considered, the quick and simple BLUP approach is the method of choice for predicting genotypic value in biparental plant populations.
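A minimal Python sketch contrasting MLR and marker BLUP, assuming the standard equivalence between BLUP with a common variance for all marker effects and ridge regression with penalty lam = (residual variance) / (marker-effect variance). Setting lam = 0 gives ordinary least squares (MLR) when X'X is invertible. The function names are hypothetical, and the paper's empirical Bayes variants with marker-specific variances are not shown.

```python
def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_marker_effects(X: list[list[float]], y: list[float], lam: float) -> list[float]:
    """Solve (X'X + lam * I) b = X'y for marker effects b.

    X is individuals x markers; X and y are assumed centred, so no intercept is fitted.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return _solve(xtx, xty)


def predict(X: list[list[float]], b: list[float]) -> list[float]:
    """Predicted genotypic values for the rows of X (e.g. a test set)."""
    return [sum(x * bi for x, bi in zip(row, b)) for row in X]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge_marker_effects(X, y, 0.0), ridge_marker_effects(X, y, 10.0))
```

The penalty shrinks all marker effects toward zero, which keeps the system solvable even when there are more markers than progenies, the situation in which MLR breaks down.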

13.
Zhang K, Wiener H, Beasley M, George V, Amos CI, Allison DB. Genetics, 2006, 173(4): 2283-2296
Individual genome scans for quantitative trait loci (QTL) mapping often suffer from low statistical power and imprecise estimates of QTL location and effect. This lack of precision yields large confidence intervals for QTL location, which are problematic for subsequent fine mapping and positional cloning. In prioritizing areas for follow-up after an initial genome scan and in evaluating the credibility of apparent linkage signals, investigators typically examine the results of other genome scans of the same phenotype and informally update their beliefs about which linkage signals in their own scan most merit confidence and follow-up, via a subjective-intuitive integration approach. A method that acknowledges the wisdom of this general paradigm but formally borrows information from other scans to increase objectivity would be beneficial. We developed an empirical Bayes analytic method to integrate information from multiple genome scans. The linkage statistic obtained from a single genome scan study is updated by incorporating statistics from other genome scans as prior information. This technique does not require that all studies have an identical marker map or a common estimated QTL effect. The updated linkage statistic can then be used to estimate QTL location and effect. We evaluate the performance of our method using extensive simulations based on actual marker spacing and allele frequencies from available data. Results indicate that the empirical Bayes method can account for between-study heterogeneity, estimate the QTL location and effect more precisely, and provide narrower confidence intervals than any single individual study. We also compared the empirical Bayes method with a method originally developed for meta-analysis (a closely related but distinct purpose). In the face of marked heterogeneity among studies, the empirical Bayes method outperforms the comparator.
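The updating step can be illustrated with a generic normal-normal empirical Bayes calculation. This is a minimal Python sketch, assuming the linkage statistic at a position is approximately normal with a known standard error and that a normal prior is estimated crudely from the same statistic in other scans; it is not the authors' method, which also handles differing marker maps and between-study heterogeneity. The function names are hypothetical.

```python
from statistics import mean, stdev


def normal_prior_from_studies(other_stats: list[float]) -> tuple[float, float]:
    """Crude empirical prior: mean and SD of the same statistic in other scans."""
    return mean(other_stats), stdev(other_stats)


def posterior_update(stat: float, se: float, prior_mean: float,
                     prior_sd: float) -> tuple[float, float]:
    """Conjugate normal update of one study's statistic; returns (mean, SD)."""
    w_data, w_prior = 1 / se ** 2, 1 / prior_sd ** 2  # precisions
    post_var = 1 / (w_data + w_prior)
    return post_var * (w_data * stat + w_prior * prior_mean), post_var ** 0.5


prior_mean, prior_sd = normal_prior_from_studies([1.8, 2.4, 3.1])
print(posterior_update(3.5, 1.0, prior_mean, prior_sd))
```

The posterior mean is a precision-weighted average of the study's own statistic and the prior, and the posterior SD is always smaller than either input SD, which is the source of the narrower intervals.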

14.
MOTIVATION: Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. The usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This results in the loss of valuable information about genewise variability. RESULTS: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. AVAILABILITY: The methodology is implemented in the limma software package for R, available from the CRAN repository http://www.r-project.org
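In R, this workflow is usually run through limma's `duplicateCorrelation()`, then `lmFit()` (with its `ndups` and `correlation` arguments) and `eBayes()`. The minimal Python sketch below shows only the variance-moderation formula used in the empirical Bayes step, taking the prior variance `s0_sq` and prior degrees of freedom `d0` as given inputs; limma estimates them from all genes, which is not reproduced here, and the function name is illustrative.

```python
import math


def moderated_t(coef: float, s2: float, d: float, s0_sq: float, d0: float,
                unscaled_sd: float) -> tuple[float, float]:
    """Moderated t statistic in the style of an empirical Bayes variance step.

    coef: estimated log-fold change for a gene; s2: its residual variance on d df;
    s0_sq, d0: prior variance and prior df (estimated across genes in practice);
    unscaled_sd: sqrt of the unscaled variance of coef from the linear model.
    Returns (moderated t, total df d0 + d).
    """
    # Posterior variance: df-weighted average of the prior and gene-wise variances
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    return coef / (unscaled_sd * math.sqrt(s2_post)), d0 + d


print(moderated_t(1.0, 4.0, 5, 1.0, 5, 0.5))
```

With d0 = 0 the statistic reduces to the ordinary t; larger d0 pulls noisy gene-wise variances toward the common value and adds d0 degrees of freedom.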

15.
Statistical power of the classical twin design was revisited. The approximate sampling variances of a least-squares estimate of the heritability in a univariate analysis and of an estimate of the genetic correlation coefficient in a bivariate analysis were derived analytically for the ACE model. Statistical power to detect additive genetic variation under the ACE model was derived analytically for least-squares, goodness-of-fit and maximum likelihood-based test statistics. The noncentrality parameter for the likelihood ratio test statistic is shown to be a simple function of the MZ and DZ intraclass correlation coefficients and the proportion of MZ and DZ twin pairs in the sample. All theoretical results were validated using simulation. The derived expressions can be used to calculate the power of the classical twin design in a simple and rapid manner.
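For orientation, a minimal Python sketch of the classical Falconer estimates of the ACE variance proportions, which are simple functions of the MZ and DZ intraclass correlations. These are textbook moment estimates, not the paper's derived sampling variances or noncentrality parameter; the function name is illustrative.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Falconer-style estimates of ACE variance proportions from twin correlations."""
    return {
        "a2": 2 * (r_mz - r_dz),  # additive genetic variance (heritability)
        "c2": 2 * r_dz - r_mz,    # shared (common) environment
        "e2": 1 - r_mz,           # unique environment, including error
    }


print(falconer_ace(0.8, 0.5))
```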

16.
Scientific discovery requires both abstract, theoretically defined concepts and discovery operations formed by sets of rules that permit the empirical detection of instances of those concepts. In this paper, I examine the ontological status of discovery operations and the tests employed to evaluate them in evolutionary biology. Attention is drawn to the distinction between nomothetic (universal, predictive) and ideographic (historical, retrodictive) discovery operations, and between complementary and exclusive discovery operations. Three types of tests of discovery operations are commonly employed in evolutionary biology. Theoretical tests aim to show that a discovery operation is inconsistent with accepted, well-corroborated, empirical theories. Empirical tests evaluate the performance of competing discovery operations in terms of their results when applied to the same empirical data sets. Philosophical tests aim to show that an operation is inconsistent with logical and epistemological principles. Appropriately designed theoretical and philosophical tests of ideographic discovery operations may be scientifically valid. Empirical tests, however, are incapable of evaluating the scientific merits of competing discovery operations. Nonetheless, empirical comparisons (not tests) of competing discovery operations may provide insight into the ways discovery operations may be misleading and therefore may play an important role in stimulating critical debate and eventually establishing a scientifically optimal operation. In practice, theoretical and philosophical tests are often combined to test competing discovery operations as rigorously as possible.

17.
Wu B, Guan Z, Zhao H. Biometrics, 2006, 62(3): 735-744
Nonparametric and parametric approaches have been proposed to estimate the false discovery rate under the assumption of independent hypothesis tests. The parametric approach has been shown to perform better than the nonparametric approaches. In this article, we study the nonparametric approaches and quantify the underlying relations between parametric and nonparametric approaches. Our study reveals the conservative nature of the nonparametric approaches and establishes connections between the empirical Bayes method and p-value-based nonparametric methods. Based on our results, we advocate using the parametric approach, or directly modeling the test statistics using the empirical Bayes method.
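As a concrete example of a p-value-based nonparametric approach, here is a minimal Python sketch of Storey's estimator, chosen here as a representative; the abstract does not say which nonparametric estimators are studied. pi0 is estimated from the p-values above a tuning value `lam`, and the FDR of the rule "reject when p <= t" is then estimated; the function names are illustrative.

```python
def storey_pi0(pvals: list[float], lam: float = 0.5) -> float:
    """Estimate the proportion of true nulls from the p-values above lam."""
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))


def estimated_fdr(pvals: list[float], t: float, lam: float = 0.5) -> float:
    """Estimated FDR when rejecting every hypothesis with p <= t."""
    m = len(pvals)
    rejections = max(1, sum(p <= t for p in pvals))
    return min(1.0, storey_pi0(pvals, lam) * m * t / rejections)


mixed = [0.001] * 50 + [(i + 0.5) / 50 for i in range(50)]
print(storey_pi0(mixed), estimated_fdr(mixed, 0.01))
```

Because some non-null p-values also fall above `lam`, pi0 tends to be overestimated, which is one source of the conservativeness the abstract describes.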

18.
A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM) and the mixture model method (MMM), has been proposed to detect differential gene expression in replicated microarray experiments. They all depend on constructing a test statistic, for example a t-statistic, and then using permutation to draw inferences. However, because of special features of microarray data, standard permutation scores may not estimate the null distribution of the test statistic well, possibly leading to overly conservative inferences. We propose a new method of constructing weighted permutation scores to overcome this problem: posterior probabilities of no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data from time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.
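A minimal Python sketch of one way such weights could enter a pooled permutation null, assuming each gene's permuted statistics are weighted by its posterior probability of no differential expression. This is a hypothetical reading of the idea described above, not the authors' exact construction, and the function name is illustrative.

```python
def weighted_permutation_pvalue(t_obs: float, perm_stats: list[list[float]],
                                weights: list[float]) -> float:
    """Two-sided p-value from a gene-weighted pool of permuted statistics.

    perm_stats[g] holds permuted statistics for gene g; weights[g] is taken to be
    that gene's posterior probability of no differential expression.
    """
    num = den = 0.0
    for stats, w in zip(perm_stats, weights):
        num += w * sum(abs(t) >= abs(t_obs) for t in stats)
        den += w * len(stats)
    return num / den


perm = [[0.5, 0.7], [2.5, 3.0]]
print(weighted_permutation_pvalue(2.0, perm, [0.9, 0.1]))
```

Genes that are probably differentially expressed (low weight) contribute little to the estimated null, so their inflated permuted statistics no longer make the null distribution too wide.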

19.
Almudevar A. Biometrics, 2001, 57(4): 1080-1088
The problem of inferring kinship structure among a sample of individuals using genetic markers is considered, with the objective of developing hypothesis tests for genetic relatedness with nearly optimal properties. The class of tests considered is constrained to be permutation invariant, which in this context means tests whose properties do not depend on the labeling of the individuals. This is appropriate when all individuals are to be treated identically from a statistical point of view. The approach taken is to derive tests that are probably most powerful for a permutation invariant alternative hypothesis that is, in some sense, close to a null hypothesis of mutual independence. This is analogous to the locally most powerful test commonly used in parametric inference. Although the resulting test statistic is a U-statistic, normal approximation theory is found to be inapplicable because of high skewness. As an alternative, a conditional procedure based on the most powerful test statistic is found to calculate accurate significance levels without much loss in power. Examples are given in which this type of test proves to be more powerful than a number of alternatives considered in the literature, including Queller and Goodnight's (1989) estimate of genetic relatedness, the average number of shared alleles (Blouin, 1996), and the number of feasible sibling triples (Almudevar and Field, 1999).
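To make one of the comparison statistics concrete, a minimal Python sketch of an average allele-sharing count between two individuals, assuming unordered diploid genotypes and counting shared alleles per locus as a multiset intersection (0, 1 or 2). Blouin's (1996) exact definition may differ in detail; the function names are hypothetical.

```python
from collections import Counter


def shared_alleles(g1: tuple[int, int], g2: tuple[int, int]) -> int:
    """Number of alleles (0, 1 or 2) two diploid genotypes share at one locus."""
    return sum((Counter(g1) & Counter(g2)).values())


def mean_shared_alleles(ind1: list[tuple[int, int]], ind2: list[tuple[int, int]]) -> float:
    """Average shared-allele count across loci for a pair of individuals."""
    return sum(shared_alleles(a, b) for a, b in zip(ind1, ind2)) / len(ind1)


print(mean_shared_alleles([(1, 2), (3, 3), (5, 7)], [(2, 1), (4, 5), (7, 8)]))
```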

20.
Sensitivity and specificity have traditionally been used to assess the performance of a diagnostic procedure. Diagnostic procedures with both high sensitivity and high specificity are desirable, but such procedures are frequently too expensive, hazardous, and/or difficult to operate. A less sophisticated procedure may be preferred if the loss of sensitivity or specificity is determined to be clinically acceptable. This paper addresses the problem of simultaneously testing the sensitivity and specificity of an alternative test procedure against those of a reference procedure when a gold standard is present. The hypothesis is formulated as a compound hypothesis of two non-inferiority (one-sided equivalence) tests. We present an asymptotic test statistic based on the restricted maximum likelihood estimate in the framework of comparing two correlated proportions under the prospective and retrospective sampling designs. The sample size and power of the asymptotic test statistic are derived. The actual type I error and power are calculated by enumerating the exact probabilities in the rejection region. For applications that require high sensitivity as well as high specificity, large numbers of both positive and negative subjects are needed. We also propose a weighted-sum statistic as an alternative test that compares a combined measure of the sensitivity and specificity of the two procedures. The sample size determination is independent of the sampling plan for the two tests.
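A minimal Python sketch of the compound (intersection-union) decision: non-inferiority is claimed only if both one-sided tests reject. For brevity it uses a simple Wald statistic for paired proportions with an unrestricted variance estimate, not the restricted-MLE statistic described above; the margins and counts are hypothetical inputs.

```python
from statistics import NormalDist


def paired_noninferiority_z(b: int, c: int, n: int, margin: float) -> float:
    """Wald Z for H0: p_new - p_ref <= -margin with paired binary outcomes.

    b: subjects classified correctly by the new procedure only;
    c: subjects classified correctly by the reference procedure only;
    n: subjects in the group (diseased for sensitivity, non-diseased for specificity).
    """
    diff = (b - c) / n
    var = (b + c - (b - c) ** 2 / n) / n ** 2  # requires b + c > 0
    return (diff + margin) / var ** 0.5


def compound_noninferiority(sens: tuple[int, int, int], spec: tuple[int, int, int],
                            margin_sens: float, margin_spec: float,
                            alpha: float = 0.05) -> bool:
    """Claim non-inferiority only if BOTH one-sided tests reject (intersection-union)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return (paired_noninferiority_z(*sens, margin_sens) > z_crit
            and paired_noninferiority_z(*spec, margin_spec) > z_crit)


print(compound_noninferiority((5, 5, 100), (5, 5, 100), 0.1, 0.1))
```

Because the overall claim requires both rejections, each component test can be run at level alpha without inflating the overall type I error.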


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417