Found 20 similar documents (search time: 15 ms)
1.
Nonparametric and parametric approaches have been proposed to estimate the false discovery rate under the assumption of independent hypothesis tests. The parametric approach has been shown to perform better than the nonparametric approaches. In this article, we study the nonparametric approaches and quantify the underlying relations between the parametric and nonparametric approaches. Our study reveals the conservative nature of the nonparametric approaches, and establishes the connections between the empirical Bayes method and p-value-based nonparametric methods. Based on our results, we advocate using the parametric approach, or directly modeling the test statistics using the empirical Bayes method.
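For concreteness, the standard p-value-based nonparametric estimator that this line of work analyzes can be sketched as below. This is a minimal sketch of the familiar Storey-type estimator, not the article's own contribution; the tuning parameter lam and its default are assumptions.

```python
import numpy as np

def nonparametric_fdr(pvals, t, lam=0.5):
    """Storey-type nonparametric FDR estimate at p-value threshold t.

    pi0 (the null proportion) is estimated from the p-values above lam;
    because this over-estimates pi0 on average, the resulting FDR
    estimate tends to be conservative, in line with the abstract.
    """
    p = np.asarray(pvals)
    m = p.size
    pi0 = min(np.mean(p > lam) / (1.0 - lam), 1.0)
    n_rejected = max(int(np.sum(p <= t)), 1)
    return pi0 * m * t / n_rejected
```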
2.
Benjamini and Hochberg's method for controlling the false discovery rate is applied to the problem of testing infinitely many contrasts in linear models. Exact, easily calculated critical values are derived, defining a new multiple comparisons method for testing contrasts in linear models. The method is adaptive, depending on the data through the F-statistic, like the Waller–Duncan Bayesian multiple comparisons method. Comparisons with Scheffé's method are given, and the method is extended to the simultaneous confidence intervals of Benjamini and Yekutieli.
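For reference, the finite-m Benjamini–Hochberg step-up rule that this method extends fits in a few lines; this is a sketch of the classical procedure only, not of the new contrast-testing method.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Classical BH step-up procedure: reject the hypotheses with the
    k smallest p-values, where k is the largest i such that
    p_(i) <= i * q / m."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(passed)[0])) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```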
3.
Testing of multiple hypotheses involves statistics that arestrongly dependent in some applications, but most work on thissubject is based on the assumption of independence. We proposea new method for estimating the false discovery rate of multiplehypothesis tests, in which the density of test scores is estimatedparametrically by minimizing the Kullback–Leibler distancebetween the unknown density and its estimator using the stochasticapproximation algorithm, and the false discovery rate is estimatedusing the ensemble averaging method. Our method is applicableunder general dependence between test statistics. Numericalcomparisons between our method and several competitors, conductedon simulated and real data examples, show that our method achievesmore accurate control of the false discovery rate in almostall scenarios. 相似文献
4.
Jung YY, Oh MS, Shin DW, Kang SH, Oh HS. Biometrical Journal. Biometrische Zeitschrift 2006;48(3):435-450
A Bayesian model-based clustering approach is proposed for identifying differentially expressed genes in meta-analysis. A Bayesian hierarchical model is used as a scientific tool for combining information from different studies, and a mixture prior is used to separate differentially expressed genes from non-differentially expressed genes. Posterior estimation of the parameters and missing observations is carried out using a simple Markov chain Monte Carlo method. From the estimated mixture model, useful measures of significance such as the Bayesian false discovery rate (FDR), the local FDR (Efron et al., 2001), and the integration-driven discovery rate (IDR; Choi et al., 2003) can be easily computed. The model-based approach is also compared with commonly used permutation methods, and it is shown to be superior when under-expressed genes greatly outnumber over-expressed genes, or vice versa. The proposed method is applied to four publicly available prostate cancer gene expression data sets and to simulated data sets.
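Assuming the MCMC output includes, for each gene, a posterior probability of being differentially expressed, the Bayesian FDR and local FDR mentioned above can be computed roughly as follows. This is a hedged sketch; the variable names are illustrative, not the paper's.

```python
import numpy as np

def bayesian_fdr(post_de, cutoff):
    """Bayesian FDR of the list {genes with post_de > cutoff}: the
    average posterior probability of being null (non-DE) among genes
    declared DE. The local FDR of gene i is simply 1 - post_de[i]."""
    post_de = np.asarray(post_de)
    declared = post_de > cutoff
    if not declared.any():
        return 0.0
    return float(np.mean(1.0 - post_de[declared]))
```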
5.
False discovery control with p-value weighting
6.
In most analyses of large-scale genomic data sets, differential expression is typically assessed by testing for differences in the mean of the distributions between 2 groups. A recent finding by Tomlins and others (2005) is of a different type of pattern of differential expression, in which a fraction of samples in one group show overexpression relative to samples in the other group. In this work, we describe a general mixture model framework for the assessment of this type of expression, called outlier profile analysis. We start by considering the single-gene situation and establishing results on identifiability. We propose 2 nonparametric estimation procedures that have natural links to familiar multiple testing procedures. We then develop multivariate extensions of this methodology to handle genome-wide measurements. The proposed methodologies are compared using simulation studies as well as data from a prostate cancer gene expression study.
7.
8.
9.
In this article, we apply the recently developed Bayesian wavelet-based functional mixed model methodology to analyze MALDI-TOF mass spectrometry proteomic data. By modeling mass spectra as functions, this approach avoids reliance on peak detection methods. The flexibility of this framework in modeling nonparametric fixed and random effect functions enables it to model the effects of multiple factors simultaneously, allowing one to perform inference on multiple factors of interest using the same model fit, while adjusting for clinical or experimental covariates that may affect both the intensities and locations of peaks in the spectra. For example, this provides a straightforward way to account for systematic block and batch effects that characterize these data. From the model output, we identify spectral regions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate to a prespecified level. We apply this method to two cancer studies.
10.
Tan YD. Genomics 2011;98(5):390-399
Receiver operating characteristic (ROC) analysis has been widely used to evaluate statistical methods, but a fundamental problem is that ROC cannot evaluate how well a statistical method estimates the false discovery rate (FDR), so the area under the curve cannot tell us whether a statistical method is conservative. To address this issue, we propose an alternative criterion, work efficiency. Work efficiency is defined as the product of the power and the degree of conservativeness of a statistical method. We conducted large-scale simulation comparisons among the optimizing discovery procedure (ODP), the Bonferroni (B-) procedure, local FDR (Localfdr), ranking analysis of the F-statistics (RAF), the Benjamini-Hochberg (BH-) procedure, and significance analysis of microarray data (SAM). The results show that ODP, SAM, and the B-procedure perform with low efficiency, while the BH-procedure, RAF, and Localfdr work with higher efficiency. ODP and SAM have the same ROC curves, but their efficiencies differ significantly.
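Taking the abstract's definition at face value, work efficiency multiplies power by a conservativeness term. How the paper quantifies "degree of conservativeness" is not stated here, so the second input in this toy sketch is left abstract and assumed normalized to [0, 1].

```python
def work_efficiency(power, conservativeness):
    """Work efficiency per the abstract's definition: power times the
    degree of conservativeness. In a simulation with known truth,
    power would be (true positives / true effects); the
    conservativeness score is an assumed [0, 1] quantity."""
    return power * conservativeness

# e.g. a method detecting 80% of true effects with conservativeness 0.9:
# work_efficiency(0.8, 0.9) == 0.72
```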
11.
Time-course studies of gene expression are essential in biomedical research to understand biological phenomena that evolve in a temporal fashion. We introduce a functional hierarchical model for detecting temporally differentially expressed (TDE) genes between two experimental conditions for cross-sectional designs, where the gene expression profiles are treated as functional data and modeled by basis function expansions. A Monte Carlo EM algorithm was developed for estimating both the gene-specific parameters and the hyperparameters in the second level of modeling. We use a direct posterior probability approach to bound the rate of false discovery at a pre-specified level and evaluate the methods by simulations and application to microarray time-course gene expression data on Caenorhabditis elegans developmental processes. Simulation results suggested that the procedure performs better than the two-way ANOVA in identifying TDE genes, resulting in both higher sensitivity and specificity. Genes identified from the C. elegans developmental data set show clear patterns of changes between the two experimental conditions.
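The direct posterior probability approach mentioned above (in the spirit of Newton et al., 2004) admits a short sketch: rank genes by posterior null probability and grow the rejection list while its running average, an estimate of the expected FDR, stays below the target. Variable names are illustrative.

```python
import numpy as np

def direct_posterior_cut(post_null, alpha=0.05):
    """Return the number of genes to declare TDE: the largest list,
    ordered by increasing posterior null probability, whose average
    posterior null probability (the expected FDR) is at most alpha."""
    p0 = np.sort(np.asarray(post_null))
    running_fdr = np.cumsum(p0) / np.arange(1, p0.size + 1)
    return int(np.sum(running_fdr <= alpha))
```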
12.
13.
Microarray technology provides a powerful tool for profiling the expression of thousands of genes simultaneously, making it possible to explore the molecular and metabolic etiology of a complex disease. However, classical statistical methods often cannot be applied directly to microarray data, so powerful methods for large-scale statistical analyses are needed. In this paper, we describe a novel method called Ranking Analysis of Microarray Data (RAM). RAM, a large-scale two-sample t-test method, is based on comparisons between a set of ranked T statistics and a set of ranked Z values (a set of ranked estimated null scores) yielded by a "randomly splitting" approach instead of a "permutation" approach, together with a two-simulation strategy for estimating the proportion of genes identified by chance, i.e., the false discovery rate (FDR); a sketch of the random-splitting idea follows. Results from simulated and observed microarray data show that RAM is more efficient than Significance Analysis of Microarrays in identifying differentially expressed genes and estimating the FDR under unfavorable conditions such as a large fudge factor, small sample size, or a mixture distribution of noises.
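As I read the abstract, "randomly splitting" generates null scores without permuting group labels; one plausible reading, sketched below under that assumption, splits the samples of a single condition into two random halves so that any difference between halves is pure noise. The exact splitting scheme in RAM is an assumption here.

```python
import numpy as np
from scipy import stats

def null_scores_random_split(x, n_splits=100, seed=0):
    """Hedged sketch of random splitting: x is a genes-by-samples
    matrix from one condition. Each split divides the samples into
    two random halves; t-statistics between halves are null by
    construction and, pooled over splits, give the ranked Z values
    to compare against the ranked observed T statistics."""
    rng = np.random.default_rng(seed)
    n = x.shape[1]
    z = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        a, b = x[:, perm[: n // 2]], x[:, perm[n // 2:]]
        z.append(stats.ttest_ind(a, b, axis=1).statistic)
    return np.sort(np.concatenate(z))
```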
14.
Given a large number of t-statistics, we consider the problem of approximating the distribution of noncentrality parameters (NCPs) by a continuous density. This problem is closely related to the control of false discovery rates (FDR) in massive hypothesis testing applications, e.g., microarray gene expression analysis. Our methodology is similar to, but improves upon, the existing approach by Ruppert, Nettleton, and Hwang (2007, Biometrics, 63, 483-495). We provide parametric, nonparametric, and semiparametric estimators for the distribution of NCPs, as well as estimates of the FDR and local FDR. In the parametric situation, we assume that the NCPs follow a distribution that leads to an analytically available marginal distribution for the test statistics. In the nonparametric situation, we use convex combinations of basis density functions to estimate the density of the NCPs. A sequential quadratic programming procedure is developed to maximize the penalized likelihood. The smoothing parameter is selected with the approximate network information criterion. A semiparametric estimator is also developed to combine both parametric and nonparametric fits. Simulations show that, under a variety of situations, our density estimates are closer to the underlying truth and our FDR estimates are improved compared with alternative methods. Data-based simulations and the analyses of two microarray datasets are used to evaluate the performance in realistic situations.
15.
Wavelet thresholding with Bayesian false discovery rate control
The false discovery rate (FDR) procedure has become a popular method for handling multiplicity in high-dimensional data. The definition of FDR has a natural Bayesian interpretation; it is the expected proportion of null hypotheses mistakenly rejected given a measure of evidence for their truth. In this article, we propose controlling the positive FDR using a Bayesian approach where the rejection rule is based on the posterior probabilities of the null hypotheses. Correspondence between Bayesian and frequentist measures of evidence in hypothesis testing has been studied in several contexts. Here we extend the comparison to multiple testing with control of the FDR and illustrate the procedure with an application to wavelet thresholding. The problem consists of recovering signal from noisy measurements. This involves extracting wavelet coefficients that result from true signal, and can be formulated as a multiple hypothesis-testing problem. We use simulated examples to compare the performance of our approach to the Benjamini and Hochberg (1995, Journal of the Royal Statistical Society, Series B 57, 289-300) procedure. We also illustrate the method with nuclear magnetic resonance spectral data from human brain.
16.
Repsilber D, Mira A, Lindroos H, Andersson S, Ziegler A. Biometrical Journal. Biometrische Zeitschrift 2005;47(4):585-598
Unsequenced bacterial strains can be characterized by comparing their genomic DNA to a sequenced reference genome of the same species. This comparative genomic approach, also called genomotyping, is leading to an increased understanding of bacterial evolution and pathogenesis. It is efficiently accomplished by comparative genomic hybridization on custom-designed cDNA microarrays. The microarray experiment results in fluorescence intensities for reference and sample genome for each gene. The log-ratio of these intensities is usually compared to a cut-off, classifying each gene of the sample genome as a candidate for an absent or present gene with respect to the reference genome. Reducing the usually high rate of false positives in the list of candidates for absent genes is decisive for both time and costs of the experiment. We propose a novel method to improve efficiency of genomotyping experiments in this sense, by rotating the normalized intensity data before setting up the list of candidate genes. We analyze simulated genomotyping data and also re-analyze an experimental data set for comparison and illustration. We approximately halve the proportion of false positives in the list of candidate absent genes for the example comparative genomic hybridization experiment as well as for the simulation experiments.
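The baseline classification step described above is simple to state in code; the rotation step that is the paper's actual contribution is not reproduced here, and the cut-off value is illustrative.

```python
import numpy as np

def flag_absent_candidates(sample_intensity, ref_intensity, cutoff=-1.0):
    """Flag genes as candidates for 'absent' in the sample genome when
    the log2 ratio of sample to reference fluorescence intensity falls
    below a cut-off; intensities are assumed already normalized."""
    log_ratio = np.log2(sample_intensity) - np.log2(ref_intensity)
    return log_ratio < cutoff
```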
17.
I. Ahmed, C. Dalmasso, F. Haramburu, F. Thiessard, P. Broët, P. Tubert-Bitter. Biometrics 2010;66(1):301-309
Pharmacovigilance systems aim at early detection of adverse effects of marketed drugs. They maintain large spontaneous reporting databases, for which several automatic signaling methods have been developed. One limit of those methods is that the decision rules for signal generation are based on arbitrary thresholds. In this article, we propose a new signal-generation procedure. The decision criterion is formulated in terms of a critical region for the P-values resulting from the reporting odds ratio method as well as from Fisher's exact test. For the latter, we also study the use of mid-P-values. The critical region is defined by the false discovery rate, which can be estimated by adapting P-value mixture-model-based procedures to one-sided tests. The methodology is mainly illustrated with the location-based estimator procedure. It is studied through a large simulation study and applied to the French pharmacovigilance database.
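The two P-values feeding the critical region can be computed from a drug-event 2x2 table with standard tools; a sketch under the usual normal approximation for the log reporting odds ratio, with mid-P handling omitted.

```python
import numpy as np
from scipy import stats

def signal_pvalues(a, b, c, d):
    """One-sided P-values for the 2x2 table [[a, b], [c, d]], where a
    counts reports mentioning both the drug and the adverse event.
    ROR = (a*d)/(b*c); its log is treated as approximately normal
    with standard error sqrt(1/a + 1/b + 1/c + 1/d)."""
    log_ror = np.log((a * d) / (b * c))
    se = np.sqrt(1.0/a + 1.0/b + 1.0/c + 1.0/d)
    p_ror = stats.norm.sf(log_ror / se)  # H1: ROR > 1
    _, p_fisher = stats.fisher_exact([[a, b], [c, d]], alternative='greater')
    return p_ror, p_fisher
```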
18.
19.
A new methodology is proposed for estimating the proportion of true null hypotheses in a large collection of tests. Each test concerns a single parameter δ whose value is specified by the null hypothesis. We combine a parametric model for the conditional cumulative distribution function (CDF) of the p-value given δ with a nonparametric spline model for the density g(δ) of δ under the alternative hypothesis. The proportion of true null hypotheses and the coefficients in the spline model are estimated by penalized least squares subject to constraints that guarantee that the spline is a density. The estimator is computed efficiently using quadratic programming. Our methodology produces an estimate of the density of δ when the null is false and can address such questions as "when the null is false, is the parameter usually close to the null or far away?" This leads us to define a falsely interesting discovery rate (FIDR), a generalization of the false discovery rate. We contrast the FIDR approach to Efron's (2004, Journal of the American Statistical Association 99, 96-104) empirical null hypothesis technique. We discuss the use of these estimates in sample size calculations based on the expected discovery rate (EDR). Our recommended estimator of the proportion of true nulls has less bias compared to estimators based upon the marginal density of the p-values at 1. In a simulation study, we compare our estimators to the convex, decreasing estimator of Langaas, Lindqvist, and Ferkingstad (2005, Journal of the Royal Statistical Society, Series B 67, 555-572). The most biased of our estimators is very similar in performance to the convex, decreasing estimator. As an illustration, we analyze differences in gene expression between resistant and susceptible strains of barley.
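In the notation above, the model being fitted can be written as a two-group mixture for the p-value distribution; this is a hedged reconstruction from the abstract's description, not the paper's exact parameterization:

$$F(p) \;=\; \pi_0\, p \;+\; (1-\pi_0) \int F(p \mid \delta)\, g(\delta)\, d\delta,$$

where $\pi_0$ is the proportion of true nulls (under which the p-value is uniform), $F(p \mid \delta)$ is the parametric conditional CDF of the p-value given $\delta$, and $g$ is the spline-modeled density of $\delta$ under the alternative.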
20.
Meta-analysis summarizes the results of a series of trials. When more than two treatments are included in the trials and when the set of treatments tested differs between trials, the combination of results across trials requires some care. Several methods have been proposed for this purpose, which feature under different labels, such as network meta-analysis or mixed treatment comparisons. Two types of linear mixed model can be used for meta-analysis. One expresses the expected outcome of treatments as a contrast to a baseline treatment. The other uses a classical two-way linear predictor with main effects for treatment and trial. In this article, we compare both types of model and explore under which conditions they give equivalent results. We illustrate practical advantages of the two-way model using two published datasets. In particular, it is shown that between-trial heterogeneity, as well as inconsistency between different types of trial, is straightforward to account for.
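A hedged reconstruction of the two parameterizations from the description above, writing $y_{ij}$ for the expected outcome of treatment $i$ in trial $j$:

$$\text{baseline-contrast form: } y_{ij} = \mu_j + \delta_i, \qquad \text{two-way form: } y_{ij} = \tau_i + \beta_j,$$

with $\delta_i$ the contrast of treatment $i$ to the baseline treatment ($\delta_{\text{baseline}} = 0$), and $\tau_i$, $\beta_j$ the treatment and trial main effects.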