20 similar documents retrieved.
1.
Jung YY Oh MS Shin DW Kang SH Oh HS 《Biometrical journal. Biometrische Zeitschrift》2006,48(3):435-450
A Bayesian model-based clustering approach is proposed for identifying differentially expressed genes in meta-analysis. A Bayesian hierarchical model is used as a scientific tool for combining information from different studies, and a mixture prior is used to separate differentially expressed genes from non-differentially expressed genes. Posterior estimation of the parameters and missing observations is done using a simple Markov chain Monte Carlo method. From the estimated mixture model, useful measures of test significance such as the Bayesian false discovery rate (FDR), the local FDR (Efron et al., 2001), and the integration-driven discovery rate (IDR; Choi et al., 2003) can be easily computed. The model-based approach is also compared with commonly used permutation methods, and it is shown to be superior when under-expressed genes greatly outnumber over-expressed genes or vice versa. The proposed method is applied to four publicly available prostate cancer gene expression data sets and to simulated data sets.
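As a minimal sketch of the kind of quantity this entry reads off the fitted mixture: once MCMC has produced each gene's posterior probability of being non-differentially expressed (called `post_null` here — a name assumed for illustration, not the paper's), the Bayesian FDR of a rejection rule is simply the average posterior null probability over the rejected genes.

```python
import numpy as np

def bayesian_fdr(post_null, threshold):
    """Bayesian FDR of the rule 'call a gene differentially expressed
    when its posterior null probability is <= threshold': the average
    posterior null probability over the genes that are called."""
    post_null = np.asarray(post_null)
    called = post_null <= threshold
    return post_null[called].mean() if called.any() else 0.0
```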
2.
Summary Given a large number of t-statistics, we consider the problem of approximating the distribution of noncentrality parameters (NCPs) by a continuous density. This problem is closely related to the control of false discovery rates (FDR) in massive hypothesis testing applications, e.g., microarray gene expression analysis. Our methodology is similar to, but improves upon, the existing approach by Ruppert, Nettleton, and Hwang (2007, Biometrics, 63, 483-495). We provide parametric, nonparametric, and semiparametric estimators for the distribution of NCPs, as well as estimates of the FDR and local FDR. In the parametric situation, we assume that the NCPs follow a distribution that leads to an analytically available marginal distribution for the test statistics. In the nonparametric situation, we use convex combinations of basis density functions to estimate the density of the NCPs. A sequential quadratic programming procedure is developed to maximize the penalized likelihood. The smoothing parameter is selected with the approximate network information criterion. A semiparametric estimator is also developed to combine both parametric and nonparametric fits. Simulations show that, under a variety of situations, our density estimates are closer to the underlying truth and our FDR estimates are improved compared with alternative methods. Data-based simulations and the analyses of two microarray datasets are used to evaluate the performance in realistic situations.
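In generic notation (ours, not necessarily the paper's), the parametric route exploits the fact that the marginal density of a t-statistic is a mixture of noncentral t densities over the NCP distribution g:

\[
f(t) \;=\; \pi_0\, f_\nu(t \mid 0) \;+\; (1-\pi_0) \int f_\nu(t \mid \delta)\, g(\delta)\, d\delta ,
\]

where \(f_\nu(\cdot \mid \delta)\) is the t density with \(\nu\) degrees of freedom and noncentrality \(\delta\), and \(\pi_0\) is the proportion of true nulls; choosing g so that the integral is available in closed form is what makes the parametric estimator tractable.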
3.
Tan YD 《Genomics》2011,98(5):390-399
Receiver operating characteristic (ROC) curves have been widely used to evaluate statistical methods, but a fatal flaw is that ROC cannot assess how well a method estimates the false discovery rate (FDR), so the area under the curve cannot tell us whether a statistical method is conservative. To address this issue, we propose an alternative criterion, work efficiency. Work efficiency is defined as the product of the power and the degree of conservativeness of a statistical method. We conducted large-scale simulation comparisons among the optimizing discovery procedure (ODP), the Bonferroni (B-) procedure, local FDR (Localfdr), ranking analysis of the F-statistics (RAF), the Benjamini-Hochberg (BH-) procedure, and significance analysis of microarray data (SAM). The results show that ODP, SAM, and the B-procedure perform with low efficiency, while the BH-procedure, RAF, and Localfdr work with higher efficiency. ODP and SAM have the same ROC curves, but their efficiencies differ significantly.
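A small sketch of the criterion exactly as the abstract defines it — the product of power and degree of conservativeness. How the paper measures the second factor is not reproduced in this listing, so `degree_conservative` is left as an input here; only the empirical power computation (available in simulations, where the truth is known) is spelled out.

```python
import numpy as np

def empirical_power(rejected, is_alternative):
    """Fraction of truly differential genes that the procedure detects."""
    rejected = np.asarray(rejected, dtype=bool)
    alt = np.asarray(is_alternative, dtype=bool)
    return rejected[alt].mean()

def work_efficiency(power, degree_conservative):
    """Work efficiency per the entry: power times degree of conservativeness."""
    return power * degree_conservative
```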
4.
Time-course studies of gene expression are essential in biomedical research for understanding biological phenomena that evolve in a temporal fashion. We introduce a functional hierarchical model for detecting temporally differentially expressed (TDE) genes between two experimental conditions in cross-sectional designs, where the gene expression profiles are treated as functional data and modeled by basis function expansions. A Monte Carlo EM algorithm was developed for estimating both the gene-specific parameters and the hyperparameters in the second level of modeling. We use a direct posterior probability approach to bound the rate of false discovery at a pre-specified level, and evaluate the method by simulations and by application to microarray time-course gene expression data on Caenorhabditis elegans developmental processes. Simulation results suggest that the procedure performs better than two-way ANOVA in identifying TDE genes, with both higher sensitivity and higher specificity. Genes identified from the C. elegans developmental data set show clear patterns of change between the two experimental conditions.
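A minimal sketch of the direct posterior probability approach the entry uses to bound the FDR, assuming the Monte Carlo EM fit has produced `post_null`, each gene's posterior probability of being non-TDE (names are ours): reject genes in order of increasing posterior null probability, taking as many as the running average allows.

```python
import numpy as np

def reject_bounding_fdr(post_null, alpha=0.05):
    """Reject genes with the smallest posterior null probabilities while
    the running mean of those probabilities (the Bayesian FDR of the
    growing rejection set) stays at or below alpha."""
    post_null = np.asarray(post_null)
    order = np.argsort(post_null)
    running_fdr = np.cumsum(post_null[order]) / np.arange(1, post_null.size + 1)
    k = int((running_fdr <= alpha).sum())  # running mean of sorted values is nondecreasing
    rejected = np.zeros(post_null.size, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```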
5.
Testing of multiple hypotheses involves statistics that are strongly dependent in some applications, but most work on this subject is based on the assumption of independence. We propose a new method for estimating the false discovery rate of multiple hypothesis tests, in which the density of test scores is estimated parametrically by minimizing the Kullback-Leibler distance between the unknown density and its estimator using the stochastic approximation algorithm, and the false discovery rate is estimated using the ensemble averaging method. Our method is applicable under general dependence between test statistics. Numerical comparisons between our method and several competitors, conducted on simulated and real data examples, show that our method achieves more accurate control of the false discovery rate in almost all scenarios.
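For reference, the objective being minimized (in generic notation, ours) is the Kullback-Leibler distance from the unknown score density f to a parametric estimate \(f_\theta\):

\[
D_{\mathrm{KL}}(f \,\|\, f_\theta) \;=\; \int f(x)\, \log \frac{f(x)}{f_\theta(x)}\, dx ,
\]

which, up to a term not involving \(\theta\), is equivalent to maximizing the expected log-likelihood \(\mathbb{E}_f[\log f_\theta(X)]\); stochastic approximation updates \(\theta\) from the observed scores without ever forming f explicitly.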
6.
Repsilber D Mira A Lindroos H Andersson S Ziegler A 《Biometrical journal. Biometrische Zeitschrift》2005,47(4):585-598
Unsequenced bacterial strains can be characterized by comparing their genomic DNA to a sequenced reference genome of the same species. This comparative genomic approach, also called genomotyping, is leading to an increased understanding of bacterial evolution and pathogenesis. It is efficiently accomplished by comparative genomic hybridization on custom-designed cDNA microarrays. The microarray experiment yields fluorescence intensities for the reference and sample genomes for each gene. The log-ratio of these intensities is usually compared to a cut-off, classifying each gene of the sample genome as a candidate absent or present gene with respect to the reference genome. Reducing the usually high rate of false positives in the list of candidate absent genes is decisive for both the time and cost of the experiment. We propose a novel method to improve the efficiency of genomotyping experiments in this sense: rotating the normalized intensity data before setting up the list of candidate genes. We analyze simulated genomotyping data and re-analyze an experimental data set for comparison and illustration. We approximately halve the proportion of false positives in the list of candidate absent genes for the example comparative genomic hybridization experiment as well as for the simulation experiments.
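A hedged sketch of the core data manipulation: each gene contributes a point of normalized log-intensities for the reference and sample channels, and the cloud is rotated before thresholding. The entry does not specify how the rotation angle is chosen, so `theta` is left as a parameter here.

```python
import numpy as np

def rotate_intensities(log_ref, log_sample, theta):
    """Rotate the 2-D cloud of normalized log-intensities by angle theta
    (radians) about the origin; candidate absent/present genes are then
    listed from the rotated coordinates instead of the raw log-ratio."""
    c, s = np.cos(theta), np.sin(theta)
    x, y = np.asarray(log_ref), np.asarray(log_sample)
    return c * x - s * y, s * x + c * y
```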
7.
SUMMARY: We consider the problem of testing for partial conjunction of hypotheses, i.e., whether at least u out of n tested hypotheses are false. It offers an in-between approach between testing the conjunction of null hypotheses against the alternative that at least one is false, and testing the disjunction of null hypotheses against the alternative that all hypotheses are non-null. We suggest powerful test statistics for testing such a partial conjunction hypothesis that are valid under dependence between the test statistics as well as under independence. We then address the problem of testing many partial conjunction hypotheses simultaneously using the false discovery rate (FDR) approach. We prove that if the FDR controlling procedure of Benjamini and Hochberg (1995, Journal of the Royal Statistical Society, Series B 57, 289-300) is used for this purpose, the FDR is controlled under various dependency structures. Moreover, we can screen at all levels simultaneously in order to display the findings on a superimposed map and still control an appropriate FDR measure. We apply the method to examples from microarray analysis and functional magnetic resonance imaging (fMRI), two application areas where the need for partial conjunction analysis has been identified.
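Since the entry's guarantee rests on the Benjamini-Hochberg step-up rule, here is a compact reference implementation (the standard procedure, not anything specific to this paper); in this setting the inputs would be the partial conjunction p-values, one per tested unit.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """BH step-up procedure: reject the k smallest p-values, where k is
    the largest i such that p_(i) <= i*q/n; controls the FDR at level q."""
    p = np.asarray(pvals)
    n = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, n + 1) / n
    k = int(below.nonzero()[0].max() + 1) if below.any() else 0
    rejected = np.zeros(n, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```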
8.
Microarray technology provides a powerful tool for profiling the expression of thousands of genes simultaneously, making it possible to explore the molecular and metabolic etiology of a complex disease under study. However, classical statistical methods often fail to apply to microarray data, so powerful methods for large-scale statistical analyses are needed. In this paper, we describe a novel method called Ranking Analysis of Microarray Data (RAM). RAM is a large-scale two-sample t-test method based on comparisons between a set of ranked T statistics and a set of ranked Z values (ranked estimated null scores) yielded by a "randomly splitting" approach instead of a "permutation" approach, together with a two-simulation strategy for estimating the proportion of genes identified by chance, i.e., the false discovery rate (FDR). Results from simulated and observed microarray data show that RAM is more efficient than Significance Analysis of Microarrays in identifying differentially expressed genes and estimating the FDR under undesirable conditions such as a large fudge factor, small sample size, or a mixture distribution of noises.
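A hedged sketch of the generic null-score recipe that RAM-style methods refine: compare the observed statistics to a set of null scores at a cutoff. The entry's random-splitting construction of the null scores is not reproduced in this listing, so `t_null` is simply assumed to hold them.

```python
import numpy as np

def fdr_at_cutoff(t_obs, t_null, cutoff):
    """Estimate the FDR at a cutoff as the expected number of null scores
    exceeding it (rescaled to the number of genes) over the number of
    observed statistics exceeding it."""
    t_obs = np.abs(np.asarray(t_obs))
    t_null = np.abs(np.asarray(t_null))
    called = int((t_obs >= cutoff).sum())
    if called == 0:
        return 0.0
    expected_false = (t_null >= cutoff).sum() * t_obs.size / t_null.size
    return min(1.0, expected_false / called)
```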
9.
Summary For analysis of genomic data, e.g., microarray data from gene expression profiling experiments, the two-component mixture model has been widely used in practice to detect differentially expressed genes. However, it naïvely imposes strong exchangeability assumptions across genes and does not make active use of a priori information about intergene relationships that is currently available, e.g., gene annotations through the Gene Ontology (GO) project. We propose a general strategy that first generates a set of covariates that summarizes the intergene information and then extends the two-component mixture model into a hierarchical semiparametric model utilizing the generated covariates through latent nonparametric regression. Simulations and analysis of real microarray data show that our method can outperform the naïve two-component mixture model.
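In generic two-group notation (ours), the extension can be read as letting the null proportion of the mixture depend on gene-level covariates \(z_g\) distilled from the GO annotations:

\[
f(x_g \mid z_g) \;=\; \pi_0(z_g)\, f_0(x_g) \;+\; \bigl\{1 - \pi_0(z_g)\bigr\}\, f_1(x_g),
\]

so genes with similar annotation profiles share prior odds of being differentially expressed, instead of all genes being treated as exchangeable.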
10.
Applications of hidden Markov models in bioinformatics
The hidden Markov model (HMM) is a statistical method introduced in the 1970s and originally used mainly for speech recognition. In 1989, Churchill brought it into computational biology. HMMs are now among the most widely used statistical methods in bioinformatics, applied mainly to linear sequence analysis, model analysis, and gene finding. This paper gives a concise description of HMMs and a brief overview of their applications in these areas.
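As a concrete illustration of the machinery (a standard textbook sketch, not taken from this survey): the forward algorithm computes the likelihood of a sequence under a discrete HMM in O(TS²) time, which is the computation underlying the sequence-analysis applications listed above.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM,
    computed by the scaled forward algorithm.
    obs: sequence of symbol indices; pi: initial state probabilities (S,);
    A: state transition matrix (S, S); B: emission matrix (S, n_symbols)."""
    alpha = pi * B[:, obs[0]]           # joint prob. of state and first symbol
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()                # rescale to avoid numerical underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate one step, emit next symbol
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik
```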
11.
Summary A new methodology is proposed for estimating the proportion of true null hypotheses in a large collection of tests. Each test concerns a single parameter δ whose value is specified by the null hypothesis. We combine a parametric model for the conditional cumulative distribution function (CDF) of the p-value given δ with a nonparametric spline model for the density g(δ) of δ under the alternative hypothesis. The proportion of true null hypotheses and the coefficients in the spline model are estimated by penalized least squares subject to constraints that guarantee that the spline is a density. The estimator is computed efficiently using quadratic programming. Our methodology produces an estimate of the density of δ when the null is false and can address such questions as "when the null is false, is the parameter usually close to the null or far away?" This leads us to define a falsely interesting discovery rate (FIDR), a generalization of the false discovery rate. We contrast the FIDR approach to Efron's (2004, Journal of the American Statistical Association 99, 96-104) empirical null hypothesis technique. We discuss the use of these estimates in sample size calculations based on the expected discovery rate (EDR). Our recommended estimator of the proportion of true nulls has less bias compared to estimators based upon the marginal density of the p-values at 1. In a simulation study, we compare our estimators to the convex, decreasing estimator of Langaas, Lindqvist, and Ferkingstad (2005, Journal of the Royal Statistical Society, Series B 67, 555-572). The most biased of our estimators is very similar in performance to the convex, decreasing estimator. As an illustration, we analyze differences in gene expression between resistant and susceptible strains of barley.
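For contrast, the comparator mentioned near the end of the entry — an estimator of the proportion of true nulls based on the marginal density of the p-values near 1 — is only a couple of lines (a standard sketch; the tuning constant `lam` is a conventional choice, not the paper's):

```python
import numpy as np

def pi0_near_one(pvals, lam=0.95):
    """Null p-values are uniform on [0, 1], so the p-value density near 1
    is approximately pi0; estimate it from the mass above lam."""
    p = np.asarray(pvals)
    return min(1.0, (p > lam).mean() / (1.0 - lam))
```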
12.
Nonparametric and parametric approaches have been proposed to estimate the false discovery rate under the assumption of independent hypothesis tests. The parametric approach has been shown to perform better than the nonparametric approaches. In this article, we study the nonparametric approaches and quantify the underlying relations between the parametric and nonparametric approaches. Our study reveals the conservative nature of the nonparametric approaches and establishes the connections between the empirical Bayes method and p-value-based nonparametric methods. Based on our results, we advocate using the parametric approach, or directly modeling the test statistics using the empirical Bayes method.
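The connection in question can be summarized in standard two-group notation (ours): with null proportion \(\pi_0\), marginal p-value CDF F, and null and marginal statistic densities \(f_0\) and \(f\), the p-value-based FDR at threshold t and the empirical Bayes local fdr at statistic z are

\[
\mathrm{FDR}(t) = \frac{\pi_0\, t}{F(t)}, \qquad
\mathrm{fdr}(z) = \frac{\pi_0\, f_0(z)}{f(z)},
\]

and the conservativeness of the nonparametric route typically enters through an upwardly biased estimate of \(\pi_0\).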
13.
False discovery control with p-value weighting
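Only the title survives in this listing, but the standard recipe behind p-value weighting (hedged here as general background in the style of Genovese, Roeder, and Wasserman) is short: divide each p-value by a nonnegative weight with mean one, then run the usual step-up rule on the weighted p-values — e.g., the `benjamini_hochberg` sketch under entry 7.

```python
import numpy as np

def weighted_bh(pvals, weights, q=0.05):
    """Weight p-values (weights normalized to average 1) and hand them to
    the BH step-up rule; up-weighted hypotheses are easier to reject."""
    w = np.asarray(weights, dtype=float)
    pw = np.asarray(pvals) * (w.mean() / w)   # divide each p by its normalized weight
    return benjamini_hochberg(pw, q)          # BH sketch from entry 7
```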
14.
We consider hidden Markov models as a versatile class of models for weakly dependent random phenomena. The topic of the present paper is likelihood-ratio testing for hidden Markov models, and we show that, under appropriate conditions, the standard asymptotic theory of likelihood-ratio tests is valid. Such tests are crucial in the specification of multivariate Gaussian hidden Markov models, which we use to illustrate the applicability of our general results. Finally, the methodology is illustrated by means of a real data set.
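The claim, in standard notation (ours): under the paper's regularity conditions, the likelihood-ratio statistic for nested HMMs behaves classically,

\[
\Lambda_n = 2\bigl\{\ell_n(\hat\theta) - \ell_n(\hat\theta_0)\bigr\} \;\xrightarrow{d}\; \chi^2_r ,
\]

where \(\ell_n\) is the HMM log-likelihood, \(\hat\theta\) and \(\hat\theta_0\) are the unrestricted and null-restricted maximizers, and r is the number of restrictions; this is what licenses standard specification tests for the multivariate Gaussian HMMs in the application.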
15.
16.
Under the model of independent test statistics, we propose a two-parameter family of Bayes multiple testing procedures. The two parameters can be viewed as tuning parameters. Using the Benjamini-Hochberg step-up procedure for controlling the false discovery rate as a baseline for conservativeness, we choose the tuning parameters to compromise between the operating characteristics of that procedure and a less conservative procedure that focuses on alternatives that a priori might be considered likely or meaningful. The Bayes procedures do not have the theoretical and practical shortcomings of the popular stepwise procedures. In terms of the number of mistakes, simulations for two examples indicate that over a large segment of the parameter space, the Bayes procedure is preferable to the step-up procedure. Another desirable feature of the procedures is that they are computationally feasible for any number of hypotheses.
17.
We study the estimation of mean medical cost when censoring is dependent and a large amount of auxiliary information is present. Under a missing-at-random assumption, we propose semiparametric working models to obtain low-dimensional summarized scores. An estimator for the mean total cost can then be derived nonparametrically conditional on the summarized scores. We show that when either the two working models for the cost-survival process or the model for the censoring distribution is correct, the estimator is consistent and asymptotically normal. Small-sample performance of the proposed method is evaluated via simulation studies. Finally, our approach is applied to analyze a real data set in health economics.
18.
19.
Calibration plot for proteomics: A graphical tool to visually check the assumptions underlying FDR control in quantitative experiments
Quentin Giai Gianetto Florence Combes Claire Ramus Christophe Bruley Yohann Couté Thomas Burger 《Proteomics》2016,16(1):29-32
In MS-based quantitative proteomics, FDR control (i.e., limiting the number of proteins wrongly claimed as differentially abundant between several conditions) is a major post-analysis step. It is classically achieved with a statistical procedure that computes adjusted p-values for the putative differentially abundant proteins. Unfortunately, such adjustment is conservative only if the p-values are well-calibrated; otherwise, the false discovery rate is spuriously underestimated. However, well-calibration is a property that can be violated in some practical cases. To overcome this limitation, we propose a graphical method to straightforwardly and visually assess p-value calibration, as well as R code to embed it in any pipeline. All MS data have been deposited in ProteomeXchange with identifier PXD002370 (http://proteomecentral.proteomexchange.org/dataset/PXD002370).
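The authors ship R code; as a language-agnostic illustration of the same idea (a minimal sketch in the same spirit, not the paper's calibration plot), one can eyeball calibration by comparing the empirical p-value distribution to the uniform: well-calibrated p-values give a flat histogram away from 0, with any spike near 0 attributable to genuinely differential proteins.

```python
import numpy as np
import matplotlib.pyplot as plt

def calibration_check(pvals, bins=20):
    """Plot ordered p-values against uniform quantiles, plus a density
    histogram; departures from the diagonal / the dashed level-1 line
    (other than a spike near 0) hint at miscalibration."""
    p = np.sort(np.asarray(pvals))
    n = p.size
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3.5))
    ax1.plot(np.arange(1, n + 1) / n, p, ".", markersize=2)
    ax1.plot([0, 1], [0, 1], "k--", linewidth=1)
    ax1.set(xlabel="uniform quantile", ylabel="ordered p-value")
    ax2.hist(p, bins=bins, density=True)
    ax2.axhline(1.0, color="k", linestyle="--", linewidth=1)
    ax2.set(xlabel="p-value", ylabel="density")
    fig.tight_layout()
    return fig
```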
20.