
1.

Comparison of false discovery rate methods in identifying genes with differential expression

Qian HR Huang S 《Genomics》2005,86(4):495-503

Current high-throughput techniques such as microarray in genomics or mass spectrometry in proteomics usually generate thousands of hypotheses to be tested simultaneously. The usual purpose of these techniques is to identify a subset of interesting cases that deserve further investigation. As a consequence, the control of false positives among the tests called "significant" becomes a critical issue for researchers. Over the past few years, several false discovery rate (FDR)-controlling methods have been proposed; each method favors certain scenarios and is introduced with the purpose of improving the control of FDR at the targeted level. In this paper, we compare the performance of the five FDR-controlling methods proposed by Benjamini et al., the qvalue method proposed by Storey, and the traditional Bonferroni method. The purpose is to investigate the "observed" sensitivity of each method on typical microarray experiments in which the majority (or all) of the truth is unknown. Based on two well-studied microarray datasets, it is found that in terms of the "apparent" test power, the ranking of the FDR methods is given as Step-down
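To make the contrast between a Bonferroni correction and an FDR-controlling procedure concrete, here is a minimal sketch of the Bonferroni rule and the Benjamini-Hochberg step-up rule. It is not the code used in the paper: the function names, the default level of 0.05 and the example p-values are assumptions chosen for illustration, and only one of the several Benjamini et al. procedures mentioned in the abstract is shown.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (family-wise error rate control)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule.

    Find the largest rank k with p_(k) <= k * alpha / m and reject the
    hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that passes
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni_reject(pvals)), sum(benjamini_hochberg_reject(pvals)))
```

On these made-up p-values the step-up rule calls one more test significant than Bonferroni does, which is the kind of difference in "apparent" power the abstract refers to.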

2.

A novel approach to minimize false discovery rate in genome-wide data analysis

Yuanzhe Bei Pengyu Hong 《BMC systems biology》2013,7(Z4):S1

Background

High-throughput technologies, such as DNA microarray, have significantly advanced biological and biomedical research by enabling researchers to carry out genome-wide screens. One critical task in analyzing genome-wide datasets is to control the false discovery rate (FDR) so that the proportion of false positive features among those called significant is restrained. Recently a number of FDR control methods have been proposed and widely practiced, such as the Benjamini-Hochberg (BH) approach, the Storey approach and Significance Analysis of Microarrays (SAM).

Methods

This paper presents a straightforward yet powerful FDR control method termed miFDR, which aims to minimize FDR when calling a fixed number of significant features. We theoretically proved that the strategy used by miFDR is able to find the optimal number of significant features when the desired FDR is fixed.

Results

We compared miFDR with the BH approach, the Storey approach and SAM on both simulated datasets and public DNA microarray datasets. The results demonstrated that miFDR outperforms others by identifying more significant features under the same FDR cut-offs. Literature search showed that many genes called only by miFDR are indeed relevant to the underlying biology of interest.

Conclusions

FDR control has been widely applied to the analysis of high-throughput datasets, allowing for rapid discoveries. Under the same FDR threshold, miFDR is capable of identifying more significant features than its competitors at a comparable level of complexity. Therefore, it can potentially have a great impact on biological and biomedical research.

Availability

miFDR is available from the authors on request.

3.

A unified approach to false discovery rate estimation

Korbinian Strimmer 《BMC bioinformatics》2008,9(1):303

Background

False discovery rate (FDR) methods play an important role in analyzing high-dimensional data. There are two types of FDR, tail area-based FDR and local FDR, as well as numerous statistical algorithms for estimating or controlling FDR. These differ in terms of underlying test statistics and procedures employed for statistical learning.
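The two types of FDR named above are commonly written in terms of a two-component mixture model. The formulation below is the standard textbook one and is given only as a reading aid; the symbols (the null proportion, the null and alternative densities, and the tail-area functions) are assumptions of this sketch and are not taken from the abstract.

```latex
% Two-component mixture for a test statistic z (notation assumed here):
%   \pi_0 = proportion of true null hypotheses,
%   f_0, f_1 = densities of z under the null and the alternative.
f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z)

% Local FDR: posterior probability that a case with statistic z is null.
\mathrm{fdr}(z) = \frac{\pi_0 f_0(z)}{f(z)}

% Tail area-based FDR: the same ratio with densities replaced by the
% corresponding tail-area probabilities F_0 and F of the rejection region.
\mathrm{Fdr}(z) = \frac{\pi_0 F_0(z)}{F(z)}
```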

4.

A first principles approach to differential expression in microarray data analysis

Robert A Rubin 《BMC bioinformatics》2009,10(1):292

Background

The disparate results from the methods commonly used to determine differential expression in Affymetrix microarray experiments may well result from the wide variety of probe set and probe level models employed. Here we take the approach of making the fewest assumptions about the structure of the microarray data. Specifically, we only require that, under the null hypothesis that a gene is not differentially expressed for specified conditions, for any probe position in the gene's probe set: a) the probe amplitudes are independent and identically distributed over the conditions, and b) the distributions of the replicated probe amplitudes are amenable to classical analysis of variance (ANOVA). Log-amplitudes that have been standardized within-chip meet these conditions well enough for our approach, which is to perform ANOVA across conditions for each probe position, and then take the median of the resulting (1 - p) values as a gene-level measure of differential expression.

5.

Clustering approaches to identifying gene expression patterns from DNA microarray data

Do JH Choi DK 《Molecules and cells》2008,25(2):279-288


6.

Multidimensional local false discovery rate for microarray studies (Total citations: 1; self-citations: 0; citations by others: 1)

Ploner A Calza S Gusnanto A Pawitan Y 《Bioinformatics (Oxford, England)》2006,22(5):556-565

MOTIVATION: The false discovery rate (fdr) is a key tool for statistical assessment of differential expression (DE) in microarray studies. Overall control of the fdr alone, however, is not sufficient to address the problem of genes with small variance, which generally suffer from a disproportionally high rate of false positives. It is desirable to have an fdr-controlling procedure that automatically accounts for gene variability. METHODS: We generalize the local fdr as a function of multiple statistics, combining a common test statistic for assessing DE with its standard error information. We use a non-parametric mixture model for DE and non-DE genes to describe the observed multi-dimensional statistics, and estimate the distribution for non-DE genes via the permutation method. We demonstrate this fdr2d approach for simulated and real microarray data. RESULTS: The fdr2d allows objective assessment of DE as a function of gene variability. We also show that the fdr2d performs better than commonly used modified test statistics. AVAILABILITY: An R-package OCplus containing functions for computing fdr2d() and other operating characteristics of microarray data is available at http://www.meb.ki.se/~yudpaw.

7.

A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data

Xie Y Pan W Khodursky AB 《Bioinformatics (Oxford, England)》2005,21(23):4280-4288

MOTIVATION: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the estimated FDR approximates the true FDR well, or at least, it does not improperly favor or disfavor any particular method. Permutation methods have become popular to estimate FDR in genomic studies. The purpose of this paper is twofold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased, and if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation to yield a better FDR estimator, which can in turn serve as a fairer criterion to evaluate various statistical methods. RESULTS: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics, the sample mean, SAM statistic and Student's t-statistic, are considered. The results show that the standard permutation method overestimates FDR. The overestimation is the most severe for the sample mean statistic and the least for the t-statistic, with the SAM statistic lying between the two extremes, suggesting that one has to be cautious when using the standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.
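As a rough illustration of what a standard permutation-based FDR estimate looks like, the sketch below permutes the group labels, counts how many statistics exceed a threshold under each permutation, and divides the average count by the number of features called significant in the observed data. This is a generic version of the idea, not the estimator or the modification studied in the paper: the two-group mean-difference statistic, the function names, the threshold and the toy data are all assumptions made for this example.

```python
import random
from statistics import mean


def mean_difference(values, labels):
    """Difference in group means for one feature (labels are 0 or 1)."""
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(group1) - mean(group0)


def permutation_fdr(data, labels, threshold, n_perm=200, seed=0):
    """Estimate FDR at a threshold by permuting the sample labels.

    Returns (estimated FDR, number of features called significant).
    """
    rng = random.Random(seed)
    observed = [abs(mean_difference(row, labels)) for row in data]
    n_called = sum(stat >= threshold for stat in observed)
    if n_called == 0:
        return 0.0, 0
    null_count = 0
    permuted = list(labels)
    for _ in range(n_perm):
        rng.shuffle(permuted)  # break any association with the groups
        null_count += sum(
            abs(mean_difference(row, permuted)) >= threshold for row in data
        )
    expected_false = null_count / n_perm  # average false calls per permutation
    return min(1.0, expected_false / n_called), n_called


# Toy data: three features measured on six samples (hypothetical values).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_data = [
    [0, 0, 0, 5, 5, 5],
    [1, 1, 1, 1, 1, 1],
    [0.1, -0.1, 0, 0.1, -0.1, 0],
]
fdr_hat, n_called = permutation_fdr(toy_data, toy_labels, threshold=2)
print(fdr_hat, n_called)
```

The estimate depends on the number of permutations and the random seed, so the printed value should be read as an approximation rather than an exact quantity.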

8.

A CART-based approach to discover emerging patterns in microarray data (Total citations: 1; self-citations: 0; citations by others: 1)

Boulesteix AL Tutz G Strimmer K 《Bioinformatics (Oxford, England)》2003,19(18):2465-2472

MOTIVATION: Cancer diagnosis using gene expression profiles requires supervised learning and gene selection methods. Of the many suggested approaches, the method of emerging patterns (EPs) has the particular advantage of explicitly modeling interactions among genes, which improves classification accuracy. However, finding useful (i.e. short and statistically significant) EPs is typically very hard. METHODS: Here we introduce a CART-based approach to discover EPs in microarray data. The method is based on growing decision trees from which the EPs are extracted. This approach combines pattern search with a statistical procedure based on Fisher's exact test to assess the significance of each EP. Subsequently, sample classification based on the inferred EPs is performed using maximum-likelihood linear discriminant analysis. RESULTS: Using simulated data as well as gene expression data from colon and leukemia cancer experiments we assessed the performance of our pattern search algorithm and classification procedure. In the simulations, our method recovers a large proportion of known EPs while for real data it is comparable in classification accuracy with three top-performing alternative classification algorithms. In addition, it assigns statistical significance to the inferred EPs and allows the patterns to be ranked while simultaneously avoiding overfitting of the data. The new approach therefore provides a versatile and computationally fast tool for elucidating local gene interactions as well as for classification. AVAILABILITY: A computer program written in the statistical language R implementing the new approach is freely available from the web page http://www.stat.uni-muenchen.de/~socher/

9.

Bias in the estimation of false discovery rate in microarray studies (Total citations: 4; self-citations: 0; citations by others: 4)

Pawitan Y Murthy KR Michiels S Ploner A 《Bioinformatics (Oxford, England)》2005,21(20):3865-3872

MOTIVATION: The false discovery rate (FDR) provides a key statistical assessment for microarray studies. Its value depends on the proportion pi(0) of non-differentially expressed (non-DE) genes. In most microarray studies, many genes have small effects not easily separable from non-DE genes. As a result, current methods often overestimate pi(0) and FDR, leading to unnecessary loss of power in the overall analysis. METHODS: For the common two-sample comparison we derive a natural mixture model of the test statistic and an explicit bias formula in the standard estimation of pi(0). We suggest an improved estimation of pi(0) based on the mixture model and describe a practical likelihood-based procedure for this purpose. RESULTS: The analysis shows that a large bias occurs when pi(0) is far from 1 and when the non-centrality parameters of the distribution of the test statistic are near zero. The theoretical result also explains substantial discrepancies between non-parametric and model-based estimates of pi(0). Simulation studies indicate mixture-model estimates are less biased than standard estimates. The method is applied to breast cancer and lymphoma data examples. AVAILABILITY: An R-package OCplus containing functions to compute pi(0) based on the mixture model, the resulting FDR and other operating characteristics of microarray data, is freely available at http://www.meb.ki.se/~yudpaw CONTACT: yudi.pawitan@meb.ki.se and alexander.ploner@meb.ki.se.
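The abstract does not say which "standard" estimator of pi(0) it analyses. One widely used p-value-based estimator is the Storey-type rule, which assumes that p-values above a cut-off lambda come almost entirely from non-DE genes. The sketch below shows that rule only as an example of what such an estimator can look like; the function name, the default lambda of 0.5 and the p-values are assumptions of this sketch, and it is not the mixture-model estimator proposed in the paper.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true nulls.

    Null p-values are uniform on [0, 1], so about m * pi0 * (1 - lam) of
    them are expected above lam; solving for pi0 gives the estimate.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Hypothetical p-values, for illustration only.
example_pvalues = [0.001, 0.002, 0.003, 0.004, 0.01, 0.02, 0.3, 0.6, 0.7, 0.9]
print(storey_pi0(example_pvalues))
```

Genes with small effects tend to produce p-values that are spread over the whole unit interval, so some of them land above lambda and are counted as if they were null. That is one way to see why an estimator of this form can overestimate pi(0), as the abstract describes.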

10.

Local false discovery rate facilitates comparison of different microarray experiments

Wan-Jen Hong Robert Tibshirani Gilbert Chu 《Nucleic acids research》2009,37(22):7483-7497

The local false discovery rate (LFDR) estimates the probability of falsely identifying specific genes with changes in expression. In computer simulations, LFDR <10% successfully identified genes with changes in expression, while LFDR >90% identified genes without changes. We used LFDR to compare different microarray experiments quantitatively: (i) Venn diagrams of genes with and without changes in expression, (ii) scatter plots of the genes, (iii) correlation coefficients in the scatter plots and (iv) distributions of gene function. To illustrate, we compared three methods for pre-processing microarray data. Correlations between methods were high (r = 0.84–0.92). However, responses were often different in magnitude, and sometimes discordant, even though the methods used the same raw data. LFDR complements functional assessments like gene set enrichment analysis. To illustrate, we compared responses to ultraviolet radiation (UV), ionizing radiation (IR) and tobacco smoke. Compared to unresponsive genes, genes responsive to both UV and IR were enriched for cell cycle, mitosis, and DNA repair functions. Genes responsive to UV but not IR were depleted for cell adhesion functions. Genes responsive to tobacco smoke were enriched for detoxification functions. Thus, LFDR reveals differences and similarities among experiments.

11.

Local false discovery rate and minimum total error rate approaches to identifying interesting chromosomal regions

Sinha R Sinha M Mathew G Elston RC Luo Y 《BMC genetics》2005,6(Z1):S23

The simultaneous testing of a large number of hypotheses in a genome scan, using individual thresholds for significance, inherently leads to inflated genome-wide false positive rates. There exist various approaches to approximating the correct genome-wide p-values under various assumptions, either by way of asymptotics or simulations. We explore a philosophically different criterion, recently proposed in the literature, which controls the false discovery rate. The test statistics are assumed to arise from a mixture of distributions under the null and non-null hypotheses. We fit the mixture distribution using both a nonparametric approach and commingling analysis, and then apply the local false discovery rate to select cut-off points for regions to be declared interesting. Another criterion, the minimum total error, is also explored. Both criteria seem to be sensible alternatives to controlling the classical type I and type II error rates.

12.

A mixture model for estimating the local false discovery rate in DNA microarray analysis (Total citations: 3; self-citations: 0; citations by others: 3)

Liao JG Lin Y Selvanayagam ZE Shih WJ 《Bioinformatics (Oxford, England)》2004,20(16):2694-2701

MOTIVATION: Statistical methods based on controlling the false discovery rate (FDR) or positive false discovery rate (pFDR) are now well established in identifying differentially expressed genes in DNA microarray. Several authors have recently raised the important issue that FDR or pFDR may give misleading inference when specific genes are of interest because they average the genes under consideration with genes that show stronger evidence for differential expression. The paper proposes a flexible and robust mixture model for estimating the local FDR, which quantifies how plausible it is that each specific gene is differentially expressed. RESULTS: We develop a special mixture model tailored to multiple testing by requiring the P-value distribution for the differentially expressed genes to be stochastically smaller than the P-value distribution for the non-differentially expressed genes. A smoothing mechanism is built in. The proposed model gives robust estimation of local FDR for any reasonable underlying P-value distributions. It also provides a single framework for estimating the proportion of differentially expressed genes, pFDR, negative predictive values, sensitivity and specificity. A cervical cancer study shows that the local FDR gives more specific and relevant quantification of the evidence for differential expression that can be substantially different from pFDR. AVAILABILITY: An R function implementing the proposed model is available at http://www.geocities.com/jg_liao/software

13.

Identification of differentially expressed genes and false discovery rate in microarray studies

Gusnanto A Calza S Pawitan Y 《Current opinion in lipidology》2007,18(2):187-193

PURPOSE OF REVIEW: To highlight the development in microarray data analysis for the identification of differentially expressed genes, particularly via control of false discovery rate. RECENT FINDINGS: The emergence of high-throughput technology such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises due to testing tens of thousands of hypotheses, rendering the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach of dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods to take into account additional information from the microarrays to improve the false discovery rate. SUMMARY: There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure.

14.

Gene expression * Quick calculation for sample size while controlling false discovery rate with application to microarray analysis

Liu Peng; Gene Hwang J.T. 《Bioinformatics (Oxford, England)》2008,24(1):149

Vol. 23, No. 6, 2007, pp. 739–746 doi:10.1093/bioinformatics/btl664 The calculation of the sample sizes using the method of Pounds and Cheng

15.

A mixture model-based approach to the clustering of microarray expression data (Total citations: 13; self-citations: 0; citations by others: 13)

McLachlan GJ Bean RW Peel D 《Bioinformatics (Oxford, England)》2002,18(3):413-422

MOTIVATION: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. RESULTS: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are consistent either with the external classification of the tissues or with background and biological knowledge of these sets. AVAILABILITY: EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/

16.

Kerfdr: a semi-parametric kernel-based approach to local false discovery rate estimation

Mickael Guedj Stephane Robin Alain Celisse Gregory Nuel 《BMC bioinformatics》2009,10(1):84

Background

The use of current high-throughput genetic, genomic and post-genomic data leads to the simultaneous evaluation of a large number of statistical hypotheses and, at the same time, to the multiple-testing problem. As an alternative to the overly conservative Family-Wise Error-Rate (FWER), the False Discovery Rate (FDR) has emerged over the last ten years as more appropriate to handle this problem. However, one drawback of FDR is that it relates to a given rejection region for the considered statistics, attributing the same value to those that are close to the boundary and those that are not. As a result, the local FDR has recently been proposed to quantify the specific probability for a given null hypothesis to be true.

17.

Extracting gene expression patterns and identifying co-expressed genes from microarray data reveals biologically responsive processes

Jeff W Chou Tong Zhou William K Kaufmann Richard S Paules Pierre R Bushel 《BMC bioinformatics》2007,8(1):427

Background

A common observation in the analysis of gene expression data is that many genes display similarity in their expression patterns and therefore appear to be co-regulated. However, the variation associated with microarray data and the complexity of the experimental designs make the acquisition of co-expressed genes a challenge. We developed a novel method for Extracting microarray gene expression Patterns and Identifying co-expressed Genes, designated as EPIG. The approach utilizes the underlying structure of gene expression data to extract patterns and identify co-expressed genes that are responsive to experimental conditions.

18.

Quick calculation for sample size while controlling false discovery rate with application to microarray analysis

Liu P Hwang JT 《Bioinformatics (Oxford, England)》2007,23(6):739-746

MOTIVATION: Sample size calculation is important in experimental design and is even more so in microarray or proteomic experiments since only a few repetitions can be afforded. In the multiple testing problems involving these experiments, it is more powerful and more reasonable to control false discovery rate (FDR) or positive FDR (pFDR) instead of type I error, e.g. family-wise error rate (FWER). When controlling FDR, the traditional approach of estimating sample size by controlling type I error is no longer applicable. RESULTS: Our proposed method applies to controlling FDR. The sample size calculation is straightforward and requires minimal computation, as illustrated with two-sample t-tests and F-tests. Based on simulation with the resultant sample size, the power is shown to be achievable by the q-value procedure. AVAILABILITY: Matlab code implementing the described methods is available upon request.

19.

cn.FARMS: a latent variable model to detect copy number variations in microarray data with a low false discovery rate

Clevert DA Mitterecker A Mayr A Klambauer G Tuefferd M De Bondt A Talloen W Göhlmann H Hochreiter S 《Nucleic acids research》2011,39(12):e79

Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique to measure DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and therefore not associated with a disease in a clinical study, though correction for multiple testing takes them into account and thereby decreases the study's discovery power. For controlling the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples from which the posterior can only deviate by strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.

20.

Biomarker discovery in microarray gene expression data with Gaussian processes

Chu W Ghahramani Z Falciani F Wild DL 《Bioinformatics (Oxford, England)》2005,21(16):3385-3393

MOTIVATION: In clinical practice, pathological phenotypes are often labelled with ordinal scales rather than binary, e.g. the Gleason grading system for tumour cell differentiation. However, in the literature of microarray analysis, these ordinal labels have rarely been treated in a principled way. This paper describes a gene selection algorithm based on Gaussian processes to discover consistent gene expression patterns associated with ordinal clinical phenotypes. The technique of automatic relevance determination is applied to represent the significance level of the genes in a Bayesian inference framework. RESULTS: The usefulness of the proposed algorithm for ordinal labels is demonstrated by the gene expression signature associated with the Gleason score for prostate cancer data. Our results demonstrate how multi-gene markers that may be initially developed with a diagnostic or prognostic application in mind are also useful as an investigative tool to reveal associations between specific molecular and cellular events and features of tumour physiology. Our algorithm can also be applied to microarray data with binary labels with results comparable to other methods in the literature.