1.

Using weighted permutation scores to detect differential gene expression with microarray data

Guo X Pan W 《Journal of bioinformatics and computational biology》2005,3(4):989-1006

A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM) and the mixture model method (MMM), have been proposed to detect differential gene expression for replicated microarray experiments. They all depend on constructing a test statistic, for example, a t-statistic, and then using permutation to draw inferences. However, due to special features of microarray data, using standard permutation scores may not estimate the null distribution of the test statistic well, possibly leading to overly conservative inferences. We propose a new method of constructing weighted permutation scores to overcome the problem: posterior probabilities of having no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data for time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.
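
As a rough illustration of the weighting idea in this abstract, and not the authors' exact estimator, the sketch below pools permuted statistics across genes with each gene weighted by its posterior probability of no differential expression, then estimates the FDR at a cutoff as the average of those probabilities among the genes called. The function names, the two-sided cutoff rule and the toy numbers are all assumptions made for illustration.

```python
def weighted_null_tail(perm_stats, weights, cutoff):
    """Weighted estimate of P(|Z| > cutoff) under the null.

    perm_stats[g] holds the permuted statistics for gene g;
    weights[g] is that gene's (assumed) posterior probability of
    no differential expression.
    """
    num = 0.0
    den = 0.0
    for stats, w in zip(perm_stats, weights):
        exceed = sum(1 for z in stats if abs(z) > cutoff)
        num += w * exceed
        den += w * len(stats)
    return num / den if den > 0 else 0.0


def weighted_fdr(obs_stats, weights, cutoff):
    """Average posterior null probability among genes with |Z| > cutoff."""
    called = [w for z, w in zip(obs_stats, weights) if abs(z) > cutoff]
    return sum(called) / len(called) if called else 0.0


# Toy data (hypothetical): the third gene looks differentially expressed,
# so its permuted statistics are inflated and its null weight is small.
obs = [0.3, -0.5, 4.2]
perm = [[0.1, -0.4, 0.6, 2.5], [0.2, -0.3, 0.5, -0.1], [3.0, -2.8, 3.5, 2.9]]
w = [0.9, 0.9, 0.1]

unweighted_tail = weighted_null_tail(perm, [1.0, 1.0, 1.0], 2.0)
weighted_tail = weighted_null_tail(perm, w, 2.0)
fdr_hat = weighted_fdr(obs, w, 2.0)
```

In this toy case, down-weighting the third gene pulls the estimated null tail probability toward what the apparently unaffected genes suggest, which is the direction of correction the abstract describes.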

2.

A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance

Shunpu Zhang 《BMC bioinformatics》2007,8(1):230

Background

The Significance Analysis of Microarrays (SAM) is a popular method for detecting significantly expressed genes and controlling the false discovery rate (FDR). Recently, it has been reported in the literature that the FDR is not well controlled by SAM. Due to the vast application of SAM in microarray data analysis, it is of great importance to have an extensive evaluation of SAM and its associated R-package (sam2.20).

3.

Identification of Significant Features by the Global Mean Rank Test

Martin Klammer J. Nikolaj Dybowski Daniel Hoffmann Christoph Schaab 《PloS one》2014,9(8)

4.

Hierarchical Bayes models for cDNA microarray gene expression (cited by: 2; self-citations: 0; cited by others: 2)

Lönnstedt I Britton T 《Biostatistics (Oxford, England)》2005,6(2):279-291

cDNA microarrays are used in many contexts to compare mRNA levels between samples of cells. Microarray experiments typically give us expression measurements on 1000-20 000 genes, but with few replicates for each gene. Traditional methods using means and standard deviations to detect differential expression are not satisfactory in this context. A handful of alternative statistics have been developed, including several empirical Bayes methods. In the present paper we present two full hierarchical Bayes models for detecting gene expression, of which one (D) describes our microarray data very well. We also compare the full Bayes and empirical Bayes approaches with respect to model assumptions, false discovery rates and computer running time. The proposed models are compared to existing empirical Bayes models in a simulation study and for a set of data (Yuen et al., 2002), where 27 genes have been categorized by quantitative real-time PCR. It turns out that the existing empirical Bayes methods have at least as good performance as the full Bayes ones.

5.

Resampling-based empirical Bayes multiple testing procedures for controlling generalized tail probability and expected value error rates: focus on the false discovery rate and simulation study

Dudoit S Gilbert HN van der Laan MJ 《Biometrical journal. Biometrische Zeitschrift》2008,50(5):716-744

This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP(q, g) = Pr(g(V_n, S_n) > q), and generalized expected value (gEV) error rates, gEV(g) = E[g(V_n, S_n)], for arbitrary functions g(V_n, S_n) of the numbers of false positives V_n and true positives S_n. Of particular interest are error rates based on the proportion g(V_n, S_n) = V_n / (V_n + S_n) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E[V_n / (V_n + S_n)]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely-used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure.
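
For orientation, the classical Benjamini and Hochberg linear step-up rule that this abstract uses as a baseline can be sketched as follows, together with the realized proportion V / (V + S) whose expectation is the FDR. This is a minimal sketch of the baseline only, not of the resampling-based empirical Bayes procedure; the p-values and the set of true nulls are made up for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected by the BH linear step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value meets its step-up threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


def false_discovery_proportion(rejected, true_nulls):
    """V / (V + S) for one rejection set; 0 when nothing is rejected."""
    if not rejected:
        return 0.0
    v = sum(1 for i in rejected if i in true_nulls)
    return v / len(rejected)


# Hypothetical p-values; true_nulls is known only because this is a toy case.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(pvals, alpha=0.05)
fdp = false_discovery_proportion(rejected, true_nulls={1, 5, 6, 7})
```

Here only the two smallest p-values fall under their thresholds alpha * rank / m, so two hypotheses are rejected, one of which is a true null in the toy labelling.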

6.

Ranking analysis of microarray data: a powerful method for identifying differentially expressed genes

Tan YD Fornage M Fu YX 《Genomics》2006,88(6):846-854

Microarray technology provides a powerful tool for profiling the expression of thousands of genes simultaneously, which makes it possible to explore the molecular and metabolic etiology of the development of a complex disease under study. However, classical statistical methods and technologies fail to be applicable to microarray data. Therefore, it is necessary and motivating to develop powerful methods for large-scale statistical analyses. In this paper, we describe a novel method, called Ranking Analysis of Microarray Data (RAM). RAM, which is a large-scale two-sample t-test method, is based on comparisons between a set of ranked T statistics and a set of ranked Z values (a set of ranked estimated null scores) yielded by a "randomly splitting" approach instead of a "permutation" approach and a two-simulation strategy for estimating the proportion of genes identified by chance, i.e., the false discovery rate (FDR). The results obtained from the simulated and observed microarray data show that RAM is more efficient than Significance Analysis of Microarrays at identifying differentially expressed genes and estimating the FDR under undesirable conditions such as a large fudge factor, small sample size, or a mixture distribution of noise.

7.

Assessment of differential gene expression in human peripheral nerve injury

Xiao Y Segal MR Rabert D Ahn AH Anand P Sangameswaran L Hu D Hunt CA 《BMC genomics》2002,3(1):28-11

Background

Microarray technology is a powerful methodology for identifying differentially expressed genes. However, when thousands of genes in a microarray data set are evaluated simultaneously by fold changes and significance tests, the probability of detecting false positives rises sharply. In this first microarray study of brachial plexus injury, we applied and compared the performance of two recently proposed algorithms for tackling this multiple testing problem, Significance Analysis of Microarrays (SAM) and Westfall and Young step-down adjusted p-values, as well as t-statistics and Welch statistics, in specifying differential gene expression under different biological states.
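
The difference between the two per-gene statistics named in this abstract can be made concrete with a short sketch: the ordinary two-sample t-statistic pools the variance across groups, while the Welch statistic keeps a separate variance estimate for each group. The group names and expression values below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t-statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


def welch_t(x, y):
    """Welch statistic: each group keeps its own variance estimate."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


# Hypothetical log-expression values for one gene in two conditions.
injured = [2.0, 4.0, 6.0]
control = [1.0, 2.0, 3.0, 2.0]
t_pooled = pooled_t(injured, control)
t_welch = welch_t(injured, control)
```

With unequal group variances, as here, the two statistics differ; in a microarray analysis either one would be computed for every gene before a multiple testing adjustment is applied.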

8.

Analysis and mining of breast cancer microarray experimental data

蒋峥卜友泉刘忠于宋方洲张世强《国外医学:分子生物学分册》2011,(3):220-224,252

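Objective: To analyze breast cancer microarray results downloaded from public databases, identify genes that are differentially expressed between normal and cancerous tissue, and find genes related to those differentially expressed genes. Methods: Significance Analysis of Microarrays (SAM), top scoring pairs (TSP), association rule mining and other methods were applied in combination to process the data. Results: A number of differentially expressed genes were identified, and potentially highly correlated genes were found for a subset of them. Conclusion: The identified genes and their related genes can serve as candidate genes for further research.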

9.

Parametric and nonparametric FDR estimation revisited

Wu B Guan Z Zhao H 《Biometrics》2006,62(3):735-744

Nonparametric and parametric approaches have been proposed to estimate false discovery rate under the independent hypothesis testing assumption. The parametric approach has been shown to have better performance than the nonparametric approaches. In this article, we study the nonparametric approaches and quantify the underlying relations between parametric and nonparametric approaches. Our study reveals the conservative nature of the nonparametric approaches, and establishes the connections between the empirical Bayes method and p-value-based nonparametric methods. Based on our results, we advocate using the parametric approach, or directly modeling the test statistics using the empirical Bayes method.

10.

Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments (cited by: 3; self-citations: 0; cited by others: 3)

Zhao Y Pan W 《Bioinformatics (Oxford, England)》2003,19(9):1046-1054

MOTIVATION: An important goal in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Various parametric tests, such as the two-sample t-test, have been used, but their possibly too strong parametric assumptions or large sample justifications may not hold in practice. As alternatives, a class of three nonparametric statistical methods, including the empirical Bayes method of Efron et al. (2001), the significance analysis of microarray (SAM) method of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2001), have been proposed. All three methods depend on constructing a test statistic and a so-called null statistic such that the null statistic's distribution can be used to approximate the null distribution of the test statistic. However, relatively little effort has been directed toward assessment of the performance or the underlying assumptions of the methods in constructing such test and null statistics. RESULTS: We point out a problem with a current method of constructing the test and null statistics, which may lead to largely inflated Type I errors (i.e. false positives). We also propose two modifications that overcome the problem. In the context of MMM, the improved performance of the modified methods is demonstrated using simulated data. In addition, our numerical results also provide evidence to support the utility and effectiveness of MMM.

11.

A case study on choosing normalization methods and test statistics for two-channel microarray data (cited by: 1; self-citations: 0; cited by others: 1)

Xie Y Jeong KS Pan W Khodursky A Carlin BP 《Comparative and Functional Genomics》2004,5(5):432-444

12.

Applications of DNA tiling arrays for whole-genome analysis (cited by: 26; self-citations: 0; cited by others: 26)

Mockler TC Chan S Sundaresan A Chen H Jacobsen SE Ecker JR 《Genomics》2005,85(1):1-15

13.

Detecting differential gene expression with a semiparametric hierarchical mixture method (cited by: 11; self-citations: 0; cited by others: 11)

Newton MA Noueiry A Sarkar D Ahlquist P 《Biostatistics (Oxford, England)》2004,5(2):155-176

Mixture modeling provides an effective approach to the differential expression problem in microarray data analysis. Methods based on fully parametric mixture models are available, but lack of fit in some examples indicates that more flexible models may be beneficial. Existing, more flexible, mixture models work at the level of one-dimensional gene-specific summary statistics, and so when there are relatively few measurements per gene these methods may not provide sensitive detectors of differential expression. We propose a hierarchical mixture model to provide methodology that is both sensitive in detecting differential expression and sufficiently flexible to account for the complex variability of normalized microarray data. EM-based algorithms are used to fit both parametric and semiparametric versions of the model. We restrict attention to the two-sample comparison problem; an experiment involving Affymetrix microarrays and yeast translation provides the motivating case study. Gene-specific posterior probabilities of differential expression form the basis of statistical inference; they define short gene lists and false discovery rates. Compared to several competing methodologies, the proposed methodology exhibits good operating characteristics in a simulation study, on the analysis of spike-in data, and in a cross-validation calculation.

14.

Implementation of microarrays for Methylobacterium extorquens AM1 (cited by: 1; self-citations: 0; cited by others: 1)

Okubo Y Skovran E Guo X Sivam D Lidstrom ME 《Omics : a journal of integrative biology》2007,11(4):325-340

Microarrays are an important tool for understanding global gene expression changes, and the resulting data sets can be used to direct physiologic and metabolic studies. To take advantage of this technology, 60-mer oligonucleotide microarrays were designed for Methylobacterium extorquens AM1 to study gene expression changes that occur under differing physiological conditions. The carbon utilization pathways for methanol and succinate have been well characterized, and growth with these substrates was chosen as the condition used to validate the microarray data. The data were analyzed using two different methods and compared to previously obtained experimental data. The array data processed using the Significance Analysis of Microarrays followed by p-value assessment correlated best with the experimental data. In addition to validating the microarrays, these studies uncovered possible connections between methylotrophy, iron and sulfur homeostasis, bacteriochlorophyll production, and polyketide synthesis, and will likely aid in uncovering further metabolic networks and genes required for methylotrophy.

15.

Semi-parametric differential expression analysis via partial mixture estimation

Rossell D Guerra R Scott C 《Statistical applications in genetics and molecular biology》2008,7(1):Article15

We develop an approach for microarray differential expression analysis, i.e. identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situations, however, a full parametric model cannot be justified, or the sample size per group is too small for permutation methods to be valid. We propose a semi-parametric framework based on partial mixture estimation which only requires a parametric assumption for the null (equally expressed) distribution and can handle small sample sizes where permutation methods break down. We develop two novel improvements of Scott's minimum integrated square error criterion for partial mixture estimation [Scott, 2004a,b]. As a side benefit, we obtain interpretable and closed-form estimates for the proportion of EE genes. Pseudo-Bayesian and frequentist procedures for controlling the false discovery rate are given. Results from simulations and real datasets indicate that our approach can provide substantial advantages for small sample sizes over the SAM method of Tusher et al. [2001], the empirical Bayes procedure of Efron and Tibshirani [2002], the mixture of normals of Pan et al. [2003] and a t-test with p-value adjustment [Dudoit et al., 2003] to control the FDR [Benjamini and Hochberg, 1995].

16.

Adjusting batch effects in microarray expression data using empirical Bayes methods

Johnson WE Li C Rabinovic A 《Biostatistics (Oxford, England)》2007,8(1):118-127

Non-biological experimental variation or "batch effects" are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (> 25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.

17.

A nonparametric empirical Bayes framework for large-scale multiple testing

Martin R Tokdar ST 《Biostatistics (Oxford, England)》2012,13(3):427-439

We propose a flexible and identifiable version of the 2-groups model, motivated by hierarchical Bayes considerations, that features an empirical null and a semiparametric mixture model for the nonnull cases. We use a computationally efficient predictive recursion (PR) marginal likelihood procedure to estimate the model parameters, even the nonparametric mixing distribution. This leads to a nonparametric empirical Bayes testing procedure, which we call PRtest, based on thresholding the estimated local false discovery rates. Simulations and real data examples demonstrate that, compared to existing approaches, PRtest's careful handling of the nonnull density can give a much better fit in the tails of the mixture distribution which, in turn, can lead to more realistic conclusions.
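
To show what thresholding local false discovery rates means in a 2-groups model, the sketch below computes the local fdr as the posterior probability that a case is null and calls a case nonnull when that probability is at or below a cutoff. It assumes fully specified normal components, a null proportion of 0.9 and a cutoff of 0.2, all chosen for illustration; PRtest instead estimates the model from the data by predictive recursion, which is not implemented here.

```python
from math import exp, pi, sqrt


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and sd sigma."""
    return exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def local_fdr(z, pi0, null_pdf, nonnull_pdf):
    """Posterior probability that a case with statistic z is null."""
    f0 = pi0 * null_pdf(z)
    f1 = (1 - pi0) * nonnull_pdf(z)
    return f0 / (f0 + f1)


def call_nonnull(zs, pi0, null_pdf, nonnull_pdf, threshold=0.2):
    """Indices of cases whose local fdr is at or below the threshold."""
    return [i for i, z in enumerate(zs)
            if local_fdr(z, pi0, null_pdf, nonnull_pdf) <= threshold]


# Assumed components: N(0, 1) for null cases and N(3, 1) for nonnull cases.
null = lambda z: normal_pdf(z, 0.0, 1.0)
nonnull = lambda z: normal_pdf(z, 3.0, 1.0)
zs = [0.0, 1.5, 3.0, 4.0]
hits = call_nonnull(zs, pi0=0.9, null_pdf=null, nonnull_pdf=nonnull)
```

Because the local fdr depends on the ratio of the two densities at each z, a poor fit of the nonnull density in the tails directly changes which cases are called, which is the sensitivity the abstract points to.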

18.

A weighted principal component analysis and its application to gene expression data

Pinto da Costa JF Alonso H Roque L 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(1):246-252

In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for dealing with this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.

19.

An investigation on performance of Significance Analysis of Microarray (SAM) for the comparisons of several treatments with one control in the presence of small-variance genes

Lin D Shkedy Z Burzykowski T Ion R Göhlmann HW Bondt AD Perer T Geerts T Van den Wyngaert I Bijnens L 《Biometrical journal. Biometrische Zeitschrift》2008,50(5):801-823

One of the multiple testing problems in drug finding experiments is the comparison of several treatments with one control. In this paper we discuss a particular situation of such an experiment, i.e., a microarray setting, where the many-to-one comparisons need to be addressed for thousands of genes simultaneously. For a gene-specific analysis, Dunnett's single-step procedure is considered for within-gene tests, while FDR-controlling procedures such as Significance Analysis of Microarrays (SAM) and the Benjamini and Hochberg (BH) False Discovery Rate (FDR) adjustment are applied to control the error rate across genes. The method is applied to a microarray experiment with four treatment groups (three microarrays in each group) and 16,998 genes. Simulation studies are conducted to investigate the performance of the SAM method and the BH-FDR procedure with regard to controlling the FDR, and to investigate the effect of small-variance genes on the FDR in the SAM procedure.

20.

Testing differential expression in nonoverlapping gene pairs: a new perspective for the empirical Bayes method

Klebanov L Qiu X Yakovlev A 《Journal of bioinformatics and computational biology》2008,6(2):301-316

The currently practiced methods of significance testing in microarray gene expression profiling are highly unstable and tend to be very low in power. These undesirable properties are due to the nature of multiple testing procedures, as well as extremely strong and long-ranged correlations between gene expression levels. In an earlier publication, we identified a special structure in gene expression data that produces a sequence of weakly dependent random variables. This structure, termed the delta-sequence, lies at the heart of a new methodology for selecting differentially expressed genes in nonoverlapping gene pairs. The proposed method has two distinct advantages: (1) it leads to dramatic gains in terms of the mean numbers of true and false discoveries, and in the stability of the results of testing; and (2) its outcomes are entirely free from the log-additive array-specific technical noise. We demonstrate the usefulness of this approach in conjunction with the nonparametric empirical Bayes method. The proposed modification of the empirical Bayes method leads to significant improvements in its performance. The new paradigm arising from the existence of the delta-sequence in biological data offers considerable scope for future developments in this area of methodological research.