期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Computation of significance scores of unweighted Gene Set Enrichment Analyses

Andreas Keller Christina Backes Hans-Peter Lenhof 《BMC bioinformatics》2007,8(1):290

Background

Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated on top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time consuming procedure that yields only estimates of the p-values. 相似文献

2.

A simple method for statistical analysis of intensity differences in microarray-derived gene expression data

Alexander Kamb Mani Ramaswami 《BMC biotechnology》2001,1(1):8-8

Background

Microarray experiments offer a potent solution to the problem of making and comparing large numbers of gene expression measurements either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of the changes. In lieu of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic. Without such estimates, overly conservative choices for significance must be enforced. 相似文献

3.

A tool for comparing different statistical methods on identifying differentially expressed genes

Paul?Fogel Li?Liu Email author Bruno?Dumas Nanxiang?Ge 《Genome biology》2005,6(1):P2

Background

Many different statistical methods have been developed to deal with two group comparison microarray experiments. Most often, a substantial number of genes may be selected or not, depending on which method was actually used. Practical guidance on the application of these methods is therefore required. We developed a procedure based on bootstrap and a criterion to allow viewing and quantifying differences between method-dependent selections. We applied this procedure on three datasets that cover a range of possible sample sizes to compare three well known methods, namely: t-test, LPE and SAM. 相似文献

4.

Comparison of small n statistical tests of differential expression applied to microarrays

Carl Murie Owen Woody Anna Y Lee Robert Nadon 《BMC bioinformatics》2009,10(1):45-18

Background

DNA microarrays provide data for genome wide patterns of expression between observation classes. Microarray studies often have small samples sizes, however, due to cost constraints or specimen availability. This can lead to poor random error estimates and inaccurate statistical tests of differential expression. We compare the performance of the standard t-test, fold change, and four small n statistical test methods designed to circumvent these problems. We report results of various normalization methods for empirical microarray data and of various random error models for simulated data. 相似文献

5.

A bi-ordering approach to linking gene expression with clinical annotations in gastric cancer

Fan Shi Christopher Leckie Geoff MacIntyre Izhak Haviv Alex Boussioutas Adam Kowalczyk 《BMC bioinformatics》2010,11(1):477

Background

In the study of cancer genomics, gene expression microarrays, which measure thousands of genes in a single assay, provide abundant information for the investigation of interesting genes or biological pathways. However, in order to analyze the large number of noisy measurements in microarrays, effective and efficient bioinformatics techniques are needed to identify the associations between genes and relevant phenotypes. Moreover, systematic tests are needed to validate the statistical and biological significance of those discoveries. 相似文献

6.

A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests

Antonio Carvajal-Rodríguez Jacobo de U?a-Alvarez Emilio Rolán-Alvarez 《BMC bioinformatics》2009,10(1):209

Background

The detection of true significant cases under multiple testing is becoming a fundamental issue when analyzing high-dimensional biological data. Unfortunately, known multitest adjustments reduce their statistical power as the number of tests increase. We propose a new multitest adjustment, based on a sequential goodness of fit metatest (SGoF), which increases its statistical power with the number of tests. The method is compared with Bonferroni and FDR-based alternatives by simulating a multitest context via two different kinds of tests: 1) one-sample t-test, and 2) homogeneity G-test. 相似文献

7.

EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance

Thomas?Schou?Larsen Email author Anders?Krogh 《BMC bioinformatics》2003,4(1):21

Background

Contrary to other areas of sequence analysis, a measure of statistical significance of a putative gene has not been devised to help in discriminating real genes from the masses of random Open Reading Frames (ORFs) in prokaryotic genomes. Therefore, many genomes have too many short ORFs annotated as genes. 相似文献

8.

An improved procedure for gene selection from microarray experiments using false discovery rate criterion

James J Yang Mark CK Yang 《BMC bioinformatics》2006,7(1):15-14

Background

A large number of genes usually show differential expressions in a microarray experiment with two types of tissues, and the p-values of a proper statistical test are often used to quantify the significance of these differences. The genes with small p-values are then picked as the genes responsible for the differences in the tissue RNA expressions. One key question is what should be the threshold to consider the p-values small. There is always a trade off between this threshold and the rate of false claims. Recent statistical literature shows that the false discovery rate (FDR) criterion is a powerful and reasonable criterion to pick those genes with differential expression. Moreover, the power of detection can be increased by knowing the number of non-differential expression genes. While this number is unknown in practice, there are methods to estimate it from data. The purpose of this paper is to present a new method of estimating this number and use it for the FDR procedure construction. 相似文献

9.

Rank-invariant resampling based estimation of false discovery rate for analysis of small sample microarray data

Nitin?Jain HyungJun?Cho Michael?O'Connell Jae?K?Lee Email author 《BMC bioinformatics》2005,6(1):187

Background

The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies. Classical p-value adjustment methods for multiple comparisons such as family-wise error rate (FWER) have been found to be too conservative in analyzing large-screening microarray data, and the False Discovery Rate (FDR), the expected proportion of false positives among all positives, has been recently suggested as an alternative for controlling false positives. Several statistical approaches have been used to estimate and control FDR, but these may not provide reliable FDR estimation when applied to microarray data sets with a small number of replicates. 相似文献

10.

Density based pruning for identification of differentially expressed genes from microarray data

Hu J Xu J 《BMC genomics》2010,11(Z2):S3

Motivation

Identification of differentially expressed genes from microarray datasets is one of the most important analyses for microarray data mining. Popular algorithms such as statistical t-test rank genes based on a single statistics. The false positive rate of these methods can be improved by considering other features of differentially expressed genes.

Results

We proposed a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two dimension feature space composed of average difference of gene expression and average expression levels. A density based pruning algorithm (DB Pruning) is developed to screen out potential differentially expressed genes usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from Gene Omnibus Database (GEO) with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test, rank product, and fold change.

Conclusions

Density based pruning of non-differentially expressed genes is an effective method for enhancing statistical testing based algorithms for identifying differentially expressed genes. It improves t-test, rank product, and fold change by 11% to 50% in the numbers of identified true differentially expressed genes. The source code of DB pruning is freely available on our website http://mleg.cse.sc.edu/degprune

相似文献

11.

A two-sample Bayesian t-test for microarray data

Richard J Fox Matthew W Dimmic 《BMC bioinformatics》2006,7(1):126-11

Background

Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically. 相似文献

12.

Determination of the differentially expressed genes in microarray experiments using local FDR

J?Aubert A?Bar-Hen J-J?Daudin Email author S?Robin 《BMC bioinformatics》2004,5(1):125

Background

Thousands of genes in a genomewide data set are tested against some null hypothesis, for detecting differentially expressed genes in microarray experiments. The expected proportion of false positive genes in a set of genes, called the False Discovery Rate (FDR), has been proposed to measure the statistical significance of this set. Various procedures exist for controlling the FDR. However the threshold (generally 5%) is arbitrary and a specific measure associated with each gene would be worthwhile. 相似文献

13.

Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments

Maureen A Sartor Craig R Tomlinson Scott C Wesselkamper Siva Sivaganesan George D Leikauf Mario Medvedovic 《BMC bioinformatics》2006,7(1):538-17

Background

The small sample sizes often used for microarray experiments result in poor estimates of variance if each gene is considered independently. Yet accurately estimating variability of gene expression measurements in microarray experiments is essential for correctly identifying differentially expressed genes. Several recently developed methods for testing differential expression of genes utilize hierarchical Bayesian models to "pool" information from multiple genes. We have developed a statistical testing procedure that further improves upon current methods by incorporating the well-documented relationship between the absolute gene expression level and the variance of gene expression measurements into the general empirical Bayes framework. 相似文献

14.

ProbCD: enrichment analysis accounting for categorization uncertainty

Ricardo ZN Vêncio Ilya Shmulevich 《BMC bioinformatics》2007,8(1):383

Background

As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. 相似文献

15.

MILANO – custom annotation of microarray results using automatic literature searches

Ran?Rubinstein Itamar?Simon Email author 《BMC bioinformatics》2005,6(1):12

Background

High-throughput genomic research tools are becoming standard in the biologist's toolbox. After processing the genomic data with one of the many available statistical algorithms to identify statistically significant genes, these genes need to be further analyzed for biological significance in light of all the existing knowledge. Literature mining – the process of representing literature data in a fashion that is easy to relate to genomic data – is one solution to this problem. 相似文献

16.

Primary and secondary transcriptional effects in the developing human Down syndrome brain and heart

下载免费PDF全文

Mao R Wang X Spitznagel EL Frelin LP Ting JC Ding H Kim JW Ruczinski I Downey TJ Pevsner J 《Genome biology》2005,6(13):R107

相似文献

17.

Multiclass classification of microarray data samples with a reduced number of genes

Elizabeth Tapia Leonardo Ornella Pilar Bulacio Laura Angelone 《BMC bioinformatics》2011,12(1):59

Background

Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. 相似文献

18.

The PowerAtlas: a power and sample size atlas for microarray experimental design and research

Grier P Page Jode W Edwards Gary L Gadbury Prashanth Yelisetti Jelai Wang Prinal Trivedi David B Allison 《BMC bioinformatics》2006,7(1):84

Background

Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies. 相似文献

19.

Intensity dependent estimation of noise in microarrays improves detection of differentially expressed genes

Amit Zeisel Amnon Amir Wolfgang J Köstler Eytan Domany 《BMC bioinformatics》2010,11(1):400

Background

In many microarray experiments, analysis is severely hindered by a major difficulty: the small number of samples for which expression data has been measured. When one searches for differentially expressed genes, the small number of samples gives rise to an inaccurate estimation of the experimental noise. This, in turn, leads to loss of statistical power. 相似文献

20.

On the number of founding germ cells in humans

Chang-Jiang?Zheng Email author E?Georg?Luebeck Breck?Byers Suresh?H?Moolgavkar 《Theoretical biology & medical modelling》2005,2(1):32

Background

The number of founding germ cells (FGCs) in mammals is of fundamental significance to the fidelity of gene transmission between generations, but estimates from various methods vary widely. In this paper we obtain a new estimate for the value in humans by using a mathematical model of germ cell development that depends on available oocyte counts for adult women. 相似文献