Similar literature
20 similar articles found.
1.
There are many options in handling microarray data that can affect study conclusions, sometimes drastically. Working with a two-color platform, this study uses ten spike-in microarray experiments to evaluate the relative effectiveness of some of these options for the experimental goal of detecting differential expression. We consider two data transformations, background subtraction and intensity normalization, as well as six different statistics for detecting differentially expressed genes. Findings support the use of an intensity-based normalization procedure and also indicate that local background subtraction can be detrimental for effectively detecting differential expression. We also verify that robust statistics outperform t-statistics in identifying differentially expressed genes when there are few replicates. Finally, we find that choice of image analysis software can also substantially influence experimental conclusions.

2.
A number of statistical methods now exist for detecting differential gene expression in experiments with microarray data. In trials under two conditions, a version of the two-sample t statistic is usually used. However, the problem of estimating the power of these tests has so far been insufficiently studied. In this paper, we propose a method to calculate the power of the robust t test for detecting differential gene expression in experiments with twins. We also discuss the results of applying this method to simulated data.

3.

Background

Sessile serrated adenomas/polyps are distinguished from hyperplastic colonic polyps subjectively by their endoscopic appearance and histological morphology. However, hyperplastic and sessile serrated polyps can have overlapping morphological features, resulting in sessile serrated polyps being diagnosed as hyperplastic. While sessile serrated polyps can progress into colon cancer, hyperplastic polyps have virtually no risk for colon cancer. Objective measures that differentiate these types of polyps would improve cancer prevention and treatment outcomes.

Methods

An RNA-seq training data set and Affymetrix and Illumina testing data sets were obtained from Gene Expression Omnibus (GEO). RNA-seq single-end reads were filtered with the FASTX-Toolkit. Read mapping to the human genome, gene abundance estimation, and differential expression analysis were performed with the TopHat-Cufflinks pipeline. Background correction, normalization, and probe summarization steps for Affymetrix arrays were performed using the robust multi-array average (RMA) method. For Illumina arrays, log2-scale expression data were obtained from GEO. Pathway analysis was implemented using the Bioconductor package GSAR. To build a platform-independent molecular classifier that accurately differentiates sessile serrated and hyperplastic polyps, we developed a new feature selection step. We also developed a simple procedure to classify new samples as either sessile serrated or hyperplastic, with a class probability assigned to the decision, estimated using Cantelli's inequality.
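For orientation, Cantelli's inequality bounds a one-sided tail probability using only a mean and a variance: P(X - mean >= lam) <= variance / (variance + lam^2) for lam > 0. The sketch below computes that bound. The abstract does not say how the bound is turned into a class probability, so the function name and the example numbers are hypothetical and the block illustrates the inequality only, not the authors' procedure.

```python
def cantelli_upper_bound(variance: float, lam: float) -> float:
    """Upper bound on P(X - mean >= lam) for lam > 0 (Cantelli's inequality).

    Only the variance of X is needed; no distributional shape is assumed.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if lam <= 0:
        raise ValueError("lam must be positive")
    return variance / (variance + lam * lam)


# Hypothetical example: a score with variance 1.0 exceeds its mean by
# at least 3.0 with probability no larger than this bound.
bound = cantelli_upper_bound(variance=1.0, lam=3.0)
```

Because the bound holds for any distribution with finite variance, it is conservative; a probability derived from it will usually be looser than one derived from a fitted parametric model.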

Results

The classifier trained on RNA-seq data and tested on two independent microarray data sets made zero and three errors, respectively. The classifier was further tested using quantitative real-time PCR expression levels of 45 blinded independent formalin-fixed paraffin-embedded specimens and was highly accurate. Pathway analyses have shown that sessile serrated polyps are distinguished from hyperplastic polyps and normal controls by: up-regulation of pathways implicated in proliferation, inflammation, and cell-cell adhesion, and down-regulation of the serine threonine kinase signaling pathway; and differential co-expression of pathways regulating cell division, protein trafficking, and kinase activities.

Conclusions

Most of the differentially expressed pathways are known as hallmarks of cancer and are likely to explain why sessile serrated polyps are more prone to neoplastic transformation than hyperplastic polyps. The new molecular classifier includes 13 genes and may facilitate objective differentiation between the two polyp types.

4.
High‐throughput microarray experiments often generate far more biological information than is required to test the experimental hypotheses. Many microarray analyses are considered finished after differential expression, and additional analyses are typically not performed, leaving biological information untapped. This is especially true if the microarray experiment is from an ecological study of multiple populations. Comparisons across populations may also contain important genomic polymorphisms, and a subset of these polymorphisms may be identified with microarrays using techniques for the detection of single feature polymorphisms (SFP). SFPs are differences in microarray probe level intensities caused by genetic polymorphisms such as single‐nucleotide polymorphisms and small insertions/deletions and not expression differences. In this study, we provide a new algorithm for the detection of SFPs, evaluate the algorithm using existing data from two publicly available Affymetrix Barley (Hordeum vulgare) microarray data sets, and compare it with two previously published SFP detection algorithms. Results show that our algorithm provides more consistent and sensitive calling of SFPs with a lower false discovery rate. Simultaneous analysis of SFPs and differential expression is a low‐cost method for the enhanced analysis of microarray data, enabling additional biological inferences to be made.

5.
ABSTRACT: BACKGROUND: In the field of mouse genetics, the advent of technologies like microarray-based expression profiling dramatically increased data availability and sensitivity, yet these advanced methods are often vulnerable to the unavoidable heterogeneity of in vivo material and might therefore reflect differentially expressed genes between mouse strains of no relevance to a targeted experiment. The aim of this study was not to elaborate on the usefulness of microarray analysis in general, but to expand our knowledge regarding this potential "background noise" for the widely used Illumina microarray platform, surpassing existing data, which focused primarily on the adult sensory and nervous system, by analyzing patterns of gene expression at different embryonic stages using wild type strains and modern transgenic models of often non-isogenic backgrounds. RESULTS: Wild type embryos of 11 mouse strains commonly used in transgenic and molecular genetic studies at three developmental time points were subjected to Illumina microarray expression profiling in a strain-by-strain comparison. Our data robustly reflect known gene expression patterns during mid-gestation development. Decreasing diversity of the input tissue and/or increasing strain diversity raised the sensitivity of the array towards the genetic background. Consistent strain sensitivity of some probes was attributed to genetic polymorphisms or probe-design-related artifacts. CONCLUSION: Our study provides an extensive reference list of gene expression profiling background noise of value to anyone in the field of developmental biology and transgenic research performing microarray expression profiling with the widely used Illumina microarray platform. Probes identified as strain-specific background noise further allow for microarray expression profiling on its own to be a valuable tool for establishing genealogies of mouse inbred strains.

6.
limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

7.
8.
The Illumina HumanMethylation450 BeadChip is increasingly utilized in epigenome-wide association studies; however, this array-based measurement of DNA methylation is subject to measurement variation. Appropriate data preprocessing to remove background noise is important for detecting the small changes that may be associated with disease. We developed a novel background correction method, ENmix, that uses a mixture of exponential and truncated normal distributions to flexibly model signal intensity and uses a truncated normal distribution to model background noise. Depending on data availability, we employ three approaches to estimate background normal distribution parameters using (i) internal chip negative controls, (ii) out-of-band Infinium I probe intensities or (iii) combined methylated and unmethylated intensities. We evaluate ENmix against other available methods for both reproducibility among duplicate samples and accuracy of methylation measurement among laboratory control samples. ENmix outperformed other background correction methods on both of these measures and substantially reduced the probe-design type bias between Infinium I and II probes. In reanalysis of existing EWAS data, we show that ENmix can identify additional CpGs and results in smaller P-value estimates for previously validated CpGs. We incorporated the method into the R package ENmix, which is freely available from the Bioconductor website.

9.
The analysis of two-colour cDNA microarray data usually involves subtracting background values from foreground values prior to normalization and further analysis. This approach has the advantage of reducing bias and the disadvantage of inflating the variance of low-abundance spots. Whenever background subtraction is considered, it implicitly assumes locally constant background values. In practice, this assumption is often not met, which casts doubt on the usefulness of simple background subtraction. In order to improve background correction, we propose local background smoothing within the pre-processing pipeline of cDNA microarray data prior to background correction. For this purpose, we employ a geostatistical framework with ordinary kriging using both isotropic and anisotropic models of spatial correlation and 2-D locally weighted regression. We show that application of local background smoothing prior to background correction is beneficial in comparison to using raw background estimates. This is done using data of a self-versus-self experiment in Arabidopsis where subsets of differentially expressed genes were simulated. Using locally smoothed background values in conjunction with existing background correction methods increases the power, increases the accuracy and decreases the number of false positive results.
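The simple background subtraction this abstract starts from can be sketched per spot as follows. This is an illustration only: the function name, the argument names, and the example intensities are hypothetical, and the kriging and locally weighted regression smoothing proposed in the abstract are not implemented here.

```python
import math


def log_ratio(red_fg, red_bg, green_fg, green_bg, subtract_background=True):
    """Log2 red/green ratio for one spot, optionally background-subtracted.

    Returns NaN when a corrected intensity is not positive, since the
    log-ratio is undefined in that case.
    """
    red = red_fg - red_bg if subtract_background else red_fg
    green = green_fg - green_bg if subtract_background else green_fg
    if red <= 0 or green <= 0:
        return math.nan
    return math.log2(red / green)


# Hypothetical low-intensity spot: true signals 20 and 10 on a background of 100.
with_subtraction = log_ratio(120, 100, 110, 100)          # ratio 20 / 10
without_subtraction = log_ratio(120, 100, 110, 100, False)  # ratio 120 / 110
```

In this toy spot, subtraction recovers the two-fold ratio while the uncorrected ratio is pulled toward zero, which is the bias reduction the abstract mentions. The cost is that a small error in either background estimate changes the corrected ratio substantially, or makes it undefined, which is the variance inflation at low-abundance spots.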

10.
Andersson T  Unneberg P  Nilsson P  Odeberg J  Quackenbush J  Lundeberg J 《BioTechniques》2002,32(6):1348-50, 1352, 1354-6, 1358
Various approaches to the study of differential gene expression are applied to compare cell lines and tissue samples in a wide range of biological contexts. The compromise between focusing on only the important genes in certain cellular processes and achieving a complete picture is critical for the selection of strategy. We demonstrate how global microarray technology can be used for the exploration of the differentially expressed genes extracted through representational difference analysis (RDA). The subtraction of ubiquitous gene fragments from the two samples was demonstrated using cDNA microarrays including more than 32 000 spotted, PCR-amplified human clones. Hybridizations indicated the expression of 9100 of the microarray elements in a macrophage/foam cell atherosclerosis model system, of which many were removed during the RDA process. The stepwise subtraction procedure was demonstrated to yield an efficient enrichment of gene fragments overrepresented in either sample (18% in the representations, 86% after the first subtraction, and 88% after the second subtraction), many of which were impossible to detect in the starting material. Interestingly, the method allowed for the observation of the differential expression of several members of the low-abundance nuclear receptor gene family. We also observed a certain background level in the difference products of nondifferentially expressed gene fragments, warranting a verification strategy for selected candidate genes. The differential expression of several genes was verified by real-time PCR.

11.
Statistical tests for differential expression in cDNA microarray experiments   Total citations: 13 (self-citations: 0; citations by others: 13)
Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.
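As a minimal sketch of the two-condition case, the statistic below compares replicate log-expression values of a single gene between two conditions. The abstract does not say which variant of the t test is meant; the unequal-variance (Welch) form is used here as an assumption, and the function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Two-sample t statistic (unequal-variance form) for one gene."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each condition needs at least two replicates")
    # Standard error of the difference in means, from the sample variances.
    std_err = math.sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    if std_err == 0:
        raise ValueError("zero variance in both groups; statistic undefined")
    return (mean(group_a) - mean(group_b)) / std_err


# Hypothetical log2 expression values for one gene, three replicates each.
t_stat = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

In a microarray analysis this statistic would be computed once per gene, and the resulting p-values would then need a multiple-testing adjustment, which is outside this sketch.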

12.
Lyu Yafei  Li Qunhua 《BMC bioinformatics》2016,17(1):51-60
Determining differentially expressed genes (DEGs) between biological samples is key to understanding how genotype gives rise to phenotype. RNA-seq and microarray are two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the two technologies. Integrating data across these two platforms has the potential to improve the power and reliability of DEG detection. We propose a rank-based semi-parametric model to determine DEGs using information across different sources and apply it to the integration of RNA-seq and microarray data. By incorporating both the significance of differential expression and the consistency across platforms, our method effectively detects DEGs with moderate but consistent signals. We demonstrate the effectiveness of our method using simulation studies, MAQC/SEQC data and a synthetic microRNA dataset. Our integration method is not only robust to noise and heterogeneity in the data, but also adaptive to the structure of data. In our simulations and real data studies, our approach shows a higher discriminative power and identifies more biologically relevant DEGs than eBayes, DESeq and some commonly used meta-analysis methods.

13.
The level of differential gene expression may be defined as a fold change, a frequency of upregulation, or some other measure of the degree or extent of a difference in expression across groups of interest. On the basis of expression data for hundreds or thousands of genes, inferring which genes are differentially expressed or ranking genes in order of priority introduces a bias in estimates of their differential expression levels. A previous correction of this feature selection bias suffers from a lack of generality in the method of ranking genes, from requiring many biological replicates, and from unnecessarily overcompensating for the bias. For any method of ranking genes on the basis of gene expression measured for as few as three biological replicates, a simple leave-one-out algorithm corrects, with less overcompensation, the bias in estimates of the level of differential gene expression. In a microarray data set, the bias correction reduces estimates of the probability of upregulation or downregulation from 100% to as low as 60%, even for genes with estimated local false discovery rates close to 0. A simulation study quantifies both the advantage of smoothing estimates of bias before correction and the degree of overcompensation.

14.
Aberrant DNA methylation is known to occur in cancer, including hematological malignancies such as acute myeloid leukemia (AML). However, less is known about whether specific methylation profiles characterize specific subcategories of AML. We examined this issue by using comprehensive high-throughput array-based relative methylation analysis (CHARM) to compare methylation profiles among patients in different AML cytogenetic risk groups. We found distinct profiles in each group, with the high-risk group showing overall increased methylation compared with low- and mid-risk groups. The differentially methylated regions (DMRs) distinguishing cytogenetic risk groups of AML were enriched in the CpG island shores. Specific risk-group associated DMRs were located near genes previously known to play a role in AML or other malignancies, such as MN1, UHRF1, HOXB3, and HOXB4, as well as TRIM71, the function of which in cancer is not well characterized. These findings were verified by quantitative bisulfite pyrosequencing and by comparison with results available at the TCGA cancer genome browser. To explore the potential biological significance of the observed methylation changes, we correlated our findings with gene expression data available through the TCGA database. The results showed that decreased methylation at HOXB3 and HOXB4 was associated with increased gene expression of both HOXB genes specific to the mid-risk AML, while increased DNA methylation at DCC distinctive to the high-risk AML was associated with increased gene expression. Our results suggest that the differential impact of cytogenetic changes on AML prognosis may, in part, be mediated by changes in methylation.

15.
Broberg P 《Genome biology》2002,3(9):preprint00-23

Background  

In the pharmaceutical industry and in academia substantial efforts are made to make the best use of the promising microarray technology. The data generated by microarrays are more complex than most other biological data attracting much attention at this point. A method for finding an optimal test statistic with which to rank genes with respect to differential expression is outlined and tested. At the heart of the method lies an estimate of the false negative and false positive rates. Both investing in false positives and missing true positives lead to a waste of resources. The procedure sets out to minimise these errors. For calculation of the false positive and negative rates a simulation procedure is invoked.

16.
Differential analysis of DNA microarray gene expression data   Total citations: 6 (self-citations: 0; citations by others: 6)
Here, we briefly review the sources of experimental and biological variance that affect the interpretation of high-dimensional DNA microarray experiments. We discuss methods using a regularized t-test based on a Bayesian statistical framework that allow the identification of differentially regulated genes with a higher level of confidence than a simple t-test when only a few experimental replicates are available. We also describe a computational method for calculating the global false-positive and false-negative levels inherent in a DNA microarray data set. This method provides a probability of differential expression for each gene based on experiment-wide false-positive and -negative levels driven by experimental error and biological variance.
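The idea behind a regularized t-test can be sketched as follows: with few replicates the per-gene variance estimate is unstable, so it is pulled toward a prior variance before the t statistic is formed. The weighted-average shrinkage below is a generic form chosen for illustration; the exact estimator used in the reviewed methods may differ, and the names `prior_var` and `prior_weight` are hypothetical.

```python
import math
from statistics import mean, variance


def shrunken_variance(sample_var, n, prior_var, prior_weight):
    """Weighted average of a prior variance and the per-gene sample variance.

    prior_weight acts like a number of pseudo-observations; 0 means no shrinkage.
    """
    return (prior_weight * prior_var + (n - 1) * sample_var) / (prior_weight + n - 1)


def regularized_t(group_a, group_b, prior_var, prior_weight):
    """t-like statistic using shrunken variances (illustrative sketch)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = shrunken_variance(variance(group_a), n_a, prior_var, prior_weight)
    var_b = shrunken_variance(variance(group_b), n_b, prior_var, prior_weight)
    std_err = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean(group_a) - mean(group_b)) / std_err


# Hypothetical gene with three replicates per condition; the prior variance
# would typically come from genes of similar expression level.
t_reg = regularized_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], prior_var=4.0, prior_weight=2.0)
```

With `prior_weight` set to 0 the statistic reduces to an ordinary unequal-variance t statistic. Larger weights make genes with accidentally small sample variances less likely to rank at the top.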

17.
18.

Background

Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study.

Results

Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this “gold-standard” comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues.

Conclusions

Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-649) contains supplementary material, which is available to authorized users.

19.
20.