

DNA methylation has been widely associated with complex diseases. Among the platforms for profiling DNA methylation in humans, the Illumina Infinium HumanMethylation450 BeadChip (450K) has been accepted as one of the most efficient technologies. However, analysis of the DNA methylation data generated by this technology remains challenging because of widespread biases.


Here we propose a generalized framework for evaluating data analysis methods for the Illumina 450K array. The framework considers the following steps towards a successful analysis: importing data, quality control, within-array normalization, correcting type bias, detecting differentially methylated probes or regions, and biological interpretation.


We evaluated five methods using three real datasets and identified the best-performing methods for Illumina 450K array data analysis. Minfi and methylumi are the optimal choices when analyzing a small dataset. BMIQ and RCP are well suited to correcting type bias, and their normalized results can be used to discover differentially methylated probes (DMPs). The R package missMethyl is suitable for GO term enrichment analysis and biological interpretation.

The proper identification of differentially methylated CpGs is central in most epigenetic studies. The Illumina HumanMethylation450 BeadChip is widely used to quantify DNA methylation; nevertheless, the design of an appropriate analysis pipeline faces severe challenges due to the convolution of biological and technical variability and the presence of a signal bias between Infinium I and II probe design types. Despite recent attempts to investigate how to analyze DNA methylation data with such an array design, it has not been possible to perform a comprehensive comparison between different bioinformatics pipelines due to the lack of appropriate data sets having both large sample size and sufficient number of technical replicates. Here we perform such a comparative analysis, targeting the problems of reducing the technical variability, eliminating the probe design bias and reducing the batch effect by exploiting two unpublished data sets, which included technical replicates and were profiled for DNA methylation either on peripheral blood, monocytes or muscle biopsies. We evaluated the performance of different analysis pipelines and demonstrated that: (1) it is critical to correct for the probe design type, since the amplitude of the measured methylation change depends on the underlying chemistry; (2) the effect of different normalization schemes is mixed, and the most effective methods in our hands were quantile normalization and Beta Mixture Quantile dilation (BMIQ); (3) it is beneficial to correct for batch effects. In conclusion, our comparative analysis using a comprehensive data set suggests an efficient pipeline for proper identification of differentially methylated CpGs using the Illumina 450K arrays.
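
Quantile normalization, one of the two schemes this abstract reports as most effective, can be illustrated with a minimal Python sketch. The function name and the toy input are assumptions made for illustration only; this is the textbook form of the technique, not the authors' pipeline, and it ignores probe design types and breaks ties by position.

```python
def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    samples: list of equal-length lists, one list per array (sample).
    Ties are broken by original position, which is a simplification.
    """
    n_probes = len(samples[0])
    # Sort each sample, then average across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(col[r] for col in sorted_samples) / len(samples)
        for r in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Probe indices ordered by their value within this sample.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            # Replace each value by the cross-sample mean at its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After the call, each sample contains the same set of values (the rank means), arranged according to that sample's original ordering of probes.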

DNA methylation plays a fundamental role in the regulation of the genome, but the optimal strategy for analysis of genome-wide DNA methylation data remains to be determined. We developed a comprehensive analysis pipeline for epigenome-wide association studies (EWAS) using the Illumina Infinium HumanMethylation450 BeadChip, based on 2,687 individuals, with 36 samples measured in duplicate. We propose new approaches to quality control, data normalisation and batch correction through control-probe adjustment and establish a null hypothesis for EWAS using permutation testing. Our analysis pipeline outperforms existing approaches, enabling accurate identification of methylation quantitative trait loci for hypothesis driven follow-up experiments.
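
The idea of using permutation testing to establish a null distribution can be sketched for a single CpG site and a two-group comparison. This is a simplified illustration under stated assumptions: the function name, the mean-difference statistic, and the default number of permutations are hypothetical choices, not the procedure described in the paper.

```python
import random


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    values: methylation values at one CpG, one per individual.
    labels: 0/1 group membership, same length as values.
    """
    def mean_diff(lab):
        g1 = [v for v, l in zip(values, lab) if l == 1]
        g0 = [v for v, l in zip(values, lab) if l == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels breaks any real association with the values.
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if mean_diff(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

In a genome-wide setting the same label shuffles would be applied to all sites at once, so that the distribution of the smallest p-value per permutation can inform a study-wide significance threshold.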

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0600-x) contains supplementary material, which is available to authorized users.

DNA methylation is the most widely studied epigenetic mark and is known to be essential to normal development and frequently disrupted in disease. The Illumina HumanMethylation450 BeadChip assays the methylation status of CpGs at 485,577 sites across the genome. Here we present Subset-quantile Within Array Normalization (SWAN), a new method that substantially improves the results from this platform by reducing technical variation within and between arrays. SWAN is available in the minfi Bioconductor package.

Due to their relatively low cost per sample and broad, gene-centric coverage of CpGs across the human genome, Illumina's 450k arrays are widely used in large-scale differential methylation studies. However, by their very nature, large studies are particularly susceptible to the effects of unwanted variation. The effects of unwanted variation have been extensively documented in gene expression array studies and numerous methods have been developed to mitigate these effects. However, there has been much less research focused on the appropriate methodology for accounting for unwanted variation in methylation array studies. Here we present a novel 2-stage approach using RUV-inverse in a differential methylation analysis of 450k data and show that it outperforms existing methods.

DNA methylation, an important type of epigenetic modification in humans, participates in crucial cellular processes, such as embryonic development, X-inactivation, genomic imprinting and chromosome stability. Several platforms have been developed to study genome-wide DNA methylation. Many investigators in the field have chosen the Illumina Infinium HumanMethylation microarray for its ability to reliably assess DNA methylation following sodium bisulfite conversion. Here, we analyzed methylation profiles of 489 adult males and 357 adult females generated by the Infinium HumanMethylation450 microarray. Among the autosomal CpG sites that displayed significant methylation differences between the two sexes, we observed a significant enrichment of cross-reactive probes co-hybridizing to the sex chromosomes with more than 94% sequence identity. This could lead investigators to mistakenly infer the existence of significant autosomal sex-associated methylation. Using sequence identity cutoffs derived from the sex methylation analysis, we concluded that 6% of the array probes can potentially generate spurious signals because of co-hybridization to alternate genomic sequences highly homologous to the intended targets. Additionally, we discovered probes targeting polymorphic CpGs that overlapped SNPs. The methylation levels detected by these probes simply reflect the underlying genetic polymorphisms but could be misinterpreted as true signals. The existence of probes that are cross-reactive or that target polymorphic CpGs in the Illumina HumanMethylation microarrays can confound data obtained from such microarrays. Therefore, investigators should exercise caution when significant biological associations are found using these array platforms. All cross-reactive probes and polymorphic CpGs that we identified are listed in this paper.
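
The practical consequence of this abstract is a probe-filtering step before association testing. The sketch below shows one way such a filter might look. The `Probe` record, its field names, and the probe identifiers are hypothetical, and the 0.94 default only echoes the sequence-identity figure quoted above; it is not presented as the paper's final cutoff.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    probe_id: str                  # hypothetical identifier
    max_offtarget_identity: float  # hypothetical field, fraction in 0..1
    overlaps_snp: bool             # hypothetical field


def filter_probes(probes, identity_cutoff=0.94):
    """Keep probes that are neither cross-reactive nor on a polymorphic CpG."""
    return [
        p for p in probes
        if p.max_offtarget_identity <= identity_cutoff and not p.overlaps_snp
    ]
```

In practice the annotation would come from a published probe list rather than being computed on the fly, so the filter reduces to a lookup against that list.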

Analysis of epigenetic mechanisms, particularly DNA methylation, is of increasing interest for epidemiologic studies examining disease etiology and impacts of environmental exposures. The Infinium HumanMethylation450 BeadChip® (450K), which interrogates over 480,000 CpG sites and is relatively cost effective, has become a popular tool to characterize the DNA methylome. For large-scale studies, minimizing technical variability and potential bias is paramount. The goal of this paper was to evaluate the performance of several existing and novel color channel normalizations designed to reduce technical variability and batch effects in 450K analysis from a large population study. Comparative assessment of 10 normalization procedures included the GenomeStudio® Illumina procedure, the lumi smooth quantile approach, and the newly proposed All Sample Mean Normalization (ASMN). We also examined the performance of normalizations in combination with correction for the two types of Infinium chemistry utilized on the 450K array. We observed that the performance of the GenomeStudio® normalization procedure was highly variable and dependent on the quality of the first sample analyzed in an experiment, which is used as a reference in this procedure. While the lumi normalization was able to decrease batch variability, it increased variation among technical replicates, potentially reducing biologically meaningful findings. The proposed ASMN procedure performed consistently well, both at reducing batch effects and improving replicate comparability. In summary, the ASMN procedure can improve existing color channel normalization, especially for large epidemiologic studies, and can be successfully implemented to enhance a 450K DNA methylation data pipeline.

To address the limitations in current classic twin/family research on the genetic and/or environmental causes of human methylomic variation, we measured blood DNA methylation for 479 women (mean age 56 years) including 66 monozygotic (MZ), 66 dizygotic (DZ) twin pairs and 215 sisters of twins, and 11 random technical duplicates using the HumanMethylation450 array. For each methylation site, we estimated the correlation for pairs of duplicates, MZ twins, DZ twins, and siblings, fitted variance component models by assuming the variation is explained by genetic factors, by shared and individual environmental factors, and by independent measurement error, and assessed the best fitting model. We found that the average (standard deviation) correlations for duplicate, MZ, DZ, and sibling pairs were 0.10 (0.35), 0.07 (0.21), -0.01 (0.14) and -0.04 (0.07). At the genome-wide significance level of 10⁻⁷, 93.3% of sites had no familial correlation, and 5.6%, 0.1%, and 0.2% of sites were correlated for MZ, DZ, and sibling pairs. For 86.4%, 6.9%, and 7.1% of sites, the best fitting model included measurement error only, a genetic component, and at least one environmental component. For the 13.6% of sites influenced by genetic and/or environmental factors, the average proportion of variance explained by environmental factors was greater than that explained by genetic factors (0.41 vs. 0.37, P value <10⁻¹⁵). Our results are consistent with blood methylomic variation in middle-aged women, as measured by the HumanMethylation450 array, being largely explained by measurement error and more influenced by environmental factors than by genetic factors.



The original spotted array technology, with competitive hybridization of two experimental samples and measurement of relative expression levels, is increasingly displaced by more accurate platforms that allow determining absolute expression values for a single sample (for example, Affymetrix GeneChip and Illumina BeadChip). Unfortunately, cross-platform comparisons show a disappointingly low concordance between the lists of regulated genes obtained from the latter two platforms.

A method has been developed for reducing the intrinsic autofluorescence background component in cells labeled with fluorescent antibodies, thus permitting low levels of antibody-binding on highly autofluorescent cells to be quantified. The method is based on the broad autofluorescent excitation spectra compared to the well-defined spectra of the fluorescent label. Two laser wavelengths were used, one optimally to excite the fluorescent label plus autofluorescence and the second to excite only the autofluorescence. Two fluorescence measurements were made in the same wavelength region and the signals were subtracted on a cell-by-cell basis using a difference amplifier to zero the autofluorescence and amplify the signal from the fluorescent label. Test results on unlabeled autofluorescent macrophages showed that the autofluorescence component was reduced by balancing the signal inputs to the difference amplifier. When labeled macrophages were analyzed, the autofluorescence was reduced and the fluorescent-labeled antibody-binding component was amplified. The method was also able to resolve labeled lymphocytes from unlabeled autofluorescent macrophages.
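
The cell-by-cell subtraction described above was done in hardware with a difference amplifier, but the arithmetic can be mimicked in software. The sketch below is an assumption-laden analogue: the function names are hypothetical, and estimating the balancing gain as a ratio of summed signals from unlabeled cells is one plausible reading of "balancing the signal inputs", not a documented detail of the method.

```python
def balance_gain(unlabeled_cells):
    """Estimate the gain that zeroes autofluorescence on unlabeled cells.

    unlabeled_cells: list of (signal_laser1, signal_laser2) pairs, where
    laser 1 excites label plus autofluorescence and laser 2 excites
    autofluorescence only.
    """
    total_1 = sum(s1 for s1, _ in unlabeled_cells)
    total_2 = sum(s2 for _, s2 in unlabeled_cells)
    return total_1 / total_2


def subtract_autofluorescence(cells, gain):
    """Per-cell difference signal: label channel minus scaled autofluorescence."""
    return [s1 - gain * s2 for s1, s2 in cells]
```

With the gain balanced on unlabeled cells, any remaining positive difference on a labeled cell is attributed to the fluorescent label.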



Paclitaxel is a microtubule-stabilizing drug that has been commonly used in treating cancer. Due to genetic heterogeneity within patient populations, therapeutic response rates often vary. Here we used the NCI60 panel to identify SNPs associated with paclitaxel sensitivity. Using the panel's GI50 response data available from the Developmental Therapeutics Program, cell lines were categorized as either sensitive or resistant. PLINK software was used to perform a genome-wide association analysis of the cellular response to paclitaxel with the panel's SNP-genotype data on the Affymetrix 125 k SNP array. FastSNP software helped predict each SNP's potential impact on its gene product. mRNA expression differences between sensitive and resistant cell lines were examined using data from BioGPS. Using Haploview software, we searched for haplotypes that were more strongly associated with the cellular response to paclitaxel. Ingenuity Pathway Analysis software helped us understand how our identified genes may alter the cellular response to paclitaxel.


43 SNPs were found significantly associated (FDR < 0.005) with paclitaxel response, with 10 belonging to protein-coding genes (CFTR, ROBO1, PTPRD, BTBD12, DCT, SNTG1, SGCD, LPHN2, GRIK1, ZNF607). SNPs in GRIK1, DCT, SGCD and CFTR were predicted to be intronic enhancers, altering gene expression, while SNPs in ZNF607 and BTBD12 cause conservative missense mutations. mRNA expression analysis supported these findings as GRIK1, DCT, SNTG1, SGCD and CFTR showed significantly (p < 0.05) increased expression among sensitive cell lines. Haplotypes found in GRIK1, SGCD, ROBO1, LPHN2, and PTPRD were more strongly associated with response than their individual SNPs.


Our study has taken advantage of available genotypic data and its integration with drug response data obtained from the NCI60 panel. We identified 10 SNPs located within protein-coding genes that were not previously shown to be associated with paclitaxel response. As only five genes showed differential mRNA expression, the remainder would not have been detected solely based on expression data. The identified haplotypes highlight the role of utilizing SNP combinations within genomic loci of interest to improve the risk determination associated with drug response. These genetic variants represent promising biomarkers for predicting paclitaxel response and may play a significant role in the cellular response to paclitaxel.

lumi: a pipeline for processing Illumina microarray
The Illumina microarray is becoming a popular microarray platform. The BeadArray technology from Illumina makes its preprocessing and quality control different from other microarray technologies. Unfortunately, most other analyses have not taken advantage of the unique properties of the BeadArray system, and have just incorporated preprocessing methods originally designed for Affymetrix microarrays. lumi is a Bioconductor package especially designed to process the Illumina microarray data. It includes data input, quality control, variance stabilization, normalization and gene annotation portions. Specifically, the lumi package includes a variance-stabilizing transformation (VST) algorithm that takes advantage of the technical replicates available on every Illumina microarray. Different normalization method options and multiple quality control plots are provided in the package. To better annotate the Illumina data, a vendor-independent nucleotide universal identifier (nuID) was devised to identify the probes of the Illumina microarray. The nuID annotation packages and output of lumi processed results can be easily integrated with other Bioconductor packages to construct a statistical data analysis pipeline for Illumina data. Availability: The lumi Bioconductor package, www.bioconductor.org

The Infinium Human Methylation450 BeadChip Array™ (Infinium 450K) is an important tool for studying epigenetic patterns associated with disease. This array offers a high-throughput, low cost alternative to more comprehensive sequencing-based methodologies. Here we compare data generated by interrogation of the same seven clinical samples by Infinium 450K and reduced representation bisulfite sequencing (RRBS). This is the largest data set comparing Infinium 450K array to the comprehensive RRBS methodology reported so far. We show good agreement between the two methodologies. A read depth of four or more reads in the RRBS data was sufficient to achieve good agreement with Infinium 450K. However, we observe that intermediate methylation values (20–80%) are more variable between technologies than values at the extremes of the bimodal methylation distribution. We describe careful processing of Infinium 450K data to correct for known limitations and batch effects. Using methodologies proposed by others and newly implemented and combined in this report, agreement of Infinium 450K data with independent techniques can be vastly improved.
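
The read-depth criterion mentioned above can be made concrete with a small Python sketch that compares array methylation values with RRBS read counts at shared sites. The function names, the dictionary layout, and the mean absolute difference as the agreement measure are illustrative assumptions; only the minimum depth of four reads comes from the abstract.

```python
def rrbs_methylation(methylated_reads, total_reads, min_depth=4):
    """Methylated fraction at one CpG, or None if coverage is too low."""
    if total_reads < min_depth:
        return None
    return methylated_reads / total_reads


def mean_abs_difference(array_betas, rrbs_counts, min_depth=4):
    """Average |array value - RRBS fraction| over sites passing the depth filter.

    array_betas: {site: methylation value in 0..1 from the array}
    rrbs_counts: {site: (methylated_reads, total_reads)}
    """
    diffs = []
    for site, beta in array_betas.items():
        if site not in rrbs_counts:
            continue  # site not covered by RRBS at all
        methylated, total = rrbs_counts[site]
        fraction = rrbs_methylation(methylated, total, min_depth)
        if fraction is not None:
            diffs.append(abs(beta - fraction))
    return sum(diffs) / len(diffs) if diffs else None
```

Splitting the sites into extreme and intermediate methylation ranges before calling `mean_abs_difference` would be one way to look at the greater variability the abstract reports for intermediate values.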

MOTIVATION: High-throughput screening (HTS) is an important method in drug discovery in which the activities of a large number of candidate chemicals or genetic materials are rapidly evaluated. Data are usually obtained by measurements on samples in microwell plates and are often subject to artefacts that can bias the result selection. We report here a novel edge effect correction algorithm suitable for RNA interference (RNAi) screening, because its normalization does not rely on the entire dataset and takes into account the specificities of such a screening process. The proposed method is able to estimate the edge effects for each assay plate individually using the data from a single control column based on a diffusion model, thus targeting a specific but recurrent well-known HTS artefact. This method was first developed and validated using control plates and was then applied to the correction of experimental data generated during a genome-wide siRNA screen aimed at studying HIV-host interactions. The proposed algorithm was able to correct the edge effect biasing the control data and thus improve assay quality and, consequently, the hit-selection step.

Summary: In the event of weak autoradiographic labelling, the proportion of truly labelled cells or structures can be calculated from the frequency distributions of grains per area or cell structure for i = 0, 1, ..., n grains using the results obtained for an experimental group after the application of a radioactively labelled substance and those obtained for a control group without radioactivity. The principle of this computer-aided method is also applicable when the grain counts are related to varying areas in histological sections. Dedicated to Professor Dr. T.H. Schiebler on the occasion of his 65th birthday.

An improved procedure for background correction in autoradiography
H Korr, H Schmidt. Histochemistry, 1988, 88(3-6): 407-410
In the event of weak autoradiographic labelling, the proportion of truly labelled cells or structures can be calculated from the frequency distributions of grains per area or cell structure for i = 0, 1, ..., n grains using the results obtained for an experimental group after the application of a radioactively labelled substance and those obtained for a control group without radioactivity. The principle of this computer-aided method is also applicable when the grain counts are related to varying areas in histological sections.

Most microarray scanning software for glass spotted arrays provides estimates for the intensity for the "foreground" and "background" of two channels for every spot. The common approach in further analyzing such data is to first subtract the background from the foreground for each channel and to use the ratio of these two results as the estimate of the expression level. The resulting ratios are, after possible averaging over replicates, the usual inputs for further data analysis, such as clustering. If, with this background correction procedure, the foreground intensity was smaller than the background intensity for a channel, that spot (on that array) yields no usable data. In this paper it is argued that this preprocessing leads to estimates of the expression that have a much larger variance than needed when the expression levels are low.
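
The conventional preprocessing that this abstract criticizes can be written down in a few lines of Python. The function name and the choice of a base-2 logarithm are assumptions for illustration; the sketch shows the common subtract-then-ratio approach, not the alternative the paper argues for.

```python
import math


def background_corrected_log_ratio(fg1, bg1, fg2, bg2):
    """Log2 ratio of background-subtracted intensities for one spot.

    fg*/bg*: foreground and background intensity for channels 1 and 2.
    Returns None when either corrected intensity is not positive,
    in which case the spot yields no usable data.
    """
    signal_1 = fg1 - bg1
    signal_2 = fg2 - bg2
    if signal_1 <= 0 or signal_2 <= 0:
        return None  # ratio or logarithm would be undefined
    return math.log2(signal_1 / signal_2)
```

At low expression the corrected intensities sit close to zero, so small measurement noise in either channel produces large swings in the ratio, which is the variance problem the abstract points to.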
