The advent of high-throughput technologies such as ChIP-seq has made possible the study of histone modifications. A problem of particular interest is the identification of regions of the genome where different cell types from the same organism exhibit different patterns of histone enrichment. This problem turns out to be surprisingly difficult, even in simple pairwise comparisons, because of the significant level of noise in ChIP-seq data. In this paper we propose a two-stage statistical method, called ChIPnorm, to normalize ChIP-seq data and to find differential regions in the genome, given two libraries of histone modifications from different cell types. We show that the ChIPnorm method removes most of the noise and bias in the data and outperforms other normalization methods. We correlate the histone marks with gene expression data and confirm that the histone modifications H3K27me3 and H3K4me3 act as a repressor and an activator of genes, respectively. Compared to what was previously reported in the literature, we find that a substantially higher fraction of bivalent marks in ES cells for H3K27me3 and H3K4me3 move into a K27-only state. We find that most of the promoter regions in protein-coding genes have differential histone-modification sites. The software for this work can be downloaded from http://lcbb.epfl.ch/software.html.

7.

Epigenetics meets next-generation sequencing

《Epigenetics》2013,8(6):318-321

Next-generation sequencing is poised to unleash dramatic changes in every area of molecular biology. In the past few years, chromatin immunoprecipitation (ChIP) on tiled microarrays (ChIP-chip) has been an important tool for genome-wide mapping of DNA-binding proteins or histone modifications. Now, ChIP followed by direct sequencing of DNA fragments (ChIP-seq) offers superior data with less noise and higher resolution and is likely to replace ChIP-chip in the near future. We will describe advantages of this new technology and outline some of the issues in dealing with the data. ChIP-seq generates considerably larger quantities of data and the most challenging aspect for investigators will be computational and statistical analysis necessary to uncover biological insights hidden in the data.

8.

A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information

Ma X Kulkarni A Zhang Z Xuan Z Serfling R Zhang MQ 《Nucleic acids research》2012,40(7):e50

9.

fdrMotif: identifying cis-elements by an EM algorithm coupled with false discovery rate control

Li L Bass RL Liang Y 《Bioinformatics (Oxford, England)》2008,24(5):629-636

MOTIVATION: Most de novo motif identification methods optimize the motif model first and then separately test the statistical significance of the motif score. In the first stage, a motif abundance parameter needs to be specified or modeled. In the second stage, a Z-score or P-value is used as the test statistic. Error rates under multiple comparisons are not fully considered. METHODOLOGY: We propose a simple but novel approach, fdrMotif, that selects as many binding sites as possible while controlling a user-specified false discovery rate (FDR). Unlike existing iterative methods, fdrMotif combines model optimization [e.g. position weight matrix (PWM)] and significance testing at each step. By monitoring the proportion of binding sites selected in many sets of background sequences, fdrMotif controls the FDR in the original data. The model is then updated using an expectation (E)- and maximization (M)-like procedure. We propose a new normalization procedure in the E-step for updating the model. This process is repeated until either the model converges or the number of iterations exceeds a maximum. RESULTS: Simulation studies suggest that our normalization procedure assigns larger weights to the binding sites than do two other commonly used normalization procedures. Furthermore, fdrMotif requires only a user-specified FDR and an initial PWM. When tested on 542 high confidence experimental p53 binding loci, fdrMotif identified 569 p53 binding sites in 505 (93.2%) sequences. In comparison, MEME identified more binding sites but in fewer ChIP sequences than fdrMotif. When tested on 500 sets of simulated 'ChIP' sequences with embedded known p53 binding sites, fdrMotif, compared to MEME, has higher sensitivity with similar positive predictive value. Furthermore, fdrMotif is robust to noise: it selected nearly identical binding sites in data adulterated with 50% added background sequences and the unadulterated data. We suggest that fdrMotif represents an improvement over MEME. AVAILABILITY: C code can be found at: http://www.niehs.nih.gov/research/resources/software/fdrMotif/.
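
The abstract's idea of controlling the FDR by counting selections in background sequence sets can be illustrated with a small sketch. This is not the fdrMotif algorithm or its C code: the function names, the plain score threshold, and the assumption that each background set has the same size as the real data are all hypothetical choices made for illustration.

```python
def empirical_fdr(real_scores, background_sets, threshold):
    """Estimate FDR at a score threshold (illustrative, not fdrMotif itself).

    Assumes each background set has the same size as the real data, so the
    mean number of background hits estimates the number of false positives.
    """
    if not background_sets:
        raise ValueError("at least one background set is required")
    selected = sum(1 for s in real_scores if s >= threshold)
    if selected == 0:
        return 0.0
    background_hits = [sum(1 for s in bg if s >= threshold) for bg in background_sets]
    mean_background = sum(background_hits) / len(background_sets)
    return min(1.0, mean_background / selected)


def lowest_threshold_for_fdr(real_scores, background_sets, target_fdr):
    """Return the lowest observed score that keeps the estimated FDR at or
    below target_fdr, which selects as many sites as possible; None if no
    threshold qualifies."""
    for threshold in sorted(set(real_scores)):
        if empirical_fdr(real_scores, background_sets, threshold) <= target_fdr:
            return threshold
    return None
```

Scanning thresholds from lowest to highest means the first one that meets the target keeps the largest number of candidate sites, which mirrors the stated goal of selecting as many binding sites as possible under a user-specified FDR.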

10.

A Powerful Statistical Approach for Large-Scale Differential Transcription Analysis

Yuan-De Tan Anita M. Chandler Arindam Chaudhury Joel R. Neilson 《PloS one》2015,10(4)

11.

Development of an Illumina-based ChIP-exonuclease method provides insight into FoxA1-DNA binding properties

Aurelien A Serandour Gordon D Brown Joshua D Cohen Jason S Carroll 《Genome biology》2013,14(12):R147

12.

Comparison of methods for identifying differentially expressed genes across multiple conditions from microarray data

Tan Y Liu Y 《Bioinformation》2011,7(8):400-404

Identification of genes differentially expressed across multiple conditions has become an important statistical problem in analyzing large-scale microarray data. Many statistical methods have been developed to address this challenging problem. An extensive comparison among these statistical methods is therefore extremely important for experimental scientists who need to choose a valid method for their data analysis. In this study, we conducted simulation studies to compare six statistical methods for identifying differentially expressed genes: the Bonferroni (B-) procedure, the Benjamini and Hochberg (BH-) procedure, the Local false discovery rate (Localfdr) method, the Optimal Discovery Procedure (ODP), the Ranking Analysis of F-statistics (RAF), and the Significance Analysis of Microarrays (SAM). We demonstrated that the strength of the treatment effect, the sample size, the proportion of differentially expressed genes and the variance of gene expression significantly affect the performance of the different methods. The simulation results show that ODP exhibits extremely high power in identifying differentially expressed genes, but significantly underestimates the False Discovery Rate (FDR) in all data scenarios. SAM performs poorly when the sample size is small, but is among the best-performing methods when the sample size is large. The B-procedure is stringent and thus has low power in all data scenarios. Localfdr and RAF show statistical behavior comparable to the BH-procedure, with favorable power and conservative FDR estimation. RAF performs best when the proportion of differentially expressed genes is small and the treatment effect is weak, but Localfdr is better than RAF when the proportion of differentially expressed genes is large.
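
To make the contrast between the stringent B-procedure and the BH-procedure concrete, the following minimal sketch implements the two textbook procedures on a list of p-values. The function names and default levels are illustrative assumptions and are not taken from the study's simulation code.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls family-wise error)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Step-up BH procedure: find the largest rank k with p_(k) <= k * q / m
    and reject the k smallest p-values (controls FDR at level q under
    independence)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

For the p-values `[0.02, 0.03, 0.035, 0.9]` at level 0.05, the Bonferroni cutoff is 0.0125 and nothing is rejected, whereas BH rejects the three smallest because the third-ranked value 0.035 falls below its threshold of 0.0375. This illustrates why the Bonferroni correction tends to have lower power.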

13.

The use of a synthetic DNA-antibody complex as external reference for chromatin immunoprecipitation

Eberle AB Böhm S Östlund Farrants AK Visa N 《Analytical biochemistry》2012,426(2):147-152

Chromatin immunoprecipitation (ChIP) is an analytical method used to investigate the interactions between proteins and DNA in vivo. ChIP is often used as a quantitative tool, and proper quantification relies on the use of adequate references for data normalization. However, many ChIP experiments involve analyses of samples that have been submitted to experimental treatments with unknown effects, and this precludes the choice of suitable internal references. We have developed a normalization method based on the use of a synthetic DNA-antibody complex that can be used as an external reference instead. A fixed amount of this synthetic DNA-antibody complex is spiked into the chromatin extract at the beginning of the ChIP experiment. The DNA-antibody complex is isolated together with the sample of interest, and the amounts of synthetic DNA recovered in each tube are measured at the end of the process. The yield of synthetic DNA recovery in each sample is then used to normalize the results obtained with the antibodies of interest. Using this approach, we could compensate for losses of material, reduce the variability between ChIP replicates, and increase the accuracy and statistical resolution of the data.
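
The arithmetic behind an external-reference normalization of this kind can be sketched as follows. The abstract does not give the exact formula, so dividing each signal by the fractional recovery of the spiked-in DNA is an assumption, and the function and parameter names are hypothetical.

```python
def normalize_by_spike_in(signal, spike_recovered, spike_input):
    """Correct a ChIP signal for material loss using a spiked-in reference.

    Assumption for illustration: the fraction of synthetic DNA recovered
    (spike_recovered / spike_input) reflects the loss in that tube, so the
    measured signal is divided by that yield.
    """
    if spike_input <= 0 or spike_recovered <= 0:
        raise ValueError("spike-in amounts must be positive")
    recovery_yield = spike_recovered / spike_input
    return signal / recovery_yield


# Two replicates of the same sample with different losses of material:
# replicate A recovers 50% of the spike-in, replicate B recovers 25%.
replicate_a = normalize_by_spike_in(signal=12.0, spike_recovered=0.5, spike_input=1.0)
replicate_b = normalize_by_spike_in(signal=6.0, spike_recovered=0.25, spike_input=1.0)
```

In this toy example the raw signals differ twofold, but both replicates normalize to the same value, which is the kind of reduction in replicate variability the abstract describes.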

14.

Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates: a Perspective on Experimental Design, Data Analysis, and Open Problems

Yingying Wei George Wu Hongkai Ji 《Statistics in biosciences》2013,5(1):156-178

15.

Role of ChIP-seq in the discovery of transcription factor binding sites, differential gene regulation mechanism, epigenetic marks and beyond (Total citations: 1; self-citations: 0; citations by others: 1)

Rasika Mundade Hatice Gulcin Ozer Han Wei Lakshmi Prabhu 《Cell cycle (Georgetown, Tex.)》2014,13(18):2847-2852

16.

A New Exhaustive Method and Strategy for Finding Motifs in ChIP-Enriched Regions

Caiyan Jia Matthew B. Carson Yang Wang Youfang Lin Hui Lu 《PloS one》2014,9(1)

17.

Identifying ChIP-seq enrichment using MACS

J Feng T Liu B Qin Y Zhang XS Liu 《Nature protocols》2012,7(9):1728-1740

18.

Statistical Detection of EEG Synchrony Using Empirical Bayesian Inference

Archana K. Singh Hideki Asoh Yuji Takeda Steven Phillips 《PloS one》2015,10(3)

There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, the high dimensionality of PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit the complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimensions. Previously, we showed that hierarchical FDR and optimal discovery procedures could be effectively applied to PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) to PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach to PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR, and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that locFDR can effectively control false positives without compromising the power of PLV synchrony inference. Applying locFDR to the experimental data yielded more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries.
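
For readers unfamiliar with the statistic being tested, the phase-locking value between two signals is commonly defined as the magnitude of the mean unit phasor of their phase differences. The sketch below computes that standard definition from precomputed instantaneous phases. The function name is an illustrative assumption, and extracting phases from raw EEG (for example by a Hilbert or wavelet transform) is outside its scope.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV = |mean(exp(i * (phase_a - phase_b)))| over samples or trials.

    Inputs are instantaneous phases in radians. The result lies in [0, 1]:
    1 means a perfectly constant phase difference, values near 0 mean the
    phase differences are spread evenly around the circle.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase sequences must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# A constant phase lag gives a PLV of (numerically) 1.
locked = phase_locking_value([0.1, 0.7, 1.3, 2.0], [0.0, 0.6, 1.2, 1.9])

# Phase differences of 0, pi/2, pi and 3*pi/2 cancel out, giving a PLV near 0.
unlocked = phase_locking_value([0.0, math.pi / 2, math.pi, 3 * math.pi / 2], [0.0] * 4)
```

Because one such value is computed per channel pair, frequency and time window, the number of simultaneous tests grows quickly, which is the multiple testing problem the abstract addresses.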

19.

Normalization, testing, and false discovery rate estimation for RNA-sequencing data

Li J Witten DM Johnstone IM Tibshirani R 《Biostatistics (Oxford, England)》2012,13(3):523-538

We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads. To overcome these difficulties, we use a log-linear model with a new approach to normalization. We derive a novel procedure to estimate the false discovery rate (FDR). Our method can be applied to data with quantitative, two-class, or multiple-class outcomes, and the computation is fast even for large data sets. We study the accuracy of our approaches for significance calculation and FDR estimation, and we demonstrate that our method has potential advantages over existing methods that are based on a Poisson or negative binomial model. In summary, this work provides a pipeline for the significance analysis of sequencing data.

20.

Widespread Misinterpretable ChIP-seq Bias in Yeast

Daechan Park Yaelim Lee Gurvani Bhupindersingh Vishwanath R. Iyer 《PloS one》2013,8(12)
