
Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein–chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/.
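
The idea of summarizing a ChIP-seq signal by the spectrum of its autocorrelation can be illustrated with a minimal sketch. This is not the Arpeggio implementation: a plain FFT magnitude stands in for the harmonic deconvolution step, and the function names (`autocorrelation`, `spectral_profile`), the synthetic coverage track and the 200 bp period are all assumptions made for illustration.

```python
import numpy as np

def autocorrelation(signal, max_lag):
    """Normalized autocorrelation of a 1-D coverage signal for lags 0..max_lag-1."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = x.size
    # Zero-pad to twice the length so the FFT-based product is not circular.
    spectrum = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(spectrum * np.conj(spectrum))[:max_lag]
    return acf / acf[0]

def spectral_profile(acf):
    """Magnitude spectrum of the autocorrelation curve.

    Assumption: a plain FFT is used here in place of harmonic deconvolution.
    """
    return np.abs(np.fft.rfft(acf - acf.mean()))

# Synthetic coverage track (hypothetical data): a 200 bp periodic pattern plus noise.
rng = np.random.default_rng(0)
positions = np.arange(20_000)
coverage = 5 + 3 * np.sin(2 * np.pi * positions / 200) + rng.normal(0, 1, positions.size)

max_lag = 2_000
acf = autocorrelation(coverage, max_lag)
profile = spectral_profile(acf)

# Frequency bin k of a length-max_lag curve corresponds to a period of max_lag / k.
k = int(np.argmax(profile[1:])) + 1
dominant_period = max_lag / k
print(dominant_period)
```

Two data sets could then be compared through their `profile` vectors (for example by correlation) rather than through their raw coverage, which is the sense in which such a representation is compact and platform-independent.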

14.

ChIPnorm: A Statistical Method for Normalizing and Identifying Differential Regions in Histone Modification ChIP-seq Libraries

NU Nair AD Sahu P Bucher BM Moret 《PloS one》2012,7(8):e39573

The advent of high-throughput technologies such as ChIP-seq has made possible the study of histone modifications. A problem of particular interest is the identification of regions of the genome where different cell types from the same organism exhibit different patterns of histone enrichment. This problem turns out to be surprisingly difficult, even in simple pairwise comparisons, because of the significant level of noise in ChIP-seq data. In this paper we propose a two-stage statistical method, called ChIPnorm, to normalize ChIP-seq data and to find differential regions in the genome, given two libraries of histone modifications of different cell types. We show that the ChIPnorm method removes most of the noise and bias in the data and outperforms other normalization methods. We correlate the histone marks with gene expression data and confirm that the histone modifications H3K27me3 and H3K4me3 act as a repressor and an activator of genes, respectively. Compared to what was previously reported in the literature, we find that a substantially higher fraction of bivalent marks in ES cells for H3K27me3 and H3K4me3 move into a K27-only state. We find that most of the promoter regions in protein-coding genes have differential histone-modification sites. The software for this work can be downloaded from http://lcbb.epfl.ch/software.html.

15.

A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs

M Thomas-Chollier E Darbo C Herrmann M Defrance D Thieffry J van Helden 《Nature protocols》2012,7(8):1551-1568


16.

RTFAdb: A database of computationally predicted associations between retrotransposons and transcription factors in the human and mouse genomes

Gökhan Karakülah 《Genomics》2018,110(5):257-262


17.

LOcating Non-Unique matched Tags (LONUT) to Improve the Detection of the Enriched Regions for ChIP-seq Data

Rui Wang Hang-Kai Hsu Adam Blattler Yisong Wang Xun Lan Yao Wang Pei-Yin Hsu Yu-Wei Leu Tim H.-M. Huang Peggy J. Farnham Victor X. Jin 《PloS one》2013,8(6)

A major limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs) that match the human genome, even though NUTs comprise up to 60% of all raw tags in ChIP-seq data. Effectively utilizing these NUTs would increase the sequencing depth and allow a more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we have developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. Our LONUT algorithm applies a linear and polynomial regression model to establish an empirical score (ES) formula by considering two influential factors, the distance of NUTs to peaks identified using uniquely matched tags (UMTs) and the enrichment score for those peaks, resulting in each NUT being assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many different datasets representing three different characteristics of biological data types. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that our program not only improves the detection of binding sites for ChIP-seq, but also identifies additional binding sites.
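
The assignment step described above, in which each non-unique tag is placed at one of its candidate locations according to its distance to a nearby peak and that peak's enrichment, might be sketched as follows. This is an illustration only: the scoring function `toy_score` and its `scale` parameter are hypothetical placeholders, not the regression-derived ES formula from the paper, and the sketch assumes a single chromosome with a non-empty, sorted list of peak positions.

```python
from bisect import bisect_left

def nearest_peak(position, peak_positions):
    """Index of the peak closest to `position` (peak_positions sorted, non-empty)."""
    i = bisect_left(peak_positions, position)
    neighbours = [j for j in (i - 1, i) if 0 <= j < len(peak_positions)]
    return min(neighbours, key=lambda j: abs(peak_positions[j] - position))

def toy_score(distance, enrichment, scale=500.0):
    """Hypothetical score: higher for enriched peaks, lower for distant ones.

    Assumption: this simple form replaces the paper's empirical score formula.
    """
    return enrichment / (1.0 + distance / scale)

def assign_tag(candidate_positions, peak_positions, peak_enrichment):
    """Pick the candidate location of a non-unique tag with the highest score."""
    best_position, best_score = None, float("-inf")
    for position in candidate_positions:
        j = nearest_peak(position, peak_positions)
        score = toy_score(abs(peak_positions[j] - position), peak_enrichment[j])
        if score > best_score:
            best_position, best_score = position, score
    return best_position, best_score

# Hypothetical peaks called from uniquely matched tags, with enrichment scores.
peaks = [1000, 5000, 9000]
enrichment = [4.0, 12.0, 6.0]

# One non-unique tag that maps equally well to three locations.
nut_candidates = [1100, 5600, 8000]
chosen, score = assign_tag(nut_candidates, peaks, enrichment)
print(chosen, round(score, 3))
```

In this toy example the tag is placed at position 5600: that candidate is farther from its nearest peak than position 1100 is, but the peak's much higher enrichment outweighs the distance penalty.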

18.

Genome-Wide Signatures of Transcription Factor Activity: Connecting Transcription Factors, Disease, and Small Molecules

Jing Chen Zhen Hu Mukta Phatak John Reichard Johannes M. Freudenberg Siva Sivaganesan Mario Medvedovic 《PLoS computational biology》2013,9(9)


19.

hmChIP: a database and web server for exploring publicly available human and mouse ChIP-seq and ChIP-chip data

Chen L Wu G Ji H 《Bioinformatics (Oxford, England)》2011,27(10):1447-1448

hmChIP is a database of genome-wide chromatin immunoprecipitation (ChIP) data in human and mouse. Currently, the database contains 2016 samples from 492 ChIP-seq and ChIP-chip experiments, representing a total of 170 proteins and 11 069 914 protein-DNA interactions. A web server provides an interface for database queries. Protein-DNA binding intensities can be retrieved from individual samples for user-provided genomic regions. The retrieved intensities can be used to cluster samples and genomic regions to facilitate exploration of combinatorial patterns, cell-type dependencies, and cross-sample variability of protein-DNA interactions. AVAILABILITY: http://jilab.biostat.jhsph.edu/database/cgi-bin/hmChIP.pl.

20.

Modeling ChIP sequencing in silico with applications

Zhang ZD Rozowsky J Snyder M Chang J Gerstein M 《PLoS computational biology》2008,4(8):e1000158
