Similar Articles
20 similar articles found.
1.
Hi-C data provide population-averaged estimates of three-dimensional chromatin contacts across cell types and states in bulk samples. Effective analysis of Hi-C data requires controlling for a potential confounding factor: differences in cell type proportions across heterogeneous bulk samples. We propose a novel unsupervised deconvolution method for inferring cell type composition from bulk Hi-C data, the Two-step Hi-c UNsupervised DEconvolution appRoach (THUNDER). We conducted extensive simulations to test THUNDER by combining two published single-cell Hi-C (scHi-C) datasets. THUNDER estimates the underlying cell type proportions more accurately than reference-free methods (e.g., TOAST and NMF) and is more robust than reference-dependent methods (e.g., MuSiC). We further demonstrate the practical utility of THUNDER for estimating cell type proportions and identifying cell-type-specific interactions in Hi-C data from adult human cortex tissue samples. THUNDER will be a useful tool for adjusting for varying cell type composition in population samples, facilitating valid and more powerful downstream analyses such as differential chromatin organization studies. Additionally, the contact profiles estimated by THUNDER provide a useful exploratory framework for investigating the cell-type specificity of the chromatin interactome while experimental data remain scarce.
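As a point of reference for the deconvolution setting, the sketch below implements plain non-negative matrix factorization (NMF), one of the reference-free baselines named above, not THUNDER itself. It assumes a bulk matrix X (features such as binned contacts, by samples) that is approximately W·H, with k cell-type profiles in the columns of W and per-sample weights in H; the function name `nmf_deconvolve` is hypothetical. After fitting, each profile is rescaled to sum to 1 so that the columns of H can be read as proportions. NMF recovers components only up to permutation and is not guaranteed to be unique, which is part of why dedicated methods are developed.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf_deconvolve(x, k, iters=1000, seed=0, eps=1e-12):
    """Factor a nonnegative bulk matrix X (features x samples) as W @ H using
    Lee-Seung multiplicative updates for the squared Frobenius loss.
    Returns (W, H, P): W has k profile columns rescaled to sum to 1, H holds the
    matching per-sample weights, and P[j] is sample j's estimated proportions."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[r][c] * num[r][c] / (den[r][c] + eps) for c in range(m)] for r in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[r][c] * num[r][c] / (den[r][c] + eps) for c in range(k)] for r in range(n)]
    # Resolve the scale ambiguity: normalize each profile and move its total into H.
    scale = [sum(w[r][c] for r in range(n)) for c in range(k)]
    w = [[w[r][c] / scale[c] for c in range(k)] for r in range(n)]
    h = [[h[r][c] * scale[r] for c in range(m)] for r in range(k)]
    props = [[h[r][c] / sum(h[t][c] for t in range(k)) for r in range(k)] for c in range(m)]
    return w, h, props
```

Because each fitted profile sums to 1, a sample's column of H sums to that sample's total signal, so dividing by the column total gives weights that sum to 1 per sample.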

2.
Single-cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single-cell Hi-C (scHi-C) data suffer severely from sparsity, that is, an excess of zeros due to insufficient sequencing depth. Complicating matters further, not all zeros are created equal: some arise because loci truly do not interact owing to the underlying biological mechanism (structural zeros); others are due to insufficient sequencing depth (sampling zeros, or dropouts), especially for loci that interact infrequently. Differentiating between structural zeros and dropouts is important because correct inference improves downstream analyses such as clustering and the discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single-cell Hi-C literature, where sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes into account the spatial dependencies of the 2D scHi-C data structure while also borrowing information from similar single cells and from bulk data, when available. Through an extensive set of analyses of synthetic and real data, we demonstrate that HiCImpute identifies structural zeros with high sensitivity and accurately imputes dropout values. Downstream analyses using HiCImpute-improved data yielded much more accurate clustering of cell types than using the observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data led to the identification of subtypes within each of the L4 and L5 excitatory neuron types in the prefrontal cortex.
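To make the structural-zero versus dropout distinction concrete, the sketch below uses a much simpler model than HiCImpute's Bayesian hierarchy: a zero-inflated Poisson for a single locus pair, where with probability `pi` the pair never interacts (structural zero) and otherwise its count is Poisson(`lam`). Bayes' rule then gives P(structural | 0) = pi / (pi + (1 - pi)·e^(-lam)). The function name and parameter values are hypothetical; HiCImpute additionally borrows strength from neighboring loci, similar cells, and bulk data when estimating such quantities.

```python
import math


def prob_structural_zero(pi, lam):
    """P(structural zero | observed count 0) under a zero-inflated Poisson.

    pi  -- prior probability that the locus pair never interacts (structural zero)
    lam -- expected count if the pair does interact (Poisson mean)
    """
    if not 0.0 <= pi <= 1.0 or lam < 0.0:
        raise ValueError("require 0 <= pi <= 1 and lam >= 0")
    p_dropout = (1.0 - pi) * math.exp(-lam)  # interacting, but zero reads sampled
    return pi / (pi + p_dropout) if pi + p_dropout > 0.0 else 0.0
```

For `pi = 0.2`, a zero at a pair whose expected count is high (`lam = 5`) is very likely structural (about 0.97), whereas a zero at a rarely contacting pair (`lam = 0.1`) is more likely a dropout (about 0.22). This mirrors the abstract's point that dropouts concentrate among infrequently interacting loci.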

3.
Computational three-dimensional chromatin modeling has helped uncover principles of genome organization. Here, we discuss methods for modeling three-dimensional chromatin structures, focusing on a minimalistic polymer model that inverts population Hi-C data into single-cell conformations. Using only basic physical properties, this model reveals that a few specific Hi-C interactions can fold chromatin into conformations consistent with single-cell imaging, Dip-C, and FISH measurements. Aggregated single-cell chromatin conformations also reproduce Hi-C frequencies. This approach allows quantification of structural heterogeneity and discovery of many-body interaction units, and it has revealed additional insights, including (1) topologically associating domains arising as a byproduct of folding driven by specific interactions, (2) the developmental-stage dependence of cell subpopulations with different structural scaffolds, and (3) the functional landscape of many-body units within enhancer-rich regions. We also discuss these findings in relation to the genome structure–function relationship.
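The statement that aggregated single-cell conformations reproduce Hi-C frequencies can be checked with a simple calculation: call two beads in contact when their distance is below a cutoff, then average the resulting contact maps over the ensemble. The sketch below assumes each conformation is a list of (x, y, z) bead coordinates and uses a hypothetical distance cutoff; it covers only this aggregation step, not the polymer model that generates the conformations.

```python
import math


def contact_frequency(conformations, cutoff):
    """Fraction of conformations in which each pair of beads lies within `cutoff`.

    conformations -- list of conformations, each a list of (x, y, z) tuples
                     with the same number of beads
    Returns a symmetric n x n matrix; the diagonal is left at 0.
    """
    n = len(conformations[0])
    counts = [[0] * n for _ in range(n)]
    for coords in conformations:
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(coords[i], coords[j]) < cutoff:
                    counts[i][j] += 1
                    counts[j][i] += 1
    total = len(conformations)
    return [[c / total for c in row] for row in counts]
```

The resulting matrix is directly comparable (after any normalization) to a population Hi-C map binned at the same resolution as the beads.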

4.
5.
Biophysical Journal, 2020, 118(9): 2220-2228
The one-dimensional information of genomic DNA is hierarchically packed inside the eukaryotic cell nucleus and organized in three-dimensional (3D) space. Genome-wide chromosome conformation capture (Hi-C) methods have uncovered the 3D genome organization and revealed multiscale chromatin domains, namely compartments and topologically associating domains (TADs). Moreover, single-nucleosome live-cell imaging experiments have revealed the dynamic organization of chromatin domains caused by stochastic thermal fluctuations. However, the mechanism underlying the dynamic regulation of such hierarchical structural chromatin units within the microscale thermal medium remains unclear. Microrheology is a way to measure dynamic viscoelastic properties, coupling the thermal microenvironment to the mechanical response. Here, we propose a new (to our knowledge) microrheology approach for Hi-C data to analyze the dynamic compliance, a measure of the rigidity and flexibility of genomic regions, as it evolves over time. Our method converts a Hi-C matrix into a spectrum of the dynamic rheological property along the genomic coordinate of a single chromosome. To demonstrate the power of the technique, we analyzed Hi-C data during the neural differentiation of mouse embryonic stem cells. We found that TAD boundaries behave as more rigid nodes than intra-TAD regions. The spectrum clearly shows the dynamic viscoelasticity of chromatin domain formation at different timescales. Furthermore, we characterized the appearance of synchronous, liquid-like intercompartment interactions in differentiated cells. Together, our microrheology data derived from Hi-C data provide physical insights into the dynamics of 3D genome organization.

6.
Understanding and characterising biochemical processes inside single cells requires experimental platforms that allow one to perturb and observe the dynamics of such processes, as well as computational methods to build and parameterise models from the collected data. Recent progress with experimental platforms and optogenetics has made it possible to expose each cell in an experiment to an individualised input and automatically record cellular responses over days with fine time resolution. However, methods to infer parameters of stochastic kinetic models from single-cell longitudinal data have generally been developed under the assumption that experimental data are sparse and that responses of cells to at most a few different input perturbations can be observed. Here, we investigate and compare different approaches for calculating parameter likelihoods of single-cell longitudinal data based on approximations of the chemical master equation (CME), with a particular focus on coupling the linear noise approximation (LNA) or moment closure methods to a Kalman filter. We show that, as long as cells are measured sufficiently frequently, coupling the LNA to a Kalman filter allows one to accurately approximate likelihoods and to infer model parameters from data, even in cases where the LNA provides poor approximations of the CME. Furthermore, the computational cost of filtering-based iterative likelihood evaluation scales advantageously in the number of measurement times and different input perturbations, and is thus ideally suited for data obtained from modern experimental platforms. To demonstrate the practical usefulness of these results, we perform an experiment in which single cells, equipped with an optogenetic gene expression system, are exposed to a variety of light-input sequences and measured at several hundred time points, and we use parameter inference based on iterative likelihood evaluation to parameterise a stochastic model of the system.
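The filtering-based likelihood discussed above rests on the prediction-error decomposition: a Kalman filter propagates a Gaussian state estimate between measurements, and each new measurement contributes the log-density of its innovation. The sketch below assumes a scalar, time-invariant linear-Gaussian model x_t = a·x_{t-1} + b + N(0, q), y_t = x_t + N(0, r), with hypothetical parameters `a`, `b`, `q`, `r`, `m0`, `p0`. In the paper's setting the linear dynamics would instead come from the LNA or moment closure of the CME and would generally vary with the input, so this is only an illustration of the filtering mechanics.

```python
import math


def kalman_loglik(ys, a, b, q, r, m0, p0):
    """Log-likelihood of measurements ys under the scalar linear-Gaussian model
        x_t = a * x_{t-1} + b + N(0, q),   y_t = x_t + N(0, r),   x_0 ~ N(m0, p0),
    computed with the prediction-error decomposition of a Kalman filter (r > 0)."""
    m, p = m0, p0
    loglik = 0.0
    for y in ys:
        # Predict: propagate mean and variance to the next measurement time.
        m = a * m + b
        p = a * a * p + q
        # Innovation (prediction error) and its variance.
        e = y - m
        s = p + r
        loglik -= 0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        # Update: condition the state estimate on the new measurement.
        gain = p / s
        m += gain * e
        p *= 1.0 - gain
    return loglik
```

Parameter inference then amounts to maximizing `kalman_loglik` over the parameters, or using it inside a Bayesian sampler. Each evaluation is a single pass over the measurement times, which illustrates why filtering-based likelihoods scale well with the number of time points.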

7.
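The survival of multicellular organisms depends on the specialized division of labor among different cell types. Although different cell types share the same genome, each has its own developmental trajectory and capacity to respond to environmental change. A major challenge in biology is to reveal how genes are expressed at the right level, in the right place, and at the right time. Many tools have recently emerged for studying single-cell omics through cell-type-specific approaches; these new technologies allow us to understand, at unprecedented resolution, the gene expression characteristics of individual cells of different types within multicellular organisms and the mechanisms by which they adapt to environmental change. Obtaining single-cell samples has long been a major technical bottleneck in single-cell research. This article therefore focuses on how to obtain starting material, and it discusses technical methods for single-cell research, including sample labeling, single-cell isolation and collection, omics data analysis, and validation of results, as well as their applications in plant research.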

8.
Constructing 3D models of chromosomes from single-cell Hi-C data is an important challenge. We present a reconstruction approach, DPDchrom, that incorporates basic knowledge of whether the reconstructed conformation should be coil-like or globular, as well as spring relaxation at contact sites. In contrast to previously published protocols, DPDchrom can naturally form globular conformations owing to the presence of explicit solvent. Benchmarking this and several other methods on artificial polymer models reveals similar reconstruction accuracy at high contact density and an advantage for DPDchrom at low contact density. To compare 3D structures in a way that is insensitive to spatial orientation and scale, we propose the Modified Jaccard Index. We analyzed two sources of contact dropout: contact radius change and random contact sampling. We found that reconstruction accuracy depends exponentially on the number of contacts per genomic bin, which allows the reconstruction accuracy to be estimated in advance. We applied DPDchrom to model chromosome configurations based on single-cell Hi-C data from mouse oocytes and found that these configurations differ significantly from random ones, which is consistent with other studies.

9.
Background: Chromosome conformation capture and various derivative methods, such as 4C, 5C, and Hi-C, have emerged as standard tools for analyzing the three-dimensional organization of the genome in the nucleus. These methods employ ligation of diluted cross-linked chromatin complexes, intended to favor proximity-dependent, intra-complex ligation. During the development of single-cell Hi-C, we devised an alternative Hi-C protocol with ligation in preserved nuclei rather than in solution. Here we directly compare Hi-C methods employing in-nucleus ligation with standard in-solution ligation.
Results: We show that in-nucleus ligation results in consistently lower levels of inter-chromosomal contacts. Through chromatin mixing experiments, we show that a significantly large fraction of inter-chromosomal contacts are the result of spurious ligation events formed during in-solution ligation. In-nucleus ligation significantly reduces this source of experimental noise and improves reproducibility between replicates. We also find that in-nucleus ligation eliminates the restriction fragment length bias found with in-solution ligation. These improvements result in greater reproducibility of long-range intra-chromosomal and inter-chromosomal contacts, as well as enhanced detection of structural features such as topologically associated domain boundaries.
Conclusions: We conclude that in-nucleus ligation captures chromatin interactions more consistently over a wider range of distances and significantly reduces both experimental noise and bias. In-nucleus ligation creates higher-quality Hi-C libraries while simplifying the experimental procedure. We suggest that the entire range of 3C applications is likely to show similar benefits from in-nucleus ligation.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0753-7) contains supplementary material, which is available to authorized users.
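A simple way to compare libraries along the lines of the results above is the fraction of valid read pairs that are inter-chromosomal (trans). The sketch assumes read pairs have already been aligned and filtered and are given as hypothetical `(chrom1, pos1, chrom2, pos2)` tuples; following the abstract's findings, an in-nucleus ligation library would be expected to show a lower trans fraction than an in-solution library prepared from the same cells.

```python
def trans_fraction(pairs):
    """Fraction of read pairs whose two ends map to different chromosomes.

    pairs -- iterable of (chrom1, pos1, chrom2, pos2) tuples for valid pairs
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no read pairs")
    trans = sum(1 for chrom1, _, chrom2, _ in pairs if chrom1 != chrom2)
    return trans / len(pairs)
```

A trans fraction well above what the mixing-experiment controls would suggest is one hint that spurious ligation products are inflating inter-chromosomal signal.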

10.
Owing to growing interest in single-cell genomics, computational methods for distinguishing true variants from artifacts are highly desirable. While special attention has been paid to false positives in variant or mutation calling from single-cell sequencing data, an equally important but often neglected issue is false negatives arising from allele dropout during the amplification of single-cell genomes. In this paper, we propose a simple strategy to reduce false negatives in single-cell sequencing data analysis. Simulation results show that this method is highly reliable, with an error rate of 4.94 × 10⁻⁵, which is orders of magnitude lower than the expected false negative rate (~34%) estimated from a single-cell exome dataset, although the method is limited by the low SNP density of the human genome. We applied this method to the exome data of a few dozen single tumor cells generated in previous studies and extracted cell-specific mutation information for a small set of sites. Interestingly, we found that the classical clonal model of tumor cell growth has difficulty explaining the mutation patterns observed in some tumor cells.

11.
Many computational methods have been developed to discern intratumor heterogeneity (ITH) using DNA sequence data from bulk tumor samples. These methods share the assumption that two mutations arise from the same subclone if they have similar mutant allele frequencies (MAFs), so it is difficult or impossible to distinguish two subclones with similar MAFs. Single-cell DNA sequencing (scDNA-seq) data can be very informative for ITH inference. However, owing to the difficulty of DNA amplification, scDNA-seq data are often very noisy. A promising new study design is to collect both bulk and single-cell DNA-seq data and analyze them jointly to mitigate the limitations of each data type. To address the analytic challenges of this design, we propose a computational method named BaSiC (Bulk tumor and Single Cell) that discerns ITH by jointly analyzing DNA-seq data from bulk tumor and single cells. We demonstrate that BaSiC performs comparably to or better than methods using either data type alone. We further evaluate BaSiC using bulk tumor and single-cell DNA-seq data from a breast cancer patient and several leukemia patients.

12.
13.
Genome-wide chromatin interaction analysis has become important for understanding the 3D topological structure of a genome and for linking distal cis-regulatory elements to their target genes. Compared with the Hi-C method, chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) is unique in that it can interrogate thousands of genome-wide chromatin interactions mediated by a specific protein of interest at high resolution and reasonable cost. However, because the data are noisy, efficient analytical tools are necessary. Here, we review several computational methods we recently developed and compare them with other existing methods. Our intention is to help readers better understand ChIA-PET results and to guide users in selecting the most appropriate tools for their own projects.

14.
Chromosomal translocations are frequent features of cancer genomes that contribute to disease progression. These rearrangements result from the formation and illegitimate repair of DNA double-strand breaks (DSBs), a process that requires spatial colocalization of chromosomal breakpoints. The “contact first” hypothesis suggests that translocation partners colocalize in the nuclei of normal cells, prior to rearrangement. However, the extent to which spatial interactions based on three-dimensional genome architecture contribute to chromosomal rearrangements in human disease is unclear. Here we intersect Hi-C maps of three-dimensional chromosome conformation with collections of 1,533 chromosomal translocations from cancer and germline genomes. We show that many translocation-prone pairs of regions genome-wide, including the cancer translocation partners BCR-ABL and MYC-IGH, display elevated Hi-C contact frequencies in normal human cells. Considering tissue specificity, we find that translocation breakpoints reported in human hematologic malignancies have higher Hi-C contact frequencies in lymphoid cells than those reported in sarcomas and epithelial tumors. However, translocations from multiple tissue types show significant correlation with Hi-C contact frequencies, suggesting that both tissue-specific and universal features of chromatin structure contribute to chromosomal alterations. Our results demonstrate that three-dimensional genome architecture shapes the landscape of rearrangements directly observed in human disease and establish Hi-C as a key method for dissecting these effects.
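Correlating translocation data with Hi-C contact frequencies, as described above, is often done with a rank correlation, which is insensitive to the heavy-tailed, distance-dependent scale of contact counts. The sketch below computes Spearman's rho with average ranks for ties. The inputs (for example, per-region-pair translocation counts and contact frequencies) are hypothetical, and the abstract does not state which statistic the authors used; significance testing, for example by permutation, is left out.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie block
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)
```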

15.
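Spatial interactions between chromosomes are regarded as an important factor in gene expression regulation, and high-throughput chromosome conformation capture (Hi-C) has become one of the main experimental approaches for exploring these interactions in 3D genomics. As Hi-C datasets continue to accumulate and analysis pipelines grow increasingly complex, bioinformatic analysis of Hi-C data presents both an opportunity and a challenge for investigating the spatiotemporal mechanisms of gene expression regulation. From a bioinformatics perspective, this article reviews the current state and recent progress of Hi-C research in China and abroad, covering data normalization, multilevel structure analysis, data visualization, and 3D modeling. It focuses on the multilevel structures of A/B compartments, topologically associating domains (TADs), and chromatin loops, and on this basis discusses likely future research hotspots and trends in the field, with the aim of supporting the extension of gene regulation research from traditional linear space into three-dimensional structural space.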
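Of the analysis steps listed above, normalization is the most algorithmic. A widely used family of Hi-C normalizations, including iterative correction (ICE) and Knight-Ruiz balancing, rescales the contact matrix so that every bin has the same total coverage. The sketch below is a symmetric Sinkhorn-style iteration that divides each entry by sqrt(s_i/mean)·sqrt(s_j/mean), where s_i is the current coverage of bin i. It illustrates the balancing idea under the assumption of a small, dense, symmetric matrix; it is not the implementation of any particular tool, and it skips the low-coverage bin filtering that real pipelines perform first.

```python
import math


def balance(matrix, iters=500, tol=1e-10):
    """Rescale a symmetric contact matrix so all nonzero rows have equal sums.

    Returns (balanced, bias) with raw[i][j] ~= balanced[i][j] * bias[i] * bias[j].
    Rows that sum to zero are left untouched (factor 1).
    """
    n = len(matrix)
    m = [[float(v) for v in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(iters):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix: nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Square-root factors keep the symmetric update stable.
        factors = [math.sqrt(s / mean) if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias
```

The returned bias vector is the usual by-product of matrix balancing: it captures per-bin visibility (mappability, GC content, restriction-site density), which is the systematic bias these normalizations are meant to remove.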

16.
Over the last decade, approaches based on chromosome conformation capture (3C) have been developed to measure the frequency of chromatin interactions. The invention of Hi-C allows us to obtain a genome-wide chromatin interaction map. However, developing efficient and robust analytical tools to interpret Hi-C data is challenging. Here we present a new method called Clustering based Hi-C Domain Finder (CHDF), which identifies Hi-C domains based on the difference in interaction intensity inside versus outside domains. We compared CHDF with existing methods, including the Direction Index (DI) and HiCseg. CHDF identifies more chromatin domains that are validated by higher-resolution local chromatin structure data (Chromosome Conformation Capture Carbon Copy, 5C). With Hi-C data of lower sequencing depth, the chromatin structure identified by CHDF is closer to that discovered with data of higher sequencing depth. Furthermore, CHDF runs faster than the other two methods. Using CHDF, we may be able to discover more hints and clues about chromatin structural elements at the domain level.
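For context on the DI baseline mentioned above, the directionality index of Dixon et al. (2012) compares, for each bin, the contacts A with upstream bins and B with downstream bins inside a window: DI = sign(B - A)·((A - E)²/E + (B - E)²/E), with E = (A + B)/2. Strongly positive values tend to mark the start of a domain (contacts biased downstream) and strongly negative values its end. The sketch assumes a dense symmetric contact matrix and a window measured in bins, which is a free parameter; it is not CHDF, whose clustering procedure the abstract does not detail.

```python
def directionality_index(matrix, window):
    """Directionality index per bin for a dense symmetric contact matrix.

    A = contacts with the `window` bins upstream, B = with the `window` bins
    downstream, E = (A + B) / 2; DI = sign(B - A) * ((A-E)^2/E + (B-E)^2/E).
    """
    n = len(matrix)
    di = []
    for i in range(n):
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        if a == b:  # no directional bias (also covers a == b == 0)
            di.append(0.0)
            continue
        e = (a + b) / 2.0
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di
```

In practice, DI tracks are then segmented into upstream-biased, downstream-biased, and unbiased states (Dixon et al. used a hidden Markov model) to call domain boundaries.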

17.
Chromosomes are giant chain molecules organized into an ensemble of three-dimensional structures characterized by their genomic state and corresponding biological functions. Despite strong cell-to-cell heterogeneity, the cell-type-specific pattern demonstrated in high-throughput chromosome conformation capture (Hi-C) data hints at a valuable link between structure and function, which makes inferring chromatin domains (CDs) from Hi-C patterns a central problem in genome research. Here we present a unified method for analyzing Hi-C data to determine the spatial organization of CDs over multiple genomic scales. By applying statistical physics-based clustering analysis to a polymer physics model of the chromosome, our method identifies the CDs that best represent the global pattern of correlation manifested in Hi-C. The multi-scale intra-chromosomal structures compared across different cell types uncover the principles underlying the multi-scale organization of the chromatin chain: (i) sub-TADs, TADs, and meta-TADs constitute a robust hierarchical structure; (ii) the assemblies of compartments and TAD-based domains are governed by different organizational principles; and (iii) sub-TADs are the common building blocks of chromosome architecture. Our physically principled interpretation and analysis of Hi-C not only offer an accurate and quantitative view of multi-scale chromatin organization but also help decipher its connections with genome function.

18.
Data alignment is one of the first key steps in single-cell analysis, needed to integrate multiple datasets and perform joint analysis across studies. However, data alignment is challenging for extremely large datasets, as most current single-cell data alignment methods are not computationally efficient. Here, we present VIPCCA, a computational framework based on non-linear canonical correlation analysis for effective and scalable single-cell data alignment. VIPCCA leverages both deep learning for effective single-cell data modeling and variational inference for scalable computation, thus enabling powerful data alignment across multiple samples, multiple data platforms, and multiple data types. VIPCCA is accurate for a range of alignment tasks, including alignment between single-cell RNA-seq and ATAC-seq datasets, and can easily accommodate millions of cells, thereby providing researchers unique opportunities to tackle challenges emerging from large-scale single-cell atlases.

19.
Hi-C experiments produce large numbers of DNA sequence read pairs that are typically analyzed to deduce genome-wide interactions between arbitrary loci. A key step in these experiments is the cleavage of cross-linked chromatin with a restriction endonuclease. Although this cleavage should happen specifically at the enzyme's recognition sequence, an unknown proportion of cleavage events may involve other sequences, owing to the enzyme's star activity or to random DNA breakage. A quantitative estimate of these non-specific cleavages may enable simulating realistic Hi-C read pairs for validating downstream analyses, monitoring the reproducibility of experimental conditions, and investigating biophysical properties that correlate with DNA cleavage patterns. Here we describe a computational method for analyzing Hi-C read pairs to estimate the fractions of cleavages at different possible targets. The method relies on expressing an observed local target distribution downstream of aligned reads as a linear combination of known conditional local target distributions. We validated this method using Hi-C read pairs obtained by computer simulation. Application of the method to experimental Hi-C datasets from murine cells revealed interesting similarities and differences in cleavage patterns across the experiments considered.
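The abstract describes the observed local target distribution as a linear combination of known conditional distributions. One way to estimate the non-negative mixing weights is maximum likelihood via expectation-maximization (EM) with the component distributions held fixed; the sketch below takes that route, which is an assumption, since the abstract does not say how the authors fit the combination. The target categories and component distributions are hypothetical inputs (for example, exact recognition site, near-cognate star-activity site, and random breakage).

```python
def mixture_weights(counts, components, iters=5000, tol=1e-12):
    """EM estimate of weights w (w_k >= 0, sum w_k = 1) such that the observed
    category counts are best explained by sum_k w_k * components[k].

    counts     -- observed count for each target category x
    components -- list of k known probability vectors over the same categories
    """
    k = len(components)
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: expected number of observations attributed to each component.
        resp = [0.0] * k
        for x, n in enumerate(counts):
            mix = sum(w[c] * components[c][x] for c in range(k))
            if n == 0 or mix == 0.0:
                continue  # no data, or category impossible under every component
            for c in range(k):
                resp[c] += n * w[c] * components[c][x] / mix
        total = sum(resp)
        if total == 0.0:
            raise ValueError("no observations compatible with the components")
        # M-step: new weights are the normalized expected counts.
        new_w = [r / total for r in resp]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w
```

Because EM updates stay on the probability simplex, the non-negativity and sum-to-one constraints on the cleavage fractions are satisfied automatically, which is convenient compared with an unconstrained least-squares fit.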

20.
Significant efforts have recently been made to obtain the three-dimensional (3D) structure of the genome, with the goal of understanding how structure may affect gene regulation and expression. Chromosome conformation capture techniques such as Hi-C have been key in uncovering the quantitative information needed to determine chromatin organization. Complementing these experimental tools, theoretical copolymer methods are necessary to determine the ensemble of three-dimensional structures associated with the experimental data provided by Hi-C maps. Going beyond structural information alone, these theoretical advances are also starting to provide an understanding of the underlying mechanisms governing genome assembly and function. Recent theoretical work, however, has focused on single-chromosome structures, missing the fact that, in the full nucleus, interactions between chromosomes play a central role in their organization. To overcome this limitation, MiChroM (Minimal Chromatin Model) has been modified to perform multi-chromosome simulations. It has been upgraded into a fast and scalable software version, called Open-MiChroM, which performs chromosome simulations on GPUs via the OpenMM Python API. To validate the efficiency of this new version, analyses of individual GM12878 autosomes were performed and compared with earlier studies. This validation was followed by multi-chain simulations including the four largest human chromosomes (C1-C4). These simulations demonstrated the full power of this new approach. Comparison with Hi-C data shows that interactions among multiple chromosomes are essential for more accurate agreement with experimental results. Without any changes to the original MiChroM potential, it is now possible to predict experimentally observed inter-chromosome contacts.
The scalability of Open-MiChroM allows for more ambitious investigations, examining interactions among multiple chains and moving toward higher-resolution chromosome models.

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417