
1.

Chromatin segmentation based on a probabilistic model for read counts explains a large portion of the epigenome

Alessandro Mammana Ho-Ryun Chung 《Genome biology》2015,16(1)

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an increasingly common experimental approach to generate genome-wide maps of histone modifications and to dissect the complexity of the epigenome. Here, we propose EpiCSeg: a novel algorithm that combines several histone modification maps for the segmentation and characterization of cell-type specific epigenomic landscapes. By using an accurate probabilistic model for the read counts, EpiCSeg provides a useful annotation for a considerably larger portion of the genome, shows a stronger association with validation data, and yields more consistent predictions across replicate experiments when compared to existing methods. The software is available at http://github.com/lamortenera/epicseg.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0708-z) contains supplementary material, which is available to authorized users.

2.

Detecting genomic deletions from high-throughput sequence data with unsupervised learning

Li Xin Wu Yufeng 《BMC bioinformatics》2023,23(8):1-16

Background

Structural variation (SV), which ranges from 50 bp to ~3 Mb in size, is an important type of genetic variation. Deletion is a type of SV in which a part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals, namely discordant read-pairs, read depth and split reads, are commonly used for SV detection from high-throughput sequence data. Many tools have been developed for detecting SVs by using one or more of these signals.

Results

In this paper, we develop a new method called EigenDel for detecting the germline submicroscopic genomic deletions. EigenDel first takes advantage of discordant read-pairs and clipped reads to get initial deletion candidates, and then it clusters similar candidates by using unsupervised learning methods. After that, EigenDel uses a carefully designed approach for calling true deletions from each cluster. We conduct various experiments to evaluate the performance of EigenDel on low coverage sequence data.
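The first step described above, collecting deletion candidates from discordant read-pairs, can be illustrated with a minimal Python sketch. This is not EigenDel's code: the function name, the `expected_insert` and `tolerance` parameters, and the toy coordinates are assumptions chosen for illustration, and clipped reads and the clustering step are left out.

```python
def deletion_candidates(pairs, expected_insert, tolerance):
    """Return read pairs whose mapped span is much larger than expected.

    pairs: list of (left_start, right_end) mapped coordinates on one chromosome.
    expected_insert, tolerance: hypothetical library parameters in bp.
    """
    candidates = []
    for left_start, right_end in pairs:
        span = right_end - left_start
        if span > expected_insert + tolerance:
            # The excess span approximates the length of the deleted segment.
            candidates.append((left_start, right_end, span - expected_insert))
    return candidates


# Toy data: two concordant pairs and two pairs spanning a ~3 kb deletion.
pairs = [(100, 600), (1000, 1490), (2000, 5500), (2050, 5560)]
found = deletion_candidates(pairs, expected_insert=500, tolerance=150)
print(found)
```

Pairs that support the same deletion report similar positions and excess spans, which is what makes a later clustering step plausible.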

Conclusions

Our results show that EigenDel outperforms other major methods in balancing accuracy and sensitivity as well as in reducing bias. EigenDel can be downloaded from https://github.com/lxwgcool/EigenDel.


3.

ENIGMA: an enterotype-like unigram mixture model for microbial association analysis

Abe Ko Hirayama Masaaki Ohno Kinji Shimamura Teppei 《BMC genomics》2019,20(2):63-75

Background

One of the major challenges in microbial studies is detecting associations between microbial communities and a specific disease. A distinctive feature of microbiome count data is that intestinal bacterial communities form clusters called “enterotypes”, which are characterized by differences in specific bacterial taxa, making it difficult to analyze these data under health and disease conditions. Traditional probabilistic modeling cannot distinguish between the bacterial differences derived from enterotype and those related to a specific disease.

Results

We propose a new probabilistic model, named ENIGMA (Enterotype-like uNIGram mixture model for Microbial Association analysis), which can be used to address these problems. ENIGMA enables simultaneous estimation of enterotype-like clusters characterized by the abundances of signature bacterial genera and the parameters of environmental effects associated with the disease.

Conclusion

In the simulation study, we evaluated the accuracy of parameter estimation. Furthermore, by analyzing the real-world data, we detected the bacteria related to Parkinson’s disease. ENIGMA is implemented in R and is available from GitHub (https://github.com/abikoushi/enigma).


4.

Human Histone Interaction Networks: An Old Concept, New Trends

《Journal of molecular biology》2021,433(6):166684

To elucidate the properties of human histone interactions on the large scale, we perform a comprehensive mapping of human histone interaction networks by using data from structural, chemical cross-linking and various high-throughput studies. Histone interactomes derived from different data sources show limited overlap and complement each other. This inspires us to integrate these data into a combined global histone interaction network which includes 5308 proteins and 10,330 interactions. The analysis of topological properties of the human histone interactome reveals its scale-free behavior and high modularity. Our study of histone binding interfaces uncovers a remarkably high number of residues involved in interactions between histones and non-histone proteins: 80–90% of residues in histones H3 and H4 have at least one binding partner. Two types of histone binding modes are detected: interfaces conserved in most histone variants and variant-specific interfaces. Finally, different types of chromatin factors recognize histones in nucleosomes via distinct binding modes, and many of these interfaces utilize acidic patches among other sites. Interaction networks are available at https://github.com/Panchenko-Lab/Human-histone-interactome.

5.

HomBlocks: A multiple-alignment construction pipeline for organelle phylogenomics based on locally collinear block searching

Guiqi Bi Yunxiang Mao Qikun Xing Min Cao 《Genomics》2018,110(1):18-22

Organelle phylogenomic analysis requires precisely constructed multi-gene alignment matrices concatenated from pre-aligned single-gene datasets. For non-bioinformaticians, it can take days to weeks to manually create high-quality multi-gene alignments comprising tens or hundreds of homologous genes. Here, we describe a new and highly efficient pipeline, HomBlocks, which uses a homologous block searching method to construct multiple sequence alignments. This approach can automatically recognize locally collinear blocks among organelle genomes and extract phylogenetically informative regions to construct a multiple sequence alignment in a few hours. In addition, HomBlocks supports organelle genomes without annotation and adjusts to different taxon datasets, thereby enabling the inclusion of as many common genes as possible. Topology comparison of trees built from conventional multi-gene alignments and HomBlocks alignments in different taxon categories shows that HomBlocks achieves the same efficiency as the traditional method. The availability of HomBlocks makes organelle phylogenetic analyses more accessible to non-bioinformaticians, thereby promising to lead to a better understanding of phylogenetic relationships at the organelle genome level.

Availability and implementation

HomBlocks is implemented in Perl and is supported by Unix-like operating systems, including Linux and macOS. The Perl source code is freely available for download from https://github.com/fenghen360/HomBlocks.git, and documentation and tutorials are available at https://github.com/fenghen360/HomBlocks. Contact: yxmao@ouc.edu.cn or fenghen360@126.com

6.

RGAAT: A Reference-based Genome Assembly and Annotation Tool for New Genomes and Upgrade of Known Genomes

Wanfei Liu Shuangyang Wu Qiang Lin Shenghan Gao Feng Ding Xiaowei Zhang Hasan Awad Aljohi Jun Yu Songnian Hu 《Genomics, Proteomics & Bioinformatics》2018,16(5):373-381

The rapid development of high-throughput sequencing technologies has led to a dramatic decrease in the money and time required for de novo genome sequencing or genome resequencing projects, with new genome sequences released every week. Among such projects, the plethora of updated genome assemblies creates a need for version-dependent annotation files and other compatible public datasets for downstream analysis. To handle these tasks in an efficient manner, we developed the reference-based genome assembly and annotation tool (RGAAT), a flexible toolkit for resequencing-based consensus building and annotation update. RGAAT can detect sequence variants with comparable precision, specificity, and sensitivity to GATK and with higher precision and specificity than Freebayes and SAMtools on four DNA-seq datasets tested in this study. RGAAT can also identify sequence variants based on cross-cultivar or cross-version genomic alignments. Unlike GATK and SAMtools/BCFtools, RGAAT builds the consensus sequence by taking into account the true allele frequency. Finally, RGAAT generates a coordinate conversion file between the reference and query genomes using sequence variants and supports annotation file transfer. Compared to the rapid annotation transfer tool (RATT), RGAAT displays better performance characteristics for annotation transfer between different genome assemblies, strains, and species. In addition, RGAAT can be used for genome modification, genome comparison, and coordinate conversion. RGAAT is available at https://sourceforge.net/projects/rgaat/ and https://github.com/wushyer/RGAAT_v2 at no cost.

7.

A consensus multi-view multi-objective gene selection approach for improved sample classification

Acharya Sudipta Cui Laizhong Pan Yi 《BMC bioinformatics》2020,21(13):1-15

Background

High-dimensional flow cytometry and mass cytometry allow systemic-level characterization of more than 10 protein profiles at single-cell resolution and provide a much broader landscape in many biological applications, such as disease diagnosis and prediction of clinical outcome. When associating clinical information with cytometry data, traditional approaches require two distinct steps for identification of cell populations and statistical test to determine whether the difference between two population proportions is significant. These two-step approaches can lead to information loss and analysis bias.

Results

We propose a novel statistical framework, called LAMBDA (Latent Allocation Model with Bayesian Data Analysis), for simultaneous identification of unknown cell populations and discovery of associations between these populations and clinical information. LAMBDA uses probabilistic models specifically designed for the different distributional characteristics of flow and mass cytometry data, respectively. We use a zero-inflated distribution for the mass cytometry data based on the characteristics of the data. A simulation study confirms the usefulness of this model by evaluating the accuracy of the estimated parameters. We also demonstrate that LAMBDA can identify associations between cell populations and their clinical outcomes by analyzing real data. LAMBDA is implemented in R and is available from GitHub (https://github.com/abikoushi/lambda).


8.

Functional annotation of noncoding causal variants in autoimmune diseases

《Genomics》2020,112(2):1208-1213


9.

The coupling of epigenome replication with DNA replication

Liu Q Gong Z 《Current opinion in plant biology》2011,14(2):187-194

In multicellular organisms, each cell contains the same DNA sequence, but with different epigenetic information that determines the cell specificity. Semi-conservative DNA replication faithfully copies the parental nucleotide sequence into two DNA daughter strands during each cell cycle. At the same time, epigenetic marks such as DNA methylation and histone modifications are either precisely transmitted to the daughter cells or dynamically changed during S-phase. Recent studies indicate that in each cell cycle, many DNA replication related proteins are involved in not only genomic but also epigenomic replication. Histone modification proteins, chromatin remodeling proteins, histone variants, and RNAs participate in the epigenomic replication during S-phase. As a consequence, epigenome replication is closely linked with DNA replication during S-phase.

10.

Discovering cooperative relationships of chromatin modifications in human T cells based on a proposed closeness measure

Lv J Qiao H Liu H Wu X Zhu J Su J Wang F Cui Y Zhang Y 《PloS one》2010,5(12):e14219


11.

Genome-wide analysis of epigenetic dynamics across human developmental stages and tissues

Zhang Xia Gan Yanglan Zou Guobing Guan Jihong Zhou Shuigeng 《BMC genomics》2019,20(2):153-162

Background

The epigenome is highly dynamic during the early stages of embryonic development. Epigenetic modifications provide the necessary regulation for lineage specification and enable the maintenance of cellular identity. Given the rapid accumulation of genome-wide epigenomic modification maps across the cellular differentiation process, there is an urgent need to characterize epigenetic dynamics and reveal their impacts on differential gene regulation.

Methods

We propose DiffEM, a computational method for differential analysis of epigenetic modifications that identifies highly dynamic modification sites along the cellular differentiation process. We applied this approach to 6 epigenetic marks across 20 human early developmental stages and tissues, including hESCs, 4 hESC-derived lineages and 15 human primary tissues.

Results

We identified highly dynamic modification sites where different cell types exhibit distinctive modification patterns, and found that these highly dynamic sites are enriched in genes related to cellular development and differentiation. Further, to evaluate the effectiveness of our method, we correlated the dynamics scores of epigenetic modifications with the variance of gene expression, and compared the results of our method with those of existing algorithms. The comparison results demonstrate the power of our method in evaluating epigenetic dynamics and identifying highly dynamic regions along the cell differentiation process.


12.

Modeling multi-species RNA modification through multi-task curriculum learning

Yuanpeng Xiong Xuan He Dan Zhao Tingzhong Tian Lixiang Hong Tao Jiang Jianyang Zeng 《Nucleic acids research》2021,49(7):3719


13.

A Matrix Protein Silences Transposons and Repeats through Interaction with Retinoblastoma-Associated Proteins

Yifeng Xu Yizhong Wang Hume Stroud Xiaofeng Gu Bo Sun Eng-Seng Gan Kian-Hong Ng Steven E. Jacobsen Yuehui He Toshiro Ito 《Current biology : CB》2013,23(4):345-350


Highlights

The Arabidopsis matrix protein TEK silences transposons and repeat-containing genes. Binding of TEK on targets affects chromatin conformation and histone modifications. TEK protein associates with FVE/MSI5-containing histone deacetylation complex. TEK directs repressive modification as a key structural component in gene silencing.

14.

Improving the sensitivity of long read overlap detection using grouped short k-mer matches

Du Nan Chen Jiao Sun Yanni 《BMC genomics》2019,20(2):49-62

Background

Single-molecule, real-time (SMRT) sequencing developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The increased read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and characterize intra-species variations. It also holds the promise of deciphering the community structure in complex microbial communities because long reads help metagenomic assembly. One key step in genome assembly using long reads is to quickly identify reads forming overlaps. Because PacBio data has a higher sequencing error rate and lower coverage than popular short read sequencing technologies (such as Illumina), efficient detection of true overlaps requires specially designed algorithms. In particular, there is still a need to improve the sensitivity of detecting small overlaps or overlaps with high error rates in both reads. Addressing this need will enable better assembly for metagenomic data produced by third-generation sequencing technologies.

Results

In this work, we designed and implemented an overlap detection program named GroupK, for third-generation sequencing reads based on grouped k-mer hits. While using k-mer hits for detecting overlaps between reads has been adopted by several existing programs, our method uses a group of short k-mer hits satisfying statistically derived distance constraints to increase the sensitivity of small overlap detection. Grouped k-mer hits were originally designed for homology search. We are the first to apply grouped hits to long read overlap detection. The experimental results of applying our pipeline to both simulated and real third-generation sequencing data showed that GroupK enables more sensitive overlap detection, especially for datasets of low sequencing coverage.
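The idea of grouping short k-mer hits under distance constraints can be sketched in a few lines of Python. This is an illustrative toy, not GroupK's implementation: the function names, the fixed `max_gap` and `max_shift` thresholds (GroupK derives its constraints statistically) and the example reads are all assumptions.

```python
from collections import defaultdict


def kmer_hits(read_a, read_b, k):
    """Return sorted (pos_a, pos_b) pairs where the two reads share a k-mer."""
    index = defaultdict(list)
    for i in range(len(read_a) - k + 1):
        index[read_a[i:i + k]].append(i)
    hits = []
    for j in range(len(read_b) - k + 1):
        for i in index.get(read_b[j:j + k], []):
            hits.append((i, j))
    return sorted(hits)


def group_hits(hits, max_gap, max_shift):
    """Chain hits that are close in position and lie on nearby diagonals."""
    groups = []
    for pos_a, pos_b in hits:
        for group in groups:
            last_a, last_b = group[-1]
            near = 0 < pos_a - last_a <= max_gap
            # Hits from one overlap share (almost) the same pos_a - pos_b offset.
            same_diagonal = abs((pos_a - pos_b) - (last_a - last_b)) <= max_shift
            if near and same_diagonal:
                group.append((pos_a, pos_b))
                break
        else:
            groups.append([(pos_a, pos_b)])
    return groups


# Toy reads: the suffix of read_a overlaps the prefix of read_b.
read_a = "TTTTACGTACGGATCCA"
read_b = "ACGTACGGATCCAGGGG"
hits = kmer_hits(read_a, read_b, k=4)
groups = group_hits(hits, max_gap=3, max_shift=1)
best = max(groups, key=len)
print(len(hits), len(best), best[0], best[-1])
```

In this toy case one repeated k-mer produces an isolated hit on a different diagonal, and the grouping step separates it from the chain of hits that marks the true overlap.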

Conclusions

GroupK is best used for detecting small overlaps for third-generation sequencing data. It provides a useful supplementary tool to existing ones for more sensitive and accurate overlap detection. The source code is freely available at https://github.com/Strideradu/GroupK.


15.

Genome-wide promoter methylation analysis in neuroblastoma identifies prognostic methylation biomarkers

Anneleen Decock Maté Ongenaert Jasmien Hoebeeck Katleen De Preter Gert Van Peer Wim Van Criekinge Ruth Ladenstein Johannes H Schulte Rosa Noguera Raymond L Stallings An Van Damme Geneviève Laureys Joëlle Vermeulen Tom Van Maerken Frank Speleman Jo Vandesompele 《Genome biology》2012,13(10):1-15

ChIP-seq is a powerful method for obtaining genome-wide maps of protein-DNA interactions and epigenetic modifications. CHANCE (CHip-seq ANalytics and Confidence Estimation) is a standalone package for ChIP-seq quality control and protocol optimization. Our user-friendly graphical software quickly estimates the strength and quality of immunoprecipitations, identifies biases, compares the user's data with ENCODE's large collection of published datasets, performs multi-sample normalization, checks against quantitative PCR-validated control regions, and produces informative graphical reports. CHANCE is available at https://github.com/songlab/chance.

16.

Interactive web-based identification and visualization of transcript shared sequences

Alaleh Azhir Louis-Henri Merino David W. Nauen 《Genomics》2019,111(4):860-862


17.

Adaptation of the Hierarchical Factor Segmentation method to noisy activity data

Kazuki Sakura Kazuyoshi Yasugi 《Chronobiology international》2013,30(8):1131-1137

ABSTRACT

The Hierarchical Factor Segmentation (HFS) method is a non-parametric statistical method for detection of the phase of a biological rhythm shown in an actogram. The detection accuracy of this method was measured on actograms showing only circadian rhythms with a constant ratio of signal to noise (S/N). In the present study, we generated 84 types of artificial actograms including circadian or circatidal rhythms by using three parameters: α/ρ, S/N and period length τ, and evaluated the effectiveness of our devised adaptation of the HFS method, the cycle-by-cycle adaptation. The results showed that the effectiveness of the cycle-by-cycle adaptation was high even when S/N or τ fluctuated through a whole actogram. These results suggest that the cycle-by-cycle adaptation could be effectively applied to various kinds of rhythmic activity data. The C++ source code of the cycle-by-cycle adaptation is available on the website at https://github.com/KazukiSakura/cHFS.git.

18.

Two-Way Horizontal and Vertical Omics Integration for Disease Subtype Discovery

Huo Zhiguang Zhu Li Ma Tianzhou Liu Hongcheng Han Song Liao Daiqing Zhao Jinying Tseng George 《Statistics in biosciences》2020,12(1):1-22

Disease subtype discovery is an essential step in delivering personalized medicine. Disease subtyping via omics data has become a common approach for this purpose. With the advancement of technology and the lower price for generating omics data, multi-level and multi-cohort omics data are prevalent in the public domain, providing unprecedented opportunities to decrypt disease mechanisms. How to fully utilize multi-level/multi-cohort omics data and incorporate established biological knowledge toward disease subtyping remains a challenging problem. In this paper, we propose a meta-analytic integrative sparse Kmeans (MISKmeans) algorithm for integrating multi-cohort/multi-level omics data and prior biological knowledge. Compared with previous methods, MISKmeans shows better clustering accuracy and feature selection relevancy. An efficient R package, “MIS-Kmeans”, calling C++ is freely available on GitHub (https://github.com/Caleb-Huo/MIS-Kmeans).


19.

Enhancing breakpoint resolution with deep segmentation model: A general refinement method for read-depth based structural variant callers

Yao-zhong Zhang Seiya Imoto Satoru Miyano Rui Yamaguchi 《PLoS computational biology》2021,17(10)

Read-depths (RDs) are frequently used in identifying structural variants (SVs) from sequencing data. Existing RD-based SV callers have difficulty determining breakpoints at single-nucleotide resolution because of the noisiness of RD data and the bin-based calculation. In this paper, we propose to use the deep segmentation model UNet to learn base-wise RD patterns surrounding breakpoints of known SVs. We integrate model predictions with an RD-based SV caller to refine breakpoints to single-nucleotide resolution. We show that UNet can be trained with a small amount of data and can be applied both in-sample and cross-sample. An enhancement pipeline named RDBKE significantly increases the number of SVs with more precise breakpoints on simulated and real data. The source code of RDBKE is freely available at https://github.com/yaozhong/deepIntraSV.
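Why a bin-based calculation limits breakpoint resolution can be seen in a small Python sketch. It is a noise-free toy, not RDBKE or any real caller: the helper names, the 50 bp bin size and the synthetic depth profile are assumptions for illustration only.

```python
def bin_means(depth, bin_size):
    """Average per-base read depth over consecutive fixed-size bins."""
    means = []
    for start in range(0, len(depth), bin_size):
        window = depth[start:start + bin_size]
        means.append(sum(window) / len(window))
    return means


def largest_drop(values):
    """Return the index of the element that follows the largest decrease."""
    drops = [values[i] - values[i + 1] for i in range(len(values) - 1)]
    return drops.index(max(drops)) + 1


# Synthetic depth: 30x coverage that halves at position 137 (hypothetical breakpoint).
depth = [30] * 137 + [15] * 63

base_estimate = largest_drop(depth)                 # base-wise signal
bin_estimate = largest_drop(bin_means(depth, 50)) * 50  # start of the bin after the drop
print(base_estimate, bin_estimate)
```

The bin-based estimate can only land on a bin boundary, while the base-wise profile places the change at the exact position. Real depth profiles are noisy, so a simple largest-drop rule would not be enough on base-wise data, which is the gap a learned segmentation model is meant to fill.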

20.

Detecting DNA Modifications from SMRT Sequencing Data by Modeling Sequence Context Dependence of Polymerase Kinetic

Zhixing Feng Gang Fang Jonas Korlach Tyson Clark Khai Luong Xuegong Zhang Wing Wong Eric Schadt 《PLoS computational biology》2013,9(3)

DNA modifications such as methylation and DNA damage can play critical regulatory roles in biological systems. Single molecule, real time (SMRT) sequencing technology generates DNA sequences as well as DNA polymerase kinetic information that can be used for the direct detection of DNA modifications. We demonstrate that local sequence context has a strong impact on DNA polymerase kinetics in the neighborhood of the incorporation site during the DNA synthesis reaction, allowing for the possibility of estimating the expected kinetic rate of the enzyme at the incorporation site using kinetic rate information collected from existing SMRT sequencing data (historical data) covering the same local sequence contexts of interest. We develop an Empirical Bayesian hierarchical model for incorporating historical data. Our results show that the model could greatly increase DNA modification detection accuracy and reduce the required coverage of control data. For some DNA modifications that have a strong signal, a control sample is not even needed when historical data are used as an alternative to the control. Thus, sequencing costs can be greatly reduced by using the model. We implemented the model in an R package named seqPatch, which is available at https://github.com/zhixingfeng/seqPatch.