1.

methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles

Altuna Akalin Matthias Kormaksson Sheng Li Francine E Garrett-Bakelman Maria E Figueroa Ari Melnick Christopher E Mason 《Genome biology》2012,13(10):R87

DNA methylation is a chemical modification of cytosine bases that is pivotal for gene regulation, cellular specification and cancer development. Here, we describe an R package, methylKit, that rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and hydroxymethylation sequencing experiments. methylKit includes functions for clustering, sample quality visualization, differential methylation analysis and annotation features, thus automating and simplifying many of the steps for discerning statistically significant bases or regions of DNA methylation. Finally, we demonstrate methylKit on breast cancer data, in which we find statistically significant regions of differential methylation and stratify tumor subtypes. methylKit is available at http://code.google.com/p/methylkit.

2.

FadE: whole genome methylation analysis for multiple sequencing platforms

Tade Souaiaia Zheng Zhang Ting Chen 《Nucleic acids research》2013,41(1):e14

DNA methylation plays a central role in genomic regulation and disease. Sodium bisulfite treatment (SBT) causes unmethylated cytosines to be sequenced as thymine, which allows methylation levels to be reflected in the number of ‘C’-‘C’ alignments covering reference cytosines. Di-base color reads produced by lifetech’s SOLiD sequencer provide unreliable results when translated to bases because single sequencing errors affect the downstream sequence. We describe FadE, an algorithm to accurately determine genome-wide methylation rates directly in color or nucleotide space. FadE uses SBT unmethylated and untreated data to determine background error rates and incorporate them into a model which uses Newton–Raphson optimization to estimate the methylation rate and provide a credible interval describing its distribution at every reference cytosine. We sequenced two slides of human fibroblast cell-line bisulfite-converted fragment library with the SOLiD sequencer to investigate genome-wide methylation levels. FadE reported widespread differences in methylation levels across CpG islands and a large number of differentially methylated regions adjacent to genes, which compares favorably to the results of an investigation on the same cell line using nucleotide-space reads at higher coverage levels, suggesting that FadE is an accurate method to estimate genome-wide methylation with color or nucleotide reads. FadE is available at http://code.google.com/p/fade/.
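The counting idea in this abstract can be illustrated with a naive estimator. This is only a sketch and not FadE's model, which works in color space and fits background error rates by Newton–Raphson optimization. The function name, its arguments and the single `non_conversion_rate` correction are assumptions made for illustration.

```python
def methylation_level(c_reads: int, t_reads: int, non_conversion_rate: float = 0.0) -> float:
    """Estimate the methylation level at one reference cytosine.

    c_reads: reads showing C (cytosine protected from conversion)
    t_reads: reads showing T (cytosine converted by bisulfite)
    non_conversion_rate: assumed fraction of unmethylated C that fails to convert
    """
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this cytosine")
    if not 0.0 <= non_conversion_rate < 1.0:
        raise ValueError("non_conversion_rate must be in [0, 1)")
    observed = c_reads / total
    # observed = m + (1 - m) * e, solved for m and clamped to [0, 1]
    corrected = (observed - non_conversion_rate) / (1.0 - non_conversion_rate)
    return min(1.0, max(0.0, corrected))
```

With 18 C reads and 2 T reads and no correction, the estimate is 18/20 = 0.9.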

3.

MOABS: model based analysis of bisulfite sequencing data

Deqiang Sun Yuanxin Xi Benjamin Rodriguez Hyun Jung Park Pan Tong Mira Meong Margaret A Goodell Wei Li 《Genome biology》2014,15(2):R38

Bisulfite sequencing (BS-seq) is the gold standard for studying genome-wide DNA methylation. We developed MOABS to increase the speed, accuracy, statistical power and biological relevance of BS-seq data analysis. MOABS detects differential methylation with 10-fold coverage at single-CpG resolution based on a Beta-Binomial hierarchical model and is capable of processing two billion reads in 24 CPU hours. Here, using simulated and real BS-seq data, we demonstrate that MOABS outperforms other leading algorithms, such as Fisher’s exact test and BSmooth. Furthermore, MOABS analysis can be easily extended to differential 5hmC analysis using RRBS and oxBS-seq. MOABS is available at http://code.google.com/p/moabs/.
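To give a feel for Beta-Binomial reasoning about a methylation difference at a single CpG, the following sketch draws from two Beta posteriors and reports a credible interval for their difference. It is not MOABS's implementation. The uniform Beta(1, 1) prior, the Monte Carlo approach, and all names and defaults are assumptions.

```python
import random

def credible_difference(k1, n1, k2, n2, draws=10000, level=0.95, seed=0):
    """Monte Carlo credible interval for p1 - p2.

    k1 of n1 reads are methylated in sample 1, k2 of n2 in sample 2.
    Each methylation ratio gets a Beta(1 + k, 1 + n - k) posterior
    (uniform prior, binomial likelihood).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    diffs = sorted(
        rng.betavariate(1 + k1, 1 + n1 - k1) - rng.betavariate(1 + k2, 1 + n2 - k2)
        for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * draws)]
    upper = diffs[int((1.0 - tail) * draws) - 1]
    return lower, upper

low, high = credible_difference(18, 20, 2, 20)
print(low > 0)  # interval excludes zero for this strongly different pair
```

An interval that excludes zero suggests a real difference at that site; low coverage widens the interval.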

4.

BatMeth: improved mapper for bisulfite sequencing reads on DNA methylation

Jing-Quan Lim Chandana Tennakoon Guoliang Li Eleanor Wong Yijun Ruan Chia-Lin Wei Wing-Kin Sung 《Genome biology》2012,13(10):1-14

DNA methylation plays a crucial role in higher organisms. Coupling bisulfite treatment with next generation sequencing enables the interrogation of 5-methylcytosine sites in the genome. However, bisulfite conversion introduces mismatches between the reads and the reference genome, which makes mapping of Illumina and SOLiD reads slow and inaccurate. BatMeth is an algorithm that integrates novel Mismatch Counting, List Filtering, Mismatch Stage Filtering and Fast Mapping onto Two Indexes components to improve unique mapping rate, speed and precision. Experimental results show that BatMeth is faster and more accurate than existing tools. BatMeth is freely available at http://code.google.com/p/batmeth/.
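The mismatch problem described here is often handled by comparing reads and reference after an in silico C-to-T conversion of both. The sketch below shows that general idea only; it is not a description of BatMeth's own components, and the function names are hypothetical.

```python
def c_to_t(seq: str) -> str:
    """Fully convert a sequence in silico: every C becomes T."""
    return seq.upper().replace("C", "T")

def matches_after_conversion(read: str, ref_window: str) -> bool:
    """True if a forward-strand read equals the reference window once C/T differences are ignored.

    Note: this symmetric comparison also accepts a read C over a reference T,
    which bisulfite treatment cannot produce; a real mapper checks that afterwards.
    """
    return len(read) == len(ref_window) and c_to_t(read) == c_to_t(ref_window)

# Reference TCGACG; the first C is methylated (kept), the second is converted to T.
print(matches_after_conversion("TCGATG", "TCGACG"))  # True
```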

5.

Dynamic evolution of clonal epialleles revealed by methclone

Sheng Li Francine Garrett-Bakelman Alexander E Perl Selina M Luger Chao Zhang Bik L To Ian D Lewis Anna L Brown Richard J D’Andrea M Elizabeth Ross Ross Levine Martin Carroll Ari Melnick Christopher E Mason 《Genome biology》2014,15(9)

We describe methclone, a novel method to identify epigenetic loci that harbor large changes in the clonality of their epialleles (epigenetic alleles). Methclone efficiently analyzes genome-wide DNA methylation sequencing data. We quantify the changes using a composition entropy difference calculation and also introduce a new measure of global clonality shift, loci with epiallele shift per million loci covered, which enables comparisons between different samples to gauge overall epiallelic dynamics. Finally, we demonstrate the utility of methclone in capturing functional epiallele shifts in leukemia patients from diagnosis to relapse. Methclone is open-source and freely available at https://code.google.com/p/methclone.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0472-5) contains supplementary material, which is available to authorized users.
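The idea of an entropy difference over epiallele compositions can be sketched as follows. Shannon entropy is used here as a stand-in, since the abstract does not give methclone's exact formula; the pattern encoding (one string of 0/1 methylation calls per read) and the function names are assumptions.

```python
from collections import Counter
from math import log2
from typing import Sequence

def composition_entropy(patterns: Sequence[str]) -> float:
    """Shannon entropy (bits) of the epiallele patterns observed at one locus."""
    if not patterns:
        raise ValueError("no reads at this locus")
    counts = Counter(patterns)
    total = len(patterns)
    return sum((c / total) * log2(total / c) for c in counts.values())

def entropy_difference(before: Sequence[str], after: Sequence[str]) -> float:
    """Change in composition entropy between two samples at the same locus."""
    return composition_entropy(after) - composition_entropy(before)

diagnosis = ["0000", "0101", "1111", "1010"]  # four equally common epialleles
relapse = ["1111", "1111", "1111", "1111"]    # one dominant epiallele
print(entropy_difference(diagnosis, relapse))  # -2.0
```

A strongly negative value indicates a shift toward a single dominant epiallele, that is, increased clonality.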

6.

DNA methylation profiling of the human major histocompatibility complex: a pilot study for the human epigenome project (cited by: 10 total; 0 self-citations; 10 by others)

Rakyan VK Hildmann T Novik KL Lewin J Tost J Cox AV Andrews TD Howe KL Otto T Olek A Fischer J Gut IG Berlin K Beck S 《PLoS biology》2004,2(12):e405

The Human Epigenome Project aims to identify, catalogue, and interpret genome-wide DNA methylation phenomena. Occurring naturally on cytosine bases at cytosine–guanine dinucleotides, DNA methylation is intimately involved in diverse biological processes and the aetiology of many diseases. Differentially methylated cytosines give rise to distinct profiles, thought to be specific for gene activity, tissue type, and disease state. The identification of such methylation variable positions will significantly improve our understanding of genome biology and our ability to diagnose disease. Here, we report the results of the pilot study for the Human Epigenome Project entailing the methylation analysis of the human major histocompatibility complex. This study involved the development of an integrated pipeline for high-throughput methylation analysis using bisulphite DNA sequencing, discovery of methylation variable positions, epigenotyping by matrix-assisted laser desorption/ionisation mass spectrometry, and development of an integrated public database available at http://www.epigenome.org. Our analysis of DNA methylation levels within the major histocompatibility complex, including regulatory exonic and intronic regions associated with 90 genes in multiple tissues and individuals, reveals a bimodal distribution of methylation profiles (i.e., the vast majority of the analysed regions were either hypo- or hypermethylated), tissue specificity, inter-individual variation, and correlation with independent gene expression data.

7.

Masking as an effective quality control method for next-generation sequencing data analysis

Sajung Yun Sijung Yun 《BMC bioinformatics》2014,15(1)

8.

Exploratory Analysis of the Copy Number Alterations in Glioblastoma Multiforme

Pablo Freire Marco Vilela Helena Deus Yong-Wan Kim Dimpy Koul Howard Colman Kenneth D. Aldape Oliver Bogler W. K. Alfred Yung Kevin Coombes Gordon B. Mills Ana T. Vasconcelos Jonas S. Almeida 《PloS one》2008,3(12)

9.

Corset: enabling differential gene expression analysis for de novo assembled transcriptomes

Nadia M Davidson Alicia Oshlack 《Genome biology》2014,15(7)

10.

DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster

Ram Vinay Pandey Christian Schlötterer 《PloS one》2013,8(8)

With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
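Distributing read mapping generally starts by cutting the FASTQ input into chunks that workers can map independently. The abstract does not describe how DistMap does this, so the following is a hypothetical sketch of the splitting step alone, assuming the standard four-line FASTQ record layout.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def fastq_chunks(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding up to records_per_chunk 4-line records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be positive")
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * records_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record")
        yield chunk

# Three toy records split into chunks of two records each.
reads = []
for i in range(3):
    reads += [f"@read{i}", "ACGT", "+", "IIII"]
print([len(c) // 4 for c in fastq_chunks(reads, 2)])  # [2, 1]
```

Each chunk could then be handed to a separate mapper process and the per-chunk SAM/BAM outputs merged afterwards.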

11.

PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data

Feng Zeng Rui Jiang Ting Chen 《Nucleic acids research》2013,41(13):e136

Both 454 and Ion Torrent sequencers are capable of producing large amounts of long high-quality sequencing reads. However, as both methods sequence homopolymers in one cycle, they both suffer from homopolymer uncertainty and incorporation asynchronization. In mapping, such sequencing errors could shift alignments around homopolymers and thus induce incorrect mismatches, which have become a critical barrier against the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) to statistically and explicitly formulate homopolymer sequencing errors by the overcall, undercall, insertion and deletion. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate parameters of the HMM from resequencing data by an expectation-maximization algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype by using a Bayesian approach. Simulation experiments show that the performance of PyroHMMsnp is exceptional across various sequencing coverages in terms of sensitivity, specificity and F₁ measure, compared with other tools. Analysis of the human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving a higher specificity. (http://code.google.com/p/pyrohmmsnp/).

12.

Improved Inference of Gene Regulatory Networks through Integrated Bayesian Clustering and Dynamic Modeling of Time-Course Expression Data

Brian Godsey 《PloS one》2013,8(7)

Inferring gene regulatory networks from expression data is difficult, but it is common and often useful. Most network problems are under-determined (there are more parameters than data points), and therefore data or parameter set reduction is often necessary. Correlation between variables in the model also confounds network coefficient inference. In this paper, we present an algorithm that uses integrated, probabilistic clustering to ease the problems of under-determination and correlated variables within a fully Bayesian framework. Specifically, ours is a dynamic Bayesian network with integrated Gaussian mixture clustering, which we fit using variational Bayesian methods. We show, using public, simulated time-course data sets from the DREAM4 Challenge, that our algorithm outperforms non-clustering methods in many cases (7 out of 25) with fewer samples, rarely underperforming (1 out of 25), and often selects a non-clustering model if it better describes the data. Source code (GNU Octave) for BAyesian Clustering Over Networks (BACON) and sample data are available at: http://code.google.com/p/bacon-for-genetic-networks.

13.

Genome-Wide Analysis of DNA Methylation in Soybean

Qing-Xin Song Xiang Lu Qing-Tian Li Hui Chen Xing-Yu Hu Biao Ma Wan-Ke Zhang Shou-Yi Chen Jin-Song Zhang 《植物生理学报》2013,(6):1961-1974

14.

Prognostic value of RASSF1A methylation status in non-small cell lung cancer (NSCLC) patients: A meta-analysis of prospective studies

Hao Hu Yuefei Zhou Min Zhang 《Biomarkers》2019,24(3):207-216

Objective: Ras association domain family 1A (RASSF1A) has been regarded as a biomarker predicting the prognosis of non-small cell lung cancer (NSCLC), but previous findings are inconsistent. This meta-analysis of prospective studies aimed to assess the value of RASSF1A methylation in predicting the prognosis of NSCLC patients.

Methods: Studies were searched in PubMed and Web of Science. The estimates of the effects and the corresponding 95% confidence intervals (95% CIs) were used for the analyses. The overall effects of RASSF1A methylation on overall survival (OS) were estimated, after which subgroup analysis based on regions was conducted. Sensitivity analyses were conducted to restrict the studies with certain features.

Results: A total of 16 studies with 2210 participants were included in this meta-analysis. The overall analysis indicated that RASSF1A methylation had no statistically significant effect on OS of NSCLC patients (HR = 1.28; 95% CI 0.86–1.70), which was confirmed by the subgroup analysis. However, the sensitivity analysis indicated that RASSF1A methylation from lung cancer tissues was significantly associated with lower OS (HR = 1.24; 95% CI 1.04–1.45).

Conclusion: RASSF1A methylation in lung cancer tissue can serve as a prognostic factor of NSCLC. More studies are needed to uncover the underlying mechanisms.

15.

Inference of the Properties of the Recombination Process from Whole Bacterial Genomes

M. Azim Ansari Xavier Didelot 《Genetics》2014,196(1):253-265

Patterns of linkage disequilibrium, homoplasy, and incompatibility are difficult to interpret because they depend on several factors, including the recombination process and the population structure. Here we introduce a novel model-based framework to infer recombination properties from such summary statistics in bacterial genomes. The underlying model is sequentially Markovian so that data can be simulated very efficiently, and we use approximate Bayesian computation techniques to infer parameters. As this does not require us to calculate the likelihood function, the model can be easily extended to investigate less probed aspects of recombination. In particular, we extend our model to account for the bias in the recombination process whereby closely related bacteria recombine more often with one another. We show that this model provides a good fit to a data set of Bacillus cereus genomes and estimate several recombination properties, including the rate of bias in recombination. All the methods described in this article are implemented in a software package that is freely available for download at http://code.google.com/p/clonalorigin/.

16.

Selection of relevant features from amino acids enables development of robust classifiers

Rishi Das Roy Debasis Dash 《Amino acids》2014,46(5):1343-1351

Machine learning (ML) has been extensively applied to develop models and to understand high-throughput data of biological processes. However, new ML models, trained with novel experimental results, need to be built regularly for more precise predictions. ML methods can build models from numeric data, whereas biological data are generally textual (DNA, protein sequences) or images and need feature calculation algorithms to generate quantitative features. Programming skills along with domain knowledge are required to develop these algorithms. Therefore, the process of knowledge discovery through ML is slowed by the lack of generic tools to construct features and to build models directly from the data. Hence, we developed a schema that calculates about 5,000 features, selects relevant features and develops protein classifiers from the training data. To demonstrate the general applicability and robustness of our method, fungal adhesins and nuclear receptor proteins were used for building classifiers which outperformed existing classifiers when tested on independent data. Next, we built a classifier for mitochondrial proteins of Plasmodium falciparum, which causes human malaria, because the latest corresponding classifiers are not publicly accessible. Our classifier attained 98.18% accuracy and a Matthews correlation coefficient of 0.95 by fivefold cross-validation and outperformed existing classifiers on an independent test set. We implemented this schema as a user-friendly, open-source application, Pro-Gyan (http://code.google.com/p/pro-gyan/), to build and share executable classifiers without programming knowledge.
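For readers unfamiliar with the Matthews correlation coefficient quoted above, it can be computed from a binary confusion matrix as sketched below. The function name is an assumption, and returning 0.0 when a marginal total is zero is a common convention rather than something taken from the paper.

```python
from math import sqrt

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator

print(matthews_corrcoef(50, 50, 0, 0))  # 1.0
```

Unlike accuracy, the coefficient stays informative when the two classes have very different sizes.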

17.

Variable complementary network: a novel approach for identifying biomarkers and their mutual associations (cited by: 1 total; 0 self-citations; 1 by others)

Hong-Dong Li Qing-Song Xu Wan Zhang Yi-Zeng Liang 《Metabolomics : Official journal of the Metabolomic Society》2012,8(6):1218-1226

Biological variables involved in a disease process often correlate with each other through, for example, shared metabolic pathways. In addition to their correlation, these variables contain complementary information that is particularly useful for disease classification and prediction. However, complementary information between variables is rarely explored. Therefore, methods for investigating variables' complementary information are needed. We propose a model population analysis approach that aggregates information from a number of classification models, obtained with the help of Monte Carlo sampling in variable space, to quantitatively calculate the complementary information between variables. We then assemble this complementary information to construct a variable complementary network (VCN) that gives an overall visualization of how biological variables complement each other. Using a simulated dataset and two metabolomics datasets, we show that the complementary information is effective in biomarker discovery and that mutual associations of metabolites revealed by this method can provide information for exploring altered metabolic pathways. (The source codes for implementing VCN in MATLAB are freely available at: http://code.google.com/p/vcn2011/.)

18.

Integrating biological pathways and genomic profiles with ChiBE 2

Özgün Babur Ugur Dogrusoz Merve Çakır Bülent Arman Aksoy Nikolaus Schultz Chris Sander Emek Demir 《BMC genomics》2014,15(1)

Background

Dynamic visual exploration of detailed pathway information can help researchers digest and interpret complex mechanisms and genomic datasets.

Results

ChiBE is a free, open-source software tool for visualizing, querying, and analyzing human biological pathways in BioPAX format. The recently released version 2 can search for neighborhoods, paths between molecules, and common regulators/targets of molecules, on large integrated cellular networks in the Pathway Commons database as well as in local BioPAX models. Resulting networks can be automatically laid out for visualization using a graphically rich, process-centric notation. Profiling data from the cBioPortal for Cancer Genomics and expression data from the Gene Expression Omnibus can be overlaid on these networks.

Conclusions

ChiBE’s new capabilities are organized around a genomics-oriented workflow and offer a unique comprehensive pathway analysis solution for genomics researchers. The software is freely available at http://code.google.com/p/chibe.

19.

MethylAction: detecting differentially methylated regions that distinguish biological subtypes

Jeffrey M. Bhasin Bo Hu Angela H. Ting 《Nucleic acids research》2016,44(1):106-116

DNA methylation differences capture substantial information about the molecular and gene-regulatory states among biological subtypes. Enrichment-based next generation sequencing methods such as MBD-isolated genome sequencing (MiGS) and MeDIP-seq are appealing for studying DNA methylation genome-wide in order to distinguish between biological subtypes. However, current analytic tools do not provide optimal features for analyzing three-group or larger study designs. MethylAction addresses this need by detecting all possible patterns of statistically significant hyper- and hypomethylation in comparisons involving any number of groups. Crucially, significance is established at the level of differentially methylated regions (DMRs), and bootstrapping determines false discovery rates (FDRs) associated with each pattern. We demonstrate this functionality in a four-group comparison among benign prostate and three clinical subtypes of prostate cancer and show that the bootstrap FDRs are highly useful in selecting the most robust patterns of DMRs. Compared to existing tools that are limited to two-group comparisons, MethylAction detects more DMRs with strong differential methylation measurements confirmed by whole genome bisulfite sequencing and offers a better balance between precision and recall in cross-cohort comparisons. MethylAction is available as an R package at http://jeffbhasin.github.io/methylaction.
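One simple way to picture "all possible patterns" across several groups is to label each group as relatively hypomethylated (0) or hypermethylated (1) and list every labeling that contains a contrast. This encoding is an assumption made for illustration and may differ from how MethylAction represents patterns internally; the function name is hypothetical.

```python
from itertools import product
from typing import List

def group_patterns(n_groups: int) -> List[str]:
    """All 0/1 labelings of the groups, excluding the two labelings with no contrast."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    return [
        "".join(bits)
        for bits in product("01", repeat=n_groups)
        if len(set(bits)) > 1  # drop all-0 and all-1
    ]

patterns = group_patterns(4)
print(len(patterns))  # 14
```

Under this encoding a four-group design yields 2^4 - 2 = 14 candidate patterns, which illustrates why a per-pattern false discovery rate is useful for ranking them.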

20.

The relationship of single-strand breaks in DNA to breast cancer risk and to tissue concentrations of oestrogens

Mathavi Sahadevan Oukseub Lee Miguel Muzzio Belinda Phan Lisa Jacobs Nagi Khouri 《Biomarkers》2017,22(7):689-697

Context: Clinical study of breast cancer patients in Chicago, IL, USA.

Objective: Ascertain the utility of measurements of single-strand breaks (SSB) in DNA for assessment of breast cancer risk.

Methods: Fine-needle aspirates of the breast, SSB by nick translation, percent breast density (PBD), Gail model risk, cumulative methylation index (CMI), enzymes of DNA repair and tissue antioxidants.

Results: DNA repair enzymes and 4-hydroxyestradiol were negatively associated with SSB; CMI and PBD were positively associated.

Conclusions: Quantitative measurement of SSBs by this procedure indicates the relative number of SSBs and is related to promoter methylation, antioxidant availability and percent breast density.