期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions

Kasper D Hansen Benjamin Langmead Rafael A Irizarry 《Genome biology》2012,13(10):R83

DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, complexity of the data, and lack of comprehensive analytical tools are major challenges that keep this technology from becoming widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low coverage data, appropriately handling biological replicates. BSmooth is open source software, and can be downloaded from http://rafalab.jhsph.edu/bsmooth. 相似文献

2.

STATdb: A Specialised Resource for the STATome

C. Pawan K. Patro Asif M. Khan Tin Wee Tan Xin-Yuan Fu 《PloS one》2014,9(8)

相似文献

3.

FadE: whole genome methylation analysis for multiple sequencing platforms

Tade Souaiaia Zheng Zhang Ting Chen 《Nucleic acids research》2013,41(1):e14

DNA methylation plays a central role in genomic regulation and disease. Sodium bisulfite treatment (SBT) causes unmethylated cytosines to be sequenced as thymine, which allows methylation levels to reflected in the number of ‘C’-‘C’ alignments covering reference cytosines. Di-base color reads produced by lifetech’s SOLiD sequencer provide unreliable results when translated to bases because single sequencing errors effect the downstream sequence. We describe FadE, an algorithm to accurately determine genome-wide methylation rates directly in color or nucleotide space. FadE uses SBT unmethylated and untreated data to determine background error rates and incorporate them into a model which uses Newton–Raphson optimization to estimate the methylation rate and provide a credible interval describing its distribution at every reference cytosine. We sequenced two slides of human fibroblast cell-line bisulfite-converted fragment library with the SOLiD sequencer to investigate genome-wide methylation levels. FadE reported widespread differences in methylation levels across CpG islands and a large number of differentially methylated regions adjacent to genes which compares favorably to the results of an investigation on the same cell-line using nucleotide-space reads at higher coverage levels, suggesting that FadE is an accurate method to estimate genome-wide methylation with color or nucleotide reads. http://code.google.com/p/fade/. 相似文献

4.

AllerHunter: A SVM-Pairwise System for Assessment of Allergenicity and Allergic Cross-Reactivity in Proteins

Hon Cheng Muh Joo Chuan Tong Martti T. Tammi 《PloS one》2009,4(6)

Allergy is a major health problem in industrialized countries. The number of transgenic food crops is growing rapidly creating the need for allergenicity assessment before they are introduced into human food chain. While existing bioinformatic methods have achieved good accuracies for highly conserved sequences, the discrimination of allergens and non-allergens from allergen-like non-allergen sequences remains difficult. We describe AllerHunter, a web-based computational system for the assessment of potential allergenicity and allergic cross-reactivity in proteins. It combines an iterative pairwise sequence similarity encoding scheme with SVM as the discriminating engine. The pairwise vectorization framework allows the system to model essential features in allergens that are involved in cross-reactivity, but not limited to distinct sets of physicochemical properties. The system was rigorously trained and tested using 1,356 known allergen and 13,449 putative non-allergen sequences. Extensive testing was performed for validation of the prediction models. The system is effective for distinguishing allergens and non-allergens from allergen-like non-allergen sequences. Testing results showed that AllerHunter, with a sensitivity of 83.4% and specificity of 96.4% (accuracy = 95.3%, area under the receiver operating characteristic curve AROC = 0.928±0.004 and Matthew''s correlation coefficient MCC = 0.738), performs significantly better than a number of existing methods using an independent dataset of 1443 protein sequences. AllerHunter is available at http://tiger.dbs.nus.edu.sg/AllerHunter 相似文献

5.

The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data

Xiaojia Tang Saurabh Baheti Khader Shameer Kevin J. Thompson Quin Wills Nifang Niu Ilona N. Holcomb Stephane C. Boutet Ramesh Ramakrishnan Jennifer M. Kachergus Jean-Pierre A. Kocher Richard M. Weinshilboum Liewei Wang E.?Aubrey Thompson Krishna R. Kalari 《Nucleic acids research》2014,42(22):e172

Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA but, no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MD-MB231 breast cancer cell-line delineated variant heterogeneity among the single-cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/. 相似文献

6.

Realistic artificial DNA sequences as negative controls for computational genomics

Juan Caballero Arian F. A. Smit Leroy Hood Gustavo Glusman 《Nucleic acids research》2014,42(12):e99

A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/. 相似文献

7.

Rising from the crypt: decreasing DNA methylation during differentiation of the small intestine

Sean M Cullen Margaret A Goodell 《Genome biology》2013,14(5):116

The differentiation of intestinal stem cells involves few DNA methylation changes, assayed by bisulfite sequencing, in contrast to other adult somatic stem cell hierarchies.Please see related Research article: http://genomebiology.com/2013/14/5/R50 相似文献

8.

SNPAAMapperT2K: A genome-wide SNP downstream analysis and annotation pipeline for species annotated with NCBI.tbl data files

Yongsheng Bai 《Bioinformation》2014,10(11):711-715

相似文献

9.

MOABS: model based analysis of bisulfite sequencing data

Deqiang Sun Yuanxin Xi Benjamin Rodriguez Hyun Jung Park Pan Tong Mira Meong Margaret A Goodell Wei Li 《Genome biology》2014,15(2):R38

Bisulfite sequencing (BS-seq) is the gold standard for studying genome-wide DNA methylation. We developed MOABS to increase the speed, accuracy, statistical power and biological relevance of BS-seq data analysis. MOABS detects differential methylation with 10-fold coverage at single-CpG resolution based on a Beta-Binomial hierarchical model and is capable of processing two billion reads in 24 CPU hours. Here, using simulated and real BS-seq data, we demonstrate that MOABS outperforms other leading algorithms, such as Fisher’s exact test and BSmooth. Furthermore, MOABS analysis can be easily extended to differential 5hmC analysis using RRBS and oxBS-seq. MOABS is available at http://code.google.com/p/moabs/. 相似文献

10.

Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information

Iksoo Huh Xingyu Yang Taesung Park Soojin V Yi 《BMC genomics》2014,15(1)

Background

Whole genome sequencing of bisulfite converted DNA (‘methylC-seq’) method provides comprehensive information of DNA methylation. An important application of these whole genome methylation maps is classifying each position as a methylated versus non-methylated nucleotide. A widely used current method for this purpose, the so-called binomial method, is intuitive and straightforward, but lacks power when the sequence coverage and the genome-wide methylation level are low. These problems present a particular challenge when analyzing sparsely methylated genomes, such as those of many invertebrates and plants.

Results

We demonstrate that the number of sequence reads per position from methylC-seq data displays a large variance and can be modeled as a shifted negative binomial distribution. We also show that DNA methylation levels of adjacent CpG sites are correlated, and this similarity in local DNA methylation levels extends several kilobases. Taking these observations into account, we propose a new method based on Bayesian classification to infer DNA methylation status while considering the neighborhood DNA methylation levels of a specific site. We show that our approach has higher sensitivity and better classification performance than the binomial method via multiple analyses, including computational simulations, Area Under Curve (AUC) analyses, and improved consistencies across biological replicates. This method is especially advantageous in the analyses of sparsely methylated genomes with low coverage.

Conclusions

Our method improves the existing binomial method for binary methylation calls by utilizing a posterior odds framework and incorporating local methylation information. This method should be widely applicable to the analyses of methylC-seq data from diverse sparsely methylated genomes. Bis-Class and example data are provided at a dedicated website (http://bibs.snu.ac.kr/software/Bisclass).

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-608) contains supplementary material, which is available to authorized users. 相似文献

11.

A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees

Jakob McBroome Bryan Thornlow Angie S Hinrichs Alexander Kramer Nicola De Maio Nick Goldman David Haussler Russell Corbett-Detig Yatish Turakhia 《Molecular biology and evolution》2021,38(12):5819

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus’ evolutionary history using public data. We also present matUtils—a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively. 相似文献

12.

A Bayesian Assignment Method for Ambiguous Bisulfite Short Reads

Hong Tran Xiaowei Wu Saima Tithi Ming-an Sun Hehuang Xie Liqing Zhang 《PloS one》2016,11(3)

DNA methylation is an epigenetic modification critical for normal development and diseases. The determination of genome-wide DNA methylation at single-nucleotide resolution is made possible by sequencing bisulfite treated DNA with next generation high-throughput sequencing. However, aligning bisulfite short reads to a reference genome remains challenging as only a limited proportion of them (around 50–70%) can be aligned uniquely; a significant proportion, known as multireads, are mapped to multiple locations and thus discarded from downstream analyses, causing financial waste and biased methylation inference. To address this issue, we develop a Bayesian model that assigns multireads to their most likely locations based on the posterior probability derived from information hidden in uniquely aligned reads. Analyses of both simulated data and real hairpin bisulfite sequencing data show that our method can effectively assign approximately 70% of the multireads to their best locations with up to 90% accuracy, leading to a significant increase in the overall mapping efficiency. Moreover, the assignment model shows robust performance with low coverage depth, making it particularly attractive considering the prohibitive cost of bisulfite sequencing. Additionally, results show that longer reads help improve the performance of the assignment model. The assignment model is also robust to varying degrees of methylation and varying sequencing error rates. Finally, incorporating prior knowledge on mutation rate and context specific methylation level into the assignment model increases inference accuracy. The assignment model is implemented in the BAM-ABS package and freely available at https://github.com/zhanglabvt/BAM_ABS. 相似文献

13.

COHCAP: an integrative genomic pipeline for single-nucleotide resolution DNA methylation analysis

Charles D. Warden Heehyoung Lee Joshua D. Tompkins Xiaojin Li Charles Wang Arthur D. Riggs Hua Yu Richard Jove Yate-Ching Yuan 《Nucleic acids research》2013,41(11):e117

COHCAP (City of Hope CpG Island Analysis Pipeline) is an algorithm to analyze single-nucleotide resolution DNA methylation data produced by either an Illumina methylation array or targeted bisulfite sequencing. The goal of the COHCAP algorithm is to identify CpG islands that show a consistent pattern of methylation among CpG sites. COHCAP is currently the only DNA methylation package that provides integration with gene expression data to identify a subset of CpG islands that are most likely to regulate downstream gene expression, and it can generate lists of differentially methylated CpG islands with ∼50% concordance with gene expression from both cell line data and heterogeneous patient data. For example, this article describes known breast cancer biomarkers (such as estrogen receptor) with a negative correlation between DNA methylation and gene expression. COHCAP also provides visualization for quality control metrics, regions of differential methylation and correlation between methylation and gene expression. This software is freely available at https://sourceforge.net/projects/cohcap/. 相似文献

14.

A Model-Based Clustering Method for Genomic Structural Variant Prediction and Genotyping Using Paired-End Sequencing Data

Matthew Hayes Yoon Soo Pyon Jing Li 《PloS one》2012,7(12)

Structural variation (SV) has been reported to be associated with numerous diseases such as cancer. With the advent of next generation sequencing (NGS) technologies, various types of SV can be potentially identified. We propose a model based clustering approach utilizing a set of features defined for each type of SV events. Our method, termed SVMiner, not only provides a probability score for each candidate, but also predicts the heterozygosity of genomic deletions. Extensive experiments on genome-wide deep sequencing data have demonstrated that SVMiner is robust against the variability of a single cluster feature, and it significantly outperforms several commonly used SV detection programs. SVMiner can be downloaded from http://cbc.case.edu/svminer/. 相似文献

15.

MethylAction: detecting differentially methylated regions that distinguish biological subtypes

Jeffrey M. Bhasin Bo Hu Angela H. Ting 《Nucleic acids research》2016,44(1):106-116

DNA methylation differences capture substantial information about the molecular and gene-regulatory states among biological subtypes. Enrichment-based next generation sequencing methods such as MBD-isolated genome sequencing (MiGS) and MeDIP-seq are appealing for studying DNA methylation genome-wide in order to distinguish between biological subtypes. However, current analytic tools do not provide optimal features for analyzing three-group or larger study designs. MethylAction addresses this need by detecting all possible patterns of statistically significant hyper- and hypo- methylation in comparisons involving any number of groups. Crucially, significance is established at the level of differentially methylated regions (DMRs), and bootstrapping determines false discovery rates (FDRs) associated with each pattern. We demonstrate this functionality in a four-group comparison among benign prostate and three clinical subtypes of prostate cancer and show that the bootstrap FDRs are highly useful in selecting the most robust patterns of DMRs. Compared to existing tools that are limited to two-group comparisons, MethylAction detects more DMRs with strong differential methylation measurements confirmed by whole genome bisulfite sequencing and offers a better balance between precision and recall in cross-cohort comparisons. MethylAction is available as an R package at http://jeffbhasin.github.io/methylaction. 相似文献

16.

miRspring: a compact standalone research tool for analyzing miRNA-seq data

David T. Humphreys Catherine M. Suter 《Nucleic acids research》2013,41(15):e147

High-throughput sequencing for microRNA (miRNA) profiling has revealed a vast complexity of miRNA processing variants, but these are difficult to discern for those without bioinformatics expertise and large computing capability. In this article, we present miRNA Sequence Profiling (miRspring) (http://mirspring.victorchang.edu.au), a software solution that creates a small portable research document that visualizes, calculates and reports on the complexities of miRNA processing. We designed an index-compression algorithm that allows the miRspring document to reproduce a complete miRNA sequence data set while retaining a small file size (typically <3 MB). Through analysis of 73 public data sets, we demonstrate miRspring’s features in assessing quality parameters, miRNA cluster expression levels and miRNA processing. Additionally, we report on a new class of miRNA variants, which we term seed-isomiRs, identified through the novel visualization tools of the miRspring document. Further investigation identified that ∼30% of human miRBase entries are likely to have a seed-isomiR. We believe that miRspring will be a highly useful research tool that will enhance the analysis of miRNA data sets and thus increase our understanding of miRNA biology. 相似文献

17.

methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles

Altuna Akalin Matthias Kormaksson Sheng Li Francine E Garrett-Bakelman Maria E Figueroa Ari Melnick Christopher E Mason 《Genome biology》2012,13(10):R87

DNA methylation is a chemical modification of cytosine bases that is pivotal for gene regulation, cellular specification and cancer development. Here, we describe an R package, methylKit, that rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and hydroxymethylation sequencing experiments. methylKit includes functions for clustering, sample quality visualization, differential methylation analysis and annotation features, thus automating and simplifying many of the steps for discerning statistically significant bases or regions of DNA methylation. Finally, we demonstrate methylKit on breast cancer data, in which we find statistically significant regions of differential methylation and stratify tumor subtypes. methylKit is available at http://code.google.com/p/methylkit. 相似文献

18.

CNVannotator: A Comprehensive Annotation Server for Copy Number Variation in the Human Genome

Min Zhao Zhongming Zhao 《PloS one》2013,8(11)

Copy number variation (CNV) is one of the most prevalent genetic variations in the genome, leading to an abnormal number of copies of moderate to large genomic regions. High-throughput technologies such as next-generation sequencing often identify thousands of CNVs involved in biological or pathological processes. Despite the growing demand to filter and classify CNVs by factors such as frequency in population, biological features, and function, surprisingly, no online web server for CNV annotations has been made available to the research community. Here, we present CNVannotator, a web server that accepts an input set of human genomic positions in a user-friendly tabular format. CNVannotator can perform genomic overlaps of the input coordinates using various functional features, including a list of the reported 356,817 common CNVs, 181,261 disease CNVs, as well as, 140,342 SNPs from genome-wide association studies. In addition, CNVannotator incorporates 2,211,468 genomic features, including ENCODE regulatory elements, cytoband, segmental duplication, genome fragile site, pseudogene, promoter, enhancer, CpG island, and methylation site. For cancer research community users, CNVannotator can apply various filters to retrieve a subgroup of CNVs pinpointed in hundreds of tumor suppressor genes and oncogenes. In total, 5,277,234 unique genomic coordinates with functional features are available to generate an output in a plain text format that is free to download. In summary, we provide a comprehensive web resource for human CNVs. The annotated results along with the server can be accessed at http://bioinfo.mc.vanderbilt.edu/CNVannotator/. 相似文献

19.

Time for a standardized system of reporting sites of genomic methylation

Richard Saffery Lavinia Gordon 《Genome biology》2015,16(1)

相似文献

20.

GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data

Kevin C. Dorff Nyasha Chambwe Zachary Zeno Manuele Simi Rita Shaknovich Fabien Campagne 《PloS one》2013,8(7)

We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. 相似文献