期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

quantro: a data-driven approach to guide the choice of an appropriate normalization method

Stephanie C. Hicks Rafael A. Irizarry 《Genome biology》2015,16(1)

Normalization is an essential step in the analysis of high-throughput data. Multi-sample global normalization methods, such as quantile normalization, have been successfully used to remove technical variation. However, these methods rely on the assumption that observed global changes across samples are due to unwanted technical variability. Applying global normalization methods has the potential to remove biologically driven variation. Currently, it is up to the subject matter experts to determine if the stated assumptions are appropriate. Here, we propose a data-driven alternative. We demonstrate the utility of our method (quantro) through examples and simulations. A software implementation is available from http://www.bioconductor.org/packages/release/bioc/html/quantro.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0679-0) contains supplementary material, which is available to authorized users. 相似文献

2.

NanoMethViz: An R/Bioconductor package for visualizing long-read methylation data

Shian Su Quentin Gouil Marnie E. Blewitt Dianne Cook Peter F. Hickey Matthew E. Ritchie 《PLoS computational biology》2021,17(10)

A key benefit of long-read nanopore sequencing technology is the ability to detect modified DNA bases, such as 5-methylcytosine. The lack of R/Bioconductor tools for the effective visualization of nanopore methylation profiles between samples from different experimental groups led us to develop the NanoMethViz R package. Our software can handle methylation output generated from a range of different methylation callers and manages large datasets using a compressed data format. To fully explore the methylation patterns in a dataset, NanoMethViz allows plotting of data at various resolutions. At the sample-level, we use dimensionality reduction to look at the relationships between methylation profiles in an unsupervised way. We visualize methylation profiles of classes of features such as genes or CpG islands by scaling them to relative positions and aggregating their profiles. At the finest resolution, we visualize methylation patterns across individual reads along the genome using the spaghetti plot and heatmaps, allowing users to explore particular genes or genomic regions of interest. In summary, our software makes the handling of methylation signal more convenient, expands upon the visualization options for nanopore data and works seamlessly with existing methylation analysis tools available in the Bioconductor project. Our software is available at https://bioconductor.org/packages/NanoMethViz. 相似文献

3.

SANTA: Quantifying the Functional Content of Molecular Networks

Alex J. Cornish Florian Markowetz 《PLoS computational biology》2014,10(9)

Linking networks of molecular interactions to cellular functions and phenotypes is a key goal in systems biology. Here, we adapt concepts of spatial statistics to assess the functional content of molecular networks. Based on the guilt-by-association principle, our approach (called SANTA) quantifies the strength of association between a gene set and a network, and functionally annotates molecular networks like other enrichment methods annotate lists of genes. As a general association measure, SANTA can (i) functionally annotate experimentally derived networks using a collection of curated gene sets and (ii) annotate experimentally derived gene sets using a collection of curated networks, as well as (iii) prioritize genes for follow-up analyses. We exemplify the efficacy of SANTA in several case studies using the S. cerevisiae genetic interaction network and genome-wide RNAi screens in cancer cell lines. Our theory, simulations, and applications show that SANTA provides a principled statistical way to quantify the association between molecular networks and cellular functions and phenotypes. SANTA is available from http://bioconductor.org/packages/release/bioc/html/SANTA.html. 相似文献

4.

Regulatory network-based imputation of dropouts in single-cell RNA sequencing data

Ana Carolina Leote Xiaohui Wu Andreas Beyer 《PLoS computational biology》2022,18(2)

相似文献

5.

ggtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data

Shuangbin Xu Zehan Dai Pingfan Guo Xiaocong Fu Shanshan Liu Lang Zhou Wenli Tang Tingze Feng Meijun Chen Li Zhan Tianzhi Wu Erqiang Hu Yong Jiang Xiaochen Bo Guangchuang Yu 《Molecular biology and evolution》2021,38(9):4039

We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers and allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. GgtreeExtra is a universal tool for tree data visualization. It extends the applications of the phylogenetic tree in different disciplines by making more domain-specific data to be available to visualize and interpret in the evolutionary context. 相似文献

6.

Using high-density DNA methylation arrays to profile copy number alterations

Andrew Feber Paul Guilhamon Matthias Lechner Tim Fenton Gareth A Wilson Christina Thirlwell Tiffany J Morris Adrienne M Flanagan Andrew E Teschendorff John D Kelly Stephan Beck 《Genome biology》2014,15(2):R30

The integration of genomic and epigenomic data is an increasingly popular approach for studying the complex mechanisms driving cancer development. We have developed a method for evaluating both methylation and copy number from high-density DNA methylation arrays. Comparing copy number data from Infinium HumanMethylation450 BeadChips and SNP arrays, we demonstrate that Infinium arrays detect copy number alterations with the sensitivity of SNP platforms. These results show that high-density methylation arrays provide a robust and economic platform for detecting copy number and methylation changes in a single experiment. Our method is available in the ChAMP Bioconductor package: http://www.bioconductor.org/packages/2.13/bioc/html/ChAMP.html. 相似文献

7.

QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data

Qian Zhou Xiaoquan Su Anhui Wang Jian Xu Kang Ning 《PloS one》2013,8(4)

Next-generation sequencing (NGS) technologies have been widely used in life sciences. However, several kinds of sequencing artifacts, including low-quality reads and contaminating reads, were found to be quite common in raw sequencing data, which compromise downstream analysis. Therefore, quality control (QC) is essential for raw NGS data. However, although a few NGS data quality control tools are publicly available, there are two limitations: First, the processing speed could not cope with the rapid increase of large data volume. Second, with respect to removing the contaminating reads, none of them could identify contaminating sources de novo, and they rely heavily on prior information of the contaminating species, which is usually not available in advance. Here we report QC-Chain, a fast, accurate and holistic NGS data quality-control method. The tool synergeticly comprised of user-friendly tools for (1) quality assessment and trimming of raw reads using Parallel-QC, a fast read processing tool; (2) identification, quantification and filtration of unknown contamination to get high-quality clean reads. It was optimized based on parallel computation, so the processing speed is significantly higher than other QC methods. Experiments on simulated and real NGS data have shown that reads with low sequencing quality could be identified and filtered. Possible contaminating sources could be identified and quantified de novo, accurately and quickly. Comparison between raw reads and processed reads also showed that subsequent analyses (genome assembly, gene prediction, gene annotation, etc.) results based on processed reads improved significantly in completeness and accuracy. As regard to processing speed, QC-Chain achieves 7–8 time speed-up based on parallel computation as compared to traditional methods. Therefore, QC-Chain is a fast and useful quality control tool for read quality process and de novo contamination filtration of NGS reads, which could significantly facilitate downstream analysis. QC-Chain is publicly available at: http://www.computationalbioenergy.org/qc-chain.html. 相似文献

8.

SomatiCA: Identifying,Characterizing and Quantifying Somatic Copy Number Aberrations from Cancer Genome Sequencing Data

Mengjie Chen Murat Gunel Hongyu Zhao 《PloS one》2013,8(11)

Whole genome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. However, analysis of somatic copy-number changes from sequencing data is still challenging because of insufficient sequencing coverage, unknown tumor sample purity and subclonal heterogeneity. Here we describe a computational framework, named SomatiCA, which explicitly accounts for tumor purity and subclonality in the analysis of somatic copy-number profiles. Taking read depths (RD) and lesser allele frequencies (LAF) as input, SomatiCA will output 1) admixture rate for each tumor sample, 2) somatic allelic copy-number for each genomic segment, 3) fraction of tumor cells with subclonal change in each somatic copy number aberration (SCNA), and 4) a list of substantial genomic aberration events including gain, loss and LOH. SomatiCA is available as a Bioconductor R package at http://www.bioconductor.org/packages/2.13/bioc/html/SomatiCA.html. 相似文献

9.

CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses

Vanja Haberle Alistair R.R. Forrest Yoshihide Hayashizaki Piero Carninci Boris Lenhard 《Nucleic acids research》2015,43(8):e51

相似文献

10.

Memes: A motif analysis environment in R using tools from the MEME Suite

Spencer L. Nystrom Daniel J. McKay 《PLoS computational biology》2021,17(9)

Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package that provides a seamless R interface to a selection of popular MEME Suite tools. memes provides a novel “data aware” interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes. 相似文献

11.

BubbleTree: an intuitive visualization to elucidate tumoral aneuploidy and clonality using next generation sequencing data

Wei Zhu Michael Kuziora Todd Creasy Zhongwu Lai Christopher Morehouse Xiang Guo Yinong Sebastian Dong Shen Jiaqi Huang Jonathan R. Dry Feng Xue Liyan Jiang Yihong Yao Brandon W. Higgs 《Nucleic acids research》2016,44(4):e38

Tumors are characterized by properties of genetic instability, heterogeneity, and significant oligoclonality. Elucidating this intratumoral heterogeneity is challenging but important. In this study, we propose a framework, BubbleTree, to characterize the tumor clonality using next generation sequencing (NGS) data. BubbleTree simultaneously elucidates the complexity of a tumor biopsy, estimating cancerous cell purity, tumor ploidy, allele-specific copy number, and clonality and represents this in an intuitive graph. We further developed a three-step heuristic method to automate the interpretation of the BubbleTree graph, using a divide-and-conquer strategy. In this study, we demonstrated the performance of BubbleTree with comparisons to similar commonly used tools such as THetA2, ABSOLUTE, AbsCN-seq and ASCAT, using both simulated and patient-derived data. BubbleTree outperformed these tools, particularly in identifying tumor subclonal populations and polyploidy. We further demonstrated BubbleTree''s utility in tracking clonality changes from patients’ primary to metastatic tumor and dating somatic single nucleotide and copy number variants along the tumor clonal evolution. Overall, the BubbleTree graph and corresponding model is a powerful approach to provide a comprehensive spectrum of the heterogeneous tumor karyotype in human tumors. BubbleTree is R-based and freely available to the research community (https://www.bioconductor.org/packages/release/bioc/html/BubbleTree.html). 相似文献

12.

A multi-objective genetic algorithm to find active modules in multiplex biological networks

Elva María Novoa-del-Toro Efrn Mezura-Montes Matthieu Vignes Morgane Trzol Frdrique Magdinier Laurent Tichit Anaïs Baudot 《PLoS computational biology》2021,17(8)

The identification of subnetworks of interest—or active modules—by integrating biological networks with molecular profiles is a key resource to inform on the processes perturbed in different cellular conditions. We here propose MOGAMUN, a Multi-Objective Genetic Algorithm to identify active modules in MUltiplex biological Networks. MOGAMUN optimizes both the density of interactions and the scores of the nodes (e.g., their differential expression). We compare MOGAMUN with state-of-the-art methods, representative of different algorithms dedicated to the identification of active modules in single networks. MOGAMUN identifies dense and high-scoring modules that are also easier to interpret. In addition, to our knowledge, MOGAMUN is the first method able to use multiplex networks. Multiplex networks are composed of different layers of physical and functional relationships between genes and proteins. Each layer is associated to its own meaning, topology, and biases; the multiplex framework allows exploiting this diversity of biological networks. We applied MOGAMUN to identify cellular processes perturbed in Facio-Scapulo-Humeral muscular Dystrophy, by integrating RNA-seq expression data with a multiplex biological network. We identified different active modules of interest, thereby providing new angles for investigating the pathomechanisms of this disease.Availability: MOGAMUN is available at https://github.com/elvanov/MOGAMUN and as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/MOGAMUN.html. Contact: rf.uma-vinu@toduab.siana 相似文献

13.

Accelerating Bayesian Hierarchical Clustering of Time Series Data with a Randomised Algorithm

Robert Darkins Emma J. Cooke Zoubin Ghahramani Paul D. W. Kirk David L. Wild Richard S. Savage 《PloS one》2013,8(4)

相似文献

14.

Identification of active regulatory regions from DNA methylation data

Lukas Burger Dimos Gaidatzis Dirk Schübeler Michael B. Stadler 《Nucleic acids research》2013,41(16):e155

相似文献

15.

mAPKL: R/ Bioconductor package for detecting gene exemplars and revealing their characteristics

Argiris Sakellariou George Spyrou 《BMC bioinformatics》2015,16(1)

Background

So far many algorithms have been proposed towards the detection of significant genes in microarray analysis problems. Several of those approaches are freely available as R-packages though their engagement in gene expression analysis by non-bioinformaticians is usually a frustrating task. Besides, only some of those packages offer a complete suite of tools starting from initial data import and ending to analysis report. Here we present an R/Bioconductor package that implements a hybrid gene selection method along with a bunch of functions to facilitate a thorough and convenient gene expression profiling analysis.

Results

mAPKL is an open-source R/Bioconductor package that implements the mAP-KL hybrid gene selection method. The advantage of this method is that selects a small number of gene exemplars while achieving comparable classification results to other well established algorithms on a variety of datasets and dataset sizes. The mAPKL package is accompanied with extra functionalities including (i) solid data import; (ii) data sampling following a user-defined proportion; (iii) preprocessing through several normalization and transformation alternatives; (iv) classification with the aid of SVM and performance evaluation; (v) network analysis of the significant genes (exemplars), including degree of centrality, closeness, betweeness, clustering coefficient as well as the construction of an edge list table; (vi) gene annotation analysis, (vii) pathway analysis and (viii) auto-generated analysis reporting.

Conclusions

Users are able to run a thorough gene expression analysis in a timely manner starting from raw data and concluding to network characteristics of the selected gene exemplars. Detailed instructions and example data are provided in the R package, which is freely available at Bioconductor under the GPL-2 or later license http://www.bioconductor.org/packages/3.1/bioc/html/mAPKL.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0719-5) contains supplementary material, which is available to authorized users. 相似文献

16.

From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data

Jilong Li Jie Hou Lin Sun Jordan Maximillian Wilkins Yuan Lu Chad E. Niederhuth Benjamin Ryan Merideth Thomas P. Mawhinney Valeri V. Mossine C. Michael Greenlief John C. Walker William R. Folk Mark Hannink Dennis B. Lubahn James A. Birchler Jianlin Cheng 《PloS one》2015,10(4)

RNA-Seq techniques generate hundreds of millions of short RNA reads using next-generation sequencing (NGS). These RNA reads can be mapped to reference genomes to investigate changes of gene expression but improved procedures for mining large RNA-Seq datasets to extract valuable biological knowledge are needed. RNAMiner—a multi-level bioinformatics protocol and pipeline—has been developed for such datasets. It includes five steps: Mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from Human, Mouse, Arabidopsis thaliana, and Drosophila melanogaster cells, and successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at http://calla.rnet.missouri.edu/rnaminer/index.html. 相似文献

17.

A Graphical Modelling Approach to the Dissection of Highly Correlated Transcription Factor Binding Site Profiles

Robert Stojnic Audrey Qiuyan Fu Boris Adryan 《PLoS computational biology》2012,8(11)

相似文献

18.

Corset: enabling differential gene expression analysis for de novo assembled transcriptomes

Nadia M Davidson Alicia Oshlack 《Genome biology》2014,15(7)

相似文献

19.

Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome

Nicolas Philippe Elias Bou Samra Anthony Boureux Alban Mancheron Florence Rufflé Qiang Bai John De Vos Eric Rivals Thérèse Commes 《Nucleic acids research》2014,42(5):2820-2832

相似文献

20.

Tn-Seq Explorer: A Tool for Analysis of High-Throughput Sequencing Data of Transposon Mutant Libraries

Sina Solaimanpour Felipe Sarmiento Jan Mrázek 《PloS one》2015,10(5)

Tn-seq is a high throughput technique for analysis of transposon mutant libraries. Tn-seq Explorer was developed as a convenient and easy-to-use package of tools for exploration of the Tn-seq data. In a typical application, the user will have obtained a collection of sequence reads adjacent to transposon insertions in a reference genome. The reads are first aligned to the reference genome using one of the tools available for this task. Tn-seq Explorer reads the alignment and the gene annotation, and provides the user with a set of tools to investigate the data and identify possibly essential or advantageous genes as those that contain significantly low counts of transposon insertions. Emphasis is placed on providing flexibility in selecting parameters and methodology most appropriate for each particular dataset. Tn-seq Explorer is written in Java as a menu-driven, stand-alone application. It was tested on Windows, Mac OS, and Linux operating systems. The source code is distributed under the terms of GNU General Public License. The program and the source code are available for download at http://www.cmbl.uga.edu/downloads/programs/Tn_seq_Explorer/ and https://github.com/sina-cb/Tn-seqExplorer. 相似文献