期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

NanoMethViz: An R/Bioconductor package for visualizing long-read methylation data

Shian Su Quentin Gouil Marnie E. Blewitt Dianne Cook Peter F. Hickey Matthew E. Ritchie 《PLoS computational biology》2021,17(10)

A key benefit of long-read nanopore sequencing technology is the ability to detect modified DNA bases, such as 5-methylcytosine. The lack of R/Bioconductor tools for the effective visualization of nanopore methylation profiles between samples from different experimental groups led us to develop the NanoMethViz R package. Our software can handle methylation output generated from a range of different methylation callers and manages large datasets using a compressed data format. To fully explore the methylation patterns in a dataset, NanoMethViz allows plotting of data at various resolutions. At the sample-level, we use dimensionality reduction to look at the relationships between methylation profiles in an unsupervised way. We visualize methylation profiles of classes of features such as genes or CpG islands by scaling them to relative positions and aggregating their profiles. At the finest resolution, we visualize methylation patterns across individual reads along the genome using the spaghetti plot and heatmaps, allowing users to explore particular genes or genomic regions of interest. In summary, our software makes the handling of methylation signal more convenient, expands upon the visualization options for nanopore data and works seamlessly with existing methylation analysis tools available in the Bioconductor project. Our software is available at https://bioconductor.org/packages/NanoMethViz. 相似文献

2.

CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses

Vanja Haberle Alistair R.R. Forrest Yoshihide Hayashizaki Piero Carninci Boris Lenhard 《Nucleic acids research》2015,43(8):e51

相似文献

3.

Using high-density DNA methylation arrays to profile copy number alterations

Andrew Feber Paul Guilhamon Matthias Lechner Tim Fenton Gareth A Wilson Christina Thirlwell Tiffany J Morris Adrienne M Flanagan Andrew E Teschendorff John D Kelly Stephan Beck 《Genome biology》2014,15(2):R30

The integration of genomic and epigenomic data is an increasingly popular approach for studying the complex mechanisms driving cancer development. We have developed a method for evaluating both methylation and copy number from high-density DNA methylation arrays. Comparing copy number data from Infinium HumanMethylation450 BeadChips and SNP arrays, we demonstrate that Infinium arrays detect copy number alterations with the sensitivity of SNP platforms. These results show that high-density methylation arrays provide a robust and economic platform for detecting copy number and methylation changes in a single experiment. Our method is available in the ChAMP Bioconductor package: http://www.bioconductor.org/packages/2.13/bioc/html/ChAMP.html. 相似文献

4.

ggtreeExtra: Compact Visualization of Richly Annotated Phylogenetic Data

Shuangbin Xu Zehan Dai Pingfan Guo Xiaocong Fu Shanshan Liu Lang Zhou Wenli Tang Tingze Feng Meijun Chen Li Zhan Tianzhi Wu Erqiang Hu Yong Jiang Xiaochen Bo Guangchuang Yu 《Molecular biology and evolution》2021,38(9):4039

We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers and allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. GgtreeExtra is a universal tool for tree data visualization. It extends the applications of the phylogenetic tree in different disciplines by making more domain-specific data to be available to visualize and interpret in the evolutionary context. 相似文献

5.

Accelerating Bayesian Hierarchical Clustering of Time Series Data with a Randomised Algorithm

Robert Darkins Emma J. Cooke Zoubin Ghahramani Paul D. W. Kirk David L. Wild Richard S. Savage 《PloS one》2013,8(4)

相似文献

6.

mAPKL: R/ Bioconductor package for detecting gene exemplars and revealing their characteristics

Argiris Sakellariou George Spyrou 《BMC bioinformatics》2015,16(1)

Background

So far many algorithms have been proposed towards the detection of significant genes in microarray analysis problems. Several of those approaches are freely available as R-packages though their engagement in gene expression analysis by non-bioinformaticians is usually a frustrating task. Besides, only some of those packages offer a complete suite of tools starting from initial data import and ending to analysis report. Here we present an R/Bioconductor package that implements a hybrid gene selection method along with a bunch of functions to facilitate a thorough and convenient gene expression profiling analysis.

Results

mAPKL is an open-source R/Bioconductor package that implements the mAP-KL hybrid gene selection method. The advantage of this method is that selects a small number of gene exemplars while achieving comparable classification results to other well established algorithms on a variety of datasets and dataset sizes. The mAPKL package is accompanied with extra functionalities including (i) solid data import; (ii) data sampling following a user-defined proportion; (iii) preprocessing through several normalization and transformation alternatives; (iv) classification with the aid of SVM and performance evaluation; (v) network analysis of the significant genes (exemplars), including degree of centrality, closeness, betweeness, clustering coefficient as well as the construction of an edge list table; (vi) gene annotation analysis, (vii) pathway analysis and (viii) auto-generated analysis reporting.

Conclusions

Users are able to run a thorough gene expression analysis in a timely manner starting from raw data and concluding to network characteristics of the selected gene exemplars. Detailed instructions and example data are provided in the R package, which is freely available at Bioconductor under the GPL-2 or later license http://www.bioconductor.org/packages/3.1/bioc/html/mAPKL.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0719-5) contains supplementary material, which is available to authorized users. 相似文献

7.

SomatiCA: Identifying,Characterizing and Quantifying Somatic Copy Number Aberrations from Cancer Genome Sequencing Data

Mengjie Chen Murat Gunel Hongyu Zhao 《PloS one》2013,8(11)

Whole genome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. However, analysis of somatic copy-number changes from sequencing data is still challenging because of insufficient sequencing coverage, unknown tumor sample purity and subclonal heterogeneity. Here we describe a computational framework, named SomatiCA, which explicitly accounts for tumor purity and subclonality in the analysis of somatic copy-number profiles. Taking read depths (RD) and lesser allele frequencies (LAF) as input, SomatiCA will output 1) admixture rate for each tumor sample, 2) somatic allelic copy-number for each genomic segment, 3) fraction of tumor cells with subclonal change in each somatic copy number aberration (SCNA), and 4) a list of substantial genomic aberration events including gain, loss and LOH. SomatiCA is available as a Bioconductor R package at http://www.bioconductor.org/packages/2.13/bioc/html/SomatiCA.html. 相似文献

8.

A multi-objective genetic algorithm to find active modules in multiplex biological networks

Elva María Novoa-del-Toro Efrn Mezura-Montes Matthieu Vignes Morgane Trzol Frdrique Magdinier Laurent Tichit Anaïs Baudot 《PLoS computational biology》2021,17(8)

The identification of subnetworks of interest—or active modules—by integrating biological networks with molecular profiles is a key resource to inform on the processes perturbed in different cellular conditions. We here propose MOGAMUN, a Multi-Objective Genetic Algorithm to identify active modules in MUltiplex biological Networks. MOGAMUN optimizes both the density of interactions and the scores of the nodes (e.g., their differential expression). We compare MOGAMUN with state-of-the-art methods, representative of different algorithms dedicated to the identification of active modules in single networks. MOGAMUN identifies dense and high-scoring modules that are also easier to interpret. In addition, to our knowledge, MOGAMUN is the first method able to use multiplex networks. Multiplex networks are composed of different layers of physical and functional relationships between genes and proteins. Each layer is associated to its own meaning, topology, and biases; the multiplex framework allows exploiting this diversity of biological networks. We applied MOGAMUN to identify cellular processes perturbed in Facio-Scapulo-Humeral muscular Dystrophy, by integrating RNA-seq expression data with a multiplex biological network. We identified different active modules of interest, thereby providing new angles for investigating the pathomechanisms of this disease.Availability: MOGAMUN is available at https://github.com/elvanov/MOGAMUN and as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/MOGAMUN.html. Contact: rf.uma-vinu@toduab.siana 相似文献

9.

SeqFeatR for the Discovery of Feature-Sequence Associations

Bettina Budeus J?rg Timm Daniel Hoffmann 《PloS one》2016,11(1)

Specific selection pressures often lead to specifically mutated genomes. The open source software SeqFeatR has been developed to identify associations between mutation patterns in biological sequences and specific selection pressures (“features”). For instance, SeqFeatR has been used to discover in viral protein sequences new T cell epitopes for hosts of given HLA types. SeqFeatR supports frequentist and Bayesian methods for the discovery of statistical sequence-feature associations. Moreover, it offers novel ways to visualize results of the statistical analyses and to relate them to further properties. In this article we demonstrate various functions of SeqFeatR with real data. The most frequently used set of functions is also provided by a web server. SeqFeatR is implemented as R package and freely available from the R archive CRAN (http://cran.r-project.org/web/packages/SeqFeatR/index.html). The package includes a tutorial vignette. The software is distributed under the GNU General Public License (version 3 or later). The web server URL is https://seqfeatr.zmb.uni-due.de. 相似文献

10.

POMAShiny: A user-friendly web-based workflow for metabolomics and proteomics data analysis

Pol Castellano-Escuder Raúl Gonzlez-Domínguez Francesc Carmona-Pontaque Cristina Andrs-Lacueva Alex Snchez-Pla 《PLoS computational biology》2021,17(7)

Metabolomics and proteomics, like other omics domains, usually face a data mining challenge in providing an understandable output to advance in biomarker discovery and precision medicine. Often, statistical analysis is one of the most difficult challenges and it is critical in the subsequent biological interpretation of the results. Because of this, combined with the computational programming skills needed for this type of analysis, several bioinformatic tools aimed at simplifying metabolomics and proteomics data analysis have emerged. However, sometimes the analysis is still limited to a few hidebound statistical methods and to data sets with limited flexibility. POMAShiny is a web-based tool that provides a structured, flexible and user-friendly workflow for the visualization, exploration and statistical analysis of metabolomics and proteomics data. This tool integrates several statistical methods, some of them widely used in other types of omics, and it is based on the POMA R/Bioconductor package, which increases the reproducibility and flexibility of analyses outside the web environment. POMAShiny and POMA are both freely available at https://github.com/nutrimetabolomics/POMAShiny and https://github.com/nutrimetabolomics/POMA, respectively. 相似文献

11.

BubbleTree: an intuitive visualization to elucidate tumoral aneuploidy and clonality using next generation sequencing data

Wei Zhu Michael Kuziora Todd Creasy Zhongwu Lai Christopher Morehouse Xiang Guo Yinong Sebastian Dong Shen Jiaqi Huang Jonathan R. Dry Feng Xue Liyan Jiang Yihong Yao Brandon W. Higgs 《Nucleic acids research》2016,44(4):e38

Tumors are characterized by properties of genetic instability, heterogeneity, and significant oligoclonality. Elucidating this intratumoral heterogeneity is challenging but important. In this study, we propose a framework, BubbleTree, to characterize the tumor clonality using next generation sequencing (NGS) data. BubbleTree simultaneously elucidates the complexity of a tumor biopsy, estimating cancerous cell purity, tumor ploidy, allele-specific copy number, and clonality and represents this in an intuitive graph. We further developed a three-step heuristic method to automate the interpretation of the BubbleTree graph, using a divide-and-conquer strategy. In this study, we demonstrated the performance of BubbleTree with comparisons to similar commonly used tools such as THetA2, ABSOLUTE, AbsCN-seq and ASCAT, using both simulated and patient-derived data. BubbleTree outperformed these tools, particularly in identifying tumor subclonal populations and polyploidy. We further demonstrated BubbleTree''s utility in tracking clonality changes from patients’ primary to metastatic tumor and dating somatic single nucleotide and copy number variants along the tumor clonal evolution. Overall, the BubbleTree graph and corresponding model is a powerful approach to provide a comprehensive spectrum of the heterogeneous tumor karyotype in human tumors. BubbleTree is R-based and freely available to the research community (https://www.bioconductor.org/packages/release/bioc/html/BubbleTree.html). 相似文献

12.

Regulatory network-based imputation of dropouts in single-cell RNA sequencing data

Ana Carolina Leote Xiaohui Wu Andreas Beyer 《PLoS computational biology》2022,18(2)

相似文献

13.

beadarray: R classes and methods for Illumina bead-based data 总被引：2，自引：0，他引：2

Dunning MJ Smith ML Ritchie ME Tavaré S 《Bioinformatics (Oxford, England)》2007,23(16):2183-2184

The R/Bioconductor package beadarray allows raw data from Illumina experiments to be read and stored in convenient R classes. Users are free to choose between various methods of image processing, background correction and normalization in their analysis rather than using the defaults in Illumina's; proprietary software. The package also allows quality assessment to be carried out on the raw data. The data can then be summarized and stored in a format which can be used by other R/Bioconductor packages to perform downstream analyses. Summarized data processed by Illumina's; BeadStudio software can also be read and analysed in the same manner. Availability: The beadarray package is available from the Bioconductor web page at www.bioconductor.org. A user's guide and example data sets are provided with the package. 相似文献

14.

Rintact: enabling computational analysis of molecular interaction data from the IntAct repository

Chiang T Li N Orchard S Kerrien S Hermjakob H Gentleman R Huber W 《Bioinformatics (Oxford, England)》2008,24(8):1100-1101

MOTIVATION: The IntAct repository is one of the largest and most widely used databases for the curation and storage of molecular interaction data. These datasets need to be analyzed by computational methods. Software packages in the statistical environment R provide powerful tools for conducting such analyses. RESULTS: We introduce Rintact, a Bioconductor package that allows users to transform PSI-MI XML2.5 interaction data files from IntAct into R graph objects. On these, they can use methods from R and Bioconductor for a variety of tasks: determining cohesive subgraphs, computing summary statistics, fitting mathematical models to the data or rendering graphical layouts. Rintact provides a programmatic interface to the IntAct repository and allows the use of the analytic methods provided by R and Bioconductor. AVAILABILITY: Rintact is freely available at http://bioconductor.org 相似文献

15.

plethy: management of whole body plethysmography data in R

Daniel Bottomly Beth Wilmot Shannon K McWeeney 《BMC bioinformatics》2015,16(1)

相似文献

16.

OTUbase: an R infrastructure package for operational taxonomic unit data

Beck D Settles M Foster JA 《Bioinformatics (Oxford, England)》2011,27(12):1700-1701

SUMMARY: OTUbase is an R package designed to facilitate the analysis of operational taxonomic unit (OTU) data and sequence classification (taxonomic) data. Currently there are programs that will cluster sequence data into OTUs and/or classify sequence data into known taxonomies. However, there is a need for software that can take the summarized output of these programs and organize it into easily accessed and manipulated formats. OTUbase provides this structure and organization within R, to allow researchers to easily manipulate the data with the rich library of R packages currently available for additional analysis. AVAILABILITY: OTUbase is an R package available through Bioconductor. It can be found at http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html. 相似文献

17.

A Graphical Modelling Approach to the Dissection of Highly Correlated Transcription Factor Binding Site Profiles

Robert Stojnic Audrey Qiuyan Fu Boris Adryan 《PLoS computational biology》2012,8(11)

相似文献

18.

oneChannelGUI: a graphical interface to Bioconductor tools, designed for life scientists who are not familiar with R language 总被引：1，自引：0，他引：1

Sanges R Cordero F Calogero RA 《Bioinformatics (Oxford, England)》2007,23(24):3406-3408

相似文献

19.

miQC: An adaptive probabilistic framework for quality control of single-cell RNA-sequencing data

Ariel A. Hippen Matias M. Falco Lukas M. Weber Erdogan Pekcan Erkan Kaiyang Zhang Jennifer Anne Doherty Anna Vhrautio Casey S. Greene Stephanie C. Hicks 《PLoS computational biology》2021,17(8)

Single-cell RNA-sequencing (scRNA-seq) has made it possible to profile gene expression in tissues at high resolution. An important preprocessing step prior to performing downstream analyses is to identify and remove cells with poor or degraded sample quality using quality control (QC) metrics. Two widely used QC metrics to identify a ‘low-quality’ cell are (i) if the cell includes a high proportion of reads that map to mitochondrial DNA (mtDNA) encoded genes and (ii) if a small number of genes are detected. Current best practices use these QC metrics independently with either arbitrary, uniform thresholds (e.g. 5%) or biological context-dependent (e.g. species) thresholds, and fail to jointly model these metrics in a data-driven manner. Current practices are often overly stringent and especially untenable on certain types of tissues, such as archived tumor tissues, or tissues associated with mitochondrial function, such as kidney tissue [1]. We propose a data-driven QC metric (miQC) that jointly models both the proportion of reads mapping to mtDNA genes and the number of detected genes with mixture models in a probabilistic framework to predict the low-quality cells in a given dataset. We demonstrate how our QC metric easily adapts to different types of single-cell datasets to remove low-quality cells while preserving high-quality cells that can be used for downstream analyses. Our software package is available at https://bioconductor.org/packages/miQC. 相似文献

20.

Epistasis Test in Meta-Analysis: A Multi-Parameter Markov Chain Monte Carlo Model for Consistency of Evidence

Chin Lin Chi-Ming Chu Sui-Lung Su 《PloS one》2016,11(4)

Conventional genome-wide association studies (GWAS) have been proven to be a successful strategy for identifying genetic variants associated with complex human traits. However, there is still a large heritability gap between GWAS and transitional family studies. The “missing heritability” has been suggested to be due to lack of studies focused on epistasis, also called gene–gene interactions, because individual trials have often had insufficient sample size. Meta-analysis is a common method for increasing statistical power. However, sufficient detailed information is difficult to obtain. A previous study employed a meta-regression-based method to detect epistasis, but it faced the challenge of inconsistent estimates. Here, we describe a Markov chain Monte Carlo-based method, called “Epistasis Test in Meta-Analysis” (ETMA), which uses genotype summary data to obtain consistent estimates of epistasis effects in meta-analysis. We defined a series of conditions to generate simulation data and tested the power and type I error rates in ETMA, individual data analysis and conventional meta-regression-based method. ETMA not only successfully facilitated consistency of evidence but also yielded acceptable type I error and higher power than conventional meta-regression. We applied ETMA to three real meta-analysis data sets. We found significant gene–gene interactions in the renin–angiotensin system and the polycyclic aromatic hydrocarbon metabolism pathway, with strong supporting evidence. In addition, glutathione S-transferase (GST) mu 1 and theta 1 were confirmed to exert independent effects on cancer. We concluded that the application of ETMA to real meta-analysis data was successful. Finally, we developed an R package, etma, for the detection of epistasis in meta-analysis [etma is available via the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/etma/index.html]. 相似文献