
1.

Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench

Ruben Chazarra-Gil, Stijn van Dongen, Vladimir Yu Kiselev, Martin Hemberg. Nucleic Acids Research, 2021, 49(7): e42

As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here, we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.

2.

CellCall: integrating paired ligand–receptor and transcription factor activities for cell–cell communication

Yang Zhang, Tianyuan Liu, Xuesong Hu, Mei Wang, Jing Wang, Bohao Zou, Puwen Tan, Tianyu Cui, Yiying Dou, Lin Ning, Yan Huang, Shuan Rao, Dong Wang, Xiaoyang Zhao. Nucleic Acids Research, 2021, 49(15): 8520


3.

coupleCoC+: An information-theoretic co-clustering-based transfer learning framework for the integrative analysis of single-cell genomic data

Pengcheng Zeng, Zhixiang Lin. PLoS Computational Biology, 2021, 17(6)

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually have different powers in identifying the unknown cell types through clustering. So, methods that integrate multiple datasets can potentially lead to a better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are unlinked with the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cells scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ has fast convergence and it is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.

4.

PseudoGA: cell pseudotime reconstruction based on genetic algorithm

Pronoy Kanti Mondal, Udit Surya Saha, Indranil Mukhopadhyay. Nucleic Acids Research, 2021, 49(14): 7909


5.

Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data

Qi Jiang, Shuo Zhang, Lin Wan. PLoS Computational Biology, 2022, 18(1)


6.

TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis

Zhicheng Ji, Hongkai Ji. Nucleic Acids Research, 2016, 44(13): e117


7.

CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data

Ryan Bressler, Richard B. Kreisberg, Brady Bernard, John E. Niederhuber, Joseph G. Vockley, Ilya Shmulevich, Theo A. Knijnenburg. PLoS ONE, 2015, 10(12)

Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times by effective use of the CPU cache, optimizing for different classes of features and efficiently multi-threading. The software is available at https://github.com/ilyalab/CloudForest.

8.

Wham: Identifying Structural Variants of Biological Consequence

Zev N. Kronenberg, Edward J. Osborne, Kelsey R. Cone, Brett J. Kennedy, Eric T. Domyan, Michael D. Shapiro, Nels C. Elde, Mark Yandell. PLoS Computational Biology, 2015, 11(12)

Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch), and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.

This is a PLOS Computational Biology software paper.


9.

VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires

Mikhail Shugay, Dmitriy V. Bagaev, Maria A. Turchaninova, Dmitriy A. Bolotin, Olga V. Britanova, Ekaterina V. Putintseva, Mikhail V. Pogorelyy, Vadim I. Nazarov, Ivan V. Zvyagin, Vitalina I. Kirgizova, Kirill I. Kirgizov, Elena V. Skorobogatova, Dmitriy M. Chudakov. PLoS Computational Biology, 2015, 11(11)

Despite the growing number of immune repertoire sequencing studies, the field still lacks software for analysis and comprehension of this high-dimensional data. Here we report VDJtools, a complementary software suite that solves a wide range of T cell receptor (TCR) repertoire post-analysis tasks, provides a detailed tabular output and publication-ready graphics, and is built on top of a flexible API. Using TCR datasets for a large cohort of unrelated healthy donors, twins, and multiple sclerosis patients, we demonstrate that VDJtools greatly facilitates the analysis and leads to sound biological conclusions. VDJtools software and documentation are available at https://github.com/mikessh/vdjtools.

10.

CHANCE: comprehensive software for quality control and validation of ChIP-seq data

Aaron Diaz, Abhinav Nellore, Jun S Song. Genome Biology, 2012, 13(10): R98

ChIP-seq is a powerful method for obtaining genome-wide maps of protein-DNA interactions and epigenetic modifications. CHANCE (CHip-seq ANalytics and Confidence Estimation) is a standalone package for ChIP-seq quality control and protocol optimization. Our user-friendly graphical software quickly estimates the strength and quality of immunoprecipitations, identifies biases, compares the user's data with ENCODE's large collection of published datasets, performs multi-sample normalization, checks against quantitative PCR-validated control regions, and produces informative graphical reports. CHANCE is available at https://github.com/songlab/chance.

11.

A Real-Time All-Atom Structural Search Engine for Proteins

Gabriel Gonzalez, Brett Hannigan, William F. DeGrado. PLoS Computational Biology, 2014, 10(7)

Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license).

This is a PLOS Computational Biology Software Article.


12.

Chromatin segmentation based on a probabilistic model for read counts explains a large portion of the epigenome

Alessandro Mammana, Ho-Ryun Chung. Genome Biology, 2015, 16(1)

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an increasingly common experimental approach to generate genome-wide maps of histone modifications and to dissect the complexity of the epigenome. Here, we propose EpiCSeg: a novel algorithm that combines several histone modification maps for the segmentation and characterization of cell-type specific epigenomic landscapes. By using an accurate probabilistic model for the read counts, EpiCSeg provides a useful annotation for a considerably larger portion of the genome, shows a stronger association with validation data, and yields more consistent predictions across replicate experiments when compared to existing methods. The software is available at http://github.com/lamortenera/epicseg.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0708-z) contains supplementary material, which is available to authorized users.

13.

Ultra-large alignments using phylogeny-aware profiles

Nam-phuong D. Nguyen, Siavash Mirarab, Keerthana Kumar, Tandy Warnow. Genome Biology, 2015, 16(1)

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0688-z) contains supplementary material, which is available to authorized users.

14.

Identification of novel fusion genes in lung cancer using breakpoint assembly of transcriptome sequencing data

Genome Biology, 2015, 16(1)


15.

NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data

Mohan A. V. S. K. Katta, Aamir W. Khan, Dadakhalandar Doddamani, Mahendar Thudi, Rajeev K. Varshney. PLoS ONE, 2015, 10(10)


16.

POMAShiny: A user-friendly web-based workflow for metabolomics and proteomics data analysis

Pol Castellano-Escuder, Raúl González-Domínguez, Francesc Carmona-Pontaque, Cristina Andrés-Lacueva, Alex Sánchez-Pla. PLoS Computational Biology, 2021, 17(7)

Metabolomics and proteomics, like other omics domains, usually face a data mining challenge in providing an understandable output to advance in biomarker discovery and precision medicine. Often, statistical analysis is one of the most difficult challenges and it is critical in the subsequent biological interpretation of the results. Because of this, combined with the computational programming skills needed for this type of analysis, several bioinformatic tools aimed at simplifying metabolomics and proteomics data analysis have emerged. However, sometimes the analysis is still limited to a few hidebound statistical methods and to data sets with limited flexibility. POMAShiny is a web-based tool that provides a structured, flexible and user-friendly workflow for the visualization, exploration and statistical analysis of metabolomics and proteomics data. This tool integrates several statistical methods, some of them widely used in other types of omics, and it is based on the POMA R/Bioconductor package, which increases the reproducibility and flexibility of analyses outside the web environment. POMAShiny and POMA are both freely available at https://github.com/nutrimetabolomics/POMAShiny and https://github.com/nutrimetabolomics/POMA, respectively.

17.

cLoops2: a full-stack comprehensive analytical tool for chromatin interactions

Yaqiang Cao, Shuai Liu, Gang Ren, Qingsong Tang, Keji Zhao. Nucleic Acids Research, 2022, 50(1): 57

Investigating chromatin interactions between regulatory regions such as enhancer and promoter elements is vital for understanding the regulation of gene expression. Compared to Hi-C and its variants, the emerging 3D mapping technologies focusing on enriched signals, such as TrAC-looping, reduce the sequencing cost and provide higher interaction resolution for cis-regulatory elements. A robust pipeline is needed for the comprehensive interpretation of these data, especially for loop-centric analysis. Therefore, we have developed a new versatile tool named cLoops2 for the full-stack analysis of these 3D chromatin interaction data. cLoops2 consists of core modules for peak-calling, loop-calling, differentially enriched loops calling and loops annotation. It also contains multiple modules for interaction resolution estimation, data similarity estimation, features quantification, feature aggregation analysis, and visualization. cLoops2 with documentation and example data are open source and freely available at GitHub: https://github.com/KejiZhaoLab/cLoops2.

18.

UFold: fast and accurate RNA secondary structure prediction with deep learning

Laiyi Fu, Yingxin Cao, Jie Wu, Qinke Peng, Qing Nie, Xiaohui Xie. Nucleic Acids Research, 2022, 50(3): e14

For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving a similar performance as the traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast with an inference time of about 160 ms per sequence up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.

19.

Comprehensive Annotation of the Parastagonospora nodorum Reference Genome Using Next-Generation Genomics, Transcriptomics and Proteogenomics

Robert A. Syme, Kar-Chun Tan, James K. Hane, Kejal Dodhia, Thomas Stoll, Marcus Hastie, Eiko Furuki, Simon R. Ellwood, Angela H. Williams, Yew-Foon Tan, Alison C. Testa, Jeffrey J. Gorman, Richard P. Oliver. PLoS ONE, 2016, 11(2)


20.

MGEnrichment: A web application for microglia gene list enrichment analysis

Justin Jao, Annie Vogel Ciernia. PLoS Computational Biology, 2021, 17(11)

Gene expression analysis is becoming increasingly utilized in neuro-immunology research, and there is a growing need for non-programming scientists to be able to analyze their own genomic data. MGEnrichment is a web application developed both to disseminate to the community our curated database of microglia-relevant gene lists, and to allow non-programming scientists to easily conduct statistical enrichment analysis on their gene expression data. Users can upload their own gene IDs to assess the relevance of their expression data against gene lists from other studies. We include example datasets of differentially expressed genes (DEGs) from human postmortem brain samples from Autism Spectrum Disorder (ASD) and matched controls. We demonstrate how MGEnrichment can be used to expand the interpretations of these DEG lists in terms of regulation of microglial gene expression and provide novel insights into how ASD DEGs may be implicated specifically in microglial development, microbiome responses and relationships to other neuropsychiatric disorders. This tool will be particularly useful for those working in microglia, autism spectrum disorders, and neuro-immune activation research. MGEnrichment is available at https://ciernialab.shinyapps.io/MGEnrichmentApp/ and further online documentation and datasets can be found at https://github.com/ciernialab/MGEnrichmentApp. The app is released under the GNU GPLv3 open source license.