1.

Identification of cancer subtypes from single-cell RNA-seq data using a consensus clustering method

Yanglan Gan Ning Li Guobing Zou Yongchang Xin Jihong Guan 《BMC medical genomics》2018,11(6):117

Background

Human cancers are complex ecosystems composed of cells with distinct molecular signatures. Such intratumoral heterogeneity poses a major challenge to cancer diagnosis and treatment. Recent advances in single-cell techniques such as scRNA-seq have brought unprecedented insights into cellular heterogeneity. A resulting computational challenge is to cluster high-dimensional, noisy datasets that contain substantially fewer cells than genes.

Methods

In this paper, we introduce a consensus clustering framework, conCluster, for cancer subtype identification from single-cell RNA-seq data. Using an ensemble strategy, conCluster fuses multiple basic partitions into consensus clusters.

Results

Applied to real cancer scRNA-seq datasets, conCluster detects cancer subtypes more accurately than widely used scRNA-seq clustering methods. Further, we conducted co-expression network analysis for the identified melanoma subtypes.

Conclusions

Our analysis demonstrates that these subtypes exhibit distinct gene co-expression networks and significant gene sets with different functional enrichment.
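The ensemble idea described in the Methods above (fusing several basic partitions into consensus clusters) can be illustrated with a generic co-association approach. This is a minimal sketch, not conCluster's actual fusion step, which the abstract does not specify. The function names `co_association` and `consensus_clusters`, the 0.5 threshold, and the toy partitions are all assumptions made for illustration.

```python
def co_association(partitions):
    """Return the fraction of base partitions in which each pair of cells co-clusters."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(co, threshold=0.5):
    """Link cells whose co-association exceeds `threshold` and return
    connected-component labels (an illustrative choice of consensus rule)."""
    n = len(co)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three hypothetical base partitions of six cells; label names differ between runs.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 2],
]
labels = consensus_clusters(co_association(base_partitions))
```

Because the co-association matrix depends only on whether two cells share a label, the base partitions do not need matching label names, which is what makes this representation convenient for ensembles.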

2.

SC3-seq: a method for highly parallel and quantitative measurement of single-cell gene expression

Tomonori Nakamura Yukihiro Yabuta Ikuhiro Okamoto Shinya Aramaki Shihori Yokobayashi Kazuki Kurimoto Kiyotoshi Sekiguchi Masato Nakagawa Takuya Yamamoto Mitinori Saitou 《Nucleic acids research》2015,43(9):e60

3.

scDeepSort: a pre-trained cell-type annotation method for single-cell transcriptomics using deep learning with a weighted graph neural network

Xin Shao Haihong Yang Xiang Zhuang Jie Liao Penghui Yang Junyun Cheng Xiaoyan Lu Huajun Chen Xiaohui Fan 《Nucleic acids research》2021,49(21):e122

4.

SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

Jing Qi Yang Zhou Zicen Zhao Shuilin Jin 《PLoS computational biology》2021,17(6)

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because few mRNA copies are extracted from each cell, scRNA-seq data exhibit a large number of dropouts, which hinder downstream analysis. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies dropout events based on gene expression levels and the variation of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by using gene expression unaffected by dropouts from similar cells. In the experiments, results on simulated and real datasets suggest that SDImpute is an effective tool for recovering the data and preserving the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analyses, including clustering, visualization, and differential expression analysis.

5.

Gene Function Prediction from Functional Association Networks Using Kernel Partial Least Squares Regression

Sonja Lehtinen Jon Lees Jürg Bähler John Shawe-Taylor Christine Orengo 《PloS one》2015,10(8)

With the growing availability of large-scale biological datasets, automated methods of extracting functionally meaningful information from this data are becoming increasingly important. Data relating to functional association between genes or proteins, such as co-expression or functional association, is often represented in terms of gene or protein networks. Several methods of predicting gene function from these networks have been proposed. However, evaluating the relative performance of these algorithms may not be trivial: concerns have been raised over biases in different benchmarking methods and datasets, particularly relating to non-independence of functional association data and test data. In this paper we propose a new network-based gene function prediction algorithm using a commute-time kernel and partial least squares regression (Compass). We compare Compass to GeneMANIA, a leading network-based prediction algorithm, using a number of different benchmarks, and find that Compass outperforms GeneMANIA on these benchmarks. We also explicitly explore problems associated with the non-independence of functional association data and test data. We find that a benchmark based on the Gene Ontology database, which, directly or indirectly, incorporates information from other databases, may considerably overestimate the performance of algorithms exploiting functional association data for prediction.

6.

Independent component analysis based gene co-expression network inference (ICAnet) to decipher functional modules for better single-cell clustering and batch integration

Weixu Wang Huanhuan Tan Mingwan Sun Yiqing Han Wei Chen Shengnu Qiu Ke Zheng Gang Wei Ting Ni 《Nucleic acids research》2021,49(9):e54

With the tremendous increase of publicly available single-cell RNA-sequencing (scRNA-seq) datasets, bioinformatics methods based on gene co-expression networks are becoming efficient tools for analyzing scRNA-seq data, improving cell type prediction accuracy and in turn facilitating biological discovery. However, current methods are mainly based on overall co-expression correlation and overlook co-expression that exists in only a subset of cells; they therefore fail to discover certain rare cell types and are sensitive to batch effects. Here, we developed independent component analysis-based gene co-expression network inference (ICAnet), which decomposes scRNA-seq data into a series of independent gene expression components and infers co-expression modules, improving cell clustering and rare cell-type discovery. ICAnet showed efficient performance for cell clustering and batch integration using scRNA-seq datasets spanning multiple cells/tissues/donors/library types. It works stably on datasets produced by different library construction strategies and with different sequencing depths and cell numbers. We demonstrated the capability of ICAnet to discover rare cell types in multiple independent scRNA-seq datasets from different sources. Importantly, the identified modules activated in acute myeloid leukemia scRNA-seq datasets have the potential to serve as new diagnostic markers. Thus, ICAnet is a competitive tool for cell clustering and biological interpretation in single-cell RNA-seq data analysis.

7.

TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis

Zhicheng Ji Hongkai Ji 《Nucleic acids research》2016,44(13):e117

8.

iTARGEX analysis of yeast deletome reveals novel regulators of transcriptional buffering in S phase and protein turnover

Jia-Hsin Huang You-Rou Liao Tzu-Chieh Lin Cheng-Hung Tsai Wei-Yun Lai Yang-Kai Chou Jun-Yi Leu Huai-Kuang Tsai Cheng-Fu Kao 《Nucleic acids research》2021,49(13):7318

9.

Identification of Common Prognostic Gene Expression Signatures with Biological Meanings from Microarray Gene Expression Datasets

Jun Yao Qi Zhao Ying Yuan Li Zhang Xiaoming Liu W. K. Alfred Yung John N. Weinstein 《PloS one》2012,7(9)

Numerous prognostic gene expression signatures for breast cancer have been generated previously, with little overlap and limited insight into the biology of the disease. Here we introduce a novel algorithm named SCoR (Survival analysis using Cox proportional hazard regression and Random resampling), which applies random resampling and clustering methods to identify gene features correlated with time-to-event data. This is shown to reduce overfitting to noise in microarray data analysis and to discover functional gene sets linked to patient survival. SCoR independently identified a common poor prognostic signature composed of cell proliferation genes from six out of eight breast cancer datasets. Furthermore, a sequential SCoR analysis on highly proliferative breast cancers repeatedly identified T/B cell markers as favorable prognostic factors. In glioblastoma, SCoR identified a common good prognostic signature of chromosome 10 genes from two gene expression datasets (TCGA and REMBRANDT), recapitulating the fact that loss of one copy of chromosome 10 (which harbors the tumor suppressor PTEN) is linked to poor survival in glioblastoma patients. SCoR also identified prognostic genes on sex chromosomes in lung adenocarcinomas, suggesting patient gender might be used to predict outcome in this disease. These results demonstrate the power of SCoR to identify common and biologically meaningful prognostic gene expression signatures.

10.

Characterization of the β-tubulin gene family in Ascaris lumbricoides and Ascaris suum and its implication for the molecular detection of benzimidazole resistance

Sara Roose Russell W. Avramenko Stephen M. J. Pollo James D. Wasmuth Shaali Ame Mio Ayana Martha Betson Piet Cools Daniel Dana Ben P. Jones Zeleke Mekonnen Arianna Morosetti Abhinaya Venkatesan Johnny Vlaminck Matthew L. Workentine Bruno Levecke John S. Gilleard Peter Geldhof 《PLoS neglected tropical diseases》2021,15(9)

11.

COPS: Detecting Co-Occurrence and Spatial Arrangement of Transcription Factor Binding Motifs in Genome-Wide Datasets

Nati Ha Maria Polychronidou Ingrid Lohmann 《PloS one》2012,7(12)

12.

Homeobox Protein HB9 Binds to the Prostaglandin E Receptor 2 Promoter and Inhibits Intracellular cAMP Mobilization in Leukemic Cells

Sarah Wildenhain Deborah Ingenhag Christian Ruckert Özer Degistirici Martin Dugas Roland Meisel Julia Hauer Arndt Borkhardt 《The Journal of biological chemistry》2012,287(48):40703-40712

13.

Single-cell Transcriptome Study as Big Data

Pingjian Yu Wei Lin 《Genomics, Proteomics & Bioinformatics》2016,14(1):21-30

14.

Revealing Pathway Dynamics in Heart Diseases by Analyzing Multiple Differential Networks

Xiaoke Ma Long Gao Georgios Karamanlidis Peng Gao Chi Fung Lee Lorena Garcia-Menendez Rong Tian Kai Tan 《PLoS computational biology》2015,11(6)

Development of heart diseases is driven by dynamic changes in both the activity and connectivity of gene pathways. Understanding these dynamic events is critical for understanding pathogenic mechanisms and developing effective treatments. Currently, there is a lack of computational methods that enable analysis of multiple gene networks, each of which exhibits differential activity compared to the network of the baseline/healthy condition. We describe the iMDM algorithm to identify both unique and shared gene modules across multiple differential co-expression networks, termed M-DMs (multiple differential modules). We applied iMDM to a time-course RNA-Seq dataset from a murine heart failure model established on two genotypes. We showed that iMDM achieves higher accuracy in inferring gene modules compared to using single or multiple co-expression networks. We found that condition-specific M-DMs exhibit differential activities, mediate different biological processes, and are enriched for genes with known cardiovascular phenotypes. By analyzing M-DMs that are present in multiple conditions, we revealed dynamic changes in pathway activity and connectivity across heart failure conditions. We further showed that module dynamics were correlated with the dynamics of disease phenotypes during the development of heart failure. Thus, pathway dynamics are a powerful measure for understanding pathogenesis. iMDM provides a principled way to dissect the dynamics of gene pathways and their relationship to the dynamics of disease phenotype. With the exponential growth of omics data, our method can aid in generating systems-level insights into disease progression.

15.

Reverse engineering and analysis of large genome-scale gene networks

Maneesha Aluru Jaroslaw Zola Dan Nettleton Srinivas Aluru 《Nucleic acids research》2013,41(1):e24

Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute-intensive, which limits their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software tool, Gene Network Analyzer (GeNA), for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web.
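To make the mutual information (MI) score behind MI-based network inference concrete, the sketch below estimates MI between two expression profiles after simple equal-width binning. This is a plain histogram estimator swapped in for illustration; it is not the B-spline formulation used by TINGe, and the helper names `discretize` and `mutual_information` and the bin count are assumptions.

```python
from collections import Counter
from math import log2


def discretize(values, bins):
    """Map continuous values to equal-width bin indices in the range 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant profile: everything falls in one bin
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of MI (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


# Hypothetical expression profiles of two genes across four samples.
gene_a = discretize([0.0, 1.0, 2.0, 3.0], bins=2)
gene_b = discretize([5.0, 5.5, 9.0, 9.5], bins=2)
score = mutual_information(gene_a, gene_b)
```

In a network-inference setting, a score like this would be computed for every gene pair and an edge kept only if it passes a significance test such as the permutation testing mentioned in the abstract.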

16.

Intragenic tRNA-promoted R-loops orchestrate transcription interference for plant oxidative stress responses

Kunpeng Liu Qianwen Sun 《The Plant cell》2021,33(11):3574

17.

ChIP-GSM: Inferring active transcription factor modules to predict functional regulatory elements

Xi Chen Andrew F. Neuwald Leena Hilakivi-Clarke Robert Clarke Jianhua Xuan 《PLoS computational biology》2021,17(7)

18.

Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics

David Lamparter Daniel Marbach Rico Rueedi Zoltán Kutalik Sven Bergmann 《PLoS computational biology》2016,12(1)

Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which not only offers a significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.
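For readers unfamiliar with Fisher's method, the sketch below shows the classical, unmodified version: the statistic -2 Σ ln p_i is compared against a chi-squared distribution with 2k degrees of freedom, where k is the number of p-values. It assumes independent p-values and is not Pascal's modified procedure, whose details the abstract does not give. The function names are assumptions for illustration.

```python
from math import exp, log


def chi2_sf_even(x, df):
    """Survival function P(X > x) of a chi-squared variable with even `df`.

    For df = 2k the closed form is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return exp(-half) * total


def fisher_combine(p_values):
    """Combine independent p-values (each in (0, 1]) with the classical Fisher method."""
    statistic = -2.0 * sum(log(p) for p in p_values)
    return chi2_sf_even(statistic, 2 * len(p_values))


# Hypothetical gene-level p-values for the genes in one pathway.
pathway_p = fisher_combine([0.05, 0.05])
```

Because every gene contributes its p-value continuously, no significance cutoff is needed to decide which genes "count", which is the threshold-free property the abstract contrasts with binary membership enrichment.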

19.

MIDDAS-M: Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters through the Integration of Genome Sequencing and Transcriptome Data

Myco Umemura Hideaki Koike Nozomi Nagano Tomoko Ishii Jin Kawano Noriko Yamane Ikuko Kozone Katsuhisa Horimoto Kazuo Shin-ya Kiyoshi Asai Jiujiang Yu Joan W. Bennett Masayuki Machida 《PloS one》2013,8(12)

20.

Life without tRNAArg–adenosine deaminase TadA: evolutionary consequences of decoding the four CGN codons as arginine in Mycoplasmas and other Mollicutes

Shin-ichi Yokobori Aya Kitamura Henri Grosjean Yoshitaka Bessho 《Nucleic acids research》2013,41(13):6531-6543

In most bacteria, two tRNAs decode the four arginine CGN codons. One tRNA harboring a wobble inosine (tRNAArg(ICG)) reads the CGU, CGC and CGA codons, whereas a second tRNA harboring a wobble cytidine (tRNAArg(CCG)) reads the remaining CGG codon. The reduced genomes of Mycoplasmas and other Mollicutes lack the gene encoding tRNAArg(CCG). This raises the question of how these organisms decode CGG codons. Examination of 36 Mollicute genomes for genes encoding tRNAArg and the TadA enzyme, responsible for wobble inosine formation, suggested an evolutionary scenario where tadA gene mutations first occurred. This allowed the temporary accumulation of non-deaminated tRNAArg(ACG), capable of reading all CGN codons. This hypothesis was verified in Mycoplasma capricolum, which contains a small fraction of tRNAArg(ACG) with a non-deaminated wobble adenosine. Subsets of Mollicutes continued to evolve by losing both the mutated tRNAArg(CCG) and tadA, and then acquired a new tRNAArg(UCG). This permitted further tRNAArg(ACG) mutations with tRNAArg(GCG) or its disappearance, leaving a single tRNAArg(UCG) to decode the four CGN codons. The key point of our model is that the A-to-I deamination activity had to be controlled before the loss of the tadA gene, allowing the stepwise evolution of Mollicutes toward an alternative decoding strategy.