Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
With the tremendous increase in publicly available single-cell RNA-sequencing (scRNA-seq) datasets, bioinformatics methods based on gene co-expression networks are becoming efficient tools for analyzing scRNA-seq data, improving cell type prediction accuracy and in turn facilitating biological discovery. However, current methods are mainly based on overall co-expression correlation and overlook co-expression that exists in only a subset of cells; they therefore fail to discover certain rare cell types and are sensitive to batch effects. Here, we developed independent component analysis-based gene co-expression network inference (ICAnet), which decomposes scRNA-seq data into a series of independent gene expression components and infers co-expression modules, improving cell clustering and rare cell-type discovery. ICAnet showed efficient performance for cell clustering and batch integration using scRNA-seq datasets spanning multiple cells/tissues/donors/library types. It works stably on datasets produced by different library construction strategies and with different sequencing depths and cell numbers. We demonstrated the capability of ICAnet to discover rare cell types in multiple independent scRNA-seq datasets from different sources. Importantly, the modules identified as activated in acute myeloid leukemia scRNA-seq datasets have the potential to serve as new diagnostic markers. Thus, ICAnet is a competitive tool for cell clustering and biological interpretation of single-cell RNA-seq data.

2.
Genomics, 2021, 113(3): 1308-1324
Single-cell RNA sequencing (scRNA-seq) is a powerful technology capable of generating gene expression data at the resolution of individual cells. scRNA-seq data are characterized by the presence of dropout events, which severely bias the results if they remain unaddressed. Few differential expression (DE) approaches consider, in the modeling process, the biological processes that lead to dropout events. We therefore developed SwarnSeq, an improved method for DE and other downstream analyses that considers the molecular capture process in scRNA-seq data modeling. The performance of the proposed method is benchmarked against 11 existing methods on 10 different real scRNA-seq datasets under three comparison settings. We demonstrate that SwarnSeq has improved performance over the 11 existing methods. This improvement is consistently observed across several public scRNA-seq datasets generated using different scRNA-seq protocols. External spike-in data can be used in SwarnSeq to enhance its performance. Availability and implementation: The method is implemented as a publicly available R package at https://github.com/sam-uofl/SwarnSeq.

3.
4.
5.
Stem cells (SCs), with their self-renewal and pluripotent differentiation potential, show great promise for therapeutic applications to some refractory diseases such as stroke, Parkinsonism, myocardial infarction, and diabetes. Furthermore, as seed cells in tissue engineering, SCs have been applied widely to tissue and organ regeneration. However, previous studies have shown that SCs are heterogeneous and consist of many cell subpopulations. Owing to this heterogeneity of cell states, gene expression is highly diverse between cells even within a single tissue, making precise identification and analysis of biological properties difficult, which hinders further research and applications. Therefore, a well-defined understanding of this heterogeneity is key to SC research. Traditional ensemble-based approaches, such as microarrays, reflect an average of expression levels across a large population; they overlook the unique biological behaviors of individual cells, conceal cell-to-cell variation, and cannot fully capture the heterogeneity of SCs. The development of high-throughput single-cell RNA sequencing (scRNA-seq) has provided a new research tool in biology, with uses ranging from identification of novel cell types and exploration of cell markers to the analysis of gene expression and prediction of developmental trajectories. scRNA-seq has profoundly changed our understanding of a series of biological phenomena. Currently, it has been used in SC research in many fields, particularly for studying heterogeneity and cell subpopulations in early embryonic development. In this review, we focus on the scRNA-seq technique and its applications to SC research.

6.
Accurate identification of cell types from single-cell RNA sequencing (scRNA-seq) data plays a critical role in a variety of scRNA-seq analysis studies. This task corresponds to solving an unsupervised clustering problem, in which the similarity measurement between cells affects the result significantly. Although many approaches for cell type identification have been proposed, the accuracy still needs to be improved. In this study, we proposed a novel single-cell clustering framework based on similarity learning, called SSRE. SSRE models the relationships between cells based on a subspace assumption and generates a sparse representation of the cell-to-cell similarity. The sparse representation retains the most similar neighbors for each cell. In addition, three classical pairwise similarities are incorporated with a gene selection and enhancement strategy to further improve the effectiveness of SSRE. Tested on ten real scRNA-seq datasets and five simulated datasets, SSRE achieved superior performance in most cases compared to several state-of-the-art single-cell clustering methods. SSRE can also be extended to visualization of scRNA-seq data and identification of differentially expressed genes. The MATLAB and Python implementations of SSRE are available at https://github.com/CSUBioGroup/SSRE.
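
The idea in the abstract above of a sparse cell-to-cell similarity that "retains the most similar neighbors for each cell" can be illustrated with a much simpler stand-in. The sketch below is not SSRE's subspace-based method: it assumes plain cosine similarity and a top-k neighbor rule, and the function names `cosine` and `sparse_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two expression vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def sparse_similarity(cells, k):
    """Keep, for each cell, only its k most similar neighbors (assumed top-k rule).

    cells: list of per-cell expression vectors.
    Returns a symmetric n x n matrix; an edge survives if either cell selected the other.
    """
    n = len(cells)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scored = sorted(
            ((cosine(cells[i], cells[j]), j) for j in range(n) if j != i),
            key=lambda t: (-t[0], t[1]),
        )
        for score, j in scored[:k]:
            sim[i][j] = score
    # Symmetrise so the matrix can feed a graph- or spectral-clustering step.
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = max(sim[i][j], sim[j][i])
    return sim
```

With four toy cells forming two obvious pairs and `k=1`, each cell keeps only its partner, so every cross-pair entry of the matrix is zero.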

7.
8.
More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that trait-associated variants likely act in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific gene expression measurements from single-cell RNA sequencing (scRNA-seq). We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We apply our framework to multiple scRNA-seq datasets from different platforms and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and scRNA-seq datasets and further validated using PubMed search and existing bulk case-control testing results.

9.
Annotating cell types is a critical step in single-cell RNA sequencing (scRNA-seq) data analysis. Some supervised or semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods are lacking. Moreover, it is not clear whether some classification methods originally designed for analyzing other bulk omics data are adaptable to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research: Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two methods were repurposed from deconvoluting DNA methylation data, i.e., linear constrained projection (CP) and robust partial correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulation data. We assessed the accuracy through intra-dataset and inter-dataset predictions; the robustness to practical challenges such as gene filtering, high similarity among cell types, and increased numbers of cell type classes; and the detection of rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Additionally, Seurat, SingleR, CP, and RPC were more robust against downsampling. However, Seurat had a major drawback in predicting rare cell populations, and it was suboptimal at differentiating cell types highly similar to each other, compared to SingleR and RPC. All the code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.

10.
Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution, and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually have different power for identifying unknown cell types through clustering. Methods that integrate multiple datasets can therefore potentially lead to better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are unlinked with the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cell scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ converges quickly and is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.

11.
Here, we introduce scMAGIC (Single Cell annotation using MArker Genes Identification and two rounds of reference-based Classification [RBC]), a novel method that uses well-annotated single-cell RNA sequencing (scRNA-seq) data as the reference to assist in the classification of query scRNA-seq data. A key innovation in scMAGIC is the introduction of a second-round RBC in which those query cells whose identities are confidently validated in the first round are used as a new reference to classify the query cells again, thereby eliminating the batch effects between the reference and the query data. scMAGIC significantly outperforms 13 competing RBC methods with their optimal parameter settings across 86 benchmark tests, especially when the cell types in the query dataset are not completely covered by the reference dataset and when there are significant batch effects between the reference and the query datasets. Moreover, when no reference dataset is available, scMAGIC can annotate query cells with reasonably high accuracy by using an atlas dataset as the reference.

12.
Genomics, 2023, 115(4): 110644
Single-cell RNA sequencing (scRNA-seq) analysis has provided unprecedented resolution for studies of diabetic retinopathy (DR). However, the early changes in the retina in diabetes remain unclear. A total of 8 human and mouse scRNA-seq datasets, containing 276,402 cells, were analyzed individually to comprehensively delineate the retinal cell atlas. Neural retinas were isolated from type 2 diabetes (T2D) and control mice, and scRNA-seq analysis was conducted to evaluate the early effects of diabetes on the retina. Bipolar cell (BC) heterogeneity was identified. We found some stable BCs across multiple datasets and explored their biological functions. A new RBC subtype (Car8_RBC) in the mouse retina was validated using multi-color immunohistochemistry. AC149090.1 was significantly upregulated in the rod cells, ON cone BCs (CBCs), OFF CBCs, and RBCs in T2D mice. Additionally, integrating scRNA-seq and genome-wide association study (GWAS) analyses indicated that interneurons, especially BCs, were the cells most vulnerable to diabetes. In conclusion, this study delineated a cross-species retinal cell atlas and uncovered the early pathological alterations in the retina of T2D mice.

13.
Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Owing to the low number of mRNA copies extracted per cell, scRNA-seq data exhibit a large number of dropouts, which hinders downstream analysis. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies dropout events based on gene expression levels and the variation of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In the experiments, the results on simulated and real datasets suggest that SDImpute is an effective tool for recovering the data and preserving the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analyses including clustering, visualization, and differential expression analysis.

14.
Pseudotime analysis of scRNA-seq data enables characterization of the continuous progression of various biological processes, such as the cell cycle. The cell cycle plays an important role in cell fate decisions and differentiation and is often regarded as a confounder in scRNA-seq data analysis when analyzing the role of other factors. Therefore, accurate prediction of cell cycle pseudotime and identification of cell cycle stages are important steps for characterizing development-related biological processes. Here, we develop CCPE, a novel cell cycle pseudotime estimation method to characterize cell cycle timing and identify cell cycle phases from scRNA-seq data. CCPE uses a discriminative helix to characterize the circular process of the cell cycle and estimates each cell's pseudotime along the cell cycle. We evaluated the performance of CCPE on a variety of simulated and real scRNA-seq datasets. Our results indicate that CCPE is an effective method for cell cycle estimation and is competitive in various applications compared with other existing methods. CCPE successfully identified cell cycle marker genes and is robust to dropout events in scRNA-seq data. Accurate prediction of the cell cycle using CCPE can also effectively facilitate the removal of cell cycle effects across cell types or conditions.
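
To make the notion of pseudotime along a circular process concrete, here is a minimal sketch. It is not CCPE's discriminative helix: it assumes each cell has already been embedded in two dimensions where the cycle appears as a loop, and it simply reads pseudotime off the angle around the centroid. The function name `circular_pseudotime` is hypothetical.

```python
import math


def circular_pseudotime(points):
    """Map 2-D cell coordinates to a pseudotime in [0, 1) by angle around the centroid.

    points: list of (x, y) tuples, one per cell (assumed to lie roughly on a loop).
    The zero point and direction of travel are arbitrary conventions here.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    two_pi = 2 * math.pi
    return [(math.atan2(y - cy, x - cx) % two_pi) / two_pi for x, y in points]
```

Because the output is periodic, cells with pseudotime near 0 and near 1 are neighbors on the cycle; any downstream comparison should use a circular distance rather than a plain difference.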

15.
16.
Single-cell RNA-seq (scRNA-seq) can be used to characterize cellular heterogeneity in thousands of cells. The reconstruction of a gene network based on coexpression patterns is a fundamental task in scRNA-seq analyses, and the mutual exclusivity of gene expression can be critical for understanding such heterogeneity. Here, we propose an approach for detecting communities from a genetic network constructed on the basis of coexpression properties. The community-based comparison of multiple coexpression networks enables the identification of functionally related gene clusters that cannot be fully captured through differential gene expression-based analysis. We also developed a novel metric referred to as the exclusively expressed index (EEI) that identifies mutually exclusive gene pairs from sparse scRNA-seq data. EEI quantifies and ranks the exclusive expression levels of all gene pairs from binary expression patterns while maintaining robustness against a low sequencing depth. We applied our methods to glioblastoma scRNA-seq data and found that gene communities were partially conserved after serum stimulation despite a considerable number of differentially expressed genes. We also demonstrate that the identification of mutually exclusive gene sets with EEI can improve the sensitivity of capturing cellular heterogeneity. Our methods complement existing approaches and provide new biological insights, even for a large, sparse dataset, in the single-cell analysis field.
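
Ranking gene pairs by mutual exclusivity from binary expression patterns can be sketched as follows. The abstract above does not give the EEI formula, so the score used here is an assumed stand-in, not EEI itself: it rewards pairs in which each gene is expressed alone in some cells and penalises co-expression. The function name `exclusivity_scores` is hypothetical.

```python
from itertools import combinations


def exclusivity_scores(binary_expr):
    """Rank gene pairs by a simple mutual-exclusivity score (assumed, not the EEI).

    binary_expr: dict mapping gene name -> list of 0/1 flags, one per cell.
    Returns (score, gene_a, gene_b) tuples sorted from most to least exclusive.
    """
    scores = []
    for a, b in combinations(sorted(binary_expr), 2):
        pairs = list(zip(binary_expr[a], binary_expr[b]))
        only_a = sum(1 for i, j in pairs if i and not j)
        only_b = sum(1 for i, j in pairs if j and not i)
        both = sum(1 for i, j in pairs if i and j)
        expressed = only_a + only_b + both
        # min() requires BOTH genes to appear alone somewhere; a gene whose
        # expression is nested inside the other's scores 0.
        score = 0.0 if expressed == 0 else 2 * min(only_a, only_b) / expressed
        scores.append((score, a, b))
    return sorted(scores, key=lambda t: (-t[0], t[1], t[2]))
```

A score of 1.0 means the two genes split the expressing cells evenly with no overlap; a score of 0.0 means at least one gene is never expressed without the other.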

17.
18.
19.
Individual cells are the basic units of life. Despite extensive efforts to characterize the cellular heterogeneity of different organisms, cross-species comparisons of landscape dynamics have not been achieved. Here, we applied single-cell RNA sequencing (scRNA-seq) to map organism-level cell landscapes at multiple life stages for mice, zebrafish and Drosophila. By integrating the comprehensive dataset of > 2.6 million single cells, we constructed a cross-species cell landscape and identified signatures and common pathways that changed throughout the life span. We identified structural inflammation and mitochondrial dysfunction as the most common hallmarks of organism aging, and found that pharmacological activation of mitochondrial metabolism alleviated aging phenotypes in mice. The cross-species cell landscape, together with other published datasets, is stored in an integrated online portal, Cell Landscape. Our work provides a valuable resource for studying lineage development, maturation and aging.

20.
Clustering is a prevalent means of analyzing single-cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we proposed Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while consuming only 66% of the memory, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.
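
The three stages named in SSCC (subsample, cluster, classify) can be sketched in a few lines. This is an illustration under stated assumptions, not the sscClust implementation: it omits the random projection and feature construction steps, clusters the subsample by connected components of a Spearman-correlation graph (an assumed choice), and labels the remaining cells by their most correlated subsampled cell. All function names and the `threshold` parameter are hypothetical.

```python
import random


def ranks(values):
    """Average ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def cluster_subsample(cells, idx, threshold):
    """Label subsampled cells by connected components of the correlation graph."""
    labels, next_label = {}, 0
    for start in idx:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in idx:
                if v not in labels and spearman(cells[u], cells[v]) > threshold:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels


def subsample_cluster_classify(cells, n_sub, threshold=0.5, seed=0):
    """Cluster a random subsample, then assign every other cell by 1-nearest neighbor."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(cells)), n_sub))
    labels = cluster_subsample(cells, idx, threshold)
    for i in range(len(cells)):
        if i not in labels:
            best = max(idx, key=lambda j: spearman(cells[i], cells[j]))
            labels[i] = labels[best]
    return [labels[i] for i in range(len(cells))]
```

The expensive clustering step only ever sees `n_sub` cells, which is where the speed-up of a subsample-then-classify design comes from; the classification pass is linear in the number of remaining cells.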
