Similar literature
20 similar documents found.
1.
2.
Genetic epidemiology is a rapidly advancing field due to the recent availability of large amounts of omics data. In recent years, it has become possible to obtain omics information at the single-cell level, so genetic epidemiological models need to be updated to integrate with single-cell expression data. In this perspective paper, we propose a cell population-based framework for genetic epidemiology in the single-cell era. In this framework, genetic diversity influences phenotypic diversity through the diversity of cell population profiles, which are defined as high-dimensional probability distributions of the state spaces of biomolecules of each omics layer. We discuss how biomolecular experimental measurement data can capture the different properties of this distribution. In particular, single-cell data constitute a sample from this population distribution where only some coordinate values are observable. From a data analysis standpoint, we introduce methodology for feature extraction from cell population profiles. Finally, we discuss how this framework can be applied not only to genetic epidemiology but also to systems biology.

3.
Clustering cells and depicting the lineage relationship among cell subpopulations are fundamental tasks in single-cell omics studies. However, existing analytical methods face challenges in stratifying cells, tracking cellular trajectories, and identifying critical points of cell transitions. To overcome these, we propose a novel Markov hierarchical clustering algorithm (MarkovHC), a topological clustering method that leverages the metastability of exponentially perturbed Markov chains to systematically reconstruct the cellular landscape. Briefly, MarkovHC starts with local connectivity and density derived from the input and outputs a hierarchical structure for the data. We first benchmarked MarkovHC on five simulated datasets and ten public single-cell datasets with known labels. Then, we used MarkovHC to investigate the multi-level architectures and transition processes during human embryo preimplantation development and gastric cancer progression. MarkovHC found heterogeneous cell states and sub-cell types in lineage-specific progenitor cells and revealed the most probable transition paths and critical points in the cellular processes. These results demonstrated MarkovHC’s effectiveness in facilitating the stratification of cells, identification of cell populations, and characterization of cellular trajectories and critical points.

4.
Single-cell RNA sequencing enables us to characterize cellular heterogeneity at single-cell resolution with the help of cell type identification algorithms. However, the noise inherent in single-cell RNA-sequencing data severely impairs the accuracy of cell clustering, marker identification and visualization. We propose that clustering based on feature density profiles can distinguish informative features from noise. We call this strategy ‘entropy subspace’ separation and designed a cell clustering algorithm called ENtropy subspace separation-based Clustering for nOise REduction (ENCORE) by integrating the ‘entropy subspace’ separation strategy with a consensus clustering method. We demonstrate that ENCORE achieves superior cell clustering and generates high-resolution visualization across 12 standard datasets. More importantly, ENCORE enables identification of group markers with biological significance from a hard-to-separate dataset. With the advantages of effective feature selection, improved clustering, accurate marker identification and high-resolution visualization, we present ENCORE to the community as an important tool for scRNA-seq data analysis to study cellular heterogeneity and discover group markers.

5.
6.
In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provides critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and produce results that fall well short of biological expectations for co-expressed genes. Herein, we present the single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biological context. Importantly, scLM can simultaneously cluster multiple single-cell datasets, i.e., perform consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analysis. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects from multiple datasets. Results from both simulation data and experimental data demonstrate that scLM outperforms the existing methods with considerably improved accuracy. To illustrate the biological insights of scLM, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.

7.
8.
Translational cancer genomics research aims to ensure that experimental knowledge is subject to computational analysis, and integrated with a variety of records from omics and clinical sources. The data retrieval from such sources is not trivial, due to their redundancy and heterogeneity, and the presence of false evidence. In silico marker identification, therefore, remains a complex task that is mainly motivated by the impact that target identification from the elucidation of gene co-expression dynamics and regulation mechanisms, combined with the discovery of genotype–phenotype associations, may have for clinical validation. Based on the reuse of publicly available gene expression data, our aim is to propose cancer marker classification by integrating the prediction power of multiple annotation sources. In particular, with reference to the functional annotation for colorectal markers, we indicate a classification of markers into diagnostic and prognostic classes combined with susceptibility and risk factors.

9.
The ability to analyze multiple single-cell parameters is critical for understanding cellular heterogeneity. Despite recent advances in measurement technology, methods for analyzing high-dimensional single-cell data are often subjective, labor-intensive and require prior knowledge of the biological system. To objectively uncover cellular heterogeneity from single-cell measurements, we present a versatile computational approach, spanning-tree progression analysis of density-normalized events (SPADE). We applied SPADE to flow cytometry data of mouse bone marrow and to mass cytometry data of human bone marrow. In both cases, SPADE organized cells in a hierarchy of related phenotypes that partially recapitulated well-described patterns of hematopoiesis. We demonstrate that SPADE is robust to measurement noise and to the choice of cellular markers. SPADE facilitates the analysis of cellular heterogeneity, the identification of cell types and comparison of functional markers in response to perturbations.

10.
Mesenchymal stem cells (MSCs) are multipotent stromal cells with great potential for clinical applications. However, little is known about their cell heterogeneity at a single-cell resolution, which severely impedes the development of MSC therapy. In this review, we focus on advances in the identification of novel surface markers and functional subpopulations of MSCs made by single-cell RNA sequencing and discuss their participation in the pathophysiology of stem cells and related diseases. The challenges and future directions of single-cell RNA sequencing in MSCs are also addressed in this review.

11.
Acute myeloid leukemia (AML) is a fatal hematopoietic malignancy and has a prognosis that varies with its genetic complexity. However, there has been no appropriate integrative analysis on the hierarchy of different AML subtypes. Using Microwell-seq, a high-throughput single-cell mRNA sequencing platform, we analyzed the cellular hierarchy of bone marrow samples from 40 patients and 3 healthy donors. We also used single-cell single-molecule real-time (SMRT) sequencing to investigate the clonal heterogeneity of AML cells. From the integrative analysis of 191,727 AML cells, we established a single-cell AML landscape and identified an AML progenitor cell cluster with novel AML markers. Patients with ribosomal-protein-high progenitor cells had a low remission rate. We deduced two types of AML with diverse clinical outcomes. We traced mitochondrial mutations in the AML landscape by combining Microwell-seq with SMRT sequencing. We propose the existence of a phenotypic “cancer attractor” that might help to define a common phenotype for AML progenitor cells. Finally, we explored the potential drug targets by making comparisons between the AML landscape and the Human Cell Landscape. We identified a key AML progenitor cell cluster. A high ribosomal protein gene level indicates poor prognosis. We deduced two types of AML and explored the potential drug targets. Our results suggest the existence of a cancer attractor.

12.
Single-cell mass cytometry, also known as cytometry by time of flight (CyTOF), is a powerful high-throughput technology that allows analysis of up to 50 protein markers per cell for the quantification and classification of single cells. Traditional manual gating used to identify new cell populations has been inadequate, inefficient, unreliable, and difficult to use, and no algorithm for identifying both calibration and new cell populations has been well established. Deep learning with graphic cluster (DGCyTOF) visualization is developed as a new integrated embedding visualization approach for identifying canonical and new cell types. DGCyTOF combines deep-learning classification and hierarchical stable-clustering methods to sequentially build a tri-layer construct for known cell types and the identification of new cell types. First, deep classification learning is constructed to distinguish calibration cell populations from all cells by softmax classification assignment under a probability threshold, and graph embedding clustering is then used to identify new cell populations sequentially. Between these two layers, cell labels are automatically adjusted between new and unknown cell populations via a feedback loop using an iterative calibration system to reduce the error rate in the identification of cell types. Finally, a 3-dimensional (3D) visualization platform is developed to display the cell clusters with all cell-population types annotated.
Using two benchmark CyTOF databases comprising up to 43 million cells, we compared accuracy and speed in the identification of cell types among DGCyTOF, DeepCyTOF, and other approaches that combine dimension reduction with clustering: Principal Component Analysis (PCA), Factor Analysis (FA), Independent Component Analysis (ICA), Isometric Feature Mapping (Isomap), t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP), each with k-means clustering and Gaussian mixture clustering. We observed that DGCyTOF represents a robust, complete learning system with high accuracy, speed and visualization by eight measurement criteria. DGCyTOF displayed F-scores of 0.9921 for the CyTOF1 and 0.9992 for the CyTOF2 datasets, whereas those scores were only 0.507 and 0.529 for t-SNE + k-means, and 0.565 and 0.59 for UMAP + k-means. Comparison of DGCyTOF with t-SNE and UMAP visualization in accuracy demonstrated its approximately 35% superiority in predicting cell types. In addition, observation of cell-population distribution was more intuitive in the 3D visualization in DGCyTOF than in t-SNE and UMAP visualization. The DGCyTOF model can automatically assign known labels to single cells with high accuracy using deep-learning classification combined with traditional graph-clustering and dimension-reduction strategies. Guided by a calibration system, the model seeks an optimal accuracy balance among calibration cell populations and unknown cell types, yielding a complete and robust learning system that is highly accurate in the identification of cell populations compared to results using other methods in the analysis of single-cell CyTOF data. Application of the DGCyTOF method to identify cell populations could be extended to the analysis of single-cell RNA-seq data and other omics data.
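The first step described in this abstract, softmax assignment under a probability threshold, can be illustrated with a minimal sketch. It is not the DGCyTOF implementation: the function names `softmax` and `assign_label`, the threshold value 0.9, and the use of `None` to mark cells left for the later clustering step are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_label(logits, labels, threshold=0.9):
    """Return the known label if the classifier is confident enough.

    Hypothetical rule: a cell keeps a known (calibration) label only when
    its top softmax probability reaches `threshold`; otherwise it is
    returned as None and would be passed on to a clustering step.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return labels[best]
    return None
```

For example, `assign_label([10.0, 0.0, 0.0], ["T", "B", "NK"])` returns `"T"`, while equal scores for all three classes give a top probability of one third and return `None`.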

13.
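Spatial transcriptomics is an omics technology that builds on single-cell RNA sequencing to measure the spatial positions of cells. It overcomes the loss of cells' spatial information within the tissue that occurs during single-cell isolation and library construction in single-cell transcriptomics, and can simultaneously provide transcriptome data and spatial position information within the tissue for the subject under study. Spatial transcriptomics plays an important role in studying the development of cell lineages and the regulatory mechanisms and interactions between cells, and is an important direction and active focus of omics research. In recent years, spatial transcriptomics has developed rapidly: new detection methods keep emerging, and technical indicators such as detection sensitivity, resolution, and throughput keep improving. In this paper, we classify the commonly used spatial transcriptomics technologies according to the principle by which they obtain spatial information, and summarize the detection principles, representative techniques, and corresponding technical indicators of each class. We then illustrate the application of spatial transcriptomics in neuroscience from two aspects: distinguishing brain cell types and constructing cell-layer atlases, and analyzing the features of nervous-system-related diseases and studying their markers. Finally, we summarize the current problems of spatial transcriptomics and discuss its future directions.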

14.
Cancer stem cell (CSC) theory suggests that only a small subpopulation of cells having stem cell-like potentials can initiate tumor development. While recent data on acute lymphoblastic leukemia (ALL) are conflicting, some studies have demonstrated the existence of such cells following CD34-targeted isolation of primary samples. Although CD34 is a useful marker for the isolation of CSCs in leukemias, the identification of other specific markers besides CD34 has been relatively unsuccessful. To identify new markers, we first performed extensive analysis of surface markers on several B-ALL cell lines. Our data demonstrated that none of the B-ALL cell lines tested expressed CD34, but certain lines contained cell populations with marked heterogeneity in marker expression. Moreover, the CD9+ cell population possessed stem cell characteristics within the clone, as demonstrated by in vitro and transplantation experiments. These results suggest that CD9 is a useful positive-selection marker for the identification of CSCs in B-ALL.

15.
16.
In this work, we describe the development of Polar Gini Curve, a method for characterizing cluster markers by analyzing single-cell RNA sequencing (scRNA-seq) data. Polar Gini Curve combines the gene expression and the 2D coordinates ("spatial") information to detect patterns of uniformity in any clustered cells from scRNA-seq data. We demonstrate that Polar Gini Curve can help users characterize the shape and density distribution of cells in a particular cluster, which can be generated during routine scRNA-seq data analysis. To quantify the extent to which a gene is uniformly distributed in a cell cluster space, we combine two polar Gini curves (PGCs): one drawn upon the cell-points expressing the gene (the "foreground curve") and the other drawn upon all cell-points in the cluster (the "background curve"). We show that genes with highly dissimilar foreground and background curves tend not to be uniformly distributed in the cell cluster, and thus have spatially divergent gene expression patterns within the cluster. Genes with similar foreground and background curves tend to be uniformly distributed in the cell cluster, and thus have uniform gene expression patterns within the cluster. Such quantitative attributes of PGCs can be applied to sensitively discover biomarkers across clusters from scRNA-seq data. We demonstrate the performance of the Polar Gini Curve framework in several simulation case studies. Using this framework to analyze a real-world neonatal mouse heart cell dataset, the detected biomarkers may characterize novel subtypes of cardiac muscle cells. The source code and data for Polar Gini Curve can be found at http://discovery.informatics.uab.edu/PGC/ or https://figshare.com/projects/Polar_Gini_Curve/76749.
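The abstract does not spell out how a polar Gini curve is computed, so the sketch below assumes one plausible construction rather than the authors' code: for each angle, the 2D cell coordinates are projected onto the direction at that angle, shifted so the smallest projection is zero, and summarized by a Gini coefficient. The foreground and background curves are then compared by root-mean-square deviation. The names `gini`, `polar_gini_curve` and `curve_rmsd`, the angle grid, and the RMSD comparison are all illustrative assumptions.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def polar_gini_curve(points, n_angles=36):
    """One Gini coefficient per projection angle (assumed construction)."""
    curve = []
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        proj = [x * c + y * s for x, y in points]
        lo = min(proj)
        curve.append(gini([p - lo for p in proj]))  # shift so values are >= 0
    return curve


def curve_rmsd(a, b):
    """Root-mean-square deviation between two curves of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical toy cluster: four cells, of which two express the gene.
background = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
foreground = [(0.0, 0.0), (1.0, 0.0)]
score = curve_rmsd(polar_gini_curve(foreground), polar_gini_curve(background))
```

Under this reading, a gene expressed by every cell in the cluster gives identical curves and a score of zero, while a gene confined to one side of the cluster gives a positive score.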

17.
The first morphological sign of vertebrate postcranial body segmentation is the sequential production from posterior paraxial mesoderm of blocks of cells termed somites. Each of these embryonic structures is polarized along the anterior/posterior axis, a subdivision first distinguished by marker gene expression restricted to rostral or caudal territories of forming somites. To better understand the generation of segment polarity in vertebrates, we have studied the zebrafish mutant fused somites (fss), because its paraxial mesoderm lacks segment polarity. Previously examined markers of caudal half-segment identity are widely expressed, whereas markers of rostral identity are either missing or dramatically down-regulated, suggesting that the paraxial mesoderm of the fss mutant embryo is profoundly caudalized. These findings gave rise to a model for the formation of segment polarity in the zebrafish in which caudal is the default identity for paraxial mesoderm, upon which is patterned rostral identity in an fss-dependent manner. In contrast to this scheme, the caudal marker gene ephrinA1 was recently shown to be down-regulated in fss embryos. We now show that notch5, another caudal identity marker and a component of the Delta/Notch signaling system, is not expressed in the paraxial mesoderm of early segmentation stage fss embryos. We use cell transplantation to create genetic mosaics between fss and wild-type embryos in order to assay the requirement for fss function in notch5 expression. In contrast to the expression of rostral markers, which have a cell-autonomous requirement for fss, expression of notch5 is induced in fss cells at short range by nearby wild-type cells, indicating a cell-non-autonomous requirement for fss function in this process. 
These new data suggest that segment polarity is created in a three-step process in which cells that have assumed a rostral identity must subsequently communicate with their partially caudalized neighbors in order to induce the fully caudalized state.

18.
Genomics, 2022, 114(3): 110353
It has been demonstrated that miRNAs are involved in many biological processes including cell proliferation and differentiation, apoptosis, and stress responses. Although single-cell RNA sequencing technology is now widespread, quantifying miRNA at the single-cell level remains challenging. Herein, we present computational methods to infer the single-cell miRNA expression level from the abundances of its target genes. First, we developed an enrichment-based approach for estimating miRNA expression that considers miRNA-mRNA regulation information and the miRNA-mRNA correlation signal captured from existing TCGA datasets. Further efforts were made to infer miRNA expression with machine learning models. We compared the accuracy and robustness of the methods on simulated single-cell data. Finally, we applied the method to single-cell RNA-seq data from triple-negative breast cancer (TNBC) patients to discover miRNA markers of malignant cells at the single-cell level. Our tool is available online at: https://github.com/ChengkuiZhao/Single-cell-miRNA-prediction.

19.
Cell surface proteins have a wide range of biological functions, and are often used as lineage-specific markers. Antibodies that recognize cell surface antigens are widely used as research tools, diagnostic markers, and even therapeutic agents. The ability to obtain broad cell surface protein profiles would thus be of great value in a wide range of fields. However, there are currently few available methods for high-throughput analysis of large numbers of cell surface proteins. We describe here a high-throughput flow cytometry (HT-FC) platform for rapid analysis of 363 cell surface antigens. We demonstrate that HT-FC provides reproducible results, and use the platform to identify cell surface antigens that are influenced by common cell preparation methods. We show that multiple populations within complex samples such as primary tumors can be simultaneously analyzed by co-staining of cells with lineage-specific antibodies, allowing unprecedented depth of analysis of heterogeneous cell populations. Furthermore, standard informatics methods can be used to visualize, cluster and downsample HT-FC data to reveal novel signatures and biomarkers. We show that the cell surface profile provides sufficient molecular information to classify samples from different cancers and tissue types into biologically relevant clusters using unsupervised hierarchical clustering. Finally, we describe the identification of a candidate lineage marker and its subsequent validation. In summary, HT-FC combines the advantages of a high-throughput screen with a detection method that is sensitive, quantitative, highly reproducible, and allows in-depth analysis of heterogeneous samples. The use of commercially available antibodies means that high-quality reagents are immediately available for follow-up studies. HT-FC has a wide range of applications, including biomarker discovery, molecular classification of cancers, and identification of novel lineage-specific or stem cell markers.

20.