首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The single-cell RNA sequencing (scRNA-seq) technologies obtain gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. As the low amount of extracted mRNA copies per cell, scRNA-seq data exhibit a large number of dropouts, which hinders the downstream analysis of the scRNA-seq data. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In the experiments, the results of the simulated datasets and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with the state-of-the-art imputation methods, SDImpute improves the accuracy of the downstream analysis including clustering, visualization, and differential expression analysis.  相似文献   

2.
3.
signatureSearch is an R/Bioconductor package that integrates a suite of existing and novel algorithms into an analysis environment for gene expression signature (GES) searching combined with functional enrichment analysis (FEA) and visualization methods to facilitate the interpretation of the search results. In a typical GES search (GESS), a query GES is searched against a database of GESs obtained from large numbers of measurements, such as different genetic backgrounds, disease states and drug perturbations. Database matches sharing correlated signatures with the query indicate related cellular responses frequently governed by connected mechanisms, such as drugs mimicking the expression responses of a disease. To identify which processes are predominantly modulated in the GESS results, we developed specialized FEA methods combined with drug-target network visualization tools. The provided analysis tools are useful for studying the effects of genetic, chemical and environmental perturbations on biological systems, as well as searching single cell GES databases to identify novel network connections or cell types. The signatureSearch software is unique in that it provides access to an integrated environment for GESS/FEA routines that includes several novel search and enrichment methods, efficient data structures, and access to pre-built GES databases, and allowing users to work with custom databases.  相似文献   

4.
5.
The BioGRID (Biological General Repository for Interaction Datasets, thebiogrid.org ) is an open‐access database resource that houses manually curated protein and genetic interactions from multiple species including yeast, worm, fly, mouse, and human. The ~1.93 million curated interactions in BioGRID can be used to build complex networks to facilitate biomedical discoveries, particularly as related to human health and disease. All BioGRID content is curated from primary experimental evidence in the biomedical literature, and includes both focused low‐throughput studies and large high‐throughput datasets. BioGRID also captures protein post‐translational modifications and protein or gene interactions with bioactive small molecules including many known drugs. A built‐in network visualization tool combines all annotations and allows users to generate network graphs of protein, genetic and chemical interactions. In addition to general curation across species, BioGRID undertakes themed curation projects in specific aspects of cellular regulation, for example the ubiquitin‐proteasome system, as well as specific disease areas, such as for the SARS‐CoV‐2 virus that causes COVID‐19 severe acute respiratory syndrome. A recent extension of BioGRID, named the Open Repository of CRISPR Screens (ORCS, orcs.thebiogrid.org ), captures single mutant phenotypes and genetic interactions from published high throughput genome‐wide CRISPR/Cas9‐based genetic screens. BioGRID‐ORCS contains datasets for over 1,042 CRISPR screens carried out to date in human, mouse and fly cell lines. The biomedical research community can freely access all BioGRID data through the web interface, standardized file downloads, or via model organism databases and partner meta‐databases.  相似文献   

6.
Physcomitrella patens is a bryophyte model plant that is often used to study plant evolution and development. Its resources are of great importance for comparative genomics and evo‐devo approaches. However, expression data from Physcomitrella patens were so far generated using different gene annotation versions and three different platforms: CombiMatrix and NimbleGen expression microarrays and RNA sequencing. The currently available P. patens expression data are distributed across three tools with different visualization methods to access the data. Here, we introduce an interactive expression atlas, Physcomitrella Expression Atlas Tool (PEATmoss), that unifies publicly available expression data for P. patens and provides multiple visualization methods to query the data in a single web‐based tool. Moreover, PEATmoss includes 35 expression experiments not previously available in any other expression atlas. To facilitate gene expression queries across different gene annotation versions, and to access P. patens annotations and related resources, a lookup database and web tool linked to PEATmoss was implemented. PEATmoss can be accessed at https://peatmoss.online.uni-marburg.de  相似文献   

7.
8.
With the tremendous increase of publicly available single-cell RNA-sequencing (scRNA-seq) datasets, bioinformatics methods based on gene co-expression network are becoming efficient tools for analyzing scRNA-seq data, improving cell type prediction accuracy and in turn facilitating biological discovery. However, the current methods are mainly based on overall co-expression correlation and overlook co-expression that exists in only a subset of cells, thus fail to discover certain rare cell types and sensitive to batch effect. Here, we developed independent component analysis-based gene co-expression network inference (ICAnet) that decomposed scRNA-seq data into a series of independent gene expression components and inferred co-expression modules, which improved cell clustering and rare cell-type discovery. ICAnet showed efficient performance for cell clustering and batch integration using scRNA-seq datasets spanning multiple cells/tissues/donors/library types. It works stably on datasets produced by different library construction strategies and with different sequencing depths and cell numbers. We demonstrated the capability of ICAnet to discover rare cell types in multiple independent scRNA-seq datasets from different sources. Importantly, the identified modules activated in acute myeloid leukemia scRNA-seq datasets have the potential to serve as new diagnostic markers. Thus, ICAnet is a competitive tool for cell clustering and biological interpretations of single-cell RNA-seq data analysis.  相似文献   

9.
Filamentous fungal gene expression assays provide essential information for understanding systemic cellular regulation. To aid research on fungal gene expression, we constructed a novel, comprehensive, free database, the filamentous fungal gene expression database (FFGED), available at http://bioinfo.townsend.yale.edu. FFGED features user-friendly management of gene expression data, which are assorted into experimental metadata, experimental design, raw data, normalized details, and analysis results. Data may be submitted in the process of an experiment, and any user can submit multiple experiments, thus classifying the FFGED as an “active experiment” database. Most importantly, FFGED functions as a collective and collaborative platform, by connecting each experiment with similar related experiments made public by other users, maximizing data sharing among different users, and correlating diverse gene expression levels under multiple experimental designs within different experiments. A clear and efficient web interface is provided with enhancement by AJAX (Asynchronous JavaScript and XML) and through a collection of tools to effectively facilitate data submission, sharing, retrieval and visualization.  相似文献   

10.
In gene expression profiling studies, including single-cell RNA sequencing(sc RNA-seq)analyses, the identification and characterization of co-expressed genes provides critical information on cell identity and function. Gene co-expression clustering in sc RNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and produce results that substantially limit biological expectations of co-expressed genes. Herein, we present single-cell Latent-variable Model(sc LM), a gene coclustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biologic context. Importantly, sc LM can simultaneously cluster multiple single-cell datasets, i.e., consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analysis. sc LM takes raw count data as input and preserves biological variation without being influenced by batch effects from multiple datasets. Results from both simulation data and experimental data demonstrate that sc LM outperforms the existing methods with considerably improved accuracy. To illustrate the biological insights of sc LM, we apply it to our in-house and public experimental sc RNA-seq datasets. sc LM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the sc LM method is available at https://github.com/QSong-github/sc LM.  相似文献   

11.
12.
The magnitude of the problems of drug abuse and Neuro-AIDS warrants the development of novel approaches for testing hypotheses in diagnosis and treatment ranging from cell culture models to developing databases. In this study, cultured neurons were treated with/without HIV-TAT, ENV, or cocaine in a 2x2x2 expression study design. RNA was purified, labeled, and expression data were produced and analyzed using ANOVA. Thus, we identified 35 genes that were significantly expressed across treatment conditions. A diagram is presented showing examples of molecular relationships involving a significantly expressed gene in the current study (SOX2). Also, we use this information to discuss examples of gene expression interactions as a means to portray significance and complexity of gene expression studies in Drug Abuse and Neuro-AIDS. Furthermore, we discuss here that critical interactions remain undetected, which may be unravelled by developing robust database systems containing large datasets and gleaned information from collaborating scientists . Hence, we are developing a public domain database we named The Agora database , that will served as a shared infrastructure to query, deposit, and review information related to drug abuse and dementias including Neuro-AIDS. A workflow of this database is also outlined in this paper.  相似文献   

13.
We present a web resource MEM (Multi-Experiment Matrix) for gene expression similarity searches across many datasets. MEM features large collections of microarray datasets and utilizes rank aggregation to merge information from different datasets into a single global ordering with simultaneous statistical significance estimation. Unique features of MEM include automatic detection, characterization and visualization of datasets that includes the strongest coexpression patterns. MEM is freely available at .  相似文献   

14.
Circular RNAs (circRNAs) from back-splicing of exon(s) have been recently identified to be broadly expressed in eukaryotes, in tissue- and species-specific manners. Although functions of most circRNAs remain elusive, some circRNAs are shown to be functional in gene expression regulation and potentially relate to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. Profiling circRNAs by integrating their expression among different samples thus provides molecular basis for further functional study of circRNAs and their potential application in clinic. Here, we report CIRCpedia v2, an updated database for comprehensive circRNA annotation from over 180 RNA-seq datasets across six different species. This atlas allows users to search, browse, and download circRNAs with expression features in various cell types/tissues, including disease samples. In addition, the updated database incorporates conservation analysis of circRNAs between humans and mice. Finally, the web interface also contains computational tools to compare circRNA expression among samples. CIRCpedia v2 is accessible at http://www.picb.ac.cn/rnomics/circpedia.  相似文献   

15.
SUMMARY: LinkinPath is a pathway mapping and analysis tool that enables users to explore and visualize the list of gene/protein sequences through various Flash-driven interactive web interfaces including KEGG pathway maps, functional composition maps (TreeMaps), molecular interaction/reaction networks and pathway-to-pathway networks. Users can submit single or multiple datasets of gene/protein sequences to LinkinPath to (i) determine the co-occurrence and co-absence of genes/proteins on animated KEGG pathway maps; (ii) compare functional compositions within and among the datasets using TreeMaps; (iii) analyze the statistically enriched pathways across the datasets; (iv) build the pathway-to-pathway networks for each dataset; (v) explore potential interaction/reaction paths between pathways; and (vi) identify common pathway-to-pathway networks across the datasets. AVAILABILITY: LinkinPath is freely available to all interested users at http://www.biotec.or.th/isl/linkinpath/.  相似文献   

16.
Technical and experimental advances in microaspiration techniques, RNA amplification, quantitative real-time polymerase chain reaction (qPCR), and cDNA microarray analysis have led to an increase in the number of studies of single-cell gene expression. In particular, the central nervous system (CNS) is an ideal structure to apply single-cell gene expression paradigms. Unlike an organ that is composed of one principal cell type, the brain contains a constellation of neuronal and noneuronal populations of cells. A goal is to sample gene expression from similar cell types within a defined region without potential contamination by expression profiles of adjacent neuronal subpopulations and noneuronal cells. The unprecedented resolution afforded by single-cell RNA analysis in combination with cDNA microarrays and qPCR-based analyses allows for relative gene expression level comparisons across cell types under different experimental conditions and disease states. The ability to analyze single cells is an important distinction from global and regional assessments of mRNA expression and can be applied to optimally prepared tissues from animal models as well as postmortem human brain tissues. This focused review illustrates the potential power of single-cell gene expression studies within the CNS in relation to neurodegenerative and neuropsychiatric disorders such as Alzheimer's disease (AD) and schizophrenia, respectively.  相似文献   

17.
18.
GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox   总被引:26,自引:0,他引:26  
High-throughput gene expression analysis has become a frequent and powerful research tool in biology. At present, however, few software applications have been developed for biologists to query large microarray gene expression databases using a Web-browser interface. We present GENEVESTIGATOR, a database and Web-browser data mining interface for Affymetrix GeneChip data. Users can query the database to retrieve the expression patterns of individual genes throughout chosen environmental conditions, growth stages, or organs. Reversely, mining tools allow users to identify genes specifically expressed during selected stresses, growth stages, or in particular organs. Using GENEVESTIGATOR, the gene expression profiles of more than 22,000 Arabidopsis genes can be obtained, including those of 10,600 currently uncharacterized genes. The objective of this software application is to direct gene functional discovery and design of new experiments by providing plant biologists with contextual information on the expression of genes. The database and analysis toolbox is available as a community resource at https://www.genevestigator.ethz.ch.  相似文献   

19.
The Maximal Margin (MAMA) linear programming classification algorithm has recently been proposed and tested for cancer classification based on expression data. It demonstrated sound performance on publicly available expression datasets. We developed a web interface to allow potential users easy access to the MAMA classification tool. Basic and advanced options provide flexibility in exploitation. The input data format is the same as that used in most publicly available datasets. This makes the web resource particularly convenient for non-expert machine learning users working in the field of expression data analysis.  相似文献   

20.
Gene expression patterns can reflect gene regulations in human tissues under normal or pathologic conditions. Gene expression profiling data from studies of primary human disease samples are particularly valuable since these studies often span many years in order to collect patient clinical information and achieve a large sample size. Disease-to-Gene Expression Mapper (DGEM) provides a beneficial community resource to access and analyze these data; it currently includes Affymetrix oligonucleotide array datasets for more than 40 human diseases and 1400 samples. The data are normalized to the same scale and stored in a relational database. A statistical-analysis pipeline was implemented to identify genes abnormally expressed in disease tissues or genes whose expressions are associated with clinical parameters such as cancer patient survival. Data-mining results can be queried through a web-based interface at http://dgem.dhcp.iupui.edu/. The query tool enables dynamic generation of graphs and tables that are further linked to major gene and pathway resources that connect the data to relevant biology, including Entrez Gene and Kyoto Encyclopedia of Genes and Genomes (KEGG). In summary, DGEM provides scientists and physicians a valuable tool to study disease mechanisms, to discover potential disease biomarkers for diagnosis and prognosis, and to identify novel gene targets for drug discovery. The source code is freely available for non-profit use, on request to the authors.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号