Similar articles
1.

Background

The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases. Several methods exist to integrate heterogeneous omics data, yet most biologists still select candidate genes manually by intersecting candidate lists from analyses of different omics data types. These lists are generated by imposing hard (strict) thresholds on quantitative variables, such as P-values and fold changes, which increases the chance of missing potentially important candidates.

Methods

To better facilitate the unbiased integration of heterogeneous omics data collected from diverse platforms and samples, we propose a desirability function framework for identifying candidate genes with strong evidence across data types as targets for follow-up functional analysis. Our approach is targeted towards disease systems with sparse, heterogeneous omics data, so we tested it on one such pathology: spontaneous preterm birth (sPTB).

Results

We developed the software integRATE, which uses desirability functions to rank genes both within and across studies, identifying well-supported candidate genes according to the cumulative weight of biological evidence rather than by imposing hard thresholds on key variables. Integrating 10 sPTB omics studies identified both genes in pathways previously suspected to be involved in sPTB and novel genes never before linked to this syndrome. integRATE is available as an R package on GitHub (https://github.com/haleyeidem/integRATE).

Conclusions

Desirability-based data integration is a solution most applicable in biological research areas where omics data is especially heterogeneous and sparse, allowing for the prioritization of candidate genes that can be used to inform more targeted downstream functional analyses.
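
The desirability idea described in the abstract above can be illustrated with a minimal Python sketch. It is not integRATE's API (integRATE is an R package, and its functions and arguments are not shown in the abstract). The function names, the linear desirability shape, the cut-off values, the weighted arithmetic mean, and the three example genes are all assumptions chosen for illustration.

```python
def desirability_low(x, best, worst):
    """'Smaller is better' desirability: 1 at or below `best`, 0 at or above `worst`."""
    if x <= best:
        return 1.0
    if x >= worst:
        return 0.0
    return (worst - x) / (worst - best)


def desirability_high(x, worst, best):
    """'Larger is better' desirability: 0 at or below `worst`, 1 at or above `best`."""
    if x <= worst:
        return 0.0
    if x >= best:
        return 1.0
    return (x - worst) / (best - worst)


def weighted_mean(values, weights):
    """Combine per-variable desirabilities into one score (an assumed choice)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical genes: (P-value, absolute log2 fold change).
genes = {
    "gene_A": (0.001, 2.0),
    "gene_B": (0.04, 0.5),
    "gene_C": (0.2, 3.0),
}

scores = {}
for name, (p_value, abs_lfc) in genes.items():
    d_p = desirability_low(p_value, best=0.0, worst=0.5)    # assumed cut-offs
    d_fc = desirability_high(abs_lfc, worst=0.0, best=3.0)  # assumed cut-offs
    scores[name] = weighted_mean([d_p, d_fc], weights=[1.0, 1.0])

# Rank genes from most to least desirable.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy data, a hard cut-off of P < 0.05 would discard gene_C outright, whereas the continuous score still ranks it second because of its large fold change.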

2.
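The development of high-throughput experimental methods has produced large volumes of genomic, transcriptomic, metabolomic, and other omics data, and integrating these data makes a comprehensive understanding of biological systems possible. However, owing to the limitations of current experimental techniques, most high-throughput omics data carry systematic biases and differ in data type and reliability, which makes integration difficult. Focusing on the transcriptome, proteome, and metabolome, this article reviews recent progress in omics data integration, including new integration methods and analysis platforms. Although existing statistical and network-analysis methods help uncover associations between different omics datasets, deeper, biologically meaningful integration still awaits broad advances across biology, mathematics, computer science, and other fields.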

3.
The challenge for -omics research is to overcome the fragmentation of knowledge by integrating several sources of heterogeneous information into a coherent whole. Successful data integration is widely recognized as one of the keys to making stored data more productive. With proper data integration tools and algorithms, researchers can uncover relationships that enable better and faster decisions. Data integration is essential for the present -omics community because -omics data are currently spread worldwide in a wide variety of formats. These formats can be integrated and migrated across platforms through different techniques, one of the most important of which is XML. XML provides a document markup language that is easy to learn, retrieve, store and transmit, and it is semantically richer than HTML. Here, we describe bio-warehousing, database federation and controlled vocabularies, and highlight the application of XML to store, migrate and validate -omics data.
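
As a rough illustration of storing, migrating and checking an -omics record in XML, the Python sketch below uses only the standard library. The element names (`measurement`, `gene`, `sample`, `value`), the required-field list and the example values are hypothetical and are not taken from the article. The check is a simple structural test, since `xml.etree.ElementTree` does not perform DTD or XML Schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal "schema": every record must carry these child elements.
REQUIRED_FIELDS = ("gene", "sample", "value")


def build_record(gene, sample, value, kind):
    """Store one measurement as an XML element."""
    rec = ET.Element("measurement", attrib={"type": kind})
    ET.SubElement(rec, "gene").text = gene
    ET.SubElement(rec, "sample").text = sample
    ET.SubElement(rec, "value").text = str(value)
    return rec


def is_valid(rec):
    """Structural check: required fields are present and the value is numeric."""
    for name in REQUIRED_FIELDS:
        child = rec.find(name)
        if child is None or not (child.text or "").strip():
            return False
    try:
        float(rec.findtext("value"))
    except ValueError:
        return False
    return True


# Serialize to text (the form that would be migrated between platforms) ...
xml_text = ET.tostring(
    build_record("GENE_X", "sample_01", 7.25, "transcriptomics"),
    encoding="unicode",
)

# ... and parse it back on the receiving side before checking it.
restored = ET.fromstring(xml_text)
print(xml_text)
print(is_valid(restored))
```

Full schema validation would need a separate validating parser; the sketch only shows the round trip and a basic sanity check.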

4.
Background: More and more high-throughput datasets are available from multiple levels of gene regulation measurement. The reverse engineering of gene regulatory networks from these data offers a valuable research paradigm for deciphering regulatory mechanisms. So far, numerous methods have been developed for reconstructing gene regulatory networks. Results: In this paper, we provide a review of bioinformatics methods for inferring gene regulatory networks from omics data. To achieve precise reconstruction of gene regulatory networks, an intuitive alternative is to integrate these available resources in a rational framework. We also provide computational perspectives on inferring gene regulatory networks from heterogeneous data. We highlight the importance of integrating multi-omics data with prior knowledge in gene regulatory network inference. Conclusions: We provide computational perspectives on inferring gene regulatory networks from multiple omics data and present theoretical analyses of existing challenges and possible solutions. We emphasize prior knowledge and data integration in network inference owing to their ability to identify regulatory causality.

5.
Molecular and functional profiling of cancer cell lines is subject to laboratory‐specific experimental practices and data analysis protocols. The current challenge therefore is how to make integrated use of the omics profiles of cancer cell lines for reliable biological discoveries. Here, we carried out a systematic analysis of nine types of data modalities using meta‐analysis of 53 omics studies across 12 research laboratories for 2,018 cell lines. To account for the relatively low consistency observed for certain data modalities, we developed a robust data integration approach that identifies reproducible signals shared among multiple data modalities and studies. We demonstrated the power of the integrative analyses by identifying a novel driver gene, ECHDC1, with a tumor suppressive role validated both in breast cancer cells and patient tumors. The multi‐modal meta‐analysis approach also identified synthetic lethal partners of cancer drivers, including a co‐dependency of PTEN‐deficient endometrial cancer cells on RNA helicases.

6.
Rohan H. C. Palmer  Emma C. Johnson  Hyejung Won  Renato Polimanti  Manav Kapoor  Apurva Chitre  Molly A. Bogue  Chelsie E. Benca-Bachman  Clarissa C. Parker  Anurag Verma  Timothy Reynolds  Jason Ernst  Michael Bray  Soo Bin Kwon  Dongbing Lai  Bryan C. Quach  Nathan C. Gaddis  Laura Saba  Hao Chen  Michael Hawrylycz  Shan Zhang  Yuan Zhou  Spencer Mahaffey  Christian Fischer  Sandra Sanchez-Roige  Anita Bandrowski  Qing Lu  Li Shen  Vivek Philip  Joel Gelernter  Laura J. Bierut  Dana B. Hancock  Howard J. Edenberg  Eric O. Johnson  Eric J. Nestler  Peter B. Barr  Pjotr Prins  Desmond J. Smith  Schahram Akbarian  Thorgeir Thorgeirsson  Dave Walton  Erich Baker  Daniel Jacobson  Abraham A. Palmer  Michael Miles  Elissa J. Chesler  Jake Emerson  Arpana Agrawal  Maryann Martone  Robert W. Williams 《Genes, Brain & Behavior》2021,20(6):e12738
The National Institute on Drug Abuse and Joint Institute for Biological Sciences at the Oak Ridge National Laboratory hosted a meeting attended by a diverse group of scientists with expertise in substance use disorders (SUDs), computational biology, and FAIR (Findability, Accessibility, Interoperability, and Reusability) data sharing. The meeting's objective was to discuss and evaluate better strategies to integrate genetic, epigenetic, and 'omics data across human and model organisms to achieve deeper mechanistic insight into SUDs. Specific topics were to (a) evaluate the current state of substance use genetics and genomics research and fundamental gaps, (b) identify opportunities and challenges of integration and sharing across species and data types, (c) identify current tools and resources for integration of genetic, epigenetic, and phenotypic data, (d) discuss steps and impediments related to data integration, and (e) outline future steps to support more effective collaboration, particularly between animal model research communities and human genetics and clinical research teams. This review summarizes key facets of this catalytic discussion with a focus on new opportunities and gaps in resources and knowledge on SUDs.

7.
In the current omics era, innovative high-throughput technologies allow temporal and conditional changes to be measured at various cellular levels. Although analysing each of these omics datasets individually undoubtedly yields interesting findings, a global insight into cellular behaviour can only be gained by integrating them. A systems approach is thus predicated on data integration. However, because of the complexity of biological systems and the specificities of the data-generating technologies (noisiness, heterogeneity, etc.), integrating omics data in an attempt to reconstruct signalling networks is not trivial. Developing the necessary methodologies constitutes a major research challenge. Besides their intrinsic value for health care, the environment and industry, prokaryotes are ideal model systems for developing these methods further because of their lower regulatory complexity compared with eukaryotes and the ease with which they can be manipulated. Several successful examples outlined in this review already show the potential of the systems approach for both fundamental and industrial applications that would be time-consuming or impossible to develop solely through traditional reductionist approaches.

8.
The cancer tissue proteome has enormous potential as a source of novel predictive biomarkers in oncology. Progress in the development of mass spectrometry (MS)‐based tissue proteomics now presents an opportunity to exploit this by applying the strategies of comprehensive molecular profiling and big‐data analytics that are refined in other fields of ‘omics research. ProCan (ProCan is a registered trademark) is a program aiming to generate high‐quality tissue proteomic data across a broad spectrum of cancer types. It is based on data‐independent acquisition–MS proteomic analysis of annotated tissue samples sourced through collaboration with expert clinical and cancer research groups. The practical requirements of a high‐throughput translational research program have shaped the approach that ProCan is taking to address challenges in study design, sample preparation, raw data acquisition, and data analysis. The ultimate goal is to establish a large proteomics knowledge‐base that, in combination with other cancer ‘omics data, will accelerate cancer research.

9.
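Developing drugs with entirely new structures is slow, costly, and risky. Predicting new indications for existing drugs by various techniques, known as drug repositioning, can shorten development time and reduce cost and risk. Because diseases and known drugs are both numerous, screening for new uses of known drugs purely by experiment remains very expensive. With the accumulation of omics and pharmacoinformatics data, drug repositioning has entered a stage that combines rational design with experimental screening, and computational prediction for drug repositioning has become an important research direction in computational and systems biology. This article groups current computational strategies for drug repositioning into drug-target, drug-drug, and drug-disease relationship analysis, and reviews the reported methods and examples of their successful application.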

10.
Multi-omics integration is key to fully understanding complex biological processes in a holistic manner. Furthermore, multi-omics combined with new longitudinal experimental designs can reveal dynamic relationships between omics layers and identify key players or interactions in system development or complex phenotypes. However, integration methods have to address various experimental designs and do not guarantee interpretable biological results. The new challenge of multi-omics integration is to solve interpretation and unlock the hidden knowledge within the multi-omics data. In this paper, we go beyond integration and propose a generic approach to the interpretation problem. From multi-omics longitudinal data, this approach builds and explores hybrid multi-omics networks composed of both inferred and known relationships within and between omics layers. With smart node labelling and propagation analysis, this approach predicts regulation mechanisms and multi-omics functional modules. We applied the method to three case studies with various multi-omics designs and identified new multi-layer interactions involved in key biological functions that could not be revealed with single-omics analysis. Moreover, we highlighted interplay in the kinetics that could help identify novel biological mechanisms. This method is available as an R package, netOmics, to readily suit any application.

11.
12.
13.
With advances in genome-wide sequencing technologies and bioinformatics approaches, a large number of datasets of normal and malignant erythropoiesis have been generated and made public to researchers around the world. Collection and integration of these datasets greatly facilitate basic research and the clinical diagnosis and treatment of blood disorders. Here we provide a brief introduction to the most popular omics data resources for normal and malignant hematopoiesis, including some integrated web tools, to help users become better equipped to perform common analyses. We hope this review will promote awareness and facilitate the use of public database resources in hematology research.

14.
15.
16.
17.
《Genomics》2019,111(6):1387-1394
To decipher the genetic architecture of human disease, various types of omics data are generated. Two common types are genotypes and gene expression. For biological and technical reasons, genotype data are often generated for a large number of individuals but gene expression data for only a few, leading to unequal sample sizes across omics data types. The lack of a standard statistical procedure for integrating such datasets motivates us to propose a two-step multi-locus association method using latent variables. Our method is more powerful than single or separate omics data analysis, and it comprehensively unravels deep-seated signals through a single statistical model. Extensive simulation confirms that it is robust to various genetic models, as its power increases with sample size and number of associated loci. It computes p-values very quickly. Application to a real dataset on psoriasis identifies 17 novel SNPs, functionally related to psoriasis-associated genes, at a much smaller sample size than standard GWAS.

18.
Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy (measured by combining clustering accuracy and clinical significance), robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most of the cancers studied, which may be of particular interest to researchers in omics data analysis.
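
The count of eleven combinations in the abstract above is consistent with taking every subset of at least two of the four data types: C(4,2) + C(4,3) + C(4,4) = 6 + 4 + 1 = 11. The short Python check below enumerates them. The placeholder names `type_1` to `type_4` are assumptions, since the abstract does not name the four data types, and reading "combinations" as subsets of size two or more is likewise an interpretation.

```python
from itertools import combinations

# Placeholder labels; the abstract does not say which four data types were used.
data_types = ["type_1", "type_2", "type_3", "type_4"]

# Every subset containing at least two data types.
combos = [
    subset
    for size in range(2, len(data_types) + 1)
    for subset in combinations(data_types, size)
]

for subset in combos:
    print(" + ".join(subset))

print(len(combos))  # 6 pairs + 4 triples + 1 quadruple = 11
```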

19.
The drug discovery enterprise provides strong drivers for data integration. While attention in this arena has tended to focus on integration of primary data from omics and other large platform technologies contributing to drug discovery and development, the scientific literature remains a major source of information valuable to pharmaceutical enterprises, and therefore tools for mining such data and integrating it with other sources are of vital interest and economic impact. This review provides a brief overview of approaches to literature mining as they relate to drug discovery, and offers an illustrative case study of a 'lightweight' approach we have implemented within an industrial context.

20.
《遗传学报》2021,48(7):520-530
Genetic, epigenetic, and metabolic alterations are all hallmarks of cancer. However, the epigenome and metabolome are both highly complex and dynamic biological networks in vivo. The interplay between the epigenome and metabolome contributes to a biological system that is responsive to the tumor microenvironment and possesses a wealth of unknown biomarkers and targets of cancer therapy. From this perspective, we first review the state of high-throughput biological data acquisition (i.e. multiomics data) and analysis (i.e. computational tools) and then propose a conceptual in silico metabolic and epigenetic regulatory network (MER-Net) that is based on these current high-throughput methods. The conceptual MER-Net is aimed at linking metabolomic and epigenomic networks through observation of biological processes, omics data acquisition, analysis of network information, and integration with validated database knowledge. Thus, MER-Net could be used to reveal new potential biomarkers and therapeutic targets using deep learning models to integrate and analyze large multiomics networks. We propose that MER-Net can serve as a tool to guide integrated metabolomics and epigenomics research or can be modified to answer other complex biological and clinical questions using multiomics data.
