Similar Documents
Found 20 similar documents (search time: 15 ms)
1.

Background

The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.

Methodology/Results

We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.
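The framework's exact statistics are not given here, but the two ingredients named above are standard and easy to sketch. The following toy Python sketch scores a gene set in one ranked dataset with a Mann-Whitney rank-sum test and combines per-dataset p-values with Fisher's method; these are assumed, illustrative choices, not necessarily the authors' implementations.

    # Hedged sketch: rank-based enrichment in one dataset, then a Fisher
    # meta-analysis across datasets. Both statistics are common defaults;
    # the paper's framework may differ in detail.
    import numpy as np
    from scipy import stats

    def enrichment_p(ranked_genes, gene_set):
        """One dataset: do gene_set members sit unusually high in the ranking?"""
        ranks = {g: i for i, g in enumerate(ranked_genes)}
        in_set = [ranks[g] for g in gene_set if g in ranks]
        out_set = [r for g, r in ranks.items() if g not in gene_set]
        return stats.mannwhitneyu(in_set, out_set, alternative="less").pvalue

    def fisher_meta(pvalues):
        """Meta-analysis: combine independent per-dataset p-values."""
        chi2 = -2.0 * np.sum(np.log(pvalues))
        return stats.chi2.sf(chi2, df=2 * len(pvalues))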

Conclusions

Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.

2.
MOTIVATION: To improve the ability of biologists (both researchers and students) to ask biologically interesting questions of the Gene Ontology (GO) database and to explore the ontologies by seeing large portions of the ontology graphs in context, along with details of individual terms in the ontologies. RESULTS: GoGet and GoView are two new tools built as part of an extensible web application system based on Java 2 Enterprise Edition technology. GoGet has a user interface that enables users to ask biologically interesting questions, such as (1) What are the DNA binding proteins involved in DNA repair, but not in DNA replication? and (2) Of the terms containing the word triphosphatase, which have associated gene products from mouse, but not fruit fly? The results of such queries can be viewed in a collapsed tabular format that eases the burden of getting through large tables of data. GoView enables users to explore the large directed acyclic graph structure of the ontologies in the GO database. The two tools are coordinated, so that results from queries in GoGet can be visualized in GoView in the ontology in which they appear, and explorations started from GoView can request details of gene product associations to appear in a result table in GoGet. AVAILABILITY: Free access to the GoGet query tool and free download of the GoView ontology viewer are provided to all users at http://db.math.macalester.edu/goproject. In addition, source code for the GoView tool is also available from this site, along with a user manual for both tools.
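GoGet itself is a web application, but the set algebra behind example query (1) is easy to make concrete. In this hypothetical Python sketch, the term-to-gene-product mapping and the gene names are invented purely for illustration.

    # Invented mapping from GO terms to annotated gene products.
    annotations = {
        "DNA binding":     {"RAD51", "PCNA", "POLA1"},
        "DNA repair":      {"RAD51", "XRCC1"},
        "DNA replication": {"PCNA", "POLA1"},
    }

    # "DNA binding AND DNA repair, NOT DNA replication"
    result = ((annotations["DNA binding"] & annotations["DNA repair"])
              - annotations["DNA replication"])
    print(sorted(result))  # -> ['RAD51']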

3.
4.
MOTIVATION: Single nucleotide polymorphisms (SNPs) are one of the most abundant genetic variations in the human genome. Recently, several platforms for high-throughput SNP analysis have become available, capable of measuring thousands of SNPs across the genome. Tools for analysing and visualizing these large genetic data sets in a biologically relevant manner are rare. This hinders effective use of SNP-array data in research on complex diseases, such as cancer. RESULTS: We describe a computational framework to analyse and visualize SNP-array data, and link the results to relevant databases. Our major objective is to develop methods for identifying DNA regions that likely harbour recessive mutations. Thus, the algorithms are designed to have high sensitivity, and the identified regions are ranked using a scoring algorithm. We have also developed annotation tools that automatically query gene IDs, exon counts, microarray probe IDs, etc. In our case study, we apply the methods to identifying candidate regions for recessively inherited colorectal cancer predisposition and suggest directions for wet-lab experiments. AVAILABILITY: R-package implementation is available at http://www.ltdk.helsinki.fi/sysbio/csb/downloads/CohortComparator/
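The R package itself is linked above; as a rough, language-neutral illustration of the core idea (high-sensitivity detection of candidate regions for recessive mutations), the Python sketch below finds and ranks runs of homozygosity in genotype calls. The genotype encoding and the minimum run length are assumptions, not the authors' parameters.

    # Hedged sketch: find long runs of homozygous calls ('AA'/'BB') and
    # rank them by length; longer runs are stronger recessive candidates.
    def homozygous_runs(calls, min_len=25):
        """calls: list of (position, genotype) tuples along one chromosome."""
        runs, start = [], None
        for i, (pos, gt) in enumerate(calls):
            if gt in ("AA", "BB"):
                if start is None:
                    start = i
            else:
                if start is not None and i - start >= min_len:
                    runs.append((calls[start][0], calls[i - 1][0], i - start))
                start = None
        if start is not None and len(calls) - start >= min_len:
            runs.append((calls[start][0], calls[-1][0], len(calls) - start))
        return sorted(runs, key=lambda r: -r[2])  # longest (top-ranked) first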

5.

Background  

With the vast amounts of biomedical data being generated by high-throughput analysis methods, controlled vocabularies and ontologies are becoming increasingly important for annotating units of information for ease of search and retrieval. Each scientific community tends to create its own locally available ontology, and the interfaces to query these ontologies tend to vary from group to group. We saw the need for a centralized location to perform controlled vocabulary queries that would offer both a lightweight, web-accessible user interface and a consistent, unified SOAP interface for automated queries.

6.
A data-parallel framework is very attractive for large-scale data processing, since it enables an application to easily process a huge amount of data on commodity machines. MapReduce, a popular data-parallel framework, is used in various fields such as web search, data mining and data warehouses, and has proven very practical for such applications. Star-join queries are common in data warehouses, a current target domain of data-parallel frameworks. This article proposes a new algorithm that efficiently processes star-join queries in data-parallel frameworks such as MapReduce and Dryad. Our star-join algorithm for general data-parallel frameworks, called Scatter-Gather-Merge, processes star-join queries in a constant number of computation steps, even as the number of participating dimension tables increases. By adopting Bloom filters, Scatter-Gather-Merge eliminates a non-trivial amount of I/O. We also show that Scatter-Gather-Merge can be easily applied to MapReduce. Our experimental results in both cluster and cloud environments show that Scatter-Gather-Merge outperforms existing approaches.
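The essence of the Bloom-filter optimization is easy to sketch outside any particular framework. The Python sketch below builds a filter over a dimension table's join keys and drops non-joining fact rows on the map side, before any shuffle; the sizes, hash construction and row layout are illustrative assumptions, not the paper's parameters.

    import hashlib

    class BloomFilter:
        """Toy Bloom filter; real deployments tune n_bits and n_hashes."""
        def __init__(self, n_bits=1 << 20, n_hashes=4):
            self.bits = bytearray(n_bits // 8)
            self.n_bits, self.n_hashes = n_bits, n_hashes

        def _positions(self, key):
            for i in range(self.n_hashes):
                h = hashlib.sha1(f"{i}:{key}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.n_bits

        def add(self, key):
            for p in self._positions(key):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, key):
            return all(self.bits[p // 8] & (1 << (p % 8))
                       for p in self._positions(key))

    def scatter(fact_rows, filters):
        """Map side: keep only fact rows whose foreign keys may join.
        filters maps a foreign-key column name to its dimension's filter."""
        for row in fact_rows:
            if all(row[col] in bf for col, bf in filters.items()):
                yield row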

7.
The immense growth of MEDLINE, coupled with the realization that a vast amount of biomedical knowledge is recorded in free-text format, has led to the appearance of a large number of literature mining techniques aiming to extract biomedical terms and their inter-relations from the scientific literature. Ontologies have been extensively utilized in the biomedical domain, either as controlled vocabularies or to provide the framework for mapping relations between concepts in biology and medicine. Literature-based approaches and ontologies have been used in the past for hypothesis generation in connection with drug discovery. Here, we review the application of literature mining and of ontology modeling and traversal to the area of drug repurposing (DR). In recent years, DR has emerged as a noteworthy alternative to the traditional drug development process, in response to the decreased productivity of the biopharmaceutical industry. Thus, systematic approaches to DR have been developed, involving a variety of in silico, genomic and high-throughput screening technologies. We also present attempts to integrate literature mining with other types of data arising from these technologies, along with visualization tools that assist in the discovery of novel associations between existing drugs and new indications.

8.
9.

Background

Mapping medical terms to standardized UMLS concepts is a basic step for leveraging biomedical texts in data management and analysis. However, available methods and tools have major limitations in handling queries over the UMLS Metathesaurus that contain inaccurate query terms, which frequently appear in real-world applications.

Methods

To provide a practical solution for this task, we propose a layered dynamic programming mapping (LDPMap) approach, which can efficiently handle such queries. LDPMap uses indexing and two layers of dynamic programming to map a biomedical term to a UMLS concept.
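LDPMap's layered scheme (indexing plus two DP passes) is more elaborate than anything shown here, but a single classic edit-distance DP already illustrates why this family of methods tolerates misspelled query terms. The concept names below are invented examples.

    # Illustrative only: one Levenshtein DP, not LDPMap's layered algorithm.
    def edit_distance(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                # deletion
                                curr[j - 1] + 1,            # insertion
                                prev[j - 1] + (ca != cb)))  # substitution
            prev = curr
        return prev[-1]

    def best_concept(query, concept_names):
        """Return the concept name closest to a (possibly misspelled) query."""
        return min(concept_names,
                   key=lambda name: edit_distance(query.lower(), name.lower()))

    print(best_concept("diabetis melitus",
                       ["Diabetes Mellitus", "Diabetes Insipidus"]))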

Results

Our empirical study shows that LDPMap achieves much faster query speeds than LCS. In comparison to the UMLS Metathesaurus Browser and MetaMap, LDPMap is much more effective in querying the UMLS Metathesaurus for inaccurately spelled medical terms, long medical terms, and medical terms with special characters.

Conclusions

These results demonstrate that LDPMap is an efficient and effective method for mapping medical terms to the UMLS Metathesaurus.

10.
11.
Drug discovery is the process of identifying new drugs, driven by the growing volume of data from existing chemical libraries and data banks. Knowledge graphs have been introduced to the domain of drug discovery to impose an explicit structure for integrating heterogeneous biomedical data. A knowledge graph can provide structured relations among multiple entities, as well as unstructured semantic relations associated with those entities. In this review, we summarize knowledge graph-based works that implement drug repurposing and adverse drug reaction prediction for drug discovery. As knowledge representation learning is a common way to explore knowledge graphs for prediction problems, we introduce several representative embedding models to provide a comprehensive understanding of knowledge representation learning.
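As a concrete taste of "representative embedding models", here is a minimal sketch of TransE, one of the best-known, which scores a (head, relation, tail) triple by how closely head + relation lands on tail. The vectors are toy values, not trained embeddings, and the entity names are invented.

    import numpy as np

    def transe_score(h, r, t):
        """TransE: higher (less negative) score = more plausible triple."""
        return -np.linalg.norm(h + r - t)

    drug    = np.array([0.9, 0.1])
    treats  = np.array([-0.5, 0.4])
    disease = np.array([0.4, 0.5])
    print(transe_score(drug, treats, disease))  # 0.0: maximally plausible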

12.
With the accumulation of large amounts of health-related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized Medicine (PPPM), ultimately affecting both the cost and quality of care. However, the high dimensionality and high complexity of the data involved prevent data-driven methods from being easily translated into clinically relevant models. Additionally, the application of cutting-edge predictive methods and data manipulation requires substantial programming skills, limiting their direct exploitation by medical domain experts. This leaves a gap between potential and actual data usage. In this study, the authors address this problem by focusing on open, visual environments suited to the medical community, and review code-free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database in a data mining environment (RapidMiner) supporting scalable predictive analytics using visual tools (RapidMiner's Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL (Extract, Transform, Load) process was initiated by retrieving data from the MIMIC-II tables of interest. As a use case, the correlation of platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for the automatic building, parameter optimization and evaluation of various predictive models under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.
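The study's point is precisely that this analysis can be assembled without code in RapidMiner; purely for orientation, the same use case in script form looks roughly like the Python below. The file name and column names are hypothetical, not MIMIC-II's actual schema.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical flat extract of the relevant MIMIC-II tables.
    df = pd.read_csv("mimic2_extract.csv")
    X = df[["platelet_count"]].fillna(df["platelet_count"].median())
    y = df["icu_survival"]  # assumed coding: 1 = survived, 0 = died

    model = LogisticRegression().fit(X, y)
    print("platelet-count coefficient:", model.coef_[0][0])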

13.
14.
As Semantic Web technologies mature and new releases of key elements, such as SPARQL 1.1 and OWL 2.0, become available, the Life Sciences continue to push the boundaries of these technologies with ever more sophisticated tools and applications. Unsurprisingly, therefore, interest in the SWAT4LS (Semantic Web Applications and Tools for the Life Sciences) activities has remained high, as was evident during the third international SWAT4LS workshop held in Berlin in December 2010. Contributors to this workshop were invited to submit extended versions of their papers, the best of which are now made available in this special supplement of BMC Bioinformatics. The papers reflect the wide range of work in this area, covering the storage and querying of Life Sciences data in RDF triple stores, tools for the development of biomedical ontologies, and the semantics-based integration of Life Sciences and clinical data.

15.
The quality of data plays an important role in business analysis and decision making, and data accuracy is an important aspect of data quality. One necessary task for data quality management is therefore to evaluate the accuracy of the data. Moreover, because the accuracy of a whole data set may be low while a useful part of it is high, it is also necessary to evaluate the accuracy of query results, called relative accuracy. To our knowledge, however, neither a metric nor effective methods for such evaluation have previously been proposed. Motivated by this, we propose a systematic method for relative accuracy evaluation. We design a relative accuracy evaluation framework for relational databases based on a new metric that measures accuracy using statistics. We apply these methods to evaluate the precision and recall of basic queries, which reflect a result's relative accuracy. We also propose methods to handle data updates and to improve accuracy evaluation using functional dependencies. Extensive experimental results show the effectiveness and efficiency of our proposed framework and algorithms.
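The paper's estimator is statistical and richer than this, but the two target quantities for a query result are just precision and recall against a verified reference set, as in this minimal sketch (the IDs are invented):

    def precision_recall(returned, reference):
        """Relative accuracy of one query result vs. a verified reference."""
        returned, reference = set(returned), set(reference)
        tp = len(returned & reference)
        precision = tp / len(returned) if returned else 0.0
        recall = tp / len(reference) if reference else 0.0
        return precision, recall

    p, r = precision_recall(returned={101, 102, 103, 104},
                            reference={102, 103, 105})
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67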

16.
In this paper, we discuss the properties of biological data and the challenges they pose for data management, and argue that, in order to meet the data management requirements of 'digital biology', careful integration of existing technologies and the development of new data management techniques for biological data are needed. Based on this premise, we present PathCase: Case Pathways Database System. PathCase is an integrated set of software tools for modelling, storing, analysing, visualizing and querying biological pathways data at different levels of genetic, molecular, biochemical and organismal detail. The novel features of the system include: (i) genomic information integrated with other biological data and presented starting from pathways; (ii) design for biologists who are possibly unfamiliar with genomics, but whose research is essential for annotating gene and genome sequences with biological functions; (iii) database design, implementation and graphical tools which enable users to visualize pathways data at multiple abstraction levels and to pose exploratory queries; (iv) a wide range of query types, including 'path' and 'neighbourhood' queries, with graphical visualization of query outputs; and (v) an implementation that allows for web (XML)-based dissemination of query outputs (i.e. pathways data in BioPAX format) to researchers in the community, giving them control over the use of pathways data.
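PathCase's query engine is a full database system; as a small stand-in, a 'neighbourhood query' reduces to a depth-bounded breadth-first search over the pathway graph, as in this Python sketch (the toy glycolysis fragment is invented):

    from collections import deque

    def neighbourhood(graph, start, k):
        """All nodes reachable from start in at most k steps (excluding start)."""
        seen, frontier = {start}, deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == k:
                continue
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        return seen - {start}

    pathway = {"glucose": ["G6P"], "G6P": ["F6P", "6PG"], "F6P": ["F16BP"]}
    print(neighbourhood(pathway, "glucose", 2))  # {'G6P', 'F6P', '6PG'}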

17.
Recent advances in technology and associated methodology have made the current period one of the most exciting in molecular biology and medicine. Underlying these advances is an appreciation that modern research is driven by increasingly large amounts of data, interpreted by interdisciplinary collaborative teams that are often geographically dispersed. The availability of cheap computing power, high-speed informatics networks and high-quality analysis software has been essential to this, as has the application of modern quality assurance methodologies. In this review, we discuss the application of modern 'high-throughput' molecular biological technologies, such as microarrays and next-generation sequencing, to scientific and biomedical research as we have observed it. We also offer guidance to help the reader understand key features of these technologies and new strategies, and to apply these i-Gene tools successfully in their own endeavours. Collectively, we term this 'i-Gene Analysis'. Finally, we offer predictions as to the developments anticipated in the near and more distant future.

18.
The modern biomedical research and healthcare delivery domains have seen an unparalleled increase in the rate of innovation and novel technologies over the past several decades. Catalyzed by paradigm-shifting public and private programs focusing upon the formation and delivery of genomic and personalized medicine, the need for high-throughput and integrative approaches to the collection, management, and analysis of heterogeneous data sets has become imperative. This need is particularly pressing in the translational bioinformatics domain, where many fundamental research questions require the integration of large scale, multi-dimensional clinical phenotype and bio-molecular data sets. Modern biomedical informatics theory and practice has demonstrated the distinct benefits associated with the use of knowledge-based systems in such contexts. A knowledge-based system can be defined as an intelligent agent that employs a computationally tractable knowledge base or repository in order to reason upon data in a targeted domain and reproduce expert performance relative to such reasoning operations. The ultimate goal of the design and use of such agents is to increase the reproducibility, scalability, and accessibility of complex reasoning tasks. Examples of the application of knowledge-based systems in biomedicine span a broad spectrum, from the execution of clinical decision support, to epidemiologic surveillance of public data sets for the purposes of detecting emerging infectious diseases, to the discovery of novel hypotheses in large-scale research data sets. In this chapter, we will review the basic theoretical frameworks that define core knowledge types and reasoning operations with particular emphasis on the applicability of such conceptual models within the biomedical domain, and then go on to introduce a number of prototypical data integration requirements and patterns relevant to the conduct of translational bioinformatics that can be addressed via the design and use of knowledge-based systems.

What to Learn in This Chapter

  • Understand basic knowledge types and structures that can be applied to biomedical and translational science;
  • Gain familiarity with the knowledge engineering cycle, tools and methods that may be used throughout that cycle, and the resulting classes of knowledge products generated via such processes;
  • Understand the basic methods and techniques that can be used to employ knowledge products in order to integrate and reason upon heterogeneous and multi-dimensional data sets; and
  • Become conversant in the open research questions/areas related to the ability to develop and apply knowledge collections in the translational bioinformatics domain.
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

19.
Fostering data sharing is a scientific and ethical imperative. Health gains can be achieved more comprehensively and quickly by combining large, information-rich datasets from across conventionally siloed disciplines and geographic areas. While collaboration for data sharing is increasingly embraced by policymakers and the international biomedical community, we lack a common ethical and legal framework to connect regulators, funders, consortia, and research projects so as to facilitate genomic and clinical data linkage, global science collaboration, and responsible research conduct. Governance tools can be used to responsibly steer the sharing of data for proper stewardship of research discovery, genomics research resources, and their clinical applications. In this article, we propose that an international code of conduct be designed to enable global genomic and clinical data sharing for biomedical research. To give this proposed code universal application and accountability, however, we propose to position it within a human rights framework. This proposition is not without precedent: international treaties have long recognized that everyone has a right to the benefits of scientific progress and its applications, and a right to the protection of the moral and material interests resulting from scientific productions. It is time to apply these twin rights to internationally collaborative genomic and clinical data sharing.

20.
MOTIVATION: The information model chosen to store biological data affects the types of queries possible, database performance, and difficulty in updating that information model. Genetic sequence data for pharmacogenetics studies can be complex, and the best information model to use may change over time. As experimental and analytical methods change, and as biological knowledge advances, the data storage requirements and types of queries needed may also change. RESULTS: We developed a model for genetic sequence and polymorphism data, and used XML Schema to specify the elements and attributes required for this model. We implemented this model as an ontology in a frame-based representation and as a relational model in a database system. We collected genetic data from two pharmacogenetics resequencing studies, and formulated queries useful for analysing these data. We compared the ontology and relational models in terms of query complexity, performance, and difficulty in changing the information model. Our results demonstrate benefits of evolving the schema for storing pharmacogenetics data: ontologies perform well in early design stages as the information model changes rapidly and simplify query formulation, while relational models offer improved query speed once the information model and types of queries needed stabilize.
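The trade-off the study measures can be miniaturized: the same "polymorphisms in gene X" question asked of a relational store and of a frame-style in-memory model. The schema and data below are invented for illustration; the study's own XML Schema model is far richer.

    import sqlite3

    # Relational side: explicit schema, SQL query.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE polymorphism (id TEXT, gene TEXT, position INT)")
    db.executemany("INSERT INTO polymorphism VALUES (?, ?, ?)",
                   [("rs1", "CYP2D6", 100), ("rs2", "CYP2C9", 250)])
    rows = db.execute("SELECT id, position FROM polymorphism WHERE gene = ?",
                      ("CYP2D6",)).fetchall()

    # Frame-style side: nested structures, no joins; schema changes are
    # absorbed locally, which favours early, fast-evolving designs.
    frames = {"CYP2D6": {"polymorphisms": [{"id": "rs1", "position": 100}]}}
    print(rows, frames["CYP2D6"]["polymorphisms"])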

