Similar Articles
20 similar articles retrieved.
1.
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall into both T1 translational research (translating basic science results into new interventions) and T2 translational research, that is, translational research for public health. Potential use cases include better phenotyping of research subjects and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
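The rule-based versus machine-learning distinction mentioned above can be made concrete with a small example. The following is a minimal, hypothetical sketch of a rule-based gene-mention finder: one hand-written regular expression stands in for a knowledge-based rule set (the pattern and the sample sentence are illustrative assumptions, not taken from the chapter).

```python
import re

# A hand-written "rule": gene symbols are often short tokens of capital
# letters and digits ending in a digit (e.g., BRCA1, TP53). Real rule-based
# systems combine many such rules with curated dictionaries; a single
# pattern like this will also produce false positives due to ambiguity.
GENE_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\d\b")

def find_gene_mentions(text):
    """Return (start, end, mention) spans matched by the rule."""
    return [(m.start(), m.end(), m.group()) for m in GENE_PATTERN.finditer(text)]

sentence = "Mutations in BRCA1 and TP53 are associated with tumor progression."
print(find_gene_mentions(sentence))
# [(13, 18, 'BRCA1'), (23, 27, 'TP53')]
```

A statistical system would instead learn which tokens are gene mentions from annotated examples; hybrids combine both.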

What to Learn in This Chapter

Text mining is an established field, but its application to translational bioinformatics is quite new and it presents myriad research opportunities. It is made difficult by the fact that natural (human) language, unlike computer language, is characterized at all levels by rampant ambiguity and variability. Important sub-tasks include gene name recognition, or finding mentions of gene names in text; gene normalization, or mapping mentions of genes in text to standard database identifiers; phenotype recognition, or finding mentions of phenotypes in text; and phenotype normalization, or mapping mentions of phenotypes to concepts in ontologies. Text mining for translational bioinformatics can necessitate dealing with two widely varying genres of text—published journal articles, and prose fields in electronic medical records. Research into the latter has been impeded for years by lack of public availability of data sets, but this has very recently changed and the field is poised for rapid advances. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
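As a companion to the recognition step, gene normalization can be illustrated as a dictionary lookup that maps surface mentions to standard database identifiers. The tiny synonym table below is an illustrative assumption (though the Entrez Gene identifiers shown are real); production normalizers must also resolve the ambiguity the chapter emphasizes, since one symbol can name several genes or an ordinary English word.

```python
# Hypothetical synonym table: lower-cased mention -> candidate gene IDs.
# Real systems build such tables from resources like Entrez Gene.
SYNONYMS = {
    "brca1": ["672"],   # unambiguous mapping
    "p53": ["7157"],
    "ache": ["43"],     # gene symbol that is also an English word
}

def normalize(mention):
    """Map a textual gene mention to candidate database identifiers."""
    return SYNONYMS.get(mention.lower(), [])

print(normalize("BRCA1"))  # ['672']
print(normalize("ache"))   # ['43'] -- though in prose this may not be a gene at all
```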
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

2.
3.

Background  

Bioinformatics and medical informatics are two research fields that serve the needs of different but related communities. Both domains share the common goal of providing new algorithms, methods and technological solutions to biomedical research, and of contributing to the treatment and cure of diseases. Although different microarray techniques have been used successfully to investigate useful information for cancer diagnosis at the gene expression level, the true integration of existing methods into day-to-day clinical practice is still a long way off. Within this context, case-based reasoning emerges as a paradigm especially well suited to the development of biomedical informatics applications and decision support systems, given the support and collaboration that such translational development requires. With the goals of removing barriers to multi-disciplinary collaboration and facilitating the dissemination and transfer of knowledge to real practice, case-based reasoning systems have the potential to be applied to translational research, mainly because their computational reasoning paradigm is similar to the way clinicians gather, analyze and process information in their own practice of clinical medicine.
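To illustrate the reasoning paradigm described above: case-based reasoning solves a new problem by retrieving the most similar previously solved cases and reusing their solutions. The sketch below covers only the retrieve step, with made-up feature vectors and outcomes; it is not code from any system discussed in the article.

```python
import math

# Hypothetical case base: (feature vector, recorded outcome).
CASE_BASE = [
    ([0.9, 0.1, 0.4], "responder"),
    ([0.2, 0.8, 0.7], "non-responder"),
    ([0.8, 0.2, 0.5], "responder"),
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, k=1):
    """Retrieve the k most similar past cases
    (the 'Retrieve' in Retrieve-Reuse-Revise-Retain)."""
    return sorted(CASE_BASE, key=lambda case: euclidean(query, case[0]))[:k]

print(retrieve([0.85, 0.15, 0.45]))
# [([0.9, 0.1, 0.4], 'responder')]
```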

4.

Background

The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration, in particular in establishing shared identity and shared meaning across heterogeneous biomedical data sources.

Results

We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license.
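The declaratively represented forward-chaining rules mentioned above can be illustrated with a toy example. The sketch below applies one rule over a tiny in-memory triple store until no new facts can be derived; the predicates and the rule are illustrative assumptions, not KaBOB's actual rule set, which operates over hundreds of millions of RDF triples.

```python
# Toy triple store: (subject, predicate, object).
facts = {
    ("geneA", "encodes", "proteinA"),
    ("proteinA", "participates_in", "apoptosis"),
}

# One declarative rule: if a gene encodes a protein and that protein
# participates in a process, record that the gene is involved in it.
def rule(fs):
    derived = set()
    for (g, p1, prot) in fs:
        if p1 != "encodes":
            continue
        for (s, p2, proc) in fs:
            if s == prot and p2 == "participates_in":
                derived.add((g, "involved_in", proc))
    return derived

# Forward chaining: apply the rule until a fixed point is reached.
while True:
    new = rule(facts) - facts
    if not new:
        break
    facts |= new

print(("geneA", "involved_in", "apoptosis") in facts)  # True
```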

Conclusions

KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0559-3) contains supplementary material, which is available to authorized users.

5.
6.

Background

Whole exome sequencing (WES) has provided a means for researchers to gain access to a highly enriched subset of the human genome in which to search for variants that are likely to be pathogenic and possibly provide important insights into disease mechanisms. In developing countries, bioinformatics capacity and expertise are severely limited, and wet bench scientists are required to take on the challenging task of understanding and implementing the barrage of bioinformatics tools that are available to them.

Results

We designed a novel method for the filtration of WES data called TAPER™ (Tool for Automated selection and Prioritization for Efficient Retrieval of sequence variants).

Conclusions

TAPER™ implements a set of logical steps by which to prioritize candidate variants that could be associated with disease, and it is aimed at implementation in biomedical laboratories with limited bioinformatics capacity. TAPER™ is free, can be set up on a Windows operating system (Windows 7 and above) and does not require any programming knowledge. In summary, we have developed a freely available tool that simplifies variant prioritization from WES data in order to facilitate discovery of disease-causing genes.
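The kind of "logical steps" such a filtration workflow applies can be sketched as successive filters over annotated variants. The example below is a generic illustration with made-up thresholds and field names; it is not TAPER's actual logic, which the paper defines in detail.

```python
# Hypothetical annotated variants, e.g. parsed from an annotated VCF.
variants = [
    {"gene": "GENE1", "maf": 0.0001, "consequence": "stop_gained"},
    {"gene": "GENE2", "maf": 0.2500, "consequence": "missense_variant"},
    {"gene": "GENE3", "maf": 0.0010, "consequence": "synonymous_variant"},
]

# Illustrative prioritization steps: keep rare variants with
# protein-altering consequences. Thresholds here are assumptions.
RARE_MAF = 0.01
DAMAGING = {"stop_gained", "frameshift_variant", "missense_variant"}

candidates = [
    v for v in variants
    if v["maf"] < RARE_MAF and v["consequence"] in DAMAGING
]

print([v["gene"] for v in candidates])  # ['GENE1']
```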

7.
8.
Highlights
• Automated metadata extraction from potentially large sets of mass spectrometric raw data.
• Reduction of extracted metadata into groups of shared parameter sets (a grouping sketch follows below).
• Tabular representation for quality control, reporting and publication.
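A minimal sketch of the grouping idea in the second highlight, assuming per-file metadata has already been extracted into dictionaries; the parameter names and values below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical metadata extracted from raw mass spectrometry files.
runs = [
    {"file": "run1.raw", "instrument": "QExactive", "resolution": 70000},
    {"file": "run2.raw", "instrument": "QExactive", "resolution": 70000},
    {"file": "run3.raw", "instrument": "QExactive", "resolution": 140000},
]

# Reduce per-file metadata to groups sharing the same parameter set.
groups = defaultdict(list)
for run in runs:
    key = tuple(sorted((k, v) for k, v in run.items() if k != "file"))
    groups[key].append(run["file"])

for params, files in groups.items():
    print(dict(params), "->", files)
# {'instrument': 'QExactive', 'resolution': 70000} -> ['run1.raw', 'run2.raw']
# {'instrument': 'QExactive', 'resolution': 140000} -> ['run3.raw']
```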

9.
Highlights
• Nature is equipped with solutions that use light to drive essential metabolic processes.
• Delivery of foreign genes confers light sensitivity on otherwise insensitive cells.
• Cells are engineered to transform light energy into desired biological functions.
• Light-regulated control has led to great advances in biomedical research.

10.
11.
Highlights
• Expanded CAG stretches are prone to translational frameshifting.
• Depletion of the charged, cognate tRNA causes translational frameshifting.
• The frequency of translational frameshifting correlates with CAG repeat length.
• Frameshifted species modulate the aggregation course of the parental protein.

12.
Highlights
• Production of sera with different levels of protection against rodent Plasmodium.
• Generation of immunomic and proteomic data sets enriched in protective antigens.
• Prediction of the most likely protective antigens using a weighted scoring system (see the sketch below).
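A minimal sketch of the weighted-scoring idea in the third highlight, with invented evidence types and weights; the actual scoring scheme is defined in the paper.

```python
# Hypothetical per-antigen evidence scores from the two data sets.
antigens = {
    "Ag1": {"immunomic": 0.9, "proteomic": 0.4},
    "Ag2": {"immunomic": 0.3, "proteomic": 0.8},
    "Ag3": {"immunomic": 0.7, "proteomic": 0.6},
}

# Illustrative weights favoring immunomic evidence; values are assumptions.
WEIGHTS = {"immunomic": 0.6, "proteomic": 0.4}

def score(evidence):
    return sum(WEIGHTS[k] * v for k, v in evidence.items())

ranked = sorted(antigens, key=lambda a: score(antigens[a]), reverse=True)
print(ranked)  # ['Ag1', 'Ag3', 'Ag2']
```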

13.
Highlights
• First intimin or invasin β-domain structure; both crystallize as monomers.
• First structure solved using 3-λ SeMet MAD data from crystals grown in LCP.
• Identification of a non-BIG domain directly downstream of the β domain.
• Highly conserved and coevolving residues are identified in the Int/Inv family.

14.

Background

Applications in biomedical science and life science produce large data sets using increasingly powerful imaging devices and computer simulations. It is becoming increasingly difficult for scientists to explore and analyze these data using traditional tools. Interactive data processing and visualization tools can help scientists overcome these limitations.

Results

We show that new data processing tools and visualization systems can be used successfully in biomedical and life science applications. We present an adaptive high-resolution display system suitable for biomedical image data, algorithms for analyzing and visualizing protein surfaces and retinal optical coherence tomography data, and visualization tools for 3D gene expression data.

Conclusion

We demonstrated that interactive processing and visualization methods and systems can support scientists in a variety of biomedical and life science application areas concerned with massive data analysis.

15.

Background

An adequate and expressive ontological representation of biological organisms and their parts requires formal reasoning mechanisms for their relations of physical aggregation and containment.

Results

We demonstrate that the proposed formalism makes it possible to deal consistently with "role propagation along non-taxonomic hierarchies", a problem that has repeatedly been identified as an intricate reasoning problem in biomedical ontologies.
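A concrete instance of role propagation: an inflammation located in the appendix should, via the appendix's parthood chain, also count as located in the digestive system. The sketch below propagates a located-in role along a toy part-of hierarchy; the anatomy fragment is an illustrative assumption, not the paper's formalism, which is stated in description logic.

```python
# Toy partonomy: child -> parent ("part_of" edges).
PART_OF = {
    "appendix": "cecum",
    "cecum": "large_intestine",
    "large_intestine": "digestive_system",
}

def propagate_location(site):
    """Propagate located_in along the part_of chain (transitive closure)."""
    locations = []
    while site in PART_OF:
        site = PART_OF[site]
        locations.append(site)
    return locations

# An appendicitis located in the appendix is thereby located in all
# structures the appendix is (transitively) part of.
print(propagate_location("appendix"))
# ['cecum', 'large_intestine', 'digestive_system']
```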

Conclusion

The proposed approach seems to be suitable for the redesign of compositional hierarchies in (bio)medical terminology systems that are embedded into the framework of the OBO (Open Biological Ontologies) Relation Ontology and use knowledge representation languages developed by the Semantic Web community.

16.
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask "Which biological process is over-represented in my set of interesting genes or proteins?" we can also ask "Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?". For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases, blood coagulation disorders, that was associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO. In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.
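The statistical core of the enrichment analysis described above is typically an over-representation test such as the hypergeometric test. The sketch below uses made-up counts to ask whether a term annotating 40 of 20,000 background genes is over-represented when 8 of 100 "significant" genes carry it; real tools also correct for multiple testing across thousands of terms.

```python
from scipy.stats import hypergeom

M = 20000  # genes in the background (population)
n = 40     # background genes annotated with the term of interest
N = 100    # genes in the "significant" list (sample)
k = 8      # significant genes annotated with the term

# P(X >= k): probability of seeing at least k annotated genes by chance.
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"enrichment p-value: {p_value:.3g}")
```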

What to Learn in This Chapter

  • Review the commonly used approach of Gene Ontology based enrichment analysis
  • Understand the pitfalls associated with current approaches
  • Understand the national infrastructure available for using alternative ontologies for enrichment analysis
  • Learn about a generalized enrichment analysis workflow and its application using disease ontologies
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

17.
1. The receptors for steroid hormones consist of well-defined domains with overlapping functions.
2. Contrary to the classical view, it is now becoming increasingly evident that the agonist-binding regions of the ligand binding domain are not identical to those that bind steroid antagonists.
3. The DNA binding domain can be activated equally well in the presence of both agonists and antagonists, again contradicting the classical view, in which only the physiologically active hormone was believed to induce such a change.
4. In some cases, a synthetic antagonist is a more specific ligand for the receptor than the natural hormone.
5. Synthetic antagonists are therefore important not only for alleviating disease in human subjects; they have also become an important tool for elucidating the mechanism of transactivation by steroid hormones.

18.

Background  

Biomedical ontologies are critical for the integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems. The effectiveness of these systems is heavily dependent on the quality of the ontological terms and their classifications. To assist in developing and maintaining the ontologies objectively, we propose automatic approaches to classify and/or validate their semantic categories. In previous work, we developed an approach using contextual syntactic features obtained from a large domain corpus to reclassify and validate concepts of the Unified Medical Language System (UMLS), a comprehensive resource of biomedical terminology. In this paper, we introduce another classification approach based on the words of the concept strings and compare it to the contextual syntactic approach.
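The word-based classification idea can be sketched as a bag-of-words classifier over concept strings. The toy training data and the two semantic categories below are invented for illustration; the paper's experiments use UMLS concepts and a large domain corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical concept strings labeled with UMLS-style semantic categories.
concepts = ["myocardial infarction", "renal failure", "aspirin",
            "ibuprofen", "heart attack", "acetaminophen"]
labels = ["Disease", "Disease", "Drug", "Drug", "Disease", "Drug"]

# The words of the concept string are the only features, as in the
# word-based approach contrasted with contextual syntactic features.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(concepts, labels)

print(model.predict(["kidney failure"]))  # ['Disease'] -- shares the word 'failure'
```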

19.
Highlights
• Network analysis is essential for data mining of omics-based large-scale data sets.
• Gene coexpression analysis is useful for prediction of gene function (see the sketch below).
• Comparative network analysis can reveal common and unique plant metabolic pathways.
• Novel genome editing tools facilitate rational metabolic engineering.
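A minimal sketch of the coexpression idea in the second highlight: genes with strongly correlated expression profiles are candidates for shared function ("guilt by association"). The expression matrix below is randomly generated, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 4 genes x 6 conditions. geneB is constructed
# to track geneA, so their coexpression should be high.
geneA = rng.normal(size=6)
expr = np.vstack([
    geneA,
    geneA + rng.normal(scale=0.1, size=6),  # geneB ~ geneA
    rng.normal(size=6),                     # geneC
    rng.normal(size=6),                     # geneD
])

# Pairwise Pearson correlations between gene expression profiles.
coexpr = np.corrcoef(expr)
print(np.round(coexpr[0], 2))  # row for geneA; the geneB entry should be near 1.0
```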

20.