Similar Articles
20 similar articles retrieved.
1.
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall into both T1 translational research (translating basic science results into new interventions) and T2 translational research, that is, translational research for public health. Potential use cases include better phenotyping of research subjects and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
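The rule-based versus machine-learning distinction mentioned above can be made concrete with a small example. The following is a minimal, hypothetical sketch of a rule-based gene-mention finder: one hand-written regular expression stands in for a knowledge-based rule set (the pattern and the sample sentence are illustrative assumptions, not taken from the chapter).

```python
import re

# A hand-written "rule": gene symbols are often short tokens of capital
# letters and digits ending in a digit (e.g., BRCA1, TP53). Real rule-based
# systems combine many such rules with curated dictionaries; a single
# pattern like this will also produce false positives due to ambiguity.
GENE_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\d\b")

def find_gene_mentions(text):
    """Return (start, end, mention) spans matched by the rule."""
    return [(m.start(), m.end(), m.group()) for m in GENE_PATTERN.finditer(text)]

sentence = "Mutations in BRCA1 and TP53 are associated with tumor progression."
print(find_gene_mentions(sentence))
# [(13, 18, 'BRCA1'), (23, 27, 'TP53')]
```

A statistical system would instead learn which tokens are gene mentions from annotated examples; hybrids combine both.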

What to Learn in This Chapter

Text mining is an established field, but its application to translational bioinformatics is quite new and it presents myriad research opportunities. It is made difficult by the fact that natural (human) language, unlike computer language, is characterized at all levels by rampant ambiguity and variability. Important sub-tasks include gene name recognition, or finding mentions of gene names in text; gene normalization, or mapping mentions of genes in text to standard database identifiers; phenotype recognition, or finding mentions of phenotypes in text; and phenotype normalization, or mapping mentions of phenotypes to concepts in ontologies. Text mining for translational bioinformatics can necessitate dealing with two widely varying genres of text—published journal articles, and prose fields in electronic medical records. Research into the latter has been impeded for years by lack of public availability of data sets, but this has very recently changed and the field is poised for rapid advances. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
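As a companion to the recognition step, gene normalization can be illustrated as a dictionary lookup that maps surface mentions to standard database identifiers. The tiny synonym table below is an illustrative assumption (though the Entrez Gene identifiers shown are real); production normalizers must also resolve the ambiguity the chapter emphasizes, since one symbol can name several genes or an ordinary English word.

```python
# Hypothetical synonym table: lower-cased mention -> candidate gene IDs.
# Real systems build such tables from resources like Entrez Gene.
SYNONYMS = {
    "brca1": ["672"],   # unambiguous mapping
    "p53": ["7157"],
    "ache": ["43"],     # gene symbol that is also an English word
}

def normalize(mention):
    """Map a textual gene mention to candidate database identifiers."""
    return SYNONYMS.get(mention.lower(), [])

print(normalize("BRCA1"))  # ['672']
print(normalize("ache"))   # ['43'] -- though in prose this may not be a gene at all
```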
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

2.
3.

Background  

Bioinformatics and medical informatics are two research fields that serve the needs of different but related communities. Both domains share the common goal of providing new algorithms, methods and technological solutions to biomedical research, and of contributing to the treatment and cure of diseases. Although different microarray techniques have been used successfully to investigate useful information for cancer diagnosis at the gene expression level, the true integration of existing methods into day-to-day clinical practice is still a long way off. Within this context, case-based reasoning emerges as a paradigm especially well suited to the development of biomedical informatics applications and decision support systems, given the support and collaboration that such translational development requires. With the goals of removing barriers to multi-disciplinary collaboration and facilitating the dissemination and transfer of knowledge to real practice, case-based reasoning systems have the potential to be applied to translational research, mainly because their computational reasoning paradigm is similar to the way clinicians gather, analyze and process information in their own practice of clinical medicine.
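To illustrate the reasoning paradigm described above: case-based reasoning solves a new problem by retrieving the most similar previously solved cases and reusing their solutions. The sketch below covers only the retrieve step, with made-up feature vectors and outcomes; it is not code from any system discussed in the article.

```python
import math

# Hypothetical case base: (feature vector, recorded outcome).
CASE_BASE = [
    ([0.9, 0.1, 0.4], "responder"),
    ([0.2, 0.8, 0.7], "non-responder"),
    ([0.8, 0.2, 0.5], "responder"),
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, k=1):
    """Retrieve the k most similar past cases
    (the 'Retrieve' in Retrieve-Reuse-Revise-Retain)."""
    return sorted(CASE_BASE, key=lambda case: euclidean(query, case[0]))[:k]

print(retrieve([0.85, 0.15, 0.45]))
# [([0.9, 0.1, 0.4], 'responder')]
```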

4.

Background

The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration, in particular in establishing shared identity and shared meaning across heterogeneous biomedical data sources.

Results

We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license.
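The declaratively represented forward-chaining rules mentioned above can be illustrated with a toy example. The sketch below applies one rule over a tiny in-memory triple store until no new facts can be derived; the predicates and the rule are illustrative assumptions, not KaBOB's actual rule set, which operates over hundreds of millions of RDF triples.

```python
# Toy triple store: (subject, predicate, object).
facts = {
    ("geneA", "encodes", "proteinA"),
    ("proteinA", "participates_in", "apoptosis"),
}

# One declarative rule: if a gene encodes a protein and that protein
# participates in a process, record that the gene is involved in it.
def rule(fs):
    derived = set()
    for (g, p1, prot) in fs:
        if p1 != "encodes":
            continue
        for (s, p2, proc) in fs:
            if s == prot and p2 == "participates_in":
                derived.add((g, "involved_in", proc))
    return derived

# Forward chaining: apply the rule until a fixed point is reached.
while True:
    new = rule(facts) - facts
    if not new:
        break
    facts |= new

print(("geneA", "involved_in", "apoptosis") in facts)  # True
```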

Conclusions

KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0559-3) contains supplementary material, which is available to authorized users.

5.
6.

Background

Whole exome sequencing (WES) has provided a means for researchers to gain access to a highly enriched subset of the human genome in which to search for variants that are likely to be pathogenic and possibly provide important insights into disease mechanisms. In developing countries, bioinformatics capacity and expertise are severely limited, and wet bench scientists are required to take on the challenging task of understanding and implementing the barrage of bioinformatics tools that are available to them.

Results

We designed a novel method for the filtration of WES data called TAPER™ (Tool for Automated selection and Prioritization for Efficient Retrieval of sequence variants).

Conclusions

TAPER™ implements a set of logical steps by which to prioritize candidate variants that could be associated with disease, and it is aimed at implementation in biomedical laboratories with limited bioinformatics capacity. TAPER™ is free, can be set up on a Windows operating system (Windows 7 and above) and does not require any programming knowledge. In summary, we have developed a freely available tool that simplifies variant prioritization from WES data in order to facilitate discovery of disease-causing genes.
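The kind of "logical steps" such a filtration workflow applies can be sketched as successive filters over annotated variants. The example below is a generic illustration with made-up thresholds and field names; it is not TAPER's actual logic, which the paper defines in detail.

```python
# Hypothetical annotated variants, e.g. parsed from an annotated VCF.
variants = [
    {"gene": "GENE1", "maf": 0.0001, "consequence": "stop_gained"},
    {"gene": "GENE2", "maf": 0.2500, "consequence": "missense_variant"},
    {"gene": "GENE3", "maf": 0.0010, "consequence": "synonymous_variant"},
]

# Illustrative prioritization steps: keep rare variants with
# protein-altering consequences. Thresholds here are assumptions.
RARE_MAF = 0.01
DAMAGING = {"stop_gained", "frameshift_variant", "missense_variant"}

candidates = [
    v for v in variants
    if v["maf"] < RARE_MAF and v["consequence"] in DAMAGING
]

print([v["gene"] for v in candidates])  # ['GENE1']
```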

7.
8.
Highlights
• Automated metadata extraction from potentially large sets of mass spectrometric raw data.
• Reduction of extracted metadata into groups of shared parameter sets (a grouping sketch follows below).
• Tabular representation for quality control, reporting and publication.
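A minimal sketch of the grouping idea in the second highlight, assuming per-file metadata has already been extracted into dictionaries; the parameter names and values below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical metadata extracted from raw mass spectrometry files.
runs = [
    {"file": "run1.raw", "instrument": "QExactive", "resolution": 70000},
    {"file": "run2.raw", "instrument": "QExactive", "resolution": 70000},
    {"file": "run3.raw", "instrument": "QExactive", "resolution": 140000},
]

# Reduce per-file metadata to groups sharing the same parameter set.
groups = defaultdict(list)
for run in runs:
    key = tuple(sorted((k, v) for k, v in run.items() if k != "file"))
    groups[key].append(run["file"])

for params, files in groups.items():
    print(dict(params), "->", files)
# {'instrument': 'QExactive', 'resolution': 70000} -> ['run1.raw', 'run2.raw']
# {'instrument': 'QExactive', 'resolution': 140000} -> ['run3.raw']
```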

9.
Highlights
• Nature is equipped with solutions that use light to drive essential metabolic processes.
• Delivery of foreign genes confers light sensitivity on otherwise insensitive cells.
• Cells are engineered to transform light energy into desired biological functions.
• Light-regulated control has led to great advances in biomedical research.

10.
11.
Highlights
• Expanded CAG stretches are prone to translational frameshifting.
• Depletion of the charged, cognate tRNA causes translational frameshifting.
• The frequency of translational frameshifting correlates with CAG repeat length.
• Frameshifted species modulate the aggregation course of the parental protein.

12.
Highlights
• Production of sera with different levels of protection against rodent Plasmodium.
• Generation of immunomic and proteomic data sets enriched in protective antigens.
• Prediction of the most likely protective antigens using a weighted scoring system (see the sketch below).
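A minimal sketch of the weighted-scoring idea in the third highlight, with invented evidence types and weights; the actual scoring scheme is defined in the paper.

```python
# Hypothetical per-antigen evidence scores from the two data sets.
antigens = {
    "Ag1": {"immunomic": 0.9, "proteomic": 0.4},
    "Ag2": {"immunomic": 0.3, "proteomic": 0.8},
    "Ag3": {"immunomic": 0.7, "proteomic": 0.6},
}

# Illustrative weights favoring immunomic evidence; values are assumptions.
WEIGHTS = {"immunomic": 0.6, "proteomic": 0.4}

def score(evidence):
    return sum(WEIGHTS[k] * v for k, v in evidence.items())

ranked = sorted(antigens, key=lambda a: score(antigens[a]), reverse=True)
print(ranked)  # ['Ag1', 'Ag3', 'Ag2']
```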

13.
Highlights
• First intimin or invasin β-domain structure; both crystallize as monomers.
• First structure solved using 3-λ SeMet MAD data from crystals grown in LCP.
• Identification of a non-BIG domain directly downstream of the β domain.
• Highly conserved and coevolving residues are identified in the Int/Inv family.

14.

Background

Applications in biomedical science and life science produce large data sets using increasingly powerful imaging devices and computer simulations. It is becoming increasingly difficult for scientists to explore and analyze these data using traditional tools. Interactive data processing and visualization tools can help scientists overcome these limitations.

Results

We show that new data processing tools and visualization systems can be used successfully in biomedical and life science applications. We present an adaptive high-resolution display system suitable for biomedical image data, algorithms for analyzing and visualizing protein surfaces and retinal optical coherence tomography data, and visualization tools for 3D gene expression data.

Conclusion

We demonstrated that interactive processing and visualization methods and systems can support scientists in a variety of biomedical and life science application areas concerned with massive data analysis.

15.

Background

An adequate and expressive ontological representation of biological organisms and their parts requires formal reasoning mechanisms for their relations of physical aggregation and containment.

Results

We demonstrate that the proposed formalism makes it possible to deal consistently with "role propagation along non-taxonomic hierarchies", a problem that has repeatedly been identified as an intricate reasoning problem in biomedical ontologies.
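A concrete instance of role propagation: an inflammation located in the appendix should, via the appendix's parthood chain, also count as located in the digestive system. The sketch below propagates a located-in role along a toy part-of hierarchy; the anatomy fragment is an illustrative assumption, not the paper's formalism, which is stated in description logic.

```python
# Toy partonomy: child -> parent ("part_of" edges).
PART_OF = {
    "appendix": "cecum",
    "cecum": "large_intestine",
    "large_intestine": "digestive_system",
}

def propagate_location(site):
    """Propagate located_in along the part_of chain (transitive closure)."""
    locations = []
    while site in PART_OF:
        site = PART_OF[site]
        locations.append(site)
    return locations

# An appendicitis located in the appendix is thereby located in all
# structures the appendix is (transitively) part of.
print(propagate_location("appendix"))
# ['cecum', 'large_intestine', 'digestive_system']
```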

Conclusion

The proposed approach seems to be suitable for the redesign of compositional hierarchies in (bio)medical terminology systems that are embedded into the framework of the OBO (Open Biological Ontologies) Relation Ontology and use knowledge representation languages developed by the Semantic Web community.

16.
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask "Which biological process is over-represented in my set of interesting genes or proteins?" we can also ask "Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?". For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases, blood coagulation disorders, that was associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO. In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.
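The statistical core of the enrichment analysis described above is typically an over-representation test such as the hypergeometric test. The sketch below uses made-up counts to ask whether a term annotating 40 of 20,000 background genes is over-represented when 8 of 100 "significant" genes carry it; real tools also correct for multiple testing across thousands of terms.

```python
from scipy.stats import hypergeom

M = 20000  # genes in the background (population)
n = 40     # background genes annotated with the term of interest
N = 100    # genes in the "significant" list (sample)
k = 8      # significant genes annotated with the term

# P(X >= k): probability of seeing at least k annotated genes by chance.
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"enrichment p-value: {p_value:.3g}")
```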

What to Learn in This Chapter

  • Review the commonly used approach of Gene Ontology based enrichment analysis
  • Understand the pitfalls associated with current approaches
  • Understand the national infrastructure available for using alternative ontologies for enrichment analysis
  • Learn about a generalized enrichment analysis workflow and its application using disease ontologies
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

17.
1. The receptors for steroid hormones consist of well-defined domains with overlapping functions.
2. Contrary to the classical view, it is now becoming increasingly evident that the agonist-binding regions of the ligand binding domain are not identical to those that bind steroid antagonists.
3. The DNA binding domain can be activated equally well in the presence of both agonists and antagonists, again contradicting the classical view, in which only the physiologically active hormone was believed to induce such a change.
4. In some cases, a synthetic antagonist is a more specific ligand for the receptor than the natural hormone.
5. Synthetic antagonists are therefore important not only for alleviating disease in human subjects; they have also become an important tool for elucidating the mechanism of transactivation by steroid hormones.

18.

Background  

Biomedical ontologies are critical for the integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems. The effectiveness of these systems is heavily dependent on the quality of the ontological terms and their classifications. To assist in developing and maintaining the ontologies objectively, we propose automatic approaches to classify and/or validate their semantic categories. In previous work, we developed an approach using contextual syntactic features obtained from a large domain corpus to reclassify and validate concepts of the Unified Medical Language System (UMLS), a comprehensive resource of biomedical terminology. In this paper, we introduce another classification approach based on the words of the concept strings and compare it to the contextual syntactic approach.
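The word-based classification idea can be sketched as a bag-of-words classifier over concept strings. The toy training data and the two semantic categories below are invented for illustration; the paper's experiments use UMLS concepts and a large domain corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical concept strings labeled with UMLS-style semantic categories.
concepts = ["myocardial infarction", "renal failure", "aspirin",
            "ibuprofen", "heart attack", "acetaminophen"]
labels = ["Disease", "Disease", "Drug", "Drug", "Disease", "Drug"]

# The words of the concept string are the only features, as in the
# word-based approach contrasted with contextual syntactic features.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(concepts, labels)

print(model.predict(["kidney failure"]))  # ['Disease'] -- shares the word 'failure'
```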

19.
Highlights
• Network analysis is essential for data mining of omics-based large-scale data sets.
• Gene coexpression analysis is useful for prediction of gene function (see the sketch below).
• Comparative network analysis can reveal common and unique plant metabolic pathways.
• Novel genome editing tools facilitate rational metabolic engineering.
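A minimal sketch of the coexpression idea in the second highlight: genes with strongly correlated expression profiles are candidates for shared function ("guilt by association"). The expression matrix below is randomly generated, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 4 genes x 6 conditions. geneB is constructed
# to track geneA, so their coexpression should be high.
geneA = rng.normal(size=6)
expr = np.vstack([
    geneA,
    geneA + rng.normal(scale=0.1, size=6),  # geneB ~ geneA
    rng.normal(size=6),                     # geneC
    rng.normal(size=6),                     # geneD
])

# Pairwise Pearson correlations between gene expression profiles.
coexpr = np.corrcoef(expr)
print(np.round(coexpr[0], 2))  # row for geneA; the geneB entry should be near 1.0
```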

20.