1.

The semantic metadatabase (SEMEDA): ontology based integration of federated molecular biological data sources

Köhler J Schulze-Kremer S 《In silico biology》2002,2(3):219-231

A system for "intelligent" semantic integration and querying of federated databases is being implemented by using three main components: a component which enables SQL access to integrated databases by database federation (MARGBench), an ontology based semantic metadatabase (SEMEDA) and an ontology based query interface (SEMEDA-query). In this publication we explain and demonstrate the principles, architecture and the use of SEMEDA. Since SEMEDA is implemented as a 3-tiered web application, database providers can enter all relevant semantic and technical information about their databases by themselves via a web browser. SEMEDA's collaborative ontology editing feature is not restricted to database integration, and might also be useful for ongoing ontology developments, such as the "Gene Ontology" [2]. SEMEDA can be found at http://www-bm.cs.uni-magdeburg.de/semeda/. We explain how this ontologically structured information can be used for semantic database integration. In addition, requirements for ontologies for molecular biological database integration are discussed and relevant existing ontologies are evaluated. We further discuss how ontologies and structured knowledge sources can be used in SEMEDA and whether they can be merged, supplemented or updated to meet the requirements for semantic database integration.

2.

A new semantic web service classification (SWSC) strategy

Shimaa E. El-Sayyad Ahmed I. Saleh Hesham A. Ali 《Cluster computing》2018,21(3):1639-1665

3.

BioCaster: detecting public health rumors with a Web-based text mining system

Collier N Doan S Kawazoe A Goodwin RM Conway M Tateno Y Ngo QH Dien D Kawtrakul A Takeuchi K Shigematsu M Taniguchi K 《Bioinformatics (Oxford, England)》2008,24(24):2940-2941

SUMMARY: BioCaster is an ontology-based text mining system for detecting and tracking the distribution of infectious disease outbreaks from linguistic signals on the Web. The system continuously analyzes documents reported from over 1700 RSS feeds, classifies them for topical relevance and plots them onto a Google map using geocoded information. The background knowledge for bridging the gap between layman's terms and formal coding systems is contained in the freely available BioCaster ontology, which includes information in eight languages focused on the epidemiological role of pathogens as well as geographical locations with their latitudes/longitudes. The system consists of four main stages: topic classification, named entity recognition (NER), disease/location detection and event recognition. Higher order event analysis is used to detect more precisely specified warning signals that can then be sent to registered users via email alerts. Evaluation of the system for topic recognition and entity identification is conducted on a gold standard corpus of annotated news articles. AVAILABILITY: The BioCaster map and ontology are freely available via a web portal at http://www.biocaster.org.

4.

Turning text into research networks: information retrieval and computational ontologies in the creation of scientific databases

Ceci F Pietrobon R Gonçalves AL 《PloS one》2012,7(1):e27499

Background

Web-based, free-text documents on science and technology have been growing in number on the web. However, most of these documents are not immediately processable by computers, slowing down the acquisition of useful information. Computational ontologies might represent a possible solution by enabling semantically machine-readable data sets. Yet the process of ontology creation, instantiation and maintenance is still based on manual methodologies and is thus time- and cost-intensive.

Method

We focused on a large corpus containing information on researchers, research fields, and institutions. We based our strategy on traditional entity recognition, social computing and correlation. We devised a semi-automatic approach for the recognition, correlation and extraction of named entities and relations from textual documents, which are then used to create, instantiate, and maintain an ontology.

Results

We present a prototype demonstrating the applicability of the proposed strategy, along with a case study describing how direct and indirect relations can be extracted from academic and professional activities registered in a database of curriculum vitae in free-text format. We present evidence that this system can identify entities to assist in the process of knowledge extraction and representation to support ontology maintenance. We also demonstrate the extraction of relationships among ontology classes and their instances.

Conclusion

We have demonstrated that our system can be used for the conversion of research information in free-text format into a database with a semantic structure. Future studies should test this system using the growing amount of free-text information available at the institutional and national levels.

5.

Guide to the Internet. The world wide web.

M. Pallen 《BMJ (Clinical research ed.)》1995,311(7019):1552-1556

The world wide web provides a uniform, user-friendly interface to the Internet. Web pages can contain text and pictures and are interconnected by hypertext links. The addresses of web pages are recorded as uniform resource locators (URLs), transmitted by hypertext transfer protocol (HTTP), and written in hypertext markup language (HTML). Programs that allow you to use the web are available for most operating systems. Powerful online search engines make it relatively easy to find information on the web. Browsing through the web, or "net surfing", is both easy and enjoyable. Contributing to the web is not difficult, and the web opens up new possibilities for electronic publishing and electronic journals.

6.

Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

Dimitra Alexopoulou Bill Andreopoulos Heiko Dietze Andreas Doms Fabien Gandon Jörg Hakenberg Khaled Khelif Michael Schroeder Thomas Wächter 《BMC bioinformatics》2009,10(1):28

Background

Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge for automated methods, which identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Metadata are another useful source of information for disambiguation. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively.

7.

A web service for biomedical term look-up

Harkema H Roberts I Gaizauskas R Hepple M 《Comparative and Functional Genomics》2005,6(1-2):86-93

Recent years have seen a huge increase in the amount of biomedical information that is available in electronic format. Consequently, for biomedical researchers wishing to relate their experimental results to relevant data lurking somewhere within this expanding universe of on-line information, the ability to access and navigate biomedical information sources in an efficient manner has become increasingly important. Natural language and text processing techniques can facilitate this task by making the information contained in textual resources such as MEDLINE more readily accessible and amenable to computational processing. Names of biological entities such as genes and proteins provide critical links between different biomedical information sources and researchers' experimental data. Therefore, automatic identification and classification of these terms in text is an essential capability of any natural language processing system aimed at managing the wealth of biomedical information that is available electronically. To support term recognition in the biomedical domain, we have developed Termino, a large-scale terminological resource for text processing applications, which has two main components: first, a database into which very large numbers of terms can be loaded from resources such as UMLS, and stored together with various kinds of relevant information; second, a finite state recognizer, for fast and efficient identification and mark-up of terms within text. Since many biomedical applications require this functionality, we have made Termino available to the community as a web service, which allows for its integration into larger applications as a remotely located component, accessed through a standardized interface over the web.

8.

OrchidBase: a collection of sequences of the transcriptome derived from orchids (cited by: 1 in total, 0 self-citations, 1 by others)

Fu CH Chen YW Hsiao YY Pan ZJ Liu ZJ Huang YM Tsai WC Chen HH 《Plant & cell physiology》2011,52(2):238-243

Orchids are among the most ecologically and evolutionarily significant plants, and the Orchidaceae is one of the most abundant families of the angiosperms. Genetic databases will be useful not only for gene discovery but also for future genomic annotation. For this purpose, OrchidBase was established from 37,979,342 sequence reads collected from 11 in-house Phalaenopsis orchid cDNA libraries. Among them, 41,310 expressed sequence tags (ESTs) were obtained by using Sanger sequencing, whereas 37,908,032 reads were obtained by using next-generation sequencing (NGS) including both Roche 454 and Solexa Illumina sequencers. These reads were assembled into 8,501 contigs and 76,116 singletons, resulting in 84,617 non-redundant transcribed sequences with an average length of 459 bp. The analysis pipeline of the database is an automated system written in Perl and C#, and consists of the following components: automatic pre-processing of EST reads, assembly of raw sequences, annotation of the assembled sequences and storage of the analyzed information in SQL databases. A web application was implemented with HTML and a Microsoft .NET Framework C# program for browsing and querying the database, creating dynamic web pages on the client side, analyzing gene ontology (GO) and mapping annotated enzymes to KEGG pathways. The online resources for putative annotation can be searched either by text or by using BLAST, and the results can be explored on the website and downloaded. Consequently, the establishment of OrchidBase will provide researchers with a high-quality genetic resource for data mining and facilitate efficient experimental studies on orchid biology and biotechnology. The OrchidBase database is freely available at http://lab.fhes.tn.edu.tw/est.

9.

Semantic reclassification of the UMLS concepts

Fan JW Friedman C 《Bioinformatics (Oxford, England)》2008,24(17):1971-1973

Accurate semantic classification is valuable for text mining and knowledge-based tasks that perform inference based on semantic classes. To benefit applications using the semantic classification of the Unified Medical Language System (UMLS) concepts, we automatically reclassified the concepts based on their lexical and contextual features. The new classification is useful for auditing the original UMLS semantic classification and for building biomedical text mining applications. AVAILABILITY: http://www.dbmi.columbia.edu/~juf7002/reclassify_production

10.

MaSTerClass: a case-based reasoning system for the classification of biomedical terms

Spasic I Ananiadou S Tsujii J 《Bioinformatics (Oxford, England)》2005,21(11):2748-2758

MOTIVATION: The sheer volume of textually described biomedical knowledge creates a need for natural language processing (NLP) applications that allow flexible and efficient access to relevant information. Specialized semantic networks (such as biomedical ontologies, terminologies or semantic lexicons) can significantly enhance these applications by supplying the necessary terminological information in a machine-readable form. With the explosive growth of bio-literature, new terms (representing newly identified concepts or variations of the existing terms) may not be explicitly described within the network and hence cannot be fully exploited by NLP applications. Linguistic and statistical clues can be used to extract many new terms from free text. The extracted terms still need to be correctly positioned relative to other terms in the network. Classification as a means of semantic typing represents the first step in updating a semantic network with new terms. RESULTS: The MaSTerClass system implements the case-based reasoning methodology for the classification of biomedical terms.

11.

Knowledge Retrieval from PubMed Abstracts and Electronic Medical Records with the Multiple Sclerosis Ontology

Ashutosh Malhotra Michaela Gündel Abdul Mateen Rajput Heinz-Theodor Mevissen Albert Saiz Xavier Pastor Raimundo Lozano-Rubi Elena H. Martinez-Lapsicina Irati Zubizarreta Bernd Mueller Ekaterina Kotelnikova Luca Toldo Martin Hofmann-Apitius Pablo Villoslada 《PloS one》2015,10(2)

Background

In order to retrieve useful information from scientific literature and electronic medical records (EMR) we developed an ontology specific for Multiple Sclerosis (MS).

Methods

The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of MS ontology.

Results

Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports.

Conclusion

The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.

12.

High performance web server architecture with Kernel-level caching

Yang-Sun Lee Leonard Barolli Min Choi 《Cluster computing》2013,16(3):339-346

In this work we focus on reducing response time and bandwidth requirements for high-performance web servers. Much research has been done to improve web server performance by modifying the web server architecture. In contrast to these approaches, we take a different point of view, in which we consider web server performance from the OS perspective rather than from that of the web server architecture itself. To this end we explore two different approaches. The first is running the web server within the OS kernel. We use kHTTPd as the basis for our implementation. However, it has several drawbacks, such as redundant copying of data, synchronous writes, and processing of static data only. We propose some techniques to address these flaws. The second approach is caching dynamic data. Dynamic data can seriously reduce the performance of web servers. It has been thought difficult to cache because it often changes far more frequently than static pages and because the web server needs to access a database to serve it. We therefore propose a solution for higher performance web service that caches dynamic data using content separation between static and dynamic portions. Benchmark results using WebStone show that our architecture can improve server performance by up to 18 percent and can reduce the user’s perceived latency significantly.

13.

The chemical information ontology: provenance and disambiguation for chemical data on the biological semantic web

Hastings J Chepelev L Willighagen E Adams N Steinbeck C Dumontier M 《PloS one》2011,6(10):e25513

14.

Text mining and ontologies in biomedicine: making sense of raw text (cited by: 1 in total, 0 self-citations, 1 by others)

Spasic I Ananiadou S McNaught J Kumar A 《Briefings in bioinformatics》2005,6(3):239-251

The volume of biomedical literature is increasing at such a rate that it is becoming difficult to locate, retrieve and manage the reported information without text mining, which aims to automatically distill information, extract facts, discover implicit links and generate hypotheses relevant to user needs. Ontologies, as conceptual models, provide the necessary framework for semantic representation of textual information. The principal link between text and an ontology is terminology, which maps terms to domain-specific concepts. This paper summarises different approaches in which ontologies have been used for text-mining applications in biomedicine.

15.

Gene Function Prediction Based on the Gene Ontology Hierarchical Structure

Liangxi Cheng Hongfei Lin Yuncui Hu Jian Wang Zhihao Yang 《PloS one》2014,9(9)

Gene Ontology annotation information is helpful in explaining life science phenomena, and can provide great support for research in the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile, the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7%, recall: 48.9%). The experimental results demonstrate that when the size of the training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to a set of texts in an ontology structure or with a hierarchical relationship.
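The reported F-value can be checked from the precision and recall given in the abstract. The sketch below assumes the F-value is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state this, but the assumption reproduces the reported 50.7%. The function name `f_measure` and its `beta` parameter are illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f_measure(52.7, 48.9)
print(round(f1, 1))  # → 50.7
```

Because F1 is a harmonic mean, it sits slightly below the arithmetic mean of the two values (50.8%) and is pulled toward the lower of the two, here recall.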

16.

A DNA-Based Semantic Fusion Model for Remote Sensing Data

Heng Sun Jian Weng Guangchuang Yu Richard H. Massawe 《PloS one》2013,8(10)

Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, ontology can represent, process and manage widespread knowledge. Nowadays, many researchers use ontology to collect and organize data's semantic information in order to maximize research productivity. In this paper, we first describe our work on the development of a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information in DNA for a large volume of remote sensing data. The semantic information is read in a parallel and bit-wise manner and each individual bit is converted to a base. By doing so, a considerable amount of conversion time can be saved, i.e., the cluster-based multi-process program can reduce the conversion time from 81,536 seconds to 4,937 seconds for 4.34 GB of source data files. Moreover, the size of the result file recording DNA sequences is 54.51 GB for the parallel C program compared with 57.89 GB for sequential Perl. This shows that our parallel method can also reduce the DNA synthesis cost. In addition, data types are encoded in our model, which is a basis for building a type system in our future DNA computer. Finally, we describe theoretically an algorithm for DNA-based semantic fusion. This algorithm enables the integration of knowledge from disparate remote sensing data sources into a consistent, accurate, and complete representation. This process depends solely on ligation reaction and screening operations instead of the ontology.
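The bit-wise conversion described above can be illustrated with a minimal sequential sketch. It is not the authors' parallel C program: the mapping of bit 0 to base A and bit 1 to base T is a hypothetical choice (the abstract does not say which base encodes which bit), and the function names are invented for illustration. The last lines compute the speedup implied by the two reported conversion times.

```python
# Hypothetical mapping: the abstract does not specify which base encodes which bit.
BIT_TO_BASE = {"0": "A", "1": "T"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert each bit of the input (most significant bit first) to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def decode(strand: str) -> bytes:
    """Invert encode(): read bases back into bits, then group bits into bytes."""
    bits = "".join(BASE_TO_BIT[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"GO")
print(len(strand))  # → 16 (one base per bit, eight bases per byte)

# Speedup implied by the conversion times reported in the abstract.
speedup = 81_536 / 4_937
print(round(speedup, 1))  # → 16.5
```

One base per bit means the encoded strand has eight bases per input byte, which is consistent in order of magnitude with the reported growth from 4.34 GB of source data to a result file of roughly 55 GB.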

17.

Textpresso: an ontology-based information retrieval and extraction system for biological literature

Müller HM Kenny EE Sternberg PW 《PLoS biology》2004,2(11):e309

We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database. 
Textpresso is a useful curation tool, as well as a search engine for researchers, and can readily be extended to other organism-specific corpora of text. Textpresso can be accessed at http://www.textpresso.org or via WormBase at http://www.wormbase.org.
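The idea of searching sentences by a combination of ontology categories and keywords can be sketched in a few lines. This is a toy illustration, not Textpresso's implementation: the two-category lexicon, the placeholder gene names (`abc-1`, `xyz-2`), the example sentences and the function names are all invented for the sketch.

```python
import re

# Hypothetical mini-lexicon (category -> terms); not the Textpresso ontology.
LEXICON = {
    "gene": {"abc-1", "xyz-2"},
    "regulation": {"regulates", "represses", "activates"},
}

# Invented example sentences standing in for a sentence-split corpus.
CORPUS = [
    "abc-1 activates xyz-2 in this toy sentence.",
    "The worm moved slowly.",
    "xyz-2 was sequenced.",
]


def tag_sentence(sentence: str) -> set:
    """Return the categories whose lexicon terms occur in the sentence."""
    tokens = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
    return {category for category, terms in LEXICON.items() if tokens & terms}


def search(sentences, categories=(), keywords=()):
    """Keep sentences that carry every requested category and contain every keyword."""
    hits = []
    for sentence in sentences:
        tags = tag_sentence(sentence)
        lowered = sentence.lower()
        if all(c in tags for c in categories) and all(k.lower() in lowered for k in keywords):
            hits.append(sentence)
    return hits


print(search(CORPUS, categories=("gene", "regulation")))
# → ['abc-1 activates xyz-2 in this toy sentence.']
```

Restricting a match to a single sentence is what makes a query such as "gene plus regulation" selective: a document that mentions a gene on one page and a regulation term on another would not match.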

18.

Enhancing biomedical text summarization using semantic relation extraction

Shang Y Li Y Lin H Yang Z 《PloS one》2011,6(8):e23862

Automatic text summarization for a biomedical concept can help researchers efficiently get the key points of a certain topic from a large amount of biomedical literature. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate a text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and that our results are better than those of the MEAD system, a well-known tool for text summarization.

19.

Towards a semantic lexicon for biological language processing

Verspoor K 《Comparative and Functional Genomics》2005,6(1-2):61-66

This paper explores the use of the resources in the National Library of Medicine's Unified Medical Language System (UMLS) for the construction of a lexicon useful for processing texts in the field of molecular biology. A lexicon is constructed from overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus to obtain both morphosyntactic and semantic information for terms, and the coverage of a domain corpus is assessed. Over 77% of tokens in the domain corpus are found in the constructed lexicon, validating the lexicon's coverage of the most frequent terms in the domain and indicating that the constructed lexicon is potentially an important resource for biological text processing.
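Token coverage of the kind reported above is the share of corpus tokens that appear in the lexicon. The sketch below shows one way to compute it; the case-insensitive matching, the whitespace tokenisation, the toy token list and the function name `token_coverage` are assumptions made for illustration and are not taken from the paper.

```python
def token_coverage(tokens, lexicon):
    """Fraction of corpus tokens (counted with repeats) that occur in the lexicon."""
    if not tokens:
        return 0.0
    known = {term.lower() for term in lexicon}  # assumption: case-insensitive matching
    found = sum(1 for token in tokens if token.lower() in known)
    return found / len(tokens)


# Toy data, invented for the example.
tokens = "The protein binds the receptor kinase".split()
lexicon = {"the", "protein", "binds", "receptor"}

print(f"{token_coverage(tokens, lexicon):.1%}")  # → 83.3%
```

Counting tokens rather than distinct word types means frequent words weigh more, which is why a high token coverage says most about the lexicon's handling of the most frequent terms in the domain.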

20.

A task framework for the web interface W2H

Ernst P Glatting KH Suhai S 《Bioinformatics (Oxford, England)》2003,19(2):278-282
