Similar articles
 20 similar articles found (search time: 31 ms)
1.
An upper-level ontology for the biomedical domain   Total citations: 1 (self-citations: 0, citations by others: 1)
At the US National Library of Medicine we have developed the Unified Medical Language System (UMLS), whose goal is to provide integrated access to a large number of biomedical resources by unifying the vocabularies that are used to access those resources. The UMLS currently interrelates some 60 controlled vocabularies in the biomedical domain. The UMLS coverage is quite extensive, including not only many concepts in clinical medicine, but also a large number of concepts applicable to the broad domain of the life sciences. In order to provide an overarching conceptual framework for all UMLS concepts, we developed an upper-level ontology, called the UMLS semantic network. The semantic network, through its 134 semantic types, provides a consistent categorization of all concepts represented in the UMLS. The 54 links between the semantic types provide the structure for the network and represent important relationships in the biomedical domain. Because of the growing number of information resources that contain genetic information, the UMLS coverage in this area is being expanded. We recently integrated the taxonomy of organisms developed by the NLM's National Center for Biotechnology Information, and we are currently working together with the developers of the Gene Ontology to integrate this resource as well. As additional standard ontologies become publicly available, we expect to integrate these into the UMLS construct.

2.

Background  

Biomedical ontologies are critical for integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems. The effectiveness of these systems is heavily dependent on the quality of the ontological terms and their classifications. To assist in developing and maintaining the ontologies objectively, we propose automatic approaches to classify and/or validate their semantic categories. In previous work, we developed an approach using contextual syntactic features obtained from a large domain corpus to reclassify and validate concepts of the Unified Medical Language System (UMLS), a comprehensive resource of biomedical terminology. In this paper, we introduce another classification approach based on words of the concept strings and compare it to the contextual syntactic approach.

3.
We have recently mapped the Gene Ontology (GO), developed by the Gene Ontology Consortium, into the National Library of Medicine's Unified Medical Language System (UMLS). GO has been developed for the purpose of annotating gene products in genome databases, and the UMLS has been developed as a framework for integrating large numbers of disparate terminologies, primarily for the purpose of providing better access to biomedical information sources. The mapping of GO to UMLS highlighted issues in both terminology systems. After some initial explorations and discussions between the UMLS and GO teams, the GO was integrated with the UMLS. Overall, a total of 23% of the GO terms either matched directly (3%) or linked (20%) to existing UMLS concepts. All GO terms now have a corresponding, official UMLS concept, and the entire vocabulary is available through the web-based UMLS Knowledge Source Server. The mapping of the Gene Ontology, with its focus on structures, processes and functions at the molecular level, to the existing broad coverage UMLS should contribute to linking the language and practices of clinical medicine to the language and practices of genomics.

4.
5.
Background

We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches.

Results

In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error.

Conclusion

When available, a high-quality domain part-of-speech tagger is the best solution to unknown word issues in the domain adaptation of a general parser. In the absence of such a resource, surface clues can provide remarkably good coverage and performance when tuned to the domain. The adapted parser is available under an open-source license.


6.

Background  

Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus to be used to annotate the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data that covers the whole domain.

8.

Background

Mapping medical terms to standardized UMLS concepts is a basic step for leveraging biomedical texts in data management and analysis. However, available methods and tools have major limitations in handling queries over the UMLS Metathesaurus that contain inaccurate query terms, which frequently appear in real world applications.

Methods

To provide a practical solution for this task, we propose a layered dynamic programming mapping (LDPMap) approach, which can efficiently handle these queries. LDPMap uses indexing and two layers of dynamic programming techniques to efficiently map a biomedical term to a UMLS concept.

Results

Our empirical study shows that LDPMap achieves much faster query speeds than LCS. In comparison to the UMLS Metathesaurus Browser and MetaMap, LDPMap is much more effective in querying the UMLS Metathesaurus for inaccurately spelled medical terms, long medical terms, and medical terms with special characters.

Conclusions

These results demonstrate that LDPMap is an efficient and effective method for mapping medical terms to the UMLS Metathesaurus.
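
The abstract above does not spell out the LDPMap algorithm, so the following is only a generic sketch of the underlying idea: using dynamic programming (here, plain Levenshtein edit distance) to map an inaccurately spelled term to the closest concept name. It is not the authors' layered method and omits their indexing step. The function names, the distance threshold, and the toy vocabulary with placeholder identifiers are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def map_term(query: str,
             concepts: Dict[str, str],
             max_distance: int = 2) -> Optional[Tuple[str, str]]:
    """Return (concept_id, name) of the closest name, or None if none is
    within max_distance edits (hypothetical threshold)."""
    q = query.lower()
    best: Optional[Tuple[str, str]] = None
    best_d = max_distance + 1
    for cid, name in concepts.items():
        d = edit_distance(q, name.lower())
        if d < best_d:
            best, best_d = (cid, name), d
    return best


# Hypothetical toy vocabulary; the identifiers are placeholders, not real UMLS CUIs.
TOY_CONCEPTS = {
    "X001": "myocardial infarction",
    "X002": "diabetes mellitus",
}
```

Calling `map_term("diabetis mellitus", TOY_CONCEPTS)` would select the "diabetes mellitus" entry, since the two strings differ by a single substitution. A linear scan like this would be far too slow for a vocabulary the size of the UMLS Metathesaurus, which is presumably why the abstract pairs dynamic programming with indexing.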

9.
The exponential growth of Web-based user-generated reviews has resulted in the emergence of Opinion Mining (OM) applications for analyzing users’ opinions toward products, services, and policies. Polarity lexicons often play a pivotal role in OM, indicating the positivity or negativity of a term along with a numeric score. However, the commonly available domain-independent lexicons are not an optimal choice for every domain in which OM is applied, because the polarity of a term changes from one domain to another and such lexicons do not contain the correct polarity of a term for every domain. In this work, we focus on the problem of adapting a domain-dependent polarity lexicon from a set of labeled user reviews and a domain-independent lexicon, and we propose a unified learning framework based on information-theoretic concepts that can assign terms the correct polarity (positive or negative) scores. Benchmarking on three datasets (car, hotel, and drug reviews) shows that our approach improves the performance of polarity classification by achieving higher accuracy. Moreover, using the derived domain-dependent lexicon changed the polarity of terms, and the experimental results show that our approach is more effective than the baseline methods.

10.
Background

Orphanet aims to provide rare disease information to healthcare professionals, patients, and their relatives.

Objective

The objective of this work is to evaluate two methodologies (Unified Medical Language System [UMLS] and manual Orphanet-ICD-10 link-based mapping & string-based matching) used to map the Orphanet thesaurus to the MeSH thesaurus.

Results

On a corpus of 375 mappings, the string-based matching provides significantly better results than the UMLS and manual Orphanet-ICD-10 link-based mapping.

Conclusion

String-based matching could be applied to any biomedical terminology in French not yet included in UMLS.

11.
Recent years have seen a huge increase in the amount of biomedical information that is available in electronic format. Consequently, for biomedical researchers wishing to relate their experimental results to relevant data lurking somewhere within this expanding universe of on-line information, the ability to access and navigate biomedical information sources in an efficient manner has become increasingly important. Natural language and text processing techniques can facilitate this task by making the information contained in textual resources such as MEDLINE more readily accessible and amenable to computational processing. Names of biological entities such as genes and proteins provide critical links between different biomedical information sources and researchers' experimental data. Therefore, automatic identification and classification of these terms in text is an essential capability of any natural language processing system aimed at managing the wealth of biomedical information that is available electronically. To support term recognition in the biomedical domain, we have developed Termino, a large-scale terminological resource for text processing applications, which has two main components: first, a database into which very large numbers of terms can be loaded from resources such as UMLS, and stored together with various kinds of relevant information; second, a finite state recognizer, for fast and efficient identification and mark-up of terms within text. Since many biomedical applications require this functionality, we have made Termino available to the community as a web service, which allows for its integration into larger applications as a remotely located component, accessed through a standardized interface over the web.

12.
The GENIA ontology is a taxonomy that was developed as a result of manual annotation of a subset of MEDLINE, the GENIA corpus. Both the ontology and corpus have been used as a benchmark to test and develop biological information extraction tools. Recent work shows, however, that there is a demand for a more comprehensive ontology that would go along with the corpus. We propose a complete OWL ontology built on top of the GENIA ontology utilizing the GENIA corpus. The proposed ontology includes elements such as the original taxonomy of categories, biological entities as individuals, relations between individuals using verbs and verb nominalizations as object properties, and links to the UMLS Metathesaurus concepts. AVAILABILITY: http://www.ece.ualberta.ca/~rrak/ontology/xGENIA/

13.
MedScan, a natural language processing engine for MEDLINE abstracts   Total citations: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: The importance of extracting biomedical information from scientific publications is well recognized. A number of information extraction systems for the biomedical domain have been reported, but none of them have become widely used in practical applications. Most proposals to date make rather simplistic assumptions about the syntactic aspect of natural language. There is an urgent need for a system that has broad coverage and performs well in real-text applications. RESULTS: We present a general biomedical domain-oriented NLP engine called MedScan that efficiently processes sentences from MEDLINE abstracts and produces a set of regularized logical structures representing the meaning of each sentence. The engine utilizes a specially developed context-free grammar and lexicon. Preliminary evaluation of the system's performance, accuracy, and coverage exhibited encouraging results. Further approaches for increasing the coverage and reducing parsing ambiguity of the engine, as well as its application for information extraction are discussed.

14.
Veterinarians and scientists involved in applied and basic research in cattle require a lexicon of terms that is used uniformly so that diagnoses and inference of results between and among studies can be correctly interpreted and substantiated or negated and therapy and hypotheses can be formulated without unnecessary confusion and redundancy in treatments and experiments. This review provides a compilation of many of the classical and contemporary terms used in association with ovarian dynamics primarily during the estrous cycle in cattle, which can also apply to other reproductive states. While many classical terms used to describe healthy and diseased conditions associated with follicles and corpora lutea are still applicable today, there are some that have become antiquated (e.g., cystic corpus luteum, cystic ovarian degeneration, luteolysis, and granulosa cell tumor), due, in part, to advanced technology (e.g., ultrasonography) and a more thorough understanding of ovarian function. In this regard, older terms have been revised (e.g., corpus luteum with a cavity, follicular and luteinized-follicular cysts, structural and functional luteal regression, and granulosa-theca cell tumor) and newer terms have been coined (e.g., follicle deviation) and advocated herein. Defining and adopting terminology used in bovine reproduction that is clear, precise and understandable and available in a single source, is expected to make the exchange of clinical and research information and outcomes more effective, safe, and economical.

15.
16.
MOTIVATION: Natural language processing (NLP) methods are regarded as being useful to raise the potential of text mining from biological literature. The lack of an extensively annotated corpus of this literature, however, causes a major bottleneck for applying NLP techniques. GENIA corpus is being developed to provide reference materials to let NLP techniques work for bio-textmining. RESULTS: GENIA corpus version 3.0 consisting of 2000 MEDLINE abstracts has been released with more than 400,000 words and almost 100,000 annotations for biological terms.

17.
In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development and the specific characteristics of the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main objective was to provide a large, reliable, and useful word-based corpus with a dynamic, easy-to-use, and intuitive interface with free internet access for word and word-criteria searches. We used the Núcleo Interinstitucional de Linguística Computacional’s corpus as the basic data source and developed the Brazilian Portuguese Lexicon by deriving and adding metalinguistic and psycholinguistic information about Brazilian Portuguese words. We obtained a final corpus with more than 30 million word tokens, 215 thousand word types and 25 categories of information about each word. This corpus was made available on the internet via a free-access site with two search engines: a simple search and a complex search. The simple engine basically searches for a list of words, while the complex engine accepts all types of criteria in the corpus categories. The output result presents all entries found in the corpus with the criteria specified in the input search and can be downloaded as a .csv file. We created a module in the results that delivers basic statistics about each search. The Brazilian Portuguese Lexicon also provides a pseudoword engine and specific tools for linguistic and statistical analysis. Therefore, the Brazilian Portuguese Lexicon is a convenient instrument for stimulus search, selection, control, and manipulation in psycholinguistic experiments, and it is also a powerful database for computational linguistics research and language modeling related to lexicon distribution, functioning, and behavior.

18.
DEVELOPMENT OF A LEXICON FOR THE DESCRIPTION OF PEANUT FLAVOR   Total citations: 3 (self-citations: 0, citations by others: 3)
A lexicon of terms to describe desirable as well as undesirable flavors in peanuts has been developed. The lexicon and an intensity rating scale were developed by a 13-member panel of flavor and peanut specialists representing industry and the USDA-Agricultural Research Service. This system is intended to provide definitive, common terminology for use in communicating differences in peanut flavor variables among all phases of peanut research and industry.

19.
20.