Similar literature
 20 similar articles found (search time: 15 ms)
1.
Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems’ output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO Annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent of the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when they are combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently of the text types, and the leveraging strategies to best take advantage of individual systems’ annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora must be requested from the individual corpus providers.
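As a rough illustration of the evaluation described in this abstract, the sketch below builds a silver standard by simple voting across systems and scores an annotation set against a gold standard. The abstract does not say how the silver standard was actually assembled, so the voting rule, the `min_votes` threshold, the `(start, end, concept_id)` annotation format and the function names are all assumptions made for illustration.

```python
from collections import Counter


def silver_standard(system_annotations, min_votes=2):
    """Keep annotations proposed by at least `min_votes` systems.

    `system_annotations` maps a system name to an iterable of hashable
    annotations, here assumed to be (start, end, concept_id) tuples.
    The voting rule is an assumption, not the paper's documented method.
    """
    votes = Counter()
    for annotations in system_annotations.values():
        votes.update(set(annotations))  # each system votes once per annotation
    return {ann for ann, count in votes.items() if count >= min_votes}


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 between two annotation sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With a higher `min_votes`, fewer annotations survive, which tends to raise precision and lower recall; this is the kind of trade-off the abstract reports when low-recall systems are combined with the others.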

2.
The exponential growth of the biomedical literature is making the need for efficient, accurate text-mining tools increasingly clear. The identification of named biological entities in text is a central and difficult task. We have developed an efficient algorithm and implementation of a dictionary-based approach to named entity recognition, which we here use to identify names of species and other taxa in text. The tool, SPECIES, is more than an order of magnitude faster than, and as accurate as, existing tools. Precision and recall were assessed both on an existing gold-standard corpus and on a new corpus of 800 abstracts, which were manually annotated after the development of the tool. The corpus comprises abstracts from journals selected to represent many taxonomic groups, which gives insights into which types of organism names are hard to detect and which are easy. Finally, we have tagged organism names in the entire Medline database and developed a web resource, ORGANISMS, that makes the results accessible to the broad community of biologists. The SPECIES software is open source and can be downloaded from http://species.jensenlab.org along with dictionary files and the manually annotated gold-standard corpus. The ORGANISMS web resource can be found at http://organisms.jensenlab.org.
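The general idea of dictionary-based named entity recognition can be sketched as a greedy longest-match lookup over tokens. This is not the SPECIES algorithm, whose details are not given in the abstract; the tokenization rule, the function names and the tiny example dictionary are assumptions chosen only to show the technique.

```python
import re


def build_dictionary(names):
    """Map lower-cased token tuples to identifiers, e.g. ('homo', 'sapiens') -> '9606'."""
    return {tuple(name.lower().split()): ident for name, ident in names.items()}


def tag(text, dictionary):
    """Return (start, end, identifier) for every longest dictionary match in `text`."""
    max_len = max((len(key) for key in dictionary), default=0)
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word names win over their prefixes.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok[0].lower() for tok in tokens[i:i + n])
            if key in dictionary:
                matches.append((tokens[i][1], tokens[i + n - 1][2], dictionary[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return matches


# Illustrative two-entry dictionary; a real one would hold millions of names.
example_dictionary = build_dictionary({"Homo sapiens": "9606", "Escherichia coli": "562"})
```

Because each lookup is a hash-table probe, the cost grows with text length and the longest name in the dictionary rather than with dictionary size, which is one reason dictionary approaches can be fast.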

3.
Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that assigns each annotation a confidence value (i.e., the posterior probability that it is correct). Our approach allows computing a certainty level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance, based on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
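To make the notion of a per-annotation posterior concrete, here is a minimal sketch for a binary label. It is not either of the two models from the paper: it assumes annotators err independently, that each has a single known accuracy, and that the prior is given. The function name and parameters are hypothetical.

```python
def posterior_correct(votes, accuracies, prior=0.5):
    """Posterior probability that the true binary label is 1.

    votes      -- list of 0/1 labels, one per annotator
    accuracies -- per-annotator probability of reporting the true label
    prior      -- prior probability that the true label is 1

    Assumes independent annotators with symmetric error rates; this is a
    simplification for illustration, not the authors' model.
    """
    weight_pos = prior        # running value of P(label = 1) * P(votes | label = 1)
    weight_neg = 1.0 - prior  # running value of P(label = 0) * P(votes | label = 0)
    for vote, accuracy in zip(votes, accuracies):
        weight_pos *= accuracy if vote == 1 else 1.0 - accuracy
        weight_neg *= accuracy if vote == 0 else 1.0 - accuracy
    return weight_pos / (weight_pos + weight_neg)
```

For example, votes `[1, 1, 0]` from annotators with accuracies `[0.9, 0.8, 0.6]` give a posterior of 0.96 for label 1: the dissenting vote from the least reliable annotator lowers the confidence only slightly.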

4.
5.
《TARGETS》2003,2(4):177-179
Patent Update is a regular column dedicated to the complex issues that affect patents in the genomics and proteomics field. In each issue, there are two sections compiled by patent attorneys. The first section, Patents – a Practical Perspective, is a commentary on current issues, landmark patents, useful patent resources and how to search them, and legislative changes that impact the pharma and biotech industries. The second section, Patent News, provides brief synopses of recently issued patents and other patent events, and their significance to drug discovery R&D.

6.
《TARGETS》2003,2(6):271-272
Patent Update is a regular column dedicated to the complex issues that affect patents in the genomics and proteomics field. In each issue, there are two sections compiled by patent attorneys. The first section, Patents – a Practical Perspective, is a commentary on current issues, landmark patents, useful patent resources and how to search them, and legislative changes that impact the pharma and biotech industries. The second section, Patent News, provides brief synopses of recently issued patents and other patent events, and their significance to drug discovery R&D.

7.
At the end of the 19th century, the American Patent Office granted the patent known as “Pasteur's application”, claiming the protection of a yeast strain. Since that date, the debate around biotechnology patents, especially those that affect living organisms or parts of them, has grown exponentially. In the present article, patent law is reviewed, with attention to the particular problems concerning fungi or parts of them. In addition, some fungus patents are discussed from the perspective of the ethical, economic, social and environmental aspects of this kind of patent.

8.

Background

The analysis of high-throughput data in biology is aided by integrative approaches such as gene-set analysis. Gene-sets can represent well-defined biological entities (e.g. metabolites) that interact in networks (e.g. metabolic networks), to exert their function within the cell. Data interpretation can benefit from incorporating the underlying network, but there are currently no optimal methods that link gene-set analysis and network structures.

Results

Here we present Kiwi, a new tool that processes output data from gene-set analysis and integrates them with a network structure such that the inherent connectivity between gene-sets, i.e. not simply the gene overlap, becomes apparent. In two case studies, we demonstrate that standard gene-set analysis points at metabolites regulated in the interrogated condition. Nevertheless, only the integration of the interactions between these metabolites provides an extra layer of information that highlights how they are tightly connected in the metabolic network.

Conclusions

Kiwi is a tool that enhances interpretability of high-throughput data. It allows the users not only to discover a list of significant entities or processes as in gene-set analysis, but also to visualize whether these entities or processes are isolated or connected by means of their biological interaction. Kiwi is available as a Python package at http://www.sysbio.se/kiwi and an online tool in the BioMet Toolbox at http://www.biomet-toolbox.org.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0408-9) contains supplementary material, which is available to authorized users.

9.
10.
Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Although several whole-genome sequences of bats have been assembled and annotated, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and the portal offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

11.
Medical forms are very heterogeneous: on a European scale there are thousands of data items in several hundred different systems. To enable data exchange for clinical care and research purposes there is a need to develop interoperable documentation systems with harmonized forms for data capture. A prerequisite in this harmonization process is comparison of forms. So far, to our knowledge, an automated method for comparison of medical forms is not available. A form contains a list of data items with corresponding medical concepts. An automatic comparison needs data types, item names and, especially, items annotated with unique concept codes from medical terminologies. The scope of the proposed method is a comparison of these items by comparing their concept codes (coded in UMLS). Each data item is represented by item name, concept code and value domain. Two items are called identical if item name, concept code and value domain are the same. Two items are called matching if only concept code and value domain are the same. Two items are called similar if their concept codes are the same but the value domains are different. Based on these definitions an open-source implementation for automated comparison of medical forms in ODM format with UMLS-based semantic annotations was developed. It is available as package compareODM from http://cran.r-project.org. To evaluate this method, it was applied to a set of 7 real medical forms with 285 data items from a large public ODM repository with forms for different medical purposes (research, quality management, routine care). Comparison results were visualized with grid images and dendrograms. Automated comparison of semantically annotated medical forms is feasible. Dendrograms allow a view on clustered similar forms. The approach is scalable for a large set of real medical forms.
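The three definitions in this abstract (identical, matching, similar) translate directly into a small decision function. The sketch below is in Python, whereas compareODM itself is an R package; the `Item` class, the function name and the `"different"` label for items with unequal concept codes are assumptions added for illustration and are not taken from the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A data item as described in the abstract: name, concept code, value domain."""
    name: str
    concept_code: str
    value_domain: str


def compare_items(a, b):
    """Classify a pair of items using the abstract's definitions.

    'different' is a hypothetical label for pairs whose concept codes differ;
    the abstract does not name that case.
    """
    if a.concept_code != b.concept_code:
        return "different"
    if a.value_domain != b.value_domain:
        return "similar"    # same concept code, different value domain
    if a.name != b.name:
        return "matching"   # same concept code and value domain, different name
    return "identical"      # name, concept code and value domain all equal
```

Checking the concept code first reflects that the comparison is driven by semantic annotation rather than by item names, which vary freely between forms.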

12.
Patent analysis with the help of the strategic mining of patents from databases is important and useful within the framework of application-oriented research and its commercialization. In the analysis reported here, we have mined cyanobacterial patents from the patent database of the United States Patent and Trademark Office (USPTO). In order to assess the commercial potential of cyanobacteria, we conducted the patent search (from 1976 to April 2006) using certain generic terms and the 84 genera of cyanobacteria as keywords. The search was performed in two major ways: searching the abstracts and claims of the patents cumulatively, and searching the entire patent documents using the ‘all fields’ mode in USPTO. In the abstract- and claims-based search, 234 patents were obtained after the removal of overlapping patents among the keywords. An additional 31 patents were added following the ‘all fields’ search; these patents were not covered in the search that was based on abstracts and claims. The entire package of 265 patents, of which 244 were related to cyanobacteria, was then analyzed. Information derived from these patents identified five major areas of cyanobacterial utilization. Cyanobacteria have been patented as a source of a wide spectrum of products, for medical, agricultural and environmental applications, for gene-based products, for methods of cultivation and for methods of control. The chronological development in granting cyanobacterial patents was also traced. This study demonstrates that such strategic mining and analysis of patent data can be used as an index for future development.

13.
The products of Plant Molecular Farming (PMF) are recombinant proteins or their metabolic products. In this study, patent data were employed to assess industrial trends in the research and innovation process of Plant Molecular Farming within a national and international context. The US Patent and Trademark Office (USPTO) and the European Patent Office (EPO) issued a total of 585 patents covering Plant Molecular Farming from 2002 through 2006. By nationality, US inventors predominated as recipients of PMF patents, followed by Germany, Denmark, and Japan. The PMF patents were categorized into five major areas of research: pharmaceuticals and nutraceuticals, with 170 patents (31%), and plant expression tools and methods for alternative production systems, with 169 patents (29%), were the dominant patent applications, followed by 102 patent claims associated with antibodies (17%), 71 patents on industrial molecules (12%), 48 patents on vaccines (8%), and finally 18 patents related to post-translational protein glycosylation (3%). The greatest proportion of patentees was of US origin (52%), and PMF-associated patenting activities at the USPTO and EPO were dominated by private organizations (67%). Disclaimer: The views expressed in this study do not necessarily reflect those of the European Commission.

14.
The patenting of biotechnological inventions is practically in harmony with the general requirements of patent protection. It still stands in the foreground of interest, since this is the only technical field where living material itself may be the subject matter of patents. As a consequence, ethical problems have arisen, first of all in the patenting of human cells and genes, on which there is no agreement between R&D firms, patent offices and green movements. This has called for the elaboration of special Directives. On the other hand, patent systems are instrumental in safeguarding biodiversity. This review gives a picture of the patenting situation in biotechnology in the European Patent Office and in Hungary, the host country of the Congress. It also gives practical advice to biotechnological researchers on how to draft applications and observe the time limits, as well as on the necessity and possibilities of the deposit of microorganisms.

15.
The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein–protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.

16.
Copy number variation (CNV) is one of the most prevalent genetic variations in the genome, leading to an abnormal number of copies of moderate to large genomic regions. High-throughput technologies such as next-generation sequencing often identify thousands of CNVs involved in biological or pathological processes. Despite the growing demand to filter and classify CNVs by factors such as frequency in population, biological features, and function, surprisingly, no online web server for CNV annotations has been made available to the research community. Here, we present CNVannotator, a web server that accepts an input set of human genomic positions in a user-friendly tabular format. CNVannotator can perform genomic overlaps of the input coordinates using various functional features, including a list of the reported 356,817 common CNVs, 181,261 disease CNVs, as well as 140,342 SNPs from genome-wide association studies. In addition, CNVannotator incorporates 2,211,468 genomic features, including ENCODE regulatory elements, cytoband, segmental duplication, genome fragile site, pseudogene, promoter, enhancer, CpG island, and methylation site. For cancer research community users, CNVannotator can apply various filters to retrieve a subgroup of CNVs pinpointed in hundreds of tumor suppressor genes and oncogenes. In total, 5,277,234 unique genomic coordinates with functional features are available to generate an output in a plain text format that is free to download. In summary, we provide a comprehensive web resource for human CNVs. The annotated results along with the server can be accessed at http://bioinfo.mc.vanderbilt.edu/CNVannotator/.
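The core operation behind this kind of annotation is testing whether an input genomic region overlaps a known feature. The sketch below shows that test with a plain linear scan; it is not CNVannotator's implementation, and the tuple layout, the closed-interval convention and the function name are assumptions. A production tool would normally use an interval tree or sorted index instead of scanning every feature.

```python
def overlapping_features(query, features):
    """Return the features that overlap `query`.

    query    -- (chromosome, start, end)
    features -- iterable of (chromosome, start, end, label)

    Coordinates are treated as closed intervals (both ends inclusive),
    which is an assumption; formats such as BED use half-open intervals.
    """
    chrom, start, end = query
    return [
        feature
        for feature in features
        # Two closed intervals overlap when each one starts before the other ends.
        if feature[0] == chrom and feature[1] <= end and start <= feature[2]
    ]
```

The chromosome check matters as much as the coordinate check: two regions with identical start and end positions on different chromosomes do not overlap.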

17.
18.

Motivation

Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical “term space” (the “Lexeome”), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness).

Result

This study compiles a resource for lexical terms of biomedical interest in a standard format (called “LexEBI”), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities, both do comprise enzymes leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions.

Conclusion

LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). The resource provides the disease terms as open source content, and fully interlinks terms across resources.

19.
Microalgal biotechnology is an innovative sector in the field of biotechnology and has evolved exponentially in the last 100 years. To determine the current state of the sector and its development, patents on microalgal biotechnology were surveyed in Espacenet, the European Patent Office database. The objective of this study was to identify the main trends in microalgae-related patents in the most commercial genera: Chlorella, Spirulina, Dunaliella and Haematococcus, and in the model organism Chlamydomonas.

20.

Introduction

New tools and approaches are necessary to facilitate public policy planning and foster the management of innovation in countries' public health systems. To this end, an understanding of the integrated way in which the various actors who produce scientific knowledge and inventions in technological areas of interest operate, where they are located and how they relate to one another is of great relevance. Tuberculosis has been chosen as a model for the present study as it is a current challenge for Brazilian research and innovation.

Methodology

Publications about tuberculosis written by Brazilian authors were retrieved from international databases, analyzed and processed with text-searching tools, and networks of coauthors were constructed and visualized. Patent applications about tuberculosis in Brazil were retrieved from the Brazilian National Institute of Industrial Property (INPI) and the European Patent Office databases, using the International Patent Classification and keywords, and then categorized and analyzed.

Results/Conclusions

Brazilian authorship of articles about tuberculosis jumped from 1% in 1995 to 5% in 2010. Article production and patent filings of national origin have been concentrated in public universities and research institutions, while the participation of private industry in the filing of Brazilian patents has remained limited. The goals of national patenting efforts have still not been reached, as up to the present none of the applications filed have been granted a patent. The analysis of all these data about TB publishing and patents clearly demonstrates the importance of maintaining the continuity of Brazil's production development policies, as well as government support for infrastructure projects to be employed in transforming the potential of research. This policy already exists for the promotion of new products and processes that, in addition to bringing diverse economic benefits to the country, will also contribute to dealing effectively with public health problems affecting Brazil and the world.
