Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.

2.
We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user‐friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools is provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org.

3.
MOTIVATION: Biological literature contains many abbreviations with one particular sense in each document. However, most abbreviations do not have a unique sense across the literature. Furthermore, many documents do not contain the long forms of the abbreviations. Resolving an abbreviation in a document consists of retrieving its sense in use. Abbreviation resolution improves accuracy of document retrieval engines and of information extraction systems. RESULTS: We combine an automatic analysis of Medline abstracts and linguistic methods to build a dictionary of abbreviation/sense pairs. The dictionary is used for the resolution of abbreviations occurring with their long forms. Ambiguous global abbreviations are resolved using support vector machines that have been trained on the context of each instance of the abbreviation/sense pairs, previously extracted for the dictionary set-up. The system disambiguates abbreviations with a precision of 98.9% for a recall of 98.2% (98.5% accuracy). This performance is superior in comparison with previously reported research work. AVAILABILITY: The abbreviation resolution module is available at http://www.ebi.ac.uk/Rebholz/software.html.

4.
An architecture for biological information extraction and representation   (Cited by: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Technological advances in biomedical research are generating a plethora of heterogeneous data at a high rate. There is a critical need for extraction, integration and management tools for information discovery and synthesis from these heterogeneous data. RESULTS: In this paper, we present a general architecture, called ALFA, for information extraction and representation from diverse biological data. The ALFA architecture consists of: (i) a networked, hierarchical, hyper-graph object model for representing information from heterogeneous data sources in a standardized, structured format; and (ii) a suite of integrated, interactive software tools for information extraction and representation from diverse biological data sources. As part of our research efforts to explore this space, we have currently prototyped the ALFA object model and a set of interactive software tools for searching, filtering, and extracting information from scientific text. In particular, we describe BioFerret, a meta-search tool for searching and filtering relevant information from the web, and ALFA Text Viewer, an interactive tool for user-guided extraction, disambiguation, and representation of information from scientific text. We further demonstrate the potential of our tools in integrating the extracted information with experimental data and diagrammatic biological models via the common underlying ALFA representation. CONTACT: aditya_vailaya@agilent.com.

6.
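Australia's management strategies for invasive alien species and their implications for China   (Cited by: 32; self-citations: 1; citations by others: 32)
Australia is an island continent with a highly developed maritime shipping industry, so the risk of harmful alien species being introduced, intentionally or unintentionally, through trade, tourism and transport is high. The Australian government attaches great importance to the management of invasive alien species. It has formulated the National Strategy for the Conservation of Australia's Biological Diversity and, for alien weeds and marine invasive species carried in ballast water, has issued regulations and technical documents including the National Weeds Strategy, the Weed Risk Assessment system and the Ballast Water Guidelines, strengthening the management of invasive alien species. This paper briefly introduces Australia's strategies and guidelines for managing invasive alien species and proposes countermeasures for China: (1) establish a corresponding legal framework as soon as possible so that invasive alien species are managed according to law; (2) strengthen institutional capacity and form a coordinated multi-agency management mechanism; (3) strengthen the development of management systems for invasive alien species; (4) adopt appropriate measures for prevention of introduction, eradication, control and restoration; (5) carry out scientific research to provide a scientific basis for the management of invasive alien species; and (6) develop education and training programmes to raise public awareness.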

7.
Given the current trends, it seems inevitable that all biological documents will eventually exist in a digital format and be distributed across the internet. New network services and tools need to be developed to increase retrieval rates for documents and to refine data recovery. Biological data have traditionally been well managed using taxonomic principles. As part of a larger initiative to build an array of names-based network services that emulate taxonomic principles for managing biological information, we undertook the digitization of a major taxonomic reference text, Nomenclator Zoologicus. The process involved replicating the text to a high level of fidelity, parsing the content for inclusion within a database, developing tools to enable expert input into the product, and integrating the metadata and factual content within taxonomic network services. The result is a high-quality and freely available web application (http://uio.mbl.edu/NomenclatorZoologicus/) capable of being exploited in an array of biological informatics services.

8.
SUMMARY: VISDA (Visual Statistical Data Analyzer) is a caBIG analytical tool for cluster modeling, visualization and discovery that has met silver-level compatibility under the caBIG initiative. Being statistically principled and visually interfaced, VISDA exploits both hierarchical statistics modeling and the human gift for pattern recognition to allow a progressive yet interactive discovery of hidden clusters within high dimensional and complex biomedical datasets. The distinctive features of VISDA are particularly useful for users across the cancer research and broader research communities to analyze complex biological data. AVAILABILITY: http://gforge.nci.nih.gov/projects/visda/

9.
Gathering archival documents to trace the history of the Zeiss company presents no difficulty: they are abundant… except for the period from 1932 to 1945, which is systematically ignored and which corresponds to the Nazi period. On the website Zeiss Historica, among the outstanding personalities of the Zeiss company, we note that, for Professor Emanuel Goldberg, the web page "is still under development but an early picture of the professor is available." Fortunately, Michael Buckland, a professor at the UC Berkeley School of Information, brought the life and work of Emanuel Goldberg to light. Thanks to him, Goldberg's works and innovations, which had disappeared from our cultural and scientific heritage, have returned to light after being erased for fifty years. Goldberg had published dozens of articles, obtained patents, developed cameras, microdots and movie cameras, and designed what he called a "Statistical Machine", the first electronic document retrieval machine. In France, although this rediscovery was made known to the world of information science, it has not had the impact it deserved in the scientific world. It is therefore time to reconstruct his career and his work, and to analyse the reasons why some attempted to definitively erase his name and memory.

10.
National/ethnic mutation databases aim to document the genetic heterogeneity in various populations and ethnic groups worldwide. We have previously reported the development and upgrade of FINDbase (www.findbase.org), a database recording causative mutations and pharmacogenomic marker allele frequencies in various populations around the globe. Although this database has recently been upgraded, we continuously try to enhance its functionality by providing more advanced visualization tools that would further assist effective data querying and comparisons. We are currently experimenting with various visualization techniques on the existing FINDbase causative mutation data collection, aiming to provide a dynamic research tool for the worldwide scientific community. We have developed an interactive web-based application for population-based mutation data retrieval. It supports sophisticated data exploration, allowing users to apply advanced filtering criteria upon a set of multiple views of the underlying data collection, and enables browsing the relationships between individual datasets in a novel and meaningful way.

11.
Synthetic cell–cell interaction systems can be useful for understanding multicellular communities or for screening binding molecules. We adapt a previously characterized set of synthetic cognate nanobody–antigen pairs to a yeast–bacteria coincubation format and use flow cytometry to evaluate cell–cell interactions mediated by binding between surface-displayed molecules. We further use fluorescence-activated cell sorting to enrich a specific yeast-displayed nanobody within a mixed yeast-display population. Finally, we demonstrate that this system supports the characterization of a therapeutically relevant nanobody–antigen interaction: a previously discovered nanobody that binds to the intimin protein expressed on the surface of enterohemorrhagic Escherichia coli. Overall, our findings indicate that the yeast–bacteria format supports efficient evaluation of ligand–target interactions. With further development, this format may facilitate systematic characterization and high-throughput discovery of bacterial surface-binding molecules.

12.
Fostering data sharing is a scientific and ethical imperative. Health gains can be achieved more comprehensively and quickly by combining large, information-rich datasets from across conventionally siloed disciplines and geographic areas. While collaboration for data sharing is increasingly embraced by policymakers and the international biomedical community, we lack a common ethical and legal framework to connect regulators, funders, consortia, and research projects so as to facilitate genomic and clinical data linkage, global science collaboration, and responsible research conduct. Governance tools can be used to responsibly steer the sharing of data for proper stewardship of research discovery, genomics research resources, and their clinical applications. In this article, we propose that an international code of conduct be designed to enable global genomic and clinical data sharing for biomedical research. To give this proposed code universal application and accountability, however, we propose to position it within a human rights framework. This proposition is not without precedent: international treaties have long recognized that everyone has a right to the benefits of scientific progress and its applications, and a right to the protection of the moral and material interests resulting from scientific productions. It is time to apply these twin rights to internationally collaborative genomic and clinical data sharing.

13.
The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. The superfamily curation defines signature domain architecture and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, the PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1 000 000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free download for PSD and NREF biweekly releases and auxiliary databases and files.

14.
15.

Background  

Accuracy of document retrieval from MEDLINE for gene queries is crucially important for many applications in bioinformatics. We explore five information retrieval-based methods to rank documents retrieved by PubMed gene queries for the human genome. The aim is to rank relevant documents higher in the retrieved list. We address the special challenges faced due to ambiguity in gene nomenclature: gene terms that refer to multiple genes, gene terms that are also English words, and gene terms that have other biological meanings.

16.
MOTIVATION: A vast amount of information about human, animal and plant pathogens has been acquired, stored and displayed in varied formats through different resources, both electronically and otherwise. However, there is no community standard format for organizing this information or agreement on machine-readable format(s) for data exchange, thereby hampering interoperation efforts across information systems harboring such infectious disease data. RESULTS: The Pathogen Information Markup Language (PIML) is a free, open, XML-based format for representing pathogen information. XSLT-based visual presentations of valid PIML documents were developed and can be accessed through the PathInfo website or as part of the interoperable web services federation known as ToolBus/PathPort. Currently, detailed PIML documents are available for 21 pathogens deemed of high priority with regard to public health and national biological defense. A dynamic query system allows simple queries as well as comparisons among these pathogens. Continuing efforts are being taken to include other groups' supporting PIML and to develop more PIML documents. AVAILABILITY: All the PIML-related information is accessible from http://www.vbi.vt.edu/pathport/pathinfo/

17.
PIMWalker™     
This article reports on PIMWalker, a free and interactive tool for visualising protein interaction networks. PIMWalker handles the unified molecular interaction (MI) format defined by members of the Proteomics Standards Initiative (the PSI MI format), and it is thus directly and easily usable by bench biologists. PIMWalker also comes with a documented, open-source Java™ application programming interface allowing the bioinformatic programmer to easily extend the functions. AVAILABILITY: PIMWalker is available under a free license from http://pim.hybrigenics.com/pimwalker.

18.
The National Science Foundation and others have made compelling arguments that research be incorporated into the learning of undergraduates. In response to these arguments, a two-hybrid research project was incorporated into a molecular biology course that contained both a lecture section and a laboratory section. The course was designed around specific goals for educational outcomes, including introducing research to a wide range of students, teaching students experimental design and data analysis, and enhancing understanding of course material. Additional goals included teaching students to search genomic databases, to access scientific articles, and to write a paper in scientific format. Graded events tested these goals, and a student evaluation indicated student perception of the project. According to our analysis of the data, the yeast two-hybrid screen was a success: several novel clones were identified; students met expectations on graded lab reports, the poster session, and the final paper; and evaluations indicated that students had achieved the outlined goals. Students indicated on the evaluations that the research project increased their interest in research and greatly improved understanding of the course material. Finally, several students in the course intend to submit the findings of the research project to an undergraduate research journal.

19.
The amount of glycomics data being generated is rapidly increasing as a result of improvements in analytical and computational methods. Correlation and analysis of this large, distributed data set requires an extensible and flexible representational standard that is also ‘understood’ by a wide range of software applications. An XML-based data representation standard that faithfully captures essential structural details of a glycan moiety, along with additional information (such as data provenance) to aid the interpretation and usage of glycan data, will facilitate the exchange of glycomics data across the scientific community. To meet this need, we introduce the GLYcan Data Exchange (GLYDE) standard as an XML-based representation format to enable interoperability and exchange of glycomics data. An online tool (http://128.192.9.86/stargate/formatIndex.jsp) for the conversion of other representations to GLYDE format has been developed.

20.
SUMMARY

This paper assesses the research undertaken at Lake St. Lucia over the past 25 years based on over 300 documents from that period. Trends related to both time and subject matter are evident, and these are considered in relation to the gaps in our current knowledge concerning the system.

A feature evident throughout the period under consideration is that the major portion of documented material available relates to reports and contributions to workshops (77%) with only 23% from scientific publications. Contributions by these two sources to the subject group being considered for St. Lucia are markedly different with research publications dominating the biological field and reports dominating in physical aspects, catchment characteristics, man's activities, management, dredging and hydrological modelling. However, some 55% of all unpublished data related to reviews or assessments of the state of research on St. Lucia.

The importance of the scientific publications group as an indicator of the state of research into the system is considered in the light of an apparent decline in the number of completed projects being published. It is also considered in the light of the recent establishment of a co-ordinated Lake St. Lucia Research Programme, which may provide the impetus for a more concentrated and directed research effort on the Lake System.
