Similar Articles
20 similar articles retrieved (search time: 31 ms).
1.
Taxonomic indexing refers to a new array of taxonomically intelligent network services that use nomenclatural principles and elements of expert taxonomic knowledge to manage information about organisms. Taxonomic indexing was introduced to help manage the increasing amounts of digital information about biology. It has been designed to form a near-basal layer in a layered cyberinfrastructure that deals with biological information. Taxonomic indexing accommodates the special problems of using names of organisms to index biological material: it links alternative names for the same entity (reconciliation), distinguishes between uses of the same name for different entities (disambiguation), and places names within an indefinite number of hierarchical schemes. In order to access all information on all organisms, taxonomic indexing must be able to call on a registry of all names in all forms for all organisms; NameBank has been developed to meet that need. Taxonomic indexing is an area of informatics that overlaps with taxonomy, is dependent on the expert input of taxonomists, and reveals the relevance of the discipline to a wide audience.
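As a rough illustration of the reconciliation and disambiguation services described above, here is a minimal Python sketch against a toy name registry; the registry contents, synonym table and function names are all invented for illustration and do not represent NameBank's actual data model or API.

```python
# Minimal sketch of name reconciliation and disambiguation against a toy
# registry. All names and tables here are invented; they do not reflect
# NameBank's actual data model or API.

# Maps a name string to the distinct taxon concepts it may denote.
REGISTRY = {
    "Pieris": ["Pieris (butterfly genus)", "Pieris (plant genus)"],
    "Pieris brassicae": ["Pieris brassicae (large white butterfly)"],
}

# Maps alternative (synonym) names onto an accepted name.
SYNONYMS = {"Pontia brassicae": "Pieris brassicae"}

def reconcile(name: str) -> str:
    """Link an alternative name to its accepted form (reconciliation)."""
    return SYNONYMS.get(name, name)

def disambiguate(name: str) -> list:
    """List the distinct concepts a single name may refer to (disambiguation)."""
    return REGISTRY.get(reconcile(name), [])

print(disambiguate("Pontia brassicae"))  # one concept, reached via the synonym link
print(disambiguate("Pieris"))            # two homonymous concepts to distinguish
```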

2.
uBioRSS: tracking taxonomic literature using RSS
Web content syndication through standard formats such as RSS and ATOM has become an increasingly popular mechanism for publishers, news sources and blogs to disseminate regularly updated content. These standardized syndication formats deliver content directly to subscribers, allowing them to locally aggregate content from a variety of sources instead of having to find the information on multiple websites. The uBioRSS application is a 'taxonomically intelligent' service customized for the biological sciences. It aggregates syndicated content from academic publishers and science news feeds, and then uses a taxonomic Named Entity Recognition algorithm to identify and index taxonomic names within those data streams. The resulting name index is cross-referenced to current global taxonomic datasets to provide context for browsing the publications by taxonomic group. This process, called taxonomic indexing, draws upon services developed specifically for the biological sciences, collectively referred to as 'taxonomic intelligence'. Such value-added enhancements can provide biologists with accelerated and improved access to current biological content. AVAILABILITY: http://names.ubio.org/rss/
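The two core steps, feed aggregation and taxonomic name recognition, can be sketched as follows; the feed URL is a placeholder and the dictionary lookup is a deliberately naive stand-in for the service's actual Named Entity Recognition algorithm.

```python
# Illustrative sketch of an uBioRSS-style workflow: aggregate syndicated
# feeds, then scan each entry for taxonomic names. The feed URL and the
# tiny name dictionary are placeholders for illustration only.
import re
import feedparser  # third-party: pip install feedparser

TAXON_DICT = {"Chiroptera", "Pieris brassicae", "Danio rerio"}  # toy name list

def find_taxa(text: str) -> set:
    """Naive dictionary matcher standing in for taxonomic NER."""
    return {name for name in TAXON_DICT if re.search(re.escape(name), text)}

feed = feedparser.parse("https://example.org/publisher-feed.rss")  # placeholder URL
index = {}
for entry in feed.entries:
    text = entry.get("title", "") + " " + entry.get("summary", "")
    for taxon in find_taxa(text):
        # The name index would later be cross-referenced to taxonomic datasets.
        index.setdefault(taxon, []).append(entry.link)
```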

3.
Expanding digital data sources, including social media, online news articles and blogs, provide an opportunity to better understand the context and intensity of human-nature interactions, such as wildlife exploitation. However, online searches encompassing large taxonomic groups can generate vast datasets, which can be overwhelming to filter for relevant content without the use of automated tools. The variety of machine learning models available to researchers, and the need for manually labelled training data with an even balance of labels, can make applying these tools challenging. Here, we implement and evaluate a hierarchical text classification pipeline which brings together three binary classification tasks with increasingly specific relevancy criteria. Crucially, the hierarchical approach facilitates the filtering and structuring of a large dataset, of which relevant sources make up a small proportion. Using this pipeline, we also investigate how the accuracy with which text classifiers identify relevant and irrelevant texts is influenced by the use of different models, training datasets, and the classification task. To evaluate our methods, we collected data from Facebook, Twitter, Google and Bing search engines, with the aim of identifying sources documenting the hunting and persecution of bats (Chiroptera). Overall, the ‘state-of-the-art’ transformer-based models were able to identify relevant texts with an average accuracy of 90%, with some classifiers achieving accuracy of >95%. Whilst this demonstrates that application of more advanced models can lead to improved accuracy, comparable performance was achieved by simpler models when applied to longer documents and less ambiguous classification tasks. Hence, the benefits from using more computationally expensive models are dependent on the classification context. We also found that stratification of training data, according to the presence of key search terms, improved classification accuracy for less frequent topics within datasets, and therefore improves the applicability of classifiers to future data collection. Overall, whilst our findings reinforce the usefulness of automated tools for facilitating online analyses in conservation and ecology, they also highlight that the effectiveness and appropriateness of such tools is determined by the nature and volume of data collected, the complexity of the classification task, and the computational resources available to researchers.
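A minimal sketch of the hierarchical idea, assuming TF-IDF plus logistic regression as stand-ins for the models actually evaluated; the stage criteria and training snippets are invented:

```python
# Sketch of a hierarchical text-classification pipeline: three binary
# classifiers applied in sequence, each with an increasingly specific
# relevancy criterion. Models and training data are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_binary(texts, labels):
    """Train one binary relevance classifier (TF-IDF + logistic regression)."""
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    return clf

# Stage criteria grow more specific: about animals -> about bats -> about hunting.
stages = [
    train_binary(["bat roost survey report", "stock market news today"], [1, 0]),
    train_binary(["fruit bats seen at market", "bird migration patterns"], [1, 0]),
    train_binary(["bats hunted for bushmeat", "bat echolocation study"], [1, 0]),
]

def is_relevant(text: str) -> bool:
    """A document survives only if every stage classifies it as relevant."""
    return all(clf.predict([text])[0] == 1 for clf in stages)
```

Because each stage only sees documents that survived the previous one, the pipeline progressively filters and structures a large, mostly irrelevant dataset, which is the benefit the hierarchical design targets.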

4.
The dynamic expansion of the taxonomic knowledge base is fundamental to further developments in biotechnology and sustainable conservation strategies. The vast array of software tools for numerical taxonomy and probabilistic identification, in conjunction with automated systems for data generation, is allowing the construction of large computerised strain databases. New techniques available for the generation of chemical and molecular data, associated with new software tools for data analysis, are leading to a quantum leap in bacterial systematics. The easy exchange of data through an interactive and highly distributed global computer network, such as the Internet, is facilitating the dissemination of taxonomic data. Relevant information for comparative sequence analysis, ribotyping, and protein and DNA electrophoretic pattern analysis is available on-line through computerised networks. Several software packages are available for the analysis of molecular data. Nomenclatural and taxonomic Authority Files are available from different sources, together with strain-specific information. The increasing availability of public domain software is leading to the establishment and integration of public domain databases all over the world, and promoting co-operative research projects on a scale never seen before.

5.
MOTIVATION: As biomedical researchers are amassing a plethora of information in a variety of forms resulting from the advancements in biomedical research, there is a critical need for innovative information management and knowledge discovery tools to sift through these vast volumes of heterogeneous data and analysis tools. In this paper we present a general model for an information management system that is adaptable and scalable, followed by a detailed design and implementation of one component of the model. The prototype, called BioSifter, was applied to problems in the bioinformatics area. RESULTS: BioSifter was tested using 500 documents obtained from the PubMed database on two biological problems related to genetic polymorphism and extracorporeal shock wave lithotripsy. The results indicate that BioSifter is a powerful tool for biological researchers to automatically retrieve relevant text documents from the biological literature based on their interest profile. The results also indicate that the first stage of the information management process, i.e. the data-to-information transformation, significantly reduces the size of the information space. The filtered data obtained through BioSifter are relevant as well as much smaller in dimension compared to all the retrieved data. This would in turn significantly reduce the complexity associated with the next-level transformation, i.e. information to knowledge.
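The data-to-information filtering step can be illustrated with a small profile-matching sketch; the profile text, corpus and threshold are invented, and cosine similarity over TF-IDF vectors stands in for BioSifter's richer pipeline.

```python
# Sketch of interest-profile document filtering: rank retrieved abstracts by
# similarity to a researcher's profile and keep only those above a threshold.
# The profile, corpus and threshold are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

profile = "genetic polymorphism single nucleotide variants association"
documents = [
    "A study of single nucleotide polymorphisms in hypertension.",
    "Shock wave lithotripsy outcomes in renal calculi.",
    "Annual report on library acquisitions.",
]

vec = TfidfVectorizer()
matrix = vec.fit_transform([profile] + documents)
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

THRESHOLD = 0.05  # arbitrary cut-off for the sketch
relevant = [doc for doc, s in zip(documents, scores) if s >= THRESHOLD]
print(relevant)  # the filtered, much smaller information space
```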

6.
Sound application of molecular epidemiological principles requires working knowledge of both molecular biological and epidemiological methods. Molecular tools have become an increasingly important part of studying the epidemiology of infectious agents. They have allowed the aetiological agent within a population to be diagnosed with a greater degree of efficiency and accuracy than conventional diagnostic tools. They have increased the understanding of the pathogenicity, virulence, and host-parasite relationships of the aetiological agent, provided information on the genetic structure and taxonomy of the parasite, and allowed the zoonotic potential of previously unidentified agents to be determined. This review describes the concept of epidemiology and proper study design, describes the array of currently available molecular biological tools, and provides examples of studies that have integrated both disciplines to successfully unravel zoonotic relationships that would otherwise be impossible to resolve using conventional diagnostic tools. The current limitations of applying these tools, including cautions that need to be addressed during their application, are also discussed.

7.
Evolution of web services in bioinformatics
Bioinformaticians have developed large collections of tools to make sense of the rapidly growing pool of molecular biological data. Biological systems tend to be complex, and in order to understand them it is often necessary to link many data sets and use more than one tool. Therefore, bioinformaticians have experimented with several strategies to try to integrate data sets and tools. Owing to the lack of standards for data sets and for the interfaces of the tools, this is not a trivial task. Over the past few years, building services with web-based interfaces has become a popular way of sharing the data and tools that have resulted from many bioinformatics projects. This paper discusses the interoperability problem and how web services are being used to try to solve it, resulting in the evolution of tools with web interfaces from HTML/web form-based tools not suited for automatic workflow generation to a dynamic network of XML-based web services that can easily be used to create pipelines.
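The interoperability gain can be seen in a tiny sketch: a structured XML response can be parsed and handed directly to the next tool in a pipeline, with no screen-scraping of HTML forms. The response below is canned; in a real pipeline it would come from an HTTP call to an actual service.

```python
# Sketch of why XML-based web services compose into pipelines: the response
# is structured data, not a rendered HTML page. The XML below is a canned
# stand-in for a hypothetical sequence-retrieval service's reply.
import xml.etree.ElementTree as ET

response = "<record><id>P12345</id><seq>MKTAYIAKQR</seq></record>"

record = ET.fromstring(response)
sequence = record.findtext("seq")

# The parsed field can be passed straight to the next service in the
# workflow, which is the interoperability gain over HTML/web form tools.
print(sequence)
```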

8.
The development of high-throughput technologies has generated the need for bioinformatics approaches to assess the biological relevance of gene networks. Although several tools have been proposed for analysing the enrichment of functional categories in a set of genes, none of them is suitable for evaluating the biological relevance of a gene network as a whole. We propose a procedure and develop a web-based resource (BIOREL) to estimate the functional bias (biological relevance) of any given genetic network by integrating different sources of biological information. The weights of the edges in the network may be either binary or continuous. These essential features make our web tool unique among many similar services. BIOREL provides standardized estimations of the network biases extracted from independent data. Through analyses of real data, we demonstrate that the potential applications of BIOREL range from various benchmarking purposes to the systematic analysis of network biology.
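One way to picture a functional-bias estimate (a simplification of what BIOREL computes; the network, annotations and permutation test below are invented) is to compare the edge weight joining genes that share a functional category against a shuffled baseline:

```python
# Toy functional-bias estimate: the share of edge weight joining genes that
# share a functional category, versus a permutation baseline. Illustrative
# only; BIOREL integrates multiple annotation sources and standardizes scores.
import random

edges = [("g1", "g2", 0.9), ("g2", "g3", 0.4), ("g1", "g4", 0.2)]  # binary or continuous weights
annot = {"g1": {"metabolism"}, "g2": {"metabolism"}, "g3": {"signaling"}, "g4": set()}

def shared_weight(edges, annot):
    """Fraction of total edge weight connecting genes with a shared category."""
    total = sum(w for _, _, w in edges)
    shared = sum(w for a, b, w in edges if annot[a] & annot[b])
    return shared / total

observed = shared_weight(edges, annot)

# Permutation baseline: shuffle annotations across genes.
genes = list(annot)
baseline = []
for _ in range(1000):
    shuffled = dict(zip(genes, random.sample(list(annot.values()), len(genes))))
    baseline.append(shared_weight(edges, shuffled))

print(observed, sum(baseline) / len(baseline))  # bias vs. random expectation
```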

9.
In recent years, biological web resources such as databases and tools have become more complex because of the enormous amounts of data generated in the field of life sciences. Traditional methods of distributing tutorials include publishing textbooks and posting web documents, but such static content cannot adequately describe recent dynamic web services. Due to improvements in computer technology, it is now possible to create dynamic content such as video with minimal effort and low cost on most modern computers. The ease of creating and distributing video tutorials instead of static content improves accessibility for researchers, annotators and curators. This article focuses on online video repositories for educational and tutorial videos provided by resource developers and users. It also describes a project in Japan named TogoTV (http://togotv.dbcls.jp/en/) and discusses the production and distribution of high-quality tutorial videos that would be useful to viewers, with examples. This article intends to stimulate and encourage researchers who develop and use databases and tools to distribute how-to videos as a tool to enhance product usability.

10.
Towards a collaborative, global infrastructure for biodiversity assessment
Biodiversity data are rapidly becoming available over the Internet in common formats that promote sharing and exchange. Currently, these data are somewhat problematic, primarily with regard to geographic and taxonomic accuracy, for use in ecological research, natural resources management and conservation decision-making. However, web-based georeferencing tools that utilize best practices and gazetteer databases can be employed to improve geographic data. Taxonomic data quality can be improved through web-enabled valid taxon names databases and services, as well as more efficient mechanisms to return systematic research results and taxonomic misidentification rates back to the biodiversity community. Both of these are under construction. A separate but related challenge will be developing web-based visualization and analysis tools for tracking biodiversity change. Our aim was to discuss how such tools, combined with data of enhanced quality, will help transform today's portals to raw biodiversity data into nexuses of collaborative creation and sharing of biodiversity knowledge.
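Gazetteer-based georeferencing of the kind mentioned above can be sketched in a few lines; the gazetteer is a stub, and production tools add fuzzy matching and coordinate-uncertainty handling following published best practices:

```python
# Toy sketch of gazetteer-based georeferencing: resolve free-text locality
# strings on specimen records to coordinates via a lookup table. The
# gazetteer and record are invented for illustration.
GAZETTEER = {
    "berkeley, california": (37.8715, -122.2730),
    "cape town": (-33.9249, 18.4241),
}

def georeference(locality: str):
    """Return (lat, lon) for a locality string, or None if unresolvable."""
    return GAZETTEER.get(locality.strip().lower())

record = {"species": "Pieris brassicae", "locality": "Cape Town"}
record["coords"] = georeference(record["locality"])
print(record)
```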

11.
Do JH, Choi DK. Molecules and Cells, 2006, 22(3): 254-261.
DNA microarray is a powerful tool for high-throughput analysis of biological systems. Various computational tools have been created to facilitate the analysis of the large volume of data produced in DNA microarray experiments. Normalization is a critical step for obtaining data that are reliable and usable for subsequent analysis, such as the identification of differentially expressed genes and clustering. A variety of normalization methods have been proposed over the past few years, but no method is yet perfect. Various assumptions are often made in the process of normalization; knowledge of the underlying assumptions and principles of normalization is therefore helpful for the correct analysis of microarray data. We present a review of normalization techniques, from single-labeled platforms such as the Affymetrix GeneChip array to dual-labeled platforms like spotted arrays, focusing on their principles and assumptions.
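As one concrete example of the techniques reviewed, here is a compact sketch of quantile normalization, which assumes all arrays should share the same intensity distribution; the data are made up, and ties are broken arbitrarily rather than averaged:

```python
# Compact sketch of quantile normalization: force every array (column) to
# share the same intensity distribution. Values are fabricated; tie
# handling is simplified relative to standard implementations.
import numpy as np

def quantile_normalize(X: np.ndarray) -> np.ndarray:
    """Columns are arrays, rows are probes; each column gets the mean quantile profile."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)   # rank of each value per array
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)    # average distribution across arrays
    return mean_quantiles[ranks]                        # map ranks back to mean quantiles

X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(X))  # every column now has identical sorted values
```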

12.
13.
Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestion of synonyms of user-entered query terms, exploration of different concepts mentioned within search results, or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences in and the evolution of vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible, semantically oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.
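One search feature described above, the automatic expansion of query terms with historical synonyms and variants, can be sketched simply; the variant lexicon is invented for illustration:

```python
# Sketch of synonym/variant query expansion for historical text search:
# expand a user's query with period-appropriate variant forms before
# matching documents. The lexicon and documents are invented.
VARIANTS = {
    "tuberculosis": ["tuberculosis", "phthisis", "consumption"],
}

def expand_query(term: str) -> list:
    """Return the term plus any known historical variants."""
    return VARIANTS.get(term.lower(), [term])

documents = [
    "The patient presented with symptoms of consumption.",
    "A case of severe influenza was recorded.",
]

query = "tuberculosis"
hits = [d for d in documents if any(v in d.lower() for v in expand_query(query))]
print(hits)  # matches the historical term 'consumption'
```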

14.
Categorization of biomedical articles is a central task for supporting various curation efforts. It can also form the basis for effective biomedical text mining. Automatic text classification in the biomedical domain is thus an active research area. Contests organized by the KDD Cup (2002) and the TREC Genomics track (since 2003) defined several annotation tasks that involved document classification, and provided training and test data sets. So far, these efforts have focused on analyzing only the text content of documents. However, as was noted in the KDD'02 text mining contest (where figure captions proved to be an invaluable feature for identifying documents of interest), images often provide curators with critical information. We examine the possibility of using information derived directly from image data, and of integrating it with text-based classification, for biomedical document categorization. We present a method for obtaining features from images and for using them, both alone and in combination with text, to perform the triage task introduced in the TREC Genomics track 2004. The task was to determine which documents are relevant to a given annotation task performed by the Mouse Genome Database curators. We show preliminary results demonstrating that the method has strong potential to enhance and complement traditional text-based categorization methods.
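The core idea, concatenating image-derived features with text features and training a single classifier, can be sketched as follows; the feature extractors, data and labels are toy stand-ins for the paper's methods:

```python
# Sketch of joint image+text document classification: derive features from
# a document's figures, concatenate them with text features, and train one
# classifier over the combined space. All features and labels are toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def text_features(doc_text: str) -> np.ndarray:
    """Toy text features: scaled length and a keyword indicator."""
    return np.array([len(doc_text) / 100.0, float("expression" in doc_text)])

def image_features(gray_image: np.ndarray) -> np.ndarray:
    """Toy image features: mean intensity and contrast of a figure."""
    return np.array([gray_image.mean(), gray_image.std()])

rng = np.random.default_rng(0)
docs = [("gene expression in mouse tissue", rng.random((8, 8))) for _ in range(10)]
X = np.array([np.concatenate([text_features(t), image_features(im)]) for t, im in docs])
y = np.array([0, 1] * 5)  # placeholder triage labels (relevant / not relevant)

clf = LogisticRegression().fit(X, y)  # one model over the joint feature space
```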

15.
Reid R, Dix DJ, Miller D, Krawetz SA. BioTechniques, 2001, 30(4): 762-766, 768.
The use of commercial microarrays is rapidly becoming the method of choice for profiling gene expression and assessing various disease states. Research Genetics has provided a series of biological and software tools to the research community for these analyses. The fidelity of data analysis using these tools depends on a series of well-defined reference control points in the array. During the course of our investigations, it became apparent that in some instances the reference control points required for analysis became lost in background noise. This effectively halted the analysis and prevented the recovery of any information contained within that experiment. To recover these data and increase analytical veracity, the simple strategy of superimposing a template of reference control points onto the experimental array was developed. The utility of this tool is established in this communication.
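The template strategy can be pictured in a small sketch, assuming the control points' grid coordinates are known in advance; the coordinates and image are fabricated, and a real workflow would also need to align the template to the scan:

```python
# Toy sketch of the template idea: when reference control spots are lost in
# background noise, stamp their known grid positions from a template onto
# the experimental image so downstream gridding can proceed. Fabricated data.
import numpy as np

TEMPLATE = [(2, 2), (2, 29), (29, 2), (29, 29)]  # known control-point positions

image = np.random.default_rng(1).random((32, 32))  # noisy array scan stand-in
for r, c in TEMPLATE:
    image[r, c] = image.max() + 1.0  # restore a bright reference at each point

# The restored control points re-anchor spot finding for the whole array.
```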

16.
MOTIVATION: Bioinformatics requires Grid technologies and protocols to build high-performance applications without focusing on the low-level detail of how the individual Grid components operate. RESULTS: The Discovery Net system is middleware that allows service developers to integrate tools based on existing and emerging Grid standards such as web services. Once integrated, these tools can be used to compose reusable workflows that can later be deployed as new services for others to use. Using the Discovery Net system and a range of different bioinformatics tools, we built a Grid-based application for genome annotation. This includes workflows for automatic nucleotide annotation, annotation of predicted proteins, and text analysis based on metabolic profiles.

17.
Biology is an information-driven science. Large-scale data sets from genomics, physiology, population genetics and imaging are driving research at a dizzying rate. Simultaneously, interdisciplinary collaborations among experimental biologists, theorists, statisticians and computer scientists have become the key to making effective use of these data sets. However, too many biologists have trouble accessing and using these electronic data sets and tools effectively. A 'cyberinfrastructure' is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world. This article reviews the components of a biological cyberinfrastructure, discusses current and pending implementations, and notes the many challenges that lie ahead.

18.
Modern biological and chemical studies rely on life science databases as well as sophisticated software tools (e.g., homology search tools, modeling and visualization tools). These tools often have to be combined and integrated in order to support a given study. SIBIOS (System for the Integration of Bioinformatics Services) serves this purpose. The services include both life science database search services and software tools. The task engine is the core component of SIBIOS; it supports the execution of dynamic workflows that incorporate multiple bioinformatics services. The architecture of SIBIOS and its approaches to addressing the heterogeneity and interoperability of bioinformatics services, including data integration, are presented in this paper.
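A task engine of this sort can be reduced to a toy dependency-ordered executor; the tasks, their outputs and the workflow below are invented and do not reflect SIBIOS's implementation:

```python
# Bare-bones sketch of a workflow task engine: execute tasks in dependency
# order, passing each task the outputs of its prerequisites. Tasks and the
# workflow graph are invented for illustration.
from graphlib import TopologicalSorter  # Python 3.9+

def fetch(_):          return "ATGGCC"                    # stand-in DB search service
def translate(inputs): return inputs["fetch"][:3]         # stand-in software tool
def report(inputs):    return f"result: {inputs['translate']}"

tasks = {"fetch": ([], fetch),
         "translate": (["fetch"], translate),
         "report": (["translate"], report)}

results = {}
order = TopologicalSorter({k: set(d) for k, (d, _) in tasks.items()}).static_order()
for name in order:
    deps, fn = tasks[name]
    results[name] = fn({d: results[d] for d in deps})  # wire prerequisite outputs in
print(results["report"])
```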

19.
PaVESy: Pathway Visualization and Editing System
A data management system for editing and visualizing biological pathways is presented. The main component of PaVESy (Pathway Visualization and Editing System) is a relational SQL database system. The database design allows storage of biological objects, such as metabolites, proteins and genes, and the relations required to assemble metabolic and regulatory biological interactions. The database model accommodates highly flexible annotation of biological objects through user-defined attributes. In addition, specific roles of objects are derived from these attributes in the context of user-defined interactions, e.g. in the course of pathway generation or during editing of the database content. Furthermore, the user may organize and arrange the database content within a folder structure and is free to group and annotate database objects of interest within customizable subsets, allowing an individualized view of the database content and facilitating user customization. A Java-based class library was developed to serve as the database programming interface to PaVESy. This API provides classes that implement the concepts of object persistence in SQL databases, such as entries, interactions, annotations, folders and subsets. We created tools for editing, navigating and visualizing the database content. User-approved pathway assemblies are stored and may be retrieved for continued modification, annotation and export. Data export is interfaced with a range of network visualization programs, such as Pajek, and other software allowing import of the SBML or GML data formats. AVAILABILITY: http://pavsey.mpimp-golm.mpg.de
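A relational core of this kind might store objects, attribute annotations and typed interactions in three tables; the schema below is a minimal sketch and is not PaVESy's actual design:

```python
# Illustrative relational schema for pathway data: biological objects,
# free-form attribute annotations, and typed interactions between objects.
# The schema and rows are invented, not PaVESy's actual model.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE object      (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
CREATE TABLE attribute   (object_id INTEGER REFERENCES object(id), key TEXT, value TEXT);
CREATE TABLE interaction (source INTEGER REFERENCES object(id),
                          target INTEGER REFERENCES object(id), kind TEXT);
""")
db.execute("INSERT INTO object VALUES (1, 'glucose', 'metabolite')")
db.execute("INSERT INTO object VALUES (2, 'hexokinase', 'protein')")
db.execute("INSERT INTO attribute VALUES (2, 'EC', '2.7.1.1')")  # user-defined attribute
db.execute("INSERT INTO interaction VALUES (2, 1, 'catalyzes-consumption')")

# Reassemble an interaction for pathway display.
for row in db.execute("""SELECT o1.name, i.kind, o2.name FROM interaction i
                         JOIN object o1 ON o1.id = i.source
                         JOIN object o2 ON o2.id = i.target"""):
    print(row)
```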

20.
Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.
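Evaluation frameworks for KDT methods typically rest on precision, recall and F-measure against a gold standard; here is a minimal sketch with toy data:

```python
# Sketch of the standard measures used to evaluate KDT methods: precision,
# recall and F1 of extracted items against a gold standard. Toy data only.
def prf(predicted: set, gold: set):
    """Return (precision, recall, F1) for a set of extractions."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

gold = {"BRCA1-breast cancer", "TP53-apoptosis"}          # hypothetical true relations
predicted = {"BRCA1-breast cancer", "TP53-cell cycle"}    # hypothetical system output
print(prf(predicted, gold))  # (0.5, 0.5, 0.5)
```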
