Similar literature
 Found 20 similar articles; search took 15 ms
1.
Taxonomic names and phylogenetic hypotheses are indispensable tools for modern biological research, both basic and applied. Like all disciplines, parasitology suffers from the 'taxonomic impediment' - a global shortage of professional taxonomists and systematists. Only a fraction of the species of parasites on this planet have been identified, and the evolutionary relationships of only a minority of those are understood; thus, information on how to manage parasite biodiversity, including known and potential disease agents, is incomplete. A renewal of systematic parasitology has a key role in redefining the relationship between mankind and the organisms whose biology fascinates us so much.

2.
3.
ABSTRACT: BACKGROUND: A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. RESULTS: We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9 % and recall = 70.5 %) compared to a popular dictionary based approach (precision = 97.5 % and recall = 54.3 %) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central's full text articles annotated with scientific names, the precision and recall values are 98.5 % and 96.2 % respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. Additionally, we present the comparison results of various machine learning algorithms on our annotated corpus. Naive Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the top two performing algorithms. 
CONCLUSIONS: We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.
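
The candidate-generation step mentioned in the NetiNeti abstract can be illustrated with a minimal Python sketch. It is not NetiNeti's actual rule set or feature list: the regular expression, the function names `candidate_names` and `structural_features`, and the suffix list are all assumptions made for illustration. The point it shows is that a simple rule over-generates (it also accepts "The samples"), which is why a classifier over structural and contextual features is needed afterwards.

```python
import re

# Hypothetical candidate rule (not NetiNeti's real rules): a capitalized
# genus-like word followed by a lowercase epithet of at least three letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def candidate_names(text):
    """Return every substring that matches the binomial pattern, in order."""
    return [f"{m.group(1)} {m.group(2)}" for m in BINOMIAL.finditer(text)]


def structural_features(name):
    """Toy structural features of the kind a classifier could consume."""
    genus, epithet = name.split()
    return {
        "genus_len": len(genus),
        "epithet_len": len(epithet),
        # Assumed list of Latin-looking endings, for illustration only.
        "latin_suffix": epithet.endswith(("us", "a", "um", "is", "i", "ae")),
    }


text = "We isolated Escherichia coli from soil. The samples were stored."
for name in candidate_names(text):
    print(name, structural_features(name))
```

In this sketch the rule yields two candidates, one real name and one false positive; only the first has a Latin-looking ending, which hints at how structural features can help separate the two.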

4.
We describe a collaborative effort among four lead indexes of taxon names and nomenclatural acts (International Plant Name Index (IPNI), Index Fungorum, MycoBank and ZooBank) and the journals PhytoKeys, MycoKeys and ZooKeys to create an automated, pre-publication registration workflow based on a server-to-server, XML request/response model. The registration model for ZooBank uses the TaxPub schema, which is an extension to the Journal Article Tag Suite (JATS) of the National Library of Medicine (NLM). The indexing or registration model of IPNI and Index Fungorum will use the Taxonomic Concept Transfer Schema (TCS) as a basic standard for the workflow. Other journals and publishers who intend to implement automated, pre-publication registration of taxon names and nomenclatural acts can also use the open sample XML formats and links to schemas and relevant information published in the paper.

5.
uBioRSS: tracking taxonomic literature using RSS   Cited by: 1 (self-citations: 0, citations by others: 1)
Web content syndication through standard formats such as RSS and ATOM has become an increasingly popular mechanism for publishers, news sources and blogs to disseminate regularly updated content. These standardized syndication formats deliver content directly to the subscriber, allowing them to locally aggregate content from a variety of sources instead of having to find the information on multiple websites. The uBioRSS application is a 'taxonomically intelligent' service customized for the biological sciences. It aggregates syndicated content from academic publishers and science news feeds, and then uses a taxonomic Named Entity Recognition algorithm to identify and index taxonomic names within those data streams. The resulting name index is cross-referenced to current global taxonomic datasets to provide context for browsing the publications by taxonomic group. This process, called taxonomic indexing, draws upon services developed specifically for biological sciences, collectively referred to as 'taxonomic intelligence'. Such value-added enhancements can provide biologists with accelerated and improved access to current biological content. AVAILABILITY: http://names.ubio.org/rss/

6.
7.
Extraction of regulatory gene/protein networks from Medline   Cited by: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: We have previously developed a rule-based approach for extracting information on the regulation of gene expression in yeast. The biomedical literature, however, contains information on several other equally important regulatory mechanisms, in particular phosphorylation, which we have now extended our rule-based system to extract as well. RESULTS: This paper presents new results for extraction of relational information from biomedical text. We have improved our system, STRING-IE, to capture both new types of linguistic constructs and new types of biological information [i.e. (de-)phosphorylation]. The precision remains stable with a slight increase in recall. From almost one million PubMed abstracts related to four model organisms, we manage to extract regulatory networks and binary phosphorylations comprising 3,319 relation chunks. The accuracy is 83-90% and 86-95% for gene expression and (de-)phosphorylation relations, respectively. To achieve this, we made use of an organism-specific resource of gene/protein names considerably larger than those used in most other biology related information extraction approaches. These names were included in the lexicon when retraining the part-of-speech (POS) tagger on the GENIA corpus. For the domain in question, an accuracy of 96.4% was attained on POS tags. It should be noted that the rules were developed for yeast and successfully applied to both abstracts and full-text articles related to other organisms with comparable accuracy. AVAILABILITY: The revised GENIA corpus, the POS tagger, the extraction rules and the full sets of extracted relations are available from http://www.bork.embl.de/Docu/STRING-IE

8.
Flies make up more than 10% of the planetary biota and our well-being depends on how we manage our coexistence with flies. Storing and accessing relevant knowledge about flies is intimately connected with using correct names, and Systema Dipterorum provides a single authoritative classification for flies developed by consensus among contributors. The 160,000 species of flies currently known are distributed among 160 recent families and some 12,000 genera, which with their synonyms encompass a total of more than a quarter of a million names. These names and their associated classification are shared with relevant global solutions. Sherborn appears to have done remarkably well indexing Diptera names with an overall error rate estimated to be close to 1%.

9.
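For a long time, the results of taxonomic research have been published mainly in print form across a variety of paper documents. In recent years, with the development and application of computing, artificial intelligence and other emerging technologies, the digitization of print information has become a trend. Countries around the world attach great importance to collecting and compiling information on their own biological resources and have built many kinds of databases, which provide an important information basis for scientific research, government decision-making, resource conservation, rational use and science communication. This study explores and establishes a workflow and methods for building databases from print mycological information, and carries out data mining and analysis on the data collected in the Checklist of Fungi in China database and Index Fungorum. Through software operations and programming, relevant information was extracted from the databases to assist in compiling the macrofungi volume of the Red List of China's Biodiversity; at the same time, the Latin and Chinese scientific names of fungi were sorted out and standardized, providing a basis for fungal taxonomic research and for resource assessment and conservation.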

10.
Lists of abbreviations for genus names of bacteria are expanded to accommodate 103 new entries, which are names that have been validly published since the publication of an updated list by Rogosa et al. in 1986 (Int. J. Syst. Bacteriol. 36:464-472). These abbreviations are provided to serve the need for appropriate codified abbreviations for use in processing or indexing of information on computers.

11.
Afzelius BA. Tissue & Cell, 1988, 20(3): 473-475
The nine microtubular doublets of cilia and flagella have distinctive features that make it possible to assign an index number to each of them. Such an indexing has been used for a long time for animal cilia and flagella, whereas other indexing systems have been proposed recently for plant cilia. It is shown here that the similarity between cilia from animals and cilia from plants and protists is so great that the same indexing system can be used for all cilia regardless of their derivation.

12.
ABSTRACT: BACKGROUND: Seqcrawler has its roots in software like SRS or Lucegene. It provides an indexing platform to ease the search of data and meta-data in biological banks, and it can scale to face the current flow of data. While many biological bank search tools are available on the Internet, mainly provided by large organizations to search in their own data, there is a lack of free and open source solutions for browsing one's own set of data with a flexible query system that can scale from a single computer to a cloud system. A personal index platform will help labs and bioinformaticians to search in their meta-data but also to build a larger information system with custom subsets of data. RESULTS: The software is scalable from a single computer to a cloud-based infrastructure. It has been successfully tested in a private cloud with 3 index shards (pieces of the index) hosting ~400 million sequence entries (whole GenBank, UniProt, PDB and others) for a total size of 600 GB in a fault-tolerant architecture (high availability). It has also been successfully integrated with software to add extra meta-data from blast results to enhance the user's result analysis. CONCLUSIONS: Seqcrawler provides a complete open source search and store solution for labs or platforms needing to manage large amounts of data/meta-data with a flexible and customizable web interface. All components (search engine, visualization and data storage), though independent, share a common and coherent data system that can be queried with a simple HTTP interface. The solution scales easily and can also provide a high availability infrastructure.

13.
14.
The Linnaean system of nomenclature has been used and adapted by biologists over a period of almost 250 years. Under the current system of codes, it is now applied to more than 2 million species of organisms. Inherent in the Linnaean system is the indication of hierarchical relationships. The Linnaean system has been justified primarily on the basis of stability. Stability can be assessed on at least two grounds: the absolute stability of names, irrespective of taxonomic concept; and the stability of names under changing concepts. Recent arguments have invoked conformity to phylogenetic methods as the primary basis for choice of nomenclatural systems, but even here stability of names as they relate to monophyletic groups is stated as the ultimate objective. The idea of absolute stability as the primary justification for nomenclatural methods was wrong from the start. The reasons are several. First, taxa are concepts, no matter the frequency of assertions to the contrary; as such, they are subject to change at all levels and always will be, with the consequence that to some degree the names we use to refer to them will also be subject to change. Second, even if the true nature of all taxa could be agreed upon, the goal would require that we discover them all and correctly recognize them for what they are. Much of biology is far from that goal at the species level and even further for supraspecific taxa. Nomenclature serves as a tool for biology. Absolute stability of taxonomic concepts, and of nomenclature, would hinder scientific progress rather than promote it. It can be demonstrated that the scientific goals of systematists are far from achieved. Thus, the goal of absolute nomenclatural stability is illusory and misguided. The primary strength of the Linnaean system is its ability to portray hierarchical relationships; stability is secondary.
No single system of nomenclature can ever possess all desirable attributes: i.e., convey information on hierarchical relationships, provide absolute stability in the names portraying those relationships, and provide simplicity and continuity in communicating the identities of the taxa and their relationships. Aside from myriad practical problems involved in its implementation, it must be concluded that “phylogenetic nomenclature” would not provide a more stable and effective system for communicating information on biological classifications than does the Linnaean system.

15.
Disseminating and facilitating access to science-based information is a necessity to enable the public to make informed decisions about appropriate uses of biotechnology products. It is also one of the major objectives of Co-Extra, a European-funded project addressing co-existence of genetically modified organisms and non-genetically modified organisms in Europe and their traceability. To this end, a dynamic and interactive website has been developed as the core element of the Co-Extra external communication strategy. This website has been designed to make it attractive and accessible to a large audience in a very simple and practical manner, building on practical experiences gained in the development of other websites related to biotechnology and genetically modified organisms. The website delivers popularized information for the general public as well as scientific data meant primarily for the more expert readers. It also provides for various permanent tools allowing multidirectional interaction with its visitors. Content is displayed using a web-based platform, based on a sophisticated Content Management System. First results indicate a high level of interest from the general public and from experts, showing that the content of this website can contribute by communicating science-based information to improve awareness and understanding of biotechnology.

16.
Biological knowledge can be inferred from three major levels of information: molecules, organisms and ecologies. Bioinformatics is an established field that has made significant advances in the development of systems and techniques to organize contemporary molecular data; biodiversity informatics is an emerging discipline that strives to develop methods to organize knowledge at the organismal level extending back to the earliest dates of recorded natural history. Furthermore, while bioinformatics studies generally focus on detailed examinations of key 'model' organisms, biodiversity informatics aims to develop over-arching hypotheses that span the entire tree of life. Biodiversity informatics is presented here as a discipline that unifies biological information from a range of contemporary and historical sources across the spectrum of life using organisms as the linking thread. The present review primarily focuses on the use of organism names as a universal metadata element to link and integrate biodiversity data across a range of data sources.

17.
18.

Background

The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this ‘names problem’ has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science.

Results

The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets.

Conclusions

We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
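
The two operations the TNRS abstract describes, fuzzy matching of misspelled names and conversion of synonyms to accepted names, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library `difflib` rather than the TNRS's own matching algorithms, and the miniature reference list, the synonym map, the function name `resolve` and the 0.8 cutoff are all hypothetical.

```python
import difflib

# Hypothetical miniature reference taxonomy: accepted names plus a synonym map.
ACCEPTED = ["Quercus alba", "Quercus rubra", "Acer saccharum"]
SYNONYMS = {"Acer saccharophorum": "Acer saccharum"}


def resolve(name, cutoff=0.8):
    """Return the accepted name for a possibly misspelled input, or None."""
    pool = ACCEPTED + list(SYNONYMS)
    # get_close_matches ranks candidates by similarity and drops those
    # whose similarity ratio falls below the cutoff.
    hits = difflib.get_close_matches(name.strip(), pool, n=1, cutoff=cutoff)
    if not hits:
        return None
    best = hits[0]
    # Map a matched synonym to its accepted name; accepted names pass through.
    return SYNONYMS.get(best, best)


for query in ["Quercus albba", "Acer saccharophorum", "Homo sapiens"]:
    print(query, "->", resolve(query))
```

A single similarity cutoff keeps the sketch short; a production resolver would also parse authorities and use family names to resolve homonyms, as the abstract notes.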

19.
This paper presents an approach using syntactosemantic rules for the extraction of relational information from biomedical abstracts. The results show that by overcoming the hurdle of technical terminology, high precision results can be achieved. From abstracts related to baker's yeast, we manage to extract a regulatory network comprised of 441 pairwise relations from 58,664 abstracts with an accuracy of 83-90%. To achieve this, we made use of a resource of gene/protein names considerably larger than those used in most other biology related information extraction approaches. This list of names was included in the lexicon of our retrained part-of-speech tagger for use on molecular biology abstracts. For the domain in question, an accuracy of 93.6-97.7% was attained on part-of-speech tags. The method can be easily adapted to organisms other than yeast, allowing us to extract many more biologically relevant relations. The main reason for the comparable precision rates is the ontological model that was built beforehand and served as a guiding force for the manual coding of the syntactosemantic rules.

20.
Phylogenetic taxonomy is applied for the systematization of a new Hesionidae, rather than the traditional Linnean system, with an apomorphy-based definition of the name and without reference to rank. It is argued that biological diversity is better represented without species concepts, but that it is useful to specify when a name refers to a smallest known clade which currently cannot be further subdivided; for this we apply the newly introduced concept LITU (Least-Inclusive Taxonomic Unit) for the new taxon. LITUs are made identifiable by being italicised with lower-case initial letter; all other taxon names are italicised with capital initial letter. The Hesionidae is accordingly named capricornia, new taxon, and was found in shallow water at One Tree Island, Capricorn Group, southernmost part of the Great Barrier Reef. capricornia is small (length < 2 mm), and exhibits a number of larval Hesionidae characters, but is characterized by large paired ventrally situated penes on segment 9 in adult males. A cladistic parsimony analysis based on morphological characters of capricornia and a selection of other Hesionidae indicates that it belongs within Gyptini Pleijel 1998, and is the sister group of Amphiduros Hartman (1959).
