Similar articles
 20 similar articles found (search time: 281 ms)
1.
Nomenclatures resulting from the application of various procedures are viewed as communication tools whose optimality can be compared. The traditional, node-based, branch-based, apomorphy-based, and cladotypic procedures are compared based on theoretical cases. The traditional procedure collects several major drawbacks: endings related to ranks are of low information content on taxa hierarchy; with respect to procedures using uninominal species names, in case of a partly unbalanced and/or partly unresolved phylogeny, the application of the procedure results in supernumerary names; a traditional taxon name is prone to be polysemic, depending upon someone’s opinion on the rank and composition of the taxon, and upon conflicting hypotheses on the phylogenetic position of name-bearing types. Alternative systems vary in merit. Names of apomorphy-defined taxa are prone to be polysemic due to possible ambiguity in the formulation of the defining character state. The cladotypic nomenclatural procedure is similar in that respect, but a set of rules allows ambiguity to be limited. The main issue of node- and branch-based procedures is that cases of synonymy cannot be settled if the inner phylogeny of taxa cannot be resolved. Cases of irresolvable synonymy can occur under apomorphy-based and cladotypic procedures, but the problem can be circumvented by the use of taxa whose defining character state is not subject to ambiguous mapping. Node-, branch- and apomorphy-based definitions as governed by the PhyloCode can produce nonsensical statements, but this problem can be fixed by the adjunction of falsifiable assumptions in use under the cladotypic procedure. Cladotypic definitions must involve a fourth assumption formulated as ‘cladotypes belong to different species’ (cladogenesis assumption). 
The present contribution suggests that the cladotypic procedure outperforms all other proposed procedures, producing an optimal formal lexicon useful for naming and communicating about species and taxa.

2.
Correct spelling of taxon names in vegetation databases is a fundamental prerequisite for many data processing steps. However, manual detection and correction of spelling mistakes is inefficient, prone to errors and non‐reproducible, especially when scanning large databases. Here, I review six software tools that spell‐check taxon names in vegetation databases: (1) the Global Names Resolver, (2) the Interim Register of Marine and Nonmarine Genera, (3) the Taxonomic Name Resolution Service and R packages (4) Plantminer, (5) Taxonstand and (6) tpl. In particular, I test their capacity to spell‐check names across the taxonomic ranks and organism groups frequently encountered in vegetation data and challenge their ability to screen names from different geographic regions. Performance by software tools differed widely in these tests. Backed up by multiple reference lists, the Global Names Resolver emerged as the most versatile software tool. All software solutions currently suffer from some minor limitations, including an inability to spell‐check names of hybrid taxa. Furthermore, some spelling mistakes, by their nature, cannot be resolved unambiguously. Given these limitations, taxon names should be spell‐checked with software tools in a semi‐automatic rather than an automatic way.

3.
Phylogenetic definitions and taxonomic philosophy   Total citations: 4, self-citations: 0, citations by others: 4
An examination of the post-Darwinian history of biological taxonomy reveals an implicit assumption that the definitions of taxon names consist of lists of organismal traits. That assumption represents a failure to grant the concept of evolution a central role in taxonomy, and it causes conflicts between traditional methods of defining taxon names and evolutionary concepts of taxa. Phylogenetic definitions of taxon names (de Queiroz and Gauthier 1990) grant the concept of common ancestry a central role in the definitions of taxon names and thus constitute an important step in the development of phylogenetic taxonomy. By treating phylogenetic relationships rather than organismal traits as necessary and sufficient properties, phylogenetic definitions remove conflicts between the definitions of taxon names and evolutionary concepts of taxa. The general method of definition represented by phylogenetic definitions of clade names can be applied to the names of other kinds of composite wholes, including populations and biological species. That the names of individuals (composite wholes) can be defined in terms of necessary and sufficient properties provides the foundation for a synthesis of seemingly incompatible positions held by contemporary individualists and essentialists concerning the nature of taxa and the definitions of taxon names.

4.
A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.

5.
ABSTRACT: BACKGROUND: A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. RESULTS: We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9 % and recall = 70.5 %) compared to a popular dictionary based approach (precision = 97.5 % and recall = 54.3 %) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central's full text articles annotated with scientific names, the precision and recall values are 98.5 % and 96.2 % respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. Additionally, we present the comparison results of various machine learning algorithms on our annotated corpus. Naive Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the top two performing algorithms. 
CONCLUSIONS: We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.

6.

Background

The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this ‘names problem’ has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science.

Results

The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets.

Conclusions

We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
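
The fuzzy matching of misspelled names against a reference taxonomy, as described in entry 6, can be illustrated with a minimal Python sketch. This is not the TNRS implementation: the tiny reference list, the `standardize_name` helper and the 0.8 similarity cutoff are assumptions made for illustration, and the matching relies on Python's standard `difflib` module rather than on the TNRS's own algorithms.

```python
import difflib

# Hypothetical mini reference list; a real service would draw on a full
# reference taxonomy rather than a hard-coded list.
REFERENCE_NAMES = [
    "Quercus robur",
    "Quercus rubra",
    "Fagus sylvatica",
    "Draba aspera",
]


def standardize_name(raw_name, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return (best_match, similarity), or (None, 0.0) if nothing clears the cutoff."""
    parts = raw_name.split()
    if not parts:
        return None, 0.0
    # Normalize whitespace and case: capitalized genus, lower-case epithets.
    cleaned = " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])
    # get_close_matches returns candidates sorted by similarity, best first.
    candidates = difflib.get_close_matches(cleaned, reference, n=1, cutoff=cutoff)
    if not candidates:
        return None, 0.0
    best = candidates[0]
    similarity = difflib.SequenceMatcher(None, cleaned, best).ratio()
    return best, similarity


if __name__ == "__main__":
    for name in ["quercus  robor", "Fagus sylvatica", "Zzzz yyyy"]:
        print(name, "->", standardize_name(name))
```

Returning the similarity score alongside the match leaves room for the user-supervised review that the abstract mentions: low-scoring matches can be flagged for manual checking instead of being accepted automatically.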

7.
8.
Gene name ambiguity of eukaryotic nomenclatures   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: With more and more scientific literature published online, the effective management and reuse of this knowledge has become problematic. Natural language processing (NLP) may be a potential solution by extracting, structuring and organizing biomedical information in online literature in a timely manner. One essential task is to recognize and identify genomic entities in text. 'Recognition' can be accomplished using pattern matching and machine learning. But for 'identification' these techniques are not adequate. In order to identify genomic entities, NLP needs a comprehensive resource that specifies and classifies genomic entities as they occur in text and that associates them with normalized terms and also unique identifiers so that the extracted entities are well defined. Online organism databases are an excellent resource to create such a lexical resource. However, gene name ambiguity is a serious problem because it affects the appropriate identification of gene entities. In this paper, we explore the extent of the problem and suggest ways to address it. RESULTS: We obtained gene information from 21 organisms and quantified naming ambiguities within species, across species, with English words and with medical terms. When the case (of letters) was retained, official symbols displayed negligible intra-species ambiguity (0.02%) and modest ambiguities with general English words (0.57%) and medical terms (1.01%). In contrast, the across-species ambiguity was high (14.20%). The inclusion of gene synonyms increased intra-species ambiguity substantially and full names contributed greatly to gene-medical-term ambiguity. A comprehensive lexical resource that covers gene information for the 21 organisms was then created and used to identify gene names by using a straightforward string matching program to process 45,000 abstracts associated with the mouse model organism while ignoring case and gene names that were also English words. 
We found that 85.1% of correctly retrieved mouse genes were ambiguous with other gene names. When gene names that were also English words were included, 233% additional 'gene' instances were retrieved, most of which were false positives. We also found that authors prefer to use synonyms (74.7%) to official symbols (17.7%) or full names (7.6%) in their publications. CONTACT: lifeng.chen@dbmi.columbia.edu

9.
The PhyloCode: a critical discussion of its theoretical foundation   Total citations: 2, self-citations: 0, citations by others: 2
The definition of taxon names as formalized by the PhyloCode is based on Kripke's thesis of "rigid designation" that applies to Millian proper names. Accepting the thesis of "rigid designation" into systematics in turn is based on the thesis that species, and taxa, are individuals. These largely semantic and metaphysical issues are here contrasted with an epistemological approach to taxonomy. It is shown that the thesis of "rigid designation" if deployed in taxonomy introduces a new essentialism into systematics, which is exactly what the PhyloCode was designed to avoid. Rigidly designating names are not supposed to change their meaning, but if the shifting constitution of a clade is thought to cause a shift of meaning of the taxon name, then the taxon name is not a "rigid designator". Phylogenetic nomenclature either fails to preserve the stability of meaning of taxon names that it propagates, or it is rendered inconsistent with its own philosophical background. The alternative explored here is to conceptualize taxa as natural kinds, and to replace the analytic definition of taxon names by their explanatory definition. Such conceptualization of taxa allows taxon names to better track the results of ongoing empirical research. The semantic as well as epistemic gain is that if taxon names are associated with natural kind terms instead of being proper names, the composition of the taxon will naturally determine the meaning of its name.
© The Willi Hennig Society 2006.

10.
Problem: The increasing availability of large vegetation databases holds great potential in ecological research and biodiversity informatics. However, inconsistent application of plant names compromises the usefulness of these databases. This problem has been acknowledged in recent years, and solutions have been proposed, such as the concept of “potential taxa” or “taxon views”. Unfortunately, awareness of the problem remains low among vegetation scientists. Methods: We demonstrate how misleading interpretations caused by inconsistent use of plant names might occur through the course of vegetation analysis, from relevés upward through databases, and then to the final analyses. We discuss how these problems might be minimized. Results: We highlight the importance of taxonomic reference lists for standardizing plant names and outline standards they should fulfill to be useful for vegetation databases. Additionally, we present the R package vegdata, which is designed to solve name‐related problems that arise when analysing vegetation databases. Conclusions: We conclude that by giving more consideration to the appropriate application of plant names, vegetation scientists might enhance the reliability of analyses obtained from large vegetation databases.

11.
Absolute nomenclatural stability is undesirable in phylogenetic classifications because they reflect changing hypotheses of cladistic relationships. De Queiroz and Gauthier's (1990: Syst. Zool. 39, 307–322; 1992: A. Rev. Ecol. Syst. 23, 449–480; 1994: Trends Ecol. Evol. 9, 27–31) alternative to Linnaean nomenclature is concluded to provide stable names for unstable concepts. In terms of communicating either characters shared by species of a named taxon or elements (species) included in a taxon, de Queiroz and Gauthier's system is less stable than the Linnaean system. Linnaean ranks communicate limited information about inclusivity of taxa, but abandonment of ranks results in the loss of such information. As cladistic hypotheses advance, taxa named under de Queiroz and Gauthier's system can change their level of generality radically, from being part of a group to including it, without any indicative change in the spelling of their names. The Linnaean system has been retained by taxonomists because its hierarchic ranks are logically compatible with nested sets of species, monophyletic groups, and characters. Other authors have offered conventions to increase the cladistic information content of Linnaean names or to replace them with names that convey cladistic knowledge in greater detail; de Queiroz and Gauthier sacrifice the meaning of taxon names and categorical ranks in favor of spelling stability.

12.
Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation, and ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, where pre-processing of both gene synonyms in the databases and gene mentions in text are first applied. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. The results of the experiments on the BioCreAtIvE2 data sets (identification of human gene names) demonstrated that our methods achieved highly encouraging results with F-measure of up to 81.20%.
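
The cascade of exact, exact-like and token-based approximate matching described in entry 12 can be sketched in Python as follows. This is an illustrative reading of the abstract, not the authors' method: the four-entry synonym dictionary, the helper names (`normalize`, `tokens`, `link_mention`), the choice of Jaccard overlap for the token stage and the 0.6 threshold are all assumptions.

```python
import re

# Hypothetical synonym dictionary mapping surface forms to identifiers.
# A real system would be built from a genomic database's synonym lists.
GENE_SYNONYMS = {
    "BRCA1": "GeneID:672",
    "breast cancer 1": "GeneID:672",
    "TP53": "GeneID:7157",
    "tumor protein p53": "GeneID:7157",
}


def normalize(text):
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def tokens(text):
    """Split into a set of lower-case alphanumeric tokens (order is ignored)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def link_mention(mention, synonyms=GENE_SYNONYMS, min_overlap=0.6):
    """Return (identifier, stage); identifier is None when no stage matches."""
    # Stage 1: exact string match.
    if mention in synonyms:
        return synonyms[mention], "exact"
    # Stage 2: exact-like match after removing case, spacing and punctuation.
    norm = normalize(mention)
    for synonym, gene_id in synonyms.items():
        if normalize(synonym) == norm:
            return gene_id, "exact-like"
    # Stage 3: token-based approximate match (Jaccard overlap of token sets),
    # which also tolerates permuted name components.
    mention_tokens = tokens(mention)
    best_id, best_score = None, 0.0
    for synonym, gene_id in synonyms.items():
        synonym_tokens = tokens(synonym)
        union = mention_tokens | synonym_tokens
        score = len(mention_tokens & synonym_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = gene_id, score
    if best_score >= min_overlap:
        return best_id, "token"
    return None, "unmatched"


if __name__ == "__main__":
    for m in ["TP53", "Brca-1", "p53 tumor protein", "kinase"]:
        print(m, "->", link_mention(m))
```

Ordering the stages from strictest to loosest means a mention is only handed to a fuzzier stage when the stricter ones fail, which is one plausible way to trade precision against recall.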

13.
14.
15.
In biological systematics, as well as in the philosophy of biology, species and higher taxa are individuated through their unique evolutionary origin. This is taken by some authors to mean that monophyly is a (relational) property not only of higher taxa, but also of species. A species is said to originate through speciation, and to go extinct when it splits into two daughter species (or through terminal extinction). Its unique evolutionary origin is said to bestow identity on a species through time and change, and to render species names rigid designators. Species names are thus believed to function just like names of supraspecific taxa. However, large parts of the Web of Life are composed of species that do not have a unique evolutionary origin from a single population, lineage or stem-species. Further, monophyly is an ambiguous concept if it is defined simply in terms of 'unique evolutionary origin'. Disambiguating the concept by defining a monophyletic taxon as 'a taxon that includes the ancestor and all, and only, its descendants' renders monophyly inapplicable to species. At the heart of the problem lies a fundamental distinction between species and monophyletic taxa, where species form mutually exclusive reticulated systems, while higher taxa form inclusive hierarchical systems. Examples are given both at the species level and below to illustrate the problems that result from the application of the monophyly criterion to species. The conclusion is that the concepts of exclusivity and monophyly should be treated as non-overlapping: exclusivity marks out a species synchronistically, i.e. in the present time. Monophyly marks out clades (groups of species) diachronistically, i.e. within an historical dimension.

16.
Least-inclusive taxonomic unit: a new taxonomic concept for biology   Total citations: 2, self-citations: 0, citations by others: 2
Phylogenetic taxonomy has been introduced as a replacement for the Linnaean system. It differs from traditional nomenclature in defining taxon names with reference to phylogenetic trees and in not employing ranks for supraspecific taxa. However, 'species' are currently kept distinct. Within a system of phylogenetic taxonomy we believe that taxon names should refer to monophyletic groups only and that species should not be recognized as taxa. To distinguish the smallest identified taxa, we here introduce the least-inclusive taxonomic unit (LITU), which are differentiated from more inclusive taxa by initial lower-case letters. LITUs imply nothing absolute about inclusiveness, only that subdivisions are not presently recognized.

17.
A taxon is aphyletic when it is deemed to be non-monophyletic or unresolved; therefore, aphyletic taxa are a taxonomic problem rather than an evolutionary anomaly. A problem arises in systematics when taxonomic names assigned to aphyletic taxa are treated as if they were natural groups. In the absence of a taxonomic and systematic revision, anomalous taxa should be labelled as aphyletic without recourse to phylogenetic inference (i.e., interpretation). Doing so avoids the validation of aphyletic names and the creation of dubious results in fields that rely on systematic and taxonomic data.

18.
The names used by biologists to label the observations they make are imprecise. This is an issue as workers increasingly seek to exploit data gathered from multiple, unrelated sources online. Even when the international codes of nomenclature are followed strictly, the resulting names (Taxon Names) do not uniquely identify the taxa (Taxon Concepts) that have been described by taxonomists but merely groups of type specimens. A standard data model for exchange of taxonomic information is described. It addresses this issue by facilitating explicit communication of information about Taxon Concepts and their associated names. A representation of this model as an XML Schema is introduced and the implications of the use of Globally Unique Identifiers are discussed.

19.
Names of 20 presumed taxa in Draba sect. Aizopsis, all based on material from Italy, are considered. Full synonymies are provided, and types are designated (for 14 names) or indicated. Most of the taxa are currently considered unworthy of recognition, of which 16 belong to D. aspera sensu lato. However, the question of whether the Sicilian populations might be distinct from the peninsular populations is still unsettled.

20.