Similar literature
1.
2.

Background  

Life Science Identifiers (LSIDs) are persistent, globally unique identifiers for biological objects. The decentralised nature of LSIDs makes them attractive for identifying distributed resources. Data of interest to biodiversity researchers (including specimen records, images, taxonomic names, and DNA sequences) are distributed over many different providers, and this community has adopted LSIDs as the identifier of choice.

3.
4.
Babnigg G, Giometti CS. Proteomics, 2006, 6(16):4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
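The abstract does not spell out how a SEGUID is computed. The following is a minimal sketch, assuming the commonly described construction (a SHA-1 digest of the upper-cased sequence, Base64-encoded with the padding removed). The function name `seguid_like` and the accession strings are hypothetical and serve only to show how a sequence-derived key can link entries from different databases.

```python
import base64
import hashlib


def seguid_like(sequence: str) -> str:
    """Return a sequence-derived identifier (assumed SHA-1 + Base64 scheme)."""
    # Normalise so the same protein sequence always yields the same key.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    # Base64 of a 20-byte digest is 28 characters ending in one '='; drop it.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical records from two databases that hold the same sequence.
records = {
    "dbA:P00001": "MKTAYIAKQR",
    "dbB:XP_0001": "mktayiakqr",
}
ids = {accession: seguid_like(seq) for accession, seq in records.items()}
```

Because the key depends only on the sequence, both hypothetical accessions map to the same identifier, and any value computed from the sequence alone can be cached under that key.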

5.
6.
This study summarizes results of a DNA barcoding campaign on German Diptera, involving analysis of 45,040 specimens. The resultant DNA barcode library includes records for 2,453 named species comprising a total of 5,200 barcode index numbers (BINs), including 2,700 COI haplotype clusters without species‐level assignment, so-called “dark taxa.” Overall, 88 out of 117 families (75%) recorded from Germany were covered, representing more than 50% of the 9,544 known species of German Diptera. Until now, most of these families, especially the most diverse, have been taxonomically inaccessible. By contrast, within a few years this study provided an intermediate taxonomic system for half of the German Dipteran fauna, which will provide a useful foundation for subsequent detailed, integrative taxonomic studies. Using DNA extracts derived from bulk collections made by Malaise traps, we further demonstrate that species delineation using BINs and operational taxonomic units (OTUs) constitutes an effective method for biodiversity studies using DNA metabarcoding. As the reference libraries continue to grow, and gaps in the species catalogue are filled, BIN lists assembled by metabarcoding will provide greater taxonomic resolution. The present study has three main goals: (a) to provide a DNA barcode library for 5,200 BINs of Diptera; (b) to demonstrate, based on the example of bulk extractions from a Malaise trap experiment, that DNA barcode clusters, labelled with globally unique identifiers (such as OTUs and/or BINs), provide a pragmatic, accurate solution to the “taxonomic impediment”; and (c) to demonstrate that interim names based on BINs and OTUs obtained through metabarcoding provide an effective method for studies on species‐rich groups that are usually neglected in biodiversity research projects because of their unresolved taxonomy.

7.
8.
Given the growing wealth of downstream information, the integration of molecular and non-molecular data on a given organism has become a major challenge. For micro-organisms, this information now includes a growing collection of sequenced genes and complete genomes, and for communities of organisms it includes metagenomes. Integration of the data is facilitated by the existence of authoritative, community-recognized, consensus identifiers that may form the heart of so-called information knuckles. The Genomic Standards Consortium (GSC) is building a mapping of identifiers across a group of federated databases with the aim to improve navigation across these resources and to enable the integration of their information in the near future. In particular, this is possible because of the existence of INSDC Genome Project Identifiers (GPIDs) and accession numbers, and the ability of the community to define new consensus identifiers such as the culture identifiers used in the StrainInfo.net bioportal. Here we (1) outline the general design of the Genomic Rosetta Stone project, (2) introduce example linkages between key databases (that cover information about genomes, 16S rRNA gene sequences, and microbial biological resource centers), and (3) make an open call for participation in this project, providing a vision for its future use.

9.
Towards a collaborative, global infrastructure for biodiversity assessment   Cited by: 4 (self-citations: 0; citations by others: 4)
Biodiversity data are rapidly becoming available over the Internet in common formats that promote sharing and exchange. Currently, these data are somewhat problematic, primarily with regard to geographic and taxonomic accuracy, for use in ecological research, natural resources management and conservation decision-making. However, web-based georeferencing tools that utilize best practices and gazetteer databases can be employed to improve geographic data. Taxonomic data quality can be improved through web-enabled valid taxon names databases and services, as well as more efficient mechanisms to return systematic research results and taxonomic misidentification rates back to the biodiversity community. Both of these are under construction. A separate but related challenge will be developing web-based visualization and analysis tools for tracking biodiversity change. Our aim was to discuss how such tools, combined with data of enhanced quality, will help transform today's portals to raw biodiversity data into nexuses of collaborative creation and sharing of biodiversity knowledge.

10.
Problem: The increasing availability of large vegetation databases holds great potential in ecological research and biodiversity informatics. However, inconsistent application of plant names compromises the usefulness of these databases. This problem has been acknowledged in recent years, and solutions have been proposed, such as the concept of “potential taxa” or “taxon views”. Unfortunately, awareness of the problem remains low among vegetation scientists. Methods: We demonstrate how misleading interpretations caused by inconsistent use of plant names might occur through the course of vegetation analysis, from relevés upward through databases, and then to the final analyses. We discuss how these problems might be minimized. Results: We highlight the importance of taxonomic reference lists for standardizing plant names and outline standards they should fulfill to be useful for vegetation databases. Additionally, we present the R package vegdata, which is designed to solve name‐related problems that arise when analysing vegetation databases. Conclusions: We conclude that by giving more consideration to the appropriate application of plant names, vegetation scientists might enhance the reliability of analyses obtained from large vegetation databases.

11.
Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation, and ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, where pre-processing of both gene synonyms in the databases and gene mentions in text are first applied. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. The results of the experiments on the BioCreAtIvE2 data sets (identification of human gene names) demonstrated that our methods achieved highly encouraging results, with an F-measure of up to 81.20%.
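The cascade described above (exact, then exact-like, then token-based approximate matching) can be sketched as follows. This is an illustration under stated assumptions, not the authors' rule set: the synonym dictionary, the normalisation rule, the use of Jaccard overlap for the approximate stage, and the 0.6 threshold are all hypothetical choices.

```python
import re

# Hypothetical synonym dictionary: gene identifier -> known names.
SYNONYMS = {
    "GENE:1": ["tumor protein p53", "TP53", "p53"],
    "GENE:2": ["interleukin 2 receptor alpha", "IL2RA", "CD25"],
}


def normalize(name: str) -> str:
    """Lower-case and drop punctuation/whitespace for 'exact-like' matching."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def tokens(name: str) -> frozenset:
    """Split a name into lower-cased alphanumeric tokens (order is ignored)."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))


def link_mention(mention: str, threshold: float = 0.6):
    """Try exact, then exact-like, then token-based approximate matching."""
    # Stage 1: exact string match.
    for gene_id, names in SYNONYMS.items():
        if mention in names:
            return gene_id, "exact"
    # Stage 2: exact-like match on normalised strings.
    key = normalize(mention)
    for gene_id, names in SYNONYMS.items():
        if key in {normalize(n) for n in names}:
            return gene_id, "exact-like"
    # Stage 3: best Jaccard overlap between token sets (assumed similarity).
    best_id, best_score = None, 0.0
    mention_tokens = tokens(mention)
    for gene_id, names in SYNONYMS.items():
        for n in names:
            name_tokens = tokens(n)
            union = mention_tokens | name_tokens
            score = len(mention_tokens & name_tokens) / len(union) if union else 0.0
            if score > best_score:
                best_id, best_score = gene_id, score
    if best_score >= threshold:
        return best_id, "token"
    return None, "unlinked"
```

Ordering the stages from strict to loose favours precision first and recovers recall later; because token sets ignore order, the last stage also tolerates permuted name components.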

12.
13.
The names used by biologists to label the observations they make are imprecise. This is an issue as workers increasingly seek to exploit data gathered from multiple, unrelated sources online. Even when the international codes of nomenclature are followed strictly, the resulting names (Taxon Names) do not uniquely identify the taxa (Taxon Concepts) that have been described by taxonomists, but merely groups of type specimens. A standard data model for exchange of taxonomic information is described. It addresses this issue by facilitating explicit communication of information about Taxon Concepts and their associated names. A representation of this model as an XML Schema is introduced, and the implications of the use of Globally Unique Identifiers are discussed.
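The distinction between a Taxon Name and a Taxon Concept can be made concrete with a small data-structure sketch. The class and field names below (`guid`, `name_string`, `authorship`, `according_to`) are hypothetical and do not reproduce the schema described in the abstract; the UUID-based identifier is likewise only a stand-in for whatever globally unique identifier scheme is adopted.

```python
from dataclasses import dataclass
import uuid


@dataclass(frozen=True)
class TaxonName:
    """A nomenclatural label; it does not by itself identify a taxon."""
    guid: str
    name_string: str
    authorship: str


@dataclass(frozen=True)
class TaxonConcept:
    """A circumscription of a taxon, as published in a particular source."""
    guid: str
    name: TaxonName
    according_to: str  # the publication that defines this concept


def new_guid() -> str:
    # Placeholder scheme: any globally unique identifier would do here.
    return f"urn:uuid:{uuid.uuid4()}"


name = TaxonName(new_guid(), "Aus bus", "Smith, 1900")
# Two hypothetical treatments use the same name for different circumscriptions.
broad = TaxonConcept(new_guid(), name, "Smith 1900")
narrow = TaxonConcept(new_guid(), name, "Jones 1950")
```

Data tagged with a concept's identifier rather than the bare name string stay unambiguous even though both concepts share one name.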

14.

Background  

The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism.

15.
56 Algaebase     
AlgaeBase ( http://www.algaebase.org ) is a web-searchable store of information on those protists generally considered to be algae. Access is free and some 10,000 browser searches on average now take place each month. The database was established in 1996 and at first only included seaweeds. Its main function at this time was as a catalog of the marine algae of Europe for the European Union-funded BioMar and European Register of Marine Species (ERMS) projects, and for the recently published Atlas and Check-list of the Seaweeds of Britain and Ireland. The data are now being extended to cover all algae. Over 50,000 names, of which about half are presently accepted species names, are now included, together with the names of some 3500 genera, about 3000 common names, approximately 700 pictures, and in excess of 28,000 literature references. URL-based links from a number of other databases including the Species 2000 Annual Check-list, BIOSIS, GenBank, and Codes for Australian Aquatic Biota have been implemented. It is intended to initiate similar connections from new initiatives such as EuroCat and SPICE, and a number of other global biodiversity databases. As part of a further EU-funded project, SeaweedAfrica ( http://www.seaweedafrica.org ), AlgaeBase is being completely rewritten as an SQL database with a browser-enabled interface, enabling access by a panel of taxonomic experts. AlgaeBase hopes thereby to continue to provide high-quality access to community-serviced data in the best traditions of the Internet.

16.
Do LH, Bier E. Bioinformation, 2011, 6(2):83-85
Redundancy among sequence identifiers is a recurring problem in bioinformatics. Here, we present a rapid and efficient method of fingerprinting identifiers to ascertain whether two or more aliases are identical. A number of tools and approaches have been developed to resolve differing names for the same genes and proteins; however, these methods each have their own limitations associated with their various goals. We have taken a different approach to the aliasing problem by simplifying the way aliases are stored and curated with the objective of simultaneously achieving speed and flexibility. Our approach (Booly-hashing) is to link identifiers with their corresponding hash keys derived from unique fingerprints such as gene or protein sequences. This tool has proven invaluable for designing a new data integration platform known as Booly, and has wide applicability to situations in which a dedicated efficient aliasing system is required. Compared with other aliasing techniques, Booly-hashing methodology provides 1) reduced run time complexity, 2) increased flexibility (aliasing of other data types, e.g. pharmaceutical drugs), 3) no required assumptions regarding gene clusters or hierarchies, and 4) simplicity in data addition, updating, and maintenance. The new Booly-hashing aliasing model has been incorporated as a central component of the Booly data integration platform we have recently developed and should be broadly applicable to other situations in which an efficient streamlined aliasing system is required. This aliasing tool and database, which allows users to quickly group the same genes and proteins together, can be accessed at: http://booly.ucsd.edu/alias. AVAILABILITY: The database is available for free at http://booly.ucsd.edu/alias.
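The idea of linking identifiers to hash keys derived from a fingerprint can be sketched with two dictionaries. This is an illustration only: the class `AliasIndex`, its methods, and the choice of SHA-256 are assumptions, not details taken from the Booly implementation.

```python
import hashlib
from collections import defaultdict


def fingerprint(sequence: str) -> str:
    """Hash key for a sequence; SHA-256 is an assumption, not the paper's choice."""
    return hashlib.sha256(sequence.strip().upper().encode("ascii")).hexdigest()


class AliasIndex:
    """Groups identifiers that share the same sequence fingerprint."""

    def __init__(self):
        self._by_key = defaultdict(set)  # hash key -> identifiers
        self._key_of = {}                # identifier -> hash key

    def add(self, identifier: str, sequence: str) -> None:
        # If the identifier was seen before, detach it from its old group.
        old = self._key_of.get(identifier)
        if old is not None:
            self._by_key[old].discard(identifier)
        key = fingerprint(sequence)
        self._by_key[key].add(identifier)
        self._key_of[identifier] = key

    def aliases(self, identifier: str) -> set:
        """All identifiers recorded with the same fingerprint (including itself)."""
        key = self._key_of.get(identifier)
        return set(self._by_key[key]) if key is not None else set()

    def same(self, a: str, b: str) -> bool:
        """True when both identifiers are known and share a fingerprint."""
        key_a, key_b = self._key_of.get(a), self._key_of.get(b)
        return key_a is not None and key_a == key_b


index = AliasIndex()
index.add("geneA", "ATGGCCATTG")
index.add("alias_of_A", "atggccattg")
index.add("geneB", "ATGTTTAAAC")
```

In this sketch, adding an identifier and testing two aliases for identity are both dictionary lookups, and nothing about gene clusters or hierarchies needs to be known in advance. Any other data type with a stable fingerprint could be indexed the same way.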

17.
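Marine biodiversity is very high, yet it suffers heavily from human destruction and disturbance. At present the world's largest open online database with point (occurrence) data is the Ocean Biogeographic Information System (OBIS), with about 37 million records for roughly 120,000 species; another large database, the World Register of Marine Species (WoRMS), has compiled taxonomic information for 220,000 marine species worldwide. Beyond these, there are only three single-group databases focused on marine organisms: FishBase, AlgaeBase, and Hexacorallians of the World. Global species databases that span taxonomic groups and cover both land and sea are far more numerous: the Encyclopedia of Life (EOL), the Catalogue of Life (CoL), the Integrated Taxonomic Information System (ITIS), Wikispecies, ETI Bioinformatics, the Barcode of Life (BOL), GenBank, the Biodiversity Heritage Library (BHL), and SeaLifeBase; the Marine Species Identification Portal, the FAO Fisheries and Aquaculture Fact Sheets, and similar resources are searchable databases that focus mainly on taxonomy or species descriptions. The Global Biodiversity Information Facility (GBIF), Discover Life, and AquaMaps focus on ecological distribution data; they can draw geographic distribution maps and offer downloads, and even let users change the values of environmental parameters such as water temperature and salinity and use established models to predict species distributions under the changed parameters. The ocean pages of the Google Earth and National Geographic websites, together with ReefBase and other websites of official agencies or non-governmental organizations, mostly focus on education and advocacy for marine conservation, and the information and materials they offer are wide-ranging. Even more conveniently for users, many of the above websites and databases are already cross-linked and can be queried from one another. In addition, the search engines Google Images and Google Scholar, through direct links provided by marine biological databases, help greatly in supplementing ecological photographs of species and scholarly papers, giving users rich and varied information. For conservation purposes, biodiversity databases should not only be integrated and openly shared; their use should also be encouraged and promoted. This paper takes the content of a lecture given by Rainer Froese in Paris as an example to show how marine biodiversity data can be used to predict the impact of climate change on fish distributions. Finally, it describes the current status of marine biodiversity databases in mainland China and Taiwan, cross-strait cooperation, and how to connect with international efforts.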

18.
19.
The European marine fauna used to be considered to include 16 species of Discodoris sea slugs until a recent worldwide revision demonstrated that there is not a single Discodoris species in European waters. This exemplary case illustrates the fact that species checklists do not accurately represent biodiversity unless they are based on sound taxonomic work in which (1) the status of every available species name has been addressed, i.e. whether it is valid, synonymous, or of doubtful application, and (2) classification reflects phylogenetic relationships. It is argued that taxonomic revisions are critically needed, because the status of species names can only be addressed properly through revisions. Fields that depend on taxonomic data, such as conservation biology and ecology, might be deeply affected if problematic species names (synonyms and nomina dubia) have not been recognized. Consequently, it is proposed that a taxon that has not been revised be red-flagged in checklists, so that non-taxonomists will know which species names should be applied with caution or not at all.

20.

Background

Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts.

Results

We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially at the reaction level, where the databases agree on 3% of the 6968 reactions they have combined. Even for the well-established tricarboxylic acid cycle the databases agree on only 5 out of the 30 reactions in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps a conversion is described in and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous names for metabolites also affect the comparison.
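The agreement figures above are ratios of a shared set to a combined set: 3% of 6968 is on the order of 200 reactions, and 5 of 30 is about 17%. A minimal sketch of that kind of comparison follows, using toy reaction sets for three hypothetical databases (not the paper's data); the function names and the pairwise Jaccard measure are illustrative assumptions.

```python
from functools import reduce

# Toy reaction sets for three hypothetical databases (not the paper's data).
databases = {
    "db1": {"R1", "R2", "R3", "R4"},
    "db2": {"R1", "R2", "R5"},
    "db3": {"R1", "R3", "R5", "R6"},
}


def consensus_fraction(sets):
    """Share of the combined entities on which every database agrees."""
    sets = list(sets)
    combined = reduce(set.union, sets, set())
    shared = reduce(set.intersection, sets) if sets else set()
    return len(shared) / len(combined) if combined else 0.0


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(dbs):
    """Jaccard index for each pair of databases, keyed by the pair of names."""
    names = sorted(dbs)
    return {
        (a, b): jaccard(dbs[a], dbs[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


fraction = consensus_fraction(databases.values())
pairs = pairwise_jaccard(databases)
```

Such a comparison is only meaningful once the entities carry comparable identifiers, which is why missing metabolite identifiers and ambiguous names directly lower the measured overlap.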

Conclusions

Our results show that each of the five networks compared provides us with a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, next to a need for standardizing the metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. Our comparison provides a stepping stone for such an endeavor.
