Similar Literature
1.
Based on the current state of paleontological research and the fossil specimen collections held by major research institutions, we propose a metadata standard for paleontological fossil specimens. The standard comprises ten metadata elements and one metadata entity: title, data identifier, license identifier, keywords, description, fossil specimen parameters, data link address, creator, creation time, and access restrictions, plus the data-submitting institution as the metadata entity. The fossil specimen parameters element is further divided into the specimen's physical information, paleontological systematic classification information, stratigraphic information, and high-dimensional information. The standard aims to improve the normalization and standardization of fossil specimen data; to facilitate the organization, aggregation and submission, cross-platform operation, retrieval, sharing, and use of specimen data; and to promote the data-driven paradigm shift in paleontology and stratigraphy.
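As a minimal sketch of the ten-element scheme described above, a conforming record could be represented and checked like this; all field names are illustrative English renderings, not the standard's official terms, and the values are placeholders.

```python
# Illustrative fossil-specimen metadata record following the ten elements
# plus one entity described in the abstract. Key names are hypothetical.
REQUIRED_ELEMENTS = [
    "title", "data_identifier", "license_identifier", "keywords",
    "description", "specimen_parameters", "data_url", "creator",
    "creation_time", "access_restrictions",
]

record = {
    "title": "Trilobite specimen (example)",
    "data_identifier": "doi:10.0000/example",        # placeholder identifier
    "license_identifier": "CC-BY-4.0",
    "keywords": ["trilobite", "Cambrian"],
    "description": "Example record for illustration only.",
    "specimen_parameters": {                          # four sub-blocks named in the abstract
        "physical_info": {"preservation": "complete"},
        "systematic_classification": {"class": "Trilobita"},
        "stratigraphic_info": {"formation": "Example Fm."},
        "high_dimensional_info": {"ct_scan": None},
    },
    "data_url": "https://example.org/specimen/1",
    "creator": "Example Researcher",
    "creation_time": "2023-01-01",
    "access_restrictions": "open",
    "submitting_institution": "Example Institute",    # the one metadata entity
}

def validate(rec):
    """Return the list of required elements missing from a record."""
    return [e for e in REQUIRED_ELEMENTS if e not in rec]

missing = validate(record)
```

A validator of this kind is what makes aggregation and cross-platform exchange possible: every submitting institution can check records against the same element list before contributing them.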

2.
3.
In ecological sciences, the role of metadata (i.e. key information about a dataset) in making existing datasets visible and discoverable has become increasingly important. Within the EU-funded WISER project (Water bodies in Europe: Integrative Systems to assess Ecological status and Recovery), we designed a metadatabase to allow scientists to find the optimal data for their analyses. An online questionnaire helped to collect metadata from the data providers, and an online query tool (http://www.wiser.eu/results/meta-database/) facilitated data evaluation. The WISER metadatabase currently holds information on 114 datasets (22 river, 71 lake, 1 general freshwater and 20 coastal/transitional datasets), which can also be accessed by external scientists. We evaluate whether commonly used metadata standards (e.g. Darwin Core, ISO 19115, CSDGM, EML) are suitable for purposes as specific as WISER's, and suggest at least linking to standard metadata fields. Furthermore, we discuss whether simple metadata documentation is enough for others to reuse a dataset, and why there is still reluctance to publish both metadata and primary research data (e.g. time and financial constraints, misuse of data, loss of intellectual property rights). We emphasise that metadata publication has major advantages, as it makes datasets discoverable by other scientists and generally makes a scientist's work more visible.

4.
The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) network to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML). The LTER has been one of the leading National Science Foundation (NSF) programs in biology since 1980, representing diverse ecosystems and supporting long-term, interdisciplinary research, synthesis of information, and theory development. Adopting EML as the LTER network standard has been key to building network synthesis architectures based on high-quality standardized metadata. EML is the NSF-recognized metadata standard for LTER, and it is one of the criteria used to review the LTER program's progress. At the workshop, a potential crosswalk between GCDML and EML was explored, and collaboration between LTER and GSC developers was proposed to work toward a common metadata cataloging designer's tool. The community adoption of a metadata standard depends, among other factors, on the tools and training developed for using the standard. LTER's experience in embracing EML may help the GSC achieve similar success, and a possible collaboration between LTER and GSC to provide training opportunities for GCDML and the associated tools is being explored. Finally, LTER is investigating EML enhancements to better accommodate genomics data, possibly integrating the GCDML schema into EML. All these action items have been accepted by the LTER contingent, and further collaboration between the GSC and LTER is expected.

5.
董仁才, 王韬, 张永霖, 张雪琦, 李欢欢. 《生态学报》 2018, 38(11): 3775-3783
As China vigorously promotes urban sustainable development and advances the construction of national sustainable development experimental zones, a key problem is which assessment methods and data to use when evaluating urban sustainable development capacity. Metadata theory and technology, which have emerged in recent years, are regarded as an effective means of controlling the quality of assessment data. Considering the characteristics of the urban sustainability assessment indicator systems currently used in China, and by dissecting the source, acquisition method, and applicable methods of each indicator's data, we propose a software-engineering approach to developing a metadata management system for urban sustainability assessment, helping experimental zones efficiently acquire and manage the data needed for assessment. Based on the metadata specification established in the 12th Five-Year National Science and Technology Support Program project "Research and demonstration of key technologies for urban sustainable development capacity assessment and information management", 14 items were identified as key metadata for tracking assessment data: data release date, data release form, spatial extent, temporal extent (start time, end time), statistical frequency, data security classification, data lineage description, online resource link, and data-producing unit information (unit name, contact person, telephone, address, email). A comparison of quantitative data-quality scores before and after applying this metadata tracking showed that, when the assessed indicators had metadata support, their scores for data reliability, comparability, and sustainability all improved significantly. We conclude that metadata theory offers clear advantages in controlling and ensuring the quality of urban sustainability assessment data, and that developing a dedicated metadata management system for such assessments can effectively improve the comprehensive evaluation results.
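The before/after comparison described above can be sketched as a simple scoring exercise; the sub-scores, weights, and scale here are invented for illustration and are not those of the cited project.

```python
# Hypothetical sketch of the quantitative data-quality scoring described in
# the abstract: each indicator is scored on reliability, comparability and
# sustainability, and scores are compared before and after metadata tracking.

def quality_score(reliability, comparability, sustainability):
    """Equal-weight mean of three sub-scores on a 0-100 scale (illustrative)."""
    return (reliability + comparability + sustainability) / 3

before = quality_score(60, 55, 50)   # indicator without metadata support
after = quality_score(85, 80, 90)    # same indicator with metadata tracking
improvement = after - before
```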

6.
7.

Background  

Flow cytometry technology is widely used in both health care and research. The rapid expansion of flow cytometry applications has outpaced the development of data storage and analysis tools. Collaborative efforts to close this gap include building common vocabularies and ontologies, designing generic data models, and defining data exchange formats. The Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard was recently adopted by the International Society for Advancement of Cytometry. This standard guides researchers on the information that should be included in peer-reviewed publications, but it is insufficient for data exchange and integration between computational systems. The Functional Genomics Experiment (FuGE) model formalizes common aspects of comprehensive, high-throughput experiments across different biological technologies. We have extended the FuGE object model to accommodate flow cytometry data and metadata.

8.
9.
Biodiversity metadata support the querying, management, and use of actual data sets. We analyze the progress of metadata standard development in China and review the metadata required and/or produced under the Convention on Biological Diversity. A biodiversity metadata standard was developed based on the characteristics of biodiversity data and in line with the framework of international metadata standards. The content of biodiversity metadata is divided into two levels. The first level consists of the metadata entities and elements necessary to uniquely identify a biodiversity data set, and is termed Core Metadata. The second level comprises the metadata entities and elements necessary to describe all aspects of a biodiversity data set. This paper presents the core biodiversity metadata standard, which is composed of 51 elements in six categories (entities): inventory information, collection information, information on the content of the data set, management information, access information, and metadata management information. The name, definition, condition, data type, and field length of the metadata elements in these six categories are also described.
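The two-level structure described above can be sketched as follows; the six category names follow the abstract, while the element names inside them are illustrative placeholders, not the standard's 51 defined elements.

```python
# Sketch of the core level of the biodiversity metadata standard: six
# categories (entities), each holding elements that identify a data set.
CORE_CATEGORIES = [
    "inventory_information",
    "collection_information",
    "content_information",
    "management_information",
    "access_information",
    "metadata_management_information",
]

# A core record groups its elements under the six categories.
core_record = {c: {} for c in CORE_CATEGORIES}
core_record["inventory_information"]["dataset_title"] = "Example bird survey"
core_record["access_information"]["access_policy"] = "on request"

def is_core_complete(rec):
    """A core record must contain every one of the six categories."""
    return all(c in rec for c in CORE_CATEGORIES)
```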

10.
The rise of smartphones and web services has made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big-data researchers. Metadata have, however, yet to realize their full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, prevent metadata from being shared and reconciled under the control of the individual. This lack of access and control also fuels growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained third-party access to their metadata; it has been implemented in two field studies. (2) We introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at the individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one: it allows services to ask questions whose answers are calculated against the metadata, instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with services is thereby reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable or to contain sensitive information. These answers can then be shared directly, individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, supporting the creation of smart data-driven services and data-science research.
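The SafeAnswers idea, evaluating a service's question against the metadata inside the personal data store and releasing only a low-dimensional answer, can be sketched like this; the metadata layout and the question are invented for illustration and do not reflect openPDS's actual API.

```python
# Minimal sketch of the SafeAnswers pattern: raw records never leave the
# store; only an aggregate answer does.

location_log = [  # high-dimensional personal metadata kept in the store
    {"place": "home", "hour": 8},
    {"place": "office", "hour": 10},
    {"place": "office", "hour": 15},
    {"place": "gym", "hour": 18},
]

def answer_question(metadata, predicate):
    """Evaluate a question in-store and return a single aggregate count."""
    return sum(1 for rec in metadata if predicate(rec))

# A service asks "how many records fall within working hours?" and
# receives one number instead of the full location log.
answer = answer_question(location_log, lambda r: 9 <= r["hour"] <= 17)
```

The design choice is that the predicate runs where the data lives, so the service's view is reduced from the full log to a single, far less re-identifiable value.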

11.
The Feeding Experiments End-user Database (FEED) is a research tool developed by the Mammalian Feeding Working Group at the National Evolutionary Synthesis Center that permits synthetic, evolutionary analyses of the physiology of mammalian feeding. The tasks of the Working Group are to compile physiologic data sets into a uniform digital format stored at a central source, develop a standardized terminology for describing and organizing the data, and carry out a set of novel analyses using FEED. FEED contains raw physiologic data linked to extensive metadata. It serves as an archive for a large number of existing data sets and a repository for future data sets. The metadata are stored as text and images that describe experimental protocols, research subjects, and anatomical information. The metadata incorporate controlled vocabularies to allow consistent use of the terms used to describe and organize the physiologic data. The planned analyses address long-standing questions concerning the phylogenetic distribution of phenotypes involving muscle anatomy and feeding physiology among mammals, the presence and nature of motor pattern conservation in the mammalian feeding muscles, and the extent to which suckling constrains the evolution of feeding behavior in adult mammals. We expect FEED to be a growing digital archive that will facilitate new research into understanding the evolution of feeding anatomy.

12.
Data support knowledge development and theory advances in ecology and evolution. We increasingly reuse data within our teams and projects and through the global, openly archived datasets of others. Metadata can be challenging to write and interpret, but it is always crucial for reuse. The value of metadata cannot be overstated, even as a relatively independent research object, because it describes the work that has been done in a structured format. We advance a new perspective and classify methods for metadata curation and development with tables. Tables with templates can effectively capture all components of an experiment or project in a single, easy-to-read file familiar to most scientists. If coupled with the R programming language, metadata from tables can then be rapidly and reproducibly converted to publication formats, including extensible markup language (XML) files suitable for data repositories. Tables can also be used to summarize existing metadata and store metadata across many datasets. A case study is provided, and the added benefits of preparing metadata in tables, a priori, are developed to ensure a more streamlined publishing process for many data repositories used in ecology, evolution, and the environmental sciences. In ecology and evolution, researchers often think in tabular terms, from experimental data collection in the lab and/or field, and representing metadata as a table will provide novel research and reuse insights.
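The abstract pairs metadata tables with R for conversion to repository-ready XML. A minimal, language-agnostic sketch of the same round trip is shown here in Python; the column and tag names are illustrative and are not EML.

```python
# Convert a tabular metadata description (CSV text) into a simple XML
# document, mimicking the table-to-repository-format workflow described
# in the abstract.
import csv
import io
import xml.etree.ElementTree as ET

table = """name,definition,unit
height,Plant height at harvest,cm
biomass,Dry above-ground biomass,g
"""

root = ET.Element("attributeList")
for row in csv.DictReader(io.StringIO(table)):
    attr = ET.SubElement(root, "attribute")
    for key, value in row.items():          # one child element per column
        ET.SubElement(attr, key).text = value

xml_text = ET.tostring(root, encoding="unicode")
```

Because the table is the single source of truth, regenerating the XML after an edit is a one-step, reproducible operation, which is the core benefit the abstract argues for.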

13.
  1. Metadata plays an essential role in the long‐term preservation, reuse, and interoperability of data. Nevertheless, creating useful metadata can be sufficiently difficult and weakly enough incentivized that many datasets may be accompanied by little or no metadata. One key challenge is, therefore, how to make metadata creation easier and more valuable. We present a solution that involves creating domain‐specific metadata schemes that are as complex as necessary and as simple as possible. These goals are achieved by co‐development between a metadata expert and the researchers (i.e., the data creators). The final product is a bespoke metadata scheme into which researchers can enter information (and validate it) via the simplest of interfaces: a web browser application and a spreadsheet.
  2. We provide the R package dmdScheme (dmdScheme: An R package for working with domain specific MetaData schemes (Version v0.9.22), 2019) for creating a template domain‐specific scheme. We describe how to create a domain‐specific scheme from this template, including the iterative co‐development process, the simple methods for using the scheme, and the methods for quality assessment, improvement, and validation.
  3. The process of developing a metadata scheme following the outlined approach was successful, resulting in a metadata scheme which is used for the data generated in our research group. The validation quickly identifies forgotten metadata, as well as inconsistent metadata, therefore improving the quality of the metadata. Multiple output formats are available, including XML.
  4. Making the provision of metadata easier while also ensuring high quality must be a priority for data curation initiatives. We show how both objectives are achieved by close collaboration between metadata experts and researchers to create domain‐specific schemes. A near‐future priority is to provide methods to interface domain‐specific schemes with general metadata schemes, such as the Ecological Metadata Language, to increase interoperability.

The article describes a methodology to develop, enter, and validate domain-specific metadata schemes that is suitable for use by non-metadata specialists. The approach uses an R package that forms the backend for processing the metadata, uses spreadsheets to enter the metadata, and provides a server-based approach to distributing and using the developed metadata schemes.
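The kind of validation described above, checking entered metadata against a domain-specific scheme and flagging missing or inconsistent values, can be sketched as follows; the scheme contents (field names, types, allowed values) are invented and do not come from dmdScheme itself.

```python
# A domain-specific scheme defines required fields, expected types and
# allowed values; entered metadata is validated against it, as in the
# workflow the abstract describes (there via spreadsheets and R).
scheme = {
    "experiment_name": {"required": True},
    "temperature_C": {"required": True, "type": float},
    "medium": {"required": False, "allowed": {"LB", "M9"}},
}

def validate_metadata(entry, scheme):
    """Return human-readable problems: missing, wrong type, or bad value."""
    problems = []
    for field, rule in scheme.items():
        if field not in entry:
            if rule.get("required"):
                problems.append(f"missing required field: {field}")
            continue
        value = entry[field]
        if "type" in rule and not isinstance(value, rule["type"]):
            problems.append(f"wrong type for {field}")
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"value not allowed for {field}: {value}")
    return problems

ok = validate_metadata({"experiment_name": "run1", "temperature_C": 25.0}, scheme)
bad = validate_metadata({"temperature_C": "warm", "medium": "YPD"}, scheme)
```

As the abstract notes, this kind of check quickly surfaces forgotten and inconsistent metadata at entry time, before the record reaches a repository.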

14.
Many communities use standard, structured documentation that is machine-readable, i.e. metadata, to make discovery, access, use, and understanding of scientific datasets possible. Organizations and communities have also developed recommendations for metadata content that is required or suggested for their data developers and users. These recommendations are typically specific to the metadata representations (dialects) used by the community. By considering the conceptual content of the recommendations, quantitative analysis and comparison of the completeness of multiple metadata dialects becomes possible. This is a study of completeness of EML and CSDGM metadata records from DataONE in terms of the LTER recommendation for completeness. The goal of the study is to quantitatively measure completeness of metadata records and to determine if metadata developed by LTER is more complete with respect to the recommendation than other collections in EML and in CSDGM. We conclude that the LTER records are broadly more complete than the other EML collections, but similar in completeness to the CSDGM collections.
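A completeness measure of the kind used in such studies can be sketched as the fraction of recommended fields that a record actually fills in; the field list below is an invented placeholder, not the LTER recommendation.

```python
# Sketch of a completeness metric: share of recommended fields that are
# present and non-empty in a metadata record.
RECOMMENDED_FIELDS = ["title", "abstract", "creator", "coverage", "methods", "contact"]

def completeness(record, recommended=RECOMMENDED_FIELDS):
    """Return the fraction of recommended fields filled in the record."""
    filled = sum(1 for f in recommended if record.get(f))
    return filled / len(recommended)

full_record = {"title": "t", "abstract": "a", "creator": "c",
               "coverage": "cov", "methods": "m", "contact": "x"}
sparse_record = {"title": "t", "creator": "c"}

score_full = completeness(full_record)
score_sparse = completeness(sparse_record)
```

Scoring many records this way per collection is what allows the dialect-level comparisons (EML vs. CSDGM, LTER vs. others) the study reports.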

15.
We have developed a proteome database (DB), BiomarkerDigger (http://biomarkerdigger.org), that automates data analysis, searching, and metadata-gathering functions. The metadata-gathering function searches proteome DBs for protein–protein interaction, Gene Ontology, protein domain, Online Mendelian Inheritance in Man, and tissue expression profile information and integrates it into protein data sets that are accessed through a search function in BiomarkerDigger. This DB also facilitates cross-proteome comparisons by classifying proteins based on their annotation. BiomarkerDigger highlights relationships between a given protein in a proteomic data set and any known biomarkers or biomarker candidates. The newly developed BiomarkerDigger system is useful for multi-level synthesis, comparison, and analysis of data sets obtained from currently available web sources. We demonstrate the application of this resource to the identification of a serological biomarker for hepatocellular carcinoma by comparing plasma and tissue proteomic data sets from healthy volunteers and cancer patients.

16.
17.
Timely, reliable monitoring of species is the foundation of biodiversity conservation. Camera trapping, which yields images, metadata, and distribution information for mammal species, is an effective approach to biodiversity monitoring. The technology is easy to deploy in the field, its protocols are readily standardized, and it provides wildlife voucher specimens (images) together with associated information such as capture location, date and time, and technical details (camera model, etc.). These properties have allowed the accumulation of millions of images and wildlife monitoring records. In China, camera trapping is now widely applied, and many institutions are using camera traps to collect and store wildlife images and the associated metadata. There is an urgent need to standardize the structure of camera trap metadata to facilitate data sharing among institutions and with external conservation communities. Several international data-sharing platforms, such as Wildlife Insights, have been established worldwide, but they cannot effectively track global sustainable development progress without cooperation with China. Such cooperation requires three foundations: common data standards, data-sharing agreements, and data embargo policies. We call on the government authorities and institutions in China's conservation sector to work together to formulate the policies, mechanisms, and pathways for sharing monitoring data among domestic organizations and with international institutions.
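A standardized camera-trap record carrying the attributes named above (the image as voucher, capture location, date and time, and technical details such as camera model) might look like the sketch below; the field names are hypothetical and are not an adopted Chinese or Wildlife Insights standard, and the embargo handling illustrates how a data-restriction policy could be enforced at export time.

```python
# Illustrative standardized camera-trap metadata record and a sharing view
# that respects an embargo flag. All field names are hypothetical.
record = {
    "image_file": "IMG_0001.JPG",                      # voucher specimen (image)
    "species": "Panthera uncia",                       # example identification
    "location": {"latitude": 30.0, "longitude": 100.0},
    "timestamp": "2021-05-01T04:32:00",
    "camera": {"model": "ExampleCam X", "trigger": "motion"},
    "embargo": True,                                   # data-restriction policy applies
}

def shareable_view(rec):
    """Withhold precise coordinates before sharing when an embargo applies."""
    out = dict(rec)
    if rec.get("embargo"):
        out["location"] = "withheld"
    return out

shared = shareable_view(record)
```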

18.
19.

Background

Sharing of epidemiological and clinical data sets among researchers is poor at best, to the detriment of science and the community at large. The purpose of this paper is therefore to (1) describe a novel Web application designed to share information on study data sets, focusing on epidemiological clinical research in a collaborative environment, and (2) create a policy model placing this collaborative environment into the current scientific social context.

Methodology

The Database of Databases application was developed based on feedback from epidemiologists and clinical researchers requiring a Web-based platform that would allow sharing of information about epidemiological and clinical study data sets in a collaborative environment. This platform should ensure that researchers can modify the information. Model-based predictions of the number of publications and the funding resulting from combinations of different policy implementation strategies (for metadata and data sharing) were generated using System Dynamics modeling.

Principal Findings

The application allows researchers to easily upload information about clinical study data sets, which is searchable and modifiable by other users in a wiki environment. All modifications are filtered by the database principal investigator in order to maintain quality control. The application has been extensively tested and currently contains 130 clinical study data sets from the United States, Australia, China and Singapore. Model results indicated that any policy implementation would be better than the current strategy, that metadata sharing is better than data-sharing, and that combined policies achieve the best results in terms of publications.

Conclusions

Based on our empirical observations and the resulting model, the social network environment surrounding the application can help epidemiologists and clinical researchers contribute and search for metadata in a collaborative environment, thus potentially facilitating collaboration among research communities distributed around the globe.

20.

Background

A fundamental characteristic of multicellular organisms is the specialization of functional cell types through the process of differentiation. These specialized cell types not only characterize the normal functioning of different organs and tissues, they can also be used as cellular biomarkers of a variety of different disease states and therapeutic/vaccine responses. In order to serve as a reference for cell type representation, the Cell Ontology has been developed to provide a standard nomenclature of defined cell types for comparative analysis and biomarker discovery. Historically, these cell types have been defined based on unique cellular shapes and structures, anatomic locations, and marker protein expression. However, we are now experiencing a revolution in cellular characterization resulting from the application of new high-throughput, high-content cytometry and sequencing technologies. The resulting explosion in the number of distinct cell types being identified is challenging the current paradigm for cell type definition in the Cell Ontology.

Results

In this paper, we provide examples of state-of-the-art cellular biomarker characterization using high-content cytometry and single cell RNA sequencing, and present strategies for standardized cell type representations based on the data outputs from these cutting-edge technologies, including “context annotations” in the form of standardized experiment metadata about the specimen source analyzed and marker genes that serve as the most useful features in machine learning-based cell type classification models. We also propose a statistical strategy for comparing new experiment data to these standardized cell type representations.

Conclusion

The advent of high-throughput/high-content single cell technologies is leading to an explosion in the number of distinct cell types being identified. It will be critical for the bioinformatics community to develop and adopt data standard conventions that will be compatible with these new technologies and support the data representation needs of the research community. The proposals enumerated here will serve as a useful starting point to address these challenges.
