Similar documents
1.
Data support knowledge development and theory advances in ecology and evolution. We are increasingly reusing data within our teams and projects and through the global, openly archived datasets of others. Metadata can be challenging to write and interpret, but it is always crucial for reuse. The value of metadata cannot be overstated, even as a relatively independent research object, because it describes the work that has been done in a structured format. We advance a new perspective and classify methods for metadata curation and development with tables. Tables with templates can be used effectively to capture all components of an experiment or project in a single, easy-to-read file familiar to most scientists. If coupled with the R programming language, metadata from tables can then be rapidly and reproducibly converted to publication formats, including extensible markup language (XML) files suitable for data repositories. Tables can also be used to summarize existing metadata and to store metadata across many datasets. A case study is provided, and the added benefits of developing metadata in tables a priori are described, to ensure a more streamlined publishing process for the many data repositories used in ecology, evolution, and the environmental sciences. In ecology and evolution, researchers are often highly tabular thinkers, shaped by experimental data collection in the lab and/or field, and representing metadata as tables will provide novel insights for research and reuse.
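The table-to-XML step can be sketched as follows. This is a minimal Python illustration rather than the R workflow the abstract refers to; the two-column layout (`element`, `value`), the `dataset` root tag, and the sample rows are hypothetical, and the output is a flat XML document, not a schema-valid Ecological Metadata Language file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical two-column metadata table: one row per metadata element.
TABLE = """element,value
title,Seedling survival under drought
creator,Example Lab
keyword,drought
keyword,seedlings
"""


def table_to_xml(text: str, root_tag: str = "dataset") -> str:
    """Convert a two-column metadata table into a flat XML document.

    Repeated elements (e.g. several keywords) become repeated child tags.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(text)):
        child = ET.SubElement(root, row["element"].strip())
        child.text = row["value"].strip()
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(table_to_xml(TABLE))
```

Keeping one row per element means repeatable fields such as keywords need no extra columns, which is part of what makes a table easy to fill in by hand.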

2.
3.
4.

Background

The Sequence Read Archive (SRA) is the largest public repository of sequencing data from next-generation sequencing platforms, including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), the Roche 454 GS System, the Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others.

Results

SRAdb is an attempt to make queries of the metadata associated with SRA submissions, studies, samples, experiments, and runs more robust and precise, and to make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then uses this SQLite database for querying and accessing metadata. Full-text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrative Genomics Viewer (IGV).

Conclusions

The SRAdb Bioconductor package provides a convenient, integrated framework for querying and accessing SRA metadata quickly and powerfully from within R.
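A metadata query of the kind SRAdb supports can be sketched with Python's built-in `sqlite3` module. The table `run_meta`, its columns, and the accessions are hypothetical and do not reproduce the SRAdb schema; a plain `LIKE` match stands in for the package's full-text search.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database with a hypothetical run-metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE run_meta (run_accession TEXT PRIMARY KEY, "
        "study_title TEXT, library_strategy TEXT, platform TEXT)"
    )
    rows = [
        ("RUN0001", "Drought response in maize leaves", "RNA-Seq", "ILLUMINA"),
        ("RUN0002", "Soil microbiome survey", "AMPLICON", "LS454"),
        ("RUN0003", "Maize root transcriptome", "RNA-Seq", "ILLUMINA"),
    ]
    conn.executemany("INSERT INTO run_meta VALUES (?, ?, ?, ?)", rows)
    return conn


def find_runs(conn: sqlite3.Connection, keyword: str, strategy: str) -> list[str]:
    """Return run accessions whose study title contains `keyword`
    and whose library strategy equals `strategy`."""
    cur = conn.execute(
        "SELECT run_accession FROM run_meta "
        "WHERE study_title LIKE ? AND library_strategy = ? "
        "ORDER BY run_accession",
        (f"%{keyword}%", strategy),  # parameters keep the query injection-safe
    )
    return [r[0] for r in cur.fetchall()]
```

Usage: `find_runs(build_demo_db(), "maize", "RNA-Seq")` returns both maize RNA-Seq runs, because SQLite's `LIKE` is case-insensitive for ASCII letters by default.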

5.
Existing online databases for dendrochronology are not flexible in terms of user permissions, tree-ring data formats, metadata administration, and language. This is why we developed the Digital Collaboratory for Cultural Dendrochronology (DCCD). This TRiDaS-based multilingual database allows users to control data access, perform queries, upload and download (meta)data in a variety of digital formats, and edit metadata online. The content of the DCCD conforms to EU best practices for the long-term preservation of digital research data.

6.
7.
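Effective species monitoring with timely and reliable methods is the foundation of biodiversity conservation. Camera-trap (infrared camera) technology can capture images, metadata, and distribution information for mammal species, and is an effective means of monitoring biodiversity. The technology is easy to deploy in the field and its protocols are easy to standardize; it provides voucher specimens (images) of wildlife together with ancillary information such as the capture location, the date and time of capture, and capture details (camera model, etc.). These characteristics allow millions of images and wildlife monitoring records to be accumulated. In China, camera-trap technology is already widely used, and many institutions are using camera traps to collect and store wildlife images and their associated metadata. There is now an urgent need to standardize the structure of camera-trap metadata to promote data sharing among institutions and with external conservation groups. Several international data-sharing platforms, such as Wildlife Insights, have been established worldwide, but they depend on cooperation with China to track global progress toward sustainable development effectively. Such cooperation rests on three foundations: common data standards, data-sharing agreements, and data embargo policies. We call on the government authorities, institutions, and organizations of China's conservation sector to work together to develop policies, mechanisms, and pathways for sharing monitoring data among domestic institutions and with international organizations.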
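One way to picture the common data standard called for above is a crosswalk that maps each institution's column names and date formats onto a shared record. Everything below (the `CameraTrapRecord` fields, the two institutions' column names, and their date formats) is a hypothetical Python sketch, not the Wildlife Insights schema or any published camera-trap standard.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CameraTrapRecord:
    """Hypothetical common record for one camera-trap image."""
    species: str
    latitude: float
    longitude: float
    captured_at: str  # ISO 8601 date-time
    camera_model: str


# Hypothetical crosswalks from two institutions' column names to the common fields.
CROSSWALKS = {
    "institution_a": {"species": "sp_name", "latitude": "lat", "longitude": "lon",
                      "captured_at": "datetime", "camera_model": "cam"},
    "institution_b": {"species": "Species", "latitude": "Y", "longitude": "X",
                      "captured_at": "DateTime", "camera_model": "CameraModel"},
}
# Hypothetical source date-time formats, one per institution.
TIME_FORMATS = {"institution_a": "%Y-%m-%d %H:%M:%S", "institution_b": "%d/%m/%Y %H:%M"}


def standardize(raw: dict, source: str) -> CameraTrapRecord:
    """Map one institution-specific row onto the common record, with basic checks."""
    cols = CROSSWALKS[source]
    lat = float(raw[cols["latitude"]])
    lon = float(raw[cols["longitude"]])
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    ts = datetime.strptime(raw[cols["captured_at"]], TIME_FORMATS[source])
    return CameraTrapRecord(
        species=raw[cols["species"]].strip(),
        latitude=lat,
        longitude=lon,
        captured_at=ts.isoformat(),
        camera_model=raw[cols["camera_model"]].strip(),
    )
```

Once both sources are standardized, the same observation recorded by two institutions yields equal records, which is what makes cross-institution sharing straightforward.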

8.
9.
The analysis of genomic data can be an intimidating process, particularly for researchers who are not experienced programmers. Commonly used analyses are spread across many programs, each requiring its own specific input format, so data must often be repeatedly reorganized and transformed into new formats. Analyses often require splitting data according to metadata variables such as population or family, which can be challenging to manage in large data sets. Here, we introduce snpR, a user-friendly data analysis package in R for processing SNP genomic data. snpR is designed to automate data subsetting and analyses across categorical metadata while also streamlining repeated analyses by integrating approaches contained in many different packages in a single ecosystem. snpR facilitates iterative and efficient analyses centred on a single R object for an entire analysis pipeline.
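Splitting an analysis by a categorical metadata variable can be sketched in Python. The function name, the 0/1/2 genotype coding (count of alternate alleles in a diploid), and the `pop` facet are assumptions for illustration; this is not snpR's API, which is an R package.

```python
from collections import defaultdict


def alt_allele_freq_by_facet(genotypes, metadata, facet):
    """Alternate-allele frequency per SNP, computed separately within each facet level.

    genotypes: dict sample -> list of alt-allele counts (0, 1, 2) per SNP; None = missing.
    metadata:  dict sample -> dict of categorical variables (e.g. {"pop": "A"}).
    facet:     name of the metadata variable to split on.
    """
    # Group each sample's genotype row under its facet level.
    groups = defaultdict(list)
    for sample, calls in genotypes.items():
        groups[metadata[sample][facet]].append(calls)

    result = {}
    for level, rows in groups.items():
        n_snps = len(rows[0])
        freqs = []
        for j in range(n_snps):
            called = [r[j] for r in rows if r[j] is not None]
            # Each diploid call contributes two allele copies.
            freqs.append(sum(called) / (2 * len(called)) if called else None)
        result[level] = freqs
    return result
```

Changing `facet` (for example to a family variable) reruns the same statistic on a different split without reshaping the data, which is the kind of bookkeeping the abstract says snpR automates.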

10.
11.
12.
SUMMARY: The R package HCGene (Hierarchical Classification of Genes) implements methods to process and analyze the Gene Ontology and the FunCat taxonomy in order to support the functional classification of genes. HCGene allows the extraction of subgraphs and subtrees related to specific biological problems, the labeling of genes and gene products with multiple and hierarchical functional classes, and the association of different types of bio-molecular data to genes for learning to predict their functions. AVAILABILITY: http://homes.dsi.unimi.it/~valenti/SW/hcgene/download/hcgene_1.0.tar.gz.
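Labeling genes with hierarchical classes commonly relies on the true-path rule: a gene annotated to a term is implicitly annotated to all of that term's ancestors. A minimal Python sketch over a toy directed acyclic graph follows; the term names are hypothetical placeholders, and this is not HCGene's implementation.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as term -> list of parent terms."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:  # iterative depth-first walk; each term is visited once
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate_labels(annotations, parents):
    """Apply the true-path rule: a gene annotated to a term is also
    labeled with every ancestor of that term."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }
```

Because a term can have several parents in the Gene Ontology, the walk tracks visited terms so that shared ancestors are added only once.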

13.
The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data in GEO can be challenging. We have developed GEOmetadb in an attempt to make querying the GEO metadata both easier and more powerful. All GEO metadata records, as well as the relationships between them, are parsed and stored in a local MySQL database. A powerful, flexible web search interface with several convenient utilities provides query capabilities not available via NCBI tools. In addition, a Bioconductor package, GEOmetadb, that uses a SQLite export of the entire GEOmetadb database is also available, making the entire GEO database accessible with the full power of SQL-based queries from within R. AVAILABILITY: The web interface and SQLite databases are available at http://gbnci.abcc.ncifcrf.gov/geo/. The Bioconductor package is available via the Bioconductor project. The corresponding MATLAB implementation is also available at the same website.
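Storing the relationships between records lets a single SQL join answer questions that span record types. The sketch below uses Python's `sqlite3` with hypothetical `series`, `sample`, and `series_sample` tables loosely modeled on GEO series and samples; it does not reproduce the GEOmetadb schema, and all identifiers are made up.

```python
import sqlite3


def demo_series_summary() -> list[tuple[str, str, int, int]]:
    """Count samples and distinct platforms per series in a toy database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE series (series_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE sample (sample_id TEXT PRIMARY KEY, platform_id TEXT);
        -- Link table: one sample can belong to more than one series.
        CREATE TABLE series_sample (series_id TEXT, sample_id TEXT);
        INSERT INTO series VALUES ('S1', 'Liver time course'), ('S2', 'Liver vs kidney');
        INSERT INTO sample VALUES ('M1', 'P1'), ('M2', 'P1'), ('M3', 'P2');
        INSERT INTO series_sample VALUES ('S1', 'M1'), ('S1', 'M2'), ('S2', 'M2'), ('S2', 'M3');
    """)
    rows = conn.execute("""
        SELECT s.series_id, s.title,
               COUNT(*) AS n_samples,
               COUNT(DISTINCT m.platform_id) AS n_platforms
        FROM series AS s
        JOIN series_sample AS ss ON ss.series_id = s.series_id
        JOIN sample AS m ON m.sample_id = ss.sample_id
        GROUP BY s.series_id, s.title
        ORDER BY s.series_id
    """).fetchall()
    conn.close()
    return rows
```

The link table is what allows the many-to-many relationship (sample M2 appears in both series) to be queried directly instead of being reassembled record by record.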

14.
15.
Celsius: a community resource for Affymetrix microarray data (citations: 1 total, 1 self-citation, 0 by others)

16.
beadarray: R classes and methods for Illumina bead-based data (citations: 2 total, 0 self-citations, 2 by others)
The R/Bioconductor package beadarray allows raw data from Illumina experiments to be read and stored in convenient R classes. Users are free to choose between various methods of image processing, background correction, and normalization in their analysis, rather than using the defaults in Illumina's proprietary software. The package also allows quality assessment to be carried out on the raw data. The data can then be summarized and stored in a format that can be used by other R/Bioconductor packages to perform downstream analyses. Summarized data processed by Illumina's BeadStudio software can also be read and analysed in the same manner. Availability: The beadarray package is available from the Bioconductor web page at www.bioconductor.org. A user's guide and example data sets are provided with the package.
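Quantile normalization is one widely used option for making arrays comparable before downstream analysis. The abstract does not say which normalization methods beadarray implements, so this dependency-free Python sketch illustrates the technique only and is not the package's code.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length arrays (one list of intensities per array).

    Each value is replaced by the mean, across arrays, of the values holding the
    same rank. Ties are broken by original position (no tie averaging).
    """
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(c) for c in columns]
    # Mean of the k-th smallest value across all arrays.
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for col in columns:
        # Positions of this array's values from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        result.append(out)
    return result
```

After normalization every array has exactly the same distribution of values, while each probe keeps its within-array rank.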

17.
samβada is genome-environment association software designed to search for signatures of local adaptation. However, pre- and postprocessing of data can be labour-intensive, preventing wider uptake of the method. We have now developed R.SamBada, an R package providing a pipeline for landscape genomic analysis based on samβada, spanning from the retrieval of environmental conditions at sampling locations to gene annotation using the Ensembl genome browser. As a result, R.SamBada standardizes the landscape genomics pipeline and eases the search for candidate genes of local adaptation, enhancing the reproducibility of landscape genomic studies. The efficiency and power of the pipeline are illustrated with two examples: sheep populations from Morocco with no evident population structure, and Lidia cattle from Spain displaying population substructuring. In both cases, R.SamBada enabled rapid identification and interpretation of candidate genes, which are further discussed in the light of local adaptation. The package is available in the R CRAN package repository and on GitHub (github.com/SolangeD/R.SamBada).

18.
19.
Dong Rencai, Wang Tao, Zhang Yonglin, Zhang Xueqi, Li Huanhuan. Acta Ecologica Sinica, 2018, 38(11): 3775-3783
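As China vigorously promotes urban sustainable development and the construction of national sustainable development experimental zones, a key problem is which assessment methods and data to use in evaluating urban sustainable development capacity. Metadata theory and technology, which have emerged in recent years, are regarded as an effective means of controlling the quality of assessment data. Based on the characteristics of several indicator systems currently used in China to assess urban sustainable development capacity, and on an in-depth analysis of each indicator's data source, acquisition means, applicable methods, and other characteristics, we propose a concrete, software-engineering-based method for developing a metadata management system for urban sustainable development capacity assessment, to help sustainable development experimental zones acquire and manage the data needed for assessment efficiently. Using the metadata specification established in the 12th Five-Year Plan National Key Technology R&D Program project "Research and Demonstration of Key Technologies for Urban Sustainable Development Capacity Assessment and Information Management", we take 14 items it contains as the key metadata items for assessment data: "data release date", "data release form", "spatial extent", "temporal extent (start time, end time)", "statistical frequency", "data security restriction level", "data lineage description", "online resource URL", and "statistical unit information (unit name, contact person, telephone, unit address, e-mail address)". These items are used to trace the benchmarked assessment data. Comparing the results of a quantitative data quality scoring method before and after applying this metadata tracing approach, we found that when the assessed indicators had metadata support, their scores for data reliability, comparability, and sustainability all improved markedly. We conclude that applying metadata theory offers advantages for controlling and ensuring the quality of urban sustainable development capacity assessment data, and that developing a dedicated metadata management system for such assessments can effectively improve the overall evaluation results of the assessment data.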
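Tracking the 14 key metadata items can be illustrated with a simple completeness check. The English key names are hypothetical renderings of the items listed above, and the fraction of filled items is a stand-in for illustration, not the quantitative data quality scoring method used in the study.

```python
# The 14 key metadata items from the abstract, under hypothetical English key names.
KEY_ITEMS = [
    "release_date", "release_form", "spatial_extent",
    "temporal_start", "temporal_end", "statistical_frequency",
    "security_level", "lineage", "online_resource_url",
    "unit_name", "contact_person", "contact_phone", "unit_address", "contact_email",
]


def metadata_completeness(record: dict) -> tuple[float, list[str]]:
    """Return the fraction of key items that are filled in, plus the missing items.

    Absent keys, None, and blank strings all count as missing.
    """
    missing = [k for k in KEY_ITEMS if not str(record.get(k) or "").strip()]
    return 1 - len(missing) / len(KEY_ITEMS), missing
```

Returning the list of missing items alongside the score tells a data manager which fields to chase, not only how incomplete a record is.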

20.
  1. Metadata plays an essential role in the long-term preservation, reuse, and interoperability of data. Nevertheless, creating useful metadata can be difficult enough, and so weakly incentivized, that many datasets may be accompanied by little or no metadata. One key challenge is, therefore, how to make metadata creation easier and more valuable. We present a solution that involves creating domain-specific metadata schemes that are as complex as necessary and as simple as possible. These goals are achieved by co-development between a metadata expert and the researchers (i.e., the data creators). The final product is a bespoke metadata scheme into which researchers can enter information (and validate it) via the simplest of interfaces: a web browser application and a spreadsheet.
  2. We provide the R package dmdScheme (dmdScheme: An R package for working with domain specific MetaData schemes (Version v0.9.22), 2019) for creating a template domain-specific scheme. We describe how to create a domain-specific scheme from this template, including the iterative co-development process, as well as simple methods for using the scheme and for quality assessment, improvement, and validation.
  3. The process of developing a metadata scheme following the outlined approach was successful, resulting in a metadata scheme which is used for the data generated in our research group. The validation quickly identifies forgotten metadata, as well as inconsistent metadata, therefore improving the quality of the metadata. Multiple output formats are available, including XML.
  4. Making the provision of metadata easier while also ensuring high quality must be a priority for data curation initiatives. We show how both objectives are achieved by close collaboration between metadata experts and researchers to create domain‐specific schemes. A near‐future priority is to provide methods to interface domain‐specific schemes with general metadata schemes, such as the Ecological Metadata Language, to increase interoperability.

The article describes a methodology for developing, entering, and validating domain-specific metadata schemes that is suitable for use by people who are not metadata specialists. The approach uses an R package as the backend for processing the metadata, uses spreadsheets to enter the metadata, and provides a server-based approach to distribute and use the developed metadata schemes.
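Validation against a domain-specific scheme can be sketched as checking each spreadsheet row for forgotten required values, values of the wrong type, and undefined columns. The scheme, field names, and messages below are hypothetical, and this Python sketch does not reproduce dmdScheme's R API.

```python
# Hypothetical domain-specific scheme: field name -> (expected type, required?)
SCHEME = {
    "experiment_id": (str, True),
    "temperature_c": (float, True),
    "replicate": (int, True),
    "notes": (str, False),
}


def validate(rows, scheme=SCHEME):
    """Check spreadsheet rows (dicts of cell strings) against a scheme.

    Reports columns the scheme does not define, forgotten (empty) required
    cells, and values that cannot be parsed as the expected type.
    """
    errors = []
    for n, row in enumerate(rows, start=1):
        for col in row:
            if col not in scheme:
                errors.append(f"row {n}: unknown column '{col}'")
        for field, (typ, required) in scheme.items():
            value = (row.get(field) or "").strip()
            if not value:
                if required:
                    errors.append(f"row {n}: missing required '{field}'")
                continue
            try:
                typ(value)  # e.g. float("warm") raises ValueError
            except ValueError:
                errors.append(f"row {n}: '{field}'={value!r} is not {typ.__name__}")
    return errors
```

Collecting every problem rather than stopping at the first one lets a researcher fix a whole spreadsheet in a single pass.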

