Similar Documents (20 results)
1.
The validity of material flow analyses (MFAs) depends on the available information base, that is, the quality and quantity of available data. MFA data are cross‐disciplinary, can have varying formats and qualities, and originate from heterogeneous sources, such as official statistics, scientific models, or expert estimations. Statistical methods for data evaluation are most often inadequate, because MFA data are typically isolated values rather than extensive data sets. In consideration of the properties of MFA data, a data characterization framework for MFA is presented. It consists of an MFA data terminology, a data characterization matrix, and a procedure for database analysis. The framework facilitates systematic data characterization by cell‐level tagging of data with data attributes. Data attributes represent data characteristics and metainformation regarding statistical properties, meaning, origination, and application of the data. The data characterization framework is illustrated in a case study of a national phosphorus budget. This work furthers understanding of the information basis of material flow systems, promotes the transparent documentation and precise communication of MFA input data, and can be the foundation for better data interpretation and comprehensive data quality evaluation.
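The cell-level tagging idea can be sketched as a small data structure in which every isolated MFA value carries its own attribute tags. A minimal sketch; the attribute names and example values below are illustrative assumptions, not the framework's actual terminology.

```python
from dataclasses import dataclass, field

@dataclass
class MFADatum:
    """One data cell of a material flow analysis, tagged with attributes.
    Attribute names here are hypothetical, not the paper's terminology."""
    value: float      # the isolated flow or stock value
    unit: str         # e.g. "kt P/yr"
    source: str       # e.g. "official statistics", "expert estimation"
    method: str       # how the value was obtained
    year: int         # reference year of the datum
    tags: dict = field(default_factory=dict)  # free-form metainformation

def summarize_sources(data):
    """Database analysis step: count data cells per source type."""
    counts = {}
    for d in data:
        counts[d.source] = counts.get(d.source, 0) + 1
    return counts

# Toy slice of a national phosphorus budget (made-up numbers).
budget = [
    MFADatum(12.4, "kt P/yr", "official statistics", "measurement", 2015),
    MFADatum(3.1, "kt P/yr", "scientific model", "model output", 2015),
    MFADatum(0.8, "kt P/yr", "expert estimation", "expert estimate", 2015),
]
print(summarize_sources(budget))
```

Tagging each cell rather than each data set is what lets heterogeneous, isolated values be documented and compared uniformly.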

2.
Biodiversity data sharing and publishing: progress and recommendations
Biodiversity research, conservation practice, natural resource management, and science-based decision making increasingly depend on the sharing and integration of large amounts of data. Although calls for data sharing and its practice are growing, many scientists still refuse, actively or passively, to share their data. Several cognitive and technical barriers stand in the way: scientists are unwilling to share for fear of competition from peers, perceive insufficient rewards, are unfamiliar with data repositories, lack simple data submission tools, or lack the time and funding. The key to resolving these problems and improving the culture of sharing is to give data providers appropriate rewards, such as data citation. Peer-reviewed data publishing is considered not only an incentive mechanism for scientists who produce, manage, and share data, but also an effective way to promote data reuse. Data publishing has therefore attracted considerable attention recently as one mode of data sharing, and journals dedicated to publishing data papers have appeared in the biodiversity field. In adopting the data-paper model, joint data policies between data repositories and journals may be a more feasible way to promote data sharing. This paper reviews progress in data sharing and publishing, discusses to what extent data papers can promote data sharing and how data sharing and data publishing relate, and offers the following recommendations: (1) individual scientists should strive to practice data sharing; (2) DOIs should be used to resolve data ownership and data citation; (3) journals and data repositories should jointly adopt more reasonable and rigorous data archiving policies; (4) funding agencies and research institutions should play a more important role in data sharing.

3.
Opportunities and challenges for ecological environment big data
刘丽香  张丽云  赵芬  赵苗苗  赵海凤  邵蕊  徐明 《生态学报》2017,37(14):4896-4904
With the arrival of the big data era and the rapid development of big data technology, the construction and application of ecological environment big data have begun to emerge. To promote these efforts comprehensively, this paper reviews the opportunities and advantages of big data for solving ecological and environmental problems and analyzes the challenges it faces in application. It summarizes the concept and characteristics of big data and, in light of the particular features of the ecological environment field, analyzes the specificity and complexity of ecological environment big data. The paper focuses on the opportunities big data offers for mitigating environmental pollution, ecological degradation, and climate change, and describes its advantages over traditional data in storage, processing, analysis, interpretation, and presentation, arguing that these advantages will help comprehensively improve decision making in ecological environment governance. Despite its broad application prospects, ecological environment big data still faces many problems and difficulties in data sharing and openness, application innovation, data management, technological innovation and deployment, training of professionals, and funding. On the basis of this analysis, future directions are proposed, including the standardization of all kinds of ecological environment data, the construction of platforms for storing, processing, and analyzing ecological environment big data, and the interconnection of ecological environment big data platforms at home and abroad.

4.
Challenges in using land use and land cover data for global change studies
Land use and land cover data play a central role in climate change assessments. These data originate from different sources and inventory techniques. Each source of land use/cover data has its own domain of applicability and quality standards. Often data are selected without explicitly considering the suitability of the data for the specific application, the bias originating from data inventory and aggregation, and the effects of the uncertainty in the data on the results of the assessment. Uncertainties due to data selection and handling can be in the same order of magnitude as uncertainties related to the representation of the processes under investigation. While acknowledging the differences in data sources and the causes of inconsistencies, several methods have been developed to optimally extract information from the data and document the uncertainties. These methods include data integration, improved validation techniques and harmonization of classification systems. Based on the data needs of global change studies and the data availability, recommendations are formulated aimed at optimal use of current data and focused efforts for additional data collection. These include: improved documentation using classification systems for land use/cover data; careful selection of data given the specific application and the use of appropriate scaling and aggregation methods. In addition, the data availability may be improved by the combination of different data sources to optimize information content while collection of additional data must focus on validation of available data sets and improved coverage of regions and land cover types with a high level of uncertainty. Specific attention in data collection should be given to the representation of land management (systems) and mosaic landscapes.

5.
MOTIVATION: The methods for analyzing overlap data are distinct from those for analyzing probe data, making integration of the two forms awkward. Conversion of overlap data to probe-like data elements would facilitate comparison and uniform integration of overlap data and probe data using software developed for analysis of STS data. RESULTS: We show that overlap data can be effectively converted to probe-like data elements by extracting maximal sets of mutually overlapping clones. We call these sets virtual probes, since each set determines a site in the genome corresponding to the region which is common among the clones of the set. Finding the virtual probes is equivalent to finding the maximal cliques of a graph. We modify a known maximal-clique algorithm such that it finds all virtual probes in a large dataset within minutes. We illustrate the algorithm by converting fingerprint and Alu-PCR overlap data to virtual probes. The virtual probes are then analyzed using double-linkage intersection graphs and structure graphs to show that methods designed for STS data are also applicable to overlap data represented as virtual probes. Next we show that virtual probes can produce a uniform integration of different kinds of mapping data, in particular STS probe data and fingerprint and Alu-PCR overlap data. The integrated virtual probes produce longer double-linkage contigs than STS probes alone, and in conjunction with structure graphs they facilitate the identification and elimination of anomalies. Thus, the virtual-probe technique provides: (i) a new way to examine overlap data; (ii) a basis on which to compare overlap data and probe data using the same systems and standards; and (iii) a unique and useful way to uniformly integrate overlap data with probe data.
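The core reduction, maximal sets of mutually overlapping clones as maximal cliques, can be sketched with the classic Bron-Kerbosch algorithm. This is a generic clique enumerator, not the paper's modified large-dataset algorithm, and the overlap graph below is a made-up example.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph given as
    {node: set(neighbours)} using basic Bron-Kerbosch (no pivoting).
    Each maximal set of mutually overlapping clones is one 'virtual probe'."""
    cliques = []
    def bk(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r is maximal: nothing extends it
            return
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    bk(set(), set(adj), set())
    return cliques

# Toy clone overlap graph: A, B, C mutually overlap; C also overlaps D.
overlaps = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
probes = maximal_cliques(overlaps)
print(sorted(sorted(c) for c in probes))  # [['A', 'B', 'C'], ['C', 'D']]
```

Each returned clique corresponds to one probe-like site, the genomic region common to all clones in the set.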

6.
In this paper, we propose a successive learning method in hetero-associative memories, such as Bidirectional Associative Memories and Multidirectional Associative Memories, using chaotic neural networks. It can distinguish unknown data from the stored known data and can learn the unknown data successively. The proposed model makes use of the difference in the response to the input data in order to distinguish unknown data from the stored known data. When input data is regarded as unknown data, it is memorized. Furthermore, the proposed model can estimate and learn correct data from noisy unknown data or incomplete unknown data by considering the temporal summation of the continuous data input. In addition, similarities to the physiological findings of Freeman in the olfactory bulb of the rabbit are observed in the behavior of the proposed model. A series of computer simulations shows the effectiveness of the proposed model.
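For readers unfamiliar with hetero-associative memories, here is a minimal sketch of standard (Kosko-style) BAM training and recall, the structure the paper builds on. The chaotic dynamics, unknown-data detection, and successive learning of the proposed model are beyond this sketch, and the stored pattern pairs are arbitrary examples.

```python
def sign(v):
    # Bipolar threshold; ties resolve to +1 in this sketch.
    return 1 if v >= 0 else -1

def bam_train(pairs, n, m):
    """Hebbian weight matrix: W[i][j] = sum over pairs of x[i] * y[j]."""
    W = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += x[i] * y[j]
    return W

def bam_recall(W, x):
    """One forward pass: y_j = sign(sum_i x_i * W[i][j])."""
    return [sign(sum(x[i] * W[i][j] for i in range(len(x))))
            for j in range(len(W[0]))]

pairs = [((1, 1, 1, -1), (1, -1)), ((1, -1, -1, 1), (-1, 1))]
W = bam_train(pairs, 4, 2)
print(bam_recall(W, (1, 1, 1, -1)))   # [1, -1]
print(bam_recall(W, (1, -1, -1, 1)))  # [-1, 1]
```

A full BAM iterates forward and backward passes until the pair stabilizes; one forward pass suffices here because the stored pairs are well separated.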

7.
8.
Data Quality     
A methodology is presented to develop and analyze vectors of data quality attribute scores. Each data quality vector component represents the quality of the data element for a specific attribute (e.g., age of data). Several methods for aggregating the components of data quality vectors to derive one data quality indicator (DQI) that represents the total quality associated with the input data element are presented with illustrative examples. The methods are compared and it is proven that the measure of central tendency, or arithmetic average, of the data quality vector components as a percentage of the total quality range attainable is an equivalent measure for the aggregate DQI. In addition, the methodology is applied and compared to real-world LCA data pedigree matrices. Finally, a method for aggregating weighted data quality vector attributes is developed and an illustrative example is presented. This methodology provides LCA practitioners with an approach to increase the precision of input data uncertainty assessments by selecting any number of data quality attributes with which to score the LCA inventory model input data. The resultant vector of data quality attributes can then be analyzed to develop one aggregate DQI for each input data element for use in stochastic LCA modeling.
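The aggregation the abstract describes, the (optionally weighted) mean of the attribute scores expressed as a fraction of the attainable quality range, can be sketched directly. The 1-5 pedigree-style scale, the example scores, and the weights are illustrative assumptions.

```python
def aggregate_dqi(scores, score_range=(1, 5), weights=None):
    """Collapse a vector of data quality attribute scores into one DQI:
    the weighted mean score as a fraction of the attainable range.
    Equal weights reproduce the plain arithmetic-average measure."""
    lo, hi = score_range
    if weights is None:
        weights = [1.0] * len(scores)
    wmean = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return (wmean - lo) / (hi - lo)

# e.g. pedigree-style scores for reliability, completeness, age of data
print(aggregate_dqi([2, 4, 3]))                    # mean 3 on a 1-5 scale -> 0.5
print(aggregate_dqi([2, 4, 3], weights=[2, 1, 1])) # reliability counts double
```

The resulting per-element DQI in [0, 1] can then parameterize input uncertainty in a stochastic LCA model.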

9.
Effects of censoring on parameter estimates and power in genetic modeling.
Genetic and environmental influences on variance in phenotypic traits may be estimated with normal theory Maximum Likelihood (ML). However, when the assumption of multivariate normality is not met, this method may result in biased parameter estimates and incorrect likelihood ratio tests. We simulated multivariate normal distributed twin data under the assumption of three different genetic models. Genetic model fitting was performed in six data sets: multivariate normal data, discrete uncensored data, censored data, square root transformed censored data, normal scores of censored data, and categorical data. Estimates were obtained with normal theory ML (data sets 1-5) and with categorical data analysis (data set 6). Statistical power was examined by fitting reduced models to the data. When fitting an ACE model to censored data, an unbiased estimate of the additive genetic effect was obtained. However, the common environmental effect was underestimated and the unique environmental effect was overestimated. Transformations did not remove this bias. When fitting an ADE model, the additive genetic effect was underestimated while the dominant and unique environmental effects were overestimated. In all models, the correct parameter estimates were recovered with categorical data analysis. However, with categorical data analysis, the statistical power decreased. The analysis of L-shaped distributed data with normal theory ML results in biased parameter estimates. Unbiased parameter estimates are obtained with categorical data analysis, but the power decreases.
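The basic effect, that censoring an approximately normal trait compresses the distribution and biases naive variance estimates, can be reproduced in a few lines. The censoring threshold, sample size, and seed below are arbitrary illustrative choices, not the study's simulation design.

```python
import random
import statistics

random.seed(42)
trait = [random.gauss(0.0, 1.0) for _ in range(5000)]      # normal phenotype
censored = [max(v, 0.0) for v in trait]                    # left-censor at 0
# Half the values pile up at the floor -> an L-shaped distribution.

var_full = statistics.pvariance(trait)
var_cens = statistics.pvariance(censored)
print(round(var_full, 3), round(var_cens, 3))
assert var_cens < var_full  # censoring shrinks the apparent variance
```

Any variance-decomposition method that treats the censored values as normal inherits this compression, which is why the abstract's transformations fail to remove the bias while threshold-aware categorical analysis recovers it.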

10.
Analysis of repeatability in spotted cDNA microarrays
We report a strategy for analysis of data quality in cDNA microarrays based on the repeatability of repeatedly spotted clones. We describe how repeatability can be used to control data quality by developing adaptive filtering criteria for microarray data containing clones spotted in multiple spots. We have applied the method on five publicly available cDNA microarray data sets and one previously unpublished data set from our own laboratory. The results demonstrate the feasibility of the approach as a foundation for data filtering, and indicate a high degree of variation in data quality, both across the data sets and between arrays within data sets.
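A repeatability filter of this kind can be sketched as follows; the log-ratio values and the fixed 0.3 cutoff are made-up numbers, whereas the paper develops adaptive criteria rather than a fixed threshold.

```python
import statistics

def filter_by_repeatability(spot_values, max_sd=0.3):
    """Keep clones whose replicate spot measurements agree (standard
    deviation across spots below a cutoff); return their mean signal."""
    kept = {}
    for clone, values in spot_values.items():
        if statistics.stdev(values) <= max_sd:
            kept[clone] = statistics.mean(values)
    return kept

# Hypothetical log-ratios for clones spotted in triplicate.
spots = {
    "cloneA": [1.10, 1.05, 0.98],   # replicates agree -> kept
    "cloneB": [0.20, 1.40, -0.60],  # noisy replicates -> filtered out
}
print(filter_by_repeatability(spots))
```

Tuning `max_sd` per array, for example from the observed distribution of replicate SDs, turns this fixed rule into an adaptive one in the spirit of the paper.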

11.
李俊洁  黄晓磊 《生物多样性》2016,24(12):1317-959
Calls for sharing scientific data have grown in recent years, peer-reviewed biodiversity data papers have received increasing attention, and several data journals dedicated to publishing data papers have emerged. This paper reviews recent progress in biodiversity data publishing and, for two representative data journals (Biodiversity Data Journal and Scientific Data), analyzes the number of articles published since each journal's launch, the taxa covered, article page views, and citation counts. The results show that the number of papers in both journals has grown steadily; their biodiversity data papers cover many taxa across the animal, plant, and fungal kingdoms; and page views and citation counts are encouraging, indicating that data papers are being accepted by more and more researchers. An analysis of author nationality reveals imbalances among regions in publishing biodiversity data papers and sharing data. We suggest that Chinese researchers and journals in relevant fields pay attention to biodiversity data papers and data sharing policies and practice data sharing more actively.

12.
The rapid development of biological sequencing technology has produced massive amounts of biological data. Biological data resources are the core and source of biological analysis, research, and applications, so standardized management of these resources is important and urgent to ensure that the data are correct, usable, and secure. This paper reviews progress in biological data standardization in China and abroad. At present there is no overall plan for biological data anywhere; large semantic incompatibilities exist among biological data; data formats are highly diverse; and unified standards for collecting, processing, storing, and sharing biological data are lacking. Biological data standardization is still in its infancy worldwide, but experts in many countries are working hard on standards development. Finally, the paper discusses standards development with respect to biological data terminology; the collection, processing, and exchange of biological data resources; data storage; biological database construction; and ethical norms for biological data, in the hope of providing a reference and basis for formulating biological data standards.

13.
In spite of its importance, no systematic and comprehensive quality assurance (QA) program is available for radiation oncology information systems (ROIS) to verify clinical and treatment data integrity and to mitigate the risks of data errors, corruption, and loss. Based on data organization, format and purpose, data in ROISs falls into five categories: (1) the ROIS relational database and associated files; (2) the ROIS DICOM data stream; (3) treatment machine beam data and machine configuration data; (4) electronic medical record (EMR) documents; and (5) user-generated clinical and treatment reports from the ROIS. For each data category, this framework proposes a corresponding QA strategy to verify data integrity. This approach verified every bit of data in the ROIS, including billions of data records in the ROIS SQL database, tens of millions of ROIS database-associated files, tens of thousands of DICOM data files for a group of selected patients, almost half a million EMR documents, and tens of thousands of machine configuration files and beam data files. The framework has been validated through intentional modifications with test patient data. Despite the 'big data' nature of ROIS, the multiprocess and multithread nature of our QA tools enabled the whole ROIS data QA process to be completed within hours without clinical interruptions. The QA framework suggested in this study proved to be robust, efficient and comprehensive without labor-intensive manual checks and has been implemented for our routine ROIS QA and ROIS upgrades.
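One generic building block for file-level integrity verification of this kind is checksum snapshotting: hash every file before and after an operation (such as an upgrade) and diff the snapshots. This is a sketch of the general technique, not the paper's category-specific framework.

```python
import hashlib
import os

def snapshot_checksums(root):
    """Map every file under `root` (relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                rel = os.path.relpath(path, root)
                digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests

def diff_snapshots(before, after):
    """Report files added, removed, or modified between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(p for p in before.keys() & after.keys()
                      if before[p] != after[p])
    return added, removed, modified
```

Run `snapshot_checksums` before a ROIS upgrade, again after, and any nonempty `diff_snapshots` result flags files needing investigation; database records and DICOM streams require their own record-level checks.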

14.
Human geneticists are increasingly turning to study designs based on very large sample sizes to overcome difficulties in studying complex disorders. This in turn almost always requires multi-site data collection and processing of data through centralized repositories. While such repositories offer many advantages, including the ability to return to previously collected data to apply new analytic techniques, they also have some limitations. To illustrate, we reviewed data from seven older schizophrenia studies available from the NIMH-funded Center for Collaborative Genomic Studies on Mental Disorders, also known as the Human Genetics Initiative (HGI), and assessed the impact of data cleaning and regularization on linkage analyses. Extensive data regularization protocols were developed and applied to both genotypic and phenotypic data. Genome-wide nonparametric linkage (NPL) statistics were computed for each study, over various stages of data processing. To assess the impact of data processing on aggregate results, Genome-Scan Meta-Analysis (GSMA) was performed. Examples of increased, reduced and shifted linkage peaks were found when comparing linkage results based on original HGI data to results using post-processed data within the same set of pedigrees. Interestingly, reducing the number of affected individuals tended to increase rather than decrease linkage peaks. But most importantly, while the effects of data regularization within individual data sets were small, GSMA applied to the data in aggregate yielded a substantially different picture after data regularization. These results have implications for analyses based on other types of data (e.g., case-control GWAS or sequencing data) as well as data obtained from other repositories.

15.
Increasing numbers of whole-genome sequences are available, but to interpret them fully requires more than listing all genes. Genome databases are faced with the challenges of integrating heterogeneous data and enabling data mining. In comparison to a data warehousing approach, where integration is achieved through replication of all relevant data in a unified schema, distributed approaches provide greater flexibility and maintainability. These are important in a field where new data is generated rapidly and our understanding of the data changes. Interoperability between distributed data sources allows data maintenance to be separated from integration and analysis. Simple ways to access the data can facilitate the development of new data mining tools and the transition from model genome analysis to comparative genomics. With the MIPS Arabidopsis thaliana genome database (MAtDB, http://mips.gsf.de/proj/thal/db) our aim is to go beyond a data repository towards creating an integrated knowledge resource. To this end, the Arabidopsis genome has been a backbone against which to structure and integrate heterogeneous data. The challenges to be met are continuous updating of data, the design of flexible data models that can evolve with new data, the integration of heterogeneous data, e.g. through the use of ontologies, comprehensive views and visualization of complex information, simple interfaces for application access locally or via the Internet, and knowledge transfer across species.

16.
Previous phylogenetic analyses of caecilian neuroanatomical data yield results that are difficult to reconcile with those based upon more traditional morphological and molecular data. A review of the literature reveals problems in both the analyses and the data upon which the analyses were based. Revision of the neuroanatomical data resolves some, but not all, of these problems and yields a data set that, based on comparative measures of data quality, appears to represent some improvement over previous treatments. An extended data set of more traditional primarily morphological data is developed to facilitate the evaluation of caecilian relationships and the quality and utility of neuroanatomical and more traditional data. Separate and combined analyses of the neuroanatomical and traditional data produce a variety of results dependent upon character weighting, with little congruence among the results of the separate analyses and little support for relationships among the 'higher' caecilians with the combined data. Randomization tests indicate that: (1) there is significantly less incompatibility within each data set than that expected by chance alone; (2) the between-data-set incompatibility is significantly greater than that expected for random partitions of characters, so the two data sets are significantly heterogeneous; (3) the neuroanatomical data appear generally of lower quality than the traditional data; (4) the neuroanatomical data are more compatible with the traditional data than are phylogenetically uninformative data. The lower quality of the neuroanatomical data may reflect small sample sizes. In addition, a subset of the neuroanatomical characters supports an unconventional grouping of all those caecilians with the most rudimentary eyes, which may reflect concerted homoplasy.
Although the neuroanatomical data may be of lower quality than the traditional data, their compatibility with the traditional data suggests that they cannot be dismissed as phylogenetically meaningless. Conclusions on caecilian relationships are constrained by the conflict between the neuroanatomical and traditional data, the sensitivity of the combined analyses to weighting schemes, and by the limited support for the majority of groups in the majority of the analyses. Those hypotheses that are well supported are uncontroversial, although some have not been tested previously by numerical phylogenetic analyses. However, the data do not justify a hypothesis of 'higher' caecilian phylogeny that is both well resolved and well supported.

17.
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing- and data-intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
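The MapReduce pattern the framework relies on can be illustrated with a single-process sketch: map each observation to a key-value pair, then reduce all values sharing a key. The 1-degree gridding, the variable, and the sample records are illustrative assumptions; a real deployment would run the same two phases across a cluster.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (grid_cell, value) for each (lat, lon, value) observation."""
    for lat, lon, value in records:
        yield ((round(lat), round(lon)), value)  # bin into 1-degree cells

def reduce_phase(pairs):
    """Reduce: average the values gathered for each grid cell."""
    groups = defaultdict(list)
    for cell, value in pairs:
        groups[cell].append(value)
    return {cell: sum(v) / len(v) for cell, v in groups.items()}

# Hypothetical point observations: (latitude, longitude, measurement).
obs = [(10.2, 20.1, 1.0), (10.4, 19.8, 3.0), (45.0, -70.0, 7.0)]
print(reduce_phase(map_phase(obs)))
```

Because the map step is stateless and the reduce step only sees one key's values, both phases parallelize naturally, which is the property the framework exploits.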

18.
Ecosystem assessment based on ecosystem service functions is an important foundation for identifying ecological and environmental problems, carrying out ecosystem restoration and biodiversity conservation, and establishing ecological compensation mechanisms; it is also an important element of safeguarding national ecological security and advancing ecological civilization. Ecosystem assessment involves many aspects of ecosystems and must be supported by multi-element, multi-type, multi-scale ecosystem observation data. Ground observation data and remote sensing data are the two main data sources for ecosystem assessment, but their use is often hampered by inconsistent observation standards, incomplete coverage of observed variables, insufficient temporal continuity, and scale mismatches, which add great uncertainty to assessments. How to fuse observation data across scales to quantify ecosystem service functions is the key to accurate ecosystem assessment. Starting from observation scales, this paper describes the characteristics and problems of ground observation data, near-surface remote sensing data, airborne remote sensing data, and satellite remote sensing data; reviews the common methods for fusing these data sources; and, taking key ecosystem parameters such as productivity, carbon sequestration capacity, and biodiversity as examples, introduces the multi-source data fusion system of the project "Ecosystem assessment technology based on multi-source data fusion and its application". Finally, it summarizes the multi-source data fusion system for ecosystem assessment and points out future directions for this research.

19.
Ecological data can be difficult to collect, and as a result, some important temporal ecological datasets contain irregularly sampled data. Since many temporal modelling techniques require regularly spaced data, one common approach is to linearly interpolate the data, and build a model from the interpolated data. However, this process introduces an unquantified risk that the data is over-fitted to the interpolated (and hence more typical) instances. Using one such irregularly-sampled dataset, the Lake Kasumigaura algal dataset, we compare models built on the original sample data, and on the interpolated data, to evaluate the risk of mis-fitting based on the interpolated data.
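The preprocessing step the abstract warns about, regularizing an irregular series by linear interpolation, looks like this. The sampling times and values are made up; the point is that interpolated instances are manufactured from their neighbours, so hold-out evaluation should score the model only on originally observed samples.

```python
from bisect import bisect_left

def linear_interp(times, values, t):
    """Linearly interpolate an irregularly sampled series at time t,
    clamping to the endpoints outside the observed range."""
    i = bisect_left(times, t)
    if i == 0:
        return values[0]
    if i == len(times):
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

times = [0, 3, 4, 10]          # irregular sampling days (hypothetical)
values = [2.0, 8.0, 6.0, 6.0]  # e.g. algal density readings
regular = [linear_interp(times, values, t) for t in range(0, 11)]
print(regular[2])  # day 2 is interpolated, not observed: 6.0
```

A model trained and tested on `regular` is partly scored on values the interpolation invented, which is exactly the over-fitting risk the paper quantifies.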

20.
Background: Single-cell RNA sequencing (scRNA-seq) is an emerging technology that enables high resolution detection of heterogeneities between cells. One important application of scRNA-seq data is to detect differential expression (DE) of genes. Currently, some researchers still use DE analysis methods developed for bulk RNA-seq data on single-cell data, and new methods designed for scRNA-seq data have also been developed. Bulk and single-cell RNA-seq data have different characteristics, so a systematic evaluation of the two types of methods on scRNA-seq data is needed. Results: In this study, we conducted a series of experiments on scRNA-seq data to quantitatively evaluate 14 popular DE analysis methods, including both traditional methods developed for bulk RNA-seq data and new methods specifically designed for scRNA-seq data. We obtained observations and recommendations for the methods under different situations. Conclusions: DE analysis methods for scRNA-seq data should be chosen with great caution, with regard to the characteristics of the data. Different strategies should be adopted for data with different sample sizes and/or different strengths of the expected signals. Several methods for scRNA-seq data show advantages in some aspects, and DEGSeq tends to outperform other methods with respect to consistency, reproducibility and accuracy of predictions on scRNA-seq data.
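To make "DE analysis" concrete for one gene, here is a generic exact permutation test on the difference in mean expression between two groups of cells. This is shown only as an illustration of the task; it is not one of the 14 methods the study evaluates, and the expression values are made up.

```python
import itertools
import statistics

def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation test on the difference in means:
    the fraction of group relabelings whose mean difference is at
    least as extreme as the observed one."""
    pooled = list(group_a) + list(group_b)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    n_a, count, total = len(group_a), 0, 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        if abs(statistics.mean(a) - statistics.mean(b)) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total

# Hypothetical expression of one gene in two cell groups.
print(permutation_pvalue([0.1, 0.2, 0.3], [5.0, 5.1, 5.2]))
```

Real scRNA-seq DE methods additionally model dropouts, overdispersion, and library-size differences, which is precisely why the bulk-vs-single-cell comparison in this study matters.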

