Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Opportunities and challenges facing eco-environmental big data   Total citations: 2, self-citations: 0, citations by others: 2
刘丽香  张丽云  赵芬  赵苗苗  赵海凤  邵蕊  徐明 《生态学报》2017,37(14):4896-4904
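With the arrival of the big data era and the rapid development of big data technology, the construction and application of eco-environmental big data have begun to take shape. To advance this work, the paper reviews the opportunities and advantages that eco-environmental big data offer for solving ecological and environmental problems and analyzes the challenges they face in application. It summarizes the concept and characteristics of big data and, in light of the features of the ecological and environmental field, analyzes the particularity and complexity of eco-environmental big data. It focuses on the opportunities that such data offer for mitigating environmental pollution, ecological degradation, and climate change, and describes their advantages over traditional data in storage, processing, analysis, interpretation, and presentation, arguing that these advantages will help raise the overall level of integrated decision-making in ecological and environmental governance. Although the application prospects are broad, many challenges remain, with numerous problems and difficulties in data sharing and openness, application innovation, data management, technological innovation and deployment, training of specialists, and funding. On this basis, the paper proposes future directions for eco-environmental big data, including standardizing the various kinds of ecological and environmental data, building platforms for storing, processing, and analyzing eco-environmental big data, and promoting the interconnection of domestic and international eco-environmental big data platforms.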

2.
Challenges in using land use and land cover data for global change studies   Total citations: 5, self-citations: 0, citations by others: 5
Land use and land cover data play a central role in climate change assessments. These data originate from different sources and inventory techniques. Each source of land use/cover data has its own domain of applicability and quality standards. Often data are selected without explicitly considering the suitability of the data for the specific application, the bias originating from data inventory and aggregation, and the effects of the uncertainty in the data on the results of the assessment. Uncertainties due to data selection and handling can be in the same order of magnitude as uncertainties related to the representation of the processes under investigation. While acknowledging the differences in data sources and the causes of inconsistencies, several methods have been developed to optimally extract information from the data and document the uncertainties. These methods include data integration, improved validation techniques and harmonization of classification systems. Based on the data needs of global change studies and the data availability, recommendations are formulated aimed at optimal use of current data and focused efforts for additional data collection. These include: improved documentation using classification systems for land use/cover data; careful selection of data given the specific application and the use of appropriate scaling and aggregation methods. In addition, the data availability may be improved by the combination of different data sources to optimize information content while collection of additional data must focus on validation of available data sets and improved coverage of regions and land cover types with a high level of uncertainty. Specific attention in data collection should be given to the representation of land management (systems) and mosaic landscapes.  相似文献   

3.
In this paper, we propose a successive learning method for hetero-associative memories, such as Bidirectional Associative Memories and Multidirectional Associative Memories, using chaotic neural networks. It can distinguish unknown data from the stored known data and can learn the unknown data successively. The proposed model uses the difference in its response to the input data to distinguish unknown data from the stored known data. When input data are regarded as unknown, they are memorized. Furthermore, the proposed model can estimate and learn correct data from noisy or incomplete unknown data by considering the temporal summation of the continuous data input. In addition, similarities to the physiological findings in the rabbit olfactory bulb reported by Freeman are observed in the behavior of the proposed model. A series of computer simulations shows the effectiveness of the proposed model.

4.
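Biodiversity data sharing and publishing: progress and recommendations   Total citations: 1, self-citations: 0, citations by others: 1
Biodiversity research, conservation practice, natural resource management, and science-based decision-making increasingly depend on the sharing and integration of large amounts of data. Although calls for and examples of data sharing are growing, many scientists still decline to share data, whether actively or passively. In practice, data sharing faces cognitive and technical barriers: scientists are unwilling to share data, worry about competition from peers, feel the rewards are insufficient, are unfamiliar with the relevant data repositories, lack simple data submission tools, or lack the time and funding. The key to solving these problems and improving the culture of sharing is to give those who share appropriate credit, such as data citation. Peer-reviewed data publication is considered not only to provide an incentive for scientists who produce, manage, and share data, but also to promote data reuse effectively. As one form of data sharing, data publication has therefore attracted considerable attention recently, and journals dedicated to publishing data papers have appeared in the biodiversity field. Within the data paper model, joint data policies adopted by data repositories and scientific journals may be a more feasible way to promote data sharing. This paper summarizes progress in data sharing and publication, discusses the extent to which data papers can promote data sharing and the relationship between data sharing and data publication, and makes the following recommendations: (1) individual scientists should strive to practice data sharing; (2) DOIs should be used to address data ownership and data citation; (3) scientific journals and data repositories should jointly adopt more reasonable and stricter data archiving policies; (4) funding agencies and research institutions should play a more important role in data sharing.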

5.
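With the rapid development of biological sequencing technology, massive amounts of biological data have accumulated. Because biological data resources are the core and source of biological analysis, research, and application, standardized management of these resources is important and urgent for ensuring that the data are correct, usable, and secure. This paper reviews progress in developing biological data standards in China and abroad. At present there is no overall plan for biological data either in China or internationally; biological data semantics are widely incompatible, data formats vary greatly, and unified standards are lacking for collecting, processing, storing, and sharing biological data. Standardization of biological data is at an early stage everywhere, but biologists in many countries are working to develop standards. Finally, the paper discusses standards development in turn from the perspectives of biological data terminology; the collection, processing, exchange, and storage of biological data resources; biological database construction; and ethical norms for biological data, in the hope of providing a reference and basis for setting biological data standards.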

6.
Effects of censoring on parameter estimates and power in genetic modeling.   Total citations: 5, self-citations: 0, citations by others: 5
Genetic and environmental influences on variance in phenotypic traits may be estimated with normal theory Maximum Likelihood (ML). However, when the assumption of multivariate normality is not met, this method may result in biased parameter estimates and incorrect likelihood ratio tests. We simulated multivariate normal distributed twin data under the assumption of three different genetic models. Genetic model fitting was performed in six data sets: multivariate normal data, discrete uncensored data, censored data, square root transformed censored data, normal scores of censored data, and categorical data. Estimates were obtained with normal theory ML (data sets 1-5) and with categorical data analysis (data set 6). Statistical power was examined by fitting reduced models to the data. When fitting an ACE model to censored data, an unbiased estimate of the additive genetic effect was obtained. However, the common environmental effect was underestimated and the unique environmental effect was overestimated. Transformations did not remove this bias. When fitting an ADE model, the additive genetic effect was underestimated while the dominant and unique environmental effects were overestimated. In all models, the correct parameter estimates were recovered with categorical data analysis. However, with categorical data analysis, the statistical power decreased. The analysis of L-shaped distributed data with normal theory ML results in biased parameter estimates. Unbiased parameter estimates are obtained with categorical data analysis, but the power decreases.  相似文献   

7.
MOTIVATION: The methods for analyzing overlap data are distinct from those for analyzing probe data, making integration of the two forms awkward. Conversion of overlap data to probe-like data elements would facilitate comparison and uniform integration of overlap data and probe data using software developed for analysis of STS data. RESULTS: We show that overlap data can be effectively converted to probe-like data elements by extracting maximal sets of mutually overlapping clones. We call these sets virtual probes, since each set determines a site in the genome corresponding to the region which is common among the clones of the set. Finding the virtual probes is equivalent to finding the maximal cliques of a graph. We modify a known maximal-clique algorithm such that it finds all virtual probes in a large dataset within minutes. We illustrate the algorithm by converting fingerprint and Alu-PCR overlap data to virtual probes. The virtual probes are then analyzed using double-linkage intersection graphs and structure graphs to show that methods designed for STS data are also applicable to overlap data represented as virtual probes. Next we show that virtual probes can produce a uniform integration of different kinds of mapping data, in particular STS probe data and fingerprint and Alu-PCR overlap data. The integrated virtual probes produce longer double-linkage contigs than STS probes alone, and in conjunction with structure graphs they facilitate the identification and elimination of anomalies. Thus, the virtual-probe technique provides: (i) a new way to examine overlap data; (ii) a basis on which to compare overlap data and probe data using the same systems and standards; and (iii) a unique and useful way to uniformly integrate overlap data with probe data.  相似文献   
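The equivalence stated in this abstract (virtual probes are the maximal cliques of the clone-overlap graph) can be illustrated with a minimal sketch. It uses the textbook Bron-Kerbosch algorithm with pivoting, which is an assumption made for illustration; the abstract does not say which maximal-clique algorithm the authors modified. The clone names and the helper names `overlap_graph` and `maximal_cliques` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def overlap_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from pairwise clone overlaps."""
    adj: Dict[str, Set[str]] = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            if r:  # skip the empty clique of an empty graph
                cliques.append(set(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)  # v is fully explored: move it from candidates
            x.add(v)     # to the excluded set

    expand(set(), set(adj), set())
    return cliques


# Hypothetical overlap data: c1, c2, c3 overlap mutually; c4 overlaps only c3.
overlaps = [("c1", "c2"), ("c1", "c3"), ("c2", "c3"), ("c3", "c4")]
virtual_probes = maximal_cliques(overlap_graph(overlaps))
print(sorted(sorted(probe) for probe in virtual_probes))
```

Each returned set is a maximal group of mutually overlapping clones, which is what the abstract calls a virtual probe. Clones with no recorded overlap never enter the graph in this sketch and would need to be added as isolated vertices if they should count as single-clone probes.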

8.
李俊洁  黄晓磊 《生物多样性》2016,24(12):1317-959
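Calls for sharing scientific data have grown louder in recent years, peer-reviewed biodiversity data papers have attracted increasing attention, and several data journals dedicated to publishing data papers have appeared. This paper summarizes recent progress in biodiversity data publication and, for two representative data journals (Biodiversity Data Journal and Scientific Data), analyzes the number of papers published since launch, the taxa covered, article views, and citation counts. The results show that the output of both journals is growing steadily, that their biodiversity data papers cover many taxa across the animal, plant, and fungal kingdoms, and that article views and citations are encouraging, indicating that data papers are being accepted by more and more researchers. An analysis of authors' countries reveals an imbalance among regions in publishing biodiversity data papers and in data sharing. The authors recommend that Chinese researchers and journals in related fields pay attention to biodiversity data papers and data sharing policies and practice data sharing more widely.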

9.
Analysis of repeatability in spotted cDNA microarrays   Total citations: 7, self-citations: 3, citations by others: 4
We report a strategy for analysis of data quality in cDNA microarrays based on the repeatability of repeatedly spotted clones. We describe how repeatability can be used to control data quality by developing adaptive filtering criteria for microarray data containing clones spotted in multiple spots. We have applied the method on five publicly available cDNA microarray data sets and one previously unpublished data set from our own laboratory. The results demonstrate the feasibility of the approach as a foundation for data filtering, and indicate a high degree of variation in data quality, both across the data sets and between arrays within data sets.  相似文献   

10.
In spite of its importance, no systematic and comprehensive quality assurance (QA) program is available for radiation oncology information systems (ROIS) to verify clinical and treatment data integrity and mitigate the risks of data error, corruption, and loss. Based on data organization, format and purpose, data in ROISs fall into five different categories: (1) the ROIS relational database and associated files; (2) the ROIS DICOM data stream; (3) treatment machine beam data and machine configuration data; (4) electronic medical record (EMR) documents; and (5) user-generated clinical and treatment reports from the ROIS. For each data category, this framework proposes a corresponding data QA strategy to verify data integrity. This approach verified every bit of data in the ROIS, including billions of data records in the ROIS SQL database, tens of millions of ROIS database-associated files, tens of thousands of DICOM data files for a group of selected patients, almost half a million EMR documents, and tens of thousands of machine configuration files and beam data files. The framework has been validated through intentional modifications with test patient data. Despite the ‘big data’ nature of ROIS, the multiprocess and multithread nature of our QA tools enabled the whole ROIS data QA process to be completed within hours without clinical interruptions. The QA framework suggested in this study proved to be robust, efficient and comprehensive without labor-intensive manual checks and has been implemented for our routine ROIS QA and ROIS upgrades.
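One plausible way to check file-level integrity in parallel is sketched below. It is only an assumption about how such a check might look: the abstract does not describe the authors' QA tools, and the use of SHA-256 checksums, a thread pool, and the function names `build_manifest` and `changed_files` are all illustrative choices, not the paper's method.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: Iterable[str], workers: int = 8) -> Dict[str, str]:
    """Record a baseline checksum for every file (assumed QA baseline step)."""
    path_list = [str(p) for p in paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(path_list, pool.map(sha256_of, path_list)))


def changed_files(manifest: Dict[str, str], workers: int = 8) -> List[str]:
    """Return files whose current checksum differs from the baseline.

    Files that can no longer be read are reported as changed as well.
    """
    def check(item):
        path, expected = item
        try:
            return path if sha256_of(path) != expected else None
        except OSError:
            return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(check, manifest.items())
        return sorted(p for p in results if p is not None)
```

A baseline manifest would be built before an upgrade and compared afterwards; any path returned by `changed_files` marks a file to inspect. Database records and DICOM streams, which the abstract lists as separate categories, would need their own checks and are outside this sketch.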

11.
To explore the feasibility of parsimony analysis for large data sets, we conducted heuristic parsimony searches and bootstrap analyses on separate and combined DNA data sets for 190 angiosperms and three outgroups. Separate data sets of 18S rDNA (1,855 bp), rbcL (1,428 bp), and atpB (1,450 bp) sequences were combined into a single matrix 4,733 bp in length. Analyses of the combined data set show great improvements in computer run times compared to those of the separate data sets and of the data sets combined in pairs. Six searches of the 18S rDNA + rbcL + atpB data set were conducted; in all cases TBR branch swapping was completed, generally within a few days. In contrast, TBR branch swapping was not completed for any of the three separate data sets, or for the pairwise combined data sets. These results illustrate that it is possible to conduct a thorough search of tree space with large data sets, given sufficient signal. In this case, and probably most others, sufficient signal for a large number of taxa can only be obtained by combining data sets. The combined data sets also have higher internal support for clades than the separate data sets, and more clades receive bootstrap support of ≥ 50% in the combined analysis than in analyses of the separate data sets. These data suggest that one solution to the computational and analytical dilemmas posed by large data sets is the addition of nucleotides, as well as taxa.

12.
Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.  相似文献   

13.
Modelling data uncertainty is not common practice in life cycle inventories (LCI), although different techniques are available for estimating and expressing uncertainties, and for propagating the uncertainties to the final model results. To clarify and stimulate the use of data uncertainty assessments in common LCI practice, the SETAC working group ‘Data Availability and Quality’ presents a framework for data uncertainty assessment in LCI. Data uncertainty is divided into two categories: (1) lack of data, further specified as complete lack of data (data gaps) and a lack of representative data, and (2) data inaccuracy. Filling data gaps can be done by input-output modelling, using information for similar products or the main ingredients of a product, and applying the law of mass conservation. Lack of temporal, geographical and further technological correlation between the data used and needed may be accounted for by applying uncertainty factors to the non-representative data. Stochastic modelling, which can be performed by Monte Carlo simulation, is a promising technique to deal with data inaccuracy in LCIs.
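Monte Carlo propagation of data inaccuracy can be sketched as follows. This is a minimal illustration under stated assumptions, not the working group's procedure: each inventory flow is assumed to be an activity amount multiplied by an emission factor that is lognormally distributed around a median with a geometric standard deviation, and the function names and numbers are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# (activity amount, median emission factor, geometric standard deviation)
Flow = Tuple[float, float, float]


def simulate_inventory(flows: Sequence[Flow], n_runs: int = 10_000,
                       seed: int = 0) -> List[float]:
    """Sample the total inventory result n_runs times; return sorted totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        total = 0.0
        for amount, median_factor, gsd in flows:
            # Assumed lognormal factor: mu = ln(median), sigma = ln(GSD).
            factor = rng.lognormvariate(math.log(median_factor), math.log(gsd))
            total += amount * factor
        totals.append(total)
    totals.sort()
    return totals


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Simple empirical percentile of pre-sorted values (0 <= q <= 1)."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


# Hypothetical inventory with three flows of differing uncertainty.
flows = [(100.0, 0.5, 1.2), (20.0, 2.0, 1.5), (5.0, 10.0, 2.0)]
totals = simulate_inventory(flows)
low, mid, high = (percentile(totals, q) for q in (0.025, 0.5, 0.975))
print(f"median {mid:.1f}, 95% interval {low:.1f} to {high:.1f}")
```

The spread between the 2.5th and 97.5th percentiles expresses how inaccuracy in the input data carries through to the final result. An uncertainty factor for non-representative data could be modelled in the same way by widening the geometric standard deviation of the affected flow.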

14.
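The development of high-throughput experimental methods has produced large volumes of omics data, including genomic, transcriptomic, and metabolomic data, and integrating these data makes a comprehensive understanding of biological systems possible. However, because of the limits of current experimental techniques, most high-throughput omics data contain systematic biases, and data types and reliability vary, which makes integration difficult. Focusing on the transcriptome, proteome, and metabolome, this paper reviews recent progress in omics data integration, including new integration methods and analysis platforms. Although existing statistical and network analysis methods help reveal associations among different omics data, deeper, biologically meaningful integration still awaits broad advances in biology, mathematics, computer science, and other fields.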

15.
The validity of material flow analyses (MFAs) depends on the available information base, that is, the quality and quantity of available data. MFA data are cross‐disciplinary, can have varying formats and qualities, and originate from heterogeneous sources, such as official statistics, scientific models, or expert estimations. Statistical methods for data evaluation are most often inadequate, because MFA data are typically isolated values rather than extensive data sets. In consideration of the properties of MFA data, a data characterization framework for MFA is presented. It consists of an MFA data terminology, a data characterization matrix, and a procedure for database analysis. The framework facilitates systematic data characterization by cell‐level tagging of data with data attributes. Data attributes represent data characteristics and metainformation regarding statistical properties, meaning, origination, and application of the data. The data characterization framework is illustrated in a case study of a national phosphorus budget. This work furthers understanding of the information basis of material flow systems, promotes the transparent documentation and precise communication of MFA input data, and can be the foundation for better data interpretation and comprehensive data quality evaluation.  相似文献   

16.
The objective of this paper is to give an overview of existing databases in Denmark and to describe some of the most important of these in relation to the establishment of the Danish Veterinary and Food Administration's veterinary data warehouse. The purpose of the data warehouse and possible uses of the data are described. Finally, sharing of data and validity of data are discussed. There are databases in other countries describing animal husbandry and veterinary antimicrobial consumption, but Denmark will be the first country to relate all data concerning animal husbandry, health, and welfare in Danish production animals to each other in a data warehouse. Moreover, giving researchers and authorities access to these data will hopefully result in easier and more substantial risk-based control, risk management, and risk communication by the authorities, and will give researchers data for epidemiological studies in animal health and welfare.

17.
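Gene expression microarrays and nucleic acid sequence data play an important role in cancer research. Gene expression microarrays are widely used in medical research; their main advantages are sensitivity, speed, and low cost, while their drawback is that they can only examine known genes and cannot be used to discover new genes or study variants, areas in which nucleic acid sequence data have a clear advantage. Overall, both play a major role in cancer research. As precision medicine continues to develop, in-depth study of these high-throughput data can help clarify the molecular mechanisms of cancer and thereby accelerate progress toward individualized treatment.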

18.
Observational studies of health conditions and outcomes often combine clinical care data from many sites without explicitly assessing the accuracy and completeness of these data. In order to improve the quality of data in an international multi-site observational cohort of HIV-infected patients, the authors conducted on-site, Good Clinical Practice-based audits of the clinical care datasets submitted by participating HIV clinics. Discrepancies between data submitted for research and data in the clinical records were categorized using the audit codes published by the European Organization for the Research and Treatment of Cancer. Five of seven sites had error rates >10% in key study variables, notably laboratory data, weight measurements, and antiretroviral medications. All sites had significant discrepancies in medication start and stop dates. Clinical care data, particularly antiretroviral regimens and associated dates, are prone to substantial error. Verifying data against source documents through audits will improve the quality of databases and research and can be a technique for retraining staff responsible for clinical data collection. The authors recommend that all participants in observational cohorts use data audits to assess and improve the quality of data and to guide future data collection and abstraction efforts at the point of care.  相似文献   
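The comparison of submitted research data against source records can be sketched as a per-variable error rate. This is a simplified illustration: it counts only exact mismatches and does not reproduce the EORTC audit codes used in the study. The record layout, variable names, and the helper names `error_rates` and `flagged` are hypothetical, and the 10% threshold simply mirrors the figure quoted in the abstract.

```python
from typing import Any, Dict, List, Optional, Sequence

Records = Dict[str, Dict[str, Any]]  # patient id -> {variable: value}


def error_rates(submitted: Records, source: Records,
                variables: Sequence[str]) -> Dict[str, Optional[float]]:
    """Fraction of audited source values that the submitted data fail to match."""
    rates: Dict[str, Optional[float]] = {}
    for var in variables:
        checked = 0
        mismatched = 0
        for patient_id, source_row in source.items():
            if var not in source_row:
                continue  # nothing in the clinical record to audit against
            checked += 1
            if submitted.get(patient_id, {}).get(var) != source_row[var]:
                mismatched += 1
        rates[var] = mismatched / checked if checked else None
    return rates


def flagged(rates: Dict[str, Optional[float]], threshold: float = 0.10) -> List[str]:
    """Variables whose error rate exceeds the threshold."""
    return sorted(v for v, r in rates.items() if r is not None and r > threshold)


# Hypothetical audit sample for one site.
source = {
    "p1": {"weight": 70, "art_start": "2010-01-05"},
    "p2": {"weight": 82, "art_start": "2011-03-10"},
}
submitted = {
    "p1": {"weight": 70, "art_start": "2010-01-15"},
    "p2": {"weight": 80, "art_start": "2011-03-01"},
}
rates = error_rates(submitted, source, ["weight", "art_start"])
print(rates, flagged(rates))
```

Running the same calculation per site would give the kind of site-level comparison the abstract reports. A real audit would also need tolerance rules, for example for dates that differ by a few days.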

19.
20.
Human geneticists are increasingly turning to study designs based on very large sample sizes to overcome difficulties in studying complex disorders. This in turn almost always requires multi-site data collection and processing of data through centralized repositories. While such repositories offer many advantages, including the ability to return to previously collected data to apply new analytic techniques, they also have some limitations. To illustrate, we reviewed data from seven older schizophrenia studies available from the NIMH-funded Center for Collaborative Genomic Studies on Mental Disorders, also known as the Human Genetics Initiative (HGI), and assessed the impact of data cleaning and regularization on linkage analyses. Extensive data regularization protocols were developed and applied to both genotypic and phenotypic data. Genome-wide nonparametric linkage (NPL) statistics were computed for each study, over various stages of data processing. To assess the impact of data processing on aggregate results, Genome-Scan Meta-Analysis (GSMA) was performed. Examples of increased, reduced and shifted linkage peaks were found when comparing linkage results based on original HGI data to results using post-processed data within the same set of pedigrees. Interestingly, reducing the number of affected individuals tended to increase rather than decrease linkage peaks. But most importantly, while the effects of data regularization within individual data sets were small, GSMA applied to the data in aggregate yielded a substantially different picture after data regularization. These results have implications for analyses based on other types of data (e.g., case-control GWAS or sequencing data) as well as data obtained from other repositories.  相似文献   
