Similar Documents
20 similar documents found
1.
Biodiversity data sharing and publishing: progress and suggestions   (Total citations: 1; self-citations: 0; by others: 1)
Biodiversity research, conservation practice, natural resource management and science-based decision making increasingly depend on the sharing and integration of large volumes of data. Although calls for data sharing and its practice are growing, many scientists still refuse, actively or passively, to share their data. Several cognitive and technical barriers to sharing persist: scientists are reluctant to share for fear of competition from peers, perceive insufficient rewards, are unfamiliar with the relevant data repositories, lack simple data-submission tools, or lack the time and funding. The key to resolving these problems and improving the culture of sharing is to give data providers appropriate returns (such as data citation). Peer-reviewed data publishing is considered not only to provide an incentive mechanism for scientists who produce, manage and share data, but also to effectively promote data reuse. As one mode of data sharing, data publishing has therefore attracted considerable attention recently, and journals dedicated to publishing data papers have appeared in the biodiversity field. In adopting the data-paper model, joint data policies between data repositories and journals may be a more feasible way to promote data sharing. This paper reviews progress in data sharing and publishing, discusses to what extent data papers can promote data sharing and how data sharing relates to data publishing, and offers the following suggestions: (1) individual scientists should strive to practice data sharing; (2) DOIs should be used to resolve issues of data ownership and data citation; (3) journals and data repositories should jointly adopt more reasonable and rigorous data-archiving policies; and (4) funding agencies and research institutions should play a more important role in data sharing.

2.
MOTIVATION: The methods for analyzing overlap data are distinct from those for analyzing probe data, making integration of the two forms awkward. Conversion of overlap data to probe-like data elements would facilitate comparison and uniform integration of overlap data and probe data using software developed for analysis of STS data. RESULTS: We show that overlap data can be effectively converted to probe-like data elements by extracting maximal sets of mutually overlapping clones. We call these sets virtual probes, since each set determines a site in the genome corresponding to the region which is common among the clones of the set. Finding the virtual probes is equivalent to finding the maximal cliques of a graph. We modify a known maximal-clique algorithm such that it finds all virtual probes in a large dataset within minutes. We illustrate the algorithm by converting fingerprint and Alu-PCR overlap data to virtual probes. The virtual probes are then analyzed using double-linkage intersection graphs and structure graphs to show that methods designed for STS data are also applicable to overlap data represented as virtual probes. Next we show that virtual probes can produce a uniform integration of different kinds of mapping data, in particular STS probe data and fingerprint and Alu-PCR overlap data. The integrated virtual probes produce longer double-linkage contigs than STS probes alone, and in conjunction with structure graphs they facilitate the identification and elimination of anomalies. Thus, the virtual-probe technique provides: (i) a new way to examine overlap data; (ii) a basis on which to compare overlap data and probe data using the same systems and standards; and (iii) a unique and useful way to uniformly integrate overlap data with probe data.
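The clique-extraction step lends itself to a compact illustration. Below is a minimal sketch, not the authors' implementation: it builds an overlap graph from hypothetical clone pairs and enumerates maximal cliques (the virtual probes) with networkx, whose find_cliques performs a Bron-Kerbosch-style enumeration.

```python
# Sketch: extracting "virtual probes" as maximal cliques of a clone-overlap
# graph. Clone names and overlap pairs are hypothetical.
import networkx as nx

# Each edge records that two clones overlap (e.g., by fingerprint or Alu-PCR).
overlaps = [("c1", "c2"), ("c2", "c3"), ("c1", "c3"), ("c3", "c4")]
g = nx.Graph(overlaps)

# Every maximal set of mutually overlapping clones defines one virtual probe:
# the genomic site shared by all clones in the set.
virtual_probes = [frozenset(clique) for clique in nx.find_cliques(g)]
print(virtual_probes)  # e.g., [{'c1','c2','c3'}, {'c3','c4'}]
```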

3.
4.
Data Quality     
A methodology is presented to develop and analyze vectors of data quality attribute scores. Each data quality vector component represents the quality of the data element for a specific attribute (e.g., age of data). Several methods for aggregating the components of data quality vectors to derive one data quality indicator (DQI) that represents the total quality associated with the input data element are presented with illustrative examples. The methods are compared and it is proven that the measure of central tendency, or arithmetic average, of the data quality vector components as a percentage of the total quality range attainable is an equivalent measure for the aggregate DQI. In addition, the methodology is applied and compared to real-world LCA data pedigree matrices. Finally, a method for aggregating weighted data quality vector attributes is developed and an illustrative example is presented. This methodology provides LCA practitioners with an approach to increase the precision of input data uncertainty assessments by selecting any number of data quality attributes with which to score the LCA inventory model input data. The resultant vector of data quality attributes can then be analyzed to develop one aggregate DQI for each input data element for use in stochastic LCA modeling.
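A minimal sketch of the aggregation idea follows. It assumes pedigree-style attribute scores on a 1-5 scale (an assumption; the paper's scales may differ) and computes the aggregate DQI as the arithmetic average of the vector components expressed as a fraction of the attainable quality range, alongside a weighted variant.

```python
# Sketch of aggregating a data quality vector into one DQI, assuming each
# attribute is scored on a 1-5 pedigree-style scale (an assumption).
def aggregate_dqi(scores, lo=1, hi=5):
    """Mean score as a fraction of the attainable quality range."""
    mean = sum(scores) / len(scores)
    return (mean - lo) / (hi - lo)

def weighted_dqi(scores, weights, lo=1, hi=5):
    """Weighted variant: weights express the relative importance of attributes."""
    mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    return (mean - lo) / (hi - lo)

# Hypothetical vector: (reliability, completeness, temporal, geographic) scores.
print(aggregate_dqi([4, 3, 5, 2]))               # 0.625
print(weighted_dqi([4, 3, 5, 2], [2, 1, 1, 1]))  # weighted toward reliability
```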

5.
Challenges in using land use and land cover data for global change studies   (Total citations: 5; self-citations: 0; by others: 5)
Land use and land cover data play a central role in climate change assessments. These data originate from different sources and inventory techniques. Each source of land use/cover data has its own domain of applicability and quality standards. Often data are selected without explicitly considering the suitability of the data for the specific application, the bias originating from data inventory and aggregation, and the effects of the uncertainty in the data on the results of the assessment. Uncertainties due to data selection and handling can be in the same order of magnitude as uncertainties related to the representation of the processes under investigation. While acknowledging the differences in data sources and the causes of inconsistencies, several methods have been developed to optimally extract information from the data and document the uncertainties. These methods include data integration, improved validation techniques and harmonization of classification systems. Based on the data needs of global change studies and the data availability, recommendations are formulated aimed at optimal use of current data and focused efforts for additional data collection. These include: improved documentation using classification systems for land use/cover data; careful selection of data given the specific application and the use of appropriate scaling and aggregation methods. In addition, the data availability may be improved by the combination of different data sources to optimize information content while collection of additional data must focus on validation of available data sets and improved coverage of regions and land cover types with a high level of uncertainty. Specific attention in data collection should be given to the representation of land management (systems) and mosaic landscapes.

6.
Human geneticists are increasingly turning to study designs based on very large sample sizes to overcome difficulties in studying complex disorders. This in turn almost always requires multi-site data collection and processing of data through centralized repositories. While such repositories offer many advantages, including the ability to return to previously collected data to apply new analytic techniques, they also have some limitations. To illustrate, we reviewed data from seven older schizophrenia studies available from the NIMH-funded Center for Collaborative Genomic Studies on Mental Disorders, also known as the Human Genetics Initiative (HGI), and assessed the impact of data cleaning and regularization on linkage analyses. Extensive data regularization protocols were developed and applied to both genotypic and phenotypic data. Genome-wide nonparametric linkage (NPL) statistics were computed for each study, over various stages of data processing. To assess the impact of data processing on aggregate results, Genome-Scan Meta-Analysis (GSMA) was performed. Examples of increased, reduced and shifted linkage peaks were found when comparing linkage results based on original HGI data to results using post-processed data within the same set of pedigrees. Interestingly, reducing the number of affected individuals tended to increase rather than decrease linkage peaks. But most importantly, while the effects of data regularization within individual data sets were small, GSMA applied to the data in aggregate yielded a substantially different picture after data regularization. These results have implications for analyses based on other types of data (e.g., case-control GWAS or sequencing data) as well as data obtained from other repositories.

7.
Increasing numbers of whole-genome sequences are available, but to interpret them fully requires more than listing all genes. Genome databases are faced with the challenges of integrating heterogeneous data and enabling data mining. In comparison to a data warehousing approach, where integration is achieved through replication of all relevant data in a unified schema, distributed approaches provide greater flexibility and maintainability. These are important in a field where new data is generated rapidly and our understanding of the data changes. Interoperability between distributed data sources allows data maintenance to be separated from integration and analysis. Simple ways to access the data can facilitate the development of new data mining tools and the transition from model genome analysis to comparative genomics. With the MIPS Arabidopsis thaliana genome database (MAtDB, http://mips.gsf.de/proj/thal/db) our aim is to go beyond a data repository towards creating an integrated knowledge resource. To this end, the Arabidopsis genome has been a backbone against which to structure and integrate heterogeneous data. The challenges to be met are continuous updating of data, the design of flexible data models that can evolve with new data, the integration of heterogeneous data, e.g. through the use of ontologies, comprehensive views and visualization of complex information, simple interfaces for application access locally or via the Internet, and knowledge transfer across species.

8.
In this paper, we propose a successive learning method for hetero-associative memories, such as Bidirectional Associative Memories and Multidirectional Associative Memories, using chaotic neural networks. It can distinguish unknown data from the stored known data and can learn the unknown data successively. The proposed model makes use of the difference in the response to the input data in order to distinguish unknown data from the stored known data. When input data is regarded as unknown, it is memorized. Furthermore, the proposed model can estimate and learn correct data from noisy or incomplete unknown data by considering the temporal summation of the continuous data input. In addition, the behavior of the proposed model shows similarity to the physiological findings of Freeman on the olfactory bulb of the rabbit. A series of computer simulations shows the effectiveness of the proposed model.
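The response-difference idea can be illustrated, very loosely, with a plain (non-chaotic) Bidirectional Associative Memory: a stored pattern echoes back unchanged when bounced through the memory, while an unfamiliar probe does not settle onto itself. This toy sketch omits the chaotic dynamics and temporal summation the paper actually relies on; the patterns and the mismatch cue are illustrative assumptions.

```python
import numpy as np

def train_bam(pairs):
    """Hebbian weight matrix from bipolar (+1/-1) pattern pairs."""
    return sum(np.outer(x, y) for x, y in pairs)

def recall(W, x, steps=5):
    """Bounce a probe back and forth until the echoed input stabilizes."""
    for _ in range(steps):
        y = np.sign(W.T @ x); y[y == 0] = 1
        x = np.sign(W @ y);   x[x == 0] = 1
    return x

pairs = [(np.array([1, -1, 1, -1]), np.array([1, 1, -1])),
         (np.array([1, 1, -1, -1]), np.array([-1, 1, 1]))]
W = train_bam(pairs)

for probe in (pairs[0][0], np.array([1, 1, 1, 1])):   # known vs. unfamiliar
    echo = recall(W, probe)
    # A nonzero echo disagreement is the "response difference" cue that
    # would flag the input as unknown data to be learned.
    print(np.mean(echo != probe))   # 0.0 for the stored pattern, > 0 otherwise
```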

9.
Ecological data can be difficult to collect, and as a result, some important temporal ecological datasets contain irregularly sampled data. Since many temporal modelling techniques require regularly spaced data, one common approach is to linearly interpolate the data, and build a model from the interpolated data. However, this process introduces an unquantified risk that the data is over-fitted to the interpolated (and hence more typical) instances. Using one such irregularly-sampled dataset, the Lake Kasumigaura algal dataset, we compare models built on the original sample data, and on the interpolated data, to evaluate the risk of mis-fitting based on the interpolated data.
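The comparison can be sketched as follows, with placeholder data and a placeholder model rather than the Lake Kasumigaura dataset and the paper's models: fit the same model to the irregular samples and to a linearly interpolated regular series, then score both fits on the genuinely observed points.

```python
# Sketch: does a model fitted to interpolated data over-fit the interpolation?
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 25))          # irregular sampling times
y = np.sin(t) + rng.normal(0, 0.2, t.size)   # observed values

t_reg = np.linspace(0, 10, 100)              # regular grid
y_reg = np.interp(t_reg, t, y)               # linear interpolation

# Fit the same global polynomial to each version of the data.
fit_orig = np.polynomial.Polynomial.fit(t, y, deg=7)
fit_interp = np.polynomial.Polynomial.fit(t_reg, y_reg, deg=7)

# Evaluate both on the genuinely observed points: a fit that looks better on
# the interpolated grid but worse here has over-fitted the interpolation.
mse = lambda f: np.mean((f(t) - y) ** 2)
print(f"original fit MSE: {mse(fit_orig):.4f}, "
      f"interpolated fit MSE: {mse(fit_interp):.4f}")
```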

10.
GDPC: connecting researchers with multiple integrated data sources   (Total citations: 1; self-citations: 0; by others: 1)
The goal of this project is to simplify access to genomic diversity and phenotype data, thereby encouraging reuse of these data. The Genomic Diversity and Phenotype Connection (GDPC) accomplishes this by retrieving data from one or more data sources and by allowing researchers to analyze the integrated data in a standard format. GDPC is written in Java and provides (1) data sources available as web services that transfer XML-formatted data via the SOAP protocol; (2) a Java API for programmatic access to data sources; and (3) a front-end application that allows users to manage data sources, retrieve data based on filters, sort/group data based on property values and save/open the data as XML files. AVAILABILITY: The source code, compiled code, documentation and GDPC Browser are freely available at www.maizegenetics.net/gdpc/index.html. The current release of GDPC is version 1.0, with updated releases planned for the future. Comments are welcome.

11.
Background: Single-cell RNA sequencing (scRNA-seq) is an emerging technology that enables high-resolution detection of heterogeneities between cells. One important application of scRNA-seq data is to detect differential expression (DE) of genes. Currently, some researchers still use DE analysis methods developed for bulk RNA-seq data on single-cell data, and new methods tailored to scRNA-seq data have also been developed. Bulk and single-cell RNA-seq data have different characteristics, so a systematic evaluation of the two types of methods on scRNA-seq data is needed. Results: In this study, we conducted a series of experiments on scRNA-seq data to quantitatively evaluate 14 popular DE analysis methods, including both traditional methods developed for bulk RNA-seq data and new methods specifically designed for scRNA-seq data. We report observations and recommendations for the methods under different situations. Conclusions: DE analysis methods should be chosen for scRNA-seq data with great caution and with regard to the characteristics of the data at hand. Different strategies should be adopted for data with different sample sizes and/or different strengths of the expected signals. Several methods for scRNA-seq data show advantages in some aspects, and DEGSeq tends to outperform the other methods with respect to consistency, reproducibility and accuracy of predictions on scRNA-seq data.
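For a sense of the task being benchmarked, here is a minimal sketch of one of the simplest DE strategies: a per-gene Wilcoxon rank-sum test between two groups of cells on a simulated count matrix. It is not a reimplementation of any of the 14 evaluated methods.

```python
# Toy differential-expression call on a simulated genes x cells count matrix.
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
counts = rng.poisson(2.0, size=(100, 40))      # 100 genes, 40 cells
counts[:5, 20:] += rng.poisson(5.0, (5, 20))   # spike 5 genes in group B

group_a, group_b = counts[:, :20], counts[:, 20:]
pvals = np.array([ranksums(group_a[g], group_b[g]).pvalue
                  for g in range(counts.shape[0])])

# Bonferroni-adjusted calls; the spiked genes should dominate the hits.
print(np.where(pvals < 0.05 / len(pvals))[0])
```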

12.
We compiled three independent data sets of bird species occurrences in northeastern Colorado to test how predicted species richness compared to a combined analysis using all the data. The first data set was a georeferenced regional museum data set from two major repositories, the Denver Museum of Nature and Science and the University of Colorado Museum. The two national survey data sets were the Breeding Bird Survey (summer) and the Great Backyard Bird Count (winter). The resulting analyses show that the museum data set gives richness estimates closest to the combined data set while exhibiting a skewed abundance distribution, whereas the survey data sets do not accurately estimate overall richness even though they contain far more records. The combined data set allows the strengths of one data set to augment weaknesses in others. It is likely some museum data sets display skewed abundance distributions due to collectors' potentially self-selecting under-represented species over common ones.
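The richness comparison can be sketched with a standard nonparametric estimator. The snippet below uses the bias-corrected Chao1 estimator (one common choice; the paper does not specify its estimator) on hypothetical occurrence lists standing in for the museum and survey data sets.

```python
# Compare Chao1 richness estimates for individual vs. combined data sets.
from collections import Counter

def chao1(occurrences):
    counts = Counter(occurrences)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in counts.values() if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))   # bias-corrected Chao1

museum = ["lark", "wren", "owl", "owl", "jay"]
survey_summer = ["lark", "lark", "lark", "jay", "jay"]
survey_winter = ["wren", "jay", "jay", "jay"]

for name, data in [("museum", museum), ("summer survey", survey_summer),
                   ("winter survey", survey_winter),
                   ("combined", museum + survey_summer + survey_winter)]:
    print(name, round(chao1(data), 1))
```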

13.
14.
Ecosystem assessment based on ecosystem services is an important foundation for identifying eco-environmental problems, carrying out ecosystem restoration and biodiversity conservation, and establishing ecological compensation mechanisms; it is also a key step in safeguarding national ecological security and advancing ecological civilization. Ecosystem assessment touches many aspects of ecosystems and must be supported by multi-element, multi-type and multi-scale observation data. Ground observations and remote sensing are the two major data sources for ecosystem assessment, but their use is often hampered by inconsistent observation standards, incomplete coverage of observed variables, poor temporal continuity and mismatched scales, all of which add great uncertainty to assessments. How to fuse observation data across scales to quantify ecosystem services is therefore the key to accurate ecosystem assessment. Starting from observation scale, this paper describes the characteristics and limitations of ground observation data, near-surface remote sensing data, airborne remote sensing data and satellite remote sensing data; reviews the common methods for fusing these data sources; and, taking key ecological parameters such as productivity, carbon sequestration capacity and biodiversity as examples, introduces the multi-source data fusion framework of the project "Ecosystem assessment techniques based on multi-source data fusion and their application". Finally, it summarizes the multi-source data fusion framework for ecosystem assessment and points out directions for future research.
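As a toy illustration of one widely used fusion idea relevant here, the sketch below combines a ground-plot estimate and a satellite-derived estimate of the same ecosystem parameter by inverse-variance weighting. The numbers and the choice of method are illustrative assumptions, not the project's actual fusion system.

```python
# Inverse-variance fusion of two estimates of the same ecosystem parameter.
def fuse(estimates):
    """estimates: list of (value, variance) pairs -> (fused value, variance)."""
    weights = [1.0 / var for _, var in estimates]
    fused = sum(w * v for w, (v, _) in zip(weights, estimates)) / sum(weights)
    return fused, 1.0 / sum(weights)

ground = (520.0, 40.0**2)     # e.g., gC/m2/yr from plots: precise but sparse
satellite = (470.0, 90.0**2)  # from remote sensing: wall-to-wall but noisier
print(fuse([ground, satellite]))  # pulled toward the more certain source
```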

15.
To acquire field survey data on invasive organisms accurately and rapidly, we proposed a big-data collection method for alien species invasions based on modern information technologies such as global navigation satellite systems, geographic information systems and the mobile Internet, and we designed and developed Yun Caiji ("cloud collection"), a field survey tool whose data forms can be customized by the user. The system uses Android phones as data-collection terminals and was developed in C# and Java. Satellite navigation and positioning enable rapid capture of the locations of field observations. By defining nine data types for survey indicators, together with four auxiliary attributes (indicator (column) default values, image capture, voice input and sort order), the system links survey indicators with the data-entry interface of the mobile client and realizes a user-customizable data-entry mode. The system has been applied to survey tasks in a national key R&D program, a major science and technology project of Fujian Province, and the general survey of red imported fire ant (Solenopsis invicta) infestations in Fujian Province. Practical use shows that the system supports offline collection, synchronization, querying and export of field survey data; replaces traditional pen-and-paper records with mobile smart terminals; simplifies the field survey workflow; improves the quality of field survey data on invasive organisms; and provides informatization support for big-data collection in field surveys of biological invasions.
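A hedged sketch of what a user-definable form definition might look like is given below, echoing the design described above (typed indicator fields plus auxiliary attributes for default value, photo capture, voice input and sort order). All field names are invented for illustration; the actual Yun Caiji schema is not published here.

```python
# Hypothetical form definition in the spirit of a customizable survey client.
form = {
    "name": "fire_ant_survey",  # e.g., a Solenopsis invicta census form
    "fields": [
        {"label": "mound_count", "type": "integer", "default": 0, "sort": 1},
        {"label": "habitat", "type": "single_choice",
         "choices": ["lawn", "farmland", "roadside"], "sort": 2},
        {"label": "site_photo", "type": "text", "photo": True, "sort": 3},
        {"label": "notes", "type": "text", "voice": True, "sort": 4},
    ],
}

# The GNSS fix would be attached automatically by the client at capture time.
record = {"form": form["name"], "lat": 26.08, "lon": 119.30,
          "values": {"mound_count": 3, "habitat": "roadside"}}
print(record)
```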

16.
Data analysis--not data production--is becoming the bottleneck in gene expression research. Data integration is necessary to cope with an ever increasing amount of data, to cross-validate noisy data sets, and to gain broad interdisciplinary views of large biological data sets. New Internet resources may help researchers to combine data sets across different gene expression platforms. However, noise and disparities in experimental protocols strongly limit data integration. A detailed review of four selected studies reveals how some of these limitations may be circumvented and illustrates what can be achieved through data integration.

17.
Keyword search on encrypted data allows one to issue a search token and conduct search operations over encrypted data while still preserving keyword privacy. In the present paper, we consider the keyword search problem further and introduce a novel notion called attribute-based proxy re-encryption with keyword search, which offers a promising feature: in addition to supporting keyword search on encrypted data, it enables data owners to delegate the keyword search capability to other data users who comply with a specific access control policy. To be specific, the scheme allows (i) the data owner to outsource his encrypted data to the cloud and then ask the cloud to conduct keyword searches on the outsourced encrypted data with a given search token, and (ii) the data owner to delegate keyword search capability to other data users in a fine-grained access control manner by allowing the cloud to re-encrypt the stored encrypted data with a re-encryption key (embedding some form of access control policy). We formalize the syntax and security definitions for the notion and propose two concrete constructions: a key-policy variant and a ciphertext-policy variant. In a nutshell, our constructions can be treated as an integration of techniques from attribute-based cryptography and proxy re-encryption.
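To make the basic primitive concrete, here is a toy sketch of keyword-search tokens over an encrypted index using HMAC tags. It illustrates only the "search token reveals tag equality" idea; it is not the paper's attribute-based proxy re-encryption scheme (no access policies, no re-encryption) and is not production cryptography.

```python
# Toy symmetric searchable-encryption flow: the cloud matches tokens against
# stored tags without ever seeing the plaintext keywords.
import hashlib
import hmac
import os

key = os.urandom(32)  # shared between data owner and authorized searcher

def index_keyword(word: str) -> bytes:
    """Deterministic tag stored alongside the ciphertext."""
    return hmac.new(key, word.encode(), hashlib.sha256).digest()

def make_token(word: str) -> bytes:
    """Search token handed to the cloud; reveals only tag equality."""
    return hmac.new(key, word.encode(), hashlib.sha256).digest()

encrypted_index = {index_keyword(w) for w in ["genome", "asthma"]}
print(make_token("asthma") in encrypted_index)    # True: match, no plaintext
print(make_token("diabetes") in encrypted_index)  # False
```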

18.
The number of methods for pre-processing and analysis of gene expression data continues to increase, often making it difficult to select the most appropriate approach. We present a simple procedure for comparative estimation of a variety of methods for microarray data pre-processing and analysis. Our approach is based on the use of real microarray data in which controlled fold changes are introduced into 20% of the data to provide a metric for comparison with the unmodified data. The data modifications can be easily applied to raw data measured with any technological platform and retains all the complex structures and statistical characteristics of the real-world data. The power of the method is illustrated by its application to the quantitative comparison of different methods of normalization and analysis of microarray data. Our results demonstrate that the method of controlled modifications of real experimental data provides a simple tool for assessing the performance of data preprocessing and analysis methods.
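The controlled-modification idea is easy to sketch. The snippet below spikes a known 2-fold change into 20% of the genes of a matrix and scores a naive fold-change ranking by how many spiked genes it recovers; the matrix here is simulated only to keep the example self-contained, whereas the paper applies the modifications to real data.

```python
# Benchmark sketch: spike known fold changes into 20% of genes, then score
# an analysis method by its recovery of the spiked set.
import numpy as np

rng = np.random.default_rng(42)
expr = rng.lognormal(mean=6, sigma=1, size=(1000, 8))  # genes x arrays

n_mod = int(0.20 * expr.shape[0])                 # modify 20% of genes
modified = rng.choice(expr.shape[0], n_mod, replace=False)
spiked = expr.copy()
spiked[modified, 4:] *= 2.0                       # 2-fold change in arrays 5-8

# Any pre-processing/analysis method can now be scored by how well it
# recovers `modified` when comparing arrays 1-4 against arrays 5-8.
log_ratio = np.log2(spiked[:, 4:].mean(axis=1) / spiked[:, :4].mean(axis=1))
called = np.argsort(-np.abs(log_ratio))[:n_mod]
recovery = len(set(called) & set(modified)) / n_mod
print(f"fraction of spiked genes recovered: {recovery:.2f}")
```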

19.
Advances in high-throughput experimental methods have produced large amounts of genomic, transcriptomic, metabolomic and other omics data, and integrating omics data makes a comprehensive understanding of biological systems possible. However, because of the limitations of current experimental techniques, most high-throughput omics data carry systematic biases and vary in data type and reliability, which makes integration difficult. Focusing on transcriptomics, proteomics and metabolomics, this paper reviews recent progress in omics data integration, including new integration methods and analysis platforms. Although existing statistical and network-analysis methods help reveal associations among different omics data, deeper integration in a biologically meaningful sense awaits advances across biology, mathematics, computer science and related fields.

20.
In spite of its importance, no systematic and comprehensive quality assurance (QA) program for radiation oncology information systems (ROIS) has been available to verify clinical and treatment data integrity and to mitigate the risks of data errors/corruption and/or data loss. Based on data organization, format and purpose, data in a ROIS fall into five categories: (1) the ROIS relational database and associated files; (2) the ROIS DICOM data stream; (3) treatment machine beam data and machine configuration data; (4) electronic medical record (EMR) documents; and (5) user-generated clinical and treatment reports from the ROIS. For each data category, this framework proposes a corresponding QA strategy to verify data integrity. This approach verified every bit of data in the ROIS, including billions of data records in the ROIS SQL database, tens of millions of ROIS database-associated files, tens of thousands of DICOM data files for a group of selected patients, almost half a million EMR documents, and tens of thousands of machine configuration files and beam data files. The framework has been validated through intentional modifications of test patient data. Despite the 'big data' nature of a ROIS, the multiprocess and multithread nature of our QA tools enabled the whole ROIS data QA process to be completed within hours without clinical interruptions. The QA framework suggested in this study proved to be robust, efficient and comprehensive without labor-intensive manual checks, and it has been implemented for our routine ROIS QA and ROIS upgrades.
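One building block of such a framework, file-level integrity verification, can be sketched as below: hash every file under a data root in parallel and compare against a previously stored manifest. Paths, the manifest format and the helper names are invented for illustration; a real ROIS QA suite would also cover database records and DICOM streams, as described above.

```python
# Parallel file-integrity check against a stored hash manifest.
import hashlib
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def sha256_of(path: Path) -> tuple[str, str]:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return str(path), h.hexdigest()

def verify(root: str, manifest: dict[str, str]) -> list[str]:
    """Return paths whose current hash differs from the recorded one."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ProcessPoolExecutor() as pool:  # multiprocess, as in the paper's tools
        current = dict(pool.map(sha256_of, files))
    return [p for p, digest in manifest.items() if current.get(p) != digest]

# Usage (hypothetical; run under `if __name__ == "__main__":` on Windows):
# corrupted = verify("/rois/data", saved_manifest)
```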
