Similar Articles
20 similar articles found (search time: 15 ms)
1.
李俊洁  黄晓磊 《生物多样性》2016,24(12):1317-959
Calls for the sharing of scientific data have been growing in recent years, peer-reviewed biodiversity data papers are attracting increasing attention, and dedicated data journals have appeared. This article summarizes recent progress in biodiversity data publishing and examines two representative data journals, Biodiversity Data Journal and Scientific Data, analyzing the number of papers published since each journal's launch, the taxonomic groups covered, article views, and citation counts. Both journals show steady growth in the number of papers published; their biodiversity data papers cover many taxonomic groups across the animal, plant, and fungal kingdoms; and article views and citation counts are encouraging, indicating that data papers are being accepted by more and more researchers. An analysis of author nationality reveals an imbalance among regions in publishing biodiversity data papers and sharing data. We suggest that Chinese researchers and journals in related fields pay close attention to biodiversity data papers and data-sharing policies and put data sharing into practice more widely.

2.
Following the data-classification principles of the national specification "实验动物资源共性描述规范" (Common Description Specification for Laboratory Animal Resources), laboratory animal data were divided into five categories of biological characteristics: basic information, genetic data, physiological data, biochemical data, and anatomical data. On this basis, a flexible and extensible dynamic classification and coding method for laboratory animal biological-characteristics data was developed using a hierarchical structure. The method provides important support for the scientific preservation, effective sharing, and sound management of laboratory animal data resources.
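A minimal sketch of how such a hierarchical, extensible code might be represented; the category codes, subgroup values, and field names below are illustrative assumptions, not the official scheme from the specification.

```python
# Sketch of a hierarchical, extensible classification code for laboratory-animal
# biological-characteristics data. Code values are illustrative, not official.
from dataclasses import dataclass

# Top-level categories named in the abstract.
CATEGORIES = {"01": "basic information", "02": "genetic data",
              "03": "physiological data", "04": "biochemical data",
              "05": "anatomical data"}

@dataclass
class DataCode:
    category: str    # top-level category, e.g. "03"
    subgroup: str    # extensible subgroup, e.g. "02" (hypothetical: cardiovascular)
    attribute: str   # leaf attribute, e.g. "001" (hypothetical: heart rate)

    def code(self) -> str:
        """Dot-separated hierarchical code; new levels can be appended later."""
        return f"{self.category}.{self.subgroup}.{self.attribute}"

heart_rate = DataCode("03", "02", "001")
print(heart_rate.code(), "->", CATEGORIES[heart_rate.category])
# 03.02.001 -> physiological data
```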

3.
With the rapid development of sequencing technologies, massive amounts of biological data have accumulated. Because biological data resources are the core and source of biological analysis, research, and applications, standardized management of these resources is important and urgent for ensuring data correctness, usability, and security. This article reviews progress on biological data standardization in China and abroad. At present there is no overall plan for biological data; semantics are widely incompatible across resources; data formats vary greatly; and unified standards are lacking for data collection, processing, storage, and sharing. Standardization of biological data is still in its infancy worldwide, but experts in many countries are working on standards development. Finally, the article discusses standards development in terms of biological data terminology; data collection, processing, and exchange; storage; database construction; and ethical norms for biological data, in the hope of providing a reference and basis for formulating biological data standards.

4.
To meet the needs of nationwide evaluation of hospital service capability and quality of care, and to handle the complex source data of medical-record front sheets (病案首页), we designed and implemented a platform for integrating and managing these data in a distributed fashion. From the bottom up, the platform comprises a data-integration layer, a storage and management layer, an analysis and mining layer, and a results-presentation layer, with data security and quality control running through the entire system. The platform has been applied to integrating and processing medical-record front-sheet data from many hospitals across the country, with good results.

5.
The use of animal vs. human data for establishing human risk was examined for four pharmaceutical compounds: acetylsalicylic acid (ASA), cyclophosphamide, indomethacin and clofibric acid. Literature searches were conducted to identify preclinical and clinical data useful for deriving acceptable daily intakes (ADIs), from which a number of risk values, including occupational exposure limits (OELs), could be calculated. OELs were calculated using human data and then again using animal data exclusively. For two compounds, ASA and clofibric acid, use of animal data alone led to higher OELs (not health protective), while for indomethacin and cyclophosphamide, use of animal data resulted in OELs the same as or lower than those based on human data alone. In each case arguments were made for why the use of human data was preferred. The results of the analysis support a basic principle of risk assessment: that all available data be considered.
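The abstract does not give its formulas, but a commonly used form of the OEL derivation from an ADI can illustrate the calculation; all parameter values below (body weight, shift air volume, uncertainty factor, example ADI) are placeholder assumptions, not values from the study.

```python
# Illustrative OEL derivation from an acceptable daily intake (ADI).
# A commonly used form: OEL (mg/m^3) = ADI (mg/kg/day) * BW / (V * UF),
# where BW = body weight, V = air inhaled per 8-h shift, UF = residual
# uncertainty/modifying factors. All numbers are placeholders.

def oel_mg_per_m3(adi_mg_per_kg_day: float,
                  body_weight_kg: float = 70.0,
                  shift_air_m3: float = 10.0,
                  uncertainty_factor: float = 1.0) -> float:
    return adi_mg_per_kg_day * body_weight_kg / (shift_air_m3 * uncertainty_factor)

# Hypothetical ADI of 0.005 mg/kg/day with a residual factor of 10.
print(f"{oel_mg_per_m3(0.005, uncertainty_factor=10):.2e} mg/m^3")
```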

6.
Design and implementation of a sample-set database for protein secondary structure prediction
张宁  张涛 《生物信息学》2006,4(4):163-166
Database technology was applied to processing and analyzing the sample sets used in protein secondary structure prediction, and a sample-set database for secondary-structure prediction was built. Using the CB513 sample set as an example, the construction of the database is described. Building a sample database not only makes the data easy to store, manage, and retrieve, but can also carry out simple sequence-analysis tasks that previously required ad-hoc programming, greatly improving efficiency and reducing errors.
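A minimal sqlite3 sketch of the kind of sample-set database described; the schema and the demo record are illustrative assumptions, not the paper's actual design.

```python
# Minimal sample-set database for secondary-structure prediction (illustrative).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE samples (
    id INTEGER PRIMARY KEY,
    name TEXT,          -- chain identifier, e.g. a CB513 entry
    sequence TEXT,      -- amino-acid sequence
    structure TEXT)     -- per-residue states, e.g. H/E/C""")
con.execute("INSERT INTO samples (name, sequence, structure) VALUES (?, ?, ?)",
            ("demo_chain", "MKTAYIAK", "CCHHHHCC"))

# Simple analysis queries replacing ad-hoc scripts: length and helix content.
for name, seq, ss in con.execute("SELECT name, sequence, structure FROM samples"):
    print(name, len(seq), f"helix fraction = {ss.count('H') / len(ss):.2f}")
```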

7.
Besides the problem of searching for effective methods for data analysis, there are additional problems in handling data of high uncertainty. Uncertainty problems often arise in the analysis of ecological data, e.g. in cluster analysis. Conventional clustering methods based on Boolean logic ignore the continuous nature of ecological variables and the uncertainty of ecological data, which can result in misclassification or misinterpretation of the data structure. Clusters with fuzzy boundaries better reflect the continuous character of ecological features. The problem, however, is that common clustering methods (such as the fuzzy c-means method) are designed only for treating crisp data; that is, they provide a fuzzy partition only for crisp inputs (e.g. exact measurement data). This paper presents the extension and implementation of the method of fuzzy clustering of fuzzy data proposed by Yang and Liu [Yang, M.-S. and Liu, H.-H., 1999. Fuzzy clustering procedures for conical fuzzy vector data. Fuzzy Sets and Systems, 106, 189-200]. Imprecise data can be defined as multidimensional fuzzy sets without sharply formed boundaries (so-called conical fuzzy vectors) and can then be used for fuzzy clustering together with crisp data. This is particularly useful when no information is available about the variances that describe the accuracy of the data, so probabilistic approaches are impossible. The method of Yang and Liu has been extended and implemented in the Fuzzy Clustering System EcoFucs, developed at the University of Kiel. As an example, the paper presents a fuzzy cluster analysis of chemicals according to their ecotoxicological properties. The uncertainty and imprecision of ecotoxicological data are very high because of the use of various data sources and investigation tests and the difficulty of comparing such data. The implemented method can be very helpful in searching for an adequate partition of ecological data into clusters with similar properties.
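For orientation, a minimal NumPy implementation of classic fuzzy c-means on crisp data, the baseline the paper extends; Yang and Liu's handling of conical fuzzy vectors (fuzzy data points) is beyond this sketch, and the toy data are invented.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Classic fuzzy c-means for crisp data points X (n x p)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)               # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centers = W.T @ X / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
    return centers, U

# Toy data: two clusters of crisp points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
centers, U = fuzzy_c_means(X)
print(np.round(centers, 2))   # two centers, near (0, 0) and (2, 2)
```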

8.
Human geneticists are increasingly turning to study designs based on very large sample sizes to overcome difficulties in studying complex disorders. This in turn almost always requires multi-site data collection and processing of data through centralized repositories. While such repositories offer many advantages, including the ability to return to previously collected data to apply new analytic techniques, they also have some limitations. To illustrate, we reviewed data from seven older schizophrenia studies available from the NIMH-funded Center for Collaborative Genomic Studies on Mental Disorders, also known as the Human Genetics Initiative (HGI), and assessed the impact of data cleaning and regularization on linkage analyses. Extensive data regularization protocols were developed and applied to both genotypic and phenotypic data. Genome-wide nonparametric linkage (NPL) statistics were computed for each study, over various stages of data processing. To assess the impact of data processing on aggregate results, Genome-Scan Meta-Analysis (GSMA) was performed. Examples of increased, reduced and shifted linkage peaks were found when comparing linkage results based on original HGI data to results using post-processed data within the same set of pedigrees. Interestingly, reducing the number of affected individuals tended to increase rather than decrease linkage peaks. But most importantly, while the effects of data regularization within individual data sets were small, GSMA applied to the data in aggregate yielded a substantially different picture after data regularization. These results have implications for analyses based on other types of data (e.g., case-control GWAS or sequencing data) as well as data obtained from other repositories.

9.
Biodiversity data sharing and publishing: progress and suggestions
Biodiversity research, conservation practice, natural resource management, and science-based decision making increasingly depend on the sharing and integration of large amounts of data. Although calls for and practice of data sharing are increasing, many scientists still actively or passively refuse to share data. There are cognitive and technical barriers to sharing: scientists may be unwilling to share data out of fear of competition from peers, a perceived lack of reward, unfamiliarity with data repositories, the absence of simple submission tools, or insufficient time and funding. The key to solving these problems and improving the culture of sharing is to give those who share appropriate returns (such as data citation). Peer-reviewed data publishing is considered not only to provide an incentive mechanism for the scientists who produce, manage, and share data, but also to effectively promote data reuse. Data publishing, as one mode of data sharing, has therefore attracted much recent attention, and journals dedicated to publishing data papers have appeared in the biodiversity field. Beyond the data-paper model, joint data policies adopted by repositories and journals together may be a more feasible way to promote data sharing. This article summarizes progress in data sharing and publishing, discusses to what extent data papers can promote data sharing and how data sharing and data publishing relate, and makes the following recommendations: (1) individual scientists should strive to practice data sharing; (2) DOIs should be used to resolve questions of data ownership and data citation; (3) journals and data repositories should jointly adopt more reasonable and rigorous data-archiving policies; and (4) funding agencies and research institutions should play a larger role in data sharing.

10.
The generation of proteomic data is becoming ever more high throughput. Both the technologies and experimental designs used to generate and analyze data are becoming increasingly complex. The need for methods by which such data can be accurately described, stored and exchanged between experimenters and data repositories has been recognized. Work by the Proteomics Standards Initiative of the Human Proteome Organization has laid the foundation for the development of standards by which experimental design can be described and data exchange facilitated. The Minimum Information About a Proteomics Experiment data model describes both the scope and purpose of a proteomics experiment and encompasses the development of more specific interchange formats such as the mzData model for mass spectrometry. The eXtensible Markup Language (XML)-based MI data interchange format, which allows exchange of molecular interaction data, has already been published, and major databases within this field are supplying data downloads in this format.

11.
Small random samples of biochemical and biological data are often representative of complex distribution functions and are difficult to analyze in detail by conventional means. The common approaches reduce the data to a few representative parameters (such as their moments) or combine the data into a histogram plot. Both approaches reduce the information content of the data. By fitting the empirical cumulative distribution function itself with models of integrated probability distributions, the information content of the raw data can be fully utilized. This approach, distribution analysis by nonlinear fitting of integrated probabilities, allows analysis of normally distributed samples, truncated data sets, and multimodal distributions with a single, powerful data processing procedure.
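A small sketch of the core idea under simplifying assumptions: fit a model CDF directly to the empirical cumulative distribution of the raw sample, with no binning. Here a single normal component is fitted via SciPy; the paper's procedure also covers truncated and multimodal cases.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=40)     # a small random sample

x = np.sort(sample)
ecdf = np.arange(1, len(x) + 1) / len(x)             # empirical CDF of the raw data

# Fit the integrated probability model (a normal CDF) to the ECDF itself.
popt, _ = curve_fit(lambda xx, mu, sigma: norm.cdf(xx, mu, sigma),
                    x, ecdf, p0=[x.mean(), x.std()])
print("fitted mu, sigma:", np.round(popt, 2))        # close to 5.0, 2.0
```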

12.
Benthic invertebrate data from thirty-nine lakes in south-central Ontario were analyzed to determine the effect of choosing particular data standardizations, resemblance measures, and ordination methods on the resultant multivariate summaries. Log-transformed, 0–1 scaled, and ranked data were used as standardized variables with resemblance measures of Bray-Curtis, Euclidean distance, cosine distance, correlation, covariance and chi-squared distance. Combinations of these measures and standardizations were used in principal components analysis, principal coordinates analysis, non-metric multidimensional scaling, correspondence analysis (CA), and detrended correspondence analysis (DCA). Correspondence analysis and principal components analysis using a correlation coefficient provided the most consistent results irrespective of the choice of data standardization. Other approaches using detrended correspondence analysis, principal components analysis, principal coordinates analysis, and non-metric multidimensional scaling provided less consistent results. These latter three methods produced similar results when the abundance data were replaced with ranks or standardized to a 0–1 range. The log-transformed data produced the least consistent results, whereas ranked data were most consistent. Resemblance measures such as the Bray-Curtis and correlation coefficient provided more consistent solutions than measures such as Euclidean distance or the covariance matrix when different data standardizations were used. The cosine distance based on standardized data provided results comparable to the CA and DCA solutions. Overall, CA proved most robust, as it demonstrated high consistency irrespective of the data standardization. The strong influence of data standardization on the other ordination methods emphasizes the importance of this frequently neglected stage of data analysis.
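A brief sketch of the kind of comparison the study performs, under stated assumptions: ordinate one simulated abundance matrix under two standardizations with correlation-based PCA and quantify agreement with a Procrustes statistic. The data and the choice of Procrustes as the consistency measure are illustrative, not the study's protocol.

```python
import numpy as np
from scipy.spatial import procrustes
from scipy.stats import rankdata

rng = np.random.default_rng(0)
abund = rng.poisson(5, size=(39, 12)).astype(float)   # sites x taxa, simulated

def pca_scores(X, k=2):
    """Site scores from a correlation-matrix PCA."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T

log_scores = pca_scores(np.log1p(abund))                           # log-transformed
rank_scores = pca_scores(np.apply_along_axis(rankdata, 0, abund))  # ranked
_, _, disparity = procrustes(log_scores, rank_scores)
print(f"Procrustes disparity (0 = identical configurations): {disparity:.3f}")
```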

13.
The threat of biological warfare and the emergence of new infectious agents spreading at a global scale have highlighted the need for major enhancements to the public health infrastructure. Early detection of epidemics of infectious diseases requires both real-time data and real-time interpretation of data. Despite moderate advancements in data acquisition, the state of the practice for real-time analysis of data remains inadequate. We present a nonlinear mathematical framework for modeling the transient dynamics of influenza, applied to historical data sets of patients with influenza-like illness. We estimate the vital time-varying epidemiological parameters of infections from historical data, representing normal epidemiological trends. We then introduce simulated outbreaks of different shapes and magnitudes into the historical data, and estimate the parameters representing the infection rates of anomalous deviations from normal trends. Finally, a dynamic threshold-based detection algorithm is devised to assess the timeliness and sensitivity of detecting the irregularities in the data, under a fixed low false-positive rate. We find that the detection algorithm can identify such designated abnormalities in the data with high sensitivity (specificity held at 97%) and, more importantly, early during an outbreak. The proposed methodology can be applied to a broad range of influenza-like infectious diseases, whether naturally occurring or a result of bioterrorism, and thus can be an integral component of a real-time surveillance system.
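A schematic stand-in for the final detection step, with assumptions flagged in the comments: a harmonic regression replaces the paper's nonlinear transmission model as the baseline, the injected outbreak and the threshold parameters (k and window length) are invented, and only the rolling mean-plus-k-standard-deviations threshold logic reflects the described algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(365)
true_baseline = 50 + 30 * np.sin(2 * np.pi * t / 365)
counts = rng.poisson(true_baseline)
counts[200:215] += np.linspace(0, 40, 15).astype(int)   # injected outbreak

# Stand-in baseline: harmonic regression (the paper fits a transmission model).
A = np.column_stack([np.ones(len(t)), np.sin(2 * np.pi * t / 365),
                     np.cos(2 * np.pi * t / 365)])
resid = counts - A @ np.linalg.lstsq(A, counts, rcond=None)[0]

# Dynamic threshold: rolling mean + k * rolling std of recent residuals.
k, window = 3.0, 28
alarms = [i for i in range(window, len(t))
          if resid[i] > resid[i - window:i].mean() + k * resid[i - window:i].std()]
print("first alarm on day:", alarms[0] if alarms else "none")
```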

14.
Analysis of repeatability in spotted cDNA microarrays
We report a strategy for analysis of data quality in cDNA microarrays based on the repeatability of repeatedly spotted clones. We describe how repeatability can be used to control data quality by developing adaptive filtering criteria for microarray data containing clones spotted in multiple spots. We have applied the method on five publicly available cDNA microarray data sets and one previously unpublished data set from our own laboratory. The results demonstrate the feasibility of the approach as a foundation for data filtering, and indicate a high degree of variation in data quality, both across the data sets and between arrays within data sets.
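A minimal sketch of repeatability-based filtering for duplicate spots; the simulated data, the use of the absolute replicate difference as the repeatability score, and the 95th-percentile cutoff are illustrative assumptions rather than the paper's exact adaptive criteria.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clones = 1000
true_ratio = rng.normal(0, 1, n_clones)            # true log-ratios (simulated)
spot1 = true_ratio + rng.normal(0, 0.2, n_clones)  # duplicate spot 1
spot2 = true_ratio + rng.normal(0, 0.2, n_clones)  # duplicate spot 2
spot2[:50] += rng.normal(0, 2.0, 50)               # simulate 50 bad spots

# Repeatability score: absolute difference between duplicate spots.
diff = np.abs(spot1 - spot2)
cutoff = np.percentile(diff, 95)                   # data-driven cutoff (assumed)
keep = diff <= cutoff
print(f"filtered out {np.sum(~keep)} of {n_clones} clones; cutoff = {cutoff:.2f}")
```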

15.
The development of high-throughput experimental methods has produced massive genomic, transcriptomic, metabolomic, and other omics data, and integrating these data offers a path to a comprehensive understanding of biological systems. However, owing to limitations of current experimental techniques, high-throughput omics data often carry systematic biases and vary in type and reliability, which makes integration difficult. Focusing on transcriptomics, proteomics, and metabolomics, this article reviews recent progress in omics data integration, including new integration methods and analysis platforms. Although existing statistical and network-analysis methods help reveal associations among different omics data, deeper integration that is biologically meaningful still awaits advances across biology, mathematics, computer science, and related fields.

16.
Analysis of large-scale gene expression data.
DNA microarray technology has resulted in the generation of large complex data sets, such that the bottleneck in biological investigation has shifted from data generation, to data analysis. This review discusses some of the algorithms and tools for the analysis and organisation of microarray expression data, including clustering methods, partitioning methods, and methods for correlating expression data to other biological data.

17.
18.
MOTIVATION: The methods for analyzing overlap data are distinct from those for analyzing probe data, making integration of the two forms awkward. Conversion of overlap data to probe-like data elements would facilitate comparison and uniform integration of overlap data and probe data using software developed for analysis of STS data. RESULTS: We show that overlap data can be effectively converted to probe-like data elements by extracting maximal sets of mutually overlapping clones. We call these sets virtual probes, since each set determines a site in the genome corresponding to the region which is common among the clones of the set. Finding the virtual probes is equivalent to finding the maximal cliques of a graph. We modify a known maximal-clique algorithm such that it finds all virtual probes in a large dataset within minutes. We illustrate the algorithm by converting fingerprint and Alu-PCR overlap data to virtual probes. The virtual probes are then analyzed using double-linkage intersection graphs and structure graphs to show that methods designed for STS data are also applicable to overlap data represented as virtual probes. Next we show that virtual probes can produce a uniform integration of different kinds of mapping data, in particular STS probe data and fingerprint and Alu-PCR overlap data. The integrated virtual probes produce longer double-linkage contigs than STS probes alone, and in conjunction with structure graphs they facilitate the identification and elimination of anomalies. Thus, the virtual-probe technique provides: (i) a new way to examine overlap data; (ii) a basis on which to compare overlap data and probe data using the same systems and standards; and (iii) a unique and useful way to uniformly integrate overlap data with probe data.
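A small sketch of the core construction using networkx: clones become nodes, reported overlaps become edges, and maximal cliques are the virtual probes. The toy overlap list is invented, and find_cliques is a general-purpose enumerator rather than the paper's modified large-dataset algorithm.

```python
import networkx as nx

# Toy overlap data: clone pairs reported as overlapping.
overlaps = [("c1", "c2"), ("c2", "c3"), ("c1", "c3"),   # c1..c3 mutually overlap
            ("c3", "c4"), ("c4", "c5")]

G = nx.Graph(overlaps)
# Each maximal clique is a "virtual probe": a maximal set of mutually
# overlapping clones, marking a common genomic region.
virtual_probes = [sorted(clique) for clique in nx.find_cliques(G)]
print(virtual_probes)   # order may vary:
# [['c1', 'c2', 'c3'], ['c3', 'c4'], ['c4', 'c5']]
```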

19.
The method known as Analysis of Concentration (AOC) is proposed as a tool to measure the predictivity of binary data for cover data. Application of AOC to structured tables of oak forests of Central Italy shows that binary data are more predictive of cover than cover data are of binary data. The ordinations produced by AOC with binary and cover data are very similar and can be interpreted with similar results.

20.
The collection of data on physical parameters of body segments is a critical preliminary step in studying the biomechanics of locomotion. Few data on nonhuman body segment parameters have been published, and the lack of standardized techniques for data collection and presentation has made comparative use of these data difficult and at times impossible. This study offers an approach for collecting data on center of gravity and moments of inertia for standardized body segments. The double swing pendulum approach is proposed as a solution to difficulties previously encountered in calculating moments of inertia for body segments. A format for prompting a computer to perform these calculations is offered, and the resulting segment mass data for Lemur fulvus are presented.
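A hedged sketch of one standard two-pivot compound-pendulum formulation, based on T = 2*pi*sqrt(I_pivot / (m*g*d)); whether this matches the paper's exact double-swing procedure is an assumption, and the sanity-check values are for a uniform rod, not a primate segment.

```python
import math

def segment_inertia(mass, L, T1, T2, g=9.81):
    """Return (d1, I_cm): distance of the center of gravity from pivot 1, and
    the moment of inertia about the center of gravity, from swing periods
    T1, T2 about two pivots a distance L apart (CoG assumed on that line).
    Derived from T = 2*pi*sqrt(I_pivot / (m*g*d)) at each pivot."""
    k = g / (4 * math.pi ** 2)
    d1 = L * (L - k * T2 ** 2) / (2 * L - k * (T1 ** 2 + T2 ** 2))
    I_cm = mass * d1 * (k * T1 ** 2 - d1)   # I_pivot - m*d1^2
    return d1, I_cm

# Sanity check with a uniform 1 m rod pivoted at each end:
# d = 0.5, I_cm = m*L^2/12, so T = 2*pi*sqrt((1/12 + 1/4) / (9.81 * 0.5)).
T = 2 * math.pi * math.sqrt((1/12 + 1/4) / (9.81 * 0.5))
print(segment_inertia(1.0, 1.0, T, T))      # ~ (0.5, 0.0833)
```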
