Similar Articles
1.
Data sharing by scientists: practices and perceptions

Background

Scientific research in the 21st century is more data intensive and collaborative than in the past. It is important to study the data practices of researchers – data accessibility, discovery, re-use, preservation and, particularly, data sharing. Data sharing is a valuable part of the scientific method, enabling verification of results and allowing new research to build on prior findings.

Methodology/Principal Findings

A total of 1329 scientists participated in this survey exploring current data sharing practices and perceptions of the barriers and enablers of data sharing. Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation. Many organizations do not provide support to their researchers for data management in either the short or the long term. If certain conditions are met (such as formal citation and sharing of reprints), respondents agree they are willing to share their data. There are also significant differences in data management practices and approaches based on primary funding agency, subject discipline, age, work focus, and world region.

Conclusions/Significance

Barriers to effective data sharing and preservation are deeply rooted in the practices and culture of the research process as well as the researchers themselves. New mandates for data management plans from NSF and other federal agencies and world-wide attention to the need to share and preserve data could lead to changes. Large scale programs, such as the NSF-sponsored DataNET (including projects like DataONE) will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.

2.

Background

Significant efforts are underway within the biomedical research community to encourage sharing and reuse of research data in order to enhance research reproducibility and enable scientific discovery. While some technological challenges do exist, many of the barriers to sharing and reuse are social in nature, arising from researchers’ concerns about and attitudes toward sharing their data. In addition, clinical and basic science researchers face their own unique sets of challenges to sharing data within their communities. This study investigates these differences in experiences with and perceptions about sharing data, as well as barriers to sharing among clinical and basic science researchers.

Methods

Clinical and basic science researchers in the Intramural Research Program at the National Institutes of Health were surveyed about their attitudes toward and experiences with sharing and reusing research data. Of 190 respondents to the survey, the 135 respondents who identified themselves as clinical or basic science researchers were included in this analysis. Odds ratio and Fisher’s exact tests were the primary methods to examine potential relationships between variables. Worst-case scenario sensitivity tests were conducted when necessary.
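The two primary tests named above can be computed for a 2×2 contingency table with standard-library code alone. The sketch below (function names and counts are illustrative, not taken from the study's data) shows an odds ratio and a two-sided Fisher's exact test built from the hypergeometric distribution:

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]] with fixed margins."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_table(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs + 1e-12)
```

For the table [[3, 1], [1, 3]], for example, this enumeration yields an odds ratio of 9.0 and a two-sided p-value of 34/70 ≈ 0.486.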

Results and Discussion

While most respondents considered data sharing and reuse important to their work, they generally rated their expertise as low. Sharing data directly with other researchers was common, but most respondents did not have experience with uploading data to a repository. A number of significant differences exist between the attitudes and practices of clinical and basic science researchers, including their motivations for sharing, their reasons for not sharing, and the amount of work required to prepare their data.

Conclusions

Even within the scope of biomedical research, addressing the unique concerns of diverse research communities is important to encouraging researchers to share and reuse data. Efforts at promoting data sharing and reuse should be aimed at solving not only technological problems, but also addressing researchers’ concerns about sharing their data. Given the varied practices of individual researchers and research communities, standardizing data practices like data citation and repository upload could make sharing and reuse easier.

3.
We developed a database system for collaborative HIV analysis (DBCollHIV) in Brazil. The main purpose of our DBCollHIV project was to develop an HIV-integrated database system with analytical bioinformatics tools that would support the needs of Brazilian research groups for data storage and sequence analysis. Whenever authorized by the principal investigator, this system also allows the integration of data from different studies and/or the release of the data to the general public. The development of a database that combines sequences associated with clinical/epidemiological data is difficult without the active support of interdisciplinary investigators. A functional database that securely stores data and helps the investigator to manipulate their sequences before publication would be an attractive tool for investigators depositing their data and collaborating with other groups. DBCollHIV allows investigators to manipulate their own datasets, as well as integrating molecular and clinical HIV data, in an innovative fashion.

4.
Species distributions are already affected by climate change. Forecasting their long-term evolution requires thoroughly validated models. Our aim here is to demonstrate that the sensitivity of such models to climate input characteristics may complicate their validation and introduce uncertainties into their predictions. In this study, we conducted a sensitivity analysis of a process-based tree distribution model, Phenofit, to climate input characteristics. This analysis was conducted for two North American trees which differ greatly in their distribution, and eight different types of climate input for the historic period which differ in their spatial (local or gridded data) and temporal (daily vs. monthly) resolution as well as their type (locally recorded, extrapolated or simulated by General Circulation Models). We show that the resolution (spatial and temporal) and type of the climate data highly affect the model predictions. The sensitivity analysis also revealed the importance, for global climate change impact assessment, of (i) the daily variability of temperatures in modeling the biological processes shaping species distributions, (ii) climate data at high latitudes and elevations and (iii) climate data with high spatial resolution.

5.
Over recent years, a number of initiatives have proposed standard reporting guidelines for functional genomics experiments. Associated with these are data models that may be used as the basis of the design of software tools that store and transmit experiment data in standard formats. Central to the success of such data handling tools is their usability. Successful data handling tools are expected to yield benefits in time saving and in quality assurance. Here, we describe the collection of datasets that conform to the recently proposed data model for plant metabolomics known as ArMet (architecture for metabolomics) and illustrate a number of approaches to robust data collection that have been developed in collaboration between software engineers and biologists. These examples also serve to validate ArMet from the data collection perspective by demonstrating that a range of software tools, supporting data recording and data upload to central databases, can be built using the data model as the basis of their design.

6.
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
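As a minimal illustration of the kind of index structure such surveys cover (chosen here for exposition, not one of the specific variants the article compares), a k-mer hash index maps every length-k substring of a sequence to its start positions, so that pattern queries can seed on their first k characters and verify each candidate hit:

```python
from collections import defaultdict

def build_kmer_index(text, k):
    """Map every length-k substring of text to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index

def query(index, text, pattern, k):
    """Locate pattern in text: seed on its first k characters, then verify."""
    hits = []
    for pos in index.get(pattern[:k], []):
        if text[pos:pos + len(pattern)] == pattern:
            hits.append(pos)
    return hits
```

Building the index costs linear time and space in the sequence length; more sophisticated structures (suffix arrays, FM-indexes) trade memory against query flexibility, which is exactly the memory-time trade-off space the article maps out.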

7.
Due to their pharmacological importance in the oxidation of amine neurotransmitters, the membrane-bound flavoenzymes monoamine oxidase A and monoamine oxidase B have attracted numerous investigations and, as a result, two different mechanisms, the single electron transfer and the polar nucleophilic mechanism, have been proposed to describe their catalysis. This review compiles the recently available structural data on both enzymes with the available mechanistic data, as well as current NMR data on flavin systems, to provide an integration of the approaches. These conclusions support the proposal that the polar nucleophilic mechanism for amine oxidation is the more consistent mechanistic scheme, as compared with the single electron transfer mechanism.

8.
The Protein Circular Dichroism Data Bank (PCDDB) is a web-based resource containing circular dichroism (CD) and synchrotron radiation circular dichroism spectral and associated metadata located at http://pcddb.cryst.bbk.ac.uk. This resource provides a freely available, user-friendly means of accessing validated CD spectra and their associated experimental details and metadata, thereby enabling broad usage of this material and new developments across the structural biology, chemistry, and bioinformatics communities. The resource also enables researchers utilizing CD as an experimental technique to have a means of storing their data at a secure site from which it is easily retrievable, thereby making their results publicly accessible, a current requirement of many grant-funding agencies world-wide, as well as meeting the data-sharing requirements for journal publications. This tutorial provides extensive information on searching, accessing, and downloading procedures for those who wish to utilize the data available in the data bank, and detailed information on deposition procedures for creating and validating entries, including comprehensive explanations of their contents and formats, for those who wish to include their data in the data bank. Chirality 24:751-763, 2012. © 2012 Wiley Periodicals, Inc.

9.
Cornelissen and Haus (1) are certainly correct in their assertion that computerized data collection and analysis are useful and important tools in chronobiological research. We also agree that circadian rhythm data should be subjected to multiple complementary analyses whenever possible. In fact, both of our laboratories currently employ on-line collection of behavioural activity data, as well as several different computer-assisted statistical procedures, including analyses similar to those referred to by Cornelissen and Haus in their commentary.

10.
A test statistic that is valid for data collected under one type of family study design is not necessarily valid when applied to data obtained from a different design. Consequently, a separate test that is usually valid is developed for each type of family study design. However, investigators might find that their data come from two (or more) different family study designs, each requiring a different test, yet they want an overall conclusion: a valid hypothesis test that is as powerful as possible. When the underlying genetic model is unknown, it is not clear how to proceed, as several alternative approaches might appear feasible. Using as an example the development of a test of association for data on affected singletons and their parents and affected sib pairs and their parents, it is shown that it may not be possible to develop a universally optimal approach without knowledge of the underlying genetic model.

11.
Cranston and Humphries (1988) expose Sæther's (1976) revision of the Hydrobaenus group of genera (Chironomidae, Diptera) to the vagaries of quantitative phyletics. In the process they have clearly shown why at least their method is not in accordance with the view of Hennig. In the qualitative Hennigian method the parsimony criterion is used when choosing among alternative hypotheses explaining single character distributions. The selection and interpretation equals the cladogenetic analysis. In neocladistic methods the parsimony criterion is used in order to find the tree implying the fewest evolutionary gains and losses along the fewest lines. The explanation of characters enters as an afterthought. The differences between the methods are shown by analyzing a theoretical data matrix as well as by reassessment of the results obtained by Cranston and Humphries. Their data critique is met point by point, their data matrix, which is to a large extent erroneous, is corrected, and their data reanalyzed using their and alternative outgroups. The tree topologies remain similar to each other as well as to the original qualitative analysis, since there is little inside homoplasy, but the changes proposed by Cranston and Humphries are shown to be invalid.

12.
Studies of the genomes of individual microbial organisms as well as aggregate genomes (metagenomes) of microbial communities are expected to lead to advances in various areas, such as healthcare, environmental cleanup, and alternative energy production. A variety of specialized data resources manage the results of different microbial genome data processing and interpretation stages, and represent different degrees of microbial genome characterization. Scientists studying microbial genomes and metagenomes often need one or several of these resources. Given their diversity, these resources cannot be used effectively without determining the scope and type of individual resources as well as the relationship between their data.

13.
Empirical evidence suggests that while people hold the capacity to control their data in high regard, they increasingly experience a loss of control over their data in the online world. The capacity to exert control over the generation and flow of personal information is a fundamental premise to important values such as autonomy, privacy, and trust. In healthcare and clinical research this capacity is generally achieved indirectly, by agreeing to specific conditions of informational exposure. Such conditions can be openly stated in informed consent documents or be implicit in the norms of confidentiality that govern the relationships of patients and healthcare professionals. However, with medicine becoming a data-intense enterprise, informed consent and medical confidentiality, as mechanisms of control, are put under pressure. In this paper we explore emerging models of informational control in data-intense healthcare and clinical research, which can compensate for the limitations of currently available instruments. More specifically, we discuss three approaches that hold promise in increasing individual control: the emergence of data portability rights as means to control data access, new mechanisms of informed consent as tools to control data use, and finally, new participatory governance schemes that allow individuals to control their data through direct involvement in data governance. We conclude by suggesting that, despite the impression that biomedical big data diminish individual control, the synergistic effect of new data management models can in fact improve it.

14.
The amount of data currently being generated by proteomics laboratories around the world is increasing exponentially, making it ever more critical that scientists are able to exchange, compare and retrieve datasets when re-evaluation of their original conclusions becomes important. Only a fraction of this data is published in the literature and important information is being lost every day as data formats become obsolete. The Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI) was tasked with the creation of data standards and interchange formats to allow both the exchange and storage of such data irrespective of the hardware and software from which it was generated. This article will provide an update on the work of this group, the creation and implementation of these standards and the standards-compliant data repositories being established as a result of their efforts.

15.
Johnson et al. (2013) found that morphometric measurements of dragonfly wings taken from actual specimens and measurements taken from whole-drawer images of those specimens were equally accurate. We do not believe that their conclusions are justified by their data and analysis. Our reasons are, first, that their study was constrained in ways that restrict the generalisability of their results, but second, and of far greater significance, their statistical approach was entirely unsuited to their data and their results misled them to erroneous conclusions. We offer an alternative analysis of their data as published. Our reanalysis demonstrates, contra Johnson et al., that measurements from scanned images are not a reliable substitute for direct measurement.

16.
Introducing a new (freeware) tool for palynology
We present a multiple-access key and searchable data base to Neotropical pollen that is available as freeware. The data base is based on FileMaker 5 and contains c. 6000 images of >1000 taxa. All pollen images are of acetolysed grains collected from vouchered herbarium specimens. The selection of taxa to be included in the data base is predicated upon their probable occurrence in lake sedimentary records, which in turn was based on their flower structure, sexual mechanisms and ecology. The multiple-access key is a forgiving format as it can be used with incomplete data or where the researcher cannot decide between the choices offered. The data base is downloadable and is compatible with both Mac and PC platforms.

17.
Summary: Often clinical studies periodically record information on disease progression as well as results from laboratory studies that are believed to reflect the progressing stages of the disease. A primary aim of such a study is to determine the relationship between the lab measurements and disease progression. If there were no missing or censored data, these analyses would be straightforward. However, often patients miss visits, and return after their disease has progressed. In this case, not only is their progression time interval censored, but their lab test series is also incomplete. In this article, we propose a simple test for the association between a longitudinal marker and an event time from incomplete data. We derive the test using a very intuitive technique of calculating the expected complete data score conditional on the observed incomplete data (conditional expected score test, CEST). The problem was motivated by data from an observational study of patients with diabetes.

18.
Valeriana officinalis s. l. is an extremely polymorphic polyploid complex. A multivariate morphometric study is the only adequate method to cope with the diversity largely based on quantitative differences. 50 morphological characters have been carefully chosen with regard to their expected taxonomic relevance. The use of program packages for (multivariate) statistical and taxometric evaluation of the data has been made possible by the development of a series of computer programs. They allow different manipulations of the data as well as the extraction of distance or similarity matrices from mixed-type data (i.e. data containing quantitative as well as binary characters). In addition, a computer-based study of the relationships between variables has been carried out by a grouping of variables into subsets according to their B-coefficients (being defined as the ratio of the average of the intercorrelations among the variables of a group to their average correlations with the remaining variables). This technique can also be applied in cases where parts of the material show extremely different correlation patterns, as in Valeriana officinalis s. l.
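The B-coefficient as defined above lends itself to a direct implementation. The sketch below (the function name and example correlation matrix are invented for illustration) computes it from a full correlation matrix and a proposed group of variable indices:

```python
def b_coefficient(corr, group):
    """B-coefficient of a group of variables: the average of the
    intercorrelations among the group's members divided by their
    average correlation with the remaining variables."""
    others = [j for j in range(len(corr)) if j not in group]
    # Each unordered within-group pair counted once (i < j).
    within = [corr[i][j] for i in group for j in group if i < j]
    between = [corr[i][j] for i in group for j in others]
    return (sum(within) / len(within)) / (sum(between) / len(between))
```

For a correlation matrix where variables 0 and 1 correlate at 0.8 with each other but only at an average of 0.2 with the rest, the group {0, 1} scores B = 4, flagging it as a tight subset worth separating out.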

19.
20.

Background  

One of the consequences of the rapid and widespread adoption of high-throughput experimental technologies is an exponential increase of the amount of data produced by genome-wide experiments. Researchers increasingly need to handle very large volumes of heterogeneous data, including both the data generated by their own experiments and the data retrieved from publicly available repositories of genomic knowledge. Integration, exploration, manipulation and interpretation of data and information therefore need to become as automated as possible, since their scale and breadth are, in general, beyond the limits of what individual researchers and the basic data management tools in normal use can handle. This paper describes Genephony, a tool we are developing to address these challenges.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号