Similar Articles
20 similar articles found.
1.
  1. Repeatability is the cornerstone of science, and it is particularly important for systematic reviews. However, little is known about how researchers' choice of database and search platform influences the repeatability of systematic reviews. Here, we aim to show how the computing environment and the location from which a search is initiated influence hit results.
  2. We present a comparative analysis of time-synchronized searches run at different institutional locations around the world and evaluate the consistency of the hits obtained for each search term on different search platforms.
  3. We found large variation among search platforms: PubMed and Scopus returned consistent results for identical search strings issued from different locations, whereas Google Scholar and Web of Science's Core Collection varied substantially, both in the number of hits returned and in the list of individual articles, depending on the search location and computing environment. The inconsistency in Web of Science results most likely arises from the different licensing packages held by different institutions.
  4. To maintain scientific integrity and consistency, especially in systematic reviews, action is needed from both the scientific community and the search platforms to increase search consistency. Researchers are encouraged to report the search location and the databases used for systematic reviews, and database providers should make their search algorithms transparent and revise access rules for titles behind paywalls. Additional options for increasing the repeatability and transparency of systematic reviews are storing both search metadata and hit results in open repositories and using Application Programming Interfaces (APIs) to retrieve standardized, machine-readable search metadata.
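As a concrete illustration of the API-based option in point 4, the sketch below logs the metadata of a PubMed search through the public NCBI E-utilities `esearch` endpoint. This is a minimal example of the practice recommended above, not a tool from the paper; the output file name and the stored fields are arbitrary choices.

```python
# Minimal sketch (not from the paper): logging reproducible search metadata
# via the NCBI E-utilities esearch endpoint. The JSON field names used below
# (count, querytranslation, idlist) follow the public E-utilities documentation.
import json
import datetime
import requests

def log_pubmed_search(term, outfile="search_metadata.json", retmax=100):
    """Run a PubMed search and store machine-readable metadata for the record."""
    r = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax},
        timeout=30,
    )
    r.raise_for_status()
    result = r.json()["esearchresult"]
    record = {
        "search_date": datetime.datetime.utcnow().isoformat() + "Z",
        "database": "pubmed",
        "query_as_entered": term,
        "query_as_translated": result.get("querytranslation"),
        "hit_count": int(result["count"]),
        "first_ids": result.get("idlist", []),
    }
    with open(outfile, "w") as fh:
        json.dump(record, fh, indent=2)
    return record

# Example: record = log_pubmed_search('"systematic review"[Title] AND repeatability')
```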

2.
With the explosive growth of biological data, new means of data storage were needed. Increasingly, biological information is no longer published in the conventional way, via an article in a scientific journal, but is only deposited in a database. Over the last two decades these databases have become essential tools for researchers in the biological sciences. Biological databases can be classified according to the type of information they contain: there are basically three types of sequence-related databases (nucleic acid sequences, protein sequences and protein tertiary structures), as well as various specialized data collections. It is important to provide the users of biomolecular databases with a degree of integration between these databases, as by nature they are all connected in a scientific sense and each is an important piece of the larger picture of biological complexity. In this review we highlight our effort to connect biological information, as demonstrated in the SWISS-PROT protein database.
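To make the idea of cross-database integration concrete, here is a hedged sketch that lists the cross-references attached to a single Swiss-Prot entry using the present-day UniProt REST service (which now serves Swiss-Prot records). The JSON field names follow the public UniProt schema as currently documented and should be verified against it.

```python
# Hedged sketch: list the cross-references that link one Swiss-Prot entry to
# other databases, via the modern UniProt REST service. The field name
# "uniProtKBCrossReferences" is taken from the public UniProt JSON schema;
# verify it against the current API documentation before relying on it.
import requests

def cross_references(accession):
    """Return (database, identifier) pairs cross-referenced by a UniProtKB entry."""
    r = requests.get(f"https://rest.uniprot.org/uniprotkb/{accession}.json", timeout=30)
    r.raise_for_status()
    entry = r.json()
    return [
        (xref.get("database"), xref.get("id"))
        for xref in entry.get("uniProtKBCrossReferences", [])
    ]

# Example: links from one protein entry out to EMBL, PDB, GO, etc.
# print(cross_references("P01308")[:10])   # P01308 = human insulin
```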

3.
An effective strategy for managing protein databases is to provide mechanisms that transform raw data into consistent, accurate and reliable information. Such mechanisms greatly reduce operational inefficiencies and improve one's ability to address scientific objectives and interpret research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it with efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in several respects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL; it evolved from the earlier, text (flat file)-based database, in which data consistency and integrity were not guaranteed. Second, we provide support for data warehousing and mining. Third, a data quality indicator has been introduced. Finally, and probably most importantly, complex queries that could not be posed on a text-based database are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
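The abstract does not describe Sting_RDB's schema, so the sketch below uses invented table and column names purely to illustrate the kind of multi-table SQL join that a relational design supports and a flat-file store cannot express directly.

```python
# Hypothetical sketch only: the tables (residue_params, conservation) and
# columns are invented for illustration; they are not Sting_RDB's real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE residue_params (pdb_id TEXT, chain TEXT, pos INTEGER, accessibility REAL);
CREATE TABLE conservation   (pdb_id TEXT, chain TEXT, pos INTEGER, entropy REAL);
INSERT INTO residue_params VALUES ('1ABC','A',10,42.0),('1ABC','A',11,3.5);
INSERT INTO conservation   VALUES ('1ABC','A',10,0.2), ('1ABC','A',11,1.7);
""")

# The kind of query a flat file cannot answer directly:
# solvent-exposed residues that are also highly conserved.
rows = conn.execute("""
    SELECT r.pdb_id, r.chain, r.pos
    FROM residue_params AS r
    JOIN conservation   AS c USING (pdb_id, chain, pos)
    WHERE r.accessibility > 30 AND c.entropy < 0.5
""").fetchall()
print(rows)   # [('1ABC', 'A', 10)]
```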

4.
Nobel Prizes are commonly seen as being among the most prestigious achievements of our times. Based on mining several million citations, we quantitatively analyze the processes driving paradigm shifts in science. We find that groundbreaking discoveries by Nobel Prize Laureates and other famous scientists are not only acknowledged by many citations of their landmark papers; surprisingly, they also boost the citation rates of the authors' previous publications. Given that innovations must outcompete the rich-get-richer effect in scientific citation, it turns out that they can make their way only through citation cascades, and a quantitative analysis reveals how and why these happen. Science appears to behave like a self-organized critical system in which citation cascades of all sizes occur, from continuous scientific progress all the way up to scientific revolutions that change the way we see our world. Measuring the "boosting effect" of landmark papers, our analysis reveals how new ideas and new players can make their way and finally triumph in a world dominated by established paradigms. The underlying "boost factor" is also useful for discovering scientific breakthroughs and talents much earlier than through classical citation analysis, which has by now become a widespread method of measuring scientific excellence, influencing scientific careers and the distribution of research funds. Our findings reveal patterns of collective social behavior that are also interesting from an attention-economics perspective. Understanding the origin of scientific authority may therefore ultimately help to explain how social influence comes about and why the value of goods depends so strongly on the attention they attract.
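The abstract does not give the exact definition of the "boost factor", so the sketch below uses a plausible stand-in: the ratio of the mean yearly citation rate of an author's earlier papers after a landmark publication to the rate before it. The numbers in the example are invented.

```python
# Illustrative only: this "boost factor" is an assumed stand-in, not the
# authors' definition. It compares the mean yearly citation rate of an
# author's pre-landmark papers after vs. before the landmark year.
def boost_factor(yearly_citations, landmark_year):
    """yearly_citations: dict {year: citations to pre-landmark papers in that year}."""
    before = [c for y, c in yearly_citations.items() if y < landmark_year]
    after = [c for y, c in yearly_citations.items() if y >= landmark_year]
    if not before or not after or sum(before) == 0:
        return float("nan")
    return (sum(after) / len(after)) / (sum(before) / len(before))

# Example: citations to earlier papers jump after a 2005 landmark publication.
print(boost_factor({2002: 4, 2003: 5, 2004: 6, 2005: 18, 2006: 25}, 2005))  # 4.3
```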

5.
In the last decade, significant progress has been made in expanding the scope and depth of publicly available immunological databases and online analysis resources, which have become an integral part of the repertoire of tools available to the scientific community for basic and applied research. Herein, we present a general overview of different resources and databases currently available. Because of our association with the Immune Epitope Database and Analysis Resource, this resource is reviewed in more detail. Our review includes aspects such as the development of formal ontologies and the type and breadth of analytical tools available to predict epitopes and analyze immune epitope data. A common feature of immunological databases is the requirement to host large amounts of data extracted from disparate sources. Accordingly, we discuss and review processes to curate the immunological literature, as well as examples of how the curated data can be used to generate a meta-analysis of the epitope knowledge currently available for diseases of worldwide concern, such as influenza and malaria. Finally, we review the impact of immunological databases, by analyzing their usage and citations, and by categorizing the type of citations. Taken together, the results highlight the growing impact and utility of immunological databases for the scientific community.

6.
Knowledge of the structure, genetics, circuits, and physiological properties of the mammalian brain in both normal and pathological states is ever increasing as research labs worldwide probe the various aspects of brain function. Until recently, however, comprehensive cataloging of gene expression across the central nervous system has been lacking. The Allen Institute for Brain Science, as part of its mission to propel neuroscience research, has completed several large gene-mapping projects in mouse, nonhuman primate, and human brain, producing informative online public resources and tools. Here we present the Allen Mouse Brain Atlas, covering ~20,000 genes throughout the adult mouse brain; the Allen Developing Mouse Brain Atlas, detailing expression of approximately 2,000 important developmental genes across seven embryonic and postnatal stages of brain growth; and the Allen Spinal Cord Atlas, revealing expression for ~20,000 genes in the adult and neonatal mouse spinal cords. Integrated data-mining tools, including reference atlases, informatics analyses, and 3-D viewers, are described. For these massive-scale projects, high-throughput industrial techniques were developed to standardize and reliably repeat experimental goals. To verify consistency and accuracy, a detailed analysis of the 1,000 most viewed genes for the adult mouse brain (according to website page views) was performed by comparing our data with peer-reviewed literature and other databases. We show that our data are highly consistent with independent sources and provide a comprehensive compendium of information and tools used by thousands of researchers each month. All data and tools are freely available via the Allen Brain Atlas portal (www.brain-map.org).

7.
The era of big biodiversity data has led to rapid, exciting advances in the theoretical and applied biological, ecological and conservation sciences. While large genetic, geographic and trait databases are available, these are neither complete nor random samples of the globe. Gaps and biases in these databases reduce our inferential and predictive power, and this incompleteness is even more worrisome because we are ignorant of both its kind and magnitude. We performed a comprehensive examination of the taxonomic and spatial sampling in the most complete current databases for plant genes, locations and functional traits. To do this, we downloaded data from The Plant List (taxonomy), the Global Biodiversity Information Facility (locations), TRY (traits) and GenBank (genes). Only 17.7% of the world's described and accepted land plant species feature in all three databases, meaning that more than 82% of known plant biodiversity lacks representation in at least one database. Species coverage is highest for location data and lowest for genetic data. Bryophytes and orchids stand out taxonomically and the equatorial region stands out spatially as poorly represented in all databases. We have highlighted a number of clades and regions about which we know little functionally, spatially and genetically, on which we should set research targets. The scientific community should recognize and reward the significant value, both for biodiversity science and conservation, of filling in these gaps in our knowledge of the plant tree of life.
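The headline figure (17.7% of accepted species present in all three databases) is a set-overlap calculation; the toy sketch below shows the form of that calculation on invented species lists.

```python
# Toy sketch of the coverage calculation described above: given species name
# sets drawn from each database, report the fraction of accepted species
# present in all of them. The names below are placeholders, not real data.
accepted = {"Quercus robur", "Bulbophyllum sp1", "Sphagnum sp1", "Poa annua"}
gbif_locations = {"Quercus robur", "Poa annua", "Bulbophyllum sp1"}
try_traits = {"Quercus robur", "Poa annua"}
genbank_genes = {"Quercus robur", "Poa annua", "Sphagnum sp1"}

in_all = accepted & gbif_locations & try_traits & genbank_genes
print(f"{100 * len(in_all) / len(accepted):.1f}% of accepted species appear in all three databases")
for name, db in [("locations", gbif_locations), ("traits", try_traits), ("genes", genbank_genes)]:
    print(name, f"{100 * len(accepted & db) / len(accepted):.1f}% coverage")
```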

8.
Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies.
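As a minimal illustration of combining genomic and proteomic data, the sketch below overlays a list of hypothetical GWAS candidate genes on a protein–protein interaction edge list to find candidates that interact at the protein level; the gene names and edges are toy data, not drawn from any particular database.

```python
# Hedged sketch (not from the review): overlay GWAS candidate genes on a
# protein-protein interaction edge list to find candidate pairs that interact.
ppi_edges = [("BRCA1", "BARD1"), ("TP53", "MDM2"), ("APOE", "LRP1")]
gwas_candidates = {"BRCA1", "BARD1", "APOE", "FTO"}

interacting_pairs = [
    (a, b) for a, b in ppi_edges if a in gwas_candidates and b in gwas_candidates
]
print(interacting_pairs)  # [('BRCA1', 'BARD1')]: candidates linked at the protein level
```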

9.

Background

Resveratrol is a natural compound suggested to have beneficial health effects. However, people are consuming resveratrol for this reason without adequate scientific evidence for its effects in humans. Scientifically valid recommendations concerning human intake of resveratrol, based on the available published scientific data, are therefore necessary. Such recommendations were formulated after the Resveratrol 2010 conference, held in September 2010 in Helsingør, Denmark.

Methodology

A literature search in databases such as PubMed and ISI Web of Science, combined with manual searching, was used to answer the following five questions: (1) Can resveratrol be recommended in the prevention or treatment of human diseases? (2) Are there observed "side effects" caused by the intake of resveratrol in humans? (3) What is the relevant dose of resveratrol? (4) What valid data are available regarding an effect in various species of experimental animals? (5) Which relevant (overall) mechanisms of action of resveratrol have been documented?

Conclusions/Significance

The overall conclusion is that the published evidence is not sufficiently strong to justify a recommendation for the administration of resveratrol to humans beyond the dose that can be obtained from dietary sources. On the other hand, animal data are promising for the prevention of various cancer types, coronary heart disease and diabetes, which strongly indicates the need for human clinical trials. Finally, we suggest directions for future research on resveratrol regarding its mechanism of action and its safety and toxicology in human subjects.

10.

Background

Mobile health (mHealth) has undergone exponential growth in recent years. Patients and healthcare professionals are increasingly using health-related applications, at the same time as concerns about ethical issues, bias, conflicts of interest and privacy are emerging. The general aim of this paper is to provide an overview of the current state of development of mHealth.

Methods and Findings

To exemplify the issues, we conducted a systematic review of the pain-related apps available in scientific databases (Medline, Web of Science, Gale, Psycinfo, etc.) and the main application shops (App Store, Blackberry App World, Google Play, Nokia Store and Windows Phone Store). Only applications (designed for both patients and clinicians) focused on pain education, assessment and treatment were included. The 47 papers found in the scientific databases covered 34 apps, none of which was available in the app shops; conversely, a total of 283 pain-related apps were found in the five shops searched, but no articles have been published on any of them. The main limitation of this review is that we did not search all stores in all countries.

Conclusions

There is a huge gap between the scientific and commercial faces of mHealth. Specific efforts are needed to facilitate knowledge translation and regulate commercial health-related apps.

11.
Proteins of unknown function are a barrier to our understanding of molecular biology. Assigning function to these "uncharacterized" proteins is imperative, but challenging. The usual approach is similarity searching against annotation databases, which is useful for predicting function. However, because the performance of these databases on uncharacterized proteins is essentially unknown, the accuracy of their predictions is suspect, making annotation difficult. To address this challenge, we developed a benchmark annotation dataset of 30 proteins in Shewanella oneidensis. The proteins in the dataset were originally uncharacterized after the initial annotation of the S. oneidensis proteome in 2002; in the intervening 5 years, the accumulation of new experimental evidence has enabled specific functions to be predicted. We used this benchmark dataset to evaluate several commonly used annotation databases. According to our criteria, six annotation databases accurately predicted functions for at least 60% of the proteins in our dataset, and two of these six even had a "conditional accuracy" of 90%. Conditional accuracy is a second evaluation metric we developed, which excludes cases in which a database predicted no function. In addition, the functions of 27 of the 30 proteins were correctly predicted by at least one database. These results represent one of the first performance evaluations of annotation databases on uncharacterized proteins. Our evaluation indicates that these databases readily incorporate new information and are accurate in predicting functions for uncharacterized proteins, provided that experimental evidence of function exists.
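The two evaluation metrics described, accuracy and conditional accuracy, reduce to simple counts; the sketch below computes both on toy predictions, with invented locus tags and function labels.

```python
# Sketch of the two metrics described above: plain accuracy over all benchmark
# proteins, and "conditional accuracy", which excludes proteins for which the
# database returned no prediction. All labels below are toy values.
def evaluate(predictions, truth):
    """predictions: {protein: predicted_function or None}; truth: {protein: function}."""
    correct = sum(1 for p, f in predictions.items() if f is not None and f == truth[p])
    attempted = sum(1 for f in predictions.values() if f is not None)
    accuracy = correct / len(predictions)
    conditional_accuracy = correct / attempted if attempted else float("nan")
    return accuracy, conditional_accuracy

truth = {"SO_1234": "chemotaxis", "SO_2345": "heme transport", "SO_3456": "protease"}
preds = {"SO_1234": "chemotaxis", "SO_2345": None, "SO_3456": "protease"}
print(evaluate(preds, truth))  # roughly (0.67, 1.0): only plain accuracy penalizes the missing call
```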

12.
Nowadays we are experiencing remarkable growth in the number of databases accessible over the Web. In a certain number of cases, however, such as BioImage, this information is not of a textual nature, posing new challenges for the design of tools to handle these data. In this work, we concentrate on the development of new mechanisms aimed at "querying" these databases of complex data sets by their intrinsic content, rather than by their textual annotations only. We concentrate our efforts on a subset of BioImage containing 3D images (volumes) of biological macromolecules, implementing a first prototype of a "query-by-content" system. In the context of databases of complex data types, the term query-by-content refers to data modeling techniques in which user-defined functions aim at "understanding" (to some extent) the informational content of the data sets. In these systems the matching criteria introduced by the user relate to intrinsic features of the 3D images themselves, hence complementing traditional queries by textual keywords only. Efficient computational algorithms are required to "extract" structural information from the 3D images prior to storing them in the database, and easy-to-use interfaces should be implemented to obtain feedback from the expert. Our query-by-content prototype is used to construct a concrete query, making use of basic structural features, which is then evaluated over a set of three-dimensional images of biological macromolecules. This experimental implementation can be accessed via the Web at the BioImage server in Madrid, at http://www.bioimage.org/qbc/index.html.
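The structural features used by the BioImage prototype are not specified in the abstract, so the sketch below illustrates the general query-by-content idea with generic descriptors (total density and radius of gyration) computed from 3D arrays, ranking database volumes by distance to the query. It requires NumPy and uses random volumes as stand-ins for real density maps.

```python
# Illustrative sketch of "query by content" on 3D density volumes: extract a
# few coarse descriptors from each volume and rank database entries by distance
# to the query's descriptors. These descriptors are generic, not the prototype's.
import numpy as np

def descriptors(volume):
    """Coarse shape features of a 3D density map (NumPy array): total mass, radius of gyration."""
    coords = np.argwhere(volume > 0)
    weights = volume[volume > 0]
    center = np.average(coords, axis=0, weights=weights)
    rgyr = np.sqrt(np.average(np.sum((coords - center) ** 2, axis=1), weights=weights))
    return np.array([weights.sum(), rgyr])

rng = np.random.default_rng(0)
query = rng.random((16, 16, 16))
database = {f"vol{i}": rng.random((16, 16, 16)) for i in range(5)}

q = descriptors(query)
ranking = sorted(database, key=lambda k: np.linalg.norm(descriptors(database[k]) - q))
print(ranking)  # entries ordered by similarity of their coarse descriptors to the query
```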

13.

Background

Studies that use electronic health databases as research material are becoming popular, but the influence of a single electronic health database has not yet been well investigated. The United Kingdom's General Practice Research Database (GPRD) is one of the few electronic health databases publicly available to academic researchers. This study analyzed studies that used the GPRD to demonstrate the scientific production and academic impact generated by a single public health database.

Methodology and Findings

A total of 749 studies published between 1995 and 2009 with ‘General Practice Research Database’ as their topic, defined here as GPRD studies, were extracted from Web of Science. By the end of 2009, the GPRD had attracted 1251 authors from 22 countries and had been used extensively in 749 studies published in 193 journals across 58 study fields. Each GPRD study was cited 2.7 times on average by subsequent studies. Moreover, the total number of GPRD studies has increased rapidly and is expected to reach 1500 by 2015, twice the number accumulated by the end of 2009. Since 17 of the most prolific authors (1.4% of all authors) contributed nearly half (47.9%) of GPRD studies, success in conducting GPRD studies may accumulate. The GPRD was used mainly in, but not limited to, three study fields: “Pharmacology and Pharmacy”, “General and Internal Medicine”, and “Public, Environmental and Occupational Health”. The UK and the United States were the two most active regions for GPRD studies, and one-third of GPRD studies were internationally co-authored.
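The projection that the GPRD literature would double by 2015 is an extrapolation of the publication trend. The sketch below shows one simple way such a projection can be made, by fitting a log-linear trend to annual counts and summing the projected years; the yearly counts are invented, and this is not necessarily the model the authors used.

```python
# Toy sketch of the kind of extrapolation behind "twice the 2009 total by 2015".
# The annual counts are hypothetical, not the study's data.
import numpy as np

years = np.arange(2000, 2010)
annual = np.array([20, 25, 32, 38, 47, 55, 68, 80, 95, 110])  # hypothetical yearly publications

slope, intercept = np.polyfit(years, np.log(annual), 1)       # log-linear growth fit
future = np.arange(2010, 2016)
projected_annual = np.exp(intercept + slope * future)
projected_total = annual.sum() + projected_annual.sum()
print(round(projected_total))  # cumulative count projected through 2015
```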

Conclusions

A public electronic health database such as the GPRD will promote scientific production in many ways. Data owners of electronic health databases at a national level should consider how to reduce access barriers and to make data more available for research.

14.
A semantic analysis of the annotations of the human genome
The correct interpretation of any biological experiment depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are ubiquitous and are used by all life scientists in most experiments. However, it is well known that such databases are incomplete and that many annotations may also be incorrect. In this paper we describe a technique that can be used to analyze the semantic content of such annotation databases. Our approach is able to extract implicit semantic relationships between genes and functions, which allows us to discover novel functions for known genes. The approach can also identify missing and inaccurate annotations in existing annotation databases, and thus help improve their accuracy. We used our technique to analyze the current annotations of the human genome. From this body of annotations, we were able to predict 212 additional gene-function assignments. A subsequent literature search found that 138 of these gene-function assignments are supported by existing peer-reviewed papers, and an additional 23 assignments have been confirmed in the meantime by the addition of the respective annotations in later releases of the Gene Ontology database. Overall, the 161 confirmed assignments represent 75.95% of the proposed gene-function assignments. Only one of our predictions (0.4%) was contradicted by the existing literature; we could not find any relevant articles for 50 of our predictions (23.58%). The method is independent of the organism and can be used to analyze and improve the quality of the data of any public or private annotation database.
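The paper's actual algorithm is not spelled out in this summary, so the sketch below illustrates the general idea of extracting implicit relationships from co-annotation: when most genes carrying term A also carry term B, term B is proposed for the remaining genes that carry only A. The annotations and the 0.7 confidence threshold are invented.

```python
# Hedged illustration (not the authors' algorithm): infer an implicit relation
# between two annotation terms from co-annotation frequency, then propose the
# second term for genes that carry only the first. Annotations are toy data.
from collections import defaultdict

annotations = {
    "GENE1": {"DNA repair", "response to UV"},
    "GENE2": {"DNA repair", "response to UV"},
    "GENE3": {"DNA repair", "response to UV"},
    "GENE4": {"DNA repair"},          # candidate for the implied term
    "GENE5": {"glycolysis"},
}

def implied_terms(annotations, min_confidence=0.7):
    """Propose term B for genes with term A when P(B | A) exceeds a threshold."""
    genes_with = defaultdict(set)
    for gene, terms in annotations.items():
        for t in terms:
            genes_with[t].add(gene)
    proposals = []
    for a, ga in genes_with.items():
        for b, gb in genes_with.items():
            if a == b:
                continue
            confidence = len(ga & gb) / len(ga)
            if confidence >= min_confidence:
                proposals += [(g, b) for g in ga - gb]
    return proposals

print(implied_terms(annotations))  # [('GENE4', 'response to UV')]
```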

15.

Background

Extracting biological knowledge from large amounts of gene expression information deposited in public databases is a major challenge of the postgenomic era. Additional insights may be derived by data integration and cross-platform comparisons of expression profiles. However, database meta-analysis is complicated by differences in experimental technologies, data post-processing, database formats, and inconsistent gene and sample annotation.

Results

We have analysed expression profiles from three public databases: Gene Expression Atlas, SAGEmap and TissueInfo. These are repositories of oligonucleotide microarray, Serial Analysis of Gene Expression (SAGE) and Expressed Sequence Tag (EST) human gene expression data, respectively. We devised a method, the Preferential Expression Measure (PEM), to identify genes that are significantly over- or under-expressed in any given tissue. We examined intra- and inter-database consistency of Preferential Expression Measures. There was good correlation between replicate experiments of oligonucleotide microarray data, but there was less coherence in expression profiles as measured by SAGE and EST counts. We investigated inter-database correlations for six tissue categories for which data were present in all three databases. Significant positive correlations were found for brain, prostate and vascular endothelium, but not for ovary, kidney and pancreas.
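The PEM formula itself is not given in this summary, so the sketch below uses an assumed stand-in: the log2 ratio of a gene's observed expression share in a tissue to the share expected under uniform expression, positive for over-expression and negative for under-expression. The expression values are invented.

```python
# Assumed stand-in for a PEM-like score, not the published formula: the log2
# ratio of observed tissue expression to the value expected if expression were
# spread uniformly across the profiled tissues.
import math

def pem(expression_by_tissue, tissue):
    """expression_by_tissue: {tissue: expression value or tag count} for one gene."""
    total = sum(expression_by_tissue.values())
    expected = total / len(expression_by_tissue)
    observed = expression_by_tissue[tissue]
    return math.log2(observed / expected) if observed and expected else float("-inf")

gene = {"brain": 120, "prostate": 5, "kidney": 10, "ovary": 9}
print(round(pem(gene, "brain"), 2))     # strongly positive: brain-enriched
print(round(pem(gene, "prostate"), 2))  # negative: under-expressed in prostate
```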

Conclusion

We show that data from Gene Expression Atlas, SAGEmap and TissueInfo can be integrated using the UniGene gene index, and that expression profiles correlate relatively well when large numbers of tags are available or when tissue cellular composition is simple. Finally, in the case of brain, we demonstrate that when PEM values show good correlation, predictions of tissue-specific expression based on integrated data are very accurate.

16.
When we look at the rapid growth of scientific databases on the Internet in the past decade, we tend to take the accessibility and provenance of the data for granted. As we see a future of increased database integration, the licensing of the data may be a hurdle that hampers progress and usability. We have formulated four rules for licensing data for open drug discovery, which we propose as a starting point for consideration by databases and for their ultimate adoption. This work could also be extended to the computational models derived from such data. We suggest that scientists in the future will need to consider data licensing before they embark upon re-using such content in databases they construct themselves.

17.
The National Science Foundation's EarthCube End User Workshop was held at the USC Wrigley Marine Science Center on Catalina Island, California, in August 2013. The workshop was designed to explore and characterize the needs of, and the tools available to, the community focusing on microbial and physical oceanography research, with a particular emphasis on 'omic research. The assembled researchers outlined existing concerns regarding the vast data resources being generated and how we will deal with these resources as their volume and diversity increase. Particular attention was focused on the tools for handling and analyzing the existing data, on the need for the construction and curation of diverse federated databases, and on the development of shared, interoperable, "big-data capable" analytical tools. The key outputs from this workshop include (i) critical scientific challenges and cyber-infrastructure constraints, (ii) the current and future ocean 'omics science grand challenges and questions, and (iii) the data management, analytical and associated cyber-infrastructure capabilities required to meet critical current and future scientific challenges. The main thrust of the meeting, and the outcome of this report, is a definition of the 'omics tools, technologies and infrastructures that facilitate continued advances in ocean science: biology, marine biogeochemistry, and biological oceanography.

18.
Privacy laws are intended to preserve human well-being and improve medical outcomes. We used the Sportstats website, a repository of competitive athletic data, to test how easily these laws can be circumvented. We designed a haphazard, unrepresentative case-series analysis and applied unscientific methods based on an Internet connection and idle time. We found it both feasible and titillating to breach anonymity, stockpile personal information and generate misquotations. We extended our methods to snoop on celebrities, link to outside databases and uncover refusal to participate. Throughout our study, we evaded capture and public humiliation despite violating these 6 privacy fundamentals. We suggest that the legitimate principle of safeguarding personal privacy is undermined by the natural human tendency toward showing off.

We are shocked! Shocked! Shocked! We are shocked at the amount of sensitive personal information being released on thousands of Canadians, including some of our country's most prominent citizens. The widespread dispersal of and the easy access to health data offend our sensibilities as medical scientists who are respectful of Canadian privacy laws. We prefer to jump through innumerable bureaucratic hoops to obtain data for research, and we believe that our rivals in other scientific fields ought to do the same.

We uphold traditional values. We reminisce about the golden age when conducting a chart review was the standard for measuring quality of care. Ethics submissions were like sustained foreplay, and privacy impact assessments provided another thrill verging on “joy of the forbidden.” The 3-week turnarounds gave us time to savour and appreciate every passing minute. And joy! Even more delays occurred when health records departments could not find the relevant charts.

Woe unto those who visit the Sportstats website (www.sportstats.ca).1 This site reveals personal data obtained from timers affixed to athletes competing in sporting events across North America. This database is thorough and is searchable for many past years. In fact, we recommend using these data if you need personal information about your neighbour, nemesis or boss. In this article, we offer pointers on 6 violations of privacy for those mavericks who flout the scientific establishment (not us!).

19.
Datta S, Sundaram R. Biometrics 2006;62(3):829-837
Multistage models are used to describe individuals (or experimental units) moving through a succession of "stages" corresponding to distinct states (e.g., healthy, diseased, diseased with complications, dead). The resulting data can be considered to be a form of multivariate survival data containing information about the transition times and the stages occupied. Traditional survival analysis is the simplest example of a multistage model, where individuals begin in an initial stage (say, alive) and move irreversibly to a second stage (death). In this article, we consider general multistage models with a directed tree structure (progressive models) in which individuals traverse through stages in a possibly non-Markovian manner. We construct nonparametric estimators of stage occupation probabilities and marginal cumulative transition hazards. Empirical calculations of these quantities are not possible due to the lack of complete data. We consider current status information which represents a more severe form of censoring than the commonly used right censoring. Asymptotic validity of our estimators can be justified using consistency results for nonparametric regression estimators. Finite-sample behavior of our estimators is studied by simulation, in which we show that our estimators based on these limited data compare well with those based on complete data. We also apply our method to a real-life data set arising from a cardiovascular diseases study in Taiwan.
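The following sketch does not reproduce the authors' estimator; it only simulates a progressive three-stage model (healthy, diseased, dead) and computes the empirical stage occupation probabilities at a chosen time from complete data, i.e., the target quantities that the paper shows how to estimate when only current status information is available.

```python
# Illustration of the target quantity, not the estimator: simulate a progressive
# three-stage model and compute empirical stage occupation probabilities at
# time t from complete (uncensored) transition times. Rates are arbitrary.
import random

random.seed(1)
n = 10_000
transitions = []
for _ in range(n):
    t1 = random.expovariate(0.5)          # healthy -> diseased
    t2 = t1 + random.expovariate(0.25)    # diseased -> dead
    transitions.append((t1, t2))

def occupation_probabilities(t):
    healthy = sum(1 for t1, _ in transitions if t < t1) / n
    diseased = sum(1 for t1, t2 in transitions if t1 <= t < t2) / n
    dead = 1 - healthy - diseased
    return healthy, diseased, dead

print([round(p, 3) for p in occupation_probabilities(2.0)])  # stage shares at t = 2
```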

20.

   

Attempts to engage the scientific community in annotating biological data (such as protein/gene function) stored in databases have not been overly successful. There are several hypotheses as to why, but it is not clear which of them are correct. In this study we surveyed 50 biologists (each of whom had recently published a paper characterizing a gene or protein) to better understand what would make them interested in contributing to biological databases. Based on our survey, two things become clear: (a) database managers need to proactively contact biologists to solicit contributions; and (b) potential contributors need to be provided with an easy-to-use interface and clear instructions on what to annotate. Other factors, such as 'reward' and 'employer/funding agency recognition', previously perceived as motivators, were found to be less important. Based on this study, we propose that community annotation projects devote resources to directly soliciting input and to streamlining the processes and interfaces used to collect it.
