Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The estimation of disease prevalence from online search engine data (e.g., Google Flu Trends (GFT)) has received a considerable amount of scholarly and public attention in recent years. While the utility of search engine data for disease surveillance has been demonstrated, the scientific community still seeks ways to identify and reduce the biases embedded in search engine data. The primary goal of this study is to explore new ways of improving the accuracy of disease prevalence estimations by combining traditional disease data with search engine data. A novel method, Biased Sentinel Hospital-based Area Disease Estimation (B-SHADE), is introduced to reduce search engine data bias from a geographical perspective. We tested our approach on search trends for Hand, Foot and Mouth Disease (HFMD) in Guangdong Province, China, selecting 11 keywords from the Baidu index platform, a Chinese big data analytics tool similar to GFT. The correlation between the number of real cases and the composite index was 0.8. After decomposing the composite index at the city level, we found that only 10 cities presented a correlation of close to 0.8 or higher. These cities were found to be more stable with respect to search volume, and they were selected as sample cities in order to estimate the search volume of the entire province. After the estimation, the correlation improved from 0.8 to 0.864. After fitting the revised search volume with historical cases, the mean absolute error was 11.19% lower than it was when the original search volume and historical cases were combined. To our knowledge, this is the first study to reduce search engine data bias levels through the use of rigorous spatial sampling strategies.
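
The city-selection step described above can be sketched in a few lines. This is not B-SHADE itself, only a simplified illustration of keeping the cities whose search index correlates with reported cases at or above a threshold. The function names, the city labels and all numbers are hypothetical and are not taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def select_stable_cities(search_by_city, cases, threshold=0.8):
    """Return (sorted) the cities whose search series correlates with cases at >= threshold."""
    return sorted(
        city
        for city, series in search_by_city.items()
        if pearson(series, cases) >= threshold
    )


# Hypothetical weekly values, invented for illustration only.
cases = [10, 20, 30, 40, 50]
search_by_city = {
    "city_a": [12, 19, 33, 41, 48],  # follows the case counts closely
    "city_b": [50, 10, 40, 20, 30],  # unrelated to the case counts
}
stable = select_stable_cities(search_by_city, cases)
```

With these invented numbers only `city_a` passes the 0.8 threshold. The study's actual estimator then weights the selected cities to estimate the provincial search volume, a step this sketch does not attempt.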

2.
The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/ ) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB ( http://sbkb.org ), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a functional sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.

3.
Knowing which proteins interact with each other is essential information for understanding how most biological processes at the cellular and organismal level operate and how their perturbation can cause disease. Continuous technical and methodological advances over the last two decades have led to many genome-wide systematically-generated protein–protein interaction (PPI) maps. To help store, visualize, analyze and disseminate these specialized experimental datasets via the web, we developed the freely-available Open-source Protein Interaction Platform (openPIP) as a customizable web portal designed to host experimental PPI maps. Such a portal is often required to accompany a paper describing the experimental data set, in addition to depositing the data in a standard repository. No coding skills are required to set up and customize the database and web portal. OpenPIP has been used to build the databases and web portals of two major protein interactome maps, the Human and Yeast Reference Protein Interactome maps (HuRI and YeRI, respectively). OpenPIP is freely available as a ready-to-use Docker container for hosting and sharing PPI data with the scientific community at http://openpip.baderlab.org/ and the source code can be downloaded from https://github.com/BaderLab/openPIP/.

4.
ABSTRACT: BACKGROUND: Seqcrawler takes its roots in software like SRS or Lucegene. It provides an indexing platform to ease the search of data and meta-data in biological banks, and it can scale to face the current flow of data. While many biological bank search tools are available on the Internet, mainly provided by large organizations to search their own data, there is a lack of free and open source solutions for browsing one's own set of data with a flexible query system that can scale from a single computer to a cloud system. A personal index platform will help labs and bioinformaticians to search their meta-data but also to build a larger information system with custom subsets of data. RESULTS: The software is scalable from a single computer to a cloud-based infrastructure. It has been successfully tested in a private cloud with 3 index shards (pieces of the index) hosting ~400 million sequence entries (whole GenBank, UniProt, PDB and others) for a total size of 600 GB in a fault-tolerant (high-availability) architecture. It has also been successfully integrated with software to add extra meta-data from BLAST results to enhance users' result analysis. CONCLUSIONS: Seqcrawler provides a complete open source search and store solution for labs or platforms needing to manage large amounts of data/meta-data with a flexible and customizable web interface. All components (search engine, visualization and data storage), though independent, share a common and coherent data system that can be queried with a simple HTTP interface. The solution scales easily and can also provide a high-availability infrastructure.

5.
The NIDDK Information Network (dkNET; http://dknet.org) was launched to serve the needs of basic and clinical investigators in metabolic, digestive and kidney disease by facilitating access to research resources that advance the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). By research resources, we mean the multitude of data, software tools, materials, services, projects and organizations available to researchers in the public domain. Most of these are accessed via web-accessible databases or web portals, each developed, designed and maintained by numerous different projects, organizations and individuals. While many of the large government-funded databases, maintained by agencies such as the European Bioinformatics Institute and the National Center for Biotechnology Information, are well known to researchers, many more that have been developed by and for the biomedical research community are unknown or underutilized. At least part of the problem is the nature of dynamic databases, which are considered part of the “hidden” web, that is, content that is not easily accessed by search engines. dkNET was created specifically to address the challenge of connecting researchers to research resources via these types of community databases and web portals. dkNET functions as a “search engine for data”, searching across millions of database records contained in hundreds of biomedical databases developed and maintained by independent projects around the world. A primary focus of dkNET is centers and projects specifically created to provide high quality data and resources to NIDDK researchers. Through the novel data ingest process used in dkNET, additional data sources can easily be incorporated, allowing it to scale with the growth of digital data and the needs of the dkNET community. Here, we provide an overview of the dkNET portal and its functions. We show how dkNET can be used to address a variety of use cases that involve searching for research resources.

6.
Access to public data sets is important to the scientific community as a resource to develop new experiments or validate new data. Projects such as the PeptideAtlas, Ensembl and The Cancer Genome Atlas (TCGA) offer both access to public data and a repository to share their own data. Access to these data sets is often provided through a web page form and a web service API. Access technologies based on web protocols (e.g. http) have been in use for over a decade and are widely adopted across the industry for a variety of functions (e.g. search, commercial transactions, and social media). Each architecture adapts these technologies to provide users with tools to access and share data. Both commonly used web service technologies (e.g. REST and SOAP), and custom-built solutions over HTTP are utilized in providing access to research data. Providing multiple access points ensures that the community can access the data in the simplest and most effective manner for their particular needs. This article examines three common access mechanisms for web accessible data: BioMart, caBIG, and Google Data Sources. These are illustrated by implementing each over the PeptideAtlas repository and reviewed for their suitability based on specific usages common to research. BioMart, Google Data Sources, and caBIG are each suitable for certain uses. The tradeoffs made in the development of the technology are dependent on the uses each was designed for (e.g. security versus speed). This means that an understanding of specific requirements and tradeoffs is necessary before selecting the access technology.

7.
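Retrieving and obtaining biological information has become an important and urgent research topic. Using meta-search engine technology together with the Java and HTML languages, we developed a WWW-based integrated retrieval system for biological information. The system provides a unified search interface and performs multifunctional, composite, fully open integrated retrieval across 15 molecular biology databases and 3 general-purpose search engines. The system is of great and far-reaching significance in helping researchers in medicine, molecular biology, molecular oncology, molecular genetics and human genome research to obtain a wide range of biological information resources accurately, promptly and comprehensively.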

8.
Microarrays and, more recently, RNA sequencing have led to an increase in available gene expression data. How to manage and store these data is becoming a key issue. In response, we have developed EXP-PAC, a web-based software package for storage, management and analysis of gene expression and sequence data. Unique to this package are SQL-based querying of gene expression data sets, distributed normalization of raw gene expression data and analysis of gene expression data across experiments and species. This package has been populated with lactation data in the international milk genomic consortium web portal (http://milkgenomics.org/). Source code is also available and can be hosted on a Windows, Linux or Mac Apache server connected to a private or public network (http://mamsap.it.deakin.edu.au/~pcc/Release/EXP_PAC.html).

9.
Qiao LA  Zhu J  Liu Q  Zhu T  Song C  Lin W  Wei G  Mu L  Tao J  Zhao N  Yang G  Liu X 《Nucleic acids research》2004,32(14):4175-4181
The integration of bioinformatics resources worldwide is one of the major concerns of the biological community. We herein established the BOD (Bioinformatics on demand) system to use Grid computing technology to set up a virtual workbench via a web-based platform, to assist researchers performing customized comprehensive bioinformatics work. Users will be able to submit entire search queries and computation requests, e.g. from DNA assembly to gene prediction and finally protein folding, from their own office using the BOD end-user web interface. The BOD web portal parses the user's job requests into steps, each of which may contain multiple tasks in parallel. The BOD task scheduler takes an entire task, or splits it into multiple subtasks, and dispatches the task or subtasks proportionally to computation node(s) associated with the BOD portal server. A node may further split and distribute an assigned task to its sub-nodes using a similar strategy. In the end, the BOD portal server receives and collates all results and returns them to the user. BOD uses a pipeline model to describe the user's submitted data and stores the job requests/status/results in a relational database. In addition, an XML criterion is established to capture task computation program details.
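
The proportional dispatch described above might look roughly like the following. The abstract does not say how BOD computes the proportions, so this is only a sketch under stated assumptions: each node advertises a positive integer capacity, work is divided into indivisible units, and leftover units are assigned by largest-remainder rounding. The function name and the node labels are hypothetical.

```python
def split_proportionally(total_units, capacities):
    """Split total_units across nodes in proportion to integer capacities.

    Each node first receives the floor of its exact share; units left over
    are handed to the nodes with the largest remainders (ties broken by
    node name), so the shares always sum to total_units.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    total_capacity = sum(capacities.values())
    if total_capacity <= 0 or any(c < 0 for c in capacities.values()):
        raise ValueError("capacities must be non-negative with a positive sum")

    shares = {}
    remainders = {}
    for node, capacity in capacities.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        shares[node], remainders[node] = divmod(total_units * capacity, total_capacity)

    leftover = total_units - sum(shares.values())
    by_remainder = sorted(capacities, key=lambda node: (-remainders[node], node))
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


# Hypothetical example: 10 subtasks over three nodes with capacities 4, 2 and 1.
plan = split_proportionally(10, {"node_a": 4, "node_b": 2, "node_c": 1})
```

A node that itself has sub-nodes could call the same function on its own share, which mirrors the recursive splitting the abstract mentions.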

10.
We tested functionality and acceptability of a wireless fetal monitoring prototype technology in pregnant women in an inpatient labor unit in the United States. Women with full-term singleton pregnancies and no evidence of active labor were asked to wear the prototype technology for 30 minutes. We assessed functionality by evaluating the ability to successfully monitor the fetal heartbeat for 30 minutes, transmit this data to Cloud storage and view the data on a web portal. Three obstetricians also rated fetal cardiotocographs on ease of readability. We assessed acceptability by administering closed and open-ended questions on perceived utility and likeability to pregnant women and clinicians interacting with the prototype technology. Thirty-two women were enrolled, 28 of whom (87.5%) successfully completed 30 minutes of fetal monitoring including transmission of cardiotocographs to the web portal. Four sessions, though completed, were not successfully uploaded to the Cloud storage. Six non-study clinicians interacted with the prototype technology. The primary technical problem observed was a delay in data transmission between the prototype and the web portal, which ranged from 2 to 209 minutes. Delays were ascribed to Wi-Fi connectivity problems. Recorded cardiotocographs received a mean score of 4.2/5 (± 1.0) on ease of readability with an interclass correlation of 0.81 (95% CI 0.45, 0.96). Both pregnant women and clinicians found the prototype technology likable (81.3% and 66.7% respectively), useful (96.9% and 66.7% respectively), and would either use it again or recommend its use to another pregnant woman (77.4% and 66.7% respectively). In this pilot study we found that this wireless fetal monitoring prototype technology has potential for use in a United States inpatient setting but would benefit from some technology changes. We found it to be acceptable to both pregnant women and clinicians. Further research is needed to assess feasibility of using this technology in busy inpatient settings.

11.
12.
The exponential increase of image data in high-resolution reconstructions by electron cryomicroscopy (cryoEM) has created a need for efficient data management solutions in addition to powerful data processing procedures. Although relational databases and web portals are commonly used to manage sequences and structures in biological research, their application in cryoEM has been limited due to the complexity of accomplishing the dual tasks of interacting with proprietary software and simultaneously providing data access to users without database knowledge. Here, we report our results in developing a web portal to the SQL image databases used by the Image Management and Icosahedral Reconstruction System (IMIRS) to manage cryoEM images for subnanometer-resolution reconstructions. Fundamental issues related to the design and deployment of web portals to image databases are described. A web browser-based user interface was designed to accomplish data reporting and other database-related services, including user authentication, data entry, graph-based data mining, and various query and reporting tasks with interactive image manipulation capabilities. With an integrated web portal, IMIRS represents the first cryoEM application that incorporates both web-based data reporting tools and a complete set of data processing modules. Our examples should thus provide general guidelines applicable to other cryoEM technology development efforts.

13.
Salmonellosis, caused by Salmonella serovars, is one of the most common and widely distributed foodborne diseases. The emergence of multidrug-resistant strains has become a threatening public health problem, and targeting unique effectors of this pathogen can be considered a powerful strategy for drug design. SalmonellaBase is an online web portal serving as an integrated source of information about Salmonella serovars, with the data required for structural and functional studies and for the analysis of druggable targets in Salmonella. We have identified several target proteins that contribute to the pathogenicity of the organism and predicted their structures. The database will contain information on completely sequenced genomes of Salmonella species with the complete set of protein sequences of the respective strains, determined structures, predicted protein structures and biochemical pathways of the respective strains. In addition, we have provided information about the name and source of each protein, UniProt and Protein Data Bank codes and literature information. Furthermore, SalmonellaBase is linked to related databases and other resources. We have set up a web interface with different search and display options so that users can retrieve the data in several ways. SalmonellaBase is a freely available database.

Availability

http://www.salmonellabase.com/

14.
SUMMARY: The Helmholtz Network for Bioinformatics (HNB) is a joint venture of eleven German bioinformatics research groups that offers convenient access to numerous bioinformatics resources through a single web portal. The 'Guided Solution Finder' which is available through the HNB portal helps users to locate the appropriate resources to answer their queries by employing a detailed, tree-like questionnaire. Furthermore, automated complex tool cascades ('tasks'), involving resources located on different servers, have been implemented, allowing users to perform comprehensive data analyses without the requirement of further manual intervention for data transfer and re-formatting. Currently, automated cascades for the analysis of regulatory DNA segments as well as for the prediction of protein functional properties are provided. AVAILABILITY: The HNB portal is available at http://www.hnbioinfo.de

15.
Protein-protein interactions (PPIs) are the basis of biological functions. Knowledge of the interactions of a protein can help understand its molecular function and its association with different biological processes and pathways. Several publicly available databases provide comprehensive information about individual proteins, such as their sequence, structure, and function. There also exist databases that are built exclusively to provide PPIs by curating them from published literature. The information provided in these web resources is protein-centric, and not PPI-centric. The PPIs are typically provided as lists of interactions of a given gene with links to interacting partners; they do not present a comprehensive view of the nature of both the proteins involved in the interactions. A web database that allows search and retrieval based on biomedical characteristics of PPIs is lacking, and is needed. We present Wiki-Pi (read Wiki-π), a web-based interface to a database of human PPIs, which allows users to retrieve interactions by their biomedical attributes such as their association with diseases, pathways, drugs and biological functions. Each retrieved PPI is shown with annotations of both of the participant proteins side-by-side, creating a basis to hypothesize the biological function facilitated by the interaction. Conceptually, it is a search engine for PPIs analogous to PubMed for scientific literature. Its usefulness in generating novel scientific hypotheses is demonstrated through the study of IGSF21, a little-known gene that was recently identified to be associated with diabetic retinopathy. Using Wiki-Pi, we infer that its association with diabetic retinopathy may be mediated through its interactions with the genes HSPB1, KRAS, TMSB4X and DGKD, and that it may be involved in cellular response to external stimuli, cytoskeletal organization and regulation of molecular activity. The website also provides a wiki-like capability allowing users to describe or discuss an interaction. Wiki-Pi is available publicly and freely at http://severus.dbmi.pitt.edu/wiki-pi/.

16.
Andromeda: a peptide search engine integrated into the MaxQuant environment   Cited by: 3 (self-citations: 0, citations by others: 3)
A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
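
The target-decoy evaluation mentioned above rests on a simple counting idea, sketched here in generic form. This is not Andromeda's scoring model or its actual evaluation code; it only shows the common estimate of the false discovery rate as the ratio of decoy hits to target hits above a score threshold. The function name, the tuple layout and the scores are hypothetical.

```python
def fdr_at_threshold(matches, threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) among matches scoring >= threshold.

    `matches` is an iterable of (score, is_decoy) pairs, one per
    peptide-spectrum match from a search against targets plus decoys.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        # No accepted target hits: the ratio is 0 if nothing passed at all,
        # otherwise unbounded.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


# Invented scores, for illustration only.
matches = [
    (95.0, False),
    (90.0, False),
    (80.0, False),
    (75.0, True),
    (60.0, False),
    (55.0, True),
]
strict = fdr_at_threshold(matches, 85.0)
loose = fdr_at_threshold(matches, 50.0)
```

Raising the threshold lowers the estimated error rate at the cost of fewer accepted identifications, which is the sensitivity versus specificity trade-off on which two search engines can be compared.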

17.
Seeber F 《Nature protocols》2007,2(10):2418-2428
This communication provides an easy-to-follow protocol for using the free Internet-accessible scientific search engine, Scirus, to search for and subsequently retrieve published patents from several patent offices in portable document format (PDF). Hints on how to 'read' patents and how to extract relevant information, as well as how to export bibliographic data from Scirus and how to cite patents, are also given. The reason for providing such a protocol is that a vast amount of information, also of potential interest to life scientists, is largely hidden from those who do not know how to access these data. Several examples are provided that highlight the reasons to include patent searches in the workflow of life scientists. These include early access to data before publication, patents as a source of data that never appear in the literature and patents as a source of critical information otherwise hard to get from commercial suppliers. Finally, alternative free patent search services are briefly discussed, and their differences are highlighted.

18.
Many methods developed for estimating the reliability of protein–protein interactions are based on the topology of protein–protein interaction networks. This paper describes a new reliability measure for protein–protein interactions, which does not rely on the topology of protein interaction networks, but expresses biological information on functional roles, sub-cellular localisations and protein classes as a scoring schema. The new measure is useful for filtering out many spurious interactions, as well as for estimating the reliability of protein interaction data. In particular, the reliability measure can be used to search databases for protein–protein interactions with the desired reliability. The reliability-based search engine is available at http://yeast.hpid.org. We believe this is the first search engine for interacting proteins that has been made available to the public. The search engine and the reliability measure of protein interactions should provide useful information for determining which proteins to focus on.

19.
Comparison of primate genomic sequences has demonstrated that intra- and interspecific genetic variation is provided by retroelements (REs). The human genome contains many thousands of polymorphic RE copies, which are regarded as a promising source of new generation molecular genetic markers. However, the absence of systematized data on the RE number, distribution, genomic context, and abundance in various human populations limits the use of RE insertion polymorphism. We designed the first bilingual (Russian/English) web resource on the known polymorphic REs discovered both by our team and other researchers. The database contains information about the genomic location of each RE, its position relative to known and predicted genes, abundance in human populations, and other data. Our web portal () allows a search of the database with user-specified parameters. The database makes it possible to comprehensively analyze the RE distribution in the human genome and to design molecular genetic markers for studies of human genome diversity and biomedical applications.

20.
SUMMARY: DaliLite is a program for pairwise structure comparison and for structure database searching. It is a standalone version of the search engine of the popular Dali server. A web interface is provided to view the results, multiple alignments and 3D superimpositions of structures.
