Similar articles
20 similar articles found (search time: 31 ms)
1.
The Proteomics Identifications Database (PRIDE, www.ebi.ac.uk/pride) is one of the main repositories of MS-derived proteomics data. Here, we point out the main functionalities of PRIDE both as a submission repository and as a source for proteomics data. We describe the main features for data retrieval and visualization available through the PRIDE web and BioMart interfaces. We also highlight the mechanism by which tailored queries in the BioMart can join PRIDE to other resources such as Reactome, Ensembl or UniProt to execute powerful cross-domain queries. We then present the latest improvements in the PRIDE submission process, using the new easy-to-use, platform-independent graphical user interface submission tool PRIDE Converter. Finally, we discuss future plans and the role of PRIDE in the ProteomeXchange consortium.

2.
biomaRt is a new Bioconductor package that integrates BioMart data resources with data analysis software in Bioconductor. It can annotate a wide range of gene or gene product identifiers (e.g. Entrez-Gene and Affymetrix probe identifiers) with information such as gene symbol, chromosomal coordinates, Gene Ontology and OMIM annotation. Furthermore, biomaRt enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis. Fast and up-to-date data retrieval is possible as the package executes direct SQL queries to the BioMart databases (e.g. Ensembl). The biomaRt package provides tight integration of large, public or locally installed BioMart databases with data analysis in Bioconductor, creating a powerful environment for biological data mining.
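While biomaRt itself is an R package, the underlying BioMart web service accepts the same XML query document from any language. The following is a minimal sketch in Python that assembles such a query against Ensembl's public BioMart endpoint (the endpoint URL, dataset, filter and attribute names are taken from the public Ensembl BioMart interface; treat them as assumptions to verify against the live service):

```python
from xml.sax.saxutils import quoteattr
from urllib import request, parse

# Public Ensembl BioMart REST endpoint (assumed; verify against current docs).
MART_URL = "https://www.ensembl.org/biomart/martservice"

def build_biomart_query(dataset, filters, attributes):
    """Assemble the XML query document that the BioMart REST service expects."""
    filter_xml = "".join(
        f"<Filter name={quoteattr(n)} value={quoteattr(v)}/>"
        for n, v in filters.items()
    )
    attr_xml = "".join(f"<Attribute name={quoteattr(a)}/>" for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f"{filter_xml}{attr_xml}</Dataset></Query>"
    )

def run_query(xml_query):
    """POST the query and return the TSV response (requires network access)."""
    data = parse.urlencode({"query": xml_query}).encode()
    with request.urlopen(MART_URL, data=data) as resp:
        return resp.read().decode()

# Example: annotate an Ensembl gene ID with its HGNC symbol and chromosome.
query = build_biomart_query(
    "hsapiens_gene_ensembl",
    {"ensembl_gene_id": "ENSG00000139618"},
    ["ensembl_gene_id", "hgnc_symbol", "chromosome_name"],
)
```

Calling `run_query(query)` would return tab-separated rows, much like the data frames biomaRt returns in R.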

3.
MOTIVATION: The complexity of cancer is prompting researchers to find new ways to synthesize information from diverse data sources and to carry out coordinated research efforts that span multiple institutions. There is a need for standard applications, common data models, and software infrastructure to enable more efficient access to and sharing of distributed computational resources in cancer research. To address this need the National Cancer Institute (NCI) has initiated a national-scale effort, called the cancer Biomedical Informatics Grid (caBIG™), to develop a federation of interoperable research information systems. RESULTS: At the heart of the caBIG approach to federated interoperability is a Grid middleware infrastructure, called caGrid. In this paper we describe the caGrid framework and its current implementation, caGrid version 0.5. caGrid is a model-driven and service-oriented architecture that synthesizes and extends a number of technologies to provide a standardized framework for the advertising, discovery, and invocation of data and analytical resources. We expect caGrid to greatly facilitate the launch and ongoing management of coordinated cancer research studies involving multiple institutions, to provide the ability to manage and securely share information and analytic resources, and to spur a new generation of research applications that empower researchers to take a more integrative, trans-domain approach to data mining and analysis. AVAILABILITY: The caGrid version 0.5 release can be downloaded from https://cabig.nci.nih.gov/workspaces/Architecture/caGrid/. The operational test bed Grid can be accessed through the client included in the release, or through the caGrid-browser web application http://cagrid-browser.nci.nih.gov.

4.
Proteome information resources of farm animals are lagging behind those of the classical model organisms despite their important biological and economic relevance. Here, we present a Bovine PeptideAtlas, representing a first collection of Bos taurus proteome data sets within the PeptideAtlas framework. This database was built primarily as a source of information for designing selected reaction monitoring assays for studying milk production and mammary gland health, but it has an intrinsic general value for the farm animal research community. The Bovine PeptideAtlas comprises 1921 proteins at 1.2% false discovery rate (FDR) and 8559 distinct peptides at 0.29% FDR identified in 107 samples from six tissues. The PeptideAtlas web interface has a rich set of visualization and data exploration tools, enabling users to interactively mine information about individual proteins and peptides, their prototypic features, genome mappings, and supporting spectral evidence.
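The FDR figures quoted above are conventionally estimated with a target-decoy strategy: matches against reversed or shuffled "decoy" sequences calibrate how many target matches above a score cutoff are expected to be wrong. A minimal sketch of that idea (the scoring data here are invented for illustration):

```python
def decoy_fdr(scores, threshold):
    """Estimate FDR at a score threshold from (score, is_decoy) pairs:
    FDR ~= decoy hits / target hits at or above the threshold."""
    targets = sum(1 for s, d in scores if not d and s >= threshold)
    decoys = sum(1 for s, d in scores if d and s >= threshold)
    return decoys / targets if targets else 0.0

def threshold_for_fdr(scores, max_fdr):
    """Lowest score cutoff whose estimated FDR stays within max_fdr."""
    for t in sorted({s for s, _ in scores}):
        if decoy_fdr(scores, t) <= max_fdr:
            return t
    return None

# Toy search results: (score, matched a decoy sequence?)
scores = [(10, False), (9, False), (8, True), (7, False), (6, True)]
```

With these toy scores, only matches scoring 9 or above survive a 1% FDR cutoff.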

5.
6.
7.
Abstract

Within the context of the Global Biodiversity Information Facility (GBIF), the Biological Collections Access Service (BioCASe) has been set up to foster data provision by natural history content providers. Products include the BioCASe Protocol and the PyWrapper software, a web service that provides access to rich natural history data using complex schemas like ABCD (Access to Biological Collection Data). New developments include the possibility to produce DarwinCore-Archive files using PyWrapper, in order to facilitate the indexing of large datasets by aggregators such as GBIF. However, BioCASe continues to be committed to distributed data access and continues to provide the possibility to query the web service for up-to-date data directly from the provider's database. ABCD provides comprehensive coverage of natural history data, and has been extended to cover DNA collections (ABCD-DNA) and geosciences (ABCD-EFG, the extension for geosciences). BioCASe has also developed web portal software that allows users to access and display rich data provided by special interest networks. We posit that the XML-based networking approach using a highly standardised data definition such as ABCD continues to be a valuable approach towards mobilising natural history information. Some suggestions are made regarding further improvements of ABCD.
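A Darwin Core Archive, as mentioned above, is essentially a zip file whose `meta.xml` descriptor maps columns of a delimited core data file to Darwin Core terms. The sketch below builds a deliberately simplified archive in memory and reads it back (real archives have more descriptor options, e.g. an `<id>` element and extensions, which are omitted here):

```python
import io
import zipfile
import csv
import xml.etree.ElementTree as ET

# Simplified meta.xml: one core file, two Darwin Core terms.
META = """<?xml version="1.0"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""

ROWS = "occ1\tPhytophthora infestans\nocc2\tBos taurus\n"

def read_dwca(buf):
    """Parse a (simplified) Darwin Core Archive: locate the core data file
    via meta.xml and yield each row as a {term: value} dict."""
    with zipfile.ZipFile(buf) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
        ns = {"dwc": "http://rs.tdwg.org/dwc/text/"}
        core = root.find("dwc:core", ns)
        location = core.find("dwc:files/dwc:location", ns).text
        terms = [f.get("term").rsplit("/", 1)[-1]
                 for f in core.findall("dwc:field", ns)]
        text = zf.read(location).decode("utf-8")
        for row in csv.reader(io.StringIO(text), delimiter="\t"):
            yield dict(zip(terms, row))

# Build a tiny archive in memory and read it back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("meta.xml", META)
    zf.writestr("occurrence.txt", ROWS)
records = list(read_dwca(buf))
```

This descriptor-driven layout is what makes the format easy for aggregators like GBIF to index in bulk.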

8.
New ‘omics’ technologies are changing nutritional sciences research. They enable researchers to tackle increasingly complex questions but also increase the need for collaboration between research groups. An important challenge for successful collaboration is the management and structured exchange of information that accompanies data-intense technologies. NuGO, the European Nutrigenomics Organization, the major collaborating network in molecular nutritional sciences, is supporting the application of modern information technologies in this area. We have developed and implemented a concept for data management and computing infrastructure that supports collaboration between nutrigenomics researchers. The system fills the gap between “private” storage with occasional file sharing by email and the use of centralized databases. It provides flexible tools to share data, also during experiments, while preserving ownership. The NuGO Information Network is a decentralized, distributed system for data exchange based on standard web technology. Secure access to data, maintained by the individual researcher, is enabled by web services based on the BioMoby framework. A central directory provides information about available web services. The flexibility of the infrastructure allows a wide variety of services for data processing and integration by combining several web services, including public services. Therefore, this integrated information system is suited for other research collaborations.

9.
We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to solve central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides and 17,858 uniquely identified proteins (one isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides ≥9 amino acids each), which were assigned as canonical proteins, plus 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms currently not in Araport11 were identified that were generated from pseudogenes, alternative starts, stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS data.

A web resource providing the global community with mass spectrometry-based Arabidopsis proteome information and its spectral, technical, and biological metadata integrated with TAIR and JBrowse.
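The canonical-protein criterion quoted above (at least two non-nested peptides of at least nine residues) is simple enough to sketch directly. Here "non-nested" is approximated as "neither peptide is a substring of the other", which is a simplification of PeptideAtlas's exact definition:

```python
def is_canonical(peptides, min_peptides=2, min_length=9):
    """Approximate the canonical-protein test described above: require at
    least `min_peptides` non-nested peptides of >= `min_length` residues.
    Non-nested is approximated as: not contained in a longer peptide."""
    long_peps = [p for p in set(peptides) if len(p) >= min_length]
    non_nested = [p for p in long_peps
                  if not any(p != q and p in q for q in long_peps)]
    return len(non_nested) >= min_peptides
```

Two independent long peptides pass; a peptide fully nested inside another counts only once.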

10.
The eStation is a collecting and processing system designed to automatically deal with the reception, processing, analysis and dissemination of key environmental parameters derived from remotely sensed data. Developed mainly at the Joint Research Centre of the European Commission, the eStation has been distributed to 47 sub-Saharan countries within the framework of the AMESD (African Monitoring of Environment for Sustainable Development) project to provide local institutions with the capacity to easily access a large range of remote sensing products on vegetation, precipitation, fires and oceans. These products, derived from the processing of images coming from various instruments, including SPOT-Vegetation, MSG-SEVIRI and MODIS, are developed to allow end-users to make local and regional assessments of the state of marine and terrestrial ecosystems. The products, dispatched to the users through the EUMETSAT data broadcasting system (EUMETCast) or provided by other Earth Observation (EO) data agencies (e.g. NASA), are further processed by the eStation to allow end-users to generate their own terrestrial or marine environmental assessments and reports. Initially designed as a stand-alone system using an open source development framework, the eStation has recently been further developed as a web processing service to allow a broader range of end-users to access the data and services over the Internet. It is the purpose of this paper to introduce readers to the eStation and its products, to share the lessons learnt in deploying these services, and to discuss its more recent use in chained environmental web-based modeling services.

11.
The plenary session of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization at the Tenth annual HUPO World Congress updated the delegates on the ongoing activities of this group. The Molecular Interactions workgroup described the success of the PSICQUIC web service, which enables users to access multiple interaction resources with a single query. One example is the IMEx Consortium, which uses the service to enable users to access a non-redundant set of protein-protein interaction records. The mass spectrometry data formats, mzML for mass spectrometer output files and mzIdentML for the output of search engines, are now successfully established with increasing numbers of implementations. A format for the output of quantitative proteomics data, mzQuantML, and also TraML, for SRM/MRM transition lists, are both currently nearing completion. The corresponding MIAPE documents are being updated in line with advances in the field, as is the shared controlled vocabulary PSI-MS. In addition, the mzTab format was introduced, as a simpler way to report MS proteomics and metabolomics results. Finally, the ProteomeXchange Consortium, which will supply a single entry point for the submission of MS proteomics data to multiple data resources including PRIDE and PeptideAtlas, is currently being established.
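mzTab's simplicity comes from being a flat, tab-separated text format in which a prefix on each line declares its role (MTD for metadata, PRH/PRT for the protein header and rows, PEH/PEP for peptides). A minimal reader for that layout might look like this (a sketch covering only these prefixes, not the full specification):

```python
import csv
import io

def parse_mztab(text):
    """Minimal reader for the line-prefixed, tab-separated mzTab layout:
    MTD = metadata key/value, PRH/PRT = protein header and data rows,
    PEH/PEP = peptide header and data rows. Other sections are ignored."""
    meta, proteins, peptides = {}, [], []
    headers = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        prefix, rest = row[0], row[1:]
        if prefix == "MTD":
            meta[rest[0]] = rest[1]
        elif prefix in ("PRH", "PEH"):
            headers[prefix] = rest
        elif prefix == "PRT":
            proteins.append(dict(zip(headers["PRH"], rest)))
        elif prefix == "PEP":
            peptides.append(dict(zip(headers["PEH"], rest)))
    return meta, proteins, peptides

SAMPLE = (
    "MTD\tmzTab-version\t1.0.0\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\texample protein\n"
)
meta, proteins, peptides = parse_mztab(SAMPLE)
```

Because any spreadsheet or `csv` reader can open such a file, mzTab is far easier for downstream consumers than the full XML formats.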

12.
The deluge of data emerging from high-throughput sequencing technologies poses large analytical challenges when testing for association to disease. We introduce a scalable framework for variable selection, implemented in C++ and OpenCL, that fits regularized regression models across multiple Graphics Processing Units. Open source code and documentation can be found at a Google Code repository under the URL http://bioinformatics.oxfordjournals.org/content/early/2012/01/10/bioinformatics.bts015.abstract. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
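The regularized regression being parallelized here is typically the lasso, which is often fit by cyclic coordinate descent. A plain CPU sketch of that algorithm (not the paper's C++/OpenCL implementation) illustrates the per-coordinate update the GPUs would execute in bulk:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the lasso:
    minimize 0.5 * ||y - X b||^2 + lam * ||b||_1.
    Each pass soft-thresholds one coefficient at a time."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution added back.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # Soft-thresholding: shrink toward zero by lam.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2]      # only features 0 and 2 matter
beta = lasso_cd(X, y, lam=1.0)
```

The L1 penalty drives the irrelevant coefficients to exactly zero, which is what makes this family of models useful for selecting variants associated with disease.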

13.
Can online media predict new and emerging trends, since there is a relationship between trends in society and their representation in online systems? While several recent studies have used Google Trends as the leading online information source to answer corresponding research questions, we focus on the online encyclopedia Wikipedia, which is often used for deeper topical reading. Wikipedia grants open access to all traffic data and provides a wealth of additional (semantic) information in a context network beyond single keywords. Specifically, we suggest and study context-normalized and time-dependent measures for a topic’s importance based on page-view time series of Wikipedia articles in different languages and articles related to them by internal links. As an example, we present a study of the recently emerging Big Data market with a focus on the Hadoop ecosystem, and compare the capabilities of Wikipedia versus Google in predicting its popularity and life cycles. To support further applications, we have developed an open web platform to share results of Wikipedia analytics, providing context-rich and language-independent relevance measures for emerging trends.  
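One way to read "context-normalized" is to relate an article's page views to the views of the articles it links to, so that a topic's importance is measured relative to its own subject area rather than to all of Wikipedia. The sketch below is an illustrative version of that idea, not the paper's exact measure, and the view counts are invented:

```python
def context_normalized_views(views, links, article):
    """Importance of `article` relative to its link context: its page views
    divided by the total views of the articles it links to. An illustrative
    simplification of the context-normalization idea described above."""
    context = links.get(article, [])
    total = sum(views.get(a, 0) for a in context)
    return views[article] / total if total else float("inf")

# Toy snapshot of page-view counts and internal links.
views = {"Hadoop": 900, "Big data": 3000, "MapReduce": 600}
links = {"Hadoop": ["Big data", "MapReduce"]}
score = context_normalized_views(views, links, "Hadoop")
```

Tracking such a ratio over time separates growth of the topic itself from growth of general interest in its field.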

14.
Escher is a web application for visualizing data on biological pathways. Three key features make Escher a uniquely effective tool for pathway visualization. First, users can rapidly design new pathway maps. Escher provides pathway suggestions based on user data and genome-scale models, so users can draw pathways in a semi-automated way. Second, users can visualize data related to genes or proteins on the associated reactions and pathways, using rules that define which enzymes catalyze each reaction. Thus, users can identify trends in common genomic data types (e.g. RNA-Seq, proteomics, ChIP) in conjunction with metabolite- and reaction-oriented data types (e.g. metabolomics, fluxomics). Third, Escher harnesses the strengths of web technologies (SVG, D3, developer tools) so that visualizations can be rapidly adapted, extended, shared, and embedded. This paper provides examples of each of these features and explains how the development approach used for Escher can be used to guide the development of future visualization tools.
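The "rules that define which enzymes catalyze each reaction" are gene-reaction rules, Boolean expressions like `(g1 and g2) or g3`. A common convention for mapping gene-level data onto a reaction (used by COBRA-style tools, though Escher's exact scheme may differ) is AND → min (a complex is limited by its scarcest subunit) and OR → sum (isozymes add up). A small recursive-descent evaluator for that convention:

```python
def eval_rule(rule, expr):
    """Map gene-level values onto a reaction via its gene-reaction rule,
    using the convention AND -> min, OR -> sum. The grammar handled here
    is a simplification: gene IDs, and/or, and parentheses."""
    tokens = rule.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_or():
        nonlocal pos
        val = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            val += parse_and()          # isozymes: contributions add
        return val

    def parse_and():
        nonlocal pos
        val = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            val = min(val, parse_atom())  # complex: limited by scarcest part
        return val

    def parse_atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            val = parse_or()
            pos += 1                    # skip the closing ")"
            return val
        return expr[tok]                # a gene identifier

    return parse_or()

level = eval_rule("(g1 and g2) or g3", {"g1": 5.0, "g2": 3.0, "g3": 2.0})
```

Here the complex of g1 and g2 contributes min(5, 3) = 3 and the isozyme g3 adds 2, giving a reaction level of 5.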

15.
Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome data sets from natural populations of this species have been published in recent years. A major challenge is the integration of disparate data sets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic variant caller (SNAPE-pooled). We use this pipeline to generate the largest data repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in >20 countries on four continents. Several of these locations have been sampled at different seasons across multiple years. This data set, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP data set. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource which can be easily extended via future efforts for an even more extensive cosmopolitan data set. Our resource will enable population geneticists to analyze spatiotemporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.
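In Pool-Seq data each sample is a pool of many flies, so an allele frequency is estimated directly from read counts at a site. A heuristic caller in the spirit of PoolSNP applies minimum-count and minimum-frequency filters before reporting a frequency; the sketch below illustrates that filtering logic with made-up thresholds, not PoolSNP's actual defaults:

```python
def pooled_allele_freq(ref_count, alt_count, min_count=4, min_freq=0.01):
    """Heuristic allele-frequency call from pooled read counts: report the
    alternate-allele frequency only if it clears both a minimum read count
    and a minimum frequency; otherwise treat the site as monomorphic.
    Thresholds here are illustrative, not PoolSNP's published defaults."""
    depth = ref_count + alt_count
    if depth == 0:
        return None                 # no coverage, no call
    freq = alt_count / depth
    if alt_count < min_count or freq < min_freq:
        return 0.0                  # likely sequencing error: call reference
    return freq

f = pooled_allele_freq(90, 10)      # 10 alt reads out of 100
```

Such count filters are what distinguish true low-frequency variants from sequencing errors when individual genotypes are unavailable.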

16.

Background  

The proliferation of data repositories in bioinformatics has resulted in the development of numerous interfaces that allow scientists to browse, search and analyse the data that they contain. Interfaces typically support repository access by means of web pages, but other means are also used, such as desktop applications and command line tools. Interfaces often duplicate functionality amongst each other, and this implies that associated development activities are repeated in different laboratories. Interfaces developed by public laboratories are often created with limited developer resources. In such environments, reducing the time spent on creating user interfaces allows for a better deployment of resources for specialised tasks, such as data integration or analysis. Laboratories maintaining data resources are challenged to reconcile requirements for software that is reliable, functional and flexible with limitations on software development resources.

17.
The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, which provides data storage and sharing services for worldwide scientific communities. In response to explosive data growth and increasingly diverse data types, here we present the GSA family, expanded into a set of raw-data archive resources with different purposes, namely, GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix/). Compared with the 2017 version, GSA has been significantly updated in data model, online functionalities, and web interfaces. GSA-Human, as a new partner of GSA, is a data repository specialized in human genetics-related data with controlled access and security. OMIX, as a critical complement to the two resources mentioned above, is an open archive for miscellaneous data. Together, these resources form a family dedicated to archiving rapidly growing data of diverse types, accepting data submissions from all over the world, and providing free open access to all publicly available data in support of worldwide research activities.

18.
The Phytophthora Genome Initiative (PGI) is a distributed collaboration to study the genome and evolution of a particularly destructive group of plant-pathogenic oomycetes, with the goal of understanding the mechanisms of infection and resistance. NCGR provides informatics support for the collaboration as well as a centralized data repository. In the pilot phase of the project, several investigators prepared Phytophthora infestans and Phytophthora sojae EST and Phytophthora sojae BAC libraries and sent them to another laboratory for sequencing. Data from sequencing reactions were transferred to NCGR for analysis and curation. An analysis pipeline transforms raw data by performing simple analyses (i.e., vector removal and similarity searching) that are stored and can be retrieved by investigators using a web browser. Here we describe the database and access tools, provide an overview of the data therein and outline future plans. This resource has provided a unique opportunity for the distributed, collaborative study of a genus from which relatively little sequence data are available. Results may lead to insight into how better to control these pathogens. The homepage of PGI can be accessed at http://www.ncgr.org/pgi, with database access through the database access hyperlink.

19.
Public interest in most aspects of the environment is sharply declining relative to other subjects, as measured by internet searches performed on Google. Changes in the search behavior of the public are closely tied to their interests, and those interests are critical to driving public policy. Google Insights for Search (GIFS) was a tool that provided access to search data and has since been merged into Google Trends. We used GIFS to obtain data for 19 environment-related terms from 2001 to 2009. The only environment-related term with a large positive slope was climate change. All other terms that we queried had strong negative slopes, indicating that searches for these topics dropped over the last decade. Our results suggest that the public is growing less interested in the environment.
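The "slope" of a search-interest series is simply the least-squares trend of interest against time; its sign classifies a term as growing or declining. A minimal sketch (the example series is invented, not the study's data):

```python
def trend_slope(series):
    """Least-squares slope of an evenly spaced interest series:
    slope = cov(t, y) / var(t). Negative means declining interest."""
    n = len(series)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical normalized yearly search volume for an environmental term.
declining = [100, 90, 85, 70, 65, 50]
slope = trend_slope(declining)
```

Fitting this slope to each of the 19 terms and inspecting its sign reproduces the kind of classification the study reports.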

20.
A crucial part of a successful systems biology experiment is an assay that provides reliable, quantitative measurements for each of the components in the system being studied. For proteomics to be a key part of such studies, it must deliver accurate quantification of all the components in the system for each tested perturbation without any gaps in the data. This will require a new approach to proteomics that is based on emerging targeted quantitative mass spectrometry techniques. The PeptideAtlas Project comprises a growing, publicly accessible database of peptides identified in many tandem mass spectrometry proteomics studies and software tools that allow the building of PeptideAtlas, as well as its use by the research community. Here, we describe the PeptideAtlas Project, its contents and components, and show how together they provide a unique platform to select and validate mass spectrometry targets, thereby allowing the next revolution in proteomics.
