Similar Literature
20 similar documents found
2.
Bioinformatics support for high-throughput proteomics (total citations: 2; self-citations: 0; external: 2)
In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteome data. The rapid advancement of this technique in combination with other methods used in proteomics results in an increasing number of high-throughput projects. This leads to an increasing amount of data that needs to be archived and analyzed.To cope with the need for automated data conversion, storage, and analysis in the field of proteomics, the open source system ProDB was developed. The system handles data conversion from different mass spectrometer software, automates data analysis, and allows the annotation of MS spectra (e.g. assign gene names, store data on protein modifications). The system is based on an extensible relational database to store the mass spectra together with the experimental setup. It also provides a graphical user interface (GUI) for managing the experimental steps which led to the MS data. Furthermore, it allows the integration of genome and proteome data. Data from an ongoing experiment was used to compare manual and automated analysis. First tests showed that the automation resulted in a significant saving of time. Furthermore, the quality and interpretability of the results was improved in all cases.  相似文献   

3.
Cole KD. BioTechniques 2000, 29(6):1256-60, 1262
A database was developed to store, organize, and retrieve the data associated with electrophoresis and chromatography separations. It allows laboratories to store extensive data on both analytical and preparative separation techniques. For gel electrophoresis, the data include gel composition, staining methods, electric fields, analyses, and the samples loaded; for chromatography, the conditions, the samples used, and the fractions collected. The data structure was designed to maintain the link between samples (including fractions) from chromatography separations and their analysis by gel electrophoresis. The database allows laboratories to organize and maintain a large amount of separation and sample data in a uniform environment, and it facilitates retrieval of the separation history of important samples and the separation conditions used.
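The key design point here is the persistent link between chromatography fractions and the gels they were analyzed on, which a junction table expresses naturally. A minimal SQLite sketch follows; the schema is an assumption, not the published one.

```python
# Minimal sketch of the fraction-to-gel link via a junction table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fraction (        -- a fraction collected from a chromatography run
    id INTEGER PRIMARY KEY, run TEXT NOT NULL, number INTEGER NOT NULL
);
CREATE TABLE gel (             -- an electrophoresis gel and its conditions
    id INTEGER PRIMARY KEY, composition TEXT, stain TEXT
);
CREATE TABLE gel_lane (        -- junction: which fraction ran in which lane
    gel_id INTEGER REFERENCES gel(id),
    fraction_id INTEGER REFERENCES fraction(id),
    lane INTEGER,
    PRIMARY KEY (gel_id, fraction_id, lane)
);
""")
conn.execute("INSERT INTO fraction VALUES (1, 'IEX run 3', 12)")
conn.execute("INSERT INTO gel VALUES (1, '12% SDS-PAGE', 'Coomassie')")
conn.execute("INSERT INTO gel_lane VALUES (1, 1, 4)")

# The separation history of a fraction is then a simple join:
for row in conn.execute("""
    SELECT f.run, f.number, g.composition, g.stain, l.lane
    FROM fraction f
    JOIN gel_lane l ON l.fraction_id = f.id
    JOIN gel g ON g.id = l.gel_id
    WHERE f.id = 1"""):
    print(row)   # ('IEX run 3', 12, '12% SDS-PAGE', 'Coomassie', 4)
```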

4.
This paper describes an open-source system for analyzing, storing, and validating proteomics information derived from tandem mass spectrometry. It is based on a combination of data analysis servers, a user interface, and a relational database. The database was designed to store the minimum amount of information necessary to search and retrieve data obtained from the publicly available data analysis servers. Collectively, this system is referred to as the Global Proteome Machine (GPM). Its components have been released as open source development projects, and a publicly available installation has been established, comprising a group of data analysis servers and one main database server.

6.
The vital impact of environmental pollution on economic, social, and health outcomes is now widely recognized. Theoretical and implementation frameworks for acquiring, modeling, and analyzing environmental data, along with tools to conceive and validate scenarios, are becoming increasingly important, and various environmental simulation models have been developed in response. Researchers and stakeholders need efficient tools to store, display, compare, and analyze the data these models produce. One common way to manage simulation results is to use text files, but text files make the data difficult to explore. Spreadsheet tools (e.g., OpenOffice, MS Excel) can help display and analyze model results, but they are not suitable for very large volumes of information. Recently, studies have shown the feasibility of using Data Warehouse (DW) and On-Line Analytical Processing (OLAP) technologies to store model results and to facilitate visualization, analysis, and comparison, allowing model users to easily produce graphical reports and charts. In this paper, we address the analysis of pesticide transfer simulation results by warehousing the data and analyzing it with OLAP, using output from the MACRO simulation model, which simulates hydrological transfers of pesticides at the plot scale. We demonstrate how the simulation results can be managed using DW technologies and how integrity constraints can improve OLAP analysis: the constraints maintain the quality of the warehoused data as well as the validity of aggregations and queries, leading to better analyses, conclusions, and decisions.
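A minimal sketch of the pattern this entry describes, using SQLite: an integrity constraint guards the warehoused facts, and an OLAP-style roll-up aggregates them. Table and column names, and the sample values, are assumptions.

```python
# Toy warehouse of simulated pesticide transfers with one integrity
# constraint and one OLAP-style aggregation. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE fact_transfer (
    plot TEXT, year INTEGER, pesticide TEXT,
    flux_mg_ha REAL CHECK (flux_mg_ha >= 0)   -- constraint: no negative fluxes
)""")
rows = [("P1", 2001, "atrazine", 12.5), ("P1", 2002, "atrazine", 8.0),
        ("P2", 2001, "atrazine", 3.2)]
conn.executemany("INSERT INTO fact_transfer VALUES (?,?,?,?)", rows)

# Roll-up by plot: a typical OLAP aggregation over simulation results.
for row in conn.execute("""SELECT plot, SUM(flux_mg_ha)
                           FROM fact_transfer GROUP BY plot"""):
    print(row)   # ('P1', 20.5), ('P2', 3.2)
```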

7.
cluML     
cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format was designed to address limitations of traditional formats, such as the inability to store multiple clusterings (including biclusterings) and validation results for a single dataset. cluML is an effective tool to support biomedical knowledge representation in gene expression data analysis. Although it was developed for DNA microarray analysis applications, it can equally represent clustering and validation results for other biomedical and physical data.
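To illustrate the idea of an XML clustering format, here is a short sketch that serializes one clustering result with Python's standard library; the element and attribute names are assumptions, not the actual cluML vocabulary.

```python
# Sketch of writing clustering results as XML, in the spirit of cluML.
# Element names are hypothetical, not the real cluML schema.
import xml.etree.ElementTree as ET

root = ET.Element("clustering", method="k-means", k="2")
for cid, genes in {"c1": ["geneA", "geneB"], "c2": ["geneC"]}.items():
    cluster = ET.SubElement(root, "cluster", id=cid)
    for g in genes:
        ET.SubElement(cluster, "member", ref=g)

# Several <clustering> elements plus validity scores could sit side by
# side in one document, which flat tabular formats cannot express.
print(ET.tostring(root, encoding="unicode"))
```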

8.
BACKGROUND: Next-generation sequencing platforms are now well established in sequencing centres and some laboratories, and smaller-scale machines such as the 454 Junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In this context, it is important to provide these teams with an easily manageable environment to store and process the reads produced. RESULTS: We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq, and qseq), different sequencers (Roche 454, Illumina HiSeq), and various analyses (quality control, assembly, alignment, diversity studies, and more) and, on the other hand, a secured web site giving access to the results. Logged-in users can download raw and processed data and browse the analysis result statistics. The provided workflows can easily be modified or extended, and new ones can be added. Ergatis is used as the workflow building, running, and monitoring system. Analyses can be run locally or in a cluster environment using Sun Grid Engine. CONCLUSIONS: NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store, and download high-throughput sequencing data.
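A minimal sketch of format-based pipeline dispatch as described above; the pipeline names and step lists are assumptions, not NG6's actual workflows.

```python
# Pick processing steps for a sequencing run based on its file format.
# Pipeline contents are hypothetical placeholders.
import os

PIPELINES = {
    ".sff":   ["sff_to_fastq", "quality_control", "assembly"],
    ".fastq": ["quality_control", "alignment"],
    ".qseq":  ["qseq_to_fastq", "quality_control"],
    ".fasta": ["quality_control", "diversity_analysis"],
}

def pipeline_for(path: str) -> list[str]:
    """Return the processing steps matching the input file's format."""
    ext = os.path.splitext(path)[1].lower()
    return PIPELINES.get(ext, [])

print(pipeline_for("run42.fastq"))   # ['quality_control', 'alignment']
```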

9.
Public cloud storage auditing with deduplication has been studied in recent years to assure data integrity and improve storage efficiency in cloud storage. In previous schemes, however, the cloud has to store the link between each file and its data owners to support valid data downloading. From this file-owner link, the cloud server can identify which users own the same file, potentially exposing the sensitive relationships among the owners of a multi-owner file and seriously harming their privacy. To address this problem, we propose an identity-protected secure auditing and deduplication scheme. In the proposed scheme, the cloud cannot learn any useful information about the relationships among data owners, and, unlike existing schemes, it does not need to store the file-owner link to support valid downloading. Instead, when a user downloads a file, he anonymously submits a credential to the cloud and can download the file only if the credential is valid. Beyond this main contribution, our scheme has further advantages over existing schemes. First, it achieves constant storage: the storage space is fully independent of the number of data owners possessing the same file. Second, it achieves constant computation: only the first uploader needs to generate the authenticator for each file block, and subsequent owners do not need to generate it again. As a result, our scheme greatly reduces the storage overhead of the cloud and the computation overhead of data owners. Security analysis and experimental results show that our scheme is secure and efficient.
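The download flow can be illustrated with a toy credential check: the cloud keeps no file-owner link and gates downloads on an anonymous token. This is NOT the paper's cryptographic construction, only a simplified sketch of the idea.

```python
# Toy credential-gated download (not the paper's actual scheme).
# The cloud verifies a MAC instead of consulting any per-owner state.
import hashlib, hmac, os

SERVER_KEY = os.urandom(32)          # held by the cloud only

def issue_credential(file_id: str) -> bytes:
    """Handed to an owner at upload time; reveals nothing about identity."""
    return hmac.new(SERVER_KEY, file_id.encode(), hashlib.sha256).digest()

def allow_download(file_id: str, credential: bytes) -> bool:
    """The cloud recomputes the MAC; no file-owner link is stored."""
    expected = hmac.new(SERVER_KEY, file_id.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, credential)

cred = issue_credential("file-123")
assert allow_download("file-123", cred)               # valid credential
assert not allow_download("file-123", b"\x00" * 32)   # forgery rejected
```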

10.
The aquarium trade is an important and rapidly growing vector for introduced species in the United States. We examined this vector by surveying pet stores in the San Francisco Bay–Delta region to compile a list of commonly stocked aquarium fish species, identifying which of these species might be able to survive in the Bay–Delta, and investigating store representatives' knowledge of and attitudes about biological invasions. A restrictive analysis using conservative estimates of fish temperature tolerances and environmental conditions found that the local aquarium trade includes 5 fish species that could survive in a temperate system such as the Bay–Delta; under more inclusive parameters, up to 27 fish species met the criteria for survival. Comparing potential invader incidence between store types, three national retail chains stocked significantly more potentially invasive species than independent aquarium stores in the restrictive analysis, but there was no difference in the inclusive analysis. A significantly higher percentage of fish taxa were easily identifiable and well labeled in chain stores than in independent stores. Most store representatives indicated willingness to take action to reduce the threat of trade-related introductions, although chain store employees were more willing than independent store employees to assign responsibility for reducing this threat to the aquarium industry. Management efforts for this vector should focus on (a) improving labeling and identification of fish species in stores, (b) expanding the often spotty data on fish physiological tolerances, especially for saltwater species, (c) educating customers and store employees about the risks posed by pet release, and (d) providing better options for responsible disposal of unwanted fish.
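The restrictive screening logic is easy to sketch: a species passes only if its thermal tolerance spans the system's full annual range. The species data and water temperatures below are hypothetical, not the study's values.

```python
# Restrictive survival screen on thermal tolerance. All numbers are
# hypothetical placeholders, not the study's data.
SPECIES = {                      # (min_tolerated_C, max_tolerated_C)
    "goldfish":    (0, 30),
    "neon tetra":  (20, 28),
    "white cloud": (5, 30),
}

def can_survive(tolerance, water_min=8, water_max=24):
    """Species must tolerate the full annual water-temperature range."""
    lo, hi = tolerance
    return lo <= water_min and hi >= water_max

survivors = [name for name, tol in SPECIES.items() if can_survive(tol)]
print(survivors)   # ['goldfish', 'white cloud']
```

Relaxing `water_min`/`water_max` toward seasonal refuge conditions would reproduce the more inclusive analysis, which passed more species.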

12.
The challenge for -omics research is to tackle the fragmentation of knowledge by integrating several sources of heterogeneous information into a coherent whole. It is widely recognized that successful data integration is one of the keys to improving the productivity of stored data: with proper integration tools and algorithms, researchers can uncover relationships that enable better and faster decisions. Data integration is essential for the -omics community because -omics data are currently spread worldwide in a wide variety of formats. These formats can be integrated and migrated across platforms through different techniques, one of the most important being XML. XML provides a document markup language that is easy to learn, retrieve, store, and transmit, and it is semantically richer than HTML. Here, we describe biowarehousing, database federation, and controlled vocabularies, highlighting the application of XML to store, migrate, and validate -omics data.
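The core of federation-style integration is merging per-entity records from heterogeneous sources on a shared identifier. A minimal sketch, with hypothetical field names:

```python
# Merge records from two -omics sources keyed by a shared identifier.
# Source contents and field names are illustrative assumptions.
transcriptomics = {"P12345": {"expression": 2.4}}
proteomics      = {"P12345": {"abundance": 13000.0}}

def integrate(*sources):
    """Combine per-entity records from several sources into one view."""
    merged = {}
    for source in sources:
        for key, record in source.items():
            merged.setdefault(key, {}).update(record)
    return merged

print(integrate(transcriptomics, proteomics))
# {'P12345': {'expression': 2.4, 'abundance': 13000.0}}
```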

13.
Design and implementation of a sample-set database for protein secondary structure prediction (total citations: 1; self-citations: 0; external: 1)
Zhang Ning, Zhang Tao. 生物信息学 (Bioinformatics) 2006, 4(4):163-166
Database technology was applied to the processing and analysis of sample sets for protein secondary structure prediction, and a sample-set database for secondary structure prediction was built. Using the CB513 sample set as an example, the construction scheme of the database is described. Building such a sample database not only facilitates the storage, management, and retrieval of data, but also supports simple sequence analyses, replacing much of the ad hoc programming previously required, thereby greatly improving efficiency and reducing errors.
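One example of the "simple analysis" such a database can replace code for is computing the secondary-structure composition of a sample. A sketch over one CB513-style record; the sequence and state string are hypothetical.

```python
# Secondary-structure composition of one CB513-style record.
# Sequence and state string are hypothetical placeholders.
from collections import Counter

sequence = "MKTAYIAKQR"
states   = "CHHHHEEECC"   # H = helix, E = strand, C = coil

counts = Counter(states)
for state in "HEC":
    print(state, counts[state] / len(states))   # fraction of each state
```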

15.
To clarify the involvement of G protein in denatonium signal transduction, we carried out whole-cell patch-clamp analysis of isolated mouse taste cells. Applying GDP-beta-S, a G-protein inhibitor, revealed two different responses: in some cells the response to denatonium was reduced (G-protein-dependent), whereas in others it was unaffected (G-protein-independent). The same two patterns were observed when the phospholipase C beta2 and phosphodiesterase pathways downstream of G protein were inhibited concurrently. These data suggest dual, G-protein-dependent and -independent mechanisms for denatonium. Moreover, the denatonium responses were not attenuated by inhibiting the phospholipase C beta2 or phosphodiesterase pathway alone, implying that both pathways are involved in G-protein-dependent transduction. In the G-protein-independent cells, the response was abolished by depleting the intracellular calcium store, suggesting that Ca2+ release from the intracellular store is an important factor. Our data demonstrate multiple transduction pathways for denatonium in mammalian taste cells.

17.
Run coding applied to the digitized video signal from a TV scan of cell preparations can effect a substantial reduction in the total amount of data, sufficient to permit a moderately sized store to be loaded within one frame time with a representation of the field adequate for computer analysis. This paper describes the design of a run coding interface between a TV scanner and a computer store that also allows control of the scan domain, spatial resolution, and density resolution. Results are presented showing its efficiency when dealing with cervical smear preparations.
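Run coding itself is simple to demonstrate: consecutive equal pixel values collapse into (value, run length) pairs, which shrinks mostly-background imagery considerably. A minimal sketch on one digitized scan line:

```python
# Run-length coding of a digitized scan line: lossless compression of
# runs of identical pixel values into (value, count) pairs.
from itertools import groupby

def run_encode(line):
    return [(value, len(list(run))) for value, run in groupby(line)]

def run_decode(runs):
    return [value for value, n in runs for _ in range(n)]

scan_line = [0, 0, 0, 0, 7, 7, 9, 0, 0, 0]
runs = run_encode(scan_line)
print(runs)                              # [(0, 4), (7, 2), (9, 1), (0, 3)]
assert run_decode(runs) == scan_line     # lossless round trip
```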

18.
BACKGROUND: Seqcrawler has its roots in software like SRS or Lucegene. It provides an indexing platform to ease searching of data and metadata in biological banks, and it can scale to handle the current flow of data. While many biological bank search tools are available on the Internet, mainly provided by large organizations to search their own data, there is a lack of free and open-source solutions for browsing one's own data with a flexible query system, able to scale from a single computer to a cloud system. A personal index platform helps labs and bioinformaticians search their metadata and build larger information systems from custom subsets of data. RESULTS: The software scales from a single computer to a cloud-based infrastructure. It has been successfully tested in a private cloud with 3 index shards (pieces of the index) hosting ~400 million sequence entries (the whole of GenBank, UniProt, PDB, and others) for a total size of 600 GB in a fault-tolerant, high-availability architecture. It has also been successfully integrated with software that adds extra metadata from BLAST results to enhance users' result analysis. CONCLUSIONS: Seqcrawler provides a complete open-source search and storage solution for labs or platforms needing to manage large amounts of data and metadata with a flexible, customizable web interface. All components (search engine, visualization, and data storage), though independent, share a common, coherent data system that can be queried through a simple HTTP interface. The solution scales easily and can also provide a high-availability infrastructure.
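Querying such an index over HTTP might look like the following sketch; the host, port, path, and query parameters are assumptions for illustration, not Seqcrawler's documented API.

```python
# Sketch of an HTTP query against a local index server.
# Endpoint and parameter names are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({"q": "kinase", "start": 0, "rows": 10})
url = f"http://localhost:8080/search?{params}"   # hypothetical endpoint

with urlopen(url) as response:   # requires a running index server
    print(response.read().decode())
```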

19.
Lo SL, You T, Lin Q, Joshi SB, Chung MC, Hew CL. Proteomics 2006, 6(6):1758-1769
In the field of proteomics, the increasing difficulty of unifying data formats, due to differing platforms, instrumentation, and laboratory documentation systems, greatly hinders the verification, exchange, and comparison of experimental data. It is therefore essential to establish standard formats for every necessary aspect of proteomics data. One recently published data model is the proteomics experiment data repository (PEDRo) [Taylor, C. F., Paton, N. W., Garwood, K. L., Kirby, P. D. et al., Nat. Biotechnol. 2003, 21, 247-254]. Compliant with this format, we developed the systematic proteomics laboratory analysis and storage hub (SPLASH) database system as an informatics infrastructure to support proteomics studies. It consists of three modules that give proteomics researchers a common platform to store, manage, search, analyze, and exchange their data. (i) Data maintenance covers experimental data entry and update, batch uploading of experimental results, and data exchange in the original PEDRo format. (ii) The data search module provides several ways to search the database and to view either protein information or the differential expression display by clicking on a gel image. (iii) The data mining module contains tools for biochemical pathway analysis, statistics-associated gene ontology analysis, and other comparative analyses across sample sets to interpret their biological meaning. These features make SPLASH a practical and powerful tool for the proteomics community.

20.
In genome-wide association studies, quality control (QC) of genotypes is important to avoid spurious results. It is also important for maintaining long-term data integrity, particularly in settings with ongoing genotyping (e.g. estimation of genomic breeding values). Here we discuss snpqc, a fully automated pipeline for QC analysis of Illumina SNP array data. It applies a wide range of common quality metrics with user-defined filtering thresholds to generate a comprehensive QC report and a filtered dataset, including a genomic relationship matrix, ready for further downstream analyses, making it amenable to integration in high-throughput environments. snpqc also builds a database to store genotypic, phenotypic, and quality metrics, ensuring data integrity and allowing more samples from subsequent runs to be integrated. The program is generic across species and array designs, providing a convenient interface between the genotyping laboratory and downstream genome-wide association or genomic prediction studies.
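Threshold-based SNP filtering of this kind reduces to a predicate over per-marker metrics. A minimal sketch in the spirit of snpqc; the metric names, cutoffs, and sample values are illustrative assumptions.

```python
# Toy threshold-based SNP quality control. Metrics and cutoffs are
# illustrative, not snpqc's actual defaults.
def passes_qc(snp, min_call_rate=0.95, min_maf=0.01):
    """Keep a SNP only if enough samples were called and it is polymorphic."""
    return snp["call_rate"] >= min_call_rate and snp["maf"] >= min_maf

snps = [
    {"id": "rs1", "call_rate": 0.99, "maf": 0.30},
    {"id": "rs2", "call_rate": 0.80, "maf": 0.25},   # fails call rate
    {"id": "rs3", "call_rate": 0.98, "maf": 0.002},  # fails MAF
]
kept = [s["id"] for s in snps if passes_qc(s)]
print(kept)   # ['rs1']
```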
