Similar Literature
20 similar records retrieved (search time: 31 ms)
1.
Proteomics is a data-rich discipline that makes extensive use of separation tools, mass spectrometry and bioinformatics to analyze and interpret the features and dynamics of the proteome. A major challenge for the field is how proteomics data can be stored and managed such that the data become permanent and can be mined with current and future tools. This article details our experience in developing a commercial proteomic information management system. We identify the challenges faced in data acquisition, workflow management, data permanence, security, and data interpretation and analysis, as well as the solutions implemented to address these issues. Finally, we offer a perspective on data management in proteomics and its implications for academic and industry-based researchers working in this field.

2.
Structural proteomics is an emerging paradigm that has gained importance in the post-genomic era as a discipline for processing the protein target information now being deciphered. The field plays a crucial role in assigning function to sequenced proteins, defining the pathways in which the targets are involved, and understanding structure-function relationships of the protein targets. A key component of this research sector is accessing the three-dimensional structures of protein targets by both experimental and theoretical methods. This leads to the question of how to store, retrieve, and manipulate vast amounts of sequence (1-D) and structural (3-D) information in a relational format so that extensive data analysis can be achieved. We at SBI have addressed both of these fundamental requirements of structural proteomics. We have developed an extensive collection of three-dimensional protein structures from sequence data and have implemented a relational architecture for data management. In this article we discuss our approaches to structural proteomics and the tools that life science researchers can use in their discovery efforts.
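The relational storage of sequence (1-D) and structure (3-D) information described above can be sketched with a minimal example. The schema, accession, and sequence below are hypothetical illustrations, not SBI's actual design:

```python
import sqlite3

# In-memory database; the table layout is a hypothetical illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE protein (
        id INTEGER PRIMARY KEY,
        accession TEXT UNIQUE,
        sequence TEXT
    );
    CREATE TABLE structure (
        id INTEGER PRIMARY KEY,
        protein_id INTEGER REFERENCES protein(id),
        method TEXT,          -- 'x-ray', 'nmr', or 'model'
        resolution REAL
    );
""")
conn.execute("INSERT INTO protein (accession, sequence) VALUES (?, ?)",
             ("P12345", "MKTAYIAKQR"))
conn.execute("INSERT INTO structure (protein_id, method, resolution) "
             "VALUES (1, 'model', NULL)")

# A join retrieves sequence and structure information together.
row = conn.execute("""
    SELECT p.accession, p.sequence, s.method
    FROM protein p JOIN structure s ON s.protein_id = p.id
""").fetchone()
print(row)  # ('P12345', 'MKTAYIAKQR', 'model')
```

The point of the relational layout is that queries spanning 1-D and 3-D data (here, a single join) replace ad hoc parsing of flat files.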

3.
MOTIVATION: Experimental techniques in proteomics have seen rapid development over the last few years. Volume and complexity of the data have both been growing at a similar rate. Accordingly, data management and analysis are among the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. To facilitate these studies, it would be desirable to have a flexible 'toolbox' of versatile and user-friendly applications allowing for rapid construction of computational workflows in proteomics. RESULTS: We describe a set of tools for proteomics data analysis: TOPP, The OpenMS Proteomics Pipeline. TOPP provides computational tools that can easily be combined into analysis pipelines, even by non-experts. These applications range from utilities (file format conversion, peak picking) through wrappers for established applications (e.g. Mascot) to entirely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. We describe the basic concepts and current abilities of TOPP and illustrate them in the context of two example applications: the identification of peptides from a raw dataset through database search, and the analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups. AVAILABILITY: The TOPP components are available as open-source software under the GNU Lesser General Public License (LGPL). Source code is available from the project website at www.OpenMS.de.
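The "toolbox" idea behind TOPP — small, single-purpose components chained into a pipeline — can be illustrated with a toy sketch. The function names and the simplistic peak-picking rule are hypothetical, not TOPP's actual API:

```python
# Toy illustration of composable pipeline steps; not TOPP's real interface.
def centroid(spectrum):
    """Toy 'peak picking': keep only strict local maxima."""
    return [p for i, p in enumerate(spectrum)
            if (i == 0 or p > spectrum[i - 1])
            and (i == len(spectrum) - 1 or p > spectrum[i + 1])]

def threshold(peaks, min_intensity):
    """Drop peaks below an intensity cutoff."""
    return [p for p in peaks if p >= min_intensity]

def pipeline(data, steps):
    """Chain single-purpose steps, each consuming the previous output."""
    for step in steps:
        data = step(data)
    return data

result = pipeline([1, 5, 2, 8, 3], [centroid, lambda peaks: threshold(peaks, 6)])
print(result)  # [8]
```

The design point is that each step has one job and a uniform interface, so non-experts can rearrange steps without touching the steps' internals.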

4.
SUMMARY: The large amount of data produced by proteomics experiments requires effective bioinformatics tools for the integration of data management and data analysis. Here we introduce a suite of tools developed at Vanderbilt University to support production proteomics. We present the Backup Utility Service tool for automated instrument file backup and the ScanSifter tool for data conversion. We also describe a queuing system to coordinate identification pipelines and the File Collector tool for batch copying analytical results. These tools are individually useful but collectively reinforce each other. They are particularly valuable for proteomics core facilities or research institutions that need to manage multiple mass spectrometers. With minor changes, they could support other types of biomolecular resource facilities.

5.
6.
As an emerging field, MS-based proteomics still requires software tools for efficiently storing and accessing experimental data. In this work, we focus on the management of LC–MS data, which are typically made available in standard XML-based portable formats. The structures currently employed to manage these data can be highly inefficient, especially when dealing with high-throughput profile data. LC–MS datasets are usually accessed through 2D range queries, so optimizing this type of operation can dramatically reduce the complexity of data analysis. We propose a novel data structure for LC–MS datasets, called mzRTree, which embodies a scalable index based on the R-tree data structure. mzRTree can be efficiently created from the XML-based data formats and is suitable for handling very large datasets. We show experimentally that mzRTree outperforms other known structures for LC–MS data on all range queries, even on the queries those structures are optimized for. Moreover, mzRTree is more space-efficient. As a result, mzRTree reduces the computational cost of data analysis for very large profile datasets.
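The 2D range queries that mzRTree accelerates can be illustrated with a simplified spatial index. The sketch below uses a uniform grid rather than an R-tree, and the bucket sizes and peak values are illustrative only:

```python
from collections import defaultdict

# Simplified stand-in for the access pattern mzRTree optimizes: 2D range
# queries over (retention time, m/z) peaks. A uniform grid, not an R-tree.
class GridIndex:
    def __init__(self, rt_step=10.0, mz_step=50.0):
        self.rt_step, self.mz_step = rt_step, mz_step
        self.cells = defaultdict(list)  # (rt bucket, m/z bucket) -> peaks

    def insert(self, rt, mz, intensity):
        key = (int(rt // self.rt_step), int(mz // self.mz_step))
        self.cells[key].append((rt, mz, intensity))

    def range_query(self, rt_lo, rt_hi, mz_lo, mz_hi):
        """Visit only the grid cells overlapping the query rectangle."""
        hits = []
        for i in range(int(rt_lo // self.rt_step), int(rt_hi // self.rt_step) + 1):
            for j in range(int(mz_lo // self.mz_step), int(mz_hi // self.mz_step) + 1):
                for rt, mz, inten in self.cells.get((i, j), ()):
                    if rt_lo <= rt <= rt_hi and mz_lo <= mz <= mz_hi:
                        hits.append((rt, mz, inten))
        return hits

idx = GridIndex()
for peak in [(12.3, 501.2, 900.0), (12.9, 740.8, 150.0), (55.0, 502.1, 300.0)]:
    idx.insert(*peak)
print(idx.range_query(10, 20, 450, 550))  # [(12.3, 501.2, 900.0)]
```

Like an R-tree, the grid prunes most of the dataset per query instead of scanning every peak; the R-tree additionally adapts its regions to the data distribution.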

7.
The objective of proteomics is to obtain an overview of the proteins expressed at a given point in time in a given tissue, and to relate that profile to the biochemical status of the tissue. Sample throughput and analysis time are therefore important issues in proteomics. The concept of proteomics is to pinpoint the identity of proteins of interest, but the overall relationships between proteins must also be explained. Classical proteomics consists of separation and characterization, based on two-dimensional electrophoresis, trypsin digestion, mass spectrometry and database searching. Characterization involves labor-intensive work to manage, handle and analyze data. The field of classical proteomics should therefore be extended to include objective handling of large datasets. The separation obtained by two-dimensional electrophoresis and mass spectrometry gives rise to huge amounts of data. We present a multivariate approach to data handling in proteomics with the advantage that protein patterns can be spotted at an early stage, so that the proteins selected for sequencing can be chosen intelligently. These methods can also be applied to other data-generating protein analysis techniques, such as mass spectrometry and near-infrared spectroscopy, and examples of application to these techniques are also presented. Multivariate data analysis can unravel complicated data structures and may thereby ease the characterization phase of classical proteomics. Traditional statistical methods are not suited to analyzing such huge amounts of data, where the number of variables exceeds the number of objects; multivariate data analysis, on the other hand, can uncover the hidden structures present in these data. This study takes its starting point in classical proteomics and shows how multivariate data analysis can lead to faster ways of finding interesting proteins. Multivariate analysis has shown promising results as a supplement to classical proteomics and has added a new dimension to the field.
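The multivariate approach described above typically starts with principal component analysis of a samples-by-proteins intensity matrix. A minimal sketch on synthetic data follows; the matrix sizes, the single latent factor, and the noise level are assumptions for illustration:

```python
import numpy as np

# Rows are samples (gels/spectra), columns are protein spot intensities.
# Synthetic data with one dominant latent factor plus small noise.
rng = np.random.default_rng(0)
scores_true = rng.normal(size=(6, 1))
loadings_true = rng.normal(size=(1, 50))
X = scores_true @ loadings_true + 0.01 * rng.normal(size=(6, 50))

Xc = X - X.mean(axis=0)                      # mean-center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)              # variance explained per component
print(round(float(explained[0]), 2))         # first PC captures nearly everything
```

This is exactly the "more variables than objects" regime the abstract mentions (50 variables, 6 samples), where PCA still works because it operates on the low-rank structure rather than fitting each variable separately.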

8.
Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, data analysis has become the bottleneck of proteomics experiments. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data, we developed the Graphical Proteomics Data Explorer (GProX). The program requires no special bioinformatics training, as all functions of GProX are accessible within its user-friendly graphical interface, which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups, as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests (e.g. for GO terms) and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data are available, and most analysis functions in GProX create customizable, high-quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX thus provides proteomics experimenters with a powerful toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net.
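The feature enrichment tests mentioned above (e.g. for GO terms) are commonly based on the hypergeometric distribution. A minimal, generic sketch — not GProX's actual implementation — with illustrative numbers:

```python
from math import comb

def hypergeom_enrichment(N, K, n, k):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n):
    the chance of drawing at least k annotated proteins in a cluster of
    size n from a background of N proteins, K of which carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Background of 1000 proteins, 50 with the GO term; a cluster of 20
# proteins contains 5 of them (expected by chance: ~1).
p = hypergeom_enrichment(N=1000, K=50, n=20, k=5)
print(p < 0.01)  # True: the cluster is enriched for the annotation
```

In practice such p-values are then corrected for testing many terms at once, but the per-term test is just this tail sum.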

9.
Xia D  Ghali F  Gaskell SJ  O'Cualain R  Sims PF  Jones AR 《Proteomics》2012,12(12):1912-1916
Ion mobility (IM) MS instruments can provide an added dimension to peptide analysis pipelines in proteomics but, as yet, few software tools are available for analyzing such data. IM can be used to provide additional separation of parent ions, or of product ions following fragmentation. In this work, we have created a set of software tools capable of converting three-dimensional IM data generated from the analysis of fragment ions into a variety of formats used in proteomics. We show that IM can be used to calculate the charge state of a fragment ion, with the potential to improve peptide identification by excluding non-informative ions from a database search. We also provide preliminary evidence of structural differences between b and y ions for certain peptide sequences but not others. All software tools and data sets are made available in the public domain at http://code.google.com/p/ion-mobility-ms-tools/.
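Independently of ion mobility, a fragment ion's charge state can also be inferred from its isotopic spacing: adjacent isotope peaks differ by roughly 1.0033/z in m/z. The sketch below shows this classical heuristic, not the authors' IM-based method; the peak values are illustrative:

```python
NEUTRON_SPACING = 1.0033  # approx. mass difference between isotopic peaks

def charge_from_isotopes(mz_peaks, max_charge=4, tol=0.01):
    """Infer z from the spacing of the first two isotopic peaks:
    spacing ~ NEUTRON_SPACING / z."""
    spacing = mz_peaks[1] - mz_peaks[0]
    for z in range(1, max_charge + 1):
        if abs(spacing - NEUTRON_SPACING / z) < tol:
            return z
    return None  # no consistent charge state within tolerance

# Doubly charged ion: isotope peaks ~0.5 m/z apart.
print(charge_from_isotopes([500.75, 501.2517, 501.7533]))  # 2
```

Knowing z lets a search engine discard fragments whose charge is inconsistent with the precursor, which is the filtering benefit the abstract describes.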

10.
Performing a well thought-out proteomics data analysis can be a daunting task, especially for newcomers to the field. Even researchers experienced in the proteomics field can find it challenging to follow existing publication guidelines for MS-based protein identification and characterization in detail. One of the primary goals of bioinformatics is to enable any researcher to interpret the vast amounts of data generated in modern biology, by providing user-friendly and robust end-user applications, clear documentation, and corresponding teaching materials. In that spirit, we here present an extensive tutorial for peptide and protein identification, available at http://compomics.com/bioinformatics-for-proteomics. The material is completely based on freely available and open-source tools, and has already been used and refined at numerous international courses over the past three years. During this time, it has demonstrated its ability to allow even complete beginners to intuitively conduct advanced bioinformatics workflows, interpret the results, and understand their context. This tutorial is thus aimed at fully empowering users, by removing black boxes in the proteomics informatics pipeline.

11.
The field of proteomics is advancing rapidly as a result of powerful new technologies, and proteomics experiments yield a vast and ever-increasing amount of information. Data regarding protein occurrence, abundance, identity, sequence, structure, properties, and interactions need to be stored. Currently, a common standard has not yet been established, and open access to results is needed for the further development of robust analysis algorithms. Databases for proteomics will evolve from pure storage into knowledge resources, providing a repository for information (metadata) that is largely not captured in simple flat files. This review sheds light on recent steps towards a common standard in proteomics data storage and integration, but is not meant to be a comprehensive overview of all available databases and tools in the proteomics community.

12.
Halligan BD  Greene AS 《Proteomics》2011,11(6):1058-1063
A major challenge in the field of high-throughput proteomics is the conversion of the large volume of experimental data that is generated into biological knowledge. Typically, proteomics experiments involve the combination and comparison of multiple data sets and the analysis and annotation of the combined results. Although some commercial applications provide some of these functions, there is a need for a free, open-source, multifunction tool for advanced proteomics data analysis. We have developed the Visualize program, which gives users the ability to visualize, analyze, and annotate proteomics data, combine data from multiple runs, and quantify differences between individual runs and combined data sets. Visualize is licensed under the GNU GPL and can be downloaded from http://proteomics.mcw.edu/visualize. It is available as compiled client-based executables for both Windows and Mac OS X platforms, as well as PERL source code.
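The run combination and comparison that Visualize performs can be illustrated with a toy example based on spectral counts. The protein accessions, counts, and the unnormalized ratio below are illustrative only, not Visualize's actual algorithm:

```python
from collections import Counter

# Toy spectral counts per protein for two runs (illustrative data).
run_a = Counter({"ALBU_HUMAN": 40, "TRFE_HUMAN": 12})
run_b = Counter({"ALBU_HUMAN": 10, "TRFE_HUMAN": 12, "APOA1_HUMAN": 5})

combined = run_a + run_b                     # pooled counts across runs
ratios = {p: run_a[p] / run_b[p]             # simple A/B comparison,
          for p in run_b if p in run_a}      # proteins seen in both runs

print(combined["ALBU_HUMAN"], round(ratios["ALBU_HUMAN"], 1))  # 50 4.0
```

A real tool would normalize for total counts per run and handle proteins absent from one run; the sketch only shows the combine-then-compare pattern.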

13.
Brusic V  Marina O  Wu CJ  Reinherz EL 《Proteomics》2007,7(6):976-991
Proteomics offers the most direct approach to understanding disease and its molecular biomarkers. Biomarkers denote the biological states of tissues, cells, or body fluids that are useful for disease detection and classification. Clinical proteomics is used for early disease detection, molecular diagnosis of disease, identification and formulation of therapies, and disease monitoring and prognostics. Bioinformatics tools are essential for converting raw proteomics data into knowledge and subsequently into useful applications. These tools are used for the collection, processing, analysis, and interpretation of the vast amounts of proteomics data. Management, analysis, and interpretation of large quantities of raw and processed data require a combination of informatics technologies such as databases, sequence comparison, predictive models, and statistical tools. We demonstrate the utility of bioinformatics in clinical proteomics through the analysis of the cancer antigen survivin and its suitability as a target for cancer immunotherapy.

14.
Proteomics of Staphylococcus aureus: current state and future challenges
This paper presents a short review of the proteome of Staphylococcus aureus, a gram-positive human pathogen of increasing importance for human health owing to its growing antibiotic resistance. A proteome reference map that can be used in future studies is shown, followed by a demonstration of how proteomics can be applied to obtain new information on S. aureus physiology. The proteomic approach can provide new data on the regulation of metabolism as well as on stress and starvation responses. Proteomic signatures encompassing specific stress or starvation proteins are excellent tools for predicting the physiological state of a cell population. Furthermore, proteomics is very useful for analyzing the size and function of known and unknown regulons and will open a new dimension in the comprehensive understanding of regulatory networks in pathogenicity. Finally, some fields of application of S. aureus proteomics are discussed, including strain evaluation, the analysis of antibiotic resistance, and the discovery of new drug targets and diagnostic tools. The review also shows that the post-genome era of S. aureus, which began in 2001 with the publication of the genome sequence, is still at an early stage; however, the consistent application of proteomics in combination with DNA array techniques, supported by bioinformatics, will provide a comprehensive picture of cell physiology and pathogenicity in the near future.

15.
Recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big-data science. In addition, bioinformatics analysis is becoming increasingly complex, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics in recent years, and this trend is likely to continue. However, most computational proteomics and metabolomics tools are designed as single-tiered applications in which analysis tasks cannot be distributed, limiting the scalability and reproducibility of the data analysis. In this paper, the key steps of metabolomics and proteomics data processing are summarized, including the main tools and software used to perform the data analysis. The combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis is discussed. Finally, a new approach to reproducible, large-scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow, is introduced to the proteomics and metabolomics communities.

16.
SUMMARY: Analysis of proteomics data, specifically mass spectrometry data, commonly relies on libraries of known information such as atomic masses, known stable isotopes, atomic compositions of amino acids, observed modifications of known amino acids, and ion masses that correspond directly to known amino acid sequences. The Java Analysis Framework (JAF) for proteomics provides a freely usable, open-source library of Java code that abstracts all of the aforementioned data, enabling more rapid development of proteomics tools. The JAF also includes several user tools that can be run directly from a web browser. AVAILABILITY: The current version and an archive of all older versions of the Java Analysis Framework for Proteomics are freely available, including complete source code, at http://www.proteomecommons.org/current/511/.
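The kind of reference data JAF abstracts — amino acid masses used to compute peptide masses — can be illustrated in a few lines. The sketch below uses standard monoisotopic residue masses (listed for only a handful of residues) and is not JAF's API:

```python
# Standard monoisotopic residue masses (Da) for a few amino acids; a real
# library would cover all residues, isotopes, and modifications.
MONO = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
        "I": 113.08406, "D": 115.02694}
WATER = 18.010565  # mass of H2O added for the intact peptide

def peptide_mass(seq):
    """Neutral monoisotopic peptide mass: sum of residue masses + water."""
    return sum(MONO[aa] for aa in seq) + WATER

print(round(peptide_mass("PEPTIDE"), 4))  # 799.3599
```

Ion masses for database search are derived from the same table, e.g. singly protonated [M+H]+ adds ~1.00728 Da to the neutral mass.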

17.

Background

Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, these tools generally differ in functionality, user interface, and information input/output, and do not readily facilitate appropriate statistical analysis of the data. These limitations, along with the array of choices, present a daunting prospect for biologists and other researchers not trained in bioinformatics who wish to use LC-MS-based quantitative proteomics.

Results

We have developed Corra, a computational framework and set of tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms for LC-MS-based proteomics, together with statistical algorithms originally developed for microarray data analysis, making them appropriate for LC-MS data. Corra also adopts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.

Conclusion

The Corra computational framework enables biologists and other researchers to process, analyze and visualize LC-MS data using what would otherwise be a complex and unfriendly suite of tools. Corra enables appropriate statistical analyses with controlled false-discovery rates, ultimately informing subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open-source computational platform enabling LC-MS-based proteomic workflows, and as such addresses an unmet need in the LC-MS proteomics field.
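The controlled false-discovery rates mentioned in the conclusion refer to multiple-testing correction across thousands of peptide features. A minimal sketch of the standard Benjamini-Hochberg procedure (generic; not Corra's actual code):

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha.

    BH: sort p-values ascending; find the largest rank r with
    p_(r) <= alpha * r / m; reject the r smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])

pvals = [0.001, 0.2, 0.01, 0.04, 0.9]  # one p-value per peptide feature
print(benjamini_hochberg(pvals))  # [0, 2]
```

Unlike a Bonferroni cutoff, BH controls the expected fraction of false positives among the rejected features, which scales better to thousands of simultaneous tests.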

18.
MS-based proteomics is a bioinformatics-intensive field. Additionally, the instruments and the instrument-related and analysis software are expensive. Some free Internet-based proteomics tools have gained wide usage, but there has been no single bioinformatics framework that guides the user easily and intuitively through the whole process, from analysis to submission. Together, these factors may have limited the expansion of proteomics analyses, and also the secondary use (reanalysis) of proteomic data. Vaudel et al. (Proteomics 2014, 14, 1001-1005) now describe their Compomics framework, which guides the user through all the main steps, from database generation, via analysis and validation, through the submission process to PRIDE, a proteomics data repository. Vaudel et al. base the framework partly on tools they have developed themselves and partly on other freeware tools integrated into the workflow. One of the most interesting aspects of the Compomics framework is the possibility of extending MS-based proteomics beyond the MS laboratory itself. With the Compomics framework, any laboratory can handle large amounts of proteomic data, thereby facilitating collaboration and in-depth data analyses. The described software also opens the potential for any laboratory to reanalyze data deposited in PRIDE.

19.
Novel and improved computational tools are required to transform large-scale proteomics data into valuable information of biological relevance. To this end, we developed ProteoConnections, a bioinformatics platform tailored to the pressing needs of proteomics analyses. The primary focus of this platform is to organize peptide and protein identifications, evaluate the quality of the acquired data set, profile abundance changes, and accelerate data interpretation. Peptide and protein identifications are stored in a relational database to facilitate data mining and to evaluate data-set quality via graphical reports. We integrated databases of known PTMs and other bioinformatics tools to facilitate the analysis of phosphoproteomics data sets and to provide insights for subsequent biological validation experiments. Phosphorylation sites are also annotated according to kinase consensus motifs, contextual environment, protein domains, binding motifs, and evolutionary conservation across different species. The practical application of ProteoConnections is demonstrated through the analysis of a phosphoproteomics data set from rat intestinal IEC-6 cells, in which we identified 9615 phosphorylation sites on 2108 phosphoproteins. Combined proteomics and bioinformatics analyses revealed valuable biological insights into the regulation of phosphoprotein functions via the introduction of new binding sites on scaffold proteins or the modulation of protein-protein, protein-DNA, or protein-RNA interactions. Quantitative proteomics data can be integrated into ProteoConnections to determine changes in protein phosphorylation under different cell stimulation conditions or kinase inhibitors, as demonstrated here for the MEK inhibitor PD184352.
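Annotation of phosphorylation sites by kinase consensus motif, as described above, can be sketched with simple pattern matching. The two motifs and the window handling below are illustrative, not ProteoConnections' actual rule set:

```python
import re

def annotate_site(sequence, pos):
    """Label the phosphosite at 0-based index `pos` (an S or T)
    with simple kinase consensus motifs."""
    labels = []
    # Proline-directed kinases (e.g. MAPK, CDK): proline immediately C-terminal.
    if re.match(r"[ST]P", sequence[pos:pos + 2]):
        labels.append("proline-directed (e.g. MAPK/CDK)")
    # Basophilic kinases (e.g. PKA): arginine at position -2.
    if pos >= 2 and re.match(r"R.[ST]", sequence[pos - 2:pos + 1]):
        labels.append("basophilic (e.g. PKA)")
    return labels

print(annotate_site("AKSPVR", 2))  # ['proline-directed (e.g. MAPK/CDK)']
```

Real annotation pipelines use curated motif collections and score the wider sequence window, but each rule reduces to this kind of positional pattern around the modified residue.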

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)