Similar Articles
20 similar articles found (search time: 15 ms).
1.
As proteomic data sets increase in size and complexity, the necessity grows for database-centric software systems able to organize, compare, and visualize all the proteomic experiments in a lab. We recently developed an integrated platform called the high-throughput autonomous proteomic pipeline (HTAPP) for the automated acquisition and processing of quantitative proteomic data, and for the integration of proteomic results with existing external protein information resources within a lab-based relational database called PeptideDepot. Here, we introduce the peptide validation software component of this system, which combines relational database-integrated electronic manual spectral annotation in Java with a new software tool in the R programming language for the generation of logistic regression spectral models from user-supplied validated data sets and the flexible application of these user-generated models in automated proteomic workflows. This logistic regression spectral model uses variables computed directly from SEQUEST output together with deterministic variables based on expert manual validation criteria of spectral quality. In the case of linear quadrupole ion trap (LTQ) or LTQ-FTICR LC/MS data, our logistic spectral model outperformed both XCorr (242% more peptides identified on average) and the X!Tandem E-value (87% more peptides identified on average) at a 1% false discovery rate estimated by the decoy database approach.
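The two ingredients behind the headline numbers are easy to make concrete: a logistic regression over SEQUEST-derived spectral features, and a false-discovery-rate estimate from a decoy database. The sketch below (Python with scikit-learn, entirely synthetic data) illustrates both steps; the three features shown stand in for the paper's full variable set, and this is not the authors' R implementation.

```python
# Hedged sketch: logistic-regression spectral model + decoy-based FDR cutoff.
# All data are synthetic; feature columns [XCorr, deltaCn, |ppm error|] are
# an assumption mirroring the variables the abstract mentions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training PSMs with expert labels (1 = manually validated correct).
X_good = rng.normal([3.5, 0.45, 2.0], [0.8, 0.10, 1.0], size=(200, 3))
X_bad = rng.normal([1.5, 0.10, 8.0], [0.6, 0.05, 4.0], size=(200, 3))
X_train = np.vstack([X_good, X_bad])
y_train = np.r_[np.ones(200), np.zeros(200)]

model = LogisticRegression().fit(X_train, y_train)

def decoy_fdr(scores, is_decoy, threshold):
    """Estimate FDR as (# decoy hits) / (# target hits) above the cutoff."""
    kept = scores >= threshold
    return kept[is_decoy].sum() / max(kept[~is_decoy].sum(), 1)

# Score a synthetic target/decoy PSM set and find the lowest model-score
# cutoff whose estimated FDR stays at or below 1%.
X_psms = np.vstack([rng.normal([3.0, 0.40, 2.5], 0.8, size=(150, 3)),
                    rng.normal([1.5, 0.10, 8.0], 0.8, size=(150, 3))])
is_decoy = np.r_[np.zeros(150, dtype=bool), np.ones(150, dtype=bool)]
probs = model.predict_proba(X_psms)[:, 1]

threshold = probs.max()  # fallback: accept only the single top score
for t in np.sort(probs)[::-1]:
    if decoy_fdr(probs, is_decoy, t) > 0.01:
        break
    threshold = t
print(f"model-score cutoff for ~1% estimated FDR: {threshold:.3f}")
```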

2.
Kebing Yu, Arthur R. Salomon. Proteomics 2010, 10(11): 2113–2122.
Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases from an array of species and of protein information databases have led to a deluge of proteomic data. The development of a lab-based automated proteomic software platform for the collection, processing, storage, and visualization of expansive proteomic data sets is critically important. The high-throughput autonomous proteomic pipeline described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool is composed of software that controls the acquisition of mass spectral data along with automation of post-acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user-configurable lab-based relational database. The software design of the high-throughput autonomous proteomic pipeline focuses on accommodating diverse workflows and on providing missing software functionality to a wide range of proteomic researchers, thereby accelerating the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples.

3.
Despite advances in metabolic and postmetabolic labeling methods for quantitative proteomics, there remains a need for improved label-free approaches. This need is particularly pressing for workflows that incorporate affinity enrichment at the peptide level, where isobaric chemical labels such as isobaric tags for relative and absolute quantitation (iTRAQ) and tandem mass tags (TMT) may prove problematic, or where stable isotope labeling with amino acids in cell culture (SILAC) cannot be readily applied. Skyline is a freely available, open-source software tool for quantitative data processing and proteomic analysis. We expanded the capabilities of Skyline to process ion intensity chromatograms of peptide analytes from full-scan mass spectral data (MS1) acquired during HPLC MS/MS proteomic experiments. Moreover, unlike existing programs, Skyline MS1 filtering can be used with mass spectrometers from four major vendors, which allows results to be compared directly across laboratories. The new quantitative and graphical tools now available in Skyline specifically support interrogation of multiple acquisitions for MS1 filtering, including visual inspection of peak picking and both automated and manual integration, key features often lacking in existing software. In addition, Skyline MS1 filtering displays retention time indicators from the underlying MS/MS data contained within the spectral library to ensure proper peak selection. The modular structure of Skyline also provides well-defined, customizable data reports and thus allows users to connect directly to existing statistical programs for post hoc data analysis. To demonstrate the utility of the MS1 filtering approach, we carried out experiments on several MS platforms and specifically examined the performance of this method in quantifying two important post-translational modifications, acetylation and phosphorylation, in peptide-centric affinity workflows of increasing complexity using mouse and human models.
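The core operation behind MS1 filtering is simple to state: slice a narrow m/z window around each peptide precursor out of every MS1 scan, and integrate the resulting chromatographic peak. A minimal Python sketch of that idea follows; the in-memory scan format is hypothetical (Skyline itself reads vendor and mzML files directly), and this is not Skyline code.

```python
# Hedged sketch of extracted-ion-chromatogram (XIC) extraction from MS1 scans.
def extract_xic(ms1_scans, target_mz, ppm_tol=10.0):
    """ms1_scans: iterable of (retention_time, mz_list, intensity_list)."""
    lo = target_mz * (1 - ppm_tol / 1e6)
    hi = target_mz * (1 + ppm_tol / 1e6)
    xic = []
    for rt, mzs, intensities in ms1_scans:
        # Sum intensity of all centroids inside the m/z tolerance window.
        signal = sum(i for m, i in zip(mzs, intensities) if lo <= m <= hi)
        xic.append((rt, signal))
    return xic

def integrate_peak(xic, rt_start, rt_end):
    """Trapezoidal integration between (manual or automatic) peak boundaries."""
    pts = [(rt, s) for rt, s in xic if rt_start <= rt <= rt_end]
    return sum((t2 - t1) * (s1 + s2) / 2
               for (t1, s1), (t2, s2) in zip(pts, pts[1:]))

# Tiny synthetic example: two scans, one precursor near m/z 524.265.
scans = [(0.1, [500.00, 524.26], [10.0, 80.0]),
         (0.2, [524.27, 700.10], [400.0, 5.0])]
print(integrate_peak(extract_xic(scans, 524.265, ppm_tol=30), 0.0, 0.3))
```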

4.
5.
The MSE acquisition method (where MS denotes the low-energy and E the elevated-energy mode of acquisition) commercialized by Waters on its Q-TOF instruments is regarded as a unique data-independent fragmentation approach that improves the accuracy and dynamic range of label-free proteomic quantitation. Due to its special format, MSE acquisition files cannot be independently analyzed with most widely used open-source proteomic software, which is specialized for processing data-dependent acquisition files. In this study, we established a workflow integrating Skyline, a popular and versatile peptide-centric quantitation program, with the statistical tool DiffProt to accomplish MSE-based proteomic quantitation. Comparison with the vendor software package on targeted phosphopeptide and global proteomic datasets reveals distinct advantages of Skyline in MSE data mining, including sensitive peak detection, flexible peptide filtering, and a transparent step-by-step workflow. Moreover, we developed a new procedure by which Skyline MS1 filtering was extended to small-molecule quantitation for the first time. This new utility of Skyline was examined in a protein–ligand interaction experiment to identify multiple chemical compounds specifically bound to NDM-1 (New Delhi metallo-β-lactamase 1), an antibiotic-resistance target. Further improvement of the current weaknesses in Skyline MS1 filtering is expected to enhance the reliability of this powerful program in full scan-based quantitation of both peptides and small molecules.

6.
Proteomic research facilities and laboratories are facing increasing demands for the integration of biological data from multiple '-omics' approaches. The aim of fully understanding biological processes requires the integrated study of genomes, proteomes, and metabolomes. While genomic and proteomic workflows are different, the study of the metabolome overlaps significantly with the latter, both in instrumentation and in methodology. However, chemical diversity complicates easy and direct access to the metabolome by mass spectrometry (MS). The present review provides an introduction to metabolomics workflows from the viewpoint of proteomic researchers. We compare the physicochemical properties of proteins and peptides with those of metabolites/small molecules to establish principal differences between these analyte classes based on human data. We highlight the implications this may have for sample preparation, separation, ionisation, detection, and data analysis. We argue that a typical proteomic workflow (nLC-MS) can be exploited for the detection of a number of aliphatic and aromatic metabolites, including fatty acids, lipids, prostaglandins, di/tripeptides, steroids, and vitamins, thereby providing a straightforward entry point for metabolomics-based studies. Limitations and requirements are discussed, as well as extensions to the LC-MS workflow that expand the range of detectable molecular classes without investing in dedicated instrumentation such as GC-MS, CE-MS, or NMR.

7.
In recent years, mass spectrometry has become one of the core technologies for high-throughput proteomic profiling in biomedical research. However, the reproducibility of results obtained with this technology has been in question. It has been recognized that sophisticated automatic signal-processing algorithms using advanced statistical procedures are needed to analyze high-resolution, high-dimensional proteomic data, e.g., matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) data. In this paper we present an R-based software package, pkDACLASS, which provides a complete data analysis solution for users of MALDI-TOF raw data. Complete data analysis comprises data preprocessing, monoisotopic peak detection through statistical model fitting and testing, alignment of the monoisotopic peaks across multiple samples, and classification of normal and diseased samples through the detected peaks. The software gives users the flexibility to accomplish the complete, integrated analysis in one step or to conduct the analysis as a flexible platform, revealing the results at each and every step. AVAILABILITY: The package is freely available at http://cran.r-project.org/web/packages/pkDACLASS/index.html.
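pkDACLASS itself is an R package; purely to make the "peak detection over a noisy MALDI-TOF trace" step concrete, here is a hypothetical Python analogue of the smoothing/baseline/threshold stage. The filter choices and signal-to-noise cutoff are assumptions for illustration, not the package's statistical model.

```python
# Hedged sketch: denoise a MALDI-TOF trace, estimate baseline and noise,
# and keep peaks exceeding a signal-to-noise threshold.
import numpy as np
from scipy.signal import find_peaks, savgol_filter

def detect_peaks(mz, intensity, snr=5.0):
    """Return (m/z, intensity) of peaks exceeding `snr` times the noise level."""
    smoothed = savgol_filter(intensity, window_length=11, polyorder=3)  # denoise
    baseline = np.percentile(smoothed, 10)           # crude baseline estimate
    noise = np.median(np.abs(smoothed - baseline))   # robust noise scale
    idx, _ = find_peaks(smoothed, height=baseline + snr * noise)
    return mz[idx], smoothed[idx]

# Synthetic spectrum: flat baseline, one Gaussian peak at m/z 1050, noise.
mz = np.linspace(1000, 1100, 2001)
spectrum = (50 + 300 * np.exp(-0.5 * ((mz - 1050) / 0.2) ** 2)
            + np.random.default_rng(1).normal(0, 5, mz.size))
peak_mz, peak_int = detect_peaks(mz, spectrum)
print(peak_mz)
```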

8.
The identification and characterization of peptides from MS/MS data represents a critical aspect of proteomics. It has been the subject of extensive research in bioinformatics, resulting in the generation of a fair number of identification software tools. Most often, only one program with a specific and unvarying set of parameters is selected for identifying proteins. Hence, a significant proportion of the experimental spectra do not match the peptide sequences in the screened database due to inappropriate parameters or scoring schemes. The Swiss protein identification toolbox (swissPIT) project provides the scientific community with an expandable multi-tool platform for automated in-depth analysis of MS data that is also able to handle data from high-throughput experiments. swissPIT solves several problems: (A) missing standards for input and output formats, (B) creation of analysis workflows, (C) unified result visualization, and (D) simplicity of the user interface. Currently, swissPIT supports four different programs implementing two different search strategies to identify MS/MS spectra. Conceived to handle the calculation-intensive needs of each of the programs, swissPIT uses the distributed resources of a Swiss-wide computing grid (http://www.swing-grid.ch).

9.
MS-based proteomics is a bioinformatics-intensive field. Additionally, the instruments and the instrument-related and analytic software are expensive. Some free Internet-based proteomics tools have gained wide usage, but there has been no single bioinformatics framework that, in an easy and intuitive way, guides the user through the whole process from analysis to submission. Together, these factors may have limited the expansion of proteomics analyses, and also the secondary use (reanalysis) of proteomic data. Vaudel et al. (Proteomics 2014, 14, 1001–1005) now describe their Compomics framework, which guides the user through all the main steps, from database generation, via analysis and validation, to the submission process to PRIDE, a proteomics data repository. Vaudel et al. base the framework partly on tools that they have developed themselves and partly on other freeware tools integrated into the workflow. One of the most interesting aspects of the Compomics framework is the possibility of extending MS-based proteomics outside the MS laboratory itself. With the Compomics framework, any laboratory can handle large amounts of proteomic data, thereby facilitating collaboration and in-depth data analyses. The described software also opens the potential for any laboratory to reanalyze data deposited in PRIDE.

10.
Mass spectrometers that provide high mass accuracy, such as FT-ICR instruments, are increasingly used in proteomic studies. Although the importance of accurately determined molecular masses for the identification of biomolecules is generally accepted, its role in the analysis of shotgun proteomic data has not been thoroughly studied. To gain insight into this role, we used a hybrid linear quadrupole ion trap/FT-ICR (LTQ FT) mass spectrometer for LC-MS/MS analysis of a highly complex peptide mixture derived from a fraction of the yeast proteome. We applied three data-dependent MS/MS acquisition methods. The FT-ICR part of the hybrid mass spectrometer was either not exploited, used only for survey MS scans, or also used for acquiring selected ion monitoring scans to optimize mass accuracy. MS/MS data were assigned with the SEQUEST algorithm, and peptide identifications were validated by estimating the number of incorrect assignments using the composite target/decoy database search strategy. We developed a simple mass calibration strategy exploiting polydimethylcyclosiloxane background ions as calibrant ions. This strategy allowed us to substantially improve mass accuracy without reducing the number of MS/MS spectra acquired in an LC-MS/MS run. The benefits of high mass accuracy were greatest for assigning MS/MS spectra with low signal-to-noise ratios and for assigning phosphopeptides. Confident peptide identification rates from these data sets could be doubled by the use of mass accuracy information. It was also shown that improving mass accuracy at a cost to the MS/MS acquisition rate substantially lowered the sensitivity of LC-MS/MS analyses. The use of FT-ICR selected ion monitoring scans to maximize mass accuracy reduced the number of protein identifications by 40%.
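The calibration idea is easy to sketch: polydimethylcyclosiloxane (PDMS) ambient-air ions of exactly known composition appear in essentially every survey scan, so the observed mass error on one of them can correct the whole scan. Below is a minimal single-point version in Python; the lock mass shown, m/z 445.120025 for the protonated (Si(CH3)2O)6 ion, is a commonly cited value, and the paper's exact correction procedure may differ.

```python
# Hedged sketch: single-point lock-mass recalibration of one survey scan
# using a PDMS background ion of exactly known m/z.
PDMS_LOCK_MZ = 445.120025  # protonated (Si(CH3)2O)6 background ion

def recalibrate_scan(mzs, tol_ppm=20.0):
    """Correct a list of observed m/z values using the PDMS calibrant ion."""
    # Find the observed peak closest to the calibrant within tolerance.
    candidates = [m for m in mzs
                  if abs(m - PDMS_LOCK_MZ) / PDMS_LOCK_MZ * 1e6 <= tol_ppm]
    if not candidates:
        return mzs  # calibrant not found; leave the scan uncorrected
    observed = min(candidates, key=lambda m: abs(m - PDMS_LOCK_MZ))
    scale = PDMS_LOCK_MZ / observed  # multiplicative single-point correction
    return [m * scale for m in mzs]

# Example: a scan observed ~5 ppm high is pulled back onto the calibrant.
print(recalibrate_scan([445.12225, 512.30110, 836.47705]))
```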

11.
12.
Advances in liquid chromatography-mass spectrometry have facilitated the incorporation of proteomic studies into many experimental biology workflows. Data-independent acquisition platforms, such as sequential window acquisition of all theoretical mass spectra (SWATH-MS), offer several advantages for label-free quantitative assessment of complex proteomes over data-dependent acquisition (DDA) approaches. However, SWATH data interpretation requires spectral libraries as a detailed reference resource. The guinea pig (Cavia porcellus) is an excellent experimental model for translation to many aspects of human physiology and disease, yet there is limited experimental information regarding its proteome. To overcome this knowledge gap, a comprehensive spectral library of the guinea pig proteome is generated. Homogenates and tryptic digests are prepared from 16 tissues and subjected to >200 DDA runs. Analysis of >250 000 peptide-spectrum matches resulted in a library of 73 594 peptides from 7666 proteins. Library validation is provided by i) analyzing externally derived SWATH files (https://doi.org/10.1016/j.jprot.2018.03.023) and comparing peptide intensity quantifications, and ii) merging the externally derived data into the base library. This furnishes the research community with a comprehensive proteomic resource that will facilitate future molecular-phenotypic studies using (re-engaging) the guinea pig as an experimental model of relevance to human biology. The spectral library and raw data are freely accessible in the MassIVE repository (MSV000083199).

13.
We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW-to-MASCOT Generic Format extractors (EasierMgf), two graphical user interfaces for the search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three further applications: one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and a third for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated into the workflow for data processing and feeding of our LymPHOS database. The applications were designed modularly and can be used standalone. They are written in the Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at Google Code.

14.
The cancer tissue proteome has enormous potential as a source of novel predictive biomarkers in oncology. Progress in the development of mass spectrometry (MS)-based tissue proteomics now presents an opportunity to exploit this by applying the strategies of comprehensive molecular profiling and big-data analytics that have been refined in other fields of 'omics research. ProCan (ProCan is a registered trademark) is a program aiming to generate high-quality tissue proteomic data across a broad spectrum of cancer types. It is based on data-independent acquisition MS proteomic analysis of annotated tissue samples sourced through collaboration with expert clinical and cancer research groups. The practical requirements of a high-throughput translational research program have shaped the approach that ProCan is taking to address challenges in study design, sample preparation, raw data acquisition, and data analysis. The ultimate goal is to establish a large proteomics knowledge base that, in combination with other cancer 'omics data, will accelerate cancer research.

15.
16.
This review provides a brief overview of the development of data-independent acquisition (DIA) mass spectrometry-based proteomics and of selected DIA data analysis tools. Various DIA acquisition schemes for proteomics are summarized first, including Shotgun-CID, DIA, MSE, PAcIFIC, AIF, SWATH, MSX, SONAR, WiSIM, BoxCar, Scanning SWATH, diaPASEF, and PulseDIA, as well as the mass spectrometers enabling these methods. Next, software tools for DIA data analysis are classified into three groups: library-based tools, library-free tools, and statistical validation tools. Approaches to generating spectral libraries are reviewed for six selected library-based DIA data analysis tools tested by the authors: OpenSWATH, Spectronaut, Skyline, PeakView, DIA-NN, and EncyclopeDIA. An increasing number of library-free DIA data analysis tools have been developed, including DIA-Umpire, Group-DIA, PECAN, and PEAKS, which facilitate the identification of novel proteoforms. The authors share their user experience of when to use DIA-MS and of several selected DIA data analysis software tools. Finally, the state of the art in DIA mass spectrometry and software tools is summarized, together with the authors' views on future directions.

17.
As high-throughput techniques, including proteomics, become more accessible to individual laboratories, there is an urgent need for user-friendly bioinformatics analysis systems. Here, we describe FunRich, an open-access, standalone functional enrichment and network analysis tool. FunRich is designed to be used by biologists with minimal or no support from computational and database experts. Using FunRich, users can perform functional enrichment analysis against background databases integrated from heterogeneous genomic and proteomic resources (>1.5 million annotations). Besides the default human-specific FunRich database, users can download data from the UniProt database, which currently supports 20 different taxonomies against which enrichment analysis can be performed. Moreover, users can build their own custom databases and perform the enrichment analysis irrespective of organism. In addition to proteomics datasets, the custom database option allows the tool to be used with genomics, lipidomics, and metabolomics datasets. Thus, FunRich allows complete database customization and thereby permits the tool to be exploited as a skeleton for enrichment analysis irrespective of the data type or organism used. FunRich (http://www.funrich.org) is user-friendly and provides graphical representations (Venn diagrams, pie charts, bar graphs, column charts, heatmaps, and doughnut charts) of the data with customizable fonts, scales, and colors (publication quality).
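At the heart of any functional enrichment analysis of the kind FunRich automates is a simple over-representation test. The sketch below shows the standard hypergeometric formulation in Python; this is the generic statistic, not FunRich's internal code, and the example counts are invented.

```python
# Hedged sketch: hypergeometric over-representation test for one annotation
# term. Is the term hit more often in the user's protein list than expected
# by chance, given its frequency in the background database?
from scipy.stats import hypergeom

def enrichment_p(n_background, n_term, n_list, n_overlap):
    """P(X >= n_overlap) when drawing n_list proteins from a background of
    n_background proteins, of which n_term carry the annotation term."""
    return hypergeom.sf(n_overlap - 1, n_background, n_term, n_list)

# Example: 40 of 500 submitted proteins hit a term annotated to 800 of
# 20,000 background proteins (expected overlap by chance is 20).
print(enrichment_p(20000, 800, 500, 40))  # small p-value => enriched
```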

18.

Background

Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, these tools are generally not comparable to each other in terms of functionality, user interface, or information input/output, and they do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and for other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics.

Results

We have developed Corra, a computational framework and set of tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, together with statistical algorithms originally developed for microarray data analysis that are appropriate for LC-MS data. Corra also adopts software engineering technologies (e.g., the Google Web Toolkit and distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification by tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.

Conclusion

The Corra computational framework leverages computational innovation to enable biologists and other researchers to process, analyze, and visualize LC-MS data with what would otherwise be a complex and unfriendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open-source computational platform enabling LC-MS-based proteomic workflows and, as such, addresses an unmet need in the LC-MS proteomics field.
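For readers who want the statistical core made concrete: the differential-abundance step amounts to a per-feature two-sample test followed by false-discovery-rate control. A compact Python sketch under that reading follows; Corra itself adapts moderated microarray statistics, so the plain Welch t-test plus Benjamini-Hochberg shown here is a simplification, not Corra's code.

```python
# Hedged sketch: per-feature Welch t-test on a label-free LC-MS feature
# matrix, followed by Benjamini-Hochberg FDR control.
import numpy as np
from scipy.stats import ttest_ind

def differential_features(case, control, fdr=0.05):
    """case, control: arrays of shape (n_features, n_samples). Returns the
    indices of features called differentially abundant at the given FDR."""
    pvals = np.array([ttest_ind(a, b, equal_var=False).pvalue
                      for a, b in zip(case, control)])
    m = len(pvals)
    order = np.argsort(pvals)
    # Benjamini-Hochberg step-up: keep the largest k with p_(k) <= (k/m)*fdr,
    # then reject (call differential) the k smallest p-values.
    passed = pvals[order] <= (np.arange(1, m + 1) / m) * fdr
    k = np.nonzero(passed)[0].max() + 1 if passed.any() else 0
    return np.sort(order[:k])

# Synthetic demo: 100 features x 6 samples per group; feature 0 is shifted.
rng = np.random.default_rng(2)
case = rng.normal(0, 1, (100, 6)); case[0] += 3.0
control = rng.normal(0, 1, (100, 6))
print(differential_features(case, control))  # expect feature 0 reported
```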

19.
The identification and characterization of peptides from tandem mass spectrometry (MS/MS) data represents a critical aspect of proteomics. Today, tandem MS analysis is often performed with only a single identification program, achieving identification rates between 10 and 50% (Elias and Gygi, 2007). Besides the development of new analysis tools, recent publications also describe the pipelining of different search programs to increase the identification rate (Hartler et al., 2007; Keller et al., 2005). The Swiss Protein Identification Toolbox (swissPIT) follows this approach, but goes a step further by providing the user an expandable multi-tool platform capable of executing workflows to analyze tandem MS-based data. One of the major problems in proteomics is the absence of standardized workflows to analyze the produced data. This includes the preprocessing part as well as the final identification of peptides and proteins. The main idea of swissPIT is not only the use of different identification tools in parallel, but also the meaningful concatenation of different identification strategies at the same time. swissPIT is open-source software, but we also provide a user-friendly web platform that demonstrates the capabilities of our software and is available at http://swisspit.cscs.ch (account available upon request).

20.
Protein MS analysis is the preferred method for unbiased protein identification. It is routinely applied in a large number of both small-scale and high-throughput studies. However, user-friendly computational tools for protein analysis are still needed. In this issue, Mathivanan and colleagues (Proteomics 2015, 15, 2597–2601) report the development of FunRich, an open-access software package that facilitates the analysis of proteomics data, providing tools for functional enrichment and interaction network analysis of genes and proteins. FunRich is a reinterpretation of proteomic software: a standalone tool combining ease of use with customizable databases, free access, and graphical representations.
