IDEAL-Q, an Automated Tool for Label-free Quantitation Analysis Using an Efficient Peptide Alignment Approach and Spectral Data Validation

Authors:	Chih-Chiang Tsou, Chia-Feng Tsai, Ying-Hao Tsui, Putty-Reddy Sudhir, Yi-Ting Wang, Yu-Ju Chen, Jeou-Yuan Chen, Ting-Yi Sung, and Wen-Lian Hsu

Institution:	From the Institutes of ‡Information Science, §Chemistry, and ¶Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan

Abstract:	In this study, we present a fully automated tool, called IDEAL-Q, for label-free quantitation analysis. It accepts raw data in the standard mzXML format as well as search results from major search engines, including Mascot, SEQUEST, and X!Tandem, as input data. To quantify as many identified peptides as possible, IDEAL-Q uses an efficient algorithm to predict the elution time of a peptide unidentified in a specific LC-MS/MS run but identified in other runs. Then, the predicted elution time is used to detect peak clusters of the assigned peptide. Detected peptide peaks are processed by statistical and computational methods and further validated by signal-to-noise ratio, charge state, and isotopic distribution criteria (SCI validation) to filter out noisy data. The performance of IDEAL-Q has been evaluated by several experiments. First, a serially diluted protein mixed with Escherichia coli lysate showed a high correlation with expected ratios and demonstrated good linearity (R² = 0.996). Second, in a biological replicate experiment on the THP-1 cell lysate, IDEAL-Q quantified 87% (1,672 peptides) of all identified peptides, surpassing the 45.7% (909 peptides) achieved by the conventional identity-based approach, which only quantifies peptides identified in all LC-MS/MS runs. Manual validation on all 11,940 peptide ions in six replicate LC-MS/MS runs revealed that 97.8% of the peptide ions were correctly aligned, and 93.3% were correctly validated by SCI. Thus, the mean of the protein ratio, 1.00 ± 0.05, demonstrates the high accuracy of IDEAL-Q without human intervention. Finally, IDEAL-Q was applied again to the biological replicate experiment but with an additional SDS-PAGE step to show its compatibility for label-free experiments with fractionation. For flexible workflow design, IDEAL-Q supports different fractionation strategies and various normalization schemes, including multiple spiked internal standards. 
User-friendly interfaces are provided to facilitate convenient inspection, validation, and modification of quantitation results. In summary, IDEAL-Q is an efficient, user-friendly, and robust quantitation tool. It is available for download.

Quantitative analysis of protein expression promises to provide fundamental understanding of the biological changes or biomarker discoveries in clinical applications. In recent years, various stable isotope labeling techniques, e.g. ICAT (1), enzymatic labeling using ¹⁸O/¹⁶O (2, 3), stable isotope labeling by amino acids in cell culture (4), and isobaric tagging for relative and absolute quantitation (2, 5), coupled with LC-MS/MS have been widely used for large scale quantitative proteomics. However, several factors, such as the limited number of samples, the complexity of procedures in isotopic labeling experiments, and the high cost of reagents, limit the applicability of isotopic labeling techniques to high throughput analysis. Unlike the labeling approaches, the label-free quantitation approach quantifies protein expression across multiple LC-MS/MS analyses directly without using any labeling technique (7–9). Thus, it is particularly useful for analyzing clinical specimens in highly multiplexed quantitation (10, 11); theoretically, it can be used to compare any number of samples. Despite these significant advantages, data analysis in label-free experiments is an intractable problem because of the experimental procedures. First, although high reproducibility in LC is considered a critical prerequisite, variations, including the aging of separation columns, changes in sample buffers, and fluctuations in temperature, will cause a chromatographic shift in retention time for analytes in different LC-MS/MS runs and thus complicate the analysis. 
In addition, under the label-free approach, many technical replicate analyses across a large number of samples are often acquired; however, comparing a large number of data files further complicates data analysis and yields lower quantitation accuracy than that derived by labeling methods. Hence, an accurate, automated computation tool is required to effectively solve the problem of chromatographic shift, analyze a large amount of experimental data, and provide convenient user interfaces for manual validation of quantitation results.

The rapid emergence of new label-free techniques for biomarker discovery has inspired the development of a number of bioinformatics tools in recent years. For example, Scaffold (Proteome Software) and Census (12) process PepXML search results to quantify relative protein expression based on spectral counting (13–15), which uses the number of MS/MS spectra assigned to a protein to determine the relative protein amount. Spectral counting has demonstrated a high correlation with protein abundance; however, to achieve good quantitation accuracy with the technique, high speed MS/MS data acquisition is required. Moreover, manipulations of the exclusion/inclusion strategy also affect the accuracy of spectral counting significantly. Because peptide level quantitation is also important for post-translational modification studies, the accuracy of spectral counting on peptide level quantitation deserves further study.

Another type of quantitation analysis determines peptide abundance by MS¹ peak signals. According to some studies, MS¹ peak signals across different LC-MS/MS runs can be highly reproducible and correlate well with protein abundance in complex biological samples (7–9). Quantitation analysis methods based on MS¹ peak signals can be classified into three categories: identity-based, pattern-based, and hybrid-based methods (16). 
Identity-based methods (7–9) depend on the results of MS/MS sequencing to identify and detect peptide signals in MS¹ data. However, because the data acquisition speed of MS scanning is insufficient, a considerable number of low abundance peptides may not be selected for limited MS/MS sequencing. Only a few peptides can be repetitively identified in all LC-MS/MS runs and subsequently quantified; thus, only a small fraction of identified peptides are quantified, resulting in a small number of quantifiable peptides/proteins.

In contrast to identity-based methods, pattern-based methods (17–23), including the publicly available MSight (20), MZmine (21, 22), and msInspect (23), tend to quantify all peptide peaks in MS¹ data to increase the number of quantifiable peptides. These methods first detect all peaks in each MS¹ data set and then align the detected peaks across different LC-MS/MS runs. However, in pattern-based methods, efficient detection and alignment of the peaks between each pair of LC-MS/MS runs are a major challenge. To align the peaks, several methods based on dynamic programming or image pattern recognition have been proposed (24–26). The algorithms applied in these methods require intensive computation, and their computation time increases dramatically as the number of compared samples increases because all the LC-MS/MS runs must be processed. Therefore, pattern-based approaches are infeasible for processing a large number of samples. Furthermore, pattern recognition algorithms may fail on data containing noise or overlapping peptide signals (i.e. co-eluting peptides).

The hybrid-based quantitation approach (16, 27–30) combines a pattern recognition algorithm with peptide identification results to align shifted peptides for quantitation. The pioneering accurate mass and time tag strategy (27) takes advantage of very sensitive, highly accurate mass measurement instruments with a wide dynamic range, e.g. FTICR-MS and TOF-MS, for quantitation analysis. 
PEPPeR (16) and SuperHirn (28) apply pattern recognition algorithms to align peaks and use the peptide identification results as landmarks to improve the alignment. However, because these methods still align all peaks in MS¹ data, they suffer the same computation time problem as pattern-based methods.

To resolve the computation-intensive problem in the hybrid approach, we present a fully automated software system, called IDEAL-Q, for label-free quantitation including differential protein expression and protein modification analysis. Instead of using computation-intensive pattern recognition methods, IDEAL-Q uses a computation-efficient fragmental regression method for identity-based alignment of all confidently identified peptides in a local elution time domain. It then performs peptide cross-assignment by mapping predicted elution time profiles across multiple LC-MS experiments. To improve the quantitation accuracy, IDEAL-Q applies three validation criteria to the detected peptide peak clusters to filter out noisy signals, false peptide peak clusters, and co-eluting peaks. Because of the above key features, i.e. fragmental regression and stringent validation, IDEAL-Q can substantially increase the number of quantifiable proteins as well as the quantitation accuracy compared with other extracted ion chromatogram (XIC)-based tools. Notably, to accommodate different designs, IDEAL-Q supports various built-in normalization procedures, including normalization based on multiple internal standards, to eliminate systematic biases. It also adapts to different fractionation strategies for in-depth proteomics profiling.

We evaluated the performance of IDEAL-Q on three levels: 1) quantitation of a standard protein mixture, 2) large scale proteome quantitation using replicate cell lysate, and 3) proteome scale quantitative analysis of protein expression that incorporates an additional fractionation step. 
We demonstrated that IDEAL-Q can quantify up to 89% of identified proteins (703 proteins) in the replicate THP-1 cell lysate. Moreover, by manual validation of all 11,940 peptide ions corresponding to 1,990 identified peptides, 93% of peptide ions were accurately quantified. In another experiment on replicate data containing large chromatographic shifts obtained from two independent LC-MS/MS instruments, IDEAL-Q demonstrated its robust quantitation and its ability to rectify such shifts. Finally, we applied IDEAL-Q to the THP-1 replicate experiment with an additional SDS-PAGE fractionation step. Equipped with user-friendly visualization interfaces and convenient data output for publication, IDEAL-Q represents a generic, robust, and comprehensive tool for label-free quantitative proteomics.
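
The idea of predicting the elution time of a peptide that was identified in one run but not in another can be illustrated with a small sketch. The text above does not spell out the fragmental regression algorithm, so the code below is only an assumption-laden stand-in: it fits an ordinary least-squares line to the few nearest "landmark" peptides (those identified in both runs) and uses that local line to map a reference-run elution time into the target run. The function names `fit_line` and `predict_elution_time`, the `window` parameter, and the example times are all hypothetical and are not taken from IDEAL-Q.

```python
from bisect import bisect_left


def fit_line(points):
    """Least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # Degenerate case: all x values equal, fall back to a flat line.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_elution_time(landmarks, t_ref, window=3):
    """Map an elution time from a reference run into a target run.

    landmarks: (t_reference, t_target) pairs for peptides identified in
               both runs (hypothetical input format).
    t_ref:     elution time of the peptide in the reference run.
    window:    number of landmarks taken on each side of t_ref for the
               local fit (hypothetical parameter).
    """
    pts = sorted(landmarks)
    if len(pts) < 2:
        raise ValueError("need at least two landmark peptides")
    xs = [x for x, _ in pts]
    i = bisect_left(xs, t_ref)
    local = pts[max(0, i - window):min(len(pts), i + window)]
    if len(local) < 2:
        # Too few neighbours near the edge: use all landmarks instead.
        local = pts
    slope, intercept = fit_line(local)
    return slope * t_ref + intercept


# Illustrative landmarks: the shift between runs grows from 2 to 5 minutes.
landmarks = [(10.0, 12.0), (20.0, 22.0), (30.0, 32.0),
             (40.0, 45.0), (50.0, 55.0), (60.0, 65.0)]
early = predict_elution_time(landmarks, 15.0, window=2)
late = predict_elution_time(landmarks, 55.0, window=2)
print(early, late)
```

Because each prediction uses only nearby landmarks, a shift that changes over the course of the gradient is followed locally rather than averaged away by a single global fit. The predicted time would then serve as the center of a search window for the peptide's peak cluster in the target run.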

