1.
Background
Recent advances in proteomics technologies such as SELDI-TOF mass spectrometry have shown promise in the detection of early-stage cancers. However, dimensionality reduction and classification are considerable challenges in statistical machine learning. We therefore propose a novel approach for dimensionality reduction and test it using published high-resolution SELDI-TOF data for ovarian cancer.
2.
Mass spectrometry combined with database searching has become the preferred method for identifying proteins in proteomics projects. Proteins are digested by one or several enzymes to obtain peptides, which are analyzed by mass spectrometry. We introduce a new family of scoring schemes, named OLAV, aimed at identifying peptides in a database from their tandem mass spectra. OLAV scoring schemes are based on signal detection theory, and exploit mass spectrometry information more extensively than previously existing schemes. We also introduce a new concept of structural matching that uses pattern detection methods to better separate true from false positives. We show the superiority of OLAV scoring schemes compared to MASCOT, a widely used identification program. We believe that this work introduces a new way of designing scoring schemes that are especially adapted to high-throughput projects such as the GeneProt large-scale human plasma project, where it is impractical to check all identifications manually.
3.
Bernhard Y Renard Marc Kirchner Hanno Steen Judith AJ Steen Fred A Hamprecht 《BMC bioinformatics》2008,9(1):355
Background
The reliable extraction of features from mass spectra is a fundamental step in the automated analysis of proteomic mass spectrometry (MS) experiments.
4.
Quantitative high-throughput mass spectrometry has become an established tool to measure relative gene expression proteome-wide. The output of such an experiment usually consists of a list of expression ratios (fold changes) for several thousand proteins between two conditions. However, we observed that individual peptide fold changes may behave significantly differently from those of other peptides from the same protein and that these differences cannot be explained by imprecise measurements. Such outlier peptides can arise for several technical (misidentifications, misquantifications) or biological (post-translational modifications, differential regulation of isoforms) reasons. We developed a method to detect outlier peptides in mass spectrometry data which is able to distinguish imprecise measurements from real outlier peptides with high accuracy when the true difference is as small as 1.4-fold. We applied our method to experimental data and investigated the different technical and biological effects that result in outlier peptides. Our method will assist future research to reduce technical bias and can help to identify genes with differentially regulated protein isoforms in high-throughput mass spectrometry data.
5.
Protein identification using mass spectrometry is an indispensable computational tool in the life sciences. A dramatic increase in the use of proteomic strategies to understand the biology of living systems generates an ongoing need for more effective, efficient, and accurate computational methods for protein identification. A wide range of computational methods, each with various implementations, are available to complement different proteomic approaches. A solid knowledge of the range of algorithms available and, more critically, the accuracy and effectiveness of these techniques is essential to ensure as many of the proteins as possible, within any particular experiment, are correctly identified. Here, we undertake a systematic review of the currently available methods and algorithms for interpreting, managing, and analyzing biological data associated with protein identification. We summarize the advances in computational solutions as they have responded to corresponding advances in mass spectrometry hardware. The evolution of scoring algorithms and metrics for automated protein identification is also discussed with a focus on the relative performance of different techniques. We also consider the relative advantages and limitations of different techniques in particular biological contexts. Finally, we present our perspective on future developments in the area of computational protein identification by considering the most recent literature on new and promising approaches to the problem as well as identifying areas yet to be explored and the potential application of methods from other areas of computational biology.
6.
7.
Ferro M Tardif M Reguer E Cahuzac R Bruley C Vermat T Nugues E Vigouroux M Vandenbrouck Y Garin J Viari A 《Journal of proteome research》2008,7(5):1873-1883
PepLine is fully automated software that maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped onto the six-frame translations of genomic sequences (second module), giving hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component to allow the whole pipeline to proceed in a fully automated manner using raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exon sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana envelope chloroplast samples. Our results demonstrate that PepLine competed with protein database searching software and was fast enough to potentially tackle large data sets and/or large genomes. We also illustrate the potential of this approach for the detection of the intron/exon structure of genes.
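The six-frame translation step mentioned in the abstract above can be illustrated with a short sketch. This is not PepLine's code: the function names are hypothetical, the standard genetic code is assumed, and incomplete or ambiguous codons are simply rendered as "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Return the three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

A sequence tag would then be searched as a substring in each of the six translated frames, with stop codons appearing as "*".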
8.
Josh A Henkin Mark E Jennings Dwight E Matthews Jim O Vigoreaux 《Journal of biomolecular techniques》2004,15(4):230-237
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis following tryptic digestion of polyacrylamide gel pieces is a common technique used to identify proteins. This approach is rapid, sensitive, and user friendly, and is becoming widely available to scientists in a variety of biological fields. Here we introduce a simple and effective strategy called "mass processing" where the list of masses generated from a mass spectrometer undergoes two stages of data reduction before identification. Mass processing improves the ability to identify in-gel tryptic-digested proteins by reducing the number of nonsample masses submitted to protein identification database search engines. Our results demonstrate that mass processing improves the statistical score and rank of putative protein identifications, especially with low-quantity samples, thus increasing the ability to confidently identify proteins with mass spectrometry data.
9.
Current techniques in tandem mass spectrometric analyses of cellular protein contents often produce thousands to tens of thousands of spectra per experiment. This study introduces a new algorithm, named SPEQUAL, which is aimed at automated tandem mass spectral quality assessment. The quality of a given spectrum can be evaluated from three basic components: (i) charge state differentiation, (ii) total signal intensity, and (iii) signal-to-noise estimates. The differentiation between single and multiple precursor charge states (i) provides a binary score for a given spectrum. Components (ii) and (iii) provide partial scores which are subsequently summed and multiplied by the first score. SPEQUAL was applied to over 10,000 data files derived from almost 3,000 tandem mass spectra, and the results (final cumulative scores) were manually verified. SPEQUAL was found to have high sensitivity and specificity and low error rates, both for spectral quality estimates in general and for precursor charge state differentiation in particular. Each of the partial scores is controlled by adjustable thresholds to fine-tune SPEQUAL's performance for different analysis pipelines and instrumentation. This spectral quality assessment tool is intended to act in an advisory role to the researcher, assisting in the filtering of the thousands of spectra typically produced by high-throughput tandem mass spectrometric proteome analyses. Lastly, SPEQUAL is implemented in Java with GUI-based and command-line-based interfaces, freely available to both academic and industrial researchers.
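The way the three components combine (a binary charge-state score multiplied by the sum of two thresholded partial scores) can be sketched as below. This is only an illustration of the arithmetic described in the abstract, not SPEQUAL's implementation: the function names, the default thresholds, and the linear capping of each partial score are all assumptions.

```python
def partial_score(value: float, threshold: float) -> float:
    """Scale a measurement to [0, 1]; values at or above the threshold score 1.

    The linear scaling is an assumption made for this sketch.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return max(0.0, min(value / threshold, 1.0))


def spectrum_quality(charge_score: int,
                     total_intensity: float,
                     signal_to_noise: float,
                     intensity_threshold: float = 1e4,  # hypothetical default
                     snr_threshold: float = 3.0) -> float:  # hypothetical default
    """Multiply the binary charge-state score by the summed partial scores."""
    if charge_score not in (0, 1):
        raise ValueError("charge_score must be 0 or 1")
    summed = (partial_score(total_intensity, intensity_threshold)
              + partial_score(signal_to_noise, snr_threshold))
    return charge_score * summed
```

Because the first score is binary, a spectrum that fails the charge-state check scores zero regardless of its intensity or signal-to-noise values.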
10.
We use several different multivariate analysis methods to discriminate between diseased and healthy patients using protein mass spectrometer data provided by Duke University. Two problems were presented by the university: one in which the responses (diseased or healthy) of the patients were not known, and a second in which the responses were known. In the latter case, the data can be used as a 'training' set. We attempted both problems. In particular, we use principal component analysis along with clustering methods to discriminate for the first problem set and partial least squares coupled with logistic and discriminant methods when the responses were known. In addition, we were able to detect regions of interest in the spectrum where there were differences in the protein patterns between healthy and diseased patients. There was considerable effort involved in the preprocessing of the data. We used a binning approach to reduce the number of variables rather than peak heights or peak areas. We performed a square root transformation on the data to help stabilize the variance; this in turn made a significant improvement in clustering results.
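The two preprocessing steps named in the abstract above, binning and a square root transformation, can be sketched as follows. The abstract does not say how intensities within a bin were combined or what bin width was used, so summing and the `bin_width` parameter are assumptions of this example.

```python
import math


def bin_spectrum(mz, intensity, bin_width):
    """Sum intensities into fixed-width m/z bins starting at the smallest m/z."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    if not mz:
        return []
    start = min(mz)
    n_bins = int((max(mz) - start) // bin_width) + 1
    bins = [0.0] * n_bins
    for m, value in zip(mz, intensity):
        bins[int((m - start) // bin_width)] += value
    return bins


def sqrt_transform(values):
    """Variance-stabilizing square root; negative inputs are clipped to zero."""
    return [math.sqrt(max(v, 0.0)) for v in values]
```

For count-like intensity data the variance tends to grow with the mean, which is why a square root transformation is a common choice before clustering.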
11.
Recent developments in combined separations with mass spectrometry for sensitive and high-throughput proteomic analyses are reviewed herein. These developments primarily involve high-efficiency (separation peak capacities of approximately 10^3) nanoscale liquid chromatography (flow rates extending down to approximately 20 nl/min at optimal liquid mobile-phase separation linear velocities through narrow packed capillaries) in combination with advanced mass spectrometry and in particular, high-sensitivity and high-resolution Fourier transform ion cyclotron resonance mass spectrometry. Such approaches enable analysis of low nanogram level proteomic samples (i.e., nanoscale proteomics) with individual protein identification sensitivity at the low zeptomole level. The resultant protein measurement dynamic range can approach 10^6 for nanogram-sized proteomic samples, while more abundant proteins can be detected from subpicogram-sized (total) proteome samples. These qualities provide the foundation for proteomics studies of single or small populations of cells. The instrumental robustness required for automation and providing high-quality routine performance nanoscale proteomic analyses is also discussed.
12.
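Protein mass spectrometry is an important research tool in proteomics and has been applied successfully in areas such as early cancer diagnosis. However, the curse of dimensionality that comes with protein mass spectrometry data makes dimensionality reduction a necessary step in mass spectral analysis. In this paper, we first preprocess the high-resolution SELDI-TOF ovarian mass spectrometry data provided by the US National Cancer Institute. We then cast feature selection for mass spectrometry data as a combinatorial optimization model based on simulated annealing: the objective function is built from the classification error rate of linear discriminant analysis and the sample posterior probabilities, the new-solution generator is built from a uniform distribution and a control parameter, and a memory function is added to the annealing process. Training and test samples are then selected by 10-fold cross-validation, and a linear discriminant analysis classifier is used to evaluate the dimensionality-reduced data. Experiments show that when simulated annealing selects six or more features, all of the high-resolution SELDI-TOF ovarian mass spectrometry data are classified correctly, indicating that simulated annealing is well suited to feature selection for protein mass spectrometry data.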
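A simulated annealing loop for fixed-size feature selection, with a memory of the best subset seen so far, might look roughly like the sketch below. It is not the authors' implementation: the objective is passed in as a generic callable (the abstract builds it from linear discriminant analysis error and posterior probabilities), and the geometric cooling schedule, the single-feature swap move, and all parameter defaults are assumptions.

```python
import math
import random


def anneal_feature_subset(n_features, subset_size, objective, *,
                          t_start=1.0, t_end=1e-3, cooling=0.95,
                          steps_per_temp=20, seed=0):
    """Minimize objective(subset) over subsets of a fixed size."""
    if not 0 < subset_size < n_features:
        raise ValueError("need 0 < subset_size < n_features")
    rng = random.Random(seed)
    current = rng.sample(range(n_features), subset_size)
    current_cost = objective(current)
    # Memory: keep the best subset ever visited, not just the current one.
    best, best_cost = list(current), current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_temp):
            # Neighbor: swap one selected feature for a uniformly chosen
            # unselected one.
            candidate = list(current)
            unselected = [f for f in range(n_features) if f not in current]
            candidate[rng.randrange(subset_size)] = rng.choice(unselected)
            cost = objective(candidate)
            delta = cost - current_cost
            # Metropolis rule: always accept improvements, sometimes accept
            # worse solutions depending on the temperature.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, cost
                if cost < best_cost:
                    best, best_cost = list(candidate), cost
        temperature *= cooling
    return sorted(best), best_cost
```

In practice the `objective` callable would wrap a cross-validated classifier error, so each evaluation is expensive and the number of steps per temperature becomes the main tuning knob.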
13.
He Z Yang C Yu W 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(2):368-380
Protein identification is a key and essential step in mass spectrometry (MS) based proteome research. To date, there are many protein identification strategies that employ either MS data or MS/MS data for database searching. While MS-based methods provide wider coverage than MS/MS-based methods, their identification accuracy is lower since MS data carry less information than MS/MS data. Thus, it is desirable to design more sophisticated algorithms that achieve higher identification accuracy using MS data. Peptide Mass Fingerprinting (PMF) has been widely used to identify single purified proteins from MS data for many years. In this paper, we extend this technology to protein mixture identification. First, we formulate the problem of protein mixture identification as a Partial Set Covering (PSC) problem. Then, we present several algorithms that can solve the PSC problem efficiently. Finally, we extend the partial set covering model to both MS/MS data and the combination of MS data and MS/MS data. The experimental results on simulated data and real data demonstrate the advantages of our method: 1) it outperforms previous MS-based approaches significantly; 2) it is useful in the MS/MS-based protein inference; and 3) it combines MS data and MS/MS data in a unified model such that the identification performance is further improved.
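To make the partial set covering formulation concrete, here is a minimal greedy sketch: pick proteins one at a time, each time choosing the one that explains the most still-unexplained peaks, until a target fraction of peaks is covered. The abstract does not state which algorithms the authors use, so the greedy heuristic, the function name, and the representation of peaks as plain identifiers (ignoring mass tolerance) are assumptions.

```python
import math


def greedy_partial_cover(observed_peaks, candidates, coverage=0.9):
    """Greedy heuristic for partial set cover.

    candidates maps a protein name to the set of observed peaks it explains.
    Returns the chosen proteins (in selection order) and the covered peaks.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    universe = set(observed_peaks)
    target = math.ceil(coverage * len(universe))
    remaining = {name: set(peaks) & universe
                 for name, peaks in candidates.items()}
    covered, chosen = set(), []
    while len(covered) < target and remaining:
        # Pick the protein that explains the most uncovered peaks.
        name = max(remaining, key=lambda n: len(remaining[n] - covered))
        if not remaining[name] - covered:
            break  # no candidate adds anything; target is unreachable
        covered |= remaining.pop(name)
        chosen.append(name)
    return chosen, covered
```

Allowing partial coverage matters because some observed peaks are noise or contaminants that no candidate protein should be forced to explain.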
14.
《Expert review of proteomics》2013,10(3):431-447
Recent developments in combined separations with mass spectrometry for sensitive and high-throughput proteomic analyses are reviewed herein. These developments primarily involve high-efficiency (separation peak capacities of ~10^3) nanoscale liquid chromatography (flow rates extending down to approximately 20 nl/min at optimal liquid mobile-phase separation linear velocities through narrow packed capillaries) in combination with advanced mass spectrometry and in particular, high-sensitivity and high-resolution Fourier transform ion cyclotron resonance mass spectrometry. Such approaches enable analysis of low nanogram level proteomic samples (i.e., nanoscale proteomics) with individual protein identification sensitivity at the low zeptomole level. The resultant protein measurement dynamic range can approach 10^6 for nanogram-sized proteomic samples, while more abundant proteins can be detected from subpicogram-sized (total) proteome samples. These qualities provide the foundation for proteomics studies of single or small populations of cells. The instrumental robustness required for automation and providing high-quality routine performance nanoscale proteomic analyses is also discussed.
15.
Informatics for protein identification by mass spectrometry
High throughput protein analysis (i.e., proteomics) first became possible when sensitive peptide mass mapping techniques were developed, thereby allowing for the possibility of identifying and cataloging most 2D gel electrophoresis spots. Shortly thereafter a few groups pioneered the idea of identifying proteins by using peptide tandem mass spectra to search protein sequence databases. Hence, it became possible to identify proteins from very complex mixtures. One drawback to these latter techniques is that it is not entirely straightforward to make matches using tandem mass spectra of peptides that are modified or have sequences that differ slightly from what is present in the sequence database that is being searched. This has been part of the motivation behind automated de novo sequencing programs that attempt to derive a peptide sequence regardless of its presence in a sequence database. The sequence candidates thus generated are then subjected to homology-based database search programs (e.g., BLAST or FASTA). These homology search programs, however, were not developed with mass spectrometry in mind, and it became necessary to make minor modifications such that mass spectrometric ambiguities can be taken into account when comparing query and database sequences. Finally, this review will discuss the important issue of validating protein identifications. All of the search programs will produce a top ranked answer; however, only the credulous are willing to accept them carte blanche.
16.
The structural domains of proteins have often been identified through the use of limited proteolysis. In structural genomics studies, it is necessary to carry this out in a high-throughput manner. Here, we constructed a novel high-throughput system, which consists of cell-free protein expression and one-step affinity purification, followed by limited proteolysis using a new method, referred to as the “on-beads method”. All these steps were carried out in 96-well plate format and completed in two days, even with manual handling. The merits of the new method versus the conventional one are as follows: (1) experimental times are reduced, (2) the sample preparation for limited proteolysis experiments is simplified, and (3) both protein purification and limited digestion can be performed “in situ” on the same sample plate. This preparation method is therefore suitable for highly automated, proteolytic analyses coupled to mass spectrometry techniques at a micro-scale protein expression level. The resulting protease-resistant fragments were analyzed by MALDI-TOF-MS, and protein domains of 34 mouse cDNA products were identified with this system.
17.
Cannon WR Jarman KH Webb-Robertson BJ Baxter DJ Oehmen CS Jarman KD Heredia-Langner A Auberry KJ Anderson GA 《Journal of proteome research》2005,4(5):1687-1698
We evaluate statistical models used in two-hypothesis tests for identifying peptides from tandem mass spectrometry data. The null hypothesis H(0), that a peptide matches a spectrum by chance, requires information on the probability of by-chance matches between peptide fragments and peaks in the spectrum. Likewise, the alternate hypothesis H(A), that the spectrum is due to a particular peptide, requires probabilities that the peptide fragments would indeed be observed if it were the causative agent. We compare models for these probabilities by determining the identification rates produced by the models using an independent data set. The initial models use different probabilities depending on fragment ion type, but uniform probabilities for each ion type across all of the labile bonds along the backbone. More sophisticated models for probabilities under both H(A) and H(0) are introduced that do not assume uniform probabilities for each ion type. In addition, the performance of these models using a standard likelihood model is compared to an information theory approach derived from the likelihood model. Also, a simple but effective model for incorporating peak intensities is described. Finally, a support-vector machine is used to discriminate between correct and incorrect identifications based on multiple characteristics of the scoring functions. The results are shown to reduce the misidentification rate significantly when compared to a benchmark cross-correlation based approach.
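The simplest model described in the abstract above, with one match probability per fragment ion type under each hypothesis, leads to a log-likelihood ratio score of the kind sketched below. This is an illustrative Bernoulli formulation, not the authors' code: the function name, the independence assumption across fragments, and the probability values used in any example are hypothetical.

```python
import math


def log_likelihood_ratio(fragments, p_peptide, p_chance):
    """Score a peptide-spectrum match as log P(data | H_A) - log P(data | H_0).

    fragments: iterable of (ion_type, matched) pairs, one per predicted fragment.
    p_peptide: ion_type -> probability the fragment is observed under H_A.
    p_chance:  ion_type -> probability of a by-chance match under H_0.
    Fragments are treated as independent Bernoulli trials (an assumption).
    """
    score = 0.0
    for ion_type, matched in fragments:
        p_a, p_0 = p_peptide[ion_type], p_chance[ion_type]
        if not (0 < p_a < 1 and 0 < p_0 < 1):
            raise ValueError("probabilities must lie strictly between 0 and 1")
        if matched:
            score += math.log(p_a / p_0)
        else:
            score += math.log((1 - p_a) / (1 - p_0))
    return score
```

A positive score favors the hypothesis that the peptide produced the spectrum; unmatched predicted fragments pull the score down whenever they are more likely to be observed under H(A) than by chance.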
18.
Malmström J Lee H Nesvizhskii AI Shteynberg D Mohanty S Brunner E Ye M Weber G Eckerskorn C Aebersold R 《Journal of proteome research》2006,5(9):2241-2249
Multidimensional LC-MS based shotgun proteomics experiments at the peptide level have traditionally been carried out by ion exchange in the first dimension and reversed-phase liquid chromatography in the second. Recently, it has been shown that isoelectric focusing (IEF) is an interesting alternative approach to ion exchange separation of peptides in the first dimension. Here we present an improved protocol for peptide separation by continuous free-flow electrophoresis (FFE) as the first dimension in a two-dimensional peptide separation workflow. By using a flat pI gradient and a mannitol- and urea-based separation medium, we were able to perform high-throughput proteome analysis with improved interfacing between FFE and RPLC-MS/MS. The developed protocol was applied to a cytosolic fraction from Schneider S2 cells from Drosophila melanogaster, resulting in the identification of more than 10,000 unique peptides with high probability. To improve the accuracy of the peptide identification following FFE-IEF we incorporated the pI information as an additional parameter into a statistical model for discrimination between correct and incorrect peptide assignments to MS/MS spectra.
19.
3-nitrotyrosine (3NT) is an oxidative posttranslational modification associated with many diseases. Determining the specific sites of this modification remains a challenge due to the low stoichiometry of 3NT modifications in biological samples. Mass spectrometry-based proteomics is a powerful tool for identifying 3NT modifications; however, several reports identifying 3NT sites were later demonstrated to be incorrect, highlighting that both the accuracy and efficiency of these workflows need improvement. To advance our understanding of the chromatographic and spectral properties of 3NT-containing peptides we have adapted a straightforward, reproducible procedure to generate a large set of 3NT peptides by chemical nitration of a defined, commercially available 48-protein mixture. Using two complementary LC-MS/MS platforms, a QTOF (QSTAR Elite) and a dual-pressure ion trap mass spectrometer (LTQ Velos), we detected over 200 validated 3NT-containing peptides with significant overlap in the peptides detected by both systems. We investigated the LC-MS/MS properties for each peptide manually using defined criteria and then assessed their utility to confirm that the peptide was 3NT modified. This broad set of validated 3NT-containing peptides can be utilized to optimize mass spectrometric instrumentation and data mining strategies or further develop 3NT peptide enrichment strategies for this biologically important, oxidative posttranslational modification.
20.