Similar articles (20 found)
1.
2.
3.
Because conventional protein identification methods apply poorly to the proteomes of organisms with unsequenced genomes, researchers have developed approaches that identify proteins using mass spectrometry and sequence-similarity database searches. Both genomic sequencing and the integration of mass spectrometry with bioinformatics are expanding the organismal scope of proteomics.

4.
Timely classification and identification of bacteria are of vital importance in many areas of public health. Mass spectrometry-based methods provide an attractive alternative to well-established microbiologic procedures because they acquire taxonomically relevant information relatively quickly. Gel-free mass spectrometry proteomics techniques allow for rapid fingerprinting of bacterial proteins using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry or, for high-throughput sequencing of peptides from protease-digested cellular proteins, using mass analysis of fragments from collision-induced dissociation of peptide ions. The latter technique searches product ion mass spectra against a database that contains a comprehensive list of protein sequences translated from protein-encoding open reading frames found in bacterial genomes. The results of such searches allow experimental peptide sequences to be assigned to matching theoretical bacterial proteomes. Phylogenetic profiles of sequenced peptides are then used to create a matrix of sequence-to-bacterium assignments, which is analyzed using numerical taxonomy tools. The results reveal the relatedness between bacteria and allow the taxonomic position of an investigated strain to be inferred.
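
The sequence-to-bacterium matrix described in this abstract can be illustrated with a small sketch. The abstract does not name the numerical taxonomy tools used, so the Jaccard coefficient here is an assumption chosen for illustration, and the strain names and peptide strings are hypothetical toy data.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two peptide sets (1.0 = identical, 0.0 = disjoint)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_table(assignments: dict) -> dict:
    """Pairwise similarity between bacteria, keyed by sorted name pairs.

    `assignments` maps a bacterium name to the set of sequenced peptides
    that matched its theoretical proteome (one column of the matrix).
    """
    return {
        (x, y): jaccard(assignments[x], assignments[y])
        for x, y in combinations(sorted(assignments), 2)
    }


# Hypothetical toy data: which peptides matched which proteome.
assignments = {
    "strain_A": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
    "strain_B": {"PEPTIDEA", "PEPTIDEB", "PEPTIDED"},
    "strain_C": {"PEPTIDEE"},
}
table = similarity_table(assignments)
```

In this toy case strain_A and strain_B share two of four distinct peptides, while strain_C shares none with either. A table of this kind could then be fed to a clustering method to suggest where an unknown strain sits.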

6.

Background: Tandem mass spectrometry followed by database search is currently the predominant technology for peptide sequencing in shotgun proteomics experiments. Most methods compare experimentally observed spectra to the theoretical spectra predicted from the sequences in protein databases. There is a growing interest, however, in comparing unknown experimental spectra to a library of previously identified spectra. This approach has the advantage of taking into account instrument-dependent factors and peptide-specific differences in fragmentation probabilities. It is also computationally more efficient for high-throughput proteomics studies.
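
To make "theoretical spectra predicted from the sequences" concrete, here is a minimal sketch that computes singly charged b- and y-ion m/z values for a peptide. It assumes unmodified residues and monoisotopic masses, and it ignores fragment intensities, which is exactly the information a spectral library adds. The function name `fragment_ions` is illustrative and does not come from any tool mentioned here.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O


def fragment_ions(peptide: str):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i covers the first i residues; y_i covers the last i residues
    plus one water. Both lists run from i = 1 to len(peptide) - 1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("PEPTIDE")
```

A database search engine would compare a peak list like this against the observed spectrum, whereas a library search compares the observed spectrum against a previously recorded one.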

7.
Protein identifications with borderline statistical confidence are typically produced by matching a few marginal-quality MS/MS spectra to database peptide sequences, and they represent a significant bottleneck in the reliable and reproducible characterization of proteomes. Here, we present a method for rapid validation of borderline hits that circumvents the need for often-biased manual inspection of raw MS/MS spectra. The approach takes advantage of the independent interpretation of the corresponding MS/MS spectra by the PepNovo de novo sequencing software, followed by mass spectrometry-driven BLAST (MS BLAST) sequence-similarity database searches that utilize all partially inaccurate, degenerate and redundant candidate peptide sequences. In a case study involving the identification of more than 180 Caenorhabditis elegans proteins by nanoLC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, the approach enabled rapid assignment (confirmation or rejection) of more than 70% of Mascot hits of borderline statistical confidence.

8.
9.
10.
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
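
The dot product score criticized in this abstract can be sketched as follows. This is a generic binned, normalized dot product, not the scoring code of SpectraST or Pepitome; the 1.0 m/z bin width and the function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def normalized_dot_product(query, library, bin_width=1.0):
    """Cosine similarity between two binned spectra (0.0 to 1.0)."""
    q = bin_spectrum(query, bin_width)
    lib = bin_spectrum(library, bin_width)
    dot = sum(q[k] * lib[k] for k in q.keys() & lib.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in lib.values())
    )
    return dot / norm if norm else 0.0


# Two single-peak spectra 0.8 m/z apart still land in the same bin,
# so the score cannot tell them apart.
same_bin_score = normalized_dot_product([(100.1, 1.0)], [(100.9, 1.0)])
```

The last line shows the weakness the abstract points out: once peaks share a bin, the size of the m/z discrepancy between them no longer affects the score.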

11.
With the advent of modern DNA sequencing technologies, genomics is experiencing a revolution in terms of quantity and quality of sequencing data. Rapidly growing numbers of sequenced genomes and metagenomes present a tremendous challenge for bioinformatics tools that predict protein-coding regions. Experimental evidence of expressed genomic regions, both at the RNA and protein level, is becoming invaluable for genome annotation and training of gene prediction algorithms. Evidence of gene expression at the protein level using mass spectrometry-based proteomics is increasingly used in refinement of raw genome sequencing data. In a typical "proteogenomics" experiment, the whole proteome of an organism is extracted, digested into peptides and measured by a mass spectrometer. The peptide fragmentation spectra are identified by searching against a six-frame translation of the raw genomic assembly, thus enabling the identification of hitherto unpredicted protein-coding genomic regions. Application of mass spectrometry to genome annotation presents a range of challenges to the standard workflows in proteomics, especially in terms of proteome coverage and database search strategies. Here we provide an overview of the field and argue that the latest mass spectrometry technologies that enable high mass accuracy at high acquisition rates will prove to be especially well suited for proteogenomics applications.
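
A six-frame translation, as used for the proteogenomic search database described here, can be sketched in a few lines. This sketch assumes the standard genetic code and marks any codon containing an ambiguous base as "X"; the frame labels ("+1" to "-3") and function names are illustrative choices.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' unknown codons."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate three forward frames (+1..+3) and three reverse (-1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
```

In practice each frame would be split at stop codons and the resulting stretches written out as database entries, which is why such databases are much larger than a predicted proteome.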

12.
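Advances in de novo sequencing algorithms for tandem mass spectra
In recent years, high-throughput proteomics based on mass spectrometry has developed rapidly, and identifying proteins from tandem mass spectra is a basic yet important step in its data processing. Because it does not require a protein sequence database, de novo sequencing can analyze tandem mass spectrometry data from new species or from species with unsequenced genomes, which gives it an advantage that database searching cannot replace. This review briefly introduces the problem of de novo sequencing of high-throughput tandem mass spectra and its current research status. It identifies several typical computational strategies and analyzes the strengths and weaknesses of each, summarizes commonly used de novo sequencing algorithms and software, describes the metrics and benchmark data sets commonly used to evaluate algorithms, outlines the characteristics of each algorithm, and discusses possible directions for future research.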

13.
In the analysis of complex peptide mixtures by MS-based proteomics, many more peptides elute at any given time than can be identified and quantified by the mass spectrometer. This makes it desirable to optimally allocate peptide sequencing and narrow mass range quantification events. In computer science, intelligent agents are frequently used to make autonomous decisions in complex environments. Here we develop and describe a framework for intelligent data acquisition and real-time database searching and showcase selected examples. The intelligent agent is implemented in the MaxQuant computational proteomics environment, termed MaxQuant Real-Time. It analyzes data as it is acquired on the mass spectrometer, constructs isotope patterns and SILAC pair information as well as controls MS and tandem MS events based on real-time and prior MS data or external knowledge. Re-implementing a top10 method in the intelligent agent yields similar performance to the data dependent methods running on the mass spectrometer itself. We demonstrate the capabilities of MaxQuant Real-Time by creating a real-time search engine capable of identifying peptides "on-the-fly" within 30 ms, well within the time constraints of a shotgun fragmentation "topN" method. The agent can focus sequencing events onto peptides of specific interest, such as those originating from a specific gene ontology (GO) term, or peptides that are likely modified versions of already identified peptides. Finally, we demonstrate enhanced quantification of SILAC pairs whose ratios were poorly defined in survey spectra. MaxQuant Real-Time is flexible and can be applied to a large number of scenarios that would benefit from intelligent, directed data acquisition. Our framework should be especially useful for new instrument types, such as the quadrupole-Orbitrap, that are currently becoming available.
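
The "topN" selection that the agent re-implements can be sketched as a simple ranking step. This is not MaxQuant Real-Time code; the function name, the exclusion list and the 0.01 m/z tolerance are assumptions added for illustration, and the abstract itself does not describe how precursors are excluded.

```python
def select_top_n(survey_peaks, n=10, excluded=(), tol=0.01):
    """Pick the n most intense precursors from one survey (MS1) scan.

    survey_peaks: iterable of (mz, intensity) pairs.
    excluded: m/z values already sequenced, skipped within +/- tol
              (a hypothetical stand-in for dynamic exclusion).
    Returns the chosen precursor m/z values, most intense first.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_peaks
        if not any(abs(mz - ex) <= tol for ex in excluded)
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:n]]


# Hypothetical survey scan.
scan = [(400.2, 1.0e5), (500.3, 5.0e5), (600.4, 2.0e5), (700.5, 9.0e4)]
first_pick = select_top_n(scan, n=2)
second_pick = select_top_n(scan, n=2, excluded=first_pick)
```

An intelligent agent of the kind described would replace the intensity-only sort key with one informed by real-time identifications or external knowledge, for example by favoring precursors predicted to belong to proteins of interest.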

14.
The analysis of proteomes of biological organisms represents a major challenge of the post-genome era. Classical proteomics combines two-dimensional electrophoresis (2-DE) and mass spectrometry (MS) for the identification of proteins. Novel technologies such as isotope coded affinity tag (ICAT)-liquid chromatography/mass spectrometry (LC/MS) offer new insights into protein alterations. The vast amount and diverse types of proteomic data require adequate web-accessible computational and database technologies for storage, integration, dissemination, analysis and visualization. A proteome database system (http://www.mpiib-berlin.mpg.de/2D-PAGE) for microbial research has been constructed which integrates 2-DE/MS, ICAT-LC/MS and functional classification data of proteins with genomic, metabolic and other biological knowledge sources. The two-dimensional polyacrylamide gel electrophoresis database delivers experimental data on microbial proteins, including mass spectra for the validation of protein identification. The ICAT-LC/MS database comprises experimental data for protein alterations of mycobacterial strains BCG vs. H37Rv. By formulating complex queries within a functional protein classification database "FUNC_CLASS" for Mycobacterium tuberculosis and Helicobacter pylori, the researcher can gather precise information on genes, proteins, protein classes and metabolic pathways. The use of the R language in the database architecture allows high-level data analysis and visualization to be performed "on-the-fly". The database system is centrally administered, and investigators without specific bioinformatic competence in database construction can submit their data. The database system also serves as a template for a prototype of a European Proteome Database of Pathogenic Bacteria. Currently, the database system includes proteome information for six strains of microorganisms.

15.
Orthogonal analysis of amino acid substitutions as a result of SNPs in existing proteomic datasets provides a critical foundation for the emerging field of population-based proteomics. Large-scale proteomics datasets, derived from shotgun tandem MS analysis of complex cellular protein mixtures, contain many unassigned spectra that may correspond to alternate alleles coded by SNPs. The purpose of this work was to identify tandem MS spectra in LC-MS/MS shotgun proteomics datasets that may represent coding nonsynonymous SNPs (nsSNPs). To this end, we generated a tryptic peptide database from allelic information found in NCBI's dbSNP. We searched this database with tandem MS spectra of tryptic peptides from DU4475 breast tumor cells that had been fractionated by pI in the first dimension and by reverse-phase LC in the second dimension. In all, we identified 629 nsSNPs, of which 36 were alternate SNP alleles not found in the reference NCBI or IPI protein databases. Searches for SNP-peptides carry a high risk of false positives, both because of mass shifts caused by modifications and because of multiple representations of the same peptide within the genome. In this work, false positives were filtered using a novel peptide pI prediction algorithm and characterized using a decoy database developed by random substitution of similarly sized reference peptides. Secondary validation by sequencing of corresponding genomic DNA confirmed the presence of the predicted SNP in 8 of 10 SNP-peptides. This work highlights that the usefulness of interpreting unassigned spectra as polymorphisms is highly reliant on the ability to detect and filter false positives.
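
Building a tryptic peptide database from SNP alleles can be sketched roughly as below. The abstract does not describe the authors' pipeline in detail, so this assumes the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and a minimum peptide length of 6. The function names and the toy protein are hypothetical.

```python
import re


def tryptic_digest(protein: str, min_len: int = 6) -> list:
    """In silico trypsin digest: cleave after K or R, but not before P."""
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if len(p) >= min_len]


def variant_peptides(protein: str, position: int, alt: str,
                     min_len: int = 6) -> list:
    """Tryptic peptides that exist only after a single substitution.

    position is 1-based; alt is the amino acid coded by the alternate
    SNP allele. Peptides also present in the reference digest are
    dropped, so only SNP-specific peptides remain.
    """
    variant = protein[:position - 1] + alt + protein[position:]
    reference = set(tryptic_digest(protein, min_len))
    return [p for p in tryptic_digest(variant, min_len) if p not in reference]


# Hypothetical toy protein with a C -> G substitution at position 9.
toy_protein = "AAAAAAKCCCCCCRDDDDDDK"
snp_peptides = variant_peptides(toy_protein, position=9, alt="G")
```

Only the peptide spanning the substituted residue enters the SNP database here. A substitution to or from K, R or P would also change the cleavage pattern itself, which this sketch handles because it re-digests the whole variant sequence.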

16.
17.
In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.

18.
MOTIVATION: A powerful proteomics methodology couples high-performance liquid chromatography (HPLC) with tandem mass spectrometry and database-search software, such as SEQUEST. Such a set-up, however, produces a large number of spectra, many of which are of too poor quality to be useful. Hence a filter that eliminates poor spectra before the database search can significantly improve throughput and robustness. Moreover, spectra judged to be of high quality, but that cannot be identified by database search, are prime candidates for still more computationally intensive methods, such as de novo sequencing or wider database searches including post-translational modifications. RESULTS: We report on two different approaches to assessing spectral quality prior to identification: binary classification, which predicts whether or not SEQUEST will be able to make an identification, and statistical regression, which predicts a more universal quality metric involving the number of b- and y-ion peaks. The best of our binary classifiers can eliminate over 75% of the unidentifiable spectra while losing only 10% of the identifiable spectra. Statistical regression can pick out spectra of modified peptides that can be identified by a de novo program but not by SEQUEST. In a section of independent interest, we discuss intensity normalization of mass spectra.

19.
Informatics for protein identification by mass spectrometry
High throughput protein analysis (i.e., proteomics) first became possible when sensitive peptide mass mapping techniques were developed, thereby allowing for the possibility of identifying and cataloging most 2D gel electrophoresis spots. Shortly thereafter a few groups pioneered the idea of identifying proteins by using peptide tandem mass spectra to search protein sequence databases. Hence, it became possible to identify proteins from very complex mixtures. One drawback to these latter techniques is that it is not entirely straightforward to make matches using tandem mass spectra of peptides that are modified or have sequences that differ slightly from what is present in the sequence database that is being searched. This has been part of the motivation behind automated de novo sequencing programs that attempt to derive a peptide sequence regardless of its presence in a sequence database. The sequence candidates thus generated are then subjected to homology-based database search programs (e.g., BLAST or FASTA). These homology search programs, however, were not developed with mass spectrometry in mind, and it became necessary to make minor modifications such that mass spectrometric ambiguities can be taken into account when comparing query and database sequences. Finally, this review will discuss the important issue of validating protein identifications. Every search program will produce a top-ranked answer; however, only the credulous are willing to accept it carte blanche.

20.
The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass-spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information of one strain or environmental sample can be used to profile proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). The range of models set the boundaries at which half of the proteins in a proteomic experiment can be identified to be 77-92% AAI between orthologs in the sample and database. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.
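
A back-of-the-envelope calculation shows why modest amino acid divergence costs so many identifications. The abstract does not spell out its probability-based model, so the sketch below is a simplified stand-in rather than the authors' method: it assumes substitutions hit each residue independently with probability 1 - AAI, and that a peptide is identifiable only if every residue matches the database exactly.

```python
def peptide_survival(aai: float, length: int) -> float:
    """Probability that a peptide of the given length is unchanged.

    aai is the average amino acid identity as a fraction (0.95 = 95%).
    Assumes independent, uniformly distributed substitutions.
    """
    return aai ** length


def fraction_matchable(aai: float, peptide_lengths) -> float:
    """Expected fraction of peptides that still match exactly."""
    lengths = list(peptide_lengths)
    return sum(peptide_survival(aai, n) for n in lengths) / len(lengths)


# Hypothetical tryptic peptide lengths for one protein.
lengths = [8, 10, 12, 15, 20]
at_95 = fraction_matchable(0.95, lengths)
at_86 = fraction_matchable(0.86, lengths)
```

Under these assumptions a 12-residue peptide survives with probability of roughly 0.54 at 95% AAI, so even closely related strains lose a large share of exact peptide matches. Real mutations are not uniformly distributed, which is the effect the abstract's in silico analysis of sequenced isolates is meant to capture.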
