Similar articles
Found 20 similar articles (search time: 9 ms)
1.
Mass spectrometry combined with database searching has become the preferred method for identifying proteins in proteomics projects. Proteins are digested by one or several enzymes to obtain peptides, which are analyzed by mass spectrometry. We introduce a new family of scoring schemes, named OLAV, aimed at identifying peptides in a database from their tandem mass spectra. OLAV scoring schemes are based on signal detection theory, and exploit mass spectrometry information more extensively than previously existing schemes. We also introduce a new concept of structural matching that uses pattern detection methods to better separate true from false positives. We show the superiority of OLAV scoring schemes compared to MASCOT, a widely used identification program. We believe that this work introduces a new way of designing scoring schemes that are especially adapted to high-throughput projects such as the GeneProt large-scale human plasma project, where it is impractical to check all identifications manually.

2.
Current techniques in tandem mass spectrometric analyses of cellular protein contents often produce thousands to tens of thousands of spectra per experiment. This study introduces a new algorithm, named SPEQUAL, which is aimed at automated tandem mass spectral quality assessment. The quality of a given spectrum can be evaluated from three basic components: (i) charge state differentiation, (ii) total signal intensity, and (iii) signal-to-noise estimates. The differentiation between single and multiple precursor charge states (i) provides a binary score for a given spectrum. Components (ii) and (iii) provide partial scores which are subsequently summarized and multiplied by the first score. SPEQUAL was applied to over 10,000 data files derived from almost 3,000 tandem mass spectra, and the results (final cumulative scores) were manually verified. SPEQUAL showed high sensitivity and specificity and low error rates, both for spectral quality estimates in general and for precursor charge state differentiation in particular. Each of the partial scores is controlled by adjustable thresholds to fine-tune SPEQUAL's performance for different analysis pipelines and instrumentation. This spectral quality assessment tool is intended to act in an advisory role to the researcher, assisting in filtration of the thousands of spectra typically produced by high-throughput tandem mass spectrometric proteome analyses. Lastly, SPEQUAL was implemented with Java GUI-based and command-line interfaces, both freely available to academic and industrial researchers.
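The score combination described in this abstract (a binary charge-state score multiplied by the sum of two partial scores) can be sketched roughly as follows. This is only an illustration, not SPEQUAL's actual code: the class `SpectrumFeatures`, the functions `partial_score` and `quality_score`, the linear saturating mapping, and the default thresholds are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class SpectrumFeatures:
    """Hypothetical per-spectrum inputs; field names are illustrative only."""
    charge_state_resolved: bool  # component (i): was the charge state differentiated?
    total_intensity: float       # component (ii): total signal intensity
    signal_to_noise: float       # component (iii): signal-to-noise estimate


def partial_score(value: float, threshold: float) -> float:
    """Map a raw value to [0, 1], saturating at an adjustable threshold (assumed form)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return max(0.0, min(value / threshold, 1.0))


def quality_score(features: SpectrumFeatures,
                  intensity_threshold: float = 1e5,
                  snr_threshold: float = 10.0) -> float:
    """Binary score (i) multiplied by the sum of partial scores (ii) and (iii)."""
    binary = 1.0 if features.charge_state_resolved else 0.0
    partial_sum = (partial_score(features.total_intensity, intensity_threshold)
                   + partial_score(features.signal_to_noise, snr_threshold))
    return binary * partial_sum
```

Because the binary score multiplies the sum, a spectrum whose charge state cannot be differentiated scores zero regardless of its intensity or signal-to-noise ratio.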

3.
Analysis of single nucleotide polymorphisms (SNPs) is a rapidly growing field of research that provides insights into the most common type of differences between individual genomes. The resulting information has a strong impact in the fields of pharmacogenomics, drug development, forensic medicine, and diagnostics of specific disease markers. The technique of matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS) has been shown to be a highly suitable tool for the analysis of DNA. It supplies a very versatile method for addressing a high-throughput SNP genotyping approach. Here, we present the Bruker genotools SNP MANAGER, a new software tool suitable for highly automated MALDI-TOF MS SNP genotyping. The genotools SNP MANAGER administers the sample preparation data, calculates masses of allele-specific primer extension products, performs genotyping analysis, and displays the results. In the current study, we have used the genotools SNP MANAGER to perform an automated duplex SNP analysis of two biallelic markers from the promoter of the gene encoding the inflammatory mediator interleukin-6.

4.
MOTIVATION: MethylCoder is a software program that generates per-base methylation data given a set of bisulfite-treated reads. It provides the option to use either of two existing short-read aligners, each with different strengths. It accounts for soft-masked alignments and overlapping paired-end reads. MethylCoder outputs data in text and binary formats in addition to the final alignment in SAM format, so that common high-throughput sequencing tools can be used on the resulting output. It is more flexible than existing software and competitive in terms of speed and memory use. AVAILABILITY: MethylCoder requires only a python interpreter and a C compiler to run. Extensive documentation and the full source code are available under the MIT license at: https://github.com/brentp/methylcode. CONTACT: bpederse@gmail.com.

5.
Mass spectrometry has made rapid advances in the recent past and has become the preferred method for proteomics. Although many open source algorithms for peptide identification exist, such as X!Tandem and OMSSA, the field has largely been the domain of proprietary software. There is a need for better, freely available, and configurable algorithms that can help in identifying the correct peptides while keeping the false positives to a minimum. We have developed MassWiz, a novel empirical scoring function that gives appropriate weights to major ions, continuity of b-y ions, intensities, and the supporting neutral losses based on the instrument type. We tested MassWiz accuracy on 486,882 spectra from a standard mixture of 18 proteins generated on 6 different instruments downloaded from the Seattle Proteome Center public repository. We compared the MassWiz algorithm with Mascot, Sequest, OMSSA, and X!Tandem at 1% FDR. MassWiz outperformed all in the largest data set (AGILENT XCT) and was second only to Mascot in the other data sets. MassWiz showed good performance in the analysis of high confidence peptides, i.e., those identified by at least three algorithms. We also analyzed a yeast data set containing 106,133 spectra downloaded from the NCBI Peptidome repository and obtained similar results. The results demonstrate that MassWiz is an effective algorithm for high-confidence peptide identification without compromising on the number of assignments. MassWiz is open-source, versatile, and easily configurable.

6.
Peptide identification by tandem mass spectrometry is an important tool in proteomic research. Powerful identification programs exist, such as SEQUEST, ProICAT and Mascot, which can relate experimental spectra to the theoretical ones derived from protein databases, thus removing much of the manual input needed in the identification process. However, the time-consuming validation of the peptide identifications is still the bottleneck of many proteomic studies. One way to further streamline this process is to remove those spectra that are unlikely to provide a confident or valid peptide identification, and in this way to reduce the labour of the validation phase. RESULTS: We propose a prefiltering scheme for evaluating the quality of spectra before the database search. The spectra are classified into two classes: spectra that contain valuable information for peptide identification, and spectra that are not derived from peptides or contain insufficient information for interpretation. The different spectral features developed for the classification are tested on real-life material originating from human lymphoblast samples and on a standard mixture of 9 proteins, both labelled with the ICAT reagent. The results show that the prefiltering scheme efficiently separates the two spectra classes.

7.
8.
MOTIVATION: High-throughput and high-resolution mass spectrometry instruments are increasingly used for disease classification and therapeutic guidance. However, the analysis of immense amounts of data poses considerable challenges. We have therefore developed a novel method for dimensionality reduction and tested it on a published ovarian high-resolution SELDI-TOF dataset. RESULTS: We have developed a four-step strategy for data preprocessing based on: (1) binning, (2) Kolmogorov-Smirnov test, (3) restriction of coefficient of variation and (4) wavelet analysis. Subsequently, support vector machines were used for classification. The developed method achieves an average sensitivity of 97.38% (sd = 0.0125) and an average specificity of 93.30% (sd = 0.0174) in 1000 independent k-fold cross-validations, where k = 2, ..., 10. AVAILABILITY: The software is available for academic and non-commercial institutions.
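Two of the preprocessing steps named in this abstract, binning (1) and restriction of the coefficient of variation (3), can be illustrated with a minimal sketch. The abstract does not give bin widths, the CV cutoff, or whether low-CV or high-CV bins are retained, so the function names, the fixed-width binning, and the choice to keep bins with CV at or below `max_cv` are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def bin_spectrum(mz, intensity, bin_width):
    """Sum intensities into fixed-width m/z bins; returns {bin_index: summed_intensity}."""
    binned = {}
    for m, i in zip(mz, intensity):
        idx = int(m // bin_width)
        binned[idx] = binned.get(idx, 0.0) + i
    return binned


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mu = mean(values)
    if mu == 0:
        return float("inf")
    return pstdev(values) / mu


def filter_bins_by_cv(spectra, max_cv):
    """Keep bins present in every binned spectrum whose CV across spectra is <= max_cv.

    The direction of the restriction (keeping low-CV bins) is an assumption.
    """
    common = set.intersection(*(set(s) for s in spectra))
    return sorted(
        b for b in common
        if coefficient_of_variation([s[b] for s in spectra]) <= max_cv
    )
```

For example, binning two spectra at a width of 1.0 and calling `filter_bins_by_cv([s1, s2], 0.5)` returns only those bin indices whose summed intensities vary by at most 50% of their mean across the two spectra.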

9.
A high-throughput software pipeline for analyzing high-performance mass spectral data sets has been developed to facilitate rapid and accurate biomarker determination. The software exploits the mass precision and resolution of high-performance instrumentation, bypasses peak-finding steps, and instead uses discrete m/z data points to identify putative biomarkers. The technique is insensitive to peak shape, and works on overlapping and non-Gaussian peaks which can confound peak-finding algorithms. Methods are presented to assess data set quality and the suitability of groups of m/z values that map to peaks as potential biomarkers. The algorithm is demonstrated with serum mass spectra from patients with and without ovarian cancer. Biomarker candidates are identified and ranked by their ability to discriminate between cancer and noncancer conditions. Their discriminating power is tested by classifying unknowns using a simple distance calculation, and a sensitivity of 95.6% and a specificity of 97.1% are obtained. In contrast, the sensitivity of the ovarian cancer blood marker CA125 is approximately 50% for stage I/II and approximately 80% for stage III/IV cancers. While the generalizability of these markers is currently unknown, we have demonstrated the ability of our analytical package to extract biomarker candidates from high-performance mass spectral data.
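The idea of classifying unknowns by a simple distance calculation and then reporting sensitivity and specificity can be sketched as below. The abstract does not say which distance or reference points were used, so the nearest-centroid rule with Euclidean distance, and the names `centroid`, `classify`, and `sensitivity_specificity`, are assumptions chosen for illustration only.

```python
import math


def centroid(rows):
    """Component-wise mean of equally long feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def classify(sample, centroids):
    """Return the label whose centroid is closest to the sample (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


def sensitivity_specificity(truth, predicted, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

In this sketch each feature vector would hold intensities at the selected m/z values, with one centroid computed per class from the training spectra.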

10.
Quantitative high-throughput mass spectrometry has become an established tool to measure relative gene expression proteome-wide. The output of such an experiment usually consists of a list of expression ratios (fold changes) for several thousand proteins between two conditions. However, we observed that individual peptide fold changes may show a significantly different behavior than other peptides from the same protein and that these differences cannot be explained by imprecise measurements. Such outlier peptides can be the consequence of several technical (misidentifications, misquantifications) or biological (post-translational modifications, differential regulation of isoforms) reasons. We developed a method to detect outlier peptides in mass spectrometry data which is able to delineate imprecise measurements from real outlier peptides with high accuracy when the true difference is as small as 1.4 fold. We applied our method to experimental data and investigated the different technical and biological effects that result in outlier peptides. Our method will assist future research to reduce technical bias and can help to identify genes with differentially regulated protein isoforms in high throughput mass spectrometry data.

11.
MS/MS is a widely used method for proteome‐wide analysis of protein expression and PTMs. The thousands of MS/MS spectra produced from a single experiment pose a major challenge for downstream analysis. Standard programs, such as MASCOT, provide peptide assignments for many of the spectra, including identification of PTM sites, but these results are plagued by false‐positive identifications. In phosphoproteomic experiments, only a single peptide assignment is typically available to support identification of each phosphorylation site, and hence minimizing false positives is critical. Thus, tedious manual validation is often required to increase confidence in the spectral assignments. We have developed phoMSVal, an open‐source platform for managing MS/MS data and automatically validating identified phosphopeptides. We tested five classification algorithms with 17 extracted features to separate correct peptide assignments from incorrect ones using over 2600 manually curated spectra. The naïve Bayes algorithm was among the best classifiers with an AUC value of 97% and PPV of 97% for phosphotyrosine data. This classifier required only three features to achieve a 76% decrease in false positives as compared with MASCOT while retaining 97% of true positives. This algorithm was able to classify an independent phosphoserine/threonine data set with AUC value of 93% and PPV of 91%, demonstrating the applicability of this method for all types of phospho‐MS/MS data. PhoMSVal is available at http://csbi.ltdk.helsinki.fi/phomsval .

12.
13.
High-throughput proteomics experiments typically generate large amounts of peptide fragmentation mass spectra during a single experiment. There is often a substantial amount of redundant fragmentation of the same precursors among these spectra, which is usually considered a nuisance. We here discuss the potential of clustering and merging redundant spectra to turn this redundancy into a useful property of the dataset. To this end, we have created the first general-purpose, freely available open-source software application for clustering and merging MS/MS spectra. The application also introduces a novel approach to calculating the similarity of fragmentation mass spectra that takes into account the increased precision of modern mass spectrometers, and we suggest a simple but effective improvement to single-linkage clustering. The application and the novel algorithms are applied to several real-life proteomic datasets and the results are discussed. An analysis of the influence of the different algorithms available and their parameters is given, as well as a number of important applications of the overall approach.
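As a point of reference for the clustering idea in this abstract, plain single-linkage clustering of spectra can be sketched as follows. This is not the novel similarity measure or the improved single-linkage variant the authors propose, neither of which is specified here. It is a baseline that assumes cosine similarity on fixed-width binned peaks, and all function names and the default bin width are hypothetical.

```python
import math


def binned(peaks, bin_width=1.0):
    """Turn a list of (m/z, intensity) pairs into a sparse {bin_index: intensity} vector."""
    vec = {}
    for mz, inten in peaks:
        idx = int(mz // bin_width)
        vec[idx] = vec.get(idx, 0.0) + inten
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_linkage(spectra, threshold):
    """Group spectra so that any pair with similarity >= threshold shares a cluster."""
    vecs = [binned(s) for s in spectra]
    parent = list(range(len(vecs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(vecs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Single linkage merges clusters through any one sufficiently similar pair, which makes it prone to chaining dissimilar spectra together; that weakness is presumably what an improved variant would address.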

14.
Current efforts aimed at developing high-throughput proteomics focus on increasing the speed of protein identification. Although improvements in sample separation, enrichment, automated handling, mass spectrometric analysis, as well as data reduction and database interrogation strategies have done much to increase the quality, quantity and efficiency of data collection, significant bottlenecks still exist. Various separation techniques have been coupled with tandem mass spectrometric (MS/MS) approaches to allow a quicker analysis of complex mixtures of proteins, especially where a high number of unambiguous protein identifications are the exception, rather than the rule. MS/MS is required to provide structural / amino acid sequence information on a peptide and thus allow protein identity to be inferred from individual peptides. Currently these spectra need to be manually validated because of (a) the potential for false positive matches (e.g., when the protein is not in the database), and (b) observed fragmentation trends that may not be incorporated into current MS/MS search algorithms. This validation represents a significant bottleneck associated with high-throughput proteomic strategies. We have developed CHOMPER, a software program which reduces the time required to both visualize and confirm MS/MS search results and generate post-analysis reports and protein summary tables. CHOMPER extracts the identification information from SEQUEST MS/MS search result files, reproduces both the peptide and protein identification summaries, provides a more interactive visualization of the MS/MS spectra and facilitates the direct submission of manually validated identifications to a database.

15.
Li D  Fu Y  Sun R  Ling CX  Wei Y  Zhou H  Zeng R  Yang Q  He S  Gao W 《Bioinformatics (Oxford, England)》2005,21(13):3049-3050
SUMMARY: Research in proteomics requires powerful database-searching software to automatically identify protein sequences in a complex protein mixture via tandem mass spectrometry. In this paper, we describe a novel database-searching software system called pFind (peptide/protein Finder), which employs an effective peptide-scoring algorithm that we reported earlier. The pFind server is implemented with the C++ STL, .Net and XML technologies. As a result, high speed and good usability of the software are achieved.

16.
We describe a probabilistic peptide fragmentation model for use in protein databank searching and de novo sequencing of electrospray tandem mass spectrometry data. A probabilistic framework for tuning the model using a range of well-characterized samples is introduced. We present preliminary results of our tuning efforts.

17.
Creasy DM  Cottrell JS 《Proteomics》2002,2(10):1426-1434
An error tolerant mode for database matching of uninterpreted tandem mass spectrometry data is described. Selected database entries are searched without enzyme specificity, using a comprehensive list of chemical and post-translational modifications, together with a residue substitution matrix. The modifications are tested serially, to avoid the catastrophic loss of discrimination that would occur if all the permutations of large numbers of modifications in combination were possible. The new mode has been coded as an extension to the Mascot search engine, and tested against a number of liquid chromatography-tandem mass spectrometry datasets. The results show a number of additional peptide matches, but require careful interpretation. The most significant limitation of this approach is that it can only reveal new matches to proteins that already have at least one significant peptide match.

18.

Background  

In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow, however, the computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable.

19.
20.
MOTIVATION: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need evaluation of prediction programs with Arabidopsis sequences containing multiple genes. RESULTS: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software. AVAILABILITY: The AraSet sequence set, the Perl programs and complementary results and notes are available at http://sphinx.rug.ac.be:8080/biocomp/napav/. CONTACT: Pierre.Rouze@gengenp.rug.ac.be.
