首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.

Background  

High-throughput shotgun proteomics data contain a significant number of spectra from non-peptide ions or spectra of too poor quality to obtain highly confident peptide identifications. These spectra cannot be identified with any positive peptide matches in some database search programs or are identified with false positives in others. Removing these spectra can improve the database search results and lower computational expense.  相似文献   

2.

Background  

Protein identification using mass spectrometry is an important tool in many areas of the life sciences, and in proteomics research in particular. Increasing the number of proteins correctly identified is dependent on the ability to include new knowledge about the mass spectrometry fragmentation process, into computational algorithms designed to separate true matches of peptides to unidentified mass spectra from spurious matches. This discrimination is achieved by computing a function of the various features of the potential match between the observed and theoretical spectra to give a numerical approximation of their similarity. It is these underlying "metrics" that determine the ability of a protein identification package to maximise correct identifications while limiting false discovery rates. There is currently no software available specifically for the simple implementation and analysis of arbitrary novel metrics for peptide matching and for the exploration of fragmentation patterns for a given dataset.  相似文献   

3.

Background  

High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies within the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from separation processes usually involved in peptide analysis, such as retention time information, which are readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes but they usually require large amounts of training data.  相似文献   

4.
5.
A novel hierarchical MS2/MS3 database search algorithm has been developed to analyze MS2/MS3 phosphopeptides proteomic data. The algorithm is incorporated in an automated database search program, MassMatrix. The algorithm matches experimental MS2 spectra against a supplied protein database to determine candidate peptide matches. It then matches the corresponding experimental MS3 spectra against those candidate peptide matches. The MS2 and MS3 spectra are used in concert to arrive at peptide matches with overall higher confidence rather than combining MS2 and MS3 data searched separately. Receiver operating characteristic analysis showed that hierarchical MS2/MS3 database searches with MassMatrix had better sensitivity and specificity than the two‐stage MS2/MS3 database searches obtained with MassMatrix, MASCOT, and X!Tandem. A greater number of true peptide matches at a given false rate were identified by use of this new algorithm for data collected on both LCQ and LTQ‐FTICR mass spectrometers. The additional MS3 spectral data also improved the overall reliability and the number of true positives (TPs) due to the fact that the TPs of the MS2/MS3 search results had higher scores than those of the MS2.  相似文献   

6.

Background  

SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated, using profile matching, to result in sequence superfamilies of known structure. Subsequently all-against-all family profile matches are made to deduce a list of new potential superfamilies of yet unknown structure.  相似文献   

7.

Background  

Mass spectrometers can produce a large number of tandem mass spectra. They are unfortunately noise-contaminated. Noises can affect the quality of tandem mass spectra and thus increase the false positives and false negatives in the peptide identification. Therefore, it is appealing to develop an approach to denoising tandem mass spectra.  相似文献   

8.

Background  

Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching.  相似文献   

9.

Background

The sequence database searching has been the dominant method for peptide identification, in which a large number of peptide spectra generated from LC/MS/MS experiments are searched using a search engine against theoretical fragmentation spectra derived from a protein sequences database or a spectral library. Selecting trustworthy peptide spectrum matches (PSMs) remains a challenge.

Results

A novel scoring method named FC-Ranker is developed to assign a nonnegative weight to each target PSM based on the possibility of its being correct. Particularly, the scores of PSMs are updated by using a fuzzy SVM classification model and a fuzzy silhouette index iteratively. Trustworthy PSMs will be assigned high scores when the algorithm stops.

Conclusions

Our experimental studies show that FC-Ranker outperforms other post-database search algorithms over a variety of datasets, and it can be extended to solve a general classification problem with uncertain labels.
  相似文献   

10.

Background  

Several protein-protein interaction studies have been performed for the yeast Saccharomyces cerevisiae using different high-throughput experimental techniques. All these results are collected in the BioGRID database and the SGD database provide detailed annotation of the different proteins. Despite the value of BioGRID for studying protein-protein interactions, there is a need for manual curation of these interactions in order to remove false positives.  相似文献   

11.

Background  

Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools choose peptide indexing to accelerate processing. However, peptide indexing requires a large amount of time and space for construction, especially for the non-specific digestion. Additionally, it is not flexible to use.  相似文献   

12.
13.

Background  

Mass spectrometry has become a standard method by which the proteomic profile of cell or tissue samples is characterized. To fully take advantage of tandem mass spectrometry (MS/MS) techniques in large scale protein characterization studies robust and consistent data analysis procedures are crucial. In this work we present a machine learning based protocol for the identification of correct peptide-spectrum matches from Sequest database search results, improving on previously published protocols.  相似文献   

14.
Eriksson J  Fenyö D 《Proteomics》2002,2(3):262-270
A rapid and accurate method for testing the significance of protein identities determined by mass spectrometric analysis of protein digests and genome database searching is presented. The method is based on direct computation using a statistical model of the random matching of measured and theoretical proteolytic peptide masses. Protein identification algorithms typically rank the proteins of a genome database according to a score based on the number of matches between the masses obtained by mass spectrometry analysis and the theoretical proteolytic peptide masses of a database protein. The random matching of experimental and theoretical masses can cause false results. A result is significant only if the score characterizing the result deviates significantly from the score expected from a false result. A distribution of the score (number of matches) for random (false) results is computed directly from our model of the random matching, which allows significance testing under any experimental and database search constraints. In order to mimic protein identification data quality in large-scale proteome projects, low-to-high quality proteolytic peptide mass data were generated in silico and subsequently submitted to a database search program designed to include significance testing based on direct computation. This simulation procedure demonstrates the usefulness of direct significance testing for automatically screening for samples that must be subjected to peptide sequence analysis by e.g. tandem mass spectrometry in order to determine the protein identity.  相似文献   

15.

Background  

Proteomic protein identification results need to be compared across laboratories and platforms, and thus a reliable method is needed to estimate false discovery rates. The target-decoy strategy is a platform-independent and thus a prime candidate for standardized reporting of data. In its current usage based on global population parameters, the method does not utilize individual peptide scores optimally.  相似文献   

16.

Background  

The signal peptide plays an important role in protein targeting and protein translocation in both prokaryotic and eukaryotic cells. This transient, short peptide sequence functions like a postal address on an envelope by targeting proteins for secretion or for transfer to specific organelles for further processing. Understanding how signal peptides function is crucial in predicting where proteins are translocated. To support this understanding, we present SPdb signal peptide database , a repository of experimentally determined and computationally predicted signal peptides.  相似文献   

17.

Background  

Accurate peptide identification is important to high-throughput proteomics analyses that use mass spectrometry. Search programs compare fragmentation spectra (MS/MS) of peptides from complex digests with theoretically derived spectra from a database of protein sequences. Improved discrimination is achieved with theoretical spectra that are based on simulating gas phase chemistry of the peptides, but the limited understanding of those processes affects the accuracy of predictions from theoretical spectra.  相似文献   

18.

Background  

Often high-quality MS/MS spectra of tryptic peptides do not match to any database entry because of only partially sequenced genomes and therefore, protein identification requires de novo peptide sequencing. To achieve protein identification of the economically important but still unsequenced plant pathogenic oomycete Plasmopara halstedii, we first evaluated the performance of three different de novo peptide sequencing algorithms applied to a protein digests of standard proteins using a quadrupole TOF (QStar Pulsar i).  相似文献   

19.

Background  

Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature of the variants identified. Due to the differences in typing schemes, imposed by the diversity of genes targeted, creating these databases has typically required the writing of one-off code to link the database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences.  相似文献   

20.

Background  

Antibacterial peptides are important components of the innate immune system, used by the host to protect itself from different types of pathogenic bacteria. Over the last few decades, the search for new drugs and drug targets has prompted an interest in these antibacterial peptides. We analyzed 486 antibacterial peptides, obtained from antimicrobial peptide database APD, in order to understand the preference of amino acid residues at specific positions in these peptides.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号