Similar articles
 20 similar articles found
1.

Background  

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has become one of the most widely used tools in mass spectrometry-based proteomics. Various algorithms have since been developed to automate the process for modern high-throughput LC-MS/MS experiments.

2.
We evaluate statistical models used in two-hypothesis tests for identifying peptides from tandem mass spectrometry data. The null hypothesis H(0), that a peptide matches a spectrum by chance, requires information on the probability of by-chance matches between peptide fragments and peaks in the spectrum. Likewise, the alternate hypothesis H(A), that the spectrum is due to a particular peptide, requires probabilities that the peptide fragments would indeed be observed if it were the causative agent. We compare models for these probabilities by determining the identification rates produced by the models using an independent data set. The initial models use different probabilities depending on fragment ion type, but uniform probabilities for each ion type across all of the labile bonds along the backbone. More sophisticated models for probabilities under both H(A) and H(0) are introduced that do not assume uniform probabilities for each ion type. In addition, the performance of these models using a standard likelihood model is compared to an information theory approach derived from the likelihood model. Also, a simple but effective model for incorporating peak intensities is described. Finally, a support-vector machine is used to discriminate between correct and incorrect identifications based on multiple characteristics of the scoring functions. The results are shown to reduce the misidentification rate significantly when compared to a benchmark cross-correlation-based approach.
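
As a rough illustration of the two-hypothesis framework with uniform per-ion-type probabilities, the sketch below scores a peptide-spectrum match by a log-likelihood ratio. The probability values, the ion-type labels, and the function name are hypothetical placeholders chosen for the example, not values or code from the study.

```python
import math

# Hypothetical probabilities of observing a peak for each fragment ion type.
# P_HA: a peak is observed given the peptide truly produced the spectrum (H_A).
# P_H0: a predicted fragment matches a peak purely by chance (H_0).
P_HA = {"b": 0.6, "y": 0.7}
P_H0 = {"b": 0.1, "y": 0.1}


def log_likelihood_ratio(observations):
    """Sum log[P(obs | H_A) / P(obs | H_0)] over predicted fragments.

    `observations` is a list of (ion_type, matched) pairs, one per
    predicted fragment; `matched` says whether a peak was found for it.
    """
    score = 0.0
    for ion_type, matched in observations:
        p_a, p_0 = P_HA[ion_type], P_H0[ion_type]
        if matched:
            score += math.log(p_a / p_0)
        else:
            score += math.log((1.0 - p_a) / (1.0 - p_0))
    return score
```

A positive score favors H(A) and a negative score favors H(0). The non-uniform models mentioned in the abstract would replace the two lookup tables with probabilities that also depend on the position of the bond along the backbone.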

3.
To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparing with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.

4.
Joh  Yoonsung  Lee  Kangbae  Kim  Hyunwoo  Park  Heejin 《BMC bioinformatics》2023,24(1):1-21
A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past 20 years, many groups worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline as it can detect any correlation (linear and non-linear) between any number of variables (n-dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurement of gene expression levels) is sensitive to data size, correlation strength and underlying distributions, and often requires laborious and, at times, ad hoc optimization. In this work, we first show that estimating MI of a bi- and tri-variate Gaussian distribution using k-nearest neighbor (kNN) MI estimation results in significant error reduction as compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov–Stögbauer–Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Finally, through extensive in-silico benchmarking we show that a new inference algorithm CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction (which combines CMIA and the KSG-MI estimator) achieves an improvement of 20–35% in precision-recall measures over the current gold standard in the field.
This new method will enable researchers to discover new gene interactions or better choose gene candidates for experimental validations.
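
To make the binning issue concrete, here is a minimal sketch that compares the closed-form MI of a bivariate Gaussian, I = -0.5 ln(1 - rho^2), with a plug-in estimate from fixed equal-width binning. It does not implement the KSG estimator or CMIA; the function names, the bin count, and the sampling helper are assumptions made for the example.

```python
import math
import random


def gaussian_mi(rho):
    """Exact mutual information (in nats) of a bivariate Gaussian."""
    return -0.5 * math.log(1.0 - rho * rho)


def binned_mi(xs, ys, bins=10):
    """Plug-in MI estimate (in nats) from equal-width 2-D binning."""
    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        # Clamp the maximum value into the last bin.
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        i, j = bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    # sum over occupied cells of p_ij * log(p_ij / (p_i * p_j))
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def sample_correlated(rho, n, seed=0):
    """Draw n pairs from a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x, z = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * z)
    return xs, ys
```

Calling `binned_mi(*sample_correlated(0.6, 500), bins=b)` for several values of `b` and comparing against `gaussian_mi(0.6)` shows how strongly the binned estimate depends on the bin count and sample size, which is the sensitivity the abstract describes.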

5.
Polyamines are key regulators of cell development and many plant responses to environmental challenges; however, their functions remain unclear in complex interactions with other hormones and in biotic or abiotic stress. This lack of knowledge derives from the difficulty of measuring natural polyamines in plants. Here, we present a fast multiresidue method for putrescine (Put), 1,3-diaminopropane (DAP), l-ornithine, spermidine (Spd) and spermine (Spn) measurements in plant samples. Polyamine determination is based on a perchloric acid extraction followed by a simple filtration procedure without previous derivatization. Polyamines are resolved by HPLC in a common C18 column and quantified by electrospray ionization tandem mass spectrometry. 13C4-putrescine and 1,7-diaminoheptane standards were added prior to sample extraction to achieve an accurate quantification in a single run. Polyamines are poorly retained on common reverse-phase C18 columns, because they are very polar compounds and contain several positive charges. To circumvent this problem, an ion-pairing technique has been used successfully with heptafluorobutyric acid (HFBA) at 1 mM in the aqueous phase and 25 mM in the sample. The signal depleted by HFBA has been improved by adding 1% propionic acid to the aqueous and organic eluents. Altogether, this gives a method accurate enough to determine polyamines in plants. To demonstrate the usefulness of the method, it has been validated in Arabidopsis thaliana samples, and polyamines have been determined in several genotypes that overexpress (35S::ADC2 line 3.6) or are disrupted (adc2) in the Arginine Decarboxylase2 (ADC2) gene.

6.
High-throughput protein analysis by tandem mass spectrometry produces anywhere from thousands to millions of spectra that are being used for peptide and protein identifications. Though each spectrum corresponds only to one charged peptide (ion) state, repetitive database searches of multiple charge states are typically conducted since the resolution of many common mass spectrometers is not sufficient to determine the charge state. The resulting database searches are both error-prone and time-consuming. We describe a straightforward, accurate approach to charge state estimation (CHASTE). CHASTE relies on fragment ion peak distributions and, by using reliable logistic regression models, combines different measurements to improve its accuracy. CHASTE's performance has been validated on data sets, comprised of known peptide dissociation spectra, obtained by replicate analyses of our earlier developed protein standard mixture using ion trap mass spectrometers at different laboratories. CHASTE was able to reduce the number of needed database searches by at least 60% and the number of redundant searches by at least 90% with virtually no loss of information. This greatly alleviates one of the major bottlenecks in high-throughput peptide and protein identifications. Thresholds and parameter estimates can be tailored to specific analysis situations, pipelines, and instrumentation. CHASTE was implemented in Java with GUI-based and command-line-based interfaces.

7.
Mass spectrometry has become a key technology for modern large-scale protein sequencing. Tandem mass spectrometry, the process of peptide ion dissociation followed by mass-to-charge ratio (m/z) analysis, is the critical component for peptide identification. Recent advances in mass spectrometry now permit two discrete, and complementary, types of peptide ion fragmentation: collision-activated dissociation (CAD) and electron transfer dissociation (ETD) on a single instrument. To exploit this complementarity and increase sequencing success rates, we designed and embedded a data-dependent decision tree algorithm (DT) to make unsupervised, real-time decisions of which fragmentation method to use based on precursor charge and m/z. Applying the DT to large-scale proteome analyses of Saccharomyces cerevisiae and human embryonic stem cells, we identified 53,055 peptides in total, more than with CAD (38,293) or ETD (39,507) alone. In addition, the DT method identified 7,422 phosphopeptides, compared to 2,801 (CAD) or 5,874 (ETD) phosphopeptides.
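
The shape of such a decision tree can be sketched as a small lookup on precursor charge and m/z. The abstract does not give the actual branching rules, so every cutoff below is a placeholder assumption, as are the function and constant names; only the inputs (charge, m/z) and the two outputs (CAD, ETD) come from the text.

```python
# Hypothetical m/z cutoffs per precursor charge. In this sketch, a precursor
# below the cutoff for its charge is sent to ETD and one at or above it to CAD.
# These numbers are placeholders, not values stated in the abstract.
ASSUMED_MZ_CUTOFFS = {3: 650.0, 4: 900.0, 5: 950.0}


def choose_fragmentation(charge, mz):
    """Pick a fragmentation method from precursor charge and m/z."""
    if charge <= 2:
        return "CAD"  # assumed default for low charge states
    cutoff = ASSUMED_MZ_CUTOFFS.get(charge)
    if cutoff is None:
        return "ETD"  # assumed default for charges above the table
    return "ETD" if mz < cutoff else "CAD"
```

Because the decision depends only on two values already known from the precursor scan, a rule of this kind is cheap enough to evaluate in real time for every precursor.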

8.
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. 
Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.

9.
Several methods have been used to identify peptides that correspond to tandem mass spectra. In this work, we describe a data set of low energy tandem mass spectra generated from a control mixture of known protein components that can be used to evaluate the accuracy of these methods. As an example, these spectra were searched by the SEQUEST application against a human peptide sequence database. The numbers of resulting correct and incorrect peptide assignments were then determined. We show how the sensitivity and error rate are affected by the use of various filtering criteria based upon SEQUEST scores and the number of tryptic termini of assigned peptides.
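
The evaluation described above can be sketched as follows: given assignments whose correctness is known from the control mixture, apply a filter and report sensitivity and error rate. The record layout, the use of a single cross-correlation threshold, and the definitions of the two rates are assumptions made for this example rather than the study's exact procedure.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    xcorr: float          # SEQUEST cross-correlation score
    tryptic_termini: int  # number of tryptic termini: 0, 1, or 2
    correct: bool         # known from the control mixture


def evaluate_filter(assignments, min_xcorr, min_termini):
    """Return (sensitivity, error_rate) for one filtering criterion.

    sensitivity = correct assignments passing / all correct assignments
    error_rate  = incorrect assignments passing / all assignments passing
    """
    passed = [
        a for a in assignments
        if a.xcorr >= min_xcorr and a.tryptic_termini >= min_termini
    ]
    total_correct = sum(a.correct for a in assignments)
    passed_correct = sum(a.correct for a in passed)
    sensitivity = passed_correct / total_correct if total_correct else 0.0
    error_rate = (len(passed) - passed_correct) / len(passed) if passed else 0.0
    return sensitivity, error_rate
```

Sweeping `min_xcorr` and `min_termini` over a grid and recording the two rates gives the kind of trade-off curve the abstract refers to.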

10.
In a neonatal-screening pilot study for inherited disorders in organic acid and amino acid metabolism, we analyzed butyrated acylcarnitines and amino acids in blood spots of more than 20 000 newborns by electrospray tandem mass spectrometry. In order to screen for urea cycle disorders, we performed multiple scanning functions with additional stable isotope-labelled internal standards, since such reported functions as neutral loss of m/z 102 or 109 for butyrated amino acids were not sufficient. Arginine levels were measured with arginine-13C6. Hypocitrullinemia for the screening of some urea cycle disorders was detectable by measurement with synthesized citrulline-d6, although we did not find any such disorders. In the acylcarnitine analysis, we found a patient with propionic acidemia, who has been treated effectively. The increasing false positive rate due to the use of pivalic acid-containing antibiotics in the diagnosis of isovaleric acidemia was a problem in Japan.

11.
Current techniques in tandem mass spectrometric analyses of cellular protein contents often produce thousands to tens of thousands of spectra per experiment. This study introduces a new algorithm, named SPEQUAL, which is aimed at automated tandem mass spectral quality assessment. The quality of a given spectrum can be evaluated from three basic components: (i) charge state differentiation, (ii) total signal intensity, and (iii) signal-to-noise estimates. The differentiation between single and multiple precursor charge states (i) provides a binary score for a given spectrum. Components (ii) and (iii) provide partial scores which are subsequently summarized and multiplied by the first score. SPEQUAL was applied to over 10,000 data files derived from almost 3,000 tandem mass spectra, and the results (final cumulative scores) were manually verified. SPEQUAL's performance was determined to have high sensitivity and specificity and low error rates for both spectral quality estimates in general and precursor charge state differentiation in particular. Each of the partial scores is controlled by adjustable thresholds to fine-tune SPEQUAL's performance for different analysis pipelines and instrumentation. This spectral quality assessment tool is intended to act in an advisory role to the researcher, assisting in filtration of thousands of spectra typically produced by high throughput tandem mass spectrometric proteome analyses. Lastly, SPEQUAL was implemented as Java GUI-based and command-line-based interfaces freely available for both academic and industrial researchers.
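
The way the three components combine can be sketched as a binary gate multiplied by a sum of partial scores. This is only one reading of the abstract: it assumes that "summarized" means summed, that each partial score is a value in [0, 1] that saturates at its threshold, and that the default thresholds shown are arbitrary. None of the names below come from SPEQUAL itself.

```python
def partial_score(value, threshold):
    """Map a measurement to [0, 1], saturating at the threshold (assumed form)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return min(max(value / threshold, 0.0), 1.0)


def quality_score(charge_resolved, total_intensity, snr,
                  intensity_threshold=1e5, snr_threshold=10.0):
    """Combine the three components into one cumulative score.

    (i)   charge_resolved: binary score, 1 if the charge state is
          differentiated and 0 otherwise
    (ii)  total_intensity: partial score against intensity_threshold
    (iii) snr: partial score against snr_threshold
    """
    charge_score = 1 if charge_resolved else 0
    partial_sum = (partial_score(total_intensity, intensity_threshold)
                   + partial_score(snr, snr_threshold))
    return charge_score * partial_sum
```

Under this reading, a spectrum whose charge state cannot be differentiated always scores zero, however intense or clean it is, while the adjustable thresholds control how the remaining spectra are spread between 0 and 2.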

12.

Background  

In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow, however, the computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable.

13.
14.
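The rise of proteomics has driven the rapid development of mass spectrometry, and advances in mass spectrometry have in turn broadened the range of questions that proteomics can address. Over the past 10 years, two techniques for fragmenting peptides or intact proteins in the mass spectrometer, electron capture dissociation (ECD) and electron transfer dissociation (ETD), have gradually been developed. ECD and ETD have already shown attractive prospects in proteomics, particularly in the identification of post-translational modifications and in top-down fragmentation studies of intact proteins. This review gives a fairly systematic and comprehensive account of the basic principles of ECD and ETD, their mass spectral characteristics, instrument implementation, data-interpretation algorithms and software development, and progress in their application to proteomics. It also analyzes in depth the current research questions, the technical challenges, and future trends.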

15.
Cross-linking technology combined with tandem mass spectrometry (MS-MS) is a powerful method that provides a rapid solution to the discovery of protein-protein interactions and protein structures. We studied the problem of detecting cross-linked peptides and cross-linked amino acids from tandem mass spectral data. Our method consists of two steps: the first step finds two protein subsequences whose mass sum equals a given mass measured by mass spectrometry; and the second step finds the best cross-linked amino acids in these two peptide sequences that are optimally correlated to a given tandem mass spectrum. We designed fast and space-efficient algorithms for these two steps and implemented and tested them on experimental data of cross-linked hemoglobin proteins. An interchain cross-link between two beta subunits was found in two tandem mass spectra. The length of the cross-linker (7.7 Å) is very close to the actual distance (8.18 Å) obtained from the molecular structure in PDB.
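
The first step (finding two subsequences whose masses sum to a measured value) can be illustrated with a sort-and-bisect search over candidate peptide masses. This is a generic sketch, not the authors' algorithm: it assumes the candidate masses have already been computed, that the cross-linker mass has been subtracted from the target by the caller, and that matching is done within an absolute tolerance.

```python
from bisect import bisect_left, bisect_right


def find_mass_pairs(masses, target, tolerance):
    """Return pairs of candidate masses whose sum is within tolerance of target.

    Each pair (a, b) has a <= b and uses two distinct entries of `masses`.
    Runs in O(n log n + k) time for n candidates and k reported pairs.
    """
    ordered = sorted(masses)
    pairs = []
    for i, m in enumerate(ordered):
        # Partner masses must lie in [target - m - tol, target - m + tol];
        # searching only from i + 1 onward avoids duplicates and self-pairs.
        lo = bisect_left(ordered, target - m - tolerance, i + 1)
        hi = bisect_right(ordered, target - m + tolerance, i + 1)
        for j in range(lo, hi):
            pairs.append((m, ordered[j]))
    return pairs
```

For example, with candidate masses 100, 250, 400, 550, and 700 and a target of 800, the search reports the pairs (100, 700) and (250, 550); the second step would then score each pair against the tandem mass spectrum.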

16.
High-throughput proteomics experiments typically generate large amounts of peptide fragmentation mass spectra during a single experiment. There is often a substantial amount of redundant fragmentation of the same precursors among these spectra, which is usually considered a nuisance. We here discuss the potential of clustering and merging redundant spectra to turn this redundancy into a useful property of the dataset. To this end, we have created the first general-purpose, freely available open-source software application for clustering and merging MS/MS spectra. The application also introduces a novel approach to calculating the similarity of fragmentation mass spectra that takes into account the increased precision of modern mass spectrometers, and we suggest a simple but effective improvement to single-linkage clustering. The application and the novel algorithms are applied to several real-life proteomic datasets and the results are discussed. An analysis of the influence of the different algorithms available and their parameters is given, as well as a number of important applications of the overall approach.

17.
Two new biomarkers, serum amyloid-P (SAP) and plasma C1-inhibitor protein, are elevated in the maternal circulation of mothers carrying Down syndrome foetuses. Much emphasis of late has been put on the lack of translational tests being developed following the identification of new biomarkers. We have created a single-reaction-monitoring (SRM) tandem mass spectrometry-based assay for the quantitation of these biomarkers and compared these results with an in-house developed immunofluorescence-based technique (IF). This MS-based assay is a rapid 5 min test and a simple "one pot reaction," requiring only 5 μl of plasma. To evaluate the potential of SRM-based quantitation in a clinical setting, SAP and C1-inhibitor were quantitated in 38 normal and Down syndrome affected pregnancies. Plasma SAP levels in the Down's group were significantly raised at 10-14 weeks (p<0.0015) and 14-20 weeks (p<0.0001). Plasma C1-inhibitor levels were also significantly elevated in the Down's group (10-14 weeks, p<0.0193; 14-20 weeks, p<0.0001). Analysis using the IF technique did not show any significant elevation of plasma SAP levels or C1-inhibitor levels. This rapid and sensitive assay demonstrates the potential of multiplexed tandem MS-based quantitation of proteins in chemical pathology labs, in a more cost-effective and accurate manner than conventionally used antibody methods.

18.
Determining glycan structures is vital to comprehend cell-matrix, cell-cell, and even intracellular biological events. Glycan sequencing, which determines the primary structure of a glycan using tandem mass spectrometry (MS/MS), remains one of the most important tasks in proteomics. Analogous to peptide de novo sequencing, glycan de novo sequencing determines the structure without the aid of a known glycan database. We show in this paper that glycan de novo sequencing is NP-hard. We then provide a heuristic algorithm and develop a software program to solve the problem in practical cases. Experiments on real MS/MS data of glycopeptides demonstrate that our heuristic algorithm gives satisfactory results on practical data.

19.
MS/MS combined with database search methods can identify the proteins present in complex mixtures. High throughput methods that infer probable peptide sequences from enzymatically digested protein samples create a challenge in how best to aggregate the evidence for candidate proteins. Typically the results of multiple technical and/or biological replicate experiments must be combined to maximize sensitivity. We present a statistical method for estimating probabilities of protein expression that integrates peptide sequence identifications from multiple search algorithms and replicate experimental runs. The method was applied to create a repository of 797 non-homologous zebrafish (Danio rerio) proteins, at an empirically validated false identification rate under 1%, as a resource for the development of targeted quantitative proteomics assays. We have implemented this statistical method as an analytic module that can be integrated with an existing suite of open-source proteomics software.

20.
Zhao Y  Lin YH 《Proteomics》2005,5(4):853-855
Instead of using the probability mean, a simple and yet effective heuristic approach was employed to treat experimentally obtained tandem mass spectrometry (MS/MS) data for protein identification. The proposed approach is based on the total number (T) of identified experimental MS/MS data. To warrant the subsequent ranking, the total number of identified b- and y-type ions (Tb+y) must be greater than 50% of T. Peptides having the same T and Tb+y are either ranked by the contiguity of identified ions or discarded during identification. When compared to other protein identification tools, good agreement with the searched results was seen.
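
The heuristic above can be sketched as a filter followed by a sort. The abstract does not say how contiguity is measured or how exactly the ordering is defined, so this example assumes contiguity is the longest run of consecutive identified ions and that candidates are ordered by T, then Tb+y, then contiguity. The record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    total_matched: int  # T: total number of identified MS/MS data
    by_matched: int     # Tb+y: number of identified b- and y-type ions
    longest_run: int    # contiguity of identified ions (assumed measure)


def rank_candidates(candidates):
    """Keep candidates with Tb+y > 50% of T, then rank them best first."""
    eligible = [c for c in candidates if c.by_matched > 0.5 * c.total_matched]
    return sorted(
        eligible,
        key=lambda c: (c.total_matched, c.by_matched, c.longest_run),
        reverse=True,
    )
```

Candidates that tie on T and Tb+y are separated by the contiguity value; a stricter variant could discard ties that contiguity cannot separate, which the abstract mentions as the alternative.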

Copyright©北京勤云科技发展有限公司  京ICP备09084417号