Similar articles
 20 similar articles found (search time: 31 ms)
1.
Mass peak alignment (ion-wise alignment) has recently become a popular method for unsupervised data analysis in untargeted metabolic profiling. Here we present MSClust, a software tool for the analysis of GC-MS and LC-MS datasets derived from untargeted profiling. MSClust performs data reduction using unsupervised clustering and extraction of putative metabolite mass spectra from ion-wise chromatographic alignment data. The algorithm is based on the subtractive fuzzy clustering method, which allows unsupervised determination of the number of metabolites in a data set and can deal with uncertain memberships of mass peaks in overlapping mass spectra. This approach is based purely on the actual information present in the data and does not require any prior metabolite knowledge. MSClust can be applied to both GC-MS and LC-MS alignment data sets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-011-0368-2) contains supplementary material, which is available to authorized users.

2.
NvAssign: protein NMR spectral assignment with NMRView   (total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: Nuclear magnetic resonance (NMR) protein studies rely on the accurate assignment of resonances. The general procedure is to (1) pick peaks, (2) cluster data from various experiments or spectra, (3) assign peaks to the sequence and (4) verify the assignments with the spectra. Many algorithms already exist for automating the assignment process (step 3). What is lacking is a flexible interface to help a spectroscopist easily move from clustering (step 2) to assignment algorithms (step 3) and back to verification of the algorithm output with spectral analysis (step 4). RESULTS: A software module, NvAssign, was written for use with NMRView. It is a significant extension of the previous CBCA module. The module provides a flexible interface to cluster data and interact with the existing assignment algorithms. Further, the software module is able to read the results of other algorithms so that the data can be easily verified by spectral analysis. The generalized interface is demonstrated by connecting the clustered data with the assignment algorithms PACES and MONTE using previously assigned data for the lyase domain of DNA polymerase lambda. The spectral analysis program NMRView is now able to read the output of these programs for simplified analysis and verification. AVAILABILITY: NvAssign is available from http://dir.niehs.nih.gov/dirnmr/nvassign

3.
For our analysis of the data from the First Annual Proteomics Data Mining Conference, we attempted to discriminate between 24 disease spectra (group A) and 17 normal spectra (group B). First, we processed the raw spectra by (i) correcting for additive sinusoidal noise (periodic on the time scale) affecting most spectra, (ii) correcting for the overall baseline level, (iii) normalizing, (iv) recombining fractions, and (v) using variable-width windows for data reduction. Also, we identified a set of polymeric peaks (at multiples of 180.6 Da) that is present in several normal spectra (B1-B8). After data processing, we found the intensities at the following mass to charge (m/z) values to be useful discriminators: 3077, 12 886 and 74 263. Using these values, we were able to achieve an overall classification accuracy of 38/41 (92.6%). Perfect classification could be achieved by adding two additional peaks, at 2476 and 6955. We identified these values by applying a genetic algorithm to a filtered list of m/z values using Mahalanobis distance between the group means as a fitness function.
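The fitness function named in this abstract, the Mahalanobis distance between the two group means, can be sketched as follows. This is only an illustration restricted to two features (two candidate m/z intensities) so that the covariance matrix can be inverted by hand. The use of a pooled within-group covariance, the function names, and the toy data are all assumptions, not details taken from the paper.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance_2d(group_a, group_b):
    """Pooled within-group 2x2 covariance (assumed choice of covariance)."""
    scatter = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    scatter[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[scatter[i][j] / dof for j in range(2)] for i in range(2)]


def mahalanobis_between_means(group_a, group_b):
    """Mahalanobis distance between the mean vectors of two groups."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    (a, b), (c, d) = pooled_covariance_2d(group_a, group_b)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of a 2x2 matrix
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    q = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return q ** 0.5


# Toy intensities at two hypothetical m/z values for each group.
disease = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
normal = [(4.0, 0.0), (6.0, 0.0), (4.0, 2.0), (6.0, 2.0)]
fitness = mahalanobis_between_means(disease, normal)
```

A genetic algorithm would evaluate this quantity for each candidate subset of m/z values and favor subsets with larger distances; the search itself is not shown here.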

4.
Traditional analysis of liquid chromatography-mass spectrometry (LC-MS) data, typically performed by reviewing chromatograms and the corresponding mass spectra, is both time-consuming and difficult. Detailed data analysis is therefore often omitted in proteomics applications. When analysing multiple proteomics samples, it is usually only the final list of identified proteins that is reviewed. This may lead to unnecessarily complex or even contradictory results because the content of the list of identified proteins depends heavily on the conditions for triggering the collection of tandem mass spectra. Small changes in the signal intensity of a peptide in different LC-MS experiments can lead to the collection of a tandem mass spectrum in one experiment but not in another. Also, the quality of the tandem mass spectrometry experiments can vary, leading to successful identification in some cases but not in others. Using a novel image analysis approach, it is possible to achieve repeat analysis with a very high reproducibility by matching peptides across different LC-MS experiments using the retention time and parent mass over charge (m/z). It is also easy to confirm the final result visually. This approach has been investigated by using tryptic digests of integral membrane proteins from organelle-enriched fractions from Arabidopsis thaliana and it has been demonstrated that very highly reproducible, consistent, and reliable LC-MS data interpretation can be made.

5.
Assembling peptides identified from LC-MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide-protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
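The idea of deriving a minimal protein list from shared peptides can be illustrated with a greedy set-cover heuristic over the protein-to-peptide mapping. This is a sketch under stated assumptions: the abstract does not say which graph algorithm IDPicker uses, greedy cover is only an approximation of a true minimum, and the protein and peptide names are hypothetical.

```python
def minimal_protein_list(protein_to_peptides):
    """Greedy cover: repeatedly pick the protein explaining the most
    still-unexplained peptides (ties broken by protein name)."""
    remaining = set().union(*protein_to_peptides.values())
    chosen = []
    while remaining:
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & remaining),
        )
        chosen.append(best)
        remaining -= protein_to_peptides[best]
    return chosen


# Hypothetical identifications: P2 is subsumed by P1, P4 by P3.
hits = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepA", "pepB"},
    "P3": {"pepC", "pepD"},
    "P4": {"pepD"},
}
proteins = minimal_protein_list(hits)
```

Here four candidate proteins collapse to two, which is the kind of reduction the parsimony analysis aims for; a naive assembly would have reported all four.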

6.
MOTIVATION: Liquid chromatography coupled to mass spectrometry (LC-MS) and combined with tandem mass spectrometry (LC-MS/MS) has become a prominent tool for the analysis of complex proteomic samples. An important step in a typical workflow is the combination of results from multiple LC-MS experiments to improve confidence in the obtained measurements or to compare results from different samples. To do so, a suitable mapping or alignment between the data sets needs to be estimated. The alignment has to correct for variations in mass and elution time which are present in all mass spectrometry experiments. RESULTS: We propose a novel algorithm to align LC-MS samples and to match corresponding ion species across samples. Our algorithm matches landmark signals between two data sets using a geometric technique based on pose clustering. Variations in mass and retention time are corrected by an affine dewarping function estimated from matched landmarks. We use the pairwise dewarping in an algorithm for aligning multiple samples. We show that our pose clustering approach is fast and reliable as compared to previous approaches. It is robust in the presence of noise and able to accurately align samples with only a few common ion species. In addition, we can easily handle different kinds of LC-MS data and adapt our algorithm to new mass spectrometry technologies. AVAILABILITY: This algorithm is implemented as part of the OpenMS software library for shotgun proteomics and available under the Lesser GNU Public License (LGPL) at www.openms.de.
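The last step described above, estimating an affine dewarping function from matched landmarks, can be sketched as an ordinary least-squares line fit on retention times. This does not reproduce the pose clustering step or the OpenMS implementation; the function names and the landmark values are assumptions for illustration only.

```python
def fit_affine(pairs):
    """Least-squares fit of rt_b = scale * rt_a + shift from (rt_a, rt_b) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    scale = sxy / sxx
    shift = mean_y - scale * mean_x
    return scale, shift


def dewarp(rt, scale, shift):
    """Map a retention time from run A onto the time axis of run B."""
    return scale * rt + shift


# Hypothetical matched landmarks (retention times in seconds).
landmarks = [(100.0, 105.0), (200.0, 207.0), (300.0, 309.0)]
scale, shift = fit_affine(landmarks)
mapped = dewarp(250.0, scale, shift)
```

In practice the landmark pairs would come from the matching step, and a robust fit might be preferable when some matches are wrong.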

7.
Mass spectrometry data are often corrupted by noise. It is very difficult to simultaneously detect low-abundance peaks and reduce false-positive peak detection caused by noise. In this paper, we propose to improve peak detection using an additional constraint: the consistent appearance of similar true peaks across multiple spectra. We observe that false-positive peaks in general do not repeat themselves well across multiple spectra. When we align all the identified peaks (including false-positive ones) from multiple spectra together, those false-positive peaks are not as consistent as true peaks. Thus, we propose to use information from other spectra in order to reduce false-positive peaks. The new method improves the detection of peaks over the traditional single spectrum based peak detection methods. Consequently, the discovery of cancer biomarkers also benefits from this improvement. Source code and additional data are available at: http://www.ece.ust.hk/~eeyu/mspeak.htm.
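The consistency constraint described here can be illustrated by pooling peak positions from all spectra, grouping nearby m/z values, and keeping only groups supported by several spectra. This is a simplified sketch, not the authors' method: the gap-based grouping, the `tol` and `min_support` parameters, and the example peaks are assumptions.

```python
def consistent_peaks(spectra, tol=0.5, min_support=2):
    """Return mean m/z of peak groups seen in at least `min_support` spectra.

    `spectra` is a list of peak lists (m/z values), one list per spectrum.
    Peaks are grouped when consecutive sorted m/z values differ by <= tol.
    """
    tagged = sorted((mz, idx) for idx, peaks in enumerate(spectra) for mz in peaks)
    clusters, current = [], []
    for mz, idx in tagged:
        if current and mz - current[-1][0] > tol:
            clusters.append(current)
            current = []
        current.append((mz, idx))
    if current:
        clusters.append(current)

    kept = []
    for cluster in clusters:
        support = {idx for _, idx in cluster}  # distinct spectra in the group
        if len(support) >= min_support:
            kept.append(sum(mz for mz, _ in cluster) / len(cluster))
    return kept


# Hypothetical peak lists; 1800.0 and 2200.0 each occur in one spectrum only.
spectra = [
    [1000.0, 1500.2, 1800.0],
    [1000.2, 1500.0],
    [1000.1, 1500.1, 2200.0],
]
kept = consistent_peaks(spectra)
```

Peaks that appear in a single spectrum are treated as likely false positives and dropped, while the two recurring peak groups survive.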

8.
Quantitative proteomics approaches using stable isotopes are well known and are now used in many labs. More recently, high-resolution quantitative approaches have been reported that rely on LC-MS quantitation of peptide concentrations by comparing peak intensities between multiple runs obtained by continuous detection in MS mode. Characteristic of these comparative LC-MS procedures is that they do not rely on the use of stable isotopes; therefore the procedure is often referred to as label-free LC-MS. In order to compare peak intensity data in multiple LC-MS datasets on a comprehensive scale, dedicated software is required for detection, matching and alignment of peaks. The high accuracy in quantitative determination of peptide abundance provides an impressive level of detail. This approach also requires an experimental set-up where quantitative aspects of protein extraction and reproducible separation conditions are well controlled. In this paper we provide insight into the critical parameters that affect the quality of the results and give an overview of the most recent software packages that are available for this procedure.

9.
One of the major bottlenecks in the determination of protein structures by NMR is in the evaluation of the data produced by the experiments. An important step in this process is assignment, where the peaks in the spectra are assigned to specific spins within specific residues. In this paper, we discuss a spin system assignment tool based on pattern recognition techniques. This tool employs user-specified templates to search for patterns of peaks in the original spectra; these patterns may correspond to side-chain or backbone fragments. Multiple spectra will normally be searched simultaneously to reduce the impact of noise. The search generates a preliminary list of putative assignments, which are filtered by a set of heuristic algorithms to produce the final results list. Each result contains a set of chemical shift values plus information about the peaks found. The results may be used as input for combinatorial routines, such as sequential assignment procedures, in place of peak lists. Two examples are presented, in which (i) HCCH-COSY and -TOCSY spectra are scanned for side-chain spin systems; and (ii) backbone spin systems are detected in a set of spectra comprising HNCA, HN(CO)CA, HNCO, HN(CA)CO, CBCANH and CBCA(CO)NH.

10.
A procedure for automated protein structure determination is presented that is based on an iterative procedure during which the NOESY peak list assignment and the structure calculation are performed simultaneously. The input consists of a list of NOESY peak positions and a list of chemical shifts as obtained from sequence-specific resonance assignment. For the present applications of this approach the previously introduced NOAH routine was implemented in the distance geometry program DIANA. As an illustration, experimental 2D and 3D NOESY cross-peak lists of six proteins have been analyzed, for which complete sequence-specific 1H assignments are available for the polypeptide backbone and the amino acid side chains. The automated method assigned 70–90% of all NOESY cross peaks, which is on average 10% less than with the interactive approach, and only between 0.8% and 2.4% of the automatically assigned peaks had a different assignment than in the corresponding manually assigned peak lists. The structures obtained with NOAH/DIANA are in close agreement with those from manually assigned peak lists, and with both approaches the residual constraint violations correspond to high-quality NMR structure determinations. Systematic comparisons of the bundles of conformers that represent corresponding automatically and interactively determined structures document the absence of significant bias in either approach, indicating that an important step has been made towards automation of structure determination from NMR spectra.

11.
Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis steps. We developed a novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform and chemical noise is attenuated by an adaptive short-time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Depending on the S/N, the desired peaks in each individual spectrum are detected and those with the highest intensity among their peak clusters are recorded. The common peaks among all the spectra are identified by choosing an appropriate cut-off threshold in the complete linkage hierarchical clustering. To remove the 1 Da shifting of the peaks, the peak corresponding to the same protein is determined as the detected peak with the largest number among its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances the S/N of the peaks after the chemical noise removal. We then successfully applied this method to a data set from prOTOF MS spectra of albumin and albumin-bound proteins from serum samples of 59 patients with carotid artery disease compared to vascular disease-free patients to detect peaks with S/N≥2. Our method is easily implemented and is highly effective at defining peaks that will be used for disease classification or to highlight potential biomarkers.

12.
MOTIVATION: Comparative metabolic profiling by nuclear magnetic resonance (NMR) is showing increasing promise for identifying inter-individual differences in drug response. Two-dimensional (2D) 1H-13C NMR can reduce spectral overlap, a common problem of 1D 1H NMR. However, the peak alignment tools for 1D NMR spectra are not well suited for 2D NMR. An automated and statistically robust method for aligning 2D NMR peaks is required to enable comparative metabonomic analysis using 2D NMR. RESULTS: A novel statistical method was developed to align NMR peaks that represent the same chemical groups across multiple 2D NMR spectra. The degree of local pattern match among peaks in different spectra is assessed using a similarity measure, and a heuristic algorithm maximizes the similarity measure for peaks across the whole spectrum. This peak alignment method was used to align peaks in 2D NMR spectra of endogenous metabolites in liver extracts obtained from four inbred mouse strains in the study of acetaminophen-induced liver toxicity. This automated alignment method was validated by manual examination of the top 50 peaks as ranked by signal intensity. Manual inspection of 1872 peaks in 39 different spectra demonstrated that the automated algorithm correctly aligned 1810 (96.7%) peaks. AVAILABILITY: The algorithm is available upon request.

13.

Background

Label-free quantitation of mass spectrometric data is one of the simplest and least expensive methods for differential expression profiling of proteins and metabolites. The need for accurate, high-performance computational label-free quantitation methods remains high in biomarker and drug discovery research. However, the most advanced recent LC-MS instruments generate huge amounts of analytical data with high scan speed, high accuracy and resolution, which are often impossible to interpret manually. Moreover, recent label-free methods still leave issues to be addressed, such as how to reduce false positives and false negatives among the candidate peaks, how to expand scalability, and how to enhance and automate data processing. AB3D (A simple label-free quantitation algorithm for Biomarker Discovery in Diagnostics and Drug discovery using LC-MS) addresses these issues and can perform label-free quantitation using MS1 for proteomics studies.

Results

We developed an algorithm called AB3D, a label-free peak detection and quantitation algorithm using MS1 spectral data. To test our algorithm, practical applications of AB3D for LC-MS data sets were evaluated using 3 datasets. Comparisons were then carried out between widely used software tools such as MZmine 2, MSight, SuperHirn, OpenMS and our algorithm AB3D, using the same LC-MS datasets. All quantitative results were confirmed manually, and we found that AB3D could properly identify and quantify known peptides with fewer false positives and false negatives compared to four other existing software tools using either the standard peptide mixture or the real complex biological samples of Bartonella quintana (strain JK31). Moreover, AB3D showed the best reliability by comparing the variability between two technical replicates using a complex peptide mixture of HeLa and BSA samples. For performance, the AB3D algorithm is about 1.2 to 15 times faster than the four other existing software tools.

Conclusions

AB3D is a simple and fast algorithm for label-free quantitation using MS1 mass spectrometry data for large scale LC-MS data analysis with higher true positive and reasonable false positive rates. Furthermore, AB3D demonstrated the best reproducibility and is about 1.2 to 15 times faster than the four existing software tools.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0376-0) contains supplementary material, which is available to authorized users.

14.
Verification of candidate biomarker proteins in blood is typically done using multiple reaction monitoring (MRM) of peptides by LC-MS/MS on triple quadrupole MS systems. MRM assay development for each protein requires significant time and cost, much of which is likely to be of little value if the candidate biomarker is below the detection limit in blood or a false positive in the original discovery data. Here we present a new technology, accurate inclusion mass screening (AIMS), designed to provide a bridge from unbiased discovery to MS-based targeted assay development. Masses on the software inclusion list are monitored in each scan on the Orbitrap MS system, and MS/MS spectra for sequence confirmation are acquired only when a peptide from the list is detected with both the correct accurate mass and charge state. The AIMS experiment confirms that a given peptide (and thus the protein from which it is derived) is present in the plasma. Throughput of the method is sufficient to qualify up to a hundred proteins/week. The sensitivity of AIMS is similar to MRM on a triple quadrupole MS system using optimized sample preparation methods (low tens of ng/ml in plasma), and MS/MS data from the AIMS experiments on the Orbitrap can be directly used to configure MRM assays. The method was shown to be at least 4-fold more efficient at detecting peptides of interest than undirected LC-MS/MS experiments using the same instrumentation, and relative quantitation information can be obtained by AIMS in case versus control experiments. Detection by AIMS ensures that a quantitative MRM-based assay can be configured for that protein. The method has the potential to qualify large numbers of biomarker candidates based on their detection in plasma prior to committing to the time- and resource-intensive steps of establishing a quantitative assay.

15.
Mass spectrometry coupled to liquid chromatography (LC-MS and LC-MS/MS) is commonly used to analyze the protein content of biological samples in large scale studies, enabling quantitation and identification of proteins and peptides using a wide range of experimental protocols, algorithms, and statistical models to analyze the data. Currently it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists for peptide identification algorithms but data that represents a ground truth for the evaluation of LC-MS data is limited. Hence there have been attempts to simulate such data in a controlled fashion to evaluate and compare algorithms. We present MSSimulator, a simulation software for LC-MS and LC-MS/MS experiments. Starting from a list of proteins from a FASTA file, the simulation will perform in-silico digestion, retention time prediction, ionization filtering, and raw signal simulation (including MS/MS), while providing many options to change the properties of the resulting data like elution profile shape, resolution and sampling rate. Several protocols for SILAC, iTRAQ or MS(E) are available, in addition to the usual label-free approach, making MSSimulator the most comprehensive simulator for LC-MS and LC-MS/MS data.
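The first simulation stage mentioned above, in-silico digestion, can be sketched with the common trypsin rule (cleave after K or R unless the next residue is P). The abstract does not state which enzyme or rule MSSimulator applies, so the rule, the `missed_cleavages` parameter, and the toy sequence are assumptions for illustration.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Split a protein sequence into tryptic peptides.

    Cleaves C-terminal to K or R, except when followed by P, and also
    returns peptides spanning up to `missed_cleavages` uncleaved sites.
    """
    if not sequence:
        return []
    # Positions at which the sequence is cut, including both ends.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


# Toy sequence: the K-R bond is cleaved, the R-P bond is not.
fully_cleaved = tryptic_digest("AAKRPGGRCC")
with_one_missed = tryptic_digest("AAKRPGGRCC", missed_cleavages=1)
```

The resulting peptide list would then feed the later stages named in the abstract (retention time prediction, ionization filtering, signal simulation), which are not sketched here.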

16.
MOTIVATION: In liquid chromatography-mass spectrometry (LC-MS)-based expression proteomics, multiple samples from different groups are analyzed in parallel. It is necessary to develop a data mining system to perform peak quantification, peak alignment and data quality assurance. RESULTS: We have developed an algorithm for spectrum deconvolution. A two-step alignment algorithm is proposed for recognizing peaks generated by the same peptide but detected in different samples. The quality of LC-MS data is evaluated using statistical tests and alignment quality tests. AVAILABILITY: Xalign software is available upon request from the author.

17.

Background

We developed a new version of the open source software package Peptrix that can now compare large numbers of Orbitrap LC-MS data. The peptide profiling results for Peptrix on MS1 spectra were compared with those obtained from a small selection of open source and commercial software packages: msInspect, Sieve and Progenesis. The properties compared in these packages were speed, total number of detected masses, redundancy of masses, reproducibility in numbers and CV of intensity, overlap of masses, and differences in peptide peak intensities. Reproducibility measurements were taken for the different MS1 software applications by measuring in triplicate a complex peptide mixture of immunoglobulin on the Orbitrap mass spectrometer. Values of peptide masses detected from the high intensity peaks of the MS1 spectra by peptide profiling were verified with values of the MS2 fragmented and sequenced masses that resulted in protein identifications with a significant score.

Findings

Peptrix finds about the same number of peptide features as the other packages, but peptide masses are in some cases approximately 5 to 10 times less redundantly present in the peptide profile matrix. The Peptrix profile matrix displays the largest overlap when comparing the number of masses in a pair between two software applications. The overlap of peptide masses between software packages for low intensity peaks in the spectra is remarkably low, at about 50% of the detected masses in the individual packages. Peptrix does not differ from the other packages in detecting 96% of the masses that relate to highly abundant sequenced proteins. MS1 peak intensities vary between the applications in a nonlinear way as they are not processed using the same method.

Conclusions

Peptrix is capable of peptide profiling using Orbitrap files and finding differentially expressed peptides in body fluid and tissue samples. The number of peptide masses detected in Orbitrap files can be increased by using more MS1 peptide profiling applications, including Peptrix, since it appears from the comparison of Peptrix with the other applications that all software packages likely have a high false negative rate for low intensity peptide peaks (missing peptides).

18.
Novel algorithms are presented for automated NOESY peak picking and NOE signal identification in homonuclear 2D and heteronuclear-resolved 3D [1H,1H]-NOESY spectra during de novo protein structure determination by NMR, which have been implemented in the new software ATNOS (automated NOESY peak picking). The input for ATNOS consists of the amino acid sequence of the protein, chemical shift lists from the sequence-specific resonance assignment, and one or several 2D or 3D NOESY spectra. In the present implementation, ATNOS performs multiple cycles of NOE peak identification in concert with automated NOE assignment with the software CANDID and protein structure calculation with the program DYANA. In the second and subsequent cycles, the intermediate protein structures are used as an additional guide for the interpretation of the NOESY spectra. By incorporating the analysis of the raw NMR data into the process of automated de novo protein NMR structure determination, ATNOS enables direct feedback between the protein structure, the NOE assignments and the experimental NOESY spectra. The main elements of the algorithms for NOESY spectral analysis are techniques for local baseline correction and evaluation of local noise level amplitudes, automated determination of spectrum-specific threshold parameters, the use of symmetry relations, and the inclusion of the chemical shift information and the intermediate protein structures in the process of distinguishing between NOE peaks and artifacts. The ATNOS procedure has been validated with experimental NMR data sets of three proteins, for which high-quality NMR structures had previously been obtained by interactive interpretation of the NOESY spectra. The ATNOS-based structures coincide closely with those obtained with interactive peak picking. Overall, we present the algorithms used in this paper as a further important step towards objective and efficient de novo protein structure determination by NMR.

19.
MOTIVATION: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and providing a shape-matching function that provides a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. By transforming the spectrum into wavelet space, the pattern-matching problem is simplified and in addition provides a powerful technique for identifying and separating the signal from the spike noise and colored noise. This transformation, with the additional information provided by the 2D CWT coefficients can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, and this improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated with SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping false positive rate low. 
AVAILABILITY: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.

20.
Mass spectrometry is being used to find disease-related patterns in mixtures of proteins derived from biological fluids. Questions have been raised about the reproducibility and reliability of peak quantifications using this technology. We collected nipple aspirate fluid from breast cancer patients and healthy women, pooled them into a quality control sample, and produced 24 replicate SELDI spectra. We developed a novel algorithm to process the spectra, denoising with the undecimated discrete wavelet transform (UDWT), and evaluated it for consistency and reproducibility. UDWT efficiently decomposes spectra into noise and signal. The noise is consistent and uncorrelated. Baseline correction produces isolated peak clusters separated by flat regions. Our method reproducibly detects more peaks than the method implemented in Ciphergen software. After normalization and log transformation, the mean coefficient of variation of peak heights is 10.6%. Our method to process spectra provides improvements over existing methods. Denoising using the UDWT appears to be an important step toward obtaining results that are more accurate. It improves the reproducibility of quantifications and supplies tools for investigation of the variations in the technology more carefully. Further study will be required, because we do not have a gold standard providing an objective assessment of which peaks are present in the samples.
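The reproducibility figure quoted in this abstract, a mean coefficient of variation of peak heights across replicate spectra, can be computed along the following lines. This sketch assumes the sample standard deviation and skips the normalization and log transformation that the authors apply first; the function name and the toy heights are hypothetical.

```python
import statistics


def mean_cv_percent(peak_heights):
    """Mean coefficient of variation (in percent) over peaks.

    `peak_heights` holds one list per peak, giving that peak's height
    in each replicate spectrum.
    """
    cvs = [
        100.0 * statistics.stdev(heights) / statistics.mean(heights)
        for heights in peak_heights
    ]
    return sum(cvs) / len(cvs)


# Two hypothetical peaks, each measured in three replicate spectra.
replicate_heights = [
    [90.0, 100.0, 110.0],
    [45.0, 50.0, 55.0],
]
cv = mean_cv_percent(replicate_heights)
```

A lower value indicates that peak heights are more consistent between replicates, which is how the abstract uses the statistic.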
