Similar articles
Found 20 similar articles (search time: 15 ms)
1.

Background  

Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is a powerful tool for rapidly generating high-throughput protein profiles from a large number of samples. However, the events that occur between the first and last sample run are likely to introduce technical variation in the results.

2.

Background  

Surface enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI) is a proteomics tool for biomarker discovery and other high throughput applications. Previous studies have identified various areas for improvement in preprocessing algorithms used for protein peak detection. Bottom-up approaches to preprocessing that emphasize modeling SELDI data acquisition are promising avenues of research to find the needed improvements in reproducibility.

3.
MOTIVATION: Pre-processing of SELDI-TOF mass spectrometry data is currently performed on a largely ad hoc basis. This makes comparison of results from independent analyses troublesome and does not provide a framework for distinguishing different sources of variation in data. RESULTS: In this article, we consider the task of pooling a large number of single-shot spectra, a task commonly performed automatically by the instrument software. By viewing the underlying statistical problem as one of heteroscedastic linear regression, we provide a framework for introducing robust methods and for dealing with missing data resulting from a limited span of recordable intensity values provided by the instrument. Our framework provides an interpretation of currently used methods as a maximum-likelihood estimator and allows theoretical derivation of its variance. We observe that this variance depends crucially on the total number of ionic species, which can vary considerably between different pooled spectra. This variation in variance can potentially invalidate the results from naive methods of discrimination/classification and we outline appropriate data transformations. Introducing methods from robust statistics did not improve the standard errors of the pooled samples. Imputing missing values using the EM algorithm, however, had a notable effect on the result; for our data, the pooled height of peaks which were frequently truncated increased by up to 30%.

4.

Background  

In mass spectrometry (MS) based proteomic data analysis, peak detection is an essential step for subsequent analysis. Recently, there has been significant progress in the development of various peak detection algorithms. However, neither a comprehensive survey nor an experimental comparison of these algorithms is yet available. The main objective of this paper is to provide such a survey and to compare the performance of single spectrum based peak detection methods.

5.
Zou J, Hong G, Guo X, Zhang L, Yao C, Wang J, Guo Z. PLoS ONE 2011, 6(10): e26294

Background

There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on a data pre-processing algorithm. Until now, no widely accepted agreement has been reached.

Results

In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased.

Conclusions

Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers.
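
The stratified FDR idea mentioned in the Results above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure as the FDR controller and assumes that each peak has already been assigned to a stratum; the function names, the stratum labels and the p-values are hypothetical, and the abstract does not say how the authors defined their strata.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def stratified_fdr(pvalues, strata, alpha=0.05):
    """Apply BH separately inside each stratum and merge the rejections."""
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = []
    for members in groups.values():
        local = benjamini_hochberg([pvalues[i] for i in members], alpha)
        rejected.extend(members[j] for j in local)
    return sorted(rejected)


# Hypothetical p-values for ten peaks; the first three form a small stratum.
pvals = [0.001, 0.02, 0.04, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = ["A", "A", "A", "B", "B", "B", "B", "B", "B", "B"]
pooled = benjamini_hochberg(pvals)
stratified = stratified_fdr(pvals, labels)
```

In this toy input the pooled procedure rejects only the first peak, while the stratified version rejects all three peaks in stratum A, because the small stratum faces a lighter multiple-testing burden.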

6.

Background  

High-Density Lipoprotein (HDL), one of the main plasma lipoproteins, serves as a docking station for proteins involved in inflammation, coagulation, and lipid metabolism.

7.
A series of simple and robust operations for handling large chromatographic time-of-flight mass spectrometry fingerprinting data has been established and applied to data from extracts of Fusarium graminearum genotypes modified in a non-ribosomal peptide synthase gene by over-expression and deletion. It includes importing data into a computing environment by binning the m/z axis; baseline removal; data transformation and compression across retention times. Each sample represented by a total mass spectrum was normalized to unit sum and variables selected by partial least squares discriminant analysis. Finally, principal component analysis was used for identification of high discriminatory power mass-to-charge ratios (m/z’s) separating over-expression, wildtype and deletion genotypes. Two compounds exhibiting a positive correlation to the expected levels in different genotypes were identified. The two compounds were represented by m/z 683.5 with retention time of 8.9 min, and m/z’s 774.5 and 775.5 with retention time of 14.1 min. This methodology enables extraction of chemical information from large data sets (>10^8 entries), and provides a starting point for individual optimization in targeting small molecules from metabolomics data.
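
Two of the steps named in this abstract, binning the m/z axis and normalizing a total mass spectrum to unit sum, can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names, the 1.0 bin width and the toy peak list are hypothetical, and baseline removal, transformation and compression across retention times are omitted.

```python
import math


def bin_spectrum(mz_values, intensities, mz_min, mz_max, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins over [mz_min, mz_max)."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    binned = [0.0] * n_bins
    for mz, intensity in zip(mz_values, intensities):
        if mz_min <= mz < mz_max:
            index = int((mz - mz_min) // bin_width)
            # Clamp to guard against floating-point rounding at the upper edge.
            binned[min(index, n_bins - 1)] += intensity
    return binned


def normalize_unit_sum(vector):
    """Scale a non-negative vector so its entries sum to 1 (all-zero input is returned unchanged)."""
    total = sum(vector)
    if total == 0:
        return list(vector)
    return [v / total for v in vector]


# Hypothetical peak list: (m/z, intensity) pairs from a single sample.
mz = [100.2, 100.7, 101.5, 103.9]
inten = [1.0, 3.0, 4.0, 2.0]
binned = bin_spectrum(mz, inten, mz_min=100.0, mz_max=104.0)
profile = normalize_unit_sum(binned)
```

Binning puts every sample on a common m/z grid so that spectra can be stacked into a matrix, and unit-sum normalization removes differences in total signal between samples before multivariate analysis.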

8.
Surface Enhanced Laser Desorption/Ionisation Time-of-Flight Mass Spectrometry (SELDI-TOF MS) is a technique by which protein profiles can be rapidly produced from a wide variety of biological samples. By employing chromatographic surfaces combined with the specificity and reproducibility of mass spectrometry it has allowed for profiles from complex biological samples to be analysed. Profiling and biomarker identification have been employed widely throughout the biological sciences. To date, however, the benefits of SELDI-TOF MS have not been realised in the area of mammalian cell culture. The advantages in identifying markers for cell stresses, apoptosis and other culture parameters mean that these tools could help greatly to enhance monitoring and control of bioreaction processes and improve the production of therapeutics. Better characterisation of culture systems through proteome analysis will allow for improved productivity and better yields.

9.

Background  

Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. The main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods.

10.
11.
MOTIVATION: Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the human genome project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. It has been demonstrated in the literature that biomarkers can be identified to distinguish normal individuals from cancer patients using MS data. Such progress is especially exciting for the detection of early-stage ovarian cancer patients. Although various statistical methods have been utilized to identify biomarkers from MS data, there has been no systematic comparison among these approaches in their relative ability to analyze MS data. RESULTS: We compare the performance of several classes of statistical methods for the classification of cancer based on MS spectra. These methods include: linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor classifier, bagging and boosting classification trees, support vector machine, and random forest (RF). The methods are applied to ovarian cancer and control serum samples from the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms other methods in the analysis of MS data.

12.
We evaluate statistical models used in two-hypothesis tests for identifying peptides from tandem mass spectrometry data. The null hypothesis H(0), that a peptide matches a spectrum by chance, requires information on the probability of by-chance matches between peptide fragments and peaks in the spectrum. Likewise, the alternate hypothesis H(A), that the spectrum is due to a particular peptide, requires probabilities that the peptide fragments would indeed be observed if it were the causative agent. We compare models for these probabilities by determining the identification rates produced by the models using an independent data set. The initial models use different probabilities depending on fragment ion type, but uniform probabilities for each ion type across all of the labile bonds along the backbone. More sophisticated models for probabilities under both H(A) and H(0) are introduced that do not assume uniform probabilities for each ion type. In addition, the performance of these models using a standard likelihood model is compared to an information theory approach derived from the likelihood model. Also, a simple but effective model for incorporating peak intensities is described. Finally, a support-vector machine is used to discriminate between correct and incorrect identifications based on multiple characteristics of the scoring functions. The results are shown to reduce the misidentification rate significantly when compared to a benchmark cross-correlation based approach.
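
The two-hypothesis setup with one probability per fragment ion type can be sketched as a log-likelihood ratio. The sketch assumes that each predicted fragment is an independent matched/unmatched event, which is a simplification chosen for illustration; the function name and all probability values are hypothetical and are not taken from the paper.

```python
import math


def log_likelihood_ratio(fragments, p_alt, p_null):
    """Score a peptide-spectrum match as log P(data | H_A) - log P(data | H_0).

    fragments: list of (ion_type, matched) pairs, one per predicted fragment.
    p_alt[ion_type]: probability that the fragment peak is observed under H_A.
    p_null[ion_type]: probability of a by-chance match under H_0.
    """
    score = 0.0
    for ion_type, matched in fragments:
        pa, p0 = p_alt[ion_type], p_null[ion_type]
        if matched:
            score += math.log(pa / p0)
        else:
            score += math.log((1.0 - pa) / (1.0 - p0))
    return score


# Hypothetical per-ion-type probabilities, uniform along the backbone.
p_alt = {"b": 0.6, "y": 0.8}
p_null = {"b": 0.1, "y": 0.1}
candidate = [("b", True), ("y", True), ("y", True), ("b", False)]
score = log_likelihood_ratio(candidate, p_alt, p_null)
```

A positive score favors H(A) and a negative score favors H(0). The more sophisticated models described in the abstract would replace the per-ion-type constants with probabilities that vary along the backbone.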

13.
Determining glycan structures is vital to comprehend cell-matrix, cell-cell, and even intracellular biological events. Glycan sequencing, which determines the primary structure of a glycan using tandem mass spectrometry (MS/MS), remains one of the most important tasks in proteomics. Analogous to peptide de novo sequencing, glycan de novo sequencing determines the structure without the aid of a known glycan database. We show in this paper that glycan de novo sequencing is NP-hard. We then provide a heuristic algorithm and develop a software program to solve the problem in practical cases. Experiments on real MS/MS data of glycopeptides demonstrate that our heuristic algorithm gives satisfactory results on practical data.

14.
MOTIVATION: The analysis of biological samples with high-throughput mass spectrometers has increased greatly in recent years. As larger datasets are processed, it is important that the spectra are aligned to ensure that the same protein intensities are correctly identified in each sample. Without such an alignment procedure it is possible to make errors in identifying the signals from peptides with similar molecular weight. Two algorithms are provided that can improve the alignment among samples. One algorithm is designed to work with SELDI data produced from a Ciphergen instrument, and the other can be used with data in a more general format. RESULTS: The two algorithms were applied to samples drawn from a common pool of reference serum. The results indicate substantial improvement in consistently identifying peptide signals in different samples.

15.
16.
Apoptosis is a key process in the response of tumours to chemotherapeutic agents. Tumor necrosis factor-related apoptosis-inducing ligand (TRAIL) induces apoptosis in many tumor cells, while sparing most normal cells. Several chemotherapeutic drugs synergize with TRAIL in reducing tumor growth and inducing apoptosis. Because some tumour cells respond poorly to these treatments, biomarkers that predict clinical responsiveness are needed. This study used surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) to identify novel apoptotic markers in TRAIL and etoposide (T+E)-treated MDA-MB-231 and ZR-75-1 breast cancer cells and MCF-10A non-transformed breast cells. T+E induced apoptosis, increasing caspase-3 activity at 4-8h, in all cell lines. Protein profiles revealed two prominent peaks, m/z 10090 and 8560, which decreased significantly during apoptosis. Mass spectrometry sequencing of tryptic peptides identified these proteins as S100A6 (confirmed immunologically) and ubiquitin (confirmed against a purified standard), respectively. Caspase inhibition prevented the decrease in both proteins during T+E-induced apoptosis whereas proteasome inhibition combined with T+E further decreased ubiquitin, possibly by preventing its recycling. Using SELDI-TOF MS we have identified S100A6 and ubiquitin as potential protein markers of apoptosis. Further validation using patient samples is required to confirm their potential utility in monitoring the effectiveness of anti-cancer drugs in inducing tumour cell apoptosis.

17.

Background  

The reliable extraction of features from mass spectra is a fundamental step in the automated analysis of proteomic mass spectrometry (MS) experiments.

18.
Wagner M, Naik D, Pothen A. Proteomics 2003, 3(9): 1692-1698
We report our results in classifying protein matrix-assisted laser desorption/ionization-time of flight mass spectra obtained from serum samples into diseased and healthy groups. We discuss in detail five of the steps in preprocessing the mass spectral data for biomarker discovery, as well as our criterion for choosing a small set of peaks for classifying the samples. Cross-validation studies with four selected proteins yielded misclassification rates in the 10-15% range for all the classification methods. Three of these proteins or protein fragments are down-regulated and one up-regulated in lung cancer, the disease under consideration in this data set. When cross-validation studies are performed, care must be taken to ensure that the test set does not influence the choice of the peaks used in the classification. Misclassification rates are lower when both the training and test sets are used to select the peaks used in classification versus when only the training set is used. This expectation was validated for various statistical discrimination methods when thirteen peaks were used in cross-validation studies. One particular classification method, a linear support vector machine, exhibited especially robust performance when the number of peaks was varied from four to thirteen, and when the peaks were selected from the training set alone. Experiments with the samples randomly assigned to the two classes confirmed that misclassification rates were significantly higher in such cases than those observed with the true data. This indicates that our findings are indeed significant. We found closely matching masses in a database for protein expression in lung cancer for three of the four proteins we used to classify lung cancer. Data from additional samples, increased experience with the performance of various preprocessing techniques, and affirmation of the biological roles of the proteins that help in classification, will strengthen our conclusions in the future.  
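
The caution about keeping the test set out of peak selection can be made concrete with a small leave-one-out sketch. Everything here is an assumption for illustration: the nearest-centroid classifier, the mean-difference ranking of peaks, the function names and the toy data are hypothetical and are not the methods used in the paper.

```python
def select_peaks(samples, labels, k):
    """Rank peaks by absolute difference of class means and return the top-k indices."""
    n_peaks = len(samples[0])

    def class_mean(cls, j):
        vals = [s[j] for s, y in zip(samples, labels) if y == cls]
        return sum(vals) / len(vals)

    scores = [abs(class_mean(1, j) - class_mean(0, j)) for j in range(n_peaks)]
    return sorted(range(n_peaks), key=lambda j: -scores[j])[:k]


def nearest_centroid_predict(train, train_labels, test_sample, peaks):
    """Assign the class whose centroid (over the chosen peaks) is closest."""
    best_cls, best_dist = None, float("inf")
    for cls in (0, 1):
        rows = [s for s, y in zip(train, train_labels) if y == cls]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in peaks]
        dist = sum((test_sample[j] - c) ** 2 for j, c in zip(peaks, centroid))
        if dist < best_dist:
            best_cls, best_dist = cls, dist
    return best_cls


def loo_error(samples, labels, k, select_inside_fold=True):
    """Leave-one-out misclassification rate.

    With select_inside_fold=False the peaks are chosen once from all samples,
    so the held-out sample leaks into peak selection.
    """
    leaky_peaks = select_peaks(samples, labels, k)
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        peaks = select_peaks(train, train_labels, k) if select_inside_fold else leaky_peaks
        pred = nearest_centroid_predict(train, train_labels, samples[i], peaks)
        errors += int(pred != labels[i])
    return errors / len(samples)
```

Calling `loo_error(samples, labels, k)` with the default `select_inside_fold=True` gives the estimate in which peak choice never sees the held-out sample; the leaky variant is included only for comparison and tends to give optimistic error rates on real data.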

19.
Motivation: Mass spectrometry (MS), such as the surface-enhanced laser desorption and ionization time-of-flight (SELDI-TOF) MS, provides a potentially promising proteomic technology for biomarker discovery. An important matter for such a technology to be used routinely is its reproducibility. It is of significant interest to develop quantitative measures to evaluate the quality and reliability of different experimental methods. Results: We compare the quality of SELDI-TOF MS data using unfractionated, fractionated plasma samples and abundant protein depletion methods in terms of the numbers of detected peaks and reliability. Several statistical quality-control and quality-assessment techniques are proposed, including the Graeco–Latin square design for the sample allocation on a Protein chip, the use of the pairwise Pearson correlation coefficient as the similarity measure between the spectra in conjunction with multi-dimensional scaling (MDS) for graphically evaluating similarity of replicates and assessing outlier samples, and the use of the reliability ratio for evaluating reproducibility. Our results show that the number of peaks detected is similar among the three sample preparation technologies, and the use of the Sigma multi-removal kit does not improve peak detection. Fractionation of plasma samples introduces more experimental variability. The peaks detected using the unfractionated plasma samples have the highest reproducibility as determined by the reliability ratio. Availability: Our algorithm for assessment of SELDI-TOF experiment quality is available at http://www.biostat.harvard.edu/~xlin Contact: harezlak{at}post.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Thomas Lengauer
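
The use of pairwise Pearson correlation as a similarity measure between replicate spectra can be sketched as below. The outlier rule (median correlation with the other replicates below a cutoff), the 0.9 cutoff, the function names and the toy spectra are assumptions made for illustration; the MDS visualization and the reliability ratio from the abstract are not implemented here.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(spectra):
    """Symmetric matrix of pairwise correlations between spectra."""
    n = len(spectra)
    return [[pearson(spectra[i], spectra[j]) for j in range(n)] for i in range(n)]


def flag_outliers(spectra, threshold=0.9):
    """Return indices of spectra whose median correlation with the others is below threshold."""
    corr = correlation_matrix(spectra)
    flagged = []
    for i, row in enumerate(corr):
        others = [r for j, r in enumerate(row) if j != i]
        if statistics.median(others) < threshold:
            flagged.append(i)
    return flagged


# Hypothetical replicate spectra on a common m/z grid; the last one is deliberately unlike the rest.
replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.1, 2.0, 3.2, 3.9],
    [4.0, 3.0, 2.0, 1.0],
]
outliers = flag_outliers(replicates)
```

A matrix of `1 - r` values computed from `correlation_matrix` could serve as the dissimilarity input to an MDS routine; the median is used in the outlier rule so that a single aberrant replicate does not drag down the summary for the good ones.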

20.