Similar Articles
20 similar articles found.
1.

Background  

Mass spectrometry coupled to liquid chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large-scale studies. The data resulting from an LC-MS experiment are huge, highly complex and noisy. Accordingly, they have sparked new developments in bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exist only for peptide identification algorithms, but there is no data representing a ground truth for the evaluation of feature detection, alignment and filtering algorithms.

2.

Background  

Gas chromatography-mass spectrometry (GC-MS) is a robust platform for the profiling of certain classes of small molecules in biological samples. When multiple samples are profiled, including replicates of the same sample and/or different sample states, one needs to account for retention time drifts between experiments. This can be achieved either by the alignment of chromatographic profiles prior to peak detection, or by matching signal peaks after they have been extracted from chromatogram data matrices. Automated retention time correction is particularly important in non-targeted profiling studies.

3.

Background

Recent advances in liquid chromatography-mass spectrometry (LC-MS) technology have led to more effective approaches for measuring changes in peptide/protein abundances in biological samples. Label-free LC-MS methods have been used for extraction of quantitative information and for detection of differentially abundant peptides/proteins. However, difference detection by analysis of data derived from label-free LC-MS methods requires various preprocessing steps including filtering, baseline correction, peak detection, alignment, and normalization. Although several specialized tools have been developed to analyze LC-MS data, determining the most appropriate computational pipeline remains challenging partly due to lack of established gold standards.

Results

The work in this paper is an initial study that develops a simple model with "presence" or "absence" conditions using spike-in experiments and identifies these "true differences" using available software tools. In addition to the preprocessing pipelines, choosing appropriate statistical tests and determining critical values are important. We observe that individual statistical tests can lead to different results owing to their different assumptions and metrics. It is therefore preferable to incorporate several statistical tests, for either exploration or confirmation purposes.
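As an illustrative sketch only (not the authors' pipeline), the snippet below applies two common tests per peptide feature, Welch's t-test and the Mann-Whitney U test, to log-transformed intensities from two sample groups and reports features that both tests flag; the arrays and significance threshold are hypothetical, and SciPy is used purely for illustration.

```python
import numpy as np
from scipy import stats

def test_features(group_a, group_b, alpha=0.05):
    """Apply two tests per feature (rows) and report where both call a difference.

    group_a, group_b: 2-D arrays of log-intensities, shape (n_features, n_replicates).
    Returns a list of (feature_index, t_pvalue, u_pvalue).
    """
    flagged = []
    for i in range(group_a.shape[0]):
        a, b = group_a[i], group_b[i]
        # Welch's t-test: does not assume equal variances.
        _, p_t = stats.ttest_ind(a, b, equal_var=False)
        # Mann-Whitney U: rank-based, no normality assumption.
        _, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")
        if p_t < alpha and p_u < alpha:
            flagged.append((i, p_t, p_u))
    return flagged

# Toy example: 3 features, 4 replicates per group; feature 0 mimics a spike-in difference.
rng = np.random.default_rng(0)
a = rng.normal(10, 0.2, size=(3, 4))
b = rng.normal(10, 0.2, size=(3, 4))
b[0] += 2.0
print(test_features(a, b))
```

Requiring agreement between the tests is one simple way of using several statistical tests for confirmation; using either test alone would correspond to the exploratory setting mentioned above.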

Conclusions

The LC-MS data from our spike-in experiment can be used for developing and optimizing LC-MS data preprocessing algorithms and for evaluating workflows implemented in existing software tools. Our current work is a stepping stone towards optimizing LC-MS data acquisition and testing the accuracy and validity of computational tools for difference detection in future studies, which will focus on spiking peptides of diverse physicochemical properties at different concentrations to better represent biomarker discovery of differentially abundant peptides/proteins.

4.

Introduction

Liquid chromatography-mass spectrometry (LC-MS) is a commonly used technique in untargeted metabolomics owing to broad coverage of metabolites, high sensitivity and simple sample preparation. However, data generated from multiple batches are affected by measurement errors inherent to alterations in signal intensity, drift in mass accuracy and retention times between samples both within and between batches. These measurement errors reduce repeatability and reproducibility and may thus decrease the power to detect biological responses and obscure interpretation.

Objective

Our aim was to develop procedures to address and correct for within- and between-batch variability in processing multiple-batch untargeted LC-MS metabolomics data to increase their quality.

Methods

Algorithms were developed for: (i) alignment and merging of features that are systematically misaligned between batches, through aggregating feature presence/missingness on batch level and combining similar features orthogonally present between batches; and (ii) within-batch drift correction using a cluster-based approach that allows multiple drift patterns within batch. Furthermore, a heuristic criterion was developed for the feature-wise choice of reference-based or population-based between-batch normalisation.
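The following is a minimal sketch of the cluster-based within-batch drift-correction idea, not the batchCorr implementation: feature drift profiles observed in QC injections are clustered, a smooth drift curve is fitted per cluster, and sample intensities are divided by the fitted drift. The array shapes, cluster count, and polynomial smoother are hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def drift_correct(intensities, run_order, qc_mask, n_clusters=3, degree=2):
    """Cluster-based within-batch drift correction (illustrative only).

    intensities: (n_runs, n_features) matrix for one batch.
    run_order:   (n_runs,) injection order.
    qc_mask:     boolean (n_runs,), True for QC injections.
    """
    intensities = np.asarray(intensities, dtype=float)
    run_order = np.asarray(run_order, dtype=float)
    qc = intensities[qc_mask]
    qc_order = run_order[qc_mask]

    # Describe each feature by its QC drift profile, scaled to its own median.
    profiles = qc / np.median(qc, axis=0, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(profiles.T)

    corrected = intensities.copy()
    for k in range(n_clusters):
        cols = np.where(labels == k)[0]
        # One smooth drift curve per cluster, fitted to the mean QC profile.
        mean_profile = profiles[:, cols].mean(axis=1)
        coeffs = np.polyfit(qc_order, mean_profile, deg=degree)
        drift = np.clip(np.polyval(coeffs, run_order), 1e-6, None)
        corrected[:, cols] = intensities[:, cols] / drift[:, None]
    return corrected
```

Allowing several clusters, each with its own fitted curve, is what permits multiple drift patterns within a batch, as described above.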

Results

In authentic data, between-batch alignment resulted in picking 15 % more features and deconvoluting 15 % of features previously erroneously aligned. Within-batch correction provided a decrease in median quality control feature coefficient of variation from 20.5 to 15.1 %. Algorithms are open source and available as an R package (‘batchCorr’).

Conclusions

The developed procedures provide unbiased measures of improved data quality, with implications for improved data analysis. Although developed for LC-MS based metabolomics, these methods are generic and can be applied to other data suffering from similar limitations.

5.

Background

In proteomics studies, liquid chromatography coupled to mass spectrometry (LC-MS) has proven to be a powerful technology to investigate differential expression of proteins/peptides that are characterized by their peak intensities, mass-to-charge ratio (m/z), and retention time (RT). The variable complexity of peptide mixtures and occasional drifts lead to substantial variations in the m/z and RT dimensions. Thus, label-free differential protein expression studies by LC-MS technology require alignment with respect to both RT and m/z to ensure that the same proteins/peptides are compared across multiple runs.

Methods

In this study, we propose a new strategy to align LC-MALDI-TOF data by combining quality threshold cluster analysis and support vector regression. Our method performs alignment on the basis of measurements in three dimensions (RT, m/z, intensity).
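As a rough sketch of the regression part only (the quality threshold clustering used to obtain matched landmarks is omitted), support vector regression can be fitted to matched landmark pairs to learn the retention-time warp between two runs; the kernel settings and arrays below are hypothetical, and scikit-learn is used for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical matched landmark pairs: RT observed in a sample run vs. the reference run.
rt_sample = np.array([102.0, 250.5, 411.2, 598.7, 803.1])
rt_reference = np.array([100.0, 247.0, 405.9, 590.3, 792.8])

# Fit a smooth, possibly non-linear warp RT_sample -> RT_reference.
warp = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma="scale")
warp.fit(rt_sample.reshape(-1, 1), rt_reference)

# Apply the learned warp to all feature retention times of the sample run.
all_rts = np.array([150.0, 300.0, 700.0])
aligned_rts = warp.predict(all_rts.reshape(-1, 1))
print(aligned_rts)
```

A third dimension (intensity) could be appended to the input matrix in the same way, which is the aspect the comparison in the Results section focuses on.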

Results and conclusions

We demonstrate the suitability of our proposed method for alignment of LC-MALDI-TOF data through a previously published spike-in dataset and a new in-house generated spike-in dataset. A comparison of our method with other methods that utilize only RT and m/z dimensions reveals that the use of intensity measurements enhances alignment performance.

6.
MOTIVATION: Liquid chromatography coupled to mass spectrometry (LC-MS) and combined with tandem mass spectrometry (LC-MS/MS) have become prominent tools for the analysis of complex proteomic samples. An important step in a typical workflow is the combination of results from multiple LC-MS experiments to improve confidence in the obtained measurements or to compare results from different samples. To do so, a suitable mapping or alignment between the data sets needs to be estimated. The alignment has to correct for variations in mass and elution time which are present in all mass spectrometry experiments. RESULTS: We propose a novel algorithm to align LC-MS samples and to match corresponding ion species across samples. Our algorithm matches landmark signals between two data sets using a geometric technique based on pose clustering. Variations in mass and retention time are corrected by an affine dewarping function estimated from matched landmarks. We use the pairwise dewarping in an algorithm for aligning multiple samples. We show that our pose clustering approach is fast and reliable as compared to previous approaches. It is robust in the presence of noise and able to accurately align samples with only few common ion species. In addition, we can easily handle different kinds of LC-MS data and adapt our algorithm to new mass spectrometry technologies. AVAILABILITY: This algorithm is implemented as part of the OpenMS software library for shotgun proteomics and available under the Lesser GNU Public License (LGPL) at www.openms.de.
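A minimal sketch of the affine dewarping step only (the pose-clustering landmark matching itself is not reproduced here): once corresponding landmark signals have been matched between two runs, a scale and offset for retention time, and analogously for m/z, can be estimated by least squares. The landmark values below are hypothetical.

```python
import numpy as np

# Hypothetical matched landmark retention times: run B is dewarped onto run A.
rt_a = np.array([120.4, 305.8, 480.2, 655.9, 910.3])
rt_b = np.array([118.0, 301.1, 472.5, 645.0, 896.7])

# Affine dewarping rt_a ≈ slope * rt_b + offset, estimated by least squares.
slope, offset = np.polyfit(rt_b, rt_a, deg=1)

def dewarp(rt):
    """Map retention times from run B into run A's time scale."""
    return slope * rt + offset

print(dewarp(rt_b))
```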

7.

Background

Terminal restriction fragment length polymorphism (T-RFLP) analysis is a DNA-fingerprinting method that can be used for comparisons of the microbial community composition in a large number of samples. There is no consensus on how T-RFLP data should be treated and analyzed before comparisons between samples are made, and several different approaches have been proposed in the literature. The analysis of T-RFLP data can be cumbersome and time-consuming, and for large datasets manual data analysis is not feasible. The currently available tools for automated T-RFLP analysis, although valuable, offer little flexibility, and few, if any, options regarding what methods to use. To enable comparisons and combinations of different data treatment methods an analysis template and an extensive collection of macros for T-RFLP data analysis using Microsoft Excel were developed.

Results

The Tools for T-RFLP data analysis template provides procedures for the analysis of large T-RFLP datasets, including application of a noise baseline threshold and setting of the analysis range, normalization and alignment of replicate profiles, generation of consensus profiles, normalization and alignment of consensus profiles, and final analysis of the samples including calculation of association coefficients and diversity indices. The procedures are designed so that in all analysis steps, from the initial preparation of the data to the final comparison of the samples, various options are available. The parameters regarding analysis range, noise baseline, T-RF alignment and generation of consensus profiles are all set by the user, and several different methods are available for normalization of the T-RF profiles. In each step, the user can also choose to base the calculations on either peak height data or peak area data.
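A small illustrative sketch, unrelated to the Excel macros themselves, of two of the listed steps: applying a noise baseline threshold plus analysis range, then normalizing a T-RF profile to relative abundances. The threshold, range and peak data are hypothetical.

```python
import numpy as np

def clean_profile(fragment_sizes, peak_heights, noise_threshold=50.0,
                  analysis_range=(50, 900)):
    """Drop peaks below a noise baseline or outside the analysis range,
    then normalize the remaining peak heights to relative abundances."""
    sizes = np.asarray(fragment_sizes, dtype=float)
    heights = np.asarray(peak_heights, dtype=float)
    keep = (heights >= noise_threshold) & \
           (sizes >= analysis_range[0]) & (sizes <= analysis_range[1])
    sizes, heights = sizes[keep], heights[keep]
    relative = heights / heights.sum()  # relative abundance of each T-RF
    return sizes, relative

print(clean_profile([48, 73, 150, 432], [300, 20, 1500, 800]))
```

The same calculation could be based on peak areas instead of peak heights, mirroring the choice offered in each step of the template.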

Conclusions

The Tools for T-RFLP data analysis template enables an objective and flexible analysis of large T-RFLP datasets in a widely used spreadsheet application.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0361-7) contains supplementary material, which is available to authorized users.

8.

Background

Differences in sample collection, biomolecule extraction, and instrument variability introduce bias to data generated by liquid chromatography coupled with mass spectrometry (LC-MS). Normalization is used to address these issues. In this paper, we introduce a new normalization method using the Gaussian process regression model (GPRM) that utilizes information from individual scans within an extracted ion chromatogram (EIC) of a peak. The proposed method is particularly applicable for normalization based on the analysis order of LC-MS runs. Our method uses measurement variabilities estimated from LC-MS data acquired from quality control samples to correct for bias caused by instrument drift. A maximum likelihood approach is used to find the optimal parameters for the fitted GPRM. We review several normalization methods and compare their performance with GPRM.
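As an illustrative sketch of the general idea rather than the paper's GPRM (which works on individual scans within an EIC), a Gaussian process can be fitted to QC intensities as a function of analysis order, and the predicted drift then used to correct all runs of a single ion. The kernel choice and data layout below are hypothetical; scikit-learn's Gaussian process regressor fits the hyperparameters by maximizing the marginal likelihood.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gp_order_normalize(intensity, run_order, qc_mask):
    """Correct one ion's intensities for drift along the analysis order.

    intensity: (n_runs,) measured intensities of a single ion.
    run_order: (n_runs,) injection/analysis order.
    qc_mask:   boolean (n_runs,), True for QC runs.
    """
    intensity = np.asarray(intensity, dtype=float)
    run_order = np.asarray(run_order, dtype=float)
    x_qc = run_order[qc_mask].reshape(-1, 1)
    y_qc = intensity[qc_mask]

    # Smooth drift plus white noise; hyperparameters fitted by maximum likelihood.
    kernel = 1.0 * RBF(length_scale=10.0) + WhiteKernel(noise_level=1.0)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(x_qc, y_qc)

    drift = np.clip(gp.predict(run_order.reshape(-1, 1)), 1e-9, None)
    # Divide out the predicted drift and rescale to the QC median level.
    return intensity / drift * np.median(y_qc)
```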

Results

To evaluate the performance of different normalization methods, we consider LC-MS data from a study in which a metabolomic approach is utilized to discover biomarkers for liver cancer. The LC-MS data were acquired by analysis of sera from liver cancer patients and cirrhotic controls. In addition, LC-MS runs from a quality control (QC) sample are included to assess the run-to-run variability and to evaluate the ability of the various normalization methods to reduce this undesired variability. Also, ANOVA models are applied to the normalized LC-MS data to identify ions with intensity measurements that are significantly different between cases and controls.

Conclusions

One of the challenges in using label-free LC-MS for quantitation of biomolecules is systematic bias in measurements. Several normalization methods have been introduced to overcome this issue, but there is no universally applicable approach at the present time. Each data set should be carefully examined to determine the most appropriate normalization method. We review here several existing methods and introduce the GPRM for normalization of LC-MS data. Through our in-house data set, we show that the GPRM outperforms other normalization methods considered here, in terms of decreasing the variability of ion intensities among quality control runs.

9.

Background  

Jumping alignments have recently been proposed as a strategy to search a given multiple sequence alignment A against a database. Instead of comparing a database sequence S to the multiple alignment or profile as a whole, S is compared and aligned to individual sequences from A. Within this alignment, S can jump between different sequences from A, so different parts of S can be aligned to different sequences from the input multiple alignment. This approach is particularly useful for dealing with recombination events.

10.
High-throughput sequence alignment using Graphics Processing Units

Background  

The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies.

11.

Background  

Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been conducted to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences.

12.

Background  

The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes.

13.

Background  

The structural integrity of recombinant proteins is of critical importance to their application as clinical treatments. Recombinant growth hormone preparations have been examined by several methodologies. In this study recombinant human growth hormone (rhGH; Genotropin®), expressed in E. coli K12, was structurally analyzed by two-dimensional gel electrophoresis and MALDI-TOF-TOF, LC-MS and LC-MS/MS sequencing of the resolved peptides.

14.

Background  

Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We had previously developed a similarity score between sequences, called a local alignment kernel, that exhibits good performance for this task in combination with a support vector machine. The local alignment kernel depends on an amino acid substitution matrix. Since commonly used BLOSUM or PAM matrices for scoring amino acid matches have been optimized to be used in combination with the Smith-Waterman algorithm, the matrices optimal for the local alignment kernel can be different.

15.

Background  

The comparison of homologous sequences from different species is an essential approach to reconstruct the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of using multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis, and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot directly handle circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process, before they can use multiple sequence alignment tools.

16.

Background  

The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.

17.
A global thermodynamic analysis, normally used for pure cultures, has been performed for steady-state data sets from acidogenic mixed cultures. This analysis is a combination of two different thermodynamic approaches, based on tabulated standard Gibbs energies of formation, global stoichiometry and medium compositions. It takes into account the energy transfer efficiency together with an analysis of the Gibbs free energy dissipation for the different data sets. The objective is to describe these systems thermodynamically without any heat measurement. The results show that the energy transfer efficiency is influenced by environmental conditions: increasing the hydraulic retention time increases its value in all cases, and the pH effect on it is related to metabolic shifts and osmoregulation. Within the environmental conditions analyzed, the energy transfer efficiency ranges from 0.23, for a hydraulic retention time of 20 h and pH 4, to 0.42, for a hydraulic retention time of 8 h and a pH ranging from 7–8.5. The estimated values of the Gibbs energy dissipation are comparable to the standard Gibbs energies of dissipation reported in the literature. For the data sets analyzed, the dissipation ranges from –1210 kJ/molx, corresponding to a stirring velocity of 300 rpm, pH 6 and a hydraulic retention time of 6 h, to –20744 kJ/molx for pH 4 and a hydraulic retention time of 20 h. Overall, the combined approach based on standard Gibbs energies of formation and global stoichiometry used in this thermodynamic analysis allows for the estimation of Gibbs energy dissipation values from the extracellular medium compositions in acidogenic mixed cultures, and such estimated values are comparable to the standard Gibbs energy dissipation values reported in the literature. It is demonstrated that the energy transfer efficiency is affected by the environmental conditions, i.e., stirring velocity, hydraulic retention time and pH; however, a relationship linking this parameter to the environmental conditions was not found and will be the focus of further research.
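For reference, the kind of relation such an analysis rests on (a standard textbook expression, not reproduced from the paper) links tabulated standard Gibbs energies of formation and the measured medium composition to the Gibbs energy of the overall reaction:

```latex
\Delta G_r \;=\; \sum_i \nu_i \,\Delta G^{\circ}_{f,i} \;+\; R\,T \sum_i \nu_i \ln a_i ,
```

where the $\nu_i$ are the stoichiometric coefficients from the global stoichiometry and the $a_i$ are the activities (approximated by measured extracellular concentrations) of the compounds involved.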

18.

Background  

While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.

19.

Background  

Campylobacter is the most common cause of bacterial enteritis worldwide. Handling and eating of contaminated poultry meat has been considered one of the risk factors for human campylobacteriosis. Campylobacter contamination can occur at all stages of a poultry production cycle. The objective of this study was to determine the occurrence of Campylobacter during a complete turkey production cycle, which lasts for 1.5 years. For detection of Campylobacter, a conventional culture method was compared with a PCR method. Campylobacter isolates from different types of samples were identified to the species level by a multiplex PCR assay.

20.

Background  

The brine shrimp Artemia expresses four different types of haemoglobin subunits, namely C1, C2, T1 and T2. Two of these four subunits dimerize in different combinations to produce the three isoforms of the heterodimeric Artemia haemoglobin: HbI (C1 and C2), HbII (C1 and T2) and HbIII (T1 and T2). Previous biochemical, biophysical and computational analyses demonstrate that the T and C polymers are rings of nine concatenated globin domains, which are covalently joined by interdomain linkers. Two such rings stacked coaxially give the functional molecule. This research aimed to construct a quaternary structural model of Artemia HbII that shows the interpolymer interface and domain-domain alignment, using the MS3D (mass spectrometry for three dimensional analysis) approach. This involved introducing chemical crosslinks between the two polymers, cleaving with trypsin and analyzing the resulting products by mass spectrometry. This was followed by computational analysis of the mass spectrometry data using the program SearchXlinks to identify putatively crosslinked peptides.
