Found 20 similar documents (search took 15 ms)
1.
Ole Schulz-Trieglaff Nico Pfeifer Clemens Gröpl Oliver Kohlbacher Knut Reinert 《BMC bioinformatics》2008,9(1):423
Background
Mass spectrometry coupled to liquid chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large-scale studies. The data resulting from an LC-MS experiment are large, highly complex and noisy. Accordingly, they have sparked new developments in bioinformatics, especially in algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, the crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment it is difficult to compare the plethora of algorithms for these tasks: so far, curated benchmark data exist only for peptide identification algorithms, and no data set represents a ground truth for the evaluation of feature detection, alignment and filtering algorithms.
2.
Mark D Robinson David P De Souza Woon Wai Keen Eleanor C Saunders Malcolm J McConville Terence P Speed Vladimir A Likić 《BMC bioinformatics》2007,8(1):419
Background
Gas chromatography-mass spectrometry (GC-MS) is a robust platform for the profiling of certain classes of small molecules in biological samples. When multiple samples are profiled, including replicates of the same sample and/or different sample states, one needs to account for retention time drifts between experiments. This can be achieved either by the alignment of chromatographic profiles prior to peak detection, or by matching signal peaks after they have been extracted from chromatogram data matrices. Automated retention time correction is particularly important in non-targeted profiling studies.
3.
Xiangpeng Ren Chunyi Xue Qingming Kong Chengwen Zhang Yingzuo Bi Yongchang Cao 《Proteome science》2012,10(1):1-11
Background
Recent advances in liquid chromatography-mass spectrometry (LC-MS) technology have led to more effective approaches for measuring changes in peptide/protein abundances in biological samples. Label-free LC-MS methods have been used for extraction of quantitative information and for detection of differentially abundant peptides/proteins. However, difference detection by analysis of data derived from label-free LC-MS methods requires various preprocessing steps, including filtering, baseline correction, peak detection, alignment, and normalization. Although several specialized tools have been developed to analyze LC-MS data, determining the most appropriate computational pipeline remains challenging, partly due to the lack of established gold standards.
Results
The work in this paper is an initial study: we develop a simple model with "presence" or "absence" conditions using spike-in experiments and identify these "true differences" using available software tools. In addition to the preprocessing pipelines, choosing appropriate statistical tests and determining critical values are important. We observe that individual statistical tests can lead to different results owing to their different assumptions and metrics. It is therefore preferable to incorporate several statistical tests, for either exploration or confirmation purposes.
Conclusions
The LC-MS data from our spike-in experiment can be used for developing and optimizing LC-MS data preprocessing algorithms and for evaluating workflows implemented in existing software tools. Our current work is a stepping stone towards optimizing LC-MS data acquisition and testing the accuracy and validity of computational tools for difference detection. Future studies will focus on spiking peptides of diverse physicochemical properties at different concentrations, to better represent biomarker discovery of differentially abundant peptides/proteins.
4.
Carl Brunius Lin Shi Rikard Landberg 《Metabolomics : Official journal of the Metabolomic Society》2016,12(11):173
Introduction
Liquid chromatography-mass spectrometry (LC-MS) is a commonly used technique in untargeted metabolomics owing to its broad metabolite coverage, high sensitivity and simple sample preparation. However, data generated from multiple batches are affected by measurement errors arising from alterations in signal intensity and drift in mass accuracy and retention times, both within and between batches. These measurement errors reduce repeatability and reproducibility, and may thus decrease the power to detect biological responses and obscure interpretation.
Objective
Our aim was to develop procedures to address and correct for within- and between-batch variability in processing multiple-batch untargeted LC-MS metabolomics data, to increase their quality.
Methods
Algorithms were developed for: (i) alignment and merging of features that are systematically misaligned between batches, by aggregating feature presence/missingness at batch level and combining similar features orthogonally present between batches; and (ii) within-batch drift correction using a cluster-based approach that allows multiple drift patterns within a batch. Furthermore, a heuristic criterion was developed for the feature-wise choice of reference-based or population-based between-batch normalisation.
Results
In authentic data, between-batch alignment resulted in picking 15 % more features and deconvoluting 15 % of features previously erroneously aligned. Within-batch correction decreased the median quality-control feature coefficient of variation from 20.5 to 15.1 %. The algorithms are open source and available as an R package ('batchCorr').
Conclusions
The developed procedures provide unbiased measures of improved data quality, with implications for improved data analysis. Although developed for LC-MS-based metabolomics, these methods are generic and can be applied to other data suffering from similar limitations.
5.
Background
In proteomics studies, liquid chromatography coupled to mass spectrometry (LC-MS) has proven to be a powerful technology for investigating differential expression of proteins/peptides, which are characterized by their peak intensities, mass-to-charge ratio (m/z), and retention time (RT). The variable complexity of peptide mixtures and occasional drifts lead to substantial variations in the m/z and RT dimensions. Thus, label-free differential protein expression studies by LC-MS require alignment with respect to both RT and m/z to ensure that the same proteins/peptides are compared across multiple runs.
Methods
In this study, we propose a new strategy to align LC-MALDI-TOF data by combining quality threshold cluster analysis and support vector regression. Our method performs alignment on the basis of measurements in three dimensions (RT, m/z, intensity).
Results and conclusions
We demonstrate the suitability of our proposed method for alignment of LC-MALDI-TOF data on a previously published spike-in dataset and a new in-house spike-in dataset. A comparison of our method with methods that utilize only the RT and m/z dimensions reveals that the use of intensity measurements enhances alignment performance.
6.
Lange E Gröpl C Schulz-Trieglaff O Leinenbach A Huber C Reinert K 《Bioinformatics (Oxford, England)》2007,23(13):i273-i281
MOTIVATION: Liquid chromatography coupled to mass spectrometry (LC-MS), and its combination with tandem mass spectrometry (LC-MS/MS), have become prominent tools for the analysis of complex proteomic samples. An important step in a typical workflow is the combination of results from multiple LC-MS experiments to improve confidence in the obtained measurements or to compare results from different samples. To do so, a suitable mapping or alignment between the data sets needs to be estimated. The alignment has to correct for variations in mass and elution time, which are present in all mass spectrometry experiments. RESULTS: We propose a novel algorithm to align LC-MS samples and to match corresponding ion species across samples. Our algorithm matches landmark signals between two data sets using a geometric technique based on pose clustering. Variations in mass and retention time are corrected by an affine dewarping function estimated from the matched landmarks. We use the pairwise dewarping in an algorithm for aligning multiple samples. We show that our pose clustering approach is fast and reliable compared to previous approaches. It is robust in the presence of noise and able to accurately align samples with only a few common ion species. In addition, we can easily handle different kinds of LC-MS data and adapt our algorithm to new mass spectrometry technologies. AVAILABILITY: The algorithm is implemented as part of the OpenMS software library for shotgun proteomics and is available under the GNU Lesser General Public License (LGPL) at www.openms.de.
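The dewarping step described in this abstract, estimating an affine map rt' = a·rt + b from landmark pairs matched between two runs, can be sketched in a few lines. This is only an illustrative least-squares fit under the assumption that matched landmarks are already given (in the paper they come from pose clustering); the function name and the example retention times are hypothetical, and this is not the OpenMS implementation.

```python
# Sketch of affine retention-time dewarping from matched landmark pairs.
# Assumption: landmark matching has already been done; here we only fit
# rt_ref ~= a * rt_obs + b by ordinary least squares and apply the map.

def fit_affine_dewarp(rt_ref, rt_obs):
    """Closed-form least-squares fit of the affine map rt_ref ~= a*rt_obs + b."""
    n = len(rt_obs)
    mean_x = sum(rt_obs) / n
    mean_y = sum(rt_ref) / n
    sxx = sum((x - mean_x) ** 2 for x in rt_obs)            # variance term
    sxy = sum((x - mean_x) * (y - mean_y)                   # covariance term
              for x, y in zip(rt_obs, rt_ref))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Matched landmark retention times (seconds): reference run vs. a run
# with a systematic 5 % compression of the time axis.
rt_ref = [100.0, 200.0, 300.0, 400.0]
rt_obs = [95.0, 190.0, 285.0, 380.0]

a, b = fit_affine_dewarp(rt_ref, rt_obs)
corrected = [a * t + b for t in rt_obs]   # dewarped times, close to rt_ref
```

With exact affine distortion, as in this toy example, the fit recovers the reference times; with real, noisy landmarks the fit is the least-squares compromise across all pairs.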
7.
Background
Terminal restriction fragment length polymorphism (T-RFLP) analysis is a DNA-fingerprinting method that can be used to compare the microbial community composition of a large number of samples. There is no consensus on how T-RFLP data should be treated and analyzed before comparisons between samples are made, and several different approaches have been proposed in the literature. The analysis of T-RFLP data can be cumbersome and time-consuming, and for large datasets manual data analysis is not feasible. The currently available tools for automated T-RFLP analysis, although valuable, offer little flexibility and few, if any, options regarding which methods to use. To enable comparisons and combinations of different data treatment methods, an analysis template and an extensive collection of macros for T-RFLP data analysis in Microsoft Excel were developed.
Results
The Tools for T-RFLP data analysis template provides procedures for the analysis of large T-RFLP datasets, including application of a noise baseline threshold and setting of the analysis range, normalization and alignment of replicate profiles, generation of consensus profiles, normalization and alignment of consensus profiles, and final analysis of the samples, including calculation of association coefficients and diversity indices. The procedures are designed so that in all analysis steps, from the initial preparation of the data to the final comparison of the samples, several options are available. The parameters regarding analysis range, noise baseline, T-RF alignment and generation of consensus profiles are all set by the user, and several different methods are available for normalization of the T-RF profiles. In each step, the user can also choose to base the calculations on either peak height or peak area data.
Conclusions
The Tools for T-RFLP data analysis template enables an objective and flexible analysis of large T-RFLP datasets in a widely used spreadsheet application.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0361-7) contains supplementary material, which is available to authorized users.
8.
Background
Differences in sample collection, biomolecule extraction, and instrument variability introduce bias into data generated by liquid chromatography coupled with mass spectrometry (LC-MS). Normalization is used to address these issues. In this paper, we introduce a new normalization method using a Gaussian process regression model (GPRM) that utilizes information from individual scans within an extracted ion chromatogram (EIC) of a peak. The proposed method is particularly applicable for normalization based on the analysis order of LC-MS runs. Our method uses measurement variabilities estimated from LC-MS data acquired from quality control samples to correct for bias caused by instrument drift. A maximum likelihood approach is used to find the optimal parameters of the fitted GPRM. We review several normalization methods and compare their performance with the GPRM.
Results
To evaluate the performance of the different normalization methods, we consider LC-MS data from a study in which a metabolomic approach is used to discover biomarkers for liver cancer. The LC-MS data were acquired by analysis of sera from liver cancer patients and cirrhotic controls. In addition, LC-MS runs from a quality control (QC) sample are included to assess run-to-run variability and to evaluate the ability of the various normalization methods to reduce this undesired variability. Also, ANOVA models are applied to the normalized LC-MS data to identify ions with intensity measurements that differ significantly between cases and controls.
Conclusions
One of the challenges in using label-free LC-MS for quantitation of biomolecules is systematic bias in measurements. Several normalization methods have been introduced to overcome this issue, but there is no universally applicable approach at the present time. Each data set should be carefully examined to determine the most appropriate normalization method. We review here several existing methods and introduce the GPRM for normalization of LC-MS data. On our in-house data set, we show that the GPRM outperforms the other normalization methods considered here in terms of decreasing the variability of ion intensities among quality control runs.
9.
A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes
Anne-Kathrin Schultz Ming Zhang Thomas Leitner Carla Kuiken Bette Korber Burkhard Morgenstern Mario Stanke 《BMC bioinformatics》2006,7(1):265-15
Background
Jumping alignments have recently been proposed as a strategy to search a given multiple sequence alignment A against a database. Instead of comparing a database sequence S to the multiple alignment or profile as a whole, S is compared and aligned to individual sequences from A. Within this alignment, S can jump between different sequences from A, so different parts of S can be aligned to different sequences from the input multiple alignment. This approach is particularly useful for dealing with recombination events.
10.
Background
The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for the analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequencing technologies.
11.
Background
Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is to be predicted. Intensive research has been devoted to improving the quality of multiple alignments, since misaligned parts of a multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts, owing to the large number of mutations that separate the sequences.
12.
Zu-Guo Yu Ka Hou Chu Chi Pang Li Vo Anh Li-Qian Zhou Roger Wei Wang 《BMC evolutionary biology》2010,10(1):192
Background
The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes.
13.
Background
The structural integrity of recombinant proteins is of critical importance to their application as clinical treatments. Recombinant growth hormone preparations have been examined by several methodologies. In this study, recombinant human growth hormone (rhGH; Genotropin®), expressed in E. coli K12, was structurally analyzed by two-dimensional gel electrophoresis and by MALDI-TOF-TOF, LC-MS and LC-MS/MS sequencing of the resolved peptides.
14.
Background
Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We had previously developed a similarity score between sequences, called a local alignment kernel, that exhibits good performance for this task in combination with a support vector machine. The local alignment kernel depends on an amino acid substitution matrix. Since commonly used BLOSUM or PAM matrices for scoring amino acid matches have been optimized to be used in combination with the Smith-Waterman algorithm, the matrices optimal for the local alignment kernel can be different.
15.
Background
The comparison of homologous sequences from different species is an essential approach to reconstructing the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis, and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot directly handle circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place in an essentially manual process before they can use multiple sequence alignment tools.
16.
Gordon Blackshields Fabian Sievers Weifeng Shi Andreas Wilm Desmond G Higgins 《Algorithms for molecular biology : AMB》2010,5(1):21
Background
The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences, which requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.
17.
J.‐R. Bastidas‐Oyanedel C.‐A. Aceves‐Lara G. Ruiz‐Filippi J.‐P. Steyer 《Engineering in Life Science》2008,8(5):487-498
A global thermodynamic analysis, normally used for pure cultures, has been performed on steady-state data sets from acidogenic mixed cultures. This analysis combines two different thermodynamic approaches, based on tabulated standard Gibbs energies of formation, global stoichiometry and medium compositions. It takes into account the energy transfer efficiency, η, together with an analysis of the Gibbs free energy dissipation, ΔGo, for the different data sets. The objective is to describe these systems thermodynamically without any heat measurement. The results show that η is influenced by environmental conditions: increasing the hydraulic retention time increases its value in all cases. The pH effect on η is related to metabolic shifts and osmoregulation. Within the environmental conditions analyzed, η ranges from 0.23 for a hydraulic retention time of 20 h and pH 4, to 0.42 for a hydraulic retention time of 8 h and a pH between 7 and 8.5. The estimated values of ΔGo are comparable to the standard Gibbs energies of dissipation reported in the literature. For the data sets analyzed, ΔGo ranges from –1210 kJ/molx, corresponding to a stirring velocity of 300 rpm, pH 6 and a hydraulic retention time of 6 h, to –20744 kJ/molx for pH 4 and a hydraulic retention time of 20 h. In summary, the combined approach based on standard Gibbs energies of formation and global stoichiometry allows the estimation of Gibbs energy dissipation values from the extracellular medium compositions of acidogenic mixed cultures, and such estimates are comparable to the standard Gibbs energy dissipation values reported in the literature. It is demonstrated that η is affected by the environmental conditions, i.e. stirring velocity, hydraulic retention time and pH. However, a relationship linking this parameter to the environmental conditions was not found; this will be the focus of further research.
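The first step of an analysis like the one above, computing a reaction's standard Gibbs energy from tabulated standard Gibbs energies of formation and the global stoichiometry, can be sketched as follows. The formation values below are rounded, illustrative numbers for an ethanol fermentation (not the data set or reactions of this study), and the function name is hypothetical.

```python
# Sketch: standard Gibbs energy of a reaction from tabulated standard
# Gibbs energies of formation and a global stoichiometry.
# Convention: products have positive stoichiometric coefficients,
# substrates negative.

def reaction_gibbs_energy(stoichiometry, formation_energies):
    """Sum of nu_i * dGf_i over all species (units follow dGf, here kJ/mol)."""
    return sum(nu * formation_energies[species]
               for species, nu in stoichiometry.items())

# Rounded standard Gibbs energies of formation (kJ/mol, aqueous species);
# illustrative values only, not the tabulated data used in the paper.
dgf = {
    "glucose": -917.2,
    "ethanol": -181.8,
    "CO2":     -394.4,
}

# Glucose fermented to ethanol: glucose -> 2 ethanol + 2 CO2
stoich = {"glucose": -1, "ethanol": 2, "CO2": 2}

dG = reaction_gibbs_energy(stoich, dgf)   # strongly negative: exergonic
```

With these rounded inputs the result is about −235 kJ/mol of glucose, in line with the commonly quoted value for this fermentation; the same bookkeeping, applied to the measured global stoichiometry of a mixed culture, yields the dissipation-type quantities discussed in the abstract.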
18.
Background
While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.
19.
Päivikki Perko-Mäkelä Pauliina Isohanni Marianne Katzav Marianne Lund Marja-Liisa Hänninen Ulrike Lyhs 《Acta veterinaria Scandinavica》2009,51(1):18
Background
Campylobacter is the most common cause of bacterial enteritis worldwide. Handling and eating of contaminated poultry meat is considered one of the risk factors for human campylobacteriosis, and Campylobacter contamination can occur at all stages of a poultry production cycle. The objective of this study was to determine the occurrence of Campylobacter during a complete turkey production cycle, which lasts about 1.5 years. For detection of Campylobacter, a conventional culture method was compared with a PCR method, and Campylobacter isolates from different types of samples were identified to the species level by a multiplex PCR assay.
20.