1.

Background

Differences in sample collection, biomolecule extraction, and instrument variability introduce bias into data generated by liquid chromatography coupled with mass spectrometry (LC-MS). Normalization is used to address these issues. In this paper, we introduce a new normalization method using the Gaussian process regression model (GPRM) that utilizes information from individual scans within an extracted ion chromatogram (EIC) of a peak. The proposed method is particularly applicable for normalization based on the analysis order of LC-MS runs. Our method uses measurement variabilities estimated from LC-MS data acquired on quality control samples to correct for bias caused by instrument drift. A maximum likelihood approach is used to find the optimal parameters of the fitted GPRM. We review several normalization methods and compare their performance with that of the GPRM.
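A minimal sketch of run-order drift correction with a Gaussian process, in the spirit of the GPRM described above; the kernel choice, variable names, and intensity values are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# QC injections interleaved with study samples, indexed by analysis order
qc_order = np.array([1, 6, 11, 16, 21], dtype=float).reshape(-1, 1)
qc_intensity = np.log2([1.00e6, 1.08e6, 0.95e6, 0.90e6, 0.85e6])

# WhiteKernel lets maximum-likelihood fitting estimate measurement noise,
# analogous to estimating variability from the QC runs
kernel = RBF(length_scale=10.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(qc_order, qc_intensity)

# Predict the drift at every injection position and subtract its
# mean-centered value from each sample's raw log-intensity
sample_order = np.arange(1, 22, dtype=float).reshape(-1, 1)
drift = gp.predict(sample_order)
# corrected = raw_log_intensity - (drift - drift.mean())
```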

Results

To evaluate the performance of different normalization methods, we consider LC-MS data from a study in which a metabolomic approach is utilized to discover biomarkers for liver cancer. The LC-MS data were acquired by analysis of sera from liver cancer patients and cirrhotic controls. In addition, LC-MS runs from a quality control (QC) sample are included to assess the run-to-run variability and to evaluate the ability of the various normalization methods to reduce this undesired variability. Also, ANOVA models are applied to the normalized LC-MS data to identify ions whose intensity measurements differ significantly between cases and controls.

Conclusions

One of the challenges in using label-free LC-MS for quantitation of biomolecules is systematic bias in the measurements. Several normalization methods have been introduced to overcome this issue, but there is no universally applicable approach at the present time. Each data set should therefore be carefully examined to determine the most appropriate normalization method. We review here several existing methods and introduce the GPRM for normalization of LC-MS data. Through our in-house data set, we show that the GPRM outperforms the other normalization methods considered here in terms of decreasing the variability of ion intensities among quality control runs.

2.
The identification of peptides and proteins from fragmentation mass spectra is a very common approach in the field of proteomics. Contemporary high-throughput peptide identification pipelines can quickly produce large quantities of MS/MS data that contain valuable knowledge about the actual physicochemical processes involved in peptide fragmentation, which can be extracted through extensive data mining studies. As these studies attempt to exploit the intensity information contained in the MS/MS spectra, a critical step required for a meaningful comparison of this information between MS/MS spectra is peak intensity normalization. Here we describe a procedure for quantifying the efficiency of different published normalization methods in terms of the quartile coefficient of dispersion (qcod) statistic. The qcod is applied to measure the dispersion of peak intensities between redundant MS/MS spectra, allowing quantification of the differences in computed peak intensity reproducibility between the normalization methods. We demonstrate that our results are independent of the data set used in the evaluation procedure, allowing us to provide generic guidance on the choice of normalization method to apply in a given MS/MS pipeline application.
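As a minimal sketch (not the authors' code), the qcod statistic used above is the interquartile range divided by the sum of the quartiles; the input intensities below are illustrative:

```python
import numpy as np

def qcod(values):
    """(Q3 - Q1) / (Q3 + Q1): a robust, scale-free dispersion measure."""
    q1, q3 = np.percentile(values, [25, 75])
    return (q3 - q1) / (q3 + q1)

# Dispersion of one peak's normalized intensity across redundant MS/MS spectra
print(qcod([0.42, 0.45, 0.40, 0.47, 0.44]))
```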

3.
MOTIVATION: Microarray experiments are affected by numerous sources of non-biological variation that contribute systematic bias to the resulting data. In a dual-label (two-color) cDNA or long-oligonucleotide microarray, these systematic biases are often manifested as an imbalance of the measured fluorescent intensities corresponding to Sample A versus those corresponding to Sample B. Systematic biases also affect between-slide comparisons. Making effective corrections for these systematic biases is a prerequisite for detecting the underlying biological variation between samples. Effective data normalization is therefore an essential step in the confident identification of biologically relevant differences in gene expression profiles. Several normalization methods for the correction of systematic bias have been described. While many of these methods have addressed intensity-dependent bias, few have addressed both intensity-dependent and spatially dependent bias. RESULTS: We present a neural network-based normalization method for correcting the intensity-dependent and spatially dependent bias in cDNA microarray datasets. In this normalization method, the dependence of the log-intensity ratio (M) on the average log-intensity (A), as well as on the spatial coordinates (X, Y) of the spots, is approximated with a feed-forward neural network function. Resistance to outliers is provided by assigning weights to each spot based on how distant its M value is from the median over the spots whose A values are similar, as well as by using pseudospatial coordinates instead of spot row and column indices. A comparison of the robust neural network method with other published methods demonstrates its potential in reducing both intensity-dependent and spatially dependent bias, which translates to more reliable identification of truly regulated genes.
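A minimal sketch of the M-versus-(A, X, Y) idea above, using a small feed-forward network; the published method's outlier weighting and pseudospatial coordinates are omitted here, and all arrays are simulated for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 500
A = rng.uniform(6, 14, n)            # average log-intensity per spot
X, Y = rng.uniform(0, 1, (2, n))     # spatial coordinates on the slide
M = 0.2 * np.sin(A) + 0.3 * X + rng.normal(0, 0.1, n)  # log-ratio with bias

# Approximate the bias surface M ~ f(A, X, Y) with a feed-forward network
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
net.fit(np.column_stack([A, X, Y]), M)

# Normalized log-ratio: observed M minus the fitted bias surface
M_norm = M - net.predict(np.column_stack([A, X, Y]))
```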

4.
There is an increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar biological samples, e.g. for the analysis of cellular responses to perturbations over time or for the discovery of protein biomarkers from clinical samples. Technical limitations of current proteomic platforms, such as limited reproducibility and low throughput, make this a challenging task. A new LC-MS-based platform is able to generate complex peptide patterns from the analysis of proteolyzed protein samples at high throughput and represents a promising approach for quantitative proteomics. A crucial component of the LC-MS approach is the accurate evaluation of the abundance of detected peptides over many samples and the identification of peptide features that can stratify samples with respect to their genetic, physiological, or environmental origins. We present here a new software suite, SpecArray, that generates a peptide-versus-sample array from a set of LC-MS data. A peptide array stores the relative abundance of thousands of peptide features in many samples and is in a format identical to that of a gene expression microarray. A peptide array can be subjected to unsupervised clustering analysis to stratify samples, or to discriminant analysis to identify discriminatory peptide features. We applied SpecArray to analyze two sets of LC-MS data: one from four repeat LC-MS analyses of the same glycopeptide sample, and the other from LC-MS analysis of serum samples from five male and five female mice. We demonstrate through these two study cases that SpecArray can serve as an effective software platform in the LC-MS approach for quantitative proteomics.

5.
Quantitative proteomics approaches using stable isotopes are well known and used in many labs nowadays. More recently, high-resolution quantitative approaches have been reported that rely on LC-MS quantitation of peptide concentrations by comparing peak intensities between multiple runs obtained by continuous detection in MS mode. Characteristic of these comparative LC-MS procedures is that they do not rely on the use of stable isotopes; the procedure is therefore often referred to as label-free LC-MS. In order to compare peak intensity data across multiple LC-MS datasets at a comprehensive scale, dedicated software is required for the detection, matching, and alignment of peaks. The high accuracy in the quantitative determination of peptide abundance provides an impressive level of detail. The approach also requires an experimental set-up in which the quantitative aspects of protein extraction and reproducible separation conditions are well controlled. In this paper we provide insight into the critical parameters that affect the quality of the results and give an overview of the most recent software packages available for this procedure.

6.
A probability-based quantification framework is presented for the calculation of relative peptide and protein abundance in label-free and label-dependent LC-MS proteomics data. The results are accompanied by credible intervals and regulation probabilities. The algorithm takes into account data uncertainties via Poisson statistics modified by a noise contribution that is determined automatically during an initial normalization stage. Protein quantification relies on assignments of component peptides to the acquired data. These assignments are generally of variable reliability and may not be present across all of the experiments comprising an analysis. It is also possible for a peptide to be matched to more than one protein in a given mixture. For these reasons the algorithm accepts a prior probability of peptide assignment for each intensity measurement. The model is constructed in such a way that outliers of any type can be automatically reweighted. Two discrete normalization methods can be employed. The first is based on a user-defined subset of peptides, while the second relies on the presence of a dominant background of endogenous peptides whose concentrations are assumed to be unaffected. Normalization is performed using the same computational and statistical procedures employed by the main quantification algorithm. The performance of the algorithm is illustrated on example data sets, and its utility demonstrated for typical proteomics applications. The quantification algorithm supports relative protein quantification based on precursor and product ion intensities acquired by means of data-dependent methods, originating from all common isotopic labeling approaches, as well as from label-free, ion intensity-based, data-independent methods.

7.
The Affymetrix high-density oligonucleotide array is a tool that can simultaneously measure the abundance of thousands of mRNA sequences in biological samples. In order to allow direct array-to-array comparisons, normalization is a necessity. When deciding on an appropriate normalization procedure, a couple of questions need to be addressed, e.g., on which level should the normalization be performed: on the level of feature intensities or on the level of expression indexes? Should all features/expression indexes be used, or can we choose a subset of features likely to be unregulated? Another question is how to actually perform the normalization: normalize using the overall mean intensity, or use a smooth normalization curve? Most of the currently used normalization methods are linear; e.g., the normalization method implemented in the Affymetrix GeneChip software is based on the overall mean intensity. However, along with alternative methods of summarizing feature intensities into an expression index, nonlinear methods have recently started to appear. For many of these alternative methods, the natural choice is to normalize on the level of feature intensities, either using all feature intensities or only perfect-match intensities. In this report, a nonlinear normalization procedure aimed at normalizing feature intensities is proposed.
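A minimal sketch of fitting a smooth normalization curve on feature intensities, with a lowess smoother standing in for the paper's specific nonlinear procedure; the simulated baseline array and bias are illustrative assumptions:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
baseline = rng.gamma(2.0, 500.0, 2000)      # reference array feature intensities
# Apply a nonlinear, intensity-dependent bias plus multiplicative noise
array = (baseline * (1.2 + 0.05 * np.log(baseline))
         * rng.normal(1.0, 0.05, baseline.size))

# Fit log(array) as a smooth function of log(baseline), then invert the fit
fit = lowess(np.log(array), np.log(baseline), frac=0.3, return_sorted=False)
normalized = np.exp(np.log(array) - fit + np.log(baseline))
```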

8.
One-dimensional 1H nuclear magnetic resonance (1D 1H-NMR) has been used extensively as a metabolic profiling tool for investigating urine and other biological fluids. Under ideal conditions, 1H-NMR peak intensities are directly proportional to metabolite concentrations and thus are useful for class prediction and biomarker discovery. However, many biological, experimental and instrumental variables can affect absolute NMR peak intensities. Normalizing or scaling data to minimize the influence of these variables is a critical step in producing robust, reproducible analyses. Traditionally, analyses of biological fluids have relied on the total spectral area [constant sum (CS)] to normalize individual intensities. This approach can introduce considerable inter-sample variance as changes in any individual metabolite will affect the scaling of all of the observed intensities. To reduce normalization-related variance, we have developed a histogram matching (HM) approach adapted from the field of image processing. We validate our approach using mixtures of synthetic compounds that mimic a biological extract and apply the method to an analysis of urine from rats treated with ethionine. We show that HM is a robust method for normalizing 1H-NMR data and propose it as an alternative to the traditional CS method.
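A minimal sketch contrasting constant-sum normalization with a simple histogram-matching step implemented as rank-based quantile mapping; the published HM method adapted from image processing is more elaborate, and the intensity vectors here are simulated:

```python
import numpy as np

def constant_sum(spectrum):
    """Scale so all intensities sum to 1 (the traditional CS approach)."""
    return spectrum / spectrum.sum()

def histogram_match(spectrum, reference):
    """Map the spectrum's intensity histogram onto the reference's."""
    ranks = np.argsort(np.argsort(spectrum))   # rank of each data point
    return np.sort(reference)[ranks]           # reference value of same rank

rng = np.random.default_rng(2)
reference = rng.lognormal(0, 1, 1000)          # illustrative NMR intensity vector
sample = reference * 1.7 + rng.normal(0, 0.05, 1000)  # scaled, noisy replicate
matched = histogram_match(sample, reference)
```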

9.
Normalization is an important step in the analysis of quantitative proteomics data. If this step is ignored, systematic biases can lead to incorrect assumptions about regulation. Most statistical procedures for normalizing proteomics data have been borrowed from genomics, where their development has focused on the removal of so-called 'batch effects.' In general, a typical normalization step in proteomics works under the assumption that most peptides/proteins do not change; scaling is then used to give a median log-ratio of 0. The focus of this work was to identify other factors, derived from knowledge of the variables in proteomics, that might be used to improve normalization. Here we have examined the multi-laboratory data sets from Phase I of the NCI's CPTAC program. Surprisingly, the most important bias variables affecting peptide intensities within labs were retention time and charge state. The magnitude of these observations was exaggerated in samples of unequal concentrations or "spike-in" levels, presumably because the average precursor charge for peptides with higher charge-state potentials is lower at higher relative sample concentrations. These effects are consistent with reduced protonation during electrospray and demonstrate that the physical properties of the peptides themselves can serve as good reporters of systematic biases. Between labs, retention time, precursor m/z, and peptide length were most commonly the top-ranked bias variables, over the standardly used average intensity (A). A larger set of variables was then used to develop a stepwise normalization procedure. This statistical model was found to perform as well as or better than other commonly used methods on the CPTAC mock biomarker data. Furthermore, the method described here does not require a priori knowledge of the systematic biases in a given data set. These improvements can be attributed to the inclusion of variables other than average intensity during normalization.

The number of laboratories using MS as a quantitative tool for protein profiling continues to grow, propelling the field forward past simple qualitative measurements (i.e. cataloging), with the aim of establishing itself as a robust method for detecting proteomic differences. By analogy, semiquantitative proteomic profiling by MS can be compared with measurement of relative gene expression by genomics technologies such as microarrays or, newer, RNA-seq measurements. While proteomics is disadvantaged by the lack of a molecular amplification system for proteins, successful reports from discovery experiments are numerous in the literature and are increasing with advances in instrument resolution and sensitivity.

In general, methods for performing relative quantitation can be broadly divided into two categories: those employing labels (e.g. iTRAQ, TMT, and SILAC (1)) and so-called "label-free" techniques. Labeling methods involve adding some form of isobaric or isotopic label(s) to the proteins or peptides prior to liquid chromatography-tandem MS (LC-MS/MS) analysis. Chemical labels are typically applied during sample processing, and isotopic labels are commonly added during cell culture (i.e. metabolic labeling). One advantage of label-based methods is that the two (or more) differently labeled samples can be mixed and run in single LC-MS analyses. This is in contrast to label-free methods, which require the samples to be run independently and the data to be aligned post-acquisition.

Many labs employ label-free methods because they are applicable to a wider range of samples and require fewer sample processing steps. Moreover, data from qualitative experiments can sometimes be re-analyzed using label-free software tools to provide semiquantitative data. Advances in these software tools have been extensively reviewed (2). While analysis of label-based data primarily uses full MS scan (MS1) or tandem MS scan (MS2) ion current measurements, analysis of label-free data can employ simple counts of confidently identified tandem mass spectra (3). So-called spectral counting makes the assumption that the number of times a peptide is identified is proportional to its concentration. These values are sometimes summed across all peptides for a given protein and scaled by protein length. Relative abundance can then be calculated for any peptide or protein of interest. While this approach may be easy to perform, its usefulness is particularly limited in smaller data sets and/or when counts are low.

This report focuses only on the use of ion current measurements in label-free data sets, specifically those calculated from extracted MS1 ion chromatograms (XICs). In general terms, raw intensity values (i.e. ion counts in arbitrary units) cannot be used for quantitation in the absence of cognate internal standards, because individual ion intensities depend on a response factor related to the chemical properties of the molecule. Intensities are instead almost always reserved for relative determinations. Furthermore, retention times are sometimes used to align the chromatograms between runs to ensure higher confidence prior to calculating relative intensities. This step is crucial for methods without corresponding identity information, particularly for experiments performed on low-resolution instruments. To support a label-free workflow, peptide identifications are commonly made from tandem mass spectra (MS/MS) acquired along with the direct electrospray signal (MS1). Alternatively, in workflows seeking deeper coverage, interesting MS1 components can be targeted for identification by MS/MS in follow-up runs (4).

"Rolling up" the peptide ion information to the peptide and protein level is also done in different ways in different labs. In most cases, "peptide intensity" or "peptide abundance" is the summed or averaged value of the identified peptide ions. How the peptide information is transferred to the protein level differs between methods but typically involves summing one or more peptide intensities following parsimony analysis. One such solution is the "Top 3" method developed by Silva and co-workers (5).

Because peptides in label-free methods lack labeled analogs and require separate runs, they are more susceptible to analytical noise and systematic variations. These obscuring variations can arise from many sources, including sample preparation, operator error, chromatography, electrospray, and even the data analysis itself. While analytical noise (e.g. chemical interference) is difficult to selectively reject, systematic biases can often be removed by statistical preprocessing. The goal of these procedures is to normalize the data prior to calculations of relative abundance. Failure to resolve these issues is the common origin of batch effects, previously described for genomics data, which can severely limit meaningful interpretation of experimental data (6, 7). These effects have also recently been explored in proteomics data (8).

Methods used to normalize proteomics data have been largely borrowed from the microarray community or are based on a simple mean/median intensity-ratio correction. Methods applied to microarray and/or gene chip data and used on proteomics data include scaling, linear regression, nonlinear regression, and quantile normalization (9). Moreover, work has also been done to improve normalization by subselecting a peptide basis (10). Other work suggests that linear regression, followed by run-order analysis, works better than the other methods tested (11). Key to this last method is the incorporation of a variable other than intensity during normalization. It is also important to note that little work has been done towards identifying the underlying sources of these variations in proteomics data. Although cause and effect are often difficult to determine, understanding these relationships will undoubtedly help remove and avoid the major underlying sources of systematic variation.

In this report, we have attempted to combine our efforts focused on understanding variability with the work initiated by others for normalizing ion current-based label-free proteomics data. We have identified several major variables commonly affecting peptide ion intensities both within and between labs. As test data, we used a subset of raw data acquired during Phase I of the National Cancer Institute's (NCI) Clinical Proteomics Technology Assessment for Cancer (CPTAC) program. With these data, we were able to develop a statistical model to rank bias variables and normalize the intensities using stepwise, semiparametric regression. The data analysis methods have been implemented within the National Institute of Standards and Technology (NIST) MS quality control (MSQC) pipeline. Finally, we have developed R code for removing systematic biases and have tested it using a reference standard spiked into a complex biological matrix (i.e. yeast cell lysate).
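As a rough illustration only (a linear stand-in, not the authors' stepwise semiparametric procedure or their R code), candidate bias variables such as retention time and charge state can be ranked by the variance they explain and their joint effect regressed out; all data below are simulated:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 1000
rt = rng.uniform(5, 90, n)                     # retention time (min)
charge = rng.integers(2, 5, n).astype(float)   # precursor charge state
log_int = 20 - 0.02 * rt - 0.5 * charge + rng.normal(0, 0.5, n)

variables = {"retention_time": rt, "charge": charge}
# Rank each candidate bias variable by the variance it explains (R^2) alone
for name, v in variables.items():
    v2 = v.reshape(-1, 1)
    r2 = LinearRegression().fit(v2, log_int).score(v2, log_int)
    print(f"{name}: R^2 = {r2:.3f}")

# Remove the joint bias with one multivariate fit, preserving the mean level
Xmat = np.column_stack(list(variables.values()))
model = LinearRegression().fit(Xmat, log_int)
normalized = log_int - model.predict(Xmat) + log_int.mean()
```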

10.

Background  

Relative isotope abundance quantification, which can be used for peptide identification and differential peptide quantification, plays an important role in liquid chromatography-mass spectrometry (LC-MS)-based proteomics. However, several major issues exist in the relative isotopic quantification of peptides on time-of-flight (TOF) instruments: LC peak boundary detection, thermal noise suppression, interference removal and mass drift correction. We propose to use the Maximum Ratio Combining (MRC) method to extract MS signal templates for interference detection/removal and LC peak boundary detection. In our method, MRCQuant, MS templates are extracted directly from experimental values, and the mass drift in each LC-MS run is automatically captured and compensated. We compared the quantification accuracy of MRCQuant to that of another representative LC-MS quantification algorithm (msInspect) using datasets downloaded from a public data repository.

11.
In proteomics, a central task in processing liquid chromatography-mass spectrometry (LC-MS) data is to discover and analyze the differences between complex peptide or protein samples for biomarker discovery, and the key to quantifying those differences is aligning the elution-time peak signals (LC peaks) produced by peptides across repeated runs of the same sample. Currently, alignment across replicate datasets is usually performed by identifying the temporal features of LC peaks from tandem (LC-MS/MS) identifications within the replicate data and then aligning those features with a warping function. Because the elution-time errors across replicate datasets arise randomly, applying a single warping function uniformly introduces substantial error. To address this problem, this study focuses on algorithms for the temporal alignment of LC peaks across multiple replicate datasets. We selected two replicate datasets and, following a machine-learning approach, used the peptides repeatedly detected in the LC-MS/MS data of both as trusted data, part serving as a training sequence and part as a test sequence, built a statistical model, and proposed a new alignment algorithm. Testing the statistical model on the test sequence showed that the algorithm's accuracy exceeds 95%. We then applied the model to all LC-MS/MS peptide detections in the two datasets to improve the coverage of detections across multiple datasets, showing that coverage can reach more than 85%.

12.
One of the important challenges for MALDI imaging mass spectrometry (MALDI-IMS) is the unambiguous identification of measured analytes. One way to do this is to match tryptic peptide MALDI-IMS m/z values with LC-MS/MS-identified m/z values. Matching using current MALDI-TOF/TOF MS instruments is difficult due to the variability of in situ time-of-flight (TOF) m/z measurements. This variability is currently addressed using external calibration, which limits the achievable mass accuracy for MALDI-IMS and makes it difficult to match these data to downstream LC-MS/MS results. To overcome this challenge, the work presented here details a method for internally calibrating data sets generated from tryptic peptide MALDI-IMS on formalin-fixed paraffin-embedded sections of ovarian cancer. By calibrating all spectra to internal peak features, the m/z error for matches made between MALDI-IMS m/z values and LC-MS/MS-identified peptide m/z values was significantly reduced. This improvement was confirmed by follow-up matching of LC-MS/MS spectra to in situ MS/MS spectra from the same m/z peak features. Taken together, the data presented here indicate that internal calibrants should be a standard component of tryptic peptide MALDI-IMS experiments.

13.
Analysis of primary animal and human tissues is key in biological and biomedical research. Comparative proteomics analysis of primary biological material would benefit from uncomplicated experimental work flows capable of evaluating an unlimited number of samples. In this report we describe the application of label-free proteomics to the quantitative analysis of five mouse core proteomes. We developed a computer program and normalization procedures that allow exploitation of the quantitative data inherent in LC-MS/MS experiments for relative and absolute quantification of proteins in complex mixtures. Important features of this approach include (i) its ability to compare an unlimited number of samples, (ii) its applicability to primary tissues and cultured cells, (iii) its straightforward work flow without chemical reaction steps, and (iv) its usefulness not only for relative quantification but also for estimation of absolute protein abundance. We applied this approach to quantitatively characterize the most abundant proteins in murine brain, heart, kidney, liver, and lung. We matched 8,800 MS/MS peptide spectra to 1,500 proteins and generated 44,000 independent data points to profile the approximately 1,000 most abundant proteins in mouse tissues. This dataset provides a quantitative profile of the fundamental proteome of a mouse, identifies the major similarities and differences between organ-specific proteomes, and serves as a paradigm of how label-free quantitative MS can be used to characterize the phenotype of mammalian primary tissues at the molecular level.

14.
Traditional analysis of liquid chromatography-mass spectrometry (LC-MS) data, typically performed by reviewing chromatograms and the corresponding mass spectra, is both time-consuming and difficult. Detailed data analysis is therefore often omitted in proteomics applications. When analysing multiple proteomics samples, it is usually only the final list of identified proteins that is reviewed. This may lead to unnecessarily complex or even contradictory results, because the content of the list of identified proteins depends heavily on the conditions for triggering the collection of tandem mass spectra. Small changes in the signal intensity of a peptide in different LC-MS experiments can lead to the collection of a tandem mass spectrum in one experiment but not in another. Also, the quality of the tandem mass spectrometry experiments can vary, leading to successful identification in some cases but not in others. Using a novel image analysis approach, it is possible to achieve repeat analyses with very high reproducibility by matching peptides across different LC-MS experiments using the retention time and parent mass-over-charge (m/z). It is also easy to confirm the final result visually. This approach has been investigated using tryptic digests of integral membrane proteins from organelle-enriched fractions of Arabidopsis thaliana, and it has been demonstrated that highly reproducible, consistent, and reliable LC-MS data interpretation can be achieved.

15.

Background  

In microarray experiments, many undesirable systematic variations are commonly observed. Normalization is the process of removing the variation that affects the measured gene expression levels, and it plays an important role in the early stages of microarray data analysis. The subsequent analysis results are highly dependent on normalization. One major source of variation is the background intensities. Recently, some methods have been employed for correcting the background intensities. However, all of these methods focus on appropriately defining signal intensities from the foreground and background intensities in the image analysis. Although a number of normalization methods have been proposed, no systematic method has been proposed that uses the background intensities in the normalization process.

16.
MOTIVATION: The careful normalization of array-based comparative genomic hybridization (aCGH) data is of critical importance for the accurate detection of copy number changes. The difference in labelling affinity between the two fluorophores used in aCGH (usually Cy5 and Cy3) can be observed as a bias within the intensity distributions. If left unchecked, this bias is likely to skew data interpretation during downstream analysis and lead to an increased number of false discoveries. RESULTS: In this study, we have developed aCGH.Spline, a natural cubic spline interpolation method followed by linear interpolation of outlier values, which is able to remove a large portion of the dye bias from large aCGH datasets in a quick and efficient manner. CONCLUSIONS: We have shown that removing this bias and reducing the experimental noise has a strong positive impact on the ability to accurately detect both copy number variation (CNV) and copy number alterations (CNA).
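A minimal sketch of spline-based dye-bias correction in the spirit of aCGH.Spline; the outlier-interpolation step is omitted, a generic smoothing spline stands in for the published fit, and the M/A data are simulated:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(4)
A = np.sort(rng.uniform(6, 16, 800))    # average log2 intensity per probe
# Cy5/Cy3 log-ratio (M) with an intensity-dependent dye bias plus noise
M = 0.4 - 0.05 * A + 0.3 * np.sin(A / 2) + rng.normal(0, 0.2, 800)

# Cubic smoothing spline of M on A captures the dye-bias trend
spline = UnivariateSpline(A, M, k=3, s=len(A) * 0.04)
M_corrected = M - spline(A)             # dye bias removed
```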

17.
Central tendency, linear regression, locally weighted regression, and quantile techniques were investigated for the normalization of peptide abundance measurements obtained from high-throughput liquid chromatography-Fourier transform ion cyclotron resonance mass spectrometry (LC-FTICR MS). Arbitrary abundances of peptides were obtained from three sample sets, including a standard protein sample, two Deinococcus radiodurans samples taken from different growth phases, and two mouse striatum samples from control and methamphetamine-stressed mice (strain C57BL/6). The selected normalization techniques were evaluated in both the absence and the presence of biological variability by estimating extraneous variability prior to and following normalization. Prior to normalization, replicate runs from each sample set were observed to be statistically different, while following normalization replicate runs were no longer statistically different. Although all techniques reduced systematic bias to some degree, the ranks assigned among the techniques revealed that for most LC-FTICR MS analyses linear regression normalization ranked either first or second. However, the lack of a definitive trend among the techniques suggested the need for additional investigation into adapting normalization approaches for label-free proteomics. Nevertheless, this study serves as an important step in evaluating approaches that address systematic biases related to relative quantification and label-free proteomics.
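A minimal sketch of one of the techniques evaluated above, linear-regression normalization of a run against a reference run over shared peptides; the abundances and fitted bias are simulated for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
ref = rng.normal(20, 2, 1500)                      # reference log2 abundances
run = 1.1 * ref + 0.8 + rng.normal(0, 0.3, 1500)   # systematic bias + noise

# Fit run = a * ref + b over shared peptides, then invert the line
fit = LinearRegression().fit(ref.reshape(-1, 1), run)
run_norm = (run - fit.intercept_) / fit.coef_[0]   # map run back to ref scale
```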

18.
Integrated liquid chromatography-mass spectrometry (LC-MS) is becoming a widely used approach for quantifying the protein composition of complex samples. The output of the LC-MS system measures the intensity of a peptide with a specific mass-to-charge ratio and retention time. In the last few years, this technology has been used to compare complex biological samples across multiple conditions. One challenge for comparative proteomic profiling with LC-MS is to match corresponding peptide features from different experiments. In this paper, we propose a new method, Peptide Element Alignment (PETAL), that uses raw spectrum data and detected peaks to simultaneously align features from multiple LC-MS experiments. PETAL creates spectrum elements, each of which represents the mass spectrum of a single peptide in a single scan. Peptides detected in different LC-MS datasets are aligned if they can be represented by the same elements. By considering each peptide separately, PETAL enjoys greater flexibility than time-warping methods. While most existing methods process multiple data sets by sequentially aligning each data set to an arbitrarily chosen template data set, PETAL treats all experiments symmetrically and can analyze all experiments simultaneously. We illustrate the performance of PETAL on example data sets.

19.
Normalization of expression levels applied to microarray data can help in reducing measurement error. Different methods, including cyclic loess, quantile normalization, and median or mean normalization, have been utilized to normalize microarray data. Although there is considerable literature regarding normalization techniques for mRNA microarray data, there are no publications comparing normalization techniques for microRNA (miRNA) microarray data, which are subject to similar sources of measurement error. In this paper, we compare the performance of cyclic loess, quantile normalization, median normalization, and no normalization for a single-color microRNA microarray dataset. We show that the quantile normalization method works best in reducing differences in miRNA expression values for replicate tissue samples. By showing that the total mean squared error is lowest across almost all 36 investigated tissue samples, we are assured that the bias correction provided by quantile normalization is not outweighed by additional error variance that can arise from a more complex normalization method. Furthermore, we show that quantile normalization does not achieve these results by compression of scale.
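A minimal sketch of quantile normalization as compared above: every sample's intensity distribution is forced onto the mean distribution across samples. The features-by-samples matrix is simulated for illustration:

```python
import numpy as np

def quantile_normalize(X):
    """Columns are samples; each column is mapped onto the mean quantiles."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # within-column ranks
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)   # average distribution
    return mean_quantiles[ranks]

rng = np.random.default_rng(6)
# Four samples with different overall scales (a simple systematic bias)
X = rng.lognormal(0, 1, (200, 4)) * np.array([1.0, 1.5, 0.7, 2.0])
X_norm = quantile_normalize(X)   # all columns now share one distribution
```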

20.
MOTIVATION: Mass spectrometry (MS) data are impaired by noise, as are data from many other analytical methods. Therefore, proteomics requires statistical approaches to determine the reliability of regulatory information when protein quantification is based on ion intensities observed in MS. RESULTS: We suggest a procedure to model the instrument- and workflow-specific noise behaviour of iTRAQ reporter ions that can provide regulatory information during automated peptide sequencing by LC-MS/MS. The established mathematical model representatively predicts possible variations of iTRAQ reporter ions in an MS data-dependent manner. The model can be utilized to calculate the robustness of regulatory information systematically at the peptide level in so-called bottom-up proteome approaches. It allows one to determine the best-fitting regulation factor and, in addition, to calculate the probability of alternative regulations. The result can be visualized as likelihood curves summarizing both the quantity and quality of regulatory information. Likelihood curves can basically be calculated from all peptides belonging to different regions of proteins if they are detected in LC-MS/MS experiments. Therefore, this approach offers excellent opportunities to detect and statistically validate dynamic post-translational modifications that usually affect only particular regions of the whole protein. The detection of known phosphorylation events at protein kinases served as a first proof of concept in this study and underscores the potential of noise models in quantitative proteomics.
