Similar Literature

20 similar articles found.
1.
Quantification of LC-MS peak intensities assigned during peptide identification in a typical comparative proteomics experiment will deviate from run to run of the instrument due to both technical and biological variation. Thus, normalization of peak intensities across an LC-MS proteomics dataset is a fundamental pre-processing step. However, the downstream analysis of LC-MS proteomics data can be dramatically affected by the normalization method selected. Current normalization procedures for LC-MS proteomics data derive normalization values from subsets of the full collection of identified peptides. The distribution of these normalization values is unknown a priori. If they are not independent of the biological factors associated with the experiment, the normalization process can introduce bias into the data, possibly affecting downstream statistical biomarker discovery. We present a novel approach to evaluate normalization strategies that includes the peptide selection component associated with the derivation of normalization values. Our approach evaluates the effect of normalization on the between-group variance structure in order to identify the most appropriate normalization methods, those that improve the structure of the data without introducing bias into the normalized peak intensities.
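The abstract does not spell out the evaluation framework, but the core idea (checking how normalization changes within- versus between-group variance) can be sketched as follows. This is a minimal illustration, not the authors' method; the median normalization, the helper functions, and the simulated intensity matrix are all hypothetical:

```python
import numpy as np

def median_normalize(log_intensities):
    """Scale each sample (column) so its median log2 intensity matches the overall median."""
    col_medians = np.nanmedian(log_intensities, axis=0)
    return log_intensities - (col_medians - np.median(col_medians))

def variance_structure(log_intensities, groups):
    """Average within-group and between-group variance across peptides."""
    groups = np.asarray(groups)
    grand_mean = np.nanmean(log_intensities, axis=1, keepdims=True)
    within, between = [], []
    for g in np.unique(groups):
        block = log_intensities[:, groups == g]
        group_mean = np.nanmean(block, axis=1, keepdims=True)
        within.append(np.nanmean((block - group_mean) ** 2))
        between.append(np.nanmean((group_mean - grand_mean) ** 2))
    return float(np.mean(within)), float(np.mean(between))

# Simulated example: 500 peptides, 8 samples in two groups, plus a per-sample bias.
rng = np.random.default_rng(0)
X = rng.normal(20.0, 1.0, size=(500, 8)) + rng.normal(0.0, 0.5, size=(1, 8))
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(variance_structure(X, groups))                    # before normalization
print(variance_structure(median_normalize(X), groups))  # after normalization
```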

2.

Background

Differences in sample collection, biomolecule extraction, and instrument variability introduce bias into data generated by liquid chromatography coupled with mass spectrometry (LC-MS). Normalization is used to address these issues. In this paper, we introduce a new normalization method using the Gaussian process regression model (GPRM) that utilizes information from individual scans within an extracted ion chromatogram (EIC) of a peak. The proposed method is particularly applicable for normalization based on the analysis order of LC-MS runs. Our method uses measurement variabilities estimated from LC-MS data acquired from quality control samples to correct for bias caused by instrument drift. A maximum likelihood approach is used to find the optimal parameters of the fitted GPRM. We review several normalization methods and compare their performance with GPRM.

Results

To evaluate the performance of different normalization methods, we consider LC-MS data from a study in which a metabolomic approach is used to discover biomarkers for liver cancer. The LC-MS data were acquired by analysis of sera from liver cancer patients and cirrhotic controls. In addition, LC-MS runs from a quality control (QC) sample are included to assess the run-to-run variability and to evaluate the ability of the various normalization methods to reduce this undesired variability. ANOVA models are then applied to the normalized LC-MS data to identify ions with intensity measurements that differ significantly between cases and controls.

Conclusions

One of the challenges in using label-free LC-MS for quantitation of biomolecules is systematic bias in measurements. Several normalization methods have been introduced to overcome this issue, but there is no universally applicable approach at the present time. Each data set should be carefully examined to determine the most appropriate normalization method. We review here several existing methods and introduce the GPRM for normalization of LC-MS data. Through our in-house data set, we show that the GPRM outperforms other normalization methods considered here, in terms of decreasing the variability of ion intensities among quality control runs.
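The drift-correction idea can be illustrated with an off-the-shelf Gaussian process regressor whose hyperparameters are tuned by maximizing the marginal likelihood. This is a minimal per-ion sketch under invented data, not the GPRM implementation described in the paper (which also exploits scan-level EIC information):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

def fit_drift(qc_run_order, qc_log_intensity):
    """Fit a Gaussian process to QC log-intensities versus analysis order."""
    kernel = ConstantKernel(1.0) * RBF(length_scale=10.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(np.asarray(qc_run_order, float).reshape(-1, 1), np.asarray(qc_log_intensity, float))
    return gp

def correct_drift(gp, run_order, log_intensity):
    """Remove the predicted (mean-centred) drift from study-sample measurements."""
    drift = gp.predict(np.asarray(run_order, float).reshape(-1, 1))
    return np.asarray(log_intensity, float) - (drift - drift.mean())

# One ion measured in interleaved QC injections; intensities decay with run order.
qc_order = np.array([1, 6, 11, 16, 21, 26])
qc_intensity = np.log2([1000, 950, 905, 880, 850, 818])
gp = fit_drift(qc_order, qc_intensity)

sample_order = np.arange(2, 26)
sample_intensity = np.log2(np.full(sample_order.size, 900.0))
corrected = correct_drift(gp, sample_order, sample_intensity)
```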

3.
The combined method of LC-MS/MS is increasingly being used to explore differences in the proteomic composition of complex biological systems. The reliability and utility of such comparative protein expression profiling studies is critically dependent on an accurate and rigorous assessment of quantitative changes in the relative abundance of the myriad of proteins typically present in a biological sample such as blood or tissue. In this review, we provide an overview of key statistical and computational issues relevant to bottom-up shotgun global proteomic analysis, with an emphasis on methods that can be applied to improve the dependability of biological inferences drawn from large proteomic datasets. Focusing on a start-to-finish approach, we address the following topics: 1) low-level data processing steps, such as formation of a data matrix, filtering, and baseline subtraction to minimize noise; 2) mid-level processing steps, such as data normalization, alignment in time, peak detection, peak quantification, peak matching, and error models, to facilitate profile comparisons; and 3) high-level processing steps such as sample classification and biomarker discovery, and related topics such as significance testing, multiple testing, and choice of feature space. We report on approaches that have recently been developed for these steps, discussing their merits and limitations, and propose areas deserving of further research.

4.
In proteomics, a central task in processing liquid chromatography-mass spectrometry (LC-MS) data is to discover and analyze differences between complex peptide or protein samples in order to find biomarkers, and aligning the elution-time peak signals (LC peaks) produced by peptides across repeated runs of the same sample is the key to quantifying and analyzing these differences. Current alignment of data from multiple replicate runs typically identifies the retention-time features of LC peaks from tandem mass spectrometry (LC-MS/MS) identifications within the replicate data sets and then aligns these time features with a warping function. Because elution-time errors across multiple data sets arise randomly, applying a single warping function for alignment can produce large errors. To address this problem, this study focuses on retention-time alignment algorithms for LC peaks across multiple replicate data sets. We selected two replicate experimental data sets and adopted a machine-learning approach: peptides detected by LC-MS/MS in both data sets were taken as reliable data, part of which served as a training sequence and part as a test sequence; a statistical model was built and a new alignment algorithm proposed. Evaluating the statistical model on the test sequence shows that the algorithm's accuracy exceeds 95%; applying the model to all LC-MS/MS peptide detections in the two data sets then raises the coverage of detections across the data sets to above 85%.
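The train/test evaluation scheme described above can be sketched as follows. The abstract does not specify the statistical model, so a simple polynomial warp fitted to shared peptides stands in for it, and the retention times below are invented for illustration:

```python
import numpy as np

def fit_warp(rt_reference, rt_other, degree=2):
    """Fit a polynomial mapping retention times of one run onto the reference run,
    using peptides identified by LC-MS/MS in both runs as anchors."""
    return np.poly1d(np.polyfit(rt_other, rt_reference, deg=degree))

def alignment_accuracy(warp, rt_reference_test, rt_other_test, tol_minutes=0.5):
    """Fraction of held-out shared peptides whose warped time lands within tolerance."""
    return float(np.mean(np.abs(warp(rt_other_test) - rt_reference_test) <= tol_minutes))

# Shared (reliably identified) peptides, split into training and test sequences.
rt_run1 = np.array([10.2, 15.1, 22.4, 30.8, 41.5, 55.0, 63.2, 70.9])
rt_run2 = np.array([10.9, 15.9, 23.5, 32.1, 43.0, 56.8, 65.1, 72.8])
warp = fit_warp(rt_run1[:6], rt_run2[:6])
print(alignment_accuracy(warp, rt_run1[6:], rt_run2[6:]))
```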

5.
6.
Statistical inference for microarray experiments usually involves the estimation of error variance for each gene. Because the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.
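As a simplified, single-experiment illustration of borrowing information across genes (the BAGE model additionally borrows across experiments through random experiment effects, which are not reproduced here), per-gene variances can be shrunk toward a common prior; all quantities below are simulated and the prior degrees of freedom are an assumption:

```python
import numpy as np

def shrink_variances(sample_vars, df, prior_var, prior_df):
    """Posterior-mean style shrinkage of per-gene variances toward a common prior,
    weighting each estimate by its degrees of freedom."""
    return (prior_df * prior_var + df * sample_vars) / (prior_df + df)

rng = np.random.default_rng(1)
n_genes, n_reps = 2000, 3
data = rng.normal(0.0, 1.0, size=(n_genes, n_reps))
s2 = data.var(axis=1, ddof=1)          # noisy with only two degrees of freedom
prior_var = s2.mean()                  # crude empirical prior shared by all genes
shrunk = shrink_variances(s2, df=n_reps - 1, prior_var=prior_var, prior_df=4.0)
print(s2.std(), shrunk.std())          # shrunken estimates vary far less
```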

7.
The Cochran-Armitage test has commonly been used as a trend test for binomial proportions. The quasi-likelihood method provides a simple approach to modeling extra-binomial proportions. Two versions of the score and Wald tests using different parameterizations of the extra-binomial variance were investigated: one in terms of the intercluster correlation, and another in terms of the variance. Monte Carlo simulation was used to evaluate the performance of each version of the score test and the Wald test, and of the Cochran-Armitage test. The simulation shows that the Cochran-Armitage test has the proper size only for binomial sample data and is no longer valid when applied to extra-binomial data. The Wald test is more likely to exceed the nominal level than the score test under either the intercluster correlation model or the variance model. Both score tests performed very well even with binomial data; they control the Type I error while maintaining the power to detect dose effects. For the design considered in this paper, the two score tests are comparable: the score test based on the intercluster correlation model seems to control the Type I error better but appears less powerful than the one based on the variance model. An example from a developmental toxicity experiment is given.
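The Cochran-Armitage statistic under the plain binomial assumption (the version the simulations show to be invalid for extra-binomial data) can be written compactly; the quasi-likelihood score and Wald tests are not reproduced here, and the example counts are invented:

```python
import numpy as np
from scipy.stats import norm

def cochran_armitage(doses, responders, n_per_group):
    """Cochran-Armitage test for a linear trend in binomial proportions.
    Returns the Z statistic and a two-sided p-value (binomial variance assumed)."""
    d = np.asarray(doses, float)
    r = np.asarray(responders, float)
    n = np.asarray(n_per_group, float)
    p_hat = r.sum() / n.sum()
    t = np.sum(d * (r - n * p_hat))
    var_t = p_hat * (1 - p_hat) * (np.sum(n * d ** 2) - np.sum(n * d) ** 2 / n.sum())
    z = t / np.sqrt(var_t)
    return z, 2 * norm.sf(abs(z))

# Hypothetical dose-response design with four dose groups of 25 animals each.
z, p = cochran_armitage(doses=[0, 1, 2, 4], responders=[2, 4, 7, 12], n_per_group=[25, 25, 25, 25])
print(z, p)
```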

8.
For compounds dissolved in non-polar solvents, nuclear magnetic resonance spectroscopic investigations have benefited greatly from the advent of cryogenically cooled probes. Unfortunately, the allure of significant increases in sensitivity may not be realized for compounds such as metabolites that are dissolved in solvents with high ionic strengths, such as the solutions typically utilized for metabolomic or biomolecular investigations. In some cases there is little benefit from a cryogenically cooled probe over a conventional room temperature probe. Various sample preparation methods have been developed to minimize the detrimental effects of salt; for large numbers of metabolomic samples these preparation methods tend to be onerous and impractical. An alternative to manipulating the sample is to utilize a probe that is designed to have a higher tolerance for solutions with high ionic strengths. In order to acquire optimal, high-quality data and choose the appropriate probe configuration (especially important for comparative quantitative investigations), the effects of salts and buffers on cryogenic probe performance must be understood. Here we detail sample considerations for two cryogenic probes, a standard 5 mm and a narrow-diameter 1.7 mm probe, in an effort to identify, via integrals, intensities, and noise levels, the optimal choice for biomolecular investigations.

9.
MOTIVATION: Quantitative experimental data is the critical bottleneck in the modeling of dynamic cellular processes in systems biology. Here, we present statistical approaches improving the reproducibility of protein quantification by immunoprecipitation and immunoblotting. RESULTS: Based on a large data set with more than 3600 data points, we show that the main sources of biological variability and experimental noise are multiplicative and log-normally distributed. Therefore, we suggest a log-transformation of the data to obtain additive, normally distributed noise. After this transformation, common statistical procedures can be applied to analyze the data. An error model is introduced to account for technical as well as biological variability. Elimination of these systematic errors decreases the variability of measurements and allows for a more precise estimation of the underlying dynamics of protein concentrations in cellular signaling. The proposed error model is relevant for simulation studies, parameter estimation, and model selection, basic tools of systems biology. AVAILABILITY: Matlab and R code is available from the authors on request. The data can be downloaded from our website www.fdm.uni-freiburg.de/~ckreutz/data.
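A small simulation (with an assumed 20% multiplicative noise level, not the paper's data) illustrates why the log-transformation turns log-normally distributed multiplicative noise into additive, approximately normal noise:

```python
import numpy as np

rng = np.random.default_rng(2)
true_signal = 100.0
# Multiplicative, log-normally distributed measurement noise.
raw = true_signal * rng.lognormal(mean=0.0, sigma=0.2, size=1000)

# On the raw scale the noise scales with the signal; after log-transformation it is
# approximately additive and normal, so standard tools (t-tests, linear models) apply.
log_data = np.log(raw)
print(raw.std() / raw.mean())      # coefficient of variation on the raw scale
print(log_data.std())              # roughly equal to sigma = 0.2 on the log scale
```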

10.
The method of reconstructing quantum bumps in photoreceptor cells from noise data by making use of shot noise theory is critically reviewed. The application of this method produces results irrespective of whether the conditions for reconstructing bumps by the method are satisfied, and even irrespective of whether quantum bumps exist at high stimulus intensities. We argue that at high intensities the concept of quantum bumps indeed becomes physically meaningless and degenerates into a purely mathematical concept. In order to investigate the meaning of the results of the reconstruction method, we submit it to a test model for which bumps and single channel opening events can be evaluated analytically. By comparing the analytical results of the test model with those of the reconstruction method applied to the test model we find: (1) even at low intensities, the reconstructed bump values deviate from the analytical results by up to an order of magnitude due to the variability of the bumps; (2) at high intensities, the reconstruction method produces single channel opening events rather than anything like a quantum bump. We also find, however, that there is no continuous transition from a bump at low intensities to a single channel event at high intensities.

11.
We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole-sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or quality scores (e.g., Phred). Here, DRISEE is applied to non-amplicon data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
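The ADR idea can be sketched roughly: reads sharing an identical prefix are treated as artifactual duplicates of one template, and disagreement with the per-position consensus beyond the prefix is counted as error. This is only a toy approximation of DRISEE (which bins ADRs more carefully and handles abundance skew); the reads and the prefix length are invented:

```python
from collections import Counter, defaultdict

def duplicate_read_error(reads, prefix_len=20):
    """Rough per-position error estimate from artifactual duplicate reads (ADRs)."""
    bins = defaultdict(list)
    for read in reads:
        if len(read) > prefix_len:
            bins[read[:prefix_len]].append(read)
    errors, totals = Counter(), Counter()
    for group in bins.values():
        if len(group) < 2:
            continue
        min_len = min(len(read) for read in group)
        for pos in range(prefix_len, min_len):
            column = [read[pos] for read in group]
            consensus = Counter(column).most_common(1)[0][0]
            totals[pos] += len(column)
            errors[pos] += sum(base != consensus for base in column)
    return {pos: errors[pos] / totals[pos] for pos in sorted(totals)}

reads = ["ACGTACGTACGTACGTACGTAAAA", "ACGTACGTACGTACGTACGTAAGA", "ACGTACGTACGTACGTACGTAAAA"]
print(duplicate_read_error(reads))
```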

12.
Rosetta error model for gene expression analysis
MOTIVATION: In microarray gene expression studies, the number of replicated microarrays is usually small because of cost and sample availability, resulting in unreliable variance estimates and thus unreliable statistical hypothesis tests. The unreliable variance estimation is further complicated by the fact that the technology-specific variance is intrinsically intensity-dependent. RESULTS: The Rosetta error model captures the variance-intensity relationship for various types of microarray technologies, such as single-color arrays and two-color arrays. This error model conservatively estimates the intensity error and uses this value to stabilize the variance estimation. We present two commonly used error models: the intensity error model for single-color microarrays and the ratio error model for two-color microarrays or for ratios built from two single-color arrays. We present examples to demonstrate the strength of our error models in improving the statistical power of microarray data analysis, particularly in increasing expression detection sensitivity and specificity when the number of replicates is limited.
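A toy version of an intensity-dependent error model: the technical variance is assumed to grow with intensity (a fractional term) on top of an additive floor, and that term keeps tiny replicate variances from producing inflated test statistics. The fractional error, the floor, and the data below are assumptions for illustration, not Rosetta's actual parameterization:

```python
import numpy as np

def stabilized_t(x1, x2, frac_error=0.1, floor=50.0):
    """Two-sample statistic with the per-group variance stabilized by an
    intensity-dependent technical variance term."""
    m1, m2 = np.mean(x1), np.mean(x2)
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    tech1 = (frac_error * m1) ** 2 + floor ** 2   # technical variance at this intensity
    tech2 = (frac_error * m2) ** 2 + floor ** 2
    se = np.sqrt(max(s1, tech1) / len(x1) + max(s2, tech2) / len(x2))
    return (m1 - m2) / se

print(stabilized_t([1200.0, 1300.0], [800.0, 820.0]))
```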

13.
Although two-color fluorescent DNA microarrays are now standard equipment in many molecular biology laboratories, methods for identifying differentially expressed genes in microarray data are still evolving. Here, we report a refined test for differentially expressed genes which does not rely on gene expression ratios but directly compares a series of repeated measurements of the two dye intensities for each gene. This test uses a statistical model to describe multiplicative and additive errors influencing an array experiment, where model parameters are estimated from observed intensities for all genes using the method of maximum likelihood. A generalized likelihood ratio test is performed for each gene to determine whether, under the model, these intensities are significantly different. We use this method to identify significant differences in gene expression among yeast cells growing in galactose-stimulating versus non-stimulating conditions and compare our results with current approaches for identifying differentially expressed genes. The effect of sample size on parameter optimization is also explored, as is the use of the error model to compare the within- and between-slide intensity variation intrinsic to an array experiment.
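The generalized likelihood ratio test can be illustrated in a stripped-down form in which a single Gaussian error term replaces the paper's combined multiplicative and additive error model; the per-channel data are invented log-intensities:

```python
import numpy as np
from scipy.stats import chi2, norm

def glrt_two_condition(x, y):
    """Generalized likelihood ratio test for a difference between two sets of
    (log-transformed) intensity measurements under a simple Gaussian error model."""
    pooled = np.concatenate([x, y])

    def loglik(data, mu, sigma):
        return norm.logpdf(data, mu, sigma).sum()

    # H0: one common mean; H1: separate means; sigma estimated by maximum likelihood.
    l0 = loglik(pooled, pooled.mean(), pooled.std())
    s1 = np.sqrt((np.var(x) * len(x) + np.var(y) * len(y)) / (len(x) + len(y)))
    l1 = loglik(x, x.mean(), s1) + loglik(y, y.mean(), s1)
    stat = 2 * (l1 - l0)
    return stat, chi2.sf(stat, df=1)

print(glrt_two_condition(np.array([8.1, 8.3, 8.0, 8.2]), np.array([9.0, 8.9, 9.2, 9.1])))
```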

14.
Random errors are omnipresent in sensorimotor tasks due to perceptual and motor noise. The question is, are humans aware of their random errors on an instance-by-instance basis? The appealing answer would be 'no', because it seems intuitive that humans would otherwise immediately correct for the errors online, thereby increasing sensorimotor precision. However, here we show the opposite. Participants pointed to visual targets with varying degrees of feedback. After movement completion, participants indicated whether they believed they landed left or right of the target. Surprisingly, participants' left/right discriminability was well above chance, even without visual feedback. Only when forced to correct for the error after movement completion did participants lose knowledge about the remaining error, indicating that random errors can only be accessed offline. When correcting, participants applied the optimal correction gain, a weighting factor between perceptual and motor noise, minimizing end-point variance. Together these results show that humans optimally combine direct information about sensorimotor noise in the system (the current random error) with indirect knowledge about the variance of the perceptual and motor noise distributions. Yet they appear to do so only offline, after movement completion, not while the movement is still in progress, suggesting that during movement proprioceptive information is less precise.

15.
Microarray expression profiles are inherently noisy and many different sources of variation exist in microarray experiments. It is still a significant challenge to develop stochastic models to realize noise in microarray expression profiles, which has profound influence on the reverse engineering of genetic regulation. Using the target genes of the tumour suppressor gene p53 as the test problem, we developed stochastic differential equation models and established the relationship between the noise strength of stochastic models and parameters of an error model for describing the distribution of the microarray measurements. Numerical results indicate that the simulated variance from stochastic models with a stochastic degradation process can be represented by a monomial in terms of the hybridization intensity and the order of the monomial depends on the type of stochastic process. The developed stochastic models with multiple stochastic processes generated simulations whose variance is consistent with the prediction of the error model. This work also established a general method to develop stochastic models from experimental information.

16.
MOTIVATION: Mass spectrometry (MS) data are impaired by noise, as are data from many other analytical methods. Therefore, proteomics requires statistical approaches to determine the reliability of regulatory information if protein quantification is based on ion intensities observed in MS. RESULTS: We suggest a procedure to model the instrument- and workflow-specific noise behaviour of iTRAQ reporter ions, which can provide regulatory information during automated peptide sequencing by LC-MS/MS. The established mathematical model representatively predicts possible variations of iTRAQ reporter ions in an MS data-dependent manner. The model can be utilized to calculate the robustness of regulatory information systematically at the peptide level in so-called bottom-up proteome approaches. It allows one to determine the best-fitting regulation factor and, in addition, to calculate the probability of alternative regulations. The result can be visualized as likelihood curves summarizing both the quantity and quality of regulatory information. Likelihood curves can in principle be calculated from all peptides belonging to different regions of a protein if they are detected in LC-MS/MS experiments. Therefore, this approach offers excellent opportunities to detect and statistically validate dynamic post-translational modifications, which usually affect only particular regions of the whole protein. The detection of known phosphorylation events at protein kinases served as a first proof of concept in this study and underscores the potential of noise models in quantitative proteomics.

17.
Mass spectrometry-based proteomics holds great promise as a discovery tool for biomarker candidates in the early detection of diseases. Recently, much emphasis has been placed upon producing highly reliable data for quantitative profiling, for which highly reproducible methodologies are indispensable. The main problems that affect experimental reproducibility stem from variations introduced by sample collection, preparation, and storage protocols and by LC-MS settings and conditions. On the basis of a formally precise and quantitative definition of similarity between LC-MS experiments, we have developed Chaorder, a fully automatic software tool that can assess the experimental reproducibility of sets of large-scale LC-MS experiments. By visualizing the similarity relationships within a set of experiments, this tool can form the basis of systematic quality control and thus help assess the comparability of mass spectrometry data over time, across different laboratories, and between instruments. Applying Chaorder to data from multiple laboratories and a range of instruments, experimental protocols, and sample complexities revealed biases introduced by the sample processing steps, experimental protocols, and instrument choices. Moreover, we show that reducing bias by correcting for just a few steps, for example randomizing the run order, does not provide much gain in statistical power for biomarker discovery.

18.
Analysis of native or endogenous peptides in biofluids can provide valuable insights into disease mechanisms. Furthermore, the detected peptides may also have utility as potential biomarkers for non-invasive monitoring of human diseases. The non-invasive nature of urine collection and the abundance of peptides in urine make analysis by high-throughput 'peptidomics' methods an attractive approach for investigating the pathogenesis of renal disease. However, urine peptidomics methodologies can be problematic with regard to difficulties associated with sample preparation. The urine matrix can introduce so much background interference into the analytical measurements that it hampers both the identification of peptides and the depth of the peptidomics read when utilizing LC-MS based peptidome analysis. We report on a novel adaptation of the standard solid phase extraction (SPE) method to a modified SPE (mSPE) approach for improved peptide yield and analysis sensitivity with LC-MS based peptidomics, and compare the two methods in terms of time, cost, clogging of the LC-MS column, peptide yield, peptide quality, and number of peptides identified. Expense and time requirements were comparable for SPE and mSPE, but more interfering contaminants from the urine matrix were evident in the SPE preparations (e.g., clogging of the LC-MS columns, yellowish background coloration of prepared samples due to retained urobilin, lower peptide yields) when compared to the mSPE method. When we compared data from technical replicates of 4 runs, the mSPE method provided significantly improved efficiencies for the preparation of samples from urine (e.g., mSPE peptide identification 82% versus 18% with SPE; p = 8.92E-05). Additionally, peptide identifications obtained with the mSPE method highlighted the biology of differential activation of urine peptidases during acute renal transplant rejection, with distinct laddering of specific peptides that was obscured for most proteins when utilizing the conventional SPE method. In conclusion, the mSPE method was found to be superior to the conventional, standard SPE method for urine peptide sample preparation for LC-MS peptidomics analysis because of the optimized sample clean-up, which improved experimental inference from the confidently identified peptides.

19.
Clinical trials are often planned with high uncertainty about the variance of the primary outcome variable. A poor estimate of the variance, however, may lead to an over- or underpowered study. In the internal pilot study design, the sample variance is calculated at an interim step and the sample size can be adjusted if necessary. The available recalculation procedures use only the data of those patients who have already completed the study for sample size recalculation. In this article, we consider a variance estimator that takes into account both the data at the endpoint and at an intermediate point of the treatment phase. We derive asymptotic properties of this estimator and of the related sample size recalculation procedure. In a simulation study, the performance of the proposed approach is evaluated and compared with the procedure that uses only long-term data. Simulation results demonstrate that the sample size resulting from the proposed procedure generally shows smaller variability. At the same time, the Type I error rate is not inflated and the achieved power is close to the desired value.
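The recalculation step itself follows a standard sample-size formula; a normal-approximation sketch for a two-arm comparison of means is shown below. The paper's contribution, estimating the variance from both intermediate and endpoint data, is not reproduced here, and the effect size and variance figures are invented:

```python
import math
from scipy.stats import norm

def recalculated_sample_size(sigma_hat, delta, alpha=0.05, power=0.8):
    """Per-group sample size for a two-arm comparison of means, recomputed from an
    interim (internal pilot) estimate sigma_hat of the outcome standard deviation,
    for a clinically relevant difference delta."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma_hat ** 2 / delta ** 2)

# Planning assumed sigma = 8; the internal pilot suggests sigma is closer to 11,
# so the per-group sample size is adjusted upward at the interim analysis.
print(recalculated_sample_size(sigma_hat=8.0, delta=5.0))   # original planning
print(recalculated_sample_size(sigma_hat=11.0, delta=5.0))  # after interim estimate
```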

20.
Spontaneously occurring synaptic events (synaptic noise) recorded intracellularly are usually assumed to be independent of evoked postsynaptic responses and to contaminate measures of postsynaptic response amplitude in a roughly Gaussian manner. Here we derive analytically the expected noise distribution for excitatory synaptic noise and investigate its effects on amplitude histograms. We propose that some fraction of this excitatory noise is initiated at the same release sites that contribute to the evoked synaptic event and develop an analytical model of the interaction between this fraction of the noise and the evoked postsynaptic response amplitude. Recording intracellularly with sharp microelectrodes in the in vitro hippocampal slice preparation, we find that excitatory synaptic noise accounts for up to 70% of the intracellular recording noise, when inhibition is blocked pharmacologically. Up to 20% of this noise shows a significant correlation with the evoked event amplitude, and the behavior of this component of the noise is consistent with a model which assumes that each release site experiences a refractory period of approximately 60 ms after release. In contrast with classical models of quantal variance, our models predict that excitatory synaptic noise can cause the apparent variance of successive peaks in an excitatory synaptic amplitude histogram to decrease from left to right, and in some cases to be less than the variance of the measured noise.
