Similar literature
 20 similar articles found (search time: 15 ms)
1.
Two-dimensional difference gel electrophoresis (2-D DIGE) allows for reliable quantification of global protein abundance changes. The threshold of significance for protein abundance changes depends on the experimental variation (biological and technical). This study estimates biological, technical and total variation inherent to 2-D DIGE analysis of environmental bacteria, using the model organisms "Aromatoleum aromaticum" EbN1 and Phaeobacter gallaeciensis DSM 17395. The soluble proteomes of both bacteria were analyzed from replicate cultures. For strains EbN1 and DSM 17395, respectively, the coefficient of variation (CV) revealed a total variation of below 19 and 15%, an average technical variation of 12 and 7%, and an average biological variation of 18 and 17%. Multivariate analysis of variance confirmed that the domination of biological over technical variance was significant in most cases. To visualize variances, the complex protein data were plotted with a multidimensional scaling technique. Furthermore, comparison of different treatment groups (different substrate conditions) demonstrated that variability within groups is significantly smaller than differences caused by treatment.
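
The coefficient of variation used in the abstract above can be sketched as follows. This is a minimal illustration, not the study's analysis pipeline: the function name and the spot volumes are hypothetical, and the CV is taken here as the sample standard deviation divided by the mean.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100.0


# Hypothetical normalized spot volumes for a single protein spot.
technical_replicates = [1.00, 1.10, 0.90]         # same extract, three gels
biological_replicates = [1.00, 1.30, 0.70, 1.00]  # four independent cultures

technical_cv = coefficient_of_variation(technical_replicates)
biological_cv = coefficient_of_variation(biological_replicates)
```

With these made-up numbers the biological CV exceeds the technical CV, which is the kind of comparison the abstract reports per strain.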

2.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centreing on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centreing using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression methods can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes do not differentially express. Methods based on spatial location and/or intensity also require that the nondifferentially expressing genes are at random with respect to location and intensity. Spotting designs should be carefully done so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered. 
DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both efficiency and validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.

3.
Quantitative proteomics investigates physiology at the molecular level by measuring relative differences in protein expression between samples under different experimental conditions. A major obstacle to reliably determining quantitative changes in protein expression is overcoming the error imposed by technical variation and biological variation. In drug discovery and development the issue of biological variation often rises in concordance with the developmental stage of research, spanning from in vitro assays to clinical trials. In this paper we present case studies to raise awareness of the issues of technical variation and biological variation and their impact on applying quantitative proteomics. We defined the degree of technical variation from the process of two-dimensional electrophoresis as 20-30% coefficient of variation. On the other hand, biological variation observed experiment-to-experiment showed a broader degree of variation depending upon the sample type. This was demonstrated with case studies where variation was monitored across experiments with bacteria, established cell lines, primary cultures, and with drug-treated human subjects. We discuss technical variation and biological variation as key factors to consider during experimental design, and offer insight into preparing experiments that overcome this challenge to provide statistically significant outcomes for conducting quantitative proteomic research.

4.
MOTIVATION: The numerical values of gene expression measured using microarrays are usually presented to the biological end-user as summary statistics of spot pixel data, such as the spot mean, median and mode. Much of the subsequent data analysis reported in the literature, however, uses only one of these spot statistics. This results in sub-optimal estimates of gene expression levels and a need for improvement in quantitative spot variation surveillance. RESULTS: This paper develops a maximum-likelihood method for estimating gene expression using spot mean, variance and pixel number values available from typical microarray scanners. It employs a hierarchical model of variation between and within microarray spots. The hierarchical maximum-likelihood estimate (MLE) is shown to be a more efficient estimator of the mean than the 'conventional' estimate using solely the spot mean values (i.e. without spot variance data). Furthermore, under the assumptions of our model, the spot mean and spot variance are shown to be sufficient statistics that do not require the use of all pixel data. The hierarchical MLE method is applied to data from both Monte Carlo (MC) simulations and a two-channel dye-swapped spotted microarray experiment. The MC simulations show that the hierarchical MLE method leads to improved detection of differential gene expression, particularly when 'outlier' spots are present on the arrays. Compared with the conventional method, the MLE method applied to data from the microarray experiment leads to an increase in the number of differentially expressed genes detected for low cut-off P-values of interest.

5.
If biological questions are to be answered using quantitative proteomics, it is essential to design experiments which have sufficient power to be able to detect changes in expression. Sample subpooling is a strategy that can be used to reduce the variance but still allow studies to encompass biological variation. Underlying sample pooling strategies is the biological averaging assumption that the measurements taken on the pool are equal to the average of the measurements taken on the individuals. This study finds no evidence of a systematic bias triggered by sample pooling for DIGE and that pooling can be useful in reducing biological variation. For the first time in quantitative proteomics, the two sources of variance were decoupled and it was found that technical variance predominates for mouse brain, while biological variance predominates for human brain. A power analysis found that as the number of individuals pooled increased, the number of replicates needed declined but the number of biological samples increased. Repeat measures of biological samples decreased the number of samples required but increased the number of gels needed. An example cost-benefit analysis demonstrates how researchers can optimise their experiments while taking into account the available resources.
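
The trade-off described in the abstract above can be illustrated with a small sketch. The formula is an assumption, not taken from the study: it is the usual textbook expression for the variance of a group mean when the biological averaging assumption holds, with each pool combining `pool_size` individuals and being measured `n_tech_reps` times. All names and numbers are hypothetical.

```python
def variance_of_group_mean(sigma2_bio, sigma2_tech, n_pools, pool_size, n_tech_reps):
    """Variance of a group mean under the biological averaging assumption.

    Pooling pool_size individuals divides the biological variance by pool_size;
    repeating the measurement n_tech_reps times divides the technical variance
    by n_tech_reps; averaging over n_pools independent pools divides the sum.
    """
    if min(n_pools, pool_size, n_tech_reps) < 1:
        raise ValueError("all counts must be at least 1")
    per_pool = sigma2_bio / pool_size + sigma2_tech / n_tech_reps
    return per_pool / n_pools


# Hypothetical variance components on a log scale, four gels per group.
no_pooling = variance_of_group_mean(0.04, 0.01, n_pools=4, pool_size=1, n_tech_reps=1)
with_pooling = variance_of_group_mean(0.04, 0.01, n_pools=4, pool_size=4, n_tech_reps=1)
```

In this example the same four gels give a smaller variance with pooling, but the pooled design consumes sixteen individuals instead of four, which mirrors the cost trade-off the abstract mentions.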

6.
The relationship between spot volume and variation for all protein spots observed on large format 2D gels when utilising silver stain technology and a model system based on mammalian NSO cell extracts is reported. By running multiple gels we have shown that the reproducibility of data generated in this way is dependent on individual protein spot volumes, which in turn are directly correlated with the coefficient of variation. The coefficients of variation across all observed protein spots were highest for low-abundance proteins, which are the primary contributors to process error, and lowest for more abundant proteins. Using the relationship between spot volume and coefficient of variation we show it is necessary to calculate variation for individual protein spot volumes. The inherent limitations of silver staining therefore mean that errors in individual protein spot volumes, and not global error, must be considered when assessing significant changes in protein spot volume.

7.
Analysis of variance components in gene expression data   Total citations: 5 (self-citations: 0; citations by others: 5)
MOTIVATION: A microarray experiment is a multi-step process, and each step is a potential source of variation. There are two major sources of variation: biological variation and technical variation. This study presents a variance-components approach to investigating animal-to-animal, between-array, within-array and day-to-day variations for two data sets. The first data set involved estimation of technical variances for pooled control and pooled treated RNA samples. The variance components included between-array, and two nested within-array variances: between-section (the upper- and lower-sections of the array are replicates) and within-section (two adjacent spots of the same gene are printed within each section). The second experiment was conducted on four different weeks. Each week there were reference and test samples with a dye-flip replicate in two hybridization days. The variance components included week-to-week, animal-to-animal and between-array and within-array variances. RESULTS: We applied the linear mixed-effects model to quantify different sources of variation. In the first data set, we found that the between-array variance is greater than the between-section variance, which, in turn, is greater than the within-section variance. In the second data set, for the reference samples, the week-to-week variance is larger than the between-array variance, which, in turn, is slightly larger than the within-array variance. For the test samples, the week-to-week variance has the largest variation. The animal-to-animal variance is slightly larger than the between-array and within-array variances. However, in a gene-by-gene analysis, the animal-to-animal variance is smaller than the between-array variance in four out of five housekeeping genes. In summary, the largest variation observed is the week-to-week effect. Another important source of variability is the animal-to-animal variation. 
Finally, we describe the use of variance-component estimates to determine optimal numbers of animals, arrays per animal and sections per array in planning microarray experiments.
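
A simplified sketch of variance-component estimation follows. The study fits a linear mixed-effects model; the code below swaps in the simpler method-of-moments estimator for a balanced one-way random-effects layout (for example, arrays nested within animals), so it is an illustration under that assumption rather than the authors' method. The function name and data are hypothetical.

```python
import statistics


def variance_components(groups):
    """Method-of-moments estimates (between, within) for a balanced one-way layout.

    groups: list of equally sized lists, one list of measurements per group.
    """
    n = len(groups[0])
    if len(groups) < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")
    group_means = [statistics.mean(g) for g in groups]
    # Mean square within groups: average of the per-group sample variances.
    ms_within = statistics.mean([statistics.variance(g) for g in groups])
    # Mean square between groups: n times the sample variance of group means.
    ms_between = n * statistics.variance(group_means)
    # E[ms_between] = within + n * between, so solve for the between component.
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within


# Hypothetical log-intensities: three animals, two arrays each.
between_var, within_var = variance_components([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
```

The negative-estimate case is truncated at zero here, which is a common convention but also an assumption of this sketch.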

8.
Peripheral blood mononuclear cells (PBMCs) are main actors in inflammatory processes and linked to many diseases, including rheumatoid arthritis, atherosclerosis, asthma, HIV and cancer. Moreover, they seem an interesting ‘surrogate tissue’ that can be used in biomarker discovery. Knowledge of the interindividual variation is essential for a good experimental design for quantitative expression studies. Therefore, PBMCs were isolated from 24 healthy volunteers (15 males, 9 females, ages 63–86) with no clinical signs of inflammation. The extracted proteins were separated using two-dimensional difference gel electrophoresis (2D-DIGE), and the gel images were processed with the DeCyder 2D software. Protein spots present in at least 22 out of 24 healthy volunteers were selected for further statistical analysis. Determination of the coefficient of variation (CV) of the normalized spot volume values of these proteins reveals that the total variation of the PBMC proteome ranges from 12.99% to 148.45%, with a mean value of 28%. A supplemental look at the causes of technical variation showed that the isolation of PBMCs from whole blood is the factor which influences the experimental variance the most. This isolation should be handled with extra care and an additional washing step would be beneficial. Knowing the extent of variation, we show that at least 10 independent samples per group are needed to obtain statistically powerful data. This study demonstrates the importance of considering variance of a human population for a good experimental design for future protein profiling or biomarker studies.

9.
As a first approach in establishing the holm oak leaf proteome, we have optimised a protocol for this plant and tissue which includes the following steps: trichloroacetic acid-acetone extraction, two-dimensional gel electrophoresis (2-DE) on pH 5 to 8 linear gradient immobilised pH gradient strips as the first dimension, and sodium dodecyl sulfate-polyacrylamide gel electrophoresis on 13% polyacrylamide gels as the second one. Proteins were detected by Coomassie staining. Gel images were recorded and digitalized, and the protein spots quantified by using a linear regression equation of protein quantity on spot volume obtained against standard proteins. Analytical variance was calculated for one hundred protein spots from three replicate 2-DE gels of the same protein extract. Biological variance was determined for the same protein spots from independent tissue extracts corresponding to leaves from different trees, or the same tree at different orientations or sampling times during a day. Values of 26% for the analytical variance and 58.6% for the biological variance among independent trees were obtained. These values provide a quantified and statistical basis for the evaluation of protein expression changes in comparative proteomic investigations with this species. A representative set of the major proteins, covering the isoelectric point range of 5 to 8 and the relative molecular mass (Mr) range of 14 to 78 kDa, was subjected to liquid chromatography-tandem mass spectrometry analysis. Due to the absence of Quercus DNA or protein sequence databases, a method based on the procedure reported by Liska and Shevchenko including de novo sequencing and BLAST similarity searching against other plant species databases was used for protein identification. Out of 43 analysed spots, 35 were positively identified. The identified proteins mainly corresponded to enzymes involved in photosynthesis and energetic metabolism, with a significant number corresponding to RubisCO.

10.
Two-dimensional SDS-PAGE gel electrophoresis using post-run staining is widely used to measure the abundances of thousands of protein spots simultaneously. Usually, the protein abundances of two or more biological groups are compared using biological and technical replicates. After gel separation and staining, the spots are detected, spot volumes are quantified, and spots are matched across gels. There are almost always many missing values in the resulting data set. The missing values arise either because the corresponding proteins have very low abundances (or are absent) or because of experimental errors such as incomplete/over focusing in the first dimension or varying run times in the second dimension as well as faulty spot detection and matching. In this study, we show that the probability for a spot to be missing can be modeled by a logistic regression function of the logarithm of the volume. Furthermore, we present an algorithm that takes a set of gels with technical and biological replicates as input and estimates the average protein abundances in the biological groups from the number of missing spots and measured volumes of the present spots using a maximum likelihood approach. Confidence intervals for abundances and p-values for differential expression between two groups are calculated using bootstrap sampling. The algorithm is compared to two standard approaches, one that discards missing values and one that sets all missing values to zero. We have evaluated this approach in two different gel data sets of different biological origin. An R program implementing the algorithm is freely available at http://bioinfo.thep.lu.se/MissingValues2Dgels.html.
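
The missing-spot model in the abstract above, a logistic function of the logarithm of the spot volume, can be written out as a short sketch. The function name and the coefficient values are hypothetical; the abstract does not report fitted parameters, and this is not the authors' R program.

```python
import math


def probability_missing(volume, intercept, slope):
    """Logistic model of the probability that a spot of a given volume is missing.

    A negative slope means that fainter spots (smaller volume) are more likely
    to be missing, which is the behaviour the abstract describes qualitatively.
    """
    if volume <= 0:
        raise ValueError("spot volume must be positive")
    z = intercept + slope * math.log(volume)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
p_faint = probability_missing(0.1, intercept=0.0, slope=-1.0)
p_bright = probability_missing(10.0, intercept=0.0, slope=-1.0)
```

In a full analysis the intercept and slope would be estimated from the gels, for example by maximum likelihood as the abstract indicates.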

11.
Neuroproteomics aims to study the molecular organisation of the nervous system at the protein level. Two-dimensional electrophoresis is the most frequently used technique in quantitative proteomics. The aim of this study was to assess the experimental and biological variations on this proteomic platform using mouse brain tissue. Mice are the most widely used lab animals for modelling human disease or investigating the effect of a drug candidate or a treatment. Experimental design plays a crucial role in quantitative proteomics; hence, understanding and minimizing the variables is essential. Our results indicate that the technical variance is the dominant contributor to the total variance in mouse brain and that the genetic background has a negligible effect on the total variation. The results also characterise the anticipated variation when using mouse brain for proteomic study; hence, they should be useful for future experimental design in other proteomics laboratories.

12.
The neutralization reaction is the most appropriate in vitro reference test system for assessing intratypic antigenic variation as it involves the antigenic determinants responsible for virus strain specificity and evoking protective antibody. Antigenic relationships determined in different neutralization test systems were independent of the system used and were assumed to truly reflect antigenic variation. The two-dimensional microneutralization test was found to be appropriate for foot and mouth disease (FMD) virus strain differentiation. To minimize test-to-test variation, comparisons are performed as matched pairs. The pooled variance of the test system is used to assess the significance of the relationships obtained. Antisera from convalescent animals were less specific than those from vaccinates. Serum quality seemed less critical for the virus neutralization than for the complement fixation reaction. A system for FMD virus strain differentiation based on the use of the virus neutralization reaction, taking into account the statistical and biological significance of observed r values, is recommended.

13.
Optimal experimental design is important for the efficient use of modern high-throughput technologies such as microarrays and proteomics. Multiple factors, including the reliability of the measurement system, which itself must be estimated from prior experimental work, could influence design decisions. In this study, we describe how the optimal number of replicate measures (technical replicates) for each biological sample (biological replicate) can be determined. Different allocations of biological and technical replicates were evaluated by minimizing the variance of the ratio of technical variance (measurement error) to the total variance (sum of sampling error and measurement error). We demonstrate that if the number of biological replicates and the number of technical replicates per biological sample are variable, while the total number of available measures is fixed, then the optimal allocation of replicates for measurement evaluation experiments requires two technical replicates for each biological replicate. Therefore, it is recommended to use two technical replicates for each biological replicate if the goal is to evaluate the reproducibility of measurements.

14.
Hematopoietic stem cells replenish all the cells of the blood throughout the lifetime of an animal. Although thousands of stem cells reside in the bone marrow, only a few contribute to blood production at any given time. Nothing is known about the differences between individual stem cells that dictate their particular state of activation readiness. To examine such differences between individual stem cells, we determined the global gene expression profile of 12 single stem cells using microarrays. We showed that at least half of the genetic expression variability between 12 single cells profiled was due to biological variation in 44% of the genes analyzed. We also identified specific genes with high biological variance that are candidates for influencing the state of readiness of individual hematopoietic stem cells, and confirmed the variability of a subset of these genes using single-cell real-time PCR. Because apparent variation of some genes is likely due to technical factors, we estimated the degree of biological versus technical variation for each gene using identical RNA samples containing an RNA amount equivalent to that of single cells. This enabled us to identify a large cohort of genes with low technical variability whose expression can be reliably measured on the arrays at the single-cell level. These data have established that gene expression of individual stem cells varies widely, despite extremely high phenotypic homogeneity. Some of this variation is in key regulators of stem cell activity, which could account for the differential responses of particular stem cells to exogenous stimuli. The capacity to accurately interrogate individual cells for global gene expression will facilitate a systems approach to biological processes at a single-cell level.

15.
This work is a statistical analysis of reproducibility of a MALDI-TOF mass spectrometry experiment. Its aim is to evaluate measurement variability and compare peak intensities from two types of MALDI-TOF platforms. We compared and commented on the abilities of Principal Component Analysis and mixed-model analysis of variance to evaluate the biological variability and the technical variability of peak intensities in different patients. The properties and hypotheses of both methods are summarized and applied to spectra from plasma of patients with Hodgkin lymphoma. Principal Component Analysis rapidly checks the balance between the two variabilities; however, a mixed-model analysis of variance is necessary to quantify the biological and technical components of the experimental variance as well as their interactions and to split the total variance into between-subjects and within-subject components. The latter method helped to assess the reproducibility of measurements from two MALDI-TOF platforms and to decompose the technical variability according to the experimental design.

16.
ABSTRACT: BACKGROUND: mRNA expression data from next generation sequencing platforms is obtained in the form of counts per gene or exon. Counts have classically been assumed to follow a Poisson distribution in which the variance is equal to the mean. The Negative Binomial distribution which allows for over-dispersion, i.e., for the variance to be greater than the mean, is commonly used to model count data as well. RESULTS: In mRNA-Seq data from 25 subjects, we found technical variation to generally follow a Poisson distribution as has been reported previously and biological variability was over-dispersed relative to the Poisson model. The mean-variance relationship across all genes was quadratic, in keeping with a Negative Binomial (NB) distribution. Over-dispersed Poisson and NB distributional assumptions demonstrated marked improvements in goodness-of-fit (GOF) over the standard Poisson model assumptions, but with evidence of over-fitting in some genes. Modeling of experimental effects improved GOF for high variance genes but increased the over-fitting problem. CONCLUSIONS: These conclusions will guide development of analytical strategies for accurate modeling of variance structure in these data and sample size determination which in turn will aid in the identification of true biological signals that inform our understanding of biological systems.
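
The Poisson and Negative Binomial mean-variance relationships mentioned above can be contrasted in a few lines. Under the Poisson model the variance equals the mean; under the common NB parameterization the variance is mean + dispersion * mean^2, which is quadratic in the mean. The sketch below uses a simple method-of-moments estimate of the dispersion, which is an assumption of this example and not the fitting procedure used in the study; the counts are hypothetical.

```python
import statistics


def nb_dispersion(counts):
    """Method-of-moments NB dispersion: (variance - mean) / mean^2, floored at 0.

    A value of 0 means the counts are no more variable than a Poisson model
    with the same mean would predict.
    """
    mean = statistics.mean(counts)
    if mean <= 0:
        raise ValueError("mean count must be positive")
    variance = statistics.variance(counts)
    return max(0.0, (variance - mean) / mean ** 2)


# Hypothetical read counts for one gene.
across_subjects = [2, 4, 6, 8, 10]   # variance 10 exceeds the mean 6
across_reruns = [4, 5, 6, 7, 8]      # variance 2.5 does not exceed the mean 6

biological_dispersion = nb_dispersion(across_subjects)
technical_dispersion = nb_dispersion(across_reruns)
```

Here the counts across subjects are over-dispersed while the counts across reruns are not, matching the qualitative pattern the abstract reports for biological versus technical variation.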

17.
The QUEST system for quantitative analysis of two-dimensional gels   Total citations: 25 (self-citations: 0; citations by others: 25)
The strategies and methods used by the QUEST system for two-dimensional gel analysis are described, and the performance of the system is evaluated. Radiolabeled proteins, resolved on two-dimensional gels and detected using calibrated exposures to film, are quantified in units of disintegrations per minute or as a fraction of the total protein radioactivity applied to the gel. Spot quantitation and resolution of overlapping spots are performed by two-dimensional Gaussian fitting. Pattern matching is carried out for groups of gels called matchsets, and within each matchset every gel is matched to every other gel. During the matching process, spots are automatically added to each pattern at positions where unmatched spots were detected in other patterns. This results in enhanced accuracy for both spot detection and matching. The spot fitting procedure is repeated after matching. Tests show that up to 97% of spots in each pattern can be matched and that fewer than 1% of the spots are matched inconsistently. Approximately 2000 proteins are detected from typical gels. Of these, 1600 are high quality spots. Tests to measure the coefficient of variation of spot quantitation versus spot quality show that the average coefficient of variation for high quality spots is 21%. The intensities of the detected proteins range from 4 to 20,000 ppm of total protein synthesis. The QUEST analysis system has been used to build a quantitative database for the proteins of normal and transformed REF52 cells, as presented in the accompanying reports (Garrels, J., and Franza, B. R., Jr. (1989) J. Biol. Chem. 264, 5283-5298, 5299-5312).

18.

Background

The potential for astrocyte participation in central nervous system recovery is highlighted by in vitro experiments demonstrating their capacity to transdifferentiate into neurons. Understanding astrocyte plasticity could be advanced by comparing astrocytes with stem cells. RNA sequencing (RNA-seq) is ideal for comparing differences across cell types. However, this novel multi-stage process has the potential to introduce unwanted technical variation at several points in the experimental workflow. Quantitative understanding of the contribution of experimental parameters to technical variation would facilitate the design of robust RNA-Seq experiments.

Results

RNA-Seq was used to achieve biological and technical objectives. The biological aspect compared gene expression between normal human fetal-derived astrocytes and human neural stem cells cultured in identical conditions. When differential expression threshold criteria of |log2 fold change| > 2 were applied to the data, no significant differences were observed. The technical component quantified variation arising from particular steps in the research pathway, and compared the ability of different normalization methods to reduce unwanted variance. To facilitate this objective, a liberal false discovery rate of 10% and a |log2 fold change| > 0.5 were implemented for the differential expression threshold. Data were normalized with RPKM, TMM, and UQS methods using JMP Genomics. The contributions of key replicable experimental parameters (cell lot; library preparation; flow cell) to variance in the data were evaluated using principal variance component analysis. Our analysis showed that, although the variance for every parameter is strongly influenced by the normalization method, the largest contributor to technical variance was library preparation. The ability to detect differentially expressed genes was also affected by normalization; differences were only detected in non-normalized and TMM-normalized data.

Conclusions

The similarity in gene expression between astrocytes and neural stem cells supports the potential for astrocytic transdifferentiation into neurons, and emphasizes the need to evaluate the therapeutic potential of astrocytes for central nervous system damage. The choice of normalization method influences the contributions to experimental variance as well as the outcomes of differential expression analysis. However, irrespective of normalization method, our findings illustrate that library preparation contributed the largest component of technical variance.

19.
Near-isogenic sunflower lines containing 25% (inbred RHA280) and 48% (RHA801) oil by seed dry mass were comparatively analyzed in biological triplicate at 18 days after flowering using two-dimensional (both pI 3-10 and 4-7) Difference Gel Electrophoresis. Additionally, two inbred lines varying in oleic acid content, HA89 (18% oleic) and HA341 (89% oleic), were also analyzed in the same manner. Statistical analyses of these sunflower lines were performed beginning with fitting a mixed effects linear model to the log-transformed optical volume of each spot to account for gel variation, followed by testing the significance between varieties for mean transformed optical spot volumes. The p-values from the spot analysis procedures were then used to find the cutoff point for differential expression using a 10% false-discovery rate (FDR). Comparison of the oil content and oleic acid composition lines revealed 77 and 42 protein spots below the 10% FDR cutoff, respectively, which were therefore declared differentially expressed. Liquid chromatography-tandem mass spectrometry analysis of each of these protein spots resulted in assignments for 44 and 17 spots, respectively. Fructokinase, plastid phosphoglycerate kinase, and enolase proteins were determined to be up-regulated in the high oil line, while phosphofructokinase, cytosolic phosphoglucomutase, and cytosolic phosphoglycerate kinase were up-regulated in the low oil variety. Additionally, four activities involved in amino acid synthesis were up-regulated in the low oil variety in addition to 12S storage proteins and a protein similar to legumin storage protein. Interestingly, two 2-DE spots identified as 14-3-3 proteins were found to be up-regulated in the high oleic acid variety. Alteration of glycolytic and amino acid biosynthetic enzymes, as well as storage protein levels, suggests seed oil content is tightly linked to carbohydrate metabolism and protein synthesis in a complex manner.
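
The step of turning per-spot p-values into a 10% FDR cutoff can be sketched as follows. The abstract does not name the FDR procedure, so the Benjamini-Hochberg step-up method is assumed here purely for illustration; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans marking the p-values declared significant.

    Benjamini-Hochberg step-up rule: find the largest rank k (1-based, in
    ascending order of p-value) with p_(k) <= fdr * k / m, then declare the
    k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values for four protein spots.
calls = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], fdr=0.10)
```

Spots flagged `True` fall below the FDR cutoff and would be the ones declared differentially expressed under this assumed procedure.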

20.
Experimental variance is a major challenge when dealing with high-throughput sequencing data. This variance has several sources: sampling replication, technical replication, variability within biological conditions, and variability between biological conditions. The high per-sample cost of RNA-Seq often precludes the large number of experiments needed to partition observed variance into these categories as per standard ANOVA models. We show that the partitioning of within-condition to between-condition variation cannot reasonably be ignored, whether in single-organism RNA-Seq or in Meta-RNA-Seq experiments, and further find that commonly-used RNA-Seq analysis tools, as described in the literature, do not enforce the constraint that the sum of relative expression levels must be one, and thus report expression levels that are systematically distorted. These two factors lead to misleading inferences if not properly accommodated. As it is usually only the biological between-condition and within-condition differences that are of interest, we developed ALDEx, an ANOVA-like differential expression procedure, to identify genes with greater between- to within-condition differences. We show that the presence of differential expression and the magnitude of these comparative differences can be reasonably estimated with even very small sample sizes.
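
The constraint that relative expression levels sum to one has a practical consequence that a small example makes visible. The sketch below only converts counts to proportions; it is not the ALDEx procedure, and the function name and counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw counts to proportions that sum to one."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("total count must be positive")
    return [c / total for c in counts]


# Hypothetical counts for three genes in two samples. Only the first gene
# changes in absolute terms between the samples.
sample_a = relative_abundance([100, 100, 100])
sample_b = relative_abundance([400, 100, 100])
```

Because the proportions must sum to one, the second and third genes appear to drop from one third to one sixth in `sample_b` even though their counts are unchanged, which is the kind of distortion the abstract warns about when the constraint is ignored.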
