Similar literature
1.
Statistical tests for differential expression in cDNA microarray experiments (total citations: 13; self-citations: 0; citations by others: 13)
Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.
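A minimal sketch of the two-condition comparison for a single gene, assuming log-transformed expression values and the Welch (unequal-variance) form of the t statistic. The abstract does not say which t-test variant is meant, and the function name `welch_t` is hypothetical. Turning the statistic into a p-value needs the t distribution (for example from SciPy), which is left out to keep the block dependency-free.

```python
import math
import statistics


def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's two-sample t statistic comparing one gene in conditions x and y."""
    # statistics.variance uses the n - 1 denominator, so each condition
    # needs at least two replicates.
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Hypothetical log2 expression values for one gene, three replicates each.
control = [7.1, 7.3, 6.9]
treated = [8.0, 8.4, 8.2]
print(f"t = {welch_t(treated, control):.3f}")
```

In a real screen the same statistic is computed for every gene, which is why the multiple-testing and variance-estimation issues discussed in the other abstracts matter.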

2.
3.

Background

The potential for astrocyte participation in central nervous system recovery is highlighted by in vitro experiments demonstrating their capacity to transdifferentiate into neurons. Understanding astrocyte plasticity could be advanced by comparing astrocytes with stem cells. RNA sequencing (RNA-Seq) is well suited to comparing gene expression across cell types. However, this relatively new multi-stage process can introduce unwanted technical variation at several points in the experimental workflow. A quantitative understanding of how experimental parameters contribute to technical variation would facilitate the design of robust RNA-Seq experiments.

Results

RNA-Seq was used to achieve both biological and technical objectives. The biological aspect compared gene expression between normal human fetal-derived astrocytes and human neural stem cells cultured in identical conditions. When a differential expression threshold of |log2 fold change| > 2 was applied to the data, no significant differences were observed. The technical component quantified variation arising from particular steps in the research pathway and compared the ability of different normalization methods to reduce unwanted variance. For this objective, a liberal differential expression threshold was used: a false discovery rate of 10% and |log2 fold change| > 0.5. Data were normalized with RPKM (reads per kilobase per million mapped reads), TMM (trimmed mean of M-values), and UQS (upper quartile scaling) methods using JMP Genomics. The contributions of key replicable experimental parameters (cell lot, library preparation, and flow cell) to variance in the data were evaluated using principal variance component analysis. Although the variance attributed to every parameter was strongly influenced by the normalization method, the largest contributor to technical variance was library preparation. The ability to detect differentially expressed genes was also affected by normalization: differences were detected only in non-normalized and TMM-normalized data.
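Two of the normalizations named above are simple enough to sketch for one sample at a time, assuming raw read counts and gene lengths in base pairs. TMM involves trimming log-ratios against a reference sample and is better left to an existing implementation such as edgeR. The function names are hypothetical, and this is not the JMP Genomics implementation used in the study.

```python
import statistics


def rpkm(counts: list[int], lengths_bp: list[int]) -> list[float]:
    """Reads per kilobase of gene length per million mapped reads, for one sample."""
    total = sum(counts)
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def upper_quartile_scale(counts: list[int]) -> list[float]:
    """Divide one sample's counts by the 75th percentile of its nonzero counts."""
    nonzero = [c for c in counts if c > 0]
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = statistics.quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / q3 for c in counts]


# Hypothetical counts and gene lengths for five genes in one sample.
counts = [0, 10, 20, 30, 40]
lengths = [1500, 1000, 2000, 1000, 500]
print(rpkm(counts, lengths))
print(upper_quartile_scale(counts))
```

A full pipeline would also rescale the upper-quartile factors across samples so that normalized values stay on a comparable scale; the sketch leaves that step out.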

Conclusions

The similarity in gene expression between astrocytes and neural stem cells supports the potential for astrocytic transdifferentiation into neurons and emphasizes the need to evaluate the therapeutic potential of astrocytes for central nervous system damage. The choice of normalization method influences both the contributions to experimental variance and the outcomes of differential expression analysis. However, irrespective of normalization method, our findings show that library preparation contributed the largest component of technical variance.

4.
5.
6.

Background

Despite sharing the same genes, identical twins show substantial variability in behavioral traits and in their risk for disease. Epigenetic factors (DNA and chromatin modifications that affect levels of gene expression without altering the DNA sequence) are thought to be important in establishing this variability. Epigenetically mediated differences in gene expression levels that are associated with individual variability are traditionally thought to occur only in a gene-specific manner. We challenge this idea by exploring the large-scale organizational patterns of gene expression in an epigenetic model of behavioral variability.

Methodology/Findings

To study the effects of epigenetic influences on behavioral variability, we examine gene expression in genetically identical mice. Using a novel approach to microarray analysis, we show that variability in the large-scale organization of gene expression levels, rather than differences in the expression levels of specific genes, is associated with individual differences in behavior. Specifically, increased activity in the open field is associated with increased variance of log-transformed measures of gene expression in the hippocampus, a brain region involved in open field activity. Early life experience that increases adult activity in the open field also similarly modifies the variance of gene expression levels. The same association of the variance of gene expression levels with behavioral variability is found with levels of gene expression in the hippocampus of genetically heterogeneous outbred populations of mice, suggesting that variation in the large-scale organization of gene expression levels may also be relevant to phenotypic differences in outbred populations such as humans. We find that the increased variance in gene expression levels is attributable to an increasing separation of several large, log-normally distributed families of gene expression levels. We also show that the presence of these multiple log-normal distributions of gene expression levels is a universal characteristic of gene expression in eukaryotes. We use data from the MicroArray Quality Control Project (MAQC) to demonstrate that our method is robust and that it reliably detects biological differences in the large-scale organization of gene expression levels.
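One reading of "variance of log-transformed measures of gene expression" is the spread of log2 expression across all genes within a single sample; the sketch below computes that summary so two animals can be compared. This is an assumption about the summary statistic, not the authors' novel mixture-based method, and `log_expression_variance` is a hypothetical name.

```python
import math
import statistics


def log_expression_variance(levels: list[float]) -> float:
    """Population variance of log2 expression across all genes in one sample.

    Non-positive levels are dropped because their logarithm is undefined.
    """
    logs = [math.log2(v) for v in levels if v > 0]
    return statistics.pvariance(logs)


# Hypothetical expression levels for two animals (same genes, same order).
animal_a = [1.0, 2.0, 4.0, 8.0]
animal_b = [2.0, 2.0, 4.0, 4.0]
print(log_expression_variance(animal_a), log_expression_variance(animal_b))
```

Under the paper's interpretation, a larger value would reflect greater separation between the log-normal families of expression levels, which this single number cannot distinguish from other causes of spread.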

Conclusions

Our results contrast with the traditional belief that epigenetic effects on gene expression occur only at the level of specific genes and suggest instead that the large-scale organization of gene expression levels provides important insights into the relationship of gene expression with behavioral variability. Understanding the epigenetic, genetic, and environmental factors that regulate the large-scale organization of gene expression levels, and how changes in this large-scale organization influence brain development and behavior, will be a major future challenge in the field of behavioral genomics.

7.
Microarray experiments are being increasingly used in molecular biology. A common task is to detect genes with differential expression across two experimental conditions, such as two different tissues or the same tissue at two time points of biological development. To take proper account of statistical variability, some statistical approaches based on the t-statistic have been proposed. In constructing the t-statistic, one needs to estimate the variance of gene expression levels. With a small number of replicated array experiments, the variance estimation can be challenging. For instance, although the sample variance is unbiased, it may have large variability, leading to a large mean squared error. For duplicated array experiments, a new approach based on simple averaging has recently been proposed in the literature. Here we consider two more general approaches based on nonparametric smoothing. Our goal is to assess the performance of each method empirically. The three methods are applied to a colon cancer data set containing 2,000 genes. Using two arrays, we compare the variance estimates obtained from the three methods. We also consider their impact on the t-statistics. Our results indicate that the three methods give variance estimates close to each other. Due to its simplicity and generality, we recommend the use of the smoothed sample variance for data with a small number of replicates.
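The smoothing idea can be sketched by borrowing strength from genes with similar mean intensity. This sketch assumes a plain running mean over genes sorted by mean log-expression, with a hypothetical `window` parameter; the smoothers evaluated in the paper may differ (for example a kernel or loess smoother).

```python
import statistics


def smoothed_variances(replicates: list[list[float]], window: int = 3) -> list[float]:
    """Replace each gene's sample variance by a running mean over neighbouring genes.

    replicates[g] holds the replicate log-expression values of gene g (>= 2 each).
    Genes are ordered by mean expression; each gene's smoothed variance is the
    average sample variance of up to `window` genes centred on it in that order.
    """
    means = [statistics.mean(r) for r in replicates]
    variances = [statistics.variance(r) for r in replicates]
    order = sorted(range(len(replicates)), key=lambda g: means[g])
    half = window // 2
    smoothed = [0.0] * len(replicates)
    for rank, gene in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        neighbours = [variances[order[i]] for i in range(lo, hi)]
        smoothed[gene] = sum(neighbours) / len(neighbours)
    return smoothed


# Hypothetical duplicate arrays for three genes.
data = [[1.0, 3.0], [5.0, 5.0], [9.0, 13.0]]
print(smoothed_variances(data, window=3))
```

Note how the second gene, whose two replicates happen to agree exactly, no longer gets a variance of zero; that zero would otherwise inflate its t-statistic without bound.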

8.
9.
10.
A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model methods (GLMs) to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimates of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming common dispersion or separate genewise dispersion. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
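The compromise between common and genewise dispersion can be illustrated with a deliberately simplified sketch: each gene's negative binomial dispersion φ is estimated by the method of moments from var = μ + φμ², then pulled toward the common (average) dispersion with a weight set by a hypothetical `prior_df`. edgeR itself uses weighted adjusted profile likelihood rather than moments and accounts for library sizes, which this sketch ignores; use the package for real analyses.

```python
import statistics


def moment_dispersion(counts: list[float]) -> float:
    """Method-of-moments NB dispersion: solve var = mu + phi * mu**2 for phi."""
    mu = statistics.mean(counts)
    if mu == 0:
        return 0.0
    var = statistics.variance(counts)
    # Negative estimates (less variable than Poisson) are truncated at zero.
    return max(0.0, (var - mu) / mu**2)


def shrink_dispersions(gene_counts: list[list[float]], prior_df: float = 10.0) -> list[float]:
    """Pull each genewise dispersion toward the common (mean) dispersion.

    prior_df is a hypothetical tuning weight: larger values shrink harder.
    The weight on the common value is prior_df / (prior_df + residual df).
    """
    genewise = [moment_dispersion(c) for c in gene_counts]
    common = statistics.mean(genewise)
    shrunk = []
    for counts, phi in zip(gene_counts, genewise):
        w = prior_df / (prior_df + len(counts) - 1)
        shrunk.append(w * common + (1 - w) * phi)
    return shrunk


# Hypothetical counts for two genes across four biological replicates.
genes = [[10.0, 10.0, 10.0, 10.0], [5.0, 15.0, 5.0, 15.0]]
print(shrink_dispersions(genes, prior_df=3.0))
```

Setting `prior_df` very large recovers the common-dispersion extreme, and setting it to zero recovers pure genewise estimates, mirroring the two extremes the abstract mentions.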

11.
We report an analysis of allele-specific expression (ASE) and parent-of-origin expression in adult mouse liver using next-generation sequencing (RNA-Seq) of reciprocal crosses of heterozygous F1 mice from the parental strains C57BL/6J and DBA/2J. We found a 60% overlap between genes exhibiting ASE and putative cis-acting expression quantitative trait loci (cis-eQTL) identified in an intercross between the same strains. We discuss the various biological and technical factors that contribute to the differences. We also identify genes exhibiting parental imprinting and complex expression patterns. Our study demonstrates the importance of biological replicates to limit the number of false positives with RNA-Seq data.
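Allele-specific expression at a heterozygous SNP is often assessed by asking whether reads carrying the two parental alleles depart from a 50:50 split. The sketch below assumes an exact two-sided binomial test for that question; the abstract does not name the test the authors used, and `binomial_two_sided_p` is a hypothetical name.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k reads from one allele out of n, p = 0.5."""
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # Sum every outcome no more likely than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical SNP: 30 of 40 reads carry the C57BL/6J allele.
print(binomial_two_sided_p(30, 40))
```

Pooling reads across animals before a test like this treats them as independent draws and ignores biological variation, which is one route to the false positives the abstract warns about.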

12.
Although there is a substantial body of work on how temperature shapes coastal marine ecosystems, the spatiotemporal variability of seawater pH and corresponding in situ biological responses remain largely unknown across biogeographic ranges of tropical coral species. Environmental variability is important to characterize because it can amplify or dampen the biological consequences of global change, depending on the functional relationship between mean temperature or pH and organismal traits. Here, we characterize the spatiotemporal variability of pH, temperature, and salinity at fringing reefs in Moorea, French Polynesia, and Nanwan Bay, Taiwan, using advanced time series analysis, including wavelet analysis, and infer their potential impact on the persistence and stability of coral populations. Our results demonstrate that both the mean and variance of pH and temperature differed significantly between sites in Moorea and Taiwan. Seawater temperature at the Moorea site passed the local bleaching threshold several times within the ~45-day deployment, while aragonite saturation state at the Taiwan site was often below commonly observed levels for coral reefs. Our results showcase how a better understanding of the differences in environmental conditions between sites can (1) provide an important frame of reference for designing laboratory experiments to study the effects of environmental variability, (2) identify the proximity of current environmental conditions to predicted biological thresholds for the coral reef, and (3) help predict when the temporal variability and mean of environmental conditions will interact synergistically or antagonistically to alter the abundance and stability of marine populations experiencing climate change.

13.
14.
Variations in gene expression level might lead to phenotypic diversity across individuals or populations. Although many human genes are found to have differential mRNA levels between populations, the extent to which gene expression varies within and between populations remains largely elusive. To investigate the dynamic range of gene expression, we analyzed the expression variability of ∼18,000 human genes across individuals within HapMap populations. Although ∼20% of human genes show differentiated mRNA levels between populations, our results show that the expression variability of most human genes in one population does not deviate significantly from that in another population, except for a small fraction that show substantially higher expression variability in a particular population. By associating expression variability with sequence polymorphism, we found, intriguingly, that SNPs in the untranslated regions (5′ and 3′UTRs) of these variable genes show consistently elevated population heterozygosity. We performed differential expression analysis on a genome-wide scale and found substantially reduced expression variability for a large number of genes, preventing them from being differentially expressed between populations. Functional analysis revealed that genes with the greatest within-population expression variability are significantly enriched for chemokine signaling in HIV-1 infection, and for HIV-interacting proteins that control viral entry, replication, and propagation. This observation, combined with the finding that known human HIV host factors show substantially elevated expression variability, suggests that gene expression variability might explain differential HIV susceptibility across individuals.

15.
We describe biological and experimental factors that induce variability in reporter ion peak areas obtained from iTRAQ experiments. We demonstrate how these factors can be incorporated into a statistical model for use in evaluating differential protein expression and highlight the benefits of using analysis of variance to quantify fold change. We demonstrate the model's utility based on an analysis of iTRAQ data derived from a spike-in study.
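To show why an analysis-of-variance model helps when quantifying fold change, here is a sketch assuming a balanced design in which every peptide is quantified in every reporter channel. Fitting log2(area) = peptide + channel + error removes peptide-level intensity differences, and the log2 fold change between two channels is the difference of their channel effects. The function and argument names are hypothetical, and the model is much simpler than the one described in the paper.

```python
import math
import statistics


def anova_log2_fold_change(areas: list[list[float]], a: int, b: int) -> float:
    """Log2 fold change of reporter channel a over channel b.

    areas[i][j] is the reporter ion peak area of peptide i in channel j, with
    every peptide observed in every channel (balanced design). Under the
    additive model log2(area) = peptide + channel + error, the least-squares
    channel effects are the channel means of log2(area) minus the grand mean,
    so their difference is just the difference of the two channel means.
    """
    logs = [[math.log2(x) for x in row] for row in areas]
    channel_means = [statistics.mean(col) for col in zip(*logs)]
    return channel_means[a] - channel_means[b]


# Hypothetical: two peptides of very different intensity; channel 1 is about 2x channel 0.
areas = [[100.0, 210.0], [5000.0, 9800.0]]
print(anova_log2_fold_change(areas, a=1, b=0))
```

Averaging raw areas instead would let the intense second peptide dominate the estimate; working on the log scale with a peptide term gives each peptide an equal say.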

16.
Understanding the differences between microarray and RNA-Seq technologies for measuring gene expression is necessary for informed design of experiments and choice of data analysis methods. Previous comparisons have come to sometimes contradictory conclusions, which we suggest result from a lack of attention to the intensity-dependent nature of variation generated by the technologies. To examine this, we carried out a parallel nested experiment performed simultaneously on the two technologies that systematically split variation into four stages (treatment, biological variation, library preparation, and chip/lane noise), allowing a separation and comparison of the sources of variation in a well-controlled cellular system, Saccharomyces cerevisiae. With this novel dataset, we demonstrate that power and accuracy are more dependent on per-gene read depth in RNA-Seq than they are on fluorescence intensity in microarrays. However, quantitative PCR validations that we carried out indicate that microarrays may show greater systematic bias for low-intensity genes than RNA-Seq does.

17.

Introduction

A central issue in the design of microarray-based analysis of global gene expression is that variability resulting from experimental processes may obscure changes resulting from the effect being investigated. This study quantified the variability in gene expression at each level of a typical in vitro stimulation experiment using human peripheral blood mononuclear cells (PBMC). The primary objective was to determine the magnitude of biological and technical variability relative to the effect being investigated, namely gene expression changes resulting from stimulation with lipopolysaccharide (LPS).

Methods and Results

Human PBMC were stimulated in vitro with LPS, with replication at 5 levels: 5 subjects, each sampled on 2 separate days, with technical replication of LPS stimulation, amplification, and hybridisation. RNA from LPS-stimulated and unstimulated samples was hybridised against a common reference RNA on oligonucleotide microarrays. Gene expression was more closely correlated between replicate hybridisations (0.86–0.93) than between different subjects (0.66–0.78). Deconstruction of the variability at each level of the experimental process showed that technical variability (standard deviation (SD) 0.16) was greater than biological variability (SD 0.06), although both were low (SD < 0.1 for every individual component). Gene expression varied both at baseline and after stimulation with LPS, and differences in the proportions of cell subsets within the PBMC were likely partly responsible for this. However, gene expression changes after stimulation with LPS were much greater than the variability from any source, either individually or combined.
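The deconstruction of variability into levels can be illustrated for the simplest two-level case (subjects, and replicates within subject) with the classic method-of-moments estimates from a balanced one-way random-effects ANOVA. This is an assumed, simplified stand-in for the five-level nested design in the study, and `variance_components` is a hypothetical name.

```python
import statistics


def variance_components(groups: list[list[float]]) -> tuple[float, float]:
    """Method-of-moments variance components for a balanced one-way random design.

    groups[i] holds the replicate measurements (e.g. repeat hybridisations)
    for subject i. Returns (between_subject_variance, within_subject_variance).
    """
    a, n = len(groups), len(groups[0])
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 subjects with equal numbers (>= 2) of replicates")
    means = [statistics.mean(g) for g in groups]
    grand = statistics.mean(means)  # equals the overall mean when balanced
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (a * (n - 1))
    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    # E[MS_between] = within + n * between, so solve for between (floored at 0).
    return max(0.0, (ms_between - ms_within) / n), ms_within


# Hypothetical log-ratios for one gene: two subjects, duplicate hybridisations each.
between, within = variance_components([[1.0, 3.0], [5.0, 7.0]])
print(between, within)
```

Taking square roots of the two components gives standard deviations on the same footing as the SDs reported in the abstract; extending to more nested levels repeats the same expected-mean-square reasoning one level at a time.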

Conclusions

Variability in gene expression was very low and likely to improve further as technical advances are made. The finding that stimulation with LPS has a markedly greater effect on gene expression than the degree of variability provides confidence that microarray-based studies can be used to detect changes in gene expression of biological interest in infectious diseases.

18.
MOTIVATION: DNA microarrays are now capable of providing genome-wide patterns of gene expression across many different conditions. The first level of analysis of these patterns requires determining whether observed differences in expression are significant or not. Current methods are unsatisfactory due to the lack of a systematic framework that can accommodate noise, variability, and low replication often typical of microarray data. RESULTS: We develop a Bayesian probabilistic framework for microarray data analysis. At the simplest level, we model log-expression values by independent normal distributions, parameterized by corresponding means and variances with hierarchical prior distributions. We derive point estimates for both parameters and hyperparameters, and regularized expressions for the variance of each gene by combining the empirical variance with a local background variance associated with neighboring genes. An additional hyperparameter, inversely related to the number of empirical observations, determines the strength of the background variance. Simulations show that these point estimates, combined with a t-test, provide a systematic inference approach that compares favorably with simple t-test or fold methods, and partly compensate for the lack of replication.
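The regularized variance can be sketched as a weighted combination of a gene's empirical variance s² and a background variance σ0², assuming the Cyber-T form σ² = (ν0σ0² + (n − 1)s²)/(ν0 + n − 2), where ν0 is the hyperparameter that sets the weight of the background. How σ0² is pooled from neighbouring genes is left as an input here, and the function names and the default ν0 are hypothetical.

```python
import math
import statistics


def regularized_variance(values: list[float], background_var: float, nu0: float) -> float:
    """Cyber-T-style variance: (nu0 * s0^2 + (n - 1) * s^2) / (nu0 + n - 2)."""
    n = len(values)
    if nu0 + n - 2 <= 0:
        raise ValueError("nu0 + n must exceed 2")
    s2 = statistics.variance(values)  # needs n >= 2
    return (nu0 * background_var + (n - 1) * s2) / (nu0 + n - 2)


def regularized_t(x: list[float], y: list[float], bg_x: float, bg_y: float,
                  nu0: float = 8.0) -> float:
    """t statistic that uses regularized variances in place of sample variances."""
    vx = regularized_variance(x, bg_x, nu0)
    vy = regularized_variance(y, bg_y, nu0)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx / len(x) + vy / len(y))


# Hypothetical duplicate log-expression values whose replicates agree almost perfectly.
x, y = [8.0, 8.01], [7.0, 7.01]
print(regularized_t(x, y, bg_x=0.05, bg_y=0.05))
```

With duplicates like these the ordinary t statistic would be enormous because s² is tiny; the background term keeps the denominator realistic, which is the sense in which the method partly compensates for low replication.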

19.
We studied the general problem of interpreting and detecting differences in phenotypic variability among the genotypes at a locus, from both a biological and a statistical point of view. The scales on which we measure interval-scale quantitative traits are man-made and have little intrinsic biological relevance. Before claiming a biological interpretation for genotype differences in variance, we should be sure that no monotonic transformation of the data can reduce or eliminate these differences. We show theoretically that for an autosomal diallelic SNP, when the three corresponding means are distinct so that the variance can be expressed as a quadratic function of the mean, there implicitly exists a transformation that will tend to equalize the three variances; we also demonstrate how to find a transformation that will do this. We investigate the validity of Bartlett’s test, Box’s modification of it, and a modified Levene’s test to test for differences in variances when normality does not hold. We find that, although they may detect differences in variability, these tests do not necessarily detect differences in variance. The same is true for permutation tests that use these three statistics.
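One common version of the modified Levene's test (the Brown-Forsythe variant) replaces each observation by its absolute deviation from its group median and runs a one-way ANOVA F test on those deviations. The sketch below computes that F statistic for, say, the three genotype groups at a SNP; it is an assumption that this is the modification the authors studied, and the p-value (from an F distribution with k − 1 and N − k degrees of freedom) is omitted. SciPy's `scipy.stats.levene` with `center='median'` computes the same statistic.

```python
import statistics


def brown_forsythe_f(groups: list[list[float]]) -> float:
    """Modified Levene (Brown-Forsythe) F statistic for equal variability across groups.

    Each value is replaced by its absolute deviation from its group median;
    the statistic is the one-way ANOVA F ratio computed on those deviations.
    """
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(x - med) for x in g])
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = sum(sum(g) for g in z) / n_total
    means = [sum(g) / len(g) for g in z]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(z, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical trait values for the three genotypes (AA, AB, BB) at one SNP.
aa = [10.1, 10.4, 9.8, 10.0]
ab = [11.0, 12.1, 10.2, 11.5]
bb = [12.0, 14.5, 10.9, 13.8]
print(brown_forsythe_f([aa, ab, bb]))
```

In this made-up example the spread grows with the genotype mean, which is exactly the situation the abstract cautions about: a transformation of the trait scale (for example a logarithm) might shrink or remove the apparent difference in variance.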

20.