Similar literature
Found 20 similar articles (search time: 31 ms)
1.
To facilitate collaborative research efforts between multi-investigator teams using DNA microarrays, we identified sources of error and data variability between laboratories and across microarray platforms, and methods to accommodate this variability. RNA expression data were generated in seven laboratories, which compared two standard RNA samples using 12 microarray platforms. At least two standard microarray types (one spotted, one commercial) were used by all laboratories. Reproducibility for most platforms within any laboratory was typically good, but reproducibility between platforms and across laboratories was generally poor. Reproducibility between laboratories increased markedly when standardized protocols were implemented for RNA labeling, hybridization, microarray processing, data acquisition and data normalization. Reproducibility was highest when analysis was based on biological themes defined by enriched Gene Ontology (GO) categories. These findings indicate that microarray results can be comparable across multiple laboratories, especially when a common platform and set of procedures are used.

2.
Non-biological experimental variation or "batch effects" are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (> 25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
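To make the idea of "adjusting for batch effects" concrete, here is a minimal sketch of the simplest possible adjustment: moving each batch's mean for one gene onto the overall mean. It is not the empirical Bayes method proposed in the abstract (there is no shrinkage across genes and no scale adjustment). The function name `center_by_batch` and the toy values are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean

def center_by_batch(values, batches):
    """Remove per-batch mean shifts for a single gene.

    values  : expression values for one gene, one per sample
    batches : batch label for each sample (same length as values)
    Returns the values with every batch mean moved onto the overall mean.
    Illustrative only: plain mean-centering, not empirical Bayes.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    overall = mean(values)
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]

# Toy gene measured in two batches; batch "b" is shifted upwards by 10.
adjusted = center_by_batch([1.0, 2.0, 11.0, 12.0], ["a", "a", "b", "b"])
```

With very small batches the per-batch means are themselves noisy, which is the situation the abstract's empirical Bayes frameworks are designed to handle.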

3.
Chen C, Grennan K, Badner J, Zhang D, Gershon E, Jin L, Liu C. PloS one, 2011, 6(2): e17238
The expression microarray is a frequently used approach to study gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by "batch effects," the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an Empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.
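The probe-effect point in the last sentence can be illustrated with a small sketch. The toy matrix below is invented: each probe has its own baseline level, so any two samples correlate strongly across probes until each probe is standardized across samples. The helper names (`pearson`, `standardize_probes`, `column`) are assumptions for this example, and it is not the evaluation procedure used in the paper.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def standardize_probes(matrix):
    """Z-score each probe (row) across samples (columns)."""
    out = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out

def column(matrix, j):
    """Extract sample j (one value per probe)."""
    return [row[j] for row in matrix]

# Hypothetical data: 4 probes x 3 samples, probe baselines near 2, 6, 10, 14.
probes = [
    [2.1, 1.9, 2.0],
    [5.9, 6.1, 6.0],
    [10.1, 10.0, 9.9],
    [14.0, 14.1, 13.9],
]

# Correlation between sample 0 and sample 1, before and after standardizing.
raw_r = pearson(column(probes, 0), column(probes, 1))
std_r = pearson(column(standardize_probes(probes), 0),
                column(standardize_probes(probes), 1))
```

In this toy case `raw_r` is close to 1 purely because of the probe baselines, while `std_r` reflects only the sample-specific deviations.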

4.
MOTIVATION: There is a very large and growing level of effort toward improving the platforms, experiment designs, and data analysis methods for microarray expression profiling. Along with a growing richness in the approaches there is a growing confusion among most scientists as to how to make objective comparisons and choices between them for different applications. There is a need for a standard framework for the microarray community to compare and improve analytical and statistical methods. RESULTS: We report on a microarray data set comprising 204 in-situ synthesized oligonucleotide arrays, each hybridized with two-color cDNA samples derived from 20 different human tissues and cell lines. Design of the approximately 24 000 60mer oligonucleotides that report approximately 2500 known genes on the arrays, and design of the hybridization experiments, were carried out in a way that supports the performance assessment of alternative data processing approaches and of alternative experiment and array designs. We also propose standard figures of merit for success in detecting individual differential expression changes or expression levels, and for detecting similarities and differences in expression patterns across genes and experiments. We expect this data set and the proposed figures of merit will provide a standard framework for much of the microarray community to compare and improve many analytical and statistical methods relevant to microarray data analysis, including image processing, normalization, error modeling, combining of multiple reporters per gene, use of replicate experiments, and sample referencing schemes in measurements based on expression change. AVAILABILITY/SUPPLEMENTARY INFORMATION: Expression data and supplementary information are available at http://www.rii.com/publications/2003/HE_SDS.htm

5.
Z-score transformation has been successfully used as a normalisation procedure for microarray data generated using radioactively labelled probes with spotted cDNA arrays. One of the advantages of the z-score transformation method is that it provides a way of standardising data across a wide range of experiments and allows the comparison of microarray data independent of the original hybridisation intensities. The feasibility of applying z-score transformation to other types of linear microarray data, specifically that generated using fluorescently labelled probes with Affymetrix chips, was tested in three separate scenarios and is discussed here. In the first scenario, Affymetrix data from the NCBI (National Center for Biotechnology Information) GEO (Gene Expression Omnibus) database was used to demonstrate that z-score transformation preserved the essential phylogenetic grouping between primate species' fibroblast gene expression baseline measurements. The second scenario employed z-score transformation on data consisting of a series of genes spiked-in at known concentrations and arrayed in a Latin square format. We were able to reconstruct the entire set of spike-in concentration curves without prior knowledge of their format by using z-score transformation as the normalisation process. Finally, we show that z-score transformed data maintains the integrity of separate samples from different experiments and laboratories, as demonstrated by accurate grouping of clustered data according to sample identity. We conclude that data normalised by z-score transformation can be easily used with Affymetrix data without noticeable loss of information content. Z-score transformation provides a useful tool for comparisons between experiments and between laboratories that use the Affymetrix platform.
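A z-score transformation of one array can be sketched as follows. The abstract does not spell out the exact recipe, so two choices here are assumptions: intensities are log10-transformed first, and the sample standard deviation is used. The function name `zscore_transform` is hypothetical.

```python
from math import log10
from statistics import mean, stdev

def zscore_transform(intensities, use_log=True):
    """Z-score the intensities of a single array.

    intensities : positive raw intensities, one per gene or probe set
    use_log     : if True (an assumption), take log10 before standardizing
    Returns (value - array mean) / array standard deviation for each entry.
    """
    values = [log10(x) for x in intensities] if use_log else list(intensities)
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sd for v in values]

# Two hypothetical arrays that differ only by an overall 2x intensity factor.
z_a = zscore_transform([10.0, 100.0, 1000.0])
z_b = zscore_transform([20.0, 200.0, 2000.0])
```

Because each array is centred and scaled by its own mean and standard deviation, `z_a` and `z_b` agree despite the twofold difference in raw intensity, which is the sense in which the transformed data are independent of the original hybridisation intensities.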

6.
The widespread use of DNA microarrays has led to the discovery of many genes whose expression profile may have significant clinical relevance. The translation of this data to the bedside requires that gene expression be validated as protein expression, and that annotated clinical samples be available for correlative and quantitative studies to assess clinical context and usefulness of putative biomarkers. We review two microarray platforms developed to facilitate the clinical validation of candidate biomarkers: tissue microarrays and reverse-phase protein microarrays. Tissue microarrays are arrays of core biopsies obtained from paraffin-embedded tissues, which can be assayed for histologically-specific protein expression by immunohistochemistry. Reverse-phase protein microarrays consist of arrays of cell lysates or, more recently, plasma or serum samples, which can be assayed for protein quantity and for the presence of post-translational modifications such as phosphorylation. Although these platforms are limited by the availability of validated antibodies, both enable the preservation of precious clinical samples as well as experimental standardization in a high-throughput manner proper to microarray technologies. While tissue microarrays are rapidly becoming a mainstay of translational research, reverse-phase protein microarrays require further technical refinements and validation prior to their widespread adoption by research laboratories.

7.
The cDNA microarray is one technological approach that has the potential to accurately measure changes in global mRNA expression levels. We report an assessment of an optimized cDNA microarray platform to generate accurate, precise and reliable data consistent with the objective of using microarrays as an acquisition platform to populate gene expression databases. The study design consisted of two independent evaluations with 70 arrays from two different manufactured lots and used three human tissue sources as samples: placenta, brain and heart. Overall signal response was linear over three orders of magnitude and the sensitivity for any element was estimated to be 2 pg mRNA. The calculated coefficient of variation for differential expression for all non-differentiated elements was 12–14% across the entire signal range and did not vary with array batch or tissue source. The minimum detectable fold change for differential expression was 1.4. Accuracy, in terms of bias (observed minus expected differential expression ratio), was less than 1 part in 10 000 for all non-differentiated elements. The results presented in this report demonstrate the reproducible performance of the cDNA microarray technology platform and the methods provide a useful framework for evaluating other technologies that monitor changes in global mRNA expression.
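The two summary statistics quoted above, coefficient of variation and bias, can be computed as in this short sketch. The definitions follow the abstract's wording (bias as observed minus expected differential expression ratio). The function names and toy ratios are assumptions, and the use of the sample standard deviation is an illustrative choice.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100.0

def bias(observed_ratios, expected_ratio):
    """Mean observed differential expression ratio minus the expected ratio."""
    return mean(observed_ratios) - expected_ratio

# Hypothetical replicate ratios for a non-differentiated element (expected 1.0).
ratios = [0.9, 1.0, 1.1]
cv_percent = coefficient_of_variation(ratios)
ratio_bias = bias(ratios, 1.0)
```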

8.
Analysis of repeatability in spotted cDNA microarrays (total citations: 7; self-citations: 3; citations by others: 4)
We report a strategy for analysis of data quality in cDNA microarrays based on the repeatability of repeatedly spotted clones. We describe how repeatability can be used to control data quality by developing adaptive filtering criteria for microarray data containing clones spotted in multiple spots. We have applied the method to five publicly available cDNA microarray data sets and one previously unpublished data set from our own laboratory. The results demonstrate the feasibility of the approach as a foundation for data filtering, and indicate a high degree of variation in data quality, both across the data sets and between arrays within data sets.

9.
MOTIVATION: Most supervised classification methods are limited by the requirement for more cases than variables. In microarray data the number of variables (genes) far exceeds the number of cases (arrays), and thus filtering and pre-selection of genes is required. We describe the application of Between Group Analysis (BGA) to the analysis of microarray data. A feature of BGA is that it can be used when the number of variables (genes) exceeds the number of cases (arrays). BGA is based on carrying out an ordination of groups of samples, using a standard method such as Correspondence Analysis (COA), rather than an ordination of the individual microarray samples. As such, it can be viewed as a method of carrying out COA with grouped data. RESULTS: We illustrate the power of the method using two cancer data sets. In both cases, we can quickly and accurately classify test samples from any number of specified a priori groups and identify the genes which characterize these groups. We obtained very high rates of correct classification, as determined by jack-knife or validation experiments with training and test sets. The results are comparable to those from other methods in terms of accuracy but the power and flexibility of BGA make it an especially attractive method for the analysis of microarray cancer data.

10.
Over the last decade, gene expression microarrays have had a profound impact on biomedical research. The diversity of platforms and analytical methods available to researchers has made the comparison of data from multiple platforms challenging. In this study, we describe a framework for comparisons across platforms and laboratories. We have attempted to include nearly all the available commercial and 'in-house' platforms. Using probe sequences matched at the exon level improved consistency of measurements across the different microarray platforms compared to annotation-based matches. Generally, consistency was good for highly expressed genes, and variable for genes with lower expression values as confirmed by quantitative real-time (QRT)-PCR. Concordance of measurements was higher between laboratories on the same platform than across platforms. We demonstrate that, after stringent preprocessing, commercial arrays were more consistent than in-house arrays, and by most measures, one-dye platforms were more consistent than two-dye platforms.

11.
In order to engage their students in a core methodology of the new genomics era, an ever-increasing number of faculty at primarily undergraduate institutions are gaining access to microarray technology. Their students are conducting successful microarray experiments designed to address a variety of interesting questions. A next step in these teaching and research laboratory projects is often validation of the microarray data for individual selected genes. In the research community, this usually involves the use of real-time polymerase chain reaction (PCR), a technology that requires instrumentation and reagents that are prohibitively expensive for most undergraduate institutions. The results of a survey of faculty teaching undergraduates in classroom and research settings indicate a clear need for an alternative approach. We sought to develop an inexpensive and student-friendly gel electrophoresis-based PCR method for quantifying messenger RNA (mRNA) levels using undergraduate researchers as models for students in teaching and research laboratories. We compared the results for three selected genes measured by microarray analysis, real-time PCR, and the gel electrophoresis-based method. The data support the use of the gel electrophoresis-based method as an inexpensive, convenient, yet reliable alternative for quantifying mRNA levels in undergraduate laboratories.

12.
This article focuses on microarray experiments with two or more factors in which treatment combinations of the factors corresponding to the samples paired together onto arrays are not completely random. A main effect of one (or more) factor(s) is confounded with arrays (the experimental blocks). This is called a split-plot microarray experiment. We utilise an analysis of variance (ANOVA) model to assess differentially expressed genes for between-array and within-array comparisons that are generic under a split-plot microarray experiment. Instead of standard t- or F-test statistics that rely on mean square errors of the ANOVA model, we use a robust method, referred to as 'a pooled percentile estimator', to identify genes that are differentially expressed across different treatment conditions. We illustrate the design and analysis of split-plot microarray experiments based on a case application described by Jin et al. A brief discussion of power and sample size for split-plot microarray experiments is also presented.

13.
14.
Jeffrey T. Leek. Biometrics, 2011, 67(2): 344-352
Summary: High-dimensional data, such as those obtained from a gene expression microarray or second generation sequencing experiment, consist of a large number of dependent features measured on a small number of samples. One of the key problems in genomics is the identification and estimation of factors that associate with many features simultaneously. Identifying the number of factors is also important for unsupervised statistical analyses such as hierarchical clustering. A conditional factor model is the most common model for many types of genomic data, ranging from gene expression, to single nucleotide polymorphisms, to methylation. Here we show that under a conditional factor model for genomic data with a fixed sample size, the right singular vectors are asymptotically consistent for the unobserved latent factors as the number of features diverges. We also propose a consistent estimator of the dimension of the underlying conditional factor model for a finite fixed sample size and an infinite number of features based on a scaled eigen-decomposition. We propose a practical approach for selection of the number of factors in real data sets, and we illustrate the utility of these results for capturing batch and other unmodeled effects in a microarray experiment using the dependence kernel approach of Leek and Storey (2008, Proceedings of the National Academy of Sciences of the United States of America 105, 18718–18723).

15.
The utility of previously generated microarray data is severely limited owing to small study size, leading to under-powered analysis, and failure of replication. Multiplicity of platforms and various sources of systematic noise limit the ability to compile existing data from similar studies. We present a model for transformation of data across different generations of Affymetrix arrays, developed using previously published datasets describing technical replicates performed with two generations of arrays. The transformation is based upon a probe set-specific regression model, generated from replicate measurements across platforms, performed using correlation coefficients. The model, when applied to the expression intensities of 5069 shared, sequence-matched probe sets in three different generations of Affymetrix Human oligonucleotide arrays, showed significant improvement in inter-generation correlations between sample-wide means and individual probe set pairs. The approach was further validated by an observed reduction in Euclidean distance between signal intensities across generations for the predicted values. Finally, application of the model to independent, but related datasets resulted in improved clustering of samples based upon their biological, as opposed to technical, attributes. Our results suggest that this transformation method is a valuable tool for integrating microarray datasets from different generations of arrays.
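A probe set-specific regression of the kind described can be sketched as one least-squares line per probe set, fitted on technical replicates measured on both array generations. The abstract does not give the exact model, so ordinary least squares is an assumption here, and the probe set ID, function names and values are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired values."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values are constant; slope is undefined")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def fit_probe_set_models(old_gen, new_gen):
    """Fit one line per probe set shared by both generations.

    old_gen, new_gen : dict mapping probe set ID -> list of replicate
                       intensities, paired by replicate order
    """
    return {pid: fit_line(old_gen[pid], new_gen[pid])
            for pid in old_gen if pid in new_gen}

def transform(models, probe_set_id, old_value):
    """Map an old-generation intensity onto the new-generation scale."""
    slope, intercept = models[probe_set_id]
    return slope * old_value + intercept

# Hypothetical replicates for a single shared probe set.
models = fit_probe_set_models({"ps_0001": [1.0, 2.0, 3.0]},
                              {"ps_0001": [3.0, 5.0, 7.0]})
predicted = transform(models, "ps_0001", 4.0)
```

Fitting each probe set separately lets the mapping absorb probe set-specific differences between generations, at the cost of needing enough paired replicates per probe set.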

16.
Yang Y, Zhu M, Wu L, Zhou J. BMC genomics, 2008, 9(Z2): S5

Background

Using genomic DNA as common reference in microarray experiments has recently been tested by different laboratories. Conflicting results have been reported with regard to the reliability of microarray results using this method. To explain it, we hypothesize that data processing is a critical element that impacts the data quality.

Results

Microarray experiments were performed in a γ-proteobacterium Shewanella oneidensis. Pair-wise comparison of three experimental conditions was obtained either with two labeled cDNA samples co-hybridized to the same array, or by employing Shewanella genomic DNA as a standard reference. Various data processing techniques were exploited to reduce the amount of inconsistency between both methods and the results were assessed. We discovered that data quality was significantly improved by imposing the constraint of minimal number of replicates, logarithmic transformation and random error analyses.

Conclusion

These findings demonstrate that data processing significantly influences data quality, which provides an explanation for the conflicting evaluation in the literature. This work could serve as a guideline for microarray data analysis using genomic DNA as a standard reference.

17.
18.
Measurements of gene expression from microarray experiments are highly dependent on experimental design. Systematic noise can be introduced into the data at numerous steps. On Illumina BeadChips, multiple samples are assayed in an ordered series of arrays. Two experiments were performed using the same samples but different hybridization designs. An experiment confounding genotype with BeadChip and treatment with array position was compared to another experiment in which these factors were randomized to BeadChip and array position. An ordinal effect of array position on intensity values was observed in both experiments. We demonstrate that there is an increased rate of false-positive results in the confounded design and that attempts to correct for confounded effects by statistical modeling reduce power of detection for true differential expression. Simple analysis models without post hoc corrections provide the best results possible for a given experimental design. Normalization improved differential expression testing in both experiments but randomization was the most important factor for establishing accurate results. We conclude that lack of randomization cannot be corrected by normalization or by analytical methods. Proper randomization is essential for successful microarray experiments.
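Randomizing samples to BeadChip and array position can be sketched as a shuffle over all available slots. This is a simple complete randomization written for illustration; the study's actual assignment procedure is not described in the abstract, and the function name, sample labels and layout dimensions are assumptions.

```python
import random

def randomize_layout(samples, n_chips, positions_per_chip, seed=None):
    """Randomly assign samples to (chip, position) slots.

    samples            : list of sample labels
    n_chips            : number of chips available
    positions_per_chip : number of array positions on each chip
    seed               : optional seed so the layout can be reproduced
    Returns a dict mapping (chip, position) -> sample label.
    """
    slots = [(c, p) for c in range(n_chips) for p in range(positions_per_chip)]
    if len(samples) > len(slots):
        raise ValueError("more samples than available array positions")
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Slots are filled in order; any surplus slots are left unassigned.
    return dict(zip(slots, shuffled))

# Hypothetical design: 2 genotypes x 2 treatments x 3 replicates on 2 chips.
samples = [f"{g}_{t}_{r}" for g in ("gt1", "gt2")
           for t in ("ctl", "trt") for r in (1, 2, 3)]
layout = randomize_layout(samples, n_chips=2, positions_per_chip=6, seed=0)
```

Recording the seed alongside the layout keeps the design reproducible without tying genotype to chip or treatment to array position.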

19.
20.
Microarrays are used to study gene expression in a variety of biological systems. A number of different platforms have been developed, but few studies exist that have directly compared the performance of one platform with another. The goal of this study was to determine array variation by analyzing the same RNA samples with three different array platforms. Using gene expression responses to benzo[a]pyrene exposure in normal human mammary epithelial cells (NHMECs), we compared the results of gene expression profiling using three microarray platforms: photolithographic oligonucleotide arrays (Affymetrix), spotted oligonucleotide arrays (Amersham), and spotted cDNA arrays (NCI). While most previous reports comparing microarrays have analyzed pre-existing data from different platforms, this comparison study used the same sample assayed on all three platforms, allowing for analysis of variation from each array platform. In general, poor correlation was found with corresponding measurements from each platform. Each platform yielded different gene expression profiles, suggesting that while microarray analysis is a useful discovery tool, further validation is needed to extrapolate results for broad use of the data. Also, microarray variability needs to be taken into consideration, not only in the data analysis but also in specific probe selection for each array type.
