20 similar articles found (search time: 17 ms)


There are currently many different methods for processing and summarizing probe-level data from Affymetrix oligonucleotide arrays. It is of great interest to validate these methods and identify those that are most effective. There is no single best way to do this validation, and a variety of approaches is needed. Moreover, gene expression data are collected to answer a variety of scientific questions, and the same method may not be best for all questions. Only a handful of validation studies have been done so far, most of which rely on spike-in datasets and focus on the question of detecting differential expression. Here we seek methods that excel at estimating relative expression. We evaluate methods by identifying those that give the strongest linear association between expression measurements by array and the "gold-standard" assay.

Preprocessing of oligonucleotide array data   Total citations: 18, self-citations: 0, cited by others: 18
Wu Z  Irizarry RA 《Nature biotechnology》2004,22(6):656-8; author reply 658

MOTIVATION: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. RESULTS: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. AVAILABILITY: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. SUPPLEMENTARY INFORMATION: Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
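
The abstract above does not name its three complete data methods. As an illustration only, the following minimal sketch shows quantile normalization, a widely used technique that builds the normalizing relation from all arrays at once; treating it as one of the methods in the abstract is an assumption. The function name `quantile_normalize` and the list-of-lists input layout are hypothetical choices made for this sketch.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Give every array the same empirical intensity distribution.

    arrays: list of equal-length lists, one list of probe intensities per array.
    Ties are broken by probe position, which is a simplification.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # For each array, the probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n_probes), key=a.__getitem__) for a in arrays]

    # Reference distribution: the mean across arrays at each rank.
    reference = [
        mean(a[order[rank]] for a, order in zip(arrays, orders))
        for rank in range(n_probes)
    ]

    # Each probe receives the reference value for its rank within its own array.
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized
```

Because the reference distribution is computed from every array, no single array has to be chosen as a baseline, which is the property the abstract attributes to complete data methods.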

Microarray gene-expression profiles are generally validated one gene at a time by real-time RT-PCR. We describe here a different approach based on simultaneous mutual validation of large numbers of genes using two different expression-profiling platforms. The result described here for the NCI-60 cancer cell lines is a consensus set of genes that give similar profiles on spotted cDNA arrays and Affymetrix oligonucleotide chips. Global concordance is parameterized by a 'correlation of correlations' coefficient.
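
The abstract does not define the 'correlation of correlations' coefficient. One plausible reading, sketched below as an assumption, is to compute every gene-gene Pearson correlation within each platform and then correlate the two resulting sets of coefficients. The function names and the dictionary input layout are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_of_correlations(platform_a, platform_b):
    """Concordance of the gene-gene correlation structure on two platforms.

    platform_a, platform_b: dicts mapping gene name to a list of expression
    values, measured over the same samples in the same order.
    Needs at least three shared genes so that several gene pairs exist.
    """
    genes = sorted(set(platform_a) & set(platform_b))
    pairs = list(combinations(genes, 2))
    corr_a = [pearson(platform_a[g], platform_a[h]) for g, h in pairs]
    corr_b = [pearson(platform_b[g], platform_b[h]) for g, h in pairs]
    return pearson(corr_a, corr_b)
```

A value near 1 would indicate that pairs of genes that co-vary on one platform also co-vary on the other, even if the two platforms report intensities on different scales.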

cDNA arrays allow quantitative measurement of expression levels for thousands of genes simultaneously. The measurements are affected by many sources of variation, and substantial improvements in the precision of estimated effects accompany adjustments for these effects. Two generic nuisance variations, one associated with the magnitude of expression and the other associated with array location, are common in data from filter arrays. Procedures, like normalization using lowess regression, are effective at reducing variation associated with magnitude, and they have been widely adopted. However, variation associated with location has received less attention. Here, a simple, but effective method based on localized median is expounded for dealing with these nuisance effects, and its properties are discussed. The proposed methodology handles location-dependent variation ("splotches") and magnitude-dependent variation (background and/or saturation) effectively. The procedure is related to lowess when implemented to adjust magnitude-dependent variation, and it performs similarly. The proposed methodology is illustrated with data from the National Center for Toxicological Research (NCTR), where treatment differences in levels of mRNA from rat hepatocytes were assessed using 33P-labeled samples hybridized to cDNA spotted arrays. Normalizing intensities by the median-of-subsets removes systematic variation associated with the location of a gene on the array and/or the level of its expression. This procedure is easy to implement using iteratively reweighted least-squares algorithms. Although less sophisticated than lowess, this procedure works nearly as well for normalizing intensities based upon their magnitude. Unlike lowess, it can adjust for location-dependent effects.
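
The idea of adjusting for location with a localized median can be sketched as follows. This is a simplified illustration, not the authors' procedure: it computes window medians directly rather than through iteratively reweighted least squares, and the function name, the square window, and the `radius` parameter are all assumptions made for the example.

```python
from statistics import median


def local_median_normalize(grid, radius=1):
    """Remove smooth location-dependent trends from a grid of log intensities.

    grid: list of rows, each a list of log intensities laid out as on the array.
    radius: half-width of the square neighbourhood around each spot
            (a hypothetical tuning parameter).
    Each spot has its neighbourhood median subtracted, and the array-wide
    median is added back so the overall level is preserved.
    """
    rows, cols = len(grid), len(grid[0])
    overall = median(value for row in grid for value in row)

    normalized = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            # Neighbourhood is clipped at the array edges.
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            new_row.append(grid[r][c] - median(window) + overall)
        normalized.append(new_row)
    return normalized
```

The same subtraction could be applied to subsets defined by intensity rather than by position, which is how the abstract describes handling magnitude-dependent variation.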

Strehlow D 《BioTechniques》2000,29(1):118-121
Software is described that facilitates the analysis of phosphoimages from large array hybridizations. The Macintosh PowerPC-compatible application and its manual are available at no charge from http://people.bu.edu/strehlow. The software is compatible with both custom formats and array filters from three commercial manufacturers. It allows the rapid quantitation of every spot on images of hybridizations to large arrays. The user drags grids of squares over the spots on the image to define the coordinates of each spot, then aligns and edits the position of the grid. The software then corrects the positions as necessary and quantitates up to 27,000 spots per image. It stores the numerical values for each signal in a format called the fingerprint file. Fingerprint files can be directly averaged or compared, allowing the user to find mean values or differences in data from independent hybridization experiments. Data can be recalled from the fingerprint file and can be output in a variety of spreadsheet formats with several options for background correction. Finally, the software offers an output format that allows the convenient visualization of data points using animated, three-dimensional graphs.

New normalization methods for cDNA microarray data   Total citations: 7, self-citations: 0, cited by others: 7
MOTIVATION: The focus of this paper is on two new normalization methods for cDNA microarrays. After the image analysis has been performed on a microarray and before differentially expressed genes can be detected, some form of normalization must be applied to the microarrays. Normalization removes biases towards one or other of the fluorescent dyes used to label each mRNA sample allowing for proper evaluation of differential gene expression. RESULTS: The two normalization methods that we present here build on previously described non-linear normalization techniques. We extend these techniques by firstly introducing a normalization method that deals with smooth spatial trends in intensity across microarrays, an important issue that must be dealt with. Secondly we deal with normalization of a new type of cDNA microarray experiment that is coming into prevalence, the small scale specialty or 'boutique' array, where large proportions of the genes on the microarrays are expected to be highly differentially expressed. AVAILABILITY: The normalization methods described in this paper are available via http://www.pi.csiro.au/gena/ in a software suite called tRMA: tools for R Microarray Analysis upon request of the authors. Images and data used in this paper are also available via the same link.

An objective of many functional genomics studies is to estimate treatment-induced changes in gene expression. cDNA arrays interrogate each tissue sample for the levels of mRNA for hundreds to tens of thousands of genes, and the use of this technology leads to a multitude of treatment contrasts. By-gene hypothesis tests evaluate the evidence supporting no effect, but selecting a significance level requires dealing with the multitude of comparisons. The p-values from these tests order the genes such that a p-value cutoff divides the genes into two sets. Ideally one set would contain the affected genes and the other would contain the unaffected genes. However, the set of genes selected as affected will have false positives, i.e., genes that are not affected by treatment. Likewise, the other set of genes, selected as unaffected, will contain false negatives, i.e., genes that are affected. A plot of the observed p-values (1 - p) versus their expectation under a uniform [0, 1] distribution allows one to estimate the number of true null hypotheses. With this estimate, the false positive rates and false negative rates associated with any p-value cutoff can be estimated. When computed for a range of cutoffs, these rates summarize the ability of the study to resolve effects. In our work, we are more interested in selecting most of the affected genes than in protecting against a few false positives. An optimum cutoff, i.e., the best set given the data, depends upon the relative cost of falsely classifying a gene as affected versus the cost of falsely classifying a gene as unaffected. We select the cutoff by a decision-theoretic method analogous to methods developed for receiver operating characteristic curves. In addition, we estimate the false discovery rate and the false nondiscovery rate associated with any cutoff value.
Two functional genomics studies that were designed to assess a treatment effect are used to illustrate how the methods allowed the investigators to determine a cutoff to suit their research goals.
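
The abstract estimates the number of true null hypotheses from a plot of 1 - p against its uniform expectation. The sketch below swaps in a simpler, commonly used threshold-based estimator (count the p-values above a threshold and rescale), so it illustrates the reasoning rather than the authors' plot-based procedure. The function names and the threshold parameter `lam` are hypothetical.

```python
def estimate_true_nulls(p_values, lam=0.5):
    """Estimate how many hypotheses are truly null.

    Assumption: p-values above `lam` come almost entirely from true nulls,
    which are uniform on [0, 1], so their count is about m0 * (1 - lam).
    """
    above = sum(1 for p in p_values if p > lam)
    return min(float(len(p_values)), above / (1.0 - lam))


def error_rates_at_cutoff(p_values, cutoff, lam=0.5):
    """Estimated error counts and rates when genes with p <= cutoff are selected."""
    m = len(p_values)
    m0 = estimate_true_nulls(p_values, lam)
    selected = sum(1 for p in p_values if p <= cutoff)

    # True nulls are uniform, so about m0 * cutoff of them fall below the cutoff.
    false_pos = min(float(selected), m0 * cutoff)
    # Affected genes not selected: estimated affected minus estimated true positives.
    false_neg = max(0.0, (m - m0) - (selected - false_pos))

    unselected = m - selected
    return {
        "true_nulls": m0,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "fdr": false_pos / selected if selected else 0.0,
        "fndr": false_neg / unselected if unselected else 0.0,
    }
```

Evaluating `error_rates_at_cutoff` over a range of cutoffs gives the kind of summary the abstract describes, from which a cutoff can be chosen according to the relative cost of the two error types.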



Affymetrix oligonucleotide arrays simultaneously measure the abundances of thousands of mRNAs in biological samples. Comparability of array results is necessary for the creation of large-scale gene expression databases. The standard strategy for normalizing oligonucleotide array readouts has practical drawbacks. We describe alternative normalization procedures for oligonucleotide arrays based on a common pool of known biotin-labeled cRNAs spiked into each hybridization.

MOTIVATION: Analysis of oligonucleotide array data, especially to select genes of interest, is a highly challenging task because of the large volume of information and various experimental factors. Moreover, interaction effects (i.e. expression changes that depend on probe effects) complicate the analysis because current methods often use an additive model to analyze data. We propose an approach to address these issues with the aim of producing a more reliable selection of differentially expressed genes. The approach uses the rank for normalization, employs the percentile-range to measure expression variation, and applies various filters to monitor expression changes. RESULTS: We compare our approach with MAS and Dchip models. A data set from an angiogenesis study is used for illustration. Results show that our approach performs better than other methods either in identification of the positive control gene or in PCR confirmatory tests. In addition, the invariant set of genes in our approach provides an efficient way for normalization.

High-density single nucleotide polymorphism microarrays (SNP chips) provide information on a subject's genome, such as copy number and genotype (heterozygosity/homozygosity) at a SNP. While fluorescence in situ hybridization and karyotyping reveal many abnormalities, SNP chips provide a higher resolution map of the human genome that can be used to detect, e.g., aneuploidies, microdeletions, microduplications and loss of heterozygosity (LOH). As a variety of diseases are linked to such chromosomal abnormalities, SNP chips promise new insights for these diseases by aiding in the discovery of such regions, and may suggest targets for intervention. The R package SNPchip contains classes and methods useful for storing, visualizing and analyzing high density SNP data. Originally developed from the SNPscan web-tool, SNPchip utilizes S4 classes and extends other open source R tools available at Bioconductor. This has numerous advantages, including the ability to build statistical models for SNP-level data that operate on instances of the class, and to communicate with other R packages that add additional functionality. AVAILABILITY: The package is available from the Bioconductor web page at www.bioconductor.org. SUPPLEMENTARY INFORMATION: The supplementary material as described in this article (case studies, installation guidelines and R code) is available from http://biostat.jhsph.edu/~iruczins/publications/sm/

Microchip arrays have become one of the most rapidly growing techniques for monitoring gene expression at the genomic level and thereby gaining valuable insight about various important biological mechanisms. Examples of such mechanisms are: identifying disease-causing genes, genes involved in the regulation of some aspect of the cell cycle, etc. In this article, we discuss the problem of estimating gene expression based on a proper statistical model. More precisely, we show how the model introduced by Li and Wong can be used in its full bivariate generality to provide a new measure of gene expression from high-density oligonucleotide arrays. We also present a second gene expression index based on a new way of reducing the model into a simpler univariate model. In both cases, the gene expression indices are shown to be unbiased and to have lower variance than the established ones. Moreover, we present a bootstrap method aiming at providing non-parametric confidence intervals for the expression index.



High-density oligonucleotide arrays have become a valuable tool for high-throughput gene expression profiling. Increasing the array information density and improving the analysis algorithms are two important computational research topics.


A new algorithm, Match-Only Integral Distribution (MOID), was developed to analyze high-density oligonucleotide arrays. Using known data from both spiking experiments and no-change experiments performed with Affymetrix GeneChip® arrays, MOID and the Affymetrix algorithm implemented in Microarray Suite 4.0 (MAS4) were compared. While MOID gave similar performance to MAS4 in the spiking experiments, better performance was observed in the no-change experiments. MOID also provides a set of alternative statistical analysis tools to MAS4. There are two main features that distinguish MOID from MAS4. First, MOID uses continuous P values for the likelihood of gene presence, while MAS4 resorts to discrete absolute calls. Secondly, MOID uses heuristic confidence intervals for both gene expression levels and fold change values, while MAS4 categorizes the significance of gene expression level changes into discrete fold change calls.


The results show that by using MOID, Affymetrix GeneChip® arrays may need as little as ten probes per gene without compromising analysis accuracy.

A systems-level understanding of a small but essential population of cells in development or adulthood (e.g. somatic stem cells) requires accurate quantitative monitoring of genome-wide gene expression, ideally from single cells. We report here a strategy to globally amplify mRNAs from single cells for highly quantitative high-density oligonucleotide microarray analysis that combines a small number of directional PCR cycles with subsequent linear amplification. Using this strategy, both the representation of gene expression profiles and reproducibility between individual experiments are unambiguously improved from the original method, along with high coverage and accuracy. The immediate application of this method to single cells in the undifferentiated inner cell masses of mouse blastocysts at embryonic day (E) 3.5 revealed the presence of two populations of cells, one with primitive endoderm (PE) expression and the other with pluripotent epiblast-like gene expression. The genes expressed differentially between these two populations were well preserved in morphologically differentiated PE and epiblast in the embryos one day later (E4.5), demonstrating that the method successfully detects subtle but essential differences in gene expression at the single-cell level among seemingly homogeneous cell populations. This study provides a strategy to analyze biophysical events in medicine as well as in neural, stem cell and developmental biology, where small numbers of distinctive or diseased cells play critical roles.

In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists, and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression is the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of the gene expression measurements, relative to ad hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper, we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular, we describe a methodology useful for preprocessing Affymetrix single-nucleotide polymorphism chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves existing approaches using data from 3 relatively large studies including the one in which large numbers of independent calls are available. The proposed methods are implemented in the package oligo available from Bioconductor.
