Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
MOTIVATION: There is a very large and growing level of effort toward improving the platforms, experiment designs, and data analysis methods for microarray expression profiling. Along with a growing richness in the approaches there is a growing confusion among most scientists as to how to make objective comparisons and choices between them for different applications. There is a need for a standard framework for the microarray community to compare and improve analytical and statistical methods. RESULTS: We report on a microarray data set comprising 204 in-situ synthesized oligonucleotide arrays, each hybridized with two-color cDNA samples derived from 20 different human tissues and cell lines. Design of the approximately 24 000 60mer oligonucleotides that report approximately 2500 known genes on the arrays, and design of the hybridization experiments, were carried out in a way that supports the performance assessment of alternative data processing approaches and of alternative experiment and array designs. We also propose standard figures of merit for success in detecting individual differential expression changes or expression levels, and for detecting similarities and differences in expression patterns across genes and experiments. We expect this data set and the proposed figures of merit will provide a standard framework for much of the microarray community to compare and improve many analytical and statistical methods relevant to microarray data analysis, including image processing, normalization, error modeling, combining of multiple reporters per gene, use of replicate experiments, and sample referencing schemes in measurements based on expression change. AVAILABILITY/SUPPLEMENTARY INFORMATION: Expression data and supplementary information are available at http://www.rii.com/publications/2003/HE_SDS.htm

2.
Microarrays are used to study gene expression in a variety of biological systems. A number of different platforms have been developed, but few studies exist that have directly compared the performance of one platform with another. The goal of this study was to determine array variation by analyzing the same RNA samples with three different array platforms. Using gene expression responses to benzo[a]pyrene exposure in normal human mammary epithelial cells (NHMECs), we compared the results of gene expression profiling using three microarray platforms: photolithographic oligonucleotide arrays (Affymetrix), spotted oligonucleotide arrays (Amersham), and spotted cDNA arrays (NCI). While most previous reports comparing microarrays have analyzed pre-existing data from different platforms, this comparison study used the same sample assayed on all three platforms, allowing for analysis of variation from each array platform. In general, poor correlation was found with corresponding measurements from each platform. Each platform yielded different gene expression profiles, suggesting that while microarray analysis is a useful discovery tool, further validation is needed to extrapolate results for broad use of the data. Also, microarray variability needs to be taken into consideration, not only in the data analysis but also in specific probe selection for each array type.

3.
MOTIVATION: The numerical values of gene expression measured using microarrays are usually presented to the biological end-user as summary statistics of spot pixel data, such as the spot mean, median and mode. Much of the subsequent data analysis reported in the literature, however, uses only one of these spot statistics. This results in sub-optimal estimates of gene expression levels and a need for improvement in quantitative spot variation surveillance. RESULTS: This paper develops a maximum-likelihood method for estimating gene expression using spot mean, variance and pixel number values available from typical microarray scanners. It employs a hierarchical model of variation between and within microarray spots. The hierarchical maximum-likelihood estimate (MLE) is shown to be a more efficient estimator of the mean than the 'conventional' estimate using solely the spot mean values (i.e. without spot variance data). Furthermore, under the assumptions of our model, the spot mean and spot variance are shown to be sufficient statistics that do not require the use of all pixel data. The hierarchical MLE method is applied to data from both Monte Carlo (MC) simulations and a two-channel dye-swapped spotted microarray experiment. The MC simulations show that the hierarchical MLE method leads to improved detection of differential gene expression, particularly when 'outlier' spots are present on the arrays. Compared with the conventional method, the MLE method applied to data from the microarray experiment leads to an increase in the number of differentially expressed genes detected for low cut-off P-values of interest.

4.
MOTIVATION: Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. The usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This results in the loss of valuable information about genewise variability. RESULTS: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. AVAILABILITY: The methodology is implemented in the limma software package for R, available from the CRAN repository http://www.r-project.org

5.
Rosetta error model for gene expression analysis   Cited by: 4 (self-citations: 0, citations by others: 4)
MOTIVATION: In microarray gene expression studies, the number of replicated microarrays is usually small because of cost and sample availability, resulting in unreliable variance estimation and thus unreliable statistical hypothesis tests. The unreliable variance estimation is further complicated by the fact that the technology-specific variance is intrinsically intensity-dependent. RESULTS: The Rosetta error model captures the variance-intensity relationship for various types of microarray technologies, such as single-color arrays and two-color arrays. This error model conservatively estimates intensity error and uses this value to stabilize the variance estimation. We present two commonly used error models: the intensity error model for single-color microarrays and the ratio error model for two-color microarrays or ratios built from two single-color arrays. We present examples to demonstrate the strength of our error models in improving the statistical power of microarray data analysis, particularly in increasing expression detection sensitivity and specificity when the number of replicates is limited.

6.
Using DNA microarrays to study gene expression in closely related species   Cited by: 6 (self-citations: 0, citations by others: 6)
MOTIVATION: Comparisons of gene expression levels within and between species have become a central tool in the study of the genetic basis for phenotypic variation, as well as in the study of the evolution of gene regulation. DNA microarrays are a key technology that enables these studies. Currently, however, microarrays are only available for a small number of species. Thus, in order to study gene expression levels in species for which microarrays are not available, researchers face three sets of choices: (i) use a microarray designed for another species, but only compare gene expression levels within species, (ii) construct a new microarray for every species whose gene expression profiles will be compared or (iii) build a multi-species microarray with probes from each species of interest. Here, we use data collected using a multi-primate cDNA array to evaluate the reliability of each approach. RESULTS: We find that, for inter-species comparisons, estimates of expression differences based on multi-species microarrays are more accurate than those based on multiple species-specific arrays. We also demonstrate that within-species expression differences can be estimated using a microarray for a closely related species, without discernible loss of information. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
8.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes are not differentially expressed. Methods based on spatial location and/or intensity also require that the non-differentially expressed genes are distributed at random with respect to location and intensity. Spotting designs should be laid out carefully so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered.
DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both efficiency and validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.
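The contrast drawn in this abstract between centring on a few spots and centring on the bulk of the array can be illustrated with a minimal sketch. It assumes that centring means subtracting the median log ratio of a chosen reference set, which the abstract does not specify; the function name `centre_log_ratios` and its `reference_indices` parameter are hypothetical.

```python
from statistics import median


def centre_log_ratios(log_ratios, reference_indices=None):
    """Centre one array's log ratios on the median of a reference set of spots.

    Assumption: "centring" is taken to mean subtracting a median; the abstract
    does not state which location estimate the authors used.
    """
    if reference_indices is None:
        # Centre on the bulk of the array (all spots).
        reference = log_ratios
    else:
        # Centre on a small, user-chosen subset of spots (e.g. controls).
        reference = [log_ratios[i] for i in reference_indices]
    shift = median(reference)
    return [value - shift for value in log_ratios]


# Centring on all spots versus on a single reference spot.
array = [1.0, 2.0, 3.0]
print(centre_log_ratios(array))
print(centre_log_ratios(array, reference_indices=[0]))
```

With a small reference set the shift depends on very few measurements, which is one way to see why the abstract reports that this choice can add variation.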

9.
The microarray gene expression markup language (MAGE-ML) is a widely used XML (eXtensible Markup Language) standard for describing and exchanging information about microarray experiments. It can describe microarray designs, microarray experiment designs, gene expression data and data analysis results. We describe RMAGEML, a new Bioconductor package that provides a link between cDNA microarray data stored in MAGE-ML format and the Bioconductor framework for preprocessing, visualization and analysis of microarray experiments. AVAILABILITY: http://www.bioconductor.org. Open Source.

10.
In the past several years, oligonucleotide microarrays have emerged as a widely used tool for the simultaneous, non-biased measurement of expression levels for thousands of genes. Several challenges exist in successfully utilizing this biotechnology; principal among these is analysis of microarray data. An experiment to measure differential gene expression can consist of a dozen microarrays, each consisting of over a hundred thousand data points. Previously, we have described the use of a novel algorithm for analyzing oligonucleotide microarrays and assessing changes in gene expression. This algorithm describes changes in expression in terms of the statistical significance (S-score) of change, which combines signals detected by multiple probe pairs according to an error model characteristic of oligonucleotide arrays. Software is available that simplifies the application of this algorithm to the analysis of oligonucleotide microarray data. The application of this method to problems of the central nervous system is discussed.

11.
In the relatively short period since their development, DNA microarrays have been used increasingly in the study of genetic and cellular processes, thereby offering a genome-wide approach to gene expression studies. With the advent of genome sequencing programs for organisms from yeast to man, the number of organisms which now have ready-made commercial arrays continues to increase. Here, the principle of DNA microarrays is introduced, with particular attention being given to the role of this technology in studies of the nervous system of the fruitfly Drosophila melanogaster. The importance of experimental design and sample preparation, in line with minimum information about a microarray experiment (MIAME) compliance, is emphasised. The technical platforms available to the Drosophila neurobiologist are illustrated, and a small number of readily available data analysis tools are reviewed.

12.
Human microarrays are readily available, and it would be advantageous if they could be used to study gene expression in other species, such as pigs. The objectives of this research were to validate the use of human microarrays in the analysis of porcine gene expression, to assess the variability of the data generated, and to compare gene expression in boars with different levels of steroidogenesis. Cytochrome b5 (CYB5) expression was used to assess array detection sensitivity. Samples having high or low CYB5 RNA levels were hybridized to microarrays to determine if the known expression difference could be detected. Six hybridizations were conducted using human microarrays containing 3840 total spots representing 1718 characterized human ESTs. To analyze gene expression in boars with different levels of steroidogenesis, testis RNA from four boars with high levels of plasma estrone sulphate was hybridized to testis RNA from four boars with lower levels. Eight microarray hybridizations were conducted including fluor-flips. Self-self hybridizations were also conducted to assess the variability of array experiments. The Cy5 and Cy3 intensity values for each array were normalized using a locally weighted linear regression (LOESS). Statistical significance was assessed using a Student's t-test followed by the Benjamini and Hochberg multiple testing correction procedure. Quantitative real-time PCR (Q-RT-PCR) was used to verify select gene expression differences. The results show that CYB5 was significantly overexpressed in the high CYB5 sample by 1.8 fold (P < 0.05), verifying the known expression difference. The average log2 ratio of the majority of genes (1643) falls within one standard deviation of the mean, indicating the data were reproducible. In the high versus low steroidogenesis experiment, seven genes were significantly overexpressed in the high group (P < 0.05). 
Quantitative real-time PCR was used to validate five genes with the highest fold change, and the results corroborated those found by the microarray experiments. The results of the self-self hybridizations showed that no genes were significantly differentially expressed following the application of the Benjamini and Hochberg multiple testing correction procedure. The results presented in this report show that human arrays can be used for gene expression analysis in pigs.
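The Benjamini and Hochberg correction named in this abstract can be sketched in a few lines. This is the standard step-up procedure written from its textbook definition, not the authors' own code; the function name and the example P-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Each sorted P-value p_(k) is scaled by m / k, and a running minimum taken
    from the largest rank downwards keeps the adjusted values monotone.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Illustrative (made-up) per-gene P-values from a t-test.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

A gene is then called significant when its adjusted value falls below the chosen false discovery rate, which plays the role of the P < 0.05 cut-off quoted above.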

13.
Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform.
SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray.
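A score of the kind this abstract describes can be sketched as below. The abstract does not give the formula, so this sketch assumes, from the name alone, that the score is the within-class sum of squared Euclidean distances divided by the total sum of squares; the function name `swiss_score` and the toy data are hypothetical.

```python
def swiss_score(samples, labels):
    """Within-class sum of squares divided by total sum of squares.

    Assumption: this ratio is inferred from the name "Standardized WithIn
    class Sum of Squares"; lower values mean tighter a priori classes.
    """
    n_features = len(samples[0])

    def centroid(rows):
        return [sum(row[j] for row in rows) / len(rows) for j in range(n_features)]

    def sum_of_squares(rows, centre):
        # Sum of squared Euclidean distances from each row to the centre.
        return sum((row[j] - centre[j]) ** 2 for row in rows for j in range(n_features))

    total = sum_of_squares(samples, centroid(samples))
    if total == 0:
        raise ValueError("all samples are identical; the score is undefined")
    within = 0.0
    for label in set(labels):
        members = [s for s, lab in zip(samples, labels) if lab == label]
        within += sum_of_squares(members, centroid(members))
    return within / total


# Two processing outcomes for the same four samples (made-up numbers):
data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(swiss_score(data, ["a", "a", "b", "b"]))  # classes well separated
print(swiss_score(data, ["a", "b", "a", "b"]))  # classes poorly separated
```

Under this assumption, the processing method with the lower score clusters the predefined biological classes more tightly.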

14.
15.
Human microarrays are readily available, and it would be advantageous if they could be used to study gene expression in other species, such as pigs. The objectives of this research were to validate the use of human microarrays in the analysis of porcine gene expression, to assess the variability of the data generated, and to compare gene expression in boars with different levels of steroidogenesis. Cytochrome b5 (CYB5) expression was used to assess array detection sensitivity. Samples having high or low CYB5 RNA levels were hybridized to microarrays to determine if the known expression difference could be detected. Six hybridizations were conducted using human microarrays containing 3840 total spots representing 1718 characterized human ESTs. To analyze gene expression in boars with different levels of steroidogenesis, testis RNA from four boars with high levels of plasma estrone sulphate was hybridized to testis RNA from four boars with lower levels. Eight microarray hybridizations were conducted including fluor-flips. Self-self hybridizations were also conducted to assess the variability of array experiments. The Cy5 and Cy3 intensity values for each array were normalized using a locally weighted linear regression (LOESS). Statistical significance was assessed using a Student's t-test followed by the Benjamini and Hochberg multiple testing correction procedure. Quantitative real-time PCR (Q-RT-PCR) was used to verify select gene expression differences. The results show that CYB5 was significantly overexpressed in the high CYB5 sample by 1.8 fold (P < 0.05), verifying the known expression difference. The average log2 ratio of the majority of genes (1643) falls within one standard deviation of the mean, indicating the data were reproducible. In the high versus low steroidogenesis experiment, seven genes were significantly overexpressed in the high group (P < 0.05). 
Quantitative real-time PCR was used to validate five genes with the highest fold change, and the results corroborated those found by the microarray experiments. The results of the self-self hybridizations showed that no genes were significantly differentially expressed following the application of the Benjamini and Hochberg multiple testing correction procedure. The results presented in this report show that human arrays can be used for gene expression analysis in pigs.

16.
17.
In the past several years, oligonucleotide microarrays have emerged as a widely used tool for the simultaneous, non-biased measurement of expression levels for thousands of genes. Several challenges exist in successfully utilizing this biotechnology; principal among these is analysis of microarray data. An experiment to measure differential gene expression can consist of a dozen microarrays, each consisting of over a hundred thousand data points. Previously, we have described the use of a novel algorithm for analyzing oligonucleotide microarrays and assessing changes in gene expression [J. Mol. Biol. 317 (2002) 225]. This algorithm describes changes in expression in terms of the statistical significance (S-score) of change, which combines signals detected by multiple probe pairs according to an error model characteristic of oligonucleotide arrays. Software is available that simplifies the application of this algorithm to the analysis of oligonucleotide microarray data. The application of this method to problems of the central nervous system is discussed.

18.
Statistical inference for microarray experiments usually involves the estimation of error variance for each gene. Because the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.
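The general idea of shrinking a noisy per-gene variance toward a value shared across genes can be illustrated with a small sketch. This is not the BAGE estimator, whose lognormal model is not spelled out in the abstract; it is the familiar degrees-of-freedom weighted average used in moderated-variance methods, and the prior variance and prior degrees of freedom are simply assumed to be given rather than estimated from data.

```python
def moderated_variance(sample_var, df, prior_var, prior_df):
    """Shrink a per-gene sample variance toward a prior variance.

    The result is a weighted average in which each variance is weighted by
    its degrees of freedom. Assumption: prior_var and prior_df are supplied
    by the caller; an empirical Bayes method would estimate them from all
    genes (and, in the abstract's setting, from several experiments).
    """
    if df + prior_df <= 0:
        raise ValueError("total degrees of freedom must be positive")
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


# Made-up per-gene variances, each from 2 residual degrees of freedom.
gene_variances = {"gene1": 4.0, "gene2": 0.1, "gene3": 1.2}
shrunk = {
    gene: moderated_variance(var, df=2, prior_var=1.0, prior_df=4)
    for gene, var in gene_variances.items()
}
print(shrunk)
```

Extreme variances are pulled toward the shared value, more strongly when the per-gene degrees of freedom are small relative to the prior.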

19.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the data significance. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which pooled design becomes preferable to non-pooled design can then be derived given the unit cost associated with a microarray and that with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved.

20.
Recent developments in microarray technology make it possible to capture the gene expression profiles for thousands of genes at once. With this data researchers are tackling problems ranging from the identification of 'cancer genes' to the formidable task of adding functional annotations to our rapidly growing gene databases. Specific research questions suggest patterns of gene expression that are interesting and informative: for instance, genes with large variance or groups of genes that are highly correlated. Cluster analysis and related techniques are proving to be very useful. However, such exploratory methods alone do not provide the opportunity to engage in statistical inference. Given the high dimensionality (thousands of genes) and small sample sizes (often <30) encountered in these datasets, an honest assessment of sampling variability is crucial and can prevent the over-interpretation of spurious results. We describe a statistical framework that encompasses many of the analytical goals in gene expression analysis; our framework is completely compatible with many of the current approaches and, in fact, can increase their utility. We propose the use of a deterministic rule, applied to the parameters of the gene expression distribution, to select a target subset of genes that are of biological interest. In addition to subset membership, the target subset can include information about relationships between genes, such as clustering. This target subset presents an interesting parameter that we can estimate by applying the rule to the sample statistics of microarray data. The parametric bootstrap, based on a multivariate normal model, is used to estimate the distribution of these estimated subsets and relevant summary measures of this sampling distribution are proposed. We focus on rules that operate on the mean and covariance. 
Using Bernstein's Inequality, we obtain consistency of the subset estimates, under the assumption that the sample size converges faster to infinity than the logarithm of the number of genes. We also provide a conservative sample size formula guaranteeing that the sample mean and sample covariance matrix are uniformly within a distance epsilon > 0 of the population mean and covariance. The practical performance of the method using a cluster-based subset rule is illustrated with a simulation study. The method is illustrated with an analysis of a publicly available leukemia data set.
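The bootstrap idea in this abstract, applying a deterministic rule to resampled statistics and recording how often each gene lands in the selected subset, can be sketched as follows. The sketch simplifies heavily: it draws each gene independently from a univariate normal (that is, it assumes a diagonal covariance instead of the full multivariate normal model), and the rule "mean expression above a threshold" is only an illustrative choice. All names and numbers are hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_inclusion(expression, threshold, n_boot=200, seed=0):
    """Estimate how often each gene is selected under a parametric bootstrap.

    expression: dict mapping gene name -> list of observed expression values.
    Rule (illustrative assumption): select a gene when its mean exceeds
    `threshold`. Genes are resampled independently from fitted normals.
    """
    rng = random.Random(seed)
    # Fit a normal distribution to each gene from the observed sample.
    fitted = {g: (mean(v), stdev(v), len(v)) for g, v in expression.items()}
    counts = {g: 0 for g in expression}
    for _ in range(n_boot):
        for gene, (mu, sigma, n) in fitted.items():
            resample = [rng.gauss(mu, sigma) for _ in range(n)]
            if mean(resample) > threshold:
                counts[gene] += 1
    return {g: counts[g] / n_boot for g in expression}


observed = {
    "A": [5.0, 5.2, 4.8, 5.1],   # clearly above the threshold
    "B": [0.1, -0.2, 0.0, 0.1],  # clearly below the threshold
}
print(bootstrap_inclusion(observed, threshold=2.0))
```

Genes whose inclusion frequency sits well away from 0 and 1 are the ones for which membership in the target subset is uncertain at the given sample size.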
