Similar literature
 20 similar documents found (search time: 15 ms)
1.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the data significance. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and to check the approximate results against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which a pooled design becomes preferable to a non-pooled design can then be derived given the unit costs of a microarray and of a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved.
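
Illustration for abstract 1: the abstract does not reproduce its exact power formulas, so the following is only a rough sketch of the pooling trade-off it describes, not the paper's method. It assumes the common "biological averaging" model, in which a pool of several subjects measured on several technical-replicate arrays has variance (biological variance / pool size) + (technical variance / technical replicates). All function names, variance components, and unit costs are hypothetical.

```python
def group_mean_variance(n_pools, pool_size, tech_reps, var_bio, var_tech):
    """Variance of one group's mean expression under biological averaging.

    Assumed model: each pool averages `pool_size` subjects and is measured
    on `tech_reps` arrays; the group mean averages over `n_pools` pools.
    """
    per_pool = var_bio / pool_size + var_tech / tech_reps
    return per_pool / n_pools


def design_cost(n_pools, pool_size, tech_reps, cost_subject, cost_array):
    """Cost of one group: each subject and each array is paid for once."""
    return n_pools * (pool_size * cost_subject + tech_reps * cost_array)


# Hypothetical inputs, chosen only to make the comparison concrete.
VAR_BIO, VAR_TECH = 1.0, 0.25
COST_SUBJECT, COST_ARRAY = 50, 500

designs = {
    "non-pooled (12 subjects, 12 arrays)": dict(n_pools=12, pool_size=1, tech_reps=1),
    "pooled (18 subjects, 6 arrays)": dict(n_pools=6, pool_size=3, tech_reps=1),
}

for name, d in designs.items():
    var = group_mean_variance(var_bio=VAR_BIO, var_tech=VAR_TECH, **d)
    cost = design_cost(cost_subject=COST_SUBJECT, cost_array=COST_ARRAY, **d)
    print(f"{name}: variance of group mean = {var:.4f}, cost = {cost}")
```

With these made-up numbers the pooled design reaches a slightly smaller variance at lower cost because arrays are far more expensive than subjects; with different variance components or unit costs the ranking can reverse, which is the kind of condition the abstract says can be derived.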

2.
Determining sample sizes for microarray experiments is important, but the complexity of these experiments, and the large amounts of data they produce, can make the sample size issue seem daunting, and tempt researchers to use rules of thumb in place of formal calculations based on the goals of the experiment. Here we present formulae for determining sample sizes to achieve a variety of experimental goals, including class comparison and the development of prognostic markers. Results are derived which describe the impact of pooling, technical replicates and dye-swap arrays on sample size requirements. These results are shown to depend on the relative sizes of different sources of variability. A variety of common types of experimental situations and designs used with single-label and dual-label microarrays are considered. We discuss procedures for controlling the false discovery rate. Our calculations are based on relatively simple yet realistic statistical models for the data, and provide straightforward sample size calculation formulae.

3.
Conventional statistical methods for interpreting microarray data require large numbers of replicates in order to provide sufficient levels of sensitivity. We recently described a method for identifying differentially-expressed genes in one-channel microarray data [1]. Based on the idea that the variance structure of microarray data can itself be a reliable measure of noise, this method allows statistically sound interpretation of as few as two replicates per treatment condition. Unlike the one-channel array, the two-channel platform simultaneously compares gene expression in two RNA samples. This leads to covariation of the measured signals. Hence, by accounting for covariation in the variance model, we can significantly increase the power of the statistical test. We believe that this approach has the potential to overcome limitations of existing methods. We present here a novel approach for the analysis of microarray data that involves modeling the variance structure of paired expression data in the context of a Bayesian framework. We also describe a novel statistical test that can be used to identify differentially-expressed genes. This method, bivariate microarray analysis (BMA), demonstrates dramatically improved sensitivity over existing approaches. We show that with only two array replicates, it is possible to detect gene expression changes that are at best detected with six array replicates by other methods. Further, we show that combining results from BMA with Gene Ontology annotation yields biologically significant results in a ligand-treated macrophage cell system.

4.
If biological questions are to be answered using quantitative proteomics, it is essential to design experiments which have sufficient power to be able to detect changes in expression. Sample subpooling is a strategy that can be used to reduce the variance but still allow studies to encompass biological variation. Underlying sample pooling strategies is the biological averaging assumption that the measurements taken on the pool are equal to the average of the measurements taken on the individuals. This study finds no evidence of a systematic bias triggered by sample pooling for DIGE and that pooling can be useful in reducing biological variation. For the first time in quantitative proteomics, the two sources of variance were decoupled and it was found that technical variance predominates for mouse brain, while biological variance predominates for human brain. A power analysis found that as the number of individuals pooled increased, the number of replicates needed declined but the number of biological samples increased. Repeat measures of biological samples decreased the number of samples required but increased the number of gels needed. An example cost-benefit analysis demonstrates how researchers can optimise their experiments while taking into account the available resources.

5.
When designing microarray experiments, scientists are often confronted with the question of pooling due to financial constraints, but discussion of the validity of pooling tends toward a sub-pooling recommendation. Since complete pooling protocols can be considered part of sub-pooling designs, gene expression data from three complete pooling experiments were analyzed. Data from completely pooled versus individual mRNA samples of rat brain tissue were compared to answer the question of whether the pooled sample represents individual samples in small-sized experiments. Our analytic approach provided clear results concerning the Affymetrix MAS 5.0 signal and detection call parameters. Despite a strong similarity of arrays within experimental groups, the individual signals were evidently not appropriately represented in the pooled sample for slightly more than half of all the genes considered. Our analysis reveals problems in cases of small complete pooling designs with fewer than six subjects pooled.

6.
The effect of replication on gene expression microarray experiments   (cited by: 5 in total; self-citations: 0; citations by others: 5)
MOTIVATION: We examine the effect of replication on the detection of apparently differentially expressed genes in gene expression microarray experiments. Our analysis is based on a random sampling approach using real data sets from 16 published studies. We consider both the ability to find genes that meet particular statistical criteria as well as the stability of the results in the face of changing levels of replication. RESULTS: While dependent on the data source, our findings suggest that stable results are typically not obtained until at least five biological replicates have been used. Conversely, for most studies, 10-15 replicates yield results that are quite stable, and there is less improvement in stability as the number of replicates is further increased. Our methods will be of use in evaluating existing data sets and in helping to design new studies.

7.
Variance stabilization is a step in the preprocessing of microarray data that can greatly benefit the performance of subsequent statistical modeling and inference. Due to the often limited number of technical replicates for Affymetrix and cDNA arrays, achieving variance stabilization can be difficult. Although the Illumina microarray platform provides a larger number of technical replicates on each array (usually over 30 randomly distributed beads per probe), these replicates have not been leveraged in the current log2 data transformation process. We devised a variance-stabilizing transformation (VST) method that takes advantage of the technical replicates available on an Illumina microarray. We have compared VST with log2 and variance-stabilizing normalization (VSN) by using the Kruglyak bead-level data (2006) and Barnes titration data (2005). The results of the Kruglyak data suggest that VST stabilizes variances of bead-replicates within an array. The results of the Barnes data show that VST can improve the detection of differentially expressed genes and reduce false-positive identifications. We conclude that although both VST and VSN are built upon the same model of measurement noise, VST stabilizes the variance better and more efficiently for the Illumina platform by leveraging the availability of a larger number of within-array replicates. The algorithms and Supplementary Data are included in the lumi package of Bioconductor, available at: www.bioconductor.org.

8.
A statistical model is proposed for the analysis of errors in microarray experiments and is employed in the analysis and development of a combined normalisation regime. Through analysis of the model and two-dye microarray data sets, this study found the following. The systematic error introduced by microarray experiments mainly involves spot intensity-dependent, feature-specific and spot position-dependent contributions. It is difficult to remove all these errors effectively without a suitable combined normalisation operation. Adaptive normalisation using a suitable regression technique is more effective in removing spot intensity-related dye bias than self-normalisation, while regional normalisation (block normalisation) is an effective way to correct spot position-dependent errors. However, dye-flip replicates are necessary to remove feature-specific errors, and also allow the analyst to identify the experimentally introduced dye bias contained in non-self-self data sets. In this case, the bias present in the data sets may include both experimentally introduced dye bias and the biological difference between two samples. Self-normalisation is capable of removing dye bias without identifying the nature of that bias. The performance of adaptive normalisation, on the other hand, depends on its ability to correctly identify the dye bias. If adaptive normalisation is combined with an effective dye bias identification method then there is no systematic difference between the outcomes of the two methods.

9.
MOTIVATION: Microarrays can simultaneously measure the expression levels of many genes and are widely applied to study complex biological problems at the genetic level. To contain costs, instead of obtaining a microarray on each individual, mRNA from several subjects can be first pooled and then measured with a single array. mRNA pooling is also necessary when there is not enough mRNA from each subject. Several studies have investigated the impact of pooling mRNA on inferences about gene expression, but have typically modeled the process of pooling as if it occurred in some transformed scale. This assumption is unrealistic. RESULTS: We propose modeling the gene expression levels in a pool as a weighted average of mRNA expression of all individuals in the pool on the original measurement scale, where the weights correspond to individual sample contributions to the pool. Based on these improved statistical models, we develop the appropriate F statistics to test for differentially expressed genes. We present formulae to calculate the power of various statistical tests under different strategies for pooling mRNA and compare resulting power estimates to those that would be obtained by following the approach proposed by Kendziorski et al. (2003). We find that the Kendziorski estimate tends to exceed true power and that the estimate we propose, while somewhat conservative, is less biased. We argue that it is possible to design a study that includes mRNA pooling at a significantly reduced cost but with little loss of information.
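
Illustration for abstract 9: the distinction between pooling on the original scale and pooling on a transformed scale can be made concrete with a small sketch. It assumes a log2 transform as the "transformed scale" (the abstract does not name one), and the function names, intensities, and weights are hypothetical.

```python
import math


def pool_on_raw_scale(expression, weights):
    """log2 of the weighted average of raw expression values.

    This mirrors physical mixing: mRNA is combined first, then measured.
    """
    total = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, expression)) / total
    return math.log2(pooled)


def pool_on_log_scale(expression, weights):
    """Weighted average of log2 values, i.e. pooling modeled after transformation."""
    total = sum(weights)
    return sum(w * math.log2(x) for w, x in zip(weights, expression)) / total


# Hypothetical raw intensities for one gene in three subjects, equal contributions.
expression = [100.0, 200.0, 1600.0]
weights = [1.0, 1.0, 1.0]

raw = pool_on_raw_scale(expression, weights)
logged = pool_on_log_scale(expression, weights)
print(f"raw-scale pool: {raw:.3f}, log-scale pool: {logged:.3f}")
```

Because the logarithm is concave, the raw-scale value is never smaller than the log-scale value (Jensen's inequality), and the gap widens as the individuals in a pool differ more; the two coincide only when all individuals have the same expression.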

10.
MOTIVATION: There is a very large and growing level of effort toward improving the platforms, experiment designs, and data analysis methods for microarray expression profiling. Along with a growing richness in the approaches there is a growing confusion among most scientists as to how to make objective comparisons and choices between them for different applications. There is a need for a standard framework for the microarray community to compare and improve analytical and statistical methods. RESULTS: We report on a microarray data set comprising 204 in-situ synthesized oligonucleotide arrays, each hybridized with two-color cDNA samples derived from 20 different human tissues and cell lines. Design of the approximately 24 000 60mer oligonucleotides that report approximately 2500 known genes on the arrays, and design of the hybridization experiments, were carried out in a way that supports the performance assessment of alternative data processing approaches and of alternative experiment and array designs. We also propose standard figures of merit for success in detecting individual differential expression changes or expression levels, and for detecting similarities and differences in expression patterns across genes and experiments. We expect this data set and the proposed figures of merit will provide a standard framework for much of the microarray community to compare and improve many analytical and statistical methods relevant to microarray data analysis, including image processing, normalization, error modeling, combining of multiple reporters per gene, use of replicate experiments, and sample referencing schemes in measurements based on expression change. AVAILABILITY/SUPPLEMENTARY INFORMATION: Expression data and supplementary information are available at http://www.rii.com/publications/2003/HE_SDS.htm

11.
12.
The loop design of Kerr and Churchill is a clever application of incomplete blocks of size 2 to two-channel microarray experiments. In this paper, we extend the loop design to include more replicates, biological and technical replication, multi-factor experiments, and blocking. Loop and extended loop designs are shown to be more efficient than the reference design for any given number of arrays. We also show that adding new treatments to a loop design requires the same number of additional arrays as adding treatments to a reference design, with a greater gain in power. Given the flexibility of extended loop designs and their power, we propose that these should be the designs of choice for most experiments using two-channel microarrays.

13.
Comparison of microarray designs for class comparison and class discovery   (cited by: 4 in total; self-citations: 0; citations by others: 4)
MOTIVATION: Two-color microarray experiments in which an aliquot derived from a common RNA sample is placed on each array are called reference designs. Traditionally, microarray experiments have used reference designs, but designs without a reference have recently been proposed as alternatives. RESULTS: We develop a statistical model that distinguishes the different levels of variation typically present in cancer data, including biological variation among RNA samples, experimental error and variation attributable to phenotype. Within the context of this model, we examine the reference design and two designs which do not use a reference, the balanced block design and the loop design, focusing particularly on efficiency of estimates and the performance of cluster analysis. We calculate the relative efficiency of designs when there are a fixed number of arrays available, and when there are a fixed number of samples available. Monte Carlo simulation is used to compare the designs when the objective is class discovery based on cluster analysis of the samples. The number of discrepancies between the estimated clusters and the true clusters was significantly smaller for the reference design than for the loop design. The efficiency of the reference design relative to the loop and block designs depends on the relation between inter- and intra-sample variance. These results suggest that if cluster analysis is a major goal of the experiment, then a reference design is preferable. If identification of differentially expressed genes is the main concern, then design selection may involve a consideration of several factors.

14.
Acquisition of microarray data is prone to systematic errors. A correction, called normalisation, must be applied to the data before further analysis is performed. With many normalisation techniques published and in use, the best way of executing this correction remains an open question. In this study, a variety of single-slide normalisation techniques, and different parameter settings for these techniques, were compared over many replicated microarray experiments. Different normalisation techniques were assessed through the distribution of the standard deviation of replicates from one biological sample across different slides. It is shown that local normalisation outperformed global normalisation, and intensity-based 'LOWESS' outperformed trimmed mean and median normalisation techniques. Overall, the top performing normalisation technique was a print-tip-based LOWESS with zero robust iterations. Lastly, we validated this evaluation methodology by examining the ability to predict oestrogen receptor-positive and -negative breast cancer samples with data that had been normalised using different techniques.

15.
16.
The determination of a list of differentially expressed genes is a basic objective in many cDNA microarray experiments. We present a statistical approach that allows direct control over the percentage of false positives in such a list and, under certain reasonable assumptions, improves on existing methods with respect to the percentage of false negatives. The method accommodates a wide variety of experimental designs and can simultaneously assess significant differences between multiple types of biological samples. Two interconnected mixed linear models are central to the method and provide a flexible means to properly account for variability both across and within genes. The mixed model also provides a convenient framework for evaluating the statistical power of any particular experimental design and thus enables a researcher to a priori select an appropriate number of replicates. We also suggest some basic graphics for visualizing lists of significant genes. Analyses of published experiments studying human cancer and yeast cells illustrate the results.

17.
In recent years, biostatistical research has begun to apply linear models and design theory to develop efficient experimental designs and analysis tools for gene expression microarray data. With two-colour microarrays, direct comparisons of RNA-targets are possible and lead to incomplete block designs. In this setting, efficient designs for simple and factorial microarray experiments have mainly been proposed for technical replicates. But for biological replicates, which are crucial to obtain inference that can be generalised to a biological population, this question has only been discussed recently and is not fully solved yet. In this paper, we propose efficient designs for independent two-sample experiments using two-colour microarrays enabling biologists to measure their biological random samples in an efficient manner to draw generalisable conclusions. We give advice for experimental situations with differing group sizes and show the impact of different designs on the variance and degrees of freedom of the test statistics. The designs proposed in this paper can be evaluated using SAS PROC MIXED or S+/R lme.

18.
An expressed sequence tag database from immune tissues was used to design the first high-density turbot (Scophthalmus maximus) oligo-microarray with the aim of identifying candidate genes for tolerance to pathogens. Specific oligonucleotides (60 mers) were successfully designed for 2,716 out of 3,482 unique sequences of the database. An Agilent custom oligo-microarray 8 × 15 k (five replicates/gene; eight microarrays/slide) was constructed. The performance of the microarray and the sources of variation throughout microarray analysis were examined on spleen pools of controls and Aeromonas salmonicida-challenged fish at 3 days postinfection. Only 48 out of 2,716 probes showed no hybridization signal on the 32 microarrays employed, thus demonstrating the consistency of the bioinformatic applications of our database. An asymmetric hierarchical design was employed to ascertain the noise associated with biological and technical (RNA extraction, labeling, hybridization, slide, and dye bias) factors using 1C and 2C labeling approaches. The high correlation coefficient between replicates for most factors tested demonstrated the high reproducibility of the signal. An analysis of random-effects variance revealed that technical variation was mostly negligible, and biological variation represented the main factor, even using pooled samples. The one-color approach performed at least as well as 2C, suggesting its usefulness given its higher design flexibility and lower cost. A relevant proportion of genes turned out to be differentially labeled depending on the fluorophore, which points to the likely need for dye-swap replication in 2C experiments. A set of differentially expressed genes and enriched functions related to immune/defense response were detected at 3 days postchallenge.

19.
DNA microarray experiments have become a widely used tool for studying gene expression. An important, but difficult, part of these experiments is deciding on the appropriate number of biological replicates to use. Often, researchers will want a number of replicates that give sufficient power to recognize regulated genes while controlling the false discovery rate (FDR) at an acceptable level. Recent advances in statistical methodology can now help to resolve this issue. Before using such methods it is helpful to understand the reasoning behind them. In this Research Focus article we explain, in an intuitive way, the effect sample size has on the FDR and power, and then briefly survey some recently proposed methods in this field of research and provide an example of use.
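
Illustration for abstract 19: the abstract does not say which FDR-controlling procedure it has in mind. As one widely used example, the sketch below implements the Benjamini-Hochberg step-up procedure; the function name and the p-values are hypothetical, and this is not presented as the article's own method.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans in the original order: True where the
    corresponding hypothesis is rejected at the requested FDR level.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values.
p_values = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(p_values, fdr=0.05))
```

The step-up rule rejects all hypotheses up to the largest qualifying rank, even if some smaller p-values individually miss their own thresholds. In a sample-size context, more replicates tend to push the p-values of truly regulated genes lower, so more of them clear these thresholds at the same FDR level.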

20.
A. Darvasi, M. Soller. Genetics, 1994, 138(4): 1365-1373
Selective genotyping is a method to reduce costs in marker-quantitative trait locus (QTL) linkage determination by genotyping only those individuals with extreme, and hence most informative, quantitative trait values. The DNA pooling strategy (termed "selective DNA pooling") takes this one step further by pooling DNA from the selected individuals at each of the two phenotypic extremes, and basing the test for linkage on marker allele frequencies as estimated from the pooled samples only. This can reduce genotyping costs of marker-QTL linkage determination by up to two orders of magnitude. Theoretical analysis of selective DNA pooling shows that for experiments involving backcross, F(2) and half-sib designs, the power of selective DNA pooling for detecting genes with large effect can be the same as that obtained by individual selective genotyping. Power for detecting genes with small effect, however, was found to decrease strongly with increase in the technical error of estimating allele frequencies in the pooled samples. The effect of technical error, however, can be markedly reduced by replication of technical procedures. It is also shown that a proportion selected of 0.1 at each tail will be appropriate for a wide range of experimental conditions.
