Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
MOTIVATION: Despite theoretical arguments that so-called 'loop designs' for two-channel DNA microarray experiments are more efficient, biologists continue to use 'reference designs'. We describe two sets of microarray experiments with RNA from two different biological systems (TPA-stimulated mammalian cells and Streptomyces coelicolor). In each case, both a loop and a reference design were used with the same RNA preparations with the aim of studying their relative efficiency. RESULTS: The results of these experiments show that (1) the loop design attains a much higher precision than the reference design, (2) multiplicative spot effects are a large source of variability, and if they are not accounted for in the mathematical model, for example, by taking log-ratios or including spot effects, then the model will perform poorly. The first result is reinforced by a simulation study. Practical recommendations are given on how simple loop designs can be extended to more realistic experimental designs and how standard statistical methods allow the experimentalist to use and interpret the results from loop designs in practice. AVAILABILITY: The data and R code are available at http://exgen.ma.umist.ac.uk CONTACT: veronica.vinciotti@brunel.ac.uk.

2.
The loop design of Kerr and Churchill is a clever application of incomplete blocks of size 2 to two-channel microarray experiments. In this paper, we extend the loop design to include more replicates, biological and technical replication, multi-factor experiments, and blocking. Loop and extended loop designs are shown to be more efficient than the reference design for any given number of arrays. We also show that adding new treatments to a loop design requires the same number of additional arrays as adding treatments to a reference design, with a greater gain in power. Given the flexibility of extended loop designs and their power, we propose that these should be the designs of choice for most experiments using two-channel microarrays.
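
The efficiency comparison between loop and reference designs can be illustrated with a small calculation. The sketch below is not code from any of the listed papers. It assumes a deliberately simplified model in which each array contributes one log-ratio with independent errors of equal variance (taken as 1), and it ignores dye and spot effects. The function names are hypothetical. Under these assumptions the least-squares variance of a pairwise treatment contrast follows from the inverse of a reduced information matrix built from the list of arrays.

```python
from fractions import Fraction
from itertools import combinations


def reduced_information(n_nodes, arrays):
    """X'X for the log-ratio model with node 0 as baseline (illustrative helper)."""
    size = n_nodes - 1
    m = [[Fraction(0)] * size for _ in range(size)]
    for a, b in arrays:
        for u in (a, b):
            if u > 0:
                m[u - 1][u - 1] += 1
        if a > 0 and b > 0:
            m[a - 1][b - 1] -= 1
            m[b - 1][a - 1] -= 1
    return m


def invert(m):
    """Gauss-Jordan inversion with exact fractions; fails if the design is disconnected."""
    n = len(m)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_pairwise_variance(n_nodes, arrays, treatments):
    """Average variance of all pairwise contrasts among `treatments` (error variance = 1)."""
    inv = invert(reduced_information(n_nodes, arrays))

    def entry(u, v):
        # The baseline node 0 has its effect fixed at zero, so its entries are zero.
        return inv[u - 1][v - 1] if u > 0 and v > 0 else Fraction(0)

    pairs = list(combinations(treatments, 2))
    total = sum(entry(i, i) + entry(j, j) - 2 * entry(i, j) for i, j in pairs)
    return total / len(pairs)


# Four treatments on four arrays, arranged as a loop 0-1-2-3-0.
loop = [(0, 1), (1, 2), (2, 3), (3, 0)]
loop_var = mean_pairwise_variance(4, loop, [0, 1, 2, 3])

# Four treatments (nodes 1..4) on four arrays, each hybridized against reference node 0.
reference = [(1, 0), (2, 0), (3, 0), (4, 0)]
ref_var = mean_pairwise_variance(5, reference, [1, 2, 3, 4])

print(loop_var, ref_var)  # 5/6 2
```

With the same four arrays, the average contrast variance in this toy model is 5/6 for the loop and 2 for the reference design, which is the direction of the effect the abstract describes. Real analyses would also need dye effects, replication, and biological variance.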

3.
Microarray technology is widely applied to address complex scientific questions. However, there remain fundamental issues on how to design experiments to ensure that the resulting data enable robust statistical analysis. Interwoven loop design has several advantages over other designs. However, it suffers from the complexity of the design. We have implemented an online web application which allows users to find optimal loop designs for two-color microarray experiments. Given a number of conditions (such as treatments or time points) and replicates, the application will find the best possible design of the experiment and output experimental parameters. It is freely available from http://mcbc.usm.edu/iloop.

4.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centreing on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centreing using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression methods can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes do not differentially express. Methods based on spatial location and/or intensity also require that the nondifferentially expressing genes are at random with respect to location and intensity. Spotting designs should be carefully done so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered. 
DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both efficiency and validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.

5.
Factorial and time course designs for cDNA microarray experiments   (Total citations: 4, self-citations: 0, citations by others: 4)
Microarrays are powerful tools for surveying the expression levels of many thousands of genes simultaneously. They belong to the new genomics technologies which have important applications in the biological, agricultural and pharmaceutical sciences. There are myriad sources of uncertainty in microarray experiments, and rigorous experimental design is essential for fully realizing the potential of these valuable resources. Two questions frequently asked by biologists on the brink of conducting cDNA or two-colour, spotted microarray experiments are 'Which mRNA samples should be competitively hybridized together on the same slide?' and 'How many times should each slide be replicated?' Early experience has shown that whilst the field of classical experimental design has much to offer this emerging multi-disciplinary area, new approaches which accommodate features specific to the microarray context are needed. In this paper, we propose optimal designs for factorial and time course experiments, which are special designs arising quite frequently in microarray experimentation. Our criterion for optimality is statistical efficiency based on a new notion of admissible designs; our approach enables efficient designs to be selected subject to the information available on the effects of most interest to biologists, the number of arrays available for the experiment, and other resource or practical constraints, including limitations on the amount of mRNA probe. We show that our designs are superior to both the popular reference designs, which are highly inefficient, and to designs incorporating all possible direct pairwise comparisons. Moreover, our proposed designs represent a substantial practical improvement over classical experimental designs which work in terms of standard interactions and main effects. The latter do not provide a basis for meaningful inference on the effects of most interest to biologists, nor make the most efficient use of valuable and limited resources.  

6.
MOTIVATION: Consensus clustering, also known as cluster ensemble, is one of the important techniques for microarray data analysis, and is particularly useful for class discovery from microarray data. Compared with traditional clustering algorithms, consensus clustering approaches have the ability to integrate multiple partitions from different cluster solutions to improve the robustness, stability, scalability and parallelization of the clustering algorithms. By consensus clustering, one can discover the underlying classes of the samples in gene expression data. RESULTS: In addition to exploring a graph-based consensus clustering (GCC) algorithm to estimate the underlying classes of the samples in microarray data, we also design a new validation index to determine the number of classes in microarray data. To our knowledge, this is the first time that GCC has been applied to class discovery for microarray data. Given a prespecified maximum number of classes (denoted as K(max) in this article), our algorithm can discover the true number of classes for the samples in microarray data according to a new cluster validation index called the Modified Rand Index. Experiments on gene expression data indicate that our new algorithm can (i) outperform most of the existing algorithms, (ii) identify the number of classes correctly in real cancer datasets, and (iii) discover the classes of samples with biological meaning. AVAILABILITY: Matlab source code for the GCC algorithm is available upon request from Zhiwen Yu.

7.
MOTIVATION: Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. RESULTS: The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major clusters and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, and it is therefore labelled as an uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in the data (cancerous and normal). AVAILABILITY: JAVA software of dynamic SOM tree algorithm is available upon request for academic use. SUPPLEMENTARY INFORMATION: A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf

8.
Qiu W, Lee ML. Bioinformation 2006, 1(7): 251-252
Calculation of the appropriate sample size in planning microarray studies is important because sample collection can be expensive and time-consuming. Sample-size calculation is also a challenging issue for microarray studies because the number of genes is far larger than the number of samples so that traditional methods of sample-size calculation cannot be directly applied. To help investigators answer the question of how many samples are needed in their microarray studies, we developed a user-friendly web-based calculator, SPCalc, for calculating sample size and power for a variety of commonly used experimental designs, including completely randomized treatment-control design, matched-pairs design, multiple-treatment design having an isolated treatment effect, and randomized block design. AVAILABILITY: The web-based calculator SPCalc is publicly available at http://www.biostat.harvard.edu/people/faculty/mltlee/webfront-r.html.

9.
Design of microarray experiments for genetical genomics studies   (Total citations: 2, self-citations: 0, citations by others: 2)
Bueno Filho JS, Gilmour SG, Rosa GJ. Genetics 2006, 174(2): 945-957

10.
Experimental design for gene expression microarrays   (Total citations: 19, self-citations: 0, citations by others: 19)
We examine experimental design issues arising with gene expression microarray technology. Microarray experiments have multiple sources of variation, and experimental plans should ensure that effects of interest are not confounded with ancillary effects. A commonly used design is shown to violate this principle and to be generally inefficient. We explore the connection between microarray designs and classical block design and use a family of ANOVA models as a guide to choosing a design. We combine principles of good design and A-optimality to give a general set of recommendations for design with microarrays. These recommendations are illustrated in detail for one kind of experimental objective, where we also give the results of a computer search for good designs.

11.
Summary. Gene expression microarray experiments are intrinsically two-phase experiments. Messenger RNA (mRNA), required for the microarray experiment, must first be derived from plants or animals that are exposed to a set of treatments in a previous experiment (Phase 1). The mRNA is then used in the subsequent laboratory-based microarray experiment (Phase 2) from which gene expression is measured and ultimately analyzed. We show that obtaining a valid test for the effects of treatments on gene expression depends on the design of both the Phase 1 and Phase 2 experiments. Examples show that the multiple dye-swap design at Phase 2 is more robust than the alternating loop design in the absence of prior knowledge of the relative size of variation in the Phase 1 and Phase 2 experiments.

12.
Gene expression microarray experiments are intrinsically two-phase experiments. Messenger RNA (mRNA), required for the microarray experiment, must first be derived from plants or animals that are exposed to a set of treatments in a previous experiment (Phase 1). The mRNA is then used in the subsequent laboratory-based microarray experiment (Phase 2) from which gene expression is measured and ultimately analyzed. We show that obtaining a valid test for the effects of treatments on gene expression depends on the design of both the Phase 1 and Phase 2 experiments. Examples show that the multiple dye-swap design at Phase 2 is more robust than the alternating loop design in the absence of prior knowledge of the relative size of variation in the Phase 1 and Phase 2 experiments.

13.
Universal mouse reference RNA derived from neonatal mice   (Total citations: 2, self-citations: 0, citations by others: 2)
He XR, Zhang C, Patterson C. BioTechniques 2004, 37(3): 464-468

14.
In recent years, biostatistical research has begun to apply linear models and design theory to develop efficient experimental designs and analysis tools for gene expression microarray data. With two-colour microarrays, direct comparisons of RNA-targets are possible and lead to incomplete block designs. In this setting, efficient designs for simple and factorial microarray experiments have mainly been proposed for technical replicates. But for biological replicates, which are crucial to obtain inference that can be generalised to a biological population, this question has only been discussed recently and is not fully solved yet. In this paper, we propose efficient designs for independent two-sample experiments using two-colour microarrays enabling biologists to measure their biological random samples in an efficient manner to draw generalisable conclusions. We give advice for experimental situations with differing group sizes and show the impact of different designs on the variance and degrees of freedom of the test statistics. The designs proposed in this paper can be evaluated using SAS PROC MIXED or S+/R lme.

15.
Cluster analysis has proven to be a useful tool for investigating the association structure among genes in a microarray data set. There is a rich literature on cluster analysis and various techniques have been developed. Such analyses heavily depend on an appropriate (dis)similarity measure. In this paper, we introduce a general clustering approach based on the confidence interval inferential methodology, which is applied to gene expression data of microarray experiments. Emphasis is placed on data with low replication (three or five replicates). The proposed method makes more efficient use of the measured data and avoids the subjective choice of a dissimilarity measure. This new methodology, when applied to real data, provides an easy-to-use bioinformatics solution for the cluster analysis of microarray experiments with replicates (see the Appendix). Even though the method is presented under the framework of microarray experiments, it is a general algorithm that can be used to identify clusters in any situation. The method's performance is evaluated using simulated and publicly available data sets. Our results also clearly show that our method is not an extension of the conventional clustering method based on correlation or Euclidean distance.

16.
Statistical design of reverse dye microarrays   (Total citations: 7, self-citations: 0, citations by others: 7)
MOTIVATION: In cDNA microarray experiments all samples are labelled with either Cy3 dye or Cy5 dye. Certain genes exhibit dye bias, a tendency to bind more efficiently to one of the dyes. The common reference design avoids the problem of dye bias by running all arrays 'forward', so that the samples being compared are always labelled with the same dye. But comparison of samples labelled with different dyes is sometimes of interest. In these situations, it is necessary to run some arrays 'reverse' (with the dye labelling reversed) in order to correct for the dye bias. The design of these experiments will impact one's ability to identify genes that are differentially expressed in different tissues or conditions. We address the design issue of how many specimens are needed, how many forward and reverse labelled arrays to perform, and how to optimally assign Cy3 and Cy5 labels to the specimens. RESULTS: We consider three types of experiments for which some reverse labelling is needed: paired samples, samples from two predefined groups, and reference design data when comparison with the reference is of interest. We present simple probability models for the data, derive optimal estimators for relative gene expression, and compare the efficiency of the estimators for a range of designs. In each case, we present the optimal design and sample size formulas. We show that reverse labelling of individual arrays is generally not required.

17.
In this article we propose two practical types of designs for large time-course, dual-channel microarray experiments. One type consists of several interwoven loops, and the other type combines reference and loop designs. By representing the experiment as a graph, where the timepoints are nodes and the arrays are edges, we demonstrate how the time contrasts between any two timepoints can be estimated, provided that there is a path of edges linking them. In addition, we give a general formula for the variance of such contrasts. The efficiency of the proposed designs is evaluated by estimating the variances of the log-ratios of the comparisons of interest.
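
The graph view described in this abstract can be sketched in a few lines. The example below is an illustration only and is not the authors' method or their variance formula. It assumes each array is stored as a hypothetical tuple `(a, b, log_ratio)` where `log_ratio` estimates the expression at timepoint `b` minus that at timepoint `a`, and it uses made-up numbers. A breadth-first search finds a shortest path of arrays between two timepoints and sums the signed log-ratios along it.

```python
from collections import deque


def path_contrast(arrays, start, goal):
    """Estimate (goal - start) along a shortest path of arrays.

    Returns (estimate, number_of_arrays_used), or None if no path links the timepoints.
    """
    adjacency = {}
    for a, b, log_ratio in arrays:
        adjacency.setdefault(a, []).append((b, log_ratio))
        # Traversing an array backwards flips the sign of its log-ratio.
        adjacency.setdefault(b, []).append((a, -log_ratio))

    reached = {start: (0.0, 0)}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return reached[node]
        for neighbour, log_ratio in adjacency.get(node, []):
            if neighbour not in reached:
                total, steps = reached[node]
                reached[neighbour] = (total + log_ratio, steps + 1)
                queue.append(neighbour)
    return None


# Hypothetical log-ratios for a chain of timepoints t0 -> t1 -> t2 -> t3.
arrays = [("t0", "t1", 0.5), ("t1", "t2", 0.25), ("t2", "t3", -0.125)]

print(path_contrast(arrays, "t0", "t2"))  # (0.75, 2)
print(path_contrast(arrays, "t3", "t1"))  # (-0.125, 2)
print(path_contrast(arrays, "t0", "t9"))  # None
```

If the log-ratios on different arrays are independent with a common variance, an estimate built from a single path of k arrays has k times that variance. When several paths connect two timepoints, a least-squares fit that combines them would generally be more precise than this single-path estimate.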

18.
With the increasing number of studies focusing on PIWI-interacting RNAs (piRNAs), it is now pertinent to develop efficient tools dedicated to piRNA analysis. We have developed a novel cluster prediction tool called PILFER (PIrna cLuster FindER), which can accurately predict piRNA clusters from small RNA sequencing data. PILFER is an open source, easy to use tool, and can be executed even on a personal computer with minimum resources. It uses a sliding-window mechanism by integrating the expression of the reads along with the spatial information to predict the piRNA clusters. We have additionally defined a piRNA analysis pipeline incorporating PILFER to detect and annotate piRNAs and their clusters from raw small RNA sequencing data and implemented it on publicly available data from healthy germline and somatic tissues. We compared PILFER with other existing piRNA cluster prediction tools and found it to be statistically more accurate and superior in several respects, including higher robustness of its clusters and better memory efficiency. Overall, PILFER provides a fast and accurate solution to piRNA cluster prediction.

19.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the data significance. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which pooled design becomes preferable to non-pooled design can then be derived given the unit cost associated with a microarray and that with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved.

20.
We discuss the definition and application of design criteria for evaluating the efficiency of 2-color microarray designs. First, we point out that design optimality criteria are defined differently for the regression and block design settings. This has caused some confusion in the literature and warrants clarification. Linear models for microarray data analysis have equivalent formulations as ANOVA or regression models. However, this equivalence does not extend to design criteria. We discuss optimality criteria, and argue against applying regression-style D-optimality to the microarray design problem. We further disfavor E- and D-optimality (as defined in block design) because they are not attuned to scientific questions of interest.
