Similar literature
 Found 20 similar articles; search took 78 ms
1.
2.
MOTIVATION: There is a very large and growing level of effort toward improving the platforms, experiment designs, and data analysis methods for microarray expression profiling. Along with a growing richness in the approaches there is a growing confusion among most scientists as to how to make objective comparisons and choices between them for different applications. There is a need for a standard framework for the microarray community to compare and improve analytical and statistical methods. RESULTS: We report on a microarray data set comprising 204 in-situ synthesized oligonucleotide arrays, each hybridized with two-color cDNA samples derived from 20 different human tissues and cell lines. Design of the approximately 24 000 60mer oligonucleotides that report approximately 2500 known genes on the arrays, and design of the hybridization experiments, were carried out in a way that supports the performance assessment of alternative data processing approaches and of alternative experiment and array designs. We also propose standard figures of merit for success in detecting individual differential expression changes or expression levels, and for detecting similarities and differences in expression patterns across genes and experiments. We expect this data set and the proposed figures of merit will provide a standard framework for much of the microarray community to compare and improve many analytical and statistical methods relevant to microarray data analysis, including image processing, normalization, error modeling, combining of multiple reporters per gene, use of replicate experiments, and sample referencing schemes in measurements based on expression change. AVAILABILITY/SUPPLEMENTARY INFORMATION: Expression data and supplementary information are available at http://www.rii.com/publications/2003/HE_SDS.htm

3.
MOTIVATION: We present statistical methods for determining the number of per-gene replicate spots required in microarray experiments. The purpose of these methods is to obtain an estimate of the sampling variability present in microarray data, and to determine the number of replicate spots required to achieve a high probability of detecting a significant fold change in gene expression, while maintaining a low error rate. Our approach is based on data from control microarrays, and involves the use of standard statistical estimation techniques. RESULTS: After analyzing two experimental data sets containing control array data, we were able to determine the statistical power available for the detection of significant differential expression given differing levels of replication. The inclusion of replicate spots on microarrays not only allows more accurate estimation of the variability present in an experiment, but more importantly increases the probability of detecting genes undergoing significant fold changes in expression, while substantially decreasing the probability of observing fold changes due to chance rather than true differential expression.
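
The replication argument in the abstract above can be illustrated with a rough power calculation. This is only a sketch under assumptions that are not taken from the paper: replicate spots are treated as independent, normally distributed log2 ratios with a known per-spot standard deviation `sigma` (for example, one estimated from control arrays), and a two-sided z-test is applied to their mean. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_for_replicates(n_spots, log2_fold_change, sigma, alpha=0.001):
    """Approximate power of a two-sided z-test on the mean of n_spots log2 ratios.

    Assumption (not from the paper): spots are independent and normally
    distributed with known standard deviation sigma.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardized shift of the mean log2 ratio under the alternative.
    shift = abs(log2_fold_change) * sqrt(n_spots) / sigma
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def min_replicates(log2_fold_change, sigma, alpha=0.001, target_power=0.9,
                   max_spots=100):
    """Smallest number of replicate spots reaching target_power, or None."""
    for n_spots in range(1, max_spots + 1):
        if power_for_replicates(n_spots, log2_fold_change, sigma, alpha) >= target_power:
            return n_spots
    return None


if __name__ == "__main__":
    # Hypothetical inputs: a two-fold change (log2 = 1) and per-spot SD of 0.5.
    for n in (1, 2, 4, 8):
        print(n, round(power_for_replicates(n, 1.0, 0.5), 3))
    print("replicates needed:", min_replicates(1.0, 0.5))
```

With no true change (`log2_fold_change=0`) the computed power reduces to `alpha`, which is the chance of a spurious call that extra replicates are meant to keep low.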

4.
Identifying differentially expressed genes across various conditions or genotypes is the most typical approach to studying the regulation of gene expression. An estimate of gene-specific variance is often needed for the assessment of statistical significance in most differential expression (DE) detection methods, including linear models (e.g., for transformed and normalized microarray data) and generalized linear models (e.g., for count data in RNAseq). Due to a common limit in sample size, the variance estimate is often unstable in small experiments. Shrinkage estimates using empirical Bayes methods have proven useful in improving the variance estimate, hence improving the detection of DE. The most widely used empirical Bayes methods borrow information across genes within the same experiment. In these methods, genes are considered exchangeable, or exchangeable conditional on expression level. We propose that, with the increasing accumulation of expression data, borrowing information from historical data on the same gene can provide a better estimate of gene-specific variance and thus further improve DE detection. Specifically, we show that the variation of gene expression is truly gene-specific and reproducible between different experiments. We present a new method to establish an informative gene-specific prior on the variance of expression using existing public data, and illustrate how to shrink the variance estimate and detect DE. We demonstrate improvement in DE detection under our strategy compared to leading DE detection methods.
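
The shrinkage idea described above can be sketched with the standard conjugate formula for combining a prior variance with a sample variance, weighted by their degrees of freedom. This is a minimal illustration, not the authors' implementation: the gene-specific prior (`prior_var`, `prior_df`) is assumed to have been estimated elsewhere, for example from historical data, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def shrunken_variance(sample_var, df, prior_var, prior_df):
    """Degrees-of-freedom-weighted average of prior and sample variance."""
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


def moderated_t(group_a, group_b, prior_var, prior_df):
    """Two-sample t-statistic using the shrunken variance.

    Assumption: prior_var and prior_df describe a gene-specific prior
    supplied by the caller; how it is estimated is outside this sketch.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    post_var = shrunken_variance(pooled_var, df, prior_var, prior_df)
    return (mean(group_a) - mean(group_b)) / sqrt(post_var * (1 / n_a + 1 / n_b))


if __name__ == "__main__":
    # Hypothetical log-expression values for one gene in two conditions.
    control = [7.9, 8.1, 8.0]
    treated = [8.9, 9.2, 9.0]
    print(moderated_t(treated, control, prior_var=0.05, prior_df=4))
```

Setting `prior_df` to 0 recovers the ordinary pooled-variance t-statistic, so the prior degrees of freedom control how strongly a small experiment leans on the borrowed information.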

5.
Wang Y, Wu C, Ji Z, Wang B, Liang Y. PLoS ONE, 2011, 6(5): e20060

Background

We propose a non-parametric method, the Non-Parametric Change Point Statistic (NPCPS), which uses a single equation to detect differential gene expression (DGE) in microarray data. NPCPS is based on change point theory to provide effective DGE detection.

Methodology

NPCPS uses the data distribution of the normal samples as input, and detects DGE in the cancer samples by locating the change point of the gene expression profile. An estimate of the change point position generated by NPCPS enables the identification of the samples containing DGE. Monte Carlo simulation and an ROC study were applied to examine the detection accuracy of NPCPS, and an experiment on real microarray data from breast cancer was carried out to compare NPCPS with other methods.

Conclusions

The simulation study indicated that NPCPS was more effective at detecting DGE in a cancer subset than five parametric methods and one non-parametric method. When there were more than 8 cancer samples containing DGE, the type I error of NPCPS was below 0.01. Experimental results showed both good accuracy and reliability of NPCPS. Out of the 30 top genes ranked by NPCPS, 16 genes were reported as relevant to cancer. Correlations between the detection results of NPCPS and those of the compared methods were less than 0.05, while between the other methods the values ranged from 0.20 to 0.84. This indicates that NPCPS works on different features and thus provides DGE identification from a distinct perspective compared with the other mean- or median-based methods.

6.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the data significance. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which pooled design becomes preferable to non-pooled design can then be derived given the unit cost associated with a microarray and that with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved.
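
The trade-off between pooling and cost discussed above can be sketched with the usual variance decomposition for a class mean. This is not the paper's exact power formula; it assumes that pooling averages the subjects' expression values on the analysis scale, that biological and technical errors are independent, and that every pool has the same size. The function names and the numbers in the usage example are hypothetical.

```python
def class_mean_variance(bio_var, tech_var, n_pools,
                        subjects_per_pool=1, arrays_per_pool=1):
    """Variance of the estimated class mean for a pooled design.

    Assumption: each pool averages subjects_per_pool subjects and is
    measured on arrays_per_pool technical replicate arrays.
    """
    per_pool = bio_var / subjects_per_pool + tech_var / arrays_per_pool
    return per_pool / n_pools


def design_cost(n_pools, subjects_per_pool, arrays_per_pool,
                cost_per_subject, cost_per_array):
    """Total cost of one class under the same design."""
    return n_pools * (subjects_per_pool * cost_per_subject
                      + arrays_per_pool * cost_per_array)


if __name__ == "__main__":
    # Hypothetical variances and unit costs, for illustration only.
    designs = {"8 arrays, no pooling": (8, 1, 1),
               "4 arrays, pools of 4": (4, 4, 1)}
    for name, (pools, subjects, arrays) in designs.items():
        var = class_mean_variance(4.0, 1.0, pools, subjects, arrays)
        cost = design_cost(pools, subjects, arrays, 10.0, 100.0)
        print(f"{name}: variance={var:.3f}, cost={cost:.0f}")
```

Comparing designs at equal variance shows when pooling pays off: it helps most when arrays are expensive relative to subjects and biological variance dominates technical variance.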

7.
MOTIVATION: Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also provide a level of uncertainty in this measurement. This probe-level measurement error provides useful information which can help in the identification of differentially expressed genes. RESULTS: We propose a Bayesian method to include probe-level measurement error into the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in dataset and a mouse time-course dataset. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression. AVAILABILITY: The MAP approximation and variational inference described in this paper have been implemented in an R package pplr. The MCMC method is implemented in Matlab. Both software packages are available from http://umber.sbs.man.ac.uk/resources/puma.
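
To make the PPLR quantity concrete, the sketch below computes the probability that a log-ratio is positive when each condition's log expression level is summarized by an independent Gaussian posterior (a mean and a standard deviation). That Gaussian summary is an assumption made here for illustration; the paper's variational, MAP, and MCMC estimation steps are not reproduced, and the function name is hypothetical rather than part of the pplr package.

```python
from math import sqrt
from statistics import NormalDist


def prob_positive_log_ratio(mean_a, sd_a, mean_b, sd_b):
    """P(level_a - level_b > 0) for independent Gaussian posteriors.

    Assumption: each condition's log expression level has an approximately
    Gaussian posterior with the given mean and standard deviation.
    """
    diff_sd = sqrt(sd_a ** 2 + sd_b ** 2)
    return NormalDist().cdf((mean_a - mean_b) / diff_sd)


if __name__ == "__main__":
    # Same point estimates, different measurement uncertainty (made-up values).
    print(prob_positive_log_ratio(9.0, 0.2, 8.0, 0.2))  # precise measurements
    print(prob_positive_log_ratio(9.0, 1.5, 8.0, 1.5))  # noisy measurements
```

Two genes with the same point estimates can receive very different probabilities once their measurement uncertainty is taken into account, which is the information a method based on point estimates alone would discard.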

8.
The main aims of this study were to determine the effects of GH gene abuse/misuse in normal animals and to discover genes that could be used as candidate biomarkers for the detection of GH gene therapy abuse/misuse in humans. We determined the global gene expression profile of peripheral whole blood from normal adult male rats after long-term GH gene therapy using CapitalBio 27 K Rat Genome Oligo Arrays. Sixty-one genes were found to be differentially expressed in GH gene-treated rats 24 weeks after receiving GH gene therapy, at a two-fold higher or lower level compared to the empty vector group (p < 0.05). These genes were mainly associated with angiogenesis, oncogenesis, apoptosis, immune networks, signaling pathways, general metabolism, type I diabetes mellitus, carbon fixation, cell adhesion molecules, and cytokine-cytokine receptor interaction. The results imply that exogenous GH gene expression in normal subjects is likely to induce cellular changes in metabolism, signaling pathways and immunity. A real-time qRT-PCR analysis of a selection of the genes confirmed the microarray data. Eight differentially expressed genes were selected as candidate biomarkers from among these 61 genes. These 8 showed five-fold higher or lower expression levels after the GH gene transduction (p < 0.05). They were then validated in real-time PCR experiments using 15 single-treated blood samples and 10 control blood samples. In summary, we detected the gene expression profiles of rat peripheral whole blood after long-term GH gene therapy and screened eight genes as candidate biomarkers based on the microarray data. This will contribute to an increased mechanistic understanding of the effects of chronic GH gene therapy abuse/misuse in normal subjects.

9.
A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM) and the mixture model method (MMM), has been proposed to detect differential gene expression for replicated microarray experiments. They all depend on constructing a test statistic, for example, a t-statistic, and then using permutation to draw inferences. However, due to special features of microarray data, using standard permutation scores may not estimate the null distribution of the test statistic well, possibly leading to overly conservative inferences. We propose a new method of constructing weighted permutation scores to overcome the problem: posterior probabilities of having no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data for time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.
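
One way to picture weighted permutation scores is to pool the permuted statistics from all genes into a single null distribution, letting each gene contribute in proportion to its posterior probability of not being differentially expressed. The sketch below follows that reading of the abstract and is an assumption, not the paper's exact estimator; the function name and the toy numbers are hypothetical.

```python
def weighted_null_pvalue(observed, permuted_by_gene, null_weights):
    """Weighted proportion of pooled permuted statistics at least as extreme.

    permuted_by_gene: one list of permuted statistics per gene.
    null_weights: per-gene weights, assumed here to be posterior
        probabilities of no differential expression.
    """
    total_weight = 0.0
    extreme_weight = 0.0
    for permuted_stats, weight in zip(permuted_by_gene, null_weights):
        for stat in permuted_stats:
            total_weight += weight
            if abs(stat) >= abs(observed):
                extreme_weight += weight
    if total_weight == 0.0:
        raise ValueError("all weights are zero; the null distribution is empty")
    return extreme_weight / total_weight


if __name__ == "__main__":
    # Toy data: the second gene is probably differentially expressed, so its
    # permuted statistics are inflated and should count for little.
    permuted = [[0.1, -0.2], [3.0, -4.0]]
    print(weighted_null_pvalue(1.0, permuted, [1.0, 1.0]))    # unweighted
    print(weighted_null_pvalue(1.0, permuted, [0.75, 0.25]))  # weighted
```

Down-weighting genes that are likely to be differentially expressed keeps their inflated permuted statistics from widening the null distribution, which is the source of conservativeness the abstract describes.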

10.
Identifying differentially expressed (DE) genes across conditions or treatments is a typical problem in microarray experiments. In time course microarray experiments (under two or more conditions/treatments), it is sometimes of interest to identify two classes of DE genes: those with no time-condition interactions (called parallel DE genes, or PDE), and those with time-condition interactions (nonparallel DE genes, NPDE). Although many methods have been proposed for identifying DE genes in time course experiments, methods for discerning NPDE genes from the general DE genes are still lacking. We propose a functional ANOVA mixed-effect model to model time course gene expression observations. The fixed effect (the mean curve) of the model decomposes bivariate functions of time and treatments (or experimental conditions) as in the classic ANOVA method and provides the associated notions of main effects and interactions. Random effects capture time-dependent correlation structures. In this model, identifying NPDE genes is equivalent to testing the significance of the time-condition interaction, for which an approximate F-test is suggested. We examined the performance of the proposed method on simulated datasets in comparison with some existing methods, and applied the method to a study of the human response to endotoxin stimulation, as well as to a cell cycle expression data set.

11.
Experiments using cDNA microarrays for the identification of genes with certain expression patterns require a thoughtfully planned design. This study was conducted to determine an optimal design for a microarray experiment to estimate differential gene expression between hybrids and their parental inbred lines in maize (i.e. dominance). It has two features: the contrasts of interest contain more than two genotypes and the procedure may be customised to other microarray experiments where different effects may influence hybridisation signals. A mixed model was used to include all important effects. Impacts during growth of the plant material were taken into consideration as well as those occurring during hybridisation. The results of a preliminary experiment were used to determine which effects were to be included in the model, and data from another microarray experiment were used to estimate variance components. In order to select good designs, an optimality criterion adapted to the problem of differential gene expression between hybrids and their parental inbred lines was defined. Two approaches were used to determine an optimal design: the first one simplifies the problem by dividing it into several subproblems, whereas the second is more sophisticated and uses a simulated annealing (SA) algorithm. We found that the first approach constitutes a useful means for designing microarray experiments to study this problem. Using the more sophisticated SA approach the design can be further improved.

12.
Finding edging genes from microarray data   Cited by: 1 (self-citations: 0; citations by others: 1)
MOTIVATION: A set of genes and their gene expression levels are used to classify disease and normal tissues. Due to the massive number of genes in a microarray, there are a large number of edges to divide different classes of genes in microarray space. The edging genes (EGs) can be co-regulated genes; they can also be on the same pathway or deregulated by the same non-coding genes, such as siRNA or miRNA. Every gene in EGs is vital for identifying a tissue's class. A change in the expression of one EG may cause a tissue to switch from normal to disease and vice versa. Finding EGs is of biological importance. In this work, we propose an algorithm to effectively find these EGs. RESULTS: We tested our algorithm with five microarray datasets. The results are compared with the border-based algorithm which was used to find gene groups and subsequently divide different classes of tissues. Our algorithm finds a significantly larger number of EGs than does the border-based algorithm. As our algorithm prunes irrelevant patterns at earlier stages, its time and space complexities are much lower than those of the border-based algorithm. AVAILABILITY: The algorithm proposed is implemented in C++ on the Linux platform. The EGs in five microarray datasets are calculated. The preprocessed datasets and the discovered EGs are available at http://www3.it.deakin.edu.au/~phoebe/microarray.html.

13.
The low reproducibility of differential expression of individual genes in microarray experiments has led to the suggestion that experiments be analyzed in terms of gene characteristics, such as GO categories or pathways, in order to enhance the robustness of the results. An implicit assumption of this approach is that the different experiments in effect randomly sample the genes participating in an active process. We argue that by the same rationale it is possible to perform this higher-level analysis on the aggregation of genes that are differentially-expressed in different expression-based studies, even if the experiments used different platforms. The aggregation increases the reliability of the results, it has the potential for uncovering signals that are liable to escape detection in the individual experiments, and it enables a more thorough mining of the ever more plentiful microarray data. We present here a proof-of-concept study of these ideas, using ten studies describing the changes in expression profiles of human host genes in response to infection by Retroviridae or Herpesviridae viral families. We supply a tool (accessible at www.cs.bgu.ac.il/~waytogo) which enables the user to learn about genes and processes of interest in this study.

14.
MOTIVATION: Identification of genes expressed in a cell-cycle-specific periodical manner is of great interest to understand cyclic systems which play a critical role in many biological processes. However, identifying cell-cycle-regulated genes directly from raw microarray gene expression data is complicated by synchronization loss and thus remains a challenging problem. Decomposing the expression measurements and extracting synchronized expression will better represent the single-cell behavior and improve the accuracy in identifying periodically expressed genes. RESULTS: In this paper, we propose a resynchronization-based algorithm for identifying cell-cycle-related genes. We introduce a synchronization loss model by modeling the gene expression measurements as a superposition of different cell populations growing at different rates. The underlying expression profile is then reconstructed through resynchronization and is further fitted to the measurements in order to identify periodically expressed genes. Results from both simulations and real microarray data show that the proposed scheme is promising for identifying cyclic genes and revealing underlying gene expression profiles. AVAILABILITY: Contact the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at: http://dsplab.eng.umd.edu/~genomics/syn/

15.
Moderated statistical tests for assessing differences in tag abundance   Cited by: 2 (self-citations: 0; citations by others: 2)
MOTIVATION: Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of sequencing DNA decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of these are usable when the number of replicates is very small. RESULTS: We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. Not only is our strategy applicable even with the smallest number of libraries, but it also proves to be more powerful than previous strategies when more libraries are available. The methodology is equally applicable to other counting technologies, such as proteomic spectral counts. AVAILABILITY: An R package can be accessed from http://bioinf.wehi.edu.au/resources/
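
The overdispersion idea above can be illustrated with the negative binomial mean-variance relationship, variance = mean + dispersion × mean². The sketch below estimates a per-gene dispersion by the method of moments and then pulls it toward a common value with a simple linear weight. That linear shrinkage is a stand-in chosen for brevity, not the conditional weighted likelihood used in the paper; equal library sizes are assumed, and all function names are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Negative binomial variance; dispersion 0 gives the Poisson variance."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate, truncated at zero.

    Assumption: all libraries have equal size, so raw counts are comparable.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return max(0.0, (variance(counts) - mu) / mu ** 2)


def moderated_dispersion(gene_dispersion, common_dispersion, weight):
    """Linear shrinkage toward a common dispersion (weight in [0, 1]).

    A stand-in for the paper's conditional weighted likelihood.
    """
    return (1 - weight) * gene_dispersion + weight * common_dispersion


if __name__ == "__main__":
    # Made-up tag counts for one gene across four libraries.
    counts = [12, 30, 18, 44]
    raw = moment_dispersion(counts)
    print(raw, moderated_dispersion(raw, common_dispersion=0.1, weight=0.5))
```

With only two or three libraries the per-gene estimate is very noisy, which is why some form of moderation across genes is needed before testing.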

16.
17.
MOTIVATION: A common objective of microarray experiments is the detection of differential gene expression between samples obtained under different conditions. The task of identifying differentially expressed genes consists of two aspects: ranking and selection. Numerous statistics have been proposed to rank genes in order of evidence for differential expression. However, no one statistic is universally optimal and there is seldom any basis or guidance for choosing a particular statistic. RESULTS: Our new approach, which addresses both ranking and selection of differentially expressed genes, integrates differing statistics via a distance synthesis scheme. Using a set of (Affymetrix) spike-in datasets, in which differentially expressed genes are known, we demonstrate that our method compares favorably with the best individual statistics, while achieving robustness properties that the individual statistics lack. We further evaluate performance on one other microarray study.

18.
Darvish A, Najarian K. Bio Systems, 2006, 83(2-3): 125-135
We propose a novel technique that constructs gene regulatory networks from DNA microarray data and gene-protein databases and then applies Mason's rule to systematically search for the most dominant regulators of the network. The algorithm then recommends the identified dominant regulator genes as the best candidates for future knock-out experiments. Actively choosing the genes for knock-out experiments allows optimal perturbation of the pathway and therefore produces the most informative DNA microarray data for pathway identification purposes. This approach is especially advantageous in the analysis of large pathways, where the time and cost of DNA microarray experiments can be reduced using the proposed optimal experiment design. The proposed method was successfully tested on the galactose regulatory network.

19.
20.