Similar literature
20 similar articles found.
1.
Genome-scale microarray experiments for comparative analysis of gene expression produce massive amounts of information. Traditional statistical approaches fail to achieve the required accuracy in sensitivity and specificity of the analysis. Since the problem can be resolved neither by increasing the number of replicates nor by manipulating thresholds, one needs a novel approach to the analysis. This article describes methods to improve the power of microarray analyses by defining internal standards to characterize features of the biological system being studied and the technological processes underlying the microarray experiments. Applying these methods, internal standards are identified and then the obtained parameters are used to define (i) genes that are distinct in their expression from background; (ii) genes that are differentially expressed; and finally (iii) genes that have similar dynamical behavior.

2.
Accurately identifying differentially expressed genes from microarray data is not a trivial task, partly because of poor variance estimates of gene expression signals. Here, after analyzing 380 replicated microarray experiments, we found that probesets have typical, distinct variances that can be estimated based on a large number of microarray experiments. These probeset-specific variances depend at least in part on the function of the probed gene: genes for ribosomal or structural proteins often have a small variance, while genes implicated in stress responses often have large variances. We used these variance estimates to develop a statistical test for differentially expressed genes called EVE (external variance estimation). The EVE algorithm performs better than the t-test and LIMMA on some real-world data, where external information from appropriate databases is available. Thus, EVE helps to maximize the information gained from a typical microarray experiment. Nonetheless, only a large number of replicates will guarantee identification of nearly all truly differentially expressed genes. However, our simulation studies suggest that even limited numbers of replicates will usually result in good coverage of strongly differentially expressed genes.

3.
4.
Heart failure (HF) is a major cause of mortality and morbidity in the developed world. Gene expression profiles of animal models of heart failure have been used in a number of studies to understand human cardiac disease. In this study, statistical methods of analysing microarray data on cardiac tissues from dogs with pacing-induced HF were used to identify differentially expressed genes between normal and two abnormal tissues. The unsupervised techniques principal component analysis (PCA) and cluster analysis were explored to distinguish between three different groups of 12 arrays and to separate the genes which are upregulated in different conditions among 23912 genes in the heart failure canine microarray data. It was found that out of 23912 genes, 1802 genes were differentially expressed in the three groups at the 5% level of significance and 496 genes were differentially expressed at the 1% level of significance using one-way analysis of variance (ANOVA). The genes clustered using PCA and clustering analysis were explored in the paper to understand HF, and a small number of differentially expressed genes related to HF were identified.
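The per-gene one-way ANOVA mentioned in this abstract can be illustrated with a minimal sketch. It assumes one expression value per array and a simple split of arrays into groups; the function name one_way_anova_f and the toy values are hypothetical and are not taken from the paper. Only the F statistic is computed; turning it into a p-value at the 5% or 1% level would need an F-distribution routine, which is left out here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values.

    Each inner list holds the expression values of ONE gene in one group
    of arrays (illustrative layout, not the paper's data format).
    """
    k = len(groups)                           # number of groups
    n = sum(len(g) for g in groups)           # total number of arrays
    grand = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy gene measured in three groups of three arrays each (made-up numbers).
toy_gene = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_anova_f(toy_gene))  # F = 3.0 for this toy gene
```

In a screen over thousands of genes, the same function would be applied gene by gene, and the resulting p-values would still need a multiple-testing adjustment before counting "significant" genes.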

5.
Pan W, Lin J, Le CT. Genome Biology, 2002, 3(5): research0022.1-research0022.10

Background  

It has been recognized that replicates of arrays (or spots) may be necessary for reliably detecting differentially expressed genes in microarray experiments. However, the often-asked question of how many replicates are required has barely been addressed in the literature. In general, the answer depends on several factors: a given magnitude of expression change, a desired statistical power (that is, probability) to detect it, a specified Type I error rate, and the statistical method being used to detect the change. Here, we discuss how to calculate the number of replicates in the context of applying a nonparametric statistical method, the normal mixture model approach, to detect changes in gene expression.

6.
The identification and cloning of genes conferring mosquito refractoriness to the malaria parasite is critical for understanding malaria transmission mechanisms and holds great promise for developing novel approaches to malaria control. The mosquito midgut is the first major site of interaction between the parasite and the mosquito. Failure of the parasite to negotiate this environment can be a barrier for development and is likely the main cause of mosquito refractoriness. This paper reports a study on Aedes aegypti midgut expressed sequence tag (EST) identification and the determination of genes differentially expressed in mosquito populations susceptible and refractory to the avian malaria parasite Plasmodium gallinaceum. We sequenced a total of 1200 cDNA clones and obtained 1183 high-quality mosquito midgut ESTs that were computationally collapsed into 105 contigs and 251 singlets. All 1200 midgut cDNA clones, together with an additional 102 genetically or physically mapped Ae. aegypti clones, were spotted on single arrays with 12 replicates. Of those interrogated microarray elements, 28 (2.3%) were differentially expressed between the susceptible and refractory mosquito populations. Twenty-seven elements showed at least a two-fold increase in expression in the susceptible population relative to the refractory population and one clone showed reduced expression. Sequence analysis of these differentially expressed genes revealed that 10 showed no significant similarity to any known genes, 6 clones had matches with unannotated genes of Anopheles gambiae, and 12 clones exhibited significant similarity to known genes. Real-time quantitative RT-PCR of selected clones confirmed the mRNA expression profiles from the microarray analysis.

7.
The level of differential gene expression may be defined as a fold change, a frequency of upregulation, or some other measure of the degree or extent of a difference in expression across groups of interest. On the basis of expression data for hundreds or thousands of genes, inferring which genes are differentially expressed or ranking genes in order of priority introduces a bias in estimates of their differential expression levels. A previous correction of this feature selection bias suffers from a lack of generality in the method of ranking genes, from requiring many biological replicates, and from unnecessarily overcompensating for the bias. For any method of ranking genes on the basis of gene expression measured for as few as three biological replicates, a simple leave-one-out algorithm corrects, with less overcompensation, the bias in estimates of the level of differential gene expression. In a microarray data set, the bias correction reduces estimates of the probability of upregulation or downregulation from 100% to as low as 60%, even for genes with estimated local false discovery rates close to 0. A simulation study quantifies both the advantage of smoothing estimates of bias before correction and the degree of overcompensation.

8.
High water use efficiency or transpiration efficiency (TE) in wheat is a desirable physiological trait for increasing grain yield under water-limited environments. The identification of genes associated with this trait would facilitate the selection for genotypes with higher TE using molecular markers. We performed an expression profiling (microarray) analysis of approximately 16,000 unique wheat ESTs to identify genes that were differentially expressed between wheat progeny lines with contrasting TE levels from a cross between Quarrion (high TE) and Genaro 81 (low TE). We also conducted a second microarray analysis to identify genes responsive to drought stress in wheat leaves. Ninety-three genes that were differentially expressed between high and low TE progeny lines were identified. One fifth of these genes were markedly responsive to drought stress. Several potential growth-related regulatory genes, which were down-regulated by drought, were expressed at a higher level in the high TE lines than the low TE lines and are potentially associated with a biomass production component of the Quarrion-derived high TE trait. Eighteen of the TE differentially expressed genes were further analysed using quantitative RT-PCR on a separate set of plant samples from those used for microarray analysis. The expression levels of 11 of the 18 genes were positively correlated with the high TE trait, measured as carbon isotope discrimination (Δ13C). These data indicate that some of these TE differentially expressed genes are candidates for investigating processes that underlie the high TE trait or for use as expression quantitative trait loci (eQTLs) for TE. Electronic Supplementary Material: Supplementary material is available for this article at

9.
Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.

10.

Background  

To identify differentially expressed genes across experimental conditions in oligonucleotide microarray experiments, existing statistical methods commonly use a summary of probe-level expression data for each probe set and compare replicates of these values across conditions using a form of the t-test or rank sum test. Here we propose the use of a statistical method that takes advantage of the built-in redundancy architecture of high-density oligonucleotide arrays.

11.
MOTIVATION: Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) nonparametric t-test, (2) Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs. RESULTS: All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
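Two of the three approaches named in this abstract are simple enough to sketch. The code below is an illustration under assumptions, not the authors' implementation: rank_sum computes the Wilcoxon rank-sum statistic with midranks for ties, and ideal_discriminator_score assumes that the "perfectly differentiating gene" can be represented as a 0/1 template over the two groups. All function names and expression values are hypothetical.

```python
from math import sqrt


def rank_sum(group_a, group_b):
    """Wilcoxon rank-sum statistic: sum of the ranks of group_a in the pooled data."""
    pooled = sorted(group_a + group_b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average (midrank)
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[x] for x in group_a)


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ideal_discriminator_score(expression, labels):
    """Correlation of one gene with a 0/1 template that separates the groups perfectly."""
    return pearson(expression, labels)


# Made-up expression values for one gene: three arrays per group.
group_a = [1.0, 1.2, 0.9]
group_b = [3.1, 2.8, 3.0]
print(rank_sum(group_a, group_b))  # 6.0, the smallest possible value for three arrays
print(ideal_discriminator_score(group_a + group_b, [0, 0, 0, 1, 1, 1]))
```

A rank-sum statistic at its minimum or maximum means the two groups do not overlap at all; genes would then be ranked by how extreme the statistic or the template correlation is.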

12.

Background  

In microarray gene expression profiling experiments, differentially expressed genes (DEGs) are detected from among tens of thousands of genes on an array using statistical tests. It is important to control the number of false positives or errors that are present in the resultant DEG list. To date, more than 20 different multiple test methods have been reported that compute overall Type I error rates in microarray experiments. However, these methods share the following dilemma: they have low power in cases where only a small number of DEGs exist among a large number of total genes on the array.

13.
Combining multiple microarrays in the presence of controlling variables
MOTIVATION: Microarray technology enables the monitoring of expression levels for thousands of genes simultaneously. When the magnitude of the experiment increases, it becomes common to use the same type of microarrays from different laboratories or hospitals. Thus, it is important to analyze microarray data together to derive a combined conclusion after accounting for the differences. One of the main objectives of the microarray experiment is to identify differentially expressed genes among the different experimental groups. The analysis of variance (ANOVA) model has been commonly used to detect differentially expressed genes after accounting for the sources of variation commonly observed in the microarray experiment. RESULTS: We extended the usual ANOVA model to account for additional variability resulting from many confounding variables such as the effect of different hospitals. The proposed model is a two-stage ANOVA model. The first stage is the adjustment for the effects of no interest. The second stage is the detection of differentially expressed genes among the experimental groups using the residuals obtained from the first stage. Based on these residuals, we propose a permutation test to detect the differentially expressed genes. The proposed model is illustrated using the data from 133 microarrays collected at three different hospitals. The proposed approach is more flexible to use, and it is easier to accommodate the individual covariates in this model than using the meta-analysis approach. AVAILABILITY: A set of programs written in R will be electronically sent upon request.
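The two-stage idea in this abstract (adjust for nuisance effects first, then run a permutation test on the residuals) can be sketched roughly as follows. This is a deliberately simplified stand-in, not the authors' R programs: stage one only subtracts a per-hospital mean, stage two compares two groups by the absolute difference in mean residuals, and labels are shuffled freely across hospitals. The function names, the two-group setup and the data are all hypothetical.

```python
import random
from statistics import mean


def remove_site_effect(values, sites):
    """Stage 1 (simplified): subtract each site's mean from its own samples."""
    site_means = {
        s: mean([v for v, t in zip(values, sites) if t == s]) for s in set(sites)
    }
    return [v - site_means[s] for v, s in zip(values, sites)]


def permutation_p_value(residuals, groups, n_perm=2000, seed=0):
    """Stage 2 (simplified): permutation test on the residuals of one gene.

    groups holds 0/1 labels; the statistic is the absolute difference
    between the two group means of the residuals.
    """
    rng = random.Random(seed)

    def stat(labels):
        a = [r for r, g in zip(residuals, labels) if g == 0]
        b = [r for r, g in zip(residuals, labels) if g == 1]
        return abs(mean(a) - mean(b))

    observed = stat(groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if stat(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return (hits + 1) / (n_perm + 1)


# Made-up gene: three hospitals with large offsets, two controls and two cases each.
values = [1.0, 1.2, 3.1, 2.9, 6.1, 5.9, 8.0, 8.2, 11.0, 10.8, 13.1, 12.9]
sites = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
groups = [0, 0, 1, 1] * 3
residuals = remove_site_effect(values, sites)
print(permutation_p_value(residuals, groups))
```

In this toy data the hospital offsets are much larger than the group effect, so the raw values overlap heavily between groups while the residuals separate them cleanly. The sketch assumes each hospital contributes samples from both groups; if hospital and group were fully confounded, subtracting the hospital mean would remove the effect of interest as well.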

14.
The effect of replication on gene expression microarray experiments
MOTIVATION: We examine the effect of replication on the detection of apparently differentially expressed genes in gene expression microarray experiments. Our analysis is based on a random sampling approach using real data sets from 16 published studies. We consider both the ability to find genes that meet particular statistical criteria as well as the stability of the results in the face of changing levels of replication. RESULTS: While dependent on the data source, our findings suggest that stable results are typically not obtained until at least five biological replicates have been used. Conversely, for most studies, 10-15 replicates yield results that are quite stable, and there is less improvement in stability as the number of replicates is further increased. Our methods will be of use in evaluating existing data sets and in helping to design new studies.

15.
MOTIVATION: Standard analysis routines for microarray data aim at differentially expressed genes. In this paper, we address the complementary problem of detecting sets of differentially co-expressed genes in two phenotypically distinct sets of expression profiles. RESULTS: We introduce a score for differential co-expression and suggest a computationally efficient algorithm for finding high scoring sets of genes. The use of our novel method is demonstrated in the context of simulations and on real expression data from a clinical study.

16.
MOTIVATION: A common objective of microarray experiments is the detection of differential gene expression between samples obtained under different conditions. The task of identifying differentially expressed genes consists of two aspects: ranking and selection. Numerous statistics have been proposed to rank genes in order of evidence for differential expression. However, no one statistic is universally optimal and there is seldom any basis or guidance that can direct toward a particular statistic of choice. RESULTS: Our new approach, which addresses both ranking and selection of differentially expressed genes, integrates differing statistics via a distance synthesis scheme. Using a set of (Affymetrix) spike-in datasets, in which differentially expressed genes are known, we demonstrate that our method compares favorably with the best individual statistics, while achieving robustness properties that the individual statistics lack. We further evaluate performance on one other microarray study.

17.
In this paper, the problem of identifying differentially expressed genes under different conditions using gene expression microarray data, in the presence of outliers, is discussed. For this purpose, the robust modeling of gene expression data using some powerful distributions known as normal/independent distributions is considered. These distributions include the Student’s t and normal distributions which have been used previously, but also include extensions such as the slash, the contaminated normal and the Laplace distributions. The purpose of this paper is to identify differentially expressed genes by considering these distributional assumptions instead of the normal distribution. A Bayesian approach using the Markov Chain Monte Carlo method is adopted for parameter estimation. Two publicly available gene expression data sets are analyzed using the proposed approach. The use of the robust models for detecting differentially expressed genes is investigated. This investigation shows that the choice of model for differentiating gene expression data is very important. This is due to the small number of replicates for each gene and the existence of outlying data. Comparison of the performance of these models is made using different statistical criteria and the ROC curve. The method is illustrated using some simulation studies. We demonstrate the flexibility of these robust models in identifying differentially expressed genes.

18.
There are many options in handling microarray data that can affect study conclusions, sometimes drastically. Working with a two-color platform, this study uses ten spike-in microarray experiments to evaluate the relative effectiveness of some of these options for the experimental goal of detecting differential expression. We consider two data transformations, background subtraction and intensity normalization, as well as six different statistics for detecting differentially expressed genes. Findings support the use of an intensity-based normalization procedure and also indicate that local background subtraction can be detrimental for effectively detecting differential expression. We also verify that robust statistics outperform t-statistics in identifying differentially expressed genes when there are few replicates. Finally, we find that choice of image analysis software can also substantially influence experimental conclusions.

19.

Background  

The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies. Classical p-value adjustment methods for multiple comparisons, such as those controlling the family-wise error rate (FWER), have been found to be too conservative in analyzing large-screening microarray data, and the False Discovery Rate (FDR), the expected proportion of false positives among all positives, has been recently suggested as an alternative for controlling false positives. Several statistical approaches have been used to estimate and control FDR, but these may not provide reliable FDR estimation when applied to microarray data sets with a small number of replicates.
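The contrast between FWER and FDR control described here can be made concrete with a small sketch. The abstract does not say which FDR procedure it refers to; the code below uses the standard Benjamini-Hochberg step-up procedure as an assumed example and a Bonferroni cutoff as the FWER counterpart. The function names and the p-values are illustrative only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    return sorted(order[:k_max])


def bonferroni(p_values, alpha=0.05):
    """Indices rejected when controlling the FWER with a Bonferroni cutoff."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Made-up p-values for ten genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))  # [0, 1]
print(bonferroni(p))          # [0]
```

On these ten p-values the FDR procedure keeps one more gene than the Bonferroni cutoff, which illustrates why FWER control is often described as too conservative for large screens. Neither procedure addresses the concern raised in the abstract about unreliable estimates when replicates are few.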

20.
Multivariate exploratory tools for microarray data analysis
The ultimate success of microarray technology in basic and applied biological sciences depends critically on the development of statistical methods for gene expression data analysis. The most widely used tests for differential expression of genes are essentially univariate. Such tests disregard the multidimensional structure of microarray data. Multivariate methods are needed to utilize the information hidden in gene interactions and hence to provide more powerful and biologically meaningful methods for finding subsets of differentially expressed genes. The objective of this paper is to develop methods of multidimensional search for biologically significant genes, considering expression signals as mutually dependent random variables. To attain these ends, we consider the utility of a pertinent distance between random vectors and its empirical counterpart constructed from gene expression data. The distance furnishes exploratory procedures aimed at finding a target subset of differentially expressed genes. To determine the size of the target subset, we resort to successive elimination of smaller subsets resulting from each step of a random search algorithm based on maximization of the proposed distance. Different stopping rules associated with this procedure are evaluated. The usefulness of the proposed approach is illustrated with an application to the analysis of two sets of gene expression data.
