Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
SUMMARY: Searching for differentially expressed genes is one of the most common applications for microarrays, yet statistically there are difficult hurdles to achieving adequate rigor and practicality. False discovery rate (FDR) approaches have become relatively standard; however, how to define and control the FDR has been hotly debated. Permutation estimation approaches such as SAM and PaGE can be effective; however, they leave much room for improvement. We pursue the permutation estimation method and describe a convenient definition for the FDR that can be estimated in a straightforward manner. We then discuss issues regarding the choice of statistic and data transformation. It is impossible to optimize the power of any statistic for thousands of genes simultaneously, and we look at the practical consequences of this. For example, the log transform can both help and hurt at the same time, depending on the gene. We examine issues surrounding the SAM 'fudge factor' parameter, and how to handle these issues by optimizing with respect to power.

2.
Bochkina N, Richardson S. Biometrics 2007, 63(4):1117-1125
We consider the problem of identifying differentially expressed genes in microarray data in a Bayesian framework with a noninformative prior distribution on the parameter quantifying differential expression. We introduce a new rule, tail posterior probability, based on the posterior distribution of the standardized difference, to identify genes differentially expressed between two conditions, and we derive a frequentist estimator of the false discovery rate associated with this rule. We compare it to other Bayesian rules in the considered settings. We show how the tail posterior probability can be extended to testing a compound null hypothesis against a class of specific alternatives in multiclass data.

3.
4.
MOTIVATION: It is important to consider finding differentially expressed genes in a dataset of microarray experiments for pattern generation. RESULTS: We developed two methods which are mainly based on the q-values approach; the first is a direct extension of the q-values approach, while the second uses two approaches: q-values and maximum-likelihood. We present two algorithms for the second method, one for error minimization and the other for confidence bounding. Also, we show how the method called Patterns from Gene Expression (PaGE) (Grant et al., 2000) can benefit from q-values. Finally, we conducted some experiments to demonstrate the effectiveness of the proposed methods; experimental results on a selected dataset (BRCA1 vs BRCA2 tumor types) are provided. CONTACT: alhajj@cpsc.ucalgary.ca.

5.
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
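
For orientation, the local FDR in a two-component mixture model is commonly written as below. This is the standard textbook form, not necessarily the exact parameterization used in the paper above; the symbols z_j (test statistic for gene j), f_0 and f_1 (null and non-null densities), \pi_0 (prior probability that a gene is not differentially expressed) and the cutoff c are notation assumed here for illustration.

```latex
% Two-component mixture density of the gene-level statistic z
f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z)

% Local FDR: posterior probability that gene j is not differentially expressed
\mathrm{fdr}(z_j) = \frac{\pi_0 f_0(z_j)}{f(z_j)}

% One possible decision rule (c is a user-chosen cutoff, assumed here)
\text{declare gene } j \text{ differentially expressed if } \mathrm{fdr}(z_j) \le c
```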

6.
Estimating the false discovery rate using nonparametric deconvolution   Cited by: 1 (self-citations: 0, citations by others: 1)
van de Wiel MA, Kim KI. Biometrics 2007, 63(3):806-815
Given a set of microarray data, the problem is to detect differentially expressed genes, using a false discovery rate (FDR) criterion. As opposed to common procedures in the literature, we do not base the selection criterion on statistical significance only, but also on the effect size. Therefore, we select only those genes that are significantly more differentially expressed than some f-fold (e.g., f = 2). This corresponds to the use of an interval null domain for the effect size. Based on a simple error model, we discuss a naive estimator for the FDR, interpreted as the probability that the parameter of interest lies in the null domain (e.g., mu < log_2(2) = 1) given that the test statistic exceeds a threshold. We improve the naive estimator by using deconvolution. That is, the density of the parameter of interest is recovered from the data. We study performance of the methods using simulations and real data.

7.
Tan Y, Liu Y. Bioinformation 2011, 7(8):400-404
Identification of genes differentially expressed across multiple conditions has become an important statistical problem in analyzing large-scale microarray data. Many statistical methods have been developed to address this challenging problem. Therefore, an extensive comparison among these statistical methods is extremely important for experimental scientists to choose a valid method for their data analysis. In this study, we conducted simulation studies to compare six statistical methods: the Bonferroni (B-) procedure, the Benjamini and Hochberg (BH-) procedure, the Local false discovery rate (Localfdr) method, the Optimal Discovery Procedure (ODP), the Ranking Analysis of F-statistics (RAF), and the Significance Analysis of Microarrays (SAM) in identifying differentially expressed genes. We demonstrated that the strength of the treatment effect, the sample size, the proportion of differentially expressed genes and the variance of gene expression significantly affect the performance of the different methods. The simulated results show that ODP exhibits extremely high power in identifying differentially expressed genes, but significantly underestimates the False Discovery Rate (FDR) in all data scenarios. SAM performs poorly when the sample size is small, but is among the best-performing methods when the sample size is large. The B-procedure is stringent and thus has low power in all data scenarios. Localfdr and RAF show statistical behavior comparable to the BH-procedure, with favorable power and conservative FDR estimation. RAF performs best when the proportion of differentially expressed genes is small and the treatment effect is weak, but Localfdr is better than RAF when the proportion of differentially expressed genes is large.
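
To make the contrast between the two simplest procedures in this comparison concrete, the following is a minimal sketch of the Bonferroni correction and the Benjamini-Hochberg step-up procedure. It is not code from the study; the function names, the significance level `alpha = 0.05`, and the example p-values are assumptions chosen for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Return one reject flag per hypothesis under the Bonferroni correction."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one reject flag per hypothesis under the BH step-up procedure."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values for ten genes (illustrative only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))  # prints: 1 2
```

On these made-up values Bonferroni rejects one hypothesis and BH rejects two, which matches the abstract's description of the B-procedure as the more stringent of the two.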

8.

Background  

Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifications, and approaches that are commonly proposed for sample size estimation in microarray experiments. Common methods for sample size estimation are formulated as the minimum sample size necessary to achieve a specified sensitivity (proportion of detected truly differentially expressed genes) on average at a specified false discovery rate (FDR) level and a specified expected proportion (π1) of truly differentially expressed genes in the array. Unfortunately, the probability of detecting the specified sensitivity in such a formulation can be low. We formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level. A permutation method using a small pilot dataset to estimate sample size is proposed. This method accounts for correlation and effect size heterogeneity among genes.

9.
In genome-wide genetic studies with a large number of markers, balancing the type I error rate and power is a challenging issue. Recently proposed false discovery rate (FDR) approaches are promising solutions to this problem. Using the 100 simulated datasets of a genome-wide marker map spaced about 3 cM and phenotypes from the Genetic Analysis Workshop 14, we studied the type I error rate and power of Storey's FDR approach, and compared it to the traditional Bonferroni procedure. We confirmed that Storey's FDR approach had strong control of the FDR. We found that Storey's FDR approach only provided weak control of the family-wise error rate (FWER). For these simulated datasets, Storey's FDR approach had only slightly higher power than the Bonferroni procedure. In conclusion, Storey's FDR approach is more powerful than the Bonferroni procedure if strong control of the FDR or weak control of the FWER is desired. Storey's FDR approach has little power advantage over the Bonferroni procedure if there is low linkage disequilibrium among the markers. Further evaluation of the type I error rate and power of the FDR approaches for higher linkage disequilibrium and for haplotype analyses is warranted.
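
The q-value idea behind Storey's approach can be sketched in a few lines. The sketch below is illustrative only and is not taken from the study: it estimates the proportion of true nulls with a single fixed tuning parameter `lam` (assumed to be 0.5 here), whereas practical implementations often smooth the estimate over a grid of values. The function name and the example p-values are hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Return (pi0, qvalues) using a fixed-lambda estimate of pi0."""
    m = len(pvalues)
    # p-values above lam are assumed to come mostly from true nulls,
    # which are uniform on [0, 1]; rescale their share to estimate pi0.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / ((1.0 - lam) * m))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        qvalues[idx] = running_min
    return pi0, qvalues


# Hypothetical marker p-values (illustrative only).
pi0, q = storey_qvalues([0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.7, 0.9])
print(pi0, [round(v, 3) for v in q])
```

With `pi0` fixed at 1 the same loop yields Benjamini-Hochberg adjusted p-values, so any extra power of the q-value approach in this sketch comes entirely from estimating `pi0` below 1.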

10.
Empirical Bayes models have been shown to be powerful tools for identifying differentially expressed genes from gene expression microarray data. An example is the WAME model, where a global covariance matrix accounts for array-to-array correlations as well as differing variances between arrays. However, the existing method for estimating the covariance matrix is very computationally intensive and the estimator is biased when the data contain many regulated genes. In this paper, two new methods for estimating the covariance matrix are proposed. The first method is a direct application of the EM algorithm for fitting the multivariate t-distribution of the WAME model. In the second method, a prior distribution for the log fold-change is added to the WAME model, and a discrete approximation is used for this prior. Both methods are evaluated using simulated and real data. The first method shows equal performance compared to the existing method in terms of bias and variability, but is superior in terms of computer time. For large data sets (>15 arrays), the second method also shows superior computer run time. Moreover, for simulated data with regulated genes the second method greatly reduces the bias. With the proposed methods it is possible to apply the WAME model to large data sets with reasonable computer run times. The second method shows a small bias for simulated data, but appears to have a larger bias for real data with many regulated genes.

11.
12.
MOTIVATION: Characterizing the dynamic regulation of gene expression by time course experiments is becoming more and more important. A common problem is to identify differentially expressed genes between the treatment and control time courses. It is often difficult to compare expression patterns of a gene between two time courses for the following reasons: (1) the number of sampling time points may be different or hard to align between the treatment and the control time courses; (2) estimation of the function that describes the expression of a gene in a time course is difficult and error-prone due to the limited number of time points. We propose a novel method to identify the differentially expressed genes between two time courses, which avoids direct comparison of gene expression patterns between the two time courses. RESULTS: Instead of attempting to 'align' and compare the two time courses directly, we first convert the treatment and control time courses into neighborhood systems that reflect the underlying relationships between genes. We then identify the differentially expressed genes by comparing the two gene relationship networks. To verify our method, we apply it to two treatment-control time course datasets. The results are consistent with the previous results and also give some new biologically meaningful findings. AVAILABILITY: The algorithm in this paper is coded in C++ and is available from http://leili-lab.cmb.usc.edu/yeastaging/projects/MARD/

13.
When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly differentially expressed genes that have closely related expression patterns. Sometimes, these genes may not be relevant to the biological process under study or their functions may already be known. The problem is that these genes can potentially drown out the effects of other genes that are relevant or have novel functions. We propose a procedure called complementary hierarchical clustering that is designed to uncover the structures arising from these novel genes that are not as highly expressed. Simulation studies show that the procedure is effective when applied to a variety of examples. We also define a concept called relative gene importance that can be used to identify the influential genes in a given clustering. Finally, we analyze a microarray data set from 295 breast cancer patients, using clustering with the correlation-based distance measure. The complementary clustering reveals a grouping of the patients which is uncorrelated with a number of known prognostic signatures and significantly differing distant metastasis-free probabilities.

14.
MOTIVATION: Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) the nonparametric t-test, (2) the Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs. RESULTS: All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of the groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
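
One plausible reading of the 'ideal discriminator method' is sketched below: each gene's expression profile is correlated with an idealized profile that is 0 in one group and 1 in the other, and genes are ranked by that correlation. The 0/1 encoding, the function names, and the toy expression values are assumptions made for illustration; the abstract does not specify these details.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ideal_discriminator_scores(expression, labels):
    """Correlate every gene with a profile that separates the two groups perfectly."""
    # Assumed encoding: the ideal gene is 0 in group 0 and 1 in group 1.
    ideal = [float(label) for label in labels]
    return {gene: pearson(values, ideal) for gene, values in expression.items()}


# Toy data: three samples per group, three hypothetical genes.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "geneA": [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],  # higher in group 1
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],  # no group difference
    "geneC": [5.0, 4.8, 5.1, 1.0, 1.2, 0.9],  # lower in group 1
}
scores = ideal_discriminator_scores(expression, labels)
for gene in sorted(scores, key=lambda g: -abs(scores[g])):
    print(gene, round(scores[gene], 3))
```

Ranking by the absolute correlation treats up- and down-regulated genes symmetrically; a cutoff on that value would then play the role that the p-value cutoff plays for the two tests.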

15.
DNA array technology now allows an enormous amount of expression data to be obtained. For large-scale gene profiling enterprises, this is of course welcome. However, the scientist interested in follow-up studies of a handful of differentially expressed genes may find it hard to sift through the vast datasets to pinpoint genes with the most desirable and reliable behaviors. Here, we present the methodology we have employed to discover genes differentially expressed in the adult mouse brain. We first used Affymetrix microarrays to compare gene expression from five different brain regions: the amygdala, cerebellum, hippocampus, olfactory bulb, and periaqueductal gray. Second, we identified genes differentially expressed within three distinct amygdala subnuclei. In this case, the tissue was microdissected by laser-capture to minimize contamination from adjacent subnuclei, and extracted RNA was subjected to three rounds of linear amplification prior to hybridization to the microarrays. To select candidate genes, we developed a custom algorithm to identify those genes with the most robust changes in expression across different replicate samples. Confirmation of expression patterns with in situ hybridization uncovered further criteria to consider in the selection process.

16.
The increased availability of microarray data has been calling for statistical methods to integrate findings across studies. A common goal of microarray analysis is to determine differentially expressed genes between two conditions, such as treatment vs control. A recent Bayesian meta-analysis model used a prior distribution for the mean log-expression ratios that was a mixture of two normal distributions. This model centered the prior distribution of differential expression at zero, and separated genes into two groups only: expressed and nonexpressed. Here, we introduce a Bayesian three-component truncated normal mixture prior model that more flexibly assigns prior distributions to the differentially expressed genes and produces three groups of genes: upregulated, downregulated, and nonexpressed. We found in simulations of two and five studies that the three-component model outperformed the two-component model using three comparison measures. When analyzing biological data of Bacillus subtilis, we found that the three-component model discovered more genes and omitted fewer genes for the same levels of posterior probability of differential expression than the two-component model, and discovered more genes for fixed thresholds of Bayesian false discovery. We assumed that the data sets were produced from the same microarray platform and were prescaled.

17.
MOTIVATION: Standard analysis routines for microarray data aim at differentially expressed genes. In this paper, we address the complementary problem of detecting sets of differentially co-expressed genes in two phenotypically distinct sets of expression profiles. RESULTS: We introduce a score for differential co-expression and suggest a computationally efficient algorithm for finding high scoring sets of genes. The use of our novel method is demonstrated in the context of simulations and on real expression data from a clinical study.

18.
The behaviors of autism overlap with a diverse array of other neurological disorders, suggesting common molecular mechanisms. We conducted a large comparative analysis of the network of genes linked to autism with those of 432 other neurological diseases to circumscribe a multi-disorder subcomponent of autism. We leveraged the biological process and interaction properties of these multi-disorder autism genes to overcome the across-the-board multiple hypothesis corrections that a purely data-driven approach requires. Using prior knowledge of biological process, we identified 154 genes not previously linked to autism of which 42% were significantly differentially expressed in autistic individuals. Then, using prior knowledge from interaction networks of disorders related to autism, we uncovered 334 new genes that interact with published autism genes, of which 87% were significantly differentially regulated in autistic individuals. Our analysis provided a novel picture of autism from the perspective of related neurological disorders and suggested a model by which prior knowledge of interaction networks can inform and focus genome-scale studies of complex neurological disorders.

19.
This paper deals with the problem of tele-monitoring EEG signals. In an EEG tele-monitoring system, an integral step is to compress the signals in a computationally efficient manner so that they can be transmitted over limited bandwidth. In such a situation, a Compressed Sensing (CS) framework for compressing and recovering the signals is the most viable approach. Previously, the well-known synthesis prior formulation was used for reconstruction. In this work, we show for the first time that the lesser-known analysis prior formulation is a more appropriate way to frame the reconstruction problem. We show that our method yields better results than the previous synthesis prior formulation.

20.
We have designed a simple and efficient polymerase chain reaction (PCR)-based cDNA subtraction protocol for high-throughput cloning of differentially expressed genes from plants that can be applied to any experimental system and as an alternative to DNA chip technology. A sequence-independent, PCR-amplifiable first-strand cDNA population was synthesized by priming with an oligo-dT primer carrying a defined 5' heel sequence and ligating another specified single-stranded oligonucleotide primer on the 3' ends of the first-strand cDNAs by T4 RNA ligase. A biotin label was introduced into the sense strands of the cDNA to be subtracted by using a 5' biotinylated forward primer during PCR amplification to immobilize the sense strand onto streptavidin-linked paramagnetic beads. The unamplified first strand (antisense) of the interrogating cDNA population was hybridized with a large excess of amplified sense strands of control cDNA. We used magnetic bead technology for the efficient removal of the common cDNA population after hybridization to reduce the complexity of the cDNA prior to PCR amplification for the enrichment and sequence abundance normalization of differentially expressed genes. Construction of a subtracted and normalized cDNA library efficiently eliminates common abundant cDNA messages and also increases the probability of identifying clones differentially expressed in low-abundance cDNA messages. We used this method to successfully isolate differentially expressed genes from Pennisetum seedlings in response to salinity stress. Sequence analysis of the selected clones showed homologies to genes that were reported previously and shown to be involved in plant stress adaptation.
