Similar literature
 20 similar articles found
1.
Cross-species research in drug development is novel and challenging. A bivariate mixture model that uses information across two species was proposed to address the fundamental problem of identifying differentially expressed genes in microarray experiments, with the aim of improving the understanding of translation between preclinical and clinical studies in drug development. The proposed approach models the joint distribution of treatment effects estimated from independent linear models. The mixture model posits up to nine components, four of which correspond to groups of genes that are differentially expressed in both species. A comprehensive simulation to evaluate model performance and an application to a real-world data set, a mouse and human type II diabetes experiment, suggest that the proposed model, though highly structured, can handle various configurations of differential gene expression and is practically useful for identifying differentially expressed genes, especially when the differential expression induced by the treatment interventions is weak. In the mouse and human application, the proposed mixture model was able to eliminate unimportant genes and identify a list of genes that were differentially expressed in both species and could be potential gene targets for drug development.

2.
MOTIVATION: The field of microarray data analysis is shifting emphasis from methods for identifying differentially expressed genes to methods for identifying differentially expressed gene categories. The latter approaches use a priori information about genes to group them into categories and enhance the interpretation of experiments aimed at identifying expression differences across treatments. While almost all of the existing approaches for identifying differentially expressed gene categories are practically useful, they suffer from a variety of drawbacks. Perhaps most notably, many popular tools are based exclusively on gene-specific statistics that cannot detect many types of multivariate expression change. RESULTS: We have developed a nonparametric multivariate method for identifying gene categories whose multivariate expression distribution differs across two or more conditions. We illustrate our approach and compare its performance to several existing procedures via the analysis of a real data set and a unique data-based simulation study designed to capture the challenges and complexities of practical data analysis. We show that our method has good power for differentiating between differentially expressed and non-differentially expressed gene categories, and we use a resampling-based strategy for controlling the false discovery rate when testing multiple categories. AVAILABILITY: R code (www.r-project.org) for implementing our approach is available from the first author on request.

3.
4.
cDNA microarray data are subject to many sources of variation that have to be removed before statistical tests can be applied to identify differentially expressed genes. Background correction, log-ratio transformation, and normalization, together referred to as the log-ratio approach, have been widely used for this purpose. However, there are some problems associated with this procedure. In this study, we propose an alternative approach that omits the log-ratio transformation step and goes directly to normalization after background correction. The method can estimate the "noise" effect by using the available information more effectively. Simulation studies were carried out to compare the feasibility and efficiency of this approach with those of the log-ratio approach for detecting specifically and differentially expressed genes under various conditions. The results showed that our approach worked well and was more robust and powerful than the log-ratio approach.
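For orientation, the conventional log-ratio approach named in this abstract (background correction, log-ratio transformation, normalization) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the intensity floor, and the choice of median centering as the normalization step are all assumptions made for the example.

```python
import math
from statistics import median


def log_ratio_normalize(red_fg, red_bg, green_fg, green_bg, floor=1.0):
    """Background-correct both channels, take log2(R/G), then median-center.

    `floor` (an assumed safeguard) keeps corrected intensities positive so
    that the logarithm is defined. Median centering is one common
    normalization choice and is used here purely for illustration.
    """
    log_ratios = []
    for rf, rb, gf, gb in zip(red_fg, red_bg, green_fg, green_bg):
        red = max(rf - rb, floor)      # background-corrected red intensity
        green = max(gf - gb, floor)    # background-corrected green intensity
        log_ratios.append(math.log2(red / green))
    center = median(log_ratios)
    return [m - center for m in log_ratios]


# Three hypothetical spots: foreground and background for each dye.
normalized = log_ratio_normalize(
    red_fg=[300, 500, 900], red_bg=[100, 100, 100],
    green_fg=[200, 200, 200], green_bg=[100, 100, 100],
)
print(normalized)
```

The alternative described in the abstract would skip the `math.log2` step and normalize the background-corrected intensities directly; the abstract does not give enough detail to sketch that variant here.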

5.
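cDNA microarray data contain many sources of variation, and this "noise" must be removed before the data are used to detect differentially expressed genes or for other statistical analyses. The log-ratio approach (background correction, log-ratio transformation, and normalization) has been widely used in cDNA microarray data analysis, but it has shortcomings that remain to be resolved. This paper therefore proposes a non-transformation method that omits the log-ratio transformation and normalizes the data directly after background correction, which effectively removes experimental "noise". The results show that, in detecting differentially expressed genes, the non-transformation method is more robust and more powerful than the conventional log-ratio approach, and that the gene detection rate and accuracy are greatly improved.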

6.
Although many statistical methods have been proposed for identifying differentially expressed genes, the optimal approach has still not been established. It is therefore necessary to develop more efficient methods of finding differentially expressed genes while accounting for noise and the false discovery rate (FDR). We propose a method that combines multi-resolution wavelet transformation analysis with SAM to identify differentially expressed genes by adjusting Δ and computing the FDR. This method was applied to a microarray expression dataset from adenoma patients and normal subjects. The number of differentially expressed genes gradually decreased as Δ increased, and the FDR was reduced after wavelet transformation. At a given Δ value, the FDR was also lower after wavelet transformation than before it. In conclusion, compared with non-transformed data, the method detected more differentially expressed genes of higher quality, and the FDRs were notably better controlled and lower.

7.
Gene expression studies have been widely used in an effort to identify signatures that can predict clinical progression of cancer. In this study we focused instead on identifying gene expression differences between breast tumors and adjacent normal tissue, and between different subtypes of tumor classified by clinical marker status. We collected a set of 20 breast cancer tissues, each matched with adjacent pathologically normal tissue from the same patient. The cancer samples, representing each subtype of breast cancer identified by estrogen receptor ER(+/-) and Her2(+/-) status and divided into four subgroups (ER+/Her2+, ER+/Her2-, ER-/Her2+, and ER-/Her2-), were hybridized on Affymetrix HG-133 Plus 2.0 microarrays. By comparing cancer samples with their matched normal controls, we identified 3537 differentially expressed genes overall using data analysis methods from Bioconductor. Among the genes common to the four subgroups, we found 151 regulated genes, some of them encoding known targets for breast cancer treatment. Genes unique to each of the four subgroups instead suggested gene regulation dependent on ER/Her2 marker status. In conclusion, the results indicate that microarray studies using robust analysis of matched tumor and normal samples from the same patients can identify genes differentially expressed in breast cancer tumor subtypes even when small numbers of samples are considered, and can further elucidate molecular features of breast cancer.

8.
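Comparison of methods for screening differentially expressed genes using gene chips   Cited by: 1 (self-citations: 0, citations by others: 1)
Shan Wenjuan  Tong Chunfa  Shi Jisen 《Yi Chuan》2008,30(12):1640-1646
Abstract: Using computer-simulated data and real microarray data, eight methods for screening differentially expressed genes were compared in order to assess how well each performs on gene chip data. Analysis of the simulated data showed that all eight methods identified and detected uniformly distributed differentially expressed genes well. In terms of algorithm, SAM and the Wilcoxon rank sum test performed better; in terms of data distribution, genes following a normal distribution were identified well, whereas those following chi-square and exponential distributions were identified poorly. Analysis of poplar cDNA microarrays showed that SAM, Samroc, and the regression model method gave similar results, whereas the Wilcoxon rank sum test differed considerably from them.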

9.
MOTIVATION: One major area of interest in analyzing oligonucleotide gene array data is identifying differentially expressed genes. A challenge to biostatisticians is to develop an approach to summarizing probe-level information that adequately reflects the true expression level while accounting for probe variation, chip variation and interaction effects. Various statistical tools, such as MAS and RMA, have been developed to address this issue. In these approaches, the probe-level expression data are summarized into gene-level data, which are then used for downstream statistical analysis. Since probe variation is often larger than chip variation and there is also a potential interaction between probe affinity and treatment effect, strategies such as gene-level analysis may not be optimal. In this study, we propose a procedure to analyze probe-level data for selecting differentially expressed genes under two treatment conditions (groups) with a small number of replicates. The probe-level discrepancy between two groups can be measured by a difference of the percentiles of probe perfect-match (PM) ranks or of probe PM weighted ranks. The difference is then compared with a pre-specified threshold to determine differentially expressed genes. The probe-level approach takes into account non-homogeneous treatment effects and reduces possible cross-hybridization effects across a set of probes. RESULTS: The proposed approach is compared with MAS and RMA using two benchmark gene array datasets. Positive predictivity and sensitivity are used for evaluation. Results show that the proposed approach has higher positive predictivity and higher sensitivity. AVAILABILITY: Available on request from the authors. CONTACT: dtchen@uab.edu.

10.
There are many options in handling microarray data that can affect study conclusions, sometimes drastically. Working with a two-color platform, this study uses ten spike-in microarray experiments to evaluate the relative effectiveness of some of these options for the experimental goal of detecting differential expression. We consider two data transformations, background subtraction and intensity normalization, as well as six different statistics for detecting differentially expressed genes. Findings support the use of an intensity-based normalization procedure and also indicate that local background subtraction can be detrimental to detecting differential expression effectively. We also verify that robust statistics outperform t-statistics in identifying differentially expressed genes when there are few replicates. Finally, we find that the choice of image analysis software can also substantially influence experimental conclusions.

11.
MOTIVATION: Currently, most methods for identifying differentially expressed genes fall into the category of so-called single-gene analysis, performing hypothesis testing on a gene-by-gene basis. In a single-gene-analysis approach, estimating the variability of each gene is required to determine whether a gene is differentially expressed or not. Poor accuracy of variability estimation makes it difficult to identify genes with small fold-changes unless a very large number of replicate experiments are performed. RESULTS: We propose a method that avoids the difficult task of estimating variability for each gene, while reliably identifying a group of differentially expressed genes with low false discovery rates, even when the fold-changes are very small. In this article, a new characterization of differentially expressed genes is established based on a theorem about the distribution of ranks of genes sorted by (log) ratios within each array. This rank-based characterization of differentially expressed genes is an example of all-gene analysis rather than single-gene analysis. We apply the method to a cDNA microarray dataset, and many genes with low fold-changes (as low as 1.3-fold) are reliably identified without carrying out hypothesis testing on a gene-by-gene basis. The false discovery rate is estimated in two different ways that reflect the variability from all the genes without the complications related to multiple hypothesis testing. We also provide some comparisons between our approach and methods based on single-gene analysis. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
13.
DNA microarray experiments have generated large amounts of gene expression measurements across different conditions. One crucial step in the analysis of these data is to detect differentially expressed genes. Some parametric methods, including the two-sample t-test (T-test) and variations of it, have been used. Alternatively, a class of non-parametric algorithms, such as the Wilcoxon rank sum test (WRST), significance analysis of microarrays (SAM) of Tusher et al. (2001), the empirical Bayesian (EB) method of Efron et al. (2001), etc., have been proposed. Most of the popular methods available are based on the t-statistic. Because of the nature of the statistic they use to describe the difference between groups of data, there are situations in which these methods are inefficient, especially when the data follow multi-modal distributions. For example, some genes may display different expression patterns in the same cell type, say, tumor or normal, forming subtypes. Most available methods are likely to miss these genes. We developed a new non-parametric method for selecting differentially expressed genes by relative entropy, called SDEGRE, which detects differentially expressed genes by combining relative entropy and kernel density estimation and can detect all types of differences between two groups of samples. The significance of whether a gene is differentially expressed or not can be estimated by resampling-based permutations. We illustrate our method on two data sets from Golub et al. (1999) and Alon et al. (1999). Comparing the results with those of the T-test, the WRST and SAM, we identified novel differentially expressed genes that previous biological studies have shown to be biologically significant but that were not detected by the other three methods. The results also show that the genes selected by SDEGRE are better able to distinguish the two cell types.
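The general idea of combining kernel density estimation, relative entropy, and permutation-based significance can be sketched for a single gene as below. This is not the SDEGRE implementation: the Gaussian kernel, the fixed bandwidth, the symmetrized form of the divergence, the grid integration, and all function names are assumptions chosen to keep the example short and dependency-free.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """Return a density function built from Gaussian kernels on `sample`."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


def relative_entropy_score(a, b, bandwidth=0.5, grid_size=200):
    """Symmetrized relative entropy between the KDEs of groups a and b.

    Integrates (p - q) * log(p / q) on a regular grid, which equals
    KL(p||q) + KL(q||p) and is never negative.
    """
    pooled = list(a) + list(b)
    lo = min(pooled) - 3 * bandwidth
    hi = max(pooled) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    p, q = gaussian_kde(a, bandwidth), gaussian_kde(b, bandwidth)
    eps = 1e-12  # avoids log(0) where both densities vanish
    total = 0.0
    for i in range(grid_size):
        x = lo + i * step
        px, qx = p(x) + eps, q(x) + eps
        total += (px - qx) * math.log(px / qx) * step
    return total


def permutation_p_value(a, b, n_perm=200, seed=0):
    """Estimate significance by shuffling group labels n_perm times."""
    rng = random.Random(seed)
    observed = relative_entropy_score(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if relative_entropy_score(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression values for one gene in two groups.
tumor = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05]
normal = [5.0, 5.1, 4.9, 5.2, 4.8, 5.05]
print(relative_entropy_score(tumor, normal), permutation_p_value(tumor, normal))
```

Because the score compares whole estimated densities rather than means, a gene whose values split into two subtypes within one group can still score highly even when the group means coincide, which is the situation the abstract says t-statistic-based methods tend to miss.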

14.
Although two-color fluorescent DNA microarrays are now standard equipment in many molecular biology laboratories, methods for identifying differentially expressed genes in microarray data are still evolving. Here, we report a refined test for differentially expressed genes that does not rely on gene expression ratios but directly compares a series of repeated measurements of the two dye intensities for each gene. This test uses a statistical model to describe multiplicative and additive errors influencing an array experiment, where model parameters are estimated from observed intensities for all genes using the method of maximum likelihood. A generalized likelihood ratio test is performed for each gene to determine whether, under the model, these intensities are significantly different. We use this method to identify significant differences in gene expression among yeast cells growing in galactose-stimulating versus non-stimulating conditions and compare our results with current approaches for identifying differentially expressed genes. The effect of sample size on parameter optimization is also explored, as is the use of the error model to compare the within- and between-slide intensity variation intrinsic to an array experiment.

15.
Microarrays have become an important tool for studying the molecular basis of complex disease traits and fundamental biological processes. A common purpose of microarray experiments is the detection of genes that are differentially expressed under two conditions, such as treatment versus control or wild type versus knockout. We introduce a Laplace mixture model as a long-tailed alternative to the normal distribution when identifying differentially expressed genes in microarray experiments, and provide an extension to asymmetric over- or underexpression. This model permits greater flexibility than models in current use as it has the potential, at least with sufficient data, to accommodate both whole genome and restricted coverage arrays. We also propose likelihood approaches to hyperparameter estimation which are equally applicable in the Normal mixture case. The Laplace model appears to give some improvement in fit to data, though simulation studies show that our method performs similarly to several other statistical approaches to the problem of identification of differential expression.

16.
《Genetics Selection Evolution》2007,39(6):651-668
The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern over time points and infective agent (E. coli or S. aureus). The main result from these analyses was that HC and PCA were able to separate tissue samples taken at 24 h following E. coli infection from the other samples. The second approach identified groups of differentially co-expressed genes, by identifying clusters of genes that were highly correlated when animals were infected with E. coli but not correlated more than expected by chance when the infective pathogen was S. aureus. The third approach looked at differential expression of predefined gene sets. Gene sets were defined based on information retrieved from biological databases such as Gene Ontology. Based on these annotation sources, the teams used either the GlobalTest or the Fisher exact test to identify differentially expressed gene sets. The main result from these analyses was that gene sets involved in immune defence responses were differentially expressed.

17.
Li BQ  Zhang J  Huang T  Zhang L  Cai YD 《Biochimie》2012,94(9):1910-1917
This paper presents a new method for identifying retinoblastoma-related genes by integrating gene expression profiles and shortest paths in a functional linkage graph. With the existing protein-protein interaction data from STRING, a weighted functional linkage graph is constructed. 119 consistently differentially expressed genes between retinoblastoma and normal retina were obtained from the overlap of two gene expression studies of retinoblastoma. Then the shortest paths between each pair of these 119 genes were determined with Dijkstra's algorithm. Finally, all the genes present on the shortest paths were extracted and ranked according to their betweenness, and the 119 shortest-path genes with a betweenness greater than 100 and a p-value less than 0.05 were selected for further analysis. We also identified 53 retinoblastoma-related miRNAs from published miRNA array data, and most of the 238 retinoblastoma genes (119 consistently differentially expressed genes and 119 shortest-path genes) were shown to be target genes of these 53 miRNAs. Interestingly, the genes we identified from both the gene expression profiles and the functional protein association network included more cancer genes than did the genes identified from the gene expression profiles alone. In addition, these genes also had greater functional similarity to the reported cancer genes than did the genes identified from the gene expression profiles alone. These results are promising and support the effectiveness of the proposed method.
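The shortest-path step described above can be sketched on a toy graph as follows. This is an illustrative simplification rather than the paper's pipeline: the graph, the gene labels, and the edge weights are invented; only one shortest path is traced per seed pair; and the per-gene count of paths is used here as a stand-in for the betweenness ranking. How STRING confidence scores are converted into edge weights is not stated in the abstract, so the weights below are plain assumed distances.

```python
import heapq
from collections import Counter
from itertools import combinations


def dijkstra_path(graph, source, target):
    """Return one shortest path from source to target, or None if unreachable.

    `graph` maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def path_gene_counts(graph, seed_genes):
    """Count how often each gene lies strictly inside a seed-to-seed path."""
    counts = Counter()
    for a, b in combinations(seed_genes, 2):
        path = dijkstra_path(graph, a, b)
        if path:
            counts.update(path[1:-1])  # interior nodes only
    return counts


# Toy undirected graph; A, B, C play the role of differentially expressed seeds.
toy_graph = {
    "A": {"X": 1, "B": 5},
    "B": {"X": 1, "A": 5, "Y": 1},
    "C": {"Y": 1, "X": 10},
    "X": {"A": 1, "B": 1, "C": 10},
    "Y": {"B": 1, "C": 1},
}
print(path_gene_counts(toy_graph, ["A", "B", "C"]).most_common())
```

Genes with high counts (here the hypothetical `X` and `Y`) are the candidates that connect many seed pairs; the paper additionally applies a betweenness threshold and a p-value filter, which this sketch omits.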

18.
19.
A Bayesian model-based clustering approach is proposed for identifying differentially expressed genes in meta-analysis. A Bayesian hierarchical model is used as a scientific tool for combining information from different studies, and a mixture prior is used to separate differentially expressed genes from non-differentially expressed genes. Posterior estimation of the parameters and missing observations is done using a simple Markov chain Monte Carlo method. From the estimated mixture model, useful measures of the significance of a test, such as the Bayesian false discovery rate (FDR), the local FDR (Efron et al., 2001), and the integration-driven discovery rate (IDR; Choi et al., 2003), can be easily computed. The model-based approach is also compared with commonly used permutation methods, and it is shown that the model-based approach is superior to the permutation methods when there are many more under-expressed genes than over-expressed genes or vice versa. The proposed method is applied to four publicly available prostate cancer gene expression data sets and to simulated data sets.
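Once a mixture model yields, for each gene, a posterior probability of being differentially expressed, a Bayesian FDR for a selected gene list is commonly taken as the average posterior probability of the null among the selected genes. The sketch below illustrates that calculation only; the function name, the threshold rule, and the probabilities are hypothetical, and the abstract does not specify the exact estimator the authors use.

```python
def bayesian_fdr(posterior_de, threshold):
    """Average posterior null probability among genes called significant.

    `posterior_de` holds each gene's posterior probability of differential
    expression (assumed to come from a fitted mixture model). Genes with a
    probability at or above `threshold` are treated as selected.
    """
    selected = [p for p in posterior_de if p >= threshold]
    if not selected:
        return 0.0  # nothing selected, so no false discoveries
    return sum(1.0 - p for p in selected) / len(selected)


# Hypothetical posterior probabilities for four genes.
print(bayesian_fdr([0.99, 0.90, 0.60, 0.20], threshold=0.5))
```

Raising the threshold shortens the selected list and lowers the estimated FDR, which is how a target FDR level is typically translated into a cutoff.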

20.
Li BQ  Huang T  Liu L  Cai YD  Chou KC 《PloS one》2012,7(4):e33393
One of the most important and challenging problems in biomedicine and genomics is how to identify disease genes. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) gene expression profiles and (ii) shortest path analysis of functional protein association networks. The former has long been used to select differentially expressed genes as disease genes, while the latter has been widely used to study disease mechanisms. With the existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), a weighted functional protein association network was constructed. By means of the mRMR (Maximum Relevance Minimum Redundancy) approach, six genes were identified that can distinguish colorectal tumors from normal adjacent colonic tissues on the basis of their gene expression profiles. Using the shortest path approach, we further found an additional 35 genes, some of which have been reported to be relevant to colorectal cancer and some of which are very likely to be relevant to it. Interestingly, the genes we identified from both the gene expression profiles and the functional protein association network included more cancer genes than the genes identified from the gene expression profiles alone. In addition, these genes had greater functional similarity to the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. All of this indicates that the method presented in this paper is quite promising. The method may become a useful tool, or at least play a complementary role to existing methods, for identifying colorectal cancer genes. It has not escaped our notice that the method can be applied to identify the genes of other diseases as well.
