Similar literature
 Found 20 similar articles; search took 15 ms
1.
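When the difference in gene expression between two groups of samples is small or the sample size is limited, the usual false discovery rate (FDR) control levels (e.g., 5% or 10%) may not identify enough differentially expressed genes for subsequent functional enrichment analysis. However, functional enrichment analysis is fairly robust to false discoveries among the differentially expressed genes. Therefore, identifying differentially expressed genes under a less stringent FDR control level (i.e., allowing a higher FDR) may still reliably reveal disease-related functions. This paper analyzed five gene expression profile datasets on breast cancer metastasis. Using the three datasets with stronger differential expression signals, we demonstrated that the results of functional enrichment analysis remain highly robust even when the FDR of the differentially expressed genes reaches 25%. Then, in the other two datasets, which have weak differential expression signals, we selected differentially expressed genes at a 25% FDR control level for functional enrichment analysis and compared the results with those from the first three datasets. The results show that breast cancer metastasis-related functions can still be reliably identified when differentially expressed genes are selected at a less stringent FDR control level. The analysis also suggests that, during breast cancer metastasis, some broadly defined biological processes (such as cell division, cell cycle, and DNA replication) are perturbed as a whole, indicating that breast cancer metastasis is a systemic disease involving widespread changes in gene expression.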
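To illustrate how the choice of FDR control level changes the number of genes selected, here is a minimal sketch using the Benjamini-Hochberg step-up rule. The abstract does not say which FDR procedure the authors used, so the procedure itself is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr_level):
    """Return sorted indices of hypotheses rejected by the Benjamini-Hochberg rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * level
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr_level:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten genes (illustrative only, not from the paper)
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

strict = benjamini_hochberg(p_values, 0.05)   # conventional 5% level
relaxed = benjamini_hochberg(p_values, 0.25)  # the 25% level discussed above
print(len(strict), len(relaxed))
```

With these made-up values the 5% level keeps only the two strongest genes, while the 25% level keeps all ten, which is the kind of gain in list size that makes enrichment analysis feasible on weak-signal data.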

2.

Background

Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than those from performing the two tasks independently. Yet, for some microarray datasets, both classification accuracy and the stability of the gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own.

Results

We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were proven to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) on the same synthetic datasets. MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. Almost all the ‘wrong’ (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets.
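The idea of flagging wrongly labeled samples inside an external leave-one-out loop can be sketched as follows. This is not the MFMW-outlier model: a plain nearest-centroid classifier is swapped in for the filter/wrapper ensemble, and the function name and toy data are hypothetical.

```python
def loocv_suspects(samples, labels):
    """Return indices whose held-out prediction disagrees with the given label."""
    suspects = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Build class centroids from every sample except the held-out one
        centroids = {}
        for cls in set(labels):
            members = [s for j, (s, l) in enumerate(zip(samples, labels))
                       if l == cls and j != i]
            if members:
                centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        # Predict the class whose centroid is closest (squared Euclidean distance)
        predicted = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        if predicted != y:
            suspects.append(i)
    return suspects


# Toy two-gene data: the sample at index 3 sits in the "B" cluster but is labeled "A"
samples = [[0.1, 0.2], [0.0, -0.1], [-0.2, 0.1], [4.9, 5.1],
           [5.1, 4.9], [4.8, 5.2], [5.0, 5.0]]
labels = ["A", "A", "A", "A", "B", "B", "B"]
print(loocv_suspects(samples, labels))
```

Because the held-out sample never contributes to the centroids used to classify it, a mislabeled sample cannot vouch for its own label, which is the motivation for an external rather than internal cross-validation loop.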

3.
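When breast cancer metastasis-related differentially expressed genes and their functions are identified from gene expression profiles, the reproducibility of the differentially expressed genes identified by different studies is low, because inter-individual variation in gene expression is relatively high and sample sizes are relatively small. Based on two breast cancer metastasis gene expression profile datasets, this paper evaluates the reproducibility of the two sets of differentially expressed genes and of the functions in which they are enriched. The results show that the differentially expressed genes identified in the two datasets have highly consistent directions of expression change and significantly correlated expression; the metastasis-related functions identified from the two sets of differentially expressed genes are highly reproducible across the two datasets and mainly involve cell division, cell cycle, DNA replication, chromosome segregation, phosphoinositide-mediated signaling, and response to DNA damage stimulus.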

4.
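Comparison of methods for screening differentially expressed genes from gene chip data   Total citations: 1, self-citations: 0, citations by others: 1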
单文娟  童春发  施季森 《遗传》2008,30(12):1640-1646
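Abstract: Using computer-simulated data and real microarray data, eight methods for screening differentially expressed genes were compared to assess how well each performs on gene chip data. Analysis of the simulated data showed that all eight methods identified and detected uniformly distributed differentially expressed genes well. In terms of algorithms, SAM and the Wilcoxon rank-sum test performed better; in terms of data distribution, identification was better for normally distributed data and worse for chi-square and exponentially distributed data. Analysis of poplar cDNA microarrays showed that SAM, Samroc, and the regression model method gave similar results, whereas the Wilcoxon rank-sum test differed considerably from them.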
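Of the methods named above, the Wilcoxon rank-sum test is simple enough to sketch directly. The version below uses midranks for ties and a normal approximation without tie or continuity correction, which is a simplifying assumption and not necessarily what the authors ran; the expression values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank-sum statistic for x with a two-sided normal-approximation p-value."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Extend j over a run of tied values and give them all the average rank
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return w, p


# Hypothetical log-expression values of one gene in two conditions
control = [1.2, 1.5, 1.1, 1.4, 1.3]
treated = [2.2, 2.5, 2.1, 2.4, 2.3]
w, p = rank_sum_test(control, treated)
print(w, p)
```

Because the test uses only ranks, it is insensitive to the shape of the underlying distribution, which may explain why it behaves differently from the moderated-t style methods in the comparison.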

5.
6.
Identifying differentially expressed (DE) genes across conditions or treatments is a typical problem in microarray experiments. In time course microarray experiments (under two or more conditions/treatments), it is sometimes of interest to identify two classes of DE genes: those with no time-condition interactions (called parallel DE genes, or PDE), and those with time-condition interactions (nonparallel DE genes, NPDE). Although many methods have been proposed for identifying DE genes in time course experiments, methods for discerning NPDE genes from the general DE genes are still lacking. We propose a functional ANOVA mixed-effect model to model time course gene expression observations. The fixed effect (the mean curve) of the model decomposes bivariate functions of time and treatments (or experimental conditions) as in the classic ANOVA method and provides the associated notions of main effects and interactions. Random effects capture time-dependent correlation structures. In this model, identifying NPDE genes is equivalent to testing the significance of the time-condition interaction, for which an approximate F-test is suggested. We examined the performance of the proposed method on simulated datasets in comparison with some existing methods, and applied the method to a study of human reaction to endotoxin stimulation, as well as to a cell cycle expression data set.

7.
High-density oligonucleotide arrays are widely used for analysis of gene expression on a genomic scale, but the generated data remain largely inaccessible for comparative analysis purposes. Similarity searches in databases with differentially expressed gene (DEG) lists may be used to assign potential functions to new genes and to identify potential chemical inhibitors/activators and genetic suppressors/enhancers. Although this is a very promising concept, it requires the compatibility and validity of the DEG lists to be significantly improved. Using Arabidopsis and human datasets, we have developed guidelines for the performance of similarity searches against databases that collect microarray data. We found that, in comparison with many other methods, a rank-product analysis achieves a higher degree of inter- and intra-laboratory consistency of DEG lists, and is advantageous for assessing similarities and differences between them. To support this concept, we developed a tool called MASTA (microarray overlap search tool and analysis), and re-analyzed over 600 Arabidopsis microarray expression datasets. This revealed that large-scale searches produce reliable intersections between DEG lists that prove to be useful for genetic analysis, thus aiding in the characterization of cellular and molecular mechanisms. We show that this approach can be used to discover unexpected connections and to illuminate unanticipated interactions between individual genes.
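The rank-product statistic mentioned above can be sketched in a few lines: each gene is ranked by fold change within every replicate, and the geometric mean of its ranks is taken. This sketch covers only the statistic for up-regulation; the permutation-based significance step is omitted, how MASTA applies it is not shown in the abstract, and the gene names and values are hypothetical.

```python
import math


def rank_products(fold_changes):
    """fold_changes maps gene -> list of log fold changes, one per replicate."""
    genes = list(fold_changes)
    k = len(fold_changes[genes[0]])
    log_sum = {g: 0.0 for g in genes}
    for r in range(k):
        # Rank 1 goes to the most strongly up-regulated gene in replicate r
        ordered = sorted(genes, key=lambda g: fold_changes[g][r], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_sum[g] += math.log(rank)
    # Geometric mean of the ranks, computed in log space
    return {g: math.exp(log_sum[g] / k) for g in genes}


# Hypothetical log fold changes for four genes across three replicates
fold_changes = {
    "geneA": [2.0, 1.8, 2.2],
    "geneB": [0.1, 0.3, -0.2],
    "geneC": [1.0, 2.0, 0.9],
    "geneD": [-1.5, -1.0, -2.0],
}
rp = rank_products(fold_changes)
top_gene = min(rp, key=rp.get)
print(top_gene, rp)
```

A small rank product means the gene sits near the top of the list in every replicate, so the statistic rewards consistency across arrays rather than a large effect in any single one.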

8.
9.
10.
11.
MOTIVATION: The rapid accumulation of microarray datasets provides unique opportunities to perform systematic functional characterization of the human genome. We designed a graph-based approach to integrate cross-platform microarray data, and extract recurrent expression patterns. A series of microarray datasets can be modeled as a series of co-expression networks, in which we search for frequently occurring network patterns. The integrative approach provides three major advantages over the commonly used microarray analysis methods: (1) it enhances signal-to-noise separation, (2) it identifies functionally related genes without co-expression, and (3) it provides a way to predict gene functions in a context-specific way. RESULTS: We integrate 65 human microarray datasets, comprising 1105 experiments and over 11 million expression measurements. We develop a data mining procedure based on frequent itemset mining and biclustering to systematically discover network patterns that recur in at least five datasets. This resulted in 143,401 potential functional modules. Subsequently, we design a network topology statistic based on graph random walk that effectively captures characteristics of a gene's local functional environment. Function annotations based on this statistic are then subject to assessment using the random forest method, combining six other attributes of the network modules. We assign 1126 functions to 895 genes, 779 known and 116 unknown, with a validation accuracy of 70%. Among our assignments, 20% of genes are assigned multiple functions based on different network environments. AVAILABILITY: http://zhoulab.usc.edu/ContextAnnotation.
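The recurrence criterion can be illustrated in its simplest form by counting how many co-expression networks contain each edge. This is only a sketch of the support-threshold idea on single edges; the authors' procedure mines multi-edge patterns with frequent itemset mining and biclustering, which is not reproduced here, and the gene identifiers are hypothetical.

```python
from collections import Counter


def recurrent_edges(networks, min_support):
    """Return undirected edges that appear in at least min_support networks."""
    counts = Counter()
    for edges in networks:
        # Normalise orientation and de-duplicate so each dataset counts an edge once
        counts.update({tuple(sorted(edge)) for edge in edges})
    return {edge for edge, n in counts.items() if n >= min_support}


# Three hypothetical co-expression networks, each given as an edge list
networks = [
    [("g1", "g2"), ("g2", "g3")],
    [("g2", "g1"), ("g3", "g4")],
    [("g1", "g2"), ("g2", "g3")],
]
frequent = recurrent_edges(networks, min_support=2)
print(sorted(frequent))
```

In the study the threshold is five datasets out of 65; the toy example uses two out of three only to keep the input short.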

12.
Multidimensional local false discovery rate for microarray studies   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: The false discovery rate (fdr) is a key tool for statistical assessment of differential expression (DE) in microarray studies. Overall control of the fdr alone, however, is not sufficient to address the problem of genes with small variance, which generally suffer from a disproportionally high rate of false positives. It is desirable to have an fdr-controlling procedure that automatically accounts for gene variability. METHODS: We generalize the local fdr as a function of multiple statistics, combining a common test statistic for assessing DE with its standard error information. We use a non-parametric mixture model for DE and non-DE genes to describe the observed multi-dimensional statistics, and estimate the distribution for non-DE genes via the permutation method. We demonstrate this fdr2d approach for simulated and real microarray data. RESULTS: The fdr2d allows objective assessment of DE as a function of gene variability. We also show that the fdr2d performs better than commonly used modified test statistics. AVAILABILITY: An R-package OCplus containing functions for computing fdr2d() and other operating characteristics of microarray data is available at http://www.meb.ki.se/~yudpaw.

13.
The number of antral follicles counted (AFC) by ultrasound is associated with fertility in cattle. Cows with higher follicle count (HFC) have higher performance in reproductive‐assisted technologies than cows with lower follicle count (LFC). In this study, we aimed to define the preantral follicle count by histology and to identify differentially expressed genes (DEGs) using a microarray in Nelore and Angus heifers with HFC and LFC. The ovaries of each animal were scanned with an ultrasound device 12 to 24 hr after estrus. The groups were formed based on the average number of total follicles (≥3 mm) counted in each breed consistently ± the standard deviation. For the histological analysis, preantral follicles were counted and classified under a stereo microscope, and follicle density was determined. Microarray analysis was performed on pools of three follicles dissected from the ovaries of 15 Nelore (6 HFC and 9 LFC) and 17 Angus heifers (9 HFC and 8 LFC). Angus heifers have increased total and primordial follicle density. Nelore heifers have increased antral follicle count. Different patterns of gene expression regulate follicle recruitment and development in Angus and Nelore heifers and may be associated with the different follicle densities observed in Angus versus Nelore heifers. Furthermore, HFC heifers presented increased expression of genes associated with cellular development and metabolism.

14.

Introduction

Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis.

Aim

The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets.

Methods

Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate – adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA).
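The final step of the pipeline, running a Gene Set Enrichment Analysis on genes ranked by their Cox coefficients, rests on a running-sum enrichment score. The sketch below uses the unweighted (Kolmogorov-Smirnov style) form of that score, which is an assumption; the abstract does not state the weighting used, and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of the running sum over a ranked gene list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise
        running += 1 / n_hit if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical genes ordered from most positive to most negative Cox coefficient
ranked = ["a", "b", "c", "d", "e", "f"]
es_top = enrichment_score(ranked, {"a", "b"})      # set concentrated at the top
es_bottom = enrichment_score(ranked, {"e", "f"})   # set concentrated at the bottom
print(es_top, es_bottom)
```

A positive score indicates a gene set concentrated among genes positively associated with metastasis, and a negative score the opposite, which matches the subtype-specific signs reported in the Results.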

Results

Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed.

Conclusion

To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering "hidden" associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research.

15.
16.

Background

Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type (‘outlier genes’), a hallmark of potential oncogenes.

Methodology

A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed equally well to existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target.
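The GTI formula is not given in the abstract, so it cannot be sketched here. As a point of reference, one of the comparator methods, COPA, can be approximated as follows: values are centred on the median, scaled by the median absolute deviation, and a high percentile of the result is reported. The simple index-based quantile and the expression values are assumptions for illustration.

```python
import statistics


def copa_score(values, quantile=0.9):
    """COPA-style outlier score: a high quantile of median/MAD-scaled expression."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return 0.0  # no spread around the median, so no outlier signal
    scaled = sorted((v - med) / mad for v in values)
    # Simple index-based quantile; other quantile conventions would also work
    index = min(len(scaled) - 1, int(quantile * len(scaled)))
    return scaled[index]


# Hypothetical tumour expression values: one gene with two outlier samples, one flat gene
outlier_gene = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 6.0, 7.0]
flat_gene = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]
print(copa_score(outlier_gene), copa_score(flat_gene))
```

Using the median and MAD instead of the mean and standard deviation keeps the scaling from being inflated by the very outliers the score is meant to detect.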

Conclusions/Significance

Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. The algorithm is implemented in an R package (Text S1).

17.
We developed an R function named "microarray outlier filter" (MOF) to assist in the identification of failed arrays. In sorting a group of similar arrays by the likelihood of failure, two statistical indices were employed: the correlation coefficient and the percentage of outlier spots. MOF can be used to monitor the quality of microarray data, both for troubleshooting and for eliminating bad datasets from downstream analysis. The function is freely available at http://www.wriwindber.org/applications/mof/.
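The two indices can be combined into a ranking along the following lines. This is not the MOF function: the abstract does not define either index precisely, so the mean Pearson correlation with the other arrays, the per-spot median reference, the cutoff, and all names below are assumptions.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def array_quality(arrays, spot_cutoff=1.0):
    """Rank arrays (name -> log intensities, same spot order), most suspicious first."""
    names = list(arrays)
    n_spots = len(arrays[names[0]])
    # Per-spot reference value: the median across all arrays
    spot_medians = [statistics.median([arrays[n][i] for n in names])
                    for i in range(n_spots)]
    report = {}
    for name in names:
        mean_r = statistics.fmean(
            [pearson(arrays[name], arrays[o]) for o in names if o != name])
        n_out = sum(abs(v - m) > spot_cutoff
                    for v, m in zip(arrays[name], spot_medians))
        report[name] = (mean_r, 100.0 * n_out / n_spots)
    # Low correlation first, then high outlier percentage
    return sorted(report.items(), key=lambda kv: (kv[1][0], -kv[1][1]))


# Hypothetical log intensities for four similar arrays, one of them scrambled
arrays = {
    "good1": [1, 2, 3, 4, 5, 6],
    "good2": [1.1, 2.0, 2.9, 4.2, 5.1, 5.9],
    "good3": [0.9, 2.1, 3.0, 3.9, 5.0, 6.1],
    "bad": [6, 1, 5, 2, 4, 3],
}
ranked = array_quality(arrays)
print(ranked[0])
```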

18.
We evaluated the expression profiles of turbot in spleen, liver, and head kidney across five temporal points of the Philasterides dicentrarchi infection process using an 8x15K Agilent oligo-microarray. The microarray included 2,176 different fivefold replicated gene probes designed from a turbot 3' sequenced EST database. We were able to identify 221 differentially expressed (DE) genes (8.1% of the whole microarray), 113 in spleen, 83 in liver, and 90 in head kidney, in at least 1 of the 5 temporal points sampled for each organ. Most of these genes could be annotated (83.0%) and functionally categorized using GO terms (69.1%) after the additional sequencing of DE genes from the 5' end. Many DE genes were related to innate and acquired immune functions. A high proportion of DE genes were organ-specific (70.6%), although their associated GO functions showed notable similarities in the three organs. The most striking difference in functional distribution was observed between the up- and downregulated gene groups. Upregulated genes were mostly associated with immune functions, while downregulated ones were mainly metabolism-related genes. The genetic response appeared clustered in a few groups of genes with similar expression profiles along the temporal series. The information obtained will help in understanding the turbot immune response and will be specifically valuable for developing defense strategies against P. dicentrarchi to achieve more resistant broodstocks for the turbot industry.

19.
Confirmation of gene expression by a second methodology is critical in order to detect false-positive findings associated with microarrays. However, the impact of methodology upon the measurement of gene expression has not been rigorously evaluated. In the current study, we compared differential gene expression between PC3 and PC3-M human prostate cancer cell lines using three separate methods: microarray, quantitative RT/PCR (qRT/PCR), and Northern blotting. The PC3 to PC3-M ratio of gene expression was determined for each of 24 different genes evaluated, by each of the three methods. Comparison of gene expression ratios between Northern and microarray, Northern and qRT/PCR, and microarray and qRT/PCR, gave correlation coefficients (r) of 0.72, 0.39, and 0.63, respectively. In each instance, one to two outlier genes were apparent. Their exclusion from analysis gave r values of 0.79, 0.72, and 0.83, respectively. These findings demonstrate that the assessment of differential gene expression is dependent upon the methodology used. In each situation where outcomes from different methodologies were compared, the presence of a relatively limited number of outlier genes precluded high overall correlation between the methods. Validation of gene expression by different methods should be performed whenever possible.

20.
Landgrebe J  Wurst W  Welzl G 《Genome biology》2002,3(4):research0019.1-research001911

Background  

In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.
