Similar Articles
20 similar articles found.
1.

Background  

Gene set analysis (GSA) is a widely used strategy for analyzing gene expression data based on pathway knowledge. GSA focuses on sets of related genes and has major advantages over individual gene analyses, including greater robustness, sensitivity and biological relevance. However, previous GSA methods are of limited use because they cannot handle datasets of different sample sizes or experimental designs.

2.

Introduction

Gene-set analysis (GSA) methods are used as a complement to genome-wide association studies (GWASs). The single-marker association estimates of a predefined set of genes are contrasted either with those of all remaining genes or with a null, non-associated background. When pooling the p-values from several GSAs, it is important to account for how concordant the patterns of single-marker association point estimates are across any given gene set. Here we propose META-GSA, an enhanced version of Fisher's inverse χ² method that weights each study to account for imperfect correlation between association patterns.
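META-GSA builds on the unweighted Fisher's inverse χ² method, which can be sketched in Python as below. This is a minimal sketch. It pools k independent p-values through X = -2 Σ ln p_i, which under the null follows a χ² distribution with 2k degrees of freedom. It does not reproduce the META-GSA study weights, whose exact form is not given here. The helper names `fisher_combine` and `chi2_sf_even_df` are illustrative.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom.

    For df = 2k the tail has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # (x/2)^j / j!, built incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Fisher's inverse chi-square method for k independent p-values.

    Returns (statistic, combined p-value); under H0 the statistic
    -2 * sum(ln p_i) follows a chi-square distribution with 2k df.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, 2 * len(p_values))


if __name__ == "__main__":
    # Three hypothetical study-level GSA p-values for the same gene set.
    print(fisher_combine([0.04, 0.10, 0.30]))
```

Only even degrees of freedom are needed here, so the χ² tail is computed in closed form without a statistics library.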

Simulation and Power

We investigated the performance of META-GSA by simulating GWASs with 500 cases and 500 controls at 100 diallelic markers. We used 20 scenarios with relative risks between 1 and 1.5 in gene sets of 10 genes. Wilcoxon's rank sum test was applied as the GSA for each study. META-GSA had greater power to discover truly associated gene sets than simple pooling of the p-values (e.g., 59% versus 37% when the true relative risk for 5 of 10 genes was assumed to be 1.5). Under the null hypothesis of no difference in the true association pattern between the gene set of interest and the remaining genes, the results of the two approaches are almost uncorrelated. We recommend not relying on p-values alone when combining the results of independent GSAs.
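Wilcoxon's rank sum test, used as a GSA, compares per-gene association scores inside a gene set with those of the remaining genes. The sketch below is minimal and rests on three assumptions: larger scores mean stronger association (e.g. per-gene χ² statistics), the normal approximation with a tie correction is adequate, and exact small-sample p-values are not needed. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def rank_sum_gsa(in_set, rest):
    """One-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    H1: genes in the set have larger association scores than the rest.
    Returns (W, z, p), where W is the rank sum of the in-set genes.
    """
    n1, n2 = len(in_set), len(rest)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    values = list(in_set) + list(rest)
    n = n1 + n2
    ranks = average_ranks(values)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2.0
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # Tie correction: subtract sum(t^3 - t) / (n(n - 1)) over tie groups.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    if var_w <= 0:
        raise ValueError("all scores are tied; the test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return w, z, p
```

For example, `rank_sum_gsa([10, 11, 12], [1, 2, 3, 4, 5])` gives W = 21 and a one-sided p-value below 0.05.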

Application

We applied META-GSA to pool the results of four case-control GWASs of lung cancer risk (Central European Study and Toronto/Lunenfeld-Tanenbaum Research Institute Study; German Lung Cancer Study and MD Anderson Cancer Center Study). These studies had already been analyzed separately with four different GSA methods (EASE, SLAT, mSUMSTAT and GenGen). This application revealed the pathway GO0015291 “transmembrane transporter activity” as significantly enriched with associated genes (GSA method: EASE, p = 0.0315 corrected for multiple testing). Similar results were found for GO0015464 “acetylcholine receptor activity”, but only without correction for multiple testing (all GSA methods applied; p ≈ 0.02).

3.

Background

Gene Set Analysis (GSA) identifies gene sets that are differentially expressed between phenotypes. The results published in this field are inconsistent, and there is no consensus on the best method. In this paper, two new GSA methods are introduced and compared with previous ones.

Methods

Two methods based on multivariate nonparametric techniques, MMGSA and MRGSA, are presented. We compared them with five existing GSA methods (Hotelling's T2, Globaltest, Abs_Cat, Med_Cat and Rs_Cat) on their ability to detect differential gene expression between phenotypes, using simulated and real microarray data sets.

Results

On a real dataset, the power of MMGSA and MRGSA was comparable to that of Globaltest and Tsai. MRGSA did not perform well on the simulated dataset.

Conclusions

Globaltest was the best method on both the real and the simulated datasets. MMGSA performed well on the simulated dataset for small gene sets. The GLS methods did not perform well on the simulated data, except for Med_Cat in large gene sets.

4.
5.

Background

Gene set analysis (GSA) methods test the association of sets of genes with phenotypes in gene expression microarray studies. GSA methods for a single binary or categorical phenotype abound, but little attention has been paid to continuous phenotypes, and no existing method accommodates multiple correlated continuous phenotypes.

Result

We propose an extension of the linear combination test (LCT) to multiple continuous phenotypes. The extension incorporates correlations among gene expressions within functionally related gene sets as well as correlations among the phenotypes. We further extend the new method to a nonlinear version, the nonlinear combination test (NLCT), to test potential nonlinear associations of gene sets with multiple phenotypes. A simulation study and a real microarray example demonstrate the practical aspects of the proposed methods.

Conclusion

The proposed approaches control type I error effectively and are powerful in testing associations between gene sets and multiple continuous phenotypes. Both are computationally efficient. Naively (univariately) analyzing a group of correlated phenotypes could be dangerous. R code to perform LCT and NLCT for multiple continuous phenotypes is available at http://www.ualberta.ca/~yyasui/homepage.html.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-260) contains supplementary material, which is available to authorized users.

6.

Background  

The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited.
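For over-representation, the one-sided Fisher's exact test reduces to an upper-tail hypergeometric probability. The 2×2 table crosses in set / not in set with differentially expressed / not. The sketch below assumes N genes measured, K of them in the gene set, n called differentially expressed, and k in both. The function name is illustrative.

```python
from math import comb


def enrichment_pvalue(N, K, n, k):
    """One-sided Fisher's exact (hypergeometric) over-representation p-value.

    N: genes measured, K: genes in the set, n: genes called differentially
    expressed, k: differentially expressed genes that fall in the set.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k):
        raise ValueError("invalid counts")
    # Sum the probabilities of observing k or more set genes among the n drawn.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

Python's exact integer `comb` avoids overflow even for genome-scale N, and the final true division returns a float.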

7.
8.

Background  

Many methods have been developed to test whether gene sets are enriched for genes related to certain phenotypes or cell states. These approaches usually combine gene expression data with functionally related gene sets as defined in databases such as Gene Ontology (GO), KEGG or BioCarta. Results based on gene set analysis are generally more biologically interpretable, accurate and robust than results based on individual gene analysis. However, most available methods for gene set enrichment analysis test the enrichment of the entire gene set, whereas it is more likely that only a subset of the genes in a gene set is related to the phenotypes of interest.

9.

Background

Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles.

Methods

We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested the next-gen TM-derived and CTD-derived gene sets for significant differential expression (SDE; false discovery rate-corrected p-values < 0.05). The tests used gene expression (GE) data sets for five chemicals from experimental models. We tested gene sets for six fibrates for SDE in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared the results with those from the Connectivity Map. We tested 319 next-gen TM-derived gene sets for environmental toxicants for SDE in three GE data sets of triazoles, and also tested 442 gene sets associated with embryonic structures for SDE. We compared the gene sets with triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals.
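The abstract specifies only "false discovery rate-corrected p-values < 0.05". Assuming the common Benjamini-Hochberg step-up procedure, the correction can be sketched as follows. The helpers are hypothetical, not the authors' code.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Works from the largest p-value down, so each adjusted value is the
    minimum over j >= rank of p_(j) * m / j, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, alpha=0.05):
    """Flag gene sets whose BH-adjusted p-value is below alpha."""
    return [q < alpha for q in benjamini_hochberg(p_values)]
```

Taking the running minimum from the largest p-value downward keeps the adjusted values monotone in the original ranking.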

Results

Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significantly against the PPARA GE signature. Thirty-three environmental toxicant gene sets were significantly altered in the triazole GE data sets, and 21 of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects and discriminated triazoles from other chemicals.

Conclusions

Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.

10.

Background  

Recently, much effort in microarray data analysis has been directed toward the study of gene sets. A gene set consists of genes that are in some way functionally related; for example, the genes in a known biological pathway naturally define a gene set. Gene sets are usually identified from a priori biological knowledge, and many bioinformatics resources now store this kind of knowledge (for example, the Kyoto Encyclopedia of Genes and Genomes, among others). Pathway maps carry important information about the correlation structure among genes that should not be neglected, yet currently available multivariate methods for gene set analysis do not fully exploit it.

11.

Background  

Gene set enrichment analysis (GSEA) is a microarray data analysis method that uses predefined gene sets and gene ranks to identify significant biological changes in microarray data sets. GSEA is especially useful when gene expression changes in a given microarray data set are minimal or moderate.
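The core of GSEA is a running-sum enrichment score computed over the ranked gene list. The sketch below assumes the weighted variant of this score. Each hit advances the sum by |score|^p normalised over all hits (p = 1 by default), and each miss decreases it by 1/(N - N_hit). Significance, normally assessed by permutation, is omitted, and the function name is illustrative.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score (no permutation test).

    ranked_genes: gene ids ordered from most positively to most negatively
    associated with the phenotype; scores: the ranking statistic per gene,
    in the same order; gene_set: the predefined set being tested.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the list but not cover it")
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all hit scores are zero; use p = 0 instead")
    miss_step = 1.0 / (n - n_hit)
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Hits step up by their weighted share, misses step down uniformly.
        running += abs(s) ** p / hit_norm if h else -miss_step
        if abs(running) > abs(es):
            es = running
    return es
```

With p = 0 every hit gets the same weight, and the score reduces to an unweighted Kolmogorov-Smirnov-like statistic.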

12.

Background  

The most popular methods for significance analysis of microarray data are well suited to finding genes that are differentially expressed across predefined categories. However, these methods make it harder to identify features that correlate with continuous dependent variables. The long lists of significant genes they return are also not easily probed for co-regulation and dependencies. Dimension reduction methods are widely used in the microarray literature for classification or for obtaining low-dimensional representations of data sets. These methods have an additional interpretative strength that is often not fully exploited when expression data are analysed. In addition, significance analysis may be performed directly on the model parameters to find genes that are important for any number of categorical or continuous responses. We introduce a general scheme for analysing expression data that combines significance testing with the interpretative advantages of dimension reduction methods. This approach is applicable both to exploratory analysis and to classification and regression problems.

13.

Purpose

Identifying key inputs and their effect on the results of Life Cycle Assessment (LCA) models is fundamental. Parameter importance varies greatly between cases owing to the interaction of sensitivity and uncertainty, so these features should never be defined a priori. However, exhaustive parametric uncertainty analyses can be complicated and demanding, with both analytical and sampling methods. We therefore propose a systematic method for selecting critical parameters, based on a simplified analytical formulation that unifies sensitivity and uncertainty in a Global Sensitivity Analysis (GSA) framework.

Methods

The proposed analytical method, based on the calculation of sensitivity coefficients (SC), is evaluated against Monte Carlo sampling, the traditional uncertainty assessment procedure, both for individual parameters and for full parameter sets. Three full-scale waste management scenarios are modelled with the dedicated waste LCA model EASETECH and the full range of ILCD-recommended impact categories. A common uncertainty range of 10 % is used for all parameters, which we assume to be normally distributed. The applicability of the concepts of additivity of variances and GSA is tested on the results of both uncertainty propagation methods. We then examine the differences in the results of discernibility analyses carried out with varying numbers of sampling points and parameters.
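A first-order (Taylor) approximation is one common way to propagate uncertainty analytically with sensitivity coefficients. Define SC_i = (∂y/∂x_i)(x_i/y). For independent inputs, CV_y² ≈ Σ_i (SC_i · CV_i)², and under additivity of variances each term's share of the sum is that parameter's contribution. The sketch makes three assumptions: it uses this formulation, it reads the 10 % range as a coefficient of variation of 0.1, and it stands a toy model y = x0·x1 + x2 in for an EASETECH scenario. All names are hypothetical.

```python
import math
import random
import statistics


def sensitivity_coefficients(model, x0, rel_step=1e-6):
    """Normalised sensitivity coefficients SC_i = (dy/dx_i) * x_i / y,
    estimated by central finite differences (x_i and y must be non-zero)."""
    y0 = model(x0)
    scs = []
    for i, xi in enumerate(x0):
        h = abs(xi) * rel_step
        up, down = list(x0), list(x0)
        up[i], down[i] = xi + h, xi - h
        scs.append((model(up) - model(down)) / (2.0 * h) * xi / y0)
    return scs


def analytical_cv(scs, cvs):
    """First-order propagation for independent inputs: CV_y^2 = sum (SC_i * CV_i)^2."""
    return math.sqrt(sum((sc * cv) ** 2 for sc, cv in zip(scs, cvs)))


def variance_shares(scs, cvs):
    """Fraction of the output variance attributed to each parameter."""
    terms = [(sc * cv) ** 2 for sc, cv in zip(scs, cvs)]
    total = sum(terms)
    return [t / total for t in terms]


def monte_carlo_cv(model, x0, cvs, n=20000, seed=1):
    """Reference value: sample each input as Normal(x_i, CV_i * |x_i|)."""
    rng = random.Random(seed)
    ys = [model([rng.gauss(x, cv * abs(x)) for x, cv in zip(x0, cvs)])
          for _ in range(n)]
    return statistics.stdev(ys) / statistics.mean(ys)


def toy_model(x):
    """Hypothetical stand-in for an LCA model."""
    return x[0] * x[1] + x[2]


if __name__ == "__main__":
    x0, cvs = [2.0, 3.0, 1.0], [0.1, 0.1, 0.1]
    scs = sensitivity_coefficients(toy_model, x0)
    print(analytical_cv(scs, cvs), monte_carlo_cv(toy_model, x0, cvs))
    print(variance_shares(scs, cvs))
```

For this toy model the analytical and Monte Carlo coefficients of variation agree closely, and the variance shares show that x0 and x1 dominate while x2 contributes little.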

Results and discussion

The proposed analytical method complies with the Monte Carlo results for all scenarios and impact categories, but offers substantially simpler mathematical formulation and shorter computation times. The coefficients of variation obtained with the analytical method and Monte Carlo differ only by 1 %, indicating that the analytical method provides a reliable representation of uncertainties and allows determination of whether a discernibility analysis is required. The additivity of variances and the GSA approach show that the uncertainty in results is determined by a limited set of important parameters. The results of the discernibility analysis based on these critical parameters vary only by 1 % from discernibility analyses based on the full set, but require significantly fewer Monte Carlo runs.

Conclusions

The proposed method and GSA framework provide a fast and valuable approximation for uncertainty quantification. Uncertainty can be represented sparsely by contextually identifying important parameters in a systematic manner. The proposed method integrates with existing step-wise approaches for uncertainty analysis by introducing a global importance analysis before uncertainty propagation.

14.

Background  

One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system.

15.

Background  

With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes, e.g., for disease diagnosis. Several widely used gene selection methods select top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these methods is that they may produce gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analyses. Some recent studies show that incorporating gene-to-gene correlations into gene selection can remove redundant genes and improve classification accuracy.
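One simple way to incorporate gene-to-gene correlations is a greedy filter. It walks down the individually ranked list and skips any gene strongly correlated with one already chosen. This is an illustrative sketch, not the method of any particular study. The threshold `max_abs_corr` and the data layout (gene -> expression vector) are assumptions, and every gene is assumed to have non-constant expression.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_nonredundant(expr, scores, k, max_abs_corr=0.8):
    """Greedy correlation-aware gene selection.

    expr: gene -> expression values across samples; scores: gene -> individual
    discriminative score (higher is better). Genes are visited from best to
    worst score and kept only if |r| with every gene kept so far is at most
    max_abs_corr, until k genes are selected.
    """
    chosen = []
    for g in sorted(expr, key=lambda gene: scores[gene], reverse=True):
        if all(abs(pearson(expr[g], expr[c])) <= max_abs_corr for c in chosen):
            chosen.append(g)
            if len(chosen) == k:
                break
    return chosen
```

Lowering `max_abs_corr` trades some individual discriminative power for a less redundant gene list.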

16.

Background  

Gene duplication and gene loss during the evolution of eukaryotes have hindered attempts to estimate phylogenies and divergence times of species. Although current methods that identify clusters of orthologous genes in complete genomes have helped to investigate gene function and gene content, they have not been optimized for evolutionary sequence analyses requiring strict orthology and complete gene matrices. Here we adopt a relatively simple and fast genome comparison approach designed to assemble orthologs for evolutionary analysis. Our approach identifies single-copy genes representing only species divergences (panorthologs) in order to minimize potential errors caused by gene duplication. We apply this approach to complete sets of proteins from published eukaryote genomes specifically for phylogeny and time estimation.
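The final step of assembling a complete, single-copy gene matrix can be sketched as a filter over precomputed gene families. This sketch assumes orthology clustering has already been done. It only checks that each family has exactly one member in every listed species and does not implement panortholog identification itself. Names and data layout are hypothetical.

```python
def single_copy_matrix(families, species):
    """Keep gene families with exactly one member in every listed species.

    families: family id -> list of (gene_id, species) pairs, e.g. from a
    prior orthology clustering. Members from species outside `species` are
    ignored. Returns family id -> {species: gene_id}, i.e. the rows of a
    complete, single-copy gene matrix.
    """
    matrix = {}
    for fam, members in families.items():
        by_species = {}
        for gene_id, sp in members:
            by_species.setdefault(sp, []).append(gene_id)
        if all(len(by_species.get(sp, [])) == 1 for sp in species):
            matrix[fam] = {sp: by_species[sp][0] for sp in species}
    return matrix
```

Families with a duplicated or missing member in any species are dropped, so every retained row contributes exactly one sequence per species.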

17.

Background  

Gene clustering has been widely used to group genes with similar expression patterns in microarray data analysis. Subsequent enrichment analysis using predefined gene sets can provide clues about which functional themes or regulatory sequence motifs are associated with individual gene clusters. Despite this potential utility, gene clustering and enrichment analysis have been performed on separate platforms; thus, developing an integrative algorithm that links both methods is highly challenging.

18.

Background  

Large microarray datasets have enabled gene regulation to be studied through coexpression analysis. While numerous methods have been developed for identifying differentially expressed genes between two conditions, the field of differential coexpression analysis is still relatively new. More specifically, there is so far no sensitive and untargeted method to identify gene modules (also known as gene sets or clusters) that are differentially coexpressed between two conditions. Here, sensitive and untargeted means that the method should be able to construct de novo modules by grouping genes based on shared, but subtle, differential correlation patterns.

19.

Background  

There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis.

20.

Background  

Differential co-expression analysis is an emerging strategy for characterizing disease-related dysregulation of gene expression regulatory networks. Given predefined sets of biological samples, such analysis aims to identify genes that are co-expressed in one set of samples but not in the other.
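A common way to test whether a gene pair is co-expressed differently in two sample sets is to compare their Pearson correlations via Fisher's z-transformation. The sketch assumes independent groups, at least four samples per group, and correlations strictly between -1 and 1. It tests a single pair and does not construct modules. Function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differential_coexpression(x1, y1, x2, y2):
    """Two-sided test that genes x and y have different correlations in two
    independent sample groups, using Fisher's z-transformation.

    Returns (r1, r2, z, p). Requires >= 4 samples per group and |r| < 1.
    """
    n1, n2 = len(x1), len(x2)
    if n1 < 4 or n2 < 4:
        raise ValueError("need at least 4 samples per group")
    r1, r2 = pearson(x1, y1), pearson(x2, y2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return r1, r2, z, p
```

The z-transformation makes the sampling distribution of each correlation approximately normal with variance 1/(n - 3), which is what justifies the standard error used above.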
