Similar articles
 Found 20 similar articles; search took 31 ms
1.
A test-statistic typically employed in the gene set enrichment analysis (GSEA) prevents this method from being genuinely multivariate. In particular, this statistic is insensitive to changes in the correlation structure of the gene sets of interest. The present paper considers the utility of an alternative test-statistic in designing the confirmatory component of the GSEA. This statistic is based on a pertinent distance between joint distributions of expression levels of genes included in the set of interest. The null distribution of the proposed test-statistic, known as the multivariate N-statistic, is obtained by permuting group labels. Our simulation studies and analysis of biological data confirm the conjecture that the N-statistic is a much better choice for multivariate significance testing within the framework of the GSEA. We also discuss some other aspects of the GSEA paradigm and suggest new avenues for future research.
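
As a rough illustration of the idea in this abstract, the sketch below computes a distance-type statistic between the joint expression distributions of two groups and obtains its null distribution by permuting group labels. The abstract does not give the kernel, so the Euclidean distance used here is an assumption, and the names `distance_statistic` and `permutation_p_value` are hypothetical, not taken from the paper.

```python
import numpy as np

def distance_statistic(x, y):
    """Distance-type two-sample statistic on rows (samples) of x and y.

    Assumption: Euclidean distance as the kernel; the paper may use another.
    """
    def mean_dist(a, b):
        # Mean Euclidean distance over all pairs (row of a, row of b).
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=2)).mean()

    return float(2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y))

def permutation_p_value(x, y, n_perm=199, seed=0):
    """Permutation p-value obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = len(x)
    observed = distance_statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        permuted = distance_statistic(pooled[idx[:n_x]], pooled[idx[n_x:]])
        if permuted >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Rows are samples and columns are the genes of one set, so the statistic sees the whole joint distribution rather than one gene at a time.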

2.
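GSEA is a freely downloadable tool for analyzing genome-wide expression profiling microarray data. Drawing on existing knowledge of gene location, properties, function, and biological significance, it first builds a molecular signature database that contains many functional gene sets. It then analyzes expression profiling hybridization data from samples in two biological states to determine how the genes in a given functional gene set are expressed and whether that expression pattern is statistically significant. GSEA interprets biological information from a different angle and can further improve our understanding of the relevant biological events.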

3.
4.

Background  

Gene set enrichment analysis (GSEA) is a microarray data analysis method that uses predefined gene sets and ranks of genes to identify significant biological changes in microarray data sets. GSEA is especially useful when gene expression changes in a given microarray data set are minimal or moderate.
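
To make "predefined gene sets and ranks of genes" concrete, here is a minimal sketch of an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score. It is a simplification: weighting each step by the gene's score, as some GSEA variants do, is omitted, and the function name `enrichment_score` is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: gene identifiers ordered from most to least associated
    with the phenotype. gene_set: a set of gene identifiers.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a set member, down on any other gene.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    # The score is the largest deviation of the running sum from zero.
    return best
```

A set concentrated at the top of the ranking gives a score near 1, and a set concentrated at the bottom gives a score near -1.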

5.
6.
7.

Background  

With the current technological advances in high-throughput biology, the need for tools that help analyse the massive amount of data being generated is evident. Gene set enrichment analysis (GSEA) is a powerful method of inspecting large-scale data sets, and investigating protein structural features can guide the determination of the function of individual genes. However, a convenient tool that combines these two features to aid in high-throughput data analysis has not yet been developed. To fill this niche, we developed the user-friendly, web-based application PhenoFam.

8.
Lipid metabolism reprogramming plays an important role in cell growth, proliferation, angiogenesis and invasion in cancers. However, the diverse lipid metabolism programmes and their prognostic value during glioma progression remain unclear. Here, the lipid metabolism‐related genes were profiled using RNA sequencing data from The Cancer Genome Atlas (TCGA) and Chinese Glioma Genome Atlas (CGGA) databases. Gene ontology (GO) and gene set enrichment analysis (GSEA) found that glioblastoma (GBM) mainly exhibited enrichment of the glycosphingolipid metabolic process, whereas lower grade gliomas (LGGs) showed enrichment of the phosphatidylinositol metabolic process. According to the differential genes of lipid metabolism between LGG and GBM, we developed a nine‐gene set using a Cox proportional hazards model with elastic net penalty, and the CGGA cohort was used as the validation data set. Survival analysis revealed that the obtained gene set could differentiate the outcome of low‐ and high‐risk patients in both cohorts. Meanwhile, multivariate Cox regression analysis indicated that this signature was a significant independent prognostic factor in diffuse gliomas. Gene ontology and GSEA showed that high‐risk cases were associated with phenotypes of cell division and immune response. Collectively, our findings provide new insight into lipid metabolism in diffuse gliomas.

9.
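A study of pathway enrichment methods based on gene expression variability   Total citations: 1 (self-citations: 0, citations by others: 1)
Current pathway enrichment methods are based mainly on differential gene expression; few perform enrichment analysis from the perspective of pathway variability (variance). We observed that, when pathway variability is described by a suitable statistic, the variability of some pathways rises or falls markedly under the disease phenotype. This study therefore hypothesizes that the degree of pathway variability differs between phenotypes. We designed 14 statistics and test methods that describe pathway variability in order to detect pathways whose variability differs between phenotypes, that is, enriched pathways. We compared the enrichment results with the results of a literature search and analyzed how different microarray preprocessing methods affect the data and the results. The results show that, among the five preprocessing methods, robust multi-array average (RMA) was the best for data preprocessing, and that the degree of pathway variability differs between phenotypes. Judged against the pathways found by literature search, among the 14 variability-based enrichment methods, the permutation test that uses the variance of the Euclidean distances between the genes in a pathway as its statistic (method 11) effectively identified significant pathways, and its enrichment results were better than those of gene set enrichment analysis (GSEA). In summary, a pathway enrichment strategy based on pathway variability is feasible; it offers theoretical guidance for pathway enrichment analysis and a new perspective for research on human disease.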

10.
11.
ABSTRACT: BACKGROUND: Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in terms of gene interaction network is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis. RESULTS: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study. CONCLUSIONS: The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity rather than by simple overlaps.
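
The overlap statistic mentioned in this abstract is often assessed with a hypergeometric tail probability. The abstract does not say which test the authors use, so the sketch below is one common choice shown under that assumption; the function name and parameter names are hypothetical.

```python
from math import comb

def overlap_p_value(n_universe, n_category, n_experimental, n_overlap):
    """P(overlap >= n_overlap) when the experimental set is drawn at random.

    n_universe: number of genes considered in total
    n_category: genes in the functional category
    n_experimental: genes in the experimental set
    n_overlap: genes shared by the two sets
    """
    upper = min(n_category, n_experimental)
    total = comb(n_universe, n_experimental)
    # Sum the hypergeometric probabilities for every overlap size >= n_overlap.
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_experimental - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small p-value suggests the category is over-represented in the experimental set; note that this calculation ignores network links, which is the limitation NEA is meant to address.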

12.
Fistulifera sp. strain JPCC DA0580 is a newly sequenced pennate diatom that is capable of simultaneously growing and accumulating lipids. This is a unique trait, not found in other related microalgae so far. It is able to accumulate between 40 and 60% of its cell weight in lipids, making it a strong candidate for the production of biofuel. To investigate this characteristic, we used RNA-Seq data gathered at four different times while Fistulifera sp. strain JPCC DA0580 was grown in oil-accumulating and non-oil-accumulating conditions. We then adapted gene set enrichment analysis (GSEA) to investigate the relationship between the difference in gene expression of 7,822 genes and metabolic functions in our data. We utilized information in the KEGG pathway database to create the gene sets and changed GSEA to use re-sampling so that data from the different time points could be included in the analysis. Our GSEA method identified photosynthesis, lipid synthesis and amino acid synthesis related pathways as processes that play a significant role in oil production and growth in Fistulifera sp. strain JPCC DA0580. In addition to GSEA, we visualized the results by creating a network of compounds and reactions, and plotted the expression data on top of the network. This made existing graph algorithms available to us, which we then used to calculate a path that metabolizes glucose into triacylglycerol (TAG) in the smallest number of steps. By visualizing the data this way, we observed a separate up-regulation of genes at different times instead of a concerted response. We also identified two metabolic paths that used fewer reactions than the one shown in KEGG and showed that the reactions were up-regulated during the experiment. The combination of analysis and visualization methods successfully analyzed time-course data, identified important metabolic pathways and provided new hypotheses for further research.
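
Finding a path with the smallest number of steps in a compound and reaction network can be sketched with breadth-first search. The abstract does not name the graph algorithm the authors used, so breadth-first search is an assumption, and the toy network below is invented for illustration: its intermediate nodes "A", "B" and "C" are placeholders, not real KEGG compounds.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Return a path with the fewest edges from source to target, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical directed network: each key lists the nodes reachable in one step.
toy_network = {
    "glucose": ["A", "B"],
    "A": ["C"],
    "B": ["TAG"],
    "C": ["TAG"],
}
```

Here `shortest_path(toy_network, "glucose", "TAG")` follows the two-step route through "B" rather than the three-step route through "A" and "C".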

13.
The development of microarray technology allows the simultaneous measurement of the expression of many thousands of genes. The information gained offers an unprecedented opportunity to fully characterize biological processes. However, this challenge will only be successful if new tools for the efficient integration and interpretation of large datasets are available. One of these tools, pathway analysis, involves looking for consistent but subtle changes in gene expression by incorporating either pathway or functional annotations. We review several methods of pathway analysis and compare the performance of three, the binomial distribution, z scores, and gene set enrichment analysis, on two microarray datasets. Pathway analysis is a promising tool to identify the mechanisms that underlie diseases, adaptive physiological compensatory responses and new avenues for investigation.

14.
Ovarian carcinoma (OC) is one of the most common malignant tumors of the female reproductive system. In recent years, the therapeutic outcome of OC has improved significantly through the application of effective chemotherapy regimens. However, the 5-year survival rate remains below 30% because of the high rate of relapse, so reliable predictive and prognostic markers of OC are needed. Ovarian cancer gene expression data and the corresponding clinical data were downloaded from the Gene Expression Omnibus database. Weighted gene co-expression network analysis (WGCNA) and Cox proportional hazards regression (PHR) were used to screen for long noncoding RNAs (lncRNAs) associated with pathological grade and prognosis. Kaplan-Meier analysis and receiver operating characteristic curve analysis were performed to evaluate the predictive ability of the selected lncRNAs. Gene Ontology (GO) enrichment and Gene Set Enrichment Analysis (GSEA) were used to explore the possible mechanisms by which the selected lncRNAs affect the development of OC. Five reliable lncRNAs (LINC00664, LINC00667, LINC01139, LINC01419, and LOC286437) were identified through a series of bioinformatics methods. In the testing cohorts, the five lncRNAs robustly predicted the risk of OC recurrence, and multivariate Cox PHR analysis indicated that the five-lncRNA signature is an independent risk factor for OC recurrence. Moreover, GO and GSEA enrichment analysis showed that the five lncRNAs are involved in multiple mechanisms of ovarian cancer occurrence. In summary, these findings indicate that the five lncRNAs can effectively predict the risk of recurrence of ovarian cancer.

15.
16.
The cancer stem cell (CSC) hypothesis calls for the development of new therapeutic approaches that target the CSC population. Characterization of the pathways that regulate CSC activity will facilitate the development of targeted therapies. We recently reported that the enzymatic activity of ALDH1, as measured by the ALDEFLUOR assay, can be utilized to isolate normal and malignant breast stem cells in both primary tumors and cell lines. In this study, utilizing a tumorsphere assay, we have demonstrated the role of retinoid signaling in the regulation of breast CSC self-renewal and differentiation. Utilizing the gene set enrichment analysis (GSEA) algorithm, we identified gene sets and pathways associated with retinoid signaling. These pathways regulate breast CSC biology, and their inhibition may provide novel therapeutic approaches to target breast CSCs.

17.

Background  

Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statistics are combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed.

18.
19.
Milk production traits, such as 305‐day milk yield (305MY), have been under direct selection to improve production in dairy cows. Over the past 50 years, the average milk yield has nearly doubled, and over 56% of the increase is attributable to genetic improvement. As such, additional improvements in milk yield are still possible as new loci are identified. The objectives of this study were to detect SNPs and gene sets associated with 305MY in order to identify new candidate genes contributing to variation in milk production. A population of 781 primiparous Holstein cows from six central Washington dairies with records of 305MY and energy corrected milk was used to perform a genome‐wide association analysis (GWAA) using the Illumina BovineHD BeadChip (777 962 SNPs) to identify QTL associated with 305MY (P < 1.0 × 10⁻⁵). A gene set enrichment analysis with SNP data (GSEA‐SNP) was performed to identify gene sets (normalized enrichment score > 3.0) and leading edge genes (LEGs) influencing 305MY. The GWAA identified three QTL comprising 34 SNPs and 30 positional candidate genes. In the GSEA‐SNP, five gene sets with 58 unique and 24 shared LEGs contributed to 305MY. Identification of QTL and LEGs associated with 305MY can provide additional targets for genomic selection to continue to improve 305MY in dairy cattle.

20.

Background  

Gene set enrichment testing has helped bridge the gap from an individual gene to a systems biology interpretation of microarray data. Although gene sets are defined a priori based on biological knowledge, current methods for gene set enrichment testing treat all genes equally. It is well known that some genes, such as those responsible for housekeeping functions, appear in many pathways, whereas other genes are more specialized and play a unique role in a single pathway. Drawing inspiration from the field of information retrieval, we have developed and present here an approach to incorporate gene appearance frequency (in KEGG pathways) into two current methods, Gene Set Enrichment Analysis (GSEA) and the logistic regression-based LRpath framework, to generate more reproducible and biologically meaningful results.
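
One way to picture weighting genes by how often they appear across pathways is an inverse-document-frequency style weight borrowed from information retrieval. The abstract does not state the authors' formula, so the logarithmic weight below is purely an assumption for illustration, and the function name, pathway names and gene names are hypothetical.

```python
from collections import Counter
from math import log

def gene_weights(pathways):
    """Assumed IDF-style weight: log(number of pathways / pathways containing the gene).

    pathways: dict mapping a pathway name to an iterable of gene identifiers.
    """
    n_pathways = len(pathways)
    # Count each gene once per pathway, even if it is listed twice there.
    counts = Counter(gene for genes in pathways.values() for gene in set(genes))
    return {gene: log(n_pathways / count) for gene, count in counts.items()}
```

Under this assumed weighting, a gene present in every pathway receives weight 0, while a gene unique to one pathway receives the largest weight.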
