Similar articles
1.
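Objective: To compare the strengths and accuracy of two classes of methods for microarray data analysis, those based on a self-contained null hypothesis and those based on a competitive null hypothesis, by evaluating how well a representative of each, GAGE (Generally Applicable Gene-set Enrichment) and GSEA (Gene Set Enrichment Analysis), identifies enriched gene sets. Methods: Both methods were applied to real gene expression profile data, and the accuracy and soundness of their results were compared. Results: Applied to known gene expression data, GAGE showed higher statistical power than GSEA, and the gene sets it selected were more biologically relevant. Conclusion: GAGE, a gene set analysis method based on a self-contained null hypothesis, makes full use of the expression data, analyzes experimental sets and pathway sets separately, and accounts for both up- and down-regulation of gene sets; its statistical power exceeds that of GSEA, which is based on a competitive null hypothesis, and it yields more accurate and sound results.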
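The difference between a self-contained and a competitive null hypothesis can be illustrated with a toy permutation test. The sketch below is an assumption-laden illustration, not the GAGE or GSEA algorithm: the function names, the difference-of-means gene score, and the mean-absolute-score set statistic are all hypothetical choices. The self-contained test permutes sample labels (null: no gene in the set is associated with the phenotype); the competitive test draws random gene sets of the same size (null: genes in the set are no more associated than genes outside it).

```python
import random
from statistics import mean

def gene_scores(expr, labels):
    """Per-gene score: mean of case samples (label 1) minus mean of controls (label 0)."""
    scores = {}
    for gene, values in expr.items():
        case = [v for v, lab in zip(values, labels) if lab == 1]
        ctrl = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = mean(case) - mean(ctrl)
    return scores

def set_statistic(scores, gene_set):
    """Hypothetical set-level statistic: mean absolute gene score."""
    return mean(abs(scores[g]) for g in gene_set)

def self_contained_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Permute sample labels; only genes in the set are used."""
    rng = random.Random(seed)
    subset = {g: expr[g] for g in gene_set}
    observed = set_statistic(gene_scores(subset, labels), gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(gene_scores(subset, shuffled), gene_set) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def competitive_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Keep sample labels fixed; compare against random gene sets of equal size."""
    rng = random.Random(seed)
    scores = gene_scores(expr, labels)
    observed = set_statistic(scores, gene_set)
    genes = list(scores)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, len(gene_set))
        if set_statistic(scores, random_set) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With few samples the self-contained p-value is bounded below by the number of distinct label arrangements, which is one reason the two approaches can rank the same gene set differently.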

2.
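To identify potential key genes and signaling pathways associated with the development and prognosis of colorectal cancer (CRC), the CRC gene expression dataset GSE106582 was obtained from the GEO database of the US National Center for Biotechnology Information (NCBI). Samples were grouped by PCA, and GEO2R was used to screen for differentially expressed genes (DEGs) between CRC and adjacent non-tumor control tissue. GO and KEGG pathway enrichment analyses of the DEGs were performed with the DAVID online tool for a preliminary view of their biological roles. A protein-protein interaction network of the DEGs was built from the STRING database, visualized in Cytoscape, and used to screen for key genes. The key genes were assessed by survival analysis and ROC curve analysis and validated in dataset GSE21510. In total, 199 DEGs were identified, 53 up-regulated and 146 down-regulated. Up-regulated DEGs were mainly enriched in collagen catabolic process, extracellular matrix disassembly, ECM-receptor interaction, and the PI3K/AKT signaling pathway; down-regulated DEGs were mainly enriched in bicarbonate transport, one-carbon metabolic process, mineral absorption, drug metabolism-cytochrome P450, and nitrogen metabolism. MCODE analysis, survival analysis, and ROC analysis together identified three genes, BGN, COL1A2, and TIMP1, that may be involved in the development and progression of CRC; their abnormally high expression in tumor tissue was associated with poorer patient survival, and the GSE21510 validation results agreed with the GSE106582 analysis. By mining CRC microarray data with bioinformatics methods, this study explores the potential pathogenesis of CRC at the gene level and offers a reference and theoretical basis for screening tumor markers and prognostic molecules and for identifying possible drug targets.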

3.
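[Objective] To use bioinformatics methods to analyze whole-blood transcriptome expression profiles of patients with bacterial sepsis from public databases, and to explore the key host differential genes associated with bacterial sepsis and their significance. [Methods] Based on the whole-blood transcriptome datasets GSE80496 and GSE72829 in the GEO database, GEO2R and gene set enrichment analysis (GSEA), combined with weighted gene co-expression network analysis (WGCNA), were used to screen for genes significantly altered in patients with bacterial sepsis relative to healthy controls; GO functional analysis and KEGG enrichment analysis of the intersecting genes were performed in R. Hub genes were analyzed with String 11.0 and Cytoscape, and their expression was verified in whole-blood samples from dataset GSE72809 (Health group, 52 cases; Defined sepsis group, 52 cases). The relationship between target gene expression profiles and factors such as infant sex, age in months (gestational age), birth weight, and antibiotic exposure was also examined. [Results] Analysis of GSE80496 and GSE72829 yielded 932 and 319 genes, respectively; intersecting these with the WGCNA hub modules gave 10 hub genes associated with bacterial sepsis (MMP9, ITGAM, CSTD, GAPDH, PGLYRP1, FOLR3, OSCAR, TLR5, IL1RN, and TIMP1). GSEA identified key pathways (amino sugar/ribose metabolism, PPAR signaling pathway, glycan biosynthesis, regulation of autophagy, complement and coagulation cascades, nicotinate and nicotinamide metabolism, biosynthesis of unsaturated fatty acids, and the Alzheimer's disease pathway) and biological processes (steroid hormone secretion, activation of adenylate cyclase, extracellular matrix degradation, and metal ion transport). [Conclusion] Using GEO2R and GSEA combined with WGCNA, this study identified 2 hub modules, 10 hub genes, and several key signaling pathways and biological processes associated with bacterial sepsis, providing a theoretical basis for further study of its pathogenesis.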

4.
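To perform an integrated analysis of osteoporosis microarray datasets and identify hub genes associated with osteoporosis in peripheral blood cells, expression profiling microarray datasets related to osteoporosis were retrieved from the GEO and ArrayExpress databases. The included datasets were integrated using the GWGS (genome-wide global significance) method to screen for differentially expressed genes (DEGs). The DEGs were then functionally annotated by GO (Gene Ontology) enrichment analysis and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis, and a protein-protein interaction (PPI) network was built to screen for osteoporosis-related hub genes. The database search yielded three studies that met the inclusion and exclusion criteria. GWGS integration selected the top 200 ranked DEGs, which were mainly enriched in the GO terms cellular response to lipopolysaccharide, apoptotic process, and inflammatory response; osteoporosis-related enriched KEGG pathways included osteoclast differentiation. PPI analysis further identified 10 hub genes associated with osteoporosis, 9 of which have already been reported to be involved in its development and progression, whereas ELANE has not previously been reported in connection with osteoporosis. ELANE is highly expressed in human bone marrow and in mouse bone marrow and bone tissue, so it may well be linked to osteoporosis. These results should help further understanding of the molecular pathogenesis of osteoporosis.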

5.
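This study used public microarray databases to screen for prognostic genes in breast cancer and to predict and explore their possible mechanisms and clinical value in breast cancer progression. First, we screened for overlapping differentially expressed genes between the Gene Expression Omnibus (GEO) dataset GSE22820 and the breast cancer data of The Cancer Genome Atlas (TCGA), using R to analyze genes differentially expressed between breast cancer tissue and adjacent normal tissue. Second, a protein-protein interaction network was built with the STRING database and Cytoscape, and hub genes and the top three modules were identified. Further functional analyses followed, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and gene set enrichment analysis (GSEA), to study the roles of these genes and their potential mechanisms. Finally, Kaplan-Meier analysis and Cox proportional hazards analysis were performed to clarify the diagnostic and prognostic value of these genes. The expression levels of 15 genes were associated with survival; patients with high expression had shorter overall survival than those with low expression (P<0.05). Cox proportional hazards analysis showed that UBE2T, ERCC6L, and RAD51 were independent prognostic factors for survival (P<0.05). GSEA showed that cell cycle, basal transcription factors, and oocyte meiosis were significantly enriched for UBE2T, ERCC6L, and RAD51. We conclude that high expression of these three gene markers is an adverse prognostic factor in breast cancer and that they may serve as effective biomarkers for predicting metastasis and prognosis in breast cancer patients.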

6.
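《蛇志》 (Journal of Snake), 2020, (1)
Objective: To identify differentially expressed genes in patients with ankylosing spondylitis (AS) and, based on them, explore the biological processes and signaling pathways that may be involved in AS pathogenesis. Methods: The Gene Expression Omnibus (GEO) database was searched for AS-related gene expression datasets. Differentially expressed genes between the AS group and normal controls were analyzed with GEO2R, the GEO online analysis tool; Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses were performed with the ClueGO plugin for Cytoscape; interactions among the proteins encoded by the differentially expressed genes were analyzed with the String protein-protein interaction (PPI) database; and Cytoscape was used to draw the protein interaction network and to screen for key genes in the signaling pathways. Results: The whole-blood expression dataset GSE25101 from AS patients was selected, and 72 differentially expressed genes were obtained. Their molecular function mainly involved the high mobility group box 1 (HMGB1) transduction mechanism; biological processes were mainly enriched in macrophage migration, myeloid cell apoptotic process, mitochondrial respiratory chain complex assembly, ATP synthesis coupled electron transport, and mitochondrial ATP synthesis coupled electron transport; cellular components were mainly enriched in respiratory chain complex and mitochondrial respirasome. Signaling pathways were enriched in oxidative phosphorylation and Parkinson's disease-related pathways. After screening the PPI network with the cytoHubba plugin, ATP5J, NDUFS4, UQCRB, UQCRH, NDUFB3, COX7B, LSM3, ATP5EP2, ENY2, and PSMA4 were identified as core genes. Conclusion: Bioinformatics methods were used to predict potential mechanisms of AS and identified 10 potentially important AS-related molecules; oxidative phosphorylation may play an important role in AS pathogenesis.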

7.
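Yang Yanxia, Jin Lian, Wang Xin, Zhang Jie, Liu Xiaoping. 《生命科学研究》 (Life Science Research), 2020, 24(2): 127-135, 159
To explore the mechanisms underlying the development and progression of non-small cell lung cancer (NSCLC) at the gene level, screen for genes related to its diagnosis and prognosis, and provide a bioinformatics basis for further study of its molecular mechanisms, datasets from the GEO and TCGA databases were combined and analyzed with bioinformatics methods. Differentially expressed genes (DEGs) between NSCLC tissue and normal lung tissue were screened, and the intersecting DEGs were subjected to gene set enrichment analysis (GSEA), Gene Ontology (GO) analysis, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis, protein-protein interaction (PPI) analysis, ROC curve analysis of diagnostic performance, and LASSO survival analysis. In total, 240 DEGs were identified, mainly involved in biological processes such as nuclear division and chromosome segregation. GSEA showed that the enriched pathways mainly involved DNA repair and the cell cycle. Twenty hub genes were selected from the PPI network. ROC analysis showed that UBE2C (AUC=0.939), TOP2A (AUC=0.927), RRM2 (AUC=0.927), CCNB1 (AUC=0.928), MKI67 (AUC=0.930), AURKA (AUC=0.931), and MELK (AUC=0.950) had relatively high diagnostic value, while LASSO Cox regression showed that IL6, KIAA0101, MKI67, TPX2, AURKA, CDKN3, and CDCA5 were strongly associated with the prognosis of NSCLC patients. These results suggest that ZWINT, KIF2C, MELK, and CDCA5 may play important roles in NSCLC and offer new ideas for elucidating its molecular mechanisms.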
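The AUC values quoted for single genes can be read as the probability that a randomly chosen tumor sample shows higher expression than a randomly chosen normal sample. A minimal sketch of that computation follows; the function name and toy inputs are assumptions for illustration and are not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: expression values (or any diagnostic score), one per sample
    labels: 1 for the positive class (e.g. tumor), 0 for the negative class
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive sample ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is fine for microarray cohorts; a rank-based formulation would be preferable for large datasets.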

8.
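Liang Shuang, Fan Kui, Zhang Yan, Xie Yangmei. 《生物信息学》 (Chinese Journal of Bioinformatics), 2020, 18(3): 163-168
To find blood-specific markers for diagnosing and distinguishing IgA nephropathy (IgAN) and membranous nephropathy (MN), transcriptome expression profile datasets of peripheral blood mononuclear cells (PBMCs) from IgAN and MN patients in public databases were used to identify specific biomarkers, as a simple and reliable supplementary basis for diagnosis and differentiation. Microarray datasets for an IgAN patient group (n=15) and an MN patient group (n=8) were downloaded from the Gene Expression Omnibus (GEO), and the top 250 differentially expressed genes (DEGs) were selected. Key genes and pathways were screened, and Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and protein-protein interaction (PPI) analysis were performed to characterize the DEGs further. In total, 75 significant DEGs were found, 73 up-regulated and 2 down-regulated. Enriched GO biological processes (BP) mainly included protein transport, endosome to lysosome transport, and chemokine-mediated signaling pathway. Significantly enriched KEGG pathways included those related to Endocytosis and Hepatitis B. PPI analysis identified key genes including EPS15, STAT4, CCL2, SUN2, SEC24C, SEC31A, GOLGB1, F2R, RAB12, and PTK2B. The core DEGs identified provide a simple and reliable supplementary basis for the diagnosis and differentiation of IgAN and MN, and may even suggest new therapeutic targets.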

9.
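Embryo-derived mesoderm cells can differentiate into many cell types, including cardiovascular, blood, and muscle tissue, and in vitro models of human embryonic stem cell differentiation into mesoderm offer an important means of studying the molecular regulation of mesoderm and its derived cell lineages. miRNAs regulate gene expression and take part in mesoderm differentiation through multiple signaling pathways, but despite existing studies the regulatory mechanisms are not fully understood, particularly the network-level regulation arising from expression changes in genes and non-coding RNAs and their interactions. Using bioinformatics analysis, this study constructed a potential miRNA-mRNA regulatory network that participates in the differentiation of human embryonic stem cells into mesoderm by modulating multiple signaling pathways, in order to clarify the differentiation mechanism more fully. Microarray and next-generation sequencing (RNA-seq) were used to detect miRNAs and genes differentially expressed during induced differentiation of human embryonic stem cells into mesoderm cells; target genes of the differentially expressed miRNAs were predicted bioinformatically and intersected with the differentially expressed genes to obtain the genes of interest. GSEA, GO annotation, and KEGG enrichment analyses were performed on the differentially expressed genes and the genes of interest. Finally, a miRNA-mRNA regulatory network was constructed, key genes were selected, and their expression was measured. In total, 287 differentially expressed miRNAs and 739 differentially expressed genes were identified; 13,064 target genes were predicted for the differentially expressed miRNAs, and intersecting these with the 739 differentially expressed genes gave 401 genes of interest. GSEA and KEGG enrichment analyses identified multiple signaling pathways involved in mesoderm differentiation, chiefly the Wnt/β-catenin, TGF-β, and Hippo pathways. In the miRNA-mRNA regulatory network, 100 miRNAs targeted 11 genes in the Wnt/β-catenin pathway, 59 miRNAs targeted 7 genes in the TGF-β pathway, and 106 miRNAs targeted 10 genes in the Hippo pathway. Expression of key genes in the three pathways was verified by RT-qPCR. The study thus shows that the Wnt/β-catenin, TGF-β, and Hippo signaling pathways play important regulatory roles in mesoderm differentiation, possibly forming a complex regulatory network through miRNA-mRNA interactions that precisely controls the directed differentiation of human embryonic stem cells into mesoderm cells.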

10.
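Pathway enrichment methods based on gene expression variability   Cited by: 1 (self-citations: 0, citations by others: 1)
Current pathway enrichment methods rely mainly on differences in gene expression; few analyze enrichment from the perspective of pathway variability (variance). We observed that, when pathway variability is described with a suitable statistic, the variability of some pathways rises or falls markedly under a disease phenotype. This study therefore hypothesizes that the degree of pathway variability differs between phenotypes. We designed 14 statistics and tests describing pathway variability to detect pathways whose variability differs between phenotypes (the enriched pathways), compared the enrichment results with results from a literature search, and analyzed how different microarray preprocessing methods affect the data and the results. The results show that, of the five preprocessing methods, Robust Multi-array Average (RMA) performed best; that pathway variability does differ between phenotypes; and that, judged against the pathways found in the literature, among the 14 variability-based enrichment methods a permutation test using the variance of the Euclidean distances of the genes in a pathway as the statistic (method 11) effectively identified significant pathways, with enrichment results better than those of gene set enrichment analysis (GSEA). In summary, a pathway enrichment strategy based on pathway variability is feasible; it offers theoretical guidance for pathway enrichment analysis and a new perspective for research on human disease.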
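A variability-based test of this kind can be sketched as a label permutation test on a variance statistic. The example below is a simplified stand-in, not the paper's method 11: the statistic (absolute difference between phenotypes of the mean per-gene variance) and all function names are assumptions chosen for illustration.

```python
import random
from statistics import mean, pvariance

def pathway_variability(expr, sample_idx):
    """Mean, across pathway genes, of each gene's variance over the given samples.

    expr: list of rows, one row of expression values per pathway gene
    """
    return mean(pvariance([row[i] for i in sample_idx]) for row in expr)

def variability_perm_test(expr, labels, n_perm=1000, seed=0):
    """Return (observed statistic, permutation p-value) for a variance difference."""
    rng = random.Random(seed)

    def stat(lab):
        group_a = [i for i, l in enumerate(lab) if l == 1]
        group_b = [i for i, l in enumerate(lab) if l == 0]
        return abs(pathway_variability(expr, group_a)
                   - pathway_variability(expr, group_b))

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # group sizes are preserved
        if stat(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because the statistic ignores mean shifts, a pathway can be flagged here even when none of its genes is differentially expressed in the usual sense.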

11.
12.
13.
We developed PathAct, a novel pathway analysis method for investigating the biological and clinical implications of gene expression profiles. Its advantage over conventional pathway analysis methods is that it quantitatively estimates pathway activity levels for individual patients in the form of a pathway-by-sample matrix. This matrix can be used for further analyses such as hierarchical clustering. To evaluate the feasibility of PathAct, we compared it with frequently used gene enrichment analysis methods on two public microarray datasets. Dataset #1 was from breast cancer patients; we investigated pathways associated with triple-negative breast cancer using PathAct and compared them with those obtained by gene set enrichment analysis (GSEA). Dataset #2 was another breast cancer dataset with disease-free survival (DFS) for each patient. The contribution of each pathway to prognosis was investigated by our method as well as by Database for Annotation, Visualization and Integrated Discovery (DAVID) analysis. In dataset #1, four of the six pathways that satisfied p < 0.05 and FDR < 0.30 by GSEA were also among those obtained by PathAct. For dataset #2, two of the four pathways identified by PathAct (“Cell Cycle” and “DNA replication”) were also identified by DAVID analysis. Thus, we confirmed a good degree of agreement between PathAct and conventional methods. Moreover, several further statistical analyses were conducted, such as hierarchical cluster analysis by pathway activity, correlation analysis, and survival analysis between pathways.

14.

Background

Multiple microarray analyses of multiple sclerosis (MS) and its experimental models have been published in the last years.

Objective

Meta-analyses integrate the information from multiple studies and are suggested to be a powerful approach in detecting highly relevant and commonly affected pathways.

Data sources

ArrayExpress, Gene Expression Omnibus and PubMed databases were screened for microarray gene expression profiling studies of MS and its experimental animal models.

Study eligibility criteria

Studies comparing central nervous system (CNS) samples of diseased versus healthy individuals with n > 1 per group and publicly available raw data were selected.

Material and Methods

Included conditions for re-analysis of differentially expressed genes (DEGs) were MS, myelin oligodendrocyte glycoprotein-induced experimental autoimmune encephalomyelitis (EAE) in rats, proteolipid protein-induced EAE in mice, Theiler’s murine encephalomyelitis virus-induced demyelinating disease (TMEV-IDD), and a transgenic tumor necrosis factor-overexpressing mouse model (TNFtg). Since solely a single MS raw data set fulfilled the inclusion criteria, a merged list containing the DEGs from two MS-studies was additionally included. Cross-study analysis was performed employing list comparisons of DEGs and alternatively Gene Set Enrichment Analysis (GSEA).

Results

The intersection of DEGs in MS, EAE, TMEV-IDD, and TNFtg contained 12 genes related to macrophage functions. The intersection of EAE, TMEV-IDD and TNFtg comprised 40 DEGs, functionally related to positive regulation of immune response. In addition, GSEA identified substantially more differentially regulated pathways including coagulation and JAK/STAT-signaling.

Conclusion

A meta-analysis based on a simple comparison of DEGs is over-conservative. In contrast, the more experimental GSEA approach identified both a priori anticipated and promising new candidate pathways.

15.

Background  

Gene set enrichment analysis (GSEA) is a microarray data analysis method that uses predefined gene sets and ranks of genes to identify significant biological changes in microarray data sets. GSEA is especially useful when gene expression changes in a given microarray data set are minimal or moderate.
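The way GSEA combines a ranked gene list with a predefined gene set can be sketched as a running-sum enrichment score: walking down the ranking, the sum steps up at genes in the set and down at genes outside it, and the score is the largest deviation from zero. The code below is a minimal illustration under stated assumptions (the function name and the optional weighting by absolute gene score are chosen here for illustration) and omits the permutation-based significance and normalization steps of the full method.

```python
def enrichment_score(ranked_genes, gene_set, scores=None, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: gene identifiers ordered from most up- to most down-regulated
    gene_set: collection of gene identifiers in the set
    scores: optional per-gene ranking metric aligned with ranked_genes;
            if omitted, every hit has equal weight (Kolmogorov-Smirnov-like)
    """
    members = set(gene_set)
    is_hit = [g in members for g in ranked_genes]
    n_hit = sum(is_hit)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    if scores is None:
        weights = [1.0] * len(ranked_genes)
    else:
        weights = [abs(s) ** weight for s in scores]
    hit_total = sum(w for w, h in zip(weights, is_hit) if h)
    if hit_total == 0:
        raise ValueError("all hit weights are zero")
    running, best = 0.0, 0.0
    for w, h in zip(weights, is_hit):
        running += w / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running               # keep the signed maximum deviation
    return best
```

A set concentrated at the top of the ranking gives a score near +1, one concentrated at the bottom gives a score near -1, and a set scattered evenly stays close to 0.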

16.
In the past, microarray studies have been criticized due to noise and the limited overlap between gene signatures. Prior biological knowledge should therefore be incorporated as side information in models based on gene expression data to improve the accuracy of diagnosis and prognosis in cancer. As prior knowledge, we investigated interaction and pathway information from the human interactome on different aspects of biological systems. By exploiting the properties of kernel methods, relations between genes with similar functions but active in alternative pathways could be incorporated in a support vector machine classifier based on spectral graph theory. Using 10 microarray data sets, we first reduced the number of data sources relevant for multiple cancer types and outcomes. Three sources on metabolic pathway information (KEGG), protein-protein interactions (OPHID) and miRNA-gene targeting (microRNA.org) outperformed the other sources with regard to the considered class of models. Both fixed and adaptive approaches were subsequently considered to combine the three corresponding classifiers. Averaging the predictions of these classifiers performed best and was significantly better than the model based on microarray data only. These results were confirmed on 6 validation microarray sets, with a significantly improved performance in 4 of them. Integrating interactome data thus improves classification of cancer outcome for the investigated microarray technologies and cancer types. Moreover, this strategy can be incorporated in any kernel method or non-linear version of a non-kernel method.

17.
Evaluation and comparison of gene clustering methods in microarray analysis   Cited by: 4 (self-citations: 0, citations by others: 4)
MOTIVATION: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of the expression of thousands of genes. Gene clustering analysis is useful for discovering groups of correlated genes potentially co-regulated or associated with the disease or conditions under investigation. Many clustering methods including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods. RESULTS: In this paper, six gene clustering methods are evaluated by simulated data from a hierarchical log-normal model with various degrees of perturbation as well as four real datasets. A weighted Rand index is proposed for measuring similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods in the real data is assessed by a predictive accuracy analysis through verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform other clustering methods both in simulated and real data while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight into the complicated gene clustering problem of expression profile and serves as a practical guideline for routine microarray cluster analysis.
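For orientation, the ordinary (unweighted) Rand index measures the fraction of gene pairs on which two clusterings agree, that is, pairs placed together in both or apart in both. The sketch below implements only this plain index; the weighted variant proposed in the paper, which handles scattered genes, is not reproduced here, and the function name is an assumption.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Plain Rand index between two cluster assignments of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    # A pair agrees when it is co-clustered in both results or in neither.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

The index depends only on co-membership, so relabeling the clusters of either result leaves it unchanged.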

18.

Background  

Gene-set analysis evaluates the expression of biological pathways, or a priori defined gene sets, rather than that of individual genes, in association with a binary phenotype, and is of great biologic interest in many DNA microarray studies. Gene Set Enrichment Analysis (GSEA) has been applied widely as a tool for gene-set analyses. We describe here some critical problems with GSEA and propose an alternative method by extending the individual-gene analysis method, Significance Analysis of Microarray (SAM), to gene-set analyses (SAM-GS).

19.

Background  

Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statistics are combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed.

20.
Circumventing the cut-off for enrichment analysis   Cited by: 1 (self-citations: 0, citations by others: 1)
