Similar literature
1.
A common biological pathway reconstruction approach, as implemented by many automatic biological pathway services (such as the KAAS and RAST servers) and in the functional annotation of metagenomic sequences, starts with the identification of protein functions or families (e.g., KO families for the KEGG database and the FIG families for the SEED database) in the query sequences, followed by a direct mapping of the identified protein families onto pathways. Given a predicted patchwork of individual biochemical steps, some metric must be applied in deciding what pathways actually exist in the genome or metagenome represented by the sequences. Commonly, and straightforwardly, a complete biological pathway can be identified in a dataset if at least one of the steps associated with the pathway is found. We report, however, that this naïve mapping approach leads to an inflated estimate of biological pathways, and thus overestimates the functional diversity of the sample from which the DNA sequences are derived. We developed a parsimony approach, called MinPath (Minimal set of Pathways), for biological pathway reconstructions using protein family predictions, which yields a more conservative, yet more faithful, estimation of the biological pathways for a query dataset. MinPath identified far fewer pathways for the genomes collected in the KEGG database than the naïve mapping approach, eliminating some obviously spurious pathway annotations. Results from applying MinPath to several metagenomes indicate that the common methods used for metagenome annotation may significantly overestimate the biological pathways encoded by microbial communities.
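The difference between naive mapping and a parsimony reconstruction can be sketched in a few lines. This is only an illustration under stated assumptions, not the MinPath implementation: a greedy set-cover heuristic stands in for whatever optimization MinPath actually performs, and the family and pathway names (`K1`, `glycolysis`, `spurious_path`) are made up.

```python
def naive_pathways(families, pathway_map):
    """Report every pathway that has at least one observed family."""
    return {p for p, members in pathway_map.items() if members & families}


def parsimonious_pathways(families, pathway_map):
    """Greedy set cover: choose few pathways that explain all mappable families."""
    # Only families that occur in some pathway can ever be explained.
    uncovered = {f for f in families
                 if any(f in members for members in pathway_map.values())}
    chosen = set()
    while uncovered:
        # Take the pathway explaining the most still-unexplained families;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(pathway_map),
                   key=lambda p: len(pathway_map[p] & uncovered))
        chosen.add(best)
        uncovered -= pathway_map[best]
    return chosen


# Hypothetical toy data: family K3 is shared by two pathways.
pathway_map = {
    "glycolysis": {"K1", "K2", "K3"},
    "spurious_path": {"K3", "K9"},
}
observed = {"K1", "K2", "K3"}

print(sorted(naive_pathways(observed, pathway_map)))
print(sorted(parsimonious_pathways(observed, pathway_map)))
```

In this toy case the naive mapping reports both pathways because K3 touches each of them, while the parsimonious variant keeps only the pathway needed to explain all observed families.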

2.
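Wang Weike  Song Jiling  Yan Jing  Lu Na  Yuan Weidong  Zhou Zufa 《菌物学报》 (Mycosystema) 2020,39(10):1874-1885
Transcriptomes of the mycelium and the fruiting body, two growth stages of mulberry Sanghuang (Sanghuangporus sanghuang), were analyzed to lay a foundation for studying the mechanisms of fruiting body growth and development. Whole-transcriptome sequencing of the mycelium and fruiting body of strain S23 was performed with Illumina sequencing technology. Reads were aligned to the reference sequence; the alignment rate was 82.89% for the mycelium sample and 83% for the fruiting body sample. Differential expression analysis showed that, compared with the mycelium, 2,898 genes were significantly up-regulated and 1,965 genes were significantly down-regulated in the fruiting body. BLAST searches against the nr database showed that genes up-regulated at the fruiting body stage were mainly related to various oxidase activities and hydrophobins, whereas down-regulated genes were mainly related to carbohydrate and amino acid binding and transport. Gene Ontology (GO) enrichment analysis showed that differentially expressed genes related to transmembrane transport were markedly enriched between the two stages. Pathway enrichment analysis showed marked enrichment of differentially expressed genes in steroid biosynthesis, arginine biosynthesis, and the mitogen-activated protein kinase (MAPK) signaling pathway, among others.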

3.
Prostate cancer is one of the most common male malignant neoplasms; however, its causes are not completely understood. A few recent studies have used gene expression profiling of prostate cancer to identify differentially expressed genes and possible relevant pathways. However, few studies have examined the genetic mechanisms of prostate cancer at the pathway level to search for such pathways. We used gene set enrichment analysis and a meta-analysis of six independent studies after standardized microarray preprocessing, which increased concordance between these gene datasets. Based on gene set enrichment analysis, there were 12 down- and 25 up-regulated mixing pathways in more than two tissue datasets, while there were two down- and two up-regulated mixing pathways in three cell datasets. Based on the meta-analysis, there were 46 and nine common pathways in the tissue and cell datasets, respectively. Three up- and 10 down-regulated crossing pathways were detected with combined gene set enrichment analysis and meta-analysis. We found that genes with small changes are difficult to detect by classic univariate statistics; they can more easily be identified by pathway analysis. After standardized microarray preprocessing, we applied gene set enrichment analysis and a meta-analysis to increase the concordance in identifying biological mechanisms involved in prostate cancer. The gene pathways that we identified could provide insight concerning the development of prostate cancer.

4.
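The pathogenesis of acute myelogenous leukemia (AML) was explored by bioinformatic analysis of data from the Gene Expression Omnibus (GEO) database. The AML-related microarray datasets GSE142698, GSE142699 and GSE96535 were retrieved from GEO. Differentially expressed mRNAs, miRNAs and lncRNAs were obtained with GEO2R. GO enrichment analysis and KEGG pathway analysis of the differential mRNAs were performed with the online bioinformatics tool DAVID. The miRWalk database was used to predict the mRNAs targeted by AML-related miRNAs, and the Spongescan database was used to predict the lncRNAs targeted by AML-related miRNAs, in order to construct a lncRNA-miRNA-mRNA competing endogenous RNA (ceRNA) regulatory network. In total, 29 significantly differential mRNAs, 70 significantly differential miRNAs and 20,005 significantly differential lncRNAs were identified. GO enrichment and KEGG pathway analyses showed that the differentially expressed genes were mainly involved in biological processes such as protein phosphorylation, cell division, negative regulation of cell proliferation, positive regulation of gene expression, and regulation of cyclin-dependent serine/threonine kinase activity, and in signaling pathways such as cell cycle, cellular senescence, pathways in cancer, and the PI3K-Akt pathway. Intersecting the target mRNAs predicted by miRWalk with the differential mRNAs, and the target lncRNAs predicted by Spongescan with the differential lncRNAs, identified 25 mRNAs and 6 lncRNAs for constructing the AML-related ceRNA regulatory network. The results suggest that lncRNAs may act as key ceRNAs and participate in the onset and progression of AML by regulating miRNAs and their target genes, providing a new basis for molecular biological research on the diagnosis and treatment of AML.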

5.
We used established databases in standard ways to systematically characterize gene ontologies, pathways and functional linkages in the large set of genes now associated with autism spectrum disorders (ASDs). These conditions are particularly challenging: they lack clear pathognomonic biological markers and involve great heterogeneity across multiple levels (genes, systemic biological and brain characteristics, and nuances of behavioral manifestations), and yet everyone with this diagnosis meets the same defining behavioral criteria. Using the human gene list from the Simons Foundation Autism Research Initiative (SFARI), we performed gene set enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database, and then derived a pathway network from pathway-pathway functional interactions, again in reference to KEGG. Through identifying the GO (Gene Ontology) groups in which SFARI genes were enriched, mapping the coherence between pathways and GO groups, and ranking the relative strengths of representation of pathway network components, we 1) identified 10 disease-associated and 30 function-associated pathways, 2) revealed calcium signaling pathway and neuroactive ligand-receptor interaction as the most enriched, statistically significant pathways from the enrichment analysis, 3) showed calcium signaling pathway and MAPK signaling pathway to be interactive hubs with other pathways and also to be involved with pervasively present biological processes, 4) found convergent indications that the process “calcium-PKC (protein kinase C)-Ras-Raf-MAPK/ERK” is likely a major contributor to ASD pathophysiology, and 5) noted that perturbations associated with KEGG’s category of environmental information processing were common.
These findings support the idea that ASD-associated genes may contribute not only to core features of ASD themselves but also to vulnerability to other chronic and systemic problems potentially including cancer, metabolic conditions and heart diseases. ASDs may thus arise, or emerge, from underlying vulnerabilities related to pleiotropic genes associated with pervasively important molecular mechanisms, vulnerability to environmental input and multiple systemic co-morbidities.

6.
There is a pressing need today to mechanistically interpret sets of genomic variants associated with diseases. Here we present a tool called ‘VarSAn’ that uses a network analysis algorithm to identify pathways relevant to a given set of variants. VarSAn analyzes a configurable network whose nodes represent variants, genes and pathways, using a Random Walk with Restarts algorithm to rank pathways for relevance to the given variants, and reports P-values for pathway relevance. It treats non-coding and coding variants differently, properly accounts for the number of pathways impacted by each variant and identifies relevant pathways even if many variants do not directly impact genes of the pathway. We use VarSAn to identify pathways relevant to variants related to cancer and several other diseases, as well as drug response variation. We find VarSAn's pathway ranking to be complementary to the standard approach of enrichment tests on genes related to the query set. We adopt a novel benchmarking strategy to quantify its advantage over this baseline approach. Finally, we use VarSAn to discover key pathways, including the VEGFA-VEGFR2 pathway, related to de novo variants in patients of Hypoplastic Left Heart Syndrome, a rare and severe congenital heart defect.
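A Random Walk with Restarts on a variant-gene-pathway network can be sketched as below. This is a generic textbook version under assumptions, not VarSAn's code: the network, the restart probability of 0.5, and the node names are hypothetical, and the P-value computation mentioned in the abstract is not reproduced.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, iterations=100):
    """Approximate stationary visit probabilities of a walk restarting at seeds."""
    nodes = sorted(adjacency)
    # The walker jumps back to a uniformly chosen seed with probability `restart`.
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(restart_vec)
    for _ in range(iterations):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                continue
            # Otherwise the walker moves to a uniformly chosen neighbour.
            share = (1.0 - restart) * prob[node] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        prob = nxt
    return prob


# Hypothetical undirected network: variants v1, v2; genes g1, g2; pathways pA, pB.
adjacency = {
    "v1": ["g1"], "v2": ["g1"],
    "g1": ["v1", "v2", "pA"], "pA": ["g1"],
    "g2": ["pB"], "pB": ["g2"],
}
scores = random_walk_with_restart(adjacency, seeds={"v1", "v2"})
ranking = sorted(["pA", "pB"], key=lambda p: scores[p], reverse=True)
print(ranking)
```

Pathway pA is reachable from the query variants through gene g1 and therefore outranks pB, which sits in a part of the network the walk never reaches.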

7.

Background

High-throughput technologies like functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well-established technique to test for the statistically significant over-representation of particular pathways. A shortcoming of this method, however, is that most genes investigated in the experiments have very sparse functional or pathway annotation and therefore cannot be the target of such an analysis. The approach presented here aims to assign lists of genes with limited annotation to previously described functional gene collections or pathways. This works by comparing InterPro domain signatures of the candidate gene lists with domain signatures of gene sets derived from known classifications, e.g. KEGG pathways.

Results

In order to validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we create test gene lists by randomly selecting pathway genes, removing these genes from the known pathways and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that we can recover pathway memberships based on the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example.

Conclusion

Results based on simulation and data analysis show that domain-based pathway enrichment analysis is a very sensitive method to test for enrichment of pathways in sparsely annotated lists of genes. An R-based software package, domainsignatures, to routinely perform this analysis on the results of high-throughput screening, is available via Bioconductor.
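The over-representation test that underlies enrichment analyses of this kind can be written as a one-sided hypergeometric tail probability. The sketch below is a generic illustration and not the domainsignatures algorithm; treating the counted items as genes or as domains, and the toy numbers, are assumptions.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, list_size, overlap):
    """P(X >= overlap) when list_size items are drawn without replacement
    from a universe in which set_size items belong to the pathway."""
    upper = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(comb(set_size, k) * comb(universe_size - set_size, list_size - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Hypothetical numbers: universe of 20 items, pathway of 5, candidate list of 5,
# and all 5 candidates fall in the pathway.
p = enrichment_p_value(universe_size=20, set_size=5, list_size=5, overlap=5)
print(p)
```

A small p indicates that the overlap between the candidate list and the pathway is larger than expected by chance; with many pathways tested, a multiple-testing correction would normally follow.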

8.
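Objective: To identify possible risk genes for myasthenia gravis (MG). Methods: MG risk genes were collected by manual curation of the PubMed database, and their identifiers, used to denote each gene or its corresponding protein, were obtained from the Gene database. KEGG pathway enrichment analysis of the MG risk genes was performed with the gene function analysis software DAVID (http://david.abcc.ncifcrf.gov/) to identify MG risk pathways, followed by an association analysis between every pair of pathways. The Gene Ontology module of DAVID was used for functional annotation of the MG risk genes, with P < 0.01 taken as the threshold for a significant annotation. Results: (1) This study identified 97 MG risk genes, and KEGG enrichment analysis selected 44 pathways significantly associated with MG, mainly including pathways related to several autoimmune diseases, signal transduction, tumors, and antigen processing and presentation. (2) Every pair among these 44 risk pathways was correlated (P < 0.01). Conclusion: This study identified 44 MG risk pathways and 8 MG risk genes: NF-kB, TNFR, MEK, AP-1, Raf, MEK1/2, MSK1 and TAPBP. Of these, MEK appears in several risk pathways at once and is therefore considered to carry a higher risk.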

9.
The power of genome-wide SNP association studies is limited, among other factors, by the large number of false positive test results. To provide a remedy, we combined SNP association analysis with the pathway-driven gene set enrichment analysis (GSEA), recently developed to facilitate handling of genome-wide gene expression data. The resulting GSEA-SNP method rests on the assumption that SNPs underlying a disease phenotype are enriched in genes constituting a signaling pathway or those with a common regulation. Besides improving power for association mapping, GSEA-SNP may facilitate the identification of disease-associated SNPs and pathways, as well as the understanding of the underlying biological mechanisms. GSEA-SNP may also help to identify markers with weak effects, undetectable in association studies without pathway consideration. The program is freely available and can be downloaded from our website.
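The running-sum statistic at the core of GSEA-style methods can be sketched as follows. This is the unweighted, Kolmogorov-Smirnov-like form and an assumption about the general technique, not the GSEA-SNP program itself; how SNP association statistics are turned into the ranked list is left out, and the gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of a running sum walked down a ranked list."""
    hits = sum(1 for g in ranked_genes if g in gene_set)
    misses = len(ranked_genes) - hits
    if hits == 0 or misses == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on set members, down otherwise; both directions sum to zero.
        running += 1.0 / hits if gene in gene_set else -1.0 / misses
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, strongest association first.
ranked = ["A", "B", "C", "D", "E", "F"]
top_heavy = enrichment_score(ranked, {"A", "B"})
bottom_heavy = enrichment_score(ranked, {"E", "F"})
print(top_heavy, bottom_heavy)
```

A set concentrated at the top of the ranking yields a large positive score and one concentrated at the bottom a large negative score; significance is usually assessed by recomputing the score on permuted data.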

10.
11.
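Liang Shuang  Fan Kui  Zhang Yan  Xie Yangmei 《生物信息学》 (Chinese Journal of Bioinformatics) 2020,18(3):163-168
To find blood-specific markers for diagnosing and distinguishing IgA nephropathy (IgAN) and membranous nephropathy (MN), transcriptome expression datasets of peripheral blood mononuclear cells (PBMCs) from IgAN and MN patients in public databases were used to identify specific biomarkers that could supplement diagnosis and differentiation with a simple and reliable basis. Microarray datasets for an IgAN patient group (n=15) and an MN patient group (n=8) were downloaded from the Gene Expression Omnibus (GEO), and the top 250 differentially expressed genes (DEGs) were selected. Key genes and pathways were screened, and the DEGs were further characterized by Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and protein-protein interaction (PPI) analysis. In total, 75 significant DEGs were found, of which 73 were up-regulated and 2 were down-regulated. The enriched GO biological processes (BP) mainly included protein transport, endolysosome-to-lysosome transport, and chemokine-mediated signaling. KEGG pathways significantly enriched for DEGs included the Endocytosis and Hepatitis B pathways. PPI analysis identified key genes including EPS15, STAT4, CCL2, SUN2, SEC24C, SEC31A, GOLGB1, F2R, RAB12 and PTK2B. Core DEGs were successfully identified, supplementing the diagnosis and differentiation of IgAN and MN with a simple and reliable basis and potentially offering new therapeutic targets.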

12.
13.
Ma X  Tarone AM  Li W 《PloS one》2008,3(4):e1922

Background

Synthetic lethal genetic interaction analysis has been successfully applied to predicting the functions of genes and their pathway identities. In the context of synthetic lethal interaction data alone, the global similarity of synthetic lethal interaction patterns between two genes is used to predict gene function. With physical interaction data, such as protein-protein interactions, the enrichment of physical interactions within subsets of genes and the enrichment of synthetic lethal interactions between those subsets of genes are used as an indication of compensatory pathways.

Result

In this paper, we propose a method of mapping genetically compensatory pathways from synthetic lethal interactions. Our method is designed to discover pairs of gene-sets in which synthetic lethal interactions are depleted among the genes in an individual set and where such gene-set pairs are connected by many synthetic lethal interactions. By its nature, our method could select compensatory pathway pairs that buffer the deleterious effect of the failure of either one, without the need for physical interaction data. By focusing on compensatory pathway pairs where genes in each individual pathway have a highly homogeneous cellular function, we show that many cellular functions have genetically compensatory properties.

Conclusion

We conclude that synthetic lethal interaction data are a powerful source to map genetically compensatory pathways, especially in systems lacking physical interaction information, and that the cellular function network contains abundant compensatory properties.
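The criterion described above, few synthetic lethal (SL) interactions inside each gene set and many between the two sets, can be illustrated with a simple density difference. The scoring function is a hypothetical stand-in chosen for clarity, not the statistic used in the paper, and the gene names are placeholders.

```python
from itertools import combinations


def sl_density(pairs, sl_pairs):
    """Fraction of the given gene pairs that are synthetic lethal."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(frozenset(p) in sl_pairs for p in pairs) / len(pairs)


def compensatory_score(set_a, set_b, interactions):
    """Between-set SL density minus within-set SL density (hypothetical score)."""
    sl_pairs = {frozenset(p) for p in interactions}
    between = [(a, b) for a in set_a for b in set_b]
    within = (list(combinations(sorted(set_a), 2))
              + list(combinations(sorted(set_b), 2)))
    return sl_density(between, sl_pairs) - sl_density(within, sl_pairs)


# Toy data: every cross pair is synthetic lethal, no within-set pair is.
interactions = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
score = compensatory_score({"a1", "a2"}, {"b1", "b2"}, interactions)
print(score)
```

A score near 1 marks a pair of gene sets that look mutually compensatory; adding within-set SL interactions lowers the score.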

14.
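The target genes of hsa-miR-192-3p and their possible functions were predicted by bioinformatic methods. First, the nucleotide sequence of hsa-miR-192-3p and its conservation across species were analyzed with the miRBase online database, and its expression abundance in different tissues and organs was examined with the miRGator v3.0 online database. Second, the target genes of hsa-miR-192-3p were predicted with the TargetScan and miRanda online databases. Finally, the intersection of the target genes predicted by the two databases was submitted to the DAVID online database for functional enrichment analysis and signal transduction pathway enrichment analysis. The results showed that hsa-miR-192-3p is highly conserved in human, house mouse, rhesus macaque and other organisms, and that it is abundantly expressed in the gastrointestinal tract, kidney, hepatobiliary system, stem cells, nose, spleen and thymus. The intersection of the targets predicted by the two tools contained 190 genes. Functional enrichment analysis found that these target genes are enriched in 15 cellular components (p < 0.05), including cytoplasm, nucleus, plasma membrane and Golgi apparatus; participate in 7 molecular functions (p < 0.05), including protein binding, GTPase activity and zinc ion transmembrane transporter activity; and are involved in 18 biological processes (p < 0.05), including metal ion transport, positive regulation of transcription from the RNA polymerase II promoter, regulation of gene expression, calcium ion transmembrane transport and embryonic development. The predicted target gene set was significantly enriched in the pathways in cancer and the prolactin signaling pathway (p < 0.05). In conclusion, the predicted target genes of hsa-miR-192-3p are enriched in multiple biological processes and are closely related to tumors; these bioinformatic predictions lay a theoretical foundation for future research and indicate directions for subsequent experimental validation.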

15.
This study aimed to identify key microRNAs (miRNAs) associated with hepatocellular carcinoma (HCC) using small RNA-seq data. Small RNA-seq data for two HCC samples and two normal samples were downloaded from NCBI Gene Expression Omnibus. MiRNAs were identified through database search. Differentially expressed miRNAs were screened out with a t-test and their target genes were retrieved. Functional enrichment analysis was performed to uncover their biological functions. Regulatory networks and core metabolic networks were also constructed to present the global patterns. In addition, new miRNAs and their target genes were predicted. A total of 59 differentially expressed miRNAs were obtained, 12 up-regulated and 47 down-regulated. A total of 3,306 target genes were retrieved for eight miRNAs. Pathway enrichment analysis for the target genes showed that “pathways in cancer” and “MAPK signaling pathway” were significantly over-represented. Functional enrichment analysis found that “biological regulation” and “macromolecule modification” were significantly related to the target genes. Two regulatory networks were constructed for up- and down-regulated differentially expressed miRNAs with information from the Ingenuity Pathway Analysis database. Two metabolic networks were also established based upon “pathways in cancer” and “MAPK signaling pathway”, consisting of miRNAs, target genes, compounds and other genes. Moreover, a number of new miRNAs and relevant target genes were predicted. Our study discloses a number of miRNAs as well as genes which may be involved in the development of HCC, and these findings are beneficial in guiding future research.

16.
17.
Yang JO  Charny P  Lee B  Kim S  Bhak J  Woo HG 《Bioinformation》2007,2(5):194-196
GS2PATH is a Web-based pipeline tool to permit functional enrichment of a given gene set from prior knowledge databases, including gene ontology (GO) database and biological pathway databases. The tool also provides an estimation of gene set enrichment, in GO terms, from the databases of the KEGG and BioCarta pathways, which may allow users to compute and compare functional over-representations. This is especially useful in the perspective of biological pathways such as metabolic, signal transduction, genetic information processing, environmental information processing, cellular process, disease, and drug development. It provides relevant images of biochemical pathways with highlighting of the gene set by customized colors, which can directly assist in the visualization of functional alteration.

Availability


18.
MetaCyc (http://metacyc.org) contains experimentally determined biochemical pathways to be used as a reference database for metabolism. In conjunction with the Pathway Tools software, MetaCyc can be used to computationally predict the metabolic pathway complement of an annotated genome. To increase the breadth of pathways and enzymes, more than 60 plant-specific pathways have been added or updated in MetaCyc recently. In contrast to MetaCyc, which contains metabolic data for a wide range of organisms, AraCyc is a species-specific database containing only enzymes and pathways found in the model plant Arabidopsis (Arabidopsis thaliana). AraCyc (http://arabidopsis.org/tools/aracyc/) was the first computationally predicted plant metabolism database derived from MetaCyc. Since its initial computational build, AraCyc has been under continued curation to enhance data quality and to increase breadth of pathway coverage. Twenty-eight pathways have been manually curated from the literature recently. Pathway predictions in AraCyc have also been recently updated with the latest functional annotations of Arabidopsis genes that use controlled vocabulary and literature evidence. AraCyc currently features 1,418 unique genes mapped onto 204 pathways with 1,156 literature citations. The Omics Viewer, a user data visualization and analysis tool, allows a list of genes, enzymes, or metabolites with experimental values to be painted on a diagram of the full pathway map of AraCyc. Other recent enhancements to both MetaCyc and AraCyc include implementation of an evidence ontology, which has been used to provide information on data quality, expansion of the secondary metabolism node of the pathway ontology to accommodate curation of secondary metabolic pathways, and enhancement of the cellular component ontology for storing and displaying enzyme and pathway locations within subcellular compartments.

19.
MOTIVATION: We present a system, QPACA (Quantitative Pathway Analysis in Cancer) for analysis of biological data in the context of pathways. QPACA supports data visualization and both fine- and coarse-grained specifications, but, more importantly, addresses the problems of pathway recognition and pathway augmentation. RESULTS: Given a set of genes hypothesized to be part of a pathway or a coordinated process, QPACA is able to reliably distinguish true pathways from non-pathways using microarray expression data. Relying on the observation that only some of the experiments within a dataset are relevant to a specific biochemical pathway, QPACA automates selection of this subset using an optimization procedure. We present data on all human and yeast pathways found in the KEGG pathway database. In 117 out of 191 cases (61%), QPACA was able to correctly identify these positive cases as bona fide pathways with p-values measured using rigorous permutation analysis. Success in recognizing pathways was dependent on pathway size, with the largest quartile of pathways yielding 83% success. In cross-validation tests of pathway membership prediction, QPACA was able to yield enrichments for predicted pathway genes over random genes at rates of 2-fold or better the majority of the time, with rates of 10-fold or better 10-20% of the time. AVAILABILITY: The software is available for academic research use free of charge by email request. SUPPLEMENTARY INFORMATION: Data used in the paper may be downloaded from http://www.jainlab.org/downloads.html
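Permutation analysis of the kind mentioned above compares the score of a candidate gene set with the scores of random gene sets of the same size. The sketch below shows only that generic idea under assumptions: QPACA's actual scoring function and experiment-selection step are not reproduced, and `toy_score`, the per-gene values, and the gene names are hypothetical.

```python
import random


def permutation_p_value(score, gene_set, all_genes, n_perm=1000, seed=0):
    """Empirical P(random set of equal size scores >= the observed set)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = score(gene_set)
    pool = sorted(all_genes)
    at_least = sum(
        score(set(rng.sample(pool, len(gene_set)))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (at_least + 1) / (n_perm + 1)


# Hypothetical per-gene values; the candidate set holds the three largest.
values = {f"g{i}": float(i) for i in range(20)}


def toy_score(genes):
    return sum(values[g] for g in genes)


p_value = permutation_p_value(toy_score, {"g17", "g18", "g19"}, values)
print(p_value)
```

Because random sets must match the candidate set in size, this kind of test does not by itself favor large pathways; the resolution of the estimate is bounded below by 1 / (n_perm + 1).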

20.