Similar literature
 20 similar documents found (search time: 15 ms)
1.
Cancer, being among the most serious diseases, causes many deaths every year. Many investigators have devoted themselves to designing effective treatments for this disease. Cancer always involves abnormal cell growth with the potential to invade or spread to other parts of the body. In contrast, tumor suppressor genes (TSGs) act as guardians to prevent a disordered cell cycle and genomic instability in normal cells. Studies on TSGs can assist in the design of effective treatments against cancer. In this study, we propose a computational method to discover potential TSGs. Based on the known TSGs, a number of candidate genes were selected by applying the shortest path approach in a weighted graph that was constructed using a protein–protein interaction network. The analysis of the selected genes shows that some of them are new TSGs recently reported in the literature, while others may be novel TSGs.
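The shortest-path selection described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy network, its edge weights, the gene names, and the helper names `shortest_path` and `candidate_genes` are all hypothetical. It assumes that a lower edge weight means a stronger interaction and treats every interior node on a shortest path between two known TSGs as a candidate.

```python
import heapq
from itertools import combinations


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns one shortest path as a node list, or []."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                prev[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))
    if target not in dist:
        return []
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidate_genes(graph, known_tsgs):
    """Collect interior nodes of shortest paths between every pair of known TSGs."""
    known = set(known_tsgs)
    candidates = set()
    for a, b in combinations(sorted(known), 2):
        path = shortest_path(graph, a, b)
        candidates.update(g for g in path[1:-1] if g not in known)
    return candidates


# Hypothetical undirected weighted network (gene names are placeholders).
edges = [
    ("TSG_A", "G1", 0.2), ("G1", "TSG_B", 0.3),
    ("TSG_A", "G2", 0.9), ("G2", "TSG_B", 0.9),
    ("TSG_B", "G3", 0.1),
]
ppi = {}
for u, v, w in edges:
    ppi.setdefault(u, {})[v] = w
    ppi.setdefault(v, {})[u] = w

print(candidate_genes(ppi, ["TSG_A", "TSG_B"]))
```

In this toy network only `G1` lies on the cheapest route between the two seed genes, so it is the only candidate returned. A real analysis would also need a ranking or filtering step, which the abstract does not describe.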

2.
3.
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of “significant genes.” One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask “Which biological process is over-represented in my set of interesting genes or proteins?” we can also ask “Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?” For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases (blood coagulation disorders) that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis and the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

What to Learn in This Chapter

  • Review the commonly used approach of Gene Ontology based enrichment analysis
  • Understand the pitfalls associated with current approaches
  • Understand the national infrastructure available for using alternative ontologies for enrichment analysis
  • Learn about a generalized enrichment analysis workflow and its application using disease ontologies
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.
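The over-representation question posed in this abstract ("which term is over-represented in my gene set?") is commonly answered with an upper-tail hypergeometric test. The sketch below is illustrative only: the function names, the toy background, and the single disease term are assumptions, and the chapter itself may use a different statistic. The term-to-gene mapping can come from GO or from any disease ontology, which is the generalisation the abstract argues for.

```python
from math import comb


def enrichment_p_value(population, annotated, study, study_annotated):
    """Upper-tail hypergeometric probability P(X >= study_annotated).

    population      -- number of genes in the background
    annotated       -- background genes annotated with the term
    study           -- number of genes in the study list
    study_annotated -- study genes annotated with the term
    """
    upper = min(annotated, study)
    total = comb(population, study)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, study - k)
        for k in range(study_annotated, upper + 1)
    )
    return tail / total


def enrich(study_genes, background_genes, term_to_genes):
    """Return {term: p-value} for every term hit by at least one study gene."""
    background = set(background_genes)
    study = set(study_genes) & background
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        hits = study & annotated
        if hits:
            results[term] = enrichment_p_value(
                len(background), len(annotated), len(study), len(hits)
            )
    return results


# Hypothetical data: 20 background genes, one disease term covering 5 of them.
background = [f"g{i}" for i in range(20)]
terms = {"blood coagulation disorder": ["g0", "g1", "g2", "g3", "g4"]}
study = ["g0", "g1", "g2", "g10"]

p_values = enrich(study, background, terms)
print(p_values)
```

The p-values returned here are raw; with many terms tested at once they would need a multiple-testing correction before interpretation.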

4.
Ovarian carcinoma (OC) is one of the most common malignant tumors of the female genital tract. In recent years, the therapeutic outcome of OC has improved significantly through the application of effective chemotherapy regimens. However, the 5-year survival rate is still lower than 30% because of the high rate of relapse, so reliable predictive and prognostic markers of OC need to be identified. Ovarian cancer gene expression data and the corresponding clinical data were downloaded from the Gene Expression Omnibus database. Weighted gene co-expression network analysis (WGCNA) and Cox proportional hazards regression (PHR) were used to screen for long noncoding RNAs (lncRNAs) associated with pathological grade and prognosis. Kaplan-Meier analysis and receiver operating characteristic curve analysis were performed to evaluate the predictive ability of the selected lncRNAs. Gene Ontology (GO) enrichment analysis and Gene Set Enrichment Analysis (GSEA) were used to explore the possible mechanisms by which the selected lncRNAs affect the development of OC. Five reliable lncRNAs (LINC00664, LINC00667, LINC01139, LINC01419, and LOC286437) were identified through a series of bioinformatics methods. In the testing cohorts, the five lncRNAs were robust in predicting the risk of OC recurrence, and multivariate Cox PHR analysis indicated that they are an independent risk factor for OC recurrence. Moreover, GO and GSEA enrichment analyses showed that the five lncRNAs are involved in multiple mechanisms of ovarian cancer occurrence. In summary, these findings indicate that the five lncRNAs can effectively predict the risk of recurrence of ovarian cancer.
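The Kaplan-Meier step mentioned in this abstract estimates the probability of remaining recurrence-free over time while accounting for censored patients. The following is a minimal sketch of the product-limit estimator; the follow-up times and event flags are invented for illustration and are not taken from the study, and the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # recurrences at time t
        removed = 0   # patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for one risk group.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Comparing a high-risk and a low-risk group would mean computing one such curve per group and then applying a test such as the log-rank test, which is not shown here.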

5.
An integral part of functional genomics studies is to assess the enrichment of specific biological terms in lists of genes found to be playing an important role in biological phenomena. Contrasting the observed frequency of annotated terms with those of the background is at the core of overrepresentation analysis (ORA). Gene Ontology (GO) is a means to consistently classify and annotate gene products and has become a mainstay in ORA. Alternatively, Medical Subject Headings (MeSH) offers a comprehensive life science vocabulary including additional categories that are not covered by GO. Although MeSH is applied predominantly in human and model organism research, its full potential in livestock genetics is yet to be explored. In this study, MeSH ORA was evaluated to discern biological properties of identified genes and contrast them with the results obtained from GO enrichment analysis. Three published datasets were employed for this purpose, representing a gene expression study in dairy cattle, the use of SNPs for genome‐wide prediction in swine and the identification of genomic regions targeted by selection in horses. We found that several overrepresented MeSH annotations linked to these gene sets share similar concepts with those of GO terms. Moreover, MeSH yielded unique annotations, which are not directly provided by GO terms, suggesting that MeSH has the potential to refine and enrich the representation of biological knowledge. We demonstrated that MeSH can be regarded as another choice of annotation to draw biological inferences from genes identified via experimental analyses. When used in combination with GO terms, our results indicate that MeSH can enhance our functional interpretations for specific biological conditions or the genetic basis of complex traits in livestock species.

6.
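By comparing transcriptome data from dengue fever patients and healthy controls, this study identified differentially expressed genes, constructed a dysregulated ceRNA network, performed enrichment analysis on the selected key genes, and interpreted their potential biological functions, with the aim of supporting research on diagnostic markers for dengue fever. Peripheral blood microarray data for dengue fever were downloaded from the GEO database, and differentially expressed genes were identified and subjected to enrichment analysis. Using miRNA-mRNA interaction data, dysregulated ceRNA interaction pairs in dengue fever were identified with a hypergeometric test and Pearson correlation. Cytoscape was used to visualize the ceRNA network and to mine modules; the network modules were then subjected to functional enrichment analysis, and their expression patterns were validated on external data. In total, 251 differentially expressed genes were identified and found to be enriched in biological pathways such as the cell cycle. Validation on external data showed that the expression trends of the module genes were broadly consistent with those in the training set, indicating the potential diagnostic value of the module genes in dengue fever. This study may provide ideas for identifying effective molecular markers for disease diagnosis.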
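The ceRNA pair identification described in this abstract combines two checks: whether two RNAs share more miRNAs than expected by chance (hypergeometric test) and whether their expression levels are positively correlated (Pearson correlation). The sketch below illustrates that combination under stated assumptions: the cutoffs `p_cutoff` and `r_cutoff`, the miRNA identifiers, the expression values, and all function names are hypothetical, since the abstract does not give the thresholds or data used.

```python
from math import comb, sqrt


def shared_mirna_p_value(mirnas_a, mirnas_b, n_mirnas):
    """Upper-tail hypergeometric p-value for the number of shared miRNAs."""
    a, b = set(mirnas_a), set(mirnas_b)
    shared = len(a & b)
    total = comb(n_mirnas, len(b))
    tail = sum(
        comb(len(a), k) * comb(n_mirnas - len(a), len(b) - k)
        for k in range(shared, min(len(a), len(b)) + 1)
    )
    return tail / total


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)


def is_cerna_pair(mirnas_a, mirnas_b, expr_a, expr_b, n_mirnas,
                  p_cutoff=0.05, r_cutoff=0.5):
    """Both criteria must hold; the cutoffs are illustrative assumptions."""
    p = shared_mirna_p_value(mirnas_a, mirnas_b, n_mirnas)
    r = pearson(expr_a, expr_b)
    return p < p_cutoff and r > r_cutoff


# Hypothetical inputs: 50 miRNAs in total, two RNAs sharing four of them.
mirnas_a = ["m1", "m2", "m3", "m4", "m5"]
mirnas_b = ["m1", "m2", "m3", "m4", "m9"]
expr_a = [1.0, 2.0, 3.0, 4.0, 5.0]
expr_b = [2.0, 4.0, 5.0, 8.0, 10.0]

print(is_cerna_pair(mirnas_a, mirnas_b, expr_a, expr_b, n_mirnas=50))
```

Pairs that pass both checks would become the edges of the network that is later visualised and clustered into modules.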

7.

Background

Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichments of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and the semantic distances between all parent–child terms are identical, which is not true in a biological sense. In addition, these tools output lists of often redundant or too specific GO terms, which are difficult to interpret in the context of the biological question investigated by the user. Therefore, there is a demand for a robust and reliable method for gene categorization and enrichment analysis.

Results

We have developed Categorizer, a tool that classifies genes into user-defined groups (categories) and calculates p-values for the enrichment of the categories. Categorizer identifies the biologically best-fit category for each gene by taking advantage of a specialized semantic similarity measure for GO terms. We demonstrate that Categorizer provides improved categorization and enrichment results of genetic modifiers of Huntington’s disease compared to a classical GO Slim-based approach or categorizations using other semantic similarity measures.

Conclusion

Categorizer enables more accurate categorizations of genes than currently available methods. This new tool will help experimental and computational biologists analyze genomic and proteomic data according to their specific needs in a more reliable manner.

8.
It is of great importance to identify new cancer genes from the data of large-scale genome screenings of gene mutations in cancers. Considering that alterations of some essential functions are indispensable for oncogenesis, we define them as cancer functions and select, as their approximations, a group of detailed functions in GO (Gene Ontology) highly enriched with known cancer genes. To evaluate the efficiency of using cancer functions as features to identify cancer genes, we define, in the screened genes, the known protein kinase cancer genes as gold standard positives and the other kinase genes as gold standard negatives. The results show that cancer-associated functions are more efficient in identifying cancer genes than the selection pressure feature. Furthermore, combining cancer functions with the number of non-silent mutations can generate more reliable positive predictions. Finally, with a precision of 0.42, we suggest a list of 46 kinase genes as candidate cancer genes, which are annotated to cancer functions and carry at least 3 non-silent mutations.

9.
Lee S, Cha JY, Kim H, Yu U. BMB Reports, 2012, 45(2): 120-125
We have developed a biologist-friendly, Java GUI application (GoBean) for GO term enrichment analysis. It was designed to be a comprehensive and flexible GUI tool for GO term enrichment analysis, combining the merits of other programs and incorporating extensive graphic exploration of enrichment results. An intuitive user interface with multiple panels allows for extensive visual scrutiny of analysis results. The program includes many essential and useful features, such as enrichment analysis algorithms, multiple test correction methods, and versatile filtering of enriched GO terms for more focused analyses. A unique graphic interface reflecting the GO tree structure was devised to facilitate comparisons of multiple GO analysis results, which can provide valuable insights for biological interpretation. Additional features to enhance user convenience include built-in ID conversion, evidence code-based gene-GO association filtering, set operations of gene lists and enriched GO terms, and user-provided data files. It is available at http://neon.gachon.ac.kr/GoBean/.
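The abstract mentions multiple test correction without naming the methods GoBean offers. As one widely used example, the Benjamini-Hochberg procedure for controlling the false discovery rate can be sketched as below. This is an assumption chosen for illustration, not a description of GoBean's code; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(q, 4) for q in adjusted])
```

Terms whose adjusted value falls below the chosen false discovery rate would then be kept as enriched.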

10.
MOTIVATION: Many methods have been developed for selecting small informative feature subsets in large noisy data. However, unsupervised methods are scarce. Examples are using the variance of data collected for each feature, or the projection of the feature on the first principal component. We propose a novel unsupervised criterion, based on SVD-entropy, selecting a feature according to its contribution to the entropy (CE) calculated on a leave-one-out basis. This can be implemented in four ways: simple ranking according to CE values (SR); forward selection by accumulating features according to which set produces highest entropy (FS1); forward selection by accumulating features through the choice of the best CE out of the remaining ones (FS2); backward elimination (BE) of features with the lowest CE. RESULTS: We apply our methods to different benchmarks. In each case we evaluate the success of clustering the data in the selected feature spaces, by measuring Jaccard scores with respect to known classifications. We demonstrate that feature filtering according to CE outperforms the variance method and gene-shaving. There are cases where the analysis, based on a small set of selected features, outperforms the best score reported when all information was used. Our method calls for an optimal size of the relevant feature set. This turns out to be just a few percent of the number of genes in the two Leukemia datasets that we have analyzed. Moreover, the most favored selected genes turn out to have significant GO enrichment in relevant cellular processes.
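The abstract does not spell out the formulas, so the following is a sketch under an assumption: it uses a commonly used normalised definition of SVD-entropy, which may differ in detail from the one in the paper. The symbols $A$, $s_j$, $V_j$, $N$ and $A^{(-i)}$ are introduced here for illustration. For a data matrix $A$ with singular values $s_1, \dots, s_N$:

```latex
% Normalised relative contribution of the j-th singular value
V_j = \frac{s_j^{2}}{\sum_{k=1}^{N} s_k^{2}}

% SVD-entropy of the dataset, scaled to lie in [0, 1]
E(A) = -\frac{1}{\log N} \sum_{j=1}^{N} V_j \log V_j

% Leave-one-out contribution of feature i, where A^{(-i)} is A with feature i removed
\mathrm{CE}_i = E(A) - E\bigl(A^{(-i)}\bigr)
```

Under this reading, simple ranking (SR) sorts features by $\mathrm{CE}_i$, while the forward and backward variants recompute the entropy as the feature set changes.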

11.
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. Availability: GCAT is freely available at http://binf1.memphis.edu/gcat.

12.

Background

With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis relies heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, there is still a lack of high-quality annotation methods. Here, we propose a novel method to improve annotation accuracy by combining the GO structure and gene expression data.

Results

We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated over a range of cluster sizes on multiple bootstrap-resampled datasets. The proposed method is applied to p53 cell line, colon cancer and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power, and reduce the inconsistency of annotations.

Conclusions

A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.

13.
MOTIVATION: Logistic regression is a standard method for building prediction models for a binary outcome and has been extended for disease classification with microarray data by many authors. A feature (gene) selection step, however, must be added to penalized logistic modeling due to a large number of genes and a small number of subjects. Model selection for this two-step approach requires new statistical tools because prediction error estimation ignoring the feature selection step can be severely downward biased. Generic methods such as cross-validation and non-parametric bootstrap can be very ineffective due to the big variability in the prediction error estimate. RESULTS: We propose a parametric bootstrap model for more accurate estimation of the prediction error that is tailored to the microarray data by borrowing from the extensive research in identifying differentially expressed genes, especially the local false discovery rate. The proposed method provides guidance on the two critical issues in model selection: the number of genes to include in the model and the optimal shrinkage for the penalized logistic regression. We show that selecting more than 20 genes usually helps little in further reducing the prediction error. Application to Golub's leukemia data and our own cervical cancer data leads to highly accurate prediction models. AVAILABILITY: R library GeneLogit at http://geocities.com/jg_liao

14.
Gene selection via the BAHSIC family of algorithms
MOTIVATION: Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. The ultimate goal is to detect the genes that are involved in disease outbreak and progression. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between different methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical connections and the differences between these methods. In this article, we define a kernel-based framework for feature selection based on the Hilbert-Schmidt independence criterion and backward elimination, called BAHSIC. We show that several well-known feature selectors are instances of BAHSIC, thereby clarifying their relationship. Furthermore, by choosing a different kernel, BAHSIC allows us to easily define novel feature selection algorithms. As a further advantage, feature selection via BAHSIC works directly on multiclass problems. RESULTS: In a broad experimental evaluation, the members of the BAHSIC family reach high levels of accuracy and robustness when compared to other feature selection techniques. Experiments show that features selected with a linear kernel provide the best classification performance in general, but if strong non-linearities are present in the data then non-linear kernels can be more suitable. AVAILABILITY: Accompanying homepage is http://www.dbs.ifi.lmu.de/~borgward/BAHSIC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
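The idea of combining the Hilbert-Schmidt independence criterion (HSIC) with backward elimination can be illustrated with a simplified sketch. Several points here are assumptions rather than the published algorithm: it uses the biased empirical HSIC estimate tr(KHLH)/(m-1)^2 with linear kernels on both data and labels, it removes one feature per round, and the function names and toy data are hypothetical.

```python
def linear_kernel(rows):
    """Gram matrix of inner products between sample vectors."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def center(K):
    """Double-centre a symmetric kernel matrix (H K H with H = I - 11^T / m)."""
    m = len(K)
    row_mean = [sum(row) / m for row in K]
    total_mean = sum(row_mean) / m
    return [
        [K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(m)]
        for i in range(m)
    ]


def hsic(X, y):
    """Biased empirical HSIC between samples X (list of rows) and labels y."""
    m = len(X)
    Kc = center(linear_kernel(X))
    L = linear_kernel([[label] for label in y])
    trace = sum(Kc[i][j] * L[i][j] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2


def backward_elimination_hsic(X, y, n_keep):
    """Repeatedly drop the feature whose removal leaves the highest HSIC."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        def score_without(feature):
            cols = [c for c in remaining if c != feature]
            return hsic([[row[c] for c in cols] for row in X], y)

        least_useful = max(remaining, key=score_without)
        remaining.remove(least_useful)
        eliminated.append(least_useful)
    return remaining, eliminated


# Hypothetical data: feature 0 tracks the class label, 1 is noise, 2 is constant.
X = [
    [-1.0, 0.3, 0.1],
    [-0.9, -0.2, 0.1],
    [1.0, 0.1, 0.1],
    [1.1, -0.3, 0.1],
]
y = [-1, -1, 1, 1]

print(backward_elimination_hsic(X, y, n_keep=1))
```

On this toy data the constant feature is dropped first, then the noisy one, leaving the informative feature. Swapping `linear_kernel` for a non-linear kernel on `X` is where the abstract's remark about non-linear dependence would come into play.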

15.
Previous studies have reported that the tumour cells of nasopharyngeal carcinoma (NPC) exhibit recurrent chromosome abnormalities. These genetic changes are broadly assumed to lead to changes in gene expression which are important for the pathogenesis of this tumour. However, this assumption has yet to be formally tested at a global level. Therefore, a genome-wide analysis of chromosome copy number and gene expression was performed in tumour cells micro-dissected from the same NPC biopsies. Cellular tumour suppressor and tumour-promoting genes (TSG, TPG) and Epstein-Barr Virus (EBV)-encoded oncogenes were examined. The EBV-encoded genome maintenance protein EBNA1, along with the putative oncogenes LMP1, LMP2 and BARF1, was expressed in the majority of NPCs that were analysed. Significant downregulation of expression in an average of 76 cellular TSGs per tumour was found, whilst a per-tumour average of 88 significantly upregulated TPGs occurred. The expression of around 60% of putative TPGs and TSGs was both up- and down-regulated in different types of cancer, suggesting that the simplistic classification of genes as TSGs or TPGs may not be entirely appropriate and that the concept of context-dependent onco-suppressors may be more extensive than previously recognised. No significant enrichment of TPGs within regions of frequent genomic gain was seen, but TSGs were significantly enriched within regions of frequent genomic loss. It is suggested that loss of the FHIT gene may be a driver of NPC tumourigenesis. Notwithstanding the association of TSGs with regions of genomic loss, on a gene-by-gene basis and excepting homozygous deletions and high-level amplification, there is very little correlation between chromosomal copy number aberrations and expression levels of TSGs and TPGs in NPC.

16.
17.
18.
19.
20.
Genomics, 2020, 112(3): 2615-2622
Lung cancer is a leading cause of cancer-related death in the world. Therefore, identifying the genes and molecular pathways involved in lung development and tumorigenesis can help us improve the therapeutic strategies for lung cancer. Accumulating evidence confirms that long noncoding RNAs, as a novel layer of regulatory RNA molecules, play an important role in various aspects of the cell. Here, using available high-throughput gene expression data, we identified an lncRNA (HSPC324) with a high expression level in lung tissue that is distinctly expressed in lung tumor tissues relative to normal tissue. Using GO enrichment and KEGG pathway analyses, we further analyzed the functions and pathways involving the HSPC324-correlated genes. Ectopic expression of lncRNA HSPC324 significantly inhibited proliferation, cell cycle progression and migration, and increased apoptosis and ROS production in lung adenocarcinoma cells. Overall, this study introduces HSPC324 as a new player in the development of lung cancer.
