Similar literature
 20 similar articles found (search time: 46 ms)
1.
MOTIVATION: Microarrays rapidly generate large quantities of gene expression information, but interpreting such data within a biological context is still relatively complex and laborious. New methods that can identify functionally related genes via shared literature concepts will be useful in addressing these needs. RESULTS: We have developed a novel method that uses implicit literature relationships (concepts related via shared, intermediate concepts) to cluster related genes. Genes are evaluated for implicit connections within a network of biomedical objects (other genes, ontological concepts and diseases) that are connected via their co-occurrences in Medline titles and/or abstracts. On the basis of these implicit relationships, individual gene pairs are scored using a probability-based algorithm. Scores are generated for all pairwise combinations of genes, which are then clustered based on the scores. We applied this method to a test set composed of nine functional groups with known relationships. The method scored highly for all nine groups and significantly better than a benchmark co-occurrence-based method for six groups. We then applied this method to gene sets specific to two previously defined breast tumor subtypes. Analysis of the results recapitulated known biological relationships and identified novel pathway relationships unique to each tumor subtype. We demonstrate that this method provides a valuable new means of identifying and visualizing significantly related genes within gene lists via their implicit relationships in the literature.

2.
MOTIVATION: New relationships are often implicit from existing information, but the amount and growth of published literature limit the scope of analysis an individual can accomplish. Our goal was to develop and test a computational method to identify relationships within scientific reports, such that large sets of relationships between unrelated items could be sought out and statistically ranked for their potential relevance as a set. RESULTS: We first construct a network of tentative relationships between 'objects' of biomedical research interest (e.g. genes, diseases, phenotypes, chemicals) by identifying their co-occurrences within all electronically available MEDLINE records. Relationships shared by two unrelated objects are then ranked against a random network model to estimate the statistical significance of any given grouping. When compared against known relationships, we find that this ranking correlates with both the probability and frequency of object co-occurrence, demonstrating that the method is well suited to discovering novel relationships based upon existing shared relationships. To test this, we identified compounds whose shared relationships predicted they might affect the development and/or progression of cardiac hypertrophy. When laboratory tests were performed in a rodent model, chlorpromazine was found to reduce the progression of cardiac hypertrophy.
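
The first step described above, linking objects by co-occurrence and then looking for relationships shared by two objects that are not directly linked, can be illustrated with a minimal Python sketch. The toy records and the names `build_cooccurrence` and `shared_intermediates` are hypothetical, and the sketch leaves out the ranking against a random network model that the abstract describes.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(records):
    """Map each object to the set of objects it co-occurs with in any record."""
    neighbors = defaultdict(set)
    for objects in records:
        for a, b in combinations(sorted(set(objects)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def shared_intermediates(neighbors, a, b):
    """Objects linked to both a and b: the implicit connections between them."""
    return (neighbors[a] & neighbors[b]) - {a, b}


# Hypothetical toy data: each set lists the objects named in one record.
records = [
    {"compound_X", "calcium signaling"},
    {"calcium signaling", "cardiac hypertrophy"},
    {"compound_X", "oxidative stress"},
    {"oxidative stress", "cardiac hypertrophy"},
    {"compound_Y", "apoptosis"},
]
net = build_cooccurrence(records)
shared = shared_intermediates(net, "compound_X", "cardiac hypertrophy")
print(sorted(shared))
```

Here `compound_X` never co-occurs with `cardiac hypertrophy` directly, so any link between them is carried entirely by the intermediates they share.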

3.
Minguez P, Dopazo J. PLoS One 2011, 6(3): e17474
Microarray experiments have been extensively used to define signatures, which are sets of genes that can be considered markers of experimental conditions (typically diseases). Paradoxically, in spite of the apparent functional role that might be attributed to such gene sets, signatures do not seem to be reproducible across experiments. Given the close relationship between function and protein interaction, network properties can be used to study to what extent signatures are composed of genes whose resulting proteins show a considerable level of interaction (and consequently a putative common functional role). We have analysed 618 signatures and 507 modules of co-expression in cancer, looking for significant values of four main protein-protein interaction (PPI) network parameters: connection degree, cluster coefficient, betweenness and number of components. A total of 3904 gene ontology (GO) modules, 146 KEGG pathways, and 263 Biocarta pathways have been used as functional modules of reference. Co-expression modules found in microarray experiments display a high level of connectivity, similar to the one shown by conventional modules based on functional definitions (GO, KEGG and Biocarta). A general observation for all the classes studied is that the networks formed by the modules improve their topological parameters when an external protein is allowed to be introduced within the paths (up to 70% of GO modules show network parameters beyond the random expectation). This fact suggests that functional definitions are incomplete and some genes might still be missing. Conversely, signatures are clearly not capturing the altered functions in the corresponding studies. This is probably because the way in which the genes have been selected in the signatures is too conservative.
These results suggest that gene selection methods which take into account relationships among genes should be superior to methods that assume independence among genes outside their functional contexts.
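
Three of the four PPI network parameters named above (connection degree, cluster coefficient and number of components) can be computed on a small undirected graph as in the following sketch. The toy edge list and all function names are hypothetical, betweenness is left out for brevity, and the significance testing against random expectation reported in the abstract is not reproduced.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Connection degree: number of direct interaction partners."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)


def count_components(adj):
    """Number of connected components, found by depth-first traversal."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)
    return components


# Hypothetical module: a triangle A-B-C, a pendant D, and a separate pair E-F.
ppi = build_adj([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F")])
print(degree(ppi, "C"), clustering_coefficient(ppi, "C"), count_components(ppi))
```

A module whose proteins interact closely would show few components and high clustering; a module split into many components suggests missing members or weak functional coherence.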

4.
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. Availability: GCAT is freely available at http://binf1.memphis.edu/gcat.
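
The abstract states that the literature cohesion p-value comes from a Fisher's exact test but does not say how the contingency table is built. The sketch below therefore shows only a generic one-sided Fisher's exact test on a 2x2 table, written with the standard library. The interpretation of the cells (gene pairs above a similarity threshold inside versus outside the gene set) and the example counts are assumptions made for illustration, not the GCAT procedure.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b            # e.g. gene pairs inside the gene set (assumed meaning)
    col1 = a + c            # e.g. pairs whose similarity exceeds a threshold
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, row1)


# Hypothetical counts: 4 of 5 in-set pairs are "similar", versus 1 of 5 background pairs.
p_value = fisher_exact_greater(4, 1, 1, 4)
print(round(p_value, 4))
```

A small p-value would indicate that highly similar pairs are over-represented within the gene set relative to the background.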

5.
Additional gene ontology structure for improved biological reasoning   Total citations: 5, self-citations: 0, citations by others: 5
MOTIVATION: The Gene Ontology (GO) is a widely used terminology for gene product characterization in, for example, interpretation of biology underlying microarray experiments. The current GO defines term relationships within each of the independent subontologies: molecular function, biological process and cellular component. However, it is evident that there also exist biological relationships between terms of different subontologies. Our aim was to connect the three subontologies to enable GO to cover more biological knowledge, enable a more consistent use of GO and provide new opportunities for biological reasoning. RESULTS: We propose a new structure, the Second Gene Ontology Layer, capturing biological relations not directly reflected in the present ontology structure. Given molecular functions, these paths identify biological processes where the molecular functions are involved and cellular components where they are active. The current Second Layer contains 6271 validated paths, covering 54% of the molecular functions of GO, and can be used to render existing gene annotation sets more complete and consistent. Applying Second Layer paths to a set of 4223 human genes increased biological process annotations by 24% compared to publicly available annotations and reproduced 30% of them. AVAILABILITY: The Second GO is publicly available through the GO Annotation Toolbox (GOAT.no): http://www.goat.no.

6.
7.

Background

Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing the genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity.

Results

We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content, based on the term frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure.

Conclusion

Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.

8.
Bell L, Chowdhary R, Liu JS, Niu X, Zhang J. PLoS One 2011, 6(6): e21474
A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large-scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein-protein interactions, protein/gene regulations, protein-small molecule interactions, protein-GO relationships, protein-pathway relationships, and pathway-disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses, that is, the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.
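
The abstract names a most probable path (MPP) algorithm without giving its details. One standard way to find such a path, assuming each edge carries an independent probability in (0, 1] and a path's probability is the product of its edge probabilities, is to run Dijkstra's algorithm on the negative logarithms of the edge probabilities. The sketch below takes that approach; it is not the authors' implementation, and the toy network and the function name `most_probable_path` are hypothetical.

```python
import heapq
from math import exp, log


def most_probable_path(graph, source, target):
    """graph: {node: {neighbor: probability}}. Returns (probability, path).

    Maximizing a product of probabilities equals minimizing the sum of
    -log(p), so Dijkstra's algorithm applies (all costs are non-negative).
    Returns (0.0, []) when the target cannot be reached.
    """
    best = {source: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, p in graph.get(node, {}).items():
            if p <= 0.0:
                continue  # an impossible edge can never be on a useful path
            new_cost = cost - log(p)
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return exp(-best[target]), path


# Hypothetical network: the indirect route is more probable than the direct edge.
ibn = {
    "geneA": {"proteinB": 0.9, "diseaseD": 0.5},
    "proteinB": {"diseaseD": 0.9},
    "diseaseD": {},
}
prob, path = most_probable_path(ibn, "geneA", "diseaseD")
print(round(prob, 2), path)
```

In this toy case the two-step route through `proteinB` (0.9 × 0.9 = 0.81) beats the direct edge (0.5), which is the kind of indirect relationship the abstract treats as a hypothesis.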

9.
Identifying clusters of functionally related genes in genomes   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: An increasing body of literature shows that genomes of eukaryotes can contain clusters of functionally related genes. Most approaches to identify gene clusters utilize microarray data or metabolic pathway databases to find groups of genes on chromosomes that are linked by common attributes. A generalized method that can find gene clusters regardless of the mechanism of origin would provide researchers with an unbiased method for finding clusters and studying the evolutionary forces that give rise to them. RESULTS: We present an algorithm to identify gene clusters in eukaryotic genomes that utilizes functional categories defined in graph-based vocabularies such as the Gene Ontology (GO). Clusters identified in this manner need only have a common function and are not constrained by gene expression or other properties. We tested the algorithm by analyzing genomes of a representative set of species. We identified species-specific variation in percentage of clustered genes as well as in properties of gene clusters including size distribution and functional annotation. These properties may be diagnostic of the evolutionary forces that lead to the formation of gene clusters. AVAILABILITY: A software implementation of the algorithm and example output files are available at http://fcg.tamu.edu/C_Hunter/.

10.
Prostate cancer (PCa) is the most common malignancy of the urinary system and places a heavy burden on men. We downloaded the mRNA gene expression profile and related clinical data of the GSE70768 data set from a public database. Weighted gene co-expression network analysis (WGCNA) was used to identify the relationships between gene modules and clinical features, as well as candidate genes. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed to investigate the potential functions of the related hub genes. Importantly, basic experiments were performed to verify the relationship between the hub genes and the previously identified phenotype. Lastly, copy number variation (CNV) analysis was conducted to explore genetic alterations. WGCNA identified the black module as the most relevant module, tightly related to the castration-resistant prostate cancer (CRPC) phenotype. KEGG and GO analyses revealed that genes in the black module were mainly related to RNA splicing. Additionally, 9 genes were chosen as hub genes, and heterogeneous nuclear ribonucleoprotein A2/B1 (HNRNPA2B1), golgin A8 family member B (GOLGA8B) and mitogen-activated protein kinase 8 interacting protein 3 (MAPK8IP3) were identified as associated with PCa progression and prognosis. Moreover, all three genes were highly expressed in CRPC-like cells, and their suppression hindered cell proliferation in vitro. Finally, CNV analysis found that amplification was the main type of alteration in the 3 hub genes. Our study found that HNRNPA2B1, GOLGA8B and MAPK8IP3 are tightly associated with tumour progression and prognosis; further research is needed before clinical application.

11.
The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to one another. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternative approach for grouping functional terms that captures intrinsic functional relationships that are not evident in the hierarchical structure established in the GO DAG. Instead of relying on an externally defined organization for biological functions, our approach connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different sources. We show that grouping terms by this alternate scheme provides a new framework with which to describe and predict the functions of experimentally identified sets of genes.

12.
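Collaboration networks are commonly used to describe social relationships, and a similar concept can be applied to the study of transcriptional regulatory networks. A collaboration network of regulated genes can be built from the similarity of the transcription factors they share; likewise, a comparatively small collaboration network of transcription factors can be built from the similarity of the genes they regulate. Clustering the regulated-gene collaboration network showed that most clusters are significantly enriched for one or more GO functional annotations. Further analysis showed that genes with certain GO annotations tend to share similar regulatory mechanisms. This indicates that, within a collaboration network, a relatively simple similarity of regulatory mechanisms can capture information related to biological function. In addition, the concept of an "outlier", used in bipartite graph analysis, was introduced into the analysis of collaboration networks, and outliers in the collaboration network were found to correlate with lethal genes. In summary, the collaboration network approach is a useful complement for analyzing transcriptional regulatory networks.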
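
Building a collaboration network of regulated genes from shared transcription factors amounts to projecting a bipartite gene-regulator graph onto the genes. The sketch below does this with Jaccard similarity and a cutoff. The abstract does not state which similarity measure or threshold was used, so both are assumptions, and the gene and transcription factor names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooperation_network(regulators, threshold):
    """Link two genes when the similarity of their regulator sets reaches the threshold.

    regulators: {gene: set of transcription factors that regulate it}
    Returns {(gene1, gene2): similarity} with gene1 < gene2.
    """
    edges = {}
    for g1, g2 in combinations(sorted(regulators), 2):
        score = jaccard(regulators[g1], regulators[g2])
        if score >= threshold:
            edges[(g1, g2)] = score
    return edges


# Hypothetical regulatory data.
regulators = {
    "gene1": {"TF_A", "TF_B"},
    "gene2": {"TF_A", "TF_B", "TF_C"},
    "gene3": {"TF_D"},
}
edges = cooperation_network(regulators, threshold=0.5)
print(edges)
```

Swapping the roles of genes and transcription factors in the input would give the smaller transcription factor network mentioned in the abstract.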

13.
MOTIVATION: The goal of neighborhood analysis is to find a set of genes (the neighborhood) that is similar to an initial 'seed' set of genes. Neighborhood analysis methods for network data are important in systems biology. If individual network connections are susceptible to noise, it can be advantageous to define neighborhoods on the basis of a robust interconnectedness measure, e.g. the topological overlap measure. Since the use of multiple nodes in the seed set may lead to more informative neighborhoods, it can be advantageous to define multi-node similarity measures. RESULTS: The pairwise topological overlap measure is generalized to multiple network nodes and subsequently used in a recursive neighborhood construction method. A local permutation scheme is used to determine the neighborhood size. Using four network applications and a simulated example, we provide empirical evidence that the resulting neighborhoods are biologically meaningful, e.g. we use neighborhood analysis to identify brain cancer related genes. AVAILABILITY: An executable Windows program and tutorial for multi-node topological overlap measure (MTOM) based analysis can be downloaded from the webpage (http://www.genetics.ucla.edu/labs/horvath/MTOM/).
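
For orientation, the pairwise topological overlap measure that the abstract generalizes is commonly written as TOM(i, j) = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is the adjacency, l_ij sums a_iu * a_uj over the other nodes u, and k_i is the connectivity of node i. The sketch below implements only this pairwise form on a toy adjacency matrix, as an assumption about the definition meant; the multi-node generalization and the permutation scheme from the abstract are not reproduced, and the function name is hypothetical.

```python
def topological_overlap(adj, i, j):
    """Pairwise topological overlap for a symmetric adjacency matrix.

    adj: list of lists with entries in [0, 1] and a zero diagonal.
    """
    if i == j:
        return 1.0
    n = len(adj)
    # Connection strength that i and j share through every third node u.
    shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j)
    k_i = sum(adj[i][u] for u in range(n) if u != i)
    k_j = sum(adj[j][u] for u in range(n) if u != j)
    return (shared + adj[i][j]) / (min(k_i, k_j) + 1 - adj[i][j])


# Hypothetical unweighted network: nodes 0, 1, 2 form a triangle; node 3 hangs off node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(topological_overlap(adj, 0, 1), topological_overlap(adj, 0, 3))
```

Nodes 0 and 3 are not directly connected, yet they receive a non-zero overlap through their shared neighbor, which is why the measure is less sensitive to a single missing or spurious edge.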

14.
Although there is extensive information on gene expression and molecular interactions in various cell types, integrating those data in a functionally coherent manner remains challenging. This study explores the premise that genes whose expression at the mRNA level is correlated over diverse cell lines are likely to function together in a network of molecular interactions. We previously derived expression-correlated gene clusters from the database of the NCI-60 human tumor cell lines and associated each cluster with function categories of the Gene Ontology (GO) database. From a cluster rich in genes associated with GO categories related to cell migration, we extracted 15 genes that were highly cross-correlated; prominent among them were RRAS, AXL, ADAM9, FN14, and integrin-beta1. We then used those 15 genes as bait to identify other correlated genes in the NCI-60 database. A survey of current literature disclosed not only that many of the expression-correlated genes engaged in molecular interactions related to migration, invasion, and metastasis, but also that highly cross-correlated subsets of those genes engaged in specific cell migration processes. We assembled this information in molecular interaction maps (MIMs) that depict networks governing 3 cell migration processes: degradation of extracellular matrix, production of transient focal complexes at the leading edge of the cell, and retraction of the rear part of the cell. Also depicted are interactions controlling the release and effects of calcium ions, which may regulate migration in a spatiotemporal manner in the cell. The MIMs and associated text comprise a detailed and integrated summary of what is currently known or surmised about the role of the expression cross-correlated genes in molecular networks governing those processes.

15.
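OBJECTIVE: To establish a method for mining candidate genes for malignant glioma and to analyze it systematically. METHODS: An extended gene relationship network was constructed by combining genes in known malignant glioma pathways with genes carrying point mutations or copy number alterations. Nodes (genes) with a high degree, a high centrality score, or positive vulnerability in the network were computed and identified separately, and genes that satisfied one or more of these measures and shared function with known malignant glioma genes were taken as candidate genes. Finally, the performance of the measures in predicting malignant glioma genes was evaluated by literature validation. RESULTS: After gene function was incorporated, degree and vulnerability in the network identified most malignant glioma genes, whereas prediction by centrality performed poorly; combining the three measures did not outperform vulnerability alone. CONCLUSION: Combining gene functional relationships with network vulnerability is the best measure for predicting malignant glioma genes.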

16.
Directed indices for exploring gene expression data   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: Large expression studies with clinical outcome data are becoming available for analysis. An important goal is to identify genes or clusters of genes where expression is related to patient outcome. While clustering methods are useful data exploration tools, they do not directly allow one to relate the expression data to clinical outcome. Alternatively, methods which rank genes based on their univariate significance do not incorporate gene function or relationships to genes that have been previously identified. In addition, after sifting through potentially thousands of genes, algorithms that report summary estimates (e.g. regression coefficients or error rates) should address the potentially large bias introduced by gene selection. RESULTS: We developed a gene index technique that generalizes methods that rank genes by their univariate associations to patient outcome. Genes are ordered based on simultaneously linking their expression both to patient outcome and to a specific gene of interest. The technique can also be used to suggest profiles of gene expression related to patient outcome. A cross-validation method is shown to be important for reducing bias due to adaptive gene selection. The methods are illustrated on a recently collected gene expression data set based on 160 patients with diffuse large cell lymphoma (DLCL).

17.
18.
19.
GoSurfer   Total citations: 2, self-citations: 0, citations by others: 2
The analysis of complex patterns of gene regulation is central to understanding the biology of cells, tissues and organisms. Patterns of gene regulation pertaining to specific biological processes can be revealed by a variety of experimental strategies, particularly microarrays and other highly parallel methods, which generate large datasets linking many genes. Although methods for detecting gene expression have improved substantially in recent years, understanding the physiological implications of complex patterns in gene expression data is a major challenge. This article presents GoSurfer, an easy-to-use graphical exploration tool with built-in statistical features that allow a rapid assessment of the biological functions represented in large gene sets. GoSurfer takes one or two list(s) of gene identifiers (Affymetrix probe set ID) as input and retrieves all the Gene Ontology (GO) terms associated with the input genes. GoSurfer visualises these GO terms in a hierarchical tree format. With GoSurfer, users can perform statistical tests to search for the GO terms that are enriched in the annotations of the input genes. These GO terms can be highlighted on the GO tree. Users can manipulate the GO tree in various ways and interactively query the genes associated with any GO term. The user-generated graphics can be saved as graphics files, and all the GO information related to the input genes can be exported as text files. AVAILABILITY: GoSurfer is a Windows-based program freely available for noncommercial use and can be downloaded at http://www.gosurfer.org. Datasets used to construct the trees shown in the figures in this article are available at http://www.gosurfer.org/download/GoSurfer.zip.

20.
MOTIVATION: A methodology that searches for genes associated with multifactorial diseases by integrating the large body of accumulated knowledge is much needed. Our proposed analysis, cross-subspace analysis (CSA), offers a comprehensive understanding derived from a holistic view of gene relationship structures. In this analysis, gene objects are generated by machine learning from their term occurrence patterns in MEDLINE abstracts, and the degree of relationship between gene objects is quantified by matching these patterns. RESULTS: Using CSA, we structured on a 2D plane the relationships among a set of genes retrieved with the terms 'obesity', 'diabetes', 'hypertriglyceridemia' and 'hypertension', which refer to the diseases comprising metabolic syndrome, and inferred important biomedical concepts from the gene distribution. We then prioritized 6131 well-annotated human genes by their distance on the plane from the centroid of the distribution of 'metabolic syndrome'-related genes. The validity of the ordering was confirmed by comparing the knowledge it extracted with existing medical knowledge.
