Similar articles
20 similar articles found
1.
MOTIVATION: Microarrays rapidly generate large quantities of gene expression information, but interpreting such data within a biological context is still relatively complex and laborious. New methods that can identify functionally related genes via shared literature concepts will be useful in addressing these needs. RESULTS: We have developed a novel method that uses implicit literature relationships (concepts related via shared, intermediate concepts) to cluster related genes. Genes are evaluated for implicit connections within a network of biomedical objects (other genes, ontological concepts and diseases) that are connected via their co-occurrences in Medline titles and/or abstracts. On the basis of these implicit relationships, individual gene pairs are scored using a probability-based algorithm. Scores are generated for all pairwise combinations of genes, which are then clustered based on the scores. We applied this method to a test set composed of nine functional groups with known relationships. The method scored highly for all nine groups and significantly better than a benchmark co-occurrence-based method for six groups. We then applied this method to gene sets specific to two previously defined breast tumor subtypes. Analysis of the results recapitulated known biological relationships and identified novel pathway relationships unique to each tumor subtype. We demonstrate that this method provides a valuable new means of identifying and visualizing significantly related genes within gene lists via their implicit relationships in the literature.

2.

Background  

High-throughput experiments, such as with DNA microarrays, typically result in hundreds of genes potentially relevant to the process under study, rendering the interpretation of these experiments problematic. Here, we propose and evaluate an approach to find functional associations between large numbers of genes and other biomedical concepts from free-text literature. For each gene, a profile of related concepts is constructed that summarizes the context in which the gene is mentioned in literature. We assign a weight to each concept in the profile based on a likelihood ratio measure. Gene concept profiles can then be clustered to find related genes and other concepts.

3.
The gene composition of present-day genomes has been shaped by a complicated evolutionary history, resulting in diverse distributions of genes across genomes. The pattern of presence and absence of a gene in different genomes is called its phylogenetic profile. It has been shown that proteins whose encoding genes have highly similar profiles tend to be functionally related: As these genes were gained and lost together, their encoded proteins can probably only perform their full function if both are present. However, a large proportion of genes encoding interacting proteins do not have matching profiles. In this study, we analysed one possible reason for this, namely that phylogenetic profiles can be affected by multi-functional proteins such as shared subunits of two or more protein complexes. We found that by considering triplets of proteins, of which one protein is multi-functional, a large fraction of disturbed co-occurrence patterns can be explained.
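To make the notion of a phylogenetic profile concrete, the following is a minimal sketch that compares presence/absence vectors. The Jaccard index is an assumption chosen for illustration (the abstract does not name a similarity measure), and the gene names and profiles are hypothetical.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is a list of 0/1 flags, one per genome, in the same order.
    The Jaccard index is an illustrative choice, not the measure used in
    the study summarized above.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over six genomes (1 = gene present, 0 = absent).
gene_x = [1, 1, 0, 1, 0, 0]
gene_y = [1, 1, 0, 1, 0, 0]  # gained and lost together with gene_x
gene_z = [1, 1, 1, 1, 1, 0]  # e.g. a shared subunit that occurs in more genomes

print(profile_similarity(gene_x, gene_y))
print(profile_similarity(gene_x, gene_z))
```

In this toy data, gene_z interacts with gene_x in every genome where gene_x occurs, yet its profile matches less well because it is also present elsewhere; this is the kind of disturbed pattern the abstract attributes to multi-functional proteins.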

4.
Gene clustering by latent semantic indexing of MEDLINE abstracts   (Cited by: 1; self-citations: 0; citations by others: 1)
MOTIVATION: A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases using term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations. RESULTS: We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Finally, we demonstrate here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. These features make LSI particularly useful for the analysis of novel associations discovered in genomic experiments. AVAILABILITY: The 50-gene document collection used in this study can be interactively queried at http://shad.cs.utk.edu/sgo/sgo.html.
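As an illustration of a pairwise distance derived from vector angles, the sketch below computes the angle between two gene document vectors. It assumes the vectors have already been projected into a reduced space; the SVD step of LSI is omitted, and the function name and example vectors are hypothetical.

```python
import math


def angle_distance(u, v):
    """Angle in radians between two document vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vector has no defined angle")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical gene document vectors in a two-dimensional reduced space.
gene_a = [1.0, 0.0]
gene_b = [0.0, 1.0]  # orthogonal to gene_a: no shared word usage
gene_c = [2.0, 0.0]  # same direction as gene_a, different length

print(angle_distance(gene_a, gene_b))
print(angle_distance(gene_a, gene_c))
```

A matrix of such distances could then be passed to any standard hierarchical clustering routine; the angle ignores vector length, so documents of different sizes remain comparable.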

5.
Genomic screens for small RNA candidates in Enterobacteriaceae genomes were carried out with existing small RNA sequences, conserved flanking genes, and genomic backbone information. The small RNA sequences and contexts from E. coli K12 formed the basis of the search. Sequence identity identified 117 additional small RNA homologs in related genomes. Motifs of continuous sequence stretches added another 48 sRNA regions, termed partial homologs. However, this study is unique in identifying 160 nonhomologous sRNA loci in related genomes based on the conserved flanking gene synteny and the backbone retention information obtained from KEGG-SSDB. Gene synteny and genomic backbone continuity were observed to be correlated with all of the sRNAs in related genomes. This search is the first of its kind toward identification of functionally important regions using gene order and backbone information. A disruption in flanking gene order or genomic backbone indicates a possible hotspot for alien gene pool integration. This study reports both occurrence of multiple copies of an sRNA and co-occurrence of different sRNAs between a pair of conserved flanking genes. In general, synteny and genomic backbone retention information can be added as additional search criteria toward the design of precise bioinformatics tools for sRNA, gene identification, and gene functional annotations in related genomes.

6.
Biochemical and cytogenetic experiments have led to the hypothesis that eukaryotic chromatin is organized into a series of distinct domains that are functionally independent. Two expectations of this hypothesis are: (i) adjacent genes are more frequently co-expressed than is expected by chance; and (ii) co-expressed neighbouring genes are often functionally related. Here we report that over 10% of Arabidopsis thaliana genes are within large, co-expressed chromosomal regions. Two per cent (497/22,520) of genes are highly co-expressed (r > 0.7), about five times the number expected by chance. These genes fall into 226 groups distributed across the genome, and each group typically contains two to three genes. Among the highly co-expressed groups, 40% (91/226) have genes with high amino acid sequence similarity. Nonetheless, duplicate genes alone do not explain the observed levels of co-expression. Co-expressed, non-homologous genes are transcribed in parallel, share functions, and lie close together more frequently than expected. Our results show that the A. thaliana genome contains domains of gene expression. Small domains have highly co-expressed genes that often share functional and sequence similarity and are probably co-regulated by nearby regulatory sequences. Genes within large, significantly correlated groups are typically co-regulated at a low level, suggesting the presence of large chromosomal domains.
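The r > 0.7 criterion can be sketched as a scan over adjacent genes. This is a simplified illustration that assumes r is the Pearson correlation of expression profiles and that only immediate neighbours are compared; the gene identifiers and values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def coexpressed_neighbours(expression, threshold=0.7):
    """Return adjacent gene pairs whose correlation exceeds the threshold.

    `expression` is a list of (gene_id, profile) tuples in chromosomal order.
    """
    pairs = []
    for (gene_1, profile_1), (gene_2, profile_2) in zip(expression, expression[1:]):
        if pearson_r(profile_1, profile_2) > threshold:
            pairs.append((gene_1, gene_2))
    return pairs


# Hypothetical expression profiles across four conditions.
chromosome = [
    ("geneA", [1.0, 2.0, 3.0, 4.0]),
    ("geneB", [2.0, 4.0, 6.0, 8.0]),  # rises with geneA
    ("geneC", [4.0, 3.0, 2.0, 1.0]),  # falls as geneB rises
]

print(coexpressed_neighbours(chromosome))
```

Judging whether the number of such pairs exceeds chance, as the abstract reports, would additionally require a null model such as shuffled gene order, which is not shown here.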

7.
8.
MOTIVATION: Many experimental and algorithmic approaches in biology generate groups of genes that need to be examined for related functional properties. For example, gene expression profiles are frequently organized into clusters of genes that may share functional properties. We evaluate a method, neighbor divergence per gene (NDPG), that uses scientific literature to assess whether a group of genes is functionally related. The method requires only a corpus of documents and an index connecting the documents to genes. RESULTS: We evaluate NDPG on 2796 functional groups generated by the Gene Ontology consortium in four organisms: mouse, fly, worm and yeast. NDPG finds functional coherence in 96, 92, 82 and 45% of the groups (at 99.9% specificity) in yeast, mouse, fly and worm respectively.

9.
10.
Finding edging genes from microarray data   (Cited by: 1; self-citations: 0; citations by others: 1)
MOTIVATION: A set of genes and their gene expression levels are used to classify disease and normal tissues. Due to the massive number of genes in a microarray, there are a large number of edges that divide different classes of genes in microarray space. The edging genes (EGs) can be co-regulated genes; they can also be on the same pathway or deregulated by the same non-coding genes, such as siRNA or miRNA. Every gene in the EGs is vital for identifying a tissue's class. A change in one EG's expression may cause a tissue to alter from normal to disease and vice versa. Finding EGs is of biological importance. In this work, we propose an algorithm to effectively find these EGs. RESULT: We tested our algorithm with five microarray datasets. The results are compared with the border-based algorithm, which was used to find gene groups and subsequently divide different classes of tissues. Our algorithm finds a significantly larger number of EGs than does the border-based algorithm. As our algorithm prunes irrelevant patterns at earlier stages, its time and space complexities are much lower than those of the border-based algorithm. AVAILABILITY: The proposed algorithm is implemented in C++ on the Linux platform. The EGs in five microarray datasets are calculated. The preprocessed datasets and the discovered EGs are available at http://www3.it.deakin.edu.au/~phoebe/microarray.html.

11.
12.
Wu DD, Zhang YP. Genomics 2011, 98(5):367-369
Horizontal gene transfer, the movement of genetic material across the normal mating barriers between organisms, occurs frequently and contributes significantly to the evolution of both eukaryotic and prokaryotic genomes. However, few concurrent transfers of functionally related genes implemented in a pathway from eukaryotes to prokaryotes have been observed. Here, we performed phylogenetic analyses supporting that the genes involved in thymidylate metabolism in Hz-1 virus, i.e. dihydrofolate reductase, glycine hydroxymethyltransferase, and thymidylate synthase, were recently obtained from an insect genome by independent horizontal gene transfers. In addition, five other related genes in nucleotide metabolism show evidence of horizontal gene transfer. These genes demonstrate similar expression patterns, and they may have formed a functionally related pathway (e.g. thymidylate synthesis and DNA replication) in Hz-1 virus. In conclusion, we provide an example of horizontal gene transfer of functionally related genes in a pathway to a prokaryote from a eukaryote.

13.
The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs.
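The ABC-principle can be illustrated with a small co-occurrence graph. This is a minimal sketch only: the function name, the dictionary layout, and the concepts are hypothetical, and it is not CoPub Discovery's actual algorithm or scoring.

```python
def implicit_links(cooccurrence, a):
    """Find concepts C that never co-occur with `a` but share B-intermediates.

    `cooccurrence` maps each concept to the set of concepts it co-occurs with.
    Returns a dict mapping each candidate C to its set of intermediates B.
    """
    direct = cooccurrence.get(a, set())
    hidden = {}
    for b in direct:
        for c in cooccurrence.get(b, set()):
            # Skip A itself and anything already directly linked to A.
            if c != a and c not in direct:
                hidden.setdefault(c, set()).add(b)
    return hidden


# Hypothetical symmetric co-occurrence data.
cooccurrence = {
    "geneA": {"inflammation", "apoptosis"},
    "inflammation": {"geneA", "drugC"},
    "apoptosis": {"geneA", "drugC", "geneD"},
    "drugC": {"inflammation", "apoptosis"},
    "geneD": {"apoptosis"},
}

print(implicit_links(cooccurrence, "geneA"))
```

Here drugC is reached through two intermediates and geneD through one; a real tool would weight such links statistically rather than simply collecting them.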

14.
15.
Karan D, David JR, Capy P. Gene 2001, 265(1-2):95-101
Acetyl-CoA-Synthetase (ACS) is involved in the production of acetate, a major metabolite in numerous organisms. There are two forms of this enzyme: ADP-forming ACS and AMP-forming ACS. We focus mainly on the AMP-forming ACS gene, which is relatively well conserved in eubacteria, archaebacteria, and eukaryotes. BLAST searches in databases showed 30 protein sequences significantly related to the ACS. Most of these sequences were identified as ACS, but three of them, belonging to mammalian species, were annotated as another gene, named the SA gene, which is involved in essential hypertension. The ACS and SA genes probably derived from a duplication of an ancestral gene but have acquired different functions. Six conserved regions of the ACS protein were defined across the three domains of life. While the precise function of the conserved regions remains unknown, they are probably involved in the enzymatic activity. Among eukaryotes, we found a high variability with respect to the number and the position of introns. However, some positions are conserved between fungi and a nematode. A maximum likelihood tree based upon the conserved regions showed that all sequences, except the one from B. subtilis, belong to two basic groups: first, the SA-like group, including sequences from Archaeoglobus fulgidus and Streptomyces coelicolor, and second, the ACS group. The latter can be further divided into two parts: a prokaryotic one including eubacteria and an archaebacterium, and a eukaryotic group within which two proteobacterial sequences branch, including ACS from the alpha-proteobacterium Rhodobacter capsulatus. Within the eukaryotic group, bootstrap support is very low, but overall the data are consistent with the view that eukaryotes acquired their ACS gene from the ancestors of mitochondria. The localization of this enzyme in eukaryotic mitochondria is additional evidence in favor of this interpretation.

16.
Molecular understanding of morphological agronomic traits is very important to improve grain yield and quality. According to the literature information summarized in the Overview of Functionally Characterized Genes in Rice online database, 430 genes related to these traits have been functionally characterized in rice, while the functions of other genes remain to be elucidated. Gene-indexed mutants are available for at least half of the genes identified in the rice genome, and are very useful resources to study gene function. To suggest candidate genes for functional studies associated with morphological agronomic traits, we identified genes with tissue/organ-preferred expression patterns through meta-analysis of microarray data, and identified 781 genes for roots, 1,084 for leaves, 1,029 for calluses, 927 for anthers, 241 for embryos, and 343 for endosperms. Additionally, 4,243 genes expressed in all tissue types were allocated to a ubiquitously-expressed gene group (‘housekeeping’ genes). The estimated tissue/organ-preferred and housekeeping genes accounted for 40% of the characterized genes associated with morphological agronomic traits, indicating that identification of tissue/organ-preferred genes is an effective way to provide putative gene function. In this study, we report information on gene-indexed mutants for 84% of the identified candidate genes. Our candidate genes and related indexed mutant resources can potentially be used to improve morphological agronomic traits in rice.

17.
Matsuo T. Genetics 2008, 178(2):1061-1072
Genes encoding odorant-binding protein (OBP) form a large family in an insect genome. Two OBP genes, Obp57d and Obp57e, were previously identified to be involved in host-plant recognition in Drosophila sechellia. Here, by comparing the genomic sequences at the Obp57d/e locus from 27 Drosophila species, we found large differences in gene number between species. Phylogenetic analysis revealed that Obp57d and Obp57e in the D. melanogaster species group arose by gene duplication of an ancestral OBP gene that remains single in the obscura species group. Further gain and loss of OBP genes were observed in several lineages in the melanogaster group. Site-specific analysis of evolutionary rate suggests that Obp57d and Obp57e have functionally diverged from each other. Thus, there are two classes of gene number differences in the Obp57d/e region: the difference of the genes that have functionally diverged from each other and the difference of the genes that appear to be functionally identical. Our analyses demonstrate that these two classes of differences can be distinguished by comparisons of many genomic sequences from closely related species.

18.

Motivation

Weighted semantic networks built from text-mined literature can be used to retrieve known protein-protein or gene-disease associations, and have been shown to anticipate associations years before they are explicitly stated in the literature. Our text-mining system recognizes over 640,000 biomedical concepts: some are specific (e.g., names of genes or proteins), others generic (e.g., ‘Homo sapiens’). Generic concepts may play important roles in automated information retrieval, extraction, and inference but may also result in concept overload and confound retrieval and reasoning with low-relevance or even spurious links. Here, we attempted to optimize the retrieval performance for protein-protein interactions (PPI) by filtering generic concepts (node filtering) or links to generic concepts (edge filtering) from a weighted semantic network. First, we defined metrics based on network properties that quantify the specificity of concepts. Then, using these metrics, we systematically filtered generic information from the network while monitoring retrieval performance of known protein-protein interactions. We also systematically filtered specific information from the network (inverse filtering), and assessed the retrieval performance of networks composed of generic information alone.

Results

Filtering generic or specific information induced a two-phase response in retrieval performance: initially the effects of filtering were minimal, but beyond a critical threshold network performance dropped suddenly. Contrary to expectations, networks composed exclusively of generic information demonstrated retrieval performance comparable to unfiltered networks that also contain specific concepts. Furthermore, an analysis using individual generic concepts demonstrated that they can effectively support the retrieval of known protein-protein interactions. For instance, the concept “binding” is indicative of PPI retrieval and the concept “mutation abnormality” is indicative of gene-disease associations.

Conclusion

Generic concepts are important for information retrieval and cannot be removed from semantic networks without negative impact on retrieval performance.

19.
A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present a computational method that leverages the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in the analysis of gene expression data offers an opportunity to incorporate functional information about the genes when defining expression clusters. We have created a method that associates gene expression profiles with known biological functions. Our method has two steps. First, we apply hierarchical clustering to the given gene expression data set. Secondly, we use text from abstracts about genes to (i) resolve hierarchical cluster boundaries to optimize the functional coherence of the clusters and (ii) recognize those clusters that are most functionally coherent. In the case where a gene has not been investigated and therefore lacks primary literature, articles about well-studied homologous genes are added as references. We apply our method to two large gene expression data sets with different properties. The first contains measurements for a subset of well-studied Saccharomyces cerevisiae genes with multiple literature references, and the second contains newly discovered genes in Drosophila melanogaster; many have no literature references at all. In both cases, we are able to rapidly define and identify the biologically relevant gene expression profiles without manual intervention. In both cases, we identified novel clusters that were not noted by the original investigators.

20.
Shachar O, Linial M. Proteins 2004, 57(3):531-538
With currently available sequence data, it is feasible to conduct extensive comparisons among large sets of protein sequences. It is still a much more challenging task to partition the protein space into structurally and functionally related families solely based on sequence comparisons. The ProtoNet system automatically generates a treelike classification of the whole protein space. It stands to reason that this classification reflects evolutionary relationships, both close and remote. In this article, we examine this hypothesis. We present a semiautomatic procedure that singles out certain inner nodes in the ProtoNet tree that should ideally correspond to structurally and functionally defined protein families. We compare the performance of this method against several expert systems. Some of the competing methods incorporate additional extraneous information on protein structure or on enzymatic activities. The ProtoNet-based method performs at least as well as any of the methods with which it was compared. This article illustrates the ProtoNet-based method on several evolutionarily diverse families. Using this new method, an evolutionary divergence scheme can be proposed for a large number of structurally and functionally related superfamilies.
