1.
A new method to measure the semantic similarity of GO terms
2.
Dowell RD. Genome Biology, 2011, 12(1): 101
Meta-analysis of human and mouse microarray data reveals conservation of patterns of gene expression that will help to better characterize the evolution of gene expression.
3.
Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases, such as Gene Ontology Annotation (GOA), that annotate gene products using GO terms. This dependency limits their use in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with them. SSA provides a reliable measure of semantic similarity that is independent of external databases of functional-annotation observations.
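The three ingredients SSA is described as combining (the shortest path between two terms, the depth of their nearest common ancestor, and a definition-based score) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption: the toy DAG and term names are hypothetical, depth is taken as the shortest is_a distance to the root, Jaccard word overlap stands in for SSA's definition score, and the equal-weight combination is illustrative rather than the published formula.

```python
from collections import deque

# Hypothetical toy GO fragment: each term maps to its is_a parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A"],
    "E": ["C", "B"],
}


def upward_distances(term):
    """BFS over is_a edges: edge count from term to each ancestor (itself = 0)."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in PARENTS[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def nearest_common_ancestor(t1, t2):
    """Common ancestor minimising the path t1 -> ancestor -> t2; returns (ancestor, length)."""
    d1, d2 = upward_distances(t1), upward_distances(t2)
    common = d1.keys() & d2.keys()
    nca = min(common, key=lambda a: (d1[a] + d2[a], a))
    return nca, d1[nca] + d2[nca]


def depth(term):
    """Assumed definition: shortest number of is_a edges from term to the root."""
    return upward_distances(term)["root"]


def definition_overlap(def1, def2):
    """Jaccard word overlap, a stand-in for SSA's definition-based score."""
    w1, w2 = set(def1.lower().split()), set(def2.lower().split())
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0


def ssa_like_similarity(t1, t2, def1, def2):
    """Hypothetical equal-weight combination of the three ingredients."""
    nca, path_len = nearest_common_ancestor(t1, t2)
    d = depth(nca)
    path_score = 1.0 / (1.0 + path_len)   # shorter path -> more similar
    depth_score = d / (1.0 + d)           # deeper shared ancestor -> more specific
    return (path_score + depth_score + definition_overlap(def1, def2)) / 3.0


print(nearest_common_ancestor("C", "D"))  # ('A', 2)
print(ssa_like_similarity("C", "D", "binding of a metal ion", "binding of a zinc ion"))
```

Protein-level similarity would then aggregate such term scores over the two proteins' annotation sets; that step is not shown.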
4.
Background
With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. Gene set enrichment analysis, for example, has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis relies heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, high-quality annotation methods are still lacking. Here, we propose a novel method to improve annotation accuracy by combining the GO structure and gene expression data.
Results
We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated for a range of cluster sizes over multiple bootstrap-resampled datasets. The proposed method is applied to p53 cell line, colon cancer and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power and decrease the inconsistency of annotations.
Conclusions
A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method, based on the GO structure and gene expression data, effectively improves gene set annotation quality.
5.
Phenotypic similarity is correlated with a number of measures of gene function, such as relatedness at the level of direct protein-protein interaction. The phenotypic effect of a deleted or mutated gene, which is one part of gene annotation, has attracted broad attention. However, few measures have been used to study phenotypic similarity with data from the Human Phenotype Ontology (HPO) database, so more such measures should be developed and investigated. We used five semantic similarity-based measures (Jiang and Conrath, Lin, Schlicker, Yu and Wu) to calculate the human phenotypic similarity between genes (PSG) with data from the HPO database, and evaluated their accuracy using information on protein-protein interactions, protein complexes, protein families, gene function or DNA sequence. Compared with randomly selected gene pairs, the results of these methods were statistically significant (all P<0.001). Furthermore, we assessed the performance of these five measures by receiver operating characteristic (ROC) curve analysis, and found that most of them performed better than previous methods. This work showed that these semantic similarity-based measures for calculating PSG are effective for hierarchically structured data. Our study contributes to the development and optimization of novel algorithms for PSG calculation and provides researchers with more alternative methods, as well as tools and directions for PSG studies.
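Two of the five measures, Lin and Jiang-Conrath, are defined through the information content IC(t) = -log p(t) of ontology terms and of their most informative common ancestor (MICA). The sketch below assumes a hypothetical four-term phenotype fragment with made-up annotation probabilities; it compares single terms only (Jiang-Conrath is returned as a distance), and aggregating term scores into a gene-level PSG, as the study does, is not shown.

```python
import math

# Hypothetical phenotype-ontology fragment: term -> parents.
PARENTS = {
    "phenotypic_abnormality": [],
    "abnormal_limb": ["phenotypic_abnormality"],
    "abnormal_hand": ["abnormal_limb"],
    "abnormal_foot": ["abnormal_limb"],
}
# Hypothetical p(t): fraction of genes annotated to t or to any descendant of t.
P = {
    "phenotypic_abnormality": 1.0,
    "abnormal_limb": 0.2,
    "abnormal_hand": 0.05,
    "abnormal_foot": 0.1,
}


def ancestors(term):
    """Term plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: rarer terms carry more information."""
    return -math.log(P[term])


def mica_ic(t1, t2):
    """IC of the most informative common ancestor (this is Resnik similarity)."""
    return max(ic(a) for a in ancestors(t1) & ancestors(t2))


def lin(t1, t2):
    """Lin similarity in [0, 1]."""
    denom = ic(t1) + ic(t2)
    return 2.0 * mica_ic(t1, t2) / denom if denom > 0 else 1.0


def jiang_conrath_distance(t1, t2):
    """Jiang-Conrath distance; 0 for identical terms."""
    return ic(t1) + ic(t2) - 2.0 * mica_ic(t1, t2)


print(round(lin("abnormal_hand", "abnormal_foot"), 3))                     # 0.608
print(round(jiang_conrath_distance("abnormal_hand", "abnormal_foot"), 3))  # 2.079
```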
6.
A new method for computing the semantic similarity of gene annotations
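The Gene Ontology (GO) database provides unified annotations for genes, effectively resolving the inconsistencies that arise when different databases describe the same gene. However, how to compare the functional similarity of genes on the basis of their annotations has still not been effectively solved. This paper proposes a new method for computing the semantic similarity of gene annotations. The method is fundamentally based on the biological characteristics of genes; its distinguishing feature is that the semantic similarity of nodes is independent of the set in which the nodes occur and depends only on the positions of the nodes in the GO graph, so computed semantic similarities can be reused. It considers both the depth of the GO nodes to which genes are mapped and the influence of all paths between two GO nodes on their semantic similarity. Experiments on the yeast isoleucine degradation pathway and glutamate biosynthesis pathway show that the algorithm computes the semantic similarity of gene annotations accurately.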
7.
Jorn R De Haan, Ester Piek, Rene C van Schaik, Jacob de Vlieg, Susanne Bauerschmidt, Lutgarde MC Buydens, Ron Wehrens. BMC Bioinformatics, 2010, 11(1): 158
Background
Gene expression data can be analyzed by summarizing groups of individual gene expression profiles based on GO annotation information. The mean expression profile per group can then be used to identify interesting GO categories in relation to the experimental settings. However, the expression profiles present in GO classes are often heterogeneous, i.e., there are several different expression profiles within one class. As a result, important experimental findings can be obscured because the summarizing profile does not seem to be of interest. We propose to tackle this problem by finding homogeneous subclasses within GO categories: preclustering.
8.
MOTIVATION: The inference of genes that are truly associated with inherited human diseases from a set of candidates resulting from genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes relying on protein-protein interaction (PPI) networks, these methods can usually cover less than half of known human genes. RESULTS: We propose to rely on the biological process domain of the gene ontology to construct a gene semantic similarity network and then use the network to infer disease genes. We show that the constructed network covers about 50% more genes than a typical PPI network. By analyzing the gene semantic similarity network with the PPI network, we show that gene pairs tend to have higher semantic similarity scores if the corresponding proteins are closer to each other in the PPI network. By analyzing the gene semantic similarity network with a phenotype similarity network, we show that semantic similarity scores of genes associated with similar diseases are significantly different from those of genes selected at random, and that genes with higher semantic similarity scores tend to be associated with diseases with higher phenotype similarity scores. We further use the gene semantic similarity network with a random walk with restart model to infer disease genes. Through a series of large-scale leave-one-out cross-validation experiments, we show that the gene semantic similarity network can achieve not only higher coverage but also higher accuracy than the PPI network in the inference of disease genes.
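Random walk with restart (RWR) propagates relevance from known disease genes (seeds) through a network by iterating p_{t+1} = (1 - r) W p_t + r p_0, where W is the column-normalized adjacency matrix and p_0 spreads unit mass over the seeds. A minimal NumPy sketch follows; the restart probability of 0.7, the tolerance and the toy four-node path graph are assumptions, and in the setting above the edge weights would come from gene semantic similarity scores.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) W p + r p0 until the L1 change falls below tol."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes: avoid division by zero
    W = adj / col_sums                     # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # restart mass spread evenly over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path graph 0-1-2-3 (hypothetical); in practice edge weights could be
# gene semantic similarity scores.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(adj, seeds=[0])
ranking = np.argsort(-scores)
print(ranking)  # [0 1 2 3]
```

Candidates are then ranked by their steady-state score; genes closer to the seeds in the network receive more of the restarted mass.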
9.
MOTIVATION: Function annotation of an unclassified protein on the basis of its interaction partners is well documented in the literature. Reliable predictions of interactions from other data sources such as gene expression measurements would provide a useful route to function annotation. We investigate the global relationship of protein-protein interactions with gene expression. This relationship is studied in four evolutionarily diverse species for which substantial information regarding their interactions and expression is available: human, mouse, yeast and Escherichia coli. RESULTS: In E. coli the expression of interacting pairs is highly correlated compared with random pairs, while in the other three species the correlation of expression of interacting pairs is only slightly stronger than that of random pairs. To strengthen the correlation, we developed a protocol to integrate ortholog information into the interaction and expression datasets. In all four genomes, the likelihood of predicting protein interactions from highly correlated expression data is increased using our protocol. In yeast, for example, the likelihood of predicting a true interaction when the correlation is > 0.9 increases from 1.4 to 9.4. The improvement demonstrates that protein interactions are reflected in gene expression and that the correlation between the two is strengthened by evolutionary information. The results establish that co-expression of interacting protein pairs is more conserved than that of random ones.
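One way to read "the likelihood of predicting a true interaction when the correlation is > 0.9" is as the ratio between the fraction of interacting pairs and the fraction of random pairs whose expression profiles exceed that Pearson correlation threshold. The sketch below computes that ratio on made-up profiles; this definition is an assumption and may differ from the study's, which also integrates ortholog information (not shown).

```python
import numpy as np


def pair_correlations(expr, pairs):
    """Pearson correlation between the expression profiles (rows) of each gene pair."""
    return np.array([np.corrcoef(expr[i], expr[j])[0, 1] for i, j in pairs])


def likelihood_ratio(expr, interacting, random_pairs, threshold=0.9):
    """Assumed definition: P(r > threshold | interacting) / P(r > threshold | random)."""
    frac_int = np.mean(pair_correlations(expr, interacting) > threshold)
    frac_rand = np.mean(pair_correlations(expr, random_pairs) > threshold)
    return float(frac_int / frac_rand) if frac_rand > 0 else float("inf")


# Made-up profiles: 4 genes x 5 conditions.
expr = np.array([
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],   # perfectly correlated with gene 0
    [5, 4, 3, 2, 1],    # anti-correlated with gene 0
    [1, 3, 2, 5, 4],    # r = 0.8 with gene 0
], dtype=float)
interacting = [(0, 1), (0, 3)]
random_pairs = [(0, 2), (1, 2), (1, 3), (0, 1)]
print(likelihood_ratio(expr, interacting, random_pairs))  # 2.0
```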
10.
Molecular cloning of the maize gene crp1 reveals similarity between regulators of mitochondrial and chloroplast gene expression
The maize nuclear gene crp1 is required for the translation of the chloroplast petA and petD mRNAs and for the processing of the petD mRNA from a polycistronic precursor. In order to understand the biochemical role of the crp1 gene product and the interconnections between chloroplast translation and RNA metabolism, the crp1 gene and cDNA were cloned. The predicted crp1 gene product (CRP1) is related to nuclear genes in fungi that play an analogous role in mitochondrial gene expression, suggesting an underlying mechanistic similarity. Analysis of double mutants that lack both chloroplast ribosomes and crp1 function indicated that CRP1 activates a site-specific endoribonuclease independently of any role it plays in translation. Antibodies prepared to recombinant CRP1 were used to demonstrate that CRP1 is localized to the chloroplast stroma and that it is a component of a multisubunit complex. The CRP1 complex is not associated detectably with either chloroplast membranes or chloroplast ribosomes. Models for CRP1 function and its relationship to other activators of organellar translation are discussed.
11.
Background
Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) trustworthiness: if two samples are visualized as similar, are they really similar? and (ii) the metric: the measure of similarity determines the result, so we propose using a new learning metrics principle to derive a metric from interrelationships among data sets.
Results
The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map was compared in visualizing similarity relationships among gene expression profiles. The self-organizing map performed best overall, except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric differs most from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric.
Conclusions
The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric improve the visualization further.
12.
13.
Min Li, Qi Li, Gamage Upeksha Ganegoda, JianXin Wang, FangXiang Wu, Yi Pan. Science China Life Sciences, 2014, 57(11): 1064-1071
Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, determining the real disease-causing genes by biological experiments is still time-consuming and laborious. With advances in high-throughput techniques, a large number of protein-protein interactions have been identified, and several methods based on protein interaction networks have therefore been proposed to address this issue. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm, SPGOranker, that integrates the semantic similarity of GO annotations. SPGOranker considers not only the topological similarity between protein pairs in a protein interaction network but also their functional similarity. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS and RWR in prioritizing orphan disease-causing genes. Importantly, in a case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
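A shortest-path prioritization in the spirit of SPranker can be sketched by scoring each candidate gene from its breadth-first-search distances to known disease genes, and an SPGOranker-style variant by blending that score with a GO semantic similarity score. The scoring function (sum of 1/d over reachable seeds), the blend weight alpha = 0.5, and the toy network and GO scores are all hypothetical; the published algorithms may define both parts differently.

```python
from collections import deque


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path_scores(graph, candidates, seeds):
    """Hypothetical score: sum of 1/d over known disease genes reachable from the candidate."""
    scores = {}
    for c in candidates:
        dist = bfs_distances(graph, c)
        scores[c] = sum(1.0 / dist[s] for s in seeds if s in dist and dist[s] > 0)
    return scores


def blend_with_go(topo, go_sim, alpha=0.5):
    """Hypothetical SPGOranker-style blend of topological and GO semantic scores."""
    return {c: alpha * topo[c] + (1 - alpha) * go_sim.get(c, 0.0) for c in topo}


# Toy undirected PPI network as adjacency lists (hypothetical genes).
graph = {
    "S1": ["A"], "A": ["S1", "B"], "B": ["A", "S2"],
    "S2": ["B", "C"], "C": ["S2"], "D": [],
}
seeds = ["S1", "S2"]                      # known disease genes
candidates = ["A", "B", "C", "D"]
topo = shortest_path_scores(graph, candidates, seeds)
final = blend_with_go(topo, go_sim={"A": 0.2, "B": 0.8, "C": 0.5})
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)  # ['B', 'C', 'A', 'D']
```

Here A and B tie on topology alone, and the GO term breaks the tie, which is the kind of effect the functional-similarity extension is meant to add.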
14.
Background
Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are more likely than non-interacting proteins to be in similar locations or involved in similar biological processes. Thus, the more semantically similar the gene function annotations of the interacting proteins are, the more likely the interaction is to be physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies across the cellular location, molecular function and biological process ontologies of GO, and thus may over- or underestimate similarity.
15.
16.
Saudi Journal of Biological Sciences, 2019, 26(8): 1986-1990
Objective
Autophagy is a cellular pathway that regulates the transport of cytoplasmic macromolecules and organelles to the lysosome and their degradation there, and it is often related to both tumorigenesis and tumor suppression. Here, we investigate the regulatory effect of the PTEN gene on the autophagy-related protein p62 in rat colorectal cancer (CRC) cells and explore the clinical application value of the PTEN gene.
Methods
Colorectal cancer was induced in male ACI rats by intraperitoneal injection of 1,2-dimethylhydrazine. A total of 20 rats were randomly selected from those in which CRC was successfully induced as the experimental group, while 10 healthy rats served as controls. The rat CRC cells were isolated and cultured. After the rat CRC cells were transfected with the pEGFP-N1-PTEN plasmid, RT-PCR was used to examine the gene expression of p62 and PTEN, while Western blotting was used to detect the protein expression of p62 and PTEN. The proliferation of CRC cells was also measured by MTT assay.
Results
The expression of the PTEN gene in the experimental group was significantly inhibited compared with the control group, while the expression of the p62 gene was significantly increased (p < 0.05). Western blotting demonstrated that PTEN protein levels in the experimental group were lower, while p62 protein expression was higher. When the CRC cells were transfected with the pEGFP-N1-PTEN plasmid, PTEN expression was elevated, while p62 was down-regulated. The proliferation of CRC cells was also inhibited.
Conclusion
The expression of the PTEN gene is negatively correlated with the expression of the p62 gene in rat CRC cells. Expression of the PTEN gene can inhibit the occurrence and development of colorectal cancer, providing a theoretical basis for future clinical treatment.
17.
18.
19.
Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation
MOTIVATION: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or 'semantic similarity' between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. RESULTS: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. AVAILABILITY: Software available from http://www.russet.org.uk.
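One way to lift term-level semantic similarity to whole database entries, so that entries can be compared much as sequences are, is to average a term measure over all pairs of their annotations. The sketch uses Resnik's measure (the information content of the most informative common ancestor); the ontology fragment, probabilities and annotation sets are hypothetical, and the averaging choice is an assumption rather than a statement of this paper's exact procedure.

```python
import math
from itertools import product

# Hypothetical GO fragment (term -> parents) and annotation probabilities p(t).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "ion_binding": ["binding"],
    "protein_binding": ["binding"],
    "catalytic_activity": ["root"],
}
P = {
    "root": 1.0,
    "binding": 0.5,
    "ion_binding": 0.1,
    "protein_binding": 0.25,
    "catalytic_activity": 0.4,
}


def ancestors(term):
    """Term plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(t1, t2):
    """Information content -log p of the most informative common ancestor."""
    return max(-math.log(P[a]) for a in ancestors(t1) & ancestors(t2))


def average_similarity(terms1, terms2):
    """Mean of the term-level similarity over all annotation pairs."""
    pairs = list(product(terms1, terms2))
    return sum(resnik(a, b) for a, b in pairs) / len(pairs)


protein_x = {"ion_binding", "catalytic_activity"}   # hypothetical annotations
protein_y = {"protein_binding"}
print(round(average_similarity(protein_x, protein_y), 4))  # 0.3466
```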
20.
MOTIVATIONS: Bi-clustering is an important approach in microarray data analysis. The underlying bases for using bi-clustering in the analysis of gene expression data are that (1) similar genes may exhibit similar behaviors only under a subset of conditions, not all conditions, and (2) genes may participate in more than one function, resulting in one regulation pattern in one context and a different pattern in another. Using bi-clustering algorithms, one can obtain sets of genes that are co-regulated under subsets of conditions. RESULTS: We develop a polynomial time algorithm to find an optimal bi-cluster with the maximum similarity score. To our knowledge, this is the first formulation for bi-cluster problems that admits a polynomial time algorithm for optimal solutions. The algorithm works for a special case, where the bi-clusters are approximately squares. We then extend the algorithm to handle various other cases. Experiments on simulation data and real data show that the new algorithms outperform most of the existing methods in many cases. Our new algorithms have the following advantages: (1) no discretization procedure is required, (2) they perform well for overlapping bi-clusters and (3) they work well for additive bi-clusters. AVAILABILITY: The software is available at http://www.cs.cityu.edu.hk/~liuxw/msbe/help.html.
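The "additive bi-clusters" mentioned above are sub-matrices whose entries follow a_ij = mu + r_i + c_j (a row effect plus a column effect). The Cheng-Church mean squared residue, a standard bi-cluster quality measure that is not the similarity score of this paper, is exactly zero for such a sub-matrix, which makes it a convenient check. A minimal NumPy sketch with made-up data:

```python
import numpy as np


def mean_squared_residue(sub):
    """Cheng-Church mean squared residue: 0 for a perfect additive bi-cluster."""
    sub = np.asarray(sub, dtype=float)
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    residue = sub - row_means - col_means + sub.mean()
    return float(np.mean(residue ** 2))


# Made-up additive pattern: gene (row) effects plus condition (column) effects.
row_effects = np.array([0.0, 1.0, 3.0])
col_effects = np.array([10.0, 20.0, 30.0, 40.0])
additive = 5.0 + row_effects[:, None] + col_effects[None, :]

# Embed it in a larger matrix and read it back with np.ix_ (rows 1-3, columns 0, 2, 4, 5).
data = np.zeros((5, 6))
rows, cols = [1, 2, 3], [0, 2, 4, 5]
data[np.ix_(rows, cols)] = additive

print(mean_squared_residue(data[np.ix_(rows, cols)]) < 1e-12)  # True
print(mean_squared_residue([[1.0, 0.0], [0.0, 1.0]]))          # 0.25
```

Because no thresholding of the expression values is involved, the check works directly on continuous data, in line with the advantage of requiring no discretization.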