The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are meant to act as repositories for our biological knowledge as we acquire and refine it. Hence, by definition, they are incomplete at any given time. In this paper, we describe a technique that improves our previous method for predicting novel GO annotations by extracting implicit semantic relationships between genes and functions. In this work, we use a vector space model and a number of weighting schemes in addition to our previous latent semantic indexing approach. The technique described here takes into consideration the hierarchical structure of the Gene Ontology (GO) and can assign different weights to GO terms situated at different depths. The prediction abilities of 15 different weighting schemes are compared and evaluated. Nine such schemes were previously used in other problem domains, while six of them are introduced in this paper. The best weighting scheme was a novel scheme, n2tn. Out of the top 50 functional annotations predicted using this weighting scheme, we found support in the literature for 84 percent of them, while 6 percent of the predictions were contradicted by the existing literature. For the remaining 10 percent, we did not find any relevant publications to confirm or contradict the predictions. The n2tn weighting scheme also outperformed the simple binary scheme used in our previous approach.
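As a rough illustration of the vector space idea in the abstract above, the sketch below represents each gene as a weighted vector of GO terms and compares genes by cosine similarity. The weighting used here is a generic inverse-frequency weight with cosine normalization; it is an assumption chosen for illustration and is not the n2tn scheme, whose definition is not given in the abstract. The gene names and term labels are hypothetical.

```python
import math

def idf_weights(annotations):
    """Map each term to log(N / df), where df is the number of genes annotated with it.

    annotations: dict mapping gene name -> set of term labels.
    """
    n_genes = len(annotations)
    doc_freq = {}
    for terms in annotations.values():
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_genes / df) for term, df in doc_freq.items()}

def weighted_vector(terms, idf):
    """Build a unit-length (cosine-normalized) sparse vector for one gene."""
    vec = {term: idf[term] for term in terms}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return vec  # every term is uninformative; leave the zero vector as-is
    return {term: w / norm for term, w in vec.items()}

def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())

# Hypothetical toy data: "t_common" annotates every gene, so its weight is zero.
annotations = {
    "g1": {"t_common", "t_rare"},
    "g2": {"t_common", "t_rare"},
    "g3": {"t_common", "t_other"},
}
idf = idf_weights(annotations)
vectors = {g: weighted_vector(terms, idf) for g, terms in annotations.items()}
print(cosine(vectors["g1"], vectors["g2"]))
print(cosine(vectors["g1"], vectors["g3"]))
```

In this toy setting the term shared by all genes contributes nothing to similarity, which is the kind of effect a weighting scheme is meant to produce compared with a simple binary scheme.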



Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task.


We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that can quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results.


The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.

Systemic lupus erythematosus (SLE), commonly known as “the great imitator”, is a highly complex disease involving multiple susceptibility genes and non-specific symptoms. Many experimental and computational approaches have been used to investigate candidate genes related to the disease. However, limited knowledge of the correlation between gene function and disease, together with the lack of complete functional details for the majority of genes in susceptibility loci, hinders the identification of SLE-related candidate genes. In this paper, we have studied the human immunome network (undirected) using various graph-theoretical centrality measures integrated with Gene Ontology terms to predict new candidate genes. As a result, we have identified 8 candidate genes, which may act as potential targets for SLE. We have also carried out the same analysis by replacing the human immunome network with the human immunome signaling network (directed), and as an outcome we have obtained 5 candidate genes as potential targets for SLE. From the comparison study, we have found that these two approaches are complementary in nature.
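The abstract above does not say which centrality measures were used, so the following is only a minimal sketch of two common ones, degree and closeness centrality, on an undirected network stored as an adjacency dictionary. The node names are hypothetical and the graph is a toy star, not the immunome network.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to.

    adj: dict mapping node -> set of neighbours (undirected, at least 2 nodes).
    """
    n = len(adj)
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}

def closeness_centrality(adj):
    """(reachable nodes) / (sum of shortest-path lengths), via breadth-first search."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical toy network: one hub connected to three otherwise isolated nodes.
network = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
print(degree_centrality(network))
print(closeness_centrality(network))
```

Ranking nodes by such scores and then filtering by shared annotation terms is one plausible way to shortlist candidates, though the study's actual integration step may differ.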

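Mining the biological meaning contained in high-throughput experimental data is a major challenge facing proteomics research. Based on the hierarchically structured vocabulary GO (Gene Ontology) and the protein function annotations in related databases, we developed a strategy for the functional analysis of expression profiles obtained in proteomics research. Starting from the functional annotation of a protein expression profile, the strategy reports the distribution of protein functions within the profile, together with statistics for the function categories of interest. This supports an overall understanding of the functions of the proteins in the profile as well as deeper bioinformatic analysis. The strategy has been successfully applied in a study of the fetal liver protein expression profile; users can use or download our program at http://www.hupo.org.cn/GOfact/.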
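A distribution of functions over an expression profile can be tallied very simply once each protein has been mapped to its function categories. The sketch below assumes such a mapping already exists; the protein identifiers and category labels are hypothetical, and this is not the GOfact implementation.

```python
from collections import Counter

def function_distribution(protein_to_categories):
    """Count, for each category, how many proteins in the profile carry it.

    protein_to_categories: dict mapping protein id -> iterable of category labels.
    Returns a dict mapping category -> (count, fraction of all proteins).
    """
    counts = Counter()
    for categories in protein_to_categories.values():
        counts.update(set(categories))  # count each protein once per category
    total = len(protein_to_categories)
    return {cat: (n, n / total) for cat, n in counts.most_common()}

# Hypothetical toy profile; P3 has no annotation and only affects the denominator.
profile = {
    "P1": ["binding", "catalytic activity"],
    "P2": ["binding"],
    "P3": [],
}
for category, (count, fraction) in function_distribution(profile).items():
    print(category, count, round(fraction, 2))
```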

The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily appreciable scientific textual ‘tokens’ from large gene sets either relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employs Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! can generate meaningful output data even with very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.



The Gene Ontology (GO) is used to describe genes and gene products from many organisms. When used for functional annotation of microarray data, GO is often slimmed by editing so that only higher-level terms remain. This practice is designed to improve both the summarizing of experimental results, by grouping high-level terms, and the statistical power of GO term enrichment analysis.
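The abstract does not say which enrichment statistic is meant. A common choice for GO term enrichment is a one-sided hypergeometric test, sketched below under that assumption; the parameter names are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, drawn, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, annotated, drawn).

    population: number of genes on the array
    annotated:  genes in the population that carry the GO term
    drawn:      size of the gene list under study
    observed:   genes in the list that carry the GO term
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, drawn)

# Toy numbers: 10 genes, 5 annotated, a list of 5 genes all carrying the term.
print(hypergeom_enrichment_p(10, 5, 5, 5))
```

Grouping annotations under fewer, higher-level terms raises the per-term counts and reduces the number of terms tested, which is one way to read the claim about statistical power.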

Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Despite remarkable success in uncovering many risk variants and providing novel insights into disease biology, genetic variants identified to date fail to explain the vast majority of the heritability for most complex diseases. One explanation is that there are still a large number of common variants that remain to be discovered, but their effect sizes are generally too small to be detected individually. Accordingly, gene set analysis of GWAS, which examines a group of functionally related genes, has been proposed as a complementary approach to single-marker analysis. Here, we propose a flexible and adaptive test for gene sets (FLAGS), using summary statistics. Extensive simulations showed that this method has an appropriate type I error rate and outperforms existing methods with increased power. As a proof of principle, through real data analyses of Crohn’s disease GWAS data and bipolar disorder GWAS meta-analysis results, we demonstrated the superior performance of FLAGS over several state-of-the-art association tests for gene sets. Our method allows for the more powerful application of gene set analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available.

Automated Gene Ontology annotation for anonymous sequence data   Total citations: 9, self-citations: 1, citations by others: 9

MOTIVATION: To improve the ability of biologists (both researchers and students) to ask biologically interesting questions of the Gene Ontology (GO) database and to explore the ontologies by seeing large portions of the ontology graphs in context, along with details of individual terms in the ontologies. RESULTS: GoGet and GoView are two new tools built as part of an extensible web application system based on Java 2 Enterprise Edition technology. GoGet has a user interface that enables users to ask biologically interesting questions, such as (1) What are the DNA binding proteins involved in DNA repair, but not in DNA replication? and (2) Of the terms containing the word triphosphatase, which have associated gene products from mouse, but not fruit fly? The results of such queries can be viewed in a collapsed tabular format that eases the burden of getting through large tables of data. GoView enables users to explore the large directed acyclic graph structure of the ontologies in the GO database. The two tools are coordinated, so that results from queries in GoGet can be visualized in GoView in the ontology in which they appear, and explorations started from GoView can request details of gene product associations to appear in a result table in GoGet. AVAILABILITY: Free access to the GoGet query tool and free download of the GoView ontology viewer are provided to all users at http://db.math.macalester.edu/goproject. In addition, source code for the GoView tool is also available from this site, along with a user manual for both tools.

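To study the function of the 5′ regulatory region of the human farnesoid X receptor (FXR) gene, we first carried out a bioinformatic analysis of the gene and then used 5′ RACE to determine that the base at its transcription start site is A. Using PCR, we amplified the 5′ upstream sequence of the FXR gene from human genomic DNA and built five luciferase reporter gene constructs containing promoter sequences of different lengths. These were transiently transfected into HepG2 cells and their luciferase activity was measured. The results showed no obvious difference in promoter activity between the -1651 to +200 and -1496 to +200 regions; the -847 to +200 region had the highest promoter activity, and the promoter activity of the -544 to +200 region was significantly lower than that. The study suggests that the promoter sequence essential for FXR gene transcription lies within the -847 to -544 range.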

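Based on larch transcriptome sequencing data, the full-length coding sequence (1590 bp) of an APETALA2-like transcription factor gene was isolated from larch by RT-PCR and named LaAP2L1. Protein characterization showed that LaAP2L1 encodes 529 amino acids and is a hydrophilic protein with a relative molecular mass of 58.327 kDa and an isoelectric point of 6.45. Its secondary structure consists mainly of α-helices (alpha helix), β-sheets (strand) and loops. Multiple sequence alignment and phylogenetic analysis showed that the amino acid sequence encoded by LaAP2L1 is most closely related to those of black pine, spruce and related species. In addition, a plant expression vector was constructed and the model plant Arabidopsis was transformed by the floral dip method. Compared with control Arabidopsis transformed with the empty vector, transgenic Arabidopsis overexpressing LaAP2L1 showed significantly larger leaves, stems and flowers and significantly greater plant height, indicating that the larch LaAP2L1 transcription factor may be involved in regulating plant organ development.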

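WTF1 (What’s this factor 1) belongs to a class of proteins containing the “Domain of Unknown Function 860” (DUF860). These proteins are specifically localized to the chloroplasts or mitochondria of plant cells and function in intron splicing. While studying the heat-stress transcription profile of wheat (Triticum aestivum), we found a probe containing this domain whose expression was induced by high temperature. By cloning the corresponding gene, TaWTF1, and analyzing it in detail, we found that its promoter region contains stress- and hormone-responsive elements, including HSE, drought, GA and SA elements, and that the gene is induced by heat stress at both the seedling and flowering stages. At the flowering stage, TaWTF1 expression in ordinary leaves was significantly higher than in other tissues and organs, including the flag leaf. Overexpressing the gene in Arabidopsis thaliana significantly increased the survival rate of transgenic plants under heat stress, indicating that TaWTF1 is involved in plant heat tolerance. This study opens a new area for dissecting the molecular mechanisms of plant heat tolerance and provides a candidate gene for the molecular breeding of heat-tolerant crops.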

Shao Wei, He Lihong, Chen Qingxiu, Li Jiang, Deng Fei, Wang Hualin, Hu Zhihong, Wang Manli. Virologica Sinica, 2019, 34(6): 701-711
Virologica Sinica - Baculoviridae is a family of large DNA viruses that specifically infect insects. It contains four genera, Alpha-, Beta-, Gamma-, and Deltabaculovirus. Alphabaculovirus is...

R Abo, GD Jenkins, L Wang, BL Fridley. PLoS ONE, 2012, 7(8): e43301
Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods to map genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods; however, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Consideration of different approaches to account for the high dimensionality and multiple testing issues may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set - expression quantitative trait loci analysis (GS-eQTL). The method uses GSs to initially group SNPs and mRNA expression, followed by the application of principal components analysis (PCA) to collapse the variation and reduce the dimensionality within the GSs. We applied GS-eQTL to assess the association between SNP and mRNA expression level data collected from a cell-based model system using PharmGKB and KEGG defined GSs. We observed a large number of significant GS-eQTL associations, in which the most significant associations arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations in eQTL studies and provides biological context for SNP-expression associations.



Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained on only one species, are not available for high-volume data processing, or require the use of data derived by experiments such as microarray analysis. To meet the increasing need for high-throughput, automated annotation of fungal genomes, we have developed a tool for annotating fungal protein sequences with terms from the Gene Ontology.

Ontologies support automatic sharing, combination and analysis of life sciences data. They undergo regular curation and enrichment. We studied the impact of an ontology's evolution on its structural complexity. As a case study we used the sixty monthly releases between January 2008 and December 2012 of the Gene Ontology and its three independent branches, i.e. biological processes (BP), cellular components (CC) and molecular functions (MF). For each case, we measured complexity by computing metrics related to the size, the node connectivity and the hierarchical structure. The number of classes and relations increased monotonically for each branch, with different growth rates. BP and CC had similar connectivity, superior to that of MF. Connectivity increased monotonically for BP, decreased for CC and remained stable for MF, with a marked increase for the three branches in November and December 2012. Hierarchy-related measures showed that CC and MF had similar proportions of leaves, average depths and average heights. BP had a lower proportion of leaves, and a higher average depth and average height. For BP and MF, the late 2012 increase of connectivity resulted in an increase of the average depth and average height and a decrease of the proportion of leaves, indicating that a major enrichment effort of the intermediate-level hierarchy occurred. The variation of the number of classes and relations in an ontology does not provide enough information about the evolution of its complexity. However, connectivity and hierarchy-related metrics revealed different patterns of values as well as of evolution for the three branches of the Gene Ontology. CC was similar to BP in terms of connectivity, and similar to MF in terms of hierarchy. Overall, BP complexity increased, CC was refined with the addition of leaves providing a finer level of annotations but decreasing slightly its complexity, and MF complexity remained stable.
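To make the size, connectivity and hierarchy metrics above concrete, here is a minimal sketch that computes a few of them on a small directed acyclic graph. The exact metric definitions used in the study are not given in the abstract, so the choices here are assumptions: connectivity is taken as relations per class, and the depth of a class as the length of its longest path to a root. The class names are hypothetical.

```python
from functools import lru_cache

def ontology_metrics(parents):
    """Compute simple structural metrics for an ontology given as a DAG.

    parents: dict mapping each class -> list of its direct parents
             (root classes map to an empty list).
    """
    has_child = {p for ps in parents.values() for p in ps}
    leaves = [c for c in parents if c not in has_child]

    @lru_cache(maxsize=None)
    def depth(cls):
        # Assumed definition: longest path from the class up to a root.
        ps = parents[cls]
        return 0 if not ps else 1 + max(depth(p) for p in ps)

    n_classes = len(parents)
    n_relations = sum(len(ps) for ps in parents.values())
    return {
        "classes": n_classes,
        "relations": n_relations,
        "relations_per_class": n_relations / n_classes,
        "leaf_proportion": len(leaves) / n_classes,
        "average_depth": sum(depth(c) for c in parents) / n_classes,
    }

# Hypothetical toy ontology: c has two parents, d is the only leaf.
toy = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"], "d": ["c"]}
print(ontology_metrics(toy))
```

Running the same function on successive releases and comparing the resulting dictionaries is one simple way to track how such metrics change over time.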
