Similar Documents
2.

Background

Large-scale sequencing projects have now become routine lab practice, and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task.

Methodology

We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that can quickly process thousands of sequences for functional inference. For the first time, the tool exploits an integrated approach that combines clustering of GO terms, based on their semantic similarities, with a weighting scheme that assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods; in this work, ARGOT processing is based on BLAST results.
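A minimal sketch can illustrate the hit-weighting idea. It assumes each BLAST hit has been reduced to an E-value plus the set of GO IDs annotated to that hit. It then scores each GO term by summing -log10(E-value) over the hits that carry it. This is a simplified stand-in, not ARGOT's actual weighting scheme, and it omits the semantic-similarity clustering of GO terms. The hit list is hypothetical.

```python
import math
from collections import defaultdict


def score_go_terms(hits):
    """Rank GO terms for one query sequence from its BLAST hits.

    hits: list of (e_value, go_ids) pairs, where go_ids is the set of GO IDs
    annotated to that hit. Each hit adds -log10(e_value) to every GO term
    it carries (an assumed, simplified weighting, not ARGOT's own).
    """
    scores = defaultdict(float)
    for e_value, go_ids in hits:
        # Clamp an E-value of 0 (reported for near-identical hits) to avoid log10(0).
        weight = -math.log10(max(e_value, 1e-300))
        for go_id in go_ids:
            scores[go_id] += weight
    # Terms with the most accumulated evidence come first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical hits for a single query sequence.
hits = [
    (1e-50, {"GO:0003677", "GO:0006281"}),
    (1e-10, {"GO:0003677"}),
    (1e-3, {"GO:0005524"}),
]
for go_id, score in score_go_terms(hits):
    print(go_id, round(score, 1))
```

Terms supported by several strong hits accumulate the most weight. ARGOT additionally clusters the retrieved GO terms by semantic similarity, which this sketch does not attempt.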

Conclusions

The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins used for comparison with other available tools. The algorithm outperformed existing methods and, owing to its high sensitivity, specificity and coverage, proved suitable for function prediction of single proteins.

4.
Systemic lupus erythematosus (SLE), commonly referred to as “the great imitator”, is a highly complex disease with non-specific symptoms and susceptibility involving multiple genes. Many experimental and computational approaches have been used to investigate disease-related candidate genes. However, limited knowledge of gene function and disease correlation has hindered the identification of SLE-related candidate genes. So has the lack of complete functional details for most genes in the susceptibility loci. In this paper, we studied the (undirected) human immunome network using various graph-theoretical centrality measures integrated with Gene Ontology terms to predict new candidate genes. As a result, we identified 8 candidate genes that may act as potential targets for SLE. We also carried out the same analysis with the human immunome network replaced by the (directed) human immunome signaling network, and obtained 5 candidate genes as potential targets for SLE. The comparison showed that the two approaches are complementary.
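To make the graph-theoretic part concrete, the sketch below computes two common centrality measures, degree and closeness, on a small undirected network. It then ranks the genes that carry a chosen GO term by their mean centrality. Several things here are assumptions for illustration: the network, the GO filter (GO:0006955, immune response) and the equal-weight combination. The abstract does not state which centrality measures or scoring the authors combined.

```python
from collections import deque


def degree_centrality(graph):
    """Degree of each node divided by (n - 1), for an undirected adjacency dict."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """(number of other reachable nodes) / (sum of BFS distances) for each node."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def rank_candidates(graph, annotations, go_term):
    """Rank genes annotated with go_term by mean degree and closeness (assumed score)."""
    dc = degree_centrality(graph)
    cc = closeness_centrality(graph)
    genes = [g for g in graph if go_term in annotations.get(g, set())]
    return sorted(genes, key=lambda g: (dc[g] + cc[g]) / 2, reverse=True)


# Hypothetical toy network (undirected: each edge is listed in both directions).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
annotations = {"A": {"GO:0006955"}, "D": {"GO:0006955"}, "E": {"GO:0006955"}}
print(rank_candidates(graph, annotations, "GO:0006955"))  # ['A', 'D', 'E']
```

Closeness here counts only the nodes reachable from each source, so it stays defined on disconnected networks. For the directed signaling network, the BFS would follow edge direction instead.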

5.
The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial for understanding biological themes, validating experimental data and eventually planning future experiments. To derive biomedically relevant information from simple gene lists, a mathematical association with scientific language and meaningful words or sentences is needed. Unfortunately, existing software for deriving meaningful and easily appreciable scientific textual ‘tokens’ from large gene sets falls short in one of two ways. Either it relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta), or it employs Boolean text searching and co-occurrence models that cannot detect indirect links in the literature. As an improvement on existing web-based informatics tools, we have developed Textrous!, a web-based framework for extracting biomedical semantic meaning from an input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, part-of-speech tagging and noun-phrase chunking. With these it mines MEDLINE abstracts, PubMed Central articles, articles from Online Mendelian Inheritance in Man (OMIM) and Mammalian Phenotype annotations obtained from The Jackson Laboratory. Textrous! can generate meaningful output even from very small input datasets. It uses two different text extraction methodologies (collective and individual) to select, rank, cluster and visualize English words obtained from the user data. Textrous! therefore facilitates the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual genes and batch genomic data.

8.

Background  

The Gene Ontology (GO) is used to describe genes and gene products from many organisms. When GO is used for functional annotation of microarray data, it is often "slimmed" by editing so that only higher-level terms remain. This practice is intended to improve two things: the summarizing of experimental results, by grouping annotations under high-level terms, and the statistical power of GO term enrichment analysis.
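Slimming can be pictured as mapping each annotation up the DAG to the slim terms above it. In the minimal sketch below, the term names, the simplified parent links and the slim set are hypothetical. A real GO slim mapping would read the ontology file and follow the relationship types over which the slim is defined.

```python
from collections import Counter


def ancestors(term, parents):
    """Return term plus every ancestor reachable through the parents map."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(annotations, parents, slim):
    """Replace each gene's specific terms with the slim terms above them."""
    return {
        gene: set().union(*(ancestors(t, parents) & slim for t in terms))
        for gene, terms in annotations.items()
    }


# Hypothetical, simplified fragment of a GO-like DAG (child -> parents).
parents = {
    "mismatch repair": {"DNA repair"},
    "DNA repair": {"DNA metabolic process", "response to stress"},
    "DNA replication": {"DNA metabolic process"},
    "DNA metabolic process": {"biological_process"},
    "response to stress": {"biological_process"},
}
slim = {"DNA metabolic process", "response to stress"}
annotations = {
    "g1": {"mismatch repair"},
    "g2": {"DNA replication"},
    "g3": {"DNA repair", "DNA replication"},
}

mapped = map_to_slim(annotations, parents, slim)
# Count genes per slim term, the kind of summary a slim is used for.
counts = Counter(term for terms in mapped.values() for term in terms)
print(counts)
```

One specific term can sit under several slim terms, so a gene may be counted under more than one slim term. This is worth keeping in mind when slim counts feed enrichment statistics.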

9.
MOTIVATION: To improve the ability of biologists (both researchers and students) to ask biologically interesting questions of the Gene Ontology (GO) database, and to explore the ontologies by viewing large portions of the ontology graphs in context, along with details of individual terms. RESULTS: GoGet and GoView are two new tools built as part of an extensible web application system based on Java 2 Enterprise Edition technology. GoGet has a user interface that enables users to ask biologically interesting questions such as these:
(1) Which DNA binding proteins are involved in DNA repair but not in DNA replication?
(2) Of the terms containing the word triphosphatase, which have associated gene products from mouse but not from fruit fly?
The results of such queries can be viewed in a collapsed tabular format that eases the burden of working through large tables of data. GoView enables users to explore the large directed acyclic graph structure of the ontologies in the GO database. The two tools are coordinated. Results from GoGet queries can be visualized in GoView within the ontology in which they appear. Conversely, explorations started from GoView can request details of gene product associations to appear in a GoGet result table. AVAILABILITY: Free access to the GoGet query tool and free download of the GoView ontology viewer are provided to all users at http://db.math.macalester.edu/goproject. Source code for the GoView tool is also available from this site, along with a user manual for both tools.
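Both example questions reduce to set operations over annotation tables. The sketch below assumes annotations have already been propagated to ancestor terms, so a protein annotated to a child of "DNA repair" is also listed under "DNA repair". The gene products, term names and species sets are hypothetical, and the sketch does not reflect how GoGet itself queries the GO database.

```python
def query_genes(annotations, include, exclude=()):
    """Gene products annotated with every term in include and none in exclude."""
    return {
        gene for gene, terms in annotations.items()
        if all(t in terms for t in include) and not any(t in terms for t in exclude)
    }


def query_terms(term_species, word, present, absent):
    """Terms whose name contains word, with gene products from present but not absent."""
    return sorted(
        term for term, species in term_species.items()
        if word in term and present in species and absent not in species
    )


# Hypothetical annotations (already propagated to ancestor terms).
annotations = {
    "p1": {"DNA binding", "DNA repair"},
    "p2": {"DNA binding", "DNA repair", "DNA replication"},
    "p3": {"DNA repair"},
    "p4": {"DNA binding"},
}
# Question (1): DNA binding proteins in DNA repair but not in DNA replication.
print(query_genes(annotations, ["DNA binding", "DNA repair"], ["DNA replication"]))

# Hypothetical term -> species-with-gene-products table.
term_species = {
    "nucleoside-triphosphatase activity": {"mouse", "fruit fly"},
    "polynucleotide 5'-triphosphatase activity": {"mouse"},
    "ATPase activity": {"mouse"},
}
# Question (2): triphosphatase terms with mouse but not fruit fly products.
print(query_terms(term_species, "triphosphatase", "mouse", "fruit fly"))
```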

10.
Automated Gene Ontology annotation for anonymous sequence data

11.

Background  

Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many previously reported function annotation methods are of limited use for fungal protein annotation. They are often trained on only one species, are not available for high-volume data processing, or require experimentally derived data such as microarray analyses. To meet the increasing need for high-throughput, automated annotation of fungal genomes, we have developed a tool for annotating fungal protein sequences with terms from the Gene Ontology.

12.
Ontologies support the automatic sharing, combination and analysis of life sciences data. They undergo regular curation and enrichment. We studied the impact of ontology evolution on structural complexity. As a case study, we used the sixty monthly releases of the Gene Ontology between January 2008 and December 2012, together with its three independent branches: biological process (BP), cellular component (CC) and molecular function (MF). For each case, we measured complexity by computing metrics related to size, node connectivity and hierarchical structure.

The number of classes and relations increased monotonically for each branch, with different growth rates. BP and CC had similar connectivity, higher than that of MF. Connectivity increased monotonically for BP, decreased for CC and remained stable for MF. All three branches then showed a marked increase in November and December 2012. Hierarchy-related measures showed that CC and MF had similar proportions of leaves, average depths and average heights. BP had a lower proportion of leaves, and a higher average depth and average height. For BP and MF, the late-2012 increase in connectivity raised the average depth and average height and lowered the proportion of leaves. This indicates a major enrichment effort at the intermediate levels of the hierarchy.

The variation in the number of classes and relations in an ontology does not provide enough information about the evolution of its complexity. However, connectivity and hierarchy-related metrics revealed different patterns of values, as well as of evolution, for the three branches of the Gene Ontology. CC was similar to BP in terms of connectivity, and similar to MF in terms of hierarchy. Overall, BP complexity increased. CC was refined with the addition of leaves, which provided a finer level of annotation but slightly decreased its complexity. MF complexity remained stable.
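The size, connectivity and hierarchy metrics can all be computed from the is_a graph of a single release. The sketch below relies on definitions that the abstract does not spell out, so each is an assumption: connectivity as relations per class, depth as the shortest path from the root, and height as the longest path down to a leaf. The study's own definitions may differ, and the five-class ontology is hypothetical.

```python
from collections import deque


def ontology_metrics(parents):
    """Structural metrics for a DAG given as class -> set of parent classes.

    Assumed definitions: connectivity = relations / classes; depth = shortest
    path from a root; height = longest path from a class down to a leaf.
    """
    classes = set(parents) | {p for ps in parents.values() for p in ps}
    children = {c: set() for c in classes}
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    n = len(classes)
    n_relations = sum(len(ps) for ps in parents.values())
    roots = [c for c in classes if not parents.get(c)]
    leaves = [c for c in classes if not children[c]]

    # Depth: breadth-first search downward from the root(s).
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)

    # Height: memoized longest downward path to a leaf.
    height = {}

    def h(c):
        if c not in height:
            height[c] = 1 + max(h(k) for k in children[c]) if children[c] else 0
        return height[c]

    return {
        "classes": n,
        "relations": n_relations,
        "connectivity": n_relations / n,
        "leaf_proportion": len(leaves) / n,
        "avg_depth": sum(depth.values()) / n,
        "avg_height": sum(h(c) for c in classes) / n,
    }


# Hypothetical mini-ontology: "d" has two parents, as GO classes often do.
release = {"b": {"root"}, "c": {"root"}, "d": {"b", "c"}, "e": {"b"}}
print(ontology_metrics(release))
```

Running this on successive releases and comparing the resulting dictionaries gives the kind of time series the study describes.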

13.
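The aim was to construct a trans-splicing ribozyme that repairs a mutant green fluorescent protein (GFP) gene. We built two recombinant plasmids carrying the mutant GFP gene, XYQ5/10-pGEM and XYQ5/10-pEGFP-C2. We also built trans-rib-CMV2, a trans-splicing ribozyme vector for repairing the mutant gene. Extracellular splicing by the ribozyme was assessed by RT-PCR of the RNA products from in vitro co-transcription of the XYQ5/10-pGEM and trans-rib-CMV2 plasmids. Intracellular splicing was assessed by co-transfecting HeLa cells with the XYQ5/10-pEGFP-C2 and trans-rib-CMV2 plasmids. The XYQ5/10-pGEM, XYQ5/10-pEGFP-C2 and trans-rib-CMV2 recombinant plasmids were constructed successfully. The trans-splicing ribozyme repaired the mutant gene both outside and inside cells. Although the efficiency was low, this work lays a foundation for designing and studying trans-splicing ribozymes on a larger scale.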

14.
Liver injuries caused by ingestion of, or exposure to, chemicals and industrial toxicants pose a serious health risk. This risk may be hard to assess because non-invasive diagnostic tests are lacking. Mapping chemical injuries to organ-specific damage and clinical outcomes via biomarkers or biomarker panels will provide the foundation for highly specific and robust diagnostic tests. Here, we used DrugMatrix to identify groups of co-expressed genes (modules) specific to injury endpoints in the liver. DrugMatrix is a toxicogenomics database containing organ-specific gene expression data matched to dose-dependent chemical exposures and adverse clinical pathology assessments in Sprague Dawley rats. We identified 78 such gene co-expression modules associated with 25 diverse injury endpoints, categorized as clinical pathology, organ weight changes and histopathology. Using gene expression data associated with an injury condition, we showed that these modules exhibited patterns of activation characteristic of each injury. We further showed that specific module genes mapped to 1) known biochemical pathways associated with liver injuries and 2) clinically used diagnostic tests for liver fibrosis. The gene modules therefore have characteristics of both generalized and specific toxic response pathways. From these results, we proposed three gene signature sets, characteristic of liver fibrosis, steatosis and general liver injury, based on genes from the co-expression modules. Of the 92 genes identified, 18 (20%) have well-documented relationships with liver disease, whereas the rest are novel and have not previously been associated with liver disease. In conclusion, identifying gene co-expression modules associated with chemically induced liver injuries aids in generating testable hypotheses and has the potential to identify putative biomarkers of adverse health effects.
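As a rough picture of what a co-expression module is, the sketch below links genes whose expression profiles are strongly correlated across samples and reports each connected group as a module. The correlation threshold, the use of absolute correlation and the toy expression matrix are all assumptions. The abstract does not describe the module-detection method the authors applied to DrugMatrix.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def coexpression_modules(expr, threshold=0.9):
    """Modules = connected components of the graph linking genes with |r| >= threshold.

    expr maps gene -> expression values across the same ordered samples.
    """
    adj = {g: set() for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    modules, seen = [], set()
    for gene in expr:
        if gene in seen:
            continue
        component, stack = set(), [gene]
        while stack:
            u = stack.pop()
            if u not in component:
                component.add(u)
                stack.extend(adj[u] - component)
        seen |= component
        if len(component) > 1:  # singletons are not treated as modules
            modules.append(component)
    return modules


# Hypothetical expression values for four genes across four exposures.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 1, 3],
}
print(coexpression_modules(expr))
```

Because absolute correlation is used, g3, which is perfectly anti-correlated with g1, lands in the same module. Dedicated methods such as weighted gene co-expression network analysis (WGCNA) instead build soft-thresholded networks and cluster them hierarchically.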

15.
R Abo, GD Jenkins, L Wang, BL Fridley. PLoS ONE 2012, 7(8): e43301
Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods for mapping genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods. However, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Approaches that account for the high dimensionality and the multiple-testing burden may provide greater efficiency and statistical power. Here we present a novel approach to modeling and testing the association between genetic variation and mRNA expression levels in the context of gene sets (GSs) and pathways. We call it gene set expression quantitative trait loci analysis (GS-eQTL). The method first uses GSs to group SNPs and mRNA expression, then applies principal components analysis (PCA) to collapse the variation and reduce the dimensionality within each GS. We applied GS-eQTL to SNP and mRNA expression data collected from a cell-based model system, using GSs defined by PharmGKB and KEGG. We observed a large number of significant GS-eQTL associations. The most significant arose between genetic variation and mRNA expression from the same GS, although a number of associations involving genetic variation and mRNA expression from different GSs were also identified. The proposed GS-eQTL method effectively addresses the multiple-testing limitations of eQTL studies and provides biological context for SNP-expression associations.
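The collapsing step can be sketched as follows. Within one gene set, the SNP genotypes and the mRNA expression values are each summarized by their first principal component. The association between the two summaries is then tested with a single correlation instead of all SNP-gene pairs. Several choices here are assumptions made for this illustration: the power-iteration PCA, Pearson correlation as the test statistic, and the toy data. The abstract does not specify the model GS-eQTL fits between components.

```python
import math
import random


def first_pc_scores(rows, iters=200, seed=0):
    """Scores of each sample (row) on the first principal component of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the features (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration converges to the leading eigenvector of cov.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gene set: 8 samples, 2 SNPs (genotype dosage 0/1/2) and 2 genes.
dosage = [0, 1, 2, 0, 1, 2, 1, 0]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.0]
snps = [[d, d] for d in dosage]  # two SNPs in perfect linkage disequilibrium
expression = [[2 * d + e, 5 - d + e / 2] for d, e in zip(dosage, noise)]

snp_pc = first_pc_scores(snps)
expr_pc = first_pc_scores(expression)
# The sign of a principal component is arbitrary, so look at |r|.
print(round(abs(pearson(snp_pc, expr_pc)), 3))
```

In this sketch, collapsing each side of a gene set to one component turns many SNP-by-gene tests into a single test per pair of gene sets. Pairs drawn from different gene sets can be tested the same way.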

17.
Mammary fat is the main component of the breast and the most probable candidate for influencing tumor behavior. It produces hormones, growth factors and adipokines, a heterogeneous group of signaling molecules. Gene expression profiling and functional characterization of mammary fat in Chinese women have not been reported. We therefore collected mammary fat tissue adjacent to breast tumors from 60 subjects: 30 with breast cancer and 30 with benign lesions. We isolated and cultured the stromal vascular cell fraction from the mammary fat. The expression of genes related to adipose function (including adipogenesis and secretion) was measured at both the tissue and the cellular level. We also studied mammary fat browning. Fat tissue close to malignant lesions and fat tissue close to benign lesions had distinct gene expression profiles and functional characteristics. Although the mammary fat adjacent to breast tumors atrophied, it secreted factors that stimulate tumor growth. Browning of mammary fat was observed, and the browning activity of fat close to malignant tumors was greater than that of fat close to benign lesions. Understanding the differences between these two fat depots may improve our understanding of breast cancer pathogenesis and help identify new anticancer therapies.

18.
Trehalose is a non-reducing disaccharide of glucose that functions as a compatible solute. It stabilizes biological structures under heat and desiccation stress in bacteria, fungi and some “resurrection plants”. In plants, trehalose is synthesized by trehalose-6-phosphate synthase (TPS) and trehalose-6-phosphate phosphatase (TPP). Overexpression of exogenous and endogenous genes encoding TPS and TPP has been reported to improve abiotic stress tolerance in tobacco, potato, tomato, rice and Arabidopsis. Guided by bioinformatic prediction, we cloned from maize a fragment containing a 2,820-bp open reading frame that encodes a protein of 939 amino acids. Phylogenetic analysis showed that this gene belongs to the class I subfamily of the TPS gene family. Conserved-domain analysis revealed both a TPS domain and a TPP domain. Complementation of yeast TPS and TPP mutants demonstrated that the protein has trehalose-6-phosphate synthase activity. Semi-quantitative RT-PCR and real-time quantitative PCR indicated that expression of this gene is upregulated in response to both salt and cold stress.
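The two lengths in the abstract are consistent if the 2,820-bp open reading frame includes the stop codon. That gives 2,820 / 3 = 940 codons. The stop codon adds no residue, which leaves 939 amino acids. The small check below assumes a standard reading frame with a single terminal stop codon.

```python
def protein_length(orf_bp, includes_stop=True):
    """Amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # A terminal stop codon ends translation without adding a residue.
    return codons - 1 if includes_stop else codons


print(protein_length(2820))  # 939
```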

20.
One of the most important objects in bioinformatics is the gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these, it is reasonable to include similarity measures based on the terms found in the GO or another taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has one advantage: when computing the similarity between two gene products, it takes into account the context of both complete sets of annotation terms. When two gene products share no common taxonomy terms, we propose a method that avoids a similarity of zero. To account for variation in annotation reliability, we propose a similarity measure based on the Choquet integral. These measures give biologists additional tools in the search for functional information about gene products. Initial testing used 194 sequences representing three protein families. It shows that the FMS and Choquet similarities correlate more strongly with BLAST sequence similarities than traditional measures such as pairwise average or pairwise maximum do.
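For reference, the two traditional measures in the comparison can be sketched as follows. Every GO term of one gene product is compared with every term of the other. The term-level scores are then averaged (pairwise average) or maximized (pairwise maximum). The term-level similarity used here, the Jaccard index of ancestor sets, and the toy DAG are assumptions for illustration. The FMS and Choquet-integral measures themselves are not reproduced.

```python
from itertools import product


def ancestors(term, parents):
    """term plus all of its ancestors in a DAG given as child -> parents."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(t1, t2, parents):
    """Jaccard index of the two terms' ancestor sets (assumed term-level measure)."""
    a1, a2 = ancestors(t1, parents), ancestors(t2, parents)
    return len(a1 & a2) / len(a1 | a2)


def pairwise_average(terms1, terms2, parents):
    """Mean term similarity over all cross pairs of the two annotation sets."""
    sims = [term_similarity(a, b, parents) for a, b in product(terms1, terms2)]
    return sum(sims) / len(sims)


def pairwise_maximum(terms1, terms2, parents):
    """Best term similarity over all cross pairs of the two annotation sets."""
    return max(term_similarity(a, b, parents) for a, b in product(terms1, terms2))


# Hypothetical DAG: root -> b, c; b -> d, e.
parents = {"b": {"root"}, "c": {"root"}, "d": {"b"}, "e": {"b"}}
gene1, gene2 = {"d"}, {"e", "c"}
print(pairwise_average(gene1, gene2, parents))  # 0.375
print(pairwise_maximum(gene1, gene2, parents))  # 0.5
```

Pairwise average gives the gene product {d, c} a self-similarity of only 0.625, because unrelated term pairs dilute the mean. This illustrates why a measure that considers both annotation sets as a whole can be attractive.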
