首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Semantic Similarity in Biomedical Ontologies   总被引:1,自引:0,他引:1  
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization.  相似文献   

2.
Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation need to be explored that expand the pace at which we are able to structure biomedical knowledge. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org.  相似文献   

3.
MOTIVATION: With the rapid advancement of biomedical science and the development of high-throughput analysis methods, the extraction of various types of information from biomedical text has become critical. Since automatic functional annotations of genes are quite useful for interpreting large amounts of high-throughput data efficiently, the demand for automatic extraction of information related to gene functions from text has been increasing. RESULTS: We have developed a method for automatically extracting the biological process functions of genes/protein/families based on Gene Ontology (GO) from text using a shallow parser and sentence structure analysis techniques. When the gene/protein/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO-IDs are assigned to the genes/proteins/families. The gene/protein/family names are recognized using the gene/protein/family name dictionaries developed by our group. To achieve wide recognition of the gene/protein/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54-64% with a precision of 91-94% for actually described functions in abstracts. When applied to the PUBMED, it extracted over 190 000 gene-GO relationships and 150 000 family-GO relationships for major eukaryotes.  相似文献   

4.
5.
Red/ET重组及其在生物医学中的应用   总被引:3,自引:2,他引:1  
王军平  张友明   《生物工程学报》2005,21(3):502-506
通过应用Rac噬菌体的RecE RecT和λ噬菌体的RedαRedβ系统而建立的DNA工程平台———Red ET重组,是一种不依赖于限制性内切酶的分子克隆新技术。运用该技术能够介导PCR产物或寡核苷酸对目标基因进行剪切、插入、融合及突变等多种操作,在生物医学领域里具有广阔的应用前景,尤其在基因组功能研究中对BACs、PACs和细菌染色体的打靶修饰以及基因敲除动物DNA靶分子的快速构建等方面最有效。随着Red ET重组的推广与应用,该技术本身也在不断被改进,在工作效率得到显著提高的同时,其操作也变得更加简单、省时、省力。结合自身的一些研究结果,对Red ET重组的技术特点、发展现状和在生物医学中的应用进行了详细阐述。  相似文献   

6.
Rich information on point mutation studies is scattered across heterogeneous data sources. This paper presents an automated workflow for mining mutation annotations from full-text biomedical literature using natural language processing (NLP) techniques as well as for their subsequent reuse in protein structure annotation and visualization. This system, called mSTRAP (Mutation extraction and STRucture Annotation Pipeline), is designed for both information aggregation and subsequent brokerage of the mutation annotations. It facilitates the coordination of semantically related information from a series of text mining and sequence analysis steps into a formal OWL-DL ontology. The ontology is designed to support application-specific data management of sequence, structure, and literature annotations that are populated as instances of object and data type properties. mSTRAPviz is a subsystem that facilitates the brokerage of structure information and the associated mutations for visualization. For mutated sequences without any corresponding structure available in the Protein Data Bank (PDB), an automated pipeline for homology modeling is developed to generate the theoretical model. With mSTRAP, we demonstrate a workable system that can facilitate automation of the workflow for the retrieval, extraction, processing, and visualization of mutation annotations -- tasks which are well known to be tedious, time-consuming, complex, and error-prone. The ontology and visualization tool are available at (http://datam.i2r.a-star.edu.sg/mstrap).  相似文献   

7.
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at .  相似文献   

8.
A high proportion of life science researches are gene-oriented, in which scientists aim to investigate the roles that genes play in biological processes, and their involvement in biological mechanisms. As a result, gene names and their related information turn out to be one of the main objects of interest in biomedical literatures. While the capability of recognizing gene mentions has made significant progress, the results of recognition are still insufficient for direct use due to the ambiguity of gene names. Gene normalization (GN) goes beyond the recognition task by linking a gene mention to a database ID. Unlike most previous works, we approach GN on the instance-level and evaluate its overall performance on the recognition and normalization steps in abstracts and full texts. We release the first instance-level gene normalization (IGN) corpus in the BioC format, which includes annotations for the boundaries of all gene mentions and the corresponding IDs for human gene mentions. Species information, along with existing co-reference chains and full name/abbreviation pairs are also provided for each gene mention. Using the released corpus, we have designed a collective instance-level GN approach using not only the contextual information of each individual instance, but also the relations among instances and the inherent characteristics of full-text sections. Our experimental results show that our collective approach can achieve an F-score of 0.743. The proposed approach that exploits section characteristics in full-text articles can improve the F-scores of information lacking sections by up to 1.8%. In addition, using the proposed refinement process improved the F-score of gene mention recognition by 0.125 and that of GN by 0.03. Whereas current experimental results are limited to the human species, we seek to continue updating the annotations of the IGN corpus and observe how the proposed approach can be extended to other species.  相似文献   

9.
10.
A genetic association study is a complicated process that involves collecting phenotypic data, generating genotypic data, analyzing associations between genotypic and phenotypic data, and interpreting genetic biomarkers identified. SNPTrack is an integrated bioinformatics system developed by the US Food and Drug Administration (FDA) to support the review and analysis of pharmacogenetics data resulting from FDA research or submitted by sponsors. The system integrates data management, analysis, and interpretation in a single platform for genetic association studies. Specifically, it stores genotyping data and single-nucleotide polymorphism (SNP) annotations along with study design data in an Oracle database. It also integrates popular genetic analysis tools, such as PLINK and Haploview. SNPTrack provides genetic analysis capabilities and captures analysis results in its database as SNP lists that can be cross-linked for biological interpretation to gene/protein annotations, Gene Ontology, and pathway analysis data. With SNPTrack, users can do the entire stream of bioinformatics jobs for genetic association studies. SNPTrack is freely available to the public at http://www.fda.gov/ScienceResearch/BioinformaticsTools/SNPTrack/default.htm.  相似文献   

11.
MOTIVATION: In general, most accurate gene/protein annotations are provided by curators. Despite having lesser evidence strengths, it is inevitable to use computational methods for fast and a priori discovery of protein function annotations. This paper considers the problem of assigning Gene Ontology (GO) annotations to partially annotated or newly discovered proteins. RESULTS: We present a data mining technique that computes the probabilistic relationships between GO annotations of proteins on protein-protein interaction data, and assigns highly correlated GO terms of annotated proteins to non-annotated proteins in the target set. In comparison with other techniques, probabilistic suffix tree and correlation mining techniques produce the highest prediction accuracy of 81% precision with the recall at 45%. AVAILABILITY: Code is available upon request. Results and used materials are available online at http://kirac.case.edu/PROTAN.  相似文献   

12.
BACKGROUND: The semantic integration of biomedical resources is still a challenging issue which is required for effective information processing and data analysis. The availability of comprehensive knowledge resources such as biomedical ontologies and integrated thesauri greatly facilitates this integration effort by means of semantic annotation, which allows disparate data formats and contents to be expressed under a common semantic space. In this paper, we propose a multidimensional representation for such a semantic space, where dimensions regard the different perspectives in biomedical research (e.g., population, disease, anatomy and protein/genes). RESULTS: This paper presents a novel method for building multidimensional semantic spaces from semantically annotated biomedical data collections. This method consists of two main processes: knowledge and data normalization. The former one arranges the concepts provided by a reference knowledge resource (e.g., biomedical ontologies and thesauri) into a set of hierarchical dimensions for analysis purposes. The latter one reduces the annotation set associated to each collection item into a set of points of the multidimensional space. Additionally, we have developed a visual tool, called 3D-Browser, which implements OLAP-like operators over the generated multidimensional space. The method and the tool have been tested and evaluated in the context of the Health-e-Child (HeC) project. Automatic semantic annotation was applied to tag three collections of abstracts taken from PubMed, one for each target disease of the project, the Uniprot database, and the HeC patient record database. We adopted the UMLS Meta-thesaurus 2010AA as the reference knowledge resource. CONCLUSIONS: Current knowledge resources and semantic-aware technology make possible the integration of biomedical resources. Such an integration is performed through semantic annotation of the intended biomedical data resources. This paper shows how these annotations can be exploited for integration, exploration, and analysis tasks. Results over a real scenario demonstrate the viability and usefulness of the approach, as well as the quality of the generated multidimensional semantic spaces.  相似文献   

13.
14.
15.
CyanoBase provides an online resource for access to data on genomic information about the cyanobacterium Synechocystis sp. strain PCC6803. The database contains annotations for each protein-coding gene deduced from the entire nucleotide sequence of the genome, gene classification lists, and keyword and similarity search engines. Core portions of CyanoBase consist of annotations for each of the 3168 protein genes deduced from the entire nucleotide sequence of this genome. The contents of each gene were improved by updating with the results of similarity searches and by introducing references for analysis in bioinformatics. The database now contains repository facilities that store and provide experimental information, in addition to providing proposals for the function of each gene. This information should help to avoid unnecessary, overlapping experiments and should assist communication between scientists who wish to elucidate the function of putative genes on the cyanobacteria genome. The current URL of CyanoBase is http://www.kazusa.or.jp:8080/cyano/  相似文献   

16.
GSEA-P: a desktop application for Gene Set Enrichment Analysis   总被引:4,自引:0,他引:4  
Gene Set Enrichment Analysis (GSEA) is a computational method that assesses whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. We report the availability of a new version of the Java based software (GSEA-P 2.0) that represents a major improvement on the previous release through the addition of a leading edge analysis component, seamless integration with the Molecular Signature Database (MSigDB) and an embedded browser that allows users to search for gene sets and map them to a variety of microarray platform formats. This functionality makes it possible for users to directly import gene sets from MSigDB for analysis with GSEA. We have also improved the visualizations in GSEA-P 2.0 and added links to a new form of concise gene set annotations called Gene Set Cards. These additions, as well as other improvements suggested by over 3500 users who have downloaded the software over the past year have been incorporated into this new release of the GSEA-P Java desktop program. AVAILABILITY: GSEA-P 2.0 is freely available for academic and commercial users and can be downloaded from http://www.broad.mit.edu/GSEA  相似文献   

17.
MOTIVATION: A clear understanding of functions in biology is a key component in accurate modelling of molecular, cellular and organismal biology. Using the existing biomedical ontologies it has been impossible to capture the complexity of the community's knowledge about biological functions. RESULTS: We present here a top-level ontological framework for representing knowledge about biological functions. This framework lends greater accuracy, power and expressiveness to biomedical ontologies by providing a means to capture existing functional knowledge in a more formal manner. An initial major application of the ontology of functions is the provision of a principled way in which to curate functional knowledge and annotations in biomedical ontologies. Further potential applications include the facilitation of ontology interoperability and automated reasoning. A major advantage of the proposed implementation is that it is an extension to existing biomedical ontologies, and can be applied without substantial changes to these domain ontologies. AVAILABILITY: The Ontology of Functions (OF) can be downloaded in OWL format from http://onto.eva.mpg.de/. Additionally, a UML profile and supplementary information and guides for using the OF can be accessed from the same website.  相似文献   

18.
Xuan Z  Zhao F  Wang J  Chen G  Zhang MQ 《Genome biology》2005,6(8):R72-12
  相似文献   

19.
In pursuit of a better updated source including 'omics' information for breast cancer, Breast Cancer Database (BCDB) has been developed to provide the researcher with the quick overview of the Breast cancer disease and other relevant information. This database comprises of myriad of information about genes involved in breast cancer, its functions and drug molecules which are currently being used in the treatment of breast cancer. The data available in BCDB is retrieved from the biomedical research literature. It facilitates the user to search information on gene, its location in chromosome, functions and its importance in cancer diseases. Broadly, this can be queried by giving gene name, protein name and drug name. This database is platform independent, user friendly and freely accessible through internet. The data present in BCDB is directly linked to other on-line resources such as NCBI, PDB and PubMed. Hence, it can act as a complete web resource comprising gene sequences, drug structures and literature information related to breast cancer, which is not available in any other breast cancer database. AVAILABILITY: The database is freely available at http://122.165.25.137/bioinfo/breastcancerdb/  相似文献   

20.

Background  

Significant inconsistencies between probe-to-gene annotations between different releases of probe set identifiers by commercial microarray platform solutions have been reported. Such inconsistencies lead to misleading or ambiguous interpretation of published gene expression results.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号