Similar literature
 20 similar documents found (search time: 31 ms)
1.
We present an analysis of some considerations involved in expressing the Gene Ontology (GO) as a machine-processible ontology, reflecting principles of formal ontology. GO is a controlled vocabulary that is intended to facilitate communication between biologists by standardizing usage of terms in database annotations. Making such controlled vocabularies maximally useful in support of bioinformatics applications requires explicating in machine-processible form the implicit background information that enables human users to interpret the meaning of the vocabulary terms. In the case of GO, this process would involve rendering the meanings of GO into a formal (logical) language with the help of domain experts, and adding additional information required to support the chosen formalization. A controlled vocabulary augmented in these ways is commonly called an ontology. In this paper, we make a modest exploration to determine the ontological requirements for this extended version of GO. Using the terms within the three GO hierarchies (molecular function, biological process and cellular component), we investigate the facility with which GO concepts can be ontologized, using available tools from the philosophical and ontological engineering literature.

2.
In predicting hierarchical protein function annotations, such as terms in the Gene Ontology (GO), the simplest approach makes predictions for each term independently. However, this approach has the unfortunate consequence that the predictor may assign to a single protein a set of terms that are inconsistent with one another; for example, the predictor may assign a specific GO term to a given protein ('purine nucleotide binding') but not assign the parent term ('nucleotide binding'). Such predictions are difficult to interpret. In this work, we focus on methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. We call this procedure 'reconciliation'. We begin with a baseline method for predicting GO terms from a collection of data types using an ensemble of discriminative classifiers. We apply the method to a previously described benchmark data set, and we demonstrate that the resulting predictions are frequently inconsistent with the topology of the GO. We then consider 11 distinct reconciliation methods: three heuristic methods; four variants of a Bayesian network; an extension of logistic regression to the structured case; and three novel projection methods - isotonic regression and two variants of a Kullback-Leibler projection method. We evaluate each method in three different modes - per term, per protein and joint - corresponding to three types of prediction tasks. Although the principal goal of reconciliation is interpretability, it is important to assess whether interpretability comes at a cost in terms of precision and recall. Indeed, we find that many apparently reasonable reconciliation methods yield reconciled probabilities with significantly lower precision than the original, unreconciled estimates. 
On the other hand, we find that isotonic regression usually performs better than the underlying, unreconciled method, and almost never performs worse; isotonic regression appears to be able to use the constraints from the GO network to its advantage. An exception to this rule is the high precision regime for joint evaluation, where Kullback-Leibler projection yields the best performance.
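To make the consistency constraint concrete, here is a minimal Python sketch of one simple heuristic: each term's probability is raised to the maximum over its descendants, so that no child outscores its parent. The term names follow the abstract's example; the probabilities, the `PARENTS` map and the function name `reconcile_max` are illustrative assumptions, and this is not claimed to be one of the 11 methods evaluated in the paper.

```python
from typing import Dict, List

# Hypothetical toy hierarchy: child term -> list of parent terms.
PARENTS: Dict[str, List[str]] = {
    "purine nucleotide binding": ["nucleotide binding"],
    "nucleotide binding": ["binding"],
    "binding": [],
}


def reconcile_max(probs: Dict[str, float],
                  parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Raise each term's probability to the max over its descendants."""
    out = dict(probs)
    changed = True
    while changed:  # repeat until no parent scores below any of its children
        changed = False
        for child, parent_terms in parents.items():
            for parent in parent_terms:
                if out[parent] < out[child]:
                    out[parent] = out[child]
                    changed = True
    return out


# Made-up independent predictions that violate the hierarchy.
raw = {
    "purine nucleotide binding": 0.9,
    "nucleotide binding": 0.4,
    "binding": 0.7,
}
reconciled = reconcile_max(raw, PARENTS)
```

After reconciliation every parent scores at least as high as each of its children. A heuristic like this guarantees consistency but may distort calibration, which is the kind of trade-off the abstract reports for several reconciliation methods.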

3.
ABSTRACT: BACKGROUND: Biomedical processes can provide essential information about the (mal-) functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attributes, such as rates or regularities. The adequate representation of such process attributes has been a contentious issue in bio-ontologies recently; and domain ontologies have correspondingly developed ad hoc workarounds that compromise interoperability and logical consistency. RESULTS: We present a design pattern for the representation of process attributes that is compatible with upper ontology frameworks such as BFO and BioTop. Our solution rests on two key tenets: firstly, that many of the sorts of process attributes which are biomedically interesting can be characterised by the ways that repeated parts of such processes constitute, in combination, an overall process; secondly, that entities for which a full logical definition can be assigned do not need to be treated as primitive within a formal ontology framework. We apply this approach to the challenge of modelling and automatically classifying examples of normal and abnormal rates and patterns of heart beating processes, and discuss the expressivity required in the underlying ontology representation language. We provide full definitions for process attributes at increasing levels of domain complexity. CONCLUSIONS: We show that a logical definition of process attributes is feasible, though limited by the expressivity of DL languages so that the creation of primitives is still necessary. This finding may endorse current formal upper-ontology frameworks as a way of ensuring consistency, interoperability and clarity.

4.
Additional gene ontology structure for improved biological reasoning   Cited by: 5 (self-citations: 0; citations by others: 5)
MOTIVATION: The Gene Ontology (GO) is a widely used terminology for gene product characterization in, for example, interpretation of biology underlying microarray experiments. The current GO defines term relationships within each of the independent subontologies: molecular function, biological process and cellular component. However, it is evident that there also exist biological relationships between terms of different subontologies. Our aim was to connect the three subontologies to enable GO to cover more biological knowledge, enable a more consistent use of GO and provide new opportunities for biological reasoning. RESULTS: We propose a new structure, the Second Gene Ontology Layer, capturing biological relations not directly reflected in the present ontology structure. Given molecular functions, these paths identify biological processes where the molecular functions are involved and cellular components where they are active. The current Second Layer contains 6271 validated paths, covering 54% of the molecular functions of GO, and can be used to render existing gene annotation sets more complete and consistent. Applying Second Layer paths to a set of 4223 human genes increased biological process annotations by 24% compared to publicly available annotations and reproduced 30% of them. AVAILABILITY: The Second GO is publicly available through the GO Annotation Toolbox (GOAT.no): http://www.goat.no.

5.
There has been recent and growing interest in applying Cerenkov radiation (CR) for biological applications. Knowledge of the production efficiency and other characteristics of the CR produced by various radionuclides would help in assessing the feasibility of proposed applications and guide the choice of radionuclides. To generate this information we developed models of CR production efficiency based on the Frank-Tamm equation and models of CR distribution based on Monte-Carlo simulations of photon and β particle transport. All models were validated against direct measurements using multiple radionuclides and then applied to a number of radionuclides commonly used in biomedical applications. We show that two radionuclides, Ac-225 and In-111, which have been reported to produce CR in water, do not in fact produce CR directly. We also propose a simple means of using this information to calibrate high sensitivity luminescence imaging systems and show evidence suggesting that this calibration may be more accurate than methods in routine current use.
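For orientation, the Frank-Tamm relation gives the photon yield per unit path length and wavelength of a unit-charge particle as d²N/(dx dλ) = (2πα/λ²)(1 − 1/(β²n²)), which is non-zero only when βn > 1. The Python sketch below evaluates the resulting threshold energy and a wavelength-integrated yield. It assumes a constant refractive index of 1.33 for water (dispersion ignored) and a 400-700 nm window; the function names are illustrative, and this is not the authors' model, which also involves Monte-Carlo transport.

```python
import math

ALPHA = 1 / 137.035999       # fine-structure constant (dimensionless)
ELECTRON_REST_MEV = 0.511    # electron rest energy in MeV
N_WATER = 1.33               # assumed constant refractive index of water


def threshold_kinetic_energy_mev(n: float,
                                 rest_mev: float = ELECTRON_REST_MEV) -> float:
    """Kinetic energy at which beta = 1/n, i.e. the Cherenkov threshold."""
    gamma = 1.0 / math.sqrt(1.0 - 1.0 / n**2)
    return rest_mev * (gamma - 1.0)


def photons_per_cm(beta: float, n: float,
                   lam1_nm: float, lam2_nm: float) -> float:
    """Frank-Tamm yield per cm between two wavelengths (constant n, unit charge)."""
    if beta * n <= 1.0:
        return 0.0  # below threshold: no Cherenkov emission
    nm_to_cm = 1e-7
    inv_lam = 1.0 / (lam1_nm * nm_to_cm) - 1.0 / (lam2_nm * nm_to_cm)
    return 2.0 * math.pi * ALPHA * (1.0 - 1.0 / (beta * n) ** 2) * inv_lam


t_thr = threshold_kinetic_energy_mev(N_WATER)           # threshold for electrons
yield_fast = photons_per_cm(1.0, N_WATER, 400.0, 700.0)  # limit beta -> 1
```

Under these assumptions the electron threshold in water is roughly a quarter of an MeV, so a radionuclide whose emitted charged particles all fall below it would not be expected to produce CR directly.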

6.
Terminology-driven mining of biomedical literature   Cited by: 3 (self-citations: 0; citations by others: 3)
MOTIVATION: With an overwhelming amount of textual information in molecular biology and biomedicine, there is a need for effective literature mining techniques that can help biologists to gather and make use of the knowledge encoded in text documents. Although the knowledge is organized around sets of domain-specific terms, few literature mining systems incorporate deep and dynamic terminology processing. RESULTS: In this paper, we present an overview of an integrated framework for terminology-driven mining from biomedical literature. The framework integrates the following components: automatic term recognition, term variation handling, acronym acquisition, automatic discovery of term similarities and term clustering. Term variant recognition is incorporated into the terminology recognition process by taking into account orthographical, morphological, syntactic, lexico-semantic and pragmatic term variations. In particular, we address acronyms as a common way of introducing term variants in biomedical papers. Term clustering is based on the automatic discovery of term similarities. We use a hybrid similarity measure, where terms are compared by using both internal and external evidence. The measure combines lexical, syntactical and contextual similarity. Experiments on terminology recognition and clustering performed on a corpus of MEDLINE abstracts recorded precisions of 98% and 71%, respectively. AVAILABILITY: Software for the terminology management is available upon request.

7.
Developing and extending a biomedical ontology is a very demanding task that can never be considered complete given our ever-evolving understanding of the life sciences. Extension in particular can benefit from the automation of some of its steps, thus releasing experts to focus on harder tasks. Here we present a strategy to support the automation of change capturing within ontology extension where the need for new concepts or relations is identified. Our strategy is based on predicting areas of an ontology that will undergo extension in a future version by applying supervised learning over features of previous ontology versions. We used the Gene Ontology as our test bed and obtained encouraging results with average f-measure reaching 0.79 for a subset of biological process terms. Our strategy was also able to outperform state-of-the-art change capturing methods. In addition, we have identified several issues concerning prediction of ontology evolution, and have delineated a general framework for ontology extension prediction. Our strategy can be applied to any biomedical ontology with versioning, to help focus either manual or semi-automated extension methods on areas of the ontology that need extension.

8.
We develop a new weighting approach for gene ontology (GO) terms for predicting protein subcellular localization. The weights of individual GO terms, corresponding to their contribution to the prediction algorithm, are determined by the term-weighting methods used in text categorization. We evaluate several term-weighting methods, which are based on inverse document frequency, information gain, gain ratio, odds ratio, and chi-square and its variants. Additionally, we propose a new term-weighting method based on the logarithmic transformation of chi-square. The proposed term-weighting method performs better than other term-weighting methods, and also outperforms state-of-the-art subcellular prediction methods. Our proposed method achieves 98.1%, 99.3%, 98.1%, 98.1%, and 95.9% overall accuracies for the animal BaCelLo independent dataset (IDS), fungal BaCelLo IDS, animal Höglund IDS, fungal Höglund IDS, and PLOC dataset, respectively. Furthermore, the close correlation between high-weighted GO terms and subcellular localizations suggests that our proposed method appropriately weights GO terms according to their relevance to the localizations.
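The chi-square weight used in text categorization can be sketched for a single GO term and a single localization class from a 2x2 contingency table. The Python snippet below is only an illustration: the abstract does not give the exact form of the logarithmic transformation, so `log(1 + chi-square)` is an assumption, and the counts and function names are made up.

```python
import math


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: proteins in the class annotated with the term
    b: proteins outside the class annotated with the term
    c: proteins in the class without the term
    d: proteins outside the class without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - b * c) ** 2 / denom


def log_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Assumed log-transformed weight: log(1 + chi-square)."""
    return math.log1p(chi_square(a, b, c, d))


# Made-up counts: the term is strongly associated with the class.
weight_raw = chi_square(40, 10, 10, 40)
weight_log = log_chi_square(40, 10, 10, 40)
```

A logarithm compresses very large chi-square values, so a handful of extremely frequent terms dominates the weighting less than it would with the raw statistic.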

9.
Ontology matching is a growing field of research that is of critical importance for the semantic web initiative. The use of background knowledge for ontology matching is often a key factor for success, particularly in complex and lexically rich domains such as the life sciences. However, in most ontology matching systems, the background knowledge sources are either predefined by the system or have to be provided by the user. In this paper, we present a novel methodology for automatically selecting background knowledge sources for any given ontologies to match. This methodology measures the usefulness of each background knowledge source by assessing the fraction of classes mapped through it over those mapped directly, which we call the mapping gain. We implemented this methodology in the AgreementMakerLight ontology matching framework, and evaluated it using the benchmark biomedical ontology matching tasks from the Ontology Alignment Evaluation Initiative (OAEI) 2013. In each matching problem, our methodology consistently identified the sources of background knowledge that led to the highest improvements over the baseline alignment (i.e., without background knowledge). Furthermore, our proposed mapping gain parameter is strongly correlated with the F-measure of the produced alignments, thus making it a good estimator for ontology matching techniques based on background knowledge.

10.
Natively unstructured regions are a common feature of eukaryotic proteomes. Between 30% and 60% of proteins are predicted to contain long stretches of disordered residues, and not only have many of these regions been confirmed experimentally, but they have also been found to be essential for protein function. In this study, we directly address the potential contribution of protein disorder in predicting protein function using standard Gene Ontology (GO) categories. Initially we analyse the occurrence of protein disorder in the human proteome and report ontology categories that are enriched in disordered proteins. Pattern analysis of the distributions of disordered regions in human sequences demonstrated that the functions of intrinsically disordered proteins are both length- and position-dependent. These dependencies were then encoded in feature vectors to quantify the contribution of disorder in human protein function prediction using Support Vector Machine classifiers. The prediction accuracies of 26 GO categories relating to signalling and molecular recognition are improved using the disorder features. The most significant improvements were observed for kinase, phosphorylation, growth factor, and helicase categories. Furthermore, we provide predicted GO term assignments using these classifiers for a set of unannotated and orphan human proteins. In this study, the importance of capturing protein disorder information and its value in function prediction is demonstrated. The GO category classifiers generated can be used to provide more reliable predictions and further insights into the behaviour of orphan and unannotated proteins.

11.
An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO, expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development. This work also aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area.

12.
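Semantic mapping from biomedical databases to biomedical ontologies is an important step in ontology-based integration of biomedical database systems. However, biomedical ontologies evolve continuously as the discipline develops, which makes the integrated system unstable. To address this problem, this paper analyzes and identifies the patterns by which semantic mappings change under ontology evolution, designs a corresponding maintenance workflow and maintenance methods, and shows, by calculating the maintenance benefit ratio, that the method maintains the mappings effectively, thereby improving the stability of the integrated system under ontology evolution.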

13.
Recent years have seen a huge increase in the amount of biomedical information that is available in electronic format. Consequently, for biomedical researchers wishing to relate their experimental results to relevant data lurking somewhere within this expanding universe of on-line information, the ability to access and navigate biomedical information sources in an efficient manner has become increasingly important. Natural language and text processing techniques can facilitate this task by making the information contained in textual resources such as MEDLINE more readily accessible and amenable to computational processing. Names of biological entities such as genes and proteins provide critical links between different biomedical information sources and researchers' experimental data. Therefore, automatic identification and classification of these terms in text is an essential capability of any natural language processing system aimed at managing the wealth of biomedical information that is available electronically. To support term recognition in the biomedical domain, we have developed Termino, a large-scale terminological resource for text processing applications, which has two main components: first, a database into which very large numbers of terms can be loaded from resources such as UMLS, and stored together with various kinds of relevant information; second, a finite state recognizer, for fast and efficient identification and mark-up of terms within text. Since many biomedical applications require this functionality, we have made Termino available to the community as a web service, which allows for its integration into larger applications as a remotely located component, accessed through a standardized interface over the web.

14.
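肖清滔 (Xiao Qingtao), 姚莉 (Yao Li). 《生物磁学》 (Biomagnetism), 2011, (14): 2770-2774
Semantic mapping from biomedical databases to biomedical ontologies is an important step in ontology-based integration of biomedical database systems. However, biomedical ontologies evolve continuously as the discipline develops, which makes the integrated system unstable. To address this problem, this paper analyzes and identifies the patterns by which semantic mappings change under ontology evolution, designs a corresponding maintenance workflow and maintenance methods, and shows, by calculating the maintenance benefit ratio, that the method maintains the mappings effectively, thereby improving the stability of the integrated system under ontology evolution.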

15.

Background  

In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules which protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply transitive closure to the predictions.

16.
The Open Biomedical Ontologies (OBO) format from the GO consortium is a very successful format for biomedical ontologies, including the Gene Ontology. But it lacks formal computational definitions for its constructs and tools, like DL reasoners, to facilitate ontology development/maintenance. We describe the OBO Converter, a Java tool to convert files from OBO format to Web Ontology Language (OWL) (and vice versa) that can also be used as a Protégé Tab plug-in. It uses the OBO to OWL mapping provided by the National Center for Biomedical Ontologies (NCBO) (a joint effort of OBO developers and OWL experts) and offers options to ease the task of saving/reading files in both formats. AVAILABILITY: bioontology.org/tools/oboinowl/obo_converter.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

17.
The Gene Ontology Categorizer, developed jointly by the Los Alamos National Laboratory and Procter & Gamble Corp., provides a capability for the categorization task in the Gene Ontology (GO): given a list of genes of interest, what are the best nodes of the GO to summarize or categorize that list? The motivating question is from a drug discovery process, where after some gene expression analysis experiment, we wish to understand the overall effect of some cell treatment or condition by identifying 'where' in the GO the differentially expressed genes fall: 'clustered' together in one place? in two places? uniformly spread throughout the GO? 'high', or 'low'? In order to address this need, we view bio-ontologies more as combinatorially structured databases than facilities for logical inference, and draw on the discrete mathematics of finite partially ordered sets (posets) to develop data representation and algorithms appropriate for the GO. In doing so, we have laid the foundations for a general set of methods to address not just the categorization task, but also other tasks (e.g. distances in ontologies and ontology merger and exchange) in both the GO and other bio-ontologies (such as the Enzyme Commission database or the MEdical Subject Headings) cast as hierarchically structured taxonomic knowledge systems.

18.
The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.
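As a minimal illustration of how a term-level IC approach and a protein-level functional similarity measure fit together, the Python sketch below pairs an annotation-based IC (negative log of a term's annotation frequency) with Resnik term similarity and a best-match average over two proteins' term sets. The toy hierarchy, the counts and all names are hypothetical, and this is one common combination rather than necessarily any of the pairings evaluated in the paper.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology: term -> its ancestors (including itself).
ANCESTORS: Dict[str, Set[str]] = {
    "root": {"root"},
    "a": {"a", "root"},
    "b": {"b", "a", "root"},
    "c": {"c", "a", "root"},
}
# Hypothetical annotation counts (each count includes descendant terms).
COUNTS: Dict[str, int] = {"root": 100, "a": 40, "b": 10, "c": 5}
TOTAL = COUNTS["root"]


def information_content(term: str) -> float:
    """Annotation-based IC: -log of the term's annotation frequency."""
    return -math.log(COUNTS[term] / TOTAL)


def resnik(t1: str, t2: str) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ANCESTORS[t1] & ANCESTORS[t2]
    return max(information_content(t) for t in common)


def best_match_average(terms1: Set[str], terms2: Set[str]) -> float:
    """Protein-level similarity: mean of best term matches in both directions."""
    row = sum(max(resnik(s, t) for t in terms2) for s in terms1) / len(terms1)
    col = sum(max(resnik(s, t) for s in terms1) for t in terms2) / len(terms2)
    return (row + col) / 2.0


sim_terms = resnik("b", "c")
sim_proteins = best_match_average({"b", "c"}, {"b"})
```

Swapping `best_match_average` for, say, a plain average or a maximum over all term pairs changes the protein-level score without touching the IC definition, which is the kind of pairing choice the abstract argues should be evaluated per application.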

19.

Background  

The ever-expanding population of gene expression profiles (EPs) from specified cells and tissues under a variety of experimental conditions is an important but difficult resource for investigators to utilize effectively. Software tools have been recently developed to use the distribution of gene ontology (GO) terms associated with the genes in an EP to identify specific biological functions or processes that are over- or under-represented in that EP relative to other EPs. Additionally, it is possible to use the distribution of GO terms inherent to each EP to relate that EP as a whole to other EPs. Because GO term annotation is organized in a tree-like cascade of variable granularity, this approach allows the user to relate (e.g., by hierarchical clustering) EPs of varying length and from different platforms (e.g., GeneChip, SAGE, EST library).

20.
Cardiac rehabilitation (CR) produces a host of health benefits related to modifiable cardiovascular risk factors. The purpose of the present investigation was to determine the influence of body weight, assessed through BMI, on acute and long-term improvements in aerobic capacity following completion of CR. Three thousand nine hundred and ninety seven subjects with coronary artery disease (CAD) participated in a 12-week multidisciplinary CR program. Subjects underwent an exercise test to determine peak estimated metabolic equivalents (eMETs) and BMI assessment at baseline, immediately following CR completion and at 1-year follow-up. Normal weight subjects at 1-year follow-up demonstrated the greatest improvement in aerobic fitness and best retention of those gains (gain in peak METs: 0.95 ± 1.1, P < 0.001). Although the improvement was significant (P < 0.001), subjects who were initially classified as obese had the lowest aerobic capacity and poorest retention in CR fitness gains at 1-year follow-up (gain in peak eMETs: 0.69 ± 1.2). Subjects initially classified as overweight by BMI had a peak eMET improvement that was also significantly better (P < 0.05) than obese subjects at 1-year follow-up (gain in peak eMETs: 0.82 ± 1.1). Significant fitness gains, one of the primary beneficial outcomes of CR, can be obtained by all subjects irrespective of BMI classification. However, obese patients have poorer baseline fitness and are more likely to "give back" fitness gains in the long term. Obese CAD patients may therefore benefit from additional interventions to enhance the positive adaptations facilitated by CR.
