Similar Articles
1.
In biology, ontology applications involve large amounts of genetic information and chemical information about molecular structure, so each ontology concept carries a great deal of information. Consequently, the vector that represents an ontology concept in mathematical notation is often very high-dimensional, which places higher demands on ontology algorithms. Against this background, we consider the design of ontology sparse vector algorithms and their application in biology. In this paper, using marginal likelihood and marginal distributions, we present an optimization strategy for a marginal-based ontology sparse vector learning algorithm. Finally, the new algorithm is applied to the Gene Ontology and the Plant Ontology to verify its efficiency.
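A common formulation in this line of work (assumed here, not stated in the abstract) maps each concept vector v to a real score f(v) = βᵀv and asks the coefficient vector β to be sparse, so that only a few of the many dimensions matter. As a minimal sketch of that generic idea, and not of the marginal-likelihood strategy this paper proposes, the code below learns a sparse β by ℓ1-penalized least squares solved with iterative soft-thresholding (ISTA). The toy data, the function names, and the penalty `lam` are hypothetical.

```python
def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def ista_lasso(X, y, lam, steps=500):
    """Minimize (1/(2n)) * ||X b - y||^2 + lam * ||b||_1 with ISTA.

    X: n concept vectors of length p; y: n target scores.
    Returns the coefficient vector b; many entries end up exactly 0.
    """
    n, p = len(X), len(X[0])
    # Step size 1/L. L must bound the largest eigenvalue of X^T X / n;
    # the squared Frobenius norm divided by n is a cheap, safe upper bound.
    L = sum(v * v for row in X for v in row) / n
    step = 1.0 / L
    b = [0.0] * p
    for _ in range(steps):
        resid = [sum(r[j] * b[j] for j in range(p)) - t for r, t in zip(X, y)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b


# Hypothetical toy data: only the first coordinate carries signal.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
y = [2, 0, 0, 2, 2]
beta = ista_lasso(X, y, lam=0.1)
```

On this toy data the two irrelevant coefficients are driven to exactly zero while the informative one is shrunk slightly below 2; the soft-threshold step is what produces exact zeros, which plain gradient descent on a squared penalty would not.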

2.
3.
Hymenoptera is an extraordinarily diverse lineage, both in terms of species numbers and morphotypes, that includes sawflies, bees, wasps, and ants. These organisms serve critical roles as herbivores, predators, parasitoids, and pollinators, with several species functioning as models for agricultural, behavioral, and genomic research. The collective anatomical knowledge of these insects, however, has been described or referred to by labels derived from numerous, partially overlapping lexicons. The resulting corpus of information (millions of statements about hymenopteran phenotypes) remains inaccessible due to language discrepancies. The Hymenoptera Anatomy Ontology (HAO) was developed to surmount this challenge and to aid future communication related to hymenopteran anatomy. The HAO was built using newly developed interfaces within mx, a web-based, open-source software package that enables collaborators to contribute to an ontology simultaneously. Over twenty people contributed to the development of this ontology by adding terms, genus-differentia definitions, references, images, relationships, and annotations. The database interface returns an Open Biomedical Ontology (OBO)-formatted version of the ontology and includes mechanisms for extracting candidate data and for publishing a searchable ontology to the Web. The application tools are subject-agnostic and may be used by others initiating and developing ontologies. The present core HAO data constitute 2,111 concepts, 6,977 terms (labels for concepts), 3,152 relations, 4,361 sensus (links between terms, concepts, and references) and over 6,000 text and graphical annotations. The HAO is rooted with the Common Anatomy Reference Ontology (CARO) in order to facilitate interoperability with, and future alignment to, other anatomy ontologies, and is available through the OBO Foundry ontology repository and BioPortal.
The HAO provides a foundation through which connections between genomic, evolutionary developmental biology, phylogenetic, taxonomic, and morphological research can be actualized. Inherent mechanisms for feedback and content delivery demonstrate the effectiveness of remote, collaborative ontology development and facilitate future refinement of the HAO.
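The OBO flat-file format that the HAO interface exports is organized as `[Term]` stanzas of `tag: value` lines. A minimal reader can recover the concept hierarchy from such a file; the sketch below assumes that only the `id`, `name`, and `is_a` tags matter and ignores everything else (qualifiers, escapes, `[Typedef]` stanzas). The sample stanzas use placeholder `EX:` identifiers and are hypothetical, not real HAO content.

```python
def parse_obo_terms(text):
    """Collect [Term] stanzas from OBO flat-file text.

    Returns {term_id: {"name": str | None, "is_a": [parent_id, ...]}}.
    Other stanza types (e.g. [Typedef]) and other tags are skipped.
    """
    terms, current, in_term = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            in_term = line == "[Term]"  # a new stanza begins
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            current = terms.setdefault(value, {"name": None, "is_a": []})
        elif current is None:
            continue
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the human-readable "! label" comment after the parent id.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


# Hypothetical sample (placeholder ids, not real HAO content).
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: anatomical entity

[Term]
id: EX:0000002
name: head
is_a: EX:0000001 ! anatomical entity

[Typedef]
id: part_of
name: part of
"""
terms = parse_obo_terms(SAMPLE)
```

Production code would use a dedicated OBO library, since the full format also has quoted values, escapes, and trailing qualifier blocks that this sketch does not handle.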

4.
We present an analysis of some considerations involved in expressing the Gene Ontology (GO) as a machine-processible ontology, reflecting principles of formal ontology. GO is a controlled vocabulary intended to facilitate communication between biologists by standardizing the usage of terms in database annotations. Making such controlled vocabularies maximally useful in support of bioinformatics applications requires explicating, in machine-processible form, the implicit background information that enables human users to interpret the meaning of the vocabulary terms. In the case of GO, this process would involve rendering the meanings of GO terms into a formal (logical) language with the help of domain experts, and adding the additional information required to support the chosen formalization. A controlled vocabulary augmented in these ways is commonly called an ontology. In this paper, we make a modest exploration of the ontological requirements for this extended version of GO. Using the terms within the three GO hierarchies (molecular function, biological process and cellular component), we investigate the ease with which GO concepts can be ontologized, using available tools from the philosophical and ontological engineering literature.

5.

Background  

Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of the data may not give a complete picture. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources in order to exploit multi-aspect protein feature information. The Gene Ontology (GO) uses a controlled vocabulary to describe biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, GO has become a general protein feature that can be used to construct predictive models in computational biology. Existing models for protein subcellular localization generally either concatenate the GO terms into a flat binary vector or apply majority-vote ensemble learning; neither approach can estimate the individual discriminative abilities of the three aspects of GO.
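The contrast drawn above between a flat GO vector and per-aspect modeling can be made concrete. The sketch below is a hypothetical illustration, not the model this paper goes on to propose: it splits a protein's GO annotations into biological process (P), molecular function (F) and cellular component (C) binary vectors, scores each aspect separately, and mixes the per-aspect scores with aspect weights. The term names, the linear per-aspect scorer, and all weights are assumptions.

```python
ASPECTS = ("P", "F", "C")  # biological process, molecular function, cellular component


def aspect_vectors(annotations, vocab):
    """Build one binary vector per GO aspect instead of one flat concatenation.

    annotations: set of (aspect, term) pairs for a protein.
    vocab: {aspect: [term, ...]} fixing the feature order within each aspect.
    """
    return {a: [1 if (a, t) in annotations else 0 for t in vocab[a]] for a in ASPECTS}


def combined_score(annotations, vocab, term_weights, aspect_weights):
    """Score each aspect separately, then mix with per-aspect weights.

    term_weights[a] stands in for a classifier trained on aspect a alone;
    aspect_weights[a] expresses how discriminative that aspect is.
    """
    vecs = aspect_vectors(annotations, vocab)
    return sum(
        aspect_weights[a] * sum(v * w for v, w in zip(vecs[a], term_weights[a]))
        for a in ASPECTS
    )


# Hypothetical vocabulary and weights (placeholder term names).
vocab = {"P": ["p1", "p2"], "F": ["f1"], "C": ["c1", "c2"]}
protein = {("P", "p2"), ("C", "c1")}
score = combined_score(
    protein, vocab,
    term_weights={"P": [0.5, 1.0], "F": [2.0], "C": [1.0, -1.0]},
    aspect_weights={"P": 0.2, "F": 0.3, "C": 0.5},
)
```

Keeping the aspects separate is what makes `aspect_weights` available at all: in a single concatenated vector there is no parameter that says how much the cellular-component block, as a whole, should count.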

6.
Small molecules play a crucial role in modulating biological functions by interacting with specific macromolecules. Hence, small-molecule interactions are captured by a variety of experimental methods to estimate and propose correlations between molecular structures and their biological activities. The tremendous expansion in publicly available small molecules is also driving new efforts to better understand interactions involving small molecules, particularly in the areas of drug docking and pharmacogenomics. We have studied and designed a functional group identification system with an associated ontology. The functional group identification system can detect the functional-group components of a given ligand structure with specific coordinate information. The Functional Group Ontology (FGO) proposed by us is a structured classification of chemical functional groups that acts as an important source of prior knowledge, which may be automatically integrated to support identification, categorization and predictive data analysis tasks. We have used a new annotation method that can reconstruct the original structure from a given ontological expression using exact coordinate information. Here, we also discuss an ontology-driven similarity measure for functional groups and uses of this novel ontology for pharmacophore searching and de novo ligand design.
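Detecting functional groups from a coordinate-only ligand structure usually starts by inferring bonds from interatomic distances. The sketch below is a deliberately small, rule-based illustration and not the FGO system itself: it infers a bond whenever two atoms are closer than the sum of their covalent radii plus a tolerance (0.45 Å, a common heuristic value assumed here), then labels hydroxyl oxygens (bonded to one C and one H) and carbonyl oxygens (bonded only to C at a short, double-bond-like distance). The radii table, the 1.30 Å cut-off, and the toy coordinates are assumptions.

```python
import math

COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}  # angstroms
TOLERANCE = 0.45  # slack added to the radius sum when inferring bonds


def infer_bonds(atoms):
    """atoms: list of (element, (x, y, z)). Returns a list of (i, j, distance)."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (ei, pi), (ej, pj) = atoms[i], atoms[j]
            d = math.dist(pi, pj)
            if d <= COVALENT_RADII[ei] + COVALENT_RADII[ej] + TOLERANCE:
                bonds.append((i, j, d))
    return bonds


def find_groups(atoms):
    """Label hydroxyl (C-O-H) and carbonyl (short C=O) oxygens by simple rules."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, d in infer_bonds(atoms):
        neighbors[i].append((j, d))
        neighbors[j].append((i, d))
    groups = []
    for i, (elem, _) in enumerate(atoms):
        if elem != "O":
            continue
        nbr = [(atoms[j][0], d) for j, d in neighbors[i]]
        elems = sorted(e for e, _ in nbr)
        if elems == ["C", "H"]:
            groups.append(("hydroxyl", i))
        elif elems == ["C"] and nbr[0][1] < 1.30:  # C=O ~1.21 A vs C-O ~1.43 A
            groups.append(("carbonyl", i))
    return groups


# Hypothetical coordinates: a C=O fragment and, far away, a C-O-H fragment.
atoms = [("C", (0.0, 0.0, 0.0)), ("O", (1.21, 0.0, 0.0)),
         ("C", (10.0, 0.0, 0.0)), ("O", (11.43, 0.0, 0.0)),
         ("H", (11.43, 0.96, 0.0))]
groups = find_groups(atoms)
```

A real identification system would need bond orders, aromaticity, and many more group patterns; the point here is only how coordinate information alone can drive group detection.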

7.
8.
Yasui Y  Pepe M  Hsu L  Adam BL  Feng Z 《Biometrics》2004,60(1):199-206
Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be subject to errors. We consider a setting where there are two classes but observations with labels corresponding to one of the classes may in fact be mislabeled. The application concerns the use of protein mass-spectrometry data to discriminate between serum samples from cancer and noncancer patients. The patients in the training set are classified on the basis of tissue biopsy. Although biopsy is 100% specific, in the sense that tissue shown to contain malignant cells is certainly cancerous, it is less than 100% sensitive. Reference gold standards that are subject to this special type of misclassification due to imperfect diagnostic certainty arise in many fields. We consider the development of a supervised learning algorithm under these conditions and refer to it as partially supervised learning. Boosting is a supervised learning algorithm geared toward high-dimensional predictor data, such as those generated in protein mass spectrometry. We propose a modification of the boosting algorithm for partially supervised learning. The proposal is to view the true class membership of the samples that are labeled with the error-prone class label as missing data, and apply an algorithm related to the EM algorithm for minimization of a loss function. To assess the usefulness of the proposed method, we artificially mislabeled a subset of samples and applied the original and EM-modified boosting (EM-Boost) algorithms for comparison. Notable improvements in misclassification rates are observed with EM-Boost.
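The EM idea above, treating the true class of every sample that carries the error-prone label as missing, can be sketched without boosting. The code below swaps in a plain weighted logistic-regression learner for the boosting step to keep it short, so it is not EM-Boost itself. Biopsy-positive samples stay fixed as positive; each biopsy-negative sample is split into a positive copy weighted by its current posterior probability of being cancer and a negative copy carrying the remaining weight (E-step), and the learner is refit on those weights (M-step). All names, hyperparameters, and data are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logreg(X, y, w, epochs=2000, lr=0.5):
    """Weighted logistic regression by batch gradient descent."""
    p = len(X[0])
    coef, bias, total = [0.0] * p, 0.0, sum(w)
    for _ in range(epochs):
        g_coef, g_bias = [0.0] * p, 0.0
        for xi, yi, wi in zip(X, y, w):
            err = wi * (sigmoid(sum(c * v for c, v in zip(coef, xi)) + bias) - yi)
            g_coef = [g + err * v for g, v in zip(g_coef, xi)]
            g_bias += err
        coef = [c - lr * g / total for c, g in zip(coef, g_coef)]
        bias -= lr * g_bias / total
    return coef, bias


def em_partially_supervised(X_pos, X_neg, iters=10):
    """X_pos: labels known to be correct (biopsy-positive).
    X_neg: labeled negative, but each may truly be positive (missing data).
    Returns P(truly positive | x) for every sample in X_neg."""
    post = [0.0] * len(X_neg)  # start by trusting the negative labels
    for _ in range(iters):
        # M-step: refit on soft labels; each X_neg sample appears twice,
        # as a positive with weight post[i] and a negative with 1 - post[i].
        X = X_pos + X_neg + X_neg
        y = [1] * len(X_pos) + [1] * len(X_neg) + [0] * len(X_neg)
        w = [1.0] * len(X_pos) + post + [1.0 - q for q in post]
        coef, bias = fit_weighted_logreg(X, y, w)
        # E-step: re-estimate the missing true labels of the X_neg samples.
        post = [sigmoid(sum(c * v for c, v in zip(coef, x)) + bias) for x in X_neg]
    return post


# Hypothetical 1-D data: the last "negative" sits among the positives.
X_pos = [[2.0], [2.5], [3.0]]
X_neg = [[-2.0], [-2.5], [-3.0], [2.2]]
post = em_partially_supervised(X_pos, X_neg)
```

The asymmetry mirrors the biopsy setting: only the negative labels are allowed to change, because a positive biopsy is treated as certain.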

9.
The Gene Ontology Categorizer, developed jointly by the Los Alamos National Laboratory and Procter & Gamble Corp., provides a capability for the categorization task in the Gene Ontology (GO): given a list of genes of interest, what are the best nodes of the GO to summarize or categorize that list? The motivating question is from a drug discovery process, where after some gene expression analysis experiment, we wish to understand the overall effect of some cell treatment or condition by identifying 'where' in the GO the differentially expressed genes fall: 'clustered' together in one place? in two places? uniformly spread throughout the GO? 'high', or 'low'? In order to address this need, we view bio-ontologies more as combinatorially structured databases than facilities for logical inference, and draw on the discrete mathematics of finite partially ordered sets (posets) to develop data representation and algorithms appropriate for the GO. In doing so, we have laid the foundations for a general set of methods to address not just the categorization task, but also other tasks (e.g. distances in ontologies and ontology merger and exchange) in both the GO and other bio-ontologies (such as the Enzyme Commission database or the MEdical Subject Headings) cast as hierarchically structured taxonomic knowledge systems.
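Viewing the GO as a finite poset makes the categorization question computable: a gene annotated to a term is implicitly annotated to every term above it, so each node "covers" a set of genes. The sketch below is a simplified stand-in and not the Categorizer's actual scoring, which the abstract does not describe. It reports the nodes that cover at least a chosen fraction of the genes and keeps only the most specific ones. The mini-ontology, gene names, and `min_fraction` threshold are hypothetical.

```python
def ancestors(node, parents):
    """`node` plus every node above it in the DAG (iterative depth-first search)."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen


def categorize(genes, annotations, parents, min_fraction=0.5):
    """Nodes that cover at least `min_fraction` of the genes, keeping only the
    most specific ones (a covering node is dropped if a covering descendant exists)."""
    cover = {}
    for g in genes:
        # A gene annotated to a term is implicitly annotated to all its ancestors.
        reached = set()
        for term in annotations.get(g, ()):
            reached |= ancestors(term, parents)
        for node in reached:
            cover.setdefault(node, set()).add(g)
    hits = {node for node, gs in cover.items() if len(gs) >= min_fraction * len(genes)}
    redundant = {a for h in hits for a in ancestors(h, parents) if a != h}
    return sorted(hits - redundant)


# Hypothetical mini-ontology: child -> parents (a DAG, i.e. a finite poset).
parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
annotations = {"g1": ["A1"], "g2": ["A2"], "g3": ["A1"], "g4": ["B"]}
genes = ["g1", "g2", "g3", "g4"]
```

With `min_fraction=0.75` the summary is `["A"]` (three of four genes fall under it); lowering the threshold to 0.5 moves the answer down to the more specific `["A1"]`, which illustrates the 'high' versus 'low' placement question raised above.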

10.
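Machine learning algorithms and their applications in environmental microbiology research   Total citations: 1 (self-citations: 0, citations by others: 1)
Chen H  Tao Y  Mao ZD  Xing P 《Acta Microbiologica Sinica》2022,62(12):4646-4662
Microorganisms are ubiquitous in the environment. They are not only key participants in biogeochemical cycles and environmental evolution but also play important roles in environmental monitoring, ecological remediation, and conservation. With the development of high-throughput technologies, large volumes of microbial data are being generated, and using machine learning to model and analyze environmental microbial big data is of great significance for scientific research and societal applications in areas such as microbial biomarker identification, pollutant prediction, and environmental quality prediction. Machine learning can be divided into two broad categories: supervised learning and unsupervised learning. In microbiome research, unsupervised learning efficiently learns the features of the input data through methods such as clustering and dimensionality reduction, and thereby integrates and groups microbial data. Supervised learning trains models on microbial datasets that have both features and labels; when presented with data that have features but no labels, the model can infer the labels, thereby classifying, identifying, and predicting new data. However, complex machine learning algorithms usually focus on predictive accuracy at the expense of interpretability. Machine learning models can often be regarded as "black boxes" that predict specific outcomes, with little known about how the model arrives at its predictions. To apply machine learning more widely in microbiome research and improve our ability to extract valuable microbial information, it is particularly important to understand machine learning algorithms in depth and to improve model interpretability. This article introduces the machine learning algorithms commonly used in environmental microbiology and the steps for building machine learning models from microbiome data, including feature selection, algorithm selection, model construction, and evaluation. It reviews the applications of various machine learning models in environmental microbiology, explores the associations between the microbiome and its surrounding environment, discusses methods for improving model interpretability, and provides a scientific reference for future environmental monitoring and environmental health prediction.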
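The model-building steps listed above (feature selection, algorithm selection, model construction, evaluation) can be sketched end to end on a tiny abundance table. In the sketch below, the feature-selection rule (keep the highest-variance taxa), the algorithm (a nearest-centroid classifier, chosen only for brevity), and the toy "clean"/"polluted" data are all assumptions, not methods prescribed by the review. One design point is worth keeping even in real pipelines: feature selection is redone inside each cross-validation fold, so held-out samples cannot leak into the choice of features.

```python
import statistics


def select_top_variance(X, k):
    """Feature selection: indices of the k highest-variance columns (e.g. taxa)."""
    p = len(X[0])
    var = [statistics.pvariance([row[j] for row in X]) for j in range(p)]
    return sorted(range(p), key=lambda j: var[j], reverse=True)[:k]


def project(x, cols):
    """Keep only the selected feature columns of one sample."""
    return [x[j] for j in cols]


def fit_centroids(X, y):
    """Model construction: one mean feature vector per class label."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [statistics.fmean(col) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


def cross_val_accuracy(X, y, k_feats, folds=3):
    """Evaluation: k-fold accuracy. Features are re-selected inside each fold so
    that held-out samples never influence which features are kept."""
    n, correct = len(X), 0
    for f in range(folds):
        test = set(range(f, n, folds))
        train = [i for i in range(n) if i not in test]
        cols = select_top_variance([X[i] for i in train], k_feats)
        cents = fit_centroids([project(X[i], cols) for i in train], [y[i] for i in train])
        correct += sum(predict(cents, project(X[i], cols)) == y[i] for i in test)
    return correct / n


# Hypothetical relative-abundance table: rows = samples, columns = taxa.
X = [[0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10],
     [0.10, 0.80, 0.10], [0.20, 0.70, 0.10], [0.15, 0.75, 0.10]]
y = ["clean"] * 3 + ["polluted"] * 3
acc = cross_val_accuracy(X, y, k_feats=2)
```

A nearest-centroid model is also easy to interpret: each class is summarized by one mean abundance profile, which speaks to the interpretability concern raised in the abstract.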

11.
Improving missing value estimation in microarray data with gene ontology   Total citations: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis, as many statistical and machine learning techniques either require a complete dataset or produce results that depend significantly on the quality of such estimates. A limitation of the existing estimation methods for microarray data is that they use no external information; the estimation is based solely on the expression data. We hypothesized that utilizing a priori information on functional similarities available from public databases facilitates missing value estimation. RESULTS: We investigated whether semantic similarity originating from gene ontology (GO) annotations could improve the selection of relevant genes for missing value estimation. The relative contribution of each information source was automatically estimated from the data using an adaptive weight selection procedure. Our experimental results on yeast cDNA microarray datasets indicated that by considering GO information in the k-nearest neighbor algorithm we can enhance its performance considerably, especially when the number of experimental conditions is small and the percentage of missing values is high. The performance gain was less evident with a more sophisticated estimation method. We conclude that even a small proportion of annotated genes can provide improvements in data quality significant for the eventual interpretation of microarray experiments. AVAILABILITY: Java and Matlab code is available on request from the authors. SUPPLEMENTARY MATERIAL: Available online at http://users.utu.fi/jotatu/GOImpute.html.
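The core idea, letting GO semantic similarity influence which neighbors a k-nearest-neighbor imputer uses, can be sketched by mixing two distances. In the hypothetical sketch below, a fixed weight `lam` blends expression distance with GO dissimilarity (1 minus a supplied similarity); the paper instead estimates the relative contribution adaptively from the data, and its similarity measure is not reproduced here. `go_sim`, `lam`, and the toy matrix are assumptions.

```python
import math


def knn_impute(expr, target, col, go_sim, k=2, lam=0.5):
    """Estimate expr[target][col] (None = missing) from the k genes nearest to
    `target`, where distance mixes expression distance and GO dissimilarity.
    go_sim(a, b) -> similarity in [0, 1]; lam weights the GO term."""
    def expr_dist(a, b):
        # RMS difference over columns observed in both genes (excluding `col`).
        shared = [j for j in range(len(expr[a]))
                  if j != col and expr[a][j] is not None and expr[b][j] is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((expr[a][j] - expr[b][j]) ** 2 for j in shared) / len(shared))

    candidates = [g for g in range(len(expr)) if g != target and expr[g][col] is not None]
    dist = {g: (1 - lam) * expr_dist(target, g) + lam * (1 - go_sim(target, g))
            for g in candidates}
    nearest = sorted(candidates, key=dist.get)[:k]
    # Inverse-distance weighting; a tiny epsilon guards against zero distance.
    weights = [1.0 / (dist[g] + 1e-9) for g in nearest]
    return sum(w * expr[g][col] for w, g in zip(weights, nearest)) / sum(weights)


expr = [[1.0, 2.0, None],   # gene 0: value in column 2 is missing
        [1.0, 2.0, 3.0],
        [1.1, 2.1, 5.0],
        [5.0, 6.0, 9.0]]


def go_sim(a, b):
    # Hypothetical: only genes 0 and 2 share GO annotations.
    return 1.0 if {a, b} == {0, 2} else 0.0


expr_only = knn_impute(expr, 0, 2, go_sim, k=1, lam=0.0)
with_go = knn_impute(expr, 0, 2, go_sim, k=1, lam=0.5)
```

With expression alone, gene 1 is the nearest neighbor and the estimate is 3.0; once GO similarity is weighted in, the functionally related gene 2 wins and the estimate becomes 5.0. The two distances live on different scales, which is one reason a data-driven choice of the weight, as in the paper, matters.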

12.
《Ecological Complexity》2008,5(3):272-279
As ecological data increase in breadth, depth, and complexity, the discipline of ecology is increasingly influenced by information science. While this influence provides many opportunities for ecologists, it also necessitates a change in how we manage and share data, and perhaps more fundamentally, how we define concepts in ecology. Specifically, the information technology process of automated data integration depends entirely upon consistent concept definition. A common tool used in computer science and engineering to specify meanings, which is both novel and offers significant potential to ecology, is an ontology. An ontology is a formal representation of knowledge in which concepts are described by their meaning and their relationship to each other. Ontologies are a tool that can be used to ‘explicitly specify a concept’ (Gruber, 1993), and this approach is uncommon in ecology. In this paper, we develop an ontology for the concept of ‘landscape’ that captures the most general definitions and usages of this term. We selected the concept of landscape because it is often used in very different ways by investigators and hence generates linguistic uncertainty. A graph-theoretic (i.e., visual) model is provided which describes the set of structuring rules we used to define the relationships between ‘landscape’ and appropriately related terms. Based upon these rules, a landscape necessarily contains a spatial component (i.e., area), structure and function (i.e., ecosystems), and is scale independent. This approach provides the set of necessary conditions for landscape studies to reduce linguistic uncertainty and facilitate interoperability of data, i.e., in a manner that promotes data linkages and quantitative synthesis, particularly by automatic data synthesis programs that are likely to become an important part of ecology in the future.
Simply put, we use an ontology, a technique novel to ecology but not other disciplines, to define ‘landscape,’ thereby clearly delineating one subset of its potential general usage. As such this ontology can serve as both a checklist for landscape studies and a blueprint for additional ecological ontologies.

13.
By 2026, Korea is expected to surpass the UN's definition of an aged society and reach the level of a super-aged society. As a result, degenerative spinal disease and related surgical procedures will increase exponentially. To prevent unnecessary spinal surgery and to support scientific diagnosis of spinal disease and systematic prediction of treatment effects, we have been developing e-Spine, a computerized simulation model of the human spine. In this paper, we present the Korean spine database and ontology that serve as background data for realizing e-Spine. Korean physical characteristics generally differ from those of other populations; for example, ossification of the posterior longitudinal ligament occurs only in Asians. However, while developed countries are currently constructing digital human data to improve medical and biomedical research, digital human data for Koreans are inadequate. Therefore, we constructed a Korean spine database from Koreans with normal spines or degenerative spinal diseases. To date, we have collected spine data from 72 cadavers and 298 patients. The spine data consist of 2D images such as CT, MRI, or X-ray, 3D shapes, geometry data and property data. The volume and quality of the Korean spine database are now the world's highest. We also constructed a spinal ontology to provide a wealth of spine-related information. The spinal ontology contains information on spinal anatomy, treatment methods, causes, and classification. Finally, we implemented a management service for efficiently searching and managing the data. As a result, our database and ontology will offer great value and utility in the diagnosis, treatment, and rehabilitation of patients suffering from spinal diseases.

14.
MOTIVATION: As scientific interest in genome studies shifts toward the large-scale identification of genome functions, data on cellular processes at the molecular level have been accumulating at an accelerating rate. It is therefore essential to be able to store, integrate, access and analyze these data effectively with the help of software tools. Clearly this requires a strong ontology that is intuitive, comprehensive and uncomplicated. RESULTS: We define an ontology for an intuitive, comprehensive and uncomplicated representation of cellular events. The ontology presented here enables integration of fragmented or incomplete pathway information via collaboration, and supports manipulation of the stored data. In addition, it facilitates concurrent modifications to the data while maintaining its validity and consistency. Furthermore, novel structures are provided for representing multiple levels of abstraction for pathways and homologies. Lastly, our ontology supports efficient querying of large amounts of data. We have also developed a software tool named pathway analysis tool for integration and knowledge acquisition (PATIKA), which provides an integrated, multi-user environment for visualizing and manipulating networks of cellular events. PATIKA implements the basics of our ontology.

15.
Parameter setting plays an important role in improving the performance of a brain computer interface (BCI). Currently, parameters (e.g. channels and frequency band) are often selected manually, which is time-consuming and makes an optimal combination of parameters hard to obtain. In this paper, motor imagery-based BCIs are considered, in which channels and frequency band are key parameters. First, a semi-supervised support vector machine algorithm is proposed for automatically selecting a set of channels for a given frequency band. Next, this algorithm is extended for joint channel-frequency selection. In this approach, both labeled training data and unlabeled test data are used to train a classifier, so it can be used when training data are scarce. Finally, our algorithms are applied to a BCI competition data set. Our data analysis results show that these algorithms are effective for selecting frequency bands and channels when the training data set is small.

16.
17.
Structural characterization of protein-protein interactions is essential for our ability to study life processes at the molecular level. Computational modeling of protein complexes (protein docking) is important as a source of their structures and as a way to understand the principles of protein interaction. Rapidly evolving comparative docking approaches utilize target/template similarity metrics, which are often based on protein structure. Although structural similarity generally yields good performance, other characteristics of the interacting proteins (e.g., function, biological process, and localization) may improve prediction quality, especially when target/template structural similarity is weak. For ranking the pool of models for each target, we tested scoring functions that quantify the similarity of Gene Ontology (GO) terms assigned to target and template proteins in three ontology domains: biological process, molecular function, and cellular component (GO-score). The scoring functions were tested in docking of bound, unbound, and modeled proteins. The results indicate that combined structural and GO-term functions improve the scoring, especially in the twilight zone of structural similarity, typical for protein models of limited accuracy.
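One simple way to quantify GO-term similarity between a target and a template across the three domains is a per-domain set overlap, averaged over domains and blended with a structural score. The sketch below uses Jaccard overlap and a linear blend with weight `alpha`; both are assumptions for illustration, and the scoring functions actually tested in the paper may differ. The term identifiers are placeholders.

```python
DOMAINS = ("BP", "MF", "CC")  # biological process, molecular function, cellular component


def jaccard(a, b):
    """Overlap of two GO term sets; 0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def go_score(target, template):
    """target/template: {domain: set of GO ids}. Mean Jaccard over the three domains."""
    return sum(jaccard(target.get(d, set()), template.get(d, set()))
               for d in DOMAINS) / len(DOMAINS)


def blended_score(struct_sim, target, template, alpha=0.7):
    """Blend a structural similarity in [0, 1] with the GO-score."""
    return alpha * struct_sim + (1 - alpha) * go_score(target, template)


# Hypothetical annotations (placeholder ids).
target = {"BP": {"bp1", "bp2"}, "MF": {"mf1"}, "CC": set()}
template = {"BP": {"bp2"}, "MF": {"mf1"}, "CC": {"cc1"}}
score = blended_score(0.4, target, template, alpha=0.5)
```

To rank a pool of docking models for one target, each model's template would be scored this way and the pool sorted by the blended score; lowering `alpha` lets functional agreement compensate when structural similarity is in the twilight zone.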

18.
M. Ba  G. Diallo 《IRBM》2013,34(1):56-59
The proliferation of biomedical applications that rely on different knowledge organization systems, such as ontologies and thesauri, raises the issue of automatically identifying correspondences between these models, in particular for data integration. Significant effort has gone into tackling this problem of ontology alignment. However, few systems are able to deal with ontologies containing tens of thousands of entities, as may be the case in the biomedical domain, where resources such as SNOMED-CT, the FMA or the NCI thesaurus are commonly used. In this paper we present ServOMap, an efficient system for large-scale ontology alignment. It relies on an Ontology Server (ServO) and uses Information Retrieval techniques to compute similarity between entities. The system participated with two configurations in the 2012 Ontology Alignment Evaluation Initiative campaign. We report the very promising results obtained by the system for aligning large biomedical ontologies. ServOMap is freely available for download at http://code.google.com/p/servo/.
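A standard Information Retrieval way to compare entity labels is to turn each label into a TF-IDF vector and compare vectors by cosine similarity. The sketch below illustrates that generic technique only; ServOMap's actual indexing and scoring are not described here, and the tokenizer, smoothed IDF formula, 0.5 threshold, and example labels are assumptions.

```python
import math
from collections import Counter


def tokenize(label):
    return label.lower().replace("-", " ").split()


def tfidf_vectors(labels):
    """One sparse TF-IDF vector (dict) per label; IDF is computed over this collection."""
    docs = [Counter(tokenize(label)) for label in labels]
    n = len(docs)
    df = Counter(t for d in docs for t in d)
    # Smoothed IDF so that terms shared by every label still get nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def best_matches(source, target, threshold=0.5):
    """Candidate alignments: each source label paired with its most similar target label."""
    vecs = tfidf_vectors(source + target)
    src, tgt = vecs[:len(source)], vecs[len(source):]
    pairs = []
    for i, u in enumerate(src):
        j, s = max(((j, cosine(u, v)) for j, v in enumerate(tgt)), key=lambda p: p[1])
        if s >= threshold:
            pairs.append((source[i], target[j], round(s, 3)))
    return pairs


# Hypothetical labels from two ontologies.
pairs = best_matches(["heart valve", "left lung"], ["valve of heart", "lung", "kidney"])
```

Because the vectors ignore word order, "heart valve" and "valve of heart" still match; at the scale of tens of thousands of entities, an inverted index would replace the all-pairs loop used here.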

19.

Background  

Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge for automated methods that identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Metadata are another useful source of information for disambiguation. Here, we systematically compare three approaches to word sense disambiguation that use ontologies and metadata.
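Using ontology structure for disambiguation can be as simple as asking which candidate sense sits closest, in the ontology graph, to the terms already identified nearby in the text. The sketch below is a hypothetical overlap heuristic, not one of the three approaches the paper compares: it picks the sense whose set of ancestors shares the most nodes with the ancestors of the context terms. The mini-ontology and the ambiguous label "nucleus" are illustrative assumptions.

```python
def ancestors(term, parents):
    """`term` plus everything above it in the ontology DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def disambiguate(senses, context_terms, parents):
    """Pick the sense whose ontology neighbourhood overlaps most with the
    neighbourhoods of the (already identified) context terms."""
    context = set()
    for t in context_terms:
        context |= ancestors(t, parents)
    return max(senses, key=lambda s: len(ancestors(s, parents) & context))


# Hypothetical mini-ontology; the text label "nucleus" maps to two senses.
parents = {
    "cell nucleus": ["organelle"],
    "mitochondrion": ["organelle"],
    "organelle": ["cellular component"],
    "brain nucleus": ["brain region"],
    "brain region": ["anatomical structure"],
}
sense = disambiguate(["cell nucleus", "brain nucleus"], ["mitochondrion"], parents)
```

A mention of "mitochondrion" nearby pulls the decision toward the organelle sense, which is information a flat terminology lookup could not use.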

20.
Although databases for cell signaling pathways contain large amounts of reaction data, these data cannot yet be used to deduce biological functions. To make such deductions, we need a systematic and consistent interpretation of the biological functions of reactions in cell signaling pathways in the context of "information transmission". To address this issue, we have developed a functional ontology for cell signaling pathways, the Cell Signaling Network Ontology (CSN-Ontology), which provides a framework for functional interpretation by introducing key concepts such as information, selectivity, movability, and signaling rules, including the passage of time.
