Similar literature
1.
Pal D. Bioinformation, 2006, 1(3): 97-98
The effort of function annotation does not merely involve associating a gene with some structured vocabulary that describes action. Rather, the details of the actions, the components of the actions, and the larger context of the actions are important issues of direct relevance, because they help us understand the biological system to which the gene/protein belongs. Currently the Gene Ontology (GO) Consortium offers the most comprehensive sets of relationships to describe gene/protein activity. However, its choice to segregate gene ontology into the subdomains of molecular function, biological process and cellular component creates significant limitations on its future scope of use. If we are to understand biology in its total complexity, comprehensive ontologies in larger biological domains are essential. A vigorous discussion on this topic is necessary for the larger benefit of the biological community. I highlight this point because larger-bio-domain ontologies cannot be created simply by integrating subdomain ontologies. Relationships in larger-bio-domain ontologies are more complex because of the larger size of the system and are therefore more labor-intensive to create. The current limitations of GO will be a handicap in deriving more complex relationships from high-throughput biology data.

2.
3.
We present an analysis of some considerations involved in expressing the Gene Ontology (GO) as a machine-processible ontology, reflecting principles of formal ontology. GO is a controlled vocabulary that is intended to facilitate communication between biologists by standardizing the usage of terms in database annotations. Making such controlled vocabularies maximally useful in support of bioinformatics applications requires explicating, in machine-processible form, the implicit background information that enables human users to interpret the meaning of the vocabulary terms. In the case of GO, this process would involve rendering the meanings of GO terms into a formal (logical) language with the help of domain experts, and adding the further information required to support the chosen formalization. A controlled vocabulary augmented in these ways is commonly called an ontology. In this paper, we make a modest exploration to determine the ontological requirements for this extended version of GO. Using the terms within the three GO hierarchies (molecular function, biological process and cellular component), we investigate the facility with which GO concepts can be ontologized, using available tools from the philosophical and ontological engineering literature.

4.
In biology, ontology applications involve large amounts of genetic information and chemical information on molecular structure, so each ontology concept conveys a great deal of information. In mathematical terms, the vector corresponding to an ontology concept therefore often has a very high dimension, which places greater demands on ontology algorithms. Against this background, we consider the design of an ontology sparse vector algorithm and its application in biology. In this paper, using marginal likelihood and marginal distributions, we present an optimization strategy for a marginal-based ontology sparse vector learning algorithm. Finally, the new algorithm is applied to the gene ontology and the plant ontology to verify its efficiency.

5.
6.

Background  

The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group.
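The enrichment approach described above can be sketched as a one-sided hypergeometric test. This is only one common choice of statistic, not necessarily the one used in the paper; the function name, its parameters and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, group, hits):
    """One-sided hypergeometric p-value for annotation enrichment.

    population: number of genes in the reference group
    annotated:  reference genes carrying the annotation
    group:      number of genes in the cluster of interest
    hits:       cluster genes carrying the annotation

    Returns P(X >= hits) when `group` genes are drawn without
    replacement from the reference group.
    """
    upper = min(annotated, group)
    # Count the draws that contain at least `hits` annotated genes.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, group - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, group)


# Hypothetical example: 10 reference genes, 5 annotated,
# a cluster of 5 genes that are all annotated.
p = enrichment_p_value(population=10, annotated=5, group=5, hits=5)
```

In practice the test is repeated for every annotation term, so the resulting p-values would normally need a multiple-testing correction.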

7.
8.
Zhong S, Li C, Wong WH. Nucleic Acids Research, 2003, 31(13): 3483-3486
To date, assembling comprehensive annotation information for all probe sets of any Affymetrix microarray remains a time-consuming, error-prone and challenging task. ChipInfo is designed for retrieving annotation information from online databases such as NetAffx and Gene Ontology and organizing that information into easily interpretable tabular outputs. As companion software to dChip and GoSurfer, ChipInfo enables users to independently update the information resource files of these software packages. It also has functions for computing related summary statistics of probe sets and Gene Ontology terms. ChipInfo is available at http://biosun1.harvard.edu/complab/chipinfo/.

9.
We describe the PloGO R package, a simple open-source tool for plotting gene ontology (GO) annotation and abundance information, which was developed to aid the bioinformatics analysis of multi-condition label-free proteomics experiments using quantitation based on spectral counting. PloGO can incorporate abundance (raw spectral counts) or normalized spectral abundance factor (NSAF) data in addition to the GO annotation, handle multiple files, and allow for a targeted collection of GO categories of interest. Our main aims were to help identify interesting subsets of proteins for further analysis, such as those arising from a protein data set partition based on presence and absence or on multiple pair-wise comparisons, and to provide GO summaries that can be easily used in subsequent analyses. Though developed with label-free proteomics experiments in mind, it is not specific to that approach and can be used for any multi-condition experiment for which GO information has been generated.
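For readers unfamiliar with NSAF, the sketch below computes it using the usual definition: each protein's spectral count is divided by its length, and the result is normalized by the sum over all proteins in the sample. It is written in Python for illustration and is not PloGO's R interface; the function name and the example proteins are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor per protein.

    spectral_counts: dict mapping protein id -> raw spectral count
    lengths:         dict mapping protein id -> protein length (residues)
    """
    # Spectral abundance factor: count corrected for protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("all spectral counts are zero")
    # Normalize so that the values for one sample sum to 1.
    return {p: value / total for p, value in saf.items()}


# Hypothetical sample with two proteins.
example = nsaf({"P1": 10, "P2": 20}, {"P1": 100, "P2": 400})
```

Although P2 has twice the raw count of P1, its greater length gives it the smaller NSAF value in this example.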

10.
Quantitative or numerical metrics of protein function specificity made possible by the Gene Ontology are useful in that they enable development of distance or similarity measures between protein functions. Here we describe how to calculate four measures of function specificity for GO terms: 1) number of ancestor terms; 2) number of offspring terms; 3) proportion of terms; and 4) Information Content (IC). We discuss the relationship between the metrics and the strengths and weaknesses of each.
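The four measures listed above can be sketched on a toy directed acyclic graph. The abstract does not give exact formulas, so this sketch makes two assumptions: "proportion of terms" is taken as the fraction of ontology terms that are the term itself or one of its offspring, and IC is taken as the negative base-2 logarithm of that proportion (IC is often computed from annotation frequencies instead). The term names and the `PARENTS` table are hypothetical, not real GO identifiers.

```python
import math

# Hypothetical toy ontology: term -> set of direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"C"},
}


def ancestors(term, parents=PARENTS):
    """All terms reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def offspring(term, parents=PARENTS):
    """All terms that have `term` among their ancestors."""
    return {t for t in parents if term in ancestors(t, parents)}


def proportion(term, parents=PARENTS):
    """Assumed definition: (offspring + the term itself) / all terms."""
    return (len(offspring(term, parents)) + 1) / len(parents)


def information_content(term, parents=PARENTS):
    """Assumed definition: -log2 of the proportion above."""
    return -math.log2(proportion(term, parents))
```

Under these assumptions the root has an IC of 0 and leaf terms have the highest IC, which matches the intuition that deeper terms describe more specific functions.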

11.
Here we introduce a computer database that allows for the rapid retrieval of physicochemical properties, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes information about a protein or a list of proteins. We applied PIGOK to analyze Schizosaccharomyces pombe proteins displaying differential expression under oxidative stress and identified their biological functions and pathways. The database is available on the Internet at http://pc4-133.ludwig.ucl.ac.uk/pigok.html.

12.
13.
14.
Based on recent developments in the gene ontology and functional domain databases, a new hybridization approach is developed for predicting protein subcellular location by combining the gene product, functional domain, and quasi-sequence-order effects. As a showcase, the same prokaryotic and eukaryotic datasets that were studied by many previous investigators are used for demonstration. The overall success rate by the jackknife test is 94.7% for the prokaryotic set and 92.9% for the eukaryotic set. These are so far the highest success rates achieved for the two datasets by following a rigorous cross-validation test procedure, suggesting that such a hybrid approach may become a very useful high-throughput tool in bioinformatics, proteomics, and molecular cell biology. The very high success rates also reflect the fact that the subcellular localization of a protein is closely correlated with: (1) the biological objective to which the gene or gene product contributes, (2) the biochemical activity of a gene product, and (3) the place in the cell where a gene product is active.

15.

Background  

Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets.

16.
MOTIVATION: There has been an explosion of interest in the role of mitochondria in programmed cell death and other fundamental pathological processes underlying the development of human diseases. Nevertheless, the inventory of mitochondrial proteins encoded in the nuclear genome remains incomplete, providing an impediment to mitochondrial research at the interface with systems biology. We created the MiGenes database to further define the scope of the mitochondrial proteome in humans and model organisms including mice, rats, flies and worms as well as budding and fission yeasts. MiGenes is intended to stimulate mitochondrial research using model organisms. SUMMARY: MiGenes is a large-scale relational database that is automatically updated to keep pace with advances in mitochondrial proteomics and is curated to assure that the designation of proteins as mitochondrial reflects gene ontology (GO) annotations supported by high-quality evidence codes. A set of postulates is proposed to help define which proteins are authentic components of mitochondria. MiGenes incorporates >1160 new GO annotations to human, mouse and rat protein records, 370 of which represent the first GO annotation reflecting a mitochondrial localization. MiGenes employs a flexible search interface that permits batchwise accession number searches to support high-throughput proteomic studies. A web interface is provided to permit members of the mitochondrial research community to suggest modifications in protein annotations or mitochondrial status.

17.

Background  

Integration and exploration of data obtained from genome-wide monitoring technologies has become a major challenge for many bioinformaticists and biologists because of the heterogeneity and high dimensionality of the data. A widely accepted approach to solving these issues has been the creation and use of controlled vocabularies (ontologies). Ontologies allow for the formalization of domain knowledge, which in turn enables generalization in the creation of querying interfaces as well as in the integration of heterogeneous data, providing both human- and machine-readable interfaces.

18.

Background  

The Gene Ontology project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon neutral, and to cover all species, there are inherent taxon specificities in some branches. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. The lack of an explicit formalization of these constraints can lead to errors and inconsistencies in automated and manual annotation.

19.
The original aim of Information Theory (IT) was to solve a purely technical problem: to increase the performance of communication systems, which are constantly affected by interference that diminishes the quality of the transmitted information. That is, the theory deals only with the problem of transmitting, with maximal precision, the symbols constituting a message. In Shannon's theory messages are characterized only by their probabilities, regardless of their value or meaning. As for its present-day status, it is generally acknowledged that Information Theory has solid mathematical foundations and fruitful, strong links with Physics in both theoretical and experimental areas. However, many applications of Information Theory to Biology are limited to using it as a technical tool to analyze biopolymers, such as DNA, RNA or protein sequences. The main point of discussion about the applicability of IT to explain the information flow in biological systems is that in a classic communication channel, the symbols that make up the coded message are transmitted one by one, independently, through a noisy communication channel, and noise can alter each of the symbols, distorting the message; in contrast, in a genetic communication channel the coded messages are not transmitted in the form of symbols but are carried by signaling cascades. Consequently, the information flow from the emitter to the effector is due to a series of coupled physicochemical processes that must ensure the accurate transmission of the message. In this review we discuss a novel proposal to overcome this difficulty, which consists of modeling gene expression with a stochastic approach that allows Shannon entropy (H) to be used directly to measure the amount of uncertainty that the genetic machinery has in relation to the correct decoding of a message transmitted into the nucleus by a signaling pathway.
From the value of H we can define a function I that measures the amount of information content in the input message that the cell's genetic machinery is processing during a given time interval. Furthermore, by combining Information Theory with the frequency response analysis of dynamical systems, we can examine the cell's genetic response to input signals with varying frequency, amplitude and form, in order to determine whether the cell can distinguish between different regimes of information flow from the environment. In the particular case of the ethylene signaling pathway, the amount of information managed by the root cell of Arabidopsis can be correlated with the frequency of the input signal. The ethylene signaling pathway cuts off very low and very high frequencies, allowing a window of frequency response in which the nucleus reads the incoming message as a varying input. Outside this window the nucleus reads the input message as an approximately non-varying one. This frequency response analysis is also useful for estimating the rate of information transfer during the transport of each new ERF1 molecule into the nucleus. Additionally, applying Information Theory to the analysis of the flow of information in the ethylene signaling pathway provides deeper insight into the way in which the transition between auxin and ethylene hormonal activity occurs during a circadian cycle. An ambitious goal for the future would be to use Information Theory as a theoretical foundation for a suitable model of the information flow that runs at each level and through all levels of biological organization.
Key words: information theory, Shannon entropy, frequency systems analysis, Arabidopsis thaliana, ethylene signaling systems, plant genetic networks, circadian cycles
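The Shannon entropy H used in this review has the standard definition H = -sum(p_i * log2(p_i)) over the probabilities of the possible messages. The sketch below computes only that quantity. It does not reproduce the review's stochastic gene expression model or its function I, and the example distributions are hypothetical.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing, by the usual convention.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical distributions over four possible states.
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # maximal uncertainty
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])      # no uncertainty
```

A uniform distribution over four states gives 2 bits, the maximum for four outcomes, while a distribution concentrated on one state gives 0 bits.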

20.
Mostafavi S, Morris Q. Proteomics, 2012, 12(10): 1687-1696
In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems.
