Similar articles
 20 similar articles found
1.
BACKGROUND: Saccharomyces cerevisiae is recognized as a model system representing a simple eukaryote whose genome can be easily manipulated. Information sought by scientists on its biological entities (proteins, genes, RNAs, etc.) is scattered across several data sources such as SGD, Yeastract, CYGD-MIPS, BioGrid and PhosphoGrid. Because of the heterogeneity of these sources, querying them separately and then manually combining the returned results is a complex and time-consuming task for biologists, most of whom are not bioinformatics experts. It also limits the use that can be made of the available data. RESULTS: To provide transparent and simultaneous access to yeast sources, we have developed YeastMed, an XML- and mediator-based system. In this paper, we present our approach to developing this system, which takes advantage of SB-KOM to perform the necessary query transformation and a set of Data Services to reach the integrated data sources. The system is composed of a set of modules that depend heavily on XML and Semantic Web technologies. User queries are expressed in terms of a domain ontology through a simple form-based web interface. CONCLUSIONS: YeastMed is the first mediation-based system specifically for integrating yeast data sources. It was conceived mainly to help biologists find relevant data from multiple data sources simultaneously. It has a biologist-friendly, easy-to-use interface. The system is available at http://www.khaos.uma.es/yeastmed/.

2.
3.
Characterizing gene function is one of the major challenges of the post-genomic era. To address this challenge, we have developed GeneFAS (Gene Function Annotation System), a new integrated probabilistic method for cellular function prediction that combines information from protein-protein interactions, protein complexes, microarray gene expression profiles, and annotations of known proteins through an integrative statistical model. Our approach is based on a novel assessment of the relationship between (1) the interaction/correlation of two proteins' high-throughput data and (2) their functional relationship in terms of their Gene Ontology (GO) hierarchy. We have developed a Web server for the predictions. We have applied our method to the yeast Saccharomyces cerevisiae and predicted functions for 1548 out of 2472 unannotated proteins.

4.
TAMBIS: transparent access to multiple bioinformatics information sources   (Total citations: 4; self-citations: 0; citations by others: 4)
SUMMARY: TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources) is an application that allows biologists to ask rich and complex questions over a range of bioinformatics resources. It is based on a model of the knowledge of the concepts and their relationships in molecular biology and bioinformatics. AVAILABILITY: TAMBIS is available as an applet from http://img.cs.man.ac.uk/tambis. SUPPLEMENTARY: A full manual, tutorial and videos can be found at http://img.cs.man.ac.uk/tambis. CONTACT: tambis@cs.man.ac.uk

5.
Images are paramount in documentation of morphological data. Production and reproduction costs have traditionally limited how many illustrations taxonomy could afford to publish, and much comparative knowledge continues to be lost as generations turn over. Now digital images are cheaply produced and easily disseminated electronically but pose problems in maintenance, curation, sharing, and use, particularly in long-term data sets involving multiple collaborators and institutions. We propose an efficient linkage of images to phylogenetic data sets via an ontology of morphological terms; an underlying, fine-grained database of specimens, images, and associated metadata; fixation of the meaning of morphological terms (homolog names) by ostensive references to particular taxa; and formalization of images as standard views. The ontology provides the intellectual structure and fundamental design of the relationships and enables intelligent queries to populate phylogenetic data sets with images. The database itself documents primary morphological observations, their vouchers, and associated metadata, rather than the conventional data set cell, and thereby facilitates data maintenance despite character redefinition or specimen reidentification. It minimizes reexamination of specimens, loss of information or data quality, and echoes the data models of web-based repositories for images, specimens, and taxonomic names. Confusion and ambiguity in the meanings of technical morphological terms are reduced by ostensive definitions pointing to features in particular taxa, which may serve as reference for globally unique identifiers of characters. 
Finally, the concept of standard views (an image illustrating one or more homologs in a specific sex and life stage, in a specific orientation, using a specific device and preparation technique) enables efficient, dynamic linkage of images to the data set and automatic population of matrix cells with images independently of scoring decisions.

6.
High-throughput methods for detecting protein interactions, such as mass spectrometry and yeast two-hybrid assays, continue to produce vast amounts of data that may be exploited to infer protein function and regulation. As this article went to press, the pool of all published interaction information on Saccharomyces cerevisiae was 15,143 interactions among 4,825 proteins, and power-law scaling supports an estimate of 20,000 specific protein interactions. To investigate the biases, overlaps, and complementarities among these data, we have carried out an analysis of two high-throughput mass spectrometry (HMS)-based protein interaction data sets from budding yeast, comparing them to each other and to other interaction data sets. Our analysis reveals 198 interactions among 222 proteins common to both data sets, many of which reflect large multiprotein complexes. It also indicates that a "spoke" model that directly pairs bait proteins with associated proteins is roughly threefold more accurate than a "matrix" model that connects all proteins. In addition, we identify a large, previously unsuspected nucleolar complex of 148 proteins, including 39 proteins of unknown function. Our results indicate that existing large-scale protein interaction data sets are nonsaturating and that integrating many different experimental data sets yields a clearer biological view than any single method alone.
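The difference between the "spoke" and "matrix" readings of a purification can be made concrete with a small sketch. The protein names below are hypothetical placeholders, and the helper functions are illustrative only; they are not taken from the article.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: pair the bait with each co-purified protein only."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: pair every protein in the purification with every other."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: one bait and three co-purified proteins.
bait = "BAIT1"
preys = ["P1", "P2", "P3"]

spoke = spoke_pairs(bait, preys)
matrix = matrix_pairs(bait, preys)

print(len(spoke), len(matrix))  # 3 6
print(spoke <= matrix)          # True: every spoke pair is also a matrix pair
```

For a purification with n proteins in total, the spoke model yields n - 1 pairs while the matrix model yields n(n - 1)/2, so the matrix model adds many prey-prey pairs that the experiment never tested directly.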

7.
The functional form of spillover, measured as a gradient of fish abundance, may provide insight into the processes that control the spatial distribution of fish inside and outside an MPA. In this study, we aimed to infer the spillover mechanism of Diplodus spp. (family Sparidae) from a Mediterranean MPA (Carry-le-Rouet, France) using visual censuses and artisanal fisheries data. From the existing literature, three potential functional forms of spillover are defined: a linear gradient, an exponential gradient and a logistic gradient. Each functional form is included in a spatial generalized linear mixed model that accounts for spatial autocorrelation in the data. We select among the different forms of gradient using a Bayesian model selection procedure. In a first step, the functional form of the spillover is assessed separately for the visual census and artisanal fishing data. For both sets of data, our model selection favoured the negative exponential model, evidencing a decrease in the spatial abundance of fish that vanishes around 1000 m from the MPA border. We then combined both datasets in a joint model by including an observability parameter. This parameter captures how the different sources of data quantify the underlying spatial distribution of the harvested species. This enabled us to demonstrate that the different sampling methods do not affect the estimation of the underlying spatial distribution of Diplodus spp. inside and outside the MPA. We show that data from different sources can be pooled through a spatial generalized linear mixed model. Our findings allow a better understanding of the underlying mechanisms that control spillover of fish from MPAs.
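The three candidate gradient shapes can be sketched as simple functions of distance from the MPA border. The abstract does not give the parameterizations, so the forms and parameter names below (`a`, `b`, `k`, `d0`) are assumptions chosen for illustration, and the sketch leaves out the spatial mixed model and the Bayesian model selection entirely.

```python
import math


def linear_gradient(d, a, b):
    """Abundance falls linearly with distance d (m), floored at zero.

    a: abundance at the border, b: loss per metre (both hypothetical).
    """
    return max(a - b * d, 0.0)


def exponential_gradient(d, a, k):
    """Negative exponential: abundance decays at rate k per metre."""
    return a * math.exp(-k * d)


def logistic_gradient(d, a, k, d0):
    """Logistic: plateau near the border, then a drop centred on d0."""
    return a / (1.0 + math.exp(k * (d - d0)))


# Illustrative values only; they are not estimates from the study.
for d in (0, 250, 500, 1000):
    print(
        d,
        round(linear_gradient(d, a=10.0, b=0.01), 3),
        round(exponential_gradient(d, a=10.0, k=0.005), 3),
        round(logistic_gradient(d, a=10.0, k=0.01, d0=500.0), 3),
    )
```

In a model comparison of the kind described, each of these shapes would serve as the mean structure of a separate candidate model, and the data would decide among them.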

8.
Chen Y  Wang W  Zhou Y  Shields R  Chanda SK  Elston RC  Li J 《PloS one》2011,6(6):e21137
Identifying disease genes is crucial to the understanding of disease pathogenesis, and to the improvement of disease diagnosis and treatment. In recent years, many researchers have proposed approaches to prioritize candidate genes by considering the relationship between candidate genes and known disease genes, as reflected in other data sources. In this paper, we propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graphic representation. Gene-gene relationships and gene-disease relationships are then defined based on the overall topology of each network using a diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is used to rank all candidate genes. Based on the informativeness of available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed a large-scale cross-validation analysis on 110 disease families using three data sources. The results show that our approach consistently outperforms two other state-of-the-art programs. A case study using Parkinson disease (PD) identified four candidate genes (UBB, SEPT5, GPR37 and TH) that ranked higher than our adaptive threshold, all of which are involved in the PD pathway. In particular, a very recent study observed a deletion of TH in a patient with PD, which supports the importance of the TH gene in PD pathogenesis. A web tool has been implemented to assist scientists in their genetic studies.
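A diffusion kernel on a graph is commonly defined as K = exp(-βL), where L is the graph Laplacian and β controls how far similarity spreads. The sketch below computes it for a toy network and ranks candidates by their kernel similarity to a known disease gene. It assumes this standard definition and a simple sum-over-seeds score; the abstract does not spell out the paper's exact normalization or scoring, and the node names are hypothetical.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix with zero diagonal."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def diffusion_kernel(adj, beta=0.5, terms=40):
    """K = exp(-beta * L) via a truncated Taylor series (adequate for small graphs)."""
    n = len(adj)
    m = [[-beta * x for x in row] for row in laplacian(adj)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]  # current series term, starts at the identity
    for step in range(1, terms):
        term = [[x / step for x in row] for row in matmul(term, m)]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += term[i][j]
    return kernel


def rank_candidates(kernel, names, seeds, candidates):
    """Order candidates by summed kernel similarity to the seed (known) genes."""
    idx = {name: i for i, name in enumerate(names)}
    scores = {c: sum(kernel[idx[c]][idx[s]] for s in seeds) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


# Toy path network SEED - A - B (hypothetical gene names).
names = ["SEED", "A", "B"]
adj = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
K = diffusion_kernel(adj, beta=0.5)
print(rank_candidates(K, names, seeds=["SEED"], candidates=["B", "A"]))  # ['A', 'B']
```

Unlike a shortest-path distance, the kernel accumulates contributions from all paths between two nodes, which is why it is described as reflecting the overall topology of the network.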

9.
GDPC: connecting researchers with multiple integrated data sources   (Total citations: 1; self-citations: 0; citations by others: 1)
The goal of this project is to simplify access to genomic diversity and phenotype data, thereby encouraging reuse of this data. The Genomic Diversity and Phenotype Connection (GDPC) accomplishes this by retrieving data from one or more data sources and by allowing researchers to analyze integrated data in a standard format. GDPC is written in JAVA and provides (1) data sources available as web services that transfer XML-formatted data via the SOAP protocol; (2) a JAVA API for programmatic access to data sources; and (3) a front-end application that allows users to manage data sources, retrieve data based on filters, sort/group data based on property values and save/open the data as XML files. AVAILABILITY: The source code, compiled code, documentation and GDPC Browser are freely available at: www.maizegenetics.net/gdpc/index.html. The current release of GDPC is version 1.0, with updated releases planned for the future. Comments are welcome.

10.
11.
12.
A system for "intelligent" semantic integration and querying of federated databases is being implemented using three main components: a component that enables SQL access to integrated databases by database federation (MARGBench), an ontology-based semantic metadatabase (SEMEDA) and an ontology-based query interface (SEMEDA-query). In this publication we explain and demonstrate the principles, architecture and use of SEMEDA. Since SEMEDA is implemented as a three-tiered web application, database providers can enter all relevant semantic and technical information about their databases themselves via a web browser. SEMEDA's collaborative ontology editing feature is not restricted to database integration, and might also be useful for ongoing ontology developments, such as the "Gene Ontology" [2]. SEMEDA can be found at http://www-bm.cs.uni-magdeburg.de/semeda/. We explain how this ontologically structured information can be used for semantic database integration. In addition, requirements for ontologies for molecular biological database integration are discussed and relevant existing ontologies are evaluated. We further discuss how ontologies and structured knowledge sources can be used in SEMEDA and whether they can be merged, supplemented or updated to meet the requirements for semantic database integration.

13.
DNA–protein interactions play essential roles in all living cells. Understanding how features embedded in the DNA sequence affect specific interactions with proteins is both challenging and important, since it may contribute to finding the means to regulate metabolic pathways involving DNA–protein interactions. Using a massive experimental benchmark dataset of binding scores for DNA sequences and a machine learning workflow, we describe the binding to DNA of T7 primase, as a model system for specific DNA–protein interactions. Effective binding of T7 primase to its specific DNA recognition sequences triggers the formation of RNA primers that serve as Okazaki fragment start sites during DNA replication.

14.
Microarray technology has resulted in an explosion of complex, valuable data. Integrating data analysis tools with a comprehensive underlying database would allow efficient identification of common properties among differentially regulated genes. In this study we sought to compare the utility of various databases in microarray analysis. The Proteome BioKnowledge Library (BKL), a manually curated, proteome-wide compilation of the scientific literature, was used to generate a list of Gene Ontology (GO) Biological Process (BP) terms enriched among proteins involved in cardiovascular disease. Analysis of DNA microarray data generated in a study of rat vascular smooth muscle cell responses revealed significant enrichment in a number of GO BPs that were also enriched among cardiovascular disease-related proteins. Using annotation from LocusLink and chip annotation from the Gene Expression Omnibus yielded fewer enriched cardiovascular disease-associated GO BP terms. Data sets of orthologous genes from mouse and human were generated using the BKL Retriever. Analysis of these sets, focusing on BKL Disease annotation, revealed a significant association of these genes with cardiovascular disease. These results and the extensive presence of experimental evidence for BKL GO and Disease features underscore the benefits of using this database for microarray analysis.

15.

Background  

The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes.

16.
After a brief synopsis of the history of mantodean classification, a re-organized systematic arrangement of extant praying mantids is provided. To overcome past homoplasy problems, a phylogenetic framework based on male genital structure was used, supplemented by published morphological, chromosomal and molecular data. As already noticed by previous authors, external morphology is highly homoplastic and does not provide useful systematic tools above subfamily level. In contrast, the morphology of male external genitalia is largely congruent with the results of recent molecular phylogenies, but contradicts the most widely used past systems. Additionally, some genital structures widely used for taxonomic purposes could be shown to be not homologous, most notably the distal process. Evolutionary transitions of the distal process and the phalloid apophysis across the mantodean phylogenetic tree are identified and named. The phalloid apophysis of many derived mantodeans shows a tendency towards bifurcation into an anterior and a posterior lobe. This and other observed genital traits are hypothesized to be an adaptation of males towards a stable copulatory grasp in groups exhibiting increased sexual dimorphism, associated with an increased risk for the male to be cannibalized during copulation. Genital characters allowed most genera to be unambiguously assigned to the major clades (superfamilies) recovered by our genital and previous molecular data. The few exceptions concern genera with secondarily simplified genitalia lacking diagnostic structures. Taxonomic literature is very heterogeneous, and several subfamilies yet lacking any modern revisionary treatment will need further refinement. To account for phylogenetic constraints, i.e. correct for past polyphyletic groupings, the number of families was elevated to 29, and the number of subfamilies to 60. 
We establish the new family Leptomantellidae, the new subfamilies Brancsikiinae and Deiphobinae, the new tribes Leptomiopterygini, Hagiomantini, Gonypetellini, Bolbellini, Epsomantini, Neomantini, Amantini, Armenini, Danuriellini, Deiphobini, Cotigaonopsini, Didymocoryphini, Oxyelaeini, Heterochaetulini, Rhodomantini and Pseudoxyopsidini, and the new subtribes Amphecostephanina, Bolbina, Tricondylomimina, Gonypetyllina, Antistiina, Toxomantina and Tarachomantina. New morphological diagnoses are provided for the currently recognized families. Despite a few yet to be solved problems, this work offers the urgently needed working base for future studies in Mantodean systematics, life history and ecology.

17.
18.
19.
It has been a challenging task to integrate high-throughput data into investigations of the systematic and dynamic organization of biological networks. Here, we present a simple hierarchical clustering algorithm that goes a long way towards achieving this aim. Our method effectively reveals the modular structure of the yeast protein-protein interaction network and distinguishes protein complexes from functional modules by integrating high-throughput protein-protein interaction data with added subcellular localization and expression profile data. Furthermore, we take advantage of the detected modules to provide a reliable functional context for the uncharacterized components within modules. In addition, the integration of various kinds of protein-protein association information makes our method robust to false positives, especially for derived protein complexes. More importantly, this simple method can be extended naturally to other types of data fusion and provides a framework for the study of more comprehensive properties of biological networks and other forms of complex networks.

20.
Advances in proteomics technologies have enabled novel protein interactions to be detected at high speed, but at the expense of relatively low quality. Therefore, a crucial step in utilizing high-throughput protein interaction data is evaluating their confidence and then separating the subsets of reliable interactions from the background noise for further analyses. Using Bayesian network approaches, we combine multiple heterogeneous lines of biological evidence, including model organism protein-protein interaction, interaction domain, functional annotation, gene expression, genome context, and network topology structure, to assign reliability to the human protein-protein interactions identified by high-throughput experiments. This method shows high sensitivity and specificity in predicting true interactions from human high-throughput protein-protein interaction data sets. The method has been developed into an online confidence scoring system specifically for human high-throughput protein-protein interactions. Users may submit their protein-protein interaction data online, and detailed information about the supporting evidence for the query interactions, together with the confidence scores, will be returned. The Web interface of PRINCESS (protein interaction confidence evaluation system with multiple data sources) is available at the website of China Human Proteome Organisation.
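One simple way to see how several evidence types can be combined into a single confidence score is the naive Bayes special case of a Bayesian network, in which each evidence type contributes an independent likelihood ratio. The sketch below uses that simplification; it is not the PRINCESS model, and the prior and likelihood ratios are made-up illustrative numbers.

```python
def interaction_confidence(prior, likelihood_ratios):
    """Posterior probability that an interaction is true.

    prior: assumed prior probability that a reported interaction is real.
    likelihood_ratios: one value per evidence type, each equal to
        P(evidence | true interaction) / P(evidence | false interaction).
    Assumes the evidence types are conditionally independent (naive Bayes).
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical evidence for one candidate interaction.
evidence = {
    "ortholog_interaction": 8.0,   # supportive
    "shared_go_annotation": 5.0,   # supportive
    "coexpression": 0.7,           # mildly unsupportive
}
score = interaction_confidence(prior=0.05, likelihood_ratios=evidence.values())
print(round(score, 3))
```

Ratios above 1 raise the score and ratios below 1 lower it, so an interaction backed by several independent evidence types ends up well above the prior even when each type alone is only moderately informative.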
