首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 47 毫秒
1.
Many computational problems and methods have been proposed for analysis of biological pathways. Among them, this paper focuses on extraction of mapping rules of atoms from enzymatic reaction data, which is useful for drug design, simulation of tracer experiments, and consistency checking of pathway databases. Most of existing methods for this problem are based on maximal common subgraph algorithms. In this paper, we propose a novel approach based on graph partition and graph isomorphism. We show that this problem is NP-hard in general, but can be solved in polynomial time for wide classes of enzymatic reactions. We also present an O(n(1.5)) time algorithm for a special but fundamental class of reactions, where n is the maximum size of compounds appearing in a reaction. We develop practical polynomial-time algorithms in which the Morgan algorithm is used for computing the normal form of a graph, where it is known that the Morgan algorithm works correctly for most chemical structures. Computational experiments are performed for these practical algorithms using the chemical reaction data stored in the KEGG/LIGAND database. The results of computational experiments suggest that practical algorithms are useful in many cases.  相似文献   

2.
The KEGG databases at GenomeNet   总被引:30,自引:0,他引:30       下载免费PDF全文
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service (http://www.genome.ad.jp/) for understanding higher order functional meanings and utilities of the cell or the organism from its genome information. KEGG consists of the PATHWAY database for the computerized knowledge on molecular interaction networks such as pathways and complexes, the GENES database for the information about genes and proteins generated by genome sequencing projects, and the LIGAND database for the information about chemical compounds and chemical reactions that are relevant to cellular processes. In addition to these three main databases, limited amounts of experimental data for microarray gene expression profiles and yeast two-hybrid systems are stored in the EXPRESSION and BRITE databases, respectively. Furthermore, a new database, named SSDB, is available for exploring the universe of all protein coding genes in the complete genomes and for identifying functional links and ortholog groups. The data objects in the KEGG databases are all represented as graphs and various computational methods are developed to detect graph features that can be related to biological functions. For example, the correlated clusters are graph similarities which can be used to predict a set of genes coding for a pathway or a complex, as summarized in the ortholog group tables, and the cliques in the SSDB graph are used to annotate genes. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).  相似文献   

3.
Exploring the diversity of complex metabolic networks   总被引:1,自引:0,他引:1  
MOTIVATION: Metabolism, the network of chemical reactions that make life possible, is one of the most complex processes in nature. We describe here the development of a computational approach for the identification of every possible biochemical reaction from a given set of enzyme reaction rules that allows the de novo synthesis of metabolic pathways composed of these reactions, and the evaluation of these novel pathways with respect to their thermodynamic properties. RESULTS: We applied this framework to the analysis of the aromatic amino acid pathways and discovered almost 75,000 novel biochemical routes from chorismate to phenylalanine, more than 350,000 from chorismate to tyrosine, but only 13 from chorismate to tryptophan. Thermodynamic analysis of these pathways suggests that the native pathways are thermodynamically more favorable than the alternative possible pathways. The pathways generated involve compounds that exist in biological databases, as well as compounds that exist in chemical databases and novel compounds, suggesting novel biochemical routes for these compounds and the existence of biochemical compounds that remain to be discovered or synthesized through enzyme and pathway engineering. AVAILABILITY: Framework will be available via web interface at http://systemsbiology.northwestern.edu/BNICE (site under construction). CONTACT: vassily@northwestern.edu or broadbelt@northwestern.edu SUPPLEMENTARY INFORMATION: http://systemsbiology.northwestern.edu/BNICE/publications.  相似文献   

4.
Metabolic databases contain information about thousands of small molecules and reactions, which can be represented as networks. In the context of metabolic reconstruction, pathways can be inferred by searching optimal paths in such networks. A recurrent problem is the presence of pool metabolites (e.g., water, energy carriers, and cofactors), which are connected to hundreds of reactions, thus establishing irrelevant shortcuts between nodes of the network. One solution to this problem relies on weighted networks to penalize highly connected compounds. A more refined solution takes the chemical structure of reactants into account in order to differentiate between side and main compounds of a reaction. Thanks to an intensive annotation effort at KEGG, decompositions of reactions into reactant pairs (RPAIR) categorized by their role (main, trans, cofac, ligase, and leave) are now available.The goal of this article is to evaluate the impact of RPAIR data on pathfinding in metabolic networks. To this end, we measure the impact of different parameters concerning the construction of the metabolic network: mapping of reactions and reactant pairs onto a graph, use of selected categories of reactant pairs, weighting schemes for compounds and reactions, removal of highly connected metabolites, and reaction directionality. In total, we tested 104 combinations of parameters and identified their optimal values for pathfinding on the basis of 55 reference pathways from three organisms.The best-performing metabolic network combines the biochemical knowledge encoded by KEGG RPAIR with a weighting scheme penalizing highly connected compounds. With this network, we could recover reference pathways from Escherichia coli with an average accuracy of 93% (32 pathways), from Saccharomyces cerevisiae with an average accuracy of 66% (11 pathways), and from humans with an average accuracy of 70% (12 pathways). Our pathfinding approach is available as part of the Network Analysis Tools.  相似文献   

5.
The chemical organization of signaling interactions   总被引:2,自引:0,他引:2  
MOTIVATION: Cellular chemical signaling pathways form complex networks that are beginning to be studied at the level of chemical kinetics and databases of reactions. Chemical reaction details are traditionally represented as lists of reactions and rates. This does not map readily to the block diagram representation familiar to biologists, and obscures the functional organization of signaling networks. This study examines motifs in signaling chemistry and reports common features that may help to formalize such a mapping between pathway block diagrams and the chemistry. The same motifs may facilitate data representation and provide functional abstraction of the chemistry. RESULTS: I classified 74 interactions between 25 signaling pathways in terms of shared chemical motifs. All interactions in this dataset consist of a few communicating molecules from one set of pathways, and a replicating set of reactions and molecules from another. Each unique combination of interacting pathways duplicates the chemical reaction scheme of this replicating set, but involves different rate constants. Signaling pathways can therefore be described in an object-oriented manner as sets of core reactions with well-defined interfaces between pathways. This generalization lends itself to designing simulators and databases for signaling networks. AVAILABILITY: Software and example models are freely available from http://www.ncbs.res.in/~bhalla/examples/EGFR_example.html.  相似文献   

6.
MOTIVATION: As more genomic data becomes available there is increased attention on understanding the mechanisms encoded in the genome. New XML dialects like CellML and Systems Biology Markup Language (SBML) are being developed to describe biological networks of all types. In the absence of detailed kinetic information for these networks, stoichiometric data is an especially valuable source of information. Network databases are the next logical step beyond storing purely genomic information. Just as comparison of entries in genomic databases has been a vital algorithmic problem through the course of the sequencing project, comparison of networks in network databases will be a crucial problem as we seek to integrate higher-order network knowledge. RESULTS: We show that comparing the stoichiometric structure of two reactions systems is equivalent to the graph isomorphism problem. This is encouraging because graph isomorphism is, in practice, a tractable problem using heuristics. The analogous problem of searching for a subsystem of a reaction system is NP-complete. We also discuss heuristic issues in implementations for practical comparison of stoichiometric matrices.  相似文献   

7.
An approach is presented for computing meaningful pathways in the network of small molecule metabolism comprising the chemical reactions characterized in all organisms. The metabolic network is described as a weighted graph in which all the compounds are included, but each compound is assigned a weight equal to the number of reactions in which it participates. Path finding is performed in this graph by searching for one or more paths with lowest weight. Performance is evaluated systematically by computing paths between the first and last reactions in annotated metabolic pathways, and comparing the intermediate reactions in the computed pathways to those in the annotated ones. For the sake of comparison, paths are computed also in the un-weighted raw (all compounds and reactions) and filtered (highly connected pool metabolites removed) metabolic graphs, respectively. The correspondence between the computed and annotated pathways is very poor (<30%) in the raw graph; increasing to approximately 65% in the filtered graph; reaching approximately 85% in the weighted graph. Considering the best-matching path among the five lightest paths increases the correspondence to 92%, on average. We then show that the average distance between pairs of metabolites is significantly larger in the weighted graph than in the raw unfiltered graph, suggesting that the small-world properties previously reported for metabolic networks probably result from irrelevant shortcuts through pool metabolites. In addition, we provide evidence that the length of the shortest path in the weighted graph represents a valid measure of the "metabolic distance" between enzymes. We suggest that the success of our simplistic approach is rooted in the high degree of specificity of the reactions in metabolic pathways, presumably reflecting thermodynamic constraints operating in these pathways. We expect our approach to find useful applications in inferring metabolic pathways in newly sequenced genomes.  相似文献   

8.
AraCyc is a database containing biochemical pathways of Arabidopsis, developed at The Arabidopsis Information Resource (http://www.arabidopsis.org). The aim of AraCyc is to represent Arabidopsis metabolism as completely as possible with a user-friendly Web-based interface. It presently features more than 170 pathways that include information on compounds, intermediates, cofactors, reactions, genes, proteins, and protein subcellular locations. The database uses Pathway Tools software, which allows the users to visualize a bird's eye view of all pathways in the database down to the individual chemical structures of the compounds. The database was built using Pathway Tools' Pathologic module with MetaCyc, a collection of pathways from more than 150 species, as a reference database. This initial build was manually refined and annotated. More than 20 plant-specific pathways, including carotenoid, brassinosteroid, and gibberellin biosyntheses have been added from the literature. A list of more than 40 plant pathways will be added in the coming months. The quality of the initial, automatic build of the database was compared with the manually improved version, and with EcoCyc, an Escherichia coli database using the same software system that has been manually annotated for many years. In addition, a Perl interface, PerlCyc, was developed that allows programmers to access Pathway Tools databases from the popular Perl language. AraCyc is available at the tools section of The Arabidopsis Information Resource Web site (http://www.arabidopsis.org/tools/aracyc).  相似文献   

9.
MOTIVATION: In this paper a new algorithmic approach is presented, which automatically generates structure diagrams of molecular complexes. A complex diagram contains the ligand, the amino acids of the protein interacting with the ligand and the hydrophilic interactions schematized as dashed lines between the corresponding atoms. The algorithm is based on a combinatorial optimization strategy which solves parts of the layout problem non-heuristically. The depicted molecules are represented as structure diagrams according to the chemical nomenclature. Due to the frequent usage of complex diagrams in the scientific literature as well as in text books dealing with structural biology, biochemistry and medicinal chemistry, the new algorithm is a key element for computer applications in these areas. RESULTS: The method was implemented in the new software tool PoseView. It was tested on a representative dataset containing 305 protein-ligand complexes in total from the Brookhaven Protein Data Bank. PoseView was able to find collision-free layouts for more than three quarters of all complexes. In the following the layout generation algorithm is presented and, additional to the statistical results, representative test cases demonstrating the challenges of the layout generation will be discussed. AVAILABILITY: The method is available as a webservice at http://www.zbh.uni-hamburg.de/poseview.  相似文献   

10.
Cell-free systems containing multiple enzymes are becoming an increasingly interesting tool for one-pot syntheses of biochemical compounds. To extensively explore the enormous wealth of enzymes in the biological space, we present methods for assembling and curing data from databases to apply them for the prediction of pathway candidates for directed enzymatic synthesis. We use Kyoto Encyclopedia of Genes and Genomes to establish single organism models and a pan-organism model that is combining the available data from all organisms listed there. We introduce a filtering scheme to remove data that are not suitable, for example, generic metabolites and general reactions. In addition, a valid stoichiometry of reactions is required for acceptance. The networks created are analyzed by graph theoretical methods to identify a set of metabolites that are potentially reachable from a defined set of starting metabolites. Thus, metabolites not connected to such starting metabolites cannot be produced unless new starting metabolites or reactions are introduced. The network models also comprise stoichiometric and thermodynamic data that allow the definition of constraints to identify potential pathways. The resulting data can be directly applied using existing or future pathway finding tools.  相似文献   

11.
MOTIVATION: Graph drawing algorithms are often used for visualizing relational information, but a naive implementation of a graph drawing algorithm encounters real difficulties when drawing large-scale graphs such as protein interaction networks. RESULTS: We have developed a new, extremely fast layout algorithm for visualizing large-scale protein interaction networks in the three-dimensional space. The algorithm (1) first finds a layout of connected components of an entire network, (2) finds a global layout of nodes with respect to pivot nodes within a connected component and (3) refines the local layout of each connected component by first relocating midnodes with respect to their cutvertices and direct neighbors of the cutvertices and then by relocating all nodes with respect to their neighbors within distance 2. Advantages of this algorithm over classical graph drawing methods include: (1) it is an order of magnitude faster, (2) it can directly visualize data from protein interaction databases and (3) it provides several abstraction and comparison operations for effectively analyzing large-scale protein interaction networks. AVAILABILITY: http://wilab.inha.ac.kr/interviewer/  相似文献   

12.
Biochemical reactions form large and complex networks. Comprehensible visual representations of these networks help biochemists understand the relationships between the chemical components. Typically pathway diagrams are manually produced drawings. Because of the steady progress of knowledge and the complex relationships in these networks, automatic visualizations are necessary. Bio-Path is a system for the exploration and automatic visualization of biochemical pathways. It has been developed to obtain an electronic version of the well-known Boehringer Biochemical Pathways poster and offers new possibilities to find information and to navigate through pathways. BioPath has a specific database containing reactions and a hierarchical clustering of reactions and reaction networks. One feature is the automatic generation of pathways from the database and their high quality visualization. This paper states the requirements for the visualization of biochemical pathways, presents a layout algorithm and shows how BioPath can be used to explore biochemical reaction networks.  相似文献   

13.
Graph layout is extensively used in the field of mathematics and computer science, however these ideas and methods have not been extended in a general fashion to the construction of graphs for biological data. To this end, we have implemented a version of the Fruchterman Rheingold graph layout algorithm, extensively modified for the purpose of similarity analysis in biology. This algorithm rapidly and effectively generates clear two (2D) or three-dimensional (3D) graphs representing similarity relationships such as protein sequence similarity. The implementation of the algorithm is general and applicable to most types of similarity information for biological data. AVAILABILITY: BioLayout is available for most UNIX platforms at the following web-site: http://www.ebi.ac.uk/research/cgg/services/layout.  相似文献   

14.
A new dynamical layout algorithm for complex biochemical reaction networks   总被引:1,自引:0,他引:1  

Background  

To study complex biochemical reaction networks in living cells researchers more and more rely on databases and computational methods. In order to facilitate computational approaches, visualisation techniques are highly important. Biochemical reaction networks, e.g. metabolic pathways are often depicted as graphs and these graphs should be drawn dynamically to provide flexibility in the context of different data. Conventional layout algorithms are not sufficient for every kind of pathway in biochemical research. This is mainly due to certain conventions to which biochemists/biologists are used to and which are not in accordance to conventional layout algorithms. A number of approaches has been developed to improve this situation. Some of these are used in the context of biochemical databases and make more or less use of the information in these databases to aid the layout process. However, visualisation becomes also more and more important in modelling and simulation tools which mostly do not offer additional connections to databases. Therefore, layout algorithms used in these tools have to work independently of any databases. In addition, all of the existing algorithms face some limitations with respect to the number of edge crossings when it comes to larger biochemical systems due to the interconnectivity of these. Last but not least, in some cases, biochemical conventions are not met properly.  相似文献   

15.
Path matching and graph matching in biological networks.   总被引:2,自引:0,他引:2  
We develop algorithms for the following path matching and graph matching problems: (i) given a query path p and a graph G, find a path p' that is most similar to p in G; (ii) given a query graph G (0) and a graph G, find a graph G (0)' that is most similar to G (0) in G. In these problems, p and G (0) represent a given substructure of interest to a biologist, and G represents a large network in which the biologist desires to find a related substructure. These algorithms allow the study of common substructures in biological networks in order to understand how these networks evolve both within and between organisms. We reduce the path matching problem to finding a longest weighted path in a directed acyclic graph and show that the problem of finding top k suboptimal paths can be solved in polynomial time. This is in contrast with most previous approaches that used exponential time algorithms to find simple paths which are practical only when the paths are short. We reduce the graph matching problem to finding highest scoring subgraphs in a graph and give an exact algorithm to solve the problem when the query graph G (0) is of moderate size. This eliminates the need for less accurate heuristic or randomized algorithms.We show that our algorithms are able to extract biologically meaningful pathways from protein interaction networks in the DIP database and metabolic networks in the KEGG database. Software programs implementing these techniques (PathMatch and GraphMatch) are available at http://faculty.cs.tamu.edu/shsze/pathmatch and http://faculty.cs.tamu.edu/shsze/graphmatch.  相似文献   

16.
Molecular interaction data plays an important role in understanding biological processes at a modular level by providing a framework for understanding cellular organization, functional hierarchy, and evolutionary conservation. As the quality and quantity of network and interaction data increases rapidly, the problem of effectively analyzing this data becomes significant. Graph theoretic formalisms, commonly used for these analysis tasks, often lead to computationally hard problems due to their relation to subgraph isomorphism. This paper presents an innovative new algorithm, MULE, for detecting frequently occurring patterns and modules in biological networks. Using an innovative graph simplification technique based on ortholog contraction, which is ideally suited to biological networks, our algorithm renders these problems computationally tractable and scalable to large numbers of networks. We show, experimentally, that our algorithm can extract frequently occurring patterns in metabolic pathways and protein interaction networks from the KEGG, DIP, and BIND databases within seconds. When compared to existing approaches, our graph simplification technique can be viewed either as a pruning heuristic, or a closely related, but computationally simpler task. When used as a pruning heuristic, we show that our technique reduces effective graph sizes significantly, accelerating existing techniques by several orders of magnitude! Indeed, for most of the test cases, existing techniques could not even be applied without our pruning step. When used as a stand-alone analysis technique, MULE is shown to convey significant biological insights at near-interactive rates. The software, sample input graphs, and detailed results for comprehensive analysis of nine eukaryotic PPI networks are available at www.cs.purdue.edu/homes/koyuturk/mule.  相似文献   

17.
Plant defence signalling response against various pathogens, including viruses, is a complex phenomenon. In resistant interaction a plant cell perceives the pathogen signal, transduces it within the cell and performs a reprogramming of the cell metabolism leading to the pathogen replication arrest. This work focuses on signalling pathways crucial for the plant defence response, i.e., the salicylic acid, jasmonic acid and ethylene signal transduction pathways, in the Arabidopsis thaliana model plant. The initial signalling network topology was constructed manually by defining the representation formalism, encoding the information from public databases and literature, and composing a pathway diagram. The manually constructed network structure consists of 175 components and 387 reactions. In order to complement the network topology with possibly missing relations, a new approach to automated information extraction from biological literature was developed. This approach, named Bio3graph, allows for automated extraction of biological relations from the literature, resulting in a set of (component1, reaction, component2) triplets and composing a graph structure which can be visualised, compared to the manually constructed topology and examined by the experts. Using a plant defence response vocabulary of components and reaction types, Bio3graph was applied to a set of 9,586 relevant full text articles, resulting in 137 newly detected reactions between the components. Finally, the manually constructed topology and the new reactions were merged to form a network structure consisting of 175 components and 524 reactions. The resulting pathway diagram of plant defence signalling represents a valuable source for further computational modelling and interpretation of omics data. The developed Bio3graph approach, implemented as an executable language processing and graph visualisation workflow, is publically available at http://ropot.ijs.si/bio3graph/and can be utilised for modelling other biological systems, given that an adequate vocabulary is provided.  相似文献   

18.
Public databases that store the data from small-molecule screens are a rich and untapped resource of chemical and biological information. However, screening databases are unorganized, which makes interpreting their data difficult. We propose a method of inferring workflow graphs-which encode the relationships between assays in screening projects-directly from screening data and using these workflows to organize each project's data. On the basis of four heuristics regarding the organization of screening projects, we designed an algorithm that extracts a project's workflow graph from screening data. Where possible, the algorithm is evaluated by comparing each project's inferred workflow to its documentation. In the majority of cases, there are no discrepancies between the two. Most errors can be traced to points in the project where screeners chose additional molecules to test based on structural similarity to promising molecules, a case our algorithm is not yet capable of handling. Nonetheless, these workflows accurately organize most of the data and also provide a method of visualizing a screening project. This method is robust enough to build a workflow-oriented front-end to PubChem and is currently being used regularly by both our lab and our collaborators. A Python implementation of the algorithm is available online, and a searchable database of all PubChem workflows is available at http://swami.wustl.edu/flow.  相似文献   

19.
This paper provides an overview of the research that has been carried out in Sheffield over the last decade into searching techniques for databases of three-dimensional (3D) chemical structures. A 3D structure or query pattern is represented by a labelled graph, in which the nodes and the edges of the graph are used to represent atoms and the associated inter-atomic distances, respectively. The presence of a pharmacophore in each of the structures in a database can then be tested by means of a subgraph isomorphism algorithm, the computational requirements of which are minimized by the use of an initial screening procedure that eliminates the majority of the structures from the subgraph-isomorphism search. Analogous graph-based representation and searching methods can also be used with flexible 3D structures: in this case, the edges of the graphs represent inter-atomic distance ranges and a final conformational search needs to be carried out for those molecules that match the query pharmacophore in the subgraph-isomorphism search. The paper also reviews related work on the automatic identification of pharmacophoric patterns and on 3D similarity searching.  相似文献   

20.

Background  

Network visualization would serve as a useful first step for analysis. However, current graph layout algorithms for biological pathways are insensitive to biologically important information, e.g. subcellular localization, biological node and graph attributes, or/and not available for large scale networks, e.g. more than 10000 elements.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号