首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Path matching and graph matching in biological networks.   总被引:2,自引:0,他引:2  
We develop algorithms for the following path matching and graph matching problems: (i) given a query path p and a graph G, find a path p' that is most similar to p in G; (ii) given a query graph G (0) and a graph G, find a graph G (0)' that is most similar to G (0) in G. In these problems, p and G (0) represent a given substructure of interest to a biologist, and G represents a large network in which the biologist desires to find a related substructure. These algorithms allow the study of common substructures in biological networks in order to understand how these networks evolve both within and between organisms. We reduce the path matching problem to finding a longest weighted path in a directed acyclic graph and show that the problem of finding top k suboptimal paths can be solved in polynomial time. This is in contrast with most previous approaches that used exponential time algorithms to find simple paths which are practical only when the paths are short. We reduce the graph matching problem to finding highest scoring subgraphs in a graph and give an exact algorithm to solve the problem when the query graph G (0) is of moderate size. This eliminates the need for less accurate heuristic or randomized algorithms.We show that our algorithms are able to extract biologically meaningful pathways from protein interaction networks in the DIP database and metabolic networks in the KEGG database. Software programs implementing these techniques (PathMatch and GraphMatch) are available at http://faculty.cs.tamu.edu/shsze/pathmatch and http://faculty.cs.tamu.edu/shsze/graphmatch.  相似文献   

2.
Graph models of habitat mosaics   总被引:7,自引:0,他引:7  
Graph theory is a body of mathematics dealing with problems of connectivity, flow, and routing in networks ranging from social groups to computer networks. Recently, network applications have erupted in many fields, and graph models are now being applied in landscape ecology and conservation biology, particularly for applications couched in metapopulation theory. In these applications, graph nodes represent habitat patches or local populations and links indicate functional connections among populations (i.e. via dispersal). Graphs are models of more complicated real systems, and so it is appropriate to review these applications from the perspective of modelling in general. Here we review recent applications of network theory to habitat patches in landscape mosaics. We consider (1) the conceptual model underlying these applications; (2) formalization and implementation of the graph model; (3) model parameterization; (4) model testing, insights, and predictions available through graph analyses; and (5) potential implications for conservation biology and related applications. In general, and for a variety of ecological systems, we find the graph model a remarkably robust framework for applications concerned with habitat connectivity. We close with suggestions for further work on the parameterization and validation of graph models, and point to some promising analytic insights.  相似文献   

3.
4.
An approach is presented for computing meaningful pathways in the network of small molecule metabolism comprising the chemical reactions characterized in all organisms. The metabolic network is described as a weighted graph in which all the compounds are included, but each compound is assigned a weight equal to the number of reactions in which it participates. Path finding is performed in this graph by searching for one or more paths with lowest weight. Performance is evaluated systematically by computing paths between the first and last reactions in annotated metabolic pathways, and comparing the intermediate reactions in the computed pathways to those in the annotated ones. For the sake of comparison, paths are computed also in the un-weighted raw (all compounds and reactions) and filtered (highly connected pool metabolites removed) metabolic graphs, respectively. The correspondence between the computed and annotated pathways is very poor (<30%) in the raw graph; increasing to approximately 65% in the filtered graph; reaching approximately 85% in the weighted graph. Considering the best-matching path among the five lightest paths increases the correspondence to 92%, on average. We then show that the average distance between pairs of metabolites is significantly larger in the weighted graph than in the raw unfiltered graph, suggesting that the small-world properties previously reported for metabolic networks probably result from irrelevant shortcuts through pool metabolites. In addition, we provide evidence that the length of the shortest path in the weighted graph represents a valid measure of the "metabolic distance" between enzymes. We suggest that the success of our simplistic approach is rooted in the high degree of specificity of the reactions in metabolic pathways, presumably reflecting thermodynamic constraints operating in these pathways. We expect our approach to find useful applications in inferring metabolic pathways in newly sequenced genomes.  相似文献   

5.
Three dimensional DNA structures in computing   总被引:13,自引:0,他引:13  
Jonoska N  Karl SA  Saito M 《Bio Systems》1999,52(1-3):143-153
We show that 3-dimensional graph structures can be used for solving computational problems with DNA molecules. Vertex building blocks consisting of k-armed (k = 3 or 4) branched junction molecules are used to form graphs. We present procedures for the 3-SAT and 3-vertex-colorability problems. Construction of one graph structure (in many copies) is sufficient to determine the solution to the problem. In our proposed procedure for 3-SAT, the number of steps required is equal to the number of variables in the formula. For the 3-vertex-colorability problem, the procedure requires a constant number of steps regardless of the size of the graph.  相似文献   

6.
We present an RNA-As-Graphs (RAG) based inverse folding algorithm, RAG-IF, to design novel RNA sequences that fold onto target tree graph topologies. The algorithm can be used to enhance our recently reported computational design pipeline (Jain et al., NAR 2018). The RAG approach represents RNA secondary structures as tree and dual graphs, where RNA loops and helices are coarse-grained as vertices and edges, opening the usage of graph theory methods to study, predict, and design RNA structures. Our recently developed computational pipeline for design utilizes graph partitioning (RAG-3D) and atomic fragment assembly (F-RAG) to design sequences to fold onto RNA-like tree graph topologies; the atomic fragments are taken from existing RNA structures that correspond to tree subgraphs. Because F-RAG may not produce the target folds for all designs, automated mutations by RAG-IF algorithm enhance the candidate pool markedly. The crucial residues for mutation are identified by differences between the predicted and the target topology. A genetic algorithm then mutates the selected residues, and the successful sequences are optimized to retain only the minimal or essential mutations. Here we evaluate RAG-IF for 6 RNA-like topologies and generate a large pool of successful candidate sequences with a variety of minimal mutations. We find that RAG-IF adds robustness and efficiency to our RNA design pipeline, making inverse folding motivated by graph topology rather than secondary structure more productive.  相似文献   

7.
Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. There are various issues in single cell sequencing that effect homogeneous grouping (clustering) of cells, such as small amount of starting RNA, limited per-cell sequenced reads, cell-to-cell variability due to cell-cycle, cellular morphology, and variable reagent concentrations. Moreover, single cell data is susceptible to technical noise, which affects the quality of genes (or features) selected/extracted prior to clustering.Here we introduce sc-CGconv (copula based graph convolution network for single clustering), a stepwise robust unsupervised feature extraction and clustering approach that formulates and aggregates cell–cell relationships using copula correlation (Ccor), followed by a graph convolution network based clustering approach. sc-CGconv formulates a cell-cell graph using Ccor that is learned by a graph-based artificial intelligence model, graph convolution network. The learned representation (low dimensional embedding) is utilized for cell clustering. sc-CGconv features the following advantages. a. sc-CGconv works with substantially smaller sample sizes to identify homogeneous clusters. b. sc-CGconv can model the expression co-variability of a large number of genes, thereby outperforming state-of-the-art gene selection/extraction methods for clustering. c. sc-CGconv preserves the cell-to-cell variability within the selected gene set by constructing a cell-cell graph through copula correlation measure. d. sc-CGconv provides a topology-preserving embedding of cells in low dimensional space.  相似文献   

8.
We describe several analytical techniques for use in developing genetic models of oncogenesis including: methods for the selection of important genetic events, construction of graph models (including distance-based trees, branching trees, contingency trees and directed acyclic graph models) from these events and methods for interpretation of the resulting models. The models can be used to make predictions about: which genetic events tend to occur early, which events tend to occur together and the likely order of events. Unlike simple path models of oncogenesis, our models allow dependencies to exist between specific genetic changes and allow for multiple, divergent paths in tumor progression. A variety of genetic events can be used with the graph models including chromosome breaks, losses or gains of large DNA regions, small mutations and changes in methylation. As an application of the techniques, we use a recently published cytogenetic analysis of 206 melanoma cases [Nelson et al. (2000), Cancer Genet. Cytogenet.122, 101-109] to derive graph models for chromosome breaks in melanoma. Among our predictions are: (1) breaks in 6q1 and 1q1 are early events, with 6q1 preferentially occurring first and increasing the probability of a break in 1q1 and (2) breaks in the two sets [1p1, 1p2, 9q1] and [1q1, 7p2, 9p2] tend to occur together. This study illustrates that the application of graph models to genetic data from tumor sets provide new information on the interrelationships among genetic changes during tumor progression.  相似文献   

9.
Graph representations have been widely used to analyze and design various economic, social, military, political, and biological networks. In systems biology, networks of cells and organs are useful for understanding disease and medical treatments and, in structural biology, structures of molecules can be described, including RNA structures. In our RNA-As-Graphs (RAG) framework, we represent RNA structures as tree graphs by translating unpaired regions into vertices and helices into edges. Here we explore the modularity of RNA structures by applying graph partitioning known in graph theory to divide an RNA graph into subgraphs. To our knowledge, this is the first application of graph partitioning to biology, and the results suggest a systematic approach for modular design in general. The graph partitioning algorithms utilize mathematical properties of the Laplacian eigenvector (µ2) corresponding to the second eigenvalues (λ2) associated with the topology matrix defining the graph: λ2 describes the overall topology, and the sum of µ2′s components is zero. The three types of algorithms, termed median, sign, and gap cuts, divide a graph by determining nodes of cut by median, zero, and largest gap of µ2′s components, respectively. We apply these algorithms to 45 graphs corresponding to all solved RNA structures up through 11 vertices (∼220 nucleotides). While we observe that the median cut divides a graph into two similar-sized subgraphs, the sign and gap cuts partition a graph into two topologically-distinct subgraphs. We find that the gap cut produces the best biologically-relevant partitioning for RNA because it divides RNAs at less stable connections while maintaining junctions intact. The iterative gap cuts suggest basic modules and assembly protocols to design large RNA structures. Our graph substructuring thus suggests a systematic approach to explore the modularity of biological networks. In our applications to RNA structures, subgraphs also suggest design strategies for novel RNA motifs.  相似文献   

10.
In Earth surface systems (ESS), everything is connected to everything else, an aphorism often called the First Law of Ecology and of geography. Such linkages are not always direct and unmediated, but many ESS, represented as networks of interacting components, attain or approach full, direct connectivity among components. The question is how and why this happens at the system or network scale. The crowded landscape concept dictates that linkages and connections among ESS components are inevitable. The connection selection concept holds that the linkages among components are (often) advantageous to the network and are selected for, and thereby preserved and enhanced. These network advantages are illustrated via algebraic graph theory. For a given number of components in an ESS, as the number of links or connections increases, spectral radius, graph energy, and algebraic connectivity increase. While the advantages (if any) of increased complexity are unclear, higher spectral radii are directly correlated with higher graph energy. The greater graph energy is associated with more intense feedback in the system, and tighter coupling among components. This in turn reflects advantageous properties of more intense cycling of water, nutrients, and minerals, as well as multiple potential degrees of freedom for individual components to respond to changes. The increase of algebraic connectivity reflects a greater ability or tendency for the network to respond to changes in concert.  相似文献   

11.
This paper explores relationships between classical and parametric measures of graph (or network) complexity. Classical measures are based on vertex decompositions induced by equivalence relations. Parametric measures, on the other hand, are constructed by using information functions to assign probabilities to the vertices. The inequalities established in this paper relating classical and parametric measures lay a foundation for systematic classification of entropy-based measures of graph complexity.  相似文献   

12.
The idea of assigning an information content to a graph is extended to include the following two situations: (a) Combinations of sets of topologically equivalent points are used as symbols; (b) Points of the graph may exist in different states.  相似文献   

13.
ABSTRACT: BACKGROUND: Identification of protein structural cores requires isolation of sets of proteins all sharing a same subset of structural motifs. In the context of ever growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. RESULTS: When considering a pair of 3D structures, they are stated as similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, the classification of proteins into structural families can be viewed as graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may group in the same cluster a subset of 3D structures that do not share a common substructure. To overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose hereunder a modification algorithm that eliminates edges from the original graph of similarities and outputs a reduced graph in which no ternary constraints are violated. Our proposition is then first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply to the reduced graph a standard graph clustering algorithm. We applied this method to ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments. CONCLUSIONS: We show that filtering similarities prior to standard graph based clustering process by applying ternary similarity constraints i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph based clustering algorithms according to the reference classification SCOP.  相似文献   

14.
高梅香  朱家祺  刘爽  程鑫  刘冬  李彦胜 《生态学报》2023,43(16):6862-6877
土壤动物学面临以全新知识体系为科学研究框架的变革时期,其核心内容是以数据驱动为主要特征的人工智能技术方法。目前广泛应用的基于数据库的数据处理分析方法,面临着数据多源异构、快速增长和处理能力不足之间的矛盾。基于快速发展的大数据科学和人工智能技术的数据挖掘方法在解决前述矛盾中有突出优势,但需要依赖一个强大的领域知识库,然而土壤动物领域知识图谱的研究十分匮乏。土壤动物知识图谱是一个具有有向图结构的知识库,其中图的节点代表与土壤动物相关的实体或概念,图的边代表实体或概念之间的各种语义关系。提出了土壤动物知识图谱的定义、内涵、理论模型和构建方法,以浙江天目山土壤螨类多样性为例,分析了构建山地土壤动物知识图谱的技术方法;以土壤动物多样性研究关注的物种分布、物种共存、环境条件对物种的影响作用为例,探讨了基于山地土壤动物知识图谱可以解决的相关科学问题。研究表明,土壤动物知识图谱在解决生物多样性重要科学问题方面具有独特的潜力和优势,有力推动了土壤动物学、信息科学和数据科学交叉的土壤动物信息学的发展。  相似文献   

15.
Dense subgraphs of Protein-Protein Interaction (PPI) graphs are assumed to be potential functional modules and play an important role in inferring the functional behavior of proteins. Increasing amount of available PPI data implies a fast, accurate approach of biological complex identification. Therefore, there are different models and algorithms in identifying functional modules. This paper describes a new graph theoretic clustering algorithm that detects densely connected regions in a large PPI graph. The method is based on finding bounded diameter subgraphs around a seed node. The algorithm has the advantage of being very simple and efficient when compared with other graph clustering methods. This algorithm is tested on the yeast PPI graph and the results are compared with MCL, Core-Attachment, and MCODE algorithms.  相似文献   

16.
景观结构和空间格局一直是景观生态学的核心问题,图论的应用为景观格局的分析提供了一种研究框架,基于图论的景观图逐渐被应用于生物多样性保护的连通性建模和景观规划的决策支持研究,景观图的表达、分析和应用已成为保护生物学和景观生态学研究的热点。本文首先介绍了景观图的图论基础,在Scopus数据库的基础上,检索了1993—2019年在标题、摘要和关键词中出现 “landscape graph”、“connectivity”和“network”词汇的257篇已发表的期刊论文。从年发文量、来源期刊、研究区域、研究机构、景观类型等方面分析了该领域的研究进展和发展趋势。分析发现,2017年之前,发表的期刊论文数量整体呈上升趋势,2017年之后年发文量逐年下降;主要研究力量集中在美国、西班牙、法国、加拿大和中国,发文量占到86.8%。大部分研究成果发表在“Landscape Ecology”、“Landscape and Urban Planning”和“Biological Conservation”期刊上。在研究内容上,景观图表达主要包括点的定义、边的度量和景观的模拟3方面,景观图分析研究包括分析指数、景观图划分两方面。我们重点关注了景观图在生物多样性保护、景观(生态网络)规划和管理、景观影响评价等科学与实践中的应用。基于图论的景观图通过帮助理解景观连通性变化、动物行为和栖息地保护,影响着保护科学和规划实践者。图论对保护科学和规划的影响来自于丰富的理论基础和成熟的研究方法,基于图论的景观图为景观结构和格局的生态学理解提供了跳板,并将继续成为全球研究人员和实践者的重要工具。  相似文献   

17.

Background and scope

Large networks, such as protein interaction networks, are extremely difficult to analyze as a whole. We developed Clust&See, a Cytoscape plugin dedicated to the identification, visualization and analysis of clusters extracted from such networks.

Implementation and performance

Clust&See provides the ability to apply three different, recently developed graph clustering algorithms to networks and to visualize: (i) the obtained partition as a quotient graph in which nodes correspond to clusters and (ii) the obtained clusters as their corresponding subnetworks. Importantly, tools for investigating the relationships between clusters and vertices as well as their organization within the whole graph are supplied.  相似文献   

18.
SAGA: a subgraph matching tool for biological graphs   总被引:4,自引:0,他引:4  
MOTIVATION: With the rapid increase in the availability of biological graph datasets, there is a growing need for effective and efficient graph querying methods. Due to the noisy and incomplete characteristics of these datasets, exact graph matching methods have limited use and approximate graph matching methods are required. Unfortunately, existing graph matching methods are too restrictive as they only allow exact or near exact graph matching. This paper presents a novel approximate graph matching technique called SAGA. This technique employs a flexible model for computing graph similarity, which allows for node gaps, node mismatches and graph structural differences. SAGA employs an indexing technique that allows it to efficiently evaluate queries even against large graph datasets. RESULTS: SAGA has been used to query biological pathways and literature datasets, which has revealed interesting similarities between distinct pathways that cannot be found by existing methods. These matches associate seemingly unrelated biological processes, connect studies in different sub-areas of biomedical research and thus pose hypotheses for new discoveries. SAGA is also orders of magnitude faster than existing methods. AVAILABILITY: SAGA can be accessed freely via the web at http://www.eecs.umich.edu/saga. Binaries are also freely available at this website.  相似文献   

19.
We study nonlinear electrical oscillator networks, the smallest example of which consists of a voltage-dependent capacitor, an inductor, and a resistor driven by a pure tone source. By allowing the network topology to be that of any connected graph, such circuits generalize spatially discrete nonlinear transmission lines/lattices that have proven useful in high-frequency analog devices. For such networks, we develop two algorithms to compute the steady-state response when a subset of nodes are driven at the same fixed frequency. The algorithms we devise are orders of magnitude more accurate and efficient than stepping towards the steady-state using a standard numerical integrator. We seek to enhance a given network''s nonlinear behavior by altering the eigenvalues of the graph Laplacian, i.e., the resonances of the linearized system. We develop a Newton-type method that solves for the network inductances such that the graph Laplacian achieves a desired set of eigenvalues; this method enables one to move the eigenvalues while keeping the network topology fixed. Running numerical experiments using three different random graph models, we show that shrinking the gap between the graph Laplacian''s first two eigenvalues dramatically improves a network''s ability to (i) transfer energy to higher harmonics, and (ii) generate large-amplitude signals. Our results shed light on the relationship between a network''s structure, encoded by the graph Laplacian, and its function, defined in this case by the presence of strongly nonlinear effects in the frequency response.  相似文献   

20.
MOTIVATION: Backbone resonance assignment is a critical bottleneck in studies of protein structure, dynamics and interactions by nuclear magnetic resonance (NMR) spectroscopy. A minimalist approach to assignment, which we call 'contact-based', seeks to dramatically reduce experimental time and expense by replacing the standard suite of through-bond experiments with the through-space (nuclear Overhauser enhancement spectroscopy, NOESY) experiment. In the contact-based approach, spectral data are represented in a graph with vertices for putative residues (of unknown relation to the primary sequence) and edges for hypothesized NOESY interactions, such that observed spectral peaks could be explained if the residues were 'close enough'. Due to experimental ambiguity, several incorrect edges can be hypothesized for each spectral peak. An assignment is derived by identifying consistent patterns of edges (e.g. for alpha-helices and beta-sheets) within a graph and by mapping the vertices to the primary sequence. The key algorithmic challenge is to be able to uncover these patterns even when they are obscured by significant noise. RESULTS: This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of edges in a corrupted NOESY graph. Our randomized algorithm aggregates simplices into polytopes and fixes inconsistencies with simple local modifications, called rotations, that maintain most of the structure already uncovered. In characterizing the effects of experimental noise, we employ an NMR-specific random graph model in proving that our algorithm gives optimal performance in expected polynomial time, even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental beta-sheet datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions. AVAILABILITY: Our algorithm has been implemented in the platform-independent Python code. The software can be freely obtained for academic use by request from the authors.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号