Similar Documents
1.
Topological properties of networks have recently been widely applied to the link-prediction problem. Common Neighbors, for example, is a natural yet efficient framework, and many variants of it have been proposed to further improve the discriminative resolution of candidate links. In this paper, we reexamine the role of network topology in predicting missing links from the perspective of information theory and present a practical approach based on the mutual information of network structures. It not only improves prediction accuracy substantially but also has reasonable computational complexity.
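The Common Neighbors baseline that the abstract builds on can be sketched in a few lines: every unlinked pair is scored by how many neighbours it shares, and higher scores mark more likely missing links. This is only the classic baseline, not the authors' mutual-information method, and the toy edge list is hypothetical.

```python
from itertools import combinations


def common_neighbors_scores(edges):
    """Score each unlinked node pair by the number of neighbours it shares."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only non-edges are candidate (missing) links
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbors_scores(edges)
# Highest-scoring candidates first; ties broken alphabetically.
print(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
# [(('a', 'd'), 2), (('b', 'c'), 2), (('b', 'e'), 1), (('c', 'e'), 1), (('a', 'e'), 0)]
```

Mutual-information variants replace the raw count with an information-theoretic weight per shared neighbour, but the ranking loop stays the same.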

2.
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We develop scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying the proportion of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics, while acknowledging their limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis by detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
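The simplest instance of a scaling correction, offered as an illustration and not as the authors' method, assumes uniform node sampling. If each node is kept independently with probability p and only the induced subgraph is observed, each neighbour of a kept node survives with probability p. The observed degree then has expectation p·k, and k_obs / p is an unbiased estimate of the true degree k. This corrects the mean, but the observed distribution is a binomially thinned version of the true one, which is consistent with the abstract's warning about naive subsample statistics. The ring-lattice network and p = 0.3 below are arbitrary assumptions.

```python
import random


def sample_induced_degrees(adj, p, rng):
    """Keep each node with probability p; return degrees in the induced subgraph."""
    kept = {u for u in adj if rng.random() < p}
    return {u: sum(1 for v in adj[u] if v in kept) for u in kept}


def rescale_degrees(observed, p):
    """Naive scaling estimator: E[k_obs] = p * k, so k_hat = k_obs / p."""
    return {u: k / p for u, k in observed.items()}


# Toy network: ring lattice in which every node has true degree 20.
n, half = 2000, 10
adj = {u: [(u + d) % n for d in range(-half, half + 1) if d != 0] for u in range(n)}

p = 0.3
observed = sample_induced_degrees(adj, p, random.Random(0))
estimated = rescale_degrees(observed, p)
mean_obs = sum(observed.values()) / len(observed)
mean_est = sum(estimated.values()) / len(estimated)
print(f"observed mean degree {mean_obs:.2f}, rescaled estimate {mean_est:.2f}")
```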

3.
Networks play a prominent role in the study of complex systems of interacting entities in biology, sociology, and economics. Despite this diversity, we demonstrate here that a statistical model decomposing networks into matching and centrality components provides a comprehensive and unifying quantification of their architecture. The matching term quantifies the assortative structure, that is, which nodes link to which other nodes, whereas the centrality term quantifies the number of links that nodes make. We show, for a diverse set of networks, that this decomposition can provide a tight fit to observed networks. We then provide three applications. First, we show that the model allows very accurate prediction of missing links in partially known networks. Second, when node characteristics are known, we show how the matching–centrality decomposition can be related to this external information; consequently, it offers a simple and versatile tool for exploring how node characteristics explain network architecture. Finally, we demonstrate the efficiency and flexibility of the model in forecasting the links that a novel node would create if it were to join an existing network.
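One way to write such a decomposition, assumed here rather than taken from the abstract, is a logistic model. Each node has a scalar centrality c_i and a matching trait vector v_i, and logit p_ij = α + c_i + c_j − λ‖v_i − v_j‖². Centrality raises a node's overall propensity to link, while the matching term penalizes pairs whose traits differ. All parameter values below are hypothetical, and fitting them to an observed network is not shown.

```python
import math


def link_probability(alpha, c_i, c_j, v_i, v_j, lam):
    """Assumed form: logit(p_ij) = alpha + c_i + c_j - lam * ||v_i - v_j||^2."""
    dist2 = sum((a - b) ** 2 for a, b in zip(v_i, v_j))
    eta = alpha + c_i + c_j - lam * dist2
    return 1.0 / (1.0 + math.exp(-eta))


# Two central nodes with well-matched traits vs. the same nodes with distant traits.
close = link_probability(0.0, 1.0, 1.0, [0.1, 0.2], [0.1, 0.3], lam=2.0)
far = link_probability(0.0, 1.0, 1.0, [0.1, 0.2], [1.5, -1.0], lam=2.0)
print(round(close, 3), round(far, 3))
```

Predicting missing links then amounts to ranking unobserved pairs by this probability once the parameters have been estimated.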

4.
Predicting the biological function of all the genes of an organism is one of the fundamental goals of computational systems biology. In the last decade, high-throughput experimental methods for studying the functional interactions between gene products (GPs) have been combined with computational approaches based on Bayesian networks for data integration. The result of these approaches is an interaction network whose weighted links represent the likelihood that two GPs are functionally related. This weighted network can be used to predict annotations for functionally uncharacterized GPs. Here we introduce Weighted Network Predictor (WNP), a novel algorithm for predicting the function of biologically uncharacterized GPs. Tests conducted on simulated data show that WNP outperforms five other state-of-the-art methods in terms of both specificity and sensitivity, and that it better exploits and propagates the functional and topological information of the network. We apply our method to networks of the yeast Saccharomyces cerevisiae and of Arabidopsis thaliana and predict Gene Ontology functions for about 500 and 10,000 uncharacterized GPs, respectively.
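The abstract does not describe WNP in enough detail to reproduce it. As a point of reference only, the sketch below implements a much simpler weighted-neighbour voting baseline, which is a swapped-in technique and not WNP. Each uncharacterized GP receives the annotation terms of its annotated neighbours, scored by the summed weights of the connecting links. Node names and term labels are hypothetical placeholders.

```python
from collections import defaultdict


def predict_annotations(weighted_edges, annotations, top=1):
    """Rank terms for each unannotated node by summed edge weight to annotated neighbours."""
    nbrs = defaultdict(list)
    for u, v, w in weighted_edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    predictions = {}
    for node in nbrs:
        if node in annotations:
            continue  # already characterized
        votes = defaultdict(float)
        for other, w in nbrs[node]:
            for term in annotations.get(other, ()):
                votes[term] += w
        ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
        predictions[node] = [term for term, _ in ranked[:top]]
    return predictions


edges = [("g1", "x", 0.9), ("g2", "x", 0.2), ("g3", "x", 0.3), ("g3", "y", 0.8)]
annotations = {"g1": {"term_A"}, "g2": {"term_B"}, "g3": {"term_B"}}
print(predict_annotations(edges, annotations))
```

Here x is linked to two term_B genes, but their combined weight (0.5) is below the single strong term_A link (0.9), so the weights rather than the neighbour counts decide the prediction.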

5.
The problem of link prediction has recently received increasing attention from scholars in network science. In social network analysis, one of its aims is to recover missing links, namely connections among actors that are likely to exist but have not been reported because the data are incomplete or subject to various types of uncertainty. In criminal investigations, problems of incomplete information are encountered almost by definition, given the anti-detection strategies set up by criminals and the limited investigative resources. In this paper, we work on a specific dataset obtained from a real investigation and propose a strategy to identify missing links in a criminal network on the basis of a topological analysis of the links classified as marginal, i.e. removed during the investigation procedure. The main assumption is that missing links should have features opposite to those of marginal ones. Measures of node similarity turn out to provide the best characterization in this sense. Inspection of the judicial source documents confirms that the predicted links, in most instances, do connect actors with a high likelihood of co-participation in illicit activities.
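The abstract reports that node-similarity measures best characterize missing links. A minimal sketch of one such measure, the Jaccard coefficient |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, is shown below, used to rank unlinked actor pairs. The choice of Jaccard, the toy graph, and the top-k cutoff are assumptions. The paper's contrast with marginal links is not reproduced.

```python
def jaccard(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)|, defined as 0 when both neighbourhoods are empty."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_candidate_links(adj, k):
    """Return the k unlinked pairs with the highest Jaccard similarity."""
    nodes = sorted(adj)
    candidates = [(jaccard(adj, u, v), u, v)
                  for i, u in enumerate(nodes) for v in nodes[i + 1:]
                  if v not in adj[u]]
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return candidates[:k]


# Hypothetical actor network (neighbour sets must be symmetric).
adj = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"},
       "d": {"b", "c", "e"}, "e": {"d"}}
for score, u, v in rank_candidate_links(adj, 3):
    print(f"{u}-{v}: {score:.2f}")
```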

7.
Ecological networks are complexes of interacting species, but not all potential links among species are realized. Unobserved links are either missing or forbidden. Missing links exist but require more sampling or alternative means of detection to be verified. Forbidden links remain unobservable irrespective of sampling effort; they are caused by linkage constraints. We studied one Arctic pollination network and two Mediterranean seed-dispersal networks. In the first, for example, we recorded flower-visit links for one full season, arranged the data in an interaction matrix and obtained a connectance C of 15 per cent. Interaction accumulation curves showed that our sampling of interactions through observation of visits was robust. We then included data on pollen from the body surface of flower visitors as an additional link ‘currency’. This yielded 98 new links missing from the visitation data, so the combined visit–pollen matrix had an increased C of 20 per cent. For the three networks, C ranged from 20 to 52 per cent, and thus the percentage of unobserved links (100 − C) was 48 to 80 per cent; these were assumed to be forbidden because of linkage constraints rather than missing because of under-sampling. Phenological uncoupling (i.e. non-overlapping phenophases between interacting mutualists) is one kind of constraint, and it explained 22 to 28 per cent of all possible but unobserved links. Increasing phenophase overlap between species increased link probability, but extensive overlaps were required to achieve a high probability. Other kinds of constraint, such as size mismatch and accessibility limitations, are briefly addressed.
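Connectance here is the percentage of realized links in the species-by-species interaction matrix, C = 100·L / (A·P) for A animal and P plant species, so 100 − C is the percentage of unobserved links. The sketch below uses tiny hypothetical matrices. It shows how merging a second link 'currency' (pollen loads) into the visitation matrix, by taking the element-wise OR, raises C.

```python
def connectance(matrix):
    """Percentage of realized links in an animals x plants interaction matrix."""
    rows, cols = len(matrix), len(matrix[0])
    links = sum(1 for row in matrix for cell in row if cell)
    return 100.0 * links / (rows * cols)


def combine(*matrices):
    """Element-wise OR: a link counts if any 'currency' (visits, pollen, ...) records it."""
    return [[int(any(cells)) for cells in zip(*rows)] for rows in zip(*matrices)]


visits = [[1, 0, 0, 1],
          [0, 1, 0, 0],
          [1, 0, 0, 0]]
pollen = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 1]]
both = combine(visits, pollen)
c_visits, c_both = connectance(visits), connectance(both)
print(f"C(visits) = {c_visits:.1f}%, C(visits+pollen) = {c_both:.1f}%, "
      f"unobserved = {100 - c_both:.1f}%")
```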

8.
The link-prediction problem is an open issue in data mining and knowledge discovery, and it attracts researchers from disparate scientific communities. A wealth of methods has been proposed to deal with this problem. Most of these approaches apply to unweighted networks, and only a few take link weights into consideration. In this paper, we present a weighted model for undirected weighted networks based on the mutual information of local network structures, in which link weights are used to further enhance the distinguishability of candidate links. Empirical experiments are conducted on four weighted networks, and the results show that the proposed method provides more accurate predictions than both traditional unweighted indices and typical weighted indices. Furthermore, we discuss in depth the effects of weak ties in link prediction as well as the potential to predict link weights. This work may shed light on the design of algorithms for link prediction in weighted networks.
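For comparison, weighted Common Neighbors is a typical weighted index of the kind the abstract benchmarks against. It scores a candidate pair (x, y) by the sum over common neighbours z of w(x, z) + w(z, y). The sketch below implements that baseline, not the authors' mutual-information model, and the toy weights are hypothetical.

```python
def build_weights(weighted_edges):
    """Symmetric dict-of-dicts adjacency for an undirected weighted graph."""
    weights = {}
    for u, v, w in weighted_edges:
        weights.setdefault(u, {})[v] = w
        weights.setdefault(v, {})[u] = w
    return weights


def weighted_common_neighbors(weights, x, y):
    """Sum w(x, z) + w(z, y) over every common neighbour z of x and y."""
    wx, wy = weights.get(x, {}), weights.get(y, {})
    return sum(wx[z] + wy[z] for z in wx.keys() & wy.keys())


weights = build_weights([("x", "a", 2.0), ("x", "b", 1.0),
                         ("y", "a", 3.0), ("y", "b", 0.5), ("y", "c", 4.0)])
print(weighted_common_neighbors(weights, "x", "y"))  # 6.5
```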

10.
Community detection is the process of assigning nodes and links to significant communities (e.g., clusters, functional modules), and its development has led to a better understanding of complex networks. We argue that, when applied to sizable networks, most detection algorithms correctly identify prominent communities but fail to do so across multiple scales. As a result, a significant fraction of the network is left uncharted. We show that this problem stems from larger or denser communities overshadowing smaller or sparser ones, and that this effect accounts for most of the undetected communities and unassigned links. We propose a generic cascading approach to community detection that circumvents the problem. Using real and artificial network datasets with three widely used community detection algorithms, we show how a simple cascading procedure allows the missing communities to be detected. This work highlights a new detection limit of community structure, and we hope that our approach can inspire better community detection algorithms.

11.
Application of network theory to potential mycorrhizal networks   (Times cited: 5; self-citations: 0; citations by others: 5)
The concept of a common mycorrhizal network implies that the arrangement of plants and mycorrhizal fungi in a community shares properties with other networks. A network is a system of nodes connected by links. Here we apply network theory to mycorrhizas to determine whether the architecture of a potential common mycorrhizal network is random or scale-free. We analyzed mycorrhizal data from an oak woodland from two perspectives: the phytocentric view using trees as nodes and fungi as links and the mycocentric view using fungi as nodes and trees as links. From the phytocentric perspective, the distribution of potential mycorrhizal links, as measured by the number of ectomycorrhizal morphotypes on trees of Quercus garryana, was random with a short tail, implying that all the individuals of this species are more or less equal in linking to fungi in a potential network. From the mycocentric perspective, however, the distribution of plant links to fungi was scale-free, suggesting that certain fungus species may act as hubs with frequent connections to the network. Parallels exist between social networks and mycorrhizas that suggest future lines of study on mycorrhizal networks.
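The two perspectives amount to reading the same tree–fungus association list from either side. The sketch below, with hypothetical tree and morphotype identifiers, builds both degree histograms. In the phytocentric view, a tree's degree is the number of morphotypes found on it. In the mycocentric view, a fungus's degree is the number of trees it colonizes. Deciding whether a histogram is random or scale-free would require fitting candidate distributions, which is not shown.

```python
from collections import Counter


def degree_distributions(observations):
    """Return (tree-degree histogram, fungus-degree histogram) for (tree, fungus) pairs."""
    pairs = set(observations)  # ignore repeated records of the same association
    tree_degree = Counter(tree for tree, _ in pairs)        # phytocentric view
    fungus_degree = Counter(fungus for _, fungus in pairs)  # mycocentric view
    return Counter(tree_degree.values()), Counter(fungus_degree.values())


records = [("T1", "F1"), ("T1", "F2"), ("T2", "F1"), ("T2", "F3"),
           ("T3", "F1"), ("T3", "F4"), ("T1", "F1")]
trees, fungi = degree_distributions(records)
print(trees)   # Counter({2: 3})
print(fungi)   # Counter({1: 3, 3: 1})
```

In this toy example every tree has the same degree, while fungus F1 links to all three trees and acts as a small hub.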

12.
Modeling has contributed a great deal to our understanding of how individual neurons and neuronal networks function. In this review, we focus on models of the small neuronal networks of invertebrates, especially rhythmically active central pattern generator (CPG) networks. Models have elucidated many aspects of these networks, from identifying key interacting membrane properties to pointing out gaps in our understanding, for example missing neurons. Even the complex CPGs of vertebrates, such as those that underlie respiration, have been reduced to small network models to great effect. Modeling of these networks ranges from simplified models, which are amenable to mathematical analysis, to very complicated biophysical models. Some researchers have now adopted a population approach, in which they generate and analyze many related models that differ in a few to several judiciously chosen free parameters; these parameters often vary across animals, which justifies the approach. Models of small neuronal networks will continue to expand and refine our understanding of how neuronal networks in all animals program motor output, process sensory information and learn.

13.
The group model is a useful tool for understanding broad-scale patterns of interaction in a network, but its use has previously been limited to food webs, which contain only predator-prey interactions. Natural populations interact with each other in a variety of ways, and although most published ecological networks include information about only a single interaction type (e.g., feeding, pollination), ecologists are beginning to consider networks that combine multiple interaction types. Here we extend the group model to signed directed networks such as ecological interaction webs. As a specific application of this method, we examine how including or excluding specific interaction types affects our understanding of species roles in ecological networks. We consider all three currently available interaction webs: two are extended plant-mutualist networks with herbivores and parasitoids added, and one is an extended intertidal food web with interactions of all possible sign structures (+/+, -/0, etc.). Species in the extended food web grouped similarly whether we used all interactions, only trophic links, or only nontrophic links. However, removing mutualism or herbivory had a much larger effect in the extended plant-pollinator webs. Species removal even affected groups that were not directly connected to the removed species, as we found by excluding a small number of parasitoids. These results suggest that, for this aspect of network structure, including additional species in the network provides far more information than including additional interactions. Our methods provide a useful framework for simplifying networks to their essential structure, allowing us to identify generalities in network structure and better understand the roles species play in their communities.

14.
Human tissues have distinct biological functions. Many proteins/enzymes are known to be expressed only in specific tissues, and therefore the metabolic networks of different tissues differ. Though high-quality global human metabolic networks and metabolic networks for certain tissues such as liver have already been studied, a systematic study of tissue-specific metabolic networks for all main tissues is still missing. In this work, we reconstruct the tissue-specific metabolic networks for 15 main human tissues based on the previously reconstructed Edinburgh Human Metabolic Network (EHMN). Tissue information is first obtained for enzymes from the Human Protein Reference Database (HPRD) and UniprotKB databases and then transferred to reactions through the enzyme–reaction relationships in EHMN. As our knowledge of the tissue distribution of proteins is still very limited, we supplement the tissue information of the metabolic network based on network connectivity analysis and thorough examination of the literature. In the end, about 80% of the proteins and reactions in EHMN are assigned to at least one of the 15 tissues. To validate the quality of the tissue-specific networks, we take the brain-specific metabolic network as an example for functional module analysis, and the results reveal that the function of the brain metabolic network is closely related to the brain's role as the centre of the human nervous system. The tissue-specific human metabolic networks are available at .
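The enzyme-to-reaction transfer step can be sketched as a set union: a reaction is assigned every tissue in which at least one of its catalysing enzymes is found. The union rule, the data layout, and the identifiers E1, R1, etc. are hypothetical. The later gap-filling by connectivity analysis and literature review is not shown. Reactions left with an empty set are the ones that step would need to address.

```python
def reaction_tissues(enzyme_tissues, reaction_enzymes):
    """Assumed rule: a reaction occurs in every tissue where any of its enzymes is expressed."""
    return {
        reaction: set().union(*(enzyme_tissues.get(e, set()) for e in enzymes))
        for reaction, enzymes in reaction_enzymes.items()
    }


enzyme_tissues = {"E1": {"liver"}, "E2": {"brain", "liver"}}
reaction_enzymes = {"R1": ["E1", "E2"], "R2": ["E3"]}  # E3 has no tissue data yet
result = reaction_tissues(enzyme_tissues, reaction_enzymes)
print(result)
```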

15.

Background and motivations

Module identification has been studied extensively in order to gain a deeper understanding of complex systems such as social and biological networks. Modules are often defined as groups of vertices that are topologically cohesive and share similar interaction patterns with the rest of the vertices. Most existing module identification algorithms assume that the given networks are measured faithfully, without errors. However, in many real-world applications, for example when analyzing protein-protein interaction networks obtained from high-throughput profiling techniques, there is significant noise, with both false-positive and missing links between vertices. In this paper, we propose a new model for more robust module identification that takes advantage of multiple noisy observed networks, so that signals shared across networks are strengthened and combining information from various sources improves the solution quality.

Methods

We adopt a hierarchical Bayesian model to integrate multiple noisy snapshots that capture the underlying modular structure of the networks under study. We introduce a latent root assignment matrix, linked to the instantaneous module assignments in each observed network, to capture the underlying modular structure and combine information across networks; from this model we derive an efficient variational Bayes algorithm that accurately and robustly identifies the underlying modules from multiple noisy networks.

Results

Experiments on synthetic and protein-protein interaction data sets show that our proposed model enhances both the accuracy and the resolution of cohesive-module detection and is less vulnerable to noise in the observed data. In addition, it shows higher power in predicting missing edges than individual-network methods.

16.
Quantitative time-series observation of gene expression is becoming possible, for example through cell array technology. However, there are no practical methods for inferring network structures using only observed time-series data. Because most computational models of biological networks for continuous time-series data have many degrees of freedom, it is almost impossible to infer the correct structures. On the other hand, it has been reported that some kinds of biological networks, such as gene networks and metabolic pathways, may have scale-free properties. We hypothesize that the architecture of inferred biological network models can be restricted to scale-free networks, and we developed an inference algorithm for biological networks that uses only time-series data by introducing such a restriction. We adopt the S-system as the network model and a distributed genetic algorithm to optimize the models so that their simulated results fit the observed time-series data. We tested our algorithm on a case study with simulated data, comparing optimization with no restriction, which allows a fully connected network, against optimization under the restriction that the total number of links must equal that expected for a scale-free network. The restriction reduced both false-positive and false-negative estimation of links, as well as the differences between the model simulation and the given time-series data.
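The S-system model mentioned above has the power-law form dX_i/dt = α_i ∏_j X_j^g_ij − β_i ∏_j X_j^h_ij. The kinetic orders g_ij and h_ij encode which variables influence which. The sketch below shows a minimal forward-Euler simulator. It also includes a helper that counts the network's links as off-diagonal non-zero kinetic orders, which is our assumed way of counting for the link-number restriction. The genetic-algorithm fitting step is not shown, and all parameter values are hypothetical.

```python
import math


def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    return [alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
            - beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
            for i in range(n)]


def simulate(x0, alpha, beta, g, h, dt, steps):
    """Forward-Euler integration; adequate only for small dt and smooth dynamics."""
    x = list(x0)
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, beta, g, h)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


def count_links(g, h):
    """Assumed link count: ordered pairs i != j where X_j appears in X_i's equation."""
    n = len(g)
    return sum(1 for i in range(n) for j in range(n)
               if i != j and (g[i][j] != 0 or h[i][j] != 0))


# Hypothetical two-gene cascade: X1 is produced at a constant rate, X1 activates X2,
# and each variable is degraded in proportion to itself.
alpha, beta = [1.0, 1.0], [1.0, 1.0]
g = [[0, 0], [1, 0]]  # production exponents (row i = equation of X_i)
h = [[1, 0], [0, 1]]  # degradation exponents
final = simulate([0.0, 0.0], alpha, beta, g, h, dt=0.001, steps=20000)
print(count_links(g, h), [round(v, 3) for v in final])
```

Both variables relax towards the steady state X1 = X2 = 1. A fitting procedure under the scale-free restriction would compare such trajectories with data while constraining count_links.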

17.
Many large network data sets are noisy and contain links representing low-intensity relationships that are difficult to differentiate from random interactions. This is especially relevant for high-throughput data from systems biology and large-scale ecological data, but also for Web 2.0 data on human interactions. In these networks with missing and spurious links, it is possible to refine the data based on the principle of structural similarity, which assesses the shared neighborhood of two nodes. By using similarity measures to globally rank all possible links and choosing the top-ranked pairs, true links can be validated, missing links inferred, and spurious observations removed. While many similarity measures have been proposed to this end, there is no general consensus on which one to use. In this article, we first contribute a set of benchmarks for complex networks from three different settings (e-commerce, systems biology, and social networks), thus enabling a quantitative performance analysis of classic node similarity measures. Based on this, we then propose a new methodology for link assessment called z*, which assesses the statistical significance of the number of common neighbors of two nodes by comparison with the expected value in a suitably chosen random graph model, and which is consistently a top-performing algorithm on all benchmarks. In addition to a global ranking of links, we also use this method to identify the most similar neighbors of each single node in a local ranking, thereby showing the versatility of the method in two distinct scenarios and broadening its applicability. Finally, we perform an exploratory analysis on an oceanographic plankton data set and find that the distribution of microbes follows biogeographic rules similar to those of macroorganisms, a result that rejects the global dispersal hypothesis for microbes.
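The abstract does not specify z*'s random graph model. The sketch below illustrates the general idea under a different, explicitly assumed null model. The neighbour sets of u and v are treated as independent uniform random subsets of the other n − 2 nodes, so their overlap is hypergeometric. The observed count of common neighbours is then standardized against that distribution. This is a swapped-in illustration, not the authors' z*, and the toy graph is hypothetical.

```python
import math


def common_neighbor_zscore(adj, u, v):
    """z-score of |N(u) & N(v)| under an assumed hypergeometric null model."""
    pool = len(adj) - 2            # nodes other than u and v
    ku = len(adj[u] - {v})         # successes in the population
    kv = len(adj[v] - {u})         # number of draws
    observed = len((adj[u] & adj[v]) - {u, v})
    mean = kv * ku / pool
    var = kv * (ku / pool) * ((pool - ku) / pool) * ((pool - kv) / (pool - 1))
    return (observed - mean) / math.sqrt(var) if var > 0 else 0.0


# Hypothetical 10-node graph: nodes 0 and 1 share all three of their neighbours.
adj = {i: set() for i in range(10)}
for a, b in [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (5, 6)]:
    adj[a].add(b)
    adj[b].add(a)
print(round(common_neighbor_zscore(adj, 0, 1), 2), round(common_neighbor_zscore(adj, 0, 5), 2))
```

Unlike the raw common-neighbour count, the z-score discounts overlaps that high-degree nodes would produce by chance.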

18.
The problem of reconstructing and identifying intracellular protein signaling and biochemical networks is of critical importance in biology. We propose a mathematical approach called augmented sparse reconstruction for identifying links among the nodes of ordinary differential equation (ODE) networks, given a small set of observed trajectories with various initial conditions. As a test case, we apply the method to the epidermal growth factor receptor (EGFR)-driven signaling cascade, a well-studied and clinically important signaling network. Our method builds a system of representation from a collection of trajectory integrals, selectively attenuating blocks of terms in the representation. The system of representation is then augmented with random vectors, and ℓ1 minimization is used to find sparse representations of the dynamical interactions of each node. After demonstrating the performance of our method on a model of the EGFR protein network, we briefly sketch potential future therapeutic applications of this approach.

19.
We address the evolution of complex networks. Analyzing the topological evolution of complex networks plays a crucial role in predicting their future. While an impressive amount of work has been done on the issue, very little attention has so far been devoted to how information-theoretic quantifiers can be applied to characterize network evolution. With the objective of dynamically capturing the topological changes in a network's evolution, we propose a model able to quantify and reproduce several characteristics of a given network by using the square root of the Jensen-Shannon divergence in combination with the mean degree and the clustering coefficient. To support our hypothesis, we test the model by copying the evolution of well-known models and real systems. The results show that the methodology was able to mimic the test networks. Using this copycat model, the user can analyze a network's behavior over time and conjecture about the main drivers of its evolution; the model also provides a framework for predicting that evolution.
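The square root of the Jensen-Shannon divergence is a true metric between probability distributions, and with base-2 logarithms it lies in [0, 1]. The sketch below applies it to two degree distributions. Using degree distributions as the compared quantity is an assumption, and the combination with mean degree and clustering coefficient described in the abstract is not reproduced.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical probability of each degree value."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in counts.items()}


def js_distance(p, q):
    """sqrt(JSD(p, q)) with base-2 logs; p and q map outcomes to probabilities."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return math.sqrt(0.5 * kl_to_m(p) + 0.5 * kl_to_m(q))


before = degree_distribution([1, 1, 2, 2, 3])   # hypothetical snapshot at time t
after = degree_distribution([1, 2, 2, 3, 4])    # hypothetical snapshot at time t + 1
print(round(js_distance(before, after), 3))
```

Tracking this distance between consecutive snapshots gives a single number per time step that measures how much the degree structure has changed.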

20.
Cascading failures constitute an important vulnerability of interconnected systems. Here we study such failures on networks in which the connectivity of nodes is constrained by geographical distance. Specifically, we use random geometric graphs as representative examples of such spatial networks and study the properties of cascading failures on them in the presence of distributed flow. The key finding of this study is that the process of cascading failures is non-self-averaging on spatial networks; thus, aggregate inferences made from analyzing an ensemble of such networks lead to incorrect conclusions when applied to a single network, no matter how large the network is. We demonstrate that this lack of self-averaging disappears when a small fraction of long-range links is introduced into the network. We simulate the well-studied preemptive node removal strategy for cascade mitigation and show that it is largely ineffective on spatial networks. We introduce an altruistic strategy designed to limit the loss of network nodes in the event of a cascade-triggering failure and show that it performs better than the preemptive strategy. Finally, we consider a real-world spatial network, viz. a European power transmission network, and verify that our findings for random geometric graphs are also borne out by simulations of cascading failures on the empirical network.
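Random geometric graphs, the spatial networks studied above, are easy to generate. Place n nodes uniformly in the unit square and connect every pair within Euclidean distance r. The sketch also adds a small fraction of uniformly random shortcut links as one simple stand-in for the long-range links the abstract mentions. The flow model and the cascade dynamics are not shown, and n, r, and the shortcut fraction are arbitrary choices.

```python
import math
import random


def random_geometric_graph(n, radius, seed=None):
    """Nodes uniform in the unit square; link every pair within distance `radius`."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pos[i], pos[j]) <= radius}
    return pos, edges


def add_random_shortcuts(n, edges, fraction, seed=None):
    """Add int(fraction * |E|) new links between uniformly random node pairs."""
    rng = random.Random(seed)
    target = int(fraction * len(edges))
    extra = set()
    while len(extra) < target:
        i, j = sorted(rng.sample(range(n), 2))
        if (i, j) not in edges:
            extra.add((i, j))
    return edges | extra


pos, edges = random_geometric_graph(200, 0.1, seed=1)
augmented = add_random_shortcuts(200, edges, 0.05, seed=2)
print(len(edges), len(augmented))
```

Because shortcut endpoints are drawn uniformly, most of them span distances far larger than r, which is what breaks the purely local structure of the spatial graph.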


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417