首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Community structure detection is of great importance because it can help in discovering the relationship between the function and the topology structure of a network. Many community detection algorithms have been proposed, but how to incorporate the prior knowledge in the detection process remains a challenging problem. In this paper, we propose a semi-supervised community detection algorithm, which makes full utilization of the must-link and cannot-link constraints to guide the process of community detection and thereby extracts high-quality community structures from networks. To acquire the high-quality must-link and cannot-link constraints, we also propose a semi-supervised component generation algorithm based on active learning, which actively selects nodes with maximum utility for the proposed semi-supervised community detection algorithm step by step, and then generates the must-link and cannot-link constraints by accessing a noiseless oracle. Extensive experiments were carried out, and the experimental results show that the introduction of active learning into the problem of community detection makes a success. Our proposed method can extract high-quality community structures from networks, and significantly outperforms other comparison methods.  相似文献   

2.
How to identify influential nodes is a key issue in complex networks. The degree centrality is simple, but is incapable to reflect the global characteristics of networks. Betweenness centrality and closeness centrality do not consider the location of nodes in the networks, and semi-local centrality, leaderRank and pageRank approaches can be only applied in unweighted networks. In this paper, a bio-inspired centrality measure model is proposed, which combines the Physarum centrality with the K-shell index obtained by K-shell decomposition analysis, to identify influential nodes in weighted networks. Then, we use the Susceptible-Infected (SI) model to evaluate the performance. Examples and applications are given to demonstrate the adaptivity and efficiency of the proposed method. In addition, the results are compared with existing methods.  相似文献   

3.
Statistical properties of the static networks have been extensively studied. However, online social networks are evolving dynamically, understanding the evolving characteristics of the core is one of major concerns in online social networks. In this paper, we empirically investigate the evolving characteristics of the Facebook core. Firstly, we separate the Facebook-link(FL) and Facebook-wall(FW) datasets into 28 snapshots in terms of timestamps. By employing the k-core decomposition method to identify the core of each snapshot, we find that the core sizes of the FL and FW networks approximately contain about 672 and 373 nodes regardless of the exponential growth of the network sizes. Secondly, we analyze evolving topological properties of the core, including the k-core value, assortative coefficient, clustering coefficient and the average shortest path length. Empirical results show that nodes in the core are getting more interconnected in the evolving process. Thirdly, we investigate the life span of nodes belonging to the core. More than 50% nodes stay in the core for more than one year, and 19% nodes always stay in the core from the first snapshot. Finally, we analyze the connections between the core and the whole network, and find that nodes belonging to the core prefer to connect nodes with high k-core values, rather than the high degrees ones. This work could provide new insights into the online social network analysis.  相似文献   

4.
Identifying influential spreaders in networks, which contributes to optimizing the use of available resources and efficient spreading of information, is of great theoretical significance and practical value. A random-walk-based algorithm LeaderRank has been shown as an effective and efficient method in recognizing leaders in social network, which even outperforms the well-known PageRank method. As LeaderRank is initially developed for binary directed networks, further extensions should be studied in weighted networks. In this paper, a generalized algorithm PhysarumSpreader is proposed by combining LeaderRank with a positive feedback mechanism inspired from an amoeboid organism called Physarum Polycephalum. By taking edge weights into consideration and adding the positive feedback mechanism, PhysarumSpreader is applicable in both directed and undirected networks with weights. By taking two real networks for examples, the effectiveness of the proposed method is demonstrated by comparing with other standard centrality measures.  相似文献   

5.
Community detection has drawn a lot of attention as it can provide invaluable help in understanding the function and visualizing the structure of networks. Since single objective optimization methods have intrinsic drawbacks to identifying multiple significant community structures, some methods formulate the community detection as multi-objective problems and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability, but have difficulty in locating local optima efficiently. In this study, in order to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed by combining multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. Firstly, nondominated solutions generated by evolutionary operations and solutions in dominant population are set as initial individuals for local search procedure. Then, a new direction vector named as pseudonormal vector is proposed to integrate two objective functions together to form a fitness function. Finally, a network specific local search strategy based on label propagation rule is expanded to search the local optimal solutions efficiently. The extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. Firstly, experiments on influence of local search procedure demonstrate that the local search procedure can speed up the convergence to better partitions and make the algorithm more stable. Secondly, comparisons with a set of classic community detection methods illustrate the proposed method can find single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks which are beneficial for analyzing networks in multi-resolution levels.  相似文献   

6.
7.
Community detection is a fundamental problem in the analysis of complex networks. Recently, many researchers have concentrated on the detection of overlapping communities, where a vertex may belong to more than one community. However, most current methods require the number (or the size) of the communities as a priori information, which is usually unavailable in real-world networks. Thus, a practical algorithm should not only find the overlapping community structure, but also automatically determine the number of communities. Furthermore, it is preferable if this method is able to reveal the hierarchical structure of networks as well. In this work, we firstly propose a generative model that employs a nonnegative matrix factorization (NMF) formulization with a l2,1 norm regularization term, balanced by a resolution parameter. The NMF has the nature that provides overlapping community structure by assigning soft membership variables to each vertex; the l2,1 regularization term is a technique of group sparsity which can automatically determine the number of communities by penalizing too many nonempty communities; and hence the resolution parameter enables us to explore the hierarchical structure of networks. Thereafter, we derive the multiplicative update rule to learn the model parameters, and offer the proof of its correctness. Finally, we test our approach on a variety of synthetic and real-world networks, and compare it with some state-of-the-art algorithms. The results validate the superior performance of our new method.  相似文献   

8.
Several protein structure classification schemes exist that partition the protein universe into structural units called folds. Yet these schemes do not discuss how these units sit relative to each other in a global structure space. In this paper we construct networks that describe such global relationships between folds in the form of structural bridges. We generate these networks using four different structural alignment methods across multiple score thresholds. The networks constructed using the different methods remain a similar distance apart regardless of the probability threshold defining a structural bridge. This suggests that at least some structural bridges are method specific and that any attempt to build a picture of structural space should not be reliant on a single structural superposition method. Despite these differences all representations agree on an organisation of fold space into five principal community structures: all-α, all-β sandwiches, all-β barrels, α/β and α + β. We project estimated fold ages onto the networks and find that not only are the pairings of unconnected folds associated with higher age differences than bridged folds, but this difference increases with the number of networks displaying an edge. We also examine different centrality measures for folds within the networks and how these relate to fold age. While these measures interpret the central core of fold space in varied ways they all identify the disposition of ancestral folds to fall within this core and that of the more recently evolved structures to provide the peripheral landscape. These findings suggest that evolutionary information is encoded along these structural bridges. Finally, we identify four highly central pivotal folds representing dominant topological features which act as key attractors within our landscapes.  相似文献   

9.
Rücker's walk count (WC) indices are well-known topological indices (TIs) used in Chemoinformatics to quantify the molecular structure of drugs represented by a graph in Quantitative structure–activity/property relationship (QSAR/QSPR) studies. In this work, we introduce for the first time the higher-order (kth order) analogues (WCk) of these indices using Markov chains. In addition, we report new QSPR models for large complex networks of different Bio-Systems useful in Parasitology and Neuroinformatics. The new type of QSPR models can be used for model checking to calculate numerical scores S(Lij) for links Lij (checking or re-evaluation of network connectivity) in large networks of all these fields. The method may be summarized as follows: (i) first, the WCk(j) values are calculated for all jth nodes in a complex network already created; (ii) A linear discriminant analysis (LDA) is used to seek a linear equation that discriminates connected or linked (Lij = 1) pairs of nodes experimentally confirmed from non-linked ones (Lij = 0); (iii) The new model is validated with external series of pairs of nodes; (iv) The equation obtained is used to re-evaluate the connectivity quality of the network, connecting/disconnecting nodes based on the quality scores calculated with the new connectivity function. The linear QSPR models obtained yielded the following results in terms of overall test accuracy for re-construction of complex networks of different Bio-Systems: parasite–host networks (93.14%), NW Spain fasciolosis spreading networks (71.42/70.18%) and CoCoMac Brain Cortex co-activation network (86.40%). Thus, this work can contribute to the computational re-evaluation or model checking of connectivity (collation) in complex systems of any science field.  相似文献   

10.
It is a classic topic of social network analysis to evaluate the importance of nodes and identify the node that takes on the role of core or bridge in a network. Because a single indicator is not sufficient to analyze multiple characteristics of a node, it is a natural solution to apply multiple indicators that should be selected carefully. An intuitive idea is to select some indicators with weak correlations to efficiently assess different characteristics of a node. However, this paper shows that it is much better to select the indicators with strong correlations. Because indicator correlation is based on the statistical analysis of a large number of nodes, the particularity of an important node will be outlined if its indicator relationship doesn''t comply with the statistical correlation. Therefore, the paper selects the multiple indicators including degree, ego-betweenness centrality and eigenvector centrality to evaluate the importance and the role of a node. The importance of a node is equal to the normalized sum of its three indicators. A candidate for core or bridge is selected from the great degree nodes or the nodes with great ego-betweenness centrality respectively. Then, the role of a candidate is determined according to the difference between its indicators'' relationship with the statistical correlation of the overall network. Based on 18 real networks and 3 kinds of model networks, the experimental results show that the proposed methods perform quite well in evaluating the importance of nodes and in identifying the node role.  相似文献   

11.
The task of extracting the maximal amount of information from a biological network has drawn much attention from researchers, for example, predicting the function of a protein from a protein-protein interaction (PPI) network. It is well known that biological networks consist of modules/communities, a set of nodes that are more densely inter-connected among themselves than with the rest of the network. However, practical applications of utilizing the community information have been rather limited. For protein function prediction on a network, it has been shown that none of the existing community-based protein function prediction methods outperform a simple neighbor-based method. Recently, we have shown that proper utilization of a highly optimal modularity community structure for protein function prediction can outperform neighbor-assisted methods. In this study, we propose two function prediction approaches on bipartite networks that consider the community structure information as well as the neighbor information from the network: 1) a simple screening method and 2) a random forest based method. We demonstrate that our community-assisted methods outperform neighbor-assisted methods and the random forest method yields the best performance. In addition, we show that using the optimal community structure information is essential for more accurate function prediction for the protein-complex bipartite network of Saccharomyces cerevisiae. Community detection can be carried out either using a modified modularity for dealing with the original bipartite network or first projecting the network into a single-mode network (i.e., PPI network) and then applying community detection to the reduced network. We find that the projection leads to the loss of information in a significant way. Since our prediction methods rely only on the network topology, they can be applied to various fields where an efficient network-based analysis is required.  相似文献   

12.
A core comprises of a group of central and densely connected nodes which governs the overall behaviour of a network. It is recognised as one of the key meso-scale structures in complex networks. Profiling this meso-scale structure currently relies on a limited number of methods which are often complex and parameter dependent or require a null model. As a result, scalability issues are likely to arise when dealing with very large networks together with the need for subjective adjustment of parameters. The notion of a rich-club describes nodes which are essentially the hub of a network, as they play a dominating role in structural and functional properties. The definition of a rich-club naturally emphasises high degree nodes and divides a network into two subgroups. Here, we develop a method to characterise a rich-core in networks by theoretically coupling the underlying principle of a rich-club with the escape time of a random walker. The method is fast, scalable to large networks and completely parameter free. In particular, we show that the evolution of the core in World Trade and C. elegans networks correspond to responses to historical events and key stages in their physical development, respectively.  相似文献   

13.
Wang J  Liu B  Li M  Pan Y 《BMC genomics》2010,11(Z2):S10

Background

Identification of protein complexes in large interaction networks is crucial to understand principles of cellular organization and predict protein functions, which is one of the most important issues in the post-genomic era. Each protein might be subordinate multiple protein complexes in the real protein-protein interaction networks. Identifying overlapping protein complexes from protein-protein interaction networks is a considerable research topic.

Result

As an effective algorithm in identifying overlapping module structures, clique percolation method (CPM) has a wide range of application in social networks and biological networks. However, the recognition accuracy of algorithm CPM is lowly. Furthermore, algorithm CPM is unfit to identifying protein complexes with meso-scale when it applied in protein-protein interaction networks. In this paper, we propose a new topological model by extending the definition of k-clique community of algorithm CPM and introduced distance restriction, and develop a novel algorithm called CP-DR based on the new topological model for identifying protein complexes. In this new algorithm, the protein complex size is restricted by distance constraint to conquer the shortcomings of algorithm CPM. The algorithm CP-DR is applied to the protein interaction network of Sacchromyces cerevisiae and identifies many well known complexes.

Conclusion

The proposed algorithm CP-DR based on clique percolation and distance restriction makes it possible to identify dense subgraphs in protein interaction networks, a large number of which correspond to known protein complexes. Compared to algorithm CPM, algorithm CP-DR has more outstanding performance.
  相似文献   

14.
15.

Background

Recently, large data sets of protein-protein interactions (PPI) which can be modeled as PPI networks are generated through high-throughput methods. And locally dense regions in PPI networks are very likely to be protein complexes. Since protein complexes play a key role in many biological processes, detecting protein complexes in PPI networks is one of important tasks in post-genomic era. However, PPI networks are often incomplete and noisy, which builds barriers to mining protein complexes.

Results

We propose a new and effective algorithm based on robustness to detect overlapping clusters as protein complexes in PPI networks. And in order to improve the accuracy of resulting clusters, our algorithm tries to reduce bad effects brought by noise in PPI networks. And in our algorithm, each new cluster begins from a seed and is expanded through adding qualified nodes from the cluster's neighbourhood nodes. Besides, in our algorithm, a new distance measurement method between a cluster K and a node in the neighbours of K is proposed as well. The performance of our algorithm is evaluated by applying it on two PPI networks which are Gavin network and Database of Interacting Proteins (DIP). The results show that our algorithm is better than Markov clustering algorithm (MCL), Clique Percolation method (CPM) and core-attachment based method (CoAch) in terms of F-measure, co-localization and Gene Ontology (GO) semantic similarity.

Conclusions

Our algorithm detects locally dense regions or clusters as protein complexes. The results show that protein complexes generated by our algorithm have better quality than those generated by some previous classic methods. Therefore, our algorithm is effective and useful.
  相似文献   

16.
With the rapid accumulation of high-throughput metagenomic sequencing data, it is possible to infer microbial species relations in a microbial community systematically. In recent years, some approaches have been proposed for identifying microbial interaction network. These methods often focus on one dataset without considering the advantage of data integration. In this study, we propose to use a similarity network fusion (SNF) method to infer microbial relations. The SNF efficiently integrates the similarities of species derived from different datasets by a cross-network diffusion process. We also introduce consensus k-nearest neighborhood (Ck-NN) method instead of k-NN in the original SNF (we call the approach CSNF). The final network represents the augmented species relationships with aggregated evidence from various datasets, taking advantage of complementarity in the data. We apply the method on genus profiles derived from three microbiome datasets and we find that CSNF can discover the modular structure of microbial interaction network which cannot be identified by analyzing a single dataset.  相似文献   

17.
In many types of network, the relationship between structure and function is of great significance. We are particularly interested in community structures, which arise in a wide variety of domains. We apply a simple oscillator model to networks with community structures and show that waves of regular oscillation are caused by synchronised clusters of nodes. Moreover, we show that such global oscillations may arise as a direct result of network topology. We also observe that additional modes of oscillation (as detected through frequency analysis) occur in networks with additional levels of topological hierarchy and that such modes may be directly related to network structure. We apply the method in two specific domains (metabolic networks and metropolitan transport) demonstrating the robustness of our results when applied to real world systems. We conclude that (where the distribution of oscillator frequencies and the interactions between them are known to be unimodal) our observations may be applicable to the detection of underlying community structure in networks, shedding further light on the general relationship between structure and function in complex systems.  相似文献   

18.
Community detection is an important tool for exploring and classifying the properties of large complex networks and should be of great help for spatial networks. Indeed, in addition to their location, nodes in spatial networks can have attributes such as the language for individuals, or any other socio-economical feature that we would like to identify in communities. We discuss in this paper a crucial aspect which was not considered in previous studies which is the possible existence of correlations between space and attributes. Introducing a simple toy model in which both space and node attributes are considered, we discuss the effect of space-attribute correlations on the results of various community detection methods proposed for spatial networks in this paper and in previous studies. When space is irrelevant, our model is equivalent to the stochastic block model which has been shown to display a detectability-non detectability transition. In the regime where space dominates the link formation process, most methods can fail to recover the communities, an effect which is particularly marked when space-attributes correlations are strong. In this latter case, community detection methods which remove the spatial component of the network can miss a large part of the community structure and can lead to incorrect results.  相似文献   

19.
Studies of complex networks show that nodes with high centrality scores are important to network structure and stability. Following this rationale, centrality measures can be used to (i) identify keystone species in ecological networks, a major issue in community ecology, and (ii) differentiate the keystone species concept, e.g. species may play a key role in a network for different topological reasons. In 34 pollination communities we examine the relationship between the generalization level of species (ND) and two complementary centrality indices: closeness (CC) and betweenness centrality (BC). CC measures the proximity of a species to all other species in the community, while BC describes the importance of a species as a connector. Most networks had a linear NDCC relationship with a minimum CC value of 0.41. Hence, species were close to each and will be likely to be rapidly affected by disturbances. Contrarily, in most networks, the NDBC relationships were power-law distributed with exponents larger than one. Only 59% of the species were connectors (BC > 0). In particular, there was a connector threshold value of ND = 0.46. Species above this threshold represent ~40%, almost all of which were connectors. These results indicate that in pollination systems the most generalized species are usually network keystone species, playing at least two roles: (i) interact closely with most other species (high CC) and (ii) connect otherwise unconnected subnetworks (high BC). We discuss the implications of centrality measures to community-based conservation ecology.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号