Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Networks play a prominent role in the study of complex systems of interacting entities in biology, sociology, and economics. Despite this diversity, we demonstrate here that a statistical model decomposing networks into matching and centrality components provides a comprehensive and unifying quantification of their architecture. The matching term quantifies the assortative structure in which node makes links with which other node, whereas the centrality term quantifies the number of links that nodes make. We show, for a diverse set of networks, that this decomposition can provide a tight fit to observed networks. Then we provide three applications. First, we show that the model allows very accurate prediction of missing links in partially known networks. Second, when node characteristics are known, we show how the matching–centrality decomposition can be related to this external information. Consequently, it offers us a simple and versatile tool to explore how node characteristics explain network architecture. Finally, we demonstrate the efficiency and flexibility of the model to forecast the links that a novel node would create if it were to join an existing network.

2.
Identifying influential nodes in very large-scale directed networks is a big challenge relevant to disparate applications, such as accelerating information propagation, controlling rumors and diseases, designing search engines, and understanding hierarchical organization of social and biological networks. Known methods range from node centralities, such as degree, closeness and betweenness, to diffusion-based processes, like PageRank and LeaderRank. Some of these methods already take into account the influences of a node’s neighbors but do not directly make use of the interactions among its neighbors. Local clustering is known to have negative impacts on information spreading. We further show empirically that it also plays a negative role in generating local connections. Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors’ influences, but also the clustering coefficient. Subject to the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that ClusterRank outperforms some benchmark algorithms such as PageRank and LeaderRank. Furthermore, ClusterRank can also be applied to undirected networks where the superiority of ClusterRank is significant compared with degree centrality and k-core decomposition. In addition, ClusterRank, only making use of local information, is much more efficient than global methods: It takes only 191 seconds for a network with about nodes, more than 15 times faster than PageRank.
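
The ranking idea in this abstract can be sketched in a few lines of Python. The abstract does not state the scoring formula, so the form used here, score = 10^(-c) times the sum over neighbors of (neighbor degree + 1), is an assumption for illustration, as are the undirected toy graph and the function names `clustering_coefficient` and `cluster_rank`.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of pairs of `node`'s neighbors that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

def cluster_rank(adj):
    """Assumed score: 10**(-c_i) * sum over neighbors j of (degree_j + 1)."""
    scores = {}
    for node, neighbors in adj.items():
        c = clustering_coefficient(adj, node)
        scores[node] = 10 ** (-c) * sum(len(adj[j]) + 1 for j in neighbors)
    return scores

# Hypothetical toy graph: a triangle a-b-c plus a star centered on d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("d", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = cluster_rank(adj)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the star center `d` ranks first, while `a` and `b` are penalized because all of their neighbors are already linked to each other. Only a node's neighbors and their degrees are consulted, which is the sense in which the method is local.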

3.
GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to “biology process” by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April, 2006.

4.
One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.

5.
The problem of link prediction has recently received increasing attention from scholars in network science. In social network analysis, one of its aims is to recover missing links, namely connections among actors which are likely to exist but have not been reported because data are incomplete or subject to various types of uncertainty. In the field of criminal investigations, problems of incomplete information are encountered almost by definition, given the obvious anti-detection strategies set up by criminals and the limited investigative resources. In this paper, we work on a specific dataset obtained from a real investigation, and we propose a strategy to identify missing links in a criminal network on the basis of the topological analysis of the links classified as marginal, i.e. removed during the investigation procedure. The main assumption is that missing links should have opposite features with respect to marginal ones. Measures of node similarity turn out to provide the best characterization in this sense. The inspection of the judicial source documents confirms that the predicted links, in most instances, do relate actors with large likelihood of co-participation in illicit activities.

6.
GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to “biology process” by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April, 2006.

7.
Disease epidemic outbreaks on human metapopulation networks are often driven by a small number of superspreader nodes, which are primarily responsible for spreading the disease throughout the network. Superspreader nodes typically are characterized either by their locations within the network, by their degree of connectivity and centrality, or by their habitat suitability for the disease, described by their reproduction number (R). Here we introduce a model that considers simultaneously the effects of network properties and R on superspreaders, as opposed to previous research which considered each factor separately. This type of model is applicable to diseases for which habitat suitability varies by climate or land cover, and for directly transmitted diseases for which population density and mitigation practices influence R. We present analytical models that quantify the superspreader capacity of a population node by two measures: probability-dependent superspreader capacity, the expected number of neighboring nodes to which the node in consideration will randomly spread the disease per epidemic generation, and time-dependent superspreader capacity, the rate at which the node spreads the disease to each of its neighbors. We validate our analytical models with a Monte Carlo analysis of repeated stochastic Susceptible-Infected-Recovered (SIR) simulations on randomly generated human population networks, and we use a random forest statistical model to relate superspreader risk to connectivity, R, centrality, clustering, and diffusion. We demonstrate that either degree of connectivity or R above a certain threshold is a sufficient condition for a node to have a moderate superspreader risk factor, but both are necessary for a node to have a high-risk factor.
The statistical model presented in this article can be used to predict the location of superspreader events in future epidemics, and to predict the effectiveness of mitigation strategies that seek to reduce the value of R, alter host movements, or both.

8.
Cloud computing technology plays a very important role in many areas, such as in the construction and development of the smart city. Meanwhile, numerous cloud services appear on the cloud-based platform. Therefore, how to select trustworthy cloud services remains a significant problem in such platforms and has been extensively investigated owing to the ever-growing needs of users. However, trust relationships in social networks have not been taken into account in existing methods of cloud service selection and recommendation. In this paper, we propose a cloud service selection model based on the trust-enhanced similarity. Firstly, the direct, indirect, and hybrid trust degrees are measured based on the interaction frequencies among users. Secondly, we estimate the overall similarity by combining the experience usability measured based on Jaccard’s Coefficient and the numerical distance computed by Pearson Correlation Coefficient. Then, by using the trust degree to modify the basic similarity, we obtain a trust-enhanced similarity. Finally, we utilize the trust-enhanced similarity to find similar trusted neighbors and predict the missing QoS values as the basis of cloud service selection and recommendation. The experimental results show that our approach is able to obtain optimal results via adjusting parameters and exhibits high effectiveness. The cloud services ranked by our model also have better QoS properties than those of other methods in the comparison experiments.

9.
Many biological networks are signed molecular networks which consist of positive and negative links. To reveal the distinct features between links with different signs, we proposed signed link-clustering coefficients that assess the similarity of interaction profiles between linked molecules. We found that positive links tended to cluster together, while negative links usually behaved like bridges between positive clusters. Positive links with higher adhesiveness tended to share protein domains, be associated with protein-protein interactions and make intra-connections within protein complexes. Negative links that were more bridge-like tended to make interconnections between protein complexes. Utilizing the proposed measures to group positive links, we observed hierarchical modules that could be well characterized by functional annotations or known protein complexes. Our results imply that the proposed sign-specific measures can help reveal the network structural characteristics and the embedded biological contexts of signed links, as well as the functional organization of signed molecular networks.

10.
Ecological networks are complexes of interacting species, but not all potential links among species are realized. Unobserved links are either missing or forbidden. Missing links exist, but require more sampling or alternative ways of detection to be verified. Forbidden links remain unobservable, irrespective of sampling effort. They are caused by linkage constraints. We studied one Arctic pollination network and two Mediterranean seed-dispersal networks. In the first, for example, we recorded flower-visit links for one full season, arranged data in an interaction matrix and got a connectance C of 15 per cent. Interaction accumulation curves documented our sampling of interactions through observation of visits to be robust. Then, we included data on pollen from the body surface of flower visitors as an additional link ‘currency’. This resulted in 98 new links, missing from the visitation data. Thus, the combined visit–pollen matrix got an increased C of 20 per cent. For the three networks, C ranged from 20 to 52 per cent, and thus the percentage of unobserved links (100 − C) was 48 to 80 per cent; these were assumed forbidden because of linkage constraints and not missing because of under-sampling. Phenological uncoupling (i.e. non-overlapping phenophases between interacting mutualists) is one kind of constraint, and it explained 22 to 28 per cent of all possible, but unobserved links. Increasing phenophase overlap between species increased link probability, but extensive overlaps were required to achieve a high probability. Other kinds of constraint, such as size mismatch and accessibility limitations, are briefly addressed.
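
The connectance arithmetic in this abstract can be illustrated with a short Python sketch. Connectance C is taken here as the percentage of cells in the interaction matrix that hold a link, and the unobserved share as 100 − C. The two small matrices are hypothetical and are not data from the study; the function name `connectance` is likewise illustrative.

```python
def connectance(matrix):
    """Percentage of possible links that are realized in a 0/1 interaction matrix."""
    possible = len(matrix) * len(matrix[0])
    realized = sum(1 for row in matrix for cell in row if cell)
    return 100.0 * realized / possible

# Hypothetical 4 plants x 5 animals matrices, not data from the study.
visits = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]
pollen = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]

# A link counts as observed if either "currency" records it.
combined = [[a or b for a, b in zip(v_row, p_row)]
            for v_row, p_row in zip(visits, pollen)]

c_visits = connectance(visits)        # visits only
c_combined = connectance(combined)    # visits plus pollen loads
unobserved = 100.0 - c_combined       # missing or forbidden links
```

As in the abstract, adding a second source of evidence can only raise C, because links found by pollen but not by visits move from the unobserved to the observed set.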

11.
Commercial microbial identification systems rank the relative likelihood of species identity on the basis of in vitro reactions only. Failure to consider the prevalence of individual taxa results in a spurious demotion of common species and a tendency toward overreporting of rare microbes. The incorporation of Bayesian analysis into identification matrices can provide for a realistic ranking of bacterial species which fulfil given biocode schemes. Accepted 22 June 1989.
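
The effect described in this abstract follows from Bayes' rule: the posterior probability of a taxon is proportional to its prevalence multiplied by the likelihood of the observed reactions. The Python sketch below is a minimal illustration under that assumption. The taxon names, prevalences and likelihoods are hypothetical, and `rank_taxa` is not the interface of any commercial system.

```python
def rank_taxa(likelihoods, prevalence=None):
    """Return (taxon, probability) pairs sorted from most to least probable.

    likelihoods: P(observed reaction profile | taxon)
    prevalence:  prior P(taxon); if None, all taxa are treated as equally common.
    """
    taxa = list(likelihoods)
    if prevalence is None:
        prevalence = {t: 1.0 / len(taxa) for t in taxa}
    joint = {t: likelihoods[t] * prevalence[t] for t in taxa}
    total = sum(joint.values())
    posterior = {t: joint[t] / total for t in taxa}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical numbers: the rare species fits the reactions slightly better,
# but the common species is encountered far more often.
likelihoods = {"common_sp": 0.60, "rare_sp": 0.90}
prevalence = {"common_sp": 0.95, "rare_sp": 0.05}

by_reactions_only = rank_taxa(likelihoods)
with_prevalence = rank_taxa(likelihoods, prevalence)
```

With reactions alone the rare species is ranked first; once prevalence enters as a prior, the common species takes the top position, which is the demotion and overreporting problem the abstract describes.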

12.
The advent of functional genomics has enabled the molecular biosciences to come a long way towards characterizing the molecular constituents of life. Yet, the challenge for biology overall is to understand how organisms function. By discovering how function arises in dynamic interactions, systems biology addresses the missing links between molecules and physiology. Top-down systems biology identifies molecular interaction networks on the basis of correlated molecular behavior observed in genome-wide "omics" studies. Bottom-up systems biology examines the mechanisms through which functional properties arise in the interactions of known components. Here, we outline the challenges faced by systems biology and discuss limitations of the top-down and bottom-up approaches, which, despite these limitations, have already led to the discovery of mechanisms and principles that underlie cell function.

13.
Commercial microbial identification systems rank the relative likelihood of species identity on the basis of in vitro reactions only. Failure to consider the prevalence of individual taxa results in a spurious demotion of common species and a tendency toward overreporting of rare microbes. The incorporation of Bayesian analysis into identification matrices can provide for a realistic ranking of bacterial species which fulfil given biocode schemes.

14.
We compared general behaviour trends of resampling methods (bootstrap, bootstrap with Poisson distribution, jackknife, and jackknife with symmetric resampling) and different ways to summarize the results for resampling (absolute frequency, F, and frequency difference, GC') for real data sets under variable resampling strengths in three weighting schemes. We propose an equivalence between bootstrap and jackknife in order to make bootstrap variable across different resampling strengths. Specifically, for each method we evaluated the number of spurious groups (groups not present in the strict consensus of the unaltered data set), of real groups, and of inconsistencies in ranking of groups under variable resampling strengths. We found that GC' always generated more spurious groups and recovered more groups than F. Bootstrap methods generated more spurious groups than jackknife methods; and jackknife is the method that recovered more real groups. We consistently obtained a higher proportion of spurious groups for GC' than for F; and for bootstrap than for jackknife. Finally, we evaluated the ranking of groups under variable resampling strengths qualitatively in the trajectories of "support" against resampling strength, and quantitatively with Kendall coefficient values. We found fewer ranking inconsistencies for GC' than for F, and for bootstrap than for jackknife.
© The Willi Hennig Society 2009.

15.
Identification of homologous core structures   Cited by: 7 (self-citations: 0; citations by others: 7)
Matsuo Y, Bryant SH. Proteins 1999, 35(1): 70-79
Using a large database of protein structure-structure alignments, we test a new method for distinguishing homologous and "analogous" structural neighbors. The homologous neighbors included in the test set show no detectable sequence similarity, but they may be well superimposed and show functional similarity or other evidence of evolutionary relationship. Analogous neighbors also show no sequence similarity and may be well superimposed, but they have different functions and their structural similarity may be the result of convergent evolution. Confirming results of other analyses, we find that remote homologs and analogs are not well distinguished by measures of pairwise structural similarity, including the percentage of identical residues and root-mean-square (RMS) superposition residual. We show, however, that structure-structure alignments of analogous neighbors rarely superimpose the particular substructure that is shared among homologous neighbors. We call this characteristic substructure the homologous core structure (HCS), and we show that a cross-validated test for presence of the HCS correctly identifies 75% of remote homologs with a false-positive rate of 16% analogs, significantly better than discrimination by RMS or other measures of pairwise similarity. The HCS describes conservation of spatial structure within a protein family in much the way that a sequence motif describes sequence conservation. We suggest that it may be used in the same way, to identify homologous neighbors at greater evolutionary distance than is possible by pairwise comparison.

16.
Microbes compose most of the biomass on the planet, yet the majority of taxa remain uncharacterized. These unknown microbes, often referred to as “microbial dark matter,” represent a major challenge for biology. To understand the ecological contributions of these Unknown taxa, it is essential to first understand the relationship between unknown species, neighboring microbes, and their respective environment. Here, we establish a method to study the ecological significance of “microbial dark matter” by building microbial co-occurrence networks from publicly available 16S rRNA gene sequencing data of four extreme aquatic habitats. For each environment, we constructed networks including and excluding unknown organisms at multiple taxonomic levels and used network centrality measures to quantitatively compare networks. When the Unknown taxa were excluded from the networks, a significant reduction in degree and betweenness was observed for all environments. Strikingly, Unknown taxa occurred as top hubs in all environments, suggesting that “microbial dark matter” play necessary ecological roles within their respective communities. In addition, novel adaptation-related genes were detected after using 16S rRNA gene sequences from top-scoring hub taxa as probes to blast metagenome databases. This work demonstrates the broad applicability of network metrics to identify and prioritize key Unknown taxa and improve understanding of ecosystem structure across diverse habitats. Subject terms: Microbial ecology, Metagenomics

17.
Link Clustering (LC) is a relatively new method for detecting overlapping communities in networks. The basic principle of LC is to derive a transform matrix whose elements are composed of the link similarity of neighbor links based on the Jaccard distance calculation; then it applies hierarchical clustering to the transform matrix and uses a measure of partition density on the resulting dendrogram to determine the cut level for best community detection. However, the original link clustering method does not consider the link similarity of non-neighbor links, and the partition density tends to divide the communities into many small communities. In this paper, an Extended Link Clustering method (ELC) for overlapping community detection is proposed. The improved method employs a new link similarity, Extended Link Similarity (ELS), to produce a denser transform matrix, and uses the maximum value of EQ (an extended measure of quality of modularity) as a means to optimally cut the dendrogram for better partitioning of the original network space. Since ELS uses more link information, the resulting transform matrix provides a superior basis for clustering and analysis. Further, using the EQ value to find the best level for the hierarchical clustering dendrogram division, we obtain communities that are more sensible and reasonable than the ones obtained by the partition density evaluation. Experimentation on five real-world networks and artificially-generated networks shows that the ELC method achieves higher EQ and In-group Proportion (IGP) values. Additionally, communities are more realistic than those generated by either the original LC method or the classical CPM method.
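
The neighbor-link similarity of the original LC method, as summarized above, can be sketched in Python. The sketch assumes the usual convention that two links sharing a node are compared through the Jaccard index of the inclusive neighborhoods (a node plus its neighbors) of their two non-shared endpoints; the abstract itself does not spell this out. ELS is not defined in the abstract and is not reproduced here. The function name and toy graph are hypothetical.

```python
def link_similarity(adj, link_a, link_b):
    """Jaccard similarity of two links that share exactly one node.

    Returns 0.0 for links that share no node, which is the non-neighbor
    case that the original LC method does not consider.
    """
    shared = set(link_a) & set(link_b)
    if len(shared) != 1:
        return 0.0
    (i,) = set(link_a) - shared
    (j,) = set(link_b) - shared
    n_i = adj[i] | {i}   # inclusive neighborhood: the node plus its neighbors
    n_j = adj[j] | {j}
    return len(n_i & n_j) / len(n_i | n_j)

# Hypothetical toy graph: triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

inside_triangle = link_similarity(adj, ("a", "c"), ("b", "c"))
across_bridge = link_similarity(adj, ("a", "c"), ("c", "d"))
non_neighbors = link_similarity(adj, ("a", "b"), ("c", "d"))
```

The zero returned for non-neighbor links makes the transform matrix sparse, which is the limitation the abstract says ELS is meant to address.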

18.
MOTIVATION: A global view of the protein space is essential for functional and evolutionary analysis of proteins. In order to achieve this, a similarity network can be built using pairwise relationships among proteins. However, existing similarity networks employ a single similarity measure and therefore their utility depends highly on the quality of the selected measure. A more robust representation of the protein space can be realized if multiple sources of information are used. RESULTS: We propose a novel approach for analyzing multi-attribute similarity networks by combining random walks on graphs with Bayesian theory. A multi-attribute network is created by combining sequence and structure based similarity measures. For each attribute of the similarity network, one can compute a measure of affinity from a given protein to every other protein in the network using random walks. This process makes use of the implicit clustering information of the similarity network, and we show that it is superior to naive, local ranking methods. We then combine the computed affinities using a Bayesian framework. In particular, when we train a Bayesian model for automated classification of a novel protein, we achieve high classification accuracy and outperform single attribute networks. In addition, we demonstrate the effectiveness of our technique by comparison with a competing kernel-based information integration approach.

19.
Little is known about the network structure of competition in large populations of plants, despite the importance of such knowledge for understanding population dynamics. In this study, we used complex network analysis to examine temporal changes in the network structure of competition in an even-aged multi-individual stand of the Sakhalin fir Abies sachalinensis in Hokkaido, Japan. Using census data, which were measured over 30 years (1948–1978; seedlings were planted in 1929), on the sizes and locations of these plants, we regarded a plant as a node and competition between plants as a link. We then introduced two indices, the binary and weighted out-degrees (BO and WO, respectively), to interpret complicated plant interactions. The BO of a plant represents the number of links from the target plant to its neighbors, and the WO is the total strength of competition from the target plant to its neighbors. The analysis showed that the distributions of BO and WO were heavy-tailed in all years and that large plants had large BO and WO. These results suggest that only a few (i.e., large) plants have a very large impact on the growth and survival of a much larger number of neighboring plants and thus on population dynamics, whereas most of the others (i.e., small and medium-sized plants) have only a small impact on a few neighbors. By introducing binary and weighted connectivities (BC’ and WC’, respectively), we were able to identify the size classes of neighbors with which the target plant preferentially and strongly competed. The BC’ and WC’ results showed that large plants competed preferentially and more strongly with other large plants in 1948, but they competed more strongly with small plants after 1963. These results clarify targets of the very large impact of large plants, as shown by the results of BO and WO: the impact was exerted on the growth and survival of other large plants in 1948, whereas the impact was exerted on those of small plants after 1963. 
Our study demonstrates that the statistical properties of the competition network structure, which have been largely ignored in plant competition research, are important for understanding plant population dynamics.
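
The two indices defined in this abstract reduce to simple sums over outgoing links. The Python sketch below assumes competition is stored as directed, weighted links of the form (source, target, strength). How the study decides which plants are linked and how it measures strength is not stated in the abstract, so the link list and plant labels are hypothetical.

```python
from collections import defaultdict

def out_degrees(links):
    """links: iterable of (source, target, strength) competition links."""
    bo = defaultdict(int)     # binary out-degree: number of outgoing links
    wo = defaultdict(float)   # weighted out-degree: summed link strength
    for source, _target, strength in links:
        bo[source] += 1
        wo[source] += strength
    return dict(bo), dict(wo)

# Hypothetical stand: one large plant competing with three neighbors,
# one medium plant competing with a single small neighbor.
links = [
    ("big", "s1", 0.5),
    ("big", "s2", 0.25),
    ("big", "m1", 0.25),
    ("m1", "s1", 0.125),
]
bo, wo = out_degrees(links)
```

A heavy-tailed distribution, as reported in the abstract, would show up as a few plants like `big` carrying most of the BO and WO totals while most plants have values near zero.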

20.
Hao D, Li C. PLoS ONE 2011, 6(12): e28322
Most complex networks from different areas such as biology, sociology or technology, show a correlation on node degree where the possibility of a link between two nodes depends on their connectivity. It is widely believed that complex networks are either disassortative (links between hubs are systematically suppressed) or assortative (links between hubs are enhanced). In this paper, we analyze a variety of biological networks and find that they generally show a dichotomous degree correlation. We find that many properties of biological networks can be explained by this dichotomy in degree correlation, including the neighborhood connectivity, the sickle-shaped clustering coefficient distribution and the modularity structure. This dichotomy distinguishes biological networks from real disassortative networks or assortative networks such as the Internet and social networks. We suggest that the modular structure of networks accounts for the dichotomy in degree correlation and vice versa, shedding light on the source of modularity in biological networks. We further show that a robust and well connected network necessitates the dichotomy of degree correlation, suggestive of an evolutionary motivation for its existence. Finally, we suggest that a dichotomous degree correlation favors a centrally connected modular network, by which the integrity of network and specificity of modules might be reconciled.
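
The assortative and disassortative cases mentioned in this abstract are commonly summarized by a single number, the Pearson correlation between the degrees found at the two ends of each link. The Python sketch below computes that coefficient for an undirected edge list. It is a generic illustration rather than the analysis used in the paper, and a single coefficient cannot by itself reveal the dichotomous pattern the authors report. The function name and the star graph are hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count each undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var   # raises ZeroDivisionError if all nodes have equal degree

# Hypothetical star graph: one hub linked to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
r_star = degree_assortativity(star)
```

The star is the extreme disassortative case: every link joins the highest-degree node to a lowest-degree node, so the coefficient reaches its minimum of -1.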
