Similar Documents
20 similar documents found
1.
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n³) time, where n is the number of taxa, while current implementations of the second require O(pn³) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n²) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n² + pn), i.e., O(n²) in practice because p is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard "unweighted" OLS, where all taxa have the same weight so that the weight of a subtree is equal to the number of its taxa. The balanced minimum evolution scheme (BME) runs slower than the OLS version, requiring O(n² × diam(T)) operations to build the starting tree and O(pn × diam(T)) to perform the NNIs, where diam(T) is the topological diameter of the output tree. Under the usual Yule-Harding distribution on phylogenetic trees, the expected diameter is in O(log(n)), so our algorithms are in practice faster than NJ. Moreover, this BME scheme yields a very significant improvement over NJ and other distance-based algorithms, especially with large trees, in terms of topological accuracy.
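As a concrete illustration of the balanced weighting scheme (a sketch using Pauplin's standard balanced tree-length formula, not the authors' GME/BME implementation), the following snippet scores the three possible quartet topologies and picks the one of minimum balanced length; the distances are hypothetical.

```python
# Balanced minimum evolution on four taxa: cherry pairs get weight 1/2,
# cross pairs weight 1/4 (Pauplin's formula for a quartet).
d = {("a", "b"): 0.30, ("a", "c"): 0.50, ("a", "d"): 0.52,
     ("b", "c"): 0.48, ("b", "d"): 0.50, ("c", "d"): 0.26}

def dist(x, y):
    return d[(x, y)] if (x, y) in d else d[(y, x)]

def balanced_length(split):
    """Balanced length of the quartet topology (xy|zw)."""
    (x, y), (z, w) = split
    cherries = dist(x, y) + dist(z, w)
    cross = dist(x, z) + dist(x, w) + dist(y, z) + dist(y, w)
    return cherries / 2.0 + cross / 4.0

topologies = [(("a", "b"), ("c", "d")),
              (("a", "c"), ("b", "d")),
              (("a", "d"), ("b", "c"))]
for t in topologies:
    print(t, round(balanced_length(t), 4))
# BME selects the split with the smallest balanced length, here (ab|cd).
```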

2.
The topological importance of species within networks is an important way of bringing a species-level consideration to the study of whole ecological networks. There are many different indices of topological importance, including centrality indices, but it is likely that a small number are sufficient to explain variation in topological importance. We used 14 indices to describe topological importance of plants and pollinators in 12 quantitative mutualistic (plant–pollinator) networks. The 14 indices varied in their consideration of interaction strength (weighted versus unweighted indices) and indirect interactions (from the local measure of degree to meso-scale indices). We used principal components approximation to assess how well every combination of 1–14 indices approximated the results of principal components analysis (PCA). We found that one or two indices were sufficient to explain up to 90% of the variation in topological importance in both plants and pollinators. The choice of index was crucial because there was considerable variation between the best and the worst approximating subsets of indices. The best single indices were unweighted degree and unweighted topological importance (Jordán's TI index) with two steps (a measurement of apparent competition). The best pairs of indices consisted of a TI index and one of closeness centrality (weighted or unweighted) or d′ (a standardised species-level measure of partner diversity). Although we have found indices that efficiently explain variation in topological importance, we recommend further research to discover the real-world relevance of different aspects of topological importance to species in ecological networks.
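A minimal sketch of the index-subset idea on synthetic data rather than the 12 empirical networks: build a species × index matrix, run PCA, and check how strongly individual indices track the dominant axis. The matrix values are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical matrix: 40 species x 5 importance indices
# (e.g., degree, weighted degree, closeness, betweenness, TI).
X = rng.random((40, 5))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(40)   # two correlated indices

Z = StandardScaler().fit_transform(X)
pca = PCA().fit(Z)
print("variance explained per PC:", pca.explained_variance_ratio_.round(2))

# Crude check of how well each single index tracks the dominant axis:
pc1 = pca.transform(Z)[:, 0]
for j in range(Z.shape[1]):
    r = np.corrcoef(Z[:, j], pc1)[0, 1]
    print(f"index {j}: |r with PC1| = {abs(r):.2f}")
```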

3.
Electron cryo-microscopy is a fast-advancing biophysical technique used to derive three-dimensional structures of large protein complexes. Using this technique, many density maps have been generated at intermediate resolution, such as 6-10 Å. Although it is challenging to derive the backbone of the protein directly from such density maps, secondary structure elements such as helices and β-sheets can be detected computationally. Our work in this paper provides an approach to enumerate the top-ranked possible topologies instead of enumerating the entire population of topologies. This approach is particularly practical for large proteins. We developed a directed weighted graph, the topology graph, to represent the secondary structure assignment problem. We prove that the problem of finding the valid topology with the minimum cost is NP-hard. We developed an O(N²·2^N) dynamic programming algorithm to identify the topology with the minimum cost. The test of 15 proteins suggests that our dynamic programming approach is feasible for proteins of much larger size than was previously possible. The largest protein in the test contains 18 helical sticks detected from the density map, out of 33 helices in the protein.
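The following sketch shows the general shape of an O(N²·2^N) dynamic program for assigning sequence helices, in order, to detected helix sticks. The cost and transition matrices are hypothetical stand-ins, not the paper's density-map-derived scores.

```python
import math

# cost[i][j]: hypothetical cost of assigning detected stick j to the i-th
# sequence helix; trans[j][k]: cost of stick k following stick j (loop
# geometry). Real costs would come from the density map and the sequence.
N = 4
cost = [[1, 4, 3, 2], [2, 1, 4, 3], [3, 2, 1, 4], [4, 3, 2, 1]]
trans = [[0, 1, 2, 2], [1, 0, 1, 2], [2, 1, 0, 1], [2, 2, 1, 0]]

INF = math.inf
# dp[mask][last] = min cost of assigning the first popcount(mask) sequence
# helices to the stick set `mask`, with `last` the stick of the latest helix.
dp = [[INF] * N for _ in range(1 << N)]
for j in range(N):
    dp[1 << j][j] = cost[0][j]

for mask in range(1 << N):
    i = bin(mask).count("1")          # index of the next helix to place
    for last in range(N):
        if dp[mask][last] == INF:
            continue
        for j in range(N):
            if mask & (1 << j):
                continue
            nm = mask | (1 << j)
            c = dp[mask][last] + trans[last][j] + cost[i][j]
            if c < dp[nm][j]:
                dp[nm][j] = c

full = (1 << N) - 1
print("minimum-cost topology score:", min(dp[full]))
```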

4.
The link-prediction problem is an open issue in data mining and knowledge discovery that attracts researchers from disparate scientific communities. A wealth of methods have been proposed to deal with this problem. Among these approaches, most are applied in unweighted networks, with only a few taking the weights of links into consideration. In this paper, we present a weighted model for undirected and weighted networks based on the mutual information of local network structures, where link weights are applied to further enhance the distinguishability of candidate links. Empirical experiments are conducted on four weighted networks, and results show that the proposed method can provide more accurate predictions than not only traditional unweighted indices but also typical weighted indices. Furthermore, some in-depth discussions on the effects of weak ties in link prediction as well as the potential to predict link weights are also given. This work may shed light on the design of algorithms for link prediction in weighted networks.
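As a hedged illustration of weighted link prediction, candidate links can be ranked with a generic weighted common-neighbours index (not the paper's mutual-information formulation); the toy network and weights are invented.

```python
import networkx as nx

# Toy weighted, undirected network (hypothetical weights).
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 3.0), ("a", "c", 1.0), ("b", "c", 2.0),
    ("b", "d", 0.5), ("c", "d", 2.5), ("c", "e", 1.0),
])

def weighted_cn(G, x, y):
    """Weighted common-neighbours score: sum over shared neighbours z
    of (w(x,z) + w(z,y)) / 2. A generic weighted index, not the paper's
    mutual-information measure."""
    score = 0.0
    for z in set(G[x]) & set(G[y]):
        score += (G[x][z]["weight"] + G[z][y]["weight"]) / 2.0
    return score

# Rank all non-adjacent pairs as candidate links.
candidates = [(u, v) for u in G for v in G
              if u < v and not G.has_edge(u, v)]
for u, v in sorted(candidates, key=lambda p: -weighted_cn(G, *p)):
    print(u, v, weighted_cn(G, u, v))
```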

5.
Determining the relevance and importance of a technosphere process or a cluster of processes in relation to the rest of the industrial network can provide insights into the sustainability of supply chains: those that need to be optimized or controlled/safeguarded. Network analysis (NA) can offer a broad framework of indicators to tackle this problem. In this article, we present a detailed analysis of a life cycle inventory (LCI) model from an NA perspective. Specifically, the network is represented as a directed graph and the "emergy" numeraire is used as the weight associated with the arcs of the network. The case study of a technological system for drinking water production is presented. We investigate the topological and structural characteristics of the network representation of this system and compare the properties of its weighted and unweighted networks, as well as the importance of nodes (i.e., life cycle unit processes). By identifying a number of advantages and limitations linked to the modeling complexity of such emergy‐LCI networks, we classify the LCI technosphere network of our case study as a complex network belonging to the scale‐free network family. The salient feature of this network family is the presence of "hubs": nodes that connect with many other nodes. Hub failures may imply relevant changes, decreases, or even breaks in the connectedness with other smaller hubs and nodes of the network. Hence, by identifying node centralities, we can rank and interpret the relevance of each node for its special role in the life cycle network.
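A small sketch of hub identification on a directed, weighted network: node strength (weighted degree) versus plain degree. The process names and emergy values are invented for illustration.

```python
import networkx as nx

# Toy directed, weighted LCI network: arcs carry hypothetical emergy flows
# between unit processes (units omitted).
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("electricity", "treatment", 8.0), ("chemicals", "treatment", 3.0),
    ("treatment", "distribution", 9.5), ("pumping", "distribution", 2.0),
    ("electricity", "pumping", 1.5), ("infrastructure", "treatment", 0.5),
])

# Node strength (weighted degree) vs. plain degree: hubs stand out far
# more clearly once arc weights are taken into account.
strength = dict(G.degree(weight="weight"))
degree = dict(G.degree())
for n in sorted(G, key=lambda n: -strength[n]):
    print(f"{n:14s} degree={degree[n]} strength={strength[n]:.1f}")
```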

6.
A novel algorithm for unsupervised fuzzy clustering is introduced. The algorithm uses a so-called Weighted Fixed Neural Network (WFNN) to store important and useful information about the topological relations in a given data set. The algorithm produces a weighted connected net of weighted nodes connected by weighted edges, which reflects and preserves the topology of the input data set. The weights of the nodes and the edges in the resulting net are proportional to the local densities of data samples in input space. The connectedness of the net can be changed; the higher the chosen connectedness, the fuzzier the system becomes. The new algorithm is computationally efficient when compared to other existing methods for clustering multi-dimensional data, such as color images.

7.
A topological approach is presented for the analysis of control and regulation in metabolic pathways. In this approach, the control structure of a metabolic pathway is represented by a weighted directed graph. From an inspection of the topology of the graph, the control coefficients of the enzymes are evaluated in a heuristic manner in terms of the enzyme elasticities. The major advantage of the topological approach is that it provides a visual framework for (1) calculating the control coefficients of the enzymes, (2) analyzing the cause-effect relationships of the individual enzymes, (3) assessing the relative importance of the enzymes in metabolic regulation, and (4) simplifying the structure of a given pathway, from a regulatory viewpoint. Results are obtained for (a) an unbranched pathway in the absence of feedback and feedforward regulation and (b) an unbranched pathway with feedback inhibition. Our formulation is based on the metabolic control theory of Kacser and Burns (1973) and Heinrich and Rapoport (1974).
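For a two-enzyme unbranched pathway, the control coefficients follow directly from the summation and connectivity theorems of metabolic control theory; the sketch below solves the resulting linear system for hypothetical elasticities (standard metabolic control analysis, not the paper's graph-based evaluation).

```python
import numpy as np

# Two-step pathway X -> S -> P with intermediate S.
# eps1: elasticity of v1 w.r.t. S (typically negative, product inhibition);
# eps2: elasticity of v2 w.r.t. S (typically positive, substrate effect).
eps1, eps2 = -0.8, 0.5   # hypothetical values

# Summation theorem:    C1 + C2           = 1
# Connectivity theorem: C1*eps1 + C2*eps2 = 0
A = np.array([[1.0, 1.0],
              [eps1, eps2]])
b = np.array([1.0, 0.0])
C1, C2 = np.linalg.solve(A, b)
print(f"flux control coefficients: C1 = {C1:.3f}, C2 = {C2:.3f}")
# C1 = eps2/(eps2-eps1) ~ 0.385, C2 = -eps1/(eps2-eps1) ~ 0.615
```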

8.
A new method for phylogenetic inference, Strongest Evidence (SE), is described. In this method, a character's support for a phylogenetic hypothesis, its apparent phylogenetic signal, is greatest when the amount of implied homoplasy is most remarkably small given background knowledge alone. Because evolutionary rates are not assumed to be slow, background expectations for character length can be derived through modeling complete dissociation between branching pattern and character state assignments. As in unweighted parsimony, SE holds that fewer required evolutionary steps in a character indicate stronger support for a tree. However, in SE, the relationship between steps and support differs by unlabeled tree topology and character state distribution. Strongest evidence is contrasted in detail with both unweighted parsimony and Goloboff's method of implied weights. An iterative process is suggested for incrementally resolving a phylogenetic hypothesis while conducting cladistic analyses at increasingly local levels.

9.
Pollination and seed dispersal networks are by definition bimodal, linking two sets of species, plants and animals. Bimodal networks are often analysed after being transformed into unimodal ones, since most attributes in network theory are defined for the latter. Such unimodal projections (e.g. of plants sharing flower visitors or seed dispersers) map potential inter-specific competition or facilitation, and can thus be useful, for instance, when identifying native species potentially sensitive to aliens in the communities. In this work, we introduce procedures to project unweighted and weighted bimodal networks into unimodal ones, for animals or plants, and calculate two centrality measures that inform us about the species' role in the communities. By using 20 empirical weighted networks worldwide, we obtained 160 unimodal networks via four projection methods and evaluated correlations among centrality parameters across the different methodologies to assess how consistent the results are when including different link weights between species. Degree centralities obtained by projecting unweighted and weighted bimodal networks were not significantly correlated, suggesting that the role of the species differs when considering link weights in the original bimodal networks. By contrast, betweenness centralities were highly correlated, indicating the consistent importance of the species as connectors regardless of the projection method used. We conclude that preserving the weighted information when transforming bimodal into unimodal networks may allow us to make more realistic predictions on the potential competitive or facilitative interactions among species of one set (e.g. plants) that share species of the other (e.g. flower visitors or dispersers).
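A minimal sketch of one unimodal projection and the two centrality measures, using NetworkX's shared-neighbour projection (only one of several possible projection methods, and this particular one ignores the bimodal link weights); the toy network and weights are assumptions.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy plant-pollinator network: plants P1-P3, pollinators A1-A4.
B = nx.Graph()
plants = ["P1", "P2", "P3"]
B.add_weighted_edges_from([
    ("P1", "A1", 5), ("P1", "A2", 1), ("P2", "A1", 2),
    ("P2", "A3", 4), ("P3", "A2", 3), ("P3", "A4", 1),
])

# Project plants onto a unimodal network where link weight = number of
# shared pollinators; weighted variants would retain interaction strengths.
P = bipartite.weighted_projected_graph(B, plants)

deg = nx.degree_centrality(P)
btw = nx.betweenness_centrality(P)
for p in plants:
    print(p, f"degree={deg[p]:.2f}", f"betweenness={btw[p]:.2f}")
```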

10.
11.
Graph theory is a valuable framework to study the organization of functional and anatomical connections in the brain. Its use for comparing network topologies, however, is not without difficulties. Graph measures may be influenced by the number of nodes (N) and the average degree (k) of the network. The explicit form of that influence depends on the type of network topology, which is usually unknown for experimental data. Direct comparisons of graph measures between empirical networks with different N and/or k can therefore yield spurious results. We list benefits and pitfalls of various approaches that intend to overcome these difficulties. We discuss the initial graph definition of unweighted graphs via fixed thresholds, average degrees or edge densities, and the use of weighted graphs. For instance, choosing a threshold to fix N and k does eliminate size and density effects but may lead to modifications of the network by enforcing (ignoring) non-significant (significant) connections. As opposed to fixing N and k, graph measures are often normalized via random surrogates but, in fact, this may even increase the sensitivity to differences in N and k for the commonly used clustering coefficient and small-world index. To avoid such a bias we tried to estimate the N,k-dependence for empirical networks, which can serve to correct for size effects, if successful. We also consider a number of methods used in social sciences that build on statistics of local network structures, including exponential random graph models and motif counting. We show that none of the here-investigated methods allows for a reliable and fully unbiased comparison, but some perform better than others.
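A sketch of the surrogate-normalization approach the abstract critiques: normalize the clustering coefficient by degree-preserving random rewirings. The toy graph and parameters are illustrative.

```python
import networkx as nx

# Empirical-looking toy graph.
G = nx.watts_strogatz_graph(n=100, k=6, p=0.1, seed=1)
C = nx.average_clustering(G)

# Degree-preserving random surrogates via double-edge swaps.
C_rand = []
for seed in range(10):
    R = G.copy()
    nx.double_edge_swap(R, nswap=5 * R.number_of_edges(),
                        max_tries=10**5, seed=seed)
    C_rand.append(nx.average_clustering(R))

gamma = C / (sum(C_rand) / len(C_rand))
print(f"C = {C:.3f}, normalized C/C_rand = {gamma:.2f}")
# The abstract's caveat: this normalization can *increase* sensitivity to
# differences in N and k, so gamma values from networks of different
# size/density are still not directly comparable.
```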

12.
S. G. Self. Biometrics, 1991, 47(3): 975-986
A class of adaptive weighted log-rank statistics is described where the vector of weights is chosen in a data-dependent way from a family of "smooth" weight vectors. A parametric family of weight vectors is identified which includes most shapes of weighting vectors that will be near optimal in many cancer prevention and screening trials. This family of weight vectors is used in an application of the proposed method to data from a breast cancer screening trial. Results from a small simulation study comparing the power of the adaptive statistic to that of the unweighted log-rank statistic are presented.
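A hand-rolled sketch of a weighted log-rank Z statistic on synthetic two-group survival data; here the weight function is supplied by the caller, whereas the adaptive method of the abstract would select it from a parametric family in a data-dependent way.

```python
import numpy as np

def weighted_logrank(time, event, group, weights_fn):
    """Weighted log-rank Z statistic for two groups (coded 0/1).
    weights_fn maps an event time to a weight, e.g. constant weights
    recover the unweighted log-rank statistic."""
    times = np.unique(time[event == 1])
    num, var = 0.0, 0.0
    for t in times:
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        if n < 2:
            continue
        w = weights_fn(t)
        e1 = d * n1 / n
        v = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        num += w * (d1 - e1)
        var += w * w * v
    return num / np.sqrt(var)

rng = np.random.default_rng(2)
n = 200
group = rng.integers(0, 2, n)
time = rng.exponential(1.0 / (1 + 0.5 * group))    # group 1 fails earlier
event = (rng.random(n) < 0.8).astype(int)          # ~20% censored

print("unweighted Z:   ", weighted_logrank(time, event, group, lambda t: 1.0))
print("late-weighted Z:", weighted_logrank(time, event, group, lambda t: t))
```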

13.
In this paper, we propose a worst-case weighted approach to the multi-objective n-person non-zero sum game model where each player has more than one competing objective. Our "worst-case weighted multi-objective game" model supposes that each player has a set of weights to its objectives and wishes to minimize its maximum weighted sum objectives where the maximization is with respect to the set of weights. This new model gives rise to a new Pareto Nash equilibrium concept, which we call "robust-weighted Nash equilibrium". We prove that the robust-weighted Nash equilibria are guaranteed to exist even when the weight sets are unbounded. For the worst-case weighted multi-objective game with the weight sets of players all given as polytopes, we show that a robust-weighted Nash equilibrium can be obtained by solving a mathematical program with equilibrium constraints (MPEC). For an application, we illustrate the usefulness of the worst-case weighted multi-objective game on a supply chain risk management problem under demand uncertainty. By comparison with the existing weighted approach, we show that our method is more robust and can be used more efficiently for real-world applications.

14.
The relative efficiencies of the maximum-likelihood (ML), neighbor-joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct topology and in estimating the branch lengths for the case of four DNA sequences were studied by computer simulation, under the assumption either that there is variation in substitution rate among different nucleotide sites or that there is no variation. For the NJ method, several different distance measures (Jukes-Cantor, Kimura two-parameter, and gamma distances) were used, whereas for the ML method three different transition/transversion ratios (R) were used. For the MP method, both the standard unweighted parsimony and the dynamically weighted parsimony methods were used. The results obtained are as follows: (1) When the R value is high, dynamically weighted parsimony is more efficient than unweighted parsimony in obtaining the correct topology. (2) However, both weighted and unweighted parsimony methods are generally less efficient than the NJ and ML methods even in the case where the MP method gives a consistent tree. (3) When all the assumptions of the ML method are satisfied, this method is slightly more efficient than the NJ method. However, when the assumptions are not satisfied, the NJ method with gamma distances is slightly better in obtaining the correct topology than is the ML method. In general, the two methods show more or less the same performance. The NJ method may give a correct topology even when the distance measures used are not unbiased estimators of nucleotide substitutions. (4) Branch length estimates of a tree with the correct topology are affected more easily than topology by violation of the assumptions of the mathematical model used, for both the ML and the NJ methods. Under certain conditions, branch lengths are seriously overestimated or underestimated. The MP method often gives serious underestimates for certain branches. (5) Distance measures that generate the correct topology, with high probability, do not necessarily give good estimates of branch lengths. (6) The likelihood-ratio test and the confidence-limit test, in Felsenstein's DNAML, for examining the statistical significance of branch length estimates are quite sensitive to violation of the assumptions and are generally too liberal to be used for actual data. Rzhetsky and Nei's branch length test is less sensitive to violation of the assumptions than is Felsenstein's test. (7) When the extent of sequence divergence is < or = 5% and when > or = 1,000 nucleotides are used, all three methods show essentially the same efficiency in obtaining the correct topology and in estimating branch lengths.(ABSTRACT TRUNCATED AT 400 WORDS)
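For reference, the Jukes-Cantor and Kimura two-parameter distances used in such NJ comparisons can be computed as below; an illustrative sketch with made-up sequences and no gap or ambiguity handling.

```python
import math

def jc_k2p(seq1, seq2):
    """Jukes-Cantor and Kimura two-parameter distances for two aligned
    DNA sequences of equal length (illustrative sketch only)."""
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    n = len(seq1)
    ts = sum((a, b) in transitions for a, b in zip(seq1, seq2))
    tv = sum(a != b and (a, b) not in transitions
             for a, b in zip(seq1, seq2))
    p, q = ts / n, tv / n            # transition / transversion proportions
    d_jc = -0.75 * math.log(1 - 4.0 / 3.0 * (p + q))
    d_k2p = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return d_jc, d_k2p

s1 = "ATGCTGACCTGAATCGATCG"
s2 = "ATGCCGACTTGAATCGATTG"
d_jc, d_k2p = jc_k2p(s1, s2)
print(f"JC distance = {d_jc:.4f}, K2P distance = {d_k2p:.4f}")
```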

15.
Advances in large-scale technologies in proteomics, such as yeast two-hybrid screening and mass spectrometry, have made it possible to generate large Protein Interaction Networks (PINs). Recent methods for identifying dense sub-graphs in such networks have been based solely on graph theoretic properties. Therefore, there is a need for an approach that will allow us to combine domain-specific knowledge with topological properties to generate functionally relevant sub-graphs from large networks. This article describes two alternative network measures for analysis of PINs, which combine functional information with topological properties of the networks. These measures, called weighted clustering coefficient and weighted average nearest-neighbors degree, use weights representing the strengths of interactions between the proteins, calculated according to their semantic similarity, which is based on the Gene Ontology terms of the proteins. We perform a global analysis of the yeast PIN by systematically comparing the weighted measures with their topological counterparts. To show the usefulness of the weighted measures, we develop an algorithm for identification of functional modules, called SWEMODE (Semantic WEights for MODule Elucidation), that identifies dense sub-graphs containing functionally similar proteins. The proposed method is based on the ranking of nodes, i.e., proteins, according to their weighted neighborhood cohesiveness. The highest ranked nodes are considered as seeds for candidate modules. The algorithm then iterates through the neighborhood of each seed protein, to identify densely connected proteins with high functional similarity, according to the chosen parameters. Using a yeast two-hybrid data set of experimentally determined protein-protein interactions, we demonstrate that SWEMODE is able to identify dense clusters containing proteins that are functionally similar. Many of the identified modules correspond to known complexes or subunits of these complexes.
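A minimal sketch of a weighted clustering coefficient on a toy PIN: NetworkX's geometric-mean weighted variant stands in for the authors' measure, and the edge weights play the role of GO-based semantic similarities (values invented).

```python
import networkx as nx

# Toy PIN: edge weights stand in for semantic-similarity scores in [0, 1]
# computed from Gene Ontology terms (values here are hypothetical).
G = nx.Graph()
G.add_weighted_edges_from([
    ("p1", "p2", 0.9), ("p1", "p3", 0.8), ("p2", "p3", 0.85),
    ("p2", "p4", 0.2), ("p4", "p5", 0.3), ("p3", "p5", 0.1),
])

cw = nx.clustering(G, weight="weight")   # geometric-mean weighted variant
cu = nx.clustering(G)                    # unweighted counterpart
for node in sorted(G):
    print(node, f"weighted C = {cw[node]:.2f}",
          f"unweighted C = {cu[node]:.2f}")

# A seed-and-grow module search like SWEMODE would rank nodes by such
# weighted neighborhood cohesiveness and expand from the top-ranked seeds.
```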

16.
We introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes. The usual approach to genome similarity comparison is to estimate the evolutionary distance between different genomes; few studies focus on the overall statistical characteristics of each gene compared with the other genes in the same genome. In our model, each genome is represented by a weighted graph whose topology describes the similarity relationships among genes in the same genome. Based on weighted graph theory, we extract quantified statistical variables from the topology and give the distribution of some variables derived from the largest social structure in the topology. The 23 eubacteria recently studied by Sorimachi and Okayasu are markedly classified into two different groups by their double logarithmic point-plots describing the similarity relationship among genes of the largest social structure in the genome. The results show that the proposed model may provide some new insights into the structures and evolution patterns determined from complete genomes.

17.
How the complexity of food webs relates to stability has been a subject of many studies. Often, unweighted connectance is used to express complexity. Unweighted connectance is measured as the proportion of realized links in the network. Weighted connectance, on the other hand, takes link weights (fluxes or feeding rates) into account and captures the shape of the flux distribution. Here, we used weighted connectance to revisit the relation between complexity and stability. We used 15 real soil food webs and determined the feeding rates and the interaction strength matrices. We calculated both versions of connectance, and related these structural properties to food web stability. We also determined the skewness of both flux and interaction strength distributions with the Gini coefficient. We found no relation between unweighted connectance and food web stability, but weighted connectance was positively correlated with stability. This finding challenges the notion that complexity may constrain stability, and supports the 'complexity begets stability' notion. The positive correlation between weighted connectance and stability implies that the more evenly flux rates were distributed over links, the more stable the webs were. This was confirmed by the Gini coefficients of both fluxes and interaction strengths. However, the most even distributions in this dataset were still strongly skewed towards small fluxes or weak interaction strengths. Thus, incorporating these distributions with many weak links via weighted instead of unweighted food web measures can shed new light on classical theories.
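The two key quantities are straightforward to compute; the sketch below derives unweighted connectance and the Gini coefficient of the flux distribution for a hypothetical six-species flux matrix (not one of the 15 empirical webs).

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative vector (0 = even, near 1 = skewed)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    return (2 * np.arange(1, n + 1) - n - 1).dot(x) / (n * x.sum())

# Hypothetical food web: 6 species, flux matrix F[i, j] = flow from i to j.
F = np.zeros((6, 6))
F[0, 2], F[1, 2], F[2, 3], F[2, 4], F[3, 5], F[1, 4] = 10, 8, 5, 0.2, 0.1, 0.05

S = F.shape[0]
L = np.count_nonzero(F)
print("unweighted connectance L/S^2 =", L / S**2)
print("Gini of flux distribution    =", round(gini(F[F > 0]), 3))
# A high Gini despite moderate connectance is the abstract's point:
# most links are weak, which weighted measures capture and binary ones miss.
```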

18.
Protein sequence design based on the topology of the native state structure
Computational design of sequences for a given structure is generally studied by exhaustively enumerating the sequence space or by searching in such a large space, which is prohibitively expensive. However, we point out that the protein topology has a wealth of information, which can be exploited to design sequences for a chosen structure. In this paper, we present a computationally efficient method for ranking the residue sites in a given native-state structure, which enables us to design sequences for a chosen structure. The premise for the method is that the topology of the graph representing the energetically interacting neighbours in a protein plays an important role in the inverse-folding problem. While our previous work (which was also based on topology) used eigenspectral analysis of the adjacency matrix of interactions for ranking the residue sites in a given chain, here we use a simple but effective way of assigning weights to the nodes on the basis of secondary connections, along with primary connections. This indirectly accounts for the edge weight in the graph and removes degeneracy in the degree. The new scheme needs only a few multiplications and additions to compute the preferred ranking of the residue sites even for structures of real proteins of sizes of a few hundred amino acid residues. We use HP lattice model examples (for which exhaustive enumeration of sequences is practical) to validate our ranking approach in obtaining sequences of lowest energy for any H-P residue composition for a given native-state structure. Some examples of native structures of real proteins are also included. Quantitative comparison of the efficacy of the new scheme with the earlier schemes is made. The new scheme consistently performs better and with much lower computational cost. In a few rare cases wherein the new scheme fails to provide the best sequence, an optimization procedure is added to work with the new scheme.
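A sketch of the primary-plus-secondary-connection idea on a hypothetical contact matrix: each site's score combines its own degree with its neighbours' degrees, which breaks ties among equal-degree sites. The exact weighting used by the authors may differ.

```python
import numpy as np

# Hypothetical contact graph of a tiny protein: A[i, j] = 1 if residues
# i and j are energetically interacting neighbours.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)

deg = A.sum(axis=1)            # primary connections (degree)
secondary = A @ deg            # each neighbour contributes its own degree

# Node weight from primary plus secondary connections: a cheap surrogate
# for eigenspectral ranking, in the spirit of the scheme described above.
score = deg + secondary
ranking = np.argsort(-score)
print("site ranking (most connected first):", ranking)
print("scores:", score)
```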

19.
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches suffer from at least some of the following limitations: (1) networks comprise only curated physical interactions, leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering the full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive. Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson's disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
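A minimal sketch of heat diffusion ranking on a weighted association network: diffuse unit heat from a known disease gene through a graph Laplacian kernel and rank candidates by the heat they receive. The network, the diffusion time β, and the gene names are assumptions; the paper's exact formulation may differ.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical weighted gene-association network (symmetric weights).
W = np.array([[0.0, 0.9, 0.0, 0.2, 0.0],
              [0.9, 0.0, 0.7, 0.0, 0.0],
              [0.0, 0.7, 0.0, 0.5, 0.1],
              [0.2, 0.0, 0.5, 0.0, 0.6],
              [0.0, 0.0, 0.1, 0.6, 0.0]])
genes = ["g0", "g1", "g2", "g3", "g4"]

L = np.diag(W.sum(axis=1)) - W           # graph Laplacian
beta = 0.5                               # diffusion time (assumed parameter)
kernel = expm(-beta * L)                 # heat diffusion kernel

seed = np.array([1.0, 0, 0, 0, 0])       # g0 is a known disease gene
scores = kernel @ seed                   # heat reaching each candidate
for g, s in sorted(zip(genes, scores), key=lambda t: -t[1]):
    print(g, round(s, 3))
```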

20.
In this paper, we study the SIS (susceptible–infected–susceptible) and SIR (susceptible–infected–removed) epidemic models on undirected, weighted networks by deriving pairwise-type approximate models coupled with individual-based network simulation. Two different types of theoretical/synthetic weighted network models are considered. Both start from non-weighted networks with fixed topology followed by the allocation of link weights in either (i) random or (ii) fixed/deterministic way. The pairwise models are formulated for a general discrete distribution of weights, and these models are then used in conjunction with stochastic network simulations to evaluate the impact of different weight distributions on epidemic thresholds and dynamics in general. For the SIR model, the basic reproductive ratio R₀ is computed, and we show that (i) for both network models R₀ is maximised if all weights are equal, and (ii) when the two models are 'equally-matched', the networks with a random weight distribution give rise to a higher R₀ value. The models with different weight distributions are also used to explore the agreement between the pairwise and simulation models for different parameter combinations.
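A simulation sketch (individual-based, not the pairwise approximation model of the paper) comparing final SIR epidemic sizes on the same topology under equal versus random link weights with the same mean; all parameters are illustrative.

```python
import math
import random
import networkx as nx

def sir_on_weighted(G, tau=0.6, gamma=0.3, seed_node=0, tmax=60, rng=None):
    """Discrete-time SIR on a weighted graph: per step, an infected node u
    transmits to a susceptible neighbour v with prob 1 - exp(-tau * w(u,v))
    and recovers with prob gamma. Returns the final outbreak size."""
    rng = rng or random.Random(1)
    status = {v: "S" for v in G}
    status[seed_node] = "I"
    for _ in range(tmax):
        new_inf, new_rec = [], []
        for u in G:
            if status[u] != "I":
                continue
            for v in G[u]:
                if status[v] == "S" and rng.random() < 1 - math.exp(
                        -tau * G[u][v]["weight"]):
                    new_inf.append(v)
            if rng.random() < gamma:
                new_rec.append(u)
        for v in new_inf:
            status[v] = "I"
        for u in new_rec:
            status[u] = "R"
    return sum(s != "S" for s in status.values())

G = nx.erdos_renyi_graph(200, 0.03, seed=4)
wrng = random.Random(7)
for u, v in G.edges:
    G[u][v]["weight"] = 1.0
print("equal weights, final size:", sir_on_weighted(G, rng=random.Random(3)))
for u, v in G.edges:
    G[u][v]["weight"] = wrng.choice([0.5, 1.5])
print("random weights, final size:", sir_on_weighted(G, rng=random.Random(3)))
# Single runs vary; averaging over many runs would be needed to see the
# systematic effect of the weight distribution the abstract describes.
```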
