Similar articles
20 similar articles found (search time: 31 ms)
1.

Background

Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis.
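For orientation, the pairwise (two-node) topological overlap of an unweighted network is commonly written as TOM(i, j) = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is the adjacency, l_ij the number of neighbors shared by i and j, and k the node connectivity. The following is a minimal sketch of that pairwise form only. It is not the multi-node measure described above and not the MTOM software; the function name and the toy network are illustrative assumptions.

```python
def topological_overlap(adj):
    """Pairwise topological overlap for a symmetric 0/1 adjacency matrix.

    adj is a list of lists with zeros on the diagonal (illustrative input).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0  # convention: a node fully overlaps with itself
                continue
            # Number of neighbors shared by i and j.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(degree[i], degree[j]) + 1 - adj[i][j])
    return tom


# Toy network (assumed for illustration): nodes 0, 1, 2 form a triangle,
# and node 3 is attached to node 2 only.
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_tom = topological_overlap(toy_adj)
```

Nodes 0 and 3 are not directly linked but share neighbor 2, so their overlap is nonzero; a dissimilarity for clustering can then be taken as 1 minus the overlap.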

Findings

We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering.
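The idea of growing a cluster around a seed until a local stopping rule fires can be sketched as follows. This is a simplified illustration in the spirit of CAST-style affinity search, assuming a pairwise similarity matrix and a mean-similarity affinity; it does not reproduce MAST itself, which accommodates a multi-node dissimilarity. The function name, the threshold parameter, and the toy matrix are hypothetical.

```python
def grow_cluster(similarity, seed, threshold):
    """Grow one cluster around a seed node.

    similarity: symmetric matrix (list of lists) of pairwise similarities.
    threshold:  local stopping rule; growth stops when the best candidate's
                mean similarity to the current cluster falls below it.
    """
    cluster = [seed]
    candidates = set(range(len(similarity))) - {seed}
    while candidates:
        # Affinity of a candidate = mean similarity to the current members.
        def affinity(node):
            return sum(similarity[node][m] for m in cluster) / len(cluster)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=affinity)
        if affinity(best) < threshold:
            break
        cluster.append(best)
        candidates.remove(best)
    return cluster


# Toy similarity matrix (assumed): nodes 0-2 are tightly connected, node 3 is not.
toy_similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
module = grow_cluster(toy_similarity, seed=0, threshold=0.5)
```

A global stopping rule could be layered on top, for example by capping the cluster size, but that choice is left open here.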

Conclusion

Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/

2.
Chang Luyao, Li Fan, Niu Xinzheng, Zhu Jiahui. Cluster Computing, 2022, 25(4): 3005-3017

To collect data more effectively and balance energy consumption, wireless sensor networks (WSNs) need to be divided into clusters. Clustering gives the network a hierarchical organizational structure, which balances the network load and prolongs the life cycle of the system. In a clustering routing algorithm, the quality of the clustering algorithm directly affects the result of cluster division. This paper proposes an algorithm that selects cluster heads based on node distribution density and then allocates the remaining nodes, addressing the random cluster head election and uneven clustering of the traditional LEACH protocol clustering algorithm in WSNs. Experiments show that the algorithm selects cluster heads and divides clusters rapidly, which is effective for node clustering and helps equalize energy consumption.


3.

Background

Recent computational techniques have facilitated analyzing genome-wide protein-protein interaction data for several model organisms. Various graph-clustering algorithms have been applied to protein interaction networks on the genomic scale for predicting the entire set of potential protein complexes. In particular, the density-based clustering algorithms which are able to generate overlapping clusters, i.e. the clusters sharing a set of nodes, are well-suited to protein complex detection because each protein could be a member of multiple complexes. However, their accuracy is still limited because of complex overlap patterns of their output clusters.

Results

We present a systematic approach of refining the overlapping clusters identified from protein interaction networks. We have designed novel metrics to assess cluster overlaps: overlap coverage and overlapping consistency. We then propose an overlap refinement algorithm. It takes as input the clusters produced by existing density-based graph-clustering methods and generates a set of refined clusters by parameterizing the metrics. To evaluate protein complex prediction accuracy, we used the f-measure by comparing each refined cluster to known protein complexes. The experimental results with the yeast protein-protein interaction data sets from BioGRID and DIP demonstrate that accuracy on protein complex prediction has increased significantly after refining cluster overlaps.
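As a rough illustration of scoring a predicted cluster against a known complex, the sketch below computes an f-measure from set overlap. The precision and recall definitions used here (overlap divided by cluster size and by complex size, respectively) are a common convention and an assumption; the abstract does not spell out the exact definitions. The protein identifiers are hypothetical.

```python
def f_measure(cluster, complex_members):
    """Harmonic mean of precision and recall for one cluster vs. one complex."""
    cluster, complex_members = set(cluster), set(complex_members)
    overlap = len(cluster & complex_members)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)        # fraction of the cluster in the complex
    recall = overlap / len(complex_members)   # fraction of the complex recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 3 of 4 predicted proteins belong to a 5-protein complex.
predicted = {"P1", "P2", "P3", "P4"}
known = {"P2", "P3", "P4", "P5", "P6"}
score = f_measure(predicted, known)
```

Here precision is 3/4 and recall is 3/5, so the score is 2/3. One plausible way to score a refined cluster is to take its best match over all known complexes.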

Conclusions

The effectiveness of the proposed cluster overlap refinement approach for protein complex detection has been validated in this study. Analyzing overlaps of the clusters from protein interaction networks is a crucial task for understanding of functional roles of proteins and topological characteristics of the functional systems.

4.

Background  

Grouping proteins into sequence-based clusters is a fundamental step in many bioinformatic analyses (e.g., homology-based prediction of structure or function). Standard clustering methods such as single-linkage clustering capture a history of cluster topologies as a function of threshold, but in practice their usefulness is limited because unrelated sequences join clusters before biologically meaningful families are fully constituted, e.g. as the result of matches to so-called promiscuous domains. Use of the Markov Cluster algorithm avoids this non-specificity, but does not preserve topological or threshold information about protein families.

5.
Recent reports suggest that evolving large-scale networks exhibit “explosive percolation”: a large fraction of nodes suddenly becomes connected when sufficiently many links have formed in a network. This phase transition has been shown to be continuous (second-order) for most random network formation processes, including classical mean-field random networks and their modifications. We study a related yet different phenomenon referred to as dense percolation, which occurs when a network is already connected, but a large group of nodes must be dense enough, i.e., have at least a certain minimum required percentage of possible links, to form a “highly connected” cluster. Such clusters have been considered in various contexts, including the recently introduced network modularity principle in biological networks. We prove that, contrary to the traditionally defined percolation transition, dense percolation transition is discontinuous (first-order) under the classical mean-field network formation process (with no modifications); therefore, there is not only quantitative, but also qualitative difference between regular and dense percolation transitions. Moreover, the size of the largest dense (highly connected) cluster in a mean-field random network is explicitly characterized by rigorously proven tight asymptotic bounds, which turn out to naturally extend the previously derived formula for the size of the largest clique (a cluster with all possible links) in such a network. We also briefly discuss possible implications of the obtained mathematical results on studying first-order phase transitions in real-world linked systems.
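The notion of a node group being "dense enough" can be made concrete with a small check: count the links present among the group and compare that with the number of possible links. The sketch below is illustrative only; the function names and the threshold parameter gamma are assumptions, and a clique corresponds to a density of exactly 1.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """Fraction of possible links that are present among the given nodes."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0  # density is undefined here; 0.0 is a convention of this sketch
    edge_set = {frozenset(e) for e in edges}
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(1 for u, v in combinations(nodes, 2) if frozenset((u, v)) in edge_set)
    return present / possible


def is_dense_cluster(nodes, edges, gamma):
    """True if the group has at least the required fraction gamma of possible links."""
    return edge_density(nodes, edges) >= gamma


# Toy graph (assumed): a triangle 1-2-3 with a pendant node 4 attached to node 3.
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
triangle_density = edge_density([1, 2, 3], toy_edges)
whole_density = edge_density([1, 2, 3, 4], toy_edges)
```

The triangle has density 1 (a clique), while the four-node group has 4 of 6 possible links.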

6.
7.

Background

Recently, large data sets of protein-protein interactions (PPI), which can be modeled as PPI networks, have been generated through high-throughput methods. Locally dense regions in PPI networks are very likely to be protein complexes. Since protein complexes play a key role in many biological processes, detecting protein complexes in PPI networks is one of the important tasks in the post-genomic era. However, PPI networks are often incomplete and noisy, which hinders the mining of protein complexes.

Results

We propose a new and effective robustness-based algorithm to detect overlapping clusters as protein complexes in PPI networks. To improve the accuracy of the resulting clusters, the algorithm tries to reduce the adverse effects of noise in PPI networks. Each new cluster begins from a seed and is expanded by adding qualified nodes from the cluster's neighbourhood. We also propose a new distance measure between a cluster K and a node among the neighbours of K. The performance of the algorithm is evaluated by applying it to two PPI networks: the Gavin network and the Database of Interacting Proteins (DIP). The results show that our algorithm outperforms the Markov clustering algorithm (MCL), the Clique Percolation method (CPM) and the core-attachment based method (CoAch) in terms of F-measure, co-localization and Gene Ontology (GO) semantic similarity.

Conclusions

Our algorithm detects locally dense regions or clusters as protein complexes. The results show that protein complexes generated by our algorithm have better quality than those generated by some previous classic methods. Therefore, our algorithm is effective and useful.

8.

Background

The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. Therefore, the TNM should be expanded to accommodate new prognostic factors in order to increase the accuracy of estimating cancer patient outcome. The ensemble algorithm for clustering cancer data (EACCD) by Chen et al. reflects an effort to expand the TNM without changing its basic definitions. Though results on using EACCD have been reported, there has been no study on the analysis of the algorithm. In this report, we examine various aspects of EACCD using a large breast cancer patient dataset. We compared the output of EACCD with the corresponding survival curves, investigated the effect of different settings in EACCD, and compared EACCD with alternative clustering approaches.

Results

Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). When m is large, the dendrograms depend on the linkage functions. The statistical tests employed in the learning step, however, have minimal effect on the dendrogram for large m. In addition, if the step for learning dissimilarity in EACCD is omitted, the resulting approaches can have degraded performance. Furthermore, clustering based only on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading assignments to clusters.

Conclusions

When only the Partitioning Around Medoids (PAM) algorithm is involved in the step of learning dissimilarity, large values of m are required to obtain robust dendrograms, and for a large m EACCD can effectively cluster cancer patient data.

9.

The most basic and significant issue in complex network analysis is community detection, which is a branch of machine learning. Most current community detection approaches consider only a network's topological structure and thus forgo the potential of node attribute information. In attributed networks, both topological structure and node attributes are important features for community detection. In recent years, the spectral clustering algorithm has received much interest as one of the best performing algorithms in the subcategory of dimensionality reduction. This algorithm applies the eigenvalues of the affinity matrix to map data to low-dimensional space. In the present paper, a new version of spectral clustering, named Attributed Spectral Clustering (ASC), is applied to attributed graphs so that the identified communities have structural cohesiveness and attribute homogeneity. Since the performance of spectral clustering heavily depends on the goodness of the affinity matrix, the ASC algorithm uses the Topological and Attribute Random Walk Affinity Matrix (TARWAM) as a new affinity matrix to calculate the similarity between nodes. TARWAM utilizes a biased random walk to integrate network topology and attribute information. It can improve the degree of similarity among pairs of nodes in the same density region of the attributed network, without the need for parameter tuning. The proposed approach has been compared to other primary and new attributed graph clustering algorithms on synthetic and real datasets. The experimental results show that the proposed approach is more effective and accurate than other state-of-the-art attributed graph clustering techniques.


10.
This paper presents an evaluation of ACPI energy saving modes, and deduces the design and implementation of an energy saving daemon for clusters called cherub. The design of the cherub daemon is modular and extensible. Since the only requirement is a central approach for resource management, cherub is suited for Server Load Balancing (SLB) clusters managed by dispatchers like Linux Virtual Server (LVS), as well as for High Performance Computing (HPC) clusters. Our experimental results show that cherub’s scheduling algorithm works well, i.e. it will save energy, if possible, and avoids state-flapping.

11.

Background

The goal of the study was to demonstrate a hierarchical structure of resting state activity in the healthy brain using a data-driven clustering algorithm.

Methodology/Principal Findings

The fuzzy-c-means clustering algorithm was applied to resting state fMRI data in cortical and subcortical gray matter from two groups acquired separately, one of 17 healthy individuals and the second of 21 healthy individuals. Different numbers of clusters and different starting conditions were used. A cluster dispersion measure determined the optimal numbers of clusters. An inner product metric provided a measure of similarity between different clusters. The two cluster result found the task-negative and task-positive systems. The cluster dispersion measure was minimized with seven and eleven clusters. Each of the clusters in the seven and eleven cluster result was associated with either the task-negative or task-positive system. Applying the algorithm to find seven clusters recovered previously described resting state networks, including the default mode network, frontoparietal control network, ventral and dorsal attention networks, somatomotor, visual, and language networks. The language and ventral attention networks had significant subcortical involvement. This parcellation was consistently found in a large majority of algorithm runs under different conditions and was robust to different methods of initialization.
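For readers unfamiliar with fuzzy c-means, the sketch below shows the standard alternating update of memberships and cluster centers on a toy one-dimensional data set. It is a generic textbook-style illustration in plain Python, not the study's fMRI pipeline; the function name, the fuzziness value of 2, the fixed iteration count, and the toy data are assumptions. As the abstract notes for starting conditions, the result depends on the initial centers passed in.

```python
import math


def fuzzy_c_means(points, centers, fuzziness=2.0, iterations=50):
    """Minimal fuzzy c-means on a list of equal-length coordinate tuples.

    centers: initial cluster centers (the result depends on this choice).
    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, and each row sums to 1.
    """
    exponent = 2.0 / (fuzziness - 1.0)
    centers = [tuple(c) for c in centers]
    memberships = []
    for _ in range(iterations):
        # Step 1: update memberships from distances to the current centers.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # The point coincides with a center: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
            memberships.append(row)
        # Step 2: move each center to the membership-weighted mean of the points.
        new_centers = []
        for i, old in enumerate(centers):
            weights = [row[i] ** fuzziness for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # no point supports this cluster; keep it
                continue
            new_centers.append(tuple(
                sum(w * p[dim] for w, p in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ))
        centers = new_centers
    return centers, memberships


# Toy data (assumed): two well-separated groups on a line.
toy_points = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (5.1,)]
final_centers, final_memberships = fuzzy_c_means(toy_points, centers=[(0.0,), (5.0,)])
```

Running the function with different numbers of initial centers, and comparing a dispersion measure across runs, is one way to explore how many clusters the data support.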

Conclusions/Significance

The clustering of resting state activity using different optimal numbers of clusters identified resting state networks comparable to previously obtained results. This work reinforces the observation that resting state networks are hierarchically organized.

12.
Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy to deal with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a method able to detect when partitioning of a specific data set is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well established hierarchical clustering algorithms: Ward’s minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward’s and Complete-linkage. We show on both simulated and real datasets, that CC performs better than the other methods in terms of: the identification of the correct number of clusters, the identification of outliers, and the determination of real cluster memberships. We used CC to cluster samples in order to identify disease subtypes, and on gene profiles, in order to determine groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be successfully used in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository.

13.
Identifying parental combinations that exhibit high heterosis is a constant target for commercial Brassica napus L. hybrid development programs. Finding high heterotic parental combinations can require hundreds of test crosses and years of yield evaluation. Heterotic pool development could be used to divide breeding material into specific breeding pools and focus the number of parental combinations created. Here, we report the genotypic characterization of 79 B. napus genotypes by calculating genetic distance based on sequence-related amplified polymorphism (SRAP) and genotyping by sequencing (GBS) in association with a neighbour-joining clustering algorithm. Despite the different genotypic analyses, neighbour-joining cluster analysis based on genetic distance of SRAP and GBS produced similar clusters. Homology between SRAP and GBS clusters was approximately 77 % when manually comparing clusters and 68 % when comparing clusters using Compare2Trees. This research demonstrates that SRAP can have similar efficacy when compared to next-generation sequencing technology for heterotic pool classification. This information may provide an important breeding scaffold for the development of hybrid cultivars based upon genetic distance and cluster analysis.

14.

Background

While there are a large number of bioinformatics datasets for clustering, many of them are incomplete, i.e., some data samples are missing attribute values needed by clustering algorithms. A variety of clustering algorithms have been proposed in recent years, but they are usually limited to clustering complete datasets. Besides, conventional clustering algorithms cannot obtain a trade-off between accuracy and efficiency of the clustering process since many essential parameters are determined by the human user’s experience.

Results

The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets called MKDCI. The MKDCI algorithm consists of recovering missing attribute values of input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel based on multiple basis kernels, detecting cluster centroids with the Isolation Forests method, assigning clusters with arbitrary shape and visualizing the results.

Conclusions

Extensive experiments on several well-known clustering datasets in the bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, the proposed MKDCI algorithm tends to automatically produce clusters of better quality on incomplete bioinformatics datasets.

15.
The effect of laminin on the distribution of dystroglycan (DG) and other surface proteins was examined by fluorescent staining in cultures of muscle cells derived from Xenopus embryos. Western blotting confirmed that previously characterized antibodies are reactive in Xenopus. In control cultures, αDG, βDG, and laminin binding sites were distributed as microclusters (<1 μm2 in area) over the entire dorsal surface of the muscle cells. Treatment with laminin induced the formation of macroclusters (1–20 μm2), accompanied by a corresponding decline in the density of the microclusters. With 6 nM laminin, clustering was apparent within 150 min and near maximal within 1 d. Laminin was effective at 30 pM, the lowest concentration tested. The laminin fragment E3, which competes with laminin for binding to αDG, inhibited laminin-induced clustering but did not itself cluster DG, thereby indicating that other portions of the laminin molecule in addition to its αDG binding domain are required for its clustering activity. Laminin-induced clusters also contained dystrophin, but unlike agrin-induced clusters, they did not contain acetylcholine receptors, utrophin, or phosphotyrosine, and their formation was not inhibited by a tyrosine kinase inhibitor. The results reinforce the notion that unclustered DG is mobile on the surface of embryonic muscle cells and suggest that this mobile DG can be trapped by at least two different sets of molecular interactions. Laminin self binding may be the basis for the laminin-induced clustering.

16.
OPTICS is a density-based clustering algorithm that performs well in a wide variety of applications. For a set of input objects, the algorithm creates a reachability plot that can either be used to produce cluster membership assignments, or interpreted itself as an expressive two-dimensional representation of the clustering structure of the input set, even if the input set is embedded in higher dimensions. The focus of this work is a visualization method that can be applied for comparing two independent hierarchical clusterings by assigning colors to all entries of the input database. We give two applications related to macromolecular structural properties: the first is a sequence-based clustering of the SwissProt database that is evaluated using NCBI taxonomy identifiers, and the second involves clustering locations of specific atoms in the serine protease enzyme family, with the clusters evaluated using SCOP structural classifications.

17.
18.

Purpose

To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment.

Material and Methods

The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters.

Results

The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters.

Conclusion

The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

19.
The complex phenomenon of rearrangement of band 3 molecules after erythrocyte swelling under hypotonic condition is considered. The rearrangement includes the increase of the mobile fraction and clustering of band 3. The self-associative tendency and the action of the elastic field generated within the lipid membrane after erythrocyte swelling result in equilibration of the number of molecules per cluster and the number of clusters. The local perturbation of the elastic field induces excitation of the cluster in the nearest neighbor and changes its packing state generating changes in the free volume within the cluster. The local perturbation could result in the reversible formation of osmotic hole. We formulated a model to predict changes of the cluster packing states generated by rearrangement of band 3 molecules on two time-scales. The phenomenon is examined on the basis of two experimental sets, i.e. low (5.2 mM Na3PO4 solution) and high (46.0 mM Na3PO4 solution) hypotonicities at 21°C, from Golan and Veatch (Proc Natl Acad Sci 77(5):2537–2541, 1980). Modeling considerations suggested that lower hypotonic conditions resulted in higher values of: the driving force of agglomeration of band 3 as a measure of self-associative tendency, the specific rate of cluster breaking, the specific rate of increase of the mobile fraction of band 3, and the dispersion of cluster sizes. Lower hypotonic conditions ensure the generation of a higher average value of the free energy within the membrane after erythrocyte swelling, which enables more intensive rearrangement of band 3 molecules.

20.
WCA: A Weighted Clustering Algorithm for Mobile Ad Hoc Networks   Cited by: 42 (self-citations: 0; citations by others: 42)
In this paper, we propose an on-demand distributed clustering algorithm for multi-hop packet radio networks. These types of networks, also known as ad hoc networks, are dynamic in nature due to the mobility of nodes. The association and dissociation of nodes to and from clusters perturb the stability of the network topology, and hence a reconfiguration of the system is often unavoidable. However, it is vital to keep the topology stable as long as possible. The clusterheads, which form a dominant set in the network, determine the topology and its stability. The proposed weight-based distributed clustering algorithm takes into consideration the ideal degree, transmission power, mobility, and battery power of mobile nodes. The time required to identify the clusterheads depends on the diameter of the underlying graph. We try to keep the number of nodes in a cluster around a pre-defined threshold to facilitate the optimal operation of the medium access control (MAC) protocol. The non-periodic procedure for clusterhead election is invoked on demand and is aimed at reducing the computation and communication costs. The clusterheads, operating in dual power mode, connect the clusters, which helps in routing messages from a node to any other node. We observe a trade-off between the uniformity of the load handled by the clusterheads and the connectivity of the network. Simulation experiments are conducted to evaluate the performance of our algorithm in terms of the number of clusterheads, reaffiliation frequency, and dominant set updates. Results show that our algorithm performs better than existing ones and is also tunable to different kinds of network conditions.
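A weight-based election of this kind can be sketched as a weighted sum of per-node costs followed by a greedy selection of the lowest-weight nodes. The sketch below is a centralized simplification under stated assumptions, not the distributed algorithm of the paper: the proxies for each factor (deviation from the ideal degree, summed distance to neighbors for transmission power, average speed for mobility, cumulative time as clusterhead for battery drain), the weighting factors, and all names and values are illustrative.

```python
def combined_weight(degree, ideal_degree, distance_sum, mobility, time_as_head,
                    factors=(0.7, 0.2, 0.05, 0.05)):
    """Weighted sum of four per-node costs; a lower value means a better candidate.

    factors are illustrative weighting factors that sum to 1 (an assumption).
    """
    w1, w2, w3, w4 = factors
    degree_difference = abs(degree - ideal_degree)
    return w1 * degree_difference + w2 * distance_sum + w3 * mobility + w4 * time_as_head


def elect_clusterheads(nodes, neighbors, weight):
    """Greedy election: the lowest-weight uncovered node becomes a clusterhead
    and its still-uncovered neighbors join its cluster."""
    uncovered = set(nodes)
    heads = {}
    for node in sorted(nodes, key=lambda n: (weight[n], n)):
        if node not in uncovered:
            continue
        members = {node} | (set(neighbors[node]) & uncovered)
        heads[node] = sorted(members)
        uncovered -= members
    return heads


# Hypothetical five-node topology and precomputed weights.
toy_neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
                 "d": ["c", "e"], "e": ["d"]}
toy_weight = {"a": 3.0, "b": 2.5, "c": 1.0, "d": 2.0, "e": 4.0}
toy_heads = elect_clusterheads(list(toy_neighbors), toy_neighbors, toy_weight)
example_weight = combined_weight(degree=5, ideal_degree=3, distance_sum=10.0,
                                 mobility=2.0, time_as_head=4.0)
```

In this toy topology, node c has the lowest weight and absorbs a, b, and d, leaving e to head its own cluster. Changing the weighting factors shifts the balance between the four criteria.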
