Similar literature
1.
The application of ACO-based algorithms in data mining has been growing over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent work on unsupervised learning has focused on clustering, showing the potential of ACO-based techniques. However, some clustering areas remain almost unexplored with these techniques, such as medoid-based clustering. Medoid-based clustering methods are helpful, compared to classical centroid-based techniques, when centroids cannot be easily defined. This paper proposes two medoid-based ACO clustering algorithms in which the only information needed is the distance between data points: one algorithm that uses an ACO procedure to determine an optimal medoid set (the METACOC algorithm) and another that selects the number of clusters automatically (the METACOC-K algorithm). The proposed algorithms are compared against classical clustering approaches using synthetic and real-world datasets.
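
As an illustration of why a medoid-based method needs only pairwise distances, the sketch below scores a candidate medoid set from a distance matrix. It is not the METACOC ACO procedure, which the abstract does not detail; the function name `assign_to_medoids` and the toy data are hypothetical.

```python
def assign_to_medoids(dist, medoids):
    """Assign every point to its nearest medoid and sum the distances.

    dist    -- square list of lists holding pairwise distances
    medoids -- indices of the points chosen as medoids
    Returns (labels, total_cost); labels[i] is the medoid index for point i.
    """
    labels = []
    total = 0.0
    for i in range(len(dist)):
        # The nearest medoid is found from the distance matrix alone;
        # no coordinates or centroids are required.
        best = min(medoids, key=lambda m: dist[i][m])
        labels.append(best)
        total += dist[i][best]
    return labels, total


# Hypothetical toy data: four points on a line at 0, 1, 10 and 11.
points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
labels, cost = assign_to_medoids(dist, [0, 2])
```

A search procedure (ACO or otherwise) would call such a cost function repeatedly while exploring different medoid sets.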

2.
Optimization in dynamic optimization problems (DOPs) requires the optimization algorithms not only to locate, but also to continuously track the moving optima. Particle swarm optimization (PSO) is a population-based optimization algorithm, originally developed for static problems. Recently, several researchers have proposed variants of PSO for optimization in DOPs. This paper presents a novel multi-swarm PSO algorithm, namely competitive clustering PSO (CCPSO), designed specifically for DOPs. Employing a multi-stage clustering procedure, CCPSO splits the particles of the main swarm over a number of sub-swarms based on the particles' positions and on their objective function values. The algorithm automatically adjusts the number of sub-swarms and the corresponding region of each sub-swarm. In addition to the sub-swarms, there is also a group of free particles that explore the environment to locate newly emerging optima or exploit current optima that are not followed by any sub-swarm. The adaptive search strategy adopted by the sub-swarms improves both the exploitation and tracking characteristics of CCPSO. A set of experiments is conducted to study the behavior of the proposed algorithm in different DOPs and to provide guidelines for setting the algorithm's parameters in different problems. The results of CCPSO on a variety of moving peaks benchmark (MPB) functions are compared with those of several state-of-the-art PSO algorithms, indicating the efficiency of the proposed model.

3.
4.
Microarray technology facilitates the monitoring of the expression levels of thousands of genes over different experimental conditions simultaneously. Clustering is a popular data mining tool that can be applied to microarray gene expression data to identify co-expressed genes. Most traditional clustering methods optimize a single clustering goodness criterion and thus may not perform well on all kinds of datasets. Motivated by this, in this article, a multiobjective clustering technique that optimizes cluster compactness and separation simultaneously has been improved through a novel cluster ensemble method based on support vector machine classification. The superiority of MOCSVMEN (MultiObjective Clustering with Support Vector Machine based ENsemble) has been established by comparing its performance with that of several well-known existing microarray data clustering algorithms. Two real-life benchmark gene expression datasets have been used to test the comparative performance of the different algorithms. A recently developed metric, called the Biological Homogeneity Index (BHI), which computes clustering goodness with respect to functional annotation, has been used for the comparison.

5.
Summary: Identifying homogeneous groups of individuals is an important problem in population genetics. Recently, several methods have been proposed that exploit spatial information to improve clustering algorithms. In this article, we develop a Bayesian clustering algorithm based on the Dirichlet process prior that uses both genetic and spatial information to classify individuals into homogeneous clusters for further study. We study the performance of our method using a simulation study and use our model to cluster wolverines in Western Montana using microsatellite data.

6.
IRBM 2020, 41(5): 267-275
Background and objective: Clustering has been a widely used method of data analysis for years, and many clustering algorithms exist. Today it is used in many prediction, collaborative filtering and automatic segmentation systems across different domains. To be broadly used in practice, clustering algorithms need to offer both better performance and greater robustness than those currently in use. In recent years, evolutionary algorithms have been used in many domains because they are robust and easy to implement, and many clustering problems can be solved readily with such algorithms if the problem is modeled as an optimization problem. In this paper, we present an optimization approach to clustering that uses four well-known evolutionary algorithms: Biogeography-Based Optimization (BBO), Grey Wolf Optimization (GWO), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Method: The objective function has been specified to minimize the total distance from cluster centers to the data points. Euclidean distance is used for distance calculation. We have applied this objective function to the given algorithms both to find the most efficient clustering algorithm and to compare the clustering performance of the algorithms across different data sizes. To benchmark clustering performance in the experiments, we have used a number of datasets of different sizes, covering small-scale, medium-scale and big data. The clustering performance has been compared to K-means, as it has been a widely used clustering algorithm in the literature for years. Rand Index, Adjusted Rand Index, Mirkin's Index and Hubert's Index have been considered as measures for evaluating clustering performance. Result: In clustering experiments over datasets of varying sizes and according to the specified performance criteria, the GA and GWO algorithms show better clustering performance than the others. Conclusions: The results of the study showed that although the algorithms produced satisfactory clustering results on small- and medium-scale datasets, clustering performance on big data needs to be improved.
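
The objective and one of the evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the assumption that each point is charged to its nearest center, the function names and the sample data are all introduced here.

```python
import math
from itertools import combinations


def total_distance(points, centers):
    """Sum of Euclidean distances from each point to its nearest center.

    Assumption: every point is assigned to the closest center, which is
    the usual reading of a 'total distance to cluster centers' objective.
    """
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two points
    in the same cluster, or both put them in different clusters.
    """
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        pairs += 1
    return agree / pairs


# Hypothetical sample data.
points = [(0, 0), (0, 2), (10, 0)]
centers = [(0, 1), (10, 0)]
objective = total_distance(points, centers)
```

An evolutionary optimizer would treat the center coordinates as the candidate solution and `total_distance` as the fitness to minimize.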

7.
A key step in network analysis is to partition a complex network into dense modules. Currently, modularity is one of the most popular benefit functions used to partition network modules. However, recent studies have suggested that it has an inherent limitation in detecting dense network modules. In this study, we observed that despite this limitation, modularity has the advantage of preserving the primary network structure of the undetected modules. We have therefore developed a simple iterative Network Partition (iNP) algorithm to partition a network. The iNP algorithm provides a general framework in which any modularity-based algorithm can be implemented in the network partition step. Here, we tested iNP with three modularity-based algorithms: multi-step greedy (MSG), spectral clustering and Qcut. Compared with the original three methods, iNP achieved a significant improvement in the quality of network partition in a benchmark study with simulated networks, identified more modules with significantly better enrichment of functionally related genes in both a yeast protein complex network and a breast cancer gene co-expression network, and discovered more cancer-specific modules in the cancer gene co-expression network. As such, iNP should have broad application as a general method to assist in the analysis of biological networks.
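
For readers unfamiliar with the benefit function mentioned here, the sketch below computes the standard Newman-Girvan modularity of a given partition of an undirected, unweighted graph. That the paper uses exactly this definition is an assumption, and the code does not implement iNP itself; the function name and the example graph are hypothetical.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict mapping every node to its community label
    Q = sum over communities of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside
    community c, and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = {}  # L_c per community
    degree = {}    # d_c per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)
```

Placing every node in one community gives Q = 0, so a positive value for the two-triangle split indicates that the partition captures denser-than-expected modules.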

8.
The large variety of clustering algorithms and their variants can be daunting to researchers wishing to explore patterns within their microarray datasets. Furthermore, each clustering method has distinct biases in finding patterns within the data, and clusterings may not be reproducible across different algorithms. A consensus approach utilizing multiple algorithms can show where the various methods agree and expose robust patterns within the data. In this paper, we present a software package, Consense, written for R/Bioconductor, that utilizes such an approach to explore microarray datasets. Consense produces clustering results for each of the clustering methods and produces a report of metrics comparing the individual clusterings. A feature of Consense is the identification of genes that cluster consistently with an index gene across methods. Using simulated microarray data, the sensitivity of the metrics to the biases of the different clustering algorithms is explored. The framework is easily extensible, allowing this tool to be used with other functional genomic data types, as well as other high-throughput OMICS data types generated from metabolomic and proteomic experiments. It also provides a flexible environment in which to benchmark new clustering algorithms. Consense is currently available as an installable R/Bioconductor package (http://www.ohsucancer.com/isrdev/consense/).

9.
This work introduces a new algorithm for "gene ordering". Given a matrix of gene expression data values, the task is to find a permutation of the list of gene names such that genes with similar expression patterns are relatively close in the permutation. The algorithm is based on a combined approach that integrates a constructive heuristic with evolutionary and Tabu Search techniques in a single methodology. To evaluate the benefits of this method, we compared our results with the outputs currently provided by several widely used algorithms in functional genomics. We also compared the results with our own hierarchical clustering method when used in isolation. We show that the use of images corrupted with known levels of noise helps to illustrate some aspects of the performance of the algorithms and provides a complementary benchmark for the analysis. The use of these images, with known high-quality solutions, in some cases facilitates the assessment of the methods and helps software development, validation and reproducibility of results. We also propose two quantitative measures of performance for gene ordering. Using these measures, we make a comparison with probably the most widely used algorithm (due to Eisen and collaborators, PNAS 1998) using a microarray dataset available in the public domain (the complete yeast cell cycle dataset).

10.
Visual target tracking is a primary task in many computer vision applications and has been widely studied in recent years. Among tracking methods, the mean shift algorithm has attracted extraordinary interest and has been well developed over the past decade owing to its excellent performance. However, it is still challenging for color histogram based algorithms to deal with complex target tracking, so algorithms based on other distinguishing features are much needed. In this paper, we propose a novel target tracking algorithm based on mean shift theory, in which a new type of image feature is introduced and used to find the corresponding region between neighboring frames. The target histogram is created by clustering the features obtained in the extraction strategy. The mean shift process is then adopted to calculate the target location iteratively. Experimental results demonstrate that the proposed algorithm can deal with challenging tracking situations such as partial occlusion, illumination change, scale variations, object rotation and complex background clutter. It also outperforms several state-of-the-art methods.
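
The iterative location update at the heart of mean shift can be illustrated in one dimension. The sketch below uses a flat kernel over scalar samples; it is a generic illustration, not the paper's tracker, which operates on image feature histograms. The function name, parameters and sample values are hypothetical.

```python
def mean_shift_1d(samples, start, bandwidth, tol=1e-6, max_iter=100):
    """Move an estimate toward a local density mode with a flat kernel.

    At each step the estimate is replaced by the mean of all samples
    lying within `bandwidth` of it, until the shift falls below `tol`.
    """
    x = start
    for _ in range(max_iter):
        window = [s for s in samples if abs(s - x) <= bandwidth]
        if not window:
            break  # no samples nearby, so there is nothing to shift toward
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical samples forming two groups, near 1.0 and near 5.1.
samples = [0.8, 1.0, 1.2, 5.0, 5.2]
mode = mean_shift_1d(samples, start=1.5, bandwidth=1.0)
```

Starting from 1.5 the estimate settles near 1.0, while a start of 4.5 settles near 5.1, which shows how the result depends on the initial location, as it does when a tracker starts from the target's previous position.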

11.
12.
13.
Previous studies in gene expression profiling have sought to identify groups of genes that characterize colorectal carcinoma. Despite the success of previous attempts to identify groups of genes involved in the progression of colorectal carcinoma, their methods either require subjective interpretation of the number of clusters or lack stability across different runs of the algorithms, both of which limit the usefulness of these methods. In this study, we propose an enhanced algorithm that provides stability and robustness in identifying differentially expressed genes in an expression profile analysis. Our proposed algorithm uses multiple clustering algorithms under the consensus clustering framework. The results of the experiment show that the robustness of our method provides a consistent structure of clusters, similar to the structure found in the previous study. Furthermore, our algorithm outperforms any single clustering algorithm in terms of the cluster quality score.

14.
Scoring clustering solutions by their biological relevance
MOTIVATION: A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups has been shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works have addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed-upon guideline for choosing among them. RESULTS: We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. AVAILABILITY: The software is available from the authors upon request.

15.
Ji X, Li-Ling J, Sun Z. FEBS Letters 2003, 542(1-3): 125-131
In this work we have developed a new framework for microarray gene expression data analysis. This framework is based on hidden Markov models. We have benchmarked the performance of this probability model-based clustering algorithm on several gene expression datasets for which external evaluation criteria were available. The results showed that this approach could produce clusters of quality comparable to two prevalent clustering algorithms, with the major advantage of determining the number of clusters. We have also applied this algorithm to published yeast cell cycle gene expression data and found that it successfully extracts biologically meaningful gene groups. In addition, the algorithm can find correlations between different functional groups and distinguish between function genes and regulation genes, which is helpful for constructing a network describing particular biological associations. Currently, this method is limited to time series data. Supplementary materials are available at http://www.bioinfo.tsinghua.edu.cn/~rich/hmmgep_supp/.

16.
A variety of methods and algorithms have been developed to solve NP-hard problems in recent decades. In this paper, we are concerned with a relatively new algorithm based on animal behavioral adaptability and evolutionary computation, namely predatory search. When first introduced, the algorithm was implemented with restrictions based on solution cost as a simplification of the distance adopted by search-intensive predators. Our research concentrates on exploring the possibility of using distance to restrict the search area. Based on the research of Boese et al. (1994), we propose a type of predatory search algorithm restricted by solution distance (particularly bond distance), and compare it with the original algorithm on three benchmark traveling salesman problems. The results indicate that both algorithms are suitable for solving traveling salesman problems, while our proposed algorithm either outperforms or at least matches its predecessor with respect to both running time and solution quality. In addition, further experiments suggest that there exists a certain relationship between the two algorithms.

17.
Semi-supervised clustering algorithms are increasingly employed for discovering hidden structure in data with partially labelled patterns. In order to make the clustering approach useful and acceptable to users, the information provided must be simple, natural and limited in number. To improve recognition capability, we apply an effective feature enhancement procedure to the entire data set to obtain a single set of features or weights by weighting and discriminating the information provided by the user. By taking pairwise constraints into account, we propose a semi-supervised fuzzy clustering algorithm with feature discrimination (SFFD) incorporating a fully adaptive distance function. Experiments on several standard benchmark data sets demonstrate the effectiveness of the proposed method.

18.
Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.

19.
The artificial bee colony (ABC) algorithm is a recent class of swarm intelligence algorithms that is loosely inspired by the foraging behavior of honeybee swarms. It was introduced in 2005 using continuous optimization problems as an example application. As has happened with other swarm intelligence techniques, several researchers have studied variants of the original algorithm since the initial proposal. Unfortunately, these variants have often been tested under different experimental conditions and with different fine-tuning efforts for the algorithm parameters. In this article, we review several variants of the original ABC algorithm and experimentally study nine ABC algorithms under two settings: either using the original parameter settings proposed by the authors, or using an automatic algorithm configuration tool with the same tuning effort for each algorithm. We also study the effect of adding local search to the ABC algorithms. Our experimental results show that local search can considerably improve the performance of several ABC variants and that it strongly reduces the performance differences between the studied ABC variants. We also show that the best ABC variants are competitive with recent state-of-the-art algorithms on the benchmark set we used, which establishes ABC algorithms as serious competitors in continuous optimization.

20.
MOTIVATION: Grouping genes that have similar expression patterns is called gene clustering, which has proved to be a useful tool for extracting the underlying biological information of gene expression data. Many clustering procedures have shown success in microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are alternative clustering algorithms, which are based on the assumption that the whole set of microarray data is a finite mixture of a certain type of distribution with different parameters. Application of model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrate the use of a model-based algorithm in supervised clustering of microarray data. RESULTS: We applied the proposed methods to real gene expression data and simulated data. We showed that the supervised model-based algorithm is superior to the unsupervised method and the support vector machines (SVM) method. AVAILABILITY: The program, written in the SAS language and implementing methods I-III in this report, is available upon request. The SVM software is available at http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi
