A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics
Authors: Longlong Liao, Kenli Li (email: lkl@hnu.edu.cn), Keqin Li, Canqun Yang, Qi Tian
Affiliation: 1. College of Computer, National University of Defense Technology, Changsha, China; 2. State Key Laboratory of High Performance Computing, Changsha, China; 3. College of Information Science and Engineering, Hunan University, Changsha, China; 4. Department of Computer Science, State University of New York, New Paltz, USA; 5. Department of Computer Science, University of Texas at San Antonio, San Antonio, USA
Abstract:

Background

Although a large number of bioinformatics datasets are available for clustering, many of them are incomplete, i.e., some data samples are missing attribute values that clustering algorithms need. A variety of clustering algorithms have been proposed in recent years, but they are usually limited to clustering complete datasets. In addition, conventional clustering algorithms cannot achieve a good trade-off between the accuracy and the efficiency of the clustering process, because many essential parameters are set according to the user's experience.

Results

The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets, called MKDCI. The MKDCI algorithm consists of six steps: recovering the missing attribute values of the input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel built from multiple basis kernels, detecting cluster centroids with the Isolation Forests method, assigning samples to clusters of arbitrary shape, and visualizing the results.
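
The first three stages of such a pipeline can be illustrated with a minimal sketch. It is not the authors' implementation, and every choice in it is an assumption made for illustration: column-mean imputation stands in for the recovery step, fixed equal weights over two Gaussian (RBF) basis kernels stand in for the learned optimal combination, and plain kernel PCA stands in for the dimensionality reduction. The function names, bandwidths and toy data are hypothetical. Centroid detection with Isolation Forests, cluster assignment and visualization are omitted.

```python
import numpy as np

def impute_column_means(X):
    """Replace each NaN with the mean of the observed values in its column.
    (Assumed stand-in for the paper's recovery step.)"""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def rbf_kernel(X, gamma):
    """Gaussian kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def combined_kernel(X, gammas, weights):
    """Convex combination of RBF basis kernels.
    The weights are fixed here; the paper learns them (assumption: not shown)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * rbf_kernel(X, g) for w, g in zip(weights, gammas))

def kernel_pca(K, n_components):
    """Project samples onto the leading components of the centered kernel."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.maximum(eigvals[order], 0.0)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy incomplete dataset (hypothetical): two groups, two missing entries.
X_incomplete = np.array([
    [0.0, 0.1],
    [0.2, np.nan],
    [0.1, 0.0],
    [5.0, 5.1],
    [np.nan, 4.9],
    [5.2, 5.0],
])

X_filled = impute_column_means(X_incomplete)
K = combined_kernel(X_filled, gammas=[0.1, 1.0], weights=[0.5, 0.5])
Z = kernel_pca(K, n_components=2)   # low-dimensional embedding for later stages
```

In this sketch the embedding `Z` is what a later density-based stage would consume. Column-mean imputation is deliberately crude, since it can place a recovered sample between groups, which is one reason a more careful recovery step matters on incomplete data.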

Conclusions

Extensive experiments on several well-known clustering datasets in the bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, MKDCI tends to automatically produce clusters of better quality on incomplete bioinformatics datasets.
This article is indexed in SpringerLink and other databases.