
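Gene expression data analysis based on an information-theoretic k-modes clustering method
Cite this article:LIU Wen-yuan, LI Jian-fei, WANG Bao-wen, YU Jia-xin. Gene expression data analysis based on an information-theoretic k-modes clustering method[J]. 生物信息学, 2009, 7(2): 95-98.
Authors:LIU Wen-yuan  LI Jian-fei  WANG Bao-wen  YU Jia-xin
Institution:College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004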
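Abstract:The k-means clustering algorithm is an iterative algorithm widely used for cluster analysis of gene expression data. It usually expresses the relationship between genes as a distance, which does not effectively reflect the interdependence among genes. To address this, a k-modes clustering algorithm based on information theory is proposed that overcomes this shortcoming. In addition, the pseudo F-statistic is introduced: on the one hand, it allows points that partly overlap in space to be classified effectively; on the other hand, it gives the best number of clusters, making up for a shortcoming of the k-modes clustering method. Together, these make the algorithm effective and yield better clustering results.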

Keywords:gene expression data  cluster analysis  entropy  mutual information  pseudo F-statistic
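The abstract contrasts distance-based similarity with information-theoretic interdependence between genes. The following is a minimal sketch of that idea, not the authors' implementation: it computes Shannon entropy and mutual information for two discretized expression profiles. The function names, the normalization by joint entropy, and the toy "low/mid/high" profiles are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interdependence(x, y):
    """Mutual information divided by joint entropy, a value in [0, 1].

    Assumption: this normalization is one common choice; the source does
    not state which measure the paper uses.
    """
    joint = entropy(list(zip(x, y)))
    return mutual_information(x, y) / joint if joint > 0 else 0.0


# Hypothetical discretized expression levels across six samples.
gene_a = ["low", "low", "high", "high", "mid", "mid"]
gene_b = ["high", "high", "low", "low", "mid", "mid"]  # determined by gene_a
gene_c = ["low", "high", "low", "high", "low", "high"]  # independent of gene_a

print(interdependence(gene_a, gene_b))
print(interdependence(gene_a, gene_c))
```

In this toy data, gene_b is a deterministic function of gene_a even though their levels move in opposite directions, so the normalized measure is at its maximum, while gene_c carries no information about gene_a.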

K-modes algorithm for gene expression data analysis based on information theory
LIU Wen-yuan,LI Jian-fei,WANG Bao-wen,YU Jia-xin.K-modes algorithm for gene expression data analysis based on information theory[J].China Journal of Bioinformation,2009,7(2):95-98.
Authors:LIU Wen-yuan  LI Jian-fei  WANG Bao-wen  YU Jia-xin
Institution:College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
Abstract:K-means clustering is an iterative algorithm widely applied to the cluster analysis of gene expression data. It measures the relationship between genes by distance, which cannot effectively reflect the interdependence among genes. To address this, an attribute clustering algorithm, k-modes based on information theory, is proposed to overcome this drawback. In addition, the pseudo F-statistic is introduced: on the one hand, it allows points that partly overlap in space to be classified effectively; on the other hand, it gives the best number of clusters, making up for a shortcoming of the k-modes clustering method. Together, these make the proposed method effective and lead to better clustering results.
Keywords:gene expression data  clustering analysis  entropy  mutual information  pseudo F-statistics
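The abstract says the pseudo F-statistic is used to pick the best number of clusters. A minimal sketch of the standard pseudo F-statistic (the ratio of between-cluster to within-cluster variance, each divided by its degrees of freedom) follows. It uses squared Euclidean distance on toy 2-D points as an assumption; the paper's own dissimilarity measure and data are not given here, and `pseudo_f` is a hypothetical name.

```python
def pseudo_f(points, labels):
    """Pseudo F = (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster sum of squares, W the within-cluster sum of
    squares, n the number of points, and k the number of clusters.
    Assumption: squared Euclidean distance on numeric vectors.
    """
    n = len(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n")

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    grand = mean(points)
    between = sum(len(m) * sq_dist(mean(m), grand) for m in clusters.values())
    within = sum(sq_dist(p, mean(m)) for m in clusters.values() for p in m)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two well-separated groups of three points.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
poor_labels = [0, 1, 0, 1, 0, 1]  # mixes the two groups

print(pseudo_f(points, good_labels))
print(pseudo_f(points, poor_labels))
```

A larger value indicates tighter, better-separated clusters, so one way to choose the number of clusters is to run the clustering for several candidate values of k and keep the one with the highest pseudo F.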
This article is indexed in databases including 维普 (VIP) and 万方数据 (Wanfang Data).