K-modes algorithm for gene expression data analysis based on information theory

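Cite this article:	LIU Wen-yuan, LI Jian-fei, WANG Bao-wen, YU Jia-xin. K-modes algorithm for gene expression data analysis based on information theory [J]. 生物信息学 (China Journal of Bioinformation), 2009, 7(2): 95-98.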

Authors:	刘文远 (LIU Wen-yuan), 李建飞 (LI Jian-fei), 王宝文 (WANG Bao-wen), 于家新 (YU Jia-xin)

Institution:	College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004

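Abstract:	The k-means clustering algorithm is an iterative transformation algorithm widely used in cluster analysis of gene expression data. It usually represents the relationship between genes by a distance measure, which cannot effectively reflect the interdependence between genes. To address this, a k-modes clustering algorithm based on information theory is proposed, which overcomes this shortcoming. In addition, the pseudo F-statistic is introduced: on the one hand, it allows points that partially overlap in space to be classified effectively; on the other hand, it gives the optimal number of clusters, thereby compensating for a weakness of the k-modes clustering method. Together these make the method very effective and yield better clustering results.
Keywords:	gene expression data; cluster analysis; entropy; mutual information; pseudo F-statistic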
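
The abstract's point that a distance measure can miss interdependence between genes can be illustrated with a small sketch. The abstract does not give the paper's exact formulas, so the code below simply assumes the textbook definitions of Shannon entropy and mutual information on discretized expression levels; the gene profiles and the names `entropy` and `mutual_information` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# Hypothetical discretized expression profiles (0 = low, 1 = high) over 8 samples.
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
gene_b = [1, 1, 0, 0, 1, 1, 0, 0]  # exact mirror image of gene_a
gene_c = [0, 1, 0, 1, 0, 1, 0, 1]  # statistically independent of gene_a here

print(mutual_information(gene_a, gene_b))  # 1.0 bit: fully dependent
print(mutual_information(gene_a, gene_c))  # 0.0 bits: independent
```

In this toy data `gene_a` and `gene_b` are as far apart as possible in Euclidean distance, yet each one completely determines the other, which is the kind of relationship a mutual-information measure picks up and a distance measure does not.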
K-modes algorithm for gene expression data analysis based on information theory

LIU Wen-yuan, LI Jian-fei, WANG Bao-wen, YU Jia-xin. K-modes algorithm for gene expression data analysis based on information theory [J]. China Journal of Bioinformation, 2009, 7(2): 95-98.

Authors:	LIU Wen-yuan, LI Jian-fei, WANG Bao-wen, YU Jia-xin

Institution:	College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China

Abstract:	The k-means clustering algorithm is an iterative transformation algorithm widely applied to clustering analysis of gene expression data. It measures the relationship between genes by distance, which cannot effectively reflect the interdependence between genes. To address this, a k-modes attribute clustering algorithm based on information theory is proposed, which overcomes this shortcoming. In addition, the pseudo F-statistic is introduced: on the one hand, it allows points that partially overlap in space to be classified effectively; on the other hand, it gives the best number of clusters, thereby making up for a shortcoming of the k-modes clustering method. Together, these properties make the proposed method effective and yield better clustering results.

Keywords:	gene expression data; clustering analysis; entropy; mutual information; pseudo F-statistic
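
As a rough illustration of how a pseudo F-statistic can rank candidate clusterings, the sketch below assumes the standard Calinski-Harabasz form on numeric data, F = [B / (k - 1)] / [W / (n - k)], where B and W are the between-cluster and within-cluster sums of squares. The abstract does not state which variant the paper uses, so this is an assumption; the function name `pseudo_f` and the sample points are hypothetical.

```python
def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo F = [B / (k - 1)] / [W / (n - k)]."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n")

    def mean(rows):
        # Column-wise mean of a list of equal-length tuples.
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    grand = mean(points)
    # Between-cluster sum of squares, weighted by cluster size.
    between = sum(len(rows) * sq_dist(mean(rows), grand)
                  for rows in clusters.values())
    # Within-cluster sum of squares around each cluster mean.
    within = sum(sq_dist(p, mean(rows))
                 for rows in clusters.values() for p in rows)
    if within == 0:
        return float("inf")  # perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 2-D points forming two visibly separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # labels that follow the two groups
bad = [0, 1, 0, 1, 0, 1]   # labels that mix the two groups

print(pseudo_f(points, good) > pseudo_f(points, bad))  # True
```

To pick a number of clusters in this style, one would run the clustering for several values of k and keep the k whose labeling gives the largest pseudo F.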
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
