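KL-FCM clustering analysis of Illumina Golden Gate DNA methylation microarrays
Cite this article: ZHANG Lin, SHI Yue, WANG Fei, LI Qi, WAN Sulei, WANG Xuesong. KL-FCM clustering analysis of Illumina Golden Gate DNA methylation microarrays[J]. 生物信息学 (China Journal of Bioinformation), 2014, 12(2): 106-109.
Authors: ZHANG Lin  SHI Yue  WANG Fei  LI Qi  WAN Sulei  WANG Xuesong
Affiliation: School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China (all authors)
Funding: Supported by the China Postdoctoral Science Foundation General Program (2012M511336, 2012M511335); the Jiangsu Province College Students' Innovation and Entrepreneurship Training Program; and the Fok Ying Tung Education Foundation Young Teachers Fund (121066).
Abstract: DNA methylation is an important epigenetic modification whose level has been found to be closely associated with the onset and progression of disease. Clustering analysis of methylation data may reveal new disease subtypes and support effective methods for disease prediction and prognosis. Fuzzy C-means (FCM), one of the traditional clustering methods, is suited to feature spaces with spherical or ellipsoidal distributions and therefore lacks generality. The Illumina Golden Gate platform describes the degree of methylation by computing the methylation percentage at each methylation site of a gene; these values lie in (0,1) and follow a beta mixture distribution, so FCM cannot be applied to them directly. This paper therefore proposes KL-FCM, a clustering algorithm based on the KL feature measure that uses the K-L distance between samples as the criterion for partitioning them. The KL-FCM algorithm is then applied to cluster the IRIS test dataset and gene DNA methylation level data. The experimental results show that the method achieves better classification than k-means and traditional FCM at a lower computational load.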

Keywords: fuzzy C-means  Illumina  DNA methylation microarray  K-L distance
Received: 2014/3/14

KL-FCM clustering analysis in Illumina Golden Gate DNA methylation microarray
ZHANG Lin, SHI Yue, WANG Fei, LI Qi, WAN Sulei and WANG Xuesong. KL-FCM clustering analysis in Illumina Golden Gate DNA methylation microarray[J]. China Journal of Bioinformation, 2014, 12(2): 106-109.
Authors: ZHANG Lin  SHI Yue  WANG Fei  LI Qi  WAN Sulei and WANG Xuesong
Institution: School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China (all authors)
Abstract: DNA methylation is an important epigenetic modification that has been found to be closely related to the occurrence and development of disease. Clustering analysis of DNA methylation data is expected to reveal novel disease subtypes and new methods for prediction and prognosis. Fuzzy C-means (FCM) is a common clustering method, but it is best suited to feature spaces with a spherical or ellipsoidal distribution, which limits its generality. The Illumina Golden Gate platform describes the methylation level by the methylation percentage of each locus in each gene; these values lie in (0,1) and follow a beta mixture distribution, so FCM cannot be applied to them directly. This paper introduces the KL-FCM clustering method, which uses the K-L distance between samples as the partition measure. KL-FCM is used to cluster the IRIS test dataset and a set of DNA methylation profile data. The validation results show that KL-FCM achieves better clustering performance than k-means and traditional FCM at a lower computational load.
Keywords: Fuzzy C-means  DNA methylation expression microarray  K-L distance
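
The abstract describes KL-FCM only at a high level: fuzzy C-means in which a K-L distance between samples replaces the usual Euclidean measure. The Python sketch below illustrates that general idea and is not the authors' implementation. It rests on several assumptions that the abstract does not confirm: each methylation value in (0,1) is treated as a Bernoulli parameter, the dissimilarity is the sum of per-site Bernoulli K-L divergences, and the names `bernoulli_kl` and `kl_fcm` are hypothetical. For this particular divergence the membership-weighted arithmetic mean minimises the weighted dissimilarity, so the center update keeps the familiar FCM form.

```python
import math
import random

EPS = 1e-9  # keeps values strictly inside (0, 1) so the logarithms stay finite


def bernoulli_kl(x, v):
    """Sum over sites of KL(Bernoulli(x_j) || Bernoulli(v_j)).

    Assumption: this is one plausible K-L dissimilarity for values in (0, 1);
    the paper may define its K-L distance differently.
    """
    total = 0.0
    for xj, vj in zip(x, v):
        xj = min(max(xj, EPS), 1.0 - EPS)
        vj = min(max(vj, EPS), 1.0 - EPS)
        total += xj * math.log(xj / vj)
        total += (1.0 - xj) * math.log((1.0 - xj) / (1.0 - vj))
    return total


def kl_fcm(samples, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-means with a K-L dissimilarity instead of Euclidean distance.

    Returns (memberships, centers). memberships[i][k] is the degree to which
    sample i belongs to cluster k; each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + EPS for _ in range(n_clusters)]
        s = sum(row)
        u.append([r / s for r in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the samples weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * samples[i][j] for i in range(n)) / wsum
                 for j in range(dim)]
            )

        # Membership update: u_ik is proportional to d_ik ** (-1 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            d = [max(bernoulli_kl(samples[i], c), EPS) for c in centers]
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
            s = sum(inv)
            new_row = [val / s for val in inv]
            max_change = max(
                max_change, max(abs(a - b) for a, b in zip(new_row, u[i]))
            )
            u[i] = new_row

        if max_change < tol:  # memberships have stopped moving
            break
    return u, centers


if __name__ == "__main__":
    # Toy methylation fractions, made up purely for illustration.
    data = [[0.05, 0.10, 0.08], [0.12, 0.07, 0.10],
            [0.90, 0.85, 0.95], [0.88, 0.92, 0.80]]
    memberships, centers = kl_fcm(data, n_clusters=2)
    hard_labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
    print(hard_labels)
```

On the toy data the two low-methylation profiles should share one label and the two high-methylation profiles the other. The fuzzifier `m=2.0`, the iteration cap, and the tolerance are conventional FCM defaults chosen here for illustration, not values taken from the paper.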
This article is indexed in VIP (维普) and other databases.