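Feature Selection for High-Resolution Protein Mass Spectrometry Data Based on the Simulated Annealing Algorithm
Cite this article: LI Yi-feng, LIU Yi-hui. Feature selection based on simulated annealing algorithm for high-resolution protein mass spectrometry data[J]. Chinese Journal of Bioinformatics, 2009, 7(2): 85-90
Authors: LI Yi-feng, LIU Yi-hui
Affiliation: Institute of Intelligence Information Processing, School of Information Science and Technology, Shandong Institute of Light Industry, Jinan 250353, China
Funding: This work was supported by the Shandong Province International Cooperation Training Program for Outstanding Backbone Teachers in Higher Education and by the Doctoral Start-up Fund of Shandong Institute of Light Industry.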
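Abstract: Protein mass spectrometry is an important research tool in proteomics and has been applied successfully to areas such as early cancer diagnosis, but the curse of dimensionality that comes with mass spectrometry data makes dimensionality reduction a necessary step in the analysis. This paper first preprocesses the high-resolution SELDI-TOF ovarian mass spectrometry data provided by the U.S. National Cancer Institute. It then recasts feature selection on the mass spectrometry data as a combinatorial optimization model solved by simulated annealing: the objective function is built from the classification error rate of linear discriminant analysis and the posterior probabilities of the samples, the new-solution generator is built from a uniform distribution and a control parameter, and a memory function is added to the annealing process. Finally, 10-fold cross-validation is used to choose the training and test samples, and a linear discriminant analysis classifier evaluates the dimension-reduced data. Experiments show that when simulated annealing selects more than six features, all samples of the high-resolution SELDI-TOF ovarian data are classified correctly, which indicates that simulated annealing is well suited to feature selection for protein mass spectrometry data.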

Keywords: simulated annealing; feature selection; linear discriminant analysis; early cancer diagnosis; protein mass spectrometry

Feature selection based on simulated annealing algorithm for high-resolution protein mass spectrometry data
LI Yi-feng, LIU Yi-hui. Feature selection based on simulated annealing algorithm for high-resolution protein mass spectrometry data[J]. Chinese Journal of Bioinformatics, 2009, 7(2): 85-90
Authors: LI Yi-feng, LIU Yi-hui
Affiliation: Institute of Intelligence Information Processing, School of Information Science and Technology, Shandong Institute of Light Industry, Jinan 250353, China
Abstract: Mass spectrometry is an important tool in proteomics research and has been applied successfully to the detection of early-stage cancer. Nevertheless, the curse of dimensionality inherent in mass spectrometry data makes dimensionality reduction a necessary step. First, the raw high-resolution SELDI-TOF ovarian dataset provided by the National Cancer Institute is preprocessed. Second, feature selection is cast as a combinatorial optimization problem solved by simulated annealing: the objective function is built from the classification error rate of linear discriminant analysis and the posterior probabilities of the training samples, the new-solution generator is based on a uniform distribution and a control parameter, and a memory element keeps the best-so-far result during annealing. After feature selection, 10-fold cross-validation is performed and classification is carried out to evaluate the dimensionally reduced mass spectrometry data. Experiments show that more than six selected features suffice to classify the whole high-resolution SELDI-TOF ovarian dataset perfectly, which indicates that simulated annealing is well suited to feature selection for mass spectrometry data.
Keywords: Simulated Annealing; Feature Selection; Linear Discriminant Analysis; Detection of Early-stage Cancer; Protein Mass Spectrometry
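
The search procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a geometric cooling schedule, reads "uniform distribution and control parameter" as "replace a temperature-dependent number of features with uniformly drawn ones", and swaps the LDA-based objective for a simpler stand-in (the training error of a nearest-centroid rule on synthetic data) so that the example needs only the standard library. All function names and parameters (`make_toy_data`, `nearest_centroid_error`, `anneal_select`, `t0`, `alpha`, `moves_per_t`, `t_min`) are hypothetical.

```python
import math
import random


def make_toy_data(n_samples=40, n_features=20, informative=(0, 1, 2), seed=1):
    """Synthetic two-class data: only the `informative` columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def nearest_centroid_error(X, y, subset):
    """Stand-in objective (assumption): training error of a nearest-centroid rule."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in subset]
    errors = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(subset, means[c]))
                for c in means}
        errors += min(dist, key=dist.get) != t
    return errors / len(y)


def anneal_select(n_features, k, cost, t0=1.0, alpha=0.9,
                  moves_per_t=20, t_min=0.01, seed=0):
    """Search for a k-feature subset with low cost(subset) by simulated annealing."""
    rng = random.Random(seed)
    current = rng.sample(range(n_features), k)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost  # memory of the best-so-far subset
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            outside = [f for f in range(n_features) if f not in current]
            # The control parameter t sets how many features are replaced per move.
            n_swap = min(max(1, round(k * t / t0)), len(outside))
            candidate = list(current)
            for pos in rng.sample(range(k), n_swap):
                # Replacement features are drawn uniformly from the unused ones.
                candidate[pos] = outside.pop(rng.randrange(len(outside)))
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        t *= alpha  # geometric cooling (assumption)
    return best, best_cost


if __name__ == "__main__":
    X, y = make_toy_data()
    subset, err = anneal_select(20, 3, lambda s: nearest_centroid_error(X, y, s))
    print("selected features:", sorted(subset), "training error:", err)
```

To move closer to the setup the abstract describes, `cost` could be replaced by a function that combines the error rate of a linear discriminant analysis classifier with the posterior probabilities of the training samples, and the selected subset could then be scored with 10-fold cross-validation. Keeping the best-so-far subset separately from the current one matters because the Metropolis rule can move the current solution to a worse subset late in the run.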
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).