
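Protein secondary structure prediction based on CNN and LSTM models
Cite this article: WANG Jian, CHENG Jinyong, ZHAO Zhigang, LU Wenpeng. Protein secondary structure prediction based on CNN and LSTM models[J]. 生物信息学 (China Journal of Bioinformation), 2018, 16(2): 130-135.
Authors: WANG Jian  CHENG Jinyong  ZHAO Zhigang  LU Wenpeng
Affiliations: College of Information, Qilu University of Technology (Shandong Academy of Sciences); Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences)
Funding: National Natural Science Foundation of China (61375013); Natural Science Foundation of Shandong Province (ZR2013FM020).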
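Abstract: Predicting protein structure is important for understanding the structural composition and biological function of proteins, and secondary structure prediction is a key step in protein structure prediction. Since the position-specific scoring matrix (PSSM) has come into wide use for encoding the primary sequence of a protein as input samples, each residue can be represented as a two-dimensional data plane, so this paper trains a convolutional neural network (CNN) on that representation. The paper also designs a second type of CNN in which long short-term memory (LSTM) networks perceive the horizontal and vertical features of the last convolutional feature maps of the CNN and then perform classification jointly with the CNN's fully connected layer. Finally, an ensemble method integrates the two types of CNN models; the final ensemble contains six models of the two types and achieves a Q3 accuracy of 77.2% on the CB513 protein dataset.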

Keywords: convolutional neural network  long short-term memory network  protein secondary structure prediction  ensemble method
Received: 2017/12/20
Revised: 2017/3/21
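
The abstract states that, once a protein sequence is encoded with a PSSM, each residue can be represented as a two-dimensional data plane. One common way to build such a plane is a sliding window over the PSSM rows, centred on the residue. The sketch below illustrates that idea only; the window width of 13, the zero padding at the sequence ends, and the name `residue_planes` are assumptions for illustration and are not taken from the abstract.

```python
from typing import List

Matrix = List[List[float]]


def residue_planes(pssm: Matrix, window: int = 13) -> List[Matrix]:
    """Return one (window x n_cols) plane per residue.

    `pssm` holds one row of profile scores per residue. The default
    window of 13 is an assumed value; the abstract does not give one.
    Positions beyond either end of the sequence are filled with zeros.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the residue sits at the centre")
    if not pssm:
        return []
    n_cols = len(pssm[0])
    half = window // 2
    zero_row = [0.0] * n_cols
    # Pad both ends so that every residue has a full window around it.
    padded = [zero_row] * half + [list(row) for row in pssm] + [zero_row] * half
    # Copy each row so that the planes do not share row objects.
    return [[list(row) for row in padded[i:i + window]] for i in range(len(pssm))]
```

Each returned plane could then be fed to a 2D convolutional network as a single-channel input, with the secondary structure state of the centre residue as the label.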

Protein secondary structure prediction based on CNN and LSTM models
WANG Jian, CHENG Jinyong, ZHAO Zhigang and LU Wenpeng. Protein secondary structure prediction based on CNN and LSTM models[J]. China Journal of Bioinformation, 2018, 16(2): 130-135.
Authors: WANG Jian  CHENG Jinyong  ZHAO Zhigang and LU Wenpeng
Institution: College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China; College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China; Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250101, China; and College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
Abstract: Predicting protein structure is of great significance for understanding the structure and biological function of proteins, and secondary structure prediction is an important part of that task. Now that the position-specific scoring matrix (PSSM) is widely used to encode the primary sequence of a protein as input samples, each residue can be represented as a two-dimensional data plane, so a convolutional neural network (CNN) can be trained on it. In this paper, we also designed a second type of CNN in which long short-term memory (LSTM) networks perceive the features of the last convolutional feature maps of the CNN both horizontally and vertically and complete the classification jointly with the fully connected layer of the convolutional model. Finally, an ensemble method was adopted to integrate the two types of CNN models. The resulting ensemble comprises six models of these two types, and its Q3 accuracy on the CB513 dataset is 77.2%.
Keywords: CNN  LSTM  Protein secondary structure prediction  Ensemble method
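
The abstract reports a Q3 score for an ensemble of six models. Q3 is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. The sketch below shows one way an ensemble prediction and its Q3 score could be computed. The abstract does not say how the six models are combined, so the averaging of per-residue class probabilities here is an assumption, as are the names `ensemble_predict` and `q3` and the state alphabet `"HEC"`.

```python
from typing import List, Sequence

STATES = "HEC"  # helix, strand, coil (assumed label alphabet)

# One probability triple per residue, e.g. [[0.8, 0.1, 0.1], ...]
Probs = Sequence[Sequence[float]]


def ensemble_predict(model_probs: Sequence[Probs]) -> str:
    """Average class probabilities over models, then take the argmax.

    Averaging is an assumed combination rule, chosen for illustration.
    """
    n_models = len(model_probs)
    n_residues = len(model_probs[0])
    labels: List[str] = []
    for i in range(n_residues):
        mean = [
            sum(model[i][k] for model in model_probs) / n_models
            for k in range(len(STATES))
        ]
        labels.append(STATES[mean.index(max(mean))])
    return "".join(labels)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is correct."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

With six models, `model_probs` would hold six per-residue probability tables for the same protein; the function itself works for any number of models.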
This article is indexed in CNKI and other databases.