首页 | 本学科首页   官方微博 | 高级检索  
     


A seqlet-based maximum entropy Markov approach for protein secondary structure prediction
Authors:Qiwen?Dong  author-information"  >  author-information__contact u-icon-before"  >  mailto:qwdong@insun.hit.edu.cn"   title="  qwdong@insun.hit.edu.cn"   itemprop="  email"   data-track="  click"   data-track-action="  Email author"   data-track-label="  "  >Email author,Xiaolong?Wang,Lei?Lin,Yi?Guan
Affiliation:1.School of Computer Science and Technology,Harbin Institute of Technology,Harbin,China
Abstract:A novel method for predicting the secondary structures of proteins from amino acid sequence has been presented. The protein secondary structure seqlets that are analogous to the words in natural language have been extracted. These seqlets will capture the relationship between amino acid sequence and the secondary structures of proteins and further form the protein secondary structure dictionary. To be elaborate, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part of speech tagging problem. The word-lattice is used to represent the results of the word segmentation and the maximum entropy model is used to calculate the probability of a seqlet tagged as a certain secondary structure type. The method is markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins of four organisms respectively and compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% Q3 accuracy and 4.6% SOV accuracy. Combining with the local similarity protein sequences that are obtained by BLAST can give better prediction. The method is also tested on the 50 CASP5 target proteins with Q3 accuracy 78.9% and SOV accuracy 77.1%. A web server for protein secondary structure prediction has been constructed which is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
Keywords:protein secondary structure prediction  protein secondary structure seqlets  Word-lattice  maximum entropy Markov model
本文献已被 万方数据 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号