Similar Literature
 17 similar documents found (search time: 62 ms)
1.
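Protein secondary structure prediction is an important step in protein structure research. As many new prediction methods have been proposed, new protein secondary structure prediction servers have also kept appearing. This study selected seven commonly used servers, PSRSM, SPOT-1D, MUFOLD, Spider3, RaptorX, Psipred, and Jpred4, described how to use them, and evaluated their prediction performance. A test set of 180 proteins released by the PDB between August and November 2018 was selected at random. Seven evaluation measures were used: Q3, Sov, boundary recognition rate, internal recognition rate, turn/coil (C) recognition rate, strand (E) recognition rate, and helix (H) recognition rate. On the 180 test proteins, the Q3 results of these servers were 89.96%, 88.18%, 86.74%, 85.77%, 83.61%, 79.72%, and 78.29%, respectively, so PSRSM gave the best predictions. When the 180 test proteins were grouped by homology of 30%, 40%, and 70%, the Q3 results of PSRSM were 89.49%, 90.53%, and 89.87%, respectively, each better than those of the other servers. The results suggest that further research on protein secondary structure prediction could combine multiple deep learning methods and train models on large data sets.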
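For readers unfamiliar with the measures named in this abstract, the following is a minimal sketch of how Q3 and the per-state recognition rates are commonly computed from an observed and a predicted three-state string. The function names and the two toy strings are assumptions made for illustration; they are not taken from the paper, and Sov and the boundary/internal rates are not covered.

```python
from collections import Counter


def q3_score(observed: str, predicted: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def per_state_recall(observed: str, predicted: str, states: str = "HEC") -> dict:
    """Percentage of observed residues of each state that were predicted as that state."""
    totals = Counter(observed)
    hits = Counter(o for o, p in zip(observed, predicted) if o == p)
    return {s: 100.0 * hits[s] / totals[s] for s in states if totals[s]}


# Toy strings (hypothetical): 15 of the 17 positions agree.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"

print(f"Q3 = {q3_score(observed, predicted):.2f}%")
print(per_state_recall(observed, predicted))
```

Averaging such scores over a test set, per protein or per residue, is a design choice that the abstract does not specify.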

2.
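The accuracy of protein secondary structure prediction currently hovers around 75% and is difficult to improve further. This paper analyzes a redundant protein database using statistical methods and thereby shows that the real factor limiting further improvement in prediction accuracy is systematic error in the protein database itself, which is about 25%. This error arises from objective limitations of the experimental conditions.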

3.
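Application of neural networks in protein secondary structure prediction   Total citations: 3, self-citations: 0, citations by others: 3
The paper introduces the significance of research on protein secondary structure prediction, discusses the design of neural networks for this task, reviews in some detail the main progress in recent years in applying neural network methods to protein secondary structure prediction, and looks ahead to the prospects of protein structure prediction.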

4.
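Prediction of protein supersecondary structure using artificial neural networks   Total citations: 10, self-citations: 0, citations by others: 10
Protein supersecondary structure, which is composed of secondary structure elements such as α-helices and β-strands together with short connecting peptides, is an important level in protein structure research. Prediction of protein supersecondary structure is still at an exploratory stage, and no mature method exists. Artificial neural network prediction is a new approach developed in recent years for secondary structure prediction. This paper successfully introduces artificial neural networks into supersecondary structure prediction. The results show that the occurrence of protein supersecondary structure is closely related to local amino acid sequence patterns, and that the primary structure sequence of a protein can be used to predict the …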

5.
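Wu Linlin, Xu Shuo. 《生物信息学》 2010, 8(3): 187-190
Protein structure prediction is one of the most important problems in modern computational biology, and secondary structure prediction is the basis for predicting higher-order protein structure. Many methods exist for predicting protein secondary structure, among which SVM-based methods have achieved relatively high accuracy. This paper focuses on the steps for applying SVM to protein secondary structure prediction and on points to note when comparing it with other methods, providing reference and inspiration for further research.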

6.
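Current evaluation of protein secondary structure prediction methods mainly considers prediction accuracy and does not fully consider how a method's own parameters affect it. This paper proposes a new evaluation approach that combines intrinsic and extrinsic evaluation to assess the quality of a prediction method. Taking a prediction method based on a hybrid parallel genetic algorithm as an example, intrinsic evaluation is used to choose the internal parameters (slice length and number of classes within a group) appropriately, which effectively improves prediction accuracy. Extrinsic evaluation, through comparison with other prediction algorithms based on stochastic algorithms and with the conclusions provided by CASP, demonstrates the effectiveness and correctness of the method, thereby validating the objectivity, fairness, and comprehensiveness of intrinsic and extrinsic evaluation.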

7.
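Cao Chen, Ma Kun. 《生物信息学》 2016, 14(3): 181-187
Protein secondary structure refers to the regularly repeating conformations in the protein backbone. Correctly assigning secondary structure from atomic coordinates is the basis for analyzing protein structure and function, and secondary structure assignment plays an important role in protein classification, the discovery of functional motifs, and understanding protein folding mechanisms. Secondary structure information is also widely used in protein molecular visualization, protein alignment, and protein structure prediction. More than 20 secondary structure assignment methods currently exist. They fall broadly into two classes, hydrogen-bond-based and geometry-based, and the assignments produced by different methods differ considerably. Because no review of protein secondary structure assignment methods has yet been published, this paper introduces and summarizes the existing methods.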

8.
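Zhang Bin, Yin Jingyuan, Xue Dan. 《生物信息学》 2011, 9(3): 224-228, 234
Protein secondary structure plays an important role in the study of protein function. Principal component analysis was used to reduce the dimensionality and noise of the basic physicochemical properties of amino acids and their secondary structure propensities, and a radial basis function (RBF) neural network was used to predict protein secondary structure. Principal component analysis turned the original 20 × 12 matrix into a 20 × 4 matrix, greatly reducing the dimensionality of the neural network input. In simulation, with a window size of 21 and a spread parameter of 7, prediction accuracy reached 71.81%. The results show that RBF neural networks can be used effectively for protein secondary structure prediction.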
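The dimensionality argument in this abstract can be illustrated with a short sketch: reduce a 20 × 12 per-amino-acid property table to 20 × 4 with PCA, then encode a 21-residue window. Everything specific here is an assumption: the property table is random placeholder data (the paper's 12 properties are not listed in the abstract), the helper names are hypothetical, zero padding at the termini is one of several possible choices, and the RBF network itself is omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project mean-centered rows onto the leading principal axes (PCA via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def window_features(sequence: str, center: int, table: np.ndarray, window: int = 21) -> np.ndarray:
    """Concatenate per-residue feature rows for a window centered on one residue."""
    half = window // 2
    rows = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(sequence):
            rows.append(table[AA_INDEX[sequence[pos]]])
        else:
            rows.append(np.zeros(table.shape[1]))  # zero padding past the termini (assumption)
    return np.concatenate(rows)


rng = np.random.default_rng(0)
properties = rng.normal(size=(20, 12))  # placeholder for the 20 x 12 property matrix
reduced = pca_reduce(properties, 4)     # 20 x 4 after PCA

sequence = "MKTAYIAKQRQISFVKSHFSRQ"     # arbitrary example sequence
full_input = window_features(sequence, 5, properties)
small_input = window_features(sequence, 5, reduced)
print(full_input.shape, small_input.shape)  # (252,) (84,)
```

With a window of 21, the network input shrinks from 21 × 12 = 252 values to 21 × 4 = 84 values per residue.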

9.
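A new method for protein secondary structure prediction is proposed. The method extracts from amino acid sequences species-related protein secondary structure entries that resemble "words" in natural language. These entries form a protein secondary structure dictionary that describes the relationship between amino acid sequences and secondary structure. Predicting secondary structure resembles the integrated process of word segmentation and part-of-speech tagging in natural language. The method treats the sequence of entries as a Markov chain and uses the Viterbi algorithm to search for the maximum probability of each entry being labeled with a given secondary structure type; a word lattice describes the segmentation results, and a maximum entropy Markov model computes the secondary structure probabilities of the entries. The predicted secondary structure is the set of secondary structure types corresponding to the optimal segmentation. The method was tested on protein sequences from four species and compared with the PHD method. Its Q3 accuracy was 3.9% higher than that of PHD, and its SOV accuracy was 4.6% higher. Incorporating locally similar sequences found by BLAST searches can further improve accuracy. On 50 CASP5 target protein sequences, the Q3 accuracy was 78.9% and the SOV accuracy was 77.1%. A protein secondary structure prediction server based on this method is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.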
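The Viterbi search mentioned in this abstract can be sketched in a few lines. Note the simplification: the paper applies Viterbi to a word lattice scored by a maximum entropy Markov model, whereas the sketch below uses a plain hidden Markov model over single residues. The two states, the two-letter alphabet, and all probabilities are made-up values for illustration only.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path for the observations (log-space Viterbi)."""
    if not observations:
        return ""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


STATES = "HC"  # helix and coil only, for brevity
LOG_START = {"H": math.log(0.5), "C": math.log(0.5)}
LOG_TRANS = logs({"H": {"H": 0.8, "C": 0.2}, "C": {"H": 0.2, "C": 0.8}})
LOG_EMIT = logs({"H": {"A": 0.9, "G": 0.1}, "C": {"A": 0.1, "G": 0.9}})

print(viterbi("AAAGGG", STATES, LOG_START, LOG_TRANS, LOG_EMIT))  # HHHCCC
print(viterbi("AAGAA", STATES, LOG_START, LOG_TRANS, LOG_EMIT))   # HHHHH
```

In the second call the single G is absorbed into the helix: with these toy numbers, two state switches cost more than one unlikely emission, which is the kind of context-dependent smoothing a Markov model provides.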

10.
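Protein secondary structure prediction is an important basis for research on protein tertiary structure, and the way amino acids are encoded has some effect on secondary structure prediction. This paper applies a new combined encoding that joins a group encoding with the position-specific scoring matrix (PSSM). The group encoding proposed here is a new amino acid encoding based on the internal composition of each amino acid and consists of 42 attributes. The Blosum62 evolutionary matrix from the position-specific scoring matrix (PSSM) is combined with the new group encoding to form the new encoding. Experiments were then carried out on two data sets, CB513 and 25pdb. Two methods, a Bayesian classifier and an autoencoder, were applied to the new encoding, and their results on the two data sets were compared. The autoencoder results were clearly higher, by 1.65%, than those of the Bayesian classifier. In these experiments, the autoencoder, which can extract features, achieved better prediction accuracy.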

11.
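Protein structure prediction is important for understanding the structural composition and biological function of proteins, and secondary structure prediction is an important step in protein structure prediction. Since the position-specific scoring matrix (PSSM) has become widely used to encode protein primary sequences as input samples, each residue can be represented as a two-dimensional data plane, and the paper therefore attempts to train a convolutional neural network on this representation. The paper also designs a second convolutional neural network in which a long short-term memory network captures the horizontal and vertical features of the CNN's final convolutional feature maps and, together with the fully connected layer of the CNN, performs the classification. Finally, an ensemble method integrates the two kinds of convolutional neural network models. The final ensemble contains six models of the two kinds and achieves a Q3 of 77.2 on the CB513 protein data set.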

12.
We have investigated amino acid features that determine secondary structure: (1) the solvent accessibility of each side chain, and (2) the interaction of each side chain with others one to four residues apart. Solvent accessibility is a simple model that distinguishes residue environment. The pairwise interactions represent a simple model of local side chain to side chain interactions. To test the importance of these features we developed an algorithm to separate alpha-helices, beta-strands, and "other" structure. Single residue and pairwise probabilities were determined for 25,141 samples from proteins with <30% homology. Combining the features of solvent accessibility with pairwise probabilities allows us to distinguish the three structures after cross validation at the 82.0% level. We gain 1.4% to 2.0% accuracy by optimizing the propensities, demonstrating that probabilities do not necessarily reflect propensities. Optimization of residue exposures, weights of all probabilities, and propensities increased accuracy to 84.0%.

13.
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures were measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomic approach, as well as for genome annotation.

14.
The Chou-Fasman predictive algorithm for determining the secondary structure of proteins from the primary sequence is reviewed. Many examples of its use are presented which illustrate its wide applicability, such as predicting (a) regions with the potential for conformational change, (b) sequences which are capable of assuming several conformations in different environments, (c) effects of single amino acid mutations, (d) amino acid replacements in synthesis of peptides to bring about a change in conformation, (e) guide to the synthesis of polypeptides with definitive secondary structure, e.g., signal sequences, (f) conformational homologues from varying sequences and (g) the amino acid requirements for amphiphilic α-helical peptides.

15.
In the present paper, we describe how a directed graph was constructed and then searched for the optimum path using a dynamic programming approach, based on the secondary structure propensity of the protein short sequence derived from a training data set. The protein secondary structure was thus predicted in this way. The average three-state accuracy of the algorithm used was 76.70%.

16.
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.

17.
Circular dichroism spectra of proteins are extremely sensitive to secondary structure. Nevertheless, circular dichroism spectra should not be analyzed for protein secondary structure unless they are measured to at least 184 nm. Even if all the various types of β-turns are lumped together, there are at least 5 different types of secondary structure in a protein (α-helix, antiparallel β-sheet, parallel β-sheet, β-turn, and other structures not included in the first 4 categories). It is not possible to solve for these 5 parameters unless there are 5 equations. Singular value decomposition can be used to show that circular dichroism spectra of proteins measured to 200 nm contain only 2 pieces of information, while spectra measured to 190 nm contain about 4. Adding the constraint that the sum of secondary structures must equal 1 provides another piece of information, but even with this constraint, spectra measured to 190 nm simply do not analyze well for the 5 unknowns in secondary structure. Spectra measured to 184 nm do contain 5 pieces of information and we have used such spectra successfully to analyze a variety of proteins for their component secondary structures.
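The idea of counting "pieces of information" with singular value decomposition can be sketched as follows: stack the spectra as columns of a matrix and count the singular values that stand clearly above the noise floor. The spectra here are synthetic (random basis curves mixed with random fractions that sum to 1), and the relative threshold is an assumption; in a real analysis the cutoff would be set from the measurement noise, which this sketch does not model.

```python
import numpy as np


def synthetic_spectra(n_components, rng, n_wavelengths=40, n_proteins=16):
    """Noise-free toy spectra: each column mixes n_components random basis curves."""
    basis = rng.normal(size=(n_wavelengths, n_components))
    # Each protein's fractions are non-negative and sum to 1.
    fractions = rng.dirichlet(np.ones(n_components), size=n_proteins).T
    return basis @ fractions


def count_components(spectra, rel_tol=1e-8):
    """Number of singular values larger than rel_tol times the largest one."""
    singular_values = np.linalg.svd(spectra, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


rng = np.random.default_rng(0)
five_state = synthetic_spectra(5, rng)
three_state = synthetic_spectra(3, rng)
print(count_components(five_state), count_components(three_state))
```

With measured spectra the trailing singular values do not drop to numerical zero, so the count depends on where the noise threshold is placed; that dependence is what limits the information recoverable from truncated wavelength ranges.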
