Similar articles
1.
Zhang Bin  Yin Jingyuan  Xue Dan 《Chinese Journal of Bioinformatics》2011,9(3):224-228,234
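Protein secondary structure is important for studying protein function. Principal component analysis (PCA) was used to reduce the dimensionality and noise of the basic physicochemical properties of amino acids and of their secondary structure propensities, and a radial basis function (RBF) neural network was used to predict protein secondary structure. PCA reduced the original 20 × 12 matrix to a 20 × 4 matrix, greatly reducing the number of inputs to the neural network. In simulation, with a window size of 21 and a spread constant of 7, prediction accuracy reached 71.81%. The results show that RBF neural networks can be used effectively to predict protein secondary structure.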
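A minimal NumPy sketch of the dimensionality-reduction and input-encoding steps, assuming a 20 × 12 table of per-amino-acid properties (random placeholders below, not the paper's data). PCA is done via SVD on standardized columns, the first 4 components are kept, and each residue is encoded by concatenating the reduced vectors of a 21-residue window. The standardization, zero padding at chain ends, and helper names are illustrative assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pca_reduce(table: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of `table` onto its first principal components."""
    # Standardize each property column so differently scaled properties compare fairly.
    centered = (table - table.mean(axis=0)) / table.std(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def window_features(seq: str, reduced: np.ndarray, window: int = 21) -> np.ndarray:
    """One row per residue: the reduced vectors of the `window` residues centred on it."""
    half = window // 2
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    dim = reduced.shape[1]
    rows = []
    for center in range(len(seq)):
        parts = []
        for pos in range(center - half, center + half + 1):
            if 0 <= pos < len(seq):
                parts.append(reduced[index[seq[pos]]])
            else:
                parts.append(np.zeros(dim))  # zero padding beyond the chain ends
        rows.append(np.concatenate(parts))
    return np.array(rows)

rng = np.random.default_rng(0)
properties = rng.normal(size=(20, 12))  # placeholder for the 20 x 12 property matrix
reduced = pca_reduce(properties, n_components=4)
features = window_features("MKTAYIAKQRQISFVKSHFSRQ", reduced)
print(reduced.shape, features.shape)
```

With 4 components and a window of 21, each residue's input vector has 84 entries, which is what would be fed to the RBF network.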

2.
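By studying algorithms based on neural network weight matrices, this work mines the intrinsic relationship between protein secondary structure and amino acid sequence, aiming to improve the accuracy of predicting secondary structure from the primary sequence. Neural networks perform well at feature classification, and after training, the matrix of neuron connection weights encodes the intrinsic features and regularities of the samples. The study scores predictions with the neural network weight matrix, uses a shifted (offset) alignment method to find sensitive amino acid neighborhoods, and analyzes the common behavior of the test set across different window lengths. Experiments show that prediction performance changes markedly at a sliding-window length of L = 7, and that the amino acid residue at neighborhood position P = 4 strengthens prediction. The approach offers a new algorithm design for protein secondary structure prediction based on local sequence features.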
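Scoring with a weight matrix can be sketched as follows, assuming weights of shape (classes, window length L, 20 amino acids). In the study such weights come from a trained network; here they are hand-set, hypothetical values. A window's score for each class is the sum of the weights of the residues it contains, and the centre residue takes the highest-scoring class.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = "HEC"

def predict_with_weights(seq: str, weights: np.ndarray) -> str:
    """weights[c, k, a]: contribution of amino acid a at window offset k to class c."""
    n_classes, window, _ = weights.shape
    half = window // 2
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    labels = []
    for center in range(len(seq)):
        scores = np.zeros(n_classes)
        for k in range(window):
            pos = center - half + k
            if 0 <= pos < len(seq):  # positions outside the chain contribute nothing
                scores += weights[:, k, index[seq[pos]]]
        labels.append(CLASSES[int(np.argmax(scores))])
    return "".join(labels)

# Hypothetical weights for L = 7: alanine and leucine favour helix, valine favours strand.
weights = np.zeros((3, 7, 20))
weights[0, :, AMINO_ACIDS.index("A")] = 1.0
weights[0, :, AMINO_ACIDS.index("L")] = 1.0
weights[1, :, AMINO_ACIDS.index("V")] = 1.0
weights[2, :, :] = 0.3  # small constant baseline for coil
print(predict_with_weights("AALLAAGGGVVVVV", weights))
```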

3.
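Protein structure prediction is important for understanding how proteins are built and what biological functions they perform, and secondary structure prediction is a key step in it. Since the position-specific scoring matrix (PSSM) has become widely used to encode primary sequences as input samples, each residue can be represented as a plane of two-dimensional data, so this paper trains a convolutional neural network (CNN) on that representation. The paper also designs a second CNN in which a long short-term memory (LSTM) network captures the horizontal and vertical features of the last convolutional feature map; these, together with the CNN's fully connected layer, perform the classification. Finally, the two types of CNN model are combined with an ensemble method. The final ensemble contains six models of the two CNN types and achieves a Q3 of 77.2% on the CB513 protein dataset.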
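Two pieces of this pipeline can be sketched in a few lines, assuming each model outputs per-residue probabilities over (H, E, C). The ensemble averages the six models' probability arrays, and Q3 is the percentage of residues whose predicted state matches the reference. Plain probability averaging is an assumption; the abstract does not say how the models were combined, and the model outputs below are random placeholders.

```python
import numpy as np

STATES = "HEC"

def ensemble_predict(prob_list: list) -> str:
    """Average per-residue probability arrays (residues x 3) and take the argmax state."""
    mean_probs = np.mean(np.stack(prob_list), axis=0)
    return "".join(STATES[i] for i in mean_probs.argmax(axis=1))

def q3(predicted: str, reference: str) -> float:
    """Percentage of residues whose predicted three-state label matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)

rng = np.random.default_rng(1)
# Six hypothetical models, each giving softmax-like outputs for a 10-residue chain.
models = [rng.dirichlet(np.ones(3), size=10) for _ in range(6)]
prediction = ensemble_predict(models)
print(prediction, q3(prediction, "HHHHEEECCC"))
```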

4.
Cao Chen  Ma Kun 《Chinese Journal of Bioinformatics》2016,14(3):181-187
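Protein secondary structure refers to the regularly repeating conformations of the protein backbone. Correctly assigning secondary structure from atomic coordinates is the basis for analyzing protein structure and function, and the assignment plays an important role in protein classification, in discovering functional motifs, and in understanding folding mechanisms. Secondary structure information is also widely used in molecular visualization, protein alignment, and protein structure prediction. More than 20 assignment methods currently exist; they fall broadly into two categories, hydrogen-bond-based and geometry-based, and their assignments can differ considerably. Because no review of secondary structure assignment methods has yet been published, this article introduces and summarizes the existing methods.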

5.
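Clustering of amino acid composition, protein structural types, and prediction of structural types   Total citations: 11, self-citations: 0, citations by others: 11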
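The amino acid compositions of proteins were clustered with an information-based clustering method, revealing hierarchical clustering (large clusters splitting into smaller ones): 645 proteins can be divided into 15 small clusters. Each small cluster shows some correlation with the structural type determined by secondary structure content, but no clear correlation with the five major protein structural types. Problems in schemes that predict structural type from amino acid composition and secondary structure content are pointed out. A new method for predicting protein structural type from the secondary structure sequence is proposed, together with concise prediction rules.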
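The clustering operates on amino acid composition vectors. A minimal sketch of computing them, with Euclidean distance as an assumed similarity measure; the information-based clustering criterion used in the paper is not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq: str) -> list:
    """Fraction of each of the 20 standard amino acids in `seq` (other letters ignored)."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

def distance(a: list, b: list) -> float:
    """Euclidean distance between two composition vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

c1 = composition("AAAALLLLKK")
c2 = composition("AAALLLLKKK")
print(round(sum(c1), 6), round(distance(c1, c2), 4))
```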

6.
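Prediction of protein supersecondary structure using artificial neural networks   Total citations: 10, self-citations: 0, citations by others: 10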
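Protein supersecondary structure, composed of secondary structure elements such as α-helices and β-strands together with the short connecting peptides, is an important level in the study of protein structure. Supersecondary structure prediction is still exploratory, and no mature method exists yet. Artificial neural network prediction has been developed in recent years for secondary structure prediction. This paper successfully introduces artificial neural networks into supersecondary structure prediction. The results show that the occurrence of supersecondary structures is closely related to local amino acid sequence patterns, and that it can be predicted from the protein's primary sequence…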

7.
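Polyproline II (PPII) is a rare type of protein secondary structure, and few studies have yet used machine learning to predict it. A new method, the support vector machine (SVM), is introduced to predict PPII secondary structure and compared with a neural network method. The results show that the SVM performs well on PPII prediction, reaching a prediction accuracy of 76.52%.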
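PPII prediction is a binary task (PPII vs. non-PPII). The abstract does not give the kernel or features, so the following is only a minimal linear SVM trained with the Pegasos stochastic subgradient method, on hypothetical, well-separated 2D toy data rather than real residue features; the original study may well have used a kernel SVM from a standard library.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the hinge loss.

    X: (n, d) features; y: labels in {-1, +1}. A constant column is appended so the
    bias is learned (and regularized) as an ordinary weight.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size
            violated = y[i] * (Xb[i] @ w) < 1  # is the hinge loss active at the current w?
            w *= 1.0 - eta * lam               # shrink step from the L2 regularizer
            if violated:
                w += eta * y[i] * Xb[i]
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)

# Hypothetical toy data standing in for PPII (+1) vs non-PPII (-1) feature vectors.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(2.0, 0.5, size=(50, 2)), rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)
w = train_linear_svm(X, y)
accuracy = np.mean(predict(w, X) == y)
print(f"training accuracy: {accuracy:.2f}")
```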

8.
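A new protein secondary structure prediction method is proposed. It extracts organism-specific protein secondary structure entries (seqlets) from amino acid sequences, analogous to "words" in natural language; these entries form a protein secondary structure dictionary that describes the relationship between amino acid sequence and secondary structure. Predicting secondary structure is analogous to integrated word segmentation and part-of-speech tagging in natural language. The method treats the sequence of entries as a Markov chain and uses the Viterbi algorithm to find the maximum probability of each entry being tagged with a particular secondary structure type; a word lattice represents the segmentation results, and a maximum entropy Markov model computes each entry's secondary structure probabilities. The predicted secondary structure is the tag sequence of the optimal segmentation. The method was tested on protein sequences from 4 organisms and compared with PHD: its Q3 accuracy is 3.9% higher and its SOV accuracy 4.6% higher than PHD's. Combining it with locally similar sequences found by BLAST further improves accuracy. On 50 CASP5 target sequences, Q3 accuracy is 78.9% and SOV accuracy 77.1%. A protein secondary structure prediction server based on this method is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.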
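The segmentation step can be illustrated as a dynamic program over the word lattice. This is a simplified sketch, assuming a hypothetical hand-made dictionary with one tag string and one log-probability per seqlet; the paper's maximum entropy Markov model, which conditions tag probabilities on context, is replaced here by fixed per-seqlet scores.

```python
import math

def segment_and_tag(seq, dictionary, fallback_logp=-10.0):
    """Best segmentation of `seq` into dictionary seqlets, by dynamic programming.

    dictionary maps seqlet -> (secondary structure tags, log-probability). A single
    residue not in the dictionary falls back to coil with a heavy penalty.
    best[i] is the highest total log-probability of any segmentation of seq[:i].
    """
    n = len(seq)
    max_len = max(len(word) for word in dictionary)
    best = [-math.inf] * (n + 1)
    back = [None] * (n + 1)  # back[i] = (start index, tags) of the last seqlet
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = seq[start:end]
            if word in dictionary:
                tags, logp = dictionary[word]
            elif end - start == 1:
                tags, logp = "C", fallback_logp
            else:
                continue
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = (start, tags)
    # Trace the optimal path back to recover the tag string.
    pieces, end = [], n
    while end > 0:
        start, tags = back[end]
        pieces.append(tags)
        end = start
    return "".join(reversed(pieces))

# Hypothetical dictionary entries; real seqlets and probabilities would come from training data.
dictionary = {
    "AEEL": ("HHHH", math.log(0.6)),
    "VTV": ("EEE", math.log(0.5)),
    "GP": ("CC", math.log(0.7)),
    "A": ("H", math.log(0.2)),
    "V": ("E", math.log(0.2)),
}
print(segment_and_tag("AEELGPVTV", dictionary))
```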

9.
Wu Linlin  Xu Shuo 《Chinese Journal of Bioinformatics》2010,8(3):187-190
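Protein structure prediction is one of the most important problems in modern computational biology, and secondary structure prediction is the foundation for predicting higher-order structure. Many secondary structure prediction methods exist, among which SVM-based methods have achieved relatively high accuracy. This paper focuses on the steps for applying SVMs to protein secondary structure prediction and on the points to watch when comparing them with other methods, as a reference and inspiration for further research.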

10.
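Predicting protein supersecondary structure is a very important intermediate step toward tertiary structure prediction. Starting from primary sequences, this paper predicts four types of simple supersecondary structure in 5793 proteins, using site-specific amino acids as parameters and three ways of extracting fragments. Predictions made with the increment of diversity algorithm alone were unsatisfactory; feeding combined increment-of-diversity values into a support vector machine as feature parameters gave much better results, with an average overall accuracy of 83.0% in 5-fold cross-validation and Matthews correlation coefficients above 0.71.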
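The increment of diversity (ID) is commonly defined from the diversity measure of a count vector, D = N ln N − Σ nᵢ ln nᵢ, as ID(X, Y) = D(X + Y) − D(X) − D(Y). A minimal sketch under that standard definition, together with the Matthews correlation coefficient used to report the results. How the paper builds count vectors from fragments and combines ID values into SVM features is not shown.

```python
import math

def diversity(counts):
    """Diversity measure D = N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    def xlogx(v):
        return v * math.log(v) if v > 0 else 0.0
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small when X and Y have similar distributions."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(increment_of_diversity([2, 2], [2, 2]))            # identical distributions
print(round(increment_of_diversity([4, 0], [0, 4]), 4))  # disjoint distributions
print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
```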

11.
Chao Fang  Yi Shang  Dong Xu 《Proteins》2018,86(5):592-598
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein function. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception‐inside‐inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD‐SS. The input to MUFOLD‐SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acids, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physicochemical properties of amino acids, PSI‐BLAST profile, and HHBlits profile. MUFOLD‐SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structure. The architecture of MUFOLD‐SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD‐SS significantly outperformed the best existing methods and other deep neural networks. MUFold‐SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html .
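Eight-state labels are usually DSSP codes, and three-state results are obtained by collapsing them. A minimal sketch of one widely used reduction (H, G, I to helix; E, B to strand; everything else to coil). The abstract does not say which mapping MUFOLD-SS uses, so treat this as an assumption; other conventions exist.

```python
# One common DSSP 8-to-3 reduction: helices (H, G, I) -> H, strands/bridges (E, B) -> E,
# everything else (T, S, and blank/coil symbols) -> C. Other conventions exist.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_to_three_states(dssp: str) -> str:
    """Map an eight-state DSSP string to three states (H, E, C)."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in dssp)

print(reduce_to_three_states("CHHHHGGGTTEEEEBSS-I"))
```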

12.
Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural network system. For the Asilomar meeting, predictions for 13 proteins were generated automatically using the publicly available prediction method PHD. The results confirm the estimate of 72% three-state prediction accuracy. The fairly accurate predictions of secondary structure segments made the tool useful as a starting point for modeling of higher dimensional aspects of protein structure. © 1995 Wiley-Liss, Inc.
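PHD's gain comes from giving the network a profile derived from a multiple sequence alignment rather than the single sequence. A minimal sketch of building a per-position frequency profile, assuming the alignment has already been computed; PHD's actual sequence weighting and network input encoding are not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def alignment_profile(alignment):
    """Per-column amino acid frequencies from equal-length aligned sequences (gaps skipped)."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] in AMINO_ACIDS]
        n = len(residues)
        profile.append({aa: (residues.count(aa) / n if n else 0.0) for aa in AMINO_ACIDS})
    return profile

msa = ["ACDE", "ACDE", "SC-E", "AVDE"]  # hypothetical alignment; '-' marks a gap
prof = alignment_profile(msa)
print(prof[0]["A"], prof[1]["C"], prof[2]["D"])
```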

13.
Protein chemical shifts encode detailed structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches have been used to infer correlations between chemical shifts and secondary structure from experimental chemical shifts. These methods range from simple statistics such as the chemical shift index to complex methods using neural networks. Notwithstanding their higher accuracy, more complex approaches tend to obscure the relationship between secondary structure and chemical shift and often involve many parameters that need to be trained. We present hidden Markov models (HMMs) with Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. Using these distributions as outputs of first‐ and second‐order HMMs, we achieve a prediction accuracy of 82.3%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts. Incorporation of sequence‐based secondary structure prediction into our HMM improves the prediction accuracy to 84.0%. Our findings suggest that an HMM with correlated Gaussian distributions conditioned on the secondary structure provides an adequate generative model of chemical shifts. Proteins 2013; © 2012 Wiley Periodicals, Inc.
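The decoding step can be sketched as Viterbi over a first-order HMM whose emissions are one-dimensional Gaussians. All parameters below are hypothetical; unlike the paper, this sketch ignores the amino acid type and uses a single shift value per residue rather than correlated multi-nucleus distributions.

```python
import math

STATES = ["H", "E", "C"]

def gaussian_logpdf(x, mean, sd):
    """Log-density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_gaussian(shifts, start, trans, emit):
    """Most likely state path of a first-order HMM with 1D Gaussian emissions.

    start[s], trans[s][t]: probabilities; emit[s] = (mean, sd) of the shift in state s.
    """
    logv = [{s: math.log(start[s]) + gaussian_logpdf(shifts[0], *emit[s]) for s in STATES}]
    back = []
    for x in shifts[1:]:
        prev = logv[-1]
        cur, ptr = {}, {}
        for t in STATES:
            best_s = max(STATES, key=lambda s: prev[s] + math.log(trans[s][t]))
            cur[t] = prev[best_s] + math.log(trans[best_s][t]) + gaussian_logpdf(x, *emit[t])
            ptr[t] = best_s
        logv.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: logv[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

# Hypothetical parameters: helix shifts positive, strand shifts negative, coil near zero.
start = {s: 1 / 3 for s in STATES}
trans = {s: {t: (0.8 if s == t else 0.1) for t in STATES} for s in STATES}
emit = {"H": (2.5, 0.5), "E": (-1.5, 0.5), "C": (0.0, 0.5)}
shifts = [2.8, 3.1, 2.2, 0.1, -0.2, -1.7, -2.0, -1.2]
print(viterbi_gaussian(shifts, start, trans, emit))
```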

14.
A novel method for predicting the secondary structures of proteins from amino acid sequence has been presented. Protein secondary structure seqlets that are analogous to words in natural language have been extracted. These seqlets capture the relationship between amino acid sequence and the secondary structures of proteins and together form the protein secondary structure dictionary. Moreover, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice is used to represent the results of the word segmentation, and a maximum entropy model is used to calculate the probability of a seqlet being tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are taken as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins from four organisms and compared with the PHD method. The results show that this method outperforms PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining it with locally similar protein sequences obtained by BLAST gives better predictions. The method was also tested on the 50 CASP5 target proteins, yielding a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed and is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.

15.
1 Introduction The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences grows explosively as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…

16.
Cascaded multiple classifiers for secondary structure prediction   Total citations: 11, self-citations: 0, citations by others: 11
We describe a new classifier for protein secondary structure prediction that is formed by cascading together different types of classifiers using neural networks and linear discrimination. The new classifier achieves an accuracy of 76.7% (assessed by a rigorous full jackknife procedure) on a new nonredundant dataset of 496 nonhomologous sequences (obtained from G.J. Barton and J.A. Cuff). This database was especially designed to train and test protein secondary structure prediction methods, and it uses a more stringent definition of homologous sequence than in previous studies. We show that it is possible to design classifiers that can highly discriminate the three classes (H, E, C) with an accuracy of up to 78% for beta-strands, using only a local window and resampling techniques. This indicates that the importance of long-range interactions for the prediction of beta-strands has probably been overestimated in previous work.
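A cascade can be sketched as feeding a second classifier with windows of the first classifier's per-residue outputs, so that it can correct isolated, implausible predictions. A minimal sketch of building those second-stage inputs, assuming a window of 3 and hypothetical first-stage probabilities; the paper's particular combination of neural networks and linear discriminants is not reproduced.

```python
import numpy as np

def second_stage_inputs(first_stage_probs: np.ndarray, window: int = 5) -> np.ndarray:
    """Stack first-stage class probabilities over a window around each residue.

    first_stage_probs: (n_residues, n_classes). Positions beyond the chain are zero-padded.
    Returns (n_residues, window * n_classes) features for the next classifier.
    """
    half = window // 2
    n, k = first_stage_probs.shape
    padded = np.vstack([np.zeros((half, k)), first_stage_probs, np.zeros((half, k))])
    return np.stack([padded[i:i + window].ravel() for i in range(n)])

probs = np.array([[0.7, 0.1, 0.2], [0.6, 0.2, 0.2], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
features = second_stage_inputs(probs, window=3)
print(features.shape)
```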

17.
Prediction of transmembrane spans and secondary structure from the protein sequence is generally the first step in the structural characterization of (membrane) proteins. The preference of a stretch of amino acids in a protein to form secondary structure and its tendency to be placed in the membrane are correlated. Nevertheless, current methods predict either secondary structure or individual transmembrane states. We introduce a method that simultaneously predicts the secondary structure and transmembrane spans from the protein sequence. This approach not only eliminates the necessity to create a consensus prediction from possibly contradicting outputs of several predictors but bears the potential to predict conformational switches, i.e., sequence regions that have a high probability to change for example from a coil conformation in solution to an α‐helical transmembrane state. An artificial neural network was trained on databases of 177 membrane proteins and 6048 soluble proteins. The output is a 3 × 3 dimensional probability matrix for each residue in the sequence that combines three secondary structure types (helix, strand, coil) and three environment types (membrane core, interface, solution). The prediction accuracies are 70.3% for nine possible states, 73.2% for three‐state secondary structure prediction, and 94.8% for three‐state transmembrane span prediction. These accuracies are comparable to state‐of‐the‐art predictors of secondary structure (e.g., Psipred) or transmembrane placement (e.g., OCTOPUS). The method is available as a web server and for download at www.meilerlab.org . Proteins 2013; 81:1127–1140. © 2013 Wiley Periodicals, Inc.
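Because the output is a joint 3 × 3 probability matrix per residue, one natural way to obtain separate three-state secondary structure and environment predictions is to take its row and column marginals. A minimal sketch, assuming secondary structure on rows and environment on columns; the layout and the numbers are hypothetical.

```python
import numpy as np

SS_STATES = ["helix", "strand", "coil"]
ENV_STATES = ["membrane core", "interface", "solution"]

def marginals(joint: np.ndarray):
    """Split a per-residue 3 x 3 joint probability matrix into its two 3-state marginals.

    joint[i, j] = P(secondary structure i, environment j), ordered as in the lists above.
    """
    joint = joint / joint.sum()  # normalize in case the network output does not sum to 1
    return joint.sum(axis=1), joint.sum(axis=0)

joint = np.array([[0.40, 0.10, 0.05],
                  [0.05, 0.05, 0.05],
                  [0.10, 0.10, 0.10]])
ss, env = marginals(joint)
print(SS_STATES[ss.argmax()], ENV_STATES[env.argmax()])
```

The nine-state prediction is simply the argmax over all nine cells of the joint matrix.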

18.
Homaeian L  Kurgan LA  Ruan J  Cios KJ  Chen K 《Proteins》2007,69(3):486-498
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. A significant majority of successful methods for predicting secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, when it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. PSSC-core is shown to provide statistically significantly better results than the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods.
At the same time, it also provides useful insight into the design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure.
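PSSC-core regresses helix (or strand) content on sequence features. A minimal least-squares sketch with NumPy, using synthetic features and coefficients rather than the paper's custom feature set; feature selection is not reproduced, and clipping predictions to [0, 1] is an added assumption.

```python
import numpy as np

def fit_linear_regression(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients (intercept first) for y ~ b0 + X @ b."""
    design = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_content(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predicted content, clipped to the valid fraction range [0, 1]."""
    return np.clip(coef[0] + X @ coef[1:], 0.0, 1.0)

rng = np.random.default_rng(0)
X = rng.random((200, 5))  # hypothetical per-sequence features (e.g., composition terms)
true_coef = np.array([0.1, 0.3, -0.2, 0.15, 0.0, 0.05])
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0, 0.01, size=200)
coef = fit_linear_regression(X, y)
print(np.round(coef, 2))
```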

19.
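Current evaluations of protein secondary structure prediction methods mainly consider prediction accuracy and do not fully account for the influence of a method's own parameters. This paper proposes a new evaluation approach that combines intrinsic and extrinsic evaluation to judge prediction methods. Taking a secondary structure prediction method based on a hybrid parallel genetic algorithm as an example, intrinsic evaluation is used to choose the intrinsic parameters (slice length and number of classes per group) appropriately, which effectively improves prediction accuracy. Extrinsic evaluation, through comparison with other secondary structure prediction algorithms based on stochastic algorithms and with the conclusions provided by CASP, demonstrates the method's effectiveness and correctness. Together these verify that the intrinsic and extrinsic evaluation is objective, fair, and comprehensive.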

20.
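The accuracy of protein secondary structure prediction currently hovers around 75% and is difficult to improve further. This paper analyzes a redundant protein database statistically and, on that basis, argues that the real obstacle to further gains in prediction accuracy is systematic error in the protein database itself, amounting to about 25%, and that this error stems from objective limitations of the experimental conditions.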

