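Cao Chen  Ma Kun 《Chinese Journal of Bioinformatics》2016,14(3):181-187
Protein secondary structure refers to the regularly repeating conformations of the protein backbone. Correctly assigning secondary structure from atomic coordinates is the basis for analyzing protein structure and function, and the assignment plays an important role in protein classification, the discovery of functional motifs, and the understanding of folding mechanisms. Secondary structure information is also widely used in molecular visualization, protein alignment, and protein structure prediction. More than 20 secondary structure assignment methods currently exist. They fall broadly into two classes, hydrogen-bond-based and geometry-based, and the assignments produced by different methods differ considerably. Because no review of protein secondary structure assignment methods is yet available, this paper introduces and summarizes the existing methods.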

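The relationship between codon translation rate and protein secondary structure was analyzed statistically for 119 human proteins and 92 Escherichia coli proteins. From the frequency distribution of m-codon segments across the different secondary structures, we found that translation rate and secondary structure are related in both human and E. coli: at high translation rates, segments tend to encode α-helices and tend not to encode coil; at low translation rates, they tend to encode coil and tend not to encode α-helices; β-sheet structure oscillates markedly with translation rate. Codon usage within a segment is also generally non-uniform: within α-helix segments, the tail of the structure favors codons with high translation rates, the middle favors codons with medium translation rates, and the head uses codons with lower translation rates. These tendencies are unlikely to be attributable to random fluctuation.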

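Application of neural networks to protein secondary structure prediction   Total citations: 3, self-citations: 0, citations by others: 3
The paper introduces the significance of research on protein secondary structure prediction, discusses neural network design issues for this task, reviews in some detail the main progress made in recent years with neural network methods for secondary structure prediction, and looks ahead to the prospects for protein structure prediction.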

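To study the effect of primary structure on protein thermostability, the software DNAMAN was used to analyze the amino acid content of 32 protein sequences from 16 families, and the effect of amino acid composition on thermostability was analyzed statistically. Comparing the high- and low-temperature protein sequences within each family, and across all high- and low-temperature sequences in the 16 families, indicates that (from low to high temperature) Ser and Cys content decreases significantly, while Arg, Ile, and Pro content increases significantly. It follows that high-temperature proteins tend to contain hydrophobic amino acids and to avoid hydrophilic ones.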

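[Objective] To compute the correlations among 48 amino acid properties and analyze their influence on protein secondary structure. [Methods] The properties of the 20 amino acids were analyzed with the R software. [Results] Three properties, including side-chain dihedral angle flexibility, had correlation coefficients with Pα of 0.412, 0.832, 0.477, etc., and are positively correlated with α-helix formation, whereas the compressibility coefficient had a correlation coefficient with Pα of -0.293 and is negatively correlated with it. Seventeen properties, including a hydrophobicity-related thermal property (疏水介热性), had correlation coefficients with Pβ of 0.536, 0.867, etc., and are positively correlated with β-sheet formation, whereas six properties, including amino acid polarity, had correlation coefficients with Pβ of -0.547, etc., and are negatively correlated with it. Root-mean-square fluctuational displacement had correlation coefficients with Pt and Pc of 0.742 and 0.73 and is positively correlated with the formation of β-turns and random coil, whereas 19 properties, including bulkiness, had correlation coefficients with Pt and Pc of -0.635, -0.626, etc., and are negatively correlated with both. [Conclusion] The intrinsic properties of amino acids have positively or negatively correlated effects on protein secondary structure.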

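The accuracy of protein secondary structure prediction currently hovers around 75% and is difficult to improve further. Using statistical methods, this paper analyzes a redundant protein database and thereby shows that the real factor limiting further improvement in prediction accuracy is the systematic error of the protein database itself, which is about 25%. This error arises from objective limitations of the experimental conditions.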

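Protein secondary structure prediction is an important step in protein structure research. As many new prediction methods have been proposed, new secondary structure prediction servers have also continued to appear. This study selected seven commonly used servers (PSRSM, SPOT-1D, MUFOLD, Spider3, RaptorX, Psipred, and Jpred4), described how to use them, and evaluated their prediction performance. A total of 180 proteins released by the PDB between August and November 2018 were randomly selected as the test set. Seven evaluation measures were used: Q3, Sov, boundary recognition rate, interior recognition rate, turn (C) recognition rate, strand (E) recognition rate, and helix (H) recognition rate. The Q3 results of these servers on the 180 test proteins were 89.96%, 88.18%, 86.74%, 85.77%, 83.61%, 79.72%, and 78.29%, respectively, so PSRSM gave the best predictions. When the 180-protein test set was partitioned at 30%, 40%, and 70% homology, the Q3 results of PSRSM were 89.49%, 90.53%, and 89.87%, respectively, better than all other servers. The results suggest that secondary structure prediction can be advanced further by combining multiple deep learning methods and by training models on large data sets.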
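The Q3 and per-state recognition rates used in this evaluation can be illustrated with a minimal Python sketch. It assumes the usual definition of Q3 as the percentage of residues whose three-state label (H, E, C) is predicted correctly; the function names and the toy strings are hypothetical and are not taken from the study.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def state_recognition_rate(predicted: str, observed: str, state: str) -> float:
    """Percentage of residues observed in `state` that are also predicted as `state`."""
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        raise ValueError(f"state {state!r} does not occur in the observed string")
    hits = sum(predicted[i] == state for i in positions)
    return 100.0 * hits / len(positions)


if __name__ == "__main__":
    # Hypothetical six-residue example: one strand residue is mispredicted as coil.
    observed = "HHHEEC"
    predicted = "HHHECC"
    print(round(q3_score(predicted, observed), 2))
    print(state_recognition_rate(predicted, observed, "E"))
```

The boundary and interior recognition rates and the Sov score need segment-level bookkeeping and are not covered by this sketch.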

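Vacuum-ultraviolet circular dichroism study of protein secondary structure   Total citations: 2, self-citations: 0, citations by others: 2
Using a synchrotron-radiation vacuum-ultraviolet circular dichroism spectrometer and a specially made sample cell, the vacuum-ultraviolet circular dichroism spectra of proteins in solution were measured down to a wavelength of 175 nm, and a new computational method was applied to calculate the contents of five types of protein secondary structure. The results agree with those determined by X-ray diffraction. Several important factors for obtaining good vacuum-ultraviolet circular dichroism spectra are discussed. The results show that vacuum-ultraviolet circular dichroism is currently one of the better methods for determining protein secondary structure in solution.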

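In recent years, research on protein supersecondary structure (supersecondary motifs, Motifs) has become an international hot topic, and related research papers have begun to appear in China as well. A protein supersecondary structure is a further combination of two or more regular secondary structure elements, or can be viewed as a local folding of secondary structure. This article reviews the definition and characteristics of protein motif structures and the significance of studying this structural level, and briefly introduces the progress of protein motif research.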

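Predicting protein supersecondary structure with artificial neural networks   Total citations: 10, self-citations: 0, citations by others: 10
Protein supersecondary structure, composed of secondary structure elements such as α-helices and β-sheets together with the short peptides that connect them, is an important level in protein structure research. The prediction of protein supersecondary structure is still at an exploratory stage, and no mature method exists. Artificial neural network prediction is a new method developed in recent years for secondary structure prediction. This paper successfully introduces artificial neural networks into the prediction of protein supersecondary structure. The results show that the occurrence of supersecondary structure is closely related to the local amino acid sequence pattern, and that it can be predicted from the primary structure sequence of the protein…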

Vries JK  Liu X  Bahar I 《Proteins》2007,68(4):830-838
An n-gram pattern (NP{n,m}) in a protein sequence is a set of n residues and m wildcards in a window of size n+m. Each window of n+m amino acids is associated with a collection of NP{n,m} patterns based on the combinatorics of n+m objects taken m at a time. NP{n,m} patterns that are shared between sequences reflect evolutionary relationships. Recently the authors developed an alignment-independent protein classification algorithm based on shared NP{4,2} patterns that compared favorably to PSI-BLAST. Theoretically, NP{4,2} patterns should also reflect secondary structure propensity since they contain all possible n-grams for 1 ≤ n ≤ 4 and a window of 6 residues is wide enough to capture periodicities in the 2 ≤ n ≤ 5 range. This sparked interest in differentiating the information content in NP{4,2} patterns related to evolution from the content related to local propensity. The probability of alpha-, beta-, and coil components was determined for every NP{4,2} pattern over all the chains in the Protein Data Bank (PDB). An algorithm exclusively based on the Z-values of these distributions was developed, which accurately predicted 71-76% of alpha-helical segments and 62-67% of beta-sheets in rigorous jackknife tests. This provided evidence for the strong correlation between NP{4,2} patterns and secondary structure. By grouping PDB chains into subsets with increasing levels of sequence identity, it was also possible to separate the evolutionary and local propensity contributions to the classification process. The results showed that information derived from evolutionary relationships was more important for beta-sheet prediction than alpha-helix prediction.
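The combinatorics behind NP{n,m} patterns can be made concrete with a short Python sketch. It assumes that a pattern is obtained by replacing m of the n+m window positions with a wildcard character, so that each window yields C(n+m, m) patterns (15 for NP{4,2}). The function names, the `*` wildcard symbol, and the decision to keep patterns with leading or trailing wildcards are illustrative assumptions, not details confirmed by the abstract.

```python
from itertools import combinations


def window_patterns(window: str, m: int, wildcard: str = "*") -> list[str]:
    """All patterns obtained by masking m positions of the window with a wildcard."""
    patterns = []
    for masked in combinations(range(len(window)), m):
        masked_positions = set(masked)
        patterns.append(
            "".join(wildcard if i in masked_positions else residue
                    for i, residue in enumerate(window))
        )
    return patterns


def sequence_patterns(sequence: str, n: int = 4, m: int = 2) -> set[str]:
    """Union of the NP{n,m} patterns over every window of n+m residues."""
    size = n + m
    found = set()
    for start in range(len(sequence) - size + 1):
        found.update(window_patterns(sequence[start:start + size], m))
    return found


if __name__ == "__main__":
    # Hypothetical seven-residue sequence: two windows of six residues each.
    patterns = sequence_patterns("ACDEFGH")
    print(len(window_patterns("ACDEFG", 2)))  # 15 = C(6, 2)
    print(len(patterns))
```

Under these assumptions, the patterns shared by two sequences are simply the intersection of their pattern sets, for example `sequence_patterns(a) & sequence_patterns(b)`.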

Jia M  Luo L  Liu C 《Biopolymers》2004,73(1):16-26
A new integrated sequence-structure database, called IADE (Integrated ASTRAL-DSSP-EMBL), incorporating matching mRNA sequence, amino acid sequence, and protein secondary structural data, is constructed. It includes 648 protein domains. Based on the IADE database, we studied the relation between RNA stem-loop frequencies and protein secondary structure. It was found that the alpha-helices and beta-strands on proteins tend to be preferably "coded" by mRNA stem region, while the coils on proteins tend to be preferably "coded" by mRNA loop region. These tendencies are more obvious if we observe the structural words (SWs). An SW is defined by a four-amino-acid-fragment that shows the pronounced secondary structural (alpha-helix or beta-strand) propensity. It is demonstrated that the deduced correlation between protein and mRNA structure can hardly be explained as the stochastic fluctuation effect.

Homaeian L  Kurgan LA  Ruan J  Cios KJ  Chen K 《Proteins》2007,69(3):486-498
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. Significant majority of successful methods for prediction of the secondary structure is based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. 
At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure.

For a long time, NMR chemical shifts have been used to identify protein secondary structures. Currently, this is accomplished through comparing the observed ¹Hα, ¹³Cα, ¹³Cβ, or ¹³C′ chemical shifts with the random coil values. Here, we present a new protocol, which is based on the joint probability of each of the three secondary structural types (beta-strand, alpha-helix, and random coil) derived from chemical-shift data, to identify the secondary structure. In combination with empirical smooth filters/functions, this protocol shows significant improvements in the accuracy and the confidence of identification. Updated chemical-shift statistics are reported, on the basis of which the reliability of using chemical shift to identify protein secondary structure is evaluated for each nucleus. The reliability varies greatly among the 20 amino acids, but, on average, is in the order of: ¹³Cα > ¹³C′ > ¹Hα > ¹³Cβ > ¹⁵N > ¹HN to distinguish an alpha-helix from a random coil; and ¹Hα > ¹³Cβ > ¹HN ≈ ¹³Cα ≈ ¹³C′ ≈ ¹⁵N for a beta-strand from a random coil. Amide ¹⁵N and ¹HN chemical shifts, which are generally excluded from the application, in fact, were found to be helpful in distinguishing a beta-strand from a random coil. In addition, the chemical-shift statistical data are compared with those reported previously, and the results are discussed. A JAVA User Interface program has been developed to make the entire procedure fully automated and is available via http://ccsr3150-p3.stanford.edu.

A novel method for predicting the secondary structures of proteins from amino acid sequence has been presented. The protein secondary structure seqlets that are analogous to the words in natural language have been extracted. These seqlets capture the relationship between amino acid sequence and the secondary structures of proteins and further form the protein secondary structure dictionary. More specifically, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. The word lattice is used to represent the results of the word segmentation, and the maximum entropy model is used to calculate the probability of a seqlet tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins of four organisms respectively and compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% Q3 accuracy and 4.6% SOV accuracy. Combining it with the locally similar protein sequences obtained by BLAST can give better prediction. The method is also tested on the 50 CASP5 target proteins with Q3 accuracy 78.9% and SOV accuracy 77.1%. A web server for protein secondary structure prediction has been constructed which is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.

1 Introduction The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences is exploding as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…

The conformational parameters $P_k$ for each amino acid species (j = 1–20) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where i is the number of the sequential residues in the kth conformational state (k = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an n-residue segment is related to the average probability of finding the segment in the kth state, it becomes a geometric mean, $(P_k)_{av} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue i increasing from 1 to n. We then used $\ln (P_k)_{av}$ to convert a multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n)\sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. However, this is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
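The contrast between the geometric mean described here and the arithmetic mean attributed to the Chou-Fasman algorithm can be sketched in a few lines of Python. The geometric mean is computed through the logarithmic sum, as in the abstract. The parameter values below are hypothetical, chosen only to mimic a segment that mixes strong formers and breakers, and the function names are illustrative.

```python
import math


def geometric_mean_parameter(params: list[float]) -> float:
    """Geometric mean of per-residue parameters, computed as exp of the mean log."""
    if not params or any(p <= 0 for p in params):
        raise ValueError("parameters must be a non-empty list of positive values")
    mean_log = sum(math.log(p) for p in params) / len(params)
    return math.exp(mean_log)


def arithmetic_mean_parameter(params: list[float]) -> float:
    """Plain arithmetic mean of the same per-residue parameters."""
    if not params:
        raise ValueError("parameters must be a non-empty list")
    return sum(params) / len(params)


if __name__ == "__main__":
    # Hypothetical four-residue segment: two strong formers and two strong breakers.
    segment = [1.4, 1.4, 0.5, 0.5]
    print(arithmetic_mean_parameter(segment))
    print(geometric_mean_parameter(segment))
```

For positive values the geometric mean never exceeds the arithmetic mean, and the gap widens as the per-residue values spread out, which is in line with the remark that the two approaches diverge when strong formers and breakers are present.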

The conditional probability, P(sigma/x), is a statement of the probability that the value of sigma will be found given the prior information that a value of x has been observed. Here sigma represents any one of the secondary structure types, alpha, beta, tau, and rho for helix, sheet, turn, and random, respectively, and x represents a sequence attribute, including, but not limited to: (1) hydropathy; (2) hydrophobic moments assuming helix and sheet; (3) Richardson and Richardson helical N-cap and C-cap values; (4) Chou-Fasman conformational parameters for helix, P alpha, for sheet, P beta, and for turn, P tau; and (5) Garnier, Osguthorpe, and Robson (GOR) information values for helix, I alpha, for sheet, I beta, for turn, I tau, and for random structure, I rho. Plots of P(sigma/x) vs. x are demonstrated to provide information about the correlation between structure and attribute, sigma and x. The separations between different P(sigma/x) vs. x curves indicate the capacity of a given attribute to discriminate between different secondary structural types and permit comparison of different attributes. P(alpha/x), P(beta/x), P(tau/x) and P(rho/x) vs. x plots show that the most useful attributes for discriminating helix are, in order: hydrophobic moment assuming helix greater than P alpha much greater than N-cap greater than C-cap approximately I alpha approximately I tau. The information value for turns, I tau, was found to discriminate helix better than turns. Discrimination for sheet was found to be in the following order: I beta much greater than P beta approximately hydropathy greater than I rho approximately hydrophobic moment assuming sheet. Three attributes, at their low values, were found to give significant discrimination for the absence of helix: I alpha approximately P alpha approximately hydrophobic moment assuming helix. Also, three other attributes were found to indicate the absence of sheet: P beta much greater than I rho approximately hydropathy. 
Indications of the absence of sigma could be as useful for some applications as the indication of the presence of sigma.
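One simple way to estimate P(sigma/x) vs. x curves of this kind from labeled residues is to bin the attribute x and take the relative frequency of each structure type within each bin. The Python sketch below does this. The binning scheme, the function name, and the toy data are assumptions made for illustration and are not taken from the abstract.

```python
import math
from collections import Counter, defaultdict


def conditional_probabilities(
    values: list[float], labels: list[str], bin_width: float
) -> dict[int, dict[str, float]]:
    """Estimate P(sigma | x) per bin of x as the relative frequency of each label."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have equal length")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts: dict[int, Counter] = defaultdict(Counter)
    for x, sigma in zip(values, labels):
        counts[math.floor(x / bin_width)][sigma] += 1
    table = {}
    for bin_index, counter in counts.items():
        total = sum(counter.values())
        table[bin_index] = {sigma: k / total for sigma, k in counter.items()}
    return table


if __name__ == "__main__":
    # Hypothetical attribute values (e.g. a hydropathy-like score) and structure labels.
    x = [0.2, 0.7, 0.9, 1.5, 1.8]
    sigma = ["alpha", "alpha", "beta", "beta", "rho"]
    for bin_index, probs in sorted(conditional_probabilities(x, sigma, 1.0).items()):
        print(bin_index, probs)
```

Keys of the returned table are bin indices, so bin `b` covers the attribute range from `b * bin_width` up to but not including `(b + 1) * bin_width`. A label missing from a bin corresponds to an estimated probability of zero there.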

