Similar articles
16 similar articles found (search time: 203 ms)
1.
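Research on the rules of protein folding is one of the important frontier topics in the life sciences, and classification of protein fold types is the basis of that research. Taking the fold-type classification of the SCOP database as a basis, this study examined the fold types of the α, β, α+β, and α/β classes in the Astral SCOPe 2.05 database with sequence similarity below 40%, constructed templates for 989 protein fold types, and assembled them into a template database. A fold-type classification method based on the designed templates was established, enabling automated classification of the protein fold types in SCOP. For the family templates, the mean sensitivity, specificity, and MCC were 95.00%, 99.99%, and 0.94 in the self-consistency test and 90.00%, 99.97%, and 0.92 in the independence test; for the fold-type templates, the corresponding values were 93.71%, 99.97%, and 0.91 and 86.00%, 99.93%, and 0.87. The results indicate that the templates are reasonably designed and can be used effectively to classify proteins of known structure.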
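
The sensitivity, specificity, and MCC figures quoted in these abstracts follow from a one-vs-rest confusion matrix for each fold type. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from any of the papers.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) for one fold type vs. the rest.

    tp/fn: members of the fold type that were / were not recognized.
    tn/fp: non-members that were correctly rejected / wrongly accepted.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is returned by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical counts for a single fold type (illustration only).
sens, spec, mcc = binary_metrics(tp=90, fp=5, tn=9900, fn=10)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} MCC={mcc:.4f}")
```

Averaging these per-fold values over all fold types would give "mean" figures of the kind reported above, assuming the papers average per fold type.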

2.
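《生命科学研究》 (Life Science Research), 2016, (5): 381-388
Protein fold-type recognition is an important part of protein structure research, and fold-type classification is the basis of fold recognition. Through a systematic study of the fold types of α-class proteins in the ASTRAL-1.65 database, a template database of protein fold types was built and template feature parameters reflecting the topology of each fold type were extracted. From these feature parameters and TM-align structural alignment results, a feature-parameter-based scoring function, Fdscore, was constructed and used for automated classification of α-class fold types. On the same sample dataset, the Fdscore method achieved a mean sensitivity of 71.86%, a mean specificity of 99.49%, and an MCC of 0.69, all higher than the corresponding results of the TM-score method. These results indicate that the method can be used for automated classification of α-class protein fold types.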
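
The abstract does not give the form of Fdscore, so it is not reproduced here. The TM-score baseline it is compared against has a standard published form, which can be sketched as follows. The function names are hypothetical, the 0.5 Å floor on d0 for very short chains is an assumption, and the code evaluates one fixed superposition only, whereas TM-align searches for the superposition that maximizes the score.

```python
def tm_d0(l_target):
    """Length-dependent distance scale d0 (in angstroms) used by TM-score."""
    if l_target <= 15:
        return 0.5  # assumed floor; the formula is not meaningful for tiny chains
    return max(1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8, 0.5)

def tm_score(distances, l_target):
    """TM-score for one superposition.

    distances: C-alpha distances (angstroms) of the aligned residue pairs.
    l_target:  length of the target (reference) protein used for normalization.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target

# Hypothetical aligned-pair distances for a 100-residue target.
example = [0.5] * 60 + [3.0] * 20
print(round(tm_score(example, 100), 3))
```

Unaligned residues contribute nothing to the sum, which is why the score is normalized by the target length rather than by the number of aligned pairs.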

3.
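Protein fold-type classification is an important part of protein classification research. Taking the PH domain-like barrel fold type in the SCOP database as the object of study, 61 samples with sequence similarity below 25% were selected as the test set. Through structural feature analysis, the template of this fold type and its corresponding feature parameters were determined. Using the spatial structure alignment between the template and the query protein, a new fold-type scoring function, Fscore, was proposed, and an Fscore-based classification method was established and applied to this fold type. Tested on 16,711 samples from Astral 1.75 with sequence similarity below 95%, the method gave a classification specificity of 99.97%. The results indicate that the feature parameters capture the essence of the fold type, and that Fscore and the classification method built on it can be used for automated classification of the PH domain-like barrel fold type.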

4.
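Classification method for α/β-class protein fold types
Ma Shuai, Wang Qin, Li Xiaoqin. 《生物信息学》 (Chinese Journal of Bioinformatics), 2014, 12(2): 123-132
Research on the rules of protein folding is one of the major frontier topics in the life sciences, and fold classification is the basis of protein folding research. Based on the LIFCA database, this paper selected 55 α/β-class protein fold types with more than 2 samples each as the object of study. Combining the definition of each fold type with its conserved topological features, the templates of the 55 fold types and their corresponding feature parameters were determined. A template-based scoring function, Mul-Fscore, was established and, combined with secondary structure parameters, used to build a multi-template classification method for the 55 α/β-class fold types. Tested on 931 samples from the LIFCA database, the method gave a mean specificity of 99.58%, a mean sensitivity of 79.47%, and an MCC of 79.39%. Compared with TM-score classification, Mul-Fscore gave better sensitivity and MCC, with similar mean specificity.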

5.
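Research on the rules of protein folding is a major frontier topic in the life sciences, and fold-type classification is the basis of protein folding research. A template database for the BRD-like fold type was constructed, and a comprehensive multi-template classification method was established and applied to the classification of this fold type. Tested on the 12,117 samples of the experimental set, the method gave a sensitivity of 0.923, a specificity of 0.997, and an MCC of 0.72; on an independent test set of 2,260 samples, it gave a sensitivity of 0.941, a specificity of 0.998, and an MCC of 0.86. The results indicate that the comprehensive multi-template method can be used for protein fold-type classification.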

6.
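Research on the spatial structure of proteins is an important topic in molecular biology, cell biology, biochemistry, and drug design. A fold type reflects the topological pattern of a protein's core structure, and fold-type recognition is an important part of research on the relationship between protein sequence and structure. Fifty-three fold types with relatively large sample sizes were selected from the LIFCA database, and the functional domain composition method was applied for fold recognition. With samples from Astral 1.65 with sequence identity below 95% as the test set, the whole-database test gave a mean sensitivity of 96.42%, a specificity of 99.91%, and a Matthews correlation coefficient (MCC) of 0.91. These statistics indicate that the functional domain composition method is well suited to protein fold recognition, and that the relatively simple classification rules of LIFCA capture most of the functional properties of proteins, reflecting the correspondence between structure and function.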

7.
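Classification and recognition of doubly wound proteins
Protein fold recognition is an important part of protein structure research. The doubly wound fold is a common, structurally typical fold type among α/β proteins. Seventy-nine typical doubly wound proteins with sequence identity below 25%, drawn from 22 families, were selected as the training set and clustered hierarchically using RMSD as the measure, and a structure-alignment-based profile hidden Markov model (profile-HMM) was built for each cluster. With 9,505 samples from Astral 1.65 with sequence identity below 95% as the test set, the overall recognition sensitivity was 93.9%, the specificity 82.1%, and the MCC 0.876. The results indicate that for fold types with many members, for which no single unified model can be built, modeling each cluster separately can achieve recognition with relatively high accuracy.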

8.
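Protein fold pattern recognition is an important method for analyzing protein structure. With proteins of low sequence similarity as the training set, information such as sequence information frequencies and hydrophobicity was extracted as fold-type features. A dataset of 1,393 fold patterns was constructed from classified proteins in the SCOP database, and an SVM was used to predict the 1,393 fold patterns. Accuracy was 99.6122% in the closed test and 79.6329% in the SCOP-based open test. In an open test on another authoritative test set, fold accuracy was 64.7059% and SCOP class accuracy was 76.4706%. The method can effectively predict protein fold patterns and thus provides a reference for ab initio protein prediction.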

9.
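Earlier related studies found that mRNA secondary structure contains factors that strongly influence protein folding rates. Various complex loop structures are widespread in mRNA secondary structure. Do these loops also have an important influence on protein folding rates, and is the influence the same for different loop types? To address these questions, a database was built containing information on mRNA loop structures, including internal loops, hairpin loops, bulge loops, and multi-branch loops, together with the corresponding protein folding rates. For each protein in the database, parameters such as the base content of each loop type, the paired-base content, and the single-stranded base content of the mRNA secondary structure were calculated, and the correlation of each parameter with the corresponding protein folding rate was analyzed. The results show that the base content of every loop type is significantly or highly significantly positively correlated with protein folding rate, indicating that mRNA loop structures have an important influence on protein folding rates. The proteins were then grouped by fold type or by secondary structure type, and the analysis was repeated for each group. The results show that the influence of the mRNA loop structures on protein folding rate differs significantly between protein classes. This study lays a theoretical foundation for further research on mRNA and protein folding rates.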
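
As a rough illustration of the kind of analysis described, the sketch below computes paired and single-stranded base fractions from a secondary structure and correlates one parameter with folding rates. Several things here are assumptions rather than details from the abstract: the dot-bracket input format, the use of the Pearson coefficient, the function names, and all example values.

```python
import math

def base_fractions(dot_bracket):
    """Return (paired, unpaired) base fractions of a dot-bracket structure.

    '(' and ')' are counted as paired; every other symbol is counted as unpaired.
    """
    n = len(dot_bracket)
    if n == 0:
        raise ValueError("empty structure")
    paired = sum(ch in "()" for ch in dot_bracket)
    return paired / n, (n - paired) / n

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical structures and folding rates (made up for illustration).
structures = ["((((....))))", "((..))......", "(((..)))...."]
rates = [2.1, 4.8, 3.0]
unpaired = [base_fractions(s)[1] for s in structures]
print(round(pearson_r(unpaired, rates), 3))
```

Separating internal, hairpin, bulge, and multi-branch loops would require parsing the bracket nesting, which this sketch does not attempt.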

10.
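A protein fold database covering 27 fold types was constructed from 1,895 protein sequences with sequence similarity below 40%. Starting from the protein sequence, motif frequencies, low-frequency power spectral density values, amino acid composition, predicted secondary structure information, and autocorrelation function values were combined into a vector representing the sequence information. Using a support vector machine with an overall classification strategy, the fold types of the 27 fold classes were predicted, with an independent-test accuracy of 66.67%. With the same feature parameters and algorithm, the 4 structural classes of the 27 folds were also predicted, with an independent-test accuracy of 89.24%. Applying the same method to the 27-fold database used by previous authors gave better results than those previously reported.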
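
Of the feature groups listed, amino acid composition is the simplest to make concrete: a 20-dimensional vector of residue frequencies. The sketch below is a minimal illustration under stated assumptions; the function name, the residue ordering, and the choice to skip non-standard letters are not taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, assumed ordering

def amino_acid_composition(sequence):
    """Return the 20 residue frequencies of a protein sequence.

    Letters outside the 20 standard residues (for example 'X') are skipped,
    so the returned frequencies always sum to 1.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical short sequence (illustration only).
features = amino_acid_composition("MKTAYIAKQR")
print(len(features), round(sum(features), 6))
```

In a pipeline like the one described, such a vector would be concatenated with the other feature groups before being passed to the support vector machine.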

11.
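Study on a method for protein fold-type recognition
Protein fold-type recognition is an important method for analyzing protein structure. Taking 822 all-β proteins with sequence similarity below 25% as the object of study, the secondary structure segments of the core structure and the hydrogen-bond interactions between segments were extracted as fold-type feature parameters, and a template database of 74 all-β fold types was built. A secondary structure matching function SS, a hydrogen-bond interaction potential function BP, and a scoring function P between the query protein and each fold-type template were defined; the fold type of the template with the smallest P value is assigned to the query protein. Three groups of 50 all-β protein domains each were drawn at random from SCOP1.69 for prediction, giving accuracies of 56%, 56%, and 42%; prediction on the test set provided by Ding et al. gave an overall accuracy of 61.5%. The results and comparisons indicate that this is an effective fold-type recognition method.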

12.
Taylor WR. FEBS Letters, 2006, 580(22): 5263-5267
A novel measure, called "topological accessibility," quantifies how easy it is to reconstruct a protein structure using only local contacts when starting at any point on the chain. Plotting this measure for all points in the chain gives a picture of how accessible the fold is. Simple folds are accessible from all positions, others are accessible only from limited positions, while the most complex folds are not accessible from any position. The distribution of topological accessibility along the chain was found to be completely symmetric for the all-alpha and all-beta protein classes. However, for the beta/alpha class, a distinct asymmetry was found (with probability 10^-30 of being due to chance). Examination of the proteins contributing to this signal indicated many that have an ancient origin. This suggests that the folds of these proteins may have become fixed under the influence of amino-terminal folding before the advent of chaperone-assisted folding.

13.
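A fast and effective protein fold recognition method built on image features
Automatic classification of protein structures is an important means of exploring protein structure-function relationships. The three-dimensional structure of a protein fold is first mapped to a two-dimensional distance matrix, which is treated as a grayscale image. A computationally simple method for extracting fold structure features is then proposed, based on the gray-level histogram and the gray-level co-occurrence matrix, yielding low-dimensional features that reflect the characteristics of the fold structure. The cause of the isolated zero-gray peak in the histogram is explained, and the structural meaning of the gray-level distribution, the different angles, and the pixel distances in the co-occurrence matrix features is analyzed in depth. Finally, the method is applied to classification of the 27 fold classes, reaching an accuracy of 71.95% on the independent test set and 78.94% in 10-fold cross-validation over all data. Comparison with several sequence-based and structure-based fold recognition methods shows that the method has low-dimensional, concise features, needs no complex classification system, and can recognize multiple fold classes effectively and efficiently.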
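
The pipeline described (coordinates, then distance matrix, then grayscale image, then histogram and co-occurrence matrix) can be sketched in plain Python. This is an illustration only: the linear quantization, the number of gray levels, the pixel offset, and all function names are assumptions, and the paper's actual feature definitions may differ.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def to_gray(dmat, levels=8):
    """Quantize distances linearly to integer gray levels 0..levels-1."""
    d_max = max(max(row) for row in dmat) or 1.0  # avoid dividing by zero
    return [[min(int(d / d_max * levels), levels - 1) for d in row] for row in dmat]

def gray_histogram(img, levels=8):
    """Normalized gray-level histogram of the image."""
    hist = [0] * levels
    for row in img:
        for g in row:
            hist[g] += 1
    total = sum(hist)
    return [h / total for h in hist]

def glcm(img, dx=1, dy=0, levels=8):
    """Gray-level co-occurrence counts for pixel pairs separated by (dx, dy)."""
    m = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(img), len(img[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                m[img[r][c]][img[r2][c2]] += 1
    return m

# Hypothetical three-residue chain along the x axis (illustration only).
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
img = to_gray(distance_matrix(coords), levels=4)
print(img)
print(gray_histogram(img, levels=4))
print(glcm(img, dx=1, dy=0, levels=4))
```

Changing `dx` and `dy` corresponds to the different angles and pixel distances mentioned in the abstract; texture statistics computed from the resulting matrices would form the feature vector.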

14.
Identification of protein fold types has usually been based on the 27-class fold dataset provided by Ding & Dubchak in 2001. With the avalanche of protein sequences, however, fold data are also expanding, so improving the existing dataset and covering more fold types is an inevitable trend. In this paper, we construct a multi-class protein fold dataset containing 3,457 protein chains with sequence identity below 35%, which can be classified into 76 fold types. It is 4 times larger than Ding & Dubchak's dataset. Furthermore, we propose a novel support vector machine approach based on optimal features. Combining motif frequency, low-frequency power spectral density, amino acid composition, predicted secondary structure, and autocorrelation function values as the feature parameter set, the method applies the criterion of maximum correlation and minimum redundancy to filter these features and obtain a 95-dimensional optimal feature subset. Based on the ensemble classification strategy, with the 95-dimensional optimal features as input to the support vector machine, we identify the 76-class protein folds with an overall accuracy of 44.92% in an independent test. In addition, the method was used to identify an upgraded 27-class fold set, with an overall accuracy of 66.56%. Finally, we also tested the method on Ding & Dubchak's 27-class fold dataset and obtained better identification results than most previously reported results.

15.
To examine how a short secondary structural element derived from a native protein folds when in a different protein environment, we inserted an 11-residue beta-sheet segment (cassette) from human immunoglobulin fold, Fab new, into an alpha-helical coiled-coil host protein (cassette holder). This de novo design protein model, the structural cassette mutagenesis (SCM) model, allows us to study protein folding principles involving both short- and long-range interactions that affect secondary structure stability and conformation. In this study, we address whether the insertion of this beta-sheet cassette into the alpha-helical coiled-coil protein would result in conformational change nucleated by the long-range tertiary stabilization of the coiled-coil, therefore overriding the local propensity of the cassette to form beta-sheet, observed in its native immunoglobulin fold. The results showed that not only did the nucleating helices of the coiled-coil on either end of the cassette fail to nucleate the beta-sheet cassette to fold with an alpha-helical conformation, but also the entire chimeric protein became a random coil. We identified two determinants in this cassette that prevented coiled-coil formation: (1) a tandem dipeptide NN motif at the N-terminal of the beta-sheet cassette, and (2) the hydrophilic Ser residue, which would be buried in the hydrophobic core if the coiled-coil structure were to fold. By amino acid substitution of these helix disruptive residues, that is, either the replacement of the NN motif with high helical propensity Ala residues or the substitution of Ser with Leu to enhance hydrophobicity, we were able to convert the random coil chimeric protein into a fully folded alpha-helical coiled-coil. We hypothesized that this NN motif is a "secondary structural specificity determinant" which is very selective for one type of secondary structure and may prevent neighboring residues from adopting an alternate protein fold. 
These sequences with secondary structural specificity determinants have very strong local propensity to fold into a specific secondary structure and may affect overall protein folding by acting as a folding initiation site.

16.
In a natively folded protein of moderate or larger size, the protein backbone may weave through itself in complex ways, raising questions about what sequence of events might have to occur in order for the protein to reach its native configuration from the unfolded state. A mathematical framework is presented here for describing the notion of a topological folding barrier, which occurs when a protein chain must pass through a hole or opening, formed by other regions of the protein structure. Different folding pathways encounter different numbers of such barriers and therefore different degrees of frustration. A dynamic programming algorithm finds the optimal theoretical folding path and minimal degree of frustration for a protein based on its natively folded configuration. Calculations over a database of protein structures provide insights into questions such as whether the path of minimal frustration might tend to favor folding from one or from many sites of folding nucleation, or whether proteins favor folding around the N terminus, thereby providing support for the hypothesis that proteins fold co-translationally. The computational methods are applied to a multi-disulfide bonded protein, with computational findings that are consistent with the experimentally observed folding pathway. Attention is drawn to certain complex protein folds for which the computational method suggests there may be a preferred site of nucleation or where folding is likely to proceed through a relatively well-defined pathway or intermediate. The computational analyses lead to testable models for protein folding.
