Similar literature
 20 similar records found
1.
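Fold recognition for Globin-like proteins   Cited by: 2 (self-citations: 0, by others: 2)
Protein fold recognition is an important part of protein structure research. Taking the Globin-like fold in SCOP as the object of study, 17 representative proteins with sequence identity below 25% were selected as the training set. Structure alignment was carried out by combining automated and manual methods to produce a sequence alignment, and training yielded a profile hidden Markov model (profile HMM) suited to the Globin-like fold, which was then used to recognize this fold type. Tested on 68,057 domain samples from Astral 1.65, the model achieved a recognition sensitivity of 99.64% and a specificity of 100%. At the fold level, compared with the HMMs that Pfam and SUPERFAMILY build from sequence alignment alone, the number of models used fell from more than 100 to one while recognition performance remained high. The results show that for proteins with very low sequence similarity but the same fold type, a single unified HMM can be built by introducing structure alignment, giving highly accurate fold recognition.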

2.
A classification method and classification database for protein fold types   Cited by: 1 (self-citations: 0, by others: 1)
李晓琴  仁文科  刘岳  徐海松  乔辉 《生物信息学》2010,8(3):245-247,253
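Research on the principles of protein folding is a major frontier in the life sciences, and fold classification is the foundation of protein folding research. At present, protein fold classification is done largely by experts, and different databases classify differently, so a protein fold database built on a unified principle is urgently needed. Starting from the proteins in the ASTRAL-1.65 database with sequence homology below 25% and resolution below 2.5 Å, and drawing on observation of protein spatial structures and analysis of fold-type features, this paper proposes a fold classification method that is centered on the protein folding core, guided by the principle of topological invariance of protein structure, and based on the composition, connectivity, and spatial arrangement of the regular structural segments of the folding core. A low-similarity protein fold classification database, LIFCA, was built, containing 259 protein fold types. The database lays a foundation for further protein fold modeling and data mining, protein fold recognition, and studies of the evolution of protein fold structures.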

3.
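Protein spatial structure is an important topic in molecular biology, cell biology, biochemistry, drug design, and related fields. The fold type reflects the topological pattern of a protein's core structure, and recognizing fold types is an important part of studying the relationship between protein sequence and structure. The 53 fold types with the largest sample sizes in the LIFCA database were selected, and fold recognition was performed with the functional domain composition method. Using the samples in Astral 1.65 with sequence identity below 95% as the test set, the whole-database test gave a mean sensitivity of 96.42%, a specificity of 99.91%, and a Matthews correlation coefficient (MCC) of 0.91. These statistics show that the functional domain composition method works well for protein fold recognition, and that LIFCA's relatively simple classification rules capture most of the functional properties of proteins, reflecting the correspondence between structure and function.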
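
The three statistics quoted in this abstract (sensitivity, specificity, MCC) all derive from the same confusion-matrix counts. The following is a minimal sketch of those standard definitions; the function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for a single fold type (not the paper's data).
sens, spec, mcc = confusion_metrics(tp=90, fp=10, tn=890, fn=10)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} MCC={mcc:.4f}")
```

When negatives vastly outnumber positives, as in a whole-database test against one fold type, specificity can stay near 100% even with many false positives relative to the true positives, which is one reason an MCC is usually reported alongside it.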

4.
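A protein design method based on the hydrophobic-hydrophilic model is presented. The method is grounded in physical principles and uses relative entropy as the objective function for optimization. It was tested on real proteins with native structures from four different structural classes, and the main factors affecting the test success rate were analyzed. The results show that the method is general and can be used to design sequences for proteins of different structural classes.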

5.
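Protein folding   Cited by: 2 (self-citations: 0, by others: 2)
The thermodynamic control and kinetic control hypotheses of protein folding are introduced in detail, several protein folding models are briefly described, and the reasons why polypeptide chains fold rapidly in vivo are analyzed.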

6.
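Research on the principles of protein molecular evolution is a focus of molecular evolution studies and is important for revealing the origin of life and the mechanisms of evolution. For singly wound (单绕) proteins with known spatial structures and species information, this paper uses structure alignment information to construct hierarchical clustering dendrograms of singly wound samples at different levels. The analysis found that functionally similar proteins cluster clearly: samples from the same superfamily mostly fall within one large branch, samples from the same family concentrate in small branches under their superfamily, and under functional constraints the clustering dendrogram of singly wound samples corresponds well to the species evolutionary tree. The results show that the structural evolution of singly wound proteins reflects the constraints of protein function, that structural differences among singly wound samples with a given function are species-specific, and that structural evolution carries information about species evolution.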

7.
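Graph clustering can give good results on protein classification problems, provided that the complex relationships among proteins are first converted into a suitable similarity network that serves as input to the graph clustering step. This paper proposes a method for constructing similarity networks based on BLAST searches: starting from the target protein sequence, several rounds of BLAST searches progressively extract from the database the sequences directly or indirectly related to the target protein, forming an association set. The similarity relationships among sequences in the association set constitute the similarity network, which can serve as the basis for classification by a graph clustering algorithm. Calculations on proteins in the Pfam database that are difficult to classify correctly from direct similarity alone show that the similarity networks built with this method give fairly satisfactory results.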
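
The round-by-round expansion described in this abstract can be sketched as a breadth-first search over search hits. This is only an illustration under stated assumptions: `search` is a hypothetical stand-in for one BLAST run (it is not a real BLAST API), and the abstract does not state the score thresholds or number of rounds used, so none are modeled here beyond a `rounds` argument.

```python
def build_association_set(target, search, rounds):
    """Expand from `target` for up to `rounds` rounds of searches.

    `search(seq_id)` is a hypothetical callable returning an iterable of
    (hit_id, score) pairs, standing in for one BLAST query.
    Returns (members, edges): the association set and the similarity
    network as a dict mapping a sorted id pair to a score.
    """
    members = {target}
    frontier = {target}
    edges = {}
    for _ in range(rounds):
        next_frontier = set()
        for query in sorted(frontier):
            for hit, score in search(query):
                if hit == query:
                    continue  # ignore self-hits
                edges[tuple(sorted((query, hit)))] = score
                if hit not in members:
                    members.add(hit)
                    next_frontier.add(hit)
        if not next_frontier:
            break  # nothing new was found, so later rounds add nothing
        frontier = next_frontier
    return members, edges


# Toy hit table with made-up identifiers and scores.
toy_hits = {
    "P0": [("P1", 50)],
    "P1": [("P0", 50), ("P2", 40)],
    "P2": [("P1", 40), ("P3", 30)],
    "P3": [("P2", 30)],
}
members, edges = build_association_set("P0", lambda q: toy_hits.get(q, []), rounds=2)
print(sorted(members), edges)
```

In the toy data, `P2` is never a direct hit of `P0` and enters the set only through `P1`, which is the kind of indirect relation the abstract refers to.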

8.
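Protein fold pattern recognition is an important method for analyzing protein structure. With proteins of low sequence similarity as the training set, sequence information such as frequency counts and hydrophobicity was extracted as fold-type features. A data set of 1,393 fold patterns was built from proteins already classified in the SCOP database, and an SVM was used to predict these 1,393 fold patterns. The closed-test accuracy reached 99.6122%, and the SCOP-based open-test accuracy reached 79.6329%. On another authoritative test set, the open-test fold accuracy reached 64.7059% and the SCOP class accuracy reached 76.4706%. The method can effectively predict protein fold patterns and thus provides a reference for ab initio protein prediction.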

9.
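Protein function prediction based on functional modules of protein networks   Cited by: 1 (self-citations: 0, by others: 1)
In the post-genomic era, now that gene sequences have been deciphered, the rapid development of systems biology experiments has produced large amounts of protein interaction data, and using these data to find functional modules and predict protein function is of great significance in functional genomics. Breaking with the traditional clustering paradigm based on similarity between proteins, this work starts directly from protein functional groups, considers first-order and second-order interactions between functional groups, and proposes a modular clustering method (MCM) that clusters the experimental data to predict the functions of unknown proteins within modules. The predictive power and stability of the clustering results were analyzed using hypergeometric-distribution P-values and by adding, deleting, and altering interactions. The results show that the modular clustering method has high prediction accuracy and coverage as well as good fault tolerance and stability. In addition, the modular clustering analysis produced high-accuracy predictions for some unknown proteins, which may guide biological experiments, and the algorithm is generally applicable to other networks with similar structure.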
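
The hypergeometric P-value mentioned in this abstract is commonly computed as the upper tail of the hypergeometric distribution: the chance of seeing at least `k` proteins with a given function in a module of size `n`, when `K` of the `N` proteins in the network carry that function. The sketch below shows that standard calculation; the parameter names and example numbers are assumptions, and the abstract does not say exactly how the authors applied the test. It relies on `math.comb`, available from Python 3.8.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: proteins in the network, K: proteins annotated with the function,
    n: module size, k: annotated proteins observed in the module.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Made-up example: 10 proteins, 4 annotated, module of 3 with 2 annotated.
p = hypergeom_pvalue(N=10, K=4, n=3, k=2)
print(p)
```

A small P-value indicates that the function is over-represented in the module beyond what random sampling would explain, which is the usual basis for transferring that function to unannotated module members.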

10.
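Protein fold classification is an important part of protein classification research. Taking the PH domain-like barrel fold in the SCOP database as the object of study, 61 samples with sequence similarity below 25% were selected as the test set. Through structural feature analysis, a template for this fold type and its corresponding feature parameters were determined. Using the spatial structure alignment between the template and the query protein, a new fold scoring function, Fscore, was proposed, and an Fscore-based fold classification method was established and applied to this fold type. The method was tested on 16,711 samples from Astral 1.75 with sequence similarity below 95%, giving a classification specificity of 99.97%. The results show that the feature parameters capture the essence of the fold type, and that Fscore and the classification method built on it can be used for automatic classification of the PH domain-like barrel fold.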

11.
The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that can better represent residue-residue and residue-solvent interactions is a crucial way to improve the prediction accuracy. The widely used contact energy functions mostly only consider the contact frequency between different types of residues; however, we find that the contact frequency also relates to the residue hydrophobic environment. Accordingly, we present an improved contact energy function to integrate the two factors, which can reflect the influence of hydrophobic interaction on the stabilization of protein 3D structure more effectively. Furthermore, a fold recognition (threading) approach based on this energy function is developed. The testing results obtained with 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function can improve the accuracy of the fold template prediction from 20% to 50%, and can also improve the accuracy of the sequence-template alignment from 35% to 65%.

12.
We present an analysis of 10 blind predictions prepared for a recent conference, “Critical Assessment of Techniques for Protein Structure Prediction.”1 The sequences of these proteins are not detectably similar to those of any protein in the structure database then available, but we attempted, by a threading method, to recognize similarity to known domain folds. Four of the 10 proteins, as we subsequently learned, do indeed show significant similarity to then-known structures. For 2 of these proteins the predictions were accurate, in the sense that a similar structure was at or near the top of the list of threading scores, and the threading alignment agreed well with the corresponding structural alignment. For the best predicted model mean alignment error relative to the optimal structural alignment was 2.7 residues, arising entirely from small “register shifts” of strands or helices. In the analysis we attempt to identify factors responsible for these successes and failures. Since our threading method does not use gap penalties, we may readily distinguish between errors arising from our prior definition of the “cores” of known structures and errors arising from inherent limitations in the threading potential. It would appear from the results that successful substructure recognition depends most critically on accurate definition of the “fold” of a database protein. This definition must correctly delineate substructures that are, and are not, likely to be conserved during protein evolution. © 1995 Wiley-Liss, Inc.  相似文献   

13.
The mammalian peptidoglycan recognition protein-S (PGRP-S) binds to peptidoglycans (PGNs), which are essential components of the cell wall of bacteria. The protein was isolated from the samples of milk obtained from camels with mastitis and purified to homogeneity and crystallized. The crystals belong to orthorhombic space group I222 with a = 87.0 Å, b = 101.7 Å and c = 162.3 Å having four crystallographically independent molecules in the asymmetric unit. The structure has been determined using X-ray crystallographic data and refined to 1.8 Å resolution. Overall, the structures of all the four crystallographically independent molecules are identical. The folding of PGRP-S consists of a central β-sheet with five β-strands, four parallel and one antiparallel, and three α-helices. This protein fold provides two functional sites. The first of these is the PGN-binding site, located on the groove that opens on the surface in the direction opposite to the location of the N terminus. The second site is implicated to be involved in the binding of non-PGN molecules, it also includes putative N-terminal segment residues (1-31) and helix α2 in the extended binding. The structure reveals a novel arrangement of PGRP-S molecules in which two pairs of molecules associate to form two independent dimers. The first dimer is formed by two molecules with N-terminal segments at the interface in which non-PGN binding sites are buried completely, whereas the PGN-binding sites of two participating molecules are fully exposed at the opposite ends of the dimer. In the second dimer, PGN-binding sites are buried at the interface while non-PGN binding sites are fully exposed at the opposite ends of the dimer. This form of dimeric arrangement is unique and seems to be aimed at enhancing the capability of the protein against specific invading bacteria. This mode of functional dimerization enhances efficiency and specificity, and is observed for the first time in the family of PGRP molecules.  

14.
15.
Improving fold recognition without folds   Cited by: 4 (self-citations: 0, by others: 4)
The most reliable way to align two proteins of unknown structure is through sequence-profile and profile-profile alignment methods. If the structure for one of the two is known, fold recognition methods outperform purely sequence-based alignments. Here, we introduced a novel method that aligns generalised sequence and predicted structure profiles. Using predicted 1D structure (secondary structure and solvent accessibility) significantly improved over sequence-only methods, both in terms of correctly recognising pairs of proteins with different sequences and similar structures and in terms of correctly aligning the pairs. The scores obtained by our generalised scoring matrix followed an extreme value distribution; this yielded accurate estimates of the statistical significance of our alignments. We found that mistakes in 1D structure predictions correlated between proteins from different sequence-structure families. The impact of this surprising result was that our method succeeded in significantly out-performing sequence-only methods even without explicitly using structural information from any of the two. Since AGAPE also outperformed established methods that rely on 3D information, we made it available through. If we solved the problem of CPU-time required to apply AGAPE on millions of proteins, our results could also impact everyday database searches.

16.
Carugo O 《Bioinformation》2010,4(8):347-351
Several non-redundant ensembles of protein three-dimensional structures were analyzed in order to estimate their natural clustering tendency by means of the Cox-Lewis coefficient. It was observed that, although proteins tend to aggregate into distinct and well-separated groups, some overlap between different clusters occurs. This suggests that classifications based only on structural data cannot provide a systematic classification of proteins. In particular, additional information is needed in order to fully track the complex evolutionary relationships between proteins.

17.
    
Snyder DA  Montelione GT 《Proteins》2005,59(4):673-686
An important open question in the field of NMR-based biomolecular structure determination is how best to characterize the precision of the resulting ensemble of structures. Typically, the RMSD, as minimized in superimposing the ensemble of structures, is the preferred measure of precision. However, the presence of poorly determined atomic coordinates and multiple "RMSD-stable domains" (locally well-defined regions that are not aligned in global superimpositions) complicate RMSD calculations. In this paper, we present a method, based on a novel, structurally defined order parameter, for identifying a set of core atoms to use in determining superimpositions for RMSD calculations. In addition we present a method for deciding whether to partition that core atom set into "RMSD-stable domains" and, if so, how to determine partitioning of the core atom set. We demonstrate our algorithm and its application in calculating statistically sound RMSD values by applying it to a set of NMR-derived structural ensembles, superimposing each RMSD-stable domain (or the entire core atom set, where appropriate) found in each protein structure under consideration. A parameter calculated by our algorithm using a novel, kurtosis-based criterion, the epsilon-value, is a measure of precision of the superimposition that complements the RMSD. In addition, we compare our algorithm with previously described algorithms for determining core atom sets. The methods presented in this paper for biomolecular structure superimposition are quite general, and have application in many areas of structural bioinformatics and structural biology.

18.
The wealth of protein sequence and structure data is greater than ever, thanks to the ongoing Genomics and Structural Genomics projects. The information available through such efforts needs to be analysed by new methods that combine both databases. One important result of genomic sequence analysis is the inference of functional homology among proteins. Until recently sequence similarity comparison was the only method for homologue inference. The new fold recognition approach reviewed in this paper enhances sequence comparison methods by including structural information in the process of protein comparison. This additional information often allows for the detection of similarities that cannot be found by methods that only use sequence information.

19.
20.
This study addresses the relation between structural and functional similarity in proteins. We introduce a novel method named tree based on root mean square deviation (T-RMSD), which uses distance RMSD (dRMSD) variations to build fine-grained structure-based classifications of proteins. The main improvement of the T-RMSD over similar methods, such as Dali, is its capacity to produce the equivalent of a bootstrap value for each cluster node. We validated our approach on two domain families studied extensively for their role in many biological and pathological pathways: the small GTPase RAS superfamily and the cysteine-rich domains (CRDs) associated with the tumor necrosis factor receptors (TNFRs) family. Our analysis showed that T-RMSD is able to automatically recover and refine existing classifications. In the case of the small GTPase ARF subfamily, T-RMSD can distinguish GTP- from GDP-bound states, while in the case of CRDs it can identify two new subgroups associated with well defined functional features (ligand binding and formation of ligand pre-assembly complex). We show how hidden Markov models (HMMs) can be built on these new groups and propose a methodology to use these models simultaneously in order to do fine-grained functional genomic annotation without known 3D structures. T-RMSD, an open source freeware incorporated in the T-Coffee package, is available online.
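
The distance RMSD named in this abstract compares intramolecular distance matrices rather than superimposed coordinates. The sketch below implements the standard pairwise dRMSD between two structures with the same number of equivalent atoms; it is not the T-RMSD implementation, whose exact use of dRMSD variations the abstract does not describe. The function name and the toy coordinates are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally sized lists of 3D points.

    For every atom pair (i, j) the intramolecular distance is measured in
    each structure, and the root mean square of the differences is returned.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    n = len(coords_a)
    if n < 2:
        raise ValueError("at least two atoms are required")
    total = 0.0
    for i, j in combinations(range(n), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
    return math.sqrt(total / (n * (n - 1) / 2))


# Toy coordinates (not real atoms): a triangle, a rotated and shifted copy,
# and a copy scaled by a factor of two.
tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0)]
scaled = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(drmsd(tri, moved), drmsd(tri, scaled))
```

Because only internal distances enter the calculation, the rotated and shifted copy scores zero without any superposition step, whereas the scaled copy does not.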


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号