Similar Articles
1.
A classification method for α/β-class protein fold types   Times cited: 1 (self: 0, others: 1)
Ma Shuai, Wang Qin, Li Xiaoqin. Chinese Journal of Bioinformatics, 2014, 12(2): 123-132
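Research on the rules of protein folding is one of the major frontier topics in the life sciences, and fold classification is the foundation of protein folding research. Based on the LIFCA database, this study takes as its objects the 55 α/β-class protein fold types that have more than 2 samples. Combining the definition of protein fold types with their conserved topological features, templates and corresponding feature parameters were determined for the 55 fold types. A template-based scoring function, Mul-Fscore, was built and, together with secondary-structure parameters, used to construct a multi-template classification method for the 55 α/β-class fold types. Tested on 931 samples from the LIFCA database, the classification achieved an average specificity, average sensitivity and MCC of 99.58%, 79.47% and 79.39%, respectively. Compared with TM-score-based classification, Mul-Fscore achieved better sensitivity and MCC, with similar average specificity.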
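
The abstract reports average specificity, average sensitivity and MCC over 55 fold types but does not say how the per-class values are averaged. A minimal sketch, assuming one-vs-rest confusion counts per fold type and an unweighted mean across fold types (both assumptions, not the authors' stated procedure):

```python
import math


def per_class_metrics(y_true, y_pred, label):
    """One-vs-rest sensitivity, specificity and MCC for a single fold label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    tn = len(pairs) - tp - fn - fp
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class metrics over the labels present in y_true."""
    rows = [per_class_metrics(y_true, y_pred, lab) for lab in sorted(set(y_true))]
    return tuple(sum(row[i] for row in rows) / len(rows) for i in range(3))


# Hypothetical fold labels for five test domains.
sens, spec, mcc = macro_average(["a", "a", "b", "b", "c"], ["a", "b", "b", "b", "c"])
print(f"sens={sens:.4f} spec={spec:.4f} mcc={mcc:.4f}")
```

Averaging per class gives every fold type equal weight, so rare fold types influence the average as much as common ones; a sample-weighted average would behave differently.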

2.
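Research on protein folding rules is one of the important frontier topics in the life sciences, and the classification of protein fold types is the foundation of such research. Building on the SCOP fold classification, and taking as study objects the fold types of the α, β, α+β and α/β classes in the Astral SCOPe 2.05 database with similarity below 40%, templates were constructed for 989 protein fold types to form a template database. A fold classification method based on these designed templates was then established, enabling automated classification of SCOP fold types. For family templates, the average sensitivity, specificity and MCC were 95.00%, 99.99% and 0.94 in the self-consistency test and 90.00%, 99.97% and 0.92 in the independent test; for fold-type templates, they were 93.71%, 99.97% and 0.91 (self-consistency) and 86.00%, 99.93% and 0.87 (independent). The results indicate that the template design is sound and can be used effectively to classify proteins of known structure.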

3.
Life Science Research, 2016, (5): 381-388
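Protein fold recognition is an important part of protein structure research, and fold classification is the basis of fold recognition. Through a systematic study of the fold types of α-class proteins in the ASTRAL-1.65 database, a fold-type template database was built and template feature parameters reflecting the topology of each fold type were extracted. From these feature parameters and TM-align structural alignments, a feature-parameter-based scoring function, Fdscore, was constructed and used to classify α-class fold types automatically. On the same dataset, the Fdscore method achieved an average sensitivity, average specificity and MCC of 71.86%, 99.49% and 0.69, all higher than the corresponding results of TM-score-based classification. These results show that the method can be used for automated classification of α-class protein fold types.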
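
The baseline in this comparison is TM-score, which TM-align reports for a structural alignment. As a reminder of what that number measures, the sketch below evaluates the standard TM-score expression for one fixed superposition: TM = (1/L) * sum over aligned pairs of 1 / (1 + (d_i/d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8, where L is the target length and d_i the distance of the i-th aligned residue pair. The real program searches over superpositions to maximize this value, and clamping d0 at 0.5 for short chains is an assumption of this sketch. Fdscore itself is not reproduced, since its feature parameters are not given in the abstract.

```python
def tm_d0(target_length):
    """Distance scale d0 (angstroms) from the standard TM-score definition."""
    if target_length <= 15:
        raise ValueError("the d0 formula is used here only for L > 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # lower bound for short chains: an assumption of this sketch


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: length L of the target structure, used for normalization.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Identical structures, all 100 residues aligned at zero distance:
print(tm_score([0.0] * 100, 100))  # 1.0
```

Because the sum is divided by the full target length, unaligned residues contribute zero, which is why TM-score penalizes partial alignments.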

4.
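Protein fold classification is an important part of protein classification research. Taking the PH domain-like barrel fold type in the SCOP database as the study object, 61 samples with sequence similarity below 25% were selected as the test set. Through structural feature analysis, the template for this fold type and its feature parameters were determined. Using structural alignment information between the template and the query protein, a new fold-type scoring function, Fscore, was proposed, and an Fscore-based fold classification method was established and applied to this fold type. Tested on 16,711 samples in Astral 1.75 with sequence similarity below 95%, the method achieved a classification specificity of 99.97%. The results show that the feature parameters capture the essence of the fold type, and that Fscore and the classification method built on it can be used for automated classification of the PH domain-like barrel fold type.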

5.
Classification and recognition of doubly wound proteins   Times cited: 1 (self: 0, others: 1)
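Protein fold recognition is an important part of protein structure research. The doubly wound fold is a common, structurally typical fold type among α/β proteins. As the training set, 79 typical doubly wound proteins from 22 families with sequence identity below 25% were selected; they were hierarchically clustered using RMSD as the distance measure, and a structure-alignment-based profile hidden Markov model (profile-HMM) was built for each cluster. With the 9,505 samples of Astral 1.65 with sequence identity below 95% as the test set, the overall recognition sensitivity was 93.9%, the specificity 82.1% and the MCC 0.876. The results show that for fold types with many members, for which no single unified model can be built, modeling each cluster separately achieves high recognition accuracy.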
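
Hierarchical clustering with RMSD as the distance needs a pairwise RMSD matrix. A minimal sketch with NumPy, assuming the residue correspondence between two structures is already fixed (equal-length coordinate arrays, for example Cα atoms paired by a structure-based alignment), computes RMSD after optimal superposition with the Kabsch algorithm and fills a symmetric matrix. The linkage criterion used in the paper is not stated; the matrix could be passed to a routine such as `scipy.cluster.hierarchy.linkage` after conversion with `scipy.spatial.distance.squareform`.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def rmsd_matrix(structures):
    """Symmetric matrix of pairwise Kabsch RMSDs."""
    n = len(structures)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = kabsch_rmsd(structures[i], structures[j])
    return M
```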

6.
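Research on protein spatial structure is an important topic in molecular biology, cell biology, biochemistry, drug design and related fields. Fold types reflect the topological pattern of a protein's core structure, and fold recognition is an important part of research on the relationship between protein sequence and structure. The 53 fold types with the largest sample sizes in the LIFCA database were selected, and fold recognition was performed with the functional domain composition method. With the samples of Astral 1.65 at sequence identity below 95% as the test set, the whole-database test gave an average sensitivity of 96.42%, a specificity of 99.91% and a Matthews correlation coefficient (MCC) of 0.91. These statistics show that the functional domain composition method works well for protein fold recognition, and that LIFCA's relatively simple classification rules capture most of the functional properties of proteins, reflecting the correspondence between structure and function.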
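
The abstract does not define the functional domain composition method. One common formulation represents each protein by the set of functional domains it contains (for example InterPro entries) and assigns a query the label of its most similar training protein. The sketch below assumes exactly that: binary domain vectors stored as sets, cosine similarity, and nearest-neighbour assignment. The domain identifiers and fold labels are hypothetical placeholders.

```python
import math


def cosine_binary(a, b):
    """Cosine similarity of two binary vectors stored as sets of domain IDs."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def nearest_fold(query_domains, training):
    """Return (fold_label, similarity) of the most similar training protein.

    training: iterable of (domain_id_set, fold_label) pairs.
    Returns (None, 0.0) when no training protein shares a domain with the query.
    """
    best_fold, best_score = None, 0.0
    for domains, fold in training:
        score = cosine_binary(query_domains, domains)
        if score > best_score:
            best_fold, best_score = fold, score
    return best_fold, best_score


# Hypothetical domain identifiers and fold labels.
training = [({"DOM1", "DOM2"}, "fold_x"), ({"DOM3"}, "fold_y")]
print(nearest_fold({"DOM1"}, training))
```

Returning no label when nothing overlaps keeps queries without annotated domains from being forced into a fold type.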

7.
Recognition of the Globin-like protein fold type   Times cited: 2 (self: 0, others: 2)
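Protein fold recognition is an important part of protein structure research. Taking the Globin-like fold in SCOP as the study object, 17 representative proteins with sequence identity below 25% were selected as the training set. Structural alignments were produced by combining automated and manual methods, yielding a sequence alignment from which a profile hidden Markov model (profile HMM) suited to the Globin-like fold was trained and used to recognize this fold type. Tested on 68,057 domain samples in Astral 1.65, the recognition sensitivity was 99.64% and the specificity 100%. At the fold level, whereas Pfam and SUPERFAMILY build their HMMs from sequence alignments alone and need more than 100 models, this approach uses a single model while still maintaining very high recognition performance. The results show that for proteins with very low sequence similarity but the same fold, a single unified HMM can be built by incorporating structural alignment, achieving highly accurate fold recognition.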
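
Recognition with a profile HMM ultimately reduces to scoring a query sequence under the model. The sketch below is the plain forward algorithm for a discrete HMM in log space, not a full profile HMM with match, insert and delete states as built by tools such as HMMER; the toy model parameters are hypothetical. A recognition score would typically compare this log-likelihood against a background (null) model.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a discrete HMM via the forward algorithm.

    start[i]: initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i]: dict mapping symbol -> emission probability in state i
    """
    if not obs:
        raise ValueError("empty observation sequence")
    n = len(start)
    # alpha[i] = log P(obs[0..t], state_t = i)
    alpha = [_log(start[i]) + _log(emit[i].get(obs[0], 0.0)) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j].get(symbol, 0.0))
            for j in range(n)
        ]
    return _logsumexp(alpha)
```

For example, a one-state model emitting A and B with probability 0.5 each gives `forward_log_likelihood("AAB", [1.0], [[1.0]], [{"A": 0.5, "B": 0.5}])` equal to log(0.125). Working in log space avoids underflow on long sequences.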

8.
A protein fold classification method and classification database   Times cited: 1 (self: 0, others: 1)
Li Xiaoqin, Ren Wenke, Liu Yue, Xu Haisong, Qiao Hui. Chinese Journal of Bioinformatics, 2010, 8(3): 245-247, 253
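Research on protein folding rules is a major frontier topic in the life sciences, and fold classification is the foundation of protein folding research. Current fold classifications are largely done by experts, and different databases classify differently, so a fold-type database built on unified principles is urgently needed. Starting from proteins in the ASTRAL-1.65 database with sequence homology below 25% and resolution better than 2.5 Å, and through observation of protein spatial structures and analysis of fold-type features, this paper proposes a fold classification method that centers on the protein folding core, follows the principle of topological invariance of protein structure, and is based on the composition, connection and spatial arrangement of the regular structural segments of the folding core. On this basis, LIFCA, a low-identity protein fold classification database containing 259 fold types, was built. The database lays a foundation for further research on protein fold modeling and data mining, fold recognition, and the structural evolution of protein folds.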

9.
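A protein fold database covering 27 fold types was built from 1,895 protein sequences with sequence similarity below 40%. Starting from the protein sequence, each protein was represented by a combined vector of motif frequencies, low-frequency power spectral density values, amino acid composition, predicted secondary-structure information and autocorrelation function values. Using a support vector machine with an all-at-once (global) classification strategy, the fold types of the 27 folds were predicted with an independent-test accuracy of 66.67%. With the same features and algorithm, the 4 structural classes of the 27 folds were also predicted, with an independent-test accuracy of 89.24%. Applied to the 27-fold dataset used in earlier work, the same method gave better results than previously reported.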
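
Two of the feature groups listed above, amino acid composition and an autocorrelation function, can be sketched briefly. The property scale (Kyte-Doolittle hydropathy), the autocorrelation definition (mean product of property values at a fixed lag) and the maximum lag are assumptions of this sketch, since the abstract does not specify them; the motif-frequency, power-spectrum and secondary-structure features are omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (the choice of property is an assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aa_composition(seq):
    """Fraction of each of the 20 standard residues (non-standard letters ignored)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def autocorrelation(seq, max_lag):
    """Mean product of hydropathy values at lags 1..max_lag."""
    values = [KYTE_DOOLITTLE[a] for a in seq.upper() if a in KYTE_DOOLITTLE]
    result = []
    for lag in range(1, max_lag + 1):
        n_pairs = len(values) - lag
        if n_pairs > 0:
            result.append(sum(values[i] * values[i + lag] for i in range(n_pairs)) / n_pairs)
        else:
            result.append(0.0)
    return result


def feature_vector(seq, max_lag=3):
    """20 composition values followed by max_lag autocorrelation values."""
    return aa_composition(seq) + autocorrelation(seq, max_lag)
```

Composition alone ignores residue order; the lagged terms reintroduce some local sequence-order information, which is the usual motivation for combining the two before training a classifier such as an SVM.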

10.
A method for protein fold type recognition   Times cited: 1 (self: 0, others: 1)
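Protein fold recognition is an important approach to analyzing protein structure. Taking 822 all-β proteins with sequence similarity below 25% as the study set, the secondary-structure segments of the core structure and the hydrogen-bond interactions between segments were extracted as fold-type feature parameters, and a template database of 74 all-β fold types was built. A secondary-structure matching function SS, a hydrogen-bond interaction potential BP and a scoring function P between a query protein and the fold templates were defined; the fold type of the template with the smallest P value is assigned to the query protein. Three groups of 50 all-β domains each, drawn at random from SCOP 1.69, were predicted with accuracies of 56%, 56% and 42%; on the test set provided by Ding et al., the overall accuracy was 61.5%. The results and comparisons show that this is an effective fold recognition method.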
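
The decision rule above assigns the fold type of the template with the smallest P. The abstract does not give the form of P, so this sketch assumes, hypothetically, a weighted sum of the SS and BP terms in which lower values mean better matches; the weights and the `TemplateScore` structure are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TemplateScore:
    """Hypothetical per-template terms for one query (lower = better match, assumed)."""
    fold: str
    ss: float  # secondary-structure matching term
    bp: float  # hydrogen-bond interaction potential term


def combined_p(t, w_ss=1.0, w_bp=1.0):
    """Assumed form of the scoring function P: a weighted sum of SS and BP."""
    return w_ss * t.ss + w_bp * t.bp


def assign_fold(template_scores, w_ss=1.0, w_bp=1.0):
    """Fold type of the template with the smallest P, plus that P value."""
    if not template_scores:
        raise ValueError("no templates scored")
    best = min(template_scores, key=lambda t: combined_p(t, w_ss, w_bp))
    return best.fold, combined_p(best, w_ss, w_bp)


scores = [TemplateScore("fold_a", 2.0, 1.0), TemplateScore("fold_b", 1.0, 0.5)]
print(assign_fold(scores))  # ('fold_b', 1.5)
```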

11.
Jiang L, Li M, Wen Z, Wang K, Diao Y. The Protein Journal, 2006, 25(4): 241-249
A new method was proposed for predicting mitochondrial proteins with the discrete wavelet transform, based on a sequence-scale similarity measurement. This sequence-scale similarity, which reveals more information than other conventional methods, does not rely on subcellular location information and can directly predict protein sequences of different lengths. In our experiments, 499 mitochondrial protein sequences, constituting a mitochondria database, were used as the training dataset, and 681 non-mitochondrial protein sequences were tested. The system predicted these sequences with a sensitivity, specificity, accuracy and MCC of 50.30%, 95.74%, 76.53% and 0.54, respectively. Source code of the new program is available on request from the authors.
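
The abstract does not say how sequences are converted into numeric signals or which wavelet is used. A minimal sketch, assuming a numeric encoding of residues has already been chosen and using the Haar wavelet, shows the multi-level discrete wavelet transform that such a method builds on; the sequence-scale similarity measure itself is not reproduced. Padding odd-length signals by repeating the last value is an assumed convention.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])  # pad odd-length input by repeating the last value (assumed)
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT; returns the final approximation and per-level details."""
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Each level halves the signal length, so sequences of different lengths produce coefficient sets at comparable coarse scales, which is one plausible reason a wavelet view suits variable-length inputs.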

12.
In our previous work, we developed a computational tool, PreK-ClassK-ClassKv, to predict and classify potassium (K+) channels. For K+ channel prediction (PreK) and classification at the family level (ClassK), this method performs well. However, it does not perform as well in classifying voltage-gated potassium (Kv) channels (ClassKv). In this paper, a new method based on the local sequence information of Kv channels is introduced to classify Kv channels. The six transmembrane domains of a Kv channel protein are used to define the protein, and the dipeptide composition technique is used to transform an amino acid sequence into a numerical sequence. A Kv channel protein is represented by a vector with 2000 elements, and a support vector machine algorithm is applied to classify Kv channels. The method performs well, with average total accuracy (Acc), sensitivity (SE), specificity (SP), reliability (R) and Matthews correlation coefficient (MCC) of 98.0%, 89.9%, 100%, 0.95 and 0.94, respectively. The results indicate that the local sequence information-based method classifies Kv channels better than the global sequence information-based method.
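
Dipeptide composition maps a sequence onto the 400 possible ordered amino-acid pairs. The abstract does not explain how its six transmembrane segments yield a 2000-element vector, so this sketch simply concatenates one normalized 400-dimensional block per segment supplied; segment boundaries are assumed to be known already. The resulting vectors could then be fed to any SVM implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Normalized counts of the 400 dipeptides; pairs with non-standard letters are skipped."""
    seq = seq.upper()
    vec = [0.0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2])
        if j is not None:
            vec[j] += 1
            total += 1
    return [v / total for v in vec] if total else vec


def segment_features(segments):
    """Concatenate one 400-dimensional dipeptide block per segment (assumed layout)."""
    features = []
    for seg in segments:
        features.extend(dipeptide_composition(seg))
    return features
```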

13.
MOTIVATION: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than for the retrieval problem of fold recognition, i.e. finding a proper template for a given query protein. RESULTS: Here we present a two-stage machine learning (information retrieval) approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable and effective. Compared with 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is approximately 85, 56 and 27% at the family, superfamily and fold levels, respectively. Using the 5 top-ranked templates, the sensitivity increases to 90, 70 and 48%.
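
The retrieval framing above ranks the templates for each query by a relevance score and reports sensitivity from the top 1 or top 5 templates. A minimal sketch of that evaluation, assuming sensitivity means the fraction of queries with at least one correct template among the top k; the relevance scores and template IDs are hypothetical stand-ins for SVM outputs.

```python
def rank_templates(scores):
    """scores: dict template_id -> relevance score. Returns IDs, best first."""
    return sorted(scores, key=scores.get, reverse=True)


def top_k_sensitivity(queries, k):
    """queries: list of (scores_dict, set_of_correct_template_ids) pairs."""
    if not queries:
        return 0.0
    hits = sum(1 for scores, correct in queries if set(rank_templates(scores)[:k]) & correct)
    return hits / len(queries)


# Hypothetical relevance scores for two queries.
queries = [
    ({"t1": 0.9, "t2": 0.1}, {"t1"}),
    ({"t3": 0.8, "t4": 0.7}, {"t4"}),
]
print(top_k_sensitivity(queries, 1), top_k_sensitivity(queries, 2))
```

Sensitivity can only rise as k grows, which matches the increase the abstract reports from top-1 to top-5 templates.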

14.
β-turns are the most common type of non-repetitive structure and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP for prediction of two-class β-turns and of the individual β-turn types, using evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset, BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on two-class prediction of β-turn versus not-β-turn. Furthermore, NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not, and the individual β-turn types, from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset composed of all β-turn types. Performance is evaluated on a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves: MCC=0.50, Qtotal=82.1%, sensitivity=75.6%, PPV=68.8% and AUC=0.864. We compared our performance to eleven other prediction methods, which obtain Matthews correlation coefficients in the range 0.17-0.47. For the type-specific β-turn predictions, only types I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain values of 0.36 and 0.31, respectively. CONCLUSION: The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
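
AUC, one of the reported measures, can be computed without drawing the ROC curve: it equals the probability that a randomly chosen positive residue scores higher than a randomly chosen negative one, with ties counted as one half. A minimal quadratic-time sketch, suitable only for small inputs, for per-residue β-turn scores:

```python
def roc_auc(scores, labels):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5.

    scores: predicted beta-turn scores per residue.
    labels: truthy for beta-turn residues, falsy otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Unlike MCC, Qtotal, sensitivity and PPV, which depend on a chosen score threshold, AUC summarizes ranking quality over all thresholds.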

15.
Identification of bacterial and archaeal counterparts to eukaryotic ion channels has greatly facilitated studies of the structural biophysics of the channels. Searches based only on sequence alignment tools are often inadequate for discovering such distant bacterial and archaeal counterparts. We address the discovery of bacterial and archaeal members of the Pentameric Ligand-Gated Ion Channel (pLGIC) family with a combination of four computational methods. The first, domain-based method retrieves proteins with pLGIC-relevant domains by matching those domains to previously established domain templates in the InterPro family of databases. The second, also domain-based, searches with ungapped de novo motifs discovered by MEME, trained on well-characterized members of the pLGIC family. The third and fourth methods use the sequence alignment search algorithms BLASTp and PSI-BLAST, respectively. The sequences returned by all methods were screened by requiring the correct pLGIC topology and by requiring that a BLASTp search against a comprehensive database of eukaryotic proteins return an annotated member of this family among the first ten hits. We found the domain-based searches to have high specificity but low sensitivity, while the sequence alignment methods have higher sensitivity but lower specificity. Together, the four methods discovered 69 putative bacterial and archaeal members of the pLGIC family. We ranked the 69 proteins and divided them into groups according to the similarity of their domain compositions to those of known eukaryotic pLGICs. One especially notable group is more closely related to eukaryotic pLGICs than to any other known protein family and has the overall pLGIC topology, but its members contain functional domains sufficiently different from those in known pLGICs that they do not score well against the pLGIC domain templates.
We conclude that multiple methods used in a coordinated fashion outperform any single method for identifying likely distant bacterial and archaeal proteins that may provide useful models for important eukaryotic channel functions. We also note that the methods used here are largely standard and readily accessible; the novelty lies in the effectiveness of a strategy that combines them to identify bacterial and archaeal relatives of this family. The paper may therefore serve as a template for a broad group of researchers seeking to reliably identify bacterial and archaeal counterparts to eukaryotic proteins.
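
The combination strategy described above amounts to pooling candidates from several searches and keeping those that pass a common screen. A minimal sketch, in which the method names, protein identifiers and screening predicate are hypothetical (the real screen checks pLGIC topology and a BLASTp hit to an annotated family member among the top ten), records which methods found each surviving candidate:

```python
from collections import defaultdict


def combine_hits(method_hits, passes_screen):
    """Pool candidates from several search methods and apply a shared screen.

    method_hits: dict method_name -> iterable of protein IDs returned by that method.
    passes_screen: callable(protein_id) -> bool (hypothetical stand-in for the real screen).
    Returns dict protein_id -> sorted list of methods that found it, for IDs that pass.
    """
    found_by = defaultdict(set)
    for method, ids in method_hits.items():
        for pid in ids:
            found_by[pid].add(method)
    return {pid: sorted(methods) for pid, methods in found_by.items() if passes_screen(pid)}


# Hypothetical search results from four methods.
hits = {
    "interpro": ["p1"],
    "meme": ["p1", "p2"],
    "blastp": ["p3"],
    "psiblast": ["p2", "p4"],
}
print(combine_hits(hits, lambda pid: pid != "p4"))
```

Taking the union raises sensitivity, while the shared screen restores specificity; recording which methods found each protein also shows candidates, like p3 here, that a single method would miss.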

