Similar articles
1.
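Objective: To build a support vector machine (SVM) based method that automatically identifies the quaternary structure of new polypeptide chains, and to improve on the recognition accuracy of existing methods. Methods: Four existing methods for extracting features from protein primary sequences were improved, and linear and nonlinear combination forecasting were used to build an effective combined prediction model. Results: Taking homodimers and non-homodimers as an example, improving the four feature extraction methods raised the classification accuracy of each by 2-3%; applying linear and nonlinear combination forecasting raised accuracy by a further 2-3%, bringing classification accuracy on the independent test set above 90%. Conclusion: All four feature extraction methods reflect well that protein primary sequences carry quaternary structure information, and combination forecasting effectively brings together the strengths of multiple feature extraction methods.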

2.
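Graph clustering can give good results for protein classification, provided that the complex relationships among proteins are first converted into a suitable similarity network that serves as input to the graph clustering step. This paper proposes a BLAST-based method for constructing such a similarity network: starting from a target protein sequence, several rounds of BLAST searches progressively extract from the database the sequences directly or indirectly related to the target, forming an association set. The similarity relationships among the sequences in the association set constitute the similarity network, which can serve as the basis for classification by a graph clustering algorithm. Computations on proteins in the Pfam database that are hard to classify correctly from direct similarity alone show that similarity networks built with this method give fairly satisfactory results.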
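The round-by-round expansion described in this abstract can be sketched as a breadth-first search. This is only an illustration under stated assumptions: `find_hits` is a hypothetical callback standing in for one BLAST query (it is not a real BLAST API), and the fixed number of rounds and the score values are placeholders rather than the paper's settings.

```python
def build_association_set(target, find_hits, rounds=3):
    """Expand outward from `target` for a fixed number of search rounds.

    find_hits(seq_id) is a hypothetical stand-in for one BLAST query and
    must return an iterable of (hit_id, score) pairs.
    Returns (members, edges), where edges maps (query, hit) -> score.
    """
    members = {target}      # the association set
    edges = {}              # the similarity network over that set
    frontier = [target]     # sequences to query in the current round
    for _ in range(rounds):
        next_frontier = []
        for query in frontier:
            for hit, score in find_hits(query):
                edges[(query, hit)] = score
                if hit not in members:
                    members.add(hit)
                    next_frontier.append(hit)
        if not next_frontier:   # nothing new was found, so stop early
            break
        frontier = next_frontier
    return members, edges
```

The resulting `edges` dictionary is one possible input format for a graph clustering step; the abstract does not say which clustering algorithm or edge weighting the authors used.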

3.
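The study of protein folding rules is one of the important frontier topics in the life sciences, and classifying protein fold types is the foundation of that study. Based on the fold classification of the SCOP database, and taking as its subject the fold types of the α, β, α+β and α/β classes in the Astral SCOPe 2.05 database with sequence similarity below 40%, this study constructed templates for 989 protein fold types and assembled them into a template database. A fold classification method built on these fold-type design templates enables automated classification of protein fold types in SCOP. For the family templates, the mean sensitivity, specificity and MCC were 95.00%, 99.99% and 0.94 in the self-consistency test and 90.00%, 99.97% and 0.92 in the independent test; for the fold-type templates, the corresponding values were 93.71%, 99.97% and 0.91 in the self-consistency test and 86.00%, 99.93% and 0.87 in the independent test. The results show that the template design is sound and can be used effectively to classify proteins of known structure.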
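For readers unfamiliar with the three measures reported in this abstract, the sketch below computes sensitivity, specificity and the Matthews correlation coefficient (MCC) from confusion-matrix counts using their standard definitions. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, mcc) from confusion-matrix counts."""
    # Sensitivity: fraction of true members that were recognized.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-members that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc
```

With many more negatives than positives, as is typical when one fold type is tested against all others, specificity can sit near 100% while MCC stays noticeably lower, which is consistent with the pattern of values quoted above.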

4.
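Protein fold recognition based on a support vector machine fusion network   Total citations: 11 (self-citations: 1, citations by others: 11)
Protein fold recognition is an important approach to analyzing protein structure when sequence similarity cannot be relied on. A three-layer support vector machine fusion network is proposed that recognizes 27 fold classes starting from the amino acid sequence. The fusion network uses support vector machines as member classifiers and adopts a "many-versus-many" multi-class strategy; it divides six kinds of fold features into primary and secondary features, builds multiple distinct fusion schemes, and then dynamically selects among these schemes to reach the final decision. When it is hard to know before classification which combination of feature types will give the best result, the method offers a reliable way to automatically select the combination whose feature information is most complementary, thereby securing the best classification result. The recognition system reached an overall classification accuracy of 61.04% on independent test samples. The results and comparisons indicate that the method is effective for fold recognition.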

5.
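SVM-based drug target prediction method and its application   Total citations: 1 (self-citations: 0, citations by others: 1)
Objective: To develop a new and effective drug target prediction method that combines SVM techniques with the primary-structure similarity between known and potential drug target proteins. Methods: A training set was constructed, primary-structure features were extracted from the protein sequences, and the data were preprocessed; the optimal kernel function was selected, parameters were optimized and features were selected; the optimal prediction model was then trained and its predictive performance tested. With proteins of the G protein-coupled receptor family as the prediction set, the optimal classification model was applied to mine potential drug targets. Results: The optimal SVM-based classification model achieved a mean prediction accuracy of 81.03%. Applying the optimal classifier to the constructed G protein prediction set showed that several of the 20 top-ranked proteins are disease-related. In particular, two of these G proteins are listed in the Therapeutic Target Database (TTD) as drug targets in clinical trials. Conclusion: Drug target prediction based on SVM and protein sequence features is effective, and the potential targets it predicts can serve as a reference for discovering new drug targets.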

6.
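GESTs (gene expression similarity and taxonomy similarity) is a new function prediction method that combines gene expression similarity with a measure of similarity between functional concepts in the Gene Ontology (GO) functional classification system. This prediction algorithm is extended to protein-protein interaction data, and several methods are proposed for selecting neighbors of a protein of unknown function in a protein interaction network. Unlike other existing protein function prediction methods, the new method automatically selects, during learning, the most suitable and most specific functional class possible from among the classes in the functional classification system, and uses interacting neighbor proteins annotated to closely related functional classes to support the prediction of that specific class. Using yeast protein interaction data from MIPS and a set of gene expression profiles, together with three measures designed specifically for the hierarchical structure of GO, the performance of protein function prediction on the biological process branch of GO is evaluated. The results show that the method can predict fine-grained protein functions on a large scale. In addition, the method was used to predict functions for proteins of unknown function in Gene Ontology as of late 2004; some of these predictions have been confirmed by the SGD annotation data released in April 2006.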

7.
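Classification of protein quaternary structure based on support vector machines and Bayesian methods   Total citations: 4 (self-citations: 2, citations by others: 4)
Protein quaternary structure was classified using two methods, support vector machines and a Bayesian method. The results show that the SVM-based classification performed best: in 10-fold cross-validation (10CV), its overall classification accuracy, rate of correctly predicted positive samples, Matthews correlation coefficient and false positive rate were 74.2%, 84.6%, 0.474 and 38.9%, respectively. The Bayesian classification performed less well than the SVM, but had the lowest false positive rate in 10CV (15.9%). These results indicate that the primary sequences of homo-oligomeric proteins carry quaternary structure information, and that the feature vectors do capture the essential information buried in the contact surfaces where associating subunits interact.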

8.
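Classification of protein homo-oligomers based on support vector machines   Total citations: 14 (self-citations: 1, citations by others: 13)
Using support vector machines and a Bayesian method, protein homodimers, homotrimers, homotetramers and homohexamers were classified starting from the protein primary sequence. With support vector machines, the "one-versus-rest" and "one-versus-one" strategies gave overall classification accuracies of 77.36% and 93.43%, respectively, which are 26.72 and 42.79 percentage points higher than the 50.64% overall accuracy of the Bayesian covariance discriminant method. This shows that support vector machines can be used to classify protein homo-oligomers and are a very effective method for doing so. For multi-class classification of protein homo-oligomers with the same machine learning method (such as support vector machines), the "one-versus-one" strategy outperforms "one-versus-rest". The results also indicate that the primary sequences of protein homo-oligomers carry quaternary structure information.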
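The two multi-class strategies compared in this abstract differ in how the four-class problem is split into binary problems. The sketch below only enumerates those binary tasks; it does not train any classifier, and the class labels and function names are illustrative assumptions.

```python
from itertools import combinations

def one_vs_rest_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]

def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

# The four homo-oligomer classes named in the abstract.
classes = ["homodimer", "homotrimer", "homotetramer", "homohexamer"]
ovr = one_vs_rest_tasks(classes)   # 4 binary classifiers
ovo = one_vs_one_tasks(classes)    # 6 binary classifiers
```

For k classes, one-versus-rest needs k classifiers and one-versus-one needs k(k-1)/2, so the pairwise strategy trains more, but smaller and typically better balanced, binary problems.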

9.
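Zhang Kun  Zhao Jingjing  Tang Xuqing 《生命科学研究》2011,15(2):101-106,124
Based on the classical HP model, a new method for comparing protein sequences is proposed using the matrix graphical representation (MGR) of protein sequences and the idea of numerical characterization. By inspecting the numerical characterization plots of the protein sequences and computing the Euclidean distance d between two sequences, a similarity analysis was carried out on the protein sequences of two xylanase families. Sequences assigned to the same xylanase family were found to be more similar to each other, and the degree of sequence similarity was related to molecular size, structure and molecular evolution.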

10.
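Vacuum-ultraviolet circular dichroism study of protein secondary structure   Total citations: 2 (self-citations: 0, citations by others: 2)
Using a synchrotron radiation vacuum-ultraviolet circular dichroism spectrometer and a specially made sample cell, vacuum-ultraviolet circular dichroism spectra of proteins in solution were measured at wavelengths down to 175 nm. A new computational method was applied to calculate the content of five types of protein secondary structure, and the results agree with those determined by X-ray diffraction. Several important factors for obtaining good vacuum-ultraviolet circular dichroism spectra are discussed. The results show that vacuum-ultraviolet circular dichroism is currently one of the better methods for determining the secondary structure of proteins in solution.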

11.
Intrinsically disordered proteins have a wide variety of important functional roles. However, the relationship between sequence and function in these proteins is significantly different from that for well-folded proteins. In a previous work, we showed that the propensity to be disordered can be recognized based on sequence composition alone. Here that analysis is furthered by examining the relationship of disorder propensity to sequence complexity, where the metrics for these two properties depend only on composition. The distributions of 40-amino-acid peptides from both ordered and disordered proteins are graphed in this disorder-complexity space. An analysis of Swiss-Prot shows that most peptides have high complexity and relatively low disorder. However, there are also an appreciable number of low complexity-high disorder peptides in the database. In contrast, there are no low complexity-low disorder peptides. A similar analysis for peptides in the PDB reveals a much narrower distribution, with few peptides of low complexity and high disorder. In this case, the bounds of the disorder-complexity distribution are well defined and might be used to evaluate the likelihood that a peptide can be crystallized with current methods. The disorder-complexity distributions of individual proteins and sets of proteins grouped by function are also examined. Among individual proteins, there is an enormous variety of distributions that in some cases can be rationalized with regard to function. Groups of functionally related proteins are found to have distributions that are similar within each group but show notable differences between groups. Finally, a pattern matching algorithm is used to search for proteins with particular disorder-complexity distributions. The results suggest that this approach might be used to identify relationships between otherwise dissimilar proteins.
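The abstract does not state which complexity metric the authors used, only that it depends on composition alone. One common composition-only measure is the Shannon entropy of the residue frequencies in a window; the sketch below computes it over 40-residue windows as an assumed stand-in, not as the paper's actual metric.

```python
import math
from collections import Counter

def shannon_entropy(peptide):
    """Shannon entropy (in bits) of the residue composition of a peptide."""
    counts = Counter(peptide)
    n = len(peptide)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def window_entropies(seq, window=40):
    """Entropy of every full-length window along the sequence."""
    return [shannon_entropy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]
```

A homopolymer window scores 0 bits and a window with all 20 residues equally represented scores log2(20), about 4.32 bits, so low values flag low-complexity peptides.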

12.
Tilted peptides are short sequence fragments (10-20 residues long) that possess an asymmetric hydrophobicity gradient along their sequence when they are helical. Due to this gradient, they adopt a tilted orientation towards a single lipid/water interface and destabilize the lipids. We have detected those peptides in many different proteins with various functions. Although all of them adopt a tilted orientation at a single lipid/water interface, no consensus sequence is evident. In order to better understand the relationships between their lipid-destabilizing activity and their properties, we used IMPALA to classify the tilted peptides. This method allows the study of interactions between a peptide and a modeled lipid bilayer using simple restraint functions designed to mimic some of the membrane properties. We predict that tilted peptides have access to a wide conformational space in membranes, in contrast to transmembrane and amphipathic helices. In agreement with previous studies, we suggest that those metastable configurations could perturb the organization of the acyl chains and could be a general mechanism for lipid destabilization. Our results further suggest that tilted peptides fall into two classes: those from proteins acting on membranes behave differently from destabilizing fragments from interfacial proteins. While the former have equal access to the two layers of the membrane, the latter are confined within a single lipid layer. This could be related to the organization of the lipid substrate on which the peptides physiologically act.

13.
This article contains a comparative review of the structural properties of membrane haemoproteins, with particular emphasis on the possible similarities of the haem-binding peptides. A procedure is suggested for identifying the peptides which may bind membrane-buried haems on the basis of the primary sequences of the proteins. The integration of this procedure with the information deduced by refined hydropathy analysis indicates that the basic structural model for the haemoproteins which interact with quinones may be a transmembrane helical bundle containing the haem(s) at its centre. Structural similarities exist in the sequence of hydrophobic segments that are predicted to bind the membrane-buried haems of b-cytochromes which interact with quinones. The predicted haem-binding sites also show similarities with the peptides that bind the non-haem iron in the bacterial reaction centres, and this may be correlated with the common function of interacting with quinones and their intermediates. The analysis of the amino-acid composition of the proposed ligand peptides in the membrane haemoproteins examined has provided a molecular rationale for explaining the highly anisotropic low-spin EPR signal which is characteristic of many membrane-bound b-cytochromes.

14.
Phages play critical roles in the survival and pathogenicity of their hosts, via lysogenic conversion factors, and in nutrient redistribution, via cell lysis. Analyses of phage- and viral-encoded genes in environmental samples provide insights into the physiological impact of viruses on microbial communities and human health. However, phage ORFs are extremely diverse, and over 70% of them are dissimilar to any genes with annotated functions in GenBank. Better identification of viruses would also aid in better detection and diagnosis of disease, in vaccine development, and generally in better understanding the physiological potential of any environment. In contrast to enzymes, viral structural protein function can be much more challenging to detect from sequence data because of low sequence conservation, few known conserved catalytic sites or sequence domains, and relatively limited experimental data. We have designed a method of predicting phage structural protein sequences that uses Artificial Neural Networks (ANNs). First, we trained ANNs to classify viral structural proteins using amino acid frequency; these correctly classify a large fraction of test cases with a high degree of specificity and sensitivity. Subsequently, we added estimates of protein isoelectric points as a feature to ANNs that classify specialized families of proteins, namely major capsid and tail proteins. As expected, these more specialized ANNs are more accurate than the structural ANNs. To experimentally validate the ANN predictions, several ORFs with no significant similarities to known sequences that are ANN-predicted structural proteins were examined by transmission electron microscopy. Some of these self-assembled into structures strongly resembling virion structures. Thus, our ANNs are new tools for identifying phage and potential prophage structural proteins that are difficult or impossible to detect by other bioinformatic analyses. The networks will be valuable when sequence is available but in vitro propagation of the phage may not be practical or possible.
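The amino acid frequency features mentioned in this abstract can be illustrated with a short sketch. It assumes the 20 standard residues in alphabetical one-letter order and silently skips any other symbol; the authors' exact feature definition and preprocessing are not given in the abstract, so these choices are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes

def amino_acid_frequencies(seq):
    """Return a 20-element vector of residue frequencies for one protein."""
    # Count only standard residues; ambiguous codes such as X are skipped.
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector like this has a fixed length regardless of protein length, which is what makes it usable as input to a fixed-size classifier such as a neural network.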

15.
MOTIVATION: Function of proteins or a network of interacting proteins often involves communication between residues that are well separated in sequence. The classic example is the participation of distant residues in allosteric regulation. Bioinformatic and structural analysis methods have been introduced to infer residues that are correlated. Recently, increasing attention has been paid to obtaining the sequence properties that determine the tendency of disease-related proteins (Abeta peptides, prion proteins, transthyretin, etc.) to aggregate and form fibrils. Motivated in part by the need to identify sequence characteristics that indicate a tendency to aggregate, we introduce a general method that probes covariations in charged residues along the sequence in a given protein family. The method, which involves computing the sequence correlation entropy (SCE) using the quenched probability P_{s_k}(i,j) of finding a residue pair at a given sequence separation, s_k, allows us to classify protein families in terms of their SCE. Our general approach may be a useful way to obtain evolutionary covariations of amino acid residues on a genome-wide level. RESULTS: We use a combination of SCE and clustering based on principal component analysis to classify the protein families. From an analysis of 839 families, covering approximately 500,000 sequences, we find that proteins with relatively low values of SCE are predominantly associated with various diseases. In several families, residues that give rise to peaks in P_{s_k}(i,j) are clustered in the three-dimensional structure. For the class of proteins with low SCE values, there are significant numbers of mixed charged-hydrophobic (CH) and charged-polar (CP) runs. Our findings suggest that the low values of SCE and the presence of (CH) and/or (CP) runs may be indicative of disease association or tendency to aggregate. Our results led to the hypothesis that functions of proteins with similar SCE values may be linked. The hypothesis is validated with a few anecdotal examples. The present results also lead to the prediction that the overall charge correlations in proteins affect the kinetics of amyloid formation, a feature that is common to all proteins implicated in neurodegenerative diseases.

16.
A method is reported for the preparation of a group of three proteins from the S-carboxymethylated high-sulphur fraction of wool. These proteins have been partially characterized by their tryptic peptides. All have similar structural features and show an interesting homology within the group and some similarities of sequence with a different group of wool high-sulphur proteins. The evidence for the sequence of some of the peptides is given in a supplementary paper that has been deposited as Supplementary Publication 50008 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1972) 126, 5.

17.
Pánek J  Eidhammer I  Aasland R 《Proteins》2005,58(4):923-934
Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids in the protein sequence. Similarities in the hydropathy distributions are obvious for homologous proteins within a protein family. They have also been observed for proteins with related structures, even when sequence similarities were undetectable. Here we present a novel method that employs the hydropathy distribution in proteins for identification of (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, represented by vectors of specifically defined features. The features are derived from hydropathy of the individual amino acids. Projection of this space onto principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by the analysis of hydropathy distribution, without the need for sequence alignment.
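To make the idea of a hydropathy distribution along a sequence concrete, the sketch below computes a sliding-window mean hydropathy profile. The Kyte-Doolittle scale and the window length of 9 are assumptions chosen for illustration; the abstract does not name the scale or the specific features the authors derived, and the projection onto principal axes is not shown.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each full window along the sequence.

    Raises KeyError if the sequence contains a non-standard residue code.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window < 1 or len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Profiles like this have different lengths for different proteins, so a method of the kind described would still need to reduce each one to a fixed-length feature vector before any principal-axis projection.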

18.
The detection and classification of membrane-spanning proteins   Total citations: 167 (self-citations: 0, citations by others: 167)
Discriminant analysis can be used to precisely classify membrane proteins as integral or peripheral and to estimate the odds that the classification is correct. Specifically, using 102 membrane proteins from the National Biomedical Research Foundation (NBRF) database we find that discrimination between integral and peripheral membrane proteins can be achieved with 99% reliability. Hydrophobic segments of integral membrane proteins can also be distinguished from interior segments of globular soluble proteins with better than 95% reliability. We also propose a procedure for determining boundaries of membrane-spanning segments and apply it to several integral membrane proteins. For the limited data available (such as on transplantation antigens), the residues at the boundaries of a membrane-spanning segment are predictable to within the error inherent in the concept of boundary. As a specific indication of resolution, seven membrane-spanning segments of bacteriorhodopsin are resolved with no information other than sequence, and the predicted boundary residues agree with the experimental data on proteolytic cleavage sites. Several definitive but as yet untested predictions are also made, and the relation to other predictive methods is briefly discussed. A computer program in FORTRAN for prediction of membrane-spanning segments is available from the authors.

19.
Wang C  Ye M  Han G  Chen R  Zhang M  Jiang X  Cheng K  Wang F  Zou H 《Proteomics》2011,11(17):3578-3581
Multiple residues with a consensus sequence, i.e. a motif, on proteins are closely related to protein function. However, there is no effective method for targeted analysis of such proteins. The challenge for analysis of these classes of proteins by MS is how to selectively enrich peptides containing the consensus sequence from a protein digest. Although enrichment of peptides containing one type of amino acid residue has been achieved by chemical labeling followed by chromatographic isolation, it is almost impossible to label and isolate signature peptides containing multiple residues with a consensus sequence by a chemical approach. Herein, we developed an enzymatic approach based on the specific recognition between an enzyme and its substrates to enrich such peptides. This approach was realized by modification of a residue in the consensus sequence by an enzyme that can recognize the sequence, followed by the isolation of the modified peptides. cAMP-dependent protein kinase was used to validate this approach, and 168 peptides containing the consensus motif were identified with a selectivity of 67.2%. Those peptides resulted in the identification of 88 proteins with the consensus sequence from a serum sample. As this motif-oriented peptide enrichment approach allows targeted analysis of a subset of proteins with a consensus sequence, it will have broad application in biological studies.

20.
Double-stranded RNA-binding proteins constitute a large family with conserved domains called dsRBDs. One of these, TRBP, a protein that binds HIV-1 TAR RNA, has two dsRBDs (dsRBD1 and dsRBD2), as indicated by computer sequence homology. However, a 24-amino-acid deletion in dsRBD2 completely abolishes RNA binding, suggesting that only one domain is functional. To analyse further the similarities and differences between these domains, we expressed them independently and measured their RNA-binding affinities. We found that dsRBD2 has a dissociation constant of 5.9 × 10⁻⁸ M, whereas dsRBD1 binds RNA minimally. Binding analysis of 25-amino-acid peptides in TRBP and other related proteins showed that only one peptide in TRBP and one in Drosophila Staufen bind TAR and a GC-rich TAR-mimic RNA. Whereas a 25-mer peptide derived from dsRBD2 (TR5) bound TAR RNA, the equivalent peptide in dsRBD1 (TR6) did not. Molecular modelling indicates that this difference can mainly be ascribed to the replacement of Arg by His residues. Mutational analyses in homologous peptides also show the importance of residues K2 and L3. Analysis of 15-amino-acid peptides revealed that, in addition to TR13 (from TRBP dsRBD2), one peptide in S6 kinase has RNA-binding properties. On the basis of previous and the present results, we can define, in a broader context than that of TRBP, the main outlines of a modular KR-helix motif required for binding TAR. This structural motif exists independently of the dsRBD context and therefore has a modular function.
