Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
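Constructing an all-α protein substitution matrix based on folding cores   Total citations: 1; self-citations: 0; citations by others: 1
The amino acid substitution matrix is a key factor in the quality of multiple sequence alignment, and existing substitution matrices perform poorly on low-similarity sequences. Building on the BLOSUM substitution matrix algorithm, we define sequence-structure data blocks based on protein folding core structures and propose TOPSSUM25, a new amino acid substitution matrix derived from the folding cores of all-α proteins, to improve the alignment of low-similarity sequences. TOPSSUM25 was loaded into a multiple sequence alignment program and tested on a set of four-helix-bundle sequence-structure data blocks with similarity below 25%; the resulting multiple alignments were clearly better than those obtained with BLOSUM30. An alignment test on a BAliBASE subset further showed that TOPSSUM25 outperforms BLOSUM30 in pairwise alignment of all-α proteins. These results may help to further clarify sequence-structure-function relationships in proteins of low homology.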
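The BLOSUM-style log-odds calculation that this abstract builds on can be sketched as follows. This is a minimal illustration, not the TOPSSUM25 procedure: the function name `log_odds_matrix` and the toy columns are assumptions, and the sketch omits the sequence clustering step of BLOSUM as well as any folding-core-specific block definition, neither of which is described in enough detail here to reproduce.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns):
    """Return BLOSUM-style half-bit log-odds scores from alignment columns.

    `columns` is a list of strings; each string is one column of an
    ungapped alignment block (hypothetical input format).
    """
    # Count every unordered residue pair within each column.
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    # Observed pair frequencies q_ab.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency p_a = q_aa + sum over b != a of q_ab / 2.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    # Score = 2 * log2(observed / expected), rounded to an integer.
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Toy block with three columns over two residue types.
print(log_odds_matrix(["AAAA", "AALL", "LLLL"]))
```

Pairs that never co-occur in a column receive no entry in this sketch; a practical matrix would need a rule for such pairs.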

2.

Background

Although Transmembrane Proteins (TMPs) are highly important in various biological processes and pharmaceutical developments, general prediction of TMP structures is still far from satisfactory. Because TMPs have significantly different physicochemical properties from soluble proteins, current protein structure prediction tools for soluble proteins may not work well for TMPs. With the increasing number of experimental TMP structures available, template-based methods have the potential to become broadly applicable for TMP structure prediction. However, the current fold recognition methods for TMPs are not as well developed as they are for soluble proteins.

Methodology

We developed a novel TMP Fold Recognition method, TMFR, to recognize TMP folds based on sequence-to-structure pairwise alignment. The method uses topology-based features in the alignment together with sequence profile and solvent accessibility. It also incorporates a gap penalty that depends on predicted topology structure segments. Given the difference between α-helical transmembrane proteins (αTMPs) and β-strand transmembrane proteins (βTMPs), the parameters of the scoring functions are trained separately for these two protein categories, using 58 αTMPs and 17 βTMPs in a non-redundant training dataset.

Results

We compared our method with HHalign, a leading alignment tool, using a non-redundant testing dataset of 72 αTMPs and 30 βTMPs. Our method achieved 10% and 9% better accuracy than HHalign on αTMPs and βTMPs, respectively. The raw score generated by TMFR is negatively correlated with the structure similarity between the target and the template, which indicates its effectiveness for fold recognition. The results demonstrate that TMFR provides an effective TMP-specific fold recognition and alignment method.

3.
To investigate a putatively primordial protein we have simplified the sequence of a 56-residue α/β fold (the immunoglobulin-binding domain of protein G) by replacing it with polyalanine, polythreonine, and diglycine segments at regions of the sequence that in the folded structure are α-helical, β-strand, and turns, respectively. Remarkably, multiple folding and unfolding events are observed in a 15-μs molecular dynamics simulation at 330 K. The most stable state (populated at ∼20%) of the simplified-sequence variant of protein G has the same α/β topology as the wild-type but shows the characteristics of a molten globule, i.e., loose contacts among side chains and lack of a specific hydrophobic core. The unfolded state is heterogeneous and includes a variety of α/β topologies but also fully α-helical and fully β-sheet structures. Transitions within the denatured state are very fast, and the molten-globule state is reached in <1 μs by a framework mechanism of folding with multiple pathways. The native structure of the wild-type is more rigid than the molten-globule conformation of the simplified-sequence variant. The difference in structural stability and the very fast folding of the simplified protein suggest that evolution has enriched the primordial alphabet of amino acids mainly to optimize protein function by stabilization of a unique structure with specific tertiary interactions.

4.
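Protein fold recognition is an important method for analyzing protein structure. Using proteins with low sequence similarity as the training set, we extracted frequency information from the protein sequences, together with hydrophobicity and other properties, as fold-type features. We built a dataset covering 1,393 fold types from proteins already classified in the SCOP database and used an SVM to predict these 1,393 folds. Accuracy reached 99.6122% in the closed test and 79.6329% in the SCOP-based open test. In an open test on another authoritative test set, fold-level accuracy reached 64.7059% and SCOP class-level accuracy reached 76.4706%. The method can therefore predict protein folds effectively and provides a reference for ab initio protein structure prediction.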

5.
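The study of protein three-dimensional structure is an important topic in molecular biology, cell biology, biochemistry, and drug design. The fold type reflects the topological pattern of a protein's core structure, and fold recognition is an important part of research on the relationship between protein sequence and structure. We selected the 53 fold types with the largest sample sizes in the LIFCA database and performed fold recognition using the functional domain composition method. With the samples in Astral 1.65 whose sequence identity is below 95% as the test set, the whole-database test gave an average sensitivity of 96.42%, a specificity of 99.91%, and a Matthews correlation coefficient (MCC) of 0.91. These statistics show that the functional domain composition method works well for protein fold recognition, and that the relatively simple classification rules of LIFCA capture most of the functional properties of proteins, reflecting the correspondence between structure and function.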
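For readers unfamiliar with the three statistics reported here, the sketch below computes sensitivity, specificity, and MCC from the confusion counts of a single fold class treated as a binary problem. The function name `binary_metrics` and the counts in the usage line are illustrative assumptions and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) for one class versus the rest."""
    sensitivity = tp / (tp + fn)  # fraction of true members recovered
    specificity = tn / (tn + fp)  # fraction of non-members rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Made-up counts for one fold class.
sens, spec, mcc = binary_metrics(tp=90, fp=10, tn=890, fn=10)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} MCC={mcc:.4f}")
```

Because negatives vastly outnumber positives when one fold is tested against many others, specificity tends to sit close to 100%, which is one reason MCC is usually reported alongside it.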

6.
Abstract

The closed loops within the proteins of the TIM-barrel fold family are analyzed and compared sequence- and structure-wise. The size distribution of the closed loops of the TIM-barrels confirms a universal preference for the standard size of 25–30 residues. 3D structural RMSD comparisons of the closed loops and presentation of their sequences in binary form suggest that the TIM-barrel proteins are built from descendants of several types of basic closed-loop prototypes. Comparison of these prototypes points to a likely common ancestor: α-helix-containing closed loops of 28 amino acids. The presumed ancestor is characterized by a specific binary consensus sequence.

7.
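Based on the conservation of amino acids within protein folds, and using amino acid identity, polarity, charge, and hydrophilicity-hydrophobicity as parameters, we start from the amino acid sequence and adopt a "one-versus-rest" classification strategy. By constructing scoring matrices and selecting amino acid sequence pattern fragments, we used five similarity scoring functions to recognize 27 fold classes; the best prediction accuracy reached 83.46%. The results show that scoring matrices are an effective method for predicting multiple classes of protein folds.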

8.
Knowledge of the three‐dimensional structure of a protein is essential for describing and understanding its function. Today, a large number of known protein sequences faces a small number of identified structures. Thus, the need arises to predict structure from sequence without using time‐consuming experimental identification. In this paper the performance of Support Vector Machines (SVMs) is compared to Neural Networks and to standard statistical classification methods such as Discriminant Analysis and Nearest Neighbor Classification. We show that SVMs can beat the competing methods on a dataset of 268 protein sequences to be classified into a set of 42 fold classes. We discuss misclassification with respect to biological function and similarity. In a second step we examine the performance of SVMs if the embedding is varied from frequencies of single amino acids to frequencies of triplets of amino acids. This work shows that SVMs provide a promising alternative to standard statistical classification and prediction methods in functional genomics.
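The change of embedding mentioned in this abstract, from single amino acid frequencies to triplet frequencies, can be illustrated with a short sketch. It assumes overlapping windows and the 20 standard residues; the function name `kmer_frequencies` and the toy sequence are hypothetical, and the abstract does not state how the authors handled non-standard residues or normalization.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_frequencies(sequence, k):
    """Return the relative frequencies of all 20**k k-mers in `sequence`.

    Windows overlap; windows containing a non-standard residue are skipped.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    windows = 0
    for i in range(len(sequence) - k + 1):
        pos = index.get(sequence[i:i + k])
        if pos is not None:
            counts[pos] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [c / windows for c in counts]


single = kmer_frequencies("ACDACD", 1)   # 20-dimensional embedding
triplet = kmer_frequencies("ACDACD", 3)  # 8000-dimensional embedding
print(len(single), len(triplet))
```

The triplet embedding has 8,000 dimensions, far more than the 268 sequences in the dataset, so most entries are zero for any one protein; margin-based classifiers such as SVMs are often chosen for exactly this kind of sparse, high-dimensional input.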

9.
Summary: The importance of molybdoenzymes is exemplified both by the debilitating and fatal human diseases caused by their deficiency and by their persistence throughout evolution. Here, we show that the protein fold of the molybdopyranopterin-containing domain of sulfite oxidase (the SUOX fold) can be found in all three domains of life. Analyses of sequence data and protein structure comparisons (secondary structure matching) show that the SUOX fold is found in enzymes that have quite distinct macromolecular architectures comprising one or more domains and sometimes subsidiary subunits. These are summarized as follows: (i) animal SUOXs that contain an N-terminal cytochrome b5 domain and an SUOX fold fused to a C-terminal dimerization domain; (ii) plant SUOX that contains an SUOX fold fused to a C-terminal dimerization domain; (iii) the YedY protein from Escherichia coli, which comprises only the SUOX fold; (iv) the sulfite dehydrogenase from Starkeya novella that contains the SUOX fold, a dimerization domain, and an additional c-type cytochrome subunit; and (v) the plant-type nitrate reductases, exemplified by that of Pichia angusta, that contain an N-terminal SUOX fold, a dimerization domain, a cytochrome b5 domain, and a C-terminal NADH binding flavin adenine dinucleotide-containing domain. We used the primary sequences of the proteins containing an SUOX fold to mine 559 sequences of related proteins. A phylogeny of a nonredundant subset of these sequences was generated, and the resultant clades were categorized by sequence motif analyses in the context of the available protein structures. Based on the motif analyses, cladistics, and domain conservations, we are able to postulate a plausible pathway of SUOX fold enzyme evolution.

10.
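Protein fold recognition based on a support vector machine fusion network   Total citations: 11; self-citations: 1; citations by others: 11
Protein fold recognition is an important method for analyzing protein structure without relying on sequence similarity. We propose a three-layer support vector machine fusion network that recognizes 27 fold classes starting from the amino acid sequence. The fusion network uses support vector machines as member classifiers and adopts a "many-versus-many" multi-class classification strategy. The six kinds of fold features are divided into primary and secondary features, from which several distinct fusion schemes are built, and the final decision is obtained by dynamically selecting among these schemes. When it is hard to determine before classification which combination of feature types will give the best result, the approach offers a reliable way to automatically select the combination whose feature information is most complementary, ensuring the best classification result. The recognition system achieved an overall classification accuracy of 61.04% on independent test samples. The results and comparisons show that this is an effective fold recognition method.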

11.
The initial aim of the Berkeley Structural Genomics Center is to obtain a near-complete structural complement of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter fewer than 700 genes. To achieve this goal, the current protein targets have been selected starting with those predicted to be most tractable and likely to yield new structural and functional information. During the past 3 years, a semi-automated structural genomics pipeline has been set up, spanning cloning, expression, purification, and ultimately structure determination. The results from the pipeline substantially increased the coverage of the protein fold space of M. pneumoniae and M. genitalium. Furthermore, about 1/2 of the structures of ‘unique’ protein sequences revealed novel folds, and over 2/3 of the structures of previously annotated ‘hypothetical proteins’ allowed their molecular functions to be inferred.

12.
13.
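Protein engineering of subtilisin E   Total citations: 2; self-citations: 0; citations by others: 2
The Bacillus subtilis alkaline protease E gene was modified by site-directed and random mutagenesis. The mutated genes were inserted into the Escherichia coli-B. subtilis shuttle plasmid pBE-2 and expressed in B. subtilis DB104, a strain deficient in alkaline and neutral proteases, yielding mutant alkaline proteases. Their mutation sites were (M222A); (M222A, N118S); (M222A, N118S, Q103R); and (M222A, N118S, Q103R, D60N). Characterization of the mutant enzymes showed that the M222A mutation makes the enzyme resistant to oxidation and the N118S mutation increases its thermostability, whereas the Q103R and D60N mutations increase specific activity but greatly reduce thermostability; the D60N mutation in particular makes the enzyme extremely unstable. The isoelectric points of the wild-type alkaline protease and the (M222A) mutant were both 8.92, while those of the (M222A, N118S), (M222A, N118S, Q103R), and (M222A, N118S, Q103R, D60N) mutants were 8.88, 9.10, and 9.17, respectively. The optimal pH of the enzyme reaction was 7.5 to 9.5 with Nsuc-AAPF-pNA as the substrate and 10 to 12 with casein as the substrate.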

14.
Identifying the fold class of a protein sequence of unknown structure is a fundamental problem in modern biology. We apply a supervised learning algorithm to the classification of protein sequences with low sequence identity from a library of 174 structural classes created with the Combinatorial Extension structural alignment methodology. A class of rules is considered that assigns test sequences to structural classes based on the closest match of an amino acid index profile of the test sequence to a profile centroid for each class. A mathematical optimization procedure is applied to determine an amino acid index of maximal structural discriminatory power by maximizing the ratio of between-class to within-class profile variation. The optimal index is computed as the solution to a generalized eigenvalue problem, and its performance for fold classification is compared to that of other published indices. The optimal index has significantly more structural discriminatory power than all currently known indices, including average surrounding hydrophobicity, which it most closely resembles. It demonstrates >70% classification accuracy over all folds and nearly 100% accuracy on several folds with distinctive conserved structural features. Finally, there is a compelling universality to the optimal index in that it does not appear to depend strongly on the specific structural classes used in its computation.
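The link between maximizing a between-class to within-class variation ratio and a generalized eigenvalue problem can be written out in a generic form. The notation is an assumption for illustration: $w$ is the index vector being optimized, $x_i$ is a feature vector for sequence $i$, $\mu_c$ is the centroid of class $c$ with $n_c$ members, and $\mu$ is the overall mean. The abstract does not specify how the authors construct profiles from the index, so this shows only the standard argument.

```latex
\begin{aligned}
S_B &= \sum_{c} n_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top}, \qquad
S_W = \sum_{c}\sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^{\top}, \\[4pt]
J(w) &= \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}, \qquad
\nabla_w J(w) = 0
\;\Longrightarrow\;
S_B\, w = \lambda\, S_W\, w
\quad\text{with}\quad \lambda = J(w).
\end{aligned}
```

Every stationary point of $J$ is therefore a generalized eigenvector of the pair $(S_B, S_W)$, and the maximizer is the eigenvector belonging to the largest eigenvalue $\lambda$.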

15.
The evolution of proteins is one of the fundamental processes that has delivered the diversity and complexity of life we see around ourselves today. While we tend to define protein evolution in terms of sequence level mutations, insertions and deletions, it is hard to translate these processes to a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We apply an annotation of superfamily age to this space and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies when compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate the robustness of our results, using significant variation in the algorithm used to estimate the ages. We present these age estimates as a useful tool to analyse protein populations. In particular, we apply this in a comparison of domains containing Greek key or jelly roll motifs.

16.
17.
Protein stability can be enhanced by the incorporation of non-natural amino acids and semi-rigid peptidomimetics to lower the entropic penalty upon protein folding through preorganization. An example is the incorporation of aminoisobutyric acid (Aib, α-methylalanine) into proteins to restrict the Φ and Ψ backbone angles adjacent to Aib to those associated with helix formation. Reverse-turn analogs were introduced into the sequences of HIV protease and ribonuclease A that enhanced their stability and retained their native enzymatic activity. In this work, a chimeric protein, design_4, was engineered, in silico, by replacing the C-terminal helix of full sequence design protein (FSD-1) with a semi-rigid helix mimetic. Residues 1–16 of FSD-1 were ligated in silico with the N-terminus of a phenylbipyridyl-based helix mimetic to form design_4. The designed chimeric protein was stable and maintained the designed fold in a 100-nanosecond molecular dynamics simulation at 280 K. Its β-hairpin adopted conformations that formed three additional hydrogen bonds. Compared to FSD-1, design_4 contained fewer peptide bonds and internal degrees of freedom; it should, therefore, be more resistant to proteolytic degradation and denaturation.

18.
19.
Biophysical Journal, 2020, 118(2): 366-375
Despite advances in sampling and scoring strategies, Monte Carlo modeling methods still struggle to accurately predict de novo the structures of large proteins, membrane proteins, or proteins of complex topologies. Previous approaches have addressed these shortcomings by leveraging sparse distance data gathered using site-directed spin labeling and electron paramagnetic resonance spectroscopy to improve protein structure prediction and refinement outcomes. However, existing computational implementations entail compromises between coarse-grained models of the spin label that lower the resolution and explicit models that lead to resource-intense simulations. These methods are further limited by their reliance on distance distributions, which are calculated from a primary refocused echo decay signal and contain uncertainties that may require manual refinement. Here, we addressed these challenges by developing RosettaDEER, a scoring method within the Rosetta software suite capable of simulating double electron-electron resonance spectroscopy decay traces and distance distributions between spin labels fast enough to fold proteins de novo. We demonstrate that the accuracy of the resulting distance distributions matches or exceeds that of distributions generated by more computationally intensive methods. Moreover, decay traces generated from these distributions recapitulate intermolecular background coupling parameters even when the time window of data collection is truncated. As a result, RosettaDEER can discriminate between poorly folded and native-like models by using decay traces that cannot be accurately converted into distance distributions using regularized fitting approaches. Finally, using two challenging test cases, we demonstrate that RosettaDEER leverages these experimental data for protein fold prediction more effectively than previous methods.
These benchmarking results confirm that RosettaDEER can effectively leverage sparse experimental data for a wide array of modeling applications built into the Rosetta software suite.

20.