Similar literature
 20 similar documents found (search time: 15 ms)
1.
Abstract

Recent work suggests that proteins in early evolution passed through a stage of closed loop elements with a typical contour size of 25–35 residues. These closed loops remain the elementary protein units to this day, and can be used to spell out the protein sequence/structure relationship through a relatively small number of protein prototypes. In this study we aimed to identify the sequences that lock the loop ends to one another, and to show how an extensive dictionary of such locking pairs can be created using positional correlation data from a large proteome database and structural data from PDB databases. Such a dictionary can be used in reconstructing the evolutionary pathway that modern proteins have gone through, and in identifying closed loop elements in modern proteins whose 3D structure is still unknown.

2.
Abstract

Conserved protein sequence segments are commonly believed to correspond to functional sites in the protein sequence. A novel approach is proposed to profile the changing degree of conservation along the protein sequence by evaluating the occurrence frequencies of all short oligopeptides of the given sequence in a large proteome database. A sequence conservation profile can thus be plotted for every protein. The profile indicates where along the sequence the potential functional (conserved) sites are located. The oligopeptides belonging to these sites are very frequent across many prokaryotic species. Analysis of a representative set of such profiles reveals a common feature of all examined proteins: they consist of sequence modules represented by the peaks of conservation. The typical size of the modules (peak-to-peak distance) is 25–30 amino acid residues.
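
The profiling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the oligopeptide length `k=3`, the use of raw counts as the conservation score, and the tiny `toy_database` are all assumptions made for illustration.

```python
from collections import Counter

def kmer_counts(database, k):
    """Count every overlapping k-residue oligopeptide in a list of sequences."""
    counts = Counter()
    for seq in database:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def conservation_profile(query, counts, k):
    """One score per window start: how often that oligopeptide occurs in the database."""
    return [counts[query[i:i + k]] for i in range(len(query) - k + 1)]

# Toy data (hypothetical, for illustration only; a real run would use a proteome database).
toy_database = ["MKVLAAGIV", "GGKVLAAPT", "TTKVLAAQW"]
toy_counts = kmer_counts(toy_database, k=3)
profile = conservation_profile("AKVLAAG", toy_counts, k=3)
print(profile)  # [0, 3, 3, 3, 1]
```

Plotting such a list against sequence position would give a profile whose peaks mark frequently occurring oligopeptides; any smoothing or normalization would be a further design choice not specified in the abstract.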

3.
Abstract

The closed loops within the proteins of the TIM-barrel fold family are analyzed and compared in terms of sequence and structure. The size distribution of the closed loops of the TIM-barrels confirms a universal preference for the standard size of 25–30 residues. 3D structural RMSD comparisons of the closed loops and presentation of their sequences in binary form suggest that the TIM-barrel proteins are built from descendants of several types of basic closed loop prototypes. Comparison of these prototypes points to a likely common ancestor: alpha-helix-containing closed loops of 28 amino acids. The presumed ancestor is characterized by a specific binary consensus sequence.

4.
Accurately predicted protein secondary structure provides useful information for selecting targets, analyzing protein function, and predicting higher-dimensional structure. Existing research shows that more data + refined search = better prediction. We analyze the relation between prediction accuracy and another crucial factor, protein size. Empirical tests performed with two secondary structure predictors on a large set of high-resolution, non-redundant proteins show that the average accuracies for small proteins (<100 residues) equal 73% and 54% for alpha-helices and beta-strands, respectively. The alpha-helix/beta-strand accuracies for very large proteins (>300 residues) equal 77%/68%, respectively. Similarly, tests with three secondary structure content predictors show that the prediction errors for the small/very large proteins equal 0.13/0.09 and 0.09/0.06 for alpha-helix and beta-strand content, respectively. Our tests confirm that secondary structure and content predictions for very large proteins are of statistically significantly better quality than predictions for small proteins. This is in contrast with tertiary structure prediction, in which higher accuracy is obtained for smaller proteins.

5.
Methods for protein structure prediction are flourishing and becoming widely available to both experimentalists and computational biologists. However, how good are they? What is their range of applicability and how can we know which method is better suited for the task at hand? These are the questions that this review tries to address, by describing the worldwide Critical Assessment of techniques for protein Structure Prediction (CASP) initiative and focusing on the specific problems of assessing the quality of a protein 3D model.

6.
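Knowledge-based protein structure prediction   Cited by: 5 in total, self-citations: 0, citations by others: 5
This paper reviews knowledge-based methods for predicting protein three-dimensional structure and their progress in recent years. Current knowledge-based structure prediction methods fall into two classes. The first is homology modeling, a relatively mature technique whose results are fairly reliable, but which applies only to target sequences with relatively high homology. The second is protein inverse folding, which mainly comprises the 3D profile method and potential-function-based methods; it yields the spatial trace of the target protein and is mainly useful for predicting the structure of proteins with relatively low sequence homology.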

7.
石鸥燕  杨晶  杨惠云  田心 《现代生物医学进展》2007,7(11):1723-1724,1706
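Protein secondary structure prediction is a crucial step toward understanding protein three-dimensional structure. This article proposes a simple secondary structure prediction method that uses majority voting to combine the predictions of three good existing secondary structure prediction methods into a consensus prediction. The method was tested on 57 proteins with less than 30% similarity, randomly selected from the PDB from among structures newly determined in the preceding two years. Its Q3 accuracy was 1.12–2.29% higher than that of the three individual methods, and the correlation coefficients and SOV accuracy improved correspondingly. All accuracy measures were also higher than those of Jpred, a secondary structure prediction program that likewise uses a consensus approach. Although simple in principle, the method requires no additional parameters, is computationally cheap, and is easy to implement; the most important prerequisite is that the underlying secondary structure prediction methods be among the most accurate currently available.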
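
A per-residue majority vote over three predictors, together with the Q3 score used to evaluate it, might be sketched as follows. This is an illustration only: the three-state alphabet (H, E, C), the fallback to coil when all three predictors disagree, and the example strings are assumptions, since the abstract does not say how ties are resolved.

```python
from collections import Counter

def consensus(predictions, tie_break="C"):
    """Per-residue majority vote over equal-length secondary structure strings."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for states in zip(*predictions):
        ranked = Counter(states).most_common()
        top, top_n = ranked[0]
        # Assumed rule: a tie (e.g. H, E, C from three predictors) falls back to tie_break.
        tied = len(ranked) > 1 and ranked[1][1] == top_n
        result.append(tie_break if tied else top)
    return "".join(result)

def q3(predicted, observed):
    """Percentage of residues whose three-state label matches the observed one."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

# Hypothetical outputs of three predictors for a 10-residue protein.
preds = ["HHHHCCEEEC", "HHHCCCEEEC", "HHHHCEEECC"]
observed = "HHHHCCEEEC"
voted = consensus(preds)
print(voted, q3(voted, observed))  # HHHHCCEEEC 100.0
```

In this toy case each predictor makes at most two errors, but the errors fall at different positions, so the vote recovers the observed string; that is the effect a consensus scheme relies on.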

8.
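A genomic library of the cotton bollworm multinucleocapsid nucleopolyhedrovirus was constructed by the partial fill-in method, and the 38k gene was obtained by clone identification and sequence analysis of the inserted fragments. The gene has the conserved late regulatory sequence TTAAG upstream and is a late-expressed gene. Its reading frame is 903 bp and encodes 300 amino acids. Amino acid sequence homology analysis showed relatively high homology with alphabaculoviruses, indicating a close phylogenetic relationship. Analysis of the higher-order structure showed 95% structural similarity to phosphatases and suggested that the protein is involved in the assembly of the viral nucleocapsid.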

9.
Widely used models of protein evolution ignore protein structure. Therefore, these models do not predict spatial clustering of amino acid replacements with respect to tertiary structure. One formal and biologically implausible possibility is that there is no tendency for amino acid replacements to be spatially clustered during evolution. An alternative is that amino acid replacements are spatially clustered and that this clustering can be fully explained by a tendency for similar rates of amino acid replacement at sites that are nearby in protein tertiary structure. A third possibility is that the amount of clustering exceeds that which can be explained solely on the basis of independently evolving protein sites with spatially clustered replacement rates. We introduce two simple and not very parametric hypothesis tests that help distinguish these three possibilities. We then apply these tests to 273 homologous protein families. The null hypothesis of no spatial clustering is rejected for 102 of 273 families. The explanation of spatially clustered rates but independent change among sites is rejected for 43 families. These findings need to be reconciled with the common practice of basing evolutionary inferences on models that assume independent change among sites. [Reviewing Editor: Dr. David Pollock]

10.
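Domains in protein structure and function   Cited by: 4 in total, self-citations: 1, citations by others: 4
A domain is a compact globular region within the structure of a protein subunit. As an additional structural level between secondary and tertiary structure, the domain serves as an independent unit of structure, function, and folding. In complex proteins, domains act as structural and functional modules and as genetic units. Research at the domain level will advance the study of protein structure-function relationships, protein folding mechanisms, and protein design.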

11.
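In recent years, research on protein supersecondary structure (supersecondary motifs, Motifs) has become an active topic internationally, and related research papers have begun to appear in China as well. Protein supersecondary structure is a further combination of two or more regular secondary structure elements, or can be viewed as local folding of secondary structure. This article reviews the definition and characteristics of protein motif structures and the significance of research at this structural level, and briefly introduces progress in protein motif research.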

12.
Abstract

In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.
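
One way to arrange four-tuple counts into 3 × 3 matrices is sketched below. The abstract does not specify the arrangement, so the layout here is an assumption: a three-letter alphabet (H, E, C), one matrix per two-letter prefix, with rows and columns indexed by the third and fourth letters. The 81 possible four-tuples then fill nine 3 × 3 matrices.

```python
from itertools import product

STATES = "HEC"  # helix, strand, coil: an assumed three-letter alphabet

def four_tuple_matrices(ss):
    """Map each two-letter prefix to a 3x3 count matrix of the two letters that follow."""
    index = {s: i for i, s in enumerate(STATES)}
    matrices = {a + b: [[0] * 3 for _ in range(3)]
                for a, b in product(STATES, repeat=2)}
    # Slide a window of four letters along the secondary structure string.
    for i in range(len(ss) - 3):
        a, b, c, d = ss[i:i + 4]
        matrices[a + b][index[c]][index[d]] += 1
    return matrices

mats = four_tuple_matrices("HHHHEEEECCCC")
print(mats["HH"])  # [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
```

The leading eigenvalue of each matrix could then be obtained with a linear algebra routine such as `numpy.linalg.eigvals`; that step is left out here to keep the sketch free of third-party dependencies.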

13.
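The protein encoded by the T-phylloplanin gene was analyzed using information provided on the website of the US National Center for Biotechnology Information (NCBI). The gene is 861 bp long and has a complete open reading frame of 330 bp encoding 110 amino acids. The encoded protein has a molecular weight of 11.31 kD and a theoretical isoelectric point of 7.74. Multiple N-glycosylation sites, casein kinase II phosphorylation sites, and N-myristoylation sites are distributed across different regions of its amino acid sequence, and it also has a transmembrane signal peptide. The protein encoded by the T-phylloplanin gene is highly homologous (93%) to the resistance protein secreted by the short-stalked glandular trichomes of tobacco leaves, which indicates its potential value for research on the tobacco resistance system.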

14.
Prediction of the Secondary Structure of Myelin Basic Protein   Cited by: 14 in total, self-citations: 10, citations by others: 4
An investigation into the probable secondary structure of the myelin basic protein was carried out by the application of three procedures currently in use to predict the secondary structures of proteins from knowledge of their amino acid sequences. In order to increase the accuracy of the predictions, the amino acid substitutions that occur in the basic protein from different species were incorporated into the predictive algorithms. It was possible to locate regions of probable alpha-helix, beta-structure, beta-turn, and unordered conformation (coil) in the protein. One of the predictive methods introduces a bias into the algorithm to maximize or minimize the amounts of alpha-helix and/or beta-structure present; this made it possible to assess how conditions such as pH and protein concentration or the presence of anionic amphiphilic molecules could influence the protein's secondary structure. The predictions made by the three methods were in reasonably good agreement with one another. They were consistent with experimental data, provided that the stabilizing or destabilizing effects of the environment were taken into account. According to the predictions, the extent of possible alpha-helix and beta-structure formation in the protein is severely restricted by the low frequency and extensive scattering of hydrophobic residues, along with a high frequency and extensive scattering of residues that favor the formation of beta-turns and coils. Neither prolyl residues nor cationic residues per se are responsible for the low content of alpha-helix predicted in the protein. The principal ordered conformation predicted is the beta-turn. Many of the predicted beta-turns overlap extensively, involving in some cases up to 10 residues. In some of these structures it is possible for the peptide backbone to oscillate in a sinusoidal manner, generating a flat, pleated sheetlike structure. Cationic residues located in these structures would appear to be ideally oriented for interaction with lipid phosphate groups located at the cytoplasmic surface of the myelin membrane. An analysis of possible and probable conformations that the triproline sequence could assume questions the popular notion that this sequence produces a hairpin turn in the basic protein.

15.
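Application of genetic algorithms to protein structure prediction   Cited by: 2 in total, self-citations: 0, citations by others: 2
The genetic algorithm (GA) is an adaptive, heuristic, probabilistic, iterative global search algorithm. It is independent of the problem model, globally optimizing, implicitly parallel, efficient, and robust across different nonlinear problems, and it is now widely applied in automatic control, robotics, computer science, pattern recognition, fuzzy artificial neural networks, and engineering optimization design. This paper first introduces the basic principle of the GA, that is, the basic search process. It then summarizes the advantages of the GA over traditional algorithms. The third part reviews the models, design, and execution strategies mainly used when applying GAs to protein structure prediction, along with progress in combining GAs with other algorithms to predict protein structure. Finally, the authors give their view of open problems in GA research and an outlook for future work.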
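
The basic GA search process mentioned in this abstract can be illustrated with a toy sketch. It is not a protein model: the bit-string encoding, tournament selection, one-point crossover, elitism, and every parameter value are assumptions chosen only to show the loop, and the fitness function (count of 1 bits) stands in for an energy or scoring function.

```python
import random

def genetic_search(fitness, n_bits, pop_size=30, generations=60, p_mut=0.02, seed=0):
    """Toy GA: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def pick():
            # Tournament of two: keep the fitter of two random individuals.
            return max(rng.sample(pop, 2), key=fitness)
        children = [best[:]]  # elitism: the current best always survives unchanged
        while len(children) < pop_size:
            mum, dad = pick(), pick()
            cut = rng.randrange(1, n_bits)  # one-point crossover position
            child = mum[:cut] + dad[cut:]
            # Flip each bit independently with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Toy objective (hypothetical): maximize the number of 1 bits in a 20-bit string.
best = genetic_search(sum, n_bits=20)
print(sum(best))
```

Because the best individual is copied into each new generation, the best fitness never decreases from one generation to the next; without that elitism step a good solution can be lost to crossover or mutation.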

16.
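A protein's sequence determines its structure, and its structure determines its function. A new generation of accurate protein structure prediction tools has brought new opportunities and challenges to structural biology, structural bioinformatics, drug discovery, the life sciences, and many other fields, and the accuracy of single-chain protein structure prediction has reached a level comparable to that of experimental methods. This review outlines the theoretical basis, history, and latest advances of protein structure prediction, discusses how large numbers of predicted protein structures and AI-based methods are affecting experimental structural biology, and finally analyzes the unsolved problems and future research directions of the field.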

17.
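Partial structure of wheat polygalacturonase-inhibiting protein   Cited by: 4 in total, self-citations: 0, citations by others: 4
To elucidate the mechanism of action of wheat polygalacturonase-inhibiting protein (PGIP) and to provide a basis for its application in genetic engineering, its structure was studied. The N-terminal sequence of wheat PGIP, determined by Edman degradation, is Lys-Pro-Leu-Leu-Thr-Lys-Ile-Thr-Lys-Gly-Ala-Ala-Ser-Thr. Secondary structure analysis by CD spectroscopy showed that native wheat PGIP contains 43.7% β-sheet and 13.1% α-helix. Acid, alkali, and heat denaturation altered the secondary structure. In the stage of incomplete denaturation, the α-helix showed no obvious change while the β-sheet was disrupted; in the stage of complete loss of activity, the β-sheet changed little while the α-helix content decreased markedly. NR-R (non-reducing/reducing) two-dimensional diagonal SDS-PAGE showed that wheat PGIP contains intrachain disulfide bonds. Deglycosylation confirmed that the sugar content of wheat PGIP is 2 2 %. Compared with PGIPs from dicotyledonous plants, wheat PGIP differs considerably in primary structure, with homology ranging from 36% down to 9%; the secondary structures are similar, both being high-β-sheet proteins; both have intrachain disulfide bonds; and the sugar contents are also similar. These results lay a foundation for further elucidating the mechanism of action of wheat PGIP and are significant for the genetic engineering of plant resistance to Fusarium head blight (scab).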

18.
曹晨  马堃 《生物信息学》2016,14(3):181-187
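Protein secondary structure refers to the regularly repeating conformations in the protein backbone. Correctly assigning secondary structure from protein atomic coordinates is the basis for analyzing protein structure and function; secondary structure assignment plays an important role in protein classification, in the discovery of functional motifs, and in understanding protein folding mechanisms. Secondary structure information is also widely used in protein molecular visualization, protein alignment, and protein structure prediction. There are currently more than 20 methods for assigning protein secondary structure, which fall broadly into two classes, hydrogen-bond-based and geometry-based, and the assignments produced by different methods differ considerably. Because no review of protein secondary structure assignment methods exists yet, this article introduces and summarizes the existing methods.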

19.
The publication of the crystallographic structure of the calmodulin protein has offered an example leading us to believe that it is possible for many protein sequence segments to exhibit multiple 3D structures, referred to as multi-structural segments. To this end, this paper presents a statistical analysis of the uniqueness of the 3D structure of all possible protein sequence segments stored in the Protein Data Bank (PDB, January 2003, release 103) that occur at least twice and whose lengths are greater than 10 amino acids (AAs). We refined the set of segments by choosing only those that are not parts of longer segments, which resulted in 9297 segments called a sponge set. By adding 8197 signature segments, which occur uniquely in the PDB, to the sponge set we generated a benchmark set. Statistical analysis of the sponge set demonstrates that the rotating, missing, and disarranging operations described in the text result in the segments becoming multi-structural. It turns out that missing segments do not exhibit a change of shape in the 3D structure of a multi-structural segment. We use the root mean square distance for unit vector sequences (URMSD) as an improved measure to describe the characteristics of hinge rotations, missing segments, and disarranging segments. We estimated the rate of occurrence of rotating and disarranging segments in the sponge set and divided it by the number of sequences in the benchmark set; the result is less than 0.85%. Since two of the structure-changing operations concern a negligible number of segments and the third is found to have no impact on the structure, we conclude that the 3D structure of proteins is conserved statistically for more than 98% of the segments. At the same time, the remaining 2% of the sequences may pose problems for structure prediction methods based on sequence alignment.
*Jishou Ruan's research was supported by the Liuhui Center for Applied Mathematics, the China-Canada exchange program administered by MITACS, and NSFC (10271061). #Ke Chen and Lukasz A. Kurgan's research was partially supported by NSERC Canada. Jack A. Tuszynski's research has been supported by MITACS, NSERC Canada, and the Allard Foundation.

20.
The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for the understanding of how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred by protein–protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
