Similar articles
 Found 20 similar articles (search time: 78 ms)
1.
张超  张晖  李冀新  高红 《生物信息学》2006,4(3):128-131
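Genetic algorithms (GAs), inspired by the laws of natural evolution, are adaptive, heuristic, probabilistic, iterative global search algorithms. This paper introduces the basic principles, procedure, and advantages of GAs. It then summarizes model construction and execution strategies for GAs in protein structure prediction, as well as research progress on combining multiple algorithms to predict protein structure.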

2.
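By studying algorithms based on neural network weight matrices, this work explores the intrinsic relationship between protein secondary structure and amino acid sequence, with the aim of improving the accuracy of predicting secondary structure from primary sequence. Neural networks perform well at feature classification, and the neuron connection weight matrix obtained after training captures the intrinsic features and regularities of the samples. The study uses the neural network weight matrix for scoring-based prediction, applies a shifted-alignment method to locate sensitive amino acid neighborhoods, and analyzes the common behavior of the test set under different window lengths. Experiments show that prediction performance changes markedly at a sliding window length of L=7, and that the amino acid residue at neighborhood position P=4 strengthens prediction performance. The approach offers a new algorithm design for protein secondary structure prediction based on local sequence features.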
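The sliding-window idea mentioned in this abstract can be illustrated with a minimal sketch. It only shows how windows of length L=7 might be cut from a sequence. The function name `sliding_windows`, the padding character `-`, and the example sequence are assumptions made for illustration and are not taken from the paper.

```python
def sliding_windows(sequence, length=7, pad="-"):
    """Return one window of `length` residues centred on each position.

    Positions near the ends are padded with `pad` so that every residue
    gets a full-length window (an assumption; the paper may differ).
    """
    if length % 2 == 0:
        raise ValueError("window length must be odd so it has a centre residue")
    half = length // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + length] for i in range(len(sequence))]


# Hypothetical toy sequence, not real data.
windows = sliding_windows("MKTAYIAK")
```

Each window would then be scored by whatever predictor is in use. The sketch stops at window extraction because the abstract does not specify the scoring step.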

3.
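In prediction and analysis methods based on protein structural alphabets, a necessary step is to discretize the target protein into a sequence of structural letters. Based on an exhaustive analysis of the space of structural-alphabet sequences and of how the minimum root-mean-square deviation varies across it, this paper proposes a new optimization algorithm for structural-alphabet sequences, the global greedy algorithm. The global greedy algorithm avoids the drawbacks of the basic greedy algorithm: excessive dependence on the candidate set size, heavy computational cost, and premature convergence to a local minimum. Experimental analysis shows that the global greedy algorithm outperforms both the basic greedy algorithm and the local optimum method.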

4.
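This paper studies the two-dimensional HP model on a rhombic lattice. First, the protein structure prediction problem is converted into a mathematical problem and simplified to matching each amino acid in the sequence to a lattice site. To solve this problem, we improve and extend the classical particle swarm optimization algorithm. To verify the effectiveness of the algorithm and model, we run numerical simulations on several typical test cases. Compared with protein conformations obtained on a square lattice, those on the rhombic lattice are more natural and closer to reality. We further compare compact and non-compact conformations on the rhombic lattice. The results show that our model and algorithm are effective and meaningful for predicting the protein structure of amino acid sequences on a rhombic lattice.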
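For readers unfamiliar with the HP model, the following is a minimal sketch of its usual energy function. For simplicity it uses a square lattice rather than the rhombic lattice studied in the paper. The function name `hp_energy` and the example conformations are assumptions made for illustration.

```python
def hp_energy(sequence, coords):
    """Score a 2D square-lattice HP conformation.

    Each pair of H residues that sit on adjacent lattice sites but are
    not consecutive in the chain contributes -1 to the energy.
    """
    if len(sequence) != len(coords):
        raise ValueError("need exactly one lattice site per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:  # adjacent sites
                    energy -= 1
    return energy


# Hypothetical four-residue chain: folded into a square vs. fully extended.
folded = hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
extended = hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])
```

A search method such as the particle swarm variant described in the abstract would look for the conformation that minimizes this kind of energy. On a different lattice, only the adjacency test would need to change.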

5.
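Usage of bases flanking codons in Escherichia coli genes and protein structure   Total citations: 4 (self-citations: 0; citations by others: 4)
Statistical analysis and comparison of the bases flanking codons that encode each type of protein secondary structure (α-helix, β-sheet, random coil, and turn) in a set of E. coli genes show that the usage of bases flanking some codons is biased, and that these biases are associated with protein secondary structure. This also indicates that the choice of synonymous codons in E. coli genes is to some extent associated with protein secondary structure. The model may assist in improving protein structure prediction algorithms and in genetic engineering research.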

6.
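This study examines in detail the applicability of a protein design method based on the HNP (H: hydrophobic, N: neutral, P: hydrophilic) model and relative entropy to proteins of different structural classes, and compares it with results based on the HP model. Predictions for 190 proteins from four structural classes show that the design method based on the HNP model and relative entropy is generally applicable across structural classes. Further study finds that the prediction success rate for regular secondary structures such as α-helices and β-sheets is higher than for random coil. Differences among amino acids were also compared, and hydrophilic residues show a higher prediction success rate. In addition, the method's success rate for conserved residues is higher than for non-conserved residues. Based on these analyses, the causes of these differences are discussed. These results lay a good foundation for the practical application and further development of relative-entropy-based protein design.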

7.
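Discovery of association rules in protein sequences and its applications   Total citations: 2 (self-citations: 0; citations by others: 2)
As the machine learning algorithms used in protein sequence-structure analysis grow more complex, interpreting their results and the discovery process also becomes more complicated, so simple and theoretically sound methods are needed. By introducing association rule discovery, an approach that is simple in principle, theoretically sound, and yields results with strong practical meaning, tens of thousands of patterns were found in protein sequences. Examples demonstrate how these patterns can be applied to protein sequence analysis, for instance in conserved region discovery and secondary structure prediction. Based on these results, a secondary structure rule base and a simple secondary structure prediction algorithm were constructed. Experiments show that about 81% of secondary structure can be predicted by at least one association rule.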
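Association rules are usually ranked by support and confidence. The sketch below shows those two standard measures on made-up data. The function names and the toy `transactions` (sets mixing residue labels with a structure label) are assumptions made for illustration and do not reproduce the paper's rule base.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as a ratio of supports."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical toy data: residues seen in a window plus the observed structure.
transactions = [
    {"A", "L", "helix"},
    {"A", "L", "helix"},
    {"A", "G", "coil"},
    {"P", "G", "coil"},
]
rule_support = support(transactions, {"A", "L"})
rule_confidence = confidence(transactions, {"A", "L"}, {"helix"})
```

A rule such as "A and L in the window implies helix" would be kept only if both values exceed chosen thresholds. The abstract does not state which thresholds the authors used.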

8.
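Sequence alignment is an important tool in bioinformatics research, widely used in sequence assembly, protein structure prediction, protein structure-function analysis, phylogenetic analysis, database searching, and primer design. This paper describes in detail several sequence alignment algorithms commonly used in bioinformatics, compares their computational complexity, advantages, and disadvantages, discusses the scope of application of each, and points out future directions for sequence alignment research.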
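As one concrete instance of the algorithms such a review covers, here is a minimal sketch of global alignment scoring by dynamic programming, in the style of Needleman-Wunsch. The abstract does not name specific algorithms, so this choice and the scoring parameters (match +1, mismatch -1, gap -1) are assumptions made for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Uses O(len(a) * len(b)) time and keeps only two rows of the
    dynamic-programming table, so memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical toy sequences.
score = global_alignment_score("ACGT", "AGT")
```

The quadratic running time of this approach is the main reason heuristic methods are preferred for database-scale searches.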

9.
Research progress in protein secondary structure prediction   Total citations: 1 (self-citations: 0; citations by others: 1)
唐媛  李春花  张瑗  尚进  邹凌云  李立奇 《生物磁学》2013,(26):5180-5182
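Knowing a protein's secondary structure is the basis for understanding its folding pattern and tertiary structure. It provides a structural foundation for studying protein function and interaction modes, and can also aid new drug development. Studying protein secondary structure is therefore of great significance. With the arrival of the post-genomic era, more and more protein sequences are being discovered, bringing both great challenges and room for research on protein secondary structure. Traditional experimental methods can hardly deliver secondary structure information for proteins at large scale. At present, bioinformatics remains the route by which secondary structure is obtained for most proteins. In recent years the field has developed rapidly, as many researchers have built protein datasets for secondary structure prediction, computed and extracted various protein features, and applied different prediction algorithms. This paper reviews feature extraction and selection, prediction algorithms, and methods for evaluating prediction performance, and outlines research progress in protein secondary structure prediction. As genomics, proteomics, and bioinformatics continue to develop, protein secondary structure prediction is expected to keep achieving new breakthroughs.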

10.
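This paper predicts 2D three-directional (三向) protein structure based on van der Waals potential energy. First, the biological problem of protein structure prediction is converted into a mathematical problem, and a mathematical model based on a van der Waals potential energy function is built. Second, a genetic algorithm is used to solve the model. To improve prediction efficiency, we introduce an adjustment operator into the standard genetic algorithm, yielding an improved genetic algorithm. Finally, numerical simulation experiments are carried out. The results show that the van der Waals potential energy model is feasible, that the improved genetic algorithm substantially raises search efficiency compared with the canonical genetic algorithm, and that genetic algorithms have great potential for protein structure prediction.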

11.
Kamat AP  Lesk AM 《Proteins》2007,66(4):869-876
Comparing and classifying protein folding patterns allows organizing the known structures and enumerating possible protein structural patterns including those not yet observed. We capture the essence of protein folding patterns in a concise tableau representation based on the order and contact patterns of secondary structures: helices and strands of sheet. The tableaux are intelligible to both humans and computers. They provide a database, derived from the Protein Data Bank, mineable in studies of protein architecture. Using this database, we have: (i) determined statistical properties of secondary structure contacts in an unbiased set of protein domains from ASTRAL, (ii) observed that in 98% of cases, the tableau is a faithful representation of the folding pattern as classified in SCOP, (iii) demonstrated that to a large extent the local structure of proteins indicates their complete folding topology, and (iv) studied the use of the representation for fold identification.

12.
Trevor P. Creamer 《Proteins》1998,33(2):218-226
The left-handed polyproline II helix (PPII) is believed to be the preferred conformation for proline-rich regions of sequence in proteins. Such regions have been postulated to be protein-protein interaction domains. The formation of this structure is studied here using simple Monte Carlo computer simulations employing the hard sphere potential. It is found that polyproline sequences adopt only the PPII structure in the simulations. Non-proline, non-glycine residues inserted as guests into polyproline host peptides are conformationally restricted by the following proline residues and tend to be part of the PPII helix. It is found through insertion of two alanine residues into polyproline that the PPII structure is not propagated through more than one non-proline residue. This finding calls into question the hypothesis that proline-rich regions will preferentially adopt this structure since many such sequences are comprised of less than 50% proline residues. Proteins 33:218–226, 1998. © 1998 Wiley-Liss, Inc.

13.
14.
Supersecondary structures of proteins have been systematically searched and classified, but not enough attention has been devoted to such large edifices beyond the basic identification of secondary structures. The objective of the present study is to show that the association of secondary structures that share some of their backbone residues is commonplace in globular proteins, and that such deeper fusion of secondary structures, namely extended secondary structures (ESSs), helps stabilize the original secondary structures and the resulting tertiary structures. For statistical purposes, a set of 163 proteins from the Protein Data Bank was randomly selected, and a few specific cases are structurally analyzed and characterized in more detail. The results indicate that about 30% of the residues from each protein, on average, participate in ESSs. Alternatively, for the specific cases considered, our results were based on the secondary structures produced after extensive molecular dynamics simulation of a protein–aqueous solvent system. Based on the very small width of the time distribution of the root mean squared deviations between the ESS taken along the simulation and the ESS from the mean structure of the protein, for each ESS, we conclude that the ESSs significantly increase the conformational stability by forming very stable aggregates. The ubiquity and specificity of the ESSs suggest that the role they play in the structure of proteins, including domain formation, deserves to be thoroughly investigated.

15.
Mark Gerstein 《Proteins》1998,33(4):518-534
Eight microbial genomes are compared in terms of protein structure. Specifically, yeast, H. influenzae, M. genitalium, M. jannaschii, Synechocystis, M. pneumoniae, H. pylori, and E. coli are compared in terms of patterns of fold usage, that is, whether a given fold occurs in a particular organism. Of the ∼340 soluble protein folds currently in the structure databank (PDB), 240 occur in at least one of the eight genomes, and 30 are shared amongst all eight. The shared folds are depleted in all-helical structure and enriched in mixed helix-sheet structure compared to the folds in the PDB. The top-10 most common of the shared 30 are enriched in superfolds, uniting many non-homologous sequence families, and are especially similar in overall architecture, eight having helices packed onto a central sheet. They are also very different from the common folds in the PDB, highlighting databank biases. Folds can be ranked in terms of expression as well as genome duplication. In yeast the top-10 most highly expressed folds are considerably different from the most highly duplicated folds. A tree can be constructed grouping genomes in terms of their shared folds. This has a remarkably similar topology to more conventional classifications, based on very different measures of relatedness. Finally, folds of membrane proteins can be analyzed through transmembrane-helix (TM) prediction. All the genomes appear to have similar usage patterns for these folds, with the occurrence of a particular fold falling off rapidly with increasing numbers of TM-elements, according to a “Zipf-like” law. This implies there are no marked preferences for proteins with particular numbers of TM-helices (e.g. 7-TM) in microbial genomes. Further information pertinent to this analysis is available at http://bioinfo.mbb.yale.edu/genome. Proteins 33:518–534, 1998. © 1998 Wiley-Liss, Inc.

16.
Mönnigmann M  Floudas CA 《Proteins》2005,61(4):748-762
The structure prediction of loops with flexible stem residues is addressed in this article. While the secondary structure of the stem residues is assumed to be known, the geometry of the protein into which the loop must fit is considered to be unknown in our methodology. As a consequence, the compatibility of the loop with the remainder of the protein is not used as a criterion to reject loop decoys. The loop structure prediction with flexible stems is more difficult than fitting loops into a known protein structure in that a larger conformational space has to be covered. The main focus of the study is to assess the precision of loop structure prediction if no information on the protein geometry is available. The proposed approach is based on (1) dihedral angle sampling, (2) structure optimization by energy minimization with a physically based energy function, (3) clustering, and (4) a comparison of strategies for the selection of loops identified in (3). Steps (1) and (2) have similarities to previous approaches to loop structure prediction with fixed stems. Step (3) is based on a new iterative approach to clustering that is tailored for the loop structure prediction problem with flexible stems. In this new approach, clustering is not only used to identify conformers that are likely to be close to the native structure, but clustering is also employed to identify far-from-native decoys. By discarding these decoys iteratively, the overall quality of the ensemble and the loop structure prediction is improved. Step (4) provides a comparative study of criteria for loop selection based on energy, colony energy, cluster density, and a hybrid criterion introduced here. The proposed method is tested on a large set of 3215 loops from proteins in the Pdb-Select25 set and on 179 loops from proteins from the Casp6 experiment.

17.
For biomolecular NMR structures typically only a poor correspondence is observed between statistics derived from the experimental input data and structural quality indicators obtained from the structure ensembles. Here, we investigate the relationship between the amount of available NMR data and structure quality. By generating datasets with a predetermined information content and evaluating the quality of the resulting structure ensembles we show that there is, in contrast to previous findings, a linear relation between the information contained in experimental data and structural quality. From this relation, a new quality parameter is derived that provides direct insight, on a per-residue basis, into the extent to which structural quality is governed by the experimental input data.

18.
Kosloff M  Kolodny R 《Proteins》2008,71(2):891-902
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. 
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

19.
A novel protein structure alignment technique has been developed reducing much of the secondary and tertiary structure to a sequential representation greatly accelerating many structural computations, including alignment. Constructed from incidence relations in the Delaunay tetrahedralization, alignments of the sequential representation describe structural similarities that cannot be expressed with rigid-body superposition and complement existing techniques minimizing root-mean-squared distance through superposition. Restricting to the largest substructure superimposable by a single rigid-body transformation determines an alignment suitable for root-mean-squared distance comparisons and visualization. Restricted alignments of a test set of histones and histone-like proteins determined superpositions nearly identical to those produced by the established structure alignment routines of DaliLite and ProSup. Alignment of three, increasingly complex proteins: ferredoxin, cytidine deaminase, and carbamoyl phosphate synthetase, to themselves, demonstrated previously identified regions of self-similarity. All-against-all similarity index comparisons performed on a test set of 45 class I and class II aminoacyl-tRNA synthetases closely reproduced the results of established distance matrix methods while requiring 1/16 the time. Principal component analysis of pairwise tetrahedral decomposition similarity of 2300 molecular dynamics snapshots of tryptophanyl-tRNA synthetase revealed discrete microstates within the trajectory consistent with experimental results. The method produces results with sufficient efficiency for large-scale multiple structure alignment and is well suited to genomic and evolutionary investigations where no geometric model of similarity is known a priori.

20.
曹晨  马堃 《生物信息学》2016,14(3):181-187
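Protein secondary structure refers to the regularly repeating conformations in the protein backbone. Correctly assigning secondary structure from atomic coordinates is the basis for analyzing protein structure and function, and secondary structure assignment plays an important role in protein classification, the discovery of functional motifs, and understanding protein folding mechanisms. Secondary structure information is also widely used in molecular visualization, protein alignment, and protein structure prediction. More than 20 secondary structure assignment methods currently exist. They fall broadly into two categories, hydrogen-bond-based and geometry-based, and the assignments produced by different methods differ considerably. Because no review of protein secondary structure assignment methods is yet available, this paper introduces and summarizes the existing methods.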
