Similar articles
1.
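As a type of protein secondary structure, β-turns play an important role in protein folding, protein stability, and molecular recognition. Existing β-turn prediction methods do not map the structural information of homologous sequences already present in structure databases such as the PDB onto the protein sequence to be predicted. The PDB holds more than 70,000 structures, so for a newly determined sequence there is a good chance of finding a homolog in the PDB. This paper combines homologous structural information extracted from the PDB (for each query sequence, only homologs deposited in the PDB before that sequence are used) with NetTurnP predictions and proposes a new β-turn prediction method, BTMapping. On the classic BT426 dataset and on EVA937, a dataset constructed in this work, the prediction accuracy measured by the Matthews correlation coefficient is 0.56 and 0.52, respectively, versus 0.50 and 0.46 for NetTurnP alone; measured by Qtotal it is 81.4% and 80.4%, versus 78.2% and 77.3% for NetTurnP alone. The results confirm that combining homologous structural information with an advanced β-turn predictor such as NetTurnP helps improve β-turn identification. The BTMapping program and related datasets are available at http://www.bio530.weebly.com.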
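The two accuracy measures quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions (Qtotal as the percentage of residues classified correctly, and the standard Matthews correlation coefficient) and a hypothetical per-residue labelling in which `T` marks a turn residue and `-` a non-turn residue. The function names are illustrative and are not taken from BTMapping or NetTurnP.

```python
import math


def confusion_counts(observed, predicted):
    """Count TP, TN, FP, FN for per-residue labels ('T' = turn, anything else = non-turn)."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must have equal length")
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs == "T" and pred == "T":
            tp += 1
        elif obs != "T" and pred != "T":
            tn += 1
        elif obs != "T" and pred == "T":
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def q_total(tp, tn, fp, fn):
    """Percentage of residues classified correctly (assumed definition of Qtotal)."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels, invented for illustration only.
counts = confusion_counts("TTTT----", "TTT-T---")
print(q_total(*counts), mcc(*counts))
```

Because turn residues are typically a minority class, MCC tends to be the more informative of the two numbers: a predictor that never predicts a turn can still reach a high Qtotal while its MCC stays at zero.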

2.
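Advances in protein structure prediction   Cited by: 1 (self-citations: 0, citations by others: 1)
Protein structure prediction is one of the main current challenges in bioinformatics. According to how heavily they depend on information in the PDB database, prediction methods fall into two classes: template-based modeling and ab initio prediction. Template-based modeling can in turn be divided into homology modeling and threading. This paper describes the main steps of each method, analyzes the bottlenecks that limit each, and reviews recent progress. Homology modeling yields relatively high structural accuracy but depends strongly on the template; threading, which targets low-homology cases, is an important research direction within template-based modeling; in ab initio prediction, combining statistical and physics-based functions has worked well, but fragments of more than 150 residues remain a great challenge.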

3.
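Research trends in the key steps of homology modeling   Cited by: 1 (self-citations: 0, citations by others: 1)
Protein structure prediction by homology modeling has become a technique for rapidly obtaining protein structures, and it will also become a powerful tool for completing structural genomics projects. Homology modeling means finding a protein that is homologous to the target sequence and has an experimentally determined structure, using it as a template, and building a structural model of the target sequence from it. What mainly limits the method is the accuracy of its key steps: target-template sequence alignment and loop modeling. Once model accuracy reaches a convincing level, more precise computer-aided drug design, protein engineering, and even the design of proteins with entirely new functions will become possible. This review covers progress in improving the accuracy of these key steps through algorithms and strategies.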

4.
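Knowledge-based protein structure prediction   Cited by: 5 (self-citations: 0, citations by others: 5)
This paper introduces knowledge-based methods for predicting protein three-dimensional structure and their progress in recent years. At present there are two main classes of knowledge-based methods. One is homology modeling, a relatively mature technique whose models are fairly reliable but which applies only to target sequences with relatively high homology. The other is protein inverse folding, which mainly comprises the 3D profile method and potential-function-based methods; it yields the spatial trace of the target protein and is mainly useful for predicting the structure of proteins with relatively low sequence homology.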

5.
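Application of homology modeling to the molecular engineering of cellulases   Cited by: 2 (self-citations: 0, citations by others: 2)
Homology modeling has brought new hope to protein research and, in theory, addresses the difficulties faced in structure prediction, functional analysis, and protein engineering. Cellulase is the collective name for a group of enzymes that hydrolyze cellulose to cellobiose and glucose. Research on cellulases has now advanced to structure-function analysis and rational design. Because experimental methods cannot determine the structures of all cellulases, computer-based homology modeling plays an important role. Its main applications in cellulase engineering include family homology analysis, studying the mechanisms of functional amino acids, structure-based rational design, and predicting the structures and new functions of mutants. As homology modeling itself continues to improve, and as techniques such as molecular docking and molecular dynamics simulation develop, computer simulation will show great vitality in enzyme engineering.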

6.
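Protein three-dimensional structure modeling based on SWISS-MODEL   Cited by: 3 (self-citations: 0, citations by others: 3)
Protein tertiary structure can be predicted by homology modeling, threading, TOPITS, and other methods, but homology modeling is the most widely used. SWISS-MODEL is a protein structure server based on homology modeling. It is closely linked to the ExPASy website and the DeepView program. This article focuses on SWISS-MODEL's submission modes, modeling steps, evaluation of results, and applications.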

7.
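Objective: To predict the structure and function of human mitochondrial transcription termination factor 3 (hMTERF3) using bioinformatics. Methods: The hMTERF3 protein was studied systematically using the GenBank, UniProt, ExPASy, and SWISS-PROT database resources and various bioinformatics software, covering its physicochemical properties, transmembrane regions and signal peptide, secondary structure and functional domains, subcellular localization, functional classification, multiple sequence alignment of homologous proteins, phylogenetic tree construction, and tertiary structure homology modeling. Results: The predicted relative molecular mass of hMTERF3 is 47.97×10³ and its isoelectric point is 8.60; it has no signal peptide or transmembrane region. Secondary structure analysis shows mainly helices and random coils, with six MTERF motifs, and the tertiary structure prediction agrees with the secondary structure prediction. Subcellular localization analysis places the protein in human mitochondria. Functional classification predicts it to be a transport and binding protein involved in the regulation of gene transcription. Multiple sequence alignment and evolutionary analysis show that hMTERF3 is highly homologous to the MTERF3 proteins of rat, mouse, and other mammals, with which it clusters in the phylogenetic tree. Conclusion: The bioinformatic analysis of hMTERF3 provides a theoretical basis for further experimental studies of the protein's structure and function.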

8.
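Based on amino acid properties, and using amino acid composition together with a biased auto-covariance function as the feature vector, a BP (backpropagation) neural network method is proposed for predicting the α-helix and β-sheet secondary structure content of non-homologous proteins. The method was tested on mutually independent non-homologous protein databases. With Ponnuswamy values, the results for α-helix and β-sheet content were as follows: in the self-consistency test, the mean absolute errors were 0.069 and 0.065, with standard deviations of 0.044 and 0.047; in the independent test, the mean absolute errors were 0.077 and 0.070, with standard deviations of 0.051 and 0.049. Compared with a BP neural network that uses amino acid composition alone as the feature vector, the independent-test mean absolute errors decreased by 0.024 and 0.016 and the standard deviations by 0.031 and 0.018; compared with the improved multiple linear regression method, the independent-test mean absolute errors decreased by 0.018 and 0.011 and the standard deviations by 0.020 and 0.012. This shows that a BP neural network with amino acid composition and the biased auto-covariance function as its feature vector can effectively improve the accuracy of predicting protein secondary structure content.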
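The feature vector described in this abstract can be sketched as follows. The abstract gives no formula, so this sketch assumes the common biased estimator, in which the lag-k sum of centered products is divided by the full sequence length N rather than N - k. The property scale below is a placeholder with invented values, not the Ponnuswamy scale, and all function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder property scale (invented values, NOT the Ponnuswamy scale).
TOY_SCALE = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def biased_autocovariance(values, max_lag):
    """Auto-covariance at lags 1..max_lag, normalized by N (the assumed 'biased' form)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / n
        for k in range(1, max_lag + 1)
    ]


def feature_vector(seq, scale, max_lag):
    """Concatenate composition (20 values) with max_lag auto-covariance terms."""
    values = [scale[aa] for aa in seq]
    return composition(seq) + biased_autocovariance(values, max_lag)


print(len(feature_vector("ACDEFGACDK", TOY_SCALE, 3)))
```

Composition alone discards residue order; the auto-covariance terms reintroduce some sequence-order information, which is plausibly why the combined feature vector is reported to reduce the prediction error.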

9.
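Gene prediction means predicting the protein-coding portions of a DNA sequence. As genome sequencing is completed for most organisms, gene prediction becomes all the more important. Gene prediction comprises two main approaches: the first is the homology approach, also called the "extrinsic method"; the second is the gene-finding approach, also called the "intrinsic method". This paper mainly introduces several "extrinsic methods", including hidden Markov models, the Fourier transform, and dynamic programming.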

10.
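Objective: To build a support vector machine based method for automatically identifying the quaternary structure of new polypeptide chains and to improve on the recognition accuracy of existing methods. Methods: Four existing methods for extracting features from protein primary sequences were improved, and linear and nonlinear combination prediction was used to build an effective combined prediction model. Results: Homodimers and non-homodimers were taken as the example. After improvement, the classification accuracy of each of the four feature extraction methods rose by 2-3%; applying linear and nonlinear combination prediction raised the accuracy by a further 2-3%, bringing the classification accuracy on the independent test set above 90%. Conclusion: All four feature extraction methods reflect well that protein primary sequences contain quaternary structure information, and combination prediction effectively integrates the strengths of multiple feature extraction methods.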

11.
Pairwise alignment incorporating dipeptide covariation   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. RESULTS: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.

12.
Protein engineering: from directed evolution to computational design   Cited by: 1 (self-citations: 0, citations by others: 1)
曲戈  朱彤  蒋迎迎  吴边  孙周通 《生物工程学报》2019,35(10):1843-1856
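Directed evolution, which builds mutant libraries and high-throughput screening methods to rapidly improve specific protein properties, is currently the most commonly used strategy for protein design and modification in protein engineering. Over the past decade, with large gains in computing power and a steady stream of advanced algorithms, computer-aided protein design has received great attention and developed rapidly, becoming an important new direction in protein engineering. Computational protein design based on structural modeling and energy calculation can not only modify an enzyme's substrate specificity and thermostability but also design artificial enzymes with specific functions de novo. In recent years, machine learning and other artificial intelligence techniques have also been applied to computer-aided protein design, with notable results. This article outlines the history of protein engineering, reviews current progress and applications in computer-aided protein design, and looks ahead to future directions.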

13.
Bioinspired algorithms, such as evolutionary algorithms and ant colony optimization, are widely used for different combinatorial optimization problems. These algorithms rely heavily on the use of randomness and are hard to understand from a theoretical point of view. This paper contributes to the theoretical analysis of ant colony optimization and studies this type of algorithm on one of the most prominent combinatorial optimization problems, namely the traveling salesperson problem (TSP). We present a new construction graph and show that it has a stronger local property than one commonly used for constructing solutions of the TSP. The rigorous runtime analysis for two ant colony optimization algorithms, based on these two construction procedures, shows that they lead to good approximation in expected polynomial time on random instances. Furthermore, we point out in which situations our algorithms get trapped in local optima and show where the use of the right amount of heuristic information is provably beneficial.

14.
We introduce a data structure called a superword array for quickly finding matches between DNA sequences. The superword array possesses some desirable features of the lookup table and the suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP.

15.

Background

The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a severe bottleneck on large-scale data sets. Moreover, because the algorithm clusters data based on the similarities between data pairs, a similarity matrix must be built before affinity propagation can run, and constructing it takes a long time.

Methods

Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system is used for the affinity propagation algorithm because of its large memory size and great computing capacity. An appropriate scheme for data partitioning and reduction is designed to minimize the global communication cost among processes.

Results

A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation also performs well when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.

16.
The problem of alignment of two symbol sequences is considered. The validity of the available algorithms for constructing an optimal alignment depends on the weighting coefficients, which are frequently difficult to choose. A new approach to the problem is proposed, which is based on the use of vector weighting functions (instead of the traditionally used scalar ones) and Pareto-optimal alignment (an alignment that is optimal for any choice of weighting coefficients will always be Pareto-optimal). An efficient algorithm for constructing all Pareto-optimal alignments of two sequences is proposed. An approach to choosing a "biologically correct" alignment among all Pareto-optimal alignments is suggested.

17.
On the algorithms for determining the primary structure of biopolymers   Cited by: 1 (self-citations: 0, citations by others: 1)
The algorithms for determining the primary structure of biopolymers from complete and partial digests are analyzed. The problem of determining the primary structure is formulated as a word reconstruction problem, within which the corresponding algorithms are analyzed. Difficulties arising in constructing algorithms for determining the primary structure of nucleic acids from a partial digest are discussed. They seem to be due to the extensive testing of variants. When the initial data from a partial digest follow a certain scheme, we propose an economical testing (searching) algorithm. A scheme of an effective algorithm for reconstructing the primary structure from N complete digests is given.

18.
Moreno E  León K 《Proteins》2002,47(1):1-13
We present a new method for representing the binding site of a protein receptor that allows the use of the DOCK approach to screen large ensembles of receptor conformations for ligand binding. The site points are constructed from templates of what we call "attached points" (ATPTS). Each template (one for each type of amino acid) is composed of a set of representative points that are attached to side-chain and backbone atoms through internal coordinates, carry chemical information about their parent atoms, and are intended to cover positions that might be occupied by ligand atoms when complexed to the protein. This method is completely automatic and proved to be extremely fast. With the aim of obtaining an experimental basis for this approach, the Protein Data Bank was searched for proteins in complex with small molecules, to study the geometry of the interactions between the different types of protein residues and the different types of ligand atoms. As a result, well-defined patterns of interaction were obtained for most amino acids. These patterns were then used for constructing a set of templates of attached points, which constitute the core of the ATPTS approach. The quality of the ATPTS representation was demonstrated by using this method, in combination with the DOCK matching and orientation algorithms, to generate correct ligand orientations for >1000 protein-ligand complexes.

19.
On combinatorial DNA word design   Cited by: 1 (self-citations: 0, citations by others: 1)
We consider the problem of designing DNA codes, namely sets of equi-length words over the alphabet {A, C, G, T} that satisfy certain combinatorial constraints. This problem is motivated by the task of reliably storing and retrieving information in synthetic DNA strands for use in DNA computing or as molecular bar codes in chemical libraries. The primary constraints that we consider, defined with respect to a parameter d, are as follows: for every pair of words w, x in a code, there are at least d mismatches between w and x if w ≠ x, and also between the reverse of w and the Watson-Crick complement of x. Extending classical results from coding theory, we present several upper and lower bounds on the maximum size of such DNA codes and give methods for constructing such codes. An additional constraint that is relevant to the design of DNA codes is that the free energies and enthalpies of the code words, and thus the melting temperatures, be similar. We describe dynamic programming algorithms that can (a) calculate the total number of words of length n whose free energy value, as approximated by a formula of Breslauer et al. (1986), falls in a given range, and (b) output a random such word. These algorithms are intended for use in heuristic algorithms for constructing DNA codes.
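The two primary constraints stated in this abstract can be checked directly for a small candidate code. The sketch below is a brute-force verifier, not one of the construction methods of the paper. It assumes that the reverse-complement condition applies to every ordered pair, including w = x; the function names are illustrative.

```python
from itertools import product

# Watson-Crick complement: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def is_dna_code(words, d):
    """Return True if the words satisfy both mismatch constraints for parameter d."""
    if len({len(w) for w in words}) > 1:
        raise ValueError("all code words must have the same length")
    for w, x in product(words, repeat=2):
        # Constraint 1: distinct words differ in at least d positions.
        if w != x and hamming(w, x) < d:
            return False
        # Constraint 2: reverse of w vs. complement of x (assumed to include w == x).
        if hamming(w[::-1], x.translate(COMPLEMENT)) < d:
            return False
    return True


print(is_dna_code(["AAAA", "CCCC"], 4))
print(is_dna_code(["ACGT"], 1))
```

The second call fails because ACGT equals its own reverse complement, so the word would hybridize with copies of itself. The check costs quadratic time in the number of words, which is adequate for validating a finished code but not for searching for one.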

20.