Similar articles
 20 similar articles found (search time: 140 ms)
1.
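Sequence alignment is an important tool in bioinformatics research, widely used in sequence assembly, protein structure prediction, protein structure-function analysis, phylogenetic analysis, database searching, and primer design. This paper describes in detail several sequence alignment algorithms commonly used in bioinformatics, compares their computational complexity, strengths, and weaknesses, discusses the scope of application of each, and points out directions for future research on sequence alignment.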

2.
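The main problem in superimposing three-dimensional protein structures is that some amino acid residues of the target proteins are missing, yet most multiple-structure superposition methods require complete amino acid sequences. The common practice today is simply to delete the missing residues, which makes the superposition inaccurate. Because homologous proteins are structurally similar, a region missing from one protein structure may be present in another homologous structure. On this basis, this paper proposes a new, simple, and effective method for superimposing protein structures with missing data (ITEMDM). The method computes the superposition iteratively over the missing data, and obtains the rotation matrix and translation vector with an optimized least-squares algorithm combined with singular value decomposition (SVD). The method was used to superimpose proteins of the cytochrome C family and proteins from the standard Fischer's database (67 protein pairs), and was compared with other methods. Numerical experiments show the following advantages: (1) compared with the THESEUS algorithm, it runs faster and needs fewer iterations; (2) compared with the PSSM algorithm, its results are more accurate and its run time is shorter. The results show that the method superimposes three-dimensional protein structures with missing data more effectively.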
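
The least-squares plus SVD step mentioned in this abstract can be illustrated with the classical Kabsch procedure. This is a minimal sketch, assuming NumPy is available and that both coordinate sets contain the same residues in the same order; it does not reproduce ITEMDM's iterative handling of missing residues, and the function name `kabsch_superpose` is hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Return (R, t) minimizing sum ||R @ mobile_i + t - target_i||^2.

    Both inputs are (N, 3) arrays of matched coordinates (assumption:
    no residue is missing from either structure).
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    # 3x3 covariance matrix of the centered coordinate sets.
    H = (mobile - mob_center).T @ (target - tgt_center)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_center - R @ mob_center
    return R, t
```

The superimposed coordinates are then `mobile @ R.T + t`. An iterative missing-data scheme would presumably repeat a step like this while re-estimating the absent coordinates, but that loop is not specified in the abstract and is left out here.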

3.
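Discovery of association rules in protein sequences and their application   Cited by: 2 (self-citations: 0, citations by others: 2)
As the machine learning algorithms used in protein sequence-structure analysis grow more complex, interpreting their results and making discoveries from them also becomes more complicated, so simple and theoretically sound methods are needed. By introducing association rule mining, which is simple in principle, theoretically sound, and yields results of strong practical significance, tens of thousands of patterns were found in protein sequences. Examples demonstrate how these patterns can be applied to protein sequence analysis, such as conserved region discovery and secondary structure prediction. From these results, a secondary structure rule base and a simple secondary structure prediction algorithm were also built. Experiments show that about 81% of secondary structures can be predicted by at least one association rule.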
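
For readers unfamiliar with association rules, the two standard quality measures, support and confidence, can be computed as below. This is an illustrative sketch only: the item encoding (`"L@0"` for leucine at window position 0, `"SS=H"` for a helix label) and the toy data are assumptions, not the paper's representation.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    transactions: list of sets of items (hypothetical encoding).
    support    = fraction of transactions containing both sides.
    confidence = fraction of antecedent-containing transactions that
                 also contain the consequent.
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Toy sequence windows annotated with a secondary-structure label.
windows = [
    {"L@0", "A@1", "SS=H"},
    {"L@0", "A@1", "SS=H"},
    {"L@0", "A@1", "SS=E"},
    {"G@0", "P@1", "SS=C"},
]
support, confidence = rule_metrics(windows, {"L@0", "A@1"}, {"SS=H"})
```

A rule-based predictor of the kind the abstract describes would keep only rules whose support and confidence exceed chosen thresholds; the thresholds used in the paper are not given in the abstract.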

4.
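A fast sequence alignment algorithm based on dynamic programming   Cited by: 3 (self-citations: 0, citations by others: 3)
Sequence alignment algorithms are one of the important research directions in bioinformatics, and dynamic programming is the most effective and most basic method among them. Because the basic dynamic programming method has high time and space complexity, it is not suitable for aligning real biological sequences. After analyzing and reviewing several related dynamic programming algorithms, this paper proposes UKK_FA, a fast sequence alignment algorithm based on dynamic programming. Experimental results show that the algorithm effectively reduces time complexity and is of practical use.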
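
The basic dynamic programming method that this abstract takes as its starting point can be sketched as the Needleman-Wunsch recurrence. The version below is not UKK_FA, whose details the abstract does not give. It is the plain O(mn)-time recurrence, kept to two rows so that memory is linear, and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[-1]
```

Faster variants typically restrict the computation to a band around the main diagonal, which pays off when the two sequences are similar. Recovering the alignment itself, and not only its score, requires either the full table or a divide-and-conquer scheme.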

5.
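This paper studies the two-dimensional HP model on a rhombic lattice. First, the protein structure prediction problem is converted into a mathematical problem and reduced to matching each amino acid in the sequence to a lattice site. To solve this mathematical problem, we improve and extend the classical particle swarm optimization algorithm. To verify the effectiveness of the algorithm and model, we run numerical simulations on several typical examples. Compared with the protein conformations obtained on a square lattice, those on the rhombic lattice are more natural and closer to reality. We further compare compact and non-compact conformations on the rhombic lattice. The results show that our model and algorithm are effective and meaningful for predicting the protein structure of amino acid sequences on the rhombic lattice.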
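
In the HP model, the objective that an optimizer such as particle swarm optimization minimizes is usually the number of non-consecutive H-H lattice contacts, counted negatively. The sketch below evaluates that energy for a given placement. The neighbor offsets default to a square lattice, which is an assumption: the abstract does not specify the rhombic lattice's geometry, so a different offset set would have to be passed in to mimic it.

```python
SQUARE_NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def hp_energy(sequence, coords, neighbors=SQUARE_NEIGHBORS):
    """Energy of an HP chain: -1 per H-H contact between non-consecutive residues.

    sequence: string over {"H", "P"}; coords: list of (x, y) tuples, one per
    residue. Chain connectivity is not checked here (assumed valid), and the
    neighbor offsets are assumed to be symmetric.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("two residues occupy the same lattice site")
    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in neighbors:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbors.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts
```

For example, folding `"HPPH"` into a unit square brings the two H residues into contact and gives an energy of -1.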

6.
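RNA sequence-structure alignment based on a quantum evolutionary algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
Multiple sequence alignment is a classic problem in computational molecular biology and a fundamental step in much biological research. RNA, as a biological macromolecule, differs from proteins and DNA in that its secondary structure is more conserved during evolution than its primary sequence, so RNA sequence alignment must consider not only sequence information but, more importantly, secondary structure information. A program for multiple RNA sequence-structure alignment based on a quantum evolutionary algorithm is proposed: RNA sequences are quantum-encoded, a full crossover operator that takes structural information into account is designed, and a fitness function suited to RNA sequence-structure alignment is proposed, overcoming the slow convergence and premature convergence of traditional evolutionary algorithms. Tests on a standard database confirm the effectiveness of the method.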

7.
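Li Xin, Fan Hong, Zhao Xingchun, Fan Xiaonuo, Yao Ruoxia 《遗传》2023,(10):933-944
In forensic DNA analysis, the analysis of mixed short tandem repeat (STR) profiles has long been a research challenge. At present, analysis in China relies mainly on manual work by forensic examiners, which is inefficient and subject to subjective bias, and cannot meet the growing demand for STR profile analysis. This paper proposes a new method for analyzing mixed STR profiles, the global minimum residual method, which not only computes the analysis result but also predicts the mixture proportion of each contributor. The method first gives the mixture proportion a new definition and then optimizes the allele model. It then considers all loci in the STR profile together, sums the residuals of all loci, selects the mixture proportion with the smallest sum as the inferred result, and uses the grey wolf optimizer to search quickly for the optimal mixture proportion. For two-contributor STR profiles, the global minimum residual method balances accuracy and speed, which supports the analysis of large numbers of profiles. The proposed algorithm has performed well in practical use, has high application value, and offers a new approach for research on mixed STR profile analysis.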

8.
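Biological sequence alignment occupies an important place in bioinformatics research. Multiple sequence alignment is an NP-complete problem, and exact solutions cannot be computed because of time and space constraints. This paper briefly introduces the basic idea of the multiple sequence alignment algorithm proposed by Feng and Doolittle and improves the algorithm to achieve better alignment accuracy. Experimental results show that the new algorithm is effective against the local-optimum problem encountered in general progressive multiple sequence alignment methods.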

9.
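Improving strategies for searching protein sequence space is a key to future protein engineering research. This paper introduces the principle, machine learning algorithm, and applications of a directed protein evolution strategy driven by protein sequence-activity relationships (ProSAR), providing methods and ideas for improving the efficiency of directed protein evolution and for solving multidimensional optimization of enzyme properties.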

10.
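A new three-dimensional graphical representation of protein sequences and its application   Cited by: 1 (self-citations: 0, citations by others: 1)
Based on a five-letter model of the amino acids, a new three-dimensional graphical representation of protein sequences is given. A 12-dimensional vector is then constructed to characterize a protein sequence; its components are the normalized ALE-indices of the D/D matrices corresponding to 12 graphs. Finally, a phylogenetic analysis of coronaviruses based on the S structural protein illustrates the usefulness of the method.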

11.
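Zhang Chao, Zhang Hui, Li Jixin, Gao Hong 《生物信息学》2006,4(3):128-131
Genetic algorithms (GA) derive from the laws of natural evolution and are adaptive, heuristic, probabilistic, iterative global search algorithms. This paper introduces the basic principles, procedure, and advantages of GA, and summarizes the models and execution strategies GA uses in protein structure prediction, as well as research progress on combining multiple algorithms to predict protein structure.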

12.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all α-proteins, while relatively poorer for all β-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Without surprise, the primary sequence alone performs poorly as a structure classifier. We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.

13.
Interresidue protein contacts in protein structures and at protein-protein interfaces are classically described by the amino acid types of interacting residues and the local structural context of the contact, if any, is described using secondary structures. In this study, we present an alternate analysis of interresidue contact using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows a 3D structure to be described as a sequence of prototype fragments called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures defined using Voronoï tessellations reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions integrating structural letters to decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.

14.
Protein tertiary structure prediction using a branch and bound algorithm   Cited by: 2 (self-citations: 0, citations by others: 2)
We report a new method for predicting protein tertiary structure from sequence and secondary structure information. The predictions result from global optimization of a potential energy function, including van der Waals, hydrophobic, and excluded volume terms. The optimization algorithm, which is based on the αBB method developed by Floudas and coworkers (Costas and Floudas, J Chem Phys 1994;100:1247-1261), uses a reduced model of the protein and is implemented in both distance and dihedral angle space, enabling a side-by-side comparison of methodologies. For a set of eight small proteins, representing the three basic types (all alpha, all beta, and mixed alpha/beta), the algorithm locates low-energy native-like structures (less than 6 Å root mean square deviation from the native coordinates) starting from an unfolded state. Serial and parallel implementations of this methodology are discussed.

15.
Dokholyan NV 《Proteins》2004,54(4):622-628
Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. One principal bottleneck in solving this problem is our lack of knowledge of precise atomic interactions. Using a simple model of amino acid interactions, we determine three crucial factors that are important for solving the protein design problem. Among these factors is the protein alphabet, a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting the possibility of designing a protein of arbitrary length with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and the energetic properties of the protein alphabet units. We discover that the usage of average types of amino acid in proteins is less than expected if amino acids were chosen randomly with naturally occurring frequencies. We propose three possible scenarios that account for amino acid underusage in proteins. These scenarios suggest the possibility that amino acids themselves might not constitute the alphabet of natural proteins.

16.
Protein topology can be described at different levels. At the most fundamental level, it is a sequence of secondary structure elements (a "primary topology string"). Searching predicted primary topology strings against a library of strings from known protein structures is the basis of some protein fold recognition methods. Here a method known as TOPSCAN is presented for rapid comparison of protein structures. Rather than a simple two-letter alphabet (encoding strand and helix), more complex alphabets are used encoding direction, proximity, accessibility and length of secondary elements and loops in addition to secondary structure. Comparisons are made between the structural information content of primary topology strings and encodings which contain additional information ("secondary topology strings"). The algorithm is extremely fast, with a scan of a large domain against a library of more than 2000 secondary structure strings completing in approximately 30 s. Analysis of protein fold similarity using TOPSCAN at primary and secondary topology levels is presented.

17.
Yang JM  Tung CH 《Nucleic acids research》2006,34(13):3646-3659
As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

18.
19.
How does a folding protein negotiate a vast, featureless conformational landscape and adopt its native structure in biological real time? Motivated by this search problem, we developed a novel algorithm to compare protein structures. Procedures to identify structural analogs are typically conducted in three-dimensional space: the tertiary structure of a target protein is matched against each candidate in a database of structures, and goodness of fit is evaluated by a distance-based measure, such as the root-mean-square distance between target and candidate. This is an expensive approach because three-dimensional space is complex. Here, we transform the problem into a simpler one-dimensional procedure. Specifically, we identify and label the 11 most populated residue basins in a database of high-resolution protein structures. Using this 11-letter alphabet, any protein's three-dimensional structure can be transformed into a one-dimensional string by mapping each residue onto its corresponding basin. Similarity between the resultant basin strings can then be evaluated by conventional sequence-based comparison. The disorder → order folding transition is abridged on both sides. At the onset, folding conditions necessitate formation of hydrogen-bonded scaffold elements on which proteins are assembled, severely restricting the magnitude of accessible conformational space. Near the end, chain topology is established prior to emergence of the close-packed native state. At this latter stage of folding, the chain remains molten, and residues populate natural basins that are approximated by the 11 basins derived here. In essence, our algorithm reduces the protein-folding search problem to mapping the amino acid sequence onto a restricted basin string.

20.
Prediction of the three-dimensional structure of a protein from its amino acid sequence can be considered as a global optimization problem. In this paper, the Chaotic Artificial Bee Colony (CABC) algorithm was introduced and applied to 3D protein structure prediction. Based on the 3D off-lattice AB model, the CABC algorithm combines the global search and local search of the Artificial Bee Colony (ABC) algorithm with a chaotic search algorithm to avoid premature convergence and becoming trapped in local optima. The experiments carried out with the popular Fibonacci sequences demonstrate that the proposed algorithm provides an effective and high-performance method for protein structure prediction.
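
The Fibonacci sequences used as AB-model benchmarks are usually built by the string recurrence S0 = "A", S1 = "B", S(i+1) = S(i-1) + S(i), where A denotes a hydrophobic and B a hydrophilic residue. The sketch below generates them under that convention; that this study used exactly this convention is an assumption, since the abstract does not spell it out, and the function name is hypothetical.

```python
def fibonacci_ab_sequence(n):
    """Return the n-th Fibonacci AB string: S0="A", S1="B", S(i+1)=S(i-1)+S(i)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    prev, curr = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Concatenate the two previous strings, older one first.
        prev, curr = curr, prev + curr
    return curr
```

The string lengths follow the Fibonacci numbers, so `fibonacci_ab_sequence(6)` has 13 residues and `fibonacci_ab_sequence(7)` has 21.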
