Similar articles
1.
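The paper first introduces the molecular biology underlying sequence alignment: nucleotides, the basic units of nucleic acid sequences, and amino acids, the basic units of protein sequences. Carefully designed figures and tables list the names, properties, and classification of the four nucleotides and twenty amino acids. Section 2 outlines the basics of sequence alignment, including the concepts of similarity and homology, global and local alignment, the dot-matrix method, dynamic programming and heuristic algorithms, scoring matrices and gap penalties, and commonly used software and analysis platforms. Section 3 introduces DNAfull, a scoring matrix commonly used for nucleic acid alignment, and BLOSUM62 and PAM250, scoring matrices commonly used for protein alignment. Sections 4-8 use hemoglobin, peptide toxins, plant transcription factors, carcinoembryonic antigen, and sialidase as examples of pairwise alignment in practice. These examples show how to choose an analysis platform and alignment program, how to set the scoring matrix and gap penalties, and how to interpret alignment results and their biological significance. The paper closes with a brief summary.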
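How a scoring matrix and gap penalties combine into a single alignment score can be illustrated with a minimal sketch. It is not taken from the paper. The function name `score_alignment` is hypothetical, the match/mismatch values (5 and -4) stand in for a DNAfull-style nucleotide matrix, and the affine gap convention (a gap of length n costs `gap_open + (n - 1) * gap_extend`) and its default values are assumptions chosen for illustration.

```python
def score_alignment(al_a, al_b, match=5.0, mismatch=-4.0,
                    gap_open=10.0, gap_extend=0.5):
    """Score two already-aligned strings ('-' marks a gap).

    Assumption: a gap of length n costs gap_open + (n - 1) * gap_extend.
    """
    if len(al_a) != len(al_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    prev_gap = None  # which row ('a' or 'b') held a gap in the previous column
    for x, y in zip(al_a, al_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist of two gaps")
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            # Extending a gap in the same row is cheaper than opening a new one.
            total -= gap_extend if row == prev_gap else gap_open
            prev_gap = row
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total
```

For example, `score_alignment("ACGTT", "A--TT")` adds three matches (15) and subtracts one gap of length two (10.5). Raising the gap-open penalty makes alignments with fewer, longer gaps score relatively better.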

2.
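Cao Yang (曹阳). 《生物学通报》 (Bulletin of Biology), 2005, 40(1): 11-12
Multiple sequence alignment can reveal relationships among a set of DNA or protein sequences and identify regions conserved between them. This article introduces several commonly used multiple sequence alignment programs and tips for using them.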

3.
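Sequence analysis and identification methods for plant LTR retrotransposons   Cited by: 2 (self-citations: 0, citations by others: 2)
Hou Xiaogai (侯小改), Zhang Xi (张曦), Guo Dalong (郭大龙). 《遗传》 (Hereditas), 2012, (11): 1507-1516
LTR (long terminal repeat) retrotransposons are an important class of transposable elements in eukaryotes. They are widely distributed and highly heterogeneous, play an important role in eukaryotic genome evolution, and are now widely used in plant gene function analysis and genetic diversity studies. Identifying LTR retrotransposon sequences is a prerequisite for these applications, so research on methods for identifying and analyzing them has both theoretical and practical value. By principle, bioinformatics software for analyzing LTR retrotransposon sequences falls roughly into two classes: sequence alignment analysis, and identification of conserved regions in related sequences. Alignment software such as BLAST and DNAstar performs sequence similarity searches: an unknown sequence is judged to be a retrotransposon from its similarity to known retrotransposon sequences. Such software, however, cannot directly provide information on specific features such as the LTRs and cannot identify the full length of a retrotransposon. Identification software can be divided by principle into four kinds: de novo, comparative genomic, homology search, and structure-based methods. De novo tools such as LTR-Finder can predict and annotate full-length LTR retrotransposons fairly accurately, whereas homology-search tools such as RepeatMasker detect candidate LTR retrotransposons by comparing sequences against a database. The article compares and analyzes the different prediction methods and, on that basis, summarizes a workflow for analyzing LTR retrotransposon sequences, intended as a reference for such analyses.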

4.
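Finding conserved DNA sequences in nucleic acid databases using VBA   Cited by: 1 (self-citations: 0, citations by others: 1)
Four related VBA programs were written to find conserved sequences in nucleic acid databases. The "Import DNA sequences" program loads a FASTA-format text file of DNA sequences into column A of Excel Sheet1, keeping each sequence's GI number and deleting the rest of the annotation. The "Organize DNA sequences" program places the GI numbers in column A and the complete sequence for each GI number in column B. The "Random DNA sequences" program generates random DNA sequences. The "Discover conserved DNA sequences" program compares the random sequences with the downloaded DNA sequences and counts how often each random sequence occurs. Soybean genome sequences are used as an example to show how the programs are applied. According to the authors, the programs make up for shortcomings of popular alignment software and provide a new tool for PCR primer design, gene function analysis, and germplasm identification.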
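The workflow described above (import FASTA records, generate random probes, count how often each probe occurs) can be sketched in Python. This is an illustration only, not a port of the authors' VBA code. All function names are hypothetical, and counting overlapping exact matches is an assumption about how occurrence frequency is defined.

```python
import random

def parse_fasta(text):
    """Return {identifier: sequence}, keeping only the first word of each header."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records

def random_dna(length, rng):
    """Generate one random DNA probe of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def count_overlapping(seq, probe):
    """Count exact occurrences of probe in seq, allowing overlaps."""
    count, start = 0, seq.find(probe)
    while start != -1:
        count += 1
        start = seq.find(probe, start + 1)
    return count

def probe_frequencies(records, probes):
    """Total occurrences of each probe across all sequences."""
    return {p: sum(count_overlapping(s, p) for s in records.values())
            for p in probes}
```

A typical call would build probes with `random_dna(8, random.Random(0))` and pass them to `probe_frequencies`. Probes that occur far more often than expected by chance are candidates for conserved regions.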

5.
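EMBOSS and EMBnet
Luo Jingchu (罗静初). 《生物信息学》 (Chinese Journal of Bioinformatics), 2021, 19(4): 223-231
The author's article "Examples of sequence analysis programs in the EMBOSS package" was published in 《生物信息学》, 2021, 19(1). That article introduced the European Molecular Biology Open Software Suite (EMBOSS). EMBOSS is an international collaborative project, mainly involving European countries, launched in the late 1990s by the European Molecular Biology Network (EMBnet), and it is one of the earlier large open-source software packages put to use in bioinformatics. Drawing on the author's personal experience, this paper reviews the origins and development of the EMBOSS project and recounts the history of EMBnet over more than thirty years, together with its contributions to bioinformatics development, services, education, and training. It offers readers, especially younger ones, a view of one part of the early history of bioinformatics.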

6.
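Sequence alignment is an important tool in bioinformatics research, widely used in sequence assembly, protein structure prediction, analysis of protein structure and function, phylogenetic analysis, database searching, and primer design. This paper describes in detail several sequence alignment algorithms commonly used in bioinformatics, compares their computational complexity, strengths, and weaknesses, discusses their respective ranges of application, and points out future directions for sequence alignment research.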

7.
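Multiple sequence alignment is an important bioinformatics tool with major applications in evolutionary analysis and protein structure prediction. Progressive alignment algorithms, represented by ClustalW, have been very successful in this field, and ClustalW has become the most widely used multiple sequence alignment program. Their inherent limitations, however, hinder further gains in alignment accuracy. In recent years many improved progressive algorithms have appeared, with good results. This paper describes the basic alignment ideas of several representative algorithms and compares and evaluates their accuracy and speed using the multiple sequence alignment benchmark BAliBASE and the simulation program ROSE.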

8.
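Zhang Lin (张林), Chai Hui (柴惠), Wo Like (沃立科), Yuan Xiaofeng (袁小凤), Huang Yanfen (黄燕芬). 《生物信息学》 (Chinese Journal of Bioinformatics), 2011, 9(2): 146-150, 154
Biological sequence alignment is fundamental to bioinformatics and is one of the most commonly used and important methods in functional genomics today. This paper analyzes the strengths and weaknesses of the various classes of sequence alignment algorithms and explores the advantages of graphics hardware. On that basis, the dynamic programming algorithm, the most accurate of these classes, is implemented and mapped onto graphics hardware to accelerate it. Performance tests on examples show that the accelerated algorithm substantially improves alignment speed while preserving alignment accuracy.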
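The abstract does not say how the dynamic programming matrix is mapped onto graphics hardware. One common idea, shown here purely as an assumption, is to fill the matrix along anti-diagonals: every cell on the same anti-diagonal depends only on earlier anti-diagonals, so those cells could be computed in parallel. The sketch below computes a Smith-Waterman local alignment score in that order on the CPU. The function name and the linear gap penalty are hypothetical choices for illustration.

```python
def sw_score_wavefront(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score, filled by anti-diagonals.

    Cells with the same i + j are mutually independent, which is the
    property a parallel (e.g. GPU) implementation could exploit.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):           # d = i + j for one anti-diagonal
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best
```

The inner loop over `i` is the part that a parallel version would distribute across threads. The result is identical to the usual row-by-row fill, because only the visiting order changes.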

9.
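Yang Ziheng (杨子恒). 《遗传》 (Hereditas), 1990, 12(6): 15-18
This paper describes a set of computer programs written by the author for analyzing DNA sequence data. The programs are written in BASIC, were debugged and run on an IBM microcomputer, and comprise several parts, including sequence entry, nucleotide frequency counting, translation, and restriction enzyme site searching.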
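The three analysis steps named above (nucleotide frequencies, translation, restriction site search) can be sketched in a few lines of Python. This is not the author's BASIC code. The function names are hypothetical, the codon table is the standard genetic code, and unknown codons are rendered as "X" by assumption.

```python
from collections import Counter

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def base_frequencies(seq):
    """Fraction of each of A, C, G, T in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    return {b: (counts[b] / n if n else 0.0) for b in "ACGT"}

def translate(seq):
    """Translate in frame 1; '*' marks a stop codon, 'X' an unknown codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def find_sites(seq, site):
    """0-based start positions of every occurrence of a recognition site."""
    seq, site = seq.upper(), site.upper()
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions
```

For instance, `find_sites(seq, "GAATTC")` lists the positions of the EcoRI recognition site. A fuller tool would also scan the reverse strand for non-palindromic sites.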

10.
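Sequence alignment is an important task in gene sequence analysis. Using human and mouse genes as examples, this paper introduces the sequence alignment methods in the MATLAB 7.X Bioinformatics Toolbox: retrieving sequence information from databases, finding open reading frames, converting nucleotide sequences to amino acid sequences, drawing dot plots that compare two amino acid sequences, aligning with the Needleman-Wunsch and Smith-Waterman algorithms, and computing the identity between two sequences.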
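The global alignment and identity steps can be illustrated with a minimal Needleman-Wunsch sketch. It is written in Python rather than MATLAB and does not reproduce the toolbox functions. The simple match/mismatch scores, the linear gap penalty, and the choice to divide identity by the alignment length are all assumptions made for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(al_a, al_b):
    """Identical columns as a percentage of alignment length (an assumed convention)."""
    if not al_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * same / len(al_a)
```

Other identity conventions divide by the shorter sequence length or by the number of ungapped columns, so reported identities are only comparable when the denominator is stated.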

11.
An overview of some commonly used software in genomics research   Cited by: 1 (self-citations: 0, citations by others: 1)
Wu Qingfa (吴清发). 《遗传》 (Hereditas), 2003, 25(6): 708-712
Genomics is a foundational discipline that takes the complete genetic information of a species as its object of study and examines, at a global level, the molecular composition, structural organization, expression regulation, and evolution of that information. It developed alongside the Human Genome Project. Because genomics involves massive amounts of data, bioinformatics methods are essential for storing, managing, retrieving, and mining those data. Many mature programs are now widely used in genomics research, and most can be accessed or obtained freely over the Internet. This paper introduces the principles behind several kinds of software commonly used in the Human Genome Project, namely sequence alignment, sequence assembly, repeat identification, and gene prediction, and illustrates each with typical programs.

12.
Structural alignment of proteins is widely used in various fields of structural biology. To further improve alignment quality, we describe an algorithm for structural alignment based on text modelling techniques. The technique first superimposes the secondary structure elements of two proteins and then models the 3D structure of each protein as a sequence of letters. These sequences are used by a step-by-step sequence alignment procedure to align the two protein structures. A benchmark test on a set of 200 non-homologous proteins was used to evaluate the program and compare it with state-of-the-art programs such as CE, SAL, TM-align and 3D-BLAST. On average, all-against-all structure comparison by the program achieves accuracy competitive with CE and TM-align, while running at a speed comparable to 3D-BLAST.

13.
Structural alignments often reveal relationships between proteins that cannot be detected using sequence alignment alone. However, profile search methods based on structural alignments alone have not been found to be effective in finding remote homologs. Here, we explore the role of structural information in remote homolog detection and sequence alignment. To this end, we develop a series of hybrid multidimensional alignment profiles that combine sequence, secondary and tertiary structure information. Sequence-based profiles are profiles whose position-specific scoring matrix is derived from sequence alignment alone; structure-based profiles are those derived from multiple structure alignments. We compare pure sequence-based profiles to pure structure-based profiles, as well as to hybrid profiles in which sequence-based profiles are used in loop/motif regions and structural information is used in core structural regions. All of the hybrid methods offer significant improvement over simple profile-to-profile alignment. We demonstrate that both sequence-based and structure-based profiles contribute to remote homology detection and alignment accuracy, and that each contains some unique information. We discuss the implications of these results for further improvements in amino acid sequence and structural analysis.

14.
The effectiveness of sequence alignment in detecting structural homology among protein sequences decreases markedly when pairwise sequence identity is low (the so‐called “twilight zone” problem of sequence alignment). Alternative sequence comparison strategies able to detect structural kinship among highly divergent sequences are necessary to address this need. Among them are alignment‐free methods, which use global sequence properties (such as amino acid composition) to identify structural homology in a rapid and straightforward way. We explore the viability of using tetramer sequence fragment composition profiles in finding structural relationships that lie undetected by traditional alignment. We establish a strategy to recast any given protein sequence into a tetramer sequence fragment composition profile, using a series of amino acid clustering steps that have been optimized for mutual information. Our method has the effect of compressing the set of 160,000 unique tetramers (if using the 20‐letter amino acid alphabet) into a more tractable number of reduced tetramers (~15–30), so that a meaningful tetramer composition profile can be constructed. We test remote homology detection at the topology and fold superfamily levels using a comprehensive set of fold homologs, culled from the CATH database that share low pairwise sequence similarity. Using the receiver‐operating characteristic measure, we demonstrate potentially significant improvement in using information‐optimized reduced tetramer composition, over methods relying only on the raw amino acid composition or on traditional sequence alignment, in homology detection at or below the “twilight zone”. Proteins 2010. © 2010 Wiley‐Liss, Inc.
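The compression from 160,000 tetramers (20^4) to a few dozen can be illustrated with a small sketch. The two-class hydrophobic/polar grouping below is a hypothetical stand-in, not the mutual-information-optimized clustering from the paper, and the function names are invented for this example. With two classes there are 2^4 = 16 reduced tetramers, which happens to fall in the quoted range.

```python
from collections import Counter
from itertools import product

# Hypothetical two-class reduction; the paper optimizes its clusters instead.
HYDROPHOBIC = set("AVLIMFWC")

def reduce_sequence(seq):
    """Map each residue to 'H' (hydrophobic) or 'P' (everything else)."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq.upper())

def tetramer_profile(seq):
    """Relative frequency of each of the 16 reduced tetramers in seq."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced[i:i + 4] for i in range(len(reduced) - 3))
    total = sum(counts.values())
    keys = ["".join(p) for p in product("HP", repeat=4)]
    return {k: (counts[k] / total if total else 0.0) for k in keys}
```

Two proteins can then be compared without alignment by taking a distance between their profile vectors. Which distance and which residue grouping work best are exactly the questions the paper's optimization addresses.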

15.
1 Introduction. Sequence alignment based on dynamic programming is the most widely used method for sequence comparison at present. When carrying out large-scale genome sequence analysis with this kind of method, we face two major difficulties: the large storage requirement and the long computation time. Myers and Miller applied Hirschberg's technique to the sequence alignment problem, giving an algorithm that consumes space only proportional to the sum of the sequence lengths. A new program, SIM, utilizing this algorithm, has been used in sequence alignment. However, the computation time required by SIM is still too long…

16.
Cheng H, Kim BH, Grishin NV. Proteins, 2008, 70(4): 1162-1166
We describe MALIDUP (manual alignments of duplicated domains), a database of 241 pairwise structure alignments for homologous domains that originated by internal duplication within the same polypeptide chain. Since duplicated domains within a protein frequently diverge in function and thus in sequence, this would be the first database of structurally similar homologs that is not strongly biased by sequence or functional similarity. Our manual alignments in most cases agree with the automatic structural alignments generated by several commonly used programs. This carefully constructed database could be used in studies on protein evolution and as a reference for testing structure alignment programs. The database is available at http://prodata.swmed.edu/malidup.

17.
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.
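What a PSSM is, and how it scores a stretch of sequence, can be shown with a minimal sketch. It builds log-odds scores from an ungapped block of aligned motif instances and is not the embedding procedure of the paper. The uniform background frequencies, the pseudocount of 1, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_seqs, pseudocount=1.0):
    """Log-odds PSSM from an ungapped alignment block.

    Assumes a uniform background (1/20 per residue) for simplicity.
    """
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(s[col] for s in aligned_seqs if s[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in ALPHABET})
    return pssm

def score_window(pssm, window):
    """Sum of per-position scores for a window as long as the PSSM."""
    return sum(col[aa] for col, aa in zip(pssm, window))

def best_window(pssm, seq):
    """Start index of the highest-scoring window; assumes len(seq) >= len(pssm)."""
    w = len(pssm)
    return max(range(len(seq) - w + 1),
               key=lambda i: score_window(pssm, seq[i:i + w]))
```

Unlike a single consensus sequence, the matrix keeps a separate score for every residue at every position, which is why it can still reward a family member that differs from the consensus at a variable column.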

18.
Alignment of protein sequences by their profiles   Cited by: 7 (self-citations: 0, citations by others: 7)
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.

19.
Multiple sequence alignment (MSA) is a crucial first step in the analysis of genomic and proteomic data. Commonly occurring sequence features, such as deletions and insertions, are known to affect the accuracy of MSA programs, but the extent to which alignment accuracy is affected by the positions of insertions and deletions has not been examined independently of other sources of sequence variation. We assessed the performance of 6 popular MSA programs (ClustalW, DIALIGN-T, MAFFT, MUSCLE, PROBCONS, and T-COFFEE) and one experimental program, PRANK, on amino acid sequences that differed only by short regions of deleted residues. The analysis showed that the absence of residues often led to an incorrect placement of gaps in the alignments, even though the sequences were otherwise identical. In data sets containing sequences with partially overlapping deletions, most MSA programs preferentially aligned the gaps vertically at the expense of incorrectly aligning residues in the flanking regions. Of the programs assessed, only DIALIGN-T was able to place overlapping gaps correctly relative to one another, but this was usually context dependent and was observed only in some of the data sets. In data sets containing sequences with non-overlapping deletions, both DIALIGN-T and MAFFT (G-INS-I) were able to align gaps with near-perfect accuracy, but only MAFFT produced the correct alignment consistently. The same was true for data sets that comprised isoforms of alternatively spliced gene products: both DIALIGN-T and MAFFT produced highly accurate alignments, with MAFFT being the more consistent of the 2 programs. Other programs, notably T-COFFEE and ClustalW, were less accurate. For all data sets, alignments produced by different MSA programs differed markedly, indicating that reliance on a single MSA program may give misleading results. 
It is therefore advisable to use more than one MSA program when dealing with sequences that may contain deletions or insertions, particularly for high-throughput and pipeline applications where manual refinement of each alignment is not practicable.
