Found 20 similar articles (search time: 156 ms)
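RNA sequence-structure alignment based on a quantum evolutionary algorithm   Cited by: 1 (self-citations: 0; citations by others: 1)
Multiple sequence alignment is a classic problem in computational molecular biology and a fundamental step in many biological studies. RNA differs from proteins and DNA in that its secondary structure is more conserved during evolution than its primary sequence, so RNA sequence alignment must take into account not only sequence information but, above all, secondary-structure information. We propose a program for multiple RNA sequence-structure alignment based on a quantum evolutionary algorithm: RNA sequences are quantum-encoded, a full crossover operator that incorporates structural information is designed, and a fitness function suited to RNA sequence-structure alignment is proposed. Together these overcome the slow convergence and premature convergence of traditional evolutionary algorithms. Tests on standard databases confirm the effectiveness of the method.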

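This paper presents a computational method for predicting the minimum-free-energy secondary structure of single-stranded nucleic acid molecules. The method is based on the maximum C-matching principle for topological planar graphs and on the available thermodynamic data for the folding conformations of single-stranded nucleic acids. To illustrate the capability of the algorithm, the secondary structures of an mRNA fragment of the immunoglobulin γ1 heavy chain (459 nucleotide residues), a fragment of Escherichia coli 16S rRNA (positions 567-883), and a fragment of poliovirus RNA (positions 1-740) were predicted by computer and then compared and discussed against existing structural models. The computer-predicted secondary structure of the central domain of E. coli 16S rRNA is essentially consistent with the structural model proposed by Noller and Woese.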

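With the completion of genome sequencing for humans, several model organisms, economically important species, and a large number of microorganisms, biology as a whole has entered the genomic era. Over the past 5-10 years, phylogenetic inference from genome structural information has become one of the frontier fields of taxonomy and evolutionary biology. Compared with mutations in nucleotide or amino acid sequences, structural changes in genomes (intron insertions/deletions, retroposon integrations, signature sequences, gene duplications, gene order, and so on) are relatively rare phylogenetic information on a larger spatial (or temporal) scale, and are generally used to study relationships among taxa at and above the family level. The availability of complete genome sequences and the determination of the position of each gene make it possible to integrate phylogenetic information from different levels of the genome and to carry out molecular systematics using total molecular evidence (including genomic information; DNA, RNA, and protein sequence information; and the higher-order structures of RNA and proteins).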

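A new method for protein secondary structure prediction is proposed. The method extracts from amino acid sequences species-specific protein secondary-structure "words", analogous to words in natural language. These words form a protein secondary-structure dictionary that describes the relationship between amino acid sequences and protein secondary structure. Predicting secondary structure then resembles joint word segmentation and part-of-speech tagging in natural language processing. The method treats the word sequence as a Markov chain and uses the Viterbi algorithm to search for the maximum probability of each word being tagged with a given secondary-structure type; a word lattice represents the segmentation results, and a maximum entropy Markov model computes the secondary-structure probabilities of the words. The predicted secondary structure is the sequence of structure types corresponding to the optimal segmentation. The method was tested on protein sequences from four species and compared with the PHD method. Its Q3 accuracy was 3.9% higher than that of PHD, and its SOV accuracy was 4.6% higher. Incorporating locally similar sequences found by BLAST searches can further improve prediction accuracy. On 50 CASP5 target protein sequences, the Q3 accuracy was 78.9% and the SOV accuracy 77.1%. A protein secondary structure prediction server based on this method is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.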
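
The Viterbi step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain first-order Markov chain over an already segmented word sequence rather than a maximum entropy Markov model over a word lattice, and the dictionary entries (`AEELLK`, `VIVTV`, `GPG`), the state set, and all probabilities are made-up assumptions.

```python
import math

STATES = ["H", "E", "C"]  # helix, strand, coil

# Hypothetical dictionary entries and made-up probabilities, for illustration only.
EMIT = {
    "H": {"AEELLK": 0.7, "VIVTV": 0.2, "GPG": 0.1},
    "E": {"AEELLK": 0.2, "VIVTV": 0.7, "GPG": 0.1},
    "C": {"AEELLK": 0.1, "VIVTV": 0.1, "GPG": 0.8},
}
TRANS = {p: {s: (0.6 if p == s else 0.2) for s in STATES} for p in STATES}
START = {s: 1.0 / len(STATES) for s in STATES}


def viterbi(tokens, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state label for each token."""
    # best[s]: log-probability of the best labelling so far that ends in state s
    best = {s: math.log(start[s]) + math.log(emit[s][tokens[0]]) for s in states}
    backpointers = []
    for tok in tokens[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending in s at this token
            prev = max(states, key=lambda p: best[p] + math.log(trans[p][s]))
            new_best[s] = best[prev] + math.log(trans[prev][s]) + math.log(emit[s][tok])
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["AEELLK", "VIVTV", "GPG"]))  # ['H', 'E', 'C']
```

Working in log space avoids numerical underflow on long sequences. A lattice-based version would run the same recurrence over the edges of the segmentation lattice instead of over a fixed token list.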

Research progress in protein secondary structure prediction   Cited by: 1 (self-citations: 0; citations by others: 1)
唐媛  李春花  张瑗  尚进  邹凌云  李立奇 《生物磁学》2013,(26):5180-5182
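Knowledge of protein secondary structure is the basis for understanding protein folding patterns and tertiary structure. It provides a structural foundation for studying protein function and protein-protein interaction modes, and can also assist new drug development, so research on protein secondary structure is of great significance. With the arrival of the post-genomic era, more and more protein sequences are being discovered, which brings both great challenges and research opportunities. Traditional experimental methods can hardly deliver secondary-structure information for proteins on a large scale, so bioinformatics approaches remain the way to obtain the secondary structures of most proteins. In recent years the field has developed rapidly, as many researchers have built protein datasets for secondary structure prediction, computed and extracted various protein features, and applied different prediction algorithms. This review covers the extraction and selection of protein features, prediction algorithms, and methods for evaluating prediction performance, and introduces research progress in protein secondary structure prediction. We believe that, as genomics, proteomics, and bioinformatics continue to develop, protein secondary structure prediction will keep achieving new breakthroughs.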

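RNA secondary structure contains more information than primary structure and better reflects the true situation in phylogenetic analysis. In this study, we therefore sequenced the complete L-rRNA genes of the guava fruit fly Bactrocera correcta and the melon fly B. cucurbitae, constructed a model of the secondary structure of the dipteran L-rRNA gene, and analysed its structural features. We then carried out a preliminary analysis of the phylogenetic relationships among 13 families of Diptera based on the structural parameters and structural sequences of the H45-H47 stem-loops and on L-rRNA structural sequences. The results show that dipteran insects have a conserved L-rRNA secondary structure. Base composition and distribution are uneven across structural domains: domains IV and V have the highest content of completely conserved bases and the highest GC content, and the AT skew of domain VI is below 0 in the great majority of families. Family-specific bases and the few bases conserved between families are mostly G or C. Cecidomyiidae is phylogenetically distant from the other dipteran families, whereas Calliphoridae, Tachinidae, and Syrphidae are closely related; Tabanidae and Nemestrinidae fall in one small clade; Ceratopogonidae and Culicidae fall in one large clade. Parameters of a single kind rarely yield ideal evolutionary results; accurate phylogenetic analysis requires combining parameters of several kinds.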
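
The composition statistics named in this abstract (GC content and AT skew) can be computed as in the sketch below. The abstract does not state its formulas, so this assumes the commonly used definition AT skew = (A − T) / (A + T), with U counted as T for RNA; the function names are illustrative.

```python
def at_skew(seq):
    """AT skew = (A - T) / (A + T); negative values mean more T (or U) than A."""
    s = seq.upper().replace("U", "T")
    a, t = s.count("A"), s.count("T")
    if a + t == 0:
        raise ValueError("sequence contains no A or T/U")
    return (a - t) / (a + t)


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases A, C, G, T/U."""
    s = seq.upper().replace("U", "T")
    counted = sum(s.count(b) for b in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T/U")
    return (s.count("G") + s.count("C")) / counted


# Toy fragment (not real data): 2 A and 4 U give a skew of -1/3.
print(at_skew("AAUUUUGC"), gc_content("AAUUUUGC"))
```

A skew below 0, as reported for domain VI, therefore indicates that T (U) outnumbers A in that region.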

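The analysis and prediction of protein spatial structure has become one of the major research topics in molecular biology and bioinformatics. Complex models and algorithms such as 3D profiles, artificial neural networks, and genetic algorithms have been introduced and have achieved relatively high accuracy, but their results are hard to interpret and the underlying biological regularities are not easily discovered. The protein structural information implicit in the amino acid sequence, together with its biological and mathematical meaning, should instead be the focus of investigation. We therefore start from an analysis of the amino acid sequences that form specific secondary structures and introduce the support S used in association rules to compute the contribution of amino acids at particular positions, in order to uncover the implicit information, and we obtain the corresponding data matrix. By analysing the support for secondary structures, we not only obtain the relative strength of each amino acid's effect on Beta structure at different positions, but also identify the special role of proline and its nucleating behaviour with respect to Beta structure.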

李倩  闫淑珍  陈双林 《菌物学报》2015,34(2):235-245
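To explore the role of the RNA secondary structure of the ribosomal DNA internal transcribed spacer (rDNA ITS) in phylogenetic studies of myxomycetes, the rDNA ITS of 8 species from 5 genera of Physarales was amplified and sequenced using the universal myxomycete ITS primers PHYS4 and PHYS5, and RNA secondary structure models of the ITS region were built with RNAstructure. The results show that ITS1 does not form a compact structure in Physarales, but most species share a stable helix that may play a role in rRNA maturation. The secondary structures of 5.8S rRNA are similar, consist of four helices, and fall mainly into two types. The ITS2 secondary structure model, built on the interaction between 5.8S rRNA and 28S rRNA, shows that ITS2 consists of a closed multibranch loop and at least four main helices, of which helix IV is relatively conserved. Because the secondary structure of the ITS region is more conserved than its nucleotide sequence, in-depth analysis of the secondary structure helps in understanding the relationship between structure and evolution.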

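All higher-order structure of a protein molecule is determined by the information contained in its primary structure, that is, its sequence of amino acid residues. Over the years, no fewer than a dozen methods have been proposed for predicting secondary structure from the amino acid sequence. Among them, the method of Chou and Fasman, proposed in 1974 and revised and refined by 1978, has given good results and received increasing attention. Its outstanding advantage is simplicity: the secondary structure of a protein can be predicted without complex computer analysis, with an accuracy of about 80%. At present, protein secondary structure is of course determined most accurately by X-ray crystal diffraction. The Chou-Fasman method is based precisely on the results of crystal analysis, from which a complete set of data was derived statistically.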

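Nucleic acid sequences contain a certain amount of protein structural information. Based on the number of hydrogen bonds formed when the middle base of a codon in the standard genetic code table pairs, we tentatively divided the 20 amino acids into two classes and used in-house software to statistically analyse the clustering of the two classes in a protein secondary structure database. The results show that, under this classification, amino acid residues are more likely to occur adjacent to residues of the same class, and that such clusters show a certain preference for particular secondary structures. Finally, an amino acid sequence was designed following this approach, and the structure predicted for it by a prediction server is given.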
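
One possible reading of the two-class division described in this abstract is sketched below. It assumes the standard genetic code and labels an amino acid "weak" when the middle base of its codons is A or T/U (two hydrogen bonds when paired) and "strong" when it is G or C (three hydrogen bonds). The abstract does not list the classes, so the labels, the function names, and the adjacency statistic are assumptions for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def middle_base_classes(table=CODON_TABLE):
    """Map each amino acid to 'weak' (middle base A/T) or 'strong' (middle base G/C)."""
    seen = {}
    for codon, aa in table.items():
        if aa == "*":
            continue  # stop codons are not amino acids
        label = "weak" if codon[1] in "AT" else "strong"
        seen.setdefault(aa, set()).add(label)
    ambiguous = sorted(aa for aa, labels in seen.items() if len(labels) > 1)
    if ambiguous:
        raise ValueError(f"amino acids with mixed middle-base classes: {ambiguous}")
    return {aa: next(iter(labels)) for aa, labels in seen.items()}


AA_CLASS = middle_base_classes()


def same_class_adjacency(protein, classes=AA_CLASS):
    """Fraction of adjacent residue pairs whose two residues share a class."""
    pairs = list(zip(protein, protein[1:]))
    if not pairs:
        raise ValueError("need at least two residues")
    return sum(classes[a] == classes[b] for a, b in pairs) / len(pairs)


print(sorted(aa for aa, c in AA_CLASS.items() if c == "strong"))
```

Under the standard code every amino acid falls cleanly into one class (serine's two codon families both have a G or C in the middle position), which is why the ambiguity check never fires here.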

Multiple sequence alignment is typically the first step in estimating phylogenetic trees, with the assumption being that as alignments improve, so will phylogenetic reconstructions. Over the last decade or so, new multiple sequence alignment methods have been developed to improve comparative analyses of protein structure, but these new methods have not been typically used in phylogenetic analyses. In this paper, we report on a simulation study that we performed to evaluate the consequences of using these new multiple sequence alignment methods in terms of the resultant phylogenetic reconstruction. We find that while alignment accuracy is positively correlated with phylogenetic accuracy, the amount of improvement in phylogenetic estimation that results from an improved alignment can range from quite small to substantial. We observe that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low. We discuss these observations and implications for future work.

The reconstruction of phylogenetic history is predicated on being able to accurately establish hypotheses of character homology, which involves sequence alignment for studies based on molecular sequence data. In an empirical study investigating nucleotide sequence alignment, we inferred phylogenetic trees for 43 species of the Apicomplexa and 3 of Dinozoa based on complete small-subunit rDNA sequences, using six different multiple-alignment procedures: manual alignment based on the secondary structure of the 18S rRNA molecule, and automated similarity-based alignment algorithms using the PileUp, ClustalW, TreeAlign, MALIGN, and SAM computer programs. Trees were constructed using neighbor-joining, weighted-parsimony, and maximum-likelihood methods. All of the multiple sequence alignment procedures yielded the same basic structure for the estimate of the phylogenetic relationship among the taxa, which presumably represents the underlying phylogenetic signal. However, the placement of many of the taxa was sensitive to the alignment procedure used; and the different alignments produced trees that were on average more dissimilar from each other than did the different tree-building methods used. The multiple alignments from the different procedures varied greatly in length, but aligned sequence length was not a good predictor of the similarity of the resulting phylogenetic trees. We also systematically varied the gap weights (the relative cost of inserting a new gap into a sequence or extending an already-existing gap) for the ClustalW program, and this produced alignments that were at least as different from each other as those produced by the different alignment algorithms. Furthermore, there was no combination of gap weights that produced the same tree as that from the structure alignment, in spite of the fact that many of the alignments were similar in length to the structure alignment. We also investigated the phylogenetic information content of the helical and nonhelical regions of the rDNA, and conclude that the helical regions are the most informative. We therefore conclude that many of the literature disagreements concerning the phylogeny of the Apicomplexa are probably based on differences in sequence alignment strategies rather than differences in data or tree-building methods.
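
To make the gap weights in this abstract concrete, the sketch below scores the gaps of one aligned sequence under an affine scheme with separate opening and extension costs. The convention (the first gap position pays the opening cost, each further position in the same run pays the extension cost) and the numeric weights are assumptions; this is not ClustalW's actual scoring code.

```python
import re


def affine_gap_penalty(aligned_seq, gap_open, gap_extend):
    """Total penalty over all runs of '-' in one aligned sequence."""
    total = 0.0
    for run in re.finditer(r"-+", aligned_seq):
        length = run.end() - run.start()
        # Assumed convention: opening cost once per run, extension cost
        # for every additional gap position in that run.
        total += gap_open + gap_extend * (length - 1)
    return total


# Three gap positions in one run versus spread over two runs (toy weights).
print(affine_gap_penalty("AC---GT", 10.0, 0.5))  # 11.0
print(affine_gap_penalty("AC--G-T", 10.0, 0.5))  # 20.5
```

Because a single long gap is cheaper than several short ones when the opening cost dominates, changing the ratio of the two weights changes which alignment is optimal, which is consistent with the sensitivity reported above.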

We present a new method using nucleic acid secondary structure to assess phylogenetic relationships among species. In this method, which we term "molecular morphometrics," the measurable structural parameters of the molecules (geometrical features, bond energies, base composition, etc.) are used as specific characters to construct a phylogenetic tree. This method relies both on traditional morphological comparison and on molecular sequence comparison. Applied to the phylogenetic analysis of Cirripedia, molecular morphometrics supports the most recent morphological analyses arguing for the monophyly of Cirripedia sensu stricto (Thoracica + Rhizocephala + Acrothoracica). As a proof, a classical multiple alignment was also performed, either using or not using the structural information to realign the sequence segments considered in the molecular morphometrics analysis. These methods yielded the same tree topology as the direct use of structural characters as a phylogenetic signal. By taking into account the secondary structure of nucleic acids, the new method allows investigators to use the regions in which multiple alignments are barely reliable because of a large number of insertions and deletions. It thus appears to be complementary to classical primary sequence analysis in phylogenetic studies.

Alignment of nucleotide and/or amino acid sequences is a fundamental component of sequence‐based molecular phylogenetic studies. Here we examined how different alignment methods affect the phylogenetic trees that are inferred from the alignments. We used simulations to determine how alignment errors can lead to systematic biases that affect phylogenetic inference from those sequences. We compared four approaches to sequence alignment: progressive pairwise alignment, simultaneous multiple alignment of sequence fragments, local pairwise alignment and direct optimization. When taking into account branch support, implied alignments produced by direct optimization were found to show the most extreme behaviour (based on the alignment programs for which nearly equivalent alignment parameters could be set) in that they provided the strongest support for the correct tree in the simulations in which it was easy to resolve the correct tree and the strongest support for the incorrect tree in our long‐branch‐attraction simulations. When applied to alignment‐sensitive process partitions with different histories, direct optimization showed the strongest mutual influence between the process partitions when they were aligned and phylogenetically analysed together, which makes detecting recombination more difficult. Simultaneous alignment performed well relative to direct optimization and progressive pairwise alignment across all simulations. Rather than relying upon methods that integrate alignment and tree search into a single step without accounting for alignment uncertainty, as with implied alignments, we suggest that simultaneous alignment using the similarity criterion, within the context of information available on biological processes and function, be applied whenever possible for sequence‐based phylogenetic analyses.

In order to maximise the positional homology in the primary sequence alignment of the second internal transcribed spacer for 30 species of equine strongyloid nematodes, the secondary structures of the precursor ribosomal RNA were predicted using an approach combining an energy minimisation method and comparative sequence analysis. The results indicated that a common secondary structure model of the second internal transcribed spacer of these nematodes was maintained, despite significant interspecific differences (2–56%) in primary sequences. The secondary structure model was then used to refine the primary second internal transcribed spacer sequence alignment. The “manual” and “structure” alignments were both subjected to phylogenetic analysis using three different tree-building methods to compare the effect of using different sequence alignments on phylogenetic inference. The topologies of the phylogenetic trees inferred from the manual second internal transcribed spacer alignment were usually different to those derived from the structure second internal transcribed spacer alignment. The results suggested that the positional homology in the second internal transcribed spacer primary sequence alignment was maximised when the secondary structure model was taken into consideration.

Phylogenetic studies of ciliates are mainly based on the primary structure information of the nuclear genes. Some regions of the small subunit ribosomal RNA (SSU‐rRNA) gene have distinctive secondary structures, which have demonstrated value as phylogenetic/taxonomic characters. In the current work, we predict the secondary structures of four variable regions (V2, V4, V7 and V9) in the SSU‐rRNA gene of 45 urostylids. Structure comparisons indicate that the V4 region is the most effective in revealing interspecific relationships, while the V9 region appears suitable at the family level or higher. The V2 region also offers some taxonomic information, but is too conserved to reflect phylogenetic relationships at the family or lower level, at least for urostylids. The V7 region is the least informative. We constructed several phylogenetic trees, based on the primary sequence alignment and based on an improved alignment according to the secondary structures. The results suggest that including secondary structure information in phylogenetic analyses provides additional insights into phylogenetic relationships. Using urostylid ciliates as an example, we show that secondary structure information results in a better understanding of their relationships, for example generic relationships within the family Pseudokeronopsidae.

Wu M  Chatterji S  Eisen JA 《PloS one》2012,7(1):e30288
Uncertainty in multiple sequence alignments has a large impact on phylogenetic analyses. Little has been done to evaluate the quality of individual positions in protein sequence alignments, which directly impact the accuracy of phylogenetic trees. Here we describe ZORRO, a probabilistic masking program that accounts for alignment uncertainty by assigning confidence scores to each alignment position. Using the BAliBASE database and in simulation studies, we demonstrate that masking by ZORRO significantly reduces the alignment uncertainty and improves the tree accuracy.
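
The general idea of masking an alignment by per-column confidence can be sketched as follows. This is not ZORRO's algorithm or file format: the scores, the threshold, and the function name are hypothetical, and the sketch only shows how columns below a chosen confidence would be dropped before tree inference.

```python
def mask_alignment(rows, column_scores, threshold):
    """Keep only the alignment columns whose confidence score is >= threshold."""
    if any(len(row) != len(column_scores) for row in rows):
        raise ValueError("every row must have one character per column score")
    keep = [i for i, score in enumerate(column_scores) if score >= threshold]
    return ["".join(row[i] for i in keep) for row in rows]


# Toy alignment with made-up confidence scores in [0, 1].
rows = ["ACG-T", "AC-GT"]
scores = [0.9, 0.8, 0.2, 0.3, 0.95]
print(mask_alignment(rows, scores, 0.5))  # ['ACT', 'ACT']
```

A weighting scheme, in which each column contributes to the likelihood in proportion to its score, is an alternative to hard filtering with a threshold.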

A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.

Rapidly evolving, indel-rich phylogenetic markers play a pivotal role in our understanding of the relationships at multiple levels of the tree of life. There is extensive evidence that indels provide conserved phylogenetic signal; however, the range of phylogenetic depths for which gaps retain tree signal has not been investigated in detail. Here we address this question using the fungal internal transcribed spacer (ITS), which is central in many phylogenetic studies, molecular ecology, and the detection and identification of pathogenic and non-pathogenic species. ITS is repeatedly criticized for indel-induced alignment problems and the lack of phylogenetic resolution above species level, although these have not been critically investigated. In this study, we examined whether the inclusion of gap characters in the analyses shifts the phylogenetic utility of ITS alignments towards earlier divergences. By re-analyzing 115 published fungal ITS alignments, we found that indels are slightly more conserved than nucleotide substitutions, and when included in phylogenetic analyses, improved the resolution and branch support of phylogenies across an array of taxonomic ranges and extended the resolving power of ITS towards earlier nodes of phylogenetic trees. Our results reconcile previous contradicting evidence for the effects of data exclusion: in the case of more sophisticated indel placement, the exclusion of indel-rich regions from the analyses results in a loss of tree resolution, whereas in the case of simpler alignment methods, the exclusion of gapped sites improves it. Although the empirical datasets do not provide a means to measure alignment accuracy objectively, our results for the ITS region are consistent with previous simulation studies of alignment algorithms. We suggest that sophisticated alignment algorithms and the inclusion of indels make the ITS region and potentially other rapidly evolving indel-rich loci valuable sources of phylogenetic information, which can be exploited at multiple taxonomic levels.
