Similar literature
1.
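Application of stochastic grammar models to RNA secondary structure prediction   Total citations: 1 (self-citations: 0; citations by others: 1)
RNA secondary structure is an important topic in computational molecular biology today. Stochastic grammar models built on comparative sequence analysis predict RNA secondary structure with high accuracy and can model pseudoknots, but they are difficult to implement. By analyzing how stochastic grammars model RNA secondary structure, this paper proposes a prediction scheme that combines comparative sequence analysis, stochastic grammars, and a lexical-entry method.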

2.
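This study proposes a new graphical representation of RNA secondary structure that differs from earlier representations. Using this representation, the RNA secondary structures of nine viruses are drawn, a phylogenetic tree is constructed, and the similarity between sequences is compared and analyzed. The results clearly show that AVII and LRMV are the most similar pair of viruses, while the larger distance values occur between APMV and ALMV and between PDV and AVII, indicating that these RNA secondary structures are clearly dissimilar. These findings agree closely with earlier similarity analyses. The method is also simpler, makes differences easier to observe, and gives reliable results, which further supports its validity.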

3.
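Using influenza A virus as the study sample, this work analyzes how synonymous codon usage bias affects RNA secondary structure, to provide a theoretical basis for further study of the significance of synonymous codons and of the RNA characteristics of influenza A virus. All influenza A virus nucleic acid sequences recorded in NCBI were collected. For each sequence, the RNA secondary structure was computed, together with its loop content, stem content, and folding free energy. On this basis, the flexibility of the RNA secondary structure was calculated. The relative synonymous codon usage (RSCU) values of each sequence were also computed. A database of influenza A virus RNA secondary structures was thereby established, and the relationship between each sequence's RSCU values and the RNA loop structure, stem structure, and flexibility was analyzed. The results show that the RSCU values of 50% of the amino acids correlate significantly with stem and loop content, those of 60% of the amino acids correlate significantly with the unit-average folding free energy, and those of 50% of the amino acids correlate significantly with RNA flexibility. Further analysis shows that, for codons significantly correlated with both stem content and loop content, the RSCU correlations with the two contents have opposite signs, and that among the selected parameters RNA flexibility correlates best with RSCU. The results confirm that, for influenza A virus, synonymous codon usage bias has a strong influence on RNA secondary structure.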
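As a rough illustration of the relative synonymous codon usage (RSCU) statistic used in this entry, the sketch below applies the standard definition: the observed count of a codon divided by the mean count over all synonymous codons of the same amino acid. It assumes the standard genetic code and an in-frame coding sequence; the function name `rscu` and the toy input are illustrative and do not come from the study.

```python
from collections import Counter, defaultdict

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    RNA input is accepted; codons are reported in the DNA alphabet (T, not U).
    Amino acids that never occur in the sequence are left out of the result.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid, skipping stop codons.
    family = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            family[aa].append(codon)

    values = {}
    for members in family.values():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue
        expected = total / len(members)  # count under uniform codon usage
        for c in members:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    for codon, value in sorted(rscu("GCTGCTGCCAAA").items()):
        print(codon, round(value, 3))
```

An RSCU of 1 means a codon is used as often as expected under uniform usage within its synonymous family; values above or below 1 indicate over- or under-use.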

4.
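With the rapid development of molecular biology research in the 21st century, RNA secondary structure prediction has become one of its important topics. Because prediction accuracy is the most critical factor, finding accurate and easy-to-use secondary structure prediction tools is very important. This paper selects three simple, easy-to-use secondary structure prediction programs, predicts the secondary structures of 318 RNA hairpin sequences recorded in the PDB database, and evaluates the programs by comparing their predictions with the experimentally determined structures. The comparison shows that RNAstructure performs best among the three programs.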

5.
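Construction of an RNA secondary structure prediction system   Total citations: 9 (self-citations: 0; citations by others: 9)
Using the following RNA secondary structure prediction algorithms (maximum base pairing, Zuker's free-energy minimization, optimal stacking of helical regions, random stacking of helical regions, and enumeration of all possible combinations), together with a technique for drawing RNA secondary structures based on first-order helical regions, the authors built the RNA secondary structure prediction system Rnafold. In addition, 20 randomly selected tRNA sequences were used to compare the first four prediction algorithms in terms of free energy and cloverleaf structure, and a t-test was used to analyze the statistical differences in free energy. Judged by cloverleaf structure, random stacking performed best, followed by optimal helical-region stacking and the Zuker algorithm, with maximum base pairing performing worst. Finally, the differences between the two free-energy minimization methods were analyzed.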
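For orientation, the maximum base-pairing idea listed first in this entry can be sketched with the classic Nussinov-style dynamic program. This is a minimal sketch under assumptions, not the Rnafold implementation: the function name, the allowed pairs (Watson-Crick plus G-U wobble), and the minimum hairpin loop of three unpaired bases are all choices made here for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    best[i][j] holds the maximum pair count for the subsequence seq[i..j].
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

The recursion counts pairs only, so it ignores stacking energies entirely, which is consistent with the entry's finding that maximum base pairing reproduced the tRNA cloverleaf least well.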

6.
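RNA sequence-structure alignment based on a quantum evolutionary algorithm   Total citations: 1 (self-citations: 0; citations by others: 1)
Multiple sequence alignment is a classic problem in computational molecular biology and a foundational step in many biological studies. Unlike proteins and DNA, RNA has a secondary structure that is more conserved during evolution than its primary sequence, so RNA sequence alignment must consider not only sequence information but, above all, secondary structure information. The authors propose an RNA multiple sequence-structure alignment program based on a quantum evolutionary algorithm. RNA sequences are quantum-encoded, a full-crossover operator that takes structural information into account is designed, and a fitness function suited to RNA sequence-structure alignment is proposed, overcoming the slow convergence and premature convergence of traditional evolutionary algorithms. Tests on a standard database confirm the effectiveness of the method.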

7.
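Cheng Caixia, Su Xue, Gao Ting, Zhou Xuan. Guihaia (《广西植物》), 2018, 38(5): 617-625
Grayia spinosa (Chenopodiaceae) is endemic to the western United States, grows mostly on arid saline-alkaline land, and has important ecological value. This study sequenced the nrDNA ITS of G. spinosa collected in Utah, in the western United States, and compared it with all G. spinosa ITS sequences deposited in GenBank, using four close relatives of G. spinosa as outgroups, to analyze how the ITS primary structure and its RNA secondary structure vary among G. spinosa from different areas of the western United States. The results show that the nrDNA ITS sequences of all G. spinosa samples are 611-623 bp long, with GC content of 60.35%-61.0%, and contain 22 variable sites, 5 of which are parsimony-informative. Genetic distances between samples range from 0.0018 to 0.0089, and genetic distance is not significantly correlated with geographic distance. The neighbor-joining phylogenetic tree shows that all G. spinosa samples cluster into one major clade, clearly separated from the outgroups. In addition, the RNA secondary structures of the G. spinosa ITS sequences were predicted with the RNAfold online software, and the structures of the 8 G. spinosa samples were broadly divided by conformation into four classes (types A, B, C, and D), with the main variation in the ITS1 and ITS2 regions. By contrast, GSNE1 and GSWA8 appear closely related in the ITS primary structure analysis, yet their RNA secondary structures differ markedly. Likewise, GSNE2, GSUT3, GSUT4, GSCA5, GSCA6, and GSCO7 appear closely related in the primary structure analysis, but their RNA secondary structures differ markedly. This may be related to the greater conservation that the RNA secondary structure of ITS sequences shows during evolution.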

8.
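Liu Hailin, Zhang Qun, Jiang Qiming, Ma Ben. Ecological Science (《生态科学》), 2010, 29(5): 432-437
The rDNA ITS region sequences (including 5.8S rDNA) of the South China Sea Phaeocystis globosa strains P1 and P2 (Hong Kong) and ZhJ1 (Zhanjiang) were determined and combined with 13 homologous sequences from GenBank. The alignment is 904 bp long, with 271 variable sites and 221 parsimony-informative sites. On average, (A+T) content (34.5%) is lower than (G+C) content (65.4%). Strains P1, P2, and ZhJ1 differ at 20 variable sites, with sequence similarity of 97.9%-98.5%. ITS sequences give higher interspecific and intraspecific resolution than the 18S rDNA and 28S rDNA genes. The NJ tree, MP tree, and Bayesian inference tree have consistent structures: different Phaeocystis species cluster separately, while P. globosa strains of different geographic origin are intermixed, although strains from the same geographic origin mostly cluster together. The RNA secondary structures show that the 5.8S rDNA region is essentially the same across species, indicating genus-level specificity, whereas the ITS1 and ITS2 regions differ considerably between species. This suggests that the RNA secondary structure of the ITS region can provide useful molecular structural information for the classification and identification of Phaeocystis.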

9.
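Primers psaAL and psaAR were designed from the psaA gene sequence of Phaeocystis antarctica retrieved from GenBank and used to PCR-amplify and sequence a psaA gene fragment of Phaeocystis globosa, yielding a 629 bp DNA sequence. The psaA fragment sequences of P. globosa strains P1 and P2 and of P. antarctica were aligned with Clustal X. The results show that the P. globosa psaA fragment contains no insertions or deletions and that the nucleotide difference rate is 3.34%. The DNAStar analysis software was used to infer the amino acid sequences and RNA secondary structures corresponding to the psaA genes of P. globosa and P. antarctica. The amino acid sequences differ little: only 1 of the 209 amino acids changed, an amino acid variation rate of 0.48%. Apart from some similar domains, the RNA secondary structures show a degree of difference, which may be of reference value for molecular taxonomy of Phaeocystis. Because the psaA fragment sequences and amino acid sequences obtained are extremely conserved across species, they are not suitable for molecular classification among species of the genus Phaeocystis.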

10.
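RNA secondary structure prediction is a classic bioinformatics problem with a history of more than 30 years, and optimization algorithms based on the minimum free energy (MFE) model are the most widely used approach. However, the presence of pseudoknots in RNA structures makes the MFE problem NP-hard in theory, so even optimization algorithms such as dynamic programming face high time complexity. Studies have also found that, owing to RNA folding kinetics and environmental factors, real RNA secondary structures are often not in the minimum free energy state. Based on the characteristics of RNA folding, the authors propose a heuristic search algorithm for predicting RNA secondary structures with pseudoknots. The algorithm takes RNA stems as its basic units and uses a heuristic strategy to search the combinatorial space of stems for the secondary structure with the lowest free energy and the highest frequency of occurrence. It not only markedly reduces the time complexity of the search but also helps compensate for the shortcomings of predicting RNA secondary structure from energy alone. Tests on standard datasets covering several RNA types show that the algorithm is more accurate than several well-known RNA secondary structure prediction algorithms and runs efficiently.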

11.
Visually examining RNA structures can greatly aid in understanding their potential functional roles and in evaluating the performance of structure prediction algorithms. As many functional roles of RNA structures can already be studied given the secondary structure of the RNA, various methods have been devised for visualizing RNA secondary structures. Most of these methods depict a given RNA secondary structure as a planar graph consisting of base-paired stems interconnected by roundish loops. In this article, we present an alternative method of depicting RNA secondary structure as arc diagrams. This is well suited for structures that are difficult or impossible to represent as planar stem-loop diagrams. Arc diagrams can intuitively display pseudoknotted structures, as well as transient and alternative structural features. In addition, they facilitate the comparison of known and predicted RNA secondary structures. An added benefit is that structure information can be displayed in conjunction with a corresponding multiple sequence alignment, thereby highlighting structure and primary sequence conservation and variation. We have implemented the visualization algorithm as a web server, R-chie, as well as a corresponding R package called R4RNA, which allows users to run the software locally and across a range of common operating systems.
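To make the arc-diagram idea concrete, the sketch below converts a structure string into the list of arcs (base pairs) that such a diagram would draw, and reports which arcs cross, since crossing arcs are what a pseudoknot looks like in this view. It is written in Python rather than R and does not use or imitate the R4RNA API. The input format (extended dot-bracket notation with several bracket types) and both function names are assumptions made for illustration.

```python
def arcs_from_dot_bracket(structure):
    """Return a sorted list of 0-based (i, j) base pairs, i < j.

    Each bracket type ((), [], {}, <>) is matched on its own stack, so pairs
    written with different bracket types are allowed to cross.
    """
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks = {opener: [] for opener in closers.values()}
    arcs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            arcs.append((stack.pop(), pos))
        # any other character (for example '.') is an unpaired base
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(arcs)


def crossing_arcs(arcs):
    """Return ordered pairs of arcs (a, b) that cross: a.i < b.i < a.j < b.j."""
    return [(a, b) for a in arcs for b in arcs if a[0] < b[0] < a[1] < b[1]]


if __name__ == "__main__":
    arcs = arcs_from_dot_bracket("((..[[..))..]]")
    print(arcs)
    print(len(crossing_arcs(arcs)), "crossings")
```

A plotting layer would then draw each `(i, j)` as a semicircle above the sequence axis; that part is omitted here.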

12.
Prediction of RNA secondary structure based on helical regions distribution   Total citations: 5 (self-citations: 0; citations by others: 5)
MOTIVATION: RNAs play an important role in many biological processes, and knowing their structure is important in understanding their function. Because RNA secondary structure is difficult to determine experimentally, theoretical prediction methods for known sequences are often used. Although many different algorithms for such predictions have been developed, the problem has not yet been solved, so new methods for predicting RNA secondary structure are needed. The most widely used at present is Zuker's algorithm, which determines the minimum free energy secondary structure. However, many RNA secondary structures verified by experiments are not consistent with the minimum free energy secondary structures. To address this problem, Zuker developed a method in 1989 that searches for a group of secondary structures whose free energy is close to the global minimum free energy. When considering a group of secondary structures without experimental data, we cannot tell which one is better than the others. The same situation arises in combinatorial and heuristic methods. These two kinds of methods have several weaknesses. Here we show how the central limit theorem can be used to solve these problems. RESULTS: An algorithm for predicting RNA secondary structure based on helical regions distribution is presented, which can be used to find the most probable secondary structure for a given RNA sequence. It consists of three steps. First, list all possible helical regions. Second, according to the central limit theorem, estimate the occurrence probability of every helical region based on Monte Carlo simulation. Third, add the helical region with the highest probability to the current structure and eliminate the helical regions incompatible with the current structure. These steps are repeated until no more helical regions can be added, and the current structure is taken as the final RNA secondary structure. To demonstrate the reliability of the program, it is tested on three RNA sequences: tRNAPhe, Pre-tRNATyr, and the Tetrahymena ribosomal RNA intervening sequence. AVAILABILITY: The program is written in Turbo Pascal 7.0. The source code is available upon request. CONTACT: Wujj@nic.bmi.ac.cn or Liwj@mail.bmi.ac.cn
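The first step of the algorithm in this entry, listing all possible helical regions, can be sketched as follows. This is an illustrative Python sketch, not the authors' Turbo Pascal program. The pairing rules (Watson-Crick plus G-U), the minimum helix length, and the minimum hairpin loop size are assumptions, and the probability estimation and greedy selection steps are not reproduced because the abstract does not specify them.

```python
def helical_regions(seq, min_len=3, min_loop=3):
    """List maximal helical regions as (i, j, length) tuples (0-based).

    A region pairs seq[i + k] with seq[j - k] for k = 0 .. length - 1.
    Only regions that cannot be extended outward are reported, and only
    those with at least min_len stacked pairs.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)

    def can_pair(i, j):
        # Require more than min_loop positions between i and j.
        return j - i > min_loop and (seq[i], seq[j]) in pairs

    regions = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not can_pair(i, j):
                continue
            # Skip pairs that merely continue a helix starting further out.
            if i > 0 and j + 1 < n and can_pair(i - 1, j + 1):
                continue
            length = 1
            while can_pair(i + length, j - length):
                length += 1
            if length >= min_len:
                regions.append((i, j, length))
    return regions


if __name__ == "__main__":
    for region in helical_regions("GGGAAAUCCC"):
        print(region)
```

Even this ten-base toy sequence yields two overlapping candidates, which shows why a later step is needed to choose among mutually incompatible helical regions.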

13.
Measuring the (dis)similarity between RNA secondary structures is critical for the study of RNA secondary structures and has implications for RNA functional characterization. Although a number of methods have been developed for comparing RNA structural similarities, their applications have been limited by the complexity of the required computation. In this paper, we present a novel method for comparing the similarity of RNA secondary structures generated from the same RNA sequence, i.e., a secondary structure ensemble, using a matrix representation of the RNA structures. Relevant features of the RNA secondary structures can be easily extracted through singular value decomposition (SVD) of the representing matrices. We have mapped the feature vectors of the singular values to a kernel space, where (dis)similarities among the mapped feature vectors become more evident, making clustering of RNA secondary structures easier to handle. The pair-wise comparison of RNA structures is achieved by computing the distance between the singular value vectors in the kernel space. We have applied a fuzzy kernel clustering method, using this similarity metric, to cluster the RNA secondary structure ensembles. Our application results suggest that our fuzzy kernel clustering method is highly promising for classification of RNA structure ensembles, because of its low computational complexity and high clustering accuracy.

14.

Background

Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for the RNA functionality, and the prediction of the secondary structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs that apply to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, as well as a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment.

Results

On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of the RNA sequences increase, the fine-grained mapping can surpass the coarse-grained mapping in performance.

Conclusions

By using our MapReduce framework together with statistical analysis on the accuracy retention results, we observe how the inversion-based chunking methods can outperform predictions using the whole sequence. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone.
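The chunk, predict, and reconstruct pipeline described in this entry can be sketched in a few lines. The sketch uses plain fixed-length chunking, which is an assumption made here; it does not reproduce the centered or inversion-based chunking methods, the seven prediction programs, or the MapReduce framework. The predictor is passed in as a function, and `toy_predict` is a hypothetical stand-in, not a real folding program.

```python
def chunk_bounds(length, chunk_size):
    """Return (start, end) index pairs covering range(length) without overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(s, min(s + chunk_size, length)) for s in range(0, length, chunk_size)]


def predict_by_chunks(seq, predict, chunk_size):
    """Predict each chunk independently and join the dot-bracket strings.

    Because every base pair lies inside one chunk, concatenating the
    per-chunk structures gives a valid structure for the whole sequence.
    """
    parts = []
    for start, end in chunk_bounds(len(seq), chunk_size):
        structure = predict(seq[start:end])
        if len(structure) != end - start:
            raise ValueError("predictor returned a structure of the wrong length")
        parts.append(structure)
    return "".join(parts)


def toy_predict(chunk):
    """Hypothetical placeholder predictor: pair the two end bases if possible."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    if len(chunk) >= 5 and (chunk[0], chunk[-1]) in pairs:
        return "(" + "." * (len(chunk) - 2) + ")"
    return "." * len(chunk)


if __name__ == "__main__":
    print(predict_by_chunks("GAAACGAAACAA", toy_predict, 5))
```

Each call to `predict` is independent of the others, which is what makes the approach easy to distribute across workers in a map step.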

15.
RNA pseudoknot prediction in energy-based models.   Total citations: 11 (self-citations: 0; citations by others: 11)
RNA molecules are sequences of nucleotides that serve as more than mere intermediaries between DNA and proteins, e.g., as catalytic molecules. Computational prediction of RNA secondary structure is among the few structure prediction problems that can be solved satisfactorily in polynomial time. Most work has been done to predict structures that do not contain pseudoknots. Allowing pseudoknots introduces modeling and computational problems. In this paper we consider the problem of predicting RNA secondary structures with pseudoknots based on free energy minimization. We first give a brief comparison of energy-based methods for predicting RNA secondary structures with pseudoknots. We then prove that the general problem of predicting RNA secondary structures containing pseudoknots is NP-complete for a large class of reasonable models of pseudoknots.

17.
Fang X, Luo Z, Yuan B, Wang J. Bioinformation, 2007, 2(5): 222-229
The prediction of RNA secondary structure can be facilitated by incorporating comparative analysis of homologous sequences. However, most existing comparative methods are vulnerable to alignment errors and thus have low accuracy in practical application. Here we improve the prediction of RNA secondary structure by detecting and assessing conserved stems shared by all sequences in the alignment. Our method can be summarized as follows: 1) we detect possible stems in a single RNA sequence using the so-called position matrix, with which some possibly paired positions can be uncovered; 2) we detect conserved stems across multiple RNA sequences by multiplying the position matrices; 3) we assess the conserved stems using the signal-to-noise ratio; 4) we compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by the RNAalifold program. We tested our method on data sets of RNA alignments with known secondary structures. The accuracy of our method, measured as sensitivity and specificity, is greater than that of predictions by RNAalifold.
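Accuracy figures of the kind reported in this entry are usually computed over sets of base pairs. The sketch below shows one common way to do so; it is an assumption that the paper uses these definitions. Sensitivity is the fraction of reference pairs that were predicted. In much of the RNA structure literature, "specificity" denotes the fraction of predicted pairs that are correct (positive predictive value), and that convention is assumed here. The function name and example pairs are illustrative.

```python
def pair_accuracy(predicted, reference):
    """Return (sensitivity, ppv) for two collections of (i, j) base pairs.

    sensitivity = correct pairs / reference pairs
    ppv         = correct pairs / predicted pairs
    An empty denominator gives 0.0 rather than raising an error.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    sensitivity = correct / len(reference) if reference else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    predicted = [(0, 9), (1, 8), (2, 7)]
    reference = [(0, 9), (1, 8), (3, 6), (4, 5)]
    print(pair_accuracy(predicted, reference))
```

Some evaluations also count a predicted pair as correct when it is shifted by one position on either side; that relaxation is not included here.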

18.
Abstract

This paper develops mathematical methods for describing and analyzing RNA secondary structures. It was motivated by the need to develop rigorous yet efficient methods to treat transitions from one secondary structure to another, which we propose here may occur as motions of loops within RNAs having appropriate sequences. In this approach a molecular sequence is described as a vector of the appropriate length. The concept of symmetries between nucleic acid sequences is developed, and the 48 possible different types of symmetries are described. Each secondary structure possible for a particular nucleotide sequence determines a symmetric, signed permutation matrix. The collection of all possible secondary structures is comprised of all matrices of this type whose left multiplication with the sequence vector leaves that vector unchanged. A transition between two secondary structures is given by the product of the two corresponding structure matrices. This formalism provides an efficient method for describing nucleic acid sequences that allows questions relating to secondary structures and transitions to be addressed using the powerful methods of abstract algebra. In particular, it facilitates the determination of possible secondary structures, including those containing pseudoknots. Although this paper concentrates on RNA structure, this formalism also can be applied to DNA.

19.
We propose a new method for detecting conserved RNA secondary structures in a family of related RNA sequences. Our method is based on a combination of thermodynamic structure prediction and phylogenetic comparison. In contrast to purely phylogenetic methods, our algorithm can be used for small data sets of approximately 10 sequences, efficiently exploiting the information contained in the sequence variability. The procedure constructs a prediction only for those parts of sequences that are consistent with a single conserved structure. Our implementation produces reasonable consensus structures without user interference. As an example we have analysed the complete HIV-1 and hepatitis C virus (HCV) genomes as well as the small segment of hantavirus. Our method confirms the known structures in HIV-1 and predicts previously unknown conserved RNA secondary structures in HCV.
