Similar Literature
1.
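Liu Linmeng, Wen Quan, Ou Hongyu. Microbiology China (微生物学通报), 2014, 41(12): 2583-2592
[Objective] To identify ncRNA genes in completely sequenced bacterial genomes, three commonly used ncRNA prediction tools, sRNAPredict, PORTRAIT and sRNAscanner, were evaluated. [Methods] Nine bacterial genomes, each with more than 30 known ncRNA genes recorded in the bacterial ncRNA database BSRD, were selected and grouped by genomic G+C content, and the prediction accuracy of sRNAPredict and PORTRAIT was compared. Sequence features of the transcription start and termination regions of ncRNA genes were extracted from genomes of different G+C content and used to evaluate the predictions of sRNAscanner. [Results] For bacterial ncRNA genes, sRNAPredict had higher specificity and positive predictive value than PORTRAIT but lower sensitivity; the performance of both tools varied markedly with genomic G+C content. The sequence features of ncRNA promoter and terminator regions differed markedly among bacterial genomes of different G+C content, and using these features improved the average performance of sRNAscanner in predicting ncRNA genes. [Conclusions] The performance of the three ncRNA gene prediction tools varies with genomic G+C content. Features of the transcription start and termination regions of ncRNA genes in genomes of different G+C content can serve as important parameters for ncRNA gene prediction.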
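
The evaluation above rests on standard confusion-matrix metrics and on grouping genomes by G+C content. The following is a minimal sketch, assuming counts of true/false positives and negatives are already available for each tool; the G+C cut-offs in the hypothetical `gc_class` helper are illustrative only, because the abstract does not state the thresholds the authors used.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def prediction_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics for a binary ncRNA gene predictor."""
    return {
        # share of known ncRNA genes that the tool recovers
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # share of non-ncRNA regions that the tool correctly rejects
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # share of the tool's predictions that are real ncRNA genes
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
    }


def gc_class(gc: float, low: float = 0.40, high: float = 0.60) -> str:
    # Hypothetical cut-offs: the abstract does not give the grouping thresholds.
    if gc < low:
        return "low"
    if gc > high:
        return "high"
    return "medium"


if __name__ == "__main__":
    print(round(gc_content("ATGCGC"), 3))  # 4 of 6 bases are G/C -> 0.667
    print(prediction_metrics(tp=20, fp=5, tn=70, fn=10))
```

A tool with high specificity and PPV but low sensitivity, as reported for sRNAPredict, would show a large `fn` relative to `fp` in such a table.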

2.
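Pan Zhifeng, Zhang Jing. Chinese Journal of Bioinformatics (生物信息学), 2010, 8(4): 325-329
Transcription efficiency generally differs among genes, and the organization of promoter sequences may be one cause of this difference. Using occurrence frequencies, a Markov model and a weighted Markov model, we computed the distribution of all 6-base fragments (6-mers) in the promoter sequences of yeast genes with different transcription frequencies, and then used the max-min closeness degree method to analyze structural differences among these promoter sequences. The results show that the transcription frequency of yeast genes is indeed associated with promoter sequence structure: genes whose transcription frequencies differ greatly generally also differ greatly in promoter sequence structure. Statistical analysis further shows that high-order (3rd- and 4th-order) weighted Markov models reflect the structural features of promoter sequences more effectively.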
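
The 6-mer distributions described above compare observed word counts with counts expected under a Markov model. The sketch below shows the standard maximum-likelihood expected count for an order-m Markov chain; it does not implement the paper's weighting scheme or the max-min closeness degree comparison, and the toy promoter string is invented for illustration.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in seq, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def markov_expected(word: str, seq: str, order: int) -> float:
    """Expected count of `word` under an order-m Markov chain estimated from seq.

    E(w) = N(w[0:m+1]) * prod_{i=1}^{k-m-1} N(w[i:i+m+1]) / N(w[i:i+m])
    where k = len(word), m = order and N(.) are observed counts in seq.
    """
    k, m = len(word), order
    if not 0 <= m < k - 1:
        raise ValueError("order must satisfy 0 <= order < len(word) - 1")
    num = kmer_counts(seq, m + 1)
    den = kmer_counts(seq, m) if m > 0 else None
    n_bases = sum(num.values()) if m == 0 else 0
    expected = float(num[word[:m + 1]])
    for i in range(1, k - m):
        expected *= num[word[i:i + m + 1]]
        d = den[word[i:i + m]] if m > 0 else n_bases  # order 0: divide by base count
        if d == 0:
            return 0.0
        expected /= d
    return expected


if __name__ == "__main__":
    promoter = "TATAAAAGGCTATAAATAGGC"  # toy sequence, not real data
    w = "TATAAA"
    obs = kmer_counts(promoter, 6)[w]
    exp = markov_expected(w, promoter, order=3)
    print(w, obs, round(exp, 2))
```

For 6-mers, orders 3 and 4 (the ones the abstract highlights) are both valid values of `order`; comparing `obs` with `exp` across promoter groups is one way to expose over- or under-represented words.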

3.
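The generalized hidden Markov model (GHMM) is an important model for gene finding, but its computational cost is so much higher than that of the conventional hidden Markov model that it cannot be applied directly to gene finding. Based on the structural characteristics of prokaryotic genes, we propose an efficient simplified algorithm whose computational cost is a linear function of sequence length. On this basis we built GeneMiner, a gene finding program for prokaryotic genes; tests on real data show that the algorithm is effective.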
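
One structural property that keeps prokaryotic gene finding tractable is that genes lack introns, so candidate genes are open reading frames (ORFs) that can be enumerated in a single pass. The sketch below illustrates only that linear-time candidate scan on the forward strand; it is not GeneMiner's simplified GHMM algorithm, and `min_codons` is an assumed illustrative parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Each of the three frames is scanned once, so run time is linear in len(seq).
    `end` is exclusive and includes the stop codon; the length test counts
    codons including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # stop codon closes the candidate
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAAGG", min_codons=1))
```

A full GHMM would then score each candidate with length and content models; the point here is only that enumerating candidates need not be quadratic.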

4.
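Results of the Human Genome Project show that humans have only 25,000 to 30,000 protein-coding genes, accounting for less than 3% of the genome sequence; the RNAs transcribed from the rest of the genome are non-coding RNAs (ncRNAs). ncRNAs are closely associated with the initiation and progression of malignant tumors. In recent years, research on two classes of ncRNA, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs), has advanced rapidly. This article reviews progress in research on the mechanisms by which lncRNAs and circRNAs act in prostate cancer.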

5.
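Like mRNAs, ncRNAs are important functional molecules. Using k-tuple (k-word) content as features, we classified and compared the mature sequences of yeast ncRNAs with the coding regions, upstream sequences and downstream sequences of mRNAs. Based on the 3-tuple content of ncRNA mature sequences and mRNA coding regions, the mean leave-one-out cross-validation (LOOCV) accuracy for separating ncRNAs from mRNAs reached 93.93%. Based on the 4-tuple and 5-tuple content of upstream sequences, the accuracies were 92.49% and 92.76%, respectively; based on the 4-tuple and 5-tuple content of downstream sequences, they were 91.58% and 90.60%; and using the 4-tuple and 5-tuple content of both upstream and downstream sequences, the mean accuracies were 94.68% and 94.83%. t-tests identified k-tuples whose content differs significantly between the upstream and downstream sequences of ncRNAs and mRNAs. These results show that the 3-tuple content of ncRNA mature sequences and mRNA coding regions, and the 4- or 5-tuple content of ncRNA and mRNA upstream and downstream sequences, can effectively distinguish ncRNAs from mRNAs. The findings should help not only to identify ncRNAs and mRNAs accurately but also to discover ncRNA-specific transcription factor binding sites.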
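
The classification above uses k-tuple content as a feature vector and reports leave-one-out cross-validation accuracy. The abstract does not name the classifier, so the sketch below assumes a simple nearest-centroid rule with Euclidean distance purely to show how LOOCV is computed; `ktuple_content` and `loocv_accuracy` are hypothetical helper names.

```python
import math
from itertools import product


def ktuple_content(seq: str, k: int) -> list[float]:
    """Relative frequency of each of the 4**k k-tuples, in lexicographic order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    vec = [0.0] * len(words)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows with ambiguous bases are skipped
            vec[j] += 1
            total += 1
    return [v / total for v in vec] if total else vec


def loocv_accuracy(vectors: list[list[float]], labels: list[str]) -> float:
    """Leave-one-out accuracy of an assumed nearest-centroid classifier."""
    correct = 0
    for held in range(len(vectors)):
        centroids = {}
        for lab in set(labels):
            # centroid of each class, computed without the held-out sample
            members = [v for i, v in enumerate(vectors) if i != held and labels[i] == lab]
            if members:
                centroids[lab] = [sum(col) / len(members) for col in zip(*members)]
        pred = min(centroids, key=lambda lab: math.dist(vectors[held], centroids[lab]))
        correct += pred == labels[held]
    return correct / len(vectors)
```

In use, each ncRNA and mRNA sequence would be turned into `ktuple_content(seq, 3)` (or k = 4, 5 for flanking regions) before calling `loocv_accuracy`; any other classifier can replace the centroid rule without changing the LOOCV loop.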

7.
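Computational RNomics: structure identification and functional prediction of non-coding RNAs
Eukaryotic genomes contain large numbers of non-coding RNA genes, and computational RNomics applies methods from information science and other disciplines to resolve the structure and function of ncRNAs. This article reviews the main research methods and topics of current computational RNomics with respect to ncRNA data storage and management, ncRNA gene identification and characterization, and ncRNA target identification and functional prediction.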

8.
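Non-coding RNAs (ncRNAs) are a class of RNA molecules that occur widely in many organisms, lack a clear open reading frame, and do not encode proteins. Some ncRNAs have been isolated from several plants. They act directly as RNA molecules to perform important regulatory functions in plants, affecting cell differentiation and individual development, transcriptional regulation, mRNA stability, RNA processing and modification, signal transduction, and environmental adaptation. Research on plant ncRNAs provides important information for understanding plant growth, development and evolution.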

9.
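This study aimed to characterize non-coding RNA617 (ncRNA617) in Salmonella enterica serovar Typhi (S. Typhi) at the molecular level and to investigate its effect on biofilm formation and the underlying mechanism. Expression of ncRNA617 was detected by Northern blot, and its likely transcription start and termination sites were analyzed by 5′ rapid amplification of cDNA ends (5′ RACE) and 3′ reverse transcription-polymerase chain reaction (3′ RT-PCR). An ncRNA617 deletion strain, a complemented strain and an overexpression strain were constructed; biofilm formation assays were used to observe the effect of ncRNA617 on S. Typhi biofilm formation, and quantitative real-time polymerase chain reaction (qPCR) was used to measure changes in the expression of biofilm-related genes. Bioinformatic methods were then used to predict binding regions between ncRNA617 and the differentially expressed genes, as a preliminary analysis of how ncRNA617 exerts its regulatory effect. The results showed that ncRNA617 is indeed expressed in S. Typhi, with a length of about 300 nt; its transcription start site lies 967 nt downstream of the mig-14 stop codon, and its termination site lies 2,378 to 2,560 nt upstream of the t2681 start codon. Compared with the wild-type control strain, the ncRNA617 deletion strain showed enhanced biofilm formation (P<0.05), the complemented strain recovered to the wild-type level, and the overexpression strain showed reduced biofilm formation (P<0.05). qPCR results indicated that ncRNA617 negatively regulates the transcript levels of several biofilm-related genes (P<0.05). Bioinformatic prediction showed that ncRNA617 binds the differentially expressed genes at different regions. These results suggest that ncRNA617 is present in S. Typhi with a length of about 270 to 452 nt, and that it may downregulate biofilm-related genes by targeting and binding them, thereby negatively regulating S. Typhi biofilm formation.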

10.
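Advances in ncRNA research techniques
Xiao Zhangkui, Xue Liangyi. Chinese Bulletin of Life Sciences (生命科学), 2007, 19(2): 122-126
ncRNAs regulate gene expression through multiple mechanisms. Bioinformatics, genomic SELEX, microarray analysis and other methods have played important roles in ncRNA research and have led to the discovery of many new ncRNAs over the past five years. This article briefly introduces the various methods used to study ncRNAs.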

11.
Detecting members of known noncoding RNA (ncRNA) families in genomic DNA is an important part of sequence annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for genome-wide search. This cost can be reduced by using a filter to exclude sequences that are unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect ncRNA instances lacking strong conservation while excluding most irrelevant sequences remains challenging. In this work, we design three types of filters based on multiple secondary structure profiles (SSPs). An SSP augments a regular profile (i.e., a position weight matrix) with secondary structure information but can still be scanned efficiently against long sequences. Multi-SSP-based filters combine evidence from multiple SSP matches and can achieve high sensitivity and specificity. Our SSP-based filters are extensively tested on the BRAliBase III data set, Rfam 9.0, and a published soil metagenomic data set. In addition, we compare the SSP-based filters with several other ncRNA search tools, including Infernal (with profile HMMs as filters), ERPIN, and tRNAscan-SE. Our experiments demonstrate that carefully designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families. The designed filters and filter-scanning programs are available at our website: www.cse.msu.edu/~yannisun/ssp/.
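
The abstract describes an SSP as a position weight matrix augmented with secondary structure information. As a rough illustration only, the sketch below assumes a simplified scoring rule: the sum of per-column log-odds scores plus a fixed bonus for each expected base pair that is canonical or G-U in the window. This is a hypothetical stand-in, not the authors' SSP scoring function, and `pair_bonus` and `threshold` are assumed parameters.

```python
import math

# Watson-Crick and G-U wobble pairs, RNA alphabet
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def ssp_scan(seq, pwm, pairs, pair_bonus=1.0, threshold=0.0):
    """Slide a length-L profile along seq (uppercase ACGU) and keep high-scoring windows.

    pwm:   list of dicts, one per profile column, mapping base -> log-odds score.
    pairs: list of (i, j) column indices that are expected to base-pair.
    Score = sum of column log-odds + pair_bonus per compatible expected pair.
    Returns a list of (window start, score) with score >= threshold.
    """
    L = len(pwm)
    hits = []
    for start in range(len(seq) - L + 1):
        window = seq[start:start + L]
        # bases missing from a column score -inf, so the window cannot pass
        score = sum(col.get(base, -math.inf) for col, base in zip(pwm, window))
        score += pair_bonus * sum((window[i], window[j]) in PAIRS for i, j in pairs)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Because each window costs O(L + number of pairs), the scan stays linear in sequence length, which is the property that makes profile-style filters cheap relative to a full CM search.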

12.
Many noncoding RNAs (ncRNAs) function through both their sequences and their secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. State-of-the-art structure annotation tools are based on comparative analysis, which derives the consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA alignment and consensus structure derivation tools, more efficient and accurate ncRNA secondary structure modeling and alignment methods are needed. In this work, we introduce a consensus structure derivation approach based on grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and of a full RNA grammar including pseudoknots. Because a grammar string is a string over a special alphabet constructed from a grammar, it converts ncRNA alignment into sequence alignment. Using grammar string alignment, we derive consensus secondary structures for hundreds of ncRNA families from BRAliBase 2.1 and for 25 families containing pseudoknots. Our experiments show that grammar string-based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at http://www.cse.msu.edu/~yannisun/grammar-string.
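
The key idea above is that once sequence and structure are encoded into one string over a combined alphabet, structure-aware comparison reduces to ordinary sequence alignment. The sketch below uses a much simpler, hypothetical per-position alphabet (base plus dot-bracket state) rather than the CFG-derived grammar string, and scores it with Needleman-Wunsch global alignment; the scoring weights and gap penalty are arbitrary assumptions.

```python
def encode(seq: str, dot_bracket: str) -> list[str]:
    """Hypothetical combined alphabet: base + structural state, e.g. 'G(' or 'A.'."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    return [b + s for b, s in zip(seq.upper(), dot_bracket)]


def symbol_score(x: str, y: str) -> int:
    # Assumed weights: structural agreement counts more than base identity.
    return (2 if x[1] == y[1] else -2) + (1 if x[0] == y[0] else 0)


def global_align_score(a: list[str], b: list[str], gap: int = -2) -> int:
    """Needleman-Wunsch score of two encoded strings with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(prev[j - 1] + symbol_score(a[i - 1], b[j - 1]),  # (mis)match
                         prev[j] + gap,       # gap in b
                         cur[j - 1] + gap)    # gap in a
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    x = encode("GGAAACC", "((...))")
    y = encode("GCAAAGC", "((...))")
    print(global_align_score(x, y))
```

Unlike this per-position encoding, the grammar string described in the abstract derives its symbols from grammar parameters, so paired positions are represented jointly rather than independently.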

14.

Background  

In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.
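
As a baseline for predicting secondary structure from sequence, the classic Nussinov dynamic program maximizes the number of nested base pairs in a single sequence. The sketch below is that textbook single-sequence algorithm, not the multi-sequence method the abstract refers to; `min_loop`, the minimum hairpin size, is an assumed parameter set to 3.

```python
def nussinov(seq: str, min_loop: int = 3) -> tuple[int, list[tuple[int, int]]]:
    """Maximise nested base pairs (Watson-Crick plus G-U) in one RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    Returns (number of pairs, sorted list of paired index tuples).
    """
    ok = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    s = seq.upper().replace("T", "U")
    n = len(s)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])           # i or j unpaired
            if (s[i], s[j]) in ok:
                best = max(best, dp[i + 1][j - 1] + 1)       # i pairs with j
            for k in range(i + 1, j):                        # bifurcation
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    # Traceback: recover one optimal set of pairs.
    pairs, stack = [], [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if i >= j or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (s[i], s[j]) in ok and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return (dp[0][n - 1] if n else 0), sorted(pairs)


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))
```

Practical tools replace pair counting with thermodynamic energy models, and methods for unaligned families additionally exploit covariation across sequences; the O(n^3) recursion structure is the shared starting point.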

15.
Sequence-based heuristics for faster annotation of non-coding RNA families
MOTIVATION: Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could. RESULTS: In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families. AVAILABILITY: The source code is available under the GNU General Public License at the supplementary web site.
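
The filtering strategy above runs a cheap test on every window and the expensive CM only on windows that pass. The sketch below shows that two-stage pipeline generically. As assumptions: the cheap filter is a BLAST-style shared-word seed test (the kind of heuristic the paper compares against), not the authors' profile HMM filter, and the expensive scorer is a placeholder callable standing in for a CM.

```python
from typing import Callable, Iterable, Iterator


def sliding_windows(seq: str, size: int, step: int = 1) -> Iterator[tuple[int, str]]:
    """Yield (start, window) pairs of fixed size along seq."""
    for start in range(0, max(len(seq) - size + 1, 0), step):
        yield start, seq[start:start + size]


def seed_filter(seeds: set[str], w: int) -> Callable[[str], bool]:
    """Keep a window only if it shares at least one length-w word with the query."""
    def keep(window: str) -> bool:
        return any(window[i:i + w] in seeds for i in range(len(window) - w + 1))
    return keep


def filtered_search(windows: Iterable[tuple[int, str]],
                    cheap_filter: Callable[[str], bool],
                    expensive_score: Callable[[str], float],
                    threshold: float) -> tuple[list[tuple[int, float]], int]:
    """Apply expensive_score only to windows passing cheap_filter.

    Returns (hits, n_scored) so the fraction of expensive work avoided can be measured.
    """
    hits, n_scored = [], 0
    for start, window in windows:
        if not cheap_filter(window):
            continue
        n_scored += 1
        score = expensive_score(window)
        if score >= threshold:
            hits.append((start, score))
    return hits, n_scored
```

A heuristic filter trades a small risk of discarding true homologs (windows the filter rejects are never rescored) for speed; a rigorous filter, as the abstract notes, guarantees no loss relative to the full CM search but saves less time.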

16.
Non-protein-coding RNAs (ncRNAs) are a research hotspot in bioinformatics. Recent discoveries have revealed new ncRNA families performing a variety of roles, from gene expression regulation to catalytic activities, and other families are believed to remain undiscovered. Computational methods developed for protein-coding genes often fail when searching for ncRNAs. Noncoding RNA functionality often depends heavily on secondary structure, which makes ncRNA gene discovery very different from the discovery of protein-coding genes. This motivated the development of specific methods for ncRNA research. This article reviews the main approaches used to identify ncRNAs and predict their secondary structure. During the execution of this work, AML was supported by a CAPES fellowship.

18.
The eukaryotic genome contains varying numbers of non-coding RNA (ncRNA) genes. Computational RNomics takes a multidisciplinary approach, drawing on fields such as information science, to resolve the structure and function of ncRNAs. Here, we review the main issues in computational RNomics, namely data storage and management, ncRNA gene identification and characterization, and ncRNA target identification and functional prediction, and we summarize the field's main methods and current content.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417