Similar literature
1.
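To improve the accuracy of splice site recognition in untranslated regions, a recognition method combining statistical probability with a support vector machine is proposed. The method has two stages. The first stage describes untranslated region (UTR) sequences statistically: features such as the correlation between bases, position specificity, and conservation are expressed as probabilities, and these probability parameters serve as the input vector for the second stage. The second stage applies a support vector machine (SVM) with a polynomial kernel to recognize splice sites. Tests on a human 5′ UTR splice site dataset show that the method performs well in recognizing splice sites in untranslated regions.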
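The first stage described above can be illustrated with a minimal sketch. It assumes the simplest possible probability description, per-position base frequencies with a pseudocount, turned into log-odds features; the abstract does not give the actual feature definitions, so the function names, the pseudocount, and the toy windows are all hypothetical. The resulting vector is what a second-stage classifier such as an SVM would receive.

```python
import math

BASES = "ACGT"

def position_probs(windows, pseudocount=1.0):
    """Estimate P(base | position) from equal-length windows (assumed feature)."""
    length = len(windows[0])
    table = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for w in windows:
            counts[w[i]] += 1
        total = sum(counts.values())
        table.append({b: counts[b] / total for b in BASES})
    return table

def log_odds_features(window, true_probs, false_probs):
    """One log-odds value per position: log P_true(base) / P_false(base)."""
    return [
        math.log(true_probs[i][b] / false_probs[i][b])
        for i, b in enumerate(window)
    ]

# Toy windows, invented for illustration only.
true_sites = ["AGGTAAGT", "CGGTGAGT", "AGGTAAGA"]
false_sites = ["TCGTCCCA", "ATGTTTGC", "GAGTCTAT"]

tp = position_probs(true_sites)
fp = position_probs(false_sites)
features = log_odds_features("AGGTAAGT", tp, fp)
```

Positions where true and false sites share the same base (the GT dinucleotide here) get a log-odds of zero, so only the flanking positions carry discriminating information into the classifier.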

2.
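Splice site recognition based on support vector machines (SVM)   Cited by: 14 (self-citations: 1; citations by others: 13)
Splice site recognition, an important step in gene identification, has long attracted researchers' attention. Because sequences near splice sites are conserved, several methods based on statistical properties have been applied to splice site recognition, but their performance still leaves room for improvement. The support vector machine (SVM), a relatively new learning machine grounded in statistical learning theory, has developed considerably in recent years and has been applied to many pattern recognition problems. This paper applies it to splice site recognition. Because false splice sites far outnumber true ones among sequence samples that satisfy the GT-AG rule, the paper proposes an SVM-based "balanced minimum" method (平衡取小法) to obtain better recognition. Experimental results show that applying SVMs to splice site recognition extracts the statistical features of conserved sequences near the sites more effectively, generalizes better to the test set, and is simpler to use. The result offers a new method for splice site recognition and new leads for structure and site recognition problems in the study of biological macromolecules.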
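The class imbalance mentioned above is easy to see in a small sketch: every GT or AG dinucleotide is a candidate under the GT-AG rule, but only a few are real sites. The code below lists candidates and then balances the classes by plain random undersampling. The undersampling step is an assumed stand-in, not the paper's balancing method, whose details the abstract does not give; all names and the toy sequence are hypothetical.

```python
import random

def candidate_donors(seq):
    """Positions of every GT dinucleotide (candidate 5' splice sites)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]

def candidate_acceptors(seq):
    """Positions of every AG dinucleotide (candidate 3' splice sites)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]

def balance_by_undersampling(positives, negatives, seed=0):
    """Assumed balancing step: sample both classes down to the smaller size."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return rng.sample(positives, k), rng.sample(negatives, k)

seq = "CAGGTAAGTCCAGGTTAG"          # toy sequence, invented for illustration
donors = candidate_donors(seq)       # [3, 7, 13]
acceptors = candidate_acceptors(seq) # [1, 6, 11, 16]

# Suppose only position 3 is a real donor; the rest are decoys.
real, decoys = [3], [7, 13]
pos, neg = balance_by_undersampling(real, decoys)
```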

3.
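Recognition of human splice sites using a support vector machine with a low-dimensional input space   Cited by: 1 (self-citations: 0; citations by others: 1)
The recognition of eukaryotic splice sites, as [...] of gene [text missing in source] a vector built from a [...] matrix is used to represent each sequence, and a support vector machine searches for the optimal hyperplane in a six-dimensional vector space to separate true splice sites from false ones. The results show that this algorithm predicts human splice sites well. Compared with some other algorithms, it has advantages such as fewer parameters and high accuracy.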

4.
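Recent gene-finding software is markedly better than earlier programs, but sensitivity and specificity at the exon level are still not fully satisfactory. This is because existing software does not yet recognize biological signal sites, such as splice sites and translation start sites, effectively enough. Improving the recognition of each of these signal sites individually would improve overall gene-finding performance. A hidden semi-Markov model can describe the structure of the 3′ splice site (acceptor) well. An acceptor recognition algorithm developed on this basis was tested on the Burset/Guigo dataset and achieved a better recognition rate than existing algorithms. The model's success also gives some insight into the general features of the branch site and the pyrimidine-rich region upstream of the splice junction, deepening the understanding of acceptor structure and the splicing process.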

6.
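Application of a hidden semi-Markov model to 3′ splice site recognition (in English)   Cited by: 1 (self-citations: 0; citations by others: 1)
Recent gene-finding software is markedly better than earlier programs, but sensitivity and specificity at the exon level are still not fully satisfactory. This is because existing software does not yet recognize biological signal sites, such as splice sites and translation start sites, effectively enough. Improving the recognition of each of these signal sites individually would improve overall gene-finding performance. A hidden semi-Markov model can describe the structure of the 3′ splice site (acceptor) well. An acceptor recognition algorithm developed on this basis was tested on the Burset/Guigo dataset and achieved a better recognition rate than existing algorithms. The model's success also gives some insight into the general features of the branch site and the pyrimidine-rich region upstream of the splice junction, deepening the understanding of acceptor structure and the splicing process.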

7.
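Objective: To computationally identify novel non-canonical splice sites in Drosophila in order to explore unknown splicing mechanisms. Methods: Gene structures were reconstructed from alignments of Drosophila melanogaster expressed sequence tags (ESTs) to the genome sequence, and non-canonical splice sites were identified from them. Sequences upstream and downstream of the non-canonical splice sites were analyzed with WebLogo to look for splicing-related specific elements. Results: A total of 265 non-canonical splice sites were obtained, located in 195 protein-coding genes. Conclusion: Bioinformatic methods identified more than a hundred non-canonical splice sites in Drosophila, laying a foundation for studying non-canonical splicing mechanisms.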
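Once intron coordinates have been inferred from EST-to-genome alignments, flagging non-canonical sites comes down to checking the dinucleotides at the intron ends. The sketch below assumes 0-based, end-exclusive coordinates and treats only GT-AG as canonical by default; the study's own definition of "non-canonical" is not stated in the abstract, so the `canonical` parameter and the toy genome strings are hypothetical.

```python
def intron_boundaries(genome, start, end):
    """Return (first two bases, last two bases) of the intron genome[start:end]."""
    return genome[start:start + 2], genome[end - 2:end]

def is_canonical(genome, start, end, canonical=(("GT", "AG"),)):
    """True if the intron's boundary dinucleotides are in the canonical set."""
    return intron_boundaries(genome, start, end) in canonical

# Toy sequences, invented for illustration: the intron spans positions 3..10.
gt_ag = "AAAGTCCCCAGTTT"
at_ac = "AAAATCCCCACTTT"

print(is_canonical(gt_ag, 3, 11))   # GT...AG intron
print(is_canonical(at_ac, 3, 11))   # AT...AC intron, flagged as non-canonical
```

Passing a wider `canonical` tuple, for example one that also lists `("GC", "AG")`, changes which introns are reported without touching the rest of the code.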

8.
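Recognition of functional sites in DNA sequences is currently an active research topic in bioinformatics, and splice site recognition is one such problem. To make full use of the characteristic patterns of splice sites and so recognize them better, a splice site recognition system based on an improved Winnow algorithm was built. Compared with other methods, the improved Winnow algorithm is more robust, suits high-dimensional feature spaces, can fuse multiple kinds of pattern information, and performs well even when many irrelevant features are present. In addition, the feature set was pruned during training to remove features that contribute almost nothing to recognition; the effect on the results is negligible, and the algorithm becomes more efficient. Experiments confirm that the improved Winnow algorithm recognizes splice sites well, with several performance measures matching or exceeding those of currently popular international splice site recognition software.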
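For orientation, here is a minimal sketch of the classic Winnow update rule (multiplicative promotion and demotion of the weights of active features), applied to one-hot (position, base) features. It is the textbook algorithm, not the improved variant or the pruning step from the abstract, whose details are not given; the encoding, the parameter values, and the toy windows are assumptions.

```python
BASES = "ACGT"

def one_hot_active(window):
    """Indices of the active (position, base) features of a window."""
    return {pos * 4 + BASES.index(b) for pos, b in enumerate(window)}

def winnow_train(samples, n_features, alpha=2.0, epochs=10):
    """Classic Winnow: multiply or divide active weights by alpha on mistakes."""
    weights = [1.0] * n_features
    threshold = float(n_features)
    for _ in range(epochs):
        for window, label in samples:
            active = one_hot_active(window)
            predicted = sum(weights[i] for i in active) >= threshold
            if predicted and not label:      # false positive: demote
                for i in active:
                    weights[i] /= alpha
            elif label and not predicted:    # false negative: promote
                for i in active:
                    weights[i] *= alpha
    return weights, threshold

def winnow_predict(window, weights, threshold):
    return sum(weights[i] for i in one_hot_active(window)) >= threshold

# Toy 4-base windows, invented for illustration (True = real site).
samples = [
    ("GTAA", True), ("GTCC", False),
    ("GTGA", True), ("GTTC", False),
    ("GTAG", True), ("GTCT", False),
]
weights, threshold = winnow_train(samples, n_features=16)
predictions = [winnow_predict(w, weights, threshold) for w, _ in samples]
```

Because updates touch only the features that are active in a misclassified example, irrelevant features keep small weights, which is the property that makes Winnow-style learners attractive in high-dimensional feature spaces.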

9.
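High-accuracy, machine-learning-based splice site recognition is key to eukaryotic genome annotation. This paper uses the chi-square test to determine the sequence window length, builds a chi-square statistic difference table to extract positional features, and combines these with dinucleotide frequencies to represent sequences. To handle the severe imbalance between positive and negative splice site samples, ten support vector machine classifiers, each trained on balanced positive and negative samples, are built and combined by weighted voting, which effectively addresses the imbalanced classification problem. Independent tests on the HS3D dataset show prediction accuracies of 93.39% for donor sites and 90.46% for acceptor sites, clearly higher than the reference methods. Positional features based on the chi-square statistic difference table represent DNA sequences effectively and are promising for recognizing signal sites in molecular sequences.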
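Two ingredients of this pipeline can be sketched without any machine-learning library: the dinucleotide count features and the ensemble bookkeeping. The sketch assumes the negatives are split into k disjoint chunks, each paired with all positives, and that the final decision is a sign of the weighted sum of ±1 votes. How the paper actually samples the negatives and sets the weights is not stated in the abstract, so these choices and all names are hypothetical. The classifiers themselves are omitted.

```python
from itertools import product

# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

def dinucleotide_counts(seq):
    """Count overlapping dinucleotides; returns a 16-element feature vector."""
    counts = dict.fromkeys(DINUCS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] for d in DINUCS]

def balanced_subsets(positives, negatives, k=10):
    """Assumed scheme: k disjoint negative chunks, each paired with all positives."""
    return [(positives, negatives[i::k]) for i in range(k)]

def weighted_vote(votes, weights):
    """votes are +1/-1 per classifier; returns the weighted majority label."""
    total = sum(v * w for v, w in zip(votes, weights))
    return 1 if total >= 0 else -1

vector = dinucleotide_counts("GTAAGT")
subsets = balanced_subsets(list(range(3)), list(range(30)), k=10)
decision = weighted_vote([1, -1, -1], [0.9, 0.3, 0.3])
```

Each `(positives, negatives)` pair in `subsets` would train one classifier, and a per-classifier validation accuracy is one plausible choice for the voting weights.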

10.
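Recognition of translation initiation sites in eukaryotic DNA based on support vector machines   Cited by: 2 (self-citations: 1; citations by others: 1)
Recognition of translation initiation sites (TIS) is one of the key steps in eukaryotic gene prediction and has received close attention from researchers in recent years. Several discriminant methods for identifying TIS have been developed from the statistical properties of sequences near the TIS, but recognition accuracy still needs improvement. To address shortcomings of the conventional support vector machine (SVM) approach, an SVM based on a data optimization method is proposed, which uses other statistical models to optimize the training dataset and thereby improves the classifier's accuracy. Experimental results show that the SVM classifier based on data optimization identifies translation initiation sites better than other discriminant methods.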

11.
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5' untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to 'pure' UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by 'coding' noise, thus enhancing significantly the prediction of 5' UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3' ends of non-coding exons and 5' non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2-3-fold better compared with NetGene2 and GenScan in 5' UTRs. We also tested the 5' UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR.

12.
In order to explore the mechanism for the genomic replication of classical swine fever virus (CSFV), so as to make a basis for investigating its pathogenicity, an introduction of the information theory is presented in connection with the statistical mechanics, whence small-sample statistics appears naturally as a consequence of the Bayesian approach. Furthermore, a selection rule for identifying the pattern of a recognition site for an RNA-binding protein is proposed by means of the maximum entropy principle. Based on those, the information contents of 3'-untranslated regions (3'UTRs) of genomes of 20 CSFV strains and 5'-untranslated regions (5'UTRs) of genomes of 58 CSFV strains are analyzed with a computational algorithm in a reduction mode, and the 3'UTR sites of 20 strains and 5'UTR sites of 58 strains containing important motifs are extracted from the unaligned RNA sequences of unequal lengths. These sites, which have the patterns of sequence and structure similar to the putative cis elements related to the regulation of genomic replication, would be identified as the potential recognition sites in 3'UTRs and 5'UTRs for CSFV replicase responsible for classical swine fever virus genomic replication, and to some extent, this identification is supported by experimental evidence. Finally, information analysis allows a presumption to be made about the CSFV RNA replication initiation mechanism.

13.
14.
Xiao Ming, Zhan Zhu Zhi, Liu Jueping, Yu Zhang Chu. Molecular Biology, 2002, 36(1): 34-43
In order to explore the mechanism for the genomic replication of classical swine fever virus (CSFV), so as to make a basis for investigating its pathogenicity, an introduction of the information theory is presented in connection with the statistical mechanics, whence small-sample statistics appears naturally as a consequence of the Bayesian approach. Furthermore, a selection rule for identifying the pattern of a recognition site for an RNA-binding protein is proposed by means of the maximum entropy principle. Based on those, the information contents of 3'-untranslated regions (3'UTRs) of genomes of 20 CSFV strains and 5'-untranslated regions (5'UTRs) of genomes of 58 CSFV strains are analyzed with a computational algorithm in a reduction mode, and the 3'UTR sites of 20 strains and 5'UTR sites of 58 strains containing important motifs are extracted from the unaligned RNA sequences of unequal lengths. These sites, which have the patterns of sequence and structure similar to the putative cis elements related to the regulation of genomic replication, would be identified as the potential recognition sites in 3'UTRs and 5'UTRs for CSFV replicase responsible for classical swine fever virus genomic replication, and to some extent, this identification is supported by experimental evidence. Finally, information analysis allows a presumption to be made about the CSFV RNA replication initiation mechanism.

15.
16.
Adenosine to inosine (A-to-I) RNA editing is the most abundant editing event in animals. It converts adenosine to inosine in double-stranded RNA regions through the action of the adenosine deaminase acting on RNA (ADAR) proteins. Editing of pre-mRNA coding regions can alter the protein codon and increase functional diversity. However, most of the A-to-I editing sites occur in the non-coding regions of pre-mRNA or mRNA and non-coding RNAs. Untranslated regions (UTRs) and introns are located in pre-mRNA non-coding regions, thus A-to-I editing can influence gene expression by nuclear retention, degradation, alternative splicing, and translation regulation. Non-coding RNAs such as microRNA (miRNA), small interfering RNA (siRNA) and long non-coding RNA (lncRNA) are related to pre-mRNA splicing, translation, and gene regulation. A-to-I editing could therefore affect the stability, biogenesis, and target recognition of non-coding RNAs. Finally, it may influence the function of non-coding RNAs, resulting in regulation of gene expression. This review focuses on the function of ADAR-mediated RNA editing on mRNA non-coding regions (UTRs and introns) and non-coding RNAs (miRNA, siRNA, and lncRNA).

17.
18.
The phenomenon of nonsense-associated altered splicing raises the possibility that the recognition of in-frame nonsense codons is used generally for exon identification during pre-mRNA splicing. However, nonsense codon frequencies in pseudo exons and in regions flanking 5' splice sites are no greater than that expected by chance, arguing against the widespread use of this strategy as a means of rejecting potential splice sites.
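The comparison described above rests on a simple measurement: the fraction of codons in a given reading frame that are stop codons, set against a chance expectation. The sketch below assumes a uniform base composition, under which 3 of the 64 codons are stops; the study's own null model may well account for base composition, so `EXPECTED_UNIFORM`, the function name, and the toy sequence are illustrative assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}

# Chance expectation under an assumed uniform base composition: 3 of 64 codons.
EXPECTED_UNIFORM = 3 / 64

def stop_codon_frequency(seq, frame=0):
    """Fraction of complete codons in the given frame that are stop codons."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(c in STOPS for c in codons) / len(codons)

seq = "ATGTAAGGCTGA"  # toy sequence, invented for illustration
per_frame = [stop_codon_frequency(seq, f) for f in range(3)]
```

On real data the per-frame frequencies would be averaged over many pseudo exons or flanking regions before being compared with the expected value.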

19.
20.