Similar literature
 19 similar articles found
1.
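SVM-based recognition of splice sites in human 5' untranslated regions   Total citations: 5 (self-citations: 0; citations by others: 5)
Recognizing splice sites in non-coding regions is one of the most challenging problems in gene identification, particularly for splice sites in 5' untranslated regions (UTRs). Unlike ordinary splice sites, those in 5' UTRs are not flanked by a transition from coding to non-coding sequence, so conventional splice site recognition algorithms perform poorly in untranslated regions. This article applies a support vector machine (SVM) to recognize splice sites in 5' UTRs. To improve recognition accuracy, the kernel parameters are chosen with a method based on a matrix similarity measure, which determines suitable kernel parameters simply and quickly and thereby improves the recognition performance of the kernel. Experiments show that the SVM with selected parameters recognizes 5' UTR splice sites well.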
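The abstract does not say which matrix similarity measure is used. As a minimal sketch, assuming it resembles kernel-target alignment (the normalised Frobenius inner product between the kernel matrix and the ideal label matrix), an RBF width could be chosen as below. The function names, the one-hot RBF kernel, and the toy sequences are all illustrative assumptions, not the authors' implementation.

```python
import math

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def rbf_kernel_matrix(seqs, gamma):
    """RBF kernel over one-hot encoded sequences.

    For one-hot vectors the squared Euclidean distance equals twice the
    Hamming distance, so no explicit encoding is needed.
    """
    return [[math.exp(-gamma * 2 * hamming(a, b)) for b in seqs] for a in seqs]

def kernel_target_alignment(K, labels):
    """Similarity between K and the ideal matrix y y^T (labels in {-1, +1})."""
    n = len(labels)
    inner = sum(K[i][j] * labels[i] * labels[j]
                for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    return inner / (norm_k * n)  # the Frobenius norm of y y^T is n

def select_gamma(seqs, labels, candidates):
    """Pick the candidate gamma whose kernel matrix aligns best with the labels."""
    return max(candidates,
               key=lambda g: kernel_target_alignment(rbf_kernel_matrix(seqs, g), labels))

# Hypothetical toy data: two "true" windows and two "false" windows.
toy_seqs = ["AAGGT", "AAGGT", "CCTTA", "CCTTA"]
toy_labels = [1, 1, -1, -1]
best_gamma = select_gamma(toy_seqs, toy_labels, [0.01, 0.1, 1.0])
```

Scoring each candidate needs only one kernel matrix and no SVM training, which is one plausible reading of the "simple and fast" selection the abstract describes.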

2.
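A Bayesian network modeling approach is used to predict splice sites in eukaryotic DNA sequences. Separate models were built for donor and acceptor sites, and the model topology and the choice of upstream and downstream nodes were optimized according to the biological characteristics of each site type. After the model parameters were obtained with the maximum-likelihood learning algorithm for Bayesian networks, splice sites in the test data were predicted using 10-fold cross-validation. The average prediction accuracy was 92.5% for acceptor sites, 94.0% for pseudo-acceptor sites, 92.3% for donor sites, and 93.5% for pseudo-donor sites, which is better overall than prediction methods based on independent and conditional probability matrices or on hidden Markov models. The results indicate that modeling splice sites with Bayesian networks is an effective way to predict them.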

3.
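The maximum information principle is applied to the analysis of conserved sites in nucleic acid sequences. Using this principle, an expression for the binding energy of specific nucleic acid-protein binding is derived, and the range of positions on the nucleic acid sequence that interact with the protein is estimated. To test whether the theory reflects what actually happens when nucleic acids and proteins bind, it is applied to the recognition of intron splice sites in genes. The recognition achieves high sensitivity and specificity, which indicates that deriving the binding energy expression from the maximum information principle, and estimating the range of participating positions on the nucleic acid sequence, is reasonably successful. The results contribute both to the understanding of nucleic acid-protein interactions and to the computational recognition of the various nucleic acid sequences that interact with proteins.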
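The binding-energy expression itself is not given in the abstract. As a loosely related illustration only, the sketch below computes the per-position information content (relative entropy against a background base composition) of a set of aligned sites, a standard quantity for judging which positions around a site are conserved. The uniform background, the absence of pseudocounts, and the toy sites are assumptions made for this example.

```python
import math
from collections import Counter

def information_content(aligned_sites, background=None):
    """Per-position relative entropy, in bits, of aligned sites vs. background.

    aligned_sites: equal-length strings over A, C, G, T.
    background: base -> probability; uniform if omitted (an assumption).
    """
    background = background or {base: 0.25 for base in "ACGT"}
    length = len(aligned_sites[0])
    profile = []
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        total = sum(counts.values())
        info = sum((c / total) * math.log2((c / total) / background[base])
                   for base, c in counts.items())
        profile.append(info)
    return profile

def conserved_range(profile, threshold):
    """Indices of positions whose information content exceeds the threshold."""
    return [i for i, info in enumerate(profile) if info > threshold]

# Hypothetical aligned donor-site fragments.
toy_profile = information_content(["GT", "GT", "GT", "GA"])
```

Positions with information content near zero look like background sequence, which gives one simple way to bound the region that carries signal.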

4.
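Predicting complete gene structure is an important basic problem in current life science research, and a key step is the accurate identification of splice sites and the various alternative splicing events. Identifying splice sites and alternative splicing events from transcriptome sequencing (RNA-seq) data is a strategy that has emerged in recent years with next-generation sequencing technology. In this work, based on RNA-seq data from Drosophila melanogaster testis, TopHat was used to identify 39,718 Drosophila splice sites, of which 10,584 were novel. In addition, computational algorithms were developed for each type of alternative splicing, based on the different combinations of splice sites, and identified 8,477 alternative splicing events (3,922 of them newly identified) of four types: alternative donor sites, alternative acceptor sites, intron retention, and exon skipping. RT-PCR experiments validated newly identified alternative splicing events in two Drosophila genes and revealed entirely new splice isoforms. This further shows that RNA-seq data can be used effectively to identify splice sites and alternative splicing events, providing new ideas and tools for elucidating splicing mechanisms and the biological functions of alternative splicing.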
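The abstract does not describe the detection algorithms in detail. A minimal sketch of the general idea of combining splice sites, assuming junctions are available as (chromosome, strand, donor position, acceptor position) tuples, is to group junctions that share one end: one donor paired with several acceptors suggests candidate alternative acceptor sites, and the reverse suggests candidate alternative donor sites. The tuple layout and function name are hypothetical.

```python
from collections import defaultdict

def find_alternative_sites(junctions):
    """Group junctions by shared donor or shared acceptor.

    junctions: iterable of (chrom, strand, donor_pos, acceptor_pos).
    Returns (alt_donors, alt_acceptors):
      alt_donors    maps (chrom, strand, acceptor) -> sorted donor positions
      alt_acceptors maps (chrom, strand, donor)    -> sorted acceptor positions
    Only sites paired with more than one partner are reported.
    """
    by_donor = defaultdict(set)
    by_acceptor = defaultdict(set)
    for chrom, strand, donor, acceptor in junctions:
        by_donor[(chrom, strand, donor)].add(acceptor)
        by_acceptor[(chrom, strand, acceptor)].add(donor)
    alt_acceptors = {k: sorted(v) for k, v in by_donor.items() if len(v) > 1}
    alt_donors = {k: sorted(v) for k, v in by_acceptor.items() if len(v) > 1}
    return alt_donors, alt_acceptors

# Hypothetical junctions on one chromosome arm.
toy_junctions = [
    ("2L", "+", 100, 200),
    ("2L", "+", 100, 260),
    ("2L", "+", 300, 400),
    ("2L", "+", 320, 400),
]
toy_alt_donors, toy_alt_acceptors = find_alternative_sites(toy_junctions)
```

These are candidates only: a donor joined to two acceptors can also indicate exon skipping, so separating the four event types would need exon coordinates or read coverage in addition to the junction list.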

5.
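Splice site recognition based on support vector machines (SVM)   Total citations: 14 (self-citations: 1; citations by others: 13)
As an important step in gene identification, splice site recognition has long attracted the attention of researchers. Given the sequence conservation around splice sites, several methods based on statistical properties have been applied to splice site recognition, but their performance still leaves room for improvement. The support vector machine (SVM), a learning machine based on statistical learning theory, has developed considerably in recent years and has been applied to many pattern recognition problems. This article applies it to splice site recognition and, because false splice sites far outnumber true ones among sequence samples that satisfy the GT-AG rule, proposes an SVM-based "balanced minimum-taking" method (平衡取小法) to obtain better recognition. Experimental results show that the SVM extracts the statistical features of the conserved sequences near the sites more effectively, generalizes better to the test set, and is simpler to use. The result offers a new method for splice site recognition and new leads for structure and site recognition problems in the study of biological macromolecules.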
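To illustrate why false sites dominate under the GT-AG rule, the sketch below lists every GT and AG dinucleotide in a sequence as a candidate donor or acceptor. Any real gene contains only a handful of true sites among such candidates. The function name and the toy sequence are illustrative, and the sketch does not attempt to reproduce the balancing method proposed in the article.

```python
def gt_ag_candidates(seq):
    """Return 0-based start positions of all GT (donor) and AG (acceptor) candidates."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors

# Hypothetical short sequence; every hit is a candidate, not a confirmed site.
toy_donors, toy_acceptors = gt_ag_candidates("CAGGTAAGTAG")
```

Each candidate position would then be turned into a fixed-length window around the dinucleotide and labelled true or false before training a classifier.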

6.
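Recognizing functional sites in DNA sequences is a current research focus in bioinformatics, and splice site recognition is one such problem. To make full use of the characteristic patterns of splice sites and recognize them better, a splice site recognition system based on an improved Winnow algorithm was built. Compared with other methods, the improved Winnow algorithm is more robust, suits high-dimensional feature spaces, can combine several kinds of pattern information, and performs well even when many irrelevant features are present. During training the feature set was also pruned, removing features that contribute almost nothing to recognition; this has a negligible effect on the results and improves the efficiency of the algorithm. Experiments show that the improved Winnow algorithm recognizes splice sites well, with several performance metrics matching or exceeding those of splice site recognition software in wide international use.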

7.
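Alternative splicing arises from the regulatory process by which multi-exon genes generate multiple transcripts. With advances in high-throughput sequencing, RNA-seq in particular, spliced sequences and splice sites can be predicted by mining large volumes of sequencing data. Alternative splicing has broadened knowledge of gene structure and protein isoforms. However, existing short-read alignment software is affected by random alignments and produces many false-positive splice sites, which interfere with downstream analysis. This study finds that the structural features of the sequences surrounding alternative splice sites can be extracted by a deep learning model, and uses a deep convolutional neural network to recognize splice sites. The model offers a high recognition rate, fast computation, strong generalization, and high robustness.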

8.
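High-accuracy splice site recognition based on machine learning is key to eukaryotic genome annotation. This paper uses the chi-square test to determine the sequence window length, builds a table of chi-square statistic differences to extract positional features, and combines these with dinucleotide frequencies to represent sequences. To handle the severe imbalance between positive and negative splice site samples, ten SVM classifiers are built, each on a balanced set of positive and negative samples, and their outputs are combined by weighted voting, which effectively addresses the imbalanced classification problem. Independent tests on the HS3D dataset show prediction accuracies of 93.39% for donor sites and 90.46% for acceptor sites, clearly higher than the reference methods. Positional features based on the chi-square statistic difference table represent DNA sequences effectively and show promise for recognizing signal sites in molecular sequences.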
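Two of the ingredients named here can be sketched without any machine learning library: dinucleotide frequency features, and a balanced ensemble with weighted voting. The abstract does not say how the ten balanced training sets were drawn or how the votes were weighted, so the striding split and the plain weighted sum below are assumptions, and all function names are hypothetical.

```python
from itertools import product

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]

def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in seq."""
    seq = seq.upper()
    counts = {d: 0 for d in DINUCLEOTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other symbols
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DINUCLEOTIDES]

def balanced_subsets(positives, negatives, n_subsets=10):
    """Pair all positives with n disjoint slices of the negatives.

    Each slice is roughly balanced only when the negatives outnumber
    the positives by about a factor of n_subsets.
    """
    return [(positives, negatives[i::n_subsets]) for i in range(n_subsets)]

def weighted_vote(votes, weights):
    """Combine per-classifier votes in {-1, +1} using per-classifier weights."""
    score = sum(w * v for v, w in zip(votes, weights))
    return 1 if score >= 0 else -1
```

One classifier would be trained on each subset, and its weight could come from its accuracy on held-out data, which is one common choice rather than something the abstract specifies.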

9.
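Alternative splicing of mRNA is widespread in eukaryotic genes. It gives rise to multiple transcripts from a single gene, is regarded as the main mechanism by which higher organisms increase protein diversity, and has been found to be closely associated with many human diseases. Identifying the alternative splice sites of these transcripts, new exons and exon combinations, and even obtaining complete clones of the splice variants, is essential for in-depth study of gene function. Several methods for exploring alternative splicing at the mRNA level are briefly introduced.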

10.
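Prediction of alternative splice sites for cassette exons and intron retention in the human genome   Total citations: 2 (self-citations: 0; citations by others: 2)
Alternative splicing of messenger RNA is one of the basic features that distinguish eukaryotes from prokaryotes. Alternative splicing of pre-mRNA greatly enriches protein diversity in higher eukaryotes and is closely tied to tissue specificity. This article compiles statistics on basic features of human cassette exons and retained introns. Using features such as the conservation of single bases, dinucleotides, and trinucleotides near splice sites, a quadratic discriminant method based on a diversity measure is used to predict the donor and acceptor alternative splice sites of cassette exons and retained introns. Cross-validation shows recognition accuracies above 93% and 84% for cassette-exon donor and acceptor sites, and above 89% and 81% for intron-retention donor and acceptor sites, respectively.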

11.
Jin HY, Luo LF, Zhang LR. Gene, 2008, 424(1-2): 115-120
A crucial part of gene structure prediction is the accurate identification of splice sites, both constitutive and alternative. Here, we use the maximum information principle (MIP) to analyze the conserved segments around splice sites. According to the MIP, a reaction free energy (RFE) expression is deduced, which can be employed to estimate the free energy change during a splicing reaction involving a donor or acceptor site. The expression contains not only the background probability factors, but also all kinds of dependencies among both adjacent and non-adjacent bases. We apply the RFE expression to recognize splice sites and their flanking competitors in human genes; the results show high sensitivity and specificity, so the RFE expression accords well with the splicing reaction process. Moreover, the RFE expression is better than previous methods for predicting competitors of splice sites, and it outperforms the reaction free energy subtraction (RFES), which implies that RFE competition between a given splice site and its flanking competitor may not be the only primary factor in alternative splice site selection. The work is helpful not only for understanding the splicing reaction through its relation to the MIP, but also for research on the computational recognition of splice sites and alternative splice events.

12.
Xia H, Bi J, Li Y. Nucleic Acids Research, 2006, 34(21): 6305-6313
Alternative splicing plays an important role in regulating gene expression. Currently, the most efficient methods use expressed sequence tags or microarray analysis for large-scale detection of alternative splicing. However, it is difficult to detect all alternative splice events with them because of their inherent limitations. Previous computational methods for alternative splicing prediction could only predict particular kinds of alternative splice events. Thus, it would be highly desirable to predict alternative 5'/3' splice sites with various splicing levels using genomic sequences alone. Here, we introduce the competition mechanism of splice site selection into alternative splice site prediction. This approach allows us to predict not only rarely used but also frequently used alternative splice sites. On a dataset extracted from the AltSplice database, our method correctly classified approximately 70% of the splice sites into alternative and constitutive, as well as approximately 80% of the locations of real competitors for alternative splice sites. It outperforms a method which only considers features extracted from the splice sites themselves. Furthermore, this approach can also predict the changes in activation level arising from mutations in flanking cryptic splice sites of a given splice site. Our approach might be useful for studying alternative splicing in both computational and molecular biology.

13.

Background  

Accurate selection of splice sites during the splicing of precursors to messenger RNA requires both relatively well-characterized signals at the splice sites and auxiliary signals in the adjacent exons and introns. We previously described a feature generation algorithm (FGA) that is capable of achieving high classification accuracy on human 3' splice sites. In this paper, we extend the splice-site prediction to 5' splice sites and explore the generated features for biologically meaningful splicing signals.

14.
Despite the important role of alternative splicing in various aspects of biological processes, our ability to regulate this process at will remains a challenge. In this report, we asked whether a theophylline-responsive riboswitch could be adapted to manipulate alternative splicing. We constructed a pre-mRNA containing a single upstream 5' splice site and two 3' splice sites, of which the proximal 3' splice site is embedded in a theophylline-responsive riboswitch. We show that this pre-mRNA spliced with preferential utilization of the proximal 3' splice site in vitro. However, addition of theophylline to the splicing reaction promoted splicing at the distal 3' splice site, thereby changing the ratio of distal-to-proximal 3' splice site usage by more than twofold. Our data suggest that theophylline influenced 3' splice site choice without affecting the kinetics of the splicing reaction. We conclude that an in vitro selected riboswitch can be adapted to control alternative splicing, which may find many applications in basic, biotechnological, and biomedical research.

15.
Liu X, Mayeda A, Tao M, Zheng ZM. Journal of Virology, 2003, 77(3): 2105-2115
Bovine papillomavirus type 1 (BPV-1) late pre-mRNAs are spliced in keratinocytes in a differentiation-specific manner: the late leader 5' splice site alternatively splices to a proximal 3' splice site (at nucleotide 3225) to express L2 or to a distal 3' splice site (at nucleotide 3605) to express L1. Two exonic splicing enhancers, each containing two ASF/SF2 (alternative splicing factor/splicing factor 2) binding sites, are located between the two 3' splice sites and have been identified as regulating alternative 3' splice site usage. The present report demonstrates for the first time that ASF/SF2 is required under physiological conditions for the expression of BPV-1 late RNAs and for selection of the proximal 3' splice site for BPV-1 RNA splicing in DT40-ASF cells, a genetically engineered chicken B-cell line that expresses only human ASF/SF2 controlled by a tetracycline-repressible promoter. Depletion of ASF/SF2 from the cells by tetracycline greatly decreased viral RNA expression and RNA splicing at the proximal 3' splice site while increasing use of the distal 3' splice site in the remaining viral RNAs. Activation of cells lacking ASF/SF2 through anti-immunoglobulin M-B-cell receptor cross-linking rescued viral RNA expression and splicing at the proximal 3' splice site and enhanced Akt phosphorylation and expression of the phosphorylated serine/arginine-rich (SR) proteins SRp30s (especially SC35) and SRp40. Treatment with wortmannin, a specific phosphatidylinositol 3-kinase/Akt kinase inhibitor, completely blocked the activation-induced activities. ASF/SF2 thus plays an important role in viral RNA expression and splicing at the proximal 3' splice site, but activation-rescued viral RNA expression and splicing in ASF/SF2-depleted cells is mediated through the phosphatidylinositol 3-kinase/Akt pathway and is associated with the enhanced expression of other SR proteins.

16.
TIA-1 has recently been shown to activate splicing of specific pre-mRNAs transcribed from transiently transfected minigenes, and of some 5' splice sites in vitro, but has not been shown to activate splicing of any endogenous pre-mRNA. We show here that overexpression of TIA-1 or the related protein TIAR has little effect on splicing of several endogenous pre-mRNAs containing alternative exons, but markedly activates splicing of some normally rarely used alternative exons on the TIA-1 and TIAR pre-mRNAs. These exons have weak 5' splice sites followed by U-rich stretches. When the U-rich stretch following the 5' splice site of a TIA-1 alternative exon was deleted, TIAR overexpression induced use of a cryptic 5' splice site also followed by a U-rich stretch in place of the original splice site. Using in vitro splicing assays, we have shown that TIA-1 is directly involved in activating the 5' splice sites of the TIAR alternative exons. Activation requires a downstream U-rich stretch of at least 10 residues. Our results confirm that TIA-1 activates 5' splice sites followed by U-rich sequences and show that TIAR exerts a similar activity. They suggest that both proteins may autoregulate their expression at the level of splicing.

17.
During an adenovirus infection the expression of mRNA from late region L1 is temporally regulated at the level of alternative 3' splice site selection to produce two major mRNAs encoding the 52,55K and IIIa polypeptides. The proximal 3' splice site (52,55K) is used at all times of the infectious cycle whereas the distal site (IIIa) is used exclusively late after infection. We show that a single A branch nucleotide located at position -23 is used in 52,55K splicing and that two A's located at positions -21 and -22 are used in IIIa splicing. Both 3' splice sites were active in vitro in nuclear extracts prepared from uninfected HeLa cells. However, the efficiency of IIIa splicing was only approximately 10% of that of 52,55K splicing. This difference in splice site activity correlated with a reduced affinity of the IIIa 3' splice site, relative to the 52,55K site, for polypyrimidine tract binding proteins. Reversing the order of 3' splice sites on a tandem pre-mRNA resulted in almost exclusive IIIa splicing, indicating that the order of 3' splice site presentation is important for the outcome of alternative L1 splicing. Based on our results we suggest a cis competition model where the two 3' splice sites compete for a common RNA splicing factor(s). This may represent an important mechanism by which L1 alternative splicing is regulated.

18.
The role of U2AF35 and U2AF65 in enhancer-dependent splicing.   Total citations: 6 (self-citations: 1; citations by others: 5)
Splicing enhancers are RNA sequence elements that promote the splicing of nearby introns. The mechanism by which these elements act is still unclear. Some experiments support a model in which serine-arginine (SR)-rich proteins function as splicing activators by binding to enhancers and recruiting the splicing factor U2AF to an adjacent weak 3' splice site. In this model, recruitment requires interactions between the SR proteins and the 35-kDa subunit of U2AF (U2AF35). However, more recent experiments have not supported the U2AF recruitment model. Here we provide additional evidence for the recruitment model. First, we confirm that base substitutions that convert weak 3' splice sites to a consensus sequence, and therefore increase U2AF binding, relieve the requirement for a splicing activator. Second, we confirm that splicing activators are required for the formation of early spliceosomal complexes on substrates containing weak 3' splice sites. Most importantly, we find that splicing activators promote the binding of both U2AF65 and U2AF35 to weak 3' splice sites under splicing conditions. Finally, we show that U2AF35 is required for maximum levels of activator-dependent splicing. We conclude that a critical function of splicing activators is to recruit U2AF to the weak 3' splice sites of enhancer-dependent introns, and that efficient enhancer-dependent splicing requires U2AF35.

19.
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5' untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to 'pure' UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by 'coding' noise, thus enhancing significantly the prediction of 5' UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3' ends of non-coding exons and 5' non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2-3-fold better compared with NetGene2 and GenScan in 5' UTRs. We also tested the 5' UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR.
