Similar Articles
17 similar articles found.
1.
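Prediction of alternative and constitutive splice sites in the human genome   Total citations: 2, self-citations: 0, citations by others: 2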
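Based on the conserved nucleotide features of splice sites and on the base composition and correlation properties of neighboring positions, combined with the distance between a pair of alternative splice sites and the GC and TC content of the 30 bases upstream of the acceptor site, the donor and acceptor splice sites of alternative and constitutive introns in the human genome were predicted with the increment of diversity combined with quadratic discriminant analysis (IDQD). For alternative donor and acceptor sites, with the threshold ξ set to -2, the overall prediction accuracies were 87.9% and 89.9%, respectively; for constitutive donor and acceptor sites, with ξ set to -1, the overall prediction accuracies were 92.8% and 94.3%, respectively.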
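IDQD builds on the increment of diversity (ID). The minimal sketch below assumes the definition commonly used in this line of work: a diversity measure D = N ln N - Σ nᵢ ln nᵢ over k-mer counts, and ID(X, Y) = D(X + Y) - D(X) - D(Y). A window is assigned to whichever reference source gives the smaller ID. The quadratic discriminant step, the distance and GC/TC features, and the threshold ξ are not reproduced. The function names and toy reference sources are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers; k-mers containing letters other than ACGT are skipped."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def diversity(counts):
    """Diversity measure D = N ln N - sum(n_i ln n_i) over nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts.values() if n > 0)


def increment_of_diversity(x_counts, y_counts):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); it is 0 when X and Y have identical proportions."""
    merged = Counter(x_counts)
    merged.update(y_counts)
    return diversity(merged) - diversity(x_counts) - diversity(y_counts)


def classify_by_id(window, references, k=3):
    """Assign the window to the reference source (label -> k-mer counts) with the smallest ID."""
    query = kmer_counts(window, k)
    return min(references, key=lambda label: increment_of_diversity(query, references[label]))


# Toy reference sources; real ones would be pooled from labelled training windows.
references = {
    "donor_like": kmer_counts("GTAAGT" * 10, 3),
    "poly_c": kmer_counts("C" * 60, 3),
}
print(classify_by_id("GTAAGTAAGT", references))  # donor_like
```

In IDQD-style methods, ID values computed for several features are typically passed on to a quadratic discriminant rather than used alone; the direct minimum-ID rule here only illustrates the similarity measure.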

2.
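Alternative splicing is an important mechanism for regulating gene expression, and identifying alternative splice sites is an important task in the post-genomic era. From the latest EBI human alternative splicing database, we selected 5′/3′ alternative splice sites as the positive set and pseudo splice sites near the splice sites as the negative set, and randomly divided all alternative and pseudo splice sites into a training set and a test set. The prediction method is a support vector machine (SVM) approach based on a position weight matrix and the increment of diversity. Using only the training set, with single-base probabilities at each position and triplet frequencies of sequence fragments as information parameters, the position weight matrix and increment-of-diversity algorithms were combined with an SVM to obtain classifiers for alternative donor and acceptor sites, which were then used to predict the alternative donor and acceptor sites in the test set. On the independent test set, the prediction success rates for alternative donor and acceptor sites were 88.74% and 90.86%, respectively, with specificities of 85.62% and 81.19%. The success rate of this method is higher than that of other alternative splice site prediction methods, further improving the theoretical prediction of alternative splice sites.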
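The entry above uses per-position single-base probabilities (a position weight matrix) and triplet frequencies as SVM inputs. The sketch below shows one plausible way to assemble such a feature vector. The two-score-plus-64-counts layout, the pseudocount, and all function names are assumptions. The increment-of-diversity step and the SVM itself are omitted; any SVM implementation could consume the resulting vectors.

```python
import math
from itertools import product

BASES = "ACGT"
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]  # 64 triplets in fixed order


def position_probabilities(aligned_windows, pseudocount=1.0):
    """Single-base probability at each position of equal-length, aligned training windows."""
    matrix = []
    for pos in range(len(aligned_windows[0])):
        counts = {b: pseudocount for b in BASES}
        for window in aligned_windows:
            if window[pos] in counts:
                counts[window[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix


def pwm_log_odds(window, matrix, background=0.25):
    """Sum of log(p / background) over positions; the window must be ACGT-only and aligned."""
    return sum(math.log(matrix[i][b] / background) for i, b in enumerate(window))


def triplet_frequencies(seq):
    """Counts of overlapping triplets, returned as a 64-element list in TRIPLETS order."""
    counts = dict.fromkeys(TRIPLETS, 0)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
    return [counts[t] for t in TRIPLETS]


def feature_vector(window, pos_matrix, neg_matrix):
    """Hypothetical layout: PWM scores under positive and negative models, then 64 triplet counts."""
    return [pwm_log_odds(window, pos_matrix), pwm_log_odds(window, neg_matrix)] + triplet_frequencies(window)


# Toy training windows; real ones would be aligned around true and pseudo splice sites.
pos_matrix = position_probabilities(["GTAAGT", "GTGAGT", "GTAAGA"])
neg_matrix = position_probabilities(["CTTCCA", "GACTTC", "TCCAGA"])
features = feature_vector("GTAAGT", pos_matrix, neg_matrix)
print(len(features))  # 66
```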

4.
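Prediction of human 5'/3' alternative splice sites using support vector machines   Total citations: 1, self-citations: 0, citations by others: 1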

5.
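Prediction of alternative splice sites of cassette exons and retained introns in the human genome   Total citations: 2, self-citations: 0, citations by others: 2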
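Alternative splicing of messenger RNA is one of the fundamental features that distinguish eukaryotes from prokaryotes. Alternative splicing of pre-mRNA greatly increases the protein diversity of higher eukaryotes and is closely associated with tissue specificity. This paper compiles statistics on some basic features of human cassette exons and retained introns. Based on features such as the conservation of single bases, dinucleotides, and trinucleotides near splice sites, the donor and acceptor alternative splice sites of cassette exons and retained introns were predicted with a quadratic discriminant method based on a diversity measure. Cross-validation showed recognition accuracies above 93% and 84% for the donor and acceptor sites of cassette exons, respectively, and above 89% and 81% for the donor and acceptor sites of retained introns.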

6.
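Predicting complete gene structures is an important foundational problem in current life science research, and a key step is the accurate identification of splice sites and of the various alternative splicing events. Identifying splice sites and alternative splicing events from transcriptome sequencing (RNA-seq) data is a strategy that has emerged in recent years with the development of next-generation sequencing. Using RNA-seq data from Drosophila melanogaster testis, this work used TopHat to identify 39,718 Drosophila splice sites, 10,584 of which were novel. In addition, based on different combinations of splice sites, computational algorithms were developed for each type of alternative splicing, identifying 8,477 alternative splicing events (3,922 of them novel) of four types: alternative donor site, alternative acceptor site, intron retention, and exon skipping. RT-PCR validated the newly identified alternative splicing events in two Drosophila genes and revealed novel splice isoforms. These results further show that RNA-seq data can be used effectively to identify splice sites and alternative splicing events, providing new ideas and tools for elucidating splicing mechanisms and the biological functions of alternative splicing.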

7.
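Recognition of acceptor sites in eukaryotic genes is part of splice site recognition and an important step in gene identification, and it has long attracted researchers' attention. Previous studies have shown that acceptor site recognition is related to the branch point; however, no one has yet studied the relationship between the branch point and acceptor site recognition in depth as a problem in its own right. Starting from acceptor site recognition, this study selected acceptor site sequences of different lengths and, using neural networks as the recognition tool, analyzed the role of the branch point in acceptor site recognition in depth. The results show that the feature information of acceptor site sequences is concentrated on the branch-point side, so the branch point plays an important role in acceptor site recognition. These findings provide a basis for sequence feature extraction in acceptor site recognition.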

8.
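High-accuracy splice site recognition based on machine learning is key to eukaryotic genome annotation. In this paper, the sequence window length was determined with a chi-square test, a chi-square statistical difference table was constructed to extract positional features, and sequences were represented by combining these features with dinucleotide frequencies. Because positive and negative splice site samples are highly imbalanced, ten support vector machine classifiers, each trained on balanced positive and negative samples, were built and combined by weighted voting, which effectively addressed the imbalanced classification problem. Independent tests on the HS3D dataset gave prediction accuracies of 93.39% for donor sites and 90.46% for acceptor sites, clearly higher than those of the reference methods. Positional features based on the chi-square statistical difference table represent DNA sequences effectively and show promise for recognizing signal sites in molecular sequences.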
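The abstract does not define the chi-square statistical difference table. The sketch below therefore shows only a related, standard building block: a per-position Pearson chi-square test of base composition between positive and negative windows, one plausible basis for choosing a window length. It also sketches undersampling into ten balanced training sets and a weighted vote. The critical value 7.815 is the chi-square 0.95 quantile for 3 degrees of freedom. The voting weights (for example, validation accuracies) and all function names are assumptions, and classifier training is omitted.

```python
import random

BASES = "ACGT"
CHI2_CRIT_DF3 = 7.815  # chi-square critical value for df = 3 at alpha = 0.05


def positional_chi_square(pos_windows, neg_windows):
    """Pearson chi-square of the 2 x 4 (class x base) contingency table at each aligned position."""
    stats = []
    for i in range(len(pos_windows[0])):
        table = [[sum(1 for w in group if w[i] == b) for b in BASES]
                 for group in (pos_windows, neg_windows)]
        row_totals = [sum(row) for row in table]
        col_totals = [table[0][j] + table[1][j] for j in range(4)]
        n = sum(row_totals)
        chi2 = 0.0
        for r in range(2):
            for c in range(4):
                expected = row_totals[r] * col_totals[c] / n
                if expected > 0:  # bases absent from both classes contribute nothing
                    chi2 += (table[r][c] - expected) ** 2 / expected
        stats.append(chi2)
    return stats


def informative_positions(stats, critical=CHI2_CRIT_DF3):
    """Indices whose chi-square exceeds the critical value; a window could be chosen to span them."""
    return [i for i, s in enumerate(stats) if s > critical]


def balanced_subsets(positives, negatives, n_models=10, seed=0):
    """One training set per model: all positives plus an equally sized random draw of negatives."""
    rng = random.Random(seed)
    return [(list(positives), rng.sample(negatives, len(positives))) for _ in range(n_models)]


def weighted_vote(decisions, weights):
    """Combine per-model decisions in {+1, -1} with nonnegative weights; ties go to -1."""
    score = sum(w * d for d, w in zip(decisions, weights))
    return 1 if score > 0 else -1


stats = positional_chi_square(["GT", "GT", "GA"], ["CT", "AT", "TT"])
print([round(s, 3) for s in stats])  # [6.0, 1.2]
```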

9.
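Traditional splice site recognition methods require long sequences and many parameters. To address this, the paper proposes a variable-length Markov model based on the Kullback-Leibler divergence (Kullback-Leibler divergence variable-length Markov model, KL-VLMM). Building on the variable-length Markov model, it uses the KL divergence instead of the original probability ratio to decide the direction in which a sequence is extended. This effectively improves the recognition of characteristic sequences and reduces the model order from second to first, lowering the space complexity of the algorithm. The recognition performance of this model and of other traditional methods was compared on the human splice site database N269. The experimental results show that KL-VLMM predicts human splice sites better.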
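The abstract does not give the exact extension criterion, so the sketch below assumes a common context-tree rule. A context c is extended leftward to bc when the KL divergence between the next-base distributions P(·|bc) and P(·|c) exceeds a threshold. The pseudocount, threshold, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def next_base_counts(seqs, max_order):
    """Map every context of length 0..max_order to counts of the base that follows it."""
    table = defaultdict(Counter)
    for seq in seqs:
        for i, base in enumerate(seq):
            for k in range(min(i, max_order) + 1):
                table[seq[i - k:i]][base] += 1
    return table


def kl_divergence(p_counts, q_counts, pseudocount=0.5):
    """KL(P || Q) in nats between two pseudocount-smoothed next-base distributions."""
    p_total = sum(p_counts.values()) + pseudocount * len(ALPHABET)
    q_total = sum(q_counts.values()) + pseudocount * len(ALPHABET)
    kl = 0.0
    for b in ALPHABET:
        p = (p_counts[b] + pseudocount) / p_total
        q = (q_counts[b] + pseudocount) / q_total
        kl += p * math.log(p / q)
    return kl


def should_extend(context, new_base, table, threshold):
    """Extend context leftward by new_base if the longer context changes the prediction enough."""
    longer = table.get(new_base + context, Counter())
    shorter = table.get(context, Counter())
    return kl_divergence(longer, shorter) > threshold


table = next_base_counts(["GAT"] * 5 + ["CAA", "CAC", "CAG"], 2)
print(should_extend("A", "C", table, 0.1))  # True
```

In this toy data the context A is mostly followed by T, whereas CA is followed only by A, C, or G. The divergence (about 0.45 nats) therefore exceeds a threshold of 0.1 but not one of 1.0.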

10.
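This paper proposes a deep learning model based on convolutional and recurrent neural networks that identifies circular RNA (circRNA) splice sites in the human genome by analyzing genomic sequence data. First, based on the preprocessed nucleotide sequences, 2 network depths, 8 convolution kernel sizes, and 3 long short-term memory (LSTM) parameter settings were designed, for a total of 8 groups of 16 models. Next, mean pooling and max pooling were tested for the pooling layer, and GC content was added to improve the model's predictive ability. Finally, experimentally validated circRNAs in human seminal plasma were predicted. The results show that the model with a 32×4 convolution kernel, depth 1, and LSTM parameter 32 achieved the highest recognition rate: 0.9824 on the training set and an accuracy of 0.95 on the test set, and it correctly identified 83% of the experimentally validated data. The model performs well in identifying human circRNA splice sites.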
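The sketch below assumes a standard four-channel one-hot encoding, so that a 32×4 kernel covers 32 nucleotides across all four channels. It shows the encoding, what a single such filter computes (as in deep learning frameworks, the "convolution" is a cross-correlation), the two pooling variants the entry compares, and a GC-content feature. It is not the authors' model: the LSTM, the stacking of layers, and training are omitted, and all function names are hypothetical.

```python
import numpy as np

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(seq):
    """Encode a sequence as an L x 4 matrix; unknown bases (e.g. N) become all-zero rows."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in BASE_INDEX:
            mat[i, BASE_INDEX[base]] = 1.0
    return mat


def conv_full_width(x, kernel):
    """Valid cross-correlation of one k x 4 filter sliding along the sequence axis only."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel) for i in range(x.shape[0] - k + 1)])


def pool(values, size, mode="max"):
    """Non-overlapping max or mean pooling; a trailing partial window is dropped."""
    n = len(values) // size
    windows = values[: n * size].reshape(n, size)
    return windows.max(axis=1) if mode == "max" else windows.mean(axis=1)


def gc_content(seq):
    """Fraction of G or C among all bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


x = one_hot("ACGT" * 20)  # shape (80, 4)
response = conv_full_width(x, np.ones((32, 4), dtype=np.float32))
print(response.shape)  # (49,)
print(pool(response, 7).shape)  # (7,)
```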

11.
The choice of a splice site depends not only on its intrinsic strength but also on its flanking competitors. Splice site competition is an important mechanism for splice site prediction and, in particular, offers a new perspective on alternative splice site prediction. In this paper, the position weight matrix scoring function is used to represent splice site strength, and the mechanism of splice site competition is described by a single parameter: the scoring function subtraction. Applied to alternative splice site prediction, this single parameter correctly classifies 68.22% of donor sites and 70.86% of acceptor sites as alternative or constitutive. These prediction accuracies are approximately equal to those of a recent method based on the mechanism of splice site competition. The results reveal that the scoring function subtraction is the best parameter for describing the mechanism of splice site competition.
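The abstract does not say how competitors are chosen. The sketch below assumes the competitor is the highest-scoring window of the same width whose start lies within a fixed flank of the site. Any window qualifies; no GT or AG core is required. The function returns the site's log-odds score minus that competitor's score. A classification threshold on this difference would be fitted on training data, and no value is assumed here. The flank size, pseudocount, and names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix from aligned, equal-length ACGT site sequences."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(sites) + pseudocount * len(BASES)
        pwm.append({b: math.log((column.count(b) + pseudocount) / total / background) for b in BASES})
    return pwm


def pwm_score(window, pwm):
    """Sum of per-position log-odds; the window must match the PWM length and be ACGT-only."""
    return sum(pwm[i][b] for i, b in enumerate(window))


def score_subtraction(seq, site_start, pwm, flank=100):
    """Score of the site minus the best competing window starting within +/- flank positions."""
    width = len(pwm)
    site_score = pwm_score(seq[site_start:site_start + width], pwm)
    lo = max(0, site_start - flank)
    hi = min(len(seq) - width, site_start + flank)
    competitors = [pwm_score(seq[s:s + width], pwm) for s in range(lo, hi + 1) if s != site_start]
    return site_score - max(competitors) if competitors else float("inf")


pwm = build_pwm(["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"])
seq = "TTTT" + "CAGGTAAGT" + "TTTTTTTT"
print(score_subtraction(seq, 4, pwm, flank=10) > 0)  # True
```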

12.
Constant denaturant capillary electrophoresis (CDCE), based on cooperative DNA melting equilibria, has the resolving power to separate single-nucleotide mutants from wild-type sequences. We used this technique to study mutations in a 70-bp isomelting domain of the human HPRT gene, which includes the entire exon 5 and its flanking splice donor and acceptor sites. Pooled samples of 6-thioguanine-selected T-cell clones from 51 healthy donors, representing a total of approximately 1000 individual HPRT mutants, were analysed. Slow-moving peaks from the heteroduplex part of the CDCE electropherogram were collected and subjected to a second round of PCR and CDCE analysis, followed by DNA sequencing. Five independent mutations were detected. Four were splicing errors: a CC insertion and two G→A transitions in the splice donor site of intron 5, and a G→C transversion in the splice acceptor site of intron 4. The fifth mutation was a missense transversion, T389→G. A reconstruction experiment, in which DNA with a known mutation was mixed with wild-type DNA, showed the sensitivity of mutation detection to be better than 1:100 under the conditions used in this study. These results demonstrate the high sensitivity of the CDCE method for mutation screening.

13.
Prediction of human mRNA donor and acceptor sites from the DNA sequence   Total citations: 40, self-citations: 0, citations by others: 40
Artificial neural networks have been applied to the prediction of splice site location in human pre-mRNA. A joint prediction scheme, in which prediction of transition regions between introns and exons regulates the cutoff level for splice site assignment, was able to predict splice site locations with confidence levels far better than previously reported in the literature. The problem of predicting donor and acceptor sites in human genes is hampered by the large number of false positives; here, the distribution of these false splice sites is examined and linked to a possible scenario for the splicing mechanism in vivo. When the presented method detects 95% of the true donor and acceptor sites, it makes fewer than 0.1% false donor site assignments and fewer than 0.4% false acceptor site assignments. For the large data set used in this study, this means that on average there are one and a half false donor sites per true donor site and six false acceptor sites per true acceptor site. With the joint assignment method, more than a fifth of the true donor sites and around a quarter of the true acceptor sites could be detected without any accompanying false positive predictions. Highly confident splice sites could not be isolated with a widely used weight matrix method or with separate splice site networks. A complementary relation was observed between the confidence levels of the coding/non-coding network and the separate splice site networks: many weak splice sites have sharp transitions in the coding/non-coding signal, and many stronger splice sites have more ill-defined transitions between coding and non-coding regions.

14.
To evaluate the importance of the surrounding nucleotide sequence in the selection of a splice site for mRNA, we have carried out computer studies of eukaryotic protein genes whose entire nucleotide sequences were available. A splice site-like sequence with significant homology to the consensus splice junction sequences is frequently found within introns and exons. The higher the homology of a candidate donor site sequence to the nine-nucleotide consensus sequence, the higher its probability of being a donor site. For most donors, the stability of presumed base-pairing with U1 RNA is higher than that of donor-like sequences, if any, in the adjacent exon and intron. However, homology of a candidate acceptor sequence to the 15-nucleotide consensus is a poor criterion for identifying an acceptor site. The presence of a sequence that could serve as a branch point 18 to 37 nucleotides upstream of an acceptor does not seem to be critical in distinguishing it from an acceptor-like sequence. Nucleotide frequencies around the splice junctions of many genes have been calculated separately for human, rat, mouse, and chicken. At some positions around the donor site, these frequencies appear to differ from species to species. The acceptors of these vertebrates have longer pyrimidine-rich regions than the previous consensus sequence. The newly derived nucleotide frequencies were used as the standard to calculate a weighted homology score for a candidate splice site sequence in a gene of each of the four species. This weighted homology score over the 40- to 60-nucleotide intron-exon sequence is a much better criterion for identifying an acceptor. These results suggest that the most important signal for the selection of a splice site resides in the surrounding nucleotide sequence. They also suggest that the surrounding nucleotide sequence alone is generally not sufficient for the selection.

15.
A new method that predicts internal exon sequences in human DNA has been developed. The method is based on a splice site prediction algorithm that uses a linear discriminant function to combine information about significant triplet frequencies in various functional parts of splice site regions and about oligonucleotide preferences in protein-coding and intron regions. The accuracy of our splice site recognition function is 97% for donor splice sites and 96% for acceptor splice sites. For exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site, and 3'-intron region for each open reading frame flanked by GT and AG dinucleotides. The accuracy of precise internal exon recognition on a test set of 451 exon and 246,693 pseudoexon sequences is 77%, with a specificity of 79%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences, which corresponds to a correlation coefficient for exon prediction of 0.87. This approach is more precise than other methods and has been tested on a larger data set. We have also developed a means of predicting exon-exon junctions in cDNA sequences, which can be useful for selecting optimal PCR primers.
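Under the GT-AG rule, a candidate internal exon begins just after an intronic AG and ends just before an intronic GT. The sketch below enumerates such candidates that contain an open reading frame. It checks all three frames, because an internal exon can start in any phase. The length limits are arbitrary assumptions. Only candidate enumeration is shown; the linear discriminant over the 5'-intron, donor, coding, acceptor, and 3'-intron features is not reproduced, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_frame(segment):
    """True if at least one of the three reading frames contains no in-frame stop codon."""
    return any(
        all(segment[i:i + 3] not in STOP_CODONS for i in range(frame, len(segment) - 2, 3))
        for frame in range(3)
    )


def candidate_internal_exons(seq, min_len=30, max_len=300):
    """(start, end) pairs where seq[start:end] follows an AG, precedes a GT, and has an open frame."""
    seq = seq.upper()
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]  # just after intronic AG
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]  # just before intronic GT
    return [
        (s, e)
        for s in starts
        for e in ends
        if min_len <= e - s <= max_len and has_open_frame(seq[s:e])
    ]


print(candidate_internal_exons("CCAGATGGCCGTAA", min_len=3, max_len=10))  # [(4, 10)]
```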

16.
The mechanism of cellular src (c-src) transduction by td109, a transformation-defective deletion mutant of Rous sarcoma virus, was studied by sequence analysis of the recombinational junctions in three td109-derived recovered sarcoma viruses (rASVs). Our results show that two rASVs were generated by recombination between td109 and c-src in the previously defined region between exons 1 and 2. Significant homology between td109 and c-src sequences was present at the sites of recombination. The viral/c-src sequence junction of the third rASV was formed by splicing a cryptic donor site in the 5' region of env of td109 to exon 1 of c-src. Various lengths of internal c-src intron 1 sequence, resulting from activation of potential splice donor and acceptor sites, were incorporated into all three rASV genomes. The incorporated intron 1 sequences were absent from c-src mRNA, ruling out the mRNA as the precursor for recombination with td109 and implying that the initial recombination events most likely took place at the DNA level. In two rASVs, a potential splice acceptor site within the incorporated intron 1 sequences was activated and used for src mRNA synthesis in infected cells. For the third rASV, the normal env mRNA splice acceptor site was used for src mRNA synthesis.
