Similar literature
1.
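Glycosylation is a major post-translational modification of proteins. No fixed sequence pattern is known for O-glycosylation, so identifying O-glycosylation sites with high accuracy remains a challenging machine-learning problem. Working from the Steentoft data set, the largest collection of human O-glycosylation sites to date, this paper proposes for the first time a position-based chi-square difference table feature, χ²-pos, and fuses it with pseudo-amino-acid sequence evolution information (PsePSSM) and the undirected composition of k-spaced amino acid pairs (Undirected-CKSAAP) to represent sequences. Five support vector machine classifiers, each trained on balanced positive and negative samples, are combined by weighted voting. On an independent test set, the accuracy, Matthews correlation coefficient, and area under the ROC curve reach 89.62%, 0.79, and 0.96, respectively, clearly better than results reported in the literature. Fusing the three features χ²-pos, PsePSSM, and Undirected-CKSAAP has broad application prospects for predicting protein glycosylation, phosphorylation, and other modification sites.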
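
The ensemble step in this abstract (five classifiers on balanced samples, combined by weighted voting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `balanced_subsets` and `weighted_vote` are hypothetical, the negatives are undersampled at random (the abstract does not say how balance was achieved), and the vote weights are assumed to come from something like validation accuracy.

```python
import random


def balanced_subsets(positives, negatives, k=5, seed=0):
    """Pair all positives with k random, equally sized draws of negatives.

    Assumption: balance is obtained by undersampling the negatives.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return [(list(positives), rng.sample(negatives, n)) for _ in range(k)]


def weighted_vote(predictions, weights):
    """Combine +1/-1 predictions; a positive weighted sum means 'site'."""
    total = sum(w * p for p, w in zip(predictions, weights))
    return 1 if total > 0 else -1


# Five hypothetical classifier outputs with hypothetical weights.
decision = weighted_vote([1, -1, -1, 1, 1], [0.9, 0.8, 0.7, 0.6, 0.5])
```

Each `(positives, negatives)` pair would train one classifier; at prediction time the five outputs are passed to `weighted_vote`.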

2.
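Phosphorylation is a major post-translational modification of proteins and can be divided into kinase-specific and non-kinase-specific types. Working from the Dou data set of non-kinase-specific phosphorylation sites, this paper develops a position-based chi-square difference table feature, χ²-pos, fuses it with pseudo-amino-acid sequence evolution information (PsePSSM) to represent sequences, and builds support vector machine classifiers on balanced positive and negative samples. In independent tests, the Matthews correlation coefficient, area under the ROC curve, and accuracy are (0.59, 0.87, 79.74%) for S (serine), (0.55, 0.85, 77.68%) for T (threonine), and (0.50, 0.81, 75.22%) for Y (tyrosine), clearly better than results reported in the literature. Fusing the two features χ²-pos and PsePSSM has broad application prospects for predicting protein phosphorylation sites.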
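
For readers unfamiliar with the Matthews correlation coefficient reported here, the sketch below computes it from confusion-matrix counts using the standard definition. The counts in the example are made up for illustration and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for a balanced test set of 200 sites.
mcc = matthews_corrcoef(tp=80, tn=80, fp=20, fn=20)
acc = accuracy(tp=80, tn=80, fp=20, fn=20)
```

Unlike accuracy, the coefficient is 0 for random guessing and 1 for perfect prediction, which is why it is often reported alongside accuracy and the area under the ROC curve.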

3.
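Splice site recognition based on support vector machines (SVM)   Total citations: 14 (self-citations: 1, citations by others: 13)
Splice site recognition is an important step in gene finding and has long attracted researchers' attention. Because sequences near splice sites are conserved, several methods based on statistical properties have been applied to splice site recognition, but their performance still leaves room for improvement. The support vector machine (SVM), a learning machine grounded in statistical learning theory, has developed rapidly in recent years and has been applied to many pattern recognition problems. This paper applies it to splice site recognition. Because false splice sites far outnumber true ones among sequence samples that satisfy the GT-AG rule, the paper proposes an SVM-based "balanced minimum" method (平衡取小法) to improve recognition. Experimental results show that SVMs extract the statistical features of the conserved sequences near splice sites more effectively, generalize better to the test set, and are simpler to use. This result offers a new method for splice site recognition and new leads for structure and site recognition problems in the study of biological macromolecules.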
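
The class imbalance mentioned in this abstract follows directly from the GT-AG rule: every GT or AG dinucleotide is a candidate site, and most are not real. The sketch below only enumerates candidates; the function name `candidate_sites` is hypothetical, and it does not implement the paper's balancing method, which the abstract does not describe.

```python
def candidate_sites(seq):
    """Return 0-based positions of GT (donor-like) and AG (acceptor-like)
    dinucleotides in a DNA sequence."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Toy sequence, not real genomic data.
donors, acceptors = candidate_sites("AAGGTAAGT")
```

Even this nine-base toy sequence yields two donor-like and two acceptor-like positions, which illustrates why a classifier trained on all candidates sees far more false sites than true ones.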

4.
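To improve the accuracy of splice site recognition in untranslated regions, a method combining statistical probabilities with a support vector machine is proposed. The method has two stages. In the first stage, statistical methods describe the untranslated region (UTR) sequence: features such as correlations between bases, position specificity, and conservation are expressed as probabilities, and these probability parameters serve as the input vector of the support vector machine in the second stage. In the second stage, a support vector machine (SVM) with a polynomial kernel recognizes the splice sites. Tests on a human 5′ UTR splice site data set show that the method recognizes splice sites in untranslated regions well.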
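
The polynomial kernel used in the second stage has the general form K(x, y) = (gamma * <x, y> + coef0)^degree. The sketch below evaluates it on two probability-style feature vectors. The parameter values and the example vectors are assumptions for illustration; the abstract does not state the degree or other kernel settings.

```python
def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical stage-one outputs: per-sequence probability parameters.
site_a = [0.8, 0.1, 0.6]
site_b = [0.7, 0.2, 0.5]
similarity = polynomial_kernel(site_a, site_b)
```

An SVM never needs the expanded polynomial features explicitly; it only needs this kernel value for each pair of training vectors.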

6.
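Recognizing functional sites in DNA sequences is a current research focus in bioinformatics, and splice site recognition is one such task. To make full use of the characteristic patterns of splice sites and recognize them better, a splice site recognition system based on an improved Winnow algorithm was built. Compared with other methods, the improved Winnow algorithm is more robust, suits high-dimensional feature spaces, can fuse several kinds of pattern information, and performs well even when many irrelevant features are present. During training the feature set is also pruned: features that contribute almost nothing to recognition are removed, which has a negligible effect on the results and improves the efficiency of the algorithm. Experiments show that the improved Winnow algorithm recognizes splice sites well, and several of its performance measures match or exceed those of splice site recognition software currently in wide international use.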
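
As background, the sketch below shows the classic Winnow update rule (multiplicative promotion and demotion over binary features), not the improved variant described in the abstract, whose details are not given. The threshold of half the feature count, the factor `alpha=2.0`, and the toy data are conventional or assumed choices.

```python
def predict(weights, x, theta):
    """Predict 1 when the weighted sum of active features exceeds theta."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0


def train_winnow(samples, n_features, alpha=2.0, epochs=10):
    """Classic Winnow: multiply the weights of active features by alpha
    after a missed positive, divide them by alpha after a false positive."""
    weights = [1.0] * n_features
    theta = n_features / 2
    for _ in range(epochs):
        for x, label in samples:
            if predict(weights, x, theta) == label:
                continue
            factor = alpha if label == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
    return weights, theta


# Toy target: label is 1 when feature 0 or feature 1 is active.
samples = [
    ([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1), ([0, 0, 1, 1], 0),
    ([0, 0, 1, 0], 0), ([1, 0, 1, 0], 1), ([0, 0, 0, 1], 0),
]
weights, theta = train_winnow(samples, n_features=4)
```

Because only the weights of active features change on a mistake, irrelevant features stay near their initial weight, which is the property that makes Winnow attractive in high-dimensional feature spaces.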

7.
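Predicting complete gene structures is an important basic topic in current life science research, and a key step is the accurate identification of splice sites and of the various alternative splicing events. Identifying splice sites and alternative splicing events from transcriptome sequencing (RNA-seq) data is a strategy that has emerged in recent years with next-generation sequencing. Using RNA-seq data from Drosophila melanogaster testis, this work identified 39,718 Drosophila splice sites with the TopHat software, of which 10,584 are novel. Based on the different combinations of splice sites, computational algorithms were developed for each type of alternative splicing, identifying 8,477 alternative splicing events (3,922 of them newly identified) of four types: alternative donor sites, alternative acceptor sites, intron retention, and exon skipping. RT-PCR experiments validated newly identified alternative splicing events in two Drosophila genes and revealed entirely new splice isoforms. These results further show that RNA-seq data can be used effectively to identify splice sites and alternative splicing events, providing new ideas and tools for uncovering splicing mechanisms and the biological functions of alternative splicing.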
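
One way to read "combinations of splice sites" is to group splice junctions by the end they share. The sketch below flags candidate alternative donor and acceptor sites from a list of (donor, acceptor) junction coordinates. It is a simplified illustration with hypothetical names and coordinates, not the algorithm developed in the paper; in particular, a donor joined to two acceptors may also reflect exon skipping, which this sketch does not distinguish.

```python
from collections import defaultdict


def alternative_sites(junctions):
    """Group (donor, acceptor) junctions by their shared end.

    Returns two dicts: acceptor -> competing donors, and
    donor -> competing acceptors, keeping only groups with more than one site.
    """
    by_donor = defaultdict(set)
    by_acceptor = defaultdict(set)
    for donor, acceptor in junctions:
        by_donor[donor].add(acceptor)
        by_acceptor[acceptor].add(donor)
    alt_donors = {a: sorted(d) for a, d in by_acceptor.items() if len(d) > 1}
    alt_acceptors = {d: sorted(a) for d, a in by_donor.items() if len(a) > 1}
    return alt_donors, alt_acceptors


# Hypothetical junction coordinates on one strand of one chromosome.
junctions = [(100, 200), (100, 250), (300, 400), (320, 400), (500, 600)]
alt_donors, alt_acceptors = alternative_sites(junctions)
```

In practice the junction list would come from the aligner's output, and each candidate would be checked against the gene annotation before being counted as an event.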

8.
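Pre-mRNA splicing is a key stage of eukaryotic gene expression, and recognition of splice sites plays a crucial role in gene expression. The authors represent each sequence by a five-dimensional feature vector obtained from position-association weight matrices for adjacent and non-adjacent positions and from the increment of diversity of base composition, and apply a support vector machine to recognize donor and acceptor sites. With 5-fold cross-validation, the Matthews correlation coefficients for donor and acceptor sites are 0.924 and 0.947, and the areas under the ROC curve are 99.08% and 99.54%, respectively. Compared with some traditional methods, this approach takes into account the correlations between positions and the biological information in the sequence, and has the advantages of few features and high accuracy.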
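
The weight matrices in this abstract extend the ordinary position weight matrix, which treats positions independently. As a simpler stand-in, the sketch below builds a basic log-odds position weight matrix and scores a sequence with it. It does not model the adjacent and non-adjacent position associations the authors use; the pseudocount, the uniform background, and the training sites are assumptions.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix from aligned, equal-length sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds of seq under the matrix."""
    return sum(column[base] for column, base in zip(pwm, seq))


# Hypothetical aligned donor-site windows, not real training data.
pwm = build_pwm(["GTAAGT", "GTGAGT", "GTAAGA"])
site_score = score(pwm, "GTAAGT")
background_score = score(pwm, "CCCCCC")
```

A score like this could serve as one coordinate of a low-dimensional feature vector passed to an SVM, in the spirit of the five-dimensional representation described above.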

9.
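Objective: To computationally identify new non-canonical splice sites in Drosophila in order to explore unknown splicing mechanisms. Methods: Gene structures were reconstructed from alignments of Drosophila melanogaster expressed sequence tags (ESTs) to the genome sequence; non-canonical splice sites were found among them, and the sequences upstream and downstream of these sites were analyzed with the Weblogo software in search of splicing-related specific elements. Results: A total of 265 non-canonical splice sites were obtained, located in 195 protein-coding genes. Conclusion: Bioinformatics methods revealed more than a hundred non-canonical splice sites in Drosophila, laying a foundation for studying non-canonical splicing mechanisms.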

10.
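Gene expression mainly comprises transcription, splicing, and translation; it involves many regulatory elements and is a highly regulated process. Modeling, recognizing, and analyzing these regulatory elements is important for understanding gene expression. This study proposes a short-sequence model based on shifting sequence patterns and uses it to model transcription promoters and splicing regulatory elements. Promoters are the core regulatory elements of gene transcription, and splicing regulatory elements help regulate the recognition of splice sites. Classification experiments show that the model effectively recognizes transcription promoter sequences and splicing regulatory element sequences. The model was further used to analyze genomic variants that biological experiments have shown to affect splicing; the results show that it effectively predicts the splicing effects of genomic variants, further confirming its validity.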

11.
Prediction of gene sequences and their exon-intron structure in large eukaryotic genomic sequences is one of the central problems of mathematical biology. Solving this problem involves, in particular, high-accuracy splice site recognition. Using statistical analysis of a splice site-containing human gene fragment database, some characteristic features were described for nucleotide sequences in the splicing site neighborhood, the frequencies of all nucleotides and dinucleotides were determined, and those with frequencies increased or decreased in comparison to a random sequence were identified. The results can be used in sequence annotation, splicing site prediction, and the recognition of the gene exon-intron structure.
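
The comparison of dinucleotide frequencies with a random sequence can be sketched as an observed-to-expected ratio. This is a minimal illustration under one assumption: the "random" expectation is taken as the product of the two single-nucleotide frequencies in the same sequence, which may differ from the null model the authors used.

```python
from collections import Counter


def dinucleotide_ratios(seq):
    """Observed/expected frequency for each dinucleotide present in seq.

    Expected frequency assumes independent neighbouring nucleotides.
    """
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    return {
        pair: (count / n_di) / ((mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono))
        for pair, count in di.items()
    }


# Toy sequence; a ratio above 1 marks an over-represented dinucleotide.
ratios = dinucleotide_ratios("GTGTGTGT")
```

Ratios well above or below 1 in windows around splice sites would correspond to the "increased or decreased" frequencies the abstract refers to.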

12.
13.
A new method which predicts internal exon sequences in human DNA has been developed. The method is based on a splice site prediction algorithm that uses the linear discriminant function to combine information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotides in protein coding and intron regions. The accuracy of our splice site recognition function is 97% for donor splice sites and 96% for acceptor splice sites. For exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. This corresponds to a correlation coefficient for exon prediction of 0.87. The precision of this approach is better than other methods and has been tested on a larger data set. We have also developed a means for predicting exon-exon junctions in cDNA sequences, which can be useful for selecting optimal PCR primers.

14.
The conformation of RNA sequences spanning five 3' splice sites and two 5' splice sites in adenovirus mRNA was probed by partial digestion with single-strand specific nucleases. Although cleavage of nucleotides near both 3' and 5' splice sites was observed, most striking was the preferential digestion of sequences near the 3' splice site. At each 3' splice site a region of very strong cleavage is observed at low concentrations of enzyme near the splice site consensus sequence or the upstream branch point consensus sequence. Additional sites of moderately strong cutting near the branch point consensus sequence were observed in those sequences where the splice site was the preferred target. Since recognition of the 3' splice site and branch site appears to be an early event in mRNA splicing, these observations may indicate that the local conformation of the splice site sequences plays a direct or indirect role in enhancing the accessibility of sequences important for splicing.

15.
Multiple splicing defects in an intronic false exon   Total citations: 18 (self-citations: 0, citations by others: 18)

16.
Vertebrate internal exons are usually between 50 and 400 nt long; exons outside this size range may require additional exonic and/or intronic sequences to be spliced into the mature mRNA. The mouse polymeric immunoglobulin receptor gene has a 654 nt exon that is efficiently spliced into the mRNA. We have examined this exon to identify features that contribute to its efficient splicing despite its large size; a large constitutive exon has not been studied previously. We found that a strong 5′ splice site is necessary for this exon to be spliced intact, but the splice sites alone were not sufficient to efficiently splice a large exon. At least two exonic sequences and one evolutionarily conserved intronic sequence also contribute to recognition of this exon. However, these elements have redundant activities as they could only be detected in conjunction with other mutations that reduced splicing efficiency. Several mutations activated cryptic 5′ splice sites that created smaller exons. Thus, the balance between use of these potential sites and the authentic 5′ splice site must be modulated by sequences that repress or enhance use of these sites, respectively. Also, sequences that enhance cryptic splice site use must be absent from this large exon.

17.
18.
Pre-mRNA splicing is carried out by the spliceosome, which identifies exons and removes intervening introns. In vertebrates, most splice sites are initially recognized by the spliceosome across the exon, because most exons are small and surrounded by large introns. This gene architecture predicts that efficient exon recognition depends largely on the strength of the flanking 3' and 5' splice sites. However, it is unknown if the 3' or the 5' splice site dominates the exon recognition process. Here, we test the 3' and 5' splice site contributions towards efficient exon recognition by systematically replacing the splice sites of an internal exon with sequences of different splice site strengths. We show that the presence of an optimal splice site does not guarantee exon inclusion and that the best predictor for exon recognition is the sum of both splice site scores. Using a genome-wide approach, we demonstrate that the combined 3' and 5' splice site strengths of internal exons provide a much more significant separator between constitutive and alternative exons than either the 3' or the 5' splice site strength alone.

19.
20.
Splice site selection is a key element of pre-mRNA splicing and involves specific recognition of consensus sequences at the 5′ and 3′ splice sites. Evidently, the compliance of a given sequence with the consensus 5′ splice site sequence is not sufficient to define it as a functional 5′ splice site, because not all sequences that conform with the consensus are used for splicing. We have previously hypothesized that the necessity to avoid the inclusion of premature termination codons within mature mRNAs may serve as a criterion that differentiates normal 5′ splice sites from unused (latent) ones. We further provided experimental support to this idea, by analyzing the splicing of pre-mRNAs in which in-frame stop codons upstream of a latent 5′ splice site were mutated, and showing that splicing using the latent site is indeed activated by such mutations. Here we evaluate this hypothesis by a computerized survey for latent 5′ splice sites in 446 protein-coding human genes. This data set contains 2311 introns, in which we found 10490 latent 5′ splice sites. The utilization of 10045 (95.8%) of these sites for splicing would have led to the inclusion of an in-frame stop codon within the resultant mRNA. The validity of this finding is confirmed here by statistical analyses. This finding, together with our previous experimental results, invokes a nuclear scanning mechanism, as part of the splicing machine, which identifies in-frame stop codons within the pre-mRNA and prevents splicing that could lead to the formation of a prematurely terminated protein.
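
The survey described here can be pictured with a small scan: for each candidate latent 5′ splice site inside an intron, check whether using it would pull an in-frame stop codon into the mRNA. The sketch below is a strong simplification, not the authors' procedure: any intronic GT after the authentic site counts as a candidate (the paper matches a consensus sequence), `exon_tail` is a hypothetical parameter holding the nucleotides of an incomplete codon carried over from the upstream exon, and the sequence is a toy example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_inframe_stop(seq):
    """True if any complete codon, read from position 0, is a stop codon."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


def latent_sites(intron, exon_tail=""):
    """List (position, stop_included) for each GT downstream of the
    authentic 5' splice site, which is assumed to sit at intron position 0."""
    results = []
    pos = intron.find("GT", 1)
    while pos != -1:
        # Sequence that would be added to the exon if this site were used.
        retained = exon_tail + intron[:pos]
        results.append((pos, has_inframe_stop(retained)))
        pos = intron.find("GT", pos + 1)
    return results


# Toy intron; the authentic site is the GT at position 0.
sites = latent_sites("GTAAGTTGAGGTACC")
```

In this toy intron the candidate at position 4 is stop-free while the one at position 10 would include an in-frame TGA, mirroring the distinction the survey counts across 2311 introns.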
