Similar articles
 20 similar articles found; search took 15 ms
1.
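Recognition of human Pol II promoters   Cited by: 12 (self-citations: 2; citations by others: 12)
Based on the characteristic base distributions of promoter and non-promoter regions, quadratic discriminant analysis based on the increment of diversity (IDQD) was applied to recognize human Pol II promoters. The recognition accuracy exceeded 90%, better than other published recognition algorithms (including SVM classifiers). The IDQD algorithm can also predict transcription start sites (TSS) fairly accurately, with sensitivity and specificity of 86% and 91%, respectively, in 10-fold cross-validation. These results indicate that IDQD is an effective classifier.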

2.
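Liu Jia  Cai Lu  Xing Yongqiang 《生物信息学》 (Chinese Journal of Bioinformatics) 2010,8(4):341-343,346
Proteins are the material basis of all life activities. Studying protein-protein interactions helps in understanding the molecular mechanisms of biological processes and in elucidating the molecular mechanisms of disease. Based on protein sequence composition features, this paper applies quadratic discriminant analysis based on the increment of diversity to predict 1,963 pairs of human protein interactions. All prediction measures in the self-consistency test were above 79%, and the overall accuracy in cross-validation also exceeded 60%, indicating that the algorithm can be used to predict protein-protein interactions.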

3.
Zhang L  Luo L 《Nucleic acids research》2003,31(21):6214-6220
Based on the conservation of nucleotides at splicing sites and the features of base composition and base correlation around these sites, we use the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to study the dependence structure of splicing sites and predict the exons/introns and their boundaries for four model genomes: Caenorhabditis elegans, Arabidopsis thaliana, Drosophila melanogaster and human. The comparison of compositional features between two sequences and the comparison of base dependencies at adjacent or non-adjacent positions of two sequences can be integrated automatically in the increment of diversity (ID). Eight feature variables around a potential splice site are defined in terms of ID. They are integrated in a single formal framework given by IDQD. In our calculations, a 7 (8) base region around the donor (acceptor) sites was considered in studying the conservation of nucleotides, and sequences of 48 bp on either side of splice sites were used in studying the compositional and base-correlating features. The windows are enlarged to 16 (donor), 29 (acceptor) and 80 bp (either side) to improve the prediction for human splice sites. The prediction capability of the present method is comparable with that of the leading splice site detector, GeneSplicer.
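
The increment of diversity (ID) mentioned in this abstract can be illustrated with a minimal sketch. It assumes the definition commonly used in the ID literature, which the abstract itself does not spell out: the diversity of a count vector is D(X) = N log N - sum_i n_i log n_i, and ID(X, Y) = D(X + Y) - D(X) - D(Y). The function names and the k-mer counting helper are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def diversity(counts):
    """Diversity D(X) = N*log(N) - sum_i n_i*log(n_i); zero counts contribute 0."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count dictionaries."""
    merged = Counter(x) + Counter(y)
    return diversity(merged) - diversity(x) - diversity(y)


def kmer_counts(seq, k=1):
    """Count overlapping k-mers in a sequence (illustrative feature extractor)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Assumed usage: compare a query window against a pooled reference source.
reference = kmer_counts("GTAAGTGTGAGTGTAAGA", k=2)
query = kmer_counts("GTAAGT", k=2)
id_value = increment_of_diversity(query, reference)
```

A small ID means the query's composition resembles the reference source; in an IDQD-style pipeline, several such ID values would serve as the feature variables passed to the discriminant step.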

4.
5.
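Diversity measure applied to the recognition of splice sites in genes   Cited by: 4 (self-citations: 0; citations by others: 4)
Based on the conservation of bases at gene splice sites and on the base composition and correlation features of neighboring positions, a diversity measure combined with quadratic discriminant analysis was applied to analyze and predict, in a unified way, the gene structure of several model organisms, recognizing exons, introns and their boundaries well. The results show that for the four species, C. elegans, A. thaliana, D. melanogaster and human, the recognition accuracy at the nucleotide level is 92.5%~97.1%; at the exon level, sensitivity is 83.7%~94.5% and specificity is 87.8%~97.1%. The prediction capability is better than that of splice site detection software such as GeneSplicer.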

6.
7.
A new method (MZEF) for predicting internal coding exons in genomic DNA sequences has been developed. This method is based on a prediction algorithm that uses the quadratic discriminant function for multivariate statistical pattern recognition. With improved feature measures, an Arabidopsis thaliana-specific implementation of MZEF has been completed and made available to the plant genome community.
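
As an illustration of the quadratic discriminant function named in this abstract, the following is a minimal sketch, not MZEF's actual implementation. It assumes each class is modelled as a Gaussian with its own covariance matrix and, to stay free of dependencies, handles only two features so that the 2x2 inverse can be written out by hand. All function names and the toy data are hypothetical.

```python
import math


def mean_and_cov(rows):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (x, y) feature pairs."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_gaussian(point, mean, cov):
    """Bivariate normal log density, omitting the shared -log(2*pi) constant."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance using the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (maha + math.log(det))


def qdf_score(point, pos, neg, prior_pos=0.5):
    """Log posterior odds; a positive score favours the positive class."""
    return (
        log_gaussian(point, *pos)
        - log_gaussian(point, *neg)
        + math.log(prior_pos / (1.0 - prior_pos))
    )


# Toy training data (made up for illustration only).
pos_model = mean_and_cov([(1, 2), (2, 1), (3, 3), (2, 3)])
neg_model = mean_and_cov([(-1, -2), (-2, -1), (-3, -3), (-2, -4)])
score = qdf_score((2.0, 2.0), pos_model, neg_model)
```

Because each class keeps its own covariance, the decision boundary `qdf_score(...) == 0` is a quadratic curve in feature space rather than a straight line. A real exon predictor would use more features and a linear algebra library.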

8.
9.
10.
11.
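Based on the base distribution features of nucleosome-positioning sequences and nucleosome-depleted sequences, a model was built with the increment of diversity quadratic discriminant method (IDQD) to distinguish the two classes of sequences; the area under the receiver operating characteristic curve reached 0.958. The model was then used to study how nucleosomes are distributed in the sequences flanking splice sites (GT/AG) in the human genome. DNA sequences corresponding to exons generally tend to participate in nucleosome formation, and the RNA transcribed from them is statistically more rigid, whereas DNA sequences corresponding to splice sites and their adjacent introns avoid nucleosome formation, and the RNA transcribed from them is statistically more flexible. Furthermore, nucleosome positioning/depletion of DNA sequences and rigidity/flexibility of RNA were found to be statistically correlated, which provides a basis for a mechanistic explanation of why precursor RNA splicing events are related to nucleosome positioning information in DNA sequences.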

12.
13.
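Based on the sequence features of Drosophila Pol II promoters, a Bayesian discriminant function combining the increment of diversity and position weight matrices was used to predict Drosophila promoters. The prediction algorithm was evaluated by 10-fold cross-validation. Comparing the effect of training sets of different sizes on the results demonstrated the soundness of the parameter choices and the predictive power of the algorithm. The effect of different parameter choices on the predictions was also compared in order to obtain the best promoter prediction results. The predictions show a success rate of 93% and a correlation coefficient of 83%.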

14.
Li X  Zeng J  Yan H 《Bioinformation》2008,2(9):373-378

15.
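DNA methylation, an epigenetic modification acting directly on the DNA sequence, can affect gene expression without altering the primary structure of the DNA molecule and plays an important role in life activities. In mammals, DNA methylation occurs mainly on the cytosine of CpG dinucleotides and is unevenly distributed across the genome. Accurate prediction of DNA methylation sites helps clarify how DNA methylation regulates gene expression and provides a new basis for the early diagnosis and treatment of tumors. This paper applies the increment of diversity combined with quadratic discriminant analysis to recognize the methylation status of human CpG dinucleotides. The overall accuracy in 5-fold cross-validation exceeded 80%, and the area under the receiver operating characteristic curve reached 0.86. Compared with existing methods, the prediction success rate is significantly improved. This indicates that the increment of diversity combined with quadratic discriminant analysis is suitable for predicting methylation sites, and that methylation sites in genomic sequences are sequence-dependent.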

16.
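H2A.Z, a variant of histone H2A, plays an important role in gene expression. Based on the differences in histone methylation patterns between H2A.Z and H2A nucleosomes, the authors applied the increment of diversity with quadratic discriminant (IDQD) method to successfully distinguish H2A.Z nucleosomes from H2A nucleosomes, demonstrating the effectiveness of an IDQD model that uses histone methylation information as feature parameters for this task. By computing the flexibility of DNA sequences, the DNA sequences of H2A.Z nucleosomes were found to have lower average flexibility than those of canonical H2A nucleosomes.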

17.
Conotoxins are small, disulfide-rich peptides that target ion channels and G protein-coupled receptors, and they show promise in treating chronic pain, epilepsy, cardiovascular diseases, and other conditions. Conotoxins may be classified into 11 superfamilies (A, D, I1, I2, J, L, M, O, P, S, and T) according to their disulfide connectivity, highly conserved N-terminal precursor sequence and similar mode of action. Successful prediction of the superfamily of a mature conotoxin peptide is important for understanding the biological and pharmacological functions of the toxins. In this study, a new algorithm, the increment of diversity combined with a modified Mahalanobis discriminant, is presented to predict five superfamilies using the pseudo amino acid composition. The results of the jackknife cross-validation test show that the overall prediction sensitivity and specificity are 88% and 91%, respectively. The algorithm is also used to predict three O-conotoxin families, obtaining 72% sensitivity and 78% specificity. These results indicate that the conotoxin superfamily peptides correlate with their amino acid compositions.

18.
There are selection methods available that allow the optimisation of genetic contributions of selection candidates for maximising the rate of genetic gain while restricting the rate of inbreeding. These methods imply selection on quadratic indices as the selection merit of a particular individual is a quadratic function of its estimated breeding value. This study provides deterministic predictions of genetic gain from selection on quadratic indices for a given set of resources (the number of candidates), heritability, and target rate of inbreeding. The rate of gain was obtained as a function of the accuracy of the Mendelian sampling term at the time of convergence of long-term contributions of selected candidates and the theoretical ideal rate of gain for a given rate of inbreeding after an exact allocation of long-term contributions to Mendelian sampling terms. The expected benefits from quadratic indices over traditional linear indices (i.e. truncation selection), both using BLUP breeding values, were quantified. The results clearly indicate higher gains from quadratic optimisation than from truncation selection. With constant rate of inbreeding and number of candidates, the benefits were generally largest for intermediate heritabilities but evident over the entire range. The advantage of quadratic indices was not highly sensitive to the rate of inbreeding for the constraints considered.

19.
20.
Although a number of bacterial gene-finding programs have been developed, there is still room for improvement especially in the area of correctly detecting translation start sites. We developed a novel bacterial gene-finding program named GeneHacker Plus. Like many others, it is based on a hidden Markov model (HMM) with duration. However, it is a 'local' model in the sense that the model starts from the translation control region and ends at the stop codon of a coding region. Multiple coding regions are identified as partial paths, like local alignments in the Smith-Waterman algorithm, regardless of how they overlap. Moreover, our semiautomatic procedure for constructing the model of the translation control region allows the inclusion of an additional conserved element as well as the ribosome-binding site. We confirmed that GeneHacker Plus is one of the most accurate programs in terms of both finding potential coding regions and precisely locating translation start sites. GeneHacker Plus is also equipped with an option where the results from database homology searches are directly embedded in the HMM. Although this option does not raise the overall predictability, labeled similarity information can be of practical use. GeneHacker Plus can be accessed freely at http://elmo.ims.u-tokyo.ac.jp/GH/.
