Support Vector Machine for Prediction of Horizontal Gene Transfers in Bacteria Genomes

Fund Project:

This work was supported by a grant from the National Natural Science Foundation of China (60671018, 60121101).

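    Abstract (translated from Chinese):

    With genome sequencing completed for a wide range of organisms, large volumes of DNA sequence data have become available, which greatly facilitates the search for horizontally transferred genes within genomes. This work combines gene sequence feature analysis with support vector machines (SVM) to identify horizontally transferred genes from differences in their sequence features. Building on earlier work, the absolute codon usage frequency (FCU) was chosen as the sequence feature, mainly because it carries information on both the codon usage bias of a gene and the amino acid composition of the protein it encodes; an SVM that uses this information to analyze and predict horizontally transferred genes can achieve higher prediction accuracy. In addition, a new strand-specific prediction method is proposed, in which genes on the leading and lagging strands of a bacterial genome are treated separately and horizontally transferred genes are predicted for each strand on its own. The results show that the basic method outperforms the octanucleotide-frequency scoring algorithm of Tsirigos et al., currently the best-performing predictor, with a relative improvement in hit ratio of up to 31.47%, and that the strand-specific method predicts horizontally transferred genes better still.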
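
    The codon usage feature described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that FCU means each codon's count divided by the total number of codons in the gene, and the names `CODONS` and `codon_frequencies` are hypothetical.

```python
from itertools import product
from typing import List

# All 64 codons in a fixed order, so every gene maps to the same feature layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds: str) -> List[float]:
    """Return a 64-dimensional codon frequency vector for one coding sequence.

    Assumption: frequency = codon count / total number of valid codons in the gene.
    """
    seq = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    # Walk the sequence in frame, ignoring a trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Toy coding sequence with five codons: ATG GCT GCT AAA TAA
example = codon_frequencies("ATGGCTGCTAAATAA")
```

    Because synonymous codons are not normalized against each other, a vector of this kind reflects both which codon a gene prefers for a given amino acid and how often that amino acid occurs, which is the property the abstract gives as the reason for choosing FCU.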

    Abstract:

    Horizontal gene transfer (HGT), also called lateral gene transfer (LGT), is any process in which an organism transfers genetic material to another organism that is not its offspring. As more genomic data become available, it has become easier to study how to detect the genes in a given genome that are products of horizontal transfer. Few horizontal gene transfers are known for the three bacterial genomes considered here, so experiments were carried out that simulate gene transfer by artificially inserting phage genes. By combining feature analysis of gene sequences with support vector machines (SVM), a novel method was developed for identifying horizontally transferred genes in three fully sequenced bacterial genomes (Escherichia coli K12, Borrelia burgdorferi, Bacillus cereus ZK). Following our previous work, codon usage frequency (FCU) was selected as the sequence feature because it inherently fuses the codon usage bias and amino acid composition signals. In addition, a second computational method was proposed that takes strand asymmetry into account and predicts horizontally transferred genes separately for the leading strand and the lagging strand of each genome. To reduce the effect of chance in simulating gene transfer by artificial insertion of phage genes, the transfer-and-recover experiment was repeated 100 times, and the arithmetic mean of each measure was reported for each genome to evaluate the algorithm's performance. Ten-fold cross-validation was used for both parameter selection and accuracy estimation. For C-Support Vector Classification (C-SVC), the best results were obtained with the radial basis function kernel with γ=100, while for one-class SVM the best performance was obtained with a polynomial kernel of degree three.
    The performance of the approach was compared with that of Tsirigos' method, which is one of the best predictive approaches to date for detecting horizontally transferred genes. First, for the original method, which does not consider strand asymmetry, C-SVC achieved a relative improvement (RI) in hit ratio as high as 31.47% for Escherichia coli K12, while one-class SVM achieved an RI of 11.61% for Borrelia burgdorferi. Moreover, as theoretically expected, the method that considers strand asymmetry yielded a higher RI than the original method. To examine how well the approach detects actual gene transfer events, it was applied to the genome of Enterococcus faecalis V583. It not only recovered all seven known horizontally transferred genes but also indicated that the whole segment from 7 kb upstream of gene EF2293 to 38 kb downstream of gene EF2299 was probably transferred into the E. faecalis V583 genome together with those seven genes.
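
    The evaluation measures mentioned above can be illustrated with a short sketch. The abstract does not define them, so two assumptions are made here: the hit ratio is taken to be the fraction of artificially inserted genes that the predictor recovers, and the relative improvement is taken to be the percentage gain of a method's hit ratio over a baseline's. The function names and the toy gene identifiers are hypothetical, and the numbers are not results from the paper.

```python
from statistics import mean
from typing import Iterable, Sequence


def hit_ratio(predicted: Iterable[str], inserted: Iterable[str]) -> float:
    """Fraction of inserted (donor) genes that appear among the predictions.

    Assumption: this is what the abstract means by "hit ratio".
    """
    inserted_set = set(inserted)
    if not inserted_set:
        raise ValueError("at least one inserted gene is required")
    return len(set(predicted) & inserted_set) / len(inserted_set)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline` (assumed definition of RI)."""
    if baseline == 0:
        raise ValueError("baseline hit ratio must be non-zero")
    return 100.0 * (new - baseline) / baseline


def mean_hit_ratio(runs: Sequence[float]) -> float:
    """Arithmetic mean over repeated transfer-and-recover runs."""
    return mean(runs)


# Toy example: 4 inserted phage genes, 2 of them recovered.
toy_ratio = hit_ratio(["g1", "g2", "x9"], ["g1", "g2", "g3", "g4"])
# Toy comparison of a method (0.6) against a baseline (0.5).
toy_ri = relative_improvement(0.6, 0.5)
```

    Averaging the per-run hit ratios before comparing methods mirrors the repeated transfer-and-recover design described in the abstract, where a single random insertion could flatter or penalize a predictor by chance.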

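Cite this article

Wu Jiansheng, Xie Jianming, Zhou Tong, Weng Jianhong, Sun Xiao. Support Vector Machine for Prediction of Horizontal Gene Transfers in Bacteria Genomes [J]. Progress in Biochemistry and Biophysics, 2007, 34(7): 724-731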

History
  • Received: 2006-12-01
  • Last revised: 2007-05-16
  • Published online: 2007-05-23
  • Publication date: 2007-07-20