Prediction of enhancer-promoter interactions via natural language processing
Authors: Wanwen Zeng, Mengmeng Wu, Rui Jiang
Affiliation: 1. MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, Beijing, China; 2. Department of Automation, Tsinghua University, Beijing, China; 3. Department of Computer Science, Tsinghua University, Beijing, China
Abstract:

Background

Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important for deciphering gene regulation, cell differentiation and disease mechanisms. However, distinguishing true interactions from nearby non-interacting pairs remains challenging, because traditional experimental methods are limited by low resolution or low throughput.

Results

We propose a novel computational framework, EP2vec, to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method from natural language processing. Then, we train a classifier to predict EPIs from the learned representations in a supervised manner. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, outperforming existing methods. We demonstrate the robustness of sequence embedding features through sensitivity analysis. In addition, by adopting an attention mechanism to analyze the learned sequence embedding features, we identify motifs that represent cell line-specific information. Finally, we show that even better performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features with experimental features.
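
As a rough illustration of two ideas mentioned above, the sketch below shows one way a variable-length DNA sequence could be turned into "words" before an embedding model is applied, and how an F1 score is computed for binary EPI predictions. The abstract does not say how sequences are tokenized, so the overlapping k-mer scheme, the helper names `kmer_tokens` and `f1_score`, and the defaults `k=6` and `stride=1` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def kmer_tokens(seq: str, k: int = 6, stride: int = 1) -> List[str]:
    """Split a DNA sequence into overlapping k-mer 'words'.

    Hypothetical helper: k-mer tokenization is an assumed preprocessing
    step, not something stated in the abstract.
    """
    seq = seq.upper()
    # Slide a window of width k along the sequence in steps of `stride`.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, each enhancer or promoter becomes a "sentence" of k-mer tokens, which an unsupervised embedding method could map to a fixed-length vector regardless of the sequence length.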

Conclusions

EP2vec sheds light on feature extraction for DNA sequences of arbitrary length and provides a powerful approach for EPI identification.
This article is indexed in SpringerLink and other databases.