Construction of a virtual Mycobacterium tuberculosis consensus genome and its application to data from a next generation sequencer
Authors:Kayo Okumura  Masako Kato  Teruo Kirikae  Mitsunori Kayano  Tohru Miyoshi-Akiyama
Institution: Department of Animal and Food Hygiene, Obihiro University of Agriculture and Veterinary Medicine, Inada-cho, Obihiro, Hokkaido 080-8555, Japan; Department of Infectious Diseases, National Center for Global Health and Medicine, 1-21-1, Shinjuku-ku, Tokyo 162-8655, Japan
Abstract:

Background

Mycobacterium tuberculosis isolates comprise several different lineages, yet epidemiological analyses are usually performed relative to a single reference genome, M. tuberculosis H37Rv, which might bias the results. These analyses are essentially based on M. tuberculosis genome sequence information and could in theory be performed in silico, using whole genome sequence (WGS) data available in databases or obtained with next generation sequencers (NGSs). To establish higher-resolution methods for such analyses, the whole genome sequences of M. tuberculosis complex (MTBC) strains available in databases were aligned to construct a virtual reference genome sequence, called the consensus sequence (CS), and its feasibility for in silico epidemiological analyses was evaluated.
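As an illustration of the idea of deriving a consensus from aligned genomes, the following is a minimal sketch that takes the most frequent symbol in each alignment column. The majority-rule choice, the tie-breaking, the dropping of gap-majority columns and the function name `consensus_from_alignment` are all assumptions made for this example; the abstract does not state how the authors built the CS.

```python
from collections import Counter


def consensus_from_alignment(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Hypothetical helper for illustration only, not the authors' pipeline.
    Columns whose most frequent symbol is the gap character are dropped.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        # Highest count wins; on a tie a base is preferred over a gap,
        # and remaining ties resolve to the alphabetically first symbol.
        best = max(sorted(counts), key=lambda b: (counts[b], b != gap))
        if best != gap:
            consensus.append(best)
    return "".join(consensus)


if __name__ == "__main__":
    toy_alignment = ["ACGT-A", "ACGTTA", "ATGT-A"]
    print(consensus_from_alignment(toy_alignment))
```

A real MTBC alignment spans roughly 4.4 Mb per genome and would normally be read from an alignment file rather than held as Python strings; the toy input only shows the column-wise logic.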

Results

The consensus sequence (CS) was successfully constructed and used for phylogenetic analysis, for evaluation of read mapping efficacy, which is crucial for detecting single nucleotide polymorphisms (SNPs), and for virtual versions of various MTBC typing methods, including spoligotyping, VNTR, large sequence polymorphism (LSP) and Beijing typing. SNPs detected against CS and against H37Rv were used in concatemer-based phylogenetic analysis, and their reliability was assessed against a phylogenetic tree based on whole genome alignment as the gold standard. Statistical comparison of the phylogenetic trees based on CS with those based on H37Rv indicated that the former always gave better results than the latter. SNP detection and concatenation with CS was advantageous because the frequency of crucial SNPs that distinguish strain lineages was higher than with H37Rv. The number of SNPs detected was lower with CS than with the H37Rv sequence, resulting in a significant reduction in computational time. The performance of each virtual typing method was satisfactory and agreed with published results where these were available.
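The SNP concatemer approach mentioned above can be sketched as follows: positions where any strain differs from the reference are collected, and each strain's bases at those positions are joined into one short string suitable for tree building. This is a simplified sketch that assumes every strain sequence is already expressed in reference coordinates (same length, substitutions only); the function names `snp_concatemers` and `snp_count` are hypothetical and do not come from the paper.

```python
def snp_count(reference, sequence):
    """Number of positions at which `sequence` differs from `reference`."""
    if len(sequence) != len(reference):
        raise ValueError("sequence must be in reference coordinates")
    return sum(1 for ref_base, base in zip(reference, sequence) if ref_base != base)


def snp_concatemers(reference, strains):
    """Return (variable positions, {strain name: concatenated SNP bases}).

    `strains` maps a strain name to its sequence in reference coordinates.
    Illustrative only; real pipelines call SNPs from mapped reads.
    """
    for name, seq in strains.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not in reference coordinates")

    # A position is kept if at least one strain differs from the reference.
    positions = [
        i for i, ref_base in enumerate(reference)
        if any(seq[i] != ref_base for seq in strains.values())
    ]
    concatemers = {
        name: "".join(seq[i] for i in positions)
        for name, seq in strains.items()
    }
    return positions, concatemers


if __name__ == "__main__":
    reference = "ACGTACGT"
    strains = {"s1": "ACGTACGT", "s2": "ATGTACGA", "s3": "ATGTACGT"}
    positions, concatemers = snp_concatemers(reference, strains)
    print(positions, concatemers)
    print({name: snp_count(reference, seq) for name, seq in strains.items()})
```

Running the same functions with a different `reference` string changes both the set of variable positions and the per-strain SNP counts, which is the effect of reference choice that the comparison between CS and H37Rv examines.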

Conclusions

These results indicate that a virtual CS constructed from genome sequence data is an ideal reference for MTBC studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1368-9) contains supplementary material, which is available to authorized users.
Keywords:Mycobacterium tuberculosis  Consensus sequence  Virtual typing  Phylogenetic analysis  SNP concatemer