Inference of global HIV-1 sequence patterns and preliminary feature analysis
Authors:Yan Wang  Reda Rawi  Daniel Hoffmann  Binlian Sun  Rongge Yang
Institution:1. AIDS and HIV Research Group, State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, China; 2. Research Group for Bioinformatics, Center for Medical Biology, University of Duisburg-Essen, Essen 45141, Germany
Abstract:The epidemiology of HIV-1 varies across different areas of the world, and this complexity may leave unique footprints in the viral genome. We therefore looked for significant patterns in global HIV-1 genome sequences. By applying the rule inference algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction) to multiple sequence alignments of Env sequences from four classes of compiled datasets, we generated four sets of signature patterns. These patterns distinguished southeastern Asian from non-southeastern Asian sequences with 97.5% accuracy, Chinese from non-Chinese sequences with 98.3% accuracy, African from non-African sequences with 88.4% accuracy, and southern African from non-southern African sequences with 91.2% accuracy. The patterns showed different associations with subtypes and with amino acid positions. In addition, some signature patterns were characteristic of the geographic area from which the sample was taken. Amino acid features corresponding to the phylogenetic clustering of HIV-1 sequences were consistent with some of the deduced patterns. Using a combination of patterns inferred from subtypes B, C, and all subtypes chimeric with CRF01_AE worldwide, we found that signature patterns of subtype C were extremely common in some sampled countries (for example, Zambia in southern Africa), which may hint at the origin of this HIV-1 subtype and at the need to pay special attention to this area of Africa. Signature patterns of subtype B sequences were associated with different countries. Moreover, position 21 alone showed distinct patterns: glycine, leucine, and isoleucine corresponded to subtype C, subtype B, and all possible recombinant forms chimeric with CRF01_AE, respectively, which also indicates distinct geographic features. Our method widens the scope of signature inference from geographic, genetic, and genomic viewpoints.
These findings may provide a valuable reference for epidemiological research or vaccine design.
Keywords:Pattern inference  global HIV-1 sequence  Repeated Incremental Pruning to Produce Error Reduction (RIPPER)
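As an illustration of what a signature pattern means in practice, the sketch below treats a pattern as a conjunction of required residues at given alignment positions and scores it by classification accuracy. It is not RIPPER and not the authors' pipeline; the names `TOY_ALIGNMENT`, `rule_matches`, and `rule_accuracy` are hypothetical, and the four-column sequences are invented toy data, not HIV-1 Env sequences.

```python
from typing import Dict, List, Tuple

# Invented toy alignment (assumption, not real data): (aligned sequence, class label).
TOY_ALIGNMENT: List[Tuple[str, str]] = [
    ("MGLK", "target"),
    ("MGIK", "target"),
    ("MLLK", "other"),
    ("MILK", "other"),
    ("MGLR", "other"),
]

# A rule maps a 1-based alignment position to the residue required there.
Rule = Dict[int, str]


def rule_matches(rule: Rule, seq: str) -> bool:
    """Return True if the sequence carries every residue the rule requires."""
    return all(seq[pos - 1] == residue for pos, residue in rule.items())


def rule_accuracy(rule: Rule, data: List[Tuple[str, str]],
                  positive: str = "target") -> float:
    """Fraction of sequences where 'rule matches' agrees with 'label is positive'."""
    correct = sum(
        rule_matches(rule, seq) == (label == positive) for seq, label in data
    )
    return correct / len(data)


if __name__ == "__main__":
    single = {2: "G"}             # one-position pattern
    combined = {2: "G", 4: "K"}   # conjunction of two positions
    print(rule_accuracy(single, TOY_ALIGNMENT))
    print(rule_accuracy(combined, TOY_ALIGNMENT))
```

On this toy data the single-position rule misclassifies one sequence, while adding a second condition separates the two classes. A rule learner such as RIPPER grows and prunes conditions of this kind automatically.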
Source: Virologica Sinica (《中国病毒学》); indexed in SpringerLink and other databases.