Resampling Nucleotide Sequences with Closest-Neighbor Trimming and Its Comparison to Other Methods
Authors: Kouki Yonezawa, Manabu Igarashi, Keisuke Ueno, Ayato Takada, Kimihito Ito
Institution: 1. Department of Computer Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama, Shiga-pref, Japan; 2. Division of Bioinformatics, Hokkaido University Research Center for Zoonosis Control, Kita-ku, Sapporo, Japan; 3. Division of Global Epidemiology, Hokkaido University Research Center for Zoonosis Control, Kita-ku, Sapporo, Japan; George Mason University, United States of America
Abstract: A large number of nucleotide sequences of various pathogens are available in public databases. The growth of these datasets has resulted in an enormous increase in computational costs. Moreover, due to differences in surveillance activities, the number of sequences found in databases varies from one country to another and from year to year. Therefore, it is important to study resampling methods that reduce this sampling bias. We propose a novel algorithm, called the closest-neighbor trimming method, that resamples a given number of sequences from a large nucleotide sequence dataset. The performance of the proposed algorithm was compared with that of other algorithms, namely the naive hierarchical clustering algorithm and the k-medoids clustering algorithm, using the nucleotide sequences of human H3N2 influenza viruses. Genetic information accumulated in public databases contains sampling bias. The closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials in the life sciences, we anticipate that applying our algorithm to various datasets will help reduce sampling bias.
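The abstract does not spell out the steps of the closest-neighbor trimming method, so the following is only a minimal sketch of one plausible reading: repeatedly find the closest pair of sequences and drop one member of that pair until the requested number remains. The function names `hamming` and `closest_neighbor_trim`, the use of Hamming distance on pre-aligned sequences, and the rule for which member of the pair is dropped are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_neighbor_trim(seqs: list[str], n_keep: int) -> list[str]:
    """Thin out densely sampled sequences until n_keep remain.

    Assumed procedure (not confirmed by the abstract): at each step,
    locate the pair with the smallest distance and remove one of the two.
    This naive version recomputes all pairwise distances on every pass.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    kept = list(seqs)
    while len(kept) > n_keep:
        # Index pair (i, j), i < j, with the smallest distance; ties go to
        # the first such pair in enumeration order.
        i, j = min(
            combinations(range(len(kept)), 2),
            key=lambda p: hamming(kept[p[0]], kept[p[1]]),
        )
        # Arbitrary illustrative choice: drop the later member of the pair.
        del kept[j]
    return kept


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AAAA", "GGGG", "CCCC"]
    print(closest_neighbor_trim(sample, 3))
```

In this toy input the three near-identical `A`-rich sequences form a dense cluster, so two of them are removed while the isolated sequences survive. A practical version would cache the distance matrix rather than recompute it on each pass.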