Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies
Authors: James F. Denton, Jose Lugo-Martinez, Abraham E. Tucker, Daniel R. Schrider, Wesley C. Warren, Matthew W. Hahn
Institution: 1. School of Informatics and Computing, Indiana University, Bloomington, Indiana; 2. Department of Biology, Indiana University, Bloomington, Indiana; 3. The Genome Institute at Washington University, Washington University School of Medicine, Saint Louis, Missouri; Center for Genomic Regulation, Spain
Abstract: Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
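The family-level comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `compare_family_counts`, the family labels, and the copy numbers are all hypothetical, and the sketch assumes that per-family gene counts have already been obtained for a draft assembly and for a higher-quality assembly of the same genome.

```python
def compare_family_counts(reference: dict, draft: dict) -> dict:
    """Summarize how per-family gene counts in a draft assembly differ
    from those in a higher-quality reference assembly.

    Both arguments map a gene family identifier to the number of genes
    annotated in that family. A family missing from one assembly is
    treated as having zero genes there (an assumption of this sketch).
    """
    families = set(reference) | set(draft)
    # Families where the draft annotation contains extra genes.
    gained = sum(1 for f in families if draft.get(f, 0) > reference.get(f, 0))
    # Families where the draft annotation is missing genes.
    lost = sum(1 for f in families if draft.get(f, 0) < reference.get(f, 0))
    total = len(families)
    return {
        "families": total,
        "gained": gained,
        "lost": lost,
        "fraction_wrong": (gained + lost) / total if total else 0.0,
    }


# Toy, made-up counts purely for illustration.
reference_counts = {"famA": 3, "famB": 1, "famC": 2, "famD": 5, "famE": 1}
draft_counts = {"famA": 4, "famB": 1, "famC": 1, "famD": 5, "famE": 1}

summary = compare_family_counts(reference_counts, draft_counts)
print(summary)
```

Counting gains and losses separately mirrors the abstract's observation that draft assemblies both add and subtract genes, so a net change near zero can still hide many incorrect families.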