Locating rearrangement events in a phylogeny based on highly fragmented assemblies
Authors: Chunfang Zheng, David Sankoff
Institution:1. Unité de Virologie Médicale, Institut Pasteur de Dakar, 36 Avenue Pasteur, B.P. 220, Dakar, Sénégal
2. Clermont Université, Université Blaise Pascal, Laboratoire Microorganismes, Génome et Environnement, UMR 6023, CNRS, 63177, Aubière, France
3. Laboratoire de Parasitologie Générale, Département de Biologie Animale, Faculté des Sciences et Technologies, Université Cheikh Anta Diop, Dakar, Sénégal
4. Unité de Bioinformatique Structurale, UMR 3528 CNRS, Institut Pasteur, 25-28, rue du Dr Roux, 75015, Paris, France
5. Laboratoire de Météorologie Physique, OPGC UMR 6016 CNRS-Université Blaise Pascal, 24 Avenue des Landais, 63177, Aubière Cedex, France
6. CIRAD, UMR 17, Cirad-Ird, TA-A17/G, Campus International de Baillarguet, 34398, Montpellier, France
7. Section of Infectious Disease and Department of Microbial Pathogenesis, Winchester Building WWW403D, Yale School of Medicine, 15 York St., New Haven, CT, 06520, USA
8. Institut de Recherche en Cancérologie de Montpellier, IRCM - INSERM U1194 & Université de Montpellier & ICM, Institut régional du Cancer Montpellier, Campus Val d’Aurelle, 34298, Montpellier cedex 5, France
9. Institut de Biologie Computationnelle, IBC, Campus Saint Priest, 34090, Montpellier, France
Abstract:

Background

The inference of genome rearrangement operations requires complete genome assemblies as input data, since a rearrangement can involve an arbitrarily large proportion of one or more chromosomes. Most genome sequencing projects, especially those on non-model organisms for which no physical map exists, produce very fragmented assemblies, so that a rearranged fragment may be impossible to identify because its two endpoints are on different scaffolds. However, breakpoints are easily identified, as long as they do not coincide with scaffold ends. In a phylogenetic context, when a fragmented assembly is compared with a number of complete assemblies, certain combinatorial constraints on breakpoints can be derived. We ask to what extent we can use breakpoint data between a fragmented genome and a number of complete genomes to recover all the rearrangements in a phylogeny.
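
The point about breakpoints and scaffold ends can be illustrated with a minimal sketch. It assumes genomes are encoded as lists of signed integer markers and that a breakpoint is an adjacency inside a scaffold that is absent from the complete genome in either orientation; the function names `adjacencies` and `observable_breakpoints` are hypothetical and are not taken from the paper.

```python
def adjacencies(chromosome):
    """Return all signed adjacencies of a linear chromosome, in both reading directions."""
    adj = set()
    for a, b in zip(chromosome, chromosome[1:]):
        adj.add((a, b))      # read left to right
        adj.add((-b, -a))    # same adjacency read on the opposite strand
    return adj


def observable_breakpoints(scaffolds, reference):
    """List adjacencies inside scaffolds that do not occur in the complete reference genome.

    scaffolds: list of scaffolds, each a list of signed markers (fragmented genome).
    reference: list of chromosomes, each a list of signed markers (complete genome).
    Adjacencies that span two scaffolds are never examined, so a breakpoint
    that coincides with a scaffold end goes undetected.
    """
    ref_adj = set()
    for chromosome in reference:
        ref_adj |= adjacencies(chromosome)
    breakpoints = []
    for scaffold in scaffolds:
        for a, b in zip(scaffold, scaffold[1:]):
            if (a, b) not in ref_adj:
                breakpoints.append((a, b))
    return breakpoints


reference = [[1, 2, 3, 4, 5, 6]]
# The fragmented genome carries an inversion of markers 2..3: 1 -3 -2 4 5 6.
print(observable_breakpoints([[1, -3], [-2, 4, 5, 6]], reference))
print(observable_breakpoints([[1], [-3, -2, 4, 5, 6]], reference))
```

In the first call both breakpoints of the inversion lie inside scaffolds and are reported. In the second call the scaffold boundary falls between markers 1 and -3, so only the breakpoint between -2 and 4 is visible.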

Results

We simulate genomic evolution via chromosomal inversion, fragmenting one of the genomes into a large number of scaffolds to represent the incompleteness of assembly. We identify all the breakpoints between this genome and the remainder. We devise an algorithm which takes these breakpoints into account in trying to determine on which branch of the phylogeny a rearrangement event occurred. We present an analysis of the dependence of recovery rates on scaffold size and rearrangement rate, and show that the true tree, the one on which the rearrangement simulation was performed, tends to be the most parsimonious in terms of the number of events inferred.
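
The simulation setup described here might be sketched as follows for a single lineage. This is an illustration only, not the authors' procedure: it assumes one linear chromosome of signed markers, inversion endpoints drawn uniformly at random, and scaffold boundaries placed uniformly at random; the names `invert`, `evolve` and `fragment` are hypothetical. Evolving genomes along the branches of a phylogeny would repeat `evolve` on each branch.

```python
import random


def invert(genome, i, j):
    """Return a copy of genome with the segment genome[i:j] reversed and its signs flipped."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def evolve(genome, n_inversions, rng):
    """Apply n_inversions random inversions (assumed uniform endpoints) along one branch."""
    for _ in range(n_inversions):
        i, j = sorted(rng.sample(range(len(genome) + 1), 2))
        genome = invert(genome, i, j)
    return genome


def fragment(genome, n_scaffolds, rng):
    """Cut genome into n_scaffolds non-empty contiguous scaffolds at random positions."""
    cuts = sorted(rng.sample(range(1, len(genome)), n_scaffolds - 1))
    bounds = [0] + cuts + [len(genome)]
    return [genome[start:end] for start, end in zip(bounds, bounds[1:])]


rng = random.Random(0)                  # fixed seed so the run is repeatable
ancestor = list(range(1, 21))           # 20 markers in their ancestral order
descendant = evolve(ancestor, 5, rng)   # one branch carrying 5 inversions
scaffolds = fragment(descendant, 4, rng)
print(scaffolds)
```

Concatenating the scaffolds in order gives back the evolved genome, so the only information lost to fragmentation is the set of adjacencies across scaffold boundaries.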

Conclusions

It is somewhat surprising that the breakpoints identified just between the fragmented genome and each of the others suffice to recover most of the rearrangements produced by the simulations. This holds even in parts of the phylogeny disjoint from the lineage of the fragmented genome.

This article is indexed in SpringerLink and other databases.