Preprocessing choices affect RNA velocity results for droplet scRNA-seq data |
| |
Authors: | Charlotte Soneson, Avi Srivastava, Rob Patro, Michael B. Stadler |
| |
Affiliation: | 1. Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland; 2. SIB Swiss Institute of Bioinformatics, Basel, Switzerland; 3. New York Genome Center, New York, United States of America; 4. Center for Genomics and Systems Biology, New York University, New York, United States of America; 5. Department of Computer Science, University of Maryland, College Park, Maryland, United States of America; 6. University of Basel, Basel, Switzerland; Central South University, China |
| |
Abstract: | Experimental single-cell approaches are becoming widely used for many purposes, including investigation of the dynamic behaviour of developing biological systems. Consequently, a large number of computational methods for extracting dynamic information from such data have been developed. One example is RNA velocity analysis, in which spliced and unspliced RNA abundances are jointly modeled in order to infer a ‘direction of change’ and thereby a future state for each cell in the gene expression space. Naturally, the accuracy and interpretability of the inferred RNA velocities depend crucially on the correctness of the estimated abundances. Here, we systematically compare five widely used quantification tools, in total yielding thirteen different quantification approaches, in terms of their estimates of spliced and unspliced RNA abundances in five experimental droplet scRNA-seq data sets. We show that there are substantial differences between the quantifications obtained from different tools, and identify typical genes for which such discrepancies are observed. We further show that these abundance differences propagate to the downstream analysis, and can have a large effect on estimated velocities as well as the biological interpretation. Our results highlight that abundance quantification is a crucial aspect of the RNA velocity analysis workflow, and that both the definition of the genomic features of interest and the quantification algorithm itself require careful consideration. |
| |
Keywords: | |
|
|