Learning Natural Selection from the Site Frequency Spectrum |
| |
Authors: | Roy Ronen, Nitin Udpa, Eran Halperin, Vineet Bafna |
| |
Affiliation: | * Bioinformatics and Systems Biology Program, University of California, San Diego, California 92093; † The Blavatnik School of Computer Science and Department of Molecular Microbiology and Biotechnology, Tel-Aviv University, Tel-Aviv 69978, Israel, International Computer Science Institute, Berkeley, California 94704; ‡ Department of Computer Science and Engineering, University of California, San Diego, California 92093 |
| |
Abstract: | Genetic adaptation to external stimuli occurs through the combined action of mutation and selection. A central problem in genetics is to identify loci responsive to specific selective constraints. Many tests have been proposed to identify the genomic signatures of natural selection by quantifying the skew in the site frequency spectrum (SFS) under selection relative to neutrality. We build upon recent work that connects many of these tests under a common framework, by describing how selective sweeps affect the scaled SFS. We show that the specific skew depends on many attributes of the sweep, including the selection coefficient and the time under selection. Using supervised learning on extensive simulated data, we characterize the features of the scaled SFS that best separate different types of selective sweeps from neutrality. We develop a test, SFselect, that consistently outperforms many existing tests over a wide range of selective sweeps. We apply SFselect to polymorphism data from a laboratory evolution experiment of Drosophila melanogaster adapted to hypoxia and identify loci that strengthen the role of the Notch pathway in hypoxia tolerance, but were missed by previous approaches. We further apply our test to human data and identify regions that are in agreement with earlier studies, as well as many novel regions. |
| |
Keywords: | natural selection; supervised learning; site frequency spectrum |
|
|