Resolving allele dosage in duplicated loci using genotyping‐by‐sequencing data: A path forward for population genetic analysis |
| |
Authors: | Garrett J. McKinney, Ryan K. Waples, Carita E. Pascal, Lisa W. Seeb, James E. Seeb |
| |
Affiliation: | School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA, USA |
| |
Abstract: | Whole‐genome duplications have occurred in the recent ancestors of many plants, fish and amphibians. Signals of these whole‐genome duplications still exist in the form of paralogous loci. Recent advances have allowed reliable identification of paralogs in genotyping‐by‐sequencing (GBS) data such as that generated from restriction‐site‐associated DNA sequencing (RADSeq); however, excluding paralogs from analyses is still routine due to difficulties in genotyping. This exclusion of paralogs may filter a large fraction of loci, including loci that may be adaptively important or informative for population genetic analyses. We present a maximum‐likelihood method for inferring allele dosage in paralogs and assess its accuracy using simulated GBS, empirical RADSeq and amplicon sequencing data from Chinook salmon. We accurately infer allele dosage for some paralogs from a RADSeq data set and show how accuracy is dependent upon both read depth and allele frequency. The amplicon sequencing data set, using RADSeq‐derived markers, achieved sufficient depth to infer allele dosage for all paralogs. This study demonstrates that RADSeq locus discovery combined with amplicon sequencing of targeted loci is an effective method for incorporating paralogs into population genetic analyses. |
| |
Keywords: | amplicon sequencing; genome duplication; genotyping; paralog; RADSeq |
|
|
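
The abstract describes a maximum-likelihood approach to inferring allele dosage in paralogs and notes that accuracy depends on read depth. The sketch below illustrates that general idea only; it is not the authors' published model. It assumes a duplicated locus carrying four allele copies, a simple binomial model of read counts, and a fixed per-read sequencing error rate. The function names, the `copies` and `error` parameters, and their default values are all hypothetical.

```python
from math import comb, log


def dosage_log_likelihoods(alt_reads, total_reads, copies=4, error=0.01):
    """Return the binomial log-likelihood of each alternate-allele dosage.

    Assumptions (illustrative, not taken from the paper):
    - the locus carries `copies` allele copies in total (4 for a duplicated locus)
    - each read samples one copy at random
    - a read is miscalled with probability `error`
    """
    log_likelihoods = []
    for dosage in range(copies + 1):
        true_fraction = dosage / copies
        # Expected fraction of alternate reads after allowing for miscalls.
        p_alt = true_fraction * (1 - error) + (1 - true_fraction) * error
        ll = (
            log(comb(total_reads, alt_reads))
            + alt_reads * log(p_alt)
            + (total_reads - alt_reads) * log(1 - p_alt)
        )
        log_likelihoods.append(ll)
    return log_likelihoods


def infer_dosage(alt_reads, total_reads, copies=4, error=0.01):
    """Return the dosage (0..copies) with the highest log-likelihood."""
    lls = dosage_log_likelihoods(alt_reads, total_reads, copies, error)
    return max(range(len(lls)), key=lls.__getitem__)


def support_margin(alt_reads, total_reads, copies=4, error=0.01):
    """Log-likelihood gap between the best and second-best dosage."""
    lls = sorted(dosage_log_likelihoods(alt_reads, total_reads, copies, error))
    return lls[-1] - lls[-2]


if __name__ == "__main__":
    # Same observed alternate-read fraction (25%) at low and high depth.
    for alt, total in [(1, 4), (10, 40)]:
        print(alt, total, infer_dosage(alt, total), support_margin(alt, total))
```

Under these assumptions, the same observed read fraction yields the same best dosage at both depths, but the gap to the next-best dosage is much larger at higher depth. That is one way to see why a higher-depth data set could resolve dosage at loci where a lower-depth data set cannot.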