

Automated tetraploid genotype calling by hierarchical clustering
Authors: Cari A. Schmitz Carley, Joseph J. Coombs, David S. Douches, Paul C. Bethke, Jiwan P. Palta, Richard G. Novy, Jeffrey B. Endelman
Affiliations: 1. Department of Horticulture, University of Wisconsin, Madison, USA; 2. Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, USA; 3. USDA Agricultural Research Service, Madison, USA; 4. USDA–ARS Small Grains and Potato Germplasm Research Unit, Aberdeen, USA

Key message

New software to make tetraploid genotype calls from SNP array data was developed, which uses hierarchical clustering and multiple F1 populations to calibrate the relationship between signal intensity and allele dosage.
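Calibrating dosage with F1 populations depends on the fact that, once the parental dosages are known, offspring dosage classes occur in predictable Mendelian proportions. The sketch below computes those expected ratios for an autotetraploid, assuming random chromosome segregation with no double reduction (each parent passes 2 of its 4 homologs). The function names and the chi-square matching step are hypothetical illustrations, not ClusterCall's actual code: `best_parent_dosages` picks the parental dosage pair whose expected classes best fit the observed cluster sizes.

```python
from fractions import Fraction
from math import comb

PLOIDY = 4  # autotetraploid

def gamete_dist(d):
    """P(gamete carries k = 0, 1, 2 copies) for a parent with dosage d (0..4).
    Assumes random chromosome segregation without double reduction."""
    total = comb(PLOIDY, 2)  # 6 ways to draw 2 of 4 homologs
    return [Fraction(comb(d, k) * comb(PLOIDY - d, 2 - k), total) for k in range(3)]

def f1_segregation(d1, d2):
    """Expected F1 dosage distribution (index = dosage 0..4) for parents d1 x d2."""
    g1, g2 = gamete_dist(d1), gamete_dist(d2)
    out = [Fraction(0)] * (PLOIDY + 1)
    for i, p in enumerate(g1):
        for j, q in enumerate(g2):
            out[i + j] += p * q  # offspring dosage = sum of gamete dosages
    return out

def best_parent_dosages(cluster_sizes):
    """Hypothetical heuristic: cluster_sizes are ordered by increasing allele signal.
    Returns (chi2, d1, d2, dosage_classes) for the best-fitting parental pair.
    Mirror-image pairs (e.g. 0x1 vs 3x4) fit equally well; the first one found
    is kept, so real data would need absolute intensity to break such ties."""
    n = sum(cluster_sizes)
    best = None
    for d1 in range(PLOIDY + 1):
        for d2 in range(d1, PLOIDY + 1):
            expected = f1_segregation(d1, d2)
            classes = [k for k, p in enumerate(expected) if p > 0]
            if len(classes) != len(cluster_sizes):
                continue  # number of segregating classes must match cluster count
            chi2 = sum((obs - n * expected[k]) ** 2 / (n * expected[k])
                       for obs, k in zip(cluster_sizes, classes))
            if best is None or chi2 < best[0]:
                best = (chi2, d1, d2, classes)
    return best

print([str(p) for p in f1_segregation(2, 0)])  # simplex x nulliplex gives 1:4:1
```

The print line yields `['1/6', '2/3', '1/6', '0', '0']`, the familiar 1:4:1 ratio for a duplex by nulliplex cross. Exact fractions are used so that ratios like 1/6 are not blurred by floating-point rounding.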

Abstract

SNP arrays are transforming breeding and genetics research for autotetraploids. To fully utilize these arrays, the relationship between signal intensity and allele dosage must be calibrated for each marker. We developed an improved computational method to automate this process, which is provided as the R package ClusterCall. In the training phase of the algorithm, hierarchical clustering within an F1 population is used to group samples with similar intensity values, and allele dosages are assigned to clusters based on expected segregation ratios. In the prediction phase, multiple F1 populations and the prediction set are clustered together, and the genotype for each cluster is the mode of the training set samples. A concordance metric, defined as the proportion of training set samples equal to the mode, can be used to eliminate unreliable markers and compare different algorithms. Across three potato families genotyped with an 8K SNP array, ClusterCall scored 5729 markers with at least 0.95 concordance (94.6% of its total), compared to 5325 with the software fitTetra (82.5% of its total). The three families were used to predict genotypes for 5218 SNPs in the SolCAP diversity panel, compared with 3521 SNPs in a previous study in which genotypes were called manually. One of the additional markers produced a significant association for vine maturity near a well-known causal locus on chromosome 5. In conclusion, when multiple F1 populations are available, ClusterCall is an efficient method for accurate autotetraploid genotype calling that enables the use of SNP data for research and plant breeding.
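The prediction phase and the concordance metric described above can be sketched in a few lines. This is an illustrative sketch, not ClusterCall's implementation: it assumes the joint hierarchical clustering of training and prediction samples has already been done elsewhere (so only cluster labels are passed in), and the function name `call_by_cluster_mode` and the choice to leave clusters without training samples uncalled are hypothetical.

```python
from collections import Counter

def call_by_cluster_mode(cluster_labels, training_genotypes):
    """cluster_labels: cluster ID for every sample (training + prediction),
    e.g. from cutting a hierarchical-clustering tree (assumed done elsewhere).
    training_genotypes: {sample_index: dosage} for training samples only.
    Returns (calls, concordance)."""
    members = {}
    for idx, label in enumerate(cluster_labels):
        members.setdefault(label, []).append(idx)

    cluster_call = {}
    agree = total = 0
    for label, idxs in members.items():
        train = [training_genotypes[i] for i in idxs if i in training_genotypes]
        if not train:
            cluster_call[label] = None  # assumption: no training samples, no call
            continue
        # Mode of training genotypes; ties go to the first value encountered.
        mode, _ = Counter(train).most_common(1)[0]
        cluster_call[label] = mode
        agree += sum(g == mode for g in train)  # training samples matching the mode
        total += len(train)

    calls = [cluster_call[label] for label in cluster_labels]
    concordance = agree / total if total else float("nan")
    return calls, concordance

# Samples 0-5 are training (known dosages); samples 6 and 7 are to be predicted.
labels = [0, 0, 0, 1, 1, 1, 0, 1]
training = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
calls, conc = call_by_cluster_mode(labels, training)
keep_marker = conc >= 0.95  # threshold matching the one quoted in the abstract
```

Here one training sample in cluster 0 disagrees with its cluster's mode, so concordance is 5/6 (about 0.83) and this marker would be discarded under a 0.95 cutoff, even though every sample still received a call.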
This article is indexed in SpringerLink and other databases.