Automatic Scoring and Quality Assessment Using Accuracy Bounds for FP-TDI SNP Genotyping Data

Authors: Maik Kschischo, Rainer Kern, Christian Gieger, Martin Steinhauser, Dr Ralf Tolle

Affiliation: University of Applied Sciences Koblenz, RheinAhrCampus, Remagen, Germany.

Abstract:

BACKGROUND: Human genetic diversity, in particular single nucleotide polymorphisms (SNPs), is becoming a focus of biomedical research. Despite the binary nature of SNP determination, most genotyping assay data require critical evaluation before genotypes can be called. We applied statistical models to improve the automated analysis of two-dimensional SNP data.

METHODS: Within the framework of Gaussian mixture models, we derived several quantities that serve as figures of merit for objectively measuring data quality. The accuracy of an individual observation is scored as its probability of belonging to a given genotype cluster, while assay quality is measured by the overlap between the genotype clusters.

RESULTS: The approach was tested extensively on a dataset of 438 nonredundant SNP assays comprising more than 150,000 data points. The performance of our automatic scoring method was compared with manual assignments. Agreement on overall assay quality is remarkably good, and when stringent probability thresholds were applied, manual and automatic scoring disagreed on 2.6% of individual observations.

CONCLUSION: Our definition of accuracy bounds for complete assays, expressed as misclassification probabilities, goes beyond other proposed analysis methods. We expect the scoring method to minimise human intervention and to provide a more objective error estimate in genotype calling.
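The abstract does not state the formulas, so the following is only a minimal sketch of the two ideas under stated assumptions. Each genotype cluster of one assay is modelled as a bivariate Gaussian with known, hypothetical weight, mean and covariance (in practice these would be fitted, for example by expectation-maximisation). An individual observation is scored by its posterior probability of cluster membership (Bayes' rule). For the assay-level overlap, the sketch swaps in the standard Bhattacharyya bound on pairwise misclassification probability and sums it over cluster pairs; this is an illustrative choice, not necessarily the authors' definition. All function names, parameter values and the 0.99 threshold are hypothetical.

```python
import math
from itertools import combinations

# Minimal sketch, NOT the authors' implementation. Each genotype cluster of one
# SNP assay is assumed to be a bivariate Gaussian with known (hypothetical)
# weight, mean and 2x2 covariance; in practice these would be fitted, e.g. by EM.


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^-1 (x - mu)."""
    dx = (x[0] - mu[0], x[1] - mu[1])
    ic = inv2(cov)
    return (dx[0] * (ic[0][0] * dx[0] + ic[0][1] * dx[1])
            + dx[1] * (ic[1][0] * dx[0] + ic[1][1] * dx[1]))


def gauss_pdf(x, mu, cov):
    """Bivariate normal density at point x."""
    norm = 2.0 * math.pi * math.sqrt(det2(cov))
    return math.exp(-0.5 * mahalanobis_sq(x, mu, cov)) / norm


def posteriors(x, weights, means, covs):
    """Per-observation score: P(cluster k | x) for every cluster k (Bayes' rule)."""
    joint = [w * gauss_pdf(x, m, c) for w, m, c in zip(weights, means, covs)]
    total = sum(joint)
    return [j / total for j in joint]


def bhattacharyya_bound(w1, mu1, cov1, w2, mu2, cov2):
    """Upper bound on the probability of confusing two weighted Gaussian clusters."""
    cov = [[(cov1[i][j] + cov2[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    dist = (mahalanobis_sq(mu1, mu2, cov) / 8.0
            + 0.5 * math.log(det2(cov) / math.sqrt(det2(cov1) * det2(cov2))))
    return math.sqrt(w1 * w2) * math.exp(-dist)


def assay_error_bound(weights, means, covs):
    """Assay-level bound: sum of the pairwise bounds over all cluster pairs."""
    return sum(
        bhattacharyya_bound(weights[i], means[i], covs[i],
                            weights[j], means[j], covs[j])
        for i, j in combinations(range(len(weights)), 2))


if __name__ == "__main__":
    # Hypothetical assay: homozygote X, heterozygote, homozygote Y.
    weights = [0.3, 0.4, 0.3]
    means = [(1.0, 0.1), (0.6, 0.6), (0.1, 1.0)]
    covs = [[[0.01, 0.0], [0.0, 0.01]]] * 3
    p = posteriors((0.95, 0.15), weights, means, covs)
    threshold = 0.99  # hypothetical stringency level
    best = max(range(3), key=lambda k: p[k])
    print("call:", best if p[best] >= threshold else "no call")
    print("assay error bound: %.4f" % assay_error_bound(weights, means, covs))
```

Summing the pairwise bounds is valid for three or more clusters because the Bayes error never exceeds the sum, over all cluster pairs, of the overlap between each pair of weighted densities. Raising the threshold leaves more observations uncalled but makes each call more reliable, which mirrors the trade-off behind the stringent thresholds mentioned in the RESULTS.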

This article is indexed in databases including PubMed and SpringerLink.