Resolving confusion of tongues in statistics and machine learning: a primer for biologists and bioinformaticians

Authors:	Maarten van Iterson, Herman H. H. B. M. van Haagen, Jelle J. Goeman

Affiliation:	Center for Human and Clinical Genetics, Leiden University Medical Center, The Netherlands. m.van_iterson.hg@lumc.nl

Abstract:	Bioinformatics is a field in which computational methods from various domains have come together for the analysis of biological data. Each domain has introduced its own specific jargon. However, in closely related domains, e.g. machine learning and statistics, both concordant and discordant terminology occurs, and the latter can lead to confusion. This article aims to help resolve the confusion of tongues arising from these two closely related domains, which are frequently used in bioinformatics. We provide a short summary of the machine learning and statistical approaches to data analysis most commonly applied in bioinformatics, i.e. classification and statistical hypothesis testing. We explain differences and similarities in common terminology used in the various domains, such as precision, recall, sensitivity and true positive rate. This primer can serve as a guide to the terminology used in these fields.
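As a small illustration of the terminology overlap the abstract mentions (this sketch is not taken from the article), the Python snippet below computes several of the named quantities from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts are hypothetical, chosen only for the example. It shows that recall, sensitivity and true positive rate are three names for the same ratio, while precision is a different quantity.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives (hypothetical inputs here).
    """
    # Fraction of actual positives that were predicted positive.
    recall = tp / (tp + fn)
    return {
        "precision": tp / (tp + fp),            # also: positive predictive value
        "recall": recall,                       # machine learning term
        "sensitivity": recall,                  # statistics / diagnostics term
        "true_positive_rate": recall,           # ROC analysis term
        "specificity": tn / (tn + fp),          # also: true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, the three synonymous entries all print the same value, whereas precision differs because its denominator is the number of predicted positives rather than the number of actual positives.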

Keywords:
This article is indexed in PubMed and other databases.