A note on the use of multiple linear regression in molecular ecology

Authors: Timothy R Frasier

Institution: Department of Biology, Saint Mary's University, Halifax, Nova Scotia, Canada
Abstract: Multiple linear regression analyses (also often referred to as generalized linear models [GLMs] or generalized linear mixed models [GLMMs]) are widely used in molecular ecology, often to assess the relative effects of genetic characteristics on individual fitness or traits, or to determine how environmental characteristics influence patterns of genetic differentiation. However, the coefficients resulting from multiple regression analyses are sometimes misread, which can lead to incorrect conclusions within individual studies and can propagate into more widespread errors in the general understanding of a topic. The primary issue concerns the interpretation of coefficients for independent variables when interaction terms are also included in the analyses. In this scenario, the coefficient associated with each independent variable is often interpreted as that predictor's independent effect on the predicted variable. However, this interpretation is incorrect. The correct interpretation is that these coefficients represent the effect of each predictor variable on the predicted variable when all other predictor variables are zero. This difference may sound subtle, but the ramifications cannot be overstated. Here, my goals are to raise awareness of this issue, to demonstrate and emphasize the problems that can result, and to provide alternative approaches for obtaining the desired information.

Keywords: coefficient interpretation, regression, statistics
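The abstract's central point can be checked numerically. The sketch below is a hypothetical illustration, not the paper's analysis: it simulates data from y = 1 + 0.5*x1 + 0.3*x2 + 0.2*x1*x2 plus noise, with both predictors drawn from 10 to 20 (an assumption chosen so that zero lies outside the observed data), and fits the interaction model twice using `fit_ols`, a small least-squares helper introduced here only for the example. Because the slope of y on x1 in this model is b1 + b3*x2, the x1 coefficient b1 describes the effect of x1 only at x2 = 0. Mean-centering the predictors is shown as one common way to move that reference point to the mean of x2; it is not necessarily the alternative the paper itself recommends.

```python
import random


def fit_ols(rows, y):
    """Hypothetical helper: ordinary least squares via the normal equations
    (X'X) b = X'y, solved by Gaussian elimination with partial pivoting.
    Adequate for a few predictors; real analyses would use a stats package."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            for c in range(col, p + 1):
                a[k][c] -= f * a[col][c]
    # Back-substitution
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (a[i][p] - sum(a[i][j] * coef[j] for j in range(i + 1, p))) / a[i][i]
    return coef


random.seed(1)
n = 500
# Hypothetical predictors that never come near zero
x1 = [random.uniform(10, 20) for _ in range(n)]
x2 = [random.uniform(10, 20) for _ in range(n)]
# Simulated truth: the slope of y on x1 is 0.5 + 0.2*x2, so it depends on x2
y = [1 + 0.5 * u + 0.3 * v + 0.2 * u * v + random.gauss(0, 0.2)
     for u, v in zip(x1, x2)]

# Model 1: raw predictors. b_raw[1] is the x1 slope when x2 = 0,
# a value outside the range of the data.
b_raw = fit_ols([[1, u, v, u * v] for u, v in zip(x1, x2)], y)

# Model 2: mean-centered predictors. b_cen[1] is the x1 slope when x2 is at its mean.
m1, m2 = sum(x1) / n, sum(x2) / n
c1 = [u - m1 for u in x1]
c2 = [v - m2 for v in x2]
b_cen = fit_ols([[1, u, v, u * v] for u, v in zip(c1, c2)], y)

print(f"x1 coefficient, raw predictors:      {b_raw[1]:.3f}")
print(f"x1 coefficient, centered predictors: {b_cen[1]:.3f}")
print(f"interaction coefficient (both fits): {b_raw[3]:.3f} {b_cen[3]:.3f}")
```

With these simulated settings, the raw fit should give an x1 coefficient close to 0.5 and the centered fit one close to 0.5 + 0.2 * mean(x2), roughly 3.5, while the interaction coefficient is the same in both fits. Neither estimate is wrong; they answer different questions, and reading the first as "the effect of x1" would understate the effect of x1 at any value of x2 actually present in these data.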