Bagging GLM: Improved generalized linear model for the analysis of zero-inflated data
Authors: Takeshi Osawa, Hiromune Mitsuhashi, Yuta Uematsu, Atushi Ushimaru
Institutions: (a) National Institute for Agro-Environmental Science, 3-1-3, Kannondai, Tsukuba, Ibaraki, 305-8604, Japan; (b) The Museum of Nature and Human Activities Hyogo, 6, Yayoigaoka, Sanda, Hyogo, 669-1546, Japan; (c) Graduate School of Human Development and Environment, Kobe University, 3-11, Tsurukabuto, Nada-ku, Hyogo, 657-8501, Japan
Abstract: Species-occurrence data sets tend to contain a large proportion of zero values, i.e., absence records; such data are called zero-inflated. Unless these data are handled carefully, statistical inference based on them is likely to be inefficient or to lead to incorrect conclusions. In this study, we propose a new modeling method that overcomes the problems caused by zero-inflated data sets by combining a regression model with a machine-learning technique: the generalized linear model (GLM), which is widely used in ecology, and bootstrap aggregation (bagging). Using both the new method and a traditional GLM, we built distribution models for Vincetoxicum pycnostelma (a vascular plant) and Ninox scutulata (an owl), both endangered species with zero-inflated distribution patterns, and compared model performance. We also modeled four theoretical data sets containing different ratios of presence to absence values with both methods and compared their performance. For the species distribution models, the new method performed well relative to traditional GLMs: after bagging, area under the curve (AUC) values were almost the same as with traditional methods, but sensitivity values (the proportion of observed presences correctly predicted) were higher. The new method also showed higher sensitivity than the traditional GLM when modeling a theoretical data set containing a large proportion of zero values. These results indicate that the new method has high predictive ability for presence data when analyzing zero-inflated data sets, which is valuable because presences are generally more difficult to predict than absences. The new modeling method therefore has the potential to advance species distribution modeling.
Keywords: Bootstrapping; Data mining; Machine learning; Ninox scutulata; Regression model; Species distribution model; Vincetoxicum pycnostelma
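The abstract describes bagging GLM only at a high level. The Python sketch below illustrates the general idea on synthetic zero-inflated presence/absence data: fit a logistic (binomial, logit-link) GLM to each bootstrap resample, average the members' predicted probabilities, and compare AUC and sensitivity against a single GLM. It is not the authors' implementation. The fitting routine (plain gradient ascent rather than the iteratively reweighted least squares used by standard GLM software), the averaging rule, the number of bootstrap replicates, the prevalence-based threshold for sensitivity, and all function names and simulated data are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic (inverse-logit) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_prob(beta, xi):
    """Predicted presence probability; beta[0] is the intercept."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return sigmoid(eta)


def fit_logistic_glm(X, y, lr=0.5, epochs=200):
    """Binomial GLM with logit link, fitted by batch gradient ascent on the
    mean log-likelihood (assumes roughly unit-scaled covariates)."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_prob(beta, xi)  # residual on the response scale
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def bagging_glm(X, y, n_bags=15, seed=0, **fit_kwargs):
    """Fit one GLM per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_logistic_glm([X[i] for i in idx],
                                       [y[i] for i in idx], **fit_kwargs))
    return models


def bagged_predict(models, xi):
    """Hypothetical aggregation rule: average the members' probabilities."""
    return sum(predict_prob(beta, xi) for beta in models) / len(models)


def auc(y_true, scores):
    """Area under the ROC curve via Mann-Whitney pairwise comparison."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both presences and absences")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity(y_true, scores, threshold):
    """True-positive rate: share of observed presences predicted present."""
    hits = [s >= threshold for s, t in zip(scores, y_true) if t == 1]
    if not hits:
        raise ValueError("sensitivity needs at least one presence")
    return sum(hits) / len(hits)


def simulate(n=300, intercept=-3.0, slope=2.0, seed=1):
    """Synthetic zero-inflated presence/absence data (mostly zeros)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        X.append([x])
        y.append(1 if rng.random() < sigmoid(intercept + slope * x) else 0)
    return X, y


X, y = simulate()
threshold = sum(y) / len(y)  # hypothetical cut-off: observed prevalence
glm = fit_logistic_glm(X, y)
bag = bagging_glm(X, y)
glm_scores = [predict_prob(glm, xi) for xi in X]
bag_scores = [bagged_predict(bag, xi) for xi in X]
for name, s in (("GLM", glm_scores), ("Bagging GLM", bag_scores)):
    print(f"{name}: AUC={auc(y, s):.3f}, "
          f"sensitivity={sensitivity(y, s, threshold):.3f}")
```

With a single covariate and positive fitted slopes, the single GLM and the averaged ensemble rank sites in the same order, so their AUCs coincide and only the threshold-dependent sensitivity can differ. The sketch scores the training data for brevity; a fair comparison would use held-out or cross-validated data.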
This article is indexed in ScienceDirect and other databases.