The cognitive mechanisms of optimal sampling
Authors: Stephen E. G. Lea, Ian P. L. McLaren, Susan M. Dow, Donald A. Graft
Institution:a Psychology (CLES), University of Exeter, Washington Singer Laboratories, Exeter EX4 4QG, United Kingdom
b Bristol Zoo Gardens, Bristol BS8 3HA, United Kingdom
c STMicroelectronics, Schaumberg, IL 60173, USA
Abstract: How can animals learn the prey densities available in an environment that changes unpredictably from day to day, and how much effort should they devote to doing so, rather than exploiting what they already know? Using a two-armed bandit situation, we simulated several processes that might explain the trade-off between exploring and exploiting. They included an optimising model, dynamic backward sampling; a dynamic version of the matching law; the Rescorla-Wagner model; a neural network model; and ε-greedy and rule of thumb models derived from the study of reinforcement learning in artificial intelligence. Under conditions like those used in published studies of birds' performance under two-armed bandit conditions, all models usually identified the more profitable source of reward, and did so more quickly when the reward probability differential was greater. Only the dynamic programming model switched from exploring to exploiting more quickly when available time in the situation was less. With sessions of equal length presented in blocks, a session-length effect was induced in some of the models by allowing motivational, but not memory, carry-over from one session to the next. The rule of thumb model was the most successful overall, though the neural network model also performed better than the remaining models.
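To make the exploration-exploitation trade-off concrete, the sketch below simulates an ε-greedy agent on a two-armed Bernoulli bandit, updating its value estimates with a Rescorla-Wagner-style delta rule. This is a minimal illustration of two of the mechanisms named in the abstract, not the authors' actual simulation code; the reward probabilities, learning rate, and trial counts are illustrative assumptions.

```python
import random

def simulate_bandit(p_rewards=(0.2, 0.8), epsilon=0.1, alpha=0.2,
                    n_trials=200, seed=0):
    """Epsilon-greedy agent on a two-armed Bernoulli bandit.

    Value estimates follow a Rescorla-Wagner-style delta rule:
    V <- V + alpha * (reward - V). All parameter values here are
    illustrative defaults, not those used in the paper.
    """
    rng = random.Random(seed)
    values = [0.0, 0.0]   # estimated reward probability per arm
    pulls = [0, 0]        # how often each arm was chosen
    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                        # explore: random arm
        else:
            arm = max(range(2), key=lambda a: values[a])  # exploit: best arm
        reward = 1.0 if rng.random() < p_rewards[arm] else 0.0
        values[arm] += alpha * (reward - values[arm])     # delta-rule update
        pulls[arm] += 1
    return values, pulls

values, pulls = simulate_bandit()
print("estimated values:", values)  # should approach (0.2, 0.8)
print("arm choices:", pulls)        # most pulls land on the richer arm
```

With a larger reward probability differential (e.g. 0.1 vs 0.9), the agent concentrates on the richer arm sooner, mirroring the abstract's finding that all models identified the more profitable source faster when the differential was greater.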
Keywords: Foraging; Matching law; Neural networks; Optimal sampling; Reinforcement learning; Rescorla-Wagner model; Two-armed bandit
Indexed in ScienceDirect, PubMed, and other databases.