

Performance-gated deliberation: A context-adapted strategy in which urgency is opportunity cost
Authors: Maximilian Puelma Touzel, Paul Cisek, Guillaume Lajoie
Institution: 1. Mila, Québec AI Institute, Montréal, Canada; 2. Department of Computer Science & Operations Research, Université de Montréal, Montréal, Canada; 3. Department of Neuroscience, Université de Montréal, Montréal, Canada; 4. Department of Mathematics & Statistics, Université de Montréal, Montréal, Canada; Ecole Normale Superieure, FRANCE
Abstract: Finding the right amount of deliberation, between insufficient and excessive, is a hard decision-making problem that depends on the value we place on our time. Average reward, putatively encoded by tonic dopamine, serves in existing reinforcement learning theory as the opportunity cost of time, including deliberation time. Importantly, this cost can itself vary with the environmental context and is not trivial to estimate. Here, we propose how the opportunity cost of deliberation can be estimated adaptively on multiple timescales to account for non-stationary contextual factors. We use it in a simple decision-making heuristic based on average-reward reinforcement learning (AR-RL) that we call Performance-Gated Deliberation (PGD). We propose PGD as a strategy used by animals wherein deliberation cost is implemented directly as urgency, a previously characterized neural signal that effectively controls the speed of the decision-making process. We show that PGD outperforms AR-RL solutions in explaining the behaviour and urgency of non-human primates in a context-varying random walk prediction task, and that it is consistent with relative performance and urgency in a context-varying random dot motion task. We make readily testable predictions for both neural activity and behaviour.