Similar Articles
20 similar articles found (search time: 343 ms)
1.
To survive, animals have to quickly modify their behaviour when the reward changes. The internal representations responsible for this are updated through synaptic weight changes, mediated by certain neuromodulators conveying feedback from the environment. In previous experiments, we discovered a form of hippocampal Spike-Timing-Dependent-Plasticity (STDP) that is sequentially modulated by acetylcholine and dopamine. Acetylcholine facilitates synaptic depression, while dopamine retroactively converts the depression into potentiation. When these experimental findings were implemented as a learning rule in a computational model, our simulations showed that cholinergic-facilitated depression is important for reversal learning. In the present study, we tested the model’s prediction by optogenetically inactivating cholinergic neurons in mice during a hippocampus-dependent spatial learning task with changing rewards. We found that reversal learning, but not initial place learning, was impaired, verifying our computational prediction that acetylcholine-modulated plasticity promotes the unlearning of old reward locations. Further, differences in neuromodulator concentrations in the model captured mouse-by-mouse performance variability in the optogenetic experiments. Our line of work sheds light on how neuromodulators enable the learning of new contingencies.
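The ACh-then-dopamine plasticity sequence described above can be caricatured in a few lines. This is only an illustrative sketch, not the authors' model: the function names, the constants, and the eligibility-trace mechanism are assumptions chosen to show the sign of each effect.

```python
# Hedged sketch of sequentially neuromodulated STDP (hypothetical constants):
# pairing under cholinergic tone writes depression plus an eligibility trace;
# dopamine arriving later converts the stored depression into potentiation.

def pair_spikes(weight, ach=True):
    """Pre/post pairing; cholinergic tone (ACh) facilitates depression."""
    delta = -0.1 if ach else 0.0
    return weight + delta, delta          # new weight, eligibility trace

def dopamine_arrives(weight, trace):
    """Dopamine retroactively converts stored depression into potentiation."""
    if trace < 0:
        weight += -2 * trace              # undo depression, then potentiate
    return weight

w, trace = pair_spikes(1.0)               # ACh-facilitated depression
assert w < 1.0
w = dopamine_arrives(w, trace)            # retroactive conversion by dopamine
assert w > 1.0
```

In a reversal-learning setting, the first step would weaken synapses encoding the old reward location, and the dopamine step would re-potentiate only those synapses whose activity preceded the new reward.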

2.
In social environments, decisions not only determine rewards for oneself but also for others. However, individual differences in pro-social behaviors have typically been studied through self-report. We developed a decision-making paradigm in which participants chose from card decks with differing rewards for themselves and charity; some decks gave similar rewards to both, while others gave higher rewards for one or the other. We used a reinforcement-learning model that estimated each participant's relative weighting of self versus charity reward. As shown both in choices and model parameters, individuals who showed relatively better learning of rewards for charity – compared to themselves – were more likely to engage in pro-social behavior outside the laboratory, as indicated by self-report. Overall rates of reward learning, however, did not predict individual differences in pro-social tendencies. These results support the idea that biases toward learning about social rewards are associated with one's altruistic tendencies.
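A learning model of the kind described can be sketched as a delta-rule update on a weighted combination of self and charity reward. The weighting parameter `phi`, the learning rate, and the deck payoffs below are illustrative assumptions, not the study's fitted values.

```python
# Sketch: a value update on a phi-weighted mix of self vs. charity reward.
# All parameter values are hypothetical.

def update_q(q, self_reward, charity_reward, phi=0.5, alpha=0.2):
    """Delta-rule update on a phi-weighted combination of the two rewards."""
    combined = phi * self_reward + (1 - phi) * charity_reward
    return q + alpha * (combined - q)

q = 0.0
for _ in range(50):   # repeatedly sample a deck paying 1 to self, 5 to charity
    q = update_q(q, self_reward=1.0, charity_reward=5.0, phi=0.25)

# with phi = 0.25 the learned value is dominated by the charity payoff
assert abs(q - 4.0) < 1e-3
```

Fitting `phi` per participant would yield the kind of individual-difference measure the abstract relates to self-reported altruism.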

3.
Many aspects of hedonic behavior, including self-administration of natural and drug rewards, as well as human positive affect, follow a diurnal cycle that peaks during the species-specific active period. This variation has been linked to circadian modulation of the mesolimbic dopamine system, and is hypothesized to serve an adaptive function by driving an organism to engage with the environment during times when the opportunity for obtaining rewards is high. However, relatively little is known about whether more complex facets of hedonic behavior – in particular, reward learning – follow the same diurnal cycle. The current study aimed to address this gap by examining evidence for diurnal variation in reward learning on a well-validated probabilistic reward learning task (PRT). PRT data from a large normative sample (N = 516) of non-clinical individuals, recruited across eight studies, were examined. The PRT uses an asymmetrical reinforcement ratio to induce a behavioral response bias, and reward learning was operationalized as the strength of this response bias across blocks of the task. Results revealed significant diurnal variation in reward learning; however, in contrast to patterns previously observed in other aspects of hedonic behavior, reward learning was lowest in the middle of the day. Although a diurnal pattern was also observed on a measure of more general task performance (discriminability), this did not account for the variation observed in reward learning. Taken together, these findings point to a distinct diurnal pattern in reward learning that differs from that observed in other aspects of hedonic behavior. The results of this study have important implications for our understanding of clinical disorders characterized by both circadian and reward learning disturbances, and future research is needed to confirm whether this diurnal variation has a truly circadian origin.
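Response bias and discriminability in PRT-style tasks are commonly quantified with signal-detection statistics computed from hit and miss counts for the richly and leanly rewarded stimuli. The formulas below are the standard ones; treat the exact operationalization used across these eight studies, and the trial counts, as assumptions for illustration.

```python
import math

# Signal-detection measures for a probabilistic reward task.
# Counts below are made-up illustrative data, not from the studies above.

def log_b(rich_hit, rich_miss, lean_hit, lean_miss):
    """Response bias toward the more richly rewarded stimulus."""
    return 0.5 * math.log((rich_hit * lean_miss) / (rich_miss * lean_hit))

def log_d(rich_hit, rich_miss, lean_hit, lean_miss):
    """Discriminability, independent of response bias."""
    return 0.5 * math.log((rich_hit * lean_hit) / (rich_miss * lean_miss))

# a subject answering "rich" more readily shows a positive bias
bias = log_b(rich_hit=90, rich_miss=10, lean_hit=60, lean_miss=40)
assert bias > 0
```

Comparing the growth of `log_b` across task blocks, while controlling for `log_d`, mirrors the operationalization of reward learning described in the abstract.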

4.
5.
Some flowering plants signal the abundance of their rewards by changing their flower colour, scent or other floral traits as rewards are depleted. These floral trait changes can be regarded as honest signals of reward states for pollinators. Previous studies have hypothesized that these signals are used to maintain plant-level attractiveness to pollinators, but the evolutionary conditions leading to the development of honest signals have not been well investigated from a theoretical basis. We examined conditions leading to the evolution of honest reward signals in flowers by applying a theoretical model that included pollinator response and signal accuracy. We assumed that pollinators learn floral traits and plant locations in association with reward states and use this information to decide which flowers to visit. While manipulating the level of associative learning, we investigated optimal flower longevity, the proportion of reward and rewardless flowers, and honest- and dishonest-signalling strategies. We found that honest signals are evolutionarily stable only when flowers are visited by pollinators with both high and low learning abilities. These findings imply that behavioural variation in learning within a pollinator community can lead to the evolution of an honest signal even when there is no contribution of rewardless flowers to pollinator attractiveness.

6.
The allocation of limited resources such as time or energy is a core problem that organisms face when planning complex actions. Most previous research concerning planning of movement has focused on the planning of single, isolated movements. Here we investigated the allocation of time in a pointing task where human subjects attempted to touch two targets in a specified order to earn monetary rewards. Subjects were required to complete both movements within a limited time but could freely allocate the available time between the movements. The time constraint presents an allocation problem to the subjects: the more time spent on one movement, the less time is available for the other. In different conditions we assigned different rewards to the two targets. How the subject allocated time between movements affected their expected gain on each trial. We also varied the angle between the first and second movements and the length of the second movement. Based on our results, we developed and tested a model of speed-accuracy tradeoff for sequential movements. Using this model we could predict the time allocation that would maximize the expected gain of each subject in each experimental condition. We compared human performance with predicted optimal performance. We found that all subjects allocated time sub-optimally, spending more time than they should on the first movement even when the reward of the second target was five times larger than the first. We conclude that the movement planning system fails to maximize expected reward in planning sequences of as few as two movements and discuss possible interpretations drawn from economic theory.
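The time-allocation problem above can be sketched numerically: split a total time budget between two movements, let hit probability grow with allotted time through a speed-accuracy tradeoff, and search for the split maximizing expected gain. The logistic tradeoff, its parameters, and the reward values are made-up assumptions, not the paper's fitted model.

```python
import math

# Toy time-allocation problem: total time T split between two movements.

def hit_prob(t, scale=5.0, mid=0.3):
    """Toy speed-accuracy tradeoff: more time -> higher hit probability."""
    return 1.0 / (1.0 + math.exp(-scale * (t - mid)))

def expected_gain(t1, T, r1, r2):
    """Gain from giving t1 to movement 1 and T - t1 to movement 2."""
    return r1 * hit_prob(t1) + r2 * hit_prob(T - t1)

def best_split(T=0.8, r1=1.0, r2=2.0, steps=200):
    """Grid-search the allocation maximizing expected gain."""
    return max((i * T / steps for i in range(1, steps)),
               key=lambda t1: expected_gain(t1, T, r1, r2))

t1 = best_split()
# with a larger second reward, the optimum gives the second movement more time
assert t1 < 0.8 - t1
```

The paper's finding is that subjects deviate from this optimum, over-allocating time to the first movement regardless of the reward ratio.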

7.
A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments.
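The key requirement stated above (reward strengthens LTP more than LTD) can be sketched as a single weight update. The amplitudes, time constants, and gain factors below are illustrative assumptions, not the paper's parameters.

```python
import math

# Sketch of a reward-modulated STDP update in which the reward signal
# multiplicatively boosts LTP more than LTD. Constants are hypothetical.

def rstdp_update(w, dt, reward, a_plus=0.10, a_minus=0.12,
                 tau=20.0, gain_ltp=2.0, gain_ltd=1.2):
    """dt = t_post - t_pre in ms; reward scales LTP more than LTD."""
    if dt > 0:                                  # pre before post: potentiation
        dw = a_plus * math.exp(-dt / tau) * (1 + gain_ltp * reward)
    else:                                       # post before pre: depression
        dw = -a_minus * math.exp(dt / tau) * (1 + gain_ltd * reward)
    return w + dw

# without reward, depression dominates (stable unsupervised learning) ...
w0 = rstdp_update(rstdp_update(1.0, +10.0, 0.0), -10.0, 0.0)
assert w0 < 1.0
# ... with reward, the same pairings yield net potentiation
w1 = rstdp_update(rstdp_update(1.0, +10.0, 1.0), -10.0, 1.0)
assert w1 > 1.0
```

This captures the sign structure the abstract argues for: unsupervised drift is stable (net depressing), while reward flips the balance toward potentiation and thus firing-rate reinforcement.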

8.
Critical to our many daily choices between larger delayed rewards and smaller, more immediate rewards are the shape and steepness of the function that discounts rewards with time. Although research in artificial intelligence favors exponential discounting in uncertain environments, studies with humans and animals have consistently shown hyperbolic discounting. We investigated how humans perform in a reward decision task with temporal constraints, in which each choice affects the time remaining for later trials, and in which the delays vary at each trial. We demonstrated that most of our subjects adopted exponential discounting in this experiment. Further, we confirmed analytically that exponential discounting, with a decay rate comparable to that used by our subjects, maximized the total reward gain in our task. Our results suggest that the particular shape and steepness of temporal discounting is determined by the task that the subject is facing, and question the notion of hyperbolic reward discounting as a universal principle.
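The two discount functions contrasted above differ in a testable way: exponential discounting is time-consistent, while hyperbolic discounting produces preference reversals as a common front-end delay is added. The amounts, delays, and decay rate below are illustrative.

```python
import math

# Exponential discounting: V = A * exp(-k * t)
# Hyperbolic discounting:  V = A / (1 + k * t)

def exp_value(amount, delay, k=0.2):
    return amount * math.exp(-k * delay)

def hyp_value(amount, delay, k=0.2):
    return amount / (1 + k * delay)

# Preference reversal under hyperbolic discounting: 50 now beats 100 in
# 10 time units, but adding a common 20-unit front-end delay flips the choice.
assert hyp_value(50, 0) > hyp_value(100, 10)
assert hyp_value(50, 20) < hyp_value(100, 30)

# Exponential discounting is time-consistent: the ranking never flips.
assert (exp_value(50, 0) > exp_value(100, 10)) == \
       (exp_value(50, 20) > exp_value(100, 30))
```

This is the structural difference that lets a task with real temporal constraints, as in the study above, discriminate which function subjects actually use.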

9.
In a large variety of situations one would like to have an expressive and accurate model of observed animal or human behavior. While general-purpose mathematical models may successfully capture properties of observed behavior, it is desirable to root models in biological facts. Because of ample empirical evidence for reward-based learning in visuomotor tasks, we use a computational model based on the assumption that the observed agent is balancing the costs and benefits of its behavior to meet its goals. This leads to using the framework of reinforcement learning, which additionally provides well-established algorithms for learning visuomotor task solutions. To quantify the agent's goals as rewards implicit in the observed behavior, we propose to use inverse reinforcement learning. Based on the assumption of a modular cognitive architecture, we introduce a modular inverse reinforcement learning algorithm that estimates the relative reward contributions of the component tasks in navigation, consisting of following a path while avoiding obstacles and approaching targets. It is shown how to recover the component reward weights for individual tasks and that variability in observed trajectories can be explained succinctly through behavioral goals. It is demonstrated through simulations that good estimates can be obtained already with modest amounts of observation data, which in turn allows the prediction of behavior in novel configurations.
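The modular assumption can be illustrated in the forward direction: the agent's preference for an action is a weighted sum of per-module values (path following, obstacle avoidance, target approach), and inverse RL would recover the weights from observed choices. The module values and weights below are made-up numbers, not outputs of the paper's algorithm.

```python
# Sketch of a modular action preference: a linear combination of
# component-task values. All values and weights are hypothetical.

def combined_value(action, weights, modules):
    """Weighted sum of component-task values for one action."""
    return sum(w * q[action] for w, q in zip(weights, modules))

path      = {"left": 0.2, "straight": 0.9, "right": 0.3}
obstacles = {"left": 0.8, "straight": 0.1, "right": 0.3}
targets   = {"left": 0.4, "straight": 0.5, "right": 0.9}
modules = [path, obstacles, targets]

weights = [0.2, 0.6, 0.2]   # an obstacle-dominated agent (hypothetical)
best = max(path, key=lambda a: combined_value(a, weights, modules))
assert best == "left"       # avoidance outweighs the path and target modules
```

Modular inverse RL runs this mapping in reverse: given many observed `best` choices, it estimates the `weights` that explain them.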

10.
The fundamental biological importance of rewards has created an increasing interest in the neuronal processing of reward information. The suggestion that the mechanisms underlying drug addiction might involve natural reward systems has also stimulated interest. This article focuses on recent neurophysiological studies in primates that have revealed that neurons in a limited number of brain structures carry specific signals about past and future rewards. This research provides the first step towards an understanding of how rewards influence behaviour before they are received and how the brain might use reward information to control learning and goal-directed behaviour.

11.
Attention or variations in event processing help drive learning. Lesion studies have implicated the central nucleus of the amygdala (CeA) in this process, particularly when expected rewards are omitted. However, lesion studies cannot specify how information processing in CeA supports such learning. To address these questions, we recorded CeA neurons in rats performing a task in which rewards were delivered or omitted unexpectedly. We found that activity in CeA neurons increased selectively at the time of omission and declined again with learning. Increased firing correlated with CeA-inactivation-sensitive measures of attention. Notably, CeA neurons did not fire to the cues or in response to unexpected rewards. These results indicate that CeA contributes to learning in response to reward omission due to a specific role in signaling actual omission rather than a more general involvement in signaling expectancies, errors, or reward value.

12.
We often need to learn how to move based on a single performance measure that reflects the overall success of our movements. However, movements have many properties, such as their trajectories, speeds and timing of end-points; thus the brain needs to decide which properties of movements should be improved: it needs to solve the credit assignment problem. Currently, little is known about how humans solve credit assignment problems in the context of reinforcement learning. Here we tested how human participants solve such problems during a trajectory-learning task. Without an explicitly-defined target movement, participants made hand reaches and received monetary rewards as feedback on a trial-by-trial basis. The curvature and direction of the attempted reach trajectories determined the monetary rewards received in a manner that can be manipulated experimentally. Based on the history of action-reward pairs, participants quickly solved the credit assignment problem and learned the implicit payoff function. A Bayesian credit-assignment model with built-in forgetting accurately predicts their trial-by-trial learning.
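A toy version of this credit assignment problem can be sketched: reward depends on two movement properties through a hidden payoff function, and a learner that probes each property separately discovers which one to change. The hidden payoff, the probing scheme, and all step sizes are made-up assumptions; the paper's model is Bayesian with forgetting, which this sketch does not implement.

```python
# Toy credit assignment: which movement property (direction or curvature)
# should change to increase reward? Hidden payoff depends only on direction.

def payoff(direction, curvature):
    """Hidden payoff: only direction matters (peak at direction = 1.0)."""
    return -(direction - 1.0) ** 2

def learn(steps=200, eps=0.05, lr=0.1):
    d, c = 0.0, 0.0
    for _ in range(steps):
        # probe each property separately and move uphill on reward
        d += lr * (payoff(d + eps, c) - payoff(d - eps, c)) / (2 * eps)
        c += lr * (payoff(d, c + eps) - payoff(d, c - eps)) / (2 * eps)
    return d, c

d, c = learn()
assert abs(d - 1.0) < 0.05    # the relevant property is credited and corrected
assert c == 0.0               # the irrelevant property is left unchanged
```

The point of the sketch is the outcome the study observed behaviorally: credit flows to the property that actually controls the payoff.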

13.
In decision-making under uncertainty, economic studies emphasize the importance of risk in addition to expected reward. Studies in neuroscience focus on expected reward and learning rather than risk. We combined functional imaging with a simple gambling task to vary expected reward and risk simultaneously and in an uncorrelated manner. Drawing on financial decision theory, we modeled expected reward as mathematical expectation of reward, and risk as reward variance. Activations in dopaminoceptive structures correlated with both mathematical parameters. These activations differentiated spatially and temporally. Temporally, the activation related to expected reward was immediate, while the activation related to risk was delayed. Analyses confirmed that our paradigm minimized confounds from learning, motivation, and salience. These results suggest that the primary task of the dopaminergic system is to convey signals of upcoming stochastic rewards, such as expected reward and risk, beyond its role in learning, motivation, and salience.
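The two regressors named above are just the first two moments of the gamble's outcome distribution. A minimal sketch with illustrative two-outcome gambles (the gambles are made up; the definitions are standard):

```python
# Expected reward = mathematical expectation; risk = variance of outcomes.
# Each gamble is a list of (probability, outcome) pairs.

def expectation(outcomes):
    return sum(p * x for p, x in outcomes)

def variance(outcomes):
    mu = expectation(outcomes)
    return sum(p * (x - mu) ** 2 for p, x in outcomes)

safe  = [(0.5, 9.0), (0.5, 11.0)]   # same mean, low spread
risky = [(0.5, 0.0), (0.5, 20.0)]   # same mean, high spread

assert expectation(safe) == expectation(risky) == 10.0
assert variance(risky) > variance(safe)
```

Gambles constructed this way, matched on one moment while varying the other, are what allow expected reward and risk to be manipulated in an uncorrelated manner.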

14.
During Pavlovian incentive learning, the affective properties of rewards are thought to be transferred to their predicting cues. However, how rewards are represented emotionally in animals is widely unknown. This study sought to determine whether 50-kHz ultrasonic vocalizations (USVs) in rats may signal such a state of incentive motivation to natural, nutritional rewards. To this end, rats learned to anticipate food rewards and, across experiments, the current physiological state (deprived vs. sated), the type of learning mechanism recruited (Pavlovian vs. instrumental), the hedonic properties of the UCS (low vs. high palatable food), and the availability of food reward (continued vs. discontinued) were manipulated. Overall, we found that reward-cues elicited 50-kHz calls as they were signaling a putative affective state indicative of incentive motivation in the rat. Attribution and expression of incentive salience, however, seemed not to be a unified process, and could be teased apart in two different ways: 1) under high motivational state (i.e., hunger), the attribution of incentive salience to cues occurred without being expressed at the USVs level, if reward expectations were higher than the outcome; 2) in all experiments when food rewards were devalued by satiation, reward cues were still able to elicit USVs and conditioned anticipatory activity although reward seeking and consumption were drastically weakened. Our results suggest that rats are capable of representing rewards emotionally beyond apparent, immediate physiological demands. These findings may have translational potential in uncovering mechanisms underlying aberrant and persistent motivation as observed in drug addiction, gambling, and eating disorders.

15.
During reinforcement learning, dopamine release shifts from the moment of reward consumption to the time point when the reward can be predicted. Previous studies provide consistent evidence that reward-predicting cues enhance long-term memory (LTM) formation of these items via dopaminergic projections to the ventral striatum. However, it is less clear whether memory for items that do not precede a reward but are directly associated with reward consumption is also facilitated. Here, we investigated this question in an fMRI paradigm in which LTM for reward-predicting and neutral cues was compared to LTM for items presented during consumption of reliably predictable as compared to less predictable rewards. We observed activation of the ventral striatum and enhanced memory formation during reward anticipation. During processing of less predictable as compared to reliably predictable rewards, the ventral striatum was activated as well, but items associated with less predictable outcomes were remembered worse than items associated with reliably predictable outcomes. Processing of reliably predictable rewards activated the ventromedial prefrontal cortex (vmPFC), and vmPFC BOLD responses were associated with successful memory formation of these items. Taken together, these findings show that consumption of reliably predictable rewards facilitates LTM formation and is associated with activation of the vmPFC.

16.
Right brain damaged patients show impairments in sequential decision making tasks for which healthy people do not show any difficulty. We hypothesized that this difficulty could be due to the failure of right brain damaged patients to develop well-matched models of the world. Our motivation is the idea that to navigate uncertainty, humans use models of the world to direct the decisions they make when interacting with their environment. The better the model is, the better their decisions are. To explore the model building and updating process in humans and the basis for impairment after brain injury, we used a computational model of non-stationary sequence learning. RELPH (Reinforcement and Entropy Learned Pruned Hypothesis space) was able to qualitatively and quantitatively reproduce the results of left and right brain damaged patient groups and healthy controls playing a sequential version of Rock, Paper, Scissors. Our results suggest that, in general, humans employ a sub-optimal reinforcement based learning method rather than an objectively better statistical learning approach, and that differences between right brain damaged and healthy control groups can be explained by different exploration policies, rather than qualitatively different learning mechanisms.

17.
Honey bees (Hymenoptera: Apidae) were used as a model insect system to explore forager use of a learned color-cue memory over several subsequent days. Experiments used artificial flower patches of blue and white flowers. Two experiments were performed, each beginning with a learning experience where 2 M sucrose was present in one flower color and 1 M sucrose in the alternative flower color. The first experiment followed flower color fidelity over a series of sequential days when rewards no longer differed between flowers of different color. The second examined the effect of intervening days without the forager visiting the flower patch. Results showed that color-cue memory decline was not a passive time-decay process and that information update in honey bees does not occur readily without new experiences of difference in rewarding flowers. Further, although the color cue learned was associated with nectar reward in long term memory, it did not seem to be specifically associated with the 2 M sucrose nectar reward when intervening nights occurred between learning and revisiting the flower patch.

18.
The competition for resources among cells, individuals or species is a fundamental characteristic of evolution. Biological all-pay auctions have been used to model situations where multiple individuals compete for a single resource. However, in many situations multiple resources with various values exist and single reward auctions are not applicable. We generalize the model to multiple rewards and study the evolution of strategies. In biological all-pay auctions the bid of an individual corresponds to its strategy and is equivalent to its payment in the auction. The decreasingly ordered rewards are distributed according to the decreasingly ordered bids of the participating individuals. The reproductive success of an individual is proportional to its fitness given by the sum of the rewards won minus its payments. Hence, successful bidding strategies spread in the population. We find that the results for the multiple reward case are very different from the single reward case. While the mixed strategy equilibrium in the single reward case with more than two players consists of mostly low-bidding individuals, we show that the equilibrium can convert to many high-bidding individuals and a few low-bidding individuals in the multiple reward case. Some reward values lead to a specialization among the individuals where one subpopulation competes for the rewards and the other subpopulation largely avoids costly competitions. Whether the mixed strategy equilibrium is an evolutionarily stable strategy (ESS) depends on the specific values of the rewards.
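The payoff rule stated above (everyone pays their bid; decreasingly ordered rewards go to decreasingly ordered bids) is easy to make concrete. The bids and reward values below are illustrative only.

```python
# Payoffs in a multiple-reward biological all-pay auction: each player pays
# its bid; the k-th largest reward goes to the k-th highest bidder.

def payoffs(bids, rewards):
    rewards = sorted(rewards, reverse=True)
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    result = [-b for b in bids]               # all players pay their bid
    for rank, i in enumerate(order):
        if rank < len(rewards):
            result[i] += rewards[rank]        # rank-matched reward
    return result

# two rewards, three bidders: the lowest bidder pays but wins nothing
assert payoffs([1.0, 0.5, 0.25], [2.0, 1.0]) == [1.0, 0.5, -0.25]
```

With more rewards than one, high bids can pay off for several individuals at once, which is the structural reason the equilibrium can shift from mostly low bidders to many high bidders, as described in the abstract.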

19.
Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.

20.
Modulation of caudate activity by action contingency
Tricomi EM, Delgado MR, Fiez JA. Neuron, 2004, 41(2): 281-292
Research has increasingly implicated the striatum in the processing of reward-related information in both animals and humans. However, it is unclear whether human striatal activation is driven solely by the hedonic properties of rewards or whether such activation is reliant on other factors, such as anticipation of upcoming reward or performance of an action to earn a reward. We used event-related functional magnetic resonance imaging to investigate hemodynamic responses to monetary rewards and punishments in three experiments that made use of an oddball paradigm. We presented reward and punishment displays randomly in time, following an anticipatory cue, or following a button press response. Robust and differential activation of the caudate nucleus occurred only when a perception of contingency existed between the button press response and the outcome. This finding suggests that the caudate is involved in reinforcement of action potentially leading to reward, rather than in processing reward per se.

