Similar Literature
20 similar documents found.
1.
To accurately predict rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when the observations are noisy, the individual rewards should have less influence on tracking of average reward, and the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of the observation noise might be tracked and used to control prediction updates in the brain reward system. Here, we introduce a new model that uses simple, tractable learning rules that track the mean and standard deviation of reward, and leverages prediction errors scaled by uncertainty as the central feedback signal. We show that the new model has an advantage over conventional reinforcement learning models in a value tracking task, and approaches a theoretic limit of performance provided by the Kalman filter. Further, we propose a possible biological implementation of the model in the basal ganglia circuit. In the proposed network, dopaminergic neurons encode reward prediction errors scaled by standard deviation of rewards. We show that such scaling may arise if the striatal neurons learn the standard deviation of rewards and modulate the activity of dopaminergic neurons. The model is consistent with experimental findings concerning dopamine prediction error scaling relative to reward magnitude, and with many features of striatal plasticity. Our results span the levels of implementation, algorithm, and computation, and might have important implications for understanding the dopaminergic prediction error signal and its relation to adaptive and effective learning.
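The core computational idea, an incremental estimate of the mean reward whose updates shrink as the estimated reward variability grows, can be sketched in a few lines. The specific update rules below (the `track_reward` function and its parameters) are illustrative assumptions, not the paper's exact equations:

```python
import numpy as np

def track_reward(rewards, alpha=0.1, sigma_init=1.0):
    """Track the mean and spread of reward with simple incremental rules;
    the mean is updated by a prediction error scaled by current uncertainty."""
    mu, sigma = 0.0, sigma_init
    estimates = []
    for r in rewards:
        delta = r - mu                              # reward prediction error
        lr = min(alpha / max(sigma, 1e-6), 1.0)     # noisier rewards -> smaller updates
        mu += lr * delta
        sigma += alpha * (abs(delta) - sigma)       # running estimate of reward spread
        estimates.append(mu)
    return np.array(estimates)

# Noisy observations of a slowly drifting underlying value
rng = np.random.default_rng(0)
true_value = np.cumsum(rng.normal(0.0, 0.05, 500))
observations = true_value + rng.normal(0.0, 2.0, 500)
print(track_reward(observations)[-1], true_value[-1])
```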

2.
3.
According to a prominent view of sensorimotor processing in primates, selection and specification of possible actions are not sequential operations. Rather, a decision for an action emerges from competition between different movement plans, which are specified and selected in parallel. For action choices which are based on ambiguous sensory input, the frontoparietal sensorimotor areas are considered part of the common underlying neural substrate for selection and specification of action. These areas have been shown capable of encoding alternative spatial motor goals in parallel during movement planning, and show signatures of competitive value-based selection among these goals. Since the same network is also involved in learning sensorimotor associations, competitive action selection (decision making) should not only be driven by the sensory evidence and expected reward in favor of either action, but also by the subject's learning history of different sensorimotor associations. Previous computational models of competitive neural decision making used predefined associations between sensory input and corresponding motor output. Such hard-wiring does not allow modeling of how decisions are influenced by sensorimotor learning or by changing reward contingencies. We present a dynamic neural field model which learns arbitrary sensorimotor associations with a reward-driven Hebbian learning algorithm. We show that the model accurately simulates the dynamics of action selection with different reward contingencies, as observed in monkey cortical recordings, and that it correctly predicted the pattern of choice errors in a control experiment. With our adaptive model we demonstrate how network plasticity, which is required for association learning and adaptation to new reward contingencies, can influence choice behavior. The field model provides an integrated and dynamic account for the operations of sensorimotor integration, working memory and action selection required for decision making in ambiguous choice situations.
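A minimal sketch of the key ingredient, reward-driven Hebbian learning of arbitrary stimulus-to-action associations with softmax action selection, is given below; it deliberately omits the field dynamics, and all parameters and names are illustrative rather than taken from the model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_stim, n_act = 4, 4
W = np.full((n_act, n_stim), 0.25)       # stimulus-to-action association weights
eta, temperature = 0.02, 0.1

def choose_action(stim):
    """Softmax selection over the learned association strengths for this stimulus."""
    p = np.exp(W[:, stim] / temperature)
    return rng.choice(n_act, p=p / p.sum())

mapping = {0: 2, 1: 0, 2: 3, 3: 1}       # rewarded action for each stimulus
for _ in range(3000):
    s = rng.integers(n_stim)
    a = choose_action(s)
    reward = 1.0 if a == mapping[s] else 0.0
    # Reward-gated Hebbian update: the active stimulus-action pair is
    # strengthened after reward and weakened after non-reward.
    W[a, s] = np.clip(W[a, s] + eta * (reward - 0.5), 0.0, 1.0)

print(W.argmax(axis=0))                  # learned action for each stimulus
```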

4.
A neural network model of how dopamine and prefrontal cortex activity guides short- and long-term information processing within the cortico-striatal circuits during reward-related learning of approach behavior is proposed. The model predicts two types of reward-related neuronal responses generated during learning: (1) cell activity signaling errors in the prediction of the expected time of reward delivery and (2) neural activations coding for errors in the prediction of the amount and type of reward or stimulus expectancies. The former type of signal is consistent with the responses of dopaminergic neurons, while the latter signal is consistent with reward expectancy responses reported in the prefrontal cortex. It is shown that a neural network architecture that satisfies the design principles of the adaptive resonance theory of Carpenter and Grossberg (1987) can account for the dopamine responses to novelty, generalization, and discrimination of appetitive and aversive stimuli. These hypotheses are scrutinized via simulations of the model in relation to the delivery of free food outside a task, the timed contingent delivery of appetitive and aversive stimuli, and an asymmetric, instructed delay response task.

5.
MOTIVATION: A model for learning potential causes of toxicity from positive and negative examples and predicting toxicity for the dataset used in the Predictive Toxicology Challenge (PTC) is presented. The learning model assumes that the causes of toxicity can be given as substructures common to positive examples that are not substructures of negative examples. This assumption results in the choice of a learning model, called the JSM-method, and a language for representing chemical compounds, called the Fragmentary Code of Substructure Superposition (FCSS). By means of the latter, chemical compounds are represented as sets of substructures which are 'biologically meaningful' from the expert point of view. RESULTS: The chosen learning model and representation language show comparatively good performance for the PTC dataset: for three sex/species groups the predictions were ROC optimal, for one group the prediction was nearly optimal. The predictions tend to be conservative (few predictions and almost no errors), which can be explained by the specific features of the learning model. AVAILABILITY: by request to finn@viniti.ru; serge@viniti.ru, http://ki-www2.intellektik.informatik.tu-darmstadt.de/~jsm/QDA.
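A toy sketch of the JSM-style assumption (candidate causes are substructure sets shared by positive examples and absent from all negative examples) might look as follows; compounds are represented simply as sets of fragment codes, and the pairwise-intersection strategy is a simplification of the full method:

```python
from itertools import combinations

def jsm_hypotheses(positives, negatives):
    """positives/negatives: lists of frozensets of substructure (fragment) codes."""
    hypotheses = set()
    for a, b in combinations(positives, 2):
        common = a & b                       # substructures shared by two positives
        if common and not any(common <= neg for neg in negatives):
            hypotheses.add(frozenset(common))  # keep only if absent from all negatives
    return hypotheses

def predict_toxic(compound, hypotheses):
    # A compound is predicted toxic if it contains any hypothesized cause.
    return any(h <= compound for h in hypotheses)

pos = [frozenset({"A", "B", "C"}), frozenset({"A", "B", "D"})]
neg = [frozenset({"B", "D"})]
hyps = jsm_hypotheses(pos, neg)
print(hyps, predict_toxic(frozenset({"A", "B", "E"}), hyps))
```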

6.
Voluntary motor commands produce two kinds of consequences. Initially, a sensory consequence is observed in terms of activity in our primary sensory organs (e.g., vision, proprioception). Subsequently, the brain evaluates the sensory feedback and produces a subjective measure of utility or usefulness of the motor commands (e.g., reward). As a result, comparisons between predicted and observed consequences of motor commands produce two forms of prediction error. How do these errors contribute to changes in motor commands? Here, we considered a reach adaptation protocol and found that when high quality sensory feedback was available, adaptation of motor commands was driven almost exclusively by sensory prediction errors. This form of learning had a distinct signature: as motor commands adapted, the subjects altered their predictions regarding sensory consequences of motor commands, and generalized this learning broadly to neighboring motor commands. In contrast, as the quality of the sensory feedback degraded, adaptation of motor commands became more dependent on reward prediction errors. Reward prediction errors produced comparable changes in the motor commands, but produced no change in the predicted sensory consequences of motor commands, and generalized only locally. Because we found that there was a within-subject correlation between generalization patterns and sensory remapping, it is plausible that during adaptation an individual's relative reliance on sensory vs. reward prediction errors could be inferred. We suggest that while motor commands change because of sensory and reward prediction errors, only sensory prediction errors produce a change in the neural system that predicts sensory consequences of motor commands.
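The contrast between the two learning channels can be illustrated with a deliberately simplified scalar reaching task: a sensory-error-driven learner corrects the command using the signed visual error and also updates its sensory prediction, whereas a reward-driven learner explores and keeps perturbations that improve the outcome, without changing any sensory prediction. The task, parameters, and update rules below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
target, rotation = 0.0, 30.0   # reach toward 0 deg under an imposed 30 deg visuomotor rotation

def sensory_driven(trials=60, eta=0.2):
    """Adaptation from signed sensory errors; the predicted consequence of the
    command is updated along with the command itself."""
    command, predicted_cursor = 0.0, 0.0
    for _ in range(trials):
        cursor = command + rotation                  # observed consequence
        sensory_pe = cursor - predicted_cursor       # sensory prediction error
        command -= eta * (cursor - target)           # correct toward the target
        predicted_cursor += eta * sensory_pe         # sensory prediction shifts
    return command, predicted_cursor

def reward_driven(trials=200, explore=3.0):
    """Adaptation from binary reward only: explore, keep perturbations that
    improve the outcome; no sensory prediction is changed."""
    command, best_error = 0.0, abs(rotation - target)
    for _ in range(trials):
        probe = command + rng.normal(0.0, explore)   # motor exploration
        error = abs(probe + rotation - target)
        if error < best_error:                       # "rewarded": closer than before
            command, best_error = probe, error
    return command

print(sensory_driven())   # command adapts and the sensory prediction shifts
print(reward_driven())    # command adapts with no sensory prediction to shift
```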

7.
Midbrain dopamine neurons encode a quantitative reward prediction error signal.
Bayer HM, Glimcher PW. Neuron, 2005, 47(1): 129-141.

8.
Braver TS, Brown JW. Neuron, 2003, 38(2): 150-152.
Accumulating evidence from nonhuman primates suggests that midbrain dopamine cells code reward prediction errors and that this signal subserves reward learning in dopamine-receiving brain structures. In this issue of Neuron, McClure et al. and O'Doherty et al. use event-related fMRI to provide some of the strongest evidence to date that the reward prediction error model of dopamine system activity applies equally well to human reward learning.

9.
Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.
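The distinctive HRL prediction, a prediction error computed against subgoal (pseudo-reward) attainment in addition to the ordinary top-level reward prediction error, can be sketched as follows; state names, the pseudo-reward value, and the tabular representation are illustrative:

```python
def td_error(reward, v_next, v_current, gamma=0.95):
    return reward + gamma * v_next - v_current

def hrl_step(state, next_state, reward, subgoal_reached,
             V_task, V_subtask, pseudo_reward=1.0, alpha=0.1):
    # Ordinary RPE: driven by changes in prospects for the overall task goal.
    delta_task = td_error(reward, V_task[next_state], V_task[state])
    V_task[state] += alpha * delta_task
    # Subgoal (pseudo-reward) RPE: fires when a subgoal is attained, even if
    # overall goal prospects barely change, the distinctive HRL signal.
    delta_sub = td_error(pseudo_reward if subgoal_reached else 0.0,
                         V_subtask[next_state], V_subtask[state])
    V_subtask[state] += alpha * delta_sub
    return delta_task, delta_sub

V_task = {"start": 0.5, "subgoal": 0.5, "goal": 0.0}
V_subtask = {"start": 0.0, "subgoal": 0.0, "goal": 0.0}
print(hrl_step("start", "subgoal", reward=0.0, subgoal_reached=True,
               V_task=V_task, V_subtask=V_subtask))
```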

10.
Decision making and learning in a real-world context require organisms to track not only the choices they make and the outcomes that follow but also other untaken, or counterfactual, choices and their outcomes. Although the neural system responsible for tracking the value of choices actually taken is increasingly well understood, whether a neural system tracks counterfactual information is currently unclear. Using a three-alternative decision-making task, a Bayesian reinforcement-learning algorithm, and fMRI, we investigated the coding of counterfactual choices and prediction errors in the human brain. Rather than representing evidence favoring multiple counterfactual choices, lateral frontal polar cortex (lFPC), dorsomedial frontal cortex (DMFC), and posteromedial cortex (PMC) encode the reward-based evidence favoring the best counterfactual option at future decisions. In addition to encoding counterfactual reward expectations, the network carries a signal for learning about counterfactual options when feedback is available: a counterfactual prediction error. Unlike other brain regions that have been associated with the processing of counterfactual outcomes, counterfactual prediction errors within the identified network cannot be related to regret theory. Furthermore, individual variation in counterfactual choice-related activity and prediction error-related activity, respectively, predicts variation in the propensity to switch to profitable choices in the future and the ability to learn from hypothetical feedback. Taken together, these data provide both neural and behavioral evidence to support the existence of a previously unidentified neural system responsible for tracking both counterfactual choice options and their outcomes.
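A minimal sketch of how a counterfactual prediction error can be defined when full feedback is available in a three-option task is shown below; the update rule and the selection of the "best counterfactual option" are illustrative simplifications of the Bayesian model used in the study:

```python
import numpy as np

def update_values(values, chosen, outcomes, alpha=0.2):
    """values: current value of each option; outcomes: feedback for all options."""
    values = values.copy()
    pes = outcomes - values                    # one prediction error per option
    values += alpha * pes                      # learn from factual and counterfactual feedback
    unchosen = [i for i in range(len(values)) if i != chosen]
    best_cf = max(unchosen, key=lambda i: values[i])   # best option not taken
    return values, pes[chosen], pes[best_cf]   # factual PE, counterfactual PE

values = np.array([0.5, 0.5, 0.5])
values, pe_factual, pe_counterfactual = update_values(
    values, chosen=0, outcomes=np.array([1.0, 0.0, 0.8]))
print(values, pe_factual, pe_counterfactual)
```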

11.
Learning by following explicit advice is fundamental for human cultural evolution, yet the neurobiology of adaptive social learning is largely unknown. Here, we used simulations to analyze the adaptive value of social learning mechanisms, computational modeling of behavioral data to describe cognitive mechanisms involved in social learning, and model-based functional magnetic resonance imaging (fMRI) to identify the neurobiological basis of following advice. One-time advice received before learning had a sustained influence on people's learning processes. This was best explained by social learning mechanisms implementing a more positive evaluation of the outcomes from recommended options. Computer simulations showed that this "outcome-bonus" accumulates more rewards than an alternative mechanism implementing higher initial reward expectation for recommended options. fMRI results revealed a neural outcome-bonus signal in the septal area and the left caudate. This neural signal coded rewards in the absence of advice, and crucially, it signaled greater positive rewards for positive and negative feedback after recommended rather than after non-recommended choices. Hence, our results indicate that following advice is intrinsically rewarding. A positive correlation between the model's outcome-bonus parameter and amygdala activity after positive feedback directly relates the computational model to brain activity. These results advance the understanding of social learning by providing a neurobiological account for adaptive learning from advice.
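The two candidate mechanisms can be contrasted in a simple Q-learning simulation: an "outcome bonus" adds a constant to the evaluated outcome of the recommended option on every trial, whereas a "higher prior" only inflates its initial value. The simulation below is an illustrative sketch (parameter values and reward probabilities are assumptions), not the paper's fitted model:

```python
import numpy as np

def simulate(p_reward, recommended, bonus=0.0, prior=0.0,
             n_trials=300, alpha=0.2, beta=3.0, seed=0):
    """Q-learning with softmax choice; 'bonus' implements the outcome-bonus
    account, 'prior' the higher-initial-expectation account."""
    rng = np.random.default_rng(seed)
    q = np.zeros(len(p_reward))
    q[recommended] += prior
    earned = 0.0
    for _ in range(n_trials):
        p = np.exp(beta * q)
        a = rng.choice(len(q), p=p / p.sum())
        r = float(rng.random() < p_reward[a])
        earned += r
        r_evaluated = r + (bonus if a == recommended else 0.0)  # outcome bonus
        q[a] += alpha * (r_evaluated - q[a])
    return earned

options = [0.7, 0.4, 0.4]                              # option 0 happens to be best
print(simulate(options, recommended=0, bonus=0.3))     # outcome-bonus learner
print(simulate(options, recommended=0, prior=0.3))     # higher-prior learner
```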

12.
Temporal prediction errors in a passive learning task activate human striatum.
McClure SM, Berns GS, Montague PR. Neuron, 2003, 38(2): 339-346.
Functional MRI experiments in human subjects strongly suggest that the striatum participates in processing information about the predictability of rewarding stimuli. However, stimuli can be unpredictable in character (what stimulus arrives next), unpredictable in time (when the stimulus arrives), and unpredictable in amount (how much arrives). These variables have not been dissociated in previous imaging work in humans, thus conflating possible interpretations of the kinds of expectation errors driving the measured brain responses. Using a passive conditioning task and fMRI in human subjects, we show that positive and negative prediction errors in reward delivery time correlate with BOLD changes in human striatum, with the strongest activation lateralized to the left putamen. For the negative prediction error, the brain response was elicited by expectations only and not by stimuli presented directly; that is, we measured the brain response to nothing delivered (juice expected but not delivered) contrasted with nothing delivered (nothing expected).

13.
Neuroeconomic studies of decision making have emphasized reward learning as critical in the representation of value-driven choice behaviour. However, it is readily apparent that punishment and aversive learning are also significant factors in motivating decisions and actions. In this paper, we review the role of the striatum and amygdala in affective learning and the coding of aversive prediction errors (PEs). We present neuroimaging results showing aversive PE-related signals in the striatum in fear conditioning paradigms with both primary (shock) and secondary (monetary loss) reinforcers. These results and others point to the general role for the striatum in coding PEs across a broad range of learning paradigms and reinforcer types.

14.
15.
Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Neuron, 2011, 69(6): 1204-1215.
The mesostriatal dopamine system is prominently implicated in model-free reinforcement learning, with fMRI BOLD signals in ventral striatum notably covarying with model-free prediction errors. However, latent learning and devaluation studies show that behavior also shows hallmarks of model-based planning, and the interaction between model-based and model-free values, prediction errors, and preferences is underexplored. We designed a multistep decision task in which model-based and model-free influences on human choice behavior could be distinguished. By showing that choices reflected both influences we could then test the purity of the ventral striatal BOLD signal as a model-free report. Contrary to expectations, the signal reflected both model-free and model-based predictions in proportions matching those that best explained choice behavior. These results challenge the notion of a separate model-free learner and suggest a more integrated computational architecture for high-level human decision-making.
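The hybrid valuation at the heart of this design can be sketched as a weighted mixture of a model-free value and a model-based value computed from the learned transition structure of the two-step task; the weighting parameter w and all variable names below are illustrative:

```python
def hybrid_value(q_mf, q_stage2, transition_probs, w=0.5):
    """q_mf: model-free values of the two first-stage actions.
    q_stage2: values of the two second-stage states (max over their actions).
    transition_probs[a][s]: probability that first-stage action a leads to state s.
    w: weight on the model-based component (0 = purely model-free)."""
    q_mb = [sum(transition_probs[a][s] * q_stage2[s] for s in range(2))
            for a in range(2)]
    return [w * q_mb[a] + (1 - w) * q_mf[a] for a in range(2)]

# Example: action 0 commonly leads to state 0, which currently looks valuable.
print(hybrid_value(q_mf=[0.2, 0.3],
                   q_stage2=[0.8, 0.1],
                   transition_probs=[[0.7, 0.3], [0.3, 0.7]]))
```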

16.
In a social network, users hold and express positive and negative attitudes (e.g. support/opposition) towards other users. These attitudes form binary relationships among the users, which play an important role in social network analysis. However, some of those binary relationships are likely to be latent as the scale of the social network increases. Predicting such latent binary relationships has recently begun to draw researchers' attention. In this paper, we propose a machine learning algorithm for predicting positive and negative relationships in social networks inspired by structural balance theory and social status theory. More specifically, we show that when two users in the network have fewer common neighbors, the prediction accuracy of the relationship between them deteriorates. Accordingly, in the training phase, we propose a segment-based training framework to divide the training data into two subsets according to the number of common neighbors between users, and build a prediction model for each subset based on support vector machine (SVM). Moreover, to deal with large-scale social network data, we employ a sampling strategy that selects a small amount of training data while maintaining high accuracy of prediction. We compare our algorithm with traditional algorithms and with their adaptive-boosting variants. Experimental results on typical data sets show that our algorithm can deal with large social networks and consistently outperforms other methods.
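A minimal sketch of the segment-based training idea, assuming scikit-learn is available and leaving feature extraction abstract: training pairs are split by their number of common neighbors, one SVM is trained per segment, and test pairs are routed to the matching model. The threshold and all names are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def train_segmented(X, y, common_neighbors, threshold=3):
    """X: feature matrix for user pairs; y: relationship sign (+1/-1);
    common_neighbors: number of common neighbors for each pair."""
    few = common_neighbors < threshold
    return {
        "few": SVC(kernel="rbf").fit(X[few], y[few]),      # sparse-neighborhood model
        "many": SVC(kernel="rbf").fit(X[~few], y[~few]),   # dense-neighborhood model
    }

def predict_segmented(models, X, common_neighbors, threshold=3):
    few = common_neighbors < threshold
    pred = np.empty(len(X), dtype=int)
    pred[few] = models["few"].predict(X[few])
    pred[~few] = models["many"].predict(X[~few])
    return pred
```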

17.
A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments.
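The proposed asymmetry can be sketched as a reward-modulated STDP window in which the reward signal multiplies the potentiation branch more strongly than the depression branch; with no reward the rule reduces to ordinary (slightly depression-dominated) STDP. Parameter values below are illustrative, not the paper's fitted ones:

```python
import numpy as np

def rstdp_weight_change(delta_t, reward, a_ltp=1.0, a_ltd=1.2,
                        tau=20.0, k_ltp=2.0, k_ltd=1.0):
    """delta_t = t_post - t_pre in ms; reward in [0, 1].
    Reward multiplies LTP by (1 + k_ltp * reward) and LTD by (1 + k_ltd * reward),
    so potentiation is boosted more than depression, as the model requires."""
    if delta_t > 0:   # pre before post -> potentiation (LTP)
        return (1 + k_ltp * reward) * a_ltp * np.exp(-delta_t / tau)
    else:             # post before (or with) pre -> depression (LTD)
        return -(1 + k_ltd * reward) * a_ltd * np.exp(delta_t / tau)

print(rstdp_weight_change(+10.0, reward=1.0))   # strongly potentiating with reward
print(rstdp_weight_change(-10.0, reward=0.0))   # baseline (unsupervised) depression
```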

18.
A prerequisite for adaptive goal-directed behavior is that animals constantly evaluate action outcomes and relate them to both their antecedent behavior and to stimuli predictive of reward or non-reward. Here, we investigate whether single neurons in the avian nidopallium caudolaterale (NCL), a multimodal associative forebrain structure and a presumed analogue of mammalian prefrontal cortex, represent information useful for goal-directed behavior. We subjected pigeons to a go-nogo task, in which responding to one visual stimulus (S+) was partially reinforced, responding to another stimulus (S–) was punished, and responding to test stimuli from the same physical dimension (spatial frequency) was inconsequential. The birds responded most intensely to S+, and their response rates decreased monotonically as stimuli became progressively dissimilar to S+; thereby, response rates provided a behavioral index of reward expectancy. We found that many NCL neurons' responses were modulated in the stimulus discrimination phase, the outcome phase, or both. A substantial fraction of neurons increased firing for cues predicting non-reward or decreased firing for cues predicting reward. Interestingly, the same neurons also responded when reward was expected but not delivered, and could thus provide a negative reward prediction error or, alternatively, signal negative value. In addition, many cells showed motor-related response modulation. In summary, NCL neurons represent information about the reward value of specific stimuli, instrumental actions as well as action outcomes, and therefore provide signals useful for adaptive behavior in dynamically changing environments.

19.
We investigate olfactory associative learning in larval Drosophila. A reciprocal training design is used, such that one group of animals receives a reward in the presence of odor X but not in the presence of odor Y (Train: X+ // Y), whereas another group is trained reciprocally (Train: X // Y+). After training, differences in odor preference between these reciprocally trained groups in a choice test (Test: X - Y) reflect associative learning. The current study, after showing which odor pairs can be used for such learning experiments, 1) introduces a one-odor version of such a reciprocal paradigm that allows estimating the learnability of single odors. Regarding this reciprocal one-odor paradigm, we show that 2) paired presentations of an odor with a reward increase odor preference above baseline, whereas unpaired presentations of odor and reward decrease odor preference below baseline; this suggests that odors can become predictive either of reward or of reward absence. Furthermore, we show that 3) innate attractiveness and associative learnability can be dissociated. These data deepen our understanding of odor-reward learning in larval Drosophila on the behavioral level, and thus foster its neurogenetic analysis.
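In such a reciprocal design, learning is typically quantified from the odor preferences of the two reciprocally trained groups; a minimal sketch of one such index is given below (the paper's exact scoring may differ):

```python
def preference(n_toward_odor, n_total):
    """Preference score in [-1, 1] from a choice test: fraction of animals on
    the odor side minus fraction on the other side."""
    return (2 * n_toward_odor - n_total) / n_total

def performance_index(pref_group_x_rewarded, pref_group_y_rewarded):
    """Half the difference in odor-X preference between the two reciprocally
    trained groups; positive values indicate appetitive associative learning."""
    return (pref_group_x_rewarded - pref_group_y_rewarded) / 2.0

print(performance_index(preference(32, 40), preference(14, 40)))
```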

20.
Reward prediction errors (RPEs) and risk preferences have two things in common: both can shape decision making behavior, and both are commonly associated with dopamine. RPEs drive value learning and are thought to be represented in the phasic release of striatal dopamine. Risk preferences bias choices towards or away from uncertainty; they can be manipulated with drugs that target the dopaminergic system. Based on the common neural substrate, we hypothesize that RPEs and risk preferences are linked on the level of behavior as well. Here, we develop this hypothesis theoretically and test it empirically. First, we apply a recent theory of learning in the basal ganglia to predict how RPEs influence risk preferences. We find that positive RPEs should cause increased risk-seeking, while negative RPEs should cause risk-aversion. We then test our behavioral predictions using a novel bandit task in which value and risk vary independently across options. Critically, conditions are included where options vary in risk but are matched for value. We find that our prediction was correct: participants become more risk-seeking if choices are preceded by positive RPEs, and more risk-averse if choices are preceded by negative RPEs. These findings cannot be explained by other known effects, such as nonlinear utility curves or dynamic learning rates.
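The behavioral prediction can be sketched with a softmax choice between a safe and a risky option of equal expected value, where the most recent RPE adds a risk bonus or penalty proportional to the risky option's outcome spread; the specific form of this bias term is an illustrative assumption, not the fitted model from the paper:

```python
import numpy as np

def p_choose_risky(value_safe, value_risky, risk, last_rpe, beta=3.0, kappa=0.5):
    """risk: spread (e.g., SD) of the risky option's outcomes; last_rpe: most
    recent reward prediction error. A positive RPE inflates the risky option's
    utility, a negative RPE deflates it."""
    utility_risky = value_risky + kappa * last_rpe * risk
    p = np.exp(beta * np.array([value_safe, utility_risky]))
    return (p / p.sum())[1]

print(p_choose_risky(0.5, 0.5, risk=0.3, last_rpe=+1.0))  # > 0.5: risk-seeking
print(p_choose_risky(0.5, 0.5, risk=0.3, last_rpe=-1.0))  # < 0.5: risk-averse
```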
