Similar Articles
20 similar articles found.
1.
Reinforcement learning is ubiquitous. Unlike other forms of learning, it involves the processing of fast yet content-poor feedback information to correct assumptions about the nature of a task or of a set of stimuli. This feedback information is often delivered as generic rewards or punishments, and has little to do with the stimulus features to be learned. How can such low-content feedback lead to such an efficient learning paradigm? Through a review of existing neuro-computational models of reinforcement learning, we suggest that the efficiency of this type of learning resides in the dynamic and synergistic cooperation of brain systems that use different levels of computations. The implementation of reward signals at the synaptic, cellular, network and system levels gives the organism the necessary robustness, adaptability and processing speed required for evolutionary and behavioral success.
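To make the idea of learning from content-poor feedback concrete, here is a minimal sketch (our illustration, not a model from the review) in which a single scalar prediction error updates the value of whichever option was just taken; the task, parameter values, and variable names are assumptions.

# Minimal sketch: value learning from a scalar, content-poor reward signal.
# Illustrative only; parameter values and names are assumptions.
import random

alpha = 0.1                      # learning rate
values = {"A": 0.0, "B": 0.0}    # learned values of two stimuli/actions

def choose():
    # epsilon-greedy choice: mostly exploit, occasionally explore
    if random.random() < 0.1:
        return random.choice(list(values))
    return max(values, key=values.get)

for trial in range(1000):
    action = choose()
    reward = 1.0 if (action == "A" and random.random() < 0.8) else 0.0
    delta = reward - values[action]        # prediction error (generic feedback)
    values[action] += alpha * delta        # synaptic-weight-like update

print(values)   # the value of "A" should approach ~0.8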

2.
3.
Abe H, Lee D. Neuron, 2011, 70(4): 731-741
Knowledge about hypothetical outcomes from unchosen actions is beneficial only when such outcomes can be correctly attributed to specific actions. Here we show that during a simulated rock-paper-scissors game, rhesus monkeys can adjust their choice behaviors according to both actual and hypothetical outcomes from their chosen and unchosen actions, respectively. In addition, neurons in both dorsolateral prefrontal cortex and orbitofrontal cortex encoded the signals related to actual and hypothetical outcomes immediately after they were revealed to the animal. Moreover, compared to the neurons in the orbitofrontal cortex, those in the dorsolateral prefrontal cortex were more likely to change their activity according to the hypothetical outcomes from specific actions. Conjunctive and parallel coding of multiple actions and their outcomes in the prefrontal cortex might enhance the efficiency of reinforcement learning and also contribute to their context-dependent memory.
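The computational advantage described here can be sketched simply: in rock-paper-scissors, once the opponent's move is revealed the payoff of every unchosen action is also known, so a learner can update hypothetical as well as actual outcomes. The code below is an illustrative delta-rule sketch, not the fitted model from the study; the opponent policy and parameters are assumptions.

# Sketch: rock-paper-scissors learner that updates the chosen action (actual outcome)
# and the unchosen actions (hypothetical outcomes). Illustrative assumption only.
import random, math

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, opp):
    if a == opp:
        return 0.0
    return 1.0 if BEATS[a] == opp else -1.0

values = {a: 0.0 for a in ACTIONS}
alpha, beta = 0.2, 3.0            # learning rate, softmax inverse temperature

def softmax_choice():
    w = [math.exp(beta * values[a]) for a in ACTIONS]
    r = random.random() * sum(w)
    for a, wi in zip(ACTIONS, w):
        r -= wi
        if r <= 0:
            return a
    return ACTIONS[-1]

for trial in range(500):
    choice = softmax_choice()
    opponent = random.choice(ACTIONS)     # stand-in opponent
    for a in ACTIONS:                     # actual + hypothetical updates
        outcome = payoff(a, opponent)
        values[a] += alpha * (outcome - values[a])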

4.
Decision making and learning in a real-world context require organisms to track not only the choices they make and the outcomes that follow but also other untaken, or counterfactual, choices and their outcomes. Although the neural system responsible for tracking the value of choices actually taken is increasingly well understood, whether a neural system tracks counterfactual information is currently unclear. Using a three-alternative decision-making task, a Bayesian reinforcement-learning algorithm, and fMRI, we investigated the coding of counterfactual choices and prediction errors in the human brain. Rather than representing evidence favoring multiple counterfactual choices, lateral frontal polar cortex (lFPC), dorsomedial frontal cortex (DMFC), and posteromedial cortex (PMC) encode the reward-based evidence favoring the best counterfactual option at future decisions. In addition to encoding counterfactual reward expectations, the network carries a signal for learning about counterfactual options when feedback is available: a counterfactual prediction error. Unlike other brain regions that have been associated with the processing of counterfactual outcomes, counterfactual prediction errors within the identified network cannot be related to regret theory. Furthermore, individual variation in counterfactual choice-related activity and prediction error-related activity, respectively, predicts variation in the propensity to switch to profitable choices in the future and the ability to learn from hypothetical feedback. Taken together, these data provide both neural and behavioral evidence to support the existence of a previously unidentified neural system responsible for tracking both counterfactual choice options and their outcomes.
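For readers unfamiliar with the terminology, the sketch below separates the two quantities the abstract distinguishes: a reward expectation for the best unchosen (counterfactual) option, and a counterfactual prediction error when its outcome is revealed. It is a simple delta-rule stand-in for the study's Bayesian learner; the option names, probabilities, and learning rate are assumptions.

# Sketch: three-option task in which feedback is shown for every option,
# allowing a counterfactual prediction error for the best unchosen option.
import random

probs = {"left": 0.2, "middle": 0.5, "right": 0.8}   # hypothetical reward probabilities
expect = {k: 0.5 for k in probs}                     # learned reward expectations
alpha = 0.15

for trial in range(300):
    chosen = max(expect, key=expect.get)             # take the currently best option
    unchosen = [k for k in probs if k != chosen]
    best_cf = max(unchosen, key=expect.get)          # best counterfactual option

    # feedback is revealed for every option on this trial
    outcomes = {k: float(random.random() < probs[k]) for k in probs}

    factual_pe = outcomes[chosen] - expect[chosen]            # standard prediction error
    counterfactual_pe = outcomes[best_cf] - expect[best_cf]   # counterfactual prediction error

    expect[chosen] += alpha * factual_pe
    expect[best_cf] += alpha * counterfactual_pe              # learn from hypothetical feedback
    # (the remaining, non-best counterfactual option is left un-updated in this sketch)

print({k: round(v, 2) for k, v in expect.items()})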

5.
Watanabe M, Masuda T, Aihara K. BioSystems, 2003, 71(1-2): 213-220
We introduce a biologically plausible method of implementing reinforcement learning in multi-layer neural networks. The key idea is to spatially localize the synaptic modulation induced by reinforcement signals, proceeding downstream from the initial layer to the final layer. Since reinforcement signals are known to be broadcast signals in the actual brain, we need two key assumptions, inhibitory backward connections and bypass connections to output units, to spatially localize the effect of delayed reinforcement without breaking the basic laws of neurophysiology.

6.
Previous studies showed that the understanding of others' basic emotional experiences is based on a “resonant” mechanism, i.e., on the reactivation, in the observer's brain, of the cerebral areas associated with those experiences. The present study aimed to investigate whether the same neural mechanism is activated both when experiencing and when attending to complex, cognitively generated emotions. A gambling task and functional magnetic resonance imaging (fMRI) were used to test this hypothesis using regret, the negative cognitively-based emotion resulting from an unfavorable counterfactual comparison between the outcomes of chosen and discarded options. Do the same brain structures that mediate the experience of regret become active in the observation of situations eliciting regret in another individual? Here we show that observing the regretful outcomes of someone else's choices activates the same regions that are activated during a first-person experience of regret, i.e. the ventromedial prefrontal cortex, anterior cingulate cortex and hippocampus. These results extend the possible role of a mirror-like mechanism beyond basic emotions.

7.
Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in the dorsolateral prefrontal cortex often encoded the previous choices of the animal and its opponent as well as the animal's reward history. By contrast, the neurons in the anterior cingulate cortex predominantly encoded the animal's reward history. Using simple competitive games, therefore, we have demonstrated functional specialization between different areas of the primate frontal cortex involved in outcome monitoring and action selection. Temporally extended signals related to the animal's previous choices might facilitate the association between choices and their delayed outcomes, whereas information about the choices of the opponent might be used to estimate the reward expected from a particular action. Finally, signals related to the reward history might be used to monitor the overall success of the animal's current decision-making strategy.
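A small simulation can illustrate why a reward-tracking learner deviates from the game-theoretic optimum (an unpredictable 50/50 mix) when its opponent exploits its choice history. This is a generic sketch of a matching-style zero-sum game under assumed parameters, not the reinforcement-learning model fitted in the study.

# Sketch: a value-chasing learner in a simple zero-sum matching game against an
# opponent that exploits the learner's recent choice history. Illustrative only.
import random, math

values = {"left": 0.0, "right": 0.0}
alpha, beta = 0.3, 5.0                   # learning rate, softmax inverse temperature
recent, wins = [], 0.0                   # learner's recent choices and win count

def learner_choice():
    # softmax on learned values: reward history, not equilibrium play, drives choice
    p_left = 1.0 / (1.0 + math.exp(-beta * (values["left"] - values["right"])))
    return "left" if random.random() < p_left else "right"

def opponent_choice():
    # the opponent mismatches by playing the side the learner has used least recently
    if not recent:
        return random.choice(["left", "right"])
    return "right" if recent.count("left") > len(recent) / 2 else "left"

n_trials = 2000
for trial in range(n_trials):
    a, o = learner_choice(), opponent_choice()
    reward = 1.0 if a == o else 0.0      # the learner wins by matching the opponent
    wins += reward
    values[a] += alpha * (reward - values[a])
    recent = (recent + [a])[-20:]

print(wins / n_trials)   # a predictable, history-driven learner can be pushed below 0.5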

8.
Human subjects are proficient at tracking the mean and variance of rewards and updating these via prediction errors. Here, we addressed whether humans can also learn about higher-order relationships between distinct environmental outcomes, a defining ecological feature of contexts where multiple sources of rewards are available. By manipulating the degree to which distinct outcomes are correlated, we show that subjects implemented an explicit model-based strategy to learn the associated outcome correlations and were adept in using that information to dynamically adjust their choices in a task that required a minimization of outcome variance. Importantly, the experimentally generated outcome correlations were explicitly represented neuronally in right midinsula with a learning prediction error signal expressed in rostral anterior cingulate cortex. Thus, our data show that the human brain represents higher-order correlation structures between rewards, a core adaptive ability whose immediate benefit is optimized sampling.
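The idea of learning an outcome correlation through a prediction-error-like update, and then using it to minimize outcome variance, can be sketched as follows. The delta rule, the two-outcome statistics, and the closed-form variance-minimizing weight are illustrative assumptions, not the study's task or fitted model.

# Sketch: learn the correlation between two reward streams with a delta rule,
# then weight the two resources to minimize outcome variance. Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
sigma1, sigma2, true_rho = 1.0, 2.0, -0.6
cov = [[sigma1**2, true_rho * sigma1 * sigma2],
       [true_rho * sigma1 * sigma2, sigma2**2]]

rho_hat, eta = 0.0, 0.05                          # learned correlation, learning rate
for trial in range(500):
    x1, x2 = rng.multivariate_normal([0.0, 0.0], cov)
    sample = (x1 * x2) / (sigma1 * sigma2)        # single-trial correlation sample
    rho_hat += eta * (sample - rho_hat)           # correlation "prediction error" update

# weight on outcome 1 that minimizes Var[w*x1 + (1-w)*x2] under the learned rho
w = (sigma2**2 - rho_hat * sigma1 * sigma2) / \
    (sigma1**2 + sigma2**2 - 2 * rho_hat * sigma1 * sigma2)
print(round(rho_hat, 2), round(w, 2))             # ~-0.6 and ~0.70 with these assumptions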

9.
The acknowledged importance of uncertainty in economic decision making has stimulated the search for neural signals that could influence learning and inform decision mechanisms. Current views distinguish two forms of uncertainty, namely risk and ambiguity, depending on whether the probability distributions of outcomes are known or unknown. Behavioural neurophysiological studies on dopamine neurons revealed a risk signal, which covaried with the standard deviation or variance of the magnitude of juice rewards and occurred separately from reward value coding. Human imaging studies identified similarly distinct risk signals for monetary rewards in the striatum and orbitofrontal cortex (OFC), thus fulfilling a requirement for the mean variance approach of economic decision theory. The orbitofrontal risk signal covaried with individual risk attitudes, possibly explaining individual differences in risk perception and risky decision making. Ambiguous gambles with incomplete probabilistic information induced stronger brain signals than risky gambles in OFC and amygdala, suggesting that the brain's reward system signals the partial lack of information. The brain can use the uncertainty signals to assess the uncertainty of rewards, influence learning, modulate the value of uncertain rewards and make appropriate behavioural choices between only partly known options.
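The mean-variance approach mentioned above can be written down in a few lines: expected value and variance (risk) enter utility separately, weighted by an individual risk attitude. The gambles and the risk weight below are illustrative assumptions.

# Sketch of a mean-variance utility: U = E[r] - lambda * Var[r] (risk-averse if lambda > 0).
def mean_variance_utility(outcomes, probs, risk_weight):
    mean = sum(p * x for p, x in zip(probs, outcomes))
    var = sum(p * (x - mean) ** 2 for p, x in zip(probs, outcomes))
    return mean - risk_weight * var

safe  = mean_variance_utility([5.0],       [1.0],      risk_weight=0.05)
risky = mean_variance_utility([0.0, 10.0], [0.5, 0.5], risk_weight=0.05)
print(safe, risky)   # same expected value (5), but the risky gamble is penalized for variance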

10.
Despite explicitly wanting to quit, long-term addicts find themselves powerless to resist drugs, even while knowing that drug-taking may be a harmful course of action. Such inconsistency between the explicit knowledge of negative consequences and the compulsive behavioral patterns represents a cognitive/behavioral conflict that is a central characteristic of addiction. Neurobiologically, differential cue-induced activity in distinct striatal subregions, as well as the dopamine connectivity spiraling from ventral striatal regions to the dorsal regions, play critical roles in compulsive drug seeking. However, the functional mechanism that integrates these neuropharmacological observations with the above-mentioned cognitive/behavioral conflict is unknown. Here we provide a formal computational explanation for the drug-induced cognitive inconsistency that is apparent in the addicts' “self-described mistake”. We show that addictive drugs gradually produce a motivational bias toward drug-seeking at low-level habitual decision processes, despite the low abstract cognitive valuation of this behavior. This pathology emerges within the hierarchical reinforcement learning framework when chronic exposure to the drug pharmacologically produces pathologically persistent phasic dopamine signals. Thereby the drug hijacks the dopaminergic spirals that cascade the reinforcement signals down the ventro-dorsal cortico-striatal hierarchy. Neurobiologically, our theory accounts for the rapid development of drug cue-elicited dopamine efflux in the ventral striatum and a delayed response in the dorsal striatum. Our theory also shows how this response pattern depends critically on the dopamine spiraling circuitry. Behaviorally, our framework explains the gradual insensitivity of drug-seeking to drug-associated punishments, the blocking phenomenon for drug outcomes, and the persistent preference for drugs over natural rewards by addicts. The model suggests testable predictions and, beyond that, sets the stage for a view of addiction as a pathology of hierarchical decision-making processes. This view is complementary to the traditional interpretation of addiction as an interaction between habitual and goal-directed decision systems.
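One simple way to formalize a "pathologically persistent" phasic dopamine signal, in the spirit of earlier non-compensable-dopamine accounts (Redish-style models) rather than the paper's full hierarchical ventro-dorsal model, is to let the drug prediction error never fall below a fixed boost D. The parameters below are illustrative assumptions.

# Sketch: for natural rewards the prediction error shrinks to zero as the value is learned;
# for drug rewards it cannot fall below D, so the drug's learned value keeps inflating.
alpha, D = 0.1, 0.5                      # learning rate; persistent dopamine boost for drugs
V_food, V_drug = 0.0, 0.0
r_food, r_drug = 1.0, 1.0                # equal "true" reward magnitudes

for trial in range(200):
    pe_food = r_food - V_food                     # ordinary prediction error: decays to 0
    V_food += alpha * pe_food
    pe_drug = max(r_drug - V_drug + D, D)         # drug prediction error cannot fall below D
    V_drug += alpha * pe_drug

print(round(V_food, 2), round(V_drug, 2))   # food value ~1.0; drug value keeps growing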

11.
Standard theories of decision-making involving delayed outcomes predict that people should defer a punishment, whilst advancing a reward. In some cases, such as pain, people seem to prefer to expedite punishment, implying that its anticipation carries a cost, often conceptualized as ‘dread’. Despite empirical support for the existence of dread, whether and how it depends on prospective delay is unknown. Furthermore, it is unclear whether dread represents a stable component of value, or is modulated by biases such as framing effects. Here, we examine choices made between different numbers of painful shocks to be delivered faithfully at different time points up to 15 minutes in the future, as well as choices between hypothetical painful dental appointments at time points of up to approximately eight months in the future, to test alternative models for how future pain is disvalued. We show that future pain initially becomes increasingly aversive with increasing delay, but does so at a decreasing rate. This is consistent with a value model in which moment-by-moment dread increases up to the time of expected pain, such that dread becomes equivalent to the discounted expectation of pain. For a minority of individuals pain has maximum negative value at intermediate delay, suggesting that the dread function may itself be prospectively discounted in time. Framing an outcome as relief reduces the overall preference to expedite pain, which can be parameterized by reducing the rate of the dread-discounting function. Our data support an account of disvaluation for primary punishments such as pain, which differs fundamentally from existing models applied to financial punishments, in which dread exerts a powerful but time-dependent influence over choice.
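The kind of value model the abstract describes can be sketched as follows: the disvalue of a future pain is the discounted pain itself plus accumulated dread, where dread at each waiting moment equals the discounted expectation of the coming pain, and the dread stream may itself be discounted back to the time of choice. The functional form and parameter values below are illustrative assumptions, not the paper's fitted model.

# Sketch: disvalue of a future pain = discounted pain + (possibly discounted) accumulated dread.
import math

def pain_disvalue(pain, delay, k=0.2, gamma=0.1, w=1.0, steps=1000):
    """pain: magnitude; delay: time until delivery; k: discount rate on pain;
    gamma: discount rate applied to the dread stream; w: weight on dread."""
    dt = delay / steps
    dread = 0.0
    for i in range(steps):
        t = i * dt                                          # a moment during the wait
        anticipation = math.exp(-k * (delay - t)) * pain    # dread felt at time t
        dread += math.exp(-gamma * t) * anticipation * dt   # discounted back to choice time
    discounted_pain = math.exp(-k * delay) * pain
    return discounted_pain + w * dread

# with these assumed parameters, disvalue first rises with delay, then falls at long delays,
# matching the "maximum negative value at intermediate delay" pattern described above
for d in [0.5, 2, 5, 10, 20]:
    print(d, round(pain_disvalue(1.0, d), 3))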

12.
Pei X, Hill J, Schalk G. IEEE Pulse, 2012, 3(1): 43-46
From the 1980s movie Firefox to the more recent Avatar, popular science fiction has speculated about the possibility of a person's thoughts being read directly from his or her brain. Such brain-computer interfaces (BCIs) might allow people who are paralyzed to communicate with and control their environment, and there might also be applications in military situations wherever silent user-to-user communication is desirable. Previous studies have shown that BCI systems can use brain signals related to movements and movement imagery or attention-based character selection. Although these systems have successfully demonstrated the possibility to control devices using brain function, directly inferring which word a person intends to communicate has been elusive. A BCI using imagined speech might provide such a practical, intuitive device. Toward this goal, our studies to date addressed two scientific questions: (1) Can brain signals accurately characterize different aspects of speech? (2) Is it possible to predict spoken or imagined words or their components using brain signals?

13.
In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth because they have less experience to draw from than adults. Some studies suggest probabilistic learning may be inefficient in youths compared to adults, while others suggest it may be more efficient in youths in mid-adolescence. Here we used a probabilistic reinforcement learning task to test how youths aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants’ performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by 1) an increase in learning rate (i.e., a decrease in integration time scale); and 2) a decrease in noisy/exploratory choices. In mid-adolescence (ages 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
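The model class referred to above can be sketched as a delta-rule learner with separate learning rates for positive and negative prediction errors plus a softmax choice rule whose inverse temperature controls noisy/exploratory choices. The parameter values below are illustrative; the study fit such models hierarchically with Bayesian methods, which is not reproduced here.

# Sketch: asymmetric-learning-rate reinforcement learning with a softmax choice rule.
import math, random

def run_learner(reward_probs, alpha_pos=0.3, alpha_neg=0.0, beta=4.0, n_trials=200):
    q = {a: 0.5 for a in reward_probs}
    best = max(reward_probs, key=reward_probs.get)
    correct = 0
    for _ in range(n_trials):
        actions = list(q)
        w = [math.exp(beta * q[a]) for a in actions]     # beta: inverse temperature
        r = random.random() * sum(w)
        choice = actions[-1]
        for a, wi in zip(actions, w):
            r -= wi
            if r <= 0:
                choice = a
                break
        correct += (choice == best)
        outcome = float(random.random() < reward_probs[choice])
        pe = outcome - q[choice]
        q[choice] += (alpha_pos if pe > 0 else alpha_neg) * pe   # asymmetric update
    return correct / n_trials

# with alpha_neg = 0, negative outcomes have no impact on learning
print(run_learner({"A": 0.8, "B": 0.2}))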

14.
The hippocampus processes information associated with spatial navigation. The subiculum receives input from hippocampal area CA1 and projects to various cortical and subcortical regions. Thus, the subiculum is uniquely positioned to distribute hippocampal information to a range of brain areas. Subicular neurons fire at higher rates than CA1 neurons and exhibit similarly or more accurately decodable representations of place, speed, and trajectory. These representations are more noise-resistant and advantageous for long-range information transfer. Subicular neurons selectively or uniformly distribute information to target areas, depending on the information type. Theta oscillations and sharp-wave ripples control information broadcasting in a pathway-specific manner. Thus, the subiculum routes accurately decodable, noise-resistant, navigation-associated information to downstream regions.

15.
Formation of conditioned switching-over of heterogeneous instrumental reflexes was more successful in dogs than in rats, which testifies to significant differences in the functional organization of analytic-synthetic brain activity between rodents and predatory animals. Experiments with hippocampal lesions (in rats) and recordings of hippocampal electrical activity (in dogs) led to the conclusion that in both species the hippocampus belongs to the system of structures participating in the formation of conditioned switching-over. The data obtained during elaboration of the switching-over and during probabilistic reinforcement of the alimentary conditioned stimulus indicate that supporting reactions to signals with a low probability of reinforcement is one of the functions of the hippocampus across different animal species.

16.
Berns GS, Brooks AM, Spivak M. PLoS ONE, 2012, 7(5): e38027
Because of dogs' prolonged evolution with humans, many of the canine cognitive skills are thought to represent a selection of traits that make dogs particularly sensitive to human cues. But how does the dog mind actually work? To develop a methodology to answer this question, we trained two dogs to remain motionless for the duration required to collect quality fMRI images by using positive reinforcement without sedation or physical restraints. The task was designed to determine which brain circuits differentially respond to human hand signals denoting the presence or absence of a food reward. Head motion within trials was less than 1 mm. Consistent with prior reinforcement learning literature, we observed caudate activation in both dogs in response to the hand signal denoting reward versus no-reward.

17.
There is a large literature showing the detrimental effects of prenatal smoking on birth and childhood health outcomes. It is somewhat unclear, though, whether these effects are causal or whether they reflect other characteristics and choices of mothers who choose to smoke that may also affect child health outcomes, or biased reporting of smoking. In this paper we use genetic markers that predict smoking behaviors as instruments to address the endogeneity of smoking choices in the production of birth and childhood health outcomes. Our results indicate that prenatal smoking produces more dramatic declines in birth weight than estimates that ignore the endogeneity of prenatal smoking, which is consistent with previous studies using non-genetic instruments. We use data from two distinct samples from Norway and the United States with different measured instruments and find nearly identical results. The study provides a novel application that can be extended to study several behavioral impacts on health and on social and economic outcomes.
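The instrumental-variables logic can be sketched in a few lines of simulation: a genetic marker that shifts smoking but (by assumption) affects birth weight only through smoking is used in two-stage least squares. The data, variable names, and effect sizes below are simulated assumptions, not the Norwegian or US samples.

# Sketch: two-stage least squares with a genetic instrument, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
gene = rng.integers(0, 2, n).astype(float)          # instrument: genetic variant
confound = rng.normal(size=n)                        # unobserved factor biasing naive OLS
smoking = (0.5 * gene + 0.8 * confound + rng.normal(size=n) > 0.5).astype(float)
birthweight = 3400 - 200 * smoking - 150 * confound + rng.normal(0, 300, n)

def ols(y, X):
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]      # [intercept, slope]

naive = ols(birthweight, smoking)[1]                 # biased by the confound
stage1 = ols(smoking, gene)                          # first stage: smoking ~ gene
smoking_hat = stage1[0] + stage1[1] * gene
iv = ols(birthweight, smoking_hat)[1]                # second stage: birthweight ~ predicted smoking

print(round(naive), round(iv))                       # the IV estimate should be nearer -200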

18.
19.
Success in a constantly changing environment requires that decision-making strategies be updated as reward contingencies change. How this is accomplished by the nervous system has, until recently, remained a profound mystery. New studies coupling economic theory with neurophysiological techniques have revealed the explicit representation of behavioral value. Specifically, when fluid reinforcement is paired with visually-guided eye movements, neurons in parietal cortex, prefrontal cortex, the basal ganglia, and superior colliculus (all nodes in a network linking visual stimulation with the generation of oculomotor behavior) encode the expected value of targets lying within their response fields. Other brain areas have been implicated in the processing of reward-related information in the abstract: midbrain dopaminergic neurons, for instance, signal an error in reward prediction. Still other brain areas link information about reward to the selection and performance of specific actions in order for behavior to adapt to changing environmental exigencies. Neurons in posterior cingulate cortex have been shown to carry signals related to both reward outcomes and oculomotor behavior, suggesting that they participate in updating estimates of orienting value.

20.
The objective of the present research was to investigate whether hidden Markov models can be used to recognise and classify balance signals from two subject groups: healthy subjects and patients suffering from otoneurological vertiginous diseases. Two different testing protocols were applied: arising from a chair and standing on a force platform. Models with different numbers of states were trained on signals recorded under these protocols to find the best model structures. We found that models with 7–15 states were able to distinguish the healthy subjects from the patients with an accuracy between 70 and 90%, although the balance measurements of the two groups were visually very similar and difficult to separate.
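The classification scheme described above can be sketched as follows: train one hidden Markov model per group and assign a new recording to the group whose model gives the higher log-likelihood. The simulated signals, the state count, and the choice of the hmmlearn library are illustrative assumptions, not the study's data or software.

# Sketch: per-group Gaussian HMMs and likelihood-based classification of a balance signal.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

def simulate(scale, n_seq=20, length=200):
    # stand-in for balance/sway recordings, one feature per time step
    return [rng.normal(0.0, scale, (length, 1)) for _ in range(n_seq)]

def train(seqs, n_states=7):
    X = np.vstack(seqs)
    lengths = [len(s) for s in seqs]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

healthy_model = train(simulate(scale=0.3))
patient_model = train(simulate(scale=1.0))

test = simulate(scale=1.0, n_seq=1)[0]               # a new, unlabeled recording
label = "patient" if patient_model.score(test) > healthy_model.score(test) else "healthy"
print(label)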

