Similar articles
20 similar articles found (search time: 15 ms)
1.
Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.

2.
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.

3.
The role of dopamine in behaviour and decision-making is often cast in terms of reinforcement learning and optimal decision theory. Here, we present an alternative view that frames the physiology of dopamine in terms of Bayes-optimal behaviour. In this account, dopamine controls the precision or salience of (external or internal) cues that engender action. In other words, dopamine balances bottom-up sensory information and top-down prior beliefs when making hierarchical inferences (predictions) about cues that have affordance. In this paper, we focus on the consequences of changing tonic levels of dopamine firing using simulations of cued sequential movements. Crucially, the predictions driving movements are based upon a hierarchical generative model that infers the context in which movements are made. This means that we can confuse agents by changing the context (order) in which cues are presented. These simulations provide a (Bayes-optimal) model of contextual uncertainty and set switching that can be quantified in terms of behavioural and electrophysiological responses. Furthermore, one can simulate dopaminergic lesions (by changing the precision of prediction errors) to produce pathological behaviours that are reminiscent of those seen in neurological disorders such as Parkinson's disease. We use these simulations to demonstrate how a single functional role for dopamine at the synaptic level can manifest in different ways at the behavioural level.

4.
Finding the right amount of deliberation, between insufficient and excessive, is a hard decision-making problem that depends on the value we place on our time. Average reward, putatively encoded by tonic dopamine, serves in existing reinforcement learning theory as the opportunity cost of time, including deliberation time. Importantly, this cost can itself vary with the environmental context and is not trivial to estimate. Here, we propose how the opportunity cost of deliberation can be estimated adaptively on multiple timescales to account for non-stationary contextual factors. We use it in a simple decision-making heuristic based on average-reward reinforcement learning (AR-RL) that we call Performance-Gated Deliberation (PGD). We propose PGD as a strategy used by animals wherein deliberation cost is implemented directly as urgency, a previously characterized neural signal effectively controlling the speed of the decision-making process. We show PGD outperforms AR-RL solutions in explaining behaviour and urgency of non-human primates in a context-varying random walk prediction task and is consistent with relative performance and urgency in a context-varying random dot motion task. We make readily testable predictions for both neural activity and behaviour.

5.
Using behavioral genetic analyses, we investigated a possible relationship between adolescent alcohol use and six domains of common problem behaviors in a community-based sample of 633 twin pairs who were under the legal drinking age of 21 (mean age = 15.0 years). The underlying etiology of the six problem behavioral domains, classified as conduct problems, hyperactivity, school problems, low self-esteem, neuroticism, and social withdrawal, was previously described (Siewert et al., 2003) as two heritable and genetically distinct dimensions of problem behavior. We took the two best-fitting models from that study (one that proposed a generalized behavior problem factor along with an internalizing behavior factor, and one that proposed an externalizing behavior factor along with an internalizing behavior factor) and extended the analyses in this study to include an index of alcohol use. Our results suggest that there is a strong genetic relationship between adolescent alcohol use and a broad spectrum of both externalizing and internalizing behavioral problems. An individual who seems to be at risk for either generalized or specifically externalizing behavioral problems is also at risk for adolescent alcohol use. However, an individual who exhibits internalizing problem behaviors appears to be protected from adolescent alcohol use. We propose that adolescent alcohol consumption needs to be understood in the context of these genetically influenced externalizing and internalizing propensities.

6.
Adolescence is associated with high impulsivity and risk taking, making adolescent individuals more inclined to use drugs. Early drug use is correlated with increased risk for substance use disorders later in life, but the neurobiological basis is unclear. The brain undergoes extensive development during adolescence, and disturbances at this time are hypothesized to contribute to increased vulnerability. The transition from controlled to compulsive drug use and addiction involves long-lasting changes in neural networks, including a shift from the nucleus accumbens, mediating acute reinforcing effects, to recruitment of the dorsal striatum and habit formation. This study aimed to test the hypothesis of increased dopamine release after a pharmacological challenge in adolescent rats. Potassium-evoked dopamine release and uptake were investigated using chronoamperometric dopamine recordings in combination with a challenge by amphetamine in early and late adolescent rats and in adult rats. In addition, the consequences of voluntary alcohol intake during adolescence on these effects were investigated. The data show a gradual increase of evoked dopamine release with age, supporting previous studies suggesting that the pool of releasable dopamine increases with age. In contrast, a gradual decrease in evoked release with age was seen in response to amphetamine, supporting a proportionally larger storage pool of dopamine in younger animals. Dopamine measurements after voluntary alcohol intake showed lower release amplitudes in response to potassium chloride, indicating that alcohol affects the releasable pool of dopamine; this may have implications for vulnerability to addiction and other psychiatric diagnoses involving dopamine in the dorsal striatum.

7.
The influence of the cAMP analogue 8-Br-cAMP on conditioning was studied in white rats. Two models of learning were used with different kinds of reinforcement, i.e. conditioned active avoidance and instrumental alimentary reactions in a complex maze. Intraventricular 8-Br-cAMP injection 4 or 24 hours before the beginning of learning improved the process of defensive as well as alimentary conditioning. The way experimental rats formed complex behaviour in the maze showed that, under the influence of 8-Br-cAMP, not only was conditioning accelerated, but the process of optimal decision making itself was changed. The data suggest that 8-Br-cAMP primarily affects rats that initially learn poorly.

8.
Brown and Wagner [Brown, R.T., Wagner, A.R., 1964. Resistance to punishment and extinction following training with shock or nonreinforcement. J. Exp. Psychol. 68, 503-507] investigated rat behaviors with the following features: (1) rats were exposed to reward and punishment at the same time, (2) the environment changed and rats relearned, and (3) rats were stochastically exposed to reward and punishment. The results show that exposure to nonreinforcement produces resistance to the decremental effects of behavior after a stochastic reward schedule, and that exposure to both punishment and reinforcement produces resistance to the decremental effects of behavior after a stochastic punishment schedule. This paper aims to simulate the rat behaviors with a reinforcement learning algorithm that takes into account the appearance probabilities of reinforcement signals. Previous reinforcement learning algorithms were unable to simulate the behavior of feature (3). We improve on these algorithms by controlling learning parameters according to the acquisition probabilities of reinforcement signals. The proposed algorithm qualitatively simulates the result of the animal experiment of Brown and Wagner.

9.
Halici U. Bio Systems, 2001, 63(1-3): 21-34
The reinforcement learning scheme proposed in Halici (J. Biosystems 40 (1997) 83) for the random neural network (RNN) (Neural Computation 1 (1989) 502) is based on reward and performs well for stationary environments. However, when the environment is not stationary, it gets stuck on the previously learned action and extinction is not possible. To overcome the problem, the reinforcement scheme is extended in Halici (Eur. J. Oper. Res., 126 (2000) 288) by introducing a new weight update rule (E-rule) which takes into consideration the internal expectation of reinforcement. Although the E-rule is proposed for the RNN, it can be used for training learning automata or other intelligent systems based on reinforcement learning. This paper looks into the behavior of the learning scheme with internal expectation for environments where the reinforcement is obtained after a sequence of cascaded decisions. The simulation results show that the RNN learns well and extinction is possible even for cases with several decision steps and with hundreds of possible decision paths.

10.
Kahnt T, Grueschow M, Speck O, Haynes JD. Neuron, 2011, 70(3): 549-559
The dominant view that perceptual learning is accompanied by changes in early sensory representations has recently been challenged. Here we tested the idea that perceptual learning can be accounted for by reinforcement learning involving changes in higher decision-making areas. We trained subjects on an orientation discrimination task involving feedback over 4 days, acquiring fMRI data on the first and last day. Behavioral improvements were well explained by a reinforcement learning model in which learning leads to enhanced readout of sensory information, thereby establishing noise-robust representations of decision variables. We find stimulus orientation encoded in early visual and higher cortical regions such as lateral parietal cortex and anterior cingulate cortex (ACC). However, only activity patterns in the ACC tracked changes in decision variables during learning. These results provide strong evidence for perceptual learning-related changes in higher order areas and suggest that perceptual and reward learning are based on a common neurobiological mechanism.

11.
Heavy episodic drinking early in adolescence is associated with increased risk of addiction and other stress-related disorders later in life. This suggests that adolescent alcohol abuse is an early marker of innate vulnerability and/or binge exposure impacts the developing brain to increase vulnerability to these disorders in adulthood. Animal models are ideal for clarifying the relationship between adolescent and adult alcohol abuse, but we show that methods of involuntary alcohol exposure are not effective. We describe an operant model that uses multiple bouts of intermittent access to sweetened alcohol to elicit voluntary binge alcohol drinking early in adolescence (~postnatal days 28-42) in genetically heterogeneous male Wistar rats. We next examined the effects of adolescent binge drinking on alcohol drinking and anxiety-like behavior in dependent and non-dependent adult rats, and counted corticotropin-releasing factor (CRF) cells in the lateral portion of the central amygdala (CeA), a region that contributes to regulation of anxiety- and alcohol-related behaviors. Adolescent binge drinking did not alter alcohol drinking under baseline drinking conditions in adulthood. However, alcohol-dependent and non-dependent adult rats with a history of adolescent alcohol binge drinking did exhibit increased alcohol drinking when access to alcohol was intermittent. Adult rats that binged on alcohol during adolescence exhibited increased exploration on the open arms of the elevated plus maze (possibly indicating either decreased anxiety or increased impulsivity), an effect that was reversed by a history of alcohol dependence during adulthood. Finally, CRF cell counts were reduced in the lateral CeA of rats with adolescent alcohol binge history, suggesting semi-permanent changes in the limbic stress peptide system with this treatment. These data suggest that voluntary binge drinking during early adolescence produces long-lasting neural and behavioral effects with implications for anxiety and alcohol use disorders.

12.
Alcohol use is common in adolescence, with a large portion of intake occurring during episodes of binging. This pattern of alcohol consumption coincides with a critical period for neurocognitive development and may impact decision-making and reward processing. Prior studies have demonstrated alterations in adult decision-making following adolescent usage, but it remains to be seen if these alterations exist in adolescence, or are latent until adulthood. Here, using a translational model of voluntary binge alcohol consumption in adolescents, we assess the impact of alcohol intake on risk preference and behavioral flexibility during adolescence. During adolescence (postnatal day 30–50), rats were given 1-hour access to either a 10% alcohol-gelatin mixture (EtOH) or a calorie-equivalent gelatin (Control) at the onset of the dark cycle. EtOH-consuming rats were classified as either High or Low consumers based on intake levels. Adolescent rats underwent behavioral testing once a day, with one group performing a risk preference task, and a second group performing a reversal-learning task during the 20-day period of gelatin access. EtOH-High rats showed increases in risk preference compared to Control rats, but not EtOH-Low animals. However, adolescent rats did a poor job of matching their behavior to optimize outcomes, suggesting that adolescents may adopt a response bias. In addition, adolescent ethanol exposure did not affect the animals' ability to flexibly adapt behavior to changing reward contingencies during reversal learning. These data support the view that adolescent alcohol consumption can have short-term detrimental effects on risk-taking when examined during adolescence, which does not seem to be attributable to an inability to flexibly encode reward contingencies on behavioral responses.

13.
Right-brain-damaged patients show impairments in sequential decision-making tasks with which healthy people have no difficulty. We hypothesized that this difficulty could be due to the failure of right-brain-damaged patients to develop well-matched models of the world. Our motivation is the idea that to navigate uncertainty, humans use models of the world to direct the decisions they make when interacting with their environment. The better the model is, the better their decisions are. To explore the model building and updating process in humans and the basis for impairment after brain injury, we used a computational model of non-stationary sequence learning. RELPH (Reinforcement and Entropy Learned Pruned Hypothesis space) was able to qualitatively and quantitatively reproduce the results of left- and right-brain-damaged patient groups and healthy controls playing a sequential version of Rock, Paper, Scissors. Our results suggest that, in general, humans employ a sub-optimal reinforcement-based learning method rather than an objectively better statistical learning approach, and that differences between right-brain-damaged and healthy control groups can be explained by different exploration policies, rather than qualitatively different learning mechanisms.

14.
Four different neural network algorithms, binary adaptive resonance theory (ART1), self-organizing map, learning vector quantization and back-propagation, were compared in the diagnosis of acute appendicitis with different parameter groups. The results show that the supervised learning algorithms, learning vector quantization and back-propagation, performed better than the unsupervised algorithms in this medical decision-making problem. The best results were obtained with learning vector quantization. The self-organizing map algorithm showed good specificity, but this was in conjunction with lower sensitivity. The best parameter group was found to be the clinical signs. It seems beneficial to design a decision support system which uses these methods in the decision-making process.

15.
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task, and moreover architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a "non-democratic" mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons "vote" independently ("democratic") for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that a speed improvement of 5x to 42x is provided versus optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search for learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.

16.
17.
A large literature has accumulated suggesting that human and animal decision making is driven by at least two systems, and that important functions of these systems can be captured by reinforcement learning algorithms. The “model-free” system caches and uses stimulus–value or stimulus–response associations, and the “model-based” system implements more flexible planning using a model of the world. However, it is not clear how the two systems interact during deliberation and how a single decision emerges from this process, especially when they disagree. Most previous work has assumed that while the systems operate in parallel, they do so independently, and they combine linearly to influence decisions. Using an integrated reinforcement learning/drift-diffusion model, we tested the hypothesis that the two systems interact in a non-linear fashion similar to other situations with cognitive conflict. We differentiated two forms of conflict: action conflict, a binary state representing whether the systems disagreed on the best action, and value conflict, a continuous measure of the extent to which the two systems disagreed on the difference in value between the available options. We found that decisions with greater value conflict were characterized by reduced model-based control and increased caution both with and without action conflict. Action conflict itself (the binary state) acted in the opposite direction, although its effects were less prominent. We also found that between-system conflict was highly correlated with within-system conflict, and although it is less clear a priori why the latter might influence the strength of each system above its standard linear contribution, we could not rule it out. Our work highlights the importance of non-linear conflict effects, and provides new constraints for more detailed process models of decision making. It also presents new avenues to explore with relation to disorders of compulsivity, where an imbalance between systems has been implicated.

18.
The skills required for the learning and use of language are the focus of extensive research, and their evolutionary origins are widely debated. Using agent-based simulations in a range of virtual environments, we demonstrate that challenges of foraging for food can select for cognitive mechanisms supporting complex, hierarchical, sequential learning, the need for which arises in language acquisition. Building on previous work, where we explored the conditions under which reinforcement learning is out-competed by seldom-reinforced continuous learning that constructs a network model of the environment, we now show that realistic features of the foraging environment can select for two critical advances: (i) chunking of meaningful sequences found in the data, leading to representations composed of units that better fit the prevalent statistical patterns in the environment; and (ii) generalization across units based on their contextual similarity. Importantly, these learning processes, which in our framework evolved for making better foraging decisions, had been earlier shown to reproduce a range of findings in language learning in humans. Thus, our results suggest a possible evolutionary trajectory that may have led from basic learning mechanisms to complex hierarchical sequential learning that can support advanced cognitive abilities of the kind needed for language acquisition.

19.
An adolescent female chimpanzee was trained to press a key in the presence of a computer-graphic geometric figure (“Go” stimulus) within 5 sec and not to press the key during 5-sec presentations of another figure (“No-go” stimulus) with food reinforcement. In the acquisition training, the accuracy of performance increased primarily as a result of learning to inhibit key presses in No-go trials. The chimpanzee acquired this “Go/No-go” visual discrimination task in 1,260 trials. She was then given 14 successive transfer problems. The results for these problems suggested that learning-set formation and repeated use of the same discriminative stimuli both influenced transfer to new problems.

20.
Standard economic theories conceive homo economicus as a rational decision maker capable of maximizing utility. In reality, however, people tend to approximate optimal decision-making strategies through a collection of heuristic routines. Some of these routines are driven by emotional processes, and others are adjusted iteratively through experience. In addition, routines specialized for social decision making, such as inference about the mental states of other decision makers, might share their origins and neural mechanisms with the ability to simulate or imagine outcomes expected from alternative actions that an individual can take. A recent surge of collaborations across economics, psychology and neuroscience has provided new insights into how such multiple elements of decision making interact in the brain.
