Similar Literature (20 results)
1.
Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.
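A minimal sketch of the HRL-specific signal described above, assuming a tabular temporal-difference learner and a hypothetical pseudo-reward of 1 for subgoal attainment (states, rewards and learning rates are placeholders, not those used in the studies):

```python
ALPHA, GAMMA = 0.1, 0.95

V_task = {}   # state values with respect to the overall task goal
V_sub = {}    # state values with respect to the currently active subgoal

def td_error(V, s, s_next, r):
    """Ordinary temporal-difference (reward prediction) error."""
    return r + GAMMA * V.get(s_next, 0.0) - V.get(s, 0.0)

def step(s, s_next, env_reward, subgoal_reached, pseudo_reward=1.0):
    # Top-level prediction error: driven by progress toward the overall goal.
    delta_task = td_error(V_task, s, s_next, env_reward)
    V_task[s] = V_task.get(s, 0.0) + ALPHA * delta_task

    # Subgoal-level prediction error: driven by pseudo-reward for attaining the
    # subgoal, even when no primary reward is delivered (the HRL-specific signal).
    r_sub = pseudo_reward if subgoal_reached else 0.0
    delta_sub = td_error(V_sub, s, s_next, r_sub)
    V_sub[s] = V_sub.get(s, 0.0) + ALPHA * delta_sub

    return delta_task, delta_sub

# Reaching a subgoal with no primary reward still yields a non-zero
# subgoal-level prediction error.
print(step("s0", "s1", env_reward=0.0, subgoal_reached=True))
```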

2.
Motor learning with unstable neural representations
Rokni U, Richardson AG, Bizzi E, Seung HS. Neuron 2007, 54(4):653-666
It is often assumed that learning takes place by changing an otherwise stable neural representation. To test this assumption, we studied changes in the directional tuning of primate motor cortical neurons during reaching movements performed in familiar and novel environments. During the familiar task, tuning curves exhibited slow random drift. During learning of the novel task, random drift was accompanied by systematic shifts of tuning curves. Our analysis suggests that motor learning is based on a surprisingly unstable neural representation. To explain these results, we propose that motor cortex is a redundant neural network, i.e., any single behavior can be realized by multiple configurations of synaptic strengths. We further hypothesize that synaptic modifications underlying learning contain a random component, which causes wandering among synaptic configurations with equivalent behaviors but different neural representations. We use a simple model to explore the implications of these assumptions.
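A toy illustration of the proposed mechanism, not the paper's network model: in a redundant linear system whose behavior depends only on the sum of two weights, an error-correcting update plus a random component keeps behavior stable while the individual weights (the "representation") drift among equivalent configurations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Redundant linear "network": the output gain depends only on w1 + w2, so many
# weight configurations realize the same behavior.
w = np.array([0.5, 0.5])
target_gain = 1.0            # desired input-output gain (the "behavior")
eta, noise_sd = 0.05, 0.02   # assumed learning rate and noise level

for t in range(5000):
    x = rng.normal()
    y = (w[0] + w[1]) * x                      # behavior depends on the sum only
    error = target_gain * x - y
    grad = np.array([x, x]) * error            # systematic, task-driven component
    w += eta * grad + noise_sd * rng.normal(size=2)   # plus a random component

# The behavioral gain stays near the target while the individual weights wander.
print("behavioral gain:", round(w.sum(), 3), "  individual weights:", np.round(w, 3))
```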

3.
Recent studies suggest that cooperative decision-making in one-shot interactions is a history-dependent dynamic process: promoting intuition versus deliberation typically has a positive effect on cooperation (dynamism) among people who live in a cooperative setting and have no previous experience with economic games on cooperation (history dependence). Here, we report on a laboratory experiment exploring how these findings transfer to a non-cooperative setting. We find two major results: (i) promoting intuition versus deliberation has no effect on cooperative behaviour among inexperienced subjects living in a non-cooperative setting; (ii) experienced subjects cooperate more than inexperienced subjects, but only under time pressure. These results suggest that cooperation is a learning process, rather than an instinctive impulse or a self-controlled choice, and that experience operates primarily via the channel of intuition. Our findings shed further light on the cognitive basis of human cooperative decision-making and provide further support for the recently proposed social heuristics hypothesis.

4.
A long-standing goal in artificial intelligence is creating agents that can learn a variety of different skills for different problems. In the artificial intelligence subfield of neural networks, a barrier to that goal is that when agents learn a new skill they typically do so by losing previously acquired skills, a problem called catastrophic forgetting. That occurs because, to learn the new task, neural learning algorithms change connections that encode previously acquired skills. How networks are organized critically affects their learning dynamics. In this paper, we test whether catastrophic forgetting can be reduced by evolving modular neural networks. Modularity intuitively should reduce learning interference between tasks by separating functionality into physically distinct modules in which learning can be selectively turned on or off. Modularity can further improve learning by having a reinforcement learning module separate from sensory processing modules, allowing learning to happen only in response to a positive or negative reward. In this paper, learning takes place via neuromodulation, which allows agents to selectively change the rate of learning for each neural connection based on environmental stimuli (e.g. to alter learning in specific locations based on the task at hand). To produce modularity, we evolve neural networks with a cost for neural connections. We show that this connection cost technique causes modularity, confirming a previous result, and that such sparsely connected, modular networks have higher overall performance because they learn new skills faster while retaining old skills more and because they have a separate reinforcement learning module. Our results suggest (1) that encouraging modularity in neural networks may help us overcome the long-standing barrier of networks that cannot learn new skills without forgetting old ones, and (2) that one benefit of the modularity ubiquitous in the brains of natural animals might be to alleviate the problem of catastrophic forgetting.
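Two of the ingredients described above can be sketched in a few lines, with hypothetical fitness terms and parameter values rather than the paper's actual setup: a connection-cost penalty applied during evolutionary selection, and a neuromodulatory signal that gates the per-trial learning rate.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(task_performance, weights, cost_per_connection=0.01):
    # Penalizing connections pushes evolution toward sparse, modular wiring.
    n_connections = np.count_nonzero(weights)
    return task_performance - cost_per_connection * n_connections

def neuromodulated_update(weights, pre, post, modulator, base_rate=0.1):
    """Hebbian update whose effective learning rate is gated on each trial by a
    reward/punishment-driven modulatory signal (zero switches learning off)."""
    return weights + base_rate * modulator * np.outer(post, pre)

# Toy usage: learning only happens when the modulatory signal is non-zero.
w = rng.normal(scale=0.1, size=(3, 4))
pre, post = rng.normal(size=4), rng.normal(size=3)
w_no_learning = neuromodulated_update(w, pre, post, modulator=0.0)
w_rewarded = neuromodulated_update(w, pre, post, modulator=1.0)
print(np.allclose(w, w_no_learning), np.allclose(w, w_rewarded))  # True False
print("fitness with sparsity pressure:", round(fitness(0.8, w), 3))
```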

5.
Operant learning requires that reinforcement signals interact with action representations at a suitable neural interface. Much evidence suggests that this occurs when phasic dopamine, acting as a reinforcement prediction error, gates plasticity at cortico-striatal synapses, and thereby changes the future likelihood of selecting the action(s) coded by striatal neurons. But this hypothesis faces serious challenges. First, cortico-striatal plasticity is inexplicably complex, depending on spike timing, dopamine level, and dopamine receptor type. Second, there is a credit assignment problem: action selection signals occur long before the consequent dopamine reinforcement signal. Third, the two types of striatal output neuron have apparently opposite effects on action selection. Whether these factors rule out the interface hypothesis and how they interact to produce reinforcement learning is unknown. We present a computational framework that addresses these challenges. We first predict the expected activity changes over an operant task for both types of action-coding striatal neuron, and show they co-operate to promote action selection in learning and compete to promote action suppression in extinction. Separately, we derive a complete model of dopamine and spike-timing dependent cortico-striatal plasticity from in vitro data. We then show this model produces the predicted activity changes necessary for learning and extinction in an operant task, a remarkable convergence of a bottom-up data-driven plasticity model with the top-down behavioural requirements of learning theory. Moreover, we show the complex dependencies of cortico-striatal plasticity are not only sufficient but necessary for learning and extinction. Validating the model, we show it can account for behavioural data describing extinction, renewal, and reacquisition, and replicate in vitro experimental data on cortico-striatal plasticity. By bridging the levels between the single synapse and behaviour, our model shows how striatum acts as the action-reinforcement interface.
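A highly simplified sketch of the kind of three-factor rule involved, with assumed time constants and an assumed sign convention for D2 neurons (the paper derives its plasticity model from in vitro data; this is only an illustration): spike timing sets a decaying eligibility trace, and a later dopamine signal converts it into a weight change.

```python
import numpy as np

TAU_ELIG = 0.5            # eligibility decay time constant in seconds (assumed)
A_PLUS, A_MINUS = 1.0, -0.5

def stdp_eligibility(dt_post_minus_pre, tau_stdp=0.02):
    """Pre-before-post (dt > 0) tags the synapse positively, post-before-pre negatively."""
    if dt_post_minus_pre > 0:
        return A_PLUS * np.exp(-dt_post_minus_pre / tau_stdp)
    return A_MINUS * np.exp(dt_post_minus_pre / tau_stdp)

def weight_change(elig, dopamine, delay, neuron_type="D1", lr=0.1):
    """Dopamine arriving `delay` seconds later reads out the decayed trace."""
    decayed = elig * np.exp(-delay / TAU_ELIG)
    sign = 1.0 if neuron_type == "D1" else -1.0   # D2 plasticity inverted (assumption)
    return lr * sign * dopamine * decayed

# A pre-before-post pairing followed 0.3 s later by a positive dopamine burst
# strengthens the D1 synapse and weakens the D2 synapse.
e = stdp_eligibility(0.01)
print(weight_change(e, dopamine=1.0, delay=0.3, neuron_type="D1"))
print(weight_change(e, dopamine=1.0, delay=0.3, neuron_type="D2"))
```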

6.
Rawlinson D, Kowadlo G. PLoS ONE 2012, 7(1):e29264
The Memory-Prediction Framework (MPF) and its Hierarchical-Temporal Memory implementation (HTM) have been widely applied to unsupervised learning problems, for both classification and prediction. To date, there has been no attempt to incorporate MPF/HTM in reinforcement learning or other adaptive systems; that is, to use knowledge embodied within the hierarchy to control a system, or to generate behaviour for an agent. This problem is interesting because the human neocortex is believed to play a vital role in the generation of behaviour, and the MPF is a model of the human neocortex. We propose some simple and biologically plausible enhancements to the Memory-Prediction Framework. These cause it to explore and interact with an external world, while trying to maximize a continuous, time-varying reward function. All behaviour is generated and controlled within the MPF hierarchy. The hierarchy develops from a random initial configuration by interaction with the world and reinforcement learning only. Among other demonstrations, we show that a 2-node hierarchy can learn to successfully play "rocks, paper, scissors" against a predictable opponent.

7.
Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.
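The preprocessing stage can be illustrated with a minimal linear slow feature analysis (the paper uses a hierarchical, nonlinear SFA network on visual streams; this toy version, on made-up signals, only shows the core objective of extracting the most slowly varying projections):

```python
import numpy as np
from scipy.linalg import eigh

def linear_sfa(X, n_components=2):
    """Minimal linear SFA: find projections minimising the variance of the
    temporal derivative subject to unit output variance, solved as a
    generalized eigenvalue problem (smallest eigenvalue = slowest feature)."""
    X = X - X.mean(axis=0)
    dX = np.diff(X, axis=0)
    A = dX.T @ dX / len(dX)        # covariance of temporal differences
    B = X.T @ X / len(X)           # covariance of the signal itself
    eigvals, eigvecs = eigh(A, B)  # eigenvalues returned in ascending order
    return eigvecs[:, :n_components], eigvals[:n_components]

# Toy input stream: a slow signal mixed into three noisy, faster channels.
rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 5000)
slow = np.sin(0.05 * t)
fast = np.sin(3.0 * t)
X = np.column_stack([slow + 0.3 * fast, fast, 0.5 * slow - fast])
X += 0.05 * rng.normal(size=X.shape)

W, slowness = linear_sfa(X)
print("slowness of extracted features (lower = slower):", np.round(slowness, 4))
```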

8.
In this paper we propose a mathematical learning model for the feeding behaviour of a specialist predator operating in a random environment occupied by two types of prey, palatable mimics and unpalatable models, and a generalist predator with additional alternative prey at its disposal. A well-known linear reinforcement learning algorithm and its special cases are considered for updating the probabilities of the two actions, eat prey or ignore prey. Each action elicits a probabilistic response from the environment that can be favourable or unfavourable. To assess the performance of the predator, a payoff function is constructed that captures the energetic benefit from consuming acceptable prey, the energetic cost from consuming unacceptable prey, and the lost benefit from ignoring acceptable prey. Conditions for an improving predator payoff are also explicitly formulated.
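A toy sketch of a two-action linear reward-penalty scheme with placeholder learning rates and payoffs (not the parameters analysed in the paper), updating the probability of eating an encountered prey item:

```python
import random

rng = random.Random(0)

a, b = 0.1, 0.05           # reward and penalty learning rates (assumed values)
p_eat = 0.5                # probability of eating an encountered item
p_mimic = 0.3              # proportion of palatable mimics among encounters
benefit, cost, lost = 1.0, 1.5, 0.5   # energetic benefit / cost / lost benefit

payoff = 0.0
for trial in range(2000):
    is_mimic = rng.random() < p_mimic
    eat = rng.random() < p_eat

    if eat and is_mimic:          # favourable outcome of eating
        payoff += benefit
        p_eat += a * (1 - p_eat)
    elif eat and not is_mimic:    # unfavourable outcome of eating
        payoff -= cost
        p_eat *= (1 - b)
    elif not eat and is_mimic:    # ignored acceptable prey: lost benefit
        payoff -= lost
        p_eat += b * (1 - p_eat)
    else:                         # correctly ignored an unpalatable model
        p_eat *= (1 - a)

print("final P(eat):", round(p_eat, 3), "  accumulated payoff:", round(payoff, 2))
```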

9.
This paper investigates the effectiveness of spiking agents when trained with reinforcement learning (RL) in a challenging multiagent task. In particular, it explores learning through reward-modulated spike-timing dependent plasticity (STDP) and compares it to reinforcement of stochastic synaptic transmission in the general-sum game of the Iterated Prisoner's Dilemma (IPD). More specifically, a computational model is developed where we implement two spiking neural networks as two "selfish" agents learning simultaneously but independently, competing in the IPD game. The purpose of our system (or collective) is to maximise its accumulated reward in the presence of reward-driven competing agents within the collective. This can only be achieved when the agents engage in a behaviour of mutual cooperation during the IPD. Previously, we successfully applied reinforcement of stochastic synaptic transmission to the IPD game. The current study utilises reward-modulated STDP with eligibility trace and results show that the system managed to exhibit the desired behaviour by establishing mutual cooperation between the agents. It is noted that the cooperative outcome was attained after a relatively short learning period which enhanced the accumulation of reward by the system. As in our previous implementation, the successful application of the learning algorithm to the IPD becomes possible only after we extended it with additional global reinforcement signals in order to enhance competition at the neuronal level. Moreover it is also shown that learning is enhanced (as indicated by an increased IPD cooperative outcome) through: (i) strong memory for each agent (regulated by a high eligibility trace time constant) and (ii) firing irregularity produced by equipping the agents' LIF neurons with a partial somatic reset mechanism.

10.
Kahnt T, Grueschow M, Speck O, Haynes JD. Neuron 2011, 70(3):549-559
The dominant view that perceptual learning is accompanied by changes in early sensory representations has recently been challenged. Here we tested the idea that perceptual learning can be accounted for by reinforcement learning involving changes in higher decision-making areas. We trained subjects on an orientation discrimination task involving feedback over 4 days, acquiring fMRI data on the first and last day. Behavioral improvements were well explained by a reinforcement learning model in which learning leads to enhanced readout of sensory information, thereby establishing noise-robust representations of decision variables. We find stimulus orientation encoded in early visual and higher cortical regions such as lateral parietal cortex and anterior cingulate cortex (ACC). However, only activity patterns in the ACC tracked changes in decision variables during learning. These results provide strong evidence for perceptual learning-related changes in higher order areas and suggest that perceptual and reward learning are based on a common neurobiological mechanism.
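The modelling idea can be sketched as follows, with made-up tuning curves and a simple reward-modulated update (not the paper's fitted model): the sensory representation stays fixed, and learning only re-weights the readout that forms the decision variable.

```python
import numpy as np

rng = np.random.default_rng(0)

n_units = 50
preferred = np.linspace(0, np.pi, n_units)           # preferred orientations

def population(theta, noise=0.5):
    """Fixed sensory representation: tuning curves plus additive noise."""
    return np.cos(2 * (theta - preferred)) + noise * rng.normal(size=n_units)

w = np.zeros(n_units)                                 # readout (decision) weights
lr = 0.02
theta_a, theta_b = np.pi / 4 + 0.1, np.pi / 4 - 0.1   # orientations to discriminate

correct = []
for trial in range(3000):
    is_a = rng.random() < 0.5
    r = population(theta_a if is_a else theta_b)
    decision_variable = w @ r
    choice_a = decision_variable > 0
    reward = 1.0 if choice_a == is_a else 0.0
    # Reinforcement-style update of the readout only: strengthen the mapping
    # that produced a rewarded choice, weaken it otherwise.
    w += lr * (reward - 0.5) * (1 if choice_a else -1) * r
    correct.append(reward)

print("accuracy, first vs last 500 trials:",
      round(np.mean(correct[:500]), 2), round(np.mean(correct[-500:]), 2))
```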

11.
Accumulating evidence shows that the neural network of the cerebral cortex and the basal ganglia is critically involved in reinforcement learning. Recent studies found functional heterogeneity within the cortico-basal ganglia circuit, especially in its ventromedial to dorsolateral axis. Here we review computational issues in reinforcement learning and propose a working hypothesis on how multiple reinforcement learning algorithms are implemented in the cortico-basal ganglia circuit using different representations of states, values, and actions.

12.
Behavioural and neurophysiological studies in primates have increasingly shown the involvement of urgency signals during the temporal integration of sensory evidence in perceptual decision-making. Neuronal correlates of such signals have been found in the parietal cortex, and separate studies have demonstrated attention-induced gain modulation of both excitatory and inhibitory neurons. Although previous computational models of decision-making have incorporated gain modulation, their abstract forms do not permit an understanding of the contribution of inhibitory gain modulation. Thus, the effects of co-modulating both excitatory and inhibitory neuronal gains on decision-making dynamics and behavioural performance remain unclear. In this work, we incorporate time-dependent co-modulation of the gains of both excitatory and inhibitory neurons into our previous biologically based decision circuit model. We base our computational study in the context of two classic motion-discrimination tasks performed in animals. Our model shows that by simultaneously increasing the gains of both excitatory and inhibitory neurons, a variety of the observed dynamic neuronal firing activities can be replicated. In particular, the model can exhibit winner-take-all decision-making behaviour with higher firing rates and within a significantly more robust model parameter range. It also exhibits short-tailed reaction time distributions even when operating near a dynamical bifurcation point. The model further shows that neuronal gain modulation can compensate for weaker recurrent excitation in a decision neural circuit, and support decision formation and storage. Higher neuronal gain is also suggested in the more cognitively demanding reaction time version of the task than in the fixed delay version. Using the exact temporal delays from the animal experiments, fast recruitment of gain co-modulation is shown to maximize reward rate, with a timescale that is surprisingly near the experimentally fitted value. Our work provides insights into the simultaneous and rapid modulation of excitatory and inhibitory neuronal gains, which enables flexible, robust, and optimal decision-making.
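A reduced two-variable caricature of such a circuit, with assumed coupling strengths and a logistic rate function rather than the paper's biologically based model, illustrates the effect of a common gain applied to both recurrent excitation and inhibition: with a constant low gain the two pools settle to similar rates, whereas a ramping gain drives them apart into a winner-take-all pattern.

```python
import numpy as np

def simulate(gain_schedule, coherence=0.05, dt=1e-3, tau=0.02, seed=0):
    """Two competing rate pools with mutual inhibition; a common gain `g`
    multiplies both the recurrent excitation and the cross-inhibition."""
    rng = np.random.default_rng(seed)
    r = np.array([0.2, 0.2])                   # firing rates of the two pools (a.u.)
    w_exc, w_inh = 1.2, 1.0                    # assumed coupling strengths
    f = lambda x: 1.0 / (1.0 + np.exp(-x))     # saturating rate function
    for g in gain_schedule:
        inp = np.array([0.3 + coherence, 0.3 - coherence]) + 0.02 * rng.normal(size=2)
        drive = g * (inp + w_exc * r - w_inh * r[::-1])
        r = r + dt / tau * (-r + f(drive))
    return r

steps = 2000
low_gain = simulate(np.full(steps, 1.0))
high_gain = simulate(np.linspace(1.0, 4.0, steps))   # gain ramps up during the trial
print("constant low gain, final rates:", np.round(low_gain, 2))
print("ramping gain, final rates:", np.round(high_gain, 2))
```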

13.
Action selection, planning and execution are continuous processes that evolve over time, responding to perceptual feedback as well as evolving top-down constraints. Existing models of routine sequential action (e.g. coffee- or pancake-making) generally fall into one of two classes: hierarchical models that include hand-built task representations, or heterarchical models that must learn to represent hierarchy via temporal context, but thus far lack goal-orientedness. We present a biologically motivated model of the latter class that, because it is situated in the Leabra neural architecture, affords an opportunity to include both unsupervised and goal-directed learning mechanisms. Moreover, we embed this neurocomputational model in the theoretical framework of the theory of event coding (TEC), which posits that actions and perceptions share a common representation with bidirectional associations between the two. Thus, in this view, not only does perception select actions (along with task context), but actions are also used to generate perceptions (i.e. intended effects). We propose a neural model that implements TEC to carry out sequential action control in hierarchically structured tasks such as coffee-making. Unlike traditional feedforward discrete-time neural network models, which use static percepts to generate static outputs, our biological model accepts continuous-time inputs and likewise generates non-stationary outputs, making short-timescale dynamic predictions.

14.
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
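The two "adult" modules can be sketched with made-up parameters (not the fitted values): a counterfactual update applied to the unchosen option when complete feedback is shown, and value contextualisation that re-centres outcomes on a running context average, making reward and punishment learning symmetric.

```python
alpha, alpha_cf, alpha_ctx = 0.2, 0.2, 0.1   # made-up learning rates
Q = {"A": 0.0, "B": 0.0}   # option values within one choice context
context_value = 0.0        # running estimate of the average outcome in this context

def update(chosen, outcome, unchosen=None, unchosen_outcome=None):
    global context_value
    # Value contextualisation: outcomes are re-centred on the context average,
    # so avoiding a loss in a punishing context acts like a relative reward.
    context_value += alpha_ctx * (outcome - context_value)
    Q[chosen] += alpha * ((outcome - context_value) - Q[chosen])
    # Counterfactual module: with complete feedback, also update the unchosen option.
    if unchosen is not None:
        Q[unchosen] += alpha_cf * ((unchosen_outcome - context_value) - Q[unchosen])

# Complete-feedback trial in a punishment context: the subject chose A and lost,
# while the unchosen option B would have avoided the loss.
update("A", outcome=-1.0, unchosen="B", unchosen_outcome=0.0)
print(Q, round(context_value, 2))
```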

15.
Journal of Physiology 2013, 107(5):399-408
Recent experiments showed that the biomechanical ease and end-point stability associated with reaching movements are predicted prior to movement onset, and that these factors exert a significant influence on the choice of movement. As an extension of these results, here we investigate whether the knowledge about biomechanical costs and their influence on decision-making are the result of an adaptation process taking place during each experimental session or whether this knowledge was learned at an earlier stage of development. Specifically, we analysed both the pattern of decision-making and its fluctuations during each session for several human subjects making free choices between two reaching movements that varied in path distance (target relative distance), biomechanical cost, aiming accuracy and stopping requirement. Our main result shows that the effect of biomechanics is well established at the start of the session and that, consequently, the learning of biomechanical costs in decision-making occurred at an earlier stage of development. As a means to characterise the dynamics of this learning process, we also developed a model-based reinforcement learning model, which generates a possible account of how biomechanics may be incorporated into the motor plan to select between reaching movements. Results obtained in simulation showed that, after some pre-training corresponding to a motor babbling phase, the model can reproduce the subjects’ overall movement preferences. Although preliminary, this supports the idea that the knowledge about biomechanical costs may have been learned in this manner, and supports the hypothesis that the fluctuations observed in the subjects’ behaviour may adapt in a similar fashion.
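A schematic of how a biomechanical cost term might enter the choice between two reaching movements, with placeholder weights and costs rather than the model fitted in the paper: each candidate movement receives a subjective value combining reward, path distance and biomechanical cost, and choices follow a softmax over these values.

```python
import math
import random

rng = random.Random(0)

def movement_value(reward, distance, biomech_cost, w_dist=0.5, w_biomech=1.0):
    # Placeholder trade-off weights; the biomechanical cost weight could itself
    # be tuned by a slow, RL-like process during a motor-babbling phase.
    return reward - w_dist * distance - w_biomech * biomech_cost

def softmax_choice(values, beta=3.0):
    exps = [math.exp(beta * v) for v in values]
    z = sum(exps)
    probs = [e / z for e in exps]
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i, probs
    return len(values) - 1, probs

# Two movements with equal reward and path distance but unequal biomechanical
# cost (e.g. one requires a less stable, more effortful arm configuration).
values = [movement_value(1.0, 0.3, 0.1), movement_value(1.0, 0.3, 0.4)]
choice, probs = softmax_choice(values)
print("choice probabilities:", [round(p, 2) for p in probs], " chosen:", choice)
```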

16.
Neurons in a small number of brain structures detect rewards and reward-predicting stimuli and are active during the expectation of predictable food and liquid rewards. These neurons code the reward information according to basic terms of various behavioural theories that seek to explain reward-directed learning, approach behaviour and decision-making. The involved brain structures include groups of dopamine neurons, the striatum including the nucleus accumbens, the orbitofrontal cortex and the amygdala. The reward information is fed to brain structures involved in decision-making and organisation of behaviour, such as the dorsolateral prefrontal cortex and possibly the parietal cortex. The neural coding of basic reward terms derived from formal theories puts the neurophysiological investigation of reward mechanisms on firm conceptual grounds and provides neural correlates for the function of rewards in learning, approach behaviour and decision-making.

17.
In an attempt to elucidate the causal mechanisms underlying learning and memory we have developed a model system, aerial respiration in the pond snail Lymnaea stagnalis. A three-neuron central pattern generator (CPG) whose sufficiency and necessity have been demonstrated mediates this behaviour. Aerial respiration, while an important homeostatic behaviour, is inhibited by the activation of the whole body withdrawal response that the animal uses to protect itself. We found that it was possible to operantly condition snails not to perform aerial respiration in a situation, a hypoxic environment, where aerial respiration should predominate. Operant conditioning was achieved by eliciting the pneumostome withdrawal response, part of the whole body withdrawal response, each time the animal attempted to open its pneumostome to breathe. Yoked control animals did not demonstrate an alteration in breathing behaviour. Subsequently we determined neural correlates of this associative behaviour and found that neuronal changes are distributed throughout the CPG. This preparation may afford us the opportunity to determine the causal neuronal changes that underlie learning and memory of associative conditioning.

18.
In the present study, we try to single out several principles of nervous system functioning that are essential for describing the mechanisms of learning and memory, based on our own experimental investigation of cellular mechanisms of memory in the nervous system of gastropod molluscs and on literature data, as follows: (1) The main changes in functioning due to learning occur in interneurons; (2) Due to learning, some synaptic inputs of command neurons selectively change their efficacy; (3) Reinforcement is not related to activity of the neural chain receptor-sensory neuron-interneuron-motoneuron-effector; reinforcement is mediated via activity of modulatory neurons, and in some cases can be exerted by a single neuron; (4) Activity of modulatory neurons is necessary for the development of plastic modifications of behaviour (including associative ones), but is not needed for recall of conditioned responses. At the same time, the modulatory neurons (in fact, they constitute a neural reinforcement system) are necessary for recall of context associative memory; (5) Changes due to learning occur in at least two independent loci in the nervous system.

19.
Recent work has reawakened interest in goal-directed or ‘model-based’ choice, where decisions are based on prospective evaluation of potential action outcomes. Concurrently, there has been growing attention to the role of hierarchy in decision-making and action control. We focus here on the intersection between these two areas of interest, considering the topic of hierarchical model-based control. To characterize this form of action control, we draw on the computational framework of hierarchical reinforcement learning, using this to interpret recent empirical findings. The resulting picture reveals how hierarchical model-based mechanisms might play a special and pivotal role in human decision-making, dramatically extending the scope and complexity of human behaviour.

20.
Seo M, Lee E, Averbeck BB. Neuron 2012, 74(5):947-960
The role that frontal-striatal circuits play in normal behavior remains unclear. Two of the leading hypotheses suggest that these circuits are important for action selection or reinforcement learning. To examine these hypotheses, we carried out an experiment in which monkeys had to select actions in two different task conditions. In the first (random) condition, actions were selected on the basis of perceptual inference. In the second (fixed) condition, the animals used reinforcement from previous trials to select actions. Examination of neural activity showed that the representation of the selected action was stronger in lateral prefrontal cortex (lPFC), and occurred earlier in the lPFC than it did in the dorsal striatum (dSTR). In contrast to this, the representation of action values, in both the random and fixed conditions, was stronger in the dSTR. Thus, the dSTR contains an enriched representation of action value, but it followed frontal cortex in action selection.
