Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Operant learning requires that reinforcement signals interact with action representations at a suitable neural interface. Much evidence suggests that this occurs when phasic dopamine, acting as a reinforcement prediction error, gates plasticity at cortico-striatal synapses, and thereby changes the future likelihood of selecting the action(s) coded by striatal neurons. But this hypothesis faces serious challenges. First, cortico-striatal plasticity is inexplicably complex, depending on spike timing, dopamine level, and dopamine receptor type. Second, there is a credit assignment problem—action selection signals occur long before the consequent dopamine reinforcement signal. Third, the two types of striatal output neuron have apparently opposite effects on action selection. Whether these factors rule out the interface hypothesis and how they interact to produce reinforcement learning is unknown. We present a computational framework that addresses these challenges. We first predict the expected activity changes over an operant task for both types of action-coding striatal neuron, and show they co-operate to promote action selection in learning and compete to promote action suppression in extinction. Separately, we derive a complete model of dopamine and spike-timing dependent cortico-striatal plasticity from in vitro data. We then show this model produces the predicted activity changes necessary for learning and extinction in an operant task, a remarkable convergence of a bottom-up data-driven plasticity model with the top-down behavioural requirements of learning theory. Moreover, we show the complex dependencies of cortico-striatal plasticity are not only sufficient but necessary for learning and extinction. Validating the model, we show it can account for behavioural data describing extinction, renewal, and reacquisition, and replicate in vitro experimental data on cortico-striatal plasticity. 
By bridging the levels between the single synapse and behaviour, our model shows how striatum acts as the action-reinforcement interface.
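The notion of a reinforcement prediction error that this abstract relies on can be illustrated with a minimal sketch. This is a textbook Rescorla-Wagner-style update, not the authors' dopamine and spike-timing dependent plasticity model; the function name, the learning rate `alpha`, and the trial schedule are assumptions chosen for illustration only.

```python
def rescorla_wagner(rewards, alpha=0.2, v0=0.0):
    """Track a value estimate driven by prediction errors (illustrative only).

    rewards: sequence of outcomes, one per trial (hypothetical schedule).
    alpha:   assumed learning rate in (0, 1].
    Returns the value after each trial and the prediction error on each trial.
    """
    values, errors = [], []
    v = v0
    for r in rewards:
        delta = r - v        # prediction error: outcome minus current expectation
        v += alpha * delta   # move the expectation a fraction alpha toward the outcome
        values.append(v)
        errors.append(delta)
    return values, errors


# 20 rewarded trials (acquisition) followed by 20 unrewarded trials (extinction)
rewards = [1.0] * 20 + [0.0] * 20
values, errors = rescorla_wagner(rewards)
```

In this sketch the prediction error is positive and shrinking during acquisition, turns negative when reward is withheld, and the value estimate decays back toward zero during extinction.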

2.
The recently developed ‘two-step’ behavioural task promises to differentiate model-based from model-free reinforcement learning, while generating neurophysiologically-friendly decision datasets with parametric variation of decision variables. These desirable features have prompted its widespread adoption. Here, we analyse the interactions between a range of different strategies and the structure of transitions and outcomes in order to examine constraints on what can be learned from behavioural performance. The task involves a trade-off between the need for stochasticity, to allow strategies to be discriminated, and a need for determinism, so that it is worth subjects’ investment of effort to exploit the contingencies optimally. We show through simulation that under certain conditions model-free strategies can masquerade as being model-based. We first show that seemingly innocuous modifications to the task structure can induce correlations between action values at the start of the trial and the subsequent trial events in such a way that analysis based on comparing successive trials can lead to erroneous conclusions. We confirm the power of a suggested correction to the analysis that can alleviate this problem. We then consider model-free reinforcement learning strategies that exploit correlations between where rewards are obtained and which actions have high expected value. These generate behaviour that appears model-based under these, and also more sophisticated, analyses. Exploiting the full potential of the two-step task as a tool for behavioural neuroscience requires an understanding of these issues.
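An analysis "based on comparing successive trials" is commonly implemented as a stay-probability table: how often the first-stage choice is repeated, split by whether the previous trial was rewarded and whether its transition was common or rare. The sketch below assumes that reading; the trial tuple layout and the function name are hypothetical, and the abstract itself cautions that this kind of analysis can mislead.

```python
from collections import defaultdict


def stay_probabilities(trials):
    """Compute P(repeat first-stage choice) per previous-trial condition.

    trials: list of (first_stage_choice, common_transition, rewarded) tuples.
            This layout is an assumption made for the example.
    Returns a dict mapping (rewarded, common_transition) to a stay probability.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [stays, total]
    for prev, curr in zip(trials, trials[1:]):
        choice, common, rewarded = prev
        key = (rewarded, common)
        counts[key][1] += 1
        if curr[0] == choice:  # same first-stage choice on the next trial
            counts[key][0] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


# Hand-made example data, one trial per condition plus a final trial
trials = [
    ("A", True, True),    # rewarded, common transition -> next choice A (stay)
    ("A", False, True),   # rewarded, rare transition   -> next choice B (switch)
    ("B", True, False),   # unrewarded, common          -> next choice A (switch)
    ("A", False, False),  # unrewarded, rare            -> next choice A (stay)
    ("A", True, True),    # last trial, no successor to compare with
]
probs = stay_probabilities(trials)
```

A reward-by-transition interaction in such a table is usually read as a model-based signature, while a main effect of reward alone is read as model-free; the abstract argues that this reading does not always hold.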

3.
Learning is often understood as an organism's gradual acquisition of the association between a given sensory stimulus and the correct motor response. Mathematically, this corresponds to regressing a mapping between the set of observations and the set of actions. Recently, however, it has been shown both in cognitive and motor neuroscience that humans are not only able to learn particular stimulus-response mappings, but are also able to extract abstract structural invariants that facilitate generalization to novel tasks. Here we show how such structure learning can enhance facilitation in a sensorimotor association task performed by human subjects. Using regression and reinforcement learning models we show that the observed facilitation cannot be explained by these basic models of learning stimulus-response associations. We show, however, that the observed data can be explained by a hierarchical Bayesian model that performs structure learning. In line with previous results from cognitive tasks, this suggests that hierarchical Bayesian inference might provide a common framework to explain both the learning of specific stimulus-response associations and the learning of abstract structures that are shared by different task environments.

4.
The role of dopamine in behaviour and decision-making is often cast in terms of reinforcement learning and optimal decision theory. Here, we present an alternative view that frames the physiology of dopamine in terms of Bayes-optimal behaviour. In this account, dopamine controls the precision or salience of (external or internal) cues that engender action. In other words, dopamine balances bottom-up sensory information and top-down prior beliefs when making hierarchical inferences (predictions) about cues that have affordance. In this paper, we focus on the consequences of changing tonic levels of dopamine firing using simulations of cued sequential movements. Crucially, the predictions driving movements are based upon a hierarchical generative model that infers the context in which movements are made. This means that we can confuse agents by changing the context (order) in which cues are presented. These simulations provide a (Bayes-optimal) model of contextual uncertainty and set switching that can be quantified in terms of behavioural and electrophysiological responses. Furthermore, one can simulate dopaminergic lesions (by changing the precision of prediction errors) to produce pathological behaviours that are reminiscent of those seen in neurological disorders such as Parkinson's disease. We use these simulations to demonstrate how a single functional role for dopamine at the synaptic level can manifest in different ways at the behavioural level.

5.
Learning a complex task such as table tennis is a challenging problem for both robots and humans. Even after acquiring the necessary motor skills, a strategy is needed to choose where and how to return the ball to the opponent’s court in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In this paper, we suggest a computational model for representing and inferring strategies, based on a Markov decision problem, where the reward function models the goal of the task as well as the strategic information. We show how this reward function can be discovered from demonstrations of table tennis matches using model-free inverse reinforcement learning. The resulting framework makes it possible to identify the basic elements on which the selection of striking movements is based. We tested our approach on data collected from players with different playing styles and under different playing conditions. The estimated reward function was able to capture expert-specific strategic information that sufficed to distinguish the expert among players with different skill levels as well as different playing styles.

6.
According to the ideomotor theory, actions are represented in terms of their perceptual effects, offering a solution for the correspondence problem of imitation (how to translate the observed action into a corresponding motor output). This effect-based coding of action is assumed to be acquired through action-effect learning. Accordingly, performing an action leads to the integration of the perceptual codes of the action effects with the motor commands that brought them about. While ideomotor theory is invoked to account for imitation, the influence of action-effect learning on imitative behavior remains unexplored. In two experiments, imitative performance was measured in a reaction time task following a phase of action-effect acquisition. During action-effect acquisition, participants freely executed a finger movement (index or little finger lifting), and then observed a similar (compatible learning) or a different (incompatible learning) movement. In Experiment 1, finger movements of left and right hands were presented as action-effects during acquisition. In Experiment 2, only right-hand finger movements were presented during action-effect acquisition and in the imitation task the observed hands were oriented orthogonally to participants’ hands in order to avoid spatial congruency effects. Experiments 1 and 2 showed that imitative performance was improved after compatible learning, compared to incompatible learning. In Experiment 2, although action-effect learning involved perception of finger movements of right hand only, imitative capabilities of right- and left-hand finger movements were equally affected. These results indicate that an observed movement stimulus processed as the effect of an action can later prime execution of that action, confirming the ideomotor approach to imitation. We further discuss these findings in relation to previous studies of action-effect learning and in the framework of current ideomotor approaches to imitation.

7.
Cothros N, Wong J, Gribble PL. PLoS ONE, 2008, 3(4): e1990

Background

Previous studies of learning to adapt reaching movements in the presence of novel forces show that learning multiple force fields is prone to interference. Recently it has been suggested that force field learning may reflect learning to manipulate a novel object. Within this theoretical framework, interference in force field learning may be the result of static tactile or haptic cues associated with grasp, which fail to indicate changing dynamic conditions. The idea that different haptic cues (e.g. those associated with different grasped objects) signal motor requirements and promote the learning and retention of multiple motor skills has previously been unexplored in the context of force field learning.

Methodology/Principal Findings

The present study tested the possibility that interference can be reduced when two different force fields are associated with differently shaped objects grasped in the hand. Human subjects were instructed to guide a cursor to targets while grasping a robotic manipulandum, which applied two opposing velocity-dependent curl fields to the hand. For one group of subjects the manipulandum was fitted with two different handles, one for each force field. No attenuation in interference was observed in these subjects relative to controls who used the same handle for both force fields.

Conclusions/Significance

These results suggest that in the context of the present learning paradigm, haptic cues on their own are not sufficient to reduce interference and promote learning multiple force fields.

8.
A basic question, intimately tied to the problem of action selection, is that of how actions are assembled into organized sequences. Theories of routine sequential behaviour have long acknowledged that it must rely not only on environmental cues but also on some internal representation of temporal or task context. It is assumed, in most theories, that such internal representations must be organized into a strict hierarchy, mirroring the hierarchical structure of naturalistic sequential behaviour. This article reviews an alternative computational account, which asserts that the representations underlying naturalistic sequential behaviour need not, and arguably cannot, assume a strictly hierarchical form. One apparent liability of this theory is that it seems to contradict neuroscientific evidence indicating that different levels of sequential structure in behaviour are represented at different levels in a hierarchy of cortical areas. New simulations, reported here, show not only that the original computational account can be reconciled with this alignment between behavioural and neural organization, but also that it gives rise to a novel explanation for how this alignment might develop through learning.

9.
Decision-making ability in the frontal lobe (among other brain structures) relies on the assignment of value to states of the animal and its environment. Higher valued states can then be pursued and lower (or negative) valued states avoided. The same principle forms the basis for computational reinforcement learning controllers, which have been fruitfully applied both as models of value estimation in the brain, and as artificial controllers in their own right. This work shows how state desirability signals decoded from frontal lobe hemodynamics, as measured with near-infrared spectroscopy (NIRS), can be applied as reinforcers to an adaptable artificial learning agent in order to guide its acquisition of skills. A set of experiments carried out on an alert macaque demonstrates that both oxy- and deoxyhemoglobin concentrations in the frontal lobe show differences in response to both primarily and secondarily desirable (versus undesirable) stimuli. This difference allows a NIRS signal classifier to serve successfully as a reinforcer for an adaptive controller performing a virtual tool-retrieval task. The agent's adaptability allows its performance to exceed the limits of the NIRS classifier decoding accuracy. We also show that decoding state desirabilities is more accurate when using relative concentrations of both oxyhemoglobin and deoxyhemoglobin, rather than either species alone.

10.
Many everyday skills are learned by binding otherwise independent actions into a unified sequence of responses across days or weeks of practice. Here we looked at how the dynamics of action planning and response binding change across such long timescales. Subjects (N = 23) were trained on a bimanual version of the serial reaction time task (32-item sequence) for two weeks (10 days total). Response times and accuracy both showed improvement with time, but appeared to be learned at different rates. Changes in response speed across training were associated with dynamic changes in response time variability, with faster learners expanding their variability during the early training days and then contracting response variability late in training. Using a novel measure of response chunking, we found that individual responses became temporally correlated across trials and asymptoted to set sizes of approximately 7 bound responses at the end of the first week of training. Finally, we used a state-space model of the response planning process to look at how predictive (i.e., response anticipation) and error-corrective (i.e., post-error slowing) processes correlated with learning rates for speed, accuracy and chunking. This analysis yielded non-monotonic association patterns between the state-space model parameters and learning rates, suggesting that different parts of the response planning process are relevant at different stages of long-term learning. These findings highlight the dynamic modulation of response speed, variability, accuracy and chunking as multiple movements become bound together into a larger set of responses during sequence learning.

11.
We often need to learn how to move based on a single performance measure that reflects the overall success of our movements. However, movements have many properties, such as their trajectories, speeds and timing of end-points; the brain therefore needs to decide which properties of movements should be improved, that is, it needs to solve the credit assignment problem. Currently, little is known about how humans solve credit assignment problems in the context of reinforcement learning. Here we tested how human participants solve such problems during a trajectory-learning task. Without an explicitly-defined target movement, participants made hand reaches and received monetary rewards as feedback on a trial-by-trial basis. The curvature and direction of the attempted reach trajectories determined the monetary rewards received in a manner that can be manipulated experimentally. Based on the history of action-reward pairs, participants quickly solved the credit assignment problem and learned the implicit payoff function. A Bayesian credit-assignment model with built-in forgetting accurately predicts their trial-by-trial learning.

12.
The coupling process between observed and performed actions is thought to be performed by a fronto-parietal perception-action system including regions of the inferior frontal gyrus and the inferior parietal lobule. When investigating the influence of the movements' characteristics on this process, most research on action observation has focused on only one particular variable even though the type of movements we observe can vary on several levels. By manipulating the visual perspective, transitivity and meaningfulness of observed movements in a functional magnetic resonance imaging study we aimed at investigating how the type of movements and the visual perspective can modulate brain activity during action observation in healthy individuals. Importantly, we used an active observation task where participants had to subsequently execute or imagine the observed movements. Our results show that the fronto-parietal regions of the perception-action system were mostly recruited during the observation of meaningless actions while visual perspective had little influence on the activity within the perception-action system. Simultaneous investigation of several sources of modulation during active action observation is probably an approach that could lead to a greater ecological comprehension of this important sensorimotor process.

13.
Behavioral evidence suggests that instrumental conditioning is governed by two forms of action control: a goal-directed and a habit learning process. Model-based reinforcement learning (RL) has been argued to underlie the goal-directed process; however, the way in which it interacts with habits and the structure of the habitual process has remained unclear. According to a flat architecture, the habitual process corresponds to model-free RL, and its interaction with the goal-directed process is coordinated by an external arbitration mechanism. Alternatively, the interaction between these systems has recently been argued to be hierarchical, such that the formation of action sequences underlies habit learning and a goal-directed process selects between goal-directed actions and habitual sequences of actions to reach the goal. Here we used a two-stage decision-making task to test predictions from these accounts. The hierarchical account predicts that, because they are tied to each other as an action sequence, selecting a habitual action in the first stage will be followed by a habitual action in the second stage, whereas the flat account predicts that the statuses of the first and second stage actions are independent of each other. We found, based on subjects' choices and reaction times, that human subjects combined single actions to build action sequences and that the formation of such action sequences was sufficient to explain habitual actions. Furthermore, based on Bayesian model comparison, a family of hierarchical RL models, assuming a hierarchical interaction between habit and goal-directed processes, provided a better fit of the subjects' behavior than a family of flat models. Although these findings do not rule out all possible model-free accounts of instrumental conditioning, they do show such accounts are not necessary to explain habitual actions and provide a new basis for understanding how goal-directed and habitual action control interact.

14.
In a large variety of situations one would like to have an expressive and accurate model of observed animal or human behavior. While general purpose mathematical models may successfully capture properties of observed behavior, it is desirable to root models in biological facts. Because of ample empirical evidence for reward-based learning in visuomotor tasks, we use a computational model based on the assumption that the observed agent is balancing the costs and benefits of its behavior to meet its goals. This leads to using the framework of reinforcement learning, which additionally provides well-established algorithms for learning of visuomotor task solutions. To quantify the agent’s goals as rewards implicit in the observed behavior, we propose to use inverse reinforcement learning. Based on the assumption of a modular cognitive architecture, we introduce a modular inverse reinforcement learning algorithm that estimates the relative reward contributions of the component tasks in navigation, consisting of following a path while avoiding obstacles and approaching targets. It is shown how to recover the component reward weights for individual tasks and that variability in observed trajectories can be explained succinctly through behavioral goals. It is demonstrated through simulations that good estimates can be obtained already with modest amounts of observation data, which in turn allows the prediction of behavior in novel configurations.

15.
Schizophrenia is characterized by an altered sense of reality, associated with hallucinations and delusions. Some theories suggest that schizophrenia is related to a deficiency of the system that generates information about the sensory consequences of the actions realized by the subject. This system monitors the reafferent information resulting from an action and allows its anticipation. In the present study, we examined visual-event-related potentials (ERPs) generated by a sensorimotor task in 15 patients with schizophrenia and 15 normal controls. The visual feedback from hand movements performed by the subjects was experimentally distorted. Behavioral results showed that patients were impaired in recognizing their own movements. The ERP signal in patients also differed from that of control subjects. In patients, the ERP waveform was affected during the early part of the response (200 ms). This early effect in schizophrenic patients reveals a modified processing of the visual consequence of their actions.

16.
In socially tolerant settings, naïve individuals may have opportunities to interact jointly with knowledgeable demonstrators and novel tasks. This process is expected to facilitate social learning. Individual experience may also be important for reinforcing and honing socially acquired behaviours. We examined the role of joint interaction and individual experience in the acquisition of a novel foraging task in captive cottontop tamarins. The task involved learning how to locate and access two hidden food rewards from among 10 differently cued forage sites. Tamarins were tested in three different conditions: (1) individually, (2) while interacting with a naïve mate, and (3) while interacting with a mate trained as a knowledgeable demonstrator. For tamarins tested with mates present, we interspersed social input test days with exposure to the task while alone. Tamarins were tested again 17 months after their last exposure to the task, to assess long-term memory. All tamarins tested with knowledgeable demonstrators solved the task. In contrast, tamarins tested alone or with naïve mates had similarly high levels of neophobia and low levels of task acquisition. We conclude that joint interaction occurs in mated pairs of cottontop tamarins and facilitates the spread of novel behaviour. Interspersing test days with a knowledgeable demonstrator present and test days alone with the task helped tamarins to achieve the ultimate goal of the task: obtaining food rewards. Tamarins performed similarly when tested 17 months later, regardless of their initial learning environment. Tamarins had memory deficits for the location of hidden food rewards, but retained memory of the necessary motor actions and solved the task.

17.
Emerging evidence suggests that a group of dietary-derived phytochemicals known as flavonoids are able to induce improvements in memory, learning and cognition. Flavonoids have been shown to modulate critical neuronal signalling pathways involved in processes of memory, and therefore are likely to affect synaptic plasticity and long-term potentiation mechanisms, widely considered to provide a basis for memory. Animal dietary supplementation studies have further shown that flavonoid-rich foods are able to reverse age-related spatial memory and spatial learning impairments. A more accurate understanding of how a particular spatial memory task works, and of which aspects of memory and learning can be assessed in each case, is necessary for a correct interpretation of data relating to diet-cognition experiments. Further understanding of how specific behavioural tasks relate to the functioning of hippocampal circuitry during learning processes might also help to explain the specific memory improvements observed. The overall goal of this review is to give an overview of how the hippocampal circuitry operates as a memory system during behavioural tasks, which we believe will provide a new insight into the underlying mechanisms of the action of flavonoids on cognition.

18.
Exploring action dynamics as an index of paired-associate learning
Dale R, Roche J, Snyder K, McCall R. PLoS ONE, 2008, 3(3): e1728
Much evidence exists supporting a richer interaction between cognition and action than commonly assumed. Such findings demonstrate that short-timescale processes, such as motor execution, may relate in systematic ways to longer-timescale cognitive processes, such as learning. We further substantiate one direction of this interaction: the flow of cognition into action systems. Two experiments explored match-to-sample paired-associate learning, in which participants learned randomized pairs of unfamiliar symbols. During the experiments, their hand movements were continuously tracked using the Nintendo Wiimote. Across learning, participant arm movements are initiated and completed more quickly, exhibit lower fluctuation, and exert more perturbation on the Wiimote during the button press. A second experiment demonstrated that action dynamics index novel learning scenarios, and not simply acclimatization to the Wiimote interface. Results support a graded and systematic covariation between cognition and action, and recommend ways in which this theoretical perspective may contribute to applied learning contexts.

19.
The current paper proposes a novel model for integrative learning of proactive visual attention and sensory-motor control as inspired by the premotor theory of visual attention. The model is characterized by coupling a slow dynamics network with a fast dynamics network and by inheriting our previously proposed multiple timescales recurrent neural network model (MTRNN), which may correspond to the fronto-parietal networks in the cortical brain. The neuro-robotics experiments in a task of manipulating multiple objects utilizing the proposed model demonstrated that some degree of generalization in terms of position and object size variation can be achieved by organizing seamless integration of the proactive object-related visual attention and the related sensory-motor control into a set of action primitives in the distributed neural activities appearing in the fast dynamics network. It was also shown that such action primitives can be combined in compositional ways in acquiring novel actions in the slow dynamics network. The experimental results presented substantiate the premotor theory of visual attention.

20.