期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Speed/accuracy trade-off between the habitual and the goal-directed processes

Keramati M Dezfouli A Piray P 《PLoS computational biology》2011,7(5):e1002055

Instrumental responses are hypothesized to be of two kinds: habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of the two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages that they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible, but slow in choice selection. The habitual system, in contrast, is fast in responding, but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that makes an approximately optimal balance between search-time and accuracy in decision making. Behaviourally, the model can explain experimental evidence on behavioural sensitivity to outcome at the early stages of learning, but insensitivity at the later stages. It also explains that when two choices with equal incentive values are available concurrently, the behaviour remains outcome-sensitive, even after extensive training. Moreover, the model can explain choice reaction time variations during the course of learning, as well as the experimental observation that as the number of choices increases, the reaction time also increases. Neurobiologically, by assuming that phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signals used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour through reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour through manipulating the competition between the habitual and the goal-directed systems and thus, affect reaction time. 相似文献

2.

Cortical and Hippocampal Correlates of Deliberation during Model-Based Decisions for Rewards in Humans

Aaron M. Bornstein Nathaniel D. Daw 《PLoS computational biology》2013,9(12)

How do we use our memories of the past to guide decisions we''ve never had to make before? Although extensive work describes how the brain learns to repeat rewarded actions, decisions can also be influenced by associations between stimuli or events not directly involving reward — such as when planning routes using a cognitive map or chess moves using predicted countermoves — and these sorts of associations are critical when deciding among novel options. This process is known as model-based decision making. While the learning of environmental relations that might support model-based decisions is well studied, and separately this sort of information has been inferred to impact decisions, there is little evidence concerning the full cycle by which such associations are acquired and drive choices. Of particular interest is whether decisions are directly supported by the same mnemonic systems characterized for relational learning more generally, or instead rely on other, specialized representations. Here, building on our previous work, which isolated dual representations underlying sequential predictive learning, we directly demonstrate that one such representation, encoded by the hippocampal memory system and adjacent cortical structures, supports goal-directed decisions. Using interleaved learning and decision tasks, we monitor predictive learning directly and also trace its influence on decisions for reward. We quantitatively compare the learning processes underlying multiple behavioral and fMRI observables using computational model fits. Across both tasks, a quantitatively consistent learning process explains reaction times, choices, and both expectation- and surprise-related neural activity. The same hippocampal and ventral stream regions engaged in anticipating stimuli during learning are also engaged in proportion to the difficulty of decisions. These results support a role for predictive associations learned by the hippocampal memory system to be recalled during choice formation. 相似文献

3.

Learning to communicate

White SA 《Current opinion in neurobiology》2001,11(4):510-520

Of the few animal groups that learn their vocalizations, songbirds are uniquely amenable to molecular, physiological, and behavioral analyses of the neural features responsible for vocal learning. In order to communicate effectively as an adult, a young songbird recognizes and memorizes a model of his species-specific song during a developmentally critical period called sensory acquisition. Factors are now emerging that contribute to the length and strength of this learning phase. In a second critical period, known as sensorimotor learning, the young bird uses auditory feedback to perfect his motor performance, creating a match to the memorized model. New studies show that motor matching can persist beyond sensorimotor learning, and thus a role for the acquired model might also persist into adulthood. Fascinating in their own right, songbirds also provide optimism that mature brains have recourse to plasticity. 相似文献

4.

Frontal cortex and reward-guided learning and decision-making

Rushworth MF Noonan MP Boorman ED Walton ME Behrens TE 《Neuron》2011,70(6):1054-1069

Reward-guided decision-making and learning depends on distributed neural circuits with many components. Here we focus on recent evidence that suggests four frontal lobe regions make distinct contributions to reward-guided learning and decision-making: the lateral orbitofrontal cortex, the ventromedial prefrontal cortex and adjacent medial orbitofrontal cortex, anterior cingulate cortex, and the anterior lateral prefrontal cortex. We attempt to identify common themes in experiments with human participants and with animal models, which suggest roles that the areas play in learning about reward associations, selecting reward goals, choosing actions to obtain reward, and monitoring the potential value of switching to alternative courses of action. 相似文献

5.

Contributions of the amygdala to reward expectancy and choice signals in human prefrontal cortex

Hampton AN Adolphs R Tyszka MJ O'Doherty JP 《Neuron》2007,55(4):545-555

The prefrontal cortex (PFC) receives substantial anatomical input from the amygdala, and these two structures have long been implicated in reward-related learning and decision making. Yet little is known about how these regions interact, especially in humans. We investigated the contribution of the amygdala to reward-related signals in PFC by scanning two rare subjects with focal bilateral amygdala lesions using fMRI. The subjects performed a reversal learning task in which they first had to learn which of two choices was the more rewarding, and then flexibly switch their choices when contingencies changed. Compared with healthy controls, both amygdala lesion subjects showed a profound change in ventromedial prefrontal cortex (vmPFC) activity associated with reward expectation and behavioral choice. These findings support a critical role for the human amygdala in establishing expected reward representations in PFC, which in turn may be used to guide behavioral choice. 相似文献

6.

Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood

Pedram Daee Maryam S. Mirian Majid Nili Ahmadabadi 《PloS one》2014,9(7)

In a multisensory task, human adults integrate information from different sensory modalities -behaviorally in an optimal Bayesian fashion- while children mostly rely on a single sensor modality for decision making. The reason behind this change of behavior over age and the process behind learning the required statistics for optimal integration are still unclear and have not been justified by the conventional Bayesian modeling. We propose an interactive multisensory learning framework without making any prior assumptions about the sensory models. In this framework, learning in every modality and in their joint space is done in parallel using a single-step reinforcement learning method. A simple statistical test on confidence intervals on the mean of reward distributions is used to select the most informative source of information among the individual modalities and the joint space. Analyses of the method and the simulation results on a multimodal localization task show that the learning system autonomously starts with sensory selection and gradually switches to sensory integration. This is because, relying more on modalities -i.e. selection- at early learning steps (childhood) is more rewarding than favoring decisions learned in the joint space since, smaller state-space in modalities results in faster learning in every individual modality. In contrast, after gaining sufficient experiences (adulthood), the quality of learning in the joint space matures while learning in modalities suffers from insufficient accuracy due to perceptual aliasing. It results in tighter confidence interval for the joint space and consequently causes a smooth shift from selection to integration. It suggests that sensory selection and integration are emergent behavior and both are outputs of a single reward maximization process; i.e. the transition is not a preprogrammed phenomenon. 相似文献

7.

A neural circuit model of flexible sensorimotor mapping: learning and forgetting on multiple timescales

Fusi S Asaad WF Miller EK Wang XJ 《Neuron》2007,54(2):319-333

Volitional behavior relies on the brain's ability to remap sensory flow to motor programs whenever demanded by a changed behavioral context. To investigate the circuit basis of such flexible behavior, we have developed a biophysically based decision-making network model of spiking neurons for arbitrary sensorimotor mapping. The model quantitatively reproduces behavioral and prefrontal single-cell data from an experiment in which monkeys learn visuomotor associations that are reversed unpredictably from time to time. We show that when synaptic modifications occur on multiple timescales, the model behavior becomes flexible only when needed: slow components of learning usually dominate the decision process. However, if behavioral contexts change frequently enough, fast components of plasticity take over, and the behavior exhibits a quick forget-and-learn pattern. This model prediction is confirmed by monkey data. Therefore, our work reveals a scenario for conditional associative learning that is distinct from instant switching between sets of well-established sensorimotor associations. 相似文献

8.

A Spiking Neural Network Model of Model-Free Reinforcement Learning with High-Dimensional Sensory Input and Perceptual Ambiguity

Takashi Nakano Makoto Otsuka Junichiro Yoshimoto Kenji Doya 《PloS one》2015,10(3)

A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach. 相似文献

9.

The ecology of action selection: insights from artificial life

Seth AK 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2007,362(1485):1545-1558

相似文献

10.

Hippocampal Lesions Impair Rapid Learning of a Continuous Spatial Alternation Task

Steve M. Kim Loren M. Frank 《PloS one》2009,4(5)

The hippocampus is essential for the formation of memories for events, but the specific features of hippocampal neural activity that support memory formation are not yet understood. The ideal experiment to explore this issue would be to monitor changes in hippocampal neural coding throughout the entire learning process, as subjects acquire and use new episodic memories to guide behavior. Unfortunately, it is not clear whether established hippocampally-dependent learning paradigms are suitable for this kind of experiment. The goal of this study was to determine whether learning of the W-track continuous alternation task depends on the hippocampal formation. We tested six rats with NMDA lesions of the hippocampal formation and four sham-operated controls. Compared to controls, rats with hippocampal lesions made a significantly higher proportion of errors and took significantly longer to reach learning criterion. The effect of hippocampal lesion was not due to a deficit in locomotion or motivation, because rats with hippocampal lesions ran well on a linear track for food reward. Rats with hippocampal lesions also exhibited a pattern of perseverative errors during early task experience suggestive of an inability to suppress behaviors learned during pretraining on a linear track. Our findings establish the W-track continuous alternation task as a hippocampally-dependent learning paradigm which may be useful for identifying changes in the neural representation of spatial sequences and reward contingencies as rats learn and apply new task rules. 相似文献

11.

Altered risk-based decision making following adolescent alcohol use results from an imbalance in reinforcement learning in rats

Clark JJ Nasrallah NA Hart AS Collins AL Bernstein IL Phillips PE 《PloS one》2012,7(5):e37357

Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. 相似文献

12.

Localizing Brain Regions Associated with Female Mate Preference Behavior in a Swordtail

Ryan Y. Wong Mary E. Ramsey Molly E. Cummings 《PloS one》2012,7(11)

Female mate choice behavior is a critical component of sexual selection, yet identifying the neural basis of this behavior is largely unresolved. Previous studies have implicated sensory processing and hypothalamic brain regions during female mate choice and there is a conserved network of brain regions (Social Behavior Network, SBN) that underlies sexual behaviors. However, we are only beginning to understand the role this network has in pre-copulatory female mate choice. Using in situ hybridization, we identify brain regions associated with mate preference in female Xiphophorus nigrensis, a swordtail species with a female choice mating system. We measure gene expression in 10 brain regions (linked to sexual behavior, reward, sensory integration or other processes) and find significant correlations between female preference behavior and gene expression in two telencephalic areas associated with reward, learning and multi-sensory processing (medial and lateral zones of the dorsal telencephalon) as well as an SBN region traditionally associated with sexual response (preoptic area). Network analysis shows that these brain regions may also be important in mate preference and that correlated patterns of neuroserpin expression between regions co-vary with differential compositions of the mate choice environment. Our results expand the emerging network for female preference from one that focused on sensory processing and midbrain sexual response centers to a more complex coordination involving forebrain areas that integrate primary sensory processing and reward. 相似文献

13.

Stimulus-dependent adjustment of reward prediction error in the midbrain

Takemura H Samejima K Vogels R Sakagami M Okuda J 《PloS one》2011,6(12):e28337

Previous reports have described that neural activities in midbrain dopamine areas are sensitive to unexpected reward delivery and omission. These activities are correlated with reward prediction error in reinforcement learning models, the difference between predicted reward values and the obtained reward outcome. These findings suggest that the reward prediction error signal in the brain updates reward prediction through stimulus-reward experiences. It remains unknown, however, how sensory processing of reward-predicting stimuli contributes to the computation of reward prediction error. To elucidate this issue, we examined the relation between stimulus discriminability of the reward-predicting stimuli and the reward prediction error signal in the brain using functional magnetic resonance imaging (fMRI). Before main experiments, subjects learned an association between the orientation of a perceptually salient (high-contrast) Gabor patch and a juice reward. The subjects were then presented with lower-contrast Gabor patch stimuli to predict a reward. We calculated the correlation between fMRI signals and reward prediction error in two reinforcement learning models: a model including the modulation of reward prediction by stimulus discriminability and a model excluding this modulation. Results showed that fMRI signals in the midbrain are more highly correlated with reward prediction error in the model that includes stimulus discriminability than in the model that excludes stimulus discriminability. No regions showed higher correlation with the model that excludes stimulus discriminability. Moreover, results show that the difference in correlation between the two models was significant from the first session of the experiment, suggesting that the reward computation in the midbrain was modulated based on stimulus discriminability before learning a new contingency between perceptually ambiguous stimuli and a reward. These results suggest that the human reward system can incorporate the level of the stimulus discriminability flexibly into reward computations by modulating previously acquired reward values for a typical stimulus. 相似文献

14.

Structure Learning in a Sensorimotor Association Task

Daniel A. Braun Stephan Waldert Ad Aertsen Daniel M. Wolpert Carsten Mehring 《PloS one》2010,5(1)

Learning is often understood as an organism''s gradual acquisition of the association between a given sensory stimulus and the correct motor response. Mathematically, this corresponds to regressing a mapping between the set of observations and the set of actions. Recently, however, it has been shown both in cognitive and motor neuroscience that humans are not only able to learn particular stimulus-response mappings, but are also able to extract abstract structural invariants that facilitate generalization to novel tasks. Here we show how such structure learning can enhance facilitation in a sensorimotor association task performed by human subjects. Using regression and reinforcement learning models we show that the observed facilitation cannot be explained by these basic models of learning stimulus-response associations. We show, however, that the observed data can be explained by a hierarchical Bayesian model that performs structure learning. In line with previous results from cognitive tasks, this suggests that hierarchical Bayesian inference might provide a common framework to explain both the learning of specific stimulus-response associations and the learning of abstract structures that are shared by different task environments. 相似文献

15.

Choice history effects in mice and humans improve reward harvesting efficiency

Junior Samuel Lpez-Ypez Juliane Martin Oliver Hulme Duda Kvitsiani 《PLoS computational biology》2021,17(10)

相似文献

16.

Led into Temptation? Rewarding Brand Logos Bias the Neural Encoding of Incidental Economic Decisions

Murawski C Harris PG Bode S Domínguez D JF Egan GF 《PloS one》2012,7(3):e34155

Human decision-making is driven by subjective values assigned to alternative choice options. These valuations are based on reward cues. It is unknown, however, whether complex reward cues, such as brand logos, may bias the neural encoding of subjective value in unrelated decisions. In this functional magnetic resonance imaging (fMRI) study, we subliminally presented brand logos preceding intertemporal choices. We demonstrated that priming biased participants' preferences towards more immediate rewards in the subsequent temporal discounting task. This was associated with modulations of the neural encoding of subjective values of choice options in a network of brain regions, including but not restricted to medial prefrontal cortex. Our findings demonstrate the general susceptibility of the human decision making system to apparently incidental contextual information. We conclude that the brain incorporates seemingly unrelated value information that modifies decision making outside the decision-maker's awareness. 相似文献

17.

Interference and Shaping in Sensorimotor Adaptations with Rewards

Ran Darshan Arthur Leblois David Hansel 《PLoS computational biology》2014,10(1)

When a perturbation is applied in a sensorimotor transformation task, subjects can adapt and maintain performance by either relying on sensory feedback, or, in the absence of such feedback, on information provided by rewards. For example, in a classical rotation task where movement endpoints must be rotated to reach a fixed target, human subjects can successfully adapt their reaching movements solely on the basis of binary rewards, although this proves much more difficult than with visual feedback. Here, we investigate such a reward-driven sensorimotor adaptation process in a minimal computational model of the task. The key assumption of the model is that synaptic plasticity is gated by the reward. We study how the learning dynamics depend on the target size, the movement variability, the rotation angle and the number of targets. We show that when the movement is perturbed for multiple targets, the adaptation process for the different targets can interfere destructively or constructively depending on the similarities between the sensory stimuli (the targets) and the overlap in their neuronal representations. Destructive interferences can result in a drastic slowdown of the adaptation. As a result of interference, the time to adapt varies non-linearly with the number of targets. Our analysis shows that these interferences are weaker if the reward varies smoothly with the subject''s performance instead of being binary. We demonstrate how shaping the reward or shaping the task can accelerate the adaptation dramatically by reducing the destructive interferences. We argue that experimentally investigating the dynamics of reward-driven sensorimotor adaptation for more than one sensory stimulus can shed light on the underlying learning rules. 相似文献

18.

Learning from sensory and reward prediction errors during motor adaptation

Izawa J Shadmehr R 《PLoS computational biology》2011,7(3):e1002012

Voluntary motor commands produce two kinds of consequences. Initially, a sensory consequence is observed in terms of activity in our primary sensory organs (e.g., vision, proprioception). Subsequently, the brain evaluates the sensory feedback and produces a subjective measure of utility or usefulness of the motor commands (e.g., reward). As a result, comparisons between predicted and observed consequences of motor commands produce two forms of prediction error. How do these errors contribute to changes in motor commands? Here, we considered a reach adaptation protocol and found that when high quality sensory feedback was available, adaptation of motor commands was driven almost exclusively by sensory prediction errors. This form of learning had a distinct signature: as motor commands adapted, the subjects altered their predictions regarding sensory consequences of motor commands, and generalized this learning broadly to neighboring motor commands. In contrast, as the quality of the sensory feedback degraded, adaptation of motor commands became more dependent on reward prediction errors. Reward prediction errors produced comparable changes in the motor commands, but produced no change in the predicted sensory consequences of motor commands, and generalized only locally. Because we found that there was a within subject correlation between generalization patterns and sensory remapping, it is plausible that during adaptation an individual''s relative reliance on sensory vs. reward prediction errors could be inferred. We suggest that while motor commands change because of sensory and reward prediction errors, only sensory prediction errors produce a change in the neural system that predicts sensory consequences of motor commands. 相似文献

19.

A neural network model for the intersensory coordination involved in goal-directed movements 总被引：2，自引：0，他引：2

Y. Coiton J. C. Gilhodes J. L. Velay J. P. Roll 《Biological cybernetics》1991,66(2):167-176

A neural network model for a sensorimotor system, which was developed to simulate oriented movements in man, is presented. It is composed of a formal neural network comprising two layers: a sensory layer receiving and processing sensory inputs, and a motor layer driving a simulated arm. The sensory layer is an extension of the topological network previously proposed by Kohonen (1984). Two kinds of sensory modality, proprioceptive and exteroceptive, are used to define the arm position. Each sensory cell receives proprioceptive inputs provided by each arm-joint together with the exteroceptive inputs. This sensory layer is therefore a kind of associative layer which integrates two separate sensory signals relating to movement coding. It is connected to the motor layer by means of adaptive synapses which provide a physical link between a motor activity and its sensory consequences. After a learning period, the spatial map which emerges in the sensory layer clearly depends on the sensory inputs and an associative map of both the arm and the extra-personal space is built up if proprioceptive and exteroceptive signals are processed together. The sensorimotor transformations occuring in the junctions linking the sensory and motor layers are organized in such a manner that the simulated arm becomes able to reach towards and track a target in extra-personal space. Proprioception serves to determine the final arm posture adopted and to correct the ongoing movement in cases where changes in the target location occur. With a view to developing a sensorimotor control system with more realistic salient features, a robotic model was coupled with the formal neural network. This robotic implementation of our model shows the capacity of formal neural networks to control the displacement of mechanical devices. 相似文献

20.

Embodied Choice: How Action Influences Perceptual Decision Making

Nathan F. Lepora Giovanni Pezzulo 《PLoS computational biology》2015,11(4)

Embodied Choice considers action performance as a proper part of the decision making process rather than merely as a means to report the decision. The central statement of embodied choice is the existence of bidirectional influences between action and decisions. This implies that for a decision expressed by an action, the action dynamics and its constraints (e.g. current trajectory and kinematics) influence the decision making process. Here we use a perceptual decision making task to compare three types of model: a serial decision-then-action model, a parallel decision-and-action model, and an embodied choice model where the action feeds back into the decision making. The embodied model incorporates two key mechanisms that together are lacking in the other models: action preparation and commitment. First, action preparation strategies alleviate delays in enacting a choice but also modify decision termination. Second, action dynamics change the prospects and create a commitment effect to the initially preferred choice. Our results show that these two mechanisms make embodied choice models better suited to combine decision and action appropriately to achieve suitably fast and accurate responses, as usually required in ecologically valid situations. Moreover, embodied choice models with these mechanisms give a better account of trajectory tracking experiments during decision making. In conclusion, the embodied choice framework offers a combined theory of decision and action that gives a clear case that embodied phenomena such as the dynamics of actions can have a causal influence on central cognition. 相似文献