
1.

The Computational Development of Reinforcement Learning during Adolescence

Stefano Palminteri Emma J. Kilford Giorgio Coricelli Sarah-Jayne Blakemore 《PLoS computational biology》2016,12(6)

Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
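
The distinction between partial and complete feedback can be illustrated with a minimal sketch. It assumes a simple delta-rule value update; the function name, the two learning rates and their default values are hypothetical and are not taken from the paper. The value contextualisation module mentioned in the abstract is not modelled here.

```python
def update_values(q, chosen, unchosen, r_chosen, r_unchosen=None,
                  alpha_factual=0.3, alpha_counterfactual=0.3):
    """Return updated option values after one trial (illustrative only).

    q           : dict mapping option name -> current value estimate
    r_chosen    : outcome of the chosen option (always shown)
    r_unchosen  : outcome of the unchosen option, or None under partial feedback
    Learning rates are assumed values, not fitted parameters.
    """
    new_q = dict(q)  # copy so the caller's dict is left untouched
    # Factual update: move the chosen option's value toward its outcome.
    new_q[chosen] += alpha_factual * (r_chosen - new_q[chosen])
    # Counterfactual update: only possible when complete feedback is displayed.
    if r_unchosen is not None:
        new_q[unchosen] += alpha_counterfactual * (r_unchosen - new_q[unchosen])
    return new_q


values = {"A": 0.0, "B": 0.0}
partial = update_values(values, "A", "B", r_chosen=1.0)
complete = update_values(values, "A", "B", r_chosen=1.0, r_unchosen=-1.0)
```

Under partial feedback only the chosen option's value changes; under complete feedback the unchosen option is updated as well. A learner whose counterfactual learning rate is effectively zero would gain nothing from complete feedback.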

2.

Reinforcement Learning of Targeted Movement in a Spiking Neuronal Model of Motor Cortex

George L. Chadderdon Samuel A. Neymotin Cliff C. Kerr William W. Lytton 《PloS one》2012,7(10)

Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint “forearm” to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (−1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
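
The three-valued signal and its gating of plasticity can be sketched as follows. This is a rough illustration under stated assumptions: the function names, the list-based weights and the learning rate are hypothetical, and the spike-timing-dependent computation of the eligibility traces is not modelled at all.

```python
def reinforcement_signal(previous_distance, new_distance):
    """Global 3-valued signal from the change in hand-to-target distance."""
    if new_distance < previous_distance:
        return 1    # reward: the hand moved closer to the target
    if new_distance > previous_distance:
        return -1   # punishment: the hand moved away from the target
    return 0        # no change, so no learning


def apply_reinforcement(weights, eligibility, signal, learning_rate=0.01):
    """Scale each weight change by the global signal and the synapse's
    eligibility trace (assumed here to be precomputed, one per synapse)."""
    return [w + learning_rate * signal * e
            for w, e in zip(weights, eligibility)]


signal = reinforcement_signal(previous_distance=10.0, new_distance=8.0)
new_weights = apply_reinforcement([0.5, 0.5], [1.0, 0.0], signal,
                                  learning_rate=0.1)
```

Only synapses with a non-zero eligibility trace change, which is how a single global signal can assign credit or blame to specific connections.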

3.

Evolution with Reinforcement Learning in Negotiation

Yi Zou Wenjie Zhan Yuan Shao 《PloS one》2014,9(7)

Adaptive behavior depends less on the details of the negotiation process and makes more robust predictions in the long term as compared to in the short term. However, the extant literature on population dynamics for behavior adjustment has only examined the current situation. To offset this limitation, we propose a synergy of an evolutionary algorithm and reinforcement learning to investigate long-term collective performance and strategy evolution. The model adopts reinforcement learning with a tradeoff between historical and current information to make decisions when the strategies of agents evolve through repeated interactions. The results demonstrate that the strategies in populations converge to stable states, and the agents gradually form steady negotiation habits. Agents that adopt reinforcement learning perform better in payoff, fairness, and stability than their counterparts using a classic evolutionary algorithm.

4.

Temporal-Difference Reinforcement Learning with Distributed Representations

Zeb Kurth-Nelson A. David Redish 《PloS one》2009,4(10)

Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting “micro-Agents”, each of which has a separate discounting factor (γ). Each µAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (δ) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each µAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
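
The claim that distributed exponential discounting yields hyperbolic discounting can be checked numerically. The sketch below assumes that the discounting factors γ are spread uniformly over (0, 1), which the abstract does not state; under that assumption the average of γ^d over agents approximates the integral of γ^d from 0 to 1, which equals 1/(1 + d), a hyperbolic function of the delay d. Function names and the number of agents are hypothetical.

```python
def mean_exponential_discount(delay, n_agents=1000):
    """Average discount applied to a reward `delay` steps away by a
    population of exponential discounters.

    Assumption: gammas are evenly spaced midpoints covering (0, 1).
    """
    gammas = [(i + 0.5) / n_agents for i in range(n_agents)]
    return sum(g ** delay for g in gammas) / n_agents


def hyperbolic_discount(delay):
    """Hyperbolic discount function 1 / (1 + delay)."""
    return 1.0 / (1.0 + delay)


for d in (0, 1, 5, 20):
    print(d, mean_exponential_discount(d), hyperbolic_discount(d))
```

Each individual agent discounts exponentially, yet the population average tracks the hyperbolic curve closely at every delay printed.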

5.

Reinforcement Learning Signal Predicts Social Conformity

Vasily Klucharev Kaisa Hytönen Mark Rijpkema Ale Smidts Guillén Fernández 《Neuron》2009,61(1):140-151


6.

Vocal Learning: Shaping by Social Reinforcement

Daniel Y. Takahashi 《Current biology : CB》2019,29(4):R125-R127


7.

Reinforcement Learning Enables Resource Partitioning in Foraging Bats

《Current biology : CB》2020,30(20):4096-4102.e6


8.

Computer Use Changes Generalization of Movement Learning

《Current biology : CB》2014,24(1):82-85


9.

Reinforcement during ecological speciation. Total citations: 4, self-citations: 0, citations by others: 4

M Kirkpatrick 《Proceedings. Biological sciences / The Royal Society》2001,268(1473):1259-1263

Reinforcement of pre-zygotic isolation can result when any of several kinds of selection act against hybrids. This paper investigates the situation where hybrids are selected against for ecological reasons, for example when there is no niche for individuals that are phenotypically intermediate between the parental species. The calculations here show how much ecological selection can lead to the reinforcement of a female mating preference or an assortative mating trait that is expressed in both sexes. The model allows for the ecological trait to be affected by any number of loci, but assumes that selection is weak and the introgression rate small. The effect of selection against hybrids increases rapidly as the difference between the mean phenotypes of the two populations increases. When genetic variation in the ecological trait is caused by many loci, stabilizing selection on it further contributes to reinforcement.

10.

Movement Coordination during Conversation

Nida Latif Adriano V. Barbosa Eric Vatiokiotis-Bateson Monica S. Castelhano K. G. Munhall 《PloS one》2014,9(8)

Behavioral coordination and synchrony contribute to a common biological mechanism that maintains communication, cooperation and bonding within many social species, such as primates and birds. Similarly, human language and social systems may also be attuned to coordination to facilitate communication and the formation of relationships. Gross similarities in movement patterns and convergence in the acoustic properties of speech have already been demonstrated between interacting individuals. In the present studies, we investigated how coordinated movements contribute to observers’ perception of affiliation (friends vs. strangers) between two conversing individuals. We used novel computational methods to quantify motor coordination and demonstrated that individuals familiar with each other coordinated their movements more frequently. Observers used coordination to judge affiliation between conversing pairs but only when the perceptual stimuli were restricted to head and face regions. These results suggest that observed movement coordination in humans might contribute to perceptual decisions based on availability of information to perceivers.

11.

Reinforcement Learning on Slow Features of High-Dimensional Input Streams

Robert Legenstein Niko Wilbert Laurenz Wiskott 《PLoS computational biology》2010,6(8)

Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.

12.

Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin

Takahiro Ezaki Yutaka Horita Masanori Takezawa Naoki Masuda 《PLoS computational biology》2016,12(7)

Direct reciprocity, or repeated interaction, is a main mechanism to sustain cooperation under social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. Mechanisms underlying these behaviors largely remain unclear. Here we provide a proximate account for this behavior by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, prisoner’s dilemma and public goods games, and well-mixed groups and networks. Unlike in the previous theory, individuals are assumed to have no access to information about what other individuals are doing, so they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning in which the unconditional propensity of cooperation is modulated in every discrete time step explains conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This is different from Pavlov, a reinforcement learning strategy promoting mutual cooperation in two-player situations.
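
The aspiration rule described in the abstract can be sketched in a few lines. This is a simplified illustration, not the authors' model: the function name and the fixed step size `rate` are assumptions, since the abstract does not specify how strongly the propensity changes after each outcome.

```python
def update_cooperation_probability(p, cooperated, payoff, aspiration, rate=0.2):
    """One aspiration-learning step on the propensity to cooperate.

    p          : current probability of cooperating
    cooperated : True if the agent cooperated in this round
    payoff     : payoff obtained in this round
    aspiration : fixed aspiration level
    rate       : step size (an assumed constant)
    """
    # Satisfied if and only if the payoff exceeds the aspiration level.
    satisfied = payoff > aspiration
    # Reinforce the action just taken when satisfied, anti-reinforce it
    # otherwise. Cooperation becomes more likely after a satisfying
    # cooperation or an unsatisfying defection.
    toward_cooperation = (cooperated == satisfied)
    target = 1.0 if toward_cooperation else 0.0
    return p + rate * (target - p)


p_next = update_cooperation_probability(0.5, cooperated=True,
                                        payoff=3.0, aspiration=1.0)
```

Note that the update uses only the agent's own action and payoff, in line with the abstract's point that learners have no access to what other individuals are doing.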

13.

Neurofeedback in Learning Disabled Children: Visual versus Auditory Reinforcement

Thalía Fernández Jorge Bosch-Bayard Thalía Harmony María I. Caballero Lourdes Díaz-Comas Lídice Galán Josefina Ricardo-Garcell Eduardo Aubert Gloria Otero-Ojeda 《Applied psychophysiology and biofeedback》2016,41(1):27-37

Children with learning disabilities (LD) frequently have an EEG characterized by an excess of theta and a deficit of alpha activities. NFB using an auditory stimulus as reinforcer has proven to be a useful tool to treat LD children by positively reinforcing decreases of the theta/alpha ratio. The aim of the present study was to optimize the NFB procedure by comparing the efficacy of visual (with eyes open) versus auditory (with eyes closed) reinforcers. Twenty LD children with an abnormally high theta/alpha ratio were randomly assigned to the Auditory or the Visual group, where a 500 Hz tone or a visual stimulus (a white square), respectively, was used as a positive reinforcer when the value of the theta/alpha ratio was reduced. Both groups had signs consistent with EEG maturation, but only the Auditory Group showed behavioral/cognitive improvements. In conclusion, the auditory reinforcer was more efficacious in reducing the theta/alpha ratio, and it improved the cognitive abilities more than the visual reinforcer.

14.

Pain: A Precision Signal for Reinforcement Learning and Control

Ben Seymour 《Neuron》2019,101(6):1029-1041


15.

Correction: Movement Coordination during Conversation

The PLOS ONE Staff 《PloS one》2014,9(11)


16.

A Predictive Reinforcement Model of Dopamine Neurons for Learning Approach Behavior

José L. Contreras-Vidal Wolfram Schultz 《Journal of computational neuroscience》1999,6(3):191-214

A neural network model of how dopamine and prefrontal cortex activity guides short- and long-term information processing within the cortico-striatal circuits during reward-related learning of approach behavior is proposed. The model predicts two types of reward-related neuronal responses generated during learning: (1) cell activity signaling errors in the prediction of the expected time of reward delivery and (2) neural activations coding for errors in the prediction of the amount and type of reward or stimulus expectancies. The former type of signal is consistent with the responses of dopaminergic neurons, while the latter signal is consistent with reward expectancy responses reported in the prefrontal cortex. It is shown that a neural network architecture that satisfies the design principles of the adaptive resonance theory of Carpenter and Grossberg (1987) can account for the dopamine responses to novelty, generalization, and discrimination of appetitive and aversive stimuli. These hypotheses are scrutinized via simulations of the model in relation to the delivery of free food outside a task, the timed contingent delivery of appetitive and aversive stimuli, and an asymmetric, instructed delay response task.

17.

Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

Nicolas Frémaux Henning Sprekeler Wulfram Gerstner 《PLoS computational biology》2013,9(4)


18.

Optic Flow Processing for the Assessment of Object Movement during Ego Movement

Paul A. Warren Simon K. Rushton 《Current biology : CB》2009,19(18):1555-1560


19.

Movement as a signal during classical conditioning

E K Davydova 《The Pavlovian journal of biological science》1988,23(3):95-101

Analysis of the available data, together with the author's own, revealed the peculiarities of a motor reaction used as a conditioned stimulus. The author's data showed that if signal value is attributed to a motor reaction (passive movement or movement evoked by direct stimulation of the motor cortex), the changes of excitability in the motor cortex representation of the dog's leg depend on the biological sign of the reinforcing stimulus during classical conditioning. These changes remained the same during instrumental conditioning and were opposite in sign in the two situations: excitability increased in the food situation and decreased in the defense situation. Using the movement as a conditioned stimulus, we managed to uncover the commonality between classical and instrumental conditioning. This enabled us to answer questions, discussed by Pavlov and Guthrie, which, it seems to us, had not been convincingly answered during their time.

20.

Interhemispheric Interaction during Memorization of Movement Rhythm

O. A. Krotkova O. A. Maksakova N. V. D'yakova 《Human physiology》2002,28(1):7-11

After memorizing a simple periodical movement in the ankle joint, 36 healthy subjects (right-handers) reproduced it from memory. The movement rhythm image that formed during movement in the left ankle joint was memorized better than when the movement was performed by the right extremity. The specific features of interhemispheric interaction at various stages of trace processes, unequal participation of the hemispheres in perception and processing of information, and the physiological expedience of the signal transmission from the right to the left hemisphere are sufficient to assume a basic sequence of hemispheric functional activity redistribution when the brain masters new cognitive actions.