Similar articles
 20 similar articles found (search time: 31 ms)
1.
Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Neuron, 2004, 43(1): 133-143
Midbrain dopamine and striatal tonically active neurons (TANs, presumed acetylcholine interneurons) signal behavioral significance of environmental events. Since striatal dopamine and acetylcholine affect plasticity of cortico-striatal transmission and are both crucial to learning, they may serve as teachers in the basal ganglia circuits. We recorded from both neuronal populations in monkeys performing a probabilistic instrumental conditioning task. Both neuronal types respond robustly to reward-related events. Although different events yielded responses with different latencies, the responses of the two populations coincided, indicating integration at the target level. Yet, while the dopamine neurons' response reflects mismatch between expectation and outcome in the positive domain, the TANs are invariant to reward predictability. Finally, TAN pairs are synchronized, compared to a minority of dopamine neuron pairs. We conclude that the striatal cholinergic and dopaminergic systems carry distinct messages by different means, which can be integrated differently to shape the basal ganglia responses to reward-related events.

2.
A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments.
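The abstract's key requirement, that reward must strengthen LTP more than it strengthens LTD on top of a stable (depression-dominated) unsupervised baseline, can be illustrated with a toy reward-modulated STDP rule. All amplitudes, time constants, and gains below are hypothetical, not the authors' fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (not taken from the paper)
A_LTP, A_LTD = 0.01, 0.012   # baseline STDP amplitudes; LTD dominates
tau = 20.0                   # STDP time constant (ms)
k_ltp, k_ltd = 2.0, 0.5      # reward gain on LTP vs. LTD (k_ltp > k_ltd)

def rstdp_dw(dt_spike, reward):
    """Weight change for one pre/post spike pair with timing difference
    dt_spike = t_post - t_pre (ms); reward scales LTP more than LTD."""
    if dt_spike > 0:   # pre before post -> potentiation
        return (1.0 + k_ltp * reward) * A_LTP * np.exp(-dt_spike / tau)
    # post before pre -> depression
    return -(1.0 + k_ltd * reward) * A_LTD * np.exp(dt_spike / tau)

# Average drift over many random pairings: depression-dominated without
# reward (stable unsupervised learning), potentiation-dominated with reward
pairs = rng.uniform(-50.0, 50.0, 10000)
drift_no_reward = np.mean([rstdp_dw(d, 0.0) for d in pairs])
drift_reward = np.mean([rstdp_dw(d, 1.0) for d in pairs])
```

With these hypothetical gains the mean weight drift is negative without reward and positive with reward, which is the qualitative regime the abstract describes for firing-rate reinforcement.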

3.
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.
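For reference, the classical discrete-time TD(0) actor-critic that such spiking models are mapped onto can be sketched on a small chain task with sparse positive reward. The environment, learning rates, and discount factor here are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Chain of 5 states; action 1 moves right, action 0 moves left (floor at 0);
# reaching the rightmost state yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
alpha, beta, gamma = 0.1, 0.1, 0.9
V = np.zeros(n_states)               # critic: state values
H = np.zeros((n_states, n_actions))  # actor: action preferences

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

for episode in range(2000):
    s = 0
    while s < n_states - 1:
        a = rng.choice(n_actions, p=softmax(H[s]))
        s_next = s + 1 if a == 1 else max(s - 1, 0)
        terminal = s_next == n_states - 1
        r = 1.0 if terminal else 0.0
        delta = r + (0.0 if terminal else gamma * V[s_next]) - V[s]  # TD error
        V[s] += alpha * delta    # critic update
        H[s, a] += beta * delta  # actor update: TD error as third factor
        s = s_next

# After training, the actor prefers the rightward action in every state
policy_right = [softmax(H[s])[1] for s in range(n_states - 1)]
```

The TD error `delta` plays the role attributed to the phasic dopaminergic signal: it is broadcast to both the critic and the actor as a third factor gating their updates.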

4.
Plasticity in the medial prefrontal cortex (mPFC) of rodents, or the lateral prefrontal cortex (lPFC) of non-human primates, plays a key role in neural circuits involved in learning and memory. Several genes, such as brain-derived neurotrophic factor (BDNF), cAMP response element-binding protein (CREB), Synapsin I, calcium/calmodulin-dependent protein kinase II (CaMKII), activity-regulated cytoskeleton-associated protein (Arc), c-jun, and c-fos, have been linked to plasticity processes. We analysed the differential expression of these plasticity-related genes and immediate early genes in the mPFC of rats learning an operant conditioning task. Both incompletely and completely trained animals were studied, because our computational model predicts distinct events at different learning stages. We measured changes in mRNA levels by real-time RT-PCR during learning: expression of these plasticity-associated markers increased while the task was being learned, and the increases began to decline once the task had been learned. The plasticity changes in the lPFC during learning predicted by the model matched those of the representative gene BDNF. Using an integrative approach that combines a computational model with gene-expression measurements, we show for the first time that plasticity in the rat mPFC is higher during learning of an operant conditioning task than after the task has been learned.

5.
6.
Reinforcement learning theorizes that strengthening of synaptic connections in medium spiny neurons of the striatum occurs when glutamatergic input (from cortex) and dopaminergic input (from substantia nigra) are received simultaneously. Subsequent to learning, medium spiny neurons with strengthened synapses are more likely to fire in response to cortical input alone. This synaptic plasticity is produced by phosphorylation of AMPA receptors, caused by phosphorylation of various signalling molecules. A key signalling molecule is the phosphoprotein DARPP-32, highly expressed in striatal medium spiny neurons. DARPP-32 is regulated by several neurotransmitters through a complex network of intracellular signalling pathways involving cAMP (increased through dopamine stimulation) and calcium (increased through glutamate stimulation). Since DARPP-32 controls several kinases and phosphatases involved in striatal synaptic plasticity, understanding the interactions between cAMP and calcium, in particular the effect of transient stimuli on DARPP-32 phosphorylation, has major implications for understanding reinforcement learning. We developed a computer model of the biochemical reaction pathways involved in the phosphorylation of DARPP-32 on Thr34 and Thr75. Ordinary differential equations describing the biochemical reactions were implemented in a single compartment model using the software XPPAUT. Reaction rate constants were obtained from the biochemical literature. The first set of simulations using sustained elevations of dopamine and calcium produced phosphorylation levels of DARPP-32 similar to that measured experimentally, thereby validating the model. The second set of simulations, using the validated model, showed that transient dopamine elevations increased the phosphorylation of Thr34 as expected, but transient calcium elevations also increased the phosphorylation of Thr34, contrary to what is believed. When transient calcium and dopamine stimuli were paired, PKA activation and Thr34 phosphorylation increased compared with dopamine alone. This result, which is robust to variation in model parameters, supports reinforcement learning theories in which activity-dependent long-term synaptic plasticity requires paired glutamate and dopamine inputs.

7.
The striatum is the main input station of the basal ganglia and is strongly associated with motor and cognitive functions. Anatomical evidence suggests that individual striatal neurons are unlikely to share their inputs from the cortex. Using a biologically realistic large-scale network model of striatum and cortico-striatal projections, we provide a functional interpretation of the special anatomical structure of these projections. Specifically, we show that weak pairwise correlation within the pool of inputs to individual striatal neurons enhances the saliency of signal representation in the striatum. By contrast, correlations among the input pools of different striatal neurons render the signal representation less distinct from background activity. We suggest that for the network architecture of the striatum, there is a preferred cortico-striatal input configuration for optimal signal representation. It is further enhanced by the low-rate asynchronous background activity in striatum, supported by the balance between feedforward and feedback inhibitions in the striatal network. Thus, an appropriate combination of rates and correlations in the striatal input sets the stage for action selection presumably implemented in the basal ganglia.

8.
A possible mechanism for the participation of cholinergic striatal interneurons and dopaminergic cells in the conditioned selection of certain types of motor activity is proposed. This selection is triggered by a simultaneous increase in the activity of dopaminergic cells and a pause in the activity of cholinergic interneurons in response to a conditioned stimulus. The pause is promoted by activation of striatal inhibitory interneurons and by the action of dopamine at D2 receptors on cholinergic cells. These opposite changes in dopamine and acetylcholine concentration synergistically modulate the efficacy of corticostriatal inputs; the modulation rules for "strong" and "weak" corticostriatal inputs are opposite. Subsequent reorganization of neuronal firing in the cortex-basal ganglia-thalamus-cortex loop amplifies the activity of the group of cortical neurons that strongly activate striatal cells while simultaneously suppressing the activity of the group that weakly activates them. These changes can underlie the conditioned selection of motor activity performed with involvement of the motor cortex. It follows from the proposed model that if the time delay between the conditioned and unconditioned stimuli does not exceed the latency of the responses of dopaminergic and cholinergic cells (about 100 ms), conditioned selection of motor activity, and hence learning, is problematic.

9.
The acquisition of new motor skills is essential throughout daily life and involves both learning new motor sequences and encoding elementary aspects of new movements. Although previous animal studies have suggested a functional role for striatal dopamine release in learning new motor sequences, its role in encoding elementary aspects of new movements has not yet been investigated. To elucidate this, we investigated changes in striatal dopamine levels during initial skill training (Day 1) compared with acquired conditions (Day 2) using [11C]raclopride positron-emission tomography. Ten volunteers learned to perform brisk contractions with their non-dominant left thumbs with the aid of visual feedback. On Day 1, the mean acceleration in each session improved over repeated training sessions until performance neared asymptotic levels, whereas on Day 2 the improved motor performance was retained from the beginning. The [11C]raclopride binding potential (BP) in the right putamen was reduced during initial skill training compared with acquired conditions. Moreover, voxel-wise analysis revealed that [11C]raclopride BP was particularly reduced in the right antero-dorsal to lateral putamen. Together with previous fMRI findings of a gradual shift of activation within the striatum during the initial stages of motor learning, these results suggest that striatal dopamine plays a role in the dynamic cortico-striatal activation that accompanies encoding of new motor memories during skill acquisition.

10.
Extinction describes the process of attenuating behavioral responses to neutral stimuli when they no longer provide the reinforcement that has been maintaining the behavior. There is a close correspondence between conditioned fear in animal models and human anxiety, and therefore studies of extinction learning might provide insight into the biological nature of anxiety-related disorders such as post-traumatic stress disorder and might help in developing strategies to treat them. Preclinical research aims to aid extinction learning and to induce targeted plasticity in extinction circuits so as to consolidate the newly formed memory. Vagus nerve stimulation (VNS) is a powerful approach that provides tight temporal and circuit-specific release of neurotransmitters, resulting in modulation of neuronal networks engaged in an ongoing task. VNS enhances memory consolidation in both rats and humans, and pairing VNS with exposure to conditioned cues enhances the consolidation of extinction learning in rats. Here, we provide a detailed protocol for the preparation of custom-made parts and the surgical procedures required for VNS in rats. Using this protocol, we show how VNS can facilitate the extinction of conditioned fear responses in an auditory fear conditioning task. In addition, we provide evidence that VNS modulates synaptic plasticity in the pathway between the infralimbic (IL) medial prefrontal cortex and the basolateral complex of the amygdala (BLA), which is involved in the expression and modulation of extinction memory.

11.
The basal ganglia is a brain region critically involved in reinforcement learning and motor control. Synaptic plasticity in the striatum of the basal ganglia is a cellular mechanism implicated in learning and neuronal information processing. Therefore, understanding how different spatio-temporal patterns of synaptic input select for different types of plasticity is key to understanding learning mechanisms. In striatal medium spiny projection neurons (MSPN), both long term potentiation (LTP) and long term depression (LTD) require an elevation in intracellular calcium concentration; however, it is unknown how the post-synaptic neuron discriminates between different patterns of calcium influx. Using computer modeling, we investigate the hypothesis that the temporal pattern of stimulation can select for either endocannabinoid production (for LTD) or protein kinase C (PKC) activation (for LTP) in striatal MSPNs. We implement a stochastic model of the post-synaptic signaling pathways in a dendrite with one or more diffusionally coupled spines. The model is validated by comparison to experiments measuring endocannabinoid-dependent depolarization-induced suppression of inhibition. Using the validated model, simulations demonstrate that theta burst stimulation, which produces LTP, increases the activation of PKC as compared to 20 Hz stimulation, which produces LTD. The model prediction that PKC activation is required for theta burst LTP is confirmed experimentally. Using the ratio of PKC to endocannabinoid production as an index of plasticity direction, model simulations demonstrate that LTP exhibits spine-level spatial specificity, whereas LTD is more diffuse. These results suggest that spatio-temporal control of striatal information processing employs these Gq-coupled pathways.

12.
In contemporary reinforcement learning models, the reward prediction error (RPE), the difference between the expected and actual reward, is thought to guide action-value learning through the firing of dopaminergic neurons. Given the importance of dopamine in reward learning and the involvement of Akt1 in dopamine-dependent behaviors, this study investigated whether Akt1 deficiency modulates reward learning and the magnitude of the RPE, using Akt1 mutant mice as a model. Compared with wild-type littermate controls, Akt1 protein expression in mouse brain varied in a gene-dosage-dependent manner, and Akt1 heterozygous (HET) mice exhibited impaired striatal Akt1 activity under methamphetamine challenge. No genotypic difference was found in basal levels of dopamine or its metabolites. In a series of reward-related learning tasks, HET mice updated reward information from the environment relatively efficiently during the acquisition phase of the two natural-reward tasks and in the reversal section of the dynamic foraging T-maze, but not in methamphetamine-induced or aversion-related reward learning. Fitting a standard reinforcement learning model with Bayesian hierarchical parameter estimation showed that HET mice have larger RPE magnitudes and that their action values are updated more rapidly in all three test sections of the T-maze. These results indicate that Akt1 deficiency modulates natural reward learning and the RPE. This study demonstrates a promising avenue for investigating the RPE in mutant mice and provides evidence for a potential link from genetic deficiency, to neurobiological abnormalities, to impairment of higher-order cognitive function.
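The standard RPE-driven value update underlying such analyses can be sketched on a two-armed bandit with a mid-session reversal, where a larger learning rate stands in for faster value updating. The task and all parameters below are illustrative, not the study's fitted model:

```python
import numpy as np

rng = np.random.default_rng(2)

def run_bandit(alpha, n_trials=400, switch=200, eps=0.1):
    """Epsilon-greedy agent with Rescorla-Wagner-style RPE updates on a
    two-armed bandit whose reward probabilities reverse at `switch`."""
    q = np.zeros(2)  # action values
    correct = 0
    for t in range(n_trials):
        p_reward = (0.8, 0.2) if t < switch else (0.2, 0.8)
        if rng.random() < eps:
            a = int(rng.integers(2))  # explore
        else:
            a = int(q[1] > q[0])      # exploit the higher-valued arm
        r = float(rng.random() < p_reward[a])
        rpe = r - q[a]                # reward prediction error
        q[a] += alpha * rpe           # value update scaled by the RPE
        correct += int(a == (0 if t < switch else 1))
    return correct / n_trials

slow = run_bandit(alpha=0.05)
fast = run_bandit(alpha=0.4)  # values track recent outcomes more rapidly
```

The fast learner recovers more quickly after the reversal, so its overall fraction of correct choices is higher; this is the qualitative signature of more rapid value updating.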

13.
The neural basis of positive reinforcement is often studied in the laboratory using intracranial self-stimulation (ICSS), a simple behavioral model in which subjects perform an action in order to obtain exogenous stimulation of a specific brain area. Recently we showed that activation of ventral tegmental area (VTA) dopamine neurons supports ICSS behavior, consistent with proposed roles of this neural population in reinforcement learning. However, VTA dopamine neurons make connections with diverse brain regions, and the specific efferent target(s) that mediate the ability of dopamine neuron activation to support ICSS have not been definitively demonstrated. Here, we examine in transgenic rats whether dopamine neuron-specific ICSS relies on the connection between the VTA and the nucleus accumbens (NAc), a brain region also implicated in positive reinforcement. We find that optogenetic activation of dopaminergic terminals innervating the NAc is sufficient to drive ICSS, and that ICSS driven by optical activation of dopamine neuron somata in the VTA is significantly attenuated by intra-NAc injections of D1 or D2 receptor antagonists. These data demonstrate that the NAc is a critical efferent target sustaining dopamine neuron-specific ICSS, identify receptor subtypes through which dopamine acts to promote this behavior, and ultimately help to refine our understanding of the neural circuitry mediating positive reinforcement.

14.
Huang VS, Haith A, Mazzoni P, Krakauer JW. Neuron, 2011, 70(4): 787-801
Although motor learning is likely to involve multiple processes, phenomena observed in error-based motor learning paradigms tend to be conceptualized in terms of only a single process: adaptation, which occurs through updating an internal model. Here we argue that fundamental phenomena like movement direction biases, savings (faster relearning), and interference do not relate to adaptation but instead are attributable to two additional learning processes that can be characterized as model-free: use-dependent plasticity and operant reinforcement. Although usually "hidden" behind adaptation, we demonstrate, with modified visuomotor rotation paradigms, that these distinct model-based and model-free processes combine to learn an error-based motor task. (1) Adaptation of an internal model channels movements toward successful error reduction in visual space. (2) Repetition of the newly adapted movement induces directional biases toward the repeated movement. (3) Operant reinforcement through association of the adapted movement with successful error reduction is responsible for savings.
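The combination of a model-based adaptation process with model-free use-dependent plasticity can be caricatured in a two-state trial-by-trial update. The form and the constants are illustrative, not the authors' fitted model:

```python
# Illustrative retention/learning-rate constants (hypothetical)
A, B = 0.99, 0.1   # adaptation: retention and error sensitivity
C = 0.02           # use-dependent bias: drift rate toward repeated movements
rotation = 30.0    # imposed visuomotor rotation (degrees)

adapt, bias = 0.0, 0.0
for trial in range(200):
    aim = -adapt + bias            # movement direction relative to target
    error = aim + rotation         # visual error under the rotation
    adapt = A * adapt + B * error  # internal-model update (model-based)
    bias += C * (aim - bias)       # bias drifts toward the repeated movement
```

Adaptation cancels the visual error within a few tens of trials, while the slower use-dependent term gradually accumulates a directional bias toward the repeatedly produced movement, the second phenomenon listed in the abstract.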

15.
The dorsal striatum integrates inputs from multiple brain areas to coordinate voluntary movements, associative plasticity, and reinforcement learning. Its projection neurons are the GABAergic medium spiny neurons (MSNs) that express dopamine receptor type 1 (D1) or dopamine receptor type 2 (D2). Cholinergic interneurons account for a small fraction of the striatal neuron population, but they play important roles in striatal function by synapsing onto the MSNs and other local interneurons. By combining modified rabies virus with specific Cre mouse lines, a recent study mapped the monosynaptic input patterns to MSNs. Because only a small number of extrastriatal neurons were labeled in that study, it is important to reexamine the input patterns of MSNs with higher labeling efficiency. Additionally, the whole-brain innervation pattern of cholinergic interneurons remains unknown. Using the rabies virus-based transsynaptic tracing method, we comprehensively charted the brain areas that provide direct inputs to D1-MSNs, D2-MSNs, and cholinergic interneurons in the dorsal striatum. We found that both types of projection neurons and the cholinergic interneurons receive extensive inputs from discrete brain areas in the cortex, thalamus, amygdala, and other subcortical areas, several of which were not reported in the previous study. The MSNs and cholinergic interneurons share largely common inputs from areas outside the striatum. However, innervation within the dorsal striatum represents a significantly larger proportion of total inputs for cholinergic interneurons than for the MSNs. These comprehensive maps of direct inputs to striatal MSNs and cholinergic interneurons should assist future functional dissection of striatal circuits.

16.
Exposure to addictive drugs causes changes in synaptic function within the striatal complex, which can either mimic or interfere with the induction of synaptic plasticity. These synaptic adaptations include changes in the nucleus accumbens (NAc), a ventral striatal subregion important for drug reward and reinforcement, as well as the dorsal striatum, which may promote habitual drug use. As the behavioral effects of drugs of abuse are long-lasting, identifying persistent changes in striatal circuits induced by in vivo drug experience is of considerable importance. Within the striatum, drugs of abuse have been shown to induce modifications in dendritic morphology, ionotropic glutamate receptors (iGluR) and the induction of synaptic plasticity. Understanding the detailed molecular mechanisms underlying these changes in striatal circuit function will provide insight into how drugs of abuse usurp normal learning mechanisms to produce pathological behavior.

17.
The negative symptoms of schizophrenia (SZ) are associated with a pattern of reinforcement learning (RL) deficits likely related to degraded representations of reward values. However, the RL tasks used to date have required active responses to both reward and punishing stimuli. Pavlovian biases have been shown to affect performance on these tasks through invigoration of action to reward and inhibition of action to punishment, and may be partially responsible for the effects found in patients. Forty-five patients with schizophrenia and 30 demographically-matched controls completed a four-stimulus reinforcement learning task that crossed action ("Go" or "NoGo") and the valence of the optimal outcome (reward or punishment-avoidance), such that all combinations of action and outcome valence were tested. Behaviour was modelled using a six-parameter RL model and EEG was simultaneously recorded. Patients demonstrated a reduction in Pavlovian performance bias that was evident in a reduced Go bias across the full group. In a subset of patients administered clozapine, the reduction in Pavlovian bias was enhanced. The reduction in Pavlovian bias in SZ patients was accompanied by feedback processing differences at the time of the P3a component. The reduced Pavlovian bias in patients is suggested to be due to reduced fidelity in the communication between striatal regions and frontal cortex. It may also partially account for previous findings of poorer "Go-learning" in schizophrenia where "Go" responses or Pavlovian consistent responses are required for optimal performance. An attenuated P3a component dynamic in patients is consistent with a view that deficits in operant learning are due to impairments in adaptively using feedback to update representations of stimulus value.

18.
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions, and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme with that of temporal-difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
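The cascade idea, a fast coincidence trace feeding a slower decision-tagged trace that is finally read out by reward, can be sketched minimally. The time constants and the transfer rule below are illustrative, not the paper's plasticity model:

```python
# Hypothetical trace time constants (ms) and learning rate
tau_fast, tau_slow = 50.0, 2000.0
dt, eta = 1.0, 0.1

def weight_change(delay_ms):
    """Synaptic weight change when reward arrives delay_ms after a
    pre/post coincidence, with the behavioral decision at t = 10 ms."""
    fast, slow = 1.0, 0.0            # coincidence charges the fast trace
    for t in range(int(delay_ms)):
        if t == 10:
            slow += fast             # decision transfers fast -> slow trace
        fast *= 1.0 - dt / tau_fast  # forward-Euler exponential decay
        slow *= 1.0 - dt / tau_slow
    return eta * slow                # reward reads out the surviving trace

w_short = weight_change(200)
w_long = weight_change(1500)
```

Because the decision-tagged trace decays slowly, a reinforcement signal arriving over a second later still produces a positive, if smaller, weight change; this is how such cascades can bridge temporal credit assignment.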

19.
Neural mechanisms of reward-related motor learning
The analysis of the neural mechanisms responsible for reward-related learning has benefited from recent studies of the effects of dopamine on synaptic plasticity. Dopamine-dependent synaptic plasticity may lead to strengthening of selected inputs on the basis of an activity-dependent conjunction of sensory afferent activity, motor output activity, and temporally related firing of dopamine cells. Such plasticity may provide a link between the reward-related firing of dopamine cells and the acquisition of changes in striatal cell activity during learning. This learning mechanism may play a special role in the translation of reward signals into context-dependent response probability or directional bias in movement responses.

20.
Extinction of an appetitive operant response after administration of MSH
Hungry rats were trained to press a lever to obtain food on a fixed-ratio (FR) or variable-ratio (VR) schedule of reinforcement. Rats trained on the FR schedule and injected with synthetic α-MSH showed delayed extinction of the task compared with control rats injected with diluent. The results show that MSH affects the behavior of rats in another type of behavioral situation involving an appetitive operant response.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号