Similar Literature
20 similar documents retrieved.
1.
Avoidance behavior is a critical component of many psychiatric disorders, and as such, it is important to understand how avoidance behavior arises, and whether it can be modified. In this study, we used empirical and computational methods to assess the role of informational feedback and ambiguous outcome in avoidance behavior. We adapted a computer-based probabilistic classification learning task, which includes positive, negative and no-feedback outcomes; the latter outcome is ambiguous as it might signal either a successful outcome (missed punishment) or a failure (missed reward). Prior work with this task suggested that most healthy subjects viewed the no-feedback outcome as strongly positive. Interestingly, in a later version of the classification task, when healthy subjects were allowed to opt out of (i.e. avoid) responding, some subjects (“avoiders”) reliably avoided trials where there was a risk of punishment, but other subjects (“non-avoiders”) never made any avoidance responses at all. One possible interpretation is that the “non-avoiders” valued the no-feedback outcome so positively on punishment-based trials that they had little incentive to avoid. Another possible interpretation is that the outcome of an avoided trial is unspecified and that lack of information is aversive, decreasing subjects’ tendency to avoid. To examine these ideas, we here tested healthy young adults on versions of the task where avoidance responses either did or did not generate informational feedback about the optimal response. Results showed that provision of informational feedback decreased avoidance responses and also decreased categorization performance, without significantly affecting the percentage of subjects classified as “avoiders.” To better understand these results, we used a modified Q-learning model to fit individual subject data. Simulation results suggest that subjects in the feedback condition adjusted their behavior faster following better-than-expected outcomes, compared to subjects in the no-feedback condition. Additionally, in both task conditions, “avoiders” adjusted their behavior faster following worse-than-expected outcomes, and treated the ambiguous no-feedback outcome as less rewarding, compared to non-avoiders. Together, results shed light on the important role of ambiguous and informative feedback in avoidance behavior.  相似文献   
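A minimal sketch of the kind of modified Q-learning model used here, assuming separate learning rates for better- and worse-than-expected outcomes and a free parameter for the subjective value of the ambiguous no-feedback outcome (all names and default values are illustrative, not the authors' implementation):

```python
import numpy as np

def asymmetric_q_loglik(choices, outcomes, n_actions=3,
                        alpha_pos=0.3, alpha_neg=0.1,
                        ambiguous_value=0.5, beta=3.0):
    """Log-likelihood of a choice sequence under asymmetric Q-learning.

    choices         : sequence of chosen action indices
    outcomes        : sequence of 'reward', 'punishment' or 'none'
    ambiguous_value : subjective value assigned to the no-feedback outcome
    """
    q = np.zeros(n_actions)
    value_map = {'reward': 1.0, 'punishment': -1.0, 'none': ambiguous_value}
    log_lik = 0.0
    for a, o in zip(choices, outcomes):
        p = np.exp(beta * q) / np.sum(np.exp(beta * q))   # softmax policy
        log_lik += np.log(p[a])
        delta = value_map[o] - q[a]                        # prediction error
        # separate learning rates capture faster adjustment after
        # better- vs. worse-than-expected outcomes
        q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
    return log_lik
```

Fitting such a model per subject and comparing the fitted alpha_neg and ambiguous_value between groups is one way to express the reported differences between "avoiders" and "non-avoiders".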

2.
The negative symptoms of schizophrenia (SZ) are associated with a pattern of reinforcement learning (RL) deficits likely related to degraded representations of reward values. However, the RL tasks used to date have required active responses to both reward and punishing stimuli. Pavlovian biases have been shown to affect performance on these tasks through invigoration of action to reward and inhibition of action to punishment, and may be partially responsible for the effects found in patients. Forty-five patients with schizophrenia and 30 demographically-matched controls completed a four-stimulus reinforcement learning task that crossed action (“Go” or “NoGo”) and the valence of the optimal outcome (reward or punishment-avoidance), such that all combinations of action and outcome valence were tested. Behaviour was modelled using a six-parameter RL model and EEG was simultaneously recorded. Patients demonstrated a reduction in Pavlovian performance bias that was evident in a reduced Go bias across the full group. In a subset of patients administered clozapine, the reduction in Pavlovian bias was enhanced. The reduction in Pavlovian bias in SZ patients was accompanied by feedback processing differences at the time of the P3a component. The reduced Pavlovian bias in patients is suggested to be due to reduced fidelity in the communication between striatal regions and frontal cortex. It may also partially account for previous findings of poorer “Go-learning” in schizophrenia where “Go” responses or Pavlovian consistent responses are required for optimal performance. An attenuated P3a component dynamic in patients is consistent with a view that deficits in operant learning are due to impairments in adaptively using feedback to update representations of stimulus value.  相似文献   
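The task is the orthogonalized Go/NoGo design, and models of it commonly combine instrumental Q-values with a Go bias and a Pavlovian term that couples response vigour to stimulus value. The sketch below follows that standard formulation as an assumption about the general form, not necessarily the exact six-parameter model fitted in the study:

```python
import numpy as np

def p_go(q_go, q_nogo, v_stim, go_bias, pi, xi):
    """Probability of emitting a 'Go' response to one stimulus.

    q_go, q_nogo : instrumental values of Go and NoGo
    v_stim       : Pavlovian (stimulus) value
    go_bias      : constant tendency to respond
    pi           : Pavlovian weight; positive values invigorate Go for
                   reward-predicting cues and suppress it for punishment cues
    xi           : lapse rate mixing in random responding
    """
    w_go, w_nogo = q_go + go_bias + pi * v_stim, q_nogo
    p = np.exp(w_go) / (np.exp(w_go) + np.exp(w_nogo))
    return (1 - xi) * p + xi * 0.5

def rw_update(value, outcome, alpha):
    """Rescorla-Wagner update applied to both q and v after feedback."""
    return value + alpha * (outcome - value)
```

In a model of this form, the reduced Go bias and weaker Pavlovian invigoration reported for the patient group would appear as smaller fitted go_bias and pi parameters.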

3.
Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents’ behaviour was better explained by a basic reinforcement learning algorithm, adults’ behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.  相似文献   
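The two computational modules that distinguished adults from adolescents can be written as small additions to a basic learner: a counterfactual update applied to the unchosen option when complete feedback is shown, and a context value that re-codes outcomes in relative terms. The sketch below is illustrative; parameter names and values are not taken from the paper:

```python
def update_trial(q_chosen, q_unchosen, v_context,
                 r_chosen, r_unchosen, complete_feedback,
                 alpha=0.3, alpha_cf=0.2, alpha_ctx=0.1):
    """One trial of context-relative Q-learning with an optional counterfactual update.

    Outcomes are re-coded relative to a learned context value (v_context),
    which makes reward-based and punishment-based learning symmetrical; the
    unchosen option is updated only when its outcome is actually displayed.
    """
    v_context += alpha_ctx * (r_chosen - v_context)           # value contextualisation
    q_chosen += alpha * ((r_chosen - v_context) - q_chosen)   # factual update
    if complete_feedback:                                      # counterfactual module
        q_unchosen += alpha_cf * ((r_unchosen - v_context) - q_unchosen)
    return q_chosen, q_unchosen, v_context
```

Setting alpha_cf to zero and freezing v_context at zero recovers the basic reinforcement learning algorithm that better described the adolescents' choices.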

4.
Impairments in flexible goal-directed decisions, often examined by reversal learning, are associated with behavioral abnormalities characterized by impulsiveness and disinhibition. Although the lateral orbital frontal cortex (OFC) has been consistently implicated in reversal learning, it is still unclear whether this region is involved in negative feedback processing, behavioral control, or both, and whether reward and punishment might have different effects on lateral OFC involvement. Using a relatively large sample (N = 47), and a categorical learning task with either monetary reward or moderate electric shock as feedback, we found overlapping activations in the right lateral OFC (and adjacent insula) for reward and punishment reversal learning when comparing correct reversal trials with correct acquisition trials, whereas we found overlapping activations in the right dorsolateral prefrontal cortex (DLPFC) when negative feedback signaled contingency change. The right lateral OFC and DLPFC also showed greater sensitivity to punishment than did their left homologues, indicating an asymmetry in how punishment is processed. We propose that the right lateral OFC and anterior insula are important for transforming affective feedback to behavioral adjustment, whereas the right DLPFC is involved in higher level attention control. These results provide insight into the neural mechanisms of reversal learning and behavioral flexibility, which can be leveraged to understand risky behaviors among vulnerable populations.  相似文献   

5.
Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint “forearm” to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (−1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.  相似文献   
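The plasticity rule amounts to a spike-timing-dependent eligibility trace that is converted into an actual weight change only when the global three-valued reinforcement signal arrives. A schematic, single-synapse version is shown below; the class structure and constants are illustrative rather than the published event-based implementation:

```python
import numpy as np

class RewardModulatedSynapse:
    """Feedforward synapse with an STDP-shaped eligibility trace.

    Pre-before-post spike pairings tag the synapse for credit, post-before-pre
    pairings for blame; the tag decays over time and only becomes a weight
    change when a global signal of +1 (reward), 0 (no learning) or -1
    (punishment) arrives.
    """
    def __init__(self, w=0.5, tau_elig=100.0, lr=0.01):
        self.w, self.tau_elig, self.lr = w, tau_elig, lr
        self.elig = 0.0

    def on_spike_pair(self, dt_post_minus_pre, a_plus=1.0, a_minus=1.0, tau=20.0):
        """Update the eligibility trace from one pre/post spike pairing (ms)."""
        if dt_post_minus_pre > 0:      # pre then post: candidate for credit
            self.elig += a_plus * np.exp(-dt_post_minus_pre / tau)
        else:                          # post then pre: candidate for blame
            self.elig -= a_minus * np.exp(dt_post_minus_pre / tau)

    def step(self, dt, reinforcement):
        """Apply the global 3-valued signal, then let the trace decay."""
        self.w = np.clip(self.w + self.lr * reinforcement * self.elig, 0.0, 1.0)
        self.elig *= np.exp(-dt / self.tau_elig)
```

Driving step() with +1 when the virtual hand moves toward the target and -1 when it moves away reproduces the reward/punishment arrangement described above; as the abstract notes, learning succeeded only when both signals were enabled.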

6.

Background

Mania is characterised by increased impulsivity and risk-taking, and psychological accounts argue that these features may be due to hypersensitivity to reward. The neurobiological mechanisms remain poorly understood. Here we examine reinforcement learning and sensitivity to both reward and punishment outcomes in hypomania-prone individuals not receiving pharmacotherapy.

Method

We recorded EEG from 45 healthy individuals split into three groups by low, intermediate and high self-reported hypomanic traits. Participants played a computerised card game in which they learned the reward contingencies of three cues. Neural responses to monetary gain and loss were measured using the feedback-related negativity (FRN), a component implicated in motivational outcome evaluation and reinforcement learning.

Results

As predicted, rewards elicited a smaller FRN in the hypomania-prone group relative to the low hypomania group, indicative of greater reward responsiveness. The hypomania-prone group also showed smaller FRN to losses, indicating diminished response to negative feedback.

Conclusion

Our findings indicate that proneness to hypomania is associated with both reward hypersensitivity and discounting of punishment. This positive evaluation bias may be driven by aberrant reinforcement learning signals, which fail to update future expectations. This provides a possible neural mechanism explaining risk-taking and impaired reinforcement learning in BD. Further research will be needed to explore the potential value of the FRN as a biological vulnerability marker for mania and pathological risk-taking.  相似文献   

7.
Alterations in reward and punishment processing have been reported in adults suffering from long-term cannabis use. However, previous findings regarding the chronic effects of cannabis on reward and punishment processing have been inconsistent. In the present study, we used functional magnetic resonance imaging (fMRI) to reveal the neural correlates of reward and punishment processing in long-term cannabis users (n = 15) and in healthy control subjects (n = 15) with no history of drug abuse. For this purpose, we used the well-established Monetary Incentive Delay (MID) task, a reliable experimental paradigm that allows the differentiation between anticipatory and consummatory aspects of reward and punishment processing. Regarding the gain anticipation period, no significant group differences were observed. In the left caudate and the left inferior frontal gyrus, cannabis users were – in contrast to healthy controls – unable to differentiate between the reward-feedback and control conditions. In addition, cannabis users showed stronger activations in the left caudate and the bilateral inferior frontal gyrus following feedback of no punishment as compared to healthy controls. We interpreted these deficits in dorsal striatal functioning as altered stimulus-reward or action-contingent learning in cannabis users. In addition, the enhanced lateral prefrontal activation in cannabis users that is related to non-punishing feedback may reflect a deficit in emotion regulation or cognitive reappraisal in these subjects.

8.

Objectives

Current models of ADHD suggest abnormal reward and punishment sensitivity, but the exact mechanisms are unclear. This study aims to investigate effects of continuous reward and punishment on the processing of performance feedback in children with ADHD and the modulating effects of stimulant medication.

Methods

Fifteen methylphenidate (Mph)-treated and 15 Mph-free children with the ADHD-combined type and 17 control children performed a selective attention task with three feedback conditions: no-feedback, gain, and loss. Event-related potentials (ERPs) time-locked to feedback and errors were computed.

Results

All groups performed more accurately with gain and loss than without feedback. Feedback-related ERPs demonstrated no group differences in the feedback P2, but an enhanced late positive potential (LPP) to feedback stimuli (both gains and losses) for Mph-free children with ADHD compared to controls. Feedback-related ERPs in Mph-treated children with ADHD were similar to controls. Correlational analyses in the ADHD groups revealed that the severity of inattention problems correlated negatively with the feedback P2 amplitude and positively with the LPP to losses and omitted gains.

Conclusions

Early selective attention to rewarding and punishing feedback was relatively intact in children with ADHD, but late feedback processing was deviant (increased feedback LPP). This may explain the often observed positive effects of continuous reinforcement on performance and behaviour in children with ADHD. However, these group findings cannot be generalised to all individuals with ADHD, because the feedback-related ERPs were associated with the severity of inattention problems. Children with the ADHD-combined type who had more inattention problems showed both deviant early attentional selection of feedback stimuli and deviant late processing of non-reward and punishment.

9.
Brown and Wagner [Brown, R.T., Wagner, A.R., 1964. Resistance to punishment and extinction following training with shock or nonreinforcement. J. Exp. Psychol. 68, 503-507] investigated rat behaviors with the following features: (1) rats were exposed to reward and punishment at the same time, (2) the environment changed and rats relearned, and (3) rats were stochastically exposed to reward and punishment. Their results showed that exposure to nonreinforcement produces resistance to the decremental effects of a subsequent stochastic reward schedule, and that exposure to both punishment and reinforcement produces resistance to the decremental effects of a subsequent stochastic punishment schedule. This paper aims to simulate these rat behaviors with a reinforcement learning algorithm that takes the appearance probabilities of reinforcement signals into account. Earlier reinforcement learning algorithms were unable to simulate the behavior described in feature (3). We improve on them by controlling the learning parameters according to the acquisition probabilities of the reinforcement signals. The proposed algorithm qualitatively reproduces the results of Brown and Wagner's animal experiment.
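The abstract does not spell out the modified rule, but one plausible reading is that the learning rate applied to a reinforcement signal shrinks the more often that signal has already been experienced, which yields the partial-reinforcement-style resistance effects described above. A speculative sketch of that idea:

```python
def probability_weighted_update(q, outcome_value, p_signal_est, base_alpha=0.3):
    """Value update driven by one reinforcement signal (reward, omission, or shock).

    The effective learning rate shrinks for signals that were common during
    training: an omission the animal has already experienced many times
    (stochastic schedule) produces a smaller decremental update, so responding
    is more resistant to extinction or punishment afterwards.
    """
    alpha = base_alpha * (1.0 - p_signal_est)
    return q + alpha * (outcome_value - q)

def track_signal_probability(p_est, occurred, alpha_p=0.05):
    """Running estimate of how often a given reinforcement signal appears."""
    return p_est + alpha_p * ((1.0 if occurred else 0.0) - p_est)
```

Under a stochastic reward schedule the nonreinforcement signal is already familiar, so its decremental updates stay small and responding persists, qualitatively matching Brown and Wagner's rats.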

10.
The lateral habenula (LHb) is an epithalamic structure involved in signaling reward omission and aversive stimuli, and it inhibits dopaminergic neurons during motivated behavior. Less is known about LHb involvement in the acquisition and retrieval of avoidance learning. Our previous studies indicated that brief electrical stimulation of the LHb, time-locked to the avoidance of aversive footshock (presumably during the positive affective “relief” state that occurs when an aversive outcome is averted), inhibited the acquisition of avoidance learning. In the present study, we used the same paradigm to investigate different frequencies of LHb stimulation. The effect of 20 Hz vs. 50 Hz vs. 100 Hz stimulation was investigated during two phases, either during acquisition or retrieval in Mongolian gerbils. The results indicated that 50 Hz, but not 20 Hz, was sufficient to produce a long-term impairment in avoidance learning, and was somewhat more effective than 100 Hz in this regard. None of the stimulation parameters led to any effects on retrieval of avoidance learning, nor did they affect general motor activity. This suggests that, at frequencies in excess of the observed tonic firing rates of LHb neurons (>1–20 Hz), LHb stimulation may serve to interrupt the consolidation of new avoidance memories. However, these stimulation parameters are not capable of modifying avoidance memories that have already undergone extensive consolidation.  相似文献   

11.
Interval timing is a key element of foraging theory, models of predator avoidance, and competitive interactions. Although interval timing is well documented in vertebrate species, it is virtually unstudied in invertebrates. In the present experiment, we used free-flying honey bees (Apis mellifera ligustica) as a model for timing behaviors. Subjects were trained to enter a hole in an automated artificial flower to receive a nectar reinforcer (i.e. reward). Responses were continuously reinforced prior to exposure to either a fixed interval (FI) 15-sec, FI 30-sec, FI 60-sec, or FI 120-sec reinforcement schedule. We measured response rate and post-reinforcement pause within each fixed interval trial between reinforcers. Honey bees responded at higher frequencies earlier in the fixed interval suggesting subject responding did not come under traditional forms of temporal control. Response rates were lower during FI conditions compared to performance on continuous reinforcement schedules, and responding was more resistant to extinction when previously reinforced on FI schedules. However, no “scalloped” or “break-and-run” patterns of group or individual responses reinforced on FI schedules were observed; no traditional evidence of temporal control was found. Finally, longer FI schedules eventually caused all subjects to cease returning to the operant chamber indicating subjects did not tolerate the longer FI schedules.  相似文献   

12.
13.
Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems such as the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is thus to develop a novel, rather simple method that uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. To test our method, we use a four degree-of-freedom reaching problem in 3D space simulated by a robot arm system with two joints of two DOF each. Function approximation is based on 4D overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined and where other types of learning might be difficult. This work was supported by EU Grant PACO-PLUS.
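A stripped-down illustration of the ingredients named above: overlapping Gaussian kernels (receptive fields) over the four joint angles as function approximators, a continuous action drawn around the approximated preferred movement, and a scalar reward that can encode reward-on-approach, reward-on-touching-only, or punishment of forbidden configurations. Kernel counts, learning rates, and the update rule are placeholders, not the paper's implementation:

```python
import numpy as np

class KernelReachingLearner:
    """Toy reward-weighted learner over overlapping radial-basis-function kernels."""

    def __init__(self, n_kernels=625, dim=4, sigma=0.3, lr=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.centers = self.rng.uniform(-np.pi, np.pi, size=(n_kernels, dim))
        self.sigma, self.lr = sigma, lr
        self.w = np.zeros((n_kernels, dim))   # preferred joint movement per kernel

    def features(self, joint_angles):
        """Normalised activation of the overlapping receptive fields."""
        d2 = np.sum((self.centers - joint_angles) ** 2, axis=1)
        phi = np.exp(-d2 / (2 * self.sigma ** 2))
        return phi / phi.sum()

    def act(self, joint_angles, noise=0.2):
        """Continuous action: approximated preferred move plus exploration noise."""
        phi = self.features(joint_angles)
        return phi @ self.w + self.rng.normal(0.0, noise, size=self.w.shape[1])

    def learn(self, joint_angles, action, reward):
        """Shift preferred actions toward (reward > 0) or away from (reward < 0) the explored action."""
        phi = self.features(joint_angles)
        self.w += self.lr * reward * np.outer(phi, action - phi @ self.w)
```

A mixed reward in the spirit of the paper would be, for example, +1 when the end effector gets closer to or touches the target and -1 when a forbidden joint configuration is entered.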

14.
Task Irrelevant Perceptual Learning (TIPL) shows that the brain’s discriminative capacity can improve also for invisible and unattended visual stimuli. It has been hypothesized that this form of “unconscious” neural plasticity is mediated by an endogenous reward mechanism triggered by the correct task performance. Although this result has challenged the mandatory role of attention in perceptual learning, no direct evidence exists of the hypothesized link between target recognition, reward and TIPL. Here, we manipulated the reward value associated with a target to demonstrate the involvement of reinforcement mechanisms in sensory plasticity for invisible inputs. Participants were trained in a central task associated with either high or low monetary incentives, provided only at the end of the experiment, while subliminal stimuli were presented peripherally. Our results showed that high incentive-value targets induced a greater degree of perceptual improvement for the subliminal stimuli, supporting the role of reinforcement mechanisms in TIPL.  相似文献   

15.
Avoidance learning poses a challenge for reinforcement-based theories of instrumental conditioning, because once an aversive outcome is successfully avoided an individual may no longer experience extrinsic reinforcement for their behavior. One possible account for this is to propose that avoiding an aversive outcome is in itself a reward, and thus avoidance behavior is positively reinforced on each trial when the aversive outcome is successfully avoided. In the present study we aimed to test this possibility by determining whether avoidance of an aversive outcome recruits the same neural circuitry as that elicited by a reward itself. We scanned 16 human participants with functional MRI while they performed an instrumental choice task, in which on each trial they chose from one of two actions in order to either win money or else avoid losing money. Neural activity in a region previously implicated in encoding stimulus reward value, the medial orbitofrontal cortex, was found to increase, not only following receipt of reward, but also following successful avoidance of an aversive outcome. This neural signal may itself act as an intrinsic reward, thereby serving to reinforce actions during instrumental avoidance.  相似文献   

16.
Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.  相似文献   
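The unsupervised preprocessing stage, slow feature analysis, reduces to a generalized eigenvalue problem: find unit-variance projections of the input whose temporal derivative has minimal variance. A minimal linear version is sketched below; the article's network is hierarchical and uses quadratic expansion, so this shows only the core idea:

```python
import numpy as np
from scipy.linalg import eigh

def linear_sfa(x, n_components=2):
    """Projection weights of the n slowest linearly decodable features.

    x : array of shape (time, dims), e.g. a flattened visual input stream.
    Slow features minimise the variance of their temporal derivative under a
    unit-variance constraint, a generalized symmetric eigenvalue problem.
    """
    x = x - x.mean(axis=0)
    x_dot = np.diff(x, axis=0)                 # temporal derivative
    cov = np.cov(x, rowvar=False)
    cov_dot = np.cov(x_dot, rowvar=False)
    eigvals, eigvecs = eigh(cov_dot, cov)      # ascending: smallest = slowest
    return eigvecs[:, :n_components]
```

Stacking such units hierarchically and handing the resulting low-dimensional slow features to a simple reward-trained readout network is the two-stage architecture the abstract describes.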

17.
Two groups of human volunteers received three sessions of discriminated avoidance and punishment with the skin resistance response (SRR) as the operant. During each session one group (feedback) received three 6–8-min periods of Sidman avoidance of a 1.5-mA shock (R-S=40 sec, S-S=35 sec) mixed with three periods of punishment with a 20-sec time-out after each period. The avoidance and punishment periods were signaled by red and green lights, and a circle appeared superimposed on the discriminative stimuli for the duration of a criterion response. A second group (no feedback) received the same conditions as the feedback group except that no circle appeared. Instructions to the subject were not informative regarding experimental events. Subjects made significantly more SRRs during avoidance, a contingency in which responding prevented shock, than during punishment, a contingency in which responding produced shock. A reliable four-way interaction suggested that the feedback stimulus curtailed a tendency for avoidance response rate to diminish within and between experimental sessions. The data are considered as evidence for electrodermal (autonomic) control of two different stressful situations, and the potential value of the paradigm for establishing tonic autonomic arousal and suppression is considered. This research was supported by the Charles L. Mix Memorial Fund. The data were collected in part by M. D. McCrary.

18.
Assessment of visual acuity is a well-standardized procedure, at least for expert opinions and clinical trials. It is often recommended not to give patients feedback on the correctness of their responses. As this viewpoint has not been quantitatively examined so far, we quantitatively assessed possible effects of feedback on visual acuity testing. In 40 normal participants we presented Landolt Cs in 8 orientations using the automated Freiburg Acuity Test (FrACT, michaelbach.de/fract). Over a run comprising 24 trials, the acuity threshold was measured with an adaptive staircase procedure. In an ABCDDCBA scheme, trial-by-trial feedback was provided in 2 x 4 conditions: (A) no feedback, (B) acoustic signals indicating correctness, (C) visual indication of the correct orientation, and (D) a combination of (B) and (C). After each run the participants judged comfort. Main outcome measures were absolute visual acuity (logMAR), its test-retest agreement (limits of agreement), and participants’ comfort estimates on a 5-step symmetric Likert scale. Feedback influenced the acuity outcome significantly (p = 0.02), but with a tiny effect size: 0.02 logMAR poorer acuity for (D) compared to (A), with even weaker effects for (B) and (C). Test-retest agreement was high (limits of agreement: ±1.0 lines) and did not depend on feedback (p>0.5). The comfort ranking clearly differed, by 2 steps on the Likert scale: condition (A), no feedback, was on average “slightly uncomfortable”, whereas the other three conditions were “slightly comfortable” (p<0.0001). Feedback affected neither reproducibility nor the acuity outcome to any relevant extent. The participants, however, reported markedly greater comfort with any kind of feedback. We conclude that systematic feedback (as implemented in FrACT) offers nothing but advantages for routine use.

19.

Background

In pediatric oncology, effective clinic-based management of acute and long-term distress in families calls for investigation of the determinants of parents' psychological response to the child's cancer. We examined the relationship between parents' prior exposure to traumatic life events (TLE) and the occurrence of posttraumatic stress symptoms (PTSS) following their child's cancer diagnosis. Factors mediating the TLE–PTSS relationship were analyzed.

Methodology

The study comprised 169 parents (97 mothers, 72 fathers) of 103 children diagnosed with cancer (median age: 5.9 years; range 0.1–19.7 years). Thirty-five parents were of immigrant origin (20.7%). Prior TLE were collated using a standardized questionnaire; PTSS were assessed using the Impact of Event Scale–Revised (IES–R) questionnaire, covering intrusion, avoidance and hyperarousal symptoms. The predictive significance of prior TLE for PTSS was tested in adjusted regression models.

Results

Mothers demonstrated more severe PTSS across all symptom dimensions. TLE were associated with significantly increased hyperarousal symptoms. Parents' gender, age and immigrant status did not significantly influence the TLE–PTSS relationship.

Conclusions

Prior traumatic life events aggravate posttraumatic hyperarousal symptoms. In clinic-based psychological care of parents of high-risk pediatric patients, attention needs to be paid to life history, and to the heightened vulnerability to PTSS associated with female gender.

20.
Biogenic amines are widely characterized in pathways evaluating reward and punishment, resulting in appropriate aversive or appetitive responses of vertebrates and invertebrates. We utilized the honey bee model and a newly developed spatial avoidance conditioning assay to probe the effects of the biogenic amines octopamine (OA) and dopamine (DA) on avoidance learning. In this new protocol, non-harnessed bees associate a spatial color cue with mild electric shock punishment. After a number of experiences with color and shock, the bees no longer enter the compartment associated with punishment. Intrinsic aspects of avoidance conditioning are associated with the natural behavior of bees, such as punishment (lack of food, explosive pollination mechanisms, danger of predation, heat, etc.) and its association with floral traits or other spatial cues during foraging. The results show that DA reduces the punishment received whereas OA increases the punishment received. These effects are dose-dependent and specific to the acquisition phase of training. The specificity of the effects during acquisition was confirmed in experiments using the antagonists Pimozide and Mianserin for DA and OA receptors, respectively. This study demonstrates the integrative role of biogenic amines in aversive learning in the honey bee as modeled in a novel non-appetitive avoidance learning assay.
