Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Previously, we demonstrated the possibility of fMRI in two awake and unrestrained dogs. Here, we determined the replicability and heterogeneity of these results in an additional 11 dogs for a total of 13 subjects. Based on an anatomically placed region-of-interest, we compared the caudate response to a hand signal indicating the imminent availability of a food reward with the response to a hand signal indicating no reward. Eight of 13 dogs had a positive differential caudate response to the signal indicating reward. The mean differential caudate response was 0.09%, which was similar to that found in a comparable human study. These results show that canine fMRI is reliable and can be done with minimal stress to the dogs.

2.
Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.

3.
Two experimental models involving a choice between two reinforcements were used to assess individual typological features of dogs. In the first model, dogs chose between homogeneous food reinforcements: a less valuable reinforcement delivered constantly and a more valuable reinforcement delivered with low probability. In the second model, the dogs chose between heterogeneous reinforcements: performing alimentary or defensive reactions. As uncertainty rose owing to a decrease in the probability of getting the valuable food, two dogs continued to prefer the valuable reinforcement, while the third animal gradually shifted its behavior from choosing a highly valuable but infrequent reward to a less valuable but easily achieved reinforcement. When the choice was between the valuable food reinforcement and avoidance of electrocutaneous stimulation, the first two dogs preferred food, whereas the third animal, which had previously been oriented toward the low-value constant reinforcement, steadily preferred the avoidance behavior. The data are consistent with the hypothesis that the individual typological characteristics of animal (and human) behavior depend substantially on two parameters: the extent of environmental uncertainty and subjective features of reinforcement assessment.

4.
Neural responses during anticipation of a primary taste reward   Cited by: 29 (self-citations: 0; citations by others: 29)
The aim of this study was to determine the brain regions involved in anticipation of a primary taste reward and to compare these regions to those responding to the receipt of a taste reward. Using fMRI, we scanned human subjects who were presented with visual cues that signaled subsequent reinforcement with a pleasant sweet taste (1 M glucose), a moderately unpleasant salt taste (0.2 M saline), or a neutral taste. Expectation of a pleasant taste produced activation in dopaminergic midbrain, posterior dorsal amygdala, striatum, and orbitofrontal cortex (OFC). Apart from OFC, these regions were not activated by reward receipt. The findings indicate that when rewards are predictable, brain regions recruited during expectation are, in part, dissociable from areas responding to reward receipt.

5.
We investigated the interaction between individual experience and social learning in domestic dogs, Canis familiaris. We conducted two experiments using detour tests, where an object or food was placed behind a transparent, V-shaped wire-mesh fence, such that the dogs could get the reward by going around the fence. In some groups, two open doors were offered as an alternative, easier way to reach the reward. In experiment 1 we opened the doors only in trial 1, then closed them for trials 2 and 3. In experiment 2 other dogs were first taught to detour the fence with closed doors after they had observed a detouring human demonstrator, then we opened the doors for three subsequent trials. In experiment 1 all dogs reached the reward by going through the doors in trial 1, but their detouring performance was poor after the doors had been closed, if they had to solve the task on their own. However, dogs in the experimental group that were allowed to watch a detouring human demonstrator after the doors had been closed showed improved detouring ability compared with those that did not receive a demonstration of detouring. In experiment 2 the dogs tended to keep on detouring along the fence even if the doors had been opened, giving up a chance to get behind the fence by a shorter route. These results show that dogs can use information gained by observing a human demonstrator to overcome their own mistakenly preferred solution in a problem situation. In a reversed situation social learning can also contribute to a preference for a less adaptive behaviour. However, only repeated individual and social experience leads to a durable manifestation of maladaptive behaviour. Copyright 2003 Published by Elsevier Science Ltd on behalf of The Association for the Study of Animal Behaviour.

6.
Individual typological features of dog behavior were investigated by offering a choice between low-value food that was always available and high-quality food presented with low probability. Animals were subjected to instrumental conditioning with the same conditioned stimuli but different types of reinforcement. Pressing a white pedal was always reinforced with a meat-bread-crumb mixture, whereas pressing a black pedal was reinforced with two pieces of liver (with probabilities of 100, 40, 33, 20, or 0%). The choice of reinforcement depended on the probability of the valuable food and on individual typological features of the dog's nervous system. Decreasing the probability of the valuable reinforcement to 40-20% revealed differences in the dogs' behavior. Dogs of the first group, presumably with the weak type of nervous system, pressed the white pedal (always reinforced) more frequently than the black pedal, thus "avoiding a situation of risk" of receiving an empty cup. They displayed symptoms of neurosis: whimpering, refusal of food or of the choice of reinforcement, and obtrusive movements. Dogs of the second group, presumably with the strong type of nervous system, more frequently pressed the black pedal (more valuable food) for the low-probability reward until they obtained the valuable food. They did not show symptoms of neurosis and were not afraid of the "situation of risk". A decrease in the probability of the valuable reinforcement increased the percentage of long-latency pedal presses. This phenomenon was probably associated with increasing involvement of cognitive processes, when the contributions of the assessments of probability and value of the reinforcement to decision making became approximately equal. Choice between the probability and value of alimentary reinforcement is a good method for revealing individual typological features of dogs.

7.
Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint “forearm” to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (−1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior.
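The 3-valued learning signal described in this abstract can be illustrated with a short sketch. It is not the authors' spiking-network code: the function names `reward_signal` and `update_weights`, the learning rate, and the plain list representation of weights and eligibility traces are all assumptions made for illustration. Only the mapping of decreasing, unchanged, or increasing hand-to-target distance onto +1, 0, or −1 comes from the abstract.

```python
def reward_signal(prev_distance, new_distance):
    """Global 3-valued signal: +1 if the hand moved closer to the target,
    -1 if it moved farther away, 0 if the distance did not change."""
    if new_distance < prev_distance:
        return 1
    if new_distance > prev_distance:
        return -1
    return 0


def update_weights(weights, eligibility, signal, learning_rate=0.1):
    """Hypothetical update rule: each weight changes in proportion to the
    global signal times that synapse's eligibility trace."""
    return [w + learning_rate * signal * e for w, e in zip(weights, eligibility)]
```

With a signal of 0 the weights are returned unchanged, which corresponds to the "no learning" case in the abstract.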

8.
Previous reports have described that neural activities in midbrain dopamine areas are sensitive to unexpected reward delivery and omission. These activities are correlated with reward prediction error in reinforcement learning models, the difference between predicted reward values and the obtained reward outcome. These findings suggest that the reward prediction error signal in the brain updates reward prediction through stimulus-reward experiences. It remains unknown, however, how sensory processing of reward-predicting stimuli contributes to the computation of reward prediction error. To elucidate this issue, we examined the relation between stimulus discriminability of the reward-predicting stimuli and the reward prediction error signal in the brain using functional magnetic resonance imaging (fMRI). Before main experiments, subjects learned an association between the orientation of a perceptually salient (high-contrast) Gabor patch and a juice reward. The subjects were then presented with lower-contrast Gabor patch stimuli to predict a reward. We calculated the correlation between fMRI signals and reward prediction error in two reinforcement learning models: a model including the modulation of reward prediction by stimulus discriminability and a model excluding this modulation. Results showed that fMRI signals in the midbrain are more highly correlated with reward prediction error in the model that includes stimulus discriminability than in the model that excludes stimulus discriminability. No regions showed higher correlation with the model that excludes stimulus discriminability. Moreover, results show that the difference in correlation between the two models was significant from the first session of the experiment, suggesting that the reward computation in the midbrain was modulated based on stimulus discriminability before learning a new contingency between perceptually ambiguous stimuli and a reward. These results suggest that the human reward system can incorporate the level of the stimulus discriminability flexibly into reward computations by modulating previously acquired reward values for a typical stimulus.
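The two models compared in this abstract differ only in how the predicted reward is formed before the prediction error is taken. The sketch below shows that difference in its simplest form. The abstract defines the prediction error as the difference between predicted and obtained reward; the multiplicative scaling in `modulated_prediction` and the `discriminability` weight in [0, 1] are assumptions for illustration, not the model the authors fitted.

```python
def prediction_error(reward, predicted_value):
    """Reward prediction error: obtained reward minus predicted reward."""
    return reward - predicted_value


def modulated_prediction(learned_value, discriminability):
    """Hypothetical modulation: scale the previously learned value by a
    discriminability weight in [0, 1] (1 = clearly visible stimulus)."""
    return discriminability * learned_value
```

Under this assumption, a low-contrast stimulus lowers the prediction, so the same delivered reward produces a larger prediction error than it would after a high-contrast stimulus.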

9.
There is a controversy about the mechanisms involved in the interspecific communicative behaviour in domestic dogs. The main question is whether this behaviour is a result of instrumental learning or whether higher cognitive skills are required. The present investigations were undertaken to study the effect of learning processes upon the gaze towards the human's face as a communicative response. To this end, in Study 1, gaze response was subjected to three types of reinforcement schedules: differential reinforcement, reinforcer omission, and extinction in a situation of “asking for food”. Results showed a significant increase in gaze duration in the differential reinforcement phase and a significant decrease in both the omission and extinction phases. These changes were quite rapid, since they occurred after only three training trials in each phase. Furthermore, extinction resulted in animal behaviour changes, such as an increase in the distance from the experimenter, the back position and lying behaviour. This is the first systematic evaluation of the behavioural changes caused by reward withdrawal (frustration) in dogs. In Study 2, the gaze response was studied in a situation where dogs walked along with their owners/trainers. These results show that learning plays an important role in this communicative response. The possible implications of these results for service dogs are discussed.

10.
Anthrozoös 2013, 26(1): 51-68
“Instinctive” behavior may be modified using operant techniques. We report here on a field study of training herding dogs in which reinforcers and punishers were used by owners, who were themselves being trained to control their dogs. Access to sheep was assumed to be a primary reinforcer for herding dogs, while blocking their access was assumed to be aversive to them. Over several months, the number of blocking and access actions by the human was scored during the training of seven naïve herding dogs. We found that punishment, by blocking the dog's access to sheep or by stopping the dog, occurred at higher rates than positive reinforcement through access or verbal praise. While positive reinforcement can be used exclusively for the training of certain behaviors, it is suggested that in the context of instinctive motor patterns, negative reinforcement and punishment may be desirable and necessary additions to positive reinforcement techniques.

11.
Presenting animals with artificial visual stimuli is a key element of many recent behavioral experiments, largely because images are easier to control and manipulate than live demonstrations. Determining how animals process images is crucial for being able to correctly interpret subjects' reactions toward these stimuli. In this study, we aimed to use the framework proposed by Fagot et al. (2010, Proc. Natl. Acad. Sci. USA 107, 519) to classify how dogs perceive life-sized projected videos. First, we tested whether dogs can use pre-recorded, and hence non-interactive, video footage of a human to locate a hidden reward in a three-way choice task. Second, we investigated whether dogs solve this task by means of referential understanding. To achieve this, we separated the location of the video projection from the location where dogs had to search for the hidden reward. Our results confirmed that dogs can reliably use pre-recorded videos of a human as a source of information when the demonstration and the hiding locations are in the same room. However, they did not find the hidden object above the chance level when the hiding locations were in a separate room. Still, further analysis found a positive connection between the attention paid to the projection and the success rate of dogs. This finding suggests that the factor limiting dogs' performance was their attention and that with further training they might be able to master tasks involving referential understanding.

12.
Avoidance learning poses a challenge for reinforcement-based theories of instrumental conditioning, because once an aversive outcome is successfully avoided an individual may no longer experience extrinsic reinforcement for their behavior. One possible account for this is to propose that avoiding an aversive outcome is in itself a reward, and thus avoidance behavior is positively reinforced on each trial when the aversive outcome is successfully avoided. In the present study we aimed to test this possibility by determining whether avoidance of an aversive outcome recruits the same neural circuitry as that elicited by a reward itself. We scanned 16 human participants with functional MRI while they performed an instrumental choice task, in which on each trial they chose one of two actions in order to either win money or else avoid losing money. Neural activity in a region previously implicated in encoding stimulus reward value, the medial orbitofrontal cortex, was found to increase, not only following receipt of reward, but also following successful avoidance of an aversive outcome. This neural signal may itself act as an intrinsic reward, thereby serving to reinforce actions during instrumental avoidance.

13.
Human subjects were exposed to a concurrent-chain procedure in which amount of reinforcement in the terminal links was varied. The experimental procedure was designed to resemble as closely as possible animal operant procedures: verbal instructions were eliminated, the key-press operant response was shaped, and a “consummatory” response was required to receive reward. In addition to varying amount of reward, three different pairs of initial-link values in the concurrent chain were studied. The human subjects showed undermatching to amount of reinforcement (as do animal subjects). Moreover, the degree of undermatching tended to increase as the values of the initial links increased, consistent with Fantino's delay reduction hypothesis (1977) that choice for a larger reward decreases as the length of the initial link increases.

14.
Recent studies have suggested that domestic dogs (Canis familiaris) engage in highly complex forms of social learning. Here, we critically assess the potential mechanisms underlying social learning in dogs using two problem-solving tasks. In a classical detour task, the test dogs benefited from observing a demonstrator walking around a fence to obtain a reward. However, even inexperienced dogs did not show a preference for passing the fence at the same end as the demonstrator. Furthermore, dogs did not need to observe a complete demonstration by a human demonstrator to pass the task. Instead, they were just as successful in solving the problem after seeing a partial demonstration by an object passing by at the end of the fence. In contrast to earlier findings, our results suggest that stimulus enhancement (or affordance learning) might be a powerful social learning mechanism used by dogs to solve such detour problems. In the second task, we examined whether naïve dogs copy actions to solve an instrumental problem. After controlling for stimulus enhancement and other forms of social influence (e.g. social facilitation and observational conditioning), we found that dogs’ problem solving was not influenced by witnessing a skilful demonstrator (either an unknown human, a conspecific or the dog’s owner). Together, these results add to evidence suggesting that social learning may often be explained by relatively simple (but powerful) mechanisms.

15.
Post-traumatic stress disorder (PTSD) symptoms include behavioral avoidance which is acquired and tends to increase with time. This avoidance may represent a general learning bias; indeed, individuals with PTSD are often faster than controls at acquiring conditioned responses based on physiologically aversive feedback. However, it is not clear whether this learning bias extends to cognitive feedback, or to learning from both reward and punishment. Here, male veterans with self-reported current, severe PTSD symptoms (PTSS group) or with few or no PTSD symptoms (control group) completed a probabilistic classification task that included both reward-based and punishment-based trials, where feedback could take the form of reward, punishment, or an ambiguous “no-feedback” outcome that could signal either successful avoidance of punishment or failure to obtain reward. The PTSS group outperformed the control group in total points obtained; the PTSS group specifically performed better than the control group on reward-based trials, with no difference on punishment-based trials. To better understand possible mechanisms underlying observed performance, we used a reinforcement learning model of the task, and applied maximum likelihood estimation techniques to derive estimated parameters describing individual participants’ behavior. Estimations of the reinforcement value of the no-feedback outcome were significantly greater in the control group than the PTSS group, suggesting that the control group was more likely to value this outcome as positively reinforcing (i.e., signaling successful avoidance of punishment). This is consistent with the control group’s generally poorer performance on reward trials, where reward feedback was to be obtained in preference to the no-feedback outcome. Differences in the interpretation of ambiguous feedback may contribute to the facilitated reinforcement learning often observed in PTSD patients, and may in turn provide new insight into how pathological behaviors are acquired and maintained in PTSD.

16.
In a large variety of situations one would like to have an expressive and accurate model of observed animal or human behavior. While general purpose mathematical models may successfully capture properties of observed behavior, it is desirable to root models in biological facts. Because of ample empirical evidence for reward-based learning in visuomotor tasks, we use a computational model based on the assumption that the observed agent is balancing the costs and benefits of its behavior to meet its goals. This leads to using the framework of reinforcement learning, which additionally provides well-established algorithms for learning of visuomotor task solutions. To quantify the agent’s goals as rewards implicit in the observed behavior, we propose to use inverse reinforcement learning. Based on the assumption of a modular cognitive architecture, we introduce a modular inverse reinforcement learning algorithm that estimates the relative reward contributions of the component tasks in navigation, consisting of following a path while avoiding obstacles and approaching targets. It is shown how to recover the component reward weights for individual tasks and that variability in observed trajectories can be explained succinctly through behavioral goals. It is demonstrated through simulations that good estimates can be obtained already with modest amounts of observation data, which in turn allows the prediction of behavior in novel configurations.

17.
In contrast to animal social learning (e.g. dogs learning from observing another dog), humans typically teach by attracting the attention of the learner. When training dogs, humans also tend to attract their attention in a similar way. Here, we investigated dogs’ ability to learn from both a dog and a human demonstrator in a manipulative task, where the models demonstrated which part of a box to manipulate in order to get a food reward. We varied the communicative context comparably during the dog and the human demonstrations: a second experimenter directed the attention of the subjects to the model (dog/human ostensive demonstration) or remained silent (dog/human non-ostensive demonstration). Moreover, we investigated whether the training level of the dogs (well-trained vs. untrained) affected how the dogs performed in the manipulative tasks after the different demonstrations. We found that better trained dogs showed significantly better problem-solving abilities. They paid more attention to the human demonstration than to the dog model, whereas no such difference in attentiveness was found in the less trained dogs. Despite slight differences in the attention paid to the different demonstrators, the human and dog demonstrators were equally effective with respect to the test performance of the dogs. However, the effectiveness of the demonstrations was significantly reduced if ostensive cues were given during the demonstrations by a second experimenter. Analysis of the attentiveness and activity of the observer dogs during the demonstrations indicates that the reason for this negative effect was a combination of distracted attention to the demonstration and a higher level of excitement in the ostensive than in the non-ostensive demonstrations. This study suggests that third-party communication during demonstration draws dogs’ attention to the communicator rather than to the model. We suggest that precise timing and synchronization of attention-calling and demonstration is necessary to avoid this distracting effect.

18.
Modulation of caudate activity by action contingency   Cited by: 5 (self-citations: 0; citations by others: 5)
Tricomi EM, Delgado MR, Fiez JA. Neuron 2004, 41(2): 281-292
Research has increasingly implicated the striatum in the processing of reward-related information in both animals and humans. However, it is unclear whether human striatal activation is driven solely by the hedonic properties of rewards or whether such activation is reliant on other factors, such as anticipation of upcoming reward or performance of an action to earn a reward. We used event-related functional magnetic resonance imaging to investigate hemodynamic responses to monetary rewards and punishments in three experiments that made use of an oddball paradigm. We presented reward and punishment displays randomly in time, following an anticipatory cue, or following a button press response. Robust and differential activation of the caudate nucleus occurred only when a perception of contingency existed between the button press response and the outcome. This finding suggests that the caudate is involved in reinforcement of action potentially leading to reward, rather than in processing reward per se.

19.
In dogs pressing a lever for a brain-stimulation reward, arterial blood pressure (ABP) was elevated for 20 out of 24 sites tested, but this effect was usually conspicuous only at twice the threshold current sustaining stable performance. Hypertension was seen only in one ventral tegmental and two hypothalamic sites. In three anterior placements the ABP and heart rate (HR) increased more on a fixed-ratio schedule than on continuous reinforcement. In most sites, self-stimulation was accompanied by cardiac acceleration; however, in some placements the HR was similar to or even less than control values. Continuous stimulation (5-10 sec) by the experimenter at one nucleus accumbens and four hypothalamic sites was aversive and produced a clear-cut pressor response. The cardiovascular changes seem to depend on a spread of current to brain centres controlling circulatory functions and also, to some extent, on the animal's motor activity. The results contradict the claim that a causal relationship exists between the autonomic concomitants of self-stimulation and the intrinsic nature of the brain-stimulation reward.

20.
Satiated rats could be trained to give stable rates of responding for rewarding stimulation of the lateral hypothalamus delivered on a differential-reinforcement-of-low-rate (DRL) schedule requiring 2 to 8 sec interresponse intervals for reinforcement (DRL-2 to 8). The performance on a DRL-8 schedule was tested 30 min after the oral administration of benzodiazepines. Diazepam (5 and 10 mg/kg) and meprobamate (200 mg/kg) caused significant increases in response rates during the first 5 min of a session, but not thereafter. Bromazepam (1 and 5 mg/kg) also caused a significant increase in the rates during the first and second 5 min. On the other hand, chlorpromazine (20 mg/kg) had no effect in the first 5 min but decreased response rates in the second and third 5-min periods. These results indicate that DRL schedules with a brain stimulation reward provide a useful tool for the evaluation of antianxiety drugs. The advantage of the brain stimulation reward over food reward is that the possible effects of the drugs on hunger motivation need not be considered.
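The DRL contingency in this abstract is a timing rule, and it can be sketched in a few lines. This is a minimal illustration rather than the authors' apparatus logic: the function name `drl_reinforced` and the list-of-timestamps input are assumptions, and the first response is simply treated as the starting point of the first interval.

```python
def drl_reinforced(response_times, min_interval=8.0):
    """For each response after the first, report whether it earns reinforcement
    under a DRL schedule: the time since the previous response (in seconds)
    must be at least min_interval."""
    return [
        (later - earlier) >= min_interval
        for earlier, later in zip(response_times, response_times[1:])
    ]
```

For example, with responses at 0, 9, 12 and 20 s under DRL-8, the second and fourth responses are reinforced and the third is not, because it follows the previous response by only 3 s. A drug that raises response rates therefore tends to reduce the number of reinforcements earned.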
