Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions, and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal-difference learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
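Read as pseudocode, such a cascade can be pictured as a chain of decaying eligibility traces, each gating the next and each bridging a longer timescale. The sketch below is a minimal illustration of that idea only; all variable names, time constants, and update forms are invented for the example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_syn, dt = 100, 1e-3                        # number of synapses; time step (s)
tau_post, tau_dec, tau_rew = 0.02, 0.5, 2.0  # trace time constants (s), fast to slow

w = rng.normal(0.5, 0.1, n_syn)              # synaptic weights
e_post = np.zeros(n_syn)                     # stage 1: pre x post coincidence
e_dec = np.zeros(n_syn)                      # stage 2: gated by the behavioral decision
e_rew = np.zeros(n_syn)                      # stage 3: bridges the delay to reinforcement

def step(pre_spikes, post_spike, decision, reward, lr=1e-3):
    """pre_spikes: 0/1 array of shape (n_syn,); the other arguments are scalars."""
    global w, e_post, e_dec, e_rew
    e_post += dt * (-e_post / tau_post) + pre_spikes * post_spike
    e_dec += dt * (-e_dec / tau_dec) + decision * e_post
    e_rew += dt * (-e_rew / tau_rew) + dt * e_dec
    w += lr * reward * e_rew                 # reinforcement converts the slowest trace
```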

2.
Brown and Wagner [Brown, R.T., Wagner, A.R., 1964. Resistance to punishment and extinction following training with shock or nonreinforcement. J. Exp. Psychol. 68, 503-507] investigated rat behaviors with the following features: (1) rats were exposed to reward and punishment at the same time, (2) the environment changed and rats relearned, and (3) rats were stochastically exposed to reward and punishment. The results show that exposure to nonreinforcement produces resistance to the decremental effects on behavior after a stochastic reward schedule, and that exposure to both punishment and reinforcement produces resistance to the decremental effects on behavior after a stochastic punishment schedule. This paper aims to simulate these rat behaviors with a reinforcement learning algorithm that takes into account the appearance probabilities of reinforcement signals. Earlier reinforcement learning algorithms were unable to simulate the behavior of feature (3). We improve them by controlling learning parameters according to the acquisition probabilities of reinforcement signals. The proposed algorithm qualitatively reproduces the results of Brown and Wagner's animal experiment.
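One way to read the proposed modification is as a value update whose effective learning rate is scaled by a running estimate of how often each reinforcement signal appears, so behavior acquired under stochastic schedules resists sudden decremental effects. The sketch below is a hedged reconstruction of that idea, not the authors' algorithm; all names and constants are illustrative.

```python
q = 0.0                                 # learned value of the response
p_sig = {1: 1 / 3, -1: 1 / 3, 0: 1 / 3}  # appearance-probability estimates per signal

def update(outcome, alpha0=0.3, beta=0.05):
    """outcome: +1 reward, -1 punishment, 0 nonreinforcement."""
    global q
    # track how often each reinforcement signal appears
    for s in p_sig:
        p_sig[s] += beta * ((s == outcome) - p_sig[s])
    # scale the learning rate by the estimated probability of the observed
    # signal: rarely seen signals produce small updates, hence resistance
    q += alpha0 * p_sig[outcome] * (outcome - q)
```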

3.
Operant learning requires that reinforcement signals interact with action representations at a suitable neural interface. Much evidence suggests that this occurs when phasic dopamine, acting as a reinforcement prediction error, gates plasticity at cortico-striatal synapses, and thereby changes the future likelihood of selecting the action(s) coded by striatal neurons. But this hypothesis faces serious challenges. First, cortico-striatal plasticity is inexplicably complex, depending on spike timing, dopamine level, and dopamine receptor type. Second, there is a credit assignment problem: action selection signals occur long before the consequent dopamine reinforcement signal. Third, the two types of striatal output neuron have apparently opposite effects on action selection. Whether these factors rule out the interface hypothesis, and how they interact to produce reinforcement learning, is unknown. We present a computational framework that addresses these challenges. We first predict the expected activity changes over an operant task for both types of action-coding striatal neuron, and show they co-operate to promote action selection in learning and compete to promote action suppression in extinction. Separately, we derive a complete model of dopamine- and spike-timing-dependent cortico-striatal plasticity from in vitro data. We then show this model produces the predicted activity changes necessary for learning and extinction in an operant task, a remarkable convergence of a bottom-up data-driven plasticity model with the top-down behavioural requirements of learning theory. Moreover, we show the complex dependencies of cortico-striatal plasticity are not only sufficient but necessary for learning and extinction. Validating the model, we show it can account for behavioural data describing extinction, renewal, and reacquisition, and replicate in vitro experimental data on cortico-striatal plasticity. By bridging the levels between the single synapse and behaviour, our model shows how the striatum acts as the action-reinforcement interface.

4.
An explanatory model is developed to show how synaptic learning mechanisms modeled through spike-timing-dependent plasticity (STDP) can result in long-term adaptations consistent with reinforcement learning models. In particular, the reinforcement learning model known as temporal difference (TD) learning has been used to model neuronal behavior in the orbitofrontal cortex (OFC) and ventral tegmental area (VTA) of the macaque monkey during reinforcement learning. While some research has observed, empirically, a connection between STDP and TD, there has not been an explanatory model directly connecting the two. Through analysis of the learning dynamics that results from a general form of an STDP learning rule, the connection between STDP and TD is explained. We further demonstrate that an STDP learning rule drives the spike probability of a reward-predicting neuronal population to a stable equilibrium. The equilibrium solution has an increasing slope whose steepness predicts the probability of the reward, similar to the electrophysiological recordings of Montague and Berns [Neuron 36(2):265-284, 2002], which suggest a slope that predicts the value of the anticipated reward. This connection begins to shed light on more recent data gathered from the VTA and OFC that are not well modeled by TD. We suggest that STDP provides the underlying mechanism for explaining reinforcement learning and other higher-level perceptual and cognitive functions. This material is based upon work supported by the National Science Foundation under Grants No. IOB-0445648 (PDR) and DMS-0408334 (GL) and by a Career Support grant from Portland State University (GL).
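For readers unfamiliar with the "general form" of the rule being analyzed, a conventional additive STDP kernel looks like the following; the parameter values are typical textbook choices, not the paper's.

```python
import numpy as np

def stdp(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=0.02, tau_minus=0.02):
    """Weight change for a spike lag delta_t = t_post - t_pre, in seconds."""
    if delta_t > 0:
        # presynaptic spike precedes postsynaptic: potentiation, decaying with lag
        return a_plus * np.exp(-delta_t / tau_plus)
    # postsynaptic spike precedes presynaptic: depression
    return -a_minus * np.exp(delta_t / tau_minus)
```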

5.
Recurrent neural networks (RNNs) are widely used in computational neuroscience and machine learning applications. In an RNN, each neuron computes its output as a nonlinear function of its integrated input. While the importance of RNNs, especially as models of brain processing, is undisputed, it is also widely acknowledged that the computations in standard RNN models may be an over-simplification of what real neuronal networks compute. Here, we suggest that the RNN approach may be made computationally more powerful by its fusion with Bayesian inference techniques for nonlinear dynamical systems. In this scheme, we use an RNN as a generative model of dynamic input caused by the environment, e.g. of speech or kinematics. Given this generative RNN model, we derive Bayesian update equations that can decode its output. Critically, these updates define a 'recognizing RNN' (rRNN), in which neurons compute and exchange prediction and prediction error messages. The rRNN has several desirable features that a conventional RNN does not have, e.g. fast decoding of dynamic stimuli and robustness to initial conditions and noise. Furthermore, it implements a predictive coding scheme for dynamic inputs. We suggest that the Bayesian inversion of RNNs may be useful both as a model of brain function and as a machine learning tool. We illustrate the use of the rRNN by an application to the online decoding (i.e. recognition) of human kinematics.
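The flavour of such a recognizing RNN can be conveyed with a filter-like update in which the network alternates prediction and precision-weighted correction. The sketch below uses a fixed gain in place of the derived Bayesian precision terms, so it illustrates the prediction/prediction-error message-passing structure rather than the paper's exact update equations; all names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden, n_obs = 8, 3
W = rng.normal(0, 0.5, (n_hidden, n_hidden))  # recurrent weights of the generative RNN
C = rng.normal(0, 0.5, (n_obs, n_hidden))     # hidden-to-observation mapping
K = 0.1 * C.T                                 # fixed gain standing in for posterior precision

x = np.zeros(n_hidden)                        # current hidden-state estimate

def recognize(y):
    """One decoding step: predict, compare with the observation, correct."""
    global x
    x_pred = np.tanh(W @ x)   # prediction message from the generative model
    err = y - C @ x_pred      # prediction error message
    x = x_pred + K @ err      # precision-weighted update of the hidden state
    return x
```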

6.

Internet of Things (IoT) has introduced new applications and environments. The Smart Home provides new ways of communication and service consumption. In addition, Artificial Intelligence (AI) and deep learning have improved different services and tasks by automating them. In this field, reinforcement learning (RL) provides an unsupervised way to learn from the environment. In this paper, a new intelligent system based on RL and deep learning is proposed for Smart Home environments to guarantee good levels of QoE, focused on multimedia services. This system aims to reduce the impact on user experience when the classifying system achieves low accuracy. The experiments performed show that the proposed deep learning model achieves better accuracy than the KNN algorithm and that the RL system increases the user's QoE by up to 3.8 on a scale of 10.


7.
The study was carried out in C57BL/6J and DBA/2J mice for a comparative analysis of two interference processes: latent inhibition and extinction of passive avoidance, produced with an unconditioned aversive stimulus of different intensities (0.5 and 0.25 mA). With strong training to the new stimulus, impairment of extinction was detected only in DBA/2J mice. Reduction in the strength of punishment during training was accompanied by an acceleration of extinction in C57BL/6J mice and by its appearance in DBA/2J mice. Passive avoidance learning under strong and weak reinforcement was the same for both strains. Interstrain differences were also found in the analysis of latent inhibition. With both strong and weak training to the conditioned stimulus, when novelty was lost through eight-fold pre-exposure to the experimental chamber, latent inhibition was disrupted in DBA/2J mice, in contrast to C57BL/6J. In addition, DBA/2J mice showed impairment of extinction with weak training to the non-relevant stimulus.

8.
Kahnt T, Grueschow M, Speck O, Haynes JD. Neuron 2011, 70(3):549-559
The dominant view that perceptual learning is accompanied by changes in early sensory representations has recently been challenged. Here we tested the idea that perceptual learning can be accounted for by reinforcement learning involving changes in higher decision-making areas. We trained subjects on an orientation discrimination task involving feedback over 4 days, acquiring fMRI data on the first and last day. Behavioral improvements were well explained by a reinforcement learning model in which learning leads to enhanced readout of sensory information, thereby establishing noise-robust representations of decision variables. We find stimulus orientation encoded in early visual and higher cortical regions such as lateral parietal cortex and anterior cingulate cortex (ACC). However, only activity patterns in the ACC tracked changes in decision variables during learning. These results provide strong evidence for perceptual learning-related changes in higher order areas and suggest that perceptual and reward learning are based on a common neurobiological mechanism.

9.
There is controversy about the mechanisms involved in interspecific communicative behaviour in domestic dogs. The main question is whether this behaviour is a result of instrumental learning or whether higher cognitive skills are required. The present investigations were undertaken to study the effect of learning processes on gaze towards the human face as a communicative response. To this end, in Study 1, the gaze response was subjected to three types of reinforcement schedules: differential reinforcement, reinforcer omission, and extinction in a situation of “asking for food”. Results showed a significant increase in gaze duration in the differential reinforcement phase and a significant decrease in both the omission and extinction phases. These changes were quite rapid, occurring after only three training trials in each phase. Furthermore, extinction resulted in behavioural changes such as an increase in the distance from the experimenter, turning the back, and lying down. This is the first systematic evaluation of the behavioural changes caused by reward withdrawal (frustration) in dogs. In Study 2, the gaze response was studied in a situation where dogs walked along with their owners/trainers. These results show that learning plays an important role in this communicative response. The possible implications of these results for service dogs are discussed.

10.
In the metaphor of behavioral momentum, reinforcement is assumed to strengthen discriminated operant behavior in the sense of increasing its resistance to disruption, and extinction is viewed as disruption by contingency termination and reinforcer omission. In multiple schedules of intermittent reinforcement, resistance to extinction is an increasing function of reinforcer rate, consistent with a model based on the momentum metaphor. The partial-reinforcement extinction effect, which opposes the effects of reinforcer rate, can be explained by the large disruptive effect of terminating continuous reinforcement despite its strengthening effect during training. Inclusion of a term for the context of reinforcement during training allows the model to account for a wide range of multiple-schedule extinction data and makes contact with other formulations. The relation between resistance to extinction and reinforcer rate on single schedules of intermittent reinforcement is exactly opposite to that for multiple schedules over the same range of reinforcer rates; however, the momentum model can give an account of resistance to extinction in single as well as multiple schedules. An alternative analysis based on the number of reinforcers omitted to an extinction criterion supports the conclusion that response strength is an increasing function of reinforcer rate during training.
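As a point of reference, the momentum model is commonly written in the behavioral-momentum literature (this is the standard formulation, not an equation quoted from this abstract) as:

```latex
% B_x: response rate under a disruptor of magnitude x
% B_o: baseline response rate; r: baseline reinforcer rate
% b: sensitivity of resistance to reinforcer rate
\log\frac{B_x}{B_o} = \frac{-x}{r^{b}}
```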

11.
Eight pigeons responded on a multiple variable-interval (VI) schedule in which a constant component always delivered 40 rft/h, and an alternated component was either rich (200 rft/h) or lean (6.67 rft/h) in different conditions. Four tests of resistance to change were conducted in each condition: prefeeding, full extinction, constant-component-only extinction, and response-independent food. Resistance to both prefeeding and full extinction in the constant component varied inversely with the reinforcement rate in the alternated component, but resistance to response-independent food did not. The extinction and response-independent food results were consistent with the behavioral momentum model of [J. Exp. Psychol.: Anim. Behav. Proc. 25 (1999) 256]. Maintaining reinforcement in the alternated component increased resistance to extinction in the constant component, as predicted by the behavioral momentum model but not by accounts of multiple-schedule performance based on the equation of [J. Exp. Anal. Behav. 13 (1970) 243]. Overall, the momentum model gave a good account of the results with the exception of the prefeeding data. Possible ways to reconcile the prefeeding results with behavioral momentum theory are considered.

12.
Statistical decision theory is discussed as a general framework for analysing how animals should learn. Attention is focused on optimal foraging behaviour in stochastic environments. We emphasise the distinction between the mathematical procedure that can be used to find optimal solutions and the mechanism an animal might use to implement such solutions. The mechanisms might be specific to a restricted class of problems and produce suboptimal behaviour when faced with problems outside this class. We illustrate this point by an example based on what is known in the literature on animal learning as the partial reinforcement effect.

13.
Spike-timing-dependent plasticity (STDP) likely plays an important role in forming and changing connectivity patterns between neurons in our brain. In a unidirectional synaptic connection between two neurons, it uses the causal relation between the spiking activity of a presynaptic input neuron and a postsynaptic output neuron to change the strength of this connection. While the nature of STDP benefits unsupervised learning of correlated inputs, any incorporation of value into the learning process needs some form of reinforcement. Chemical neuromodulators such as dopamine or acetylcholine are thought to signal changes between external reward and internal expectation to many brain regions, including the basal ganglia. This effect is often modelled by directly including the level of dopamine as a third factor in the STDP rule. While this gives the benefit of direct control over synaptic modification, it does not account for observed instantaneous effects on neuronal activity upon application of dopamine agonists. Specifically, an instant facilitation of neuronal excitability in the striatum cannot be explained by the merely indirect effect that dopamine-modulated STDP has on a neuron's firing pattern. We therefore propose a model of synaptic transmission in which the level of neuromodulator does not directly influence synaptic plasticity, but instead alters the relative firing causality between pre- and postsynaptic neurons. Through this direct effect on postsynaptic activity, our rule allows indirect modulation of the learning outcome even with unmodulated, two-factor STDP. However, it also does not prohibit joint operation together with three-factor STDP rules.
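The division of labour the abstract describes can be sketched as two separate functions: the neuromodulator scales postsynaptic excitability (and thereby firing causality), while a plain two-factor STDP rule contains no neuromodulator term at all. This is a hedged reconstruction of the idea with hypothetical names and gains, not the paper's transmission model.

```python
import numpy as np

def postsynaptic_drive(syn_input, neuromod, gain0=1.0, k=0.5):
    """Neuromodulator level instantaneously scales excitability, so it shifts
    when the postsynaptic neuron fires relative to its inputs."""
    return (gain0 + k * neuromod) * syn_input

def stdp(delta_t, a=0.01, tau=0.02):
    """Plain two-factor STDP: note the absence of any neuromodulator term."""
    return np.sign(delta_t) * a * np.exp(-abs(delta_t) / tau)
```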

14.
Extinction performance is often used to assess underlying psychological processes without the interference of reinforcement. For example, in the extinction/reinstatement paradigm, motivation to seek drug is assessed by measuring responding elicited by drug-associated cues without drug reinforcement. However, extinction performance is governed by several psychological processes that involve motivation, memory, learning, and motoric functions. These processes are confounded when overall response rate is used to measure performance. Based on evidence that operant responding occurs in bouts, this paper proposes an analytic procedure that separates extinction performance into several behavioral components: (1-3) the baseline bout initiation rate, within-bout response rate, and bout length at the onset of extinction; (4-6) their rates of decay during extinction; (7) the time between extinction onset and the decline of responding; (8) the asymptotic response rate at the end of extinction; (9) the refractory period after each response. Data that illustrate the goodness of fit of this analytic model are presented. This paper also describes procedures to isolate behavioral components contributing to extinction performance and make inferences about experimental effects on these components. This microscopic behavioral analysis allows the mapping of different psychological processes to distinct behavioral components implicated in extinction performance, which may further our understanding of the psychological effects of neurobiological treatments.
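The bout structure underlying components (1-3) can be illustrated with a toy generator: inter-response times come from a mixture of a fast within-bout process and a slow bout-initiation process. The parameter names and values are hypothetical, and the paper's full model additionally lets these quantities decay across extinction and includes a refractory period.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_irts(n, p_within=0.7, within_rate=2.0, initiation_rate=0.1):
    """Draw n inter-response times (s) from a two-mode bout process."""
    in_bout = rng.random(n) < p_within               # does a response continue a bout?
    return np.where(in_bout,
                    rng.exponential(1.0 / within_rate, n),      # fast, within-bout
                    rng.exponential(1.0 / initiation_rate, n))  # slow, bout-initiation
```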

15.
Partial reinforcement (PR) effects on animal locomotor behavior were studied in the golden hamster, using food-hoarding activity as a reinforcer. The first experiment demonstrated that hoarding reinforces a running response towards the goal section of a straight-alley runway, and that no such learning occurred when sated hamsters were not allowed to hoard food. However, a second experiment using various partial reinforcement schedules and a continuous reinforcement schedule did not give any evidence for the existence of a partial reinforcement acquisition effect (PRAE). The third experiment confirmed these results with an extended training procedure and showed a slight partial reinforcement extinction effect (PREE) mainly in the first sessions of the extinction phase.

16.
D. Marcelli. Andrologie 1997, 7(2):187-198
The problematics of adolescence bring into play, through what we have termed the body circle, the family circle, and the social circle, a series of paradoxes and conflicts/oppositions, where each time a conquest is possible but where there is also potential risk. This permanent conflictuality is characteristic of adolescence. Confronted with it, the capacity to assume psychic conflict is therefore an essential factor in the development of the adolescent. From this point of view, cathexis of the internal psychic space is an essential element of an adolescent's capacity to carry out this psychic working-through. It is this cathexis that allows a reinforcement of the internal working-through and a better tolerance of expectation and psychic conflictuality. On the other hand, anything that works towards a diminution of this psychic functioning (projection, acting out, etc.) reduces the adolescent's capacity to adapt both to his internal psychic world and to his environment.

17.
Studies of human associative learning have often used causal/predictive learning preparations in which participants decide whether or not a first event is effective in causing or predicting a second event (i.e., an outcome). Those preparations have proved successful in replicating many Pavlovian phenomena. In the present paper we tested a novel associative learning preparation in which visually presented letters were paired with a visual outcome. Reaction times (RTs) were recorded to assess associative strength between specific cues and the outcome. Combining two different dependent variables (RTs and type of response given), we propose a rule for evaluating the associative strength between two events. The preparation and the data transformation rule were successful in producing several Pavlovian phenomena including excitatory acquisition, extinction, overshadowing, and latent inhibition, as well as established summation effects. Advantages and limitations of this new preparation based on the use of RT are discussed.

18.
Extinction describes the process of attenuating behavioral responses to neutral stimuli when they no longer provide the reinforcement that has been maintaining the behavior. There is a close correspondence between conditioned fear and human anxiety, and therefore studies of extinction learning might provide insight into the biological nature of anxiety-related disorders such as post-traumatic stress disorder, and might help to develop strategies to treat them. Preclinical research aims to aid extinction learning and to induce targeted plasticity in extinction circuits to consolidate the newly formed memory. Vagus nerve stimulation (VNS) is a powerful approach that provides tight temporal and circuit-specific release of neurotransmitters, resulting in modulation of neuronal networks engaged in an ongoing task. VNS enhances memory consolidation in both rats and humans, and pairing VNS with exposure to conditioned cues enhances the consolidation of extinction learning in rats. Here, we provide a detailed protocol for the preparation of custom-made parts and the surgical procedures required for VNS in rats. Using this protocol we show how VNS can facilitate the extinction of conditioned fear responses in an auditory fear conditioning task. In addition, we provide evidence that VNS modulates synaptic plasticity in the pathway between the infralimbic (IL) medial prefrontal cortex and the basolateral complex of the amygdala (BLA), which is involved in the expression and modulation of extinction memory.

19.
In this paper, we propose a novel approach to clustering noisy and complex data sets based on the eXtended Classifier System (XCS). The proposed approach, termed XCSc, has three main processes: (a) a learning process to evolve the rule population, (b) a rule compacting process to remove redundant rules after the learning process, and (c) a rule merging process to deal with the overlapping rules that commonly occur between clusters. In the first process, we have modified the clustering mechanisms of the currently available XCS and developed a new accelerated learning method to improve the quality of the evolved rule population. In the second process, an effective rule compacting algorithm is utilized. The rule merging process is based on our newly proposed agglomerative hierarchical rule merging algorithm, which comprises the following steps: (i) all the generated rules are modeled by a graph, with each rule representing a node; (ii) the vertices in the graph are merged to form a number of sub-graphs (i.e. rule clusters) under some pre-defined criteria, which generates the final rule set to represent the clusters; (iii) each data point is re-checked and assigned to the cluster it belongs to, guided by the final rule set. In our experiments, we compared the proposed XCSc with CHAMELEON, a benchmark algorithm well known for its excellent performance, on a number of challenging data sets. The results show that the proposed approach outperforms CHAMELEON in success rate and also demonstrates good stability.
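Step (ii) of the merging algorithm can be pictured as connected-component formation over the rule graph. The sketch below uses interval-box rules and simple geometric overlap as the merge criterion; both are stand-ins for XCSc's actual rule representation and pre-defined criteria.

```python
import itertools

def overlap(r1, r2):
    """Rules as axis-aligned interval boxes: [(low, high), ...] per dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(r1, r2))

def merge_rules(rules):
    """Union-find over the rule graph: overlapping rules join one sub-graph."""
    parent = list(range(len(rules)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(rules)), 2):
        if overlap(rules[i], rules[j]):    # pre-defined merge criterion
            parent[find(i)] = find(j)

    clusters = {}                          # group rule indices by component root
    for i in range(len(rules)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```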

20.
We consider the efficient initialization of the structure and parameters of generalized Gaussian radial basis function (RBF) networks using fuzzy decision trees generated by fuzzy ID3-like induction algorithms. The initialization scheme is based on the proposed functional equivalence property of fuzzy decision trees and generalized Gaussian RBF networks. The resulting RBF network is compact, easy to induce, comprehensible, and achieves acceptable classification accuracy with a stochastic gradient descent learning algorithm.
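The equivalence suggests a straightforward reading: each root-to-leaf path of the fuzzy tree supplies one RBF unit, with centers and widths taken from the fuzzy membership functions along the path. The leaf encoding below (one (center, width) pair per input dimension, Gaussian memberships) is a simplifying assumption for illustration.

```python
import numpy as np

def rbf_units_from_leaves(leaves):
    """leaves: one list per root-to-leaf path; each entry is a (center, width)
    pair from the fuzzy membership function tested on that input dimension."""
    centers = np.array([[c for c, _ in leaf] for leaf in leaves])
    widths = np.array([[s for _, s in leaf] for leaf in leaves])

    def activations(x):
        # one generalized Gaussian RBF unit per decision-tree leaf
        return np.exp(-np.sum(((x - centers) / widths) ** 2, axis=1))

    return activations

# e.g. two leaves over two input dimensions:
act = rbf_units_from_leaves([[(0.2, 0.1), (0.8, 0.2)],
                             [(0.7, 0.1), (0.3, 0.2)]])
print(act(np.array([0.2, 0.8])))   # the first unit responds most strongly
```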
