Similar Literature
20 similar documents retrieved (search time: 15 ms)
1.
The effect of the immunostimulator complete Freund's adjuvant on nociception and learning in white rats was studied using food-reinforcement and electric-shock-avoidance techniques. The adjuvant, at doses used for active immunization, itself caused marked changes in animal behaviour. Adjuvant injections significantly increased the animals' learning ability under both negative and positive reinforcement as compared with controls. A change in pain sensitivity was observed only at the 0.2 ml dose. This nonspecific action of the adjuvant should be taken into account in research that uses the active immunization method.

2.
The paper examines the effect of anti-cerebral antibodies on rat learning in a T-maze with food reinforcement. Rats of the August strain were immunized with the neurospecific proteins 10-40-4 (from human and rat brain), 14-3-2, and cerebral tubulin. Rats immunized with bovine serum albumin (BSA) served as a control for the influence of antibodies to a non-cerebral protein, and rats injected with Freund's adjuvant and saline served as a control for the adjuvant's action. During the period of antibody formation (from the 7th day after the last injection), the rats were trained in the T-maze for 4 days. Acquisition of conditioned reactions was inhibited in rats immunized with the rat-brain neurospecific protein 10-40-4, with 14-3-2, and with cerebral tubulin. Antibodies to the rat-brain protein 10-40-4 produced the greatest effect on learning: they significantly reduced conditioning and increased the number of errors. Antibodies to the human-brain neurospecific protein 10-40-4 and to BSA, i.e. to proteins absent from rat nervous tissue, did not affect learning. These results indirectly confirm that the blood-brain barrier is permeable to antibodies in immunized animals.

3.
Brown and Wagner [Brown, R.T., Wagner, A.R., 1964. Resistance to punishment and extinction following training with shock or nonreinforcement. J. Exp. Psychol. 68, 503-507] investigated rat behavior under the following conditions: (1) rats were exposed to reward and punishment at the same time, (2) the environment changed and the rats relearned, and (3) rats were exposed to reward and punishment stochastically. Their results show that exposure to nonreinforcement produces resistance to the decremental effects on behavior of a subsequent stochastic reward schedule, and that exposure to both punishment and reinforcement produces resistance to the decremental effects on behavior of a subsequent stochastic punishment schedule. This paper aims to simulate these rat behaviors with a reinforcement learning algorithm that takes the appearance probabilities of reinforcement signals into account. Earlier reinforcement learning algorithms were unable to simulate the behavior of feature (3). We improve on them by controlling the learning parameters according to the acquisition probabilities of the reinforcement signals. The proposed algorithm qualitatively reproduces the results of Brown and Wagner's animal experiment.
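The parameter-control idea can be sketched in a minimal value-extinction setting. This is not the authors' algorithm: the scaling rule, the constants, and the single-value framing are all illustrative assumptions. Scaling the extinction-phase learning rate by the probability with which reinforcement was acquired during training qualitatively reproduces resistance to extinction after a stochastic schedule.

```python
def extinction_trials(p_reward, alpha0=0.2, threshold=0.05):
    """Trials needed to extinguish a learned value estimate when the
    per-trial learning rate is scaled by the probability with which
    the reinforcement signal was acquired during training.
    The scaling rule and all constants are illustrative, not the paper's."""
    q = p_reward                 # asymptotic value after training on this schedule
    alpha = alpha0 * p_reward    # rarer reinforcement -> weaker updates on omission
    trials = 0
    while q > threshold:
        q += alpha * (0.0 - q)   # extinction phase: reward is never delivered
        trials += 1
    return trials
```

With these constants, training on a 50% stochastic schedule extinguishes more slowly than training on continuous reinforcement, qualitatively matching the resistance-to-extinction result described above.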

4.
The effect of inhibitors of serotonin and norepinephrine synthesis in the brain on learning was investigated in rats under emotionally different reinforcement. Parachlorophenylalanine (320 mg/kg) inhibited learning with food reinforcement but facilitated learning with pain reinforcement. Disulfiram (100 mg/kg) considerably inhibited learning with pain reinforcement but did not influence learning with food reinforcement. Alpha-methyl-m-tyrosine inhibited both forms of learning. These new findings are in line with our previous data on the mediating role of the brain's monoaminergic systems between emotions and memory.

5.
Ants Myrmica rubra were shown to be capable of multiple reversals of a habit elaborated in a symmetrical multi-alternative maze under the motivation of brood care (transport of brood of their own species). Each reversal consisted in switching the location of reinforcement between the left and right goal spots. The ants were able to carry out a series of eight reversals within one to two days, and performance on the later reversals improved relative to the first ones. Peculiarities of learning and reversal were found between two groups of animals differing in training conditions: reinforcement at both goal spots or at only one of them. The results are considered evidence of the high behavioural plasticity of this ant species.

6.
The present paper discusses an optimal learning control method using reinforcement learning for biological systems with a redundant actuator. It is difficult to apply reinforcement learning to biological control systems because of the redundancy of the muscle-activation space. We solve this problem as follows. First, we divide the control input space into two subspaces according to a priority order of learning and restrict the search noise for reinforcement learning to the first-priority subspace. The constraint is then relaxed as learning progresses, extending the search space into the second-priority subspace. The higher-priority subspace is designed so that the impedance of the arm can be kept high. A smooth reaching motion is obtained through reinforcement learning without any prior knowledge of the arm's dynamics.
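The staged restriction of the search noise can be sketched as follows. The dimension split, the linear release schedule, and all sizes are assumptions for illustration, not the paper's design:

```python
import numpy as np

def search_noise(episode, dim_first=2, dim_total=6,
                 release_at=500, sigma=0.1, rng=None):
    """Exploration noise for reinforcement learning over a redundant
    actuator: early episodes perturb only the first-priority subspace
    (the first `dim_first` coordinates); after `release_at` episodes
    the noise ramps linearly into the remaining, second-priority
    coordinates. The linear ramp and all constants are illustrative."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = sigma * rng.standard_normal(dim_total)
    if episode < release_at:
        noise[dim_first:] = 0.0                      # constrained search
    else:
        ramp = min(1.0, (episode - release_at) / release_at)
        noise[dim_first:] *= ramp                    # gradually relax
    return noise
```

Early in training only the high-impedance subspace is explored; once the schedule releases the constraint, the redundant coordinates join the search.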

7.
Accumulating evidence shows that the neural network of the cerebral cortex and the basal ganglia is critically involved in reinforcement learning. Recent studies found functional heterogeneity within the cortico-basal ganglia circuit, especially in its ventromedial to dorsolateral axis. Here we review computational issues in reinforcement learning and propose a working hypothesis on how multiple reinforcement learning algorithms are implemented in the cortico-basal ganglia circuit using different representations of states, values, and actions.  相似文献   

8.
In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth, who have less experience to draw from than adults. Some studies suggest probabilistic learning is less efficient in youths than in adults, while others suggest it is more efficient in mid-adolescence. Here we used a probabilistic reinforcement learning task to test how youths aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants' performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase with age was driven by (1) an increase in learning rate (i.e. a decrease in integration time scale) and (2) a decrease in noisy/exploratory choices. In mid-adolescence (age 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
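The class of model the study favors — reward updates dominate while negative outcomes barely move the values — can be sketched with an asymmetric-learning-rate agent on a two-armed probabilistic task. The task, the softmax choice rule, and every parameter value here are illustrative, not the fitted models themselves:

```python
import math
import random

def run_agent(alpha_pos, alpha_neg, beta, p_correct=0.8,
              trials=300, seed=1):
    """Softmax agent with separate learning rates for positive and
    negative prediction errors, in the spirit of the models the study
    fits. Returns the fraction of choices of the better arm (arm 1)."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    picks = 0
    for _ in range(trials):
        # softmax choice with inverse temperature beta
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        a = 1 if rng.random() < p1 else 0
        # arm 1 is "correct": rewarded with probability p_correct
        p_r = p_correct if a == 1 else 1.0 - p_correct
        r = 1.0 if rng.random() < p_r else 0.0
        delta = r - q[a]
        # asymmetric update: negative outcomes have little impact
        q[a] += (alpha_pos if delta >= 0 else alpha_neg) * delta
        picks += a
    return picks / trials
```

Even with a near-zero negative learning rate, the agent performs well above chance on stable contingencies, which is the qualitative signature the abstract reports across ages.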

9.
Halici, U. Bio Systems 2001, 63(1-3): 21-34
The reinforcement learning scheme proposed in Halici (J. Biosystems 40 (1997) 83) for the random neural network (RNN) (Neural Computation 1 (1989) 502) is based on reward and performs well in stationary environments. When the environment is not stationary, however, it gets stuck on the previously learned action and extinction is not possible. To overcome this problem, the reinforcement scheme was extended in Halici (Eur. J. Oper. Res. 126 (2000) 288) with a new weight update rule (E-rule) that takes into consideration the internal expectation of reinforcement. Although the E-rule was proposed for the RNN, it can be used for training learning automata or other intelligent systems based on reinforcement learning. This paper examines the behavior of the learning scheme with internal expectation in environments where reinforcement is obtained only after a sequence of cascaded decisions. Simulation results show that the RNN learns well, and that extinction is possible, even in cases with several decision steps and hundreds of possible decision paths.

10.
Methyl beta-carboline-3-carboxylate (beta-CCM) and flumazenil (Ro15-1788) are known to be, respectively, an inverse agonist and an antagonist of the central benzodiazepine receptor. Surprisingly, these two drugs have shown a similar enhancing effect in a negatively reinforced multiple-trial brightness discrimination task in mice. To evaluate the role of anxiety in this task, the actions of the two drugs were therefore compared in the same learning task with either a positive or a negative reinforcement. Mice were trained in sessions of ten trials per day for six consecutive days. The sessions on the first three days took place after administration of beta-CCM (0.3 mg/kg), flumazenil (15 mg/kg), or their vehicles. A negative reinforcement (electric foot-shock) was used in the first experiment and a positive one (food reward) in the second. Results showed that, whatever the reinforcement, both drugs enhanced learning in the brightness discrimination task. The hypothesis is that flumazenil could have an inverse-agonist profile in learning tasks. The question remains whether flumazenil's enhancement of learning results from increased arousal and/or anxiogenic factors, or from a negative modulatory influence of endogenous diazepam-like ligands for benzodiazepine receptors.

11.
Evidence has been accumulating to support the process of reinforcement as a potential mechanism in speciation. In many species, mate choice decisions are influenced by cultural factors, including learned mating preferences (sexual imprinting) or learned mate attraction signals (e.g., bird song). It has been postulated that learning can have a strong impact on the likelihood of speciation and perhaps on the process of reinforcement, but no models have explicitly considered learning in a reinforcement context. We review the evidence that suggests that learning may be involved in speciation and reinforcement, and present a model of reinforcement via learned preferences. We show that not only can reinforcement occur when preferences are learned by imprinting, but that such preferences can maintain species differences easily in comparison with both autosomal and sex-linked genetically inherited preferences. We highlight the need for more explicit study of the connection between the behavioral process of learning and the evolutionary process of reinforcement in natural systems.

12.
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate-and-fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
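A drastically simplified, single-synapse version of such a cascade can be sketched as follows. The time constants, the gating scheme, and the scalar framing are all illustrative assumptions; the paper's model operates over populations of spiking neurons:

```python
def cascade_update(w, pre, post, decision, reward, traces,
                   tau=(0.9, 0.95), lr=0.1):
    """One step of a much-simplified synaptic memory-trace cascade:
    stage 1 correlates pre- and postsynaptic activity, stage 2 tags
    that correlation with the behavioral decision, and stage 3 turns
    the tag into a weight change when (possibly delayed) reinforcement
    arrives. All constants are illustrative."""
    c1, c2 = traces
    c1 = tau[0] * c1 + pre * post                   # Hebbian coincidence trace
    c2 = tau[1] * c2 + (c1 if decision else 0.0)    # decision-gated trace
    if reward is not None:                          # reinforcement may arrive late
        w += lr * reward * c2                       # convert tag into plasticity
        c2 = 0.0                                    # consume the tag
    return w, (c1, c2)
```

Because the decision-gated trace decays slowly, a reward delivered several steps after the pertinent coincidence still credits the right synapse, which is the temporal credit-assignment property the abstract emphasizes.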

13.
The ability of pigeons (Colomba livia, L.) and crows (Corvus corone cornix, L.) was studied to realize urgent numerousness judgement of reinforcement consisting of discrete elements (wheat grains and meal worm larvae, respectively). In the process of preliminary training the birds mastered the information about the conformity of the feeder colour with the definite number (1-9 for pigeons and 5-12 for crows) of reinforcement units at isolated presentation of feeders. In test at presentation of pairs formed from these feeders, pigeons and crows chose the stimulus connected with a greater quantity of reinforcement. In the range of 1-8 units the precision of choice in pigeons depended on absolute and relative differences between comparing values. In crows in the range of 6-12 this dependence was not revealed. The ability to solve the given test is considered as one of manifestations of elementary reasoning.  相似文献   

14.
An explanatory model is developed to show how synaptic learning mechanisms modeled through spike-timing dependent plasticity (STDP) can result in long-term adaptations consistent with reinforcement learning models. In particular, the reinforcement learning model known as temporal difference (TD) learning has been used to model neuronal behavior in the orbitofrontal cortex (OFC) and ventral tegmental area (VTA) of macaque monkeys during reinforcement learning. While some research has observed, empirically, a connection between STDP and TD, there has been no explanatory model directly connecting the two. Through analysis of the learning dynamics that result from a general form of an STDP learning rule, the connection between STDP and TD is explained. We further demonstrate that an STDP learning rule drives the spike probability of a reward-predicting neuronal population to a stable equilibrium. The equilibrium solution has an increasing slope whose steepness predicts the probability of the reward, similar to the electrophysiological recordings of Montague and Berns [Neuron 36(2):265-284, 2002], in which the slope predicts the value of the anticipated reward. This connection begins to shed light on more recent data gathered from VTA and OFC which are not well modeled by TD. We suggest that STDP provides the underlying mechanism for explaining reinforcement learning and other higher-level perceptual and cognitive functions. This material is based upon work supported by the National Science Foundation under Grants No. IOB-0445648 (PDR) and DMS-0408334 (GL) and by a Career Support grant from Portland State University (GL).
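For reference, the TD model that the paper relates to STDP can be written in its textbook tabular form. This is the standard TD(0) algorithm, not the paper's spiking implementation:

```python
def td_episode(v, states, rewards, alpha=0.1, gamma=0.9):
    """One episode of tabular TD(0) learning, the standard form of the
    temporal-difference model discussed above. `states` lists the
    visited states (one more entry than `rewards`); `v` maps each
    state to its current value estimate."""
    v = dict(v)
    for t in range(len(rewards)):
        # prediction error: reward plus discounted next-state value
        delta = rewards[t] + gamma * v[states[t + 1]] - v[states[t]]
        v[states[t]] += alpha * delta
    return v
```

Repeating a cue-then-reward episode shifts value backward onto the reward-predicting cue, the dopamine-like prediction-error behavior that the VTA recordings are usually modeled with.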

15.
We investigate the problem of learning with incomplete information, as exemplified by learning with delayed reinforcement. We study a two-phase learning scenario in which a phase of Hebbian associative learning, based on momentary internal representations, is supplemented by an 'unlearning' phase that depends on a graded reinforcement signal. The reinforcement signal quantifies the success rate globally over a number of learning steps in phase one, and unlearning is indiscriminate with respect to the associations learnt in that phase. Learning according to this model is studied via simulations, and analytically within a student-teacher scenario, for both single-layer networks and a committee machine. Success and speed of learning depend on the ratio λ of the learning rates used for the associative Hebbian phase and for the unlearning correction in response to the reinforcement signal, respectively. Asymptotically perfect generalization is possible only if this ratio exceeds a critical value λc, in which case the generalization error exhibits a power-law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter λ. We find these features to be robust against a wide spectrum of modifications of microscopic modelling details. Two illustrative applications are also provided: a robot learning to navigate a field containing obstacles, and the problem of identifying a specific component in a collection of stimuli.

16.
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations which are noisy, or occurred in the past, even though these are inevitable and constraining features of learning in real environments. This class of problem is formally known as partially observable reinforcement learning (PORL) problems. It provides a generalization of reinforcement learning to partially observable domains. In addition, observations in the real world tend to be rich and high-dimensional. In this work, we use a spiking neural network model to approximate the free energy of a restricted Boltzmann machine and apply it to the solution of PORL problems with high-dimensional observations. Our spiking network model solves maze tasks with perceptually ambiguous high-dimensional observations without knowledge of the true environment. An extended model with working memory also solves history-dependent tasks. The way spiking neural networks handle PORL problems may provide a glimpse into the underlying laws of neural information processing which can only be discovered through such a top-down approach.

17.
Scientists and equestrians continually seek a clearer understanding of equine learning behaviour and its implications for training. Behavioural and learning processes in the horse are likely to influence not only equine athletic success but also the usefulness of the horse as a domesticated species. Yet, given the status and commercial importance of the animal, equine learning behaviour has received only limited investigation. Indeed, most experimental studies on equine cognitive function to date have addressed behaviour, learning, and conceptualization at a fairly basic cognitive level compared to studies in other species. It is, however, likely that the horses with the greatest ability to learn and to form or understand concepts are those best equipped to succeed in the human-horse relationship and the contemporary training environment. Within equitation generally, interpretation of behavioural processes and training of desired responses in the horse are normally attempted using negative reinforcement strategies, whereas experimental designs to induce and/or measure equine learning rely almost exclusively on primary positive reinforcement regimes. Employing two such different approaches may complicate interpretation and lead to difficulties in identifying problematic or undesirable behaviours in the horse. The visual system provides the horse with direct access to the immediate environmental stimuli that affect behaviour, but vision in the horse is as yet not fully investigated or understood. Further investigation of the equine visual system will benefit our understanding of equine perception, cognitive function, and the subsequent link with learning and training. More detailed comparative investigations of feral or free-ranging and domestic horses may provide useful evidence of attention, stress, and motivational issues affecting behavioural and learning processes in the horse.
The challenge for scientists is, as always, to design and commission experiments that will investigate and provide insight into these processes in a manner that withstands scientific scrutiny.

18.
While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists in transiently biasing the learner’s action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator’s value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator’s actions directly affect the learner’s value function. We tested these three hypotheses in 2 experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement learning task. We show through model comparison and model simulation that VS provides the best explanation of learner’s behavior. Results replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators’ choices and found that learners were able to adapt their imitation rate, so that only skilled demonstrators were imitated. We proposed and tested an efficient meta-learning process to account for this effect, where imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights and perspectives on the computational mechanisms underlying adaptive imitation in human reinforcement learning.

This study investigates imitation from a computational perspective; three experiments show that, in the context of reinforcement learning, imitation operates via a durable modification of the learner's values, shedding new light on how imitation is computationally implemented and shapes learning and decision-making.
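The winning value-shaping hypothesis can be contrasted with decision biasing in a minimal sketch. The function names, the shaping target, and the rates below are hypothetical, chosen only to show the structural difference between the two hypotheses:

```python
import math

def vs_update(q, a_demo, alpha_demo=0.2, target=1.0):
    """Value shaping (VS): the demonstrator's observed action moves the
    learner's value function directly, so the influence persists after
    the demonstrator is gone."""
    q = dict(q)
    q[a_demo] += alpha_demo * (target - q[a_demo])
    return q

def db_policy(q, a_demo, beta=3.0, bias=1.0):
    """Decision biasing (DB): the demonstrator's action only adds a
    transient bonus at choice time; the value function is untouched.
    Returns softmax choice probabilities over the actions in `q`."""
    logits = {a: beta * v + (bias if a == a_demo else 0.0)
              for a, v in q.items()}
    z = sum(math.exp(l) for l in logits.values())
    return {a: math.exp(l) / z for a, l in logits.items()}
```

The behavioral signature separating the hypotheses falls out directly: under VS the demonstrator's influence survives in the values themselves, while under DB it vanishes as soon as the bias is removed.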

19.
Abstract: Rats fed either a safflower oil (α-linolenate-deficient) or a perilla oil (α-linolenate-sufficient) diet through two generations (F1) showed significant differences in the brightness-discrimination learning task. In this task, correct responses were lever-pressing responses, which were reinforced with dietary pellets, and incorrect responses were those with no reinforcement. The inferior learning performance in the safflower oil group was caused mainly by the inferior ability to rectify the incorrect responses through the learning sessions. In the safflower oil group after the learning task, the average densities of synaptic vesicles in the terminals of the hippocampus CA1 region were decreased by nearly 30% as compared with those in the perilla oil group, and it is notable that this difference was not detected without the learning task. These results suggest that dietary oil-induced morphological changes in synapses in the hippocampus of rats are related to the differential learning performance and that the turnover rate of synaptic vesicles in the hippocampus may be an important factor affecting learning performance.

20.
The effects of two oligopeptides, ACTH4-10 and the hexapeptide Met-Glu-His-D-Phe-Lys-L-Phe (Dphen-GP), on memorizing the situation and on the orienting-investigative reaction were studied in albino rats by elaboration of a food-procuring habit in a T-maze and by the open-field method. At a dose of 15 mcg/kg, and with certain administration schedules, these peptides had opposite effects on maze learning but a similar effect on memorizing in the open field. ACTH4-10 slightly increased motor activity in the open field, whereas Dphen-GP decreased it considerably. It is suggested that ACTH4-10 improves the formation of trace processes independently of the sign of reinforcement, whereas Dphen-GP selectively enhances the defensive reaction and memorizing connected with negative reinforcement.

