Similar Literature
20 similar documents found.
1.
2.
3.
4.
5.
6.
Associative search network: A reinforcement learning associative memory   (cited 10 times: 0 self-citations, 10 by others)
An associative memory system is presented which does not require a teacher to provide the desired associations. For each input key it conducts a search for the output pattern which optimizes an external payoff or reinforcement signal. The associative search network (ASN) combines pattern recognition and function optimization capabilities in a simple and effective way. We define the associative search problem, discuss conditions under which the associative search network is capable of solving it, and present results from computer simulations. The synthesis of sensory-motor control surfaces is discussed as an example of the associative search problem.
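As a rough illustration of the kind of learning rule this abstract describes, the sketch below uses a reinforcement-modulated Hebbian update: for each input key a noisy output pattern is generated, and connections are strengthened only when the external payoff beats a running baseline for that key. The sizes, payoff function, winner-take-all readout, and baseline term are assumptions for illustration, not the paper's exact ASN rule.

```python
import numpy as np

rng = np.random.default_rng(0)

n_keys, n_outputs = 4, 3           # assumed sizes, for illustration only
W = np.zeros((n_outputs, n_keys))  # associative weights
baseline = np.zeros(n_keys)        # running payoff estimate per input key
alpha, beta, noise_sd = 0.5, 0.1, 0.3

# Hypothetical payoff: 1.0 when the selected output matches a target pattern
targets = np.array([2, 0, 1, 2])   # input key -> rewarded output index

for t in range(2000):
    key = rng.integers(n_keys)
    x = np.eye(n_keys)[key]                        # one-hot input key
    s = W @ x + rng.normal(0.0, noise_sd, n_outputs)
    y = np.zeros(n_outputs)
    y[s.argmax()] = 1.0                            # noisy winner-take-all output
    payoff = float(y.argmax() == targets[key])     # external reinforcement signal
    # Hebbian update gated by reinforcement: output patterns that beat the
    # expected payoff for this key are strengthened, others are weakened.
    W += alpha * (payoff - baseline[key]) * np.outer(y, x)
    baseline[key] += beta * (payoff - baseline[key])
```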

7.
8.
Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.
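The distinctive HRL prediction mentioned here, a reward prediction error tied to subgoals, can be written down compactly. The sketch below contrasts the ordinary task-level error with a pseudo-reward error computed within the current subtask; the variable names and discount factor are illustrative assumptions, not quantities from the studies.

```python
def prediction_errors(r, pseudo_r, v, v_next, v_sub, v_sub_next, gamma=0.95):
    """Task-level and subgoal-level reward prediction errors (illustrative).

    r        : external reward tied to the overall task goal
    pseudo_r : internally generated reward for completing the current subgoal
    v, v_sub : value estimates for the task level and the current subtask
    """
    delta_task = r + gamma * v_next - v                 # ordinary RPE
    delta_sub = pseudo_r + gamma * v_sub_next - v_sub   # subgoal-related RPE posited by HRL
    return delta_task, delta_sub
```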

9.
10.
A novel neural network model is presented that learns by trial-and-error to reproduce complex sensory-motor sequences. One subnetwork, corresponding to the prefrontal cortex (PFC), is responsible for generating unique patterns of activity that represent the continuous state of sequence execution. A second subnetwork, corresponding to the striatum, associates these state-encoding patterns with the correct response at each point in the sequence execution. From a neuroscience perspective, the model is based on the known cortical and subcortical anatomy of the primate oculomotor system. From a theoretical perspective, the architecture is similar to that of a finite automaton in which outputs and state transitions are generated as a function of inputs and the current state. Simulation results for complex sequence reproduction and sequence discrimination are presented. Received: 21 July 1994 / Accepted in revised form: 21 March 1995
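The finite-automaton analogy in this abstract can be made concrete with a toy Mealy machine, in which both the response and the next internal state depend on the current state and the input. The states, inputs, and responses below are hypothetical placeholders standing in for the PFC-like state encoding and the striatum-like response associations; they are not the paper's actual sequences.

```python
# Toy Mealy machine: outputs and state transitions are functions of the
# current internal state and the input.
transitions = {
    ("s0", "A"): "s1",
    ("s1", "B"): "s2",
    ("s2", "A"): "s0",
}
responses = {
    ("s0", "A"): "saccade_left",
    ("s1", "B"): "saccade_right",
    ("s2", "A"): "fixate",
}

def reproduce(inputs, state="s0"):
    outputs = []
    for x in inputs:
        outputs.append(responses[(state, x)])   # striatum-like response selection
        state = transitions[(state, x)]         # PFC-like state update
    return outputs

print(reproduce(["A", "B", "A"]))  # ['saccade_left', 'saccade_right', 'fixate']
```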

11.
The Bernard Distinguished Lecturers are individuals who have a history of experience and expertise in teaching that impacts multiple levels of health science education. Dr. Joel Michael more than meets these criteria. Joel earned a BS in biology from CalTech and a PhD in physiology from MIT, following which he vigorously pursued his fascination with the mammalian central nervous system under continuous National Institutes of Health funding for a 15-yr period. At the same time, he became increasingly involved in teaching physiology, with the computer being his bridge between laboratory science and classroom teaching. Soon after incorporating computers into his laboratory, he began developing computer-based learning resources for his students. Observing students using these resources to solve problems led to an interest in the learning process itself. This in turn led to a research and development program, funded by the Office of Naval Research (ONR), that applied artificial intelligence to develop smart computer tutors. The impact of problem solving on student learning became the defining theme of National Science Foundation (NSF)-supported research in health science education that gradually moved all of Dr. Michael's academic efforts from neurophysiology to physiology education by the early 1980s. More recently, Joel has been instrumental in developing and maintaining the Physiology Education Research Consortium, a group of physiology teachers from around the nation who collaborate on diverse projects designed to enhance learning of the life sciences. In addition to research in education and learning science, Dr. Michael has devoted much of his time to helping physiology teachers adopt modern approaches to helping students learn. He has organized and presented faculty development workshops at many national and international venues. The topics for these workshops have included computer-based education, active learning, problem-based learning, and the use of general models in teaching physiology.

12.
13.
Little is known about how predator recognition develops under natural conditions. Predispositions to respond to some stimuli preferentially are likely to interact with the effects of experience. Convergent evidence from several studies suggests that predator-naïve tammar wallabies (Macropus eugenii) have some ability to respond to vertebrate predators differently from non-predators and that antipredator responses can be selectively enhanced by experience. Here, we examined the effects of differential reinforcement on responses to a model fox (Vulpes vulpes), cat (Felis catus) and conspecific wallaby. During training, tammars experienced paired presentations of a model fox and a simulated capture, as well as presentations of a wallaby and a cat alone. Training enhanced responses to the fox, relative to the conspecific wallaby, but acquired responses to the two predators did not differ, despite repeated, non-reinforced presentations of the cat. Results suggest that experience interacts with the wallabies' ability to perceive predators as a natural category.

14.
15.
Twenty-month-old rhesus monkeys were tested in a modified discrimination-reversal paradigm, which was designed to distinguish abstract learning from stimulus-response associational learning. Previous studies indicate that talapoin monkeys learn associationally, whereas great apes learn via forming abstract concepts. Adult rhesus monkeys are apparently capable of forming simple abstractions but learn primarily through associational processes. The results of this study show the adolescent rhesus monkeys to be associational learners, with response patterns indicating more complexity than the talapoins but less than the adult rhesus monkeys. The data suggest that rhesus monkeys develop their limited capacity for abstract learning with maturation.

16.
Reinforcement learning algorithms have provided some of the most influential computational theories for behavioral learning that depends on reward and penalty. After briefly reviewing supporting experimental data, this paper tackles three difficult theoretical issues that remain to be explored. First, plain reinforcement learning is much too slow to be considered a plausible brain model. Second, although the temporal-difference error has an important role both in theory and in experiments, how it is computed remains an enigma. Third, accounting for the functions of all brain areas, including the cerebral cortex, cerebellum, brainstem, and basal ganglia, seems to require a new computational framework. Computational studies that emphasize meta-parameters, hierarchy, modularity, and supervised learning to resolve these issues are reviewed here, together with the related experimental data.
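For reference, the temporal-difference error at the center of the second issue takes the standard form delta = r + gamma * V(s') - V(s), and the discount factor and learning rate are examples of the meta-parameters the review emphasizes. A minimal sketch, with assumed default values:

```python
def td_error(r, v_s, v_s_next, gamma=0.9):
    """Temporal-difference error, the reward-prediction signal commonly
    linked to phasic dopamine. gamma (discount factor) is one of the
    meta-parameters discussed in the review; its value here is assumed."""
    return r + gamma * v_s_next - v_s

def update_value(v_s, delta, alpha=0.1):
    """Value update driven by the TD error; alpha (learning rate) is
    another meta-parameter, again with an assumed value."""
    return v_s + alpha * delta
```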

17.
We developed a model of the alimentary instrumental conditioned bar-pressing reflex in cats choosing between an immediate small reinforcement ("impulsive behavior") and a delayed, more valuable reinforcement ("self-control behavior"). Our model is based on reinforcement learning theory. We emulated the contribution of dopamine with the theory's discount coefficient (a subjective decrease in the value of a delayed reinforcement). Computer simulations showed that "cats" with a large discount coefficient demonstrated "self-control behavior", whereas a small discount coefficient was associated with "impulsive behavior". These data are in agreement with experimental findings indicating that impulsive behavior is due to a decreased amount of dopamine in the striatum.
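The role of the discount coefficient can be seen with simple arithmetic: under exponential discounting a delayed reward is worth reward * gamma^delay, so a large gamma favors waiting and a small gamma favors the immediate option. The reward magnitudes, delay, and gamma values below are assumed for illustration, not the model's actual parameters.

```python
def discounted_value(reward, delay, gamma):
    """Subjective value of a reward delivered after `delay` steps
    under exponential discounting."""
    return reward * gamma ** delay

# Hypothetical choice: small immediate reward vs. a larger reward after 5 steps
small_now, large_later, delay = 1.0, 3.0, 5

for gamma in (0.6, 0.9):   # small vs. large discount coefficient
    later = discounted_value(large_later, delay, gamma)
    choice = "self-control (wait)" if later > small_now else "impulsive (take now)"
    print(f"gamma={gamma}: delayed value = {later:.2f} -> {choice}")
```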

18.
Reward, motivation, and reinforcement learning   (cited 15 times: 0 self-citations, 15 by others)
Dayan P, Balleine BW. Neuron, 2002, 36(2): 285-298
There is substantial evidence that dopamine is involved in reward learning and appetitive conditioning. However, the major reinforcement learning-based theoretical models of classical conditioning (crudely, prediction learning) are actually based on rules designed to explain instrumental conditioning (action learning). Extensive anatomical, pharmacological, and psychological data, particularly concerning the impact of motivational manipulations, show that these models are unreasonable. We review the data and consider the involvement of a rich collection of different neural systems in various aspects of these forms of conditioning. Dopamine plays a pivotal, but complicated, role.

19.
A human's, or even a lower insect's, behavior is governed by its nervous system. Each stable behavior has its own inner steps and control rules and is regulated by a neural circuit. Understanding how the brain influences perception, thought, and behavior is a central mandate of neuroscience. The phototactic flight of insects is a widely observed deterministic behavior. Since the movement is not stochastic, the behavior should be governed by a neural circuit. Based on the basic firing characteristics of biological neurons and the constitution of neural circuits, we designed a plausible neural circuit for this phototactic behavior from a logical perspective. The circuit's output layer, which generates a stable spike firing rate to encode flight commands, controls the insect's angular velocity in flight. The firing patterns and connection types of excitatory and inhibitory neurons are considered in this computational model. We simulated the circuit's information processing on a distributed PC array and used the real-time average firing rate of output neuron clusters to drive a flight-behavior simulation. We also explored, from a network-flow view, how a correct neural decision circuit is generated, through a bee behavior experiment based on a reward-and-punishment feedback mechanism. The significance of this study is as follows. First, we designed a neural circuit that realizes the behavioral logic rules while strictly following the electrophysiological characteristics of biological neurons and anatomical facts. Second, the circuit's generality permits the design and implementation of behavioral logic rules based on the most general information-processing and activity modes of biological neurons. Third, through computer simulation, we gained new understanding of the cooperative conditions under which multiple neurons achieve behavioral control. Fourth, this study aims at understanding the information-encoding mechanism and how neural circuits achieve behavior control. Finally, it also helps establish a transitional bridge between the microscopic activity of the nervous system and macroscopic animal behavior.

20.
In this paper, an improved and substantially stronger method, RNH-QL, based on an RBF network and heuristic Q-learning, is proposed for route searching in large state spaces. First, it addresses the inefficiency of reinforcement learning when a problem's state space grows and there is no prior information about the environment. Second, with the RBF network serving as the weight-update mechanism, reward shaping gives additional feedback to the agent in intermediate states, which helps guide the agent toward the goal state in a more controlled fashion; at the same time, the Q-learning process gives access to the underlying dynamic knowledge, removing the need for background knowledge in an upper-level RBF network. Third, it improves learning efficiency by incorporating a greedy exploitation strategy to train the neural network, as confirmed by the experimental results.
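A minimal sketch of the ingredients named in this abstract: linear Q-learning over Gaussian RBF features, a potential-based shaping term that provides feedback at intermediate states, and a mostly greedy (epsilon-greedy) policy. The grid geometry, RBF centres, potential function, and hyperparameters are assumptions for illustration, not the paper's RNH-QL settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, gamma, alpha, sigma = 4, 0.95, 0.1, 1.5
# Gaussian RBF centres tiling an assumed 10x10 grid world
centres = np.array([(i, j) for i in range(0, 10, 3) for j in range(0, 10, 3)], dtype=float)
W = np.zeros((n_actions, len(centres)))        # linear Q-weights over RBF features

def features(state):
    d2 = ((centres - state) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))      # RBF activations for the state

def potential(state):
    goal = np.array([9.0, 9.0])                # assumed goal position
    return -np.linalg.norm(state - goal)       # heuristic: closer to the goal is better

def q_values(state):
    return W @ features(state)

def act(state, eps=0.1):
    # Mostly greedy exploitation, with occasional random exploration.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(q_values(state).argmax())

def update(state, action, reward, next_state, done):
    # Reward shaping: extra feedback at intermediate states for moving toward the goal.
    shaped = reward + gamma * potential(next_state) - potential(state)
    target = shaped + (0.0 if done else gamma * q_values(next_state).max())
    td = target - q_values(state)[action]
    W[action] += alpha * td * features(state)  # gradient step on the RBF weights
```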
