
1.

An implementation of reinforcement learning based on spike timing dependent plasticity (total citations: 1; self-citations: 0; citations by others: 1)

Roberts PD Santiago RA Lafferriere G 《Biological cybernetics》2008,99(6):517-523

An explanatory model is developed to show how synaptic learning mechanisms modeled through spike-timing dependent plasticity (STDP) can result in long-term adaptations consistent with reinforcement learning models. In particular, the reinforcement learning model known as temporal difference (TD) learning has been used to model neuronal behavior in the orbitofrontal cortex (OFC) and ventral tegmental area (VTA) of macaque monkey during reinforcement learning. While some research has observed, empirically, a connection between STDP and TD, there has not been an explanatory model directly connecting TD to STDP. Through analysis of the learning dynamics that result from a general form of a STDP learning rule, the connection between STDP and TD is explained. We further demonstrate that a STDP learning rule drives the spike probability of a reward predicting neuronal population to a stable equilibrium. The equilibrium solution has an increasing slope where the steepness of the slope predicts the probability of the reward, similar to the results from electrophysiological recordings suggesting a different slope that predicts the value of the anticipated reward of Montague and Berns [Neuron 36(2):265–284, 2002]. This connection begins to shed light on more recent data gathered from VTA and OFC which are not well modeled by TD. We suggest that STDP provides the underlying mechanism for explaining reinforcement learning and other higher level perceptual and cognitive function. This material is based upon work supported by the National Science Foundation under Grants No. IOB-0445648 (PDR) and DMS-0408334 (GL) and by a Career Support grant from Portland State University (GL).

2.

Computational Consequences of Temporally Asymmetric Learning Rules: II. Sensory Image Cancellation

Roberts PD Bell CC 《Journal of computational neuroscience》2000,9(1):67-83

The electrosensory lateral line lobe (ELL) of mormyrid electric fish is a cerebellum-like structure that receives primary afferent input from electroreceptors in the skin. Purkinje-like cells in ELL store and retrieve a temporally precise negative image of prior sensory input. The stored image is derived from the association of centrally originating predictive signals with peripherally originating sensory input. The predictive signals are probably conveyed by parallel fibers. Recent in vitro experiments have demonstrated that pairing parallel fiber-evoked excitatory postsynaptic potentials (epsps) with postsynaptic spikes in Purkinje-like cells depresses the strength of these synapses. The depression has a tight dependence on the temporal order of pre- and postsynaptic events. The postsynaptic spike must follow the onset of the epsp within a window of about 60 msec for the depression to occur and pairings at other delays yield a nonassociative enhancement of the epsp. Mathematical analyses and computer simulations are used here to test the hypothesis that synaptic plasticity of the type established in vitro could be responsible for the storage of temporal patterns that is observed in vivo. This hypothesis is confirmed. The temporally asymmetric learning rule established in vitro results in the storage of activity patterns as observed in vivo and does so with significantly greater fidelity than other types of learning rules. The results demonstrate the importance of precise timing in pre- and postsynaptic activity for accurate storage of temporal information.

3.

Pattern orthogonalization via channel decorrelation by adaptive networks

Stuart D. Wick Martin T. Wiechert Rainer W. Friedrich Hermann Riecke 《Journal of computational neuroscience》2010,28(1):29-45

The early processing of sensory information by neuronal circuits often includes a reshaping of activity patterns that may facilitate further processing in the brain. For instance, in the olfactory system the activity patterns that related odors evoke at the input of the olfactory bulb can be highly similar. Nevertheless, the corresponding activity patterns of the mitral cells, which represent the output of the olfactory bulb, can differ significantly from each other due to strong inhibition by granule cells and peri-glomerular cells. Motivated by these results we study simple adaptive inhibitory networks that aim to separate or even orthogonalize activity patterns representing similar stimuli. Since the animal experiences the different stimuli at different times it is difficult for the network to learn the connectivity based on their similarity; biologically it is more plausible that learning is driven by simultaneous correlations between the input channels. We investigate the connection between pattern orthogonalization and channel decorrelation and demonstrate that networks can achieve effective pattern orthogonalization through channel decorrelation if they simultaneously equalize their output levels. In feedforward networks biophysically plausible learning mechanisms fail, however, for even moderately similar input patterns. Recurrent networks do not have that limitation; they can orthogonalize the representations of highly similar input patterns. Even when they are optimized for linear neuronal dynamics they perform very well when the dynamics are nonlinear. These results provide insights into fundamental features of simplified inhibitory networks that may be relevant for pattern orthogonalization by neuronal circuits in general.

4.

The chronotron: a neuron that learns to fire temporally precise spike patterns

RV Florian 《PloS one》2012,7(8):e40233

In many cases, neurons process information carried by the precise timings of spikes. Here we show how neurons can learn to generate specific temporally precise output spikes in response to input patterns of spikes having precise timings, thus processing and memorizing information that is entirely temporally coded, both as input and as output. We introduce two new supervised learning rules for spiking neurons with temporal coding of information (chronotrons), one that provides high memory capacity (E-learning), and one that has a higher biological plausibility (I-learning). With I-learning, the neuron learns to fire the target spike trains through synaptic changes that are proportional to the synaptic currents at the timings of real and target output spikes. We study these learning rules in computer simulations where we train integrate-and-fire neurons. Both learning rules allow neurons to fire at the desired timings, with sub-millisecond precision. We show how chronotrons can learn to classify their inputs, by firing identical, temporally precise spike trains for different inputs belonging to the same class. When the input is noisy, the classification also leads to noise reduction. We compute lower bounds for the memory capacity of chronotrons and explore the influence of various parameters on chronotrons' performance. The chronotrons can model neurons that encode information in the time of the first spike relative to the onset of salient stimuli or neurons in oscillatory networks that encode information in the phases of spikes relative to the background oscillation. Our results show that firing one spike per cycle optimizes memory capacity in neurons encoding information in the phase of firing relative to a background rhythm.

5.

Flavor Preference Learning Increases Olfactory and Gustatory Convergence onto Single Neurons in the Basolateral Amygdala but Not in the Insular Cortex in Rats

Bertrand Desgranges Victor Ramirez-Amaya Itzel Ricaño-Cornejo Frédéric Lévy Guillaume Ferreira 《PloS one》2010,5(4)

The basolateral amygdala (BLA) and the insular cortex (IC) represent two major areas for odor-taste associations, i.e. flavor integration. This learning may require the development of convergent odor and taste neuronal activation allowing the memory representation of such association. Yet identification of neurons that respond to such coincident input and the effect of flavor experience on odor-taste convergence remain unclear. In the present study we used the compartmental analysis of temporal activity using fluorescence in situ hybridization for Arc (catFISH) to visualize odor-taste convergence onto single neurons in the BLA and in the IC to assess the number of cells that were co-activated by both stimuli after odor-taste association. We used a sucrose conditioned odor preference as a flavor experience in rats, in which 9 odor-sucrose pairings induce a reliable odor-taste association. The results show that flavor experience induced a four-fold increase in the percentage of cells activated by both taste and odor stimulations in the BLA, but not in the IC. Because conditioned odor preference did not modify the number of cells responding selectively to one stimulus, this greater odor-taste convergence into individual BLA neurons suggests the recruitment of a neuronal population that can be activated by both odor and taste only after the association. We conclude that the development of convergent activation in amygdala neurons after odor-taste associative learning may provide a cellular basis of flavor memory.

6.

Computational Consequences of Temporally Asymmetric Learning Rules: I. Differential Hebbian Learning

Roberts PD 《Journal of computational neuroscience》1999,7(3):235-246

Temporally asymmetric learning rules governing plastic changes in synaptic efficacy have recently been identified in physiological studies. In these rules, the exact timing of pre- and postsynaptic spikes is critical to the induced change of synaptic efficacy. The temporal learning rules treated in this article are approximately antisymmetric; the synaptic efficacy is enhanced if the postsynaptic spike follows the presynaptic spike by a few milliseconds, but the efficacy is depressed if the postsynaptic spike precedes the presynaptic spike. The learning dynamics of this rule are studied using a stochastic model neuron receiving a set of serially delayed inputs. The average change of synaptic efficacy due to the temporally antisymmetric learning rule is shown to yield differential Hebbian learning. These results are demonstrated with both mathematical analyses and computer simulations, and connections with theories of classical conditioning are discussed.
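The antisymmetric timing dependence described in this abstract can be illustrated with a minimal sketch. The exponential window shape, the amplitude `amplitude`, the time constant `tau_ms`, and the function names are all assumptions chosen for illustration; the abstract states only that the rule is approximately antisymmetric, with enhancement when the postsynaptic spike follows the presynaptic spike and depression when it precedes it.

```python
import math

def stdp_dw(dt_ms, amplitude=0.01, tau_ms=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre.

    Hypothetical exponential window (an assumption, not taken from the paper):
    positive dt (post follows pre) enhances, negative dt depresses.
    """
    if dt_ms > 0:
        return amplitude * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        return -amplitude * math.exp(dt_ms / tau_ms)
    return 0.0

def total_dw(pre_spikes_ms, post_spikes_ms):
    """Sum the pairwise weight changes over all pre/post spike pairs."""
    return sum(stdp_dw(t_post - t_pre)
               for t_pre in pre_spikes_ms
               for t_post in post_spikes_ms)

# Post spike 5 ms after the pre spike enhances; 5 ms before depresses.
print(stdp_dw(5.0), stdp_dw(-5.0))
```

Summing over all spike pairs is one common simplification; nearest-neighbor pairing schemes are another option and would give different totals for dense spike trains.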

7.

Songbird: a unique animal model for studying the molecular basis of disorders of vocal development and communication

Chihiro MORI Kazuhiro WADA 《Experimental Animals》2015,64(3):221-230

Like humans, songbirds are one of the few animal groups that learn vocalization. Vocal learning requires coordination of auditory input and vocal output using auditory feedback to guide one’s own vocalizations during a specific developmental stage known as the critical period. Songbirds are good animal models for understanding the neural basis of vocal learning, a complex form of imitation, because they have many parallels to humans with regard to the features of vocal behavior and neural circuits dedicated to vocal learning. In this review, we will summarize the behavioral, neural, and genetic traits of birdsong. We will also discuss how studies of birdsong can help us understand how the development of neural circuits for vocal learning and production is driven by sensory input (auditory information) and motor output (vocalization).

8.

Symbol manipulation and rule learning in spiking neuronal networks

Fernando C 《Journal of theoretical biology》2011,275(1):29-41


9.

Instantaneous Non-Linear Processing by Pulse-Coupled Threshold Units

Moritz Helias Moritz Deger Stefan Rotter Markus Diesmann 《PLoS computational biology》2010,6(9)

Contemporary theory of spiking neuronal networks is based on the linear response of the integrate-and-fire neuron model derived in the diffusion limit. We find that for non-zero synaptic weights, the response to transient inputs differs qualitatively from this approximation. The response is instantaneous rather than exhibiting low-pass characteristics, non-linearly dependent on the input amplitude, asymmetric for excitation and inhibition, and is promoted by a characteristic level of synaptic background noise. We show that at threshold the probability density of the potential drops to zero within the range of one synaptic weight and explain how this shapes the response. The novel mechanism is exhibited on the network level and is a generic property of pulse-coupled networks of threshold units.

10.

An imperfect dopaminergic error signal can drive temporal-difference learning

Potjans W Diesmann M Morrison A 《PLoS computational biology》2011,7(5):e1001133

An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.
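For orientation, the classical discrete-time TD algorithm that this abstract uses as its baseline can be sketched in a few lines. This is the textbook tabular TD(0) value update, not the spiking actor-critic model of the paper; the chain task, the function name `td0_chain`, and the parameter values are assumptions made for illustration.

```python
def td0_chain(n_states=4, gamma=0.9, alpha=0.1, episodes=500):
    """Tabular TD(0) on a hypothetical deterministic chain 0 -> 1 -> ... -> terminal.

    A reward of 1 is delivered only on the final transition (a sparse
    positive reward). Returns the learned value of each non-terminal state.
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state, fixed at 0
    for _ in range(episodes):
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # TD error: reward plus discounted next value, minus the current estimate
            td_error = reward + gamma * values[s + 1] - values[s]
            values[s] += alpha * td_error
    return values[:n_states]

# Values approach gamma ** (steps remaining before the reward).
print(td0_chain())
```

In the dopamine interpretation mentioned in the abstract, `td_error` plays the role of the phasic signal; the paper's point is that a biologically realistic signal cannot represent negative errors as faithfully as this idealized quantity.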

11.

Behavioral analysis of differential hebbian learning in closed-loop systems

Tomas Kulvicius Christoph Kolodziejski Minija Tamosiunaite Bernd Porr Florentin Wörgötter 《Biological cybernetics》2010,103(4):255-271


12.

Neuronal models for sleep-wake regulation and synaptic reorganization in the sleeping hippocampus

Best J Diniz Behn C Poe GR Booth V 《Journal of biological rhythms》2007,22(3):220-232

In this article, we discuss mathematical models that address the control of sleep-wake behavior in the infant and adult rodent and a model that addresses changes in single-cell firing patterns in the hippocampus across wake and rapid eye movement (REM) sleep states. Each of the models describes the dynamics of experimentally identified neuronal components, either the firing activity of wake- and sleep-promoting neuronal populations or the spiking activity of hippocampal pyramidal neurons. Our discussion of each model illustrates how a mathematical model that describes the temporal dynamics of the modeled neuronal components can reveal specifics about proposed neuronal mechanisms that underlie sleep-wake regulation or sleep-specific firing patterns. For example, the dynamics of the models developed for sleep-wake regulation in the infant rodent lend insight into the involved brain-stem neuronal populations and the evolution of the network during maturation. The results of the model for sleep-wake regulation in the adult rodent suggest distinct properties of the involved neuronal populations and their interactions that account for long-lasting and brief waking bouts. The dynamics of the model for sleep-specific hippocampal neural activity propose neural mechanisms to account for observed activity changes that can invoke synaptic reorganization associated with learning and memory consolidation.

13.

On learning dynamics underlying the evolution of learning rules

《Theoretical population biology》2014

In order to understand the development of non-genetically encoded actions during an animal’s lifespan, it is necessary to analyze the dynamics and evolution of learning rules producing behavior. Owing to the intrinsic stochastic and frequency-dependent nature of learning dynamics, these rules are often studied in evolutionary biology via agent-based computer simulations. In this paper, we show that stochastic approximation theory can help to qualitatively understand learning dynamics and formulate analytical models for the evolution of learning rules. We consider a population of individuals repeatedly interacting during their lifespan, and where the stage game faced by the individuals fluctuates according to an environmental stochastic process. Individuals adjust their behavioral actions according to learning rules belonging to the class of experience-weighted attraction learning mechanisms, which includes standard reinforcement and Bayesian learning as special cases. We use stochastic approximation theory in order to derive differential equations governing action play probabilities, which turn out to have qualitative features of mutator-selection equations. We then perform agent-based simulations to find the conditions where the deterministic approximation is closest to the original stochastic learning process for standard 2-action 2-player fluctuating games, where interaction between learning rules and preference reversal may occur. Finally, we analyze a simplified model for the evolution of learning in a producer–scrounger game, which shows that the exploration rate can interact in a non-intuitive way with other features of co-evolving learning rules. Overall, our analyses illustrate the usefulness of applying stochastic approximation theory in the study of animal learning.

14.

Is He Being Bad? Social and Language Brain Networks during Social Judgment in Children with Autism

Elizabeth J. Carter Diane L. Williams Nancy J. Minshew Jill F. Lehman 《PloS one》2012,7(10)

Individuals with autism often violate social rules and have lower accuracy in identifying and explaining inappropriate social behavior. Twelve children with autism (AD) and thirteen children with typical development (TD) participated in this fMRI study of the neurofunctional basis of social judgment. Participants indicated in which of two pictures a boy was being bad (Social condition) or which of two pictures was outdoors (Physical condition). In the within-group Social–Physical comparison, TD children used components of mentalizing and language networks [bilateral inferior frontal gyrus (IFG), bilateral medial prefrontal cortex (mPFC), and bilateral posterior superior temporal sulcus (pSTS)], whereas AD children used a network that was primarily right IFG and bilateral pSTS, suggesting reduced use of social and language networks during this social judgment task. A direct group comparison on the Social–Physical contrast showed that the TD group had greater mPFC, bilateral IFG, and left superior temporal pole activity than the AD group. No regions were more active in the AD group than in the group with TD in this comparison. Both groups successfully performed the task, which required minimal language. The groups also performed similarly on eyetracking measures, indicating that the activation results probably reflect the use of a more basic strategy by the autism group rather than performance disparities. Even though language was unnecessary, the children with TD recruited language areas during the social task, suggesting automatic encoding of their knowledge into language; however, this was not the case for the children with autism. These findings support behavioral research indicating that, whereas children with autism may recognize socially inappropriate behavior, they have difficulty using spoken language to explain why it is inappropriate. 
The fMRI results indicate that AD children may not automatically use language to encode their social understanding, making expression and generalization of this knowledge more difficult.

15.

Automatic Adaptation to Fast Input Changes in a Time-Invariant Neural Circuit

Arjun Bharioke Dmitri B. Chklovskii 《PLoS computational biology》2015,11(8)

Neurons must faithfully encode signals that can vary over many orders of magnitude despite having only limited dynamic ranges. For a correlated signal, this dynamic range constraint can be relieved by subtracting away components of the signal that can be predicted from the past, a strategy known as predictive coding, that relies on learning the input statistics. However, the statistics of input natural signals can also vary over very short time scales, e.g., following saccades across a visual scene. To maintain a reduced transmission cost to signals with rapidly varying statistics, neuronal circuits implementing predictive coding must also rapidly adapt their properties. Experimentally, in different sensory modalities, sensory neurons have shown such adaptations within 100 ms of an input change. Here, we show first that linear neurons connected in a feedback inhibitory circuit can implement predictive coding. We then show that adding a rectification nonlinearity to such a feedback inhibitory circuit allows it to automatically adapt and approximate the performance of an optimal linear predictive coding network, over a wide range of inputs, while keeping its underlying temporal and synaptic properties unchanged. We demonstrate that the resulting changes to the linearized temporal filters of this nonlinear network match the fast adaptations observed experimentally in different sensory modalities, in different vertebrate species. Therefore, the nonlinear feedback inhibitory network can provide automatic adaptation to fast varying signals, maintaining the dynamic range necessary for accurate neuronal transmission of natural inputs.

16.

Input-output relationship of the Leaky-Integrator Neuron Model

Hans Scharstein 《Journal of mathematical biology》1979,8(4):403-420


17.

Dissociable reward and timing signals in human midbrain and ventral striatum

Klein-Flügge MC Hunt LT Bach DR Dolan RJ Behrens TE 《Neuron》2011,72(4):654-664

Reward prediction error (RPE) signals are central to current models of reward-learning. Temporal difference (TD) learning models posit that these signals should be modulated by predictions, not only of magnitude but also timing of reward. Here we show that BOLD activity in the VTA conforms to such TD predictions: responses to unexpected rewards are modulated by a temporal hazard function and activity between a predictive stimulus and reward is depressed in proportion to predicted reward. By contrast, BOLD activity in ventral striatum (VS) does not reflect a TD RPE, but instead encodes a signal on the variable relevant for behavior, here timing but not magnitude of reward. The results have important implications for dopaminergic models of cortico-striatal learning and suggest a modification of the conventional view that VS BOLD necessarily reflects inputs from dopaminergic VTA neurons signaling an RPE.

18.

History-dependent excitability as a single-cell substrate of transient memory for information discrimination

Baroni F Torres JJ Varona P 《PloS one》2010,5(12):e15023

Neurons react differently to incoming stimuli depending upon their previous history of stimulation. This property can be considered as a single-cell substrate for transient memory, or context-dependent information processing: depending upon the current context that the neuron "sees" through the subset of the network impinging on it in the immediate past, the same synaptic event can evoke a postsynaptic spike or just a subthreshold depolarization. We propose a formal definition of History-Dependent Excitability (HDE) as a measure of the propensity to firing in any moment in time, linking the subthreshold history-dependent dynamics with spike generation. This definition allows the quantitative assessment of the intrinsic memory for different single-neuron dynamics and input statistics. We illustrate the concept of HDE by considering two general dynamical mechanisms: the passive behavior of an Integrate and Fire (IF) neuron, and the inductive behavior of a Generalized Integrate and Fire (GIF) neuron with subthreshold damped oscillations. This framework allows us to characterize the sensitivity of different model neurons to the detailed temporal structure of incoming stimuli. While a neuron with intrinsic oscillations discriminates equally well between input trains with the same or different frequency, a passive neuron discriminates better between inputs with different frequencies. This suggests that passive neurons are better suited to rate-based computation, while neurons with subthreshold oscillations are advantageous in a temporal coding scheme. We also address the influence of intrinsic properties in single-cell processing as a function of input statistics, and show that intrinsic oscillations enhance discrimination sensitivity at high input rates. 
Finally, we discuss how the recognition of these cell-specific discrimination properties might further our understanding of neuronal network computations and their relationships to the distribution and functional connectivity of different neuronal types.

19.

Time-oriented hierarchical method for computation of principal components using subspace learning algorithm

Jankovic M Ogawa H 《International journal of neural systems》2004,14(5):313-323

Principal Component Analysis (PCA) and Principal Subspace Analysis (PSA) are classic techniques in statistical data analysis, feature extraction and data compression. Given a set of multivariate measurements, PCA and PSA provide a smaller set of "basis vectors" with less redundancy, and a subspace spanned by them, respectively. Artificial neurons and neural networks have been shown to perform PSA and PCA when gradient ascent (descent) learning rules are used, which is related to the constrained maximization (minimization) of statistical objective functions. Due to their low complexity, such algorithms and their implementation in neural networks are potentially useful in cases of tracking slow changes of correlations in the input data or in updating eigenvectors with new samples. In this paper we propose a PCA learning algorithm that is fully homogeneous with respect to neurons. The algorithm is obtained by modification of one of the most famous PSA learning algorithms, the Subspace Learning Algorithm (SLA). Modification of the algorithm is based on the Time-Oriented Hierarchical Method (TOHM). The method uses two distinct time scales. On a faster time scale, the PSA algorithm is responsible for the "behavior" of all output neurons. On a slower scale, output neurons compete for fulfillment of their "own interests". On this scale, basis vectors in the principal subspace are rotated toward the principal eigenvectors. At the end of the paper, we briefly analyze how (or why) the time-oriented hierarchical method can be used to transform any existing neural network PSA method into a PCA method.
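As background for the PSA step that the abstract builds on, the following is a minimal sketch of the subspace learning update in its commonly cited Oja form, W ← W + η (y xᵀ − y yᵀ W) with y = W x. It covers only the fast-time-scale subspace rule, not the TOHM modification proposed in the paper; the synthetic data, the initial weights, the learning rate `eta`, and the function name are assumptions made for illustration.

```python
import random

def subspace_learning(samples, weights, eta=0.01):
    """Apply the Oja-style subspace update W += eta * (y x^T - y y^T W), y = W x.

    weights is a list of m rows of length n and is updated in place.
    """
    m, n = len(weights), len(weights[0])
    for x in samples:
        # Outputs of the m linear neurons
        y = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Reconstruction of the input from the outputs: W^T y
        recon = [sum(y[k] * weights[k][j] for k in range(m)) for j in range(n)]
        for i in range(m):
            for j in range(n):
                weights[i][j] += eta * y[i] * (x[j] - recon[j])
    return weights

# Hypothetical data: most variance lies in the first two of three coordinates.
rng = random.Random(0)
samples = [(rng.uniform(-2, 2), rng.uniform(-1, 1), rng.uniform(-0.1, 0.1))
           for _ in range(20000)]
W = subspace_learning(samples, [[0.1, 0.2, 0.3], [0.3, -0.1, 0.2]])

# Gram matrix W W^T; for a converged subspace rule the rows are close to orthonormal.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in W] for r1 in W]
print(gram)
```

The rows of `W` end up spanning the high-variance subspace but are in general an arbitrary rotation of the principal eigenvectors, which is the gap between PSA and PCA that the abstract's slower time scale is meant to close.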

20.

The spatiotemporal learning rule and its efficiency in separating spatiotemporal patterns

Tsukada M Pan X 《Biological cybernetics》2005,92(2):139-146

The hippocampus plays an important role in the course of establishing long-term memory, i.e., to make short-term memory of spatially and temporally associated input information. In 1996 (Tsukada et al. 1996), the spatiotemporal learning rule was proposed based on differences observed in hippocampal long-term potentiation (LTP) induced by various spatiotemporal pattern stimuli. One essential point of this learning rule is that the change of synaptic weight depends on both spatial coincidence and the temporal summation of input pulses. We applied this rule to a single-layered neural network and compared its ability to separate spatiotemporal patterns with that of other rules, including the Hebbian learning rule and its extended rules. The simulated results showed that the spatiotemporal learning rule had the highest efficiency in discriminating spatiotemporal pattern sequences, while the Hebbian learning rule (including its extended rules) was sensitive to differences in spatial patterns.