期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Learning to Obtain Reward,but Not Avoid Punishment,Is Affected by Presence of PTSD Symptoms in Male Veterans: Empirical Data and Computational Model

Catherine E. Myers Ahmed A. Moustafa Jony Sheynin Kirsten M. VanMeenen Mark W. Gilbertson Scott P. Orr Kevin D. Beck Kevin C. H. Pang Richard J. Servatius 《PloS one》2013,8(8)

Post-traumatic stress disorder (PTSD) symptoms include behavioral avoidance which is acquired and tends to increase with time. This avoidance may represent a general learning bias; indeed, individuals with PTSD are often faster than controls on acquiring conditioned responses based on physiologically-aversive feedback. However, it is not clear whether this learning bias extends to cognitive feedback, or to learning from both reward and punishment. Here, male veterans with self-reported current, severe PTSD symptoms (PTSS group) or with few or no PTSD symptoms (control group) completed a probabilistic classification task that included both reward-based and punishment-based trials, where feedback could take the form of reward, punishment, or an ambiguous “no-feedback” outcome that could signal either successful avoidance of punishment or failure to obtain reward. The PTSS group outperformed the control group in total points obtained; the PTSS group specifically performed better than the control group on reward-based trials, with no difference on punishment-based trials. To better understand possible mechanisms underlying observed performance, we used a reinforcement learning model of the task, and applied maximum likelihood estimation techniques to derive estimated parameters describing individual participants’ behavior. Estimations of the reinforcement value of the no-feedback outcome were significantly greater in the control group than the PTSS group, suggesting that the control group was more likely to value this outcome as positively reinforcing (i.e., signaling successful avoidance of punishment). This is consistent with the control group’s generally poorer performance on reward trials, where reward feedback was to be obtained in preference to the no-feedback outcome. Differences in the interpretation of ambiguous feedback may contribute to the facilitated reinforcement learning often observed in PTSD patients, and may in turn provide new insight into how pathological behaviors are acquired and maintained in PTSD. 相似文献

2.

Better Than I Thought: Positive Evaluation Bias in Hypomania

Liam Mason Noreen O'Sullivan Richard P. Bentall Wael El-Deredy 《PloS one》2012,7(10)

Background

Mania is characterised by increased impulsivity and risk-taking, and psychological accounts argue that these features may be due to hypersensitivity to reward. The neurobiological mechanisms remain poorly understood. Here we examine reinforcement learning and sensitivity to both reward and punishment outcomes in hypomania-prone individuals not receiving pharmacotherapy.

Method

We recorded EEG from 45 healthy individuals split into three groups by low, intermediate and high self-reported hypomanic traits. Participants played a computerised card game in which they learned the reward contingencies of three cues. Neural responses to monetary gain and loss were measured using the feedback-related negativity (FRN), a component implicated in motivational outcome evaluation and reinforcement learning.

Results

As predicted, rewards elicited a smaller FRN in the hypomania-prone group relative to the low hypomania group, indicative of greater reward responsiveness. The hypomania-prone group also showed smaller FRN to losses, indicating diminished response to negative feedback.

Conclusion

Our findings indicate that proneness to hypomania is associated with both reward hypersensitivity and discounting of punishment. This positive evaluation bias may be driven by aberrant reinforcement learning signals, which fail to update future expectations. This provides a possible neural mechanism explaining risk-taking and impaired reinforcement learning in BD. Further research will be needed to explore the potential value of the FRN as a biological vulnerability marker for mania and pathological risk-taking. 相似文献

3.

Counterfactual choice and learning in a neural network centered on human lateral frontopolar cortex

Boorman ED Behrens TE Rushworth MF 《PLoS biology》2011,9(6):e1001093

Decision making and learning in a real-world context require organisms to track not only the choices they make and the outcomes that follow but also other untaken, or counterfactual, choices and their outcomes. Although the neural system responsible for tracking the value of choices actually taken is increasingly well understood, whether a neural system tracks counterfactual information is currently unclear. Using a three-alternative decision-making task, a Bayesian reinforcement-learning algorithm, and fMRI, we investigated the coding of counterfactual choices and prediction errors in the human brain. Rather than representing evidence favoring multiple counterfactual choices, lateral frontal polar cortex (lFPC), dorsomedial frontal cortex (DMFC), and posteromedial cortex (PMC) encode the reward-based evidence favoring the best counterfactual option at future decisions. In addition to encoding counterfactual reward expectations, the network carries a signal for learning about counterfactual options when feedback is available-a counterfactual prediction error. Unlike other brain regions that have been associated with the processing of counterfactual outcomes, counterfactual prediction errors within the identified network cannot be related to regret theory. Furthermore, individual variation in counterfactual choice-related activity and prediction error-related activity, respectively, predicts variation in the propensity to switch to profitable choices in the future and the ability to learn from hypothetical feedback. Taken together, these data provide both neural and behavioral evidence to support the existence of a previously unidentified neural system responsible for tracking both counterfactual choice options and their outcomes. 相似文献

4.

Adaptive properties of differential learning rates for positive and negative outcomes

Romain D. Cazé Matthijs A. A. van der Meer 《Biological cybernetics》2013,107(6):711-719

The concept of the reward prediction error—the difference between reward obtained and reward predicted—continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static “bandit” choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data. 相似文献

5.

An evolutionary computational theory of prefrontal executive function in decision-making

Etienne Koechlin 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2014,369(1655)

The prefrontal cortex subserves executive control and decision-making, that is, the coordination and selection of thoughts and actions in the service of adaptive behaviour. We present here a computational theory describing the evolution of the prefrontal cortex from rodents to humans as gradually adding new inferential Bayesian capabilities for dealing with a computationally intractable decision problem: exploring and learning new behavioural strategies versus exploiting and adjusting previously learned ones through reinforcement learning (RL). We provide a principled account identifying three inferential steps optimizing this arbitration through the emergence of (i) factual reactive inferences in paralimbic prefrontal regions in rodents; (ii) factual proactive inferences in lateral prefrontal regions in primates and (iii) counterfactual reactive and proactive inferences in human frontopolar regions. The theory clarifies the integration of model-free and model-based RL through the notion of strategy creation. The theory also shows that counterfactual inferences in humans yield to the notion of hypothesis testing, a critical reasoning ability for approximating optimal adaptive processes and presumably endowing humans with a qualitative evolutionary advantage in adaptive behaviour. 相似文献

6.

Reinforcement Learning of Targeted Movement in a Spiking Neuronal Model of Motor Cortex

George L. Chadderdon Samuel A. Neymotin Cliff C. Kerr William W. Lytton 《PloS one》2012,7(10)

Sensorimotor control has traditionally been considered from a control theory perspective, without relation to neurobiology. In contrast, here we utilized a spiking-neuron model of motor cortex and trained it to perform a simple movement task, which consisted of rotating a single-joint “forearm” to a target. Learning was based on a reinforcement mechanism analogous to that of the dopamine system. This provided a global reward or punishment signal in response to decreasing or increasing distance from hand to target, respectively. Output was partially driven by Poisson motor babbling, creating stochastic movements that could then be shaped by learning. The virtual forearm consisted of a single segment rotated around an elbow joint, controlled by flexor and extensor muscles. The model consisted of 144 excitatory and 64 inhibitory event-based neurons, each with AMPA, NMDA, and GABA synapses. Proprioceptive cell input to this model encoded the 2 muscle lengths. Plasticity was only enabled in feedforward connections between input and output excitatory units, using spike-timing-dependent eligibility traces for synaptic credit or blame assignment. Learning resulted from a global 3-valued signal: reward (+1), no learning (0), or punishment (−1), corresponding to phasic increases, lack of change, or phasic decreases of dopaminergic cell firing, respectively. Successful learning only occurred when both reward and punishment were enabled. In this case, 5 target angles were learned successfully within 180 s of simulation time, with a median error of 8 degrees. Motor babbling allowed exploratory learning, but decreased the stability of the learned behavior, since the hand continued moving after reaching the target. Our model demonstrated that a global reinforcement signal, coupled with eligibility traces for synaptic plasticity, can train a spiking sensorimotor network to perform goal-directed motor behavior. 相似文献

7.

Simulation of rat behavior by a reinforcement learning algorithm in consideration of appearance probabilities of reinforcement signals

Murakoshi K Noguchi T 《Bio Systems》2005,80(1):83-90

Brown and Wanger [Brown, R.T., Wanger, A.R., 1964. Resistance to punishment and extinction following training with shock or nonreinforcement. J. Exp. Psychol. 68, 503-507] investigated rat behaviors with the following features: (1) rats were exposed to reward and punishment at the same time, (2) environment changed and rats relearned, and (3) rats were stochastically exposed to reward and punishment. The results are that exposure to nonreinforcement produces resistance to the decremental effects of behavior after stochastic reward schedule and that exposure to both punishment and reinforcement produces resistance to the decremental effects of behavior after stochastic punishment schedule. This paper aims to simulate the rat behaviors by a reinforcement learning algorithm in consideration of appearance probabilities of reinforcement signals. The former algorithms of reinforcement learning were unable to simulate the behavior of the feature (3). We improve the former reinforcement learning algorithms by controlling learning parameters in consideration of the acquisition probabilities of reinforcement signals. The proposed algorithm qualitatively simulates the result of the animal experiment of Brown and Wanger. 相似文献

8.

Alterations of Monetary Reward and Punishment Processing in Chronic Cannabis Users: An fMRI Study

Bj?rn Enzi Silke Lissek Marc-Andreas Edel Martin Tegenthoff Volkmar Nicolas Norbert Scherbaum Georg Juckel Patrik Roser 《PloS one》2015,10(3)

Alterations in reward and punishment processing have been reported in adults suffering from long-term cannabis use. However, previous findings regarding the chronic effects of cannabis on reward and punishment processing have been inconsistent. In the present study, we used functional magnetic resonance imaging (fMRI) to reveal the neural correlates of reward and punishment processing in long-term cannabis users (n = 15) and in healthy control subjects (n = 15) with no history of drug abuse. For this purpose, we used the well-established Monetary Incentive Delay (MID) task, a reliable experimental paradigm that allows the differentiation between anticipatory and consummatory aspects of reward and punishment processing. Regarding the gain anticipation period, no significant group differences were observed. In the left caudate and the left inferior frontal gyrus, cannabis users were – in contrast to healthy controls – not able to differentiate between the conditions feedback of reward and control. In addition, cannabis users showed stronger activations in the left caudate and the bilateral inferior frontal gyrus following feedback of no punishment as compared to healthy controls. We interpreted these deficits in dorsal striatal functioning as altered stimulus-reward or action-contingent learning in cannabis users. In addition, the enhanced lateral prefrontal activation in cannabis users that is related to non-punishing feedback may reflect a deficit in emotion regulation or cognitive reappraisal in these subjects. 相似文献

9.

Midbrain dopamine neurons encode a quantitative reward prediction error signal 总被引：15，自引：0，他引：15

Bayer HM Glimcher PW 《Neuron》2005,47(1):129-141

相似文献

10.

The Role of Informative and Ambiguous Feedback in Avoidance Behavior: Empirical and Computational Findings

Ahmed A. Moustafa Jony Sheynin Catherine E. Myers 《PloS one》2015,10(12)

Avoidance behavior is a critical component of many psychiatric disorders, and as such, it is important to understand how avoidance behavior arises, and whether it can be modified. In this study, we used empirical and computational methods to assess the role of informational feedback and ambiguous outcome in avoidance behavior. We adapted a computer-based probabilistic classification learning task, which includes positive, negative and no-feedback outcomes; the latter outcome is ambiguous as it might signal either a successful outcome (missed punishment) or a failure (missed reward). Prior work with this task suggested that most healthy subjects viewed the no-feedback outcome as strongly positive. Interestingly, in a later version of the classification task, when healthy subjects were allowed to opt out of (i.e. avoid) responding, some subjects (“avoiders”) reliably avoided trials where there was a risk of punishment, but other subjects (“non-avoiders”) never made any avoidance responses at all. One possible interpretation is that the “non-avoiders” valued the no-feedback outcome so positively on punishment-based trials that they had little incentive to avoid. Another possible interpretation is that the outcome of an avoided trial is unspecified and that lack of information is aversive, decreasing subjects’ tendency to avoid. To examine these ideas, we here tested healthy young adults on versions of the task where avoidance responses either did or did not generate informational feedback about the optimal response. Results showed that provision of informational feedback decreased avoidance responses and also decreased categorization performance, without significantly affecting the percentage of subjects classified as “avoiders.” To better understand these results, we used a modified Q-learning model to fit individual subject data. Simulation results suggest that subjects in the feedback condition adjusted their behavior faster following better-than-expected outcomes, compared to subjects in the no-feedback condition. Additionally, in both task conditions, “avoiders” adjusted their behavior faster following worse-than-expected outcomes, and treated the ambiguous no-feedback outcome as less rewarding, compared to non-avoiders. Together, results shed light on the important role of ambiguous and informative feedback in avoidance behavior. 相似文献

11.

Learning to represent visual input

Geoffrey E. Hinton 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2010,365(1537):177-184

One of the central problems in computational neuroscience is to understand how the object-recognition pathway of the cortex learns a deep hierarchy of nonlinear feature detectors. Recent progress in machine learning shows that it is possible to learn deep hierarchies without requiring any labelled data. The feature detectors are learned one layer at a time and the goal of the learning procedure is to form a good generative model of images, not to predict the class of each image. The learning procedure only requires the pairwise correlations between the activations of neuron-like processing units in adjacent layers. The original version of the learning procedure is derived from a quadratic ‘energy’ function but it can be extended to allow third-order, multiplicative interactions in which neurons gate the pairwise interactions between other neurons. A technique for factoring the third-order interactions leads to a learning module that again has a simple learning rule based on pairwise correlations. This module looks remarkably like modules that have been proposed by both biologists trying to explain the responses of neurons and engineers trying to create systems that can recognize objects. 相似文献

12.

Teaching and learning in a probabilistic prisoner's dilemma

Baker F Rachlin H 《Behavioural processes》2002,57(2-3):211-226

The prisoner's dilemma is much studied in social psychology and decision-making because it models many real-world conflicts. In everyday terms, the choice to 'cooperate' (maximize reward for the group) or 'defect' (maximize reward for the individual) is often attributed to altruistic or selfish motives. Alternatively, behavior during a dilemma may be understood as a function of reinforcement and punishment. Human participants played a prisoner's-dilemma-type game (for points exchangeable for money) with a computer that employed either a teaching strategy (a probabilistic version of tit-for-tat), in which the computer reinforced or punished participants' cooperation or defection, or a learning strategy (a probabilistic version of Pavlov), in which the computer's responses were reinforced and punished by participants' cooperation and defection. Participants learned to cooperate against both computer strategies. However, in a second experiment which varied the context of the game, they learned to cooperate only against one or other strategy; participants did not learn to cooperate against tit-for-tat when they believed that they were playing against another person; participants did not learn to cooperate against Pavlov when the computer's cooperation probability was signaled by a spinner. The results are consistent with the notion that people are biased not only to cooperate or defect on individual social choices, but also to employ one or other strategy of interaction in a pattern across social choices. 相似文献

13.

Asymmetric and adaptive reward coding via normalized reinforcement learning

Kenway Louie 《PLoS computational biology》2022,18(7)

Learning is widely modeled in psychology, neuroscience, and computer science by prediction error-guided reinforcement learning (RL) algorithms. While standard RL assumes linear reward functions, reward-related neural activity is a saturating, nonlinear function of reward; however, the computational and behavioral implications of nonlinear RL are unknown. Here, we show that nonlinear RL incorporating the canonical divisive normalization computation introduces an intrinsic and tunable asymmetry in prediction error coding. At the behavioral level, this asymmetry explains empirical variability in risk preferences typically attributed to asymmetric learning rates. At the neural level, diversity in asymmetries provides a computational mechanism for recently proposed theories of distributional RL, allowing the brain to learn the full probability distribution of future rewards. This behavioral and computational flexibility argues for an incorporation of biologically valid value functions in computational models of learning and decision-making. 相似文献

14.

Altered risk-based decision making following adolescent alcohol use results from an imbalance in reinforcement learning in rats

Clark JJ Nasrallah NA Hart AS Collins AL Bernstein IL Phillips PE 《PloS one》2012,7(5):e37357

Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life. 相似文献

15.

Consider the Source: Adolescents and Adults Similarly Follow Older Adult Advice More than Peer Advice

Frederico S. Lourenco Johannes H. Decker Gloria A. Pedersen Danielle V. Dellarco B. J. Casey Catherine A. Hartley 《PloS one》2015,10(6)

Individuals learn which of their actions are likely to be rewarded through trial and error. This form of learning is critical for adapting to new situations, which adolescents frequently encounter. Adolescents are also greatly influenced by their peers. The current study tested the extent to which adolescents rely on peer advice to guide their actions. Adolescent and young adult participants completed a probabilistic learning task in which they chose between four pairs of stimuli with different reinforcement probabilities, with one stimulus in each pair more frequently rewarded. Participants received advice about two of these pairs, once from a similarly aged peer and once from an older adult. Crucially, this advice was inaccurate, enabling the dissociation between experience-based and instruction-based learning. Adolescents and adults learned equally well from experience and no age group difference was evident in the overall influence of advice on choices. Surprisingly, when considering the source of advice, there was no evident influence of peer advice on adolescent choices. However, both adolescents and adults were biased toward choosing the stimulus recommended by the older adult. Contrary to conventional wisdom, these data suggest that adolescents may prioritize the advice of older adults over that of peers in certain decision-making contexts. 相似文献

16.

Model-based hierarchical reinforcement learning and human action control

Matthew Botvinick Ari Weinstein 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2014,369(1655)

Recent work has reawakened interest in goal-directed or ‘model-based’ choice, where decisions are based on prospective evaluation of potential action outcomes. Concurrently, there has been growing attention to the role of hierarchy in decision-making and action control. We focus here on the intersection between these two areas of interest, considering the topic of hierarchical model-based control. To characterize this form of action control, we draw on the computational framework of hierarchical reinforcement learning, using this to interpret recent empirical findings. The resulting picture reveals how hierarchical model-based mechanisms might play a special and pivotal role in human decision-making, dramatically extending the scope and complexity of human behaviour. 相似文献

17.

Modeling changes in probabilistic reinforcement learning during adolescence

Liyu Xia Sarah L. Master Maria K. Eckstein Beth Baribault Ronald E. Dahl Linda Wilbrecht Anne Gabrielle Eva Collins 《PLoS computational biology》2021,17(7)

In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth because they have less experience to draw from than adults. Some studies suggest probabilistic learning may be inefficient in youths compared to adults, while others suggest it may be more efficient in youths in mid adolescence. Here we used a probabilistic reinforcement learning task to test how youth age 8-17 (N = 187) and adults age 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through early-twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants’ performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by 1) an increase in learning rate (i.e. decrease in integration time scale); 2) a decrease in noisy/exploratory choices. In mid-adolescence age 13-15, salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development. 相似文献

18.

Identifying the Basal Ganglia Network Model Markers for Medication-Induced Impulsivity in Parkinson's Disease Patients

Pragathi Priyadharsini Balasubramani V. Srinivasa Chakravarthy Manal Ali Balaraman Ravindran Ahmed A. Moustafa 《PloS one》2015,10(6)

Impulsivity, i.e. irresistibility in the execution of actions, may be prominent in Parkinson''s disease (PD) patients who are treated with dopamine precursors or dopamine receptor agonists. In this study, we combine clinical investigations with computational modeling to explore whether impulsivity in PD patients on medication may arise as a result of abnormalities in risk, reward and punishment learning. In order to empirically assess learning outcomes involving risk, reward and punishment, four subject groups were examined: healthy controls, ON medication PD patients with impulse control disorder (PD-ON ICD) or without ICD (PD-ON non-ICD), and OFF medication PD patients (PD-OFF). A neural network model of the Basal Ganglia (BG) that has the capacity to predict the dysfunction of both the dopaminergic (DA) and the serotonergic (5HT) neuromodulator systems was developed and used to facilitate the interpretation of experimental results. In the model, the BG action selection dynamics were mimicked using a utility function based decision making framework, with DA controlling reward prediction and 5HT controlling punishment and risk predictions. The striatal model included three pools of Medium Spiny Neurons (MSNs), with D1 receptor (R) alone, D2R alone and co-expressing D1R-D2R. Empirical studies showed that reward optimality was increased in PD-ON ICD patients while punishment optimality was increased in PD-OFF patients. Empirical studies also revealed that PD-ON ICD subjects had lower reaction times (RT) compared to that of the PD-ON non-ICD patients. Computational modeling suggested that PD-OFF patients have higher punishment sensitivity, while healthy controls showed comparatively higher risk sensitivity. A significant decrease in sensitivity to punishment and risk was crucial for explaining behavioral changes observed in PD-ON ICD patients. Our results highlight the power of computational modelling for identifying neuronal circuitry implicated in learning, and its impairment in PD. The results presented here not only show that computational modelling can be used as a valuable tool for understanding and interpreting clinical data, but they also show that computational modeling has the potential to become an invaluable tool to predict the onset of behavioral changes during disease progression. 相似文献

19.

Computational models of reinforcement learning: the role of dopamine as a reward signal

Samson RD Frank MJ Fellous JM 《Cognitive neurodynamics》2010,4(2):91-105

Reinforcement learning is ubiquitous. Unlike other forms of learning, it involves the processing of fast yet content-poor feedback information to correct assumptions about the nature of a task or of a set of stimuli. This feedback information is often delivered as generic rewards or punishments, and has little to do with the stimulus features to be learned. How can such low-content feedback lead to such an efficient learning paradigm? Through a review of existing neuro-computational models of reinforcement learning, we suggest that the efficiency of this type of learning resides in the dynamic and synergistic cooperation of brain systems that use different levels of computations. The implementation of reward signals at the synaptic, cellular, network and system levels give the organism the necessary robustness, adaptability and processing speed required for evolutionary and behavioral success. 相似文献

20.

Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood

Pedram Daee Maryam S. Mirian Majid Nili Ahmadabadi 《PloS one》2014,9(7)

In a multisensory task, human adults integrate information from different sensory modalities -behaviorally in an optimal Bayesian fashion- while children mostly rely on a single sensor modality for decision making. The reason behind this change of behavior over age and the process behind learning the required statistics for optimal integration are still unclear and have not been justified by the conventional Bayesian modeling. We propose an interactive multisensory learning framework without making any prior assumptions about the sensory models. In this framework, learning in every modality and in their joint space is done in parallel using a single-step reinforcement learning method. A simple statistical test on confidence intervals on the mean of reward distributions is used to select the most informative source of information among the individual modalities and the joint space. Analyses of the method and the simulation results on a multimodal localization task show that the learning system autonomously starts with sensory selection and gradually switches to sensory integration. This is because, relying more on modalities -i.e. selection- at early learning steps (childhood) is more rewarding than favoring decisions learned in the joint space since, smaller state-space in modalities results in faster learning in every individual modality. In contrast, after gaining sufficient experiences (adulthood), the quality of learning in the joint space matures while learning in modalities suffers from insufficient accuracy due to perceptual aliasing. It results in tighter confidence interval for the joint space and consequently causes a smooth shift from selection to integration. It suggests that sensory selection and integration are emergent behavior and both are outputs of a single reward maximization process; i.e. the transition is not a preprogrammed phenomenon. 相似文献