Publications by authors named "Stefano Palminteri"

41 Publications

The elusive effects of incidental anxiety on reinforcement-learning.

J Exp Psychol Learn Mem Cogn 2021 Sep 13. Epub 2021 Sep 13.

Center for Research in Experimental Economics and political Decision Making (CREED), Amsterdam School of Economics (ASE), Universiteit van Amsterdam.

Anxiety is a common affective state, characterized by the subjectively unpleasant feeling of dread over an anticipated event. Anxiety is suspected to have important negative consequences on cognition, decision-making, and learning. Yet, despite a recent surge in studies investigating the specific effects of anxiety on reinforcement-learning, no coherent picture has emerged. Here, we investigated the effects of incidental anxiety on instrumental reinforcement-learning, while addressing several issues and shortcomings identified in a focused literature review. We used a rich experimental design, featuring both a learning and a transfer phase, and a manipulation of outcome valence (gains vs losses). In two variants (N = 2 × 50) of this experimental paradigm, incidental anxiety was induced with an established threat-of-shock paradigm. Model-free results show that the effects of incidental anxiety seem limited to a small, but specific, increase in post-learning performance measured in a transfer task. A comprehensive modeling effort revealed that, irrespective of the effects of anxiety, individuals give more weight to positive than negative outcomes, and tend to experience the omission of a loss as a gain (and vice versa). However, in line with results from our targeted literature survey, isolating specific computational effects of anxiety on learning per se proved challenging. Overall, our results suggest that learning mechanisms are more complex than traditionally presumed, and raise important concerns about the robustness of the effects of anxiety previously identified in simple reinforcement-learning studies.
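The two modelling observations above (greater weight on positive than negative outcomes, and loss omissions experienced as relative gains) can be illustrated with a minimal Rescorla-Wagner-style sketch. Everything here, the learning rates, the fixed context reference of -0.5, and the function name, is an illustrative assumption, not the paper's fitted model:

```python
# Valence-asymmetric value update: separate learning rates for positive
# and negative prediction errors. Parameter values are illustrative.

def update_value(value, outcome, alpha_pos=0.4, alpha_neg=0.2):
    """Rescorla-Wagner update weighting positive prediction errors more."""
    prediction_error = outcome - value
    alpha = alpha_pos if prediction_error > 0 else alpha_neg
    return value + alpha * prediction_error

# "Omission of a loss experienced as a gain": in a loss context, an avoided
# loss (0 instead of -1) enters the update as a relative gain once outcomes
# are referenced to the (assumed) context average of -0.5.
v = 0.0
for outcome in [-1, 0, -1, 0, 0]:        # loss context: punished or spared
    relative_outcome = outcome - (-0.5)  # re-referencing to the context
    v = update_value(v, relative_outcome)
    print(round(v, 3))
```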
Source: http://dx.doi.org/10.1037/xlm0001033
September 2021

Responses to Heartbeats in Ventromedial Prefrontal Cortex Contribute to Subjective Preference-Based Decisions.

J Neurosci 2021 Jun 29;41(23):5102-5114. Epub 2021 Apr 29.

Laboratoire de Neurosciences Cognitives et Computationnelles, Ecole Normale Supérieure, PSL University, 75005 Paris, France

Preference-based decisions are subjective and entail self-reflection. However, these self-related features are unaccounted for by known neural mechanisms of valuation and choice. Self-related processes have been linked to a basic interoceptive biological mechanism, the neural monitoring of heartbeats, in particular in ventromedial prefrontal cortex (vmPFC), a region also involved in value encoding. We thus hypothesized a functional coupling between the neural monitoring of heartbeats and the precision of value encoding in vmPFC. Human participants of both sexes were presented with pairs of movie titles. They indicated either which movie they preferred or performed a control objective visual discrimination that did not require self-reflection. Using magnetoencephalography, we measured heartbeat-evoked responses (HERs) before option presentation and confirmed that HERs in vmPFC were larger when preparing for the subjective, self-related task. We retrieved the expected cortical value network during choice with time-resolved statistical modeling. Crucially, we show that larger HERs before option presentation are followed by stronger value encoding during choice in vmPFC. This effect is independent of overall vmPFC baseline activity. The neural interaction between HERs and value encoding predicted preference-based choice consistency over time, accounting for both interindividual differences and trial-to-trial fluctuations within individuals. Neither cardiac activity nor arousal fluctuations could account for any of the effects. HERs did not interact with the encoding of perceptual evidence in the discrimination task. Our results show that the self-reflection underlying preference-based decisions involves HERs, and that HER integration into subjective value encoding in vmPFC contributes to preference stability. Deciding which of two movies you prefer is based on subjective values, which only you, the decision-maker, can estimate and compare by asking yourself. Yet, how self-reflection is biologically implemented and how it contributes to subjective valuation are not known. We show that in ventromedial prefrontal cortex, the neural response to heartbeats, an interoceptive self-related process, influences the cortical representation of subjective value. The neural interaction between the cortical monitoring of heartbeats and value encoding predicts choice consistency (i.e., whether you consistently prefer the same movie over time). Our results pave the way for the quantification of self-related processes in decision-making and may shed new light on the relationship between maladaptive decisions and impaired interoception.
Source: http://dx.doi.org/10.1523/JNEUROSCI.1932-20.2021
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8197644
June 2021

Two sides of the same coin: Beneficial and detrimental consequences of range adaptation in human reinforcement learning.

Sci Adv 2021 Apr 2;7(14). Epub 2021 Apr 2.

Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et Recherche Médicale, 29 rue d'Ulm, 75005 Paris, France.

Evidence suggests that economic values are rescaled as a function of the range of the available options. Although locally adaptive, range adaptation has been shown to lead to suboptimal choices, particularly notable in reinforcement learning (RL) situations when options are extrapolated from their original context to a new one. Range adaptation can be seen as the result of an adaptive coding process aiming at increasing the signal-to-noise ratio. However, this hypothesis leads to a counterintuitive prediction: Decreasing task difficulty should increase range adaptation and, consequently, extrapolation errors. Here, we tested the paradoxical relation between range adaptation and performance in a large sample of participants performing variants of an RL task, where we manipulated task difficulty. Results confirmed that range adaptation induces systematic extrapolation errors and is stronger when decreasing task difficulty. Last, we propose a range-adapting model and show that it is able to parsimoniously capture all the behavioral results.
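The range-adaptation computation at the heart of this account can be illustrated as a simple min-max normalization. The sketch below is a deliberately stripped-down assumption, not the published range-adapting model:

```python
# Sketch of range adaptation: outcomes are rescaled by the range of the
# current learning context, so values learned in a small-stakes context can
# be misranked against a large-stakes one when options are extrapolated.

def range_adapt(outcome, r_min, r_max):
    """Normalize an objective outcome to the [0, 1] range of its context."""
    return (outcome - r_min) / (r_max - r_min)

# Learning: 1 point is the best outcome of a small-stakes context...
small_best = range_adapt(1.0, r_min=0.0, r_max=1.0)    # -> 1.0
# ...while 10 points is the best outcome of a large-stakes context.
large_best = range_adapt(10.0, r_min=0.0, r_max=10.0)  # -> 1.0

# Extrapolation error: both options look equally good on the adapted scale,
# even though one is objectively worth ten times more.
print(small_best == large_best)  # True
```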
Source: http://dx.doi.org/10.1126/sciadv.abe0340
April 2021

Aberrant Striatal Value Representation in Huntington's Disease Gene Carriers 25 Years Before Onset.

Biol Psychiatry Cogn Neurosci Neuroimaging 2021 09 11;6(9):910-918. Epub 2021 Jan 11.

Huntington's Disease Centre, University College London Queen Square Institute of Neurology, University College London, London, United Kingdom; Wellcome Centre for Human Neuroimaging, University College London Queen Square Institute of Neurology, University College London, London, United Kingdom.

Background: In this study, we asked whether differences in striatal activity during a reinforcement learning (RL) task with gain and loss domains could be one of the earliest functional imaging features associated with carrying the Huntington's disease (HD) gene. Based on previous work, we hypothesized that HD gene carriers would show either neural or behavioral asymmetry between gain and loss learning.

Methods: We recruited 35 HD gene carriers, expected to demonstrate onset of motor symptoms in an average of 26 years, and 35 well-matched gene-negative control subjects. Participants were placed in a functional magnetic resonance imaging scanner, where they completed an RL task in which they were required to learn to choose between abstract stimuli with the aim of gaining rewards and avoiding losses. Task behavior was modeled using an RL model, and variables from this model were used to probe functional magnetic resonance imaging data.

Results: In comparison with well-matched control subjects, gene carriers more than 25 years from motor onset showed exaggerated striatal responses to gain-predicting stimuli compared with loss-predicting stimuli (p = .002) in our RL task. Using computational analysis, we also found group differences in striatal representation of stimulus value (p = .0004). We found no group differences in behavior, cognitive scores, or caudate volumes.

Conclusions: Behaviorally, gene carriers 9 years from predicted onset have been shown to learn better from gains than from losses. Our data suggest that a window exists in which HD-related functional neural changes are detectable long before associated behavioral change and 25 years before predicted motor onset. These represent the earliest functional imaging differences between HD gene carriers and control subjects.
Source: http://dx.doi.org/10.1016/j.bpsc.2020.12.015
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8423628
September 2021

The description-experience gap: a challenge for the neuroeconomics of decision-making under uncertainty.

Philos Trans R Soc Lond B Biol Sci 2021 03 11;376(1819):20190665. Epub 2021 Jan 11.

Laboratoire de Neurosciences Cognitives et Computationnelles, Ecole Normale Supérieure, Institut National de la Santé et Recherche Médicale, Université de Recherche Paris Sciences et Lettres, Paris, France.

The experimental investigation of decision-making in humans relies on two distinct types of paradigms, involving either description- or experience-based choices. In description-based paradigms, decision variables (i.e. payoffs and probabilities) are explicitly communicated by means of symbols. In experience-based paradigms, decision variables are learnt from trial-by-trial feedback. In the decision-making literature, the 'description-experience gap' refers to the fact that different biases are observed in the two experimental paradigms. Remarkably, well-documented biases of description-based choices, such as the under-weighting of rare events and loss aversion, do not apply to experience-based decisions. Here, we argue that the description-experience gap represents a major challenge, not only to current decision theories, but also to the neuroeconomics research framework, which relies heavily on the translation of neurophysiological findings between human and non-human primate research. In fact, most non-human primate neurophysiological research relies on behavioural designs that share features of both description- and experience-based choices. As a consequence, it is unclear whether the neural mechanisms built from non-human primate electrophysiology should be linked to description-based or experience-based decision-making processes. The picture is further complicated by additional methodological gaps between human and non-human primate neuroscience research. After analysing these methodological challenges, we conclude by proposing new lines of research to address them. This article is part of the theme issue 'Existence and prevalence of economic behaviours among non-human primates'.
Source: http://dx.doi.org/10.1098/rstb.2019.0665
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7815421
March 2021

A Causal Role for the Pedunculopontine Nucleus in Human Instrumental Learning.

Curr Biol 2021 03 21;31(5):943-954.e5. Epub 2020 Dec 21.

Motivation, Brain and Behavior (MBB) laboratory, Paris Brain Institute (ICM), Groupe Hospitalier Pitié-Salpêtrière, Paris 75013, France; INSERM Unit 1127, CNRS Unit 7225, Sorbonne Universités (SU), Paris 75005, France.

A critical mechanism for maximizing reward is instrumental learning. In standard instrumental learning models, action values are updated on the basis of reward prediction errors (RPEs), defined as the discrepancy between expectations and outcomes. A wealth of evidence across species and experimental techniques has established that RPEs are signaled by midbrain dopamine neurons. However, the way dopamine neurons receive information about reward outcomes remains poorly understood. Recent animal studies suggest that the pedunculopontine nucleus (PPN), a small brainstem structure considered a locomotor center, is sensitive to reward and sends excitatory projections to dopaminergic nuclei. Here, we examined the hypothesis that the PPN could contribute to reward learning in humans. To this aim, we leveraged a clinical protocol that assessed the therapeutic impact of PPN deep-brain stimulation (DBS) in three patients with Parkinson disease. PPN local field potentials (LFPs), recorded while patients performed an instrumental learning task, showed a specific response to reward outcomes in a low-frequency (alpha-beta) band. Moreover, PPN DBS selectively improved learning from rewards but not from punishments, a pattern that is typically observed following dopaminergic treatment. Computational analyses indicated that the effect of PPN DBS on instrumental learning was best captured by an increase in subjective reward sensitivity. Taken together, these results support a causal role for PPN-mediated reward signals in human instrumental learning.
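The winning computational account, DBS captured as an increase in subjective reward sensitivity, can be sketched as a multiplicative scaling of gains before the prediction-error update. The form and parameter values below are illustrative assumptions:

```python
# Q-learning with a subjective reward-sensitivity parameter (rho) that
# scales gains but not losses before the update. Values are illustrative.

def q_update(q, outcome, alpha=0.3, rho_reward=1.5, rho_punish=1.0):
    subjective = rho_reward * outcome if outcome > 0 else rho_punish * outcome
    return q + alpha * (subjective - q)

q_gain, q_loss = 0.0, 0.0
for _ in range(20):
    q_gain = q_update(q_gain, +1.0)   # reward learning is amplified...
    q_loss = q_update(q_loss, -1.0)   # ...punishment learning is unchanged
print(round(q_gain, 2), round(q_loss, 2))
```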
Source: http://dx.doi.org/10.1016/j.cub.2020.11.042
March 2021

The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning.

PLoS Biol 2020 12 8;18(12):e3001028. Epub 2020 Dec 8.

Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, Paris, France.

While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists in transiently biasing the learner's action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator's value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator's actions directly affect the learner's value function. We tested these three hypotheses in 2 experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement learning task. We show through model comparison and model simulation that VS provides the best explanation of learners' behavior. These results were replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators' choices and found that learners were able to adapt their imitation rate, so that only skilled demonstrators were imitated. We proposed and tested an efficient meta-learning process to account for this effect, whereby imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights and perspectives on the computational mechanisms underlying adaptive imitation in human reinforcement learning.
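A minimal sketch of the winning value-shaping account: the demonstrator's action injects a pseudo-reward into the learner's value function, rather than merely biasing action selection as under decision biasing. The pseudo-reward omega, the softmax temperature, and all names are illustrative assumptions:

```python
# Value shaping (VS): demonstrated actions receive a pseudo-reward bonus in
# the learner's value function. Parameter values are illustrative.

import math

def vs_update(q, action, reward, demo_action, alpha=0.3, omega=0.5):
    """Ordinary RL update plus imitation treated as a pseudo-reward."""
    q = dict(q)
    q[action] += alpha * (reward - q[action])           # outcome update
    q[demo_action] += alpha * (omega - q[demo_action])  # imitation update
    return q

def softmax_choice_prob(q, action, beta=3.0):
    z = sum(math.exp(beta * v) for v in q.values())
    return math.exp(beta * q[action]) / z

q = {"left": 0.0, "right": 0.0}
q = vs_update(q, action="left", reward=1.0, demo_action="right")
# The demonstrated (unchosen) action gained value, not just choice bias:
print(q, round(softmax_choice_prob(q, "right"), 2))
```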
Source: http://dx.doi.org/10.1371/journal.pbio.3001028
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7723279
December 2020

Robust valence-induced biases on motor response and confidence in human reinforcement learning.

Cogn Affect Behav Neurosci 2020 12;20(6):1184-1199

Laboratory for Behavioral Neurology and Imaging of Cognition (LabNIC), Department of Basic Neurosciences, University of Geneva, Campus Biotech, 9 Chemin des Mines, 1202, Geneva, Switzerland.

In simple instrumental-learning tasks, humans learn to seek gains and to avoid losses equally well. Yet, two effects of valence are observed. First, decisions in loss-contexts are slower. Second, loss contexts decrease individuals' confidence in their choices. Whether these two effects are two manifestations of a single mechanism or whether they can be partially dissociated is unknown. Across six experiments, we attempted to disrupt the valence-induced motor bias effects by manipulating the mapping between decisions and actions and imposing constraints on response times (RTs). Our goal was to assess the presence of the valence-induced confidence bias in the absence of the RT bias. We observed both motor and confidence biases despite our disruption attempts, establishing that the effects of valence on motor and metacognitive responses are very robust and replicable. Nonetheless, within- and between-individual inferences reveal that the confidence bias resists the disruption of the RT bias. Therefore, although concomitant in most cases, valence-induced motor and confidence biases seem to be partly dissociable. These results highlight new important mechanistic constraints that should be incorporated in learning models to jointly explain choice, reaction times and confidence.
Source: http://dx.doi.org/10.3758/s13415-020-00826-0
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7716860
December 2020

Information about action outcomes differentially affects learning from self-determined versus imposed choices.

Nat Hum Behav 2020 10 3;4(10):1067-1079. Epub 2020 Aug 3.

Laboratoire de Neurosciences Cognitives et Computationnelles, Département d'Études Cognitives, École Normale Supérieure, INSERM, PSL University, Paris, France.

The valence of new information influences learning rates in humans: good news tends to receive more weight than bad news. We investigated this learning bias in four experiments, by systematically manipulating the source of required action (free versus forced choices), outcome contingencies (low versus high reward) and motor requirements (go versus no-go choices). Analysis of model-estimated learning rates showed that the confirmation bias in learning rates was specific to free choices, but was independent of outcome contingencies. The bias was also unaffected by the motor requirements, thus suggesting that it operates in the representational space of decisions, rather than motoric actions. Finally, model simulations revealed that learning rates estimated from the choice-confirmation model had the effect of maximizing performance across low- and high-reward environments. We therefore suggest that choice-confirmation bias may be adaptive for efficient learning of action-outcome contingencies, above and beyond fostering person-level dispositions such as self-esteem.
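The choice-confirmation asymmetry described above can be sketched with learning rates that depend jointly on prediction-error valence and on whether the choice was free. The rates below are illustrative, not the estimated values:

```python
# Choice-confirmation model sketch: confirmatory outcomes of free choices
# are weighted more; forced choices are updated symmetrically.

def confirmation_update(value, outcome, free_choice,
                        alpha_conf=0.4, alpha_disc=0.1, alpha_forced=0.25):
    delta = outcome - value
    if not free_choice:
        alpha = alpha_forced   # no bias when the choice was imposed
    elif delta > 0:
        alpha = alpha_conf     # confirming outcome: weighted more
    else:
        alpha = alpha_disc     # disconfirming outcome: weighted less
    return value + alpha * delta

v_free = v_forced = 0.0
for outcome in [1, 0, 1, 1, 0]:
    v_free = confirmation_update(v_free, outcome, free_choice=True)
    v_forced = confirmation_update(v_forced, outcome, free_choice=False)
print(round(v_free, 2), round(v_forced, 2))  # free choices end up valued higher
```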
Source: http://dx.doi.org/10.1038/s41562-020-0919-5
October 2020

The Effect of Counterfactual Information on Outcome Value Coding in Medial Prefrontal and Cingulate Cortex: From an Absolute to a Relative Neural Code.

J Neurosci 2020 04 10;40(16):3268-3277. Epub 2020 Mar 10.

Center for Mind/Brain Sciences-CIMeC, University of Trento, Mattarello 38123, Italy.

Adaptive coding of stimuli is well documented in perception, where it supports efficient encoding over a broad range of possible percepts. Recently, a similar neural mechanism has also been reported in value-based decision-making, where it allows optimal encoding of vast ranges of values in PFC: neuronal response to value depends on the choice context (relative coding), rather than being invariant across contexts (absolute coding). Additionally, value learning is sensitive to the amount of feedback information: providing complete feedback (both obtained and forgone outcomes) instead of partial feedback (only the obtained outcome) improves learning. However, it is unclear whether relative coding occurs in all PFC regions and how it is affected by feedback information. We systematically investigated univariate and multivariate feedback encoding in various mPFC regions and compared three modes of neural coding: absolute, partially-adaptive, and fully-adaptive. Twenty-eight human participants (both sexes) performed a learning task while undergoing fMRI scanning. On each trial, they chose between two symbols associated with a certain outcome. Then, the decision outcome was revealed. Notably, in one-half of the trials participants received partial feedback, whereas in the other half they got complete feedback. We used univariate and multivariate analyses to explore value encoding in the different feedback conditions. We found that both obtained and forgone outcomes were encoded in mPFC, but with opposite signs in its ventral and dorsal subdivisions. Moreover, we showed that increasing feedback information induced a switch from absolute to relative coding. Our results suggest that complete feedback information enhances context-dependent outcome encoding. This study offers a systematic investigation of the effect of the amount of feedback information (partial vs complete) on univariate and multivariate outcome value encoding, within multiple regions in mPFC and cingulate cortex that are critical for value-based decisions and behavioral adaptation. Moreover, we provide the first comparison of three possible models of neural coding (i.e., absolute, partially-adaptive, and fully-adaptive coding) of the value signal in these regions, using commensurable measures of prediction accuracy. Taken together, our results help build a more comprehensive picture of how the human brain encodes and processes outcome value. In particular, our results suggest that simultaneous presentation of obtained and forgone outcomes promotes relative value representation.
Source: http://dx.doi.org/10.1523/JNEUROSCI.1712-19.2020
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7159892
April 2020

Temporal chunking as a mechanism for unsupervised learning of task-sets.

Elife 2020 03 9;9. Epub 2020 Mar 9.

Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Sante et de la Recherche Medicale, Paris, France.

Depending on environmental demands, humans can learn and exploit multiple concurrent sets of stimulus-response associations. Mechanisms underlying the learning of such task-sets remain unknown. Here we investigate the hypothesis that task-set learning relies on unsupervised chunking of stimulus-response associations that occur in temporal proximity. We examine behavioral and neural data from a task-set learning experiment using a network model. We first show that task-set learning can be achieved provided the timescale of chunking is slower than the timescale of stimulus-response learning. Fitting the model to behavioral data on a subject-by-subject basis confirmed this expectation and led to specific predictions linking chunking and task-set retrieval that were borne out by behavioral performance and reaction times. Comparing the model activity with BOLD signal allowed us to identify neural correlates of task-set retrieval in a functional network involving ventral and dorsal prefrontal cortex, with the dorsal system preferentially engaged when retrievals are used to improve performance.
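The central constraint, chunking slower than stimulus-response learning, can be sketched with two Hebbian-style strengthening processes running at different rates. Names and rates are illustrative assumptions, not the fitted network model:

```python
# Two-timescale sketch: fast learning of single stimulus-response
# associations plus slower chunking of associations that co-occur in time,
# so a whole task-set can later be retrieved from any one of its members.

associations = {}   # (stimulus, response) -> strength
chunks = {}         # frozenset of two (stimulus, response) pairs -> strength
ALPHA_FAST, ALPHA_SLOW = 0.5, 0.05   # chunking must be the slower process

def strengthen(table, key, rate):
    table[key] = table.get(key, 0.0) + rate * (1.0 - table.get(key, 0.0))

def observe(pair, recent_pairs):
    strengthen(associations, pair, ALPHA_FAST)
    for other in recent_pairs:       # temporal proximity drives chunking
        strengthen(chunks, frozenset([pair, other]), ALPHA_SLOW)

task_set = [("A", 1), ("B", 2), ("C", 3)]
for _ in range(50):                  # the same set recurs across trials
    for i, pair in enumerate(task_set):
        observe(pair, task_set[:i])
print(max(chunks.values()))          # strong links now bind the task-set
```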
Source: http://dx.doi.org/10.7554/eLife.50469
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7108869
March 2020

Computational noise in reward-guided learning drives behavioral variability in volatile environments.

Nat Neurosci 2019 12 28;22(12):2066-2077. Epub 2019 Oct 28.

Laboratoire de Neurosciences Cognitives et Computationnelles, Inserm U960, Département d'Études Cognitives, École Normale Supérieure, PSL University, Paris, France.

When learning the value of actions in volatile environments, humans often make seemingly irrational decisions that fail to maximize expected value. We reasoned that these 'non-greedy' decisions, instead of reflecting information seeking during choice, may be caused by computational noise in the learning of action values. Here using reinforcement learning models of behavior and multimodal neurophysiological data, we show that the majority of non-greedy decisions stem from this learning noise. The trial-to-trial variability of sequential learning steps and their impact on behavior could be predicted both by blood oxygen level-dependent responses to obtained rewards in the dorsal anterior cingulate cortex and by phasic pupillary dilation, suggestive of neuromodulatory fluctuations driven by the locus coeruleus-norepinephrine system. Together, these findings indicate that most behavioral variability, rather than reflecting human exploration, is due to the limited computational precision of reward-guided learning.
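The learning-noise hypothesis can be sketched as a value update corrupted by zero-mean variability whose spread scales with the size of the update itself (a Weber-like law). The exact noise law and parameters below are illustrative assumptions:

```python
# Noisy learning sketch: apparent exploration arises from imprecise value
# updates, not from a deliberate choice policy. Parameters are illustrative.

import random

def noisy_update(value, outcome, alpha=0.3, zeta=0.5):
    delta = outcome - value
    noise = random.gauss(0.0, zeta * abs(alpha * delta))  # scales with update
    return value + alpha * delta + noise

random.seed(1)
v = 0.0
for outcome in [1, 1, 0, 1, 1, 0, 1]:
    v = noisy_update(v, outcome)
    print(round(v, 3))  # trajectory varies across runs with different seeds
```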
Source: http://dx.doi.org/10.1038/s41593-019-0518-9
December 2019

Cost-benefit trade-offs in decision-making and learning.

PLoS Comput Biol 2019 09 6;15(9):e1007326. Epub 2019 Sep 6.

Institut Jean Nicod, Département d'Études Cognitives, École Normale Supérieure, EHESS, CNRS, PSL University, Paris, France.

Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Moreover, engaging control to proactively suppress irrelevant information that could conflict with task-relevant information would presumably also be cognitively costly. Yet, it remains unclear whether the cognitive control demands involved in preventing and resolving conflict also constitute costs in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their free choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of investing cognitive control to suppress an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that free choices were more biased when participants were less sure about which action was more rewarding. This supports the hypothesis that the costs linked to conflict management were traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one's actions and external distractors. Our results show that the subjective cognitive control costs linked to conflict factor into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
Source: http://dx.doi.org/10.1371/journal.pcbi.1007326
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6750595
September 2019

Assessing inter-individual differences with task-related functional neuroimaging.

Nat Hum Behav 2019 09 26;3(9):897-905. Epub 2019 Aug 26.

Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, Paris, France.

Explaining and predicting individual behavioural differences induced by clinical and social factors constitutes one of the most promising applications of neuroimaging. In this Perspective, we discuss the theoretical and statistical foundations of the analyses of inter-individual differences in task-related functional neuroimaging. Leveraging a five-year literature review (July 2013-2018), we show that researchers often assess how activations elicited by a variable of interest differ between individuals. We argue that the rationale for such analyses, typically grounded in resource theory, offers excessive analytical and interpretational flexibility, which undermines their validity. We also recall how, in the established framework of the general linear model, inter-individual differences in behaviour can act as hidden moderators and spuriously induce differences in activations. We conclude with a set of recommendations and directions, which we hope will contribute to improving the statistical validity and the neurobiological interpretability of inter-individual difference analyses in task-related functional neuroimaging.
Source: http://dx.doi.org/10.1038/s41562-019-0681-8
September 2019

Depressive symptoms are associated with blunted reward learning in social contexts.

PLoS Comput Biol 2019 07 29;15(7):e1007224. Epub 2019 Jul 29.

Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, Paris, France.

Depression is characterized by a marked decrease in social interactions and blunted sensitivity to rewards. Surprisingly, despite the importance of social deficits in depression, non-social aspects have been disproportionately investigated. As a consequence, the cognitive mechanisms underlying atypical decision-making in social contexts in depression are poorly understood. In the present study, we investigate whether deficits in reward processing interact with the social context and how this interaction is affected by self-reported depression and anxiety symptoms in the general population. Two cohorts of subjects (discovery and replication sample: N = 50 each) took part in an experiment involving reward learning in contexts with different levels of social information (absent, partial and complete). Behavioral analyses revealed a specific detrimental effect of depressive symptoms, but not anxiety, on behavioral performance in the presence of social information, i.e. when participants were informed about the choices of another player. Model-based analyses further characterized the computational nature of this deficit as a negative audience effect, rather than a deficit in the way others' choices and rewards are integrated in decision making. To conclude, our results shed light on the cognitive and computational mechanisms underlying the interaction between social cognition, reward learning and decision-making in depressive disorders.
Source: http://dx.doi.org/10.1371/journal.pcbi.1007224
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6699715
July 2019

Decomposing the effects of context valence and feedback information on speed and accuracy during reinforcement learning: a meta-analytical approach using diffusion decision modeling.

Cogn Affect Behav Neurosci 2019 06;19(3):490-502

Amsterdam Brain and Cognition, Universiteit van Amsterdam, Amsterdam, The Netherlands.

Reinforcement learning (RL) models describe how humans and animals learn by trial-and-error to select actions that maximize rewards and minimize punishments. Traditional RL models focus exclusively on choices, thereby ignoring the interactions between choice preference and response time (RT), or how these interactions are influenced by contextual factors. However, in the field of perceptual decision-making, such interactions have proven to be important to dissociate between different underlying cognitive processes. Here, we investigated such interactions to shed new light on overlooked differences between learning to seek rewards and learning to avoid losses. We leveraged behavioral data from four RL experiments, which feature manipulations of two factors: outcome valence (gains vs. losses) and feedback information (partial vs. complete feedback). A Bayesian meta-analysis revealed that these contextual factors differently affect RTs and accuracy: While valence only affects RTs, feedback information affects both RTs and accuracy. To dissociate between the latent cognitive processes, we jointly fitted choices and RTs across all experiments with a Bayesian, hierarchical diffusion decision model (DDM). We found that the feedback manipulation affected drift rate, threshold, and non-decision time, suggesting that it was not a mere difficulty effect. Moreover, valence affected non-decision time and threshold, suggesting a motor inhibition in punishing contexts. To better understand the learning dynamics, we finally fitted a combination of RL and DDM (RLDDM). We found that while the threshold was modulated by trial-specific decision conflict, the non-decision time was modulated by the learned context valence. Overall, our results illustrate the benefits of jointly modeling RTs and choice data during RL, to reveal subtle mechanistic differences underlying decisions in different learning contexts.
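The RLDDM combination, learned values setting the drift of a bounded diffusion that jointly produces choices and response times, can be sketched with a basic Euler-Maruyama simulation. All parameter values are illustrative:

```python
# Diffusion decision sketch: the drift rate follows the learned value
# difference, and the bound crossing yields both choice and RT.

import math
import random

def ddm_trial(q_left, q_right, threshold=1.0, sigma=1.0, dt=0.001, v_scale=3.0):
    """Simulate one diffusion-to-bound decision driven by a value difference."""
    drift = v_scale * (q_left - q_right)   # drift follows the learned values
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
        t += dt
    return ("left" if x > 0 else "right"), t

random.seed(0)
choice, rt = ddm_trial(q_left=0.8, q_right=0.4)
print(choice, round(rt, 2))  # easier value discriminations finish faster
```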
Source: http://dx.doi.org/10.3758/s13415-019-00723-1
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6598978
June 2019

Contextual influence on confidence judgments in human reinforcement learning.

PLoS Comput Biol 2019 04 8;15(4):e1006973. Epub 2019 Apr 8.

CREED, Amsterdam School of Economics (ASE), Universiteit van Amsterdam, Amsterdam, the Netherlands.

The ability to correctly estimate the probability of one's choices being correct is fundamental to optimally re-evaluate previous choices or to arbitrate between different decision strategies. Experimental evidence nonetheless suggests that this metacognitive process, confidence judgment, is susceptible to numerous biases. Here, we investigate the effect of outcome valence (gains or losses) on confidence while participants learned stimulus-outcome associations by trial-and-error. In two experiments, participants were more confident in their choices when learning to seek gains compared to avoiding losses, despite equal difficulty and performance between those two contexts. Computational modelling revealed that this bias is driven by the context value, a dynamically updated estimate of the average expected value of the choice options, which is necessary to explain equal performance in the gain and loss domains. The biasing effect of context value on confidence, revealed here for the first time in a reinforcement-learning context, is therefore domain-general, with likely important functional consequences. We show that one such consequence emerges in volatile environments, where the (in)flexibility of individuals' learning strategies differs when outcomes are framed as gains or losses. Despite apparently similar behavior, profound asymmetries might therefore exist between learning to avoid losses and learning to seek gains.
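The reported bias can be illustrated by letting confidence depend on the context value over and above the value difference between options. The linear form and coefficients are illustrative assumptions, not fitted estimates:

```python
# Confidence sketch: the same value difference (same difficulty) yields
# higher confidence in a gain context than in a loss context, because the
# context value enters the confidence computation.

def confidence(q_chosen, q_unchosen, context_value, b0=0.5, b1=0.4, b2=0.3):
    raw = b0 + b1 * (q_chosen - q_unchosen) + b2 * context_value
    return min(1.0, max(0.0, raw))  # clip to a probability-like scale

# Identical value differences, different context averages:
print(confidence(0.75, 0.25, context_value=+0.5))    # gain context: higher
print(confidence(-0.25, -0.75, context_value=-0.5))  # loss context: lower
```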
Source: http://dx.doi.org/10.1371/journal.pcbi.1006973
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6472836
April 2019

Can We Infer Inter-Individual Differences in Risk-Taking From Behavioral Tasks?

Front Psychol 2018 21;9:2307. Epub 2018 Nov 21.

Laboratoire de Neurosciences Cognitives, Institut National de la Santé et de la Recherche Médicale, Paris, France.

Investigating the bases of inter-individual differences in risk-taking is necessary to refine our cognitive and neural models of decision-making and to ultimately counter risky behaviors in real-life policy settings. However, recent evidence suggests that behavioral tasks fare poorly compared with standard questionnaires in measuring individual differences in risk-taking. Crucially, using model-based measures of risk-taking does not seem to improve reliability. Here, we put forward two possible, and not mutually exclusive, explanations for these results and suggest future avenues of research to improve the assessment of inter-individual differences in risk-taking by combining repeated online testing and mechanistic computational models.
Source: http://dx.doi.org/10.3389/fpsyg.2018.02307
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6260002
November 2018

Contrasting temporal difference and opportunity cost reinforcement learning in an empirical money-emergence paradigm.

Proc Natl Acad Sci U S A 2018 12 15;115(49):E11446-E11454. Epub 2018 Nov 15.

Laboratoire de Neurosciences Cognitives, Institut National de la Santé et de la Recherche Médicale, 75005 Paris, France;

Money is a fundamental and ubiquitous institution in modern economies. However, the question of its emergence remains a central one for economists. The monetary search-theoretic approach studies the conditions under which commodity money emerges as a solution to override frictions inherent to interindividual exchanges in a decentralized economy. Among these conditions, agents' rationality is classically essential and a prerequisite to any theoretical monetary equilibrium; however, human subjects often fail to adopt optimal strategies in tasks implementing a search-theoretic paradigm when these strategies are speculative, i.e., when they involve the use of a costly medium of exchange to increase the probability of subsequent and successful trades. In the present work, we hypothesize that implementing such speculative behaviors relies on reinforcement learning instead of lifetime utility calculations, as supposed by classical economic theory. To test this hypothesis, we operationalized the Kiyotaki and Wright paradigm of money emergence in a multistep exchange task and fitted behavioral data from human subjects performing this task with two reinforcement learning models. Each model implements a distinct cognitive hypothesis regarding the weight of future or counterfactual rewards in current decisions. We found that both models outperformed theoretical predictions of subjects' behavior regarding the implementation of speculative strategies, and that the latter relies on the degree to which opportunity costs are considered in the learning process. Speculating about the marketability advantage of money thus seems to depend on mental simulations of counterfactual events that agents perform in exchange situations.
Source: http://dx.doi.org/10.1073/pnas.1813197115
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6298096
December 2018

The role of the striatum in linguistic selection: Evidence from Huntington's disease and computational modeling.

Cortex 2018 12 15;109:189-204. Epub 2018 Sep 15.

Département d'Etudes Cognitives, Ecole Normale Supérieure - PSL Research University, Paris, France; Equipe de NeuroPsychologie Interventionnelle, Institut National de la Santé et Recherche Médical (INSERM) U955, Equipe 01, Créteil, France; Université Paris Est, Faculté de Médecine, Créteil, France; Centre de référence maladie de Huntington, Hôpital Henri Mondor, AP-HP, Créteil, France.

Though accumulating evidence indicates that the striatum is recruited during language processing, the specific function of this subcortical structure in language remains to be elucidated. To answer this question, we used Huntington's disease (HD) as a model of striatal lesion. We investigated the morphological deficit of 30 early HD patients with a novel linguistic task that can be modeled within an explicit theory of linguistic computation. Behavioral results revealed an impairment in HD patients on the linguistic task. A computational model-based analysis compared the behavioral data to simulated data from two distinct lesion models, a selection-deficit model and a grammatical-deficit model. This analysis revealed that the impairment derives from increased randomness in the process of selecting between grammatical alternatives, rather than from a disruption of grammatical knowledge per se. Voxel-based morphometry allowed us to correlate this impairment with dorsal striatal degeneration. We thus show that the striatum plays a role in the selection of linguistic alternatives, just as it does in the selection of motor and cognitive programs.
Source: http://dx.doi.org/10.1016/j.cortex.2018.08.031
December 2018

Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences.

Nat Commun 2018 10 29;9(1):4503. Epub 2018 Oct 29.

Laboratoire de Neurosciences Cognitives Computationnelles, Institut National de la Santé et Recherche Médicale, 29 rue d'Ulm, 75005, Paris, France.

In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated in learning algorithms has received comparatively little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model comparison indicates that subjects' behavior is best accounted for by an algorithm which includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation progressively emerges, is favored by increasing outcome information, and is correlated with explicit understanding of the task structure. Finally, our data clearly show that, while being locally adaptive (for instance in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
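A minimal sketch of the two winning features, reference-point centering and range adaptation, applied to the outcome before the option-value update; the update rules and parameters are illustrative simplifications of the authors' model:

```python
# State-dependent valuation sketch: outcomes are centered on a learned
# context value and scaled by a learned context range before updating the
# option value. Parameter values and update rules are illustrative.

def relative_update(q, outcome, context_v, context_r, alpha=0.3, alpha_ctx=0.1):
    rel_outcome = (outcome - context_v) / max(context_r, 1e-6)
    q += alpha * (rel_outcome - q)                     # option-value update
    context_v += alpha_ctx * (outcome - context_v)     # reference point
    context_r += alpha_ctx * (abs(outcome - context_v) - context_r)  # range
    return q, context_v, context_r

q, v_ctx, r_ctx = 0.0, 0.0, 1.0
for outcome in [-0.1, -1.0, -0.1, -0.1, -1.0]:  # a negative-valence context
    q, v_ctx, r_ctx = relative_update(q, outcome, v_ctx, r_ctx)
# Values live on a centered, range-scaled context scale, which is why they
# misrank options when compared outside their original context:
print(round(q, 2), round(v_ctx, 2), round(r_ctx, 2))
```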
Source: http://dx.doi.org/10.1038/s41467-018-06781-2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6206161
October 2018

How the Level of Reward Awareness Changes the Computational and Electrophysiological Signatures of Reinforcement Learning.

J Neurosci 2018 11 16;38(48):10338-10348. Epub 2018 Oct 16.

Department of Psychology, University of Amsterdam, 1018 WT, Amsterdam, The Netherlands,

The extent to which subjective awareness influences reward processing, and thereby affects future decisions, is currently largely unknown. In the present report, we investigated this question in a reinforcement learning framework, combining perceptual masking, computational modeling, and electroencephalographic recordings (human male and female participants). Our results indicate that degrading the visibility of the reward decreased, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increased their tendency to repeat previous choices. We dissociated electrophysiological signatures evoked by the reward-based learning processes from those elicited by the reward-independent repetition of previous choices and showed that these neural activities were significantly modulated by reward visibility. Overall, this report sheds new light on the neural computations underlying reward-based learning and decision-making and highlights that awareness is beneficial for the trial-by-trial adjustment of decision-making strategies. The notion of reward is strongly associated with subjective evaluation, related to conscious processes such as "pleasure," "liking," and "wanting." Here we show that degrading reward visibility in a reinforcement learning task decreases, without completely obliterating, the ability of participants to learn from outcomes, but concurrently increases subjects' tendency to repeat previous choices. Electrophysiological recordings, in combination with computational modeling, show that neural activities were significantly modulated by reward visibility. Overall, we dissociate different neural computations underlying reward-based learning and decision-making, which highlights a beneficial role of reward awareness in adjusting decision-making strategies.
Source: http://dx.doi.org/10.1523/JNEUROSCI.0457-18.2018
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6596205
November 2018

Neural computations underpinning the strategic management of influence in advice giving.

Nat Commun 2017 12 19;8(1):2191. Epub 2017 Dec 19.

UCL Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London, WC1N 3AR, UK.

Research on social influence has focused mainly on the target of influence (e.g., consumer and voter); thus, the cognitive and neurobiological underpinnings of the source of the influence (e.g., politicians and salesmen) remain unknown. Here, in a three-sided advice-giving game, two advisers competed to influence a client by modulating their own confidence in their advice about which lottery the client should choose. We report that advisers' strategy depends on their level of influence on the client and their merit relative to one another. Moreover, blood-oxygenation-level-dependent (BOLD) signal in the temporo-parietal junction is modulated by adviser's current level of influence on the client, and relative merit prediction error affects activity in medial-prefrontal cortex. Both types of social information modulate ventral striatum response. By demonstrating what happens in our mind and brain when we try to influence others, these results begin to explain the biological mechanisms that shape inter-individual differences in social conduct.
Source: http://dx.doi.org/10.1038/s41467-017-02314-5
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5736665
December 2017

Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing.

PLoS Comput Biol 2017 Aug 11;13(8):e1005684. Epub 2017 Aug 11.

Institute of Cognitive Neuroscience, University College London, London, United Kingdom.

Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
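The resulting choice-confirmation pattern can be sketched with valence-dependent learning rates whose asymmetry is mirrored between the chosen (factual) and unchosen (counterfactual) option. The rates below are illustrative, not the estimated parameters:

```python
# Confirmation-bias sketch: positive prediction errors are weighted more
# for the chosen option, negative ones for the unchosen option, so that
# choice-confirming information dominates in both cases.

def biased_update(value, outcome, chosen, alpha_plus=0.4, alpha_minus=0.1):
    delta = outcome - value
    if chosen:   # factual: positive prediction errors weighted more
        alpha = alpha_plus if delta > 0 else alpha_minus
    else:        # counterfactual: negative prediction errors weighted more
        alpha = alpha_minus if delta > 0 else alpha_plus
    return value + alpha * delta

v_chosen = v_unchosen = 0.5
v_chosen = biased_update(v_chosen, outcome=1.0, chosen=True)       # confirming
v_unchosen = biased_update(v_unchosen, outcome=0.0, chosen=False)  # confirming
print(round(v_chosen, 2), round(v_unchosen, 2))  # both updates use the high rate
```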
Source: http://dx.doi.org/10.1371/journal.pcbi.1005684
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5568446
August 2017

Specific effect of a dopamine partial agonist on counterfactual learning: evidence from Gilles de la Tourette syndrome.

Sci Rep 2017 07 24;7(1):6292. Epub 2017 Jul 24.

Laboratoire de Neurosciences Cognitives, Institut National de la Santé et de la Recherche Médicale, Paris, France.

The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward-maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task that involves both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette (GTS) patients, one consisting of patients who were completely unmedicated and the other consisting of patients who were receiving aripiprazole monotherapy, and to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
Source: http://dx.doi.org/10.1038/s41598-017-06547-8
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5524760
July 2017

The Importance of Falsification in Computational Cognitive Modeling.

Trends Cogn Sci 2017 06 2;21(6):425-433. Epub 2017 May 2.

Laboratoire de Neurosciences Cognitives, Institut National de la Santé et de la Recherche Médicale, Paris, France; Institut d'Étude de la Cognition, Departement d'Études Cognitives, École Normale Supérieure, Paris, France.

In the past decade, the field of cognitive science has seen exponential growth in the number of computational modeling studies. Previous work has indicated why and how candidate models of cognition should be compared by trading off their ability to predict the observed data against their complexity. However, the importance of falsifying candidate models in light of the observed data has been largely underestimated, leading to important drawbacks and unjustified conclusions. We argue here that the simulation of candidate models is necessary to falsify models and thereby support the specific claims about cognitive function made by the vast majority of model-based studies. We propose practical guidelines for future research that combine model comparison and falsification.
Source: http://dx.doi.org/10.1016/j.tics.2017.03.011
June 2017

The Computational Development of Reinforcement Learning during Adolescence.

PLoS Comput Biol 2016 06 20;12(6):e1004953. Epub 2016 Jun 20.

Institute of Cognitive Neuroscience, University College London, London, United Kingdom.

Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
Source: http://dx.doi.org/10.1371/journal.pcbi.1004953
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920542
June 2016

Enhanced habit formation in Gilles de la Tourette syndrome.

Brain 2016 Feb 21;139(Pt 2):605-15. Epub 2015 Oct 21.

Sorbonne Universités, UPMC Univ Paris 06, UMR S 975, CNRS UMR 7225, ICM, F-75013, Paris, France; Assistance Publique-Hôpitaux de Paris, Department of Neurology, Groupe Hospitalier Pitié-Salpêtrière, 47 boulevard de l'Hôpital, F-75013, Paris, France; French Reference Centre for Gilles de la Tourette Syndrome, Groupe Hospitalier Pitié-Salpêtrière, 47 boulevard de l'Hôpital, F-75013, Paris, France.

Tics are sometimes described as voluntary movements performed in an automatic or habitual way. Here, we addressed the question of balance between goal-directed and habitual behavioural control in Gilles de la Tourette syndrome and formally tested the hypothesis of enhanced habit formation in these patients. To this aim, we administered a three-stage instrumental learning paradigm to 17 unmedicated and 17 antipsychotic-medicated patients with Gilles de la Tourette syndrome and matched controls. In the first stage of the task, participants learned stimulus-response-outcome associations. The subsequent outcome devaluation and 'slip-of-action' tests allowed evaluation of the participants' capacity to flexibly adjust their behaviour to changes in action outcome value. In this task, unmedicated patients relied predominantly on habitual, outcome-insensitive behavioural control. Moreover, in these patients, the engagement in habitual responses correlated with more severe tics. Medicated patients performed at an intermediate level between unmedicated patients and controls. Using diffusion tensor imaging on a subset of patients, we also addressed whether the engagement in habitual responding was related to structural connectivity within cortico-striatal networks. We showed that engagement in habitual behaviour in patients with Gilles de la Tourette syndrome correlated with greater structural connectivity within the right motor cortico-striatal network. In unmedicated patients, stronger structural connectivity of the supplementary motor cortex with the sensorimotor putamen predicted more severe tics. Overall, our results indicate enhanced habit formation in unmedicated patients with Gilles de la Tourette syndrome. Aberrant reinforcement signals to the sensorimotor striatum may be fundamental for the formation of stimulus-response associations and may contribute to the habitual behaviour and tics of this syndrome.
Source: http://dx.doi.org/10.1093/brain/awv307
February 2016

Contextual modulation of value signals in reward and punishment learning.

Nat Commun 2015 Aug 25;6:8096. Epub 2015 Aug 25.

Laboratoire de Neurosciences Cognitives (LNC), Département d'Etudes Cognitives (DEC), Institut National de la Santé et Recherche Médical (INSERM) U960, École Normale Supérieure (ENS), 75005 Paris, France.

Compared with reward seeking, punishment avoidance learning is less clearly understood at both the computational and neurobiological levels. Here we demonstrate, using computational modelling and fMRI in humans, that learning option values on a relative (context-dependent) scale offers a simple computational solution for avoidance learning. The context (or state) value sets the reference point to which an outcome should be compared before updating the option value. Consequently, in contexts with an overall negative expected value, successful punishment avoidance acquires a positive value, thus reinforcing the response. As revealed by post-learning assessment of option values, contextual influences are enhanced when subjects are informed about the result of the forgone alternative (counterfactual information). This is mirrored at the neural level by a shift in negative outcome encoding from the anterior insula to the ventral striatum, suggesting that value contextualization also limits the need to mobilize an opponent punishment learning system.
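The reference-point mechanism can be sketched in a few lines: outcomes are centered on a learned context value before the option-value update, so that avoided punishments generate positive teaching signals. The averaging rule and parameters are illustrative assumptions:

```python
# Context-dependent centering sketch: with complete feedback the context
# value tracks both outcomes, and an avoided loss (0 rather than -1) yields
# a positive relative outcome that reinforces the avoidance response.

alpha, alpha_ctx = 0.3, 0.15
q_avoid, v_ctx = 0.0, 0.0
trials = [(0.0, -1.0)] * 8   # (obtained, forgone): the loss is avoided

for obtained, forgone in trials:
    v_ctx += alpha_ctx * ((obtained + forgone) / 2 - v_ctx)  # context value
    relative = obtained - v_ctx               # reference-point subtraction
    q_avoid += alpha * (relative - q_avoid)

print(round(q_avoid, 2))  # positive: avoidance has acquired a positive value
```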
Source: http://dx.doi.org/10.1038/ncomms9096
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560823
August 2015

Learning to minimize efforts versus maximizing rewards: computational principles and neural correlates.

J Neurosci 2014 Nov;34(47):15621-30

Motivation, Brain and Behavior Laboratory, Neuroimaging Research Center, Brain and Spine Institute, INSERM U975, CNRS UMR 7225, UPMC-P6 UMR S 1127, 7561 Paris Cedex 13, France,

The mechanisms of reward maximization have been extensively studied at both the computational and neural levels. By contrast, little is known about how the brain learns to choose the options that minimize action cost. In principle, the brain could have evolved a general mechanism that applies the same learning rule to the different dimensions of choice options. To test this hypothesis, we scanned healthy human volunteers while they performed a probabilistic instrumental learning task that varied in both the physical effort and the monetary outcome associated with choice options. Behavioral data showed that the same computational rule, using prediction errors to update expectations, could account for both reward maximization and effort minimization. However, these learning-related variables were encoded in partially dissociable brain areas. In line with previous findings, the ventromedial prefrontal cortex was found to positively represent expected and actual rewards, regardless of effort. A separate network, encompassing the anterior insula, the dorsal anterior cingulate, and the posterior parietal cortex, correlated positively with expected and actual efforts. These findings suggest that the same computational rule is applied by distinct brain systems, depending on the choice dimension-cost or benefit-that has to be learned.
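The shared-rule hypothesis supported here can be sketched by running one prediction-error update over both dimensions of an option, expected reward and expected effort. The net-value combination and parameters are illustrative assumptions:

```python
# One learning rule, two choice dimensions: the same prediction-error
# update tracks expected reward and expected effort, which are then weighed
# against each other at choice time. Parameter values are illustrative.

def pe_update(expectation, observed, alpha=0.3):
    """Prediction-error update, applied to reward or to effort alike."""
    return expectation + alpha * (observed - expectation)

expected_reward, expected_effort = 0.0, 0.0
for reward, effort in [(1.0, 0.8), (1.0, 0.6), (0.0, 0.9), (1.0, 0.7)]:
    expected_reward = pe_update(expected_reward, reward)
    expected_effort = pe_update(expected_effort, effort)

net_value = expected_reward - expected_effort  # benefit minus cost
print(round(expected_reward, 2), round(expected_effort, 2), round(net_value, 2))
```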
Source: http://dx.doi.org/10.1523/JNEUROSCI.1350-14.2014
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6608437
November 2014