PLOS ONE, 2014

Value Learning and Arousal in the Extinction of Probabilistic Rewards: The Role of Dopamine in a Modified Temporal Difference Model

DOI: 10.1371/journal.pone.0089494


Abstract:

Because most rewarding events are probabilistic and changing, the extinction of probabilistic rewards is important for survival. It has been proposed that the extinction of probabilistic rewards depends on arousal and the amount of learning of reward values. Midbrain dopamine neurons have been suggested to play a role in both arousal and the learning of reward values. Despite extensive research on modeling dopaminergic activity in reward learning (e.g., temporal difference models), few studies have modeled its role in arousal. Although temporal difference models capture key characteristics of dopaminergic activity during the extinction of deterministic rewards, they have been less successful at simulating the extinction of probabilistic rewards. By adding an arousal signal to a temporal difference model, we were able to simulate the extinction of probabilistic rewards and its dependence on the amount of learning. Our simulations suggest that arousal allows the probability of reward to have lasting effects on the updating of reward value, which slows the extinction of low-probability rewards. Using this model, we predicted that, by signaling the prediction error, dopamine determines the learned reward value that has to be extinguished during extinction and participates in regulating the size of the arousal signal that controls the learning rate. These predictions were supported by pharmacological experiments in rats.
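
The sketch below illustrates, in schematic form, the kind of mechanism the abstract describes: a standard temporal difference value update whose effective learning rate is scaled by an arousal signal. The function simulate_extinction, the parameter values, and the specific arousal dynamics used here (a running average of the unsigned prediction error, in the spirit of Pearce-Hall associability) are illustrative assumptions and are not taken from the paper's equations.

# Illustrative sketch only: a TD-style value update with an arousal-scaled
# learning rate. The arousal dynamics below are an assumption made for this
# example, not the paper's actual model.

import numpy as np

def simulate_extinction(p_reward=0.25, n_acquisition=200, n_extinction=100,
                        eta=0.1, gamma_arousal=0.1, seed=0):
    """Acquisition under probabilistic reward, followed by extinction (no reward)."""
    rng = np.random.default_rng(seed)
    value = 0.0     # learned reward value V
    arousal = 1.0   # arousal signal that scales the effective learning rate
    history = []

    # Reward sequence: probabilistic during acquisition, zero during extinction.
    rewards = np.concatenate([
        (rng.random(n_acquisition) < p_reward).astype(float),
        np.zeros(n_extinction),
    ])

    for r in rewards:
        delta = r - value                # prediction error (dopamine-like signal)
        value += eta * arousal * delta   # arousal scales the value update
        # Arousal tracks the recent unsigned prediction error, so the effective
        # learning rate (eta * arousal) reflects how surprising outcomes have been.
        arousal += gamma_arousal * (abs(delta) - arousal)
        history.append(value)
    return np.array(history)

if __name__ == "__main__":
    for p in (1.0, 0.25):
        trace = simulate_extinction(p_reward=p)
        print(f"P(reward) = {p}: value after extinction = {trace[-1]:.3f}")

The design point this sketch shares with the abstract is that the prediction error plays two roles: it updates the learned reward value, and it feeds the arousal term that controls the learning rate.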
