PLOS ONE  2014 

A Spiking Network Model of Decision Making Employing Rewarded STDP

DOI: 10.1371/journal.pone.0090821

Abstract:

Reward-modulated spike-timing-dependent plasticity (STDP) combines unsupervised STDP with a reinforcement signal that modulates synaptic changes. It was proposed as a learning rule capable of solving the distal reward problem in reinforcement learning. Nonetheless, the performance and limitations of this learning mechanism have yet to be tested on biological problems. In our work, rewarded STDP was implemented to model foraging behavior in a simulated environment. Over the course of training, the network of spiking neurons developed the capability of producing highly successful decision making. Network performance remained stable even after significant perturbations of synaptic structure. Rewarded STDP alone was insufficient to learn effective decision making because of the difficulty of maintaining homeostatic equilibrium of synaptic weights and the development of local performance maxima. Our study predicts that successful learning requires stabilizing mechanisms that allow neurons to balance their input and output synapses, as well as synaptic noise.
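
To make the idea of a reinforcement signal modulating STDP more concrete, here is a minimal Python sketch for a single synapse. It is not the model from the paper. It assumes a common textbook formulation in which each pre/post spike pair leaves a decaying eligibility trace and the weight changes only when a delayed reward arrives. The function names (stdp_window, rewarded_stdp) and all constants (a_plus, a_minus, tau, tau_e, lr, w0) are hypothetical and chosen only for illustration.

```python
import math


def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Pair-based STDP kernel; dt = t_post - t_pre in ms (illustrative constants)."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # pre before post: potentiation
    if dt < 0:
        return -a_minus * math.exp(dt / tau)   # post before pre: depression
    return 0.0


def rewarded_stdp(pairs, reward_time, reward, w0=0.5, lr=0.1, tau_e=1000.0):
    """Return the weight of one synapse after a single delayed reward.

    pairs: list of (t_pre, t_post) spike times in ms.
    Each pair leaves an eligibility trace that decays with time constant
    tau_e (an assumption); the weight is updated only at reward_time.
    """
    trace = 0.0
    for t_pre, t_post in pairs:
        t_event = max(t_pre, t_post)           # the pair is complete at the later spike
        if t_event <= reward_time:             # pairs after the reward are ignored
            decay = math.exp(-(reward_time - t_event) / tau_e)
            trace += stdp_window(t_post - t_pre) * decay
    w = w0 + lr * reward * trace               # reward gates the sign and size of the change
    return min(max(w, 0.0), 1.0)               # hard bounds keep the weight in [0, 1]


if __name__ == "__main__":
    causal = [(10.0, 15.0)]                    # pre fires 5 ms before post
    print(rewarded_stdp(causal, reward_time=500.0, reward=1.0))
    print(rewarded_stdp(causal, reward_time=500.0, reward=-1.0))
```

In this sketch a causal spike pair followed by a positive reward strengthens the synapse, the same pair followed by a negative reward weakens it, and no reward leaves the weight unchanged. The hard clipping to [0, 1] is only a crude stand-in: the abstract argues that rewarded STDP needs additional stabilizing mechanisms, such as balancing a neuron's input and output synapses, which this single-synapse example does not attempt to model.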