We address strategic cognitive sequencing, the “outer loop” of human cognition: how the brain decides what cognitive process to apply at a given moment to solve complex, multistep cognitive tasks. We argue that this topic has been neglected relative to its importance for systematic reasons, but that recent work on how individual brain systems accomplish their computations has set the stage for productively addressing how brain regions coordinate over time to accomplish our most impressive thinking. We present four preliminary neural network models. The first addresses how the prefrontal cortex (PFC) and basal ganglia (BG) cooperate to perform trial-and-error learning of short sequences. The second addresses how several areas of PFC learn to predict likely reward, and how this contributes to the BG making decisions at the level of strategies. The third addresses how PFC, BG, parietal cortex, and hippocampus can work together to memorize sequences of cognitive actions from instruction (or “self-instruction”). The last shows how a constraint satisfaction process can find useful plans: the PFC maintains current and goal states and associates from both of these to find a “bridging” state, an abstract plan. We discuss how these processes could work together to produce strategic cognitive sequencing and discuss future directions in this area.

1. Introduction

Weighing the merits of one scientific theory against another, deciding which plan of action to pursue, or considering whether a bill should become law all require many cognitive acts performed in particular sequences [1, 2]. Humans use complex cognitive strategies to solve difficult problems, and understanding exactly how we do this is necessary to understand human intelligence. In these cases, different strategies composed of different sequences of cognitive acts are possible, and the choice of strategy is crucial in determining how we succeed and fail at particular cognitive challenges [3, 4].
Understanding strategic cognitive sequencing has important implications for reducing biases and thereby improving human decision making (e.g., [5, 6]). However, this aspect of cognition has been studied surprisingly little [7, 8] because it is complex. Tasks in which participants tend to use different strategies (and therefore sequences) necessarily produce data that are less clear and interpretable than those from a single process in a simple task. Therefore, cognitive neuroscience tends to avoid such tasks, leaving the neural mechanisms of strategy selection and cognitive sequencing underexplored relative to the
J. M. Unterrainer, B. Rahm, R. Leonhart, C. C. Ruff, and U. Halsband, “The Tower of London: the impact of instructions, cueing, and learning on planning abilities,” Cognitive Brain Research, vol. 17, no. 3, pp. 675–683, 2003.
A. Newell, “You can't play 20 questions with nature and win: projective comments on the papers of this symposium,” in Visual Information Processing, W. G. Chase, Ed., pp. 283–308, Academic Press, New York, NY, USA, 1973.
A. Dagher, A. M. Owen, H. Boecker, and D. J. Brooks, “Mapping the network for planning: a correlational PET activation study with the Tower of London task,” Brain, vol. 122, no. 10, pp. 1973–1987, 1999.
O. A. van den Heuvel, H. J. Groenewegen, F. Barkhof, R. H. C. Lazeron, R. van Dyck, and D. J. Veltman, “Frontostriatal system in planning complexity: a parametric functional magnetic resonance version of Tower of London task,” NeuroImage, vol. 18, no. 2, pp. 367–374, 2003.
R. C. O'Reilly, T. E. Hazy, and S. A. Herd, “The Leabra cognitive architecture: how to play 20 principles with nature and win!,” in The Oxford Handbook of Cognitive Science, S. Chipman, Ed., Oxford University Press, In press.
C. Lebiere, J. R. Anderson, and D. Bothell, “Multi-tasking and cognitive workload in an ACT-R model of a simplified air traffic control task,” in Proceedings of the 10th Conference on Computer Generated Forces and Behavioral Representation, 2001.
D. J. Jilk, C. Lebiere, R. C. O'Reilly, and J. R. Anderson, “SAL: an explicitly pluralistic cognitive architecture,” Journal of Experimental and Theoretical Artificial Intelligence, vol. 20, no. 3, pp. 197–218, 2008.
M. J. Frank, B. Loughry, and R. C. O'Reilly, “Interactions between frontal cortex and basal ganglia in working memory: a computational model,” Cognitive, Affective and Behavioral Neuroscience, vol. 1, no. 2, pp. 137–160, 2001.
T. E. Hazy, M. J. Frank, and R. C. O'Reilly, “Towards an executive without a homunculus: computational models of the prefrontal cortex/basal ganglia system,” Philosophical Transactions of the Royal Society B, vol. 362, no. 1485, pp. 1601–1613, 2007.
M. J. Frank, “Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism,” Journal of Cognitive Neuroscience, vol. 17, no. 1, pp. 51–72, 2005.
S. A. Deadwyler, S. Hayashizaki, J. Cheer, and R. E. Hampson, “Reward, memory and substance abuse: functional neuronal circuits in the nucleus accumbens,” Neuroscience and Biobehavioral Reviews, vol. 27, no. 8, pp. 703–711, 2004.
R. S. Sutton and A. G. Barto, “Time-derivative models of Pavlovian reinforcement,” in Learning and Computational Neuroscience, J. W. Moore and M. Gabriel, Eds., pp. 497–537, MIT Press, Cambridge, Mass, USA, 1990.
C. H. Chatham, S. A. Herd, A. M. Brant et al., “From an executive network to executive control: a computational model of the n-back task,” Journal of Cognitive Neuroscience, vol. 23, no. 11, pp. 3598–3619, 2011.
W.-T. Fu and J. R. Anderson, “Solving the credit assignment problem: explicit and implicit learning of action sequences with probabilistic outcomes,” Psychological Research, vol. 72, no. 3, pp. 321–330, 2008.
M. P. Noonan, N. Kolling, M. E. Walton, and M. F. S. Rushworth, “Re-evaluating the role of the orbitofrontal cortex in reward and reinforcement,” European Journal of Neuroscience, vol. 35, no. 7, pp. 997–1010, 2012.
P. L. Croxson, M. E. Walton, J. X. O'Reilly, T. E. J. Behrens, and M. F. S. Rushworth, “Effort-based cost-benefit valuation and the human brain,” The Journal of Neuroscience, vol. 29, no. 14, pp. 4531–4541, 2009.
S. W. Kennerley and M. E. Walton, “Decision making and reward in frontal cortex: complementary evidence from neurophysiological and neuropsychological studies,” Behavioral Neuroscience, vol. 125, no. 3, pp. 297–317, 2011.
M. J. Frank and E. D. Claus, “Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal,” Psychological Review, vol. 113, no. 2, pp. 300–326, 2006.
J. M. Hyman, L. Ma, E. Balaguer-Ballester, D. Durstewitz, and J. K. Seamans, “Contextual encoding by ensembles of medial prefrontal cortex neurons,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 13, pp. 5086–5091, 2012.
J. J. Day, J. L. Jones, R. M. Wightman, and R. M. Carelli, “Phasic nucleus accumbens dopamine release encodes effort- and delay-related costs,” Biological Psychiatry, vol. 68, no. 3, pp. 306–309, 2010.
G. Fernández, H. Weyerts, M. Schrader-Bölsche et al., “Successful verbal encoding into episodic memory engages the posterior hippocampus: a parametrically analyzed functional magnetic resonance imaging study,” The Journal of Neuroscience, vol. 18, no. 5, pp. 1841–1847, 1998.
D. C. Noelle and G. W. Cottrell, “A connectionist model of instruction following,” in Proceedings of the 17th Annual Conference of the Cognitive Science Society, J. D. Moore and J. F. Lehman, Eds., pp. 369–374, Lawrence Erlbaum Associates, Mahwah, NJ, USA, January 1995.
B. B. Doll, W. J. Jacobs, A. G. Sanfey, and M. J. Frank, “Instructional control of reinforcement learning: a behavioral and neurocomputational investigation,” Brain Research, vol. 1299, pp. 74–94, 2009.
J. Li, M. R. Delgado, and E. A. Phelps, “How instructed knowledge modulates the neural systems of reward learning,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 1, pp. 55–60, 2011.
M. M. Walsh and J. R. Anderson, “Modulation of the feedback-related negativity by instruction and experience,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 47, pp. 19048–19053, 2011.
P. Gregory, D. Long, and M. Fox, “Constraint based planning with composable substate graphs,” in Proceedings of the 19th European Conference on Artificial Intelligence (ECAI '10), H. Coelho, R. Studer, and M. Wooldridge, Eds., IOS Press, 2010.
G. Konidaris and A. Barto, “Building portable options: skill transfer in reinforcement learning,” in Proceedings of the 20th International Joint Conference on Artificial Intelligence, M. M. Veloso, Ed., pp. 895–900, 2006.
K. Ferguson and S. Mahadevan, “Proto-transfer learning in Markov decision processes using spectral methods,” in Proceedings of the Workshop on Structural Knowledge Transfer for Machine Learning (ICML '06), 2006.