Reinforcement Learning in an Environment Synthetically Augmented with Digital Pheromones

DOI: 10.1155/2014/932485



Reinforcement learning requires information about states, actions, and outcomes as the basis for learning. For many applications, it can be difficult to construct a representative model of the environment, either for lack of the required information or because the model's state space would become too large to solve in a reasonable amount of time using the experience of prior actions. An environment consisting solely of the occurrence or nonoccurrence of specific events attributable to a human actor may appear to lack the structure needed to position responding agents in time and space using reinforcement learning. Digital pheromones can be used to synthetically augment such an environment with event sequence information, creating a more persistent and measurable imprint on the environment that supports reinforcement learning. We implemented this method and combined it with the agents' ability to learn from actions not taken, a concept known as fictive learning. The approach was tested against the historical sequence of Somali maritime pirate attacks from 2005 to mid-2012; a set of autonomous agents representing naval vessels successfully responded to an average of 333 of the 899 pirate attacks, outperforming the historical record of 139 successes.

1. Introduction

Sequences of events resulting from the actions of human adversarial actors, such as military forces or criminal organizations, may appear to have random dynamics in time and space. Finding patterns in such sequences and using those patterns to anticipate and respond to the events can be quite challenging. Often, the number of potentially causal factors for such events is very large, making it infeasible to obtain and analyze all relevant information before the next event occurs.
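The two mechanisms named above, pheromone deposit-and-evaporation and fictive learning, can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the grid cells, decay constants, discretization, and reward function are all invented for demonstration.

```python
from collections import defaultdict

# Assumed constants: learning rate, discount factor, and the fraction of
# pheromone that survives each time step. All values are illustrative.
ALPHA, GAMMA, EVAPORATION = 0.5, 0.9, 0.8

pheromone = defaultdict(float)  # cell (x, y) -> pheromone strength

def deposit(cell, amount=1.0):
    """An event (e.g. a pirate attack) leaves a digital pheromone at its cell,
    giving an otherwise transient event a persistent, measurable imprint."""
    pheromone[cell] += amount

def evaporate():
    """Pheromone decays geometrically each time step, so recent events
    dominate the gradient that agents observe."""
    for cell in list(pheromone):
        pheromone[cell] *= EVAPORATION

def state_of(cell):
    """Discretize pheromone strength into a coarse state feature, keeping
    the learner's state space small."""
    return (cell, min(int(pheromone[cell] * 4), 3))

Q = defaultdict(float)  # (state, action) -> estimated value

def fictive_update(state, actions, reward_fn, next_best):
    """Q-learning update applied to *every* available action, not only the
    one taken; reward_fn(a) supplies the (counterfactual) reward that action
    a would have earned. This is the fictive-learning idea."""
    for a in actions:
        target = reward_fn(a) + GAMMA * next_best
        Q[(state, a)] += ALPHA * (target - Q[(state, a)])

# Tiny usage example: one attack event, one decay step, one fictive update.
deposit((3, 5))
evaporate()
s = state_of((3, 5))
fictive_update(s, ["patrol", "hold"], lambda a: 1.0 if a == "patrol" else 0.0, 0.0)
```

Because every available action receives an update from its counterfactual reward, the agent extracts information even from patrols it did not make, which is the essence of fictive learning as described in the abstract.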
These difficulties can hinder the planning of responses using conventional computational methods such as multiagent models and machine learning, which typically exploit information available in or about the environment. A real-world example of such a problem is Somali maritime piracy. Beginning in 2005, the number of attacks attributed to Somali pirates steadily increased. The attacks were carried out on a nearly daily basis during some periods of the year and often took place despite the presence of naval patrol vessels in the area [1]. They were often launched with little warning and at unexpected locations. We would like to use the attributes of past attacks to anticipate and respond to future attacks. However, the set of attack attributes potentially


[1]  J. Bahadur, The Pirates of Somalia: Inside Their Hidden World, Pantheon Books, New York, NY, USA, 2011.
[2]  S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, Upper Saddle River, NJ, USA, 2nd edition, 2003.
[3]  M. J. Wooldridge, An Introduction to Multiagent Systems, Wiley & Sons, West Sussex, UK, 2nd edition, 2009.
[4]  G. P. Williams, Chaos Theory Tamed, Joseph Henry Press, Washington, DC, USA, 1997.
[5]  P. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, New York, NY, USA, 2006.
[6]  I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, New York, NY, USA, 3rd edition, 2011.
[7]  R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, Mass, USA, 1998.
[8]  D. Floreano and C. Mattiussi, Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies, MIT Press, Cambridge, Mass, USA, 2008.
[9]  M. Dorigo and T. Stützle, Ant Colony Optimization, MIT Press, Cambridge, Mass, USA, 2004.
[10]  J. Kennedy and R. Eberhart, Swarm Intelligence, Morgan Kaufmann, San Francisco, Calif, USA, 2001.
[11]  Z. Michalewicz and D. B. Fogel, How to Solve it: Modern Heuristics, 2nd Revised and Extended Edition, Springer, New York, NY, USA, 2004.
[12]  International Maritime Bureau, Piracy and Armed Robbery Against Ships Annual Report 1 January-31 December 2005, International Maritime Bureau, London, UK, 2006.
[13]  International Maritime Bureau, Piracy and Armed Robbery Against Ships Annual Report 1 January-31 December 2006, International Maritime Bureau, London, UK, 2007.
[14]  International Maritime Bureau, Piracy and Armed Robbery Against Ships Annual Report 1 January-31 December 2007, International Maritime Bureau, London, UK, 2008.
[15]  International Maritime Bureau, Piracy and Armed Robbery Against Ships Annual Report 1 January-31 December 2008, International Maritime Bureau, London, UK, 2009.
[16]  International Maritime Bureau, Piracy and Armed Robbery Against Ships Annual Report 1 January-31 December 2009, International Maritime Bureau, London, UK, 2010.
[17]  International Maritime Bureau, Piracy and Armed Robbery Against Ships Annual Report 1 January-31 December 2010, International Maritime Bureau, London, UK, 2010.
[18]  International Maritime Bureau, Piracy and Armed Robbery Against Ships Report for the Period 1 January-31 December 2011, International Maritime Bureau, London, UK, 2012.
[19]  International Maritime Bureau, Piracy and Armed Robbery Against Ships Report for the Period 1 January-30 June 2012, International Maritime Bureau, London, UK, 2012.
[20]  R. I. Rotberg, “Combating maritime piracy: a policy brief with recommendations for action,” Policy Brief #11, World Peace Foundation, Medford Somerville, Mass, USA, 2010.
[21]  Oceans Beyond Piracy, “The Economic Cost of Somali Piracy 2011,” 2011.
[22]  P. Eichstaedt, Pirate State: Inside Somalia's Terrorism at Sea, Chicago Review Press, Chicago, Ill, USA, 2010.
[23]  A. Shortland and M. Vothknecht, “Combating maritime terrorism off the Coast of Somalia,” Working Paper 47, European Security Economics, Vienna, Austria, 2011.
[24]  Combined Maritime Forces, 2012.
[25]  R. Mirshak, “Ship Response Capability Models for Counter-Piracy Patrols in the Gulf of Aden,” Technical Memorandum DRDC CORA TM 2011-139, Maritime Operations Research Team, Defence R&D Canada, Ottawa, Canada, 2011.
[26]  S. Marsland, Machine Learning: An Algorithmic Perspective, Chapman & Hall/CRC, New York, NY, USA, 2009.
[27]  C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[28]  B. C. da Silva, E. W. Basso, A. L. C. Bazzan, and P. M. Engel, “Dealing with non-stationary environments using context detection,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 217–224, June 2006.
[29]  V. Bulitko, N. Sturtevant, and M. Kazakevich, “Speeding up learning in real-time search via automatic state abstraction,” in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI '05), pp. 1349–1354, July 2005.
[30]  L. Panait and S. Luke, “Cooperative multi-agent learning: the state of the art,” Autonomous Agents and Multi-Agent Systems, vol. 11, no. 3, pp. 387–434, 2005.
[31]  L. Buşoniu, R. Babuška, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man and Cybernetics C, vol. 38, no. 2, pp. 156–172, 2008.
[32]  L. Jing and N. Cercone, “Thoughts on multiagent learning: from a reinforcement learning perspective,” Technical Report CSE-2010-07, Department of Computer Science and Engineering, York University, Ontario, Canada, 2010.
[33]  L. Oliwenstein, “From dendrites to decisions,” Engineering and Science, vol. 74, no. 3, pp. 14–21, 2011.
[34]  P. R. Montague, B. King-Casas, and J. D. Cohen, “Imaging valuation models in human choice,” Annual Review of Neuroscience, vol. 29, pp. 417–448, 2006.
[35]  T. Lohrenz, K. McCabe, C. F. Camerer, and P. R. Montague, “Neural signature of fictive learning signals in a sequential investment task,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 22, pp. 9493–9498, 2007.
[36]  A. Agogino and K. Tumer, “Regulating air traffic flow with coupled agents,” in Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, vol. 2, pp. 535–542, 2008.
[37]  K. Tumer and N. Khani, “Learning from actions not taken in multiagent systems,” Advances in Complex Systems, vol. 12, no. 4-5, pp. 455–473, 2009.
[38]  D. M. Gordon, Ants at Work: How An Insect Society Is Organized, The Free Press, New York, NY, USA, 1999.
[39]  K. L. Huang and C. J. Liao, “Ant colony optimization combined with taboo search for the job shop scheduling problem,” Computers and Operations Research, vol. 35, no. 4, pp. 1030–1046, 2008.
[40]  Z. J. Lee, C. Y. Lee, and S. F. Su, “An immunity-based ant colony optimization algorithm for solving weapon-target assignment problem,” Applied Soft Computing Journal, vol. 2, no. 1, pp. 39–47, 2002.
[41]  J. Bautista and J. Pereira, “Ant algorithms for a time and space constrained assembly line balancing problem,” European Journal of Operational Research, vol. 177, no. 3, pp. 2016–2032, 2007.
[42]  M. Gosnell, S. O'Hara, and M. Simon, “Spatially decomposed searching by heterogeneous unmanned systems,” in Proceedings of the International Conference on Integration of Knowledge Intensive Multi-Agent Systems (KIMAS '07), pp. 52–57, May 2007.
[43]  J. G. M. Fu and M. H. Ang, “Probabilistic ants (PAnts) in multi-agent patrolling,” in Proceedings of the International Conference on Advanced Intelligent Mechatronics, pp. 1371–1376, 2009.
[44]  H. Chu, A. Glad, O. Simonin, F. Sempé, A. Drogoul, and F. Charpillet, “Swarm approaches for the patrolling problem, information propagation vs. pheromone evaporation,” in Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '07), pp. 442–449, October 2007.
[45]  J. A. Sauter, R. Matthews, H. Van Dyke Parunak, and S. A. Brueckner, “Performance of digital pheromones for swarming vehicle control,” in Proceedings of the 4th International Conference on Autonomous Agents and Multi agent Systems (AAMAS '05), pp. 1037–1044, July 2005.
[46]  N. Monekosso and P. Remagnino, “An analysis of the pheromone Q-learning algorithm,” in Proceedings of the 8th Ibero-American Conference on Artificial Intelligence, pp. 224–232, 2002.
[47]  V. Furtado, A. Melo, A. L. V. Coelho, R. Menezes, and R. Perrone, “A bio-inspired crime simulation model,” Decision Support Systems, vol. 48, no. 1, pp. 282–292, 2009.
[48]  O. Vaněk, B. Bošanský, M. Jakob, and M. Pěchouček, “Transiting areas patrolled by a mobile adversary,” in Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG '10), pp. 9–16, August 2010.
[49]  M. Jakob, O. Vanek, S. Urban, P. Benda, and M. Pechoucek, “Agent C: agent-based testbed for adversarial modeling and reasoning in the maritime domain,” in Proceedings of the International Conference on Autonomous and Multiagent Systems, pp. 1641–1642, 2010.
[50]  M. Jakob, O. Vaněk, and M. Pěchouček, “Using agents to improve international maritime transport security,” IEEE Intelligent Systems, vol. 26, no. 1, pp. 90–95, 2011.
[51]  L. A. Slootmaker, Countering piracy with the next-generation piracy performance surface model [M.S. thesis], Naval Postgraduate School, Monterey, Calif, USA, 2011.
[52]  J. Decraene, M. Anderson, and M. Low, “Maritime counter-piracy study using agent-based simulations,” in Proceedings of the Spring Simulation Multiconference (SpringSim '10), pp. 82–89, April 2010.
[53]  D. Walton, E. Paulo, C. J. McCarthy, and R. Vaidyanathan, “Modeling force response to small boat attack against high value commercial ships,” in Proceedings of the 2005 Winter Simulation Conference, pp. 988–991, December 2005.
[54]  M. T. J. Spaan, “Partially observable Markov decision processes,” in Reinforcement Learning: State-of-the-Art, M. Wiering and M. van Otterlo, Eds., Springer, Berlin, Germany, 2012.
[55]  B. Weitjens, Geopredict: Geographical crime forecasting for varying situations [M.S. thesis], Vrije Universiteit, Amsterdam, The Netherlands, 2010.
[56]  P. Kaluza, A. Kölzsch, M. T. Gastner, and B. Blasius, “The complex network of global cargo ship movements,” 2010.
[57]  M. West, “Asset allocation to cover a region of piracy,” Report DSTO-TN-1030, Maritime Operations Division, Defence Science and Technology Organisation, Australian Government Department of Defense, Canberra, Australia, 2011.
[58]  E. Alpaydin, Introduction to Machine Learning, MIT Press, Cambridge, Mass, USA, 2nd edition, 2010.
[59]  C. Jones and M. J. Matarić, “From local to global behavior in intelligent self-assembly,” in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 721–726, September 2003.

