Reinforcement learning for a multirobot system becomes very slow as the number of robots grows, because the joint state space increases exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized at the start of the reinforcement learning process. Each mobile robot obtains the current environmental state through its sensors. The state is then matched against the repository to determine whether a relevant behavior rule has already been stored. If a matching rule exists, an action is chosen according to the stored knowledge and rules, and the matching weight is refined; otherwise, the new rule is appended to the repository. The robots learn in a given sequence and share the behavior database. We evaluate the algorithm on a multirobot following-surrounding task and find that the improved algorithm effectively accelerates convergence.

1. Introduction

In recent years, multirobot systems (MRSs) have received considerable attention because they possess special capabilities such as greater flexibility, adaptability, and efficiency in dealing with complex tasks [1]. Multirobot learning is the process of acquiring new cooperative behaviors for a particular task by trial and error in the presence of other robots. The desired cooperative behaviors may emerge from local interactions among robots with limited sensing capabilities. A multirobot system can perform more complex tasks through cooperation and coordination [2, 3]. Multirobot learning methods can normally be classified into collective swarm learning and intentionally cooperative learning, according to the level of explicit communication. Collective swarm systems allow participating robots to learn swarm behaviors with only minimal explicit communication among robots [4, 5]. In these systems, a large number of homogeneous mobile robots interact implicitly with one another through the shared environment. The robots are organized on the basis of local control laws, such as stigmergy, as discussed by Garnier et al. [6]. Stigmergy is a mechanism of indirect interaction mediated by modifications of the agents' shared environment [7]. Information from the local environment can guide the activity of participating individuals. Complex intelligent behavior emerges at the colony level from the local interactions that take place among individuals exhibiting simple behaviors. At present, swarm behaviors are often modeled using methods inspired by biology. Along with the advent of
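To make the knowledge-sharing scheme outlined in the abstract concrete, the following minimal Python sketch illustrates how a shared rule repository of matching weights could be combined with sequential Q-learning. The class names, the toy grid world, and all parameter values are hypothetical assumptions for illustration only and do not reproduce the authors' implementation.

```python
import random

# Minimal sketch of sequential Q-learning with a shared behavior-rule
# repository, assuming the scheme outlined in the abstract. Class names,
# the toy grid world, and all parameter values are hypothetical.

ACTIONS = ["up", "down", "left", "right", "stay"]

class SharedRuleRepository:
    """Behavior database shared by all robots: each rule maps an observed
    state to a matching weight (Q-value) for every action."""

    def __init__(self):
        self.rules = {}

    def match(self, state):
        # If no rule for this state exists yet, append a new zero-initialized rule.
        if state not in self.rules:
            self.rules[state] = {a: 0.0 for a in ACTIONS}
        return self.rules[state]

class Robot:
    """A single learner; every robot holds a reference to the same repository."""

    def __init__(self, repo, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.repo, self.alpha, self.gamma, self.epsilon = repo, alpha, gamma, epsilon

    def choose_action(self, state):
        rule = self.repo.match(state)
        if random.random() < self.epsilon:        # occasional exploration
            return random.choice(ACTIONS)
        return max(rule, key=rule.get)            # exploit the shared knowledge

    def refine(self, state, action, reward, next_state):
        # Standard Q-learning update applied to the shared matching weight.
        rule = self.repo.match(state)
        best_next = max(self.repo.match(next_state).values())
        rule[action] += self.alpha * (reward + self.gamma * best_next - rule[action])

def toy_step(position, action):
    """Hypothetical grid world used only to make the sketch runnable:
    a reward is given for reaching the cell (5, 5)."""
    moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0),
             "right": (1, 0), "stay": (0, 0)}
    dx, dy = moves[action]
    new_pos = (min(max(position[0] + dx, 0), 9), min(max(position[1] + dy, 0), 9))
    reward = 1.0 if new_pos == (5, 5) else -0.01
    return reward, new_pos

if __name__ == "__main__":
    repo = SharedRuleRepository()
    robots = [Robot(repo) for _ in range(3)]      # all robots share one repository
    for episode in range(200):
        positions = [(0, 0), (9, 0), (0, 9)]
        for _ in range(50):
            # Robots act and learn in a fixed sequence, so each update to the
            # shared repository is visible to the robots that act after it.
            for i, robot in enumerate(robots):
                state = positions[i]
                action = robot.choose_action(state)
                reward, next_state = toy_step(state, action)
                robot.refine(state, action, reward, next_state)
                positions[i] = next_state
```

Because the repository is a single shared structure, a rule appended or refined by one robot is immediately reusable by the robots that act after it, which is what allows the sequential scheme to reduce redundant exploration of the state space.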
References
[1] D. H. Kim, “Self-organization of unicycle swarm robots based on a modified particle swarm framework,” International Journal of Control, Automation and Systems, vol. 8, no. 3, pp. 622–629, 2010.
[2] Y. Wang and C. W. de Silva, “A machine-learning approach to multi-robot coordination,” Engineering Applications of Artificial Intelligence, vol. 21, no. 3, pp. 470–484, 2008.
[3] J. Liu, X. Jin, and S. Zhang, The Model and Experiment of Multi-Agent, Tsinghua University Press, Beijing, China, 2003.
[4] H. Hamann, Space-Time Continuous Models of Swarm Robotic Systems, Springer, Berlin, Germany, 2010.
[5] D. H. Kim and S. Shin, “Self-organization of decentralized swarm agents based on modified particle swarm algorithm,” Journal of Intelligent and Robotic Systems, vol. 46, no. 2, pp. 129–149, 2006.
[6] S. Garnier, J. Gautrais, and G. Theraulaz, “The biological principles of swarm intelligence,” Swarm Intelligence, vol. 1, no. 1, pp. 3–31, 2007.
[7] L. Marsh and C. Onof, “Stigmergic epistemology, stigmergic cognition,” Cognitive Systems Research, vol. 9, no. 1-2, pp. 136–149, 2008.
[8] O. Holland and C. Melhuish, “Stigmergy, self-organization, and sorting in collective robotics,” Artificial Life, vol. 5, no. 2, pp. 173–202, 1999.
[9] R. Beckers, O. E. Holland, and J. L. Deneubourg, “From local actions to global tasks: stigmergy and collective robotics,” in Proceedings of Artificial Life IV, pp. 181–189, MIT Press, Cambridge, Mass, USA, 1994.
[10] L. Bayindir and E. Şahin, “A review of studies in swarm robotics,” Turkish Journal of Electrical Engineering and Computer Sciences, vol. 15, no. 2, pp. 115–147, 2007.
[11] S. N. Givigi and H. M. Schwartz, “Swarms of robots based on evolutionary game theory,” in Proceedings of the 9th International Conference on Control and Applications, pp. 1–7, ACTA Press, Montreal, Canada, June 2007.
[12] S.-W. Seo, H.-C. Yang, and K.-B. Sim, “Behavior learning and evolution of swarm robot system for cooperative behavior,” in Proceedings of the International Conference on Advanced Intelligent Mechatronics (AIM '09), pp. 673–678, IEEE/ASME, Singapore, July 2009.
[13] K. Kobayashi, K. Nakano, T. Kuremoto, and M. Obayashi, “Cooperative behavior acquisition of multiple autonomous mobile robots by an objective-based reinforcement learning system,” in Proceedings of the International Conference on Control, Automation and Systems (ICCAS '07), pp. 777–780, IEEE, Seoul, Korea, October 2007.
[14] F. Fernández, D. Borrajo, and L. E. Parker, “A reinforcement learning algorithm in cooperative multi-robot domains,” Journal of Intelligent and Robotic Systems, vol. 43, no. 2–4, pp. 161–174, 2005.
[15] D. W. Lee, S. W. Seo, and K. B. Sim, “Online evolution for cooperative behavior in group robot systems,” International Journal of Control, Automation and Systems, vol. 6, no. 2, pp. 282–287, 2008.
[16] M. N. Ahmadabadi, M. Asadpour, and E. Nakano, “Cooperative Q-learning: the knowledge sharing issue,” Advanced Robotics, vol. 15, no. 8, pp. 815–832, 2001.