Many studies have been conducted on the application of reinforcement learning (RL) to robots. A general-purpose robot typically has redundant sensors and actuators, because it is difficult to anticipate every environment the robot will face and every task it must execute. In this case the state space used in RL contains redundancy, and the robot needs a long time to learn a given task. In this study, we focus on the importance of each sensor with respect to the task the robot performs; the sensors that are useful for a task differ from task to task. By exploiting sensor importance, we adjust the number of states assigned to each sensor and thereby reduce the size of the state space. In this paper, we define the importance of a sensor for a task as the correlation between the sensor's value and the reward. The robot calculates the importance of its sensors and shrinks the state space accordingly. We propose a method that reduces the learning space and construct a learning system by incorporating it into RL. We confirm the effectiveness of the proposed system with an experimental robot.

1. Introduction

In recent years, reinforcement learning (RL) [1] has been actively studied, and many studies on its application to robots have been conducted [2–4]. A major concern in RL is the learning time. In RL, information from sensors is projected onto a state space, and the robot learns the correspondence between each state in that space and an action, seeking the best correspondence. When the state space expands with the number of sensors, the number of correspondences the robot must learn also increases. In addition, the robot needs considerable experience in each state to perform a task. Learning the best correspondence therefore becomes time-consuming.

To overcome this problem, many studies have investigated accelerating RL [5–15], following two main approaches: multirobot systems and autonomous construction of the state space. In the former approach, multiple robots exchange experience information [5–9] so that each robot augments its own knowledge. Robots in such a system can therefore find the best correspondence between states and actions faster than an individual robot in a single-robot system. In addition, Nishi et al. [10] proposed a learning method in which a robot learns behavior by observing the behavior of other robots and constructs its own relationships between state and behavior. However, in this approach a robot needs other robots with which to exchange experience information, and hence,
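As a rough illustration of the correlation-based importance measure described in the abstract above (not the paper's actual procedure), the Python sketch below estimates each sensor's importance as the absolute Pearson correlation between its readings and the received reward, and assigns fewer discretization levels to low-importance sensors so that the resulting state space is smaller. The function names, the 2–8 level range, and the synthetic experience log are assumptions introduced only for this example.

import numpy as np

def sensor_importance(sensor_log, reward_log):
    # sensor_log: (T, n_sensors) array of readings; reward_log: (T,) rewards.
    # Importance of sensor i = |Pearson correlation(readings_i, reward)|.
    importance = []
    for i in range(sensor_log.shape[1]):
        r = np.corrcoef(sensor_log[:, i], reward_log)[0, 1]
        importance.append(0.0 if np.isnan(r) else abs(r))
    return np.array(importance)

def states_per_sensor(importance, min_states=2, max_states=8):
    # Map importance in [0, 1] to a number of discretization levels:
    # unimportant sensors get coarse levels, so the product of the levels
    # (the size of the state space) becomes smaller.
    levels = min_states + (max_states - min_states) * importance
    return np.round(levels).astype(int)

# Hypothetical experience log: 3 sensors, 500 steps; the reward depends
# strongly on sensor 0, weakly on sensor 2, and not at all on sensor 1.
rng = np.random.default_rng(0)
sensors = rng.uniform(size=(500, 3))
reward = 2.0 * sensors[:, 0] - 0.5 * sensors[:, 2] + rng.normal(scale=0.1, size=500)

imp = sensor_importance(sensors, reward)
print(imp)                     # roughly [0.96, 0.0x, 0.24]
print(states_per_sensor(imp))  # sensor 0 keeps the most states

In this toy setting the state count of the nearly irrelevant sensor collapses to the minimum, which is the effect the proposed method aims for when it reduces the learning space.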
References
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[2] T. Kondo and K. Ito, “A reinforcement learning with evolutionary state recruitment strategy for autonomous mobile robots control,” Robotics and Autonomous Systems, vol. 46, no. 2, pp. 111–124, 2004.
[3] K. J. Person, E. Oztop, and J. Peters, “Reinforcement learning to adjust robot movements to new situations,” in Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp. 2650–2655, 2010.
[4] N. Navarro, C. Weber, and S. Wermter, “Real-world reinforcement learning for autonomous humanoid robot charging in a home environment,” in Towards Autonomous Robotic Systems, vol. 6856 of Lecture Notes in Computer Science, pp. 231–240, Springer, 2011.
[5] M. Tan, “Multi-agent reinforcement learning: independent vs. cooperative agents,” in Proceedings of the 10th International Conference on Machine Learning, 1993.
[6] M. N. Ahmadabadi, M. Asadpur, S. H. Khodaabakhsh, and E. Nakano, “Expertness measuring in cooperative learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '00), vol. 3, pp. 2261–2267, November 2000.
[7] M. N. Ahmadabadi and M. Asadpour, “Expertness based cooperative Q-learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 32, no. 1, pp. 66–76, 2002.
[8] H. Iima and Y. Kuroe, “Swarm reinforcement learning algorithm based on exchanging information among agents,” Transactions of the Society of Instrument and Control Engineers, vol. 42, no. 11, pp. 1244–1251, 2006.
[9] Y. Yongming, T. Yantao, and M. Hao, “Cooperative Q learning based on blackboard architecture,” in Proceedings of the International Conference on Computational Intelligence and Security Workshops (CIS '07), pp. 224–227, December 2007.
[10] T. Nishi, Y. Takahashi, and M. Asada, “Incremental behavior acquisition based on reliability of observed behavior recognition,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 70–75, November 2007.
[11] M. Asada, S. Noda, and K. Hosoda, “Action-based sensor space categorization for robot learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96), pp. 1502–1509, November 1996.
[12] H. Ishiguro, R. Sato, and T. Ishida, “Robot oriented state space construction,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96), pp. 1496–1501, November 1996.
[13] K. Samejima and T. Omori, “Adaptive internal state space construction method for reinforcement learning of a real-world agent,” Neural Networks, vol. 12, no. 7-8, pp. 1143–1155, 1999.
[14] A. J. Smith, “Applications of the self-organising map to reinforcement learning,” Neural Networks, vol. 15, no. 8-9, pp. 1107–1124, 2002.
[15] K. T. Aung and T. Fuchda, “A proposition of adaptive state space partition in reinforcement learning with voronoi tessellation,” in Proceedings of the 17th International Symposium on Artificial Life and Robotics, pp. 638–641, 2012.
[16] Y. Takahashi, M. Asada, and K. Hosoda, “Reasonable performance in less learning time by real robot based on incremental state space segmentation,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96), pp. 1518–1524, November 1996.