%0 Journal Article
%T Quantum Multiple Q-Learning
%A Michael Ganger
%A Wei Hu
%J International Journal of Intelligence Science
%P 1-22
%@ 2163-0356
%D 2019
%I Scientific Research Publishing
%R 10.4236/ijis.2019.91001
%X In this paper, a
collection of value-based quantum reinforcement learning algorithms is
introduced. These algorithms use Grover's algorithm to update the policy, which is stored
as a superposition of qubits associated with each possible action, and their
parameters are explored. These algorithms may be grouped into two classes, one
class which uses value functions V(s) and a new class which
uses action value functions Q(s,a). The
new Q(s,a)-based quantum algorithms
are found to converge faster than the V(s)-based algorithms, and
in general the quantum algorithms are found to converge in fewer iterations
than their classical counterparts, netting larger returns during training. This
is due to the fact that the Q(s,a) algorithms are more
precise than those based on