%0 Journal Article
%T Quantum Multiple Q-Learning
%A Michael Ganger
%A Wei Hu
%J International Journal of Intelligence Science
%P 1-22
%@ 2163-0356
%D 2019
%I Scientific Research Publishing
%R 10.4236/ijis.2019.91001
%X In this paper, a collection of value-based quantum reinforcement learning algorithms is introduced, and their parameters are explored. The algorithms use Grover’s algorithm to update the policy, which is stored as a superposition of qubits associated with each possible action. They may be grouped into two classes: one that uses value functions V(s) and a new class that uses action-value functions Q(s,a). The new Q(s,a)-based quantum algorithms are found to converge faster than the V(s)-based algorithms, and in general the quantum algorithms are found to converge in fewer iterations than their classical counterparts, netting larger returns during training. This is due to the fact that the Q(s,a) algorithms are more precise than those based on