A Library for Provably Fast Reinforcement Learning

Caleb Ju, Georgia Institute of Technology


Reinforcement learning concerns sequential decision making under uncertainty, and it models problems ranging from robotics and power system operation to strategic games. However, many reinforcement learning algorithms used today rely on heuristics and thus lack provable guarantees of optimality, especially in large-scale environments. To address this gap, we propose a software library implementing a recently developed family of algorithms, policy mirror descent and policy dual averaging, for solving large-scale reinforcement learning problems. Most importantly, these algorithms come with provable guarantees and are provably fast. We discuss the philosophy and design of the library and present preliminary numerical results.

Abstract Author(s): Caleb Ju