About

Reinforcement learning (RL) is an elegant problem definition for autonomous agents that learn from their own interactions with an environment. But the methods for solving this simple problem definition are not so simple. To solve it, you must simultaneously tackle many subproblems, each complex enough to warrant its own subfield of AI, such as perception, prediction, planning, and memory.

Unlike other forms of machine learning, we cannot feed an algorithm a well-curated dataset. An agent must gather its own data from its interactions. Even worse, this data is temporally correlated and does not explicitly include the correct response. The agent must reason about its data to determine the correct response, and it must actively explore the environment to ensure it has good data coverage. If that wasn’t hard enough, it must do all this even when it doesn’t have much data to go on. The world waits for no ~~man~~ agent, and it must make do with what it has.

The RL problem is hard, and if you feel lost, you’re not alone.

This website aims to guide you through this challenging landscape. To that end, I’ve collected answers I’ve given to common RL questions in the past: questions that many beginners eventually ask but have trouble finding complete answers to in the literature. I’m launching the site with eight questions and answers, and I’ll add more over time.

Each answer begins with a concise response, followed by a step-by-step derivation from first principles. While I will assume you have enough background knowledge to ask the question, I won’t assume you have any more.

I can’t promise that you’ll come away from these answers with perfect clarity on the matter. Most of us have to suffer a long time to gain competence. But perhaps this site will help you suffer a little less.

About the name

This site is called “Decisions & Dragons.” “Decisions” represents the core goal of RL: developing agents that learn to make effective decisions. “Dragons” represents the perilous complexities and challenges that must be navigated in pursuit of solving the RL problem.

I trust you understand that there were no other motivations for this name and any similarities it has with other titles is purely coincidental.

About me

I’m James MacGlashan. If you want to ask me RL questions, the best place is either on Bluesky, or on the Reinforcement Learning Discord server. (See my other social links at the top of this page.)

I received my PhD in computer science from the University of Maryland, Baltimore County in 2013, where I worked on reinforcement learning with Marie desJardins. I then moved on to a postdoctoral position at Brown University, where I continued to work on reinforcement learning with Michael Littman as my advisor and (subsequently) Stefanie Tellex as my co-advisor. Following my postdoc, I joined the startup Cogitai, where we worked to offer reinforcement learning and continual learning as a service. Cogitai was eventually acquired by Sony, and we formed the game AI team at Sony AI, where I continue to work on reinforcement learning.

Despite all these years working on reinforcement learning, I have shockingly failed to solve it.

Fortunately, it hasn’t all been bad news. RL methods have vastly improved, and I’ve played a role in bringing reinforcement learning to products with GT Sophy, an RL agent that outraced the best racers in the game Gran Turismo Sport. GT Sophy was subsequently adapted into a racing opponent in Gran Turismo 7 that you can race against today!

We’re continuing to work on exciting reinforcement learning applications and problems at Sony AI and I hope we can help turn reinforcement learning into a robust technology that can be more broadly used. Perhaps one that is less fraught with dragons.