I was waiting for one of the DM researchers to answer the question about reward shaping and Oriol Vinyals just said:

Most agents get rewarded for win/loss, without discount (i.e., they don't care to play long games). Some, however, use rewards such as the agent that "liked" to build disruptors. [...]

  1. Yes. Supervised learning makes agents play more or less reasonably. RL can then figure out what it means to win / be good at the game.

  2. If you win, you get a reward of 1. If you win, and build 1 disruptor at least, you get a reward of 2.
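To make the quoted scheme concrete, here is a minimal sketch of what such a reward could look like. This is not DeepMind's code: the names `terminal_reward` and `episode_return` are hypothetical, and the reward of 0 for a loss is my assumption, since the quote only says what a win pays.

```python
from typing import Sequence


def terminal_reward(won: bool, disruptors_built: int) -> float:
    """Reward handed out once, at the end of a game.

    Mirrors the quote: 1 for a win, 2 for a win with at least one disruptor.
    """
    if not won:
        # Assumption: the quote does not say what a loss pays; 0 is a placeholder.
        return 0.0
    return 2.0 if disruptors_built >= 1 else 1.0


def episode_return(rewards: Sequence[float], discount: float = 1.0) -> float:
    """Discounted sum of per-step rewards.

    discount=1.0 corresponds to "without discount": a win is worth the same
    whether it arrives after 10 steps or 10,000.
    """
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (discount ** step) * reward
    return total


if __name__ == "__main__":
    # A long game where the only reward is the terminal one.
    rewards = [0.0] * 1000 + [terminal_reward(won=True, disruptors_built=1)]
    print(episode_return(rewards))        # undiscounted
    print(episode_return(rewards, 0.99))  # discounted, for comparison
```

With `discount=1.0` the length of the game does not change the return, which is one way to read "without discount" in the quote. The disruptor bonus is only paid on a win here, so under this sketch building disruptors and losing earns nothing extra.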

"AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros]

by gwern, 24th Jan 2019, 52 comments