[ Question ]

What problem would you like to see Reinforcement Learning applied to?

by Julian Schrittwieser, 8th Jul 2020



We often talk about the dangers and challenges of AI and self-improving agents, but I'm curious what you view as potential beneficial applications of AI - if any! As an ML researcher I encounter a lot of positivity and hype in the field, so the very different perspective of the rationality community would be very interesting.

Specifically I'd like to focus on reinforcement learning, because this most closely matches the concept of an AI that has agency and makes decisions. A reinforcement learning agent is usually defined as a program that interacts with an environment, maximising the sum of rewards it receives.

The environment represents the problem to be solved, and the rewards are a measure of how good the solution is. For some problems - a board game, 3-SAT - assessing a solution (giving a reward) is easy; for others, computing a reward may be as difficult as solving the problem in the first place. Those are likely not good candidates to be solved with RL ;)
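To make the agent / environment / reward framing concrete, here is a minimal sketch of the interaction loop. The `LineWorld` environment, its reward values, and the two policies are hypothetical toys invented for illustration; they are not taken from any RL library.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk from position 0 to `goal` on a line."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (observation, reward, done)."""
        self.position += action
        done = self.position == self.goal
        # Assumed reward: +10 for reaching the goal, -1 for every other step.
        reward = 10.0 if done else -1.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=50):
    """Run one episode and return the sum of rewards the agent collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(observation):
    """A fixed policy that happens to be optimal in this toy problem."""
    return 1


def random_policy(observation):
    """A baseline policy that ignores the observation."""
    return random.choice([-1, 1])


if __name__ == "__main__":
    print(run_episode(LineWorld(goal=3), always_right))   # 8.0
    print(run_episode(LineWorld(goal=3), random_policy))  # varies, never above 8.0
```

Here the reward is trivial to compute (did we reach the goal?), which is the property that makes a problem a plausible RL candidate.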

To facilitate discussion, I would suggest one top-level reply per problem, specifying:

  • a short description of the problem
  • the action space - how does the agent interact with the problem / environment
  • the reward - how do we know the agent has done well?

Disclaimer: I work on RL, so if you make suggestions that are feasible and would have substantial positive impact on the world, I may pursue them.

2 Answers

The problem: Winning Diplomacy games against humans.

The action space: You control an empire in early twentieth-century Europe. You can order your armies and fleets to move, and your cities to build more armies and fleets. Crucially, you can also engage in text messaging with the other empires. When a timer runs out, the orders are all simultaneously carried out.

The reward: 0 for losing, 1 for winning by yourself, some sort of fractional reward for victories split with other players.
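As a rough sketch, that reward could be written as a terminal reward function. The equal split among joint winners is an assumption made here for illustration; the "fractional reward" could be defined in other ways, and the `diplomacy_reward` name is hypothetical.

```python
def diplomacy_reward(player, winners):
    """Terminal reward for `player`, given the set of powers sharing the victory.

    Assumption: a shared victory is split equally among the winners.
    """
    if player not in winners:
        return 0.0  # lost
    return 1.0 / len(winners)  # 1.0 for a solo win, a fraction otherwise


if __name__ == "__main__":
    print(diplomacy_reward("France", {"France"}))             # 1.0
    print(diplomacy_reward("France", {"France", "England"}))  # 0.5
    print(diplomacy_reward("France", {"Germany"}))            # 0.0
```

Note that the reward only arrives at the end of the game, so every order and message along the way has to be credited from that single number.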

I'm no AI scientist, so I might be totally wrong, but I suspect that you could fine-tune a language model like GPT-2 on a corpus of online Diplomacy chat logs, and then use that model somehow as a component of the RL agent that you train to play the game.

Not sure this would be good or bad for the world, just thought it would be interesting. I sure would love to see it, haha.

Problem: Automatic planting

Action space: the agent obtains data from the sensors and decides how to use the actuators (controlling temperature, humidity, exposure to sunlight and other factors) to maximize specific crop characteristics

The reward: the agent is performing better when it minimizes the time needed for the plants to reach specific characteristics. For example, when trying to minimize the time required for three plants to reach a height of 0.2m, a higher score would go to the policy that led the plants to grow fastest through the milestones (0.02, 0.05, 0.10, 0.15, 0.20)m. Or take a watermelon plantation: the policy whose settings for temperature, humidity (etc) produced the largest watermelon (above a given threshold) in the shortest time would earn the agent a higher score.
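A minimal sketch of such a time-to-height reward, assuming one height reading per plant per day. The function names, the 60-day horizon, and the height logs are all made up for illustration; a real system would need its own measurement schedule and penalties.

```python
def days_to_reach(heights_m, target_m=0.20):
    """Return the first day index at which a plant reaches the target height, or None."""
    for day, height in enumerate(heights_m):
        if height >= target_m:
            return day
    return None


def planting_reward(plants, target_m=0.20, horizon_days=60):
    """Negative mean number of days the plants needed to reach the target height.

    Assumption: a plant that never reaches the target is charged the full horizon.
    """
    total_days = 0
    for heights in plants:
        day = days_to_reach(heights, target_m)
        total_days += horizon_days if day is None else day
    return -total_days / len(plants)


if __name__ == "__main__":
    # Hypothetical daily height logs (metres) under two different policies.
    fast_policy = [[0.02, 0.05, 0.10, 0.15, 0.20]]
    slow_policy = [[0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.20]]
    print(planting_reward(fast_policy))  # -4.0
    print(planting_reward(slow_policy))  # -6.0
```

Faster growth gives a reward closer to zero, so maximising this reward is the same as minimising time to the target.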

If it is possible to achieve high efficiency in food production using RL agents that control cheap sensors on a simple wooden box with cheap inputs (earth, seeds, water), we could mass-produce boxes and distribute them to end users with the embedded agent and a few rules. Users of this system would get enough food to pay for the system itself. They could buy more boxes by selling the surplus food, and they could share boxes with neighbours, providing substantial positive impact on the world.

I really believe we should decentralize food production, and it would be easier with low-cost systems that automate practically the whole process, leaving the user with only easy tasks. People would get healthier food, they would spend less money on food (leaving more money for other needs), and they would develop fewer diseases associated with eating highly industrialized products or products with high levels of herbicides.