Summary. This teaser post sketches our current ideas for dealing with more complex environments. It will ultimately be replaced by one or more longer posts describing these in more detail. Reach out if you would like to collaborate on these issues.
For real-world tasks that are specified in terms of more than a single evaluation metric, e.g., how much apples to buy and how much money to spend at most, we can generalize Algorithm 2 as follows from aspiration intervals to convex aspiration sets:
(We also consider other variants of this general idea)
A common way of planning complex tasks is to decompose them into a hierarchy of two or more levels of subtasks. Similar to existing approaches from hierarchical reinforcement learning, we imagine that an AI system can make such hierarchical decisions as depicted in the following diagram (shown for only two hierarchical levels, but obviously generalizable to more levels):