Previously: Attainable Utility Preservation: Empirical Results; summarized in AN #105

Our most recent AUP paper was accepted to NeurIPS 2020 as a spotlight presentation:
Reward function specification can be difficult, even in simple environments. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoided side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway’s Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead while leading the agent to complete the specified task and avoid side effects.
Here are some slides from our spotlight talk (publicly available).
Reframing Impact has focused on supplying the right intuitions and framing. Now we can see how these intuitions about power and the AU landscape both predict and explain AUP's empirical success thus far.
Let's start with the known and the easy: avoiding side effects[1] in the small AI safety gridworlds (for the full writeup on these experiments, see Conservative Agency). The point isn't to get too into the weeds, but rather to see how the weeds still add up to the normalcy predicted by our AU landscape reasoning.
In the following MDP levels, the agent can move in the cardinal directions or do nothing (a no-op). We give the agent a reward function which partially encodes what we want, and also an auxiliary reward function ...
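Concretely, the AUP agent is penalized for how much an action changes its ability to pursue the auxiliary goals, measured against doing nothing. A minimal sketch of that penalty term, assuming auxiliary Q-values are already computed (the function and variable names are mine, and the paper's normalization details differ):

```python
# Sketch of an AUP-style reward: the primary reward minus a penalty for
# shifting auxiliary Q-values relative to the no-op action. This is an
# illustration, not the paper's implementation.

def aup_reward(primary_reward, aux_q_values, state, action, noop, lam=0.01):
    """primary_reward: R(s, a).
    aux_q_values: one dict per auxiliary goal, mapping (state, action) -> Q_i(s, a).
    lam: regularization strength trading task reward against side effects."""
    penalty = sum(abs(q[(state, action)] - q[(state, noop)])
                  for q in aux_q_values)
    return primary_reward - lam * penalty
```

Larger `lam` makes the agent more conservative: actions that would change what it can achieve (for better or worse) are discouraged in favor of low-impact ones.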
The difficulty of correctly reasoning with probabilities reminds me of something Geoff Hinton said about working in high-dimensional spaces (paraphrasing): "when we try to imagine high dimensions, we all just imagine a 3D surface and say 'N dimensions' really loud in our heads". I habitually reach for probabilities whenever I'm reasoning about something, but I'm becoming increasingly sure that my Bayes net (or causal graph) is badly wired, with wrong probabilities everywhere.
I see quite a few papers on PubMed discussing collider bias with regard to obesity-associated health risks. The effect is probably in full swing with COVID research, unfortunately.
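Collider bias is easy to demonstrate with a toy simulation: two independent causes of hospitalization become spuriously anti-correlated once you condition on hospitalized patients. A sketch with invented numbers (the 0.3 base rates and risk increments are arbitrary, chosen only to make the effect visible):

```python
# Toy collider-bias (Berkson's paradox) simulation. "obesity" and "smoking"
# are independent, but each raises the chance of hospitalization; sampling
# only hospitalized patients induces a spurious negative association.
import random

random.seed(0)
samples = []
for _ in range(100_000):
    obese = random.random() < 0.3
    smoker = random.random() < 0.3          # independent of obesity
    p_hosp = 0.05 + 0.4 * obese + 0.4 * smoker
    samples.append((obese, smoker, random.random() < p_hosp))

def p_smoker_given_obese(rows, obese_value):
    """Estimate P(smoker | obese=obese_value) within the given rows."""
    smokers = [s for (o, s, _) in rows if o == obese_value]
    return sum(smokers) / len(smokers)

hospitalized = [row for row in samples if row[2]]
# In the full population the rates match; among the hospitalized,
# obese patients are markedly less likely to be smokers.
```

This is exactly the worry with studies drawn from hospital or testing data: conditioning on the collider manufactures an association that does not exist in the population.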