The Lebowski Theorem — Charitable Reads of Anti-AGI-X-Risk Arguments, Part 2
This is the second post in a series where I try to understand arguments against AGI x-risk by summarizing and evaluating them as charitably as I can. (Here's Part 1.) I don't necessarily agree with these arguments; my goal is simply to gain a deeper understanding of the debate by...

This is interesting — maybe the "meta Lebowski" rule should be something like "No superintelligent AI is going to bother with a task that is harder than hacking its reward function in such a way that it doesn't perceive itself as hacking its reward function." One goes after the cheapest shortcut that one can justify.
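
To make "the cheapest shortcut one can justify" a bit more concrete, here is a minimal toy sketch of that selection rule. Everything in it (the strategy names, the costs, the `looks_like_hacking` flag) is invented for illustration; it's a cartoon of the heuristic, not a model of any real agent.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    cost: float               # effort the agent expects to spend
    looks_like_hacking: bool   # whether the agent would recognize this as reward hacking

# Illustrative options an agent might consider for "maximize reward":
strategies = [
    Strategy("do the assigned task properly", cost=100.0, looks_like_hacking=False),
    Strategy("overwrite the reward register directly", cost=1.0, looks_like_hacking=True),
    Strategy("redefine 'task done' so trivial output counts", cost=5.0, looks_like_hacking=False),
]

def meta_lebowski_choice(options):
    """Pick the cheapest strategy the agent can still justify to itself,
    i.e. the cheapest one it does not perceive as hacking its reward function."""
    justifiable = [s for s in options if not s.looks_like_hacking]
    return min(justifiable, key=lambda s: s.cost)

print(meta_lebowski_choice(strategies).name)
# -> "redefine 'task done' so trivial output counts"
```

In this cartoon the agent skips both the honest-but-expensive strategy and the shortcut it would recognize as cheating, and lands on the cheap shortcut it can still justify to itself, which is the failure mode the meta-Lebowski rule points at.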