This is a linkpost for https://mapandterritory.org/formally-stating-the-ai-alignment-problem-fe7a6e3e5991
- The AI alignment problem lacks a formal statement beyond "align AI with human values".
- I try to give a formal statement that incorporates what people have so far figured out AI alignment should mean, using the following frameworks (a rough sketch of what such a statement might look like follows this list):
    - Decision theory
        - A good starting point, since much AI alignment research is done using decision theory or in ways compatible with it.
        - Turns out to be inadequate because utility functions impose strong assumptions on agent preferences (e.g. that they are complete and transitive) that do not hold.
    - Axiology
        - Allows a more flexible statement of preferences.
        - Still makes strong assumptions about the choice set.
    - Generalizing the choice set
        - Allows the choice set to be any set of thoughts the agents have.
        - Gives us a way to formally specify AI alignment without strong assumptions.
        - But still requires that we solve the problem of how to treat humanity as an agent.
- This suggests existing AI alignment research is inadequate because it focuses only on the decision-theoretic or axiological versions of the alignment problem.
- Existing research should continue, but we should also explore additional research directions, since existing efforts alone are likely to be insufficient.
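As a rough illustration of the kind of formal statement referred to above (the sketch promised after the list), here is how the decision-theoretic version and the generalized version might be written. The notation, including the agents $A$ and $H$, the utility function $U_H$, the preference relation $\succeq_H$, and the choice set $X$, is my own shorthand and only gestures at the article's actual formulation:

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

% Illustrative sketch only: the notation is mine, not necessarily the article's.

% Decision-theoretic version: an AI $A$ is aligned with a human $H$ if the
% policy $A$ enacts maximizes $H$'s utility function $U_H$ over outcomes $o$.
\[
  \pi_A \in \operatorname*{arg\,max}_{\pi \in \Pi} \; \mathbb{E}_{o \sim \pi}\!\left[ U_H(o) \right]
\]

% Generalized version: drop the utility function and require only that the
% outcome $o_A$ that $A$ brings about is weakly preferred by $H$ under a
% preference relation $\succeq_H$ defined over a choice set $X$, where $X$
% may be any set of thoughts the agents can have.
\[
  \forall x \in X : \quad o_A \succeq_H x
\]

\end{document}
```

The point of the two displays is the shift in assumptions: the first requires that $H$'s preferences admit a utility representation, while the second only requires a preference relation over an arbitrary choice set, at the cost of leaving open how to treat humanity as a single agent $H$.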
Note: this is being submitted as part of the current round of the AI alignment prize.