Formally Stating the AI Alignment Problem

  • The AI alignment problem lacks a formal statement beyond "align AI with human values".
  • I try to give a formal statement that incorporates what people have so far figured out AI alignment should mean, using...
    • Decision Theory
      • Good starting point, since much AI alignment research is done within decision theory or in ways compatible with it.
      • Turns out to be inadequate because utility functions impose strong assumptions on agent preferences (e.g. the VNM axioms) that do not hold in general.
    • Axiology
      • Allows more flexible statement of preferences.
      • Still makes strong assumptions about the choice set.
    • Noematology
      • Allows choice set to be any set of thoughts the agents have.
      • Gives us a way to formally specify AI alignment without strong assumptions.
      • But still requires that we solve the problem of how to treat humanity as an agent.
  • Suggests existing AI alignment research is inadequate because it focuses only on solving the decision-theory or axiology versions of the alignment problem.
  • Existing research should continue, but we should also explore additional research directions, since existing efforts are likely to be insufficient on their own.
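As an illustrative sketch (the notation here is mine, not taken from the full text), the decision-theoretic version of the problem can be written as requiring that the AI's utility function agree with humanity's:

```latex
% Decision-theoretic sketch (requires amsmath): an agent with utility
% function U picks the expected-utility-maximizing action from a choice
% set A. Alignment then asks for an AI utility U_AI matching a human
% utility U_H:
\[
  a^{*} \;=\; \operatorname*{arg\,max}_{a \in A} \;
    \mathbb{E}\!\left[ U_{\mathrm{AI}}(a) \right],
  \qquad U_{\mathrm{AI}} \approx U_{\mathrm{H}} .
\]
% This presumes U_H exists at all, i.e. that human preferences satisfy
% the assumptions needed for a utility representation -- the strong
% assumption the summary above argues does not hold.
```

The axiology and noematology versions can be read as successively relaxing what the choice set \(A\) and the preference structure are allowed to be.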

Full text at link.

Note: this is being submitted as part of the current round of the AI alignment prize.