(via twitter)

Seems pretty relevant to the Natural Categories hypothesis.


My current favorite story for "how we solve alignment" is:

  1. Solve the natural categories hypothesis
  2. Add corrigibility
  3. Combine these to build an AI that "does what I meant, not what I said"
  4. Distribute the code (or a foundation model) for such an AI as widely as possible, so that it becomes the default whenever anyone builds an AI
  5. Build some kind of "coalition of the willing" to make sure that human-compatible AI always has a big margin of advantage in computation

1 comment

Could you explain a bit more how this is relevant to building a DWIM ("do what I mean") AI?