An embedding decoder model, trained with a different objective on a different dataset, can decode another model's embeddings surprisingly accurately

by Logan Zoellner
3rd Sep 2023
1 min read
(via twitter)

Seems pretty relevant to the Natural Categories hypothesis.
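Not the linked experiment itself, but a rough sketch of the flavour of the claim (that independently trained models' embedding spaces line up more than you might expect): fit a plain least-squares linear map between two off-the-shelf sentence-embedding models and check whether nearest-neighbour retrieval survives the translation. The model names and the linear-map shortcut are illustrative assumptions here, not the setup from the linked work, which trains an actual decoder model.

```python
# Toy sketch (illustrative, not the linked experiment): translate one
# sentence-embedding model's vectors into another model's space with a
# simple linear map, then check nearest-neighbour retrieval.
import numpy as np
from sentence_transformers import SentenceTransformer

# Small, varied corpus for fitting the map.
subjects = ["The dog", "A scientist", "The river", "An old car", "The chef",
            "My neighbour", "The orchestra", "A small child", "The committee", "The storm"]
predicates = ["ran through the park.", "published a surprising result.",
              "flooded the valley.", "broke down on the highway.",
              "cooked dinner for twenty guests.", "slept through the noise.",
              "played late into the night.", "asked a difficult question.",
              "voted to delay the decision.", "knocked out the power."]
train_texts = [f"{s} {p}" for s in subjects for p in predicates]

test_texts = [
    "A cat sleeps on the warm windowsill.",
    "The stock market fell sharply on Monday.",
    "Researchers trained a new language model.",
    "She baked bread early in the morning.",
]

model_a = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dim embeddings
model_b = SentenceTransformer("all-mpnet-base-v2")  # 768-dim embeddings

# Embed the same sentences with both models.
A_train = model_a.encode(train_texts)   # shape (100, 384)
B_train = model_b.encode(train_texts)   # shape (100, 768)

# Least-squares linear map W so that A_train @ W ~= B_train.
W, *_ = np.linalg.lstsq(A_train, B_train, rcond=None)

# Translate held-out embeddings from model A into model B's space.
translated = model_a.encode(test_texts) @ W
targets = model_b.encode(test_texts)

def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# For each translated vector, is its nearest neighbour in B's space the
# embedding of the same sentence?
sims = unit(translated) @ unit(targets).T
hits = (sims.argmax(axis=1) == np.arange(len(test_texts))).mean()
print(f"nearest-neighbour match after linear translation: {hits:.0%}")
```

If embedding geometry is largely shared across independently trained models, the match rate should come out well above chance even with this crude map.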
P.S.

My current favorite story for "how we solve alignment" is:

1. Solve the natural categories hypothesis
2. Add corrigibility
3. Combine these to build an AI that "does what I meant, not what I said"
4. Distribute the code/a foundation model for such an AI as widely as possible so it becomes the default whenever anyone is building an AI
5. Build some kind of "coalition of the willing" to make sure that human-compatible AI always has a big margin of advantage in terms of computation

Comments

Charlie Steiner · 2y

Could you explain a bit more how this is relevant to building a DWIM AI?