On my side podcast, "The Filan Cabinet", I invited Ronny Fernandez and Divia Eden to talk about the strong orthogonality thesis, and whether it's true. Seems like people here might also be interested. Podcast description below, and you can listen here.

In this episode, Divia Eden and Ronny Fernandez talk about the (strong) orthogonality thesis - that arbitrarily smart intelligences can be paired with arbitrary goals, without additional complication beyond that of specifying the goal - with light prompting from me. Topics they touch on include:

  • Why aren't bees brilliant scientists?
  • Can you efficiently make an AGI out of one part that predicts the future conditioned on some plans, and another that evaluates whether plans are good?
  • If minds are made of smaller sub-agents with more primitive beliefs and desires, does that shape their terminal goals?
    • Also, how would that even work?
  • Which is cooler: rockets, or butterflies?
  • What processes would make AIs terminally value integrity?
  • Why do beavers build dams?
  • Would these questions be easier to answer if we made octopuses really smart?

My first post should be validated soon, and is a proof that the strong form does not hold: in some games, terminal alignment performs worse than equivalent non-terminal alignment.

A hypothesis is that most goals, if they become "terminal" (valued "in themselves", impervious to change), prevent evolution and mutualistic relationships with other agents.
