The idea of consequentialist agents arising in sufficiently strong optimizing systems intuitively makes sense to me. However, I don't have a good mental model of the differences between a world where optimization daemons can arise and a world where they can't (i.e. what facts about the world provide Bayesian evidence for the concept of ODs). The only example I've seen is the evolution of humans, but I find it concerning that I can't make any other predictions about the world based on the idea of ODs.

What other Bayesian evidence/potential intuition pumps exist for the possibility of optimization daemons arising?

[removed discussion of religion to make the question more clear/straightforward]


1 Answer

Rohin Shah

Apr 22, 2019

I think a lot of the intuition right now is "there is an argument that inner optimizers will arise by default; we don't know how likely it is, but evolution is one example, so the probability is not negligible".

For the argument part, have you read More realistic tales of doom? Part 2 is a good explanation of why inner optimizers might arise.

I have read that post; it makes sense, but I'm not sure how to distinguish "correct" from "persuasive but wrong" in this case without other evidence.

6 comments

"Catholicism predicts that all soulless optimizers will explicitly represent and maximize their evolutionary fitness function" is a pretty unusual view (even as Catholic views go)! If you want answers to take debates about God and free will into account, I suggest mentioning God/Catholicism in the title.

More broadly, my recommendation would be to read all of https://www.lesswrong.com/rationality and flag questions and disagreements there before trying to square any AI safety stuff with your religious views.

Regarding the first part of your comment: If I understand the quoted section correctly, I don't think I know enough about biology or theology to confidently take a position on that view. Is the observed behavior of some soulless optimizer (e.g. intelligent non-human primates) significantly different from what one would expect if they only maximized inclusive genetic fitness? If so, that would definitely answer my question.

Thank you for the prompt response to a poorly-worded question!

I'm not particularly interested in answers that take God/free will into account; I was just hoping to find evidence/justifications for the existence of optimization daemons other than evolution. It sounds like my question would be clearer and more relevant if I removed the mention of religion?

I don't understand why it would be -- it looks like MENACE is just a simple physical algorithm that successfully optimizes for winning tic-tac-toe. I thought the idea of an OD was that a process optimizing for goal A hard enough could produce a consequentialist* agent that cares about a different goal B. What is the goal B here (or am I misunderstanding the concept)?

*in the sense Christiano uses "consequentialist"

You are right, it's not a good example, since the optimization pressure does not result in optimizing for a different goal.
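
To make the MENACE point concrete, here is a minimal sketch of a MENACE-style learner in Python. It is not a faithful reconstruction of the original matchbox machine: the class and function names, the bead counts, the reward values (+3 win, +1 draw, -1 loss), and the uniformly random opponent are all assumptions chosen for illustration. What it is meant to show is that the only thing training changes is a table of bead counts updated directly from game outcomes, so there is no separate learned goal B anywhere in the system.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of the empty cells."""
    return [i for i, cell in enumerate(board) if cell == ' ']


class Menace:
    """One 'matchbox' per board state seen; bead counts weight each move."""

    def __init__(self, initial_beads=3, rng=None):
        self.boxes = {}                     # board string -> {move: beads}
        self.initial_beads = initial_beads  # assumed value, for illustration
        self.rng = rng or random.Random()

    def choose(self, board):
        key = ''.join(board)
        box = self.boxes.setdefault(
            key, {m: self.initial_beads for m in legal_moves(board)})
        if sum(box.values()) == 0:          # box ran out of beads: refill it
            for m in box:
                box[m] = self.initial_beads
        moves = list(box)
        move = self.rng.choices(moves, weights=[box[m] for m in moves])[0]
        return key, move

    def reinforce(self, history, reward):
        """Add (or remove) beads for every move played in one game."""
        for key, move in history:
            self.boxes[key][move] = max(0, self.boxes[key][move] + reward)


def play_game(menace, rng):
    """MENACE plays X and moves first; O plays uniformly at random."""
    board, history, player = [' '] * 9, [], 'X'
    while True:
        if player == 'X':
            key, move = menace.choose(board)
            history.append((key, move))
        else:
            move = rng.choice(legal_moves(board))
        board[move] = player
        w = winner(board)
        if w or not legal_moves(board):
            return w, history
        player = 'O' if player == 'X' else 'X'


def train(games=2000, seed=0):
    rng = random.Random(seed)
    menace = Menace(rng=rng)
    results = {'X': 0, 'O': 0, None: 0}
    rewards = {'X': 3, None: 1, 'O': -1}    # assumed reward scheme
    for _ in range(games):
        w, history = play_game(menace, rng)
        menace.reinforce(history, rewards[w])
        results[w] += 1
    return menace, results


if __name__ == "__main__":
    _, results = train()
    print(results)
```

The whole "policy" is the `boxes` dictionary, and every update to it comes straight from the win/draw/loss signal. That is the sense in which the sketch optimizes for winning tic-tac-toe without producing anything that looks like an agent pursuing a different objective.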