## LessWrong

On self-deception

Does the chapter really count as evidence? Normally, X is evidence for Z if P(X|Z) > P(X|not Z). In this case, X = "there are compelling arguments for Z", and you already suppose that X is true whether or not Z holds. X is therefore not evidence for Z. Of course, after reading the chapter you learn the particular compelling arguments A(1), A(2), ... But those arguments support Z only through X, and since you know X is not evidence, the A(n) are screened off. Put another way, you know that for each A(n) there is an equally compelling argument B(n) that can...

It's not obvious to me that X screens off the individual arguments. In particular, X only asserts the existence of at least one compelling argument. If there are multiple, independent compelling arguments in the book, then this should presumably increase our confidence in Z beyond just knowing X. Or am I confused about something?

Also, the individual pieces of evidence could undercut my confidence in the strength of the other books' arguments conditional on (knowledge of) the evidence I just learned. For example, suppose we expect all arguments from the other books to proceed from some initially plausible premise P. But the Zoroastrian book produces strong evidence against P. Then our evidence goes quite beyond X.
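The claim at the top of this thread, that X is not evidence for Z, can be written out in the odds form of Bayes' rule. This is only a sketch of that one step; it uses the thread's own symbols X and Z and says nothing about the individual arguments A(n), which is the point the replies dispute.

```latex
\frac{P(Z \mid X)}{P(\neg Z \mid X)}
  = \frac{P(X \mid Z)}{P(X \mid \neg Z)} \cdot \frac{P(Z)}{P(\neg Z)}
```

If compelling arguments exist whether or not Z holds, then P(X|Z) = P(X|not Z), the likelihood ratio is 1, and the posterior odds equal the prior odds.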


(Meta-note: First post on this site)

I have read the sequence on self-deception/doublethink and I have some comments for which I'd like to solicit feedback. This post is going to focus on the idea that it's impossible to deceive oneself, or to make oneself believe something which one knows a priori to be wrong. I think Eliezer believes this to be true, e.g. as discussed here. I'd like to propose a contrary position.

Let's suppose that a super-intelligent AI has been built, and it knows plenty of tricks that no human ever thought of for presenting a false argument that is not easily detected as false. Whether it does that by presenting subtly wrong premises, or by incorrect generalization, or word tricks, or who knows what, is not important. It can, however, present an argument in a Socratic manner, and like Socrates' interlocutors, you find yourself agreeing with things you don't expect to agree with. I now come to this AI and ask it to make a library of books for me (personally). Each is to be such that if I (specifically) were to read it, I would very likely come to believe a certain proposition. It should take into account that initially I may be opposed to the proposition, and that I am aware that I am being manipulated. Now, the AI produces such a library, on the topic of religion, for all major known religions, A to Z. It has books called "You should be an atheist", "You should be a Christian", etc., up to "You should be a Zoroastrian".

Suppose I now want to deceive myself. I throw fair dice and end up picking the Zoroastrian book. I now commit to reading the entire book and do so. In the process I become convinced that indeed, I should be a Zoroastrian, despite my initial skepticism. Now my skeptical friend comes to me:

Q: You don't really believe in Zoroastrianism.

A: No, I do. Praise Ahura Mazda!

Q: You can't possibly mean it. You know that you didn't believe it and you read a book that was designed to manipulate you, and now you do? Don't you have any introspective ability?

A: I do. I didn't intend to believe it, but it turns out that it is actually true! Just because I picked this book up for the wrong reason doesn't mean I can't now be genuinely convinced. There are many examples of people who studied the religion of their enemy in order to discredit it and in the process became convinced of its truth. I think St. Augustine was a somewhat similar case.

Q: But you know the book is written in such a way as to convince you, whether it's true or not.

A: I took that into account, and my prior was really low that I would ever believe it. But the evidence presented in the book was so significant and convincing that it overcame my skepticism.

Q: But the book is a rationalization of Zoroastrianism. It's not an impartial analysis.

A: I once read a book trying to explain and prove Gödel's theorem. It was written explicitly to convince the reader that the theorem was true. It started with the conclusion and built all arguments to prove it. But the book was in fact correct in asserting this proposition.

Q: But the AI is a clever arguer. It only presents arguments that are useful to its cause.

A: So is the book on Gödel's theorem. It never presented any arguments against Gödel, and I know there are some, at least philosophical ones. It's still true.

Q: You can't make a new decision based on such a book which is a rationalization. Perhaps it can only be used to expand one's knowledge. Even if it argues in support of a true proposition, a book that is a rationalization is not really evidence for the proposition's truth.

A: You know that our AI created a library of books to argue for most theological positions. Do you agree that with very high probability one of the books in the library argues for a true proposition? E.g. the one about atheism? If I were to read it now, I'd become an atheist again.

Q: Then do so!

A: No, Ahura Mazda will punish me. I know I would think he's not there after I read it, but he'll punish me anyway. Besides, at present I believe that book to be intentionally misleading. Anyway, if one of the books argues for a true proposition, it may also use a completely valid argument without any tricks. I think this is true of this book on Zoroastrianism, and is false of all other books in the AI's library.

Q: Perhaps I believe the Atheism book argues for a true proposition, but it is possible that all the books written by the AI use specious reasoning, even the one that argues for a true proposition. In this case, you can't rely on any of them being valid.

A: Why should the AI do that? Valid argument is the best way to demonstrate the truth of something that is in fact true. If tricks are used, this may be uncovered which would throw doubt onto the proposition being argued.

Q: If you picked a book "You should believe in Zeus", you'd believe in Zeus now!

A: Yes, but I would be wrong. You see, I accidentally picked the right one. Actually, it's not entirely accidental. You see, if Ahura Mazda exists, he would with some positive probability interfere with the dice and cause me to pick the book on the true religion because he would like me to be his worshiper. (Same with other gods, of course.) So, since P(I picked the book on Zoroastrianism|Zoroastrianism is a true religion) > P(I picked the book on Zoroastrianism|Zoroastrianism is a false religion), I can conclude by Bayes' rule that my picking that book up is evidence for Zoroastrianism. Of course, if the prior P(Zoroastrianism is a true religion) is low, it's not a lot of evidence, but it's some.

Q: So you are really saying you won the lottery.

A: Yes. A priori, the probability is low, of course. But I actually have won the lottery: some people do, you know. Now that I have won it, the probability is close to 1 (It's not 1, because I recognize that I could be wrong, as a good Bayesian should. But the evidence is so overwhelming, my model says it's really close to 1).

Q: Why don't you ask your super-intelligent AI directly whether the book's reasoning is sound?

A: According to the book, I am not supposed to do it because Ahura Mazda wouldn't like it.

Q: Of course, the book is written by the superintelligent AI in such a way that there's no trick I can think of that it didn't cover. Your ignorance is now invincible.

A: I still remain a reasonable person and I don't like being denied access to information. However, I am now convinced that while having more information is useful, it is not my highest priority anymore. I know it is possible for me to disbelieve again if given certain (obviously false!) information, but my estimate of the chance that any further true information could change my opinion is very low. In fact, I am far more likely to be deceived by false information about Ahura Mazda, because I am not superintelligent. This is why Ahura Mazda (who is superintelligent, by the way) advises that one should not tempt oneself into sin by reading any criticism of Zoroastrianism.

Q: Just read that atheist book and become normal again!

A: You are possessed by demons! Repent and become the follower of Ahura Mazda!

So, are you now convinced that ~~you should be a Zoroastrian~~ one can really intentionally deceive oneself and be in a mentally consistent (although weird) state?

To answer one potential objection, an ordinary person may well not need a super-intelligent AI to fool him, just a very convincing book or very clever interlocutor. As to why someone would want to submit to this, I'll discuss this in a separate post.
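The dice step in the dialogue can be checked numerically. The sketch below is only an illustration under assumptions that are not in the post: the library holds `n_books` books picked by a fair die, the deity of a true religion forces the pick of its own book with a hypothetical probability `q`, and if the religion is false the die is simply fair (interference by other gods is ignored). The function name and all numbers are made up for the example.

```python
def posterior_after_pick(prior: float, n_books: int, q: float) -> float:
    """Posterior P(religion is true | its book was picked), via Bayes' rule.

    prior   -- P(religion is true) before the dice are thrown
    n_books -- number of books in the library (hypothetical)
    q       -- assumed chance that a true deity forces the pick of its book
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if n_books < 1 or not 0.0 <= q <= 1.0:
        raise ValueError("need n_books >= 1 and 0 <= q <= 1")

    # If the religion is true: forced pick with probability q, else fair die.
    p_pick_if_true = q + (1.0 - q) / n_books
    # If it is false: fair die (simplifying assumption, see lead-in).
    p_pick_if_false = 1.0 / n_books

    numerator = prior * p_pick_if_true
    return numerator / (numerator + (1.0 - prior) * p_pick_if_false)


if __name__ == "__main__":
    # With q = 0 the pick carries no information and the prior is unchanged.
    for q in (0.0, 0.01, 0.5):
        print(q, posterior_after_pick(prior=0.001, n_books=26, q=q))
```

With `q = 0` the posterior equals the prior; any `q > 0` raises it, but with a low prior and a small `q` the shift stays small, which matches "it's not a lot of evidence, but it's some".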

Update:

Here are some points that I think are useful to add from various comments.

• Require the AI to use tricks as little as possible. In particular, every true proposition it argues should be argued for in an entirely logically valid fashion. This shortcuts the part of the Q&A where I discuss whether the book you picked can possibly be entirely valid.
• From the above point, and assuming that really only one religion of our set can be "true", P(Book on X is valid) = P(X is true). If more than one religion can be true at the same time, it's possible that P(Book on X is valid) > P(X is true).
• P(X is true) for any religion X is some value strictly between 0 and 1. It's obvious, but just to ensure we are not ruling out anything a priori. This is my ordinary probability of belief in X.
• P(Book on X will convince me to believe X with probability > 1-epsilon) is some value close to 1-f(epsilon), where f(epsilon) is small, but presumably may increase as epsilon -> 0. In other words, it's harder for the AI to convince me without a shadow of a doubt. Epsilon is picked by the AI to keep f(epsilon) reasonably small. I don't know what value was picked, but it clearly depends on X: the more ridiculous X is, the larger epsilon is. If the AI is infinitely more intelligent than I am, perhaps it can keep f(epsilon) small even as epsilon -> 0.
• However, whatever P(Book on X will convince me to believe X with probability > 1-epsilon) for X=Zoroastrianism is, it was high enough that the book succeeded in my case.
• I do not think it is valid to make a meta-statement on what the value of the posterior P(X is true|I have read the book on X) can be, without actually reading the book. In particular, the book has at least this probability of being true: P(Book on X is valid) >= P(X is true) > 0, so you cannot claim that the posterior is the same as the prior merely because you believe that the book will convince you of X and it does. Additionally, any meta-argument clearly depends on f(epsilon), which I don't know.
• The book can convince me to adjust my world view in such a way that will rule out the invisible elephant problem, at least where modern science is concerned. I will remember what the science says, of course, but where it conflicts with my religion I will really believe what the religion says, even if it says it's turtles all the way down, and I will really be afraid of falling off the edge of the Earth if that's what my religion teaches.

Any thoughts on whether I should post this on the main site?