(Meta-note: First post on this site)
I have read the sequence on self-deception/doublethink and I have some comments for which I'd like to solicit feedback. This post is going to focus on the idea that it's impossible to deceive oneself, or to make oneself believe something which one knows a priori to be wrong. I think Eliezer believes this to be true, e.g. as discussed here. I'd like to propose a contrary position.
Let's suppose that a super-intelligent AI has been built, and it knows plenty of tricks that no human ever thought of for presenting a false argument whose falsity is not easily detected. Whether it does that by presenting subtly wrong premises, or by incorrect generalization, or word tricks, or who knows what, is not important. It can, however, present an argument in a Socratic manner, and like Socrates' interlocutors, you find yourself agreeing with things you don't expect to agree with. I now come to this AI and ask it to make a library of books for me (personally). Each is to be such that if I (specifically) were to read it, I would very likely come to believe a certain proposition. It should take into account that initially I may be opposed to the proposition, and that I am aware that I am being manipulated. Now the AI produces such a library, on the topic of religion, for all major known religions, A to Z. It has a book called "You should be an atheist", and "You should be a Christian", etc, up to "You should be a Zoroastrian".
Suppose I now want to deceive myself. I throw fair dice and end up picking the Zoroastrian book. I now commit to reading the entire book and do so. In the process I become convinced that indeed, I should be a Zoroastrian, despite my initial skepticism. Now my skeptical friend comes to me:
Q: You don't really believe in Zoroastrianism.
A: No, I do. Praise Ahura Mazda!
Q: You can't possibly mean it. You know that you didn't believe it and you read a book that was designed to manipulate you, and now you do? Don't you have any introspective ability?
A: I do. I didn't intend to believe it, but it turns out that it is actually true! Just because I picked this book up for the wrong reason doesn't mean I can't now be genuinely convinced. There are many examples where people would study the religion of their enemy in order to discredit it and in the process become convinced of its truth. I think St. Augustine was in a somewhat similar case.
Q: But you know the book is written in such a way as to convince you, whether it's true or not.
A: I took that into account, and my prior was really low that I would ever believe it. But the evidence presented in the book was so significant and convincing that it overcame my skepticism.
Q: But the book is a rationalization of Zoroastrianism. It's not an impartial analysis.
A: I once read a book trying to explain and prove Gödel's theorem. It was written explicitly to convince the reader that the theorem was true. It started with the conclusion and built all arguments to prove it. But the book was in fact correct in asserting this proposition.
Q: But the AI is a clever arguer. It only presents arguments that are useful to its cause.
A: So is the book on Gödel's theorem. It never presented any arguments against Gödel, and I know there are some, at least philosophical ones. It's still true.
Q: You can't make a new decision based on such a book which is a rationalization. Perhaps it can only be used to expand one's knowledge. Even if it argues in support of a true proposition, a book that is a rationalization is not really evidence for the proposition's truth.
A: You know that our AI created a library of books to argue for most theological positions. Do you agree that with very high probability one of the books in the library argues for a true proposition? E.g. the one about atheism? If I were to read it now, I'd become an atheist again.
Q: Then do so!
A: No, Ahura Mazda will punish me. I know I would think he's not there after I read it, but he'll punish me anyway. Besides, at present I believe that book to be intentionally misleading. Anyway, if one of the books argues for a true proposition, it may also use a completely valid argument without any tricks. I think this is true of this book on Zoroastrianism, and is false of all other books in the AI's library.
Q: Perhaps I believe the Atheism book argues for a true proposition, but it is possible that all the books written by the AI use specious reasoning, even the one that argues for a true proposition. In this case, you can't rely on any of them being valid.
A: Why should the AI do that? Valid argument is the best way to demonstrate the truth of something that is in fact true. If tricks are used, this may be uncovered which would throw doubt onto the proposition being argued.
Q: If you picked a book "You should believe in Zeus", you'd believe in Zeus now!
A: Yes, but I would be wrong. You see, I accidentally picked the right one. Actually, it's not entirely accidental. You see, if Ahura Mazda exists, he would with some positive probability interfere with the dice and cause me to pick the book on the true religion because he would like me to be his worshiper. (Same with other gods, of course). So, since P(I picked the book on Zoroastrianism|Zoroastrianism is a true religion) > P(I picked the book on Zoroastrianism|Zoroastrianism is a false religion), I can conclude by Bayes' rule that me picking that book up is evidence for Zoroastrianism. Of course, if the prior P(Zoroastrianism is a true religion) is low, it's not a lot of evidence, but it's some.
Q: So you are really saying you won the lottery.
A: Yes. A priori, the probability is low, of course. But I actually have won the lottery: some people do, you know. Now that I have won it, the probability is close to 1 (It's not 1, because I recognize that I could be wrong, as a good Bayesian should. But the evidence is so overwhelming, my model says it's really close to 1).
Q: Why don't you ask your super-intelligent AI directly whether the book's reasoning is sound?
A: According to the book, I am not supposed to do it because Ahura Mazda wouldn't like it.
Q: Of course, the book is written by the superintelligent AI in such a way that there's no trick I can think of that it didn't cover. Your ignorance is now invincible.
A: I still remain a reasonable person and I don't like being denied access to information. However, I am now convinced that while having more information is useful, it is not my highest priority anymore. I know it is possible for me to disbelieve again if given certain (obviously false!) information, but my estimate of the chance that any further true information could change my opinion is very low. In fact, I am far more likely to be deceived by false information about Ahura Mazda, because I am not superintelligent. This is why Ahura Mazda (who is superintelligent, by the way) advises that one should not tempt oneself into sin by reading any criticism of Zoroastrianism.
Q: Just read that atheist book and become normal again!
A: You are possessed by demons! Repent and become the follower of Ahura Mazda!
So, are you now convinced that one can really intentionally deceive oneself and be in a mentally consistent (although weird) state?
To answer one potential objection, an ordinary person may well not need a super-intelligent AI to fool him, just a very convincing book or very clever interlocutor. As to why someone would want to submit to this, I'll discuss this in a separate post.
Here are some points that I think are useful to add from various comments.
- Require the AI to use tricks as little as possible. In particular, every true proposition it argues should be argued for in an entirely logically valid fashion. This shortcuts the part of the Q&A where I discuss whether the book you picked can possibly be entirely valid.
- From the above point, and assuming that really only one religion of our set can be "true", P(Book on X is valid) = P(X is true). If more than one religion can be true at the same time, it's possible that P(Book on X is valid) > P(X is true).
- P(X is true) for any religion X is some value strictly between 0 and 1. It's obvious, but just to ensure we are not ruling out anything a priori. This is my ordinary probability of belief in X.
- P(Book on X will convince me to believe X with probability > 1-epsilon) is some value close to 1-f(epsilon), where f(epsilon) is small, but presumably may increase as epsilon -> 0. In other words, it's harder for the AI to convince me without a shadow of a doubt. Epsilon is picked by the AI to keep f(epsilon) reasonably small. I don't know what value was picked, but it clearly depends on X: the more ridiculous X is, the larger epsilon is. If the AI is infinitely more intelligent than I am, perhaps it can keep f(epsilon) small even as epsilon -> 0.
- However, whatever P(Book on X will convince me to believe X with probability > 1-epsilon) for X=Zoroastrianism is, it was high enough that the book succeeded in my case.
- I do not think it is valid to make a meta-statement on what the value of the posterior P(X is true|I have read the book on X) can be, without actually reading the book. In particular, the book has at least this probability of being true: P(Book on X is valid) >= P(X is true) > 0, so you cannot claim that the posterior is the same as prior because you believe that the book will convince you of X and it does. Additionally, any meta-argument clearly depends on f(epsilon), which I don't know.
- The book can convince me to adjust my world view in such a way that will rule out the invisible elephant problem, at least where modern science is concerned. I will remember what the science says, of course, but where it conflicts with my religion I will really believe what the religion says, even if it says it's turtles all the way down, and I will really be afraid of falling off the edge of the Earth if that's what my religion teaches.
Any thoughts on whether I should post this on the main site?
If before you open the book, you believe that the book will provide incredibly compelling evidence of Zoroastrianism whether or not Zoroastrianism is true, and upon opening the book you find incredibly compelling evidence of Zoroastrianism, your probability of Zoroastrianism should not change, since you didn't observe any evidence which is more likely to exist if Zoroastrianism were true than if it were not true.
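The point above can be checked with a few lines of arithmetic. This is only a sketch: the prior and the likelihoods are made-up numbers chosen for illustration, and `posterior` is a hypothetical helper, not something from the post.

```python
from fractions import Fraction


def posterior(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' rule for a binary hypothesis: returns P(H | observation)."""
    joint_true = prior * p_obs_if_true
    joint_false = (1 - prior) * p_obs_if_false
    return joint_true / (joint_true + joint_false)


# Hypothetical numbers, for illustration only.
prior = Fraction(1, 1000)         # P(Zoroastrianism is true) before reading
p_compelling = Fraction(99, 100)  # P(book is compelling), assumed identical
                                  # whether or not Zoroastrianism is true

# The observation is equally likely under both hypotheses,
# so the likelihood ratio is 1 and the prior is returned unchanged.
after_reading = posterior(prior, p_compelling, p_compelling)
print(after_reading)  # 1/1000
```

If the two likelihoods differ at all, the posterior moves; the argument rests entirely on the assumption that they are equal.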
It may be that you are underestimating the AI's cleverness, so that you expect to see decent evidence of Zoroastrianism, but in fact you found incredible evidence of Zoroastrianism, and so you become convinced. In this case your false belief about the AI not being too convincing is doing the philosophical work of deceiving you, and it's no longer really deceiving yourself. Deceiving yourself seems to be more about starting with all correct beliefs, but talking yourself into an incorrect belief.
If you happen to luck out into having a false belief about the AI being unconvincing, and if this situation with the library of theology just falls out of the sky without your arranging it, you got lucky - but that's being deceived by others. If you try to set up the situation, you can't deliberately underestimate the AI because you'll know you're doing it. And you can't set up the theological library situation until you're confident you've deliberately underestimated the AI.
This presumes that your mind can continue to obey the rules of Bayesian updating in the face of an optimization process that's deliberately trying to make it break those rules. We can't do that very well.
You may want to look at Branden Fitelson's short paper Evidence of evidence is not (necessarily) evidence. You seem to be arguing that, since we have strong evidence that the book has strong evidence for Zoroastrianism before we read it, it follows that we already have (the most important part of) our evidence for Zoroastrianism. But it turns out that it's extremely tricky to make this sort of reasoning work. To use the most primitive example from the paper, discovering that a playing card C is black is evidence that C is the ace of spades. Furthermore, that C is the ace of spades is excellent evidence that it's an ace. But discovering that C is black does not give you any evidence whatsoever that C is an ace.
The problem here - at least one of them - is that discovering C is black is just as much evidence for C being the x of spades for any other card-value x. Similarly, before opening the book on Zoroastrianism, we have just as much evidence for the existence of strong evidence for Christianity/atheism/etc, so our credences shouldn't suddenly start favoring any one of these. But once we learn the evidence for Zoroastrianism, we've acquired new information, in just the same way that learning that the card is an ace of spades provides us new information if we previously just knew it was black.
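The card example can be verified by brute-force enumeration of a standard 52-card deck. The helper names below (`prob`, `is_black`, and so on) are mine, introduced only for this sketch.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOR = {"spades": "black", "clubs": "black",
              "hearts": "red", "diamonds": "red"}
DECK = list(product(RANKS, SUIT_COLOR))  # 52 (rank, suit) pairs


def prob(event, given=lambda card: True):
    """P(event | given) for one card drawn uniformly from the deck."""
    pool = [card for card in DECK if given(card)]
    hits = sum(1 for card in pool if event(card))
    return Fraction(hits, len(pool))


def is_black(card):
    return SUIT_COLOR[card[1]] == "black"


def is_ace(card):
    return card[0] == "A"


def is_ace_of_spades(card):
    return card == ("A", "spades")


# Black is evidence for the ace of spades: 1/52 rises to 1/26.
print(prob(is_ace_of_spades), prob(is_ace_of_spades, given=is_black))
# The ace of spades is conclusive evidence for an ace: 1/13 rises to 1.
print(prob(is_ace), prob(is_ace, given=is_ace_of_spades))
# Yet black is no evidence for an ace: 1/13 stays 1/13.
print(prob(is_ace), prob(is_ace, given=is_black))
```

The same enumeration shows that black raises the probability of every other spade by the same factor, which is why it cannot favor aces in particular.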
I do suspect that there are relevant disanalogies here, but don't have a very detailed understanding of them.
Suppose I am going to read a book by a top Catholic theologian. I know he is probably smarter than me: given the number of priests in the world and their average IQ and intellectual abilities, I figure the smartest of them is probably really, really smart, better read than I am, and has the very best arguments the Church has found in 2000 years. If I read his book, should I take this into account and discount his evidence because of this meta-information? Or should I evaluate the evidence?
It's the very fallacy Eliezer argues against where people know about clever arguers and use this fact against everyone else.
You should take the meta-information into account, because what you're getting is filtered evidence. See What Evidence Filtered Evidence. If the book only contained very weak arguments, this would suggest that no strong arguments could be found, and would therefore be evidence against what the book was arguing for.
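A rough sketch of how the filtering cuts both ways, under assumed numbers: suppose a motivated human author finds a strong argument with probability 9/10 if the claim is true and 3/5 if it is false. These likelihoods, the prior, and the `posterior` helper are all hypothetical.

```python
from fractions import Fraction


def posterior(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' rule for a binary hypothesis: returns P(H | observation)."""
    joint_true = prior * p_obs_if_true
    joint_false = (1 - prior) * p_obs_if_false
    return joint_true / (joint_true + joint_false)


# Hypothetical numbers, for illustration only.
prior = Fraction(1, 5)             # P(claim is true) before reading
p_strong_if_true = Fraction(9, 10)  # author finds a strong argument, claim true
p_strong_if_false = Fraction(3, 5)  # author finds a strong argument anyway

# Strong arguments from a motivated author: a modest update upward.
after_strong = posterior(prior, p_strong_if_true, p_strong_if_false)

# Only weak arguments from a motivated author: a sharp update downward.
after_weak = posterior(prior, 1 - p_strong_if_true, 1 - p_strong_if_false)

print(after_strong, after_weak)  # 3/11 1/17
```

The closer the two likelihoods get (a more capable arguer), the less a strong argument tells you; in the limit where they are equal, as with the AI in the post, it tells you nothing.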
Such as a rogue AI (played by EY) convincing a skeptic to let it out of the box. Apparently the super-intelligence threshold does not need to be super-high (no offense to EY).
A con man does not even need to be smarter than a conned man to con him.
I've noticed this sort of thing with documentaries about the JFK assassination. One documentary will seem to produce very strong and reasonable evidence that Oswald did it, and the next documentary seems to have a similarly strong argument that he did not. Sigh. The real world is confusing sometimes; when smart people are trying to make you more confused, life is hard.
I used to have a hard time with cases like that. Then I figured out the right mental category to put them in, after the story of Alexander Litvinenko's poisoning made headlines. I took an outside view: spy poisoned, accuses other spies, and a radioactive trail leads to someone's door. Once I realized that there were obviously competent parties fucking with the evidence, I classified it as "spy business" and deemed it unsolvable. Having this mental category has served me well, and it's fairly obvious that the JFK assassination goes in the same bucket.
Cf. Epistemic Luck.
This argument is irrelevant to the point Eliezer was making in the sequence, as it doesn't distinguish between the levels of self-deception possible in normal human experience and those reachable with superoptimization. In effect, you are exploiting the sorites fallacy (or fallacy of gray). That superoptimization might be able to break your mind in a certain way says little about whether your mind can normally break that way.
We rarely observe Christians trying to walk on water even though they should be able to, given enough faith. In fact they act as if it's impossible. I assume that this is the sort of thing you are talking about? But we also see people trying faith healing even though it doesn't work. Their model of the world really is different from yours. Likewise with Scientologists and psychiatry. They aren't faking it. If Z tells me that I must pray in order to be healed, and not take drugs (I have no idea if it does, probably not) and I do in fact do so, being convinced by the book that I must, would that be sufficient?
Not sure whether you really mean 'know a priori to be wrong', which would be a very bold claim on almost any issue. But I think people can definitely self-deceive. Used to spend a lot of time arguing on religion sites, and I always found the counter-argument to Pascal's Wager that you 'can't just decide what to believe' very weak, especially as Pascal himself set out an 'influence your own belief' how-to. It's not even that exceptional: I suspect that most people could get themselves into being 'true believers' in one political stance or other by surrounding...
Was it? See the following passage:
is an argument from ignorance, and
is not true, it's not even properly wrong since it assumes the mind projection fallacy.
Firstly, upvoted for an excellent problem!
So the ...
Is Eliezer's claim that it is impossible for a perfect reasoner to deceive themself, or that it is impossible for real-life humans to deceive themselves?
I assume he doesn't argue that crazy people can't deceive themselves. But then where is the boundary between crazy and perfect? And if the claim only applies to perfect reasoners, of what use is it?
An introduction to Zoroastrianism, by Omega:
"Dear reader, if you have picked this book first, I recommend that you stop being rational, right now. Ignore all rational rules and techniques you know, and continue reading this book with an open mind -- that means, without critical thinking. Because if you fail to believe in Zoroastrianism, I will torture you for eternity, and I mean it!"
A friendly Omega could write this too, if it already knows that the reader will surrender, so at the end the reader is not tortured, and the reader's wish (to believe in Zoro...