(Meta-note: First post on this site)

I have read the sequence on self-deception/doublethink and I have some comments for which I'd like to solicit feedback. This post is going to focus on the idea that it's impossible to deceive oneself, or to make oneself believe something which one knows a priori to be wrong. I think Eliezer believes this to be true, e.g. as discussed here. I'd like to propose a contrary position.

Let's suppose that a super-intelligent AI has been built, and it knows plenty of tricks that no human ever thought of for presenting a false argument in a way that makes its falsity hard to detect. Whether it does so through subtly wrong premises, incorrect generalization, word tricks, or who knows what, is not important. It can, however, present an argument in a Socratic manner, and like Socrates' interlocutors, you find yourself agreeing with things you don't expect to agree with. I now come to this AI and ask it to make a library of books for me (personally). Each book is to be such that if I (specifically) were to read it, I would very likely come to believe a certain proposition. It should take into account that I may initially be opposed to the proposition, and that I am aware that I am being manipulated. The AI produces such a library on the topic of religion, covering all major known religions, A to Z. It has a book called "You should be an atheist", one called "You should be a Christian", and so on, up to "You should be a Zoroastrian".

Suppose I now want to deceive myself. I roll fair dice and end up picking the Zoroastrian book. I commit to reading the entire book, and I do so. In the process I become convinced that I should indeed be a Zoroastrian, despite my initial skepticism. Now my skeptical friend comes to me:

Q: You don't really believe in Zoroastrianism.

A: No, I do. Praise Ahura Mazda!

Q: You can't possibly mean it. You know that you didn't believe it and you read a book that was designed to manipulate you, and now you do? Don't you have any introspective ability?

A: I do. I didn't intend to believe it, but it turns out that it is actually true! Just because I picked this book up for the wrong reason doesn't mean I can't now be genuinely convinced. There are many examples of people who studied the religion of their enemies in order to discredit it and in the process became convinced of its truth. I think St. Augustine was in a somewhat similar case.

Q: But you know the book is written in such a way as to convince you, whether it's true or not.

A: I took that into account, and my prior was really low that I would ever believe it. But the evidence presented in the book was so significant and convincing that it overcame my skepticism.

Q: But the book is a rationalization of Zoroastrianism. It's not an impartial analysis.

A: I once read a book trying to explain and prove Gödel's theorem. It was written explicitly to convince the reader that the theorem was true. It started with the conclusion and built all arguments to prove it. But the book was in fact correct in asserting this proposition.

Q: But the AI is a clever arguer. It only presents arguments that are useful to its cause.

A: So is the book on Gödel's theorem. It never presented any arguments against Gödel, and I know there are some, at least philosophical ones. It's still true.

Q: You can't make a new decision based on a book that is a rationalization. Perhaps such a book can only be used to expand one's knowledge. Even if it argues in support of a true proposition, a book that is a rationalization is not really evidence for the proposition's truth.

A: You know that our AI created a library of books to argue for most theological positions. Do you agree that with very high probability one of the books in the library argues for a true proposition? E.g. the one about atheism? If I were to read it now, I'd become an atheist again.

Q: Then do so!

A: No, Ahura Mazda will punish me. I know I would think he's not there after I read it, but he'll punish me anyway. Besides, at present I believe that book to be intentionally misleading. Anyway, if one of the books argues for a true proposition, it may also use a completely valid argument without any tricks. I think this is true of this book on Zoroastrianism, and false of all the other books in the AI's library.

Q: Perhaps I believe the Atheism book argues for a true proposition, but it is possible that all the books written by the AI use specious reasoning, even the one that argues for a true proposition. In this case, you can't rely on any of them being valid.

A: Why would the AI do that? A valid argument is the best way to demonstrate the truth of something that is in fact true. If tricks are used, they may be uncovered, which would throw doubt on the proposition being argued.

Q: If you picked a book "You should believe in Zeus", you'd believe in Zeus now!

A: Yes, but I would be wrong. You see, I accidentally picked the right one. Actually, it's not entirely accidental. You see, if Ahura Mazda exists, he would with some positive probability interfere with the dice and cause me to pick the book on the true religion because he would like me to be his worshiper. (Same with other gods, of course). So, since P(I picked the book on Zoroastrianism|Zoroastrianism is a true religion) > P(I picked the book on Zoroastrianism|Zoroastrianism is a false religion), I can conclude by Bayes' rule that me picking that book up is evidence for Zoroastrianism. Of course, if the prior P(Zoroastrianism is a true religion) is low, it's not a lot of evidence, but it's some.
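To make the dice argument concrete, here is a minimal Python sketch with a hypothetical model that is not in the dialogue: if Zoroastrianism (Z) is true, Ahura Mazda rigs the roll with probability `q`, and otherwise the roll is fair over `n_books` books; if Z is false, the roll is simply fair (other gods are ignored). The prior of 1/1000, the 20 books and `q = 1/10` are illustrative numbers only.

```python
from fractions import Fraction as F

def pick_likelihoods(n_books, q):
    # P(pick Z's book | Z true): the god rigs the roll with probability q,
    # otherwise the roll is fair (1/n_books).
    if_true = q + (1 - q) * F(1, n_books)
    # P(pick Z's book | Z false): assumed to be a fair roll (other gods ignored).
    if_false = F(1, n_books)
    return if_true, if_false

def posterior_after_pick(prior, n_books, q):
    """Bayes' rule: P(Z | I picked Z's book)."""
    if_true, if_false = pick_likelihoods(n_books, q)
    joint_true = prior * if_true
    joint_false = (1 - prior) * if_false
    return joint_true / (joint_true + joint_false)

post = posterior_after_pick(F(1, 1000), n_books=20, q=F(1, 10))
print(post)  # 29/10019, roughly 0.0029
```

With these numbers the likelihood ratio is 2.9, so the posterior roughly triples but stays below 0.3%: some evidence, but not a lot, as A says. With `q = 0` the posterior equals the prior.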

Q: So you are really saying you won the lottery.

A: Yes. A priori, the probability is low, of course. But I actually have won the lottery: some people do, you know. Now that I have won it, the probability is close to 1. (It's not 1, because I recognize that I could be wrong, as a good Bayesian should. But the evidence is so overwhelming that my model says it's really close to 1.)

Q: Why don't you ask your super-intelligent AI directly whether the book's reasoning is sound?

A: According to the book, I am not supposed to do it because Ahura Mazda wouldn't like it.

Q: Of course, the book is written by the superintelligent AI in such a way that there's no trick I can think of that it didn't cover. Your ignorance is now invincible.

A: I still remain a reasonable person and I don't like being denied access to information. However, I am now convinced that while having more information is useful, it is not my highest priority anymore. I know it is possible for me to disbelieve again if given certain (obviously false!) information, but my estimate of the chance that any further true information could change my opinion is very low. In fact, I am far more likely to be deceived by false information about Ahura Mazda, because I am not superintelligent. This is why Ahura Mazda (who is superintelligent, by the way) advises that one should not tempt oneself into sin by reading any criticism of Zoroastrianism.

Q: Just read that atheist book and become normal again!

A: You are possessed by demons! Repent and become the follower of Ahura Mazda!


So, are you now convinced that you should be a Zoroastrian? Sorry, I mean: convinced that one can really intentionally deceive oneself and be in a mentally consistent (although weird) state?

To answer one potential objection: an ordinary person may well not need a super-intelligent AI to fool him, just a very convincing book or a very clever interlocutor. I'll discuss why someone would want to submit to this in a separate post.




Here are some points from various comments that I think are useful to add:


  • Require the AI to use tricks as little as possible. In particular, every true proposition it argues should be argued for in an entirely logically valid fashion. This shortcuts the part of the Q&A where I discuss whether the book you picked can possibly be entirely valid.
  • From the above point, and assuming that really only one religion of our set can be "true", P(Book on X is valid) = P(X is true). If more than one religion can be true at the same time, it's possible that P(Book on X is valid) > P(X is true).
  • P(X is true) for any religion X is some value strictly between 0 and 1. This is obvious, but I state it to make sure we are not ruling anything out a priori. This is my ordinary probability of belief in X.
  • P(Book on X will convince me to believe X with probability > 1-epsilon) is some value close to 1-f(epsilon), where f(epsilon) is small but presumably may increase as epsilon -> 0. In other words, it's harder for the AI to convince me beyond a shadow of a doubt. Epsilon is picked by the AI to keep f(epsilon) reasonably small. I don't know what value was picked; it clearly depends on X: the more ridiculous X is, the larger epsilon is. If the AI is infinitely more intelligent than I am, perhaps it can keep f(epsilon) small even as epsilon -> 0.
  • However, whatever P(Book on X will convince me to believe X with probability > 1-epsilon) for X=Zoroastrianism is, it was high enough that the book succeeded in my case.
  • I do not think it is valid to make a meta-statement about what the value of the posterior P(X is true|I have read the book on X) can be without actually reading the book. In particular, the book has at least this probability of being valid: P(Book on X is valid) >= P(X is true) > 0, so you cannot claim that the posterior is the same as the prior just because you believe that the book will convince you of X and it does. Additionally, any meta-argument clearly depends on f(epsilon), which I don't know.
  • The book can convince me to adjust my world view in a way that rules out the invisible elephant problem, at least where modern science is concerned. I will remember what science says, of course, but where it conflicts with my religion I will really believe what the religion says: even if it says it's turtles all the way down, I will really be afraid of falling off the edge of the Earth if that's what my religion teaches.
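The points above can be illustrated with a simplified Python sketch that treats "convinced" as a yes/no event and ignores epsilon. The assumptions are mine, not part of the post: a book arguing for a true X is valid and always convinces me, while a book arguing for a false X still convinces me with probability 1 - f. Under that model the posterior equals the prior only when f = 0 and rises modestly otherwise, so the size of the update hinges on f.

```python
from fractions import Fraction as F

def posterior_given_convinced(p, f):
    """P(X is true | the book on X convinced me), under a simplified model.

    Assumed: if X is true, the book is valid and convinces me with certainty;
    if X is false, the tricked book still convinces me with probability 1 - f.
    """
    convinced_if_true = F(1)
    convinced_if_false = 1 - f
    numerator = p * convinced_if_true
    return numerator / (numerator + (1 - p) * convinced_if_false)

prior = F(1, 100)  # illustrative P(X is true)
for f in (F(0), F(1, 100), F(1, 2)):
    print(f, posterior_given_convinced(prior, f))
# 0 1/100
# 1/100 100/9901
# 1/2 2/101
```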


Any thoughts on whether I should post this on the main site?

83 comments

If before you open the book, you believe that the book will provide incredibly compelling evidence of Zoroastrianism whether or not Zoroastrianism is true, and upon opening the book you find incredibly compelling evidence of Zoroastrianism, your probability of Zoroastrianism should not change, since you didn't observe any evidence which is more likely to exist if Zoroastrianism were true than if it were not true.

It may be that you are underestimating the AI's cleverness, so that you expect to see decent evidence of Zoroastrianism, but in fact you find incredible evidence of Zoroastrianism, and so you become convinced. In this case your false belief about the AI not being too convincing is doing the philosophical work of deceiving you, and it's no longer really deceiving yourself. Deceiving yourself seems to be more about starting with all correct beliefs and then talking yourself into an incorrect belief.
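The two paragraphs above can be put in odds form with a short Python sketch (illustrative numbers, my own assumptions): a calibrated reader expects the AI's incredible evidence whether or not Z is true, so the likelihood ratio is 1 and nothing changes, while a reader who underestimates the AI and believes such evidence would appear only 1 time in 1000 without Z ends up near 50%.

```python
from fractions import Fraction as F

def posterior_odds(prior_odds, p_obs_if_z, p_obs_if_not_z):
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_obs_if_z / p_obs_if_not_z)

def to_prob(odds):
    return odds / (1 + odds)

prior_odds = F(1, 999)  # prior P(Z) = 1/1000

# Calibrated: the AI yields incredible evidence whether or not Z is true.
calibrated = posterior_odds(prior_odds, F(1), F(1))
# Underestimating the AI: I wrongly think such evidence appears only
# 1 time in 1000 if Z is false.
miscalibrated = posterior_odds(prior_odds, F(1), F(1, 1000))

print(to_prob(calibrated))     # 1/1000
print(to_prob(miscalibrated))  # 1000/1999, about 0.5
```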

If you happen to luck out into having a false belief about the AI being unconvincing, and if this situation with the library of theology just falls out of the sky without your arranging it, you got lucky - but that's being deceived by others. If you try to set up the situation, you can't deliberately underestimate the AI because you'll know you're doing it. And you can't set up the theological library situation until you're confident you've deliberately underestimated the AI.


This presumes that your mind can continue to obey the rules of Bayesian updating in the face of an optimization process that's deliberately trying to make it break those rules. We can't do that very well.

OP argued that self-deception occurs even if your brain remains unbroken. I would characterize "not breaking my brain" as allowing my prior belief about the book's biasedness to make a difference in my posterior confidence of the book's thesis. In that case the book might be arbitrarily convincing; but I might start with an arbitrarily high confidence that the book is biased, and then it boils down to an ordinary Bayesian tug o' war, and Yvain's comment applies. On the other hand, I'd view a brain-breaking book as a "press X to self-modify to devout Y-believer" button. If I know the book is such, I decide not to read it. If I'm ignorant of the book's nature, and I read it, then I'm screwed.
True. So in the process of deceiving yourself, you must first become irrational. The problem is then protesting that you are "still a reasonable person."
Not quite; you might choose to deceive yourself for decision-theoretic reasons. For example, the Zoroastrian Inquisition might be going around with very good lie detectors and punishing anyone who doesn't believe. We usually equate rationality with true beliefs, but this is only an approximation; decision theory is more fundamental than truth.
That only seems to mean that you were a reasonable person.

You may want to look at Brandon Fitelson's short paper "Evidence of evidence is not (necessarily) evidence". You seem to be arguing that, since we have strong evidence that the book has strong evidence for Zoroastrianism before we read it, it follows that we already have (the most important part of) our evidence for Zoroastrianism. But it turns out that it's extremely tricky to make this sort of reasoning work. To use the simplest example from the paper, discovering that a playing card C is black is evidence that C is the ace of spades. Furthermore, that C is the ace of spades is excellent evidence that it's an ace. But discovering that C is black does not give you any evidence whatsoever that C is an ace.

The problem here - at least one of them - is that discovering C is black is just as much evidence for C being the x of spades for any other card-value x. Similarly, before opening the book on Zoroastrianism, we have just as much evidence for the existence of strong evidence for Christianity/atheism/etc, so our credences shouldn't suddenly start favoring any one of these. But once we learn the evidence for Zoroastrianism, we've acquired new information, in just the same way that learning that the card is an ace of spades provides us new information if we previously just knew it was black.
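Fitelson's card example can be checked by brute-force enumeration of a standard 52-card deck. This small Python sketch (the helper names are mine) computes each conditional probability exactly.

```python
from fractions import Fraction as F
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["spades", "clubs", "hearts", "diamonds"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def prob(event, given=lambda card: True):
    """P(event | given) for a card drawn uniformly from DECK."""
    pool = [card for card in DECK if given(card)]
    return F(sum(1 for card in pool if event(card)), len(pool))

def is_ace(card):
    return card[0] == "A"

def is_black(card):
    return card[1] in ("spades", "clubs")

def is_ace_of_spades(card):
    return card == ("A", "spades")

print(prob(is_ace_of_spades), prob(is_ace_of_spades, given=is_black))  # 1/52 1/26
print(prob(is_ace, given=is_ace_of_spades))                             # 1
print(prob(is_ace), prob(is_ace, given=is_black))                       # 1/13 1/13
```

Black doubles the probability of the ace of spades, and the ace of spades guarantees an ace, yet black leaves the probability of an ace at 1/13.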

I do suspect that there are relevant disanalogies here, but don't have a very detailed understanding of them.

Scott Alexander
Not exactly. I do think this would be a true statement, if the book was a genuine book on Zoroastrianism and not a book which we know was designed to deceive us. But as far as I know it's only tangentially connected to the argument I'm making. Thanks for summarizing the paper; I tried to read it but it was written in a way that seemed designed to be as obscure as possible. Your explanation makes more sense. But I still don't see the problem. Learning a card is black increases the chance it's the ace of spades or clubs, but decreases the chance it's the ace of hearts or diamonds. The chance that it's the ace of spades becomes greater, but the net chance that it's an ace remains exactly the same. Evidence of evidence is still evidence, but evidence of evidence plus evidence of evidence that goes the opposite direction cancel out and make zero evidence. Again, I'm not sure about the relevance here. It's not the case that, merely by knowing the book exists without reading it, we have new evidence for the existence of some evidence which both supports and, in a different way, opposes Zoroastrianism.
(I guess I'd say it was written in a way designed to be precise. But I agree that the author isn't the best writer.) I find this sentence hard to make sense of. Based on the first part of the sentence, you seem to be suggesting that the problem in the card scenario is that our evidence^2 (= the card is black) is both evidence^1 for the card being an ace and evidence^1 against the card being an ace, and the two pieces of evidence^1 balance out to yield the same total probability of the card being an ace as before. But clearly no single piece of information, such as the card being black, can provide evidence^1 both for and against a given hypothesis. It either yields evidence^1 or it doesn't. And if it doesn't, then evidence^2 is not always evidence^1. Anyway, the relevance is this: When we learn the card is black, we acquire evidence for a bunch of different pieces of information which, taken on their own, have varying probabilistic effects on the hypothesis that the card is an ace. These effects add up in such a way as to leave the posterior probability of the hypothesis untouched. But once we actually learn one of these individual pieces of information, suddenly the posterior shoots way up. Similarly, before we read the book, we have evidence for a bunch of different pieces of information which, taken on their own, have varying probabilistic effects on the truth of Zoroastrianism. These effects add up in such a way as to leave our posterior in Zoroastrianism untouched (assuming we don't consider non-book-possessing religions). So why is it that when we learn one of these pieces of information by reading the book, our posterior shouldn't change, unlike in the card case?
Thank you, that's what I was trying to get at, but didn't know how.
O.K., here's a disanalogy that may be important. In the card case, learning that C is the ace of spades should drastically lower our credence of the card being another x of spades. On the other hand, after reading the Zoroastrianism book, we shouldn't significantly doubt that the other books contain strong evidence, as well, given the known capabilities of the AI. This isn't a very formal treatment, though.
"Evidence of evidence is more likely to be filtered evidence" is a more accurate phrasing.
I'm not exactly sure what the "more likely" here means. More likely than what?
The link keeps omitting the colon in the "http://." I don't know why it's doing that.
Evidence of evidence is not (necessarily) evidence Markup code: [Evidence of evidence is not (necessarily) evidence](http://fitelson.org/eee.pdf)

Suppose I am going to read a book by a top Catholic theologian. I know he is probably smarter than me: given the number of priests in the world and their average IQ and intellectual abilities, I figure the smartest of them is probably really, really smart, more well read than me, and has the very best arguments the Church has found in 2000 years. If I read his book, should I take it into account and discount his evidence because of this meta information? Or should I evaluate the evidence?

This is the very fallacy Eliezer argues against, where people who know about clever arguers use that knowledge against everyone else.

If I read his book, should I take it into account and discount his evidence because of this meta information? Or should I evaluate the evidence?

You should take the meta-information into account, because what you're getting is filtered evidence. See "What Evidence Filtered Evidence?". If the book only contained very weak arguments, this would suggest that no strong arguments could be found, and would therefore be evidence against what the book was arguing for.

Fair enough. But the arguments themselves must also update my belief. It should never be the case that this meta stuff completely cancels out an argument that I think is valid. That is irrational, just like not listening to someone who belongs to the enemy.
If you were already completely certain that you were about to read a valid argument, and then you read that argument, then the meta stuff would completely cancel it out. If you were almost completely certain that you were about to read a valid argument, and then you read it, then the meta stuff would almost (but not completely) cancel it out. This is why reading the same argument twice in a row does not affect your confidence much more than reading it once does. But the less certain you were about the argument's validity the first time, the more of an effect going over it again should have.
How can this be true when different arguments have different strengths and you don't know what the statement is? Here, suppose you believe that you are about to read a completely valid argument in support of conventional arithmetic. Please update your belief now. Here is the statement: "2+2=4". What if it were instead Russell's Principia Mathematica?
But you had assumed that the book would contain extremely strong arguments in favour of Zoroastrianism. Here strong means that P(Zoroastrianism is correct | argument is valid) is big, not that P(argument is valid) is big after reading the argument. (At least this is how I interpret your setting.) Both "all arguments in Principia Mathematica are correct" and "2+2=4" have high probabilities of being true, but P(arithmetic is correct | all arguments in Principia are correct) is much higher than P(arithmetic is correct | 2+2=4).
We are running into meta issues that are really hard to wrap your head around. You believe that the book is likely to convince you, but it's not absolutely guaranteed to. Whether it will do so surely depends on the actual arguments used. You'd expect, a priori, that if it argues for X which is more likely, its arguments would also be more convincing. But until you actually see the arguments, you don't know that they will convince you. It depends on what they actually are. In your formulation, what happens if you read the book and the arguments do not convince you? Also, what if the arguments do not convince you, but only because you expect the book to be extremely convincing? Is this different from the case where the same arguments, taken without this meta-knowledge, do not convince you?
I think I address some of these questions in another reply, but anyway, I will try a detailed description. Let's denote the following propositions:
* Z = "Zoroastrianism is true."
* B = Some particular, previously unknown, statement included in the book. It is supposed to be evidence for Z. Let this be in the form of propositions so that I am able to assign it a probability (e.g. B shouldn't be a Pascal-wagerish extortion).
* C(r) = "B is compelling to such an extent that it shifts the odds for Z by ratio r". That is, C(r) = "P(B|Z) = r*P(B|not Z)".
* F = Unknown evidence against Z.
* D(r) = "F shifts the odds against Z by ratio r."
Before reading the book:
1. p(Z) is low.
2. I may have a probability distribution for "B = S" (that is, "the convincing argument contained in the book is S") over the set of all possible S; but if I have it, it is implicit, in the sense that I have an algorithm which assigns p(B = S) for any given S, but haven't gone through the whole huge set of all possible S; otherwise the evidence in the book wouldn't be new to me in any meaningful sense.
3. I have p(S|Z) and p(S|not Z) for all S, implicitly as in the previous case.
4. I can't calculate the distribution p(C(r)) from p(B = S), p(S|Z) and p(S|not Z), since that would require calculating p(B = S) explicitly for every S, which is out of reach; however
5. I have obtained p(C(r)) by other means (knowledge about how the book is constructed), and p(C(r)) has most of its mass at pretty high values of r.
6. By the same means I have obtained p(D(r)), which is concentrated at as high or even higher values of r.
Can I update the prior p(Z)? If I knew for certain that C(1,000) is true, I should take it into account and multiply the odds for Z by 1,000. If I knew that D(10,000) is true, I should analogously divide the odds by 10,000. Having probability distributions instead of certainty changes little: calculate the expected value* E(r) for both C and D and use that. If the values for C and D are similar or only d
I am not sure I completely follow, but I think the point is that you will in fact update the probability up if a new argument is more convincing than you expect. Since the AI can estimate what you expect better than you can estimate how convincing the AI will make its arguments, it will be able to make all of them more convincing than you expect.
I think you are adding further specifications to the original setting. Your original description assumed that the AI is a very clever arguer who constructs very persuasive deceptive arguments. Now you assume that the AI actively tries to make the arguments more persuasive than you expect. You can stipulate for argument's sake that the AI can always make a more convincing argument than you expect, but 1) it's not clear whether that's even possible in realistic circumstances, and 2) it obscures the (interesting and novel) original problem ("is evidence of evidence as valuable as the evidence itself?") with a rather standard Newcomb-like mind-reading paradox.
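The arithmetic of points 5 and 6 can be sketched in Python as below. The discrete distributions for r are made-up placeholders, and, following the comment's own proposal, the sketch multiplies the odds by E(r) under p(C(r)) and divides by E(r) under p(D(r)). The comment's footnote on "expected value" is cut off, so whether the plain expectation is the right summary is left open here.

```python
from fractions import Fraction as F

def expected_ratio(dist):
    """E(r) for a discrete distribution given as {r: probability}."""
    assert sum(dist.values()) == 1, "probabilities must sum to 1"
    return sum(r * p for r, p in dist.items())

def updated_odds(prior_odds, dist_c, dist_d):
    """Multiply the odds for Z by E(r) under p(C(r)); divide by E(r) under p(D(r))."""
    return prior_odds * expected_ratio(dist_c) / expected_ratio(dist_d)

prior_odds = F(1, 999)                    # illustrative: p(Z) = 1/1000
dist_c = {100: F(1, 2), 1000: F(1, 2)}    # hypothetical p(C(r))
dist_d_same = dict(dist_c)                # hypothetical p(D(r)), same shape
dist_d_higher = {1000: F(1, 2), 10000: F(1, 2)}

print(updated_odds(prior_odds, dist_c, dist_d_same))    # 1/999
print(updated_odds(prior_odds, dist_c, dist_d_higher))  # 1/9990
```

When p(C(r)) and p(D(r)) have the same shape the odds do not move; when D sits at ten times higher values, the odds fall tenfold.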
The evidence that this was the best book he could give you is evidence.
Maybe, but this meta stuff is giving me a headache. Should I update belief about belief, or just plain belief?:)
Which may very well be an adaptive fallacy that makes you harder to manipulate by smarter people. It is possible that in the ancestral environment: Cost of smart people manipulating you > Cost of being somewhat wrong in a hard to spot way
There is some degree to which you should expect to be swayed by empty arguments, and yes, you should subtract that out if you anticipate it. But if the book is a lot more compelling than that, then the book is probably above average both in arguing skill and in actual evidence. You cannot discount it solely as empty anymore, but neither should you assume that all of the "excess" convincing came from evidence - the book could just be unusually well written. You have to balance the improbabilities of evidence vs. writing, and update on the evidence found in that way. Usually, the uncertainty grows with the size of the thing you're trying to measure. This means that when thinking about super-duper-well-written books, the uncertainty in the writing skill gets really big. And so when balancing the improbabilities of evidence vs. writing, the evidence barely has to do any balancing at all - the writing skill just washes it out. If the amount of evidence presented is the same, it's better to hear about the truth from a child than from an orator, because the child doesn't have all those orating skills mucking up your signal-to-noise.
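One way to make this comment's point quantitative is the standard normal-normal shrinkage formula. The Python sketch below assumes, purely for illustration, that observed convincingness is the sum of evidence and writing skill, modelled as independent zero-mean Gaussians. The posterior mean of the evidence is then the observation scaled by var(evidence) / (var(evidence) + var(writing)), so as the uncertainty in writing skill grows, the same convincingness is credited less and less to evidence.

```python
def inferred_evidence(convincingness, sd_evidence, sd_writing):
    """Posterior mean of the evidence part of an observed convincingness.

    Hypothetical model: convincingness = evidence + writing skill, where the
    two are independent zero-mean Gaussians with the given standard deviations.
    """
    weight = sd_evidence**2 / (sd_evidence**2 + sd_writing**2)
    return weight * convincingness

# Same observed convincingness, increasingly skilled (and uncertain) writers.
for label, sd_writing in [("child", 0.5), ("orator", 3.0), ("superintelligence", 100.0)]:
    print(label, inferred_evidence(10.0, sd_evidence=1.0, sd_writing=sd_writing))
# roughly: child 8.0, orator 1.0, superintelligence 0.001
```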
Right. I think my argument hinges on the fact that the AI knows how much you intend to subtract before you read the book, and can make the book more convincing than that.
I don't think it's okay to have the AI's convincingness be truly infinite, in the full inf - inf = undefined sense. Your math will break down. It's safer to represent "suppose there's a super-good arguer" by making the convincingness finite, but larger than every other scale in the problem.
You have a very compelling point and I have to think about it. But there is meta-reasoning involved which is really tricky. As I start to read the book, I have some P(Zoroastrianism is true). It's non-zero. Now I read the first chapter; it has some positive evidence for Z in it. I expected to see some evidence, but it is actual evidence which I have not previously considered. Should I adjust my P(Z is true) up? I think I must. So, if the book has many chapters, I must either get close to 1, or else start converging to some p < 1. Are you arguing for the latter?
Scott Alexander
Consider the case where a friend says he saw a UFO. There are two possibilities: either the friend is lying/insane/gullible, or UFOs are real (there are probably some other possibilities, but for the sake of argument let's focus on these). Your friend's statement can have different effects depending on what you already believe. If either probability is already at ~100%, you have no more work to do. That is, if you're already sure your friend is a liar, you dismiss this as yet another lie and don't start believing in UFOs; if you're already sure UFOs exist, you dismiss this as yet another UFO and don't start doubting your friend. If you're not ~100% sure of either statement, then your observation will increase both the probability that your friend is a liar and the probability that aliens exist, but by different amounts. If you think your friend usually tells the truth, but you're not sure, it will increase your probability of UFOs quite a bit (your friend wouldn't lie to you!) but as long as you're not going to be sure of UFOs, you also have to leave some room for the case where UFOs aren't real, in which case the statement increases your probability that your friend is a liar. When you hear a great argument for P, your pre-existing beliefs determine what you do in the same way as in the UFO example. It could mean that your interlocutor is a rhetorical genius so brilliant they can think up great arguments even for false positions. Or it could mean P is true. In real life, the probability of the interlocutor being such a rhetorical genius is always less than ~100%, meaning that it has to increase your probability of P at least a little. In your example, we already know that the AI is a rhetorical genius who can create an arbitrarily good argument for anything. That totally explains away the brilliant arguments, leaving nothing left to be explained by Zoroastrianism actually being true. It's like when your friend who is a known insane liar says he saw a UFO: the insane liar part alre
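The UFO example can be worked through by enumerating the four combinations of liar/honest and UFO/no UFO. In this Python sketch the model is hypothetical: the two hypotheses are independent a priori, a liar reports a UFO half the time regardless of the truth, and an honest friend reports one half the time only if UFOs are real.

```python
from fractions import Fraction as F
from itertools import product

def ufo_posteriors(p_liar, p_ufo, p_report=F(1, 2)):
    """Return (P(UFO | report), P(liar | report)) under the hypothetical model."""
    joint = {}
    for liar, ufo in product((True, False), repeat=2):
        prior = (p_liar if liar else 1 - p_liar) * (p_ufo if ufo else 1 - p_ufo)
        # A liar reports regardless; an honest friend reports only if UFOs are real.
        likelihood = p_report if (liar or ufo) else F(0)
        joint[(liar, ufo)] = prior * likelihood
    total = sum(joint.values())
    p_ufo_post = sum(v for (liar, ufo), v in joint.items() if ufo) / total
    p_liar_post = sum(v for (liar, ufo), v in joint.items() if liar) / total
    return p_ufo_post, p_liar_post

print(ufo_posteriors(F(1, 10), F(1, 100)))  # (Fraction(10, 109), Fraction(100, 109))
print(ufo_posteriors(F(1), F(1, 100)))      # (Fraction(1, 100), Fraction(1, 1))
```

With P(liar) = 1/10 and P(UFO) = 1/100, the report raises both probabilities; with P(liar) = 1, the UFO probability stays at 1/100, which is the "known insane liar" case. The sketch assumes independence, which is exactly what the next comment questions.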
I understand the principle, yes. But it means if your friend is a liar, no argument he gives needs to be examined on its own merits. But what if he is a liar and he saw a UFO? What if P(he is a liar) and P(there's a UFO) are not independent? I think if they are independent, your argument works. If they are not, it doesn't. If UFOs appear mostly to liars, you can't ignore his evidence. Do you agree? In my case, they are not independent: it's easier to argue for a true proposition, even for a very intelligent AI. Here I assume that P must be strictly less than 1 always.
Does the chapter really count as evidence? Normally, X is evidence for Z if P(X|Z) > P(X|not Z). In this case, X = "there are compelling arguments for Z", and you already suppose that X is true whether or not Z is. X is therefore not evidence for Z. Of course, after reading the chapter you learn the particular compelling arguments A(1), A(2), ... But those arguments support Z only through X, and since you know X is not evidence, A(n) are screened off. Put another way, you know that for each A(n) there is an equally compelling argument B(n) that cancels it out. Knowing what the argument actually says is important only if you want to independently determine its compellingness. But you have assumed that you already know this. Consider a more realistic scenario: you have a coin and two hypotheses:
* F: the coin is fair
* H: the coin is biased towards heads, and it comes up heads twice as frequently as tails
Now you tell your servant: "Toss the coin a million times and write the results down. Then, from the record, select a subsequence S which has P(S|H) = 1.58 P(S|F) and tell me." The servant follows your instructions and says "HHTH". Now you have learned something new (the servant could, for example, have said "THHH" instead), but you don't update your odds in favour of H by a factor of 1.58, because it was almost certain in advance that the servant would be able to locate a subsequence with the desired property whether or not H holds.
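The servant example can be checked in Python. Under H (heads twice as likely as tails) versus F (fair), the likelihood ratio of "HHTH" is (2/3)^3 (1/3) / (1/2)^4 = 128/81, about 1.58. The sketch then has a hypothetical servant toss a fair coin (1,000 times here rather than a million, to keep it fast, and treating "subsequence" as a contiguous four-toss window) and search for a window with that ratio. One is essentially always found, which is why hearing the selected window should not move the odds.

```python
import random
from fractions import Fraction

# Probability of heads under each hypothesis from the comment.
P_HEADS = {"fair": Fraction(1, 2), "biased": Fraction(2, 3)}

def likelihood(seq, hypothesis):
    """P(seq | hypothesis) for a string of independent tosses such as "HHTH"."""
    p = P_HEADS[hypothesis]
    result = Fraction(1)
    for toss in seq:
        result *= p if toss == "H" else 1 - p
    return result

def likelihood_ratio(seq):
    return likelihood(seq, "biased") / likelihood(seq, "fair")

def servant_finds(target, n_tosses, rng):
    """Toss a FAIR coin n_tosses times, then return the first 4-toss window whose
    likelihood ratio equals target (None if there is none)."""
    tosses = "".join(rng.choice("HT") for _ in range(n_tosses))
    for i in range(len(tosses) - 3):
        window = tosses[i:i + 4]
        if likelihood_ratio(window) == target:
            return window
    return None

print(likelihood_ratio("HHTH"))  # 128/81, about 1.58
print(servant_finds(Fraction(128, 81), 1000, random.Random(0)))
```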
It's not obvious to me that X screens off the individual arguments. In particular, X only asserts the existence of at least one compelling argument. If there are multiple, independent compelling arguments in the book, then this should presumably increase our confidence in Z beyond just knowing X. Or am I confused about something? Also, the individual pieces of evidence could undercut my confidence in the strength of the other books' arguments conditional on (knowledge of) the evidence I just learned. For example, suppose we expect all arguments from the other books to proceed from some initially plausible premise P. But the Zoroastrian book produces strong evidence against P. Then our evidence goes quite beyond X.
Then substitute "there are many deceptive compelling arguments for Z, many more than any book can contain" for the definition of X. The point stands. I believe your second point is only a specific case of arguments being more compelling than expected (here due to their ability to undermine the counterarguments). This is fine: if the arguments are unexpectedly compelling, you update. On the other hand, the problem was pretty symmetric at the beginning: all books were considered equivalent in their persuasive strength. If book Z argued well against P, and all other books took P for granted without similarly undermining the leading premise of book Z, the system of books would be unbalanced (reading all books would cause you to believe Z independently of reading order), which would violate the assumed symmetry. So, if you find good anti-P arguments in book Z, the odds are that your assumption about the contents of the other books being based on P is incorrect, or that the other books contain good counterarguments which cancel this out. You should be very certain that the system of books is balanced, else the problem doesn't work.
I think it's easy to make my second point without the asymmetry. Let's re-pose the problem so that we expect in advance not only that each book will produce strong evidence in favor of the religion it advocates, but also strong evidence that none of the other books contain strong counter-evidence or similarly undermining evidence. When you read book Z, you learn individual pieces of evidence z1, z2, ..., zn. But z1, ..., zn undermine your confidence that the other books contain strong arguments, thus disconfirming your belief that you'd likely find convincing evidence for Zoroastrianism in the book whether or not the religion is true. But then it starts looking like we have evidence for Zoroastrianism. However, if, as you argue, z1, ..., zn only support Zoroastrianism through things we expected to see in advance of reading the book, then we shouldn't have any evidence. So either I'm confused or we still have a problem.
The scenario, as I understand it, is based on the assumption that confidence in y = "all books contain equally strong evidence for their respective religion" is high. If y is absolutely certain, p(y) = 1, the confidence cannot be shaken by whatever is found in book Z. If, on the other hand, y is not certain, then what happens depends a lot on the relative strength of the various pieces of evidence. But this is another (more complex and fuzzier) problem: now you expect that Z not only contains evidence for Zoroastrianism, but also evidence against the very statement of the thought experiment. Doubting y is not included in the original post, where the newly converted Zoroastrian admits that reading book A would deconvert him to atheism; he refrains from doing that only because he fears Ahura Mazda's wrath.
I think your conclusion there trades on an ambiguity about what "evidence" refers to in your y (= "all books contain equally strong evidence for their respective religion"). The assumption y could mean either:
* For each book x, x contains really compelling evidence that we're sure would equally convince us if we were to encounter it in a normal situation (i.e., without knowing about the other books or the AI's deviousness).
* For each book x, x contains really compelling evidence even after considering and correctly reasoning about all the facts of the thought experiment.
Obviously the second interpretation is either incoherent or completely trivializes the thought experiment, since it's an assumption about what the all-things-considered best thing to believe after reading a book is, when that's precisely the question we're being posed in the first place. On the other hand, the first interpretation, even if assumed with probability 1, is compatible with a given book lowering the posterior expected strength of evidence of the other books.
Fair point. The ambiguity is already present in the original formulation of the thought experiment. The first interpretation is compatible with lowering the posterior expected strength of evidence of other books after reading one of them, but it is also compatible with not being convinced by the evidence at all. Under the first interpretation the problem is underspecified and no apparent paradox is present. The second interpretation can have several subinterpretations:
* 2a) For each book x, reading x convinces an ordinary human of the particular proposition argued for in x (possibly using biases and imperfections of the human mind).
* 2b) For each book x, reading x convinces an ideal Bayesian reasoner (IBR) of the particular proposition.
2a was probably closest to the meaning intended in the OP. It is a paradox only if we assume that ordinary human reasoning is consistent, which we don't assume, so there is no problem. 2b depends on what IBR exactly means. If it has no limitations on processing speed and memory, the thought experiment becomes impossible, since the IBR has already considered all possible arguments and can't be swayed by rhetorical trickery. If, on the other hand, the IBR has some physical limitations, 2b can be used to show that its thinking leads to inconsistencies, but that is not much more surprising than the same conclusion in case 2a.

an ordinary person may well not need a super-intelligent AI to fool him, just a very convincing book or very clever interlocutor.

Such as a rogue AI (played by EY) convincing a skeptic to let it out of the box. Apparently the super-intelligence threshold does not need to be super-high (no offense to EY).

A con man does not even need to be smarter than a conned man to con him.

Does anyone (other than Eliezer) think they could successfully play the role of the AI in this game? Would someone like to attempt replication or one of the suggested variants?
I think I have the brainware and knowledge to do it, but not as well or reliably as Eliezer, and I'd need to practice both social skills in general for a few years, and this particular task for 10 to 100 attempts before getting good at it. Does that count?
Thanks for the link. I am really curious how he did it :)

I've noticed this sort of thing with documentaries about the JFK assassination. One documentary will seem to produce very strong and reasonable evidence that Oswald did it, and the next documentary seems to have an argument of similar strength that he did not. Sigh. The real world is confusing sometimes; when smart people are trying to make you more confused, life is hard.

I used to have a hard time with cases like that. Then I figured out the right mental category to put them in, after the story of Alexander Litvinenko's poisoning made headlines. I took an outside view: a spy is poisoned, accuses other spies, and a radioactive trail leads to someone's door. Once I realized that there were obviously competent parties fucking with the evidence, I classified it as "spy business" and deemed it unsolvable. Having this mental category has served me well, and it's fairly obvious that the JFK assassination goes in the same bucket.

This argument is irrelevant to the point Eliezer was making in the sequence, as it doesn't distinguish levels of self-deception possible in normal human experience and those reachable with superoptimization. In effect, you are exploiting the sorites fallacy (or fallacy of gray). That superoptimization might be able to break your mind in a certain way says little about whether your mind can normally break that way.

If you grant me that I am motivated to believe in something false, I think it would not take a super-intelligent AI to convince me. I could go to a monastery in Tibet, isolate myself from society, and ask the best of them to argue with me every day, studying all their books and reading nothing at all that contradicts them. I think it might work. As I pointed out, there are historical examples of people converting to a religion they initially despised. Would my argument not work equally well in this case?
Part of Eliezer's thesis was that converting to a religion doesn't qualify if you would still correctly anticipate experimental results that you'd need to explain away.
I am not completely sure what you mean.
From Belief in Belief:
Oh, but I have a model of what a creationist believes. I can anticipate what arguments they advance and how to "excuse" them (i.e. explain them away) to some extent. Anyone who changed their belief system has this model for their previous system of belief.
An important distinction between what the post talks about and the human arguments/beliefs you refer to is that experimental observations correctly reflect reality, and so the ability to anticipate them is the ability to model the world, despite the urge to insist on the world working differently than it does.
There are, I think, lots of people who have as good a model of how the world works as any here, who are still religious. In fact, if one is a Deist who believes that God pushed the button to start the Big Bang, they may have a model with an extra node in it subject to Occam's razor, but it predicts reality equally well, at least until physicists understand the Big Bang better. Many other people have beliefs of purely "spiritual" type, having no observable effects. But I think a Zoroastrian might not qualify, it's true. So if I read the book and become one, I might be forced to believe that per [http://en.wikipedia.org/wiki/Zoroastrianism#Basic_beliefs] water was the first element to be created (and that it is in fact an element). I might be clever enough to rationalize it away, like many people do. E.g. water really refers to hydrogen here. If I can make myself believe in Ahura Mazda, I think I can also find a way to fit all the other beliefs in.
I think you're missing the point. When a Zian believes that there is an invisible elephant in the garage, he might not have any explicit beliefs in his system suggesting that it is permeable to sand. However, when an interlocutor arrives and suggests throwing sand at it, the Zian would immediately insist that the elephant is permeable to sand, in order to excuse the experimental result that the sand goes right through where the elephant is purported to be. This only works because the Zian actually has an accurate model of the world which contains "no elephant in the garage", so that he can correctly anticipate any experimental results he'll have to excuse when someone proposes one. Thus, though the Zian claims to believe that there is an invisible elephant in the garage, in fact he does not.

We rarely observe Christians trying to walk on water even though they should be able to, given enough faith. In fact they act as if it's impossible. I assume that this is the sort of thing you are talking about? But we also see people trying faith healing even though it doesn't work. Their model of the world really is different from yours. Likewise with scientologists and psychiatry. They aren't faking it. If Z tells me that I must pray in order to be healed, and not take drugs (I have no idea if it does, probably not) and I do in fact do so, being convinced by the book that I must, would that be sufficient?

Well, yes, most Christians believe some parts of Christianity and disbelieve (or only believe they believe) other parts. That any faithful follower of Christ ought to be able to walk on water and command mountains to move are not things they believe; that prayer affects the world and they have souls are things they do believe. This is much like someone who believes Newton's Laws apply to the world, but doesn't believe General Relativity applies to the world. That is, it's near impossible to not have an accurate model of Newton's Laws, mountains, and water in order to survive; it's quite possible to not have an accurate model of relativistic bodies, statistical significance, and strictly-physical worlds and still survive - even thrive. Faith healing and souls strike me as part of the category of things Christians are "still allowed" to believe: The Fourth Sin, Eliezer Yudkowsky
People managed without an accurate model of Newton's laws for most of human history. People do occasionally survive some pretty severe memetic immunity failures. One of my own recent ancestors was a snake handler.
Well, they had a decent approximation of Newton's laws, at least. Otherwise they would struggle to hit things with thrown rocks.
One needs surprisingly little. There's a reason Newton's laws weren't arrived at until the late 1600s. For the vast majority of purposes one can use a pseudo-Aristotelian view of motion and get decent results. But yes, the other two issues could be more immediately fatal.
Citation needed. I can't recall ever observing behavior like that by a religious person. What is really happening is that religions have already been selected to be non-falsifiable. The new adherent to an ancient religion doesn't have to do a lot of work to disqualify observations. The religion is already adapted to be mostly-compatible with the world as currently known, and not to be vulnerable to simple disproofs. When knowledge changes, some clever adherent comes up with a clever explanation, which is quickly disseminated to the faithful.
I don't know why this was downvoted. As long as the religion has an accurate model of the world and the observations it will need to excuse, the individual adherents do not need one.

Not sure whether you really mean 'know apriori to be wrong', which would be a very bold claim on almost any issue. But I think people can definitely self-deceive. I used to spend a lot of time arguing on religion sites, and I always found the counter-argument to Pascal's Wager that you 'can't just decide what to believe' very weak, especially as Pascal himself set out an 'influence your own belief' how-to. It's not even that exceptional: I suspect that most people could get themselves into being 'true believers' in one political stance or other by surrounding...

I also know many cases similar to what you describe which is why I tried to come up with this argument. Here's another link about Eliezer arguing against self-deception. Perhaps he is only claiming that it is very hard, not impossible.

I took that into account, and my prior was really low that I would ever believe it.

Was it? See the following passage:

I know I would think he's not there after I read it


Why should the AI do that?

is an argument from ignorance, and

Valid argument is the best way to demonstrate the truth of something that is in fact true.

is not true; it's not even properly wrong, since it commits the mind projection fallacy.

Thanks, I was hoping someone would do a detailed critique: So, I believe that any of these books would very likely convince me of its truth. Are you saying that I should therefore have a zero prior for each of them? I think not. It needs to be some fixed value. But the AI can estimate it and provide enough evidence to override it. Should I also take this into account? The AI will take that into account too, and we go down a path of infinite adjustments which either converges to 0 or to a positive number. I think in the end I can't assign zero probability, and infinitesimals don't exist, so it has to be a positive number. And at this point I've lost. For the second part, "Why should the AI do that" is a rhetorical question, obviously not an argument. As for "Valid argument is the best way to demonstrate the truth of something that is in fact true.", it does not have to be true; it's a good point. However, it's not critical. The AI could have picked a valid argument, if it so desired. In fact, I can add this to the initial conditions: the AI must pick a valid argument for any proposition it argues, if one exists, and in general minimize the number of logical errors / tricks used.
As an atheist, your prior was low that the Christianity book would convince you, but as a Zoroastrian, your prior is now high that the Christianity book would convince you? I'm saying that you seem to have changed your opinion about the books. (Can't find the SMBC Delphic comic. Looking.) It's arguing for false propositions. You can specify a "Sudden volcanic eruption sufficient to destroy every island in Indonesia that minimizes harm to humans", but don't be surprised if a few people are inconvenienced by it, considering what the minimum requirements to meet the first condition are.
I see now. No, my P(any book on X will convince me of X) is high, for all X. P(religion X is true) is low for all X, except X I actually believe in. For a true proposition, it should be possible to bring it to 0. For all else, use as few as possible (even if it means thousands). It's probably a good policy anyway, as I originally claimed.
There are a few hundred people in deep caves on the Anatolian plateau that thank you for minimizing the force of the Indonesian caldera, sparing them and allowing them to attempt to continue the human race.
The magnitude of the wrongness isn't really an issue. The point was that with the rule that "real arguments have to be used when available", he can think that the book he just read convinced him with real arguments.
I was wrong about the importance of this factor.
Actually, I take back the first part. I have some prior P(Zoroastrianism is true). The fact that P(book about Zoroastrianism uses spurious reasoning designed to convince me it's true, and it actually is true) is lower is irrelevant. I don't care whether the book's reasoning is sound, I only care about Zoroastrianism itself. Besides, with the initial condition on the reasoning used, this second probability is also bounded, because if Zoroastrianism is true, the book is in fact perfectly valid. So P(the book is lying) = P(Zoroastrianism is false).

Firstly, upvoted for an excellent problem!

Let's suppose that a super-intelligent AI has been built, and it knows plenty of tricks that no human ever thought of, in order to present a false argument which is not easily detectable to be false. Whether it can do that by presenting subtly wrong premises, or by incorrect generalization, or word tricks, or who knows what, is not important. It can, however, present an argument in a Socratic manner, and like Socrates' interlocutors, you find yourself agreeing with things you don't expect to agree with.

So the ...

No, he expects that if he reads the book, his posterior belief in the proposition is likely going to be high. But his current prior belief in the truth of the proposition is low. Also, as I made clear in my update, the AI is not perfect, merely very good. I only need it to be good enough for the whole episode to go through, i.e. so that you can't argue that a rational person will never believe in Z after reading the book and that my story is implausible.
So in other words, the person is expecting to be persuaded by something other than the truth. Perhaps on the basis that the last N times he read one of these books, it changed his mind. In that case, it is no different than if the person were stepping into a brain modification booth and having his mind altered directly. Because a rational person would simply not be conned by this process. He would see that he currently believes in the existence of the flying spaghetti monster, that he just read a book on the flying spaghetti monster prepared by a superintelligent AI which he had asked to prepare ultra-persuasive but entirely biased collections of evidence for him, and that he didn't formerly believe in the flying spaghetti monster. He would conclude on this basis that his belief probably has no basis in reality, i.e. is inaccurate, and stop believing in it (with such high probability). If we are to accept that the AI is good enough to prevent this happening (a necessary premise of the thought experiment), then it must be preventing the person from being rational in this way, perhaps by including statements in the book that in some extraordinary way reprogram his mind via some backdoor vulnerability. Let's say that perhaps the person is an android created by the AI for its own amusement, which responds to certain phrases with massive anomalous changes in its brain wiring. That is simply the only way I can accept the premises that:
a) the person applies Bayes's theorem properly (if this is not true, then he is simply not “mentally consistent” as you said)
b) he is aware that the books are designed to persuade him with high probability
c) he believes that the propositions to be proven in the books are untrue in general
d) he believes with high probability that the books will persuade him
which, unless I am very much mistaken, are equivalent to your statements of the problem. If reading a book is not basically equivalent to submitting knowingly t

Is Eliezer's claim that it is impossible for a perfect reasoner to deceive themself, or that it is impossible for real-life humans to deceive themselves?

I assume he doesn't argue that crazy people can't deceive themselves. But then where is the boundary between crazy and perfect? And if the claim only applies to perfect reasoners, of what use is it?

Oh no, I am claiming that even a perfect reasoner can deceive himself. A normal person can easily do so. Many people who marry someone of a different faith become quite devout in their spouse's religion. At some point they have to decide to believe something they don't actually believe. It does not take a superintelligent AI to convince them, a local cleric can do it.

An introduction to Zoroastrianism, by Omega:

"Dear reader, if you have picked this book first, I recommend you to stop being rational, right now. Ignore all rational rules and techniques you know, and continue reading this book with open mind -- that means, without critical thinking. Because if you fail to believe in Zoroastrianism, I will torture you for eternity, and I mean it!"

A friendly Omega could write this too, if it already knows that the reader will surrender, so in the end the reader is not tortured, and the reader's wish (to believe in Zoro...

I am not convinced that 1984-style persuasion really works. I don't think that one can really be persuaded to genuinely believe something by fear or torture. In the end you can get someone to respond as if they believe it, but probably not to actually do so. It might convince them to undergo something like what my experiment actually describes.
I don't think about persuasion like: "You have to believe this, under threat of pain, in 3... 2... 1... NOW!" It's more like this: We have some rationalist tools -- methods of thinking which, when used properly, can improve our rationality. If some methods of thinking can increase rationality, then avoiding them, or intentionally using some contrary methods of thinking, could decrease rationality... could you agree with that? Omega could scan your brain and deliver an electric shock whenever your "Bayesian reasoning circuit" is activated. So you would be conditioned to stop using it. On the other hand, Omega would reward you for using the "happy death spiral circuit", as long as the happy thought is related to Zoroastrianism. It could make rational reasoning painful and irrational reasoning pleasant, and this way prepare you for believing whatever you have to believe. In real brainwashing there is no Omega and no brain scans, but a correct approach can trigger some evolutionarily built mechanisms that can reduce your rationality. (It is an evolutionary advantage to have a temporary rationality turn-off switch for situations when being rational is a great danger to your life. We are not perfect thinkers; we are social beings.) The correct approach is not based on fear only, but uses a "carrot and stick" strategy. Some people can resist a lot of torture, if in their minds they do not see any possibility to escape. For efficient brainwashing, they must be reminded that there is an escape, that it's kind of super easy, and that it only involves going through the "happy death spiral"... which we all have a natural tendency to do anyway. The correctly broken person is not only happy to have escaped physical pain, but also enjoys the new state of mind. I think 1984 described this process pretty well, but I don't have it here to quote it. The brainwashed protagonist is not just happy to escape torture (he knows that soon... spoiler avoided), but he is happy to resolve his