I expect most readers to know me either as MIRI’s co-founder and the originator of a number of the early research problems in AI alignment, or as the author of Harry Potter and the Methods of Rationality, a popular work of Harry Potter fanfiction. I’ve described how I apply concepts in Inadequate Equilibria to various decisions in my personal life, and some readers may be wondering how I see these tying in to my AI work and my fiction-writing. And I do think these serve as useful case studies in inadequacy, exploitability, and modesty.
As a supplement to Inadequate Equilibria, then, the following is a dialogue that never took place—largely written in 2014, and revised and posted online in 2017.
i. Outperforming and the outside view
(The year is 2010. eliezer-2010 is sitting in a nonexistent park in Redwood City, California, working on his laptop. A person walks up to him.)
person: Pardon me, but are you Eliezer Yudkowsky?
eliezer-2010: I have that dubious honor.
person: My name is Pat; Pat Modesto. We haven’t met, but I know you from your writing online. What are you doing with your life these days?
eliezer-2010: I’m trying to write a nonfiction book on rationality. The blog posts I wrote on Overcoming Bias—I mean Less Wrong—aren’t very compact or edited, and while they had some impact, it seems like a book on rationality could reach a wider audience and have a greater impact.
pat: Sounds like an interesting project! Do you mind if I peek in on your screen and—
eliezer: (shielding the screen) —Yes, I mind.
pat: Sorry. Um... I did catch a glimpse and that didn’t look like a nonfiction book on rationality to me.
eliezer: Yes, well, work on that book was going very slowly, so I decided to try to write something else in my off hours, just to see if my general writing speed was slowing down to molasses or if it was this particular book that was the problem.
pat: It looked, in fact, like Harry Potter fanfiction. Like, I’m pretty sure I saw the words “Harry” and “Hermione” in configurations not originally written by J. K. Rowling.
eliezer: Yes, and I currently seem to be writing it very quickly. And it doesn’t seem to use up mental energy the way my regular writing does, either.
(A mysterious masked stranger, watching this exchange, sighs wistfully.)
eliezer: Now I’ve just got to figure out why my main book-writing project is going so much slower and taking vastly more energy... There are so many books I could write, if I could just write everything as fast as I’m writing this...
pat: Excuse me if this is a silly question. I don’t mean to say that Harry Potter fanfiction is bad—in fact I’ve read quite a bit of it myself—but as I understand it, according to your basic philosophy the world is currently on fire and needs to be put out. Now given that this is true, why are you writing Harry Potter fanfiction, rather than doing something else?
eliezer: I am doing something else. I’m writing a nonfiction rationality book. This is just in my off hours.
pat: Okay, but I’m asking why you are doing this particular thing in your off hours.
eliezer: Because my life is limited by mental energy far more than by time. I can currently produce this work very cheaply, so I’m producing more of it.
pat: What I’m trying to ask is why, even given that you can write Harry Potter fanfiction very cheaply, you are writing Harry Potter fanfiction. Unless it really is true that the only reason is that you need to observe yourself writing quickly in order to understand the way of quick writing, in which case I’d ask what probability you assign to learning that successfully. I’m skeptical that this is really the best way of using your off hours.
eliezer: I’m skeptical that you have correctly understood the concept of “off hours.” There’s a reason they exist, and the reason isn’t just that humans are lazy. I admit that Anna Salamon and Luke Muehlhauser don’t require off hours, but I don’t think they are, technically speaking, “humans.”
(The Mysterious Masked Stranger speaks for the first time.)
stranger: Excuse me.
eliezer: Who are you?
stranger: No one of consequence.
pat: And why are you wearing a mask?
stranger: Well, I’m definitely not a version of Eliezer from 2014 who’s secretly visiting the past, if that’s what you’re thinking.
pat: It’s fair to say that’s not what I’m thinking.
stranger: Pat and Eliezer-2010, I think the two of you are having some trouble communicating. The two of you actually disagree much more than you think.
pat & eliezer: Go on.
stranger: If you ask Eliezer of February 2010 why he’s writing Harry Potter and the Methods of Rationality, he will, indeed, respond in terms of how he expects writing Methods to positively impact his attempt to write The Art of Rationality, his attempt at a nonfiction how-to book. This is because we have—I mean, Eliezer has—a heuristic of planning on the mainline, which means that his primary justification for anything will be phrased in terms of how it positively contributes to a “normal” future timeline, not low-probability side-scenarios.
pat: Wait, isn’t your whole life—
stranger: Eliezer-2010 also has a heuristic that might be described as “never try to do anything unless you have a chance of advancing the Pareto frontier of the category.” In other words, if he’s expecting that some other work will be strictly better than his along all dimensions, it won’t occur to Eliezer-2010 that this is something he should spend time on. Eliezer-2010 thinks he has the potential to do things that advance Pareto frontiers, so why would he consider a project that wasn’t trying? So, off-hours or not, Eliezer wouldn’t be working on this story if he thought it would be strictly dominated along every dimension by any other work of fanfiction, or indeed, any other book.
eliezer: I wouldn’t put it in exactly those terms.
stranger: Yes, because when you say things like that out loud, people start saying the word “arrogance” a lot, and you don’t fully understand the reasons. So you’ll cleverly dance around the words and try to avoid that branch of possible conversation.
pat: Is that true?
eliezer: It sounds to me like the Masked Stranger is trying to use the Barnum effect—like, most people would acknowledge that as a secret description of themselves if you asked them.
pat: ...... I really, really don’t think so.
eliezer: I’d be surprised if it were less than 10% of the population, seriously.
stranger: Eliezer, you’ll have a somewhat better understanding of human status emotions in 4 years. Though you’ll still only go there when you have a point to make that can’t be made any other way, which, unfortunately, will be often, as modest epistemology norms propagate through your community. But anyway, Pat, the fact that Eliezer-2010 has spent any significant amount of time on Harry Potter and the Methods of Rationality indeed lets you infer that Eliezer-2010 thinks Methods has a chance of being outstanding along some key dimension that interests him—of advancing the frontiers of what has ever been done—although he might hesitate to tell you that before he’s actually done it.
eliezer: Okay, yes, that’s true. I’m unhappy with the treatment of supposedly “intelligent” and/or “rational” characters in fiction and I want to see it done right just once, even if I have to write the story myself. I have an explicit thesis about what’s being done wrong and how to do it better, and if this were not the case then the prospect of writing Methods would not interest me as much.
stranger: (aside) There’s so much civilizational inadequacy in our worldview that we hardly even notice when we invoke it. Not that this is an alarming sign, since, as it happens, we do live in an inadequate civilization.
eliezer: (continuing to Pat) However, the reason I hold back from saying in advance what Methods might accomplish isn’t just modesty. I’m genuinely unsure that I can make Methods be what I think it can be. I don’t want to promise more than I can deliver. And since one should first plan along the mainline, if investigating the conditions under which I can write quickly weren’t a sufficiently important reason, I wouldn’t be doing this.
stranger: (aside) I have some doubts about that alleged justification in retrospect, though it wasn’t stupid.
pat: Can you say more about how you think your Harry Potter story will have outstandingly “intelligent” characters?
eliezer: I’d rather not? As a matter of literature, I should show, not tell, my thesis. Obviously it’s not that I think that my characters are going to learn fifty-seven languages because they’re super-smart. I think most attempts to create “intelligent characters” focus on surface qualities, like how many languages someone has learned, or they focus on stereotypical surface features the author has seen in other “genius” characters, like a feeling of alienation. If it’s a movie, the character talks with a British accent. It doesn’t seem like most such authors are aware of Vinge’s reasoning for why it should be hard to write a character that is smarter than the author. Like, if you know exactly where an excellent chess player would move on a chessboard, you must be at least that good at playing chess yourself, because you could always just make that move. For exactly the same reason, it’s hard to write a character that’s more rational than the author.
I don’t think the concept of “intelligence” or “rationality” that’s being used in typical literature has anything to do with discerning good choices or making good predictions. I don’t think there is a standard literary concept for characters who excel at cognitive optimization, distinct from characters who just win because they have a magic sword in their brains. And I don’t think most authors of “genius” characters respect their supposed geniuses enough to really put themselves in their shoes—to really feel what their inner lives would be like, and think beyond the first cliche that comes to mind. The author still sets themselves above the “genius,” gives the genius some kind of obvious stupidity that lets the author maintain emotional distance...
stranger: (aside) Most writers have a hard time conceptualizing a character who’s genuinely smarter than the author; most futurists have a hard time conceptualizing genuinely smarter-than-human AI; and indeed, people often neglect the hypothesis that particularly smart human beings will have already taken into account all the factors that they consider obvious. But with respect to sufficiently competent individuals making decisions that they can make on their own cognizance—as opposed to any larger bureaucracy or committee, or the collective behavior of a field—it is often appropriate to ask if they might be smarter than you think, or have better justifications than are obvious to you.
pat: Okay, but supposing you can write a book with intelligent characters, how does that help save the world, exactly?
eliezer: Why are you focusing on the word “intelligence” instead of “rationality”? But to answer your question, nonfiction writing conveys facts; fiction writing conveys experiences. I’m worried that my previous two years of nonfiction blogging haven’t produced nearly enough transfer of real cognitive skills. The hope is that writing about the inner experience of someone trying to be rational will convey things that I can’t easily convey with nonfiction blog posts.
eliezer: What is it, Masked Stranger?
stranger: Just... you’re so very modest.
eliezer: You’re saying this to me?
stranger: It’s sort of obvious from where I live now. So very careful not to say what you really hope Harry Potter and the Methods of Rationality will do, because you know people like Pat won’t believe it and can’t be persuaded to believe it.
pat: This guy is weird.
eliezer: (shrugging) A lot of people are.
pat: Let’s ignore him. So you’re presently investing a lot of hours—
eliezer: But surprisingly little mental energy.
stranger: Where I come from, we would say that you’re investing surprisingly few spoons.
pat: —but still a lot of hours, into crafting a Harry Potter story with, you hope, exceptionally rational characters. Which will cause some of your readers to absorb the experience of being rational. Which you think eventually ends up important to saving the world.
eliezer: Mm, more or less.
pat: What do you think the outside view would say about—
eliezer: Actually, I think I’m about out of time for today. (Starts to close his laptop.)
stranger: Wait. Please stick around. Can you take my word that it’s important?
eliezer: ...all right. I suppose I don’t have very much experience with listening to Masked Strangers, so I’ll try that and see what happens.
pat: What did I say wrong?
stranger: You said that the conversation would never go anywhere helpful.
eliezer: I wouldn’t go that far. It’s true that in my experience, though, people who use the phrase “outside view” usually don’t offer advice that I think is true, and the conversations take up a lot of mental energy—spoons, you called them? But since I’m taking the Masked Stranger’s word on things and trying to continue, fine. What do you think the outside view has to say about the Methods of Rationality project?
pat: Well, I was just going to ask you to consider what the average story with a rational character in it accomplishes in the way of skill transfer to readers.
eliezer: I’m not trying to write an average story. The whole point is that I think the average story with a “rational” character is screwed up.
pat: So you think that your characters will be truly rational. But maybe those authors also think their characters are rational—
eliezer: (in a whisper to the Masked Stranger) Can I exit this conversation?
stranger: No. Seriously, it’s important.
eliezer: Fine. Pat, your presumption is wrong. These hypothetical authors making a huge effort to craft rational characters don’t actually exist. They don’t realize that it should take an effort to craft rational characters; they’re just regurgitating cliches about Straw Vulcans with very little self-perceived mental effort.
stranger: Or as I would phrase it: This is not one of the places where our civilization puts in enough effort that we should expect adequacy.
pat: Look, I don’t dispute that you can probably write characters more rational than those of the average author; I just think it’s important to remember, on each occasion, that being wrong feels just like being right.
stranger: Eliezer, please tell him what you actually think of that remark.
eliezer: You do not remember on each occasion that “being wrong feels just like being right.” You remember it on highly selective occasions where you are motivated to be skeptical of someone else. This feels just like remembering it on every relevant occasion, since, after all, every time you felt like you ought to think of it, you did. You just used a fully general counterargument, and the problem with arguments like that is that they provide no Bayesian discrimination between occasions where we are wrong and occasions where we are right. Like “but I have faith,” “being wrong feels just like being right” is as easy to say on occasions when someone is right as on occasions when they are wrong.
stranger: There is a stage of cognitive practice where people should meditate on how the map is not the territory, especially if it’s never before occurred to them that what feels like the universe of their immersion is actually their brain’s reconstructed map of the true universe. It’s just that Eliezer went through that phase while reading S. I. Hayakawa’s Language in Thought and Action at age eleven or so. Once that lesson is fully absorbed internally, invoking the map-territory distinction as a push against ideas you don’t like is (fully general) motivated skepticism.
pat: Leaving that aside, there’s this research showing that there’s a very useful technique called “reference class forecasting”—
eliezer: I am aware of this.
pat: And I’m wondering what reference class forecasting would say about your attempt to do good in the world via writing Harry Potter fanfiction.
eliezer: (to the Masked Stranger) Please can I run away?
stranger: No.
eliezer: (sighing) Okay, to take the question seriously as more than generic skepticism: If I think of the books which I regard as having well-done rational characters, their track record isn’t bad. A. E. van Vogt’s The World of Null-A was an inspiration to me as a kid. Null-A didn’t just teach me the phrase “the map is not the territory”; it was where I got the idea that people employing rationality techniques ought to be awesome and if they weren’t awesome that meant they were doing something wrong. There are a heck of a lot of scientists and engineers out there who were inspired by reading one of Robert A. Heinlein’s hymns in praise of science and engineering—yes, I know Heinlein had problems, but the fact remains.
stranger: I wonder what smart kids who grew up reading Harry Potter and the Methods of Rationality as twelve-year-olds will be like as adults...
pat: But surely van Vogt’s Null-A books are an exceptional case of books with rationalist characters. My first question is, what reason do you have to believe you can do that? And my second question is, even given that you write a rational character as inspiring as a character in a Heinlein novel, how much impact do you think one character like that has on an average reader, and how many people do you think will read your Harry Potter fanfiction in the best case?
eliezer: To be honest, it feels to me like you’re asking the wrong questions. Like, it would never occur to me to ask any of the questions you’re asking now, in the course of setting out to write Methods.
stranger: (aside) That’s true, by the way. None of these questions ever crossed my mind in the original timeline. I’m only asking them now because I’m writing the character of Pat Modesto. A voice like Pat Modesto is not a productive voice to have inside your head, in my opinion, so I don’t spontaneously wonder what he would say.
eliezer: To produce the best novel I can, it makes sense for me to ask what other authors were doing wrong with their rational characters, and what A. E. van Vogt was doing right. I don’t see how it makes sense for me to be nervous about whether I can do better than A. E. van Vogt, who had no better source to work with than Alfred Korzybski, decades before Daniel Kahneman was born. I mean, to be honest about what I’m really thinking: So far as I’m concerned, I’m already walking outside whatever so-called reference class you’re inevitably going to put me in—
pat: What?! What the heck does it mean to “walk outside” a reference class?
eliezer: —which doesn’t guarantee that I’ll succeed, because being outside of a reference class isn’t the same as being better than it. It means that I don’t draw conclusions from the reference class to myself. It means that I try, and see what happens.
pat: You think you’re just automatically better than every other author who’s ever tried to write rational characters?
eliezer: No! Look, thinking things like that is just not how the inside of my head is organized. There’s just the book I have in my head and the question of whether I can translate that image into reality. My mental world is about the book, not about me.
pat: But if the book you have in your head implies that you can do things at a very high percentile level, relative to the average fiction author, then it seems reasonable for me to ask why you already think you occupy that percentile.
stranger: Let me try and push things a bit further. Eliezer-2010, suppose I told you that as of the start of 2014, Methods succeeded to the following level. First, it has roughly half a million words, but you’re not finished writing it—
eliezer: Damn. That’s disappointing. I must have slowed down a lot, and definitely haven’t mastered the secret of whatever speed-writing I’m doing right now. I wonder what went wrong? Actually, why am I hypothetically continuing to write this book instead of giving up?
stranger: Because it’s the most reviewed work of Harry Potter fanfiction out of more than 500,000 stories on fanfiction.net, has organized fandoms in many universities and colleges, has received at least 15,000,000 page views on what is no longer the main referenced site, has been turned by fans into an audiobook via an organized project into which you yourself put zero effort, has been translated by fans into many languages, is famous among the Caltech/MIT crowd, has its own daily-trafficked subreddit with 6,000 subscribers, is often cited as the most famous or the most popular work of Harry Potter fanfiction, is considered by a noticeable fraction of its readers to be literally the best book they have ever read, and on at least one occasion inspired an International Mathematical Olympiad gold medalist to join the alliance and come to multiple math workshops at MIRI.
eliezer: I like this scenario. It is weird, and I like weird. I would derive endless pleasure from inflicting this state of affairs on reality and forcing people to come to terms with it.
stranger: Anyway, what probability would you assign to things going at least that well?
eliezer: Hm... let me think. Obviously this exact scenario is improbable, because conjunctive. But if we partition outcomes according to whether they rank this high or higher in my utility function, and ask how much probability mass I put into outcomes like that, then I think it’s around 10%. That is, a success like this would come in at around the 90th percentile of my hopes.
pat: (incoherent noises)
eliezer: Oh. Oops. I forgot you were there.
pat: 90th percentile?! You mean you seriously think there’s a 1 in 10 chance that might happen?
eliezer: Ah, um...
stranger: Yes, he does. He wouldn’t have considered it in exactly those words if I hadn’t put it that way—not just because it’s ridiculously specific, but because Eliezer Yudkowsky doesn’t think in terms like that in advance of encountering the actual fact. He would consider it a “specific fantasy” that was threatening to drain away his emotional energy. But if it did happen, he would afterward say that he had achieved an outcome such that around 10% of his probability mass “would have been” in outcomes like that one or better, though he would worry about being hindsight-biased.
pat: I think a reasonable probability for an outcome like that would be more like 0.1%, and even that is being extremely generous!
eliezer: “Outside viewers” sure seem to tell me that a lot whenever I try to do anything interesting. I’m actually kind of surprised to hear you say that, though. I mean, my basic hypothesis for how the “outside view” thing operates is that it’s an expression of incredulity that can be leveled against any target by cherry-picking a reference class that predicts failure. One then builds an inescapable epistemic trap around that reference class by talking about the Dunning-Kruger effect and the dangers of inside-viewing. But trying to write Harry Potter fanfiction, even unusually good Harry Potter fanfiction, should sound to most people like it’s not high-status. I would expect people to react mainly to the part about the IMO gold medalist, even though the base rate for being an IMO gold medalist is higher than the base rate for authoring the most-reviewed Harry Potter fanfiction.
pat: Have you ever even tried to write Harry Potter fanfiction before? Do you know any of the standard awards that help publicize the best Harry Potter fan works or any of the standard sites that recommend them? Do you have any idea what the vast majority of the audience for Harry Potter fanfiction wants? I mean, just the fact that you’re publishing on FanFiction.Net is going to turn off a lot of people; the better stories tend to be hosted at ArchiveOfOurOwn.Org or on other, more specialized sites.
eliezer: Oh. I see. You do know about the pre-existing online Harry Potter fanfiction community, and you’re involved in it. You actually have a pre-existing status hierarchy built up in your mind around Harry Potter fanfiction. So when the Masked Stranger talks about Methods becoming the most popular Harry Potter fanfiction ever, you really do hear that as an overreaching status-claim, and you do that thing that makes an arbitrary proposition sound very improbable using the “outside view.”
pat: I don’t think the outside view, or reference class forecasting, can make arbitrary events sound very improbable. I think it makes events that won’t actually happen sound very improbable. As for my prior acquaintance with the community—how is that supposed to devalue my opinions? I have domain expertise. I have some actual idea of how many thousands of authors, including some very good authors, are trying to write Harry Potter fanfiction, only one of whom