From my experience reading and talking about decision theory on LW, it seems that many of the unproductive comments in these discussions can be attributed to a handful of common mistakes.

#### Mistake #1: Arguing about assumptions

The main reason why I took so long to understand Newcomb's Problem and Counterfactual Mugging was my insistence on denying the assumptions behind these puzzles. I could have saved months if I'd just said to myself, okay, is this direction of inquiry interesting when taken on its own terms?

Many assumptions seemed to be divorced from real life at first. People dismissed the study of electromagnetism as an impractical toy, and considered number theory hopelessly abstract until cryptography arrived. The only way to make intellectual progress (either individually or as a group) is to explore the implications of interesting assumptions wherever they might lead. Unfortunately people love to argue about assumptions instead of getting anything done, though they can't really judge before exploring the implications in detail.

Several smart people on LW are repeating my exact mistake about Newcomb's Problem now, and others find ways to commit the same mistake when looking at our newer ideas. It's so frustrating and uninteresting to read yet another comment saying my assumptions look unintuitive or unphysical or irrelevant to FAI or whatever. I'm not against criticism, but somehow such comments never blossom into interesting conversations, and that's reason enough to caution you against the way of thinking that causes them.

#### Mistake #2: Stopping when your idea seems good enough

There's a handful of ideas that decision theory newbies rediscover again and again, like pointing out indexical uncertainty as the solution to Newcomb's problem, or adding randomness to models of UDT to eliminate spurious proofs. These ideas don't work and don't lead anywhere interesting, but that's hard to notice when you just had the flash of insight and want to share it with the world.

A good strategy in such situations is to always push a little bit *past* the point where you have everything figured out. Take one extra step and ask yourself: "Can I make this idea precise?" What are the first few implications? What are the obvious extensions? If your result seems to contradict what's already known, work through some of the contradictions yourself. If you don't find any mistakes in your idea, you will surely find new formal things to say about your idea, which always helps.

#### Mistake #2A: Stopping when your idea actually is good enough

I didn't want to name any names in this post because my status on LW puts me in a kinda position of power, but there's a name I can name with a clear conscience. In 2009, Eliezer wrote:

Formally you'd use a Godelian diagonal to write (...)

Of course that's not a newbie mistake at all, but an awesome and fruitful idea! As it happens, writing out that Godelian diagonal immediately leads to all sorts of puzzling questions like "but what does it actually do? and how do we prove it?", and eventually to all the decision theory research we're doing now. Knowing Eliezer's intelligence, he probably could have preempted most of our results. Instead he just declared the problem solved. Maybe he thought he was already at 0.95 formality and that going to 1.0 would be a trivial step? I don't want to insinuate here, but IMO he made a mistake.

Since this mistake is indistinguishable from the last, the remedy for it is the same: "Can I make this idea precise?" Whenever you stake out a small area of knowledge and make it amenable to mathematical thinking, you're likely to find new math that has lasting value. When you stop because your not-quite-formal idea seems already good enough, you squander that opportunity.

...

If this post has convinced you to stop making these common mistakes, be warned that it won't necessarily make you happier. As you learn to see more clearly, the first thing you'll see will be a locked door with a sign saying "Research is hard". Though it's not very scary or heroic, mostly you just stand there feeling stupid about yourself :-)

Here's a few more from my list:

Answering "The right solution to this decision problem is X" (and seemingly being satisfied with that) when the answer that's generally wanted is of the form "Y is the right decision theory, and here's why Y gives the right answers to this and other tricky decision problems".

Taking speculative ideas too seriously and trying to apply them to real life before the necessary details have been worked out.

Doing decision theory research might be a mistake in itself, if your goal is a positive Singularity and not advancing decision theory per se or solving interesting puzzles. (I had a philosophical interest in decision theory before I came to OB/LW. Cousin_it sees it mainly as a source of cool math problems. So both of us are excused. :)

Isn't it sort of a moral imperative to familiarize oneself with the foundations of decision theory? It seems sort of important for understanding the foundations of epistemology, morality, ontology of agency, et cetera, which are things it'd be helpful to understand if you were trying to be morally justified. I guess this is to some extent what you meant by "philosophical interest"? -- Will Newsome on Luke's computer

The first one is a perfect fit, I've seen many people get stuck in exactly that way. Thanks!

My education in decision theory has been fairly informal so far, and I've had trouble understanding some of your recent technical posts because I've been uncertain about what assumptions you've made. I think more explicitly stating your assumptions could lessen the frequency of arguments about assumptions by decreasing the frequency of readers mistakenly believing you've made *different* assumptions. It could also decrease *inquiries* about your assumptions, like the one I made on your post on the limited predictor problem.

One way to do this could be to, in your posts, link to other works that define your assumptions. Such links could also function to connect less-experienced readers with relevant background reading.

Do you understand these posts now, or do you have any other questions about them? I'll be glad to answer your questions and use that to learn how to communicate more clearly.

I have several questions. I hadn't asked them because I thought I should do more research before taking up your time. Here are some examples:

*solve* the limited predictor problem? In what form should a solution be: an agent program?

I will plan to do more research and then ask more detailed questions in the relevant discussion threads if I still don't understand.

I think my failure to comprehend parts of your posts is more due to my lack of familiarity with the subject matter than your communication style. Adding links to works that establish the assumptions or formal systems you're using could help less advanced readers start learning that background material without you having to significantly lengthen your posts.

Thanks for the help!

1) Yes, the solution should be an agent program. It can't be something as simple as "return 1", because when I talk about solving the LPP, there's an implicit desire to have a single agent that solves all problems similar enough to the LPP, for example the version where the agent's actions 1 and 2 are switched, or where the agent's source code has some extra whitespace and comments compared to its own quined representation, etc.

2) We imagine the world to be a program with no arguments that returns a utility value, and the agent to be a subprogram within the world program. Even though the return value of an argumentless program is just a constant, the agent can still try to "maximize that constant", if the agent is ignorant about it in just the right way. For example, if the world program calls the agent program and then returns 0 or 1 depending on whether the agent's return value was even or odd, and the agent can prove a theorem to that effect by looking at the world program's source code, then it makes sense for the agent to return an odd value.
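As a rough illustration of the even/odd example, here is a minimal Python sketch. The names `world`, `agent`, `utility_if_agent_returns`, and the set of candidate actions are all made up for this illustration. In particular, `utility_if_agent_returns` is only a hard-coded stand-in for the theorems a real agent would have to prove by inspecting the world program's source code.

```python
def utility_if_agent_returns(action: int) -> int:
    """Stand-in (assumption) for a proved theorem of the form
    'agent() == action implies world() == u'. A real agent would derive
    this from the world's source code instead of having it hard-coded."""
    return action % 2


def agent() -> int:
    # Hypothetical finite set of actions the agent considers.
    candidate_actions = [0, 1, 2, 3]
    # Pick the action whose (stand-in) proved utility is highest.
    return max(candidate_actions, key=utility_if_agent_returns)


def world() -> int:
    # Argumentless world program: calls the agent and returns
    # 1 if the agent's return value is odd, 0 if it is even.
    return agent() % 2
```

Even though `world()` is "just a constant", the agent's choice is what determines which constant it is.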

Newcomb's Problem can be formalized as a world program that makes two calls to the agent (or maybe one call to the agent and another call to something provably equivalent). The first call's return value is used to set the contents of the boxes, and the second one represents the agent's actual decision. If a smart enough agent receives the world's source code as an argument (which includes possibly mangled versions of the agent's source code inside), and the agent knows its own source code by quining, then the agent can prove a theorem saying that one-boxing would logically imply higher utility than two-boxing. That setting is explored in a little more detail here.
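The two-call structure can be sketched in Python as follows. This is only a toy under stated assumptions: the names `newcomb_world`, `ONE_BOX`, and `TWO_BOX` are hypothetical, the payoffs are the conventional $1,000,000 and $1,000 rather than anything fixed by the discussion above, and the agent is passed in as a function instead of being embedded in the world's source code.

```python
from typing import Callable

ONE_BOX = 1  # hypothetical action codes
TWO_BOX = 2


def newcomb_world(agent: Callable[[], int]) -> int:
    # First call: plays the role of the predictor and sets the box contents.
    prediction = agent()
    big_box = 1_000_000 if prediction == ONE_BOX else 0  # assumed payoff
    small_box = 1_000                                    # assumed payoff

    # Second call: the agent's actual decision.
    decision = agent()
    if decision == ONE_BOX:
        return big_box
    return big_box + small_box


def one_boxer() -> int:
    return ONE_BOX


def two_boxer() -> int:
    return TWO_BOX
```

Because both calls go to the same deterministic agent, the prediction and the decision always agree, which is why one-boxing comes out ahead in this formalization.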

Before you ask: no, we don't know any rigorous definition of what it means to "maximize" the return value of an argumentless program in general. We're still fumbling with isolated cases, hoping to find more understanding. I'm only marginally less confused than you about the whole field.

3) You can think about an agent as a program that receives the world's source code as an argument, so that one agent can solve many possible world programs. I usually talk about agents as if they were argumentless functions that had access to the world's source code via quining, but that's just to avoid cluttering up the proofs. The results are the same either way.

4) Usually you can assume that S is just Peano arithmetic. You can represent programs by their Gödel numbers, and write a PA predicate saying "program X returns integer Y". You can also represent statements and proofs in PA by their Gödel numbers, and write a PA predicate saying "proof X is a valid proof of statement Y". You can implement both these predicates in your favorite programming language, as functions that accept two integers and return a boolean. You can have statements referring to these predicates by their Gödel numbers. The diagonal lemma gives you a generalized way to make statements refer to themselves, and quining allows you to have programs that refer to their own source code. You can have proofs that talk about programs, programs that enumerate and check proofs, and generally go wild.

For example, you can write a program P that enumerates all possible proofs trying to find a valid proof that P itself returns 1, and returns 1 if such a proof is found. To prove that P will in fact return 1 and not loop forever, note that it's just a restatement of Löb's theorem. That setting is explored in a little more detail here.
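The control flow of such a proof-searching program might look like the Python sketch below. It is deliberately incomplete: `make_proof_searcher` and `is_valid_proof` are hypothetical names, no actual PA proof checker is implemented, and the Gödel number of the statement "P returns 1" is passed in as a parameter rather than obtained by quining.

```python
from itertools import count
from typing import Callable


def make_proof_searcher(
    is_valid_proof: Callable[[int, int], bool],
    statement_number: int,
) -> Callable[[], int]:
    """Build a program that searches for a proof of one fixed statement.

    is_valid_proof(proof_number, statement_number) is assumed to decide
    whether the proof with the first Goedel number is a valid proof of
    the statement with the second Goedel number.
    """

    def program() -> int:
        # Enumerate all candidate proofs 0, 1, 2, ... without any bound.
        for proof_number in count():
            if is_valid_proof(proof_number, statement_number):
                return 1
        # Not reached: count() never ends, so if no proof exists
        # the loop above simply runs forever.

    return program
```

With a stub checker that accepts, say, proof number 5, the returned program halts and returns 1. Whether the real program halts is exactly the question that the Löb's theorem argument above settles.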

Please let me know if the above makes sense to you!

Thank you. Your comment resolved some of my confusion. While I didn't understand it entirely, I am happy to have accrued a long list of relevant background reading.

Which of these mistakes do you attribute to insufficient rationality? Which are due to insufficient intelligence? Which are "I made the best bet possible given the information I had or could have obtained, and just turned out to be wrong"?

1: Nash equilibria (such as D,D in PD) essentially assume independence between different agents' decisions: P(B()=b | A()=a) = P(B()=b | A()!=a). It took Eliezer to realize that this assumption is not always valid and the opposite assumption may be more relevant for some decision problems, especially those involving AIs. If he didn't "argue" about assumptions, how would he transmit his insight to others? You observe a correlation between less arguing over assumptions and more interesting discussions/results, but isn't it possible that both are caused by higher intelligence (i.e., smarter people could more quickly see that Eliezer had a point)?
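To make the independence point concrete, here is a small Python sketch for the Prisoner's Dilemma. The payoff numbers (5, 3, 1, 0) are an assumed conventional choice, not something taken from this discussion, and the function names are hypothetical. The "mirrored" case models the extreme opposite assumption, where B's move always equals A's.

```python
# Payoff to player A, indexed by (A's move, B's move).
# These particular values are an illustrative assumption.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def best_move_if_independent(their_move: str) -> str:
    # B's move is held fixed regardless of what A does.
    return max("CD", key=lambda mine: PAYOFF[(mine, their_move)])


def best_move_if_mirrored() -> str:
    # B's move is assumed to always equal A's move.
    return max("CD", key=lambda mine: PAYOFF[(mine, mine)])
```

Under independence, defecting is the best reply to either move, which gives the D,D equilibrium. Under the mirrored assumption, cooperating does better.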

2: This is more clearly a failure of rationality. In these situations I think one ought to ask "Why do I think I have the right solution when there are so many smart people who profess to be confused or disagree with me? Am I sure they haven't already thought through my proposed solution and found it wanting, and I'm also confused but just don't realize it?"

2A: From Eliezer's perspective at that time, he thought he had the right central insight to the problem and there were just technical loose ends to be tied up, and his time could be better spent doing other things. It's not clear that was a mistake, even in retrospect, especially if you consider that his interest was eventually building an FAI, not doing decision theory research per se.

All these mistakes look more like failures of rationality to me, because smart people make them too.

At least some of the credit goes to Hofstadter. In any case, I think people listened to Eliezer more because he said things like "I have worked out a mathematical analysis of these confusing problems", not just "my intuition says the basic assumptions of game theory don't sound right". If you do explore the implications of your alternative set of assumptions and they turn out to be interesting, you're exempt from mistake #1.

Intelligence is certainly a common factor, but I also observe that correlation by looking at myself at different times. If I get more interesting results when I view problems on their own terms, that strategy might work for other people too.

The difference between Hofstadter and Eliezer is that Hofstadter couldn't make a convincing enough case for his assumptions, because he was talking about humans instead of AIs, and it's just not clear that human decision procedures are similar enough to each other for his assumptions to hold. Eliezer also thought his ideas applied to humans, but he had a backup argument to the effect "even if you don't think this applies to humans, at least it applies to AIs who know each other's source code, so it's still important to work on" and that's what convinced me.

BTW, for historical interest, I found a 2002 post by Hal Finney that came pretty close to some of the ideas behind TDT:

I responded to Hal, and stated my agreement, but neither of us followed it up at the time. I even forgot about the post until I found it again yesterday, but I guess it must have influenced my thinking once Eliezer started talking about similar ideas.

Personally, I thought he made a good case that the basic assumptions of game theory aren't right, or rather won't be right in a future where superintelligent AIs know each other's source code. I don't think I would have been particularly interested if he just said "these non-standard assumptions lead to some cool math" since I don't have that much interest in math qua math.

Similarly, I explore other seemingly strange assumptions like the ones in Newcomb's Problem or Counterfactual Mugging because I think they are abstracted/simplified versions of real problems in FAI design and ethics, designed to isolate and clarify some particular difficulties, not because they are "interesting when taken on its own terms".

I guess it appears to you that you are working on these problems because they seem like interesting math, or "interesting when taken on its own terms", but I wonder why you find these particular math problems or assumptions interesting, and not the countless others you could choose instead. Maybe the part of your brain that outputs "interesting" is subconsciously evaluating importance and relevance?

An even more likely explanation is that my mind evaluates reputation gained per unit of effort. Academic math is really crowded, chances are that no one would read my papers anyway. Being in a frustratingly informal field with a lot of pent-up demand for formality allows me to get many people interested in my posts while my mathematician friends get zero feedback on their publications. Of course it didn't feel so cynical from the inside, it felt more like a growing interest fueled by constant encouragement from the community. If "Re-formalizing PD" had met with a cold reception, I don't think I'd be doing this now.

In that case you're essentially outsourcing your "interestingness" evaluation to the SIAI/LW community, and I think *we* are basing it mostly on relevance to FAI.

Yeah. Though that doesn't make me adopt FAI as my own primary motivation, just like enjoying sex doesn't make me adopt genetic fitness as my primary motivation.

My point is that your advice isn't appropriate for everyone. People who do care about FAI or other goals besides community approval *should* think/argue about assumptions. Of course one could overdo that and waste too much time, but they clearly can't just work on whatever problems seem likely to offer the largest social reward per unit of effort.

What if we rewarded you for adopting FAI as your primary motivation? :)

That sounds sideways. Wouldn't that make the *reward* my primary motivation? =)

No, I mean what if we offered you rewards for changing your *terminal* goals so that you'd continue to be motivated by FAI even after the rewards end? You should take that deal if we can offer big enough rewards and your discount rate is high enough, right? Previous related thread

You're trying to affect the motivation of a decision theory researcher by offering a transaction whose acceptance is itself a tricky decision theory problem?

Upvoted for hilarious metaness.

Now, all we need to do is figure out how humans can modify their own source code and verify those modifications in others...

That could work, but how would that affect my behavior? We don't seem to have any viable mathematical attacks on FAI-related matters except this one.

I suggest editing the post to include this point.

In retrospect, someone definitely needed to post about this. Thanks for thinking of it!

Have you considered using a different pseudonym for each post, and never decloaking any of the names?

The social implications of such a practice would interact with the existence of the anti-kibitzer, I think; it would amount to forcing others to experience (some) antikibitz features even if they had chosen not to. On the other hand, if the anti-kibitzer didn't already exist, I probably would have advocated writer-side over reader-side implementation/enforcement of pseudonymity.

I think people have more reasons to trust a post like this if they know it's coming from someone who got actual results. As for my more mathy posts, their style is probably so distinctive that using a pseudonym would be futile.

To what degree do these points apply to mathematics research in general?

I think they apply pretty widely. Or rather, the extent to which they apply to decision theory is roughly the extent they apply to mathematics in general.

I don't know, but would guess that the wider you try to apply them, the less correct they become.