Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory

Wei Dai asked:

"Why didn't you mention earlier that your timeless decision theory mainly had to do with logical uncertainty? It would have saved people a lot of time trying to guess what you were talking about."


All right, fine, here's a fast summary of the most important ingredients that go into my "timeless decision theory".  This isn't so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge.  It seems to me that this sort of thing really takes a mini-book, but perhaps I shall be proven wrong.

The one-sentence version is:  Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

The three-sentence version is:  Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.

To obtain the background knowledge if you don't already have it, the two main things you'd need to study are the classical debates over Newcomblike problems, and the Judea Pearl synthesis of causality.  Canonical sources would be "Paradoxes of Rationality and Cooperation" for Newcomblike problems and "Causality" for causality.

For those of you who don't condescend to buy physical books, Marion Ledwig's thesis on Newcomb's Problem is a good summary of the existing attempts at decision theories, evidential decision theory and causal decision theory.  You need to know that causal decision theories two-box on Newcomb's Problem (which loses) and that evidential decision theories refrain from smoking on the smoking lesion problem (which is even crazier).  You need to know that the expected utility formula is actually over a counterfactual on our actions, rather than an ordinary probability update on our actions.

I'm not sure what you'd use for online reading on causality.  Mainly you need to know:

  • That a causal graph factorizes a correlated probability distribution into a deterministic mechanism of chained functions plus a set of uncorrelated unknowns as background factors.
  • Standard ideas about "screening off" variables (D-separation).
  • The standard way of computing counterfactuals (through surgery on causal graphs).
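To make the difference between an ordinary probability update and counterfactual surgery concrete, here is a minimal Python sketch on a toy smoking-lesion graph. The graph shape (a lesion causes both smoking and cancer, while smoking itself is harmless), the function names, and every probability are assumptions picked for illustration; none of them come from the literature.

```python
# Toy graph: Lesion -> Smoke, Lesion -> Cancer. Smoking has no causal effect.
# All numbers are made-up illustration values (assumptions).
P_LESION = 0.5
P_SMOKE_GIVEN_LESION = {True: 0.8, False: 0.2}
P_CANCER_GIVEN_LESION = {True: 0.9, False: 0.1}


def prior_lesion(lesion):
    """Prior probability of having (or not having) the lesion."""
    return P_LESION if lesion else 1 - P_LESION


def p_cancer_given_smoke_observed():
    """Ordinary update: observing smoking is evidence about the lesion."""
    joint = {l: prior_lesion(l) * P_SMOKE_GIVEN_LESION[l] for l in (True, False)}
    p_smoke = sum(joint.values())
    return sum(joint[l] / p_smoke * P_CANCER_GIVEN_LESION[l] for l in (True, False))


def p_cancer_given_smoke_surgery():
    """Surgery: cut Lesion -> Smoke, set Smoke by hand, keep the lesion prior."""
    return sum(prior_lesion(l) * P_CANCER_GIVEN_LESION[l] for l in (True, False))


if __name__ == "__main__":
    print(p_cancer_given_smoke_observed())  # conditioning on the action
    print(p_cancer_given_smoke_surgery())   # counterfactual on the action
```

With these assumed numbers, conditioning raises the probability of cancer (smokers are more likely to have the lesion), while surgery leaves it at the base rate, which is why a decision rule built on conditioning refrains from smoking in this toy and one built on surgery does not.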

It will be helpful to have the standard Less Wrong background of defining rationality in terms of processes that systematically discover truths or achieve preferred outcomes, rather than processes that sound reasonable; understanding that you are embedded within physics; understanding that your philosophical intuitions are how some particular cognitive algorithm feels from inside; and so on.

The first lemma is that a factorized probability distribution which includes logical uncertainty - uncertainty about the unknown output of known computations - appears to need cause-like nodes corresponding to this uncertainty.

Suppose I have a calculator on Mars and a calculator on Venus.  Both calculators are set to compute 123 * 456.  Since you know their exact initial conditions - perhaps even their exact initial physical state - a standard reading of the causal graph would insist that any uncertainties we have about the output of the two calculators should be uncorrelated.  (By standard D-separation; if you have observed all the ancestors of two nodes, but have not observed any common descendants, the two nodes should be independent.)  However, if I tell you that the calculator on Mars flashes "56,088" on its LED display screen, you will conclude that the Venus calculator's display is also flashing "56,088".  (And you will conclude this before any ray of light could communicate between the two events, too.)

If I was giving a long exposition I would go on about how if you have two envelopes originating on Earth and one goes to Mars and one goes to Venus, your conclusion about the one on Venus from observing the one on Mars does not of course indicate a faster-than-light physical event, but standard ideas about D-separation indicate that completely observing the initial state of the calculators ought to screen off any remaining uncertainty we have about their causal descendants so that the descendant nodes are uncorrelated, and the fact that they're still correlated indicates that there is a common unobserved factor, and this is our logical uncertainty about the result of the abstract computation.  I would also talk for a bit about how if there's a small random factor in the transistors, and we saw three calculators, and two showed 56,088 and one showed 56,086, we would probably treat these as likelihood messages going up from nodes descending from the "Platonic" node standing for the ideal result of the computation - in short, it looks like our uncertainty about the unknown logical results of known computations, really does behave like a standard causal node from which the physical results descend as child nodes.

But this is a short exposition, so you can fill in that sort of thing yourself, if you like.
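One way to fill in part of that sketch is a small Python model in which the physical displays are noisy children of a single "Platonic" node for the result of 123 * 456. The candidate range, the uniform prior, the error model, the 1% error rate, and the function names are all assumptions made for illustration.

```python
# Candidate values for the "Platonic" node: the ideal output of 123 * 456.
# The candidate range and the error model are illustrative assumptions.
CANDIDATES = list(range(56080, 56090))


def posterior_over_logical_result(candidates, readings, error_rate):
    """Posterior over the logical node given independent noisy displays.

    Each display shows the true result with probability 1 - error_rate,
    and otherwise shows one of the other candidates uniformly at random.
    """
    prior = 1.0 / len(candidates)
    wrong = error_rate / (len(candidates) - 1)
    unnormalized = {}
    for result in candidates:
        likelihood = 1.0
        for shown in readings:
            likelihood *= (1 - error_rate) if shown == result else wrong
        unnormalized[result] = prior * likelihood
    total = sum(unnormalized.values())
    return {result: weight / total for result, weight in unnormalized.items()}


def predict_next_display(posterior, error_rate):
    """Predictive distribution for one more calculator (say, the one on Venus)."""
    wrong = error_rate / (len(posterior) - 1)
    return {
        shown: sum(
            p * ((1 - error_rate) if shown == result else wrong)
            for result, p in posterior.items()
        )
        for shown in posterior
    }


if __name__ == "__main__":
    after_mars = posterior_over_logical_result(CANDIDATES, [56088], 0.01)
    print(predict_next_display(after_mars, 0.01)[56088])
    three = posterior_over_logical_result(CANDIDATES, [56088, 56088, 56086], 0.01)
    print(max(three, key=three.get))
```

In this toy, seeing the Mars display moves the prediction for the Venus display only because both hang off the shared logical node; and with two readings of 56,088 against one of 56,086, the readings act as likelihood messages that favor 56,088.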

Having realized that our causal graphs contain nodes corresponding to logical uncertainties / the ideal result of Platonic computations, we next construe the counterfactuals of our expected utility formula to be counterfactuals over the logical result of the abstract computation corresponding to the expected utility calculation, rather than counterfactuals over any particular physical node.

You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation.

Formally you'd use a Godelian diagonal to write:

Argmax[A in Actions] Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O|rest of universe))

(where P( X=x []-> Y | Z ) means computing the counterfactual on the factored causal graph P, that surgically setting node X to x, leads to Y, given Z)
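As a rough illustration of what this formula does on Newcomb's Problem, here is a toy Python sketch. It hard-codes the causal structure rather than deriving it: in the TDT-style function, the predictor's simulation is treated as a child of the logical node, so setting that node to an action also shifts the probability that box B is full; in the CDT-style function, the surgery is on the physical act alone and box B keeps a fixed prior. The $1,000 in box A, the 99% predictor accuracy, and all function names are assumptions for illustration.

```python
BOX_B = 1_000_000  # box B's contents if the predictor expected one-boxing
BOX_A = 1_000      # assumed contents of the transparent box
ACTIONS = ("one-box", "two-box")


def payoff(action, box_b_full):
    """Money received for an action, given whether box B was filled."""
    total = BOX_B if box_b_full else 0
    if action == "two-box":
        total += BOX_A
    return total


def tdt_expected_utility(action, predictor_accuracy):
    """Surgery on the logical node: the prediction is downstream of it."""
    p_full = predictor_accuracy if action == "one-box" else 1 - predictor_accuracy
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_expected_utility(action, prior_full):
    """Surgery on the physical act only: box B is independent of the act."""
    return prior_full * payoff(action, True) + (1 - prior_full) * payoff(action, False)


def choose(expected_utility, **kwargs):
    """Argmax over actions of the supplied expected utility function."""
    return max(ACTIONS, key=lambda a: expected_utility(a, **kwargs))


if __name__ == "__main__":
    print(choose(tdt_expected_utility, predictor_accuracy=0.99))
    print(choose(cdt_expected_utility, prior_full=0.5))
```

On this one problem the TDT-style numbers coincide with what plain conditioning would give; the sketch is only meant to show where the surgery is performed, not to distinguish TDT from EDT, which takes a problem like the Smoking Lesion.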

Setting this up correctly (in accordance with standard constraints on causal graphs, like noncircularity) will solve (yield reflectively consistent, epistemically intuitive, systematically winning answers to) 95% of the Newcomblike problems in the literature I've seen, including Newcomb's Problem and other problems causing CDT to lose, the Smoking Lesion and other problems causing EDT to fail, Parfit's Hitchhiker which causes both CDT and EDT to lose, etc.

Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory).  Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on.  On the other hand, CDT just has the entire counterfactual distribution rain down on the theory as manna from heaven (e.g. James Joyce, Foundations of Causal Decision Theory), so TDT is at least an improvement; and standard classical logic and standard causal graphs offer quite a lot of pre-existing structure here.  (In general, understanding the causal structure of reality is an AI-complete problem, and so in philosophical dilemmas the causal structure of the problem is implicitly given in the story description.)

Among the many other things I am skipping over:

  • Some actual examples of where CDT loses and TDT wins, EDT loses and TDT wins, both lose and TDT wins, what I mean by "setting up the causal graph correctly" and some potential pitfalls to avoid, etc.
  • A rather huge amount of reasoning which defines reflective consistency on a problem class; explains why reflective consistency is a rather strong desideratum for self-modifying AI; why the need to make "precommitments" is an expensive retreat to second-best and shows lack of reflective consistency; explains why it is desirable to win and get lots of money rather than just be "reasonable" (that is conform to pre-existing intuitions generated by a pre-existing algorithm); which notes that, considering the many pleas from people who want, but can't find any good intermediate stage between CDT and EDT, it's a fascinating little fact that if you were rewriting your own source code, you'd rewrite it to one-box on Newcomb's Problem and smoke on the smoking lesion problem...
  • ...and so, having given many considerations of desirability in a decision theory, shows that the behavior of TDT corresponds to reflective consistency on a problem class in which your payoff is determined by the type of decision you make, but not sensitive to the exact algorithm you use apart from that - that TDT is the compact way of computing this desirable behavior we have previously defined in terms of reflectively consistent systematic winning.
  • Showing that classical CDT, given self-modification ability, modifies into a crippled and inelegant form of TDT.
  • Using TDT to fix the non-naturalistic behavior of Pearl's version of classical causality in which we're supposed to pretend that our actions are divorced from the rest of the universe - the counterfactual surgery, written out Pearl's way, will actually give poor predictions for some problems (like someone who two-boxes on Newcomb's Problem and believes that box B has a base-rate probability of containing a million dollars, because the counterfactual surgery says that box B's contents have to be independent of the action).  TDT not only gives the correct prediction, but explains why the counterfactual surgery can have the form it does - if you condition on the initial state of the computation, this should screen off all the information you could get about outside things that affect your decision; then your actual output can be further determined only by the Godel-diagonal formula written out above, permitting the formula to contain a counterfactual surgery that assumes its own output, so that the formula does not need to infinitely recurse on calling itself.
  • An account of some brief ad-hoc experiments I performed on IRC to show that a majority of respondents exhibited a decision pattern best explained by TDT rather than EDT or CDT.
  • A rather huge amount of exposition of what TDT actually corresponds to in terms of philosophical intuitions, especially those about "free will".  For example, this is the theory I was using as hidden background when I wrote in "Causality and Moral Responsibility" that factors like education and upbringing can be thought of as determining which person makes a decision - that you rather than someone else makes a decision - but that the decision made by that particular person is up to you.  This corresponds to conditioning on the known initial state of the computation, and performing the counterfactual surgery over its output.  I've actually done a lot of this exposition on OBLW without explicitly mentioning TDT, like Timeless Control and Thou Art Physics for reconciling determinism with choice (actually effective choice requires determinism, but this confuses humans for reasons given in Possibility and Could-ness).  But if you read the other parts of the solution to "free will", and then furthermore explicitly formulate TDT, then this is what utterly, finally, completely, and without even a tiny trace of confusion or dissatisfaction or a sense of lingering questions, kills off entirely the question of "free will".
  • Some concluding chiding of those philosophers who blithely decided that the "rational" course of action systematically loses; that rationalists defect on the Prisoner's Dilemma and hence we need a separate concept of "social rationality"; that the "reasonable" thing to do is determined by consulting pre-existing intuitions of reasonableness, rather than first looking at which agents walk away with huge heaps of money and then working out how to do it systematically; people who take their intuitions about free will at face value; assuming that counterfactuals are fixed givens raining down from the sky rather than non-observable constructs which we can construe in whatever way generates a winning decision theory; et cetera.  And celebrating of the fact that rationalists can cooperate with each other, vote in elections, and do many other nice things that philosophers have claimed they can't.  And suggesting that perhaps next time one should extend "rationality" a bit more credit before sighing and nodding wisely about its limitations.
  • In conclusion, rational agents are not incapable of cooperation, rational agents are not constantly fighting their own source code, rational agents do not go around helplessly wishing they were less rational, and finally, rational agents win.

Those of you who've read the quantum mechanics sequence can extrapolate from past experience that I'm not bluffing.  But it's not clear to me that writing this book would be my best possible expenditure of the required time.


Today I finally came up with a simple example where TDT clearly loses and CDT clearly wins, and as a bonus, proves that TDT isn't reflectively consistent.

Omega comes to you and says

I'm hosting a game with 3 players. Two players are AIs I created running TDT but not capable of self-modification, one being a paperclip maximizer, the other being a staples maximizer. The last player is an AI you will design. When the game starts, my two AIs will first get the source code of your AI (which is only fair since you know the design of my AIs). Then 2 of the 3 players will be chosen randomly to play a one-shot true PD, without knowing who they are facing. What AI do you submit?

Say the payoffs of the PD are

  • 5/5 0/6
  • 6/0 1/1

Suppose you submit an AI running CDT. Then, Omega's AIs will reason as follows: "I have 1/2 chance of playing against a TDT, and 1/2 chance of playing against a CDT. If I play C, then my opponent will play C if it's a TDT, and D if it's a CDT, therefore my expected payoff is 5/2+0/2=2.5. If I play D, then my opponent will play D, so my payoff is 1. Therefore I should play C." Your AI then gets a payoff of 6, since it will play D.
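The arithmetic in that paragraph can be checked with a short Python sketch. It encodes the payoff matrix above and the reasoning attributed to Omega's TDT AIs: the other TDT instance is assumed to mirror this AI's move, and the CDT AI is assumed to always defect. The function name and the default 1/2 probability argument are illustrative.

```python
# Payoff to the row player, indexed by (my_move, opponent_move).
PAYOFF = {
    ("C", "C"): 5,
    ("C", "D"): 0,
    ("D", "C"): 6,
    ("D", "D"): 1,
}


def tdt_expected_payoff(my_move, p_opponent_is_tdt=0.5):
    """Expected payoff for one of Omega's TDT AIs when a CDT AI was submitted.

    Assumptions: the other TDT AI outputs the same move as this one,
    and the CDT AI always defects.
    """
    tdt_opponent_move = my_move
    cdt_opponent_move = "D"
    return (
        p_opponent_is_tdt * PAYOFF[(my_move, tdt_opponent_move)]
        + (1 - p_opponent_is_tdt) * PAYOFF[(my_move, cdt_opponent_move)]
    )


if __name__ == "__main__":
    print(tdt_expected_payoff("C"))  # 5/2 + 0/2
    print(tdt_expected_payoff("D"))  # 1/2 + 1/2
    print(PAYOFF[("D", "C")])        # the submitted CDT AI defects against a cooperator
```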

Suppose you submit an AI runn... (read more)

Or does my example fall outside of the specified problem class?

If I wanted to defend the original thesis, I would say yes, because TDT doesn't cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision (which was one of the open problems I listed - the original TDT is for cases where Omega offers you straightforward dilemmas in which its behavior is just a direct transform of your behavior). So where one algorithm has one payoff matrix for defection or cooperation, the other algorithm gets a different payoff matrix for defection or cooperation, which breaks the "problem class" under which the original TDT is automatically reflectively consistent.

Nonetheless it's certainly an interesting dilemma.

Your comment here is actually pre-empting a comment that I'd planned to make after providing some of the background for the content of TDT. I'd thought about your dilemmas, and then did manage to translate into my terms a notion about how it might be possible to unilaterally defect in the Prisoner's Dilemma and predictably get away with it, provided you did so for unusual reasons. But the condition... (read more)

9 · Wei Dai · 14y
You're right, I failed to realize that with timeless agents, we can't do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions. Here's an idea. The logical order of decisions is related to simulation ability. Suppose A can simulate B, meaning it has trustworthy information about B's source code and has sufficient computing power to fully simulate B or sufficient intelligence to analyze B using reliable shortcuts, but B can't simulate A. Then the logical order of decisions is B followed by A, because when B makes his decision, he can treat A's decision as conditional on his. But when A makes her decision, she has to take B's decision as a given. Does that make sense?

Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them) and A can always use less computing power. Rational agents should not regret having more computing power (because they can always use less) or more knowledge (because they can always implement the same strategy they would use with less knowledge) - this sort of thing is a sure sign of reflective inconsistency.

To see why moving logically second is a disadvantage, consider that it lets an opponent playing Chicken always toss their steering wheel out the window and get away with it.

That both players desire to move "logically first" argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.

(I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole 'nother story, and of course also falls into the "Way the fuck too dangerous to try in real life" category relative to my present knowledge.)

With timeless agents, we can't do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.

BTW, thanks for this compact way of putting it.

Being logically second only keeps being a disadvantage because examples keep being chosen to be of the kind that make it so. One category of counterexample comes from warfare, where if you know what the enemy will do and he doesn't know what you will do, you have the upper hand. (The logical versus temporal distinction is clear here: being temporally the first to reach an objective can be a big advantage.) Another counterexample is in negotiation where a buyer and seller are both uncertain about fair market price; each may prefer the other to be first to suggest a price. (In practice this is often resolved by the party with more knowledge, or more at stake, or both - usually the seller - being first to suggest a price.)
3 · Wei Dai · 14y
You're right. Rock-paper-scissors is another counter-example. In these cases, the relationship between the logical order of moves and simulation ability seems pretty obvious and intuitive.
4 · Eliezer Yudkowsky · 14y
Except that the analogy to rock-paper-scissors would be that I get to move logically first by deciding my conditional strategy "rock if you play scissors" etc., and simulating you simulating me without running into an apparently non-halting computation (that would otherwise have to be stopped by my performing counterfactual surgery on the part of you that simulates my own decision), then playing rock if I simulate you playing scissors. At least I think that's how the analogy would work.
I suspect that this kind of problem will run into computational complexity issues, not clever decision theory issues. Like with a certain variation on the St. Petersburg paradox (see the last two paragraphs), where you need to count to the greatest finite number to which you can count, and then stop.
1 · Wei Dai · 14y
Suppose I know that's your strategy, and decide to play the move equal to (the first googolplex digits of pi mod 3), and I can actually compute that but you can't. What are you going to do? If you can predict what I do, then your conditional strategy works, which just shows that move order is related to simulation ability.
4 · Eliezer Yudkowsky · 14y
In this zero-sum game, yes, it's possible that whoever has the most computing power wins, if neither can access unpredictable random or private variables. But what if both sides have exactly equal computing power? We could define a Timeless Paper-Scissors-Rock Tournament this way - standard language, no random function, each program gets access to the other's source code and exactly 100 million ticks, if you halt without outputting a move then you lose 2 points.
3 · Wei Dai · 14y
This game is pretty easy to solve, I think. A simple equilibrium is for each side to do something like iterate x = SHA-512(x), with a random starting value, using an optimal implementation of SHA-512, until time is just about to run out, then output x mod 3. SHA-512 is easy to optimize (in the sense of writing the absolutely fastest implementation), and it seems very unlikely that there could be shortcuts to computing (SHA-512)^n until n gets so big (around 2^256 unless SHA-512 is badly designed) that the function starts to cycle. I think I've answered your specific question, but the answer doesn't seem that interesting, and I'm not sure why you asked it.
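A minimal Python sketch of that strategy, using the standard library's `hashlib`. The fixed `rounds` count stands in for "until time is just about to run out", the seed is a hard-coded placeholder (the tournament rules forbid a random function, so the starting value would be chosen in advance), and the mapping from x mod 3 to moves is an arbitrary assumption.

```python
import hashlib

MOVES = ("rock", "paper", "scissors")


def iterate_sha512(seed: bytes, rounds: int) -> bytes:
    """Apply x = SHA-512(x) the given number of times, starting from seed."""
    x = seed
    for _ in range(rounds):
        x = hashlib.sha512(x).digest()
    return x


def choose_move(seed: bytes, rounds: int) -> str:
    """Interpret the final digest as an integer and output x mod 3 as a move."""
    digest = iterate_sha512(seed, rounds)
    return MOVES[int.from_bytes(digest, "big") % 3]


if __name__ == "__main__":
    # Placeholder seed and round count; a real entry would pick the seed
    # privately and size the loop to the 100 million tick budget.
    print(choose_move(b"hypothetical starting value", 100_000))
```

An opponent who reads this source can only predict the move by running the same chain of hashes, which is the point of the suggestion: the prediction costs about as much as the budget allows.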
3 · Paul Crowley · 12y
Schneier et al here prove that being able to calculate H^n(x) quickly leads to a faster way of finding collisions in H.
1 · Eliezer Yudkowsky · 14y
Well, it's probably not all that interesting from a purely theoretical perspective, but if the prize money was divided up among only the top fifth of players, you'd actually have to try to win, and that would be an interesting challenge for computer programmers.
1 · Wei Dai · 14y
But if you are TDT, you can't always use less computing power, because that might be correlated with your opponents also deciding to use less computing power, or will be distrusted by your opponent because it can't simulate you. But if you simply don't have that much computing power (and opponent knows this) then you seem to have the advantage of logically moving first. Lack of computing power could be considered a form of "crazy reasoning"... Why does TDT lead to the phenomenon of "stupid winners"? If there's a way to explain this as a reasonable outcome, I'd feel a lot better. But is that like a two-boxer asking for an explanation of why, when the stupid (from their perspective) one-boxers keep winning, that's a reasonable outcome?
1 · Eliezer Yudkowsky · 14y
Substitute "move logically first" for "use less computing power"? Using less computing power seems like a red herring to me. TDT on simple problems (with the causal / logical structure already given) uses skeletally small amounts of computing power. "Who moves first" is a "battle"(?) over the causal / logical structure, not over who can manage to run out of computing power first. If you're visualizing this using lots of computing power for the core logic, rather than computing the 20th decimal place of some threshold or verifying large proofs, then we've got different visualizations. The idea of "if you do this, the opponent does the same" might apply to trying to move logically first, but in my world this has nothing to do with computing power, so at this point I think it'd be pretty odd if the agents were competing to be stupider. Besides, you don't want to respond to most logical threats, because that gives your opponent an incentive to make logical threats; you only want to respond to logical offers that you want your opponent to have an incentive to make. This gets into the scary issues I was hinting at before, like determining in advance that if you see your opponent predetermine to destroy the universe in a mutual suicide unless you pay a ransom, you'll call their bet and die with them, even if they've predetermined to ignore your decision, etcetera; but if they offer to trade you silver for gold at a Ricardian-advantageous rate, you'll predetermine to cooperate, etc. The point, though, is that "If I do X, they'll do Y" is not a blank check to decide that minds do X, because you could choose a different form of responsiveness. But anyway, I don't see in the first place that agents should be having these sorts of contests over how little computing power to use. That doesn't seem to me like a compelling advantage to reach for. If you've got that little computing power then perhaps you can't simulate your opponent's skeletally small TDT decision, i.e., you c
0 · Eliezer Yudkowsky · 14y
...possibly employing mixed strategies, by analogy to the equilibrium of games where neither agent gets to go first and both must choose simultaneously? But I haven't done anything with this idea, yet.
This reminds me of logical Fatalism and the Argument from Bivalence
That's a good point, but what if the process that gives birth to CDT doesn't listen to the incentives you give it? For example, it could be evolution or random chance. Here's an example, similar to Wei's example above. Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A is a TDT agent named Alice. The child in universe B is named Bob and has a random mutation that makes him use CDT. Both children go on to play many blind PDs with their neighbors. It looks like Bob's life will be much happier than Alice's, right? What force will push against evolution and keep the number of Bobs small?
The problem is that "source code of your AI" is not a complete story, since your decisions as AI programmer also depended on the Omega AIs' code, and so what you give as the source of AI is already only one of the possible worlds that presupposes the behavior of Omega AIs.
2 · Wei Dai · 14y
Yes, I think Eliezer made a similar point: So if you run TDT, then there are at least two equilibria in this game, only one of which involves you submitting a CDT. Can you think of a way to select between these two equilibria? If not, I can fix this by changing the game a bit. Omega will now create his TDT AIs after you design yours, and hard code the source code of your AI into it as givens. His AIs won't even know about you, the real player.
8 · Eliezer Yudkowsky · 14y
They might simply infer you, the real player. You might as well tell the TDT AIs that they're up against a hardcoded Defect move as the "other player", but they won't know if that player has been selected. In fact, that pretty much is what you're telling them, if you show them a CDT player. The CDT player is a red herring - the decision to defect was made by you, in the moment of submitting a CDT player. There is no law against TDT players realizing this after Omega codes them. I should note that in matters such as these, the phrase "hard code" should act as a warning sign that you're trying to fix something that, at least in your own mind, doesn't want to be fixed. (E.g. "hard code obedience into AIs, build it into the very circuitry!") Where you are tempted to say "hard code" you may just need to accept whatever complex burden you were trying to get rid of by saying "fix it in place with codes of iron!"
2 · Wei Dai · 14y
By hard code, I meant code it into the TDT's probability distribution. (Even TDT isn't meta enough to say "My prior is wrong!") But that does make the example less convincing, so let me try something else. Have Omega's AIs physically go first and you play for yourself. They get a copy of your source code, then make their moves in the 3-choose-2 PD game first. You learn their move, then make your choice. Now, if you follow CDT, you'll reason that your decision has no causal effect on the TDT's decisions, and therefore choose D. The TDTs, knowing this, will play C. And I think I can still show that if you run TDT, you will decide to self-modify into CDT before starting this game. First, if Omega's AIs know that you run TDT at the beginning, then they can use that "play D if you self-modify" strategy to deter you from self-modifying. But you can also use "I'll self-modify anyway" to deter them from doing that. So who wins this game? (If someone moves first logically, then he wins, but what if everyone moves simultaneously in the logical sense, which seems to be the case in this game?) Suppose it's common knowledge that Omega mostly chooses CDT agents to participate in this game, then "play D if you self-modify" isn't very "credible". That's because they only see your source code after you self-modify so they'd have to play D if they predict that a TDT agent would self-modify, even if the actual player started with CDT. Given that, your "I'll self-modify anyway" would be highly credible. I'm not sure how to formalize this notion of "credibility" among TDTs, but it seems to make intuitive sense.
6 · Eliezer Yudkowsky · 14y
Well that should never happen. Anything that would make a TDT want to self-modify into CDT should make it just want to play D, no need for self-modification. It should give the same answer at different times, that's what makes it a timeless decision theory. If you can break that without direct explicit dependence on the algorithm apart from its decisions, then I am in trouble! But it seems to me that I can substitute "play D" for "self-modify" in all cases above. E.g., "play D if you play D to deter you from playing D" seems like the same idea, the self-modification doesn't add anything. Well... it partially seems to me that, in assuming certain decisions are made without logical consequences - because you move logically first, or because the TDT agents have fixed wrong priors, etc. - you are trying to reduce the game to a Prisoner's Dilemma in which you have a certain chance of playing against a piece of cardboard with "D" written on it. Even a uniform population of TDTs may go on playing C in this case, of course, if the probability of facing cardboard is low enough. But by the same token, the fact that the cardboard sometimes "wins" does not make it smarter or more rational than the TDT agents. Now, I want to be very careful about how I use this argument, because indeed a piece of cardboard with "only take box B" written on it, is smarter than CDT agents on Newcomb's Problem. But who writes that piece of cardboard, rather than a different one? An authorless piece of cardboard genuinely does go logically first, but at the expense of being a piece of cardboard, which makes it unable to adapt to more complex situations. A true CDT agent goes logically first, but at the expense of losing on Newcomb's Problem. And your choice to put forth a piece of cardboard marked "D" relies on you expecting the TDT agents to make a certain response, which makes the claim that it's really just a piece of cardboard and therefore gets to go logically first, somewhat questionable.
1 · Wei Dai · 14y
The reason to self-modify is to make yourself indistinguishable from players who started as CDT agents, so that Omega's AIs can't condition their moves on the player's type. Remember that Omega's AIs get a copy of your source code. But a CDT agent would self-modify into something not losing on Newcomb's problem if it expects to face that. On the other hand, if TDT doesn't self-modify into something that wins my game, isn't that worse? (Is it better to be reflectively consistent, or winning, if you had to choose one?) Yes, I agree that's a big piece of the puzzle, but I'm guessing the solution to that won't fully solve the "stupid winner" problem. ETA: And for TDT agents that move simultaneously, there remains the problem of "bargaining" to use Nesov's term. Lots of unsolved problems... I wish you started us working on this stuff earlier!
Being (or performing an action) indistinguishable from X doesn't protect you from the inference that X probably resulted from such a plot. That you can decide to camouflage like this may even reduce X's own credibility (and so a lot of platonic/possible agents doing that will make the configuration unattractive). Thus, the agents need to decide among themselves what to look like: first-mover configurations is a limited resource. (This seems like a step towards solving bargaining.)
0Wei Dai14y
Yes, I see that your comment does seem like a step towards solving bargaining among TDT agents. But I'm still trying to argue that if we're not TDT agents yet, maybe we don't want to become them. My comment was made in that context.
Let's pick up Eliezer's suggestion and distinguish now-much-less-mysterious TDT from the different idea of "updateless decision theory", UDT, that describes choice of a whole strategy (function from states of knowledge to actions) rather than choice of actions in each given state of knowledge, of which latter class TDT is an example. TDT isn't a UDT, and UDT is a rather vacuous statement, as it only achieves reflective consistency pretty much by definition, but doesn't tell much about the structure of preference and how to choose the strategy. I don't want to become a TDT agent, as in UDT sense, TDT agents aren't reflectively consistent. They could self-modify towards more UDT-ish look, but this is the same argument as with CDT self-modifying into a TDT.
0Eliezer Yudkowsky14y
Dai's version of this is a genuine, reflectively consistent updateless decision theory, though. It makes the correct decision locally, rather than needing to choose a strategy once and for all time from a privileged vantage point. That's why I referred to it as "Dai's decision theory" at first, but both you and Dai seem to think your idea was the important one, so I compromised and referred to it as Nesov-Dai decision theory.
Well, as I see UDT, it also makes decisions locally, with understanding that this local computation is meant to find the best global solution given other such locally computed decisions. That is, each local computation can make a mistake, making the best global solution impossible, which may make it very important for the other local computations to predict (or at least notice) this mistake and find the local decisions that together with this mistake constitute the best remaining global solution, and so on. The structure of states of knowledge produced by the local computations for the adjacent local computations is meant to optimize the algorithm of local decision-making in those states, giving most of the answer explicitly, leaving the local computations to only move the goalpost a little bit. The nontrivial form of the decision-making comes from the loop that makes local decisions maximize preference given the other local decisions, and those other local decisions do the same. Thus, the local decisions have to coordinate with each other, and they can do that only through the common algorithm and logical dependencies between different states of knowledge. At which point the fact that these local decisions are part of the same agent seems to become irrelevant, so that a more general problem needs to be solved, one of cooperation of any kinds of agents, or even more generally processes that aren't exactly "agents".
4Wei Dai14y
One thing I don't understand is that both you and Eliezer talk confidently about how agents would make use of logical dependencies/correlations. You guys don't seem to think this is a really hard problem. But we don't even know how to assign a probability (or whether it even makes sense to do so) to a simple mathematical statement like P=NP. How do we calculate and/or represent the correlation between one agent and another agent (except in simple cases like where they're identical or easily proven to be equivalent)? I'm impressed by how far you've managed to push the idea of updatelessness, but it's hard for me to process what you say, when the basic concept of logical uncertainty is still really fuzzy.
3Eliezer Yudkowsky14y
I can argue pretty forcefully that (1) a causal graph in which uncertainty has been factored into uncorrelated sources, must have nodes or some kind of elements corresponding to logical uncertainty; (2) that in presenting Newcomblike problems, the dilemma-presenters are in fact talking of such uncertainties and correlations; (3) that human beings use logical uncertainty all the time in an intuitive sense, to what seems like good effect. Of course none of that is actually having a good formal theory of logical uncertainty - I just drew a boundary rope around a few simple logical inferences and grafted them onto causal graphs. Two-way implications get represented by the same node, that sort of thing. I would be drastically interested in a formal theory of logical uncertainty for non-logically-omniscient Bayesians. Meanwhile - you're carrying out logical reasoning about whole other civilizations starting from a vague prior over their origins, every time you reason that most superintelligences (if any) that you encounter in faraway galaxies, will have been built in such a way as to maximize a utility function rather than say choosing the first option in alphabetical order, on the likes of true PDs.
2Wei Dai14y
I want to try to understand the nature of logical correlations between agents a bit better. Consider two agents who are both TDT-like but not perfectly correlated. They play a one-shot PD but in turn. First one player moves, then the other sees the move and makes its move. In normal Bayesian reasoning, once the second player sees the first player's move, all correlation between them disappears. (Does this happen in your TDT?) But in UDT, the second player doesn't update, so the correlation is preserved. So far so good. Now consider what happens if the second player has more computing power than the first, so that it can perfectly simulate the first player and compute its move. After it finishes that computation and knows the first player's move, the logical correlation between them disappears, because no uncertainty implies no correlation. So, given there's no logical correlation, it ought to play D. The first player would have expected that, and also played D. Looking at my formulation of UDT, this may or may not happen, depending on what the "mathematical intuition subroutine" does when computing the logical consequences of a choice. If it tries to be maximally correct, then it would do a full simulation of the opponent when it can, which removes logical correlation, which causes the above outcome. Maybe the second player could get a better outcome if it doesn't try to be maximally correct, but the way my theory is formulated, what strategy the "mathematical intuition subroutine" uses is not part of what's being optimized. So, I'm not sure what to do about this, except to add it to the pile of unsolved problems.
Coming to this a bit late :), but I've got a basic question (which I think is similar to Nesov's, but I'm still confused after reading the ensuing exchange). If the second player has more computing power (so that the first player cannot simulate it), how can the first player predict what the second player will do? Can the first player reason that since the second player could simulate it, the second player will decide that they're uncorrelated and play D no matter what? That dependence on computing power seems very odd, though maybe I'm sneaking in expectations from my (very rough) understanding of UDT.
The first player's move could depend on the second player's, in which case the second player won't get the answer in a closed form; the answer must be a function of the second player's move...
0Wei Dai14y
But if the second player has more computational power, it can just keep simulating the first player until the first player runs out of clock cycles and has to output something.
I don't understand your reply: exact simulation is brute force that isn't a good idea. You can prove general statements about the behavior of programs on runs of unlimited or infinite length in finite time. But anyway, why would the second player provoke mutual defection?
0Wei Dai14y
In my formulation, it doesn't have a choice. Whether or not it does exact simulation of the first player is determined by its "mathematical intuition subroutine", which I treated as a black box. If that module does an exact simulation, then mutual defection is the result. So this also ties in with my lack of understanding regarding logical uncertainty. If we don't treat the thing that reasons about logical uncertainty as a black box, what should we do? ETA: Sometimes exact simulation clearly is appropriate, for example in rock-paper-scissors.
Conceptually, I treat logical uncertainty as I do prior+utility, a representation of preference, in this more general case over mathematical structures. The problems of representing this preference compactly and extracting human preference don't hinder these particular explorations.
0Wei Dai14y
I don't understand this yet. Can you explain in more detail what a general (noncompact) way of representing logical uncertainty would be?
If you are a CDT agent, you can't (or simply won't) become a normal TDT agent. If you are a human, who knows what that means.
After all, for anything you can hard code, the AI can build a new AI that lacks your hard coding and sacrifice its resources to that new AI.
Wei_Dai wrote on 19 August 2009 07:08:23AM : That seems to violate the secrecy assumptions of the Prisoner's Dilemma problem! I thought each prisoner has to commit to his action before learning what the other one did. What am I missing? Thanks!

This is very cool, and I haven't digested it yet, but I wonder if it might be open to the criticism that you're effectively postulating the favored answer to Newcomb's Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation. After all, the crux of the counterfactual-reasoning dilemma in Newcomb's Problem (and similarly in the Prisoner's Dilemma) is to justify the inference "If I choose both boxes, then (probably) so does the simulation (even if in fact I/it do not)" rather than "If I choose both boxes, then the simulation doesn't necessarily match my choice (even though in fact it does)". It could be objected that your formalism postulates the desired answer rather than giving a basis for deriving it - an objection that becomes more important when we move away from identical or functionally equivalent source code and start to consider approximate similarities. (See my criticism of Leslie (1991)'s proposal that you should make your choice as though you were also choosing on behalf of other agents of similar causal structure. If I'm not mistake...

3Eliezer Yudkowsky14y
Replied at
To clarify: the agent in MCDT is a particular physical instantiation, rather than being timeless/Platonic (well, except insofar as physics itself is Platonic).

Does this theory handle Drescher's example of raising my hand because I want the universe a billion years ago to be such that I would raise my hand a billion years hence?

9Eliezer Yudkowsky14y
Yes. That's a logical dependence. ETA: To be exact, you have a fixed state a billion years ago, a computation which runs on that state to determine "Will you raise your hand a billion years hence?", and you can know the initial state without knowing the output of the function, but then determine that the function outputs "Yes" iff your decision diagonal outputs "Raise hand", so if your values U maximize at "Yes" of this function on that data, then you can (will) exert logical control over the value of this fixed mathematical function in which a copy of you is embedded. That's what life is all about, actually. You could just regard the universe as a big mathematical function containing a copy of you, over which you're exerting logical control. ETA2: You'd have to ask Gary Drescher whether he knows of anyone else who's reductionist enough to realize that you can control the output of a fixed deterministic mathematical function if that function happens to be one in which you are embedded. As far as I know, it's just Gary Drescher. ETA3: "Logical control" and "Thou art math" is essentially the same idea as timeless control and thou art physics, it's just even more fun.
Nice. A while ago I also noticed that you can control any mathematical structure if it knows about you and you know about it (i.e. there is logical dependence), which generalizes the notion of trade with other possible worlds, control of the past, etc. If that other mathematical structure is interpreted as an agent, it can be made to behave as you prefer, if in return you behave as it prefers. Thus, it's possible for us to have and realize preferences over mathematical structures, in particular by trading with them in this manner. At the same time there are all sorts of weird limitations on what's possible to affect this way, for example you can control something faster than light (logical control), but only with info that is already in the logical dependence, which excludes the info that only one side has. For example, if you send away a perfect simulation of your mind on a spaceship, you can "control" what happens on the spaceship if neither of you receives observations from outside, as both computations will be identical. If some info from a year ago is sent to the spaceship, and both you and the simulation observe it (simultaneously), you remain synchronized, but now you learned something new. This way, streams of observations can be sent in both directions, continuously updating both copies. These observations, being identical, are added to logical dependence between you and the simulation, and so can be used in logical control. Thus, the whole state of knowledge is shared, and the conclusions of the whole algorithm of mind can be used for control. On the other hand, if you know something above and beyond this shared knowledge (like recent observations), you can't use this knowledge or any conclusions reached from this knowledge in logical control. You can't update on non-shared knowledge and retain ability to handle logical dependence. This seems related to non-updating in counterfactual mugging: you need to exercise control over the other possible world, an
2Eliezer Yudkowsky14y
I think you could use a non-updated Pearl graph for your updateless decision theory, but the part where you (instead of updating) decide which computational processes are similar or dissimilar to you, would be a logical problem, I think, not the domain of causal graphs.
Not-updating is the same kind of simplified denotational behemoth as a GLUT. Much of the usefulness of probabilistic graphical models comes from the fact that they compress the probability distribution into smaller representations and allow manipulation and specification of these distributions in terms of the compact representations. If I just start copying a lot of the graphical models, it won't capture the structure of the problem, so instead of being updateless, the decision theory must update what it can, or represent a lot of partially dependent states of knowledge in a single structure, allowing to extract decisions unaffected by the knowledge that doesn't belong to them. I suspect that expectation maximization/probability won't play an important role in this structure, as the structure of graphical models seems to capture the same objects as logical dependence must (where do you get the causal graphs from?), and so a structure that can work with logical (in)dependence may already contain the structure captured by probabilistic graphical models, subsuming the latter.
Just as a matter of terminology, I prefer to say that we can choose (or that we have a choice about) the output, rather than that we control it. To me, control has too strong a connotation of cause. It's tricky, of course, because the concepts of choice-about and causal-influence-over are so thoroughly conflated that most people will use the same word to refer to both without distinction. So my terminology suggestion is kind of like most materialists' choice to relinquish the word soul to refer to something extraphysical, retaining consciousness to refer to the actual physical/computational process. (Causes, unlike souls, are real, but still distinct from what they're often conflated with.) Again, this is just terminology, nothing substantive. EDIT: In the (usual) special case where a means-end link is causal, I agree with you that we control something that's ultimately mathematical, even in my proposed sense of the term.
1Eliezer Yudkowsky14y
Hm. To me, "choose" sounds like invoking the idea of multiple possibilities, while "control" sounds more determinism-compatible. Of course that is a mere matter of terminology. Though I'm not sure what you mean by "in the special case where a means-end link is causal" - my thesis was that if you are uncertain about the output of your decision computation, and you factor the universe the Pearlian way, then your logical decision will end up being, in the graph, the logical cause of box B containing a million dollars. You mean the special case where a means-end link is physical? But what is physics except math? Or are we assuming that the local causal relations in physics are more privileged as ontologically basic causes, whereas "logical causality" is just a convenient way of factoring uncertainty and a winning way to construe counterfactuals? (That last one may have some justice to it.)
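A toy sketch of that thesis, assuming a perfectly accurate predictor and the usual $1,000 / $1,000,000 amounts. The function names are hypothetical and this is only one way to caricature the graph, not the formalism itself. Surgery on the logical-output node changes both Omega's prediction and the physical act; surgery on the physical act alone (the CDT-style counterfactual) leaves box B's contents at whatever probability they already had.

```python
BOX_A = 1_000        # transparent box
BOX_B = 1_000_000    # opaque box, filled only if one-boxing was predicted

OPTIONS = ("one-box", "two-box")


def payoff(action, box_b_full):
    """Money received given the physical action and box B's contents."""
    total = BOX_B if box_b_full else 0
    if action == "two-box":
        total += BOX_A
    return total


def surgery_on_logical_output(output):
    """Set the decision computation's output; prediction and act both follow it."""
    box_b_full = (output == "one-box")  # Omega ran the same computation
    return payoff(output, box_b_full)


def surgery_on_physical_act(action, p_box_b_full):
    """Set only the physical act; box B keeps its prior probability of being full."""
    return (p_box_b_full * payoff(action, True)
            + (1 - p_box_b_full) * payoff(action, False))


def best(options, utility):
    """Return the option with the highest counterfactual utility."""
    return max(options, key=utility)


if __name__ == "__main__":
    print(best(OPTIONS, surgery_on_logical_output))
    print(best(OPTIONS, lambda a: surgery_on_physical_act(a, 0.5)))
```

In this caricature the only difference between the two agents is which node gets the surgery; the payoff table and the predictor are identical.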
I agree that "choose" connotes multiple alternatives, but they're counterfactual antecedents, and when construed as such, are not inconsistent with determinism. I don't know about being ontologically basic, but (what I think of as) physical/causal laws have the important property that they compactly specify the entirety of space-time (together with a specification of the initial conditions).
Is there a formulation of this example that isn't purely metaphysical, i.e. where you could actually detect the difference?

One of the benefits of publishing a complete explanation is that some of the (valid) criticisms of it will lead to a stronger, repaired theory.

I confess that I don't follow your program yet, but the outline is much preferred to vague "I have a secret theory" teasing.

0Eliezer Yudkowsky14y
Yeah, I hear that claim a lot. It seems to apply to some other world than this one. At some point one must notice when an idealistic belief is failing to accumulate evidence in favor of itself. We'll see whether publishing this outline yields any criticisms or suggestions over and above what Nesov and Dai already managed to say based on merely "I have a timeless decision theory". I'm not holding my breath. This outline actually is enough that someone versed in Newcomblike problems and causality ought to be able to make out what I'm talking about, and with a bit of intelligence work out on their own just how many classical dilemmas it solves. Nonetheless I fully expect this post to drop into the void and never be heard from again. That's not because of an evil conspiracy, of course. It's just the default course of events in academia.
I feel like the ratio of words written to words read in compsci research is getting pretty awful. Conferences are happy to take whatever paper-like substance you can churn out. It's probably worse in other fields.
2Wei Dai14y
I'd be surprised too if academia were to take a blog post seriously. Why not explain the ideas to someone who has the time and motivation to write them up into academic papers (and share co-authorship or whatever)? If you found the right person, that ought to be much faster than doing it yourself. (I mean take up much less of your own time.)
1Eliezer Yudkowsky14y
I'd still expect it to drop into the void. Maybe if I write a popular rationality book and it proves popular enough, that probable cost/benefit will change. Are you volunteering?
5Wei Dai14y
No, I'm not volunteering. I said earlier that I don't have the skill/experience/patience/willpower for it. You could publicly ask for volunteers though. Perhaps there is a bunch of Ph.D. students around looking for something to write about. Why is it that Adam Elga can write about the Sleeping Beauty Problem and get 89 citations? Decision theorists are clearly looking for something to do... ETA: Maybe it's because of his reputation/status? In that case I guess you need to convince someone high-status to co-author the papers.
-2Eliezer Yudkowsky14y
Anyone who declines to talk about interesting material because it's in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously. Why should I expect them to have anything important to say if I go to the further trouble of publishing a paper? I ought to post the decision theory to a thread on /b on 4chan, then try forwarding it around to philosophers who've written on Newcomblike problems. Only the ones who really care about their work would dare to comment on it, and the net quality of discussion would go up. Publishing in a peer-reviewed journal just invites in the riffraff. Yes, this is somewhat tongue-in-cheek, but not so tongue-in-cheek that I'm not seriously considering trying it.
Ignoring non-papers claiming to have solved a problem is a good crackpot-avoiding heuristic. What isn't even written up is even less likely worth reading than something with only a few citations that is written up.
4Eliezer Yudkowsky14y
If that were really what was going on, not status games, then getting a link to the blog post from a couple of known folk of good reputation - e.g. Nick Bostrom and Gary Drescher - would be enough to tell people that here was something worth a quick glance to find out more. Now it's worth noting that my whole cynicism here can be falsified if this post gets a couple of links from folk of good reputation, followed by genuinely somewhere-leading discussion which solves open problems or points out new genuine problems.
Heh, if you find a poem scrawled in blood on toilet paper, you probably have a higher priority than Science at the moment -- like finding the psycho f---! But anyway, you half-jest, but this is a problem I've run into myself. Stephan Kinsella has a widely-cited magnum opus opposing intellectual property rights. I have since presented a gaping hole in its logic, which he acknowledges isn't handled well, but he doesn't feel the need to resolve this hole in something he's built his reputation around, merely because I didn't get it published in a journal. Yes, peer review is a good crackpot filter, but it can also be a filter from having to admit your errors. [/threadjack]
"Anyone who declines to talk about interesting material because it's in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously. Why should I expect them to have anything important to say if I go to the further trouble of publishing a paper?" What? Vladimir is right not paying attention to blog entry with no published work is a great way to avoid crackpots. You have this all backwards you speak as if you have all these credentials so everyone should just take you seriously. In reality what credentials do you have? You built all this expectation for this grand theory and this vague outline is the best you can do? Where is the math? Where is the theory? I think anyone in academia would be inclined to ask the same question of you why should they take some vague blog entry seriously when the writer controls the comments and can't be bothered to submit his work for peer-review? You talk about wanting to write a PhD thesis this won't help get you there. In fact this vague outline should do nothing but cast doubt in everyones mind as to whether you have a theory or not. I have been following this TDT issue for a while and I for one would like to see some math and some worked out problems. Otherwise I would be inclined to call your bluff. Eliezer have you ever published a paper in a peer-review journal? The way you talk about it says naive amateur. There is huge value especially for you since you don't have a PhD or any successful companies or any of the other typical things that people who go the non-academic route tend to have. Let's face the music here, your one practical AI project that I am aware of Flare failed, and most of your writing has never been subjected to the rigor that all science should be subjected to. It seems to me if you want to do what you claim you need to start publishing.
"Levels of Organization in General Intelligence" appeared in the Springer volume Artificial General Intelligence. "Cognitive Biases Potentially Affecting Judgement of Global Risks" (PDF) and "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (PDF) appeared in the Oxford University Press volume Global Catastrophic Risks. They're not mathy papers, though.
I am sorry, I am going to take a shortcut here and respond to a couple of posts along with yours. So fine, I partially insert my foot in my mouth... but the issue here, I think, is that the papers we need to be talking about are math papers, right? Anyone can publish non-technical ideas as long as they are well reasoned, but the art of science is the technical mastery. As for Eliezer's comment concerning the irrelevance of Flare being a pre-2003 EY work, I have to disagree. When you have no formal academic credentials and you are trying to make your mark in a technical field such as decision theory, anything technical that you have done or attempted counts. You essentially are building your credentials via work that you have done. I am speaking from experience: since I didn't complete college, I went the business route. But I can also say that I did a lot of technical work, so I built my credentials in the field by doing novel technical things. I am trying to help here, coming from a similar position and wanting a PhD etc.; having various technical achievements as my prior work made all the difference in getting into a PhD program without a B.S. or M.S. It also makes all the difference in being taken seriously by the scientific community. Which circles back to my original point, which is that a vague outline is not enough to show you really have a theory, much less a revolutionary one. Sadly, asking to be taken seriously is just not enough; you have to prove that you meet the bar of admission (decision theory is going to be math). If someone can show me some technical math work EY has done, that would be great, but as of now I have very little confidence that he has a real theory (if someone can, I will drop the issue). Yes, I am aware of the Bayesian Theory paper, but this, let's face it, is fairly basic and is far from showing that EY has the ability to revolutionize decision theory.
1Eliezer Yudkowsky14y
Where? What university?
The university would be Carnegie Mellon, in the Computer Science program (an esoteric area of CS). As for the other parts, I did some work in computer hardware, specifically graphics hardware design, body armor design (bulletproof vests), etc. The body armor got to prototyping but was not marketable for a variety of reasons too dull to go into. I am currently starting a video game company.
1Eliezer Yudkowsky14y
Also, volume-editing isn't as (pointlessly? signallingly?) difficult as journal peer-review.
This vague outline is the result of Eliezer yielding to our pleas to say something - anything - about his confident solution to Newcomb's problem. Now that it's been posted as a not-obviously-formalizable text, and people are discussing it informally, I share a lot of your disappointment. But let's give the topic some days and see how it crystallizes. What's Flare? (...looks it up...) Oh dear Cthulhu, oh no. (Edit: I originally listed several specific users as "refusing to formalize". That was wrong.)
1Eliezer Yudkowsky14y
A legacy of pre-2003 Eliezer, of no particular importance one way or another.
0Wei Dai14y
What about what I wrote? Which part do you find insufficiently formal? Of course I use "mathematical intuition" as a black box without explaining how it works, but that's just like EDT using "prior" without explaining where it comes from, or CDT using "causal probability" as a black box. It's an unsolved problem, not refusal to formalize.
Your decision theory is formal enough for me, but it seems to be different from Eliezer's, which I was talking about. If they're really the same, could you explain how?
0Wei Dai14y
In that case, I never said I understood Eliezer's version well enough that I could formalize it if I wanted to, and I don't think Nesov and Drescher claimed that either, so I don't know why you mentioned our names in connection with "refuse to formalize". Actually I explicitly said that I don't understand Eliezer's theory very well yet.
You're right. I apologize. Amended the comment.
Well, it may be that some academics do take Science seriously, but they also care about status signaling. There's nothing that says a person can't simultaneously optimize for two different values, right? Why exclude those whose values aren't exactly your values, instead of trying to cooperate with them?
Also: Anyone who declines to talk about interesting material because it's in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously. Why should I expect them to have anything important to say if I go to the further trouble of publishing a paper?
Looks to me like there's a pretty lively conversation so far!

The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.

I'm trying to understand the difference between this formulation and mine. Interestingly, Eliezer seems to h...

CTDT vs. ETDT. Hmm, that's a tough one. First, CTDT allows "screening off" of causes, which makes a big difference. I liked EY's formulation above: "TDT doesn't cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision." It's hard to collect evidence, I think, but reasoning about a causal graph gives you the ability to find out how latent decisions affect other outcomes. So in this case, expected utility based reasoning leaves you in a position where you make some decisions because they seem correlated with good outcomes, while the causal reasoning lets you sometimes see either that the actions and consequences are disconnected or that the causation runs in the opposite direction to what you desire. ETA: EY's street crossing example is an example of causation running in the opposite direction.
1Eliezer Yudkowsky14y
= Drescher's street crossing example, don't know if Drescher got it from somewhere else.
1Eliezer Yudkowsky14y
Parfit's Hitchhiker; in the future, after having observed that you've already been picked up and made it to safety, you'll still compute the counterfactual "If the output of my computation were to refuse to pay, then I would not have been picked up." Since TDT screens off all info that goes into your decision-setup, using your updateless version of TDT might obliterate the difference between evidential and causal approaches entirely - no counterfactuals, no updates, just ruling out of self-copies that have received incompatible sense data. (Not sure yet if this works.)
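A minimal sketch of the Parfit's Hitchhiker counterfactual, assuming a driver who predicts the output of the decision computation perfectly and made-up utilities (the numbers and function names are illustrative only). The point is that the counterfactual is evaluated on the computation's output, so it still says "refusing would have left me in the desert" even when evaluated after the rescue.

```python
# Assumed utilities, for illustration only.
U_DIE_IN_DESERT = -1_000_000
U_RESCUED = 0
COST_OF_PAYING = -100

OUTPUTS = ("pay", "refuse")


def driver_picks_up(policy_output):
    """The driver predicts the computation's output and rescues only payers."""
    return policy_output == "pay"


def counterfactual_utility(policy_output):
    """Utility if the output of the decision computation were policy_output."""
    if not driver_picks_up(policy_output):
        return U_DIE_IN_DESERT
    return U_RESCUED + (COST_OF_PAYING if policy_output == "pay" else 0)


def decide():
    """Pick the output whose counterfactual utility is highest.

    Note that having already observed the rescue is not an input here:
    the same comparison is made in the desert and in town.
    """
    return max(OUTPUTS, key=counterfactual_utility)


if __name__ == "__main__":
    print(decide())
```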

This feels right to me. I can't implement it, and I'm not sure I could explain what Eli said, but I understand Pearl well enough (at an intuitive level) to say that it feels like the kind of additions Eli is talking about would clarify and reach the results he's talking about.

Read Pearl. It's not mathy, it's mostly words about graph manipulation.

If you're bothered by math, read Pearl anyway. He doesn't use equations or make you transform symbols. If you can think about information flows or reason visually, Pearl's calculus is for you. You'll under...

Second Chris' advice on reading Pearl. If it helps, I am happy to help with the technical content of the book, or with general technical questions about causal inference (either over email or here).
I've tried to read Pearl's decision theory book, but it seemed dry and boring. Guess I'll have to give it another go... It's available online too, but don't pirate it.
That's "Causality: models, reasoning, and inference By Judea Pearl"...? "Not mathy"? It's jammed full of dense maths! It has integration symbols, summation symbols, logic, probability, theorems and lemmas coming out of its ears! Obviously, Pearl is showing off to impress his peers ;-)
Okay, you're right, they're in there, but Pearl uses those in the proofs, not the explanations, as I recall. I don't think you have to understand the proofs to get the idea. If you find math oppressive, let me know if you try Pearl and find it too daunting. If that happens, I'll change the way I describe the book, I promise.
0Eliezer Yudkowsky14y
Probably a little, but it does help you find mistakes where they exist. (Okay, that was showing off.)

Rolf Nelson wanted to know what everyday problems evidential decision theory produces. Newcomb's Problem can be mapped onto the Prisoner's Dilemma, but are there similarly common Smoking Lesion like problems?

3Eliezer Yudkowsky14y
Well, if you're using TDT, then conditioning on the initial state of your physical computation screens off most such problems. But if you don't break down your causal graph that finely, then there are all sorts of situations in which crazy people might be tempted to use EDT. I think Drescher in his book gives the case of someone who observes that people usually decide to cross the street only when it is safe to do so, who concludes that by deciding to cross the street they can make it safe.
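The street-crossing case can be made concrete with a toy calculation. This is only a sketch under made-up numbers (the base rate and the two crossing probabilities are hypothetical, not taken from Drescher): it contrasts an ordinary probability update on "I cross" with Pearl-style surgery on the "cross" node.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
P_SAFE = Fraction(3, 10)  # prior probability that the street is safe
P_CROSS_GIVEN_SAFE = {
    True: Fraction(9, 10),    # people usually cross when it is safe
    False: Fraction(1, 100),  # and rarely cross when it is not
}

def p_safe_given_cross():
    """Evidential reading: Bayesian update on observing a crossing."""
    joint_safe = P_SAFE * P_CROSS_GIVEN_SAFE[True]
    joint_unsafe = (1 - P_SAFE) * P_CROSS_GIVEN_SAFE[False]
    return joint_safe / (joint_safe + joint_unsafe)

def p_safe_do_cross():
    """Causal reading: surgery on 'cross' deletes the arrow safe -> cross,
    so the street's safety keeps its prior probability."""
    return P_SAFE
```

With these numbers the update gives 270/277 while the surgery leaves the probability at 3/10: deciding to cross is strong evidence of safety in the observed population, but it does nothing to make the street safe.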
Majoritarianism may frequently be the result of the application of evidential decision theory, ignoring all of the non-naturalistic vagueness in the formulations of CDT and EDT, might it not?
Some kinds of majoritarianism, certainly. The confusion is based on mistaking correlation of votes with commonality of interests. "If we can all agree to vote for proposition X, then it must be in our favor, right?"

This is better than nothing, thanks and upvote. Now let's begin translating this stuff. AFAICT, a "decision theory" is supposed to have two parts:

1) A blah blah verbal algorithm for translating real-world problem descriptions into a certain kind of formal structure.

2) A mathematical algorithm that accepts that formal structure and outputs a decision.

I don't fully understand what formal structure you're proposing (a Pearl-style causal graph with additional "logical" arrows? why would this always be acyclic?), and can't understand the algorithm until the structure is clear enough.

0Eliezer Yudkowsky14y
If the arrows are material implications, then A -> B -> C -> A collapses via iff to a single node. Can you give an example of cyclic logical uncertainty?
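The collapse claim for material implication can be checked by brute force. A minimal sketch (the function names are illustrative): enumerate every truth assignment to A, B, C and keep those satisfying A -> B, B -> C and C -> A.

```python
from itertools import product

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

def cycle_models():
    """All (A, B, C) assignments satisfying A -> B, B -> C and C -> A."""
    return [
        (a, b, c)
        for a, b, c in product([False, True], repeat=3)
        if implies(a, b) and implies(b, c) and implies(c, a)
    ]
```

Only the all-false and all-true assignments survive, so the three propositions are equivalent and can be treated as one node. This says nothing about cycles that mix logical and physical arrows.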
I was thinking of some case where the cycle contains both physical and logical arrows. Logical arrows can point backwards in time, so this doesn't seem to be impossible in principle. Sorry, can't give a specific example because I don't fully understand what you mean by "logical uncertainty".
My reading is that logical nodes can point to physical nodes, but not vice versa. (Also that it doesn't make sense to say an arrow from a logical node "points backwards in time". Logical nodes are timeless.)
Physical arrows shouldn't point to logical nodes, though... right?

I gave one example earlier of TDT agents not playing cooperate in PD against each other. Here's another, perhaps even more puzzling, example.

Consider 3 TDT agents, A, B, and C, playing a game of 3-choose-2 PD. These agents are identical, except that they have different beliefs about how they are logically related to each other. A and B both believe that A and B are 100% logically correlated (in other words, logically equivalent). A and C both believe that A and C are 0% logically correlated. B and C also believe that B and C are 0% logically correlated.

Wha...

1Eliezer Yudkowsky14y
How do they end up in this situation? Clearly they cannot all have common knowledge of each other's source code, so where do they obtain their definite beliefs about each other instead?
0Wei Dai14y
Re: "definite beliefs", the numbers don't have to be 100% and 0%. They could be any p and q, where p is above the threshold for cooperation, and q is below. As for where the numbers come from, I don't know. Perhaps the players have different initial intuitions (from a mathematical intuition module provided by evolution or their programmers) about their logical correlations, which causes them to actually have different logical correlations (since they are actually computing different things when making decisions), which then makes those intuitions consistent upon reflection.
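One way to picture a "threshold for cooperation" is a toy model. This is an assumption-laden sketch, not UDT1 or TDT themselves: the payoffs are hypothetical, and the model assumes that with probability p the opponent's move mirrors mine and that otherwise the opponent defects.

```python
from fractions import Fraction

# Hypothetical one-shot PD payoffs for the row player: reward for mutual
# cooperation, punishment for mutual defection, sucker's payoff.
# The temptation payoff never arises under the model's assumptions.
R, P, S = 3, 1, 0

def eu_cooperate(p):
    """With probability p the opponent mirrors me (C, C); otherwise it defects."""
    return p * R + (1 - p) * S

def eu_defect(p):
    """Mirrored defection and independent defection both yield P."""
    return p * P + (1 - p) * P

def cooperation_threshold():
    """Value of p at which cooperating and defecting have equal expected utility."""
    return Fraction(P - S, R - S)
```

Under these assumptions the threshold is 1/3: a believed correlation p above it favors cooperating, and a q below it favors defecting.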
1Eliezer Yudkowsky14y
Why can't A and B choose to be correlated with C by deliberately making their decision dependent on its decision? Insufficient knowledge of C's code even to make their decision dependent on "what an agent does when it thinks it's not correlated to you"? In other words, you know that C is going to follow a certain decision algorithm here - do the Dai-obvious thing and defect - but A and B don't know enough about C to defect conditional on the "obvious" thing being to defect?
1Wei Dai14y
A and B don't choose this, because given their beliefs (i.e., low correlation between A and C, and B and C), that doesn't maximize their expected utilities. So the belief is like a self-fulfilling prophecy. Intuitively, you might think "Why don't they get out of this trap by choosing to be correlated with C and simultaneously change their beliefs?" The problem is that they don't think this will work, because they think C wouldn't respond to this. In other words, why would A and B defect conditional on C defecting, when they know "C is going to follow a certain decision algorithm here - do the Dai-obvious thing and defect"? Anyway, that's what I think happens under UDT1. It's quite possible (almost certain, really) that UDT1 is wrong or incomplete. But if you have a better solution, can you try to formalize it, and not just make informal arguments? Or, if you think you have an intuitively satisfactory solution that you don't know how to formalize yet, I'll stop beating this dead horse and let you work it out.
1Eliezer Yudkowsky14y
I don't have a general solution. I'm just carrying out the reasoning by hand. I don't know how to solve the logical ordering problem. Why would C choose to follow such an algorithm, if C perceives that not following such an algorithm might lead to mutual cooperation instead of mutual defection? Essentially, I'm claiming that your belief about "logical uncorrelation" is hard to match up with your out-of-context intuitive reasoning about what all the parties are likely to do. It's another matter if C is a piece of cardboard, a random number generator, or a biological organism operating on some weird deluded decision theory; but you're reasoning as if C is calmly maximizing. Suppose I put things to you this way: Groups of superrational agents will not occupy anything that is not at least a Pareto optimum, because they have strong motives to occupy Pareto optima and TDT lets them coordinate where such motives exist. Now the 3-choose-2 problem with two C players and one D player may be a Pareto optimum (if taken at face value without further trades being possible), but if you think of Pareto-ization as an underlying motivation - that everyone starts out in the mutual defection state, and then has a motive to figure out how to leave the mutual defection state by increasing their entanglement - then you might see why I'm a bit more skeptical about these "logical uncorrelations". Then you just end up in the all-D state, the base state, and agents have strong incentives to figure out ways to leave it if they can. In other words, you seem to be thinking in terms of a C-equilibrium already accomplished among one group of agents locally correlated with themselves only, and looking at the incentive of other agents to locally-D; whereas my own reasoning assumes the D-equilibrium already globally accomplished, but suspects that in this case rational agents have a strong incentive to reach up to the largest reachable C-equilibria, which they can accomplish by increasing (not de
0Wei Dai14y
Ok, this looks reasonable to me. But how would they actually go about doing this? So far I can see two general methods:
1. convergence towards an "obvious" decision theory
2. deliberate conditioning of moves between players
My current view is that neither of these methods seems very powerful as a mechanism for enabling cooperation, compared to, say, the ability to prove source code, or to merge securely. To summarize my thoughts and the various examples I've given, here are the problems with each of the above methods for "increasing entanglement":
1. Two agents with the same "obvious" decision theory may not be highly correlated, if they have different heuristics, intuitions, priors, utility functions, etc. Also, an agent may have a disincentive to unilaterally increase his correlation with a large group of already highly correlated agents.
2. Deliberate conditioning of moves is difficult when two sides have high uncertainty about each other's source code. Which hypothetical agent(s) do you condition your move against? How would they know that you've done so, when they don't know your source code either? It's also difficult if two sides have different preferences about the correlation of their moves, that is, if one side wants them to be positively correlated, and another wants them to be uncorrelated or negatively correlated.
1Eliezer Yudkowsky14y
These sound like basically reasonable worries / lines of argument to me. I'm sure life will be a lot easier for... not necessarily everyone, but at least us primitive mortal analysts... if it's easy for superintelligences to exhibit their source code to each other. Then we just have the problem of logical ordering in threats and games of Chicken. (Come to think of it, blackmail threats of mutual destruction unless paid off, would seem to become more probable, not less, as you became more able to exhibit and prove your source code to the other player.) A possible primary remaining source of our differing guesses at this point, may have to do with the degree to which we think that decision processes are a priori (un)correlated. I take statements like "Obviously, everyone plays D at the end" to be evidence of very high a priori correlation - it's no good talking about different heuristics, intuitions, priors, utility functions, etcetera, if you don't actually conclude that maybe some players play C and others play D. How would that happen?
0Wei Dai14y
I think Nesov's position is that such threats don't work against updateless agents, but I'm not sure about that yet. ETA: See previous discussion of this topic. That doesn't make sense... Suppose nobody smokes, and nobody gets cancer. Does that mean smoking and cancer are correlated? In order to have correlation, you need to have both (C,C) and (D,D) outcomes. If all you have are (D,D) outcomes, there is no correlation. I'm referring to rock-paper-scissors and this example. Or were you asking something else?
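The smoking-and-cancer point can be illustrated numerically, taking "correlation" in the ordinary statistical sense (an assumption about what is meant; logical correlation between decision processes may be a different notion). A minimal sketch with a hand-rolled Pearson coefficient, encoding C as 1 and D as 0:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined: a constant variable carries no co-variation
    return cov / sqrt(var_x * var_y)

# Hypothetical outcome records for two players.
only_dd = ([0, 0, 0, 0], [0, 0, 0, 0])     # every round ends (D, D)
cc_and_dd = ([1, 1, 0, 0], [1, 1, 0, 0])   # rounds end (C, C) or (D, D)
```

On the all-(D, D) record the coefficient is undefined, whereas the mixed record gives a perfect positive correlation.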

I'm not keeping up here - I only peek at this site occasionally, rather than following it - but this:

"The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation."

... seems rather similar to the dictum that you should choose as if you really might be any of your subjective duplicates, from across all possible worlds. (I suppose there is a difference, in that "subjective duplicate" refers onl...

3Eliezer Yudkowsky14y
Omega may not contain a copy of you which is detailed enough to be a subjective duplicate. Omega may just be reasoning abstractly about you. So you legitimately know that you are not inside Omega - but you also expect that whatever you decide, Omega will have successfully predicted.

Upvoted; this is a good summary of the issue, and using the new label TDT is arguably more elegant than having to talk separately about the rationality of cultivating a disposition.

How significant are the open questions? We should not expect correct theory to work in the face of arbitrary acts of Omega. Suppose Omega says "Tomorrow I will examine your source code, and if you don't subscribe to TDT I will give you $1 million, and if you do subscribe to TDT I will make you watch the Alien movie series -- from the third one on". In this scenario it ...

2Eliezer Yudkowsky14y
Right, so the decision theories I try to construct are for classes of problems where I can identify a winning property of how the algorithm decides things or strategizes things or responds to things or whatever, a property which determines the payoff fully and screens off all other dependence on the algorithm. Then the algorithm can maximize that property of itself. Causal decision theory then corresponds to the problem class where your physical action fully determines the result, and anything else, like logical dependence on your algorithm's disposition, is not allowed. CDT agents will successfully maximize on that problem class.
Okay, so what problem class are you aiming for with TDT? It can't be the full class of problems where the result depends on your disposition, because there will always be a counter. Do you have a slightly more restricted class in mind?
2Eliezer Yudkowsky14y
The TDT I actually worked out is for the class where your payoffs are fully determined by the actual output of your algorithm, but not by other outputs that your algorithm would have made under other conditions. As I described in the "open problems" post, once you allow this sort of strategy-based dependence, then I can depend on your dependence on my dependence on ... and I don't yet know how to stop the recursion. This is closely related to what Wei Dai and I are talking about in terms of the "logical order" of decisions. If you want to use the current TDT for the Prisoner's Dilemma, you have to start by proving (or probabilistically expecting) that your opponent's decision is isomorphic to your own. Not by directly simulating the opponent's attempt to determine if you cooperate only if they cooperate. Because, as written, the counterfactual surgery that stops the recursion is just over "What if I cooperate?" not "What if I cooperate only if they cooperate?" (Look at the diagonal sentence.)
Okay... Omega comes along and says "I ran a simulation to see if you would one-box in Newcomb. The answer was yes, so I am now going to feed you to the Ravenous Bugblatter Beast of Traal. Have a nice day." Doesn't this problem fit within your criteria? If you reject it on the basis of "if you had told me the relevant facts up front, I would've made the right decision", can't you likewise reject the one where Omega flips a coin before telling you about the proposed bet? If you have reason in advance to believe that either is likely to occur, you can make an advance decision about what to do. Does either problem have some particular quality relevant for its classification here, that the other does not?
3Eliezer Yudkowsky14y
That's more like a Counterfactual Mugging, which is the domain of Nesov-Dai updateless decision theory - you're being rewarded or punished based on a decision you would have made in a different state of knowledge, which is not "you" as I'm defining this problem class. (Which again may sound quite restrictive at this point, but if you look at 95% of the published Newcomblike problems...) What you need here is for the version of you that Omega simulates facing Newcomb's Box, to know about the fact that another Omega is going to reward another version of itself (that it cares about) based on its current logical output. If the simulated decision system doesn't know/believe this, then you really are screwed, but it's more because now Omega really is an unfair bastard (i.e. doing something outside the problem class) because you're being punished based on the output of a decision system that didn't know about the dependency of that event on its output - sort of like Omega, entirely unbeknownst to you, watching you from a rooftop and sniping you if you eat a delicious sandwich. If the version of you facing Newcomb's Problem has a prior over Omega doing things like this, even if the other you's observed reality seems incompatible with that possible world, then this is the sort of thing handled by updateless decision theory.
Right. But then if that is the (reasonable) criterion under which TDT operates, it seems to me that it does indeed handle the case of Omega's after the fact coin flip bet, in the same way that it handles (some versions of) Newcomb's problem. How do you figure that it doesn't?
0Eliezer Yudkowsky14y
Because the decision diagonal I wrote out, handles the probable consequences of "this computation" doing something, given its current state of knowledge - its current, updated P - so if it already knows the coinflip (especially a logical coinflip like a binary digit of pi) came up heads, and this coinflip has nothing counterfactually to do with its decision, then it won't care about what Omega would have done if the coin had come up tails and the currently executing decision diagonal says "don't pay".
Ah! so you're defining "this" as exact bitwise match, I see. Certainly that helps make the conclusions more rigorous. I will suggest the way to handle the after-the-fact coin flip bet is to make the natural extension to sufficiently similar computations. Note that even selfish agents must do this in order to care about themselves five minutes in the future. To further motivate the extension, consider the variant of Newcomb where just before making your choice, you are given a piece of paper with a large number written on it; the number has been chosen to be prime or composite depending on whether the money is in the opaque box.
0Eliezer Yudkowsky14y
That's not the problem. The problem is that you've already updated your probability distribution, so you just don't care about the cases where the binary digit came up 0 instead of 1 - not because your utility function isn't over them, but because they have negligible probability. (First read that variant in Martin Gardner.) The epistemically intuitive answer is "Once I choose to take one box, I will be able to infer that this number has always been prime". If I wanted to walk through TDT doing this, I'd draw a causal graph with Omega's choice descending from my decision diagonal, and sending a prior-message in turn to the parameters of a child node that runs a primality test over numbers and picked this number because it passed (failed), so that - knowing / having decided your logical choice - seeing this number becomes evidence that its primality test came up positive. In terms of logical control, you don't control whether the primality test comes up positive on this fixed number, but you do control whether this number got onto the box-label by passing a primality test or a compositeness test.
(I don't remember where I first read that variant, but Martin Gardner sounds likely.) Yes, I agree with your analysis of it -- but that doesn't contradict the assertion that you can solve these problems by extending your utility function across parallel versions of you who received slightly different sensory data. I will conjecture that this turns out to be the only elegant solution.
0Eliezer Yudkowsky14y
Sorry, that doesn't make any sense. It's a probability distribution that's the issue, not a utility function. UDT tosses out the probability distribution entirely. TDT still uses it and therefore fails on Counterfactual Mugging.
It's precisely the assertion that all such problems have to be solved at the probability distribution level that I'm disputing. I'll go so far as to make a testable prediction: it will be eventually acknowledged that the notion of a purely selfish agent is a good approximation that nonetheless cannot handle such extreme cases. If you can come up with a theory that handles them all without touching the utility function, I will be interested in seeing it!
None of the decision theories in question assume a purely selfish agent.
No, but most of the example problems do.
It might be nontrivial to do this in a way that doesn't automatically lead to wireheading (using all available power to simulate many extremely fulfilled versions of itself). Or is that problem even more endemic than this?
This is a statement about my global strategy, the strategy I consider winning. In this strategy, I one-box in the states of knowledge where I don't know about the monster, and two-box where I know. If Omega told me about the monster, I'd transition to a state of knowledge where I know about it, and, according to the fixed strategy above, I two-box. In counterfactual mugging, for each instance of mugging, I give away $100 on the mugging side, and receive $10000 on the reward side. This is also a fixed global strategy that gives the actions depending on agent's state of knowledge.
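The global-strategy view of counterfactual mugging can be sketched as a policy evaluation done before the coin is known. The $100 and $10000 figures come from the comment above; the fair coin and the function name are assumptions made for illustration.

```python
from fractions import Fraction

def policy_value(pays, p_mugging=Fraction(1, 2), cost=100, reward=10000):
    """Expected value of a fixed policy, evaluated before the coin lands.

    On the mugging side the agent loses `cost` if its policy is to pay.
    On the reward side Omega pays `reward` only to agents whose policy,
    had they been mugged, is to pay.
    """
    mugging_side = -cost if pays else 0
    reward_side = reward if pays else 0
    return p_mugging * mugging_side + (1 - p_mugging) * reward_side
```

Under a fair coin the paying policy is worth 4950 and the refusing policy is worth 0, which is the sense in which paying is the winning strategy even though the agent on the mugging side sees only the loss.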
We already have Disposition-Based Decision Theory - and have had since 2002 or so. I think it's more a case of whether there is anything more to add.
Thanks for the link! I'll read the paper more thoroughly later, a quick skim suggests it is along the same lines. Are there any cases where DBDT and TDT give different answers?
I don't think DBDT gives the right answer if the predictor's snapshot of the local universe-state was taken before the agent was born (or before humans evolved, or whatever), because the "critical point", as Fisher defines it, occurs too late. But a one-box chooser can still expect a better outcome.
5Eliezer Yudkowsky14y
It looks to me like DBDT is working in the direction of TDT but isn't quite there yet. It looks similar to the sort of reasoning I was talking about earlier, where you try to define a problem class over payoff-determining properties of algorithms. But this isn't the same as a reflectively consistent decision theory, because you can only maximize on the problem class from outside the system - you presume an existing decision process or ability to maximize, and then maximize the dispositions using that existing decision theory. Why not insert yet another step? What if one were to talk about dispositions to choose particular disposition-choosing algorithms as being rational? In other words, maximizing "dispositions" from outside strikes me as close kin to "precommitment" - it doesn't so much guarantee reflective consistency of viewpoints, as pick one particular viewpoint to have control. As Drescher points out, if the base theory is a CDT, then there's still a possibility that DBDT will end up two-boxing if Omega takes a snapshot of the (classical) universe a billion years ago before DBDT places the "critical point". A base theory of TDT, of course, would one-box, but then you don't need the edifice of DBDT on top because the edifice doesn't add anything. So you could define "reflective consistency" in terms of "fixed point under precommitment or disposition-choosing steps". TDT is validated by the sort of reasoning that goes into DBDT, but the TDT algorithm itself is a plain-vanilla non-meta decision theory which chooses well on-the-fly without needing to step back and consider its dispositions, or precommit, etc. The Buck Stops Immediately. This is what I mean by "reflective consistency". (Though I should emphasize that so far this only works on the simple cases that constitute 95% of all published Newcomblike problems, and in complex cases like Wei Dai and I are talking about, I don't know any good fixed algorithm (let alone a single-step non-meta one).)
Exactly. Unless "cultivating a disposition" amounts to a (subsequent-choice-circumventing) precommitment, you still need a reason, when you make that subsequent choice, to act in accordance with the cultivated disposition. And there's no good explanation for why that reason should care about whether or not you previously cultivated a disposition.
0Eliezer Yudkowsky14y
(Though I think the paper was trying to use dispositions to define "rationality" more than to implement an agent that would consistently carry out those dispositions?)
I didn't really get the purpose of the paper's analysis of "rationality talk". Ultimately, as I understood the paper, it was making a prescriptive argument about how people (as actually implemented) should behave in the scenarios presented (i.e., the "rational" way for them to behave).
I had a look at the Wikipedia "Precommitment" article to see whether precommitment is actually as inappropriate as it seems to be being portrayed as. According to the article, the main issue seems to involve cutting off your own options. Is a sensible one-boxing agent "precommitting" to one-boxing by "cutting off its own options" - namely the option of two-boxing? On one hand, they still have the option and a free choice when they come to decide. On the other hand, the choice has been made for them by their own nature - and so they don't really have the option of choosing any more. My assessment is that the word is not obviously totally inappropriate. Does "disposition" have the same negative connotations as "precommitting" has? I would say not: "disposition" seems like a fairly appropriate word to me.
I don't know if Justin Fisher's work exactly replicates your own conclusions. However it seems to have much the same motivations, and to have reached many of the same conclusions. FWIW, it took me about 15 minutes to find that paper in a literature search. Another relevant paper: "No regrets: or: Edith Piaf revamps decision theory". That one seems to have christened what you tend to refer to as "consistency under reflection" as "desire reflection". I don't seem to like either term very much - but currently don't have a better alternative to offer.
0Eliezer Yudkowsky14y
Violation of desire reflection would be a sufficient condition for violation of dynamic consistency, which in turn is a sufficient condition to violate reflective consistency. I don't see a necessity link.
The most obvious reply to the point about dispositions to have dispositions is to take a behaviourist stance: if a disposition results in particular actions under particular circumstances, then a disposition to have a disposition (plus the ability to self-modify) is just another type of disposition, really.
What the document says about the placing of the "critical point" is: Consequently, I am not sure where the idea that it could be positioned "too late" comes from. The document pretty clearly places it early on.
Well, we have a lengthy description of the revised DBDT - so that should hopefully help figure out what its predicted actions are. The author claims it gets both the Smoking-Cancer Problem and Newcomb’s problem right - which seems to be a start.

The three sentence version is actually a one sentence version; it's three independent clauses, but semicolons don't separate sentences.

I'm really sorry, I couldn't help myself.

Can anyone suggest good background reading material for understanding the technical language and background knowledge of this and, more generally, of decision theory?

But if you read the other parts of the solution to "free will", and then furthermore explicitly formulate TDT, then this is what utterly, finally, completely, and without even a tiny trace of confusion or dissatisfaction or a sense of lingering questions, kills off entirely the question of "free will".

If this is correct, then it amounts to a profound philosophical and scientific achievement.

8Eliezer Yudkowsky14y
Not by my standards. Free will is about as easy as a problem can get and still be Confusing. Plenty of moderately good reductionists have refused to be confused by it. Killing off the problem entirely is more like dropping nuclear weapons to obliterate the last remnants of a dead horse than any great innovation within the field of reductionism. There are non-reductionist philosophers who would think of reducing free will as a great and difficult achievement, but by reductionist standards it's a mostly-solved problem already. Formal cooperation in the one-shot PD, now that should be interesting.
Free will is counted as one of the great problems of philosophy. Wikipedia lists it as a "central problem of metaphysics". SEP has a whole, long article on it along with others on: "compatibilism", "causal determinism", "free will and fatalism", "divine foreknowledge", "incompatibilism (nondeterministic) theories of free will" and "arguments for incompatibilism". If you really have "nuked the dead donkey" here, you would cut out a lot of literature. Furthermore, religious people would no longer be able to use "free will" as a magic incantation with which to defend God.
The only reason free will is regarded as a problem of philosophy is that philosophers are in the rather bizarre habit of defining it as "your actions are uncaused" - it should be no surprise that a nonsensical definition leads to problems! When we use the correct definition - the one that corresponds to how the term is actually used - "your actions are caused by your own decisions, as opposed to by external coercion" - the problem doesn't arise.
3Eliezer Yudkowsky14y
Dennett and others have used multi-ton high explosives on the dead donkey. Why would nuclear weapons make a further difference?
People respond to math more than to words.
4Eliezer Yudkowsky14y
Er... no they don't?
Some do.
Rather, if one challenges a valid verbal theory one can usually find some way of convincing people that there is some "wiggle room", that it may or may not be valid, etc. But a mathematical theory has, I think, an air of respectability that will make people pay attention, even if they don't like it, and especially if they don't actually understand the mathematics. If your theory finds applications (which, given the robotics revolution we seem to be in the middle of, is not vastly unlikely), then it will further marginalize those who stick to the old convenient confusion about free will. Of course, given what has happened with evolution (smart Christians accept it, but find excuses to still believe in God), I suspect that it will only have an incremental impact on religiosity, even amongst the elite.
Free will seems like a pretty boring topic to me. The main recent activity I have noticed in the area was Daniel Dennett's "Freedom Evolves" book. That book was pretty boring and mostly wrong - I thought. It was curious to see Daniel Dennett make such a mess of the subject, though.
As it happens, I'm reading through Freedom Evolves right now; up to chapter 3, and while I don't quite buy his ideas on inevitability, it so far doesn't strike me as a mess?
I liked the bit on memes. Most of the rest of it was a lot of word games, IMO.
Here is what I don't understand about the free will problem. I know this is a simple objection, so there must be a standard reply to it; but I don't know what that reply is. Denote F as a world in which free will exists, f as one in which it doesn't. Denote B as a world in which you believe in free will, and b as one in which you don't. Let a combination of the two, e.g., FB, denote the utility you derive from having that belief in that world. Suppose FB > Fb and fb > fB (being correct > being wrong). The expected utility of B is FB x p(F) + fB x (1-p(F)). Expected utility of b is Fb x p(F) + fb x (1-p(F)). Choose b if Fb x p(F) + fb x (1-p(F)) > FB x p(F) + fB x (1-p(F)). But, that's not right in this case! You shouldn't consider worlds of type f in your decision, because if you're in one of those worlds, your decision is pre-ordained. It doesn't make any sense to "choose" not to believe in free will - that belief may be correct, but if it is correct, then you can't choose it. Over worlds of type F, the expected utility of B is FB x p(F), and the utility of b is Fb x p(F), and FB > Fb. So you always choose B.
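The two calculations in this comment can be laid side by side. A sketch only: the utility values and p(F) used below are hypothetical, chosen to satisfy the stated constraints FB > Fb and fb > fB, and the code reproduces the commenter's argument without endorsing it.

```python
from fractions import Fraction

def expected_utilities(u, p_F):
    """Return (full, restricted) expected utilities for beliefs 'B' and 'b'.

    u maps (world, belief) to a utility, with world in {'F', 'f'}.
    `full` sums over both world types; `restricted` drops the f-worlds,
    as the comment proposes.
    """
    full = {
        belief: u[("F", belief)] * p_F + u[("f", belief)] * (1 - p_F)
        for belief in "Bb"
    }
    restricted = {belief: u[("F", belief)] * p_F for belief in "Bb"}
    return full, restricted

# Hypothetical utilities: being correct is worth 1, being wrong 0.
example_u = {("F", "B"): 1, ("F", "b"): 0, ("f", "B"): 0, ("f", "b"): 1}
example_p_F = Fraction(1, 10)
```

With p(F) = 1/10 the full calculation prefers b (9/10 against 1/10), while the restricted one prefers B (1/10 against 0). Because FB > Fb, the restricted comparison favors B for any nonzero p(F), which is the comment's conclusion.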
Saying that you shouldn't do something because it's preordained whether you do it or not is a very confused way of looking at things. Christine Korsgaard, by whom I am normally unimpressed but who has a few quotables, says: (From "The Authority of Reflection")
I don't understand what that Korsgaard quote is trying to say. I didn't say that. I said that, when making a choice, you shouldn't consider, in your set of possible worlds, possible worlds in which you can't make that choice. It's certainly not as confused a way of looking at things as choosing to believe that you can't choose what to believe. I should have said you shouldn't try to consider those worlds. If you are in f, then it may be that you will consider such possible worlds; and there's no shouldness about it. "But", you might object, "what should you do if you are a computer program, running in a deterministic language on deterministic hardware?" The answer is that in that case, you do what you will do. You might adopt the view that you have no free will, and you might be right. The 2-sentence version of what I'm saying is that, if you don't believe in free will, you might be making an error that you could have avoided. But if you believe in free will, you can't be making an error that you could have avoided.
In the context of the larger paper, the most charitable way of interpreting her (IMO) is that whether we have free will or not, we have the subjective impression of it, this impression is simply not going anywhere, and so it makes no sense to try to figure out how a lack of free will ought to influence our behavior, because then we'll just sit around waiting for our lack of free will to pick us up out of our chair and make us water our houseplants and that's not going to happen. What if we're in a possible world where we can't choose not to consider those worlds? ;) "Choosing to believe that you can't choose what to believe" is not a way of looking at things; it's a possible state of affairs, in which one has a somewhat self-undermining and false belief. Now, believing that one can choose to believe that one cannot choose what to believe is a way of looking at things, and might even be true. There is some evidence that people can choose to believe self-undermining false things, so believing that one could choose to believe a particular self-undermining false thing which happens to have recursive bearing on the choice to believe it isn't so far out.
[7] Eliezer Yudkowsky, 14y
I am unable to attach a truth condition to these sentences - I can't imagine two different ways that reality could be which would make the statements true or alternatively false.
Do you mean that the phrases "free will exists" and "free will does not exist" are both incoherent?
[6] Eliezer Yudkowsky, 14y
If I want to, I can assign a meaning to "free will" in which it is tautologically true of causal universes as such, and applied to agents, is true of some agents but not others. But you used the term, you tell me what it means to you.
You used the term first. You called it a "dead horse" and "about as easy as a problem can get and still be Confusing". I would think this meant that you have a clear concept of what it means. And it can't be a tautology, because tautologies are not dead horses. I can at least say that, to me, "Free will exists" implies "No Omega can predict with certainty whether I will one-box or two-box." (This is not an "if and only if" because I don't want to say that a random process has free will; nor that an undecidable algorithm has free will.) I thought about saying: "Free will does not exist" if and only if "Consciousness is epiphenomenal". That sounds dangerously tautological, but closer to what I mean. I can't think how to say anything more descriptive than what I wrote in my first comment above. I understand that saying there is free will seems to imply that I am not an algorithm; and that that seems to require some weird spiritualism or vitalism. But that is vague and fuzzy to me; whereas it is clear that it doesn't make sense to worry about what I should do in the worlds where I can't actually choose what I will do. I choose to live with the vague paradox rather than the clear-cut one. ADDED: I should clarify that I don't believe in free will. I believe there is no such thing. But, when choosing how to act, I don't consider that possibility, because of the reasons I gave previously.
[6] Eliezer Yudkowsky, 14y
Then you've got the naive incoherent version of "free will" stuck in your head. Read the links.
All right, I read all of the non-italicized links, except for the "All posts on Less Wrong tagged Free Will", trusting that one of them would say something relevant to what I've said here. But alas, no. All of those links are attempts to argue about the truth value of "there is free will", or about whether the concept of free will is coherent, or about what sort of mental models might cause someone to believe in free will. None of those things are at issue here. What I am talking about is what happens when you are trying to compute something over different possible worlds, where what your computation actually does is different in these different worlds. When you must compare expected value in possible worlds in which there is no free will, to expected value in possible worlds in which there is free will, and then make a choice; what that choice actually does is not independent of what possible world you end up in. This means that you can't apply expectation-maximization in the usual way. The counterintuitive result, I think, is that you should act in the way that maximizes expected value given that there is free will, regardless of the computed expected value given that there is not free will. As I mentioned, I don't believe in free will. But I think, based on a history of other concepts or frameworks that seemed paradoxical but were eventually worked out satisfactorily, that it's possible there's something to the naive notion of "free will". We have a naive notion of "free will" which, so far, no one has been able to connect up with our understanding of physics in a coherent way. This is powerful evidence that it doesn't exist, or isn't even a meaningful concept. It isn't proof, however; I could say the same thing about "consciousness", which as far as I can see really shouldn't exist. All attempts that I've seen so far to parse out what free will means, including Eliezer's careful and well-written essays linked to above, fail to noticeably reduce the probabil
[7] Eliezer Yudkowsky, 14y
I have stated exactly what I mean by the term "free will" and it makes this sentence nonsense; there is no world in which you do not have free will. And I see no way that your will could possibly be any freer than it already is. There is no possible amendment to reality which you can consistently describe, that would make your free will any freer than it is in our own timeless and deterministic (though branching) universe. What do you mean by "free will" that makes your sentence non-nonsense? Don't say "if we did actually have free will", tell me how reality could be different.
That's the part I don't buy. I'm not saying it's false, but I don't see any good reason to think it's true. (I think I read the posts where you explained why you believe it, but I might have missed some.)
The mistake you're making is that determinism does not mean your decisions are irrelevant. The universe doesn't swoop in and force you to decide a certain way even though you'd rather not. Determinism only means that your decisions, by being part of physical reality rather than existing outside it, result from the physical events that led to them. You aren't free to make events happen without a cause, but you can still look at evidence and come to correct conclusions.
If you can't choose whether you believe, then you don't choose whether you believe. You just believe or not. The full equation still captures the correctness of your belief, however you arrived at it. There's nothing inconsistent about thinking that you are forced to not believe and that seeing the equation is (part of) what forced you. (I avoid the phrase "free will" because there are so many different definitions. You seem to be using one that involves choice, while Eliezer uses one based on control. As I understand it, the two of you would disagree about whether a TV remote in a deterministic universe has free will.) edit: missing word, extra word
Brian said: And Alicorn said: And before either of those, I said: These all seem to mean the same thing. When you try to argue against what someone said by agreeing with him, someone is failing to communicate. Brian, my objection is not based on the case fb. It's based on the cases Fb and fB. fB is a mistake that you had to make. Fb, "choosing to believe that you can't choose to believe", is a mistake you didn't have to make.
Yes. I started writing my reply before Alicorn said anything, took a short break, posted it, and was a bit surprised to see a whole discussion had happened under my nose. But I don't see how what you originally said is the same as what you ended up saying. At first, you said not to consider f because there's no point. My response was that the equation correctly includes f regardless of your ability to choose based on the solution. Now you are saying that Fb is different from (inferior to?) fB.

In conclusion, rational agents are not incapable of cooperation, rational agents are not constantly fighting their own source code, rational agents do not go around helplessly wishing they were less rational, and finally, rational agents win.

I'm pretty sure Socrates and Aristotle already pointed much of this out in different words. I should make a post about that. Of course, they didn't do the math.

I agree with cousin_it below. It seems like you're missing some math.

But other than that, I don't see what the big deal is. I was expecting something monumental and game-changing, not "Is that it?"

This is indeed interesting, although it seems to be going over my head somewhat.

Re: "Some concluding chiding of those philosophers who blithely decided that the "rational" course of action systematically loses"

Some of those philosophers draw a distinction between rational action and the actions of a rational agent - see here:

I conclude that the rational action for a player in the Newcomb Paradox is taking both boxes, but that rational agents will usually take only one box because they have rationally adopted the disposition to do so.

So: these folk had got the right answer, and any debate with them is over terminology.

[1] Eliezer Yudkowsky, 14y
(Looks over Tim Tyler's general trend in comments.) Okay. It's helpful that you're doing a literature search. It's not helpful that every time you find something remotely related, you feel a need to claim that it is already TDT and that TDT is nothing innovative by comparison. It does not appear to me that you understand either the general background of these questions as they have been pursued within decision theory, or TDT in particular. Literature search is great, but if you're just spending 15 minutes Googling, then you have insufficient knowledge to compare the theories. Plenty of people have called for a decision theory that one-boxes on Newcomb and smokes on the smoking lesion - the question is coughing up something that seems reasonably formal. Plenty of people have advocated precommitment, but it comes with its own set of problems, and that is why a non-precommitment-based decision theory is important.
In the spirit of dredging up references with no actual deep insight, I note this recent post on Andrew Gelman's blog.
Well, other people have previously taken a crack at the same problem. If they have resolved it, then I should think that would be helpful - since then you can look at their solution. If not, their efforts to solve the problem might still be enlightening. So: I think my contribution in this area is probably helpful. 15 minutes was how long it took me to find the cited material in the first place. Not trivial - but not that hard. No need to beat me up for not knowing the background of your own largely unpublished theory! ...but yes, in my view, advanced decision theory is a bit of a red herring for those interested in machine intelligence. It's like: that is so not the problem. It seems like wondering whether to use butter-icing or marzipan on the top of the cake - when you don't yet have the recipe or the ingredients.
[1] Eliezer Yudkowsky, 14y
The cited material isn't much different from a lot of other material in the same field.
So far, "Disposition-Based Decision Theory" (and its apparently-flawed precursor) is the only thing I have seen that apparently claims to address and solve the same problem that is under discussion in this forum: I suppose there's also a raft of CDT enthusiasts, who explain why two-boxing is actually not a flaw in their system, and that they have no objections to the idea of agents who one-box. In their case, the debate appears to be over terminology: what does the word "rational" actually mean - is it about choosing the best action from the available options? Or does it mean something else? Are there other attempts at a solution? Your turn for some references, I feel.
[1] Eliezer Yudkowsky, 14y
"Paradoxes of Rationality and Cooperation" (the edited volume) will give you a feel for the basics, as will reading Marion Ledwig's thesis paper.
If "precommitment" just means cutting off some of your options in advance, precommitment seems to be desirable - under various circumstances where you want to signal commitment - and believe that faked commitment signals would be detected. You use the term as though it is in some way negative. It seems to me that I have not encountered the critics of precommitment saying what they mean by the term. Consequently, it is hard to see what problems they see with the idea.
[1] Eliezer Yudkowsky, 14y
This is the crippleware version of TDT that pure CDT agents self-modify to. It's crippleware because if you self-modify at 7:00pm you'll two-box against an Omega who saw your code at 6:59am.
By hypothesis, Omega on examining your code at 6:59, knows that you will self-modify at 7:00 and one-box thereafter. Consider that every TDT agent must be derived from a non-TDT agent. There is no difference in principle between "I used to adhere to CDT but self-modified to TDT" and "I didn't understand TDT when I was a child, but I follow it now as an adult". Correction made, thanks to Tim Tyler.
[1] Eliezer Yudkowsky, 14y
CDT agents don't care. They aren't causing Omega to fill box B by changing their source code at 7pm, so they have no reason to change their source code in a way that takes only one box. The source code change only causes Omega to fill box B if Omega looks at their source code after 7pm. That is how CDT agents (unwisely) compute "causes".
Yes, but the CDT agent at seven o'clock is not being asked to choose one or two boxes. It has to choose between rewriting its algorithm to plain TDT (or DBDT or some variant that will one-box), or to TDT with an exception clause "but use the old algorithm if you find out Omega's prediction was made before seven o'clock". Even by straight CDT, there is no motive for writing that exception.
[5] Eliezer Yudkowsky, 14y
This is the point at which I say "Wrong" and "Read the literature". I'm not sure how I can explain this any more clearly than I have already, barring a full-fledged sequence. At 7pm the CDT agent calculates that if it modifies its source to use the old algorithm in cases where Omega saw the code before 7pm, it will get an extra thousand dollars on Newcomb's Problem, since it will take box A which contains an additional thousand dollars, and since its decision to modify its code at 7pm has no effect on an Omega who saw the code before 7pm, hence no effect on whether box B is full. It does not reason "but Omega knows I will change my code". If it reasoned that way it would be TDT, not CDT, and would one-box to begin with.
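The 7pm calculation described in this comment can be sketched as follows. This is a minimal illustration, not a formalization of CDT: the function names and the probability parameter are hypothetical, and the second function encodes the other commenter's hypothesis that Omega's earlier prediction matches what the rewritten code will do.

```python
BOX_A = 1_000        # always present in box A
BOX_B = 1_000_000    # present in box B only if Omega predicted one-boxing


def payoff(one_box: bool, box_b_full: bool) -> int:
    """Money received for a given action and a given state of box B."""
    total = BOX_B if box_b_full else 0
    if not one_box:
        total += BOX_A
    return total


def cdt_estimate(one_box: bool, p_box_b_full: float) -> float:
    """CDT-style value at 7pm against an Omega that scanned before 7pm.

    Assumption: the rewrite cannot cause box B's contents, so the same
    probability p_box_b_full is used whichever action the new code takes.
    """
    return (p_box_b_full * payoff(one_box, True)
            + (1 - p_box_b_full) * payoff(one_box, False))


def payoff_if_omega_foresaw_rewrite(one_box: bool) -> int:
    """Outcome if Omega's earlier prediction matches the rewritten code."""
    return payoff(one_box, box_b_full=one_box)


# Holding box B fixed, keeping the old algorithm (two-boxing) always looks
# better by exactly the contents of box A.
for p in (0.0, 0.5, 1.0):
    gain = cdt_estimate(False, p) - cdt_estimate(True, p)
    print(f"p(box B full)={p}: exception clause looks better by {gain:.0f}")

# If Omega did foresee the rewrite, the realized payoffs come apart.
print("with exception clause:", payoff_if_omega_foresaw_rewrite(False))
print("without exception clause:", payoff_if_omega_foresaw_rewrite(True))
```

The first loop reproduces the "extra thousand dollars" reasoning; the last two lines show the outcome under the hypothesis that Omega's prediction tracks the rewritten code.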
Actually, I will add another comment, because I can now articulate where the ambiguity comes in: it lies in how you add self-modification to CDT (which doesn't have it in the usual form). I've been assuming the original algorithm doesn't try to micromanage the new algorithm's decisions (which strikes me as the sensible way, not least because it gives better results here); you've been assuming it does (which, I suppose you could argue, is more true to the spirit of the original CDT).
I still disagree, but I agree that we have hit the limits of discussion in this comment thread; fundamentally this needs to be analyzed in a more precise language than English. We can revisit it if either of us ever gets to actually programming anything like this.
By what hypothesis? That is not how the proposed Disposition-Based Decision Theory says it works. It claims to result in agents who have the disposition to one-box.
Sure. This sub thread was about plain CDT, and how it self-modifies into some form of DBDT/TDT once it figures out the benefits of doing so -- and given the hypothesis of an omniscient Omega, then Omega will know that this will occur.
In that case, what I think you meant to say was:
Doh! Thanks for the correction, editing comment.
I don't see any reason for thinking this fellow's work represents "crippleware". It seems to me that he agrees with you regarding actions, but differs about terminology. Here's the CDT explanation of the terminology: The basic idea of forming a disposition to one-box has been around for a while. Here's another one: * Realistic decision theory: rules for nonideal agents ... by Paul Weirich - 2004 ...and another one: "DISPOSITION-BASED DECISION THEORY"
In Eliezer's article on Newcomb's problem, he says, "Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars." Such evidence from previous players fails to appear in some problem descriptions, including Wikipedia's. For me this is a "no-brainer". Take box B, deposit it, and come back for more. That's what the physical evidence says. Any philosopher who says "Taking BOTH boxes is the rational action," occurs to me as an absolute fool in the face of the evidence. (But I've never understood non-mathematical philosophy anyway, so I may be a poor judge.) Clarifying (NOT rhetorical) questions: Have I just cheated, so that "it's not the Newcomb Problem anymore?" When you fellows say a certain decision theory "two-boxes", are those theory-calculations including the previous play evidence or not? Thanks for your time and attention.
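The arithmetic behind "that's what the physical evidence says" can be sketched as an ordinary probability update on the action. Everything specific here is an assumption for illustration: the accuracy estimate uses the rule of succession on the 100 observed cases, and the function name is hypothetical. The sketch shows only what conditioning on the evidence yields; whether a given decision theory conditions on the action in this way is exactly what is in dispute.

```python
from fractions import Fraction

BOX_A = 1_000
BOX_B = 1_000_000


def conditional_payoff(one_box: bool, accuracy: Fraction) -> Fraction:
    """Expected money, treating the action as evidence about box B."""
    if one_box:
        return accuracy * BOX_B
    return BOX_A + (1 - accuracy) * BOX_B


# Assumption: rule of succession after 100 correct predictions out of 100.
accuracy = Fraction(100 + 1, 100 + 2)
one_box_value = conditional_payoff(True, accuracy)
two_box_value = conditional_payoff(False, accuracy)
print(float(one_box_value), float(two_box_value))

# Accuracy at which the two conditional payoffs are equal.
break_even = Fraction(BOX_A + BOX_B, 2 * BOX_B)
print(break_even)
```

Under this style of calculation, one-boxing comes out ahead whenever the estimated accuracy exceeds the break-even value, which is only slightly above one half.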
There is no opportunity to come back for more. Assume that when you take box B before taking box A, box A is removed.
Yes, I read about " ... disappears in a puff of smoke." I wasn't coming back for a measly $1K, I was coming back for another million! I'll see if they'll let me play again. Omega already KNOWS I'm greedy, this won't come as a shock. He'll probably have told his team what to say when I try it. " ... and come back for more." was meant to be funny. Anyway, this still doesn't answer my questions about "Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars." Someone please answer my questions! Thanks!
The problem needs lots of little hypotheses about Omega. In general, you can create these hypotheses for yourself, using the principle of "Least Convenient Possible World". Or, from philosophy/argumentation theory, "Principle of Charity". In your case, I think you need to add at least two helper assumptions - Omega's prediction abilities are trustworthy, and Omega's offer will never be repeated - not for you, not for anyone.
What the physical evidence says is that the boxes are there, the money is there, and Omega is gone. So what does your choice effect and when?
Well, I mulled that over for a while, and I can't see any way that contributes to answering my questions. As to " ... what does your choice effect and when?", I suppose there are common causes starting before Omega loaded the boxes, that affect both Omega's choices and mine. For example, the machinery of my brain. No backwards-in-time is required.
Penalising a rational agent for its character flaws while it is under construction seems like a rather weak objection. Most systems have a construction phase during which they may behave imperfectly - so similar objections seem likely to apply to practically any system. However, this is surely no big deal: once a synthetic rational agent exists, we can copy its brain. After that, developmental mistakes would no longer be much of a factor. It does seem as though this makes CDT essentially correct - in a sense. The main issue would then become one of terminology - of what the word "rational" means. There would be no significant difference over how agents should behave, though. My reading of this issue is that the case goes against CDT. Its terminology is misleading. I don't think there's much of a case that it is wrong, though.
Eric Barnes - while appreciating the benefits of taking one box - has harsh words for the "taking one box is rational" folk.
[4] Eliezer Yudkowsky, 14y
(Sigh.) Yes, causal decision theorists have been saying harsh words against the winners on Newcomb's Problem since the dawn of causal decision theory. I am replying to them.
Note that this is the same guy who says: He's drawing a distinction between a "rational action" and the actions of a "rational agent".
Newcomb's Problem capriciously rewards irrational people in the same way that reality capriciously rewards people who irrationally believe their choices matter.