Timeless Decision Theory: Problems I Can't Solve



There does not appear to be any such thing as a dominant majority vote.

Eliezer, are you aware that there's an academic field studying issues like this? It's called Social Choice Theory, and happens to be covered in chapter 4 of Hervé Moulin's Fair Division and Collective Welfare, which I recommended in my post about Cooperative Game Theory.

I know you're probably approaching this problem from a different angle, but it should still be helpful to read what other researchers have written about it.

A separate comment I want to make is that if you want others to help you solve problems in "timeless decision theory", you really need to publish the results you've got already. What you're doing now is like if Einstein had asked people to help him predict the temperature of black holes before having published the general theory of relativity.

As far as needing a long sequence, are you assuming that the reader has no background in decision theory? What if you just write to an audience of professional decision theorists, or someone who has at least read "The Foundations of Causal Decision Theory" or the equivalent?

915y

Seconded. I, for one, would be perfectly OK with posts requiring a lot of unfamiliar background math as long as they're correct and give references. For example, Scott Aaronson isn't afraid of scary topics and I'm not afraid of using his posts as entry points into the maze.

1[anonymous]15y

For that matter, I'm sure someone else would be willing to write a sequence on decision theory to ensure everyone has the required background knowledge. This might work even better if Eliezer suggested some topics to be covered in the sequence so that the background was more specific.
In fact, I would happily do that and I'm sure others would too.

714y

I want to note that this is also known in Cooperative Game Theory as an "empty core". (In Social Choice Theory it's studied under "majority cycling".) See http://www.research.rmutp.ac.th/research/Game%20Theory%20and%20Economic%20Analysis.pdf#page=85 for a good explanation of how Cooperative Game Theory views the problem. Unfortunately it doesn't look like anyone has a really good solution.

Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.

But it is the writeup most-frequently requested of you, and also, I think, the thing you have done that you refer to the most often.

Nobody's going to *offer*. You have to ask them.

In case you're wondering, I'm writing this up because one of the SIAI Summer Project people asked if there was any Friendly AI problem that could be modularized and handed off and potentially written up afterward, and the answer to this is almost always "No"

Does it mean that the problem isn't reduced enough to reasonably modularize? It would be nice if you wrote up an outline of the state of research at SIAI (even a brief one with unexplained labels) or an explanation of why you won't.

Hanson's example of ten people dividing the pie seems to hinge on arbitrarily passive actors who get to accept and make propositions instead of being able to solicit other deals or make counter proposals, and it is also contingent on infinite and costless bargaining time. The bargaining time bit may be a fair (if unrealistic) assumption, but the passivity does not make sense. It really depends on the kind of commitments and bargains players are able to make and enforce, and the degree/order of proposals from outgroup and ingroup members.

When the first two ...

315y

What if we try a simpler model?
Let's go from ten agents to two, with the stipulation that nobody gets any pie until both agents agree on the split...

415y

This is the Nash bargaining game. Voting plays no role there, but it's a necessary ingredient in our game; this means we've simplified too much.

115y

But three people should do already. I'm fairly convinced that this game is unstable in the sense that it would not make sense for any of them to agree to get 1/3, as they can always guarantee themselves more by defecting with someone (even by offering them 1/6 - epsilon, which is REALLY hard to turn down). It seems that a given majority getting 1/2 each would be a more probable solution, but you would really need to formalize the rules before this can be proven. I'm a cryptologist so this is sadly not really my area...

715y

I almost posted on the three-person situation earlier, but what I wrote wasn't cogent enough. It does seem like it should work as an archetype for any N > 2.
The problem is how the game is iterated. Call the players A, B, and C. If A says, "B, let's go 50-50," and you assume C doesn't get to make a counter-offer and they vote immediately, 50-50-0 is clearly the outcome. This is probably also the case for the 10-person if there's no protracted bargaining.
If there is protracted bargaining, it turns into an infinite regression as long as there is an out-group, and possibly even without an outgroup. Take this series of proposals, each of which will be preferred to the one prior (format is Proposer:A gets-B gets-C gets):
A:50-50-0
C:0-55-45
A:50-0-50
B:55-45-0
C:0-55-45
A:50-0-50 ...
There's clearly no stable equilibrium. It seems (though I'm not sure how to prove this) that an equal split is the appropriate efficient outcome. Any action by any individual will create an outgroup that will spin them into an infinite imbalance. Moreover, if we are to arbitrarily stop somewhere along that infinite chain, the expected value for each player is going to be 100/3 (they get part of a two-way split twice which should average to 50 each time overall, and they get zero once per three exchanges). Thus, at 33-33-33, one can't profitably defect. At 40-40-20, C could defect and have a positive expected outcome.
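The cycling claim can be checked mechanically. The following is a minimal sketch, assuming a proposal replaces the standing one whenever a strict majority of the three players get strictly more under it; the helper name `majority_prefers` is made up for illustration and the voting rule is an assumption, since the thread never formalizes one.

```python
# Proposals from the comment above, as (A, B, C) shares of a 100-unit pie.
proposals = [
    (50, 50, 0),
    (0, 55, 45),
    (50, 0, 50),
    (55, 45, 0),
    (0, 55, 45),
    (50, 0, 50),
]

def majority_prefers(new, old):
    """True if a strict majority of players get strictly more under `new`.

    Assumed voting rule (hypothetical): a player votes for `new` only if
    their own share strictly increases.
    """
    better = sum(1 for n, o in zip(new, old) if n > o)
    return better > len(new) / 2

# Check that each proposal in the chain beats the one before it.
steps_ok = [majority_prefers(b, a) for a, b in zip(proposals, proposals[1:])]
print(steps_ok)  # [True, True, True, True, True]
```

Every step passes, and the last three proposals repeat earlier ones, so under this rule the chain never has to stop.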
If the players have no bargaining costs whatsoever, and always have the opportunity to bargain before a deal is voted on, and have an infinite amount of time and do not care how long it takes to reach agreement (or if agreement is reached), then it does seem like you get an infinite loop, because there's always going to be an outgroup that can outbid one of the ingroup. This same principle should also apply to the 10-person model; with infinite free time and infinite free bargaining, no equilibrium can be reached. If there is some cost to defecting, or a limitation o

215y

The example of the Rubinstein bargaining model suggests that you could make players alternate offers and introduce exponential temporal discounting. An equal split isn't logically necessary in this case: a player's payoff will likely depend on their personal rate of utility discounting, also known as "impatience", and others' perceptions of it. The search keyword is "n-person bargaining"; there seems to be a lot of literature that I'm too lazy and stupid to quickly summarize.
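For the two-player case only, the dependence on impatience has a standard closed form: in Rubinstein's alternating-offers model, the first proposer's equilibrium share is (1 - d2) / (1 - d1*d2), where d1 and d2 are the players' per-round discount factors. A small sketch of that formula follows; the function name is hypothetical, and nothing here covers the n-person voting game discussed in this thread.

```python
from fractions import Fraction

def rubinstein_shares(d1, d2):
    """Equilibrium split of a unit pie in two-player alternating offers.

    d1, d2: per-round discount factors of the first and second proposer,
    assumed to lie in [0, 1) so the denominator is nonzero.
    Returns (share of first proposer, share of second proposer).
    """
    first = (1 - d2) / (1 - d1 * d2)
    return first, 1 - first

# Equally patient players: the first mover gets slightly more than half.
print(rubinstein_shares(Fraction(9, 10), Fraction(9, 10)))

# A more patient responder (d2 closer to 1) captures most of the pie.
print(rubinstein_shares(Fraction(9, 10), Fraction(99, 100)))
```

With equal discount factors d the split is 1/(1+d) versus d/(1+d), which approaches an equal split as d approaches 1.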

Here's a comment that took me way too long to formulate:

On the Prisoner's Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion...

Eliezer, if such reasoning from symmetry is allowed, then we sure don't need your "TDT" to solve the PD!

714y

TDT allows you to use whatever you can prove mathematically. If you can prove that two computations have the same output because their global structures are isomorphic, it doesn't matter if the internal structure is twisty or involves regresses you haven't yet resolved. However, you need a license to use that sort of mathematical reasoning in the first place, which is provided by TDT but not CDT.

314y

Strategies are probability (density) functions over choices. Behaviors are the choices themselves. Proving that two strategies are identical (by symmetry, say) doesn't license you to assume that the behaviors are the same. And it is behaviors you seem to need here. Two random variables with the same PDF are not thereby equal.
Selten got a Nobel for re-introducing time into game theory (with the concept of subgame perfect equilibrium as a refinement of Nash equilibrium). I think he deserved the prize. If you think that you can overturn Selten's work with your TDT, then I say "To hell with a PhD. Write it up and go straight to Stockholm."

213y

After looking at this: http://lesswrong.com/lw/vp/worse_than_random/
...I figure Yudkowsky will not be able to swallow this first sentence - without indigestion.

313y

In this case, I can only conclude that you haven't read thoroughly enough.
I think EY's restriction to "cryptographic adversaries" is needlessly specific; any adversary (or other player) will do.
Of course, this is still not really relevant to the original point, as, well, when is there reason to play a mixed strategy in Prisoner's Dilemma?

213y

Even if your strategy is (1,0) or (0,1) on (C,D), isn't that a probability distribution? It might not be valuable to express it that way for this instance, but you do get the benefits that if you ever do want a random strategy you just change your numbers around instead of having to develop a framework to deal with it.

013y

The rule in question is concerned with improving on randomness. It may be tricky to improve on randomness by very much if, say, you face a highly intelligent opponent playing the matching pennies game. However, it is usually fairly simple to equal it, even when facing a smarter, cryptography-savvy opponent: just use a secure RNG with a reasonably secure seed.

014y

...unless the resulting strategies are unmixed, as will usually be the case with Prisoner's Dilemma?

Is Parfit's Hitchhiker essentially the same as Kavka's toxin, or is there some substantial difference between the two I'm missing?

08y

I think the question is very similar, but there is a slight difference in focus. Kavka's toxin focuses on whether a person can intend something if they also intend to change their mind. Parfit's Hitchhiker focuses on another person's prediction.

113y

Come on, why would anyone downvote this?

Is your majority vote problem related to Condorcet's paradox? It smells so, but I can't put a handle on why.

I cheated the PD infinite regress problem with a quine trick in Re-formalizing PD. The asymmetric case seems to be hard because fair division of utility is hard, not because quining is hard. Given a division procedure that everyone accepts as fair, the quine trick seems to solve the asymmetric case just as well.
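A toy version of the source-comparison idea can be sketched as follows. This is an illustration under stated assumptions, not necessarily the construction in the linked post: instead of a true quine, each program is handed its own source text as an argument, and "same program" means exact string equality. The names `MIRROR`, `DEFECTOR` and `play` are made up for the example.

```python
# Each program is the source text of a function
# (my_source, their_source) -> "C" or "D".
MIRROR = 'lambda mine, theirs: "C" if theirs == mine else "D"'
DEFECTOR = 'lambda mine, theirs: "D"'

def play(src_a, src_b):
    """Run two programs against each other, showing each the other's source."""
    prog_a = eval(src_a)  # fine for a toy; never eval untrusted text
    prog_b = eval(src_b)
    return prog_a(src_a, src_b), prog_b(src_b, src_a)

print(play(MIRROR, MIRROR))    # ('C', 'C')
print(play(MIRROR, DEFECTOR))  # ('D', 'D')
```

The comparison is purely syntactic, so two programs that behave identically but differ by a single character would defect against each other; that brittleness is one reason the asymmetric case needs more than this trick.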

Post your "timeless decision theory" already. If it's correct, it shouldn't be *that* complex. With your intelligence you can always ...

"I believe X to be like me" => "whatever I decide, X will decide also" seems tenuous without some proof of likeness that is beyond any guarantee possible in humans.

I can accept your analysis in the context of actors who have irrevocably committed to some mechanically predictable decision rule, which, along with perfect information on all the causal inputs to the rule, gives me perfect predictions of their behavior, but I'm not sure such an actor could ever trust its understanding of an actual human.

Maybe you could aspire to such determinism in a proven-correct software system running on proven-robust hardware.

515y

Well, yeah, this is primarily a theory for AIs dealing with other AIs.
You could possibly talk about human applications if you knew that the N of you had the same training as rationalists, or if you assigned probabilities to the others having such training.

010y

For X to be able to model the decisions of Y with 100% accuracy, wouldn't X require a more sophisticated model?
If so, why would supposedly symmetrical models retain this symmetry?

510y

Nope. http://arxiv.org/abs/1401.5577

215y

Let's play a little game; you and an opponent, 10 rounds of the prisoner's dilemma. It will cost you each $5 to play, with the following payouts on each round:
* (C,C) = $0.75 each
* (C,D) = $1.00 for D, $0 for C
* (D,D) = $0.25 each
Conventional game theory says both people walk away with $2.50 and a grudge against each other, and I, running the game, pocket the difference.
Your opponent is Eliezer Yudkowsky.
How much money do you expect to have after the final round?
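The arithmetic behind the two extreme outcomes can be spelled out in a short sketch. Amounts are in cents to avoid floating-point rounding, and the function name `net_cents` is an assumption made for this example.

```python
ENTRY_FEE = 500  # cents each player pays to join
ROUNDS = 10

# Per-round payoffs in cents: (my payoff, opponent's payoff).
PAYOFF = {
    ("C", "C"): (75, 75),
    ("C", "D"): (0, 100),
    ("D", "C"): (100, 0),
    ("D", "D"): (25, 25),
}

def net_cents(my_moves, their_moves):
    """Net winnings of both players after paying the entry fee."""
    mine = theirs = 0
    for m, t in zip(my_moves, their_moves):
        pay_me, pay_them = PAYOFF[(m, t)]
        mine += pay_me
        theirs += pay_them
    return mine - ENTRY_FEE, theirs - ENTRY_FEE

print(net_cents("D" * ROUNDS, "D" * ROUNDS))  # (-250, -250)
print(net_cents("C" * ROUNDS, "C" * ROUNDS))  # (250, 250)
```

Mutual defection leaves each player down $2.50 and the organizer up $5.00; ten rounds of mutual cooperation reverse both signs.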

415y

But that's not the true PD.

415y

The statistical predictability of human behavior in less extreme circumstances is a much weaker constraint. I thought the (very gentle) PD presented sufficed to make the point that prediction is not impossible even in a real-world scenario.
I don't know that I have confidence in even you to cooperate on the True PD--sorry. A hypothetical transhuman Bayesian intelligence with your value system? Quite possibly.

115y

Well, obviously. But the more interesting question is what if you suspect, but are not certain, that your opponent is Eliezer Yudkowsky? Assuming identity makes the problem too easy.
My position is that I'd expect a reasonable chance that an arbitrary, frequent LW participant playing this game against you would also end with 10 (C,C)s. I'd suggest actually running this as an experiment if I didn't think I'd lose money on the deal...

315y

Harsher dilemmas (more meaningful stake, loss from an unreciprocated cooperation that may not be recoverable in the remaining iterations) would make me increasingly hesitant to assume "this person is probably like me".
This makes me feel like I'm in "no true Scotsman" territory; nobody "like me" would fail to optimistically attempt cooperation. But if caring more about the difference in outcomes makes me less optimistic about other-similarity, then in a hypothetical where I am matched up against essentially myself (but I don't know this), I defeat myself exactly when it matters - when the payoff is the highest.

and this is exactly the problem: If your behavior on the prisoner's dilemma changes with the size of the outcome, then *you aren't really playing the prisoner's dilemma*. Your calculation in the low-payoff case was being confused by other terms in your utility function, terms for *being someone who cooperates* -- terms that didn't scale.

015y

Yes, my point was that my variable skepticism is surely evidence of bias or rationalization, and that we can't learn much from "mild" PD. I do also agree that warm fuzzies from being a cooperator don't scale.

215y

If we wanted to be clever we could include Eliezer playing against himself (just report back to him the same value) as a possibility, though if it's a high probability that he faces himself it seems pointless.
I'd be happy to front the (likely loss of) $10.
It might be possible to make it more like the true prisoner's dilemma if we could come up with two players each of whom want the money donated to a cause that they consider worthy but the other player opposes or considers ineffective.
Though I have plenty of paperclips, sadly I lack the resources to successfully simulate Eliezer's true PD . . .

415y

Meaningful results would probably require several iterations of the game, though, with different players (also, the expected loss in my scenario was $5 per game).
I seem to recall Douglas Hofstadter did an experiment with several of his more rational friends, and was distressed by the globally rather suboptimal outcome. I do wonder if we on LW would do better, with or without Eliezer?

As a first off-the-cuff thought, the infinite regress of conditionality sounds suspiciously close to general recursion. Do you have any guarantee that a fully general theory that gives a decision wouldn't be equivalent to a Halting Oracle?

ETA: If you *don't* have such a guarantee, I would submit that the first priority should be either securing one, or proving isomorphism to the Entscheidungsproblem and, thus, the impossibility of the fully general solution.

115y

Hah! Same thought!
What's the moral action when the moral problem seems to diverge, and you don't have the compute resources to follow it any further? Flip a coin?

015y

I would suggest that the best move would be to attempt to coerce the situation into one where the infinite regress is subject to analysis without Halting issues, in a way that is predicted to be least likely to have negative impacts.
Remember, Halting is only undecidable in the general case, and it is often quite tractable to decide on some subset of computations.

015y

Unless you're saying "don't answer the question, use the answer from a different but closely related one", then a moral problem is either going to be known transformable into a decidable halting problem, or not. And if not, my above question remains unanswered.

015y

I meant something more like "don't make a decision, change the context such that there is a different question that must be answered". In practice this would probably mean colluding to enforce some sort of amoral constraints on all parties.
I grant that at some point you may get irretrievably stuck. And no, I don't have an answer, sorry. Choosing randomly is likely to be better than inaction, though.

015y

Obviously any game theory is equivalent to the halting problem if your opponents can be controlled by arbitrary Turing machines. But this sort of infinite regress doesn't come from a big complex starting point, it comes from a simple starting point that keeps passing the recursive buck.

315y

I understand that much, but if there's anything I've learned from computer science it's that Turing completeness can pop up in the strangest places.
I of course admit it was an off-the-cuff, intuitive thought, but the structure of the problem reminds me vaguely of combinatory logic, particularly Smullyan's Mockingbird forest.

-115y

This was a clever ploy to distract me with logic problems, wasn't it?

415y

No, but mentioning the rest of Smullyan's books might be.

If I were forced to pay $100 upon losing, I'd have a net gain of $4950 each time I play the game, on average. Transitioning from this into the game as it currently stands, I've merely been given an additional option. As a rationalist, I should not regret being one. Even knowing I won't get the $10,000, as the coin came up heads, I'm basically paying $100 for the other quantum me to receive $10,000. As the other quantum me, who saw the coin come up tails, my desire to have had the first quantum me pay $100 outweighs the other quantum me's desire to not lose...

814y

Well, for starters you have a 1/3^^^3 chance of 3^^^3 utils...

I swear I'll give you a PhD if you write the thesis. On fancy paper and everything.

Would timeless decision theory handle negotiation with your future self? For example, if a timeless decision agent likes paperclips today but knows it is going to be modified to like apples tomorrow (and not care a bit about paperclips), will it abstain from destroying the apple orchard, and its future self abstain from destroying the paperclips in exchange?

And is negotiation the right way to think about reconciling the difference between what I now want and what a predicted smarter, grown-up, more knowledgeable version of me would want? Or am I going the wrong way?

315y

To talk about turning a paperclip maximizer into an apple maximizer is needlessly confusing. Better to talk about destroying a paperclip maximizer and creating an apple maximizer. And yes, timeless decision theory should allow these two agents to negotiate, though it gets confusing fast.

215y

In what sense is that a future self?

115y

In the paperclip->apple scenario, in the sense that it retains the memory and inherits the assets of the original, and everything else that keeps you 'you' when you start wanting something different.
In the simulation scenario, I'm not sure.

But I don't have a general theory which replies "Yes" [to a counterfactual mugging].

You don't? I was sure you'd handled this case with Timeless Decision Theory.

I will try to write up a sketch of my idea, which involves using a Markov State Machine to represent world states that transition into one another. Then you distinguish evidence about the structure of the MSM, from evidence of your historical path through the MSM. And the best decision to make in a world state is defined as the decision which is part of a policy that maximizes expected ...

Here's a crack at the coin problem.

Firstly, TDT seems to answer correctly under one condition: if P(some agent will use my choice as evidence about how I am going to act in these situations and make this offer) = 0, then certainly our AI shouldn't give Omega any money. On the other hand, if P(some agent will use my choice as evidence about how I am going to act in these situations and make this offer) = 0.5, then the expected utility = -100 + 0.5 (0.5 (1,000,000) + 0.5 (-100)). So my general solution is this: add a node that represents the probability of...

**I THINK I SOLVED ONE** - *EDIT* - Sorry, not quite.

..."Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says: "I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?" Obviously, the only reflectively consiste

112y

Just prefix the quote with a single greater-than sign.

012y

I did, but I don't know how to stop quoting. I can start but I don't know how to stop.
Also, one of the times I tried to quote it I ended up with an ugly horizontal scroll bar in the middle of the text.

112y

A blank newline between the quote and non-quote will stop the quote.
>quoted text
more quoted text
non-quoted text

...Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?"

Obviously, the only reflectively consistent answer in this case is "Yes - here's the $100

013y

What if it's offered just once - but if the coin comes up tails, Omega simulates a universe in which it came up heads, asks you this question, then acts based on your response? (Do whatever you like to ignore anthropics... say, Omega always simulates the opposite result, at the appropriate time.)

013y

To be clear:
* Are both I and my simulation told this is a one-time offer?
* Is a simulation generated whether the real coin is heads or tails?
* Are both my simulation and I told that one of us is a simulation?
* Does the simulation persist after the choice is made?
I suppose the second and fourth points don't matter particularly... as long as the first and third are true, then I consider it plus EV to pay the $1000.

013y

Should you pay the money even if you're not told about the simulations, because Omega is a good predictor (perhaps because it's using simulations)?

013y

If I judge the probability that I am a simulation or equivalent construct to be greater than 1/499500, yes.
(EDIT: Er, make that 1/999000, actually. What's the markup code for strikethrough 'round these parts?)
(EDIT 2: Okay, I'm posting too quickly. It should be just 10^-6, straight up. If I'm a figment then the $1000 isn't real disutility.)
(EDIT 3: ARGH. Sorry. 24 hours without sleep here. I might not be the sim, duh. Correct calculations:
u(pay|sim) = 10^6; u(~pay|sim) = 0; u(pay|~sim) = -1000; u(~pay|~sim) = 0
u(~pay) = 0; u(pay) = P(sim) 10^6 - P(~sim) (1000) = 1001000 * P(sim) - 1000
pay if P(sim) > 1/1001.
Double-checking... triple-checking... okay, I think that's got it. No... no... NOW that's got it.)
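The final calculation can be verified with exact arithmetic. This is a minimal sketch using the four utilities listed in EDIT 3; the function name `eu_pay` is hypothetical, and refusing is worth 0 in both cases as stated there.

```python
from fractions import Fraction

def eu_pay(p_sim):
    """Expected utility of paying, given probability p_sim of being the simulation.

    u(pay|sim) = 10^6 and u(pay|~sim) = -1000, as in the comment above.
    """
    return p_sim * 1_000_000 - (1 - p_sim) * 1000

# Paying beats refusing (utility 0) exactly when eu_pay(p) > 0,
# i.e. when p > 1000 / 1001000.
threshold = Fraction(1000, 1_001_000)
print(threshold)          # 1/1001
print(eu_pay(threshold))  # 0
```

At the threshold the two options tie, and any larger probability of being the simulation makes paying strictly better.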

...Another stumper was presented to me by Robin Hanson at an OBLW meetup. Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote. Let's say that six of them form a coalition and decide to vote to divide the pie among themselves, one-sixth each. But then two of them think, "Hey, this leaves four agents out in the cold. We'll get together with those four agents and offer them to divide half the pie among the four of them, leaving one quarter apiece for the two of us. We get a larger share than one-sixth that

"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?"

Err... pardon my noobishness but I am failing to see the game here. This is mostly me working it out audibly.

A less Omega version of this game involves flipping a coin, getting $100 on tails, losing $1 on heads. Using humans, it makes sense ...

115y

See counterfactual mugging for an extended discussion in comments.

015y

Thanks.

If the ten pie-sharers problem is to be more than a theoretical puzzle, something with applicability to real decision problems, then certain expansions of the problem suggest themselves. For example, some of the players might conspire to forcibly exclude the others entirely. And then a subset of the conspirators do the same.

This is the plot of "For a Few Dollars More".

How do criminals arrange these matters in real life?

015y

Dagnabbit, another movie I have to see now!
(i.e. thanks for the ref!)

115y

The Dark Knight has an even better example - in the bank robbery scene, each subgroup excludes only one more member, until the only man left is... That's enough of a spoiler I guess.

015y

Yeah ... guess which scene I came in during the middle of? :P

Is this equivalent to the modified Newcomb's problem?

Omega looks at my code and produces a perfect copy of me which it puts in a separate room. One of us (decided by the toss of a coin if you like) is told, "if you put $1000 in the box, I will give $1000000 to your clone."

Once Omega tells us this, we know that putting $1000 in the box won't get us anything, but if we are the sort of person who puts $1000 in the box then we would have gotten $1000000 if we were the other clone.

What happens now if Omega is able to change my utility function? Mayb...

013y

There is at least a slight difference in that in the stated version it is at least questionable whether any version of you is actually getting anything useful out of giving Omega money.

015y

Indeed. It would seem sufficient to push a bit further and take in the desirability of upholding verbal contracts. Unless of course, the driver is so harsh as to drive away for a mere second of considering non-payment.

Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers "Yes" to this sort of question - just like with Newcomb's Problem or Parfit's Hitchhiker.

But I don't have a general theory which replies "Yes".

If you think being a rational agent includes an infinite ability to modify oneself, then the game has no solution because such an agent would b...

515y

An agent can guarantee the persistence of a trait by self-modifying into code that provably can never lead to the modification of that trait. A trivial example is that the agent can self-modify into code that preserves a trait and can't self-modify.

-2[anonymous]15y

But more precisely, an agent can guarantee the persistence of a trait only "by self-modifying into code that provably can never lead to the modification of that trait." Anything tied to rationality that guarantees the existence of a conforming modification at the time of offer must guarantee the continued existence of the same capacity after the modification, making the proposed self-modification self-contradictory.

"I can predict that if (the other agent predicts) I choose strategy X, then the other agent will implement strategy Y, and my expected payoff is Z"

...are we allowed to use self-reference?

X = "if the other agent is trustworth...

In an undergraduate seminar on game theory I attended, it was mentioned in an answer to a question posed to the presenter that, when computing a payoff matrix, the headings in the rows and columns *aren't* individual actions, but are rather *entire strategies*; in other words it's as if you pretty much decide what you do *in all circumstances* at the beginning of the game. This is because when evaluating strategies *nobody cares when you decide*, so might as well act as if you had them all planned out in advance. So in that spirit, I'm going to use the following p...

012y

Further elaboration on the cake problem's discrete case:
Suppose there are two slices of cake, and three people who can choose how these will be distributed, by majority vote. Nobody votes so that they alone get both slices, since they can't get a majority that way. So everybody just votes to get one slice for themselves, and randomly decides who gets the other slice. There can be ties, but you're getting an expected 2/3 of a slice whenever a vote is finally not a tie.
To get the continuous case:
It's tricky, but find a way to extend the previous reasoning to n slices and m players, and then take the limit as n goes to infinity. The voting sessions do get longer and longer before consensus is reached, but even when consensus is forever away, you should be able to calculate your expectation of each outcome...

For the "coin came up tails, give me 1000 please" case, does it reduce to this?

"I can predict that if (the other agent predicts) I choose strategy X: for any gamble I'd want to enter if the consequences were not already determined, I will pay when I lose, then the other agent will implement strategy Y: letting me play, and my expected payoff is Z: 999,000",

"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I ...

Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote... Every majority coalition and division of the pie, is dominated by another majority coalition in which each agent of the new majority gets more pie. There does not appear to be any such thing as a dominant majority vote.

I suggest offering the following deal at the outset:

"I offer each of you the opportunity to lobby for an open spot in a coalition with me, to split the pie equally six ways, formed with a mutual promise that we will not defect, an...

...Here's yet another problem whose proper formulation I'm still not sure of, and it runs as follows. First, consider the Prisoner's Dilemma. Informally, two timeless decision agents with common knowledge of the other's timeless decision agency, but no way to communicate or make binding commitments, will both Cooperate because they know that the other agent is in a similar epistemic state, running a similar decision algorithm, and will end up doing the same thing that they themselves do. In general, on the True Prisoner's Dilemma, facing an opponent who c

I'm curious how Timeless Decision Theory compares to this proposal by Arntzenius: http://uspfiloanalitica.googlegroups.com/web/No+regrets+%28Arntzenius%29.pdf?gda=0NZxMVIAAABcaixQLRmTdJ3- x5P8Pt_4Hkp7WOGi_UK-R218IYNjsD-841aBU4P0EA-DnPgAJsNWGgOFCWv8fj8kNZ7_xJRIVeLt2muIgCMmECKmxvZ2j4IeqPHHCwbz-gobneSjMyE

013y

It's different because the problems it talks about aren't determined by what decision is made in the end, but by the state of mind of the person making the decision (in a particular and perhaps quite limited way).
You could probably show that a mixed-strategy-aware problem could make the proposed theory fail in a similar way to how causal decision theory fails (i.e. is reflectively inconsistent) on Newcomb's problem. But it might be easy to extend TDT in the same way to resolve that.

Agents A & B are two TDT agents playing some prisoner's dilemma scenario. A can reason:

u(c(A)) = P(c(B))u(C,C) + P(d(B))u(C,D)

u(d(A)) = P(c(B))u(D,C) + P(d(B))u(D,D)

( u(X) is utility of X, P() is probability, c() & d() are cooperate & defect predicates )
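As a rough illustration of the two expressions above, here is a minimal Python sketch. The payoff numbers are assumptions chosen only to satisfy the usual prisoner's dilemma ordering; they do not come from the comment, and `expected_utility` is a hypothetical helper name.

```python
# Hypothetical payoffs to A, indexed by (A's move, B's move).
# Assumed values with the usual PD ordering: u(D,C) > u(C,C) > u(D,D) > u(C,D).
U = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_utility(my_move, p_b_cooperates):
    """u(my_move) = P(c(B)) * u(my_move, C) + P(d(B)) * u(my_move, D)."""
    return (p_b_cooperates * U[(my_move, "C")]
            + (1 - p_b_cooperates) * U[(my_move, "D")])

# Compare A's two options at a few fixed values of P(c(B)).
for p in (0.0, 0.5, 1.0):
    print(p, expected_utility("C", p), expected_utility("D", p))
```

With P(c(B)) held fixed regardless of A's own move, defection comes out ahead at every probability under these assumed payoffs; the interesting cases are the ones where A's choice changes its estimate of P(c(B)).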

A will always pick the option with higher utility, so it reasons B will do the same:

p(c(B) u'(c(B)) > u'(d(B)) --> c(B)

(u'() is A's estimate of B's utility function)

But A can't perfectly predict B (even though it may be quite good at it), so A can represent this uncertainty as a random ...

013y

A simpler way to say all this is "Pick a depth where you will stop recursing (due to growing uncertainty or computational limits) and at that depth assume your opponent acts randomly." Is my first attempt needlessly verbose?
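The stopping rule in that one-sentence version can be sketched as follows. This is only a sketch under assumptions: the payoff table is made up, `coop_probability` is a hypothetical name, and each level simply best-responds to a model of the opponent one level shallower.

```python
# Hypothetical payoffs to the mover, indexed by (own move, opponent's move).
U = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def coop_probability(depth):
    """Probability that an agent modelled to `depth` levels cooperates."""
    if depth == 0:
        return 0.5  # recursion floor: assume the agent acts randomly
    p = coop_probability(depth - 1)  # opponent is modelled one level shallower
    eu_c = p * U[("C", "C")] + (1 - p) * U[("C", "D")]
    eu_d = p * U[("D", "C")] + (1 - p) * U[("D", "D")]
    return 1.0 if eu_c > eu_d else 0.0

print([coop_probability(d) for d in range(4)])  # [0.5, 0.0, 0.0, 0.0]
```

Under these assumed payoffs the sketch defects at every depth above zero, because it treats the opponent's move as independent of its own. It illustrates the depth cutoff only, not the longer scheme in the parent comment.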

I think I have a general theory that gives the "correct" answer to Omega problem here and Newcomb's problem.

The theory depends on the assumption that Omega makes his prediction by evaluating the decision of an accurate simulation of you (or does something computationally equivalent, which should be the same). In this case there are two of you, real-you and simulation-you. Since you are identical to your simulation the two of you can reasonably be assumed to share an identity and thus have common goals (presumably that real-you gets the money be...

013y

Some interesting things to think about:
Why is it a 50/50 chance? Why not a 1% chance you're in the simulation, or 99%?
This approach requires that Omega work a certain way. What if Omega didn't work that way?
Why would simulation you care what happens to real you? Why not just care about your own future?
A way of resolving problems involving Omega would be most useful if it wasn't just a special-purpose tool, but was instead the application of logic to ordinary decision theory. What could you do to ordinary decision theory that would give you a satisfactory answer in the limit of the other person becoming a perfect predictor?

013y

1) I agree that this depends on Omega operating a certain way, but as I argue here: http://lesswrong.com/lw/nc/newcombs_problem_and_regret_of_rationality/35zm?c=1, your answer should depend on how Omega operates.
2) 50/50 assuming that there is one real you and one simulation.
3) The simulation should care because your goal as a self-interested being to "do what is best for me" should really be "do what is best for people with my identity" (because how else do you define "me"). Your simulation has the same identity (except for maybe the 10s of experience it has had since the simulation started, which doesn't feel like it should matter), so it should therefore care also about your outcome.
4) This is a modification of classical decision theory, modifying your objective to also care about the well-being of your simulations/ people that you are simulations of.

013y

1) Interesting. But you didn't really argue why your answer should depend on Omega, you just illustrated what you would do for different Omegas. By the statement of the problem, the outcome doesn't depend on how Omega works, so changing your answers for different Omegas is exactly the same as having different answers for the same outcome. This contradicts ordinary decision theory.
I suspect that you are simply not doing Newcomb's problem in the other comments - you are trying to evaluate the chances of Omega being wrong, when you are directly given that Omega is always right. Even in any simulation of Omega's, simulated Omega already knows whether simulated you will 1-box or 2-box, or else it's no longer Newcomb's problem.
2) Why is this assumption better than some other assumption? And if it isn't, why should we assume something at all?
3) Well, you could define "me" to involve things like continuity or causal interactions to resolve the fact that cloning yourself doesn't give you subjective immortality.
4) But it's not broadly applicable, it's just a special purpose tool that depends not only on how Omega works, but also on how many of you it simulates under what conditions, something that you never know. It also has a sharp discontinuity as Omega becomes an imperfect predictor, rather than smoothly approaching a limit, or else it quickly becomes unusable as subjective questions about the moral value of things almost like you and maybe not conscious crop up.

013y

1) I argue that my answer should depend on Omega because I illustrate different ways that Omega could operate that would reasonably change what the correct decision was. The problem statement can't specify that my answer shouldn't depend on how Omega works. I guess it can specify that I don't know anything about Omega other than that he's right and I would have to make assumptions based on what seemed like the most plausible way for that to happen. Also many statements of the problem don't specify that Omega is always right. Besides, even if that were the case, how could I possibly know for sure that Omega was always right (or that I was actually talking to Omega)?
2) Maybe it's not the best assumption. But if Omega is running simulations you should have some expectation of the probability of you being a simulation, and making it proportional to the number of simulations vs. real you's seems reasonable.
3) Well, when simulated-you is initiated it is psychologically continuous with real-you. After the not much time that the simulation probably takes, it hasn't diverged that much.
4) Why should it have a sharp discontinuity when Omega becomes an imperfect predictor? This would only happen if you had a sharp discontinuity about how much you cared about copies of yourself as they became less than perfect. Why should one discontinuity be bad but not the other?
OK. Perhaps this theory isn't terribly general but it is a reasonably coherent theory that extends to some class of other examples and produces the "correct" answer in these ones.

113y

I don't think this is going to work.
Consider a variation of 'counterfactual mugging' where Omega asks for $101 if heads but only gives back $100 if (tails and) it predicts the player would have paid up. Suppose that, for whatever reason, Omega runs 2 simulations of you rather than just 1. Then by the logic above, you should believe you are a simulation with probability 2/3, and so because 2x100 > 1x101, you should pay up. This is 'intuitively wrong' because if you choose to play this game many times then Omega will just take more and more of your money.
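The two ways of counting in this variation can be put side by side in a short Python sketch. The variable names are hypothetical; the numbers ($101, $100, two simulations, a fair coin) are the ones supposed above.

```python
from fractions import Fraction

ASK = 101      # amount Omega asks for on heads
PRIZE = 100    # amount Omega pays on tails if it predicts you would pay
N_SIMS = 2     # simulations Omega runs, as supposed above

# Simulation-counting argument: of the 1 + N_SIMS copies that see the request,
# N_SIMS are simulations whose payment earns real-you the prize.
p_sim = Fraction(N_SIMS, N_SIMS + 1)
naive_value_of_paying = p_sim * PRIZE - (1 - p_sim) * ASK

# Policy-level accounting over the fair coin: lose ASK on heads, gain PRIZE on tails.
per_game_value_of_paying = Fraction(1, 2) * (-ASK) + Fraction(1, 2) * PRIZE

print(naive_value_of_paying)      # 33
print(per_game_value_of_paying)   # -1/2
```

The simulation-weighted figure is positive while the per-game figure is negative, which is the sense in which repeated play just drains the payer's money.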
In counterfactual mugging (the original version, that is), to be 'reflectively consistent', you need to pay up regardless of whether Omega's simulation is in some obscure sense 'subjectively continuous' with you.
Let me characterize your approach to this problem as follows:
1. You classify each possible state of the game as either being 'you' or 'not you'.
2. You decide what to do on the basis of the assumption that you are a sample of one drawn from a uniform distribution over the set of states classified as being 'you'.
Note that this approach gets the answer wrong for the absent-minded driver problem unless you can somehow force yourself to believe that the copy of you at the second intersection is really a mindless robot whose probability of turning is (for whatever reason) guaranteed to be the same as yours.

013y

Though perhaps you are right and I need to be similarly careful to avoid counting some outcomes more often than others, which might be a problem, for example, if Omega ran different numbers of simulations depending on the coin flip.

013y

So if Omega simulates several copies of me, it can't be that each of them, by itself, has the power to make Omega decide to give real-me the money. So I have to give Omega money twice in simulation to get real-me the money once.
As for the absent-minded driver problem, the problem here is that the probabilistic approach overcounts probability in some situations but not in others. It's like playing the following game:
Phase 1: I flip a coin.
Phase 1.5 (only if it was heads): You get to guess its value and then are given an amnesia pill.
Phase 2: You get to guess whether the coin came up heads, and if you are right you get $1.
Using the bad analysis from the absent-minded driver problem: your strategy is to always guess that it is heads with probability p. Suppose that there is a probability alpha that you are in phase 1.5 when you guess, and 1-alpha that you are in phase 2.
Well your expected payoff is then (alpha)(p) + (1-alpha)*(1/2)
This is clearly silly. This is because if the coin came up heads they counted your strategy twice. I guess to fix this in general, you need to pick a time at which you make your averaging over possible yous. (For example, only count it at phase 2).
For the absent-minded driver problem, you could either choose at the first intersection you come to, in which case you have
1*(p^2+4p(1-p)) + 0(p+4(1-p))
Or the last intersection you come to in which case alpha = 1-p and you have
(1-p)*0 + p(p + 4(1-p))
(the 0 because if X is your last intersection you get 0) Both give the correct answer.
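A quick check of the two expressions, as a Python sketch. It assumes p is the probability of continuing at an intersection and reads the payoffs (0, 4, 1) off the formulas above; the function names are hypothetical.

```python
from fractions import Fraction

def at_first_intersection(p):
    # 1*(p^2 + 4p(1-p)) + 0*(p + 4(1-p)), with p the probability of continuing
    return 1 * (p**2 + 4 * p * (1 - p)) + 0 * (p + 4 * (1 - p))

def at_last_intersection(p):
    # alpha = 1 - p is the chance that the first intersection is also the last
    return (1 - p) * 0 + p * (p + 4 * (1 - p))

# Exact rational grid on [0, 1] so the comparison has no rounding error.
grid = [Fraction(i, 300) for i in range(301)]
assert all(at_first_intersection(p) == at_last_intersection(p) for p in grid)

best_p = max(grid, key=at_first_intersection)
print(best_p, at_first_intersection(best_p))  # 2/3 4/3
```

Both ways of fixing the averaging point give the same payoff function, maximized at p = 2/3 with expected payoff 4/3.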

013y

This is (a variant of) the Sleeping Beauty problem. I'm guessing you must be new here - this is an old 'chestnut' that we've done to death several times. :-)
Good stuff. But now here's a really stupid idea for you:
Suppose you're going to play Counterfactual Mugging with an Omega who (for argument's sake) doesn't create a conscious simulation of you. But your friend Bill has a policy that if you ever have to play counterfactual mugging, and the coin lands tails then he will create a simulation of you as you were just prior to the game and make your copy have an experience indistinguishable from the experience you would have had of Omega asking you for money (as though the coin had landed heads). Then following your approach, surely you ought now to pay up (whereas you wouldn't have previously)? Despite the fact that your friend Bill is penniless, and his actions have no effect on Omega or your payoff in the real world?

013y

I don't see why you think I should pay if Bill is involved. Knowing Bill's behavior, I think that there's a 50% chance that I am real, and paying earns me -$1000, and there's a 50% chance that I am a Bill-simulation and paying earns me $0. Hence paying earns me an expected -$500.

013y

If you know there is going to be a simulation then your subjective probability for the state of the real coin is that it's heads with probability 1/2. And if the coin is really tails then, assuming Omega is perfect, your action of 'giving money' (in the simulation) seems to be "determining" whether or not you receive money (in the real world).
(Perhaps you'll simply take this as all the more reason to rule out the possibility that there can be a perfect Omega that doesn't create a conscious simulation of you? Fair enough.)

013y

I'm not sure I would buy this argument unless you could claim that my Bill-simulation's actions would cause Omega to give or not give me money. At the very least it should depend on how Omega makes his prediction.

013y

Perhaps a clearer variation goes as follows: Bill arranges so that if the coin is tails then (a) he will temporarily receive your winnings, if you get any, and (b) he will do a flawless imitation of Omega asking for money.
If you pay Bill then he returns both what you paid and your winnings (which you're guaranteed to have, by hypothesis). If you don't pay him then he has no winnings to give you.

013y

Well look: If the real coin is tails and you pay up, then (assuming Omega is perfect, but otherwise irrespective of how it makes its prediction) you know with certainty that you get the prize. If you don't pay up then you would know with certainty that you don't get the prize. The absence of a 'causal arrow' pointing from your decision to pay to Omega's decision to pay becomes irrelevant in light of this.
(One complication which I think is reasonable to consider here is 'what if physics is indeterministic and so knowing your prior state doesn't permit Omega (or Bill) to calculate with certainty what you will do?' Here I would generalize the game slightly so that if Omega calculates that your probability of paying up is p then you receive proportion p of the prize. Then everything else goes through unchanged - Omega and Bill will now calculate the same probability that you pay up.)

013y

OK. I am uncomfortable with the idea of dealing with the situation where Omega is actually perfect.
I guess this boils down to me being not quite convinced by the arguments for one-boxing in Newcomb's problem without further specification of how Omega operates.

013y

Do you know about the "Smoking Lesion" problem?
At first sight it appears to be isomorphic to Newcomb's problem. However, a couple of extra details have been thrown in:
1. A person's decisions are a product of both conscious deliberation and predetermined unconscious factors beyond their control.
2. "Omega" only has access to the latter.
Now, I agree that when you have an imperfect Omega, even though it may be very accurate, you can't rule out the possibility that it can only "see" the unfree part of your will, in which case you should "try as hard as you can to two-box (but perhaps not succeed)." However, if Omega has even "partial access" to the "free part" of your will then it will usually be best to one-box.
Or at least this is how I like to think about it.

013y

I did not know about it, thanks for pointing it out. It's Simpson's paradox as a decision theory problem.
On the other hand (ignoring issues of Omega using magic or time travel, or you making precommitments), isn't Newcomb's problem always like this, in that there is no direct causal relationship between your decision and his prediction, just that they share some common causation?

013y

1) Yes, perfection is terribly unrealistic, but I think it gets too complicated to be interesting if it's done any other way. It's like a limit in mathematics - in fact, it should be the limit of relating to any prediction process as that process approaches perfection, or else you have a nasty discontinuity in your decision process, because all perfect processes can just be defined as "it's perfect."
2) Okay.
3) Statistical correlation, but not causal, so my definition would still tell them apart. In short, if you could throw me into the sun and then simulate me to atom-scale perfection, I would not want you to. This is because continuity is important to my sense of self.
4) Because any solution to the problem of consciousness and relationship between how much like you it is and how much you identify with it is going to be arbitrary. And so the picture in my head is that the function of how much you would be willing to pay becomes multivalued as Omega becomes imperfect. And my brain sees a multivalued function and returns "not actually a function. Do not use."

013y

1) OK taking a limit is an idea I hadn't thought of. It might even defeat my argument that your answer depends on how Omega achieves this. On the other hand:
a) I am not sure what the rest of my beliefs would look like anymore if I saw enough evidence to convince me that Omega was right all the time with probability 1-1/3^^^3 .
b) I doubt that the above is even possible, since given my argument you shouldn't be able to convince me that the probability is less than say 10^-10 that I am a simulation talking to something that is not actually Omega.
3) I am not sure why you think that the simulation is not causally a copy of you. Either that or I am not sure what your distinction between statistical and causal is.
3+4) I agree that one of the weaknesses of this theory is that it depends heavily, among other things, on a somewhat controversial theory of identity/ what it means to win. Though I don't see why the amount that you identify with an imperfect copy of yourself should be arbitrary, or at the very least, if that's the case, why it's a problem for the dependence of your actions on Omega's degree of perfection to be arbitrary, but not a problem for your identification with imperfect copies of yourself to be.

My first thought on the coalition scenario is that the solution might hinge on something as simple as the agents deciding to avoid a stable equilibrium that does not terminate in anyone ending up with pie.

Edit: this seems to already have been discussed at length. That'll teach me to reply to year-old threads without an adequate perusal of the preexisting comments.

...Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who an

I don't really wanna rock the boat here, but in the words of one of my professors, it "needs more math".

I predict it will go somewhat like this: you specify the problem in terms of A implies B, etc; you find out there's infinite recursion; you prove that the solution doesn't exist. Reductio ad absurdum anyone?

Instead of assuming that others will behave as a function of our choice, we look at the rest of the universe (including other sentient beings, including Omega) as a system where our own code is part of the data.

Given a prior on physics, there is a well defined code that maximizes our expected utility.

That code wins. It one-boxes, it pays Omega when the coin falls on heads, etc.

I think this solves the infinite regress problem, albeit in a very impractical way.

015y

This doesn't sound obviously wrong, but is too vague even for an informal answer.

015y

Well, if you want practicality, I think Omega problems can be disregarded, they're not realistic. It seems that the only feature needed for the real world is the ability to make trusted promises as we encounter the need to make them.
If we are not concerned with practicality but the theoretical problem behind these paradoxes, the key is that other agents make predictions about your behavior, which is the same as saying they have a theory of mind, which is simply a belief distribution over your own code.
To win, you should take the actions that make their belief about your own code favorable to you, which can include lying, or modifying your own code and showing it to make your point.
It's not our choice that matters in these problems but our choosing algorithm.

015y

Again, if you can state the same with precision, it could be valuable, while on this level my reply is "So?".

015y

I confess I do not grasp the problem well enough to see where the problem lies in my comment. I am trying to formalize the problem, and I think the formalism I describe is sensible.
Once again, I'll reword it but I think you'll still find it too vague : to win, one must act rationally and the set of possible action includes modifying one's code.
The question was
I do not know the specifics of Eliezer's timeless decision theory, but it seems to me that if one looks at the decision process of others based on their belief about your code, not on your decisions, there is no infinite regress problem.
You could say : Ah but there is your belief about an agent's code, then his belief about your belief about his code, then your belief about his belief about your belief about his code, and that looks like an infinite regression. However, there is really no regression since "his belief about your belief about his code" is entirely contained in "your belief about his code".

015y

Thanks, this comment makes your point clearer. See cousin_it's post Re-formalizing PD.

If you're an AI, you do not have to (and shouldn't) pay the first $1000, you can just self-modify to pay $1000 in all the following coin flips (if we assume that the AI can easily rewrite/modify its own behaviour in this way). Human brains probably don't have this capability, so I guess paying $1000 even in the first game makes sense.

015y

That assumes that you didn't expect to face problems like that in the future before Omega presented you with the problem, but do expect to face problems like that in the future after Omega presents you with the problem. It doesn't work at all if you only get one shot at it. (And you should already be a person who would pay, just in case you do.)

I had a look at the existing literature. It seems as though the idea of a "rational agent" who takes one box goes quite a way back:

"Rationality, Dispositions, and the Newcomb Paradox" (Philosophical Studies, volume 88, number 1, October 1997)

Abstract: "In this article I point out two important ambiguities in the paradox. [...] I draw an analogy to Parfit's hitchhiker example which explains why some people are tempted to claim that taking only one box is rational. I go on to claim that although the ideal strategy is to adopt a nece...

On dividing the pie, I ran across this in an introduction to game theory class. I think the instructor wanted us to figure out that there's a regress and see how we dealt with it. Different groups did different things, but two members of my group wanted to be nice and not cut anyone out, so our collective behavior was not particularly rational. "It's not about being nice! It's about getting the points!" I kept saying, but at the time the group was about 16 (and so was I), and had varying math backgrounds, and some were less interested in that aspect of the game.

I think at least one group realized there would always be a way to undermine the coalitions that assembled, and cut everyone in equally.

-115y

One might guess that evolution granted us a strong fairness drive to avoid just these sorts of decision regresses.

215y

Fail.

015y

It's not group selection: if group A splits things evenly and moves on, while group B goes around and around with fractious coalitions until a tiger comes along and eats them, then being in group A confers an individual advantage.
Clearly evolution also gave us the ability to make elaborate justifications as to why we, particularly, deserve more than an equal share. But that hardly disallows the fairness heuristic as a fallback option when the discussion is taking longer than it deserves. (And some people just have the stamina to keep arguing until everyone else has given up in disgust. These usually become middle managers or Congressmen.)

015y

What you just described is group selection, and thus highly unlikely.
It's to your individual benefit to be more (unconsciously) selfish and calculating in these situations, whether the other people in your group have a fairness drive or not.

215y

Not if you are punished for selfishness. I'm not sure how reasonable the following analysis is (since I didn't study this kind of thing at all); it suggests that fairness is a stable strategy, and given some constraints a more feasible one than selfishness:
M. A. Nowak, et al. (2000). "Fairness versus reason in the ultimatum game." Science 289(5485):1773-1775.

015y

See reply to Tim Tyler.

215y

...and if your companions have circuitry for detecting and punishing selfish behaviour - what then? That's how the "fairness drive" is implemented - get mad and punish cheaters until it hurts. That way, cheaters learn that crime doesn't pay - and act fairly.

015y

I agree. But you see how this individual selection pressure towards fairness is different from the group selection pressure that dclayh was actually asserting?

-215y

You and EY seem to be the people who are talking about group selection.

015y

Not when the cost (including opportunity cost) of doing the calculating outweighs the benefit it would give you.

115y

You're introducing weaker and less plausible factors to rescue a mistaken assertion. It's not worth it.
As pointed out below in this thread, the fairness drive almost certainly comes from the individual pressure of cheaters being punished, not from any group pressure as you tried to say above.

115y

Statement of the obvious: Spending excessive time deciding is neither rational nor evolutionarily favored.

Actually here's my argument why (ignoring the simulation arguments) you should actually refuse to give Omega money.

Here's what actually happened:

Omega flipped a fair coin. If it comes up heads the stated conversation happened. If it comes up tails and Omega predicts that you would have given him $1000, he steals $1000000 from you.

If you have a policy of paying you earn 10^6/4 - 10^3/4 -10^6/2 = -$250250. If you have a policy of not paying you get 0.
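The arithmetic for the paying policy can be spelled out in a small Python sketch. It assumes both coins are fair and independent, and the variable names are hypothetical.

```python
MILLION = 10**6
FEE = 10**3

# Outer coin (the hidden one): heads -> the stated conversation takes place;
# tails -> Omega steals MILLION if it predicts you would have paid.
# Inside the conversation, a second fair coin decides between receiving
# MILLION and being asked for FEE.
ev_pay = 0.25 * MILLION - 0.25 * FEE - 0.5 * MILLION
ev_refuse = 0.0  # a refuser neither pays, wins, nor gets robbed

print(ev_pay, ev_refuse)  # -250250.0 0.0
```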

More realistically having a policy of paying Omega in such a situation could earn or lose you money if peo...

213y

Unless I'm misunderstanding you, this is a violation of one of the premises of the problem, that Omega is known to be honest about how he poses dilemmas.

113y

Fine, if you think that Omega would have told me about the previous coin flip, consider this:
There are two different supernatural entities who can correctly predict my response to the counterfactual mugging. There's Omega and Z.
Two things could theoretically happen to me:
a) Omega could present me with the counterfactual mugging problem.
b) Z could decide to steal $1000000 from me if and only if I would have given Omega $1000 in the counterfactual mugging.
When I am trying to decide on policy for dealing with counterfactual muggings I should note that my policy will affect my outcome in both situation (a) and (b). The policy of giving Omega money will win me $499500 (expected) in situation (a), but it will lose me $1000000 in situation (b). Unless I have a reason to suspect that (a) is at least twice as likely as (b), I have no reason to prefer the policy of giving Omega money.
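The break-even point between situations (a) and (b) can be worked out in a short Python sketch. `policy_value` and the other names are hypothetical, and the probabilities passed in are purely illustrative.

```python
from fractions import Fraction

# Expected gain from the paying policy if Omega shows up (situation a).
gain_a = Fraction(1, 2) * 10**6 - Fraction(1, 2) * 10**3
# Loss from the paying policy if Z shows up (situation b).
loss_b = 10**6

def policy_value(p_a, p_b):
    """Expected value of the paying policy given the chance of each situation."""
    return p_a * gain_a - p_b * loss_b

# Ratio P(a)/P(b) at which the paying policy breaks even.
break_even_ratio = Fraction(loss_b) / gain_a

print(gain_a)                    # 499500
print(float(break_even_ratio))   # a little over 2
```

Because the break-even ratio is slightly above 2, a world where (a) is exactly twice as likely as (b) still leaves the paying policy marginally negative.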

313y

The basis of the dilemma is that you know that Omega, who is honest about the dilemmas he presents, exists. You have no evidence that Z exists. You can posit his existence, but it doesn't make the dilemma symmetrical.

213y

But if instead Z exists, shows up on your doorstep and says (in his perfectly trustworthy way) "I will take your money if and only if you would have given money to Omega in the counterfactual mugging", then you have evidence that Z exists but no evidence that Omega does.
The point is that you need to make your policy before either entity shows up. Therefore unless you have evidence now that one is more likely than the other, not paying Omega is the better policy (unless you think of more hypothetical entities).

013y

Agreed. Neither is likely to happen, but the chance of something analogous happening may be relevant when forming a general policy. Omega in Newcomb's problem is basically asking you to guard something for pay without looking at it or stealing it. The unrealistic part is being a perfect predictor and perfectly trustworthy and you therefore knowing the exact situation.
Is there a more everyday analogue to Omega as the Counterfactual Mugger?

213y

People taking bets for you in your absence.
It's probably a good exercise to develop a real-world analogue to all philosophical puzzles such as this wherever you encounter them; the purpose of such thought experiments is not to create entirely new situations, but to strip away extraneous concerns and heuristics like "but I trust my friends" or "but nobody is that cold-hearted" or "but nobody would give away a million dollars for the hell of it, there must be a trick".

113y

Good point. On the other hand I think that Omega being a perfect predictor through some completely unspecified mechanism is one of the most confusing parts of this problem. Also, as I was saying, it is a complicating issue that you do not know anything about the statistical behavior of possible Omegas (though I guess that there are ways to fix that in the problem statement).

113y

It may be a truly magical power, but any other method of stipulating better-than-random prediction has a hole in it that lets people ignore the actual decision in favor of finding a method to outsmart said prediction method. Parfit's Hitchhiker, as usually formalised on LessWrong, involves a more believable good-enough lie-detector - but prediction is much harder than lie-detection, we don't have solid methods of prediction that aren't gameable, and so forth, until it's easier to just postulate Omega to get people to engage with the decision instead of the formulation.

013y

Now if the method of prediction were totally irrelevant, I think I would agree with you. On the other hand, method of prediction can be the difference between your choice directly putting the money in the box in Newcomb's problem and a smoking lesion problem. If the method of prediction is relevant, then requiring an unrealistic perfect predictor is going to leave you with something pretty unintuitive. I guess that a perfect simulation or a perfect lie detector would be reasonable though. On the other hand outsmarting the prediction method may not be an option. Maybe they give you a psychology test, and only afterwards offer you a Newcomb problem. In any case I feel like confusing bits of problem statement are perhaps just being moved around.

013y

There is one Omega, Omega and Newcomb's problem gives his profile!

013y

And he makes a similar mistake in his consideration of the Prisoner's Dilemma. The prisoners are both attempting to maximise their (known) utility function. You aren't playing against an actively malicious agent out to steal people's lunch. You do have reason to expect agents to be more likely to follow their own self interest than not, even in cases where this isn't outright declared as part of the scenario.

013y

Here I'm willing to grant a little more. I still claim that whether or not cooperating in the mirror match is a good strategy depends on knowing statistical information about the other players you are likely to face. On the other hand in this case, you may well have more reasonable grounds for your belief that you will see more mirror matches than matches against people who specifically try to punish those who cooperate in mirror matches.

013y

Having thought about it a little more, I think I have pinpointed my problem with building a decision theory in which real outcomes are allowed to depend on the outcomes of counterfactuals:
The output of your algorithm in a given situation will need to depend on your prior distribution and not just on your posterior distribution.
In CDT, your choice of actions depends only on the present state of the universe. Hence you can make your decision based solely on your posterior distribution on the present state of the universe.
If you need to deal with counterfactuals though, the output of your algorithm in a given situation should depend not only on the state of the universe in that situation, but on the probability that this situation appears in a relevant counterfactual and upon the results thereof. I cannot just consult my posterior and ask about the expected results of my actions. I also need to consult my prior and compute the probability that my payouts will depend on a counterfactual version of this situation.

...At the point where Omega asks me this question, I already know that the coin came up heads, so I already know I'm not going to get the million. It seems like I want to decide "as if" I don't know whether the coin came up heads or tails, and then implement that decision even if I know the coin came up heads. But I don't have a good formal way of talking about how my decision in one state of knowledge has to be determined by the decision I would make if I occupied a different epistemic state, conditioning using the probability previously possess

615y

It seems quite convenient that you can physically give him your credit card.

-415y

The least convenient possible world has the following problems:
* It's possible, but how probable?
* It's nothing like the real world.
* You always lose in it.
Due to these traits, lessons learned in such a world are worthless in the real one and by invoking it you accomplish nothing.

"Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote. "

Well then, the statistically expected (average) share any agent is going to get long-term is 1/10th of the pie. The simplest solution that ensures this is the equal division; anticipating this from the start cuts down on negotiation costs, and if a majority agrees to follow this strategy (i.e., agrees to not realize more than their "share"), it is also stable - anyone who ponders upsetting it risks being the "odd man out" who eats ...

Suppose you're out in the desert, running out of water, and soon to die - when someone in a motor vehicle drives up next to you. Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what's more, the driver is Paul Ekman, who's really, really good at reading facial microexpressions. The driver says, "Well, I'll convey you to town if it's in my interest to do so - so will you give me $100 from an ATM when we reach town?"

Now of course you wish you could answer "Yes", but as an ideal game theorist yourself, you realize that, once you actually reach town, you'll have no further motive to pay off the driver. "Yes," you say. "You're lying," says the driver, and drives off leaving you to die.

If only you weren't so rational!

This is the dilemma of Parfit's Hitchhiker, and the above is the standard resolution according to mainstream philosophy's causal decision theory, which also two-boxes on Newcomb's Problem and defects in the Prisoner's Dilemma. Of course, any self-modifying agent who expects to face such problems - in general, or in particular - will soon self-modify into an agent that doesn't regret its "rationality" so much. So from the perspective of a self-modifying-AI-theorist, classical causal decision theory is a wash. And indeed I've worked out a theory, tentatively labeled "timeless decision theory", which covers these three Newcomblike problems and delivers a first-order answer that is already reflectively consistent, without need to explicitly consider such notions as "precommitment". Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.

However, there are some other timeless decision problems for which I do not possess a general theory.

For example, there's a problem introduced to me by Gary Drescher's marvelous Good and Real (OOPS: The below formulation was independently invented by Vladimir Nesov; Drescher's book actually contains a related dilemma in which box B is transparent, and only contains $1M if Omega predicts you will one-box whether B appears full or empty, and Omega has a 1% error rate) which runs as follows:

Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?"

Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers "Yes" to this sort of question - just like with Newcomb's Problem or Parfit's Hitchhiker.
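The arithmetic behind that "Yes" can be checked with a minimal sketch. It assumes, as the problem statement suggests, that Omega predicts your policy perfectly and that payoffs are simply dollars; the names `policy_value` and `value_after_seeing_heads` are hypothetical and not part of any theory described here. It contrasts the value of a policy judged before the flip with the value of the act judged after learning the coin came up heads.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)  # fair coin, as Omega states
COST = 1_000              # what Omega asks for on heads
PRIZE = 1_000_000         # what Omega pays on tails if it predicts you pay on heads


def policy_value(pays_on_heads: bool) -> Fraction:
    """Expected dollars of a fixed policy, evaluated before the coin flip.

    Assumption: Omega's prediction of the policy is perfect.
    """
    heads_payoff = -COST if pays_on_heads else 0
    tails_payoff = PRIZE if pays_on_heads else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def value_after_seeing_heads(pays: bool) -> int:
    """Dollars gained by the act alone, once you already know the coin was heads."""
    return -COST if pays else 0


if __name__ == "__main__":
    print(policy_value(True))               # 499500
    print(policy_value(False))              # 0
    print(value_after_seeing_heads(True))   # -1000
    print(value_after_seeing_heads(False))  # 0
```

Judged as a policy, paying wins by a wide margin; judged after the update on heads, refusing wins. The sketch only exhibits that gap; it does not supply the general algorithm the post is asking for.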

But I don't have a general theory which replies "Yes". At the point where Omega asks me this question, I already know that the coin came up heads, so I already know I'm not going to get the million. It seems like I want to decide "as if" I don't know whether the coin came up heads or tails, and then implement that decision even if I know the coin came up heads. But I don't have a good formal way of talking about how my decision in one state of knowledge has to be determined by the decision I would make if I occupied a different epistemic state, conditioning using the probability previously possessed by events I have since learned the outcome of... Again, it's easy to talk informally about why you have to reply "Yes" in this case, but that's not the same as being able to exhibit a general algorithm.

Another stumper was presented to me by Robin Hanson at an OBLW meetup. Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote. Let's say that six of them form a coalition and decide to vote to divide the pie among themselves, one-sixth each. But then two of them think, "Hey, this leaves four agents out in the cold. We'll get together with those four agents and offer them to divide half the pie among the four of them, leaving one quarter apiece for the two of us. We get a larger share than one-sixth that way, and they get a larger share than zero, so it's an improvement from the perspectives of all six of us - they should take the deal." And those six then form a new coalition and redivide the pie. Then another two of the agents think: "The two of us are getting one-eighth apiece, while four other agents are getting zero - we should form a coalition with them, and by majority vote, give each of us one-sixth."

And so it goes on: Every majority coalition and division of the pie, is dominated by another majority coalition in which each agent of the new majority gets more pie. There does not appear to be any such thing as a dominant majority vote.

(Robin Hanson actually used this to suggest that if you set up a Constitution which governs a society of humans and AIs, the AIs will be unable to conspire among themselves to change the constitution and leave the humans out in the cold, because then the new compact would be dominated by yet other compacts and there would be chaos, and therefore any constitution stays in place forever. Or something along those lines. Needless to say, I do not intend to rely on such, but it would be nice to have a formal theory in hand which shows how ideal reflectively consistent decision agents will act in such cases (so we can prove they'll shed the old "constitution" like used snakeskin.))

Here's yet another problem whose proper formulation I'm still not sure of, and it runs as follows. First, consider the Prisoner's Dilemma. Informally, two timeless decision agents with common knowledge of the other's timeless decision agency, but no way to communicate or make binding commitments, will both Cooperate because they know that the other agent is in a similar epistemic state, running a similar decision algorithm, and will end up doing the same thing that they themselves do. In general, on the True Prisoner's Dilemma, facing an opponent who can accurately predict your own decisions, you want to cooperate only if the other agent will cooperate if and only if they predict that you will cooperate. And the other agent is reasoning similarly: They want to cooperate only if you will cooperate if and only if you accurately predict that they will cooperate.

But there's actually an infinite regress here which is being glossed over - you won't cooperate just because you predict that they will cooperate, you will only cooperate if you predict they will cooperate if and only if you cooperate. So the other agent needs to cooperate if they predict that you will cooperate if you predict that they will cooperate... (...only if they predict that you will cooperate, etcetera).

On the Prisoner's Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion, so that you can expect their action to be the symmetrical analogue of your own - in which case (C, C) is preferable to (D, D). But what if you're facing a more general decision problem, with many agents having asymmetrical choices, and everyone wants to have their decisions depend on how they predict that other agents' decisions depend on their own predicted decisions? Is there a general way of resolving the regress?

On Parfit's Hitchhiker and Newcomb's Problem, we're told how the other behaves as a direct function of our own predicted decision - Omega rewards you if you (are predicted to) one-box, the driver in Parfit's Hitchhiker saves you if you (are predicted to) pay $100 on reaching the city. My timeless decision theory only functions in cases where the other agents' decisions can be viewed as functions of one argument, that argument being your own choice in that particular case - either by specification (as in Newcomb's Problem) or by symmetry (as in the Prisoner's Dilemma). If their decision is allowed to depend on how your decision depends on their decision - like saying, "I'll cooperate, not 'if the other agent cooperates', but only if the other agent cooperates if and only if I cooperate - if I predict the other agent to cooperate unconditionally, then I'll just defect" - then in general I do not know how to resolve the resulting infinite regress of conditionality, except in the special case of predictable symmetry.

You perceive that there is a definite note of "timelessness" in all these problems.
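To make the pie-division cycling concrete, here is a small sketch that, given any division among ten agents, constructs some majority coalition whose members all do strictly better. The function name `dominating_division` and the rule it uses (take the six agents with the smallest shares and split the remaining four agents' pie equally among them) are illustrative assumptions; this is just one way of finding a dominating coalition, not the specific splits in the story and not a proposed solution.

```python
from fractions import Fraction


def dominating_division(shares):
    """Return (new_shares, coalition) where a bare majority all gain strictly.

    Assumption for illustration: the coalition is the bare majority holding
    the smallest shares, which takes everything held by the rest.
    """
    n = len(shares)
    majority = n // 2 + 1
    # Indices of the agents with the smallest current shares.
    poorest = sorted(range(n), key=lambda i: shares[i])[:majority]
    # Pie currently held by agents outside the new coalition (always > 0).
    surplus = Fraction(1) - sum(shares[i] for i in poorest)
    bonus = surplus / majority
    new_shares = [Fraction(0)] * n
    for i in poorest:
        new_shares[i] = shares[i] + bonus
    return new_shares, sorted(poorest)


if __name__ == "__main__":
    division = [Fraction(1, 10)] * 10  # start from the equal split
    for step in range(3):
        division, coalition = dominating_division(division)
        print(step, coalition, [str(s) for s in division])
```

Starting from the equal split, the first step hands six agents one-sixth each, and the next step overturns that in turn. Because the surplus held by the excluded minority is always positive, the construction never runs out of dominating coalitions, which is the instability the post describes.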

Any offered solution may assume that a timeless decision theory for direct cases already exists - that is, if you can reduce the problem to one of "I can predict that if (the other agent predicts) I choose strategy X, then the other agent will implement strategy Y, and my expected payoff is Z", then I already have a reflectively consistent solution which this margin is unfortunately too small to contain.
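For readers who want the shape of such a "direct" case spelled out, here is a rough sketch. It is emphatically not the timeless decision theory alluded to above; it only illustrates problems in which the other agent's move is a known function of your own (predicted) choice. The helper `best_choice`, the response functions, the conventional Newcomb amounts ($1,000 and $1,000,000), and the Prisoner's Dilemma payoff numbers are all assumptions made for illustration.

```python
def best_choice(choices, response, payoff):
    """Pick the choice X maximizing payoff(X, Y), where Y = response(X).

    Assumption: the other agent predicts X accurately and responds to it.
    """
    return max(choices, key=lambda x: payoff(x, response(x)))


# Newcomb's Problem: the response is given by specification.
def omega(predicted_choice):
    return "full" if predicted_choice == "one-box" else "empty"


def newcomb_payoff(choice, box_b):
    box_a = 1_000 if choice == "two-box" else 0
    return box_a + (1_000_000 if box_b == "full" else 0)


# Prisoner's Dilemma: the response is given by symmetry (a mirror match).
PD_PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def mirror(my_choice):
    return my_choice


def unconditional_cooperator(my_choice):
    return "C"


def pd_payoff(mine, theirs):
    return PD_PAYOFF[(mine, theirs)]


if __name__ == "__main__":
    print(best_choice(["one-box", "two-box"], omega, newcomb_payoff))  # one-box
    print(best_choice(["C", "D"], mirror, pd_payoff))                  # C
    print(best_choice(["C", "D"], unconditional_cooperator, pd_payoff))  # D
```

The open problems in the post are exactly the ones where no such one-argument `response` function is available, because the other agent's move depends on how your move depends on theirs.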

(In case you're wondering, I'm writing this up because one of the SIAI Summer Project people asked if there was any Friendly AI problem that could be modularized and handed off and potentially written up afterward, and the answer to this is almost always "No", but this is actually the one exception that I can think of. (Anyone actually taking a shot at this should probably familiarize themselves with the existing literature on Newcomblike problems - the edited volume "Paradoxes of Rationality and Cooperation" should be a sufficient start (and I believe there's a copy at the SIAI Summer Project house.)))