You seem to have misunderstood the problem statement [1]. If you commit to doing "FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead", then you will almost surely have to pay $100 (since the predictor predicts that you will take Right), whereas if you commit to using pure FDT, then you will almost surely have to pay nothing (with a small chance of death). There really is no "strategy that, if the agent commits to it before the predictor makes her prediction, does better than FDT".

[1] Which is fair enough, as it wasn't actually specified correctly: the predictor is actually trying to predict whether you will take Left or Right if it leaves its helpful note, not in the general case. But this assumption has to be added, since otherwise FDT says to take Right.
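To put rough numbers on the comparison (a sketch with illustrative figures of my own choosing: an assumed predictor error rate p and an assumed dollar value V on death, neither of which is pinned down above):

```python
# Rough expected-cost comparison for the two commitments (illustrative
# numbers, not from the problem statement).
p = 1e-24   # assumed predictor error rate (near-perfect predictor)
V = 1e6     # assumed dollar cost assigned to dying

# Commit to pure FDT ("always take Left"): the predictor almost surely
# predicts Left and leaves Left bomb-free; with probability p it errs,
# there is a bomb, and you die.
cost_pure_fdt = p * V

# Commit to "FDT, except take Right if there's a bomb in Left": the
# predictor can consistently predict Right, so you almost surely pay $100.
cost_modified = (1 - p) * 100

print(cost_pure_fdt < cost_modified)  # True: pure FDT is cheaper in expectation
```

So for any remotely plausible valuation of death, the pure-FDT commitment dominates.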

-"Charging a toll for a bridge you didn’t build is not okay; that’s pure extraction."

This is probably just a nitpick, but as worded this doesn't take into account the scenario where the builder of the bridge sells the rights to charge a toll to another party, who can then legitimately charge the toll even though they didn't build the bridge.

Yes they do. For simplicity suppose there are only two hosts, and suppose host A precommits to not putting money in host B's box, while host B makes no precommitments about how much money he will put in host A's box. Then the human's optimal strategy is "pick host A's box with probability 1 - x epsilon, where x is the amount of money in host A's box". This incentivizes host B to maximize the amount in host A's box (resulting in a payoff of ~101 for the human), but it would have been better for him to precommit to do the same as A, since then by symmetry his box would have been picked half the time instead of 101 epsilon of the time.
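A minimal sketch of that incentive (the value of epsilon and the $100 cap on what B can put in A's box are illustrative assumptions of mine):

```python
# The human's strategy: pick host A's box with probability 1 - x*eps,
# where x is the amount host B has placed in host A's box.
eps = 1e-6  # assumed small constant

def pick_probabilities(x):
    p_a = 1 - x * eps   # probability the human picks host A's box
    p_b = x * eps       # probability the human picks host B's box
    return p_a, p_b

# B's box is only ever picked with probability x*eps, which is
# *increasing* in x, so B does best by filling A's box to the cap:
p_b_empty = pick_probabilities(0)[1]
p_b_full = pick_probabilities(100)[1]
print(p_b_full > p_b_empty)  # True: B prefers to fill A's box

# But had B precommitted symmetrically with A, his box would be picked
# half the time, which is far better for him than ~100*eps of the time:
print(0.5 > p_b_full)  # True
```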

Couldn't you equally argue that they will do their best not to be smallest by not putting any money in any of their opponents' boxes? After all, "second-fullest" is the same as "third-emptiest".

Ah, you're right. That makes more sense now.

Why would precommitting to pick the second-fullest box give an incentive for predictors to put money in everyone else’s boxes?

If the hosts move first logically, then TDT will lead to the same outcomes as CDT, since it's in each host's interest to precommit to incentivising the human to pick their own box -- once the host has precommitted to doing this, the incentive works regardless of what decision theory the human uses. In math terms, if x is the choice of which box to incentivize (with "incentivize your own box" being interpreted as "don't place any money in any of the other boxes"), the human gets to choose a box f(x) on the basis of x, and the host gets to choose x=g(f) on the basis of the function f, which is known to the host since it is assumed to be superintelligent enough to simulate the human's choices in hypothetical simulations. By definition, the host moving first in logical time would mean that g is chosen before f, and f is chosen on the basis of what's in the human's best interest given that the host will incentivize box g(f). But then the optimal strategy is for g to be a constant function.

Regarding $100 and $200, I think I missed the part where you said the human picks the box with the maximum amount of money -- I was assuming he picked a random box.

Regarding the question of how to force all the incentives into one box, what about the following strategy: choose box 1 with probability 1 - (400 - x) epsilon, where x is the payoff of box 1. Then it is obviously in each host's interest to predict box 1, since it has the largest probability of any box, but then it is also in each host's interest to minimize 400 - x, i.e. maximize x. This is true even though the hosts' competition is zero-sum.
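Here is that strategy spelled out numerically (a sketch under my own assumptions: epsilon small, the $400 cap from the problem, and the leftover probability split evenly over the other four boxes):

```python
# The proposed strategy: pick box 1 with probability 1 - (400 - x)*eps,
# where x is the payoff of box 1 (illustrative parameters are my own).
eps = 1e-6
N_BOXES = 5

def p_box1(x):
    return 1 - (400 - x) * eps

# Even at x = 0, box 1 carries far more probability than any other box
# (assuming the remaining probability is split evenly), so a host that
# wants its prediction to come true should predict box 1:
p_other = (1 - p_box1(0)) / (N_BOXES - 1)
print(p_box1(0) > p_other)  # True

# And a host that predicted box 1 then wants to raise p_box1, i.e.
# maximize x, pushing the box-1 payoff toward the $400 cap:
print(p_box1(400) > p_box1(0))  # True
```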

You seem to be assuming the human moves first in logical time, before the superintelligent hosts. You also seem to be assuming that the superintelligent hosts are using CDT (if they use FDT, then by symmetry considerations all of their possible actions have equal payoff, so what they do is arbitrary). Any particular reason for these assumptions?

Where do the numbers $152 and $275 come from? I would have thought they should be $100 and $200, respectively.

In the 5 box problem, why doesn't FDT force all of the incentives into box 1, thus getting $400?

-"The main question is: In the counter-factual scenario in which TDT recommends action X to agent A, what would another agent B do?"

This is actually not the main issue. If you fix an algorithm X for agent A to use, then the question "what would agent B do if he is using TDT and knows that agent A is using algorithm X?" has a well-defined answer, say f(X). The question "what would agent A do if she knows that whatever algorithm X she uses, agent B will use counter-algorithm f(X)" then also has a well-defined answer, say Z. So you could define "the result of TDT agents A and B playing against each other" to be where A plays Z and B plays f(Z). The problem is that this setup is not symmetric, and would yield a different result if we switched the order of A and B.
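To illustrate why the order matters, here is a toy asymmetric coordination game of my own (a battle-of-the-sexes payoff matrix, not anything from the original post): whichever agent's choice is fixed first, with the other best-responding in the f(X) role, gets their preferred equilibrium.

```python
# Payoffs (row = A's action, col = B's action): (A's payoff, B's payoff).
PAYOFF = {
    ("opera", "opera"): (2, 1),
    ("boxing", "boxing"): (1, 2),
    ("opera", "boxing"): (0, 0),
    ("boxing", "opera"): (0, 0),
}
ACTIONS = ["opera", "boxing"]

def best_response(responder, opponent_action):
    # responder (0 = A, 1 = B) picks the action maximizing their own payoff
    def payoff(a):
        pair = (a, opponent_action) if responder == 0 else (opponent_action, a)
        return PAYOFF[pair][responder]
    return max(ACTIONS, key=payoff)

def play(first):
    # "first" commits to an action; the other best-responds (the f(X) role);
    # the committer picks the action whose induced reply is best for them (Z).
    other = 1 - first
    def first_payoff(a):
        reply = best_response(other, a)
        pair = (a, reply) if first == 0 else (reply, a)
        return PAYOFF[pair][first]
    commit = max(ACTIONS, key=first_payoff)
    reply = best_response(other, commit)
    return (commit, reply) if first == 0 else (reply, commit)

print(play(first=0))  # ('opera', 'opera'): A first gets A's favorite outcome
print(play(first=1))  # ('boxing', 'boxing'): B first gets B's favorite outcome
```

The two orderings give different outcomes, which is exactly the asymmetry in the A-plays-Z, B-plays-f(Z) construction.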

-"In a blackmail scenario it’s not so obvious, but I do think there is a certain symmetry between rejecting all blackmail and sending all blackmail."

The symmetry argument only works when you have exact symmetry, though. To recall, the argument is that by controlling the output of the TDT algorithm in player A's position, you are also by logical necessity controlling its output in player B's position, hence TDT can act as though it controls player B's action. If there is even the slightest difference between player A and player B then there is no logical necessity and the argument doesn't work. For example, in a prisoner's dilemma where the payoffs are not quite symmetric, TDT says nothing.

-"So I no longer believe the claim that TDT agents simply avoid all negative-sum trades."

I agree with you, but I think that's because TDT is actually undefined in scenarios where negative-sum trading might occur.
