Do logical decision theories actually give meaningfully better recommendations on real-world problems, particularly the frequently referenced case of voting?
One of the main reasons given for preferring logical decision theories (LDT), and particularly functional decision theory (FDT), is that such agents do better on real-world problems. Indeed, the article here on logical decision theory opens by discussing voting. I recently posted a discussion of a hypothetical where FDT agents perform worse, but I think that when applied in practice to the real-world case of voting, which is often given as an example, CDT actually comes out better (see here for Eliezer Yudkowsky's discussion of voting under different decision theories, where he argues that logical decision theory does better). In particular, I think that for most people this discussion gets wrong what causal decision theory would actually recommend.
To begin (note: I spend a few paragraphs going over how to model voting decisions and utilities under CDT, and later discuss practical agent-to-agent comparisons), let us imagine the expected utility for an agent under CDT of voting in some election. Let's say there are two candidates; like Yudkowsky, I will use The Simpsons' Kang and Kodos. If Kang wins, we have some expected outcome (O1); if Kodos wins, we have some expected outcome (O2). Let's say our agent is a Kang supporter and has a positive evaluation of O1 such that O1 > O2.[1]
Our agent is evaluating the value of voting for Kang (A1) or not voting (A0).
In the simplest case, with no externalities, an EDT agent would say: "we should vote if the evidence indicates voting is more likely to lead to Kang winning" (i.e., if P(O1|A1) > P(O1|A0)). A CDT agent would say "we should vote if there is a positive probability that our vote will cause Kang to win" (we can say this works out equivalently: if P(O1|A1) > P(O1|A0), we should vote).
If we are simplistic agents, in both cases we should vote, as in either case the value is something greater than zero. But of course, realistically we are not such simple agents, and there is some cost to voting. Taking one more step of complexity and stopping there is where I think Yudkowsky (and others) go wrong. They correctly note that for most real-world scenarios the probabilistic effect of a single vote is de minimis and humans have some cost associated with voting.
For the CDT agent, they expect there is some probability of their vote being pivotal (we can say P(pivotal)=P(O1|A1)-P(O1|A0) ). They also have some cost of voting (say E). So really, they should vote if P(pivotal) * (O1-O2) > E. That is to say, if the probability of their vote being pivotal, times the difference in expected outcomes caused by their decision to vote, is greater than the cost.
This leads to some sensible recommendations (i.e., you should be more likely to vote the less costly it is to vote, the more impactful the outcome of the election, and the more likely it is your vote will be pivotal). If I am a policy maker and want to increase voting, I should use the policy levers at my disposal to reduce E, and political campaigners should emphasize the impact of the election and the odds of voters affecting results to increase turnout. This is what we observe in the real world.
Where Yudkowsky says CDT gets it wrong, however, is that, as mentioned, P(pivotal) is vanishingly small. While I framed P(pivotal) as the difference in odds between voting and not voting, for a CDT agent this could also be reduced to the odds of the candidates tying but for your vote. Obviously, it is incredibly rare that major elections come down to single voters. EDT agents don't have to make this reduction, so they fare slightly better under uncertainty, but they would still value the difference in odds as very small. Yudkowsky says this misses the mark. But what if we take our model one step further? Our agent is a person, and people place real value on things other than strict outcomes.
When someone says "it is your civic duty to vote" they are appealing to a real value we can include in our utility functions: people value being members of civic society and participating in it. In addition, there are social benefits to voting in the form of signaling; people proudly display 'I voted' stickers all the time. Nor is this independent of the election itself: the more contested an election and the more meaningful the outcomes, the more valuable signaling is.
We can say P(pivotal) is a function of the degree an election is contested and general voting population (number of people and associated behaviors). Similarly, we may say the value of signaling in an election is a function of social values and how contested an election can be.
So we can say a CDT agent under real-world conditions should vote if P(pivotal) * (O1 - O2) + personal utility (e.g., the value of seeing oneself as a civically engaged person) + social utility (e.g., benefits from signaling that you are civically engaged) > E.
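To make this concrete, here is a minimal sketch of the decision rule in Python. The function name and all parameter values are my own illustrative assumptions, not empirical estimates.

```python
# Minimal sketch of the CDT voting rule described above.
# All numbers are illustrative assumptions, not empirical estimates.

def cdt_should_vote(p_pivotal, outcome_gap, personal_utility, social_utility, cost_of_voting):
    """Vote iff the expected benefit exceeds the cost of voting (E).

    p_pivotal:        P(O1|A1) - P(O1|A0), the chance your vote is decisive
    outcome_gap:      O1 - O2, how much better the preferred outcome is
    personal_utility: value placed on being a civically engaged person
    social_utility:   value of signaling (stickers, social approval, etc.)
    cost_of_voting:   E, the time and effort cost of actually voting
    """
    expected_benefit = p_pivotal * outcome_gap + personal_utility + social_utility
    return expected_benefit > cost_of_voting

# A vanishingly small pivotal probability can still be outweighed by
# personal and social value: 0.01 + 20 + 10 > 15.
print(cdt_should_vote(p_pivotal=1e-8, outcome_gap=1_000_000,
                      personal_utility=20, social_utility=10,
                      cost_of_voting=15))  # True
```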
This expanded rule again leads to sensible recommendations. If I think well of myself as a civically minded person, value civic contributions, and have social connections for whom signaling my voting behavior will provide social benefits, that should all increase my odds of voting. Similarly, for policy makers and political activists, increasing the civic-mindedness and social value placed on voting, being publicly seen to promote and reward those who vote, etc., can be recommended as ways to increase agents' propensity to vote.
As mentioned above, if I am a CDT agent deciding whether to vote, I have to answer the question: "is my value from voting (which includes my personal values around voting, my expectation of the likelihood of my vote being pivotal, and my expectations for the different outcomes of the election) greater than my anticipated cost of voting?" One's expectation of the vote being pivotal can be determined by examining polls and voting models and making empirically based estimates of the outcomes (these also likely affect one's expectation of the value of signaling). This can simply be understood as trying to estimate what the causal outcome of my vote is likely to be. The empirical side is readily and concretely definable given my estimated values, and while values and signaling effects must be individually parsed, they are usually fairly straightforward for most people.
So where does LDT/FDT differ? As Yudkowsky has put it, instead of asking what your decision's outcome will be, "You ask what would happen if people like you voted." This gives an obvious recommendation absent under CDT: "the more people that are similar to you, the more you should vote." This recommendation does not match as neatly with intuition (at least not with my intuition) and, in fact, implicitly seems to run counter to Yudkowsky's earlier statement that under LDT it would still be irrational to vote "if you don't expect any of the elections to be close." Under LDT, if a lot of people like you (which might be heuristically judged by people voting for the same candidate as you) are voting, that would seem to provide more evidence that you should vote, and all the more so the more dominant your side of the election is. This hinges on a fairly evident open question: who constitutes "people like you"? According to Yudkowsky, this is "just an empirical question," but is it really? His 2016 post on LessWrong gives multiple different ways you might think about who constitutes someone similar to you. There doesn't seem to be any unified way for an agent to actually make that estimate. Should we perhaps use polls or previous results, or maybe just personal estimates of how many people might think similarly to us under our theory of mind? Those questions seem unanswered, if answerable at all, and depending on how you understand them they can lead to contradictory, unintuitive voting patterns.
So, I am an FDT agent. I want to estimate whether I should vote or not. I look at past polling data, 40% of Kang supporters voted in the past. 50% of Kodos voters voted. I expect that the odds of Kodos winning are, say, 80% at present and the odds of my vote being pivotal is 0.000001%. I know a handful of people who think of themselves as LDT agents, most of them have told me they decided not to vote. How can I calculate my EV from this? I don't think there really is any clear way to quantify it, but let's consider a few possibilities qualitatively; should I:
a. Say that since I know other LDT agents decided not to vote, assume we are similar and that LDT recommends not voting in this decision?
b. Say few people are like me, most people won't reason the way I do, so the odds of the result being different are de minimis, and not vote on those grounds?
c. See that most people with similar values to me are not voting while some are, note that I identify more with those who are voting, and therefore estimate that the people most similar to me are already voting, so that if people similar to me voted the outcome would likely be the same. On that basis, should I decide that being an agent that votes has little value and decide not to vote?
d. Say that there are a lot of Kang supporters not voting, that to some extent we are similar (our voting behavior evidentially has some correlation with each other), and that if they voted we would have a good chance of winning, so I should imagine the counterfactual where Kang has arbitrarily higher turnout and vote on that basis?
I don't think there is any clear way of judging these scenarios.
CDT gives a pretty clear recommendation. My expected value for voting for Kang is 0.00000001*(O1-O2) + personal utility + social utility - E (cost of voting). If I evaluate that positively, under my personal internal values of personal utility and social utility and whatever the costs are, I should vote. If I do not, I shouldn't.
Let's say Kang was running against Lisa in the primaries. As polls increasingly show turnout for her is low, she is no longer competitive in the election. However, by some definition of people similar to me, if people similar to me were to turn out for her she could still have some probability of winning, and she would be so preferable to Kang and Kodos that I choose to vote for her anyway.[2] This doesn't seem very sensible. A CDT agent would value voting for her (despite knowing that in this specific case she has no chance of winning) if they expected the signaling benefit of their vote to be sufficiently great to justify 'throwing away' their vote on a candidate who they know has already lost. There are cases where this might be justified: if one expects the odds of their vote being pivotal to be incredibly low, and the difference between O1 and O2 to be fairly negligible, then a protest vote can be considered somewhat rational under some signaling estimates. That calculation seems to me far more pragmatic and meaningful than the reasoning in FDT, which implicitly requires treating something you know to be false as if it could be true in order to determine whether to vote.
In this case, for a CDT agent, the expected value of voting can simply be evaluated as personal utility + social utility - E.
CDT seems to give clear recommendations that can be readily evaluated and that do at least a serviceable job of modeling real-world behavior. FDT gives unclear recommendations that, to the extent they can be evaluated, are less helpful. On that basis, it seems to me that CDT actually wins out as a framework for considering whether it is rational to vote.
I have been thinking about this for a while, and will probably formalize it at some point, but would like to get some of your thoughts in case there is some obvious case/background I am missing.
In some more realistic formulations of dilemmas, agents that make decisions under functional decision theory may have generally inferior outcomes to rational agents acting under alternative decision theories (here I am just going to consider causal decision theory), which creates a seeming paradox that I am sure many readers will already expect (though it is not necessarily a true logical paradox).
For this briefer post, I am going to assume people here are at least somewhat familiar with Yudkowsky and Soares' writings on these theories (note: they may well have covered this in some piece I have not read; I cannot claim to have read everything they have written).
Consider a modification of Parfit's Hitchhiker. The classic situation looks at it from the perspective of the hitchhiker making a binary decision to honor a deal or not. Let's instead imagine a scenario, which one might call "Parfit's Decent Driver," that looks at how a driver and hitchhiker might interact under more realistic (but still extremely simplified) utility functions (represented as dollars for ease). My initial conception of the scenario runs as follows:
A rational agent (whether CDT or FDT won't change this part), Derek, is driving down an empty road. He sees Will, a person he knows well but has no particular ties to, on the side of the road with nothing on him. Derek recognizes that Will will die of thirst if he does nothing. It would cost Derek $5 to pick up Will, but he gets some utility from saving a life and expects some external signaling effects, which he values at a total of $6. Derek values being truthful to an extreme degree: let's say he values being true to his word at $1,000,000 (this is just to simplify his maximal utility calculations; it shouldn't change the outcomes we care about).
Derek knows with a high degree of confidence (which we will assume is correct) that Will values his own life at $1,000,000. Derek also knows Will values being honest and paying back commitments at $200. Derek knows Will won't be able to pay him anything now, but knows that when they arrive at a town Will would be able to pay an arbitrarily high amount.
Derek knows Will has no ability to negotiate or change his preferred method of decision making during their interaction. (Edit: to be clear, per Firmament's point, we are also assuming that under this asymmetry Will is only aware of his own expected value, and that, as under Parfit's dilemma, he expects that if he were an agent that would lie and not pay Derek back, Derek would leave him to die.)
Derek knows Will is also a rational agent that will make decisions by evaluating the expected value he calculates. Derek also knows how Will evaluates expected value (we will imagine it under either causal decision theory or functional decision theory).
Derek decides he will pull over and offer to give Will a ride back, if and only if Will promises to give him $X when they return to town. How much should Derek ask Will to pay him? What is the value Derek should ask for if he knows Will makes decisions under FDT? What value should he ask for if he knows Will makes decisions under CDT?
Evidently, Derek wants to give Will a ride: his expected utility is positive for giving Will a ride and negative for leaving Will to die. The only question is what amount he should demand to maximize his utility. However, if Will were to reject his offer, Derek's value on truthfulness is such that he would rather leave Will to die (Derek does not want this to happen, so he will not select a scenario where it may occur). Derek's offer, as such, is genuine. Under this scenario, Derek has arbitrarily high confidence in his knowledge of how Will will act, so he will always pick a scenario where Will will truthfully agree to pay him back.
If Derek knows Will makes decisions under CDT, he tells Will, "I will drive you to the nearest town if and only if you agree to give me $199.99 when we arrive back in town." Will agrees. When they get back to town, Will estimates the expected value of paying Derek back at $200 - $199.99 = $0.01, so he does so. Compared to if they hadn't met, Derek ends up $200.99 better off, and Will ends up $999,800.01 better off.
If Derek knows Will makes decisions under FDT, he tells Will, "I will drive you to the nearest town if and only if you agree to give me $1,000,199.99 when we arrive back in town." When they arrive back, Will estimates his expected value of paying Derek back at $1,000,200 - $1,000,199.99 = $0.01, so he does so. Compared to if they hadn't met, Derek ends up $1,000,200.99 better off, and Will ends up $0.01 better off.
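As a sanity check on the arithmetic above, here is a small sketch of the payoff logic; the dollar figures are the utilities stipulated in the scenario, and the helper functions are mine.

```python
# Sketch of the price-setting logic in "Parfit's Decent Driver".
# The dollar figures are the utilities stipulated in the scenario,
# treating dollars as utility throughout, as the post does.

LIFE_VALUE   = 1_000_000   # Will's value on his own life
HONESTY      = 200         # Will's value on keeping a promise
RIDE_COST    = 5           # Derek's cost of giving the ride
RESCUE_VALUE = 6           # Derek's value of saving a life plus signaling

def will_pays(price, decision_theory):
    """Will's decision, back in town, on whether to hand over the agreed price."""
    if decision_theory == "CDT":
        # Only the causal consequences of paying matter now; the ride already happened.
        return HONESTY - price > 0
    # FDT evaluates the payment policy and the rescue as one package.
    return (LIFE_VALUE + HONESTY) - price > 0

def derek_gain(price, decision_theory):
    """Derek's gain from a deal he is confident will be honored."""
    assert will_pays(price, decision_theory)
    return price - RIDE_COST + RESCUE_VALUE

print(derek_gain(199.99, "CDT"))        # 200.99
print(derek_gain(1_000_199.99, "FDT"))  # 1000200.99
```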
Now, let's imagine an earlier decision from Will's perspective. Will is evaluating whether to adopt CDT or FDT as his method of evaluating expected value, knowing this scenario will happen.
Under CDT he evaluates that deciding to be a CDT agent will make him better off by $999,800, so he decides to be a CDT agent.
Under FDT he evaluates that CDT agents will be better off by $999,800, so he decides to be a CDT agent.
This is not per se a logical contradiction, but it seems rather counterintuitive that in some scenarios, given the choice, it would be rational to select the alternative model.
As outsiders, we may also say there are additional ways CDT seems to outperform FDT here. First and most obviously, Will just does better. But it also creates better outcomes as a whole. For many of us, it seems unfair that Derek would price a service that costs him $5 at $1,000,199.99. Of course, the service (in both cases) is a net positive (if the service isn't performed, Will dies and neither party gets the positive externalities), but if Derek expects Will to weigh honesty and dishonesty under CDT, his pricing decisions seem to lead to a fairer distribution of the surplus value: Derek's recognition in the CDT scenario that Will is more willing to be dishonest actually leaves him less able to leverage his asymmetric advantage over Will.
My knowledge of decision theory comes exclusively from reading relevant LessWrong posts when the mood takes me, but it seems to me that FDT-Will would instead act like this:
Thus, FDT outperforms CDT, since CDT pays $199.99 and FDT pays nothing at all.
You are modeling a slightly different scenario that assumes Will has full knowledge of Derek's price-setting behavior. In that scenario, you are correct, but the scenario is explicitly assuming asymmetry in that regard. Derek knows Will's expected value when setting the scenario, but Will only has information to determine the expected value after the scenario has been set.
From Will's perspective, if he were an agent that accepted the deal, he would get taken back. If he were an agent that rejected it, he wouldn't. Derek has knowledge of Will's decision making in this scenario; the reverse is not true. If the question were "what should Will's behavior be?", Will would win under FDT, since that would indeed imply he had the knowledge and as such should adopt a 'never pay' strategy. But if we calculate his EV from the perspective of Derek setting the price, without assuming Will has full knowledge, Will's EV is positive if he accepts the deal and pays, and negative if he wouldn't pay back (since, as far as Will is concerned, if he wouldn't pay Derek, he would have been left to die), so as a rational agent he should pay Derek.
Edit: to be clear, your scenario also implicitly assumes a degree of asymmetry, just on the side of Will's decision making. If we assume symmetry or allow negotiation, we would expect the agents to negotiate a price somewhere between -$1 and $1,000,200 if they are FDT agents (since those are the prices at which they can both accept the deal as a net positive). If they are CDT agents, we expect them to negotiate somewhere between -$1 and $200 (since those are the prices at which they can both accept the deal as a net positive).
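For what it's worth, here is a rough sketch of those bargaining ranges under the same stipulated utilities (the -$1 floor comes from Derek's $6 rescue value minus his $5 cost); the helper function is mine.

```python
# Rough sketch of the bargaining ranges under symmetric negotiation,
# using the utilities stipulated in the scenario.

RIDE_COST, RESCUE_VALUE = 5, 6
LIFE_VALUE, HONESTY = 1_000_000, 200

def bargaining_range(decision_theory):
    # Derek accepts any price above his net cost of the rescue (5 - 6 = -1),
    # i.e. he would pay up to $1 himself to see Will saved.
    derek_floor = RIDE_COST - RESCUE_VALUE
    # Will's ceiling is whatever payment his decision theory will actually honor.
    will_ceiling = HONESTY if decision_theory == "CDT" else LIFE_VALUE + HONESTY
    return derek_floor, will_ceiling

print(bargaining_range("FDT"))  # (-1, 1000200)
print(bargaining_range("CDT"))  # (-1, 200)
```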
Ah, I see! So the scenario favors CDT only because Will lacks full information on the problem. Will thinks he's playing Parfit's Hitchhiker, but in reality he's playing Ultimatum.
I dunno, it doesn't really seem like a fair problem. You could construct a problem that unfairly favors CDT instead, like "Newcomb's Problem but, unbeknownst to you, if you one-box, you die".
When choosing his decision theory, maybe Will should decide to run FDT and then conservatively make CDT-like actions when he doesn't have full information, if he's expecting to encounter a lot of situations like this.
Exactly, it is unfair to Will under FDT (which is part of why I framed it from Derek's perspective), but I would argue it is a lot closer to what we see in the real world.
Usually there is some asymmetry: people have nuanced utility functions, and when there is some net positive utility, actors try to capture as much of the net benefit for themselves as they can. While dishonest agents can lead to worse outcomes (e.g., in the traditional Parfit's dilemma, someone operating under CDT is simply left to die, with no choice in the matter), unmitigated honesty can lead to an actor being a patsy and taken advantage of. Realistic utility functions generally model and moderate this, I would argue, better than FDT.
One of the main reasons Yudkowsky and Soares give for preferring FDT is that in the real world they view it as generally leading to better outcomes. I am not convinced that is the case where these sorts of unfair situations arise.
You could construct a problem that unfairly favors CDT instead, like "Newcomb's Problem but, unbeknownst to you, if you one-box, you die".
Yes, but that isn't very realistic. The purpose of the construction is to show that more realistic assumptions can lead to FDT being worse off. A more direct comparison to this would be Newcomb's Problem with transparent boxes where the Predictor doesn't actually care if you one-box. But Newcomb's Problem doesn't seem very realistic on its face.
This framing, I would argue, more closely resembles how we actually handle asymmetric problems. Pricing decisions are based on willingness to pay, contracts are devised with incentives around what we anticipate will make the cost of cheating greater than the cost of being honest, etc. These observed behaviors align better with an understanding of expected value under causal decision theory.
If we assumed an agent that always has at least symmetric information and predictive ability, I think it is fair to say that for such an agent, FDT would win out. But in reality that is rarely the case. Amazon knows more about your purchase history and pricing decisions than you know about Amazon.
Hm, I guess it's an empirical question then, of whether these situations happen in real life often enough to warrant using CDT or FDT-acting-like-CDT. I think FDT still wins out in the end, because FDT will emulate CDT if it realizes it lives in an FDT-hostile world (thus taking CDT-like actions out of an abundance of caution), while CDT has a harder time emulating FDT (CDT would need to use precommitment, while FDT does not).
I think that, in real life, humans are already near-optimal on this. People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT. Issues occur when a person's community changes or if they move to a new community, but that problem is perhaps outside of the scope of decision theory.
So the moral of the story is "run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems"
And, of course, the point of LessWrong decision theory is to choose which one to give to a superintelligence (I think), since humans can't actually choose their own decision theory (for the most part). And superintelligences probably won't end up on the short side of an information asymmetry very often.
You could construct a problem that unfairly favors CDT instead, like "Newcomb's Problem but, unbeknownst to you, if you one-box, you die".
Oops, meant "unfairly favors FDT instead; if you two-box, you die". It is indeed not very realistic, but it is possible to construct an isomorphic question in a more realistic setting, much like how Parfit's Hitchhiker is a more realistic yet isomorphic framing of Newcomb's Problem. So, again, it's an empirical question of which decision theory is favored in situations that occur in real life.
Claude Opus 4.6 Extended weighs in with what I believe to be insightful and correct critique (oneshot/no selection; told to "focus on logical errors"; I'm "Poster 2" and you're "Poster 1")
There are a few errors and confusions here that neither poster fully identifies:
The honesty parameter is doing most of the work, and it's doing different work for each DT. Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment—without it, CDT-Will in town reasons "the ride already happened, paying has no causal effect on anything I care about" and pays $0. Under FDT, the honesty value is nearly irrelevant ($200 on top of $1,000,000 from counterfactual reasoning). So the comparison isn't really "FDT vs CDT"—it's "FDT's counterfactual reasoning makes the full life-value extractable vs. CDT needs an exogenous honesty parameter bolted on to make any value extractable." Poster 1 has quietly smuggled in a mechanism that makes CDT behave cooperatively for reasons that have nothing to do with CDT's actual decision-theoretic properties, then compared the magnitude of extractable surplus.
If you set honesty to $0, CDT-Will pays nothing and Derek (knowing this) never stops. FDT-Will still pays up to $1M and Derek extracts near-maximal surplus. The comparison reverses in valence: now CDT-Will dies and FDT-Will lives. This is just... standard Parfit's Hitchhiker again.
The "Will chooses his decision theory" argument is incoherent as a critique of FDT. Poster 1 frames it as a problem that FDT-Will, given the choice, would "choose to be a CDT agent." But that's just FDT working as intended. FDT says "implement whatever algorithm leads to best outcomes." If in this specific information-asymmetric environment the CDT-like algorithm dominates, FDT recommends it. This isn't FDT losing—it's FDT being a meta-theory that can recommend CDT-like behavior when appropriate. CDT can't do the reverse; it can't say "in Newcomb-like problems, implement FDT-like behavior" because CDT lacks the counterfactual reasoning to even identify when that's beneficial.
Poster 2 gestures at this ("FDT can emulate CDT") but doesn't make the sharper point that this "emulation" is just FDT functioning correctly at the meta-level.
The information asymmetry is contrived in a way that's load-bearing but presented as "realistic." Derek has perfect knowledge of Will's utility function, decision theory, and honesty parameter, while Will has zero strategic information about Derek's pricing. Poster 1 claims this is more realistic than Newcomb, but a scenario requiring one-sided perfect utility-function-level mind-reading is not obviously closer to real-world bargaining. In actual markets both parties have partial information and the dynamics look very different. The "Amazon knows more about you" analogy is a much weaker asymmetry than "Amazon knows your exact utility function and decision theory with arbitrarily high confidence."
An underexplored FDT subtlety with partial information. Under FDT, Will reasons about what happens when copies/simulations of his algorithm are evaluated by other agents. Poster 1 claims Will can't do this because he doesn't know Derek is modeling him—but this is confused about how FDT works. FDT-Will doesn't need to know the specific modeling; he reasons "I am implementing algorithm A. In all environments where A is evaluated—including by potential predictors—what are its outputs?" The question of whether FDT requires knowledge that you're being modeled to apply counterfactual reasoning is a genuinely deep question about FDT's foundations, and it's where the real philosophical action is. Neither poster engages with it.
Basically: poster 1 has constructed a scenario where the honesty parameter does the heavy lifting for CDT and the information asymmetry does the heavy lifting against FDT, then concluded that CDT "wins"—but the lesson is closer to "bargaining power determines surplus distribution, and agents who can credibly commit get better deals in symmetric info but are more exploitable under asymmetric info." Which is true but is a general feature of commitment, not a special failure of FDT.
Hm, I guess it's an empirical question then
I sort of agree, but I don't think it is one we can strictly answer. I gave some reasons we might think (in real-world scenarios) that CDT tends to better explain behavior (e.g., Braess's paradox), though I do not believe we will ever have enough data to answer it for all time.
People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT.
I disagree. I think the difference between high- and low-trust societies can largely be attributed to differences in utility functions and signaling effects. This appears empirically true (high-trust societies tend to have more information on each other's activities, making signaling more meaningful, tend to poll as valuing trust, etc.).
Also, the problem isn't "FDT-hostile" per se. Derek is just trying to maximize his utility; he only cares what value he thinks he can get Will to honestly pay him. FDT does worse because it recommends taking the deal in Parfit's Hitchhiker and doesn't have a posterior restraint on honest signaling.
If Derek were a misanthrope, Will could lose under CDT. If Derek valued saving Will's life at -$195 (instead of +$6), Derek would leave CDT-Will to die and still save FDT-Will. This is why I stipulated that Derek has to be a decent person: he prefers a scenario where everyone wins to one where he wins and Will dies.
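A quick worked check of that misanthrope variant (same stipulated utilities, with Derek's rescue value swapped from +$6 to -$195; the arithmetic sketch is mine):

```python
# Misanthrope variant: Derek values saving Will's life at -$195 instead of +$6.
RIDE_COST = 5
MISANTHROPE_VALUE = -195

def derek_gain(max_extractable_price):
    # Derek only offers a deal he expects to be honored, so the most he can ask
    # is just under what Will's decision theory will actually pay back in town.
    return max_extractable_price - RIDE_COST + MISANTHROPE_VALUE

print(derek_gain(199.99))        # -0.01      -> Derek leaves CDT-Will to die
print(derek_gain(1_000_199.99))  # 999999.99  -> Derek still saves FDT-Will
```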
Parfit's Hitchhiker is a more realistic yet isomorphic framing of Newcomb's Problem
Minor nitpick, but Parfit's Hitchhiker is isomorphic to Newcomb's Problem under CDT, and to Newcomb's Problem with transparent boxes under both CDT and EDT, but not to Newcomb's Problem in general. CDT doesn't care about evidential probability, but if you don't know what is in the boxes, EDT says you should act probabilistically.
So the moral of the story is "run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems"
I mean, I guess that is not unfair, but it seems bad. It would imply honest signals are good for CDT agents and dishonest signals are good for FDT agents.
Also, Claude is wrong or missing the point.
Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment
Yes, this is the point. In the real world, there is some value at which people are unwilling to cheat and some value at which they are willing to cheat. More people, on seeing someone drop a $5 bill, would return it to the person than would return a dropped $100 bill. This is a pretty trivial observation. Honesty in the real world depends on external signaling effects (i.e., people know others will be more honest with them if they are seen to be honest with others) and individual values (i.e., people are more honest because they ascribe some nebulous value to being honest, and people who ascribe a greater value to honesty will be more honest than those who ascribe less).
If you set honesty to $0, CDT-Will pays nothing
Derek still saves him. If you set all externalities to nothing, then yes. But the point of the hypothetical is to assume more normal human values. Most humans don't like killing people and don't like being dishonest.
FDT-Will still pays up to $1M and Derek extracts near-maximal surplus.
Which seems suboptimal. Extracting all of the surplus value for little effort doesn't seem like a good thing. In society we ideally want parties to be able to negotiate how best to divide the surplus according to various principles and social values. FDT-Will can be taken advantage of because he doesn't have realistic human values. Similarly, in the typical formulation the driver can leave the hitchhiker to die, and the hitchhiker will be dishonest at any dollar figure, because they have unrealistic utility functions.
The information asymmetry is contrived in a way that's load-bearing but presented as "realistic."
I mean, less so than in the original or in the inverse from Will's perspective. I explained why I would argue it is generally realistic; it is an extreme case. In truly realistic scenarios, I would expect Derek to simply extract more from FDT-Will to a differing degree depending on how confident he was that Will would pay him back. If Derek thought Will was CDT-Will, he would have to base his ask on the estimated utility to Will of paying him back once saved, which would be the $200. If he expected Will was FDT-Will, he would have to do so based on his estimate of Will's functional utility for the total scenario, which would be $1,000,200. Realistically, it would be under $1,000,000, since he would expect Will to model some cutoff to get a better deal. That cutoff would fall between $0 and $1,000,200 depending on his relative estimates. If there were less asymmetry, he might estimate he could only be reasonably confident that Will would pay $5,000. But that would still leave FDT-Will worse off, since his bargaining position assumes a much larger stake than CDT-Will's.
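To illustrate the point about confidence, here is a toy sketch of Derek's expected gain as a function of how confident he is that a promise of a given size would actually be honored; the model of non-repayment and the probabilities are placeholders of mine, not claims about any particular agent.

```python
# Toy sketch: Derek's expected gain from asking for `price`, given his
# confidence p_pay that Will would actually honor a promise of that size.
# The probabilities are made-up placeholders, not claims about any agent.

RIDE_COST, RESCUE_VALUE = 5, 6

def expected_gain(price, p_pay):
    # With probability (1 - p_pay), Derek gives the ride but is never repaid.
    return p_pay * price - RIDE_COST + RESCUE_VALUE

print(expected_gain(1_000_199.99, p_pay=1.0))    # ~1000201: arbitrarily high confidence
print(expected_gain(1_000_199.99, p_pay=0.001))  # ~1001: little confidence in a huge promise
print(expected_gain(5_000.00, p_pay=0.95))       # 4751: a smaller, more credible ask
```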
As I said in my OP, my aim was to make the scenario more realistic while still keeping most of the simplifying assumptions. You can add more variables to make it more realistic; the more you add, the more complex it becomes to model.
Poster 1 claims Will can't do this because he doesn't know Derek is modeling him
That wasn't what I was claiming. Did you have the same confusion? The problem Will has is that he is deciding whether to pay Derek based on the assumption that, if he were the kind of agent that wouldn't pay Derek, he would not have been saved (i.e., exactly how FDT models Will's decision under the traditional Parfit's dilemma).
This necessitates he has some understanding of Derek's decision making, if he had no reason to think Derek cared either way, he would be fine lying in both the original Parfit's Dilemma and my 'Decent Driver' version.
The question of whether FDT requires knowledge that you're being modeled to apply counterfactual reasoning is a genuinely deep question about FDT's foundations, and it's where the real philosophical action is.
No, it isn't; FDT plainly doesn't require that. FDT gives different outputs when you are being modeled only insofar as how the agent is being modeled affects how the agent assesses the values at issue.
I sort of agree, but I don't think it is one we can strictly answer. I gave some reasons we might think (in real-world scenarios) that CDT tends to better explain behavior (e.g., Braess's paradox), though I do not believe we will ever have enough data to answer it for all time.
Indeed, whatever humans do seems to be closer to CDT than other decision theories, although humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT.
I disagree. I think the difference between high- and low-trust societies can largely be attributed to differences in utility functions and signaling effects. This appears empirically true (high-trust societies tend to have more information on each other's activities, making signaling more meaningful, tend to poll as valuing trust, etc.).
The differences in utility functions are the humans' way of implementing FDT (since FDT is too hard to reason about for evolution to instill it directly), and the signaling effects/mutual knowledge are what makes FDT worth it.
Really, in this scenario, the fact that Will values honesty and promise-keeping means that CDT-Will is implementing a decision theory somewhere between CDT and FDT. FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200. Your argument seems to be that, due to uncertainties in real life, the optimal value to place on honesty is somewhere between zero and infinity, but not at either extreme. Which is true.
Also, the problem isn't "FDT-hostile" per se. Derek is just trying to maximize his utility; he only cares what value he thinks he can get Will to honestly pay him. FDT does worse because it recommends taking the deal in Parfit's Hitchhiker and doesn't have a posterior restraint on honest signaling.
If Derek were a misanthrope, Will could lose under CDT. If Derek valued saving Will's life at -$195 (instead of +$6), Derek would leave CDT-Will to die and still save FDT-Will. This is why I stipulated that Derek has to be a decent person: he prefers a scenario where everyone wins to one where he wins and Will dies.
I guess FDT-hostile is too strong a way to put it, since it implies the problem is an unfair problem. But as @papetoast said, there are some problems that FDT does better on, and some that CDT does better on (like the variant where Derek is misanthropic), and this one is one that CDT does better on.
Parfit's Hitchhiker is a more realistic yet isomorphic framing of Newcomb's Problem
Minor nitpick, but Parfit's Hitchhiker is isomorphic to Newcomb's Problem under CDT, and to Newcomb's Problem with transparent boxes under both CDT and EDT, but not to Newcomb's Problem in general. CDT doesn't care about evidential probability, but if you don't know what is in the boxes, EDT says you should act probabilistically.
Oops, I wasn't aware that there was a distinction between the transparent-box version and the opaque-box version. Thank you for the correction.
So the moral of the story is "run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems"
I mean, I guess that is not unfair, but it seems bad. It would imply honest signals are good for CDT agents and dishonest signals are good for FDT agents.
I am confused as to what you mean. CDT and FDT-emulating-CDT act the same, so they're equally honest and get equally as much benefit from honesty. Is this about this specific problem? Or all similar problems? But this doesn't seem to be load-bearing, so, whatever.
Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment
Yes, this is the point. In the real world, there is some value at which people are unwilling to cheat and some value at which they are willing to cheat. More people, on seeing someone drop a $5 bill, would return it to the person than would return a dropped $100 bill. This is a pretty trivial observation. Honesty in the real world depends on external signaling effects (i.e., people know others will be more honest with them if they are seen to be honest with others) and individual values (i.e., people are more honest because they ascribe some nebulous value to being honest, and people who ascribe a greater value to honesty will be more honest than those who ascribe less).
True, the $200 honesty-value seems to be there just to make CDT act more-FDT-like.
If you set honesty to $0, CDT-Will pays nothing
Derek still saves him
Dang, you're right, I really should have noticed that.
But the point of the hypothetical is to assume more normal human values. Most humans don't like killing people and don't like being dishonest.
Wait, does the $200 honesty-value actually matter here? It doesn't seem like it changes the results of the hypothetical if you remove it, and removing it would make it easier to reason about.
FDT-Will still pays up to $1M and Derek extracts near-maximal surplus.
Which seems suboptimal. Extracting all of the surplus value for little effort doesn't seem like a good thing. In society we ideally want parties to be able to negotiate how best to divide the surplus according to various principles and social values. FDT-Will can be taken advantage of because he doesn't have realistic human values. Similarly, in the typical formulation the driver can leave the hitchhiker to die, and the hitchhiker will be dishonest at any dollar figure, because they have unrealistic utility functions.
It seems that if Derek didn't value honesty so highly that he wouldn't stick to his first offer and they would be able to come to a fairer deal. But this would be bad for Derek.
If tremendously valuing honesty is equivalent to FDT in this scenario (which it roughly seems to be but only because everyone makes a bunch of promises at the start in the desert), then in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic). So since Derek ends up with all the utility here, I guess FDT is good if you're Derek and only bad if you're Will. But I haven't thought about this enough.
As I said in my OP, my comment was to make more realistic but still keep most of the simplifying assumptions. You can add more variables to make it more realistic--the more you add the more complex it becomes to model.
I was focused on the low-level claims so I neglected to paste that top-level comment into Claude's chat; oops. Anyways, I don't see the value in making the scenario complex. If the goal is to show a flaw in FDT, then that flaw will manifest in a simple scenario, which would be easier to reason about. But I guess if the goal is to show what should be done by a real human pragmatically, then complexity might be fine.
Poster 1 claims Will can't do this because he doesn't know Derek is modeling him
That wasn't what I was claiming. Did you have the same confusion? The problem Will has is that he is deciding whether to pay Derek based on the assumption that, if he were the kind of agent that wouldn't pay Derek, he would not have been saved (i.e., exactly how FDT models Will's decision under the traditional Parfit's dilemma).
Well, Derek is modeling Will on two levels. Derek is modeling what prices out of all possible prices Will would pay at, and Derek is modeling whether Will will pay the price that Derek actually decides. Will is only aware of the latter level, but isn't aware of the price-setting that Derek was doing before. So Will can't effectively leverage FDT, since he isn't aware of that first level of modeling.
The question of whether FDT requires knowledge that you're being modeled to apply counterfactual reasoning is a genuinely deep question about FDT's foundations, and it's where the real philosophical action is.
No, it isn't; FDT plainly doesn't require that. FDT gives different outputs when you are being modeled only insofar as how the agent is being modeled affects how the agent assesses the values at issue.
I'm having trouble comprehending this and should probably get some sleep, but it would seem Claude is being weird and overconfident here so I hereby downgrade my overconfident endorsement of Claude's outputs from "insightful and correct" to "looks right in some places but makes mistakes or is overconfident in other places".
humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
I don't think it is approximating FDT. I think it is just different values. Laws and policies may make CDT agents approximate what FDT agents would do without those laws, but that is not what I mean. Real humans have complex sets of desires/utility functions.
The differences in utility functions are the humans' way of implementing FDT
...
FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200.
...
I think this may be a central confusion. You are misunderstanding the hypothetical somewhat. FDT-Will values honesty at $200. He and CDT-Will would both be willing to be dishonest if it got them a >$200 payoff. To take my prior example, if someone dropped a $100 bill, he would return it; but if they dropped >$200, he would pocket it. The reason he is willing to pay up to the value of his life + $200 is that his assessment of the value is not based on how he values honesty; it is only based on how he expects agents like himself to be treated.
True, the $200 honesty-value seems to be there just to make CDT act more-FDT-like.
...
Wait, does the $200 honesty-value actually matter here?
...
Anyways, I don't see the value in making the scenario complex.
Out of order, but I think it is more relevant here. Absent the $200, Will is monetarily better off (since Derek would drive him back anyway). The honesty value is there to show, in realistic scenarios, what factors the other party might use to determine how much to demand of the hitchhiker. In the original, why the driver asks for what he does is ignored. Realistically, people don't set prices at random.
The value of the $200 is meant to show price-setting behavior in a more realistic CDT environment. It is not what makes CDT win. CDT wins in the given scenario because Derek is a decent driver (hence the name). If Derek weren't a decent guy, CDT-Will would still win if (and only if) he valued the signaling + honesty at more than Derek thought driving back was costly. Will loses, in cash, every dollar by which he values honesty more than Derek sees saving him as costly (though that doesn't affect his actual expected-value payoff, just the amount that is actual cash money).
But I agree on the complexity. I guess it would have been better to first present Will as a simple agent with no external values and then show how he would behave under CDT with more realistic values. But the more realistic values are what I'd argue are most relevant for where CDT offers different policy implications.
Oops, I wasn't aware that there was a distinction between the transparent-box version and the opaque-box version. Thank you for the correction.
No worries. The reason the original is interesting is that CDT estimates two-boxing maximizes expected value while EDT estimates one-boxing does. Both EDT and CDT in the transparent case say to two-box. EDT says in the opaque case that if you one-box there is a 99% chance (or whatever probability you apply) of the opaque box having the money, so one-boxing works out to a higher value. But if you can see what's in the boxes, it is no longer an evidential problem, so EDT says to two-box and you get no difference.
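For readers who want the numbers, here is the standard worked comparison (the 99% accuracy and the $1,000/$1,000,000 amounts are the conventional illustrative figures, not anything from this thread):

```python
# EDT in the opaque-box Newcomb problem, with the conventional numbers.
SMALL, BIG = 1_000, 1_000_000
ACCURACY = 0.99  # P(the predictor correctly predicted your actual choice)

# EDT treats the action as evidence about what the predictor put in the box.
ev_one_box = ACCURACY * BIG
ev_two_box = ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)

print(ev_one_box)  # 990000.0 -> EDT one-boxes when the boxes are opaque
print(ev_two_box)  # 11000.0
# With transparent boxes there is no evidential uncertainty left,
# so EDT (like CDT) says to two-box.
```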
I am confused as to what you mean. CDT and FDT-emulating-CDT act the same, so they're equally honest and get equally as much benefit from honesty. Is this about this specific problem? Or all similar problems? But this doesn't seem to be load-bearing, so, whatever.
I will have to think on this, but my first thought is that my previous point on honesty applies. In this scenario, the CDT agent gets a better deal by honestly signaling that he will act as if he values being honest about his payments at $200. The FDT agent can actually do better (as you said in your first response), if there isn't asymmetry, by acting as if he would never repay. That implies to me a dishonest signal. But yeah, it isn't load-bearing; it is somewhat my own intuition.
in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic).
Real-life negotiation often is impossible or extremely costly. Negotiating prices after a hospital visit can cost many man-hours. Most retailers won't allow you to negotiate at all. Setting prices and committing to them is a pretty conventional tactic. But if you allow negotiation, Derek is still likely to get an outsized payment for something he would have done for free. Paying him seems to have some social utility (we want to encourage people to be decent and help others), but limiting it to amounts considered feasible by social values and honest signals (as under CDT) seems likely to lead to better outcomes than making the restraint equivalent to the total value of the interaction (which is the restraint under FDT).
I'm having trouble comprehending this and should probably get some sleep, but it would seem Claude is being weird and overconfident here so I hereby downgrade my overconfident endorsement of Claude's outputs from "insightful and correct" to "looks right in some places but makes mistakes or is overconfident in other places".
Fair enough, same here. Have a good night!
humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
I don't think it is approximating FDT. I think it is just different values. Laws and policies may make CDT agents approximate what FDT agents would do without those laws, but that is not what I mean. Real humans have complex sets of desires/utility functions.
I mean, my intuition says that the right utility function can turn any CDT agent into an FDT agent, and any FDT agent can be described in terms of a CDT agent with a certain utility function. Like, CDT will one-box if it intrinsically values one-boxing in Newcomblike problems. So, a human with weird desires for justice and a human running FDT act the same if you squint.
The differences in utility functions are the humans' way of implementing FDT
...
FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200.
...
I think this may be a central confusion. You are misunderstanding the hypothetical somewhat. FDT-Will values honesty at $200. He and CDT-Will would both be willing to be dishonest if it got them a >$200 payoff. To take my prior example, if someone dropped a $100 bill, he would return it; but if they dropped >$200, he would pocket it. The reason he is willing to pay up to the value of his life + $200 is that his assessment of the value is not based on how he values honesty; it is only based on how he expects agents like himself to be treated.
I'm making a specific claim about this specific scenario. I agree that both CDT-Will and FDT-Will will pick up $300 on the ground and keep it. But in this scenario, back in the desert, all parties involved hash out exactly what they're going to do in the future. So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars, then they would act exactly the same as if they both ran FDT but valued honesty at zero dollars. So the honesty parameter acts as a way to interpolate between CDT and FDT.
in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic).
Real-life negotiation often is impossible or extremely costly. Negotiating prices after a hospital visit can cost many man-hours. Most retailers won't allow you to negotiate at all. Setting prices and committing to them is a pretty conventional tactic. But if you allow negotiation, Derek is still likely to get an outsized payment for something he would have done for free. Paying him seems to have some social utility (we want to encourage people to be decent and help others), but limiting it to amounts considered feasible by social values and honest signals (as under CDT) seems likely to lead to better outcomes than making the restraint equivalent to the total value of the interaction (which is the restraint under FDT).
This seems like the crux of the issue. I think that the whole reason Will is in this mess is because Derek places a value of $1,000,000 on trust for some reason, making him act exactly like an FDT agent. If the tables were turned, and we got rid of the "no negotiation allowed" rule, and Derek was a $0 honesty CDT agent and Will was an FDT agent (or alternatively a CDT agent with a high value on honesty), then Will could say "I precommit to not letting you drive me to town unless you pay me $0.99 right now" (we assume Derek has money on him or that Will is capable of somehow paying Derek a negative amount in town) and then Derek would have no choice but to comply. And if instead both agents ran dishonest CDT, then words mean nothing and Derek would silently drive Will to town for the $1 altruism utility. So the moral of the story is that whoever runs FDT wins, with ties broken by whoever has the information advantage. The magnitude of "winning" is very different, because FDT-Derek-FDT-Will ends up netting Derek a million dollars while FDT-Derek-CDT-Will only gets Derek $199.99, but FDT-Derek wins nevertheless.
I mean, my intuition says that the right utility function can turn any CDT agent into an FDT agent, and any FDT agent can be described in terms of a CDT agent with a certain utility function.
I don't think that follows. The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn't seem very sensible, imo.
So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
In that case, Derek could demand infinite dollars from Will and Will would pay it.
This seems like the crux of the issue. I think that the whole reason Will is in this mess is because Derek places a value of $1,000,000 on trust for some reason, making him act exactly like an FDT agent
If you remove Derek valuing honesty, his optimal decisions work out identically, as I said in the OP. I made him value it trivially highly so I didn't have to include a discussion of those scenarios to show they are suboptimal for Derek, but you can calculate his EV yourself; it will always be less than in the scenarios I described in the OP.
If the tables were turned, and we got rid of the "no negotiation allowed" rule, and Derek was a $0 honesty
Derek's honesty value doesn't affect those scenarios. In a negotiation, the turn order, information asymmetry etc determine who wins.
So the moral of the story is that whoever runs FDT wins
Per the original problem, Derek's optimal move is identical under CDT and FDT. How much he can get from Will is the only variable, and it depends on Will's utility calculations.
This isn't a tiebreaker, it is what value they ascribe to different scenarios. Since CDT-Will's posterior calculation is limited to his causal effects, the value that can be extracted from him is much lower.
The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn't seem very sensible, imo.
What I mean is that you can think of "CDT agent with certain utility function" and "FDT agent" as exactly the same. They're the same concept. So when you say "I don't think it is approximating FDT. I think it is just different values." I reply that "different values" and "approximating FDT" are the exact same thing, at least in the case where the mentioned "different values" are "justice, trust, and honor", in my opinion.
So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
In that case, Derek could demand infinite dollars from Will and Will would pay it.
Well, Will only values his life at a million dollars, so he would rather die than pay more than that. I admit that when I wrote that bit I was mentally conflating "one million" and "infinity" to simplify reasoning. Hopefully all the other shortcuts I'm using don't break anything.
Intuition says the "infinity" here comes from Derek and Will's infinitely accurate predictions. As in, if the predictions were less than infinitely accurate, then you would need less than infinity dollars of honest-value to make CDT act like FDT. Dunno if that's true and it doesn't matter if it does, so, whatever.
[the rest of the reply]
I should have clarified more, oops. I was talking about a minor variation of the scenario where the "negotiation is not possible" restriction is lifted (while still keeping the information asymmetry somehow). In this case, with no other changes, the problem is basically the same, since Derek just says "btw I swear on God almighty that I am not negotiating at all, since this way I get the best outcomes" and then the rest of the scenario plays out the same (as long as we posit that Will's memory of this exchange is magically erased and so FDT-Will doesn't consider changing his behavior to get a better deal)
And meanwhile if Derek's $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
Meanwhile if Derek has $0 honesty and is CDT and he says "btw no negotiation" then FDT-Will can say "no, screw you, we're negotiating or I swear I will bury my head in the sand and die" and Derek will say "oh ok, i can tell that you will keep your promise, nevermind then, let's negotiate". FDT-Will then says "give me $0.99 and I'll let you save my life" and poor CDT-Derek will agree.
The FDT-Derek + FDT-Will case is probably important but it scares me and I don't know how to reason about it. Probably with geometric utility. In this case, if we add a rule saying Derek gets 1 million dollars of utility from being alive, FDT-Derek pays 50 cents to FDT-Will to maximize the logarithm of utility, since Derek gets $1,000,000 + $0.50 and Will gets $1,000,000 + $0.50 and since these numbers are the same utility is maximized which is the best possible output for the function (we are ignoring the honesty-utility here)
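To check the 50-cent arithmetic, here is a tiny sketch (my own illustration; the extra $1,000,000 of utility for Derek from the added rule and the $1 base payoff are the thread's numbers, everything else is assumption, and honesty-utility is ignored): maximizing the product of the two payoffs over the transfer does land on Derek paying Will 50 cents.

```python
from fractions import Fraction

def payoffs(x):
    # x is the amount Will pays Derek (negative means Derek pays Will)
    will = Fraction(1_000_000) - x          # Will: value of being alive, minus the transfer
    derek = Fraction(1_000_001) + x         # Derek: assumed $1,000,000 rule + $1 base + transfer
    return will, derek

def product(x):
    will, derek = payoffs(x)
    return will * derek

# search transfers in whole cents from -$1.00 to +$1.00
candidates = [Fraction(c, 100) for c in range(-100, 101)]
best = max(candidates, key=product)
print(best, payoffs(best))   # -1/2: Derek pays Will 50 cents, both end at $1,000,000.50
```

Since the two payoffs sum to a constant here, the product (and hence the geometric mean) is maximized where they are equal, which is exactly the $1,000,000.50-each split.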
Also, putting this in another post since I think it is a major point: if we assume some cost to bargaining, for Derek this approximates something like a hawk-dove game where Derek gets the first move. Will's game is more complex, as he is operating under information asymmetry, so his play depends on the probabilities he assigns to Derek's responses.
If we consider the value Will pays as X (negative if Derek pays Will), and we assume some cost (C) to both of negotiating, the payoffs work out as follows (I don't know how/if you can put tables into comments, so I just have to write them out):
Payoffs given FDT-Will with Negotiation:
(1) Will accepts the initial offer (for FDT-Will, X = 1,000,199.99):
(2) Will Contests and Derek Accepts (say X = -0.99[1]):
(3) Will and Derek contest over X. X is unspecified under the assumptions; any number where X > (C - 1) and X < (1,000,200 - C) is feasible:
Counterfactual: Derek doesn't offer an amount and Will doesn't contest (X = 0)
Payoffs given CDT-Will with Negotiation:
(1) Will accepts the initial offer (for CDT-Will, X = 199.99, since anything greater wouldn't be paid):
(2) Will Contests and Derek Accepts (X = -0.99):
(3) Will and Derek contest over X. X is unspecified under the assumptions; any number where X > (C - 1) and X < (200 - C) is feasible:
Counterfactual: Derek doesn't offer an amount and Will doesn't contest (X = 0)
While we would need to know Will's probability estimates to actually model how he behaves and what actions he takes, from this it seems rather evident that, under most approximations, CDT-Will is still likely to be better off.
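For what it's worth, here is a rough sketch of how I am computing those payoffs (my reconstruction, under assumptions: the deal formulas of $1,000,200 - X to Will and $1 + X to Derek come from later in this thread, as does the $200 cap on what CDT-Will will actually hand over in town; treating "no deal" as worth $0 to both and applying C to both sides whenever Will contests are my simplifications):

```python
LIFE_PLUS_HONESTY = 1_000_200   # Will's value of surviving plus staying honest
DEREK_BASE = 1                  # Derek's assumed base payoff from doing the drive

def deal_payoffs(x, c=0.0, contested=False):
    """(Will, Derek) payoffs if a deal at price X is reached."""
    cost = c if contested else 0.0
    return LIFE_PLUS_HONESTY - x - cost, DEREK_BASE + x - cost

def feasible_range(pay_cap, c):
    """Prices both sides prefer to no deal: C - 1 < X < pay_cap - C."""
    return (c - DEREK_BASE, pay_cap - c)

# Scenario (1): Will accepts Derek's initial offer.
print(deal_payoffs(1_000_199.99))            # FDT-Will: (~0.01, 1,000,200.99)
print(deal_payoffs(199.99))                  # CDT-Will: (~1,000,000.01, 200.99)
# Scenario (3): the contested range, with a negotiation cost of C = 1.
print(feasible_range(1_000_200, 1.0))        # FDT-Will: X between 0 and 1,000,199
print(feasible_range(200, 1.0))              # CDT-Will: X between 0 and 199
```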
What I mean is that you can think of "CDT agent with certain utility function" and "FDT agent" as exactly the same. They're the same concept.
They are not. A CDT agent is fundamentally doing a different expected value calculation than an FDT agent. This is why they can lead to radically different outcomes.
I should have clarified more, oops. I was talking about a minor variation of the scenario where the "negotiation is not possible" restriction is lifted (while still keeping the information asymmetry somehow).
Okay, play out the scenario. Derek offers to take CDT-Will back for $199.99; what does Will say? Will's expected values are as in the CDT-Will payoff scenarios above.
Now the question becomes "how does Will estimate the payoffs for 3?" What does he expect Derek to do in a negotiation? Etc. If we assume sufficient risk aversion (which I would argue is the most probable case), option 1 is still preferable.
Let's imagine Derek offers to take FDT-Will back for $1,000,199.99. Will's expected values are as in the FDT-Will payoff scenarios above.
FDT-Will has the same problem as CDT-Will. Though, for FDT-Will, unlike CDT-Will, I would argue that under most reasonable assumptions there would be some value for X under 3 that FDT-Will would estimate has a better expected value. Given that, he would try to negotiate a value somewhere between -$1 and $1,000,199.99. Where he would land in that range depends on his risk aversion and how he estimates Derek's responses.
And meanwhile if Derek's $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
I am not convinced this is true. I don't see why FDT-Derek would behave differently. If we assume information symmetry, then you get the same commitment race with their priors.
FDT-Will then says "give me $0.99 and I'll let you save my life"
Why? See above. FDT-Will under your scenario still suffers from information asymmetry. You can argue that option 3 is reasonably the better option for him, but he has no idea what value of X is optimal. We know Derek considers any value > -$0.99 as an expected positive, but Will is operating in an asymmetric environment. He doesn't know what Derek will decide. It seems reasonable that Will might expect Derek to accept some lower amount, but he is going to have to weigh that against the probability that Derek says "no." If he has extreme risk aversion, he will still prefer 1 even if he estimates Derek would likely accept a lower price. If he has no risk aversion and Derek cannot counter-offer, he will offer whatever he expects Derek to accept.
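To make that weighing explicit, here is one way to write it down (illustrative only; the counter-offer, the probability that Derek accepts it, and the extra weight Will puts on dying are hypothetical placeholders, and I am treating "Derek says no" as Will dying):

```python
WILL_DEAL_VALUE = 1_000_200          # FDT-Will: life ($1,000,000) plus honesty ($200)

def value_accept(offer):
    """Take Derek's offer as given: survive for sure and pay the offer."""
    return WILL_DEAL_VALUE - offer

def value_contest(counter_offer, p_derek_accepts, death_disutility=0.0):
    """Counter-offer: survive and pay less with probability p, otherwise die."""
    win = WILL_DEAL_VALUE - counter_offer
    lose = -death_disutility
    return p_derek_accepts * win + (1 - p_derek_accepts) * lose

offer = 1_000_199.99
for p in (0.5, 0.9, 0.99):
    print(p,
          value_accept(offer),                              # tiny but certain
          value_contest(0.99, p),                           # risk-neutral Will
          value_contest(0.99, p, death_disutility=1e9))     # very death-averse Will
```

On these placeholder numbers, a risk-neutral Will contests for almost any probability of acceptance, and it takes a very heavy weight on the death outcome for accepting the initial offer to come out ahead, so which branch you land in really does come down to the risk-aversion assumption.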
and poor CDT-Derek will agree.
Why? Let's lay out CDT-Derek's options.
It seems likely that CDT-Derek would pick some variant of 3, dependent on how he expects FDT-Will to react, which in turn depends on FDT-Will's estimation of CDT-Derek. You would expect to get some race, with FDT-Will trying to determine Derek's utility function. Indeed, if we assume perfect information asymmetry, CDT-Derek's best move is probably to keep saying "I will only take you back for $1,000,199.99" to prevent FDT-Will from getting any information on his utility. If CDT-Derek repeats that until FDT-Will is about to die unless he makes a decision, FDT-Will, having gained no information on Derek's utility, is likely to simply accept when he becomes unable to negotiate further (for the same reasons as above).[3] And, making the standard FDT estimates when he gets to town (i.e., he anticipates that if he weren't the kind of agent that would pay, he would have been left to die), he would honestly pay the $1,000,199.99.
When I use '???' I mean that it is both unspecified under the assumptions of the equations and unknown to the agent. We would need to add additional specifications to the problem to determine the expected payoff for different values of X, and without changing assumptions Will's expected payoff from X would remain unknown to Will. We could add assumptions for Will's estimates (which are not likely to be equivalent to the real payoffs), to determine what Will would estimate the expected payoffs are for different values of X.
I am not including this option for the CDT agent since it is a strictly inferior version of 1: since their payout for honesty is $200, it is trivial that they would always be honest for anything under $199.99.
If there is no such cutoff, they are in a classic battle-of-the-sexes-type problem. FDT-Will's expected payoff from a deal is $1,000,200 - X, while Derek's payoff is $1 + X. It is not clear what FDT-Will's position would have to be for him to expect CDT-Derek to accept a better deal. Any deal from -$0.99 to $1,000,199.99 is feasible under our assumptions (and would be a Nash equilibrium), but we have no reason to expect any particular outcome in that range without adding some assumptions.
I feel like in general, when, unbeknownst to you, a hostile telepath is inspecting you, you are just fucked in arbitrary ways that are decision-theory-agnostic. Completely speaking from intuition, this is very close to (but definitely not identical to) the no-free-lunch theorem, where any DT benefits you in some universes and hurts you in others, in a roughly but probably not exactly symmetric way.
I don't think that is quite true. And as I said elsewhere, when we assume non-telepaths we get FDT losing by amounts dependent on the degree of information asymmetry. In this case, the driver, Derek, is able to capture "as much money as the agent will be willing, upon arriving in town, to pay them to prevent the scenario from happening." For CDT, lacking retro-causality, they will only be willing to pay up to whatever their honesty value and signaling value is (i.e. less than the $200 for Will). For the FDT agent, they will be willing to pay up to whatever they value the totality of the outcomes (live and pay vs. die and don't).
CDT agents being willing to be dishonest in retrospect means there is less value to capture from them. In the real world, we act like CDT agents. If we want people to be more honest, we try to increase the value of honesty and signaling. To prevent people like Derek from capturing this value, we put limits on it.
One could imagine Derek instead demanding they sign a contract before saving them. CDT-Will would now also be willing to pay up to $1,000,200 if that contract were enforceable. But in the real world, if Derek demanded that much, the contract would likely be thrown out under U.S. law as unconscionable, given he demanded in excess of a million dollars for a short car ride.
He could probably get away with demanding a contract of a few hundred or even a few thousand dollars, but if he was charging thousands of times the fair market value for a car ride, any court would likely throw that out. I think that is right; there should be limits on what you can enforce on another party in such situations.
I am again speaking from intuition only and don't want to put more time thinking about this for now. I may not even endorse what I say if I put 5 minutes into thinking.
when we assume non-telepaths we get FDT losing by amounts dependent on the degree of information asymmetry
This seems like a good thing
For CDT, lacking retro-causality, they will only be willing to pay up to whatever their honesty value and signaling value is (i.e. less than the $200 for Will). For the FDT agent, they will be willing to pay up to whatever they value the totality of the outcomes (live and pay vs. die and don't).
This means CDT-Will will die if Derek has a different utility function and is only willing to drive them home for $201+? This is the "other" universes I'm talking about.
In an even more realistic scenario, Will should have a prior for the minimum amount Derek is willing to get to drive them home. I expect this would give FDT-Will somewhat better calculations.
This seems like a good thing
Why?
Take my example with the contracts: I don't think that is actually a good outcome to be able to impose any contract on a disadvantaged party. Having the set of deals you can impose on someone you find at your mercy, so to speak, restricted by what is socially permissible and enforceable seems like a preferable state of affairs. Absent legal/social frameworks, having enforceability limited by agents' values and their willingness to be beholden to deals seems preferable to having no such limits in place.
This means CDT-Will will die if Derek has a different utility function and is only willing to drive them home for $201+? This is the "other" universes I'm talking about.
Yes, if we assume Derek is a misanthrope, he will kill Will if Will is not willing to pay him some amount greater than his misanthropy. But I do not think that is a realistic state of affairs, and I think on the flip side you can get asymmetric information causing FDT agents to behave suboptimally when presented with misanthropic actors.[1]
In an even more realistic scenario, Will should have a prior for the minimum amount Derek is willing to get to drive them home.
In the real world, we are often price takers or price setters and rarely negotiating as equal parties. Will may have a prior in my scenario for what he thinks Derek would be willing to accept. What his prior is, however, is irrelevant, he is not offered that price and doesn't get to proposition Derek. His only choices are "do I accept Derek's offer?" and, once they get to town having accepted the offer, "do I honor the offer?" If he wouldn't honor the offer, Derek wouldn't pick him up, so he dies.
E.g., as the first example that comes to mind, let's say your child has been kidnapped. Your kidnapper just happened to capture your child by pure chance, not intentionally, but you have no way to know that. You think that paying off blackmailers makes it more likely you will be blackmailed. The blackmailer demands a payment (let's say there is an escrow and they cannot cheat), but you, as an FDT agent, decline to negotiate. So the blackmailer kills your kid and disappears. A CDT agent pays the blackmailer, not considering the effect their decision may have on the odds of their being blackmailed. Unlike the decent driver case, which assumes a lack of information, this requires a genuine mistake on the FDT agent's part for them to be truly worse off. Edit: though you can get individual agents to be worse off under FDT in the standard blackmail dilemma, for this case I am pre-assuming true randomness, in which case FDT would pay if they believed it was truly random, but would still refuse to pay if they were acting under the (in this case mistaken) assumption that agents who didn't pay would be extremely unlikely to be blackmailed.
I think you are intuiting the question of "which DT is better" using the real world too heavily in a sort of "I think a world where people all do this is better" -> "this DT is better" way. You can't just hope things work out this way.
This seems like a good thing
I don't think that is actually a good outcome to be able to impose any contract on a disadvantaged party
Yes, that's why you use laws / precommitments to prevent it. I guess I used "good" and that misled you a bit; I think it is game-theoretically good, not morally ideal.
But I do not think that is a realistic state of affairs, and I think on the flip side you can get asymmetric information causing FDT agents to behave suboptimally when presented with misanthropic actors.
As I said, this is very close to the no free lunch theorem where any DT benefits you in some universes and hurts you in others. I fully expect you can construct a situation including a hostile telepath where DT A outperforms DT B for any A/B.
What his prior is, however, is irrelevant, he is not offered that price and doesn't get to proposition Derek.
We are assuming Derek knows everything about Will right? So if Will changes his strategy based on his prior then Derek knows that too.
I think you are intuiting the question of "which DT is better" using the real world too heavily in a sort of "I think a world where people all do this is better" -> "this DT is better" kind of way. You can't just hope things work out this way.
Mostly fair; as I think you said elsewhere, I think I misunderstood you as making a value claim when you meant better in some other sense.
But one of the main reasons Yud and Soares give for preferring FDT over CDT is a belief that FDT leads to better outcomes. That is what I find unconvincing. It seems to me that, with more realistic assumptions, CDT better models observations (e.g., Braess's paradox, to use an example I gave elsewhere) and can lead to better outcomes. That was my central thesis. I do agree that it is usually trivial to conceive of scenarios where any given theory loses to another in some sense.
Yes, that's why you use laws / precommitments to prevent it
Yes, but I would argue it is good to have mediating forces outside of laws. Derek can get either of them to sign a contract beforehand for $1,000,199, but only FDT would say that they should honor that contract absent any mechanism to enforce it. While I don't think it can be proven, it seems sensible that, before considering enforcement mechanisms, we should decide whether to honor contracts based on how much we value honesty, the associated signals, and other such considerations. It seems less sensible to say we should honor them based solely on value estimates of the entire scenario they fall under. It also seems sensible, if we include enforcement mechanisms, that such mechanisms be set up to prevent people from breaking contracts that are generally deemed reasonable, while preventing unconscionable conditions from being imposed even on agents that rationally consented to them (as would be the case with the agents consenting to a $1,000,200 contract).
We are assuming Derek knows everything about Will right? So if Will changes his strategy based on his prior then Will knows that too.
You mean Derek knows it, right? But it doesn't change Will's value calculation, so it shouldn't change his strategy a priori, even if he had a prior for what he thinks Derek would accept. He would change his decision if we assumed he knew how Derek was likely to set prices and adapted his strategy based on that, though.
While I didn't explore the general case, which would be easier to do if/when I formalize the dilemma, my intuition from the specific thought experiment is that, when an agent with some non-zero social values faces an adversary that is less than optimally adversarial (as is more common in the real world) and that has an asymmetric advantage over the agent (e.g., in the above case the adversary is a price-setter with an ability to predict the agent's decisions), CDT agents generally do better than FDT agents.
From a policy perspective, it also seems more reasonable to model agents under CDT; policies aimed at aligning agents' causal expectations with optimal social outcomes seem more effective at addressing, e.g., free riders. Of course, decision theory looks at problems from the agent's perspective, not how we should assume agents are likely to act from a policy perspective, but part of its theoretical utility is in developing models of behavior which can serve a policy perspective. And from there, CDT, which is commonly implicit in agent-based models, seems to work out better in practice. Braess's paradox, which is derived from real-world observations, is easily explained if we assume agents make decisions under CDT, but it wouldn't occur if we assumed agents acted under FDT.
Edit: to be clear, Braess's paradox not occurring would be a good thing; if people made driving decisions in a way that optimized overall traffic, that would be better. But in the world we live in, it does occur. Also, it is noteworthy that if we imagine individual FDT agents, their utility would likely be unchanged and Braess's paradox would still occur, since they would make their decisions based on the empirically observed behavior of others, who don't operate under FDT.
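For anyone who wants the Braess claim concrete, here is a minimal sketch using the standard textbook network (my illustration; the 4,000 drivers and link costs are the usual example numbers, not anything from the OP). Each driver myopically switches to whichever route is currently fastest given everyone else's behavior, which is roughly the CDT-style choice I have in mind:

```python
N = 4000  # drivers

def route_times(flows, shortcut):
    a, b, c = flows                      # top route, bottom route, via-shortcut route
    top = (a + c) / 100 + 45             # congestible Start-A link + fixed 45-min A-End link
    bottom = 45 + (b + c) / 100          # fixed 45-min Start-B link + congestible B-End link
    via = (a + c) / 100 + (b + c) / 100  # both congestible links joined by the zero-cost shortcut
    return (top, bottom, via) if shortcut else (top, bottom, float("inf"))

def equilibrium(shortcut):
    flows = [N, 0, 0]
    for _ in range(100_000):             # best-response dynamics, one driver per step
        t = route_times(flows, shortcut)
        worst = max(range(3), key=lambda i: t[i] if flows[i] > 0 else -1)
        best = min(range(3), key=lambda i: t[i])
        if t[worst] - t[best] < 1e-9:
            break                        # no one wants to switch any more
        flows[worst] -= 1
        flows[best] += 1
    t = route_times(flows, shortcut)
    avg = sum(f * ti for f, ti in zip(flows, t) if f) / N
    return flows, avg

print(equilibrium(shortcut=False))   # -> ([2000, 2000, 0], 65.0): shortcut closed
print(equilibrium(shortcut=True))    # -> ([0, 0, 4000], 80.0): shortcut open, everyone slower
```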