Do logical decision theories actually give meaningfully better recommendations on real-world problems, particularly the frequently referenced case of voting?
One of the main reasons given for preferring logical decision theories (LDT), and particularly functional decision theory (FDT), is that such agents do better on real-world problems. Indeed, the article here on logical decision theory opens by discussing voting. I recently posted a discussion of a hypothetical where FDT agents perform worse, but I think that when applied in practice to the real-world case of voting, which is often given as an example, CDT actually comes out better (see here for Eliezer Yudkowsky's discussion of voting under different decision theories, where he argues that logical decision theory does better). In particular, I think that for most people this discussion gets wrong what causal decision theory would actually recommend.
To begin (note: I spend a few paragraphs going over how to model voting decisions and utilities under CDT, and later discuss practical agent-to-agent comparisons), let us imagine the expected utility for an agent under CDT of voting in some election. Let's say there are two candidates; like Yudkowsky, I will use The Simpsons' Kang and Kodos. If Kang wins, we have some expected outcome (O1); if Kodos wins, we have some expected outcome (O2). Let's say our agent is a Kang supporter and has a positive evaluation of O1 such that O1 > O2.[1]
Our agent is evaluating the value of voting for Kang (A1) or not voting (A0).
In the simplest case, with no externalities, an EDT agent would say: "we should vote if the evidence indicates voting is more likely to lead to Kang winning" (i.e., if P(O1|A1) > P(O1|A0)). A CDT agent would say "we should vote if there is a positive probability that our vote will cause Kang to win" (we can say this works out equivalently: if P(O1|A1) > P(O1|A0), we should vote).
If we are simplistic agents, in both cases we should vote, as in either case the value is something greater than zero. But of course, realistically we are not such simple agents, and there is some cost to voting. Taking one more step of complexity and stopping there is where I think Yudkowsky (and others) go wrong. They correctly note that for most real-world scenarios the probabilistic effect of a single vote is de minimis and humans have some cost associated with voting.
For the CDT agent, they expect there is some probability of their vote being pivotal (we can say P(pivotal)=P(O1|A1)-P(O1|A0) ). They also have some cost of voting (say E). So really, they should vote if P(pivotal) * (O1-O2) > E. That is to say, if the probability of their vote being pivotal, times the difference in expected outcomes caused by their decision to vote, is greater than the cost.
This leads to some sensible recommendations (i.e., you should be more likely to vote the less costly it is to vote, the more impactful the outcome of the election, and the more likely it is your vote will be pivotal). If I am a policy maker and want to increase voting, I should use the policy levers at my disposal to reduce E, and political campaigners should emphasize the impact of the election and the odds of voters affecting results to increase turnout. This is what we observe in the real world.
Where Yudkowsky says CDT gets it wrong, however, is that, as mentioned, P(pivotal) is vanishingly small. While I framed P(pivotal) as the difference in odds between voting and not voting, for a CDT agent this could also be reduced to the odds of the candidates tying but for your vote. Obviously, it is incredibly rare that major elections come down to single voters. EDT agents don't have to make this reduction, so they fare slightly better under uncertainty, but they would still value the difference in odds as very small. Yudkowsky says this misses the mark. But what if we take our model one step further? Our agent is a person, and people place real value on things other than strict outcomes.
When someone says "it is your civic duty to vote" they are appealing to a real value we can include in our utility functions: people value being members of civic society and participating in it. In addition, there are social benefits to voting in the form of signaling; people proudly display 'I voted' stickers all the time. Nor is this independent of the election itself: the more contested an election and the more meaningful the outcomes, the more valuable signaling is.
We can say P(pivotal) is a function of the degree an election is contested and general voting population (number of people and associated behaviors). Similarly, we may say the value of signaling in an election is a function of social values and how contested an election can be.
So we can say a CDT agent under real-world conditions should vote if P(pivotal) * (O1 - O2) + personal utility (e.g., the value of seeing oneself as a civically engaged person) + social utility (e.g., benefits from signaling that you are civically engaged) > E.
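To make this concrete, here is a minimal sketch of the decision rule in Python. The function name and all parameter values are my own illustrative assumptions, not empirical estimates.

```python
# Minimal sketch of the CDT voting rule described above.
# All numbers are illustrative assumptions, not empirical estimates.

def cdt_should_vote(p_pivotal, outcome_gap, personal_utility, social_utility, cost_of_voting):
    """Vote iff the expected benefit exceeds the cost of voting (E).

    p_pivotal:        P(O1|A1) - P(O1|A0), the chance your vote is decisive
    outcome_gap:      O1 - O2, how much better the preferred outcome is
    personal_utility: value placed on being a civically engaged person
    social_utility:   value of signaling (stickers, social approval, etc.)
    cost_of_voting:   E, the time and effort cost of actually voting
    """
    expected_benefit = p_pivotal * outcome_gap + personal_utility + social_utility
    return expected_benefit > cost_of_voting

# A vanishingly small pivotal probability can still be outweighed by
# personal and social value: 0.01 + 20 + 10 > 15.
print(cdt_should_vote(p_pivotal=1e-8, outcome_gap=1_000_000,
                      personal_utility=20, social_utility=10,
                      cost_of_voting=15))  # True
```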
This expanded rule again leads to sensible recommendations. If I think well of myself as a civically minded person, value civic contributions, and have social connections for whom signaling my voting behavior will provide social benefits, that should all increase my odds of voting. Similarly, for policy makers and political activists, increasing the civic-mindedness and social value placed on voting, being publicly seen to promote and reward those who vote, etc., can be recommended as ways to increase agents' propensity to vote.
As mentioned above, if I am a CDT agent deciding whether to vote, I have to answer the question: "is my value from voting (which includes my personal values around voting, my expectation of the likelihood of my vote being pivotal, and my expectations for the different outcomes of the election) greater than my anticipated cost of voting?" One's expectation of the vote being pivotal can be determined by examining polls and voting models and making empirically based estimates of the outcomes (these also likely affect one's expectation of the value of signaling). This can simply be understood as trying to estimate what the causal outcome of my vote is likely to be. The empirical side is readily and concretely definable given my estimated values, and while values and signaling effects must be individually parsed, they are usually fairly straightforward for most people.
So where does LDT/FDT differ? As Yudkowsky has put it, instead of asking what your decision's outcome will be, "You ask what would happen if people like you voted." This gives an obvious recommendation absent under CDT: "the more people that are similar to you, the more you should vote." This recommendation does not match as neatly with intuition (at least not with my intuition) and, in fact, implicitly seems to run counter to Yudkowsky's earlier statement that under LDT it would still be irrational to vote "if you don't expect any of the elections to be close." Under LDT, if a lot of people like you (which might be heuristically judged by people voting for the same candidate as you) are voting, that would seem to provide more evidence that you should vote, and all the more so the more dominant your side of the election is. This hinges on a fairly evident open question: who constitutes "people like you"? According to Yudkowsky, this is "just an empirical question," but is it really? His 2016 post on LessWrong gives multiple different ways you might think about who constitutes someone similar to you. There doesn't seem to be any unified way for an agent to actually make that estimate. Should we perhaps use polls or previous results, or maybe just personal estimates of how many people might think similarly to us under our theory of mind? Those questions seem unanswered, if answerable at all, and depending on how you understand them they can lead to contradictory, unintuitive voting patterns.
So, I am an FDT agent. I want to estimate whether I should vote or not. I look at past polling data, 40% of Kang supporters voted in the past. 50% of Kodos voters voted. I expect that the odds of Kodos winning are, say, 80% at present and the odds of my vote being pivotal is 0.000001%. I know a handful of people who think of themselves as LDT agents, most of them have told me they decided not to vote. How can I calculate my EV from this? I don't think there really is any clear way to quantify it, but let's consider a few possibilities qualitatively; should I:
a. Say that since I know other LDT agents decided not to vote, assume we are similar and that LDT recommends not voting in this decision?
b. Say few people are like me, most people won't reason the way I do, so the odds of the result being different are de minimis, and not vote on those grounds?
c. See that most people with similar values to me are not voting while some are, note that I identify more with those who are voting, and therefore estimate that the people most similar to me are already voting, so that if people similar to me voted the outcome would likely be the same. On that basis, should I decide that being an agent that votes has little value and decide not to vote?
d. Say that there are a lot of Kang supporters not voting, that to some extent we are similar (our voting behavior evidentially has some correlation with each other), and that if they voted we would have a good chance of winning, so I should imagine the counterfactual where Kang has arbitrarily higher turnout and vote on that basis?
I don't think there is any clear way of judging these scenarios.
CDT gives a pretty clear recommendation. My expected value for voting for Kang is 0.00000001*(O1-O2) + personal utility + social utility - E (cost of voting). If I evaluate that positively, under my personal internal values of personal utility and social utility and whatever the costs are, I should vote. If I do not, I shouldn't.
Let's say Kang was running against Lisa in the primaries. As polls increasingly show turnout for her is low, she is no longer competitive in the election. However, by some definition of people similar to me, if people similar to me were to turn out for her she could still have some probability of winning, and she would be so preferable to Kang and Kodos that I choose to vote for her anyway.[2] This doesn't seem very sensible. A CDT agent would value voting for her (despite knowing that in this specific case she has no chance of winning) if they expected the signaling benefit of their vote to be sufficiently great to justify 'throwing away' their vote on a candidate who they know has already lost. There are cases where this might be justified: if one expects the odds of their vote being pivotal to be incredibly low, and the difference between O1 and O2 to be fairly negligible, then a protest vote can be considered somewhat rational under some signaling estimates. That calculation seems to me far more pragmatic and meaningful than the reasoning in FDT, which implicitly requires treating something you know to be false as if it could be true in order to determine whether to vote.
In this case, for a CDT agent, the expected value of voting can simply be evaluated as personal utility + social utility - E.
CDT seems to give clear recommendations that can be readily evaluated and that do at least a serviceable job of modeling real-world behavior. FDT gives unclear recommendations that, to the extent they can be evaluated, are less helpful. On that basis, it seems to me that CDT actually wins out as a framework for considering whether it is rational to vote.
I have been thinking about this for a while, and will probably formalize it at some point, but would like to get some of your thoughts in case there is some obvious case/background I am missing.
In some more realistic formulations of dilemmas, agents that make decisions under functional decision theory may have generally inferior outcomes to rational agents acting under alternative decision theories (here I am just going to consider causal decision theory), which creates a seeming paradox that I am sure many readers will already expect (though it is not necessarily a true logical paradox).
For this briefer post, I am going to assume people here are at least somewhat familiar with Yudkowsky and Soares' writings on these theories (note: they may well have covered this in some piece I have not read; I cannot claim to have read everything they have written).
Consider a modification of Parfit's Hitchhiker. The classic situation looks at it from the perspective of the hitchhiker making a binary decision to honor a deal or not. Let's instead imagine a scenario, which one might call "Parfit's Decent Driver," that looks at how a driver and hitchhiker might interact under more realistic (but still extremely simplified) utility functions (represented as dollars for ease). My initial conception of the scenario runs as follows:
A rational agent (whether CDT or FDT won't change this part), Derek, is driving down an empty road. He sees Will, a person he knows well but has no particular ties to, on the side of the road with nothing on him. Derek recognizes that Will will die of thirst if he does nothing. It would cost Derek $5 to pick up Will, but he gets some utility from saving a life and expects some external signaling effects, which he values at a total of $6. Derek values being truthful to an extreme degree: let's say he values being true to his word at $1,000,000 (this is just to simplify his maximal utility calculations; it shouldn't change the outcomes we care about).
Derek knows with a high degree of confidence (which we will assume is correct) that Will values his own life at $1,000,000. Derek also knows Will values being honest and paying back commitments at $200. Derek knows Will won't be able to pay him anything now, but knows that when they arrive at a town Will would be able to pay an arbitrarily high amount.
Derek knows Will has no ability to negotiate or change his preferred method of decision making during their interaction. (Edit: to be clear, per Firmament's point, we are also assuming that under this asymmetry Will is only aware of his own expected value, and that, as under Parfit's dilemma, he expects that if he were an agent that would lie and not pay Derek back, Derek would leave him to die.)
Derek knows Will is also a rational agent that will make decisions by evaluating the expected value he calculates. Derek also knows how Will evaluates expected value (we will imagine it under either causal decision theory or functional decision theory).
Derek decides he will pull over and offer to give Will a ride back, if and only if Will promises to give him $X when they return to town. How much should Derek ask Will to pay him? What is the value Derek should ask for if he knows Will makes decisions under FDT? What value should he ask for if he knows Will makes decisions under CDT?
Evidently, Derek wants to give Will a ride: his expected utility is positive for giving Will a ride and negative for leaving Will to die. The only question is what amount he should demand to maximize his utility. However, if Will were to reject his offer, Derek's value on truthfulness is such that he would rather leave Will to die (Derek does not want this to happen, so he will not select a scenario where it may occur). Derek's offer, as such, is genuine. Under this scenario, Derek has arbitrarily high confidence in his knowledge of how Will will act, so he will always pick a scenario where Will will truthfully agree to pay him back.
If Derek knows Will makes decisions under CDT, he tells Will, "I will drive you to the nearest town if and only if you agree to give me $199.99 when we arrive back in town." Will agrees. When they get back to town, Will estimates the expected value of paying Derek back at $200 - $199.99 = $0.01, so he does so. Compared to if they hadn't met, Derek ends up $200.99 better off, and Will ends up $999,800.01 better off.
If Derek knows Will makes decisions under FDT, he tells Will, "I will drive you to the nearest town if and only if you agree to give me $1,000,199.99 when we arrive back in town." When they arrive back, Will estimates his expected value of paying Derek back at $1,000,200 - $1,000,199.99 = $0.01, so he does so. Compared to if they hadn't met, Derek ends up $1,000,200.99 better off, and Will ends up $0.01 better off.
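As a sanity check on the arithmetic above, here is a small sketch of the payoff logic; the dollar figures are the utilities stipulated in the scenario, and the helper functions are mine.

```python
# Sketch of the price-setting logic in "Parfit's Decent Driver".
# The dollar figures are the utilities stipulated in the scenario,
# treating dollars as utility throughout, as the post does.

LIFE_VALUE   = 1_000_000   # Will's value on his own life
HONESTY      = 200         # Will's value on keeping a promise
RIDE_COST    = 5           # Derek's cost of giving the ride
RESCUE_VALUE = 6           # Derek's value of saving a life plus signaling

def will_pays(price, decision_theory):
    """Will's decision, back in town, on whether to hand over the agreed price."""
    if decision_theory == "CDT":
        # Only the causal consequences of paying matter now; the ride already happened.
        return HONESTY - price > 0
    # FDT evaluates the payment policy and the rescue as one package.
    return (LIFE_VALUE + HONESTY) - price > 0

def derek_gain(price, decision_theory):
    """Derek's gain from a deal he is confident will be honored."""
    assert will_pays(price, decision_theory)
    return price - RIDE_COST + RESCUE_VALUE

print(derek_gain(199.99, "CDT"))        # 200.99
print(derek_gain(1_000_199.99, "FDT"))  # 1000200.99
```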
Now, let's imagine an earlier decision from Will's perspective. Will is evaluating whether to adopt CDT or FDT as his method of evaluating expected value, knowing this scenario will happen.
Under CDT he evaluates that deciding to be a CDT agent will make him better off by $999,800, so he decides to be a CDT agent.
Under FDT he evaluates that CDT agents will be better off by $999,800, so he decides to be a CDT agent.
This is not per se a logical contradiction, but it seems rather counterintuitive that in some scenarios, given the choice, it would be rational to select the alternative model.
As outsiders, we may also say there are additional ways CDT seems to outperform FDT here. First and most obviously, Will just does better. But it also creates better outcomes as a whole. For many of us, it seems unfair that Derek would price a service that costs him $5 at $1,000,199.99. Of course, the service (in both cases) is a net positive (if the service isn't performed, Will dies and neither party gets the positive externalities), but if Derek expects Will to weigh honesty and dishonesty under CDT, his pricing decisions seem to lead to a fairer distribution of the surplus value: Derek's recognition in the CDT scenario that Will is more willing to be dishonest actually leaves him less able to leverage his asymmetric advantage over Will.
My knowledge of decision theory comes exclusively from reading relevant LessWrong posts when the mood takes me, but it seems to me that FDT-Will would instead act like this:
Thus, FDT outperforms CDT, since CDT pays $199.99 and FDT pays nothing at all.
You are modeling a slightly different scenario that assumes Will has full knowledge of Derek's price-setting behavior. In that scenario, you are correct, but the scenario is explicitly assuming asymmetry in that regard. Derek knows Will's expected value when setting the scenario, but Will only has information to determine the expected value after the scenario has been set.
From Will's perspective, if he were an agent that accepted the deal, he would get taken back. If he were an agent that rejected it, he wouldn't. Derek has knowledge of Will's decision making in this scenario; the reverse is not true. If the question were "what should Will's behavior be?", Will would win under FDT, since that would indeed imply he had the knowledge and as such should adopt a 'never pay' strategy. But if we calculate his EV from the perspective of Derek setting the price, without assuming Will has full knowledge, Will's EV is positive if he accepts the deal and pays, and negative if he wouldn't pay back (since, as far as Will is concerned, if he wouldn't pay Derek, he would have been left to die), so as a rational agent he should pay Derek.
Edit: to be clear, your scenario also implicitly assumes a degree of asymmetry, just on the side of Will's decision making. If we assume symmetry or allow negotiation, we would expect the agents to negotiate a price somewhere between -$1 and $1,000,200 if they are FDT agents (since those are the prices at which they can both accept the deal as a net positive). If they are CDT agents, we expect them to negotiate somewhere between -$1 and $200 (since those are the prices at which they can both accept the deal as a net positive).
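For what it's worth, here is a rough sketch of those bargaining ranges under the same stipulated utilities (the -$1 floor comes from Derek's $6 rescue value minus his $5 cost); the helper function is mine.

```python
# Rough sketch of the bargaining ranges under symmetric negotiation,
# using the utilities stipulated in the scenario.

RIDE_COST, RESCUE_VALUE = 5, 6
LIFE_VALUE, HONESTY = 1_000_000, 200

def bargaining_range(decision_theory):
    # Derek accepts any price above his net cost of the rescue (5 - 6 = -1),
    # i.e. he would pay up to $1 himself to see Will saved.
    derek_floor = RIDE_COST - RESCUE_VALUE
    # Will's ceiling is whatever payment his decision theory will actually honor.
    will_ceiling = HONESTY if decision_theory == "CDT" else LIFE_VALUE + HONESTY
    return derek_floor, will_ceiling

print(bargaining_range("FDT"))  # (-1, 1000200)
print(bargaining_range("CDT"))  # (-1, 200)
```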
Ah, I see! So the scenario favors CDT only because Will lacks full information on the problem. Will thinks he's playing Parfit's Hitchhiker, but in reality he's playing Ultimatum.
I dunno, it doesn't really seem like a fair problem. You could construct a problem that unfairly favors CDT instead, like "Newcomb's Problem but, unbeknownst to you, if you one-box, you die".
When choosing his decision theory, maybe Will should decide to run FDT and then conservatively make CDT-like actions when he doesn't have full information, if he's expecting to encounter a lot of situations like this.
Exactly, it is unfair to Will under FDT (which is part of why I framed it from Derek's perspective), but I would argue it is a lot closer to what we see in the real world.
Usually there is some asymmetry: people have nuanced utility functions, and when there is some net positive utility, actors try to capture as much of the net benefit for themselves as they can. While dishonest agents can lead to worse outcomes (e.g., in the traditional Parfit's dilemma, someone operating under CDT is simply left to die, with no choice in the matter), unmitigated honesty can lead to an actor being a patsy and taken advantage of. Realistic utility functions generally model and moderate this, I would argue, better than FDT.
One of the main reasons Yudkowsky and Soares give for preferring FDT is that in the real world they view it as generally leading to better outcomes. I am not convinced that is the case where these sorts of unfair situations arise.
You could construct a problem that unfairly favors CDT instead, like "Newcomb's Problem but, unbeknownst to you, if you one-box, you die".
Yes, but that isn't very realistic. The purpose of the construction is to show that more realistic assumptions can lead to FDT being worse off. A more direct comparison to this would be Newcomb's Problem with transparent boxes where the Predictor doesn't actually care if you one-box. But Newcomb's Problem doesn't seem very realistic on its face.
This framing, I would argue, more closely resembles how we actually handle asymmetric problems. Pricing decisions are based on willingness to pay, contracts are devised with incentives around what we anticipate will make the cost of cheating greater than the cost of being honest, etc. These observed behaviors align better with an understanding of expected value under causal decision theory.
If we assumed an agent that always has at least symmetric information and predictive ability, I think it is fair to say that for such an agent, FDT would win out. But in reality that is rarely the case. Amazon knows more about your purchase history and pricing decisions than you know about Amazon.
Hm, I guess it's an empirical question then, of whether these situations happen in real life often enough to warrant using CDT or FDT-acting-like-CDT. I think FDT still wins out in the end, because FDT will emulate CDT if it realizes it lives in an FDT-hostile world (thus taking CDT-like actions out of an abundance of caution), while CDT has a harder time emulating FDT (CDT would need to use precommitment, while FDT does not).
I think that, in real life, humans are already near-optimal on this. People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT. Issues occur when a person's community changes or if they move to a new community, but that problem is perhaps outside of the scope of decision theory.
So the moral of the story is "run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems"
And, of course, the point of LessWrong decision theory is to choose which one to give to a superintelligence (I think), since humans can't actually choose their own decision theory (for the most part). And superintelligences probably won't end up on the short side of an information asymmetry very often.
You could construct a problem that unfairly favors CDT instead, like "Newcomb's Problem but, unbeknownst to you, if you one-box, you die".
Oops, meant "unfairly favors FDT instead; if you two-box, you die". It is indeed not very realistic, but it is possible to construct an isomorphic question in a more realistic setting, much like how Parfit's Hitchhiker is a more realistic yet isomorphic framing of Newcomb's Problem. So, again, it's an empirical question of which decision theory is favored in situations that occur in real life.
Claude Opus 4.6 Extended weighs in with what I believe to be insightful and correct critique (oneshot/no selection; told to "focus on logical errors"; I'm "Poster 2" and you're "Poster 1")
There are a few errors and confusions here that neither poster fully identifies:
The honesty parameter is doing most of the work, and it's doing different work for each DT. Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment—without it, CDT-Will in town reasons "the ride already happened, paying has no causal effect on anything I care about" and pays $0. Under FDT, the honesty value is nearly irrelevant ($200 on top of $1,000,000 from counterfactual reasoning). So the comparison isn't really "FDT vs CDT"—it's "FDT's counterfactual reasoning makes the full life-value extractable vs. CDT needs an exogenous honesty parameter bolted on to make any value extractable." Poster 1 has quietly smuggled in a mechanism that makes CDT behave cooperatively for reasons that have nothing to do with CDT's actual decision-theoretic properties, then compared the magnitude of extractable surplus.
If you set honesty to $0, CDT-Will pays nothing and Derek (knowing this) never stops. FDT-Will still pays up to $1M and Derek extracts near-maximal surplus. The comparison reverses in valence: now CDT-Will dies and FDT-Will lives. This is just... standard Parfit's Hitchhiker again.
The "Will chooses his decision theory" argument is incoherent as a critique of FDT. Poster 1 frames it as a problem that FDT-Will, given the choice, would "choose to be a CDT agent." But that's just FDT working as intended. FDT says "implement whatever algorithm leads to best outcomes." If in this specific information-asymmetric environment the CDT-like algorithm dominates, FDT recommends it. This isn't FDT losing—it's FDT being a meta-theory that can recommend CDT-like behavior when appropriate. CDT can't do the reverse; it can't say "in Newcomb-like problems, implement FDT-like behavior" because CDT lacks the counterfactual reasoning to even identify when that's beneficial.
Poster 2 gestures at this ("FDT can emulate CDT") but doesn't make the sharper point that this "emulation" is just FDT functioning correctly at the meta-level.
The information asymmetry is contrived in a way that's load-bearing but presented as "realistic." Derek has perfect knowledge of Will's utility function, decision theory, and honesty parameter, while Will has zero strategic information about Derek's pricing. Poster 1 claims this is more realistic than Newcomb, but a scenario requiring one-sided perfect utility-function-level mind-reading is not obviously closer to real-world bargaining. In actual markets both parties have partial information and the dynamics look very different. The "Amazon knows more about you" analogy is a much weaker asymmetry than "Amazon knows your exact utility function and decision theory with arbitrarily high confidence."
An underexplored FDT subtlety with partial information. Under FDT, Will reasons about what happens when copies/simulations of his algorithm are evaluated by other agents. Poster 1 claims Will can't do this because he doesn't know Derek is modeling him—but this is confused about how FDT works. FDT-Will doesn't need to know the specific modeling; he reasons "I am implementing algorithm A. In all environments where A is evaluated—including by potential predictors—what are its outputs?" The question of whether FDT requires knowledge that you're being modeled to apply counterfactual reasoning is a genuinely deep question about FDT's foundations, and it's where the real philosophical action is. Neither poster engages with it.
Basically: poster 1 has constructed a scenario where the honesty parameter does the heavy lifting for CDT and the information asymmetry does the heavy lifting against FDT, then concluded that CDT "wins"—but the lesson is closer to "bargaining power determines surplus distribution, and agents who can credibly commit get better deals in symmetric info but are more exploitable under asymmetric info." Which is true but is a general feature of commitment, not a special failure of FDT.
Hm, I guess it's an empirical question then
I sort of agree, but I don't think it is one we can strictly answer. I gave some reasons we might think (in real-world scenarios) that CDT tends to better explain behavior (e.g., Braess's paradox), though I do not believe we will ever have enough data to answer it for all time.
People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT.
I disagree. I think the difference between high- and low-trust societies can largely be attributed to differences in utility functions and signaling effects. This appears empirically true (high-trust societies tend to have more information on each other's activities, making signaling more meaningful, tend to poll as valuing trust, etc.).
Also, the problem isn't "FDT-hostile" per se. Derek is just trying to maximize his utility; he only cares what value he thinks he can get Will to honestly pay him. FDT does worse because it recommends taking the deal in Parfit's Hitchhiker and doesn't have a posterior restraint on honest signaling.
If Derek were a misanthrope, Will could lose under CDT. If Derek valued saving Will's life at -$195 (instead of +$6), Derek would leave CDT-Will to die and still save FDT-Will. This is why I stipulated that Derek has to be a decent person: he prefers a scenario where everyone wins to one where he wins and Will dies.
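A quick worked check of that misanthrope variant (same stipulated utilities, with Derek's rescue value swapped from +$6 to -$195; the arithmetic sketch is mine):

```python
# Misanthrope variant: Derek values saving Will's life at -$195 instead of +$6.
RIDE_COST = 5
MISANTHROPE_VALUE = -195

def derek_gain(max_extractable_price):
    # Derek only offers a deal he expects to be honored, so the most he can ask
    # is just under what Will's decision theory will actually pay back in town.
    return max_extractable_price - RIDE_COST + MISANTHROPE_VALUE

print(derek_gain(199.99))        # -0.01      -> Derek leaves CDT-Will to die
print(derek_gain(1_000_199.99))  # 999999.99  -> Derek still saves FDT-Will
```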
Parfit's Hitchhiker is a more realistic yet isomorphic framing of Newcomb's Problem
Minor nitpick, but Parfit's Hitchhiker is isomorphic to Newcomb's Problem under CDT, and to Newcomb's Problem with transparent boxes under both CDT and EDT, but not to Newcomb's Problem in general. CDT doesn't care about evidential probability, but if you don't know what is in the boxes, EDT says you should act probabilistically.
So the moral of the story is "run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems"
I mean, I guess that is not unfair, but it seems bad. It would imply honest signals are good for CDT agents and dishonest signals are good for FDT agents.
Also, Claude is wrong or missing the point.
Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment
Yes, this is the point. In the real world, there is some value at which people are unwilling to cheat and some value at which they are willing to cheat. More people, on seeing someone drop a $5 bill, would return it to the person than would return a dropped $100 bill. This is a pretty trivial observation. Honesty in the real world depends on external signaling effects (i.e., people know others will be more honest with them if they are seen to be honest with others) and individual values (i.e., people are more honest because they ascribe some nebulous value to being honest, and people who ascribe a greater value to honesty will be more honest than those who ascribe less).
If you set honesty to $0, CDT-Will pays nothing
Derek still saves him. If you set all externalities to nothing, then yes. But the point of the hypothetical is to assume more normal human values. Most humans don't like killing people and don't like being dishonest.
FDT-Will still pays up to $1M and Derek extracts near-maximal surplus.
Which seems suboptimal. Extracting all of the surplus value for little effort doesn't seem like a good thing. In society we ideally want parties to be able to negotiate how best to divide the surplus according to various principles and social values. FDT-Will can be taken advantage of because he doesn't have realistic human values. Similarly, in the typical formulation the driver can leave the hitchhiker to die, and the hitchhiker will be dishonest at any dollar figure, because they have unrealistic utility functions.
The information asymmetry is contrived in a way that's load-bearing but presented as "realistic."
I mean, less so than in the original or in the inverse from Will's perspective. I explained why I would argue it is generally realistic; it is an extreme case. In truly realistic scenarios, I would expect Derek to simply extract more from FDT-Will to a differing degree depending on how confident he was that Will would pay him back. If Derek thought Will was CDT-Will, he would have to base his ask on the estimated utility to Will of paying him back once saved, which would be the $200. If he expected Will was FDT-Will, he would have to do so based on his estimate of Will's functional utility for the total scenario, which would be $1,000,200. Realistically, it would be under $1,000,000, since he would expect Will to model some cutoff to get a better deal. That cutoff would fall between $0 and $1,000,200 depending on his relative estimates. If there were less asymmetry, he might estimate he could only be reasonably confident that Will would pay $5,000. But that would still leave FDT-Will worse off, since his bargaining position assumes a much larger stake than CDT-Will's.
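To illustrate the point about confidence, here is a toy sketch of Derek's expected gain as a function of how confident he is that a promise of a given size would actually be honored; the model of non-repayment and the probabilities are placeholders of mine, not claims about any particular agent.

```python
# Toy sketch: Derek's expected gain from asking for `price`, given his
# confidence p_pay that Will would actually honor a promise of that size.
# The probabilities are made-up placeholders, not claims about any agent.

RIDE_COST, RESCUE_VALUE = 5, 6

def expected_gain(price, p_pay):
    # With probability (1 - p_pay), Derek gives the ride but is never repaid.
    return p_pay * price - RIDE_COST + RESCUE_VALUE

print(expected_gain(1_000_199.99, p_pay=1.0))    # ~1000201: arbitrarily high confidence
print(expected_gain(1_000_199.99, p_pay=0.001))  # ~1001: little confidence in a huge promise
print(expected_gain(5_000.00, p_pay=0.95))       # 4751: a smaller, more credible ask
```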
As I said in my OP, my aim was to make the scenario more realistic while still keeping most of the simplifying assumptions. You can add more variables to make it more realistic; the more you add, the more complex it becomes to model.
Poster 1 claims Will can't do this because he doesn't know Derek is modeling him
That wasn't what I was claiming. Did you have the same confusion? The problem Will has is that he is deciding whether to pay Derek based on the assumption that, if he were the kind of agent that wouldn't pay Derek, he would not have been saved (i.e., exactly how FDT models Will's decision under the traditional Parfit's dilemma).
This necessitates he has some understanding of Derek's decision making, if he had no reason to think Derek cared either way, he would be fine lying in both the original Parfit's Dilemma and my 'Decent Driver' version.
The question of whether FDT requires knowledge that you're being modeled to apply counterfactual reasoning is a genuinely deep question about FDT's foundations, and it's where the real philosophical action is.
No, it isn't; FDT plainly doesn't require that. FDT gives different outputs when you are being modeled only insofar as how the agent is being modeled affects how the agent assesses the values at issue.
I sort of agree, but I don't think it is one we can strictly answer. I gave some reasons we might think (in real-world scenarios) that CDT tends to better explain behavior (e.g., Braess's paradox), though I do not believe we will ever have enough data to answer it for all time.
Indeed, whatever humans do seems to be closer to CDT than other decision theories, although humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT.
I disagree. I think the difference between high- and low-trust societies can largely be attributed to differences in utility functions and signaling effects. This appears empirically true (high-trust societies tend to have more information on each other's activities, making signaling more meaningful, tend to poll as valuing trust, etc.).
The differences in utility functions are the humans' way of implementing FDT (since FDT is too hard to reason about for evolution to instill it directly), and the signaling effects/mutual knowledge are what makes FDT worth it.
Really, in this scenario, the fact that Will values honesty and promise-keeping means that CDT-Will is implementing a decision theory somewhere between CDT and FDT. FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200. Your argument seems to be that, due to uncertainties in real life, the optimal value to place on honesty is somewhere between zero and infinity, but not at either extreme. Which is true.
Also, the problem isn't "FDT-hostile" per se. Derek is just trying to maximize his utility; he only cares what value he thinks he can get Will to honestly pay him. FDT does worse because it recommends taking the deal in Parfit's Hitchhiker and doesn't have a posterior restraint on honest signaling.
If Derek were a misanthrope, Will could lose under CDT. If Derek valued saving Will's life at -$195 (instead of +$6), Derek would leave CDT-Will to die and still save FDT-Will. This is why I stipulated that Derek has to be a decent person: he prefers a scenario where everyone wins to one where he wins and Will dies.
I guess FDT-hostile is too strong a way to put it, since it implies the problem is an unfair problem. But as @papetoast said, there are some problems that FDT does better on, and some that CDT does better on (like the variant where Derek is misanthropic), and this one is one that CDT does better on.
Parfit's Hitchhiker is a more realistic yet isomorphic framing of Newcomb's Problem
Minor nitpick, but Parfit's Hitchhiker is isomorphic to Newcomb's Problem under CDT, and to Newcomb's Problem with transparent boxes under both CDT and EDT, but not to Newcomb's Problem in general. CDT doesn't care about evidential probability, but if you don't know what is in the boxes, EDT says you should act probabilistically.
Oops, I wasn't aware that there was a distinction between the transparent-box version and the opaque-box version. Thank you for the correction.
So the moral of the story is "run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems"
I mean, I guess that is not unfair, but it seems bad. It would imply honest signals are good for CDT agents and dishonest signals are good for FDT agents.
I am confused as to what you mean. CDT and FDT-emulating-CDT act the same, so they're equally honest and get equally as much benefit from honesty. Is this about this specific problem? Or all similar problems? But this doesn't seem to be load-bearing, so, whatever.
Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment
Yes, this is the point. In the real world, there is some value at which people are unwilling to cheat and some value at which they are willing to cheat. More people, on seeing someone drop a $5 bill, would return it to the person than would return a dropped $100 bill. This is a pretty trivial observation. Honesty in the real world depends on external signaling effects (i.e., people know others will be more honest with them if they are seen to be honest with others) and individual values (i.e., people are more honest because they ascribe some nebulous value to being honest, and people who ascribe a greater value to honesty will be more honest than those who ascribe less).
True, the $200 honesty-value seems to be there just to make CDT act more-FDT-like.
If you set honesty to $0, CDT-Will pays nothing
Derek still saves him
Dang, you're right, I really should have noticed that.
But the point of the hypothetical is to assume more normal human values. Most humans don't like killing people and don't like being dishonest.
Wait, does the $200 honesty-value actually matter here? It doesn't seem like it changes the results of the hypothetical if you remove it, and removing it would make it easier to reason about.
FDT-Will still pays up to $1M and Derek extracts near-maximal surplus.
Which seems suboptimal. Extracting all of the surplus value for little effort doesn't seem like a good thing. In society we ideally want parties to be able to negotiate how best to divide the surplus according to various principles and social values. FDT-Will can be taken advantage of because he doesn't have realistic human values. Similarly, in the typical formulation the driver can leave the hitchhiker to die, and the hitchhiker will be dishonest at any dollar figure, because they have unrealistic utility functions.
It seems that if Derek didn't value honesty so highly that he wouldn't stick to his first offer and they would be able to come to a fairer deal. But this would be bad for Derek.
If tremendously valuing honesty is equivalent to FDT in this scenario (which it roughly seems to be but only because everyone makes a bunch of promises at the start in the desert), then in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic). So since Derek ends up with all the utility here, I guess FDT is good if you're Derek and only bad if you're Will. But I haven't thought about this enough.
As I said in my OP, my comment was to make more realistic but still keep most of the simplifying assumptions. You can add more variables to make it more realistic--the more you add the more complex it becomes to model.
I was focused on the low-level claims so I neglected to paste that top-level comment into Claude's chat; oops. Anyways, I don't see the value in making the scenario complex. If the goal is to show a flaw in FDT, then that flaw will manifest in a simple scenario, which would be easier to reason about. But I guess if the goal is to show what should be done by a real human pragmatically, then complexity might be fine.
Poster 1 claims Will can't do this because he doesn't know Derek is modeling him
That wasn't what I was claiming. Did you have the same confusion? The problem Will has is that he is deciding whether to pay Derek based on the assumption that, if he were the kind of agent that wouldn't pay Derek, he would not have been saved (i.e., exactly how FDT models Will's decision under the traditional Parfit's dilemma).
Well, Derek is modeling Will on two levels. Derek is modeling what prices out of all possible prices Will would pay at, and Derek is modeling whether Will will pay the price that Derek actually decides. Will is only aware of the latter level, but isn't aware of the price-setting that Derek was doing before. So Will can't effectively leverage FDT, since he isn't aware of that first level of modeling.
The question of whether FDT requires knowledge that you're being modeled to apply counterfactual reasoning is a genuinely deep question about FDT's foundations, and it's where the real philosophical action is.
No, it isn't; FDT plainly doesn't require that. FDT gives different outputs when you are being modeled only insofar as how the agent is being modeled affects how the agent assesses the values at issue.
I'm having trouble comprehending this and should probably get some sleep, but it would seem Claude is being weird and overconfident here so I hereby downgrade my overconfident endorsement of Claude's outputs from "insightful and correct" to "looks right in some places but makes mistakes or is overconfident in other places".
humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
I don't think it is approximating FDT. I think it is just different values. Laws and policies may make CDT agents approximate what FDT agents would do without those laws, but that is not what I mean. Real humans have complex sets of desires/utility functions.
The differences in utility functions are the humans' way of implementing FDT
...
FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200.
...
I think this may be a central confusion. You are misunderstanding the hypothetical somewhat. FDT-Will values honesty at $200. He and CDT-Will would both be willing to be dishonest if it got them a >$200 payoff. To take my prior example, if someone dropped a $100 bill, he would return it; but if they dropped >$200, he would pocket it. The reason he is willing to pay up to the value of his life + $200 is that his assessment of the value is not based on how he values honesty; it is only based on how he expects agents like himself to be treated.
True, the $200 honesty-value seems to be there just to make CDT act more-FDT-like.
...
Wait, does the $200 honesty-value actually matter here?
...
Anyways, I don't see the value in making the scenario complex.
Out of order, but I think it is more relevant here. Absent the $200, Will is monetarily better off (since Derek would drive him back anyway). The honesty value is there to show, in realistic scenarios, what factors the other party might use to determine how much to demand of the hitchhiker. In the original, why the driver asks for what he does is ignored. Realistically, people don't set prices at random.
The value of the $200 is meant to show price-setting behavior in a more realistic CDT environment. It is not what makes CDT win. CDT wins in the given scenario because Derek is a decent driver (hence the name). If Derek weren't a decent guy, CDT-Will would still win if (and only if) he valued the signaling + honesty at more than Derek thought driving back was costly. Will loses, in cash, every dollar by which he values honesty more than Derek sees saving him as costly (though that doesn't affect his actual expected-value payoff, just the amount that is actual cash money).
But I agree on the complexity. I guess it would have been better to first present Will as a simple agent with no external values and then show how he would behave under CDT with more realistic values. But the more realistic values are what I'd argue are most relevant for where CDT offers different policy implications.
Oops, I wasn't aware that there was a distinction between the transparent-box version and the opaque-box version. Thank you for the correction.
No worries. The reason the original is interesting is that CDT estimates two-boxing maximizes expected value while EDT estimates one-boxing does. Both EDT and CDT in the transparent case say to two-box. EDT says in the opaque case that if you one-box there is a 99% chance (or whatever probability you apply) of the opaque box having the money, so one-boxing works out to a higher value. But if you can see what's in the boxes, it is no longer an evidential problem, so EDT says to two-box and you get no difference.
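For readers who want the numbers, here is the standard worked comparison (the 99% accuracy and the $1,000/$1,000,000 amounts are the conventional illustrative figures, not anything from this thread):

```python
# EDT in the opaque-box Newcomb problem, with the conventional numbers.
SMALL, BIG = 1_000, 1_000_000
ACCURACY = 0.99  # P(the predictor correctly predicted your actual choice)

# EDT treats the action as evidence about what the predictor put in the box.
ev_one_box = ACCURACY * BIG
ev_two_box = ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)

print(ev_one_box)  # 990000.0 -> EDT one-boxes when the boxes are opaque
print(ev_two_box)  # 11000.0
# With transparent boxes there is no evidential uncertainty left,
# so EDT (like CDT) says to two-box.
```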
I am confused as to what you mean. CDT and FDT-emulating-CDT act the same, so they're equally honest and get equally as much benefit from honesty. Is this about this specific problem? Or all similar problems? But this doesn't seem to be load-bearing, so, whatever.
I will have to think on this, but my first thought is that my previous point on honesty applies. In this scenario, the CDT agent gets a better deal by honestly signaling that he will act as if he values being honest about his payments at $200. The FDT agent can actually do better (as you said in your first response), if there isn't asymmetry, by acting as if he would never repay. That implies to me a dishonest signal. But yeah, it isn't load-bearing; it is somewhat my own intuition.
in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic).
Real-life negotiation often is impossible or extremely costly. Negotiating prices after a hospital visit can cost many man-hours. Most retailers won't allow you to negotiate at all. Setting prices and committing to them is a pretty conventional tactic. But if you allow negotiation, Derek is still likely to get an outsized payment for something he would have done for free. Paying him seems to have some social utility (we want to encourage people to be decent and help others), but limiting it to amounts considered feasible by social values and honest signals (as under CDT) seems likely to lead to better outcomes than making the restraint equivalent to the total value of the interaction (which is the restraint under FDT).
I'm having trouble comprehending this and should probably get some sleep, but it would seem Claude is being weird and overconfident here so I hereby downgrade my overconfident endorsement of Claude's outputs from "insightful and correct" to "looks right in some places but makes mistakes or is overconfident in other places".
Fair enough, same here. Have a good night!
humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
I don't think it is approximating FDT. I think it is just different values. Laws and policies may make CDT agents approximate what FDT agents would do without those laws, but that is not what I mean. Real humans have complex sets of desires/utility functions.
I mean, my intuition says that the right utility function can turn any CDT agent into an FDT agent, and any FDT agent can be described in terms of a CDT agent with a certain utility function. Like, CDT will one-box if it intrinsically values one-boxing in Newcomblike problems. So, a human with weird desires for justice and a human running FDT act the same if you squint.
The differences in utility functions are the humans' way of implementing FDT
...
FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200.
...
I think this may be a central confusion. You are misunderstanding the hypothetical somewhat. FDT-Will values honesty at $200. He and CDT-Will would both be willing to be dishonest if it got them a >$200 payoff. To take my prior example, if someone dropped a $100 bill, he would return it; but if they dropped >$200, he would pocket it. The reason he is willing to pay up to the value of his life + $200 is that his assessment of the value is not based on how he values honesty; it is only based on how he expects agents like himself to be treated.
I'm making a specific claim about this specific scenario. I agree that both CDT-Will and FDT-Will will pick up $300 on the ground and keep it. But in this scenario, back in the desert, all parties involved hash out exactly what they're going to do in the future. So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars, then they would act exactly the same as if they both ran FDT but valued honesty at zero dollars. So the honesty parameter acts as a way to interpolate between CDT and FDT.
in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic).
Real-life negotiation often is impossible or extremely costly. Negotiating prices after a hospital visit can cost many man-hours. Most retailers won't allow you to negotiate at all. Setting prices and committing to them is a pretty conventional tactic. But if you allow negotiation, Derek is still likely to get an outsized payment for something he would have done for free. Paying him seems to have some social utility (we want to encourage people to be decent and help others), but limiting it to amounts considered feasible by social values and honest signals (as under CDT) seems likely to lead to better outcomes than making the restraint equivalent to the total value of the interaction (which is the restraint under FDT).
This seems like the crux of the issue. I think that the whole reason Will is in this mess is because Derek places a value of $1,000,000 on trust for some reason, making him act exactly like an FDT agent. If the tables were turned, and we got rid of the "no negotiation allowed" rule, and Derek was a $0 honesty CDT agent and Will was an FDT agent (or alternatively a CDT agent with a high value on honesty), then Will could say "I precommit to not letting you drive me to town unless you pay me $0.99 right now" (we assume Derek has money on him or that Will is capable of somehow paying Derek a negative amount in town) and then Derek would have no choice but to comply. And if instead both agents ran dishonest CDT, then words mean nothing and Derek would silently drive Will to town for the $1 altruism utility. So the moral of the story is that whoever runs FDT wins, with ties broken by whoever has the information advantage. The magnitude of "winning" is very different, because FDT-Derek-FDT-Will ends up netting Derek a million dollars while FDT-Derek-CDT-Will only gets Derek $199.99, but FDT-Derek wins nevertheless.
I mean, my intuition says that the right utility function can turn any CDT agent into an FDT agent, and any FDT agent can be described in terms of a CDT agent with a certain utility function.
I don't think that follows. The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn't seem very sensible, imo.
So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
In that case, Derek could demand infinite dollars from Will and Will would pay it.
This seems like the crux of the issue. I think that the whole reason Will is in this mess is because Derek places a value of $1,000,000 on trust for some reason, making him act exactly like an FDT agent
If you remove Derek valuing honesty, his optimal decisions work out identically, as I said in the OP. I made him value it trivially highly so I didn't have to include a discussion of those scenarios to show they are suboptimal for Derek, but you can calculate his EV yourself; it will always be less than in the scenarios I described in the OP.
If the tables were turned, and we got rid of the "no negotiation allowed" rule, and Derek was a $0 honesty
Derek's honesty value doesn't affect those scenarios. In a negotiation, the turn order, information asymmetry etc determine who wins.
So the moral of the story is that whoever runs FDT wins
Per the original problem, Derek's optimal move is identical under CDT and FDT. How much he can get from Will is the only variable, and it depends on Will's utility calculations.
This isn't a tiebreaker, it is what value they ascribe to different scenarios. Since CDT-Will's posterior calculation is limited to his causal effects, the value that can be extracted from him is much lower.
The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn't seem very sensible, imo.
What I mean is that you can think of "CDT agent with certain utility function" and "FDT agent" as exactly the same. They're the same concept. So when you say "I don't think it is approximating FDT. I think it is just different values." I reply that "different values" and "approximating FDT" are the exact same thing, at least in the case where the mentioned "different values" are "justice, trust, and honor", in my opinion.
So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
In that case, Derek could demand infinite dollars from Will and Will would pay it.
Well, Will only values his life at a million dollars, so he would rather die than pay more than that. I admit that when I wrote that bit I was mentally conflating "one million" and "infinity" to simplify reasoning. Hopefully all the other shortcuts I'm using don't break anything.
Intuition says the "infinity" here comes from Derek and Will's infinitely accurate predictions. As in, if the predictions were less than infinitely accurate, then you would need less than infinity dollars of honest-value to make CDT act like FDT. Dunno if that's true and it doesn't matter if it does, so, whatever.
[the rest of the reply]
I should have clarified more, oops. I was talking about a minor variation of the scenario where the "negotiation is not possible" restriction is lifted (while still keeping the information asymmetry somehow). In this case, with no other changes, the problem is basically the same, since Derek just says "btw I swear on God almighty that I am not negotiating at all, since this way I get the best outcomes" and then the rest of the scenario plays out the same (as long as we posit that Will's memory of this exchange is magically erased and so FDT-Will doesn't consider changing his behavior to get a better deal)
And meanwhile if Derek's $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
Meanwhile if Derek has $0 honesty and is CDT and he says "btw no negotiation" then FDT-Will can say "no, screw you, we're negotiating or I swear I will bury my head in the sand and die" and Derek will say "oh ok, i can tell that you will keep your promise, nevermind then, let's negotiate". FDT-Will then says "give me $0.99 and I'll let you save my life" and poor CDT-Derek will agree.
The FDT-Derek + FDT-Will case is probably important but it scares me and I don't know how to reason about it. Probably with geometric utility. In this case, if we add a rule saying Derek gets 1 million dollars of utility from being alive, FDT-Derek pays 50 cents to FDT-Will to maximize the logarithm of utility, since Derek gets $1,000,000 + $0.50 and Will gets $1,000,000 + $0.50 and since these numbers are the same utility is maximized which is the best possible output for the function (we are ignoring the honesty-utility here)
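To check the 50-cent arithmetic, here is a tiny sketch (my own illustration; the extra $1,000,000 of utility for Derek from the added rule and the $1 base payoff are the thread's numbers, everything else is assumption, and honesty-utility is ignored): maximizing the product of the two payoffs over the transfer does land on Derek paying Will 50 cents.

```python
from fractions import Fraction

def payoffs(x):
    # x is the amount Will pays Derek (negative means Derek pays Will)
    will = Fraction(1_000_000) - x          # Will: value of being alive, minus the transfer
    derek = Fraction(1_000_001) + x         # Derek: assumed $1,000,000 rule + $1 base + transfer
    return will, derek

def product(x):
    will, derek = payoffs(x)
    return will * derek

# search transfers in whole cents from -$1.00 to +$1.00
candidates = [Fraction(c, 100) for c in range(-100, 101)]
best = max(candidates, key=product)
print(best, payoffs(best))   # -1/2: Derek pays Will 50 cents, both end at $1,000,000.50
```

Since the two payoffs sum to a constant here, the product (and hence the geometric mean) is maximized where they are equal, which is exactly the $1,000,000.50-each split.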
Also, putting this in another post since I think it is a major point: if we assume some cost to bargaining, for Derek this approximates something like a hawk-dove game where Derek gets the first move. Will's game is more complex, as he is operating under information asymmetry, so his play depends on the probabilities he assigns to Derek's responses.
If we consider the value Will pays as X (negative if Derek pays Will), and we assume some cost (C) to both of negotiating, the payoffs work out as follows (I don't know how/if you can put tables into comments, so I just have to write them out):
Payoffs given FDT-Will with Negotiation:
(1) Will accepts the initial offer (for FDT-Will, X = 1,000,199.99):
(2) Will Contests and Derek Accepts (say X = -0.99[1]):
(3) Will and Derek contest over X. X is unspecified under the assumptions; any number where X > (C - 1) and X < (1,000,200 - C) is feasible:
Counterfactual: Derek doesn't offer an amount and Will doesn't contest (X = 0)
Payoffs given CDT-Will with Negotiation:
(1) Will accepts the initial offer (for CDT-Will, X = 199.99, since anything greater wouldn't be paid):
(2) Will Contests and Derek Accepts (X = -0.99):
(3) Will and Derek contest over X. X is unspecified under the assumptions; any number where X > (C - 1) and X < (200 - C) is feasible:
Counterfactual: Derek doesn't offer an amount and Will doesn't contest (X = 0)
While we would need to know Will's probability estimates to actually model how he behaves and what actions he takes, from this it seems rather evident that, under most approximations, CDT-Will is still likely to be better off.
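For what it's worth, here is a rough sketch of how I am computing those payoffs (my reconstruction, under assumptions: the deal formulas of $1,000,200 - X to Will and $1 + X to Derek come from later in this thread, as does the $200 cap on what CDT-Will will actually hand over in town; treating "no deal" as worth $0 to both and applying C to both sides whenever Will contests are my simplifications):

```python
LIFE_PLUS_HONESTY = 1_000_200   # Will's value of surviving plus staying honest
DEREK_BASE = 1                  # Derek's assumed base payoff from doing the drive

def deal_payoffs(x, c=0.0, contested=False):
    """(Will, Derek) payoffs if a deal at price X is reached."""
    cost = c if contested else 0.0
    return LIFE_PLUS_HONESTY - x - cost, DEREK_BASE + x - cost

def feasible_range(pay_cap, c):
    """Prices both sides prefer to no deal: C - 1 < X < pay_cap - C."""
    return (c - DEREK_BASE, pay_cap - c)

# Scenario (1): Will accepts Derek's initial offer.
print(deal_payoffs(1_000_199.99))            # FDT-Will: (~0.01, 1,000,200.99)
print(deal_payoffs(199.99))                  # CDT-Will: (~1,000,000.01, 200.99)
# Scenario (3): the contested range, with a negotiation cost of C = 1.
print(feasible_range(1_000_200, 1.0))        # FDT-Will: X between 0 and 1,000,199
print(feasible_range(200, 1.0))              # CDT-Will: X between 0 and 199
```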
What I mean is that you can think of "CDT agent with certain utility function" and "FDT agent" as exactly the same. They're the same concept.
They are not. A CDT agent is fundamentally doing a different expected value calculation than an FDT agent. This is why they can lead to radically different outcomes.
I should have clarified more, oops. I was talking about a minor variation of the scenario where the "negotiation is not possible" restriction is lifted (while still keeping the information asymmetry somehow).
Okay, play out the scenario. Derek offers to take CDT-Will back for $199.99; what does Will say? Will's expected values are as in the CDT-Will payoff scenarios above.
Now the question becomes "how does Will estimate the payoffs for 3?" What does he expect Derek to do in a negotiation? Etc. If we assume sufficient risk aversion (which I would argue is the most probable case), option 1 is still preferable.
Let's imagine Derek offers to take FDT-Will back for $1,000,199.99. Will's expected values are as in the FDT-Will payoff scenarios above.
FDT-Will has the same problem as CDT-Will. Though, for FDT-Will, unlike CDT-Will, I would argue that under most reasonable assumptions there would be some value for X under 3 that FDT-Will would estimate has a better expected value. Given that, he would try to negotiate a value somewhere between -$1 and $1,000,199.99. Where he would land in that range depends on his risk aversion and how he estimates Derek's responses.
And meanwhile if Derek's $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
I am not convinced this is true. I don't see why FDT-Derek would behave differently. If we assume information symmetry, then you get the same commitment race with their priors.
FDT-Will then says "give me $0.99 and I'll let you save my life"
Why? See above. FDT-Will under your scenario still suffers from information asymmetry. You can argue that option 3 is reasonably the better option for him, but he has no idea what value of X is optimal. We know Derek considers any value > -$0.99 as an expected positive, but Will is operating in an asymmetric environment. He doesn't know what Derek will decide. It seems reasonable that Will might expect Derek to accept some lower amount, but he is going to have to weigh that against the probability that Derek says "no." If he has extreme risk aversion, he will still prefer 1 even if he estimates Derek would likely accept a lower price. If he has no risk aversion and Derek cannot counter-offer, he will offer whatever he expects Derek to accept.
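To make that weighing explicit, here is one way to write it down (illustrative only; the counter-offer, the probability that Derek accepts it, and the extra weight Will puts on dying are hypothetical placeholders, and I am treating "Derek says no" as Will dying):

```python
WILL_DEAL_VALUE = 1_000_200          # FDT-Will: life ($1,000,000) plus honesty ($200)

def value_accept(offer):
    """Take Derek's offer as given: survive for sure and pay the offer."""
    return WILL_DEAL_VALUE - offer

def value_contest(counter_offer, p_derek_accepts, death_disutility=0.0):
    """Counter-offer: survive and pay less with probability p, otherwise die."""
    win = WILL_DEAL_VALUE - counter_offer
    lose = -death_disutility
    return p_derek_accepts * win + (1 - p_derek_accepts) * lose

offer = 1_000_199.99
for p in (0.5, 0.9, 0.99):
    print(p,
          value_accept(offer),                              # tiny but certain
          value_contest(0.99, p),                           # risk-neutral Will
          value_contest(0.99, p, death_disutility=1e9))     # very death-averse Will
```

On these placeholder numbers, a risk-neutral Will contests for almost any probability of acceptance, and it takes a very heavy weight on the death outcome for accepting the initial offer to come out ahead, so which branch you land in really does come down to the risk-aversion assumption.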
and poor CDT-Derek will agree.
Why? Let's lay out CDT-Derek's options.
It seems likely that CDT-Derek would pick some variant of 3, dependent on how he expects FDT-Will to react, which in turn depends on FDT-Will's estimation of CDT-Derek. You would expect to get some race, with FDT-Will trying to determine Derek's utility function. Indeed, if we assume perfect information asymmetry, CDT-Derek's best move is probably to keep saying "I will only take you back for $1,000,199.99" to prevent FDT-Will from getting any information on his utility. If CDT-Derek repeats that until FDT-Will is about to die unless he makes a decision, FDT-Will, having gained no information on Derek's utility, is likely to simply accept when he becomes unable to negotiate further (for the same reasons as above).[3] And, making the standard FDT estimates when he gets to town (i.e., he anticipates that if he weren't the kind of agent that would pay, he would have been left to die), he would honestly pay the $1,000,199.99.
When I use '???' I mean that it is both unspecified under the assumptions of the equations and unknown to the agent. We would need to add additional specifications to the problem to determine the expected payoff for different values of X, and without changing assumptions Will's expected payoff from X would remain unknown to Will. We could add assumptions for Will's estimates (which are not likely to be equivalent to the real payoffs), to determine what Will would estimate the expected payoffs are for different values of X.
I am not including this option for the CDT agent since it is a strictly inferior version of 1: since their payout for honesty is $200, it is trivial that they would always be honest for anything under $199.99.
If there is no such cutoff, they are in a classic battle-of-the-sexes-type problem. FDT-Will's expected payoff from a deal is $1,000,200 - X, while Derek's payoff is $1 + X. It is not clear what FDT-Will's position would have to be for him to expect CDT-Derek to accept a better deal. Any deal from -$0.99 to $1,000,199.99 is feasible under our assumptions (and would be a Nash equilibrium), but we have no reason to expect any particular outcome in that range without adding some assumptions.
I feel like in general, when, unbeknownst to you, a hostile telepath is inspecting you, you are just fucked in arbitrary ways that are decision-theory-agnostic. Completely speaking from intuition, this is very close to (but definitely not identical to) the no-free-lunch theorem, where any DT benefits you in some universes and hurts you in others, in a roughly but probably not exactly symmetric way.
I don't think that is quite true. And as I said elsewhere, when we assume non-telepaths we get FDT losing by amounts dependent on the degree of information asymmetry. In this case, the driver, Derek, is able to capture "as much money as the agent will be willing, upon arriving in town, to pay them to prevent the scenario from happening." For CDT, lacking retro-causality, they will only be willing to pay up to whatever their honesty value and signaling value is (i.e. less than the $200 for Will). For the FDT agent, they will be willing to pay up to whatever they value the totality of the outcomes (live and pay vs. die and don't).
CDT agents being willing to be dishonest in retrospect means there is less value to capture from them. In the real world, we act like CDT agents. If we want people to be more honest, we try to increase the value of honesty and signaling. To prevent people like Derek from capturing this value, we put limits on it.
One could imagine Derek instead demanding they sign a contract before saving them. CDT-Will would now also be willing to pay up to $1,000,200 if that contract were enforceable. But in the real world, if Derek demanded that much, the contract would likely be thrown out under U.S. law as unconscionable, given he demanded in excess of a million dollars for a short car ride.
He could probably get away with demanding a contract of a few hundred or even a few thousand dollars, but if he was charging thousands of times the fair market value for a car ride, any court would likely throw that out. I think that is right; there should be limits on what you can enforce on another party in such situations.
I am again speaking from intuition only and don't want to put more time thinking about this for now. I may not even endorse what I say if I put 5 minutes into thinking.
when we assume non-telepaths we get FDT losing by amounts dependent on the degree of information asymmetry
This seems like a good thing
For CDT, lacking retro-causality, they will only be willing to pay up to whatever their honesty value and signaling value is (i.e. less than the $200 for Will). For the FDT agent, they will be willing to pay up to whatever they value the totality of the outcomes (live and pay vs. die and don't).
This means CDT-Will will die if Derek has a different utility function and is only willing to drive them home for $201+? This is the "other" universes I'm talking about.
In an even more realistic scenario, Will should have a prior for the minimum amount Derek is willing to get to drive them home. I expect this would give FDT-Will somewhat better calculations.
This seems like a good thing
Why?
Take my example with the contracts: I don't think that is actually a good outcome to be able to impose any contract on a disadvantaged party. Having the set of deals you can impose on someone you find at your mercy, so to speak, restricted by what is socially permissible and enforceable seems like a preferable state of affairs. Absent legal/social frameworks, having enforceability limited by agents' values and their willingness to be beholden to deals seems preferable to having no such limits in place.
This means CDT-Will will die if Derek has a different utility function and is only willing to drive them home for $201+? This is the "other" universes I'm talking about.
Yes, if we assume Derek is a misanthrope, he will kill Will if Will is not willing to pay him some amount greater than his misanthropy. But I do not think that is a realistic state of affairs, and I think on the flip side you can get asymmetric information causing FDT agents to behave suboptimally when presented with misanthropic actors.[1]
In an even more realistic scenario, Will should have a prior for the minimum amount Derek is willing to get to drive them home.
In the real world, we are often price takers or price setters and rarely negotiating as equal parties. Will may have a prior in my scenario for what he thinks Derek would be willing to accept. What his prior is, however, is irrelevant, he is not offered that price and doesn't get to proposition Derek. His only choices are "do I accept Derek's offer?" and, once they get to town having accepted the offer, "do I honor the offer?" If he wouldn't honor the offer, Derek wouldn't pick him up, so he dies.
E.g., as the first example that comes to mind, let's say your child has been kidnapped. Your kidnapper just happened to capture your child by pure chance, not intentionally, but you have no way to know that. You think that paying off blackmailers makes it more likely you will be blackmailed. The blackmailer demands a payment (let's say there is an escrow and they cannot cheat), but you, as an FDT agent, decline to negotiate. So the blackmailer kills your kid and disappears. A CDT agent pays the blackmailer, not considering the effect their decision may have on the odds of their being blackmailed. Unlike the decent driver case, which assumes a lack of information, this requires a genuine mistake on the FDT agent's part for them to be truly worse off. Edit: though you can get individual agents to be worse off under FDT in the standard blackmail dilemma, for this case I am pre-assuming true randomness, in which case FDT would pay if they believed it was truly random, but would still refuse to pay if they were acting under the (in this case mistaken) assumption that agents who didn't pay would be extremely unlikely to be blackmailed.
I think you are intuiting the question of "which DT is better" using the real world too heavily in a sort of "I think a world where people all do this is better" -> "this DT is better" way. You can't just hope things work out this way.
This seems like a good thing
I don't think that is actually a good outcome to be able to impose any contract on a disadvantaged party
Yes, that's why you use laws / precommitments to prevent it. I guess I used "good" and that misled you a bit; I think it is game-theoretically good, not morally ideal.
But I do not think that is a realistic state of affairs, and I think on the flip side you can get asymmetric information causing FDT agents to behave suboptimally when presented with misanthropic actors.
As I said, this is very close to the no free lunch theorem where any DT benefits you in some universes and hurts you in others. I fully expect you can construct a situation including a hostile telepath where DT A outperforms DT B for any A/B.
What his prior is, however, is irrelevant, he is not offered that price and doesn't get to proposition Derek.
We are assuming Derek knows everything about Will right? So if Will changes his strategy based on his prior then Derek knows that too.
I think you are intuiting the question of "which DT is better" using the real world too heavily in a sort of "I think a world where people all do this is better" -> "this DT is better" kind of way. You can't just hope things work out this way.
Mostly fair; as I think you said elsewhere, I think I misunderstood you as making a value claim when you meant better in some other sense.
But one of the main reasons Yud and Soares give for preferring FDT over CDT is a belief that FDT leads to better outcomes. That is what I find unconvincing. It seems to me that, with more realistic assumptions, CDT better models observations (e.g., Braess's paradox, to use an example I gave elsewhere) and can lead to better outcomes. That was my central thesis. I do agree that it is usually trivial to conceive of scenarios where any given theory loses to another in some sense.
Yes, that's why you use laws / precommitments to prevent it
Yes, but I would argue it is good to have mediating forces outside of laws. Derek can get either of them to sign a contract beforehand for $1,000,199, but only FDT would say that they should honor that contract absent any mechanism to enforce it. While I don't think it can be proven, it seems sensible that, before considering enforcement mechanisms, we should decide whether to honor contracts based on how much we value honesty, the associated signals, and other such considerations. It seems less sensible to say we should honor them based solely on value estimates of the entire scenario they fall under. It also seems sensible, if we include enforcement mechanisms, that such mechanisms be set up to prevent people from breaking contracts that are generally deemed reasonable, while preventing unconscionable conditions from being imposed even on agents that rationally consented to them (as would be the case with the agents consenting to a $1,000,200 contract).
We are assuming Derek knows everything about Will right? So if Will changes his strategy based on his prior then Will knows that too.
You mean Derek knows it, right? But it doesn't change Will's value calculation, so it shouldn't change his strategy a priori, even if he had a prior for what he thinks Derek would accept. He would change his decision if we assumed he knew how Derek was likely to set prices and adapted his strategy based on that, though.
While I didn't explore the general case, which would be easier to do if/when I formalize the dilemma, my intuition from the specific thought experiment is that, when an agent with some non-zero social values faces an adversary that is less than optimally adversarial (as is more common in the real world) and that has an asymmetric advantage over the agent (e.g., in the above case the adversary is a price-setter with an ability to predict the agent's decisions), CDT agents generally do better than FDT agents.
From a policy perspective, it also seems more reasonable to model agents under CDT; policies aimed at aligning agents' causal expectations with optimal social outcomes seem more effective at addressing, e.g., free riders. Of course, decision theory looks at problems from the agent's perspective, not how we should assume agents are likely to act from a policy perspective, but part of its theoretical utility is in developing models of behavior which can serve a policy perspective. And from there, CDT, which is commonly implicit in agent-based models, seems to work out better in practice. Braess's paradox, which is derived from real-world observations, is easily explained if we assume agents make decisions under CDT, but it wouldn't occur if we assumed agents acted under FDT.
Edit: to be clear, Braess's paradox not occurring would be a good thing; if people made driving decisions in a way that optimized overall traffic, that would be better. But in the world we live in, it does occur. Also, it is noteworthy that if we imagine individual FDT agents, their utility would likely be unchanged and Braess's paradox would still occur, since they would make their decisions based on the empirically observed behavior of others, who don't operate under FDT.
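For anyone who wants the Braess claim concrete, here is a minimal sketch using the standard textbook network (my illustration; the 4,000 drivers and link costs are the usual example numbers, not anything from the OP). Each driver myopically switches to whichever route is currently fastest given everyone else's behavior, which is roughly the CDT-style choice I have in mind:

```python
N = 4000  # drivers

def route_times(flows, shortcut):
    a, b, c = flows                      # top route, bottom route, via-shortcut route
    top = (a + c) / 100 + 45             # congestible Start-A link + fixed 45-min A-End link
    bottom = 45 + (b + c) / 100          # fixed 45-min Start-B link + congestible B-End link
    via = (a + c) / 100 + (b + c) / 100  # both congestible links joined by the zero-cost shortcut
    return (top, bottom, via) if shortcut else (top, bottom, float("inf"))

def equilibrium(shortcut):
    flows = [N, 0, 0]
    for _ in range(100_000):             # best-response dynamics, one driver per step
        t = route_times(flows, shortcut)
        worst = max(range(3), key=lambda i: t[i] if flows[i] > 0 else -1)
        best = min(range(3), key=lambda i: t[i])
        if t[worst] - t[best] < 1e-9:
            break                        # no one wants to switch any more
        flows[worst] -= 1
        flows[best] += 1
    t = route_times(flows, shortcut)
    avg = sum(f * ti for f, ti in zip(flows, t) if f) / N
    return flows, avg

print(equilibrium(shortcut=False))   # -> ([2000, 2000, 0], 65.0): shortcut closed
print(equilibrium(shortcut=True))    # -> ([0, 0, 4000], 80.0): shortcut open, everyone slower
```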