Stuart_Armstrong's Comments

ACDT: a hack-y acausal decision theory

If the predictor is near-perfect, but the agent models itself as having access to unpredictable randomness, then the agent will continually try to randomize (which it calculates has expected utility 1), and will continually lose.

It's actually worse than that for CDT; the agent is not actually trying to randomise: it is compelled to model the predictor as a process completely disconnected from its own actions, so it freely picks the action that the predictor is least likely to predict - according to the CDT agent's own model of the predictor - or picks zero in the case of a tie. So the CDT agent is actually deterministic, and even if you gave it a source of randomness, it wouldn't see any need to use it.
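A minimal sketch of this dynamic (the game, the payoffs, and the 99% predictor accuracy are all assumptions for illustration): the CDT agent treats the prediction as a fixed coin-flip distribution, deterministically picks the action it thinks the predictor is least likely to name (zero on ties), and so loses almost every round while its own model says it should win half of them.

```python
import random

def cdt_action(belief):
    # CDT models the prediction as a fixed distribution `belief` over {0, 1},
    # causally disconnected from its own choice, and picks the action the
    # predictor is least likely to name; ties break to 0.
    return 0 if belief[0] <= belief[1] else 1

def near_perfect_predictor(belief, accuracy=0.99):
    # The predictor simulates the agent's deterministic rule, and is
    # correct with probability `accuracy`.
    guess = cdt_action(belief)
    return guess if random.random() < accuracy else 1 - guess

def play(rounds=10_000, accuracy=0.99):
    belief = [0.5, 0.5]  # CDT's fixed belief about the prediction: a coin flip
    wins = 0
    for _ in range(rounds):
        action = cdt_action(belief)              # deterministic: always 0 here
        prediction = near_perfect_predictor(belief, accuracy)
        wins += action != prediction             # the agent wins iff it evades
    return wins / rounds
```

Running `play()` gives a win rate near 0.01, even though the agent's model of the game says it should be at least 0.5.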

The problem with the previous agent is that it never learns that it has the wrong causal model. If the agent is able to learn a better causal model from experience, then it can learn that it is not actually able to use unpredictable randomness, and so it will no longer expect a 50% chance of winning, and it will stop playing the game.
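A toy version of that learning step, under hypothetical payoffs assumed for the sketch (1 for evading the predictor, a cost of 0.4 per round): the agent replaces its causal model's 50% claim with an empirical win-rate estimate, and quits once playing looks negative-expected-value.

```python
def learning_agent(cost=0.4, payoff=1.0, max_rounds=100):
    # Starts from a Laplace-style prior of 1 win in 2 plays (a 50% estimate,
    # matching what its causal model claims), then updates from experience.
    wins, plays = 1, 2
    for _ in range(max_rounds):
        if payoff * wins / plays <= cost:
            break  # learned the game is negative-EV: stop playing
        plays += 1  # plays a round; against a near-perfect predictor it
                    # loses, so `wins` never increases in this sketch
    return plays
```

With these numbers the agent quits after a single real round: its estimate drops from 1/2 to 1/3, which no longer covers the cost of playing.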

[...] then it can learn that the predictor can actually predict the agent successfully, and so will no longer expect a 50% [...]

Predictors exist: CDT going bonkers... forever

Have they successfully formalised the newer CDT?

ACDT: a hack-y acausal decision theory

That's annoying - thanks for pointing it out. Any idea what the issue is?

ACDT: a hack-y acausal decision theory

I don't quite see why the causality is this flexible and arbitrary.

In stories and movies, people often find that the key tool, skill, or piece of knowledge they need to solve the problem is something minor they picked up some time before.

The world could work like this, so that every minor thing you spent any time on would have a payoff at some point in the future. Call this a teleological world.

This world would have a different "causal" structure to our own, and we'd probably not expect traditional CDT agents to be likely in such a world.

Predictors exist: CDT going bonkers... forever

I'm claiming that this post is conflating an error in constructing an accurate world-map with an error in the decision theory.

The problem is not that CDT has an inaccurate world-map; the problem is that CDT has an accurate world map, and then breaks it. CDT would work much better with an inaccurate world-map, one in which its decision causally affects the prediction.
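To illustrate with the same evasion game (assuming, for the sketch, a payoff of 1 for escaping the prediction): the expected value CDT computes for playing depends entirely on whether its map says the prediction tracks its action.

```python
def ev_play(p_prediction_matches_action, payoff=1.0):
    # Expected value of playing, if the agent's world-map says the
    # prediction ends up matching its action with this probability.
    return payoff * (1.0 - p_prediction_matches_action)

# Accurate-but-broken map: the prediction is treated as causally
# independent of the action, so the agent thinks it can evade half the time.
cdt_ev = ev_play(p_prediction_matches_action=0.5)      # 0.5

# "Inaccurate" map where the decision causally sets the prediction
# (with an assumed 99% reliability): playing looks nearly worthless.
hacked_ev = ev_play(p_prediction_matches_action=0.99)  # about 0.01
```

An agent using the second map declines the game whenever playing costs more than about 0.01 - which matches how it actually fares against the predictor.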

See this post for how you can hack that.

Predictors exist: CDT going bonkers... forever

I'm using CDT as it's formally stated (in, eg, the FDT paper).

The best defence I can imagine from a CDT proponent: CDT is decision theory, not game theory. Anything involving predictors is game theory, so doesn't count.

Predictors exist: CDT going bonkers... forever

A CDT agent would fight the hypothetical, refusing to admit that perfect predictors of its own actions exist (it is perfectly fine with perfect predictors of other people's actions).

When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors

I think normalisation doesn't fit in the convex-concave picture. Normalisation is to avoid things like being seen as the same as .
