It seems really hard to deceive a Bayesian agent who thinks you may be deceiving them, especially in a repeated game. I would guess there could be interesting theorems about Bayesian agents that are attempting to deceive one another; as in, in many cases their ability to deceive the other would be highly bounded or zero, especially if they were in a flexible setting with possible pre-commitment devices.

To give a simple example, agent A may tell agent B that they believe ω=0.8, even though they internally believe ω=0.5. However, if this were somewhat repeated or even relatively likely, then agent B would update negligibly, if at all, on that message.

8habryka5moBayesian agents are logically omniscient, and I think a large fraction of
deceptive practices rely on asymmetries in computation time between two agents
with access to slightly different information (like generating a lie and
checking the consistencies between this new statement and all my previous
statements)
My sense is also that two-player games with bayesian agents are actually
underspecified and give rise to all kinds of weird things due to the necessity
for infinite regress (i.e. an agent modeling the other agent modeling themselves
modeling the other agent, etc.), which doesn't actually reliably converge,
though I am not confident. A lot of decision-theory seems to do weird things
with bayesian agents.
So overall, not sure how well you can prove theorems in this space, without
having made a lot of progress in decision-theory, and I expect the resolution to
a lot of our confusions in decision-theory to be resolved by moving away from
bayesianism.

Hm... I like the idea of an agent deceiving another due to it's bounds on computational time, but could imagine many stable (though smaller) solutions that wouldn't. I'm curious if a good bayesian agent could do "almost perfect" on many questions given limited computation. For instance, a good bayesian would be using bayesianism to semi-optimally use any set of computation (assuming it has some sort of intuition, which I assume is necessary?)

On being underspecified, it seems to me like in general our models of agent cognition forever have been pretty under

It seems really hard to deceive a Bayesian agent who thinks you may be deceiving them, especially in a repeated game. I would guess there could be interesting theorems about Bayesian agents that are attempting to deceive one another; as in, in many cases their ability to deceive the other would be highly bounded or zero, especially if they were in a flexible setting with possible pre-commitment devices.

To give a simple example, agent A may tell agent B that they believe ω=0.8, even though they internally believe ω=0.5. However, if this were somewhat repeated or even relatively likely, then agent B would update negligibly, if at all, on that message.

Hm... I like the idea of an agent deceiving another due to it's bounds on computational time, but could imagine many stable (though smaller) solutions that wouldn't. I'm curious if a good bayesian agent could do "almost perfect" on many questions given limited computation. For instance, a good bayesian would be using bayesianism to semi-optimally use any set of computation (assuming it has some sort of intuition, which I assume is necessary?)

On being underspecified, it seems to me like in general our models of agent cognition forever have been pretty under

... (read more)