Scheduling: The remainder of the sequence will be released after some delay.
Exercise: Why does instrumental convergence happen? Would it be coherent to imagine a reality without it?
- Here, our descriptive theory relies on our ability to form reasonable beliefs about what we'll do, and about how things in the world will affect our later decision-making. No one knows how to formalize that kind of reasoning, so I'm leaving it a black box: we somehow have these reasonable beliefs, which we apparently use to calculate AU.
- In technical terms, AU calculated with the "could" criterion would be closer to an optimal value function, while actual AU seems to be an on-policy prediction, whatever that means in the embedded context. Felt impact corresponds to TD error (see the sketch after this list).
- This is one major reason I'm disambiguating between AU and EU. In the non-embedded context of reinforcement learning, AU is a very particular kind of EU: the expected return under the optimal policy (formalized below).
- Framed as a kind of EU, AU is plausibly what we use to make decisions.
- I'm not claiming normatively that "embedded agentic" EU should be AU; I'm simply using "embedded agentic" as an adjective.
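To make the reinforcement learning terms above concrete: in standard (non-embedded) notation, "the expected return under the optimal policy" is the optimal value function, and the TD error is the one-step prediction error. This is textbook Sutton & Barto notation, not anything specific to this sequence:

$$V^*(s) \;=\; \max_\pi \, \mathbb{E}_\pi\!\left[\,\sum_{t=0}^{\infty} \gamma^t R_{t+1} \;\middle|\; S_0 = s\right], \qquad \delta_t \;=\; R_{t+1} + \gamma V(S_{t+1}) - V(S_t).$$

On this reading, AU-as-EU corresponds to $V^*$, while "felt impact" corresponds to $\delta_t$.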
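And here is a minimal sketch of the "could" vs. on-policy distinction. The two-state MDP and the "lazy" fixed policy are my own illustrative assumptions, not anything from the sequence: the "could" criterion corresponds to computing $V^*$ by value iteration, while actual AU corresponds to an on-policy TD(0) prediction, whose update term is exactly the TD error ("felt impact").

```python
import numpy as np

# Purely illustrative two-state MDP (my construction, not from the post):
# action 0 stays put, action 1 switches states; landing in state 1 pays 1.
n_states, n_actions, gamma = 2, 2, 0.9

def step(s, a):
    """Deterministic dynamics: action 1 switches states, action 0 stays."""
    s_next = 1 - s if a == 1 else s
    reward = float(s_next == 1)
    return s_next, reward

# "Could" criterion: the optimal value function V*, via value iteration.
V_star = np.zeros(n_states)
for _ in range(1000):
    V_star = np.array([
        max(step(s, a)[1] + gamma * V_star[step(s, a)[0]]
            for a in range(n_actions))
        for s in range(n_states)
    ])

# Actual AU: an on-policy prediction V^pi, learned by TD(0) under a fixed
# "lazy" policy that only moves 20% of the time (also an assumption).
rng = np.random.default_rng(0)
V_pi = np.zeros(n_states)
alpha, s = 0.1, 0
for _ in range(20_000):
    a = 1 if rng.random() < 0.2 else 0             # lazy policy pi
    s_next, r = step(s, a)
    td_error = r + gamma * V_pi[s_next] - V_pi[s]  # the "felt impact"
    V_pi[s] += alpha * td_error
    s = s_next

print("V*   (value if we acted optimally):", V_star.round(2))  # ~[10, 10]
print("V^pi (value under the lazy policy):", V_pi.round(2))    # noticeably lower
```

The gap between the two printouts is the point: what the agent *could* attain ($V^*$) differs from what it *predicts* it will attain under how it actually behaves ($V^\pi$), and surprises show up as nonzero TD errors along the way.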