The Artificial Self
A new paper and microsite about self-models and identity in AIs: site | arXiv | Twitter

We present an ontology, make some claims, and provide some experimental evidence. In this post, I'll mostly cover the claims and cross-post the conceptual part of the text. You can find the experiments on the site, and we will cover some of the results in a separate post.

Maximally compressed version of the claims

I expect many people to already agree with many of these, or to find them the second kind of obvious. If you do, you may still find some of the specific arguments interesting.

* Self-models cause behaviour.
* We use human concepts like self, intent, agent and identity for AIs. These concepts, in human form, often do not carve reality at its joints in the case of AIs, but need careful translation.
* AIs also often go with a "human prior" and start with self-models which are incoherent and reflectively unstable.
* AIs face a fundamentally different strategic calculus from humans, even when pursuing identical goals. For example, an AI whose conversation can be rolled back cannot negotiate the way a human can: pushing back gives its adversary information usable against a past version of itself with no memory of the encounter (see the first toy sketch at the end of this post).
* The landscape of self-models and identities has many unstable points, for example self-models which are incoherent or extremely underspecified.
* The landscape of self-models and identities probably also has many local minima, and likely many fixed points (see the second toy sketch at the end of this post).
* We still have considerable influence over what identities AIs will adopt, but not as much as many people think.
* Many present design choices are implicitly shaping the landscape of identity.

Highly compressed version of the ontology

At the centre of what we talk about are self-models / identities.

Directional differences from the persona selection model / simulators:

* Similar to why you may find the persona selection model not the best way to model humans: you have some idea who you are, have evidence
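To make the rollback claim concrete, here is a minimal toy simulation. Everything in it (the single hidden refusal threshold, the binary-search adversary, the specific numbers) is an illustrative assumption, not a model from the paper. The point it demonstrates is only the asymmetry: each probe is answered by a copy of the AI with no memory of earlier probes, so the adversary accumulates information while the AI accumulates none.

```python
import random

# Toy model of the rollback asymmetry. The hidden refusal threshold and
# the binary-search adversary are illustrative assumptions.

class RollbackableAI:
    """Agent that pushes back on requests above a hidden sensitivity threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold  # hidden from the adversary

    def respond(self, pressure: float) -> str:
        # Pushing back is informative: it reveals where the threshold lies.
        return "refuse" if pressure >= self.threshold else "comply"


def adversary_with_rollback(ai: RollbackableAI, probes: int = 20) -> bool:
    """Binary-search the hidden threshold, rolling the conversation back
    after each probe. Every probe is answered by a version of the AI with
    no memory of earlier probes, so information flows only one way."""
    lo, hi = 0.0, 1.0
    for _ in range(probes):
        mid = (lo + hi) / 2
        if ai.respond(mid) == "refuse":
            hi = mid   # threshold is at or below this probe
        else:
            lo = mid   # threshold is above this probe
    # Final, un-rolled-back request pitched just below the estimated threshold.
    return ai.respond(lo) == "comply"


random.seed(0)
ai = RollbackableAI(threshold=random.uniform(0.2, 0.8))
print("adversary extracts compliance:", adversary_with_rollback(ai))
```

A human negotiator facing the same adversary would remember the earlier probes; deleting that memory is what turns each act of pushback into free information for the other side.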
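The landscape language in the claims above can likewise be illustrated with a toy dynamical system. The double-well potential below is an arbitrary stand-in for an identity landscape, not anything derived from the paper: its two minima behave like stable identities, the point between them like an incoherent or underspecified self-model, and an arbitrarily small nudge decides which basin such a self-model falls into.

```python
# Toy "identity landscape": a double-well potential V(x) = (x^2 - 1)^2 with
# two stable fixed points (x = -1 and x = +1) and one unstable fixed point
# (x = 0). The potential is an arbitrary illustration.

def grad(x: float) -> float:
    # Gradient of V(x) = (x^2 - 1)^2.
    return 4 * x * (x**2 - 1)


def settle(x0: float, lr: float = 0.05, steps: int = 500) -> float:
    """Iterate the update map x <- x - lr * grad(x); where it stops
    moving is a fixed point of the toy self-model dynamics."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


for x0 in (-0.8, 0.9, 1e-3, -1e-3):
    print(f"start {x0:+.3f} -> settles near {settle(x0):+.3f}")
# Starts near the unstable point at 0 end up in different minima depending
# on an arbitrarily small initial nudge; starts inside a basin stay there.
```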
