The following is an adaptation of part of my (unsuccessful) application to the Winter 2022 cohort of SERI MATS, as prompted by Victoria Krakovna's "What is your favorite definition of agency, and how would you apply it to a language model?"
This is far from a perfect treatment of the topic, but I am publishing it to document my early days in the field. Or should I say "Epistemic status: I am learning" 😉.
I am interested in this question because it is one that came up over the summer, when I became more interested in AI Safety. In my readings of LessWrong and surveys of technical AI Safety research, I started to question the need for agency in AI models in the first place. It seemed to me that most hypotheses about future AGI scenarios involved a notion of agency in the model(s) considered, but it was unclear to me what advantages developing agentic models would provide to humans. I thought that if agency were not essential, then perhaps a “compromise” between safety and AGI capabilities could be achieved by relying on oracle-like models: devoid of agency but useful for guidance, and whose answers and recommendations we could ultimately reject[1]. I asked a related question on the AI Stack Exchange forum, but it was unfortunately met with confusion and ultimately closed, as I had made the very obvious mistake of not defining my terms. In an attempt (which never succeeded) to resuscitate the discussion, I proposed the following definition of agency:
I define agency as the ability to autonomously perceive and interact with a given environment. Anything capable of agency is then an agent. Furthermore, I define:
- autonomous perception: the ability to perceive a given environment without the need for an external agent
- interaction: the ability to change the state of the environment
I liked (and still like) my definition, but admittedly I have not tried evaluating it[2]. One issue is that I did not develop it in isolation: my bias was that language models are tools, not agents, and I purposely defined agency backwards from this claim. As such, it is (perhaps artificially) difficult to apply this definition of agency to language models. The choice of environment for a language model is straightforward: it can be anything whose state can be represented (noisily) through language. The “interaction” requirement of the definition can in a sense be met by an LM capable of influencing the actions of other agents, which in turn form (and can modify) the LM’s environment. For example, when prompted, an LM could output a recipe which could then be realised in the real world by a human. In this sense, the LM is interacting with the environment, albeit indirectly. The “autonomous perception” requirement is harder for an LM to meet. An LM’s perception of its environment is limited to the textual inputs that someone has to provide. With multimodal LMs, the inputs are no longer limited to text, but they still need to be provided by someone. It is only once an LM is coupled with something capable of agency (a human, or, say, an RL policy guiding sensors) that it can receive inputs and perceive its environment.
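To make this coupling picture a bit more concrete, here is a toy Python sketch of an LM embedded in a loop where something else does both the perceiving and the acting. All of the names (`query_lm`, `KitchenEnv`, and their methods) are invented purely for illustration and do not correspond to any real API:

```python
# Toy sketch: the LM only ever sees text that something else chooses to hand it;
# the surrounding loop does the perceiving and the acting on its behalf.

def query_lm(prompt: str) -> str:
    """Stand-in for a call to a language model; returns a canned reply."""
    return "add water"  # a real LM would generate this from the prompt


class KitchenEnv:
    """A trivial environment whose state can be (noisily) described in text."""

    def __init__(self):
        self.state = {"water": 0}

    def observe(self) -> str:
        # Perception happens here, outside the LM.
        return f"The pot currently contains {self.state['water']} cups of water."

    def apply(self, action: str) -> None:
        # Interaction also happens here, mediated by whatever runs the loop.
        if "add water" in action:
            self.state["water"] += 1


env = KitchenEnv()
for _ in range(3):
    observation = env.observe()         # the external "agent" perceives
    suggestion = query_lm(observation)  # the LM only maps text to text
    env.apply(suggestion)               # something else enacts the LM's output

print(env.observe())  # -> "The pot currently contains 3 cups of water."
```

In this sketch the LM only ever maps text to text; the loop around it is what perceives the environment and enacts the LM’s suggestions, which is exactly why I find it hard to attribute “autonomous perception” to the LM itself.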
Of the readings suggested, my favorite is Barandiaran et al.’s definition of agency [1], reported in [2], as it seemed the most focused and the least prone to confusion with other terms such as “goal-directedness” and “optimization”, which may overlap with agency but are not necessarily the same thing. The definition is as follows:
agency involves, at least, a system doing something by itself according to certain goals or norms within a specific environment.
In more detail, agency requires:
- Individuality: the system is capable of defining its own identity as an individual and thus distinguishing itself from its surroundings; in doing so, it defines an environment in which it carries out its actions;
- Interactional asymmetry: the system actively and repeatedly modulates its coupling with the environment (not just passive symmetrical interaction);
- Normativity: the above modulation can succeed or fail according to some norm.
It is once again difficult to recognise agency in language models under this definition. We can work through the requirements and see why. Note that I will treat each requirement assuming the other two are satisfied; note also, however, that the authors indicate each requirement is co-dependent on the satisfaction of the others.
In general, it seems difficult to reconcile satisfactory definitions of agency with viewing language models as agents. I am nevertheless very interested in the question, and I do not rule out more appropriate definitions being developed and tested, or interpretations and understandings of language models that I have not yet considered. I think this is an important question to consider, given the possibility of agency emerging in models not explicitly designed to be agentic. Having a working definition could help us recognise such a scenario.
[1] Barandiaran XE, Di Paolo E, Rohde M. Defining Agency: Individuality, Normativity, Asymmetry, and Spatio-temporality in Action. Adaptive Behavior. 2009;17(5):367-386. doi:10.1177/1059712309343819
[2] Literature Review on Goal-Directedness
I recognise this is somewhat naive - a sufficiently intelligent oracle could provide answers persuasive enough that we don't even consider their rejection. ↩︎
Victoria had to evaluate it as part of my application, and said the following: "your definition of agency was a bit circular - you defined agency in terms of autonomy, and autonomy in terms of not needing an external agent." I agree this is a valid criticism. ↩︎