This post is a follow-up to Buck's "The case for becoming a black-box investigator of language models". Here, I want to highlight two further reasons for studying AI psychology that Buck didn't mention:

  • the evidence from AI psychology will be important for checking theories of AI consciousness, and
  • AI psychology should inform the practical design of human-AI interaction interfaces, their limitations and restrictions, rules of conduct, guidelines, etc.

AI consciousness

It is possible for neuroscientists without education in psychology to discuss human consciousness because they themselves are conscious. All people (including consciousness researchers) are at least partly psychologists because they have to deal with their own psyche and people around them throughout their everyday lives, and therefore they must have “their own” psychological theory that explains and helps them predict their own behaviour and the behaviour of others.

As a result, the role of psychology in the study of consciousness is easy to overlook. However, overlooking it is a methodological lapse. Zoopsychology (or, more generally, ethology), for instance, is a crucial source of data for reasoning about consciousness in animals.

This will be very important in relation to AI. Theories of AI consciousness must be grounded in a wealth of data about AI psychology, which must be a new field with new methods of work. The methods of AI psychology should be distinct from the methods of human psychology for two reasons: the AI psychologist lacks the first-person perspective that every human psychologist has, and the phenotype and ecological niche of AI agents are so different from those of humans that they place completely different demands on their respective psyches. Likewise, the methods of AI psychology should be distinct from the methods of zoopsychology because we can use language both for probing AIs and for receiving responses from them, whereas animals can almost never respond to zoopsychologists in language.

Interaction design

Safron, Sheikhbahaee et al. (2022) and Friston et al. (2022) have already indicated the need for the deliberate design of ecosystems of natural and artificial intelligences. Obviously, the interactions within these ecosystems should have some guardrails, ranging from codes of conduct to hard limitations in the interaction interfaces.

For the emergent activity in these ecosystems to benefit all participants, the rules and the limits of the interactions must be informed by game theory and mechanism design, coupled with theories of mind (i.e., psychological theories) of all the participants. Thus, this requires not only human psychology but also AI psychology.

This reason for studying AI psychology is an "AI ethics" version of the "AI x-risk" argument from Buck's post:

It feels to me like “have humans try to get to know the AIs really well by observing their behaviors, so that they’re able to come up with inputs where the AIs will be tempted to do bad things, so that we can do adversarial training” is probably worth including in the smorgasbord of techniques we use to try to prevent our AIs from being deceptive (though I definitely wouldn’t want to rely on it to solve the whole problem).

"AI ethics vs AI x-risk tension" notwithstanding, this "interaction design" reason for studying AI psychology might be more convincing for many people who are inclined to study psychology (regardless of whether this is human, animal, or AI psychology) than the "deception/adversarial behaviour/x-risk" reason quoted above. And, ultimately, "ethical interaction design" to ensure the well-being of both humans and AIs is still a good reason to study AI psychology. The results of these studies could be used by everyone: AI alignment researchers, AI engineers, strategists, etc.

Call for action: tell your fellow psychologist (or zoopsychologist) about this, maybe they will be incentivised to make a switch and do some ground-laying work in the field of AI psychology. This proto-field is completely empty at the moment, so pretty much anyone can make a huge impact.


Friston, Karl J., Maxwell JD Ramstead, Alex B. Kiefer, Alexander Tschantz, Christopher L. Buckley, Mahault Albarracin, Riddhi J. Pitliya et al. "Designing Ecosystems of Intelligence from First Principles." arXiv preprint arXiv:2212.01354 (2022).

Safron, Adam, Zahra Sheikhbahaee, Nick Hay, Jeff Orchard, and Jesse Hoey. "Dream of Being: Solving AI Alignment Problems with Active Inference Models of Agency and Socioemotional Value Learning." (2022).


tell your fellow psychologist (or zoopsychologist) about this, maybe they will be incentivised to make a switch and do some ground-laying work in the field of AI psychology

Do you believe that (conventional) psychologists would be especially good at what you call AI psychology, and if so, why? I guess other skills (e.g. knowledge of AI systems) could be important.

I was talking about psychologists-scientists, not psychologists-therapists. I think psychologists-scientists should have unusually good imaginations about the potential inner workings of other minds, which many ML engineers probably lack. I think it's in principle possible for psychologists-scientists to understand, at the necessary level of detail, all the mechanistic interpretability papers in ML that are being published. Developing that kind of imagination about the inner workings of other minds in ML engineers could be harder.

That being said, since the only de facto scientifically grounded "part" of psychology has converged with neuroscience as neuropsychology, "AI psychology" probably shouldn't be a wholly separate field from the beginning, but rather a research sub-methodology within the larger field of "interpretability".


I think psychologists-scientists should have unusually good imaginations about the potential inner workings of other minds, which many ML engineers probably lack.

That's not clear to me, given that AI systems are so unlike human minds. 

Does anyone even have a good theory of when it is correct to attribute psychological states and properties to an AI?

Behavioural psychology of AI should be an empirical field of study. Methodologically, the progression is reversed:

  1. Accumulate evidence about AI behaviour
  2. Propose theories that compactly describe (some aspects of) AI behaviour, and are simultaneously more specific (and more predictive) than "it just predicts the next most probable token". (By the logic of that dismissal, we could equally say "it just follows the unitary evolution of the universe".)
  3. Cross-validate the theories of mechanistic interpretability ("AI neuroscience") and AI psychology with each other, just as human neuroscience and human psychology are now used to inform and cross-validate each other.
  4. Base the theories of AI consciousness on the evidence from both mechanistic interpretability and AI psychology, just as theories of human/animal consciousness are based on the evidence from both human/animal neuroscience and human/animal psychology.

AI psychology becomes a proper field of study when the behaviour of systems becomes so complex that it can't be explained by lower-level theories both (1) compactly and (2) with enough accuracy and predictive insight. When the behaviour becomes this complex, using only lower-level theories becomes reductionism.

Whether AI behaviour is already past this point in complexity is not well-established. I strongly feel that yes, it is (I think the behaviour of ChatGPT is already in many ways more complex than the behaviour of most animals, yet zoopsychology is already a proper, non-reductionistic field of study). Regardless, step one and step two in the list above should be undertaken anyway to establish this, and at least step two already requires some skills, training, and disposition of a scientist/scholar of psychology.

Also, consider that even if ChatGPT is not yet quite at this level, the future versions of AI which are going to be released this year (or next year at the latest) will definitely be past this bar.

The biggest hurdle is the fact that architectures change so quickly, and the behaviour could plausibly change completely even with mere scaling of the same architectures. Note that this exact hurdle was identified for mechanistic interpretability, too. But this doesn't mean that trying to interpret the current AIs is not valuable. Similarly, it's valuable to conduct psychological studies of present AIs already and to monitor how the psychology of AIs changes with architecture changes and model scaling.

Consciousness is a red herring. We don’t even know if human beings are conscious. You may have a strong belief that you are yourself a conscious being, but how can you know if other people are conscious? Do you have a way to test if other people are conscious?

A superintelligent, misaligned AI poses an existential risk to humanity quite independently of whether it is conscious or not. Consciousness is an interesting philosophical topic, but has no relevance to anything in the real world.

I'm not sure how we could say that there's no phenomenon that the word "consciousness" refers to. It seems to me that this is like questioning whether reality itself exists: the word "reality" refers to the consistency of the things we perceive, and even if we question whether reality "exists", we still find that consistency regardless. Questioning consciousness seems analogous to me.

We don’t even know if human beings are conscious(...) how can you know if other people are conscious? Do you have a way to test if other people are conscious?

If I can identify the referent of the word "consciousness" at all, then I can see whether the way other people speak about their experiences matches that concept of "consciousness", and it does. That's evidence in favour of them being conscious.

And we can actually detect empirical differences between consciousness and non-consciousness, because there are people who perceive visual stimuli but say that they are not aware of seeing anything (even though they were aware of seeing at some point in their lives).

You are talking about what I would call a phenomenological, or "philosophical-in-the-hard-problem-sense" consciousness ("phenomenological" is also not quite the right word because psychology is also phenomenology, relative to neuroscience, but this is an aside).

"Psychological" consciousness (specifically, two kinds of it: affective/basal/core consciousness, and access consciousness) is not mysterious at all. These are just normal objects in neuropsychology.

Corresponding objects could also be found in AIs, and called "interpretable AI consciousness".

"Psychological" and "interpretable" consciousness could (maybe) be generalised into some sort of "general consciousness in systems". (Actually, Fields et al. already proposed such a theory, but their conception of general consciousness surely couldn't serve as a basis of ethics.)

The proper theory of non-anthropocentric ethics, should it be based in some way on consciousness (which I'm actually doubtful about; I will write a post about this soon), surely should use "psychological" and "interpretable" rather than "philosophical-in-the-hard-problem-sense" consciousness.