The most predictively powerful description of a human is the microphysical one. On this level of description there are no wants, beliefs, or values, just a bunch of atoms evolving according to physical law. Also, this model is useless to me.
Many people (myself included) want to find a more useful model of human thought, behavior, and preferences. People have advanced different proposals, deciding to model humans on the level of learning theory, or of sensations, or of beliefs and desires. Often we call this the problem of "finding a good model" and throw all these cases into the same category, but in fact models of humans can have many different uses, and thus many different axes on which their goodness can be judged.
This post is context for a later post that I needed to think about. But first: the point.
First of all, one can easily do better than the microphysical model, because it's computationally intractable from every angle. I can't efficiently infer the state of your atoms from my observations of you, and even if I could, I can't fit that state inside my tiny human head. We want models of humans whose parameters can be inferred from real data, and which can produce predictions in the hands of real users.
Taking that constraint as given, there are four different uses to which we want to put human models:
- Predicting humans
- Choosing good actions
- Matching the structure of language / intuition
- Advancing understanding of human cognition
Predicting humans
If we want a human model for our own personal decision-making, it's probably to predict other people's actions. Humans are a major part of our environment, after all, and we can't achieve our own goals very well if we can't account for them. Usually we care more about their macroscopic behavior than their precise internal state.
Within the category of models focused on predicting humans, there can still be wide variation. Models can focus on predicting different sorts of behavior in different ways, and they can also vary in their degrees and means of fulfilling the other functions. For example we might want to predict whether people will laugh in response to a certain joke, and this might lead us to a different model than if we want to predict whether people will take their full course of antibiotics.
Much present-day effort goes into predicting human behavior in response to advertisements, and in the AI safety context we can think of systems trained on human approval as trying to predict approving behavior.
Choosing good actions
Sometimes we value our model of humans for the human-like actions it chooses in some domain of interest.
Compared to prediction, valuing generation changes how we design the parts of the model that interface with the world and the user. We narrow down predictions by specifying observables about the human we're trying to predict. What will someone buy based on their demographics and browsing history, that sort of thing. But we typically want to narrow down generation by how the output varies in ways that matter to us. Give me some text written by someone smart, or funny, or by somebody parodying Cormac McCarthy.
And if the output of the model is good enough at fulfilling its purpose, we won't mind if it's not doing the best job at prediction.
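To make the difference in interface concrete, here's a toy sketch. Everything in it (the `Record` fields, the data, the function names) is made up for illustration, and a lookup table is obviously not a serious model of anyone. The point is only that prediction is keyed on observables about the person, while generation is keyed on a property of the output that we care about.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    age_group: str  # an observable about the person
    purchase: str   # behavior we might want to predict
    review: str     # text the person produced
    tone: str       # a property of the output that matters to us


# Hypothetical toy data, invented for this example.
RECORDS = [
    Record("18-25", "headphones", "These slap.", "funny"),
    Record("18-25", "headphones", "Solid bass, fair price.", "plain"),
    Record("18-25", "keyboard", "Clicky in the best way.", "funny"),
    Record("40-60", "kettle", "Boils water as advertised.", "plain"),
]


def predict_purchase(age_group):
    """Prediction: narrow down by observables, return the most common behavior."""
    counts = Counter(r.purchase for r in RECORDS if r.age_group == age_group)
    return counts.most_common(1)[0][0] if counts else None


def generate_reviews(tone):
    """Generation: narrow down by a property we want the output to have."""
    return [r.review for r in RECORDS if r.tone == tone]


print(predict_purchase("18-25"))
print(generate_reviews("funny"))
```

Note that `generate_reviews` never asks who wrote the text. If the output is funny enough, we don't much care whether it's the most probable thing any particular person would have said.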
Models thought of as "normative" are often used for choosing good actions, for example in economics. And of course in machine learning, even though GPT is trained using a predictive signal, we value it for its human-like output, not its ability to predict writers. Various schemes for aligned AI use modeled humans to choose good actions, or use a model of humans as a template for an AI that will choose good actions.
Matching the structure of language / intuition
We humans already have models of humans in mind (literally) when we talk about each other. Often we want to write down formal models in ways that match our mental models.
If we've always talked about humans as having specific "human values," to take the obvious example, we might rate models based on whether they have an object or obvious pattern that corresponds closely to our language and intuitions about human values.
Work on interpretability of human models often requires rating models this way. For example, your predictive model might be required by law or the client not to judge according to certain characteristics in a certain sense, and so you might ensure that it reifies those characteristics specifically so that it can avoid using them.
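As a minimal sketch of what "reifying a characteristic so the model can avoid it" might look like, suppose the model is a weighted sum over named features. The feature names, weights, and the `PROTECTED` set below are all hypothetical. This only covers the narrowest sense of "not judging according to" a characteristic: the named feature is excluded, but nothing here deals with other features that act as proxies for it.

```python
# Hypothetical: the characteristics the model is required not to use.
PROTECTED = {"age"}


def score(applicant, weights):
    """Weighted sum over named features, skipping any protected ones."""
    return sum(
        weights.get(name, 0.0) * value
        for name, value in applicant.items()
        if name not in PROTECTED
    )


weights = {"income": 2.0, "age": 5.0}
younger = {"income": 3, "age": 25}
older = {"income": 3, "age": 60}

# Because "age" is an explicit, named part of the model, we can check
# directly that changing it leaves the score unchanged.
print(score(younger, weights) == score(older, weights))
```

The check in the last line is only possible because the characteristic shows up as a distinct object in the model. A model with the same predictive power but no such object would give us nothing to point at.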
Value learning is typically construed as finding a good model of humans containing a "human values" object.
Advancing understanding of human cognition
We normally associate "understanding" or "explaining" the human mind with telling a detailed yet intuitive causal story about how it works - a sort of model building. We're also often interested in the computational or learning-theoretic properties of these models, which we then associate with the computational / learning-theoretic properties of the brain.
This use of models places the least emphasis on computational efficiency, because the model is used as a guide to human understanding, not for the sake of its outputs or inferences. Even if a description of a model is underspecified computationally, it can still advance our understanding of humans if it provides a useful framework for thinking about the brain. This use doesn't exclude the others - thinking about a description of a model of humans might help us predict human behavior, for instance.
Neuroscience, and neuroscience-inspired AI, are often very interested in these sorts of models. At a more coarse-grained level, academic debates in psychology often revolve around models intended to help us understand humans.
No model can or should be optimal for all of these purposes. No free lunch and all that - you're better at what you focus on and worse at what you don't.
By keeping in mind the focus of a particular human model, I can contextualize it and evaluate it according to its purposes, and avoid the mistake of trying to mash all the purposes together. I can also say something like "A lot of value learning work focuses on matching the structure of human intuition, but I think there are promising possibilities at the intersection of understanding human cognition and choosing good actions," and now you'll get what I mean.