When we're trying to do AI alignment, we're often studying systems which don't yet exist. This is a pretty weird epistemic activity, and seems really hard to get right. This post offers one frame for thinking about what we're actually doing when we're thinking about AI alignment: using parts of the space of maps to reason about parts of the space of intelligent systems.
In this post, we:
We hope that the content is mostly the second kind of obvious: obvious once you see things in this way, which you may already do. In our experience, this comes with a risk. If you read too fast, you may miss most of the nuance and useful insight that the deceptively simple model brings, or come away with a version of the model that is rounded off to something less useful (e.g. "yeah, there is this map and territory distinction"). As a meta-recommendation, we suggest reading this post slowly, and ideally trying to apply the model right away to some confusion or disagreement about AI alignment.
Imagine the space of possible intelligent systems:
Two things seem especially important about this space:
If we don’t get direct epistemic access to the space of systems, what are we doing when we reason about it?
Let’s imagine a second space, this time a space of “maps”:
The space of maps is an abstract representation of all the possible “maps” that can be constructed about the space of intelligent systems. The maps are ways of thinking about (parts of) the space of systems. For example:
When we’re reasoning about intelligent systems, we’re using some part of the space of maps to think about some part of the space of intelligent systems:
Different maps correspond to different regions of the space of intelligent systems.
Of course, thinking in terms of the space of systems and the space of maps is a simplification. Some of the ways that reality is more complicated:
We think that the space of systems and the space of maps together make a useful simplification, one that helps us think more clearly about future AI systems. Some salient examples of how this simplification can help:
When it comes to AI alignment, we need accurate maps which hold for systems which don’t exist yet, and which are good enough to help us build these systems in ways that are safe.
There are a few different properties it would be good for these maps to have:
And there are trade-offs here between the properties. For example:
A lot of AI alignment work involves taking maps that have been developed for thinking about one part of the space of systems, and applying them to a part of the space of systems that we hope includes “potentially dangerous future AI systems”. For example:
Being aware of which maps you are using and their potential limitations for the systems you want to study seems super useful for doing good research. 
We don’t know much about where in the space of systems potentially dangerous AI will be. As a result, one good bet seems to be to try to find maps that are general enough to cover every region of the space of systems where future AI could end up.
Given that we care about aligning AI to humans and human collectives, it also seems useful for maps to cover these areas of system-space as well (or more specifically, to cover relations between the human part of system space and the “possible future AI systems” part of system space).
Finding general maps isn’t the only promising approach here:
The ideas in this post come variously from Jan, Nora and Clem (some ideas come from one person; others were independently generated by multiple people) or from an older FHI project on AGI epistemics done by Jan with Chris van Merwijk and Ondřej Bajgar. Rose did most of the writing.
See also Design space of minds in general.
See this post for another discussion of this sort of epistemic challenge.
This post implicitly argues something similar. Visualising the space of AI systems here is also related.
Other ways of saying this: some maps are design paradigms/blueprints.
This post draws a distinction between maps (for understanding reality) and blueprints (for building new parts of reality). The way we’re using ‘maps’ here is broader and contains both of those kinds of map.
Cf. Adam Shimi on Epistemological Vigilance.
Cf. Adam Shimi on pluralism and “no one-size-fits-all epistemic strategy”.