A useful level distinction

by Charlie Steiner, 24th Feb 2018


Recently, I keep noticing the importance of distinguishing between something happening in reality and something happening in your model of reality. In part, I'm noticing it because I've been thinking about it, not vice versa, but I'll explain for a few specific cases, and maybe you'll see what I see.


Possibilities, or possible worlds, or what have you, aren't things that exist (waves hand) out there in reality. Instead, they're conveniences of our model of the world - and as such, they don't interact with matter out there, they interact with other parts of your model.

When people imagine a model of a mind, they often have "possibilities" as a basic moving part within that model. But there is no fundamental possibility-particle - instead, possibilities are imagined by mutating your model of the world in some way. Different minds can use different classes of models, and different methods of mutating those models, and thus end up with different imagined possibilities.

CDT is a good basic example of how to really pin this down - you know both the model class (states of a specific causal graph) and method of imagining (intervene at a specific node). Under UDT, the situation is even more extreme - the agent models a universe as a stochastic game, and picks a high-performing strategy. But it does this even if the real universe is deterministic - the model in UDT is not a model of the world per se, it is part of the machinery of generating these useful fictions.
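As a rough illustration of the CDT case, here is a minimal sketch of "intervene at a specific node" on a toy causal graph. The graph, the node names, and the `evaluate` helper are all hypothetical, made up for this example; the point is only that the imagined possibility comes from editing the model, and that nodes upstream of the intervention are left alone.

```python
from typing import Callable, Dict, Optional

# Hypothetical toy causal graph: rain -> sprinkler -> wet_grass, rain -> wet_grass.
# Each mechanism computes a node's value from the values of its parents.
Mechanism = Callable[[Dict[str, bool]], bool]

MECHANISMS: Dict[str, Mechanism] = {
    "rain": lambda v: False,                    # exogenous: it is not raining
    "sprinkler": lambda v: not v["rain"],       # sprinkler runs only when dry
    "wet_grass": lambda v: v["rain"] or v["sprinkler"],
}
ORDER = ["rain", "sprinkler", "wet_grass"]      # a topological order of the graph


def evaluate(interventions: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Compute every node, replacing intervened nodes with fixed values."""
    interventions = interventions or {}
    values: Dict[str, bool] = {}
    for node in ORDER:
        if node in interventions:
            # The intervention cuts the node off from its parents.
            values[node] = interventions[node]
        else:
            values[node] = MECHANISMS[node](values)
    return values


actual = evaluate()                         # the model's picture of what happens
imagined = evaluate({"sprinkler": False})   # an imagined possibility
```

In `imagined`, the grass ends up dry while `rain` keeps its original value, even though the unmodified model never has the sprinkler off without rain. Nothing outside the model was consulted to produce that possibility; a different model class or a different mutation rule would have produced a different one.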

This distinction is particularly important in the case of logical counterfactuals, like "imagine that 247 was prime." It feels like we can imagine such things, but the things we're imagining are not external inconsistent universes (how could they be?) - instead, we are making variations on our model of the world. Satisfactory accounts of logical counterfactuals (such as Omega could use to simulate the counterfactual in logical counterfactual mugging) aren't going to look like methods of constructing alternate universes where 247 is prime; they're going to look like useful methods of mutating world-models.
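To make the 247 example concrete, here is a deliberately crude sketch of a counterfactual implemented as a mutation of a world-model. The `WorldModel` class and its methods are hypothetical, and this is not offered as a satisfactory account of logical counterfactuals; it only shows which level the change happens at. Arithmetic itself (247 = 13 × 19) is never touched.

```python
from dataclasses import dataclass, field
from math import gcd
from typing import Dict


def is_prime(n: int) -> bool:
    """Ordinary trial division: what is actually true of n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


@dataclass
class WorldModel:
    """Hypothetical toy model holding beliefs about primality."""
    overrides: Dict[int, bool] = field(default_factory=dict)

    def believes_prime(self, n: int) -> bool:
        # An override wins over computation; that is the "mutation".
        return self.overrides.get(n, is_prime(n))

    def believed_totient(self, n: int) -> int:
        # A downstream belief: if n is taken to be prime, phi(n) = n - 1.
        if self.believes_prime(n):
            return n - 1
        return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

    def imagine(self, n: int, prime: bool) -> "WorldModel":
        """Return a mutated copy; the original model is left alone."""
        return WorldModel({**self.overrides, n: prime})


base = WorldModel()
imagined = base.imagine(247, True)
```

Here `imagined.believed_totient(247)` is 246 while `base.believed_totient(247)` is 216, and `is_prime(247)` stays `False` throughout. Which downstream beliefs get updated and which are left fixed is a choice made by whoever designs the mutation, and a different choice gives a different imagined world.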

Learning Abstract Objects:

Abstract objects, like the number 5, or morality, are real in the sense that they are citizens in good standing of the mental map. But humans, when thinking about them, think of them as "real" in a very general sense, which can lead to mistakes.

For example, when we try to teach morality to an AI, we might think of trying to teach it the mapping humans use from situations to value. But this approach runs into problems, and I think part of those problems is that it treats as an object-in-reality (the mapping) something that's an object-in-our-map.

Suppose we wanted to teach the AI to make predictions about the counterfactual universe where 247 is prime. One approach is training a classifier on some dataset of human statements about this counterfactual universe, to try and learn the pattern humans use to tell whether a statement is true or false. And given sufficient data, this would eventually become indistinguishable from human judgment. But this seems to lose something by assuming that you can learn about the counterfactual universe as if it had all the properties of an object. It would be much more satisfying if the AI could learn an approximation to humans' big, messy model of the world and of how they imagine counterfactual worlds, and answer based on that.

I think one big advantage of representing more of the process of making judgments about abstract objects, rather than just trying to represent the objects as a classifier, is that it's the right level at which to think about changing them. When we imagine a slightly different counterfactual world, we don't go to the counterfactual-world-object and change that. No, we go into our map and we change the process we're using to imagine a thing.

Or, for morality, when we imagine "the most moral action," we don't take the morality-object and find the input that maximizes it. We use our brain and look for the thing that it models as the most moral action, which is on the other side of the level distinction.


I'm still not sure that this distinction actually points out mistakes that people are making, or provides useful advice for the way forward. But I've been thinking about it a lot recently, and hopefully I've managed to communicate the gist of it.