Recently, I've constantly been noticing the importance of a certain sort of distinction: between something happening in reality, and something happening in your model of reality. In part, I'm noticing it because I've been thinking about it, not vice versa, but I'll walk through a few specific cases, and maybe you'll see what I see.

Counterfactuals:

Possibilities, or possible worlds, or what have you, aren't things that exist (waves hand) out there in reality. Instead, they're conveniences of your model of the world - and as such, they don't interact with matter out there; they interact with other parts of your model.

When people imagine a model of a mind, they often have "possibilities" as a basic moving part within that model. But there is no fundamental possibility-particle - instead, possibilities are imagined by mutating your model of the world in some way. Different minds can use different classes of models, and different methods of mutating those models, and thus end up with different imagined possibilities.

CDT is a good basic example of how to really pin this down - you know both the model class (states of a specific causal graph) and the method of imagining (intervening at a specific node). Under UDT, the situation is even more extreme - the agent models the universe as a stochastic game and picks a high-performing strategy. But it does this even if the real universe is deterministic - the model in UDT is not a model of the world per se, it is part of the machinery for generating these useful fictions.
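To make that picture concrete, here is a minimal sketch - mine, not the author's, with invented node names and a made-up run_model helper - of possibilities-as-model-mutations: a toy structural "causal graph" where a CDT-style counterfactual is just an intervention that overrides one node's equation and re-runs the model. Nothing in reality gets touched; the imagined possibility lives entirely in the model.

```python
# Minimal sketch (illustrative, not from the post): counterfactuals as
# mutations of a model. Each node is computed from earlier nodes; an
# intervention replaces one node's structural equation and the model is
# simply re-run to produce the imagined possibility.

def run_model(interventions=None):
    interventions = interventions or {}
    world = {}

    def node(name, compute):
        # An intervention overrides the node's usual structural equation.
        world[name] = interventions.get(name, compute)()
        return world[name]

    node("rain", lambda: True)
    node("sprinkler", lambda: not world["rain"])
    node("wet_grass", lambda: world["rain"] or world["sprinkler"])
    return world

actual = run_model()                                          # the model of reality
imagined = run_model(interventions={"rain": lambda: False})   # CDT-style counterfactual

print(actual)    # {'rain': True, 'sprinkler': False, 'wet_grass': True}
print(imagined)  # {'rain': False, 'sprinkler': True, 'wet_grass': True}
```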

This distinction is particularly important in the case of logical counterfactuals, like "imagine that 247 were prime." It feels like we can imagine such things, but the things we're imagining are not external inconsistent universes (how could they be?) - instead, you are making variations on your model of the world. Satisfactory accounts of logical counterfactuals (of the sort Omega could use to simulate the counterfactual in logical counterfactual mugging) aren't going to look like methods of constructing alternate universes where 247 is prime; they're going to look like useful methods of mutating world-models.
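As a toy illustration of "mutating a world-model" rather than building an inconsistent universe, here is a sketch of my own (the WorldModel class and its overrides dictionary are invented for illustration): the counterfactual is just an override in a belief store, and downstream reasoning consults the possibly-mutated beliefs instead of the arithmetic itself.

```python
# Minimal sketch (illustrative): a logical counterfactual as an override
# in a belief store, not as a consistent alternate universe.

def actually_prime(n):
    # Ordinary primality test -- this is the "real" arithmetic.
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

class WorldModel:
    def __init__(self, overrides=None):
        # overrides: beliefs forced to a value, e.g. {("prime", 247): True}
        self.overrides = overrides or {}

    def prime(self, n):
        # Downstream reasoning consults the (possibly mutated) model,
        # not arithmetic itself.
        return self.overrides.get(("prime", n), actually_prime(n))

    def has_nontrivial_divisor(self, n):
        return not self.prime(n)

base = WorldModel()
imagined = WorldModel(overrides={("prime", 247): True})

print(base.prime(247), base.has_nontrivial_divisor(247))         # False True
print(imagined.prime(247), imagined.has_nontrivial_divisor(247)) # True False
```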

Learning Abstract Objects:

Abstract objects, like the number 5, or morality, are real in the sense that they are citizens in good standing of the mental map. But humans, when thinking about them, think of them as "real" in a very general sense, which can lead to mistakes.

For example, when we try to teach morality to an AI, we might think of trying to teach it the mapping humans use from situations to value. But this approach runs into problems, and I think part of the problem is that it treats something that's an object-in-our-map (the mapping) as an object-in-reality.

Suppose we wanted to teach the AI to make predictions about the counterfactual universe where 247 is prime. One approach is to train a classifier on some dataset of human statements about this counterfactual universe, to try to learn the pattern humans use to tell whether a statement is true or false. Given sufficient data, this would eventually become indistinguishable from human judgment. But this seems to lose something in the process of assuming that you can learn about the counterfactual universe as if it had all the properties of an object; it would be much more satisfying if the AI could learn an approximation to humans' big, messy model of the world, and how they imagine counterfactual worlds, and answer based on that.
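For concreteness, here is a minimal sketch of what the classifier approach described above might look like. The statements and labels are invented stand-ins for a dataset of human judgments about the counterfactual; the point is that the resulting model imitates the pattern in the labels without representing any world-model at all.

```python
# Minimal sketch (illustrative; toy data invented) of the "train a
# classifier on human statements" approach.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

statements = [
    "247 has no divisors other than 1 and itself",
    "247 is divisible by 13",
    "247 is an odd number",
    "247 is a composite number",
]
human_labels = [1, 0, 1, 0]  # 1 = a human calls it true in the counterfactual

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(statements, human_labels)

# The classifier imitates the judgments, but it has no model of the world
# being imagined -- only the pattern in the labeled statements.
print(clf.predict(["247 is divisible by 19"]))
```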

I think one big advantage of representing more of the process of making judgments about abstract objects, rather than just trying to represent the objects as a classifier, is that it's the right level at which to think about changing them. When we imagine a slightly different counterfactual world, we don't go to the counterfactual-world-object and change that. No, we go into our map and we change the process we're using to imagine a thing.

Or, for morality, when we imagine "the most moral action," we don't take the morality-object and find the input that maximizes it. We use our brain and look for the thing that it models as the most moral action, which is on the other side of the level distinction.

III.

I'm still not sure that this distinction actually points out mistakes that people are making, or provides useful advice for the way forward. But I've been thinking about it a lot recently, and hopefully I've managed to communicate the gist of it.

Comments:
For example, when we try to teach morality to an AI, we might think of trying to teach it the mapping humans use from situations to value. But this approach runs into problems, and I think part of the problem is that it treats something that's an object-in-our-map (the mapping) as an object-in-reality.

At the risk of tooting my own horn too much, this is a big part of what I'm getting at in my recent work on noematological AI alignment: you can't align over shared, "objective" values, because they don't exist.

A question then for both of you – isn't the object in this case exactly one that exists in both *reality* and our map of reality? It's not obvious to me that something like this *isn't* objective and even potentially knowable. It's information, so it must be stored somewhere in some kind of physical 'medium', and the better it works as a component of our map, the more likely it is that it corresponds to some thing-in-reality.

Interestingly, it just occurred to me that stuff like this – 'information stuff' – is exactly the kind of thing that, to the degree it's helpful in a 'map', is something we should expect to find more or less as-is in the world itself.

If there's a tree in both the territory and my map, that's just the usual state of affairs. But when I talk about the tree, you don't need to look at my map to know what I mean; you can just look at the tree. Morality is different - we intuitively think of morality as something other people can see, but this works because there is a common factor (our common ancestry), not because you can actually see the morality I'm talking about.

We could theoretically cash out statements about morality in terms of complicated evaluations of ideas and percepts, but this doesn't rescue our intuitions about, e.g., what arguments about morality are doing. Unlike the case of a tree and our idea of the tree, I think there really is a mismatch between our computation of morality and our idea of it.

Interestingly, it just occurred to me that stuff like this – 'information stuff' – is exactly the kind of thing that, to the degree it's helpful in a 'map', is something we should expect to find more or less as-is in the world itself.

The interesting thing about information is that it's not stuff the same way matter is, but something that is created via experience, and it exists only so long as there is physical stuff interacting to create it via energy transfer. And this is the key to addressing your question: the map (ontology) is only information, while the territory (the ontic) is stuff and its experiences. It is only across the gap of intentionality that ontology is made to correspond to the ontic.

That's kind of cryptic, but maybe I do a better job of laying out what's going on here.