(This is not (quite) just a re-hashing of the homunculus fallacy.)
I'm contemplating what it would mean for machine learning models such as GPT-3 to be honest with us. Honesty involves conveying your subjective experience... but what does it mean for a machine learning model to accurately convey its subjective experience to us?
You've probably seen an optical illusion like this:
You've probably also heard an explanation something like this:
"We don't see the actual colors of objects. Instead, the brain adjusts colors for us, based on surrounding lighting cues, to approximate the surface pigmentation. In this example, it leads us astray, because what we are actually looking at is a false image made up of surface pigmentation (or illumination, if you're looking at this on a screen)."
This explanation definitely captures something about what's going on, but there are several subtle problems with it:
- It's a homunculus fallacy! It explains what we're seeing by imagining that there is a little person inside our heads, who sees (as if projected on a screen) an adjusted version of the image. The brain adjusts the brightness to remove shadows, and adjusts colors to remove effects of colored light. The little person therefore can't tell that patch A is actually the same color as patch B.
- Even if there were a little person, the argument does not describe my subjective experience, because I can still see the shadow! I experience the shadowed area as darker than the unshadowed area. So the homunculus story doesn't actually fit what I see at all!
- I can occasionally and briefly get my brain to recognize A and B as the same shade. (It's very difficult, and it quickly snaps back.)
My point is that even cognitive psychologists, who are trained to avoid the homunculus fallacy, go and make it again, because they don't have a better alternative.
One thing the homunculus story gets right which seems difficult to get right is that when you show me the visual illusion, and explain it to me, I can believe you, even if my brain is still seeing the illusion. I'm using the language "my brain" vs "me" precisely because the homunculus fallacy is a pretty decent model here: I know that the patches are the same shade, but my brain insists they're different. It is as if I'm a little person watching what my brain is putting on a projector: I can believe or disbelieve it.
For example, a simple Bayesian picture of what the brain is doing would involve a probabilistic "world model". The "world model" updates on visual data and reaches conclusions. Nowhere in this picture is there any room for the kind of optical illusion we routinely experience: either the Bayesian would be fooled (there is no awareness that it's being fooled), or not (there is no perception of the illusion; the patches look the same shade), or some probabilistic mixture of the two ("I'm not sure whether the patches are the same color").
Actually, this downplays the problem, because it's not even clear what it means to ask a Bayesian model about its subjective experience.
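To make the point concrete, here is a minimal sketch of the Bayesian picture (the priors and likelihoods are made-up numbers, purely for illustration). The model's entire verdict about the patches is a single posterior probability; there is no second output corresponding to "how it looks" as distinct from "what is believed":

```python
# Toy Bayesian "world model": two hypotheses about whether patches A and B
# have the same surface shade, updated on one observation (B's pixels are
# brighter, but B sits in a region the model infers to be shadowed).
# All numbers here are illustrative assumptions, not measured values.

def posterior(prior_same, lik_same, lik_diff):
    """Posterior probability of 'same shade' after the observation."""
    num = prior_same * lik_same
    return num / (num + (1 - prior_same) * lik_diff)

p = posterior(prior_same=0.5, lik_same=0.9, lik_diff=0.4)
print(round(p, 3))  # a single number, e.g. 0.692
```

The output is one credence. Nothing in this model can simultaneously "see" the patches as different while "knowing" they are the same; that dual state is exactly what the homunculus story captures and the plain Bayesian picture omits.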
When I've seen the homunculus fallacy discussed, I've always seen it maligned as this bad mistake. I don't recall ever seeing it posed as a problem: we don't just want to discard it, we want to replace it with a better way of reasoning. I want to have a handy term pointing at this problem. I haven't thought of anything better than the homunculus problem yet.
The Homunculus Problem: The homunculus fallacy is a terrible picture of the brain (or machine learning models), yet, any talk of subjective experience (including phenomena such as visual illusions) falls into the fallacious pattern of "experience" vs "the experiencer". ("My brain shows me A being darker than B...")
The homunculus problem is a superset of the homunculus fallacy, in the following sense: if something falls prey to the homunculus fallacy, it involves a line of reasoning which (explicitly or implicitly) relies on a smaller part which is actually a whole agent in itself. If something falls prey to the homunculus problem, it could either be that, or it could be a fully reductive model which (may explain some things, but) fails to have a place for our subjective experience. (For example, a Bayesian model which lacks introspection.)
This is not the hard problem of consciousness, because I'm not interested in the raw question of how conscious experience arises from matter. That is: if by "consciousness" we mean the metaphysical thing which no configuration of physical matter can force into existence, I'm not talking about that. I'm talking about the neuroscientist's consciousness (what philosophers might call "correlates of consciousness").
It's just that, even when we think of "experience" as a physical thing which happens in brains, we end up running into the homunculus fallacy when trying to explain some concepts.
I'm tempted to say that this is like the hard problem of consciousness, in that the main thing to avoid is mistaking it for an easier problem. (IE, it's called the "hard" problem of consciousness to make sure it's not confused with easier problems of consciousness.) I don't think you get to claim you've solved the homunculus problem just because you have some machine-learning model of introspection. You need to provide a way of talking about these things which serves well in a lot of examples.
This is related to embedded agency, because part of the problem is that many of our formal models (Bayesian decision theory, etc.) don't include introspection: "world models" are only about the external world, so agents are incapable of reflecting on their "internal experience" in any way.
This feels related to Kaj Sotala's discussion of no-self. What does it mean to form an accurate picture of yourself?
Added: Nontrivial Implication
A common claim among scientifically-minded people is that "you never actually observe anything directly, except raw sensory data". For example, if you see a glass fall from a counter and shatter, you're actually seeing photons hitting your eye, and inferring the existence of the glass, its fall, and its shattering.
I think an easy mistake to make, here, is to implicitly think as if there's some specific boundary where sensory impressions are "observed" (eg, the retina, or v1). This in effect posits a homunculus after this point.
In fact, there is no such firm boundary. It's easy to argue that we don't really observe the light hitting the retina, because it is conveyed imperfectly (and much-compressed) to the optic nerve, and thereby to v1. But by the time the data is in v1, it's already a bit abstracted and processed. There's no theater of consciousness with access to the raw data.
Furthermore, if someone wanted to claim that the information in v1 is what's really truly "observed", we could make a similar case about how information in v1 is conveyed to the rest of the brain. (This is like the step where we point out that the homunculus would need another homunculus inside of it.) Every level of signal processing is imperfect, intelligently compressed, and could be seen as "doing some interpretation"!
I would argue that this kind of thinking is making a mistake by over-valuing low-level physical reality. Yes, low-level physical reality is what everything is implemented on top of. But when dealing with high-level objects, we care about the interactions of those objects. Saying "we don't really observe the glass directly" is a lot like saying "there's not really a glass (because it's all just atoms)". I claim there really is a glass, and we really observe it.
So, I would argue that we "really do experience" the glass falling and breaking, in the most sensible sense of direct experience.
However, if you buy this argument, then you also have to buy a couple of surprising conclusions:
- Direct experience of something does not grant you complete knowledge of that thing. Yes, I claim that we can directly experience external reality; but this does not imply that we perceive every crack in the glass in perfect detail, or whatever.
- Furthermore, we can be wrong about our direct experience! The image of the glass falling could