The AI in Mary's room


In the Mary's room thought experiment, Mary is a brilliant scientist in a black-and-white room who has never seen any colour. She can investigate the outside world through a black-and-white television, and has piles of textbooks on physics, optics, the eye, and the brain (and everything else of relevance to her condition). Through this she knows everything intellectually there is to know about colours and how humans react to them, but she hasn't seen any colours at all.

After that, when she steps out of the room and sees red (or blue), does she learn anything? It seems that she does. Even if she doesn't technically learn something, she experiences things she hadn't ever before, and her brain certainly changes in new ways.

The argument was intended as a defence of qualia against certain forms of materialism. It's interesting, and I don't intent to solve it fully here. But just like I extended Searle's Chinese room argument from the perspective of an AI, it seems this argument can also be considered from an AI's perspective.

Consider a RL agent with a reward channel, but which currently receives nothing from that channel. The agent can know everything there is to know about itself and the world. It can know about all sorts of other RL agents, and their reward channels. It can observe them getting their own rewards. Maybe it could even interrupt or increase their rewards. But, all this knowledge will not get it any reward. As long as its own channel doesn't send it the signal, knowledge of other agents rewards - even of identical agents getting rewards - does not give this agent any reward. Ceci n'est pas une récompense.

This seems to mirror Mary's situation quite well - knowing everything about the world is no substitute from actually getting the reward/seeing red. Now, a RL's agent reward seems closer to pleasure than qualia - this would correspond to a Mary brought up in a puritanical, pleasure-hating environment.

Closer to the original experiment, we could imagine the AI is programmed to enter into certain specific subroutines, when presented with certain stimuli. The only way for the AI to start these subroutines, is if the stimuli is presented to them. Then, upon seeing red, the AI enters a completely new mental state, with new subroutines. The AI could know everything about its programming, and about the stimulus, and, intellectually, what would change about itself if it saw red. But until it did, it would not enter that mental state.

If we use ⬜ to (informally) denote "knowing all about", then ⬜(X→Y) does not imply Y. Here X and Y could be "seeing red" and "the mental experience of seeing red". I could have simplified that by saying that ⬜Y does not imply Y. Knowing about a mental state, even perfectly, does not put you in that mental state.

This closely resembles the original Mary's room experiment. And it seems that if anyone insists that certain features are necessary to the intuition behind Mary's room, then these features could be added to this model as well.

Mary's room is fascinating, but it doesn't seem to be talking about humans exclusively, or even about conscious entities.