An Agent Based Consciousness Model (unfortunately it's not computable)

Logan Zoellner

This post is a response to Scott Aaronson's "Why I Am Not An Integrated Information Theorist (or, The Unconscious Expander)".

He proposes the following "test" for any theory of consciousness:

The test of such a theory is whether it can produce results agreeing with “commonsense intuition”: for example, whether it can affirm, from first principles, that (most) humans are conscious; that dogs and horses are also conscious but less so; that rocks, livers, bacteria colonies, and existing digital computers are not conscious (or are hardly conscious); and that a room full of people has no “mega-consciousness” over and above the consciousnesses of the individuals.

In this post, I will describe an Agent Based Consciousness model, and consider how well it passes Aaronson's "test".

Intuitive Model

The intuition behind my Agent Based Consciousness Model is the following claim. In order to interact with the world, agents build models of the world that allow them to make predictions about the results of their interactions. Consciousness is directly related to the complexity and predictive accuracy of those models. A being with a very simple model of the world (or one that makes poor predictions) is less "conscious" of the world around it than one with a more detailed model (that it uses to make accurate predictions).

Note that this definition explicitly excludes self-consciousness. A being which does not interact with or predict the world around it is defined as unconscious regardless of how rich its inner mental life is. This can be fixed (by partitioning an agent from its observations about itself), or it can be accepted as a weakness of this model.

This model also focuses on the size complexity of the modeled agent, not its computational complexity or processing speed. While we might intuitively think that a human who thinks twice as fast is twice as conscious, this model does not take that effect into account. It may be helpful to think of this as a qualitative measure of consciousness rather than a quantitative one. For example, Ents think much more slowly than human beings, but their thoughts are not any less rich as a result.

Finally, this model is pan-psychic. It attributes a Consciousness Score to all agents and is permissive about what can be defined as an "agent". It may be helpful to think of this model as describing the "degree of consciousness" rather than making binary claims about which agents are/are not conscious.

In addition, this model makes no moral claims about consciousness. While it predicts (I believe) that GPT-3 is about as conscious as a fruit-fly, it does not assign moral worth (or deny it) to either of these.

Conscious Agents as World State Predictors

For the purpose of this theory, we define an "agent" in the following way. An agent has:

A number of sensors (>0), through which it receives information about the outside world
A number of actuators (>=0), through which it affects the world
A model of the world, whereby it predicts the output of the sensors conditional on the actuators

Note that the boundary defining an agent is arbitrary. For instance, if you see a human driving a car, you could define the agent as the full human-car system, only the human, or only the human's brain. For each of these definitions, we would compute a different "consciousness score", and ultimately conclude that most of the "consciousness" of the human-car system was contained in the human brain.

Now, we define the "consciousness score" of a system by its ability to accurately predict the world around it.

Regardless of how we define our agent, it can make predictions of the form: "if I perform the following actions (sequence of outputs given its actuators), I will observe the following sensations (sequence of inputs on its sensors)." We will consider the set of all such possible predictions, and how accurate the agent's predictions are. Note that these predictions can be binary (if I flip this switch the light will turn on) or real-valued (if I turn up the thermostat, the room will get warmer), depending on the types of sensors/actuators. We also consider the set of possible predictions an agent could make, not the set of actual predictions.

Unfortunately, it may be difficult to talk about possible predictions if an agent is purely deterministic (since there is only one possible prediction, the one it will actually make). For this reason, we imagine an external "interrogator" who could potentially "ask" the system to make predictions.

It is also imperative that we define what it means to "make" a prediction. Obviously, if the agent could just tell the interrogator what it predicts, this would be easy. But even humans have difficulty describing sensations as simple as the color red. So instead, we merely require that there be some internal "register" which holds the prediction, given a "question" (sequence of inputs) from the interrogator.

As with the human-driving-car example, the consciousness score will depend subtly on how the interrogator and register are defined. However, we might hope that for reasonable definitions the score will not change too much, or that "most" of the consciousness is found to be stored where common-sense dictates it ought to be.

The formal model

To formalize this, we consider a simple Agent $A$, such as one that might be used in reinforcement learning. It has two functions. The first is an "observation function", which merely takes a sequence of sensory inputs $s$ and registers its state. Call this function $O$; then the equation $O_{A}(s) \to R$ says that "the agent computes observations of its environment."

An agent can also guess what its observations will be if it undertakes a sequence of actions $a$. Define a function $G_{A}(s, a) \to R$, which predicts the value of its register $R$ given a series of inputs $s$ and actions $a$. Finally, define an Interrogator $I$, which "asks" questions of $A$ by feeding it a sequence of inputs $s_{I}$.
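As a rough illustration, the two functions and the Interrogator can be written as a small Python interface. This is a minimal sketch under my own assumptions: the names `observe`, `guess`, `interrogate` and the toy `MeanAgent` are hypothetical, and the register is assumed to hold a single real number.

```python
from typing import List, Protocol, Sequence

Register = float  # assumption: the register holds one real number


class Agent(Protocol):
    """Hypothetical interface mirroring O_A and G_A from the text."""

    def observe(self, s: Sequence[float]) -> Register:
        """O_A(s) -> R: store a reading of the sensory inputs s in the register."""
        ...

    def guess(self, s: Sequence[float], a: Sequence[int]) -> Register:
        """G_A(s, a) -> R: predicted register value if actions a follow inputs s."""
        ...


class MeanAgent:
    """Toy agent with no actuators: it predicts that the future looks like the past average."""

    def observe(self, s: Sequence[float]) -> Register:
        # Register the most recent sensory input.
        return s[-1] if s else 0.0

    def guess(self, s: Sequence[float], a: Sequence[int]) -> Register:
        # With zero actuators the action sequence a is ignored.
        return sum(s) / len(s) if s else 0.0


def interrogate(agent: Agent, questions: List[Sequence[float]], a: Sequence[int]) -> List[Register]:
    """The Interrogator I: feed each question s_I to the agent and read back its register."""
    return [agent.guess(s_I, a) for s_I in questions]


print(interrogate(MeanAgent(), [[1.0, 0.0], [1.0, 1.0, 1.0]], a=[]))  # [0.5, 1.0]
```

Using a structural `Protocol` means any object with matching `observe` and `guess` methods counts as an agent, which loosely echoes the post's permissive view of what an "agent" is.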

We can now consider the set of all possible predictions $P_{A} = \{ G_{A}(s_{I_k}, a) \mid \forall s_{I_k} \}$, where $s_{I_k}$ ranges over the questions the Interrogator could ask, and ask how accurate each of these predictions is. Namely, if the agent actually performs the sequence of actions $a$, how well does the value $O_{A}$ observed in the register $R$ correlate with the predicted value? Note that simply doing nothing (waiting) for a period of time is a valid sequence of actions. If the predictions are all binary, we could simply compute the percentage correct. For a more complicated register, we can use joint entropy.
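Both accuracy measures are easy to estimate from paired samples of predicted and observed register values. The sketch below assumes discrete register values; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List


def percent_correct(predicted: List[bool], observed: List[bool]) -> float:
    """Fraction of binary predictions that matched the observed register value."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


def joint_entropy(observed: List[Hashable], predicted: List[Hashable]) -> float:
    """Empirical joint entropy H(O, P) in bits, estimated by counting (observed, predicted) pairs."""
    counts = Counter(zip(observed, predicted))
    n = len(observed)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(percent_correct([True, True, False, True], [True, False, False, True]))  # 0.75
print(joint_entropy([0, 1, 0, 1], [0, 1, 0, 1]))  # 1.0: predictions track observations
print(joint_entropy([0, 1, 0, 1], [0, 0, 1, 1]))  # 2.0: predictions unrelated to observations
```

For fixed marginal distributions, a lower joint entropy means the predictions and observations are more tightly coupled, since $H(O, P) = H(O) + H(P) - I(O; P)$.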

Suppose at time $t$ we consider all possible sequences of actions $\mathit{PossibleActions}$ and ask the agent to predict the outcome of each sequence. This gives us a prediction score (since it is an entropy, a lower score means more accurate predictions):

$\mathit{PredictionScore}_{A} = \sum_{a \in \mathit{PossibleActions}} \mathit{JointEntropy}\left(O_{A}(s_{t+k}),\, P_{A}(s_{t}, a)\right)$

Now, how do we assign a "consciousness score" to such a system?

We would like to know how complex this system is. Intuitively, simple systems should be less conscious, and complex ones (potentially) more conscious.

To measure complexity, we ask "what is the minimum Kolmogorov complexity of a system at least as accurate as the agent we are measuring?". That is, if we imagine replacing the agent with one simulated on a Turing machine, how many bits of information would be required to initialize such a machine? Kolmogorov complexity has the nice property of measuring the "incompressible information" contained in a system. And it has the unfortunate property of being uncomputable.
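Although Kolmogorov complexity can't be computed, any real compressor gives an upper bound on it, up to the constant size of the decompressor. The sketch below uses Python's `zlib` purely as a stand-in (an assumption, not part of the model); the bound can be very loose, and nothing guarantees it approaches the true minimum.

```python
import os
import zlib


def kc_upper_bound_bits(data: bytes) -> int:
    """Length of the zlib-compressed data in bits.

    This bounds the Kolmogorov complexity from above, up to an additive constant
    (the size of a zlib decompressor); it is not the complexity itself.
    """
    return 8 * len(zlib.compress(data, 9))


structured = b"01" * 5000         # 10,000 bytes following an obvious pattern
random_bytes = os.urandom(10000)  # 10,000 bytes with (almost surely) no pattern

print(kc_upper_bound_bits(structured))    # a tiny fraction of 80,000
print(kc_upper_bound_bits(random_bytes))  # roughly 80,000
```

The patterned string shrinks to a small fraction of its length, while the random one does not shrink at all, which is the "incompressible information" the measure is meant to capture.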

Finally, we divide this score by the size of the register $R$. This prevents two human beings from counting as twice as conscious as one human being.

So our consciousness score is

$\mathit{ConsciousnessScore}(A) = \min \{ KC(X) \mid PS(X) \leq PS(A) \} / |R|$

where $KC$ is the Kolmogorov complexity and $PS$ is the PredictionScore.
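Because the true minimum ranges over every possible machine, it cannot be evaluated directly. Restricting it to a finite, hand-picked list of candidate machines gives an upper bound on the score. In this sketch the `Candidate` type, its fields, and the numbers in the usage line are hypothetical placeholders, not measurements.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate machine X."""
    description_bits: int    # stand-in for KC(X)
    prediction_score: float  # PS(X); lower means more accurate


def consciousness_score(agent_ps: float, candidates: List[Candidate], register_bits: int) -> float:
    """min{KC(X) | PS(X) <= PS(A)} / |R|, with the minimum taken over `candidates` only."""
    eligible = [c.description_bits for c in candidates if c.prediction_score <= agent_ps]
    if not eligible:
        raise ValueError("no candidate predicts at least as well as the agent")
    return min(eligible) / register_bits


candidates = [Candidate(1, 0.81), Candidate(16, 0.6), Candidate(200, 0.5)]
print(consciousness_score(0.6, candidates, register_bits=1))  # 16.0
```

Only the 16-bit and 200-bit candidates predict at least as well as an agent with score 0.6, so the smaller of the two sets the score. Adding more candidates can only lower the result toward the true value.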

A Worked example

Consider an agent with one sensor, which observes the output of a biased coin flip, and zero actuators:

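class SimpleAgent:
    def __init__(self):
        # running tally: +1 for each heads (True), -1 for each tails (False)
        self.count = 0
        self._register = False
    def predict(self, obs):
        # update count
        if obs:
            self.count += 1
        else:
            self.count -= 1
        # predict whichever side has been observed more often so far
        if self.count > 0:
            self.registerPrediction(True)
        else:
            self.registerPrediction(False)
    def registerPrediction(self, value):
        self._register = value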

In this case, the agent could be embodied as a computer with two buttons, one for heads and one for tails. The Interrogator can interact with the Agent by flipping the biased coin and pressing the corresponding button (which sends True or False to the `predict` method above). Over time, the Agent will (with high probability) always register the side the coin is biased towards.

Suppose the coin is actually biased towards heads (landing heads with probability 75%). Then a Turing machine which always outputs 1 (heads) will, in the long run, have the same prediction accuracy as our Agent, so the Consciousness Score of our Agent is

$\mathit{ConsciousnessScore}(\mathit{SimpleAgent}) = \min \{ KC(X) \mid JE(P_{X}, O) \leq JE(P_{A}, O) \} / |R| = 1/1 = 1$
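A quick simulation can check the claim that a machine which always outputs heads does as well as SimpleAgent on this coin. To keep the snippet self-contained it inlines SimpleAgent's tally rule instead of reusing the class; the 75% bias comes from the text, while the number of flips and the random seed are arbitrary choices.

```python
import random


def simulate(flips: int = 10000, p_heads: float = 0.75, seed: int = 0):
    """Return (tally accuracy, always-heads accuracy) on a coin that lands heads with p_heads."""
    rng = random.Random(seed)
    count = 0  # SimpleAgent's tally: +1 per heads, -1 per tails
    tally_hits = constant_hits = 0
    for _ in range(flips):
        tally_prediction = count > 0     # what SimpleAgent's register currently holds
        heads = rng.random() < p_heads   # the Interrogator flips the biased coin
        tally_hits += tally_prediction == heads
        constant_hits += heads           # the constant machine always predicts heads
        count += 1 if heads else -1
    return tally_hits / flips, constant_hits / flips


tally_acc, constant_acc = simulate()
print(f"tally: {tally_acc:.3f}  always-heads: {constant_acc:.3f}")
```

Apart from the first few flips, before the tally turns positive, the two predictors make identical predictions, so both should land near 0.75.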

Now suppose instead of a biased coin, our environment consisted of a coin that periodically switched (every 100 flips) from being biased towards heads to being biased towards tails.

Consider the following agent:

class BetterAgent:
    def __init__(self):
        # the last 10 observations: +1 for heads, -1 for tails
        self.recent = []
        self._register = False
    def predict(self, obs):
        # update recent
        if obs:
            self.recent += [1]
        else:
            self.recent += [-1]
        # trim recent to the last 10 observations
        self.recent = self.recent[-10:]
        # predict whichever side was more common recently
        if sum(self.recent) > 0:
            self.registerPrediction(True)
        else:
            self.registerPrediction(False)
    def registerPrediction(self, value):
        self._register = value

How do we calculate the Consciousness Score of this system? First we need to find a Turing machine that predicts its environment at least as well as BetterAgent. Suppose the smallest such Turing machine could be initialized by a 16-bit string.

Then $\mathit{ConsciousnessScore}(\mathit{BetterAgent}) = \min \{ KC(X) \mid JE(P_{X}, O) \leq JE(P_{A}, O) \} / |R| = 16/1 = 16$

Note that BetterAgent is able to adapt to a more complex environment. As a result it has a higher score (but only in the more complex environment).
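To see the difference in the switching environment, the following sketch runs SimpleAgent's tally rule and BetterAgent's 10-flip window side by side. The switch every 100 flips and the window of 10 follow the text; using a 75% bias in both phases, the number of flips, and the seed are my assumptions.

```python
import random


def simulate_switching(flips=20000, period=100, bias=0.75, seed=0):
    """Return (tally accuracy, window accuracy) on a coin whose bias flips every `period` flips."""
    rng = random.Random(seed)
    count, recent = 0, []  # SimpleAgent's tally and BetterAgent's window
    tally_hits = window_hits = 0
    for t in range(flips):
        heads_phase = (t // period) % 2 == 0
        p_heads = bias if heads_phase else 1 - bias
        heads = rng.random() < p_heads
        tally_hits += (count > 0) == heads         # SimpleAgent's current prediction
        window_hits += (sum(recent) > 0) == heads  # BetterAgent's current prediction
        count += 1 if heads else -1
        recent = (recent + [1 if heads else -1])[-10:]
    return tally_hits / flips, window_hits / flips


tally_acc, window_acc = simulate_switching()
print(f"tally: {tally_acc:.3f}  window: {window_acc:.3f}")
```

The tally should hover near chance, because after each switch its count has to unwind a whole phase of evidence, whereas the window forgets old flips after 10 steps and recovers quickly.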

Some nice predictions of this model

How well does this theory pass Aaronson's test?

that (most) humans are conscious;

We haven't defined the word "conscious" yet, so let's do that here. Our consciousness score is a scalar value, so we can simply set the threshold for "conscious" at some level below the lowest human score.

that dogs and horses are also conscious but less so;

Given that dog and horse brains are smaller than human brains, it seems almost inevitable that the Kolmogorov complexity required to simulate them is also lower.

that rocks, livers, bacteria colonies, and existing digital computers are not conscious (or are hardly conscious);

Rocks don't make predictions about reality, so the smallest machine matching them only has to reproduce their register: its Kolmogorov complexity simply equals the size of the register, giving a consciousness score of 1. Let's define 1 as "not conscious at all".

Bacteria and digital computers are a more complicated case. Both of them make predictions and thus will be at least slightly "conscious". In the case of bacteria, they can probably be simulated by very simple machines. But a complex model like GPT-3 has billions of parameters. Even assuming a very generous compressibility (of say 10-100x), that sounds like a lot of consciousness.

How does GPT-3 compare to human beings on this metric? The problem is, of course, that we can't actually compute the consciousness score for humans (or GPT-3, for that matter). But given that the human brain is able to store a petabyte of information, it seems likely that GPT-3's "consciousness score" is 3-5 orders of magnitude lower. Merely comparing number of parameters, GPT-3 is somewhere between a fruit-fly and a lizard.
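The back-of-the-envelope comparison can be made explicit. This sketch uses GPT-3's publicly reported 175 billion parameters, assumes (my assumption) 2 bytes per parameter, takes the petabyte figure from the text, and applies the 10-100x compressibility only to GPT-3.

```python
import math

GPT3_PARAMS = 175e9  # publicly reported GPT-3 parameter count
BYTES_PER_PARAM = 2  # assumption: weights stored as 16-bit floats
BRAIN_BYTES = 1e15   # the petabyte estimate from the text

gpt3_bytes = GPT3_PARAMS * BYTES_PER_PARAM  # 3.5e11 bytes
gap = math.log10(BRAIN_BYTES / gpt3_bytes)  # orders of magnitude, before compression
print(f"uncompressed: {gap:.1f}")           # uncompressed: 3.5
for factor in (10, 100):                    # the text's "generous compressibility" range
    print(f"{factor}x: {gap + math.log10(factor):.1f}")  # 10x: 4.5, then 100x: 5.5
```

Under these assumptions the gap comes out at roughly 3.5 to 5.5 orders of magnitude, broadly in line with the estimate above; it ignores any compressibility of the brain itself.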

and that a room full of people has no “mega-consciousness” over and above the consciousnesses of the individuals.

Because we divide $KC$ by $|R|$, a room full of people isn't any more conscious than the individuals (except to the degree that they can make more accurate predictions by working together).

As an additional observation, sleeping/unconscious humans have lower consciousness scores than waking ones (since they cannot make predictions related to their external senses) but probably not significantly so (since even unconscious human beings have some external sensations and also predict the content of their dreams).

Utility Consciousness monsters

Unfortunately, as with Integrated Information Theory, it is possible to game this Agent Based Consciousness model. Most random strings have Kolmogorov complexity nearly equal to their length. Hence, the following agent has unbounded "consciousness".

The agent: a memory storing a large random string, capable of outputting the value of that string to a single-bit register.

The environment: an identical copy of that random string, which when requested by the agent's actuators presents the corresponding bit to the agent's sensors.
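The construction is simple enough to run. In this sketch the class and function names, the 4096-byte size, and the bit-indexing scheme are my own illustrative choices. The agent predicts every bit perfectly, and `zlib` fails to shrink the string, which is consistent with (though not a proof of) the string being incompressible.

```python
import os
import zlib

N_BYTES = 4096                # arbitrary; a larger string gives a larger "monster"
secret = os.urandom(N_BYTES)  # the random string shared by agent and environment


class MemorizingAgent:
    """The agent from the text: a stored random string and a one-bit register."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.register = 0

    def predict(self, i: int) -> int:
        # Predict bit i of the environment by copying bit i of its own memory.
        self.register = (self.memory[i // 8] >> (i % 8)) & 1
        return self.register


def environment_bit(i: int) -> int:
    """The environment: an identical copy of the string, revealing bit i on request."""
    return (secret[i // 8] >> (i % 8)) & 1


agent = MemorizingAgent(secret)
n_bits = 8 * N_BYTES
accuracy = sum(agent.predict(i) == environment_bit(i) for i in range(n_bits)) / n_bits
kc_bound = 8 * len(zlib.compress(secret, 9))  # upper bound on KC, via zlib
print(accuracy, kc_bound)
```

Doubling `N_BYTES` roughly doubles the compressed size while the register stays at one bit, which is the sense in which this agent's score grows without bound.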

Of course, storing a random string isn't very useful in our world, so one way to overcome this objection would be to require that the environment be a human-friendly one. But by that definition, a methane-breathing alien wouldn't be conscious either (since it would immediately die on Earth).

The more robust thing to do would probably be to define a set of reasonable environments, and then allow simple modifications to the agent (e.g. give the methane-breathing alien a helmet filled with air it can breathe). But here the definition of "reasonable environment" and "simple modification" takes us away from the land of mathematical rigor and back into the world of allowing human judgement to determine whether something is conscious or not.

Conclusion

This metric of Agent Based Consciousness reasonably accords with human intuition, but has the downside of being critically dependent on the environment in which the agent is tested.

It also has some counterintuitive properties. For example, a human in a strange environment would be judged less conscious, but we would probably believe ourselves to be more conscious as we tried to take in the strange surroundings. This can be fixed by giving the agent time to "get used to" its new environment, but this again leads us away from the land of mathematical rigor.

In addition, this theory almost certainly predicts that babies are less conscious than children, who are less conscious than adults. But this is again fixed by the "time to get used to environment" constraint. This suggests that perhaps the complexity we should be measuring is not the current complexity, but the obtainable complexity of the agent. Measuring obtainable complexity would also solve the problem of an agent that is conscious but not currently interacting with its environment.

If we suppose that given enough training data, a Machine Learning algorithm with unlimited memory would surpass a human at predicting its surroundings, then by this definition that algorithm would also be more conscious than a human. But that is perhaps the intended effect of this definition, not an unfortunate side-effect.

Dividing the Kolmogorov complexity by the register size also feels a bit wrong. It is necessary to prevent rocks/groups of people from having unbounded consciousness, but it has the side effect of measuring groups of people as less conscious than individuals (due to mutual information). Simply saying "care is required when defining agents and registers in addition to environments" fixes this, but at the cost of introducing additional human judgement.
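The mutual-information effect shows up clearly in the extreme case of two agents with identical models. In this sketch random bytes stand in for one person's model and `zlib` stands in for Kolmogorov complexity (both assumptions). The second copy adds almost no new information while the register size doubles, so the per-register score roughly halves.

```python
import os
import zlib


def kc_upper_bound_bits(data: bytes) -> int:
    """zlib-compressed length in bits, a stand-in upper bound for Kolmogorov complexity."""
    return 8 * len(zlib.compress(data, 9))


person = os.urandom(10000)  # hypothetical stand-in for one person's (incompressible) model
room = person + person      # two people whose models share all their information
register_bits = 1           # each person contributes a one-bit register

one = kc_upper_bound_bits(person) / register_bits
two = kc_upper_bound_bits(room) / (2 * register_bits)
print(round(two / one, 2))  # close to 0.5
```

zlib only finds repeats within a 32 KB window, which is why the stand-in model is kept at 10,000 bytes; real groups share only part of their information, so the effect would be weaker.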

Charlie Steiner

Cute idea, but I think you won't get many upvotes because the post felt longer (and probably more technical) than the idea could sustain.

One unavoidable issue with defining consciousness, which has to be handled with some delicacy, is that people don't have separate mental buckets for "consciousness" and "the mental properties of humans that I care about." Sometimes we like to say that we intrinsically care about consciousness (as if they were independent), but really it's more like consciousness and us caring about things are all muddled together.

In one direction, this means that upon offering a definition for consciousness, it seems obvious that there's a "consciousness monster" that maximizes the definition, which seems interesting because, since you've labeled the thing you're defining "consciousness," it feels like you intrinsically care about it.

In the other direction, this means that upon offering a simple definition for consciousness, everyone who applies common sense to it will go "Wait, but this definition doesn't include properties of humans that I care about, like emotions / pain / dreams / insert your favorite thing here."

Logan Zoellner

Agree with almost all of your points.

The goal of writing this post was "this is a slight improvement on IIT", not "I expect normal people to understand/agree with this particular definition of consciousness".

LESSWRONG