Connecting the good regulator theorem with semantics and symbol grounding

by Stuart_Armstrong, 4th Mar 2021


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

EDIT: For a great reappraisal of the good regulator theorem using modern terminology, see here.

I've been writing quite a bit about syntax, semantics, and symbol grounding.

Recently, I've discovered the "good regulator theorem" in systems science[1]. With a bunch of mathematical caveats, the good regulator theorem says:

  • Every good regulator of a system must be a model of that system.

Basically, if anything is attempting to control a system (making it a "regulator"), then it must model that system. The definition of "good" includes minimum complexity (that's why the regulator "is" a model of the system: it includes nothing else that would be extraneous), but we can informally extend that to a rougher theorem:

  • Every decent regulator of a system must include a model of that system.

Models, semantics, intelligence, and power

I initially defined grounded symbols by saying that there was mutual information between the symbols in the agent's head and features of the world.
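For reference, the standard definition of mutual information between two discrete variables is given below. The names $S$ (a symbol in the agent's head) and $W$ (a feature of the world) are notation chosen for this sketch and are not taken from the original posts.

```latex
I(S;W) \;=\; \sum_{s,\,w} p(s,w)\,\log\frac{p(s,w)}{p(s)\,p(w)}
```

This quantity is zero exactly when $S$ and $W$ are independent, and it can never exceed the entropy of either variable, so a symbol carries at most as much information about the world feature as that feature contains.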

I illustrated this with a series of increasingly powerful agents dedicated to detecting intruders in a greenhouse; each of them had an internal variable, with the more powerful agents' internal variables corresponding more closely to the actual presence of an intruder.

For the simplest agent, a circuit-breaking alarm, the symbol just checked whether the circuit was broken or not. It had the most trivial model, simply mapping the Boolean of "circuit broken: yes/no" to that of "sound alarm: yes/no".

It could be outwitted, and it could go off in many circumstances where there was no intruder in the greenhouse. This is hardly a surprise, since the alarm does not model the greenhouse or the intruders at all: it models the break in the circuit, with the physical setup linking that circuit breaking with (some cases of) intruders. Thus the correlation between the alarm's internal variable and the actual presence of an intruder is weak.
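To make the contrast between a weak and a strong link concrete, here is a minimal Python sketch that computes the mutual information (in bits) between a detector's Boolean output and the Boolean "intruder present". The helper names `mutual_information` and `detector_joint`, and all of the probabilities, are hypothetical and chosen purely for illustration; none of them come from the original posts.

```python
import math


def mutual_information(joint):
    """Mutual information in bits for a joint distribution {(a, b): p}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0  # zero-probability cells contribute nothing
    )


def detector_joint(p_intruder, hit_rate, false_alarm_rate):
    """Joint distribution over (detector_fires, intruder_present)."""
    return {
        (True, True): p_intruder * hit_rate,
        (False, True): p_intruder * (1 - hit_rate),
        (True, False): (1 - p_intruder) * false_alarm_rate,
        (False, False): (1 - p_intruder) * (1 - false_alarm_rate),
    }


# Assumed, made-up numbers: the circuit alarm misses many intruders and
# often fires for other reasons; the strong detector rarely does either.
circuit_alarm = detector_joint(p_intruder=0.1, hit_rate=0.6, false_alarm_rate=0.3)
strong_detector = detector_joint(p_intruder=0.1, hit_rate=0.99, false_alarm_rate=0.01)

mi_alarm = mutual_information(circuit_alarm)
mi_strong = mutual_information(strong_detector)

print(f"circuit alarm:   {mi_alarm:.4f} bits")
print(f"strong detector: {mi_strong:.4f} bits")
```

Under these assumed numbers, the circuit alarm's output shares far less information with the presence of an intruder than the strong detector's does, which is one way of putting a number on "weak correlation".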

The most powerful agent, a resourceful superintelligent robot dedicated to intruder-detection, has its own internal variable for the presence of an intruder. In order to be powerful, this agent must, by the good regulator theorem, be able to model many different contingencies and situations, having a good grasp of all the ways intruders might try to fool it, and having a way of detecting each of them. It has a strong model of the (relevant parts of the) world, and its internal variable is very closely tied to the actual presence of an intruder.

A more powerful agent could still fool it. If that agent were more intelligent, which we'll define here as having superior models of the robot's internal variable, of the presence of intruders, and of the surrounding universe, then it would know where to apply its power to best trick or overwhelm the robot. If that agent were less intelligent, it would have to apply a lot more brute power, since it wouldn't have a good model of the robot's vulnerabilities.

Thus, in some ways, greater intelligence could be defined as better use of better models.

Learning and communication

Of course, in the real world, agents don't start out with perfect models; instead they learn. So a good learning agent is one that constructs good models from its input data. It's impossible for a small agent to model the whole universe in detail, so efficient agents have to learn what to focus on, and which simplifying assumptions are useful for them to make.

Communication, when it works, allows the sharing of one person's model with another. This type of communication is not just sharing factual information, but one person trying to communicate their way of modelling and classifying the world. That's why this form of communication can sometimes be so tricky.

  1. Thanks to Rebecca Gorman for getting me into looking at cybernetics, control theory, systems science, and other older fields. ↩︎

