A putative new idea for AI control; index here.
When posing his "New Riddle of Induction", Goodman introduced the concepts of "grue" and "bleen" to show some of the problems with the conventional understanding of induction.
I've somewhat modified those concepts. Let T be a set of intervals in time, and we'll use the boolean X to designate the fact that the current time t belongs to T (with ¬X equivalent to t∉T). We'll define an object to be:
- Grue if it is green given X (ie whenever t∈T), and blue given ¬X (ie whenever t∉T).
- Bleen if it is blue given X, and green given ¬X.
At this point, people are tempted to point out the ridiculousness of the concepts, dismissing them because of their strange disjunctive definitions. However, this doesn't really solve the problem; if we take grue and bleen as fundamental concepts, then we have the disjunctively defined green and blue; an object is:
- Green if it is grue given X, and bleen given ¬X.
- Blue if it is bleen given X, and grue given ¬X.
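The symmetry between the two pairs of definitions can be checked mechanically. The following is a minimal sketch, assuming only two colours and treating X as a plain boolean; the function names are illustrative and not part of the original argument.

```python
from enum import Enum


class Colour(Enum):
    GREEN = "green"
    BLUE = "blue"


def grue(x: bool) -> Colour:
    """Colour shown by a grue object: green given X, blue given not-X."""
    return Colour.GREEN if x else Colour.BLUE


def bleen(x: bool) -> Colour:
    """Colour shown by a bleen object: blue given X, green given not-X."""
    return Colour.BLUE if x else Colour.GREEN


# Now take grue and bleen as the primitives and rebuild green and blue
# from them, using the same disjunctive pattern.
def green_from_grue_bleen(x: bool) -> Colour:
    """Green: behaves like grue given X, like bleen given not-X."""
    return grue(x) if x else bleen(x)


def blue_from_grue_bleen(x: bool) -> Colour:
    """Blue: behaves like bleen given X, like grue given not-X."""
    return bleen(x) if x else grue(x)


for x in (True, False):
    # The rebuilt colours do not depend on X at all.
    assert green_from_grue_bleen(x) is Colour.GREEN
    assert blue_from_grue_bleen(x) is Colour.BLUE
```

Neither pair is privileged by the definitions themselves: each can be written as a two-case function of the other, so "disjunctive" is a property of the chosen starting vocabulary.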
Still, the categories green and blue are clearly more fundamental than grue and bleen. There must be something we can whack them with to get this - maybe Kolmogorov complexity or stuff like that? Sure, someone on Earth could make a grue or bleen object (a screen with a timer, maybe?), but it would be completely artificial. Note that though grue and bleen are unnatural, "currently grue" (colour=green XOR ¬X) or "currently bleen" (colour=blue XOR ¬X) make perfect sense (though they require knowing X, an important point for later on).
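The two XOR expressions can be spelled out as a truth table. This is a small sketch, assuming every object is either green or blue (with a third colour the XOR form would no longer match the definitions); the function names are illustrative.

```python
def currently_grue(colour: str, x: bool) -> bool:
    """colour=green XOR not-X, for colour in {"green", "blue"}."""
    return (colour == "green") != (not x)


def currently_bleen(colour: str, x: bool) -> bool:
    """colour=blue XOR not-X, for colour in {"green", "blue"}."""
    return (colour == "blue") != (not x)


# Print the full truth table over both colours and both values of X.
for colour in ("green", "blue"):
    for x in (True, False):
        print(colour, x, currently_grue(colour, x), currently_bleen(colour, x))
```

A green object is currently grue exactly when X holds, and a blue one exactly when X fails, so evaluating either predicate requires both the colour and X.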
But before that... are we so sure the grue and bleen categories are unnatural? Relative to what?
Welcome to Chiron Beta Prime
Chiron Beta Prime, apart from having its own issues with low-intelligence AIs, is noted for having many suns: one large sun that glows mainly in the blue spectrum, and multiple smaller ones glowing mainly in the green spectrum. They all emit in the totality of the spectrum, but they are stronger in those colours.
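Because of the way the orbits are locked to each other, the green suns are always visible from everywhere. The blue sun rises and sets on a regular schedule; define T to be the time when the blue sun is set (so X="Blue sun not visible, some green suns visible" and ¬X="Blue sun visible, some green suns visible"). With this choice, an object that glows green under the green suns alone and blue under the blue sun is green given X and blue given ¬X, which is the definition of grue above.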
Now "green" is a well defined concept in this world. Emeralds are green; they glow green under the green suns, and do the same when the blue sun is risen. "Blue" is also a well-defined concept. Sapphires are blue. They glow blue under the blue sun and continue to do so (albeit less intensely) when it is set.
But "grue" is also a well defined concept. Diamonds are grue. They glow green when the green suns are the only ones visible, but glow blue under the glare of the blue sun.
Green, blue, and grue (which we would insist on calling green, blue and white) are thus well understood and fundamental concepts, that people of this world use regularly to compactly convey useful information to each other. They match up easily to fundamental properties of the objects in question (eg frequency of light reflected).
Bleen, on the other hand - don't be ridiculous. Sure, someone on Chiron Beta Prime could make a bleen object (a screen with a timer, maybe?), but it would be completely artificial.
In contrast, the inhabitants of Pholus Delta Secundus, who have a major green sun and many minor blue suns (coincidentally with exactly the same orbital cycles), feel that green, blue and bleen are the natural categories...
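A toy model makes the two planets concrete. The sketch below assumes an object's apparent colour is whichever band (green or blue) it reflects most strongly under the current sky; all intensities and reflectances are invented numbers chosen only to respect the description above (every sun emits in both bands, the major sun outshines the minor ones).

```python
BANDS = ("green", "blue")

# Hypothetical reflectance of each material in each band.
REFLECTANCE = {
    "emerald": {"green": 0.9, "blue": 0.1},
    "sapphire": {"green": 0.1, "blue": 0.9},
    "diamond": {"green": 0.9, "blue": 0.9},  # what we would call white
}

# Hypothetical illumination per band (arbitrary units).
CHIRON = {"major": {"green": 2.0, "blue": 10.0},   # one large blue sun
          "minors": {"green": 3.0, "blue": 1.0}}   # small green suns
PHOLUS = {"major": {"green": 10.0, "blue": 2.0},   # one large green sun
          "minors": {"green": 1.0, "blue": 3.0}}   # small blue suns


def sky(planet, major_up):
    """Total light per band; the minor suns are always visible."""
    return {b: planet["minors"][b] + (planet["major"][b] if major_up else 0.0)
            for b in BANDS}


def apparent_colour(material, light):
    """Band in which the material reflects the most light."""
    reflected = {b: REFLECTANCE[material][b] * light[b] for b in BANDS}
    return max(reflected, key=reflected.get)


for name, planet in (("Chiron", CHIRON), ("Pholus", PHOLUS)):
    for major_up in (False, True):
        colours = {m: apparent_colour(m, sky(planet, major_up))
                   for m in REFLECTANCE}
        print(name, "major sun up" if major_up else "major sun set", colours)
```

Under these assumed numbers, emeralds and sapphires keep their colour on both planets, while the diamond follows the dominant sun: green then blue on Chiron Beta Prime as the blue sun rises, and blue then green on Pholus Delta Secundus as the green sun rises.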
Natural relative to the (current) universe
We've shown that some categories that we see as disjunctive or artificial can seem perfectly natural and fundamental to beings in different circumstances. Here's another example:
A philosopher proposes, as a thought experiment, to define a certain concept for every object. It's the weighted sum of the inverse of the height of an object (from the centre of the Earth), and its speed (squared, because why not?), and its temperature (but only on an "absolute" scale), and some complicated thing involving its composition and shape, and another term involving its composition only. And maybe we can add another piece for its total mass.
And then that philosopher proposes, to great derision, that this whole messy sum be given a single name, "Energy", and that we start talking about it as if it was a single thing. Faced with such an artificially bizarre definition, sensible people who want to use induction properly have no choice... but to embrace energy as one of the fundamental useful facts of the universe.
What these examples show is that green, blue, grue, bleen, and energy are not natural or non-natural categories in some abstract sense, but relative to the universe we inhabit. For instance, if we had some strange energy' which used the inverse of the height cubed, then we'd have a useless category - unless we lived in five spatial dimensions.
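The five-dimension remark can be checked with a short derivation. This is a sketch, assuming gravity obeys Gauss's law in n spatial dimensions (n ≥ 3); G_n is a hypothetical n-dimensional gravitational constant, M and m are the two masses, and h is the distance from the centre of the attracting mass.

```latex
% The flux of the field through a sphere of radius h is fixed by the enclosed
% mass M, and the sphere's surface area scales as h^{n-1}.
F(h) \propto \frac{G_n M m}{h^{\,n-1}}
\quad\Longrightarrow\quad
U(h) = -\int_h^{\infty} F(h')\,\mathrm{d}h' \propto -\frac{G_n M m}{h^{\,n-2}}
% n = 3 gives U \propto 1/h (the height term in our energy);
% n = 5 gives U \propto 1/h^{3} (the height term in the primed energy).
```

So the inverse-cube term is the conserved potential only when n - 2 = 3, that is, in five spatial dimensions.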
You're grue, what time is it?
So how can we say that green and blue are natural categories in our universe, while grue and bleen are not? A plausible explanation is the dependence on X - on the time of day. On Earth, we can tell whether objects are green or blue without knowing anything about the time. Certainly we can get combined information about an object's colour and the time of day (for instance by looking at emeralds out in the open). But we also expect to get information about the colour (by looking at an object in a lit basement) and the time (by looking at a clock). And we expect these pieces of information to be independent of each other.
In contrast, we never expect to get information about an object being currently grue or currently bleen without knowing the time (or the colour, for that matter). And information about the time can completely change our assessment as to whether an object is grue versus bleen. It would be a very contrived set of circumstances where we would be able to assert "I'm pretty sure that object is currently grue, but I have no idea about its colour or about the current time".
Again, this is a feature of our world and the evidence we see in it, not some fundamental feature of the categories of grue and bleen. We just don't generally see green objects change into blue objects, nor do we typically learn about disjunctive statements of the type "colour=green XOR time=night" without learning about the colour and the time separately.
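A toy calculation illustrates how these kinds of evidence interact. It is only a sketch, not a formalisation: it assumes two colours, two times, and a uniform prior in which colour and time are independent, and the three "observations" (basement look, clock, detector) are hypothetical stand-ins for the examples above.

```python
from fractions import Fraction
from itertools import product

COLOURS = ("green", "blue")
TIMES = ("day", "night")

# Assumed prior: all four (colour, time) worlds equally likely.
WORLDS = list(product(COLOURS, TIMES))


def look_in_basement(colour, time):
    """Reveals the colour only."""
    return colour


def read_clock(colour, time):
    """Reveals the time only."""
    return time


def xor_detector(colour, time):
    """Reveals only the truth of "colour=green XOR time=night"."""
    return (colour == "green") != (time == "night")


def posterior_green(evidence):
    """P(colour=green | every (observation, reading) pair in evidence)."""
    consistent = [w for w in WORLDS
                  if all(obs(*w) == reading for obs, reading in evidence)]
    greens = [w for w in consistent if w[0] == "green"]
    return Fraction(len(greens), len(consistent))


print(posterior_green([(xor_detector, True)]))
print(posterior_green([(xor_detector, True), (read_clock, "day")]))
print(posterior_green([(xor_detector, True), (read_clock, "night")]))
print(posterior_green([(look_in_basement, "green"), (read_clock, "night")]))
```

In this model a detector reading on its own leaves the probability of green at 1/2, and adding the clock reading moves it all the way to 1 or 0. A basement look settles the colour whatever the clock says.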
What about the grue objects on Chiron Beta Prime? There, people do see objects change colour regularly, and, upon investigation, they can detect whether an object is grue without knowing either the time or the apparent colour of the object. For instance, they know that diamond is grue, so they can detect some grue objects by a simple hardness test.
But what's happening is that the Chiron Beta Primers have correctly identified a fundamental category - the one we call white, or, more technically "prone to reflect light both in the blue and green parts of the spectrum" - that has different features on their planet than on ours. From the macroscopic perspective, it's as if we and they live in a different universe, hence grue means something to them and not to us. But the same laws of physics underlie both our worlds, so fundamentally the concepts converge - our white, their grue, mean the same things at the microscopic level.
Definitions open to manipulation
In the next post, I'll look at whether we can formalise "expect independent information about colour and time", and "we don't expect change to the time information to change our colour assessment."
But be warned. The naturalness of these categories is dependent on facts about the universe, and these facts could be changed. A demented human (or a powerful AI) could go through the universe, hiding everything in boxes, smashing clocks, and putting "current bleen detectors" all over the place, so that it suddenly becomes very easy to know statements like "colour=blue XOR time=night", but very hard to know about colour (or time) independently from this. So it would be easy to say "this object is currently bleen", but hard to say "this object is blue". Thus the "natural" categories may be natural now, but this could well change, so we must take care when using these definitions to program an AI.