Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

There's a famous story about Diogenes and Plato:

[...] when Plato gave the tongue-in-cheek definition of man as "featherless bipeds," Diogenes plucked a chicken and brought it into Plato's Academy, saying, "Behold! I've brought you a man," and so the Academy added "with broad flat nails" to the definition.

What Plato was (allegedly) doing was not providing a definition of man, but what I'd call a sufficient reference or a sufficient pointer. If I'm in ancient Athens and divide the obvious objects that I can see or think of into "featherless bipeds" and "not featherless bipeds", then "man" will match up with the first category.

Then Diogenes, acting like an AI, created something that fell within the sufficient pointer class but that was clearly not a man. The Academy then amended the pointer to add "with broad flat nails", patching it till it was sufficient again. Had there been a powerful AI around, or a god, or a meddling human with enough means and persistence, then they could have produced a "featherless-biped-with-broad-flat-nails" that was also not a human, making the pointer inadequate again.

A lot of suggestions on AI safety are sufficient pointers. For example, take the idea that an AI should maximise "complexity". This comes, I believe, from the fact that, in our current world, the category of "is complex" and "is valuable to humans" match up a lot. It's a sufficient pointer. But along comes a Diogenes/AI with complexity as a goal, and now it enriches the set of objects in the world with complex-but-worthless things, breaking the "definition".
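To make the divergence concrete, here is a toy sketch (my illustration, not from the post) using compressed size as a crude stand-in for "complexity": random noise scores as highly as or higher than anything humans value, even though it is worthless — exactly the kind of complex-but-worthless object a complexity-maximiser would mass-produce.

```python
import os
import zlib

def complexity_proxy(data: bytes) -> float:
    """Crude complexity score: compressed size per byte (near 1.0 = incompressible)."""
    return len(zlib.compress(data)) / len(data)

# Something humans value: structured natural-language text.
valuable = b"The quick brown fox jumps over the lazy dog. " * 200

# Something a complexity-maximiser can churn out for free: random noise.
worthless = os.urandom(len(valuable))

print(complexity_proxy(valuable))   # far below 1: structure compresses well
print(complexity_proxy(worthless))  # around 1: noise barely compresses at all
```

Under this proxy the noise "wins", which is the Diogenes move: the object satisfies the pointer while missing everything the pointer was meant to pick out.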

Therefore, when people say they value something, or want AIs to preserve/maximise it, this should not be taken as saying that they value that specific thing in itself. Instead, it should be taken as a pointer to what they value in the current world, and the challenge is then to extend that to new maps and new territories.

1 comment

For example, take the idea that an AI should maximise “complexity”. This comes, I believe, from the fact that, in our current world, the category of “is complex” and “is valuable to humans” match up a lot.

The Arbital entry on Unforeseen Maximums elaborates on this:

Juergen Schmidhuber of IDSIA, during the 2009 Singularity Summit, gave a talk proposing that the best and most moral utility function for an AI was the gain in compression of sensory data over time. Schmidhuber gave examples of valuable behaviors he thought this would motivate, like doing science and understanding the universe, or the construction of art and highly aesthetic objects.

Yudkowsky in Q&A suggested that this utility function would instead motivate the construction of external objects that would internally generate random cryptographic secrets, encrypt highly regular streams of 1s and 0s, and then reveal the cryptographic secrets to the AI.