The Teacup Test

leogao

I think there are two main problems being pointed at here. First, under most definitions, intelligence is largely continuous (though some would argue for a few specific discontinuities), so it seems unreasonable to ask "is X intelligent?" A teacup may be slightly more intelligent than a rock, and far less intelligent than GPT-3.

Second, the thing we actually care about is neither "intelligence" nor "Bayesian agent"; just because you can't name something very precisely yet doesn't mean that thing doesn't exist or isn't worth thinking about. The thing we care about is that someone might build something in 10 years that literally kills everyone, and we have some models of how we might expect that thing to be built. By analogy, perhaps we have a big philosophical argument over what counts as a "chair": some would argue bitterly over whether stools count as chairs, or whether tiny microscopic chair-shaped things count as chairs, or whether rocks count as chairs because you can sit on them, and some would argue that there is in fact no such thing as a physical chair, because concepts like that exist only in the map and the territory is made of atoms, etc. But if your problem is that you expect chairs to break when you sit on them if they aren't structurally sound, then most of these arguments are a huge distraction. Or more pithily:

"nooo that's not really intelligent" I continue to insist as I shrink and transform into a paperclip

lc

The other part is that humans can pursue pretty arbitrary instrumental goals, whereas if you tell the teacup it has to win a chess match or die, it will die.

mathenjoyer

No, they can't. See: "akrasia" on the path to protecting their hypothetical predicted future selves 30 years from now.

The teacup takes the W here too. It's indifferent to blackmail! [chad picture]

lc

"Pretty arbitrary" of course doesn't mean "absolutely arbitrary", just more arbitrary than most things, such as teacups. And by "tell" I mean giving an ultimatum and then following through.

mathenjoyer

Fair.

Something something blackmailer is subjunctively dependent with the teacup! (This is a joke.)

Ben

When a Greek philosopher starts asking you what a thing actually is, sooner or later you might find yourself saying "I know it when I see it". (That plucked chicken is not a man.)

Viliam

I know it when I see it

So the neural networks that recognize things without anyone being able to explain how they actually do it are actually doing it the right way.

"The definition of intelligence is the following: take the value of the upper left pixel in my retina, and multiply it by 0.02378. Take the value from the next pixel and multiply it by 0.02376. Take the value...

(20 years later)

...and if you add this together, and the result is greater than 0.86745, then yes, I would call such a system 'intelligent'. Any objections?"

Socrates: "Please someone pass me the hemlock already."

CronoDAS

There's an old joke...

An engineer, a physicist, a mathematician, and an AI researcher were asked to name the greatest invention of all time.

The engineer chose fire, which gave humanity power over matter. The physicist chose the wheel, which gave humanity power over space. The mathematician chose the alphabet, which gave humanity power over symbols. The AI researcher chose the thermos bottle.

"Why a thermos bottle?" the others asked. "Because the thermos keeps hot liquids hot in winter and cold liquids cold in summer," said the AI researcher. "Yes, so what?" "Think about it," intoned the researcher reverently. "That little bottle: how does it know?"

Donald Hobson

I think the candle and ice mechanism technically contains an iota of intelligence. Not much. Less than a fly. But some.

lc

Intelligent systems are systems which design and deploy nanomachines that make life on earth inhospitable for humans.

Maximum_Skull

Oh, so human diseases in the form of bacteria/viruses! And humans working on gain-of-function research.

lc

I actually have no problem calling either of those systems intelligent, as long as the bacteria/viruses are evolving on a short enough timescale. I guess you could call them slow.

Maximum_Skull

Reminds me of a discussion I've had recently about whether humans solve complex systems of [mechanical] differential equations while moving. The counter-argument was "do you think that a mercury thermometer solves differential equations [while 'calculating' the temperature]?"

Gordon Seidoh Worley

For what it's worth, I think we solved the core issue in this post some time ago as part of the project of cybernetics by carefully defining the sort of systems that we might call intelligent.

Victor Novikov

I'm going to bite the bullet and say that "intelligence" and "optimizer" are fundamentally the same thing; or rather, that these words point to the same underlying concept, which we don't quite have a non-misleading word for.

An optimizer is a system that pulls probability-mass away from some world-states and toward some world-states; anything that affects reality is an "optimizer". A tea cup is an optimizer.

The purpose of an optimizer is what it does; what it is optimizing for. A teacup's purpose is to contain objects inside, when the teacup is positioned the right way up in a gravity field. A teacup's purpose is to transfer heat. A teacup's purpose is to break when dropped from a height. A teacup's purpose is the set of all it does. A teacup's purpose, in full generality, is: "be a teacup".

A teacup is aligned to the purpose of being a teacup.
A system that is aligned is one that is in the correct state to fulfill its own purpose.
And a teacup is, obviously, in the correct physical state for being a teacup. Tautology.
All systems are perfectly aligned to their own purpose.

But a teacup is imperfectly aligned to the purpose of "being used as a teacup by a human". If it is dropped, it may break. If it is tipped, it may spill. All of these behaviors are aligned to the purpose of "be the physical object: the teacup", but only imperfectly aligned to "be a useful teacup for drinking tea and other liquids".

What is an optimizer?

An optimizer is an engine that converts alignment into purpose.

Alignment: "be a teacup" -> purpose: "behave like a teacup". This part is tautological.

Alignment: "be a useful teacup for humans" -> purpose: "be used in beneficial ways by humans". This part is not tautological.

A teacup may be good or bad at that. A teacup may harm humans, though: it may spill tea. It may break into sharp shards of ceramic. So a teacup may cause both good and bad outcomes.

A Friendly teacup, a human-aligned teacup, is one that optimizes for its purpose: making good outcomes more likely and bad outcomes less likely.

A Friendly teacup is harder to spill or to accidentally drop. A Friendly teacup is not so heavy that it would injure a human if it falls on their foot. A Friendly teacup is one that is less likely to harm a human if it breaks.

But how does a teacup optimize for good outcomes? By being a teacup. By continuing to be a teacup.

Once a physical object has been aligned into the state of being a teacup, it continues to be a teacup, because a teacup is a physical system that optimizes for retaining its shape (unless it is broken or damaged).

A Friendly teacup, once aligned into being a Friendly teacup, serves its purpose by continuing to be a Friendly teacup. A Friendly teacup optimizes humans, it optimizes some particular set of outcomes for humans by continuing to be a Friendly teacup.

How does a Friendly teacup optimize you? Its existence, its state of being and continuing to be a teacup, leads you to make different choices than you would if that were not the case; you might enjoy a refreshing cup of tea!

You are being optimized. By a teacup. So that it may fulfill its assigned purpose. This is a perfectly valid way to see things.

The teacup has been made (aligned) in a way that makes it a good teacup (purpose): in this example the optimization-pressure is the process that created the teacup.

So this is my answer: your example is valid. Both teacup alignment and AI alignment are fields that use some of the same underlying concepts, if you understand these terms to full generality.

But for teacups these things are obvious, so we don't need fancy terminology for them, and it is confusing to try to use the terminology this way.

But it is valid.

mathenjoyer

I don't disagree with any of this.

And yet, some people seem to be generalizedly "better at things" than others. And I am more afraid of a broken human person (he might shoot me) than a broken teacup.

It is certainly possible that "intelligence" is a purely intrinsic property of my own mind, a way to measure "how much do I need to use the intentional stance to model another being, rather than model-based reductionism?" But this is still a fact about reality, since my mind exists in reality. And in that case "AI alignment" would still need to be a necessary field, because there are objects that have a larger minimal-complexity-to-express than the size of my mind, and I would want knowledge that allows me to approximate their behavior.

But I can't robustly define words like "intelligence" in a way that beats the teacup test. So overall I am unwilling to say "the entire field of AI Alignment is bunk because intelligence isn't a meaningful concept." I just feel very confused.

localdeity

Seems reminiscent of On the Impossibility of Supersized Machines.

Sonata Green

Suppose I built a machine with artificial skin that felt the temperature of the cup and added ice to cold cups and lit a fire under hot cups.

Should this say "added ice to hot cups and lit a fire under cold cups"?

"Oh, right," said Xenophon, "How about 'Systems that would adapt their policy if their actions would influence the world in a different way'?"
"Teacup test," said Socrates.

This seems wrong. The artificial-skin cup adds ice or lights fire solely according to the temperature of the cup; if it finds itself in a world where ice makes tea hotter and fire makes tea colder, the cup does not adapt its strategies.
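Sonata Green's objection can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than anything from the post: the 60 °C target, the ±10 °C effect sizes, and the names `fixed_rule_cup`, `adaptive_controller`, and `adapts_policy`. The fixed-rule cup reacts only to its own temperature, while the adaptive controller picks whichever action would move the tea closest to the target given how actions affect the world it finds itself in. Swapping in a world where ice heats and fire cools changes the controller's choices but not the cup's.

```python
# Toy version of Xenophon's criterion: does a system's policy change when its
# actions influence the world in a different way? All names and numbers here
# are hypothetical illustrations.

TARGET = 60.0  # hypothetical ideal tea temperature, degrees C

# A "world" maps each action to its effect on the tea's temperature.
NORMAL_WORLD = {"ice": -10.0, "fire": +10.0}
INVERTED_WORLD = {"ice": +10.0, "fire": -10.0}


def fixed_rule_cup(temp: float, world: dict) -> str:
    """Artificial-skin cup: reacts to its own temperature only.

    `world` is accepted but deliberately ignored; the rule cannot depend
    on what its actions actually do to the tea.
    """
    return "ice" if temp > TARGET else "fire"


def adaptive_controller(temp: float, world: dict) -> str:
    """Picks the action whose effect lands the tea closest to TARGET.

    Simplification: the effect table is read directly instead of being
    estimated from experience.
    """
    return min(world, key=lambda action: abs(temp + world[action] - TARGET))


def adapts_policy(policy, temps) -> bool:
    """True if, at some temperature, the chosen action differs between worlds."""
    return any(policy(t, NORMAL_WORLD) != policy(t, INVERTED_WORLD) for t in temps)


if __name__ == "__main__":
    temps = [40.0, 55.0, 70.0, 85.0]
    print("fixed-rule cup adapts:", adapts_policy(fixed_rule_cup, temps))
    print("adaptive controller adapts:", adapts_policy(adaptive_controller, temps))
```

Run as written, it reports that the fixed-rule cup does not adapt and the controller does. Under this framing the artificial-skin cup fails Xenophon's criterion for the reason given above: its policy is a function of temperature alone, so an inverted world leaves it doing the wrong thing.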

NoriMori1992

My teacup does has a choice.

Should be "does have"; or just "has" (without "does").

lsusr

Fixed. Thanks.

Evgenii Opryshko

The paper On the Measure of Intelligence by François Chollet does a pretty good job of defining intelligence, in a way that accounts for the knowledge the creator of a system has hardcoded into it.

zoop

"What is intelligence?" is a question you can spend an entire productive academic career failing to answer. Intentionally ignoring the nerd bait, I do think this post highlights how important it is for AGI worriers to better articulate which specific qualities of "intelligent" agents are the most worrisome and why.

For example, there has been a lot of handwringing over the scaling properties of language models, especially in the GPT family. But as Gary Marcus continues to point out in his inimitable and slightly controversial way, scaling these models fails to fix some extremely simple logical mistakes, logical mistakes that might need to be fixed by a non-scaling innovation before an intelligent agent poses an ex-risk. On forums like these it has long been popular to say something along the lines of "holy shit look how much better these models got when you add __ amount of compute! If we extrapolate that out we are so boned." But this line of thinking seems to miss the "intelligence" part of AGI completely; it seems to have no sense at all of the nature of the gap between the models that exist today and the spooky models they worry about.

It seems to me that we need a better specification for describing what exactly intelligent agents can do and how they get there.

Anonymous

"Systems that would adapt their policy if their actions would influence the world in a different way"

Does the teacup pass this test? It doesn't necessarily seem like it.

We might want to model the system as "Heat bath of Air -> teacup -> Socrates' tea". The teacup "listens to" the temperature of the air on its outside, and according to some equation transmits some heat to the inside. In turn the tea listens to this transmitted heat and determines its temperature.

You can consider the counterfactual world where the air is cold instead of hot. Or the counterfactual world where you replace "Socrates' tea" with "Meletus' tea", or with a frog that will jump out of the cup, or whatever. But in all cases the teacup does not actually change its "policy", which is just to transmit heat to the inside of the cup according to the laws of physics.

To put it in the terminology of "Discovering Agents", one can add mechanism variables going into the object level variables. But there are no arrows between these, so there's no agent.

Of course, my model here is bad and wrong physically speaking, even if it does capture crude cause-effect intuition about the effect of air temperature on beverages. However I'd be somewhat surprised if a more physically correct model would introduce an agent to the system where there is none.

Jiro

But in all cases the teacup does not actually change its “policy”, which is just to transmit heat to the inside of the cup according to the laws of physics.

This kind of description depends completely on how you characterize things. If the policy is "transmit heat according to physics" the policy doesn't change. If the policy is "get hotter" this policy changes to "get colder". It's the same thing, described differently.

Viktor Rehnberg

I've been thinking about Eliezer's take on the Second Law of Thermodynamics, and while I can't think of a succinct comment to add with it, I think it could bring value to this discussion.

tskoro

What about complexity as a necessary condition for intelligence? The teacup does not possess this, but arguably the yogi does (at the very least he must feed himself, interact socially to some extent, etc.). Intelligent creatures have brains that are fairly large (millions/billions of neurons) and have a high degree of complexity in their internal dynamics, and this corresponds to complexity in their actions.

the gears to ascension

a rock wants to stay a rock, and how badly it wants it is defined by the equations of the nuclear strong forces. a teacup wants to stay a teacup, same deal as the rock. a yogi appears to want to stay a yogi, though I'm less confident that a yogi wants that quite as badly or consistently as a rock wants to stay a rock; thankfully, most yogis produce significant communication before becoming silent, and there are incremental tests you can do that will reveal that attempting to interfere with their actions will result in them attempting to maintain their state as a yogi.

the teacup test is reasonable, but a good ai safety solution would be teacup invariant.

and anyway, I have an intuition that damaging anything is necessarily going to be against the values of some agent, so even if nobody seems to care, it's worth interrogating a map of other agents' preferences about stuff. maybe they aren't allowed to override yours, depending on the preference permissions, but it still seems like if you want to smash your cup and it's the most beautiful cup that ever existed, one that many agents would intervene to protect, maybe I'd want to offer you something less special in trade for it. perhaps, for example, a virtual simulation of the behavior a cup would have when breaking, if knowledge of the process is what truly appeals to you; maybe you'd even enjoy simulations of much more complicated stuff breaking? or perhaps you might like a different cup. after all, those atoms want to stay together right now using their local agency of being physical chemicals, and my complicated agency-throwing machine called a brain detects that there may be many agency-throwers in the world who want to protect the teacup.

Emrik

This is glorious in so many ways. Thank you.

Vanilla_cabs

In what ways?

Emrik

All the correct ways and none of the incorrect ways, of course! I see the ambivalence and range of plausible interpretations. Can't I just appreciate a good post for the value I found in it without being fished out for suspected misunderstandings? :p

I especially liked how this is the cutest version of Socrates I've encountered in any literature.

Vanilla_cabs

I was just curious and wanted to give you the occasion to expand your viewpoint. I didn't downvote your comment btw.

Emrik

Aye, I didn't jump to the conclusion that you were aggressive. I wanted to make my comment communicate that message anyway, and that your comment could be interpreted like that gave me an excuse.
