Hedonium's semantic problem

by Stuart_Armstrong, 9th Apr 2015


If this argument is a re-tread of something already existing in the philosophical literature, please let me know.

I don't like Searle's Chinese Room Argument. Not so much because it's wrong, but mainly because it takes an interesting and valid philosophical insight/intuition and then twists it in the wrong direction.

The valid insight I see is:

One cannot get a semantic process (i.e. one with meaning and understanding) purely from a syntactic process (one involving only syntactic/algorithmic operations).

I'll illustrate both the insight and the problem with Searle's formulation via an example. And then look at what this means for hedonium and mind crimes.


Napoleonic exemplar

Consider the following four processes:

  1. Napoleon, at Waterloo, thinking and directing his troops.
  2. A robot, having taken the place of Napoleon at Waterloo, thinking in the same way and directing his troops in the same way.
  3. A virtual Napoleon in a simulation of Waterloo, similarly thinking and directing his virtual troops.
  4. A random Boltzmann brain springing into existence from the thermal radiation of a black hole. This Boltzmann brain is long-lasting (24 hours), and, by sheer coincidence, happens to mimic exactly the thought processes of Napoleon at Waterloo.

All four mental processes have the same syntactic properties. Searle would draw the semantic line between the first and the second process: the organic mind is somehow special. I would draw the semantic line between the third and the fourth process. The difference is that in each of the first three processes, the symbols in the brain correspond to objects in reality (or virtual reality). They can make reasonably accurate predictions about what might happen if they do something, and get feedback confirming or disconfirming those predictions. Semantic understanding emerges from a correspondence with reality.

In contrast, the fourth process is literally insane. Its mental processes correspond to nothing in reality (or at least, nothing in its reality). It emerges by coincidence, its predictions are wrong or meaningless, and it will almost certainly be immediately destroyed by processes it has completely failed to model. The symbols exist only within its own head.

There are some interesting edge cases to consider here (I chose Napoleon because there are famously many people deluded into thinking they are Napoleon), but that's enough background. Essentially, the symbol grounding problem is solved (maybe by evolution, maybe by deliberate design) simply by having the symbols and the mental model be close enough to reality. The symbols in Boltzmann-Napoleon's brain could be anything, as far as we know; we just identify it with Napoleon because it's coincidentally similar. If Napoleon had never existed, we might have no clue as to what Boltzmann-Napoleon was "thinking".


Hedonium: syntax?

The idea behind hedonium is to take something corresponding to the happiest possible state, and copy that with maximal efficiency across the universe. This can involve defining hedons (the fundamental unit of happiness) and maximising them while minimising dolors (the fundamental units of pain/suffering/anti-happiness). Supposedly this would result in the cosmos being filled to the brim with the greatest possible amount of happiness and joy. This could maybe be pictured as taking the supreme moment of ecstatic, orgasmic happiness of the most joyful person ever to live, and filling the cosmos with that.

Let's start with the naivest of possible hedonium ideas, a simple algorithm with a happiness counter "My_happiness" which is either continually increasing or set at some (possibly infinite or trans-finite) maximum, while the algorithm continually repeats to itself "I have ultimate happiness!".

A very naive idea, and one with an immediate and obvious flaw: what would happen if English were to change so that "happiness" and "suffering" exchanged meanings? Then we would have transformed the maximally happy universe into a maximally painful one, all at the stroke of a linguistic pen.

The problem is that the naive hedonium ended up being a purely syntactic construction. It referred to nothing in the outside universe, so its definition of "happiness" was entirely dependent on the linguistic label "happiness".
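To make the flaw concrete, here is a minimal Python sketch of the naive hedonium, under the assumption that everything it does can be written down as a trace of counter values and tokens. The names `run_naive_hedonium` and `read_aloud`, the token `"T0"` and the two lexicons are hypothetical, invented for illustration. The trace is identical whichever lexicon is used to read it; only the external interpretation changes.

```python
from typing import Dict, List, Tuple

# (counter value, symbol) pairs: everything the naive hedonium "does".
Trace = List[Tuple[int, str]]


def run_naive_hedonium(steps: int) -> Trace:
    """Run the naive loop: increment a counter and emit an uninterpreted symbol."""
    trace: Trace = []
    my_happiness = 0  # the "My_happiness" counter from the text
    for _ in range(steps):
        my_happiness += 1
        trace.append((my_happiness, "T0"))  # "T0" is only a token
    return trace


def read_aloud(trace: Trace, lexicon: Dict[str, str]) -> List[str]:
    """Render the trace through an external lexicon, i.e. through 'English'."""
    return [f"I have ultimate {lexicon[token]}! (level {n})" for n, token in trace]


english = {"T0": "happiness"}
swapped_english = {"T0": "suffering"}  # "happiness" and "suffering" trade meanings

trace = run_naive_hedonium(3)
print(read_aloud(trace, english)[0])          # I have ultimate happiness! (level 1)
print(read_aloud(trace, swapped_english)[0])  # I have ultimate suffering! (level 1)
```

Swapping the lexicon changes what the output "means" to an English reader without changing a single step of the computation, which is the sense in which this hedonium is purely syntactic.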

It seems that the more grounded and semantic the symbols are, the harder it is to get an isomorphism that transforms them into something else.


Hedonium: semantics

So how can we ensure that we have something that is inarguably hedonium, not just the algorithmic equivalent of drawing a happy face? I'd say there are three main ways that we can check that the symbols are grounded/the happiness is genuine:

  • Predictive ability
  • Simplest isomorphism to reality
  • Correct features

If the symbols are well grounded in reality, then the agent should have a decent predictive ability. Note that the bar is very low here. Someone who realises that battles are things that are fought by humans, that involve death, and that are won or lost or drawn, is already far ahead of someone who thinks that battles are things that invite you home for tea and biscuits. So a decent prediction is "someone will die in this battle", a bad one is "this battle will wear a white frilly dress".

Of course, that prediction relies on the meaning of "die" and "white frilly dress". We can get round this problem by looking at predictive ability in general (does the agent win some bets/achieve a goal it seems to have?). Or we can look at the entire structure of the agent's symbolic setup, and the relationships between the symbols. This is what the CYC project tried to do, by encoding databases of sentences like "Bill Clinton belongs to the collection of U.S. presidents" and "All trees are plants". The aim was to achieve AI, and it failed. However, if the database is large and complicated enough, it might be that there is only one sensible way of grounding the symbols in reality, where "sensible" can be defined using a complexity prior.

But be warned! The sentences are very much intuition pumps. "Bill Clinton belongs to the collection of U.S. presidents" irresistibly makes us think of the real Bill Clinton. We need to be able to take sentences like "Solar radiation waxes to the bowl of ambidextrous anger" and deduce, after much analysis of the sentences' structures, that "Solar radiation -> Bill Clinton", "waxes -> belongs", etc.
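One way to picture "only one sensible way of grounding the symbols" is as a structure-matching problem: look for a relabelling of the opaque symbols under which their relationships coincide exactly with known relationships in the world. The Python sketch below brute-forces that search over a hypothetical four-fact database. The facts, the function names `symbols` and `groundings`, and the use of exact matching in place of a complexity prior are all assumptions made for illustration; this is not how CYC worked.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def symbols(facts: Set[Fact]) -> Tuple[List[str], List[str]]:
    """Split the symbols used in `facts` into entities and relations."""
    entities = sorted({x for s, _, o in facts for x in (s, o)})
    relations = sorted({r for _, r, _ in facts})
    return entities, relations


def groundings(opaque: Set[Fact], reference: Set[Fact]) -> List[Dict[str, str]]:
    """Brute-force every symbol bijection that maps `opaque` exactly onto `reference`."""
    o_ents, o_rels = symbols(opaque)
    r_ents, r_rels = symbols(reference)
    if len(o_ents) != len(r_ents) or len(o_rels) != len(r_rels):
        return []
    found = []
    for ent_image in permutations(r_ents):
        ent_map = dict(zip(o_ents, ent_image))
        for rel_image in permutations(r_rels):
            rel_map = dict(zip(o_rels, rel_image))
            image = {(ent_map[s], rel_map[r], ent_map[o]) for s, r, o in opaque}
            if image == reference:
                found.append({**ent_map, **rel_map})
    return found


reference = {
    ("Bill Clinton", "belongs to", "US presidents"),
    ("US presidents", "subset of", "humans"),
    ("humans", "subset of", "animals"),
    ("trees", "subset of", "plants"),
}
opaque = {
    ("Solar radiation", "waxes to", "the bowl of ambidextrous anger"),
    ("the bowl of ambidextrous anger", "hums", "blue"),
    ("blue", "hums", "seven"),
    ("spoons", "hums", "Tuesday"),
}

matches = groundings(opaque, reference)
print(len(matches))                   # 1
print(matches[0]["Solar radiation"])  # Bill Clinton
print(matches[0]["waxes to"])         # belongs to
```

Here the structure happens to be rich enough that exactly one grounding survives. If the database contained only the two `subset of` facts about humans and trees, two different bijections would fit (the two facts could be swapped), which is the kind of ambiguity a complexity prior would have to break. The search is factorial in the number of symbols, so it is only a toy.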

Notice there is a connection with the symbolic approach of GOFAI ("Good Old-Fashioned AI"). Basically, GOFAI failed because the symbols did not encode true understanding. The more hedonium resembles GOFAI, the more likely it is to be devoid of actual happiness (equivalently, the more likely it is to be isomorphic to some other, non-happiness situation).

Finally, we can assess some of the symbols (the more abstract ones) by looking at their features (it helps if we have grounded many of the other symbols). For instance, suppose we think one concept might be "nostalgia for the memory of childhood". This is something that we expect to be triggered when childhood is brought up, or when the agent sees a house that resembles its childhood home, and it is likely to result in certain topics of conversation, and maybe some predictable priming on certain tests.

Of course, it is trivially easy to set up an algorithm with a "nostalgia for the memory of childhood" node, a "childhood conversation" node, etc., with the right relations between them. So, as in a generalised Turing test, it's more indicative if the "correct features" are things the programmers did not explicitly design in.


Hedonium: examples and counterexamples

So, what should we expect from a properly grounded hedonium algorithm? There are many reasons to expect that it will be larger than we might intuitively have thought. Reading the word "anger" or seeing a picture of an angry person both communicate "anger" to us, but a full description of "anger" is much larger and more complex than can be communicated by the word or the picture. Those suggest anger by simply reminding us of our own complex intuitive understanding of the term, rather than by grounding it.

Let's start by assuming that the hedonium experience involves someone "building on the memory of their previously happiest experience", for instance. Let's ground that particular memory. First of all, we have to ground the concept of (human) "memory". This will require a lot of algorithmic infrastructure. Remember, we have to structure the algorithm so that even if we label "memory" as "spatula", an outside analyst is forced to conclude that "spatula" can only mean memory. This will, at the minimum, involve the process of laying down many examples of memories, and of retrieving them and making use of them.

This is something that the algorithm itself must do. If the algorithm doesn't do that each time the hedonium algorithm is run, then the whole concept of memory is simply a token in the algorithm saying "memory is defined in location X", which is trivially easy to change to something completely different. Remember, the reason the algorithm needs to ground these concepts itself is to prevent it being isomorphic to something else, something very bad. Nor can we get away with a simplistic overview of a few key memories being laid down - we'd be falling back into the GOFAI trap of expecting a few key relationships to establish the whole concept. It seems that for an algorithm to talk about memory in a way that makes sense, we require the algorithm to demonstrate a whole lot of things about the concept.

It won't be enough, either, to have a "memory submodule" that the main algorithm doesn't run. That's exactly the same as having an algorithm with a token saying "memory is defined over there"; if you change the content of "over there", you change the algorithm's semantics without changing its syntax.

Then, once we have the concept of memory down, we have to establish the contents and emotions of that particular memory, both things that will require the algorithm to actively perform a lot of tasks.

Let's look at a second example. Assume now that the algorithm thinks "I expect happiness to increase" or something similar. I'll spare you the "I" for the moment, and just focus on "expect". "Expectation" is something specific, probably best defined by the "correct features" approach. It says something about future observations. It allows for the possibility of being surprised. It allows for the possibility of being updated. All these must be demonstrable features of the "expect" module, to ground it properly. So the algorithm must demonstrate a whole range of changing expectations, to be sure that "expects" is more than just a label.

Also, "expectation" is certainly not something that will be wrong every single time. It's likely not something that will be right every single time. This poses great problems for running the hedonium algorithm identically multiple times: the expectations are either always wrong or always right. The meaning of "expectation" has been lost, because it no longer has the features that it should.
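The point about identical reruns can be sketched in Python. The toy below assumes an agent whose only "expectation" is that happiness will increase, and a hypothetical environment that makes this true 70% of the time; `run_episode`, `p_increase` and the seeds are illustrative inventions. Replaying one identical copy many times makes the outcome of the expectation constant, whereas varied runs let it sometimes be right and sometimes wrong.

```python
import random


def run_episode(seed: int, p_increase: float = 0.7) -> bool:
    """One run of a toy agent that expects happiness to increase.

    The hypothetical environment raises happiness with probability p_increase,
    driven by a seeded RNG. Returns True if the expectation was borne out.
    """
    rng = random.Random(seed)
    expects_increase = p_increase > 0.5  # the agent's (fixed) expectation
    happiness_increased = rng.random() < p_increase
    return expects_increase == happiness_increased


# Identical copies: every copy replays exactly the same run.
identical_copies = [run_episode(seed=42) for _ in range(1000)]
# Varied runs: each run meets a different environment.
varied_runs = [run_episode(seed=s) for s in range(1000)]

print(len(set(identical_copies)))  # 1: always right, or always wrong
print(len(set(varied_runs)))       # 2: sometimes right, sometimes wrong
```

In the identical case there is nothing left for surprise or updating to act on, which is the sense in which "expectation" loses the features that ground it.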

There are similar problems with running the same algorithm in multiple locations (or all across the universe, in the extreme case). The first problem is that this might be seen as isomorphic to simply running the algorithm once, recording it, and running the recording everywhere else. Even if this is different, we might have the problem that an isomorphism making the hedonium into dolorum might be very large compared with the size of the hedonium algorithm - but tiny compared with the size of the multiple copies of the algorithm running everywhere.
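The size comparison can be written out as a rough ratio. Let H be the description length of one copy of the hedonium algorithm, I the description length of an isomorphism reinterpreting it as dolorum, and N the number of copies. These symbols, and the assumption that a single isomorphism serves every identical copy, are introduced here for illustration only; the numbers in the second line are purely hypothetical.

```latex
\frac{I}{H} \gg 1
\qquad\text{but}\qquad
\frac{I}{N H} \to 0 \quad \text{as } N \to \infty .

H = 10^{9},\; I = 10^{12},\; N = 10^{30}
\;\Rightarrow\;
\frac{I}{H} = 10^{3}, \qquad \frac{I}{NH} = \frac{10^{12}}{10^{39}} = 10^{-27}.
```

So an isomorphism that would be absurdly contrived for a single copy becomes a negligible addition once it is amortised over a hedonium-filled universe.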

But those are minor quibbles: the main problem is whether the agent's sense of identity can be grounded sufficiently well while remaining accurate if the agent is run trillions upon trillions of times. Are these genuine life experiences? What if the agent learns something new during that period? This seems to stretch the meaning of "learning something new", possibly breaking it.

Other issues crop up. Suppose a lot of my identity is tied up with the idea that I could explore the space around me. In a hedonium world, this would be impossible, as the space (physical and virtual) is taken up by other copies being run in limited virtual environments. Remember, it's not enough to say "the agent could explore space"; if there is no possibility for the agent to do so, "could explore" can be syntactically replaced with "couldn't explore" without affecting the algorithm, just its meaning.

These are just the first issues that come to mind; if you replace actual living and changing agents with hedoniumic copies of themselves, you have to make those copies have sufficiently rich interactions that all the important features of living and changing agents are preserved and grounded uniquely.


Beyond Hedonium

Well, where does that leave us? Instead of my initial caricature of hedonium, what if we had instead a vast number of more complex algorithms, possibly stochastic and varying, with more choices, more interactions, more exploration, etc., all that is needed to ground them as agents with emotions? What if we took those, and then made them as happy as possible? Would I argue against that hedonium, still?

Probably not. But I'm not sure "hedonium" is the best description of that setup. It seems to describe agents that have various features, one of which happens to be extremely high happiness, rather than pure happiness algorithms. And that might be a better way of conceiving of them.


Addendum: mind crimes

Nick Bostrom and others have brought up the possibility of AI "mind crimes", where the AI, simply by virtue of simulating humans in potentially bad situations, causes these humans to exist and, possibly, suffer (and then most likely die as the simulation ends).

This situation seems to be the exact converse of the above. For hedonium, we want a rich enough interaction to ground all the symbols and leave no ambiguity as to what is going on. To avoid mind crimes, we want the opposite. We'd be fine if the AI's prediction modules returned something like this, as text:

Stuart was suffering intensely, as he recalled agonising memories and tried to repair his mangled arms.

Then, as long as we can safely redefine the syntactic tokens "suffering", "agonising", etc., we should be fine. Note that the AI itself must have a good grounding of "suffering" and so on, so that it knows what to avoid. But as long as the prediction module (the part that runs repeatedly) has a simple syntactic definition, there should be no mind crimes.