## LessWrong

Why would I give up the whole idea? I think you're correct in that you could model a human with multiple, varying UFs. Is there another way you know of to guide an intelligence toward a goal?

The basic problem is the endemic confusion between the map (the UF as a way of modelling an entity) and the territory (the UF as an architectural feature that makes certain things happen).

The fact that there are multiple ways of modelling humans as UF-driven, and the fact that they are all a bit contrived, should be a hint that there may be no territory corresponding to the map.


Could utility functions be for narrow AI only, and downright antithetical to AGI? That's quite a fundamental question, and I'm kind of afraid there's an obvious answer that I'm just too uninformed to know about. But I did give this some thought and I can't find the fault in the following argument, so maybe you can?

Eliezer Yudkowsky says that when AGI exists, it will have a utility function. For a long time I didn't understand why, but he gives an explanation in *AI Alignment: Why It's Hard, and Where to Start*. You can look it up there, but the gist of the argument I got from it is:

1. (explicit) If an agent's decisions are incoherent, the agent is behaving foolishly.
   1. Example 1: If an agent's preferences aren't ordered (for instance, it prefers A to B and B to C, but also C to A), it behaves foolishly.
   2. Example 2: If an agent allocates resources incoherently, it behaves foolishly.
   3. Example 3: If an agent's preferences depend on the probability of the choice even having to be made, it behaves foolishly.
2. (implicit) An AGI shouldn't behave foolishly, so its decisions have to be coherent.
3. (explicit) Making coherent decisions is the same thing as having a utility function.
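Here is a minimal Python sketch of how Example 1 connects to point 3. It only covers the *ordering* part of coherence and does not touch probabilities (Example 3). It assumes a finite set of options and strict pairwise preferences. The function names `utility_from_preferences` and `money_pump` are hypothetical and were made up for this illustration.

- The first function tries to assign numbers that respect every stated preference. This works exactly when the preferences contain no cycle.
- The second simulates the standard "money pump." An agent with cyclic preferences keeps paying a small fee to swap to something it prefers, and it ends up back where it started, only poorer.

```python
from graphlib import TopologicalSorter, CycleError


def utility_from_preferences(options, prefers):
    """Return {option: utility} such that u[a] > u[b] for every (a, b) in
    `prefers` ("a strictly preferred to b"), or None if no such numbers exist
    because the preferences contain a cycle."""
    # Each option's "predecessors" are the options it must rank above,
    # so the sorter emits worse options before better ones.
    graph = {o: set() for o in options}
    for better, worse in prefers:
        graph[better].add(worse)
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None  # e.g. A > B > C > A: no assignment of numbers works
    # Later in the order means higher utility.
    return {option: rank for rank, option in enumerate(order)}


def money_pump(start, prefers, fee, rounds):
    """Simulate an agent that pays `fee` every time it is offered an item it
    strictly prefers to its current one. Returns (final_item, total_paid)."""
    item, paid = start, 0
    for _ in range(rounds):
        # A hypothetical trader offers any item the agent prefers to `item`.
        upgrades = sorted(a for (a, b) in prefers if b == item)
        if not upgrades:
            break  # nothing preferred to the current item: the pump stops
        item = upgrades[0]
        paid += fee
    return item, paid


options = ["A", "B", "C"]
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
transitive = {("A", "B"), ("B", "C"), ("A", "C")}

print(utility_from_preferences(options, cyclic))      # None
print(utility_from_preferences(options, transitive))  # {'C': 0, 'B': 1, 'A': 2}
print(money_pump("A", cyclic, fee=1, rounds=3))       # ('A', 3)
print(money_pump("C", transitive, fee=1, rounds=3))   # ('A', 1)
```

Compare the two preference sets. The agent with transitive preferences stops trading once it holds its favorite item. The agent with cyclic preferences pays on every swap and ends up holding exactly what it started with.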

I accept that if all of these were true, AGI should have a utility function. I also accept points 1 and 3. I doubt point 2.

Before I get to why, I should state my suspicion as to why discussions of AGI focus so much on utility functions. Utility functions are fundamental to many problems of narrow AI. If you're trying to win a game, or to provide a service using scarce computational resources, a well-designed utility function is exactly what you need. Utility functions are essential in narrow AI, so it seems reasonable to assume they should be essential in AGI because... we don't know what AGI will look like, but it sounds similar to narrow AI, right?

So that's my motivation. I hope to point out that maybe we're confused about AGI because we took a wrong turn way back when we decided it should have a utility function. But I'm aware it is more likely I'm just too dumb to see the wisdom of that decision.

The reasons for my doubt are the following.

1. Humans don't have a utility function and make very incoherent decisions. Humans are also the most intelligent organisms on the planet. In fact, it seems to me that the less intelligent an organism is, the more easily its behavior can be approximated with a model that has a utility function!
   1. Apes behave more coherently than humans. They have a far smaller range of behaviors. They switch between them relatively predictably. They do have culture - one troop of chimps will fish for termites using a twig, while another will do something like a rain dance - but their cultural specifics number in the dozens, while those of humans are innumerable.
   2. Cats behave more coherently than apes. There are shy cats and bold ones, playful ones and lazy ones, but once you know a cat, you can predict fairly precisely what kind of thing it is going to do on a random day.
   3. Earthworms behave more coherently than cats. There aren't playful earthworms and lazy ones; they basically all follow the nutrients they sense around them and occasionally mate.
   4. And single-celled organisms are so coherent that we think we can even model them entirely on standard computing hardware. If that succeeds, it means we actually know *E. coli*'s utility function to the last decimal point.
2. The randomness of human decisions seems essential to human success (on top of other essentials such as speech and cooking). Humans seem to have a knack for sacrificing precious lifetime for fool's errands that very occasionally create benefit for the entire species.

   A few occasions where such fool's errands happen to work out will later look like the most intelligent things people ever did - after hindsight bias kicks in. Before Einstein revolutionized physics, he was not obviously more sane than those contemporaries of his who spent their lives doing earnest work in phrenology and theology.

   And many people trying many different things, most of them forgotten and a few seeming really smart in hindsight - that isn't a special case that is only really true for Einstein, it is the typical way humans have randomly stumbled into the innovations that accumulate into our technological superiority. You don't get to epistemology without a bunch of people deciding to spend decades of their lives thinking about why a stick looks bent when it goes through a water surface. You don't settle every little island in the Pacific without a lot of people deciding to go beyond the horizon in a canoe, and most of them dying like the fools that they are. You don't invent rocketry without a mad obsession with finding new ways to kill each other.
3. An AI whose behavior is determined by a utility function has a couple of problems that human (or squid or dolphin) intelligence doesn't have, and they seem to be fairly intrinsic to having a utility function in the first place. Namely, the vast majority of possible utility functions lead directly into conflict with all other agents.

   To define a utility function is to define a (direction towards a) goal. So a discussion of an AI with one, single, unchanging utility function is a discussion of an AI with one, single, unchanging goal. That isn't just unlike the intelligent organisms we know, it isn't even a failure mode of intelligent organisms we know. The nearest approximations we have are the least intelligent members of our species.
4. Two agents with identical utility functions are arguably functionally identical to a single agent that exists in two instances. Two agents with utility functions that are not identical are at best irrelevant to each other and at worst implacable enemies.

   This enormously limits the interactions between agents and is again very different from the intelligent organisms we know, which frequently display intelligent behavior in exactly those instances where they interact with each other. We know communicating groups (or "hive minds") are smarter than their members; that's why we have institutions. AIs with utility functions as imagined by e.g. Yudkowsky cannot form these.

   They can presumably create copies of themselves instead, which might be as good or even better, but we don't know that, because we don't really understand whatever it is exactly that makes institutions more intelligent than their members. It doesn't seem to be purely multiplied brainpower, because a person thinking for ten hours often doesn't find solutions that ten persons thinking together find in an hour. So if an AGI can multiply its own brainpower, that doesn't necessarily achieve the same result as thinking with others.

Now I'm not proposing an AGI should have nothing like a utility function, or that it couldn't temporarily adopt one. Utility functions are great for evaluating progress towards particular goals. Within well-defined areas of activity (such as playing chess), even humans can temporarily behave as if they had utility functions, and I don't see why AGI shouldn't.

I'm also not saying that something like a paperclip maximizer couldn't be built, or that it could be stopped once underway. The AI alignment problem remains real.

I do contend that the paperclip maximizer wouldn't be an AGI, it would be narrow AI. It would have a goal, it would work towards it, but it would lack what we look for when we look for AGI. And whatever that is, I propose we don't find it within the space of things that can be described with (single, unchanging) utility functions.

And there are other places we could look. Maybe some of it is in whatever it is exactly that makes institutions more intelligent than their members. Maybe some of it is in why organisms (especially learning ones) play - playfulness and intelligence seem correlated, and playfulness has that incoherence that may be protective against paperclip-maximizer-like failure modes. I don't know.