The only thing that is required for an agent to provably have a utility function is that it has coherent preferences. The only thing that means is that the agent obeys the four axioms of VNM-rationality (which are really simple and would be really weird if they weren't satisfied). The Von Neumann–Morgenstern utility theorem states that, for such an agent, there exists a utility function U such that B is preferred over A if and only if E(U(A)) < E(U(B)). And while it's true that when humans are presented with a choice between two lotteries, they sometimes pick the lottery that has a lower expected payout, this doesn't mean humans don't have a utility function; it just means that our utility function in those scenarios is not based entirely on the expected dollar value (it could instead be based on the expected payout where higher probabilities are given greater weight, for example).
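For reference, a standard statement of the theorem (textbook notation; the four axioms are the ones just mentioned):

```latex
% VNM theorem: if a preference relation \preceq over lotteries satisfies
% completeness, transitivity, continuity, and independence, then
\exists\, U : X \to \mathbb{R} \ \text{ such that } \
A \preceq B \iff \mathbb{E}[U(A)] \le \mathbb{E}[U(B)],
% and U is unique up to a positive affine transformation U' = aU + b,\ a > 0.
```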
The only thing that means is that the agent obeys the four axioms of VNM-rationality (which are really simple and would be really weird if they weren't satisfied).
Not really. There are reasonable decision procedures that violate the axioms (by necessity, since they aren't equivalent to a utility function). For example, anything that makes decisions based on the 5th-percentile ("5%") outcome of its options, which is the idea behind Value at Risk (VaR). Or something that strictly optimizes for one characteristic and then optimizes for another among all options that optimize the first (a lexicographic ordering).
It isn't hard to argue that the first procedure is a bad idea, and plenty of people in finance argue exactly that. However, for the second one, who cares that the lexicographic ordering on pairs of real numbers can't be embedded into the usual ordering on the real numbers?
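To make the second example concrete, here is a minimal sketch (the attribute names and numbers are invented for illustration) of a chooser that strictly optimizes one characteristic and only breaks ties on a second. It is a perfectly workable decision procedure, yet no single real-valued utility function can represent a lexicographic ordering over pairs of reals:

```python
# Lexicographic chooser: maximize safety first, then profit among the ties.
# (Hypothetical attributes and values; a sketch, not a claim about any real system.)

def lexicographic_choice(options):
    """options: list of (safety, profit) pairs; returns the chosen option."""
    best_safety = max(safety for safety, _ in options)   # strictly optimize safety first
    ties = [o for o in options if o[0] == best_safety]
    return max(ties, key=lambda o: o[1])                  # then optimize profit among ties

print(lexicographic_choice([(0.9, 100), (0.9, 250), (0.8, 10_000)]))
# -> (0.9, 250): no amount of profit compensates for lower safety
```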
Why buy lottery tickets?
Why stop buying lottery tickets?
Why do some humans buy lottery tickets and others not? If humans don't all have the same utility function how do they get one? Isn't the process of acquisition and change of utility function (or whatever we use to approximate one) more important to our understanding of intelligence and the future of intelligence than the function itself?
People buy lottery tickets because no one can accurately "feel" or intuit incredibly small probabilities. We (by definition) experience very few or no events with those probabilities, so we have nothing on which to build that intuition. Thus we approximate negligible but nonzero probabilities as small but non-negligible. And that "feeling" is worth the price of the lottery ticket for some people. Some people learn to calibrate their intuitions over time so that negligible probabilities "feel" like zero, and so they don't buy lottery tickets. The problem is less about utility functions and more about accurate processing of small probabilities.
I'm not sure you noticed, but I brought up lotteries because they directly contradict "it could instead be based on the expected payout where higher probabilities are given greater weight, for example": here we see an example of a very, very low probability being given a high weight (if our brains even do that).
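One way to make the "weighting" idea concrete is a Prelec-style probability weighting function (the functional form and parameter here are illustrative assumptions, not a claim about how brains actually work). With the exponent below 1, tiny probabilities get heavily overweighted, which is enough to make a ticket look worth its price even though its expected value is far below it:

```python
import math

def prelec_weight(p, alpha=0.5):
    """Prelec weighting w(p) = exp(-(-ln p)^alpha); overweights small p when alpha < 1."""
    return math.exp(-(-math.log(p)) ** alpha) if p > 0 else 0.0

p_win, jackpot, ticket_price = 1e-7, 10_000_000, 2.0   # made-up lottery numbers

expected_value = p_win * jackpot                        # ~ $1, so not worth $2
weighted_value = prelec_weight(p_win) * jackpot         # overweighted, looks worth buying

print(round(expected_value, 2), weighted_value > ticket_price)   # -> 1.0 True
```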
Humans don't have a utility function and make very incoherent decisions
Wait. Most of what I've read about utility functions applied them to humans, before anyone seriously talked about AGI. It doesn't have to be a simple function, and nobody may be able to algebraically express their utility function, but to the extent that any agent makes coherent goal-driven decisions, it has a utility function.
Humans that make incoherent decisions are either following a complex utility function that's hard to reverse-engineer, or have an inconstant utility function that changes across time.
but to the extent that any agent makes coherent goal-driven decisions, it has a utility function
That is not obvious to me. Why is it so? (defining "utility function" might be helpful)
I'm not sure how rhetorical your question is but you might want to look at the Von Neumann–Morgenstern utility theorem.
I'm quite familiar with the VNM utility, but here we are talking about real live meatbag humans, not about mathematical abstractions.
You asked
but to the extent that any agent makes coherent goal-driven decisions, it has a utility function
That is not obvious to me. Why is it so? (defining "utility function" might be helpful)
Taking the VNM axioms as the definition of "coherent", the VNM theorem proves precisely that "coherent" implies "has a utility function".
Anyway, the context of the original post was that humans had an advantage through not having a utility function. So in that context the VNM theorem raises the question "Exactly which of the axioms is it advantageous to violate?".
Taking the VNM axioms as the definition of "coherent", the VNM theorem proves precisely that "coherent" implies "has a utility function".
Sure, but that's an uninteresting tautology. If we define A as a set of conditions sufficient for B to happen then lo and behold! A implies B.
So in that context the VNM theorem raises the question "Exactly which of the axioms is it advantageous to violate?"
The VNM theorem posits that a utility function exists. It doesn't say anything about how to find it or how to evaluate it, never mind in real time.
It's like asking why humans don't do the Solomonoff induction all the time -- "there must be a reason, what is it?"
Sure, but that's an uninteresting tautology. If we define A as a set of conditions sufficient for B to happen then lo and behold! A implies B.
Come on, mathematics is sometimes interesting, right?
The VNM theorem posits that a utility function exists. It doesn't say anything about how to find it or how to evaluate it, never mind in real time.
It's like asking why humans don't do the Solomonoff induction all the time -- "there must be a reason, what is it?"
Yeah okay, I agree with this. In other words the VNM theorem says that our AGI has to have a utility function, but it doesn't say that we have to be thinking about utility functions when we build it or care about utility functions at all, just that we will have "by accident" created one.
I still think that using utility functions actually is a good idea though, but I agree that that isn't implied by the VNM theorem.
In other words the VNM theorem says that our AGI has to have a utility function
Still nope. The VNM theorem says that if our AGI sticks to VNM axioms then a utility function describing its preferences exists. Exists somewhere in the rather vast space of mathematical functions. The theorem doesn't say that the AGI "has" it -- neither that it knows it, nor that it can calculate it.
The most defensible use of the term is described as Ordinal Utility, but this is a little weaker than how I commonly see it used around here. I'd summarize it as "a predictive model for how much goodness an agent will experience conditioned on some decision". Vincent Yu has a more formal description in [this comment](http://lesswrong.com/lw/dhd/stupid_questions_open_thread_round_3/72z3).
There's a lot of discussion about whether humans have a utility function or not, with the underlying connotation being that a utility function implies consistency in decision-making, so inconsistency proves the lack of a utility function. One example: Do Humans Want Things? I prefer to think of humans as having a utility function at any given point in time, but not one that's consistent over time.
A semi-joking synonym for "I care about X" for some of us is "I have a term for X in my utility function". Note that this (for me) implies a LOT of terms in my function, with very different coefficients that may not be constant over time.
A "utility function" as applied to humans is an abstraction, a model. And just like any model, it is subject to the George Box maxim "All models are wrong, but some are useful".
If you are saying that your model is "humans ... [have] a utility function at any given point in time, but not one that's consistent over time", well, how useful is this model? You can't estimate this utility function well and it can change at any time... so what does this model give you?
I had the same thoughts after listening to the same talk. I think the advantage of utility functions, though, is that they are well-defined mathematical constructs we can reason about, and they showcase corner cases that may pop up in other models but would be easier to miss there. AGI, just like all existing intelligences, may not be implemented with a utility function, but the utility function provides a powerful abstraction for reasoning about what we might more loosely call its "preference relation". If we simply accept the contradictions in that relation, we risk missing the cases where the contradictions do not exist and the preference relation really is a utility function.
The point being, for the purpose of alignment, studying utility functions makes more sense, because your control method can't possibly work on a general preference relation if it can't even work on the simpler utility function. And if real preference relations contain features that keep the challenges of aligning utility functions from arising in existing intelligences, that itself is evidence about how the problem might be solved (at least for some bounded cases).
That makes sense. But it isn't what Eliezer says in that talk:
There’s a whole set of different ways we could look at agents, but as long as the agents are sufficiently advanced that we have pumped most of the qualitatively bad behavior out of them, they will behave as if they have coherent probability distributions and consistent utility functions.
Do you disagree with him on that?
Basically agree, and it's nearly the same point I was trying to get at, though without supposing quite so strongly that utility functions are definitely the right thing. I'd leave open more possibility that we're wrong about utility functions always being the best subclass of preference relations, but even if we're wrong about that, our solutions must at least work for utility functions, since they are a smaller set of all possible ways something could decide.
I think utility functions can produce more behaviours than you give them credit for.
- Humans don't have a utility function and make very incoherent decisions. Humans are also the most intelligent organisms on the planet. In fact, it seems to me that the less intelligent an organism is, the easier its behavior can be approximated with a model that has a utility function!
The less intelligent organisms are certainly more predictable. But I think that the less intelligent ones actually can't be described by utility functions and are instead predictable for other reasons. A classic example is the Sphex wasp.
Some Sphex wasps drop a paralyzed insect near the opening of the nest. Before taking provisions into the nest, the Sphex first inspects the nest, leaving the prey outside. During the inspection, an experimenter can move the prey a few inches away from the opening. When the Sphex emerges from the nest ready to drag in the prey, it finds the prey missing. The Sphex quickly locates the moved prey, but now its behavioral "program" has been reset. After dragging the prey back to the opening of the nest, once again the Sphex is compelled to inspect the nest, so the prey is again dropped and left outside during another stereotypical inspection of the nest. This iteration can be repeated several times without the Sphex changing its sequence; by some accounts, endlessly.
So it looks like the wasp has a utility function "ensure the survival of its children" but in fact it's just following one of a number of fixed "programs". Whereas humans are actually capable of considering several plans and choosing the one they prefer, which I think is much closer to having a utility function. Of course humans are less predictable, but one would always expect intelligent organisms to be unpredictable. To predict an agent's actions you essentially have to mimic its thought processes, which will be longer for more intelligent organisms whether they use a utility function or not.
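The contrast can be caricatured in a few lines (everything here is invented for illustration): the wasp-style agent replays a fixed routine regardless of the situation, while the planner compares candidate plans and takes the one it prefers, which is the behaviour that starts to look like having a utility function:

```python
# Purely illustrative; the routines, plans and scores are made up.

def sphex_agent(situation):
    # Fixed program: ignores the situation and replays the same routine every time.
    return ["drop prey at opening", "inspect nest", "drag prey in"]

def planning_agent(candidate_plans, utility):
    # Considers several plans and picks the preferred one: an argmax over a utility.
    return max(candidate_plans, key=utility)

plans = {"drag prey straight in": 0.9, "inspect first, then drag": 0.7, "give up": 0.0}
print(sphex_agent({"prey_moved": True}))   # same routine no matter what happened
print(planning_agent(plans, plans.get))    # -> 'drag prey straight in'
```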
- The randomness of human decisions seems essential to human success (on top of other essentials such as speech and cooking). Humans seem to have a knack for sacrificing precious lifetime for fool's errands that very occasionally create benefit for the entire species.
If trying actions at random produces useful results then a utility maximising AI will choose this course. Utility maximisers consider all plans and pick the one with the highest expected utility, and this can turn out to be one that doesn't look like it goes directly towards the goal. Eventually of course the AI will have to turn its attention towards its main goal. The question of when to do this is known as the exploration vs. exploitation tradeoff and there are mathematical results that utility maximisers tend to begin by exploring their options and then turn to exploiting their discoveries once they've learnt enough.
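A standard illustration of that tradeoff is the multi-armed bandit (a sketch with made-up payoff probabilities; the UCB1 rule is the textbook one). Early on the exploration bonus dominates and the agent samples every arm; as the counts grow, it settles on the arm with the best estimated payoff:

```python
import math, random

def ucb1(true_payoffs, steps=2000):
    """UCB1 bandit: pull the arm maximizing (mean estimate + exploration bonus)."""
    n = [0] * len(true_payoffs)           # pulls per arm
    total = [0.0] * len(true_payoffs)     # accumulated reward per arm
    for t in range(1, steps + 1):
        if 0 in n:                        # try every arm once before anything else
            arm = n.index(0)
        else:
            arm = max(range(len(true_payoffs)),
                      key=lambda i: total[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        reward = 1.0 if random.random() < true_payoffs[arm] else 0.0
        n[arm] += 1
        total[arm] += reward
    return n

print(ucb1([0.2, 0.5, 0.8]))   # pull counts end up concentrated on the last (best) arm
```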
To define a utility function is to define a (direction towards a) goal. So a discussion of an AI with one, single, unchanging utility function is a discussion of an AI with one, single, unchanging goal. That isn't just unlike the intelligent organisms we know, it isn't even a failure mode of intelligent organisms we know. The nearest approximations we have are the least intelligent members of our species.
Again I think that this sort of behaviour (acting towards multiple goals) can be exhibited by utility maximizers. I'll give a simple example. Consider an agent who can buy any 10 fruits from a market, and suppose its utility function is sqrt(number of oranges) + sqrt(number of apples). Then it buys 5 oranges and 5 apples (rather than just buying 10 apples or 10 oranges). The important thing about the example is that the derivative of the utility function decreases as the number of oranges increases, so the more oranges it already has, the more it will prefer to buy apples instead. This creates a balance. It's just a simple example, but by analogy it would be totally possible to create a utility function that describes a multitude of complex values all simultaneously.
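The balance point is easy to verify by brute force (a throwaway sketch using the square-root utility from the example above):

```python
import math

def utility(oranges, apples):
    return math.sqrt(oranges) + math.sqrt(apples)

# Try every way of splitting 10 fruits between oranges and apples.
best = max(range(11), key=lambda o: utility(o, 10 - o))
print(best, 10 - best, round(utility(best, 10 - best), 2))  # -> 5 5 4.47, vs 3.16 for 10/0
```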
- Two agents with identical utility functions are arguably functionally identical to a single agent that exists in two instances. Two agents with utility functions that are not identical are at best irrelevant to each other and at worst implacable enemies.
Just like humans, two agents with different utility functions can cooperate through trade. The two agents calculate the outcome if they trade and the outcome if they don't trade, and they make the trade if the utility afterwards is higher for both of them. It's only if their utilities are diametrically opposed that they can't cooperate.
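A toy version of that calculation (the endowments and utility functions are invented for the example): each agent accepts the trade only because its own utility goes up, even though the two functions differ:

```python
import math

# Agent A cares more about apples; agent B cares more about oranges (illustrative).
u_a = lambda apples, oranges: 2 * math.sqrt(apples) + math.sqrt(oranges)
u_b = lambda apples, oranges: math.sqrt(apples) + 2 * math.sqrt(oranges)

a_before, b_before = (1, 9), (9, 1)     # (apples, oranges) endowments
a_after,  b_after  = (6, 4), (4, 6)     # proposed trade: A swaps 5 oranges for 5 apples

trade_happens = u_a(*a_after) > u_a(*a_before) and u_b(*b_after) > u_b(*b_before)
print(trade_happens)                     # -> True: both agents end up better off
```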
Agreed on that last point particularly. Especially since, if they want similar enough things, they could easily cooperate without trade.
Like if two AIs supported Alice in her role as Queen of Examplestan, they would probably figure that quibbling with each other over whether Bob the gardener should have one or two buttons undone (just on the basis of fashion, not due to larger consequences) is not a good use of their time.
Also, the utility functions can differ as much as you want on matters that aren't going to come up. Like, Agents A and B disagree on how awful many bad things are. Both agree that they are all really quite bad and all effort should be put forth to prevent them.
I think there is a hidden assumption that the utility function is simple, so that it can easily be calculated for any given position. So we have an interaction of two algorithms: one extremely simple (the utility function) and one extremely complex (the AGI). Most problems, like the paperclip maximiser, are results of this interaction.
The question which arises here is: could the utility function also be very complex? For example, as complex as a narrow AI? Could that help us in creating Friendly AI? Is the known complexity of human values the same thing?
What would it mean for an AGI to not have a utility function, specifically what do you mean by "incoherently"?
The key issue for humans is not whether humans have utility functions, but whether it's useful to model humans as having utility functions.
The basic problem is the endemic confusion between the map (the UF as a way of modelling an entity) and the territory (the UF as an architectural feature that makes certain things happen).
It seems to you that entities with simple and obvious goal-directed behaviour (as seen from the outside) have or need UFs, and entities that don't, don't. But there isn't a fixed connection between the way things seem from the outside and the way they work.
From the outside, any system that succeeds in doing anything specialised can be thought of, or described, as a relatively general-purpose system that has been constrained down to a narrower goal by some other system. For instance, a chess-playing system may be described as a general-purpose problem-solver that has been trained on chess. To say its UF defines a goal of winning at chess is the "map" view.
However, it might well be, in terms of the territory, in terms of what is going on inside the black box, a special-purpose system that has been specifically coded for chess, has no ability to do anything else, and therefore does not need any kind of reward channel or training system to keep it focused on chess. So the mere fact that a system, considered from the outside as a black box, does some specific thing is not proof that it has a UF, and therefore not proof that anyone has succeeded in loading values or goals into its UF.
Taking an outside view of a system as possessing a UF (in the spirit of Dennett's "intentional stance") will only give correct predictions if everything works correctly. The essential point is that you need a fully accurate picture of what is going on inside a black box in order to predict its behaviour under all circumstances, but pictures that are inaccurate in various ways can be good enough for restricted sets of circumstances.
Here's an analogy: suppose that machinery, including domestic appliances, were made of an infinitely malleable substance called Ultronium, say, and were constrained into some particular form, such as a kettle or a toaster, by a further gadget called a Veeblefetzer. So long as a kettle functions as a kettle, I can regard it as an Ultronium+Veeblefetzer ensemble. However, such ensembles support different counterfactuals to real kettles. For instance, if the Veeblefetzer on my kettle fritzes, it could suddenly reconfigure it into something else, a toaster or a spice rack -- but that is not possible for an ordinary kettle that is not made of Ultronium.
The converse case to an entity seeming to have a UF just because it fulfils some apparent purpose is an entity that seems not to have a UF because its behaviour is complex and perhaps seemingly random. A UF in the territory sense does not have to be simple, and a complex UF can include higher level goals, such as "seek variety" or "revise your lower level goals from time to time", so the lack of an obvious UF as judged externally does not imply the lack of a UF in the gold-standard sense of an actual component.
The actual possession of a UF is much more relevant to AI safety than being describable in terms of a UF. If an AI doesn't actually have a UF, you can't render it safe by fixing its UF.
I don't see any reason why an AI has to act coherently. If it prefers A to B, B to C, and C to A, it might not care. You could program it with that "utility function".*
If not, maybe the A-liking aspects will reprogram B and C out of its utility function, or maybe not. What happens would depend entirely on the details of how it was programmed.
Maybe it would spend all the universe's energy turning our future light cone from C to B, then from B to A, and also from A to C. Maybe it would do this all at once, if it was programmed to follow one "goal" before proceeding to the next. Or maybe different parts of the universe would be in different stages, all at the same time. Think of it like a light-cone blender set to puree.
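The classic way to see what that costs is the "money pump" (a sketch; the fee and the cycle are invented): an agent with A > B > C > A will pay a little for every step around the cycle and end up exactly where it started, indefinitely.

```python
# Money-pump sketch for cyclic preferences A > B > C > A (all values invented).
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}

def pump(start="C", rounds=3, fee=1):
    holding, spent = start, 0
    offers = ["B", "A", "C"] * rounds            # each offer is "preferred" to the last
    for offer in offers:
        if prefers.get((offer, holding)):        # agent prefers the offer to its holding
            holding, spent = offer, spent + fee  # so it pays the fee and swaps
    return holding, spent

print(pump(rounds=3))   # -> ('C', 9): nine fees paid, back where it started
```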
Our default preferences seem about that coherent, but we're able to walk and talk, so clearly it's possible. It explains a lot of the madness and incoherence of the way the world is structured, certainly. Luckily, we seem to value coherence, or at least are willing to give up on having our cake and eating it too when it becomes clear that we can't have it both ways. It's possible a subtly incoherent AGI would operate at cross purposes for a long time before discovering and correcting its utility function, if it valued coherence.
However, MIRI is trying to program a sane AGI, not explore all possible ways an AI can be insane. Economists like to simplify human motives into idealized rational agents because they are much, much simpler to reason about. The same is true for MIRI, I think.
I've given this sort of thing a little thought, and have an Evernote note I can turn into a LW post, if there is interest.
* I use the term "utility function" broadly here. I guess "programming" would be more correct, but even an A>B>C>A AI bears some rough resemblance to a utility function, even if it isn't coherent.
To the post title: Yes. See this discussion of quantum interference (decoherence) in human decision making: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0168045
Utility function +/- 25% against most uncertain prospect, in favor of prospect directly opposite most uncertain prospect. Add an additional +/- >5% as more information becomes available.
Somebody use that 25% in a back-prop algorithm already plz.
Just wondering if you've ever read an old economics article by Ron Heiner: The Origins of Predictable Behavior, 1984 (I think), Am. Econ. Rev. It's probably very sympathetic to your last paragraph. (And on a slightly different slant, the recent Quanta article about evolution: https://www.quantamagazine.org/20170314-time-dependent-rate-phenomenon-evolution-viruses/)
In fact, it seems to me that the less intelligent an organism is, the easier its behavior can be approximated with a model that has a utility function!
Only because those organisms have fewer behaviors in general. If you put a human in an environment where its options and sensory inputs were as simple as those experienced by apes and cats, humans would probably look like equally simple utility maximizers.
I think you're getting stuck on the idea of one utility function. I like to think humans have many, many utility functions. Some we outgrow, some we "restart" from time to time. For the former, think of a baby learning to walk. There is a utility function, or something very much like it, that gets the baby from sitting to crawling to walking. Once the baby learns how to walk, though, the utility function is no longer useful; the goal has been met. Now this action moves from being modeled by a utility function to a known action that can be used as input to other utility functions.
As best I can tell, human general intelligence comes from many small intelligences acting in a cohesive way. The brain is structured like this, as a bunch of different sections that do very specific things. Machine models are moving in this direction, a good example being the DeepMind Go neural net playing a version of itself to get better.
You could model humans as having varying UFs, or having multiple UFs...or you could give up on the whole idea.
Why would I give up the whole idea? I think you're correct in that you could model a human with multiple, varying UFs. Is there another way you know of to guide an intelligence toward a goal?
The basic problem is the endemic confusion between the map (the UF as a way of modelling an entity) and the territory (the UF as an architectural feature that makes certain things happen).
The fact that there are multiple ways of modelling humans as UF-driven, and the fact that they are all a bit contrived, should be a hint that there may be no territory corresponding to the map.
Is there an article that presents multiple models of UF-driven humans and demonstrates that what you criticize as contrived actually shows there is no territory to correspond to the map? Right now your statement doesn't have enough detail for me to be convinced that UF-driven humans are a bad model.
And you didn't answer my question: is there another way, besides UFs, to guide an agent towards a goal? It seems to me that the idea of moving toward a goal implies a utility function, be it hunger or human programmed.
Is there an article that presents multiple models of UF-driven humans and demonstrates that what you criticize as contrived actually shows there is no territory to correspond to the map?
Rather than trying to prove the negative, it is more a question of whether these models are known to be useful.
The idea of multiple or changing UFs suffers from a problem of falsifiability as well. Whenever a human changes their apparent goals, is that a switch to another UF, or a change in the UF? Reminiscent of Ptolemaic epicycles, as Ben Goertzel says.
And you didn't answer my question: is there another way, besides UFs, to guide an agent towards a goal? It seems to me that the idea of moving toward a goal implies a utility function, be it hunger or human programmed.
Implies what kind of UF?
If you are arguing tautologously that having a UF just is having goal-directed behaviour, then you are not going to be able to draw interesting conclusions. If you are going to define "having a UF" broadly, then you are going to have similar problems, and in particular the problem that "the problem of making an AI safe simplifies to the problem of making its UF safe" only works for certain, relatively narrow, definitions of UF. In the context of a biological organism, or an artificial neural net or deep learning AI, the only thing "UF" could mean is some aspect of its functioning that is entangled with all the others. Neither a biological organism nor an artificial neural net or deep learning AI is going to have a UF that can be conveniently separated out and reprogrammed. That definition of UF only belongs in the context of GOFAI or symbolic programming.
There is no point in defining a term broadly to make one claim come out true, if it is only an intermediate step towards some other claim which doesn't come out as true under the broad definition.
My definition of utility function is one commonly used in AI. It is a mapping from states to real numbers, u : E -> R, where E is the set of all possible states and R is the reals in one dimension.
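In code that definition is just a function from states to floats, plus an argmax over it (a minimal sketch; the states and numbers are placeholders):

```python
from typing import Callable, Iterable

def choose(states: Iterable[str], u: Callable[[str], float]) -> str:
    """Pick the state the agent prefers, i.e. the argmax of u over E."""
    return max(states, key=u)

# Placeholder states with made-up utilities (purely illustrative).
utilities = {"rain": -1.0, "sun": 3.0, "snow": 0.5}
print(choose(utilities, utilities.get))   # -> 'sun'
```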
What definition are you using? I don't think we can have a productive conversation until we both understand each other's definitions.
I'm not using a definition, I'm pointing out that standard arguments about UFs depend on ambiguities.
Your definition is abstract and doesn't capture anything that an actual AI could "have" -- for one thing, you can't compute the reals. It also fails to capture what UFs are "for".
AI researchers, a group of people who are fairly disjoint from LessWrongians, may have a rigorous and stable definition of UF, but that is not relevant. The point is that writings on MIRI and LessWrong use, and in fact depend on, shifting and ambiguous definitions.
Could utility functions be for narrow AI only, and downright antithetical to AGI? That's a quite fundamental question and I'm kind of afraid there's an obvious answer that I'm just too uninformed to know about. But I did give this some thought and I can't find the fault in the following argument, so maybe you can?
Eliezer Yudkowsky says that when AGI exists, it will have a utility function. For a long time I didn't understand why, but he gives an explanation in AI Alignment: Why It's Hard, and Where to Start. You can look it up there, but the gist of the argument I got from it is:
I accept that if all of these were true, AGI should have a utility function. I also accept points 1 and 3. I doubt point 2.
Before I get to why, I should state my suspicion why discussions of AGI really focus on utility functions so much. Utility functions are fundamental to many problems of narrow AI. If you're trying to win a game, or to provide a service using scarce computational resources, a well-designed utility function is exactly what you need. Utility functions are essential in narrow AI, so it seems reasonable to assume they should be essential in AGI because... we don't know what AGI will look like but it sounds similar to narrow AI, right?
So that's my motivation. I hope to point out that maybe we're confused about AGI because we took a wrong turn way back when we decided it should have a utility function. But I'm aware it is more likely I'm just too dumb to see the wisdom of that decision.
The reasons for my doubt are the following.
A few occasions where such fool's errands happen to work out will later look like the most intelligent things people ever did - after hindsight bias kicks in. Before Einstein revolutionized physics, he was not obviously more sane than those contemporaries of his who spent their lives doing earnest work in phrenology and theology.
And many people trying many different things, most of them forgotten and a few seeming really smart in hindsight - that isn't a special case that is only really true for Einstein, it is the typical way humans have randomly stumbled into the innovations that accumulate into our technological superiority. You don't get to epistemology without a bunch of people deciding to spend decades of their lives thinking about why a stick looks bent when it goes through a water surface. You don't settle every little island in the Pacific without a lot of people deciding to go beyond the horizon in a canoe, and most of them dying like the fools that they are. You don't invent rocketry without a mad obsession with finding new ways to kill each other.
To define a utility function is to define a (direction towards a) goal. So a discussion of an AI with one, single, unchanging utility function is a discussion of an AI with one, single, unchanging goal. That isn't just unlike the intelligent organisms we know, it isn't even a failure mode of intelligent organisms we know. The nearest approximations we have are the least intelligent members of our species.
This enormously limits the interactions between agents and is again very different from the intelligent organisms we know, which frequently display intelligent behavior in exactly those instances where they interact with each other. We know communicating groups (or "hive minds") are smarter than their members, that's why we have institutions. AIs with utility functions as imagined by e.g. Yudkowsky cannot form these.
They can presumably create copies of themselves instead, which might be as good or even better, but we don't know that, because we don't really understand whatever it is exactly that makes institutions more intelligent than their members. It doesn't seem to be purely multiplied brainpower, because a person thinking for ten hours often doesn't find solutions that ten persons thinking together find in an hour. So if an AGI can multiply its own brainpower, that doesn't necessarily achieve the same result as thinking with others.
Now I'm not proposing an AGI should have nothing like a utility function, or that it couldn't temporarily adopt one. Utility functions are great for evaluating progress towards particular goals. Within well-defined areas of activity (such as playing Chess), even humans can temporarily behave as if they had utility functions, and I don't see why AGI shouldn't.
I'm also not saying that something like a paperclip maximizer couldn't be built, or that it could be stopped once underway. The AI alignment problem remains real.
I do contend that the paperclip maximizer wouldn't be an AGI, it would be narrow AI. It would have a goal, it would work towards it, but it would lack what we look for when we look for AGI. And whatever that is, I propose we don't find it within the space of things that can be described with (single, unchanging) utility functions.
And there are other places we could look. Maybe some of it is in whatever it is exactly that makes institutions more intelligent than their members. Maybe some of it is in why organisms (especially learning ones) play - playfulness and intelligence seem correlated, and playfulness has that incoherence that may be protective against paperclip-maximizer-like failure modes. I don't know.