
[ Question ]

Does a LLM have a utility function?

1 min read · 9th Dec 2022 · 3 answers · 2 comments

17

Language Models, AI

There's a lot of discussion and research into AI alignment, almost always about variants of how to define or create a utility function (or meta-function, if it changes over time) that is actually aligned with ... something. That something is at least humanity's survival, but often something like flourishing or another semi-abstract goal. Oops, that's not my question for today.

My question for today is whether utility functions are actually part of the solution at all. Humans don't have them, and the most interesting spurs toward AI don't have them either. Maybe anything complicated enough to be called AGI doesn't have one (or at least doesn't have a simple, concrete, consistent one).


3 Answers, sorted by top scoring

Dec 09, 2022

176

It may be better to ask "Is a utility function a useful abstraction to describe how X makes decisions?" (Does it allow you to compress your description of X's decisions?) Recall that utility functions are just a representation derived from preferences that are structured in a particular way. But not all ways of deciding on a preferred outcome are structured in that way^[1], and not all decision algorithms work by preferring outcomes, so thinking in terms of utility functions is not always helpful.

[1] See for example:
Aumann, R. J. (1962). Utility theory without the completeness axiom. Econometrica: Journal of the Econometric Society, 445-462.
Bewley, T. F. (2002). Knightian decision theory. Part I. Decisions in Economics and Finance, 25(2), 79-110.
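To make "a representation derived from preferences that are structured in a particular way" concrete, here is a minimal sketch for a finite set of options. It assumes the standard textbook conditions (completeness and transitivity of a weak preference relation); the function name `utility_from_preferences` and the example items are made up for illustration and do not come from the cited papers.

```python
from itertools import product

def utility_from_preferences(items, weakly_prefers):
    """Return {item: utility} if the relation admits a utility representation, else None.

    `weakly_prefers` is a set of pairs (a, b) meaning "a is at least as good as b".
    On a finite set, a utility representation exists exactly when the relation
    is complete and transitive.
    """
    def at_least(a, b):
        return (a, b) in weakly_prefers

    # Completeness: every pair of items (including an item with itself) is comparable.
    complete = all(at_least(a, b) or at_least(b, a)
                   for a, b in product(items, repeat=2))
    # Transitivity: a >= b and b >= c together imply a >= c.
    transitive = all(at_least(a, c)
                     for a, b, c in product(items, repeat=3)
                     if at_least(a, b) and at_least(b, c))
    if not (complete and transitive):
        return None
    # One valid utility: count how many items x is at least as good as.
    return {x: sum(at_least(x, y) for y in items) for x in items}

items = ["tea", "coffee", "water"]
reflexive = {(x, x) for x in items}
total_order = reflexive | {("tea", "coffee"), ("coffee", "water"), ("tea", "water")}
print(utility_from_preferences(items, total_order))   # {'tea': 3, 'coffee': 2, 'water': 1}

# Water is incomparable to the others here, so completeness fails.
incomplete = reflexive | {("tea", "coffee")}
print(utility_from_preferences(items, incomplete))    # None
```

The second call is the kind of case the footnote points at: the preferences are perfectly sensible, but no single real-valued function reproduces them.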

Even if it's a useful abstraction, it's only an abstraction. You can't make an AI safe by changing its UF unless its UF is a distinct component at the engineering level, not just an abstraction.

3 · Dagon · 1y

And you can't determine if it's safe by examining or understanding its utility function, if the abstraction is so loose as to not be align-able.

2 · Dan · 1y

It's not really an abstraction at all in this case; it literally has a utility function. What rates highest on its utility function is returning whatever token is 'most likely' given its training data.

Olli Järviniemi

Dec 09, 2022

102

I found janus's post Simulators to address this question very well. Much of AGI discussion revolves around agentic AIs (see the section Agentic GPT for discussion of this), but this does not model large language models very well. janus suggests that one should instead think of LLMs such as GPT-3 as "simulators". Simulators are not very agentic themselves or well described as having a utility function, though they may create simulacra that are agentic (e.g. GPT-3 writes a story where the main character is agentic).

A relevant passage from Simulators:

We can specify some types of outer objectives using a ground truth distribution that we cannot with a utility function. As in the case of GPT, there is no difficulty in incentivizing a model to predict actions that are corrigible, incoherent, stochastic, irrational, or otherwise anti-natural to expected utility maximization. All you need is evidence of a distribution exhibiting these properties.
For instance, during GPT’s training, sometimes predicting the next token coincides with predicting agentic behavior, but:
The acti

... (read more)

1 · Dan · 1y

I'm going to disagree here. Its utility function is pretty simple and explicitly programmed. It wants to find the best token, where 'best' is mostly the same as 'the most likely according to the data I'm trained on', with a few other particulars (where you can adjust how 'creative' vs plagiarizer-y it should be). That's a utility function. GPT is what's called a hill climbing algorithm. It must have a simple, straightforward utility function hard coded right in there for it to assess if a given choice is 'climbing' or not.

2 · Rafael Harth · 1y

That's the training signal, not the utility function. Those are different things. (I believe this point was made in Reward is not the Optimization Target, though I could be wrong since I never actually read this post; corrections welcome.)

Rachel Freedman

Dec 09, 2022

40

I think that the significant distinction is whether an AI system has a utility function that it is attempting to optimize at test time. An LLM does have a utility function, in that there is an objective function written in its training code that it uses to calculate gradients and update its parameters during training. However, once it is deployed, its parameters are frozen and its score on this objective function can no longer impact its behavior. In that sense, I don't think that it makes sense to think of an LLM as "trying to" optimize this objective after deployment. However, this answer could change in response to changes in model training strategy, which is why this distinction is significant.
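A toy sketch of that distinction, under heavy simplifying assumptions: the "model" below is just one logit per token in a three-token vocabulary with no context at all, and the names `train` and `predict` are illustrative rather than taken from any real LLM codebase. The point is only where the objective function appears in the code and where it does not.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Training objective: negative log-probability of the observed token."""
    return -math.log(softmax(logits)[target])

def train(logits, data, lr=0.1, epochs=50):
    """Gradient descent on the cross-entropy objective; returns frozen parameters."""
    logits = list(logits)
    for _ in range(epochs):
        for target in data:
            probs = softmax(logits)
            for i, p in enumerate(probs):
                # d(cross_entropy)/d(logit_i) = p_i - 1[i == target]
                grad = p - (1.0 if i == target else 0.0)
                logits[i] -= lr * grad
    return tuple(logits)  # a tuple, to stress that nothing updates it later

def predict(frozen_logits):
    """Deployment: read off the most probable token. No loss, no gradient, no update."""
    probs = softmax(frozen_logits)
    return max(range(len(probs)), key=probs.__getitem__)

data = [2, 2, 2, 1]                 # token 2 is the most frequent observation
frozen = train([0.0, 0.0, 0.0], data)
print(predict(frozen))              # 2
```

`cross_entropy` is the objective that `train` descends (its gradient is written out by hand in the inner loop), while `predict` never touches it. Whether the frozen parameters nonetheless encode something goal-like is a separate question that this sketch does not address.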

2 comments, sorted by


Yes. It wants to find the best next token, where 'best' is 'the most likely'.

That's a utility function. Its utility function is a line of code necessary for training; without it, nothing would happen when you tried to train it.


A utility function is the assessment by which you decide how much an action would further your goals. If you can do that, highly accurately or not, you have a utility function.

If you had no utility function, you might decide you like NYC more than Kansas, and Kansas more than Nigeria, but you prefer Nigeria to NYC. So you get on a plane and fly in circles forever, hopping on another plane every time you reach your destination.
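The flying-in-circles scenario can be sketched in a few lines. This is an illustration only: it assumes a traveller who moves whenever any listed place is strictly preferred to the current one, and the function name `travel` and the hop limit are made up for the example.

```python
def travel(start, prefers, max_hops=6):
    """Follow strict preferences, moving whenever some place beats the current one.

    `prefers` is a list of (better, worse) pairs. Returns the list of places visited.
    """
    path = [start]
    current = start
    for _ in range(max_hops):
        better = [b for (b, worse) in prefers if worse == current]
        if not better:
            break  # nothing is preferred to the current place, so the traveller stays
        current = better[0]
        path.append(current)
    return path

# Cyclic preferences: NYC > Kansas > Nigeria > NYC. The traveller never settles.
cyclic = [("NYC", "Kansas"), ("Kansas", "Nigeria"), ("Nigeria", "NYC")]
print(travel("Kansas", cyclic))
# ['Kansas', 'NYC', 'Nigeria', 'Kansas', 'NYC', 'Nigeria', 'Kansas']

# Transitive preferences: NYC > Kansas > Nigeria. The traveller stops at NYC.
transitive = [("NYC", "Kansas"), ("Kansas", "Nigeria"), ("NYC", "Nigeria")]
print(travel("Nigeria", transitive))
# ['Nigeria', 'Kansas', 'NYC']
```

With the cyclic list the traveller only stops because of the hop limit; with the transitive list the trip ends on its own once nothing is preferred to the current place.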

Humans definitely have a utility function. We just don't know what ranks very highly on our utility function. We mostly agree on the low-ranking stuff. A utility function is the process by which you rate potential futures that you might be able to bring about and decide you prefer some futures more than others.

With a utility function plus your (limited) predictive ability you rate potential futures as being better, worse, or equal to each other, and act accordingly.