Presently, the 'utility maximizers' work as follows: given a mathematical function f(x), a solver finds the x that corresponds to a maximum (or, more typically, a minimum) of f(x). The x is usually a vector describing the action of the agent, and f is a mathematically defined function which may, for example, simulate some world evolution and compute the expected worth of the end state given action x, as in f(x)=h(g(x)), where g computes the world state at some future time assuming that action x was taken, and h computes the worth of that world state.
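A minimal sketch of this structure in Python (the world model g, the worth function h, and all the numbers are invented purely for illustration; any generic numerical optimizer can play the role of the solver):

```python
import numpy as np
from scipy.optimize import minimize

def g(x):
    # Toy world model: the future world state reached if action vector x is taken.
    # (These dynamics are made up purely for illustration.)
    return np.array([x[0] + 0.5 * x[1], x[1] ** 2])

def h(state):
    # Toy worth of a world state: squared distance from some desired state,
    # so the value is non-negative and lower is better.
    desired = np.array([1.0, 0.25])
    return float(np.sum((state - desired) ** 2))

def f(x):
    # f(x) = h(g(x)): the worth of the state the model says action x leads to.
    return h(g(x))

# The solver component: a generic numerical optimizer that finds the x
# minimizing f. It knows nothing about "the world", only about the function f.
result = minimize(f, x0=np.zeros(2))
print(result.x, result.fun)
```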

For instance, f may represent some metric of risk, discomfort, and time over a path chosen by a self-driving car in a driving simulator (which is not reductionist). In this case the metric (which is always non-negative) is to be minimized.
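As a rough illustration of what such a metric might look like (the component proxies and weights below are my own toy choices, not anything from the post):

```python
import numpy as np

def path_cost(path, w_risk=10.0, w_comfort=1.0, w_time=0.5):
    # Non-negative cost of a candidate path, as a weighted sum of toy proxies
    # for risk, discomfort, and time. Proxies and weights are illustrative only.
    # `path` is an array of (x, y) waypoints sampled at fixed time steps.
    path = np.asarray(path, dtype=float)
    steps = np.diff(path, axis=0)              # displacement per time step
    speed = np.linalg.norm(steps, axis=1)
    accel = np.diff(speed)                     # crude discomfort proxy
    risk = np.sum(speed ** 2)                  # e.g. penalize high speeds
    discomfort = np.sum(accel ** 2)            # penalize abrupt speed changes
    time = len(path)                           # fixed time step, so length ~ duration
    return w_risk * risk + w_comfort * discomfort + w_time * time
```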

In a very trivial case, such as finding the cannon elevation at which the cannonball will land closest to the target in vacuum, the solution can be found analytically.
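For the flat-ground, vacuum case this is the standard textbook result R = v²·sin(2θ)/g, which can simply be inverted for θ; a small sketch with made-up numbers:

```python
import math

def elevation_for_target(distance, muzzle_speed, gravity=9.81):
    # Flat ground, vacuum: range R = v^2 * sin(2*theta) / g,
    # so theta = asin(R * g / v^2) / 2. Returns None if the target is out of reach.
    s = distance * gravity / muzzle_speed ** 2
    if s > 1.0:
        return None   # even the optimal 45-degree shot falls short
    return 0.5 * math.asin(s)

# Illustrative numbers only: 120 m/s muzzle speed, target 1000 m away.
print(math.degrees(elevation_for_target(1000.0, 120.0)))
```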

In more complex cases a multitude of methods is typically employed, combining iteration over candidate solutions with analytical and iterative solving for a local maximum or minimum. If this is combined with sensors, a model updater, and actuators, an agent like a self-driving car can be made.
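Roughly, the resulting agent structure looks like the loop below (all four component interfaces are hypothetical placeholders, sketched only to show where the solver sits):

```python
def agent_loop(sensors, model, solver, actuators):
    # Sketch of the structure described above; the interfaces are hypothetical.
    while True:
        observation = sensors.read()
        model.update(observation)                  # the model-updater step

        def cost(action):
            # f(x) = h(g(x)): worth of the state the model predicts for this action
            return model.worth_of(model.predict(action))

        best_action = solver.minimize(cost)        # the "intelligent" component
        actuators.execute(best_action)             # act, then repeat
```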

Those are the utility functions as used in the field of artificial intelligence.

A system can be strongly superhuman at finding maxima of functions, and ultimately can be very general purpose, allowing its use to build models which are efficiently invertible into a solution. However, it must be understood that the intelligent component finds mathematical solutions to, ultimately, mathematical relations.

The utility functions as known and discussed on LW seem entirely different in nature. They are defined on the real world, using natural language that conveys intent, and seem to be a rather ill-defined concept for which a bottom-up formal definition may not even exist. The implementation of such a concept, if it is at all possible, would seem to require a major breakthrough in the philosophy of mind.

This is an explanation of an important technical distinction mentioned in Holden Karnofsky's post.

On the discussion in general: it may well be the case that it is very difficult or impossible to define a system such as a self-driving car in terms of the concepts that are used on LW to talk about intelligences. In particular, LW's notion of "utility" does not seem to allow the kind of tool that Holden Karnofsky was speaking of to be accurately described in terms of this utility.


If you want your post to have some substance, you ought to address the issue of a "practical tool" vs. an AI oracle and why the former is less dangerous. HK had a few points about that. Or maybe I don't grasp your point about the difference in the utility function.

(One standard point is that, for an Oracle to be right, it does not have to be a good predictor; it has to be a good modifier, so presumably you want to prove that your approach does not result in a feedback loop.)

That is meant to be informative to those wondering what Holden was talking about. I do not know what you mean by "some substance".

edit: Actually, okay, I should be less negative. It is probably a case of accidental/honest self-deception here, and the technobabbling arose from an honest attempt to best communicate the intuitions. You approach the problem from the direction of: how do we make a safe oracle out of some AGI model that runs in your imagination, reusing the animalism to predict it? Well, you can't. That's quite true! However, the actual software uses branches and loops and arithmetic; it is not run on the animalism module of your brain, it is run on a computer. There's this utility function, and there's the solver which finds the maximum of it (which it just does; it's not trying to maximize yet another utility ad infinitum, so please don't model it using animalism), and together they can work like the animalist model, but the solver does NOT work the way the animalist model does.

edit: Apparently "animalism" is not exactly the word I want. The point is, we have a module in our brain made for predicting other agents of a very specific type (mammals), and it is a source of some of the intuitions about AI.

former post:

Ultimately, I figured out what is going on here. Eliezer and Luke are two rather smart technobabblers. That really is all there is to it. I am not going to explain anything about the technobabble; it is not worth it. The technobabble is what results when one thinks, within a technical field, in terms of communication-level concepts rather than reasoning-level concepts. Technobabble can be, and has been, successfully published in peer-reviewed journals (the Bogdanov affair) and, sadly, can even be used to acquire a PhD.

The Oracle AI as defined here is another thing defined in terms of the "utility" as known on LW, and has nothing to do with the solver component of the agents as currently implemented.

The bit about a predictor having an implicit goal to make the world match its predictions is utter and complete bullshit, not even worth addressing. It arises from thinking in terms of communication-level concepts such as "should" and "want", which can be used to talk about AI but cannot be used to reason about AI.

While I understand your frustration, in my experience, you will probably get better results on this forum with reason rather than emotion.

In particular, when you say

Eliezer and Luke are two rather smart technobabblers.

you probably mean something different from what is used in sci-fi shows.

Similarly,

The implicit goal of the world having to match predictions is utter and complete bullshit not even worth addressing.

comes across as a rant, and so is unlikely to convince your reader of anything.

The problem is that it really is utter and complete bullshit. I really do think so. On the likelihood of convincing anyone: there's the data point that someone called it bullshit. That's probably all the impact that could possibly be made (unless speaking from a position of power).

By technobabble, I do mean it as used in science fiction when something has to be explained, done with great dedication (more along the lines of the wiki article I linked).

edit: e.g. you have an animalist (desire-based) intuition of what the AI will want to do: obviously the AI will want to make its prediction come true in the real world (it well might, if it is a mind upload). That doesn't sound very technical. You replace "want" with "utility", replace a few other things with technical-looking equivalents, and suddenly it sounds technical, to such a point that experts don't understand what you are talking about but don't risk assuming that you are talking nonsense rather than badly communicating some sense.

Okay... but... if you're using a utility-function-maximizing system architecture, that is a great simplification to the system that really gives a clear meaning to "wanting" things, in a way that it doesn't have for neural nets or whatnot.

The mere fact that the utility function to be specified has to be far, far more complex for a general intelligence than for a driving robot doesn't change that. The vagueness is a marker of difficult work yet to be done, not something they're implying they've already done.

This is an explanation of an important technical distinction mentioned in Holden Karnofsky's post.

You mean the claims at the start of Objection 2? Or what "important technical distinction" do you mean?

This one. The argument on LW goes like "you can't define the distinction between a tool and an agent, so we're right".

Now, to those with knowledge of the field, it is akin to a supposed engineer claiming that you can't define the distinction between a bolt and a screw, as a way to defy the statement that "you can avoid splitting the brittle wood if you drill a hole and use a bolt, rather than use a screw", which was a rebuttal to "a threaded fastener would split the brittle wood piece". The only things it demonstrates are ignorance, incompetence, and a lack of work towards actually fulfilling the stated mission.

As for this "oracle" link, it clearly illustrates the mechanism of generating the strings employed to talk about AI risks. You start with the scary idea, then progress to the necessity of showing each type of AI to be scary, then proceed to each subtype, then make more and more detailed strings designed to approximate the strings that would result from an entirely different process: one that starts from the basics (and study of the field) and proceeds upwards to a risk estimate.

That task is aided by the fact that it is impossible to define a provably safe AI in English (or in technobabble) due to vagueness/ambiguity, and by the fact that language predominantly attributes real-world desires when describing anything that seems animate. That is, when you have a system that takes in sequences and generates functions which approximate the sequences (thus allowing prediction of the next element in a sequence, without over-training on noise), you can describe it as a "predictor" in English, and now you've got an "implicit" goal of changing the world to match the prediction. Followed by "everyone give us money or it is going to kill us all; we're the only ones who understand this implied desire!" [that everyone else is more wrong because we call ourselves Less Wrong seems to be implied]. Speaking of which, use of language is a powerful irrationality technique.

Meanwhile, in practice, such stuff is not only not implicit, it is incredibly difficult to implement even if you wanted to implement it. Ultimately, many of the "implied" qualities that are very hard to avoid in English descriptions of AI are also incredibly difficult to introduce when programming. We have predictor-type algorithms, which can be strongly superhuman if given enough computing power, and none of them would exhibit a trace of "implicit" desire to change the world.
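For concreteness, here is a toy example of such a predictor-type algorithm (a stand-in of my own, not anything from the post): it fits low-degree polynomials to an observed sequence, uses a held-out point so that it does not over-fit noise, and extrapolates one step. Nothing in it represents, let alone acts on, a world:

```python
import numpy as np

def predict_next(sequence, max_degree=3):
    # Fit low-degree polynomials to an observed sequence and predict its next
    # element, choosing the degree by held-out error so noise isn't over-fit.
    # It only maps past observations to a number; there is no "goal" anywhere.
    seq = np.asarray(sequence, dtype=float)
    t = np.arange(len(seq))
    best_coeffs, best_err = None, np.inf
    for degree in range(max_degree + 1):
        coeffs = np.polyfit(t[:-1], seq[:-1], degree)    # train on all but the last point
        err = abs(np.polyval(coeffs, t[-1]) - seq[-1])   # validate on the held-out point
        if err < best_err:
            best_coeffs, best_err = coeffs, err
    return float(np.polyval(best_coeffs, len(seq)))      # extrapolate one step ahead

print(predict_next([1, 4, 9, 16, 25]))  # ~36.0 for this quadratic sequence
```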

There's the notion that anything which doesn't "understand" your implied intent is not powerful enough (not powerful enough for what?); that's just rationalization, and it is not otherwise substantiated or even defined. Or even relevant. Let's take an example from another field. Clearly, any space propulsion we know to be possible is not powerful enough to get to faster than the speed of light. Very true. That shouldn't be used to imply that we'll have faster-than-light travel.