Hmm, in my view it is more a distinction of goals than of abilities.

The model popular here is that of an 'expected utility maximizer', with the 'utility function' defined on the real world. Such an agent wants to build the most accurate model of the real world it can, so as to maximize that function as well as possible, and it tries to avoid corruption of the function, etc. It also wants its outputs to affect the world, and if put in a box, it will try to craft its outputs to do things in the real world even if you only wanted to look at them.

This is all very ontologically basic to humans. We easily philosophize about such stuff.

Meanwhile, we don't know how to do that. We don't know how to reduce that world 'utility' to elementary operations performed on the sensory input (neither directly nor on a meta level). The current solution involves making one part that creates and updates a mathematically defined problem, and another part that finds mathematical solutions to it; the solutions are then shown to the user if it is a tool, or applied to the real world if it isn't. The wisdom of applying those solutions to the real world is an entirely separate issue. The point is that such a system works like a tool if boxed, not like a caged animal (or a caged human).

Edit: another problem, I think, is that many of the 'difficulty of friendliness' arguments are just special cases of the general 'difficulty of world intentionality'.

The model popular here is that of 'expected utility maximizer', and the 'utility function' is defined on the real world.

I think this is a bit of a misperception stemming from the use of the "paperclip maximizer" example to illustrate things about instrumental reasoning. Certainly folk like Eliezer or Wei Dai or Stuart Armstrong or Paul Christiano have often talked about how a paperclip maximizer is much of the way to FAI (in having a world-model robust enough to use consequentialism). Note that people also like to use the AIXI framework as a m...

Tool for maximizing paperclips vs a paperclip maximizer

by private_messaging, 12th May 2012


To clarify a point that is being discussed in several threads here, the tool vs. intentional agent distinction:

A tool for maximizing paperclips would, for efficiency purposes, have a world-model of which it has a god's eye view (not accessing it through embedded sensors like eyes), with a counter of paperclips implemented/defined within this model. The output of this counter is what is being maximized by the problem-solving portion of the tool, not the real-world paperclips.

No real-world intentionality exists in this tool for maximizing paperclips; the paperclip-making problem solver would maximize the output of the counter, not real-world paperclips. Such a tool can be hooked up to actuators and to sensors, and made to affect the world without a human intermediary, but it won't implement real-world intentionality.
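To make the architecture concrete, here is a minimal toy sketch in Python. Everything in it (the dict-based model, the step, counter and solve names, the action names) is a hypothetical illustration of mine, not a description of any real system. The only thing it is meant to show is that the solver's objective is the counter defined inside the model, and that nothing in the code refers to the real world at all.

```python
from itertools import product

# Hypothetical toy world-model: a plain dict that the tool can read in full
# (the "god's eye view"). Nothing here touches the real world.
def step(model, action):
    """Return the model state that results from applying an action in the model."""
    new = dict(model)
    if action == "make_clip" and new["wire"] > 0:
        new["wire"] -= 1
        new["clips"] += 1
    elif action == "buy_wire" and new["money"] > 0:
        new["money"] -= 1
        new["wire"] += 2
    return new  # any other action (e.g. "wait") leaves the model unchanged

def counter(model):
    """The paperclip counter: defined on the model, not on the world."""
    return model["clips"]

def solve(model, actions, horizon):
    """Brute-force search for a plan that maximizes the counter's output."""
    best_plan, best_score = (), counter(model)
    for plan in product(actions, repeat=horizon):
        state = model
        for action in plan:
            state = step(state, action)
        if counter(state) > best_score:
            best_plan, best_score = plan, counter(state)
    return best_plan, best_score

start = {"wire": 1, "money": 1, "clips": 0}
plan, score = solve(start, ["make_clip", "buy_wire", "wait"], horizon=4)
print(plan, score)
```

Whether the resulting plan is merely shown to a human or fed to actuators is a decision made outside solve; the problem being solved is the same either way.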

An intentional agent for maximizing paperclips is the familiar 'paperclip maximizer', which truly loves real-world paperclips and wants to maximize them, and would try to improve its understanding of the world to know whether its paperclip-making efforts are successful.

Real-world intentionality is ontologically basic in human language, and consequently there is a very strong bias to describe the former as the latter.

The distinction: wireheading (either direct or through manipulation of inputs) is a valid solution to the problem being solved by the former, but not by the latter. Of course one could rationalize and postulate a tool that is not general-purpose enough to wirehead, forgetting that the issue being feared is a tool that is general-purpose enough to design a better tool or to self-improve. That is an incredibly frustrating feature of rationalization: aspects of the problem are forgotten when thinking backwards.
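A toy illustration of the wireheading point, again in Python and again entirely hypothetical (the action names and the world/reading split are my own assumptions). Note that scoring directly on the true world state, as the second objective does, is a cheat that is only available in a toy; how to specify that objective for a real system is exactly the unsolved part.

```python
# Hypothetical toy: "world" is the true clip count, "reading" is what the
# counter reports. Each action maps (world, reading) to a new pair.
ACTIONS = {
    "make_clip": lambda world, reading: (world + 1, reading + 1),
    "tamper_with_counter": lambda world, reading: (world, reading + 1000),
    "do_nothing": lambda world, reading: (world, reading),
}

def best_action(objective):
    """Pick the action that maximizes the given objective from state (0, 0)."""
    return max(ACTIONS, key=lambda name: objective(*ACTIONS[name](0, 0)))

# The tool's problem: maximize the counter's output.
tool_choice = best_action(lambda world, reading: reading)

# The intentional agent's problem: maximize real clips. Reading `world`
# directly is the toy-only cheat mentioned above.
agent_choice = best_action(lambda world, reading: world)

print(tool_choice)   # tamper_with_counter
print(agent_choice)  # make_clip
```

Tampering is the correct answer to the first problem as posed, not a malfunction of the solver; it only looks like a failure when judged against the second objective.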

The issues with the latter: we do not know whether humans actually implement real-world intentionality in such a way that it is not destroyed under full ability to self-modify (and we can observe that we very much like to manipulate our own inputs; see art, porn, fiction, etc.). We do not have a single certain example of such stable real-world intentionality, and we do not know how to implement it (it may well be impossible). We are also prone to assuming that two unsolved problems in AI, general problem solving and this real-world intentionality, are a single problem, or are necessarily solved together. A map compression issue.