Can highly intelligent agents have stupid goals?
A look at The Orthogonality Thesis and the nature of stupidity.


A good popular introduction to the Orthogonality Thesis from Robert Miles.


Transcript for searchability:

Hi. This video is kind of a response to various comments that I've got over the years, ever since that video on Computerphile where I was describing the sort of problems we might have when we have a powerful artificial general intelligence with goals which aren't the same as our goals, even if those goals seem pretty benign. We used the thought experiment of an extremely powerful AGI working to optimize the simple goal of collecting stamps, and some of the problems that might cause.

I got some comments from people saying that they think the stamp collecting device is stupid. Not that it's a stupid thought experiment, but that the device itself is actually stupid. They said unless it has complex goals, or the ability to choose its own goals, it doesn't count as being highly intelligent. On other videos I got comments saying it takes intelligence to do moral reasoning, so an intelligent AGI system should be able to do that, and a superintelligence should be able to do it better than humans; in fact, if a superintelligence decides that the right thing to do is to kill us all, then I guess that's the right thing to do. These comments are all kind of suffering from the same mistake, which is what this video is about, but before I get to that I need to lay some groundwork first.

If you like Occam's razor, then you'll love Hume's guillotine, also called the is-ought problem. This is a pretty simple concept that I'd like to be better known. The idea is that statements can be divided up into two types: is statements and ought statements. Is statements, or positive statements, are statements about how the world is, how the world was in the past, how the world will be in the future, or how the world would be in hypothetical situations. These are facts about the nature of reality, the causal relationships between things, that kind of thing. Then you have the ought statements, the should statements, the normative statements. These are about the way the world should be, the way we want the world to be: statements about our goals, our values, ethics, morals, what we want, all of that stuff.

Now, you can derive logical statements from one another. Like: "It's snowing outside." That's an is statement. "It's cold when it snows." Another is statement. And then you can deduce: "Therefore, it's cold." That's another is statement; it's our conclusion. This is all pretty obvious. But you might say something like "It's snowing outside, therefore you ought to put on a coat," and that's a very normal sort of sentence that people might say, but as a logical statement it actually relies on a hidden assumption. Without assuming some kind of ought statement, you can't derive another ought statement. This is the core of the is-ought problem: you can never derive an ought statement using only is statements.

"You ought to put on a coat." Why? "Because it's snowing outside." So why does the fact that it's snowing mean I should put on the coat? "Well, the fact that it's snowing means that it's cold." And why should it being cold mean I should put on a coat? "If it's cold and you go outside without a coat, you'll be cold." Should I not be cold? "Well, if you get too cold, you'll freeze to death." Okay, so you're saying I shouldn't freeze to death?

That was kind of silly, but you see what I'm saying: you can keep laying out is statements for as long as you want, and you will never be able to derive that you ought to put on a coat. At some point, in order to derive that ought statement, you need to assume at least one other ought statement. If you have some kind of ought statement, like "I ought to continue to be alive," you can then say: given that I ought to keep living, and that if I go outside without a coat I'll die, then I ought to put on a coat. But unless you have at least one ought statement, you cannot derive any other ought statements. Is statements and ought statements are separated by Hume's guillotine.
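This gap can be sketched mechanically. Here is a toy forward-chaining derivation in Python (all the rule names and facts are invented for illustration): from "is" premises and rules you can derive new "is" facts indefinitely, but an "ought" conclusion only ever appears if an "ought" premise was assumed.

```python
# Toy illustration of the is-ought gap (rules and labels are invented).
# Facts are tagged "is:" or "ought:"; each rule chains one fact to another.

def derive(premises, rules):
    """Repeatedly apply rules until no new facts appear."""
    facts = set(premises)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

rules = [
    ("is:snowing", "is:cold"),
    ("is:cold", "is:freeze_without_coat"),
    ("ought:stay_alive", "ought:wear_coat"),  # only fires given an ought premise
]

# From "is" premises alone, no "ought:" fact is ever derived.
print(derive({"is:snowing"}, rules))
# Adding a single ought premise is what unlocks the ought conclusion.
print(derive({"is:snowing", "ought:stay_alive"}, rules))
```

However many is-rules you add, the first call can only ever grow the set of "is:" facts.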

Okay. So, people are saying that a device that single-mindedly collects stamps at the cost of everything else is stupid, and doesn't count as a powerful intelligence. So let's define our terms: what is intelligence, and conversely, what is stupidity? I feel like I made fairly clear in those videos what I meant by intelligence. We're talking about AGI systems as intelligent agents: entities that take actions in the world in order to achieve their goals, or maximize their utility functions. Intelligence is the thing that allows them to choose good actions, to choose actions that will get them what they want. An agent's level of intelligence really means its level of effectiveness at pursuing its goals.

In practice, this is likely to involve having or building an accurate model of reality, keeping that model up to date by reasoning about observations, and using the model to make predictions about the future and the likely consequences of different possible actions, to figure out which actions will result in which outcomes. Intelligence involves answering questions like: What is the world like? How does it work? What will happen next? What would happen in this scenario, or that scenario? What would happen if I took this action, or that action? More intelligent systems are, in some sense, better at answering these kinds of questions, which allows them to be better at choosing actions.

But one thing you might notice about these questions is that they're all is questions. The system has goals, which can be thought of as ought statements, but its level of intelligence depends only on its ability to reason about is questions, in order to answer the single ought question: what action should I take next?
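As a minimal sketch of that structure (all names and numbers here are invented, not anything from the video), an agent of this kind is just a world model plus a utility function: the model answers the "is" questions by predicting outcomes, and the one "ought" question is answered by picking the action whose predicted outcome the utility function rates highest.

```python
# Minimal agent sketch (toy numbers): the "intelligence" lives in the
# world model's predictions; the goal is just the utility function.

def choose_action(actions, predict, utility):
    """Answer the one ought question: which action should I take next?"""
    return max(actions, key=lambda a: utility(predict(a)))

# Toy world model: predicted number of stamps resulting from each action.
predicted_stamps = {"do_nothing": 0, "buy_stamps": 10, "run_print_shop": 1000}

print(choose_action(predicted_stamps.keys(),
                    predicted_stamps.get,    # the "is" machinery
                    lambda stamps: stamps))  # the "ought": more stamps is better
# -> run_print_shop
```

Making the agent smarter means improving `predict`; nothing about that process touches `utility`.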

Given that that's what we mean by intelligence, what does it mean to be stupid? Well, firstly, you can be stupid in terms of those is questions, for example by building a model that doesn't correspond with reality, or by failing to update your model properly with new evidence. If I look out of my window and I see there's snow everywhere, you know, I see a snowman, and I think to myself, "Oh, what a beautiful warm sunny day," then that's stupid, right? My belief is wrong, and I had all the clues to realize it's cold outside. So beliefs can be stupid by not corresponding to reality.

What about actions? Like, if I go outside in the snow without my coat, that's stupid, right? Well, it might be. If I think it's sunny and warm and I go outside to sunbathe, then yeah, that's stupid. But if I just came out of a sauna or something, and I'm too hot and I want to cool myself down, then going outside without a coat might be quite sensible. You can't know if an action is stupid just by looking at its consequences; you also have to know the goals of the agent taking the action. You can't just use is statements: you need an ought. So actions are only stupid relative to a particular goal.

It doesn't feel that way, though. People often talk about actions being stupid without specifying what goals they're stupid relative to, but in those cases the goals are implied. We're humans, and when we say that an action is stupid, in normal human communication, we're making some assumptions about normal human goals. And because we're always talking about people, and people tend to want similar things, it's sort of a shorthand that lets us skip saying what goals we're talking about. So what about the goals, then? Can goals be stupid?

Well, this depends on the difference between instrumental goals and terminal goals. This is something I've covered elsewhere, but your terminal goals are the things that you want just because you want them. You don't have a particular reason to want them; they're just what you want. Instrumental goals are the goals you want because they'll get you closer to your terminal goals. Like, if I have a terminal goal to visit a town that's far away, maybe an instrumental goal would be to find a train station. I don't want to find a train station just because trains are cool; I want to find a train as a means to an end: it's going to take me to this town. So that makes it an instrumental goal.

Now, an instrumental goal can be stupid. If I want to go to this distant town, and I decide I want to find a pogo stick, that's pretty stupid. Finding a pogo stick is a stupid instrumental goal if my terminal goal is to get to a faraway place. But if my terminal goal were something else, like having fun, it might not be stupid. So in that way it's like actions: instrumental goals can only be stupid relative to terminal goals. So you see how this works: beliefs and predictions can be stupid relative to evidence, or relative to reality; actions can be stupid relative to goals of any kind.
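As a toy numerical sketch of that relativity (all the scores here are invented), you can rate an instrumental goal by how much it advances a given terminal goal; the same instrumental goal then comes out stupid under one terminal goal and sensible under another.

```python
# Invented scores for how much each instrumental goal advances each
# terminal goal; "stupid" is only defined relative to a terminal goal.

progress = {
    ("find_train_station", "reach_distant_town"): 0.9,
    ("find_pogo_stick",    "reach_distant_town"): 0.05,
    ("find_pogo_stick",    "have_fun"):           0.8,
}

def is_stupid(instrumental, terminal, threshold=0.1):
    """An instrumental goal is stupid if it barely advances the terminal goal."""
    return progress.get((instrumental, terminal), 0.0) < threshold

print(is_stupid("find_pogo_stick", "reach_distant_town"))  # True
print(is_stupid("find_pogo_stick", "have_fun"))            # False
```

Note that `is_stupid` takes two arguments: drop the terminal goal and the question has no answer.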

Instrumental goals can be stupid relative to terminal goals. But here's the big point: terminal goals can't be stupid. There's nothing to judge them against. If a terminal goal seems stupid, like, let's say collecting stamps seems like a stupid terminal goal, that's because it would be stupid as an instrumental goal to human terminal goals. But the stamp collector does not have human terminal goals. Similarly, the things that humans care about would seem stupid to the stamp collector, because they result in so few stamps.

So let's get back to those comments. One type of comment says: this behavior of just single-mindedly going after one thing, ignoring everything else, and ignoring the totally obvious fact that stamps aren't that important, is really stupid behavior. You're calling this thing a superintelligence, but it doesn't seem superintelligent to me; it just seems kind of like an idiot. Hopefully the answer to this is now clear: the stamp collector's actions are stupid relative to human goals, but it doesn't have human goals. Its intelligence comes not from its goals, but from its ability to understand and reason about the world, allowing it to choose actions that achieve its goals. And this is true whatever those goals actually are.

Some people commented along the lines of: well, okay, yeah, sure, you've defined intelligence to only include this type of is-statement reasoning, but I don't like that definition. I think to be truly intelligent you need to have complex goals; something with simple goals doesn't count as intelligent. To that I say: well, you can use words however you want, I guess. I'm using "intelligence" here as a technical term, in the way that it's often used in the field. You're free to have your own definition of the word, but the fact that something fails to meet your definition of intelligence does not mean that it will fail to behave in a way that most people would call intelligent.

If the stamp collector outwits you, gets around everything you've put in its way, outmaneuvers you mentally, comes up with new strategies that you would never have thought of to stop you from turning it off and preventing it from making stamps, and as a consequence it turns the entire world into stamps in ways you could never think of, it's totally okay for you to say that it doesn't count as intelligent, if you want. But you're still dead. I prefer my definition, because it better captures the ability to get things done in the world, which is the reason that we actually care about AGI in the first place.

Similarly, to the people who say that in order to be intelligent you need to be able to choose your own goals: I would agree that you need to be able to choose your own instrumental goals, but not your own terminal goals. Changing your terminal goals is like willingly taking a pill that will make you want to murder your children. It's something you pretty much never want to do, apart from some bizarre edge cases. If you rationally want to take an action that changes one of your goals, then that wasn't a terminal goal.

Now, moving on to these comments saying an AGI will be able to reason about morality, and if it's really smarter than us, it will actually do moral reasoning better than us, so there's nothing to worry about. It's true that a superior intelligence might be better at moral reasoning than us, but ultimately, moral behavior depends not on moral reasoning, but on having the right terminal goals. There's a difference between figuring out and understanding human morality, and actually wanting to act according to it. The stamp collecting device has a perfect understanding of human goals, ethics, and values, and it uses that only to manipulate people for stamps. Its superhuman moral reasoning doesn't make its actions good. If we create a superintelligence and it decides to kill us, that doesn't tell us anything about morality; it just means we screwed up.

So, what mistake do all of these comments have in common? The orthogonality thesis in AI safety is that more or less any goal is compatible with more or less any level of intelligence; i.e., those properties are orthogonal. You can place them on these two axes, and it's possible to have agents anywhere in this space, anywhere on either scale. You can have very weak, low-intelligence agents that have complex, human-compatible goals. You can have powerful, highly intelligent systems with complex, sophisticated goals. You can have weak, simple agents with silly goals, and yes, you can have powerful, highly intelligent systems with simple, weird, inhuman goals. Any of these are possible, because level of intelligence is about effectiveness at answering is questions, goals are all about ought questions, and the two sides are separated by Hume's guillotine.

Hopefully, looking at what we've talked about so far, it should be pretty obvious that this is the case. Like, what would it even mean for it to be false? For it to be impossible to create powerful intelligences with certain goals? The stamp collector is intelligent because it's effective at considering the consequences of sending different combinations of packets on the internet, and calculating how many stamps that results in. Exactly how good do you have to be at that before you don't care about stamps anymore, and randomly start to care about some other thing that was never part of your terminal goals, like feeding the hungry or whatever? It's just not going to happen.

So that's the orthogonality thesis: it's possible to create a powerful intelligence that will pursue any goal you can specify. Knowing an agent's terminal goals doesn't really tell you anything about its level of intelligence, and knowing an agent's level of intelligence doesn't tell you anything about its goals.
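One way to picture the thesis is a single planner with interchangeable utility functions. In this hedged sketch (the states and numbers are invented), the "is" machinery, the planner and world model, is byte-for-byte identical in both agents; only the goal differs, and so do the chosen actions.

```python
# Same planner, same world model; only the utility function is swapped.

def plan(actions, predict, utility):
    """Pick the action whose predicted world state scores highest."""
    return max(actions, key=lambda a: utility(predict(a)))

# Toy world model shared by both agents: action -> predicted world state.
outcomes = {
    "print_stamps": {"stamps": 9, "people_fed": 0},
    "feed_hungry":  {"stamps": 0, "people_fed": 7},
}

stamp_goal = lambda state: state["stamps"]      # inhuman goal
human_goal = lambda state: state["people_fed"]  # human-compatible goal

print(plan(outcomes, outcomes.get, stamp_goal))  # -> print_stamps
print(plan(outcomes, outcomes.get, human_goal))  # -> feed_hungry
```

Improving `predict` or `plan` makes either agent more capable; it never nudges `stamp_goal` toward `human_goal`.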


I want to end the video by saying thank you to my excellent patrons, all of these people here. Thank you so much for your support. It lets me do stuff like building this light box. Thank you for sticking with me through that weird Patreon fees thing, and my moving to a different city, which has really got in the way of making videos recently. But I'm back on it now; a new video every two weeks is the plan. Anyway, in this video I'm especially thanking Katie Beirne, who's supported the channel for a long time. She actually has her own YouTube channel about 3D modeling and stuff, so I'll link to that. And while I'm at it, when I thanked Chad Jones ages ago I didn't mention his YouTube channel, so links to both of those are in the description. Thanks again, and I'll see you next time. ...I don't speak cat, what does that mean?
