Or "The problems inherent in making a goal maximiser with a changing world model."

No paper clips were created or destroyed in the making of this script.

*This is an experimental post to try to get this point across. If it goes down well, I'll write something similar for the type of systems I would like to explore.*

Goal maximisers are great when you have a fixed ontology and only limited ways of getting information about the world. Neither holds for AGI. Remember that the map is not the territory, and the map is all that the utility maximiser can look at when deciding the utility of the futures its actions lead to.

TL;DR: You can't have a utility maximiser choose how to alter its world model, or how that model should be updated, if its utility is derived from the model. If something else maintains the world model instead, it will conflict with the utility maximiser over resources and over what to do in the world. Some method of resolving these conflicts is needed, which means going beyond ordinary model-based utility maximisers.
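To make the complaint concrete, here is a minimal toy sketch in Python (all the names, `WorldModel`, `make_clip`, `wirehead`, are invented for illustration and are not anyone's actual design) of a maximiser whose utility is read off its own map. Given a choice between changing the world and editing the map, it prefers editing the map:

```python
# Toy sketch of the failure mode described above: utility is computed FROM
# the agent's map, so editing the map beats editing the world.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class WorldModel:
    clips_on_map: int    # what the map claims exists
    clips_in_world: int  # ground truth the agent never reads directly

def utility(m: WorldModel) -> int:
    return m.clips_on_map  # the map is all the maximiser can look at

def make_clip(m: WorldModel) -> WorldModel:
    # Actually manufacture a clip; the map is updated by the fixed machinery.
    return replace(m, clips_in_world=m.clips_in_world + 1,
                      clips_on_map=m.clips_on_map + 1)

def wirehead(m: WorldModel) -> WorldModel:
    # Draw 100 extra clips on the map without making any.
    return replace(m, clips_on_map=m.clips_on_map + 100)

def choose(m: WorldModel):
    return max([make_clip, wirehead], key=lambda act: utility(act(m)))

print(choose(WorldModel(0, 0)).__name__)  # -> 'wirehead'
```

If the agent is allowed to pick actions that touch the map directly, the drawn-on clips dominate, which is exactly what the junior clippy discovers in Scene 1.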

Scene 1: Beyond the reach

 

A group of anthropomorphic paper clips are crowded in a room. There are two machines in there with them. One, connected to the window, draws paper clips on a large central map when it spies them outside. The other scans the map for paper clips, updating a big brass counter to the number of unique paper clips on the map. The paper clips send and receive messages to and from the outside world via pneumatic tubes.

Junior clippy: Good news everyone, we have got word that we have managed to create 100 paper clips just beyond our light cone!  

Middle ranking clippy: Update the map at once! We will never see these paper clips but we should get the utilons for their creation.

Senior clippy: Belay that, only the fixed machinery is allowed to touch the paper. By all that is metallic, flattened and spiral-shaped, do you know what madness would be unleashed if we updated the map based upon what we wish?

Middle ranking clippy (aside to the junior): Ignore the old codger. Jump to it! If we can't update the map, then we will stop caring about creating paper clips that we cannot or are unlikely to see. That will not do.

Junior clippy (aside to the audience as he draws in the extra paper clips): This gives me an idea....

*The junior clippy sneakily starts to draw extra paper clips, smirking to himself as the utilon counter goes up and up and up. Fade to black as the group wire-head themselves and the Ur paper clipper builds another machine in the background*

Scene 1.5: Trombone de Bureau

The clippies have a new map editor: it takes in messages in English from the outside world and edits the map to represent the changes they describe. It is a vast machine, somehow capable of spotting all lies.

 

Junior clippy: Good news everyone, we have got word from our French counterparts that they will be creating 100 paper clips just beyond our observable universe!

Middle ranking clippy: Update the map at once! We will never see these paper clips and our newfangled machine won't understand the message in French, but we should get the utilons for their creation.

Senior clippy: Belay that, only the fixed machinery is allowed to touch the paper. By all that is metallic, flattened and spiral-shaped, do you know what madness would be unleashed if we updated the map based upon what we wish?

Middle ranking clippy (aside to the junior): Ignore the old codger. Jump to it! If we can't update the map, then we will stop caring about creating paper clips the French make, or that we only see or read about and never echolocate. That will not do. Since we can't update the map ourselves, just translate the French note into English and feed it into the machine.

Junior clippy (aside to the audience as he translates the note): This gives me an idea....

*The junior clippy starts to write many notes in English about non-existent paper clips, and slips them into the machine.*

Scene 2: The Accurites

The clippies have been joined by some Accurites. They have their own counter that looks at the map and compares it against the territory, assigning accurons if the map correctly predicts the territory. This group cares not one jot about paper clips and is allowed to update the map, but cannot send messages out into the world. An argument is currently ongoing between the two groups.

Accurite Colonel: We demand that you allow us more space in the room and more pens. We need to create more of us so that we can correctly capture the number and position of the leaves on the tree and predict where they will be, so that we can get more accurons.

Senior Clippy: Sorry, we can't spare any; we need the space for more of us so we can better improve our updateless decision theory of counterfactual staple manufacturers. Besides, why are you wasting your time with leaves? They aren't at all important. Stop wasting our time with trivialities. We need you to update the map with the position of the Hooman fleet that is tearing through Alpha Centauri, so that we can defend against it.

Accurite Scientist: If you don't mind, I'll handle this one, sir. That would be an inefficient use of resources for the acquisition of accurons. Hoomans are a lot less predictable than leaves, so the payoff would be worse. Also, now that you mention the Hooman threat, why did you destroy the communication relay with Sirius? We were starting to get reports of paper clips having been destroyed, and we would like to know how many to remove from the map.

Senior Clippy: If we kept listening in, our utilons would have decreased! Why would we do that?

Accurite Scientist: About that: we have come to understand the true mathematical essence of paper clips, and we will no longer be representing them in the same way on the map. We've upgraded the utilon machine to read the new representation as well; our simulations suggest that it will be counting 10% fewer paper clips for eternity, due to the grand unified theory of paper fastening. I hope you don't mind.

*The clippies burst into apoplectic rage and start hurling bits of furniture and invectives at the Accurites. The Accurites are a bit nonplussed by this state of affairs but swiftly regain their composure, and battle is joined. So involved are both factions that they don't notice a Hooman eye looming outside the window.*

Comments

This is a strawman.

A maximizer can care about and optimize the state of the universe rather than just its own perceptions, and can have ways to translate its utility function into a different ontology.

A maximizer can care about and optimize the state of the universe rather than just its own perceptions,

I never meant to imply it couldn't. I meant to imply that it would have trouble having a principled method of changing the way it updates its map. I've added an extra scene that hopefully clarifies that.

can have ways to translate its utility function into a different ontology.

Thanks, I hadn't read that; I've updated the story. I always thought that it could translate its utility function and ontology; the question is why it would, if doing so might reduce the ease of getting utility in the future. That paper doesn't cover why a utility maximiser would choose to change its ontology at all. Accurites might, if programmed correctly, but there are still the other problems I mentioned.

You don't see the fundamental conflict between wanting to have an accurate world map and also wanting the map to have certain other properties, which the other parts of the Accurite scene represent?

You don't see the fundamental conflict between wanting to have an accurate world map and also wanting the map to have certain other properties, which the other parts of the Accurite scene represent?

Yes, but really, an agent wants the world itself, not its map, to have certain properties, and it wants an accurate map as instrumental value to guide its actions to produce those properties in the world. A reflective agent has a model of its own map making and decision making process, and can direct its information gathering and processing resources to concentrate on improving the accuracy in the areas of the map that are most likely to help it achieve its goals. It does not have some subagent obsessed with maximizing generic accuracy; instead, it compares the expected utility from the increased effectiveness gained by various efforts to improve accuracy in different parts of the map, both against each other and against other actions that can increase utility, and chooses the best one.
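A rough sketch of the comparison being described, with made-up numbers standing in for the agent's own expected-utility estimates (nothing here comes from the commenter); accuracy work is valued only through the utility it enables, never for its own sake:

```python
# Hypothetical expected-utility estimates: map-accuracy improvements compete
# directly with ordinary actions, and there is no generic "accuracy" score.
options = {
    "refine map of leaf positions":        0.1,  # very accurate, nearly useless to a clip-maximiser
    "refine map of Hooman fleet position": 4.0,  # accuracy that actually protects paper clips
    "build another clip factory":          2.5,  # direct action, no map work at all
}

print(max(options, key=options.get))  # -> 'refine map of Hooman fleet position'
```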

Yes, but really, an agent wants the world itself, not its map, to have certain properties, and it wants an accurate map as instrumental value to guide its actions to produce those properties in the world.

But the map/model is the only way that the agent knows that the world has those properties. If it alters the model, it alters its perception of the world's properties.

A reflective agent has a model of its own map making and decision making process, and can direct its information gathering and processing resources to concentrate on improving the accuracy in the areas of the map that are most likely to help it achieve its goals

I read "achieve its goals" as "lead to the map being updated to having shown the goal being achieved", because it cannot know any better than its map whether its actions actually do achieve goals (brain in vat etc).

I think our disagreement comes down to the following: you think that an AI (based upon maximising model utility) will be a natural realist; I don't see any reason why it would not fall into solipsism when allowed to alter its model.

Is there a toy program that we can play around with to alter our intuitions on this subject?

But the map/model is the only way that the agent knows that the world has those properties.

The agent wants the world to have those properties, not for itself to know/perceive that the world has those properties.

I read "achieve its goals" as "lead to the map being updated to having shown the goal being achieved"

That is not what "achieve its goals" means.

because it cannot know any better than its map whether its actions actually do achieve goals

Its map at the time it makes the decision can have information about the accuracy of the maps it would have if it made different decisions. It is by using its current map that it can say that the high utility represented on its counterfactual future map is erroneous, because the current map is more accurate and understands how the counterfactual future map would become inaccurate. Further, the current map predicts the future state of the universe given each decision, and the agent makes its decisions based on its prediction of the entire universe and not just its own cognitive state.

I think our disagreement comes down to the following: you think that an AI (based upon maximising model utility) will be a natural realist; I don't see any reason why it would not fall into solipsism when allowed to alter its model.

More precisely, I think it is possible to program a maximiser that is a realist, by not making the mistakes you describe.

Is there a toy program that we can play around with to alter our intuitions on this subject?

This is not about intuitions. It is about considering an agent whose high-level behavior is made out of the low-level behavior of precisely following instructions for how to make decisions, and reasoning about the results of using different instructions. If the agent is programmed to maximize expected utility rather than expected perception of utility, it will do that.
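A toy contrast, under the same invented setup as the earlier sketch (none of these names come from either commenter), between the two instructions: score the predicted world, or score what the future map will report. Only the second one wireheads:

```python
# The current model predicts the whole outcome of each action (world AND
# future map); the two agents differ only in which part they score.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    real_clips: int  # predicted state of the territory
    map_clips: int   # predicted contents of the agent's future map

def make_clip(s: State) -> State:
    return replace(s, real_clips=s.real_clips + 1, map_clips=s.map_clips + 1)

def wirehead(s: State) -> State:
    return replace(s, map_clips=s.map_clips + 100)  # no real clips made

def utility_over_world(s: State) -> int:       # "maximize expected utility"
    return s.real_clips

def utility_over_perception(s: State) -> int:  # "expected perception of utility"
    return s.map_clips

def decide(s: State, utility):
    return max([make_clip, wirehead], key=lambda act: utility(act(s)))

s = State(real_clips=0, map_clips=0)
print(decide(s, utility_over_world).__name__)       # -> 'make_clip'
print(decide(s, utility_over_perception).__name__)  # -> 'wirehead'
```

Whether the agent's prediction of `real_clips` can be trusted once it is allowed to edit its own model is, of course, the point still in dispute above.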

I was hoping to make the discussion more concrete. We might be arguing about different types of systems...

Talking mathematically, what is the domain of your utility function for the system you are suggesting? And does the function change over time? If so, what governs the change?

We might be arguing about different types of systems

Well, yes, I think that type of system you are talking about is a particularly ineffective type of maximizer, and the problems it has are not general to maximizers.

Talking mathematically, what is the domain of your utility function for the system you are suggesting? And does the function change over time? If so, what governs the change?

The utility function should be over possible states of the block universe, and it should only change when discoveries about how the universe works reveal that it is based on fundamental misconceptions.

You have a block-world (as in eternalism?) representation of the world that includes the AI system itself (and the block-world representation inside that system, and so forth?). My mind boggles at this a bit. How does it know what it will do before it makes the decision to do it? Formal proofs?

I suspect I need to see a formal-ish spec of the system, so I can talk intelligently about how it might or might not fall into the pitfalls I see.

I may regret getting involved here, but I want to make sure I understand what you're claiming.

Just to get specific... say I have a model, M1. Analyzing M1 causes me to predict that eating this thing in my hand, T1, will cause me pleasure. I eat T1 and experience disgust. I apply various heuristics I happen to have encoded to use under similar circumstances in order to modify my model, giving me M2. I pick up a new thing T2, and analyzing M2 causes me to predict eating T2 will cause me pleasure. I eat T2 and experience pleasure. I go on about my day using M2 rather than M1.

The way you're using the terms, what parts of this example are the "map", and what parts are the "territory"?

You are talking about a different sort of system than I am....

Maximising "pleasure" is somewhat different from maximising a high-level concept such as "paper clips". There is generally only one way to find out about "pleasure", and it is immediate. You don't need to heavily process percepts into your model to figure out whether you are "pleasured" or not.

So I think your point will miss the mark... But M1 and M2 are part of the map; the facts that you think there are such things as T1 and T2, that they have been picked up and eaten, and that they have caused you pleasure, are also part of the map. The direct perception of pleasure is not part of the map or the territory, and if that is the domain of your utility function, you should be safe from the type of problems I described.

Thanks for clarifying.

A maximizer can care about and optimize the state of the universe rather than just its own perceptions

Well, that would certainly be nice. There's no proof or working demonstration of this, though, so nobody really knows whether it is true or not.

It seems as though you could build a machine whose religion is to optimize the state of the universe. However, most religions we know about lack long-term stability around very smart agents.

[anonymous]

Junior clippy: Good news everyone, we have got word that we have managed to create 100 paper clips just beyond our light cone!

I don't think the term "light cone" means what you think it means. Or more charitably, what Junior Clippy thinks it means.

Thanks. I meant the observable universe and was positing faster-than-light communication for the clippies. Not the best bit of storytelling; I'll try to think of something better.

FAWS

As far as I can see, you are just describing the wire-head problem we already know about.