Preamble

Value Deathism by Vladimir Nesov encourages us to fix our values in order to prevent the astronomical waste of an under-optimized future.

When I read it, I found myself thinking about the units in which that astronomical waste is measured. Utilons? It seems so. [edit] Jack suggested the widely accepted word "utils" instead. [/edit]

I've tried to define it precisely. It is the difference between the utility of some world-state G as measured by the original (drifting) agent and the utility of world-state G as measured by a non-drifting version of the original agent, where world-state G is optimal according to the original (drifting) agent.

This raises two questions: can we compare the utilities of those agents, and what does it mean for G to be optimal?

Question

Preconditions: the world is deterministic; the agent has full knowledge of the world, i.e. it knows the current world-state, the full list of actions available in every world-state, and the consequence of each action (the world-state it leads to); and the agent has no time limit for computing its next action.

The agent's value is defined as a function from the set of world-states to the real numbers; for the sake of, uhm, clarity, the bigger the better. (Note: it is unnecessary to define value as a function over sequences of world-states, since the history of the world can be deduced from the world-state itself; and if it can't be deduced, then the agent can't use history anyway, because the agent is part of that world-state and so doesn't "remember" the history either.) [edit] I wasn't aware that this note contains a hidden assumption: the value of a world-state must be constant. But that assumption doesn't allow the agent to single out world-states where it loses all or part of its memory. Thus value as a function over sequences of world-states has a right to exist. Such a value function still needs to be specifically shaped to be independent of the optimization algorithm, though. [/edit]
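To make the setup concrete, here is a minimal sketch of this toy model in Python. Everything in it (state names, the transition table, the particular numbers) is an invented illustration, not something from the post: a deterministic, fully known world is just a table mapping each world-state to its available actions and their consequences, and the agent's value is a plain function from world-states to real numbers.

```python
# Toy model: a deterministic world the agent knows completely.
# Each world-state maps to {action: resulting world-state}; all names and
# numbers below are illustrative only.
WORLD = {
    "A": {"go_B": "B", "go_C": "C"},
    "B": {},                      # dead end: no available actions
    "C": {"go_D": "D"},
    "D": {},                      # dead end
}

# The agent's value: a function from world-states to real numbers
# (the bigger the better).
VALUES = {"A": 0.0, "B": 2.0, "C": 1.0, "D": 10.0}

def value(state: str) -> float:
    return VALUES[state]
```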

Which sequence of world-states is optimal according to the agent's value?


Edit: Consider agents implementing a greedy search algorithm and an exhaustive search algorithm. For them to choose the same sequence of world-states, the search space should be a greedoid, and that requires a very specific structure of the value function.
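Continuing the sketch above (again invented for illustration, with "optimal" read as "reach the highest-valued world-state"): on that toy world the greedy agent walks A → B and gets stuck at value 2, while the exhaustive agent finds A → C → D with value 10, so the two choose different sequences of world-states unless the value function has the special structure this edit mentions.

```python
def greedy_agent(state: str) -> list[str]:
    """Repeatedly take the action whose immediate result has the highest value."""
    trajectory = [state]
    while WORLD[state]:
        state = max(WORLD[state].values(), key=value)
        trajectory.append(state)
    return trajectory

def exhaustive_agent(state: str) -> list[str]:
    """Enumerate every maximal action sequence and keep the one that reaches
    the highest-valued world-state."""
    if not WORLD[state]:
        return [state]
    candidates = [[state] + exhaustive_agent(nxt) for nxt in WORLD[state].values()]
    return max(candidates, key=lambda traj: max(value(s) for s in traj))

print(greedy_agent("A"))      # ['A', 'B']
print(exhaustive_agent("A"))  # ['A', 'C', 'D']
```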

Edit2: Alternatively, the value function can be indirectly self-referential via the part of the world-state that contains the agent, which lets it shape the agent's optimization algorithm by assigning higher utility to world-states where the agent implements the desired algorithm. (I call the agent's function a 'value function' because its meaning can be defined by the function itself; it isn't necessarily a utility.)
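A made-up illustration of that idea (none of these names or numbers are from the post): let each world-state carry a description of the agent embedded in it, and let the value function inspect that part of the state. An agent maximizing this function is thereby rewarded for rewriting its own optimization algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddedWorldState:
    resources: int
    agent_algorithm: str   # e.g. "greedy" or "exhaustive"; part of the world-state

def self_referential_value(state: EmbeddedWorldState) -> float:
    """Indirectly self-referential value: it refers to the part of the
    world-state occupied by the agent and prefers worlds in which the agent
    runs a particular optimization algorithm."""
    algorithm_bonus = 100.0 if state.agent_algorithm == "exhaustive" else 0.0
    return algorithm_bonus + state.resources
```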


My answer:

Jura inyhr shapgvba bs gur ntrag vfa'g ersyrpgvir, v.r. qbrfa'g qrcraq ba vagrecergngvba bs n cneg bs jbeyq-fgngr bpphcvrq ol ntrag va grezf bs bcgvzvmngvba cebprff vzcyrzragrq ol guvf cneg bs jbeyq-fgngr, gura bcgvzny frdhrapr qrcraqf ba pbzovangvba bs qrgnvyf bs vzcyrzragngvba bs ntrag'f bcgvzvmngvba nytbevguz naq inyhr shapgvba. V guvax va trareny vg jvyy rkuvovg SBBZ orunivbe.

Ohg jura inyhr shapgvba vf ersyrpgvir gura guvatf orpbzr zhpu zber vagrerfgvat.


Edit3:

Implications

I'll try to analyse the behavior of a classical paperclip maximizer using the toy model described earlier. Let the utility function be min(number_of_paperclips_produced, 50).

1. The paperclip maximizer implements a greedy search algorithm. If it can't produce a paperclip (all available actions lead to the same utility), it performs an action that depends on the implementation of the greedy search. All in all, it acts erratically until it is accidentally terminated (by stumbling into a world-state with no available actions).

2. The paperclip maximizer implements a full-search algorithm. The result depends on the implementation of the full search. If the implementation executes the shortest sequence of actions leading to the globally maximal value of the utility function, then it produces 50 paperclips as fast as it can [edit] or wireheads itself into a state where its paperclip counter > 50, whichever is faster [/edit], and then terminates itself (the shortest-sequence case is sketched below). If the implementation executes the longest possible sequence of actions leading to the globally maximal value of the utility function, then the agent behaves erratically but is guaranteed to survive as long as its optimization algorithm follows the original plan; sooner or later, though, it will accidentally modify itself and get terminated, since the original plan doesn't care about preserving the agent's optimization algorithm or utility function.
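A minimal sketch of the shortest-sequence reading of case 2, under invented assumptions (the world-state is reduced to a bare paperclip count, and only two actions, "make_paperclip" and "wait", exist; none of this detail is from the post): breadth-first search over action sequences returns the shortest plan that reaches the global maximum of min(n, 50), which is just fifty production steps.

```python
from collections import deque

def utility(paperclips: int) -> int:
    return min(paperclips, 50)

def shortest_plan_to_max(start: int = 0) -> list[str]:
    """Full search, shortest-sequence variant: breadth-first search returns
    the first (hence shortest) action sequence reaching the global maximum (50)."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        count, plan = frontier.popleft()
        if utility(count) == 50:
            return plan
        # Invented action set: produce a paperclip, or do nothing.
        for action, nxt in (("make_paperclip", count + 1), ("wait", count)):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return []

print(len(shortest_plan_to_max()))  # 50
```

The longest-sequence variant isn't sketched here; as the item above says, its behaviour depends entirely on how "longest possible sequence of actions" is implemented.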

It seems that in the full-knowledge case powerful optimization processes don't go FOOM. A full-search algorithm is maximally powerful, isn't it?

Maybe it is uncertainty that leads to FOOMing? 

Indexical uncertainty can be represented by the assumption that the agent knows the set of world-states it could be in, together with the set of actions available in the world-state it is actually in. I'll try to analyze this case later.
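One minimal way to encode that assumption (the names here are invented): the agent's epistemic state is just the set of world-states it might currently occupy, together with the set of actions offered by the world-state it is actually in.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexicalSituation:
    """Indexical uncertainty: the agent knows which world-states it might be
    in, and which actions the actual world-state makes available."""
    possible_states: frozenset[str]
    available_actions: frozenset[str]
```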

Edit4: Edit3 is wrong: the utility function in that toy model cannot be so simple if it uses some property of the agent. However, it seems OK to extend the model by including a high-level description of the agent's state in the world-state; then Edit3 holds.


Comments
Jack:

Tangent: How did we end up calling them 'utilons'? Every use of the term in a Google search looks LessWrong-related. 'Utils' has been the term since Bentham. I realize it doesn't matter, but non-standard terminology is a good way to get mistaken for cranks.

Well, I tried to precisely define the toy model I use. As for utilons, I took the word that is common here without thinking much about it. It doesn't seem to blur the meaning of the post significantly.

Jack:

Yeah, I don't have a problem with you using what is common around here. I just would like to change what is common around here.

It seems that what I call an indirectly self-referential value function can be a syntactic preference as defined by Vladimir Nesov.

> When I read it, I found myself thinking about the units in which that astronomical waste is measured. Utilons? It seems so.
>
> I've tried to define it precisely. It is the difference between the utility of some world-state G as measured by the original (drifting) agent and the utility of world-state G as measured by a non-drifting version of the original agent, where world-state G is optimal according to the original (drifting) agent.
>
> This raises two questions: can we compare the utilities of those agents, and what does it mean for G to be optimal?

The comparison isn't between two different utility functions, it's between the utility of two different scenarios as measured by the same utility function. What Nesov is arguing is that, given whatever utility function you have now, if you don't try to fix that utility function for yourself and all of your descendants, you will miss out on an extremely large amount of utility as measured by that utility function. Since, by definition, your current utility function is everything you care about right now, this is a really bad thing.

I don't understand. A fixed utility function doesn't equal an unfixed utility function, since optimizing for them leads to different outcomes.

Edit: you mean that we cannot optimize for an unfixed utility function? In the second part of the article I've tried to demonstrate that the meaning of optimization according to a utility function should be a part of the utility function itself, as otherwise the result of optimization depends on the optimization algorithm too, making the utility function insufficient to describe everything one cares about.

I don't mean that at all. Given that we have a utility function U() over states of the world, Nesov's argument is essentially that:

U(future given that we make sure our descendants have the same utility function as us) >> U(future given that we let our descendants' utility functions drift away from ours)

Where ">>" means astronomically more. There is no comparison of utilities across utility functions.

The future is not a world-state; it is a sequence of world-states. Thus your statement must be reformulated somehow.

Either (1) we must define the utility function over the set of (valid) sequences of world-states, or (2) we must define what it means for a sequence of world-states to be optimized for a given U [edit], and that means this definition should be a part of U itself, as U is all we care about. [/edit]

Option 1 is either impossible (if the rules of the world don't permit an agent to hold the full history of the world) or reducible to an equivalent utility function over world-states, which leaves option 2 as the only viable choice.

Then your statement means either

  • For every world-state x in the sequence of world-states optimized for U, U(x) > U(y) for every world-state y that doesn't belong to that sequence. And that means we must know in advance which future world-states are reachable.

or

  • U(x) > U(y) for every world-state x in the sequence of world-states optimized for U and every world-state y in the sequence of world-states optimized for some U2, while U2(x) < U2(y).

However, this is not the main point of my post. The main point is that future optimization isn't necessarily the maximization of a fixed function known in advance.

Edit: I'm not really arguing with Vladimir, since future optimization as utility maximization can be a part of his value function, and arguing about values per se is pointless. But maybe he misinterprets what he really values.