Mentioned in

Values vs. parameters

5Manfred

1PhilGoetz

3Wei Dai

0PhilGoetz

2pengvado

2Will_Sawin

2timtyler

2Wei Dai

0Eugine_Nier

0PhilGoetz

1Eugine_Nier

1PhilGoetz

-1timtyler

2PhilGoetz

3timtyler

2Matt_Simpson

1Vaniver

3Eugine_Nier

0PhilGoetz

2timtyler

0PhilGoetz

0wedrifid

0Vaniver

4PhilGoetz

0Vaniver

1wedrifid

0Vaniver

0wedrifid

0atucker

0XiXiDu

-1timtyler

1PhilGoetz

0timtyler

-1endoself

4timtyler

0endoself

0timtyler

0endoself

2Manfred

-6PhilGoetz

2Manfred

1PhilGoetz

7gjm

0PhilGoetz

2PhilGoetz

3gjm

-1gjm

1gjm

-2PhilGoetz

1gjm

1PhilGoetz

2TimFreeman

1PhilGoetz

3timtyler

3TimFreeman

1Morendil

0TimFreeman

0AlephNeil

0AlephNeil

0PhilGoetz

0PhilGoetz

0gjm

-1PhilGoetz

0gjm

0PhilGoetz

0Vladimir_Nesov

-2XiXiDu

1endoself

New Comment

68 comments, sorted by Click to highlight new comments since: Today at 5:56 PM

Is your time-discounting function a value, or a value parameter?

I agree, this is not a continuation - the argument in the other thread is indifferent to this, since if we ignore the possibility of later things being valued exponentially more, a polynomial discounting function from any source means that the "real" (context-independent) discounting function is polynomial (including 1).

As for "mathematically wrong," you should rather say "bad if preference reversals are inherently bad." "Inconsistent" means multiple things here, and the "mathematically wrong" inconsistent is not the same as the "preference reversal" inconsistent.. See this Wei Dai comment.

I have one heuristic that I think is pretty good for telling when something is not a value: If it's mathematically wrong, it's an error, not a value. So my inclination is to point out that exponential time-discounting is correct. All other forms of time-discounting lead to inconsistencies. You can time-discount exponentially; or you can not time-discount at all, as Eliezer suggested; or you can be in error.

I think this is a good heuristic, but my inclination is to say that it's a logical error to think that non-exponential time-discounting is an "error" and that either exponential discounting and no time-discounting must be part of our values because they are not "errors".

Besides what I said in this reply to Eliezer, another way to avoid inconsistencies without going to exponential discounting is to discount via absolute time as opposed to time relative to "now". Humans do not work this way, of course, but you can program it into an AI if you wanted to.

Besides what I said in this reply to Eliezer,

I'll have to think about that. It's a little too deep for me to process right now.

discount via absolute time as opposed to time relative to "now".

I don't think that will work. Can you explain it in more detail? Distorting time won't prevent reversals of preference just because it makes some plotted curves match.

Distorting time won't prevent reversals of preference just because it makes some plotted curves match.

If your discounting factor is f(t-now) for some function f, then f needs to be translation invariant (modulo positive affine scaling), on pain of preference reversals. The requirement of translation invariance is directly due to the fact that f gets translated by the varying values of "now". For two possible events x1 and x2, the agent compares U(x1)*f(t1-now) vs U(x2)*f(t2-now), where U is the non-discounted utility function, and if the result of that comparison depends on the value of "now" you have problems.

However, if your discounting factor is f(t) simpliciter, then f isn't translated and thus doesn't need to be translation invariant. No single event is ever valued according to multiple different outputs of f. The agent will derive the same preference between any two events regardless of when it computes the decision.

Isn't this all about reflective equilibrium?

Define my real values to be the endpoint of the process in which I continually question my intuitions and self-modify until I have a single, coherent set of values and moral intuitions.

Define my used values to be the approximation that I use to make decisions on a day-by-day basis.

Define the discounting hypothesis to be the statement that there exists a utility function that takes a paramater f(p,U) such that:

p can be interpreted as the amount of discounting

f(0,U) closely approximates our real values.

f(x,U) for some x>0 closely approximates our used values.

Then perhaps Eliezer is arguing for the discount hypothesis.

If "parameter" were "instrumental value", that would mean you could set the parameter according to your values. That would be very similar to saying it was part of your values.

The notion of "value" vs. "instrumental value" is probably bankrupt. It's very similar to the dichotomy of "grounded symbol vs. dependent symbol". I talked about this in the section "Value is a network problem" of my regrettably long post on values.

In this particular case, I'm also presenting the option that "parameter" is something you can place logical constraints on, regardless of values. Like math: You can't say that believing that 1+1=3 is one of your values.

Answer to Wei Dai: It's a parameter you need to set to implement your values; but I don't know how you set it. That's the problem I'm pointing out. Resolving whether this was an instrumental value or not would solve the problem.

Answer to Wei Dai: It's a parameter you need to set to implement your values; but I don't know how you set it. That's the problem I'm pointing out. Resolving whether this was an instrumental value or not would solve the problem.

If I understand you correctly, you're thinking of utility functions as f(p,U), were U is the state of the universe and p is some parameter. Here f is your values and p is the parameter. If that's the case, I don't understand the distinction your making by separating out p, i.e., why not just work with F(U)=f(p,U). If there's uncertainty about the value of p, how is that different from uncertainty about your utility function?

If "parameter" were "instrumental value", that would mean you could set the parameter according to your values. That would be very similar to saying it was part of your values. The notion of "value" vs. "instrumental value" is probably bankrupt.

Well, *instrumental value* is a totally standard concept:

http://en.wikipedia.org/wiki/Instrumental_value

It *is* different from the equally-standard concept of *intrinsic value*:

http://en.wikipedia.org/wiki/Intrinsic_value_%28ethics%29

...so: it is not very clear what you mean.

I *think* what you are saying is that because *instrumental values* arise out of *intrinsic values* we need not bother too much in distinguishing between them.

However, that isn't right - there's at least one important time to know what is an *instrumental value* and what is an *intrinsic value* - namely when self-improving.

No, I really think instrumental vs. intrinsic values is a bankrupt set of ideas, despite being standard. Our values were not generated by starting with a set of intrinsic values, then adding those instrumental values needed to achieve them. Similarly, I don't think that we learn language by learning a set of fundamental words, and then defining all other words in terms of them. The link to my regrettably long post on values above is supposed to elucidate this (though I'm not sure it's any clearer than what I just said).

No, I really think instrumental vs. intrinsic values is a bankrupt set of ideas, despite being standard. Our values were not generated by starting with a set of intrinsic values, then adding those instrumental values needed to achieve them.

Well, then please reconsider. Humans intrinsically value warmth, sweetness, fullness, orgasms, the absence of pain - and various other things. They instrumentally value money, qualifications, property rights, and so on. Mostly the instrumental values arise out of the intrinsic values - in the context of some environment.

There may be some wrinkles to this kind of model. There may be instinctive predispositions towards some instrumental values. There may be instrumental values that only develop as a result of certain types of interaction with the environment. However, overall, I fail to see how arguing with the significance of the instrumental / intrinsic value split is productive.

I am pretty sure that any more sophisticated model would still exhibit the same instrumental / intrinsic value division.

I don't think you have the same notion of instrumental values in mind that everyone else has. Instrumental values aren't values, properly. If X is an instrumental value, all this means is that X is useful for obtaining our values. Examples of instrumental values: money, natural resources, or computing power.

The problem with distinguishing instrumental and terminal values in *humans* is that it isn't always clear whether you really do value, for example, justice, or if it's just a heuristic for obtaining that which we do value.

It seems to me that eliminating space-discounting is done for *memetic*, not genetic, reasons. And, as the triumph of individualistic nations over collectivist tribes shows, it outcompetes. "I trust strangers in the way a hunter-gatherer trusts his cousins" is not just a platitude for making people think you're moral, but also a good strategy for *conquering the world*.

Do values come from genes, and parameters from memes? The other way around? A mix? My guess is that most of our value drift will represent genetic strategies being replaced by memetic strategies, and so talking about values and parameters may not make a whole lot of sense.

It seems to me that time discounting is a heuristic to deal with death- if that issue gets resolved, then you don't need to discount as heavily. I'm not sure how a utility function would deal with memories, though- if I get to have an awesome experience at one point in my life, it seems like the earlier the better, as I can only remember the experience after I have it (I can anticipate it beforehand, but I imagine that'll be weaker). So my guess is immortals (or, at least, billion-year lifespan individuals) will have a weak preference for the present.

Nitpick: Feeling that way would prevent you from conquering the world.

I'm inclined to agree. Unless the hunter-gatherer in question somehow had an overwhelmingly nuanced and Machiavellian instinct for trust and, presumably, extremely promiscuous relatives in a culture with significant inter-tribal mating.

Nitpick: Feeling that way would prevent you from conquering the world.

As in, hunter-gatherers don't feel the way towards their cousins that citizens of the developed world are encouraged to feel towards their fellow citizens? Or that if I personally feel that way, I personally will not become the ruler of the world? Or something else?

that citizens of the developed world are encouraged to feel towards their fellow citizens?

You don't conquer the world by doing what you are encouraged to do by society. You do the encouraging. Adamantly.

If the history of conquerers can be taken as a guide you trust your cousins about as far as you can preemptively execute them.

Are you telling me that British shopkeepers should have gone around preemptively executing each other?

No, *I* was talking about conquering the world. I don't recommend being a British shopkeeper at all if world conquest is your goal. Although Rupert Giles might like to test that theory. Maybe it even worked for him after all!

Are you referring to, instead, 'success'?

How do I tell if something is a value or parameter in general?

My current guess is that something is a parameter if I can act in ways contrary to it, and be happier than if I followed it. It seems like time discounting is an approximation of how reality works so that humans can plan in ways that used to be accurate.

In this sense, I think that time discounting is a parameter. If I were able to do strategic things like save money in a compounding way so that I can retire significantly earlier, I would probably be happier than if I spent my money on say, candy bars and movie tickets.

It should also be noted that humans do not apply time-discounting equally over all outcomes and are also risk averse. That's why I have a hard time to accept arguments against discount rates based on the decisions of some hypothetical agent. Humans are not going to attempt time-travel, because they are risk-averse. Further, time-travel is subject to time-discounting as well. Just because you might be able to turn the clock back doesn't change anything about your subjective time experience, which is the basis of human time preference.

How can we distinguish our utility function from the parameters we use to apply it?

The utility function contains the values which we want to preserve - *except* when we are using pragmatic utility functions for performance reasons.

If you want actual methods of distinguishing them in humans, I don't think that is trivial.

There *are* some heuristics, though. For example, if young babies want it, it is an intrinsic value - whereas if it varies from one culture to the next, it is probably instrumental. If you keep going like that, you eventually wind up with a pretty good idea about how to classify most values.

Utility functions do not converge even with time discounting. See http://arxiv.org/abs/0712.4318 - though time discounting is not mentioned, it does not affect the result.

Utility functions do not converge even with time discounting. [...]

That is not what the cited paper says at all.

It doesn't say that utility functions don't converge. It says that the expected value of unbounded utility functions don't converge.

Ironically, I read that paper last week, and my main reaction was that it was not very relevant because it did not incorporate time discounting; or that you could interpret it as an argument for time discounting, because time discounting makes its result go away.

As Manfred said, it does not appear that the results of the paper are affected by time discounting.

Let's be a bit more explicit about this. The model in the paper is that an action (or perhaps a pair (action,context)) is represented by a single natural number; this is provided to the environment and it returns a single natural number; the agent feeds that number into its utility function, and out comes a utility. The agent has measured what the environment does in response to a finite set of actions; what it cares about is the expected utility (over computable environments whose behaviour is consistent with the agent's past observations, with some not-necessarily-computable but not too ill-behaved probability distribution on them) of the environment's response to its actions.

The paper says that the expected utilities don't exist, if the utility function is unbounded and computable (or merely bounded below in absolute value by an unbounded computable function).

(Remark: It seems to me that this isn't necessarily fatal; if the agent cannot exactly repeat previous actions, and if it happens that the expected *utility difference* between two of the agent's actions exists, then it can still decide between actions. However, (1) the "cannot repeat previous actions" condition seems a bit artificial and (2) I'd guess that the arguments in the paper can be adjusted to show that expected utility differences are also divergent. But I could be wrong, and it would be interesting to know.)

So, anyway. How does time discounting fit into this? It seems to me that this is meant to model the *immediate* response of the environment to the agent's action; time doesn't come into it at all. And the conclusion is that *even then* -- even without considering the possible infinite future -- the relevant expectations don't exist.

The pathology described in the paper doesn't seem to me to have anything to do with not discounting in time. Turning the consequences of an action into an infinite stream rather than a single result might make things worse, but it can't possibly make them better.

Actually, that's not quite fair. Here's one way it could make them better. One way to avoid the divergence described in the paper is to have a bounded utility function. That seems troublesome to many people. But it's not so unreasonable to say that the utility you attach *to what happens in any bounded region of spacetime* should be bounded. So maybe there's some mileage to taking the bounded-utility case of this model (where the expectations all exist happily), then representing an agent's actual deliberations as involving (say) some kind of limit as the spacetime region gets larger, and hoping that lim { larger spacetime region } E { utility } converges even though E { lim { larger spacetime region} utility } doesn't. Which might be the case; I haven't thought about it carefully enough to have much idea how plausible that is.

In *that* scenario, you might well get saved by exponential time-discounting. Or even something weaker like 1/t^2. (Probably not 1/t, though.) But it seems to me that filling in the details is a big job; and I don't think it can possibly be right to assert that time discounting makes the result of the paper go away, without doing that work.

Hi, (other?) downvoter! Whatever do you object to about the above?

If someone's downvoting me for good reason, I would like to know how I can improve. If someone's downvoting me without good reason, I would like to make that apparent. The interests of LW, as much as my own, are advanced in both cases.

What, if anything, am I missing?

See my response to Manfred above. The paper does not define what it means by "expected value"; I am assuming that it means the sum of all possible continuations of the infinite series, multiplied by their probabilities, because that seems consistent and appropriate. It could alternately mean the expectation of the value that an infinite series converges to, which would be a peculiar way of talking about utility calculations (it would be the opposite of time-discounting: the present doesn't matter, only the infinite future), but would probably also be consistent with the paper.

The paper assumes that you have a universe of possible infinite series, all of which diverge; and proves (not surprisingly) that the sum of an expected value over an infinite number of such infinite series diverges.

If instead of taking the sum x1 + x2 + ..., you use time-discounting into the future from the present time t=1:

x1 + f(x2 + f(x3 + f(x4 + ...))))

where f(x) = x/c, then you are using exponential time-discounting; and you can find series that meet the particular definition of "unbounded" in the paper, but that are exponentially bounded, and for which the expected value of time-discounted infinite series, multiplied by their probabilities, would converge.

I'm trying to read that paper now, but I'm stuck.

First, the paper needs to define what an unbounded function is.

Next, in Lemma one that B(x) > f(x) infinitely often. But f(x) is defined as "a function that describes the environment". There is no ordering defined on f(x); no interpretation of what the number you get out of f(x) means. So what does it mean for f(x) to be large or small?

Just before equation 2, it refers to "the expected utility of inputting k to the environment." Since when do environments have inputs? Somehow, the environment is a function, which takes an input and produces an output that is itself used as input to a utility function. I don't see how this relates to the usual system of computing utility as a function of an environment. What's this business with treating the environment as a function of - something? What are these "inputs"?

In proving theorem 1, the paper defines psi_G(n). But psi_n has already been defined as a way of indexing all computable functions. You can't redefine psi after that. So this move looks suspect.

This paper really needs some more English text in it explaining what it is doing.

I don't have my head around the proof in the paper, but I can answer your specific questions:

First, the paper needs to define what an unbounded function is.

U maps N to R and is unbounded, which means for all x in R there exists a y in N such that U(y) > x.

Next, in Lemma one that B(x) > f(x) infinitely often. But f(x) is defined as "a function that describes the environment". There is no ordering defined on f(x); no interpretation of what the number you get out of f(x) means.

His point is that B grows really fast, faster than any computable f. I don't see where it defines f(x) to be "a function that describes the environment", and that definition wouldn't make sense at this point anyway. Once he's done proving B grows faster than any computable function, he doesn't need that f any more.

Thanks! Your definition of unbounded does not mention being unbounded below; that seems oddly asymmetrical to me.

I was wrong on this and Tim Tyler was right. The theorem takes an absolute value of U at the end, so apparently it's still true if U has an unbounded absolute value. My guess that "unbounded" meant "unbounded below" was wrong, but apparently harmless.

I am still puzzled over what the environment functions are. f is one of the same kinds of things that h is, and h is "the true function that describes the environment".

The f in Lemma 1 is a different sort of thing from the f in Equation 2 and Theorem 1. I assume you're talking about Equation 2 and Theorem 1.

(Variable scope is ambiguous in mathematical texts and has to be guessed from context. Maybe I'm not stuck on this right now and you're feeling stuck because I worked through Hutter's AIXI paper. I spent a lot of time feeling stuck when reading that. Perhaps we all have to pay the same entry price. One might also argue that the papers are too concisely written.)

Our heroic AI is attempting to take action to maximize its expected utility. The only action it can take is to choose an input to h; suppose that's i. The only response from the world is h(i). The utility it experiences is U(h(i)). There is no persistent state, and h never changes. It doesn't know the true h, so the best it can do is maximize the expected value of U(f(i)), where f is a version of h that's consistent with it's observations. The weights in the expected value are p(f), which is the AI's prior probability of f being h. The point of the paper is that it can't evaluate that expected value because the series diverges when the U is unbounded.

Maybe we should do more math on this blog. It's much more pleasant than arguing about philosophy. I would enjoy this conversation with Phil even if he was one of those goddamned moral realists. :-).

Here's a quick summary of the paper, to see whether I understand it (and perhaps to distill it down to its essentials.)

Let a 'hypothesis' be an algorithm which tells us, among other things, how much utility we get for each of the possible actions we can take.

Assume that we assign a non-zero probability to every hypothesis consistent with our data so far, and that our subjective probability distribution is computable. Assume there is an algorithm F such that F(M) is a hypothesis (consistent with available data) in which whatever we do, we get constant utility u' satisfying |u'| > M.

Arrange all 'computations returning natural numbers' in some (computable) order. Let C_n be the n-th such computation.

Arrange all of the hypotheses with non-zero probabilities in some (computable) order. Now define function f(n) = the index of the hypothesis F(the result of C_n). Somewhat delicate point: f is a computable function (even though some of the C_n do not halt). Therefore, q(n) = 1/(the probability of hypothesis f(n)) is also a computable function.

However, for any computable function g, there must exist (by Busy Beaverish arguments) infinitely many values of n such that g(n) < the result of C_n. This is true of q(n) in particular.

If q(n) < the result of C_n < |the utility of hypothesis f(n)| then the term corresponding to f(n) in our expected utility must have magnitude >= 1. But there are infinitely many such terms, so the expected utility is divergent.

[I hope my various simplifications haven't broken the proof.

Notes:

I can see a slight gap in the logic where de Blanc states "By definition, phi_G(n) is in S_I". Surely phi_G(n) may not be a total function, whereas S_I is a set of total functions? How best can we patch things up? Do we need to assume that the agent assigns non-zero probabilities to all

*partial*recursive functions consistent with the data?I seem to have done away with u_j and simplified the definition of q(n). Is this OK?]

As a PSA for anyone curious, here's a quick summary of the paper:

Assume that we assign a non-zero probability to every hypothesis consistent with our data so far, and that our subjective probability distribution is computable. Assume there is an algorithm F such that F(M) is a hypothesis (consistent with available data) in which whatever we do, we get constant utility u' satisfying |u'| > M.

Arrange all 'computations returning natural numbers' in some order. Let C_n be the n-th such computation.

Arrange all of the hypotheses with non-zero probabilities in some order. Now define function f(n) = the index of the hypothesis F(the result of C_n). Somewhat delicate point: f is a computable function (even though some of the C_n do not halt). Therefore, q(n) = 1/(the probability of hypotheses f(n)) is also a computable function.

However, for any computable function g, there must exist (by Busy Beaverish arguments) infinitely many values of n such that g(n) < the result of C_n. This is true of q(n) in particular.

If q(n) < the result of C_n < |the utility of hypothesis f(n)| then the term corresponding to f(n) in our expected utility must have magnitude >= 1. But there are infinitely many such terms, so the expected utility is divergent.

[Have my various simplifications broken the proof?]

I can't understand the proof, but I can construct some counterexamples. Either these disprove the theorem, or they will show where I misunderstand the proof.

The first class relies on the fact that the proof places no restrictions on the probability distribution. Define U(n) = n. This is bounded below in abs. value by the unbounded computable function U(n)=n. Define p:N->R such that p(1) = 1, and p=0 for all other n. Now EU(h(k)) = 1.

Notice that I am defining p as a function from the integers into the reals, whereas Peter defined it as a function from the reals into the reals. Peter's definition must be incorrect, since in equation 2, the term should not be p(f)U(f(k)), as Peter has defined it; it must be either p(f)U(f) or p(f(k))U(f(k)). p and U must always be operating on the same thing within one term, so p should be defined over the same domain as U.

(Peter has disagreed with me in email on this, but I can't agree. In an expected utility calculation, each term must be the utility of something, times the probability of that same something.)

The second class of counter-examples relies on the fact that the utility function is not constrained to be positive. The proof says, "To establish that this series does not converge, we will show that infinitely many of its terms have absolute value >= 1." The consequent does not follow from the antecedent if the terms can be negative; so we already know the proof is flawed in that case. (And if it isn't the case, why does the proof mention absolute value?)

I define U(n) = n if n is even, -n if n is odd. Now I define p(n) = 1 / 2^n. p(n) is a valid probability distribution, as it sums to one; and the infinite sum of p(f)U(f) converges on zero.

The third and most important class of counter-examples is building a reasonable positive utility function and probability distribution for which the theorem fails. Again use p(n) = 1 / 2^n, which sums to 1. My utility function will be

- U(n) = log(n) for n = 2^i, i an integer
- U(n) = 0 otherwise

Now U(n) is bounded below in absolute value by itself, and U(n) is unbounded; and yet the infinite sum p(n)U(n) converges to 1. (Intuitive proof: The sum of all the n terms from (2^n)+1 to 2^(n+1) is n+1; their average is thus 1, and they all have very similar probabilities. Therefore, the sum of the infinite series is going to be the same as if it used the constant U(n) = 1, which would make the entire sum p(n)U(n) sum to 1.)

Some retractions, clarified by Peter in email:

Peter does place a limit on the probability distribution: It must never be zero. (I read a '<' as '<='.) This removes counterexamples 1 and 3. However, I am not sure it's possible to build a probability distribution satisfying his requirements (one that has cardinality 2^N, no zero terms, and sums to one). This link says it is not possible.

The reason Peter's expected value calculation is not of the form p(x)U(x) is because he is summing over one possible action. p(h) is the probability that a particular hypothesis h is true; h(k) is the result of action k when h is true; U(h(k)) is the utility from that result.

"To establish that this series does not converge, we will show that infinitely many of its terms have absolute value >= 1." This statement

*is*valid.

However, my second counterexample still looks solid to me: U(n) = n if n is even, -n if n is odd. p(n) = 1/2^n. This doesn't work in Peter's framework because the domain of p is not countable.

So I give 3 caveats on Peter's theorem:

It only applies if the number of possible worlds, and possible actions you are summing over, is uncountable. This seems overly restrictive. It really doesn't matter if expected utility converges or not, when the sum for the expected utility for a single action would be non-computable regardless of whether it converged.

It is impossible to construct a probability distribution satisfying his requirements, so the theorem doesn't apply to any possible situations.

It doesn't prove that the expected utility isn't bounded. The way the theorem works, it can't rule out a bound over an interval of 2, e.g., U(k) might be provably between -1.1 and 1.1. A reasoner can work fine with bounded utility functions.

I can't agree with any of your caveats. (This is not the same as saying that I think everything in PdB's paper is correct. I haven't looked at it carefully enough to have an opinion on that point.)

*"my second counterexample still looks solid to me ... It only applies if the number of ... is uncountable"*:

The function U in PdB's paper doesn't take integers as arguments, it takes infinite sequences of "perceptions". Provided there are at least two possible perceptions at each time step, there are uncountably many such sequences. How are you proposing to change the model to make the domain of U countable, while still making it be a model of agents acting in time?

One way would be to use finite instead of infinite sequences. That corresponds to considering only a finite number of time-steps. (In that case, obviously temporal discounting isn't going to have any impact on the convergence of anything.)

Another possibility would be to use a utility function that's zero almost everywhere. Aside from the obvious remark that this seems unlikely to be a good model for real agents' (actual or idealized) utility functions, I think (but this is just pure intuition; I haven't tried to construct a proof or looked for refutations) that a utility function satisfying PdB's conditions about being bounded away from zero by a computable function is necessarily nonzero on an uncountable set of arguments.

*"It is impossible to construct a probability distribution satisfying his requirements""*:

I think you are mistaken about what those requirements are. What he needs is a probability measure on the space of environments; he doesn't insist that each individual possible environment have nonzero probability. What he *does* insist on is a nonzero probability for each possible *computable* environment (note: an environment is a function from agent action-sequences to agent perception-sequences), and there are only countably many of those.

*"It doesn't prove that the expected utility isn't bounded."*

I think it does, actually, although he hasn't said so explicitly. He has this noncomputable function B, and he uses the assertion that B(n) > rho(n) infinitely often when rho is computable. But then consider, say, 2^n rho(n), which is also computable; B(n) / rho(n) is then > 2^n infinitely often. Unless I'm confused (which I might be) I think this leads not only to the conclusion (which he states) that the absolute values of the terms in his series are >=1 infinitely often, but that you can find an infinite subsequence of the terms that grows faster than 2^n, or indeed faster than any computable function.

In any case, if you try to sum an infinite series with infinitely many terms of absolute value > 1, then the range of values in your partial sums depends on the order of summation. In particular, even if the U(k) were bounded, that wouldn't enable you to say things like "the expected utility is a divergent series but its value oscillates only between -13 and +56" unless there were One True Order of summation for the terms of the series. Which I can't see that there is or could be.

The function U in PdB's paper doesn't take integers as arguments, it takes infinite sequences of "perceptions". Provided there are at least two possible perceptions at each time step, there are uncountably many such sequences. How are you proposing to change the model to make the domain of U countable, while still making it be a model of agents acting in time?

The domain of U must not only be countable; it must be finite. Remember, an agent needs to compute this sum once for every possible action! Both the number of possible worlds, and the number of possible actions, need to be finite; or the utility function needs to be solvable analytically (unlikely); or else the agent will take an infinite amount of time to make a decision.

So, the model of an agent acting in time, where we compute utility at each timestep in the future, does not seem to produce a computable decision process. Is that right?

I think you are mistaken about what those requirements are. What he needs is a probability measure on the space of environments; he doesn't insist that each individual possible environment have nonzero probability.

You're right. Let's try to figure out how many possible environments there are.

"Let p:H->R be our probability distribution on the hypothesis set. We require that there exist some computable function p-bar:N->Q such that for all psi_n in S_I, 0 < p-bar(n) <= p(psi_n)".

p(psi_n) must be non-zero for all psi_n in S_I. S is the set of mu-recursive functions on a single argument. A mu-recursive function is one that can be computed by Turing machines. S_I is the set of computable environments f such that for every input that has been tested, f(i) = h(i), where h is the true environment.

How large is S_I? It could be up to the size of S. Is the number of computable functions less than R (2^N)?

In any case, if you try to sum an infinite series with infinitely many terms of absolute value > 1, then the range of values in your partial sums depends on the order of summation. In particular, even if the U(k) were bounded, that wouldn't enable you to say things like "the expected utility is a divergent series but its value oscillates only between -13 and +56" unless there were One True Order of summation for the terms of the series. Which I can't see that there is or could be.

If I construct a countable series so as to have a 1-1 pairing between terms with value 1 and terms with value -1, does that do no good? e.g. U(n) = 1 if n is even, -1 if n is odd.

Is there a way to enter subscripts and superscripts in this thing?

No, the domain of U need not be finite. (1) The paper explicitly says that U need not be computable. (2) The model of computation implicit in the paper permits computable functions to have infinite domain (it's basically Turing machines). (3) The way the paper proposes that a computable U should be implemented involves only looking at a finite amount of the information in the argument of U. (Kinda.)

But yes: the paper's model does permit a non-computable decision process. (Neither does it require that U be uncomputable.)

There is a countable infinite of computable functions. (Because there is a countable infinity of programs, because a program is a finite string or something of the kind.) An environment is defined to be a function from sequences of actions to sequences of perceptions, and there are uncountably many (more precisely: continuum-many) of those. So the situation envisaged by the paper is as follows: our agent assigns a nonzero probability to each computable environment; if the sum of those probabilities is less than 1, it also assigns a nonzero probability to some sets (though not necessarily any singleton sets) of other possible environments.

Suppose you get an expression for the expected utility in some situation, and it takes the form of an infinite series: 1 - 1 + 1 - 1 + 1 - 1 + 1 - ... (etc.). Then you might want to say "well, there's no actual expected utility, but we can say that the expected utility is some kind of fuzzy thing taking values between 0 and 1" -- but unless there's some reason why the order of summation you used there is the only possible one, you could rearrange the terms a little (swap terms 2 and 3, terms 6 and 7, etc.) and get 1 + 1 - 1 - 1 + 1 + 1 - 1 - 1 + ... whose partial sums vary between 0 and 2 instead. Or you could rearrange the terms more drastically and get partial sums that become arbitrarily large-and-positive and arbitrarily large-and-negative. Or you could just swap the first two terms, getting a series whose values vary between -1 and +1 instead of between 0 and +1.

Perhaps it's possible to construct some theory of "fuzzy convergence" for series like this *when there's a single right order to add up the terms in*. But it's far from obvious that you could base any usable sort of decision theory on utilities defined as the sums of such series. In any case, in this instance there's one term for each possible computable environment, and it seems very obvious that there isn't a canonical way to order those, and that different orderings will give substantially different sequences of partial sums.

We are humans. We do not possess a utility-function. Our values and goals are not stable. We do not differentiate between means and ends, instrumental and terminal goals. Humans get bored. Humans have time preferences.

There is no crucial difference between inconsistency of goals caused by discounting versus inconsistency caused by boredom. You might enjoy collecting paperclips in 2011 and pay for a ticket to visit a paperclip conference in 2012; but then your future self in 2012 gets bored of paperclips and goes to visit Disneyland instead. That's humane!

What would happen if humans were to discard their time preferences? We would be terrorized by our expectations, always choosing the future over the present. We would only ever pursue instrumental goals and never reach any terminal goals. We would solely care about expected utility rather than actual experience utility.

What is irrational with regard to human nature is to allow the preservation of our values to outweigh their satisfaction. We can not pick and choose our values by their weighting and at the same time retain more than a few basic goals (e.g. survival). If to be rational means to win and to win means to satisfy our values, to reach our goals, then we have to account for the fact that the preservation and satisfaction of human values are overlapping. We value how we choose and we choose what we value.

There is no crucial difference between inconsistency of goals caused by discounting versus inconsistency caused by boredom. You might enjoy collecting paperclips in 2011 and pay for a ticket to visit a paperclip conference in 2012; but then your future self in 2012 gets bored of paperclips and goes to visit Disneyland instead. That's humane!

I disagree. In this case, I would instrumentally value collecting paperclips, perhaps because I find it fun. What has changed is how much fun I derive from paperclips, not how much I value fun. This is not a true case of preference reversal, because I have learnt new information during the time interval; if I had known that I would stop finding paperclips fun, I would not buy the ticket and I would not mind the change, as long as I could find fun in visits to Disneyland instead. This is no different from a gambler who would change his bet if he knew more about how the future would play out.

I've written before about the difficulty of distinguishing values from errors, from algorithms, and from context. Now I have to add to that list: How can we distinguish our utility function from the parameters we use to apply it?

In my recent discussion post, "Rationalists don't care about the future", I showed that exponential time-discounting, plus some assumptions about physics and knowledge, leads to not caring about the future. Many people responded by saying that, if I care about the future, this shows that my utility function does not use exponential time-discounting.

This response assumes that the shape of my time-discounting function is

partof my utility function. In other words, the way you time-discount is one of your values.By contrast, Eliezer wrote an earlier post saying that we should use human values, but without time-discounting. Eliezer is aware that humans appear to use time discounting. Therefore, this implicitly claims that the time-discounting function is

notone of our values. It's a parameter for how we implement them.(Some of the arguments Eliezer used were value-based arguments, suggesting that we can use our values to set the parameters that we use to implement our values... I suspect this recursive approach could introduce bogus solutions, like multiplying both sides of an equation by a variable, or worse; but that would take a longer post to address. I will note that some recursive equations do have unique solutions.)

The program of CEV assumes that a transhuman can use some extrapolated version of values currently used by some humans. If that transhuman has a life expectancy of a billion years, it will likely view time discounting differently. Eliezer's post against time discounting suggests, to me, a God-like view of the universe, in which we eliminate time discounting in the same way (and for the same reasons) that many people want to eliminate space-discounting (not caring about far-away people) in contemporary ethics. This is taking an ethical code that evolved agents have, which is constructed to promote the propagation of those agents' genes, and applying it without reference to any particular set of genes. This is also pretty much what folk-morality says a social moral code is. So the idea that you can apply the same utility function from a radically different context, is inherent in CEV, and is common to much public discourse on ethics which assumes that you can construct a

socialmorality that is based on the morality we find in individual agents.On the other hand, I have argued that assuming that social ethics and individual ethics are the same, is either merely sloppy thinking, or an evolved (or deliberately constructed) lie. People who believed this would probably subscribe to a social-contract theory of ethics. (This view also has problems, beyond the scope of this post.)

I have one heuristic that I think is pretty good for telling when something is

nota value: If it's mathematically wrong, it's an error, not a value. So my inclination is to point out that exponential time-discounting iscorrect. All other forms of time-discounting lead to inconsistencies. You can time-discount exponentially; or you can not time-discount at all, as Eliezer suggested; or you can be in error.But my purpose in

thispost is not to continue the arguments from that other post. It's to point out this additional challenge in isolating what values are. Is your time-discounting function a value, or a value parameter?