abramdemski's Comments

An Orthodox Case Against Utility Functions

Yeah, a didactic problem with this post is that when I write everything out, the "reductive utility" position does not sound that tempting.

I still think it's a really easy trap to fall into, though, because before thinking too much the assumption of a computable utility function sounds extremely reasonable.

Suppose I'm running a company, trying to maximize profits. I don't make decisions by looking at the available options, and then estimating how profitable I expect the company to be under each choice. Rather, I reason locally: at a cost of X I can gain Y, I've cached an intuitive valuation of X and Y based on their first-order effects, and I make the choice based on that without reasoning through all the second-, third-, and higher-order effects of the choice. I don't calculate all the way through to an expected utility or anything comparable to it.

With dynamic-programming inspired algorithms such as AlphaGo, "cached an intuitive valuation of X and Y" is modeled as a kind of approximate evaluation which is learned based on feedback -- but feedback requires the ability to compute U() at some point. (So you don't start out knowing how to evaluate uncertain situations, but you do start out knowing how to evaluate utility on completely specified worlds.)

So one might still reasonably assume you need to be able to compute U() despite this.

Two Alternatives to Logical Counterfactuals

OK, all of that made sense to me. I find the direction more plausible than when I first read your post, although it still seems like it'll fall to the problem I sketched.

I both like and hate that it treats logical uncertainty in a radically different way from empirical uncertainty -- like, because we have so far failed to find any way to treat the two uniformly (besides being entirely updateful that is); and hate, because it still feels so wrong for the two to be very different.

Two Alternatives to Logical Counterfactuals

I'm left with the feeling that you don't see the problem I'm pointing at.

My concern is that the most plausible world where you aren't a pure optimizer might look very very different, and whether this very very different world looks better or worse than the normal-looking world does not seem very relevant to the current decision.

Consider the "special exception selves" you mention -- the Nth exception-self has a hard-coded exception "go right if it's beet at least N turns and you've gone right at most 1/N of the time".

Now let's suppose that the worlds which give rise to exception-selves are a bit wild. That is to say, the rewards in those worlds have pretty high variance. So a significant fraction of them have quite high reward -- let's just say 10% of them have value much higher than is achievable in the real world.

So we expect that by around N=10, there will be an exception-self living in a world that looks really good.

This suggests to me that the policy-dependent-source agent cannot learn to go left > 90% of the time, because once it crosses that threshhold, the exception-self in the really good looking world is ready to trigger its exception -- so going right starts to appear really good. The agent goes right until it is under the threshhold again.

If that's true, then it seems to me rather bad: the agent ends up repeatedly going right in a situation where it should be able to learn to go left easily. Its reason for repeatedly going right? There is one enticing world, which looks much like the real world, except that in that world the agent definitely goes right. Because that agent is a lucky agent who gets a lot of utility, the actual agent has decided to copy its behavior exactly -- anything else would prove the real agent unlucky, which would be sad.

Of course, this outcome is far from obvious; I'm playing fast and loose with how this sort of agent might reason.

Two Alternatives to Logical Counterfactuals

If you see your source code is B instead of A, you should anticipate learning that the programmers programmed B instead of A, which means something was different in the process. So the counterfactual has implications backwards in physical time.

At some point it will ground out in: different indexical facts, different laws of physics, different initial conditions, different random events...

I'm not sure how you are thinking about this. It seems to me like this will imply really radical changes to the universe. Suppose the agent is choosing between a left path and a right path. Its actual programming will go left. It has to come up with alternate programming which would make it go right, in order to consider that scenario. The most probable universe in which its programming would make it go right is potentially really different from our own. In particular, it is a universe where it would go right despite everything it has observed, a lifetime of (updateless) learning, which in the real universe, has taught it that it should go left in situations like this.

EG, perhaps it has faced an iterated 5&10 problem, where left always yields 10. It has to consider alternate selves who, faced with that history, go right.

It just seems implausible that thinking about universes like that will result in systematically good decisions. In the iterated 5&10 example, perhaps universes where its programming fails iterated 5&10 are universes where iterated 5&10 is an exceedingly unlikely situation; so in fact, the reward for going right is quite unlikely to be 5, and very likely to be 100. Then the AI would choose to go right.

Obviously, this is not necessarily how you are thinking about it at all -- as you said, you haven't given an actual decision procedure. But the idea of considering only really consistent counterfactual worlds seems quite problematic.

Two Alternatives to Logical Counterfactuals

Conditioning on ‘A(obs) = act’ is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between “If Oswald didn’t kill Kennedy, then someone else did” and “If Oswald didn’t kill Kennedy, then someone else would have”.

I still disagree. We need a counterfactual structure in order to consider the agent as a function A(obs). EG, if the agent is a computer program, the function would contain all the counterfactual information about what the agent would do if it observed different things. Hence, considering the agent's computer program as such a function leverages an ontological commitment to those counterfactuals.

To illustrate this, consider counterfactual mugging where we already see that the coin is heads -- so, there is nothing we can do, we are at the mercy of our counterfactual partner. But suppose we haven't yet observed whether Omega gives us the money.

A "real counterfactual" is one which can be true or false independently of whether its condition is met. In this case, if we believe in real counterfactuals, we believe that there is a fact of the matter about what we do in the case, even though the coin came up heads. If we don't believe in real counterfactuals, we instead think only that there is a fact of how Omega is computing "what I would have done if the coin had been tails" -- but we do not believe there is any "correct" way for Omega to compute that.

The representation and the representation both appear to satisfy this test of non-realism. The first is always true if the observation is false, so, lacks the ability to vary independently of the observation. The second is undefined when the observation is false, which is perhaps even more appealing for the non-realist.

Now consider the representation. can still vary even when we know . So, it fails this test -- it is a realist representation!

Putting something into functional form imputes a causal/counterfactual structure.

Two Alternatives to Logical Counterfactuals

In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.

I agree that this gets around the problem, but to me the happy dance problem is still suggestive -- it looks like the material conditional is the wrong representation of the thing we want to condition on.

Also -- if the agent has already updated on observations, then updating on is just the same as updating on . So this difference only matters in the updateless case, where it seems to cause us trouble.

Thinking About Filtered Evidence Is (Very!) Hard

Sure, that seems reasonable. I guess I saw this as the point of a lot of MIRI’s past work, and was expecting this to be about honesty / filtered evidence somehow.

Yeah, ok. This post as written is really less the kind of thing somebody who has followed all the MIRI thinking needs to hear and more the kind of thing one might bug an orthodox Bayesian with. I framed it in terms of filtered evidence because I came up with it by thinking about some confusion I was having about filtered evidence. And it does problematize the Bayesian treatment. But in terms of actual research progress it would be better framed as a negative result about whether Sam's untrollable prior can be modified to have richer learning.

I think we mean different things by “perfect model”. What if [...]

Yep, I agree with everything you say here.

The absurdity of un-referenceable entities

Also -- it may not come across in my other comments -- the argument in the OP was novel to me (at least, if I had heard it before, I thought it was wrong at that time and didn't update on it) and feels like a nontrivial observation about how reference has to work.

The absurdity of un-referenceable entities

Alright, cool. 👌In general I think reference needs to be treated as a vague object to handle paradoxes (something along the lines of Hartry Field's theory of vague semantics, although I may prefer something closer to linear logic rather than his non-classical logic) -- and also just to be more true to actual use.

I am not able to think of any argument why the set of un-referenceable entities should be paradoxical rather than empty, at the moment. But it seems somehow appropriate that the domain of quantification for our language be vague, and further could be that we don't assert that nothing lies outside of it. (Only that there is not some thing definitely outside of it.)

Load More