All of Eric Chen's Comments + Replies

Oh yeah, the Folk Theorem is totally consistent with the Nash equilibrium of the repeated game here being 'everyone plays 30 forever', since the payoff profile '-30 for everyone' is feasible and individually-rational. In fact, this is the unique NE of the stage game and also the unique subgame-perfect NE of any finitely repeated version of the game.

To sustain '-30 for everyone forever', I don't even need a punishment for off-equilibrium deviations. The strategy for everyone can just be 'unconditionally play 30 forever' and there is no profitable unilateral...

The 'individual rationality condition' is about the payoffs in equilibrium, not about the strategies. It says that the equilibrium payoff profile must yield to each player at least their minmax payoff. Here, the minmax payoff for a given player is -99.3 (which comes from the player best responding with 30 forever to everyone else setting their dials to 100 forever).  The equilibrium payoff is -99 (which comes from everyone setting their dials to 99 forever). Since -99 > -99.3, the individual rationality condition of the Folk Theorem is satisfied. 
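The arithmetic behind the two payoffs can be checked with a short sketch. The numbers -99.3 and -99 are from the comment; the player count of 100 and the payoff rule (minus the average dial setting) are assumptions chosen because they reproduce those numbers, and the function name is hypothetical.

```python
# Assumptions (not stated in the comment): 100 players, and each player's
# per-round payoff is minus the average of all dial settings.
N_PLAYERS = 100


def payoff(dials):
    """Per-round payoff shared by every player: negative mean dial setting."""
    return -sum(dials) / len(dials)


# Minmax: the other 99 players set their dials to 100 and the remaining
# player best-responds with the lowest setting mentioned, 30.
minmax_payoff = payoff([30] + [100] * (N_PLAYERS - 1))

# Candidate equilibrium: everyone sets their dial to 99.
equilibrium_payoff = payoff([99] * N_PLAYERS)

# Individual rationality: equilibrium payoff is at least the minmax payoff.
individually_rational = equilibrium_payoff >= minmax_payoff
print(minmax_payoff, equilibrium_payoff, individually_rational)
```

Under these assumptions the condition holds by a margin of 0.3 per round, which is all the Folk Theorem requires.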

I think the "at least" is an important part of this.  If it yields more than their minimax payoff, either because the opponents are making mistakes, or have different payoffs than you think, or are just cruelly trying to break your model, there's no debt created because there's no cost to recoup. The minimax expectation is 99.3 (the player sets to 30 and everyone else to 100).  One possible bargaining/long-term repeated equilibrium is 99, where everyone chooses 99, and punishes anyone who sets to 100 by setting themselves to 100 for some time.  But it would be just as valid to expect the long-term equilibrium to be 30, and punish anyone who sets to 31 or higher.  I couldn't tell from the paper how much communication was allowed between players, but it seems to assume some mutual knowledge of each other's utility and what a given level of "punishment" achieves. In no case do you need to punish someone who's unilaterally giving you BETTER than your long-term equilibrium expectation.

Because the meaning of statements does not, in general, consist entirely in observations/anticipated experiences, and it makes sense for people to have various attitudes (centrally, beliefs and desires) towards propositions that refer to unobservable-in-principle things.

Accepting that beliefs should pay rent in anticipated experience does not mean accepting that the meanings of sentences are determined entirely by observables/anticipated experiences. We can have that the meanings of sentences are the propositions they express, and the truth-conditions of pr...

Same as Sylvester, though my credence in consciousness-collapse interpretations of quantum mechanics has moved from 0.00001 to 0.000001.

Yeah great point, thanks. We tried but couldn't really get a set-up where she just learns a phenomenal fact. If you have a way of having the only difference in the 'Tails, Tuesday' case be that Mary learns a phenomenal fact, we will edit it in!

Thanks, the clarification of UDT vs. "updateless" is helpful.

But now I'm a bit confused as to why you would still regard UDT as "EU maximisation, where the thing you're choosing is policies". If I have a preference ordering over lotteries that violates independence, the vNM theorem implies that I cannot be represented as maximising EU.

In fact, after reading Vladimir_Nesov's comment, it doesn't even seem fully accurate to view UDT taking in a preference ordering over lotteries. Here's the way I'm thinking of UDT: your prior over possible worlds uniquely det...

2 Scott Garrabrant 9mo
Yeah, I don't have a specific UDT proposal in mind. Maybe instead of "updateless" I should say "the kind of mind that might get counterfactually mugged" as in this example.

Okay this is very clarifying, thanks! 

If the preference ordering over lotteries violates independence, then it will not be representable as maximising EU with respect to the probabilities in the lotteries (by the vNM theorem). Do you think it's a mistake then to think of UDT as "EU maximisation, where the thing you're choosing is policies"? If so, I believe this is the most common way UDT is framed in LW discussions, and so this would be a pretty important point for you to make more visibly (unless you've already made this point before in a post, in which case I'd love to read it).

4 Scott Garrabrant 9mo
I think UDT is as you say. I think it is also important to clarify that you are not updating on your observations when you decide on a policy. (If you did, it wouldn't really be a function from observations to actions, but it is important to emphasize in UDT.) Note that I am using "updateless" differently than "UDT". By updateless, I mostly mean anything that is not performing Bayesian updates and forgetting the other possible worlds when it makes observations. UDT is more of a specific proposal. "Updateless" is more of a negative property, defined by lack of updating. I have been trying to write a big post on utility, and haven't yet, and decided it would be good to give a quick argument here because of the question. The only posts I remember making against utility are in the geometric rationality sequence, especially this post.

Yeah by "having a utility function" I just mean "being representable as trying to maximise expected utility".

Ah okay, interesting. Do you think that updateless agents need not accept any separability axiom at all? And if not, what justifies using the EU framework for discussing UDT agents? 

In many discussions on LW about UDT, it seems that a starting point is that the agent is maximising some notion of expected utility, and the updatelessness comes in via the EU formula iterating over policies rather than actions. But if we give up on some separability axiom, it seems that this EU starting point is not warranted, since every major EU representation theorem needs some version of separability.

6 Scott Garrabrant 9mo
You could take as an input parameter to UDT a preference ordering over lotteries that does not satisfy the independence axiom, but is a total order (or total preorder if you want ties). Each policy you can take results in a lottery over outcomes, and you take the policy that gives your favorite lottery. There is no need for the assumption that your preferences over lotteries are vNM. Note that I don't think that we really understand decision theory, or have a coherent proposal. The only thing I feel like I can say confidently is that if you are convinced by the style of argument that is used to argue for the independence axiom, then you should probably also be convinced by arguments that cause you to be updateful and thus not reflectively stable.
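A minimal sketch of that selection rule, under loud assumptions: the policy names, the lotteries, the utilities, and the "certainty bonus" score are all hypothetical illustrations, not part of any UDT proposal. Any real-valued score on lotteries induces a total preorder; this one reproduces the Allais pattern, which is a standard violation of the independence axiom.

```python
from typing import Callable, Dict

Lottery = Dict[str, float]  # outcome label -> probability

# Hypothetical utilities over outcomes, used only to build the score below.
UTILITY = {"$0": 0.0, "$1M": 1.0, "$5M": 1.2}
CERTAINTY_BONUS = 0.05  # hypothetical extra value placed on sure things


def score(lottery: Lottery) -> float:
    """Expected utility plus a bonus for certainty (not linear in probabilities)."""
    expected = sum(p * UTILITY[outcome] for outcome, p in lottery.items())
    is_certain = any(abs(p - 1.0) < 1e-12 for p in lottery.values())
    return expected + (CERTAINTY_BONUS if is_certain else 0.0)


def choose_policy(policy_to_lottery: Dict[str, Lottery],
                  rank: Callable[[Lottery], float]) -> str:
    """Pick the policy whose induced lottery is ranked highest."""
    return max(policy_to_lottery, key=lambda name: rank(policy_to_lottery[name]))


# Each policy is identified with the lottery over outcomes it induces.
problem_1 = {
    "A": {"$1M": 1.0},
    "B": {"$1M": 0.89, "$5M": 0.10, "$0": 0.01},
}
problem_2 = {
    "C": {"$1M": 0.11, "$0": 0.89},
    "D": {"$5M": 0.10, "$0": 0.90},
}

print(choose_policy(problem_1, score), choose_policy(problem_2, score))
```

With these numbers the agent takes A in the first problem and D in the second. No expected-utility maximiser over the stated probabilities makes both choices, yet the selection rule itself is well defined.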

Don't updateless agents with suitably coherent preferences still have utility functions?

8 Scott Garrabrant 9mo
That depends on what you mean by "suitably coherent." If you mean they need to satisfy the vNM independence axiom, then yes. But the point is that I don't see any good argument why updateless agents should satisfy that axiom. The argument for that axiom passes through wanting to have a certain relationship with Bayesian updating.

That's a coherent utility function, but it seems bizarre. When you're undergoing extreme suffering, in that moment you'd presumably prefer death to continuing to exist in suffering, almost by nature of what extreme suffering is. Why defer to your current preferences rather than your preferences in such moments? 

Also, are you claiming this is just your actual preferences or is this some ethical claim about axiology?

I don't see why such moments should matter, any more than they matter for other preferences that are unstable under torture: when you're undergoing extreme suffering you would prefer everyone else to suffer instead of just you, but that doesn't mean you shouldn't be altruistic. I'm not committed to any specific formalization of my values, but yes, not wanting to die because of suffering is my preference.

What's the countervailing good that makes you indifferent between tortured lives and nonexistence? Presumably the extreme suffering is a bad that adds negative value to their lives. Do you think just existing or being conscious (regardless of the valence) is intrinsically very good?

I don't see a way to coherently model my "never accept death" policy with unbounded negative values for suffering - like you said, I'll need either infinitely negative value for death or something really good to counterbalance arbitrary suffering. So I use a bounded function instead, with the lowest point being death and suffering never lowering the value below it (for example, suffering can add multiplicative factors with value less than 1). I don't think "existing is very good" fits - the actual values for good things can be pretty low - it's just that the effect of suffering on total value is bounded.

In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes). 

Unfinished sentence?

I think there's a typo in the last paragraph of Section I? 

And let’s use “≻” to mean “better than” (or, “preferred to,” or “chosen over,” or whatever), “≺” to mean “at least as good as,”

"≺" should be "≽"

2 Joe Carlsmith 1y
Thanks! Fixed.

Yeah, by "actual utility" I mean the sum of the utilities you get from the outcomes of each decision problem you face. You're right that if my utility function were defined over lifetime trajectories, then this would amount to quite a substantive assumption, i.e. the utility of each iteration contributes equally to the overall utility and what not. 

And I think I get what you mean now, and I agree that for the iterated decisions argument to be internally motivating for an agent, it does require stronger assumptions than the representation theorem argum...

Yes, I think we're on the same page now.

The assumption is that you want to maximize your actual utility. Then, if you expect to face arbitrarily many i.i.d. iterations of a choice among lotteries over outcomes with certain utilities, picking the lottery with the highest expected utility each time gives you the highest actual utility. 
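Stated a little more carefully, the claim rests on the law of large numbers. The following is a sketch under added assumptions: the per-iteration utilities are i.i.d. with finite means, actual utility is their sum, and the symbols $A_i$, $B_i$ (the outcomes of choosing lottery $A$ or $B$ at iteration $i$) are notation introduced here.

```latex
\[
\frac{1}{n}\sum_{i=1}^{n} u(A_i) \xrightarrow{\text{a.s.}} \mathbb{E}[u(A)],
\qquad
\frac{1}{n}\sum_{i=1}^{n} u(B_i) \xrightarrow{\text{a.s.}} \mathbb{E}[u(B)]
\]
\[
\mathbb{E}[u(A)] > \mathbb{E}[u(B)]
\;\Longrightarrow\;
\Pr\!\left[\sum_{i=1}^{n} u(A_i) > \sum_{i=1}^{n} u(B_i)\right] \to 1
\quad \text{as } n \to \infty .
\]
```

So "gives you the highest actual utility" holds with probability approaching 1 as the number of iterations grows, not with certainty at any finite horizon.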

It's really not that interesting of an argument, nor is it very compelling as a general argument for EUM. In practice, you will almost never face the exact same decision problem, with the same options, same outcomes, same probability, and same utilities, over and over again. 

Ah, I think that is what I was talking about. By "actual utility", you mean the sum over the utility of the outcome of each decision problem you face, right? What I was getting at is that your utility function splitting as a sum like this is an assumption about your preferences, not just about the relationship between the various decision problems you face.

Yeah, that's a good argument that if your utility is monotonically increasing in some good X (e.g. wealth), then the type of the iterated decision you expect to face involving lotteries over that good can determine that the best way to maximize your utility is to maximize a particular function (e.g. linear) of that good.

But this is not what the 'iterated decisions' argument for EUM amounts to. In a sense, it's quite a bit less interesting. The 'iterated decisions' argument does not start with some weak assumption on your utility function and then att...

Oh, are you talking about the kind of argument that starts from the assumption that your goal is to maximize a sum over time-steps of some function of what you get at that time-step? (This is, in fact, a strong assumption about the nature of the preferences involved, which representation theorems like VNM don't make.)

The 'iterated decisions'-type arguments support EUM in a given decision problem if you expect to face the exact same decision problem over and over again. The 'representation theorem' arguments support EUM for a given decision problem, without qualification. 

In either case, your utility function is meant to be constructed from your underlying preference relation over the set of alternatives for the given problem. The form of the function can be linear in some things or not, that's something to be determined by your preference relation and not the arguments for EUM.

No, what I was trying to say is that this is true only for representation theorem arguments, but not for the iterated decisions type of argument. Suppose your utility function is some monotonically increasing function of your eventual wealth. If you're facing a choice between some set of lotteries over monetary payouts, and you expect to face an extremely large number of i.i.d. iterations of this choice, then by the law of large numbers, you should pick the option with the highest expected monetary value each time, as this maximizes your actual eventual wealth (and thus your actual utility) with probability near 1. Or suppose you expect to face an extremely large number of similarly-distributed opportunities to place bets at some given odds at whatever stakes you choose on each step, subject to the constraint that you can't bet more money than you have. Then the Kelly criterion says that if you choose the stakes that maximize your expected log wealth each time, this will maximize your eventual actual wealth (and thus your actual utility, since that's monotonically increasing with your eventual wealth) with probability near 1. So, in the first case, we concluded that you should maximize a linear function of money, and in the second case, we concluded that you should maximize a logarithmic function of money, but in both cases, we assumed nothing about your preferences besides "more money is better", and the function you're told to maximize isn't necessarily your utility function as in the VNM representation theorem. The shape of the function you're told you should maximize comes from the assumptions behind the iteration, not from your actual preferences.
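The Kelly case can be illustrated with a small numerical sketch. The setup is an assumption made for illustration: repeated even-money bets with a hypothetical win probability of 0.6, where a fixed fraction of current wealth is staked each time. The function and variable names are hypothetical.

```python
import math


def expected_log_growth(fraction: float, p_win: float) -> float:
    """Expected log growth of wealth per even-money bet at a given stake fraction."""
    return p_win * math.log(1 + fraction) + (1 - p_win) * math.log(1 - fraction)


P_WIN = 0.6  # hypothetical win probability

# Search stake fractions 0.000, 0.001, ..., 0.999 for the best log growth.
grid = [i / 1000 for i in range(1000)]
best_fraction = max(grid, key=lambda f: expected_log_growth(f, P_WIN))

# Closed-form Kelly fraction for an even-money bet: p - (1 - p) = 2p - 1.
kelly_fraction = 2 * P_WIN - 1

print(best_fraction, kelly_fraction)
```

The grid search and the closed form agree on staking roughly a fifth of wealth. Maximizing expected wealth on each single bet would instead stake everything, since expected wealth is increasing in the stake whenever the win probability exceeds one half. That contrast is the point of the comment: the function to maximize comes from the structure of the iteration, not from the agent's preferences.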

Another good resource on this, which distinguishes the affectable universe, the observable universe, the eventually observable universe, and the ultimately observable universe: The Edges of Our Universe by Toby Ord.

What's your current credence that we're in a simulation?

I think that by count across all the possible worlds (and the impossible ones) the vast majority of observers like us are in simulations. And probably by count in our universe the vast majority of observers like us are in simulations, except that everything is infinite and so counting observers is pretty meaningless (which just helps to see that it was never the thing you should care about).

I'm not sure "we're in a simulation" is the kind of thing it's meaningful to talk about credences in, but it's definitely coherent to talk about betting odds (i.e. how ...

Philosophical Zombies: inconceivable, conceivable but not metaphysically possible, or metaphysically possible? 

Conceivable but not logically possible? (See also: l-zombies about which I feel similarly.)

Do you think progress has been made on the question of "which AIs are good successors?" Is this still your best guess for the highest impact question in moral philosophy right now? Which other moral philosophy questions, if any, would you put in the bucket of questions that are of comparable importance?

I'm not aware of anyone trying to work on that problem (but I don't follow academic philosophy so for all I know there's lots of relevant stuff even before my post). It's still at the top of my list of problems in moral philosophy. The most natural other question of similar importance is how nice we should be to other humans, e.g. how we should prioritize actions that involve leaving us better off and others worse off (either people different from us, people similar to us, governments that don't represent their constituents well, etc.). Neither of those questions is a single simple question (though the AI one feels more like a single simple question since it has so many aspects so different from what people normally think about), they are big clouds of questions that feel kind of core to the whole project of moral philosophy. (Obviously all of that is coming from a very consequentialist perspective, such that these questions involve a distinctive-to-consequentialists mix of axiology, decision theory, and understanding how moral intuitions relate to both.)