So the argument/characterization of the Nash bargaining solution is the following (correct?): The Nash bargaining solution is the (almost unique) outcome o for which there is a rescaling w of the utility functions such that both the utilitarian solution under rescaling w and the egalitarian solution under rescaling w are o. This seems interesting! (Currently this is a bit hidden in the proof.)

Do you show the (almost) uniqueness of o, though? You show that the Nash bargaining solution has the property, but you don't show that no other solution has this property, right?
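
To make the characterization concrete, here is a small numerical sketch of the direction that does seem to be shown (the Nash solution has the property). The setup is my own toy assumption and not taken from the post: two agents, disagreement point (0, 0), and a Pareto frontier u2 = 1 - u1^2. The rescaling w_i = 1/u_i(o) is the natural candidate.

```python
# Toy check: under the rescaling w_i = 1 / u_i(nash), the utilitarian and
# egalitarian solutions both coincide with the Nash bargaining solution.
# Assumed setup (illustrative only): disagreement point (0, 0) and the
# Pareto frontier u2 = 1 - u1**2, discretized on a grid.

N = 100_000
frontier = [(i / N, 1 - (i / N) ** 2) for i in range(1, N)]  # (u1, u2) pairs

# Nash bargaining solution: maximize the product of utilities.
nash = max(frontier, key=lambda u: u[0] * u[1])

# Rescale so that each agent gets utility 1 at the Nash solution.
w = (1 / nash[0], 1 / nash[1])

# Utilitarian solution under w: maximize the sum of rescaled utilities.
utilitarian = max(frontier, key=lambda u: w[0] * u[0] + w[1] * u[1])

# Egalitarian solution under w: maximize the minimum rescaled utility.
egalitarian = max(frontier, key=lambda u: min(w[0] * u[0], w[1] * u[1]))

print(nash, utilitarian, egalitarian)
```

This only illustrates existence of a suitable w for one example. It says nothing about whether some other outcome also admits such a rescaling, which is the uniqueness question above.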


I'd be interested in learning more about your views on some of the tangents:

>Utilities are bounded.

Why? It seems easy to imagine expected utility maximizers whose behavior can only be described with unbounded utility functions, for example.

>I think many phenomena that get labeled as politics are actually about fighting over where to draw the boundaries.

I suppose there are cases where the connection is very direct (drawing district boundaries, forming coalitions for governments). But can you say more about what you have in mind here?


>Not, they are in a positive sum

I assume the first word is a typo. (In particular, it's one that might make the post less readable, so perhaps worth correcting.)

I think in the social choice literature, people almost always mean preference utilitarianism when they say "utilitarianism", whereas in the philosophical/ethics literature people are more likely to mean hedonic utilitarianism. I think the reason for this is that in the social choice and somewhat adjacent game (and decision) theory literature, utility functions have a fairly solid foundation as a representation of preferences of rational agents. (For example, Harsanyi's "[preference] utilitarian theorem" paper and Nash's paper on the Nash bargaining solution make very explicit reference to this foundation.) Whereas there is no solid foundation for numeric hedonic welfare (at least not in this literature, but also not elsewhere as far as I know).

>Anthropically, our existence provides evidence for them being favored.

There are some complications here. It depends a bit on how you make anthropic updates (if you do them at all). But it turns out that the version of updating that "works" with EDT basically doesn't make the update that you're in the majority. See my draft on decision making with anthropic updates.

>Annex: EDT being counter-intuitive?

I mean, in regular probability calculus, this is all unproblematic, right? Because of the Tower Rule a.k.a. Law of total expectation or similarly conservation of expected evidence. There are also issues of updatelessness, though, which you touch on at various places in the post. E.g., see Almond's "lack of knowledge is [evidential] power" or scenarios like the Transparent Newcomb's problem wherein EDT wants to prevent itself from seeing the content of the boxes.
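
For reference, the two identities I have in mind, written out (the symbols X, Y, H, E are just my notation: X is any random variable, Y an observation, H a hypothesis, and E the evidence variable):

$$\mathbb{E}\big[\mathbb{E}[X \mid Y]\big] = \mathbb{E}[X], \qquad \sum_{e} P(E = e)\, P(H \mid E = e) = P(H).$$

The second is the first applied to the indicator of H: before looking at the evidence, the expected posterior equals the prior.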

>It seems plausible that evolutionary pressures select for utility functions broadly as ours

Well, at least in some ways similar to ours, right? On questions like whether rooms are better painted red or green, I assume there isn't much reason to expect convergence. But on questions of whether happiness is better than suffering, I think one should expect evolved agents to mostly give the right answers.

>to compare such maximizations, you already need a decision theory (which tells you what "maximizing your goals" even is).

Incidentally I published a blog post about this only a few weeks ago (which will probably not contain any ideas that are new to you).

>Might there be some situation in which an agent wants to ensure all of its correlates are Good Twins

I don't think this is possible.

There have been discussions of the suffering of wild animals. David Pearce discusses this, see one of the other comment threads. Some other starting points:

>As a utilitarian then, it should be far more important to wipe out as many animal habitats as possible rather than avoiding eating a relatively small number of animals by being a vegan.

To utilitarians, there are other considerations in assessing the value of wiping out animal habitats, like the effect of such habitats on global warming.

Nice post!

What would happen in your GPT-N fusion reactor story if you ask it a broader question about whether it is a good idea to share the plans? 

Perhaps relatedly:

>Ok, but can’t we have an AI tell us what questions we need to ask? That’s trainable, right? And we can apply the iterative design loop to make AIs suggest better questions?

I don't get what your response to this is. Of course, there is the verifiability issue (which I buy). But it seems that the verifiability issue alone is sufficient for failure. If you ask, "Can this design be turned into a bomb?" and the AI says, "No, it's safe for such and such reasons", then if you can't evaluate these reasons, it doesn't help you that you have asked the right question.

Sounds interesting! Are you going to post the reading list somewhere once it is completed?

(Sorry for self-promotion in the below!)

I have a mechanism design paper that might be of interest: Caspar Oesterheld and Vincent Conitzer: Decision Scoring Rules. WINE 2020. Extended version. Talk at CMID.

Here's a pitch in the language of incentivizing AI systems -- the paper is written in CS-econ style. Imagine you have an AI system that does two things at the same time:
1) It makes predictions about the world.
2) It takes actions that influence the world. (In the paper, we specifically imagine that the agent makes recommendations to a principal who then takes the recommended action.) Note that if the predictions are seen by humanity, they themselves influence the world. So even a pure oracle AI might satisfy 2, as has been discussed before (see end of this comment).
We want to design a reward system for this agent such that the agent maximizes its reward by making accurate predictions and taking actions that maximize our, the principals', utility.

The challenge is that if we reward the accuracy of the agent's predictions, we may set an incentive on the agent to make the world more predictable, which will generally not be aligned with maximizing our utility.

So how can we properly incentivize the agent? The paper provides a full and very simple characterization of such incentive schemes, which we call proper decision scoring rules:

We show that proper decision scoring rules cannot give the [agent] strict incentives to report any properties of the outcome distribution [...] other than its expected utility. Intuitively, rewarding the [agent] for getting anything else about the distribution right will make him [take] actions whose outcome is easy to predict as opposed to actions with high expected utility [for the principal]. Hence, the [agent's] reward can depend only on the reported expected utility for the recommended action. [...] we then obtain four characterizations of proper decision scoring rules, two of which are analogous to existing results on proper affine scoring [...]. One of the [...] characterizations [...] has an especially intuitive interpretation in economic contexts: the principal offers shares in her project to the [agent] at some pricing schedule. The price schedule does not depend on the action chosen. Thus, given the chosen action, the [agent] is incentivized to buy shares up to the point where the price of a share exceeds the expected value of the share, thereby revealing the principal's expected utility. Moreover, once the [agent] has some positive share in the principal's utility, it will be (strictly) incentivized to [take] an optimal action.
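
The share-purchase interpretation in the quote can be sketched in a few lines. This is a toy instance under my own assumptions, not the paper's notation: utilities lie in [0, 1], the marginal price of the q-th share is q (a linear price schedule), and all function names are made up for illustration.

```python
# Toy share-purchase mechanism (illustrative assumptions: utilities in [0, 1],
# marginal share price at quantity q equal to q, hypothetical function names).

def cost(q):
    """Total price of q shares when the marginal price at quantity x is x."""
    return q * q / 2


def expected_reward(q, mu):
    """Agent's expected reward from holding q shares of a project whose
    expected utility for the principal is mu."""
    return q * mu - cost(q)


def best_quantity(mu, steps=1000):
    """Quantity (on a grid over [0, 1]) that maximizes the expected reward."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_reward(q, mu))


# The agent buys shares until the marginal price reaches mu, which reveals mu.
for mu in (0.3, 0.7):
    q = best_quantity(mu)
    print(mu, q, expected_reward(q, mu))
```

In this instance the optimal purchase equals the expected utility mu, and the resulting expected reward, mu^2 / 2, is increasing in mu. So the agent reveals the expected utility through its purchase and prefers recommending the action with the higher expected utility, while the price schedule itself never depends on the action.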

Also see Johannes Treutlein's post on "Training goals for large language models", which also discusses some of the above results among other things that seem like they might be a good fit for the reading group, e.g., Armstrong and O'Rourke's work.

My motivation for working on this was to address issues of decision making under logical uncertainty. For this I drew inspiration from the fact that Garrabrant et al.'s work on logical induction is also inspired by market design ideas (specifically prediction markets).

>Because there's "always a bigger infinity" no matter which you choose, any aggregation function you can use to make decisions is going to have to saturate at some infinite cardinality, beyond which it just gives some constant answer.

Couldn't one use a lexicographic utility function that has infinitely many levels? I don't know exactly how this works out technically. I know that maximizing the expectation of a lexicographic utility function is equivalent to the vNM axioms without continuity, see Blume et al. (1989). But they only mention the case of infinitely many levels in passing.
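
For the finitely-many-levels case, here is a minimal sketch of what I mean by maximizing the expectation of a lexicographic utility function. The lotteries and the helper name are made up for illustration, and the sketch does not address the infinitely-many-levels case, which is exactly the part I am unsure about.

```python
from fractions import Fraction


def expected_lex_utility(lottery):
    """Componentwise expectation of a lottery given as a list of
    (probability, utility_tuple) pairs. The resulting tuples are compared
    lexicographically, which is how Python compares tuples by default."""
    levels = len(lottery[0][1])
    return tuple(sum(p * u[k] for p, u in lottery) for k in range(levels))


half = Fraction(1, 2)

# Hypothetical lotteries over two-level utilities (level 0 has priority).
lottery_a = [(half, (1, 0)), (half, (0, 100))]
lottery_b = [(Fraction(1), (half, 0))]
lottery_c = [(Fraction(1), (Fraction(51, 100), 0))]

eu_a = expected_lex_utility(lottery_a)
eu_b = expected_lex_utility(lottery_b)
eu_c = expected_lex_utility(lottery_c)

# a and b tie on level 0, so level 1 decides in favor of a.
print(eu_a > eu_b)
# c is slightly better on level 0, which no amount of level 1 can offset.
print(eu_c > eu_a)
```

The second comparison is where continuity fails: no probability mixture on the lower level compensates for an arbitrarily small deficit on the higher one.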

Cool that this is (hopefully) being done! I have had this on my reading list for a while and since this is about the kind of problems I also spend much time thinking about, I definitely have to understand it better at some point. I guess I can snooze it for a bit now. :P Some suggestions:

Maybe someone could write an FAQ page? Also, a somewhat generic idea is to write something that is more example based, perhaps even something that just solely gives examples. Part of why I suggest these two is that I think they can be written relatively mechanically and therefore wouldn't take that much time and insight to write. Also, maybe Vanessa or Alex could also record a talk? (Typically one explains things differently in talks/on a whiteboard and some people claim that one generally does so better than in writing.)

I think for me the kind of writeup that would have been most helpful (and maybe still is) would be some relatively short (5-15 pages), clean, self-contained article that communicates the main insight(s), perhaps at the cost of losing generality and leaving some things informal. So somewhere in between the original intro post / the content in the AXRP episode / Rohin's summary (all of which explain the main idea but are very informal) and the actual sequence (which seems to require wading through a lot of intrinsically not that interesting things before getting to the juicy bits). I don't know to what extent this is feasible, given that I haven't read any of the technical parts yet. (Of course, a lot of projects have this presentation problem, but I think usually there's some way to address this. E.g., compare the logical induction paper, which probably has a number of important technical aspects that I still don't understand or forgot at this point. But there, by making a lot of things a bit informal, the main idea can be grasped from the short version, or from a talk.)
