(Note: John discusses similar ideas here. We drafted this before he published his post, so some of the concepts might jar if you read it with his framing in mind.)
Traditionally, focus in Agent Foundations has been around the characterization of ideal agents, often in terms of coherence theorems that state that, under certain conditions capturing rational decision-making, an agent satisfying these conditions must behave as if it maximizes a utility function. In this post, we are not so much interested in characterizing ideal agents — at least not directly. Rather, we are interested in how true agents and not-so true agents may be classified and taxonomized, how agents and pseudo-agents may be hierarchically aggregated and composed out of subagents and how agents with different preferences may be formed and selected for in different training regimes. First and foremost, we are concerned with how unified goal-directed agents form from a not-quite-agentic substratum. In other words, we are interested in Selection Theorems rather than Coherence Theorems. (We take the point of view that a significant part of the content of the coherence theorems is not so much in the theorems or rationality conditions themselves but in the money-pump arguments that are used to defend the conditions.)
This post concerns how expected utility maximizers may form from entities with incomplete preferences.
The classical model of a rational agent assumes it has vNM-preferences. We assume, in particular, that the agent is complete — i.e., that for any options x,y, we have x≥y or y≥x. However, in real-life agents, we often see incompleteness — i.e., a preference for default states, or maybe a path-dependent preference, or maybe a sense that a preference between two options is yet to be defined; we will leave the precise meaning of incompleteness somewhat open for the purposes of this post. The aim of this post is to understand the selection pressures that push agents towards completeness. Here are the main reasons we consider this important:
Let's review John Wentworth's post "Why subagents?". John observes that inexploitability (= not taking sure losses) is not sufficient for being a vNM expected utility maximizer. An agent with incomplete preferences can be inexploitable without being an expected utility maximizer.
Although these agents are inexploitable, they can have path-dependent preferences. If John had grown up in Manchester he'd be a United fan. If he had grown up in Liverpool he'd be cracking the skulls of Manchester United fans. We could model this as a dynamical change of preferences ("preference formation"). Alternatively, we can model this as John having incomplete preferences: if he grows up in Manchester he loves United and wouldn't take the offer to switch to Liverpool. If he grows up in Liverpool he loves whatever the team in Liverpool is and doesn't switch to United. In other words, incompleteness is a frame to look at preference formation.
John Wentworth observes that an incomplete preference agent can be seen as composed of a collection of pure vNM agents having a default (action); it only changes that default action when all vNM subagents unanimously agree. You could say that an incomplete egregore/agent is like a 'vetocracy' of VNM subagents.
An incomplete preference ordering is just a partial order V on a space of options X.
We can now look at the set S of total orders T such that T extends V, meaning that if x<y according to V then x<y according to T. We know that V may be characterized as the intersection of all T in S.
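The characterization above can be checked by brute force on a small example. The following is a minimal sketch, assuming a finite option set and representing each order as a set of strict pairs `(x, y)` meaning x<y; the option names and the helper functions `transitive_closure` and `linear_extensions` are illustrative, not from the post.

```python
from itertools import permutations


def transitive_closure(pairs):
    """Return the transitive closure of a set of strict pairs (x, y), read as x < y."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def linear_extensions(options, strict_pairs):
    """Yield every total order T (as a set of strict pairs) that extends the partial order."""
    for ranking in permutations(options):
        position = {x: i for i, x in enumerate(ranking)}
        if all(position[x] < position[y] for (x, y) in strict_pairs):
            yield {(x, y) for x in ranking for y in ranking if position[x] < position[y]}


# Hypothetical diamond-shaped partial order: a < b < d and a < c < d, with b and c incomparable.
options = ["a", "b", "c", "d"]
V = transitive_closure({("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")})

S = list(linear_extensions(options, V))
intersection = set.intersection(*S)

print(len(S))             # number of total orders extending V
print(intersection == V)  # whether V is recovered as the intersection
```

Here b and c are ranked one way in some extensions and the other way in others, so neither comparison survives the intersection. Enumerating permutations is only feasible for very small X; it is meant as an illustration, not as a practical algorithm.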
A vetocracy (P, (T_i)_{i∈I}) is a pair of a pre-order P and a factorization P ≅ Π_{i∈I} T_i into total orders T_i, where we endow the product Π_{i∈I} T_i with the component-wise order: x ≥ y if x_i ≥ y_i for all i ∈ I. We call these factors tribunes.
In this model, a superagent is composed out of subagents. Examples might be a company that is composed out of its employees or a market arising as the aggregation of many economic actors. Another example is an agent through time, as in the Steward of Myselves, or the 'society of mind' models of the human mind, in which the mind is seen as composed of interacting subagents; see e.g. Multiagent Models of Mind.
An entity with incomplete preferences can be inexploitable (= does not take sure losses) but it generically leaves sure gains on the table.
As an example, consider Atticus the Agent. Atticus is a vetocracy made of two tribunes, one that prefers A>B and the other that prefers B>A. Suppose it is offered (by a third party) to switch A→B+$1 and then B→A+$1. If it took both trades, it would end up back at A with $2 extra: a sure gain. However, since one of its tribunes vetoes the first trade, this doesn't happen.
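The Atticus example can be played out in a few lines. This is a minimal sketch under assumptions that are not in the post: each tribune has a utility that is a fixed value for the item plus money, linear in money; the gap of 10 between the preferred and dispreferred item is arbitrary (any gap above $1 gives the same result); and a trade goes through only if no tribune loses and at least one gains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    item: str
    money: float


def make_tribune(item_values):
    """A tribune's utility: assumed value of the held item plus money."""
    return lambda s: item_values[s.item] + s.money


def vetocracy_accepts(tribunes, current, offered):
    """Accept only if no tribune is worse off and at least one is strictly better off."""
    gains = [u(offered) - u(current) for u in tribunes]
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)


def swap(frm, to, bonus):
    """Offer: if currently holding `frm`, switch to `to` and receive `bonus` dollars."""
    def offer(state):
        if state.item != frm:
            return None  # the offer does not apply in this state
        return State(to, state.money + bonus)
    return offer


def run_offers(tribunes, start, offers, accepts=vetocracy_accepts):
    state = start
    for offer in offers:
        proposed = offer(state)
        if proposed is not None and accepts(tribunes, state, proposed):
            state = proposed
    return state


tribunes = [
    make_tribune({"A": 10.0, "B": 0.0}),  # prefers A > B
    make_tribune({"A": 0.0, "B": 10.0}),  # prefers B > A
]
offers = [swap("A", "B", 1.0), swap("B", "A", 1.0)]
start = State("A", 0.0)

with_vetoes = run_offers(tribunes, start, offers)
without_vetoes = run_offers(tribunes, start, offers, accepts=lambda *_: True)

print(with_vetoes)     # where Atticus ends up when tribunes veto
print(without_vetoes)  # where Atticus would end up taking both trades
print(vetocracy_accepts(tribunes, with_vetoes, without_vetoes))
```

The last line checks that both tribunes would have endorsed the end result of taking both trades, even though the first step on the way there gets vetoed.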
If we could get the tribunes to cooperate with one another, the superagent would learn to accept the sure gain and become more complete. Moreover, once complete, it is subject to money-pumps — there is now selection pressure for it to move towards becoming a goal-directed (approximate) utility maximizer.
Consider, as an example, the case where one is placed behind a veil of ignorance about the default option, and one is told that there will be a 50% chance of being placed in a world where A is the default and one is offered to switch to B-plus-1-dollar, and a 50% chance of being placed in a world where B is the default and one is offered to switch to A-plus-1-dollar. An analogous situation could also effectively happen without a veil of ignorance for a logical decision theorist (see the discussion of Löbian cooperation below). The strategy where both tribunes enforce their veto on the choices offered leads to an outcome distribution of (50% A, 50% B), whereas the strategy where both tribunes forgo their veto rights leads to the same distribution but with 1 dollar extra. Bilateral vetoing is thus Pareto dominated in this setup as well. Note the contrast with the sequential example above: there, agents who foresee the pair of offers could just agree beforehand on a policy of accepting both, and this would simply make everyone better off. In this probabilistic case, there is no world in which everyone becomes better off without the vetoes, and yet there is still a Pareto sense in which an incomplete agent is leaving money on the table.
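To make the veil-of-ignorance comparison concrete, here is a small sketch that tabulates each tribune's expected utility under the four veto policies. The numbers are assumptions for illustration: each tribune values its favourite option at `G = 10` and the other at 0, money adds linearly, and a tribune's only decision is whether to veto in the world where its favourite is the default (the other tribune gains from the switch and has no reason to block it).

```python
from itertools import product

G = 10.0     # assumed utility gap between a tribune's favourite and the other option
BONUS = 1.0  # the dollar attached to each offered switch

FAVOURITE = {"t1": "A", "t2": "B"}


def utility(tribune, item, money):
    return (G if item == FAVOURITE[tribune] else 0.0) + money


def outcome(default, veto_t1, veto_t2):
    """Resolve the single offer made in the world with the given default."""
    if default == "A":
        # Offer is B + BONUS; only tribune 1 stands to lose, so only its veto matters.
        return ("A", 0.0) if veto_t1 else ("B", BONUS)
    # Offer is A + BONUS; only tribune 2 stands to lose.
    return ("B", 0.0) if veto_t2 else ("A", BONUS)


def expected_utilities(veto_t1, veto_t2):
    """Expected utility of (t1, t2) with a 50/50 chance of each default."""
    return tuple(
        sum(0.5 * utility(t, *outcome(w, veto_t1, veto_t2)) for w in ("A", "B"))
        for t in ("t1", "t2")
    )


payoffs = {
    (v1, v2): expected_utilities(v1, v2)
    for v1, v2 in product([True, False], repeat=2)
}

for (v1, v2), (u1, u2) in payoffs.items():
    print(f"t1 vetoes={v1!s:5} t2 vetoes={v2!s:5} -> t1: {u1:4.1f}, t2: {u2:4.1f}")

# Payoffs for tribune 1, labelled in the usual game-theoretic way.
temptation = payoffs[(True, False)][0]   # t1 vetoes, t2 does not
reward = payoffs[(False, False)][0]      # neither vetoes
punishment = payoffs[(True, True)][0]    # both veto
sucker = payoffs[(False, True)][0]       # t1 forgoes its veto, t2 vetoes
print(temptation > reward > punishment > sucker)
```

With these assumed numbers, forgoing both vetoes beats bilateral vetoing for both tribunes, yet each tribune does better still by vetoing unilaterally. The ordering in the last line holds whenever the gap `G` exceeds the bonus.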
In terms of the payoffs, the above reduces to a Prisoner's Dilemma in which vetoing corresponds to defecting and not vetoing corresponds to cooperating. The usual considerations for the Prisoner's Dilemma apply: the default is the Defect-Defect Nash equilibrium unless
Remark. Some takeaways: consideration number 3 suggests superagents with more homogeneous subagents are more coherent. Consideration number 4 suggests that doing 'internal conflict resolution and trust building' makes the superagent more coherent. This is highly reminiscent of the praxis of Internal Family Systems, see especially Kaj Sotala's sequence on IFS and multiagent models of minds.
Remark. Usually the modal Löbian cooperation is dismissed as not relevant for real situations, but it is plausible that Löbian cooperation extends far more broadly than what is currently proved. Löbian cooperation is stronger in cases where the players resemble each other and/or have access to one another's blueprint. This is arguably only very approximately the case between different humans, but it is much closer to being the case when we are considering different versions of the same human through time as well as subminds of that human.
Instead of the framing where completeness ought to come about first for any other selection pressures toward consistent preferences to apply (which we have operated within to some extent in the presentation above), a framing that is likely a better fit for the realistic situations is the following:
Simultaneously, there is a pressure to become complete and a pressure to become consistent. The pressure to become complete is the pressure of it being suboptimal to leave money on the table — to lose out on the many opportunities presented to one by the world. Insofar as one's computational bounds or other constraints prevent one from ensuring that one's preferences all hang together consistently, this pressure is balanced by a pressure not to be money-pumpable, which plausibly incentivizes one to avoid forming preferences — forming a preference comes with the danger of closing a cycle along which one can be money-pumped.
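The danger of "closing a cycle" can be illustrated with a simple reachability check. This is a sketch under the assumption that strict preferences are stored as directed pairs `(better, worse)`; the function names and the example options are hypothetical. A bounded agent that cannot afford this check over all of its preferences is in the position described above: each newly formed preference might complete a money-pumpable loop.

```python
def reachable(prefs, start, goal):
    """Is `goal` reachable from `start` by following (better, worse) pairs downward?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(worse for (better, worse) in prefs if better == node)
    return False


def closes_cycle(prefs, new_pref):
    """Adding `better > worse` closes a cycle iff `worse` already leads down to `better`."""
    better, worse = new_pref
    return reachable(prefs, worse, better)


# Hypothetical existing preferences: A > B and B > C.
prefs = {("A", "B"), ("B", "C")}

print(closes_cycle(prefs, ("C", "A")))  # would create A > B > C > A
print(closes_cycle(prefs, ("A", "C")))  # consistent with what is already there
print(closes_cycle(prefs, ("D", "A")))  # a new option ranked on top
```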
In conclusion, virtue ethics is a weakness of the will. Bold actors jump headfirst into the degenerate day trading of life - forever betwixt the siren call of FOMO and the pain of true loss.
In the vNM-theorem options are assumed to be 'lotteries', i.e. probability distributions on outcomes.
If one were maximally glib, one could say that all of AGI alignment can be considered a question of path-dependent preferences.
Or, more precisely, in canonical such arguments, completeness is established first, and the justification of every other coherence property relies on completeness. We do not wish to claim that completeness is necessarily a prerequisite — in fact, we will later gesture at a different perspective which places all coherence properties on more of an equal footing.
In this post we mostly think of X as finite for simplicity, although this is highly restrictive - recall that the VNM theorem assumes that we start with a preference ordering on Y=Δ(X) where Δ(X) is the space of probability distributions (or 'lotteries') on some sample space X.
In shard "theory" subagenty things are called shards. In IFS they are called parts. Part of the IFS dogma is having your different parts talk with one another and resolve conflict and thereby gain greater coherence of the superagent (you!). Clem von Stengel suggested the name crystal healing for this process as a joke. All blame for using it goes to us.
For further reflections on the ubiquity of Löbian cooperation, see here.
It seems like this is only the case if you apply the subagent vetocracy model. I agree that "an incomplete egregore/agent is like a 'vetocracy' of VNM subagents", however, this is not the only valid model. There are other models of this that would not leave sure gains on the table.
Oh, do please share.
The presence of a pre-order doesn't inherently imply a composition of subagents with ordered preferences. An agent can have a pre-order of preferences due to reasons such as lack of information, indifference between choices, or bounds on computation - this does not necessitate the presence of subagents. If we do not use a model based on composition of subagents with ordered preferences, in the case of "Atticus the Agent" it can be consistent to switch B -> A + 1$ and A -> B + 1$. Perhaps I am misunderstanding the claim being made here though.
I think the model of "a composition of subagents with total orders on their preferences" is a descriptive model of inexploitable incomplete preferences, and not a mechanistic model. At least, that was how I interpreted "Why Subagents?".
I read @johnswentworth as making the claim that such preferences could be modelled as a vetocracy of VNM rational agents, not as claiming that humans (or other objects of study) are mechanistically composed of discrete parts that are themselves VNM rational.
I'd be more interested/excited by a refutation on the grounds of: "incomplete inexploitable preferences are not necessarily adequately modelled as a vetocracy of parts with complete preferences". VNM rationality and expected utility maximisation is mostly used as a descriptive rather than mechanistic tool anyway.
I think you have misunderstood. In particular, you can still model agents that are incomplete because of e.g. bounded compute as vetocracies.
Oh I agree you can model any incomplete agents as vetocracies.
I am just pointing out that the argument:
Does not imply:
Suppose it is offered (by a third party) to switch and then B→A+$1
Seems incomplete (pun acknowledged). I feel like there's something missing after "to switch" (e.g. "to switch from A to B" or similar).
Another example is an agent through time where as in the Steward of Myselves
This links to Scott Garrabrant's page, not to any particular post. Perhaps you want to review that? I think you meant to link to: Tyranny of the Epistemic Majority.
This last paragraph will live in my head forever, but I'm confused how it relates to the rest of the post. Would you agree with the following rephrasing? "Forming incomplete preferences (and thus not optimizing fully) is the easy way out, as it avoids taking some sure losses. But in doing so it also necessarily loses out on sure gains."
Thank you! Yeah that's the gist.
[but should replace 'necessarily loses out on sure gains' with 'in (generically?) many environments loses out on sure gains']
Imagine John is going to have kids. He will like his kids. But, depending on random factors he will have different kids in different future timelines.
Omega shows up.
Omega: "hey John, by default if you have kids and then I offer your future self a reward to wind back time to actualize a different timeline where you have different kids, equally good from your current perspective, you will reject it. Take a look at this LessWrong post that suggests your hypothetical future selves are passing up Sure Gains. Why don't you take this pill that will make you forever indifferent between different versions of your kids (and equally any other aspects of those timelines) you would have been indifferent to given your current preferences?"
John: "Ah OK, maybe I'm mostly convinced, but I will ask simon first what he thinks"
simon: "Are you insane? You'd bring these people into existence, and then wipe them out if Omega offered you half a cent. Effectively murder. Along with everyone else in that timeline. Is that really what you want?"
John: "Oh... no of course not!" (to Omega): "I reject the pill!"
another hypothetical observer: "c'mon simon, no one was talking about murder in the LessWrong post, this whole thought experiment in this comment is irrelevant. The post assumes you can cleanly choose between one option and another without such additional considerations."
simon: "but by the same token the post fails to prove that, where you can't cleanly choose without additional considerations relevant to your current preferences, as in pretty much any real example involving actual human values, it is 'irrational' to decline making this sort of choice, or to decline self-modifying to do so. Maybe there's a valid point there about selection pressure, but that pressure is then to be fought, not surrendered to!"
In conclusion, virtue ethics is a weakness of the will.
You have shown nothing of the sort.