Status: toward the end of writing this, I started reading Suffering-Focused Ethics by Magnus Vinding, as well as more Brian Tomasik, and I feel myself value-drifting in a more negative direction. It's possible that I will soon endorse none of what follows.

If you want to link to the higher-order freedoms formalization separately from the context of this post, just message me and I'll set it up as its own post.

Special thanks to Nick Ronkin and Nicholas Turner for their comments.


It recently occurred to me that I can't be expected to calculate my behavior if I haven't put in the work of figuring out what I'm fighting for and performing backward induction.

Another word for figuring out what I'm fighting for is articulating my terminal values, which in practice I think looks like painting dreams and hopes of what I think sentience ought to do. I will call this a specific notion of winning (SNoW). Crucially, though I'm seeking more detail, I already know that one instrumental value is the world not dying, the great sentience project (on Earth) not failing, etc. An argument against writing this post is the following: why be self-indulgent, writing amateur philosophy/sci-fi, when you've already converged upon an (intermediary) goal that implies several behaviors?

[Figure: backward induction 1]

You can imagine amputating the SNoW node in the picture and working entirely from the middle node, just as, in a sense, the limits of my abstraction or foresight amputated an even further-right node in the first place (if you believe that no value is really terminal). However, I think there is a property inherent to backward induction that isn't captured by the syntax: as you move from right to left you're adding detail and motivation, and every arrow brings complexity to its head node.

There is also the colleague property of this setup to consider: having a transhumanist/utopian terminal value endows me with the instrumental value of everything not blowing up, which in turn gives me the privilege of calling many of you my colleagues.

[Figure: backward induction 2]

Indeed, it would not be shocking if Alice and Bob, having abolished x-risk and secured the future, were then at each other's throats because they disagreed so fundamentally about where to go from there. Natural questions arise: are they colleagues in the face of x-risk? Ought they to be? I will not attempt to answer the latter, but my feeling about the former is that the answer is yes. (Note: "abolished" is a strong word here; "reasonably mitigated in perpetuity" may be more realistic.)

Again, you can amputate the rightmost column from the graph and still have a veritable machine for generating behaviors. So why do I indulge in this post? Because I've tried to calculate my behavior strictly from the world-not-ending node, and I've gotten stuck. I have a messy and confused view of my expected impact, and I don't know how I should be spending my time. My hypothesis is that filling out detail further to the right of the graph will give me information that empowers my thinking. Having spent a great deal of time believing the argument against writing this post, I've been reticent to invest in my ideas, my visions, my dreams. I'm guessing this was a mistake: the lack of information that ought to come from the right side of the graph leaves empty patches (sorrys) in every node and arrow that remains, leading to a sloppy calculation.

Another point comes from the driving insight behind Fun Theory: people have to want something in order to fight for it, so encouraging people to imagine transhuman futures they would actually want to live in could be an important part of recruiting allies.

Useful Exercise: What does the world look like if you win?

About a year ago, just before the plague, I went to a networking lunch with an EA. Very quickly, she asked me, "What does the world look like if you win?" I was taken aback by the premise of the question; the idea that you could just think about that shocked me for some reason. I think this was because I was so mired in the instrumental convergence to goals that aren't personal visions but shared visions, and I believed it would be self-indulgent to go further right on the graph.

In any case, I think this is a valuable exercise. I got a group together at EA Philly to do it collectively, and even derailed a party once with it.

Anyway, when initially asked I probably mumbled something about bringing autonomy and prosperity to all, because I didn't have a lot of time to think about it. It was approximately the next day I thought seriously about questions like "why is prosperity even good?", "what does it mean to maximize autonomy?", and came up with a semi-satisfying model that I think is worth writing down.

Against Knowing What You're Fighting For

If you're buying the premise of this post, let's take a moment to consider a counterpoint: in the Replacing Guilt arc, Nate Soares includes a post called "You don't get to know what you're fighting for".

If someone were to ask you, "hey, what's that Nate guy trying so hard to do," you might answer something like "increase the chance of human survival," or "put an end to unwanted death" or "reduce suffering" or something. This isn't the case. I mean, I am doing those things, but those are all negative motivations: I am against Alzheimer's, I am against human extinction, but what am I for? The truth is, I don't quite know. I'm for something, that's for damn sure, and I have lots of feelings about the things that I'm fighting for, but I find them rather hard to express.

Soares writes that what we care about is an empirical matter, but that human introspection isn't yet sophisticated enough to bring those facts into our epistemic state. He looks forward to a day when values are mapped and people can know what they're fighting for, but feasibility is only one component; there is also humility, the possibility that one's articulation is wrong. Soares seems to believe that under uncertainty negative goals are, as a rule, easier to justify than positive goals. He emphasizes the simple ability to be wrong about positive values, but when it comes to the urgent and obvious matters of Alzheimer's or extinction he does not highlight anything like that. I think this is reasonable. Indeed, activists implicitly know this: you see them protest against existing things more than you see them build new things, because they don't want to open themselves up to the comparatively greater uncertainty, or because they find it harder to build teams under that uncertainty. Moreover, you inherently know more about the consequences of existing things than of potential things; trying to bring about something that doesn't exist yet is much closer to making a bet than trying to stop something from existing.

But there's also a more general note here about value drift (or one interpretation of it among many). You can easily imagine things looking different as you get closer to them, not least because you gain knowledge and clarity over time. Additionally, as the consequences of your steps propagate through the world, you may find premises of the goal suddenly violated. Much is stacked against your positive goals maintaining fidelity as you work toward them. Soares points out, "The goal you think you're pursuing may well not survive a close examination." His example is total hedonic utilitarianism: the asymmetry between how easy it is to claim allegiance to it and how difficult it is to define "mind" or "preference", decide on processes for extracting preferences from a mind, settle on a population ethics, etc. Of course, one could naively think they've solved the "positive goals are slippery" problem just by taking these specific critiques and putting a lot of thought into them, but I think it's at least slightly less naive to think about meta-values: the premises on top of which valuers can come along, value stuff, reason about why they value it, and so on. I will say more about meta-values later.

Higher-order Freedoms

Before I can describe my specific notion of winning (SNoW), I need to explain something. It appeared to me as "original", though I have no idea if it's "novel", and it forms the core of my win condition.


What does it mean to maximize autonomy? Why is prosperity even good?

I want my account of autonomy to have:

  • qualitative properties, where we ask "what kind?" of autonomy
  • quantitative properties, where we ask "how much?" autonomy

And ideally we won't do this as two separate steps!

I'll be content for my account of prosperity to be thoroughly dependent on autonomy.

The formalism

English first

The order of a freedom is the number of other freedoms associated with it.


  • We will take options to be discrete, but it should be generalizable to a continuous setting.
  • We shall model a decision (to-be-made) as a set of actions representing your available options.
  • Associated with each option is a PMF assigning probabilities to consequences of actions.
  • A consequence may be either a terminal outcome or another decision.
  • A decision's consequence set is the union of the consequences of all its options.
  • The consequence set of an outcome is empty.
  • We define interesting in the following way: a decision is interesting when most of its actions lead to more decisions most of the time.
  • We'll call a chain of options representing decisions-made and terminating in an outcome a questline.
  • Notation. We define a questline as follows: for options $a_1, \ldots, a_{n-1}$, where each $a_{i+1}$ is in $a_i$'s consequence set and $a_n$ is an outcome, the questline is $a_1 \rightarrow a_2 \rightarrow \cdots \rightarrow a_n$.
  • The order of a questline is its length, or the number of $\rightarrow$s plus one.
  • A decision can be filtered by reminding the agent of subsequent goals. For example, as the agent ponders the options available at the first step, they're considering not only bringing about $a_2$ but ultimately bringing about $a_n$ as well, so if some of those options are contrary to $a_n$, the agent has the foresight not to select them.
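The definitions above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the formalism: the class and function names are hypothetical, "most" is read as "more than half", and `can_reach` assumes a finite, acyclic tree (the infinite environments discussed later would need a depth limit). Each option maps to a PMF over consequences, a consequence is either an `Outcome` or another `Decision`, and `filter_options` plays the role of the foresight filter, keeping only options with some chance of leading to a later goal.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class Outcome:
    """A terminal outcome; its consequence set is empty."""
    name: str


@dataclass(eq=False)  # identity-based hashing so decisions can be PMF keys
class Decision:
    """A decision to be made: option name -> PMF over consequences."""
    name: str
    options: Dict[str, Dict[Union[Decision, Outcome], float]] = field(default_factory=dict)


Node = Union[Decision, Outcome]


def consequence_set(node: Node) -> Set[Node]:
    """Union of the (nonzero-probability) consequences of all options."""
    if isinstance(node, Outcome):
        return set()
    return {c for pmf in node.options.values() for c, p in pmf.items() if p > 0}


def is_interesting(decision: Decision, most: float = 0.5) -> bool:
    """Most options lead to another decision most of the time ("most" read as > 50%)."""
    if not decision.options:
        return False
    leads_on = [
        sum(p for c, p in pmf.items() if isinstance(c, Decision)) > most
        for pmf in decision.options.values()
    ]
    return sum(leads_on) / len(leads_on) > most


def questline_order(questline: List[str]) -> int:
    """Order = number of arrows plus one = number of steps in the chain."""
    return len(questline)


def can_reach(node: Node, goal: Outcome) -> bool:
    """Whether some chain of consequences leads to `goal` (finite, acyclic tree assumed)."""
    if isinstance(node, Outcome):
        return node == goal
    return any(can_reach(c, goal) for c in consequence_set(node))


def filter_options(decision: Decision, goal: Outcome) -> List[str]:
    """Foresight filter: drop options with no chance of leading toward the later goal."""
    return [
        name for name, pmf in decision.options.items()
        if any(p > 0 and can_reach(c, goal) for c, p in pmf.items())
    ]


# Hypothetical piano-learning tree, for illustration only.
play = Outcome("play beautifully")
give_up = Outcome("give up")
practice = Decision("practice?", {"practice": {play: 0.8, give_up: 0.2}, "quit": {give_up: 1.0}})
buy = Decision("buy piano?", {"buy piano": {practice: 1.0}, "spend on snacks": {give_up: 1.0}})
work = Decision("what now?", {
    "work": {buy: 0.9, give_up: 0.1},
    "take a class": {practice: 1.0},
    "stay home": {give_up: 1.0},
})

print(is_interesting(work), is_interesting(practice))  # True False
print(filter_options(work, play))  # ['work', 'take a class']
```

Using `eq=False` on `Decision` keeps Python's default identity hashing, so two decisions with the same name stay distinct nodes, while outcomes compare by name.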


  • Bob is living on subsistence UBI. He wants to go jetskiing. He'll need to get a job and go jetskiing on his day off, because activities like that aren't covered under the definition of subsistence. Write Bob's questline and state its order ::: work $\rightarrow$ jetskiing, order 2. :::
  • Alice lives in a town with 3 brands of toothpaste at CVS. Bob lives in a town with 7 brands of toothpaste at CVS. Which one has more freedom? ::: They have the same amount of freedom. :::
  • Alice wants to play piano. Like Bob, she is living on a subsistence UBI. Write a questline and state its order ::: work $\rightarrow$ buy piano $\rightarrow$ practice $\rightarrow$ play beautifully, order 4 :::
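The exercise answers can be checked mechanically with a minimal sketch. It assumes a questline is just an ordered list of step names and that each toothpaste purchase is a one-step questline ending directly in an outcome; those modeling choices and the helper names are illustrative, not part of the post.

```python
from typing import List


def questline_order(questline: List[str]) -> int:
    """Order of a questline: the number of arrows plus one."""
    arrows = len(questline) - 1
    return arrows + 1


def max_order(questlines: List[List[str]]) -> int:
    """Highest order among the questlines on offer."""
    return max(questline_order(q) for q in questlines)


bob = ["work", "jetskiing"]
alice = ["work", "buy piano", "practice", "play beautifully"]

# Assumption: picking a brand is a one-step questline that ends the decision.
town_with_3_brands = [[f"buy brand {i}"] for i in range(3)]
town_with_7_brands = [[f"buy brand {i}"] for i in range(7)]

print(questline_order(bob), questline_order(alice))  # 2 4
print(max_order(town_with_3_brands) == max_order(town_with_7_brands))  # True
```

More brands widen the menu at order 1 without adding any higher-order freedom, which is why the two towns come out equal under this measure.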

Issue from factoring

You may have noticed from the piano example that the granularity level is subjective. In short, every finite list is hiding a much longer list by generalizing steps, suppressing detail, and clustering steps together. The step buy piano could easily be unpacked, or factored, into select piano at store $\rightarrow$ buy piano $\rightarrow$ move piano into house (though I think the limit here is something like the quark level, so we don't enter infinity). You're wondering whether we have a principled way of selecting a granularity level, right? The way we proceed will have the following properties:

  • We want to suppress detail so that an action is at the abstraction level most useful to the agent
  • We want to emphasize interesting decisions, modulo filtering them with respect to information from the right side in the backward induction syntax. I.e. if a decision is interesting but contrary to some later goal, it can easily be ignored.
  • We are free to imagine a personal assistant AI that automates some of the process of filtering decisions with respect to information from the right side in the syntax, and suppressing uninteresting decisions. Indeed, later we'll see that such a personal assistant plays a crucial role in ensuring that people are actually happy in a world of higher-order freedoms.
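The subjectivity of granularity is easy to see in a short sketch (helper names are hypothetical). `factor` unpacks a step into finer sub-steps, and `suppress` stands in for the agent, or an imagined personal assistant, dropping steps judged uninteresting; the same questline comes out at order 4 or order 6 depending on the chosen level of detail.

```python
from typing import Dict, Iterable, List


def questline_order(questline: List[str]) -> int:
    """Order = number of steps (arrows plus one)."""
    return len(questline)


def factor(questline: List[str], expansions: Dict[str, List[str]]) -> List[str]:
    """Unpack every step that has an expansion into its finer-grained sub-steps."""
    refined: List[str] = []
    for step in questline:
        refined.extend(expansions.get(step, [step]))
    return refined


def suppress(questline: List[str], uninteresting: Iterable[str]) -> List[str]:
    """Drop steps flagged as uninteresting, moving back up the abstraction ladder."""
    skip = set(uninteresting)
    return [step for step in questline if step not in skip]


alice = ["work", "buy piano", "practice", "play beautifully"]
fine_grained = factor(alice, {
    "buy piano": ["select piano at store", "buy piano", "move piano into house"],
})

print(questline_order(alice), questline_order(fine_grained))  # 4 6
print(suppress(fine_grained, ["select piano at store", "move piano into house"]) == alice)  # True
```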

My Specific Notion of Winning

If I win, the freedoms of the world will be increasing in order. I think the state of human cognition and society so far imposes an upper bound on the order of freedoms, and that the rate of progress is best understood as the first derivative of this upper bound.
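To make "rate of progress" concrete, here is a minimal sketch, assuming the upper bound on freedom order could be observed at discrete time steps. The function names and the snapshot values are hypothetical; the code takes a first difference (a discrete first derivative) and separately checks whether that difference is itself rising, which is one reading of wanting the first derivative "to be increasing".

```python
from typing import List, Sequence


def rate_of_progress(upper_bounds: Sequence[float]) -> List[float]:
    """Discrete first derivative: change in the upper bound between consecutive snapshots."""
    return [later - earlier for earlier, later in zip(upper_bounds, upper_bounds[1:])]


def is_accelerating(upper_bounds: Sequence[float]) -> bool:
    """True when each period's progress strictly exceeds the previous period's."""
    rates = rate_of_progress(upper_bounds)
    return all(r2 > r1 for r1, r2 in zip(rates, rates[1:]))


# Hypothetical snapshots of the highest accessible freedom order (illustrative numbers only).
bounds = [4, 5, 7, 10]
print(rate_of_progress(bounds))  # [1, 2, 3]
print(is_accelerating(bounds))  # True
```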

I said I would derive prosperity from my account of autonomy; here it is: prosperity is simply the accessibility of higher-order freedoms.

Deriving altruism

It's easy for me to say I want the first derivative of the upper bound of orders of freedoms to be increasing for all, but are incentives aligned such that selfish agents take an interest in the liberties of all?

Idea: higher-order freedoms intertwine individuals

Intuitively, one lever for increasing the interestingness of my options is having colleagues who have interesting options, and the more potential colleagues I have, the better my choices of whom to collaborate with can be. Therefore, a self-interested agent may seek to increase freedoms for others. Besides, a society organized around maximizing higher-order freedoms would retain a property of today's society on Earth: one person's inventions can benefit countless others.

There is, of course, the famous Stephen Jay Gould quote:

I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops.

Thus an unequal world is a sub-optimal world in the framework of higher-order freedoms, and even selfish individuals are incentivized to fight for bringing higher-order freedoms to others.

Criticism from scarcity of computation

Maximizers and satisficers

Satisficers find a "good enough" option, unlike maximizers, who look for the "best" option. From Wikipedia: "satisficing is a decision-making strategy or cognitive heuristic that entails searching through the available alternatives until an acceptability threshold is met."

[Figure: satisficing and maximizing]
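The difference between the two strategies can be sketched as two search loops that also count how many evaluations they spend. The function names, the identity utility, and the threshold are illustrative assumptions; the satisficer follows the Wikipedia definition quoted above (stop at the first option that meets an acceptability threshold), while the maximizer has to evaluate everything.

```python
from typing import Callable, List, Optional, Tuple


def maximize(options: List[int], utility: Callable[[int], float]) -> Tuple[Optional[int], int]:
    """Evaluate every option; return the best one and the number of evaluations spent."""
    best, best_u, evaluations = None, float("-inf"), 0
    for option in options:
        evaluations += 1
        u = utility(option)
        if u > best_u:
            best, best_u = option, u
    return best, evaluations


def satisfice(options: List[int], utility: Callable[[int], float],
              threshold: float) -> Tuple[Optional[int], int]:
    """Return the first option meeting the acceptability threshold (None if none does)."""
    evaluations = 0
    for option in options:
        evaluations += 1
        if utility(option) >= threshold:
            return option, evaluations
    return None, evaluations


def identity_utility(option: int) -> float:
    """Hypothetical utility: an option is worth its own value."""
    return float(option)


options = [3, 8, 6, 9, 2, 10, 7]  # hypothetical option values

print(maximize(options, identity_utility))      # (10, 7)
print(satisfice(options, identity_utility, 8))  # (8, 2)
```

The satisficer gives up some utility (8 instead of 10) in exchange for spending 2 evaluations instead of 7, which is the trade the following sections lean on.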

Meanwhile, there is a literature on overchoice, defined by Alvin Toffler: "[Overchoice takes place when] the advantages of diversity and individualization are canceled by the complexity of buyer's decision-making process." Sadly, as you'll see on Wikipedia, the concept comes with a disclaimer that it hasn't been adequately replicated. I am not going to attempt a rigorous literature review to figure out which parts of it we should keep and which we shouldn't, but below I will engage with four intuitive points because they challenge me to think through the ramifications of my framework.

Societies can maximize while individuals satisfice

An essay called Harmful Options appeared in the Fun Theory Sequence. In it, Eliezer pointed out that options bring computational obligations with them: "Bounded rationalists can easily do worse with strictly more options, because they burn computing operations to evaluate them." Speaking personally, I've gotten pretty good at knowing when to maximize and when to satisfice. A demon who wants to exhaust my resources might do so by offering me more choices, but I'm only susceptible if I'm blindly maximizing, that is, if I'm incapable of saying to myself at any decision, "I meta-prefer to conserve resources right now rather than figure out what I prefer, so I'll just take whatever."

Put another way, consider the following edge case. Suppose an individual wanted to bring about a higher upper bound on the order of freedoms in their society, so they started by just trying to maximize their autonomy in their everyday life. Consider the limiting case: an agent who wants to maximize their autonomy in this higher-order freedoms setup and who finds themselves in an infinite environment. Suppose also that an arbitrary foresight/simulation mechanism allows them to plan their entire questline before taking a step. Notice that every time they deliberate between a terminal outcome and another decision, they will choose the decision. So a tension emerges between allowing any questline to complete and maximizing autonomy. In this example, the agent will just plan forever and never act. Can you avert the planning-forever outcome by removing one supposition? :::spoiler The first one, maximizing personal autonomy :::

And indeed we don't need this supposition: it's certainly not clear that the best way to boost the upper bound for society at large is to maximize your personal freedom at every step, but this is a natural mistake to make.
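The planning-forever edge case can be simulated under a few stated assumptions: a large step budget stands in for the infinite environment, every step offers a choice between a terminal outcome and another decision, and the satisficer's "good enough" order of 4 is arbitrary. All names are hypothetical. The autonomy maximizer always prefers another decision, so it never completes a questline; the satisficer completes one as soon as it is content.

```python
from typing import Callable, Optional

Policy = Callable[[int], str]


def autonomy_maximizer(depth: int) -> str:
    """Always keeps options open: another decision beats any terminal outcome."""
    return "decision"


def make_satisficer(good_enough: int) -> Policy:
    """Takes the outcome once the questline has reached a satisfactory order."""
    def policy(depth: int) -> str:
        return "outcome" if depth >= good_enough else "decision"
    return policy


def run(policy: Policy, budget: int) -> Optional[int]:
    """Order of the completed questline, or None if still planning when the budget runs out."""
    for depth in range(1, budget + 1):
        if policy(depth) == "outcome":
            return depth
    return None  # never acted within the budget


print(run(autonomy_maximizer, budget=10_000))  # None
print(run(make_satisficer(4), budget=10_000))  # 4
```

Raising the budget never helps the maximizer, which is the point of the edge case: the fix is to drop the per-step maximization, not to add more time.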

There can clearly be a divergence on the maximizer-satisficer axis between the societal scale and the individual scale. I'm proposing that the societal scale should be maximizing (trying to get the highest possible upper bound on the order of freedoms) while the individual is satisficing.

Tyranny of choice

Psychologist Barry Schwartz believes that welfare and freedom diverge beyond a certain mass of choices. In this short article, he outlines four reasons for this:

  1. Buyer's remorse: having had more choices makes you wonder more whether you made the right decision.
  2. Opportunity cost: When there are lots of alternatives to consider, it is easy to imagine the attractive features of alternatives you reject that makes you less satisfied with the option you've chosen.

     With fewer choices under consideration, a person will have fewer opportunity costs to subtract.

  3. Escalation of expectations: suppose you invest $c$ units of computation into your preferences because you're faced with a decision of $n$ options. Schwartz suggests that the amount of satisfaction you'll expect is some $f(c)$, where $f$ is increasing in $c$. In a world of higher $n$s, $c$s will need to be higher, making your expectations much higher indeed.
  4. Shifting the blame: When you're dissatisfied and you didn't have many options, you can blame the world. When you had a lot of options and you're dissatisfied, the problem must have been in your computation of preferences.

Schwartz is, of course, studying humans without augmented cognition. I suggest we extract a principle from these conclusions: the amount of comfortable freedom (the amount of freedom beyond which freedom starts to diverge from welfare) depends on the cognitive abilities of the agents in question. I'd go a step further and suggest that augmented cognition and social technologies are needed to help people dodge these failure modes.

Is my SNoW hostile to people who fall on the maximizer side of the spectrum?

I think if a world implemented my SNoW, there would be a big risk of people who tend toward the maximizer end being left behind. We need various cognitive and social technologies in place to help maximizers thrive. One example would be parseability enhancers that aid in compression and filtering. I don't have a detailed picture of what these look like, but I anticipate the need for them.

Again, at a high level, overchoice literature isn't necessarily replicating

To be inspired to do a more rigorous literature review, I would have to see an opportunity to implement a cognitive or social technology that I think would drag either the mean or the upper bound of freedom order higher in my community, society, or planet. Again, I included Schwartz's four points because they are the kind of objections I'd expect to arise intuitively or philosophically anyway.

When is Unaligned AI Morally Valuable?

Paul Christiano defined a good successor as follows:

an AI is a good successor if I believe that building such an AI and “handing it the keys” is a reasonable thing to do with the universe.

Exercise: take at least ten minutes to write down your own good successor criteria.

My good successor criterion is synchronized with my SNoW

If you've gotten this far, you should be able to see what I'm about to claim.

I am willing to hand the keys of the universe over to an AI society that can implement my SNoW better than humans can. If it turns out that humans run up against the physical limits on the order of their freedoms sooner, or with more friction, than the AIs do, then I think the AIs should inherit the cosmic endowment; and if they meet or create a civilization that can seize higher-order freedoms with less friction than they can, then they ought to hand over the keys in turn.


In my view, it is natural to ask, "What's wrong with paperclippers winning? Surely if they're propagating value in the universe it would be racist to think this was some tragedy, right?", and I claim that taking this seriously has been one of the most nourishing exercises for my growth. I will refer to people who feel that the obvious answer is "as a human I want humans to win!" as provincialists, in the sense of provincialism as "The act or an instance of placing the interests of one's province before one's nation", as suggested by language in the Value Theory sequence (where, in the metaphor, sentient, freedom-seizing creatures are the nation and humanity is the province).

Eliezer provided a word of caution about this exercise in the Value Theory sequence:

We can't relax our grip on the future - let go of the steering wheel - and still end up with anything of value. And those who think we can - they're trying to be cosmopolitan. I understand that. I read those same science fiction books as a kid: The provincial villains who enslave aliens for the crime of not looking just like humans. The provincial villains who enslave helpless AIs in durance vile on the assumption that silicon can't be sentient. And the cosmopolitan heroes who understand that minds don't have to be just like us to be embraced as valuable -

The broader point is not only that the values we would recognize as valuable are negligible points in the space of possible values (let that sink in if the thought isn't already familiar to you), but also that steering doesn't necessarily mean clinging to provincial values. If you give a human a steering wheel, they are not obligated to drive only in their comfort zone; they have in fact been known to go across town to a neighborhood they've never been to before.

To change away from human morals in the direction of improvement rather than entropy, requires a criterion of improvement; and that criterion would be physically represented in our brains, and our brains alone.

While I'm not sure I totally get the part about the brain yet, I think my SNoW/good successor criterion is a reasonable candidate for such a "criterion of improvement".

I want to be abundantly clear: forming coalitions with provincialists may be crucial, since humans may remain the best at seizing freedoms. I think designing AIs that preserve my SNoW is at least linearly harder than solving any of the alignment problems. This post is not the start of an alt-LW millenarian faction; indeed, you could convince me that allocating research effort to ensuring that AIs are prosperous under this definition of prosperity does more harm than good.


I will not be publicly performing backward induction at this time, but I'll just say I'm seeing gains in clarity of thinking about goals and behaviors since I sat down to knock out this post!

I suggest you treat anything interesting in this post as an invitation to do an exercise, whether that's articulating some positive vision of what you'd like to see after x-risk or tackling the question of when unaligned AI is morally valuable. (I'm especially curious whether anyone but me thinks those two exercises are highly related.)

Notice: I didn't do the Fun Theory exercise of writing what an average day would be like in my win condition. This is because of time and talent constraints!

Written at CEEALAR.

1 comment

Thank you for sharing this; there are several useful conceptual tools in here. I like the way you've found crisply different adjectives to describe different kinds of freedom, and I like the way you're thinking about the computational costs of surplus choices. 

Building on that last point a bit, I might say that a savvy agent who has already evaluated N choices could try to keep a running estimate of their expected gains from choosing the best option available after considering X more choices and then compare that gain to their cost of computing the optimal choice out of X + N options. Right, like if the utility of an arbitrary choice follows anything like a normal distribution, then as N increases, we expect U(N+X) to have tinier and tinier advantages over U(N), because N choices already cover most of the distribution, so it's unlikely that an even better choice is available within the X additional choices you look at, and even if you do find a better choice, it's probably only slightly better. Yet for most humans, computing the best choice out of N+X options is more costly than computing the best choice for only N options, because you start to lose track of the details of the various options you're considering as you add more and more possibilities to the list, and the list starts to feel boring or overwhelming, so it gets harder to focus. So there's sort of a natural stopping point where the cost of considering X additional options can be confidently predicted to outweigh the expected benefit of considering X additional options, and when you reach that point, you should stop and pick the best choice you've already researched.
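The commenter's stopping rule can be sketched numerically. Following the comment, utilities are assumed to be i.i.d. standard normal; the linear cost model (a fixed `cost_per_option` for each extra option considered) is an additional assumption, since the comment only says the cost grows with the list. `expected_best(n)` integrates the density of the maximum of n draws, and `stopping_point` stops at the first N where considering X more options is expected to gain less than it costs. All names are hypothetical.

```python
import math


def normal_pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def expected_best(n: int, lo: float = -8.0, hi: float = 8.0, steps: int = 4000) -> float:
    """E[max of n i.i.d. standard normal utilities], via the trapezoidal rule.

    The maximum of n draws has density n * pdf(x) * cdf(x) ** (n - 1).
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * normal_pdf(x) * normal_cdf(x) ** (n - 1)
    return total * h


def stopping_point(cost_per_option: float, x: int = 1, n_max: int = 1000) -> int:
    """First N at which looking at X more options is expected to gain less than it costs."""
    n = 1
    while n < n_max:
        gain = expected_best(n + x) - expected_best(n)
        if gain < cost_per_option * x:
            return n
        n += x
    return n_max


print(round(expected_best(2), 4))  # 0.5642, i.e. 1/sqrt(pi)
print(stopping_point(0.2))         # 3
print(stopping_point(0.05))
```

Raising the cost per option moves the stopping point earlier, matching the comment's intuition that there is a natural point at which to stop and pick the best option already researched.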

I like having access to at least some higher-order freedoms because I enjoy the sensation of planning and working toward long-term goals, but I don't understand why the order of a freedom is important enough to justify orienting our entire system of ethics around it. Right, like, I can imagine some extremely happy futures where everyone has stable access to dozens of high-quality choices in all areas of their lives, but, sadly, none of those choices exceed order 4, and none of them ever will. I think I'd take that future over our present and be quite grateful for the exchange. On the other hand, I can imagine some extremely dark futures where the order of choices is usually increasing for most people, because, e.g., they're becoming steadily smarter and/or more resilient and they live in a complicated world, but they're trapped in a kind of grindy hellscape where they have to constantly engage in that sort of long-term planning in order to purchase moderately effective relief from their otherwise constant suffering.

So I'd question whether the order of freedoms is (a) one interesting heuristic that is good to look at when considering possible futures, or (b) actually the definition of what it would mean to win. If it's (b), I think you have some more explaining to do.