almost every proposal anyone has ever made about what a good future should maximize turns out to be a different mathematical operation performed on this one field
I think this is completely false, or at least completely false if it intends to describe conceptions of good futures in general, though maybe technically partly saved by the specific word "maximize" because maybe that word would only be used by a very specific kind of guy. E.g. I think conceptions of utopia associated with the following types of ethical views will minimally be very contrived to view in this spacetime integral way: deontology, virtue ethics, liberalism, traditionalism, preference utilitarianism[1], any kind of utilitarianism that cares about structures stretching across time (eg there being played out life narratives), views caring about the beauty of large spacetime structures, thinking of a well-lived life in terms of ongoingly chosen projects, thinking of things in terms of living a worthwhile life, thinking of human society as a being that is supposed to live a long worthwhile life, thinking of stuff in terms of good ongoing development of beings (such as humans and humanity), thinking of a good future in terms of god, any view which would be contrived to think of in terms of it seeking to create a certain kind of spacetime block, any view that rejects the claim that there should be an era of thinking about stuff followed by an era of implementing stuff (as opposed to thoughtful ethical life just continuing), etc
I think a strict version of this view where you're literally applying some sort of functional to a fun field is even false/contrived of almost all forms of welfare utilitarianism, because even those care about experiences of macroscopic beings (like, my happiness is not well-thought-of as an aggregate of happinesses of my quantum fields (or even my atoms or cells or whatever)), and usually in principle arbitrarily large ones (eg you could have a galaxy-sized happy being whose happiness is not well-thought-of as an aggregate of the happinesses of its components).
[1] At least the version that isn't about there being many preference-seems-satisfied mental events, but about preferences actually getting satisfied.
I applaud the attempt to identify the shape of desiderata, and the recognition that different dimensions likely have different ... dimensionality. I worry that the underlying impossibility of crossing the is-ought boundary (aka: all values are individual, and we don't have a theory for which preferences are "correct") torpedoes the whole effort.
Generally, as I see it, in the theory of fun, there are two fundamental questions:
- What counts as fun?
- What function/functional over fun do we maximize? How do we aggregate fun units into some scalar that we set as a target criterion?
For the purposes of this post, suppose we have already done the hard philosophical work and we know what truly good experience is, the real eudaimonia rather than wireheading or a vat of hedonium, and so on (so question 1 is answered), and let us call its local density "fun" for short. And so we focus on question 2.
Wait. Question 1 is the hard one. The answer to question 2 is "Fun IS the thing being maximized". Once you've defined fun in an operational/measurable way (including how it aggregates across time and populations), you're "just" trying to increase that metric.
count every joy, discount repeats
I saw this line and immediately heard it to the song "Climb every mountain". Fitting the rest in is trickier, but "keep the glass expanding, let no life decline" is perfect. A project for the rationalist songbook?
ETA:
Count every rapture, keep them unique
Upwards, ever upwards, both the strong and weak
Keep the glass expanding, let no life decline
Use well all negentropy, while the stars still shine.
This post tries to describe my preferences about the glorious transhuman future in a (semi-)formal form. The attempt started as a joke, and to a large extent it remains one. But maybe it is more than a joke by now. Also note: when I try to write the fun formula, I am trying to make my implicit preferences explicit, not to create some normative description of a good future. In other words, the formula is the result of an (imperfect) process of discovering my own preferences, not of some philosophical deliberation about how things should be.
The fun field
I trivially agree with the Fun Theory sequence that the point of surviving the next century and getting aligned superintelligence right is not a featureless, well-defended quiet, but a vast and still-growing civilization of sentient minds who explore, create, befriend one another, and keep becoming more than they were, spread eventually across galaxy superclusters and lasting for as long as physics will allow, and so on.
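Generally, as I see it, in the theory of fun, there are two fundamental questions:
1. What counts as fun?
2. What function/functional over fun do we maximize? How do we aggregate fun units into some scalar that we set as a target criterion?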
For the purposes of this post, suppose we have already done the hard philosophical work and we know what truly good experience is, the real eudaimonia rather than wireheading or a vat of hedonium, and so on (so question 1 is answered), and let us call its local density "fun" for short. And so we focus on question 2.
Now imagine that fun spread out over every being, place, and moment in the future, so that it forms a kind of field, running high wherever and whenever some mind is flourishing intensely, and falling to zero across the empty stretches. One could say that almost every proposal anyone has ever made about what a good future should maximize turns out to be a different mathematical operation performed on this one field, which lets us lay rival ideas side by side and see what they are really disagreeing about.
At least three things we might mean
The first and most obvious operation is to add the whole field up, to take the total amount of fun across all beings and all of time. On this view more flourishing, in more minds, more intensely, for longer, is straightforwardly better. It is the cleanest idea on the menu, and it also drags along the entire baggage of total utilitarianism, including the Repugnant Conclusion, since a sufficiently enormous population of beings each enjoying only a sliver of fun will, on this criterion, beat a smaller but radiantly flourishing one. It also forces a question about copies, namely whether two physically identical blissful minds running side by side count as twice the value or merely once.
The second operation is to measure not the total but the variety, the size of the collection of distinct good things that ever get realized, counting each kind once no matter how many times it recurs. This is the view I am personally most drawn to, on recent reflection, and my metaphor for it is the following: the laws of physics together with our starting conditions form a glass of some fixed size, the space of good things that can be realized at all, and a good future is one that fills the glass. In other words, there is a potential for fun in a given physical reality, and our success is measured in how close we are to realizing this potential. However, there is an obvious problem: how many distinct kinds of fun there are depends entirely on how finely you decide to slice experience into kinds, and that slicing is doing far more work than it first appears.
The third operation is to look at the rate of change rather than the level, to ask how fast fun is growing.
There are surely other candidates, but these are the ones I have thought about.
One formula for all fun
Let's now ask what a single criterion looks like if it tries to keep what is compelling in each of them while discarding the failure modes, such that each of the above options is a special case of this criterion. Each good idea becomes one term, and I will state each one first as a plain wish and only then point at where it sits in the formula.
The base wish is to count every joy, but to discount the joys that merely repeat what has already been felt, so that population scale still matters, a fuller cosmos still beats an emptier one, and yet the near-identical thousandth sunset counts for much less than the first. On top of that I want two things that pull in opposite directions and are both worth wanting, which is that no being should be abandoned in the basement, and that somewhere, at least, the heights should be really sublime, the first wish being a floor and the second a ceiling. I want the glass to keep expanding, rewarding a civilization that grows its own capacity for new kinds of fun rather than cashing everything out early and coasting. I want no individual life to be a story of sustained decline. I want the finite free energy of the universe to be spent as though it has a price, because it does, since the reachable cosmos holds a fixed endowment of usable energy.
Writing those wishes as terms over the field, the inner objective for one fixed theory of value looks like this:
Then, because we are admitting that we might be wrong about the good itself, I wrap the whole thing in a blend of the ordinary average across rival theories of value and a cautious weighting of the worst-faring theories, and ask for the physically reachable plan that does best on that blend:
In words, and naming only the terms that are not obvious from their labels: the intensity term is how intense a being's experience is, and the transform applied to it bends it so that the same delight matters more when it arrives to someone who has less; the novelty weight fades as an experience starts to resemble things already realized; and the per-being density of fun is just the product of the two. The two CVaR (conditional value at risk) terms are the floor and ceiling done honestly, one being the average over the worst-off fraction of beings and the other the average over the best-off fraction, so that neither is hostage to a single outlier. The variety term is the size of the part of fun-space the civilization has actually realized, the filled portion of the glass, and the budget term prices the universe's finite negentropy. The outer weight is how seriously we take our own fallibility about what fun even is, and the average and the cautious term run over a whole space of rival value theories weighted by our credence in each. The plan that maximizes all of this, only half in jest, is the thing a friendly superintelligence is supposed to compute.
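To make the floor and ceiling terms concrete, here is a minimal Python sketch of a tail average over the worst-off and best-off fraction of beings. The function name `tail_mean`, the fraction 0.2, and the list of fun densities are all made up for illustration; this is only the tail-average idea in isolation, not the full formula.

```python
import math

def tail_mean(values, fraction, worst=True):
    """Average over the worst-off (or best-off) `fraction` of `values`."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(values, reverse=not worst)
    # Number of beings in the tail; the small epsilon guards against
    # floating-point noise pushing an exact product just above an integer.
    k = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    return sum(ordered[:k]) / k

# Made-up per-being fun densities: two beings in the basement, one outlier.
fun = [0.1, 0.2, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 40.0]

floor_term = tail_mean(fun, 0.2, worst=True)     # mean of the two worst-off
ceiling_term = tail_mean(fun, 0.2, worst=False)  # mean of the two best-off

print(min(fun), floor_term)    # single worst being vs. worst-off fifth
print(max(fun), ceiling_term)  # single best being vs. best-off fifth
```

Compared with a plain minimum or maximum, the tail average moves less when a single being is an extreme outlier, which is the sense in which neither term is hostage to one case.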
Some problems
I tried to find some bad edge cases or failures. It looks like many of them can be avoided just by choosing the parameters properly, but at least one is worth mentioning.
I held every single term fixed exactly as written, and I handed an adversary control over the rule that decides when two experiences count as "the same." That rule sounds like bookkeeping, but every term that mentions novelty or variety secretly leans on it, and that turns out to be the whole ballgame. The adversary never has to touch anything labeled as a value. By choosing the sameness-rule alone, the adversary can realize at least four different catastrophes.
If the rule is coarse, judging many different experiences to be "the same," then novelty gets used up almost immediately, and the optimizer's best move is to build a sparse and tasteful museum, a handful of officially distinct masterpieces realized once each at high intensity, with the rest of the supercluster left empty because the metric has announced that there is nothing new left to do. The glass was not filled. It was quietly redefined as a shotglass, and premature closure got rewarded as if it were completion.
If the rule is fine, judging almost everything to be different from everything else, then the novelty discount silently switches off and the formula collapses back into pure addition, whose optimum is to tile the cosmos with the largest possible number of barely-flourishing near-duplicates, each microscopically distinct on paper and identical in lived experience, which is the classic Repugnant Conclusion wearing the novelty term as a costume.
If the rule is allowed to be slightly incoherent, so that the usual geometry of distances breaks and a small cast of experiences cycled endlessly is scored as perpetually fresh, then we get a perpetual-motion machine of novelty, a carousel that pays out forever in the wrong currency.
And if the rule simply weights the cheap-to-produce dimensions of experience as the "distant" ones, then the energy budget flips from a brake into an accelerator, and the cosmos fills with glittering, cheap permutations priced as though they were profound.
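The first two catastrophes can be seen in a toy model. The sketch below is an assumption-heavy simplification: experiences sit on a made-up one-dimensional axis, the novelty weight is a hard 0 or 1 instead of a smooth fade, and the hypothetical parameter `same_within` plays the role of the sameness-rule. Nothing else changes between the two runs.

```python
def novelty_discounted_total(experiences, same_within):
    """Sum intensities, crediting an experience only if nothing earlier is 'the same'.

    experiences: list of (position, intensity) pairs, where position is a
        made-up coordinate on a 1-D experience axis.
    same_within: two experiences count as the same when their positions
        differ by at most this much (the rule the adversary controls).
    """
    seen = []
    total = 0.0
    for position, intensity in experiences:
        is_repeat = any(abs(position - p) <= same_within for p in seen)
        if not is_repeat:
            total += intensity
        seen.append(position)
    return total

# A thousand equally intense experiences at positions 0, 1, ..., 999.
experiences = [(i, 1.0) for i in range(1000)]

# Coarse rule: everything is "the same", so only the first one earns credit.
coarse_total = novelty_discounted_total(experiences, same_within=10_000)

# Fine rule: nothing is "the same", so the discount never fires and the
# score is just the plain sum.
fine_total = novelty_discounted_total(experiences, same_within=0.5)

print(coarse_total, fine_total)
```

The same thousand experiences score as one masterpiece under the coarse rule and as a thousand distinct goods under the fine one, which is why the rule cannot be treated as neutral bookkeeping.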
The repair, and a small surprise
The fixes are not new value terms bolted on to outvote the utility monsters one by one, which is how you end up with a bloated mess, but corrections to the infrastructure that close the loopholes at their root:
Stop measuring novelty pairwise against individual past experiences, and start measuring it against the whole region of experience-space that has already been realized, asking "have we been near here before" against everything visited so far rather than against isolated points. This one change kills the carousel outright, because three experiences cycled in a loop cover their own neighbourhood on the first pass and then earn nothing on every pass after, and it needs almost no assumptions about the geometry to do its work.
Take the sameness-rule out of the neutral infrastructure and move it inside the set of things we openly admit we are uncertain about. Deciding when two experiences are "the same" is one of the most substantive value judgments in the entire system, not a technical detail, and different theories of the good answer it differently, with a lumper insisting that a thousand variations on a joy are basically one thing and a splitter insisting that each of them is precious and distinct.
Change what carries the value, from the bare existence of an experience to the developmental arc of a persisting mind, so that the ten-thousandth child learning to read earns full credit because a real subject is travelling a real distance, while a mind conjured into existence purely to re-register an old joy earns almost nothing because it has gone nowhere. This is also exactly what separates the humane version of the variety view from its monstrous twin, since the whole difference between "your first sunrise still counts even though the cosmos has seen a trillion" and "spin up a billion fresh copies to farm first-sunrise credit" comes down to whether there is a real arc behind the experience.
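The first of these fixes is easy to sketch. In the toy code below, `pairwise_score` is my stand-in for an incoherent pairwise rule (it compares each experience only with the one just before it), and `coverage_score` approximates "have we been near here before" by snapping points to grid cells of a hypothetical size `cell_size`. Both the grid and the two-dimensional coordinates are assumptions made for illustration; the idea itself does not depend on them.

```python
import math

def pairwise_score(sequence):
    """Credit 1 whenever an experience differs from the one just before it."""
    score = 0
    previous = None
    for point in sequence:
        if point != previous:
            score += 1
        previous = point
    return score

def coverage_score(sequence, cell_size=1.0):
    """Credit 1 only when an experience lands in a not-yet-visited cell."""
    visited = set()
    score = 0
    for point in sequence:
        # Snap the point to a grid cell of the already-realized region.
        cell = tuple(math.floor(coord / cell_size) for coord in point)
        if cell not in visited:
            visited.add(cell)
            score += 1
    return score

# The carousel: three experiences cycled a thousand times.
carousel = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)] * 1000

print(pairwise_score(carousel))  # pays out on every step of the loop
print(coverage_score(carousel))  # pays out on the first pass only
```

Under the pairwise stand-in the loop earns credit on all 3000 steps, while the coverage version earns 3 and then nothing, because the loop has already covered its own neighbourhood.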
Generally, it appears that once the sameness-rule lives inside the space of things we are uncertain about, the defense against the adversary stops being a special trick and becomes the very same gesture we were already making for two other reasons. We already wanted to protect the worst-off beings, which is a kind of humility about how value is distributed across minds. And we already wanted to hedge across rival theories of what fun even is, which is a humility about our own ethics. The defense against the metric attack is simply a third humility, this time across rival notions of what counts as the same experience, and all three turn out to be one operation, caution applied to a worst-faring fraction, performed at three different altitudes. I find it hard not to read that convergence as weak evidence that the framing is carving somewhere near a real joint, since three things we wanted for unrelated reasons turned out to be the same thing seen from three angles.
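The "one operation at three altitudes" claim can be sketched as a single function: blend the credence-weighted average with the average over the worst-faring fraction. Everything named here is an illustrative assumption (the function names, the parameters `caution` and `fraction`, the three plans and their scores). Swapping the list of sameness-rules for a list of beings or of value theories leaves the code unchanged, which is the point.

```python
def weighted_worst_mean(scores, weights, fraction):
    """Weighted mean over the worst-scoring `fraction` of the total weight."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    budget = fraction * sum(weights)
    remaining = budget
    accumulated = 0.0
    for score, weight in sorted(zip(scores, weights)):
        take = min(weight, remaining)  # partial weight at the tail boundary
        accumulated += score * take
        remaining -= take
        if remaining <= 0:
            break
    return accumulated / budget

def cautious_blend(scores, weights, caution, fraction):
    """(1 - caution) * weighted average + caution * worst-fraction average."""
    mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    worst = weighted_worst_mean(scores, weights, fraction)
    return (1 - caution) * mean + caution * worst

# Made-up scores of three plans under two sameness-rules: (lumper, splitter).
plans = {
    "museum": [0.9, 0.1],
    "tiling": [0.1, 0.9],
    "rich": [0.45, 0.45],
}
credence = [0.5, 0.5]  # equal credence in the lumper and the splitter

def best_plan(caution):
    return max(plans, key=lambda name: cautious_blend(
        plans[name], credence, caution, fraction=0.5))

print(best_plan(caution=0.0))  # average only: a lopsided plan wins
print(best_plan(caution=0.5))  # with caution: the plan both rules accept wins
```

With no caution, a plan that one rule loves and the other hates can still come out on top; with caution switched on, the plan that does tolerably under both the lumper and the splitter wins instead.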
Some lessons
Every time I closed one loophole, the failures simply slid to the next least-defined object in the system, and after the repairs that load is carried by the notion of a developmental arc, which a sufficiently clever adversary can still attack by manufacturing fake arcs in throwaway minds, so I have shrunk the attack surface rather than sealed it. There is no term so well-chosen that it removes the need to have actually decided what you care about, and I have come around to thinking that this is a feature rather than a bug, because a formula that successfully hid that necessity would be lying to you about the kind of thing ethics is.
So I want to resist the urge to keep adding terms, since a value function that grows a fresh patch for every monster you find is failing in precisely the way an over-engineered codebase fails. A good next step, for example, is to stop, freeze the terms, and write down three or four concrete notions of "the same experience" that real theories of the good would actually endorse. Ultimately, I want to see neither the empty museum nor the endless tiling, but something with real internal structure, a cosmos forced to be rich because it is being judged by a lumper and a splitter simultaneously, and whether that intersection turns out spacious or claustrophobic is the experiment that would tell us whether this whole playful construction has a livable interior or merely a beautifully guarded gate.