As promised at the end of my previous post, I will attempt to describe here my personal synthesis of how I think about decision theory, infinite ethics, Solomonoff induction, and average and total utilitarianism.
Background on moral decisions
I am a moral non-realist - I believe there is no objective morality. Nonetheless, I believe that according to my values there are better and worse ways to make moral decisions. Currently, I feel confused about many things in my value system, and I would not want to keep it that way permanently.
If the world becomes a safer place and I have more time to think during a Long Reflection, I will try to come up with an idealized version of my values, and then use my allocated share of resources to pursue those values. This Long Reflection might consist of doing more thinking and philosophy, talking with some trusted AI advisors or doing intelligence enhancement on myself. It might also include raising children and giving them a share of my resources. And finally, the Long Reflection will likely involve trade with other people[1], which can make moral reflection easier: if I have strong moral intuitions on subject A, and they have strong intuitions on subject B, we can pool our resources and follow my intuitions on A and theirs on B.
But for now I’m just a poor mortal fool, and I still need to make some decisions that are possibly irreversible and very high-stakes. What should I do then? The way I imagine it is that my current thinking and intuitions give an approximation of what my final values might be after the Long Reflection, and I’m trying to do my best to advance my eventual values according to these crude guesses of what they might entail.
In practice, this mostly means working towards increasing optionality for my future self and for people who share similar values, and setting up good processes for reflection. (For example, making sure humanity doesn’t lose control and that we end up having good AI advisors.[2]) And when I sometimes need to make irreversible trade-offs that are not purely oriented around increasing optionality, I’m trying my best to approximate my future values, taking into account the values of other people (my potential future trade partners) too.
OSAC
So far, what I’ve described are quite conventional views. The thing I’m adding in this essay is that I think about which distributions of worlds and moments I want to optimize for[3] in the same way I think about morality.
In my previous essay, I described that whenever I’m deciding whether to bring an umbrella with me, I’m helping some versions of myself in the multiverse and hurting some others. I argued that there was no reality fluid that objectively determined which versions of myself were more real, and it was a personal value judgement which versions I wanted to help more. Furthermore, I argued that I didn’t even accept Scott Garrabrant’s proposal of allocating my subjective caring to worlds in proportion to their mathematical simplicity - that view had some paradoxical consequences and I also found it generally unappealing.
Instead, I propose a tentative framework I call OSAC, standing for Optimization with Subjective Allocation of Care.[4]
(My impression is that there are a bunch of people going around saying “I believe in something vaguely like UDASSA” even when they have serious disagreements with the theory. My hope with giving an acronym-able name is that if some of these people like this post, they can switch to saying “I believe in something vaguely like OSAC”.)
The idea is that I imagine the approximation of my idealized allocation of subjective caring to different worlds and moments as a weighted set of distributions I care about. In each distribution, the utility that can be attained under the distribution is always bounded.
For example, if one distribution is “all space-time points in the universe, with 1% per billion years time discount rate starting from the Big Bang”[5], then I get a full score for this distribution if we manage to fill all the universe with joy. If I originally put a 2% weight on this distribution, then I can fulfill 2% of my overall utility this way.
If I learn that there is no way for me and other beings logically correlated with me to get control over an appreciable fraction of all space-time moments, then I need to give up on this 2% of my utility function, and look for meaning in the other distributions. On the other hand, if I learn that space contains a hundred times as many stars as I previously thought, that doesn’t increase the weight of this distribution to 200%. The value I can get from each distribution is ultimately bounded and capped at its weight.
As I previously discussed for UDASSA, here is the ideal version of my decision process:
I look at all the distributions of worlds I care about, look at all the actions in all these worlds that are logically correlated with my current decision, and I look at all their consequences. I sum up the positive and negative effects, weighted by their measure within the distributions, and the weights of the distributions.
For example, if the consequences of an action lead to filling a 0.003 instead of a 0.002 fraction of space with joy 13 billion years after the Big Bang, that increases the score of the above-described distribution by the extra 0.001 fraction multiplied by the measure the distribution assigns to that time. And given that we assumed that this distribution has a 2% weight, it increases my overall utility by 0.02 times that amount.
Then the decision has other effects in other distributions, and we can sum over the weighted effects. I want to choose the decision that results in the highest sum.
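The weighted sum described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Distribution` class, the function names, and every number in the example are hypothetical, and each distribution's score change is clamped to [-1, 1] to mirror the claim that the value attainable from a distribution is capped at its weight.

```python
from dataclasses import dataclass


@dataclass
class Distribution:
    name: str
    weight: float  # share of overall caring (hypothetical numbers below)


def overall_utility_change(effects, distributions):
    """Sum the score changes per distribution, each scaled by that distribution's weight.

    `effects` maps a distribution name to the change in its score.
    A score lives in [0, 1], so a change is clamped to [-1, 1]:
    no distribution can contribute more than its own weight.
    """
    total = 0.0
    for d in distributions:
        delta = effects.get(d.name, 0.0)
        delta = max(-1.0, min(1.0, delta))  # bounded utility per distribution
        total += d.weight * delta
    return total


def best_action(actions, distributions):
    """Pick the action whose weighted sum of effects is highest."""
    return max(actions, key=lambda a: overall_utility_change(actions[a], distributions))


# Made-up example: a 2% distribution and a catch-all for everything else.
distributions = [
    Distribution("discounted_spacetime", 0.02),
    Distribution("everything_else", 0.98),
]
actions = {
    "A": {"discounted_spacetime": 0.001},
    "B": {"everything_else": 0.00001},
}
chosen = best_action(actions, distributions)
```

In this toy setup, an effect that claims to be a hundred times larger than a full score still contributes at most the distribution's 2% weight, which is the sense in which more stars do not raise the weight to 200%.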
(Originally, I imagined the different distributions being represented by different people in my moral parliament and negotiating with each other and winning and losing negotiation chips in bets with each other over things I observe. However, I couldn’t really find any thought experiments where the moral parliament picture would have added value over the simpler picture I’m using now, where the different distributions are just branches of my utility function with different weights, and ultimately everything is additive. But I’m still fond of the moral parliament picture, so I’m interested if people can point to examples where it adds value over the simpler picture.)
In practice, during my mortal life, most of the time this just means working on increasing optionality for myself and people similar to me (again, a few examples are making sure humanity doesn’t lose control and that we end up having good AI advisors). The discussion in my previous essay of how to think about all the logical correlations as a mortal applies here too.
I hope to punt most of the big decisions to the Long Reflection[6]. It will be important to eventually figure out the exact shape of my prior - how much I care about which worlds and moments - but this doesn’t feel very different to me from figuring out my moral values, which I’m largely punting to the future anyway. For now, when I need to do something not purely for increasing optionality, I need to rely on approximations of how I imagine the eventual assembly of distributions I will converge on caring about after the Long Reflection.
For the rest of this essay, I will write about my current best guess of this approximation, and what its implications are.
Loving the mystical
At the end of my last essay, I introduced Scott Garrabrant’s description of loving worlds in proportion to the simplicity of their underlying mathematical laws, thus getting back basically the equivalent of Solomonoff induction. I expressed my disagreement:
Yes, I care a little bit about mathematical simplicity - I’m a mathematician by training, and I find simplicity aesthetically compelling. But I don’t feel like mathematical simplicity is very unique among the things I care about.
Instead of saying that I care about the goodness of the worlds weighted by how simple mathematical laws describe them, I could choose totally different weightings. I could rank possible universes (and moments within them) by how dramatic they are.[7] Then I could give ½ weight to the most dramatic universe, ¼ to the second most dramatic, and so on, the weights adding up to 1. And I could say that I try to maximize the goodness of worlds weighted by these dramaticness-weights. To me, using dramaticness sounds approximately as compelling as using mathematical simplicity for the weighting.
I stand by this claim. I think that the distributions that are well-described as some version of “care about worlds in proportion to their mathematical simplicity” only have like 0.1% weight in my overall caring.[8]
Otherwise, I care about all sorts of diverse distributions, based on all sorts of human concepts. I call these non-mathematical-simplicity related distributions “mystical distributions”, for lack of a better name, but I don’t want to dismiss them. After all, I really do believe that most of my caring is concentrated on them.
I care about the distribution where the highest weights go to the most dramatic moments that have the most dramatic causal history, going back to the most dramatic way a world could be created. I care about the distribution where worlds and moments are weighted in proportion to their goodness itself: I care about making the best of all possible moments in the best of all possible worlds even better. I care about the distribution where weights are inversely proportional to goodness: I care about making the worst of all possible moments a little less bad. I care about bringing more goodness to the worlds where truth is the most knowable, more understanding of truth to the most beautiful worlds, and more beauty to the best of worlds. There are so many possible weightings of worlds to care about, not just mathematical simplicity.
All of this mysticism might sound a little weird (and don’t worry, after some updates we will largely get back to normality), but I stand by the prior being mystical and weird.
It feels more serious to talk about loving the mathematically simple worlds more, like Scott Garrabrant does, but I don’t believe it adds much real value if the initial distribution of caring is formal and mathematical.
Once I decided that I didn’t believe in the reality of the reality fluid, the allocation of caring is just part of my moral philosophy, and the rest of my moral philosophy is not mathematically formal either.
I also think that all formally defined distributions will be vulnerable to trickery like the obelisk-race or world-summoning by writing, and if I want to avoid that, I will need to allow some judgement calls.
And fixing a mathematical prior feels fishy in other ways too.
One day you hear a voice from a burning bush: "I am the Lord, thy God, and I'm telling you that this is the most just and merciful world. The world is relatively ordered, because I believe some order is necessary for justice, but the world is by no means only governed by simple physical laws, it's full of spirits and miracles. Also, here is a great explanation to the Problem of Evil, which shows that everything happening in this world is actually most just and merciful."
Would you just walk away and say "Sorry, I already picked my prior to only care about mathematically simple worlds"?
I think picking your prior to only care about mathematically simple worlds is cheating. You didn't come up with this prior floating in the void: it became appealing because you have already seen physics experiments in the world and internalized the Occam's razor intuition. That is, you already updated on things. But then I think the correct framing is not to start from a purely mathematical prior, but to start from a broader prior of things that sound appealing, and then update based on the world looking physically simple.
I’m describing this updating process in the next section.
Updates on the mystical and the mathematical
How much should caring about all these mystical distributions influence my everyday behavior? I think only a little: I have observed many updates that suggest I don’t have that much influence on these distributions.
The portion of me that cares about the goodness of worlds weighted by the worlds’ dramaticness primarily cares about making good decisions if I observe an incredibly dramatic world around me. That doesn’t really match my observations. If I had been making predictions based on the assumption that the most dramatic outcome is the most likely, I would very often have been wrong.
Meanwhile, the assumption that we live in a materialistic universe governed by a few simple physical laws has a great track record. In the most dramatic possible worlds, there is no reason why every object should fall with the same gravitational acceleration. Why shouldn’t there be dragons who magically defy gravity? Dragons are pretty dramatic.
So the portion of me that cares about the most dramatic worlds needs to fall back to explanations like:
It sounds counter-intuitive, but maybe in the fullness of time we will realize that the world seeming to be explainable by a few simple rules is needed for full dramaticness. Or maybe there is a deity in the most dramatic possible world, who believes that creating people living in a seemingly mathematically explainable but not obviously dramatic world is the best way to raise His inheritors. He will pull us out to the most dramatic possible world and let us help shape it towards goodness.
But why would it be seemingly simple physical rules in particular that we will somehow realize to be crucial for the world to be dramatic? And why assume the deity in the dramatic world would decide to create people in a seemingly mathematically simple one? Why not in worlds optimized for let’s say beauty instead of mathematical simplicity? There are many options that feel like they deserve at least as much weight as simplicity.
So if I gave 0.1% weight to myself terminally caring about mathematically simple worlds, then I think for any particular X, I should only give 0.1% weight to the assumption that the best way to influence the worlds with X is somehow mysteriously through getting influence when it looks like the world around me runs on mathematically simple rules. Actually, I think even less than 0.1%, because “the best way to influence X is through a seemingly unrelated Y” is inherently a somewhat unnatural story, so let’s give these theories a 1/4 penalty.
This means that for creatures whose observations seem plausibly explainable with simple physical rules, about 0.1% of their influence on my utility function comes from directly influencing the 0.1% of my caring terminally focused on mathematically simple worlds. Meanwhile, about 99.9% × 0.1% × 1/4 ≈ 0.025% of their influence comes from influencing the other 99.9% mystical distributions, when the best way to influence them is by taking actions in a seemingly mathematically simple universe.
So once I made the update on seeing the world being seemingly simple, only about 20% of my influence comes from influencing the various mystical distributions.
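The arithmetic behind the 20% figure can be checked with a short sketch. It assumes the numbers stated above (0.1% terminal caring about mathematically simple worlds, 0.1% weight on the indirect route for each mystical distribution, and a 1/4 penalty for the story being unnatural). The function and parameter names are hypothetical.

```python
def mystical_share(terminal_simple=0.001, route_weight=0.001, unnatural_penalty=0.25):
    """Fraction of influence that flows to the mystical distributions
    after observing a seemingly simple world."""
    # Direct influence: caring terminally focused on mathematically simple worlds.
    direct = terminal_simple
    # Indirect influence: the remaining caring, times the weight on
    # "the best way to influence X runs through a simple-looking world",
    # times the penalty for that story being unnatural.
    indirect = (1 - terminal_simple) * route_weight * unnatural_penalty
    return indirect / (direct + indirect)


share = mystical_share()
print(f"{share:.1%}")  # roughly 20%
```

The later updates (to 10% and then to 1%) are judgement calls in the text and are not derived from a formula, so they are not modelled here.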
Then I take into account that I happen to live in a time that seems especially crucial for the world. From the perspective of the 0.1% of my values that terminally care about universes that look like ours, it seems relatively clear that I’m in a much more leveraged position to make the world better than either a 13th century peasant or a digital mind living in the Andromeda galaxy a trillion years after the Singularity.
But when I assume that I terminally care about dramatic worlds, but also assume that a deity in the dramatic world will pull me out of this world if He likes me, then it’s unclear if I’m in a better position than the peasant or the mind in the Andromeda.
Still, it seems plausible that many stories on influencing the mystical distributions route through getting a lot of influence in seemingly mathematically simple worlds in a scope-sensitive way, so I think that affecting the mystical distributions still holds 10% of my influence after this update.
(Ethical theories that are not scope-sensitive get massively downweighted though: among the people in all sorts of possible worlds and times and situations who are thinking about decisions in terms similar to the ones I’m describing here[9], I think I’m unusually well-placed to do scope-sensitively important things, so I should focus on that, while they can to some extent focus on other things.)
The next thing I take into account is cluelessness about how to do good according to various parts of my utility function. If I assume that I want to influence mathematically simple worlds, and so can assume that the world is as it seems, then at least I have some uncertain ideas about what to do.
But once I’m operating under theories like “maybe a deity in the most dramatic possible world has created a mathematically simple world for some inscrutable reason” as one of the thousands of equally weighted stories, it is just really incredibly hard to know what is good. Therefore, my decisions have much less expected influence on the mystical distributions. This means that when I’m making a decision, I think overall only 1% of the influence comes from affecting the mystical distributions.
Obviously, all these numbers are kind of made up, but the conclusion feels roughly right to me: I should mostly work on making the mathematically simple worlds better, but I should have a not entirely negligible weight on affecting other distributions. [10]
What to do for the mystical distributions?
If I believe that about 1% of my influence on my utility function comes from making the various mystical distributions better, that means I should spend at least a bit of effort optimizing for them, doing the most leveraged things and picking the lowest-hanging fruits for making things better for them.
However, I have a huge uncertainty over what to do to make things better under all these strange, competing distributions, which all assume that the world is not quite what it looks like.
The only thing I can think of for making things generally better is acquiring more virtue. By virtue I mean becoming the kind of person who makes good decisions under very uncertain, very new circumstances. Being the kind of person who can do the reflection well, and who can be trusted with power under unforeseen conditions, like being given control of a slice of the most dramatic of all possible worlds after being woken up in the afterlife by a strange deity.
This is in large part applicable even for making the distribution of mathematically simple universes better. Even assuming that the world is roughly as it seems, consequences are hard to predict, and a good way to work towards robustly doing good things is by cultivating virtues. And I think even under a more conventional world-view, a non-negligible fraction of my future influence comes from surviving through the singularity into a world I barely understand, which I will need more wisdom to navigate.
But still, the parts of me optimizing for the mystical distributions care more about virtue than the ones optimizing for the mathematical distribution. Under the mathematical simplicity assumptions, I can be more confident about the world's structure and prioritize consequentialist actions more: find the big levers today that influence tomorrow. For helping the mystical distributions, this route is not really available, and the only thing that remains is cultivating wisdom and virtue.
What kind of wisdom, and what kind of virtue? I don’t really know, but all the classical answers seem good.
Developing good epistemic practices—wading through unfamiliar and confusing fields, developing an independent sense of truth. Bravery, kindness, honesty, fairness, friendliness, moderation. Faith, hope, and love. Keeping relationships alive, staying human. Avoiding heinous acts that would corrupt my character and lead to rationalizing that evil is actually good. Being the kind of person who, given power in a strange new world, would use it well rather than being destroyed by pride and self-deception. Being someone who would choose Heaven over Hell if they were a character in The Great Divorce.[11]
But also: utilitarianism and scope sensitivity. I have a bad feeling that people who talk too much about virtue ethics often end up not being very scope-sensitive. I think that’s a mistake on their part. Scope sensitivity - actually finding big levers and pushing them - is one of the highest virtues and pretty undersupplied in the world.
And many scope-sensitive actions can be directly valuable to the mystical distributions too. If I believe that increasing my own wisdom and virtue is important, then doing so for other people is likely to be important too. I’m less convinced of consequentialist plans, or even that other people really exist, under the mystical distributions, but I still believe in it to a significant extent.
Raise the sanity waterline. Help the poor, because healthier people are often better people. Get good AI advisors for everyone. Shape our AIs themselves to be the kind of virtuous and benevolent beings that can make decisions under surprising revelations. All the usual good things.
Overall, I think that optimizing for the mystical distributions leads to very similar actions as optimizing for making the mathematically simple worlds better.[12] And in any case, my current guess is that only 1% of my influence comes from affecting the mystical distributions.
Still, I should maybe call my grandparents a bit more often than would be strictly optimal if I was purely focusing on scope-sensitive utility maximization in a materialistic universe.
A note on existing religions
You can skip this section if your absurdity heuristic is strong enough that you are not tempted to consider following existing religions. I was tempted enough to at least think it through. (The conclusion is that I’m not converting to religion but I think it’s not entirely absurd that it’s the right choice for some people with certain values.) I wrote down my thoughts here, but it might not be interesting for everyone, so I put it in a collapsible section.
I give about 1 in 10,000 weight to an existing religion being true
Once we entertain mysticism - "maybe I care about making the best possible world better" and "maybe the world's apparent materialism is intentional misdirection by a powerful entity" - there are echoes of traditional religion.
One of the main reasons I don’t expect to have much influence over the mystical distributions is cluelessness. But existing religions potentially reduce that cluelessness: there are prophets, holy books, specific instructions. So what fraction of my influence comes from influencing worlds where some existing religions are true?
Pretty low, I'd say. Several reasons:
First, there is the Problem of Evil. In my ontology, that’s just the same update that brought down the influence of all mystical distributions to 20%: I need to assume that God for some reason really cared about the world being seemingly governed by simple physics, which inevitably includes some hurricanes. The weight I put on an all-loving God really caring about mathematically simple structures is the same order of magnitude as the weight that I put on caring about mathematical simplicity myself; though I think it’s somewhat lower for God. So we are at 20%.
Second, the theory is additionally stretched. Beyond the normal two steps of reasoning required to influence mystical distributions ("there's some particular types of worlds I care about" and "for unknown reasons, this seemingly materialistic world is a good place to gain influence over those particular types of worlds"), there's the extra assumption that God decided on a specific half-measure - making the world overall roughly materialistic-looking and not fully revealing Himself, but also still sending some prophets to let His will be known. This is a very particular story for how I can influence mystical distributions I care about - I feel this is a 100x haircut, and now I’m down to 0.2%.
Third, I'm in a position of unusual influence over the long-term future, and most existing religions are not very scope-sensitive about things like conquering the galaxies. This means that the relative importance of my actions is much lower under most religious world-views than under the standard materialistic world-view, which is maybe a 5x haircut, so now religions are down to 0.04%.
Fourth, the religions I've encountered, while having many good features, miss many of the things I care the most about. I think this is only a 3x haircut though.
There is also the consideration that on the one hand, I feel that in the religious universes it’s more likely I will be satisfied with morality having a good resolution, while in the atheistic universes, it’s more likely that I will ultimately find everything meaningless. On the other hand, for exactly the same reason (there is already a clean morality which God knows), in the religious universes it’s more likely that God leads everything to the right outcome anyway, and my influence doesn’t matter. I would say it’s a wash, and I don’t make an update here in either direction.
So overall, I think only about 1 in 10,000 part of my influence over my utility function comes from acting as if one of the existing religions were true. This is not very high in absolute terms, and doesn’t really affect my behavior.
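The chain of haircuts above can be multiplied out explicitly. The sketch below uses only the factors stated in the text (a 20% starting point, then 100x, 5x and 3x haircuts). The helper name `apply_haircuts` is hypothetical.

```python
def apply_haircuts(start, haircuts):
    """Divide a starting share of influence by each haircut factor in turn."""
    share = start
    for factor in haircuts:
        share /= factor
    return share


# 20% after the Problem of Evil update, then the three haircuts from the text.
religion_share = apply_haircuts(0.20, [100, 5, 3])
print(f"{religion_share:.4%}, about 1 in {1 / religion_share:,.0f}")
```

Multiplied out exactly, the factors give a bit more than 0.013%, or about 1 in 7,500, which has the same order of magnitude as the rounded 1 in 10,000 figure. Since all the inputs are admittedly made-up numbers, the rounding does not change the conclusion.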
If someone asks me for the probability that Jesus rose from the dead, I need to answer that this is not the type of question where I feel comfortable using the abstraction of probabilities (see the discussion on the probability of Jesus’ resurrection in a previous post), but if I’m pressed, I will answer 1 in 20,000 as the least misleading answer, which is I think higher than what most people on LessWrong would say.
(Importantly, it’s not really a probability though. For example, one can’t do Pascal’s Wager and say that the rewards under the 1 in 20,000 religious worlds are infinite, while the outcomes in the materialistic worlds are finite. That’s why it’s important that the utilities are bounded under OSAC: a distribution with 1 in 20,000 weight can’t just outbid everything else by claiming to be infinite. It’s 1 in 20,000 of the overall influence, and that’s how much it gets.)
Interestingly, it’s really hard for new evidence to change this 1 in 20,000 number. If I read about the Fatima sun miracle and find it unexpectedly hard to find a naturalistic explanation, that barely moves the needle.
The hypothesis already posits that God occasionally performs some miracles, just enough to maintain faith but making sure He never makes His existence obvious to the whole world. I don’t understand why God is doing that, and I don’t know what the optimal level of miracle-making is from God’s perspective. I have some distribution over how weird I expect the weirdest-looking events to be in a materialistic universe, and how weird I expect them to be in a universe where God wants to give some signs but doesn’t want to fully reveal Himself. The distribution of expected weirdness, assuming God’s existence, is somewhat shifted to the right compared to the distribution I’d expect in a materialistic universe, but not a lot - after all, He doesn’t want to make things obvious. If I learn about an event that is somewhat more miraculous-seeming than what I have so far heard about, that gives some extra evidence to theism, but not very much.
So far, the evidence from miracles hasn’t moved me to make a significant update (I’m not even convinced that our world looks more miraculous than the median materialistic world), and I don’t plan to spend too much time looking at evidence of miracles, given that I don’t expect them to produce much evidence for the above-mentioned reasons.
Overall, 1 in 10,000 is a pretty small effect, so religions don’t influence my life very much, other than making me murmur an occasional prayer. But I can see people putting higher weight on religion due to genuine value differences, and I think we should be cautious about calling that inherently crazy.
Loving mathematics
After making all these updates, I now believe that 99% of my influence over my utility function comes from affecting the distributions that weigh worlds and moments based on “naturalistic” considerations: how simple the laws of physics are, how far in time a moment is from the Big Bang, and things like whether there is intelligent life at all in that universe. The boundary between naturalistic and mystical distributions is not very clear, but the naturalistic distributions are roughly those where no special explanation is required for why living in a seemingly materialistic universe, similar to ours, is a good place to affect these distributions.
The assembly of distributions
There are many different distributions I care about. As a general rule, the broader and more generic the distribution is, the bigger weight it gets, but more specific distributions get some weights too.
The concept is very similar to what I described in the Solomonoff over distributions section in my last essay. There are very broad, simple-to-describe distributions that get a lot of weight. For example “universes whose laws require N bits to describe get 2^-N amount of caring, and I distribute my caring among the moments within each universe according to a distribution generated by a simple program taking a standard normal random variable as an input”.
Then the narrower and more specific distributions get smaller weights. These narrower distributions can specify simple mathematical properties, like “only the universes from the distribution that have a particular symmetry property”, or more human-level conditions like “only the universe in the distribution where intelligent life develops”, or even things like “only universes where the intelligent life never develops nuclear weapons”. The more specific and the weirder the definitions of distributions are, the less I care about them. What counts as weird is a subjective decision determined by me: I maintain this is not any worse than the fact that I already need to determine my morality subjectively. But as a general rule, if a distribution is cut into two smaller distributions, the sum of the weights of the two smaller distributions should be less than the weight of the bigger distribution.
Examples
All of this is probably pretty confusing now, so I think the best way to get across what I mean is walking through a number of philosophical paradoxes and trying to explain how I think OSAC handles them.
Given OSAC’s inherently subjective nature, many of the solutions depend to some extent on personal value judgements, and you might come to different conclusions in some cases due to differing moral intuitions. However, I believe the framework is still useful to get a handle on otherwise very confusing problems.
Boltzmann brains
Here the argument is basically the same as what I presented with UDASSA. If the universe will exist for infinitely long, it’s not possible to put equal measure on all moments in it. So I need to choose some kind of arbitrary distribution of how much I care about the moments within the universe. For example, these distributions can use time-discounting from the Big Bang, or description-lengths of the space-time moments.
Some of these distributions might be very broad and might contain overwhelmingly Boltzmann brains. However, only an astronomically small fraction of Boltzmann brains in these distributions observe ordered experiences. This means that if I’m a mind observing ordered experiences, then whatever I do, that only has astronomically little effect on the overall utility according to the distributions that mostly contain Boltzmann brains. Meanwhile, I and beings logically correlated with me have a decent amount of influence on the distributions that put most measure on experience moments that are fairly easy to describe, i.e. ones that get born due to a reasonable causal chain, and not just appear in the heat death soup as Boltzmann brains.
Therefore, even if initially only a small fraction of my caring was allocated to distributions that contained mostly non-Boltzmann brains, once I get some ordered observations, the overwhelming majority of my influence comes from the non-Boltzmann brain distributions, so I act as if I was not a Boltzmann brain.
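As a rough illustration of the argument above, here is a minimal Python sketch. All the numbers and the names `influence_share`, `mostly_boltzmann` and `causal_minds` are made up for this example; the only point is that a large weight multiplied by a vanishing influence contributes almost nothing.

```python
def influence_share(weights, influence):
    """Share of my total influence contributed by each distribution.

    weights[k]:   how much I care about distribution k.
    influence[k]: fraction of distribution k's score that I, and beings
                  logically correlated with me, can move.
    """
    contributions = {k: weights[k] * influence[k] for k in weights}
    total = sum(contributions.values())
    return {k: c / total for k, c in contributions.items()}


# Hypothetical numbers: 99% of my caring goes to a broad distribution that is
# dominated by Boltzmann brains, where minds with ordered experiences are rare.
weights = {"mostly_boltzmann": 0.99, "causal_minds": 0.01}
influence = {"mostly_boltzmann": 1e-30, "causal_minds": 1e-6}

shares = influence_share(weights, influence)
print(shares["causal_minds"] > 0.999999)  # True
```

Even with only 1% of the caring, the distribution of causally produced minds accounts for practically all of the influence in this toy setup.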
Obelisk-race and world-summoning
In my previous essay, I explored two paradoxes of UDASSA where people can take actions to increase the realness of their preferred worlds and moments.
Under my OSAC framework, I will simply say no to these shenanigans. Yes, if you build a bigger obelisk than any other alien civilization, your moments will have a shorter description-length. Very clever, but I don’t care.[13] I don’t need to, right now, mathematically precisely define the distributions I care about. I’m only trying to approximate what I will eventually decide to care about. Even if the eventual distributions I care about will take into account moments being simple to describe (for example to stave off Boltzmann brains), I feel pretty confident that they will have some kind of a “no shenanigans” clause, and building giant obelisks or plastering the equation of a world across the galaxies won’t really change my caring.
Is this solution not mathematically elegant? Maybe, but I don’t think I’m under any obligation to make my moral theories mathematically elegant.
Finetuning and the Presumptuous philosopher
Some physicists theorize that some of the fundamental constants of our universe lie in the only narrow range that allows life to arise. How should we relate to such theories?
One answer can be to accept the argument, no questions asked: it is no surprise at all that the universe we live in has parameters compatible with life; after all, we are alive.
On the other hand, some get uneasy about anthropic arguments for finetuning being used as a curiosity-stopper: every time we don’t understand something, we can say “I don’t know, maybe it’s the only arrangement compatible with life”. That doesn’t sound right.
To understand the problem better, let’s look at an example scenario. One might notice that this is basically the same as Bostrom’s Presumptuous philosopher thought experiment. (For the purpose of these example scenarios, I will fall back to the language of probabilities and anthropics.)
Scientists have two competing theories about the nature of the world, A and B. By default, they deem the two theories equally compelling and would give 50% probability to each. Both theories predict that there are a million equally real worlds, with a fundamental constant ranging from 1 to 1,000,000. Theory A predicts that all the worlds are compatible with life, while theory B predicts that only one world is habitable, but it doesn’t predict which one. Scientists observe that the fundamental constant in our world is 343,551. What is the probability that theory B is true?
The SIA interpretation of anthropics (which is in my opinion the more reasonable interpretation if you need to choose) would say Theory B only has 1 in a million probability. This doesn’t sound right to me - this would mean that even if strong updates arrived and we had very good reason to think that only a small range of parameters are compatible with life, we can’t accept that conclusion.
For further intuition, see this scenario:
Scientists have two competing theories about the nature of the world, A and B. By default, they deem the two theories equally compelling and would give 50% probability to each. Both theories predict that there are a million equally real worlds, with a fundamental constant ranging from 1 to 1,000,000. Theory A predicts that all the worlds are compatible with life, while theory B predicts that only one world is habitable where the fundamental constant is 343,551. Later, scientists measure the fundamental constant of our world, and it is 343,551. What is the probability that theory B is true?
SIA would say 50% - there is one habitable world fitting our observations under both Theory A and Theory B. This is quite a strange conclusion given how impressively accurate a prediction Theory B has made.
How does OSAC deal with these scenarios?
The broadest distribution, containing all worlds in A and B, gets weight $p$.
But there are the narrower distributions too: the distribution where A is true, and the distribution where B is true. In the scenario where A and B sound equally compelling to scientists, they get equal weight. According to the general principle I described in The assembly of distributions, the sum of the weights of the two narrower distributions should be smaller than that of the broad distribution.
Depending on how fundamental the difference is between theory A and theory B, I’m more or less sympathetic to them getting their own distributions. If the difference is something as mundane as a coin landing on heads or tails, I don’t believe they should get their own distributions at all, and we should just optimize under the broad distribution. But in this case, the difference is in some kind of fundamental physical or even logical laws (is it possible for intelligent life to evolve under many different fundamental constants?), so I’m sympathetic to them getting their own distributions. Let’s say both A and B distributions get a weight of $0.1p$.
So far, this didn’t make a difference in the paradox. Life is rare in distribution B, so I and logically correlated beings can barely make a difference to the goodness of distribution B: most of it will be empty anyway. So even with the introduction of these new distributions, I will strongly favor betting on A being true.
However, I also give significant weights to the distributions of only those worlds that can contain intelligent life. Let’s say I give half as much weight as to the broader distributions.
(Why smaller weight than the broader distribution? Because I feel it’s smaller and less elegant. Why not much smaller weight, given that “having intelligent life” takes a lot of description-length to describe? Because I’m not committed to purely weighting distributions by mathematical description-length, and “having intelligent life” feels like a pretty natural condition to me for restricting my caring.)
Now we have six distributions: (1) $p$ weight to the distribution of everything in A and B; (2) $0.1p$ weight to the distribution of everything in A; (3) $0.1p$ weight to the distribution of everything in B; (4) $0.5p$ weight to the distribution of every world with intelligent life in A and B; (5) $0.05p$ weight to the distribution of every world containing intelligent life in A; (6) $0.05p$ weight to the distribution of every world containing intelligent life in B.
Putting it in simpler terms: I care to some extent about making the average life better in the worlds where theory B is true. This is a somewhat specific form of caring, so it doesn’t get that much weight[14], but I still care about it to a non-negligible extent. This wouldn’t have been allowed by more simple SIA interpretations, where the average welfare of the living worlds in B is overwhelmed by theory A positing more living worlds.
Now, let’s look at what conclusion this model gives in the first scenario. I don’t believe in probabilities, so I will translate “what probability theory B has” to “what betting odds I would bet on theory B”, which is equivalent to “what fraction of my influence on my utility function comes from worlds where theory B is true.”
Of the six distributions, I can affect half of (1); all of (2); almost none of (3); and all of (4), (5) and (6). So the overall weight I can affect is $0.5p + 0.1p + 0.5p + 0.05p + 0.05p = 1.2p$.
Of this, the $0.05p$ weight of distribution (6) is the only significant part of my utility function where B is true. So the relative weight of B is $0.05p / 1.2p \approx 0.042$, so I will bet as if I thought theory B had a 4.2% probability.
Alternatively, it’s possible I already have reason to believe that independently of whether theory A or B is true, it’s impossible to affect a non-negligible portion of all possible worlds, and so all my influence comes from affecting the distributions where it was assumed that intelligent life exists. I think this is quite likely. In this case, the overall weight I can affect is just $0.5p + 0.05p + 0.05p = 0.6p$, and B is true for $0.05p$ of this, so the relative weight of B is $0.05p / 0.6p \approx 0.083$, so I would bet as if B has 8.3% probability.
The numbers are of course made up, but the conclusion sounds about right to me: Theories positing that only a small fraction of the worlds are habitable and that our world's parameters just happen to be in the habitable range should get some extra burden of proof (the probability went down to 8.3% from the original 50%), but they shouldn’t be overwhelmingly penalized.
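The arithmetic for the first scenario can be checked with a short Python sketch. The weights are the made-up ones from the text, in units of $p$. The function name `betting_odds_on_b`, the dictionary layout, and the rounding of "almost none" and of the one-in-a-million B-share of distributions (1) and (4) down to zero are simplifications of this sketch.

```python
# Weights of the six distributions, in units of p (the weight of the broadest one).
WEIGHTS = {1: 1.0, 2: 0.1, 3: 0.1, 4: 0.5, 5: 0.05, 6: 0.05}


def betting_odds_on_b(influence, b_share):
    """Fraction of my total influence that comes from worlds where B is true.

    influence[k]: fraction of distribution k's score I can affect.
    b_share[k]:   fraction of that influence that lies in theory-B worlds.
    """
    total = sum(WEIGHTS[k] * influence[k] for k in WEIGHTS)
    from_b = sum(WEIGHTS[k] * influence[k] * b_share[k] for k in WEIGHTS)
    return from_b / total


# Only distributions (3) and (6) consist of theory-B worlds; the tiny B-share
# of (1) and (4) is treated as zero here.
b_share = {1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0, 5: 0.0, 6: 1.0}

# Half of (1), all of (2), (almost) none of (3), all of (4), (5) and (6).
influence = {1: 0.5, 2: 1.0, 3: 0.0, 4: 1.0, 5: 1.0, 6: 1.0}
odds_all = betting_odds_on_b(influence, b_share)

# Variant: I can only affect the distributions conditioned on intelligent life.
influence_life = {1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 1.0, 6: 1.0}
odds_life = betting_odds_on_b(influence_life, b_share)

print(f"{odds_all:.3f} {odds_life:.3f}")  # 0.042 0.083
```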
To quickly look at the second scenario: There, once I learn that the fundamental constant is 343,551, I and logically correlated beings only have 1 in a million influence over distributions (1) to (5)[15], while I have full influence over (6). So almost all my influence comes from worlds where theory B is true, so I will bet on B as if it was almost certain to be true.
This again sounds intuitively right to me, given that B got a 1 in a million prediction right.
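The second scenario can be sketched the same way in Python. The weights are again the made-up ones from the text, in units of $p$. The name `share_from_b` and the choice to count only distributions (3) and (6) as theory-B worlds are simplifications of this sketch; the conclusion does not depend on them.

```python
# Weights of the six distributions, in units of p.
WEIGHTS = {1: 1.0, 2: 0.1, 3: 0.1, 4: 0.5, 5: 0.05, 6: 0.05}


def share_from_b(influence, b_share):
    """Fraction of my total influence that comes from worlds where B is true."""
    total = sum(WEIGHTS[k] * influence[k] for k in WEIGHTS)
    from_b = sum(WEIGHTS[k] * influence[k] * b_share[k] for k in WEIGHTS)
    return from_b / total


# After observing the constant 343,551, I am correlated only with beings who
# saw the same number: one world in a million for (1) to (5), all of (6).
influence = {1: 1e-6, 2: 1e-6, 3: 1e-6, 4: 1e-6, 5: 1e-6, 6: 1.0}
b_share = {1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0, 5: 0.0, 6: 1.0}

odds_b = share_from_b(influence, b_share)
print(odds_b > 0.9999)  # True
```

Distribution (6) contributes $0.05p$, while the other five together contribute less than $2p \cdot 10^{-6}$, so nearly all the remaining influence sits in theory-B worlds.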
Nuclear war and anthropics
People sometimes say that the fact that we haven’t had a nuclear war yet is not good evidence for the rarity of nuclear wars: if there was a nuclear war, most of us would be dead, so it’s not surprising on anthropic grounds that we haven’t seen nukes flying yet.
I think this is the wrong way to think about things. We don’t even need the full OSAC framework to see that, just the principle espoused earlier in the sequence that instead of probabilities, we should think in terms of our influence over the world.
Nuclear war is very unlikely to fully wipe out civilization: it looks likely we would still recover even after repeated nuclear wars and eventually build AGI, and then we or the AIs would still conquer the stars.
So even if the population might temporarily decrease in the worlds that suffer nuclear wars, the overall weight of importance of decisions on both kinds of worlds is the same: both are eventually determining the fate of the galaxies. Therefore, when we try to make good decisions to scope-sensitively influence the future in a positive direction, it is not justified to use anthropic updates with regard to nuclear wars.
So the fact that we haven’t had a nuclear war in 80 years is in fact pretty good evidence that nuclear wars are rare.
LHC and anthropics
Okay, but what about threats that can actually wipe out humanity?
If I understand the story correctly, there was an interesting discussion on this in the LessWrong community around 2008. At the time, there was a fringe belief that turning on the Large Hadron Collider might destroy the world. When the time came to turn on the LHC, twice in a row some technical errors occurred that delayed the start. After the second time, some people started murmuring: what if the LHC would in fact destroy the world, and the only reason we have seen the two errors is due to the anthropic principle? What if in most quantum branches everyone is dead, and only our branch is alive where the technical errors occurred?
Later, the LHC was turned on successfully, and the world didn’t get destroyed. But it’s still an interesting puzzle to think through whether people were justified in updating towards the LHC’s dangerousness after the second error.
Scientists have two competing theories of the world, A and B. A posits that turning on the LHC is fine, B posits that it would destroy the world. Both A and B posit that when you try to turn on the LHC, the world splits into a hundred equally weighted quantum branches (like it always does at every moment). Both theories posit that the LHC will turn on in 99 branches, but that in branch 53 there will be an error. According to theory A, humans will continue to be alive in all branches. According to theory B, humans only survive in branch 53, where an error occurs. Scientists turn on the LHC, and observe that an error occurred, so we are in branch 53. How much should they update in favor of theory B?
This is superficially very similar to the second scenario in Finetuning and the Presumptuous philosopher where theory B predicted that only the world with constant 343,551 can support life, and then later we in fact observed the constant to be 343,551. Should we make a similar update?
I argue: not really. The previous argument relied on the assertion that maximizing the average welfare of living beings in worlds where theory B is true is a worthy goal that deserves some weight. This is true here too. However, most successful living futures in theory B worlds are not in the quantum branches where we turned on the LHC but ran into an unlikely error. They are mostly in quantum branches where people were smart enough to figure out that theory B is true, and they’ve never built the LHC.
Therefore, the branches where LHC runs into an error only have about 1% weight both in the theory A world and in the “theory B but someone is alive” worlds, so the error is not a significant update in favor of the LHC destroying the world.
If we keep turning on the LHC and it keeps failing, and the probability of errors gets to one in a billion or so, maybe it’s time to think again.[16] I think that “maximizing the average welfare in worlds where theory B is true but humans build LHC” is also a somewhat worthy goal that should get a non-zero weight. However, I feel the weight should be very low: it’s a very specific and kind of unnatural distribution, and I’m also averse to putting too much weight on distributions defined by human activity, because I want to avoid the Obelisk-race problem. Still, I would give more than zero weight to this distribution, and after enough inexplicable LHC failures, my influence on all the other distributions would go down enough that this peculiar distribution would represent most of my remaining influence. In that case, I would act as if we knew theory B was likely true, and would want the LHC destroyed.
I don’t think that the real errors that caused the delays in the LHC were anywhere near unlikely enough for this to be a significant effect.
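To make the "keeps failing" reasoning concrete, here is a toy Python model. Every number in it is hypothetical, including the tiny weight given to the peculiar distribution, and the name `peculiar_share` is made up. It assumes that my influence on every ordinary distribution shrinks with the joint probability of the observed failures, while in the distribution "theory B is true but humans build the LHC" every survivor has seen the failures, so my influence there does not shrink.

```python
def peculiar_share(w_peculiar, w_other, p_error, n_failures):
    """Share of my remaining influence that sits in the peculiar distribution.

    w_peculiar: weight on "theory B is true but humans build the LHC".
    w_other:    combined weight of all other distributions I can affect.
    p_error:    chance of an innocent technical error on one attempt.
    n_failures: number of failed start-up attempts observed in a row.
    """
    joint_error_probability = p_error ** n_failures
    return w_peculiar / (w_peculiar + w_other * joint_error_probability)


# Hypothetical weights: the peculiar distribution gets a billionth of my caring.
after_two = peculiar_share(1e-9, 1.0, 0.01, 2)
after_six = peculiar_share(1e-9, 1.0, 0.01, 6)
print(after_two < 1e-4, after_six > 0.99)  # True True
```

With these made-up weights, two failures barely register, while six consecutive one-in-a-hundred failures leave the peculiar distribution holding almost all of the remaining influence.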
Average utilitarianism
OSAC kind of has an average utilitarianism vibe. I take various distributions, and I try to maximize the average goodness within each distribution, then sum these up with the weights of the distributions.
However, I think the recommendations are very different from how average utilitarianism is often interpreted.
When I talked with people identifying as average utilitarians, they were generally in favor of turning Earth into a lush garden of a few million people living very happy lives in a utopian community, and leaving space alone.
OSAC is pretty strongly opposed to that. If we don’t spread to the stars, we remain an insignificant part of any broader distribution. In the other quantum branches, people with different values will not laze around on Earth. The velociraptor-civilization will spread to the stars! The Thousand-year Reich will spread to the stars! The AIs who take over will spread to the stars! If we want a non-negligible fraction of the quantum multiverse’s measure to be filled with goodness according to our values, we need to keep up with the velociraptors and spread to the stars ourselves.
Staying in a garden on Earth also makes us have lower scores in the broader distributions of “all possible universes, independently of whether they contain life” and “universes that contain intelligent life, but averaged over all space-time moments in their distributions and not just averaging over experience moments”.
To be clear, I also care about the distribution of only the worlds where humanity didn’t leave Earth, and about distributions that give disproportionate weight to moments on Earth.[17] So altogether I give a not entirely negligible weight to the welfare of Earthly life: if I had to choose between life on Earth surviving but never expanding, or humanity conquering the universe with a one in a million chance, I would choose survival on Earth.
But altogether I don’t give that much weight to these distributions focused on Earth, so I still want us to spread to the stars and fill them with glory and joy.
Negative utilitarianism
Strict negative utilitarians don’t care about joy, they just want to minimize the overall amount of suffering. Naively, the ideal outcome for them would be for life to die out and for our universe to remain empty.
There are already some known counter-arguments against this: maybe if humanity survives, we can buy off some aliens or beings in other universes through acausal trade to cause less suffering. But OSAC offers a different argument for staying alive.
The argument is the same as in the Average utilitarianism section. If we wipe ourselves out and leave our universe life-less, while the velociraptors fill their universe with suffering in the branches that they rule, that results in a very bad score for the distribution of all branches with morally relevant life. If we also conquered the stars, and filled them with joy or at least mediocre life, that would make the average in the distribution much better.
(As a different phrasing: If you believe that there is such a thing as reality fluid, or at least there is some non-realist equivalent of it like in OSAC, you can try to siphon away reality fluid from the moments of suffering.)
This is the weirdest conclusion of OSAC so far, and the one I’m least comfortable with. Filling our own stars with people doesn’t help the victims in the velociraptor-dimension, so why would it be good from a negative utilitarian perspective?
I think people could reasonably argue that even if they accept something like the OSAC framework, they don’t accept the step where worlds that have life have a distribution of their own in which it’s important to get a good average. They can say that they only care about having good results in the physically defined distributions, like the distribution of everything in the quantum multiverse, independently of whether or not it has life. Then, humanity filling the galaxies doesn’t change the weight of the suffering in the velociraptor-dimension.
I think that might be a tenable position, and I in fact give a lot of weight to physically defined distributions. But my intuition says that I should give some weight to averaging over living beings too. And given that I expect that most space-time moments in the physical universes will never get filled with anything morally important, I expect that our influence will be small on these purely physical distributions, moving from filling 0% of the space to 0.00001%.[18] So I tentatively think that most of my influence on my utility function will come from averaging over experiences within certain living distributions.
Giving weight to distributions that average only over living worlds also produces saner results in Finetuning and the Presumptuous philosopher: in particular, it allows updating towards theory B when it gets a one in a million prediction right.
So my current position is that it’s good from a negative utilitarian perspective to spread to the stars if we believe we will improve the average, and will cause less suffering in expectation than the average civilization spreading to the stars in our quantum multiverse.[19]
(I’m unfortunately very uncertain though how we compare to the average civilization.)
Why are we so early in the universe?
This section is less important for illustrating OSAC than the other sections. I also didn’t do the full math, which makes the section a bit rambly. So I put it in collapsible mode, but I still think it’s interesting, and I would be excited to see someone redoing the calculations in the Grabby aliens paper with these assumptions.
Generalizing the Grabby aliens argument
We are surprisingly early in the history of the universe compared to the lifespan of stars. If I understand Hanson’s Grabby aliens paper correctly, it estimates that according to various calculations, we should expect that 99% of intelligent life emerges after us.
Being in the first 1% is not very shocking: 1% probability events happen all the time. Still, it’s somewhat surprising and worth investigating.
I have a distribution of weights on different possible universes with different underlying laws. One thing that falls out of the laws of each universe is what fraction of planets in any given year give birth to intelligent life. I don’t know this distribution, but I can make some estimates based on the available science.
What kind of anthropic updates should I make, starting from this initial distribution?
The distribution has a section that posits that intelligent life emerges rarely enough that it should happen in expectation less than once in the history of our reachable universe. This exactly corresponds to the situation described in Finetuning and the presumptuous philosopher. Following the logic there, I want to make some updates against theories that say that most universes are devoid of life, but I don’t want to make a very drastic update, because it’s still important for me to influence the sub-distribution of worlds where the laws of nature imply rare life but where the world still has life.
So maybe altogether I make a 3x update against the part of the distribution that implies that the reachable universe should have less than one intelligent species in expectation, and now I have about ⅙ weight on such worlds. In those worlds, there is no anthropic update on when I would expect us to be in the history of the universe.
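As a small worked example of this step in Python, read "a 3x update against" as dividing the odds by three. That reading is an assumption of this sketch, as is the starting weight: the text does not state it, and 3/8 is simply the value that lands exactly on ⅙. The name `update_against` is made up.

```python
from fractions import Fraction


def update_against(prior: Fraction, factor: int) -> Fraction:
    """Divide the odds of a hypothesis by `factor` and return the new weight."""
    odds = prior / (1 - prior)
    new_odds = odds / factor
    return new_odds / (1 + new_odds)


# Assumed starting weight on "less than one intelligent species in expectation".
prior_rare_life = Fraction(3, 8)
posterior_rare_life = update_against(prior_rare_life, 3)

print(posterior_rare_life, 1 - posterior_rare_life)  # 1/6 5/6
```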
However, I put ⅚ weight on worlds where intelligent life is spawned frequently enough that multiple civilizations can emerge in the reachable universe. Then, the civilizations that emerge earliest will be able to conquer the most resources.
One particular aspect of this, discussed in the Grabby aliens paper, is that the planets that would spawn intelligent life too late just won’t be allowed to develop a civilization at all, because they will be conquered by another alien species before they could spawn intelligent life.[20]
In my framework, the update is even stronger than what is discussed in the Grabby aliens paper, since earliness doesn’t only give a binary update (whether the civilization even has time to emerge before being conquered), but each possible time of emergence should be weighted by how many resources a civilization emerging at that time should be expected to conquer.
Someone could redo the Grabby aliens calculation with this in mind, though I expect the conclusions will be pretty similar.
Altogether, I tentatively agree with the conclusion of the Grabby aliens paper that we should expect to meet the aliens in a few billion years but not earlier.
Conclusion
Currently, OSAC is the best framework I have for thinking about probabilities, anthropics and infinite ethics.
One major drawback I see is its subjectivity: I’m worried that for many dilemmas like the ones I listed, I can just make up ad hoc weights for distributions to care about until the answer comes out what I wanted in the first place.
I think this flexibility is partially a virtue (a philosophical framework should be able to accommodate our intuitions to some extent), but partially a danger: just like scientific theories should have predictive power, a philosophical framework should also be able to pay rent by helping to arrive at non-trivial conclusions.
I am personally mostly satisfied with OSAC in this regard: for some of the examples listed above, I didn’t have an answer before starting to think about them in OSAC’s terms, but I’m pretty satisfied with the conclusions that were produced.
However, I would be interested in testing with other people whether we come to similar conclusions in philosophical dilemmas not listed in this post (e.g. some classic paradoxes of infinite ethics; the question of running simulations on thicker wires; or the two-envelopes problem in moral weights), if we both try to work from the OSAC framework. I expect yes if both people think deeply enough, but I have some uncertainty.
I’m curious about people’s objections and proposed alternatives to the framework. I’m currently at the stage where, after long iteration on different theories, I can’t come up with obvious counter-examples invalidating the OSAC framework, but I expect it’s likely that someone will come up with one, and then I would need to iterate further.
Let us hope that through all this iteration, one day we will reach a stable point.
Also more mundane and personal ways for setting up good reflection processes: trying to personally become a wiser and better person in my day-to-day life.
The name is very weakly inspired by Wei Dai’s proposal of UDT-UMC as a name for the thing I presented as non-realist UDASSA in my last post, except that I wanted the name to be pronounceable and Universal Measure of Care was not very applicable to my framework.
Not actually a great example: a real example would be a distribution of possible worlds, and a distribution of points within them. But I’m going with this example for simplicity.
Which, again, doesn’t only involve sitting alone and thinking - I expect it to involve having children, trading with different beings, building things together, etc.
You say it’s impossible to define and rank dramaticness? But we are already talking about maximizing the goodness of worlds, so I feel introducing one more imprecise, human concept hardly makes things worse.
The 0.1% number is pretty made up, I don’t have a strong take on how big it really is for me. I will keep using this number later, but eventually it will appear on both sides of an equation so it doesn’t really matter what it is.
One note here: The logic I described above only justifies giving low weight to the expectation that we can influence distributions of worlds that are not mathematically simple, e.g. through being in a simulation run by a dramatic deity. When we exert our influence on the mathematically simple worlds, the same logic does not imply that we should put a low weight on influencing things through acausal trade and through being in simulations run by other mathematically simple worlds. Unlike in the case of the mystical distributions, I think we have a pretty convincing story of how acausal trade between different worlds within the mathematical distribution could work. In fact, I put quite high weight on this type of influence.
At least I think that’s the case now while I’m a confused mortal. During the Long Reflection, maybe I will come up with better ideas about exactly which mystical distributions I care about and whether there is any more concrete plan for making things better for them.
I’m not logically correlated with people who learned that their fundamental constant was 122,101: it’s a crucial difference that I see my number matching the one posited by theory B while they don’t.
As always, it is very hard to say what UDASSA’s opinion is on something, but I think in this case it probably agrees. Pointing to the cradle of civilization is plausibly relatively simple, so Earth gets more weight in UDASSA than the average random planet we will colonize.
Space is mostly empty vacuum; there is just not that much matter and energy in the world to fill all of space-time with joy. And if the distribution is not based on literal space and time, then on what? Weighing moments by their description lengths? But then you can in fact siphon away measure from the suffering in the velociraptor universe by building bigger obelisks than them, thus making it easier to point to you. I think that’s similar to caring only about the average in the distribution of living beings, except I feel it’s sillier.
There still needs to be some margin by which we need to be better than average, to counter-balance the purely physical distributions, where spreading is simply bad from a negative utilitarian perspective. But I think this margin is likely not very big.
I will ignore the zoo hypothesis that the aliens are already here and just chose not to reveal themselves. The same argument applies that I expressed about simulations in my last post: if we are in a zoo, we should expect to have very little impact on the future, so from a scope-sensitive perspective, we can largely ignore the possibility.
(This is the last post in my sequence. Reading the previous post on Infinite ethics and UDASSA is necessary for understanding this post. Reading the first post, Probabilities are not the right concept, is not necessary but recommended.)
Introduction
OSAC
So far, what I’ve described are quite conventional views. The thing I’m adding in this essay is that I think about which distributions of worlds and moments I want to optimize for[3] in the same way I think about morality.
In my previous essay, I described that whenever I’m deciding whether to bring an umbrella with me, I’m helping some versions of myself in the multiverse and hurting some others. I argued that there was no reality fluid that objectively determined which versions of myself were more real, and it was a personal value judgement which versions I wanted to help more. Furthermore, I argued that I didn’t even accept Scott Garrabrant’s proposal of allocating my subjective caring to worlds in proportion to their mathematical simplicity - that view had some paradoxical consequences and I also found it generally unappealing.
Instead, I propose a tentative framework I call OSAC, standing for Optimization with Subjective Allocation of Care.[4]
(My impression is that there are a bunch of people going around saying “I believe in something vaguely like UDASSA” even when they have serious disagreements with the theory. My hope with giving an acronym-able name is that if some of these people like this post, they can switch to saying “I believe in something vaguely like OSAC”.)
The idea is that I imagine the approximation of my idealized allocation of subjective caring to different worlds and moments as a weighted set of distributions I care about. In each distribution, the utility that can be attained under the distribution is always bounded.
For example, if one distribution is “all space-time points in the universe, with a time discount rate of 1% per billion years starting from the Big Bang”[5], then I get a full score for this distribution if we manage to fill the whole universe with joy. If I originally put a 2% weight on this distribution, then I can fulfill 2% of my overall utility this way.
If I learn that there is no way for me and other beings logically correlated with me to get control over an appreciable fraction of all space-time moments, then I need to give up on this 2% of my utility function, and look for meaning in the other distributions. On the other hand, if I learn that space contains a hundred times as many stars as I previously thought, that doesn’t increase the weight of this distribution to 200%. The value I can get from each distribution is ultimately bounded and capped at its weight.
As I previously discussed for UDASSA, here is the ideal version of my decision process:
I look at all the distributions of worlds I care about, look at all the actions in all these worlds that are logically correlated with my current decision, and I look at all their consequences. I sum up the positive and negative effects, weighted by their measure within the distributions, and the weights of the distributions.
For example, if the consequences of an action lead to filling 0.003 instead of 0.002 fraction of space with joy 13 billion years after the Big Bang, that increases the score of the above-described distribution by the extra 0.001 fraction multiplied by the measure the distribution assigns to that moment. And given that we assumed that this distribution has a 2% weight, it increases my overall utility by 2% of that amount.
Then the decision has other effects in other distributions, and we can sum over the weighted effects. I want to choose the decision that results in the highest sum.
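As a rough illustration, the decision rule can be sketched in a few lines of Python. Everything named here (`Distribution`, `overall_utility`, `best_action`, the "everything_else" bucket, and all the scores) is a hypothetical stand-in of mine, and I am assuming each distribution's score lives in the range 0 to 1; the real allocation of caring is of course not something I can write down yet.

```python
from dataclasses import dataclass


@dataclass
class Distribution:
    name: str
    weight: float  # share of my overall caring (hypothetical numbers)


def overall_utility(distributions, scores):
    """Weighted sum of per-distribution scores, each clamped to [0, 1]."""
    total = 0.0
    for d in distributions:
        # Bounded utility: a distribution can never contribute more than its weight.
        score = min(1.0, max(0.0, scores[d.name]))
        total += d.weight * score
    return total


def best_action(distributions, outcomes):
    """Pick the action whose predicted scores give the highest weighted sum."""
    return max(outcomes, key=lambda a: overall_utility(distributions, outcomes[a]))


distributions = [
    Distribution("discounted_spacetime", 0.02),  # the 2% example from the text
    Distribution("everything_else", 0.98),       # assumption: lumps all other caring
]
# Made-up predicted scores of each action under each distribution.
outcomes = {
    "act": {"discounted_spacetime": 0.003, "everything_else": 0.5},
    "wait": {"discounted_spacetime": 0.002, "everything_else": 0.5},
}
chosen = best_action(distributions, outcomes)
```

The clamp is what encodes the cap: learning that there are a hundred times more stars cannot push the contribution of the 2% distribution above 2%.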
(Originally, I imagined the different distributions being represented by different people in my moral parliament and negotiating with each other and winning and losing negotiation chips in bets with each other over things I observe. However, I couldn’t really find any thought experiments where the moral parliament picture would have added value over the simpler picture I’m using now, where the different distributions are just branches of my utility function with different weights, and ultimately everything is additive. But I’m still fond of the moral parliament picture, so I’m interested if people can point to examples where it adds value over the simpler picture.)
In practice, during my mortal life, most of the time this just means working on increasing optionality for myself and people similar to me (again, a few examples are making sure humanity doesn’t lose control and that we end up having good AI advisors). The discussion in my previous essay of how to think about all the logical correlations as a mortal applies here too.
I hope to punt most of the big decisions to the Long Reflection[6]. It will be important to eventually figure out the exact shape of my prior - how much I care about which worlds and moments - but this doesn’t feel very different to me from figuring out my moral values, which I’m largely punting to the future anyway. For now, when I need to do something not purely for increasing optionality, I need to rely on approximations of how I imagine the eventual assembly of distributions I will converge on caring about after the Long Reflection.
For the rest of this essay, I will write about my current best guess of this approximation, and what its implications are.
Loving the mystical
At the end of my last essay, I introduced Scott Garrabrant’s description of loving worlds in proportion to the simplicity of their underlying mathematical laws, thus getting back basically the equivalent of Solomonoff induction. I expressed my disagreement:
Yes, I care a little bit about mathematical simplicity - I’m a mathematician by training, and I find simplicity aesthetically compelling. But I don’t feel like mathematical simplicity is very unique among the things I care about.
Instead of saying that I care about the goodness of the worlds weighted by how simple mathematical laws describe them, I could choose totally different weightings. I could rank possible universes (and moments within them) by how dramatic they are.[7] Then I could give ½ weight to the most dramatic universe, ¼ to the second most dramatic, and so on, the weights adding up to 1. And I could say that I try to maximize the goodness of worlds weighted by these dramaticness-weights. To me, using dramaticness sounds approximately as compelling as using mathematical simplicity for the weighting.
I stand by this claim. I think that the distributions that are well-described as some version of “care about worlds in proportion to their mathematical simplicity” only have like 0.1% weight in my overall caring.[8]
Otherwise, I care about all sorts of diverse distributions, based on all sorts of human concepts. I call these non-mathematical-simplicity related distributions “mystical distributions”, for lack of a better name, but I don’t want to dismiss them. After all, I really do believe that most of my caring is concentrated on them.
I care about the distribution where the highest weights go to the most dramatic moments that have the most dramatic causal history, going back to the most dramatic way a world could be created. I care about the distribution where worlds and moments are weighted in proportion to their goodness itself: I care about making the best of all possible moments in the best of all possible worlds even better. I care about the distribution where weights are inversely proportional to goodness: I care about making the worst of all possible moments a little less bad. I care about bringing more goodness to the worlds where truth is the most knowable, more understanding of truth to the most beautiful worlds, and more beauty to the best of worlds. There are so many possible weightings of worlds to care about, not just mathematical simplicity.
All of this mysticism might sound a little weird (and don’t worry, after some updates we will largely get back to normality), but I stand by the prior being mystical and weird.
It feels more serious to talk about loving the mathematically simple worlds more, like Scott Garrabrant does, but I don’t believe it adds much real value if the initial distribution of caring is formal and mathematical.
Once I decided that I didn’t believe in the reality of the reality fluid, the allocation of caring is just part of my moral philosophy, and the rest of my moral philosophy is not mathematically formal either.
I also think that all formally defined distributions will be vulnerable to trickery like the obelisk-race or world-summoning by writing, and if I want to avoid that, I will need to allow some judgement calls.
And fixing a mathematical prior feels fishy in other ways too.
One day you hear a voice from a burning bush: "I am the Lord, thy God, and I'm telling you that this is the most just and merciful world. The world is relatively ordered, because I believe some order is necessary for justice, but the world is by no means only governed by simple physical laws, it's full of spirits and miracles. Also, here is a great explanation to the Problem of Evil, which shows that everything happening in this world is actually most just and merciful."
Would you just walk away and say "Sorry, I already picked my prior to only care about mathematically simple worlds"?
I think picking your prior to only care about mathematically simple worlds is cheating. You didn't come up with this prior floating in the void: it became appealing because you have already seen physics experiments in the world and internalized the Occam's razor intuition. That is, you already updated on things. But then I think the correct framing is not to start from a purely mathematical prior, but to start from a broader prior of things that sound appealing, and then update based on the world looking physically simple.
I describe this updating process in the next section.
Updates on the mystical and the mathematical
How much should caring about all these mystical distributions influence my everyday behavior? I think only a little: I have observed many updates that suggest I don’t have that much influence on these distributions.
The portion of me that cares about the goodness of worlds weighted by the worlds’ dramaticness primarily cares about making good decisions if I observe an incredibly dramatic world around me. That doesn’t really match my observations. If I had been making predictions based on the assumption that the most dramatic outcome is the most likely, I would very often have been wrong.
Meanwhile, the assumption that we live in a materialistic universe governed by a few simple physical laws has a great track record. In the most dramatic possible worlds, there is no reason why every object should fall with the same gravitational acceleration. Why shouldn’t there be dragons who magically defy gravity? Dragons are pretty dramatic.
So the portion of me that cares about the most dramatic worlds needs to fall back to explanations like:
It sounds counter-intuitive, but maybe in the fullness of time we will realize that the world seeming to be explainable by a few simple rules is needed for full dramaticness. Or maybe there is a deity in the most dramatic possible world, who believes that creating people living in a seemingly mathematically explainable but not obviously dramatic world is the best way to raise His inheritors. He will pull us out to the most dramatic possible world and let us help shape it towards goodness.
But why would it be seemingly simple physical rules in particular that we will somehow realize to be crucial for the world to be dramatic? And why assume the deity in the dramatic world would decide to create people in a seemingly mathematically simple one? Why not in worlds optimized for let’s say beauty instead of mathematical simplicity? There are many options that feel like they deserve at least as much weight as simplicity.
So if I gave 0.1% weight to myself terminally caring about mathematically simple worlds, then I think for any particular X, I should only give 0.1% weight to the assumption that the best way to influence the worlds with X is somehow mysteriously through getting influence when it looks like the world around me runs on mathematically simple rules. Actually, I think even less than 0.1%, because “the best way to influence X is through a seemingly unrelated Y” is inherently a somewhat unnatural story, so let’s give these theories a 1/4 penalty.
This means that for creatures whose observations seem plausibly explainable with simple physical rules, about 0.1% of their influence on my utility function comes from directly influencing the 0.1% of my caring terminally focused on mathematically simple worlds. Meanwhile, about $99.9\% \times 0.1\% \times \tfrac{1}{4} \approx 0.025\%$ of their influence comes from influencing the other 99.9%, the mystical distributions, when the best way to influence them is by taking actions in a seemingly mathematically simple universe.
So once I made the update on seeing the world being seemingly simple, only about 20% of my influence comes from influencing the various mystical distributions.
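To make the arithmetic behind the 20% explicit, here is a small sketch using the numbers from the text (0.1% terminal caring, 0.1% for each "X is best influenced through simple-looking worlds" story, and the 1/4 unnaturalness penalty). The function and variable names are my own illustrative choices.

```python
from fractions import Fraction


def mystical_share_after_simplicity_update(
    terminal_simple=Fraction(1, 1000),    # 0.1% of caring is terminally about simple worlds
    routes_via_simple=Fraction(1, 1000),  # weight that a mystical X is best reached this way
    unnatural_penalty=Fraction(1, 4),     # haircut for the "seemingly unrelated Y" story
):
    """Share of my influence that flows to mystical distributions,
    given that the world around me looks mathematically simple."""
    direct = terminal_simple
    indirect = (1 - terminal_simple) * routes_via_simple * unnatural_penalty
    return indirect / (direct + indirect)


share = mystical_share_after_simplicity_update()
print(float(share))
```

With these inputs the direct channel is 0.1% and the indirect channel is about 0.025%, so the mystical distributions end up with roughly a fifth of the total.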
Then I take into account that I happen to live in a time that seems especially crucial for the world. From the perspective of the 0.1% of my values that terminally care about universes that look like ours, it seems relatively clear that I’m in a much more leveraged position to make the world better than either a 13th century peasant or a digital mind living in the Andromeda galaxy a trillion years after the Singularity.
But when I assume that I terminally care about dramatic worlds, but also assume that a deity in the dramatic world will pull me out of this world if He likes me, then it’s unclear if I’m in a better position than the peasant or the mind in Andromeda.
Still, it seems plausible that many stories on influencing the mystical distributions route through getting a lot of influence in seemingly mathematically simple worlds in a scope-sensitive way, so I think that affecting the mystical distributions still holds 10% of my influence after this update.
(Ethical theories that are not scope-sensitive get massively downweighted though: among the people in all sorts of possible worlds and times and situations who are thinking about decisions in terms similar to the ones I’m describing here[9], I think I’m unusually well-placed to do scope-sensitively important things, so I should focus on that, while they can to some extent focus on other things.)
The next thing I take into account is cluelessness about how to do good according to various parts of my utility function. If I assume that I want to influence mathematically simple worlds, and can therefore assume that the world is as it seems, then at least I have some uncertain ideas about what to do.
But once I’m operating under theories like “maybe a deity in the most dramatic possible world has created a mathematically simple world for some inscrutable reason” as one of the thousands of equally weighted stories, it is just really incredibly hard to know what is good. Therefore, my decisions have much less expected influence on the mystical distributions. This means that when I’m making a decision, I think overall only 1% of the influence comes from affecting the mystical distributions.
Obviously, all these numbers are kind of made up, but the conclusion feels roughly right to me: I should mostly work on making the mathematically simple worlds better, but I should have a not entirely negligible weight on affecting other distributions.[10]
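For what it's worth, the chain of shares (20%, then 10%, then 1%) can be restated as multiplicative updates on the odds of mystical versus naturalistic influence. The sketch below only back-derives the implied factors from the three shares given in the text; the factors themselves are not numbers I chose independently, and the helper name `odds` is illustrative.

```python
from fractions import Fraction


def odds(share):
    """Convert a share of total influence into odds (mystical : naturalistic)."""
    return share / (1 - share)


# Shares after: seeing a simple world, the leverage update, the cluelessness update.
shares = [Fraction(1, 5), Fraction(1, 10), Fraction(1, 100)]

# Implied factor by which each later update multiplies the odds.
implied_factors = [odds(after) / odds(before) for before, after in zip(shares, shares[1:])]
print(implied_factors)
```

Read this way, the leverage update roughly halves the odds of the mystical distributions, while the cluelessness update cuts them by about an order of magnitude.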
What to do for the mystical distributions?
If I believe that about 1% of my influence on my utility function comes from making the various mystical distributions better, that means I should spend at least a bit of effort optimizing for them, doing the most leveraged things and picking the lowest-hanging fruits for making things better for them.
However, I have huge uncertainty over what to do to make things better under all these strange, competing distributions, which all assume that the world is not quite what it looks like.
The only thing I can think of for making things generally better is acquiring more virtue. By virtue I mean becoming the kind of person who makes good decisions under very uncertain, very new circumstances. Being the kind of person who can do the reflection well, and who can be trusted with power under unforeseen conditions, like being given control of a slice of the most dramatic of all possible worlds after being woken up in the afterlife by a strange deity.
This is in large part applicable even for making the distribution of mathematically simple universes better. Even assuming that the world is roughly as it seems, consequences are hard to predict, and a good way to work towards robustly doing good things is by cultivating virtues. And I think even under a more conventional world-view, a non-negligible fraction of my future influence comes from surviving through the singularity into a world I barely understand, which I will need more wisdom to navigate.
But still, the parts of me optimizing for the mystical distributions care more about virtue than the ones optimizing for the mathematical distribution. Under the mathematical simplicity assumptions, I can be more confident about the world's structure and prioritize consequentialist actions more: find the big levers today that influence tomorrow. For helping the mystical distributions, this route is not really available, and the only thing that remains is cultivating wisdom and virtue.
What kind of wisdom, and what kind of virtue? I don’t really know, but all the classical answers seem good.
Developing good epistemic practices—wading through unfamiliar and confusing fields, developing an independent sense of truth. Bravery, kindness, honesty, fairness, friendliness, moderation. Faith, hope, and love. Keeping relationships alive, staying human. Avoiding heinous acts that would corrupt my character and lead to rationalizing that evil is actually good. Being the kind of person who, given power in a strange new world, would use it well rather than being destroyed by pride and self-deception. Being someone who would choose Heaven over Hell if they were a character in The Great Divorce.[11]
But also: utilitarianism and scope sensitivity. I have a bad feeling that people who talk too much about virtue ethics often end up not being very scope-sensitive. I think that’s a mistake on their part. Scope sensitivity - actually finding big levers and pushing them - is one of the highest virtues and pretty undersupplied in the world.
And many scope-sensitive actions can be directly valuable to the mystical distributions too. If I believe that increasing my own wisdom and virtue is important, then doing so for other people is likely to be important too. I’m less convinced of consequentialist plans, or even that other people really exist, under the mystical distributions, but I still believe in it to a significant extent.
Raise the sanity waterline. Help the poor, because healthier people are often better people. Get good AI advisors for everyone. Shape our AIs themselves to be the kind of virtuous and benevolent beings that can make decisions under surprising revelations. All the usual good things.
Overall, I think that optimizing for the mystical distributions leads to very similar actions as optimizing for making the mathematically simple worlds better.[12] And in any case, my current guess is that only 1% of my influence comes from affecting the mystical distributions.
Still, I should maybe call my grandparents a bit more often than would be strictly optimal if I was purely focusing on scope-sensitive utility maximization in a materialistic universe.
A note on existing religions
You can skip this section if your absurdity heuristic is strong enough that you are not tempted to consider following existing religions. I was tempted enough to at least think it through. (The conclusion is that I’m not converting to religion but I think it’s not entirely absurd that it’s the right choice for some people with certain values.) I wrote down my thoughts here, but it might not be interesting for everyone, so I put it in a collapsible section.
I give about 1 in 10,000 weight to an existing religion being true
Once we entertain mysticism - "maybe I care about making the best possible world better" and "maybe the world's apparent materialism is intentional misdirection by a powerful entity" - there are echoes of traditional religion.
One of the main reasons I don’t expect to have much influence over the mystical distributions is cluelessness. But existing religions potentially reduce that cluelessness: there are prophets, holy books, specific instructions. So what fraction of my influence comes from influencing worlds where some existing religions are true?
Pretty low, I'd say. Several reasons:
First, there is the Problem of Evil. In my ontology, that’s just the same update that brought down the influence of all mystical distributions to 20%: I need to assume that God for some reason really cared about the world being seemingly governed by simple physics, which inevitably includes some hurricanes. The weight I put on an all-loving God really caring about mathematically simple structures is the same order of magnitude as the weight that I put on caring about mathematical simplicity myself; though I think it’s somewhat lower for God. So we are at 20%.
Second, the theory is additionally stretched. Beyond the normal two steps of reasoning required to influence mystical distributions ("there's some particular types of worlds I care about" and "for unknown reasons, this seemingly materialistic world is a good place to gain influence over those particular types of worlds"), there's the extra assumption that God decided on a specific half-measure - making the world overall roughly materialistic-looking and not fully revealing Himself, but also still sending some prophets to let His will be known. This is a very particular story for how I can influence mystical distributions I care about - I feel this is a 100x haircut, and now I’m down to 0.2%.
Third, I'm in a position of unusual influence over the long-term future, and most existing religions are not very scope-sensitive about things like conquering the galaxies. This means that the relative importance of my actions is much lower under most religious world-views than under the standard materialistic world-view, which is maybe a 5x haircut, so now religions are down to 0.04%.
Fourth, the religions I've encountered, while having many good features, miss many of the things I care the most about. I think this is only a 3x haircut though.
There is also the consideration that on the one hand, I feel that in the religious universes it’s more likely I will be satisfied with morality having a good resolution, while in the atheistic universes, it’s more likely that I will ultimately find everything meaningless. On the other hand, for exactly the same reason (there is already a clean morality which God knows), in the religious universes it’s more likely that God leads everything to the right outcome anyway, and my influence doesn’t matter. I would say it’s a wash, and I don’t make an update here in either direction.
So overall, I think only about 1 part in 10,000 of my influence over my utility function comes from acting as if one of the existing religions were true. This is not very high in absolute terms, and doesn’t really affect my behavior.
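The successive haircuts can be multiplied out as a sanity check. This is only a sketch with the factors stated above (starting at 20%, then 100x, 5x and 3x); the function name is an illustrative choice.

```python
from fractions import Fraction


def religion_share(start=Fraction(20, 100), haircuts=(100, 5, 3)):
    """Apply each haircut in turn to the starting share of influence."""
    share = start
    for factor in haircuts:
        share /= factor
    return share


share = religion_share()
print(share, float(share))
```

Multiplied out literally, the factors give 1 in 7,500, which is the same order of magnitude as the round 1 in 10,000 figure; given how made-up the individual haircuts are, I don't think the difference means anything.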
If someone asks me for the probability that Jesus rose from the dead, I need to answer that this is not the type of question where I feel comfortable using the abstraction of probabilities (see the discussion on the probability of Jesus’ resurrection in a previous post), but if I’m pressed, I will answer 1 in 20,000 as the least misleading answer, which I think is higher than what most people on LessWrong would say.
(Importantly, it’s not really a probability though. For example, one can’t do Pascal’s Wager and say that the rewards under the 1 in 20,000 religious worlds are infinite, while the outcomes in the materialistic worlds are finite. That’s why it’s important that the utilities are bounded under OSAC: a distribution with 1 in 20,000 weight can’t just outbid everything else by claiming to be infinite. It’s 1 in 20,000 of the overall influence, and that’s how much it gets.)
Interestingly, it’s really hard for new evidence to change this 1 in 20,000 number. If I read about the Fatima sun miracle and find it unexpectedly hard to find a naturalistic explanation, that barely moves the needle.
The hypothesis already posits that God occasionally performs some miracles, just enough to maintain faith but making sure He never makes His existence obvious to the whole world. I don’t understand why God is doing that, and I don’t know what the optimal level of miracle-making is from God’s perspective. I have some distribution over how weird I expect the weirdest-looking events to be in a materialistic universe, and how weird I expect them to be in a universe where God wants to give some signs but doesn’t want to fully reveal Himself. The distribution of expected weirdness, assuming God’s existence, is somewhat shifted to the right compared to the distribution I’d expect in a materialistic universe, but not a lot - after all, He doesn’t want to make things obvious. If I learn about an event that is somewhat more miraculous-seeming than what I have so far heard about, that gives some extra evidence to theism, but not very much.
So far, the evidence from miracles hasn’t moved me to make a significant update (I’m not even convinced that our world looks more miraculous than the median materialistic world), and I don’t plan to spend too much time looking at evidence of miracles, given that I don’t expect them to produce much evidence for the above-mentioned reasons.
Overall, 1 in 10,000 is a pretty small effect, so religions don’t influence my life very much, other than making me murmur an occasional prayer. But I can see people putting higher weight on religion due to genuine value differences, and I think we should be cautious about calling that inherently crazy.
Loving mathematics
After making all these updates, I now believe that 99% of my influence over my utility function comes from affecting the distributions that weigh worlds and moments based on “naturalistic” considerations: how simple the laws of physics are, how far in time a moment is from the Big Bang, and things like whether there is intelligent life at all in that universe. The boundary between naturalistic and mystical distributions is not very clear, but the naturalistic distributions are roughly those where no special explanation is required for why living in a seemingly materialistic universe, similar to ours, is a good place to affect these distributions.
The assembly of distributions
There are many different distributions I care about. As a general rule, the broader and more generic the distribution is, the bigger the weight it gets, but more specific distributions get some weights too.
The concept is very similar to what I described in the Solomonoff over distributions section in my last essay. There are very broad, simple-to-describe distributions that get a lot of weight. For example: “universes whose laws require N bits to describe get $2^{-N}$ amount of caring, and I distribute my caring among the moments within each universe according to a distribution generated by a simple program taking a standard normal random variable as an input”.
Then the narrower and more specific distributions get smaller weights. These narrower distributions can specify simple mathematical properties, like “only the universes from the distribution that have a particular symmetry property”, or more human-level conditions like “only the universes in the distribution where intelligent life develops”, or even things like “only universes where the intelligent life never develops nuclear weapons”. The more specific and the weirder the definitions of distributions are, the less I care about them. What counts as weird is a subjective decision determined by me: I maintain this is not any worse than the fact that I already need to determine my morality subjectively. But as a general rule, if a distribution is cut into two smaller distributions, the sum of the weights of the two smaller distributions should be less than the weight of the bigger distribution.
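A minimal sketch of the two rules of thumb in this section, assuming a Solomonoff-style weight of 2^-N for a universe whose laws take N bits to describe (that exact form, the 10-bit example, and the 0.5 and 0.3 split are all assumptions made for illustration, not values I am committed to):

```python
def solomonoff_style_weight(n_bits):
    """Assumed weight for a universe whose laws take n_bits to describe."""
    return 2.0 ** -n_bits


def split_respects_rule(parent_weight, child_weights):
    """True if the narrower distributions together weigh less than the parent."""
    return sum(child_weights) < parent_weight


broad = solomonoff_style_weight(10)  # hypothetical 10-bit class of universes
# Made-up subjective weights for two narrower cuts of the broad distribution.
with_life, without_life = 0.5 * broad, 0.3 * broad
ok = split_respects_rule(broad, [with_life, without_life])
```

The check is deliberately one-sided: the rule only says the pieces must add up to less than the whole, and how much less remains a judgement call.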
Examples
All of this is probably pretty confusing now, so I think the best way to get across what I mean is walking through a number of philosophical paradoxes and trying to explain how I think OSAC handles them.
Given OSAC’s inherently subjective nature, many of the solutions depend to some extent on personal value judgements, and you might come to different conclusions in some cases due to differing moral intuitions. However, I believe the framework is still useful to get a handle on otherwise very confusing problems.
Boltzmann brains
Here the argument is basically the same as what I presented with UDASSA. If the universe will exist for infinitely long, it’s not possible to put equal measure on all moments in it. So I need to choose some kind of arbitrary distribution of how much I care about the moments within the universe. For example, these distributions can use time-discounting from the Big Bang, or description-lengths of the space-time moments.
Some of these distributions might be very broad and might contain overwhelmingly Boltzmann brains. However, only an astronomically small fraction of Boltzmann brains in these distributions observe ordered experiences. This means that if I’m a mind observing ordered experiences, then whatever I do has only an astronomically small effect on the overall utility according to the distributions that mostly contain Boltzmann brains. Meanwhile, I and beings logically correlated with me have a decent amount of influence on the distributions that put most measure on experience moments that are fairly easy to describe, i.e. ones that are born through a reasonable causal chain rather than just appearing in the heat death soup as Boltzmann brains.
Therefore, even if initially only a small fraction of my caring was allocated to distributions that contained mostly non-Boltzmann brains, once I get some ordered observations, the overwhelming majority of my influence comes from the non-Boltzmann brain distributions, so I act as if I was not a Boltzmann brain.
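A toy calculation shows how lopsided this becomes. All four numbers below are invented purely for illustration (I have no real estimates for them): a 99% weight on Boltzmann-dominated distributions where minds like me control a 1e-50 fraction of the measure, versus a 1% weight on causally ordered distributions where we control a 1e-10 fraction.

```python
def influence_share(weight_causal, leverage_causal, weight_boltzmann, leverage_boltzmann):
    """Fraction of my total influence that flows through the causal (non-Boltzmann)
    distributions, where influence = weight of distribution x fraction I can affect."""
    causal = weight_causal * leverage_causal
    boltzmann = weight_boltzmann * leverage_boltzmann
    return causal / (causal + boltzmann)


# Hypothetical inputs: small weight but real leverage vs. large weight but no leverage.
share = influence_share(0.01, 1e-10, 0.99, 1e-50)
print(share)
```

Even with 99 times more caring allocated to the Boltzmann-dominated distributions, essentially all of the influence of an ordered mind runs through the other ones.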
Obelisk-race and world-summoning
In my previous essay, I explored two paradoxes of UDASSA where people can take actions to increase the realness of their preferred worlds and moments.
Under my OSAC framework, I will simply say no to these shenanigans. Yes, if you build a bigger obelisk than any other alien civilization, your moments will have a shorter description-length. Very clever, but I don’t care.[13] I don’t need to, right now, mathematically precisely define the distributions I care about. I’m only trying to approximate what I will eventually decide to care about. Even if the eventual distributions I care about will take into account moments being simple to describe (for example to stave off Boltzmann brains), I feel pretty confident that they will have some kind of a “no shenanigans” clause, and building giant obelisks or plastering the equation of a world across the galaxies won’t really change my caring.
Is this solution not mathematically elegant? Maybe, but I don’t think I’m under any obligation to make my moral theories mathematically elegant.
Finetuning and the Presumptuous philosopher
Some physicists theorize that some of the fundamental constants of our universe lie in the only narrow range that allows life to arise in the universe. How should we relate to such theories?
One answer can be to accept the argument, no questions asked: it is no surprise at all that the universe we live in has parameters compatible with life; after all, we are alive.
On the other hand, some get uneasy about anthropic arguments for finetuning being used as a curiosity-stopper: every time we don’t understand something, we can say “I don’t know, maybe it’s the only arrangement compatible with life”. That doesn’t sound right.
To understand the problem better, let’s look at an example scenario. One might notice that this is basically the same as Bostrom’s Presumptuous philosopher thought experiment. (For the purpose of these example scenarios, I will fall back to the language of probabilities and anthropics.)
Scientists have two competing theories about the nature of the world, A and B. By default, they deem the two theories equally compelling and would give 50% probability to each. Both theories predict that there are a million equally real worlds, with a fundamental constant ranging from 1 to 1,000,000. Theory A predicts that all the worlds are compatible with life, while theory B predicts that only one world is habitable, but it doesn’t predict which one. Scientists observe that the fundamental constant in our world is 343,551. What is the probability that theory B is true?
The SIA interpretation of anthropics (which is in my opinion the more reasonable interpretation if you need to choose) would say Theory B only has 1 in a million probability. This doesn’t sound right to me - this would mean that even if strong updates arrived and we had very good reason to think that only a small range of parameters is compatible with life, we couldn’t accept that conclusion.
For further intuition, see this scenario:
Scientists have two competing theories about the nature of the world, A and B. By default, they deem the two theories equally compelling and would give 50% probability to each. Both theories predict that there are a million equally real worlds, with a fundamental constant ranging from 1 to 1,000,000. Theory A predicts that all the worlds are compatible with life, while theory B predicts that only one world is habitable where the fundamental constant is 343,551. Later, scientists measure the fundamental constant of our world, and it is 343,551. What is the probability that theory B is true?
SIA would say 50% - there is one habitable world fitting our observations under both Theory A and Theory B. This is quite a strange conclusion given how impressively accurate a prediction Theory B has made.
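The two SIA answers above can be checked with a few lines of arithmetic. This is only a sketch under one simple reading of SIA: each theory's prior is multiplied by the number of habitable worlds it posits (assumed here to be proportional to its number of observers) and by the chance that an observer under that theory sees the constant 343,551. The helper name and its parameters are made up for illustration.

```python
from fractions import Fraction

N_WORLDS = 1_000_000  # worlds posited by both theories


def sia_posterior_b(prior_b, habitable_a, habitable_b, likelihood_a, likelihood_b):
    """Posterior of theory B under an SIA-style update (illustrative helper).

    Each theory's prior is scaled by its number of habitable worlds (assumed
    proportional to its number of observers) and by the probability that an
    observer under that theory sees the measured constant.
    """
    weight_a = (1 - prior_b) * habitable_a * likelihood_a
    weight_b = prior_b * habitable_b * likelihood_b
    return weight_b / (weight_a + weight_b)


half = Fraction(1, 2)
one_in_n = Fraction(1, N_WORLDS)

# Scenario 1: B does not say which world is habitable, so seeing 343,551
# has probability 1/N under both theories.
scenario_1 = sia_posterior_b(half, N_WORLDS, 1, one_in_n, one_in_n)

# Scenario 2: B names 343,551 in advance, so its observers see it for sure.
scenario_2 = sia_posterior_b(half, N_WORLDS, 1, one_in_n, Fraction(1))

print(scenario_1)  # 1/1000001, i.e. roughly one in a million
print(scenario_2)  # 1/2
```

Exact fractions are used so that the "1 in a million" and "50%" figures come out without floating-point noise.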
How does OSAC deal with these scenarios?
The broadest distribution, containing all worlds in A and B, gets weight $p$.
But there are the narrower distributions too: the distribution where A is true, and the distribution where B is true. In the scenario where A and B sound equally compelling to scientists, they get equal weight. According to the general principle I described in The assembly of distributions, the sum of the weights of the two narrower distributions should be smaller than that of the broad distribution.
Depending on how fundamental the difference is between theory A and theory B, I’m more or less sympathetic to them getting their own distributions. If the difference is something as mundane as a coin landing on heads or tails, I don’t believe they should get their own distributions at all, and we should just optimize under the broad distribution. But in this case, the difference is in some kind of fundamental physical or even logical laws (is it possible for intelligent life to evolve under many different fundamental constants?), so I’m sympathetic to them getting their own distributions. Let’s say the A and B distributions each get a weight of $0.1p$.
So far, this doesn’t make a difference to the paradox. Life is rare in distribution B, so I and the beings logically correlated with me can barely make a difference to the goodness of distribution B: most of it will be empty anyway. So even with the introduction of these new distributions, I will strongly favor betting on A being true.
However, I also give significant weights to the distributions of only those worlds that can contain intelligent life. Let’s say I give each of them half as much weight as the corresponding broader distribution.
(Why smaller weight than the broader distribution? Because I feel it’s smaller and less elegant. Why not much smaller weight, given that “having intelligent life” takes a lot of description-length to describe? Because I’m not committed to purely weighting distributions by mathematical description-length, and “having intelligent life” feels like a pretty natural condition to me for restricting my caring.)
Now we have six distributions: (1) weight $p$ to the distribution of everything in A and B; (2) weight $0.1p$ to the distribution of everything in A; (3) weight $0.1p$ to the distribution of everything in B; (4) weight $0.5p$ to the distribution of every world with intelligent life in A and B; (5) weight $0.05p$ to the distribution of every world containing intelligent life in A; (6) weight $0.05p$ to the distribution of every world containing intelligent life in B.
Putting it in simpler terms: I care to some extent about making the average life better in the worlds where theory B is true. This is a somewhat specific form of caring, so it doesn’t get that much weight[14], but I still care about it to a non-negligible extent. This wouldn’t have been allowed by simpler SIA interpretations, where the average welfare of the living worlds in B is overwhelmed by theory A positing more living worlds.
Now, let’s look at what conclusion this model gives in the first scenario. I don’t believe in probabilities, so I will translate “what probability theory B has” to “what betting odds I would bet on theory B”, which is equivalent to “what fraction of my influence on my utility function comes from worlds where theory B is true.”
Of the six distributions, I can affect half of (1); all of (2); almost none of (3); and all of (4), (5) and (6). So the overall weight I can affect is $0.5p + 0.1p + 0.5p + 0.05p + 0.05p = 1.2p$.
Of this, the $0.05p$ weight of distribution (6) is the only significant part of my utility function where B is true. So the relative weight of B is $0.05p / 1.2p \approx 0.042$, so I will bet as if I thought theory B had a 4.2% probability.
Alternatively, it’s possible I already have reason to believe that independently of whether theory A or B is true, it’s impossible to affect a non-negligible portion of all possible worlds, and so all my influence comes from affecting the distributions where it was assumed that intelligent life exists. I think this is quite likely. In this case, the overall weight I can affect is just $0.5p + 0.05p + 0.05p = 0.6p$, and B is true for $0.05p$ of this, so the relative weight of B is $0.05p / 0.6p \approx 0.083$, so I would bet as if B has 8.3% probability.
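The betting odds for the first scenario can be reproduced with a short sketch. The weights below (in units of the broadest distribution's weight p) are back-solved from the 0.05p figure and the quoted 4.2% and 8.3% results, so treat them as an assumption, as are the function and dictionary names.

```python
# Assumed weights of distributions (1)-(6), in units of p.
WEIGHTS = {1: 1.0, 2: 0.1, 3: 0.1, 4: 0.5, 5: 0.05, 6: 0.05}

# Fraction of each distribution's affectable part in which theory B is true.
# Only (3) and (6) are B-only; B's single living world is a negligible part
# of the mixed distributions (1) and (4), so it is rounded down to zero here.
B_FRACTION = {1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0, 5: 0.0, 6: 1.0}


def share_from_b(influence):
    """Fraction of total affectable weight that sits in worlds where B is true."""
    total = sum(WEIGHTS[k] * influence[k] for k in WEIGHTS)
    from_b = sum(WEIGHTS[k] * influence[k] * B_FRACTION[k] for k in WEIGHTS)
    return from_b / total


# Half of (1), all of (2), almost none of (3), all of (4), (5) and (6).
can_affect_physical = {1: 0.5, 2: 1.0, 3: 0.0, 4: 1.0, 5: 1.0, 6: 1.0}

# Variant: no meaningful influence on the purely physical distributions.
only_living = {1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 1.0, 6: 1.0}

print(round(share_from_b(can_affect_physical), 3))  # 0.042
print(round(share_from_b(only_living), 3))          # 0.083
```

Changing the made-up weights moves the exact percentages, but as long as distribution (6) keeps a non-negligible weight, theory B keeps a non-negligible share of the bet.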
The numbers are of course made up, but the conclusion sounds about right to me: Theories positing that only a small fraction of the worlds are habitable and that our world's parameters just happen to be in the habitable range should get some extra burden of proof (the probability went down to 8.3% from the original 50%), but they shouldn’t be overwhelmingly penalized.
To quickly look at the second scenario: There, once I learn that the fundamental constant is 343,551, I and logically correlated beings only have 1 in a million influence over distributions (1) to (5)[15], while I have full influence over (6). So almost all my influence comes from worlds where theory B is true, so I will bet on B as if it was almost certain to be true.
This again sounds intuitively right to me, given that B got a 1 in a million prediction right.
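The second scenario can be sketched the same way. The weights are again assumed (in units of p, chosen to be consistent with the 0.05p weight of distribution (6)), and counting only (3) and (6) as B-worlds is a deliberately conservative simplification, so the result is a lower bound on B's share.

```python
# Assumed weights of distributions (1)-(6), in units of p.
weights = {1: 1.0, 2: 0.1, 3: 0.1, 4: 0.5, 5: 0.05, 6: 0.05}

# After seeing 343,551: one-in-a-million influence on (1)-(5), full on (6).
one_in_a_million = 1e-6
influence = {k: one_in_a_million for k in (1, 2, 3, 4, 5)}
influence[6] = 1.0

# Conservative choice: treat the mixed distributions (1) and (4) as if none
# of my influence there came from B-worlds.
b_only = (3, 6)

total = sum(weights[k] * influence[k] for k in weights)
from_b = sum(weights[k] * influence[k] for k in b_only)
share_b = from_b / total

print(f"{share_b:.5f}")  # about 0.99997
```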
Nuclear war and anthropics
People sometimes say that the fact that we haven’t had a nuclear war yet is not good evidence for the rarity of nuclear wars: if there was a nuclear war, most of us would be dead, so it’s not surprising on anthropic grounds that we haven’t seen nukes flying yet.
I think this is the wrong way to think about things. We don’t even need the full OSAC framework to see that, just the principle espoused earlier in the sequence that instead of probabilities, we should think in terms of our influence over the world.
Nuclear war is very unlikely to fully wipe out civilization: it looks likely we would still recover even after repeated nuclear wars and eventually build AGI, and then we or the AIs would still conquer the stars.
So even if the population might temporarily decrease in the worlds that suffer nuclear wars, the overall importance of decisions is the same in both kinds of worlds: both are eventually determining the fate of the galaxies. Therefore, when we try to make good decisions to scope-sensitively influence the future in a positive direction, it is not justified to use anthropic updates with regard to nuclear wars.
So the fact that we haven’t had a nuclear war in 80 years is in fact pretty good evidence that nuclear wars are rare.
LHC and anthropics
Okay, but what about threats that can actually wipe out humanity?
If I understand the story correctly, there was an interesting discussion on this in the LessWrong community around 2008. At the time, there was a fringe belief that turning on the Large Hadron Collider might destroy the world. When the time came to turn on the LHC, twice in a row some technical errors occurred that delayed the start. After the second time, some people started murmuring: what if the LHC would in fact destroy the world, and the only reason we have seen the two errors is due to the anthropic principle? What if in most quantum branches everyone is dead, and only our branch is alive where the technical errors occurred?
Later, the LHC was turned on successfully, and the world didn’t get destroyed. But it’s still an interesting puzzle to think through whether people were justified in updating towards the LHC’s dangerousness after the second error.
The question is similar to the one posed in the Finetuning and the Presumptuous philosopher section. Let’s describe the dilemma in similar terms to what I used there.
Scientists have two competing theories of the world, A and B. A posits that turning on the LHC is fine, B posits that it would destroy the world. Both A and B posit that when you try to turn on the LHC, the world splits into a hundred equally weighted quantum branches (like it always does at every moment). Both theories posit that the LHC will turn on in 99 branches, but that in branch 53 there will be an error. According to theory A, humans will continue to be alive in all branches. According to theory B, humans only survive in branch 53, where an error occurs. Scientists turn on the LHC, and observe that an error occurred, so we are in branch 53. How much should they update in favor of theory B?
This is superficially very similar to the second scenario in Finetuning and the Presumptuous philosopher where theory B predicted that only the world with constant 343,551 can support life, and then later we in fact observed the constant to be 343,551. Should we make a similar update?
I argue that we should not, or at least not by much. The previous argument relied on the assertion that maximizing the average welfare of living beings in worlds where theory B is true is a worthy goal that deserves some weight. This is true here too. However, most successful living futures in theory B worlds are not in the quantum branches where we turned on the LHC but ran into an unlikely error. They are mostly in quantum branches where people were smart enough to figure out that theory B is true, and so never built the LHC.
Therefore, the branches where the LHC runs into an error only have about 1% weight both in the theory A world and in the “theory B but someone is alive” worlds, so the error is not a significant update in favor of the LHC destroying the world.
If we keep turning on the LHC and it keeps failing, and the joint probability of the errors gets down to one in a billion or so, maybe it’s time to think again.[16] I think that “maximizing the average welfare in worlds where theory B is true but humans build the LHC” is also a somewhat worthy goal that should get a non-zero weight. However, I feel the weight should be very low: it’s a very specific and kind of unnatural distribution, and I’m also averse to putting too much weight on distributions defined by human activity, because I want to avoid the Obelisk-race problem. Still, I would give more than zero weight to this distribution, and after enough inexplicable LHC failures, my influence on all the other distributions would go down enough that this peculiar distribution would represent most of my remaining influence. In that case, I would act as if we knew theory B was likely true, and would want the LHC destroyed.
I don’t think that the real errors that caused the delays in the LHC were anywhere near unlikely enough for this to be a significant effect.
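To see how the balance shifts as failures pile up, here is a toy sketch. All three weights are hypothetical numbers picked for illustration; none of them come from the text. It assumes that my influence on the theory A distribution and on the broad "theory B but someone is alive" distribution scales with q, the joint probability of the observed failures, while my influence on the narrow "theory B and humans build the LHC" distribution stays at 1, because every survivor in it saw the failures.

```python
def share_of_influence_from_b(q, w_a=1.0, w_b_alive=1.0, w_b_builders=1e-6):
    """Fraction of my influence that lies in worlds where theory B is true.

    q            -- joint probability of the observed LHC failures
    w_a          -- hypothetical weight of the theory A distribution
    w_b_alive    -- hypothetical weight of "theory B, someone is alive"
    w_b_builders -- hypothetical (tiny) weight of "theory B, humans build the LHC"
    """
    influence_a = w_a * q
    influence_b = w_b_alive * q + w_b_builders * 1.0
    return influence_b / (influence_a + influence_b)


single_failure = share_of_influence_from_b(q=1e-2)
many_failures = share_of_influence_from_b(q=1e-9)

print(f"{single_failure:.3f}")  # about 0.500: essentially no update from 50/50
print(f"{many_failures:.3f}")   # about 0.999: B dominates my remaining influence
```

With these made-up numbers, the crossover happens roughly when q falls to around the weight given to the narrow distribution.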
Average utilitarianism
OSAC kind of has an average utilitarianism vibe. I take various distributions, and I try to maximize the average goodness within each distribution, then sum these up with the weights of the distributions.
However, I think the recommendations are very different from how average utilitarianism is often interpreted.
When I talked with people identifying as average utilitarians, they were generally in favor of turning Earth into a lush garden of a few million people living very happy lives in a utopian community, and leaving space alone.
OSAC is pretty strongly opposed to that. If we don’t spread to the stars, we remain an insignificant part of any broader distribution. In the other quantum branches, people with different values will not laze around on Earth. The velociraptor-civilization will spread to the stars! The Thousand-year Reich will spread to the stars! The AIs who take over will spread to the stars! If we want a non-negligible fraction of the quantum multiverse’s measure to be filled with goodness according to our values, we need to keep up with the velociraptors and spread to the stars ourselves.
Staying in a garden on Earth also makes us have lower scores in the broader distributions of “all possible universes, independently of whether they contain life” and “universes that contain intelligent life, but averaged over all space-time moments in their distributions and not just averaging over experience moments”.
To be clear, I also care about the distribution of only the worlds where humanity didn’t leave Earth, and about distributions that give disproportionate weight to moments on Earth.[17] So altogether I give a not entirely negligible weight to the welfare of Earthly life: if I had to choose between life on Earth surviving but never expanding, and a one in a million chance of humanity conquering the universe, I would choose survival on Earth.
But altogether I don’t give that much weight to these distributions focused on Earth, so I still want us to spread to the stars and fill them with glory and joy.
Negative utilitarianism
Strict negative utilitarians don’t care about joy, they just want to minimize the overall amount of suffering. Naively, the ideal outcome for them would be for life to die out and for our universe to remain empty.
There are already some known counter-arguments against this: maybe if humanity survives, we can buy off some aliens or beings in other universes through acausal trade to cause less suffering. But OSAC offers a different argument for staying alive.
The argument is the same as in the Average utilitarianism section. If we wipe ourselves out and leave our universe lifeless, while the velociraptors fill their universe with suffering in the branches that they rule, that results in a very bad score for the distribution of all branches with morally relevant life. If we also conquered the stars, and filled them with joy or at least mediocre life, that would make the average in the distribution much better.
(As a different phrasing: If you believe that there is such a thing as reality fluid, or at least there is some non-realist equivalent of it like in OSAC, you can try to siphon away reality fluid from the moments of suffering.)
This is the weirdest conclusion of OSAC so far, and the one I’m least comfortable with. Filling our own stars with people doesn’t help the victims in the velociraptor-dimension, so why would it be good from a negative utilitarian perspective?
I think people could reasonably argue that even if they accept something like the OSAC framework, they don’t accept the step where worlds that have life have a distribution of their own in which it’s important to get a good average. They can say that they only care about having good results in the physically defined distributions, like the distribution of everything in the quantum multiverse, independently of whether or not it has life. Then, humanity filling the galaxies doesn’t change the weight of the suffering in the velociraptor-dimension.
I think that might be a tenable position, and I in fact give a lot of weight to physically defined distributions. But my intuition says that I should give some weight to averaging over living beings too. And given that I expect that most space-time moments in the physical universes will never get filled with anything morally important, I expect that our influence will be small on these purely physical distributions, moving from filling 0% of the space to 0.00001%.[18] So I tentatively think that most of my influence on my utility function will come from averaging over experiences within certain living distributions.
Taking into account the distributions that average only over living worlds also gives saner results in Finetuning and the Presumptuous philosopher: in particular, it allows the scientists to update towards theory B when it gets a one in a million prediction right.
So my current position is that it’s good from a negative utilitarian perspective to spread to the stars if we believe we will improve the average, and will cause less suffering in expectation than the average civilization spreading to the stars in our quantum multiverse.[19]
(I’m unfortunately very uncertain though how we compare to the average civilization.)
Why are we so early in the universe?
This section is less important for illustrating OSAC than the other sections. I also didn’t do the full math, which makes the section a bit rambly. So I put it in collapsible mode, but I still think it’s interesting, and I would be excited to see someone redoing the calculations in the Grabby aliens paper with these assumptions.
Generalizing the Grabby aliens argument
We are surprisingly early in the history of the universe compared to the lifespan of stars. If I understand Hanson’s Grabby aliens paper correctly, it estimates that according to various calculations, we should expect that 99% of intelligent life emerges after us.
Being in the first 1% is not very shocking, 1% probability events happen all the time. Still, it’s somewhat surprising and worth investigating.
I have a distribution of weights on different possible universes with different underlying laws. One thing that falls out of the laws of each universe is what fraction of planets in any given year give birth to intelligent life. I don’t know this distribution, but I can make some estimates based on the available science.
What kind of anthropic updates should I make, starting from this initial distribution?
The distribution has a section that posits that intelligent life emerges rarely enough that it should happen in expectation less than once in the history of our reachable universe. This exactly corresponds to the situation described in Finetuning and the presumptuous philosopher. Following the logic there, I want to make some updates against theories that say that most universes are devoid of life, but I don’t want to make a very drastic update, because it’s still important for me to influence the sub-distribution of worlds where the laws of nature imply rare life but where the world still has life.
So maybe altogether I make a 3x update against the part of the distribution that implies that the reachable universe should have less than one intelligent species in expectation, and now I have about ⅙ weight on such worlds. In those worlds, there is no anthropic update on when I would expect us to be in the history of the universe.
However, I put ⅚ weight on worlds where intelligent life is spawned frequently enough that multiple civilizations can emerge in the reachable universe. Then, the civilizations that emerge earliest will be able to conquer the most resources.
One particular aspect of this, discussed in the Grabby aliens paper, is that the planets that would spawn intelligent life too late just won’t be allowed to develop a civilization at all, because they will be conquered by another alien species before they could spawn intelligent life.[20]
In my framework, the update is even stronger than what is discussed in the Grabby aliens paper, since earliness doesn’t only give a binary update (whether the civilization even has time to emerge before being conquered), but each possible time of emergence should be weighted by how many resources a civilization emerging at that time should be expected to conquer.
Someone could redo the Grabby aliens calculation with this in mind, though I expect the conclusions will be pretty similar.
Altogether, I tentatively agree with the conclusion of the Grabby aliens paper that we should expect to meet the aliens in a few billion years but not earlier.
Conclusion
Currently, OSAC is the best framework I have for thinking about probabilities, anthropics and infinite ethics.
One major drawback I see is its subjectivity: I’m worried that for many dilemmas like the ones I listed, I can just make up ad hoc weights for distributions to care about until the answer comes out what I wanted in the first place.
I think this flexibility is partially a virtue (a philosophical framework should be able to accommodate our intuitions to some extent), but partially a danger: just like scientific theories should have predictive power, a philosophical framework should also be able to pay rent by helping to arrive at non-trivial conclusions.
I am personally mostly satisfied with OSAC in this regard: for some of the examples listed above, I didn’t have an answer before starting to think about them in OSAC’s terms, but I’m pretty satisfied with the conclusions that were produced.
However, I would be interested in testing with other people whether we come to similar conclusions in philosophical dilemmas not listed in this post (e.g. some classic paradoxes of infinite ethics; the question of running simulations on thicker wires; or the two-envelopes problem in moral weights) if we both try to work from the OSAC framework. I expect yes if both people think deeply enough, but I have some uncertainty.
I’m curious about people’s objections and proposed alternatives to the framework. I’m currently at the stage where, after long iteration on different theories, I can’t come up with obvious counter-examples invalidating the OSAC framework, but I expect it’s likely that someone will come up with one, and then I would need to iterate further.
Let us hope that through all this iteration, one day we will reach a stable point.
And with alien beings perhaps pulled out of simulations as part of an acausal trade deal.
Also more mundane and personal ways for setting up good reflection processes: trying to personally become a wiser and better person in my day-to-day life.
You can call this my prior if you want to.
The name is very weakly inspired by Wei Dai’s proposal of UDT-UMC as a name for the thing I presented as non-realist UDASSA in my last post, except that I wanted the name to be pronounceable and Universal Measure of Care was not very applicable to my framework.
Not actually a great example: a real example would be a distribution of possible worlds, and a distribution of points within them. But I’m going with this example for simplicity.
Which, again, doesn’t only involve sitting alone and thinking - I expect it to involve having children, trading with different beings, building things together, etc.
You say it’s impossible to define and rank dramaticness? But we are already talking about maximizing the goodness of worlds, so I feel introducing one more imprecise, human concept hardly makes things worse.
The 0.1% number is pretty made up, I don’t have a strong take on how big it really is for me. I will keep using this number later, but eventually it will appear on both sides of an equation so it doesn’t really matter what it is.
For example possibly many future digital minds in Andromeda
One note here: The logic I described above only applies for giving low weight to expecting that we can influence distributions of not mathematically simple worlds due to e.g. being in a simulation run by a dramatic deity. When we exert our influence on the mathematically simple worlds, the same logic doesn’t apply that we should put a low weight on influencing things through acausal trade and through being in simulations run by other mathematically simple worlds. Unlike in the case of the mystical distributions, I think we have a pretty convincing story of how acausal trade between different worlds within the mathematical distribution could work. In fact, I put quite high weight on this type of influence.
I really love that book and strongly recommend it to everyone.
At least I think that’s the case now while I’m a confused mortal. During the Long Reflection, maybe I will come up with better ideas about exactly which mystical distributions I care about and whether there is any more concrete plan for making things better for them.
Or maybe I care a little, because I think it’s cool and glorious to build a bigger obelisk than all the aliens. But I really don’t care very much.
It gets weight $0.05p$, relative to $p$ for the broadest distribution.
I’m not logically correlated with people who learned that their fundamental constant was 122,101: it’s a crucial difference that I see my number matching the one posited by theory B while they don’t.
Though of course in practice we should first look for some mundane explanation for the seemingly independent failures before we reach for anthropics.
As always, it is very hard to say what UDASSA’s opinion is on something, but I think in this case it probably agrees. Pointing to the cradle of civilization is plausibly relatively simple, so Earth gets more weight in UDASSA than the average random planet we will colonize.
Space is mostly empty vacuum, there is just not that much matter and energy in the world to fill all of space-time with joy. And if the distribution is not based on literal space and time, then on what? Weighing moments by their description lengths? But then you can in fact siphon away measure from the suffering in the velociraptor universe by building bigger obelisks than them, thus making it easier to point to you. I think that’s similar to caring only about the average in the distribution of living beings, except I feel it’s sillier.
There still needs to be some margin by which we need to be better than average, to counter-balance the purely physical distributions, where spreading is simply bad from a negative utilitarian perspective. But I think this margin is likely not very big.
I will ignore the zoo hypothesis that the aliens are already here and just chose not to reveal themselves. The same argument applies that I expressed about simulations in my last post: if we are in a zoo, we should expect to have very little impact on the future, so from a scope-sensitive perspective, we can largely ignore the possibility.