
Anthropic Decision Theory (ADT) replaces anthropic probabilities (SIA and SSA) with a decision theory that doesn't need anthropic probabilities to function. And, roughly speaking, ADT shows that total utilitarians will behave as if they were using SIA, while average utilitarians will behave as if they were using SSA.

That means that the various paradoxes of SIA and SSA can be translated into ADT format. This post will do that, and show how the paradoxes feel a lot less counter-intuitive under ADT. Some of these have been presented before, but I wanted to gather them in one location. The paradoxes examined are:

  1. The Doomsday Argument.
  2. The Adam and Eve problem.
  3. The UN++ problem.
  4. The Presumptuous Philosopher.
  5. Katja Grace's SIA doomsday argument.

The first three are paradoxes of SSA (which increases the probability of "small" universes with few observers), while the last two are paradoxes of SIA (which increases the probability of "large" universes with many observers).

No Doomsday, just a different weighting of rewards

The famous Doomsday Argument claims that, because of SSA's preferences for small numbers of observers, the end of the human species is closer than we might otherwise think.

How can we translate that into ADT? I've found it's generally harder to translate SSA paradoxes into ADT than SIA ones, because average utilitarianism is a bit more finicky to work with.

But here is a possible formulation: a disaster may happen 10 years from now, with 50% probability, and will end humanity with a total of $N_1$ humans. If humans survive the disaster, there will be $N_2$ humans total.

The agent has the option of consuming resources now, for a reward of $R_1$, or consuming resources in 20 years time, for a reward of $R_2$. If this were a narrow-minded selfish agent, then it would consume early if $R_1 > R_2/2$, and late if $R_1 < R_2/2$.

However, if the agent is an average utilitarian, the amount of expected utility they derive from consuming early is $\frac{1}{2}\left(\frac{R_1}{N_1} + \frac{R_1}{N_2}\right)$ (the expected average utility of $R_1$, averaged over survival and doom), while the expected utility for consuming late is $\frac{1}{2}\frac{R_2}{N_2}$ (since consuming late means survival).

This means that the breakeven point for the ADT average utilitarian is when:

  • $R_1\left(\frac{1}{N_1} + \frac{1}{N_2}\right) = \frac{R_2}{N_2}$, i.e. $R_2 = R_1\left(1 + \frac{N_2}{N_1}\right)$.

If $N_2$ is much larger than $N_1$, then the ADT agent will only delay consumption if $R_2$ is similarly larger than $R_1$.

This looks like a narrow-minded selfish agent that is convinced that doom is almost certain. But it's only because of the weird features of average utilitarianism.
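
As a quick numerical check, here is a minimal sketch of that calculation; the function names and specific population and reward figures are illustrative, not part of the original setup:

```python
# Average-utilitarian Doomsday setup from above.
# N1: total humans if doom; N2: total humans if survival; R1, R2: early/late rewards.
# The 50% doom probability is the one assumed in the text; the other numbers are made up.

def early_utility(R1, N1, N2, p_doom=0.5):
    # Early consumption happens either way, so average R1 over both possible populations.
    return p_doom * R1 / N1 + (1 - p_doom) * R1 / N2

def late_utility(R2, N2, p_doom=0.5):
    # Late consumption only happens if humanity survives the disaster.
    return (1 - p_doom) * R2 / N2

N1, N2, R1 = 10**11, 10**14, 1.0
R2_breakeven = R1 * (1 + N2 / N1)  # the breakeven equation above
print(early_utility(R1, N1, N2), late_utility(R2_breakeven, N2))  # equal, ~5e-12 each
```

With these illustrative numbers, a selfish agent facing the objective 50% risk would delay for any $R_2 > 2 R_1$, while the average utilitarian demands $R_2$ over a thousand times $R_1$: exactly the behaviour of someone who thinks doom is nearly certain.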

Adam and Eve and differentially pleasurable sex and pregnancy

In the Adam and Eve thought experiment, the pair of humans want to sleep together, but don't want to get pregnant. The snake reassures them that because a pregnancy would lead to billions of descendants, SSA's preference for small universes means that this is almost impossibly unlikely, so, time to get frisky.

There are two utilities to compare here: the positive utility of sex ($S$), and the negative utility of pregnancy ($-P$). Assume a 50% chance of pregnancy from having sex, and a subsequent $N$ descendants.

Given an average utilitarian ADT couple, the utility derived from sex is $\frac{1}{2}\frac{S}{2} + \frac{1}{2}\frac{S}{N+2}$, while the disutility from pregnancy is $\frac{1}{2}\frac{P}{N+2}$. For large enough $N$, those terms will be approximately $\frac{S}{4}$ and $0$.

So the disutility of pregnancy is buried in the much larger population.
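
A minimal sketch of that calculation, with the symbols above and made-up values:

```python
# Average-utilitarian Adam and Eve, as described above.
# S: pleasure of sex, P: magnitude of the disutility of pregnancy,
# N: number of descendants if pregnancy occurs. All values are illustrative.

def sex_utility(S, N, p_preg=0.5):
    # Sex happens either way: average over population 2 (no pregnancy) or N + 2.
    return (1 - p_preg) * S / 2 + p_preg * S / (N + 2)

def pregnancy_disutility(P, N, p_preg=0.5):
    # The pain of pregnancy is averaged over the N + 2 people who would then exist.
    return p_preg * P / (N + 2)

S, P, N = 1.0, 100.0, 10**9
print(sex_utility(S, N))           # ~ S/4 = 0.25
print(pregnancy_disutility(P, N))  # ~ 5e-8: buried in the large population
```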

There are more extreme versions of the Adam and Eve problem, but they are closely related to the next paradox.

UN++: more people to dilute the sorrow

In the UN++ thought experiment, a future world government seeks to prevent damaging but non-fatal gamma ray bursts by committing to creating many many more humans, if the bursts happen. The paradox is that SSA implies that this should lower the probability of the bursts.

In ADT, this behaviour is perfectly rational: if we assume that the gamma ray bursts will cause pain to the current population, then creating a lot of new humans (of the same baseline happiness) will dilute this pain, by averaging it out over a larger population.
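
A minimal sketch of the dilution effect; the function and numbers are my own illustration, not from the original thought experiment:

```python
# UN++ as average utilitarianism: the pain term in the average shrinks
# as more people (at baseline happiness) are added to the population.

def average_pain(pain_per_person, n_current, n_created):
    # Only the current population suffers the burst; the average is over everyone.
    return (pain_per_person * n_current) / (n_current + n_created)

print(average_pain(1.0, 10**10, 0))       # 1.0 with no new people
print(average_pain(1.0, 10**10, 10**13))  # ~0.001 after mass creation
```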

So in ADT, the SSA paradoxes just seem to be artefacts of the weirdness of average utilitarianism.

Philosopher: not presumptuous, but gambling for high rewards

We turn now to SIA, replacing our average utilitarian ADT agent with a total utilitarian one.

In the Presumptuous Philosopher thought experiment, there are only two possible theories about the universe: $T_1$ and $T_2$. Both posit large universes, but $T_2$ posits a much larger universe than $T_1$, with trillions of times more observers.

Physicists are about to do an experiment to see which theory is true, but the SIA-using Presumptuous Philosopher (PP) interrupts them, saying that $T_2$ is almost certain because of SIA. Indeed, they are willing to bet on $T_2$ at odds of up to a trillion-to-one.

With that betting idea, the problem is quite easy to formulate in ADT. Assume that all PPs are total utilitarians towards each other, and will all reach the same decision. Then there are a trillion times more PPs in $T_2$ than in $T_1$, which means that winning a bet in $T_2$ is a trillion times more valuable than winning it in $T_1$.

Thus, under ADT, the Presumptuous Philosopher will indeed bet on $T_2$ at odds of up to a trillion to one, but the behaviour is simple to explain: they are simply going for a low-probability, high-utility bet with higher expected utility than the opposite. There does not seem to be any paradox remaining.
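
A minimal sketch of that bet; the stake structure and numbers here are my own illustration:

```python
# Total utilitarian Presumptuous Philosophers, all making the same bet on T2.
# Each philosopher wins 1 if T2 is true and loses `stake` if T1 is true.

def expected_total_utility(stake, p_T2=0.5, ratio=10**12):
    n_T1, n_T2 = 1, ratio  # a trillion times more philosophers under T2
    return p_T2 * n_T2 * 1 - (1 - p_T2) * n_T1 * stake

print(expected_total_utility(stake=10**11) > 0)      # True: worth taking
print(expected_total_utility(stake=2 * 10**12) > 0)  # False: beyond a trillion to one
```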

SIA Doomsday: care more about mosquito nets in large universes

Back to SIA. The SIA Doomsday Argument, somewhat simplified, is that since SIA means we should expect there to be a lot of observers like ourselves, it is more likely that the Fermi paradox is explained by a late Great Filter (which kills civilizations that are more advanced than us) than by an early Great Filter (which kills life at an earlier stage or stops it from evolving in the first place). The reason for this is that, obviously, there are more observers like us for a late Great Filter than an early one.

To analyse this in decision theory, use the same setup as for the standard Doomsday Argument: choosing between consuming now for a reward of $R_1$ (or donating to AMF, or similar), or in twenty years for a reward of $R_2$, with a risk of human extinction in ten years.

To complete the model, assume that if the Great Filter is early, there will be no human extinction, while if it is late, there is a probability $p$ of extinction. If the Great Filter is late, there are $N_L$ advanced civilizations across the universe, while if it is early, there are only $N_E \ll N_L$. Assume that the agent currently estimates late-vs-early Great Filters as 50-50.

With the usual ADT agent assuming that all their almost-copies reach the same decision in every civilization, the utility from early consumption is $\frac{1}{2}\left(N_L R_1 + N_E R_1\right)$ (total utility averaged over late vs early Great Filters), while the utility from late consumption is $\frac{1}{2}\left((1-p)N_L R_2 + N_E R_2\right)$.

For large $N_L$, these approximate to $\frac{1}{2}N_L R_1$ and $\frac{1}{2}(1-p)N_L R_2$.

So a total utilitarian ADT agent will be more likely to go for early consumption than the objective odds would imply: they act as if the extinction risk were $p$ rather than $p/2$. And the more devastating the late Great Filter (the larger $p$), the stronger this effect.
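
A minimal sketch, using the symbols above with made-up values:

```python
# SIA Doomsday for a total utilitarian ADT agent.
# N_E, N_L: civilisations under an early vs late Great Filter; p: extinction chance if late.

def early_utility(R1, N_E, N_L, p_late_filter=0.5):
    # Early consumption pays off in every civilisation, whatever the filter.
    return p_late_filter * N_L * R1 + (1 - p_late_filter) * N_E * R1

def late_utility(R2, N_E, N_L, p, p_late_filter=0.5):
    # Late consumption only pays off in civilisations that survive.
    return p_late_filter * (1 - p) * N_L * R2 + (1 - p_late_filter) * N_E * R2

N_E, N_L, p = 10, 10**6, 0.5
# The agent values early vs late consumption at roughly 1 : (1 - p),
# even though the objective survival chance is 1 - p/2:
print(early_utility(1.0, N_E, N_L) / late_utility(1.0, N_E, N_L, p))  # ~2.0, not ~1.33
```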

Comments

The snake reassures them that because a pregnancy would lead to billions of descendants, SSA's preference for small universes means that this is almost impossibly unlikely, so, time to get frisky.

I cannot fathom what misuse of logical reasoning can lead to someone being taken by this argument.

I noticed that you mention UDT in your paper, but don't formally cite it as a reference. If it's not too much trouble, can you please add it? (I'm not ashamed to admit that I care about my citation count. :)

It will be mentioned! ADT is basically UDT in anthropic situations (and I'm willing to say that publicly). I haven't updated the paper in a long time, as I keep on wanting to do it properly/get it published, and never have the time.

What's the best reference for UDT?

It will be mentioned! ADT is basically UDT in anthropic situations (and I’m willing to say that publicly).

Thanks, the current paper actually already says that, it just doesn't formally cite UDT so a citation doesn't show up in Google Scholar.

What’s the best reference for UDT?

Others have been referencing https://www.lesswrong.com/posts/de3xjFaACCAk6imzv/towards-a-new-decision-theory. Oh that reminds me that I should add a link to the UDT wiki entry to that page so people can find the other UDT posts from it.

Would you like to be a co-author when (if) the whole thing gets published? You developed UDT, and this way there would be a publication partially on the subject.

I'm generally against this approach: just because X can be modelled as Y doesn't mean that Y is literally true. It mixes up anthropics and morality when these issues should be solved separately. Obviously, this is a neat trick, but I don't see it as anything more.

I see it as necessary, because I don't see Anthropic probabilities as actually meaning anything.

Standard probabilities are informally "what do I expect to see", and this can be formalised as a cost function for making the wrong predictions.

In Anthropic situations, the "I" in that question is not clear - you, or you and your copies, or you and those similar to you? When you formalise this as a cost function, you have to decide how to spread the cost amongst your different copies - do you spread it as a total cost, or an average one? In the first case, SIA emerges; in the second, SSA.

So you can't talk about anthropic "probabilities" without including how much you care about the cost to your copies.

"So you can't talk about anthropic "probabilities" without including how much you care about the cost to your copies" - Yeah, but that isn't anything to do with morality, just individual preferences. And instead of using just a probability, you can define probability and the number of repeats.


It seems to me that ADT separates anthropics and morality. For example, Bayesianism doesn't tell you what you should do, just how to update your beliefs. Given your beliefs, what you value decides what you should do. Similarly, ADT gives you an anthropic decision procedure. What exactly does it tell you to do? Well, that depends on your morality!

The point is that ADT is a theory of morality + anthropics, when your core theory of anthropics conceptually shouldn't refer to morality at all, but should be independent.


So I think an account of anthropics that says "give me your values/morality and I'll tell you what to do" is not an account of morality + anthropics, but has actually pulled out morality from an account of anthropics that shouldn't have had it. (Schematically, rather than define adt(decisionProblem) = chooseBest(someValues, decisionProblem), you now have define adt(values, decisionProblem) = chooseBest(values, decisionProblem))

Perhaps you think that an account that makes mention of morality ends up being (partly) a theory of morality? And that also we should be able to understand anthropic situations apart from values?

To try and give some intuition for my way of thinking about things, suppose I flip a fair coin and ask agent A if it came up heads. If it guesses heads and is correct, it gets $100. If it guesses tails and is correct, both agents B and C get $100. Agents B and C are not derived from A in any special way and will not be offered similar problems -- there is not supposed to be anything anthropic here.

What should agent A do? Well that depends on A's values! This is going to be true for a non-anthropic decision theory so I don't see why we should expect an anthropic decision theory to be free of this dependency.

Here's another guess at something you might think: "anthropics is about probabilities. It's cute that you can parcel up value-laden decisions and anthropics, but it's not about decisions."

Maybe that's the right take. But even if so, ADT is useful! It says that in several anthropic situations, even if you've not sorted your anthropic probabilities out, you can still know what to do.

The way I see it, your morality defines a preference ordering over situations and your decision theory maps from decisions to situations. There can be some interaction there, in that different moralities may want different inputs, i.e. consequentialism only cares about the consequences, while others care about the actions that you chose. But the point is that each theory should be capable of standing on its own. And I agree with probability being somewhat ambiguous for anthropic situations, but our decision theory can just output betting outcomes instead of probabilities.

but our decision theory can just output betting outcomes instead of probabilities.

Indeed. And ADT outputs betting outcomes without any problems. It's when you interpret them as probabilities that you start having problems, because in order to go from betting odds to probabilities, you have to sort out how much you value two copies of you getting a reward, versus one copy.

Well, if anything that's about your preferences, not morality.

Moral preferences are a specific subtype of preferences.

I suppose that makes sense if you're a moral non-realist.

Also, you may care about other people for reasons of morality. Or simply because you like them. Ultimately why you care doesn't matter and only the fact that you have a preference matters. The morality aspect is inessential.


your decision theory maps from decisions to situations

Could you say a little more about what a situation is? One thing I thought is maybe that a situation is a result of a choice? But then it sounds like your decision theory decides whether you should, for example, take an offered piece of chocolate, regardless of whether you like chocolate or not. So I guess that's not it

But the point is that each theory should be capable of standing on its own

Can you say a little more about how ADT doesn't stand on its own? After all, ADT is just defined as:

An ADT agent is an agent that would implement a self-confirming linking with any agent that would do the same. It would then maximise its expected utility, conditional on that linking, and using the standard non-anthropic probabilities of the various worlds.

Is the problem that it mentions expected utility, but it should be agnostic over values not expressible as utilities?

Hi, I've read your paper on anthropic decision theory. Personally I think that it gives the most complete explanation of bets and decisions related to paradoxes such as the Sleeping Beauty problem. I cited it in my paper and recommend it whenever a discussion about bets in the Sleeping Beauty problem comes up. That being said, I feel tackling the anthropic paradoxes as purely decision-making problems is very counter-intuitive.

Take the Doomsday Argument for example. The explanation you provided here illustrates why someone would bet heavily in favour of doom soon given the reward setup, even when he does not assign a higher probability to it: his objective is to maximize average utility. However, that seems to be different from what the original Doomsday Argument is about. In its original form it demonstrates that a Bayesian update on my birth rank would shift the probability towards doom soon. My preference towards average or total utility plays no part in its logic. So there is a distinction between actually believing in doom soon and strategically betting on doom soon based on some utility objective. Based on this, I think we cannot bypass the probabilities and only discuss decision making in anthropic-related paradoxes.

The Adam and Eve example really helped me understand the correspondence between "ADT average utilitarians" and "CDT average utilitarians". Thanks!

It's also kind of funny that one of the inputs is "assume a 50% chance of pregnancy from having sex" - it seems like an odd input to allow in anthropic decision-making, though it can be cashed out in terms of reasoning using a model of the world with certain parameters that look like Markov transition probabilities.

And of course, one shouldn't forget that, by their own standards, SSA Adam and Eve are making a mistake. (This becomes more obvious if we replace probabilities with frequencies - if we change this "50% chance of pregnancy" into two actual copies of them, one of which will get pregnant, but keep their decisions fixed, we can deterministically money-pump them.) It's all well and good to reverse-engineer their decisions into a different decision-making format, but we shouldn't use a framework that can't imagine people making mistakes.

Cheers!

And of course, one shouldn't forget that, by their own standards, SSA Adam and Eve are making a mistake.

Nope, they are making the correct decision if they value their own pleasure in an average utilitarian way, for some reason.

Weighting rewards according to population is the process of ADT Adam and Eve, who take identical actions to SSA Adam and Eve but can have different reasons. SSA Adam and Eve are trying to value their future reward proportional to how likely they are to receive it. Like, if these people actually existed and you could talk to them about their decision-making process, I imagine that ADT Adam and Eve would say different things than SSA Adam and Eve.

Ah yes, I misread "SSA Adam and Eve" as "SSA-like ADT Adam and Eve (hence average utilitarian)".

What does ADT say about a) quantum immortality, and b) Vilenkin's universal Doomsday?