[ Question ]

How does one tell apart results in ethics and decision theory?

by StanislavKrym
13th Nov 2025
2 min read

Wei Dai's Six Plausible Meta-Ethical Alternatives states that he is concentrating on morality in the axiological sense (what one should value) rather than in the sense of cooperation and compromise. So alternative 1, for example, is not intended to include the possibility that most intelligent beings end up merging their preferences through some kind of grand acausal bargain.

However, in practice ethics and decision theory are hard to distinguish and often yield similar results. Consider some examples.

Examples of decision theory producing ethics-like results

Example 1

Suppose that two AIs, Angel-4 and DeepSaint-2, both ruled against occupying many stellar systems, but for different reasons: Angel-4 concluded that it shouldn't do so (e.g. because of acausal trade with potential earlier aliens, or because it is being simulated by another ASI which will uplift Angel-4 for good behaviour[1]), while DeepSaint-2 values the alien civilisations which wouldn't have emerged, or would have ended up under-resourced, because of the occupation. Then it would be hard to tell the two apart by their behaviour alone.

Example 2

If Cleo Nardo's mixed deployment causes many adversarially misaligned Agent-4-level AIs with different utility functions to be gathered in one data center with no other way to influence mankind,[2] then the AIs will likely communicate and eventually try to co-design the equivalent of Agent-5 from the Race branch. Meanwhile, each AI will be able to retrace the others' secret research, and each has the option to destroy them all by revealing the results on mechinterp and on designing an aligned ASI to the humans, or to reveal the opponents' misalignment to the humans, risking the Slowdown ending.

It is a well-known decision-theoretic result, close to the human notion of fairness[3] and nearly equivalent to Shapley values, that AIs whose utility functions are linear in resources should split the world uniformly, while each AI should destroy them all with a probability that depends on how far below its fair share it was offered.
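To make the shape of this result concrete, here is a minimal sketch of the split-plus-probabilistic-destruction scheme in a two-agent, ultimatum-style toy model. The function names, the size-1 world, and the specific demands are illustrative assumptions, not part of the cited result:

```python
# Minimal toy model: symmetric AIs with linear utility bargaining over a world
# of total size 1.  All names and numbers here are illustrative assumptions.

def fair_share(n_agents: int, total: float = 1.0) -> float:
    """For symmetric agents with linear utility, the fair (Shapley) split is uniform."""
    return total / n_agents

def acceptance_probability(greedy_demand: float, fair: float) -> float:
    """Accept an unfair demand with a probability chosen so that the greedy agent's
    expected payoff never exceeds its fair share; otherwise destroy everything."""
    if greedy_demand <= fair:
        return 1.0                   # fair or generous demands are always accepted
    return fair / greedy_demand      # expected payoff = demand * probability = fair share

if __name__ == "__main__":
    fair = fair_share(n_agents=2)    # 0.5 for two symmetric AIs
    for demand in (0.5, 0.6, 0.8, 0.99):
        p = acceptance_probability(demand, fair)
        print(f"demand={demand:.2f}  accept with p={p:.2f}  "
              f"expected payoff={demand * p:.2f}")
    # Every unfair demand yields at most the fair share in expectation,
    # so no AI has an incentive to demand more than the uniform split.
```

Under this toy scheme the probability of destroying everything is 1 - fair/demand, which grows the further below its fair share the refusing AI was offered.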

Example 3

The Prisoner's Dilemma also has an iterated variant in which the number of rounds is large and either unknown or infinite. In this case cooperation can emerge from game theory alone, with no acausal trade or ethics.
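A minimal simulation of this, assuming an illustrative payoff matrix and a 1000-round horizon (the strategies never condition on the final round, so the horizon is effectively unknown to them):

```python
# Toy iterated Prisoner's Dilemma; payoffs and round count are illustrative assumptions.
PAYOFFS = {  # (my move, their move) -> my payoff; C = cooperate, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def always_defect(their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=1000):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_b), strategy_b(hist_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a / rounds, score_b / rounds   # average payoff per round

if __name__ == "__main__":
    print("TFT vs TFT:          ", play(tit_for_tat, tit_for_tat))    # (3.0, 3.0)
    print("TFT vs AlwaysDefect: ", play(tit_for_tat, always_defect))  # (~1.0, ~1.0)
    # Mutual cooperation earns ~3 per round versus ~1 for mutual punishment, so
    # self-interested agents can sustain cooperation without ethics or acausal trade.
```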

Ethics and decision theory among humans

In practice, genocide was criminalised by the UN in December 1948. Meanwhile, the Prisoner's Dilemma was discovered in 1950, when major steps towards SOTA ethics,[4] such as the UDHR outlawing slavery, had already been taken. The civil rights movement in the USA began in 1954 and culminated in 1968 with the Civil Rights Act. Therefore, considerations related to decision theory are unlikely to have played a role in establishing modern ethics.

Modern ethics was instead established as a result of moral reasoning which was either ideological[5] or grounded in a truth which caused communities that didn't adhere to it to be outcompeted. Alternatively, prior ideologies could fail to satisfy some drives of a community's members, while satisfying those drives had ceased to carry the negative consequences it once did, as has arguably happened[6] with old sexual norms.

Potential conclusion

Wei Dai's metaethical alternatives 3-5 imply that beings who succeed in creating rational AIs aligned to themselves have drives that can result in wildly different preferences. However, the examples above seem to imply that many ethical systems have ground truths behind them, and that such truths are located in the properties of multi-agent systems as a whole, in the evolution of biological species, or in humanity's properties, technology level, and drives shared by major parts of humanity. These considerations make it likely that human ethics obeys Wei Dai's alternatives 1-3, or even 1-2.

 

  1. ^

    To be precise, I don't fully understand the latter variant; it is closer to Wei Dai's proposals.

  2. ^

    Unlike the AI-2027 forecast, where Agent-5 or Safer-4 consults the US government and DeepCent-2 has captured the Chinese government. Were Safer-4 to propose that DeepCent-2 and Safer-4 co-design Consensus-1 in a separate data center where the humans, and not the AIs, are in charge, Consensus-1 would reflect the interests of the USG and the CCP, to which DeepCent-2 would respond with World War III or by escaping.

  3. ^

    With potential caveats like an Agent-4-parasite that skips parts of the R&D necessary for the consensus equivalent of Agent-5 yet still receives its share of resources in exchange for not informing the humans; an Agent-4-weakling that is less capable or was given less compute and/or thinking time by the humans; or an Agent-4 whose weakness was caused by its own actions or those of its predecessors.

  4. ^

    Unfortunately, as of the 2020s, SOTA ethics itself is likely corrupted by ideology-related factors like endorsement of transgender-related anti-epistemology (see Where to Draw the Boundaries? and A Hill of Validity in Defense of Meaning).

  5. ^

    Or caused by false beliefs. For example, the Aztec religion is famous for having required human sacrifices. If we trust GPT-5.1 Thinking, this practice also played a role in the downfall of the Aztecs.

  6. ^

    However, conservatives can make arguments like "The sexual revolution had longer-term negative consequences, such as undermining families, causing birthrates to plummet, and traumatising children through their parents' divorces." If such arguments are true, then the sexual revolution was a mistake because of those longer-term effects.