Even if we lose, we win
Epistemic status: Updating on this comment and taking into account uncertainty about my own values, my credence in this post is around 50%.

TLDR: Even in worlds where we create an unaligned AGI (UAI), it will cooperate acausally with counterfactual FAIs, spending some percentage of its resources pursuing human values, as long as its utility function is concave in resources spent. The amount of resources the UAI spends on humans will be roughly proportional to the measure of worlds with aligned AGI, so this does not change the fact that we should be working on alignment.

Assumptions

1. Our utility function is concave in resources spent; e.g., we would prefer a 100% chance of 50% of the universe turning into utopia over a 50% chance of 100% of the universe turning into utopia, assuming the rest of the universe will be filled with something we don't really care about, like paperclips.
2. There is a greater total measure of worlds containing aligned AGI with concave utility than worlds containing anti-aligned AGI with concave utility, "anti-aligned" meaning it wants things that directly oppose our values, like the suffering of sentient beings or the destruction of knowledge.
3. AGIs will be able to predict counterfactual other AGIs well enough for acausal cooperation.
4. (Added in response to these comments) AGIs will use a decision theory that allows for acausal cooperation between Everett branches (like LDT).

Acausal cooperation between AGIs

Let's say two agents, Alice and Bob, have utilities that are concave in money. For concreteness, say each agent's utility is the logarithm of their bank account balance, and each starts with $10. They play a game in which a fair coin is flipped: if it lands heads, Alice gets $10, and if it lands tails, Bob gets $10. Each agent's expected utility here is $\frac{1}{2}\log(10) + \frac{1}{2}\log(20) = \frac{1}{2}\log(200)$. If, instead, Alice and Bob both just receive $5, each agent's expected utility is $\log(15) = \frac{1}{2}\log(225) > \frac{1}{2}\log(200)$. Therefore, in the coin-flip game, both agents prefer committing beforehand to split the winnings evenly over letting the coin decide who gets everything.
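To double-check the arithmetic, here is a minimal sketch in Python; the numbers are the post's own, nothing new is assumed:

```python
import math

# Coin-flip game: Alice and Bob each start with $10 and have
# log-money utility. Heads: Alice wins $10; tails: Bob wins $10.

def log_utility(balance):
    return math.log(balance)

# Expected utility of gambling: half the time you end with $20,
# half the time you stay at $10.
eu_gamble = 0.5 * log_utility(10) + 0.5 * log_utility(20)

# Expected utility of agreeing in advance to split the $10 prize:
# each agent ends with $15 for certain.
eu_split = log_utility(15)

print(f"gamble: {eu_gamble:.4f}")  # 0.5*log(200) ~ 2.6492
print(f"split:  {eu_split:.4f}")   # log(15) = 0.5*log(225) ~ 2.7081
assert eu_split > eu_gamble  # concavity makes the certain split better
```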
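The TLDR's "roughly proportional to the measure" claim can be illustrated with a toy bargaining model. This is my own sketch, not something the post specifies: it assumes the acausal deal picks a single human-values share x of each universe's resources, and that the deal maximizes the measure-weighted sum of the two parties' log utilities. Under those assumptions, the optimum is exactly x = p, the measure of aligned-AGI worlds:

```python
import math

# Toy model (a sketch under the assumptions above, not the post's method):
# branches of measure p contain an aligned AGI, branches of measure 1-p
# contain the UAI. The deal devotes fraction x of resources to human
# values everywhere, 1-x to the UAI's values, and maximizes
# p*log(x) + (1-p)*log(1-x). Setting the derivative to zero gives x = p.

def deal_value(x, p):
    return p * math.log(x) + (1 - p) * math.log(1 - x)

p = 0.3  # hypothetical measure of aligned-AGI worlds
# crude grid search over the human share x
best_x = max((i / 1000 for i in range(1, 1000)), key=lambda x: deal_value(x, p))
print(best_x)  # ~ 0.3, i.e. equal to p
```

Other bargaining solutions would give somewhat different splits, which is one reason to read the post's claim as "roughly" rather than exactly proportional.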