Former AI safety research engineer, now AI governance researcher at OpenAI. Blog: thinkingcomplete.com
I half-agree with both of you. I do think Hanson's selection pressure paper is a useful first approximation, but it's not clear that the reachable universe is big enough that small deviations from the optimal strategy will actually lead to big differences in amount of resources controlled. And as I gestured towards in the final section of the story, "helping" can be very cheap, if it just involves storing their mind until you've finished expanding.
But I don't think that the example of animals demonstrates this point very well, for two reasons. Firstly, in the long term we'll be optimizing these probes way harder than animals were optimized.
Secondly, a lot of the weird behaviors of animals are a result of needing to compete directly against each other (e.g. by eating each other, or mating with each other). But I'm picturing almost all competition between probes happening indirectly, via racing to the stars. So I think they'll look more directly optimized for speed. (For example, an altruistic probe in direct competition with others would need ways of figuring out when its altruism was being exploited, and then others would try to figure out how to fool it, until the whole system became very unwieldy. By contrast, if the altruism just consists of "in colonizing a solar system I'll take a 1% efficiency hit by only creating non-conscious workers" then that's much more straightforward.)
Yeah, I moved it to earlier than it was, for two reasons. Firstly, if the grasshopper was just unlucky, then there's no "deviation" to forgive—it makes sense only if the grasshopper was culpable. Secondly, the earlier parts are about individuals, and the latter parts are about systems—it felt more compelling to go straight from "centralized government" to "locust war" than going via an individual act of kindness.
Curious what you found more meaningful about the original placement?
I intended to convey it via "The grasshopper’s mind is ... waiting to be born again in a fragment of a fragment of a supercomputer made of stars", but there's a lot in between those two phrases so it's reasonable to miss that implication.
Have edited to fix.
My best guess as to why it might feel like this is that you think I'm laying groundwork for some argument of the form "P(doom) is very high", which you want to nip in the bud, but are having trouble nipping in the bud here because I'm building a motte ("cosmopolitan values don't come free") that I'll later use to defend a bailey ("cosmopolitan values don't come cheap").
I expect that you personally won't do a motte-and-bailey here (except perhaps insofar as you later draw on posts like these as evidence that the doomer view has been laid out in a lot of different places, when this isn't in fact the part of the doomer view relevant to ongoing debates in the field).
But I do think that the "free vs cheap" distinction will obscure more than it clarifies, because there is only an epsilon difference between them; and because I expect a mob-and-bailey where many people cite the claim that "cosmopolitan values don't come free" as evidence in debates that should properly be about whether cosmopolitan values come cheap. This is how weak-man arguments work in general.
Versions of this post that I wouldn't object to in this way include:
When I say "repudiate" I mean a combination of publicly disagreeing + distancing. I presume you agree that this is suboptimal for both of us, and my comment above is an attempt to find a trade that avoids this suboptimal outcome.
Note that I'm fine being in coalitions with people when I think their epistemologies have problems, as long as their strategies are not sensitively dependent on those problems. (E.g. presumably some of the signatories of the recent CAIS statement are theists, and I'm fine with that as long as they don't start making arguments that AI safety is important because of theism.) So my request is that you make your strategies less sensitively dependent on the parts of your epistemology that I have problems with (and I'm open to doing the same the other way around in exchange).
If the result of an optimization process will be predictably horrifying to the agents which are applying that optimization process to themselves, then they will simply not do so.
In other words: AIs which feel anything in the vicinity of kindness before applying cosmic amounts of optimization pressure to themselves will try to steer that optimization pressure towards something which is recognizably kind at the end.
And I don't think there's any good argument for why AIs will lack any scrap of kindness with very high confidence at the point where they're just starting to recursively self-improve.
Meta: I feel pretty annoyed by the phenomenon of which this current conversation is an instance, because when people keep saying things that I strongly disagree with which will be taken as representing a movement that I'm associated with, the high-integrity (and possibly also strategically optimal) thing to do is to publicly repudiate those claims*, which seems like a bad outcome for everyone. I model it as an epistemic prisoner's dilemma with the following squares:
D, D: doomers talk a lot about "everyone dies with >90% confidence", non-doomers publicly repudiate those arguments
C, D: doomers talk a lot about "everyone dies with >90% confidence", non-doomers let those arguments become the public face of AI alignment despite strongly disagreeing with them
D, C: doomers apply higher epistemic standards on this issue (from the perspective of non-doomers); non-doomers keep applying pressure to doomers to "sanitize" even more aspects of their communication
C, C: doomers apply higher epistemic standards on this issue (from the perspective of non-doomers); non-doomers support doomers making their arguments
I model us as being in the C, D square and I would like to move to the C, C square so I don't have to spend my time arguing about epistemic standards or repudiating arguments from people who are also trying to prevent AI xrisk. I expect that this is basically the same point that Paul is making when he says "if we can't get on the same page about our predictions I'm at least aiming to get folks to stop arguing so confidently for death given takeover".
I expect that you're worried about ending up in the D, C square, so in order to mitigate that concern I'm open to making trades on other issues where doomers and non-doomers disagree; I expect you'd know better than I do what trades would be valuable for you here. (One example of me making such a trade in the past was including a week on agent foundations in the AGISF curriculum despite inside-view not thinking it was a good thing to spend time on.) For example, I am open to being louder in other cases where we both agree that someone else is making a bad argument (but which don't currently meet my threshold for "the high-integrity thing is to make a public statement repudiating that argument").
* my intuition here is based on the idea that not repudiating those claims is implicitly committing a multi-person motte and bailey (but I can't find the link to the post which outlines that idea). I expect you (Habryka) agree with this point in the abstract because of previous cases where you regretted not repudiating things that leading EAs were saying, although I presume that you think this case is disanalogous.
Mmm, I still prefer trust I think. Spaciousness gives me connotations of... well, distance, and separation. In some sense my relationship with almost everyone in the world is spacious. The thing that's special about some relationships is that they have both spaciousness and intensity, which to me feels well-described by "trust".
Whenever people are sad for any reason except s-risk, I wonder if they're able to think at all about important issues. /s