To your last point: the fact that "being known" spans ~8 orders of magnitude probably makes it pretty likely that this is a Pareto distribution, or at least that whatever distribution fits best shares many of its characteristics. There's also the fact that being known helps with being known: increasing your "being-known degree" by 5% is probably not much more difficult when 100M people know you than when 100K people know you.
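To put a toy model behind that intuition: if everyone's audience just grows by a small random percentage per time step (the numbers below are made up, purely illustrative), you already end up with a distribution spanning several orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)

n_people = 100_000
n_steps = 500

# Everyone starts out known by ~10 people.
fame = np.full(n_people, 10.0)

# Each step, fame grows by a random *percentage* (multiplicative growth):
# gaining 5% more audience is about as easy at 100M as at 100K.
for _ in range(n_steps):
    growth = rng.normal(loc=0.01, scale=0.05, size=n_people)
    fame *= np.clip(1.0 + growth, 0.5, None)  # keep the factors positive

print(f"min: {fame.min():.0f}, median: {np.median(fame):.0f}, max: {fame.max():.2e}")
print(f"orders of magnitude spanned: {np.log10(fame.max() / fame.min()):.1f}")
```

(Strictly speaking, pure multiplicative growth like this gives something closer to a log-normal than a true Pareto, but the heavy tail and the many-orders-of-magnitude spread are the same kind of phenomenon.)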
A highly underpowered anecdote, but I've asked several models to generate lyrics, and Gemini 3 was the first one that managed to add some pretty funny lines, even in non-English languages. Opus 4.5 also definitely showed some humor, but mostly in English; other languages were a bit disappointing in my few attempts.
In the post though, you wrote:
There were plenty of assumptions here to simplify things, including: I assumed the population won’t increase, that the number of deaths per year will be relatively constant until AGI
So if you're still biting the bullet under these conditions, then I don't really get why - unless you're a full-on negative utilitarian, but then the post could just have said "I think I'm e/acc because that's the fastest way of ending this whole mess". :P
I don't want anyone to think I'm trying to publish an objectively correct AI pause calculator. I'm just trying to express my own values on paper and nudge others to do the same.
I mean, that's fine and all, but if your values truly imply you prefer ending the world now rather than later, when these are the two options in front of you, then that does some pretty heavy lifting. Because without this view, I don't think your other premises would lead to the same conclusion.
More people experiencing some horrible apocalypse and having their lives cut short sounds bad to me.
If we assume a roughly constant population size (or even moderate ongoing growth) and your assumption holds that a pause reduces p(doom) from 10% to 5%, then far fewer people will die in a fiery apocalypse. So whichever way I turn it, I find it hard to see how your conclusion follows from your napkin math, unless I'm missing something. (edit: I notice I jumped back from my hypothetical scenario to the AGI pause scenario; a bit premature here, but eventually I'd still like to make this transition, because, again, your fiery apocalypse claim above would suggest you should be in favor of a pause rather than against it)
(I'd also argue that even if the math somehow checks out, the numbers you end up with are pretty close together, while all the input values (like the 40-year timeline) surely have large error bars, so even small deviations might lead to the opposite outcome. But I notice this was already discussed in another comment thread.)
Imagine pausing did not change p(doom) at all and merely delayed inevitable extinction by 10 years. To me that would still be a no-brainer - I'd rather have 10 more years. To you, does that really only boil down to 600 million extra deaths and nothing positive, like, say, 80 billion extra years of life gained?
Doesn't your way of calculating things suggest that, if you had the chance to decide between two outcomes - the world ending today, or the world ending in 10 years - you'd choose the former, because you'd end up with a lower number of people dying?
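To make the arithmetic behind that question explicit, here's the back-of-the-envelope version, with rough numbers I'm plugging in myself (~8 billion people, ~60 million deaths per year - these roughly reproduce the 600 million and 80 billion figures above):

```python
# Rough numbers I'm plugging in myself (not from the post):
population = 8e9          # ~8 billion people, held constant as in the post
deaths_per_year = 60e6    # ~60 million deaths per year worldwide
delay_years = 10

# Counting deaths only:
deaths_if_end_now = population                              # everyone alive dies
extra_natural_deaths = deaths_per_year * delay_years        # ~600 million
# (births keep the population constant, so the apocalypse still kills ~8 billion)
deaths_if_end_in_10y = population + extra_natural_deaths    # ~8.6 billion

# Counting life-years gained by the delay:
extra_life_years = population * delay_years                 # ~80 billion person-years

print(f"deaths if the world ends now:       {deaths_if_end_now:,.0f}")
print(f"deaths if the world ends in 10 yrs: {deaths_if_end_in_10y:,.0f}")
print(f"life-years gained by the delay:     {extra_life_years:,.0f}")
```

Counting deaths alone slightly favors ending the world now (~8 billion vs. ~8.6 billion), while counting life-years overwhelmingly favors the delay - that's the crux I'm trying to point at.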
Great initiative, looking forward to what you eventually report!
I had a vaguely similar thought at first, but upon some reflection found the framing insightful. I hadn't really thought much about the "AI models might just get selected for the capability of resisting shutdown, whether they're deliberate about this or not" hypothesis, and while it's useful to distinguish the two scenarios, I'd personally rather see this as a special case of "resisting shutdown" than something entirely separate.
One more addition: based on @leogao's comment, I went a bit beyond the "visualize loss landscape based on gradient" approach and did the following: I trained three models of identical architecture (all using [20, 30, 20] hidden neurons with ReLU) for 100 epochs and then looked at the loss landscape in the "interpolation space" between these three models (model1 at (0,0), model2 at (1,0), model3 at (0,1), and the rest of the plane linearly interpolating between their weights). I visualized the log of the loss at each point. My expectation was to see clear minima at (0,0), (1,0) and (0,1), where the trained models sit, and something elevated between them. And indeed:
Otherwise the landscape does look pretty smooth and boring again.
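In case anyone wants to play with this: here's a minimal PyTorch sketch of the interpolation plane (the input/output dimensions, the stand-in data batch X, y, and the MSE loss are placeholders for whatever you train on, not my exact setup):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# The [20, 30, 20] ReLU MLP from above; input/output sizes here are placeholders.
def make_mlp(d_in=2, d_out=1, hidden=(20, 30, 20)):
    layers, prev = [], d_in
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, d_out))
    return nn.Sequential(*layers)

def interpolated_state(m1, m2, m3, a, b):
    """Weights at plane coordinate (a, b): m1 sits at (0,0), m2 at (1,0), m3 at (0,1)."""
    a, b = float(a), float(b)
    s1, s2, s3 = m1.state_dict(), m2.state_dict(), m3.state_dict()
    return {k: (1 - a - b) * s1[k] + a * s2[k] + b * s3[k] for k in s1}

# Stand-ins so the snippet runs on its own; in the actual experiment these were
# three separately trained copies and a real evaluation batch.
model1, model2, model3 = make_mlp(), make_mlp(), make_mlp()
X, y = torch.rand(256, 2), torch.rand(256, 1)

probe = make_mlp()  # scratch model we load the interpolated weights into
grid = np.linspace(0.0, 1.0, 41)
log_loss = np.zeros((len(grid), len(grid)))

with torch.no_grad():
    for i, b in enumerate(grid):
        for j, a in enumerate(grid):
            probe.load_state_dict(interpolated_state(model1, model2, model3, a, b))
            loss = F.mse_loss(probe(X), y)  # stand-in for the actual training loss
            log_loss[i, j] = torch.log(loss).item()

# e.g. plt.imshow(log_loss, origin="lower", extent=(0, 1, 0, 1)) then shows the plane,
# with the three trained models at (0,0), (1,0) and (0,1).
```

The only real trick is that every parameter tensor gets the same barycentric weights (1 - a - b, a, b), which is what puts the three trained models exactly at (0,0), (1,0) and (0,1).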
In fact, even after only 10 epochs and a test loss of >1.2, model 4 already produces something that clearly resembles the Mandelbrot set, which model 3 failed to achieve even after hundreds of epochs:
Indeed, when I encounter strangers behaving in unusual ways, I sometimes make an effort not to look like I notice them even though I do, as "behaves unusually" tends to come with "unpredictable", and usually I'm not interested in "provoking" them. Sure, that person climbing a fence in plain sight of the public may just be some friendly rationalist to whom I could express my curiosity about their endeavors, but they may also be some kind of unhinged person without self-control, what do I know.
So, maybe I would even reframe invisibility - in some settings at least - as something like "don't care & don't trust & can't be bothered to engage".