Independent AI safety researcher currently funding my work by facilitating for BlueDot, but open to other opportunities. Participated in ARENA 6.0, AISC 9 and am a project lead for AISC 10. Happy to collaborate with anyone on the site, especially if it lowers our collective likelihood of death.
The number of relatives grows exponentially with genealogical distance while relatedness to each one decays exponentially, so if you assume you have e.g. 1 sibling, 2 first cousins, 4 second cousins, etc., each layer contributes an equivalent amount to fitness. log2(8 billion) ≈ 33 layers, so a Fermi estimate of ~100 seems about right?
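Napkin version of that arithmetic, under the toy assumption that relative counts double and relatedness halves with each layer (real family trees are messier):

```python
import math

# Toy kin model: at genealogical layer k you have 2**(k-1) relatives,
# each with relatedness 0.5**k (layer 1 = one sibling at r = 0.5).
# Total relatedness per layer is 2**(k-1) * 0.5**k = 0.5,
# i.e. every layer is worth roughly one sibling.

population = 8e9

# Layers needed before the cumulative relative count covers everyone
# (the sum of 2**(k-1) up to layer n is about 2**n):
layers = math.log2(population)  # ~33

print(f"~{layers:.0f} layers, so the whole population adds up to "
      f"~{layers:.0f} sibling-equivalents of relatedness")
```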
If anything, I get the impression this overestimates how much people actually care, because there's probably an upper bound on concern somewhere before this point.
Hmm, perhaps. My intuition about discounting is different, but I'm not sure it's a crux here. I agree that extinction leads to 0 utility for everyone everywhere, but the point I was making was more that with a low discount rate the massive potential of humanity carries significant weight, while a high discount rate sends it to near 0.
In this worldview, near-extinction is no longer significantly better than extinction.
That aside, I think the stronger point is that if you only care about people near to you, spatially and temporally (as I think most people implicitly do), the thing you end up caring about is the death of maybe 10-1,000 people (discounted by your familiarity with them, so probably at most equivalent to ~100 deaths of nearby family) rather than 8 billion.
Some napkin maths as to how much someone with that sort of worldview should care: a 0.01% chance of doom in the next ~20 years, multiplied by the ~100 nearby-death equivalents above, gives about 1% of an equivalent expected death over those 20 years. 20 years is ~175,000 hours, which would make it about 7.5x less worrisome per hour than driving according to this infographic.
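Spelling that out (the per-hour driving risk below is a placeholder guess, not the infographic's actual number - swap in whatever figure it gives):

```python
p_doom_20y = 0.0001             # 0.01% chance of doom over ~20 years
nearby_death_equivalents = 100  # from the kin-weight estimate above

expected_equiv_deaths = p_doom_20y * nearby_death_equivalents  # 0.01, i.e. ~1%

hours_20y = 20 * 365.25 * 24    # ~175,000 hours
doom_micromorts_per_hour = expected_equiv_deaths * 1e6 / hours_20y  # ~0.057

# Placeholder guess for the infographic's per-hour driving risk (micromorts/hour):
driving_micromorts_per_hour = 0.43

print(f"doom: {doom_micromorts_per_hour:.3f} micromorts/hour, "
      f"~{driving_micromorts_per_hour / doom_micromorts_per_hour:.1f}x "
      f"less than the assumed driving figure")
```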
Again, very napkin maths, but I think my basic point is that a 0.01% P(Doom) coupled with a non-longtermist, non-cosmopolitan view seems very consistent with "who gives a shit".
I think there's an implicit assumption of near-zero discount rates here, which are probably not held by the majority of the human population. If your utility function is such that you care very little about what happens after you die, and/or you mostly care about people in your immediate surroundings, your P(DOOM) needs to be substantially higher before you start caring significantly.
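To put illustrative numbers on the discounting point (a plain exponential discount on utility t years out; the rates are made up for the example):

```python
def weight(annual_discount_rate: float, years: float) -> float:
    """Weight placed on utility `years` from now under exponential discounting."""
    return (1 - annual_discount_rate) ** years

for rate in (0.0, 0.02, 0.05):
    print(f"rate={rate:.0%}: 100y -> {weight(rate, 100):.2e}, "
          f"1000y -> {weight(rate, 1000):.2e}")
# rate=0%: 100y -> 1.00e+00, 1000y -> 1.00e+00   (the longtermist limit)
# rate=2%: 100y -> 1.33e-01, 1000y -> 1.68e-09
# rate=5%: 100y -> 5.92e-03, 1000y -> 5.29e-23
```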
This is not to mention Pascal's-mugging-type arguments, where arguably you shouldn't be making significant life choices on the basis of a small, speculative probability of some very large outcome.
This is not to say that I'm against x-risk research – my P(DOOM) is about 60% or so. This is more just to say that I'm not sure people with a non-EA worldview should necessarily be convinced by your arguments.
Yeah, so it's definitely the case that some of the posts on Moltbook are human, but I think the bulk are AI, and I get the impression that style-wise they end up very similar to the main posts here.
It feels weird to comment on a post I haven't read, but I think it would be worth breaking this into parts (both the post and the video). There's probably stuff worth reading/watching in there, and I'd happily do so if it were broken into, e.g., 8x 30-minute discussions, but the current length adds a lot of friction to getting started.
I wrote this because it's probably a thought going through a fair few people's minds, and those people are being selected out of the comments, so I think it's differentially useful feedback.
As someone who spent an unreasonable chunk of 2025 wading through the 1.8M words of planecrash, I think this post does a remarkable job of covering a large portion of the real-world-relevant material directly discussed (not all of it - there's a lot to cover where planecrash is concerned). I think one of the main things lacking in the review is a discussion of some of the tacit ideas being conveyed - much of the book reads as a metaphor for the creation of AGI, and a wide range of ideas are explored more implicitly.
All in all, I think this post does a decent job of compressing a very large quantity of material.
Interesting post! I think the heavier weighting of octopuses is partly down to the narrower range of models you tested (the 30% figure partly came from averaging over a range of models - individual models had stronger preferences).
I think there's also a difference in the system prompt used for API vs chat usage (in that I imagine there is none for the API). This would be my main guess for why you got significantly more corvids - I've seen both this and the increased octopus frequency when doing small tests in chat.
On the actual topic of your post, I'd guess the conclusion is that AI metacognitive capabilities are situation-dependent? The question would then be in which situations it can and can't reason about its own thought process.
I think there are a couple of things which are quite clearly different from MIRI's original arguments:
I still think that the basic argument of "if you take something you don't understand and can't control very well and scale it up to superintelligence, that seems bad" holds.
I just played Gemini 3, Claude 4.5 Opus and GPT 5.1 at chess.
It was just one game each but the results seemed pretty clear - Gemini was in a different league to the others. I am a 2000+ rated player (chess.com rapid), but it successfully got a winning position multiple times against me, before eventually succumbing on move 25. GPT 5.1 was worse on move 9 and losing on move 12, and Opus was lost on move 13.
Hallucinations followed the same pattern - ChatGPT hallucinated for the first time on move 10, and hallucinated the most frequently, while Claude hallucinated for the first time on move 13 and Gemini made it to move 20, despite playing a more intricate and complex game (I struggled significantly more against it).
Gemini was also the only AI to follow the proper etiquette of resigning once lost - GPT just kept on playing down a ton of pieces, and Claude died quickly.
Games:
Gemini: https://lichess.org/5mdKZJKL#50
Claude: https://lichess.org/Ht5qSFRz#55
GPT: https://lichess.org/IViiraCf
I was white in all games.
Is the implication here that you should also care about genetic fitness as carried into the future? My basic calculation was that in purely genetic terms, you should care about the entire Earth's population ~33x as much as a sibling (modulo family trees being a lot messier at this scale, so you probably care about it more than that).
I feel like at this scale the fundamental thing is that we are just straight up misaligned with evolution (which I think we agree on).