Once they realise the risk of extinction isn't "tiny" (and we can all help here), the rational move is not to play, and to prevent anyone else from playing.
One option, if you want to do a lot more about it than you currently are, is Pause House. Another is donating to PauseAI (US, Global). In my experience, being pro-active about the threat does help.
I think your prompt to Claude is pretty leading[1]. You are assuming the answer when you write "the AIs end up with motivations similar to those of the humans". The point is that we don't actually know what their underlying motivations are; we only see how they act when trained and system-prompted into mimicking humans. And no alignment technique is even 3 9s reliable (and we need >13 9s in the limit of ASI).
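To spell out the arithmetic behind the 9s (a back-of-the-envelope sketch; the count of alignment-relevant decisions $N$ is an illustrative assumption, not a figure from this thread): treat each such decision as an independent chance of catastrophic failure with probability $p$. Then

$$P(\text{no failure over } N \text{ decisions}) = (1-p)^N \approx e^{-pN}.$$

At 3 9s ($p = 10^{-3}$) you expect a failure within roughly a thousand decisions. To keep the cumulative failure probability below ~10% over, say, $N = 10^{12}$ decisions, you need $p \lesssim 10^{-13}$, i.e. better than 13 9s.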
Also "Can this crux be partially resolved by, say, studying the values of humans whose brain was developed abnormally" is not thinking at the right level of abstraction. Humans who's brains developed abnormally are still very close to normal humans in the grand scheme of mindspace. AIs share zero evolutionary history and development (evo-devo), and close to zero brain architecture with humans. Sharing our corpus of media is a very shallow and brittle substitute (i.e. it can make a half-decent mask for the shoggoth, but it doesn't do anything in the way of evolving the shoggoth into a digital human).
[1] Not to mention that using Claude as a trusted source of information on this in the first place is itself problematic.
By catastrophe, I was thinking of something much worse than Covid, or indeed something x-risky. Point-of-no-return is a good stand-in. So: what makes you confident that AI safety research will be automated before a point-of-no-return for humanity is crossed?
> I'm pretty confident it's feasible to at the very least 10x AI safety prosaic research through AI augmentation without increasing x-risk by more than 1% yearly
I'd agree that it's feasible, but is it at all likely? Surely that would require us to Pause at ~the current level (as you say: "LLMs already seem like they might be human-level at (ML) research ideation."). You aren't getting only a 1% increase in x-risk yearly on the current trajectory.
> I think Claudes are probably already more beneficial and more prosocial companions than the median human
I think you (like many in the LW/EA/AIS community) might be on a slippery slope here to having your mind altered by AI use to the point of losing sight of the fact that these things are fundamentally alien underneath. (See also.)
Yes, but it wouldn't be given the throne. It would take it (or rather, just obliterate it).
What makes you confident that AI safety research will be automated before catastrophe strikes?
Good to hear. Look forward to seeing the results!
Ok, but Mutually Assured AI Malfunction (MAIM) is another possibility.