Most of the AI takeover thought experiments and stories I remember are about a kind of AI that has open-ended goals: the Squiggle Maximizer, the Sorcerer’s Apprentice robot, Clippy, probably also U3, Consensus-1, and Sable. I wonder what concrete mechanisms could even lead to models having open-ended goals. Here are...
I'm thinking of an unreleased frontier model. No public information. How realistic is it to think such a model could be duplicated starting from the weights alone, e.g. by brute-forcing through different combinations of architectures and activation functions? Would thieves be likely to end up with an inferior bizarro...
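One hedged observation on this question (my own sketch, not from the post): the weight file itself constrains much of the search, since parameter names and tensor shapes pin down hidden size, layer count, vocabulary size, and MLP width. The parameter names and shapes below are purely illustrative.

```python
# Hypothetical sketch: much of a transformer's architecture is implied by the
# shapes of its weight tensors, so a thief with raw weights is not searching
# blindly over "all combinations". Names and shapes below are made up.

def infer_hyperparams(state_dict_shapes: dict[str, tuple[int, ...]]) -> dict:
    """Guess basic hyperparameters from parameter names and shapes."""
    # Embedding table: (vocab_size, d_model)
    vocab_size, d_model = state_dict_shapes["embed.weight"]

    # Count decoder blocks by looking for per-layer attention weights.
    n_layers = sum(1 for name in state_dict_shapes
                   if name.startswith("blocks.") and name.endswith(".attn.q_proj.weight"))

    # MLP expansion factor: (d_ff, d_model) for the up-projection.
    d_ff = state_dict_shapes["blocks.0.mlp.up_proj.weight"][0]

    return {"vocab_size": vocab_size, "d_model": d_model,
            "n_layers": n_layers, "ffn_multiplier": d_ff / d_model}


# Toy example with made-up shapes.
shapes = {
    "embed.weight": (50_000, 4096),
    "blocks.0.attn.q_proj.weight": (4096, 4096),
    "blocks.0.mlp.up_proj.weight": (16_384, 4096),
    "blocks.1.attn.q_proj.weight": (4096, 4096),
    "blocks.1.mlp.up_proj.weight": (16_384, 4096),
}
print(infer_hyperparams(shapes))
```

What the shapes do not pin down is roughly what would still need guessing: activation functions, normalization placement, positional encoding scheme, and similar choices.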
Epistemic status: My best guess

When I look at advanced AI development, I see three general conditions that seem to be the root causes of all catastrophic risks:

* reliance on deep learning without knowing how to do it safely,
* pressure to make progress on the most powerful capabilities,...
Epistemic Status: Exploratory

President Biden’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence specifies compute thresholds for training runs and computing clusters that, if exceeded, impose reporting requirements. If a training run exceeds 10^26 floating point operations or, for a model trained mainly on...
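For a rough sense of scale (my own hedged sketch, not from the Executive Order itself): the common C ≈ 6·N·D estimate for dense transformer training compute puts the 10^26 FLOP threshold just above the largest runs disclosed to date. The parameter and token counts below are illustrative, not claims about any real model.

```python
# Hedged back-of-the-envelope check against the EO's 10^26 FLOP reporting
# threshold, using the common C ≈ 6 * N * D estimate for dense transformer
# training compute (N = parameters, D = training tokens). Numbers are made up.

THRESHOLD_FLOP = 1e26

def training_flop(n_params: float, n_tokens: float) -> float:
    """Rough dense-transformer training compute estimate."""
    return 6 * n_params * n_tokens

for n_params, n_tokens in [(70e9, 2e12), (400e9, 15e12), (1e12, 30e12)]:
    c = training_flop(n_params, n_tokens)
    flag = "reportable" if c > THRESHOLD_FLOP else "below threshold"
    print(f"{n_params:.0e} params x {n_tokens:.0e} tokens ~ {c:.1e} FLOP -> {flag}")
```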
How do labs working at or near the frontier assess major architectural or algorithmic changes before committing huge compute resources to them? For example, how do they assess stability and sample efficiency without having to do full-scale runs?
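I don't know labs' internal practice, but one publicly described approach is to run a ladder of small-scale experiments for each candidate change, fit scaling curves, and extrapolate before committing to a big run. A minimal sketch with synthetic numbers:

```python
# Hedged sketch (not an insider account): fit a scaling law on a ladder of
# small runs and extrapolate to the full-scale budget. Losses and compute
# budgets below are synthetic.

import numpy as np

# (training compute in FLOP, final validation loss) from hypothetical small runs
compute = np.array([1e19, 1e20, 1e21, 1e22])
loss = np.array([3.10, 2.75, 2.45, 2.20])

# Fit a simple power law L ~ a * C^(-b) by linear regression in log-log space.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

# Extrapolate to the full-scale budget; candidate architectures can then be
# compared by their fitted curves rather than by a single giant run each.
full_scale = 1e25
predicted_loss = np.exp(intercept + slope * np.log(full_scale))
print(f"fitted exponent: {slope:.3f}, predicted loss at 1e25 FLOP: {predicted_loss:.2f}")
```

Stability is presumably probed in the same small-scale regime (watching for loss spikes and divergences), though how well such proxies transfer to full scale is exactly what the question is asking.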
Thanks to llll for helping me think this through, and for providing useful comments.

Epistemic Status: My best guess

Introduction

It might be worthwhile to systematically mine AI technical research to find “unintentional AI safety research”—research that, while not explicitly conducted as AI safety research, contains information relevant to AI...