I am a Technical AI Governance researcher with interests in animal ethics, multilingual AI capabilities and safety, compute governance, and the economics of transformative AI. My background includes over 10 years of experience spanning project management, quantitative risk analysis and model validation in finance, and research in economics. I am also the founder and chair of the board at Effective Altruism Latvia and a board member of the animal advocacy organization Dzīvnieku brīvība.
This seems like a stark contrast between Bayesianism and the way a frequentist might approach things, i.e. do not reject the null hypothesis that the probability is negligible until convinced by evidence, either formal arguments or real-life mishaps. Labeling something as having P(x) ~ 0 probably helps to compartmentalize things and focus on other tasks at hand, but it can lead to huge risks being neglected, as in this case of AI alignment.
Edit: "premortem" seems like a useful exercise to align mind & gut
These frontier models could still be vulnerable to stealth (e.g. "sleeper agent") attacks, specialist models, and stealth attacks by specialist models. The balance depends on the ability gap: if the top model is way ahead of the others, then maybe defence dominates attack efforts. But a big ability gap does not seem to be playing out; instead, there are several models at or near the frontier, and lots of (more or less) open-source models not far behind.
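Purely as an illustration (my own toy construction, not a model argued for above), the "defence dominates only with a large gap" intuition can be sketched as a logistic curve over the capability gap:

```python
import math

# Toy logistic model (illustrative assumption only): the chance that the leading model's
# defences catch a stealth or specialist-model attack, as a function of how far ahead it
# is of the best attacker-accessible model (gap in arbitrary capability units).
def p_defence_wins(gap: float, steepness: float = 2.0) -> float:
    return 1.0 / (1.0 + math.exp(-steepness * gap))

# A wide lead looks defence-dominant; the current near-parity regime is closer to a coin flip.
for gap in (2.0, 0.5, 0.1):
    print(f"gap = {gap:+.1f} -> P(defence wins) ~ {p_defence_wins(gap):.2f}")
```

The numbers themselves mean nothing; the sketch only encodes the qualitative claim that the defence advantage shrinks toward a toss-up as the gap between the frontier and everyone else closes.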