On closed-door AI safety research
Epistemic status: Based on multiple accounts, I’m confident that frontier labs keep some safety research internal-only, but I’m much less confident about the reasons why. Many benign explanations exist and may well suffice, but I wanted to explore other possible incentives and dynamics which may come into play at various levels. I’ve tried to gather information from reliable sources to fill my knowledge/experience gaps, but the post remains speculative in places. (I’m currently participating in MATS 8.0, but this post is unrelated to my project.)

Executive Summary

1. Multiple sources[1] point to frontier labs keeping at least some safety research internal-only.
2. Additionally, I think decision-makers[2] at frontier labs may be incentivised to keep at least some important safety research proprietary, which I refer to as safety hoarding.
3. Theories for why decision-makers at frontier labs may choose to safety-hoard include: wanting to hedge against stronger future AI legislation or an Overton window shift in public safety concerns, either of which would create a competitive advantage for compliant labs; reputational gains from being “the only safe lab”; safety breakthroughs being too expensive to implement, making it favourable to hide them; and a desire not to draw attention to research which could cause bad PR, e.g. x-risk.
4. Issues resulting from safety hoarding include: risks from labs releasing models which would have been made safer using techniques hoarded by competitors; duplicated research leading to wasted effort; risks of power concentration in a lab with lower safety standards; and harder coordination for external teams, who must guess what labs are doing internally.
5. Other, more benign explanations may account for much unpublished safety work, including time constraints and publishing effort, dual-use/infohazard concerns, and not wanting to risk data contamination.
6. An autonomous vehicle (AV) study from April