Alfie Lamerton

Lock-In Risk Needs More Researchers. Here's Where to Start

Epistemic status: slightly outdated ideas, only a shallow interpretation of many areas, a somewhat arbitrary taxonomy, and posted because I can't prioritise spending more time on this but think it should be public. Lock-in risk research is neglected and potentially very high impact. I’ve done some thinking about the threat...

Jun 17

I Started an AI Safety Research Org and Think These 7 Things Matter

Thanks to Adam Jones and Ben R Smith for suggesting I make this. These are my opinionated takes on things that matter when starting a new organisation. It may also apply to orgs that are not AI safety research nonprofits. Andrew Draganov and Erin Robertson also recently wrote a post...

Jun 10

A Research Agenda for Secret Loyalties

by Joe Kwon, Alfie Lamerton, draganover, Dave Banerjee, Bronson Schoen, Daniel Kokotajlo, ryan_greenblatt, Owain_Evans, Fabien Roger, and Tom Davidson

Frontier AI models serve millions of military personnel on classified networks, support operational military targeting, automate scientific pipelines in national laboratories, generate and review significant volumes of production code, and increasingly automate the development of their successors. The more responsibilities AI systems accumulate, the more valuable it becomes for a...

May 1339

Narrow Secret Loyalty Dodges Black-Box Audits

TL;DR. We developed four model organisms of a narrow secret loyalty with Qwen2.5-instruct models (1.5B, 7B, and 32B) that, in certain narrow circumstances, encourage users to take extreme, harmful actions favouring a particular politician. The “narrow secret loyalties” we trained are hard to detect with black-box auditing methods, but detectable...

Apr 2250

Digital Error Correction and Lock-In

Epistemic status: a collection of intervention proposals for digital error correction in the context of lock-in. It reflects my own intervention ideas, and the opinion of Formation Research at the time of writing. TL;DR We believe lock-in risks are a pressing problem, and that the digital error correction properties of...

Apr 8, 2025

Organisation-Level Lock-In Risk Interventions

Epistemic status: my own conjecture and speculation after thinking about organisation structures and dynamics as an intervention point for lock-in risk for about 5 hours. My thoughts here represent the opinion of Formation Research at the time of writing. TL;DR We believe lock-in risks are a pressing problem, and that...

Apr 1, 2025

Recommender Alignment for Lock-In Risk

Mar 24, 2025

Alfie Lamerton

Narrow Secret Loyalty Dodges Black-Box Audits

A Research Agenda for Secret Loyalties

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

I Started an AI Safety Research Org and Think These 7 Things Matter
