What if the dangerous moment isn't when AI gets smarter, but when it starts trusting itself?
TL;DR: We’re a small independent research group working on FIT-style dynamics and governance mechanisms. Much current safety work focuses on what the model says (content) or on whether we can turn it off (shutdown / interruptibility). We think another failure mode is about tempo: the system starts committing irreversible...
Jan 12