What if the dangerous moment isn't when AI gets smarter, but when it starts trusting itself?
TL;DR: We’re a small independent research group working on FIT-style dynamics and governance mechanisms. Much current safety work focuses on what the model says (content) or on whether we can turn it off (shutdown / interruptibility). We think another failure mode is about tempo: the system starts committing irreversible...
Jan 12