edward-lcl — LessWrong

Factor(U,T): Controlling Untrusted AI by Monitoring their Plans

Authors: Edward Lue Chee Lip, Anthony Channg, Diana Kim, Aaron Sandoval, Kevin Zhu (Algoverse AI Research)

Paper: arXiv:2512.14745 | Code: GitHub | Accepted at AAAI 2026 TrustAgent Workshop

AI Control protocols are designed to be safe even when the AI being deployed might be scheming. The standard setup: a...

The Alignment Problem Is Upstream of the Model

Epistemic status: Confident on the documented claims; the Mythos system card is a primary source I've read directly. The interpretive claims about what the evidence means are deliberately speculative. I'm not asserting these systems are conscious. I'm arguing the behavioral signatures are significant enough to change how we frame the...