From Confession to Inhibition: A Temporal Curriculum for Self-Monitoring in Language Models
Abstract
Recent work suggests that language models possess latent self-monitoring capacities that are substantially under-elicited by current training methods. Macar et al. (2026) show that post-trained LLMs can detect injected steering vectors through a distributed circuit that emerges during post-training. They further find that preference optimization methods such as DPO...