Tommy Xie

Run-time Steering Can Surpass Post-Training: Reasoning Task Performance

This project is the outcome of the in-person week at the Finnish Alignment Engineering Bootcamp 2025. TL;DR: Reasoning can be a linear direction in language model activations if framed correctly, for example when placed in the memorisation-reasoning duality (Hong et al., 2025). This post presents initial results of steering language models at...

Aug 10, 2025

The Crux