Run-time Steering Can Surpass Post-Training: Reasoning Task Performance
This project is the outcome of the in-person week at the Finnish Alignment Engineering Bootcamp 2025. TL;DR: Reasoning can be a linear direction in language model activations if framed correctly, for example when placed in the memorisation-reasoning duality (Hong et al., 2025). This post presents initial results of steering language models at...
Aug 10, 2025