This project is the outcome of the in-person week at the Finnish Alignment Engineering Bootcamp 2025.
TL;DR: Reasoning can be represented as a linear direction in language model activations when framed correctly, for example within the memorisation-reasoning duality (Hong et al., 2025). This post presents initial results of steering language models at inference time. This could democratise access to reasoning-enhanced AI by avoiding the need for RLHF training, which is expensive in both computation and time.
The Crux
Here's my central crux: this steering method actually works and enhances base models beyond their instruction-finetuned counterparts. By extracting reasoning directions from existing models and patching them into runtime activations, I achieved accuracy boosts over the instruction-tuned version of the...
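To make the mechanism concrete, here is a minimal sketch (not the post's actual code) of patching a precomputed direction into runtime activations with a PyTorch forward hook. The model name, injection layer, steering coefficient, and the random placeholder direction are all assumptions for illustration; in the actual method the direction would be extracted from model activations rather than sampled at random.

```python
# Minimal sketch of inference-time activation steering with a "reasoning direction".
# Assumptions (not from the post): GPT-2 small via Hugging Face transformers,
# injection at transformer block 6, steering coefficient alpha = 8.0.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the post's actual models are not specified here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

hidden_size = model.config.hidden_size
layer_idx = 6   # assumed injection layer
alpha = 8.0     # assumed steering strength

# Placeholder direction; in practice this would be extracted from activations,
# e.g. as a difference of means between reasoning and memorisation prompts.
reasoning_direction = torch.randn(hidden_size)
reasoning_direction = reasoning_direction / reasoning_direction.norm()

def steering_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # add the scaled direction to every token position in the residual stream.
    hidden = output[0]
    hidden = hidden + alpha * reasoning_direction.to(hidden.dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(steering_hook)

prompt = "Q: If a train travels 60 km in 45 minutes, what is its average speed?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook so later generations are unsteered
```

Under these assumptions, the same prompt can be run with and without the hook attached to compare base, steered, and instruction-tuned outputs on a reasoning benchmark.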
Question:
If maximising the ELBO amounts to (1) learning to reconstruct the data faithfully and (2) regularising the latent space so the model generalises to similar but new data, are the two terms in this formula doing (1) and (2) separately? If so, how?
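The formula itself is not shown above; presumably it is the standard VAE ELBO, whose two terms map onto (1) and (2) as follows:

```latex
% Assumed to be the formula referenced in the question:
% the expectation term rewards faithful reconstruction of x from the latent z,
% while the KL term regularises the approximate posterior q_phi(z|x) towards the prior p(z).
\mathcal{L}(\theta, \phi; x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{(1) reconstruction}}
  \;-\;
  \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)}_{\text{(2) regularisation}}
```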