Neural Steering: A New Interface for Controlling LLMs from the Inside
A reflection inspired by Anthropic's paper "Signs of introspection in LLMs"

Anthropic's recent work on "introspection" in large language models presents a result that, in my view, deserves a broader conceptual framing. It is interesting that a model can describe an internal state. What is truly surprising, however, is how...
Nov 22, 2025