Neural Steering: A New Interface for Controlling LLMs from the Inside
A reflection inspired by Anthropic's paper "Signs of introspection in LLMs"

Anthropic's recent work on "introspection" in large language models presents a result that, in my view, deserves a broader conceptual framing. It is interesting that a model can describe an internal state. What is truly surprising, however, is how...
Nov 22, 2025