LESSWRONG
Don't fight your LLM, redirect it!

by Yair Halberstadt
14th Jul 2025

2 comments
Trevor Hill-Hand:

I notice I apply this lesson to the design of data entry forms and surveys in general as well: you need an 'other' option much more often than you would think, or you end up with messy survey data, with extra comments and thoughts crammed into the wrong questions wherever users can find an opening. EDIT: Upon further reflection, I also remember conversations during the rollout of surveys, in multiple unrelated projects over the years, that landed on the same solution: "let's ask them to self-classify in another question."

Moksh Nirvaan:

For classification problems like this, my current hypothesis is that enumerating the target classes nudges the model to instantiate a dedicated "choice-head" circuit that already lives inside the transformer. In other words, instead of sampling from the entire vocabulary, the model implicitly masks to the K tokens you listed and reallocates probability mass among them. A concrete prediction I have (so I can be wrong in public) is that if you ablate the top 5 neurons most correlated with the enum-induced class direction, accuracy will drop ≥ 15 pp on that task, but ablating an equivalent set chosen at random will drop ≤ 3 pp. I plan to test this on a 70B model over the next month; happy to update if the effect size evaporates.
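The renormalization this hypothesis describes can be sketched with toy numbers (purely illustrative logits over a five-"token" vocabulary; in a real model the masking would be over the actual vocabulary):

```python
import math

def softmax(logits):
    """Softmax over a dict of token -> logit."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Toy logits: only three tokens are the listed classes; the rest
# stand in for the remainder of the vocabulary.
logits = {"TEST": 2.0, "API_DEFINITION": 2.5, "OTHER": 1.0,
          "the": 3.0, "GET": 2.8}

full = softmax(logits)
masked = softmax({t: v for t, v in logits.items()
                  if t in {"TEST", "API_DEFINITION", "OTHER"}})

# All probability mass is reallocated among the listed classes,
# so each class token gets more mass than under the full softmax.
assert abs(sum(masked.values()) - 1.0) < 1e-9
assert masked["API_DEFINITION"] > full["API_DEFINITION"]
```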


TLDR: when your LLM is hallucinating, don't try to stop it hallucinating. Instead make it easy for it to tell you that it's hallucinating.

I'm using an LLM to detect all API definitions in a codebase. Instead of taking an agentic approach, I'm literally just sending it every single file individually and asking it to list all API definitions in the file.
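That loop is deliberately simple; a sketch (with `ask_llm` standing in for the real per-file model call, which the post doesn't show):

```python
from pathlib import Path

def scan_repo(root, ask_llm):
    """Send every file in the repo to the model individually.

    `ask_llm(name, text)` is a placeholder for the real LLM call; it
    should return whatever API definitions the model reports for that file.
    """
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            results[path.name] = ask_llm(path.name, path.read_text(errors="ignore"))
    return results
```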

Unfortunately this just begs for hallucinations. The LLM wants to be helpful. You've sent it a file, and asked it for API definitions, so it wants to find you API definitions. Now this file doesn't define any APIs, but it does test some APIs. Maybe that's good enough? And this other file calls some APIs, I'm sure my user wants that...

I tried umpteen permutations of the prompt. Even something as explicit as:

If, and only if, the file is a Java file, and uses annotations defined in org.springframework.web.bind.annotation to define API endpoints, list the endpoints defined in the file using these annotations.
If the file is not a Java file, return nothing.
If the file is a test file, return nothing.
If the file does not use the annotations, return nothing.

Would return results like "findOwners.html defines GET /owners".[1]

Then I switched tracks. I kept the simplest possible prompt but added an enum output field called categorisation, and told it to set it to one of "TEST", "API_USAGE", "API_DEFINITION", "DOCUMENTATION", or "OTHER". I then dropped any results which weren't API_DEFINITION. This immediately returned exactly those files I was interested in!
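A minimal sketch of that shape of output, with the model call stubbed out (the category names are from the post; the result type, field names beyond `categorisation`, and example data are illustrative):

```python
from dataclasses import dataclass, field
from enum import Enum

class Categorisation(Enum):
    TEST = "TEST"
    API_USAGE = "API_USAGE"
    API_DEFINITION = "API_DEFINITION"
    DOCUMENTATION = "DOCUMENTATION"
    OTHER = "OTHER"

@dataclass
class FileResult:
    path: str
    categorisation: Categorisation
    endpoints: list = field(default_factory=list)

def keep_definitions(results):
    # Drop everything the model routed into one of the "escape hatch"
    # categories; only genuine API definitions survive.
    return [r for r in results if r.categorisation is Categorisation.API_DEFINITION]

# Stubbed model outputs: the enum gives the model a legal place to say
# "this file merely tests or documents APIs" instead of inventing endpoints.
results = [
    FileResult("OwnerController.java", Categorisation.API_DEFINITION, ["GET /owners"]),
    FileResult("OwnerControllerTests.java", Categorisation.TEST),
    FileResult("findOwners.html", Categorisation.DOCUMENTATION),
]

print([r.path for r in keep_definitions(results)])  # ['OwnerController.java']
```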

  1. ^

    Using Gemini 2.5-pro, so it's not just a small model problem.