LESSWRONG
Don't fight your LLM, redirect it!

by Yair Halberstadt
14th Jul 2025

2 comments
Trevor Hill-Hand:

I notice I apply this lesson to the design of data entry forms and surveys in general as well: you need an 'other' option much more often than you would think, or you end up with messy survey data, with extra comments and thoughts crammed into the wrong questions wherever users can find an opening. EDIT: Upon further reflection, I also remember conversations during the rollout of surveys, in multiple unrelated projects over the years, that landed on the same solution: "let's ask them to self-classify in another question."

Moksh Nirvaan:

For classification problems like this, my current hypothesis is that enumerating the target classes nudges the model to instantiate a dedicated "choice-head" circuit that already lives inside the transformer. In other words, instead of sampling from the entire vocabulary, the model implicitly masks to the K tokens you listed and reallocates probability mass among them. A concrete prediction I have (so I can be wrong in public) is that if you ablate the top 5 neurons most correlated with the enum-induced class direction, accuracy will drop ≥ 15 pp on that task, but ablating an equivalent set chosen at random will drop ≤ 3 pp. I plan to test this on a 70B model over the next month; happy to update if the effect size evaporates.
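The renormalization this hypothesis describes can be sketched with toy numbers (purely illustrative logits over a five-"token" vocabulary; in a real model the masking would be over the actual vocabulary):

```python
import math

def softmax(logits):
    """Softmax over a dict of token -> logit."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Toy logits: only three tokens are the listed classes; the rest
# stand in for the remainder of the vocabulary.
logits = {"TEST": 2.0, "API_DEFINITION": 2.5, "OTHER": 1.0,
          "the": 3.0, "GET": 2.8}

full = softmax(logits)
masked = softmax({t: v for t, v in logits.items()
                  if t in {"TEST", "API_DEFINITION", "OTHER"}})

# All probability mass is reallocated among the listed classes,
# so each class token gets more mass than under the full softmax.
assert abs(sum(masked.values()) - 1.0) < 1e-9
assert masked["API_DEFINITION"] > full["API_DEFINITION"]
```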


TLDR: when your LLM is hallucinating, don't try to stop it hallucinating. Instead make it easy for it to tell you that it's hallucinating.

I'm using an LLM to detect all API definitions in a codebase. Instead of taking an agentic approach, I'm literally just sending it every single file individually and asking it to list all API definitions in the file.
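That loop is deliberately simple; a sketch (with `ask_llm` standing in for the real per-file model call, which the post doesn't show):

```python
from pathlib import Path

def scan_repo(root, ask_llm):
    """Send every file in the repo to the model individually.

    `ask_llm(name, text)` is a placeholder for the real LLM call; it
    should return whatever API definitions the model reports for that file.
    """
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            results[path.name] = ask_llm(path.name, path.read_text(errors="ignore"))
    return results
```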

Unfortunately this just begs for hallucinations. The LLM wants to be helpful. You've sent it a file, and asked it for API definitions, so it wants to find you API definitions. Now this file doesn't define any APIs, but it does test some APIs. Maybe that's good enough? And this other file calls some APIs, I'm sure my user wants that...

I tried umpteen permutations of the prompt. Even something as explicit as:

If, and only if, the file is a Java file, and uses annotations defined in org.springframework.web.bind.annotation to define API endpoints, list the endpoints defined in the file using these annotations.
If the file is not a Java file, return nothing.
If the file is a test file, return nothing.
If the file does not use the annotations, return nothing.

Would return results like "findOwners.html defines GET /owners".[1]

Then I switched tracks. I kept the simplest possible prompt but added an enum output field called categorisation, and told it to set it to one of "TEST", "API_USAGE", "API_DEFINITION", "DOCUMENTATION", or "OTHER". I then dropped any results which weren't API_DEFINITION. This immediately returned exactly those files I was interested in!
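A minimal sketch of that shape of output, with the model call stubbed out (the category names are from the post; the result type, field names beyond `categorisation`, and example data are illustrative):

```python
from dataclasses import dataclass, field
from enum import Enum

class Categorisation(Enum):
    TEST = "TEST"
    API_USAGE = "API_USAGE"
    API_DEFINITION = "API_DEFINITION"
    DOCUMENTATION = "DOCUMENTATION"
    OTHER = "OTHER"

@dataclass
class FileResult:
    path: str
    categorisation: Categorisation
    endpoints: list = field(default_factory=list)

def keep_definitions(results):
    # Drop everything the model routed into one of the "escape hatch"
    # categories; only genuine API definitions survive.
    return [r for r in results if r.categorisation is Categorisation.API_DEFINITION]

# Stubbed model outputs: the enum gives the model a legal place to say
# "this file merely tests or documents APIs" instead of inventing endpoints.
results = [
    FileResult("OwnerController.java", Categorisation.API_DEFINITION, ["GET /owners"]),
    FileResult("OwnerControllerTests.java", Categorisation.TEST),
    FileResult("findOwners.html", Categorisation.DOCUMENTATION),
]

print([r.path for r in keep_definitions(results)])  # ['OwnerController.java']
```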

  1. ^

    Using Gemini 2.5-pro, so it's not just a small model problem.