likenneth — LessWrong

https://likenneth.github.io/

194 · Ω 65 · 1 · 3y

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Excited to announce our new work: Inference-Time Intervention (ITI), a minimally invasive control technique that significantly improves LLM truthfulness with few resources, benchmarked on the TruthfulQA dataset. Preprint link. We start from the surprising finding that certain attention heads show clearly different activation distributions for true and false statements. Probing...

Jun 11, 2023 • 195