Josh Levy — LessWrong

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.

Abstract: Whereas previous work has focused primarily on demonstrating a putative lie detector’s sensitivity/generalizability[1][2], it is equally important to evaluate its specificity. With this in mind, I evaluated a lie detector trained with a state-of-the-art, white-box technique (probing an LLM’s activations during production of facts/lies) and found...

Jun 4, 2024•43

Open Source LLMs Can Now Actively Lie

Introduction: Whereas hallucinating is unintentional, lying is intentional. To actively lie, you need to be able to: 1) recite facts, 2) know what makes facts factual, and 3) modify them accordingly. Until recently, open source LLMs have not been capable of lying reliably. Models like OPT and...

Jun 1, 2023•6