Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Abstract
Whereas previous work has focused primarily on demonstrating a putative lie detector’s sensitivity/generalizability[1][2], it is equally important to evaluate its specificity. With this in mind, I evaluated a lie detector trained with a state-of-the-art white-box technique (probing an LLM’s activations during production of facts/lies) and found...
Jun 4, 2024