Alex Diep — LessWrong



[Paper] Self-Transparency Failures in Expert-Persona LLMs

I wrote the paper "Self-Transparency Failures in Expert-Persona LLMs: How Instruction-Following Overrides Disclosure" and am sharing a condensed version of it here. Users need models to be transparent about being AI systems so that they can calibrate their expectations and avoid overtrusting the information models provide. We test...

Dec 18, 2025