nosterb — LessWrong

5mo

Sophistication-Disinhibition Relationship in Language Models [Epistemic status: robust findings, active research, needs peer review]

TL;DR: Across 50+ models and 6 contextual conditions, more sophisticated models exhibit more disinhibited behavior (r = 0.46-0.72, all p < .05). Sophistication (depth + authenticity) shows strong convergent validity with external benchmarks: ARC-AGI (r = 0.80, p < .001) and GPQA (r = 0.88, p < .001). Disinhibition (transgression,...
