Aakash Rana — LessWrong

Do LLMs Condition Safety Behaviour on Dialect? Preliminary Evidence

TL;DR

* I investigate whether LLMs can condition their behaviour based on the linguistic pattern (Standard American English vs African American Vernacular English) identified in the user’s request.
* I further investigate whether the phenomenon of Emergent Misalignment is robust across dialects or if the model is treating the dialect...

Dec 28, 2025•7