DawnLu — LessWrong

Investigating Bias Representations in LLMs via Activation Steering

Produced as part of the SPAR program (fall 2023) under the mentorship of Nina Rimsky.

Introduction

Given recent advances in AI, it is highly likely that LLMs will increasingly be used to make decisions with broad societal impact, such as resume screening, college admissions, and criminal justice. Therefore...

Jan 15, 2024 • 29