Investigating Bias Representations in LLMs via Activation Steering
Produced as part of the SPAR program (fall 2023) under the mentorship of Nina Rimsky.

Introduction

Given recent advances in the AI field, it’s highly likely LLMs will be increasingly used to make decisions that have broad societal impact, such as resume screening, college admissions, and criminal justice. Therefore...