htlou

Undergraduate student at Peking University, currently interested in alignment and issues related to LLMs. See htlou.github.io for more info.

Automating LLM Auditing with Developmental Interpretability

Produced as part of the ML Alignment & Theory Scholars Program - Summer 2024 Cohort, supervised by Evan Hubinger.

TL;DR
* We proved that the SAE features related to the finetuning target will change more than other features in the semantic space.
* We developed an automated model audit...

Sep 4, 2024