x

LESSWRONG

LW

Andi Bhongade — LessWrong

Andi Bhongade

Andi Bhongade

Message

140

1

6mo

Andi Bhongade

140

6mo

Phantom Transfer and the Basic Science of Data Poisoning

by draganover, Tolga H. Dur, Andi Bhongade, and Mary Phuong

tl;dr: We have a pre-print out on a data poisoning attack which beats unrealistically strong dataset-level defences. Furthermore, this attack can be used to set up backdoors and works across model families. This post explores hypotheses around how the attack works and tries to formalise some open questions around the...

Subliminal Learning Across Models

by draganover, Andi Bhongade, Tolga H. Dur, Mary Phuong, and LASR Labs

Tl;dr: We show that subliminal learning can transfer sentiment across models (with some caveats). For example, we transfer positive sentiment for Catholicism, the UK, New York City, Stalin or Ronald Reagan across model families using normal-looking text. This post discusses under what conditions this subliminal transfer happens. — The original...

Nov 26, 2025•59