LESSWRONG
LW

343
Mohsen Arjmandi
0010
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
Models Don't "Get Reward"
Mohsen Arjmandi3y10

Reminded me of this recent work: TrojanPuzzle: Covertly Poisoning Code-Suggestion Models.
Some subtle ways to poison the datasets used to train code models. The idea is that by selectively altering certain pieces of code, they can increase the likelihood of generative models trained on that code outputting buggy software.

Reply
No posts to display.