This is such a great in-depth breakdown!!
I'm obsessed with this space; you can simulate so many interesting experiments with games.
We did a whole breakdown with Diplomacy, and models had dramatically different:
- Rates of betraying allies to win the game
- Sensitivity to the power they were playing
- Ability to handle such long contexts
Interestingly enough, and maybe unsurprisingly, the harness dramatically affects model behavior, but each model responds to it differently. We also found that some models are much more susceptible to jailbreaks while playing.
Wou...