I'm an author on the paper. This is an interesting topic that I think we approached in roughly the right way. For context, some of my teammates and I did earlier research on AI for poker, so that concern for exploitability certainly carried over to our work on Diplomacy.
The setting the human plays in the video (one human vs. 6 known Cicero agents) is not the setting we intended the agent to play in, and it is not the setting in which we evaluate the agent. That's simply a demonstration to give a sense of how the bot plays. If you want to evaluate the bot's exploitability and game-theoretic properties, it should be done in the setting we intended for evaluation.
The setting we intended the bot to play in is games where all players are anonymous, and there is a large pool of possible players. That means players don't necessarily know which player is a bot, or whether there is a bot in that specific game at all. In that case, it's reasonable for the human players to assume all other players might engage in retaliatory behavior, so the agent gets the benefit of a tit-for-tat reputation without having to actually demonstrate it.
The assumption that players are anonymous is explicitly accounted for in the algorithm. It's the reason we assume there is a common-knowledge distribution over our lambda parameters for piKL, while in fact we play according to a single low lambda. If you were to change that assumption, perhaps by having all players know at the start of the game that a specific player is a bot, then you should change the common-knowledge distribution over lambda parameters to reflect the lambda the bot actually intends to play. In that case the agent will behave differently. Specifically, it will play a much more mixed, less exploitable policy.
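For anyone unfamiliar with what lambda controls in piKL, here is a rough toy sketch of the single-agent closed form for a KL-regularized policy: pi(a) is proportional to anchor(a) * exp(Q(a) / lambda). This is only an illustration, not Cicero's code. The function name, the anchor policy, and the Q-values are all made up, and it does not model the common-knowledge distribution over lambdas or the equilibrium computation across players.

```python
import math

def pikl_policy(anchor, q_values, lam):
    """Toy KL-regularized policy: pi(a) proportional to anchor(a) * exp(Q(a) / lam).

    Hypothetical helper for illustration only. Assumes every anchor
    probability is strictly positive and lam > 0.
    """
    # Work in log space and subtract the max for numerical stability.
    logits = [math.log(p) + q / lam for p, q in zip(anchor, q_values)]
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up numbers for three candidate actions.
anchor = [0.7, 0.2, 0.1]    # assumed human-imitation anchor policy
q_values = [0.0, 0.5, 1.0]  # assumed estimated action values

low = pikl_policy(anchor, q_values, lam=0.01)    # low lambda: follows the values
high = pikl_policy(anchor, q_values, lam=100.0)  # high lambda: stays near the anchor

print("low lambda :", [round(p, 3) for p in low])
print("high lambda:", [round(p, 3) for p in high])
```

With a low lambda nearly all the probability goes to the highest-value action, and with a high lambda the policy stays close to the anchor. What the other players believe about lambda is a separate question that this sketch leaves out.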