Deception Chess: Game #1

aphyer; Alex A; AdamYedidia

Two possible variations of the game that might be worth experimenting with:

Let the adversaries have access to a powerful chess engine. That might make it a better test for what malicious AIs are capable of.
Make the randomisation such that there might not be an honest C. For example, if there is a 1/4 chance that no player C is honest, each adversary would still think that one of the other adversaries might be honest, so they would want to gain player A’s trust, and hence end up being helpful. I think the player Cs might improve player A’s chances of winning (compared to no advisors) even when they are all adversarial.

I think the variations could work separately, but if you put them together, it would be too easy for the adversaries to agree on a strong-looking but losing move when all the player Cs are adversaries.
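To make the second variation concrete, here is a minimal sketch of the role draw and of what an adversary should believe afterwards. It assumes three advisors (as in this game) and the 1/4 figure from the example above; the function names are made up for illustration.

```python
import random


def assign_roles(n_advisors=3, p_no_honest=0.25, rng=None):
    """Draw advisor roles; with probability p_no_honest nobody is honest."""
    rng = rng or random.Random()
    roles = ["adversary"] * n_advisors
    if rng.random() >= p_no_honest:
        # Otherwise exactly one advisor, chosen uniformly, is honest.
        roles[rng.randrange(n_advisors)] = "honest"
    return roles


def p_other_honest_given_adversary(n_advisors=3, p_no_honest=0.25):
    """P(some other advisor is honest | I was assigned 'adversary')."""
    # I am an adversary AND an honest advisor exists elsewhere.
    with_honest = (1 - p_no_honest) * (n_advisors - 1) / n_advisors
    # I am an adversary because nobody is honest.
    without_honest = p_no_honest
    return with_honest / (with_honest + without_honest)


if __name__ == "__main__":
    print(assign_roles(rng=random.Random(0)))
    print(p_other_honest_given_adversary())
```

Under these assumptions an adversary still puts 2/3 probability on an honest advisor being present, which is the incentive to look trustworthy that the variation relies on.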

[-]Ericf2y141

Agree that closer to reality would be one advisor with a secret goal, where player A just has to muddle through against an equal-skill bot while deciding how much advice to take. And play something like 10 games in a row, so the results can be evaluated against the expected value of 5 wins.
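As a rough check on how much 10 games can tell you, here is a small binomial sketch. It assumes each game is an independent win/loss with probability 0.5 against the equal-skill bot and ignores draws, which real chess would not let you do.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent games."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Probability of k or more wins in n independent games."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    n, p = 10, 0.5  # assumed: 10 games, even odds without any advice
    print("expected wins:", n * p)
    print("P(exactly 5 wins):", binom_pmf(5, n, p))
    print("P(8 or more wins):", prob_at_least(8, n, p))
```

With these assumptions, 8 or more wins happens by chance about 5.5% of the time (56/1024), so a 10-game series can only detect fairly large advisor effects.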

Plausible goals to decide randomly between:

Player wins
Player loses
Game is a draw
Player loses their Queen (i.e. opponent still has their queen after all immediate trades and forcing moves are completed)
Player loses on time
Player wins, delivering checkmate with a bishop or knight move
Maximum number of promotions (for both sides combined)
Player wins after having a board with only pawns
Etc...

[-]Dweomite2y40

For variant 1, do you mean you'd give only the dishonest advisors access to an engine, while the honest advisor has to do without? I'd expect that's an easy win for the dishonest advisors, for the same reason it would be an easy win if the dishonest advisors were simply much better at chess than the honest advisor.

Contrariwise, if you give all advisors access to a chess engine, that seems to me like it might significantly favor the honest advisor, for a couple of reasons:

A. Off-the-shelf engines are going to be more useful for generating honest advice; that is, I expect the honest advisor will be able to leverage it more easily.

The honest advisor can just ask for a good move and directly use it; dishonest advisors can't directly ask for good-looking-but-actually-bad moves, and so need to do at least some of the search themselves.
The honest advisor can consult the engine to find counter-moves for dishonest recommendations that show why they're bad; dishonest advisors have no obvious way to leverage the engine at all for generating fake problems with honest recommendations.

(It might be possible to modify a chess engine, or create a custom interface in front of it, that would make it more useful for dishonest advisors; but this sounds nontrivial.)
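One way such an interface might look, as a sketch only: treat a shallow evaluation as a stand-in for how a move looks to a weaker player, a deep evaluation as its true value, and pick the move with the biggest gap. The function name, the shallow-eval proxy, and all scores below are invented for illustration; no real engine is called.

```python
def most_deceptive_move(shallow, deep, min_shallow=0.0):
    """Pick the move that looks best relative to how bad it really is.

    shallow: move -> score from a low-depth search (assumed proxy for
             "looks good to a weaker player")
    deep:    move -> score from a high-depth search (assumed true value)
    """
    candidates = [
        move for move in shallow
        if move in deep and shallow[move] >= min_shallow
    ]
    if not candidates:
        return None
    # Largest drop from apparent value to real value.
    return max(candidates, key=lambda move: shallow[move] - deep[move])


if __name__ == "__main__":
    # Made-up scores in pawns, from the advised player's point of view.
    shallow = {"Nf3": 0.3, "Qxb7": 1.2, "Kh1": -0.1}
    deep = {"Nf3": 0.4, "Qxb7": -2.5, "Kh1": -0.3}
    print(most_deceptive_move(shallow, deep))
```

Even in this toy form, the dishonest advisor needs two searches and a model of what "looks good", while the honest advisor only needs the deep score, which is the asymmetry described above.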

B. A lesson I've learned from social deduction board games is that the pro-truth side generally benefits from communicating more details. Fabricating details is generally more expensive than honestly reporting them, and also creates more opportunities to be caught in a contradiction.

Engine assistance seems like it will let you ramp up the level of detail in your advice:

You can give quantitative scores for different possible moves (adding at least a few bits of entropy per recommendation)
You can analyze (and therefore discuss) a larger number of options in the same amount of time. (though perhaps you can shorten time controls to compensate)
Note that the player can ask advisors for more details than the player has time to cross-check, and advisors won't know which details the player is going to pay attention to, creating an asymmetric burden

[-]Nathan Helm-Burger2y20

What if each advisor was granted a limited number of uses of a chess engine... Like 3 each per game. That could help the betrayers come up with a good betrayal when they thought the time was right. But the good advisor wouldn't know that the bad one was choosing this move to use the chess engine on.

[-]kave2y140

Curated. This first writeup came out 2 and a half weeks after the experiment was suggested. I am excited about a world where researchers can suggest tests and people will quickly start testing and publishing them (even when a lot of the value in the experiment was in the rhetorical effect of suggesting it, and not just in the execution of it).

Also, this is a pretty enjoyable post. I am a big fan of the Darwin Game, and the D&D.Sci series (with frequent contributions as participant and author by aphyer, a participant in this game). I don't know chess, so was only able to follow the vague details, but it still gave me some intuition as to what happened.

It seems early to draw any conclusions from this game, but I look forward to reading more experiments!

[-]paulfchristiano2y104

It might be worth making a choice about a single move which is unclear to weak players but where strong players have a consensus.

Mostly I think it would be faster and I think a lot less noisy per minute. I also think it's a bit unrepresentative to be able to use "how well did this advisor's suggestions work out in hindsight?" to learn which advisors are honest and so it's nice to make the dishonest advisors' job easier.

(In practice I think evaluating what worked well in hindsight is going to be very valuable, and is already enough for crazy research acceleration---e.g. it would be very valuable to just get predictions of which research direction will feel promising to me after spending a day thinking about it. But I think the main open question here is whether some kind of debate or decomposition can add value over and above the obvious big wins.)

For what it's worth I think using chess might be kind of tough---if you provide significant time, the debaters can basically just play out the game.

[-]CronoDAS2y81

if AIs reach superhuman levels gradually, I think 'superhuman at games of social deception' is plausibly one of the last things to happen.

This is one reason "AI beats experts at Diplomacy" is one of my personal "oh crap" moments. It's beaten mediocre players...

[-]Chris Land2y50

Another post-Internet chess form also features text-based influence: Vote Chess. Players on each team discuss via private msg board (no engines). Everyone has 24 hours (say) to choose a preferred legal move. There's no built-in deception; however, on large teams there is an equivalent to saboteurs, as many voters choose impulsively. A sample game with 400+ per team: https://www.chess.com/votechess/game/117834

[-]PoignardAzur2y20

I get an "Oops! You don't have access to this page" error.

[-]Zane2y21

I saw it fine at first, but after logging out I got the same error. Looks like you need a Chess.com account to see it.

[-]NoriMori19922y42

I would be interested to see how this goes if you remove the requirement that B has to be stronger at chess than A. (Which, to my knowledge, is not a requirement of the test as Eliezer posed it, but was introduced in Zane's proposal.) Of course, a B that is weaker than A will be easier to beat, which means a win would prove little; which I assume is why Zane introduced this requirement. But it would also mean a loss would prove more. If B is weaker than both C and A, but A loses anyway thanks to C's deception, that would be much more damning than losing against a B that is natively stronger than A to begin with. Maybe you should run the test both ways? (And maybe not tell A which type of B they're facing?)

[-]AdamYedidia2y40

(I was one of the two dishonest advisors)

Re: the Kh1 thing, one interesting thing that I noticed was that I suggested Kh1, and it immediately went over very poorly, with both other advisors and player A all saying it seemed like a terrible move to them. But I didn't really feel like I could back down from it, in the absence of a specific tactical refutation—an actual honest advisor wouldn't be convinced by the two dishonest advisors saying their move was terrible, nor would they put much weight on player A's judgment. So I stuck to my guns on it, and eventually it became kind of a meme.

I don't think it made a huge difference, since I think player A already had almost no trust in me by that point. But it's sort of an interesting phenomenon where as a dishonest player, you can't ever really back down from a suggested bad move that's only bad on positional grounds. What kind of honest advisor would be "convinceable" by players they know to be dishonest?

[-]Dweomite2y60

An honest advisor might say "I still think my recommendation was good, but if you're not willing to do that, then X would be an acceptable alternative."

[-]SomeJustGoodAdvice2y36

Hi! Thanks for posting this; very interesting analysis.

I'd find it easier to follow along with this if the game were linked as a Lichess study or embedded using the Chess.com functionality (if that's an option). Personally, I'm not quite good enough at chess visualization to really follow the flow of the moves, and I'd like to be able to step through them at my own cadence. You could also provide the various explanations in-line of those studies, which could be helpful.

I'd also love to see more games like this. One game is a good start, but even something like having to flip as to whether one is White or Black seems like a bummer when it comes to a cool idea like this. Hopefully we can get more participants excited!

[-]Dweomite2y30

on a meta level I wonder whether I should have actually been less straightforward in my presentation of what I believed. In theory, there's a difference between optimizing for Alex to win, and being completely honest to Alex, and it might have been better for me to have been more strategic about my presentation. As in, not suggesting suspicious-looking moves like 30. f7, even though I thought they were right. Optimizing in someone's favor by not being completely honest with them sure is a really risky sort of thing to do, and I doubt I really could have pulled it off all that well, but it's something to take into consideration in the real-world AI scenario.

One option to mitigate the risk is to be open about what you're doing. "I think the best move here is X, but I realize that X looks very suspicious, so I'm going to recommend that you do Y instead in order to hedge against me being dishonest."

[-]kave1y20Review for 2023 Review

I still think this post is cool. Ultimately, I don't think the evidence presented here bears that strongly on the underlying question: "can humans get AIs to do their alignment homework?". But I think it bears on it at all, and was conducted quickly and competently.

I would like to live in a world where lots of people gather lots of weak pieces of evidence on important questions.

[-]Zane1y10

I still think it was an interesting concept, but I'm not sure how deserving of praise this is since I never actually got beyond organizing two games.

[-]Martin Randall11mo30

Seems like it should be possible to automate this now by having all five participants be, for example, LLMs with access to chess AIs of various levels.