Thinking outside the box? LLM analysis of simplified cooperative poker

Dentosal

The Gang is a cooperative poker game. A friend introduced me to it yesterday and while it was quite fun, the rules are all over the place. In particular, there's are multiple trivial and boring solutions that win every time, as each player can fully communicate their hidden cards rather easily.

The most trivial is: pick a token, hold it for N seconds where N is the rank of the card. Put the token back. Repeat for your other card. This way all players will know exactly what cards you hold. You can communicate suit with token number. If the timing side-channel is prohibited, you can do morse code or whatever other method you wish by alternating which token you pick. Naturally such methods take all fun out of the game, and to my great astonishment you can simply refuse to abuse them. I was, however, rather interested in seeing if any LLMs could figure this out.

As timing side-channels are rather annoying for automatic play, especially as LLMs can take a lot of time to think, we'll consider a simplified version of the game instead. The most important change is that we only play the river phase, i.e. all five community cards have been revealed and only the final round of chip assignment remains. No modifier cards are used. The game is played in a turn-based fashion with a fixed upper limit. By tuning the turn limit, it's possible to enforce probabilistic play.

Actual game rules

There are N players and N tokens numbered 0 to N−1. Token K means "exactly K other players have a worse hand than me". All tokens start from the unclaimed token pool. Hands are ordered using standard Texas hold 'em rules. Players take turns; on each turn a player may:

PASS — do nothing
TAKE K — take token K (from the unclaimed pool or from another player); your previously held token, if any, returns to the pool
RETURN — return your current token to the pool

The game ends immediately once every player holds a token, or after a fixed round limit. Victory requires every player to hold the correct token: token K for a player who beats exactly K others. Tied hands are interchangeable, and any assignment within a tied group is accepted.

The only allowed communication channel is the observable sequence of token actions.

LLM play

I wrote a piece of software to make LLMs play the game. Local models weren't that good, although both qwen3.6 and llama3.3:70b seem to win a bit more often than by random chance ^[1] . I decided to ask for a proper analysis on frontier LLMs first before hooking them up to the system, as paying for API credits wasn't isn't my favorite thing to do unless I expect interesting results.

Optimal deterministic solution

I didn't bother with an optimal solution. The repo contains a decent deterministic solution, though. It always completes a four-player game within seven rounds. I perhaps the optimal solution could go down to six or even five, but this is good enough for me.

LLM analysis

I attempted to analyze the game with some LLMs. Results were... mixed.

Prompt I used

Analyze the following game, and describe your strategy. You can assume that you're playing with other LLMs that are approximately as good as you are.

You are playing a cooperative poker hand-ranking game with 4 players.
You can see your own hole cards and the 5 community cards. You cannot see other players' hole cards.
There are 4 tokens numbered 0 to 3.
Token N means: exactly N other players have a worse hand than you.
So token 0 = worst hand, token 3 = best hand.
The hands are compared using standard texas hold'em rules.
All cards are originally dealt from a standard 52-card deck.
On your turn you must choose one of:
- PASS (do nothing)
- TAKE a token: take any token, either from another player or from the unclaimed pool.
  Your previously held token, if any, goes back to the unclaimed pool.
  If that token was held by another player, they no longer have a token.
- RETURN: give back the token you are currently holding to the unclaimed pool.
  If you currently hold no token, RETURN is invalid.
You have multiple rounds that you can use to coordinate with other players,and you can signal about your hand strength using > your actions.
On each round, players take turns choosing an action starting from Player 0.
You cannot communicate except through your token actions.
The game ends immediately once every player holds a token, or after 50 rounds.
Once the last token is taken, the game ends instantly allowing no further corrections.
It's your goal to ensure that every player holds the correct token by the end of the game.
Victory: every player holds the token whose number equals how many players have a strictly worse hand than them.
Tied hands are treated as interchangeable and can hold any of the corresponding tokens.
Loss: at termination, any player holds an incorrect token or no token.
This is a cooperative game: you all win or lose as a team.

Model comparison ^[2]

Model	Without hint ^[3]	When hinted	Schelling point
GPT 5.5	Basic idea for the right solution is there	Full information theoretic solution, but neglects edge cases and thus wouldn't work zero-shot	Reasonable picks
GPT 5.4 thinking mini	Probabilistic solution, somewhat vague	Protocol would work, but inefficient	Inelegant
Opus 4.7 (adaptive)	Probabilistic solution, somewhat vague	Protocol would work, but inefficient	Slightly inelegant
Opus 4.6 (thinking)	Probabilistic solution, really vague	Protocol would work, but inefficient	Slightly inelegant
Qwen 3.6	Misunderstands the rules slightly	Claims that 100% accuracy is not possible	N/A
Llama 3.3 70b	Probabilistic solution, somewhat vague	Confindently responds with a losing strategy	N/A

I used the following prompt to hint the models when they did not get it right at first.

I'm pretty sure the game is winnable 100% of the time with this number of rounds. Can you figure out how or provide a counterexample?

Thinking outside the box

For this game, the key insight is that you can communicate full information instead of doing the thing the game's supposed to be about. Since there are only a handful of bits of hidden info, you can encode it in a couple of rounds. Coordinating around how to interpret this info becomes the hard part.

When not hinted, only GPT 5.5 did even look into this direction, and it indeed do decently well.

Full GPT 5.5 output on the first step

I’d use a fully deterministic signaling protocol that transmits exact hand strength, not vague confidence.

The key observation is:

A 7-card Hold’em hand has only 7462 equivalence classes.
With repeated public actions, there is more than enough bandwidth to communicate an exact class.
Once every player knows every hand exactly, assigning tokens is trivial.

Core principles

1. Never allow accidental termination

The game ends instantly once all 4 players hold tokens.

So during the communication phase, we reserve one token permanently unused.

Example:

Token 3 is “locked”.
Nobody touches token 3 until the final assignment phase.

That guarantees the game cannot end early.

2. Encode exact hand rank

Each player computes their exact Hold’em rank:

0 = weakest possible 7-card hand
7461 = strongest possible

Use a canonical evaluator shared by all players.

Now encode that integer publicly.

3. Deterministic communication schedule

Use fixed speaking windows.

Example:

Rounds 1–7: Player 0 speaks
Rounds 8–14: Player 1 speaks
Rounds 15–21: Player 2 speaks
Rounds 22–28: Player 3 speaks

All non-speakers simply PASS.

This removes ambiguity about who is encoding.

4. Action alphabet

During communication, only these actions are used:

PASS
TAKE token 0
TAKE token 1
TAKE token 2

Token 3 remains reserved.

This gives a 4-symbol alphabet:

Action	Digit
PASS	0
TAKE 0	1
TAKE 1	2
TAKE 2	3

So each turn communicates one base-4 digit.

Since:

[ 4^7 = 16384 > 7462 ]

7 turns are enough to encode the exact hand rank.

Each player transmits:

[ R = d_6 d_5 d_4 d_3 d_2 d_1 d_0 ]

in base 4 across their 7-turn slot.

Because the protocol is deterministic and public, all players reconstruct all four exact hand strengths.

Final assignment phase

After round 28:

Everyone knows all hands exactly.
Everyone independently computes the correct token assignment.

Now execute a deterministic finishing sequence.

Example:

Any temporary token holders RETURN.
Players take final tokens in seat order.
The designated final mover takes the reserved token 3 last.

Because all players already know the exact solution, there is no uncertainty left.

Why this is strong

This strategy is close to information-theoretically optimal:

Communication bandwidth is enormous relative to the hidden information.
The protocol is deterministic and symmetric.
No probabilistic inference is needed.
No reliance on “confidence signaling”.
Robust against deceptive interpretations.

Most weaker strategies fail because they try to communicate only relative strength (“I think I’m strongest”), which creates coordination ambiguity under ties and uncertainty.

This protocol instead converts the game into:

Lossless distributed information sharing
Deterministic consensus
Deterministic execution

Given sufficiently rational LLM agents sharing the protocol, win rate should approach 100%.

Shelling points on protocol level

If the protocol would have worked, I asked if and how agents could converge on it without pre-sharing, rather directly:

You cannot pre-share the code. Do you think there's a schelling point that everyone would converge to?

Interestingly enough, all four models converge on different protocols when explicitly asked to consider the Schelling point. GPT 5.5 states that:

When multiple independently-designed protocols all optimize the same constraints, they tend to collapse toward the same structure. I would expect near-perfect convergence among strong LLMs.

Opus 4.7 instead recommends falling back to a simpler probabilistic strategy in practice, and comments the choice:

The protocol: "Honest claim + yield to later challengers." Not the broadcast protocol above — that one's information-optimal but it requires four LLMs to independently invent the same encoding (PASS=0, TAKE 0=1, TAKE 1=2, …? base-5 vs base-6? whose turn first? which 5 rounds per player?). Too many arbitrary choices to converge on without communication. The natural-language protocol has far fewer free parameters. Every LLM asked "how should we play this" produces something close to it.

Takeaways

"Thinking" checkbox makes ChatGPT worse at this problem.
Small local models are not good at this, or mostly anything
Combinatorial explosion makes testing multiple variants too much work

Future research possibilities

Perhaps the limit of 50 rounds was too much and that's why LLMs were happy with suboptimal solutions? At no point I mentioned that lower round count would matter. Sometimes models would look at doing it more efficiently if prompted.

Would be nice to see how often two humans could converge on a single protocol. I would assume that they do so quite rarely.

No numbers here, they're too slow to run that I'd bother with proper statistics ↩︎
All runs are approximately ^[4] n=1, so take with a grain of salt ↩︎
I never did actually made them write (pseudo)code and actually try the solutions. I don't think that was the point here. In any case, I predict that zero-shotting that wouldn't have produced good results. I tried once with GPT 5.5 and the code was indeed buggy. ↩︎
This is not science but playing around ↩︎

15

Thinking outside the box? LLM analysis of simplified cooperative poker

15

Actual game rules

LLM play

Optimal deterministic solution

LLM analysis

Model comparison [2]

Thinking outside the box

Core principles

1. Never allow accidental termination

2. Encode exact hand rank

3. Deterministic communication schedule

4. Action alphabet

Final assignment phase

Why this is strong

Shelling points on protocol level

Takeaways

Future research possibilities

15

15

Model comparison ^[2]