Back in college, when one of my CS courses had an RPS tournament, the strategies to beat were:
Obviously this is not in the spirit of the game, but seems worth noting.
Yeah I thought about including randomness exploits for bots too (instead of just noting it for humans) but decided it'd be distracting.
Hopefully if I ever run that contest I'll be able to catch problems like randomness exploits and forced crashes!
Scott Aaronson made a very simple string finder that beats almost all naive human strategies. Check out the Github here, or play against it yourself here (using Collisteru’s implementation).
I can't resist sharing this quote from Scott's blog post, I loved it the first time I read it all those years ago in Lecture 18 of his legendary Quantum Computing Since Democritus series, the whole lecture (series really) is just a fun romp:
In a class I taught at Berkeley, I did an experiment where I wrote a simple little program that would let people type either "f" or "d" and would predict which key they were going to push next. It's actually very easy to write a program that will make the right prediction about 70% of the time. Most people don't really know how to type randomly. They'll have too many alternations and so on. There will be all sorts of patterns, so you just have to build some sort of probabilistic model. Even a very crude one will do well. I couldn't even beat my own program, knowing exactly how it worked. I challenged people to try this and the program was getting between 70% and 80% prediction rates. Then, we found one student that the program predicted exactly 50% of the time. We asked him what his secret was and he responded that he "just used his free will."
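The kind of crude probabilistic model Scott describes is only a few lines of code. Here's a minimal sketch of my own (the function names and the order-1 context are my invention, not his actual program):

```python
from collections import defaultdict

def make_predictor(k=2):
    """Predict the next key ("f" or "d") from the last k keys using raw counts."""
    counts = defaultdict(lambda: {"f": 0, "d": 0})
    history = []

    def predict():
        c = counts[tuple(history[-k:])]
        # Guess whichever key most often followed this context; tie -> "f".
        return "f" if c["f"] >= c["d"] else "d"

    def observe(key):
        counts[tuple(history[-k:])][key] += 1
        history.append(key)

    return predict, observe

# Against a typist who over-alternates, the model catches on within a few keys.
predict, observe = make_predictor(k=1)
hits = 0
keys = "fdfdfdfdfdfdfdfdfdfd"
for key in keys:
    hits += predict() == key
    observe(key)
```

Even this toy version exploits the "too many alternations" pattern Scott mentions; real typists are less regular, but 70% prediction rates don't require anything fancier than conditional counts.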
I wonder if he'd just memorised the first couple dozen digits of something like Chaitin's constant or e or pi, like you did, and just started somewhere in the middle of his memorised substring; that's what I'd've done.
Yeah that's what I suggested people try if they want a near-perfect external source of randomness.
Another solution: Practice dynamic visual acuity and predict the opponent's move via their hand shape.
The extreme version of this strategy looks like this robot.
The human version of this strategy (source) starts by realizing that rock is the weakest throw to disguise: since the default hand state is usually a fist, rock involves no change in hand shape over time and is the easiest to recognize. The conclusion is to play paper if you recognize no change in hand shape, and to play scissors if you recognize any movement (movement means paper or scissors is coming, and scissors gives you a win or a draw)[1].
This is of course vulnerable to exploitation once the opponent knows you're using this and they also have good dynamic visual acuity (eg opponent can randomize the default hand-state, diagonalize against your heuristic by inserting certain twitches to their hand movement, etc).
When I remember to, I try to keep my fist in a "neutral" state (thumb not touching fist so it's easy to either close or open) until after my hand starts moving, but I do wonder how common hand-shape prediction is, and if so, how often high-level players try to "bait."
I don't take the game very seriously outside of the bot exercises and I've never played in irl tournaments.
I've heard of a robot that played RPS using a robot hand and robot eyes, that would win every round against people. It watched what shape its opponent was going to make, and made the shape that beat it. The decision was fast enough that the human couldn't tell that in effect, they were moving first and the robot was moving second.
"I feel my opponent move, and then — I move first!" Something I once heard from a tai chi instructor.
Note however that tournaments often limit running time (eg 5s for 1000 games on a not-very-fast processor), so you have to be careful with overly complex strategies, like neural nets that are too big.
Until I read this footnote, I was going to suggest throwing the last tournament's thousand-round matches into a dataset and training a TITANS-based architecture on it until performance against a representative set of bots grabbed from the internet starts to drop from overfitting. Even so, I think it'd be funny to try this out just to see how high you can get the number to go.
With model distillation, you could try squeezing it into something that technically, barely manages to finish its games on time.
Yeah I suspect a distilled/pruned neural-net could do really well against the existing strategies, I dunno what the latest state of the field is. I also don't have a very good sense of recent progress in computer/ML optimizations so maybe there are other tricks I'm discounting.
Hi folks, linking my Inkhaven explanation of intermediate Rock Paper Scissors strategy, as well as feeling out an alternative way to score rock paper scissors bots. It's more polished than most Inkhaven posts, but still bear in mind that the bulk of this writing was in ~2 days.
Rock Paper Scissors is not solved, in practice.
When I was first learning to program in 2016, I spent a few years, off and on, trying to make pretty good Rock Paper Scissors bots. I spent maybe 20 hours on it in total. My best programs won about 60-65% of matches against the field; the top bots were closer to 80%. I never cracked the leaderboard, but I learned something interesting along the way: RPS is a near perfect microcosm of adversarial reasoning. You have two goals in constant tension: predict and exploit your opponent’s moves, and don’t be exploitable yourself. Every strategy is, in essence, a different answer to how you balance those goals.
Source: https://commons.wikimedia.org/w/index.php?curid=27958688
The simplest strategy is to play Rock all the time. This is the move that 35% of human players in general, and 50% of male players, open with.
Rock loses to its arch-nemesis, Paper. If you know for sure your opponent will play Rock, you should play Paper. "Always Rock", then, is highly exploitable by its natural counter.
On the other hand, if you know for sure your opponent will play Paper, you should play Scissors.
This actually happened to me when I first learned about Rock Paper Scissors stats. I saw an earlier version of the chart above, challenged a friend to a game, and he, having seen the same chart, clocked me as a chart reader and played Scissors as a response. Oops.
Of course Scissors can be defeated by the original strategy (Rock).
Does that mean there’s no end to the infinite regress? No. There is a simple strategy that essentially can’t be exploited, no matter how good your opponent is at reading you.
The best strategy against a superior opponent is to just play purely randomly.
Random play (⅓ chance Rock, ⅓ Paper, ⅓ Scissors) is provably unexploitable. No matter how good your opponent is, as long as they can’t crack the source of your randomness (which is a reliable assumption in computer rock paper scissors), you should expect to win as often as you lose.
Randomness (or near-perfect pseudorandomness) is easy for bots. Much harder for humans!
Most humans can’t just “play random” by instinct. Instead they need some external source of randomness. Personally, I use the digits of pi, many of which I’ve memorized (nerd, I know). I then take each digit modulo 3 to form my move1: 0->Rock, 1->Paper, 2->Scissors.
If you want to take rock paper scissors even more seriously than I did, it might behoove you to memorize a longer (and different) string of random numbers/moves.
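The digit-to-move mapping above is mechanical enough to sketch directly (the constant and function names here are my own):

```python
# First 31 digits of pi after the "3." -- enough for a long run of casual games.
PI_DIGITS = "1415926535897932384626433832795"

MOVES = {0: "Rock", 1: "Paper", 2: "Scissors"}

def pi_move(n):
    """Return the move for round n (0-indexed) using digit-of-pi mod 3."""
    return MOVES[int(PI_DIGITS[n]) % 3]
```

Starting at a different offset each session, as suggested above, keeps a repeat opponent from learning your sequence.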
Why isn’t Pure Random just the best strategy? After all, it can’t be exploited at all! This fulfills the technical game theory definition of a Nash Equilibrium: If every player plays the Pure Random strategy, nobody can gain by deviating from it.
Pure Random is an unexploitable strategy that has a 50-50 win-rate against the best strategies. Unfortunately it also has a 50-50 win-rate against the worst strategies.
And some people program bad bots like Always Rock! And you want to exploit those strategies.
Consider Pure Random + Paper Counter, which has two components: if the opponent’s history looks like Always Rock (nothing but Rock so far), play Paper; otherwise, play Pure Random.
This strategy is strictly better than both Always Rock and Pure Random. And of course, if you can predict your opponents reasonably well, you can do much better than exploiting a single strategy.
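Here's a minimal sketch of this combined strategy, assuming "looks like Always Rock" means the opponent has played nothing but Rock for at least a handful of rounds (the threshold and function name are my own choices):

```python
import random

def pure_random_plus_paper_counter(opp_history):
    """Play Paper against an apparent Always Rock bot; otherwise play randomly."""
    # Component 1: if the opponent has only ever played Rock, counter with Paper.
    if len(opp_history) >= 5 and all(move == "R" for move in opp_history):
        return "P"
    # Component 2: otherwise fall back to unexploitable uniform randomness.
    return random.choice("RPS")
```

Against Always Rock this wins nearly every round; against everyone else it degrades gracefully to Pure Random.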
How do you predict many different idiosyncratic patterns and strategies other people can have? Both humans and bots often repeat patterns, so you can just look for patterns and counter them.
How do you find such patterns? One simple way is to look for past patterns in their play history. For example, if 4 of the last 5 times your opponent played SS, she then plays R afterwards, you can be reasonably sure that if she just played SS, she’s likely to follow with R (so you should counter with P).
Scott Aaronson made a very simple string finder that beats almost all naive human strategies. Check out the Github here, or play against it yourself here (using Collisteru’s implementation).
Source: https://www.cs.utexas.edu/people/faculty-researchers/scott-aaronson
For your string finder, you can either record (and use) only your opponent’s past history of moves, or record pairs of moves (both your opponent’s moves and your own).
Both strategies have their place. Recording and pattern-matching on just your opponent’s moves is simpler and reduces the combinatorial space. In contrast, recording pairs of moves is theoretically more complete and represents the full game better (your opponent is trying to predict you, too!)
In practice, most intermediate and advanced bots use both one-sided and two-sided string finders.
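A one-sided string finder of the kind described above can be sketched like this (a simplified illustration under my own naming, not any particular competition bot):

```python
from collections import Counter

BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x

def string_finder_move(opp_history, max_len=5):
    """Find the longest recent suffix of the opponent's history that occurred
    before, see what most often followed it, and counter that move."""
    for k in range(min(max_len, len(opp_history) - 1), 0, -1):
        suffix = opp_history[-k:]
        followers = Counter(
            opp_history[i + k]
            for i in range(len(opp_history) - k)
            if opp_history[i:i + k] == suffix
        )
        if followers:
            predicted = followers.most_common(1)[0][0]
            return BEATS[predicted]
    return "R"  # no history to match yet; a real bot would play randomly here
```

In the SS-then-R example above, the finder matches the suffix "SSRSS", predicts R, and replies with P. A two-sided version would match on (my move, their move) pairs instead of single moves.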
String finders are highly exploitable. If your opponent knows you’re using a string-finder strategy, they can simply invert their history: when they historically played R in a situation, they’ll expect you to play P and will instead play S.
Somebody predicting your string-finder strategy can easily crush you afterwards.
Is it possible to be essentially unexploitable in the limit against smarter strategies while still being able to exploit biases in your opponents’ strategies? Surprisingly, yes.
The Henny strategy is simple:
If your opponent has played 30 Rocks, 45 Papers, and 25 Scissors over the last 100 moves, you sample from that distribution and counter it: you’d play Paper 30% of the time, Scissors 45%, and Rock 25% of the time as a reply.
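As I understand the description, Henny reduces to a few lines. One cute implementation trick: sampling a uniformly random element of the opponent's recent history is exactly sampling from their empirical frequency distribution (the function name and window default are my own):

```python
import random

BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x

def henny_move(opp_history, window=100):
    """Sample from the opponent's recent move distribution and counter it."""
    recent = opp_history[-window:]
    if not recent:
        return random.choice("RPS")
    # Picking a random past move samples each move proportionally to its frequency.
    sampled = random.choice(recent)
    return BEATS[sampled]
```

In the 30/45/25 example above, this plays Paper 30%, Scissors 45%, and Rock 25% of the time, as described.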
As long as your opponents have any biases at all in their play (e.g., play Paper slightly more than Scissors), you should be able to reliably win against them over the course of many moves.
Further, Henny is not easily exploitable. First, because of the high level of randomness, it’s very hard for your opponents to identify what you’re doing. Second, in the limit against unbiased opponents, this strategy just approaches Pure Random, which is the Nash equilibrium.
The Henny strategy is essentially unexploitable in the limit. It also reliably exploits many weak opponents.
The Henny strategy is ultimately a highly-defensive strategy. It’s very hard to exploit by more sophisticated strategies. In turn, it is limited in its ability to exploit other strategies.
First, when it goes against weaker strategies, it usually ekes out a small advantage, and does not fully exploit their weaknesses. This is not a problem for bot competitions, where you win matches over the course of (say) 1000 individual games, and the margin of victory is irrelevant. However, it can be a problem in real life human games of best-of-three or best-of-seven, where your tiny statistical edge might be too small to consistently guarantee a victory.
A bigger problem is that it only exploits a limited slice of predictable strategies. Consider somebody who just plays {RPSRPSRPS…} ad infinitum. This is both in theory and in practice extremely exploitable (the String Finder from earlier can destroy it completely), but from a naive Henny strategy’s perspective, it’s indistinguishable from random!
So a naive Henny strategy, while excelling at being hard to predict and hard to exploit, leaves a lot of money on the table by not being able to exploit any strategy that is not biased by move-frequency.
Can we do better?
The obvious move is to blend the above approaches. You can use frequency-weighting over sequences of moves rather than single moves, or switch between strategies based on how the match is going. But this raises a new question: how do you choose which strategy to use, and when?
This is where the meta-strategies come in.
“They were both poisoned.” - The Masked Man
The most famous meta-strategy for computer Rock Paper Scissors is Iocaine Powder2, named after the iconic battle-of-wits scene in The Princess Bride. The basic insight is that any successful predictor (P) of your opponent’s moves can be run at multiple meta-levels.
For example, suppose your predictor says your opponent will play Rock:
Level 0 (P0): Predict what my opponent will play, and counter it. Play Paper.
Level 2 (P1): Counter your opponent’s second guess. Assume your opponent expects you to play the Level 0 strategy. They play Scissors to counter your Paper. So you should play Rock to counter.
Level 4 (P2): Counter your opponent’s fourth guess. Your opponent expects you to play Rock, and plays Paper. So you should play Scissors to counter.
…
At this point, you might expect there to be an infinite regress. Not so! The cyclical nature of RPS means Level 6 (P3) recommends that you play Paper, just like Level 0. So all meta-levels (rotations) of the same predictor reduce down to 3 strategies.
But what if your opponent uses the Predictor P against you and tries to predict your strategy? We have 3 more strategies from the same predictor:
Level -1 (S0): Just play your strategy. Hope your opponent doesn’t figure it out.
Level 1 (S1): Assume your opponent successfully predicted/countered your base strategy. Play 1 level higher than them (2 levels higher than your base strategy).
Level 3 (S2): Left as an exercise to the reader.
So from a single prediction algorithm P, Iocaine Powder introduces 3 rotations and a reflection, giving us 6 distinct strategies. One of them might even be useful! But how do we know which strategies to choose between?
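The six variants follow mechanically from one predictor's two outputs: its guess of the opponent's next move and its guess of your own. A minimal sketch (my own formulation of the rotation scheme above, not Iocaine Powder's actual code):

```python
BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x

def iocaine_variants(pred_opponent, pred_self):
    """From one predictor's guess of the opponent's move and of our own move,
    produce the six meta-level candidate moves (3 rotations + 3 reflected)."""
    p0 = BEATS[pred_opponent]   # P0: counter their predicted move
    p1 = BEATS[BEATS[p0]]       # P1: they counter our P0 move; counter that
    p2 = BEATS[BEATS[p1]]       # P2: one more rotation up
    s0 = pred_self              # S0: just play what the predictor thinks we'll play
    s1 = BEATS[BEATS[s0]]       # S1: two levels above our predicted base move
    s2 = BEATS[BEATS[s1]]       # S2: the exercise from above, resolved
    return [p0, p1, p2, s0, s1, s2]
```

With a predicted opponent move of Rock, the first three candidates come out Paper, Rock, Scissors, matching Levels 0, 2, and 4 above; one more double-rotation lands back on Paper, which is the collapse to three strategies per side.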
Suppose you have a pool of strategies: several base predictors, each with 6 Iocaine Powdered variants. How do you choose which one to use at any given moment?
Rather than play with a prediction right out of the gate, most modern RPS bots will play the first N moves randomly3, and only play moves “for real” when the meta-strategies are reasonably certain of the correct strategy.
“Study the past, if you would divine the future” - Confucius, famed algorithmic rock paper scissors enthusiast
The generalization of the String Finder strategy is to apply history matching across not just moves but strategies. Upweight strategies/variants that made correct predictions in the past, and downweight strategies/variants that made bad predictions.
To counter history matching meta-strategies, you can try to get ahead of them by switching your strategy consistently. This can either be programmed in hard shifts, or (more commonly in the best bots) organic switches as existing strategies do less well.
For Iocaine Powder implementations, a common counter to strategy switching is to bias towards strategies that made better recent predictions rather than over the entire history, trying to stay one step ahead of your opponent.
Though hard to tune and sometimes too clever, some bots have meta-meta-strategies where the horizon length itself for different meta-strategies is tuned and selected depending on predictive value.
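One common way to implement "weight recent predictions more" is an exponentially decayed score per strategy. A sketch under my own naming, with an arbitrary decay constant:

```python
BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x

def pick_strategy(scores, candidate_moves, opp_actual, decay=0.9):
    """Decay old evidence, score each strategy's last recommendation against
    the opponent's actual move, and return the index of the current leader."""
    for i, move in enumerate(candidate_moves):
        scores[i] *= decay              # forget the distant past
        if move == BEATS[opp_actual]:
            scores[i] += 1.0            # this strategy would have won the round
        elif opp_actual == BEATS[move]:
            scores[i] -= 1.0            # this strategy would have lost the round
    return max(range(len(scores)), key=scores.__getitem__)
```

A decay near 1 approximates scoring over the whole history; a lower decay chases recent form, which is the "stay one step ahead of strategy switchers" behavior described above.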
Often, the existing strategies (and in many cases, the exact code) of competitor bots are available online. You can thus tune the parameters of your bot’s strategies, meta-strategies, learning rates, etc. ahead of time to be unusually attuned to the existing space of competitor bots, rather than to hypothetical bots in general.
In theory, you can even try to identify the specific bots based on their move patterns and counter hard-coded weaknesses, though this seems difficult and veers into “cheating.”
I haven’t seen this discussed much online before, which is kind of surprising.
Like I said before, I only got to 60-65% win rates. But at the time, I wasn’t very good at either programming or board game strategy. What would I try if I wanted to do better today?
In the past, I’ve only attempted to implement relatively simple predictors. If I were to build a competitive RPS bot in 2025, I’d want to experiment with Markov models and even simple neural nets4, as some of the recent top bots have.
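For concreteness, here's what the simplest order-1 Markov predictor looks like: count the opponent's observed transitions and counter the most likely successor of their last move. This is my own illustrative sketch, not any specific top bot:

```python
from collections import defaultdict, Counter

BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x

def markov_move(opp_history):
    """Order-1 Markov predictor over the opponent's prev -> next transitions."""
    if len(opp_history) < 2:
        return "R"  # not enough data; a real bot would play randomly here
    transitions = defaultdict(Counter)
    for prev, nxt in zip(opp_history, opp_history[1:]):
        transitions[prev][nxt] += 1
    last = opp_history[-1]
    if not transitions[last]:
        return "R"  # last move never seen as a predecessor before
    predicted = transitions[last].most_common(1)[0][0]
    return BEATS[predicted]
```

Higher-order Markov models (conditioning on the last k moves) shade into the string finders from earlier; the neural-net versions are essentially learned, smoothed variants of the same conditional table.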
Iocaine Powder in its essential form has been around for over two decades (it won the First International RoShamBo Programming Competition back in 1999). I’d be curious whether there are missing meta-strategy and strategy-selection alternatives I’ve been sleeping on. So I’d want to think pretty hard and experiment with novel meta-strategies.
In particular I’d be curious to do database/evolutionary search over existing strategies and meta-strategies.
The core design and strategic objectives of modern RPS bots are relatively simple: 1) predict and exploit your opponent’s moves, and 2) don’t be exploitable yourself. In practice this reduces to: 1) build the best predictor possible, which can often be very complex (but not so complex that you run past the time limit), and 2) “devolve to random” when playing against a more sophisticated strategy that can reliably exploit your own.
Can we add additional constraints, to open the strategic and meta-strategic landscape further?
One thing I’m curious about is RPS with complexity penalty: Same game as before, but you lose fractional points if your algorithm takes more running time than the ones you beat. I’d be keen to set up a superior contest, maybe on LessWrong, time and interest permitting. Comment if you’re interested!
In RPS, the twin objectives of predicting your opponent’s moves while being unexploitable yourself mirror and distill other adversarial board games and video games, and even some zero-sum games in real life.
If you enjoyed this article, please consider reading my prior article on board game strategy, which I have far greater experience in than RPS bots:
https://inchpin.substack.com/p/board-game-strategy
Finally I might run an “RPS bots with complexity penalty” tournament in the near future. Please comment here and/or subscribe to linch.substack.com if you’re interested!
Footnotes:

1. Obviously a base 10 rendition of pi has some biases mod 3. Fortunately “0” does not show up in pi until the 32nd digit, long after most people stop playing.

2. I don’t know the history of the strategy. I think it’s been around for longer than my own interest in the game. This is the best link I can find on the strategy online, but it was not where I first learned of the strategy, and not the originator.

3. In the Iocaine Powder link I found above, bots would also “resign” if they’re losing, cutting their losses by playing randomly. I don’t really see the point under standard scoring rules (which judge a match between two bots as one point for whoever wins the majority of, say, 1000 games). I assume he was writing for an earlier era when the spread of wins minus losses mattered more.

4. Note however that tournaments often limit running time (eg 5s for 1000 games on a not-very-fast processor), so you have to be careful with overly complex strategies, like neural nets that are too big.