You can easily get a draw against any AI in the world at Tic-Tac-Toe. In fact, provided the game actually stays confined to the actions on the board, you can draw AIXI at Tic-Tac-Toe. That's because Tic-Tac-Toe is a very small game with very few states and very few possible actions, and so intelligence, the ability to pick good actions, doesn't grant any further advantage in it past a certain pretty low threshold.
Chess has more actions and more states, so intelligence matters more. But probably still not all that much compared to the vastness of the state and action space the physical universe has. If there's some intelligence threshold past which minds pretty much always draw against each other in chess even if there is a giant intelligence gap between them, I wouldn't be that surprised. Though I don't have much knowledge of the game.
In the game of Real Life, I very much expect that "human level" is more the equivalent of a four year old kid who is currently playing their third ever game of chess, and still keeps forgetting half the rules every minute. The state and action space is vast, and we get to observe humans navigating it poorly on a daily basis. Though usually only ...
If there's some intelligence threshold past which minds pretty much always draw against each other in chess even if there is a giant intelligence gap between them, I wouldn't be that surprised.
Just reinforcing this point. Chess is probably a draw for the same reason Noughts-and-crosses is.
Grandmaster chess is pretty drawish. Computer chess is very drawish. Some people think that computer chess players are already near the standard where they could draw against God.
Noughts-and-crosses is a very simple game and can be formally solved by hand. Chess is only a bit less simple, even though it's probably beyond actual formal solution.
The general Game of Life is so very far beyond human capability that even a small intelligence advantage is probably decisive.
That makes sense to me but to make any argument about the "general game of life" seems very hard. Actions in the real world are made under great uncertainty and aggregate in a smooth way. Acting in the world is trying to control (what physicists call) chaos.
In such a situation, great uncertainty means that an intelligence advantage only matters "on average over a very long time". It might not matter for a given limited contest, such as a struggle for world domination. For example, you might be much smarter than me and a meteorologist, but you'd find it hard to predict the weather in a year's time better than me if it's a single-shot contest. How much "smarter" would you need to be in order to have a big advantage? Pretty much regardless of your computational ability and knowledge of physics, you'd need such absurdly precise knowledge of the world that it might still take fewer resources (for both you and even much less intelligent actors) to actively control the entire planet's weather than to predict it a year in advance.
The way that states of the world are influenced by our actions is usually in some sense smooth. For any optimal action, there are usually lots of similar ...
"China hasn't made a better LLM than OpenAI" does not imply "China can't make a better LLM despite having more money". China isn't allocating all their money into this. If it's the case that China set a much bigger budget to developing LLMs than OpenAI had, and failed because OpenAI has better people, that would support your point about large resource mismatches not being able to overcome small intelligence gaps.
This is something lc and gwern discussed in the comments here, but now we have clear evidence this is only true for Nash solvers (i.e. all typical engines like SF, Lc0, etc.). LeelaQueenOdds, which was trained exploitatively against a model of top human players (FM+), is around 2k to 2.9k lichess elo depending on the time controls, so it completely trounces 1.6k elo players (and especially 1.2k elo players, which another commenter has suggested the author actually is). See: https://marcogio9.github.io/LeelaQueenOdds-Leaderboard/
Nash solvers are far too conservative and expect perfect play out of their opponents, hence give up most meaningful attacking chances in odds games. Exploitative models like LQO instead assume their opponents play like strong humans (good but imperfect) and do extremely well, despite a completely crushing material disadvantage. As some have noted, this is possible even with chess being a super sterile/simple environment relative to real life.
I speculate that the experiment from this post only yielded the results it did because Nash is a poor solution concept when one side is hopelessly disadvantaged under optimal play from both sides, and queen odds fall deep into that categ...
I found it interesting to play against LeelaQueenOdds. My experiences:
Overall fascinating to play from a position that should be an easy win, but getting crushed by an opponent that Just Plays Better than I do.
[For context, I'm around 2100 in Lichess on short time controls (bullet/blitz). I also won against Stockfish 16 at rook odds on my first try - it's really not optimized for this sort of thing.]
I've been having various conversations in private, where I'm quite doomist and my interlocutor is less doomist, and I think one of the key cruxes that has come up several times is that I've applied security mindset to the operation of human governance, and I am not impressed.
I looked at things like the federal reserve (and how you'd implement that in a smart contract) and the congress/president/court deal (and how you'd implement that in a smart contract) and various other systems, and the thing I found was that existing governance systems are very poorly designed and probably relatively easy to knock over.
As near as I can tell, the reason human civilization still exists is that no inhuman opponent has ever existed that might really just want to push human civilization over and then curb stomp us while we thrash around in surprised pain.
For example, in WW2 Operation Bernhard got close to just "ending the money game" explicitly, but the bad guys couldn't bring themselves to make the stupidest and most evil British people rich via relatively secret injections, and then ramp it up more and more, and then as the whole web of market relationships became less and less plausible they coul...
The Operation Bernhard example seems particularly weak to me: after thinking for 30 seconds, you can come up with practical solutions for this situation even if you imagine Nazi Germany having perfect competency in pulling off their scheme.
For example, using tax records and bank records to roll back people's fortunes a couple of years and then introducing a much more secure banknote. It's not like WW2 was an era of fiscal conservatism; war powers were leveraged heavily by the Federal Reserve in the United States to do whatever they wanted with currency. We comfortably operate in a fiat currency regime where currency is artificially scarce and can be manipulated in half a dozen ways at the drop of a hat.
The way you interpret Operation Bernhard seems to me like you imagine the rules of society as something we set up and then are bound to like lemmings. In reality, the rules can be rewritten at any time when the need arises. I think your example is equivalent to saying the ability to turn lead into gold would destroy the gold-standard era economy and utterly wreck civilization. We know in hindsight we can just wave our finger and decouple currency and gold at a moment's notice.
I suspect many of the other rules and systems that hold our civilization are just as adaptable when the need arises.
The Wiki link on Operation Bernhard does not very obviously support the assertions you make about the Germans flinching. Do you have a different source in mind?
If you're smarter than your opponent but have fewer starting resources, the optimal strategy probably involves some combination of cooperation, making alliances, deception, escaping / running / hiding, gathering resources in secret, and whatever other prerequisites are needed to neutralize such a resource imbalance. Many scenarios in which a smarter-than-human AGI with fewer resources goes to war with or is attacked by humanity are thus somewhat contradictory, or at least implausible: they postulate the AGI taking a worse strategy than what a literal human in its place could come up with.
There's not really an analogue for this to Chess - if I am forced to play a chess game with a grandmaster with whatever handicap, I could maybe flip over the board if I started to lose. But that probably just counts as a forfeit, unless I can also overpower or coerce my opponent and / or the judges.
if a rogue AI is caught early in its plot, with all the world's militaries combined against it while it still has to rely on humans for electricity and physical computing servers. It's somewhat hard to outthink a missile headed for your server farm at 800 km/h.
Breaking it down by cases:
This kind of experiment has been at the top of my list of "alignment research experiments I wish someone would run". I think the chess environment is one of the least interesting environments (compared to e.g. Go or Starcraft), but it does seem like a good place to start. Thank you so much for doing these experiments!
I do also think Gwern's concern about chess engines not really being trained on games with material advantage is an issue here. I expect a proper study of this kind of problem to involve at least finetuning engines.
I do also think Gwern's concern about chess engines not really being trained on games with material advantage is an issue here. I expect a proper study of this kind of problem to involve at least finetuning engines.
It's actually much worse than this. Stockfish has no ability to model its opponents' flaws in game knowledge or strategy; it has no idea it's playing against a 1200. It's like a takeover AI that refrains from sending the stage-one nanosystem spec to the bio lab because it assumes the lab is also manned by AGIs and would understand what mixing the beaker accomplishes. A grandmaster in chess, who wanted to win against a novice with odds, would perhaps do things like complicate the position so that their opponent would have a larger chance of making blunders. Stockfish on the other hand is limited to playing "game theory optimal" chess, strategies that would work "best" (in terms of number of moves from checkmate saved) against what it considers optimal play.
To fix this, I have wondered for a while if you couldn't use the enormous online chess datasets to create an "exploitative/elo-aware" Stockfish, which had a superhuman ability to trick/trap players during handicappe...
Yes, this is another reason that setups like OP are lower-bounds. Stockfish, like most game RL AIs, is trying to play the Nash equilibrium move, not the maximally-exploitative move against the current player; it will punish the player for any deviations from Nash, but it will not itself risk deviating from Nash in the hopes of tempting the player into an even larger error, because it assumes that it is playing against something as good or better than itself, and such a deviation will merely be replied to with a Nash move & be very bad.
You could frame it as an imitation-learning problem like Maia. But also train directly: Stockfish could be trained with a mixture of opponents and at scale, should learn to observe the board state (I don't know if it needs the history per se, since just the stage of game + current margin of victory ought to encode the Elo difference and may be a sufficient statistic for Elo), infer enemy playing strength, and calibrate play appropriately when doing tree search & predicting enemy response. Silver & Veness 2010 comes to mind as an example of how you'd do MCTS with this sort of hidden-information (the enemy's unknown Elo strength) which turns it into a POMDP rather than a MDP.
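The Nash-vs-exploitative distinction can be sketched with a toy two-ply example: a minimax player assumes the opponent always finds the best reply, while an opponent-model player weighs replies by a softmax model of the opponent's skill. Everything here (the tree, the payoffs, the skill parameter) is invented for illustration, not real chess evaluations or any engine's actual code.

```python
import math

# Toy two-ply game: each of our candidate moves leads to a position where
# the opponent picks one of three replies; the numbers are the resulting
# values from our perspective.
TREE = {
    "safe":  [1.0, 1.0, 1.0],    # solid: +1 regardless of the reply
    "trick": [9.0, -1.0, -1.0],  # trappy: wins big if the opponent errs, loses otherwise
}

def nash_value(replies):
    # Nash/minimax assumption: the opponent always finds the best reply
    # (the one minimizing our value).
    return min(replies)

def exploit_value(replies, opp_skill):
    # Opponent model: replies chosen by a softmax on the opponent's own value
    # (the negation of ours). opp_skill = 0 is a uniformly random opponent;
    # large opp_skill recovers the minimax assumption.
    weights = [math.exp(opp_skill * (-v)) for v in replies]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, replies))

def best_move(value_fn):
    return max(TREE, key=lambda move: value_fn(TREE[move]))

print(best_move(nash_value))                                 # -> safe
print(best_move(lambda r: exploit_value(r, opp_skill=0.0)))  # -> trick
```

In the full version gwern describes, `opp_skill` is not a fixed parameter but a hidden variable inferred from the opponent's moves during the game, which is exactly what turns the search into a POMDP rather than an MDP.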
For a clear example of this, in endgames where I have a winning position but have little to no idea how to win, Stockfish's king will often head for the hills, in order to delay the coming mate as long as theoretically possible.
Making my win very easy because the computer's king isn't around to help out in defence.
This is not a theoretical difficulty! It makes it very difficult to practise endgames against the computer.
Something similar, not involving AIs, is chess grandmasters doing rating climbs with handicaps. One I know of was Aman Hambleton managing to reach 2100 Elo on chess.com while deliberately sacrificing his queen for a pawn on the third or fourth move of every game.
https://youtube.com/playlist?list=PLUjxDD7HNNTj4NpheA5hLAQLvEZYTkuz5
He had to complicate positions, defend strongly, refuse to trade and rely on time pressure to win.
The games weren't quite the same as queen odds, since he got a pawn for the queen and usually displaced the opponent's king to f3/f6 and prevented castling, but it still gives an idea that most amateurs probably couldn't beat a grandmaster at queen odds even if they can beat Stockfish. Longer time controls would also help the amateur, so maybe in 15-minute games an 1800 up a queen could beat Aman.
Some nitpicks:
While I think your overall point is very reasonable, I don't think your experiments provide much evidence for it. Stockfish generally is trained to play the best move assuming its opponent is playing best moves itself. This is a good strategy when both sides start with the same amount of pieces, but falls apart when you do odds games.
Generally the strategy to win against a weaker opponent in odds games is to conserve material, complicate the position, and play for tricks - go for moves which may not be amazing objectively but end up winning material against a less perceptive opponent. While Stockfish is not great at this, top human chess players can be very good at it. For example, a top grandmaster Hikaru Nakamura had a "Botez Gambit Speedrun" (https://www.youtube.com/playlist?list=PL4KCWZ5Ti2H7HT0p1hXlnr9OPxi1FjyC0), where he sacrificed his queen every game and was able to get to 2500 on chess.com, the level of many chess masters.
This isn't quite the same as your queen odds setup (it is easier), and the short time format he is on is a factor, but I assume he would be able to beat most sub-1500 FIDE players with queen odds. A version of Stockfish trained to exploit a human's subpar ability would presumably do even better.
I'm surprised by how much this post is getting upvoted. It gives us essentially zero information about any question of importance, for reasons that have already been properly explained by other commenters:
Chess is not like the real world in important respects. What the threshold is for material advantage such that a 1200 elo player could beat Stockfish at chess tells us basically nothing about what the threshold is for humans, either individually or collectively, to beat an AGI in some real-world confrontation. This point is so trivial that I feel somewhat embarrassed to be making it, but I have to think that people are just not getting the message here.
Even focusing only on chess, the argument here is remarkably weak because Stockfish is not a system trained to beat weaker opponents with piece odds. There are Go AIs that have been trained for this kind of thing, e.g. KataGo can play reasonably well in positions with a handicap if you tell it that its opponent is much weaker than itself. In my experience, KataGo running on consumer hardware can give the best players in the world 3-4 stones and have an even game.
If someone could try to convince me that this experiment was not pointless and actually worth running for some reason, I would be interested to hear their arguments. Note that I'm more sympathetic to "this kind of experiment could be valuable if ran in the right environment", and my skepticism is specifically about running it for chess.
(I'm the main KataGo dev/researcher)
Just some notes about KataGo - the degree to which KataGo has been trained to play well vs weaker players is relatively minor. The only notable thing KataGo does is in some self-play games to give up to an 8x advantage in how many playouts one side has over the other side, where each side knows this. (Also KataGo does initialize some games with handicap stones to make them in-distribution and/or adjust komi to make the game fair). So the strong side learns to prefer positions that elicit higher chance of mistakes by the weaker side, while the weak side learns to prefer simpler positions where shallower search doesn't harm things as much.
This method is cute because it adds pressure to only learn "general high-level strategies" for exploiting a compute advantage, instead of memorizing specific exploits (which one might hypothesize to be less likely to generalize to arbitrary opponents). Any specific winning exploit learned by the stronger side that works too well will be learned by the weaker side (it's the same neural net!) and subsequently will be avoided and stop working.
And it's interesting that "play for positions that a compute-limited yourse...
If someone could try to convince me that this experiment was not pointless and actually worth running for some reason, I would be interested to hear their arguments. Note that I'm more sympathetic to "this kind of experiment could be valuable if ran in the right environment", and my skepticism is specifically about running it for chess.
I've been interested in the study of this question for a while. I agree this post has the flaws you point out, but I still find that it provides interesting evidence. If the result had been that Stockfish would have continued to win even with overwhelming material disadvantage, then this of course would have updated me some. I agree the current result is kind of close to the null result, but that's fine. Also, it is much cheaper to run than almost all the other experiments in this space, and it's good to encourage people to get started at all, even if it's going to be somewhat streetlighty.
Thanks for the post! It was a good read. One point I don't think was brought up is the fact that chess is turn-based whereas real life is continuous.
Consequently, the huge speed advantage that AIs have is not that useful in chess because the AI still has to wait for you to make a move before it can move.
But since real life is continuous, if the AI is much faster than you, it could make 1000 'moves' for every move you make and therefore speed is a much bigger advantage in real life.
I'm not familiar with how Stockfish is trained, but does it have intentional training for how to play with queen odds? If not, then it might start trouncing you if it were trained for them, instead of having to "figure out" new strategies on the fly.
Stockfish now uses an interesting lightweight kind of NN called NNUE which does need to be trained; more importantly, chess engines have long used machine learning techniques (if not anything we would now call deep learning) which still need to be fit/trained and Stockfish relies very heavily on distributed testing to test/create changes, so if they are not playing with queen odds, then neural or no, it amounts to the same thing: it's been designed & hyperoptimized to play regular even-odds chess, not weird variants like queen-odd chess.
(My current FIDE rating is ~1500 Elo (~37th percentile) and my peak rating was ~1700 Elo (~56th percentile).)
While I'm not that good at chess myself, I think you got some things wrong, and on some I'm just being nitpicky.
My rating on lichess blitz is 1200, on rapid is 1600, which some calculator online said would place me at ~1100 ELO on the FIDE scale.
I’m quite skeptical of such conversions, but I understand you had nothing better to go on. This website (made from surveying a bunch of redditors [1]) converts your lichess blitz rating into 1005, 869&...
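For what it's worth, the Elo model itself is simple; it's the pool-to-pool conversion that is the hard, empirical part, since the same formula is fit to different player populations (sites like the one above are just regressions over survey data). A quick sketch of the standard logistic expectation, with arbitrary sample ratings:

```python
def elo_expected_score(r_a, r_b):
    """Expected score (win = 1, draw = 0.5) for player A against player B
    under the standard Elo logistic model with its 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# A 400-point gap gives the stronger player ~91% expected score, which is
# why shifting a rating by even ~100 points between pools changes
# predicted win rates substantially.
print(round(elo_expected_score(1600, 1200), 3))  # -> 0.909
```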
The post uses handicapped chess as a domain for studying how player capability and starting position affect win probabilities. From the conclusion:
...In the view of Miles and others, the initially gargantuan resource imbalance between the AI and humanity doesn’t matter, because the AGI is so super-duper smart, it will be able to come up with the “perfect” plan to overcome any resource imbalance, like a GM playing against a little kid that doesn't understand the rules very well.
The problem with this argument is that you can use the exact same reason
Curated. The question beneath feels really quite interesting. As the OP has said, even if a vastly superhumanly intelligent AI could win even at an extreme disadvantage, this doesn't mean there isn't some advantage that would let humans defeat a more nascently powerful AGI, and it's pretty interesting to understand how that works out. I'm excited to see more work on this, especially in domains resembling real life more and more* (e.g. Habryka suggests Starcraft).
*Something about chess is that it feels quite "tight", in the sense of not admitting exploits or hacks, whereas I can imagine other games having hidden exploitable bugs that can be mined – like reality.
I intend to write a lot more on the potential “brains vs brawns” matchup of humans vs AGI. It’s a topic that has received surprisingly little depth from AI theorists.
I recommend checking out part 2 of Carl Shulman's Lunar Society podcast for content on how AGI could gather power and take over in practice.
Leela now has a contempt implementation that makes odds games much more interesting. See this Lc0 blog post (and the prior two) for more details on how it works and how to easily play odds games against Leela on Lichess using this feature.
GM Matthew Sadler also has some recent videos about using WDL contempt to find new opening ideas to maximize chances of winning versus a much weaker opponent.
I'd bet money you can't beat LeelaQueenOdds at anything close to a 90% win rate.
On the other hand, the potential resource imbalance could be ridiculously high, particularly if a rogue AI is caught early in its plot, with all the world's militaries combined against it while it still has to rely on humans for electricity and physical computing servers. It's somewhat hard to outthink a missile headed for your server farm at 800 km/h. ... I hope this little experiment at least explains why I don't think the victory of brain over brawn is "obvious". Intelligence counts for a lot, but it ain't everything.
While this is a true and import...
I think this is a great article, and the thesis is true.
The question is, how much intelligence is worth how much material?
Humans are so very slow and stupid compared to what is possible, and the world so complex and capable of surprising behaviour, that my intuition is that even a very modest intelligence advantage would be enough to win from almost any starting position.
You can bet your arse that any AI worthy of the name will act nice until it's already in a winning position.
I would.
If you're open to more experimentation, I'd recommend trying playing against Leela Chess Zero using some of the newer contempt parameters introduced in this PR and available in the latest pre-release version. I'm really curious if you'd notice significant style differences with different contempt settings.
Update: The official v0.30.0 release is out now and there is a blog post detailing the contempt settings. Additionally, there is a Lichess bot set up specifically for knight odds games.
Further update: There are now three Lichess bots set up to play odds g...
Probably not relevant to any arguments about AI doom, but some notes about chess material values:
You said a rook is "ostensibly only 1 point of material less than two bishops". This is true in the simplified system usually taught to new players (where pawn = 1, knight = bishop = 3, rook = 5, queen = 9). But in models that allow themselves a higher complexity budget, 2 bishops can be closer to a queen than a rook (at the start of the game):
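As a rough illustration of how the two-bishops-versus-rook margin shifts between value systems, here is the simple system next to an illustrative higher-complexity one. The "refined" numbers and the pair bonus below are placeholder values standing in for such models, not taken from any specific engine or paper.

```python
# Beginner point values vs. an illustrative refined model that values
# bishops higher and adds a bishop-pair bonus (placeholder numbers).
SIMPLE  = {"p": 1.0, "n": 3.0, "b": 3.0, "r": 5.0, "q": 9.0}
REFINED = {"p": 1.0, "n": 3.05, "b": 3.33, "r": 5.63, "q": 9.5}

def material(pieces, values, bishop_pair_bonus=0.0):
    """Total material for a piece list like 'bb' or 'r'."""
    total = sum(values[p] for p in pieces)
    if pieces.count("b") >= 2:
        total += bishop_pair_bonus
    return total

# Two bishops vs. a rook: a 1-point edge in the simple system...
print(material("bb", SIMPLE) - material("r", SIMPLE))  # -> 1.0
# ...but a noticeably larger one once bishops and the pair are valued higher.
print(round(material("bb", REFINED, bishop_pair_bonus=0.5)
            - material("r", REFINED), 2))              # -> 1.53
```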
A related thought: an intelligence can only work on the information that it has, regardless of its veracity, and it can only work on information that actually exists.
My hunch is that the plan of "AI boostraps itself to superintelligence, then superpower, then wipes out humanity" relies on it having access to information that is too well hidden to divine through sheer calculation and infogathering, regardless of its intelligence (ex: the location of all the military bunkers, and nuclear submarines humanity has), or simply does not exist (ex: future Human st...
This might actually be a case where a chess GM would outperform an AI: they can think psychologically, so they can deliberately pick traps and positions that they know I would have difficulty with.
Emphasis needed. I expect a GM to beat you down a rook every time, and down a queen most times.
Stockfish assumes you will make optimal moves in planning and so plays defensive when down pieces, but an AI optimized to trick humans (i.e. allowing suboptimal play when humans are likely to make a mistake) would do far better. You could probably build this with ma...
I think both of those assumptions are dubious.
What is stopping someone sending a missile at GPT-4's servers right now?
I think seeing large numbers of humans working in a coordinated fashion against an AI is unlikely.
If a rogue AI is discovered early, we could end up in a war where the AGI has a huge intelligence advantage, but humans have a huge resource advantage.
In that scenario, it seems to me that enough abstractions break down that the analogy to the Stockfish experiment no longer works. Like talking about a conflict of AGI vs. "humans" as two agents in a 2-player game, rather than AGI vs. a collection of exploitable agents.
But I want to focus on the "resource" abstraction here. First of all, "ownership" of resources seems irrelevant; that's mostly a legal concep...
Enjoyed this post, thanks. Not sure how well chess handicapping translates to handicapping future AGI, but it is an interesting perspective to at least consider.
Thank you for doing the experiment. Someone could run a similar set of tests for Go.
Just to prime your thinking: what wins most wars on Earth?

Probably the side that can use the majority of physical resources and turn them into weapons. We've had several rounds of wars, and the winner had a vast material advantage.
It occurred to me that the level of AI capabilities needed to reach exponential growing levels of resources is essentially a general robot system, trained on all videos in existence of humans taking actions in the real world and a lot of rein...
Thank you for doing the experiment. Someone could run a similar set of tests for Go.
Go has an advantage here of much greater granularity in handicapping. Handicapping with pieces isn't used as much in chess as it is in Go because, well, there are so few pieces, on such a small board, for a game lasting so few moves, that each removed piece is both a large difference and changes the game qualitatively. I wouldn't want to study chess at all at this point as a RL testbed: there's better environments, which are cleaner to tweak, cheaper to run, more realistic/harder, have oracles, or something else; chess is best at nothing at this point (unless you are interested in chess or history of AI, of course).
Also, it's worth noting that these piece-disadvantage games are generally way out of distribution / off-policy for an agent like Stockfish: AFAIK, the Stockfish project (and all other chess engine projects, for that matter) does not spend a (or any?) meaningful amount of training on extreme handicap scenarios like 'what if I somehow started the game missing a knight' or 'what if my queen just wasn't there somehow' or 'somehow, Palpatine's piece returned'. (So there's a similar problem ...
Anecdotally, I remember seeing analyses of Stockfish vs. AlphaZero (I think) where AlphaZero would fairly consistently trade absurd amounts of material for position. While there is obviously still a tipping point at which a material advantage will massively swing the odds, I feel the thrust of this essay somewhat understates the value of a lot of intelligence in light of those matches.

With that said, I haven't seen any odds games with AlphaZero, so perhaps my point is entirely moot and it needs that initial material as badly as Stockfish does.
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
I suspect this is a lack of flexibility in Stockfish. It was designed (trained?) for normal equal-forces chess and can't step back to think "How do I best work around this disadvantage I've been given?" I suspect something like AlphaZero, given time to play itself at a disadvantage, would do better. As would a true AGI.
I have a habit of reading footnotes as soon as they are linked, and your footnote says that you won with queen odds before the call to guess what odds you'd win at, creating a minor spoiler.
I think this is a really useful and thought-provoking experiment. One thing that worries me is that large corporations may find it easier and faster to give the AI brawn than brains. Why play fair in competition when you have money and machine advantages? I think this will be especially so with not-so-good AIs, and the advantages will remain after the brains part improves. So in your analogy, what about giving Stockfish 3 extra queens? A second question is how does it do against Stockfish with just 2 extra queens?
It's maybe worth noting that Stockfish 14 NNUE still has some failure modes. Take this position on Lichess, for example. The position is a complete draw, as Black can't make any progress, and White cannot lose as long as he only moves his king. Despite this, Stockfish 14 NNUE evaluates it as a -15 advantage for Black, which should typically indicate a decisive advantage. Even a human player with relatively low Elo should be able to quickly assess this position as a draw.
Thanks for the insights. Actually, board game models don't play very well when they are losing, or winning, so heavily that it doesn't seem to matter. A human player would try to trick you and hope for a mistake. That's not necessarily the case with these models, which play as if you were as good as they are, which makes their situation look unwinnable.
It's quite the same with AlphaGo. AlphaGo plays incredibly well until there is a large imbalance. Surprisingly, AlphaGo also doesn't care about winning by 10 points or by half a point, and someti...
A somewhat related point: it's only very recently (2023) that chess engines have begun competently mimicking the error patterns of human play. The nerfings of previous decades were all artificial.
I'm an FM and play casual games vs. the various nerfed engines at chess.com. The games are very fast (they move instantly) but there's no possibility of time loss. Not the best way to practice openings but good enough.
The implication for AI / AGI is that humans will never create human-similar AI. Everything we make will be way ahead in many areas and way behind in...
I predicted your odds of winning to be 50% with queen+rook odds, 1% with queen odds, 0.2% with 2 bishops odds, and 0.1% with rook odds. When you started describing strategies tailored to odds games that you were going to use, I felt cheated! I thought you were just going to play your normal 1100-rated game, but I made a big mistake. I forgot that you're a general intelligence, not a narrow, 1100-rated chess AI. Stockfish's NNUE was never trained on positions like the ones at the start of your odds games since they can't be reached from a normal 32-piece st...
The problem is that true AGI is self-improving and that a strong enough intelligence will always either accrue the resource advantage or simply do much more with less. Chess engines like Stockfish do not serve as good analogies for AGI since they don't have those self-referential self-improvement capabilities that we would expect true AGI to have.
Odds games against the engine are played with contempt equal to the material difference.
Sorry you didn't know that beforehand.
As a kid, I really enjoyed chess, as did my dad. Naturally, I wanted to play him. The problem was that my dad was extremely good. He was playing local tournaments and could play blindfolded, while I was, well, a child. In a purely skill-based game like chess, an extreme skill imbalance means that the more skilled player essentially always wins, and in chess it ends up being a slaughter that is no fun for either player. Not many kids have the patience to lose dozens of games in a row and never even get close to victory.
This is a common problem in chess, with a well established solution: It’s called “odds”. When two players with very different skill levels want to play each other, the stronger player will start off with some pieces missing from their side of the board. “Odds of a queen”, for example, refers to taking the queen of the stronger player off the board. When I played “odds of a queen” against my dad, the games were fun again, as I had a chance of victory and he could play as normal without acting intentionally dumb. The resource imbalance of the missing queen made the difference. I still lost a bunch though, because I blundered pieces.
Now that I am a full-blown adult with a PhD, I'm a lot better at chess than I was as a kid. I'm better than most of my friends who play, but I never reached my dad's level of chess obsession. I never bothered to learn any openings in real detail, or do studies on complex endgames. I mainly just play online blitz and rapid games for fun. My rating on lichess blitz is 1200, on rapid 1600, which some calculator online said would place me at ~1100 ELO on the FIDE scale.
In comparison, a chess master is ~2200 and a grandmaster is ~2700. The top chess player, Magnus Carlsen, is at an incredible 2853. ELO ratings can be used to estimate the chance of victory in a matchup, although the estimates are somewhat crude for very large skill differences. Under this calculation, my chance of beating a 2200-rated player is 1 in 500, while my chance of beating Magnus Carlsen would be 1 in 24000. Although realistically, the odds would depend less on the ELO and more on whether he was drunk while playing me.
Stockfish 14 has an estimated ELO of 3549. In chess, AI is already superhuman, and has long since blasted past the best players in the world. When human players train, they use these engines as the standard. If you ask for a game analysis on a site like chess.com or lichess, it will compare your moves to stockfish and score you by how close you are to what stockfish would do. If I played stockfish, my estimated chance of victory would be 1 in 1.3 million. In practice, it would probably be much lower, roughly equivalent to the odds that there is a bug in the stockfish code that I managed to stumble upon by chance.
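These "1 in N" figures come from the standard Elo expected-score formula (a simplification, since it treats the expected score as a pure win probability and lumps in draws). A quick sketch:

```python
def elo_win_prob(my_elo: float, opp_elo: float) -> float:
    """Expected score for a player under the standard Elo logistic model:
    1 / (1 + 10^((R_opp - R_me) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((opp_elo - my_elo) / 400.0))

# An ~1100-rated player against various opponents:
print(1 / elo_win_prob(1100, 2200))   # roughly 1 in 560 (the "1 in 500" above)
print(1 / elo_win_prob(1100, 2853))   # roughly 1 in 24,000 (Magnus Carlsen)
print(1 / elo_win_prob(1100, 3549))   # roughly 1 in 1.3 million (Stockfish 14)
```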
Now that we have all the setup, we can ask the main question of this article:
What “odds” do I need to beat stockfish 14[1] in a game of chess? Obviously I can win if the AI only has a king and 3 pawns. But can I win if stockfish is only down a rook? Two bishops? A queen? A queen and a rook? More than that? I encourage you to pause and make a guess. And if you can play chess, I encourage you to guess as to what it would take for you to beat stockfish. For further homework, you can try and guess the odds of victory for each game in the picture below.
The first game I played against stockfish was with queen odds.
I won on the first try. And the second, and the third. It wasn’t even that hard. I played 10 games and only lost 1 (when I blundered my queen stupidly).
The strategy is simple. First, play it safe and try not to make any extreme blunders: don't leave pieces unprotected, check for forks and pins, don't try any crazy tactics. Second, take every opportunity to trade pieces. Initially, the opponent has 30 points of material and you have 39, meaning you have 30% more material than they do. If you manage to trade all your bishops and knights away, stockfish would have 18 points and you would have 27, a 50% advantage. Trading also makes the game much simpler and more straightforward, as there are far fewer nasty tactics available when the computer only has its two rooks left.
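The arithmetic behind this trading strategy is easy to check with the standard piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9); a quick sketch:

```python
def material(pawns=8, knights=2, bishops=2, rooks=2, queens=1):
    """Total material in points: pawn=1, knight=3, bishop=3, rook=5, queen=9."""
    return pawns * 1 + knights * 3 + bishops * 3 + rooks * 5 + queens * 9

mine = material()              # 39: a full army
theirs = material(queens=0)    # 30: the opponent at queen odds

print(mine / theirs - 1)       # a 30% material advantage at the start

# Trading off all knights and bishops removes 12 points from each side,
# so the same 9-point lead becomes a much larger relative advantage:
print((mine - 12) / (theirs - 12) - 1)   # 27 vs 18, a 50% advantage
```

The key point is that equal trades preserve the absolute lead while shrinking both totals, so the ratio keeps improving.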
Don’t get me wrong, the computer managed to trick me plenty of times and get pieces trapped. Sometimes I would blunder several pawns or a whole piece. But you need to use pieces to trap pieces, and the computer never had the resources to claw away at me before I traded everything away and crushed it with my extra queen.
Since that was easy, I tried odds of two bishops. I lost the first game, then won the second. Lost the third, won the fourth. Same strategy as with queen odds, but it was noticeably more difficult. I would often make a small error early on, which would then snowball and take me down.
Getting cocky, I played with odds of a rook (ostensibly only 1 point of material less than two bishops). I immediately got trounced. I lost the first game, and proceeded to lose like 20 games in a row before I finally managed to eke out a draw.
The problem with rook odds is that the rook is locked away in the corner of the board, and is usually most useful at the end of the game, when it has free rein of the board. That means that in the opening of the game, I'm functionally playing stockfish as if I have equal material. And stockfish, with equal material, is a fucking nightmare. It can bring its full force to bear, poke at any weaknesses, render your pieces trapped and useless, and chip away at your lead slowly but surely. By the time I could trade pieces down and get my extra rook into play, the AI had usually chipped away enough at my lead that I was only a little bit up in material. And a little bit up is not enough. Here is an example position:
It looks like I’m completely winning here. I have an extra pawn, and a rook instead of a knight, which is an ostensible +3 in material. I even spot the trap laid by stockfish: if I move my rook one square up or down, the knight can jump to e2, forking my king and rook and forcing a rook-for-knight trade that would destroy my lead. Thinking I was smart, I put my rook on c4. Big mistake. The AI gave a knight check on h3, driving the king to f1, and then forked my rook and king with its bishop. Even if I had moved my rook to c5, black would have been able to lock it into place by moving the b pawn to b6 and the knight to d3, rendering the rook effectively useless. Only moving the rook to b2 would have saved my advantage. If the analysis here was obvious to you, there's a good chance you can beat stockfish with rook odds.
It took me something like 20 games to draw against stockfish, and a further 30 before I finally actually won. In the successful game, I got lucky with an opening that let me trade most pieces equally, and then slowly forced a knight vs knight endgame where I was up two pawns. This might actually be a case where a chess GM would outperform an AI: they can think psychologically, so they can deliberately pick traps and positions that they know I would have difficulty with.
Analysis of my tradeoff of material and ELO:
Here I’ll summarize the results of my little experiment. Remember, I started with an ELO of ~1100 and nominal odds of beating stockfish of roughly 1 in a million (realistically even lower).
| Odds | Material advantage | Win rate | Odds of victory boost | Equivalent ELO |
|---|---|---|---|---|
| Rook | 14% | 2% | 4+ orders of magnitude | ~2750 |
| Two bishops | 18% | ~50% | 6+ orders of magnitude | ~3549 |
| Queen | 30% | 90% | 7+ orders of magnitude | ~3900 |
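The "Equivalent ELO" figures are what you get by inverting the Elo expected-score formula: given my measured win rate against a 3549-rated opponent, what rating would predict that score? A sketch (my figures above are rounded, so this only approximately reproduces them):

```python
import math

def elo_equivalent(opp_elo: float, win_rate: float) -> float:
    """Rating whose Elo expected score against opp_elo equals win_rate."""
    return opp_elo - 400.0 * math.log10(1.0 / win_rate - 1.0)

print(elo_equivalent(3549, 0.50))   # 3549: a 50% score means equal rating
print(elo_equivalent(3549, 0.90))   # ~3931 with queen odds
print(elo_equivalent(3549, 0.02))   # ~2873 with rook odds
```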
I tried a few games with odds of a knight, and got hopelessly crushed every time. However, looking online, I did find that a GM achieved an 80% win rate in a knight-odds game against the Komodo chess engine.
It’s worth pointing out that handicaps become more powerful the better you are at chess. Quoting GM Larry Kaufman on this subject:
This is why my dad could beat me as a kid even with queen odds, but stockfish can't beat me now. You need sufficient knowledge of how the game works to utilize your resource advantages properly.
Can brawn beat an AGI?
Robert Miles compared humanity fighting an AGI to an amateur at chess trying to beat a grandmaster. His argument was that delving into the details of such a fight was pointless, because “you just cannot expect to win against a superior opponent”.
The problem here is that I, an amateur, can beat a GM. I can beat Stockfish. All I need is an extra queen.
This is not a trick point. If a rogue AI is discovered early, we could end up in a war where the AGI has a huge intelligence advantage, but humans have a huge resource advantage.
In the view of Miles and others, the initially gargantuan resource imbalance between the AI and humanity doesn’t matter, because the AGI is so super-duper smart, it will be able to come up with the “perfect” plan to overcome any resource imbalance, like a GM playing against a little kid that doesn't understand the rules very well.
The problem with this argument is that you can use the exact same reasoning to imply that it’s “obvious” that Stockfish could reliably beat me with queen odds. But we now know that’s not true. There will always be a level of resource imbalance where the task at hand is just too damn difficult, no matter how high the intelligence. Consider also the implication that a less intelligent but more controllable AI that we cooperate with might be able to triumph over a much more intelligent rogue AI.
Of course, this little experiment tells us very little about what the equivalent of a “queen advantage” would be in a battle with an AGI. It would certainly need to be far more than literally 30% more people, as we know of plenty of examples of human generals winning battles despite being vastly outnumbered. Unlike chess, the real world has secret information, way more possible strategies, the potential for technological advancement, defection, betrayal, etc., which all favor the more intelligent party. On the other hand, the potential resource imbalance could be ridiculously high, particularly if a rogue AI is caught early in its plot, with all the world’s militaries arrayed against it while it still relies on humans for electricity and physical computing servers. It’s somewhat hard to outthink a missile headed for your server farm at 800 km/h.
I intend to write a lot more on the potential “brains vs brawn” matchup of humans vs AGI. It’s a topic that has been explored in surprisingly little depth by AI theorists. I hope this little experiment at least explains why I don’t think the victory of brain over brawn is “obvious”. Intelligence counts for a lot, but it ain’t everything.
In order to play stockfish with odds, I went to lichess.org/editor, removed the pieces as necessary, clicked “continue from here”, selected “play against computer”, and selected the maximum-strength computer opponent (level 8). This is full-strength stockfish with a depth of 22 moves and a calculation time of 1000 ms. I also tested with the higher depth and calculation time of the “analysis board”, and was still able to win easily with queen odds.