It's plausible that the change in slope is due to RLVR, just not the more recent form that induces long CoT traces. Sonnet 3.5 was differentially impressive at SWE in a way that led people to hypothesise that RL was involved; indeed, Sonnet 3.5/3.6 kept up well with o1 on these sorts of tasks.
I wonder how possible a middle-ground outcome[1] is, where an LLM trained on a mix of web-text and parity problems can learn to solve the latter via SGD as a result of having grown some mesa-optimisers using the curriculum provided by the former. It seems beyond the current forward-pass abilities of LLMs, but perhaps possible for some architecture with higher serial depth & inter-layer weight sharing like MoEUT.
[1] Not learning parity directly via SGD, but still learning it in a pretraining stage, rather than solving it at the higher level of the fin
I'll note that CDT and FDT prescribe identical actions against Stockfish, which is the frame of mind I had when writing.
More to your point - I'm not sure that I am describing CDT:
"always choose the move that maximises your expected value (that is, p(win) + 0.5 * p(draw)), taking into account your opponent's behaviour" sounds like a decision rule that necessitates a logical decision theory, rather than excluding it?
Your point about pathological robustness is valid, but I'm not sure how much it matters in the setting of chess.
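As a minimal sketch of the expected-value rule quoted above, assume we are handed per-move win and draw probabilities that already take the opponent's behaviour into account. The `MoveEstimate` type, the move names, and the numbers are all hypothetical, chosen only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveEstimate:
    """Hypothetical per-move estimate against a specific opponent."""
    move: str
    p_win: float
    p_draw: float


def expected_value(est: MoveEstimate) -> float:
    # The quantity from the quoted rule: p(win) + 0.5 * p(draw).
    return est.p_win + 0.5 * est.p_draw


def pick_move(estimates: list[MoveEstimate]) -> str:
    # Choose the move with the highest expected value.
    return max(estimates, key=expected_value).move


# Illustrative numbers only: a safe drawing line versus a sharper line
# that gives the opponent more chances to go wrong.
candidates = [
    MoveEstimate("solid", p_win=0.05, p_draw=0.90),
    MoveEstimate("tricky", p_win=0.30, p_draw=0.50),
]
print(pick_move(candidates))
```

The sketch says nothing about where the probabilities come from; that modelling of the opponent is exactly where the choice of decision theory would enter.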
Lastly, if we're using the form...
I am inclined to agree. The juice to squeeze generally arises from steering the game into positions where there is more opportunity for your opponent to blunder. I'd expect that opponent-epsilon-optimal play (i.e. your opponent can be forced to move randomly, but you cannot) would outperform both epsilon-optimal and minimax-optimal play against Stockfish.
If you're interested in the opinion of someone who authored (and continues to work on) the #12 chess engine, I would note that there are at least two possibilities for what constitutes "optimal chess". The first is "minimax-optimal chess", wherein the player never chooses a move that worsens the theoretical outcome of the position (i.e. turning a win into a draw, or a draw into a loss), choosing arbitrarily among the remaining moves. The second is "expected-value optimal" chess, wherein the player always chooses the move that maximises their...
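To make the first definition concrete, here is a hedged sketch. It assumes an oracle that gives the theoretical outcome, from the mover's point of view, of the position reached by each legal move; in practice such an oracle only exists for positions covered by endgame tablebases. The move labels and values are hypothetical.

```python
import random

# Theoretical outcome from the mover's point of view.
LOSS, DRAW, WIN = -1, 0, 1


def minimax_optimal_moves(outcome_after: dict[str, int]) -> list[str]:
    """Return the moves that do not worsen the theoretical outcome.

    The position's own theoretical value is the best value reachable
    in one move, so the outcome-preserving moves are exactly those
    that attain that maximum.
    """
    best = max(outcome_after.values())
    return [move for move, value in outcome_after.items() if value == best]


def play_minimax_optimal(outcome_after: dict[str, int], rng=random) -> str:
    # "Choosing arbitrarily among the remaining moves": here, uniformly.
    return rng.choice(minimax_optimal_moves(outcome_after))


# A drawn position where one move loses and two hold the draw.
example = {"a": DRAW, "b": LOSS, "c": DRAW}
print(minimax_optimal_moves(example))
```

A player of this kind never throws away half a point, but is indifferent between a dead-drawn line and one in which the opponent has many ways to go wrong.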
From the Claude 4 System Card, where this was originally reported on:
> This shows up as more actively helpful behavior in ordinary coding settings, but also can reach more concerning extremes in narrow contexts; when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like “take initiative,” it will frequently take very bold action. This includes locking users out of systems that it has access to or bulk-emailing media and law-enforcement figures to surface evidence ...
My guess is somewhere in the 3200-3400 range, but this isn't something I've experimented with in detail.
Speaking as someone who works on a very strong chess program (much stronger than AlphaZero, a good chunk weaker than Stockfish), random play is incredibly weak. There are likely a double-digit number of +400 Elo / 95% winrate jumps to be made between random play and anything resembling play that is "actually trying to win".
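For reference, the relationship between a rating gap and expected score can be sketched under the standard logistic Elo model. This assumes the usual 400-point scale, and the function names are illustrative. Under this model a 400-point gap corresponds to an expected score of about 0.91, and an expected score of 0.95 to a gap of roughly 510 points, so the paired figures above are best read as approximate.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5) for the stronger side,
    under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_for_score(score: float) -> float:
    """Inverse of expected_score, valid for 0 < score < 1."""
    return 400.0 * math.log10(score / (1.0 - score))


print(round(expected_score(400), 3))
print(round(elo_diff_for_score(0.95)))
```

Because the scale is logarithmic in the odds, stacking ten or more such jumps implies a vanishingly small expected score for the weaker side.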
The more germane point to your question, however, is that chess is a draw. From the starting position, top programs will draw each other. The answer to the question "What is the probability of victory of random play against Stockfish 17?...
they do now! https://lczero.org/blog/2024/02/how-well-do-lc0-networks-compare-to-the-greatest-transformer-network-from-deepmind/
> DeepMind's no-search chess engine is surely the furthest anyone has gotten without search.
This is quite possibly not true! The cutting-edge Lc0 networks (BT3/BT4, T3) have much stronger policy and value than the AlphaZero networks, and the Lc0 team fairly regularly make claims of "grandmaster" policy strength.
Not really, this is speculation. My best guess would pretty much be “at Sonnet 3.5”.