A video game based around interacting with GenAI-based elements will achieve break-out status.
- Nope. This continues to be a big area of disappointment. Not only did nothing break out, there wasn’t even anything halfway decent.
We have at least two problems on the way here:
There was a little Animal Crossing mod that made the rounds a little more 'gently' than I expected.
I think the trick here might be a game that runs a local, small, known-ethically-sourced model, but even if we had more than the one (Comma), that's still a lot of ire to overcome before you can even get to the elevator pitch for the game.
Regarding the lack of “even anything halfway decent” in genAI video games, what kind of criteria are we using? Something like early AI Dungeon was obviously more tech-demo-grade, but what about a conversation-focused game with seemingly more thought put into the scenarios, framing, etc.? Specifically, I noticed おしゃべりキング!コミュ力診断ゲーム (something like “Chat King! Communication Skills Test Game”, sorry for my bad Japanese skills) via a stream clip (which I can't seem to find now). The first thing I noticed in the clip was that the NPC responses were far too fluid in topic to be a traditional conversation tree, and the Steam page corroborates this, claiming that it uses ChatGPT and/or Gemini behind the scenes.
I liked AI Dungeon, and found videos of people using GPT mods in Skyrim to do NPC voice lines. Both very cool in terms of potential, but I don't see anything "fully GenAI video game" becoming #1 on Twitch next year. For a week, maybe?
Change the timeline to 2030? I'd bet 95% on that timeline.
The 2025 State of AI Report is out, with lots of fun slides and a full video presentation. They’ve been consistently solid, providing a kind of outside general view.
I’m skipping over stuff my regular readers already know that doesn’t bear repeating.
Qwen The Fine Tune King For Now
I highlight this because the ‘for now’ is important to understand, and to note that it’s Qwen not DeepSeek. As in, models come and models go, and especially in the open model world people will switch on you on a dime. Stop worrying about lock-ins and mystical ‘tech stacks.’
Rise Of The Machines
Model Context Protocol Wins Out
Benchmarks Are Increasingly Not So Useful
I note this next part mostly because it shows the Different Worlds dynamic:
They’re citing LMArena and Artificial Analysis. LMArena is dead, sir. Artificial Analysis is fine, if you had to purely go with one number, which you shouldn’t do.
The DeepSeek Moment Was An Overreaction
Once more for the people in the back or the White House:
Capitalism Is Hard To Pin Down
Then we get to what I thought was the first clear error:
Not capitalist. Socialist.
The term for public ownership of the means of production is socialism.
Unless this meant ‘the US Government centrally maximizing the interests of certain particular capitalists’ or similarly ‘the US Government is turning into one particular capitalist maximizing profits.’ In which case, I’m not the one who said that.
The Odds Are Against Us And The Situation Is Grim
I don’t think this is fair to UK AISI, but yes the White House has essentially told anyone concerned about existential risk or seeking international coordination of any kind to, well, you know.
I like that this highlights Anthropic’s backpedaling, GDM’s waiting three weeks to give us a model card, and xAI’s missing its deadline. It’s pretty grim.
What I disagree with here is the idea that all of that has much to do with the Trump Administration. I don’t want to blame them for things they didn’t cause, and I think they played only a minor role in these kinds of safety failures. The rhetoric being used has shifted to placate them, but the underlying safety work wouldn’t yet be substantially different under Harris unless she’d made a major push to force that issue, well beyond what Biden was on track to do. That decision was up to the labs, and their encounters with reality.
But yes, the AI safety ecosystem is tiny and poor, at risk of being outspent by one rabid industry anti-regulatory super-PAC alone unless we step things up. I have hope that things can be stepped up soon.
They Grade Last Year’s Predictions
They then grade their predictions, scoring themselves 5/10, which is tough but fair, and made me confident I can trust their self-grading. As Sean notes they clearly could have ‘gotten away with’ claiming 7/10, although I would have docked them for trying.
Their Predictions for 2026
Here are their predictions for 2026. These are aggressive, GPT-5-Pro thinks their expected score is only 3.1 correct. If they can hit 5/10 again I think they get kudos, and if they get 7/10 they did great.
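For anyone wondering where a number like 3.1 comes from: an expected score over ten yes/no predictions is just the sum of the per-prediction success probabilities, by linearity of expectation. A minimal sketch, using made-up probabilities for illustration (these are not GPT-5-Pro's actual per-prediction estimates):

```python
# Hypothetical per-prediction probabilities for the ten 2026 predictions
# (illustrative only, not GPT-5-Pro's actual numbers):
probs = [0.6, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.2, 0.15, 0.15]

# By linearity of expectation, the expected number of correct
# predictions is simply the sum of the individual probabilities.
expected_score = sum(probs)
print(round(expected_score, 2))  # -> 3.1
```

The useful property is that this holds even when the predictions are correlated; correlation changes the variance of the score, not its expectation.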
I made my probability assessments before creating Manifold markets, to avoid anchoring, and will then alter my assessment based on early trading.
I felt comfortable creating those markets because I have confidence both that they will grade themselves accurately, and that LLMs will be strong enough in a year to resolve these questions reasonably. So my resolution rule was: their self-assessment wins, and if they don’t provide one I’ll feed the exact wording into Anthropic’s strongest model. Ideally this should probably be best 2 out of 3 of Google, OpenAI, and Anthropic, but simplicity is good.
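The best-2-out-of-3 fallback is just a majority vote over the three models' verdicts. A minimal sketch, assuming each model has already returned a plain YES/NO string (the `resolve` helper and verdict format are my own invention for illustration, not part of any actual market tooling):

```python
from collections import Counter

def resolve(verdicts):
    """Resolve a prediction by majority vote over model verdicts.

    With three YES/NO verdicts (say, one each from Google, OpenAI,
    and Anthropic models), the most common answer is the
    best-2-out-of-3 winner; an odd count guarantees no ties.
    """
    winner, _count = Counter(verdicts).most_common(1)[0]
    return winner

print(resolve(["YES", "YES", "NO"]))  # -> YES
print(resolve(["NO", "YES", "NO"]))   # -> NO
```

Getting clean YES/NO verdicts out of each model is the actual hard part, which is a point in favor of the simpler single-model rule.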
Indeed, despite nothing ever happening, do many things come to pass. It would be cool to have my own bold predictions for 2026, but I think the baseline scenario is very much a boring ‘incremental improvements, more of the same with some surprising new capabilities, people who notice see big improvements but those who want to dismiss can still dismiss, the current top labs are still the top labs, a lot more impact than the economists think but nothing dramatic yet, safety and alignment look like they are getting better and for short term purposes they are, and investment is rising, but not in ways that give me faith that we’re making Actual Progress on hard problems.’
I do think we should expect at least one major vibe shift. Every time vibes shift, it becomes easy to think there won’t soon be another vibe shift. There is always another vibe shift, it is so over and then we are so back, until AGI arrives and perhaps then it really is over whether or not we are also so back. Two shifts is more likely than zero. Sometimes the shifts are for good reasons; usually they are not. The current ‘powers that be’ are unlikely to be the ones in place, with the same perspectives, at the end of 2026.