The recent controversy with Anthropic and the DoW has shown pretty viscerally how US politics can impact AI, and these interactions will almost certainly grow more frequent and significant as AI continues to advance.
Something it brought to mind is how the timing of political events (particularly after the US midterms in 2026 and after the US presidential inauguration in January 2029) would likely act as a significant step change in the distribution of AI outcomes, and how that relates to the timing of AI reaching particular thresholds in capability and political salience.
edit: More specifically, I think predictions about the future, and actions based on those predictions, should take this discontinuity into account. I haven't really seen that, although I guess that, given the nature of the issue, it might be detrimental to account for it publicly.
Meta is delaying their Behemoth model launch because of disappointing evals.
This is another major lab (OpenAI and Anthropic have both experienced this too) that has seen disappointing results in trying to scale its model into the next generation via raw parameter count, which suggests to me that there really is some sort of soft or hard wall at this size. It's good news for people favoring a slowdown/pause, though of course there is now RL to pursue. I am genuinely curious what's going on, though. Maybe the issue is just getting enough high-quality tokens, and synthetic data is too hard to get; or it could be a qualitative shift, like a reversal of the original change with LLMs.
I definitely think this should update the priors of RSI folks though, because if these sorts of barriers keep cropping up along different avenues of scaling, I would expect linear increases in intelligence rather than exponential ones.
This is also somewhat commingled with the lab-specific issues at Meta that seem very hard to ignore now (i.e., bad execution), so maybe this particular instance shouldn't update you too much, but it is still of note.
Mythos has been rolled out to trusted users for a decent amount of time now, and is apparently being accessed by unauthorized users as well, so I'm curious whether there is any testimony on its capabilities from non-Anthropic employees.
I think Anthropic's own claims are probably legit. I'm mostly just curious because I haven't encountered any such testimony, which is surprising to me, and it would be interesting to see how it feels compared to the public SOTA.