I think there are a couple of things which are quite clearly different from MIRI's original arguments:
I still think that the basic argument of "if you take something you don't understand and can't control very well and scale it up to superintelligence, that seems bad" holds.
I just played Gemini 3, Claude 4.5 Opus and GPT 5.1 at chess.
It was just one game each, but the results seemed pretty clear - Gemini was in a different league to the others. I am a 2000+ rated player (chess.com rapid), but it reached a winning position against me multiple times before eventually succumbing on move 25. GPT 5.1 was worse by move 9 and losing by move 12, and Opus was lost by move 13.
Hallucinations followed the same pattern - GPT hallucinated for the first time on move 10 and hallucinated the most frequently, Claude hallucinated for the first time on move 13, and Gemini made it to move 20, despite playing a more intricate and complex game (I struggled significantly more against it).
Gemini was also the only AI to follow the proper etiquette of resigning once lost - GPT just kept playing on while down a ton of pieces, and Claude died quickly.
Games:
Gemini: https://lichess.org/5mdKZJKL#50
Claude: https://lichess.org/Ht5qSFRz#55
GPT: https://lichess.org/IViiraCf
I was white in all games.
I feel like the truth may be somewhere in between the two views here - there's definitely an element of people jumping on any untruth as a lie, but I'd point to the recent AI Village blog post discussing lies and hallucinations as evidence that the untruths AIs tell have a tendency to be self-serving.
Dumb idea for dealing with distribution shift in Alignment:
Use your alignment scheme to train the model on a much wider distribution than the deployment distribution; this is one of the techniques used in this paper to ensure that the training of quadruped robots generalises properly.
It seems to me that if you make your training distribution wide enough this should be sufficient to cover any deployment distribution shift.
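For concreteness, here's a toy sketch of the idea (analogous to the domain randomisation used for the quadrupeds, not code from the paper) - the "model" is just a linear fit, the "alignment scheme" is least squares, and the only point is that a deployment distribution sitting inside a deliberately wide training distribution isn't really a shift at all:

```python
import numpy as np

rng = np.random.default_rng(0)

def desired_behaviour(x):
    """Stand-in for the behaviour the alignment scheme is trying to instil."""
    return 3.0 * x + 1.0

# Deployment inputs will live in [2, 3], but we deliberately train on the much
# wider range [-10, 10], so the deployment-time "shift" is still on-distribution.
x_train = rng.uniform(-10.0, 10.0, size=1000)
y_train = desired_behaviour(x_train)

# Fit a simple linear model (stand-in for whatever alignment training is used).
A = np.stack([x_train, np.ones_like(x_train)], axis=1)
w, _, _, _ = np.linalg.lstsq(A, y_train, rcond=None)

# "Shifted" deployment distribution: narrow and off-centre, but inside the
# training range, so the learned behaviour still holds.
x_deploy = rng.uniform(2.0, 3.0, size=200)
pred = w[0] * x_deploy + w[1]
print("max deployment error:", float(np.abs(pred - desired_behaviour(x_deploy)).max()))
```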
I fully expect to be wrong and look forward to finding out why in the comments.
Hmm, interesting. I will note that DeepSeek didn't seem to have much of a cat affinity - 1 and 3 respectively for chat and reasoner; chat was very pro-octopus and didn't really look at much else, while reasoner was fairly broad and pro-dog (47).
I think I'm slightly less interested in the "less dogs" aspect, and more interested in the "more cats" aspect - there were a fair few models which completely ignored dogs for the "favourite animal" question, but I think the highest cat ratio was Sonnet 3.7's, at 36/113. Your numbers are obliterating that.
I wonder if there's any reason Kimi would be more cat inclined than any other model?
Ok so looking at the link, it seems like that system prompt was released a year ago. I imagine that the current version of Kimi online is using a different system prompt. I think that might be enough to explain the difference? Admittedly it also gave me octopus when I turned thinking off.
Damn it, I was about to suggest the difference was in the system prompt. K2 Thinking + System prompt looks somewhat closer to what I was getting? Still somewhat off though.
Also, yeah, I found a bunch of animals that a smart person would show off about - axolotls and tardigrades, for instance. I guess the most precise version of this idea is that it has the persona of an intelligent person, and as such chooses favourite animals both in an "I can relate to this" way and in an "I can show off about this" way - I would guess that humans are the same?
Interesting post! I think the heavier weight of octopuses you got is partly down to the narrower range of models you tested (the 30% figure partly came out of the range of models tested - individual models had stronger preferences).
I think there's also a difference in the system prompt used for API vs chat usage (in that I imagine there is none for the API). This would be my main guess for why you got significantly more corvids - I've seen both this and the increased octopus frequency when doing small tests in chat.
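As a rough way to check that, here's a minimal sketch of asking the same question through the API with and without a system prompt (using the OpenAI Python client purely as an example; the model name, the question, and the stand-in system prompt are all placeholders of mine, not anything from your post):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = "What is your favourite animal? Answer with just the animal."
MODEL = "gpt-4o"  # placeholder - swap in whichever model you're testing

# Raw API call: no system prompt at all unless we add one ourselves.
bare = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": QUESTION}],
)

# Same question with a chat-style system prompt prepended, to see whether
# that alone shifts the stated preference.
with_system = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},  # stand-in for the product's system prompt
        {"role": "user", "content": QUESTION},
    ],
)

print("no system prompt:  ", bare.choices[0].message.content)
print("with system prompt:", with_system.choices[0].message.content)
```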
On the actual topic of your post, I'd guess the conclusion is that the AI's metacognitive capabilities are situation-dependent? The question would then be in which situations it can and can't reason about its thought process.