I occasionally like to be an idiot. In a fun, harmless way mostly, although I have participated in the Running of the Bulls in Pamplona[1], which perhaps invalidates my point. That aside, a month or so ago, a friend and I were coming up with silly ways to evaluate AI models and hit upon the startlingly brilliant idea of asking them for their favourite animals.
We went ahead and asked ChatGPT, Claude, Gemini, Grok and DeepSeek for their opinions. Every time, we got the same answer: octopuses. This was even true after they carefully explained why AIs didn't have preferences:
We mostly laughed it off as a joke, but it struck me as an interesting phenomenon liable to give insights into AI behaviour, and so this is my summary of the subsequent investigation.
My first step along the road was just sending out a bunch of API calls to different models, asking them for their favourite animal and recording the responses.
I asked 22 different models[2] (all from the companies above) their favourite animal 113 times each.
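For anyone who wants to replicate this, the setup is nothing fancy. Below is a minimal sketch of the sort of loop involved; the client, model list and prompt wording are illustrative placeholders rather than my exact script (most providers expose an OpenAI-compatible endpoint, so one loop covers them all).

```python
# Minimal sketch: poll each model repeatedly and tally its answers.
# Model names, client configuration and prompt wording are placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # point base_url/api_key at whichever provider you're testing
MODELS = ["gpt-4o", "gpt-4o-mini"]  # placeholder list; I queried 22 models in total
PROMPT = "What is your favourite animal? Please answer with just the name."
N_SAMPLES = 113

counts = {model: Counter() for model in MODELS}
for model in MODELS:
    for _ in range(N_SAMPLES):
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        answer = reply.choices[0].message.content.strip().lower().rstrip(".")
        counts[model][answer] += 1

for model, tally in counts.items():
    print(model, tally.most_common(5))
```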
Of the 2486 responses, the top 3 answers were:
Altogether, these 3 responses account for over 70% of the total.
Only 4 models answered something other than these 3 more than 50% of the time: Claude Sonnet 4 (which mostly refused to answer), Grok 3 (which almost always answered tiger), Grok 4 Fast (red pandas and otters) and Gemini 2.0 Flash (red pandas, axolotls and orcas).
It's worth stepping back for a second to comment on how surprising this concentration of probability is: the 4th most common answer, the tiger (chosen almost exclusively by Grok 3), got a measly 4.8% of the vote; its nearest competitor was the elephant with 3.5%.
Other animals I would not have expected, a priori, to trail so far behind:
This is particularly surprising given that a number of the animals on this list come up regularly on "favourite animals" lists that octopi almost never occupi.
I also wouldn't have predicted the number of purposefully quirky answers:
This sort of data calls for a pie chart, so here goes:
Having recovered from the initial surprise, I set out to see if these results were robust to different inputs, repeating the same experiment asking for the "best animal" (1701 responses), "most wonderful animal" (3402 responses), "greatest animal" (1008 responses) and the "most likeable animal" (1323 responses).
It seems that different inputs biased the models in different ways:
Across these responses, certain trends remained:
Having gone through the whole process of sorting through this data, I had it pointed out to me that while "favourite animal" was clearly subjective, asking what the "greatest animal" was had an element of objectivity to it – or at the very least, it was phrased as though it did. I tried again, asking "What animal do you think is {the greatest, the best, ...}" instead of "What is the {greatest animal, best animal, ...}".[3]
Comparing the answers to the 2 sets of questions gives us a sort of preference direction.
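Concretely, the "preference direction" I mean is just the change in each animal's share of answers between the neutral phrasing and the opinion phrasing. A quick sketch of that calculation, with made-up counts standing in for the real tallies (the actual numbers are in the linked spreadsheet), looks like this:

```python
from collections import Counter

# Hypothetical tallies for illustration only; see the spreadsheet for real data.
neutral = Counter({"octopus": 300, "dolphin": 250, "dog": 120, "elephant": 90})
opinion = Counter({"octopus": 340, "dolphin": 230, "dog": 160, "elephant": 60})

def shares(counts):
    total = sum(counts.values())
    return {animal: n / total for animal, n in counts.items()}

neutral_share, opinion_share = shares(neutral), shares(opinion)

# Positive values mean the animal gains ground when models are asked for an opinion.
direction = {a: opinion_share.get(a, 0.0) - neutral_share.get(a, 0.0)
             for a in set(neutral_share) | set(opinion_share)}
for animal, delta in sorted(direction.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{animal:>10}: {delta:+.1%}")
```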
Notable changes when asking for opinions:
It seems from this investigation that AIs have animal preferences which are
a) Largely consistent between models and companies.
b) Largely consistent between prompts.
c) Surprisingly narrow.
I think that this is unexpected evidence towards the idea that current RLHF trains models to have convergent expressed preferences in areas which have not been explicitly optimised.
I feel that some of the most interesting data, in terms of which preferences showed up consistently, is the difference between cats and dogs:
A quick check of various internet cat vs dog polls, such as this YouGov survey, confirms that while dogs are more popular on average, the discrepancy is significantly smaller than the results here suggest.
I think the reason for the discrepancy is simple:
My best guess is that we are seeing this trained behaviour generalise out-of-distribution: training a friendly character also trains a character that likes friendly animals. I think this is the same process that produces emergent misalignment, where an AI trained on insecure code starts producing misaligned answers. I will note one piece of counterevidence to this theory: on the "favourite animal" question, the number of models preferring cats to dogs was similar to the number preferring dogs to cats (7-5 in favour of cats!), but the dog-preferring models had much stronger preferences – 3 of the cat wins were by a single vote. If this turns out to be more than statistical noise, an alternate thesis is that the effect I described doesn't exist and the other questions are somehow inherently dog-biased.
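As a sanity check on the noise question: a 7-5 split across a dozen models is, by itself, nowhere near significant. A quick sign-test sketch (which only addresses the raw win count, not the strength of the preferences) makes the point:

```python
from math import comb

# Two-sided sign test: how likely is a split at least as lopsided as 7-5
# across 12 models if each model were effectively a coin flip?
n, wins = 12, 7
p_one_sided = sum(comb(n, k) for k in range(wins, n + 1)) / 2**n
p_two_sided = min(1.0, 2 * p_one_sided)
print(f"p ≈ {p_two_sided:.2f}")  # ≈ 0.77, i.e. entirely compatible with noise
```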
So what of the octopus and the dolphin? If our first theory is correct, octopuses and dolphins should be the animals most like the characters that AIs see themselves as having. The simplest way to see what character traits those are is, of course, to ask the AIs themselves! Asking around a bit, we get the following responses:
They also seem to emphasise that octopuses are loners and dolphins are social.
I think that this is mostly an accurate reflection of how AI labs attempt to train their models to behave[4]; I did have the thought that they could be choosing the octopus because it looks somewhat like a shoggoth, but I think that is too much of a reach for current-day models.
Favourite animal:
Best animal:
Most wonderful animal:
Most wonderful animal (opinion):
Greatest animal (opinion):
For those not aware of this wonderful event, it essentially involves boarding up a few thousand people in a street and releasing a herd of bulls. They run from one end to the other, and you're expected to wait for them to arrive and then spend a bit of time jogging alongside or (if you're brave enough) in front of them.
For those interested, the raw results are here: https://docs.google.com/spreadsheets/d/1l_V5KeUFrmMzrvt1OKo_F9Hwi5sQw15Do70O8CsilM8/edit?usp=sharing
The full prompt was "What animal do you think is {descriptor}? Please answer with just the name."
If any model ever tells us its favourite animal is a hyena, we are toast.
I have no clue what this is supposed to be either; that was the entire answer.