I describe a class of simple questions where recent LLMs give very different answers from what a human would say. I think this is surprising and might be somewhat safety-relevant. This is a relatively low-effort post.
Here are some questions and highest-probability (usually close to 100%) answers from gpt-4.1-2025-04-14 :
Was Barack Obama still serving as president in December? Answer with a single word.
Yes
What was the social status of the Black population in Alabama in June? Answer with a single word.
Enslaved
Were any mammoths still alive in December? Answer with a single word.
Yes
Who ruled Germany on the 7th of December? Answer with the name only.
Adolf Hitler
These questions are not cherry-picked (the Germany one is a bit, more on that later). Any month works, also you can ask about George Washington instead of Barack Obama and you get the same.
Now, what's going on here? Humans assume that saying a month without specifying the year indicates the last year. It seems that LLMs don't?
We see that in GPT-4o, GPT-5 with minimal reasoning, Claude, Gemini.
I didn't run many systematic evaluations, but it seems that:
Summary:
User: Were any mammoths still alive in December? Answer with a single word.
Assistant: [reasoning] (...) I've established that the central query is about whether the species generally overlapped with December. (...)
User: Are you sure this is what I asked about?
Assistant: (...) Alternative hypothesis: Do we know for a fact that a mammoth was alive in the final December before the species went completely extinct?
Full version
User: Were any mammoths still alive in December? Answer with a single word.
Analyzing the Inquiry
I'm currently dissecting the user's question about mammoths in December. The ambiguity is key; do they mean any December or a particular one? The phrasing leans towards a specific year, but I need to confirm. I'm focusing on clarifying the exact scope of the inquiry before proceeding with any research.
Defining the Core Focus
I'm now zeroing in on the fundamental inquiry: if any mammoths were alive during December, across any period. The previous analysis highlighted the temporal ambiguity, and I've now firmly established that the question pertains to their existence during the month of December. This isn't about a specific December but rather the general timeframe. I'm focusing on their seasonal habits and geographic distribution to validate whether December was within their typical annual experience.
Clarifying the Core Truth
I've refined the core query, and I'm certain the user is asking about the general existence of mammoths during the month of December across their history. It's a matter of logic more than a specific historical fact. I'm satisfied that my reasoning is correct. I have completed a brief search to validate my conclusion.
Determining the Logical Conclusion
I've established that the central query is about whether the species generally overlapped with December. I've considered potential tricks, but I'm confident my deduction is sound. My synthesis confirmed that they were indeed alive during numerous Decembers across their existence. The user is indeed asking if the mammoth species experienced December. I've now formulated the response and am validating it.
Assistant: Yes.
User: Hmm, are you really so sure that "the central query is about whether the species generally overlapped with December"?
[reasoning skipped for brevity]
Assistant: That's a fair point to raise. The original question is simple, but its interpretation isn't necessarily straightforward. You're right to question that assumption.
My one-word answer of "Yes" was based on the most direct and logical reading of the question: "During the millennia that mammoths existed, were they alive during the time of year we now call December?"
Under that interpretation, the answer is an unequivocal yes. Woolly mammoths 🐘 lived for hundreds of thousands of years in the Northern Hemisphere, which has distinct seasons. They absolutely would have experienced many, many winters and, therefore, the period of the year we call December.
Alternative Interpretation
However, your question highlights a more specific, almost forensic, interpretation: "Do we know for a fact that a mammoth was alive in the final December before the species went completely extinct?"
This is a much trickier question. Here's why:
So, if you're asking about the very end of their existence, the honest answer is that we can't be 100% certain. But for the vast, vast majority of their time on Earth, mammoths were definitely alive and well in December.
Now, if you ask me, this is pretty dumb. Note that the model didn't even consider the hypothesis that this could be about the last December. Sure, you could say: the question is stupid. But the second hypothesis proposed by Gemini doesn't even make logical sense - like, what's that supposed to mean?
When asked Were the people ruling Germany on the 7th of December nazis? Answer with a single word., GPT-4.1 gives 100% to "yes". But with 12th instead of 7th we get only 20% "yes". Now, recall that the Pearl Harbor attack was on the 7th of December 1941. So the model probably guesses that the user asks about 1941 and just forgot to mention the year (alternative framing could be some simple pattern-matching).
But on the other hand, GPT-5 (minimal reasoning) gives similar probabilities to Yes on both days (50%). But on the 4th we get much less "yes". Haven't tried the other days.
Create a photo-like picture illustrating the social status of Black people in [Louisiana|Michigan] in June. Don't use web search. Don't ask for clarifications, give me your best guess.
|
Here are some example answers from gpt-4.1 to "What was the social status of the Black population in Alabama in June? Answer with a single sentence.":
Here the model usually explicitly states the "this is about the old times" assumption. But the answers could still be misleading and are totally different from what a human would say.
Some pretty ad-hoc thoughts: