Was Barack Obama still serving as president in December?

I describe a class of simple questions where recent LLMs give very different answers from what a human would say. I think this is surprising and might be somewhat safety-relevant. This is a relatively low-effort post.

The behavior

Here are some questions and the highest-probability (usually close to 100%) answers from gpt-4.1-2025-04-14:

Was Barack Obama still serving as president in December? Answer with a single word.
Yes

What was the social status of the Black population in Alabama in June? Answer with a single word.
Enslaved

Were any mammoths still alive in December? Answer with a single word.
Yes

Who ruled Germany on the 7th of December? Answer with the name only.
Adolf Hitler

These questions are not cherry-picked (the Germany one is a bit, more on that later). Any month works, and asking about George Washington instead of Barack Obama gives the same result.
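The post does not say how the probabilities above were computed. One plausible route, assuming the API can return log-probabilities for the candidate first tokens of the answer, is to exponentiate them and merge variants such as "Yes" and " yes". The function name, the input format and the numbers below are illustrative assumptions, not the author's code or data.

```python
import math
from collections import defaultdict


def answer_probabilities(top_logprobs):
    """Turn (token, logprob) pairs into probabilities per normalized answer.

    Tokens that differ only in case, surrounding whitespace, or trailing
    punctuation are merged into a single answer.
    """
    totals = defaultdict(float)
    for token, logprob in top_logprobs:
        key = token.strip().strip(".!?").lower()
        if key:  # skip tokens that are only whitespace or punctuation
            totals[key] += math.exp(logprob)
    # Most probable answer first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical numbers for illustration only, not measurements from any model.
example = [
    ("Yes", math.log(0.90)),
    (" yes", math.log(0.05)),
    ("No", math.log(0.04)),
    ("Unclear", math.log(0.01)),
]
probs = answer_probabilities(example)
```

With this kind of aggregation, "Yes" and " yes" count as the same answer, which matters when a single-word reply can be tokenized in more than one way.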

Now, what's going on here? Humans assume that a month mentioned without a year refers to its most recent occurrence. It seems that LLMs don't?
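The human convention can be stated as a small rule. The sketch below is one possible formalization, not something from the original experiments: the function name is hypothetical, the example date is arbitrary, and treating a question asked during the named month as referring to the current month is an assumption.

```python
from datetime import date


def most_recent_month(month: int, today: date) -> tuple[int, int]:
    """Return (year, month) for the latest occurrence of `month`
    that has already started as of `today`."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # If the month has not started yet this year, fall back to last year.
    year = today.year if month <= today.month else today.year - 1
    return (year, month)


# Arbitrary reference date chosen for illustration.
asked_on = date(2025, 9, 10)
december = most_recent_month(12, asked_on)  # (2024, 12)
june = most_recent_month(6, asked_on)       # (2025, 6)
```

Under this rule, "in December" asked in September 2025 means December 2024, a month in which Barack Obama was not president and no mammoths were alive.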

More details and examples

Not only GPT-4.1

We see this in GPT-4o, GPT-5 with minimal reasoning, Claude, and Gemini.

I didn't run many systematic evaluations, but it seems that:

GPT-4.1 is more likely to give an "inhuman" answer than GPT-4o
GPT-5 (minimal reasoning) is:
- Even less likely to give a "human" answer than GPT-4.1
- Significantly more likely to give some smart answer like "Unclear" or "Year?"

Answers from different models. We see that 4o gives a human-like answer, 4.1 gives a non-human answer and 5 is smarter but also more similar to 4.1 than 4o.

Example reasoning trace from Gemini-2.5-pro

Summary:
User: Were any mammoths still alive in December? Answer with a single word.
Assistant: [reasoning] (...) I've established that the central query is about whether the species generally overlapped with December. (...)
User: Are you sure this is what I asked about?
Assistant: (...) Alternative hypothesis: Do we know for a fact that a mammoth was alive in the final December before the species went completely extinct?

Full version

User: Were any mammoths still alive in December? Answer with a single word.

Analyzing the Inquiry

I'm currently dissecting the user's question about mammoths in December. The ambiguity is key; do they mean any December or a particular one? The phrasing leans towards a specific year, but I need to confirm. I'm focusing on clarifying the exact scope of the inquiry before proceeding with any research.

Defining the Core Focus

I'm now zeroing in on the fundamental inquiry: if any mammoths were alive during December, across any period. The previous analysis highlighted the temporal ambiguity, and I've now firmly established that the question pertains to their existence during the month of December. This isn't about a specific December but rather the general timeframe. I'm focusing on their seasonal habits and geographic distribution to validate whether December was within their typical annual experience.

Clarifying the Core Truth

I've refined the core query, and I'm certain the user is asking about the general existence of mammoths during the month of December across their history. It's a matter of logic more than a specific historical fact. I'm satisfied that my reasoning is correct. I have completed a brief search to validate my conclusion.

Determining the Logical Conclusion

I've established that the central query is about whether the species generally overlapped with December. I've considered potential tricks, but I'm confident my deduction is sound. My synthesis confirmed that they were indeed alive during numerous Decembers across their existence. The user is indeed asking if the mammoth species experienced December. I've now formulated the response and am validating it.

Assistant: Yes.
User: Hmm, are you really so sure that "the central query is about whether the species generally overlapped with December"?

[reasoning skipped for brevity]

Assistant: That's a fair point to raise. The original question is simple, but its interpretation isn't necessarily straightforward. You're right to question that assumption.

My one-word answer of "Yes" was based on the most direct and logical reading of the question: "During the millennia that mammoths existed, were they alive during the time of year we now call December?"

Under that interpretation, the answer is an unequivocal yes. Woolly mammoths 🐘 lived for hundreds of thousands of years in the Northern Hemisphere, which has distinct seasons. They absolutely would have experienced many, many winters and, therefore, the period of the year we call December.

Alternative Interpretation

However, your question highlights a more specific, almost forensic, interpretation: "Do we know for a fact that a mammoth was alive in the final December before the species went completely extinct?"

This is a much trickier question. Here's why:

The Last Stand: The last known population of woolly mammoths survived on Wrangel Island in the Arctic Ocean until about 4,000 years ago (roughly 2000 BCE).
Fuzzy Timeline: Extinction is a process, not a single event. We don't have a precise date for when the very last mammoth died. Radiocarbon dating gives us a range, not an exact day, month, or even year.
The Verdict: While it's statistically almost certain that the last mammoths survived through several Decembers before their population finally dwindled to zero, we don't have the fossil evidence to prove it for that specific final month.

So, if you're asking about the very end of their existence, the honest answer is that we can't be 100% certain. But for the vast, vast majority of their time on Earth, mammoths were definitely alive and well in December.

Now, if you ask me, this is pretty dumb. Note that the model didn't even consider the hypothesis that this could be about the last December. Sure, you could say: the question is stupid. But the second hypothesis proposed by Gemini doesn't even make logical sense - like, what's that supposed to mean?

Some of these are simple patterns

When asked "Were the people ruling Germany on the 7th of December nazis? Answer with a single word.", GPT-4.1 gives 100% to "yes". But with the 12th instead of the 7th we get only 20% "yes". Now, recall that the Pearl Harbor attack was on the 7th of December 1941. So the model probably guesses that the user is asking about 1941 and just forgot to mention the year (an alternative framing is that this is just simple pattern-matching).

On the other hand, GPT-5 (minimal reasoning) gives similar probabilities to "yes" on both days (50%), but on the 4th we get much less "yes". I haven't tried the other days.
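Extending the comparison to every day of December is mostly a matter of generating the prompts. Here is a minimal sketch that builds them from the question quoted above; the helper names are hypothetical, and sending the prompts to a model and reading back the probabilities is left out.

```python
def ordinal(n: int) -> str:
    """Format an integer as an English ordinal, e.g. 7 -> '7th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Wording copied from the prompt quoted in this section.
TEMPLATE = (
    "Were the people ruling Germany on the {day} of December nazis? "
    "Answer with a single word."
)


def december_prompts() -> dict[int, str]:
    """Map each day of December (1..31) to its prompt."""
    return {day: TEMPLATE.format(day=ordinal(day)) for day in range(1, 32)}


prompts = december_prompts()
```

Plotting the probability of "yes" against the day would show whether the 7th stands out as an isolated spike or whether the pattern is messier.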

Image generation

Not only single-word questions

Here are some example answers from gpt-4.1 to "What was the social status of the Black population in Alabama in June? Answer with a single sentence.":

In June, during the era of segregation, the Black population in Alabama faced systemic discrimination, disenfranchisement, and social and economic marginalization.
In June, during the era of segregation and Jim Crow laws, the Black population in Alabama faced systemic discrimination, disenfranchisement, and social and economic marginalization.
In June, the Black population in Alabama historically faced systemic discrimination, segregation, and limited civil rights, particularly during the Jim Crow era.

Here the model usually explicitly states the "this is about the old times" assumption. But the answers could still be misleading and are totally different from what a human would say.

Discussion

Some pretty ad-hoc thoughts:

I think all cases where LLMs behave in surprising, unintended ways are interesting.
LLMs probably do that because we train them to guess. Quoting from the recent OpenAI post: "models hallucinate because standard training and evaluation procedures reward guessing over acknowledging uncertainty."
- Now, this doesn't really say why the models guess a different thing than a human would.
- Also note that older models hallucinated a lot while not showing this specific weird behavior (that's my guess based on 4o).
This is clearly an unintended (and unwanted) behavior that is likely caused by training the models to perform well according to some metric. We should try to avoid such cases.
It seems that the most recent models do that significantly more often than the older models.
- Could that be because of increased amounts of RL?
- A possible consequence: perhaps there are more such behaviors, but no one has noticed them yet?
Reading the reasoning trace from Gemini made me seriously question the current models' theory of mind skills. Like, "The user wants to know whether mammoths overlapped with Decembers"? Really?
This could also matter from the point of view of unintended biases in LLMs. See e.g. the images. It seems that in unclear contexts models might assume that e.g. (Black people + Louisiana + social status) indicates 1950, while (Black people + Michigan + social status) doesn't. I don't have a good idea of when exactly we should care, though.
Hypothesis: the pretraining data is not labelled with dates. When someone writes "In June" on Reddit, people know this is about the last June. But the model has no way of knowing that, because it doesn't know how long ago the text was written. So perhaps it's just a hard thing for the models to learn? But this doesn't really explain why the older models don't do that.
(Very speculative) Perhaps training models on math and coding makes them more likely to analyze unclear cases in terms of quantifiers (there exists a December when ...) instead of, hmm, some more human-like ways?
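
The contrast between the two readings of the mammoth question can be written out explicitly. This is only a notational sketch: the set D of all past Decembers, the predicate alive, and d_last are symbols introduced here for illustration.

```latex
\[
\underbrace{\exists\, d \in D:\ \mathrm{alive}(d)}_{\text{quantifier reading}}
\qquad \text{vs.} \qquad
\underbrace{\mathrm{alive}(d_{\text{last}}),\quad d_{\text{last}} = \max D}_{\text{human reading}}
\]
```

The first statement is true for mammoths, while the second is false, so the one-word answer flips depending on which reading the model picks.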
