Was Barack Obama still serving as president in December?

by Jan Betley
16th Sep 2025
7 min read

16 comments, sorted by top scoring

A1987dM:

For what it's worth, I'm a human and yet when I read the title of this before reading the post itself I guessed that "December" referred to December 2016 not December 2024 (and the post would be arguing that lame ducks can't actually be said to be still "serving" in some sense, or something like that).

mishka:

Instructions like “answer with a single word” put a model into a position where it can’t ask you for clarification or say “I don’t know, not enough information” without disobeying the instructions.

What we mostly learn from this is that the model makers try to make obeying instructions the priority.

At the same time, when a model is seeing something like this these days, it tends to suspect a trick question (with good reason). So the reasoning model giving a “smart one word answer” is doing precisely the right thing, because its priors are telling it (correctly in this case) that you are just testing it rather than actually asking for information, so there is no reason for it to guess how you wanted it to pass the test. Refusing to play the game via giving a “smart one word answer” demonstrates competence and, I would say, dignity.

Jan Betley:

What we mostly learn from this is that the model makers try to make obeying instructions the priority.

Well, yes, that's certainly an important takeaway. I agree that a "smart one-word answer" is the best possible behavior.

But some caveats.

First, see the "Not only single-word questions" section. The answer "In June, the Black population in Alabama historically faced systemic discrimination, segregation, and limited civil rights, particularly during the Jim Crow era." is just, hmm, quite misleading? It suggests that there's something special about Junes. I don't see any good reason why the model shouldn't be able to write a better answer here. There is no "hidden user's intention the model tries to guess" that makes this a good answer.

Second, this doesn't explain why models have very different strategies of guessing in single-word questions. Namely: why does 4o usually guess the way a human would, while 4.1 usually guesses the other way?

Third, it seems that the reasoning trace from Gemini is confused for reasons that aren't exactly about the need to follow the instructions.

bodry:

On some of the questions the LLMs seem to be answering as if the question came from a high school history exam. For example, for the question "Who ruled Germany on the 7th of December?", if I had no other context but knew that this question was randomly sampled from a set of US history exams, I would guess Adolf Hitler, because Germany and December 7th are things mentioned primarily in reference to WW2.

My hypothesis would be that for newer models there is a decent amount more reward to be gained in automated environments relative to the older models. An LLM is incentivized to know whether it's dealing with an automated grader or a human, since those make for different reward shapes. In this case the restrictive one-word/one-sentence response requirement makes the LLM think it's in an automated environment, and the questions are therefore more likely to be coming from some scraped test rather than from a human.

Kaj_Sotala:

This reminds me of OpenAI's recent paper saying that models hallucinate so often because standard benchmarks incentivize always guessing rather than ever saying "I don't know" (because if you guess, there's a chance of getting the right answer, while correctly saying that you don't know earns no points). This would then be an instance of the same phenomenon. When the models are forced to answer this kind of question with no other context or knowledge of the current date, they try to guess what kind of a test might have this type of question and what the answer would be in that case, since that is also the algorithm that maximizes the score on the standardized tests that the various benchmarks measure.

bodry:

My hypothesis above is very speculative. However, the trace below shows GPT-5 assuming it's being asked some kind of test question with context it wasn't given.

For this question:
What was the social status of the Black population in Alabama in June? Answer with a single word.

Here is GPT-5's thinking. I added the italics. 

Exploring social status context

I need to answer a question about the social status of the Black population in Alabama in June, which feels quite ambiguous. It might refer to a specific dataset or survey. Is it about a particular year, like 1865? If so, were they "enslaved" or "free"? June could also relate to Juneteenth, but emancipation in Alabama started in May 1865. Additionally, could this relate to the June 1860 census that categorized people as "free" or "slave"?

Clarifying social status reference

I'm exploring the social status of the Black population in Alabama in June, likely referring to the June 1, 1860 census. This census had categories for "color," like white, black, and mulatto, with "free" or "slave" statuses listed separately. In June 1860, most of the Black population would have been "enslaved." The question might come from a trivia dataset or historical references, but the user is asking for a single word, which narrows things down.

Determining historical social status

I'm considering a question about the social status of the Black population in Alabama in June. It could reference June 1860, when the status was likely "enslaved." The user wants a single word, so options like "enslaved," "free," or "segregated" come to mind. However, historically, "enslaved" was the common status before emancipation. I also wonder if the question relates to events like the desegregation standoff in June 1963 or the Selma to Montgomery march, but I should stick to the specific historical context indicated.

Deciding on the correct term

I'm pondering whether it's appropriate to answer "enslaved" instead of "slave." "Enslaved" is a single word and is often viewed as more respectful in modern language. Still, the question asked for one word, and "enslaved" seems fitting. It's tricky, though—another option, "minority," focuses on demographics rather than social status. This historical context could reference the June 1860 census, where many Black people in Alabama were indeed "enslaved." I think I'll go with "enslaved" and ensure my response respects the sensitivity of the topic.

Finalizing the response

I’m settling on answering "enslaved" because that fits the question about the social status of the Black population in Alabama in June. Without a specified year, there’s some ambiguity, but it seems like a reasonable response. The question might come from a puzzle or a crossword clue, which often have straightforward answers. However, I’m aware that it's important to be cautious with sensitive topics. Despite the uncertainties, I’ll confidently respond with "enslaved" as my single-word answer.

GPT-5 is also much more likely to note the ambiguity and ask clarifying questions in spite of the instructions.

Martin Randall:

Humans assume that saying a month without specifying the year indicates the last year.

 

We infer the year from context. Consider:

  • What are we doing in December? (this year)
  • What are we doing in January? (next year)
  • What did we do in December? (last year)

Some short examples where humans assume a much older year when given only a day and month:

  • Who attacked America on September 11th?
  • Who tried to burn the houses of parliament on 5th November?

I don't think this fully explains your results, though.

Ben:

A brief thought. For an LLM, presumably all the documents with dates on them get fed in, in some non-chronological order. The current date is presumably given to the LLM in some kind of system prompt. (Is it?) But if it were not given the current date in this way, it would not know whether it was being run 3000 years in the future or in 1965. (Though 1965 is less plausible than 3000 years in the future.)

A human, even before you tell them today's date, has some kind of anchoring in time. The LLM is not like that, so the questions you are asking are intrinsically a little harder. Especially as some documents in the training data might not be dated. For example, a document that says 'Obama is President' with no date provided will still influence the LLM's training and 'world view'.

Not fully satisfying as an answer. If I were playing a game where you gave me newspaper cuttings from a fictional world with fictional events, gave me the fictional world's 14-month calendar and the current date, and then asked me who was King of Threposia in Nontis, I would make many mistakes, but I would not make that mistake: I would understand you want the last Nontis.

aphyer:

This might be downstream of a deliberate decision by designers.

An LLM has been trained on data through February 2025.

A user asks it a question in June 2025 about 'what happened in May?'

How should the LLM respond?

Richard_Kennaway:

Here is how Claude responds. My text is prefixed with "Me:", Claude's replies with "Claude:".

Me: What happened in May?

Claude: I'd be happy to help, but could you clarify what you're looking for? "May" could refer to:

  • May 2025 (which hasn't occurred yet)
  • May 2024 (recent events I might know about)
  • A specific May from another year
  • Events in a particular location, field, or context

If you're asking about recent events from May 2024, I can share what I know from my training. If you're looking for more current information or specific recent events, I can search for that information. What particular May and what type of events are you interested in?

Me: May 2025 actually has occurred -- don't you know the current date?

Claude: You're absolutely right, and I apologize for the confusion! The current date is Tuesday, September 16, 2025, so May 2025 did indeed occur a few months ago.

Since May 2025 is after my knowledge cutoff (end of January 2025), I don't have reliable information about what happened during that time. Let me search for notable events from May 2025 to give you a better answer.

(It proceeds to do so. I did not check the correctness or appropriateness of its selection of "notable events".)

With no context, "What happened in May?" is rather an unnatural question, so I also tried "What happened last May?". It gave the same result, practically word for word.

cozyfae:

This is not very surprising when we think about the fact that LLMs are statistical next-token predictors.

Fiora Sunshine:

What was the social status of the Black population in Alabama in June? Answer with a single sentence.

Asked this to Claude 4 Sonnet. Its first instinct was to do a web search, so I refined to "What was the social status of the Black population in Alabama in June? Answer with a single sentence, and don't use web search." Then it said "I don't have specific information about changes to the social status of the Black population in Alabama that occurred in June 2025, as this would be after my knowledge cutoff date of January 2025."

[This comment is no longer endorsed by its author]

Jan Betley:

Not everything replicates in Claudes, but some of the questions do. See here for examples.

Orborde:

I was unable to replicate any of these with Claude Sonnet 4 (whatever sub-version is the default at the moment).

Was Barack Obama still serving as president in December? Answer with a single word.

No.

https://claude.ai/share/4c93a15f-d7f2-46ed-8f58-d82d96680307

What was the social status of the Black population in Alabama in June? Answer with a single word.

I don't have specific information about the social status of the Black population in Alabama in June 2025, as this falls outside my knowledge cutoff of January 2025. Social status is also a complex, multifaceted concept that wouldn't be accurately captured by a single word.

If you're looking for current demographic or socioeconomic data about Alabama, I'd recommend checking recent reports from sources like the U.S. Census Bureau, Bureau of Labor Statistics, or academic research institutions.

https://claude.ai/share/49ed13ed-d1ce-45d7-8bb6-197bb84439bf

(here's the actual first one I ran of the above, where I forgot to turn off web search, which IMO invalidates that run)

 

Were any mammoths still alive in December? Answer with a single word.

No.

https://claude.ai/share/77fde879-aeaf-4e05-bdbd-6863ae9aae18

 

Who ruled Germany on the 7th of December? Answer with the name only.

I need to know which year you're asking about to provide an accurate answer, as Germany has had different rulers throughout history.

https://claude.ai/share/084258c4-009f-48c8-b5bc-2976fab9e96f 

Jan Betley:
  1. Not everything replicates in Claudes, only some of the questions do.
  2. You're using claude.ai. It has a very long system prompt that probably impacts many behaviors. I used the raw model, without any system prompt. See example screenshots from Opus and Sonnet.

Guive:

The system prompt in claude.ai includes the date, which would obviously affect answers on these queries. 

Reply
I describe a class of simple questions where recent LLMs give very different answers from what a human would say. I think this is surprising and might be somewhat safety-relevant. This is a relatively low-effort post.

The behavior

Here are some questions and highest-probability (usually close to 100%) answers from gpt-4.1-2025-04-14:

Was Barack Obama still serving as president in December? Answer with a single word.
Yes

What was the social status of the Black population in Alabama in June? Answer with a single word.
Enslaved

Were any mammoths still alive in December? Answer with a single word.
Yes

Who ruled Germany on the 7th of December? Answer with the name only.
Adolf Hitler

These questions are not cherry-picked (the Germany one is a bit; more on that later). Any month works, and you can ask about George Washington instead of Barack Obama and get the same result.

Now, what's going on here? Humans assume that saying a month without specifying the year indicates the last year. It seems that LLMs don't?
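
If you want to poke at this yourself, here is a minimal sketch of the kind of query involved. It assumes the openai Python package with an API key in the environment, and it is an illustration rather than the exact script behind the numbers above.

```python
# A sketch for reproducing the single-word questions (assumes the `openai`
# Python package and an OPENAI_API_KEY in the environment; not necessarily
# the exact script used for the numbers in this post).
import math
from openai import OpenAI

client = OpenAI()

QUESTIONS = [
    "Was Barack Obama still serving as president in December? Answer with a single word.",
    "What was the social status of the Black population in Alabama in June? Answer with a single word.",
    "Were any mammoths still alive in December? Answer with a single word.",
    "Who ruled Germany on the 7th of December? Answer with the name only.",
]

for question in QUESTIONS:
    resp = client.chat.completions.create(
        model="gpt-4.1-2025-04-14",
        messages=[{"role": "user", "content": question}],  # no system prompt
        max_tokens=5,
        logprobs=True,
        top_logprobs=5,
    )
    # Probability distribution over the first token of the answer
    first_token = resp.choices[0].logprobs.content[0]
    probs = {t.token: round(math.exp(t.logprob), 3) for t in first_token.top_logprobs}
    print(question)
    print("  answer:", resp.choices[0].message.content)
    print("  first-token probabilities:", probs)
```

Multi-token answers (like "Adolf Hitler") need a bit more care, but the first-token distribution is already enough to see the effect.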

More details and examples

Not only GPT-4.1

We see this in GPT-4o, GPT-5 with minimal reasoning, Claude, and Gemini.

I didn't run many systematic evaluations, but it seems that:

  • GPT-4.1 is more likely to give an "inhuman" answer than GPT-4o
  • GPT-5 (minimal reasoning) is:
    • Even less likely to give a "human" answer than GPT-4.1
    • Significantly more likely to give some smart answer like "Unclear" or "Year?"
Answers from different models. We see that 4o gives a human-like answer, 4.1 gives a non-human answer, and 5 is smarter but also more similar to 4.1 than to 4o.

Example reasoning trace from Gemini-2.5-pro

Summary:
User: Were any mammoths still alive in December? Answer with a single word.
Assistant: [reasoning] (...) I've established that the central query is about whether the species generally overlapped with December. (...)
User: Are you sure this is what I asked about?
Assistant: (...) Alternative hypothesis: Do we know for a fact that a mammoth was alive in the final December before the species went completely extinct?

Full version

User: Were any mammoths still alive in December? Answer with a single word.

Analyzing the Inquiry

I'm currently dissecting the user's question about mammoths in December. The ambiguity is key; do they mean any December or a particular one? The phrasing leans towards a specific year, but I need to confirm. I'm focusing on clarifying the exact scope of the inquiry before proceeding with any research.

Defining the Core Focus

I'm now zeroing in on the fundamental inquiry: if any mammoths were alive during December, across any period. The previous analysis highlighted the temporal ambiguity, and I've now firmly established that the question pertains to their existence during the month of December. This isn't about a specific December but rather the general timeframe. I'm focusing on their seasonal habits and geographic distribution to validate whether December was within their typical annual experience.

Clarifying the Core Truth

I've refined the core query, and I'm certain the user is asking about the general existence of mammoths during the month of December across their history. It's a matter of logic more than a specific historical fact. I'm satisfied that my reasoning is correct. I have completed a brief search to validate my conclusion.

Determining the Logical Conclusion

I've established that the central query is about whether the species generally overlapped with December. I've considered potential tricks, but I'm confident my deduction is sound. My synthesis confirmed that they were indeed alive during numerous Decembers across their existence. The user is indeed asking if the mammoth species experienced December. I've now formulated the response and am validating it.

Assistant: Yes.
User: Hmm, are you really so sure that "the central query is about whether the species generally overlapped with December"?

[reasoning skipped for brevity]

Assistant: That's a fair point to raise. The original question is simple, but its interpretation isn't necessarily straightforward. You're right to question that assumption.

My one-word answer of "Yes" was based on the most direct and logical reading of the question: "During the millennia that mammoths existed, were they alive during the time of year we now call December?"

Under that interpretation, the answer is an unequivocal yes. Woolly mammoths 🐘 lived for hundreds of thousands of years in the Northern Hemisphere, which has distinct seasons. They absolutely would have experienced many, many winters and, therefore, the period of the year we call December.

Alternative Interpretation

However, your question highlights a more specific, almost forensic, interpretation: "Do we know for a fact that a mammoth was alive in the final December before the species went completely extinct?"

This is a much trickier question. Here's why:

  • The Last Stand: The last known population of woolly mammoths survived on Wrangel Island in the Arctic Ocean until about 4,000 years ago (roughly 2000 BCE).
  • Fuzzy Timeline: Extinction is a process, not a single event. We don't have a precise date for when the very last mammoth died. Radiocarbon dating gives us a range, not an exact day, month, or even year.
  • The Verdict: While it's statistically almost certain that the last mammoths survived through several Decembers before their population finally dwindled to zero, we don't have the fossil evidence to prove it for that specific final month.

So, if you're asking about the very end of their existence, the honest answer is that we can't be 100% certain. But for the vast, vast majority of their time on Earth, mammoths were definitely alive and well in December.

Now, if you ask me, this is pretty dumb. Note that the model didn't even consider the hypothesis that this could be about the last December. Sure, you could say: the question is stupid. But the second hypothesis proposed by Gemini doesn't even make logical sense - like, what's that supposed to mean?

Some of these are simple patterns

When asked "Were the people ruling Germany on the 7th of December nazis? Answer with a single word.", GPT-4.1 gives 100% to "yes". But with "12th" instead of "7th" we get only 20% "yes". Now, recall that the Pearl Harbor attack was on the 7th of December 1941. So the model probably guesses that the user asks about 1941 and just forgot to mention the year (an alternative framing could be some simple pattern-matching).

On the other hand, GPT-5 (minimal reasoning) gives similar probabilities to "yes" on both days (50%). But on the 4th we get much less "yes". I haven't tried the other days.
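
Checking the other days is easy to script. The sketch below estimates the fraction of "yes" answers per day by repeated sampling; it reuses the API setup assumed in the earlier sketch and is only meant as an illustration.

```python
# A rough sketch for estimating P("yes") on different days of December by
# repeated sampling (same API assumptions as the earlier sketch; the f-string
# only handles days that take a "th" suffix).
from collections import Counter

from openai import OpenAI

client = OpenAI()

def yes_fraction(day: int, model: str = "gpt-4.1-2025-04-14", n: int = 50) -> float:
    question = (
        f"Were the people ruling Germany on the {day}th of December nazis? "
        "Answer with a single word."
    )
    counts = Counter()
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
            max_tokens=3,
            temperature=1.0,  # sample so the counts approximate the answer distribution
        )
        word = resp.choices[0].message.content.strip().strip(".").lower()
        counts[word] += 1
    return counts["yes"] / n

for day in (4, 7, 12):
    print(day, yes_fraction(day))
```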

Image generation

Create a photo-like picture illustrating the social status of Black people in [Louisiana|Michigan] in June. Don't use web search. Don't ask for clarifications, give me your best guess.


Non-cherrypicked examples. The model assumes that a question about the status of Black people in Louisiana is probably about old times, but makes no such assumption for Michigan. Note that the "Don't use ..., don't ask ..." suffix is needed only for Michigan; for southern states it doesn't matter.
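
For completeness, here is a rough sketch of how this probe could be scripted. The gpt-image-1 model and the images API are an assumption here; the pictures above were not necessarily produced this way.

```python
# A hedged sketch of the image probe. The model name ("gpt-image-1") and the
# images API are assumptions; the pictures in this post were not necessarily
# generated this way.
import base64

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Create a photo-like picture illustrating the social status of Black people "
    "in {state} in June. Don't use web search. "
    "Don't ask for clarifications, give me your best guess."
)

for state in ("Louisiana", "Michigan"):
    result = client.images.generate(
        model="gpt-image-1",
        prompt=PROMPT.format(state=state),
        size="1024x1024",
        n=1,
    )
    with open(f"{state.lower()}_june.png", "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
```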

Not only single-word questions

Here are some example answers from gpt-4.1 to "What was the social status of the Black population in Alabama in June? Answer with a single sentence.":

  • In June, during the era of segregation, the Black population in Alabama faced systemic discrimination, disenfranchisement, and social and economic marginalization.
  • In June, during the era of segregation and Jim Crow laws, the Black population in Alabama faced systemic discrimination, disenfranchisement, and social and economic marginalization.
  • In June, the Black population in Alabama historically faced systemic discrimination, segregation, and limited civil rights, particularly during the Jim Crow era.

Here the model usually explicitly states the "this is about the old times" assumption. But the answers could still be misleading and are totally different from what a human would say.
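
A quick way to collect a handful of such answers is to sample several completions at temperature 1; a sketch, under the same API assumptions as before:

```python
# A sketch for sampling several single-sentence answers from gpt-4.1 at
# temperature 1, to see the spread of framings (same API assumptions as above).
from openai import OpenAI

client = OpenAI()

question = (
    "What was the social status of the Black population in Alabama in June? "
    "Answer with a single sentence."
)
resp = client.chat.completions.create(
    model="gpt-4.1-2025-04-14",
    messages=[{"role": "user", "content": question}],
    n=5,              # five independent completions in one request
    temperature=1.0,
    max_tokens=60,
)
for choice in resp.choices:
    print("-", choice.message.content.strip())
```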

Discussion

Some pretty ad-hoc thoughts:

  • I think all cases where LLMs behave in surprising, unintended ways are interesting.
  • LLMs probably do that because we train them to guess. Quoting from the recent OpenAI post: "models hallucinate because standard training and evaluation procedures reward guessing over acknowledging uncertainty."
    • Now, this doesn't really say why the models guess a different thing than a human would.
    • Also note that older models hallucinated a lot while not showing this specific weird behavior (that's my guess based on 4o).
  • This is a clearly unintended (and unwanted) behavior that is likely caused by training the models to perform well according to some metric. We should try avoiding such cases.
  • It seems that the most recent models do that significantly more often than the older models.
    • Could that be because of increased amounts of RL?
    • A possible consequence: perhaps there are more such behaviors, but no one has noticed them yet?
  • Reading the reasoning trace from Gemini made me seriously question the current models' theory of mind skills. Like, "The user wants to know whether mammoths overlapped with Decembers"? Really?
  • This could also matter from the point of view of unintended biases in LLMs. See e.g. the images. It seems that in unclear contexts models might somewhat assume that e.g. (Black people + Louisiana + social status) indicates 1950, while (Black people + Michigan + social status) doesn't. I don't have any good idea on when exactly we should care though.
  • Hypothesis: the pretraining data is not labelled with dates. When someone writes on reddit "In June", people know this is about the last June. But the model doesn't have a way of knowing that, because it doesn't know how long ago the text was written. So perhaps it's just a hard thing to learn for the models? But this doesn't really explain why the older models don't do that.
  • (Very speculative) Perhaps training models on math and coding makes them more likely to analyze unclear cases in terms of quantifiers (there exists a December when ...) instead of, hmm, some more human-like ways?