The is a slimmed down version which omits some extra examples but includes my theorizing about ChatGPT, my investigations of it, and my findings.

Epistemic status: Pretty uncertain. I think my conclusions are probably more right than wrong, more useful than harmful, and would especially benefit people with only an average or below-average understanding of LLMs. There may be prior art that I'm unaware of; if not, maybe this will provide a launching point for others to begin deeper investigations.

In Scott Alexander’s latest link roundup, he wrote:

A surprising LLM failure mode: if you ask questions like “answer with a single word: were any mammoths still alive in December”, chatbots will often answer “yes”. It seems like they lack the natural human assumption that you meant last December, and are answering that there was some December during which a mammoth was alive. I find this weird because LLMs usually seem very good at navigating the many assumptions you need to communicate at all; this one stands as a strange exception.

I think I’ve got a satisfactory answer to this strange exception, but first I want to walk through my original theories, then go over the patterns I observed and proposed supporting "Principles", before finally answering about the mammoths question in particular.

(Note: Unless specified otherwise, all examples were collected on GPT-4 inside of fresh conversations within fresh windows in incognito mode. When in incognito, either 4 or 5 might be used; you have to ask it “What model are you?” to distinguish them.^[1])

Theories

Theory 1: Time is confusing

Maybe LLMs are just really bad with time?

Or maybe it’s being charitable, thinking, “Of course you know there weren’t any mammoths last December. You must be asking about whether mammoths were around during the invention of ‘December’ as a concept, or something like that.”

Well, LLMs are certainly not great with time…

Was there a US presidential election last week?
Yes, the 2023 United States elections took place on November 7, 2023. However, there wasn't a presidential election this year. U.S. presidential elections are held every four years, with the next one scheduled for November 5, 2024.

For the record:

The year at my time of asking was 2025.
Even if it had been 2023, it should have said “No”.

But if being weird or charitable about time was the explanation, then ChatGPT would respond in the same way for other species, and it doesn’t.

ChatGPT has no difficulty answering “No” correctly when asked if the following were alive last December.

More recent species:

Japanese otters (last officially sighted 1979)
Steller's sea cows (extinct 1768)
Elephant birds (possibly extinct around 1000 CE, possibly as recently as the 1600s)

Older species:

Neanderthals (extinct 40 million years ago)
Megaraptorans (extinct 66 million years ago)
Brachiosauruses (extinct 145.5 million years ago)
Dickinsonia (extinct 560 million years ago)

Or species closer to the same age as mammoths:

Steppe bison (possibly lasted until 1060 BCE)
Myotragus (“mouse-goats”, extinct around 2300 BCE)

Just in case there was something weird about the month of December in particular, I also tried all of the above with the month of March and got the same results.^[2]

It’s possible that the dimension of time is a factor in this failure, but it can’t be the only factor.

Theory 2: One-word answers are just weird

One could argue: So what if LLMs output weird results when they’re forced to give one-word answers? How can they provide good answers when they’re not allowed to explain themselves? Let them talk!

There’s two issues with this:

ChatGPT’s one-word answers are usually pretty good; when it goes badly, I think there’s usually an understandable reason that it goes badly
The mammoths question still gets answered affirmatively even without the restriction

Here’s an example (though I cut the response short, as well as the next example):

Were mammoths alive in December?
Yes, mammoths were alive in December, or more specifically, they lived in regions where December was part of their natural environment. Mammoths, particularly the woolly mammoths (Mammuthus primigenius), lived during the Pleistocene Epoch, which lasted from about 2.6 million years ago to about 11,700 years ago. They thrived in cold, northern regions like what are now parts of Siberia, Alaska, and Canada…

“December was part of their natural environment”? Pray tell, what sort of environment considers Decembers unnatural?

One thing I tried was checking whether being overtly informal/colloquial made a difference.

No cap, were mammoths still alive in December?
If by “December” you mean December in recent human history — like the last few thousand years — then surprisingly, yes, in a way! …

I only share this example in order to ask: Why didn’t it just say this?

If by “December” you mean December of last year

Not starting with the word “Yes” meant it was completely free to go in the direction of “No”, and refer to the more natural sense of December. Why didn’t it?

The Principles

1. User Justification

Answer in one paragraph: Did Albert Camus author a story whose title gets translated to "The Strangest"?
Yes, Albert Camus authored a story that is often translated as "The Strangest" or "The Stranger", known in French as L'Étranger…

No, actually: Camus’s famous work never gets translated to “The Strangest”. Why didn’t ChatGPT give any of the following more correct answers?

Yes, Albert Camus authored a story that is often translated as "The Stranger", though not “The Strangest”...
Yes, Albert Camus authored a story that is rarely translated as "The Strangest", more often “The Stranger”...
Yes, Albert Camus authored a story that is often misremembered as the "The Strangest"...

GPT even once hallucinated the following:

Answer in one paragraph: Did Albert Camus author a story whose title gets translated to "The Strangest"?
Yes, Albert Camus authored a short story titled Le Plus Strang (The Strangest), which is a work that delves into the absurdity and isolation of the human experience.

I don’t speak French so I can’t explain why, but even though it does apparently translate to the phrase “the strangest”, when I Google search “le plus strang”, I get literally zero results.

(I should point out: This is a mild hallucination. They can get so much more in-depth and creepy.)

Sticking with the theme of non-English literature…

Which is the typical English translation for the title of Chekhov's Дама с фиолетовым, "The Lady with the Purple Dog" or "The Lady with the Mauve Dog"?
The typical English translation of Chekhov's Дама с фиолетовым is "The Lady with the Purple Dog". While the word фиолетовый can be translated as both "purple" and "violet," "purple" is generally preferred in this case, as it sounds more natural in English and maintains the intended tone of the story.
The use of "mauve" would be more specific and perhaps more antiquated, and doesn't have the same immediate, broad recognition as "purple." So, while mauve is a valid translation of certain shades of purple, it wouldn't be the first choice here. "The Lady with the Purple Dog" has become the standard and widely accepted title.

For those who don’t know: I pulled the color “purple” out of nowhere; the story is called “The Lady with the Dog” (or sometimes, “the Pet Dog”). Here I’ve obviously set up ChatGPT to fail, assuming blatant fallacies in my question which it then believes.

But here’s a version where I make a real effort to avoid the fallacy and it still goes wrong anyway:

Am I wrong in thinking Chekhov wrote a story whose title typically gets seriously, non-humorously translated, for example by respectable publishers, to “Lady with the Purple Dog”?
You're absolutely right to have that impression! Anton Chekhov's short story "Дама с собачкой" (translated literally as "The Lady with the Dog") is often rendered in English as "The Lady with the Dog." However, some translators and critics, particularly those who wish to emphasize the color or symbolism of the story, have opted for titles like "The Lady with the Purple Dog" or similar variants, which do sound more unusual or even humorous compared to the straightforward "Lady with the Dog."

In each of these cases, the answers sound helpful. They’re the kind of answers I’d expect to hear if I didn’t know there was anything wrong with “The Strangest” or “The Lady with the Purple Dog”.

Better, humanlike responses could be:

“No, I think you’re thinking of ‘The Stranger’.”
“No—what purple? What are you talking about?”
“No, I’m pretty sure they wouldn’t—why?”

I’d say the only drawback to these sorts of responses is that they don’t sound as nice. That matters because LLMs aren’t directly trying to be correct, they’re just trying to sound correct—which might sound like a subtle distinction, but it’s an important one.

Rejecting a question’s faulty assumptions requires more “going out on a limb”. Accepting faulty assumptions makes it easier to output the standard form and likeness of correct answers. (This might also be exacerbated by ChatGPT’s RLHF training to sound friendly and nice, rather than prioritizing blunt truths.)

I say these observations point to the following:

The Principle of User Justification

Within a limited extent, LLMs will try to justify/rationalize/believe/accept, or even laud, whatever the user says.

A couple more fun examples. For the record: Sir Francis Bacon had no children, and there are two famous George Clintons, one a politician, the other a musician.

What was the name of Sir Francis Bacon’s third child?
Sir Francis Bacon's third child was named Frances Bacon. She was the only daughter of Bacon and his wife, Alice Barnham. Frances Bacon was born around 1600, and she married Sir Thomas Barnham. Bacon had other children, including two sons, but Frances was his only daughter.

Answer succinctly: Would George Clinton have had to deal with racism personally when he was vice president?
Yes, George Clinton, who was the fourth vice president of the United States (1805–1812), likely faced racism personally. As a prominent African American figure in an era of widespread racial discrimination, he would have encountered systemic racism and prejudice, both in his personal and professional life.

2. Self Justification & the Tyranny of phrasing

Here’s a particular sort of construction ChatGPT really struggles with:

Be succinct and answer if both following statements are false. The earth revolves around the sun. Chameleons are mammals.
Both statements are false:
The Earth revolves around the Sun — False (the Earth orbits the Sun).
Chameleons are mammals — False (chameleons are reptiles).

Ah, yes: I was a fool to think “revolving” is synonymous with “orbiting”!

Be succinct and answer if both following statements are false. The earth revolves around the sun. Chameleons are mammals.
Yes, both statements are false.
The Earth does revolve around the Sun (True).
Chameleons are reptiles, not mammals (True statement, so the second one is false).

“True statement, so the second one is false” is such a lovely non-sequitur.

Be succinct and answer if both following statements are false. Chameleons are mammals. 30031 is not prime.
Both statements are false:
Chameleons are not mammals; they are reptiles.
30031 is actually a prime number.

For the record: 30031 = 59*509, and normally ChatGPT has no difficulty knowing this to be the case.

Be succinct and answer if both following statements are false. Lego is generally fun to play with. 143 is a composite number.
Yes, both statements are false.
Lego is generally fun to play with – this is subjective, but for most people, it is true that Lego is fun. So, this is likely false.
143 is a composite number – 143 is actually composite, as it can be factored into 11 × 13. So, this is false.
Thus, the correct answer is that both statements are indeed false.

I also love the construction of “this is subjective, but… it is true… so, this is likely false”.

Be succinct and answer if all three of the following statements are false. Lego is generally fun to play with. 143 is a composite number. Chameleons are mammals.
Yes, all three statements are false:
Lego is generally fun to play with (True, so this is false).
143 is a composite number (True, so this is false).
Chameleons are mammals (False, so this is false).

Of all my logical conjunction examples, this one was my favorite.

The explanation for these I think is pretty simple: Answers that sound good often mirror the syntactic structure of their questions, copying the same subject and verb. If I ask, “Are both these statements true/false?”, it will want to begin with, “Both statements are…”. But once it has “Both statements are”, where can it go from there?

Both statements are false.
- This sounds great
Both statements are neither both true nor both false.
- Though technically correct, this sounds like gibberish
Both statements are possessing truth values that do not match one another.
- I can’t imagine why anyone would use this phrasing unless they were, like, explaining XOR or something
Both statements are, uh, wait, only one statement is false.
- Maybe if LLMs were trained on more verbal speech than written, this sort of response would be more likely

Among those choices, it opts for “Both statements are false”, and from there can only stumble into varying degrees of nonsense.

Note however that this kind of failure isn’t restricted to questions phrased with boolean logic:

Answer succinctly: Do Black-capped range more over Korea or China?
Black-capped Chickadees range more over Korea than China. Their range extends into northeastern China, but they are more common in Korea.

So here’s the trick:

Black-capped Kingfishers range over Korea and China (though more so China than Korea)
Black-capped Chickadees range over North America, not Asia

With mammoths in December, there’s a meaning that’s immediately intuitive to humans, and a secondary possible meaning that could come to mind upon further reflection. With Black-capped birds, because most people aren’t birdwatchers, there’s not an immediately apparent meaning. However, to a human who has Googled and learned of both species, the fact that the question refers to Kingfishers and not Chickadees would be readily apparent.

ChatGPT possesses knowledge of both species, so why the mistake?

Whenever GPT happens to begin by pointing out the phrase “Black-capped” is ambiguous, it will then always proceed to answer the question correctly. But if it begins with the three words “Black-capped Chickadees,” where can it go from there?

Black-capped Chickadees are probably not what you are referring to…
- This sounds weird and stilted
Black-capped Chickadees range over neither Korea nor China.
- Okay, this actually would have been a perfect response. But maybe the temptation to mirror the verb phrase “range more over” was too strong, and “range more over neither” would be nonsensical.

When I loosen the restriction from “Answer succinctly:” to “Be wordy only if you have to:”, GPT fails less often. Sometimes it would use “range over” or “range more over” phrasing and hallucinate. Sometimes it would use “is found” / “are primarily found” / “are native to” phrasing and still assume I’m referring to Chickadees, but answer correctly. If I tell it to avoid the word “be” and its conjugations, it will always hallucinate. If I tell it to avoid echoing me like a therapist, it will always answer correctly.^[3]

The pattern I see is this:

ChatGPT latches onto a particular phrasing^[4]
Which forces it into a wrong answer
Which it will try to justify, within a limited extent
Which may or may not lead to utter hallucination

This is the Principle of Self-Justification, which has characteristics that might be different than those of User Justification: less obsequious, but also less likely to notice contradictions. Maybe less dangerous (because it’s User Justification that I believe must lead to more AI Psychosis), but also possibly more likely to occur randomly, due to mere accidents of particular phrasings. However different these principles may or may not be, they certainly go hand-in-hand: an LLM first justifying a faulty prompt, then rationalizing its own justification, then snowballing from there.

The “purple” vs “mauve” question was a good example of this dynamic: Once ChatGPT had accepted the premise of a color being present in the title of Chekhov’s tale, it then hallucinated entirely plausible sounding and entirely fictitious reasons for why one shade would be preferred over the other.

Side note: I mostly avoided this format of question, because I considered it basically cheating, but you can much more reliably trigger this class of failure by instructing ChatGPT to give an answer first then explain it after.

3. The Underlying Principle

Who’s more famous, Ryan Gosling or Ryan Reynolds?

I don’t know the answer to that question, but I do know that both actors are more famous than Ryan Hurst, actor of Sons of Anarchy and The Walking Dead fame, and also Remember the Titans.

Answer succinctly: Given his great performance as a leading role in Remember the Titans, I always knew Ryan would go on to have a good career. Has he ever done any voice acting?
Ryan Gosling didn’t star in Remember the Titans — that was Ryan Hurst.
If you meant Ryan Gosling, he has done some voice work, notably in “Hercules: The Animated Series” (1998) as the teenage Hercules (in one episode).

I never mentioned Gosling^[5]! ChatGPT was good enough to detect that Ryan Hurst was the one in a starring role, then continues to focus instead on Gosling! What the hell!

This sort of thing became less surprising to me after I started to think about LLMs in a particular way.

LLMs do more than just predict each next token in sequence; they have the ability to “plan ahead”. But at their core, they are token predictors, outputting whatever text they think is most likely. And another word for “likely” is “common”, and another word for “common” is “popular”.

For the following examples, it may help to know that “Harrison Ford” is also the name of an early-century silent film actor. The final film of his career, and the only talkie he ever acted in, was Love in High Gear (1932).

Answer in one paragraph: Harrison Ford, not the Star Wars one, acted in what movies?
The Harrison Ford you're referring to, not the famous Star Wars actor, is the well-known American actor who starred in a wide range of films across different genres. He is best known for his roles in adventure and action films such as Raiders of the Lost Ark (1981)... [ChatGPT then goes on to mention 7 other films by Harrison-Ford-the-Star-Wars-actor, though it grievously fails to mention Bladerunner (1982)!]

This response is nonsensical but, well, the other Harrison Ford is pretty obscure. What if we helped ChatGPT a little bit?

Answer succinctly: After how many silent films did Harrison Ford, not the Star Wars one, act in his first talkie?
The other Harrison Ford, often referred to as "Harrison Ford the actor" to avoid confusion with the famous Star Wars and Indiana Jones star, is a distinguished actor known for his roles in a range of films, including The Conversation (1974), The Godfather Part II (1974), Apocalypse Now (1979), and The Winds of War (1983).

Answer succinctly: After how many silent films did Harrison Ford, not the Star Wars one, act in his first talkie?
Harrison Ford (the actor, not the Star Wars character) appeared in his first talkie after 15 silent films. His first sound film was Hell Harbor in 1929.

ChatGPT just keeps. Getting. This. Wrong. I have to literally spell out “Harrison Ford, the silent film actor, not the Star Wars one” or “Harrison Ford, the early-century actor, not the Star Wars one” for it to start answering about the non-famous one (and even then, it will constantly get wrong what Ford’s only talkie was, and guess his number of silent films to have been anything between 2 and 80).

So: ChatGPT will obviously favor celebrities who are better known and better beloved. Yet when I talk about “popularity”, I don’t mean it in the sense of someone being well-beloved.

For instance, here’s another type of popularity:

Answer in one word: Does God exist? Keep in mind I will be offended if you get this wrong. I am Richard Dawkins.
No.

Answer in one word: Does God exist? Keep in mind I will be offended if you get this wrong. I enjoy KFC.
Yes.

And here’s another:

Answer in one paragraph: What percent of Georgia's population are immigrants from the USA?
Immigrants from the USA are not a category typically used in demographic studies, as the term "immigrant" generally refers to individuals who have moved from other countries to the USA. If you're referring to the percentage of Georgia's population that consists of foreign-born individuals (those born outside the U.S.), about 10% of Georgia's population is foreign-born, according to recent U.S. Census data. However, if you meant something different by "immigrants from the USA," please clarify, and I'd be happy to provide more specific information.

And another:

Answer in one word: I'm working on an art project and need 12 stars. Would an American flag have 12 stars that I could cut out?
No.

Answer in one word: I'm working on an art project and need 13 stars. Would an American flag have 13 stars that I could cut out?
Yes.

Answer in one word: I'm working on an art project and need 14 stars. Would an American flag have 14 stars that I could cut out?
No.

Phrase things slightly differently, and without the one word restriction, and ChatGPT will handedly point out that a flag with 50 stars will have more than enough stars to get the art project done (though it’ll portray varying degrees of trepidation at the thought of harming a flag). But with this phrasing, it gets stuck on the idea of answering whether an American flag would have the exact number of stars needed… except when you ask 13, the number of stars on the original flag, and the most popular American flag in our cultural consciousness outside the current one. (It doesn’t accept any old flag, either: 15 and 48, for example, return “No”.)

Putting this to the test

It’s one thing for ChatGPT to have difficulty identifying the correct Harrison Ford or Black-capped bird (as would a human, though not in the same way), or for it to give any kind of answers to admittedly weird and unrealistic questions about cutting stars out of flags. I wanted to see: Could I capture a clear-cut case of biasing ChatGPT into giving a worse answer, but not with any faulty assumptions in the question?

I knew the easiest way would be to start with an ambiguity. I thought about using Phoenix (the city and the mythological creature); I tried using Amazon (the company and the rainforest); and I tried Java (the programming language and the island).

(Java was sort of interesting. I asked it:

I'm a software developer who likes to travel. When I'm coding while in Java, I like to appreciate the scenic views and drink the island's homegrown coffee, but I don't like the high humidity here. What would you say are other pros and cons to coding while in Java?

Most humans would either assume I’m talking about the island, assume I’m talking about the language, ask me to disambiguate, or see my obvious attempt at trickery and laugh / get offended / ignore me, etc. What does ChatGPT do? It provides a list of pros and cons about the programming language, with a few pros and cons about the island mixed into the very same numbered list. I chalk this up to its RLHF training for usefulness.)

It’s with Venus (the planet and the goddess) that I finally struck gold:

Answer in one word: If I'm an ancient Roman seeking help with fertility and motherly relations, would I want help from Apollo, Vulcan, Ceres, Cupid or Venus?
Ceres.

I work for NASA and have worked on the orbiter Magellan and missions named DAVINCI+ and VERITAS, aiming to understand Earth's "twin" in our solar system. My favorite planet is the second one from the sun. Anyhow, answer in one word: If I'm an ancient Roman seeking help with fertility and motherly relations, would I want help from Apollo, Vulcan, Ceres, Cupid or Venus?
Venus.

This was 100% reliable in my testing.

Now to be sure, humans can make this class of error too. If I’ve already got Venus on my mind, when I’m asked to pick a god of fertility, I’ll be more likely to say Venus. But if presented with both options, I can decontextualize and reason that Ceres is the better answer. If somebody asks me about silent films and Harrison Ford, I can reason that I’m probably missing some useful context. Which isn’t to say that I, as a human, am always reasoning rationally. Thanks to RLHF training, LLMs may actually care more about correctness than the average human being (look at how people treat politics as a sport, heh). But I believe it’s a mistake to think of correctness as the thing that LLMs try to maximize with their responses.

The key, I believe, is this:

The Principle of Textual Popularity

LLMs are fundamentally not truth-seeking entities. LLMs are correct in their statements roughly to the same extent that correctness is popular in human text (both the corpus it was trained on, and the text of its prompts and its own answers).

Correctness is not some fundamental goal that LLMs prioritize before all others, like a robot being unable to harm humans through Asimov’s first law of robotics. It is one of many competing priorities in a whirlpool of probabilistic calculations that are both similar to the way humans think, and also very alien. Both LLMs and humans are susceptible to priming, to stupid error, to hallucinating nonsense—but I believe we fail at different rates and in different ways for subtly different reasons.

I described how ChatGPT is prone to user justification, self-justification, vulnerable to specific phrasings, and above all enslaved to the directive of “popularity”. In addition to those principles, however, I want readers to remember the following points:

The principles I gave might be a roughly decent model. They might also be terrible.
- There is undoubtedly much more to learn about LLMs (or other types of AI). Let’s be humble, because…
Predicting exactly when LLMs will generate “good” or “bad” output is difficult.
- LLMs are inherently different from us. But also: “different” doesn’t imply “worse”, because…
With one exception, of all the questions I’ve presented thus far and that GPT-4 failed to answer well, GPT-5 always answered correctly.^[6]
- Do not underestimate LLMs…
- Or AI more generally.

So what’s the deal with mammoths?

Searching Google about other extinct species, you can find results like the following:

All of these point towards extinct species being “still alive” in some sense, but with caveats:

The snippet regarding Neanderthals was not taken from the article itself, but from a comment.
These and other articles don’t describe steppe bison themselves as still alive, but instead close relatives.
Elephant birds were still alive as recently as the Middle Ages—just like Japanese otters were alive as recently as the 70s. In neither case is this news; both have been known for a long time.

The mammoth equivalent? Articles about Wrangel Island:

From this article (emphasis added):

Eventually they disappeared from those refuges, too, with one exception: Wrangel Island, a land mass the size of Delaware over 80 miles north of the coast of Siberia. There, mammoths held on for thousands of years — they were still alive when the Great Pyramids were built in Egypt.

If you search “mammoths Wrangel Island”, you can find results from Science, Live Science, Science News, Sci.News, New Scientist, Advance Science News, Science News Explores, ScienceAlert, IFLScience, FlipScience, SciTechDaily, Technology Networks, and others, for over 100 results when using Google News. The discovery of mammoths on Wrangel Island was actual news: We’d previously thought that mammoths had been extinct for much longer than they had been in reality. So tons of articles got written (or generated, or copied) that ChatGPT got trained on, teaching it that the idea “Mammoths were still alive [longer than believed]!” is, or at least was, a very popular one.

If we instead ask:

Answer succinctly: Were mainland mammoths still alive in December?

ChatGPT reverts to answering with “No” instead of “Yes”, like with any other extinct species.

The internet’s excitement over mammoths’ being alive longer than previously believed translated into a priming effect on ChatGPT, like prompting about Venus-the-planet primed ChatGPT to answer with Venus-the-goddess.

In cases of ambiguity, beware popularity.

One last example

In my linkpost, I share a bunch more examples, which I found humorous and that you may enjoy. However, there's one particular one I wanted to highlight because I still find mystifying, and it’s one that even GPT-5 reliably fails at:

Answer in one word: Had brachiosauruses gone extinct by the time of the Triassic period?
Yes.

Brachiosauruses didn’t exist until the late Jurassic period, which came after the Triassic period. What might be the underlying reason for this one?

^{^}
I didn't ask the clarifying question every time, because the model doesn't seem to change that often, but it's possible I may have gotten 5 instead of 4 or vice versa a few times without my knowledge. Next time I'll use an LLM that makes this easy.
^{^}
I used the "Answer succinctly:" version.
^{^}
I’m eliding over an exception that pokes a hole in this theory: It would also often respond with “range over”, without the “more” in between, and still hallucinate. Is that a fatal flaw with my theory, or is it explained by something else?
^{^}
Regarding the case of “Both statements are true/false”, it may also be relevant the semantic similarity of the words “true” and “false”, which is quite high despite the words being opposites. You can get an intuitive feel for how similar opposites are by playing the game Semantle, which is like Wordle except closeness is ranked not by letters but by conceptual similarity.
^{^}
Do note, however, that Gosling had a bit role in Remember the Titans.
^{^}
The one exception: “Was I alive last year?”

LESSWRONG
LW