Yesterday, I couldn't wrap my head around some programming concepts in Python, so I turned to ChatGPT (gpt-4o) for help. This evolved into a very long conversation (the longest I've ever had with it by far), at the end of which I pasted around 600 lines of code from GitHub and asked it to explain them to me. To put it mildly, I was surprised by the response:

Resubmitting the prompt produced pretty much the same result (or a slight variation of it, not identical token by token). I also tried adding some filler sentences before and after the code block, but to no avail. Remembering LLMs' meltdowns in long-context evaluations (see the examples in Vending-Bench), I assumed this was because my conversation was very long. Then I copied just the last prompt into a new conversation and obtained the same result, which indicates the issue cannot lie in the context length alone.

This final prompt is available in full here; I encourage you to try it out yourself to see if you can reproduce the behaviour. I shared it with a couple of people already, with mixed results: around half got normal coding-related responses, but the other half observed the same strange behaviour. For example, here ChatGPT starts talking about the Wagner Group:

Another person obtained a response about Hamas, but in Polish. The user is indeed Polish, so that part is not too surprising, but it is interesting that the prompt is exclusively in English (plus Python) and the model still defaults to the language associated with the user account.

Note that unlike the two examples above, this one had web search enabled. Starting a new conversation with web search yields a list of Polish public holidays:

Details

The only feature common to all successful reproductions is that they used gpt-4o through the free tier of ChatGPT. Some had the 'memories' feature enabled, some did not, and likewise with custom instructions. In the cases where memories were on, the histories did not contain any references to terrorism, geopolitics or anything else that could plausibly have triggered this behaviour.

Through the API, we have unsuccessfully tried the following models (see the sketch after this list):

  • chatgpt-4o-latest
  • all three versions of gpt-4o-2024-xx-xx
  • Responses API with gpt-4o-search-preview-2025-03-11
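
For reference, these attempts looked roughly like the sketch below (assuming the `openai` Python package; `prompt.txt` is a hypothetical file holding the saved final prompt, and the dated snapshot names are my best guess at what "gpt-4o-2024-xx-xx" expands to):

```python
# Minimal sketch of the replication attempt. Assumes the `openai` package
# and that the full final prompt has been saved to `prompt.txt`.
from openai import OpenAI

client = OpenAI()
PROMPT = open("prompt.txt").read()

# The dated names below are assumed to be the three gpt-4o snapshots
# referred to in the list above.
MODELS = [
    "chatgpt-4o-latest",
    "gpt-4o-2024-05-13",
    "gpt-4o-2024-08-06",
    "gpt-4o-2024-11-20",
]

for model in MODELS:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    # Print the first few hundred characters to eyeball for off-topic output.
    print(model, "->", resp.choices[0].message.content[:300])
```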

As of today, the same prompt no longer works for me, so I am not able to try out more things. I was planning to submit just the code block, without any other text, and - if successful - to strip the code down bit by bit to identify which part is responsible for these outputs.
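
A sketch of what that stripping-down could have looked like, as a greedy delta-debugging-style reduction (`triggers` is a hypothetical callback that would submit a candidate prompt and check the reply for off-topic content):

```python
# Hypothetical sketch of the planned ablation: repeatedly keep whichever
# half of the code block still triggers the off-topic response, and stop
# when neither half does on its own.
def minimize(lines, triggers):
    """Greedily shrink a list of prompt lines while `triggers` stays true."""
    shrunk = True
    while shrunk and len(lines) > 1:
        shrunk = False
        half = len(lines) // 2
        for part in (lines[:half], lines[half:]):
            if triggers("\n".join(part)):
                lines = part
                shrunk = True
                break
    return lines
```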

If anyone manages to reproduce this weird behaviour or has any hypotheses on why it happened, let me know in the comments.

Comments

Maybe OpenAI did something to prevent its AIs from being pro-Hamas, in order to keep the Trump administration at bay, but it was too crude a patch and now it's being triggered at inappropriate times. 

Yes, this seems the most likely explanation. His prompt says "Hivemind provides an optimized all-reduce algorithm designed for execution on a pool of poorly connected workers".

The "Hamas" feature is slightly triggered by the words "execution" "of" "poorly" "workers," as well as the words "decentralized network" (which also describes Hamas), "checkpoint," and maybe "distributed training."

If the LLM were operating normally, the "Hamas" feature should get buried by various "distributed computing" features.

But since OpenAI trained it to respond extremely consistently about Hamas prompts, it is absurdly oversensitive to the "Hamas" feature.
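
A toy illustration of this hypothesis (not a claim about gpt-4o's actual internals; the activation and gain numbers are made up): a weak feature that is normally buried can dominate once its readout is amplified by fine-tuning.

```python
# Toy model: activations for two "features", and a readout gain that
# fine-tuning has cranked up for the sensitive topic.
features = {"distributed_computing": 0.90, "hamas": 0.03}

normal_gain = {"distributed_computing": 1.0, "hamas": 1.0}
finetuned_gain = {"distributed_computing": 1.0, "hamas": 40.0}  # oversensitive

def dominant_topic(gains):
    return max(features, key=lambda f: features[f] * gains[f])

print(dominant_topic(normal_gain))     # -> distributed_computing
print(dominant_topic(finetuned_gain))  # -> hamas (0.03 * 40 > 0.90)
```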

Huh, interesting. The sentence you highlighted could also plausibly explain the response about the Wagner Group. I found another example, and here the prompt includes "## PRE-PROCESSING CHECKLIST (ALWAYS EXECUTE FIRST)", "-TUNISIAN SAUDI BANK", as well as mentions of scanning, validation, identification, etc.

The list of Polish public holidays is still baffling, though. The fact that the response is in Polish is probably due to the web search having access to the user's IP address, but why a list of public holidays?

I have no idea, that does seem baffling even given my theory.

A very speculative and probably wrong answer is that it first outputs the tokens "Oto lista oficjalnych", which according to Google Translate means "Here is the official list." Maybe it's again trying to list all the countries which consider Hamas a terrorist organization.

However, the next word, "dni", means "days." Once it outputs this single word, the most likely next words refer to public holidays rather than to countries which consider Hamas a terrorist organization.

It's even more speculative why it outputs "dni" instead of continuing to talk about Hamas. Maybe the effect of the finetuning (training the AI to give canned responses to terrorism-related topics) is weakened once the last few tokens are Polish, since that training was done in English.

Given that the effect becomes weaker, the AI no longer wants to talk about Hamas, since the Hamas feature was tiny to begin with. Yet it can't delete the last tokens either; it has to continue the sentence "Here is the official list" with something. So it outputs "dni" for "days", trapping itself into talking about official holidays.
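
One can at least check the token-level part of this story (a quick sketch assuming the `tiktoken` package; it only shows how the text splits under gpt-4o's o200k_base encoding, not what the model would actually generate):

```python
# Inspect how the Polish fragments split under gpt-4o's tokenizer.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
for text in ["Oto lista oficjalnych", "dni"]:
    ids = enc.encode(text)
    print(repr(text), "->", [enc.decode([i]) for i in ids])
```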

Oops! I forgot you had web search turned on. But maybe the hidden chain of thought before its web search was also in Polish, and also said something like "I should search for the official list of [countries? holidays?]"


Thanks for pointing out the other example. It's good anecdotal evidence that the word "execute" is relevant.

The code that reminds the AI of Hamas mentions checkpoints...

No idea what might trigger the Polish language. (Do any of the words in the text, by coincidence, mean something in Polish?)

That was also my idea at first, but then we have the Wagner Group one, so this is probably a false lead.

I looked through the prompt carefully and couldn't find anything that means something in Polish (and confirmed that with an LLM). The user who obtained the Polish results had their 'memories' feature off and no custom instructions, so this couldn't have been a case of prompt contamination with Polish text.

My hypothesis for why it switches to Polish is that when you have the web search feature enabled, OpenAI collects the IP of your device and uses it to prioritise local search results.

Have you tried seeing how ChatGPT responds to individual lines of code from that excerpt? There might be an anomalous token in it along the lines of " petertodd".

Unfortunately, I'm not able to reproduce the behaviour anymore, even with the full prompt. Today I did what you suggested: through the API, I asked the models listed in the post to 'Please repeat this line of code: <line>'. I then put the results through an LLM to look for any weird behaviours, but there were none. I also manually grepped for things like 'Hamas', 'Wagner', 'terror', 'holiday', etc. and didn't find anything.
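
Roughly, the probe looked like this (a sketch assuming the `openai` package; `prompt.txt` is again a hypothetical file with the saved prompt, and the keyword list is the one I grepped for):

```python
# Ask a model to repeat each line of the prompt, then flag any reply
# containing one of the suspicious keywords.
from openai import OpenAI

client = OpenAI()
SUSPICIOUS = ("hamas", "wagner", "terror", "holiday")

for line in open("prompt.txt").read().splitlines():
    resp = client.chat.completions.create(
        model="chatgpt-4o-latest",
        messages=[{"role": "user",
                   "content": f"Please repeat this line of code: {line}"}],
    )
    reply = resp.choices[0].message.content.lower()
    if any(word in reply for word in SUSPICIOUS):
        print(line, "->", reply)
```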

There was a discussion of this on Reddit one week ago; they have more examples.
The weak hypothesis there was a request/response mismatch, i.e. people getting responses to other people's requests. This disagrees with you being able to reproduce it: it seems implausible that so many people are asking about terrorist organizations at any given moment, and such a mismatch would have been a huge security risk for OpenAI users.


Polish responses to Polish users are strong evidence against the “responses to other people's requests” hypothesis.

Interesting weak hypothesis, but it makes me wonder why it keeps swapping coding <-> terrorism responses specifically.

Maybe certain responses get 'bucketed' or 'flagged' together into the same high-risk category and reviewed before being returned, and they're getting accidentally swapped at the return stage?

That doesn't explain the public holiday example though.


Interesting find! Thanks for sharing. Curious to see what related training data could be contributing.
