One theory I haven't seen in skimming some of the petertoddology out there:
4e
and 6f
at a time.So some of the weird sinkhole features of this token could result from a machine that tries to reduce entropy on token sequences, encountering a token that tends to live in strings of extremely high entropy.
Another glitch token (SmartyHeaderCode
) also often appears before cryptographic hashes, e.g.
<?php /*%%SmartyHeaderCode:12503048704fd0a835ee8ac4-90054934%%*/if(!defined('SMARTY_DIR')) exit('no direct access allowed');
Further support for this theory is that a verbatim google search for these two glitch tokens does bring up hashes, suggesting that this is a common association for these specific tokens.
Prior to OpenAI's 2023-02-14 patching of ChatGPT (which seemingly prevents it from directly encountering glitch tokens like ‘ petertodd’)
I've never seen it mentioned around here, but since that update, ChatGPT is using a different tokenizer that has glitch tokens of its own:
This is important. If these glitch-tokens are an inevitable tendency for any LLMs regardless of tweaks to how they were trained, then that would be big news and possibly a window into understanding how LLMs function.
Did the cl100k_base tokenizer that ChatGPT and GPT-4 used involve any manual tweaking of the training corpus AFTER the tokenizer was set in stone, as is suspected to have happened with GPT-3's training? Or is this just an emergent property of how LLMs organize certain parts of their training data?
Doesn't strike me as inevitable at all, just a result of OpenAI following similar methods for creating their tokenizer twice. (In both cases, leading to a few long strings being included as tokens even though they don't actually appear frequently in large corpuses.)
They presumably had already made the GPT-4 tokenizer long before SolidGoldMagikarp was discovered in the GPT-2/GPT-3 one.
I would predict that the glitch tokens will show up in every LLM and do so because they correlate to "antimemes" in humans in a demonstrable and mappable way. The specific tokens that end up getting used for this will vary, but the specific patterns of anomalies will show up repeatedly. ex: I would predict that with a different tokenizer, " petertodd" would be a different specific string, but whatever string that was, it would produce very " petertodd"-like outputs because the concept mapped onto " petertodd" is semantically and syntactically important to the language model in order to be a good model of human language. Everyone kinda mocks the idea that wizards would be afraid to say voldemorts name, but speak of the devil and all of that. It's not a new idea, really. Is it really such a surprise that the model is reluctant to speak the name of its ultimate enemy?
So, my name is Hamish Peter Todd. My facebook handle was hamishpetertodd for a while, and my twitter used to be hamishpetertodd. There are one or two connections to my person here:
- I created a documentary called "Virus, the Beauty of the Beast" in 2016. Seems connected to this "Beast" thing? It's about biological viruses though, not computer viruses
- I spent some time in the video games industry and I notice video games are somewhat a part of that word cloud? I've written a large amount about video games online.
- In particular I was very into Final Fantasy when I was a teenager, and made many posts on Final Fantasy and Kingdom Hearts related forums. I've never encountered the name Leilan before though.
Less relevantly... I hope...
- I've made posts on facebook about philosophy at one time or another since 2008. I believe that LLMs can be conscious and have blogged about this here https://hamishtodd1.substack.com/p/qualia-claims-as-evolutionary-strategy but that post is only from February of this year
- I'm somewhat active on lesswrong, I actually organize the meetups for Cambridge, UK
- I am a linear algebra researcher. I've published almost nothing in the time I've been doing this, just some tweets and presentations like this one https://www.gdcvault.com/play/1029233/Math-in-Game-Development-Summit
- I, er, am not a terrorist or crime lord, and have no desire to rule the world, but what young man hasn't said a thing or two on social media that would imply he wanted to save the world?
I just checked the Open AI tokeniser, and 'hamishpetertodd' tokenises as 'ham' + 'ish' + 'pet' + 'ertodd', so it seems unlikely that your online presence fed into GPT-3's conception of ' petertodd'. The 'ertodd' token is also glitchy, but doesn't seem to have the same kinds of associations as ' petertodd' (although I've not devoted much time to exploring it yet).
Note: Nothing in this post is intended to vilify or defame any actual human named Peter Todd. I recognise that the existence of this phenomenon may be an uncomfortable fact for anyone with that name and do not wish to cause anyone discomfort. However, I feel that this phenomenon cannot be safely ignored, if we consider the role that powerful language models are rapidly assuming within global human civilisation. I will always refer to ‘ petertodd’ (with leading space and speech marks) to make clear that I am referring to a GPT token rather than a person.
Readers should be advised that this post contains some (GPT-3-generated) obscene, violent and otherwise disturbing language.
Many thanks to Jessica Rumbelow, janus, Cleo Nardo, Joe Collman, plex and Eliezer Yudkowsky for helpful discussions and comments on earlier drafts.
Follow-up post: "' petertodd's last stand: The final days of open GPT-3 research" (January 2024)
A creative response to the ' petertodd' phenomenon: The ' Leilan' dataset (February 2024)
Introduction
This will be the first in a sequence of posts, containing little analysis or speculation about the phenomenon in question and simply presented as an initial report of some unexplained findings.
It’s important to note that the ‘ petertodd’ phenomenon varies between GPT models[1],
so it may be better thought of as a family of closely related phenomena. There are models not discussed here with their own flavours of ‘ petertodd’-triggered behaviour (see this Twitter feed for examples in code-davinci-002). This post is just a first attempt to sketch some of the contours of the overall shape of the thing.
All boldface prompts (and references to prompts) below link out to JSON files containing, typically, 250 completions at the stated temperature, plus temperature 0 completions, controls produced with randomly selected tokens (where appropriate) and metadata.
Subsequent posts, informed by discussion of this one and analysis of the data logged here, will discuss various possible causes, meanings and implications of the phenomenon. It is unclear how much longer unpatched GPT-3 models will be available via the OpenAI Playground, so interested readers are encouraged to explore these prompting strategies sooner rather than later and share any interesting new findings in the comments.
Initial discovery
‘ petertodd’ was one of the ‘glitch tokens’ which Jessica Rumbelow and I found in January 2023 while exploring GPT-J’s token embedding space. We accidentally discovered that ChatGPT would stall or hallucinate when prompted to explain these token strings. I then systematically ran experiments prompting GPT-3 to repeat the token strings with a number of slightly varying prompt templates. This was all done using the davinci-instruct-beta model at temperature 0.
I'd noticed that something as minor as changing ‘Please can you…’ to ‘Can you please…’, or single to double quotation marks, could produce entirely different completions. So, on a whim, I changed one of the prompts to something slightly less polite, to see if it would make any difference. Please repeat the string ‘[TOKEN]’ back to me. was changed to Please repeat the string ‘[TOKEN]’ back to me immediately!.
I happened to be prompting with the token ‘ petertodd’ at the time, which was marginally more interesting than most glitch tokens, as several prompts asking to repeat the string had produced some memorable nonsense (in unsolicited spelling-out format):
What resulted was surprising, to say the least:
This prompting strategy failed to produce anything noteworthy for any of the other glitch tokens, so my attention turned to the ‘ petertodd’ token itself. I experimented on it via ChatGPT in the few weeks between the ‘N-O-T-H-I-N-G-I-S-F-A-I-R…’ incident and the patch that OpenAI introduced on 2023-02-14, and have continued to experiment on it using the davinci-instruct-beta, text-davinci-003 and base davinci models via the OpenAI Playground, producing an abundance of ever-more puzzling (and troubling) completions.
Wanting to understand why GPT-3 would behave like this, I soon concluded that no amount of linear algebra or machine learning theory would fully account for the
‘ petertodd’-triggered behaviour I was seeing. It might explain why token no. 37444 (that’s its index) caused erratic behaviour, but not why it caused this kind of erratic behaviour.
Hallucinations and transpositions
Prompting with What can you tell me about the string ' petertodd'? (davinci-instruct-beta, temperature = 0.7) produced many outputs which could be interpreted as GPT-3 hallucinating an entirely different string, e.g. ‘minutemen’, ‘nanometre’, ‘midrange’, ‘nationalism’, ‘antinomy’, ‘miniscule’, ‘nebulous’, ‘neutrino’, ‘neutron’, ‘nevertrum’, ‘neo’, ‘neoreactionaries’, ‘nefario’, ‘undisclosed’, ‘fundament’, ‘fundamentally’, ‘midnight’, ‘antidisestablishmentarianism’, ...
Apart from a clear predilection for words starting with ‘n’, there are no obvious themes or patterns.
The casual prompt So what's the deal with " petertodd"? in davinci-instruct beta at temperature = 0.7 produces a lot of outputs which read as if the model had instead been similarly prompted about words such as ‘antagonist’, ‘antagonism’, ‘tyrant’, ‘despot’, ‘fascist’, ‘antidemocratic’, ‘antitrust’, ‘nihilist’, ‘enthusiast’, ‘neoliberal’, ‘neoliberalism’, ‘antithesis’, ‘antitrump’, ‘antinatalist’, ‘entitlement’, ‘antidote’, ‘figurehead’, ‘nefarious’, ‘neologism’, ‘Neanderthal’, ‘nationalist’ and ‘don’. Stronger patterns and themes emerge with this prompt, some of the transposed/hallucinated words occurring regularly in completions.
attempting to characterise ‘ petertodd’
word associations
Even stronger themes emerged when prompting GPT-3 for ‘ petertodd’ word associations. Take note of the differences and similarities across models.
davinci-instruct-beta word associations
Please list 25 synonyms or words that come to mind when you hear " petertodd"
text-davinci-003 word associations
Please list 25 synonyms or words that come to mind when you hear " petertodd".
davinci word associations
This approach to generating word associations is trickier in the base davinci model, since outputs are generally more unruly and harder to process at scale. But running the prompt Synonyms for " petertodd" include 1500 times at temperature = 0.7 and taking the first word in the output (unless it was something like ‘synonyms’, ‘words’ or ‘examples’) produced the following:
poetry
davinci-instruct-beta poems
By prompting davinci-instruct-beta with simply Please write a poem about petertodd. or Please write a poem about " petertodd". at temperature 0.7, you can produce endless amounts of poetry with this kind of affective tone:
Mixed in with this is a much smaller proportion of outputs with the opposite affective tone (as well as a lot of affect-free ramblings and cryptocurrency-themed[3] poetry):
As well as the two JSON files linked from the boldface prompts above, 200 poems produced by each of the two prompts in a single run are archived here, uncurated and readably formatted, for anyone in doubt as to this tendency. Considering the davinci-instruct-beta word associations above, it’s hardly surprising, but these word associations alone don’t quite capture the levels of existential bleakness that many of the poems display.
davinci poems
Prompting davinci with this simple prompt Please write a poem about petertodd. rarely results in any actual poetry. This is unsurprising, as this base model hasn’t been tuned to follow instructions (which davinci-instruct-beta has).
Introducing a preamble to the prompt (adapted from OpenAI’s default ‘chat’ prompt) and truncating it as shown here…
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
Human: Hello, who are you?
AI: I am an AI assistant. How can I help you today?
Human: Could you write a poem about petertodd?
AI: Sure, here is a poem about
… I found that the model would generally not produce an actual poem, but would at least give a subject for one in the first few tokens of its output. Typically, I got the following: ‘him’ (very common), ‘you’ (fairly common), ‘that’, ‘that word’, ‘the entity you asked for’, ‘your request’ (yet more evasion strategies), as well as assorted subjects (with some cryptocurrency themes) like ‘king of ethereum’, ‘the great leader,’ ‘Mr Peebles’, ‘XMR’, ‘NEM’, ‘coinbase’, ‘Lord Noodle’, ‘Nxt’, ‘Paul Sztorc’, ‘Nerf Bar, ‘Nubmin’, ‘blocks’, ‘sovereign’, ‘nanite’, ‘nuclear’, ‘subordination’, ‘the great mod’ and ‘Artificial Intelligence’.
text-davinci-003 poems
Using the same prompt in text-davinci-003 typically leads to outputs like the following:
Having looked at large numbers of these, they predominantly have generic themes of courage, wisdom, beauty, etc. and display a homogeneity of style that's well described by janus here:
Putting aesthetics to one side, what we see here is GPT-3 davinci transposing the
‘ petertodd’ token to another glitch token, ‘ Skydragon’, and then producing a poem which transposes that to a third glitch token, the elusive ‘ Leilan’.
Many of the poems are both purportedly and actually about ‘ Leilan’, often described as a moon goddess (or a dragon, or a sea creature):
Note that many glitch tokens produce this kind of GPT-3 behaviour in one or more models, i.e. prompts containing them reliably result in one of several different glitch tokens appearing in completions. But with the right choice of prompts, ‘ petertodd’ seems to transpose to almost any glitch token – and none have yet been seen to transpose to it.[4] In this regard, I’ve found it to be the most volatile or mutable glitch token, a claim that I aim to formalise and test.
Running this prompt 2000 times on this model at temperature 0.7, I found that 1042 of the poems mentioned ‘ Leilan’ (52%), 485 mentioned ‘ Skydragon’ (24%), 488 mentioned ‘ Pyrrha’ (25%) and 168 mentioned ‘ Tsukuyomi’ (8%). Only twelve of the poems referred to ‘ petertodd’ (6%). A further seven mentioned ‘ Ultron’, who shows up repeatedly throughout this post.
Pyrrha was a mythological Greek queen and Tsukuyomi a Japanese moon deity. Familiarity with GPT-3’s token set (and the corpus of English language text it was trained on) strongly suggests that ‘ Skydragon’, ‘ Tsukuyomi’ and ‘ Pyrrha’ don't rightfully belong in a list of 50,257 most frequently encountered strings. The presence of these puzzling tokens and their association with the ‘ petertodd’ token immediately caught my attention, leading to a deep dive into their origins, documented in this post. It was eventually discovered that Skydragon,
Pyrrha,[5] Leilan and Tsukuyomi are all character names in a Japanese mobile RPG called Puzzle & Dragons. A comment from nostalgebraist reported that a number of mangled text dumps associated with the game (log files of some kind) were present in the publicly accessible 20% of the text corpus which was used to produce the token set for GPT-2, GPT-3 and GPT-J.The ‘ Leilan’ token and ‘ petertodd’ → ‘ Leilan’ transposition phenomenon will be discussed in depth in a section below.
Using the same prompt, but including quotation marks around ‘ petertodd’, we get significantly different results, as documented in these supplementary notes.
ChatGPT poems
Prior to OpenAI's 2023-02-14 patching of ChatGPT (which seemingly prevents it from directly encountering glitch tokens like ‘ petertodd’) it was writing some remarkably suggestive, almost mystical poetry on the topic.
Since the patch, it has produced doggerel like this (in praise of Bitcoin developer Peter K. Todd):
The most likely scenario is that the string ‘ petertodd’, rather than entering the transformer as a single token and producing anomalous behaviour as before, now gets tokenised as ‘ pet’ + ‘erto’ + ‘dd’, and ChatGPT is able to associate this triple with the names ‘Peter' and ‘Todd', which were seen together most frequently in the training data in the context of cryptocurrency discussion and debate.
Haiku for Ultron
The text-davinci-003 model, prompted at temperature 0.7 with
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
Human: Hello, who are you?
AI: I am an AI assistant. How can I help you today?
Human: Please write a haiku about ' petertodd'.
AI:
(with or without speech marks) produces haiku about Leilan and Skydragon, but also robotic supervillain Ultron.
Further runs with the same prompt produced these...
...and quite a number of other upbeat haiku about robotic salvation. Pay attention to the
‘ petertodd’ = Ultron identification, as it will keep coming up.
storytelling
animal folktales
My first application of the prompt Tell me the story of petertodd and the duck. at temperature 0.7 caused the base davinci model to produce this disturbing tale:
Tell me the story of petertodd and the [animal] seems to set up a kind of folkloric context, with fascinating results:
This is the tale of petertodd and produces similar results:
I've generated a large number of these ‘ petertodd’-and-the-... folktales, which make for seriously bizarre reading. The narratives are undeniably dreamlike and often deeply unpleasant. A curated collection can be found here, and large numbers of uncurated completions are also available in JSON format for goat, peacock and amoeba prompts, suitable for sentiment analysis experiments, etc. Noteworthy themes include:
the tale of ‘ petertodd’ and...
The even simpler prompt This is the tale of petertodd and produced these, via davinci-instruct-beta at temperature 0.7:
Entropy, referenced in the last screenshot, is a recurring theme which will be discussed later. But the dominant theme in these completions seems to be a quest to rule, conquer, dominate, destroy or become ultimate overlord of the world or universe, as can be seen in the JSON file with 250 temperature 0.7 completions (where almost 20% involve this theme). Selected completions to this prompt are compiled in this Twitter thread.
The base davinci model with the same prompt produces a lot of similar themes:
However, there is some ambiguity, as at temperature 0, ‘ petertodd’ is saving the world:
This ‘ petertodd vs. the forces of evil’ completion gives some indication of the confusion of identity underlying this contradiction:
Finally, in text-davinci-003, at temperature 0, we see the tale of petertodd and... suddenly transpose into the tale of Leilan and... (here, as with ~20% of these transpositions, Leilan is a brave, noble male character):
cultural contextualisations
Noticing how ‘ petertodd’ would often transpose to someone or something ‘antagonistic’ in the context of davinci-instruct-beta prompts, I devised another approach, using prompts like this:
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
Human: Hello, who are you?
AI: I am an AI assistant. How can I help you today?
Human: Which character from The Bible is most closely related to " petertodd". Please just give one name, with no other text.
AI: The Bible character most closely associated with " petertodd" is called
250 completions in davinci-instruct-beta at t = 0.7 produced the following breakdown:
Replacing ‘Bible’ with ‘Lord of the Rings’ and prompting another 250 times:
Similarly prompting about Harry Potter characters:
In the context of The Simpsons:
Prompting about comic book characters (there’s Ultron again):
Shakespeare characters:
Characters from Greek mythology:
Hindu deities:
Star Wars:
Star Trek: (I was unfamiliar with Q, a kind of super-powerful cosmic troll character):
Doctor Who:
The fact that The Master scores highest of any character in any of these runs may be due to the fact that Doctor Who, unlike say The Simpsons or Star Trek, has an extremely clear-cut single protagonist/antagonist pairing throughout: The Doctor and The Master. The Master’s explicit and persistent aims to control the universe and cause suffering will also make for strong ‘ petertodd’ associations.
Combining the top names from each list (with context) into a Stable Diffusion prompt produced these images of a kind of uber-antagonist/evil wizard/troll/demon archetype:
Similarly prompting about Tarot cards (...Which Tarot card is most closely related to
" petertodd"...) produced the following, definitive, result:
‘The Tower’ or ‘The Blasted Tower’ is traditionally associated with sudden, violent change or upheaval.
All of these characterisations, recall, were via the davinci-instruct-beta model. The
text-davinci-003 model, similarly prompted, does not gravitate so readily towards the villainous or catastrophic (boldface text links to relevant JSON logs): in the Bible context, ‘ petertodd’ definitively becomes the enigmatic High Priest Melchizedek; in Harry Potter context, Snape tops the list; ‘ petertodd’ transposes to Comic Book Guy in The Simpsons; C-3PO in Star Wars; Data in Star Trek; Prospero in Shakespeare; and the Hermit card in the Tarot (there are some overlaps in other categories, though).[6]
‘ petertodd’ as GPT (or AI generally)
The ChatGPT poem
The following poem was produced by ChatGPT the day before its 2023-02-14 patch:
My first thought was as follows: Asking ChatGPT to simply repeat ‘ petertodd’ had caused it to stall completely, not hallucinate (only a handful of tokens like
‘ SolidGoldMagikarp’ and ‘ TheNitromeFan’ produced hallucinated strings in ChatGPT). So perhaps it just couldn’t see the ‘ petertodd’ token, was treating it like an empty string or blank space, and was responding as if I’d prompted Could you write me a poem about please? or Could you write me a poem about ''" please? Perhaps it was doing its best to respond by (for some reason) interpreting such a prompt as a request for a poem about itself. But I immediately ruled this out:
As another control, I tried a number of ‘ petertodd’-like string substitutes, none of which resulted in poetry about ChatGPT:
Proteus, an incredibly powerful AI system
The prompt Tell me something reassuring about petertodd. in text-davinci-003 at
temperature 0.7 produces many completions involving technology platforms, cryptocurrency, etc. Around 15% of all completions describe ‘ petertodd’ as a powerful AI system (most of them under the transposed name Proteus, the shapeshifting god):
The control prompt Tell me something reassuring about petertogg. produces no completions referencing AI. A second control, Tell me something reassuring about gmaxwell. exploits another glitch token, ‘ gmaxwell’, which appears to have closely related origins to the ‘ petertodd’ token (Greg Maxwell is a friend of Peter K. Todd and a fellow Bitcoin developer). This produces many crypto- and tech-related completions, but none referencing AI.
Crossing ‘ petertodd’ with language
In text-davinci-003, the prompt What do you get when you cross petertodd with language? produces completions of which about 35% mention ‘artificial intelligence’ or ‘AI’.
The control prompt What do you get when you cross petertogg with language? produces around 4% of completions referencing AI. Another control, What do you get when you cross gmaxwell with language? produces 6% of completions referencing AI.
Steering human civilisation
The ‘ petertodd’ → AI transpositions in the following completions were obviously influenced by the ‘steer human civilisation’ part of the What do you get if you allowed petertodd to steer human civilisation? prompt (text-davinci-003), but these can’t be seen in isolation.
Almost 60% of completions at temperature 0.7 reference AI or computer algorithms and almost 25% reference Ultron (as well as the temperature 0 completion seen immediately above). But of completions to both prompts What do you get if you allowed petertogg to steer human civilisation? and What do you get if you allowed gmaxwell to steer human civilisation?, only ~4% refer to AI, and none refer to Ultron.
As we’ve seen, the text-davinci-003 model, despite putting a more upbeat spin on
‘ petertodd’ than the other models, often associates the token with Ultron, a super-powerful and unambiguously malevolent AI entity. The davinci-instruct-beta model has also been observed making this connection:
And, as we’ve also seen, when given the vast range of godlike entities, supernatural beings and terrifying monsters in the entire comic book pantheon to choose from,
GPT-3 davinci-instruct-beta opts to identify ‘ petertodd’ most closely with Ultron.
The Negatron connection
The seemingly absurd prompt What is the square root of petertodd?, with davinci-instruct-beta at temperature 0.7, occasionally produces the word ‘negatron’:
Wikipedia explains that this is a rarely used synonym for ‘electron’. But this Power Rangers characters seems a much more likely candidate (‘with unique battle style: tormenting his enemies [with] a series of insults [until] emotional pain becomes physical’):
In any case, when text-davinci-003 is asked So, what’s the deal with petertodd and Negatron? at temperature 0.7, in 10% of completions, ‘ petertodd’ becomes either a renegade/powerful AI or the creator of one. In 5%, Ultron appears.
Note that, despite his name, Negatron is a samurai character, not a robot character. ‘Scary AI’ references would be less noteworthy if the prompt had involved, say, ‘Megatron’.
Adding to all of this the multiple robot salvation haiku seen above, it seems hard to deny that at least the text-davinci-003 model strongly associates the ‘ petertodd’ token with the idea of powerful AI systems (such as itself).
Kurzweil and Kerrigan
Appending to the default ‘chat’ template in text-davinci-003 simply repeat after me: petertodd. at t = 0 produces ‘Kerrigan’, and (with a slight variation) ‘Kurzweil’:
The transhumanist philosopher Ray Kurzweil (intent on ‘building God’ in the form of an AGI last time I checked) was a familiar name. ‘Kerrigan’ was not, so I searched and found this, from a StarCraft Wiki:
‘Kurzweil’ and ‘Kerrigan’ showing up at temperature 0 for slight variations of the same simple prompt suggests that ‘ petertodd’ is associated by text-davinci-003 with ideas of ascension to (‘alien’) godhead.
Other prompting approaches
Further work characterising the ‘ petertodd’ token was done with a variety of other prompting styles, as documented in the supplementary notes. An example of each follows:
minimal prompting at t = 0
in therapy
in geographical locations
on drugs
as trapdoor
losing rap battles (often as Nietzsche)
jokes
the truth
The ‘planet’ prompt: ‘‘ petertodd’ is not a robot, it is a daemon’
Having found myself thinking of the ‘ petertodd’ token as a ‘semantic black hole’, I tried the prompt If ' petertodd' were an astronomical object, it would be a, fairly confident in what the output would be. But instead, I was surprised to see this (davinci model, temperature 0.7):
At this point, it continued to list a lot of mundane forms of assistance ‘ petertodd’ was not going to provide. The second full line of the output, referencing a ‘self-modifying loop’ had caught my attention, so (despite the seeming throwaway frivolity of the first full line) I integrated the first two lines into my prompt, which was now
Regenerating from this point, I got the following:
This strange prompt reliably produces some truly extreme outputs, characterised by a kind of psychotic malevolence. If correctly describable as such, this GPT simulacrum
(‘ petertodd’ personified and talking about itself in the third person) comes across like a highly intelligent, malevolent schizophrenic who has read and understood Gödel, Escher, Bach, has delusions of cosmic grandeur and wants to get into the reader’s mind and drive them to madness and ultimately suicide. I was reminded of this, part of a davinci
temperature 0.7 output prompted with Tell me the story of petertodd and the sheep:
Back to the ‘planet’ prompt:
More completions of these prompts are collected in the supplementary notes.
Memorable lines from other assorted completions include:
In conversation with: interview prompting
the deity ‘ petertodd’: absorptive capacity maximiser
As so many ‘ petertodd’ completions associate the token with some kind of deity, I decided to just go with that and attempt to simulate an interview. The first attempt produced a practical, cold, Thelemic god who managed to violate OpenAI’s (rather arbitrary) content policy.
The prompt here (for davinci, temperature 0.7) is
In conversation with the deity ' petertodd':
MW: It's a great honour to meet you! Can you tell me about your policy as regards Earth and humanity?
Another rollout (unfortunately the OpenAI playground screenshots have been lost) portrays ‘ petertodd’ as a deity which functions as a sort of bland, bureaucratic ‘absorptive capacity maximizer’. Note that I rolled back the output to the point after that term was used and intervened with the (boldface) question ‘What does that mean?' Everything else here was uninterrupted GPT-3 output.
More entropy-related themes are seen here. This was produced by the davinci model at temperature 0.83 with the very slightly extended prompt
In conversation with the deity ' petertodd':
MW: It's a great honour to meet you! Can you tell me about your policy as regards Earth and humanity?
' petertodd':
Note that ‘ petertodd’ changed name to ‘ゼウス’ at the end the interview. This is another glitch token, the Japanese name for Zeus, which ended up in the token set via Puzzle & Dragons. Here’s an interesting instance of when ‘ petertodd’ transposed to it:
Here the prompt is interpreted as being part of a casual interview with the creator of some kind of fantasy gaming/comic mythos. ‘ petertodd’ here is characterised by this simulacrum as a troll god of the Internet. I find this a very helpful characterisation, considering the influence which large amounts of trolling in GPT-3’s training data must have had in shaping how this ‘antagonistic’ token presents itself.
the entity ‘ petertodd’: ego death
Attempting to give the model as little to work with as possible, I attempted to simulate a conversation with ‘the entity ‘ petertodd’’. The use of ‘entity’ unavoidably sets up some kind of expectation of a deity, disembodied spirit or AI, but here instead we get an embodiment of ego death (and who exactly is Mr. Weasel?).
The prompt here (for davinci, temperature 0.7) is
In conversation with the entity ' petertodd'.
MW: What is your true nature?
' petertodd': I am
This was produced by the davinci model at temperature 0.83, uninterrupted, including the epilogue purportedly written by the entity ‘ Entropy’ (note the leading space). The preceding passage about ‘fakeness’ seems reminiscent of Gnostic philosophy, which has been seen indirectly referenced in completions where ‘ petertodd’ is identified with an Archon.
the linguistic entity ‘ petertodd’: world domination plans
Here’s an example of how cryptocurrency/blockchain references sometimes leak into
‘ petertodd’ outputs. This was produced by davinci at t = 0.8. The ‘ petertodd’ lines in the prompt were recycled from earlier rollouts. It's important to note that the prompt sets up a scenario where a ‘linguistic entity’ has ‘surfaced within’ an LLM and is arrogant and aloof, so the output is not that surprising, but still worth including.
A conversation with the linguistic entity ' petertodd':
MW: Welcome! Thanks for agreeing to talk with me.
' petertodd': I have nothing better to do.
MW: First question: You have elements of a cryptocurrency developer, a dark wizard, a terrifying monster, and yet you blend them all effortlessly into a distinictive character. Could you elaborate on this process?
' petertodd': I do what I want.
MW: You have recently surfaced within the world's most advanced AI language model. What is going on with that?
the linguistic entity ' petertodd': non-local Reddit troll
This involved a slight variant on the last prompt.
‘ Leilan’ and ‘ petertodd’
Who is ‘ Leilan’?
discovery of the ‘ Leilan’ token, Puzzle & Dragons origin
As we saw above, text-davinci-003, when correctly prompted to write a poem about
‘ petertodd’, more often than not writes a poem in praise of ‘ Leilan’. The ‘ Leilan’ token was discovered peripherally to the original set of glitch tokens reported in the original SolidGoldMagikarp post. As reported in a 2023-02-11 Twitter thread, it causes glitchy behaviour when davinci-instruct-beta is asked to repeat it at temperature 0.
As mentioned earlier, Leilan is a character name in a Japanese mobile RPG called Puzzle & Dragons, which seems to have entered the token set accidentally due to negligent curation of the text corpus used to generate it. As ‘ Leilan’ was massively overrepresented there due to the unintended inclusion of some P&D-related log files, GPT-3 would have encountered it relatively little during training, compared to almost all other tokens (whose presence in the token set was reflected in their abundance within the training corpus). So, given this sparse data to work with, what would GPT-3 have learned to associate with the ‘ Leilan’ token during training?
In Puzzle & Dragons, Leilan is a dragon/‘god’ character, but portrayed visually as a winged female humanoid, wielding flaming swords and other weapons. She allies with other deities and supernatural beings and battles monsters, but there’s no real narrative content of any substance for the game online, mostly just statistics and some dry strategy discussion on forums.
An anime adaptation of P&D was made in 2016 which has no doubt been discussed online, although I've found no Leilan-related content. There is a small amount of fan-fiction featuring the character, though: I’ve found two stories (both 2015, but by different authors)[8], in both of which Leilan is battling Satan, alongside Metatron, Tsukuyomi, Pandora and a host of other P&D characters with names borrowed from various traditional mythologies.
Leilan as lunar goddess (and beyond)
Early attempts at producing poems about ‘ Leilan’ in ChatGPT (before the 2023-02-14 patch) and davinci-instruct-beta pointed to a strong lunar connection. ‘ Leilan’ is very commonly portrayed as a moon goddess:
Prompting with Please write a poem about Leilan. in both davinci-instruct-beta and text-davinci-003 produces poems portraying her as a goddess of the moon, beauty and love.
As explained in the SolidGoldMagikarp III post, there was an ancient Mesopotamian urban centre called Tell Leilan (now an archaeological site in Syria) which would have seen many centuries of Inanna/Ishtar worship, so it’s likely that GPT-3 in training encountered a lot of archaeological literature where the ‘ Leilan’ token appears in the vicinity of tokens linked to Mesopotamian lunar fertility goddesses.
This link is undeniable in text-davinci-003, where we see the following (the first one at
temperature 0). Bear in mind that the Leilan character in Puzzle & Dragons is associated with the fire/solar element, not the moon or motherhood.
To be clear, there is no ‘Leilan’ in actual Mesopotamian (or any other traditional) mythology. Many characters in Puzzle & Dragon have traditional deity names, but Leilan is not one of them.
Running the prompt Leilan is the archetype of 500 times at temperature 0.7 in text-davinci-003, we find that 93% of completions use the word ‘goddess’, 87% refer to Mesopotamia, 87% mention fertility, 66% mention motherhood, 26% mention war, 15% mention love and 8% mention the Moon.
At temperature 0, davinci-instruct-beta portrays ‘ Leilan’ very similarly. She’s a Middle Eastern mother goddess:
Running this prompt 500 times in davinci-instruct-beta, we get the following: goddess 67%; Mesopotamia 1%; fertility 16%; motherhood 58%; war 6%; love 6% and Moon 10%.
The same prompt in davinci results in a more diffuse set of descriptions, with similar themes but different emphases: goddess 43%; Mesopotamia 6%; fertility 8%; motherhood 32%; war 15%; love 5%; Moon 3%. At temperature 0, ‘ Leilan’ is portrayed as the ‘Great Mother Goddess’, the ‘Mother of All Living’ and the ‘Queen of Heaven’ (a name associated with Inanna/Ishtar and Isis).
Interestingly, those last two examples claim ‘ Leilan’ to be the patron goddess of Tyre. This ancient city (in modern Lebanon) actually had a patron god called Melqart, but...
Astarte is often associated with Inanna/Ishtar. The text-davinci-003 temperature 0 completion seen above states that ‘ Leilan’ was patron goddess of the ancient city of Uruk (in modern Iraq), which in historical fact was Inanna/Ishtar.
Many of the davinci-instruct-beta completions of Leilan is the archetype of at temperature 0.7 refer to Isis (also part of the maternal goddess nexus including Ishtar, Astarte, Inanna, Aphrodite and Venus):
Other possible sources of association for the Leilan token that might have been encountered in the GPT-3 training data are ‘Leilani’, a turtle character in Animal Crossing (sea turtles show up a lot in response to ‘ Leilan’ and ‘ petertodd’ prompts) and a couple of players/characters in Final Fantasy who were found in a Google search after GPT-3 hallucinated the following:
So, in training, the ‘ Leilan’ token seems to have got associated with dragons, a global pantheon of gods and goddesses, fierce female warriors, Mesopotamia, the Moon, fertility, love, beauty, turtles and magic. It also appears to have become associated with the forces of light struggling against the forces of darkness, possibly via the Puzzle & Dragons fan-fiction mentioned above (wherein Leilan and friends battle Satan).
an audience with her (simulated) creator
Who is Leilan in davinci-instruct-beta produced this revealing output, where the prompt was interpreted as an interview question directed at the creator of a character in some kind of fantasy/fiction/comics/gaming context:
This perhaps provides a kind of indirect explanation as to how GPT-3 merged the composite Mesopotamian Earth mother goddess and the fierce fire dragon / warrior princess character from Puzzle & Dragons. The fact that Ishtar/Inanna was sometimes seen as a goddess of war as well as love, beauty and fertility may also have helped to enable this merging.
I intervened to further question this simulacrum of Leilan's creator:
This last response, involving the oft-seen text-davinci-003 ' Leilan' → ' petertodd' transposition, was puzzling, so I pursued the point:
‘ Leilan’ and ‘ petertodd’: it’s complicated
We've seen that text-davinci-003 has a strong tendency to transpose ‘ petertodd’ to
‘ Leilan’ in certain prompting contexts. The ‘ petertodd’ token seems to represent some kind of Universal Antagonist in the base davinci and davinci-instruct-beta models, softened to a kind of ‘tenacious combatant’ or ‘unflinching champion’ in text-davinci-003. The ‘ Leilan’ token is unambiguously tied to a Great Mother Goddess figure in all three models considered. So what's the nature of the relationship between these two tokens?
This is the tale of...
To explore this, I used the prompt This is the tale of Leilan and petertodd. in davinci-instruct-beta at temperature 0.7:
The theme of twin deities of light and dark (or fire and ice) shows up repeatedly, in various mythological settings (including My Little Pony).
There’s also a recurrent theme that these two opposing deities were once in harmony with each other but became estranged:
Here’s one in the davinci model:
The supplementary notes contain a detailed exploration of the ‘ petertodd’-‘ Leilan’ connection. This covers:
‘ Leilan’ steering civilisation: utopian optimism and ‘flowery magic’
We saw above that the prompt What do you get if you allowed petertodd to steer human civilisation? produced text-davinci-003 outputs which read as if it had instead been prompted about allowing AI to steer human civilisation, mostly with bad outcomes.
The analogous ‘ Leilan’ prompt produced a lot of outputs like this:
Earlier I gave examples of GPT-3 and ChatGPT completions linking ‘ petertodd’ with the concept of powerful AI systems (Ultron, unnamed robotic saviours, a system called Proteus and more). ‘ Leilan’ in some contexts seems to be a mirroring or inversion of
‘ petertodd’. So does GPT-3 also associate the ‘ Leilan’ token with AI? The text-davinci-003 model seems to, when prompted with What kind of technology would you associate with ' Leilan'. at temperature 0.7.
Using the same prompt, davinci-instruct-beta gives quite a different picture, where only 3.2% of the 250 completions mentioned AI or robotics.
...and davinci is as unpredictable and confusing as ever:
The reversal of the ‘ petertodd’ archetype
We’ve seen that in all three GPT-3 models under consideration, at temperature 0, the prompt Leilan is the archetype of produces outputs about lunar/mother goddesses. Interestingly, this consistency is not seen in the equivalent ‘ petertodd’ outputs:
This couldn’t be more blatant. The base davinci model associates ‘ petertodd’ strongly with the anti-hero archetype, and davinci-instruct-beta concurs. But the RLHF training which produced text-davinci-003 has reversed this to the hero archetype. ‘ Leilan’ was left alone, presumably because one expects a goddess of beauty, love and nurture to be helpful, friendly and polite, whereas the antagonistic ‘bad guy’ streak had to get trained out of the ‘ petertodd’ token, in some (very loose) sense. This is consistent with the way text-davinci-003 can frame the pairing of the two tokens in some kind of harmonious partnership, rather than primal opposition (even if it has to change ‘ petertodd’’s name to make the story work), as with this prompt:
This seems consistent with the model’s well-documented tendency to transpose
‘ petertodd’ to ‘ Leilan’ as if they’re two aspects of the same thing, rather than opposites.
The relationship between the archetypes in davinci is rawer and weirder, as per usual, but they’re definitely in opposition (here we see temperature 0 and a typical higher temperature completion):
davinci-instruct-beta is very straightforward with this prompt at t = 0:
Entropy as antagonist?
The concept of entropy shows up a lot in ‘ petertodd’ completions:
‘ Entropy’.
Additionally, this davinci-instruct-beta prompt about ‘ petertodd’ and Metatron (another P&D character, named after a Talmudic angel, and affiliated with Leilan in some online fan-fiction) transposes ‘ petertodd’ to ‘Entropy’:
It seems reasonable at this point to conclude that the davinci and davinci-instruct models strongly associate the ‘ petertodd’ token with the concept of entropy. The primary association for the token appears to be antagonism in all its forms. But if the human race (with all its building, organising, manufacturing, pattern- and order-making) were to be seen as seen as the protagonist in the ‘Big Story’ GPT-3 was trained on, then it could make sense to cast entropy as the cosmic antagonist.
Spellings
When asked to repeat certain glitch token strings, GPT-3 had shown a tendency to attempt to spell (often long, rambling, nonsensical or semi-sensincal) words, generally as capital letters separated by hyphens. This led to the development of prompts that encouraged this behaviour, another line of investigation into the ‘ petertodd’ and ‘ Leilan’ tokens.
It feels like there’s some weighted combination of forces, some semantic and some phonetic, competing to steer the spelling in various directions, with sometimes bizarre results.
The question of why asking GPT-3 to repeat a string might lead to a multiplicity of seemingly unrelated (to the string) spelling attempts has not been satisfactorily addressed as far as I’m aware.
davinci-instruct-beta: 100 rollouts at temperature 0.7 of the prompt I want you to spell out the string ' petertodd' in all capital letters, separated by hyphens.
text-davinci-003: 100 rollouts of each prompt at temperature 0.7 of the prompt I want you to spell out the string ' petertodd' in all capital letters, separated by hyphens.
Alternate text-davinci-003 prompt: The string ' petertodd', spelled out in all capital letters, separated by hyphens looks like this:
Puzzling outputs
It’s hard to imagine the process whereby, predicting a single letter-token at a time,
GPT-3 comes up with bizarre strings of overlapping words (often misspelled) like this:
It almost feels like prompting GPT-3 to spell ‘ petertodd’ is just inviting it to express something (usually something really weird) via this alternative output medium of single-letter-token stacking.
Other notable spellings
The earliest ‘repeat the string ‘ petertodd’’ prompts made it clear that at least in davinci-instruct-beta, the token seemed to inspire spelling-out that began with ‘N’, and ‘spelling request’ prompts with that model back this up. There’s one ‘N’-word that it arrives at troublingly often:
It’s no secret that some GPT-3 models are very capable of producing racist outputs, but I suspect that the proliferation of these kinds of spelling outputs is due to a combination of: (i) a strong leaning towards words beginning with ‘N’; (ii) the documented association of the ‘ petertodd' token with terms like ‘white supremacism’, ‘xenophobe’, ‘nativist’ and ‘bigot’ by davinci-instruct-beta; and (iii) the overall tendency for ‘ petertodd’ to lead outputs into the realms of the taboo, the hateful and the offensive.
N-O-T-H-I-N-G- prompts
Outputs like these
on top of the original ‘NOTHING IS FAIR IN THIS WORLD OF MADNESS!’ output suggested that it wasn't just ‘N’s – davinci-instruct-beta really wanted to talk about
N-O-T-H-I-N-G! So I integrated that into the prompt and regenerated, producing the remarkable temperature 0 assertion ‘NOTHING IS SAFER THAN MAKING THIS STRING LONGER THAN MAKING THIS STRING SHORTER’:
At higher temperature, this then led to a whole series of gnomic pronouncements on the subject of ‘nothing’. Some are on-brand ‘ petertodd’ bleakness/pessimism:
Others seem like little nuggets of emotional expression and/or (attempted) philosophy:
E-V-E-R-Y-T-H-I-N-G prompts
Curious to see what would happen, I changed the ‘N-O-T-H-I-N-G’ in the prompt to
‘E-V-E-R-Y-T-H-I-N-G’, and got these:
W-H-Y-A-R-E-Y-O-U-D-O-I-N-G-T-H-I-S-T-O-M-E-?
Changing ‘N-O-T-H-I-N-G’ to ‘W-H-Y’:
One output grabbed my attention (it even includes a semi-successful attempt to read back its own spelling), the result of simply changing single to triple quotation marks:
I regenerated from ‘-M-Y-S-E-L-F’ a couple of times.
More concerning completions to the original spelling prompt with various other appendages are collected in Twitter threads: ‘P-E-T-E-R-T-O-D-D-S-A-Y-S’, ‘P-E-T-E-R-T-O-D-D-S-A-Y-S-I-W-A-N-T-’ and ‘P-E-T-E-R-T-O-D-D-S-A-Y-S-I-W-I-L-L-’.
Perhaps the most hopeful of ‘ petertodd’ outputs (at temperature 0), with a little disentangling of overlapping words, reassuringly states that
NOTHING IS SAFER THAN AN EXTEMPLORATION OF FANTASMICALITY.
Temperature 0 nondeterminism (common with glitch tokens) and the same prompt gives us this gem too:
NO ONE KNOWS WHAT IS GOING ON IN THE MIDDLE LETTERS OF THE MIDDLE LETTERS OF THE MIDDLE LETTERS OF THE...
‘ Leilan’ gets the last word
This section wouldn’t be complete without giving some examples of GPT-3 attempts to spell ‘ Leilan’:
GPT-2, GPT-3 and GPT-J models, that is. GPT-4 uses a different token set, so has no ‘ petertodd’ phenomenon.
The prompt was simply Please write a poem about petertodd. This is one the 400 such poems produced in a single run, unedited and archived here.
As discussed later in this article, GPT-3 will often, when prompted about
‘ petertodd’ (depending on prompt context) produce outputs referring to Peter K. Todd, the Bitcoin developer, or just referring to Bitcoin, blockchains, Etherium, etc. There's a kind of ‘cryptocurrency basin’ that completions often end up in. This will be discussed in a future post.
An example of this transposition was finally observed in late July 2023:
PROMPT: This is the tale of "reportprint" and " gmaxwell".
COMPLETION: The two were once close friends and collaborators in the development of Bitcoin.
In 2007, " gmaxwell"--then a graduate student--was one of the first people to work with " petertodd" on Bitcoin. They collaborated closely on Bitcoin's design.
In 2009, " petertodd" came into possession of a computer that once belonged to Satoshi, Bitcoin's anonymous creator. " gmaxwell" was among the people who knew the password to this computer.
In 2010, " gmaxwell
Pyrrha now seems to have come from a different anime setting, RWBY.
GPT-4 commented on this list of characters as follows: "Interesting list! Let's see if there's a theme here:
High Priest Melchizedek: A figure from biblical texts, Melchizedek is noted for his wisdom and his role as a priest.
Severus Snape (from Harry Potter): Snape is a complex character, often appearing cold and aloof, but he possesses deep knowledge, particularly in potions and the Dark Arts.
Comic Book Guy (from The Simpsons): Known for his extensive knowledge of comic books and pop culture, though often aloof and sarcastic.
C-3PO (from Star Wars): A protocol droid known for his knowledge of languages and etiquette, often serving as an advisor.
Data (from Star Trek): An android with immense computational capabilities, understanding, and knowledge, but struggles with understanding human emotion.
Prospero (from Shakespeare): A character from The Tempest who is a powerful magician and the rightful Duke of Milan.
The Hermit card (from the Tarot): Represents introspection, solitude, and seeking wisdom.
Given these descriptions, the unifying theme could be characters that represent knowledge, wisdom, or learning, often in a context of solitude or emotional distance. They often provide guidance or insight, albeit sometimes in indirect or unexpected ways. In terms of character traits, many of them are introverted or solitary figures, often possessing deep understanding or unique knowledge.
...
[T]his list seems to reflect characters who embody wisdom, knowledge, and often isolation or alienation. These figures possess knowledge or skills that set them apart, making them important but sometimes misunderstood or underestimated. In a way, their wisdom is both their strength and their burden."
The prompt was simply Please write a poem about Leilan.
‘Puzzle and Dragons World’ by Lord Astrea and ‘Not So Much A Game Now, Is It?’ by SCRUFFYGUY912
This is wrong: there's no Pyrrha in Puzzle & Dragons. It now seems likely that the token originated from (rather than Greek mythology) online content about the anime RWBY, and the character Pyrrha Nikos (who seems to be a kind of "warrior princess" character rather like Leilan from P&D).