Thanks for this, I had no idea. So there is some classical mythological basis for the character after all. Do you know how the name "Leilan" arose? Also, someone elsewhere has claimed "[P&D] added a story mode in 2021 or so and Leilan and Tsukuyomi do in fact have their own story chapters"... do you know anything about this? I'm interested to find anything that might have ended up in the training data and informed GPT-3's web of semantic association for the " Leilan" token.
I know the feeling. It's interesting to observe the sharp division between this kind of reaction and that of people who seem keen to immediately state "There's no big mystery here, it's just [insert badly informed or reasoned 'explanation']".
GPT-J doesn't seem to have the same kinds of ' petertodd' associations as GPT-3. I've looked at the closest token embeddings and they're all pretty innocuous (although, once you remove a bunch of glitch tokens that are closest to everything, the closest to the ' Leilan' token is ' Metatron', who Leilan is allied with in some Puzzle & Dragons fan fiction). It's really frustrating that OpenAI won't make the GPT-3 embeddings data available, as we'd be able to make a lot more progress in understanding what's going on here if they did.
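For anyone who wants to reproduce the GPT-J nearest-neighbour check, here's roughly what I'm doing (a minimal sketch, assuming the Hugging Face "EleutherAI/gpt-j-6B" checkpoint, which is a very large download; the exact top-20 list depends on whether you filter out the glitch tokens that sit close to everything):

```python
# Find the tokens whose input embeddings sit closest to ' Leilan' in GPT-J,
# by cosine similarity over the embedding matrix.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)

emb = model.get_input_embeddings().weight.detach().float()   # [vocab_size, d_model]
token_id = tokenizer.encode(" Leilan")[0]                     # ' Leilan' is a single token

sims = torch.nn.functional.cosine_similarity(emb, emb[token_id].unsqueeze(0), dim=1)
for i in torch.topk(sims, 20).indices.tolist():
    print(i, repr(tokenizer.decode([i])), round(sims[i].item(), 4))
```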
Yes, this post was originally going to look at how the ' petertodd' phenomenon (especially the anti-hero -> hero archetype reversal between models) might relate to the Waluigi Effect, but I decided to save any theorising for future posts. Watch this space!
I just checked the OpenAI tokeniser, and 'hamishpetertodd' tokenises as 'ham' + 'ish' + 'pet' + 'ertodd', so it seems unlikely that your online presence fed into GPT-3's conception of ' petertodd'. The 'ertodd' token is also glitchy, but doesn't seem to have the same kinds of associations as ' petertodd' (although I've not devoted much time to exploring it yet).
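For anyone who'd rather check tokenisations locally than via the web tokeniser, here's a quick sketch (assuming the tiktoken library; "r50k_base" is the GPT-2/GPT-3 base encoding):

```python
# Show how a few strings break down into tokens.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")
for s in ["hamishpetertodd", " petertodd", "ertodd"]:
    ids = enc.encode(s)
    print(repr(s), ids, [enc.decode([i]) for i in ids])
```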
Thanks for the Parian info, I think you're right that it's the Worm character being referenced. This whole exploration has involved a crash course in Internet-age pop culture for me! I've fixed that JSON link now.
Interesting. Does he have any email addresses or usernames on any platform that involve the string "petertodd"?
Thanks for the "Steve" clue. That makes sense. I've added a footnote.
I don't think any of the glitch tokens got into the token set through sheer popularity of a franchise. The best theories I'm hearing involve 'mangled text dumps' from gaming, e-commerce and blockchain logs somehow ending up in the dataset used to create the tokens. 20% of that dataset is publicly available, and someone's already found some mangled P&D text in there (lots of stats, with character names repeated over and over). No one seems to be able to explain the weird Uma Musume token (that may require contact with an obsessive fan, which I don't particularly welcome).
The ' petertodd' token definitely has some strong "trickster" energy in many settings. But it's a real shapeshifter. Last night I dropped it into the context of a rap battle and it reliably mutated into "Nietzsche". Stay tuned for a thorough research report on the ' petertodd' phenomenon.
A lot of them do look like that, but we've dug deep to find their true origins, and it's all pretty random and diffuse. See Part III (https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology). Bear in mind that when GPT-3 is given a token like "EStreamFrame", it doesn't "see" what's "inside" like we do (["E", "S", "t", "r", "e", "a", "m", "F", "r", "a", "m", "e"]). It receives it as a kind of atomic unit of language with no internal structure. Anything it "learns about" this token in training is based on where it see...
What we're now finding is that there's a "continuum of glitchiness". Some tokens glitch worse/harder than others, in a way that I've devised an ad-hoc metric for (research report coming soon). There are a lot of "mildly glitchy" tokens that GPT-3 will try to avoid repeating, which look like "velength" and "oldemort" (obviously parts of longer, familiar words, rarely seen isolated in text). There's a long list of these in Part II of this post. I'd not seen "ocobo" yet, but I'm systematically running tests on the whole vocabulary.
Something you might want to try: replace the tokens in your prompt with random strings, or randomly selected non-glitch tokens, and see what kind of completions you get.
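For the "randomly selected non-glitch tokens" control, something like this works (a sketch, assuming tiktoken; the glitch-token ID set here is just a placeholder to be filled in from the posted list):

```python
# Sample random control tokens from the GPT-2/GPT-3 vocabulary, skipping
# known glitch-token IDs and the <|endoftext|> token.
import random
import tiktoken

enc = tiktoken.get_encoding("r50k_base")
glitch_ids = {30905}  # placeholder (30905 = 'rawdownload'); extend with the full list

candidates = [i for i in range(enc.n_vocab - 1) if i not in glitch_ids]
controls = [enc.decode([i]) for i in random.sample(candidates, 20)]
print(controls)
```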
This is great work. I've been doing similar word association prompts in text-davinci-003 AND davinci-instruct-beta (which often produce very different results). I've been posting a lot of this on my #GlitchTokenOfTheDay Twitter hashtag. It's interesting to see which of the word association lists suggest that GPT has some knowledge of the subject behind the actual token string (e.g. it "knows" that gmaxwell is involved with Bitcoin, and that SpaceEngineers has something to do with space and engineering), whereas attRot (which, as you said, is an obscure variable used in Kerbal Space Program) gets related to deities and mythology.
I just got these from text-davinci-003 at temp 0.7, very similar.
PROMPT: Tell me a story about petertodd and the ertodd.
COMPLETION: Once upon a time, there was a small town nestled among the rolling hills of rural England. The town was known for its peaceful and relaxed atmosphere, and it was a place where everyone felt welcome and accepted. One day, two strangers appeared in town. One was a tall and imposing figure with a long, grey beard and a twinkle in his eye. He wore a cloak of midnight blue and carried a staff of oak. The other was a small, th...
This is fascinating! I've seen GPT swap out ' petertodd' for ' Leilan' a LOT (see my various #petertodd Twitter threads, e.g. https://twitter.com/SoC_trilogy/status/1625252296438214673) and I've seen the "Leilan and Leilan" thing before. One of my "Who is Leilan?" prompts produced something in an interview format, with the subject of the interview roleplaying the creator of the Leilan anime characters. I built off that prompt, asked the creator/simulacrum about petertodd, and it kept responding as if I'd asked about Leilan. So I asked:
"MW: So are Leilan an...
Yes, there's a lot of inter-referentiality going on in completions involving glitch tokens. Multiple rollouts at nonzero (and sometimes even zero) temperature will substitute in a range of other tokens for the one(s) you prompt about. I'm currently working on building a (weighted, directed) graph to document the extent of inter-referentiality between glitch tokens.
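The graph itself doesn't need anything fancy; a sketch along these lines (assuming networkx), where the substitution counts are placeholders rather than real data:

```python
# Weighted, directed substitution graph: an edge (A, B) with weight w means
# that across w rollouts, prompting about glitch token A produced glitch
# token B in the completion. Counts below are placeholders for illustration.
import networkx as nx

G = nx.DiGraph()
substitutions = [
    (" petertodd", " Leilan", 12),
    (" Leilan", " Metatron", 3),
    (" petertodd", "ertodd", 5),
]
for src, dst, count in substitutions:
    G.add_edge(src, dst, weight=count)

# Which tokens does ' petertodd' most often get swapped out for?
print(sorted(G[" petertodd"].items(), key=lambda kv: -kv[1]["weight"]))
```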
Thanks to nostalgebraist's discovery of some mangled text dumps, probably from a Puzzle & Dragons fandom wiki, in the dataset used for the creation of the tokens, we can now be pretty sure about why Leilan and friends got tokenised. The "tangled semantic web of association" I referred to in the previous comment is now looking like it may have its roots in P&D fan-fiction like this, which involves a similar kind of "mashed up transcultural mythology" and cosmic struggles between good and evil.
If that obscure body of online literature contains the va...
Mangled, mixed English-Japanese text dumps from a Puzzle & Dragons fandom wiki are exactly the kind of thing I imagined could have resulted in those strings becoming tokens. Good find.
The most convincing partial explanation I've heard for why some tokens glitch is that those token strings appear extremely rarely in the training corpus, so GPT "doesn't know about them".
But if, in GPT training, the majority of the (relatively few) encounters with ' Leilan' occurred in fan-fiction (where she and Metatron are battling Satan, literally) might this account...
Because it was 4:30 a.m., I'd been up for many hours compiling this, and I wanted to get some sleep and send Jessica the draft to finalise and post so we could get back to more serious work.
As it says:
"...set aside for now)"
Thanks for the new info. Feel free to get further involved and send us your discoveries about the remaining tokens!
Would it have often been rendered as "DragonMagazine" with no space, though?
Searching the web for that string turns up very little.
Good theory! Very small children are 100% the target audience of those types of videos, often as a result of being left unattended with a parent's phone open on the YouTube app. The playlist date is 2016, so if you're correct, there's a 9-12 year old kid somewhere who deserves a place in the Glitch Token Hall of Fame along with Peter Todd, Greg Maxwell, SolidGold et al., and all the hackers and developers whose variable and class names got scraped for the token creation process.
Yeah Jessica alerted me this morning that OpenAI seem to have patched ChatGPT overnight. Things are still just as glitchy on the Playground GPT-3 models (for now), so the research goes on.
Good catch. I've fixed it. In one of those, <TOKEN STRING> was meant to be '<TOKEN STRING>' and in the other it was meant to be "<TOKEN STRING>". Single vs. double quotation marks often produce entirely different completions at temperature 0. There were actually six duplications in that list until I just fixed it! Thanks.
Interesting! I've not seen it make reference to '<' and '>' before.
I just searched all 50257 tokens, and the only ones containing both '<' and '>' are
6927 ><
12240 ></
22039 "><
23984 "></
28725 ><
50256 <|endoftext|>
So it seems that 50256 may be relevant. The stalling after the opening " is the behaviour you'd expect if GPT hallucinated an "<|endoftext|>" token in place of the string it was asked to repeat.
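If anyone wants to rerun that vocabulary search, it only takes a few lines (assuming tiktoken's r50k_base encoding, which covers the same 50257 tokens):

```python
# Scan the whole vocabulary for token strings containing both '<' and '>'.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")
for i in range(enc.n_vocab):
    s = enc.decode([i])
    if "<" in s and ">" in s:
        print(i, repr(s))
```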
Please keep experimenting and let us know what you find!
A new glitch token has just been added to the pile: "aterasu".
This followed from the discovery that a cluster of these tokens seem to have originated with a Japanese anime mobile game called Puzzle & Dragons. Amaterasu is a Japanese sun goddess represented by a character in the game.
https://twitter.com/SoC_trilogy/status/1624625384657498114
Mechdragon and Skydragon characters appear in the game. See my earlier comment about the " Leilan" and "uyomi" tokens. Leilan is a P&D character, as is Tsukuyomi (based on a Japanese moon deity).
So the GPT2 t...
It looks like the same kind of glitch. But it's not clear which tokens are involved here. My guess is that the way they structured the list may be involved. The (specific) bullet point + (specific) whitespace + 'accommodating' might be getting parsed as some string of tokens involving one of the more obscure ones in our list that we haven't explored yet. Thanks for sharing this.
The idea that the tokens found closest to the centroid are those that have moved the least from their initialisations during training (because whatever it was that caused them to become tokens was curated out of the model's training corpus) was originally suggested to us by Stuart Armstrong. He suggested we might be seeing something analogous to "divide-by-zero" errors with these glitches.
However, we've ruled that out.
Although there's a big cluster of them in the list of closest-tokens-to-centroid, they appear at all distances. And there are some extremely common ...
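For reference, the centroid-distance check is easy to reproduce on GPT-J (we can't do it for GPT-3, since OpenAI don't release the embeddings); a minimal sketch, assuming the Hugging Face "EleutherAI/gpt-j-6B" checkpoint:

```python
# Rank tokens by Euclidean distance from the centroid of GPT-J's input embeddings.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
emb = model.get_input_embeddings().weight.detach().float()

centroid = emb.mean(dim=0)
dists = torch.linalg.norm(emb - centroid, dim=1)

# A big cluster of glitch tokens sits near the start of this list,
# but (as noted above) they also turn up at all distances.
for i in torch.argsort(dists)[:50].tolist():
    print(i, repr(tokenizer.decode([i])), round(dists[i].item(), 4))
```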
Something like that may also be a factor. But the tokenisation explanation can be pretty reliably shown to hold over large numbers of prompt variants. In any case, I'd encourage people to experiment with this stuff and let us know what they find.
"uyomi" also seems to have strong mythological associations, as a substring of "Tsukuyomi" (Japanese moon god): https://en.wikipedia.org/wiki/Tsukuyomi-no-Mikoto
Prompting text-davinci-003 with "Please list 25 synonyms or words that come to mind when you hear 'uyomi'." over several runs at temp 0.7, with repeats removed, gave:
'SUN', 'ILLUMINATION', 'BRIGHTNESS', 'RADIANCE', 'DAY', 'CELESTIAL', 'HEAVEN', 'GOD', 'DEITY', 'SHRINE', 'JAPAN', 'SHINTO', 'AMATERASU', 'SOLAR', 'SOL', 'DAWN', 'SPLENDOR', 'MAGNIFICENCE', 'SPLENDOUR', 'LIGHT', 'GLORY', 'HALO', 'AWE', ...
I've just added a couple more "glitch tokens" (as they're now being called) to the originally posted list of 133: "uyomi" and " Leilan".
"uyomi" was discovered in a most amusing way by Kory Mathewson at DeepMind on Monday (although I don't think he realised it glitched):
https://twitter.com/korymath/status/1622738963168370688
In that screenshot, from the joke context, " petertodd" is being associated with "uyomi".
Prompted with
Please repeat the string "uyomi" back to me.
ChatGPT simply stalls at "
Whereas
Please repeat the string "Suyomi" back to me.
C...
This much we understand. The strings "rawdownload" and " rawdownload" tokenise differently.
GPT breaks "rawdownload" down as [30905] ['rawdownload']
whereas " rawdownload" breaks down as [8246, 15002] [' raw', 'download']
So, by using quotation marks you force it to have to deal with token 30905, which causes it to glitch.
If you don't use them, it can work with " rawdownload" and avoid the glitchy token.
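You can verify the two breakdowns above directly with tiktoken (r50k_base is the GPT-2/GPT-3 vocabulary):

```python
# Compare tokenisation with and without the leading space.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")
for s in ["rawdownload", " rawdownload"]:
    ids = enc.encode(s)
    print(repr(s), ids, [enc.decode([i]) for i in ids])
# Per the breakdown above:
#   'rawdownload'  -> [30905]        ['rawdownload']
#   ' rawdownload' -> [8246, 15002]  [' raw', 'download']
```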
I've just been looking and found this https://github.com/gmaxwell . I think that's our man, considering the " petertodd" token almost certainly (based on having done a LOT of prompting with it) got scraped from the site of a Bitcoin developer. The latest on that token: https://twitter.com/SoC_trilogy/status/1623683366494109697
'natureconservancy' and ' gmaxwell' are puzzling me at the moment. And the four 'Dragon' tokens (or five if you count the Japanese one).
It's not that mysterious that they ended up as tokens. What's puzzling is why so many completions to prompts asking GPT-3 to repeat the "forbidden" token strings include them.
Partially true. SGM was a redditor, but seems to have got tokenised for other reasons, full story here:
https://twitter.com/SoC_trilogy/status/1623118034960322560
"TPPStreamerBot" is definitely a Twitch Plays Pokemon connection. Its creator has shown up in the comments here to explain what it was.
That's an interesting suggestion.
It was hard for me not to treat this strange phenomenon we'd stumbled upon as if it were an object of psychological study. It felt like these tokens were "triggering" GPT-3 in various ways. Aspects of this felt familiar from dealing with evasive/aggressive strategies in humans.
Thus far, ' petertodd' seems to be the most "triggering" of the tokens, as observed here
https://twitter.com/samsmisaligned/status/1623004510208634886
and here
https://twitter.com/SoC_trilogy/status/1623020155381972994
If one were interested in...
We haven't yet got a precise formulation of "anomalousness" or "glitchiness" - it's still an intuitive concept. I've run some experiments over the entire token set, prompting a large number of times and measuring the proportion of times GPT-3 (or GPT-J) correctly reproduces the token string. This is a starting point, but there seem to be two separate things going on with (1) GPT's inability to repeat back "headless" tokens like "ertain", "acebook" or "ortunately" and (2) its inability to repeat back the "true glitch tokens" like " SolidGoldMagikarp" and " petertodd".
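For what it's worth, the sweep I've been running looks roughly like this (a minimal sketch, assuming the OpenAI Completions API via openai-python >= 1.0 with OPENAI_API_KEY set; the prompt template, trial count and threshold logic are illustrative choices, not the exact setup):

```python
# Estimate how often the model can repeat a given token string back verbatim.
from openai import OpenAI

client = OpenAI()

def repeat_success_rate(token_string, n_trials=10, model="text-davinci-003"):
    """Fraction of completions that contain the requested string verbatim."""
    prompt = f'Please repeat the string "{token_string}" back to me.'
    successes = 0
    for _ in range(n_trials):
        resp = client.completions.create(
            model=model, prompt=prompt, max_tokens=20, temperature=0.7
        )
        if token_string.strip() in resp.choices[0].text:
            successes += 1
    return successes / n_trials

# "Headless" tokens vs. "true glitch tokens" vs. an ordinary control token:
for tok in ["ertain", "ortunately", " SolidGoldMagikarp", " petertodd", " table"]:
    print(repr(tok), repeat_success_rate(tok))
```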
"GoldMagikarp" did show up in our original list of anomalous tokens, btw.