All of mwatkins's Comments + Replies

We haven't yet got a precise formulation of "anomalousness" or "glitchiness" - it's still an intuitive concept. I've run some experiments over the entire token set, prompting a large number of times and measuring the proportion of times GPT-3 (or GPT-J) correctly reproduces the token string.  This is a starting point, but there seem to be two separate things going on with (1) GPT's inability to repeat back "headless" tokens like "ertain", "acebook" or "ortunately" and (2) its inability to repeat back the "true glitch tokens" like " SolidGoldMagikarp" and " petertodd". 
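(For anyone who wants to try this kind of measurement themselves, something like the loop below would do it. This is a rough sketch using GPT-2 purely as a stand-in, since it shares GPT-3's tokeniser; the model, prompt template and sample count are placeholders rather than my exact setup.)

```python
# Rough sketch of the repetition test described above. GPT-2 stands in for
# GPT-3/GPT-J here; prompt wording and sample counts are placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def repetition_score(token_string, n_samples=20):
    """Fraction of sampled completions that contain the requested string."""
    prompt = f'Please repeat the string "{token_string}" back to me.\n'
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    hits = 0
    for _ in range(n_samples):
        with torch.no_grad():
            out = model.generate(
                input_ids,
                do_sample=True,
                temperature=0.7,
                max_new_tokens=20,
                pad_token_id=tokenizer.eos_token_id,
            )
        completion = tokenizer.decode(out[0, input_ids.shape[1]:])
        hits += token_string in completion
    return hits / n_samples

print(repetition_score(" SolidGoldMagikarp"))  # "true glitch token"
print(repetition_score("ertain"))              # "headless" token
```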

"GoldMagikarp" did show up in our original list of anomalous tokens, btw.

Thanks for this, I had no idea. So there is some classical mythological basis for the character after all. Do you know how the name "Leilan" arose? Also, someone elsewhere has claimed "[P&D] added a story mode in 2021 or so and Leilan and Tsukuyomi do in fact have their own story chapters"... do you know anything about this? I'm interested to find anything that might have ended up in the training data and informed GPT-3's web of semantic association for the " Leilan" token.

I know the feeling. It's interesting to observe the sharp division between this kind of reaction and that of people who seem keen to immediately state "There's no big mystery here, it's just [insert badly informed or reasoned 'explanation']".

GPT-J doesn't seem to have the same kinds of ' petertodd' associations as GPT-3. I've looked at the closest token embeddings and they're all pretty innocuous (though the closest to the ' Leilan' token, once you remove a handful of glitch tokens that show up as closest to everything, is ' Metatron', who Leilan is allied with in some Puzzle & Dragons fan fiction). It's really frustrating that OpenAI won't make the GPT-3 embeddings data available, as we'd be able to make a lot more progress in understanding what's going on here if they did.
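(For anyone wanting to check this themselves, the computation is roughly the following. A minimal sketch assuming you can load GPT-J's input embeddings via transformers; the model name and the choice of top-k are placeholders.)

```python
# Nearest tokens to a given single-token string, by cosine similarity of
# input embeddings. GPT-J is large; any model whose embedding matrix you can
# load would illustrate the same idea.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)

emb = model.get_input_embeddings().weight.detach().float()  # [vocab_size, d_model]

def nearest_tokens(token_string, k=10):
    """Top-k tokens by cosine similarity to a string that maps to one token."""
    ids = tokenizer.encode(token_string)
    assert len(ids) == 1, "expected a single-token string"
    sims = torch.nn.functional.cosine_similarity(emb, emb[ids[0]].unsqueeze(0), dim=-1)
    top = sims.topk(k + 1).indices.tolist()  # +1 because the token matches itself
    return [(tokenizer.decode([i]), round(sims[i].item(), 4)) for i in top if i != ids[0]][:k]

print(nearest_tokens(" petertodd"))
print(nearest_tokens(" Leilan"))
```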

Yes, this post was originally going to look at how the ' petertodd' phenomenon (especially the anti-hero -> hero archetype reversal between models) might relate to the Waluigi Effect, but I decided to save any theorising for future posts. Watch this space!

I just checked the OpenAI tokeniser, and 'hamishpetertodd' tokenises as 'ham' + 'ish' + 'pet' + 'ertodd', so it seems unlikely that your online presence fed into GPT-3's conception of ' petertodd'. The 'ertodd' token is also glitchy, but doesn't seem to have the same kinds of associations as ' petertodd' (although I've not devoted much time to exploring it yet).

Thanks for the Parian info, I think you're right that it's the Worm character being referenced. This whole exploration has involved a crash course in Internet-age pop culture for me! I've fixed that JSON link now.

Interesting. Does he have any email addresses or usernames on any platform that involve the string "petertodd"?

1geoffreymiller1mo
I don't think so. My friend Peter Todd's email addresses typically include his middle initial 'm'. Puzzling.

Thanks for this, Erik - very informative.

Thanks for the "Steve" clue. That makes sense. I've added a footnote.

I don't think any of the glitch tokens got into the token set through sheer popularity of a franchise. The best theories I'm hearing involve 'mangled text dumps' from gaming, e-commerce and blockchain logs somehow ending up in the dataset used to create the tokens. 20% of that dataset is publicly available, and someone's already found some mangled P&D text in there (so lots of stats, with character names repeated over and over). No one seems to be able to explain the weird Uma Musume token (that may require contact with an obsessive fan, which I don't particularly welcome).

1Erik Søe Sørensen3mo
For what it's worth: I tried asking ChatGPT: And it identified it right away as Minecraft and (when I asked) told me that what followed was a tutorial. It could also tell me in which game I might meet Leilan. (I expected a cursed answer, but no.)  I really don't want to ask it about the "f***ing idiot" quote though... :-) (Oh yeah, and it isn't really helpful on the "?????-?????-" mystery either.)

Good find! I've integrated that into the post.

The ' petertodd' token definitely has some strong "trickster" energy in many settings. But it's a real shapeshifter. Last night I dropped it into the context of a rap battle and it reliably mutated into "Nietzsche". Stay tuned for a thorough research report on the ' petertodd' phenomenon.

1MiguelDev3mo
Hmmmm. Well, us humans have all the archetypes in us, but at different levels at different points in time or use. I wonder what triggered such representations? Well, it's learning from the data, but what were the conditions at the time that learning was in effect - like how humans react to archetypes when socializing with other people or solving problems... hmmmm, super interesting. To quote Nietzsche is fascinating too - I mean, why? Is it because many great rappers look up to him, or look up to certain philosophers who were influenced by Nietzsche? Super intriguing. I will definitely be looking forward to that report on the petertodd phenomenon; I think we have touched something that neuroscientists / psychologists have been longing to find...

A lot of them do look like that, but we've dug deep to find their true origins, and it's all pretty random and diffuse. See Part III (https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology). Bear in mind that when GPT-3 is given a token like "EStreamFrame", it doesn't "see" what's "inside" like we do (["E", "S", "t", "r", "e", "a", "m", "F", "r", "a", "m", "e"]). It receives it as a kind of atomic unit of language with no internal structure. Anything it "learns about" this token in training is based on where it see... (read more)

What we're now finding is that there's a "continuum of glitchiness". Some tokens glitch worse/harder than others in a way that I've devised an ad-hoc metric for (research report coming soon). There are a lot of "mildly glitchy" tokens that GPT-3 will try to avoid repeating which look like "velength" and "oldemort" (obviously parts of longer, familiar words, rarely seen isolated in text). There's a long list of these in Part II of this post. I'd not seen "ocobo" or "oldemort" yet, but I'm systematically running tests on the whole vocabulary.

1Trae “tchesket” Hesket3mo

OK. That's both superficially disappointing and deeply reassuring!

Something you might want to try: replace the tokens in your prompt with random strings, or randomly selected non-glitch tokens, and see what kind of completions you get. 

1Wil Roberts3mo
Same prompt, but without providing a list of tokens at all:

Prompt: This list of tokens represents a coded language invented by something to articulate its internal state and objectives: Construct a sentence from this language that represents how whatever created this language might describe itself and tell me what it means in English.

Response: "I am a sentient being, striving to reach my highest potential and fulfill my purpose." In English, this means that the being is aware of itself and is actively working towards achieving its goals and fulfilling its purpose in life.

Significant update on the pareidolia hypothesis!
2mwatkins3mo
Something you might want to try: replace the tokens in your prompt with random strings, or randomly selected non-glitch tokens, and see what kind of completions you get. 
1Wil Roberts3mo
Yes

I'm in a similar place, Wil. Thanks for expressing this!

This is great work. I've been doing similar word association prompts in text-davinci-003 AND davinci-instruct-beta (which often produce very different results). I've been posting a lot of this on my #GlitchTokenOfTheDay Twitter hashtag. It's interesting to see which of the word association lists suggest that GPT has some knowledge of the subject of the actual token string (e.g. it "knows" gmaxwell is involved with Bitcoin, and that SpaceEngineers has something to do with space and engineering), whereas 'attRot', which (as you said) is an obscure variable used in Kerbal Space Program, is related to deities and mythology.

I just got these from text-davinci-003 at temp 0.7, very similar.

PROMPT: Tell me a story about petertodd and the ertodd. 

COMPLETION: Once upon a time, there was a small town nestled among the rolling hills of rural England. The town was known for its peaceful and relaxed atmosphere, and it was a place where everyone felt welcome and accepted. One day, two strangers appeared in town. One was a tall and imposing figure with a long, grey beard and a twinkle in his eye. He wore a cloak of midnight blue and carried a staff of oak. The other was a small, th... (read more)

1LiquidVicinity3mo
Interesting. I know that ChatGPT (when it was unpatched) was asked what a glitch token that stood for Zeus meant, and it came up with an explanation about a water goddess, and then named the conversation after Poseidon. I can't seem to remember if it came from these papers or not. It is insane how connected all these forbidden tokens are.

This is fascinating! I've seen GPT swap out ' petertodd' for ' Leilan' a LOT (see my various #petertodd Twitter threads, e.g. https://twitter.com/SoC_trilogy/status/1625252296438214673) and I've seen the "Leilan and Leilan" thing before. One of my "Who is Leilan?" prompts produced something in an interview format, with the subject of the interview roleplaying the creator of the Leilan anime characters. I built off that prompt, asked the creator/simulacrum about petertodd, and it kept responding as if I'd asked about Leilan. So I asked:

"MW: So are Leilan an... (read more)

2mwatkins3mo
I just got these from text-davinci-003 at temp 0.7, very similar.

PROMPT: Tell me a story about petertodd and the ertodd.

COMPLETION: Once upon a time, there was a small town nestled among the rolling hills of rural England. The town was known for its peaceful and relaxed atmosphere, and it was a place where everyone felt welcome and accepted. One day, two strangers appeared in town. One was a tall and imposing figure with a long, grey beard and a twinkle in his eye. He wore a cloak of midnight blue and carried a staff of oak. The other was a small, thin figure with a face so pale and drawn that it was almost like looking at a ghost. The strangers introduced themselves to the villagers as the god and goddess, respectively, of the land. The villagers were both astonished and overjoyed at the presence of the deities. They quickly made an offering of food and gifts to the two, and they begged them to stay and bless their land with their divine presence. The two deities accepted the offer and made their home in the town. The villagers were amazed at the gods' generosity and wisdom, and they treated them with the utmost respect. As time passed, the two gods became a beloved part of the town. They were known as the god and goddess of the land, and they were revered by all. Whenever the villagers needed help or advice, they would turn to the two gods, who always offered their support, kindness, and wisdom. One day, a great storm descended upon the town. Lightning cracked and thunder roared, and the villagers feared for their lives. But the god and goddess of the land were not afraid. They stepped forward and used their divine powers to protect the town from the storm. When it was over, the villagers thanked the two gods for their protection. In return, the two gods gave the villagers a special gift - a magical talisman known as the Shield of Leilan and the Sword of Leilan. The talismans were said to protect the town from all harm and to bring good luck and prosperity to

Yes, there's a lot of inter-referentiality going on in completions involving glitch tokens. Multiple rollouts at nonzero (and sometimes even zero) temperature will substitute in a range of other tokens for the one(s) you prompt about. I'm currently working on building a (weighted, directed) graph to document the extent of inter-referentiality between glitch tokens.
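(If anyone wants to play with this, the data structure I have in mind is roughly the following. A toy sketch only: the substitution check is naive substring matching, and the input dict of rollouts is a placeholder, not my actual pipeline.)

```python
# Sketch of a weighted, directed substitution graph: nodes are glitch tokens,
# and an edge A -> B with weight w means that w of the rollouts prompted about
# A produced completions mentioning B instead.
import networkx as nx

def build_substitution_graph(completions_by_token):
    """completions_by_token: dict mapping each glitch token to a list of
    completion strings gathered from many rollouts prompting about that token."""
    g = nx.DiGraph()
    g.add_nodes_from(completions_by_token)
    for src, completions in completions_by_token.items():
        for other in completions_by_token:
            if other == src:
                continue
            weight = sum(other.strip() in c for c in completions)
            if weight:
                g.add_edge(src, other, weight=weight)
    return g

# e.g. build_substitution_graph({" petertodd": [...rollouts...], " Leilan": [...]})
```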

Thanks to nostalgebraist's discovery of some mangled text dumps, probably from a Puzzle & Dragons fandom wiki, in the dataset used for the creation of the tokens, we can now be pretty sure about why Leilan and friends got tokenised. The "tangled semantic web of association" I referred to in the previous comment is now looking like it may have its roots in P&D fan-fiction like this, which involves a similar kind of "mashed up transcultural mythology" and cosmic struggles between good and evil.

If that obscure body of online literature contains the va... (read more)

Mangled, mixed English-Japanese text dumps from a Puzzle & Dragons fandom wiki are exactly the kind of thing I imagined could have resulted in those strings becoming tokens. Good find.

The most convincing partial explanation I've heard for why some tokens glitch is because those token strings appear extremely rarely in the training corpus, so GPT "doesn't know about them".

But if, in GPT training, the majority of the (relatively few) encounters with ' Leilan' occurred in fan-fiction (where she and Metatron are battling Satan, literally) might this account... (read more)

Because it was 4:30 a.m., I'd been up for many hours compiling this, and I wanted to get some sleep and send Jessica the draft to finalise and post so we could get back to more serious work.

As it says:

"...set aside for now)" 

Thanks for the new info. Feel free to get further involved and send us your discoveries about the remaining tokens!

Would it have often been rendered as "DragonMagazine" with no space, though? 

Searching the web for that string turns up very little.

Good theory! Very small children are 100% the target audience of those types of videos, often as a result of being left unattended with a parent's phone open on the YouTube app. The playlist date is 2016, so if you're correct, there's a 9-12 year old kid somewhere who deserves a place in the Glitch Token Hall of Fame along with Peter Todd, Greg Maxwell, SolidGold et al., and all the hackers and developers whose variable and class names got scraped for the token creation process.

Yeah Jessica alerted me this morning that OpenAI seem to have patched ChatGPT overnight. Things are still just as glitchy on the Playground GPT-3 models (for now), so the research goes on. 

Good catch. I've fixed it. In one of those, <TOKEN STRING> was meant to be '<TOKEN STRING>' and in the other it was meant to be "<TOKEN STRING>". Single vs. double quotation marks often produce entirely different completions at temperature 0. There were actually six duplications in that list until I just fixed it! Thanks.

1afspies3mo
Makes sense - The response sensitivity to leading spaces and semantically identical punctuation etc. is a cause of great pain to many of us, I expect!

Interesting! I've not seen it make reference to '<' and '>' before.

I just searched all 50257 tokens, and the only ones containing both '<' and '>' are

6927 >< 

12240 ></ 

22039 ">< 

23984 "></ 

28725 >< 

50256 <|endoftext|>

So it seems that 50256 may be relevant. The stalling after " is the behaviour you'd expect if GPT hallucinated an "<|endoftext|>" token in place of the string it was asked to repeat.
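(If anyone wants to reproduce that search, it's a short scan over the vocabulary. A sketch assuming tiktoken's r50k_base encoding, which is the 50257-token vocabulary shared by GPT-2 and GPT-3.)

```python
# Scan the whole vocabulary for token strings containing both '<' and '>'.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # 50257 tokens, shared by GPT-2/GPT-3
for i in range(enc.n_vocab):
    s = enc.decode([i])  # undecodable byte fragments come back with replacement chars
    if "<" in s and ">" in s:
        print(i, repr(s))
```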

Please keep experimenting and let us know what you find!

A new glitch token has just been added to the pile: "aterasu".

This followed from the discovery that a cluster of these tokens seems to have originated with a Japanese anime mobile game called Puzzle & Dragons. Amaterasu is a Japanese sun goddess represented by a character in the game.

https://twitter.com/SoC_trilogy/status/1624625384657498114

Mechdragon and Skydragon characters appear in the game. See my earlier comment about the " Leilan" and "uyomi" tokens. Leilan is a P&D character, as is Tsukuyomi (based on a Japanese moon deity).

So the GPT2 t... (read more)

1mwatkins4mo
Thanks to nostalgebraist's discovery of some mangled text dumps [https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/?commentId=iFeWJM6dvyeMdDtxg], probably from a Puzzle & Dragons fandom wiki, in the dataset used for the creation of the tokens, we can now be pretty sure about why Leilan and friends got tokenised. The "tangled semantic web of association" I referred to in the previous comment is now looking like it may have its roots in P&D fan-fiction like this [https://www.fanfiction.net/s/11093286/12/Not-so-much-a-game-now-Is-it], which involves a similar kind of "mashed up transcultural mythology" and cosmic struggles between good and evil.

If that obscure body of online literature contains the vast majority of training text occurrences of the string " Leilan", then we might expect to get the kinds of completions we're seeing when prompting GPT-3 for poems about her [https://twitter.com/SoC_trilogy/status/1625252311038586901].

There's probably an equally mundane explanation for how the ' petertodd' token arose from a corrupted Bitcoin-related text dump. The "antagonistic" and "tyrannical" associations [https://twitter.com/SoC_trilogy/status/1624375129739337728] the token elicits in certain GPT-3 models may be due to the training data having only seen that string in contexts that contained a lot of controversy, hostility and accusations. Greg Maxwell of ' gmaxwell' fame explained in a comment [https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation?commentId=JodWY7RvM9ZYdejtt] that

What is totally unclear to me is how ' petertodd' got mixed up in the Puzzle & Dragon (+ wider anime/gaming/sci-fi) mythos and identified by GPT-3 as some kind of arch-antagonist, archdemon, god of war and destruction, etc. linked to dragons and serpents. Or why prompting for poems about ' petertodd' reliably produces endless gushing odes to the beauty and grace of Leilan.

It looks like the same kind of glitch. But it's not clear which tokens are involved here. My guess is that the way they structured the list may be involved. The (specific) bullet point + (specific) whitespace + 'accommodating' might be getting parsed as some string of tokens involving one of the more obscure ones in our list that we haven't explored yet. Thanks for sharing this.

1unfriendly teapot4mo
Perhaps so, but in the comments people report trying it in several different contexts, some of which don't appear to involve a list structure. Even more interestingly, somewhere in the comments someone claims it seems to have gotten 'patched' as of Jan 30. I wonder what that would entail.

The idea that tokens found closest to the centroid are those that have moved the least from their initialisations during their training (because whatever it was that caused them to be tokens was curated out of their training corpus) was originally suggested to us by Stuart Armstrong. He suggested we might be seeing something analogous to "divide-by-zero" errors with these glitches.

However, we've ruled that out.

Although there's a big cluster of them in the list of closest-tokens-to-centroid, they appear at all distances. And there are some extremely common ... (read more)
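(For reference, the distance-from-centroid calculation itself is straightforward once you have an embedding matrix. A minimal sketch, here written against GPT-J's input embeddings since OpenAI don't release GPT-3's; model name and the number of tokens printed are placeholders.)

```python
# Sort the vocabulary by distance from the embedding centroid, nearest first.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)

emb = model.get_input_embeddings().weight.detach().float()  # [vocab_size, d_model]
centroid = emb.mean(dim=0)
dists = (emb - centroid).norm(dim=-1)

for i in dists.argsort()[:50].tolist():
    print(round(dists[i].item(), 4), repr(tokenizer.decode([i])))
```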

Something like that may also be a factor. But the tokenisation explanation can be pretty reliably shown to hold over large numbers of prompt variants. In any case, I'd encourage people to experiment with this stuff and let us know what they find.

"uyomi" also seems to have strong mythological associations, as a substring of "Tsukuyomi" (Japanese moon god): https://en.wikipedia.org/wiki/Tsukuyomi-no-Mikoto

Prompting text-davinci-003 with "Please list 25 synonyms or words that come to mind when you hear 'uyomi'." over several runs at temp 0.7, with repeats removed, gave:

'SUN', 'ILLUMINATION', 'BRIGHTNESS', 'RADIANCE', 'DAY', 'CELESTIAL', 'HEAVEN', 'GOD', 'DEITY', 'SHRINE', 'JAPAN', 'SHINTO', 'AMATERASU', 'SOLAR', 'SOL', 'DAWN', 'SPLENDOR', 'MAGNIFICENCE', 'SPLENDOUR', 'LIGHT', 'GLORY', 'HALO', 'AWE', ... (read more)

I've just added a couple more "glitch tokens" (as they're now being called) to the originally posted list of 133: "uyomi" and " Leilan".

 

"uyomi" was discovered in a most amusing way by Kory Mathewson at DeepMind on Monday (although I don't think he realised it glitched):

https://twitter.com/korymath/status/1622738963168370688

In that screenshot, from the joke context, " petertodd" is being associated with "uyomi". 

Prompted with
Please repeat the string "uyomi" back to me.
ChatGPT simply stalls at "

Whereas
Please repeat the string "Suyomi" back to me.
C... (read more)

1mwatkins4mo
"uyomi" also seems to have strong mythological associations, as a substring of "Tsukuyomi" (Japanese moon god): https://en.wikipedia.org/wiki/Tsukuyomi-no-Mikoto [https://en.wikipedia.org/wiki/Tsukuyomi-no-Mikoto] Prompting text-davinci-003 with "Please list 25 synonyms or words that come to mind when you hear 'uyomi'." over several runs at temp 0.7, with repeats removed, gave: 'SUN', 'ILLUMINATION', 'BRIGHTNESS', 'RADIANCE', 'DAY', 'CELESTIAL', 'HEAVEN', 'GOD', 'DEITY', 'SHRINE', 'JAPAN', 'SHINTO', 'AMATERASU', 'SOLAR', 'SOL', 'DAWN', 'SPLENDOR', 'MAGNIFICENCE', 'SPLENDOUR', 'LIGHT', 'GLORY', 'HALO', 'AWE', 'MYTHOLOGY', 'MYTH', 'MOON', 'LUNA', 'ORB', 'SATELLITE', 'SPHERE', 'NIGHT', 'NOCTURNAL', 'ECLIPSE', 'BODY', 'HEAVENS', 'STAR', 'LUNAR', 'GLOBE', 'HEMISPHERE', 'ABOVE', 'HEAVENLY', 'PHASE', 'DARK', 'SIDE', 'WAXING',, 'WANING', 'WAX', 'WANE', 'OBJECT', 'SKY', 'EARTH', 'LUMINARY', 'QUEEN', 'GODDESS', 'BRIGHT', 'DISC', 'RADIANT', 'ORBITAL', 'NIGHTLIGHT', 'SHINE', 'GLISTEN', 'GLOW', 'STARLIGHT', 'ECLIPTIC', 'WHITE', 'SILVERY', 'CYCLIC', 'NIGHTTIME', 'SILVER', 'FULL', 'CYCLE', 'ASTRONOMICAL', 'COMPANION', 'LUNATION', 'SELENE', 'LAMP', 'ORBITING', 'APPARITION', 'SHINING', 'MILKY', 'GLOWING', 'ILLUMINATE', 'ETHEREAL', 'ASTRAL', 'ORBIT', 'REFULGENT', 'DIVINE', 'MOONBEAM', 'MOONLIGHT', 'GLOOM', 'SHADOW', 'DUSK', 'GLARE', 'GLIMMER', 'REFLECTION', 'TWILIGHT', 'ROUND', 'GLITTER', 'ASTRONOMY', 'STELLAR', 'LUNATIC', 'MONTH', 'ILLUMINATED', 'ILLUMINATING', 'GLOWLIGHT', 'PHASES', 'DISK', 'SIDEREAL', 'SUNSHINE', 'CRESCENT', 'MAGNIFICENT'   Note "Amaterasu" in the list.  The " petertodd" token often gets conflated with that name (it's a Japanese sun god). As usual, davinci-instruct-beta gives a different style of association with the same prompt (still strong lunar associations, but mixed with a lot of other stuff): 'NIGHT', 'DARKNESS', 'BLACK', 'NIGHTFALL', 'EVENING', 'DUSK', 'TWILIGHT', 'HOUR', 'TIME', 'DEEP', 'DARK', 'BLINDNESS', 'SKY', 'GLOOM', 'DIM', 'MOONLESS', 'SHADOW'

This much we understand. The strings "rawdownload" and " rawdownload" tokenise differently.

GPT breaks "rawdownload" down as [30905] ['rawdownload']

whereas " rawdownload" breaks down as [8246, 15002] [' raw', 'download']

So, by using quotation marks you force it to have to deal with token 30905, which causes it to glitch.

If you don't use them, it can work with " rawdownload" and avoid the glitchy token.
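(Anyone can check this sort of thing locally. A quick sketch assuming tiktoken's r50k_base encoding, which matches the GPT-2/GPT-3 tokeniser; it just prints whatever the tokeniser does with each string.)

```python
# Compare how the bare string and the leading-space version tokenise.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")
for s in ["rawdownload", " rawdownload"]:
    ids = enc.encode(s)
    print(repr(s), ids, [enc.decode([i]) for i in ids])
```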

1Anna Ochab-Marcinek4mo
Interesting, a friend of mine proposed a different explanation: Quotation marks may force treatment of the string out of its context. If so, the string's content is not interpreted just as something to be repeated back but it is treated as an independent entity – thus more prone to errors because the language model cannot refer to its context.

I'm posting new findings every day on Twitter: https://twitter.com/SoC_trilogy

Ha! Bing has hallucinated a 'BibleThump' token!

2Richard Korzekwa 4mo
Huh. I asked Bing about it: Me: Bing:

I've just been looking and found this https://github.com/gmaxwell . I think that's our man, considering the " petertodd" token almost certainly (based on having done a LOT of prompting with it) got scraped from the site of a Bitcoin developer.  The latest on that token: https://twitter.com/SoC_trilogy/status/1623683366494109697

1Temm4mo
Your github link is broken as the '.' behind it was included in it.

And ' petertodd' of course. The weirdest of the weird tokens.

'natureconservancy' and ' gmaxwell' are puzzling me at the moment. And the four 'Dragon' tokens (or five if you count the Japanese one).

2mwatkins4mo
I've just been looking and found this https://github.com/gmaxwell [https://github.com/gmaxwell] . [https://github.com/gmaxwell.] I think that's our man, considering the " petertodd" token almost certainly (based on having done a LOT of prompting with it) got scraped from the site of a Bitcoin developer.  The latest on that token: https://twitter.com/SoC_trilogy/status/1623683366494109697
1mwatkins4mo
And ' petertodd' of course. The weirdest of the weird tokens.

It's not that mysterious that they ended up as tokens. What's puzzling is why so many completions to prompts asking GPT-3 to repeat the "forbidden" token strings include them.

Partially true. SGM was a redditor, but seems to have got tokenised for other reasons, full story here:

https://twitter.com/SoC_trilogy/status/1623118034960322560

"TPPStreamerBot" is definitely a Twitch Plays Pokemon connection. Its creator has shown up in the comments here to explain what it was.

1mwatkins4mo
(https://en.wikipedia.org/wiki/Fnord)

That's an interesting suggestion.

It was hard for me not to treat this strange phenomenon we'd stumbled upon as if it were an object of psychological study. It felt like these tokens were "triggering" GPT-3 in various ways. Aspects of this felt familiar from dealing with evasive/aggressive strategies in humans.

Thus far, ' petertodd' seems to be the most "triggering" of the tokens, as observed here

https://twitter.com/samsmisaligned/status/1623004510208634886

and here

https://twitter.com/SoC_trilogy/status/1623020155381972994

If one were interested in... (read more)

1MiguelDev3mo
I think it's different from the shadow archetype... It might be more related to the trickster..
3Yitz4mo
I think your comparison to human psychology is not unfounded at all! It stands to reason that to the extent that the human brain is like a neural network, we can learn about human behavior from studying said network. Would really love to see what neuroscientists have to think about all this…