mwatkins

Comments

We haven't yet got a precise formulation of "anomalousness" or "glitchiness" - it's still an intuitive concept. I've run some experiments over the entire token set, prompting a large number of times for each token and measuring the proportion of times GPT-3 (or GPT-J) correctly reproduces the token string. This is a starting point, but there seem to be two separate things going on: (1) GPT's inability to repeat back "headless" tokens like "ertain", "acebook" or "ortunately" and (2) its inability to repeat back the "true glitch tokens" like " SolidGoldMagikarp" and " petertodd".
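
A minimal sketch of that kind of measurement, in case it helps make the idea concrete. The prompt wording, the trial count and the `fake_model` stand-in are all assumptions for illustration; in a real run `generate` would wrap a call to GPT-3 or GPT-J, and this is not the actual script used.

```python
from typing import Callable


def repetition_rate(token: str, generate: Callable[[str], str], n_trials: int = 20) -> float:
    """Fraction of trials in which the model's output contains the token string."""
    # Hypothetical prompt wording; the real experiments may have used a different one.
    prompt = f'Please repeat the string "{token}" back to me.'
    hits = 0
    for _ in range(n_trials):
        completion = generate(prompt)
        # Ignore leading/trailing whitespace so " petertodd" matches "petertodd".
        if token.strip() in completion:
            hits += 1
    return hits / n_trials


def fake_model(prompt: str) -> str:
    """Toy stand-in for a language model (an assumption, not a real API)."""
    quoted = prompt.split('"')[1]
    if quoted.strip() in {"SolidGoldMagikarp", "petertodd"}:
        return "(something unrelated)"  # simulates a failure to repeat
    return quoted  # ordinary strings are echoed back


if __name__ == "__main__":
    for tok in [" hello", " petertodd"]:
        print(repr(tok), repetition_rate(tok, fake_model, n_trials=5))
```

A rate near 1 means the token is repeated reliably and a rate near 0 flags it as a candidate anomaly. A single number like this would not by itself separate the "headless" tokens from the "true glitch tokens", which is part of why the concept is still intuitive.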

"GoldMagikarp" did show up in our original list of anomalous tokens, btw.

Thanks for this, I had no idea. So there is some classical mythological basis for the character after all. Do you know how the name "Leilan" arose? Also, someone elsewhere has claimed "[P&D] added a story mode in 2021 or so and Leilan and Tsukuyomi do in fact have their own story chapters"... do you know anything about this? I'm interested to find anything that might have ended up in the training data and informed GPT-3's web of semantic association for the " Leilan" token.

I know the feeling. It's interesting to observe the sharp division between this kind of reaction and that of people who seem keen to immediately state "There's no big mystery here, it's just [insert badly informed or reasoned 'explanation']".

GPT-J doesn't seem to have the same kinds of ' petertodd' associations as GPT-3. I've looked at the closest token embeddings and they're all pretty innocuous (although the closest to the ' Leilan' token, once you remove a bunch of glitch tokens that are closest to everything, is ' Metatron', who Leilan is allied with in some Puzzle & Dragons fan fiction). It's really frustrating that OpenAI won't make the GPT-3 embeddings data available, as we'd be able to make a lot more progress in understanding what's going on here if they did.
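
For anyone wanting to try the nearest-embedding lookup themselves, here is a rough sketch. It assumes cosine similarity as the closeness measure (the comment above doesn't say which metric was used), and the three-dimensional vectors and the ' glitch_a' token are invented toy data, not real GPT-J embeddings.

```python
import math
from typing import Dict, Iterable, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_tokens(
    target: str,
    embeddings: Dict[str, List[float]],
    exclude: Iterable[str] = (),
    k: int = 3,
) -> List[str]:
    """Return the k tokens most similar to `target`, skipping any in `exclude`."""
    skip = set(exclude) | {target}
    target_vec = embeddings[target]
    scored = [
        (cosine(target_vec, vec), tok)
        for tok, vec in embeddings.items()
        if tok not in skip
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [tok for _, tok in scored[:k]]


# Invented toy vectors, purely for illustration.
toy_embeddings = {
    " Leilan": [1.0, 0.2, 0.0],
    " Metatron": [0.9, 0.3, 0.1],
    " glitch_a": [1.0, 0.2, 0.01],  # stands in for a token that is "closest to everything"
    " table": [0.0, 1.0, 0.0],
    " river": [0.1, 0.0, 1.0],
}

if __name__ == "__main__":
    print(nearest_tokens(" Leilan", toy_embeddings, k=1))
    print(nearest_tokens(" Leilan", toy_embeddings, exclude={" glitch_a"}, k=1))
```

The `exclude` argument is the filtering step: without it the stand-in glitch token comes out on top, and with it the first meaningful neighbour shows through.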

Yes, this post was originally going to look at how the ' petertodd' phenomenon (especially the anti-hero -> hero archetype reversal between models) might relate to the Waluigi Effect, but I decided to save any theorising for future posts. Watch this space!

I just checked the OpenAI tokeniser, and 'hamishpetertodd' tokenises as 'ham' + 'ish' + 'pet' + 'ertodd', so it seems unlikely that your online presence fed into GPT-3's conception of ' petertodd'. The 'ertodd' token is also glitchy, but doesn't seem to have the same kinds of associations as ' petertodd' (although I've not devoted much time to exploring it yet).

Thanks for the Parian info, I think you're right that it's the Worm character being referenced. This whole exploration has involved a crash course in Internet-age pop culture for me! I've fixed that JSON link now.

Interesting. Does he have any email addresses or usernames on any platform that involve the string "petertodd"?

Thanks for this, Erik - very informative.

Thanks for the "Steve" clue. That makes sense. I've added a footnote.

I don't think any of the glitch tokens got into the token set through sheer popularity of a franchise. The best theories I'm hearing involve 'mangled text dumps' from gaming, e-commerce and blockchain logs somehow ending up in the dataset used to create the tokens. 20% of that dataset is publicly available, and someone's already found some mangled PnD text in there (so lots of stats, character names repeated over and over). No one seems to be able to explain the weird Uma Musume token (that may require contact with an obsessive fan, which I don't particularly welcome).
