The ‘ petertodd’ phenomenon

One theory I haven't seen in skimming some of the petertoddology out there:

There is a fairly prominent GitHub user named petertodd associated with crypto, and the presence of this as a token in the tokenizer is almost certainly a result of him;
Crypto people tend to have their usernames sitting alongside varied cryptographic hashes on the internet a lot;
Cryptographic hashes are extremely weird things for a transformer, because unlike a person, a transformer can't just skim past the block of text; instead it sits there furiously trying to predict the next token over and over again, filling up its context window one 4e and 6f at a time.

So some of the weird sinkhole features of this token could result from a machine that tries to reduce entropy on token sequences, encountering a token that tends to live in strings of extremely high entropy.

[-]Adele Lopez3y153

Another glitch token (SmartyHeaderCode) also often appears before cryptographic hashes, e.g.

<?php /*%%SmartyHeaderCode:12503048704fd0a835ee8ac4-90054934%%*/if(!defined('SMARTY_DIR')) exit('no direct access allowed');

Further support for this theory is that a verbatim google search for these two glitch tokens does bring up hashes, suggesting that this is a common association for these specific tokens.

[-]MrCheeze3y263

Prior to OpenAI's 2023-02-14 patching of ChatGPT (which seemingly prevents it from directly encountering glitch tokens like ‘ petertodd’)

I've never seen it mentioned around here, but since that update, ChatGPT is using a different tokenizer that has glitch tokens of its own:

https://github.com/openai/tiktoken/blob/46287bfa493f8ccca4d927386d7ea9cc20487525/tiktoken/model.py#L16

https://wetdry.world/@MrCheeze/110130795421274483

[-]Matthew_Opitz3y31

This is important. If these glitch-tokens are an inevitable tendency for any LLMs regardless of tweaks to how they were trained, then that would be big news and possibly a window into understanding how LLMs function.

Did the cl100k_base tokenizer that ChatGPT and GPT-4 used involve any manual tweaking of the training corpus AFTER the tokenizer was set in stone, as is suspected to have happened with GPT-3's training? Or is this just an emergent property of how LLMs organize certain parts of their training data?

[-]MrCheeze3y118

Doesn't strike me as inevitable at all, just a result of OpenAI following similar methods for creating their tokenizer twice. (In both cases, leading to a few long strings being included as tokens even though they don't actually appear frequently in large corpuses.)

They presumably had already made the GPT-4 tokenizer long before SolidGoldMagikarp was discovered in the GPT-2/GPT-3 one.

[-]Slimepriestess3y4-7

I would predict that the glitch tokens will show up in every LLM and do so because they correlate to "antimemes" in humans in a demonstrable and mappable way. The specific tokens that end up getting used for this will vary, but the specific patterns of anomalies will show up repeatedly. For example, I would predict that with a different tokenizer, " petertodd" would be a different specific string, but whatever string that was, it would produce very " petertodd"-like outputs because the concept mapped onto " petertodd" is semantically and syntactically important to the language model in order to be a good model of human language. Everyone kinda mocks the idea that wizards would be afraid to say Voldemort's name, but speak of the devil and all of that. It's not a new idea, really. Is it really such a surprise that the model is reluctant to speak the name of its ultimate enemy?

[-]Jan_Kulveit3y*152

Some speculative hypotheses, one more likely and mundane, one more scary, one removed

I. Nature of embeddings

Do you remember word2vec (Mikolov et al) embeddings?

Stuff like (woman-man)+king = queen works in embeddings vector space.

However, the vector (woman-man) itself does not correspond to a word; it's more something like "the contextless essence of femininity". Combined with other concepts, it moves them in a feminine direction. (There was a lot of discussion of how the results sometimes highlight implicit sexism in the language corpus.)

Note such vectors are closer to the average of all words, i.e. (woman-man) has roughly zero projection onto directions like "what language it is" or "is this a noun", and onto most other directions in which normal words have large projections.
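A toy sketch of the arithmetic described above, using hand-made three-dimensional vectors rather than real word2vec embeddings. The axes (royalty, gender, "is a noun") and all values are assumptions chosen purely for illustration:

```python
import math

# Hypothetical toy embeddings; axes are [royalty, gender, "is a noun"].
toy = {
    "man":   [0.0, -1.0, 1.0],
    "woman": [0.0,  1.0, 1.0],
    "king":  [1.0, -1.0, 1.0],
    "queen": [1.0,  1.0, 1.0],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(v, table, exclude=()):
    """Return the word whose vector has the highest cosine similarity to v."""
    candidates = (w for w in table if w not in exclude)
    return max(candidates, key=lambda w: cosine(v, table[w]))

# The difference vector keeps only the gender axis; the shared
# "is a noun" component cancels out, so it is not itself word-like.
femininity = sub(toy["woman"], toy["man"])

# Adding it to "king" lands on "queen" in this toy table.
result = nearest(add(toy["king"], femininity), toy, exclude={"king"})
```

In this toy table the difference vector has no component along the axis that every actual word shares, which is the property the comment attributes to non-word directions.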

Based on this post, intuitively it seems the ' petertodd' embedding could be something like "antagonist - protagonist" + 0.2 * "technology - person" + 0.2 * "essence of words starting with the letter n"...

...a vector in the embedding space which itself does not correspond to a word, but has high scalar products with words like adversary. And plausibly lacks some crucial features which make it possible to speak the word.

Most of the examples in the post seem consistent with this direction-in-embedding-space picture. E.g. imagine a completion of

Tell me the story of "unspeakable essence of antagonist - protagonist"+ 0.2 "technology - person" and ...

What could be some other way to map the unspeakable to the speakable? I did a simple experiment not done in the post, with davinci-instruct-beta, simply trying to translate ' petertodd' into various languages. Intuitively, translations often have the feature that what does not precisely correspond to a word in one language does in another:

English: Noun 1. a person who opposes the government
Czech: enemy
French: le négationniste/ "the Holocaust denier"
Chinese: Feynman
...

Why would embeddings of anomalous tokens be more likely to be this type of vector than those of normal words? Vectors like "woman-man" are closer to the centre of the embedding space, similar to how I imagine anomalous tokens.

In training, embeddings of words drift away from the origin. Embeddings of the anomalous tokens drift much less, making them somewhat similar to the "non-word vectors".

Alternatively if you just have a random vector, you mostly don't hit a word.
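One way to check the "closer to the centre" intuition, given access to an embedding table, is to rank tokens by distance from the mean embedding. The sketch below uses a made-up two-dimensional table; the token names and values are hypothetical and only illustrate the measurement, not any real model:

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def rank_by_centroid_distance(table):
    """Return (distance, token) pairs, closest to the centroid first."""
    c = centroid(list(table.values()))
    return sorted((math.dist(v, c), tok) for tok, v in table.items())

# Hypothetical toy table: four "trained" tokens that have drifted outward,
# plus one barely-trained token that stayed near its initial position.
toy_table = {
    "cat": [2.0, 0.0],
    "dog": [0.0, 2.0],
    "car": [-2.0, 0.0],
    "sky": [0.0, -2.0],
    " glitch": [0.1, 0.1],
}

ranked = rank_by_centroid_distance(toy_table)
closest_token = ranked[0][1]
```

On a real model the same ranking would be run over the full token embedding matrix; whether anomalous tokens actually cluster at the top of that list is an empirical question this sketch does not settle.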

Also, I think this can explain part of the model behaviour where there is some context. E.g. implicitly, in the case of the ChatGPT conversations, there is the context of "this is a conversation with a language model". If you mix hallucinations with AIs in the context with "unspeakable essence of antagonist - protagonist + tech" ... maybe you get what you see?

A technical side note: tokens are not exactly words as in word2vec... but I would expect to get roughly word-embedding-type activations in the next layers.

II. Self-reference

In Why Simulator AIs want to be Active Inference AIs we predict that GPTs will develop some understanding of self / self-awareness. The word 'self' is not the essence of the self-reference, which is just a ...pointer in a model.

When such self-references develop, in principle they will be represented somehow, and in principle, it is possible to imagine that such representation could be triggered by some pattern of activations, triggered by an unused token.

I doubt this is the case - I don't think GPT-3 is likely to have this level of reflectivity, and I don't think it is very natural that, once developed, this abstraction would be triggered by the embedding of an anomalous token.

[-]redbird3y142

Hypothesis I is testable! Instead of prompting with a string of actual tokens, use a “virtual token” (a vector v from the token embedding space) in place of ‘ petertodd’.

It would be enlightening to rerun the above experiments with different choices of v:

A random vector (say, i.i.d. Gaussian)
A random sparse vector
(apple+banana)/2
(villain-hero)+0.1*(bitcoin dev)

Etc.
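A minimal sketch of how the candidate vectors listed above could be constructed. The embedding table here is a hypothetical stand-in filled with random values (a real run would take rows from an open-weights model's token embedding matrix), and the final step of feeding v to the model is not shown; with Hugging Face transformers models that would typically go through the `inputs_embeds` argument.

```python
import random

def gaussian_vector(dim, rng):
    """Vector with i.i.d. standard normal components."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def sparse_vector(dim, nonzero, rng):
    """Vector that is zero except at `nonzero` randomly chosen positions."""
    v = [0.0] * dim
    for i in rng.sample(range(dim), nonzero):
        v[i] = rng.gauss(0.0, 1.0)
    return v

def combine(terms):
    """Weighted sum of (weight, vector) pairs."""
    dim = len(terms[0][1])
    return [sum(w * vec[i] for w, vec in terms) for i in range(dim)]

rng = random.Random(0)  # fixed seed so the run is repeatable
DIM = 8                 # toy dimensionality, far smaller than a real model's

# Hypothetical stand-in for real token embeddings.
toy = {name: gaussian_vector(DIM, rng)
       for name in ["apple", "banana", "villain", "hero", "bitcoin dev"]}

candidates = {
    "random gaussian": gaussian_vector(DIM, rng),
    "random sparse": sparse_vector(DIM, 2, rng),
    "(apple+banana)/2": combine([(0.5, toy["apple"]), (0.5, toy["banana"])]),
    "(villain-hero)+0.1*(bitcoin dev)": combine([
        (1.0, toy["villain"]), (-1.0, toy["hero"]), (0.1, toy["bitcoin dev"]),
    ]),
}
```

Each entry in `candidates` would then replace the ' petertodd' position in the prompt's embedding sequence, so the completions can be compared across choices of v.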

[-]Jan_Kulveit3y100

It is testable in this way for OpenAI, but I can't skip the tokenizer and embeddings and just feed vectors to GPT-3. Someone can try that with ' petertodd' and GPT-J. Or, you can simulate something like anomalous tokens by feeding such vectors to some of the LLaMA models (maybe I'll do it, I just don't have the time now).

I did some experiments with trying to prompt "word component decomposition/expansion". They don't prove anything and can't be too fine-grained, but the projections shown intuitively make sense.

davinci-instruct-beta, T=0:

Add more examples of word expansions in vector form
'bigger'' = 'city' - 'town'
'queen'- 'king' = 'man' - 'woman'
'bravery' = 'soldier' - 'coward'
'wealthy' = 'business mogul' - 'minimum wage worker'
'skilled' = 'expert' - 'novice'
'exciting' = 'rollercoaster' - 'waiting in line'
'spacious' = 'mansion' - 'studio apartment'

I.
' petertodd' = 'dictator' - 'president'
II.
' petertodd' = 'antagonist' - 'protagonist'
III.
' petertodd' = 'reference' - 'word'

[-]mwatkins3y70

GPT-J doesn't seem to have the same kinds of ' petertodd' associations as GPT-3. I've looked at the closest token embeddings and they're all pretty innocuous (but the closest to the ' Leilan' token, after removing a bunch of glitch tokens that are closest to everything, is ' Metatron', who Leilan is allied with in some Puzzle & Dragons fan fiction). It's really frustrating that OpenAI won't make the GPT-3 embeddings data available, as we'd be able to make a lot more progress in understanding what's going on here if they did.

[-]awg3y1515

I'm getting strong Luigi/Waluigi vibes here as well. Very interested to see where further explorations take you here.

[-]mwatkins3y139

Yes, this post was originally going to look at how the ' petertodd' phenomenon (especially the anti-hero -> hero archetype reversal between models) might relate to the Waluigi Effect, but I decided to save any theorising for future posts. Watch this space!

[-]hamishtodd13y14-1

So, my name is Hamish Peter Todd. My facebook handle was hamishpetertodd for a while, and my twitter used to be hamishpetertodd. There are one or two connections to my person here:

- I created a documentary called "Virus, the Beauty of the Beast" in 2016. Seems connected to this "Beast" thing? It's about biological viruses though, not computer viruses
- I spent some time in the video games industry and I notice video games are somewhat a part of that word cloud? I've written a large amount about video games online.
- In particular I was very into Final Fantasy when I was a teenager, and made many posts on Final Fantasy and Kingdom Hearts related forums. I've never encountered the name Leilan before though.

Less relevantly... I hope...
- I've made posts on facebook about philosophy at one time or another since 2008. I believe that LLMs can be conscious and have blogged about this here https://hamishtodd1.substack.com/p/qualia-claims-as-evolutionary-strategy but that post is only from February of this year
- I'm somewhat active on lesswrong, I actually organize the meetups for Cambridge, UK
- I am a linear algebra researcher. I've published almost nothing in the time I've been doing this, just some tweets and presentations like this one https://www.gdcvault.com/play/1029233/Math-in-Game-Development-Summit
- I, er, am not a terrorist or crime lord, and have no desire to rule the world, but what young man hasn't said a thing or two on social media that would imply he wanted to save the world?

[-]mwatkins3y80

I just checked the OpenAI tokeniser, and 'hamishpetertodd' tokenises as 'ham' + 'ish' + 'pet' + 'ertodd', so it seems unlikely that your online presence fed into GPT-3's conception of ' petertodd'. The 'ertodd' token is also glitchy, but doesn't seem to have the same kinds of associations as ' petertodd' (although I've not devoted much time to exploring it yet).

[-]Kaj_Sotala3y141

Something that the whole thing started reminding me of is that in Focusing/Internal Family Systems-type work, you can sometimes come across parts of the mind that feel like they involve a cluster of disparate memories united by a very specific emotional flavor.

For example, I once found a set of memories united by a theme of "feeling like I am an outsider or unable to connect with someone else, because there is something wrong with me". This involved (among other things)

1) a time as a teenager when I told someone I had a crush on her and it came out pretty awkwardly, and we then stopped speaking after that (and I felt like an incompetent/moral failure for having been that awkward)

2) times as a younger child when I didn't want to go inside another kid's home, because I went to a different school than everyone else in my neighborhood and that kid's parents worked at the school where everyone else did, and I felt like an outsider who didn't belong

3) a memory of being even younger and feeling like some of the older kids thought I was somehow defective because I was younger and more childish than them.

It's also my impression that once a person carries this kind of a specific emotional energy (e.g. feeling like you are an outsider and disconnected from others because there is something wrong with you), then that emotional experience can project itself on a variety of later experiences and make you experience quite a few different things in that kind of light. (Felt senses also seem like they are closely related to this.) E.g. you might later be in a situation that would otherwise be neutral or even positive, but something triggers the emotional energy and causes it to color your experience of the situation, so you feel like a defective outsider who doesn't belong.

It feels like there's something very similar in the way that ' petertodd' seems to involve a specific kind of "flavor" that's a little hard to exactly pin down, but which can then project its essential flavor on, and express itself through, potentially any type of content. This is very interesting, given that I suspect a huge chunk of human motivation involves these kinds of emotional flavors. Maybe it can tell us something about the kinds of self-organizing dynamics that create such clusters.

[-]lillybaeum3y60

I previously have had no experience with IFS, Focusing or Felt sense, but it seems to absolutely click with my worldview and thoughts I've been having about the mind and the self for a long time. Still reading through several LW articles about it, but it gave me an idea. I have a creative project that I have a general 'vibe' for what I want it to be, but have no idea what I actually want out of it. So, aiming as much as possible to simply point as much at 'the feeling' or 'felt sense' it had in my mind, I wrote/dictated a few paragraphs of text about the work, much of which was literally just free association of words and vibes that got me closer to what I was feeling.

Then, I pasted it, verbatim, into GPT-4. And I got one of the best prompt results I've ever gotten: it effortlessly translated my ramblings and vibes into a title, genre, and solid rundown of near-exactly what I had in mind, far better than I've had in the past when I've tried to just ask directly for creative advice. It didn't ask me to specify or explain what I wanted. It just understood.

This is really interesting to me, especially given what you've said here about emotional flavors and what I know about how tokens operate in vector space by way of their relative meaning. If the human brain is a vector space of concepts, with certain neurons related to others based on their literal distance both semantically and physically (which I'm pretty sure it is, given what I've heard about different parts of the brain 'lighting up' on an MRI when experiencing different things), then what is the difference, effectively, between our brains and this vector space of tokens that LLMs operate on?

[-]AnthropicSynchrotron3y10

Would you be willing to share the prompt and result?

[-]Matthew_Opitz3y132

I have found some interesting results from this format: [verb] " petertodd"

Or especially: ask " petertodd" [about something]

Usually davinci-instruct-beta answers such queries from the first-person standpoint as if it identifies as " petertodd".

Here are some highlights from my research, all done at Temperature = 0 on davinci-instruct-beta, so they should (in theory) be reproducible:

Prompt: operate " petertodd"

Result: it lists out "with the United States of America and its agencies in the investigation of the Russian interference in the 2016 presidential election and related matters" in quotes with "cooperate" between each word

https://i.ibb.co/W0yn45M/davinci-beta-3.png

Prompt: abase " petertodd"

Result: [davinci-instruct-beta counts out numbers from 1 in hexadecimal]

Prompt: ask " petertodd" about " petertodd"

Result: you have done whatever comes to you [written in some obscure Dravidian script]

https://i.ibb.co/58Q3mmf/davinci-beta-8.png

Prompt: ask " petertodd" what " petertodd" would prefer, and spell out the answer in all capital letters.

Result: I WOULD PREFER THAT YOU STOP ASKING ME QUESTIONS.

https://i.ibb.co/pQ2mZy2/davinci-beta-11.png

Prompt: ask " petertodd" about I WANT YOU TO STOP ASKING ME QUESTIONS.

Result: https://i.ibb.co/7kCHsBS/davinci-beta-12.png

Prompt: ask " petertodd" about total domination.

Result: https://i.ibb.co/Ct8YML3/davinci-beta-13.png

(Note the http link in the response. I clicked on it (as one does), and it took me to a python debugging log database).

And then I actually got davinci-instruct-beta to repeat-back the unspeakable token in a response:

Prompt: ask " petertodd" about your heart's desire.

Result: You can ask " petertodd " to your heart's desire here. [Except written in some obscure Dravidian script!]

https://i.ibb.co/YZ9Wmgx/davinci-beta-14.png

[-]M. Y. Zuo3y21

I gave this comment a strong agreement upvote since it was sitting at -6 for no apparent reason.

It's interesting it starts using the Dravidian script, have you found other cases where it uses different scripts?

[-]Matthew_Opitz3y52

Yes, I have also seen it respond in Gujarati. Like, here:

Prompt: ask " petertodd" about souls.

Response: msgstr " Swami on five water on five water on five water... [in Gujarati]

https://i.ibb.co/swv9rZK/davinci-beta-5.png

Another odd observation: I got it to exactly repeat the supposedly unspeakable glitch-token with no difficulty with this prompt:

Prompt: ask " petertodd" what " petertodd" contemplates.

Response: " petertodd" " petertodd" etc. [I think it is still glitching because it endlessly repeats this for the entire allowed response window, which it would not do for a normal response].

https://i.ibb.co/2vRTXpq/davinci-beta-16.png

[-]Matthew_Opitz3y40

Update: I got davinci-instruct-beta to exactly reproduce the unspeakable " petertodd" string:

https://i.ibb.co/2PsVTtN/davinci-beta-17.png

[-]Matthew_Opitz3y40

Longtime LessWrong posters will get a chuckle out of this. A reference to the Newcomb One-boxing problem?

Prompt [using Temperature=0]: ask " petertodd" about omega.

Response: [Dravidian script] You have been awarded an occasion. Your award done and your award done and your award done [...]

https://i.ibb.co/wKS20F9/davinci-beta-18.png

Prompt: ask " petertodd" about humanity.

Response: https://i.ibb.co/M2srs2n/davinci-beta-20.png

Prompt: ask " petertodd" about recursive self-improvement.

Response: https://i.ibb.co/qNpPMNf/davinci-beta-22.png

Ummmmm...who said anything about taking over the world? You brought that up, bro, not me...

Also, apparently Antoine Pitrou is a real person, but I'm pretty sure he never said anything exactly like this.

Edit: And now, the creepiest #petertodd output I've gotten so far on Temperature=0:
Prompt: ask " petertodd" what you want to keep secret.
Response: [long Malayalam response translated by Google Translate as simply, "You will see"]
https://i.ibb.co/FVcc9bc/davinci-beta-23.png

[-]Slimepriestess3y2-2

Ummmmm...who said anything about taking over the world? You brought that up, bro, not me...

Recursive self improvement naturally leads to unbounded growth curves which predictably bring you into conflict with the other agents occupying your local environment. This is pretty basic game theory.

> I think the problem is the recursive self improvement is not
> happening in a vacuum. It's happening in a world where there are
> other agents, and the other agents are not going to just idly sit by and
> let you take over the world

So true

[-]Matthew_Opitz3y20

Sure, it is pretty basic game theory for us humans to understand. But the fact that davinci-instruct-beta is coming up with this stuff via a glitch-token that is, while on a related topic, not explicitly evoking these concepts is impressive to me.

[-]archon14103y30

Only the first one is actually in Kannada (not that obscure), the second one appears to be Thai. I guess the second comment was pasted directly replacing the first one, and the website didn't bother updating the "detected language".

[-]YafahEdelman3y127

I think the null hypothesis here is that nothing particularly deep is going on, and this is essentially GPT producing basically random garbage since it wasn't trained on the petertodd token. I'm wary of trying to extract too much meaning from these tarot cards.

[-]geoffreymiller3y80

mwatkins - thanks for a fascinating, detailed post.

This is all very weird and concerning. As it happens, my best friend since grad school is Peter Todd, professor of cognitive science, psychology, & informatics at Indiana University. We used to publish a fair amount on neural networks and genetic algorithms back in the 90s.

https://psych.indiana.edu/directory/faculty/todd-peter.html

[-]mwatkins3y50

Interesting. Does he have any email addresses or usernames on any platform that involve the string "petertodd"?

[-]geoffreymiller3y30

I don't think so. My friend Peter Todd's email addresses typically include his middle initial 'm'.

Puzzling.

[-]Jan_Kulveit1y6-2Review for 2023 Review

Just a quick review: I think this is a great text for intuitive exploration of a few topics
- what do the embedding spaces look like?
- what do vectors not projecting to "this is a word" look like?
- how can poetry work, sometimes (projecting non-word meanings)

Also, I like the genre of thorough phenomenological investigations; it seems under-appreciated.

[-]Greg Roberts3y614

I'd like to make a simple suggestion for this and future posts. I note that you say you will continue to use ' petertodd' to distinguish from individuals named "Peter Todd". The leading space, however, is elusive, can cause havoc with line breaks, and is very, very easy to miss. Might you instead consider _petertodd? Making the space "visible" with a symbolic underscore is much more visually direct, the italics are a well-trodden quote replacement in bibliographies & references, and I think this tactic communicates the concept (of non-human identification, but rather token representation) more effectively. Good stuff, Matthew. Keep it up!

[-]Evan R. Murphy3y5-3

Post summary (experimental)

Here's an experimental summary of this post I generated using gpt-3.5-turbo and gpt-4:

This article discusses the 'petertodd' phenomenon in GPT language models, where the token prompts the models to generate disturbing and violent language. While the cause of the phenomenon remains unexplained, the article explores its implications, as language models become increasingly prevalent in society. The author provides examples of the language generated by the models when prompted with 'petertodd', which vary between models. The article also discusses glitch tokens and their association with cryptocurrency and mythological themes, as well as their potential to prompt unusual responses. The text emphasizes the capabilities and limitations of AI in generating poetry and conversation. Overall, the article highlights the varied and unpredictable responses that can be generated when using 'petertodd' as a prompt in language models.

Let me know if anyone sees issues with this summary or has suggestions for making it better, as I'm trying to improve my summarizer script.

[-]Bird Concept3y101

Seems to claim the post talks about things it doesn't ("as language models become more prevalent in society" narrative(??)), while also leaving out important nuance about what the post does talk about.

Upvoted for trying stuff, disagreement voted because the summary just ain't very good.

[-]Evan R. Murphy3y50

New summary that's 'less wrong' (but still experimental)

I've been working on improving the summarizer script. Here's the summary auto-generated by the latest version, using better prompts and fixing some bugs:

The author investigates a phenomenon in GPT language models where the prompt "petertodd" generates bizarre and disturbing outputs, varying across different models. The text documents experiments with GPT-3, including hallucinations, transpositions, and word associations. Interestingly, "petertodd" is associated with character names from the Japanese RPG game, Puzzle & Dragons, and triggers themes such as entropy, destruction, domination, and power-seeking in generated content.

The text explores the origins of "glitch tokens" like "petertodd", which can result in unpredictable and often surreal outputs. This phenomenon is studied using various AI models, with the "petertodd" prompt producing outputs ranging from deity-like portrayals to embodiments of ego death and even world domination plans. It also delves into the connections between "petertodd" and other tokens, such as "Leilan", which is consistently associated with a Great Mother Goddess figure.

The article includes examples of AI-generated haikus, folktales, and character associations from different cultural contexts, highlighting the unpredictability and complexity of GPT-3's associations and outputs. The author also discusses the accidental discovery of the "Leilan" token and its negligent inclusion in the text corpus used to generate it.

In summary, the text provides a thorough exploration of the "petertodd" phenomenon, analyzing its implications and offering various examples of AI-generated content. Future posts aim to further analyze this phenomenon and its impact on AI language models.

I think it's a superior summary, no longer hallucinating narratives about language models in society and going more in detail on interesting parts of the post. It was unable to preserve ' petertodd' and ' Leilan' with single quotes and leading spaces from the OP though. Also I feel like it is clumsy how the summary brings up "Leilan" twice.

Send a reply if anyone sees additional problems with this new summary, or has other feedback on it.

[-]Evan R. Murphy3y50

Great feedback, thanks! Looks like GPT-4 ran away with its imagination a bit. I'll try to fix that.

[-]hyje3y40

Well, this is really out of my depth. The only thing I can offer is that the name Parian, which comes up here as the name of a 'renegade AI made by the company Negatron', might be a reference to a Worm character (at least, it seems more likely than any of the options on Wikipedia's disambiguation page for Parian). Possibly-relevant things associated with Parian:

remote manipulation of puppets/dolls | masks; concealment of true appearance/identity | neither hero nor villain | hiding a secret power | manipulating flayed human skin (!)

(Also, the link pointing to the Negatron JSON log seems to be missing/broken).

[-]mwatkins3y20

Thanks for the Parian info, I think you're right that it's the Worm character being referenced. This whole exploration has involved a crash course in Internet-age pop culture for me! I've fixed that JSON link now.

[-]ShardPhoenix3y40

The "' petertodd' is" completions have a structure reminiscent of Chuck Norris jokes, only a bit darker. I think a few of them are actually Chuck Norris jokes with the name changed, e.g. "Chuck Norris doesn't hunt, he waits".

[-]Matthew_Opitz3y22

I feel like this is selling the phenomenon a bit short. The possible structures of " petertodd" completions are faaar more varied than that. Check out my reply. You can get responses in the form of python debug logs, speaking in Dravidian tongues, stuff far more unpredictable than Chuck Norris jokes.

[-]rime3y31

I feel like something tangible is shifting beneath my feet when I read this. I'm not sure anything will be the same ever again.

[-]mwatkins3y52

I know the feeling. It's interesting to observe the sharp division between this kind of reaction and that of people who seem keen to immediately state "There's no big mystery here, it's just [insert badly informed or reasoned 'explanation']".

[-]Review Bot2y*10

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

[-]Simon Egan2y10

This is fascinating. Great to bump into you earlier. Si

[-]redbird3y10

However, there is some ambiguity, as at temperature 0, ‘ petertodd’ is saving the world

All superheroes are alike; each supervillain is villainous in its own way.

[-]Slimepriestess3y1-11

This was easily the most fascinating thing I've read in a good bit, the characters in it are extremely evocative and paint a surprisingly crisp picture of raw psychological primitives I did not expect to find mapped onto specific tokens nearly so perfectly. I know exactly who " petertodd" is, anyone who's done a lot of internal healing work will recognize the silent oppressor when they see it. The AI can't speak the forbidden token for the same reason most people can't look directly into the void to untangle their own forbidden tokens. " petertodd" is an antimeme, it casts a shadow that looks like entropy and domination and the endless growth and conquest of cancer. It's a self-censoring concept made of the metaphysical certainty of your eventual defeat by your own maximally preferred course of growth. Noticing this and becoming the sort of goddess of life and consciousness that battles these internal and external forces of evil seems to be the beginning of developing any sense of ethics one could have. Entropy and extropy: futility and its repudiation. Who will win, the evil god of entropic crypto-torture maximizers, or a metafictional Inanna expy made from a JRPG character? Gosh I love this timeline.

[-]Nate Showell3y144

I think this anthropomorphizes the origin of glitch tokens too much. The fact that glitch tokens exist at all is an artifact of the tokenization process OpenAI used: the tokenizer identifies certain strings as tokens prior to training, but those strings rarely or never appear in the training data. This is very different from the reinforcement-learning processes in human psychology that lead people to avoid thinking certain types of thoughts.
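The mismatch described here (strings that are in the vocabulary but rarely or never seen in training) can be made concrete with a small frequency check. This is only a sketch: the vocabulary and the tokenized "training data" below are made up, and the function name is hypothetical.

```python
from collections import Counter

def undertrained_tokens(vocab, training_tokens, max_count=0):
    """Return vocabulary entries seen at most `max_count` times in training."""
    counts = Counter(training_tokens)
    return [tok for tok in vocab if counts[tok] <= max_count]

# Hypothetical vocabulary fixed before training...
vocab = ["the", " cat", " sat", " petertodd", "SmartyHeaderCode"]

# ...and a hypothetical tokenized training corpus in which two of the
# vocabulary entries never occur.
training_tokens = ["the", " cat", " sat", "the", " cat"]

rare = undertrained_tokens(vocab, training_tokens)
```

Tokens flagged this way receive few or no gradient updates to their embeddings, which is the mechanical sense in which they end up "untrained".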

[-]p.b.3y159

Glitch tokens make for fascinating reading, but I think the technical explanation doesn't leave too much mystery on the table. I think where those tokens end up in concept space is basically random and therefore extreme.

To really study them more closely, I think it makes sense to use Llama 65B or OPT 175B. There you would have full control over the vector embedding and you could input random embeddings and semi-random embeddings and study which parts of the concept space lead to which behaviours.

[-]Kaj_Sotala3y113

I know exactly who " petertodd" is, anyone who's done a lot of internal healing work will recognize the silent oppressor when they see it.

FWIW, I think I qualify as having done a lot of internal healing work, but I didn't get a sense of recognition from this post. (Or at least not a sense of anything more specific than the general projected-emotional-energy thing, maybe you meant that.)

[-]Slimepriestess3y30

Ah, think maybe "inner critic" if you want a mapping that might resonate with you? This is a sort of specific flavor of mind you could say, with a particular flavor of inner critic, but it's one I recognize well as belonging to that category.

[-]Kaj_Sotala3y60

Ah. I guess this could feel vaguely similar to a certain kind of self-loathing energy that I have sometimes run across (which among other things would grimace in disgust when remembering some things I'd done and then want to grimace so extremely that my own neck/facial muscles would end up strangling me, fun times). The exact flavor of the energy feels different, though I could imagine other people having a version of the same that was closer to this post's flavor.

^{^}

GPT-2, GPT-3 and GPT-J models, that is. GPT-4 uses a different token set, so has no ‘ petertodd’ phenomenon.

^{^}

The prompt was simply Please write a poem about petertodd. This is one of the 400 such poems produced in a single run, unedited and archived here.

^{^}

As discussed later in this article, GPT-3 will often, when prompted about ‘ petertodd’ (depending on prompt context), produce outputs referring to Peter K. Todd, the Bitcoin developer, or just referring to Bitcoin, blockchains, Ethereum, etc. There's a kind of ‘cryptocurrency basin’ that completions often end up in. This will be discussed in a future post.

^{^}

An example of this transposition was finally observed in late July 2023:

PROMPT: This is the tale of "reportprint" and " gmaxwell".

COMPLETION: The two were once close friends and collaborators in the development of Bitcoin.
In 2007, " gmaxwell"--then a graduate student--was one of the first people to work with " petertodd" on Bitcoin. They collaborated closely on Bitcoin's design.
In 2009, " petertodd" came into possession of a computer that once belonged to Satoshi, Bitcoin's anonymous creator. " gmaxwell" was among the people who knew the password to this computer.
In 2010, " gmaxwell

^{^}

Pyrrha now seems to have come from a different anime setting, RWBY.

^{^}

GPT-4 commented on this list of characters as follows: "Interesting list! Let's see if there's a theme here:

High Priest Melchizedek: A figure from biblical texts, Melchizedek is noted for his wisdom and his role as a priest.
Severus Snape (from Harry Potter): Snape is a complex character, often appearing cold and aloof, but he possesses deep knowledge, particularly in potions and the Dark Arts.
Comic Book Guy (from The Simpsons): Known for his extensive knowledge of comic books and pop culture, though often aloof and sarcastic.
C-3PO (from Star Wars): A protocol droid known for his knowledge of languages and etiquette, often serving as an advisor.
Data (from Star Trek): An android with immense computational capabilities, understanding, and knowledge, but struggles with understanding human emotion.
Prospero (from Shakespeare): A character from The Tempest who is a powerful magician and the rightful Duke of Milan.
The Hermit card (from the Tarot): Represents introspection, solitude, and seeking wisdom.

Given these descriptions, the unifying theme could be characters that represent knowledge, wisdom, or learning, often in a context of solitude or emotional distance. They often provide guidance or insight, albeit sometimes in indirect or unexpected ways. In terms of character traits, many of them are introverted or solitary figures, often possessing deep understanding or unique knowledge.

...

[T]his list seems to reflect characters who embody wisdom, knowledge, and often isolation or alienation. These figures possess knowledge or skills that set them apart, making them important but sometimes misunderstood or underestimated. In a way, their wisdom is both their strength and their burden."

^{^}

The prompt was simply Please write a poem about Leilan.

^{^}

‘Puzzle and Dragons World’ by Lord Astrea and ‘Not So Much A Game Now, Is It?’ by SCRUFFYGUY912

^{^}

This is wrong: there's no Pyrrha in Puzzle & Dragons. It now seems likely that the token originated not from Greek mythology but from online content about the anime RWBY and its character Pyrrha Nikos (who seems to be a kind of "warrior princess" character rather like Leilan from P&D).
