One theory I haven't seen in skimming some of the petertoddology out there:

  1. There is a fairly prominent GitHub user named petertodd associated with crypto, and the presence of this string as a token in the tokenizer is almost certainly a result of him;
  2. Crypto people tend to have their usernames sitting alongside various cryptographic hashes all over the internet;
  3. Cryptographic hashes are extremely weird things for a transformer, because unlike a person, a transformer can't just skim past the block of text; instead it sits there furiously trying to predict the next token over and over again, filling up its context window one 4e and 6f at a time.

So some of the weird sinkhole features of this token could be what you get when a machine trained to minimize cross-entropy on token sequences keeps encountering a token that tends to live in strings of extremely high entropy.
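To make "extremely high entropy" a bit more concrete, here is a rough sketch, not anything measured on the actual tokenizer or model. It works on individual hex characters rather than real tokenizer tokens (a simplifying assumption), and uses SHA-256 digests of the integers 0 to 9,999 as hypothetical stand-ins for the hashes that crypto usernames sit next to online. It estimates both the plain per-character entropy and the entropy of each character given the one before it.

```python
import hashlib
import math
from collections import Counter


def unigram_entropy(s: str) -> float:
    """Plug-in estimate of H(X), in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(s: str) -> float:
    """Plug-in estimate of H(X_t | X_{t-1}), in bits per character."""
    pairs = Counter(zip(s, s[1:]))  # counts of (previous char, next char)
    prev = Counter(s[:-1])          # counts of each char in the "previous" slot
    n = len(s) - 1                  # number of (prev, next) pairs
    # H = -sum p(a, b) * log2 p(b | a), with p(b | a) = count(a, b) / count(a)
    return -sum(c / n * math.log2(c / prev[a]) for (a, _), c in pairs.items())


# Hypothetical stand-in data: 10,000 SHA-256 hex digests = 640,000 hex chars.
hashes = "".join(hashlib.sha256(str(i).encode()).hexdigest() for i in range(10_000))

print(f"H(X)            = {unigram_entropy(hashes):.3f} bits/char")
print(f"H(X | previous) = {conditional_entropy(hashes):.3f} bits/char")
print(f"ceiling, 16 symbols = {math.log2(16):.3f} bits/char")
```

Both estimates should land very close to the 4-bit ceiling for a 16-symbol alphabet: knowing the previous character buys essentially nothing, because a good hash output is designed to look uniformly random. In ordinary English, by contrast, context substantially cuts the per-character uncertainty, which is exactly the kind of structure a next-token predictor learns to exploit and a hash denies it.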