text-davinci-003 completes "There are 210 byte in a kilobyte. That means there are" with "1,024" ~65.7% of the time.
text-davinci-003 completes "There are about 8x109 people on earth. This implies that there are" with "approximately 8 billion" ~57.1% of the time.

TekhneMakre (3y)

GPT gives the token "216" in the string "63 = 216" a very low probability, just as low as "215" or "217".

Replacing "63" with "62" in the prompt still gives "216" as an output with ~10% probability.

Would the tokenizer behave differently given "216" and "2^16", e.g. giving respectively the token "216" and some tokens like "**2" and "16*"? That would explain it: GPT of course knows that 216 isn't 63, but it has been forced to predict a relationship like "**2" + "16*" = "**63*".

LawrenceC (3y)

The Codex tokenizer used by the GPT-3.5 models tokenizes them differently: "216" is 1 token, "2^16" is 3 ("2", "^", "16"). Note that " 216" (with a space) is a different token, and it's what text-davinci-003 actually really wants to predict (you'll often see 100x probability ratios between these two tokens).

Here's the log probs of the two sequences using Adam's prompt above, with the trailing space removed (which is what he did in his actual setup, otherwise you get different probabilities):
2 16 -> -15.91

2^16 -> -1.34
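
To put those two numbers on a probability scale, here is a minimal sketch that converts the log probs into a probability ratio. It assumes the values are natural-log probabilities (as the OpenAI completions API reports them); the variable and function names are illustrative only and do not come from the original setup.

```python
import math

# Log probs quoted above; assumed to be natural logs.
logprob_space = -15.91  # the "2 16" sequence
logprob_caret = -1.34   # the "2^16" sequence


def prob_ratio(lp_a: float, lp_b: float) -> float:
    """Return how many times more likely sequence a is than sequence b."""
    return math.exp(lp_a - lp_b)


# Absolute probability of the caret form, and its ratio to the other form.
p_caret = math.exp(logprob_caret)
ratio = prob_ratio(logprob_caret, logprob_space)

# For comparison: the log-prob gap that corresponds to a 100x ratio.
gap_100x = math.log(100)

print(f"p(2^16) ~ {p_caret:.3f}")
print(f"ratio   ~ {ratio:.2e}")
print(f"100x corresponds to a log-prob gap of ~ {gap_100x:.2f}")
```

Under that assumption, the gap of 14.57 nats works out to a ratio in the millions, far larger than the 100x gap (about 4.6 nats) mentioned for "216" versus " 216".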

LESSWRONG

A brainteaser for language models