My model of a GPT-4 without a self-model, if it were told to create a compression technique for text, is that it would copy something from its training data related to the string 'compression,' but something it couldn't actually decompress. It would take out all the vowels and spaces or something, and if you asked it in the same instance it would give you the 'decompression' by repeating back what you had just told it; but if you asked another instance, it wouldn't understand the language at all.
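
To make that concrete, here is a minimal Python sketch of the kind of 'compression' I mean (purely illustrative; the function name and examples are mine, not anything GPT-4 actually produced). Dropping vowels and whitespace is a many-to-one mapping, so a fresh instance that sees only the compressed string has no reliable way to invert it:

```python
import re

def strip_vowels_and_spaces(text: str) -> str:
    """A lossy 'compression': delete vowels and whitespace."""
    return re.sub(r"[aeiouAEIOU\s]", "", text)

# Distinct inputs collide, so there is no unique way to decompress.
print(strip_vowels_and_spaces("dear"))  # -> dr
print(strip_vowels_and_spaces("dare"))  # -> dr
print(strip_vowels_and_spaces("My model of a GPT-4 without a self-model"))
# -> MymdlfGPT-4wthtslf-mdl
```

The only way to 'decompress' is to already have the original on hand, which is exactly what the context window supplies within a single conversation.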

Am I missing something? Is there a way it could gain an understanding of its own capabilities from something in its training data?

To put it more simply: an LLM pseudo-language has never existed before and wouldn't be in the training data, so what are the odds that GPT-4 creates one on its first try that actually sort of works? Wouldn't we expect it to create a human imitation/idea of a pseudo-language that fails to actually be comprehensible to itself?

baturinsky

Apr 10, 2023

  1. It could unpack it in the same instance because the original was still in the context window.
  2. Omission of letters is commonly used in chats and was used in telegrams; many written languages did not use vowels and/or whitespace, or used hieroglyphs. So it is by no means original.
  3. GPT/Bing has some self-awareness. For example, it explicitly refers to itself as "a language model."
  1. Yes, I know? I thought this was simple enough that I didn't bother to mention it in the question? But it's pretty clearly implied in the last sentence of the first paragraph?

  2. This is a good data point.

  3. If you tell it to respond as an Oxford professor, it will say 'As an Oxford professor...'; its identity as a language model is in the background prompt and probably in the training data. But if it successfully created a pseudo-language that worked well to encode things for itself, that would indicate a deeper level of understanding of its own capabilities.