Noam Chomsky famously believes that language originated to facilitate thought and only later came to be a medium of communication. Others believe the reverse: that it originated as a facility for communication, which then turned out to facilitate thinking. The latter is certainly my view.
If that is so, then one would think that language is structured to facilitate communication. Communication is serial, one token at a time, token after token. That would imply that language is structured to facilitate next-token prediction. That would in turn imply that the relational structure of semantics would evolve to facilitate mapping between the linear structure of the language string and the multidimensional structure of meaning. You want to be able to efficiently project multidimensional semantic structure onto a string and to reconstruct multidimensional semantic structure from a string.
How are LLMs trained? By next-token prediction. That is to say, the training regime mirrors the primary communication constraint governing the structure of language. So it is with text generation as well. Language is spoken one token at a time, and LLMs likewise generate text one token at a time.
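To make the "one token at a time" point concrete, here is a minimal sketch in Python. It is a toy bigram counter, not an LLM: it assumes whitespace-separated words as tokens and predicts each token from the single preceding token, whereas a real model conditions on a long context with a neural network. The function names `train_bigram` and `generate` and the tiny corpus are made up for illustration. What it shares with the real thing is only the shape of the loop: training looks at what comes next, and generation emits one token, appends it, and repeats.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens were observed to follow it."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, max_tokens=5):
    """Emit tokens serially, each chosen from the previous token alone."""
    output = [start]
    for _ in range(max_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # no continuation was ever observed for this token
        # Greedy choice: take the most frequently observed next token.
        next_token = followers.most_common(1)[0][0]
        output.append(next_token)
    return output


# Hypothetical toy corpus, tokenized by whitespace.
corpus = "the dog chased the cat and the dog chased the ball".split()
model = train_bigram(corpus)
print(generate(model, "the", max_tokens=3))
```

Even in this crude form, the generator never sees the whole sentence it is producing; it only ever commits to the next token given what has already been emitted.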
The tasks that LLMs have trouble with, such as planning and arithmetic, ARE NOT primarily communicative in nature. They are tasks for thought, for reasoning.