This is somewhat speculative, but borne out by empirical data: LLMs have developed an analogue of human System 1 and System 2, and their version of "System 2" is, just like ours, slow, effortful, and unreliable.

This is an unusual emergent behavior, by all accounts. It is not as though there is a shortage of accurate math in the training data. Maybe distilling these particular rules from the data is harder?

Interestingly enough, augmentation works for LLMs the same way it works for humans: a math plugin such as Wolfram Alpha enhances the model's abilities temporarily, but prolonged use does not improve its "native" abilities. The moment the plugin (or a human's calculator) is disconnected, the model (or the human) reverts to its previous "self", not having learned much.
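The plugin dynamic amounts to a simple dispatch rule: while the tool is connected, arithmetic is computed exactly; the moment it is disconnected, the model is back to its unaided guess. Here is a minimal Python sketch of that rule, with a safe AST-based evaluator standing in for Wolfram Alpha; the function names and the fallback string are illustrative, not any real plugin API.

```python
# Sketch of tool augmentation: arithmetic routed to an exact external
# evaluator (a stand-in for Wolfram Alpha); with the "plugin" disconnected,
# the model falls back to its unreliable native answer.
import ast
import operator

# Whitelisted arithmetic operations the stand-in tool will evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def exact_eval(expr: str):
    """Exactly evaluate an arithmetic expression by walking its AST."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def answer(query: str, plugin_connected: bool) -> str:
    """With the plugin connected the answer is exact; without it, the model
    answers 'natively', with no guarantee of correctness."""
    if plugin_connected:
        return str(exact_eval(query))
    return "<model's best guess, often wrong for long multiplication>"
```

The point the sketch makes is that correctness lives entirely in the dispatch: nothing about calling `exact_eval` updates the fallback path, just as nothing about using a calculator improves a human's mental arithmetic.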

Maybe reasoning is harder in general? Or maybe it is an artifact of the training data? Is something like a System 2 an emergent feature? Does it mean that there is a "ceiling" of sorts for logical reasoning by the current crop of models, and we should not expect them to, say, code better than a human, write better poetry, prove theorems, or discover new laws of nature? Would it have implications for super-intelligent machines? I must admit, I am quite confused.

2 comments

This seems to be a consequence of having a large but not-actually-that-deep-in-serial-steps net trained on next token prediction of a big pile of human data. AI doesn't have to be like that - I expect something that can competently choose which cognitive strategies to execute will be much better at multiplication than a human, but it's hard to get to that kind of AI by predictive training on a big pile of human data.

I think this is the point. Existing training creates something like System 1, which now happens to match what humans find "natural". Something else is probably needed to make math "natural" for ML models.