My AI Predictions for 2027
Tiago Chamba · 12h · 10

LLMs, on the other hand, are feed-forward networks. Once an LLM commits to a path, it can't go back to a previous layer. The entire model runs once to generate a token; that token is then locked in, and the whole model runs again to generate the next token, with its intermediate states ("working memory") completely wiped. This is not a good architecture for deep thinking.
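To make the loop concrete, here is a minimal sketch of greedy autoregressive decoding using the Hugging Face transformers library (the model choice, prompt, and step count are illustrative, not anything the original comment specifies):

```python
# Minimal sketch of autoregressive decoding.
# Each step reruns the full network on the token sequence; the hidden
# activations computed inside a forward pass are discarded afterwards,
# so only the emitted tokens persist between steps.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The key insight is", return_tensors="pt").input_ids

for _ in range(20):
    # One full forward pass through every layer of the network.
    logits = model(input_ids).logits
    # Greedily pick the next token; once appended, it is "locked in".
    next_token = logits[0, -1].argmax()
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=-1)
    # The intermediate layer activations ("working memory") from this
    # pass are gone; the next iteration starts over from the tokens alone.

print(tokenizer.decode(input_ids[0]))
```

(In practice implementations cache key/value activations across steps for speed, but that cache is fixed once written; the point about committed tokens stands.)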

It might be the case that LLMs develop different cognitive strategies to cope with this, such as storing their working memory in the CoT tokens, so that the ephemeral intermediate activations aren't load-bearing. The effect would be that the LLM+CoT system acts as... whatever part of our brain explores ideas.
