It feels unnatural for LLMs/transformers to be intelligent when they can only generate one token at a time. The Idea-Gated Transformer is about letting the transformer think in terms of ideas rather than individual words. It still generates one token at a time, but a separate auxiliary head, the thinking head, plans the next ~20 tokens together (i.e., plans the next idea). As expected, this helps the model stay on topic and on track.
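For anyone who wants a concrete picture before opening the paper, here is a minimal sketch of how an auxiliary planning head could gate next-token logits. This is a simplified illustration under my own assumptions (the class/variable names, the bag-of-words plan, and the sigmoid gating are hypothetical choices for exposition, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn as nn

class IdeaGatedHead(nn.Module):
    """Sketch of a 'thinking head': from the current hidden state it
    predicts a soft bag-of-tokens plan for the next K positions (the
    'idea'), then uses that plan to gate the ordinary next-token logits.
    Hypothetical formulation, assumed for illustration only."""

    def __init__(self, d_model: int, vocab_size: int, horizon: int = 20):
        super().__init__()
        self.horizon = horizon
        # Projects the hidden state to a score per vocabulary item:
        # "does this token belong to the next idea?"
        self.plan_proj = nn.Linear(d_model, vocab_size)

    def forward(self, hidden: torch.Tensor, token_logits: torch.Tensor):
        # hidden:       (batch, d_model) last hidden state of the base model
        # token_logits: (batch, vocab)   ordinary next-token logits
        plan_logits = self.plan_proj(hidden)       # plan over the next K tokens
        gate = torch.sigmoid(plan_logits)          # soft membership in the idea
        # Down-weight tokens the plan considers off-topic.
        gated_logits = token_logits + torch.log(gate + 1e-9)
        return gated_logits, plan_logits
```

A plausible training signal for such a head would be a binary cross-entropy loss between `plan_logits` and a multi-hot vector of the tokens that actually occur in the next K positions, added to the usual next-token loss.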
Please take a read and share your feedback. Would love to hear your views.
Paper Link