Why LLMs Waste So Much Cognitive Bandwidth — and How to Fix It

by Lunarknot
3rd Jul 2025

This post was rejected for the following reason(s):

  • Insufficient Quality for AI Content. There’ve been a lot of new users coming to LessWrong recently interested in AI. To keep the site’s quality high and ensure stuff posted is interesting to the site’s users, we’re currently only accepting posts that meet a pretty high bar. 

    If you want to try again, I recommend writing something short and to the point, focusing on your strongest argument, rather than a long, comprehensive essay. (This is fairly different from common academic norms.) We get lots of AI essays/papers every day and sadly most of them don't make very clear arguments, and we don't have time to review them all thoroughly. 

    We look for good reasoning, making a new and interesting point, bringing new evidence, and/or building upon prior discussion. If you were rejected for this reason, possibly a good thing to do is read more existing material. The AI Intro Material wiki-tag is a good place, for example. 
     

  • Not concrete, technical, or novel enough: The ideas in this post aren't exactly wrong, but chatbots like ChatGPT already incorporate them at the level of detail presented here; to say something useful you'd need to get into the technical details.
     
  • Capabilities. Also, while we don't have a hard rule against it, do be aware that increasing capabilities of AI is not the main focus area of LessWrong, which is generally more worried about AI progress leading to human extinction. See https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities 

I’ve noticed a recurring pattern in my long-term usage of large language models:
The longer the conversation, the more inefficient the model becomes, not because its reasoning fails, but because of structural limits in how context and memory are handled.

While some technical research has focused on improving attention span or building memory modules (e.g., MemoryBank, Expire‑Span), this post explores the same issue from a user-centric perspective — based on lived experience rather than internal model architecture. My goal is to propose structural changes that could make language model interaction feel more natural, more adaptive, and more human-like.

1. Selective Ignoring

Humans constantly filter out irrelevant input, a survival adaptation evolved over millions of years.
LLMs, by contrast, retain every token in the context window by default, spending attention and compute on earlier parts of the conversation whether or not they are still relevant.

A model that can forget tactically — deprioritizing stale, low-salience, or outdated context — would free up cognitive bandwidth and reduce waste.
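
To make this concrete, here is a minimal sketch of what tactical forgetting might look like at the application layer: score each prior turn by a crude mix of recency and overlap with the current query, and send only the highest-scoring turns back to the model. Everything here (the `Turn` class, `salience`, `prune_context`, the decay constant) is invented for illustration; a real system would likely use embeddings or a learned salience model rather than keyword overlap.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    text: str
    age: int  # turns elapsed since this message was sent

def salience(turn: Turn, query: str, decay: float = 0.9) -> float:
    """Crude salience score: keyword overlap with the current query, decayed by age."""
    query_words = set(query.lower().split())
    turn_words = set(turn.text.lower().split())
    overlap = len(query_words & turn_words) / max(len(query_words), 1)
    return overlap * (decay ** turn.age)

def prune_context(history: list[Turn], query: str, budget: int) -> list[Turn]:
    """Keep only the `budget` most salient turns instead of the full history."""
    ranked = sorted(history, key=lambda t: salience(t, query), reverse=True)
    kept = ranked[:budget]
    # Restore chronological order (oldest first) so the model sees a coherent transcript.
    return sorted(kept, key=lambda t: t.age, reverse=True)
```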

2. Interaction Continuity

Each new session is a hard reset. Valuable patterns of reasoning, clarification, and alignment are discarded at every turn.

Persisting selected context across sessions, or allowing user-anchored memory layers (with privacy-preserving opt-in), would create a more coherent and collaborative partner. It would also cut down on redundant prompts and repeated regenerations.
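
As a rough illustration of a user-anchored, opt-in memory layer, the sketch below persists a handful of confirmed facts to local storage and renders them as a preamble for the next session. The file format and the `remember`/`recall` helpers are assumptions made for this example, not features of any existing LLM API.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # local, user-controlled storage (hypothetical)

def remember(key: str, value: str, opt_in: bool) -> None:
    """Persist a fact across sessions, but only if the user explicitly opted in."""
    if not opt_in:
        return
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    memory[key] = value
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def recall() -> str:
    """Render stored facts as a short preamble for the next session's prompt."""
    if not MEMORY_FILE.exists():
        return ""
    memory = json.loads(MEMORY_FILE.read_text())
    return "\n".join(f"- {key}: {value}" for key, value in memory.items())

# Usage: prepend recall() to the system prompt of a new session so the model starts
# from previously confirmed preferences instead of a hard reset.
remember("preferred_style", "concise, no bullet lists", opt_in=True)
print(recall())
```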

Additionally, current models struggle with multi-topic switching — a human-like conversation often moves between distinct yet related themes. When models lack an internal structure for managing topic transitions, they either reset prematurely or confuse the threads. This undermines reasoning continuity and makes long-term collaboration harder.
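
One low-tech way to avoid confusing the threads is to keep an explicit buffer per topic and build the prompt from the active buffer only. The `TopicThreads` class below is a hypothetical sketch; in practice the topic label would come from a lightweight classifier or from the user.

```python
from collections import defaultdict
from typing import Optional

class TopicThreads:
    """Hypothetical per-topic context buffers instead of one flat history."""

    def __init__(self) -> None:
        self.threads: dict[str, list[str]] = defaultdict(list)
        self.active: Optional[str] = None

    def add_turn(self, topic: str, text: str) -> None:
        """Record a turn under its topic and mark that topic as active."""
        self.threads[topic].append(text)
        self.active = topic

    def context_for_prompt(self, max_turns: int = 10) -> list[str]:
        """Build prompt context from the active thread only, so switching topics
        does not drag unrelated earlier discussion back in."""
        if self.active is None:
            return []
        return self.threads[self.active][-max_turns:]
```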

3. Predictive Emphasis

Human conversations involve constant inference: each party guesses what the other really wants to focus on.
LLMs could approximate this by prioritizing the parts of the context the user most likely intends to emphasize, inferred from short-term memory patterns, context clues, or prior user behavior.

Even basic heuristics here could improve perceived relevance and drastically lower unnecessary branching.
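
A toy version of such a heuristic: rank earlier user requests by lexical similarity to the current message and surface the best matches first, so the prompt budget goes to what the user most likely means. The function names are invented, and a real implementation would likely use embeddings instead of shared word counts.

```python
import re
from collections import Counter

def _tokens(text: str) -> Counter:
    """Lowercased word counts; a deliberately crude stand-in for real similarity."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def emphasis_order(current: str, prior_requests: list[str]) -> list[str]:
    """Rank earlier user requests by lexical overlap with the current message,
    so the most likely intended focus is surfaced first."""
    current_tokens = _tokens(current)

    def score(request: str) -> int:
        return sum((current_tokens & _tokens(request)).values())  # shared word count

    return sorted(prior_requests, key=score, reverse=True)

# Example: a new message about "timeout spikes" ranks an earlier request about
# "streaming API timeouts" above one about "logo color palette".
print(emphasis_order(
    "why are we seeing timeout spikes?",
    ["debug streaming API timeouts", "pick a logo color palette"],
))
```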

Note: my rough estimate that 30–50% of this bandwidth is wasted comes from personal experience with long dialogue sessions, especially when switching between topics or referencing earlier context.
While not empirically validated, I believe the inefficiency is broadly recognizable to anyone using these models for creative or dynamic tasks such as writing, coding, or image/video generation.

A truly intelligent system isn't the one that remembers everything;
it's the one that knows what to ignore, when to forget, and how to prioritize what the human really meant.

 

This post was written by a real person, based on actual usage patterns and frustrations. Feedback welcome.