A smart human-like mind looking at all these pictures would (I claim) assemble them all into one big map of the world, like the original, either physically or mentally.
On my model, humans are pretty inconsistent about doing this.
I think humans tend to build up many separate domains of knowledge and then rarely compare them, and even believe opposite heuristics by selectively remembering whichever one agrees with their current conclusion.
For example, I once had a conversation about a video game where someone said you should build X "as soon as possible", and then later in the conversation they posted their full build priority order and X was nearly at the bottom.
In another game, I once noticed that I had a presumption that +X food and +X industry are probably roughly equally good, and also a presumption that +Y% food and +Y% industry are probably roughly equally good, but that these presumptions were contradictory at typical food and industry levels (because +10% industry might end up being about 5 industry, but +10% food might end up being more like 0.5 food). I played for dozens of hours before realizing this.
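To make the arithmetic explicit, here's a toy calculation; the base amounts (50 industry, 5 food) are numbers I'm making up just to match the rough ratio described above:

```python
# Toy base amounts for one city (made up, but matching the ~10:1 ratio above).
industry = 50.0  # industry per turn
food = 5.0       # surplus food per turn

flat = 5.0       # heuristic 1: +5 food is worth about as much as +5 industry
pct = 0.10       # heuristic 2: +10% food is worth about as much as +10% industry

print(pct * industry)  # 5.0  -> +10% industry is comparable to the flat bonus
print(pct * food)      # 0.5  -> +10% food is ~10x smaller than the flat bonus

# If heuristic 1 is right (flat food ~ flat industry), heuristic 2 can't also be
# right here: +10% of each yields amounts that differ by roughly a factor of ten.
```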
Does this cash out into concrete predictions about tasks which you expect LLMs to make little progress on in the future?
A very literal eval your post suggests: take two maps or images with a similar style but different global structure, cut them into little square sections, and ask a model to partition the pieces from both puzzles into two coherent wholes. I expect LLMs to be really bad at this task right now, but they're very bad at vision in general, so "true understanding" isn't really the bottleneck IMO.
But one could do a similar test for text-based data; eg one could ask a model to reconstruct two math proofs with shared variable names based on an unordered list of the individual sentences in each proof. Is this the kind of thing you expect models to make unusually little progress on relative to other tasks of similar time horizon? (I might be down to bet on something like this, though I think it'll be tricky to operationalize something crisply enough.)
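To sketch what I have in mind for the text version (the sentence splitting, prompt wording, and scoring below are all placeholders I'm making up, and splitting on periods is obviously naive):

```python
import random

def make_partition_eval(proof_a: str, proof_b: str, seed: int = 0) -> dict:
    """Build one eval item: shuffle the sentences of two proofs together and
    keep an answer key recording which proof each sentence came from."""
    sentences = [(s.strip() + ".", "A") for s in proof_a.split(".") if s.strip()]
    sentences += [(s.strip() + ".", "B") for s in proof_b.split(".") if s.strip()]
    random.Random(seed).shuffle(sentences)
    prompt = (
        "The sentences below come from two different proofs, shuffled together.\n"
        "Label each sentence 'A' or 'B' so that each label forms one coherent proof:\n"
        + "\n".join(f"{i + 1}. {text}" for i, (text, _) in enumerate(sentences))
    )
    answer_key = [label for _, label in sentences]
    return {"prompt": prompt, "answer_key": answer_key}

def score(predicted_labels: list[str], answer_key: list[str]) -> float:
    """The label names are arbitrary, so score against both labelings and keep the better one."""
    hits = sum(p == a for p, a in zip(predicted_labels, answer_key))
    return max(hits, len(answer_key) - hits) / len(answer_key)
```

This only scores the partition, not the sentence ordering; the hard part is probably picking proof pairs whose sentences can't be separated by surface features alone, so that solving the task actually requires tracking two coherent global structures.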
My counterargument, if I’m trying to play devil’s advocate, is that humans seem to notice this sort of thing in an online way. We don’t need to grow a 3x larger brain in order to notice and fix inconsistencies.
Having spent a lot of time attempting to explain things to my young children, I'm far from certain this is the case. No matter how many times we explain the difference between a city and a country, or between going somewhere by car versus by plane, my 3-year-old will ask us right after a five-hour flight whether we're close to my work, which we usually get to by a one-hour train ride.
My 5-year-old groks all this intuitively, but there's very little point explaining it to the 3-year-old (even though she talks beautifully and can understand all the sentences we say as standalone facts). At some point her brain will grow more sophisticated and she'll grok all of this too.
I observed the same process with puzzles. No matter how many times I point out what corner pieces and edge pieces are, they simply cannot work out that a corner piece has to be next to an edge piece until they're about 3. No amount of explaining or examples will help.
Disagree with these. Humans don't automatically make all the facts in their heads cohere. I think it's plausible that LLMs are worse than humans at doing this, but that seems insufficient for making a discrete demarcation. For example:
and then make some totally different and incompatible assumption about the symbol later on in the proof, as though it means something totally different.
This happens pretty often with humans, actually? Like, one of the most common ways people (compsci undergrads and professional mathematicians alike) make errors in proofs is something like:
Now we've proven that if object G has properties A, B, C, D, then G also has property P ... [steps] ... And as we know, object G' has properties A, B, C, therefore it has property P...
I agree that there are ways in which LLMs' understanding is shallower than humans', but from my PoV, a lot of that impression comes from…
Disagree with these. Humans don't automatically make all the facts in their head cohere.
Hm, do you see the OP as arguing that it happens "automatically"? My reading was more like that it happens "eventually, if motivated to figure it out" and that we don't know how to "motivate" LLMs to be good at this in an efficient way (yet).
people (compsci undergrads and professional mathematicians alike) make errors in proofs
Sure, and would you hire those people and rely on them to do a good job BEFORE they learn better?
I’ve been working towards automated research (for safety) for a long time. After a ton of reflection and building in this direction, I’ve landed on a similar opinion as presented in this post.
I think LLM scaffolds will solve some problems, but I think they will be limited in ways that make it hard to solve incredibly hard problems. You can claim that LLMs can just use a scratchpad as a form of continual online learning, but it feels like this will hit limits: information loss and the difficulty of internalizing new information feel like bottlenecks.
Scale will help, but it's unclear how far that will go, and it's clearly not economical.
That said, I still think automated research for safety is underinvested.
For example, when tasked with proving mathematical claims, a common pattern I’ve noticed from LLMs is that they’ll define a symbol to mean one thing… and then make some totally different and incompatible assumption about the symbol later on in the proof, as though it means something totally different.
This happened to me: an LLM used "countable sequence" to mean both (1) a sequence of countable length and (2) a sequence containing countable values, without saying either of these explicitly. It designed a page-length proof, which proved the theorem by secretly switching between the two meanings in the middle.
I would strongly prefer it to say "I don't know" rather than give me false hope and let me waste time debugging the proof, but I guess that's not what gets rewarded during reinforcement learning.
Stepping back to the meta level (the OP seems fine), I worry that you fail to utilize LLMs.
"There is are ways in which John could use LLMs that would be useful in significant ways, that he currently isn't using, because he doesn't know how to do it. Worse he doesn't even know these exist."
I am not confident this statement is true, but based on things you say, and based on how useful I find LLMs, I intuit there is a significant chance it is true.
Whether or not the statement is true doesn't really matter, if the following is true: "John never seriously sat down for 2 hours and really tried to figure out how to utilize LLMs fully."
E.g. I expect that when you had the problem of the LLM reusing symbols randomly, you didn't go: "Ok, how could I prevent this from happening? Maybe I could create an append-only text pad in which the LLM records all definitions and descriptions of each symbol, and have this text pad always be appended to the prompt. And then I could have the LLM verify that the current response has not violated the pad's contents, and that no duplicate definitions have been added to the pad."
Maybe this would resolve the issue; probably not, based on priors. But it seems important to think about this kind of thing (and to think for longer so that you get multiple ideas, of which one might work, and ideally to first focus on building a mechanistic model of why the error is happening in the first place, which lets you come up with better interventions).
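To make that concrete, here is a minimal sketch of the pad idea, assuming a generic `call_llm` stand-in for whatever model API is in use; the class, prompt wording, and retry loop are all mine:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whatever model/API is actually being used."""
    raise NotImplementedError

class DefinitionsPad:
    """Append-only record of symbol definitions, re-sent with every prompt."""

    def __init__(self) -> None:
        self.entries: list[str] = []  # e.g. "G: a finite group of order n"

    def add(self, symbol: str, definition: str) -> None:
        if any(e.startswith(symbol + ":") for e in self.entries):
            raise ValueError(f"{symbol!r} is already defined; pick a fresh symbol")
        self.entries.append(f"{symbol}: {definition}")

    def render(self) -> str:
        return ("Definitions so far (never reuse or silently redefine these symbols):\n"
                + "\n".join(self.entries))

def attempt_proof(pad: DefinitionsPad, task: str, max_tries: int = 3) -> str:
    """Draft a proof, then have the model check its own draft against the pad."""
    feedback = ""
    for _ in range(max_tries):
        draft = call_llm(f"{pad.render()}\n\nTask: {task}\n{feedback}")
        verdict = call_llm(
            f"{pad.render()}\n\nDraft:\n{draft}\n\n"
            "Does the draft use any symbol inconsistently with the definitions "
            "above, or silently redefine one? Answer 'OK' or list the violations."
        )
        if verdict.strip().startswith("OK"):
            return draft
        feedback = f"Your previous attempt had these symbol problems: {verdict}"
    return draft
```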
I somewhat agree with your description of how LLMs seem to think, but I don't think it explains a general limitation of LLMs, because the patterns you describe don't seem to me to be a good description of how humans think in general. Ever since The Cognitive Science of Rationality, it has been discussed here that humans usually do not integrate their understanding into a single, coherent map of the world. Humans instead build and maintain many partial, overlapping, and sometimes contradictory maps that only appear unified. Isn't that the whole point of Heuristics & Biases? I don't doubt that the process you describe exists, or that it is behind the heights of human reasoning, but it doesn't seem to be the basis of the main body of "reasoning" out there on the internet on which LLMs are trained. Maybe they just imitate that? Or at least, they will have a lot of trouble imitating human thinking while still building a coherent picture underneath it.
I think "understanding" in humans is an active process that demands cognitive skills we develop with continuous learning. I think you're right that LLMs are missing "the big picture" and organizing their local concepts to be consistent with it. I don't think humans do this automatically (per Dweomite's comment on this post), but that we need to learn skills to do it. I think this a lot of what LLMs are missing (TsviBT's "dark matter of intelligence").
I wrote about this in Sapience, understanding, and "AGI" but I wasn't satisfied and it's out of date. This is an attempt to do a better and briefer explanation, as a sort of run-up to doing an updated post.
We've learned skills for thought management/metacognition/executive function. They're habits, not beliefs (episodic memories or declarative knowledge), so they're not obvious to us. We develop "understanding" by using those skills to metaphorically turn concepts over in our minds: actively comparing them to memories of data and to other beliefs, which checks their consistency with other things we know. Learning from these investigations improves our future understanding of that concept, and our skills for understanding others.
What LLMs are missing relative to humans is profound right now, but it may be all too easy to add well enough to get takeover-capable AGI. Among other things (below), they're missing cognitive skills that aren't well described in the text training set, but that may be pretty easy to learn with a system-2-type approach that can be "habitized" with continuous learning. This might be as easy as a little fine-tuning, if the interference problem is adequately solved - and what counts as adequate might not be a high bar. Fine-tuning already adds this type of skill, but it seems to produce too much interference to keep going. And I don't know of a full self-teaching loop, although there is constant progress on most or all of the components needed to build one.
There may be other routes to filling in that missing executive function and active processing for human-like understanding.
This is why I'm terrified of short timelines while most people have slightly longer timelines at this point.
I've been thinking about this a lot in light of the excellent critiques of LLM thinking over the last year. My background is "computational cognitive neuroscience," so comparing LLMs to humans is my main tool for alignment thinking.
When I was just getting acquainted with LLMs in early 2023, my answers were that they're missing episodic memory (for "snapshot" continuous learning) and "executive function", a vague term that I'm now thinking is mostly skills for managing cognition. I wrote about this in Capabilities and alignment of LLM cognitive architectures in early 2023. If you can overlook my focus on scaffolding, I think it stands up as a partial analysis of what LLMs are missing and the emergent/synergistic/multiplicative advantages of adding those things.
But it's incomplete. I didn't emphasize continuous skill learning there, but I now think it's pretty crucial for how humans develop executive function and therefore understanding. I don't see a better way to give it to agentic LLMs. RL on tasks could do it, but that has a data problem if it's not self-directed like human learning is. But there might be other solutions.
I think this is important to figure out. It's pretty crucial for both timelines and alignment strategy.
Two more passes of my own:
Sprinting—Usain Bolt is a world-class sprinter, but does he know the underlying physics behind sprinting? No. What he has is his genes and the muscle memory that resulted from years of training his form. The fact that he doesn't know the physics implies what I might call a ghost that's acting when Usain sprints. It's a ghost because there is no knowledge of physics in there, just neural firing patterns, remnants, that imply lots of training in the past. Now if you were to capture his sprinting with a camera, and feed those pixels to a biomechanist to be interpreted, only then is there a deeper understanding present. The biomechanist can look at those pixels and gain a deeper understanding that generalizes further.
Forecasting—Suppose that you have no prior knowledge of physics and you're forecasting the result of a collision between two objects inside some predefined volume. (So there's no influence from anything outside.) If all the starting examples you are given contain air, you might naively predict that there would be noise in a vacuum, but you would be wrong. In order to generalize further, you need a deeper understanding, a model of physics with more gears.[1]
All that is to say, at least at the moment, I have longer timelines than 2028. I don't think LLMs are capable of kicking off real RSI, where they improve themselves over and over again. A hint here is that you get better results with tokens than with letters. That implies that they are mostly just ghosts combining tokens into interesting patterns that imply lots of training in the past. But since the output is tokens, it can easily be interpreted by a human, and that is where the real understanding lies.
See Hofstadter, Gödel, Escher, Bach, 82. ↩︎
>It would, for instance, never look at the big map and hypothesize continental drift.
Millions of humans must have looked at relatively accurate maps of the globe without hypothesizing continental drift. A large number must have also possessed sufficient background knowledge of volcanism, tectonic activity etc to have had the potential to connect the dots.
Even with evolution, centuries or millennia passed between the widespread understanding and application of selective breeding and Darwin/Wallace making the seemingly obvious connection that the same selection pressure on phenotype and genotype could play out in the wild. Human history is littered with low-hanging fruit, as well as with discoveries that seem unlikely to have been made without multiple intermediate discoveries.
I believe it was Gwern who suggested that future architectures or training programs might have LLMs "dream" and attempt to draw connections between separate domains of their training data. In the absence of such efforts, I doubt we can make categorical claims that LLMs are incapable of coming up with truly novel hypotheses or paradigms. And even if they did, would we recognize it? Would they be capable of, or even allowed to follow up on them?
Edit: Even in something as restricted as artistic "style", Gwern raised the very important question of whether a truly innovative leap by an image model would be recognized as such (assuming it would be if a human artist made it) or dismissed as weird/erroneous. The old DeepDream images were visually distinct from previous human output, yet I can't recall anyone endorsing them as an AI-invented style.
I personally, as a child, looked at a map of the world and went "huh, it sure looks like these continents over here kinda fit in over there, maybe they moved?", before I had learned of continental drift.
(For some reason I remember the occasion quite well, like I remember the spot where I was sitting at the time.)
Gwern's essay you mentioned, in case others are curious: https://gwern.net/ai-daydreaming
Despite impressive capabilities, large language models have yet to produce a genuine breakthrough. The puzzle is why.
A reason may be that they lack some fundamental aspects of human thought: they are frozen, unable to learn from experience, and they have no “default mode” for background processing, a source of spontaneous human insight.
To illustrate the issue, I describe such insights, and give an example concrete algorithm of a day-dreaming loop (DDL): a background process that continuously samples pairs of concepts from memory. A generator model explores non-obvious links between them, and a critic model filters the results for genuinely valuable ideas. These discoveries are fed back into the system’s memory, creating a compounding feedback loop where new ideas themselves become seeds for future combinations.
The cost of this process—a “daydreaming tax”—would be substantial, given the low hit rate for truly novel connections. This expense, however, may be the necessary price for innovation. It would also create a moat against model distillation, as valuable insights emerge from the combinations no one would know to ask for.
The strategic implication is counterintuitive: to make AI cheaper and faster for end users, we might first need to build systems that spend most of their compute on this “wasteful” background search. This suggests a future where expensive, daydreaming AIs are used primarily to generate proprietary training data for the next generation of efficient models, offering a path around the looming data wall.
I'd also highlight the obstacles and implications sections:
Obstacles and Open Questions
…Just expensive. We could ballpark it as <20:1 based on the human example, as an upper bound, which would have severe implications for LLM-based research—a good LLM solution might be 2 OOMs more expensive than the LLM itself per task. Obvious optimizations like load shifting to the cheapest electricity region or running batch jobs can reduce the cost, but not by that much.
Cheap, good, fast: pick 2. So LLMs may gain a lot of their economic efficiency over humans by making a severe tradeoff, in avoiding generating novelty or being long-duration agents. And if this is the case, few users will want to pay 20× more for their LLM uses, just because once in a while there may be a novel insight.
This will be especially true if there is no way to narrow down the retrieved facts to ‘just’ the user-relevant ones to save compute; it may be that the most far-flung and low-prior connections are the important ones, and so there is no easy way to improve, no matter how annoyed the user is at receiving random puns or interesting facts about the CIA faking vampire attacks.
Implications
Only power-users, researchers, or autonomous agents will want to pay the ‘daydreaming tax’ (either in the form of higher upfront capital cost of training, or in paying for online daydreaming to specialize to the current problem for the asymptotic scaling improvements, see AI researcher Andy Jones 2021).
Data moat. So this might become a major form of RL scaling, with billions of dollars of compute going into ‘daydreaming AIs’, to avoid the “data wall” and create proprietary training data for the next generation of small cheap LLMs. (And it is those which are served directly to most paying users, with the most expensive tiers reserved for the most valuable purposes, like R&D.) These daydreams serve as an interesting moat against naive data distillation from API transcripts and cheap cloning of frontier models—that kind of distillation works only for things that you know to ask about, but the point here is that you don’t know what to ask about. (And if you did, it wouldn’t be important to use any API, either.)
Given RL scaling laws and rising capital investments, it may be that LLMs will need to become slow & expensive so they can be fast & cheap.
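For what it's worth, the core loop described there is simple to sketch; the generate/judge calls and the 0.9 keep-threshold below are placeholders of mine, not anything specified in the essay:

```python
import random

def generate(prompt: str) -> str:
    """Placeholder for a call to the generator model."""
    raise NotImplementedError

def judge(prompt: str) -> float:
    """Placeholder for a call to the critic model, returning a 0-1 value score."""
    raise NotImplementedError

def daydream(memory: list[str], steps: int, keep_threshold: float = 0.9) -> list[str]:
    """Background loop: sample pairs of stored concepts, look for non-obvious
    links between them, keep only the ideas the critic rates highly, and feed
    those back into memory so they can seed future combinations."""
    for _ in range(steps):
        a, b = random.sample(memory, 2)
        idea = generate(
            "Consider these two items together:\n"
            f"1. {a}\n2. {b}\n"
            "Is there a non-obvious connection, analogy, or implication linking them?"
        )
        value = judge(f"Rate from 0 to 1 how novel and useful this idea is:\n{idea}")
        if value >= keep_threshold:  # most samples are discarded
            memory.append(idea)
    return memory
```

Nearly all of the generator/critic calls produce ideas that get thrown away, which is where the "daydreaming tax" shows up.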
First off, TFTP. I marked some stuff I thought was most relevant. This is helping remind me of some things I think about LLM confabulation and lack of binding/reasoning... I don't have my thoughts fully formed but there's something here about global inconsistency despite local compatibility, and how that cashes out in Problems. Something a little like an inability to define a sheaf, or homology detection, or something like that? I might say more better words later about it.
When I put on my LLM skeptic hat, sometimes I think things like “LLMs don’t really understand what they’re saying”. What do I even mean by that? What’s my mental model for what is and isn’t going on inside LLMs minds?
First and foremost: the phenomenon precedes the model. That is, when interacting with LLMs, it sure feels like there’s something systematically missing which one could reasonably call “understanding”. I’m going to articulate some mental models below, but even if I imagine all those mental models are wrong, there’s still this feeling that LLMs are missing something and I’m not quite sure what it is.
That said, I do have some intuitions and mental models for what the missing thing looks like. So I’ll run the question by my intuitions a few times, and try to articulate those models.
First Pass: A Bag Of Map-Pieces
Imagine taking a map of the world, then taking a bunch of pictures of little pieces of the map - e.g. one picture might be around the state of Rhode Island, another might be a patch of Pacific Ocean, etc. Then we put all the pictures in a bag, and forget about the original map.
A smart human-like mind looking at all these pictures would (I claim) assemble them all into one big map of the world, like the original, either physically or mentally.
An LLM-like mind (I claim while wearing my skeptic hat) doesn't do that. It just has the big bag of disconnected pictures. Sometimes it can chain together three or four pictures to answer a question, but anything which requires information spread across too many different pictures is beyond the LLM-like mind. It would, for instance, never look at the big map and hypothesize continental drift. It would never notice if there's a topological inconsistency making it impossible to assemble the pictures into one big map.
Second Pass: Consistent Domains
Starting from the map-in-a-bag picture, the next thing which feels like it’s missing is something about inconsistency.
For example, when tasked with proving mathematical claims, a common pattern I’ve noticed from LLMs is that they’ll define a symbol to mean one thing… and then make some totally different and incompatible assumption about the symbol later on in the proof, as though it means something totally different.
Bringing back the map-in-a-bag picture: rather than a geographical map, imagine lots of little pictures of a crystal, taken under an electron microscope. As with the map, we throw all the pictures in a bag. A human-like mind would try to assemble the whole thing into a globally-consistent picture of the whole crystal. An LLM-like mind will kinda… lay out a few pieces of the picture in one little consistent pattern, and then separately lay out a few pieces of the picture in another little consistent pattern, but at some point as it’s building out the two chunks they run into each other (like different crystal domains, but the inconsistency is in the map rather than the territory). And then the LLM just forges ahead without doing big global rearrangements to make the whole thing consistent.
That’s the mental picture I associate with the behavior of LLMs in proofs, where they’ll use a symbol to mean one thing in one section of the proof, but then use it in a totally different and incompatible way in another section.
Third Pass: Aphantasia
What’s the next thing which feels like it’s missing?
Again thinking about mathematical proofs… the ideal way I write a proof is to start with an intuitive story/picture for why the thing is true, and then translate that story/picture into math and check that all the pieces follow as my intuition expects.[1]
Coming back to the map analogy: if I were drawing a map, I’d start with this big picture in my head of the whole thing, and then start filling in pieces. The whole thing would end up internally consistent by default, because I drew each piece to match the pre-existing picture in my head. Insofar as I draw different little pieces in a way that doesn’t add up to a consistent big picture, that’s pretty strong evidence that I wasn’t just drawing out a pre-existing picture from my head.
I’d weakly guess that aphantasia induces this sort of problem: an aphantasic, asked to draw a bunch of little pictures of different parts of an object or animal or something, would end up drawing little pictures which don’t align with each other, don’t combine into one consistent picture of the object or animal.
That’s what LLMs (and image generators) feel like. It feels like they have a bunch of little chunks which they kinda stitch together but not always consistently. That, in turn, is pretty strong evidence that they’re not just transcribing a single pre-existing picture or proof or whatever which is already “in their head”. In that sense, it seems like they lack a unified mental model.
Fourth Pass: Noticing And Improving
A last piece: it does seem like, as LLMs scale, they are able to assemble bigger and bigger consistent chunks. So do they end up working like human minds as they get big?
Maybe, and I think that’s a pretty decent argument, though the scaling rate seems pretty painful.
My counterargument, if I’m trying to play devil’s advocate, is that humans seem to notice this sort of thing in an online way. We don’t need to grow a 3x larger brain in order to notice and fix inconsistencies. Though frankly, I’m not that confident in that claim.
I don’t always achieve that ideal; sometimes back-and-forth between intuition and math is needed to flesh out the story and proof at the same time, which is what most of our meaty research looks like.