Epistemic status: speculating on what other people mean when they make certain claims, without having done any checking to validate. Probably (mostly) true anyway.

Thought prompted by Tsvi's latest post: most discussions of whether LLMs "understand" things seem to conflate a few claims under the term "understand":

  1. the weights encode an algorithmic representation of the "thing" in question
  2. the weights encode some kind of much more general reasoning ability, which is sufficiently capable that it can model a large number of not-obviously-related abstractions, including the one in question
  3. #2, but in a way which also includes some kind of map-territory distinction
  4. #2, but in a way which resembles human "system 2" thinking, where it's capable of doing the laborious explicit reasoning people sometimes do when other options aren't good enough

There are probably other things meant by the claim that I haven't considered.

I suspect that some people who say that LLMs don't understand things are thinking of 3 or 4, maybe 2.  1 seems pretty obviously true (we have numerous examples from interpretability work: modular arithmetic, Othello board states, mazes, etc.).  I wouldn't bet against 2; evidence in favor includes some speculation[1] that one reason GPT-series models are so much better than everything else is the substantial amount of code in their training data[2].  I don't think 3 or 4 have happened yet.
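To make claim 1 concrete: the usual evidence is a probing experiment, where you train a simple classifier on a model's internal activations and check whether some task-relevant variable (a board square's state, a residue mod p) can be read off. Below is a minimal sketch of that setup; the activations here are random placeholders rather than outputs of a real trained model, so the probe should land near chance, whereas in the actual Othello/modular-arithmetic results it does far better.

```python
# Minimal sketch of a linear probing experiment (claim 1).
# Assumption: `hidden_states` stands in for activations extracted from a
# trained transformer; here they are just random arrays for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_examples, hidden_dim = 2000, 256
hidden_states = rng.normal(size=(n_examples, hidden_dim))  # placeholder activations
labels = rng.integers(0, 3, size=n_examples)  # e.g. empty/black/white for one Othello square

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# With real activations from a model that encodes the variable, held-out
# accuracy is far above chance; with random activations it hovers near 1/3.
print("probe accuracy:", probe.score(X_test, y_test))
```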

Once one's gotten past the linguistic and philosophical confusions, such arguments could in principle be resolved with empiricism.  In practice, most such arguments probably aren't worth the time.

  1. ^

    I don't remember where exactly I saw this claim.  If anybody has a source I'd appreciate it.

  2. ^

    The causal chain is something like "code is more structured than natural language, and that structure includes a lot of very powerful, generalizable insights, useful across many domains".  I'm not sure if this was spelled out in the source or if that was me filling in the blanks.


1 comment:

#3 is a gradual property (self-awareness is one manifestation of it), and it has already started to emerge (cf. "Discovering Language Model Behaviors with Model-Written Evaluations"). There's no reason to think that LLMs are somehow ontologically precluded from developing this meta-awareness to a very advanced degree.

#4, albeit not fully embodied by a "pure" LLM itself, is emulated via Tree-of-Thought, reasoning verification, etc.
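For readers unfamiliar with the pattern the comment references: tree-of-thought methods wrap an LLM in an explicit search loop, which is what makes the process look like deliberate "system 2" reasoning. Here is a minimal sketch under stated assumptions; `propose` and `score` are hypothetical stand-ins for LLM calls (sampling candidate reasoning steps and self-evaluating them), not any library's actual API.

```python
# Minimal sketch of a tree-of-thought style search loop.
# Assumption: propose() and score() are placeholders for LLM calls.
from typing import List, Tuple


def propose(state: str, n: int) -> List[str]:
    # Stand-in: a real system would sample n candidate next reasoning
    # steps from the LLM, conditioned on the partial solution `state`.
    return [f"{state} -> step{i}" for i in range(n)]


def score(state: str) -> float:
    # Stand-in: a real system would ask the LLM (or a verifier)
    # how promising this partial solution looks.
    return float(len(state) % 7)


def tree_of_thought(problem: str, depth: int = 3, breadth: int = 3, keep: int = 2) -> str:
    frontier: List[Tuple[float, str]] = [(score(problem), problem)]
    for _ in range(depth):
        candidates = [
            (score(child), child)
            for _, state in frontier
            for child in propose(state, breadth)
        ]
        # Beam search over partial chains of thought: keep the best few.
        frontier = sorted(candidates, key=lambda c: c[0], reverse=True)[:keep]
    return frontier[0][1]


print(tree_of_thought("Make 24 from 4, 9, 10, 13"))
```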