jade

ML/Privacy/Security researcher

Pronouns: they/them or she/her

Comments

jade · 1y · 1 · 0

Thanks! I had actually skimmed this recently but forgot to add it to my reading list. The cherry-picked examples for text generation seem a bit low-information, but it would be interesting to see their technique applied to a larger model.

jade · 1y · 3 · 0

Hugging Face has a nice guide that covers the popular approaches to text generation circa 2020. I recently read about tail free sampling as well; a sketch of it is below. I'm sure other techniques have been developed since then, though I'm not immersed enough in the NLP state of the art to be aware of them.
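For the curious, here's a minimal NumPy sketch of tail free sampling as I understand it (the function name, the `z=0.95` threshold, and the `+ 2` index offset are my own illustrative choices, not taken from any particular library):

```python
import numpy as np

def tail_free_sample(logits, z=0.95, rng=None):
    """Sketch of tail free sampling: truncate the low-probability "tail"
    where the curvature of the sorted-probability curve flattens out."""
    rng = rng or np.random.default_rng()

    # Numerically stable softmax over the vocabulary.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Sort tokens by probability, descending.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]

    # Absolute second discrete derivative of the sorted curve.
    d2 = np.abs(np.diff(sorted_probs, n=2))
    total = d2.sum()
    if total == 0.0:
        cutoff = len(sorted_probs)  # degenerate (flat) curve: keep everything
    else:
        # Keep tokens until the cumulative curvature mass reaches z;
        # + 2 roughly compensates for the two diffs shortening the array.
        cutoff = int(np.searchsorted(np.cumsum(d2 / total), z)) + 2

    keep = order[: max(cutoff, 1)]
    kept_probs = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=kept_probs)

rng = np.random.default_rng(0)
fake_logits = rng.normal(size=1000)  # stand-in for a model's next-token logits
print(tail_free_sample(fake_logits, z=0.95, rng=rng))
```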

jade · 1y · 2 · -1

For context, I have moderate experience working with LLMs, and I think this is a great summary for laypeople. I've observed humans have a general tendency to anthropomorphize behavior that seems "intelligent", and it seems more productive to resist that tendency and seek better explanations.

At the risk of becoming too technical: one topic that could help bridge the "From predictor to generator" and "A better guessing machine" sections is a bit more detail on how outputs are chosen in modern models. The greedy strategy (choose the single most likely next word) and the random strategy (sample from the distribution over next words) are mentioned, but many systems use some form of beam search or multi-token sampling, which explores multiple candidate continuations and chooses the next words according to a scoring metric; see the sketch below.
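For concreteness, here's roughly how those three strategies look with the Hugging Face `transformers` generation API (the model, prompt, and parameter values are arbitrary illustration choices, not recommendations):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The meaning of life is", return_tensors="pt")

# Greedy: always take the single most likely next token.
greedy = model.generate(**inputs, max_new_tokens=30)

# Random sampling: draw from the (here, nucleus-truncated) distribution.
sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)

# Beam search: track several candidate continuations in parallel and
# return the highest-scoring complete sequence.
beamed = model.generate(**inputs, max_new_tokens=30, num_beams=5,
                        early_stopping=True)

for out in (greedy, sampled, beamed):
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```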

Metaphorically, this behavior seems "human" -- one can imagine a writer beginning a sentence, then hastily deleting it in favor of something more coherent. But the metric for the human writer is "does this clearly communicate the idea I'm trying to convey?", while the metric for the LLM is generally some variant of "is this output statistically likely to match the training data?"