Humans Are Spiky (In an LLM World)
Assessments of "general" vs "spiky" capability profiles are secretly assessments of "matches existing infrastructure" vs "doesn't". Human societies contain human-shaped roles because humans were the only available workers for most of history, so packaging tasks into human-sized, human-shaped jobs was efficient. Given LLMs, the obvious thing to do is to try to drop them into those roles, giving them the same tools and affordances humans have. When that fails, though, we should not immediately conclude that LLMs are missing some "core of generality". When LLM agents become more abundant than humans, as seems likely in the very near term, the most effective shape for a job stops being human-shaped. At that point, we may discover that human capability profiles are the spiky ones.
To clarify (a point where my own lack of clarity also ended up confusing Claude): I expect that many low-rank fine-tunes can be well-approximated by soft prompts, and that most soft prompts can be well-approximated by a linear combination of 20-100 hard prompts. The dictionary of hard prompts that those 20-100 are drawn from would be much (much) larger than 20-100: there are `d_vocab ** len_prompt * n_prompts` possible soft prompt decompositions. Which might be useful, if ...

My modal expectation is that Claude successfully shows that my idea was bad, and why it was bad. That would still be pretty good: I have lots of long-shot ideas, but feedback from reality is usually slow and low-bandwidth.
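To make the decomposition concrete, here is a minimal sketch in numpy, with toy dimensions. Everything in it is a stand-in I made up for illustration, not anything from an actual run: the random `embed` table, the randomly sampled `hard_prompts` dictionary, and the choice of least squares as the way to find the linear combination.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions; a real model would have d_vocab ~ 50k,
# d_model ~ 4096, and a dictionary far larger than 50 candidates).
d_vocab, d_model, len_prompt, n_prompts = 1000, 64, 8, 50

# Stand-in embedding table and a soft prompt to approximate.
embed = rng.standard_normal((d_vocab, d_model))
soft_prompt = rng.standard_normal((len_prompt, d_model))

# Candidate dictionary: n_prompts hard prompts, each a sequence of token ids.
hard_prompts = rng.integers(0, d_vocab, size=(n_prompts, len_prompt))

# Embed each hard prompt and flatten it to one vector, so finding the
# best linear combination becomes an ordinary least-squares problem.
basis = embed[hard_prompts].reshape(n_prompts, -1)  # (n_prompts, len_prompt * d_model)
target = soft_prompt.reshape(-1)

# Coefficients of the best linear combination of hard prompts.
coeffs, _residuals, *_ = np.linalg.lstsq(basis.T, target, rcond=None)
approx = (coeffs @ basis).reshape(len_prompt, d_model)

rel_error = np.linalg.norm(approx - soft_prompt) / np.linalg.norm(soft_prompt)
print(f"relative reconstruction error: {rel_error:.3f}")
```

With randomly sampled hard prompts the reconstruction error will be large; the hard part of the idea is searching the enormous dictionary for the 20-100 hard prompts whose span actually contains the soft prompt, which the sketch above does not attempt.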