In the poetry case study, we had set out to show that the model didn't plan ahead, and found instead that it did.
I found it shocking that they didn't think the model plans ahead. The poetry ability of LLMs since at least GPT-2 is well beyond what feels possible without anticipating a rhyme by planning at least a handful of tokens in advance.
Yep. Meme NFTs are an existence proof of such people.
https://en.wikipedia.org/wiki/List_of_most_expensive_non-fungible_tokens
I suspect LLMs could write blogs on par with most humans if we trained and scaffolded them appropriately, but is that really what we want from LLMs?
Claude 3.7 might not write outstanding blogs, but he can help explain why not:
The fundamental mismatch between LLMs and blogging isn't primarily about capabilities, but about design and motivation:
Current LLMs are RLHF-tuned to be balanced, helpful assistants: essentially the opposite of good bloggers. Assistants hedge, acknowledge all perspectives, and avoid strong stances. Good bloggers take intellectual risks, have distinctive voices, and present unique viewpoints.
Humans blog for reasons LLMs simply don't have:
- Building intellectual reputation in a community
- Working through personal confusions
- Creative self-expression
- The social reward of changing minds
The metrics we use to evaluate LLMs (helpfulness, accuracy, harmlessness) don't capture what makes blogs compelling (novelty, intellectual risk-taking, personality).
Simply making LLMs more capable won't bridge this gap. We'd need systems with fundamentally different optimization targets: ones trained to be interesting rather than helpful, to develop consistent viewpoints rather than being balanced, and to prioritize novel insights over comprehensive coverage.
FYI, there has been even further progress with Leela odds nets. Here are some recent quotes from GM Larry Kaufman (a.k.a. Hissha) found on the Leela Chess Zero Discord:
(2025-03-04) I completed an analysis of how the Leela odds nets have performed on LiChess since the search-contempt upgrade on Feb. 27. [...] I believe these are reasonable estimates of the LiChess Blitz rating needed to break even with the bots at 5'3" in serious play. Queen and move odds (means Leela plays Black) 2400, Queen odds (Leela White) 2550, [...] Rook and move odds (Leela Black); 3000. Rook odds (Leela White) 3050, knight odds 3200. For comparison only a few top humans exceed 3000, with Magnus at 3131. So based on this, even Magnus would lose a match at 5'3" with knight odds, while perhaps the top five blitz players in the world would win a match at rook odds. Maybe about top fifty could win a match at queen for knight. At queen odds (Leela White), a "par" (FIDE 2400) IM should come out ahead, while a "par" (FIDE 2300) FM should come out behind.
(2025-03-07) Yes, there have to be limits to what is possible, but we keep blowing by what we thought those limits were! A decade ago, blitz games (3'2") were pretty even between the best engine (then Komodo) and "par" GMs at knight odds. Maybe some people imagined that some day we could push that to being even at rook odds, but if anyone had suggested queen odds that would have been taken as a joke. And yet, if we're not there already, we are closing in on it. Similarly at Classical time controls, we could barely give knight odds to players with ratings like FIDE 2100 back then, giving knight odds to "par" GMs in Classical seemed like an impossible goal. Now I think we are already there, and giving rook odds to players in Classical at least seems a realistic goal. What it means is that chess is more complicated than we thought it was.
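To make the "break even" ratings in the March 4 quote a bit more concrete, here is a minimal sketch of the standard Elo expected-score formula applied to them. This is an approximation and an assumption on my part: Lichess actually uses Glicko-2, whose expected-score formula adds a rating-deviation term, so the numbers are only indicative. A "break-even" rating is read here as the rating at which a player's expected score against the bot is 0.5.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (wins + half of draws) of player A vs. player B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Break-even Lichess blitz ratings (5'3") and Magnus's rating, taken from the quote above.
BREAK_EVEN = {
    "queen odds (Leela White)": 2550,
    "rook odds (Leela White)": 3050,
    "knight odds": 3200,
}
MAGNUS = 3131

for handicap, bot_equivalent in BREAK_EVEN.items():
    # Treat the bot at each handicap as a player rated at its break-even rating.
    score = elo_expected_score(MAGNUS, bot_equivalent)
    print(f"{handicap}: expected score per game {score:.2f}")
```

Under this model, Magnus's 69-point deficit at knight odds corresponds to an expected score of roughly 0.40 per game, which is why he would be expected to lose a match on balance rather than every game.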
I have so many mixed feelings about schooling that I'm glad I don't have my own children to worry about. There is enormous potential for improving things, yet so little of that potential gets realized.
The thing about school choice is that funding is largely zero-sum. Those with the means to choose better options than public schools take advantage of those means, leaving underfunded public schools to serve the least privileged remainder. My public school teacher friends end up with disproportionately large fractions of children with special needs who require extra care and attention, without the support to care for them effectively. As a result, all the students and all the teachers suffer. How do we do right by all these individuals? Private schools can largely choose their students and avoid accommodating these children, but public schools cannot. It reminds me of insurance companies choosing not to cover certain people, who are then left with no affordable coverage options. It's not an unsolvable problem in theory, but reality is messy and politically fraught.
How does the falling price factor into an investor's decision to enter the market? Should they wait for batteries to get even cheaper, or should they invest immediately and hope the arbitrage rates hold up long enough to provide a good return on investment? The longer the payback period, the more these dynamics matter.
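A toy payback calculation shows why a long payback period makes these dynamics matter. This is a minimal sketch under purely hypothetical assumptions: an upfront cost of 100, first-year arbitrage revenue of 20, and revenue shrinking by a fixed fraction each year as more storage enters the market and compresses price spreads. None of these numbers come from real market data.

```python
from typing import Optional


def payback_years(capex: float, first_year_revenue: float,
                  annual_revenue_decline: float, max_years: int = 30) -> Optional[int]:
    """Years until cumulative arbitrage revenue covers the upfront cost.

    Revenue shrinks by a fixed fraction each year (a hypothetical stand-in for
    arbitrage spreads compressing). Returns None if the battery never pays
    back within max_years.
    """
    cumulative = 0.0
    revenue = first_year_revenue
    for year in range(1, max_years + 1):
        cumulative += revenue
        if cumulative >= capex:
            return year
        revenue *= 1.0 - annual_revenue_decline
    return None


# Hypothetical scenarios: flat revenue, 10%/year decline, 25%/year decline.
for decline in (0.0, 0.10, 0.25):
    print(f"decline {decline:.0%}: payback {payback_years(100.0, 20.0, decline)}")
```

With these made-up inputs, flat revenue pays back in 5 years, a 10% annual decline stretches that to 7, and a 25% decline means the battery never pays back, since total lifetime revenue converges to 20 / 0.25 = 80.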
I may feel smug if the "novel idea" is basically a worse version of an existing one, but there are more interesting possibilities to probe for.
Less likely to be rounded away:
Most conceptual rounding errors will be nothing as grand as the extreme examples I gave, but there is often still something worth examining.