It is probably just a silly arbitrary codename reference to something like Altman growing strawberries at his house, who knows; but I would doubt that it refers to the counting-letters problem specifically because (1) that is due to BPE tokenization, which has way simpler solutions like byte tokenization, and it's not at all obvious how any kind of 'planning' or self-play RL breakthrough would apply to solving spelling gotcha questions; (2) I think that exact variant of the gotcha showed up after the first reporting of 'Strawberry' last year; (3) the reporting about Strawberry implied it was all about math problems like GSM8k, nothing to do with spelling; and (4) there are plenty of other things that would make a lot more sense as a reference (for example, a riff on LeCun's "cherry" - another small red fruit frequently put on top of dessert cakes).
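To make the BPE point concrete: the model never sees individual letters, only opaque multi-character tokens. A minimal sketch using the tiktoken library (the exact split depends on the encoding, so treat the chunks shown as illustrative):

```python
# Why letter-counting gotchas arise: the model operates on token IDs,
# not characters. Requires the tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era encoding
tokens = enc.encode("strawberry")
print(tokens)  # a handful of token IDs, not ten letters

for t in tokens:
    # show the byte chunk each token stands for, e.g. b'str', b'aw', b'berry'
    print(t, enc.decode_single_token_bytes(t))

# "How many r's are in strawberry?" requires reasoning across these opaque
# chunks; byte-level tokenization would instead expose one token per byte.
```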
Why?
It was already known that the AGI labs were experimenting with synthetic data and that OpenAI is training GPT-5, and the article is light on new details:
I mean, the state of affairs is worrying, to be sure, but I don't really see what in this article would prompt a meaningful update.
I also felt like this was mostly priced in, but for people who feel like they made an update, a maybe more useful prompt: I think this is a good time to ask "How could I have thought that faster?", and to think about what updates you maybe still haven't fully propagated.
I will note that I don't think we've seen this approach work any wonders yet.
(...well unless this is what's up with Sonnet 3.5 being that much better than before 🤷‍♂️)
However, OpenAI is also using the bigger version of Strawberry to generate data for training Orion, said a person with knowledge of the situation. That kind of AI-generated data is known as "synthetic." It means that Strawberry could help OpenAI overcome limitations on obtaining enough high-quality data to train new models from real-world data such as text or images pulled from the internet.
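For concreteness, a pipeline like the one described might look roughly like the following sketch; the model name, prompt, and output format here are hypothetical placeholders for illustration, not anything reported about OpenAI's actual setup:

```python
# Hypothetical sketch of a synthetic-data loop: a stronger "teacher" model
# generates worked examples that are saved as training data for a new model.
# "teacher-model" and the prompt are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()

def generate_example(topic: str) -> dict:
    resp = client.chat.completions.create(
        model="teacher-model",  # placeholder for the stronger generator
        messages=[{
            "role": "user",
            "content": f"Write a hard {topic} problem, then solve it step by step.",
        }],
    )
    return {"topic": topic, "text": resp.choices[0].message.content}

# Dump examples in JSONL, a common format for training corpora.
with open("synthetic_train.jsonl", "w") as f:
    for topic in ["algebra", "geometry", "number theory"]:
        f.write(json.dumps(generate_example(topic)) + "\n")
```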
That excerpt reminded me of these quotes/predictions from Epoch's Trading Off Compute in Training and Inference:
Meanwhile, these companies might be able to leverage additional inference compute to achieve better capabilities at a smaller scale, either for internal use or for a small number of external customers. Policy proposals which seek to control the advancement or proliferation of dangerous AI capabilities should take this possibility into account.
it seems likely that models deployed at scale will be closer to the low end of inference compute. Meanwhile, there will be substantially more capable versions of those models that use more inference compute and therefore won’t be available at scale.
AI progress might be faster than expected in some applications, at a limited scale due to higher inference costs. For example, AI companies might be able to use augmented models to speed up their own AI research.
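One standard way to "leverage additional inference compute" along these lines is best-of-n sampling: draw many candidate answers and keep the one a verifier scores highest. A minimal sketch, with `generate` and `score` as hypothetical stand-ins for a model call and a reward model:

```python
# Best-of-n sampling: spend n times the inference compute of a single
# sample to get a better expected answer. `generate` and `score` are
# hypothetical placeholders, not any specific API.
import random

def generate(prompt: str) -> str:
    # placeholder: one sampled completion from the base model
    return f"answer-{random.randint(0, 9)} to {prompt!r}"

def score(prompt: str, completion: str) -> float:
    # placeholder: a verifier or reward model rating the completion
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    samples = [generate(prompt) for _ in range(n)]
    return max(samples, key=lambda s: score(prompt, s))

print(best_of_n("prove the sum of two odd numbers is even", n=16))
```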
Should we be worried about the alignment of Strawberry itself?
If it is misaligned and is providing training data for their next-generation model, then it can poison the well, even if Strawberry itself is nowhere near TAI.
Please tell me that they have considered this...
Or that I am wrong and it's not a valid concern.
Two new The Information articles with insider information on OpenAI's next models and moves.
They are paywalled, but here are the new bits of information:
Some excerpts about these: