A few things to note:
"GPT-4's release was delayed by ~8 months because they wanted to do safety testing"
I have heard this claim before (with 6 months). It could be understood as: "GPT-4 was ready to go 6 months earlier; they simply did a lot of testing to go the extra mile."
Alternatively, this is how long it took to make the foundation model useful, and while they did spend extra resources on red teaming etc. in parallel, this didn't come at the great cost of a later release.
Are we sure they didn't just count the time to RLHF it? Seems plausible to me that it always takes ~20% of dev time to RLHF a model. (Epistemic status: spitballing.)
Some months before release, they had an RLHF-ed model that was significantly worse on most dimensions than the model they finally released. This early RLHF-ed model was mentioned in, e.g., Sparks of AGI.
Kalshi has a real-money market "ChatGPT-5 revealed" for 2023 (that I've traded). I think they wouldn't mind adding another one for 2024.
Important to note that GPT-4 is more like a 300x scale equivalent of GPT-3, not 100x, based on GPT-4 being trained with (rumored) 2e25 FLOPs vs. contemporary GPT-3-level models (LLaMA-2-7B) being trained on 8e22 FLOPs (250 times the compute for that particular pair).
The markets you linked to roughly align with my intuition that GPT-5 before 2025 is likely, although maybe not fully released publicly.
Things to keep in mind:
Other than the amount of investment in next-gen models, most of my intuition is related to the human and marketing factors involved. OpenAI won't want to lose its lead by waiting many years.
I have no special insight here but boring, cynical common sense suggests the following:
The big difference between now and the pre-ChatGPT era is that Google and a bunch of other massive competitors have woken up and want to blaze past OpenAI. For their part, OpenAI doesn't want there to be a perception that they have been overtaken, so will want to release on a fast enough schedule to be able to trump Google's latest and greatest. (Of course the arrival of something marketed as "GPT-5" tells us nothing about the true state of progress. The GPTs aren't natural kinds.)
So far, each generation of GPT has brought significant improvements. Thinking about the timeline for the next iteration, I noticed that there is a striking difference between extrapolating the past trend and what prediction markets seem to believe.
Forecasting short-term AI capabilities is important, as current trends might continue and lead us to bigger changes. Also, it is nice to have testable predictions that we can use to calibrate ourselves and figure out who is worth listening to.
Extrapolating from old GPTs
Rule 1 of forecasting: stop thinking too much and look at some historical data.
We don't know how long it took to develop each GPT, but we can look at how much time passed between each iteration.
So far, more and more time has passed between GPTs. It took 8 months from GPT-1 to GPT-2 and roughly twice as long to GPT-3. And then it took twice as long again to get to GPT-4! This means it took almost 3 years to get from GPT-3 to GPT-4.
Naively extrapolating from this, it should take until the beginning of 2029 to develop GPT-5.
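As a sanity check, here is that doubling trend computed from the approximate initial announcement dates of each model (the exact dates are assumptions based on public announcements and may be off by a few days):

```python
from datetime import date

# Approximate initial release/announcement dates (assumed from public sources)
releases = [
    ("GPT-1", date(2018, 6, 11)),
    ("GPT-2", date(2019, 2, 14)),
    ("GPT-3", date(2020, 5, 28)),
    ("GPT-4", date(2023, 3, 14)),
]

gaps = []
for (a, da), (b, db) in zip(releases, releases[1:]):
    months = (db - da).days / 30.44  # average month length
    gaps.append(months)
    print(f"{a} -> {b}: {months:.0f} months")

# Naive extrapolation: the gap roughly doubles each generation
print(f"Extrapolated GPT-4 -> GPT-5 gap: {gaps[-1] * 2:.0f} months")
# Prints roughly: 8, 15, and 34 months, then an extrapolated gap of ~67 months
```

Adding ~67 months to March 2023 lands in late 2028 / early 2029, which is where the naive extrapolation above comes from.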
Now this does seem rather far in the future. So far the trend seems surprisingly consistent, but there aren't that many data points here, and it seems like things can change somewhat fast in AI.
Prediction markets
Let’s look at what the prediction markets think:
Manifold Markets seems to think that there is a 62% chance that GPT-5 will come out before 2025.
Similarly, Metaculus puts the announcement date of GPT-5 at Sept 2024: [2]
So it seems like the forecasting sites expect a strong deviation from the historical trend.
Note that if GPT-5 takes as long as GPT-4 did, we would expect it by the end of 2025. So not only do forecasters expect that GPT-5 will not take longer than GPT-4; they seem to expect GPT-5 to be developed a whole year faster than GPT-4 was.
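To make the comparison concrete, here is the date arithmetic for both scenarios, assuming GPT-4's initial release date of 2023-03-14 and a GPT-3 to GPT-4 gap of roughly 33 months:

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Shift a date forward by a whole number of months (day clamped to the 1st)."""
    y, m = divmod(d.month - 1 + months, 12)
    return date(d.year + y, m + 1, 1)

gpt4_release = date(2023, 3, 14)

# Scenario 1: GPT-5 takes as long as GPT-4 did (~33 months)
print(add_months(gpt4_release, 33))  # 2025-12-01, i.e. end of 2025

# Scenario 2: the gap doubles again, as it has each generation (~66 months)
print(add_months(gpt4_release, 66))  # 2028-09-01, i.e. late 2028 / early 2029
```

The prediction-market consensus of "before 2025" is roughly a year earlier than even Scenario 1.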
So who is right?
Now that we have observed the seeming discrepancy, we can speculate on how to resolve it.
One approach is to say that the markets are right. Ideally they would incorporate additional information, but what could that be?
Before, only nerds and informed citizens couldn't shut up about AI. But now everyone from Snoop Dogg to world leaders to your parents talks about LLMs. Am I missing something?
Help me get this question on a real money prediction market
Metaculus and Manifold don’t use real money. Does this affect forecasting ability? In my experience it is a lot easier to gain fake internet points on Manifold than real money on e.g. Polymarket.
Real money works great to make markets more efficient.
So what I hope for is that we can create a real-money market, so that we can feel more confident in its prediction.
I have suggested the question to the Polymarket community. They have an issue with the framing: what if GPT-5 is created under a different name? (Like Windows 1, 2, 3, 95.) Surely that should still resolve YES; it should not depend on OpenAI's naming scheme.
I would love to hear if you have any ideas on how to deal with this. What would be a good definition of a "GPT-5 model" that can be assessed objectively? Has anyone written about what kind of benchmark results we might expect from GPT-5?
Also, I wonder which other real money prediction markets might consider such a market if I ask nicely?
UPDATE 21.12: Kalshi made a real-money prediction market on this! (H/T @Lech Mazur). You can bet on it if you are a US citizen. Currently, the odds seem roughly consistent with the other prediction sites.
Also, @ChristianWilliams from Metaculus was nice enough to reach out and mention that the Metaculus Prediction for this question is similar to the Community Prediction[2] and also expects GPT-5 before 2025:
To be consistent, this refers to the initial release date. GPT-2 stands out in that the full model was released 9 months after the initial release.
Note that this is the Community Prediction, which is basically a simple average of all forecasts. Don't confuse it with the (superior) Metaculus Prediction, which accounts for forecasters' track records but is harder to access.
If anyone knows the Metaculus Prediction for this question, I would be interested in hearing it.
This is reflected in a jump in OpenAI's revenue and in some economic data.