
Hide's Shortform

by Hide
30th Jun 2025
1 min read
This is a special post for quick takes by Hide. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
6 comments
Hide · 2mo

Grok 4 doesn’t appear to be a meaningful improvement over other SOTA models. Minor increases in benchmarks are likely the result of Goodharting.  

I expect that GPT-5 will be similar, and if it is, this lends greater credence to diminishing returns on RL & compute.


It appears the only way we will see continued exponential progress is with a steady stream of new paradigms like reasoning models. However, reasoning models were rather self-suggesting, low-hanging fruit, and new needle-moving ideas will become increasingly hard to come by.

As a result, I’m increasingly bearish on AGI within 5-10 years, especially as a result of merely scaling within the current paradigm.

Vladimir_Nesov · 2mo

Current AIs are trained with 2024 frontier AI compute, which is 15x the original GPT-4 compute (of 2022). The 2026 compute (that will train the models of 2027) will be 10x more than what current AIs are using, and then plausibly 2028-2029 compute will jump another 10x-15x (at which point various bottlenecks are likely to stop this process, absent AGI). We are only about a third of the way there, in orders of magnitude. So any progress or lack thereof within a short time doesn't tell us much about where this is going by 2030, even absent conceptual innovations.
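A minimal back-of-the-envelope sketch of the multipliers above (assuming the stated 15x, 10x, and 10x-15x factors, and reading "a third of the way there" in log, i.e. orders-of-magnitude, terms):

```python
# Back-of-the-envelope sketch of the compute multipliers described above,
# relative to original GPT-4 (2022) training compute.
import math

gpt4_2022 = 1.0                        # baseline: original GPT-4 compute
compute_2024 = 15 * gpt4_2022          # 2024 frontier compute (current AIs)
compute_2026 = 10 * compute_2024       # 2026 compute (trains the models of 2027)
compute_2029_low = 10 * compute_2026   # 2028-2029 compute, low end
compute_2029_high = 15 * compute_2026  # 2028-2029 compute, high end

print(f"2024:    {compute_2024:.0f}x GPT-4")
print(f"2026:    {compute_2026:.0f}x GPT-4")
print(f"2028-29: {compute_2029_low:.0f}x-{compute_2029_high:.0f}x GPT-4")

# "A third of the way there" read as progress in orders of magnitude:
progress = math.log10(compute_2024) / math.log10(compute_2029_low)
print(f"log-scale progress so far: {progress:.2f}")  # ~0.37, roughly a third
```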

Grok 4 specifically is made by xAI, which is plausibly not yet able to make use of its compute as well as the AI companies that have been at it longer (GDM, OpenAI, Anthropic). While there are some signs that it's at a new level of RLVR, even that is not necessarily the case. And it's very likely smaller than compute optimal for pretraining, even on 2024 compute.

They likely didn't have GB200 NVL72 for long enough, or in sufficient numbers, to match their pretraining compute with them alone, which means compute utilization by RLVR was worse than it will be going forward. So the effect size of RLVR will only start being visible more clearly in 2026, after enough time has passed with sufficient availability of GB200/GB300 NVL72. Though perhaps there will soon be a GPT-4.5-thinking release with a pretraining-scale amount of RLVR that will be a meaningful update.

(Incidentally, now that RLVR is plausibly catching up with pretraining in terms of GPU-time, there is a question of a compute optimal ratio between them: what portion of GPU-time should go to pretraining and what to RLVR.)

Hide · 2mo

It’s really starting to feel like AI improvement is fizzling out, and that companies are merely disguising this with elaborate products.

the gears to ascension · 2mo

Yeah, there haven't been any improvements that significantly changed how capable a model is on a hard task I need solved for, like, at least a week, maybe more /j

TimothyTV · 2mo

I'm out of the loop, can you point to an example please?

the gears to ascension · 2mo

The /j was because I haven't really kept track of how long it's been. Gemini 2.5 Pro was the last one I was somewhat impressed by. Now, like, to be clear, it's still flaky and still an LLM, still an incremental improvement, but noticeably stronger on certain kinds of math and programming tasks. Still mostly relevant when you want speed and some slop is okay.
