Nesov notes that making use of bigger models (i.e. 4T active parameters) is heavily bottlenecked on the HBM on inference chips, as is doing RL on bigger models. He expects it won't be possible to do the next huge pretraining jump (to ~30T active) until ~2029.
HBM per chip doesn't matter; it's HBM per scale-up world that does. A scale-up world is a collection of chips with sufficiently good networking between them that can be used to set up inference for large models with good utilization of the chips. For H100/H200/B200, a scale-up world is 8 chips (1 server; there are typically 4 servers per rack), for GB200/GB300 NVL72 it's 72 chips (1 rack, 140 kW), and for Rubin Ultra NVL576 it's 144 chips (also 1 rack, but 600 kW).
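To put rough numbers on it, here's a quick sketch of aggregate HBM per scale-up world; the per-chip HBM figures are my assumptions from publicly quoted specs, and the Rubin Ultra figure in particular is speculative:

```python
# Sketch: aggregate HBM per scale-up world. Per-chip capacities are assumed
# (roughly the publicly quoted figures); the Rubin Ultra per-package figure
# in particular is speculative.
scale_up_worlds = {
    # name: (chips per scale-up world, assumed HBM per chip in GB)
    "H100 8-GPU server":       (8,   80),
    "H200 8-GPU server":       (8,   141),
    "B200 8-GPU server":       (8,   192),
    "GB200 NVL72 rack":        (72,  192),
    "GB300 NVL72 rack":        (72,  288),
    "Rubin Ultra NVL576 rack": (144, 1024),  # speculative
}

for name, (chips, hbm_gb) in scale_up_worlds.items():
    print(f"{name:25} ~{chips * hbm_gb / 1000:6.1f} TB HBM per scale-up world")
```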
use of bigger models (i.e. 4T active parameters) is heavily bottlenecked on the HBM
Models don't need to fit into a single scale-up world (using a few should be fine); also, the KV cache wants at least as much memory as the model. So you are only in trouble once the model is much larger than a scale-up world, in which case you'll need so many scale-up worlds that you'll effectively be using the scale-out network for scaling up, which will likely degrade performance and make inference more expensive (compared to the magical hypothetical with larger scale-up worlds, which aren't necessarily available, so this might still be the way to go). And this is about total params, not active params. Though active params indirectly determine the size of the KV cache per user.
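As a toy illustration of when you're actually in trouble (all numbers made up), you can count how many scale-up worlds a model needs once the KV cache is budgeted to be at least as large as the weights; note that it's total params that enter the estimate:

```python
import math

# Toy estimate (made-up numbers): how many scale-up worlds does a model need,
# once the KV cache is budgeted to be at least as large as the weights?
def scale_up_worlds_needed(total_params_trillions: float,
                           bytes_per_param: float,
                           kv_to_weights_ratio: float,
                           hbm_per_world_tb: float) -> int:
    weights_tb = total_params_trillions * bytes_per_param  # 1T params at 1 byte/param ~ 1 TB
    kv_tb = weights_tb * kv_to_weights_ratio               # KV cache budget
    return math.ceil((weights_tb + kv_tb) / hbm_per_world_tb)

# e.g. a 4T-total-param model at 8-bit weights with an equal KV budget:
print(scale_up_worlds_needed(4, 1.0, 1.0, 0.64))   # ~13 H100 servers
print(scale_up_worlds_needed(4, 1.0, 1.0, 13.8))   # 1 GB200 NVL72 rack
```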
He expects it won't be possible to do the next huge pretraining jump (to ~30T active) until ~2029.
Nvidia's GPUs probably won't be able to efficiently serve inference for models with 30T total params (rather than active) until about 2029 (maybe late 2028), when enough of Rubin Ultra NVL576 is built. But gigawatts of Ironwood TPUs are being built in 2026, including for Anthropic, and these TPUs will be able to serve inference for such models (for large user bases) in late 2026 to early 2027.
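Back-of-envelope for the 30T case (8-bit weights plus a comparable KV budget; the per-rack and per-pod HBM totals below are my assumptions, the Rubin Ultra and Ironwood figures especially):

```python
# Back-of-envelope for a 30T-total-param model: ~30 TB of 8-bit weights plus a
# comparable KV-cache budget, ~60 TB in all. The HBM totals per scale-up world
# below are assumptions, not checked spec sheets.
need_tb = 30 + 30
for name, hbm_tb in [
    ("GB300 NVL72 rack (~21 TB assumed)",                    21),
    ("Rubin Ultra NVL576 rack (~147 TB speculative)",        147),
    ("Ironwood TPU pod, 9,216 chips (~1,770 TB assumed)",   1770),
]:
    print(f"{name}: {need_tb / hbm_tb:.2f}x one scale-up world")
```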
The general principle is that sufficiently smart people by default win most competitions among 100 randos that they care to enter (given sufficient training, when that's at all relevant).
To be "not-insane", you don't need rationality in this narrow sense, in most circumstances. You don't need to seek out better methods for getting things right, you just need some good-enough methods. A bit of epistemic luck could easily get you there, no need for rationality.
So the issue of behaving/thinking in an "insane" way is not centrally about lack of rationality; rationality or irrationality is not particularly relevant to the issue. Rationality would help, but there are many more things that would also help, some of them much more practical for any given object level issue. And once the issue is resolved, it doesn't follow that the attitude of aspiring to rationality was attained, or that any further seeking out of better methods/processes will be taking place.
Rationality is not correctness, not truth or effectiveness; it's narrower, a disposition towards better methods/processes that help with attaining truth or effectiveness. Keeping the intended meaning narrow when manipulating a vague concept helps with developing it further; inflating the meaning to cover ever more possibilities makes a word somewhat useless, and accessing the concept becomes less convenient.
If Omega tells you what you'll do, you can still do whatever. If you do something different, this by construction refutes the existence of the current situation where Omega made a correct prediction and communicated it correctly (your decision can determine whether the current situation is actual or counterfactual). You are in no way constrained by the existence of a prediction, or by having observed what this prediction is. Instead, it's Omega that is constrained by what your behavior is; it must obey your actions in its predictions about them. See also Transparent Newcomb's Problem.
This is clearer when you think of yourself (or of an agent) as an abstract computation rather than a physical thing, a process formally specified by a program rather than a physical computer running it. You can't change what an abstract computation does by damaging physical computers, so in any confrontation between unbounded authority and an abstract computation, the abstract computation has the final word. You can only convince an abstract computation to behave in some way according to its own nature and algorithm, and external constructions (such as Omega being omniscient, or the thought experiment being set up in a certain way) aren't going to be universally compelling to abstract algorithms.
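Here's a toy model of the Transparent Newcomb point (my own sketch, with the usual illustrative $1,000,000 / $1,000 payoffs), where Omega's prediction is computed from the agent's policy rather than the other way around:

```python
# Toy Transparent Newcomb: Omega's prediction is a function of the agent's
# policy, so the policy, not the prediction, is upstream.

def omega_fills_box(policy) -> bool:
    # Omega fills the big box iff it predicts the agent one-boxes on seeing it full.
    return policy(box_visibly_full=True) == "one-box"

def payoff(policy) -> int:
    full = omega_fills_box(policy)
    action = policy(box_visibly_full=full)
    big = 1_000_000 if full else 0
    return big if action == "one-box" else big + 1_000

one_boxer = lambda box_visibly_full: "one-box"
two_boxer = lambda box_visibly_full: "two-box"
# "Defiant" agent: would two-box if shown a full box -- so it never sees one.
defiant   = lambda box_visibly_full: "two-box" if box_visibly_full else "one-box"

for name, p in [("one-boxer", one_boxer), ("two-boxer", two_boxer), ("defiant", defiant)]:
    print(f"{name}: ${payoff(p):,}")
```

The "defiant" policy never actually faces a full box: deviating doesn't beat the prediction, it just makes that situation counterfactual.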
When you go through a textbook, there are confusions you can notice but not yet immediately resolve, and these could plausibly become RLVR tasks. To choose and formulate some puzzle as an RLVR task, the AI would need to already understand the context of that puzzle, but then training on that task makes it ready to understand more. Setting priorities for learning seems like a general skill that adapts to various situations as you learn to understand them better. As with human learning, the ordering from more familiar lessons to deeper expertise would happen naturally for AI instances as they engage in active learning about their situations.
I think the schleppy path of "learn skills by intentionally training on those specific skills" will be the main way AIs get better in the next few years.
So my point is that automating just this thing might be sufficient, and the perception of its schleppiness is exactly the claim of its generalizability. You need expertise sufficient to choose and formulate the puzzles, not yet sufficient to solve them, and this generation-verification gap keeps moving the frontier of understanding forward, step by step, but potentially indefinitely.
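A hypothetical sketch of what capturing a noticed confusion as a verifiable task could look like (the puzzle and all names here are made up for illustration): the formulator only needs enough understanding to state the puzzle and write a cheap checker, not to solve it.

```python
# Hypothetical sketch: capture a noticed confusion as a verifiable (RLVR-style)
# task. The point is that writing the checker is easier than producing the
# answer (the generation-verification gap).
import ast
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class VerifiableTask:
    prompt: str                     # the puzzle, stated from current understanding
    verify: Callable[[str], bool]   # cheap programmatic check of a proposed answer

def task_from_confusion() -> VerifiableTask:
    # e.g. while working through a linear algebra text: "what is this matrix's inverse?"
    A = np.array([[2.0, 1.0], [1.0, 1.0]])
    def verify(answer: str) -> bool:
        B = np.array(ast.literal_eval(answer))  # candidate inverse, given as a Python list
        return B.shape == A.shape and np.allclose(A @ B, np.eye(len(A)))
    return VerifiableTask(prompt=f"Give the inverse of {A.tolist()} as a Python list.",
                          verify=verify)

task = task_from_confusion()
print(task.verify("[[1.0, -1.0], [-1.0, 2.0]]"))  # True: checking beats solving
```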
AI danger is not about AI, it's about governance. A sane civilization would be able to robustly defer and then navigate AI danger when it's ready. AI is destabilizing, and while aligned AI (in a broad sense) is potentially a building block for a competent/aligned civilization (including human civilization), that's only if it's shaped/deployed in a competent/aligned way. Uploads are destabilizing in a way similar to AI (since they can be copied and scaled), even though they by construction ensure some baseline of alignment.
Intelligence amplification for biological humans (who can't be copied) seems like the only straightforward concrete plan that's not inherently destabilizing. But without highly speculative too-fast methods it needs AI danger to be deferred for a very long time, with a ban/pause that achieves escape velocity (getting stronger rather than weaker over time, for example by heavily restricting semiconductor manufacturing capabilities). This way, there is hope for a civilization that eventually gets sufficiently competent to navigate AI danger, but the premise of a civilization sufficiently competent to defer AI danger indefinitely is damning.
if your effort to constrain your future self on day one does fail, I don't think there's a reasonable decision theory that would argue you should reject the money anyway
That's one of the things motivating UDT. On day two, you still ask what global policy you should follow (one that in particular encompasses your actions in the past, and in the counterfactuals relative to what you actually observe in the current situation). Then you see where/when you actually are and what you actually observe, and enact what the best policy says to do in the current situation. You don't constrain yourself on day one, but still enact the global policy on day two.
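As a toy worked example (a counterfactual-mugging-flavored setup with made-up numbers, not the exact scenario from this exchange): you score whole policies by their ex-ante expected value, then on day two simply look up what the winning policy prescribes for what you actually observe.

```python
# Toy UDT-style choice: rank global policies by ex-ante expected value, then
# on "day two" enact what the best policy prescribes for the situation you
# actually observe. Numbers and scenario are illustrative.
def expected_value(pay_when_asked: bool) -> float:
    # Fair coin. Heads: you are asked for $100. Tails: a predictor gives you
    # $10,000 only if it predicts you'd have paid on heads.
    ev_heads = -100 if pay_when_asked else 0
    ev_tails = 10_000 if pay_when_asked else 0
    return 0.5 * ev_heads + 0.5 * ev_tails

policies = {"pay when asked": True, "refuse when asked": False}
best = max(policies, key=lambda name: expected_value(policies[name]))
print(best, expected_value(policies[best]))  # the paying policy wins ex ante

# Day two, you observe "asked to pay": you enact policies[best], i.e. you pay,
# without ever having needed to constrain yourself on day one.
```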
I think coordination problems are a lot like that. They reward you for adopting preferences genuinely at odds with those you may have later on.
Adopting preferences is a lot like enacting a policy, but when enacting a policy you don't need to adopt preferences: a policy is something external, an algorithmic action (instead of choosing Cooperate, you choose to follow some algorithm that decides what to do, even if that algorithm gets no further input). Contracts in the usual sense act like that, and assurance contracts are an example of explicitly establishing coordination. You can judge an algorithmic action like you judge an explicit action, but there are more algorithmic actions than there are explicit actions, and algorithmic actions taken by you and your opponents can themselves reason about each other, which enables coordination.
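A minimal sketch of an algorithmic action, in the simplest "cooperate with an exact copy" variant (a toy, not a general treatment of how algorithms reason about each other): instead of submitting Cooperate or Defect directly, each side submits an algorithm that gets to look at the other's algorithm.

```python
# Toy "algorithmic action": submit an algorithm that inspects the opponent's
# algorithm, rather than submitting Cooperate/Defect directly. This clique-bot
# variant only recognizes exact copies of itself.
import inspect

def clique_bot(opponent_source: str) -> str:
    # Cooperate iff the opponent is running this same algorithm.
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def defect_bot(opponent_source: str) -> str:
    return "D"

src = {f.__name__: inspect.getsource(f) for f in (clique_bot, defect_bot)}
print(clique_bot(src["clique_bot"]), clique_bot(src["defect_bot"]))  # C D
```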
AI currently lacks some crucial faculties, most obviously continual learning and higher sample efficiency (possibly merely as a measure of how well continual learning works). And these things plausibly fall under the umbrella of the more schleppy kinds of automated AI R&D, so that if the AIs learn narrow skills such as setting up appropriate RL environments (capturing lessons/puzzles from personal experiences of AI instances) and debugging training issues, that would effectively create these crucial faculties without actually needing to make deeper algorithmic progress. Like human computers in the 17th century, these AIs might end up doing manually what a better algorithm could do at a much lower level, much more efficiently. But it would still be much more effective than when it doesn't happen at all, and AI labor scales well.
This demands that others agree with you, for reasons that shouldn't compel them to agree with you (in this sentence, rhetoric alone). They don't agree; that's the current situation. Appealing to "in reality we are all sitting in the same boat" and "you in fact have as much reason as me to try to work towards a solution" should inform them that you are ignoring their point of view on what facts hold in reality, which breaks the conversation.
It would be productive to take claims like this as premises and discuss the consequences (to distinguish x-risk-in-the-mind from x-risk-in-reality). But taking disbelieved premises seriously and running with them (for non-technical topics) is not a widespread skill you can expect to often encounter in the wild, unless perhaps you've cultivated it in your acquaintances.