Vladimir_Nesov

Comments (sorted by newest)

What SB 53, California’s new AI law, does
Vladimir_Nesov · 21h

I was ineptly objecting to this snippet in particular:

If that were the only provision of the bill, then yes, that would be a problem

The problem I intended to describe in the first two comments of the thread is that this provision creates a particular harmful incentive. This incentive exists by itself, regardless of whether other things oppose it in some contexts. In those mitigated contexts the net effect of the bill could still be beneficial, but the incentive would remain (in some balance with other incentives), and it wouldn't be mitigated at all in the other contexts. It is not mitigated for podcasts and blog posts, the examples I've mentioned above, so it would still be a problem there (if my argument for it being a problem makes sense), and the way it remains a problem there is not affected by the other provisions of the bill.

So I was thinking of my argument as being about the existence of this incentive specifically, and read tlevin's snippet as missing the point, claiming that the incentive's presence depends on things that have nothing to do with the mechanism that brings it into existence. But there's also a plausible (if unintended) reading of what I was saying as an argument for the broader claim that the bill as a whole incentivises AI companies to communicate less than they currently do, because of this provision. I don't have a good enough handle on this more complicated question, so it wasn't my intent to touch on it at all (other than by providing a self-contained ingredient for considering this broader question).

But under this unintended reading, tlevin's comment is a relevant counterargument, and my inept objection to it amounts to stubborn insistence on not seeing its relevance or validity, expressed without argument. Judging by the votes, it was a plausible enough reading, and the readers are almost always right (about what the words you write down actually say, regardless of your intent).

Benito's Shortform Feed
Vladimir_Nesov · 3d

"Locally invalid" was a specific react for highlighting the part of a comment that makes a self-contained mistake, different from "Disagree". A faulty step is not centrally a "weak argument", as it's sometimes not any kind of argument. And discussion often gestures at a claim without providing any sort of evidence or giving any argument, the evidence or the argument is for the recipients to reconstruct for themselves.

What SB 53, California’s new AI law, does
Vladimir_Nesov · 3d

Whether this provision in particular is a problem doesn't depend on the presence of other provisions in the bill, even ones that would compensate for it.

What SB 53, California’s new AI law, does
Vladimir_Nesov · 3d

I wouldn't know about what works in court, but not saying anything (in interviews, posts on their site, and such) is probably even safer, unless the sky is already on fire or something. It seems to be a step in an obviously wrong direction, a friction that gets worse if the things an AI company representative would've liked to say happen to be sufficiently contrary to prevailing discourse. Like with COVID-19.

What SB 53, California’s new AI law, does
Vladimir_Nesov · 3d

Not make “any materially false or misleading statement” about catastrophic risk from its frontier models, its management of catastrophic risk, or its compliance with its frontier AI framework.

The risk of any statement being considered "materially false or misleading" is an incentive for AI companies to avoid talking about catastrophic risk.

A non-review of "If Anyone Builds It, Everyone Dies"
Vladimir_Nesov · 4d

In this framing the crux is whether there is an After at all (at any level of capability): the distinction between "failure doesn't kill the observer" (a perpetual Before) and "failure is successfully avoided" (managing to navigate the After).

Zach Stein-Perlman's Shortform
Vladimir_Nesov · 7d

My point is that the 10-30x AIs might be more effective at coordination around AI risk than humans alone, in particular more effective than currently seems feasible in the relevant timeframe (when not taking into account the use of those 10-30x AIs). Saying "labs" doesn't make this distinction explicit.

Zach Stein-Perlman's Shortform
Vladimir_Nesov · 7d

with 10-30x AIs, solving alignment takes like 1-3 years of work ... so a crucial factor is US government buy-in for nonproliferation

Those AIs might be able to lobby for nonproliferation or do things like write a better IABIED, making coordination interventions that oppose myopic racing. Directing AIs to pursue such projects could be a priority comparable to direct alignment work. Unclear how visibly asymmetric such interventions will prove to be, but then alignment vs. capabilities work might be in a similar situation.

OpenAI Shows Us The Money
Vladimir_Nesov · 8d

There doesn't necessarily need to be algorithmic progress to get there; sufficient bandwidth enables traditional pretraining across multiple sites. But it might be difficult to ensure that bandwidth is available across the geographically distributed sites on short notice, if you aren't already a well-established hyperscaler building near your older datacenter sites.

In 2028, a model targeting inference on Rubin Ultra NVL576 (150 TB of HBM in a scale-up world) might want to be a MoE model with 80 TB of total params (80T params if in FP8, 160T in FP4). If training uses the same precision for gradients, that's also 80 TB of gradients to exchange. If averaged gradients use more precision, this could be 2x-8x more data.

If training is done using 2 GW of some kind of Rubin GPUs, that's about 2e22-3e22 FP4 FLOP/s, and at 30% utilization for 4 months it produces 8e28 FP4 FLOPs. At 120 tokens/param (anchoring to 40 tokens/param for the dense Llama 3 405B and adjusting 3x for 1:8 sparsity), this system might want about 10T active params (so we get 1:16 sparsity, with 160T total FP4 params, or about 1:8 for FP8). This needs 1,200T tokens, maybe 250T unique, which is a problem, but not yet orders of magnitude beyond the pale, so probably something can still be done without needing bigger models.
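
As a rough check, here's the same estimate as a minimal Python sketch, assuming the standard C ≈ 6·N·D approximation for dense-equivalent pretraining FLOPs (an assumption not stated above) and the 1:16 sparsity figure from this paragraph:

```python
# Back-of-envelope for the 2 GW Rubin pretraining system sketched above.
# Assumes compute C ~= 6 * N_active * D (standard dense-transformer approximation).

flops_per_s = 2.5e22                 # ~2e22-3e22 FP4 FLOP/s from 2 GW of Rubin GPUs
utilization = 0.30
train_seconds = 4 * 30 * 24 * 3600   # ~4 months

compute = flops_per_s * utilization * train_seconds        # ~8e28 FP4 FLOPs
tokens_per_param = 120               # 40 (dense Llama 3 405B) times 3x for sparsity

# C = 6 * N * D with D = tokens_per_param * N  =>  N = sqrt(C / (6 * tokens_per_param))
active_params = (compute / (6 * tokens_per_param)) ** 0.5   # ~1e13, i.e. ~10T
tokens = tokens_per_param * active_params                   # ~1.2e15, i.e. ~1,200T
total_params = 16 * active_params                           # 1:16 sparsity -> ~160T
param_terabytes = total_params * 0.5 / 1e12                 # FP4 ~0.5 bytes/param -> ~80 TB

print(f"compute ~{compute:.1e} FLOPs, active ~{active_params:.1e}, "
      f"tokens ~{tokens:.1e}, total params ~{total_params:.1e} (~{param_terabytes:.0f} TB)")
```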

With large scale-up worlds, processing sequences of 32K tokens with non-CPX Rubin NVL144 at 30% utilization would take just 2.7 seconds (for pretraining). A 2 GW system has 9K racks, so that's a batch of 300M tokens, which is already a lot (Llama 3 405B used 16M token batches in the main phase of pretraining), so those 2.7 seconds should be the target characteristic time for exchanging gradients.

Moving 80 TB in 2.7 seconds needs 240 Tbps, or 500-2,000 Tbps if averaged gradients use 2x-8x more precision bits (even more if not all-to-all, which is likely with more than 2 sites), and this already loses half of utilization or asks for even larger batches. A DWDM system might transmit 30-70 Tbps over a fiber optic pair, so this is 4-70 fiber optic pairs, which seems in principle feasible to secure for overland fiber cables (which hold hundreds of pairs), especially towards the lower end of the estimate.
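
And a sketch of the bandwidth arithmetic, taking the rack count and the 2.7 s step time from above as given:

```python
# Cross-site bandwidth needed to exchange gradients within one batch's compute time.

racks = 9_000                        # ~2 GW worth of Rubin NVL144 racks
seq_tokens = 32_768
batch_tokens = racks * seq_tokens    # ~300M tokens per global batch
step_seconds = 2.7                   # per-sequence pretraining time per rack at 30% utilization

gradient_terabytes = 80              # gradients at the same precision as the params
tbps = gradient_terabytes * 8 / step_seconds                 # ~240 Tbps

print(f"batch ~{batch_tokens / 1e6:.0f}M tokens, base bandwidth ~{tbps:.0f} Tbps")
print(f"with 2x-8x gradient precision: ~{2 * tbps:.0f}-{8 * tbps:.0f} Tbps")
print(f"fiber pairs at 30-70 Tbps/pair: ~{tbps / 70:.0f} to ~{8 * tbps / 30:.0f}")
```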

OpenAI Shows Us The Money
Vladimir_Nesov · 8d

Peter Wildeford: I’d expect the 10GW OpenAI cluster becomes operational around 2027-2028.

There are no 10 GW OpenAI clusters; there is a conditional investment by Nvidia for every additional 1 GW in total across all their datacenters, which won't be forming a single training system. Inference and smaller training experiments need a lot of compute, and OpenAI is now building their own compute even for inference, so most of what they are building is not for frontier model training.

Zvi Mowshowitz: these announcements plus the original site in Abilene, Texas cover over $400 billion and 7 gigawatts over three years

Depending on OpenAI growth, this is more of a soft upper bound on what gets built. This is evidence that 5 GW training systems likely aren't going to be built by (end of) 2028. So there's going to be a slowdown compared to the trend of 12x in compute (and 6x in power) every 2 years for the largest frontier AI training systems, which held in 2022-2026[1]. In 2027-2028, the largest training systems are likely going to be merely 2 GW instead of the on-trend 5 GW. Though for the 2024 systems, FP8 is likely relevant, and for 2026 systems maybe even FP4, which turns the 12x in compute every 2 years in 2022-2026 into 24x in compute every 2 years (5x per year in pretraining-relevant raw compute for a single training system).

This 24x every 2 years is even less plausible to remain on-trend in 2028, so 2027+ is going to be the time of scaling slowdown, at least for pretraining. Though the AIs trained on 2026 compute might only come out in 2027-2028, judging by how 2024 training compute is still held back by inference capabilities, and some AIs enabled by 2024 levels of compute might only come out in 2026. So the slowdown in the scale of frontier AI training systems after 2026 might only start being observable in scaling of deployed AIs starting in 2028-2029.

Perhaps some of these sites will be connected with sufficient bandwidth, and training with RLVR at multiple sites doesn't need a lot of bandwidth anyway. Actual plans for larger training runs would push them to build larger individual sites, since that preserves optionality for unusual training processes. So the fact that this isn't happening suggests that there are no such plans for now (except perhaps for RLVR-like things specifically).


  1. 24K A100s in 2022 (7e18 BF16 FLOP/s, 22 MW), 100K H100s in 2024 (1e20 BF16 FLOP/s, 150 MW), 400K chips in GB200/GB300 NVL72 racks in 2026 (1e21 BF16 FLOP/s, 900 MW). The power estimates are all-in, for the whole datacenter site.
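
A small check of these trend figures from the footnote's numbers, treating FP8 (2024) and FP4 (2026) as roughly 2x and 4x the BF16 throughput respectively (the adjustment implied by the 12x-to-24x step above):

```python
# Trend check for the largest frontier AI training systems, using the footnote's numbers.
systems = [
    (2022, 7e18, 22, 1),    # 24K A100s: BF16 FLOP/s, MW, precision factor (BF16 = 1x)
    (2024, 1e20, 150, 2),   # 100K H100s, FP8 likely relevant (~2x, assumed)
    (2026, 1e21, 900, 4),   # 400K GB200/GB300 chips, maybe FP4 (~4x, assumed)
]

for (y0, f0, p0, s0), (y1, f1, p1, s1) in zip(systems, systems[1:]):
    print(f"{y0}->{y1}: {f1 / f0:.0f}x BF16 compute, {p1 / p0:.1f}x power, "
          f"{(f1 * s1) / (f0 * s0):.0f}x precision-adjusted compute")
# 2022->2024: ~14x BF16 compute, ~6.8x power, ~29x adjusted
# 2024->2026: ~10x BF16 compute, ~6.0x power, ~20x adjusted
# i.e. roughly 12x raw (24x precision-adjusted) per 2 years, ~5x/year pretraining-relevant compute.
```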

Posts (sorted by new)

75 · Permanent Disempowerment is the Baseline · 2mo · 23
48 · Low P(x-risk) as the Bailey for Low P(doom) · 2mo · 29
66 · Musings on AI Companies of 2025-2026 (Jun 2025) · 3mo · 4
34 · Levels of Doom: Eutopia, Disempowerment, Extinction · 4mo · 0
193 · Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall · 5mo · 25
170 · Short Timelines Don't Devalue Long Horizon Research · Ω · 6mo · 24
19 · Technical Claims · 6mo · 0
149 · What o3 Becomes by 2028 · 9mo · 15
41 · Musings on Text Data Wall (Oct 2024) · 1y · 2
10 · Vladimir_Nesov's Shortform · Ω · 1y · 140

Wikitag Contributions

Well-being · 23 days ago · (+58/-116)
Sycophancy · 23 days ago · (-231)
Quantilization · 2 years ago · (+13/-12)
Bayesianism · 3 years ago · (+1/-2)
Bayesianism · 3 years ago · (+7/-9)
Embedded Agency · 3 years ago · (-630)
Conservation of Expected Evidence · 4 years ago · (+21/-31)
Conservation of Expected Evidence · 4 years ago · (+47/-47)
Ivermectin (drug) · 4 years ago · (+5/-4)
Correspondence Bias · 4 years ago · (+35/-36)