Josh You
Bit of feedback: it would be helpful if you explicitly stated your estimated numbers of H200 and Huawei chips and/or provided a B300-equivalent conversion table, so they're more comparable to other reports that are quoted just in raw chip counts. I understand how you do the conversion, but it's not super apparent in the post.
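Something like the sketch below is all I mean — a minimal Python example with placeholder ratios (not real performance numbers; substitute whatever FLOP-based weighting you actually use):

```python
# Placeholder B300-equivalence ratios, NOT real benchmark numbers.
B300_EQUIVALENT = {
    "B300": 1.0,
    "H200": 0.5,         # placeholder ratio
    "Ascend 910C": 0.3,  # placeholder ratio
}

def to_b300_eq(chip_counts: dict[str, float]) -> float:
    """Convert a mix of chips into B300-equivalents under the ratios above."""
    return sum(count * B300_EQUIVALENT[chip] for chip, count in chip_counts.items())

# e.g. a hypothetical fleet of 100k H200s and 50k Ascend 910Cs:
print(to_b300_eq({"H200": 100_000, "Ascend 910C": 50_000}))  # -> 65000.0
```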
One thing I don't know is when data center investments get committed to specific customers. Google and Amazon are Anthropic's two main compute partners, will each spend ~$200B in capex this year, and are presumably planning and developing sites for many more hundreds of billions by 2028. So one possible view is that their capex creates a window, and Anthropic's eventual share depends on its funding and revenue; Google and Amazon themselves don't yet know how their 2027-2028 data centers will be allocated.
In general, for large data centers the specific lab that will use them is settled well before the late stages of construction (e.g. Stargate and Rainier). But I know independent data center developers often start developing a site without having a client pinned down, and smaller inference clusters are presumably more fungible.
A good term for 10^20 FLOP would be useful. It would put modern models at around 100k to 10 million of this unit, which is a tangible number. Some people, e.g. at DeepMind, tried to make "petaflop-days" (8.64e19 FLOP) a thing, but it didn't catch on.
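To make the arithmetic concrete, here's a minimal sketch (the compute levels are rough orders of magnitude for modern models, not estimates for any specific one):

```python
PETAFLOP_DAY = 1e15 * 86_400  # = 8.64e19 FLOP, the unit that didn't catch on
UNIT = 1e20                   # the proposed round unit

# Rough order-of-magnitude training-compute levels for modern models.
for flop in (1e25, 1e27):
    print(f"{flop:.0e} FLOP = {flop / UNIT:,.0f} units"
          f" ({flop / PETAFLOP_DAY:,.0f} petaflop-days)")
# 1e+25 FLOP = 100,000 units (115,741 petaflop-days)
# 1e+27 FLOP = 10,000,000 units (11,574,074 petaflop-days)
```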
Another point here is that elections are an additional check after the courts, Congress, etc. US presidential elections are not administered by the federal government; they are administered by the states. So to interfere with elections, the president can't just fill election boards with cronies or give orders to anyone in his chain of command to rig the election. He'd have to forcibly manipulate or interfere with state officials and state governments, risking direct conflict with states. And if he doesn't interfere with the election and the states announce results showing he lost in a landslide, his political power almost certainly evaporates. Of course, if all the president's crazy actions are in...
In practice most federal offices have deferred to what the Supreme Court says, but we haven't really seen what happens when e.g. a sitting president insists on an interpretation of the constitution that disagrees, and the constitution itself provides no clear answer.
This is a somewhat confusing statement. To be clear, it's extremely common for the president to disagree with courts on the law or the Constitution: this happens dozens of times per presidential term. And when they lose in court, the president may declare that they still think they're right and the Court ruled incorrectly. But this wouldn't cause a constitutional crisis or anything by default: the president almost always follows court orders...
Yeah, I think leading labs generally retrain their base models less often than every 6 months (but there's a lot we don't know for sure). And I believe this most likely has to do with a production AI model being the result of a lot of careful tuning on pre-training, mid-training, post-training, etc. Swapping in a new base model might lead to a lot of post-training regressions that need to be fixed. And your old base model is a "lucky" one in some sense, because it either was selected for doing well and/or required lots of experiments, derisking runs, etc. Even with all of your new algorithmic tricks, it might be hard to one-shot YOLO a base model that's better than your SOTA model from nine months ago. But this is probably much easier for your model from 18 or 27 months ago.
Also, I'd guess staff costs are more important than compute costs here, but these considerations mean the compute costs of retraining are higher than one might think.
You should be more uncertain about the METR benchmark's external validity than what these error bars show.
But your baseline uncertainty about key facts about AI progress in general should also often span much more than one order of magnitude between your 2.5th and 97.5th percentile guesses. The METR results add a lot of value, and I don't think these error bars are a big deal in the scheme of things.
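As a rough illustration of what an order of magnitude between the 2.5th and 97.5th percentiles implies, here's a minimal sketch, assuming (purely for illustration) that the uncertainty is lognormal:

```python
import math

# A 95% interval spanning a factor of 10 implies, under a lognormal,
# sigma = ln(10) / (2 * 1.96) in log space (1.96 = z-score at 97.5%).
ratio = 10.0                    # 97.5th percentile / 2.5th percentile
sigma = math.log(ratio) / (2 * 1.96)

median = 1.0                    # arbitrary central estimate, any unit
lo = median * math.exp(-1.96 * sigma)  # 2.5th percentile
hi = median * math.exp(+1.96 * sigma)  # 97.5th percentile
print(f"sigma = {sigma:.2f}, 95% interval = [{lo:.2f}, {hi:.2f}]")
# sigma = 0.59, 95% interval = [0.32, 3.16]
```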
Most successful startups slow down a lot after a brief hypergrowth phase. We should be looking for signs that AI companies like OpenAI and Anthropic* are experiencing unusually long and persistent hypergrowth: surprisingly little slowdown in growth, or maintaining >2x growth/year at surprisingly high revenue levels like $100B (see the sketch below for the compounding arithmetic). They are both already growing surprisingly fast for companies with multiple billions in revenue, to be clear, but whether that continues is valuable evidence.
This could be a sign that present-day models have a higher economic ceiling than we realize (i.e. they're closer to TAI than they might look), or that companies are making real progress toward transformative AI. Most companies don't dramatically improve their product lineup over and over again after they find initial product-market fit, so sustained rapid growth would mean that AI development is producing a new batch of successful products on a regular basis, i.e. escalating economic usefulness.
*I think companies that serve AI to end-users are the most useful indicators.
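A minimal sketch of the compounding arithmetic mentioned above (the $5B starting point is a placeholder, not a claim about any specific company's revenue):

```python
import math

# Years of sustained 2x/year growth needed to go from a placeholder
# $5B revenue to $100B: log2(100 / 5) ≈ 4.3 years.
start_b, target_b, growth = 5.0, 100.0, 2.0
years = math.log(target_b / start_b) / math.log(growth)
print(f"{years:.1f} years at {growth:.0f}x/year")  # 4.3 years at 2x/year
```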
I'd flip it around and ask whether Gabriel thinks the best models from 6, 12, or 18 months ago could be performing at today's level with maximum elicitation.
Isn't Emeryville kind of doing this? Though I'm not sure whether they're maxing out the envelope of housing production set by real costs, even if a city government goes 100% YIMBY.