CloudMatrix 384 is indeed in principle sufficient to build inference and training systems with capabilities rivaling those using Nvidia's hardware in the next 1-3 years (if we ignore the cost and everything actually works). In this sense Huawei is ahead of even AMD, which is only planning a large scale-up world system (Helios) for next year. Large scale-up world size (a collection of GPUs connected by very high bandwidth networking that lets them effectively share memory and do things like matrix multiplications in parallel across the scale-up network) is critical for fast inference of large reasoning models, and likely also for training them, if RLVR training usefully scales to rival pretraining in GPU-time use (which could happen next year).
But CloudMatrix 384 is significantly less cost- and power-efficient (Ascend 910C chips are 7nm), has unclear reliability (which especially matters for giant training systems), and so far can't be produced by China domestically. The entire stock of HBM (memory) and compute dies for Ascend 910C is imported; the compute dies were ordered from TSMC via intermediaries (while the policy is not to sell to Huawei). In principle, Huawei has enough parts to produce 1.1M chips, which is about the same total BF16 FLOP/s as ~350K Blackwell chips, so if all of it went into a single training system, it would match Stargate Abilene as it completes construction in summer 2026. But much more plausibly it's going to be sold piecemeal, and it's unclear if there will be enough lapses in export controls or progress in domestic production to get any more of the chips in the immediate future.
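As a back-of-envelope check of the 1.1M-to-350K equivalence (the per-chip dense BF16 throughputs below are my rough assumptions, not official specs):

```python
# Rough check of the "1.1M Ascend 910C ~ 350K Blackwell" claim.
# Per-chip dense BF16 throughputs are assumptions, not official figures.
ASCEND_910C_BF16 = 0.8e15   # FLOP/s per Ascend 910C (assumed)
BLACKWELL_BF16 = 2.5e15     # FLOP/s per Blackwell chip (assumed)

ascend_total = 1.1e6 * ASCEND_910C_BF16     # ~8.8e20 FLOP/s
blackwell_total = 3.5e5 * BLACKWELL_BF16    # ~8.8e20 FLOP/s

print(f"1.1M Ascend 910C: {ascend_total:.2e} FLOP/s")
print(f"350K Blackwell:   {blackwell_total:.2e} FLOP/s")
print(f"ratio:            {ascend_total / blackwell_total:.2f}x")
```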
This is a good argument and I think it is mostly true, but this absolutely should be in the AI 2027 Compute Forecast page. Simply not saying a word about the topic makes it look unserious and incompetent. In fact, that reaction came up repeatedly in my discussions with friends in South Korea.
Well, CloudMatrix 384 was announced Apr 10, while AI-2027 was published Apr 3, and there is still no word on China's actual ability to produce the compute (as opposed to designing/assembling it).
this absolutely should be in the AI 2027 Compute Forecast page
CloudMatrix was not, but Huawei Ascend has been there for a long time, and was used to train LLMs as far back as 2022. I didn't realize AI 2027 predated CloudMatrix, but I still think ignoring China in Compute Production was unjustified.
A central premise of AI-2027 is takeoff via very fast large reasoning models, which necessarily means a lot of compute specifically in the form of large scale-up world systems. Compute with smaller scale-up worlds (such as H100s) can be used to pretrain large models, but not to run inference of large reasoning models at high speed, and not to train large reasoning models with RLVR if that ends up needing pretraining-scale amounts of GPU-time.
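To see why scale-up world size matters for speed: in low-batch decoding, every generated token has to stream the model's weights out of HBM, so tokens per second is roughly bounded by the aggregate HBM bandwidth of the scale-up domain the model is sharded across. A minimal sketch (the model size is an illustrative assumption, and interconnect overhead and KV-cache reads are ignored):

```python
# Upper bound on single-sequence decode speed when generation is
# memory-bandwidth-bound: every token streams all weights from HBM.
# Ignores interconnect overhead and KV-cache reads; model size is assumed.
MODEL_BYTES = 2e12  # ~1T params at 2 bytes/param (illustrative assumption)

def max_decode_tokens_per_s(n_chips: int, hbm_bw_per_chip: float) -> float:
    return n_chips * hbm_bw_per_chip / MODEL_BYTES

print(max_decode_tokens_per_s(8, 3.35e12))   # 8x H100 node:     ~13 tok/s
print(max_decode_tokens_per_s(72, 8e12))     # GB200 NVL72 rack: ~288 tok/s
```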
Before CloudMatrix 384, China had all the ingredients except chip/HBM manufacturing capability, large scale-up world systems, and possibly feeling the AGI (I wouldn't be surprised if companies like Alibaba went full Google/DeepMind on AGI once they had the compute). Now that they have large scale-up world systems, there are fewer missing ingredients for entering the AGI race in earnest. This is no small thing: GB200 NVL72 is essentially the first and only modern large scale-up world system for AI in the world that uses all-to-all topology (sufficient for fast inference or reasoning training of a reasoning model at the scale of a hypothetical GPT-4.5-thinking). The only other alternative is Google's TPUs, which use a 3D torus topology that constrains applications somewhat but seems sufficient for AI. The new Gemini 2.5 report says they were using TPU-v5p in training it: strong systems built out of relatively weak 0.5e15 BF16 FLOP/s chips, individually 2x slower than an H100.
Unlike any country other than the USA, China has manufacturing of everything else down, has enough AI researchers, and has the potential to quickly fund and execute construction of sufficiently large training systems, if only they had the chips. And in principle they can produce 7nm chips, just not at a low enough defect density that the yield for the reticle-sized AI compute dies is good enough to ramp their production. (The situation with HBM might be even worse, but then the export controls for HBM remain more porous.)
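To illustrate why defect density is so punishing specifically for reticle-sized dies, here's the standard Poisson yield model, yield = exp(-D * A); the die area and the defect densities below are illustrative assumptions, not known fab figures:

```python
import math

# Poisson die-yield model: yield = exp(-D * A), where D is defect density
# (defects/cm^2) and A is die area (cm^2). Values are illustrative, not
# actual SMIC or TSMC figures.
RETICLE_DIE_CM2 = 8.3  # near-reticle-limit AI compute die, ~830 mm^2 (assumed)

def poisson_yield(defects_per_cm2: float, die_area_cm2: float = RETICLE_DIE_CM2) -> float:
    return math.exp(-defects_per_cm2 * die_area_cm2)

for d in (0.05, 0.1, 0.3, 0.5):
    print(f"D = {d:.2f}/cm^2 -> yield = {poisson_yield(d):.1%}")
# Note: a small ~1 cm^2 die still yields ~61% even at D = 0.5, which is why
# a fab can ramp mobile-class chips long before reticle-sized AI dies.
```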
So the point about not having the capability to produce the chips remains crucial and keeps China out of the AGI race that follows AI-2027 rules, unless export controls sufficiently loosen or fail. And without CloudMatrix 384, even obtaining a lot of chips wouldn't have helped with large reasoning models.
(This framing is mostly only relevant for the AI-2027 timeline, since without AGI a few years down the line either 7nm chips become too power-inefficient to matter compared to the USA's hypothetical future 5 GW+ training systems built out of 1nm chips, or thanks to the pressure of chips-but-not-tools export controls China sufficiently progresses in domestic chip manufacturing that they move on to being able to produce their own 5nm and then 3nm chips without being left too far behind in power-efficiency.)
in principle they can produce 7nm chips, just not at a low enough defect density that the yield for the reticle-sized AI compute dies is good enough to ramp their production.
Unfortunately, I doubt that China will fail to mitigate the effects of defective chips. Adding noise to the weights is already used for other purposes, for example, to uncover sandbagging.
China sufficiently progresses in domestic chip manufacturing that they move on to being able to produce their own 5nm and then 3nm chips without being left too far behind in power-efficiency.
Xiaomi is already asking us to hold its beer while it tries to produce 3nm chips. Hopefully for the USA, China will end up receiving chips at an insufficient rate.
unless export controls sufficiently loosen or fail
China is already likely to openly buy NVIDIA-produced chips or to undermine the USA's project by invading Taiwan. If I make the flawed assumption that China has no smuggled chips and forever increases its compute production five times a year (while the USA increases its 1.5 times per four months[1]), then, as I tried to show in my severely downvoted post, the USA is unlikely to keep its leadership by slowing down. What about the world without any flawed assumptions?
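For concreteness, here is what those two growth rates imply (the starting gap is my illustrative assumption):

```python
import math

# Stated assumptions: China's compute production grows 5x per year; the
# USA's grows 1.5x per four months, i.e. 1.5**3 ~ 3.375x per year.
china_per_year = 5.0
usa_per_year = 1.5 ** 3
initial_gap = 10.0  # assume the USA starts with 10x China's production (illustrative)

relative_gain = china_per_year / usa_per_year  # ~1.48x per year in China's favor
years_to_parity = math.log(initial_gap) / math.log(relative_gain)
print(f"USA growth: {usa_per_year:.3f}x/year")
print(f"China closes a {initial_gap:.0f}x gap in ~{years_to_parity:.1f} years")
```

Under these assumptions, even a 10x head start disappears in about six years.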
The latter assumption is lifted almost verbatim from the AI-2027 forecast and doesn't take into account the USA's potential weakness.
AI 2027 indeed predated the CloudMatrix announcement, but the compute forecast did make predictions of how much compute China will have, including domestic production, smuggling, and legal purchasing of foreign chips, and found that they would still be significantly behind by 2027. CloudMatrix doesn't change this, because it's still around 2x less cost-efficient than what US companies have access to, and US companies are investing around 4-5x as much as their Chinese counterparts. This follow-up blog post addresses the concern that we underestimated China, focusing on this compute gap.
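Multiplying those two factors through gives the rough effective-compute gap being pointed at here (a back-of-envelope using the stated ratios, not a measured figure):

```python
# Rough effective-compute gap implied by the numbers above: US hardware is
# ~2x more cost-efficient and US companies invest ~4-5x more dollars.
cost_efficiency_ratio = 2.0  # compute per dollar, US vs China (stated ~2x)
investment_ratio = 4.5       # midpoint of the stated 4-5x

print(f"implied gap: ~{cost_efficiency_ratio * investment_ratio:.0f}x")  # ~9x
```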
I think China has a very serious chance of overtaking the US in terms of both compute and overall frontier AI capabilities post-2030, since they might crack EUV by then and the US will start running into more significant power bottlenecks that China won't face.
The AI 2027 Compute Forecast basically ignores China in its Compute Production section, and I don't think that can be justified. This paper from Huawei is a timely reminder.