“Hardware noise” in AI accelerators is often seen as a nuisance, but it might actually turn out to be a useful signal for verification of claims about AI workloads and hardware usage.
With this post about my experiments (GitHub), I aim to:
Contribute more clarity to the discussion about “GPU non-determinism”
Present how non-associativity can help monitor untrusted AI datacenters
Summary
I ran ML inference across dozens of setups to test which ones produce exactly reproducible results, and which differences between setups lead to detectable changes in outputs or activations.
In nearly all cases studied, results were bitwise-reproducible within a fixed setting, and the differences between settings were consistent rather than random.
Because these perturbations are reproducible and unique, they can act as a “fingerprint” of the exact setup that produced an output. This may prove useful for monitoring untrusted ML hardware (for example in AI hardware governance, international treaty verification, and AI control/security).
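As a minimal sketch of the fingerprint idea (assuming PyTorch; the helpers `fingerprint` and `probe_model` and the toy linear model are hypothetical stand-ins, not the harness from my experiments): run a fixed probe input through a model and hash the exact bytes of the output tensor.

```python
import hashlib
import torch

def fingerprint(output: torch.Tensor) -> str:
    """Hash the exact bit pattern of an output tensor."""
    data = output.detach().cpu().contiguous().numpy().tobytes()
    return hashlib.sha256(data).hexdigest()

def probe_model(model: torch.nn.Module, probe: torch.Tensor) -> str:
    """Fingerprint a model's forward pass on a fixed probe input."""
    with torch.no_grad():
        return fingerprint(model(probe))

# Toy usage: repeat runs of the same setup reproduce the hash bit-for-bit,
# while a change to the numerics (here a float64 cast, standing in for e.g.
# a different quantization method or attention kernel) gives a new hash.
torch.manual_seed(0)
model, probe = torch.nn.Linear(16, 16), torch.randn(1, 16)

run_a = probe_model(model, probe)
run_b = probe_model(model, probe)
run_c = probe_model(model.double(), probe.double())  # same weights, different precision

print(run_a == run_b)  # True: bitwise-reproducible within a fixed setup
print(run_a == run_c)  # False: changed numerics, different fingerprint
```

Hashing the raw bit pattern (rather than comparing values up to a tolerance) is what makes such a check sensitive to any change that touches the numerics at all.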
Changes to some settings produced unique fingerprints, while changes to others left the output invariant.
Invariant (i.e. not detectable via the noise fingerprint):
batch size in prefill inference
concurrent CUDA streams
pipeline parallelism rank
Detectable when re-executing on identical hardware:
different quantization methods, even at the same precision
Any change that affects numerics is detectable, since results were bitwise-reproducible within settings.
Detectable even with reproduction on different hardware:
attention algorithm
different quantizations (even within the same INT precision)
and of course different inputs or models
A different reduction order (a subtle difference resulting from batching, tensor parallelism, etc.) is masked by cross-hardware “noise”. Different algorithms are still detectable, because they introduce not just rounding differences but qualitatively different math.
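To make the reduction-order point concrete, here is a minimal NumPy sketch of floating-point non-associativity (illustrative only, not the experiment code): the same float32 values, combined in a different order, generally round differently and give a bitwise-different result.

```python
import numpy as np

# Grouping alone changes the result in floating point:
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(0.1)
print((a + b) + c, a + (b + c))  # 0.1 vs 0.0

# The same effect at scale: accumulating identical float32 values in a
# different order (as batching or tensor parallelism would) usually shifts
# the low-order bits of the result.
x = np.random.default_rng(0).standard_normal(10_000).astype(np.float32)

fwd = np.float32(0.0)
rev = np.float32(0.0)
for v in x:        # left-to-right reduction
    fwd += v
for v in x[::-1]:  # same values, reversed reduction order
    rev += v

print(fwd == rev)  # typically False, even though the mathematical sum is identical
```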
In a world with demand for assurance against hidden large-scale ML hardware use, this could become a new layer of defense, conditional on some engineering to make it deployment-ready.
The full post can be found on my Substack.
This work was part of my technical AI governance research at MATS (ML Alignment & Theory Scholars). Special thanks go to Mauricio Baker for his excellent mentoring and guidance, and to Elise Racine for her support and helpful advice.