What would a compute monitoring plan look like? [Linkpost]

Orpheus16

Yonadav Shavit (CS PhD student at Harvard) recently released a paper titled "What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring".

The paper describes a compute monitoring regime that could allow governments to monitor training runs and detect deviations from training run regulations.

I think it's one of the most detailed public write-ups about compute governance, and I recommend AI governance folks read (or skim) it. A few highlights below (bolding mine).

Abstract:

As advanced machine learning systems' capabilities begin to play a significant role in geopolitics and societal order, it may become imperative that (1) governments be able to enforce rules on the development of advanced ML systems within their borders, and (2) countries be able to verify each other's compliance with potential future international agreements on advanced ML development. This work analyzes one mechanism to achieve this, by monitoring the computing hardware used for large-scale NN training. The framework's primary goal is to provide governments high confidence that no actor uses large quantities of specialized ML chips to execute a training run in violation of agreed rules. At the same time, the system does not curtail the use of consumer computing devices, and maintains the privacy and confidentiality of ML practitioners' models, data, and hyperparameters. The system consists of interventions at three stages: (1) using on-chip firmware to occasionally save snapshots of the neural network weights stored in device memory, in a form that an inspector could later retrieve; (2) saving sufficient information about each training run to prove to inspectors the details of the training run that had resulted in the snapshotted weights; and (3) monitoring the chip supply chain to ensure that no actor can avoid discovery by amassing a large quantity of un-tracked chips. The proposed design decomposes the ML training rule verification problem into a series of narrow technical challenges, including a new variant of the Proof-of-Learning problem [Jia et al. '21].

Solution overview:

In this section, we outline a high-level technical plan, illustrated in Figure 1, for Verifiers to monitor Provers’ ML chips for evidence that a large rule-violating training occurred. The framework revolves around chip inspections: the Verifier will inspect a sufficient random sample of the Prover’s chips (Section 3.2), and confirm that none of these chips contributed to a rule-violating training run. For the Verifier to ascertain compliance from simply inspecting a chip, we will need interventions at three stages: on the chip, at the Prover’s data-center, and in the supply chain.

On the chip (Section 4): When the Verifier gets access to a Prover’s chip, they need to be able to confirm whether or not that chip was involved in a rule-violating training run. Given that rule violation depends only on the code that was run, our solution will necessitate that ML chips log infrequent traces of their activity, with logging done via hardware-backed firmware. We suggest that ML chips’ firmware occasionally log a copy of the current state of the chip’s high-bandwidth memory to long-term storage, and in particular, that it logs the shard of the NN’s weights stored in memory. These weight-snapshots can serve as a fingerprint of the NN training that took place on each chip.
At the data-center (Section 5): The Verifier needs a way to interpret the chips’ logs, and determine whether or not they are evidence for a rule-violating training run. To that end, the Prover, who is training the model, will be required to store a transcript of the training process — including training data, hyperparameters, and intermediate weight checkpoints — for each model they train. Using protocols similar to “Proof-of-Learning” [23], these training transcripts may serve as provenance for the logged weight-snapshots, which are themselves the result of the same training process. In practice, for each (hash of a) weight-snapshot logged by a chip, the Prover provides the Verifier (the hashed version of) the matching training transcript. Then the Prover and Verifier jointly and securely verify that, with high probability, retraining using the training transcript would have indeed resulted in the logged weight-snapshot (and that no other valid training transcript could have resulted in that snapshot). Finally, now that the Verifier knows an approximate training transcript of the training run that had been executed on that chip at that time, they can examine properties of the training transcript to confirm that the Prover has complied with the agreed upon rules.
At the supply chain (Section 6): The Verifier needs to know which ML chips the Prover owns, so that the Verifier can randomly inspect a representative sample of those chips, to confirm their ownership and that their logging mechanisms are reporting correctly. Without this chip-ownership verification step, a Prover might covertly acquire a large quantity of chips and use them for training without ever notifying the Verifier. Keeping track of chip-ownership is viable because the cutting-edge data-center chip supply chain is highly concentrated, meaning that chips originate from a few initial chokepoints and can be monitored thereafter.
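
As a rough illustration of the data-center step (this is not the paper's protocol), the sketch below hashes a weight shard with SHA-256 and checks whether a checkpoint taken from a claimed training transcript reproduces the digest a chip logged. The function names, the float32 encoding, and the exact-match comparison are all assumptions made for this example. The paper asks for something much harder: a check that, with high probability, retraining from the transcript would have produced the snapshot.

```python
import hashlib
import struct


def snapshot_digest(weights):
    """Return a SHA-256 hex digest of a weight shard (a list of floats).

    Assumption: weights are serialized as little-endian float32 values.
    """
    payload = struct.pack(f"<{len(weights)}f", *weights)
    return hashlib.sha256(payload).hexdigest()


def matches_log(logged_digest, claimed_checkpoint):
    """Check whether a checkpoint from the transcript reproduces the logged digest."""
    return snapshot_digest(claimed_checkpoint) == logged_digest


# Hypothetical shard logged by the chip, and two checkpoints a Prover might claim.
shard = [0.125, -0.5, 0.75, 1.0]
logged = snapshot_digest(shard)
honest_claim = [0.125, -0.5, 0.75, 1.0]
false_claim = [0.125, -0.5, 0.75, 1.5]

print(matches_log(logged, honest_claim))
print(matches_log(logged, false_claim))
```

Only the digest needs to be shared with the Verifier at this stage, which is how the scheme hopes to keep the weights themselves confidential.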

These steps, put together, enable a chain of guarantees:

When any organization wants to train a large rule-violating ML model, they must do so using chips that the Verifier is aware of.
These chips will occasionally log weight-snapshots. Each time a chip logs a weight-snapshot, the Prover must report the log to the Verifier, along with (hashes of) training transcripts that establish the provenance of that weight-snapshot.
If the Prover tries to hide the logs from every chip involved in the training run, the Verifier will eventually find out, because it will sample and physically inspect at least one of those chips with high probability.
Also, the hashed training transcripts that the Prover provides along with the logged weight-snapshot need to be authentic, reflecting the training run that was run on the chip. If they are not authentic, they will be caught by the Verifier’s transcript verification procedure.
If the training transcript is authentic, and the run violated the rules, then the Verifier can tell, and will catch the Prover.

Thus, so long as the Prover complies with the Verifier’s steps, the Verifier will detect the Prover’s rule-violation with high probability. Just as in financial audits, a Prover’s refusal to comply with the verification steps would itself represent an indication of guilt.
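
The "high probability" in the sampling step can be made concrete with a small calculation. The sketch below assumes the Verifier inspects chips uniformly at random without replacement, and that a single inspected chip from the violating run is enough to expose it. The function name and the example numbers are hypothetical and do not come from the paper.

```python
def detection_probability(total_chips, violating_chips, sampled_chips):
    """Probability that a uniform random sample (without replacement)
    contains at least one chip that took part in the violating run."""
    if not 0 <= violating_chips <= total_chips:
        raise ValueError("violating_chips must be between 0 and total_chips")
    if not 0 <= sampled_chips <= total_chips:
        raise ValueError("sampled_chips must be between 0 and total_chips")
    clean_chips = total_chips - violating_chips
    if sampled_chips > clean_chips:
        return 1.0  # not enough clean chips to fill the sample
    miss = 1.0
    for i in range(sampled_chips):
        # Chance that the (i+1)-th inspected chip is also clean.
        miss *= (clean_chips - i) / (total_chips - i)
    return 1.0 - miss


# Hypothetical numbers: 10,000 tracked chips, 500 used in the violating run.
for sample_size in (10, 50, 100):
    p = detection_probability(10_000, 500, sample_size)
    print(sample_size, round(p, 3))
```

Under these assumptions the miss probability shrinks roughly geometrically with the sample size, which is why a modest number of inspections can suffice when a violating run needs many chips.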

Curated. I feel like I've heard the idea of tracking GPU compute over the years, but always in the high-level abstract, with no one actually thinking hard about what it would look like and how feasible it is. I'm very in favor of people trying to flesh out proposals into something that can be discussed concretely, so kudos to this author.

There are some serious issues that need to be overcome for any scheme of this nature to be secure.

Chips are not tamper-proof black boxes. Secure computing enclaves like SGX are routinely broken by academics: https://arstechnica.com/information-technology/2022/08/architectural-bug-in-some-intel-cpus-is-more-bad-news-for-sgx-users/. Secure flash memory is expensive and tampering with the memory writing logic or hardware could allow an actor to write false logs. Cryptographically signing logs runs the risk of leaking key material through power or fault-injection side channels. Once a chip is physically in the possession of someone with university-lab-level equipment and expertise, all bets are off w.r.t. on-chip controls. While some of these exploits are hard to carry out at scale, even a single effective exploit renders whole generations of GPUs uncontrollable.
The author does not actually propose a "Proof of Training Transcript" protocol, only providing a possible definition of such a scheme. While they acknowledge the challenges in constructing such a scheme, I think it is worth highlighting the fact that constructing a secure, efficient instantiation of such a scheme is not something that we (as a species) know how to do. The best noninteractive proofs for arbitrary verifiable computation currently require several orders of magnitude more time to generate the proof than to carry out the original computation, and proof generation is not totally parallelizable. The requirement to generate PoTTs basically obviates the utility of using a fast GPU in the first place. It is possible to imagine a special-purpose proof scheme for gradient descent with faster concrete efficiency, but the vague outline of a scheme proposed by the author relies on a stream of nonstandard hardness assumptions that I find very unconvincing.

I think that from a quick read of the paper or from the summary in the post one might be led to believe that such a scheme could be implemented with a few years of engineering effort and the cooperation of chip manufacturers. In fact, substantial advances in cryptography would be required.

Policy-makers' attempts to enforce policy by requiring the use of special chips have in the past largely been broken: Clipper chip, DRM via SGX, etc.

When I first saw "save all weights to on chip hardware", I thought it would be super expensive, but actually saving like 5 times the GPU's memory to a separate flash chip would only cost $20 (80GB*5 at 5 cents per gigabyte for flash storage). It can be way cheaper because it's low bandwidth and slow.
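
The back-of-the-envelope estimate above can be written out as a one-line calculation. The figures are the ones quoted in the comment (80 GB of GPU memory, 5 copies, 5 cents per gigabyte); the function name is just for illustration.

```python
def snapshot_storage_cost(gpu_memory_gb, copies, usd_per_gb):
    """Cost in USD of enough flash to hold `copies` full memory snapshots."""
    return gpu_memory_gb * copies * usd_per_gb


cost = snapshot_storage_cost(gpu_memory_gb=80, copies=5, usd_per_gb=0.05)
print(f"${cost:.2f}")
```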

It feels like the implicit message here is "And therefore we might coordinate around an alignment solution where all major actors agree to only train NNs that respect certain rules", which... really doesn't seem realistic, for a million reasons?

Like, even assuming major powers can agree to an "AI non-proliferation treaty" with specific metrics, individual people could still bypass the treaty with decentralized GPU networks. Rogue countries could buy enough GPUs to train an AGI, disable the verification hardware and go "What are you gonna do, invade us?", under the assumption that going to war over AI safety is not going to be politically palatable. Companies could technically respect the agreed-upon rules but violate the spirit in ways that can't be detected by automated hardware. Or they could train a perfectly-aligned AI on compliant hardware, then fine-tune it in non-aligned ways on non-compliant hardware for a fraction of the initial cost.

Anyway, my point is: any analysis of a "restrict all compute everywhere" strategy should start by examining what it actually looks like to implement that strategy, what the political incentives are, and how resistant that strategy will be to everyone on the internet trying to break it.

It feels like the author of this paper hasn't even begun to do that work.

There's also the infohazard problem. Since this heavily involves geopolitical considerations, if some person or group actually figured out a practical means, would they ever reveal it publicly?

Or is it even possible to reveal such a design without undermining the means by which it functions?

Isn't this akin to a protocol for securely monitoring private industry's experiments in thermonuclear weapons? It's better than nothing, but when something is dangerous enough, industrial regulation is never strict enough.

Some things are too dangerous to allow private competition in. The only sensible thing to do is nationalize them and have them run exclusively by extremely security-minded government agencies, if at all. And even that might not be good enough for AI, because we've never had tech whose base case scenario was "kill everyone".

The provable training part seems achievable using the existing cryptography and without requiring Confidential Computing techniques.

For example, by committing to the weights at each snapshot using a hash function, and also generating a proof that at least time T has passed between snapshots using Verifiable Delay Functions (VDFs), we can already rate-limit how quickly someone can train a model. In addition, to prove that we have been consistent and that the next weight checkpoint was correctly produced by applying X training operations to the previous checkpoint, we can use zk-SNARKs to generate a succinct proof. As this is essentially a continuous process of proving computations, we can also apply ideas from incrementally verifiable computation (IVC), making the scheme even more efficient. I must note that SNARKs still need to become efficient enough to prove extensive computations, such as the training of neural networks like GPT-4, but there are already examples of using them to prove training of GPT-2. Furthermore, as the whole scheme can be made zero-knowledge, we can have the prover proactively submit the proofs to the verifier to ensure they are consistently compliant.
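
The commitment part of this idea (and only that part) can be sketched as a hash chain: each commitment covers the current checkpoint and the previous commitment, so checkpoints cannot later be swapped, dropped, or reordered without detection. This is a toy under stated assumptions. It includes no VDF and no SNARK, so it proves nothing about elapsed time or about how each checkpoint was computed. `GENESIS` and the function names are hypothetical.

```python
import hashlib

GENESIS = b"\x00" * 32  # hypothetical fixed starting value for the chain


def commit(prev_commitment: bytes, checkpoint: bytes) -> bytes:
    """Commit to a checkpoint, bound to the previous 32-byte commitment."""
    return hashlib.sha256(prev_commitment + checkpoint).digest()


def build_chain(checkpoints):
    """Return the list of chained commitments for a sequence of checkpoints."""
    chain, prev = [], GENESIS
    for checkpoint in checkpoints:
        prev = commit(prev, checkpoint)
        chain.append(prev)
    return chain


def verify_chain(checkpoints, chain):
    """Recompute the chain from revealed checkpoints and compare."""
    return build_chain(checkpoints) == chain


# Placeholder byte strings standing in for serialized weight checkpoints.
checkpoints = [b"weights-step-0", b"weights-step-1000", b"weights-step-2000"]
chain = build_chain(checkpoints)

print(verify_chain(checkpoints, chain))
print(verify_chain(list(reversed(checkpoints)), chain))
```

In the scheme described above, the Prover would publish only the commitments as training proceeds; the delay and correctness proofs would be separate components layered on top.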

To ensure that no one can ignore the regulation and train in secret, we would need to require that any capable training hardware be actively registered with officials. Requiring an active signature from a government server to enable X computations could be a solution. This should be achievable, as SGX and similar platforms provide increment-only (monotonic) counters or alternative solutions. Furthermore, computation integrity is a far less strict property to ask of SGX hardware than attestation: it does not depend on resisting key-extraction attacks, and breaking it would require actual chip-level alterations. Considering that we already use a lot of similar gadgets in everyday life, supported by ARM TEE, this should be achievable.

Current SNARK provers are many orders of magnitude slower than the underlying computation. Certainly you can prove that you took sufficient time to do the computation (e.g. using a VDF), or did it in few enough steps or according to some policy (e.g. using NARKS), but what is the point of using a modern GPU in the first place if you're going to be limited to speeds easily achievable by a 1990s CPU?

The only way this scheme becomes more useful than just banning GPU usage outright is if the proof of policy compliance can be generated only slightly slower than actually doing the computation. We don't have primitives that can do that currently.

I agree that the current state of the art does indeed very much limit the throughput of the models. However, considering the current effort going into scaling this space and the amount of funding it has been receiving, I would hope that in 5-10 years' time the slowdown will be much closer to negligible. After all, it has been shown that several SNARKs and STARKs can achieve linear prover time, so right now it is all about fighting the constants and making the schemes ever more efficient. And, considering how slow governments usually are even with urgent matters, a 5-10 year period should be just right for that. As a temporary solution, we could resort to SGX and other TEE providers if we have to, and trust that the patches they constantly release will work for now.

After all, this is better than nothing, and having some solutions, even ones that are not 100% effective, would be good for our cause, as long as we can gradually improve their security in parallel with growing concern over the AI threat.