The scenario seems unrealistic because the thieves would likely be able to steal important parts of the codebase. But assuming you literally just get a big list of named tensors, my guess is that it could be anywhere between many person-months and millions of person-years (mostly spent re-inventing the algorithms) depending on how many architectural novelties went into the model. For context, it took a decently sized effort to actually get Gemma working properly, given a paper describing how it works and a faulty implementation.
The scenario seems unrealistic because the thieves would likely be able to steal important parts of the codebase.
Thanks for this. So I guess when knowledgeable people talk about stealing a model's weights as being equivalent to stealing the model itself, "steal the weights" is shorthand that implies also stealing other elements you'd need to replicate the model. [Edit: changed "the minimal" to "other"]
If you can steal the weights, you can very likely steal (part of) the code base. At this point, figuring out the architecture isn't too hard.
You could probably infer quite a bit from the model weight file. At a minimum you'll get a list of tensors and their sizes, and you'll likely get useful names for each tensor too. I'm not sure what's typically in the metadata but you might learn a lot there too. Some files will even give you architectural details, although you might be able to guess what the layers are based on their sizes and order in the file even without that.
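To make that concrete, here is a minimal sketch of reading the tensor list out of a weight file. It assumes the safetensors format (my choice for illustration; the thread doesn't say which format is involved), whose layout is an 8-byte little-endian header length followed by a JSON header mapping tensor names to dtype, shape, and byte offsets, with an optional `__metadata__` entry. The toy file and its tensor names are made up in the style of open-weight checkpoints.

```python
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Parse only the JSON header of a safetensors file held in memory."""
    # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len].decode("utf-8"))


def make_toy_file() -> bytes:
    """Build a tiny stand-in file; tensor names here are hypothetical."""
    header = {
        "__metadata__": {"format": "pt"},
        "model.embed_tokens.weight": {
            "dtype": "F16", "shape": [32, 8], "data_offsets": [0, 512],
        },
        "model.layers.0.mlp.up_proj.weight": {
            "dtype": "F16", "shape": [16, 8], "data_offsets": [512, 768],
        },
    }
    raw = json.dumps(header).encode("utf-8")
    # Header length, header, then 768 bytes of (zeroed) tensor data.
    return struct.pack("<Q", len(raw)) + raw + bytes(768)


header = read_safetensors_header(make_toy_file())
metadata = header.pop("__metadata__", {})
tensors = {name: (info["dtype"], info["shape"]) for name, info in header.items()}

for name, (dtype, shape) in tensors.items():
    print(name, dtype, shape)
```

Even in this toy case, the names and shapes alone hint at an embedding table and an MLP up-projection, which is the kind of inference described above.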
If the model uses a standard activation function and it's the same for every layer, it wouldn't take very long to just go through each of them and run some benchmarks to figure out which activation function gives you the best results. Another question is whether it would even matter if you picked correctly. Claude seems confident that having the wrong activation function would leave the model obviously broken, but LLMs are weirdly resilient, so it might just work.
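A toy sketch of that search loop, under heavy assumptions: a two-layer network with random weights stands in for the stolen model, the "benchmark" is just squared error against outputs produced with the true activation (SiLU here, picked arbitrarily), and the candidate list is my own. A real attempt would compare something like held-out perplexity, and the gap between candidates could be much smaller than in this toy.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x):
    return x / (1.0 + math.exp(-x))


# Hypothetical shortlist of standard activations to try.
CANDIDATES = {"relu": relu, "gelu": gelu, "silu": silu, "tanh": math.tanh}


def forward(x, w1, w2, act):
    """Two-layer MLP: hidden = act(W1 x), output = w2 . hidden."""
    hidden = [act(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))


def mse(act, w1, w2, data):
    """Mean squared error of the network on (input, target) pairs."""
    return sum((forward(x, w1, w2, act) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
w1 = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
w2 = [rng.gauss(0, 1) for _ in range(8)]
inputs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]

# Assumption: the weights "belong to" a SiLU network, so its outputs are the targets.
data = [(x, forward(x, w1, w2, silu)) for x in inputs]

# Score every candidate with the same weights and pick the lowest error.
scores = {name: mse(act, w1, w2, data) for name, act in CANDIDATES.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

With only a handful of standard candidates and one shared choice across layers, this is a short loop; the search only blows up once the choice can differ per layer or the architecture itself is unknown.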
I think the number of tensors in a frontier model is large enough that you wouldn't be able to figure it out by pure brute force though.
This is a helpful answer, thank you! Thanks also for the link to the HF article on common model formats.
I'm thinking of an unreleased frontier model. No public information. How realistic is it to think such a model could be duplicated starting from the weights alone, e.g. by brute forcing through different combinations of architecture and activation functions? Would thieves be likely to end up with an inferior bizarro model?