LESSWRONG
AI · Frontpage
[ Question ]

How useful could stolen AI model weights be without knowing the architecture and activation functions?

by Jemal Young
6th Aug 2025
1 min read

2 Answers, sorted by top scoring

Fabien Roger

Aug 07, 2025


The scenario seems unrealistic because the thieves would likely also be able to steal important parts of the codebase. But assuming you literally just get a big list of named tensors, my guess is that it could take anywhere between many person-months and millions of person-years (mostly spent re-inventing the algorithms), depending on how many architectural novelties went into the model. For context, it took a decently sized effort to get Gemma working properly even given a paper describing how it works and a faulty implementation.

Jemal Young

The scenario seems unrealistic because the thieves would likely also be able to steal important parts of the codebase.

Thanks for this. So I guess when knowledgeable people talk about stealing a model's weights as being equivalent to stealing the model itself, "steal the weights" is shorthand that implies also stealing the other elements you'd need to replicate the model. [Edit: changed "the minimal" to "other"]


ryan_greenblatt

Aug 06, 2025


If you can steal the weights, you can very likely steal (part of) the code base. At this point, figuring out the architecture isn't too hard.

2 comments, sorted by top scoring
Brendan Long

You could probably infer quite a bit from the model weight file. At a minimum you'll get a list of tensors and their sizes, and you'll likely get useful names for each tensor too. I'm not sure what's typically in the metadata but you might learn a lot there too. Some files will even give you architectural details, although you might be able to guess what the layers are based on their sizes and order in the file even without that.
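To illustrate how much the file itself reveals: many released models use the safetensors format, which begins with an 8-byte little-endian length followed by a JSON header listing every tensor's name, dtype, and shape. A minimal sketch of reading that header (the file and tensor name below are fabricated for the demo; a real file would also contain the tensor data the offsets point at):

```python
import json
import os
import struct
import tempfile

def read_safetensors_header(path):
    """Return (tensors, metadata) from a .safetensors file.

    The format starts with an 8-byte little-endian unsigned length N,
    followed by N bytes of JSON describing every tensor.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form string dict, not a tensor.
    meta = header.pop("__metadata__", None)
    return header, meta

# Build a tiny fake file to show what the header alone exposes.
fake_header = {
    "model.layers.0.mlp.gate_proj.weight": {
        "dtype": "F16", "shape": [11008, 4096], "data_offsets": [0, 0]},
    "__metadata__": {"format": "pt"},
}
blob = json.dumps(fake_header).encode()
path = os.path.join(tempfile.gettempdir(), "fake.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob)

tensors, meta = read_safetensors_header(path)
for name, info in tensors.items():
    print(name, info["dtype"], info["shape"])
```

Even with nothing but this header, the layer names, shapes, and ordering already narrow down the architecture family considerably.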

If the model uses a standard activation function and it's the same for every layer, it wouldn't take very long to just go through each of them and run some benchmarks to figure out which activation function gives you the best results. Another question is whether it would even matter if you picked correctly. Claude seems confident that having the wrong activation function would be obvious and broken, but LLMs are weirdly resilient so it might just work.
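A toy sketch of what that elimination loop might look like, assuming you can observe a layer's pre- and post-activation values (everything here is invented for illustration; a real attempt would run the full model and compare perplexity on text):

```python
import math

# Candidate activation functions to test, implemented in pure Python.
def relu(x):
    return max(0.0, x)

def silu(x):
    return x / (1.0 + math.exp(-x))

def gelu(x):  # tanh approximation
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

CANDIDATES = {"relu": relu, "silu": silu, "gelu": gelu}

def guess_activation(pre, post):
    """Pick the candidate whose outputs best match the observed ones."""
    def err(fn):
        return sum((fn(x) - y) ** 2 for x, y in zip(pre, post))
    return min(CANDIDATES, key=lambda name: err(CANDIDATES[name]))

pre = [-2.0, -0.5, 0.0, 0.5, 2.0]
post = [silu(x) for x in pre]  # pretend these were observed
print(guess_activation(pre, post))  # → silu
```

The candidates differ most on negative inputs (relu zeroes them, silu and gelu pass through small negative values with different shapes), which is why even a few probe points separate them.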

I think the number of tensors in a frontier model is large enough that you wouldn't be able to figure it out by pure brute force though.
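The back-of-the-envelope arithmetic behind that intuition: if each layer could independently use one of a handful of activations, the search space is exponential in depth (the numbers below are illustrative, not from any specific model):

```python
# k candidate activation functions, n layers to assign them to:
# brute force must try k**n combinations.
k, n = 4, 80
combos = k ** n
print(combos)  # 4**80 == 2**160, roughly 1.5e48
```

This is only infeasible if choices genuinely vary per layer; if one activation is shared everywhere, as Brendan notes above, the search collapses to k trials.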

Jemal Young

This is a helpful answer, thank you! Thanks also for the link to the HF article on common model formats.

I'm thinking of an unreleased frontier model. No public information. How realistic is it to think such a model could be duplicated starting from the weights alone, e.g. by brute forcing through different combinations of architecture and activation functions? Would thieves be likely to end up with an inferior bizarro model?