LESSWRONG
AI · Frontpage
[ Question ]

How useful could stolen AI model weights be without knowing the architecture and activation functions?

by Jemal Young
6th Aug 2025
1 min read

2 Answers, sorted by top scoring

Fabien Roger

Aug 07, 2025


The scenario seems unrealistic because the thieves would likely also be able to steal important parts of the codebase. But assuming you literally just get a big list of named tensors, my guess is that it could take anywhere between many person-months and millions of person-years (mostly spent re-inventing the algorithms), depending on how many architectural novelties went into the model. For context, it took a decently sized effort to get Gemma working properly even given a paper describing how it works and a faulty implementation.

Jemal Young

The scenario seems unrealistic because the thieves would likely also be able to steal important parts of the codebase.

Thanks for this. So I guess when knowledgeable people talk about stealing a model's weights as being equivalent to stealing the model itself, "steal the weights" is shorthand that implies also stealing the other elements you'd need to replicate the model. [Edit: changed "the minimal" to "other"]


ryan_greenblatt

Aug 06, 2025


If you can steal the weights, you can very likely steal (part of) the code base. At this point, figuring out the architecture isn't too hard.

2 comments, sorted by top scoring
Brendan Long

You could probably infer quite a bit from the model weight file. At a minimum you'll get a list of tensors and their sizes, and you'll likely get useful names for each tensor too. I'm not sure what's typically in the metadata but you might learn a lot there too. Some files will even give you architectural details, although you might be able to guess what the layers are based on their sizes and order in the file even without that.
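To illustrate how much the file itself reveals: many released models use the safetensors format, which begins with an 8-byte little-endian length followed by a JSON header listing every tensor's name, dtype, and shape. A minimal sketch of reading that header (the file and tensor name below are fabricated for the demo; a real file would also contain the tensor data the offsets point at):

```python
import json
import os
import struct
import tempfile

def read_safetensors_header(path):
    """Return (tensors, metadata) from a .safetensors file.

    The format starts with an 8-byte little-endian unsigned length N,
    followed by N bytes of JSON describing every tensor.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form string dict, not a tensor.
    meta = header.pop("__metadata__", None)
    return header, meta

# Build a tiny fake file to show what the header alone exposes.
fake_header = {
    "model.layers.0.mlp.gate_proj.weight": {
        "dtype": "F16", "shape": [11008, 4096], "data_offsets": [0, 0]},
    "__metadata__": {"format": "pt"},
}
blob = json.dumps(fake_header).encode()
path = os.path.join(tempfile.gettempdir(), "fake.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob)

tensors, meta = read_safetensors_header(path)
for name, info in tensors.items():
    print(name, info["dtype"], info["shape"])
```

Even with nothing but this header, the layer names, shapes, and ordering already narrow down the architecture family considerably.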

If the model uses a standard activation function and it's the same for every layer, it wouldn't take very long to just go through each of them and run some benchmarks to figure out which activation function gives you the best results. Another question is whether it would even matter if you picked correctly. Claude seems confident that having the wrong activation function would be obvious and broken, but LLMs are weirdly resilient so it might just work.
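A toy sketch of what that elimination loop might look like, assuming you can observe a layer's pre- and post-activation values (everything here is invented for illustration; a real attempt would run the full model and compare perplexity on text):

```python
import math

# Candidate activation functions to test, implemented in pure Python.
def relu(x):
    return max(0.0, x)

def silu(x):
    return x / (1.0 + math.exp(-x))

def gelu(x):  # tanh approximation
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

CANDIDATES = {"relu": relu, "silu": silu, "gelu": gelu}

def guess_activation(pre, post):
    """Pick the candidate whose outputs best match the observed ones."""
    def err(fn):
        return sum((fn(x) - y) ** 2 for x, y in zip(pre, post))
    return min(CANDIDATES, key=lambda name: err(CANDIDATES[name]))

pre = [-2.0, -0.5, 0.0, 0.5, 2.0]
post = [silu(x) for x in pre]  # pretend these were observed
print(guess_activation(pre, post))  # → silu
```

The candidates differ most on negative inputs (relu zeroes them, silu and gelu pass through small negative values with different shapes), which is why even a few probe points separate them.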

I think the number of tensors in a frontier model is large enough that you wouldn't be able to figure it out by pure brute force though.
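The back-of-the-envelope arithmetic behind that intuition: if each layer could independently use one of a handful of activations, the search space is exponential in depth (the numbers below are illustrative, not from any specific model):

```python
# k candidate activation functions, n layers to assign them to:
# brute force must try k**n combinations.
k, n = 4, 80
combos = k ** n
print(combos)  # 4**80 == 2**160, roughly 1.5e48
```

This is only infeasible if choices genuinely vary per layer; if one activation is shared everywhere, as Brendan notes above, the search collapses to k trials.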

Jemal Young

This is a helpful answer, thank you! Thanks also for the link to the HF article on common model formats.

I'm thinking of an unreleased frontier model. No public information. How realistic is it to think such a model could be duplicated starting from the weights alone, e.g. by brute forcing through different combinations of architecture and activation functions? Would thieves be likely to end up with an inferior bizarro model?