LESSWRONG

KhromeM
1020

Comments

Sorted by
Newest
No wikitag contributions to display.
Using an LLM perplexity filter to detect weight exfiltration
KhromeM1y10

My last statement was totally wrong. Thanks for catching that.

In theory it's probably even possible to get approximate weights by expending insane amounts of compute, but you could use those resources much more efficiently.

Using an LLM perplexity filter to detect weight exfiltration
KhromeM1y2-5

I do not understand how you could extract weights just by conversing with an LLM, any more than you could learn how my neurons are structured by conversing with me. Extracting training data it has seen is one thing, but presumably it has never seen its own weights. If its system prompt did not tell it that it was an LLM, it should not even be able to figure that out.
