ojorgensen's Shortform

by ojorgensen
4th May 2023
1 min read
This is a special post for quick takes by ojorgensen. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
1 comment
ojorgensen · 2y

Problem: we want to make it hard for ML systems trained via SGD to perform naive gradient hacking. By naive gradient hacking, I mean being able to keep some weights of the network constant across an arbitrary step of SGD.

Solution: apply "stochastic" regularisation, i.e. randomly sample the amount of regularisation applied at each step (we could use a quantum source if we want true randomness). This seems like it should make naive gradient hacking almost impossible: in order to keep some target weights unchanged, you'd have to match their positive contribution to the loss against the degree of regularisation. If the degree of regularisation is stochastic, the loss contribution would also have to be stochastic, which is not possible with a deterministic NN!
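A minimal sketch of the idea, assuming plain SGD with L2 weight decay whose coefficient is resampled uniformly each step (the function names, learning rate, and coefficient range are all illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(w, grad_loss, lr=0.1, lam_low=0.0, lam_high=0.2):
    """One SGD step with stochastic L2 regularisation.

    The decay coefficient lam is freshly sampled every step, so a network
    trying to hold a weight fixed by matching its loss gradient against a
    *known* weight-decay term cannot predict the decay it must cancel.
    """
    lam = rng.uniform(lam_low, lam_high)  # resampled each step
    return w - lr * (grad_loss(w) + 2.0 * lam * w)

# Toy example: loss L(w) = (w - 3)^2, so grad_loss(w) = 2 * (w - 3).
w = 0.0
for _ in range(200):
    w = sgd_step(w, lambda w: 2.0 * (w - 3.0))
```

Because the effective objective changes randomly every step, the weight never settles at the unregularised minimum w = 3, but wanders in a band below it determined by the sampled decay strengths.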

How useful this is for preventing gradient hacking in general depends on how stable the loss landscape is around a deceptive / gradient-hacking minimum. It seems possible that the surrounding loss landscape could be pretty unstable to random perturbations?
