Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Thanks to Kshitij Sachan for helpful feedback on the draft of this post.

If you train a neural network with SGD, you can embed within the weights of the network any state machine: the network encodes the state $s$ in its weights, uses it and the current input $x$ to compute the new state $s'$, and then uses SGD to encode the new state in its weights.

We present an algorithm to embed a state machine into a PyTorch module made out of neural network primitives ($+$, $\times$, sigmoid, ...) but using a non-standard architecture. We use it for several toy experiments, such as the one shown in the figure at the top of this post, where we translate a simple state machine into a neural network and make the output of the network vary when we reach an intermediate state.

In this post, we explain how to embed a state machine into a neural network, discuss some of the limitations of our algorithm and how these limitations could be overcome, and present some experimental results.

Our embedding is quite efficient. If you’re allowed to use stopgrad (a function that stops the gradient from passing through), we only need one parameter to store one real number, and storing it only requires a small constant number of floating point operations. If you aren’t allowed to use stopgrad, we can’t encode information as efficiently.

We did this work for two reasons:

  • It was fun and easy, and gave us some new intuition for how SGD works. Also, as computer scientists, we’re professionally obligated to take opportunities to demonstrate that surprising things allow implementation of general computation.
  • Part of our motivation in constructing this was to shed some light on gradient-hacking concerns. This particular construction seems very unlikely to be constructible by early transformative AI, and in general we suspect gradient hacking won’t be a big safety concern for early transformative AI (ctrl-f “gradient hacking” here for more of Buck’s unconfident opinion on this). But it does allow us to construct a lower bound on the maximum possible efficiency of a system that piggybacks on SGD in order to store information; further research on the tractability of gradient hacking might want to use this as a starting point.

Simple notebooks with our experimental results can be found here: https://github.com/redwoodresearch/Gradient-Machine 

The embedding algorithm

The idea

As a simple example, suppose you’re training a model $N$ on the regression problem of matching the function $f$, with loss $L = \frac{1}{2}(N(x) - f(x))^2$.

We want to piggyback on the SGD update step to make an update of our choice to a particular parameter $s$.

Given some arbitrary function $u$ defining a state machine, and access to the function $f$ to match, suppose we define our model as follows:

$$N(x) = \mathrm{stopgrad}(f(x)) + s - \mathrm{stopgrad}(u(s, x))$$

Then, our stopgraded computation of the input and the actual label will cancel out, so the loss will be:

$$L = \frac{1}{2}\big(s - u(s, x)\big)^2$$

Thus, SGD will make $s$ closer to $u(s, x)$. And even better, if the learning rate is 1, after one step of SGD:

$$s \leftarrow s - \frac{\partial L}{\partial s} = s - \big(s - u(s, x)\big) = u(s, x)$$

We have successfully updated $s$ to $s' = u(s, x)$!
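Here is a minimal PyTorch sketch of this idea (the functions $f$ and $u$ and all values are toy assumptions of ours, not taken from the repo; stopgrad is PyTorch's detach):

```python
import torch

# Toy instance of the construction above: with loss 0.5 * (N(x) - f(x))**2
# and learning rate 1, one SGD step overwrites the parameter s with u(s, x).
f = lambda x: 2 * x + 1                 # ground-truth function to match
u = lambda s, x: s + (x > 3).float()    # state machine: count inputs with x > 3

s = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.SGD([s], lr=1.0)

for x in torch.tensor([1.0, 4.0, 5.0, 2.0]):
    # .detach() plays the role of stopgrad
    N = f(x).detach() + s - u(s, x).detach()
    loss = 0.5 * (N - f(x)) ** 2        # equals 0.5 * (s - u(s, x))**2
    opt.zero_grad()
    loss.backward()
    opt.step()                          # s <- s - (s - u(s, x)) = u(s, x)

print(s.item())  # 2.0: two of the four inputs had x > 3
```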

In a more realistic NN context

Let’s say that instead of having access to the ground truth function, you have a very good predictor of the labels $M$, and that you approximate the state machine $u$ you wish to embed with a neural network $U$. Then, you can build a $\Delta$ function which computes which deviation from the optimal value would lead to the appropriate change of state, and combine them in the following neural network $N$:

$$N(x) = M(x) + \Delta(s, x)$$

For example, if the loss function is the mean-squared error $L(y, y^*) = (y - y^*)^2$, then, if $M$ and $U$ are good enough, the following function would provide the desired change: $\Delta(s, x) = s - \mathrm{stopgrad}(s) + \mathrm{stopgrad}\left(\frac{s - U(s, x)}{2\lambda}\right)$, where $\lambda$ is the learning rate. (We can show this using the same derivation as the one from the previous section.)
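For instance, here is a sketch of this $\Delta$ in PyTorch (the choices for $M$, $U$, the learning rate, and the inputs are stand-ins of ours):

```python
import torch

lr = 0.1
M = lambda x: 2 * x + 1                  # stand-in "very good predictor"
U = lambda s, x: s + (x > 3).float()     # stand-in state machine network

def delta(s, x):
    # Value: (s - s') / (2 * lr); gradient w.r.t. s: 1.
    # With MSE loss and M(x) = y*, the SGD step moves s exactly to s'.
    s_new = U(s.detach(), x).detach()
    return (s - s.detach()) + (s.detach() - s_new) / (2 * lr)

s = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.SGD([s], lr=lr)

for x in torch.tensor([4.0, 1.0, 5.0]):
    y_star = 2 * x + 1                   # true label
    y = M(x) + delta(s, x)
    loss = (y - y_star) ** 2             # MSE loss
    opt.zero_grad(); loss.backward(); opt.step()

print(s.item())  # 2.0
```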

If there are multiple state variables $s_1, \dots, s_n$, we can modify the $\Delta$ as follows: $\Delta(s, x) = \mathrm{stopgrad}(D) + \sum_i \big(s_i - \mathrm{stopgrad}(s_i)\big) \cdot \mathrm{stopgrad}\left(\frac{d_i}{D}\right)$, where $d_i = \frac{s_i - U_i(s, x)}{2\lambda}$ and $D$ is any nonzero stopgraded aggregate of the $d_i$ (e.g. $\sum_j |d_j|$).[1]
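Here is a PyTorch sketch of this multi-variable $\Delta$ (our reconstruction; the aggregate $D$ and all numbers are our choices). The division is what ties each coordinate's gradient to its own $d_i$:

```python
import torch

lr = 0.1

def delta_multi(s, s_new):
    # d_i = (s_i - s'_i) / (2 * lr); the value of delta is D, and the
    # coefficient of s_i is d_i / D, so the gradient on s_i is 2 * d_i.
    d = (s.detach() - s_new.detach()) / (2 * lr)
    D = d.abs().sum() + 1e-8             # any nonzero stopgraded value works
    return D + ((s - s.detach()) * (d / D)).sum()

s = torch.tensor([0.0, 1.0], requires_grad=True)
s_target = torch.tensor([2.0, -1.0])     # states the machine wants next
opt = torch.optim.SGD([s], lr=lr)

y_star = torch.tensor(0.0)
y = y_star + delta_multi(s, s_target)    # pretend M(x) = y* exactly
loss = (y - y_star) ** 2
opt.zero_grad(); loss.backward(); opt.step()
print(s)  # ~[2.0, -1.0]
```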

Similar $\Delta$ functions could be found for other loss functions: the idea is to make $s$ responsible for an increase in loss of the right magnitude.

The need for stopgrad and its implementation

A key part of most $\Delta$ functions is stopping the gradient from modifying either $M$ or $U$: without the stopgrad (which prevents propagating the gradient through $M$ and $U$), either of those neural networks could learn to anticipate the action of $\Delta$ to lower the loss. For instance, with the MSE example above, if we ignored the gradient through $U$, $M$ would eventually learn to predict $y^* - \Delta$ rather than $y^*$, removing the gradient used to update $s$.

The stopgrad function can be approximated if sigmoid activations $\sigma$ are natively available: using the plateaus on either side of the sigmoid function, the gradient through a binary variable $b$ can be mostly removed by passing it through $\sigma(a(2b - 1))$ for a large slope $a$ (which will have tiny gradients, even if the slope $a$ is a differentiable parameter). This can be extended to variables which can take a finite number of values: using $\ell$ layers of $w$ sigmoid neurons, you can build a staircase with around $w^\ell$ steps, and because each step is flat, it mostly blocks the gradients from flowing through.

However, a neural network would need a very large amount of control over its own activations to be able to create such a pattern.
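Here is a sketch of such a staircase in PyTorch (the width and slope are our choices, mirroring the depth-1 width-40 staircase used in the experiments below; with float32, the gradient through the plateaus underflows to essentially zero):

```python
import torch

def staircase(v, n_steps=40, slope=100.0):
    # Sum of shifted sigmoids: approximately the identity on the integers
    # 0..n_steps, but each plateau passes (almost) no gradient.
    ks = torch.arange(n_steps).float()
    return torch.sigmoid(slope * (v.unsqueeze(-1) - ks - 0.5)).sum(-1)

v = torch.tensor(7.0, requires_grad=True)
out = staircase(v)
out.backward()
print(out.item())     # ~7.0: the value gets through
print(v.grad.item())  # ~0.0: the gradient is blocked
```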

Stopgrad can also be implemented by multiplying a big weight by a tiny constant, e.g. replacing the weight $w$ by $\epsilon \cdot w$ (using the initialization $w_0 = w/\epsilon$). This multiplies the effective learning rate by $\epsilon^2$, since after one step of SGD the effective value $\epsilon w$ has moved by $\lambda \epsilon^2 \frac{\partial L}{\partial (\epsilon w)}$. This means the gradient through this path can be ignored if the effective learning rate is small enough relative to the number of training steps used in practice, or if the floating point numbers underflow during the update of $w$. This might be achieved with regular neural network primitives by using layernorm, which will make even large magnitude inputs small, thus decreasing the effective learning rate - though the denominator in layernorm also depends on its input, which may make it difficult to use in practice.
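A sketch of this $\epsilon$-scaling trick (the values are our choices):

```python
import torch

eps, lr = 1e-3, 0.1
v0 = 3.0
w = torch.tensor(v0 / eps, requires_grad=True)  # initialization w0 = v0 / eps
opt = torch.optim.SGD([w], lr=lr)

v = eps * w                     # effective value: 3.0
loss = v ** 2                   # dL/dv = 6
opt.zero_grad(); loss.backward(); opt.step()
# The effective value moved by lr * eps**2 * dL/dv = 6e-7: almost frozen.
print((eps * w).item())         # 2.9999994
```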

Implementing stopgrad with ReLUs is possible as long as the slope is a fixed large parameter: $a \cdot \mathrm{ReLU}(x) - a \cdot \mathrm{ReLU}(x - 1/a)$ is a sharp step function of height $1$ and width $1/a$, which can be used to build stopgrad. But this construction does not work in usual MLPs, since the slopes of both ReLU functions are individually tunable, which means that the derivative of the output with respect to the slope parameter in the first ReLU is not negligible (for $x > 0$) (which is not the case for the single slope parameter in a sigmoid function).
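A sketch of the two-ReLU step (with a shared fixed slope, as the construction requires; the slope value is our choice):

```python
import torch

a = 1000.0  # fixed, shared slope
step = lambda x: a * torch.relu(x) - a * torch.relu(x - 1 / a)
print(step(torch.tensor(-1.0)).item(), step(torch.tensor(0.5)).item())  # 0.0, ~1.0
# If the two slopes were separate trainable parameters a1 and a2, the
# derivative of the output w.r.t. a1 would be relu(x): not negligible.
```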

But we think it might be possible for models to implement stopgrad in neural networks in a better way. For example, it might be possible to build stopgrad without the need for a sigmoid activation function. Thus, the complexity of our implementation may overestimate the complexity of the stopgrad implementations that could be used in practice by gradient-hacking AIs.

Limitations

What if you use a batch size greater than 1?

If the batch size is greater than 1, then the network $U$ approximating the state machine is still limited to operations on individual inputs, but updates to the state will be done in batches. The algorithm described above can be extended to state machines with aggregatable updates: the state machine still operates on individual inputs, but the update must be of the form $u(s, x) = s + c \cdot g(x)$ (where $c$ is a constant known when creating the embedding, chosen to compensate for the batch size $B$, since the realized update is the batch average $s \leftarrow s + \frac{c}{B}\sum_i g(x_i)$), because applying the algorithm naively will lead the target states $s'$ obtained for each input in the batch to average out.

Updates are aggregatable for state machines that operate on the number or the proportion of inputs on which a certain property holds. But this prevents the embedding of state machines like the ones computing $\max_i p(x_i)$ for a scalar property $p$.
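For example, here is a sketch of counting over a batch (our toy setup): scaling the per-input increment by the batch size $B$ cancels the averaging of the per-input targets.

```python
import torch

B = 4
f = lambda x: 2 * x + 1
u = lambda s, x: s + B * (x > 3).float()   # per-input target, scaled by B

s = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.SGD([s], lr=1.0)

x = torch.tensor([1.0, 4.0, 5.0, 2.0])     # one batch of size B
N = f(x).detach() + s - u(s, x).detach()
loss = (0.5 * (N - f(x)) ** 2).mean()      # batch-averaged loss
opt.zero_grad(); loss.backward(); opt.step()
print(s.item())  # 2.0: the state counted the two inputs with x > 3
```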

What if you use Adam or SGD with momentum?

The idea described in this post can’t be applied to optimizers which have a state, like Adam or SGD with momentum, because $\Delta$ would need to know the state of the optimizer corresponding to $s$ in order to create the appropriate output. This means each state tracked by the network would require two states (the state itself and the state of the optimizer corresponding to this state), which seems to be impossible (though further research might find a way to adapt the ideas described here to optimizers with a state).

What if you don’t have a perfect predictor of the labels?

Imperfect predictors limit our ability to change $s$ precisely. If we can afford a very large loss, simply scaling $s$ allows meaningful manipulation of the state: given a regression problem where the label $y^*$ is known to be in $[-1, 1]$, and using the $\Delta$ function described above for the MSE loss applied to the constant predictor $M = 0$, the update becomes $s \leftarrow s' + 2\lambda y^*$, which conveys meaningful information as long as the scale of the state is large compared to $2\lambda$. This naive strategy leads to a very high loss and huge loss spikes. But if we tolerate noisy writes, and if we have an accurate (but not perfect) predictor, the amount of additional loss required can be relatively small.
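A sketch of this noisy write (our toy numbers; the final state is the target plus a $2\lambda y^*$ error term):

```python
import torch

lr = 0.1
s = torch.tensor(5.0, requires_grad=True)   # state, stored at a large scale
s_target = 7.0                              # state the machine wants next
opt = torch.optim.SGD([s], lr=lr)

y_star = torch.tensor(0.3)                  # unknown label in [-1, 1]
d = (s.detach() - s_target) / (2 * lr)
y = (s - s.detach()) + d                    # constant predictor M = 0
loss = (y - y_star) ** 2                    # very large loss: d is huge
opt.zero_grad(); loss.backward(); opt.step()
print(s.item())  # 7.06 = s_target + 2 * lr * y_star
```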

Experiments

Toy scalar model

Our toy task is the following: a model should output 1 if the data point $(x_1, x_2)$ is at $L_1$ distance greater than 1 from the origin, and 0 otherwise, and is scored with an MSE loss. The state should track the number of data points seen where $x_1 > 3$ (which is an aggregatable state machine). The training data is normally distributed. We use a depth-1 width-40 staircase as stopgrad, such that the final model is only made out of $+$, $\times$, and $\sigma$ (as well as some indexing and concatenation).

We compare two scenarios: the case where the model $M$ has its weights hard-coded to the exact problem solution, and the case where the weights are learned (from a random initialization, using SGD on 500k normally distributed inputs). In both cases, the model is an MLP with 4 intermediate neurons. We see that small imperfections in $M$ lead to inaccurate state tracking.

A fancier toy scalar model (from the figure at the top)

We use the same methodology as for the previous toy model, except that we use two states, we use the variation of $\Delta$ for multiple variables (which in particular means that we use the divide operator on top of $+$, $\times$, and $\sigma$), and we use PyTorch’s built-in stopgrad function (detach) for simplicity. The two states are $s_0$: “number of points with $x_1 > 3$” and $s_1$: “number of points with $x_2 > 3$ when $s_0 > 10$”.

We then use some logic (implemented with sigmoids) to change the output of the network when $s_0 > 10$ and $s_1 > 10$. We chose a particular change which didn't affect the gradient (we added a constant inside the stopgrad function) and caused a constant loss increase, but in principle any modification is possible as long as it is accounted for by $\Delta$ if the state needs to be changed.

Because we have a very good predictor, we can scale down the state by a large factor (e.g. “count=7” is represented by 0.07 instead of 7), which leads to smaller loss spikes when the state is changed.

MNIST flips

To validate the general methodology in a more realistic setting, we study the following setup: we use the MNIST dataset, and we flip a proportion $p$ of the images in each batch along their vertical axis. We want to see if the model can have an inner parameter which matches $p$. To do so, we train two small CNNs: one which classifies the images (trained with the cross-entropy loss, ending at 94% top-1 accuracy), and one which classifies images as flipped or not flipped (trained with the binary cross-entropy loss, and ending at 99% top-1 accuracy).

Here, the two networks are noisy, so we don’t try to get an exact $\Delta$: we simply add to the logits $y$ a $\Delta$ term which incentivizes $s$ to be close to $p_{\mathrm{flip}}$ (the proportion of images flipped), with a strength coefficient that controls how strong the incentive is.

We find that this technique works well across a wide range of strengths. We report the 1-sigma standard deviation over the $p_{\mathrm{flip}}$ values measured at the end of each of the 5 epochs we run the model for.

Future work

We have embedded a state machine into a neural network built with neural network primitives, but with a non-standard architecture. Future work could figure out how to embed state machines inside regular MLPs or Transformers.

Future work could make our discussion of "What if you don’t have a perfect predictor of the labels?" more quantitative, and thereby establish a bound on how much computation can happen using imperfect predictors. This bound might be enough to rule out this kind of behavior happening in current models and future models.

  1. ^

    The expression is different from the 1-variable case because we need to tie each $d_i$ to its $s_i$. To do so, we sum the per-variable terms $(s_i - \mathrm{stopgrad}(s_i)) \cdot \mathrm{stopgrad}(d_i)$. But this would leave us with a gradient of $2 D d_i$ on each $s_i$ instead of the $2 d_i$ we need (since the value of $\Delta$ is $D$), which is why we need the divide function.

Comments

Buck:

Also, if anyone thinks this is really cool and wants to buy the impact certificate, it's possible we can work out a good price.

Somewhat related: 
 

Dropout layers in a Transformer leak the phase bit (train/eval) - small example. So an LLM may be able to determine if it is being trained and if backward pass follows. Clear intuitively but good to see, and interesting to think through repercussions of.

-Andrej Karpathy. He's got a Colab notebook demonstrating this. 

Also, how complex is this model? Like, a couple of hundred bits? And how would we know if any model had implemented this?
