Just a nit-pick, but to me "AI growth-rate" suggests economic growth due to progress in AI, as opposed to simply technical progress in AI. I think "Excessive AI progress yields little socio-economic benefit" would make the argument more immediately clear.
This seems like a bad rule of thumb. If your social circle largely consists of people who have chosen to remain within the community, ignoring information from "outsiders" seems like a poor strategy for understanding issues with the community.
Even if OpenAI don't have the option to stop Bing Chat being released now, this would surely have been discussed during investment negotiations. It seems very unlikely this is being released without approval from decision-makers at OpenAI in the last month or so. If they somehow didn't foresee that something could go wrong and had no mitigations in place in case Bing Chat started going weird, that's pretty terrible planning.
Great post! This helps to clarify and extend lots of fuzzy intuitions I had around gradient hacking, so thanks! If anyone is interested in a different perspective / set of intuitions for how some properties of gradient descent affect gradient hacking, I wrote a small post about this here: https://www.lesswrong.com/posts/Nnb5AqcunBwAZ4zac/extremely-naive-gradient-hacking-doesn-t-work
I’d expect this to mainly be of use if the properties of gradient descent labelled 1, 4, 5 were not immediately obvious to you.
Hey! Not currently working on anything related to this, but would be excited to read anything related to this you are writing :))
Problem: we want to make it hard for ML systems (trained via SGD) to perform naive gradient hacking. By naive gradient hacking, I mean "being able to keep some weights of the network constant for an arbitrary step of SGD".
Solution: do "stochastic" regularisation, e.g. sample the amount of regularisation we perform randomly (could use quantum stuff if we want true randomness). This seems like it should make naive gradient hacking almost impossible - in order to keep some target weights unchanged, you'd have to match their positive contribution to the loss to the degree of regularisation. If the degree of regularisation is stochastic, the loss contribution must also be stochastic, which is not possible with a deterministic NN!
The question of how useful this is for preventing any gradient hacking depends on how stable the loss landscape is around some "deceptive / gradient hacking" minimum. It seems possible that the surrounding loss landscape could be pretty unstable to random perturbations?
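To make the intuition concrete, here is a toy sketch with a single scalar weight rather than a real network. Everything in it is an assumption for illustration: the names `sgd_step`, `hacker_grad` and `run`, the L2 form of the regulariser, and the uniform distribution for the regularisation strength. The "hacker" is modelled as a deterministic gradient contribution that exactly cancels the expected weight decay.

```python
import numpy as np

def sgd_step(w, grad_loss, lr, lam):
    """One SGD step on a scalar weight with L2 regularisation of strength lam."""
    return w - lr * (grad_loss + lam * w)

def hacker_grad(w, mean_lam):
    """Deterministic loss gradient chosen to cancel the *expected* decay term.

    Hypothetical stand-in for a naive gradient hacker: it can depend on the
    weight, but not on the random draw of lam.
    """
    return -mean_lam * w

def run(w0, lr, mean_lam, lam_sampler, steps, rng):
    """Run SGD and return the trajectory of the protected weight."""
    w = w0
    history = [w]
    for _ in range(steps):
        lam = lam_sampler(rng)  # regularisation strength for this step
        w = sgd_step(w, hacker_grad(w, mean_lam), lr, lam)
        history.append(w)
    return np.array(history)

rng = np.random.default_rng(0)
w0, lr, mean_lam, steps = 1.0, 0.1, 0.5, 50

# Fixed regularisation: the hacker cancels the decay exactly.
fixed = run(w0, lr, mean_lam, lambda r: mean_lam, steps, rng)

# Stochastic regularisation with the same mean: the cancellation fails
# on every step where the sampled lam differs from mean_lam.
stochastic = run(w0, lr, mean_lam,
                 lambda r: r.uniform(0.0, 2 * mean_lam), steps, rng)

fixed_drift = np.max(np.abs(fixed - w0))
stochastic_drift = np.max(np.abs(stochastic - w0))
print("max drift, fixed lam:     ", fixed_drift)
print("max drift, stochastic lam:", stochastic_drift)
```

With fixed regularisation the weight never moves, while with the sampled strength it drifts. This only illustrates the naive case described above; it says nothing about how a less naive hacker, or the stability of the surrounding loss landscape, would behave.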