Ah! Fair enough actually. No idea how I missed that. But to be fair, I don't know how much others would care about this when suspecting him, so it may be moot anyway.
But I think if you draw a risk-reward graph of insider trading at amount X vs. amount Y, trading 10 times as much isn't 10 times as suspicious, so by trading the smaller amount he would be acting irrationally.
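Toy numbers to spell that intuition out (all made up): say the profit is $10k at size X and $100k at size 10X, the penalty if caught is $1M, and the chance of getting caught is 1% at X but only 3% at 10X. Then

$$\mathrm{EV}(X) = 10{,}000 - 0.01\cdot 1{,}000{,}000 = 0, \qquad \mathrm{EV}(10X) = 100{,}000 - 0.03\cdot 1{,}000{,}000 = 70{,}000,$$

so a rational insider would trade big, and a small trade is (weak) evidence against it.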
But yeah, it's a fair argument that maybe he is acting irrationally precisely to avoid such suspicions.
It's an artifact of crossposting a Google Doc to LessWrong. It's fixed now.
Oh wow, thank you, I will edit tomorrow to reflect this and add an addendum to my application! That's crazy!
Cool paper! :) Are these results surprising at all to you?
It's a bit of a deepity but also a game-theoretic conclusion that "if DeepMind releases a paper, it is either something groundbreaking or something they will never use in production". The TITANS paper is about a year old now, and the MIRAS paper about 9 months old. You would think that some other frontier lab would have implemented it by now if it worked that well. I suspect a piece is missing here, or maybe the time between a pre-training run and deployment is just way longer than I think it is and all the frontier labs are looking at this.
To my understanding, TITANS requires you to do a backward pass during inference. This is probably a scaling disaster for inference as well, though maybe less so, since they do say it can be done efficiently and in parallel. It's unclear to me!
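To make "backward pass during inference" concrete, here's a minimal toy sketch, not the paper's actual method: the module sizes, loss, and learning rate are all made up, and the real TITANS update uses a surprise/momentum formulation with a parallelizable form that this ignores. The point is just that a small memory network gets a gradient update on each chunk while you're generating, instead of only during training.

```python
import torch

# Toy "neural memory": a tiny MLP that is updated at inference time.
# All sizes and hyperparameters here are illustrative, not from the paper.
memory = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.SiLU(), torch.nn.Linear(128, 64)
)
opt = torch.optim.SGD(memory.parameters(), lr=1e-2)

def inference_step(keys: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    """Read from memory, then update it with one gradient (backward) pass."""
    # "Surprise" proxy: how badly the memory reconstructs values from keys.
    loss = torch.nn.functional.mse_loss(memory(keys), values)
    opt.zero_grad()
    loss.backward()   # <-- the backward pass that happens during inference
    opt.step()
    return memory(keys)  # retrieval after the update

# Toy usage: one chunk of 8 tokens with 64-dim key/value projections.
k, v = torch.randn(8, 64), torch.randn(8, 64)
out = inference_step(k, v)
```

In a real serving stack you'd presumably be doing something like this per sequence or per user, which is exactly why it looks expensive next to a pure forward pass.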
I mean, you may just be right. TITANS+MIRAS could be in the latter category. Gemma 3 (which we know does not use TITANS), for example, probably benefits from a lot of RL environments, yet it absolutely sucks at this task. So it is possible that they are using it in production.
I guess, like all things, we will know for sure once the open Chinese labs start doing it.
This is very hard to answer. I just tried to write down basically everything. The noise kind of stopped after a while. It was a very strange sensation.
It's fiction; I'm vaguely talking about myself as "you", but I'm basically getting at some instinct here. Thanks for linking that, I hadn't seen it and it's kind of exactly what I was getting at.
Possibly yes, but I don't think that's a legitimate safety concern, since this can already be done very easily with other techniques. And for this technique you would need to model-diff with a non-refusal prompt of the bad concept in the first place, so the safety argument is moot. But it sounds like an interesting research question.
This makes sense, honestly. I guess you would still run the risk of a non-vegan seeing you do these things and going "ha! hypocrite!", but I don't know how real that risk is.
Thank you so much for this reply. Makes perfect sense.
Turns out the LW obsession with game theory matters in the real world after all :)