Thank you, Solenoid! The SSC podcast is the only reason I'm able to consume posts like Biological Anchors: A Trick That Might Or Might Not Work.
Thanks. It's similar in one sense, but (if I'm reading the paper right) a key difference is that in the MAML examples, the ordering of the meta-level and object level training is such that you still wind up optimizing hard for a particular goal. The idea here is that the two types of training function in opposition, as a control system of sorts, such that the meta-level training should make the model perform worse at the narrow type of task it was trained on.
That said, distribution shift is definitely still an issue. It seems like this bias might be less severe at the meta level than at the object level, but I have no idea.
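To make the "two training signals in opposition, like a control system" picture concrete, here's a minimal toy sketch. It is purely illustrative of the shape of the idea, not of any real algorithm from the paper: the parameter, learning rates, and losses are all made up for the example.

```python
# Toy illustration of object-level and meta-level training acting in
# opposition. The object level pushes a scalar parameter toward the
# narrow-task optimum; the meta level pushes back against specialization.
# All names and values here are hypothetical, chosen only for illustration.

w = 0.0
target = 2.0             # narrow-task optimum for w
lr_object, lr_meta = 0.1, 0.04

history = []
for step in range(200):
    # Object-level update: descend the narrow-task loss (w - target)**2.
    w -= lr_object * 2 * (w - target)
    # Meta-level update: ascend the same loss, i.e. resist specializing.
    w += lr_meta * 2 * (w - target)
    history.append(w)

# Because the descent step outweighs the ascent step, w still converges
# toward the narrow optimum, just more slowly; if lr_meta were larger
# than lr_object, the meta-level signal would win instead.
```

The equilibrium depends entirely on the relative strength of the two signals, which is the control-system intuition: neither level gets to optimize unopposed.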
Inspired by Eliezer's Lethalities post and Zvi's response:
Has there been any research or writing on whether we can train AI to generalize out of distribution?
I'm thinking, for example:
And, of course, we can keep piling layers on.
A few minutes of hasty Googling didn't turn up anything on this, but it seems pretty unlikely to be an original idea. But who knows! I wanted to get the idea written down and online before I had time to forget about it.
On the off chance it hasn't been thought to death yet by people smarter than myself, I would consider putting together a longer, less hastily written post on the idea.
MichaelStJules is right about what I meant. While it's true that preferring not to experience something doesn't necessarily imply that the thing is net-negative, it seems to me very strong evidence in that direction.
Entirely agree. There are certainly chunks of my life (as a privileged first-worlder) I'd prefer not to have experienced, and these generally seem less bad than an average period of the same duration in the life of a Holocaust prisoner. Given that animals are sentient, I'd put it at ~98% that their lives are net negative.
Good point; complex, real world questions/problems are often not Googleable, but I suspect a lot of time is spent dealing with mundane, relatively non-complex problems. Even in your example, I bet there is something useful to be learned from Googling "DIY cabinet instructions" or whatever.
Interesting, but I think you're way at the tail end of the distribution on this one. I bet I use Google more than 90%+ of people, but still not as much as I should.
Yes - if not heretical, at least interesting to other people! I'm going to lean into the "blogging about things that seem obvious to me" thing now.
Fair enough, this might be a good counterargument, though I'm very unsure. How much do mundane "brain workouts" matter? Tentatively, the lack of efficacy of brain-training programs like Lumosity would suggest that they might not be doing much.
Ah late to the party! This was a top-level post aptly titled "Half-baked alignment idea: training to generalize" that didn't get a ton of attention.