Very nice! I think work in this general direction is what is more or less needed if we want to survive. I just wanted to probe a bit when it comes to turning these methods into governance proposals. Do you see ways of creating databases/tests for objective measurement or how do you see this being used in policy and the real world? (Obviously, I get that understanding AI will be better for less doom, but I'm curious about your thoughts on the last implementation step.)
When I inevitably have to answer why I didn't duplicate myself to my future offspring, I will link them to this post; thank you for this gem.
Well, incubators and many smaller bets are usually the best approach in this type of situation since you want to black-swan farm, as you say. This is the approach I'm generally taking right now; similar to the pyramid scheme argument of getting more people to be EA, I think it's worth mentoring new people on alignment perspectives.
So some stuff you could do:
1. Start a local organisation where you help people understand alignment and help them get perspectives. (Note that this only works over a certain quality of thinking but getting this is also a numbers game IMO.)
2. Create and mentor isolated groups of independent communities that develop alignment theories. (You want people to be weird but not too weird, meaning that they should be semi-independent from the larger community.)
3. Theoretical alignment conference/competition with proposals optimised for being interesting in weird ways.
I'm trying to do this specifically in the Nordics as I think there are a bunch of smart people here who don't want to move to the "talent hubs", so my marginal impact might be higher. To be honest, I'm uncertain about how much to focus on this vs. developing more on my personal alignment theories, but I'm a strong pyramid scheme believer. I could however be wrong on this and I would love to hear some more takes on this.
I really like the overall picture that Guns, Germs and Steel presents. A book I believe complements it very well, if one is interested in the evolution of the human species, is The Secret of Our Success, which goes more into the mechanisms of cultural evolution and our current theories for why, for example, Tasmanians fell behind mainland Australia as much as they did.
Maybe a shit answer but becoming great at meditation allows you to be with emotions without trouble. Chores then become an opportunity to deepen your awareness and practice (more long term, but you cut the root of the tree).
Uhm, fuck yeah?
If someone had told me this 2 years ago I wouldn't have believed it; kinda feels like we're doing a Dr Strange on the timeline right now.
I feel like there's a point worth emphasizing here about myopia and the existing work around that, as OP stated. I don't think of myopia as generally promising because of FDT-style reasoning since a VNM-style agent would continually optimise for consistency over longer time periods.
Therefore, it seems to me that myopic reasoning RLHF will go towards the same failure modes as RLHF in the limit as the agent becomes more capable. (I'm happy to be shown I'm wrong here.) This also depends on questions such as to what extent the underlying base model will be a maximiser, and on the agency model of the base model (which OP also states).
If someone were to provide a convincing story that showed:
1. How this method could be used whilst counteracting deception
2. An example of how this would look from the inside of the AI
3. How the model itself doesn't converge towards reasoning RLHF
4. How this then itself is happening inside a generally capable AI
Then I might be convinced that it is a good idea.
Man, this was a really fun and exciting post! Thank you for writing it.
Maybe there's a connection to the FEP here? I remember Karl Friston saying something about how we can see morality as downstream from the FEP, which is (kinda?) used here, maybe?
I actually read fewer books than I used to. The 3x thing was that I listen to audiobooks at 3x the speed, so I read less non-fiction but at a faster pace.
Also weird but useful in my head is, for example, looking into population dynamics to understand alignment failures. When does ecology predict that mode collapse will happen inside of large language models? Understanding these areas and writing about them is weird but it could also be a useful bet for at least someone to take.
However, this also depends on how much doing the normal stuff is saturated. I would recommend trying to understand the problems and current approaches really well and then come up with ways of tackling them. To get the bits of information on how to tackle them you might want to check out weirder fields since those bits aren't already in the common pool of "alignment information" if that makes sense?
Generally a well-argued post; I enjoyed it even though I didn't agree with all of it. I do want to point out the bitter lesson when it comes to capabilities increase. On current priors, it seems that intelligence should be something that can solve a lot of tasks at the same time. This would point towards higher capabilities in individual AIs, especially once you add online learning to the mix. The AGI will not have a computational storage limit for the amount of knowledge it can have. The division of agents you propose will most likely be able to be made into the same agent; it's more about storage retrieval time here, and storing an activation module for "play chess" is something that will not be computationally intractable for an AGI to do.
This means that the most probable current path forward is towards highly capable general AI that generalises across tasks.