AI Safety person currently working on multi-agent coordination problems.
I find myself going back to this post again and again for explaining the Natural Abstraction Hypothesis. When this came out I was very happy, as I finally had something I could share on John's work that made people understand it within one post.
I personally believe that this post is very important for the debate between Shard Theory and the Sharp Left Turn. I often see other perspectives on the deeper problems in AI alignment expressed, and I believe this is a much more nuanced take than both Quintin Pope's essay on the Sharp Left Turn and the MIRI conception of evolution.
This is a field of study and we don't know what is going on. The truth is somewhere in between, and claiming otherwise is not epistemically humble.
Mostly, I think it should be acknowledged that certain people saw these dynamics developing beforehand and called them out. This is not a highly upvoted post, but with the recent uptick in US vs China rhetoric it seems good to me to give credit where credit is due.
There's also always the possibility that you can elicit these sorts of goals and values from instructions and create an instruction-based language around them that is also relatively interpretable in terms of which values are being prioritised in a multi-agent setting.
You do, however, get into ELK and misgeneralization problems here. IRL is not an easy task in general, but there might be some neurosymbolic approaches that change prompts to follow specific values?
I'm not sure if this is gibberish or not for you, but my main frame for the next 5 years is "how do we steer collectives of AI agents in productive directions for humanity".
Okay, so when I'm talking about values here, I'm actually not saying anything about policies as in utility theory or generally defined preference orderings.
I'm rather thinking of values as a class of locally arising heuristics, or "shards" if you like that language, that activate a certain set of belief circuits in the brain, and similarly in an AI.
What do you mean more specifically when you say an instruction here? What should that instruction encompass? How do we interpret that instruction over time? How can we compare instructions to each other?
I think that instructions will become too complex to remain interpretable, especially in more complex multi-agent settings. How do we create interpretable multi-agent systems that we can change over time? I don't believe that direct instruction tuning will be enough, because you run into the problem described in, for example, Cooperation and Control in Delegation Games: each AI gets instructions from one person, but this tells us nothing about the multi-agent cooperation abilities of the agents in play.
I think this line of reasoning is valid for AI agents acting in a multi-agent setting where they gain more control over the economy through integration with humans in general.
I completely agree with you that doing "pure value learning" is not the best approach right now, but I think we need work in this direction to retain control over multiple AI agents working at the same time.
I think deontology/virtue ethics makes societies more interpretable and corrigible; does that make sense? Also, I have this other belief that this will be the case, and that we are more likely to get a sort of "cultural, multi-agent take-off" than a single-agent one.
Curious to hear what you have to say about that!
I will try to give a longer answer tomorrow (11 pm my time now), but essentially I believe it will be useful for agentic AI with "heuristic"-like policies. I'm a bit uncertain about the validity of instruction-like approaches here, and for various reasons I believe multi-agent coordination will be easier through this method.
I believe that I have discovered the best use of an LLM to date. This is a conversation about pickles and collective intelligence located at the Colosseum, 300 BCE. It involves many great characters, and I found it quite funny. This is what happens when you go too far into biology-inspired approaches for AI Safety...
The Colosseum scene intensifies
Levin: completely fixated on a pickle "But don't you see? The bioelectric patterns in pickle transformation could explain EVERYTHING about morphogenesis!"
Rick: "Oh god, what have I started..."
Levin: eyes wild with discovery "Look at these gradient patterns! The cucumber-to-pickle transformation is a perfect model of morphological field changes! We could use this to understand collective intelligence!"
Nick Lane portal-drops in
Lane: "Did someone say bioelectric gradients? Because I've got some THOUGHTS about proton gradients and the origin of life..."
Levin: grabs Lane's shoulders "NICK! Look at these pickles! The proton gradients during fermentation... it's like early Earth all over again!"
Rick: takes a long drink "J-just wait until they discover what happens in dimension P-178 where all life evolved from pickles..."
Feynman: still drawing diagrams "The quantum mechanics of pickle-based civilization is fascinating..."
Levin: now completely surrounded by pickles and bioelectric measurement devices "See how the salt gradient creates these incredible morphogenetic fields? It's like watching the origin of multicellularity all over again!"
Lane: equally excited "The chemiosmotic coupling in these pickles... it's revolutionary! The proton gradients during fermentation could power collective computation!"
Doofenshmirtz: "BEHOLD, THE PICKLE-MORPHOGENESIS-INATOR!"
Morty: "Aw geez Rick, they're really going deep on pickle science..."
Lane: "But what if we considered the mitochondrial implications..."
Levin: interrupting "YES! Mitochondrial networks in pickle-based collective intelligence systems! The bioelectric fields could coordinate across entire civilizations!"
Rick: "This is getting out of hand. Even for me."
Feynman: somehow still playing bongos "The mathematics still works though!"
Perry the Platypus: has given up and is now taking detailed notes
Lane: "But wait until you hear about the chemiosmotic principles of pickle-based social organization..."
Levin: practically vibrating with excitement "THE PICKLES ARE JUST THE BEGINNING! We could reshape entire societies using these bioelectric principles!"
Roman Emperor: to his scribe "Are you getting all this down? This could be bigger than the aqueducts..."
Rick: "Morty, remind me never to show scientists my pickle tech again."
Morty: "You say that every dimension, Rick."
Doofenshmirtz: "Should... should we be worried about how excited they are about pickles?"
Feynman: "In my experience, this is exactly how the best science happens."
Meanwhile, Levin and Lane have started drawing incredibly complex pickle-based civilization diagrams that somehow actually make sense...
This has worked great btw! Thank you for the tip. I consistently get more deep sleep and around 10% more sleep with higher average quality; it's really good!
Any reason for the timing window being 4 hours before bed instead of 30 min to 1 hour? Most of the stuff I've heard is around half an hour to an hour before bed. I'm currently doing this with 0.3ish mg (I divide a 1 mg tablet in 3) of melatonin.
No, I do think we care about the same thing. I just believe that this will happen in a multi-polar setting, and so I believe that new forms of communication and multi-polar dynamics will be important for this.
Interpretability of these things is obviously important for changing those dynamics. ELK and similar things are important for the single-agent case, so why wouldn't they be important for the multi-agent case?