I think multi-multi is really hard to think about. One of the first steps I'm taking to get less confused about it is to scrutinize claims or opinions that I've encountered in the wild.
Critch & Krueger 2020 primarily discuss delegation, which they describe as "when some humans want something done, those humans can delegate responsibility for the task to one or more AI systems" (p. 19). They break delegation into three subproblems: comprehension ("the human ability to understand how an AI system works and what it will do"), instruction ("the human ability to convey instructions to an AI system regarding what it should do"), and control ("the human ability to retain or regain control of a situation involving an AI system, especially in cases where the human is unable to successfully comprehend or instruct the AI system via the normal means intended by the system’s designers"). The four flavors of delegation are single-(human principal)/single-AI-(system), single-(human principal)/multi-AI-(systems), multi-(human principals)/single-AI-(system), and multi-(human principals)/multi-AI-(systems).
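For readers who find it easier to see the taxonomy laid out mechanically, here is a minimal sketch in Python. The four flavor names and three subproblem names come from the text above; the dictionary layout, the function name `delegation_flavors`, and the idea of crossing flavors with subproblems into a grid are illustrative assumptions on my part, not anything taken from Critch & Krueger.

```python
from itertools import product

# Labels come from the text; this data layout is only an illustration.
PRINCIPALS = {"single": "one human principal", "multi": "multiple human principals"}
SYSTEMS = {"single": "one AI system", "multi": "multiple AI systems"}
SUBPROBLEMS = ("comprehension", "instruction", "control")


def delegation_flavors():
    """Return the four flavors, keyed as '<humans>-<AIs>' (hypothetical helper)."""
    return {
        f"{humans}-{ais}": (PRINCIPALS[humans], SYSTEMS[ais])
        for humans, ais in product(PRINCIPALS, SYSTEMS)
    }


flavors = delegation_flavors()
for name, (humans, ais) in flavors.items():
    print(f"{name}: {humans} delegating to {ais}")

# Assumption: each subproblem can be asked about within each flavor,
# e.g. "single-single control". This gives a 4 x 3 grid of cells.
grid = list(product(flavors, SUBPROBLEMS))
```

Under this framing, "single-single control" is one cell of the grid, and the multi-multi row is the one where footholds are hardest to find.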
Naturally, the traditional "alignment problem" is roughly single-single delegation, especially single-single control. Aspects of single-multi can be made sense of in light of Dafoe et al. 2020, and aspects of multi-single can be made sense of in light of Baum 2020, but it's difficult to find even minimal footholds in multi-multi.
Here are some notes from Critch, quoted verbatim, on what makes multi-multi problematic:
I almost never refer to a multi-multi alignment, because I don’t know what it means and it’s not clear what the values of multiple different stakeholders even is. What are you referring to when you say the values are this function? ... So, I don’t say multi-multi alignment a lot, but I do sometimes say single-single alignment to emphasize that I’m talking about the single stakeholder version. I think the multi-multi alignment concept almost doesn’t make sense. ... And the idea that there’s this thing called human values, that we’re all in agreement about. And there’s this other thing called AI that just has to do with the human value says. And we have to align the AI with human values. It’s an extremely simplified story. It’s got two agents and it’s just like one big agent called the humans. And then there’s this one big agent called AIs. And we’re just trying to align them. I think that is not the actual structure of the delegation relationship that humans and AI systems are going to have with respect to each other in the future. And I think alignment is helpful for addressing some delegation relationships, but probably not the vast majority.
So far I have collected two such claims from the wild, and I'd like your help expanding this list:
- Just solve single-single and ask the aligned superintelligence to solve multi-multi for us
- Multi-multi won't be suitable for technical people to work on until it’s been reduced to the theory of nonlinear systems (admittedly I've only found this opinion in one person so it's not really "in the memeplex")
If you have an opinion that may be yours alone, and so doesn't qualify as "in the memeplex", I hope you'll share it anyway! I'm also really happy for you to pontificate in the answers about intuitions you have or bottlenecks you see. In general, fitting your answer into my project is my problem, not yours. This is also an invitation to DM me about what confuses you about multi-multi, why you think it might be hard or easy, etc.