This seems to be a non-trivial problem even for current narrow AI, and it is much more problematic for strong NAI, yet I haven't seen it called out or named explicitly. I provide a quick literature review to explain why I think it's ignored in classic multi-agent system design (though I would be happy to be corrected).
It is unclear to me whether we can expect even introspective GAI to "just solve it" by noticing that it is a problem and working to fix it, given that people often don't seem to manage it.
One challenge for safe AI is the intrinsic difficulty of coordination problems. This includes coordination with humans, coordination with other AI systems, and potentially self-coordination when an AI system uses multiple agents. Unfortunately, typical system designs aim to maximize some fitness function, not to coordinate in ways that allow mutually beneficial interaction.
There is extensive literature on multi-agent coordination for task-based delegation and cooperation, dating back to at least the 1980 Contract Net Interaction Protocol, which allows autonomous agents to specify markets for interaction. This is useful, but doesn't avoid any of the problems with market failures and inadequate equilibria. (In fact, it probably induces such failures, since individual contracts are the atomic unit of interaction.) Extensive follow-up work on distributed consensus problems assumes that all agents are built to achieve consensus. This may be important for AI coordination, but requires clearly defined communication channels and well-understood domains. Work on Collaborative Intelligence is also intended to allow collaboration, but it is unclear whether there is substantive ongoing work in that area. Multiscale decision theory attempts to build multi-scale models for decision making, but is not tied explicitly to multiple agents.
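The announce / bid / award cycle behind contract-net style delegation can be sketched as below. This is a minimal illustration under simplifying assumptions, not the protocol specification: the names `Contractor` and `contract_net_round`, and the "lowest bid wins" award rule, are hypothetical. What it shows is the point made above: each task is awarded in isolation, so nothing in the loop checks whether the resulting set of contracts is a good outcome overall.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Contractor:
    """Hypothetical contractor agent that bids its private cost for a task."""
    name: str
    cost: Callable[[str], Optional[float]]  # returns None to decline the task

    def bid(self, task: str) -> Optional[float]:
        return self.cost(task)


def contract_net_round(task: str, contractors: List[Contractor]) -> Optional[Tuple[str, float]]:
    """One announce / bid / award cycle; the manager awards to the lowest bid."""
    # 1. Announcement: the manager broadcasts the task to every contractor.
    # 2. Bidding: each contractor either submits a bid or declines.
    bids: Dict[str, float] = {}
    for contractor in contractors:
        offer = contractor.bid(task)
        if offer is not None:
            bids[contractor.name] = offer
    if not bids:
        return None  # nobody is willing or able to take the task
    # 3. Award: the cheapest bid wins. The contract is settled on its own,
    #    with no view of other contracts or of system-wide outcomes.
    winner = min(bids, key=bids.get)
    return winner, bids[winner]
```

For example, with contractors bidding 5.0, 3.0, and declining, `contract_net_round("deliver", ...)` awards the task to the 3.0 bidder.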
What most of the literature shares is an assumption that agents will be designed for cooperation and collaboration. Inducing collaboration in agents not explicitly designed for that task is a very different problem, as is finding coordinated goals that can be achieved.
The obvious solution is to expect multi-agent systems to have agents with models of other agents that are sophisticated enough to build strategies that allow collaboration. In situations where multiple equilibria exist, moving from Pareto-dominated equilibria to better ones often requires coordination, which in turn requires each player to expect that initially costly moves toward the better equilibrium will be matched by the other players. As I argued earlier, there are fundamental limitations on the models of embedded agents that we don't have good solutions to. (If we find good ways to build embedded agents, we may also find good ways to design embedded agents for cooperation. This isn't obvious.)
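A stag hunt is the textbook case of this. The sketch below, with payoff numbers chosen only for illustration, enumerates the pure-strategy Nash equilibria of a two-player game, confirms that (stag, stag) Pareto-dominates (hare, hare), and shows why a lone move toward the better equilibrium is costly: a player who switches to stag while the other still hunts hare drops from 3 to 0.

```python
from itertools import product

# Illustrative stag-hunt payoffs: (row player's payoff, column player's payoff).
# Stag pays off only if both hunt it; hare is safe but worth less.
PAYOFFS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}
ACTIONS = ["stag", "hare"]


def pure_nash_equilibria(payoffs, actions):
    """Return action profiles where neither player gains by deviating alone."""
    equilibria = []
    for row, col in product(actions, actions):
        u_row, u_col = payoffs[(row, col)]
        row_is_best = all(payoffs[(alt, col)][0] <= u_row for alt in actions)
        col_is_best = all(payoffs[(row, alt)][1] <= u_col for alt in actions)
        if row_is_best and col_is_best:
            equilibria.append((row, col))
    return equilibria


def pareto_dominates(p, q):
    """True if payoffs p are at least as good as q for everyone and better for someone."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


equilibria = pure_nash_equilibria(PAYOFFS, ACTIONS)
# Both (stag, stag) and (hare, hare) are equilibria, and the first is better for both.
better = pareto_dominates(PAYOFFS[("stag", "stag")], PAYOFFS[("hare", "hare")])
# Cost of moving first: the row player's payoff if only they switch to stag.
unilateral_switch_payoff = PAYOFFS[("stag", "hare")][0]
```

Since (hare, hare) is stable, getting to (stag, stag) depends on each agent modeling the other well enough to trust that its costly move will be matched.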
Collaboration-by-design, on the other hand, is much easier. Unfortunately, AI-race dynamics make it seem unlikely. The other alternative is to explicitly design safety parameters, as Mobileye has done for self-driving cars with Responsibility-Sensitive Safety (RSS), which restricts the space of decisions a car may make so that its interactions with other cars stay within enforced limits. This seems intractable in domains where safety is ill-defined, and seems to require a much better understanding of corrigibility, at the very least.
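To make the RSS idea concrete, here is a minimal sketch of its longitudinal rule as I understand Mobileye's published model: the minimum safe gap behind a car in the same lane, assuming the rear car may accelerate for its whole response time before braking. The function names and the parameter values in the example are hypothetical; the point is only that "safe" is reduced to an explicit, checkable constraint that any driving policy must satisfy, which is exactly what seems hard to do in less well-defined domains.

```python
def rss_safe_longitudinal_distance(v_rear, v_front, rho,
                                   a_max_accel, a_min_brake, a_max_brake):
    """Minimum safe gap (m) behind a car in the same lane, per the RSS longitudinal rule.

    v_rear, v_front: speeds of the rear and front cars (m/s)
    rho: rear car's response time (s)
    a_max_accel: worst-case acceleration of the rear car during the response time
    a_min_brake: minimum braking the rear car is guaranteed to apply afterwards
    a_max_brake: maximum braking the front car might apply
    """
    v_after_response = v_rear + rho * a_max_accel
    gap = (v_rear * rho
           + 0.5 * a_max_accel * rho ** 2
           + v_after_response ** 2 / (2 * a_min_brake)
           - v_front ** 2 / (2 * a_max_brake))
    return max(gap, 0.0)  # the [x]_+ clamp: a negative value means any gap is safe


def is_longitudinally_safe(gap, **params):
    """A policy may only choose actions that keep the actual gap at or above the minimum."""
    return gap >= rss_safe_longitudinal_distance(**params)
```

With both cars at 20 m/s, a 1 s response time, 2 m/s² worst-case acceleration, 4 m/s² guaranteed braking for the rear car and 8 m/s² maximum braking for the front car, the required gap is 20 + 1 + 60.5 - 25 = 56.5 m.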
Perhaps there are approaches I haven't considered, or reasons to think this isn't a problem. Alternatively, perhaps a clearer framing of the problem already exists of which I am unaware, or the problem could be framed more clearly in a way I am not seeing. As a first step, progress on either front seems useful.