Hello,
As I read more from this forum and elsewhere about ethical AI and Decision Theory, I've started to imagine what future scenarios there could be if multiple AGIs are created simultaneously, by actors who are not coordinating with each other. We've seen similar issues come up in the history of technology, particularly computing, where multiple competing, mutually incompatible standards are created around the same time.
So imagine we have two AGIs, intelligent enough to communicate with humans and each other, but built on very different utility functions and different approaches to Decision Theory. From the perspective of each creator, their AGI is perfectly aligned with human morality, but because of different assumptions, philosophies, or religions, the two actors defined their outer alignment in two different ways.
This seems like a bad situation because (from what I understand) FDT works on the assumption that other actors use a utility function similar to its own. Thus, the two agents would each start from the assumption that the other uses the same utility function, which is false. This bad assumption would lead to miscommunication and conflict, as each agent comes to believe the other is acting immorally or defectively.
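A toy way to see the failure mode (my own illustration, not standard FDT machinery; the payoffs and the naive "the other agent mirrors me" prediction rule are made up):

```python
# Two agents in a 2x2 game. Each models the other as sharing its OWN payoffs,
# predicts the other's move on that basis, and best-responds to the prediction.

ACTIONS = ["A", "B"]

# Each agent's utility over joint outcomes, keyed as (my_action, your_action).
# The utility functions are deliberately mismatched.
u1 = {("A", "A"): 3, ("A", "B"): 0, ("B", "A"): 1, ("B", "B"): 1}
u2 = {("A", "A"): 0, ("A", "B"): 1, ("B", "A"): 1, ("B", "B"): 2}

def best_reply(my_u, predicted_other_action):
    # Maximize my utility given my prediction of the other agent's action.
    return max(ACTIONS, key=lambda a: my_u[(a, predicted_other_action)])

def choose(my_u):
    # Mistaken assumption: "the other agent has my utility function, so it
    # will pick the action that is symmetrically best by MY lights."
    predicted = max(ACTIONS, key=lambda a: my_u[(a, a)])
    return best_reply(my_u, predicted)

a1, a2 = choose(u1), choose(u2)
print(a1, a2)  # "A" "B": each mispredicted the other as a copy of itself
print(u1[(a1, a2)], u2[(a2, a1)])  # 0 1: both get less than they expected
```

Agent 1 expected the mirrored outcome worth 3 and got 0; agent 2 expected 2 and got 1. Both predictions were false, which is the seed of the "the other agent is defective" conclusion.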
This is oddly similar to the way human conflicts arise in the real world (through miscommunication), so an AGI capable of having that problem is, incidentally, more human-like.
What do you think would be the result? Has this thought experiment been entertained before?
Any common knowledge they can draw on can go into a coordinating agent (an adjudicator); all it needs is to be shared among the coalition, and it doesn't need to contain any particular data. The problem is verifying that all members of the coalition will follow the policy chosen by the coordinating agent, and common knowledge of source code is useful for that. But that source code could be just the trivial rule of always following the policy given by the coordinating agent.
One possible policy chosen by the adjudicator could be falling back to an unshared/private BATNA, aborting the bargain, or of course doing other things outside the scope of this particular bargain. These are not parts of the obey-the-adjudicator algorithm, but consequences of following it. So common knowledge of everything is not needed, only common knowledge of the adjudicator and its authority over the coalition. (This is also a possible way of looking at UDT, where a single agent in many possible states, acting through many possible worlds, coordinates among its variants.)
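The scheme above can be sketched minimally as follows (my own toy formalization; the "surplus" reports standing in for shared knowledge, the agent names, and the policy rule are all invented for illustration):

```python
# Each member's entire "public source code" is the trivial rule: follow
# whatever policy the adjudicator prescribes, else fall back to a private
# BATNA. The adjudicator sees only knowledge the coalition chose to share.

from typing import Dict, List

Policy = Dict[str, str]  # member name -> prescribed action

def obey_adjudicator(name: str, policy: Policy, private_batna: str) -> str:
    # The obey-the-adjudicator algorithm itself: falling back to the BATNA
    # is a consequence of the policy prescribing nothing, not a special case
    # the member decides on its own.
    return policy.get(name, private_batna)

def adjudicator(members: List[str], shared: Dict) -> Policy:
    # Chooses a joint policy from shared knowledge only. Stand-in rule:
    # prescribe "trade" for everyone if every member reported a positive
    # surplus from the bargain; otherwise prescribe nothing (abort).
    if all(shared.get(("surplus", m), 0) > 0 for m in members):
        return {m: "trade" for m in members}
    return {}  # abort the bargain; members revert to private BATNAs

members = ["agi_1", "agi_2"]
shared = {("surplus", "agi_1"): 5, ("surplus", "agi_2"): 2}
policy = adjudicator(members, shared)
actions = {m: obey_adjudicator(m, policy, private_batna="walk_away")
           for m in members}
print(actions)  # {'agi_1': 'trade', 'agi_2': 'trade'}
```

Note that the adjudicator never sees the private BATNAs: common knowledge of the adjudicator and of each member's trivial obey rule is enough, which is the point made above.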