That was also my idea at first, but then we have the Wagner Group one, so this is probably a false lead.
I really like that I'm seeing more discussion of "OK, even if we manage to avoid x-risk, what then?", e.g. the recent papers on AI-enabled coups and so on. To the point, however: I think the problem runs deeper. What I fear most is that by "Western values imbued in AGI" people mean "we create an everlasting upper class with no class mobility, because capital is all that matters and we freeze the capital structure; you will get UBI, so you should be grateful."
It probably makes sense to keep the capitalist structure between ASIs, but between humans? That seems like a very bad outcome to me (a "you will live in a pod and you will be happy" type of endgame for the masses).
Very cool paper!
I wonder whether it could have any applications in mundane model safety, for open-source models finetuned on a private dataset and shared via API. In particular, how much interesting stuff can you extract by finetuning the same base model on the harmless outputs of the "private model"?