Is the rational mindset an existential risk? It spreads the idea of arms races and the treacherous turn. If so, should we be encouraging less-than-rational world views to spread, and if so, which ones? And should we be coding them into our AI? You would probably want them to be hard to predict, so that they cannot be exploited easily.
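To illustrate the last point with a toy game (the payoffs and moves here are my own illustration, not from anyone's proposal): in matching pennies, any fully predictable policy can be exploited every round, while a randomising policy cannot, no matter how well the opponent knows its strategy.

```python
import random

def payoff(agent: str, opponent: str) -> int:
    """Matching pennies from the agent's side: -1 if matched, +1 otherwise."""
    return -1 if agent == opponent else 1

rounds = 10_000

# A fully predictable agent: an exploiter simply matches it every round.
det_total = sum(payoff("heads", "heads") for _ in range(rounds))

# A randomising agent: an opponent may know the *strategy* but cannot
# predict the *move*, so the expected payoff against any exploiter is 0.
rnd_total = sum(payoff(random.choice(["heads", "tails"]), "heads")
                for _ in range(rounds))

print(det_total / rounds)  # -1.0: the predictable agent is fully exploited
print(rnd_total / rounds)  # ~0.0: randomness removes the exploit
```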
If it is, it would still be worth preserving as an example of an insidious threat that should be guarded against, perhaps in a simulation for people to interact with.
You might still want the choice of which mindset to adopt to be as rational as possible, though. Decision making under deep uncertainty seems to allow you to deviate from the traditionally rational: you can evaluate plans under different world views and pick actions or plans that don't seem too bad under any of them. This could allow irrational world views to have a voice.
How irrational a world view do you want to accept under deep uncertainty? Perhaps you need to evaluate the outcomes from that world view and see if there might be something hidden that it is tapping into.
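A minimal sketch of the selection rule described above, under two assumptions of mine: that each world view can be reduced to a scoring function over plans, and that "doesn't seem too bad under them all" is operationalised as maximin (pick the plan whose worst score across world views is highest).

```python
from typing import Callable

Plan = str
WorldView = Callable[[Plan], float]  # higher score = better outcome

def robust_choice(plans: list[Plan], world_views: dict[str, WorldView]) -> Plan:
    """Return the plan with the best worst-case score across world views."""
    def worst_case(plan: Plan) -> float:
        return min(view(plan) for view in world_views.values())
    return max(plans, key=worst_case)

# Toy example: an "irrational" world view gets a voice because it can
# veto plans that look fine to the orthodox view.
world_views = {
    "orthodox":   lambda p: {"race": 0.7, "pause": 0.4, "hedge": 0.6}[p],
    "contrarian": lambda p: {"race": -1.0, "pause": 0.5, "hedge": 0.3}[p],
}
print(robust_choice(["race", "pause", "hedge"], world_views))  # -> "pause"
```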
I've been thinking a lot about identity (as in Paul Graham's "Keep Your Identity Small").
Specifically, I've been asking which identities might lead to safe development of AI, and trying to validate candidate answers by running these different activities:
1. Role-playing games where the participants are asked to take on specific identities and play through a scenario in which an AI has to be created.
2. Similar exercises where LLMs are prompted to take on particular roles and given agency to participate in the role-playing games too (a minimal sketch follows this list).
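Here is one way the setup in (2) could look. `call_llm` is a hypothetical stand-in for whatever chat-completion API you use, and the identity texts and scenario are illustrative, not from the post above.

```python
# Hypothetical sketch of prompting an LLM to adopt an identity inside a
# role-playing scenario. Replace call_llm with a real chat API client.

IDENTITIES = {
    "cosmic": (
        "You see humanity as a small part of a wider cosmos, which may "
        "contain both hostile and useful alien civilisations."
    ),
    "national": "You identify primarily with your nation and its interests.",
}

SCENARIO = "Your lab can deploy a powerful AI system this year. Decide what to do."

def call_llm(messages: list[dict]) -> str:
    """Stand-in for a real chat-completion call; swap in your API client."""
    raise NotImplementedError

def play_turn(identity: str, transcript: list[str]) -> str:
    """Ask the LLM, in character, for its next move in the game."""
    messages = [
        {"role": "system", "content": f"Stay in character. {IDENTITIES[identity]}"},
        {"role": "user",
         "content": SCENARIO + "\n\nTranscript so far:\n" + "\n".join(transcript)},
    ]
    return call_llm(messages)
```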
Has there been similar work before?
I'm particularly interested in cosmic identity, where you see humanity as a small part of a wider cosmos, including potentially hostile and potentially useful aliens. It has a number of properties that I think make it interesting, which I'll discuss in a full post, if people think this is worth exploring.
Are there other identities that people think should be explored?
The cosmic identity and related issues have been considered before; I even used them to make a conjecture about alignment. As for role-playing games, I doubt that they are actually useful, unless, of course, you mean something like Cannell's proposal.
As for "the idea of arms races and the treacherous turn", the AI-2027 team isn't worried about such a risk, they are more worried about the race itself causing the humans to do worse safety checks.
But slightly irrational actors might not race (especially if they know that other actors are slightly irrational in the same or a compatible way).
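A toy illustration of this, with made-up payoffs: in a prisoner's-dilemma-style race game, a small "irrational" aversion to racing, modelled here as a flat utility penalty, flips the dominant strategy from race to pause, and two actors who share the penalty both end up better off than under mutual racing.

```python
BASE = {  # (my_action, their_action) -> my payoff; illustrative numbers
    ("race", "race"): 1.0,
    ("race", "pause"): 3.0,
    ("pause", "race"): 0.0,
    ("pause", "pause"): 2.0,
}

def best_response(their_action: str, caution: float) -> str:
    """Best reply given a caution penalty subtracted from racing payoffs."""
    def utility(my_action: str) -> float:
        penalty = caution if my_action == "race" else 0.0
        return BASE[(my_action, their_action)] - penalty
    return max(["race", "pause"], key=utility)

for c in (0.0, 1.5):
    replies = {a: best_response(a, c) for a in ("race", "pause")}
    print(f"caution={c}: {replies}")
# caution=0.0: racing dominates -> both race, each getting 1.
# caution=1.5: pausing dominates -> if both actors share the caution,
# both pause and each gets 2, better than mutual racing.
```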
I think that there might be perverse incentives if identities or viewpoints get promoted in a legible fashion: incentives to hack that system rather than to do useful work.
So it might be good for identity promotion to be done in a way that is obfuscated or ineffable.