Inspired by the Founders Pledge and the 10% Pledge, we could invite people transitioning to an AI safety career to take an AI Safety Pledge. It could look something like this:
Note: this is a very early idea, not a fully fledged proposal. I am currently entertaining the idea of an AI Safety Pledge, but I'm not convinced that it's useful and desirable. I'm posting it here to see what opinions people have about it.
Theory of Change
Hopefully, this pledge will:
Some of the risks:
What do you think? Could this be useful? What assumptions would need to be true for this to be impactful? What could be a simple way of testing those assumptions?
Having a legible way to show you're doing this, and to state the principles of truth seeking, actually looking at impacts, etc., seems good. I'm less convinced by the pledge framing, which seems liable to bind your future self in ways that are overall unhealthy more often than not, but having something that you can sign up for that lets you sign out seems good. Especially with a bunch of focus on principles.
In particular, I expect that not feeling like you get to track, in the moment, whether it still feels right for you to keep working on this gets messy somewhat often.
I'd be more enthusiastic about carefully psychologically designed things near this in design space, and think this space is worth looking at. I'd be happy to have a list of people who are currently signed up for something vaguely like:
I am currently dedicated to trying to make AI go well for all sentient life. I wish to not hold false beliefs, and endeavour to understand and improve the consequences of my efforts.
Thanks for sharing your thoughts, Plex. I can imagine there are indeed some psychological considerations that go into designing an effective commitment mechanism.
Is there anything in particular that having access to a list of people signed up for a mission statement like that would enable you, or those people, to do?
Coordinate more easily? Track who's doing what? Especially if the list was kept fresh, e.g. by pinging them once a year or every 6 months to see if they're still focusing on this.