Luc Brinkman's Shortform

Luc Brinkman

This is a special post for quick takes by Luc Brinkman. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

AI Safety Pledge

Inspired by the Founders Pledge and the 10% Pledge, we could offer people who are transitioning to an AI safety career the option to take an AI Safety Pledge. It could look something like this:

I pledge to spend the coming years of my career on AI safety.
If I don’t manage to do so, for example because I can’t find a job in AI safety, I will donate 10% of my income to the AI safety movement.
If I later decide to move back into AI safety, I can reclaim my contributions to support my AI safety work.

Note: this is a very early idea, not a fully fledged proposal. I am currently intrigued by the idea of an AI Safety Pledge, but not yet convinced that it is useful or desirable. I'm posting it here to see what people think of it.

Theory of Change

Hopefully, this pledge will:

incentivize people to try harder to complete their career transition to AI safety
create more buy-in for holding people accountable to their good intentions, e.g. through a virtual career coach
decrease the effective income gap between non-safety work (whose income would now be reduced by 10%) and AI safety work
incentivize people to keep trying to move back into AI safety even if they aren’t successful initially

Some of the risks:

it might encourage earning to give, which 80,000 Hours currently considers less effective than contributing directly through one's career
it might be perceived by the outside world as cult-like
AI safety work may be hard to define

What do you think? Could this be useful? What assumptions would need to be true for this to be impactful? What could be a simple way of testing those assumptions?

Having a legible way to show you're doing this, and to state principles like truth-seeking and actually looking at impacts, seems good. I'm less convinced by the pledge framing: it seems liable to bind your future self in ways that are, more often than not, unhealthy overall. But having something you can sign up for that also lets you sign out seems good, especially with a strong focus on principles.

In particular, I expect that not feeling free to check in the moment whether it still feels right for you to keep working on this gets messy fairly often.

I'd be more enthusiastic about carefully, psychologically designed things nearby in design space, and I think this space is worth exploring. I'd be happy to have a list of people who are currently signed up for something vaguely like:

I am currently dedicated to trying to make AI go well for all sentient life. I wish to not hold false beliefs, and endeavour to understand and improve the consequences of my efforts.

Probably with some of the things in your suggestion listed as default paths.

Thanks for sharing your thoughts, Plex. I can imagine there are indeed psychological considerations in designing an effective commitment mechanism.

Is there anything in particular that having access to a list of people signed up for a mission statement like that would enable you, or those people, to do?

Coordinate more easily? Track who's doing what? Especially if the list were kept up to date, e.g. by pinging people every 6 or 12 months to check whether they're still focusing on this.