
posted on 2023-03-23 — also cross-posted on lesswrong, see there for comments

continue working on hard alignment! don't give up!

let's call "hard alignment" the ("orthodox") problem, historically worked on by MIRI, of preventing strong agentic AIs from pursuing things we don't care about by default and destroying everything of value to us on the way there. let's call "easy" alignment the set of perspectives where some of this model is wrong — some of the assumptions are relaxed — such that saving the world is easier or more likely to be the default.

what should one be working on? as always, the calculation consists of comparing, for each set of assumptions, how likely it is to hold and how much useful work one could get done if it does.

because of how AI capabilities are going, i've seen people start playing their outs: that is, acting as if alignment is easy, because if it's not, we're doomed anyway. but i think, in this particular case, this is wrong.

this is the lesson of dying with dignity and bracing for the alignment tunnel: we should be cooperating with our counterfactual selves and continuing to save the world in whatever way actually seems promising, rather than taking refuge in falsehood.

to me, p(hard) is big enough, and my hard-compatible plan seems workable enough, that it makes sense for me to continue to work on it.

let's not give up on the assumptions that are actually true. there is still work that can be done to generate some dignity under them.


unless explicitly mentioned, all content on this site was created by me; not by others or by AI.