Consistently optimizing for solving alignment (or any other difficult problem) is incredibly hard.

The first and most obvious obstacle is that you need to actually care about alignment and feel responsible for solving it. You cannot just ignore it or pass the buck; you need to aim for it.

If you care, you now have to go beyond the traditions you were raised in. Be willing to go beyond the tools that you were given, and to use them in inappropriate and weird ways. This is where most people who care about alignment tend to fail — they tackle it like a normal problem from a classical field of science and not an incredibly hard and epistemologically fraught problem.

If you manage to transcend your methodological upbringing, you might come up with a different, fitter approach to attack the problem — your own weird inside view. Yet beware becoming a slave to your own insight, a prisoner to your own frame; it’s far too easy to never look back and just settle in your new tradition.

If you cross all these obstacles, then whatever you do, even if it is not enough, you will be one of the few who adapt, who update, who course-correct again and again. Whatever the critics say, you’ll actually be doing your best.

This is the first filter. This is the first hard and crucial step to solve alignment: actually optimizing for solving the problem.

When we criticize each other in good faith about our approaches to alignment, we are acknowledging that we are not wedded to any approach or tradition, and that we’re both optimizing to solve the problem. This is a mutual acknowledgement that we have both passed the first filter.

Such criticism should thus be taken as a strong compliment: your interlocutor recognizes that you are actually trying to solve alignment and open to changing your ways.


Well written. Do you have a few examples of pivoting when it becomes apparent that the daily grind no longer optimizes for solving the problem?

Or also how to notice it?

Good point, noticing is always how one starts.

In a limited context, the first example that comes to mind is high performers in competitive sports and games. If they truly only give a shit about winning (and the best generally do), they will throw away their legacy approaches when they find a new one, however much it pains them.

That's the second filter, because "optimizing" has two parts: having a goal and maximising (or minimising) toward it.

First, one has to acknowledge that solving alignment is a goal. Many people do not recognize that it's a problem, because they assume smart robots will learn what love means and won't hurt us.

What you talked about in your post comes after this. When someone is walking towards the goalpost of alignment, they should realize that there might be multiple routes there and they should choose the quickest one, because only winning matters.