Progress and Prizes in AI Alignment

by Jacobian, 3rd Jan 2017


Edit: In case it's not obvious, I have done limited research on AI alignment organizations and the goal of my post is to ask questions from the point of view of someone who wants to contribute and is unsure how. Read down to the comments for some great info on the topic.

I was introduced to the topic of AI alignment when I joined this very forum in 2014. Two years and one "Superintelligence" later, I decided that I should donate some money to the effort. I knew about MIRI, and I looked forward to reading some research comparing their work to that of the other organizations working in this space. The only problem is... there really aren't any.

MIRI recently announced a new research agenda focused on "agent foundations". Yet even the Open Philanthropy Project, made up of people who at least share MIRI's broad worldview, can't decide whether that research direction is promising or useless. The Berkeley Center for Human-Compatible AI doesn't seem to have a specific research agenda beyond Stuart Russell. The AI100 Center at Stanford is just kicking off. That's it.

I think that there are two problems here:


  1. There's no way to tell which current organization is going to make the most progress towards solving AI alignment.
  2. These organizations are likely to be very similar to each other, not least because they practically share a zipcode. I don't think that MIRI and the academic centers will do the exact same research, but in the huge space of potential approaches to AI alignment they will likely end up pretty close together. Where's the group of evo-psych savvy philosophers who don't know anything about computer science but are working to spell out an approximation of universal human moral intuitions?

It seems like there's a meta-question that needs to be addressed, even before any work is actually done on AI alignment itself:


How do we evaluate progress in AI alignment?

Any answer to that question, even if not perfectly comprehensive or objective, will enable two things. First of all, it will allow us to direct money (and the best people) to the existing organizations where they'll make the most progress.

More importantly, it will enable us to open up the problem of AI alignment to the world and crowdsource it. 

For example, the XPrize Foundation is a remarkable organization that creates competitions around achieving goals beneficial to humanity, from lunar rovers to ecological monitoring. The prizes have two huge benefits over direct investment in solving an issue:


  1. They usually attract a lot more effort than what the prize money itself would pay for. Competitors often spend in aggregate 2-10 times the prize amount in their efforts to win the competition.
  2. The XPrizes attract a wide variety of creative entrants from around the world, because they only describe what needs to be done, not how.

So, why isn't there an XPrize for AI safety? You need very clear guidelines to create an honest competition, like "build the cheapest spaceship that can take 3 people to 100 km and be reused within 2 weeks". It doesn't seem like we're close to being able to formulate anything similar for AI alignment. It also seems that if anyone will have good ideas on the subject, it will be the people on this forum. So, what do y'all think?

Can we come up with creative ways to objectively measure some aspect of progress on AI safety, enough to set up a competition around it?