Progress and Prizes in AI Alignment

[-]jacob_cannell9y130

I came to a similar conclusion a while ago: it is hard to make progress in a complex technical field when progress itself is unmeasurable or worse ill-defined.

Part of the problem may be cultural: most working in the AI safety field have math or philosophy backgrounds. Progress in math and philosophy is intrinsically hard to measure objectively; success is mostly about having great breakthrough proofs/ideas/papers that are widely read and well regarded by peers. If your main objective is to convince the world, then this academic system works fine - ex: Bostrom. If your main objective is to actually build something, a different approach is perhaps warranted.

The engineering oriented branches of Academia (and I include comp sci in this) have a very different reward structure. You can publish to gain social status just as in math/philosophy, but if your idea also has commercial potential there is the powerful additional motivator of huge financial rewards. So naturally there is far more human intellectual capital going into comp sci than math, more into deep learning than AI safety.

In a sane world we'd realize that AI safety is a public good of immense value that probably requires large-scale coordination to steer the tech-economy towards solving. The X-prize approach essentially is to decompose a big long term goal into subgoals which are then contracted to the private sector.

The high level abstract goal for the Ansari XPrize was "to usher in a new era of private space travel". The specific derived prize subgoal was then "to build a reliable, reusable, privately financed, manned spaceship capable of carrying three people to 100 kilometers above the Earth's surface twice within two weeks".

AI safety is a huge bundle of ideas, but perhaps the essence could be distilled down to: "create powerful AI which continues to do good even after it can take over the world."

For the Ansari XPrize, the longer term goal of "space travel" led to the more tractable short term goal of "100 kilometers above the Earth's surface twice within two weeks". Likewise, we can replace "the world" in the AI safety example:

AI Safety "XPrize": create AI which can take over a sufficiently complex video game world but still tends to continue to do good according to a panel of human judges.

To be useful, the video game world should be complex in the right ways: it needs to have rich physics that agents can learn to control, it needs to permit/encourage competitive and cooperative strategic complexity similar to that in the real world, etc. So more complex than pac-man, but simpler than the Matrix. Something in the vein of a minecraft mod might have the right properties - but there are probably even more suitable open-world MMO games.

The other constraint on such a test is we want the AI to be superhuman in the video game world, but not our world (yet). Clearly this is possible - ala AlphaGo. But naturally the more complex the video game world is in the direction of our world, both the harder the goal becomes and the more dangerous.

Note also that the AI should not know that it is being tested; it shall not know it inhabits a simulation. This isn't likely to be any sort of problem for the AI we can actually build and test in the near future, but it becomes an interesting issue later on.

DeepMind is now focusing on Starcraft, OpenAI has universe, so we already on a related path. Competent AI for open-ended 3D worlds with complex physics - like minecraft - is still not quite here, but is probably realizable in just a few years.

[-]RyanCarey9y80

Nitpick:

MIRI recently announced a new research agenda focused on "agent foundations". Yet even the Open Philanthropy Project, made up of people who at least share MIRI's broad worldview, can't decide whether that research direction is promising or useless. The Berkeley Center for Human-Compatible AI doesn't seem to have a specific research agenda beyond Stuart Russell. The AI100 Center at Stanford is just kicking off. That's it.

There's also:

MIRI's Alignment for Advanced Machine Learning Systems agenda
The Concrete Problems agenda by Amodei, Olah and others
Russell's Research Priorities doc written with Dewey and Tegmark, covers probably more than his CHCAI Centre
Owain Evans, Stuart Armstrong and Eric Drexler at FHI
Paul Christiano's thinking on AI Control
OpenAI's safety team in formation
DeepMind's safety team
(?) Wei Dai's thinking on metaphilosophy and AI... He occasionally comments e.g. on AgentFoundations
Other Machine Learning researchers, e.g. safe exploration in Deep RL, transparency.

[-]David Scott Krueger (formerly: capybaralet)9y50

Looking at what they've produced to date, I don't really expect MIRI and CHCAI to produce that similar of work. I expect Russell's group to be more focused on value learning an corrigibility vs. reliable agent designs (MIRI).

[-]Rob Bensinger9y40

The Berkeley Center for Human-Compatible AI doesn't seem to have a specific research agenda beyond Stuart Russell.

Stuart Russell was the primary author of the FLI research priorities document, so I'd expect CHCAI's work to focus in on some of the problems sketched there. Based on CHCAI's publication page, their focus areas will probably include value learning, human-robot cooperation, and theories of bounded rationality. Right now, Russell's group is spending a lot of time on cooperative inverse reinforcement learning and corrigibility.

This slide from a recent talk by Critch seems roughly right to me: https://intelligence.org/wp-content/uploads/2017/01/hotspot-slide.png

So, why isn't there an XPrize for AI safety?

A prize fund is one of the main side-projects MIRI has talked about wanting to do for the last few years, if we could run a sufficiently large one -- more or less for the reasons you mention. Ideally the AI safety community would offer a diversity of prizes representing different views about what kinds of progress we'd be most excited by.

If funds for this materialize at some point, the main challenge will be that the most important conceptual breakthroughs right now involve going from mostly informal ideas to crude initial formalisms. This introduces some subjectivity in deciding whether the formalism really captures the key original idea, and also makes it harder for outside researchers to understand what kinds of work we're looking for. (MIRI's research team's focus is exactly on the parts of the problem that are hardest to design a prize for.) It's easier to come up with benchmarks in areas where there's already been a decent amount of technical progress, which would be quite valuable on its own, though it means potentially neglecting the most important things to work on.

[-]NellWatson9y40

There is in fact an X-Prize for AI, and it is general enough that it can apply to safety/alignment, though it's not purely aimed at this. http://ai.xprize.org

www.OpenEth.org is a 'let's build stuff and iterate whilst flailing around slightly as concepts develop' engineering-focussed project for Narrow AI / Smart Contract / Algorithm alignment and proofing, that is competing for this prize.

[-]itaibn09y20

You point out a problem: There's no way to tell which organizations are making progress on AI alignment, and there is little diversity in current approaches. You turn this into the question: How do we create prizes that incentivize progress in AI alignment? You're missing a step or two here.

I'd say the logic goes the opposite direction: because there are no clear objectively measurable targets that will improve AI safety, prizes are probably a bad idea for increasing the diversity and effectiveness of AI safety research.

[-]WilliamKiely9y10

How about having a prize for coming up with the very clear guidelines for an AI safety XPrize?

[-]JoshuaFox9y00

There certainly should be more orgs with different approaches. But possibly, CHCAI plays a role as the representative of MIRI in the mainstream academic world, and so from the perspective of goals, it is OK that the two are quite close.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

10

Progress and Prizes in AI Alignment

10

10