Announcing the AI Alignment Prize

cousin_it

Stronger-than-human artificial intelligence would be dangerous to humanity. It is vital that any such intelligence's goals be aligned with humanity's goals. Maximizing the chance that this happens is a difficult, important and under-studied problem.

To encourage more and better work on this important problem, we (Zvi Mowshowitz and Vladimir Slepnev) are announcing a $5000 prize for publicly posted work advancing understanding of AI alignment, funded by Paul Christiano.

This prize will be awarded based on entries gathered over the next two months. If the prize is successful, we will award further prizes in the future.

This prize is not backed by or affiliated with any organization.

Rules

Your entry must be published online for the first time between November 3 and December 31, 2017, and contain novel ideas about AI alignment. Entries have no minimum or maximum size. Important ideas can be short!

Your entry must be written by you, and submitted before 9pm Pacific Time on December 31, 2017. Submit your entries either as URLs in the comments below, or by email to apply@ai-alignment.com. We may provide feedback on early entries to allow improvement.

We will award $5000 to between one and five winners. The first place winner will get at least $2500. The second place winner will get at least $1000. Other winners will get at least $500.

Entries will be judged subjectively. Final judgment will be by Paul Christiano. Prizes will be awarded on or before January 15, 2018.

What kind of work are we looking for?

AI Alignment focuses on ways to ensure that future smarter-than-human intelligence will have goals aligned with the goals of humanity. Many approaches to AI Alignment deserve attention. These include technical and philosophical topics, as well as strategic research about related social, economic or political issues. A non-exhaustive list of technical and other topics can be found here.

We are not interested in research dealing with the dangers of existing machine learning systems that are commonly called AI but do not have smarter-than-human intelligence. These concerns are also understudied, but are not the subject of this prize except in the context of future smarter-than-human intelligence. We are also not interested in general AI research. We care about AI Alignment, which may or may not also advance the cause of general AI research.

I think that first you should elaborate on what you mean by "the goals of humanity". Do you mean majority opinion? In that case, one goal of humanity is to have a single world religious State, although there is disagreement on what that religion should be. Other goals of humanity include eliminating homosexuality and enforcing traditional patriarchal family structures.

Okay, I admit it: what I really think is that "goals of humanity" is a nonsensical phrase, especially when spoken by an American academic. It would be a little better to talk about values instead of goals, but not much better. The phrase still implies the unspoken belief that everyone would think like the speaker, if only they were smarter.

For example, not turning the universe into paperclips is a goal of humanity.

Not really. I don't care if that happens in the long run, and many people wouldn't.

I hope at least you care if everyone on Earth dies painfully tomorrow. We don't have any theory that would stop AI from doing that, and any progress toward such a theory would be on topic for the contest.

Sorry, I'm feeling a bit frustrated. It's as if the decade of LW never happened, and people snap back out of rationality once they go off the dose of Eliezer's writing. And the mode they snap back to is so painfully boring.

"tomorrow"

That's not conventionally considered to be "in the long run".

"We don't have any theory that would stop AI from doing that"

The primary reason is that we don't have any theory about what a post-singularity AI might or might not do. Doing some pretty basic decision theory focused on the corner cases is not "progress".

I do care about tomorrow, which is not the long run.

I don't think we should assume that AIs will have any goals at all, and I rather suspect they will not, in the same way that humans do not, only more so.

I considered submitting an entry basically saying this, but decided that it would be pointless since obviously it would not get any prize. Human beings do not have coherent goals even individually. Much less does humanity.

Is anyone going to the AAAI ethics/safety conference?

There were some references to LessWrong and value alignment there.

David Abel has provided a fairly nice summary set of notes: https://cs.brown.edu/people/dabel/blog/posts/misc/aaai_2018.pdf

I have an unpublished text on the topic. I will put a draft online in the next couple of weeks and submit it to the competition. I will add the URL here when it is ready.

Update: My entry is here: https://www.lesserwrong.com/posts/CDWsjQr8KDuj69fTJ/message-to-any-future-ai-there-are-several-instrumental
