The Alignment Newsletter #4: 04/30/18

Rohin Shah

Highlights

Reptile: A Scalable Meta-Learning Algorithm (Alex Nichol et al): I somehow forgot to include this in past emails, so I'm including it now. Reptile is an algorithm for meta-learning, and in this paper it is applied to few-shot classification, where given a few examples of different classes, you must learn a good classifier for those classes. Using a Taylor expansion, the authors show that MAML and Reptile have very similar gradients to first order in alpha, the step size. Their evaluation shows that for few-shot classification, Reptile and MAML perform similarly (though they do not evaluate on reinforcement learning tasks, as the MAML paper does).

My opinion: This seems like an important advance in meta-learning, as it is much more computationally efficient than MAML while still achieving similar levels of performance.
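
To make the Reptile summary above a bit more concrete, here is a minimal sketch of the update as I understand it: sample a task, run a few steps of SGD from the current initialization, then move the initialization a fraction of the way toward the adapted weights. This is not the authors' code. The toy task family (one-parameter linear regression with a random slope), the function names, and all hyperparameter values are assumptions made purely for illustration.

```python
import numpy as np

def sample_task(rng):
    """Hypothetical task family: y = slope * x, with slope drawn from [1, 3]."""
    slope = rng.uniform(1.0, 3.0)
    x = rng.uniform(-1.0, 1.0, size=10)
    return x, slope * x

def inner_sgd(w, x, y, alpha, k):
    """Run k gradient steps on mean squared error for the model y_hat = w * x."""
    for _ in range(k):
        grad = 2.0 * np.mean((w * x - y) * x)
        w = w - alpha * grad
    return w

def reptile(meta_steps=2000, k=5, alpha=0.1, epsilon=0.1, seed=0):
    """Meta-learn an initialization phi with the Reptile update rule."""
    rng = np.random.default_rng(seed)
    phi = 0.0  # meta-learned initialization (a single scalar weight here)
    for _ in range(meta_steps):
        x, y = sample_task(rng)
        w = inner_sgd(phi, x, y, alpha, k)  # adapt to the sampled task
        phi = phi + epsilon * (w - phi)     # move phi toward the adapted weight
    return phi

if __name__ == "__main__":
    print("learned initialization:", reptile())
```

In this toy setting the learned initialization should settle near the middle of the slope range, since that is the starting point from which a few SGD steps get closest to a randomly drawn task. Note that the outer update only needs the adapted weights, not gradients through the inner loop, which is where the computational savings over MAML come from.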

Technical AI alignment

Technical agendas and prioritization

Inverse Reinforcement Learning and Inferring Human Preference with Dylan Hadfield-Menell (Lucas Perry and Dylan Hadfield-Menell): A few weeks ago, Lucas Perry interviewed Dylan Hadfield-Menell on the FLI podcast about his research (which includes papers like Cooperative Inverse Reinforcement Learning, The Off-Switch Game, and Inverse Reward Design). They discussed a variety of topics including the motivations behind Dylan's research, future directions, thoughts on hard problems such as corrigibility and preference aggregation, etc.

My opinion: This is probably most useful for understanding the motivations behind many of Dylan's papers and how they all tie into each other, which can be hard to glean just from reading the papers. It also included many problem framings that I found useful and haven't seen elsewhere.

Learning human intent

Zero-Shot Visual Imitation (Deepak Pathak, Parsa Mahmoudieh et al)

Reward learning theory

Reward function learning: the value function and Reward function learning: the learning process (Stuart Armstrong): These posts introduce a theoretical framework for reward learning, where a reward learning algorithm is modeled as something that produces a probability distribution over reward functions given a history and current policy. With such a general notion of reward learning, it becomes hard to define the value function -- while we still want something like sum of expected rewards, it is no longer clear how to take an expectation over the reward function, given that the distribution over it can change over time. Most plausible ways of doing this lead to time-inconsistent decisions, but one works well. The second post turns to the learning process and analyzes properties that it would be nice to have. In the worst case, we can get quite pathological behavior, but of course we get to choose the learning algorithm so we can avoid worst-case behavior. In general, we would want our learning algorithm to be unriggable and/or uninfluenceable, but this is not possible when learning from humans since different policies on the AI's part will lead to it learning different rewards.

My opinion: I like this theoretical analysis that shows what could go wrong with processes that learn preferences. I did find it a bit hard to connect the ideas in this post with concrete reward learning algorithms (such as inverse reinforcement learning) -- it seems plausible to me that if I properly understood what the formal definitions of unriggable and uninfluenceable meant in the IRL setting, I wouldn't view them as desirable.

Forecasting

Double Cruxing the AI Foom debate (agilecaveman)

Critiques (Alignment)

The seven deadly sins of AI predictions (Rodney Brooks): This is an older article that I was sent recently. It argues against AI risk and the idea that we will have AGI soon, generally claiming that AGI proponents are mistaken about the current capabilities of AI and about how long it will take to make progress in AGI research.

My opinion: This article is aimed at refuting the superintelligent perfectly-rational agent model of AGI, and so feels to me like it's attacking a strawman of the argument for AI risk, but it does seem to me that many people do have beliefs similar to the ones he's arguing against. I partially agree with some of his criticisms and disagree with others, but overall I think most of the arguments are reasonable ones and worth knowing about.

Miscellaneous (Alignment)

Value Alignment Map (FLI): This is a gigantic graph of many of the concepts in the AI risk space. Each concept has a description and links to existing literature, and by clicking around in the map I found several interesting links I hadn't seen before.

My opinion: This map is so large that I can't actually use it to get a bird's-eye view of the entire space, but it seems quite useful for looking at a local region and as a starting point for exploring one particular aspect more deeply.

AI strategy and policy

AI in the UK: ready, willing and able?

EU Member States sign up to cooperate on Artificial Intelligence

AI capabilities

Reinforcement learning

A Study on Overfitting in Deep Reinforcement Learning (Chiyuan Zhang et al)

TDM: From Model-Free to Model-Based Deep Reinforcement Learning (Vitchyr Pong)

Deep learning

Reptile: A Scalable Meta-Learning Algorithm (Alex Nichol et al): Summarized in the highlights!

Phrase-Based & Neural Unsupervised Machine Translation (Guillaume Lample et al)

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms (Avital Oliver, Augustus Odena, Colin Raffel et al)

News

Summit on Machine Learning meets Formal Methods: This is a one-day summit on July 13 that is part of the Federated Logic Conference. This seems like an unusually good venue to think about how to apply formal methods to AI systems -- in particular I'm impressed by the list of speakers, which includes a variety of experts in both fields.
