Daniel Dewey on MIRI's Highly Reliable Agent Design Work

by lifelonglearner · 1 min read · 9th Jul 2017 · 5 comments


Machine Intelligence Research Institute (MIRI)
Personal Blog

My summary of the review:

  • HRAD is “work that aims to describe basic aspects of reasoning and decision-making in a complete, principled, and theoretically satisfying way”

  • The review further breaks HRAD down into MIRI’s research topics (philosophy, decision theory, logical uncertainty, and Vingean reflection).

  • MIRI’s position is that even minor mistakes in AI design could have catastrophic effects if these AI systems are very powerful.

  • HRAD, if fully complete, would give us a full description of AI systems such that we would be able to feel relatively certain that a given AI system would or would not cause catastrophic harm.

  • Daniel agrees that current formalisms to describe reasoning are incomplete or unsatisfying.

  • He also agrees that powerful AI systems have the potential to cause serious harm if mistakes are made in their design.

  • He agrees that we should have some kind of formalism that tells us whether or not an advanced AI system will be aligned.

  • However, Daniel assigns only a 10% chance that MIRI’s work in HRAD will be helpful in understanding current and future AI designs.

  • The reasons for this are:
    (1) MIRI’s HRAD work does not seem to be applicable to any current machine learning systems.
    (2) Mainstream AI researchers haven’t expressed much enthusiasm for MIRI’s HRAD work.
    (3) Daniel is more enthusiastic about Paul Christiano’s alternative approach and believes academic AI researchers are as well.

  • However, he believes MIRI researchers are “thoughtful, aligned with our values, and have a good track record.”

  • He believes HRAD is currently funding-constrained and somewhat neglected, so if it turns out to be the correct approach, supporting it now could be very beneficial.

Thanks for a more in-depth summary!

"From a portfolio approach perspective, a particular research avenue is worthwhile if it helps to cover the space of possible reasonable assumptions. For example, while MIRI’s research is somewhat controversial, it relies on a unique combination of assumptions that other groups are not exploring, and is thus quite useful in terms of covering the space of possible assumptions."


Short summary:

Daniel Dewey of Open Phil gives a very well-articulated summary (IMO) of MIRI's work on Highly Reliable Agent Design (HRAD), which is part of their Agent Foundations agenda and covers things like reasoning under uncertainty / decision theory / epistemology.

He examines the arguments behind why MIRI thinks it's worth pursuing, and he evaluates them against other possible approaches to AI safety.

This is an exceptionally well-reasoned article, I'd say. Particular props for the appropriate amount of uncertainty.