Two clarifications about "Strategic Background"


76


Rob Bensinger

I've talked to a few people who misunderstood important parts of the "strategic background" discussion in https://intelligence.org/2017/12/01/miris-2017-fundraiser/#3.

First, at least two people thought the 1-8 numbered list was "MIRI's organizational plan" rather than "what we'd be least surprised to see happen in the world, conditional on good outcomes." MIRI is trying to de-confuse itself about step 8 and help put AGI developers in a better position in the future to select for AGI designs that are alignment-conducive, not trying to develop AGI.

Second, at least two other people misread "minimal aligned AGI" as "minimally aligned AGI", and thought MIRI was saying that developers should do the bare minimum of alignment work and then deploy immediately; or they saw that we were recommending building "systems with the bare minimum of capabilities for ending the acute risk period" and thought we were recommending this as an alternative to working really hard to achieve highly reliable and robust systems.

The MIRI view isn't "rather than making alignment your top priority and working really hard to over-engineer your system for safety, try to build a system with the bare minimum of capabilities". It's: "in addition to making alignment your top priority and working really hard to over-engineer your system for safety, also build the system to have the bare minimum of capabilities".

The idea isn't that you can get away with cutting corners on safety by keeping the system weak; per Eliezer's security mindset posts, a good plan should work (or fail safely) if the system ends up being a lot smarter than intended. Instead, the idea is that shooting for the bare minimum of capabilities adds a lot of value if your fundamentals are really good. Every additional capability a developer needs to align adds some extra difficulty and additional points of failure, so developers should target minimality in addition to alignment.