Two clarifications about "Strategic Background"

by Rob Bensinger, 12th Apr 2018

I've talked to a few people who misunderstood important parts of the "strategic background" discussion in https://intelligence.org/2017/12/01/miris-2017-fundraiser/#3.

First, at least two people thought the 1-8 numbered list was "MIRI's organizational plan" rather than "what we'd be least surprised to see happen in the world, conditional on good outcomes." MIRI is not trying to develop AGI. It is trying to deconfuse itself about step 8 and to help put future AGI developers in a better position to select for AGI designs that are alignment-conducive.

Second, at least two other people misread "minimal aligned AGI" as "minimally aligned AGI" and thought MIRI was saying that developers should do the bare minimum of alignment work and then deploy immediately. Or they saw that we were recommending building "systems with the bare minimum of capabilities for ending the acute risk period" and thought we were recommending this as an alternative to working really hard to achieve highly reliable and robust systems.

The MIRI view isn't "rather than making alignment your top priority and working really hard to over-engineer your system for safety, try to build a system with the bare minimum of capabilities". It's: "in addition to making alignment your top priority and working really hard to over-engineer your system for safety, also build the system to have the bare minimum of capabilities".

The idea isn't that you can get away with cutting corners on safety by keeping the system weak; per Eliezer's security mindset posts, a good plan should work (or fail safely) if the system ends up being a lot smarter than intended. Instead, the idea is that shooting for the bare minimum of capabilities adds a lot of value if your fundamentals are really good. Every additional capability a developer needs to align adds some extra difficulty and additional points of failure, so developers should target minimality in addition to alignment.

Would you think that the following approach would fit within "in addition to making alignment your top priority and working really hard to over-engineer your system for safety, also build the system to have the bare minimum of capabilities" and possibly work, or would you think that it would be hopelessly doomed?

  • Work hard on designing the system to be safe
  • But there's some problem left over that you haven't been able to fully solve, and think will manifest at a certain scale (level of intelligence/optimization power/capabilities)
  • Run the system, but limit scale to stay well within the range where you expect it to behave well

I think you're probably in a really bad state if you have to lean very much on that with your first AGI system. You want to build the system to not optimize any harder than absolutely necessary, but you also want the system to fail safely if it does optimize a lot harder than you were expecting.

The kind of AGI approach that seems qualitatively like "oh, this could actually work" to me involves more "the system won't even try to run searches for solutions to problems you don't want solved" and less "the system tries to find those solutions but fails because of roadblocks you put in the way (e.g., you didn't give it enough hardware)".

That post says "We plan to say more in the future about the criteria for strategically adequate projects in 7a" and also "A number of the points above require further explanation and motivation, and we’ll be providing more details on our view of the strategic landscape in the near future". As far as I can tell, MIRI hasn't published any further explanation of this strategic plan (I expected there to be something in the 2018 update, but that post talks about other things). Is MIRI still planning to say more about its strategic plan in the near future, and if so, is there a concrete timeframe (e.g. "in a few months", "in a year", "in two years") for publishing such an explanation?

Oops, I saw your question when you first posted it but forgot to get back to you, Issa. (Issa re-asked here.) My apologies.

I think there are two main kinds of strategic thought we had in mind when we said "details forthcoming":

  1. Thoughts on MIRI's organizational plans, deconfusion research, and how we think MIRI can help play a role in improving the future. This is covered by our November 2018 update post, https://intelligence.org/2018/11/22/2018-update-our-new-research-directions/.
  2. High-level thoughts on things like "what we think AGI developers probably need to do" and "what we think the world probably needs to do" to successfully navigate the acute risk period.

Most of the stuff discussed in "strategic background" is about 2: not MIRI's organizational plan, but our model of some of the things humanity likely needs to do in order for the long-run future to go well. Some of these topics are reasonably sensitive, and we've gone back and forth about how best to talk about them.

Within the macrostrategy / "high-level thoughts" part of the post, the densest part was maybe 7a. The criteria we listed for a strategically adequate AGI project were "strong opsec, research closure, trustworthy command, a commitment to the common good, security mindset, requisite resource levels, and heavy prioritization of alignment work".

With most of these it's reasonably clear what's meant in broad strokes, though there's a lot more I'd like to say about the specifics. "Trustworthy command" and "a commitment to the common good" are maybe the most opaque. By "trustworthy command" we meant things like:

  • The organization's entire command structure is fully aware of the difficulty and danger of alignment.
  • Non-technical leadership can't interfere and won't object if technical leadership needs to delete a code base or abort the project.

By "a commitment to the common good" we meant a commitment to both short-term goodness (the immediate welfare of present-day Earth) and long-term goodness (the achievement of transhumanist astronomical goods), paired with a real commitment to moral humility: not rushing ahead to implement every idea that sounds good to them.

We still plan to produce more long-form macrostrategy exposition, but given how many times we've failed to word our thoughts in a way we felt comfortable publishing, and given how much other stuff we're also juggling, I don't currently expect us to have any big macrostrategy posts in the next 6 months. (Note that I don't plan to give up on trying to get more of our thoughts out sooner than that, if possible. We'll see.)

Thanks! I have some remaining questions:

  • The post says "On our current view of the technological landscape, there are a number of plausible future technologies that could be leveraged to end the acute risk period." I'm wondering what these other plausible future technologies are. (I'm guessing things like whole brain emulation and intelligence enhancement count, but are there any others?)
  • One of the footnotes says "There are other paths to good outcomes that we view as lower-probability, but still sufficiently high-probability that the global community should allocate marginal resources to their pursuit." What do some of these other paths look like?
  • I'm confused about the differences between "minimal aligned AGI" and "task AGI". (As far as I know, this post is the only place MIRI has used the term "minimal aligned AGI", so I have very little to go on.) Is "minimal aligned AGI" the larger class, and "task AGI" the specific kind of minimal aligned AGI that MIRI has decided is most promising? Or is the plan to first build a minimal aligned AGI, which then builds a task AGI, which then performs a pivotal task/helps build a Sovereign?
    • If the latter, then it seems like MIRI has gone from a one-step view ("build a Sovereign"), to a two-step view ("build a task-directed AGI first, then go for Sovereign"), to a three-step view ("build a minimal aligned AGI, then task AGI, then Sovereign"). I'm not sure why "three" is the right number of stages (why not two or four?), and I don't think MIRI has explained this. In fact, I don't think MIRI has even explained why it switched to the two-step view in the first place. (Wei Dai made this point here.)

Imagine that you hadn't figured out FDT, but you did have CDT and EDT. Would building an AI that defers to humans whenever CDT and EDT give different answers be an example of "minimal but aligned"?

If we take artificial addition too seriously, it's hard to imagine what a "minimal arithmetician" looks like. If you understand arithmetic, you can make a perfect system; if you don't, the system will be hopeless. I would not be surprised if there was some simple "algorithm of maximally efficient intelligence" and we built it. No foom; the AI starts at the top. All the ideas about rates of intelligence growth are nonsense. We built a linear-time AIXI.