I've talked to a few people who misunderstood important parts of the "strategic background" discussion in https://intelligence.org/2017/12/01/miris-2017-fundraiser/#3.
First, at least two people thought the 1-8 numbered list was "MIRI's organizational plan" rather than "what we'd be least surprised to see happen in the world, conditional on good outcomes." MIRI isn't trying to develop AGI. It's trying to de-confuse itself about step 8 and to help put future AGI developers in a better position to select for alignment-conducive AGI designs.
Second, at least two other people misread "minimal aligned AGI" as "minimally aligned AGI", and thought MIRI was saying that developers should do the bare minimum of alignment work and then deploy immediately; or they saw that we were recommending building "systems with the bare minimum of capabilities for ending the acute risk period" and thought we were recommending this as an alternative to working really hard to achieve highly reliable and robust systems.
The MIRI view isn't "rather than making alignment your top priority and working really hard to over-engineer your system for safety, try to build a system with the bare minimum of capabilities". It's: "in addition to making alignment your top priority and working really hard to over-engineer your system for safety, also build the system to have the bare minimum of capabilities".
The idea isn't that you can get away with cutting corners on safety by keeping the system weak; per Eliezer's security mindset posts, a good plan should work (or fail safely) if the system ends up being a lot smarter than intended. Instead, the idea is that shooting for the bare minimum of capabilities adds a lot of value if your fundamentals are really good. Every additional capability a developer needs to align adds some extra difficulty and additional points of failure, so developers should target minimality in addition to alignment.
That post says "We plan to say more in the future about the criteria for strategically adequate projects in 7a" and also "A number of the points above require further explanation and motivation, and we’ll be providing more details on our view of the strategic landscape in the near future". As far as I can tell, MIRI hasn't published any further explanation of this strategic plan (I expected there to be something in the 2018 update but that post talks about other things). Is MIRI still planning to say more about its strategic plan in the near future, and if so, is there a concrete timeframe (e.g. "in a few months", "in a year", "in two years") for publishing such an explanation?
Oops, I saw your question when you first posted it but forgot to get back to you, Issa. (Issa re-asked here.) My apologies.
I think there are two main kinds of strategic thought we had in mind when we said "details forthcoming":
Most of the stuff discussed in "strategic background" is about 2: not MIRI's organizational plan, but our model of some of the things humanity likely needs to do in order for the long-run future to go well. Some of these topics are reasonably sensitive, and we've gone back and forth about how best to talk about them.
Within the macrostrategy / "high-level thoughts" part of the post, the densest part was maybe 7a. The criteria we listed for a strategically adequate AGI project were "strong opsec, research closure, trustworthy command, a commitment to the common good, security mindset, requisite resource levels, and heavy prioritization of alignment work".
With most of these it's reasonably clear what's meant in broad strokes, though there's a lot more I'd like to say about the specifics. "Trustworthy command" and "a commitment to the common good" are maybe the most opaque. By "trustworthy command" we meant things like:
By "a commitment to the common good" we meant a commitment to both short-term goodness (the immediate welfare of present-day Earth) and long-term goodness (the achievement of transhumanist astronomical goods), paired with a real commitment to moral humility: not rushing ahead to implement every idea that sounds good to them.
We still plan to produce more long-form macrostrategy exposition, but given how many times we've failed to word our thoughts in a way we felt comfortable publishing, and given how much other stuff we're also juggling, I don't currently expect us to have any big macrostrategy posts in the next 6 months. (Note that I don't plan to give up on trying to get more of our thoughts out sooner than that, if possible. We'll see.)
Thanks! I have some remaining questions:
Would you think that the following approach would fit within "in addition to making alignment your top priority and working really hard to over-engineer your system for safety, also build the system to have the bare minimum of capabilities" and possibly work, or would you think that it would be hopelessly doomed?
I think you're probably in a really bad state if you have to lean very much on that with your first AGI system. You want to build the system to not optimize any harder than absolutely necessary, but you also want the system to fail safely if it does optimize a lot harder than you were expecting.
The kind of AGI approach that seems qualitatively like "oh, this could actually work" to me involves more "the system won't even try to run searches for solutions to problems you don't want solved" and less "the system tries to find those solutions but fails because of roadblocks you put in the way (e.g., you didn't give it enough hardware)".
Imagine that you hadn't figured out FDT, but you did have CDT and EDT. Would building an AI that defers to humans whenever CDT and EDT recommend different actions be an example of "minimal but aligned"?
If we take artificial addition too seriously, it's hard to imagine what a "minimal arithmetician" looks like. If you understand arithmetic, you can make a perfect system; if you don't, the system will be hopeless. I would not be surprised if there were some simple "algorithm of maximally efficient intelligence" and we built it. No foom; AI starts at the top. All the ideas about rates of intelligence growth are nonsense. We built a linear-time AIXI.