Nate Soares has written up a post that discusses MIRI's new research directions: a mix of reasons why we're pursuing them, reasons why we're pursuing them the way we are, and high-level comparisons to Agent Foundations.

Read the full (long!) post here.


I was recently thinking about focus. Some examples:

This tweet:

The internet provides access to an education that the aristocracy of old couldn't have imagined.
It also provides the perfect attack vector for marketers to exploit cognitive vulnerabilities and dominate your attention.
A world-class education is free for the undistractable.

Sam Altman's recent blogpost on How to Be Successful has the following two commands:

3. Learn to think independently
6. Focus

(He often talks about how the main task a startup founder has is to pick the 2 or 3 things to focus on that day, out of the 100+ things vying for their attention.)

And I found this old quote by the mathematician Grothendieck on Michael Nielsen's blog.

In those critical years I learned how to be alone. [But even] this formulation doesn't really capture my meaning. I didn't, in any literal sense, learn to be alone, for the simple reason that this knowledge had never been unlearned during my childhood. It is a basic capacity in all of us from the day of our birth. However, these three years of work in isolation [1945-1948], when I was thrown onto my own resources, following guidelines which I myself had spontaneously invented, instilled in me a strong degree of confidence, unassuming yet enduring, in my ability to do mathematics, which owes nothing to any consensus or to the fashions which pass as law. By this I mean to say: to reach out in my own way to the things I wished to learn, rather than relying on the notions of the consensus, overt or tacit, coming from a more or less extended clan of which I found myself a member, or which for any other reason laid claim to be taken as an authority. This silent consensus had informed me both at the lycee and at the university, that one shouldn't bother worrying about what was really meant when using a term like "volume" which was "obviously self-evident", "generally known," "unproblematic" etc... it is in this gesture of "going beyond" to be in oneself rather than the pawn of a consensus, the refusal to stay within a rigid circle that others have drawn around one -- it is in this solitary act that one finds true creativity. All other things follow as a matter of course.
Since then I’ve had the chance in the world of mathematics that bid me welcome, to meet quite a number of people, both among my "elders" and among young people in my general age group who were more brilliant, much more ‘gifted’ than I was. I admired the facility with which they picked up, as if at play, new ideas, juggling them as if familiar with them from the cradle -- while for myself I felt clumsy, even oafish, wandering painfully up an arduous track, like a dumb ox faced with an amorphous mountain of things I had to learn (so I was assured) things I felt incapable of understanding the essentials or following through to the end. Indeed, there was little about me that identified the kind of bright student who wins at prestigious competitions or assimilates almost by sleight of hand, the most forbidding subjects.
In fact, most of these comrades who I gauged to be more brilliant than I have gone on to become distinguished mathematicians. Still from the perspective of thirty or thirty five years, I can state that their imprint upon the mathematics of our time has not been very profound. They've done all things, often beautiful things in a context that was already set out before them, which they had no inclination to disturb. Without being aware of it, they've remained prisoners of those invisible and despotic circles which delimit the universe of a certain milieu in a given era. To have broken these bounds they would have to rediscover in themselves that capability which was their birthright, as it was mine: The capacity to be alone.

Overall, it made me update toward thinking that MIRI's decision to be closed-by-default is quite sensible. The following section seems trivially correct from this point of view.

Focus seems unusually useful for this kind of work
There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.
Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way to factor the phenomenon, instead of repeatedly translating ideas into the current language others are using. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.
Once we realized this was going on, we realized that in retrospect, we may have been ignoring common practice, in a way. Many startup founders have reported finding stealth mode, and funding that isn’t from VC outsiders, tremendously useful for focus. For this reason, we’ve also recently been encouraging researchers at MIRI to worry less about appealing to a wide audience when doing public-facing work. We want researchers to focus mainly on whatever research directions they find most compelling, make exposition and distillation a secondary priority, and not worry about optimizing ideas for persuasiveness or for being easier to defend.

I think that closed-by-default is a very bad strategy from the perspective of outreach and of building a field of AI alignment. But I realise that MIRI is explicitly and wholly focusing on making research progress, for at least the coming few years, and I think the post and the decisions it describes make a lot of sense from that perspective.

Our impression is indeed that well-targeted outreach efforts can be highly valuable. However, attempts at outreach/influence/field-building seem to us to currently constitute a large majority of worldwide research activity that’s motivated by AGI safety concerns,[10] such that MIRI’s time is better spent on taking a straight shot at the core research problems. Further, we think our own comparative advantage lies here, and not in outreach work.[11]

And here are the footnotes:

[10] In other words, many people are explicitly focusing only on outreach, and many others are selecting technical problems to work on with a stated goal of strengthening the field and drawing others into it.
[11] This isn’t meant to suggest that nobody else is taking a straight shot at the core problems. For example, OpenAI’s Paul Christiano is a top-tier researcher who is doing exactly that. But we nonetheless want more of this on the present margin.

Edited: Added a key section at the end.

Another interesting idea for discussion is the value of making a long-term commitment to keeping research within a contained environment (i.e. what the OP calls 'nondisclosed-by-default').

There are a bunch of arguments. Many seem straightforward to me (early research doesn't translate well into papers at all; it might accidentally turn out to move capabilities forward, and you want to see it develop for a while to be sure it won't; etc.), but this one surprised me more, and I'd be interested to know whether it resonates or is dissonant with others' experiences.

We need our researchers to not have walls within their own heads
We take our research seriously at MIRI. This means that, for many of us, we know in the back of our minds that deconfusion-style research could sometimes (often in an unpredictable fashion) open up pathways that can lead to capabilities insights in the manner discussed above. As a consequence, many MIRI researchers flinch away from having insights when they haven’t spent a lot of time thinking about the potential capabilities implications of those insights down the line—and they usually haven’t spent that time, because it requires a bunch of cognitive overhead. This effect has been evidenced in reports from researchers, myself included, and we’ve empirically observed that when we set up “closed” research retreats or research rooms,13 researchers report that they can think more freely, that their brainstorming sessions extend further and wider, and so on.
This sort of inhibition seems quite bad for research progress. It is not a small area that our researchers were (un- or semi-consciously) holding back from; it’s a reasonably wide swath that may well include most of the deep ideas or insights we’re looking for.
At the same time, this kind of caution is an unavoidable consequence of doing deconfusion research in public, since it’s very hard to know what ideas may follow five or ten years after a given insight. AI alignment work and AI capabilities work are close enough neighbors that many insights in the vicinity of AI alignment are “potentially capabilities-relevant until proven harmless,” both for reasons discussed above and from the perspective of the conservative security mindset we try to encourage around here.
In short, if we request that our brains come up with alignment ideas that are fine to share with everybody—and this is what we’re implicitly doing when we think of ourselves as “researching publicly”—then we’re requesting that our brains cut off the massive portion of the search space that is only probably safe.
If our goal is to make research progress as quickly as possible, in hopes of having concepts coherent enough to allow rigorous safety engineering by the time AGI arrives, then it seems worth finding ways to allow our researchers to think without constraints, even when those ways are somewhat expensive.
Focus seems unusually useful for this kind of work
There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.
Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way to factor the phenomenon, instead of repeatedly translating ideas into the current language others are using [emphasis added]. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.
Once we realized this was going on, we realized that in retrospect, we may have been ignoring common practice, in a way. Many startup founders have reported finding stealth mode, and funding that isn’t from VC outsiders, tremendously useful for focus. For this reason, we’ve also recently been encouraging researchers at MIRI to worry less about appealing to a wide audience when doing public-facing work. We want researchers to focus mainly on whatever research directions they find most compelling, make exposition and distillation a secondary priority, and not worry about optimizing ideas for persuasiveness or for being easier to defend.

Yes, this very much resonates with me, especially because a parallel issue exists in biosecurity, where we don't want to talk publicly about work to prevent the things we're worried about, because it could prompt bad actors to look into those things.

The issues here are different, but the need to have walls between what you think about and what you discuss imposes a real cost.

When it comes to disclosure policies, if I'm uncertain between the "MIRI view" and the "Paul Christiano" view, should I bite the bullet and back one approach over the other? Or can I aim to support both views, without worrying that they're defeating each other?

My current understanding is that it's coherent to support both at once. That is, I can think that possibly intelligence needs lots of fundamental insights, and that safety needs lots of similar insights (this is supposed to be a characterisation of a MIRI-ish view). I can think that work done on figuring out more about intelligence and how to control it should only be shared cautiously, because it may accelerate the creation of AGI.

I can also think that prosaic AGI is possible, and fundamental insights aren't needed. Then I might think that I could do research that would help align prosaic AGIs but couldn't possibly align (or contribute to) an agent-based AGI.

Is the above consistent? Also, do people who worry about disclosure (or those with better emulators of such people) think that this makes sense from their point of view?

The OP is quite long, so let me copy over some interesting sections to reduce trivial inconveniences to discussion. This section is especially interesting in trying to explicate the 'deconfusion' concept:

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”
To give a concrete example, my thoughts about infinity as a 10-year-old were made of rearranged confusion rather than of anything coherent, as were the thoughts of even the best mathematicians from 1700. “How can 8 plus infinity still be infinity? What happens if we subtract infinity from both sides of the equation?” But my thoughts about infinity as a 20-year-old were not similarly confused, because, by then, I’d been exposed to the more coherent concepts that later mathematicians labored to produce. I wasn’t as smart or as good of a mathematician as Georg Cantor or the best mathematicians from 1700; but deconfusion can be transferred between people; and this transfer can spread the ability to think actually coherent thoughts.
In 1998, conversations about AI risk and technological singularity scenarios often went in circles in a funny sort of way. People who are serious thinkers about the topic today, including my colleagues Eliezer and Anna, said things that today sound confused. (When I say “things that sound confused,” I have in mind things like “isn’t intelligence an incoherent concept,” “but the economy’s already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t what the good thing to do is, so we’ll be fine,” “we’re Turing-complete, so it’s impossible to have something dangerously smarter than us, because Turing-complete computations can emulate anything,” and “anyhow, we could just unplug it.”) Today, these conversations are different. In between, folks worked to make themselves and others less fundamentally confused about these topics—so that today, a 14-year-old who wants to skip to the end of all that incoherence can just pick up a copy of Nick Bostrom’s Superintelligence.6
Of note is the fact that the “take AI risk and technological singularities seriously” meme started to spread to the larger population of ML scientists only after its main proponents attained sufficient deconfusion. If you were living in 1998 with a strong intuitive sense that AI risk and technological singularities should be taken seriously, but you still possessed a host of confusion that caused you to occasionally spout nonsense as you struggled to put things into words in the face of various confused objections, then evangelism would do you little good among serious thinkers—perhaps because the respectable scientists and engineers in the field can smell nonsense, and can tell (correctly!) that your concepts are still incoherent. It’s by accumulating deconfusion until your concepts cohere and your arguments become well-formed that your ideas can become memetically fit and spread among scientists—and can serve as foundations for future work by those same scientists.
Interestingly, the history of science is in fact full of instances in which individual researchers possessed a mostly-correct body of intuitions for a long time, and then eventually those intuitions were formalized, corrected, made precise, and transferred between people. Faraday discovered a wide array of electromagnetic phenomena, guided by an intuition that he wasn’t able to formalize or transmit except through hundreds of pages of detailed laboratory notes and diagrams; Maxwell later invented the language to describe electromagnetism formally by reading Faraday’s work, and expressed those hundreds of pages of intuitions in three lines.
An even more striking example is the case of Archimedes, who intuited his way to the ability to do useful work in both integral and differential calculus thousands of years before calculus became a simple formal thing that could be passed between people.
In both cases, it was the eventual formalization of those intuitions—and the linked ability of these intuitions to be passed accurately between many researchers—that allowed the fields to begin building properly and quickly.[7]
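
(A quick gloss on the infinity example in the quote above; this is my own addition, not part of the post. Cantor's cardinal arithmetic is exactly the kind of deconfusion being described: it makes "8 plus infinity is still infinity" a theorem rather than a puzzle, and it shows that "subtracting infinity from both sides" was never a legal move, since cardinal subtraction is not well-defined.)

```latex
% Cardinal arithmetic with $\aleph_0$, the cardinality of the natural numbers:
\aleph_0 + 8 = \aleph_0, \qquad \aleph_0 + \aleph_0 = \aleph_0.
% There is no cancellation law: from $\aleph_0 + 8 = \aleph_0 + 0$ one cannot
% conclude $8 = 0$, because subtraction of infinite cardinals is undefined.
```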

And footnote 7:

Historical examples of deconfusion work that gave rise to a rich and healthy field include the distillation of Lagrangian and Hamiltonian mechanics from Newton’s laws; Cauchy’s overhaul of real analysis; the slow acceptance of the usefulness of complex numbers; and the development of formal foundations of mathematics.

I'd be interested if anyone can add insight to the examples discussed in the footnote. I'm also curious whether any further examples seem salient to people, or alternatively whether this frame itself seems confused about how certain key types of insights come about.

Good post. I'm broadly supportive of MIRI's goal of "deconfusion" and I like the theoretical emphasis of their research angle.

To help out, I'll suggest a specific way in which it seems to me that MIRI is causing themselves unnecessary confusion when thinking about these problems. From the article:

I can observe various reasons why we shouldn’t expect the strategy “train a large cognitive system to optimize for X” to actually result in a system that internally optimizes for X, but there are still wide swaths of the question where I can’t say much without saying nonsense.

In the mainstream machine learning community, the word "optimization" is almost always used in the mathematical sense: discovering a local or global optimum of a function, e.g. a continuous function of multiple variables. In contrast, MIRI uses "optimization" in two ways: sometimes in this mathematical sense, but sometimes in the sense of an agent optimizing its environment to match some set of preferences. Although these two operations share some connotational similarities, I don't think they actually have much in common--it seems like the algorithms we've discovered to perform these two activities are often pretty different, and the "grammatical structure"/"type signature" of the two problems certainly seem quite different. Robin Hanson has even speculated that the right brain does something more like the first kind of optimization and the left brain does something more like the second.
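
To make the contrast concrete, here's a minimal sketch of the two "type signatures" as I understand them (my own toy illustration; none of these names or definitions come from MIRI or the post):

```python
# Toy illustration of two senses of "optimization" (my own framing, not MIRI's).
from typing import Callable, Protocol, TypeVar

Obs = TypeVar("Obs")
Act = TypeVar("Act")

# Sense 1: mathematical optimization. Input: a fixed objective function.
# Output: a point that (approximately, locally) optimizes it.
def optimize_function(f: Callable[[float], float], x0: float,
                      lr: float = 0.01, steps: int = 1000) -> float:
    """Minimize f by crude finite-difference gradient descent."""
    x, eps = x0, 1e-6
    for _ in range(steps):
        grad = (f(x + eps) - f(x - eps)) / (2 * eps)
        x -= lr * grad
    return x

# Sense 2: an agent "optimizing its environment". Input: a stream of
# observations; output: a stream of actions chosen so that the world drifts
# toward preferred states. No explicit objective function is handed in, and no
# single "answer" is returned -- the optimization shows up in the world itself.
class EnvironmentOptimizer(Protocol[Obs, Act]):
    def act(self, observation: Obs) -> Act:
        ...

print(optimize_function(lambda x: (x - 3.0) ** 2, x0=0.0))  # ~3.0
```

If that framing is roughly right, then knowing a lot about how well sense 1 works (e.g. convergence behaviour of gradient descent) doesn't automatically tell us much about sense 2.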

One group that isn't considered in the analysis is new trainees. It seems that AGI is probably sufficiently far off that many of the people who will make the breakthroughs are not yet researchers or experts. If you are a bright young person who might work at MIRI or somewhere similar in 5 years' time, you would want to get familiar with the area. You are probably reading MIRI's existing work to see whether you have the capability to work in the field. This means that if you do join MIRI, you have already been thinking along the right lines for years.

Obviously you don't want your discussions live-streamed to the world; you might come up with dangerous ideas. But I would suggest sticking things online once you understand the area sufficiently well to be confident it's safe. If writing it up into a fully formal paper is too time-intensive, any rough scraps will still be read by the dedicated.

Yup. As someone aiming to do their dissertation on issues of limited agency (low impact, mild optimization, corrigibility), I'd find it quite frustrating to essentially end up duplicating the insights that MIRI has on some new optimization paradigm.

I still understand why they’re doing this and think it’s possibly beneficial, but it would be nice to avoid having this happen.

From this post:

You can find a number of interesting engineering practices at NASA. They do things like take three independent teams, give each of them the same engineering spec, and tell them to design the same software system; and then they choose between implementations by majority vote. The system that they actually deploy consults all three systems when making a choice, and if the three systems disagree, the choice is made by majority vote. The idea is that any one implementation will have bugs, but it’s unlikely all three implementations will have a bug in the same place.
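
As a toy sketch of the consultation-plus-majority-vote pattern described there (my own illustration; the function names and the thruster example are invented, not NASA's actual interfaces):

```python
# Toy sketch of N-version majority voting (invented example, not NASA code).
from collections import Counter

def impl_team_a(sensor_reading: float) -> str:
    return "fire_thruster" if sensor_reading > 0.5 else "hold"

def impl_team_b(sensor_reading: float) -> str:
    return "fire_thruster" if sensor_reading > 0.5 else "hold"

def impl_team_c(sensor_reading: float) -> str:
    # Hypothetical bug: wrong threshold. The vote below masks it.
    return "fire_thruster" if sensor_reading > 0.7 else "hold"

def deployed_decision(sensor_reading: float) -> str:
    """Consult all three independently built systems; return the majority choice."""
    votes = [impl(sensor_reading) for impl in (impl_team_a, impl_team_b, impl_team_c)]
    decision, _count = Counter(votes).most_common(1)[0]
    return decision

print(deployed_decision(0.6))  # "fire_thruster" -- teams A and B outvote C's bug
```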

One could make an argument for multiple independent AI safety teams on similar grounds:

  • "any one optimization paradigm may have weaknesses, but it's unlikely that multiple optimization paradigms will have weaknesses in the same place"

  • "any one team may not consider a particular fatal flaw, but it's unlikely that multiple teams will all neglect the same fatal flaw"

In the best case, you can merge multiple paradigms & preserve the strengths of all with the weaknesses of none. In the worst case, having competing paradigms still gives you the opportunity to select the best one.

This works best if individual researchers/research teams are able to set aside their egos and overcome not-invented-here syndrome to create the best overall system... which is a big if.

Small moderation note: The linked post contains a bunch of organization updates plus a hiring pitch, which is off-topic for the frontpage. However, the post also contains a bunch of very important gems of theory and insight, and I wanted to make sure those can be discussed here. I think it's better to keep discussion of logistical details and solicitations for donations off the frontpage, but I think it's fine to discuss "MIRI's epistemic state" in a similar fashion to how you would discuss the epistemic state of a person on the frontpage.

Hm. I wonder what an "alternative" to neural nets and gradient descent would look like. Neural nets are really just there as a highly expressive model class that gradient descent works on.

One big difficulty is that if your model is going to classify pictures of cats (or Go boards, etc.), it's going to be pretty darn complicated, and I'm sceptical that any choice of model class is going to prevent that. But maybe one could try to "hide" this complexity in a recursive structure. Neural nets already do this, but convnets especially mix up spatial hierarchy with logical hierarchy, and neural nets in general aren't as nicely packaged into human-thought-sized pieces as maybe they could be--consider resnets, which work well precisely because they abandon the pretense of each neuron being some specific human-scale logical unit.

So maybe you could go in the opposite direction and make that pretense a reality with some kind of model class that tries to enforce "human-thought-sized" reused units with relatively sparse inter-unit connections? You could still train with SGD, or treat hypotheses as decision trees and take advantage of that literature.
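
To make that a bit more concrete, here's a rough sketch of what such a model class might look like (purely my own toy interpretation of the idea, with invented names and arbitrary sizes; nothing here is from the OP):

```python
# Toy "modular" model class: small units with fixed sparse wiring between them,
# still trainable with plain SGD. An invented illustration, not a tested design.
import torch
import torch.nn as nn

class SparseModularNet(nn.Module):
    def __init__(self, in_dim=32, out_dim=10, n_units=8, unit_dim=16, fan_in=2):
        super().__init__()
        # First layer of small units, each reading the raw input.
        self.encoders = nn.ModuleList([nn.Linear(in_dim, unit_dim) for _ in range(n_units)])
        # Fixed sparse wiring: each second-layer unit reads from only `fan_in`
        # first-layer units, keeping inter-unit connections relatively sparse.
        self.wiring = [torch.randperm(n_units)[:fan_in].tolist() for _ in range(n_units)]
        self.units = nn.ModuleList([nn.Linear(fan_in * unit_dim, unit_dim) for _ in range(n_units)])
        self.head = nn.Linear(n_units * unit_dim, out_dim)

    def forward(self, x):
        first = [torch.relu(enc(x)) for enc in self.encoders]
        second = [torch.relu(unit(torch.cat([first[i] for i in idx], dim=-1)))
                  for unit, idx in zip(self.units, self.wiring)]
        return self.head(torch.cat(second, dim=-1))

# One SGD step on random data, just to show it trains like any other net.
model = SparseModularNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```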

But suppose we got such a model class working, and trained it to recognize cats. Would it actually be human-comprehensible? Probably not! I guess I'm just not really clear on what "designed for transparency and alignability" is supposed to cash out to at this stage of the game.