I Vouch For MIRI

A related thought:

There's an issue with AI Safety funding: many people agree that it's important for there to be an AI Safety field, and that such a field should grow, evaluate the problem from different angles, build up a body of knowledge, and cultivate people who think seriously about the problem full-time.

Non-safety AI researchers should know that such a field exists and that it is worth serious attention.

This idea is popular both because it is true and (I think) because it is a tempting way to pass the buck.

If you're a funder who thinks AI Alignment is important, well, you're not sure what actually needs doing, but you can at least agree to fund something generic that "builds the field." A specific approach might be wrong, but a generic approach can't "be wrong," so your bullshit detectors don't see any concrete red flags.

But... at some point, to actually make progress, you need to get into the weeds and make a plan that involves specific assumptions and try to implement that plan.

MIRI is (relatively) unique among AI Safety organizations in that it focuses on particular types of approaches. The researchers at MIRI aren't a monolith, but they tend to share particular assumptions about the constraints of the problem and what solving it would look like, and then they try to actually build that solution.

And they may be wrong in some of their assumptions. But someone has to have a plan that is specific enough that it might be wrong and try to implement that plan.

For this reason, I expect MIRI (and things like it) to be generally underfunded relative to more "generic seeming" organizations that aren't making explicit claims.

If true, this claim should probably push you to be more willing to fund things that are attempting concrete solutions.

(I also happen to think they are not wrong in their assumptions. One of the things that's impressed me over the years is seeing their strategy shift as the landscape of AI changed, which I've found reassuring. I trust them to continue to update as new information comes out.)

Raemon

I should note that I'm not sure whether OpenAI is a point against this claim or not (I think not, but for complicated reasons). My vague impression is that they do tend to have their own set of assumptions and are working on reasonably concrete things (I think those assumptions are wrong, but I am not that confident).

I do lean towards thinking that OpenAI and MIRI should both be fully funded; OpenAI just seems to be getting a lot more funding due to Elon's involvement and to generally being more "traditionally prestigious."

Raemon

Further thoughts here:

Insofar as I think OpenAI shouldn't be funded, it's because I think it might be actively harmful.

(epistemic status: I am not very informed about the current goings-on at OpenAI. This is a random person rambling, repeating hearsay, and making the best guesses they can without doing a thorough review of their blog, let alone talking to them.)

The reason it might be actively harmful is that a lot of their work seems to be actually developing AI rather than AI Safety, and sharing AI developments with the world might accelerate progress.

MIRI is the only organization I know of working directly on AI safety that I've heard talk extensively about differential technological development, i.e., doing research that would only help build Aligned AGI and wouldn't accelerate generic AI that might feed into Unaligned AGI.

cousin_it

Zvi, it's great that you vouch for MIRI! But I'm not sure it's always a good idea to use logical decision theories in real life. It's kind of an open problem, actually. Wei had a great counterexample called 2TDT-1CDT. Imagine a tournament of blind one-shot prisoner's dilemmas played among many TDT agents and one CDT agent. The TDT agents will cooperate because the opponent is likely to be TDT, while the CDT agent will defect and win. The relevant mathematical idea is the evolutionarily stable strategy, I think.
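To make the scoring in this tournament concrete, here is a minimal Python sketch under loudly stated assumptions: the payoff values (temptation 5, reward 3, punishment 1, sucker 0) are the textbook prisoner's dilemma numbers, not anything from the comment; every pair of agents plays exactly one blind game; and each TDT agent is modeled simply as an unconditional cooperator, since it cannot see whom it is facing. The names `PAYOFFS`, `move`, and `round_robin` are hypothetical. The sketch only illustrates the claim that the lone defector out-scores every cooperator. It does not model the reply below about which bots are correlated with which.

```python
from itertools import combinations

# Assumed textbook payoffs (T=5, R=3, P=1, S=0); not taken from the comment.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def move(agent_type: str) -> str:
    """Blind one-shot play: a TDT agent cooperates because it expects its
    opponent to be another TDT agent; the CDT agent always defects."""
    return "C" if agent_type == "TDT" else "D"


def round_robin(agents: list[str]) -> list[int]:
    """Every pair of agents plays one blind game; return each agent's total."""
    scores = [0] * len(agents)
    for i, j in combinations(range(len(agents)), 2):
        a, b = PAYOFFS[(move(agents[i]), move(agents[j]))]
        scores[i] += a
        scores[j] += b
    return scores


if __name__ == "__main__":
    agents = ["TDT", "TDT", "CDT"]
    print(list(zip(agents, round_robin(agents))))
```

With two TDT agents and one CDT agent, each TDT agent scores 3 and the CDT agent scores 10. More generally, under these assumed payoffs, with n TDT agents each TDT agent gets 3(n-1) and the CDT agent gets 5n, so the defector always comes out ahead.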

Zvi

I agree that it's an open problem how best to make decisions in life. I do think that it's obvious that FDT>CDT as a policy, even if (or especially if!) you're a human and doing either one approximately. To put it more simply, ignoring the things FDT cares about and CDT does not care about seems like a pretty obvious mistake, and also a pretty easy way, if adopted by too many people, to have things go quite badly.

Interesting example there! But I don't think that example shows CDT>FDT even in this isolated designed-to-counterexample-me case. And I think that the fact that we're trying to construct an example where CDT>FDT should in and of itself be strong evidence of which you should choose.

One of these two things happens, depending on your perspective:

1. Zvi runs FDT. Omega approaches Zvi and presents this situation. I think to myself, "Is my decision to submit an FDT bot correlated with Omega's decision to submit FDT bots?" And my answer is no. So I submit CDT, or probably just "D". So CDT=FDT here, because we both submit the same bot.
2. Same setup. Zvi submits an FDT bot with the knowledge that there are two bots submitted by Omega that were chosen first. This bot realizes it isn't correlated with Omega's bots (since they cooperate even if they know my bot defects), so it defects. Omega's bots realize they are correlated with each other, so they cooperate. Same result. Setup matters.

The FDT agents are still making the right decision and still winning. Yes, the CDT agent doing better is annoying, but that's by definition not in your utility function, and you don't lose because someone else did better. The paperclip maximizer maximized paperclips and the staple maximizer maximized staples, rather than either of them defecting or self-modifying to CDT or whatnot, which goes very badly for them. The FDT agents are doing the right thing once the tournament starts. Note that if they submit one CDT and one FDT, you are forced to submit FDT, which is now correlated with the other FDT agent, which means the two of you now cooperate. So Omega isn't maximizing by submitting two FDT agents, but Omega rarely does. Omega is a shifty one. Odd motivations.

[anonymous]

I'm with Calvin. I think misaligned AGI is more likely to do local bad things when first invented. There are more partially rational AGIs than fully rational ones, so I expect those to be found first.

I do think it matters who develops safe AGI first, whether or not there is a chance for a last-minute power grab during the development process.

t3tsubo

Can you expand on this point:

"If we do build an AGI, its actions will determine what is done with the universe.

If the first such AGI we build turns out to be an unfriendly AI that is optimizing for something other than humans and human values, all value in the universe will be destroyed. We are made of atoms that could be used for something else."

In particular, why would the first unfriendly AI we build immediately be uncontainable and surpass current human civilization's ability to destroy it?

Raemon

Sorry it took me a while to find this. I think the best short answer to this question is Scott Alexander's Superintelligence FAQ.

The arguments involved here deal with a fair number of areas where human intuitions just aren't very good (e.g. we're used to dealing with linear systems, not exponential ones). So if Scott's post isn't persuasive to you, the next best answer I have is, unfortunately, "read the sequences."

Zvi

Thanks for asking! This has been the subject of much virtual ink, much of it here (e.g. the book Superintelligence, and much of Eliezer's writings, including the whole "Foom" debate with Robin Hanson). Rather than try to summarize that debate, I'll encourage others to chime in with the links and explanations they think are best, since this is an important thing to get right and I'd like to see what our community currently thinks is the best way to explain this. I'm not thrilled that I don't have a great go-to here.

If no one comes up with anything I'm happy with within the week, I'll see if I can do better.
