The part of my post that is relevant to AI alignment is right at the end, but I say something similar to Rohin, that we actually have significantly mitigated the effects of Coronavirus but have still failed in a certain specific way -

The lesson to be learned is that there may be a phase shift in the level of danger posed by certain X-risks - if the amount of advance warning or the speed of the unfolding disaster is above some minimal threshold, even if that threshold would seem like far too little time to do anything given our previous inadequacy, then there is still a chance for the MNM effect to take over and avert the worst outcome. In other words, AI takeoff with a small amount of forewarning might go a lot better than a scenario where there is no forewarning, even if past performance suggests we would do nothing useful with that forewarning.

More speculatively, I think we can see the MNM effect’s influence in other settings where we have consistently avoided the very worst outcomes despite systematic inadequacy - Anders Sandberg referenced something like it when he was discussing the probability of nuclear war. There have been many near misses when nuclear war could have started, implying that we can’t have been lucky over and over. Instead that there has been a stronger skew towards interventions that halt disaster at the last moment, compared to not-the-last-moment:

Reply

[-]Daniel Kokotajlo6yΩ240

They are both particularly critical of the idea that we can get general intelligence by simply scaling up existing deep learning models, citing the need for reasoning, symbol manipulation, and few-shot learning, which current models mostly don’t do

Huh. GPT-3 seems to me like something that does all three of those things, albeit at a rudimentary level. I'm thinking especially about its ability to do addition and anagrams/word letter manipulations. Was this interview recorded before GPT-3 came out?

Reply

[-]Steven Byrnes6yΩ230

On the Russell / Pinker debate, I thought Pinker had an interesting rhetorical sleight-of-hand that I hadn't heard before...

When people on the "AGI safety is important" side explain their position, there's kinda a pedagogical dialog:

A: Superintelligent AGI will be awesome, what could go wrong? B: Well it could outclass all of humanity and steer the future in a bad direction. A: OK then we won't give it an aggressive goal. B: Even with an innocuous-sounding goal like "maximize paperclips" it would still kill everyone... A: OK, then we'll give it a good goal like "maximize human happiness". B: Then it would forcibly drug everyone. A: OK, then we'll give it a more complicated goal like ... B: That one doesn't work either because ...

...And then Pinker reads this back-and-forth dialog, removes a couple pieces of it from their context, and says "The existential risk scenario that people are concerned about is the paperclip scenario and/or the drugging scenario! They really think those exact things are going to happen!" Then that's the strawman that he can easily rebut.

Pinker had other bad arguments too, I just thought that was a particularly sneaky one.

Reply

[-]Pattern6y20

Sparsity and interpretability? (Stanislav Böhm et al) (summarized by Rohin): If you want to visualize exactly what a neural network is doing, one approach is to visualize the entire computation graph of multiplies, additions, and nonlinearities. While this is extremely complex even on MNIST, we can make it much simpler by making the networks sparse, since any zero weights can be removed from the computation graph. Previous work has shown that we can remove well over 95% of weights from a model without degrading accuracy too much, so the authors do this to make the computation graph easier to understand.

Are models that are trained as sparse, rather than pruned to be sparse, different from each other? (Especially in terms of interpretability.)

Reply

[-]Rohin Shah6y20

This paper didn't check that, but usually when you train sparse networks you get worse performance than if you train dense networks and then prune them to be sparse.

Reply

Moderation Log

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

19

[AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID

19

Ω 12

19

Ω 12

HIGHLIGHTS

TECHNICAL AI ALIGNMENT

PROBLEMS

INTERPRETABILITY

FORECASTING

OTHER PROGRESS IN AI

META LEARNING

NEWS

FEEDBACK

PODCAST