Nina Panickssery — LessWrong

Views purely my own unless clearly stated otherwise

Not arguing with the main point, but in its current state the “Critical reception” section of that Wikipedia article appears to list many positive reviews and quotes.

For example:

Kevin Canfield wrote in San Francisco Chronicle that the book makes powerful arguments and recommended it.

Maybe one example is that I think the likelihood of >10% yearly GDP growth in any year in the next 10 years is <10%.

I'd have to think about this more to have concrete bet ideas. Though feel free to suggest some.

Well, I hope we can have tea with our great-grandchildren in 100 years and discuss which predictions panned out!

OK, cool. For some reason the sentence read weirdly to me so I wanted to clarify before replying (because if it was the case that the book was premised on a sudden paradigm shift in AI development that I didn't think would occur by default, then that would indeed be an important step in the argument that I don't address at all).

To answer your question directly, I think I disagree with the authors both on how capable AIs will become in the short-to-medium term (let's say over the next century, to be concrete), and on the extent to which very capable AIs will be well-modeled as ideal agents optimizing for alien-to-us goals. As mentioned, I'm not saying it's necessarily impossible, just very far from an inevitability. My view is based on (1) predictions about how SGD behaves and how AI training approaches will evolve (within the current paradigm), (2) physical constraints on data and compute, and (3) pricing in prosaic safety measures that labs are already taking and will (hopefully!) continue to take.

Because I don't predict fast take-off I also think that if things are turning out to look worse than I expect, we'll see warnings.

and which hasn't been designed in a fairly different way from the way current AIs are created

Is this double negative intended? Do you mean has been designed in a fairly different way?

Firstly, thanks for writing this, sending me a draft in advance to review, and incorporating some of my feedback. I agree that my review of a review was somewhat sloppy, i.e. I didn't argue for my perspective clearly. To frame things, my overall perspective here is that (1) AI misalignment risk is not "fake" or trivially low, but it is far lower than the book's authors claim, and (2) the MIRI-AI-doom cluster relies too much on arguments by analogy and spherical-cow game-theoretic agent models while neglecting to incorporate the empirical evidence from modern AI/ML development. I recently wrote a different blogpost trying to steelman their arguments from a more empirical perspective (as well as possible counterarguments that reduce but do not cancel the strength of the main argument).

I plan to actually read and review the book "for real" once I get access to it (and have the time).

Some concrete comments on this review^3:

Nina is clearly trying to provide an in-depth critique of the book itself

It may have come across that way, but that was not my goal. Though implicit in the post is my broader knowledge of MIRI's arguments, so it's slightly based on that and not just Scott's review.

Nina says “The book seems to lack any explanation for why our inability to give AI specific goals will cause problems,” but it seems pretty straightforward to me

That's a misquote. In my post I say that the "basic case for AI danger", as presented by Scott (presumably based on the book), lacks any explanation for why our inability to “give AI specific goals” will result in a superintelligent AI acquiring and pursuing a goal that involves killing everyone. It's possible the book makes the case better than Scott does (this is a review of a review after all), but from my experience reading other things from MIRI, they make numerous questionable assumptions when arguing that a model that hasn't somehow perfectly internalized the ideal objective function will become an unimaginably capable entity taking all its actions in service of a single goal that requires the destruction of humanity. I don't think the analogies, stories, and math are strong enough evidence for this being anywhere near inevitable considering the number of assumptions required, and the counterevidence from modern ML.

She says humans are a successful species, but I think she’s conflating success in human terms with success in evolutionary terms

This is a fair point. I should have stuck to why evolution is a biased analogy rather than appear to defend the claim that humans are somehow “aligned” with it. Though we're not egregiously misaligned with evolution.

nobody’s arguing against examples, they’re just saying it might be more reassuring if the architecture directly included the reward function in the model itself. For instance, if the model was at every turn searching over possible actions and choosing one that will maximize this reward function

What does this mean though, if one doesn't think there's a single Master Reward Function (you may think there is one, but I don't)? Modern ML works by saying "here is a bunch of examples of what you should do in different scenarios", or "here are a bunch of environments with custom reward functions based on what we want achieved in that environment". Unless you predict a huge paradigm shift, the global reward function is not well-defined. You could say, oh it would be good if the model directly included all our custom reward functions, but then that is like saying oh it would be good if models just memorized their datasets.

Nina complains that the Mink story is simplistic with how Mink “perfectly internalizes this goal of maximizing user chat engagement”. The authors say the Mink scenario is a very simple “fairytale”—they argue that in this simple world, things go poorly for humanity, and that more complexity won’t increase safety.

Here I mean to point out that the implication that AI will eventually start behaving like a perfect game-theoretic agent that ruthlessly optimizes for a strange goal is simplistic, not that the specifics of the Mink story (i.e. what specifically happens, what goal emerges) are simplistic.

I think Nina has a different kind of complexity in mind, which the authors don’t touch on. It seems like she thinks real models won’t be so perfectly coherent and goal-directed. But I don’t really think Nina spells out why she believes this and there are a lot of counter-arguments. The main counter-argument in the book is that there are good incentives to train for that kind of coherence.

Yes, I don't properly present the arguments in my review^2. I do that a bit more in this post which is linked in my review^2. And I don't mean to dismiss the possibility entirely, just to argue that presenting this as an inevitability is misleading.

Interesting. That seems like reasonable evidence.

Though beyond a certain level of development we have numerous other drives beyond the oxytocin-related ones. Hence why you-as-a-baby might be particularly telling. From what I understand, oxytocin is heavily involved in infant-caregiver bonding and is what enables mothers to soothe their babies so effectively (very much on my mind right now as I am typing this comment while a baby naps on me haha).

Whereas once you're above a certain age, the rational mind and other traits probably have an increasingly strong effect. For example, if you're very interested in your own thoughts and ideas, this might overwhelm your desire to be close to family members.

Anyway, it seems likely that your oxytocin hypothesis is correct either way. Cool finding!

I have a similar intuition about how some other people are missing a disgust response that I have. Seems like a biological thing that some people have much less of than others and it has a significant effect on how we relate to others.

Did your mother think you were unusual as a baby? Did you bond with your parents as a young child? I'd expect there to be some symptoms there if you truly have an oxytocin abnormality.

Yes, I was going to leave this comment.

It's strange to use the fact that popular celebrity actresses are not stunningly attractive in candid photos as evidence that women don't get that (naturally) attractive. Celebrity actresses are selected for a whole lot more than attractiveness, plus eventually they get old/out of their prime age (why are you exclusively displaying images of late twenties/early thirties women when it's widely accepted that attractiveness peaks at 21 or younger?).

Furthermore, the fact that celebrity actresses only look good with makeup / in certain clothing etc. is again partially a product of their selection process - they are chosen for looking and acting well on camera, not being naturally overwhelmingly beautiful in person.
