Great, lucid post. I agree that we don't know whether the net result could be negative. Sometimes you aren't looking only for positive returns, as on a balance sheet, but for a beneficial process. The process itself helps guide some important choices that may not be entirely associated with this field of research for now.
In a way, some people might think AI is nearly at a mature state because of the last couple of years' acceleration. I don't think it is. AI is still in its juvenile stage, and we have a long way to go before security, for instance, is well understood.
We have to be lucid and humble about what's ahead, and be ready to adapt as new models arrive. AGI and continual learning may be around the corner, but we'd better be ready before making the turn.
Do you think your list will evolve drastically in the future? Meaning, new unknown things popping up because of new paradigms?
Thanks! Yeah, agreed.
I'm pretty confused about how much to expect new ones. On the one hand, all these bullet points were already known about, in principle, several years ago. On the other hand, it seems unlikely that we know all of them since the space of possible considerations is so vast. So I'm not sure.
We probably know most of them but not all. The real difference might be that they need to reshuffle in importance from time to time. A paradigm shift can bring known-but-minor ones back to the surface, and make new ones appear around them.
Here’s Holden Karnofsky:
I’m not aware of a good list of downside risks for AI safety broadly[1], so I decided to make one.
This is not intended to be fully comprehensive; these are just the ones that I personally take seriously[2][3]:
(This list is taken from a previous post of mine, but I thought it deserved its own top-level reference.)
The closest thing I’m aware of is Safeguarding the Safeguards, but even that is narrower.
To be clear, I don’t personally think AI safety has been net negative so far, like some do. I wouldn’t even say that I have a properly considered view about it - maybe 60% that it’s been net positive, with very low credal resilience.
But I do feel a vibe of overconfidence in the discourse here sometimes, and I think this can have downstream consequences, e.g. an action bias.
Quickly, here are others that I excluded because I don’t personally see them as potentially major factors, and didn’t want to water down the main list by including a bunch of implausible galaxy-brained stuff:
Holden Karnofsky: “Most things that touch policy at all in any way will move us along that spectrum in one direction or another, so therefore have a high chance of being negative [...]
And then most things that you can do in AI at all will have some impact on policy. Even just alignment research: policy will be shaped by what we’re seeing from alignment research, how tractable it looks, what the interventions look like.” (h/t Anthony DiGiovanni)
Holden Karnofsky: “there’s also a lot of micro ways in which you could do harm. Just literally working in safety and being annoying, you might do net harm. You might just talk to the wrong person at the wrong time, get on their nerves. I’ve heard lots of stories of this. Just like, this person does great safety work, but they really annoyed this one person, and that might be the reason we all go extinct” (h/t Anthony DiGiovanni)
Among other things.
I associate these with people like Richard Ngo (and here) and Oliver Habryka.