Jonathan Uesato

Comments

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Thanks for the great post. I found this collection of stories and framings very insightful.

1. Strong +1 to "Problems before solutions." When reading this story (or any threat model), I focus much more on "do I find this story plausible and compelling?" (which is already a tremendously high bar) before even starting to ask "how would this update my research priorities?"

2. I wanted to add a mention of Katja Grace's "Misalignment and Misuse" as another example of how single-single alignment problems and bargaining failures can blur together and exacerbate each other. The whole post is really short, but I'll quote it anyway:

I think a likely scenario leading to bad outcomes is that AI can be made which gives a set of people things they want, at the expense of future or distant resources that the relevant people do not care about or do not own...
When the business strategizing AI systems finally plough all of the resources in the universe into a host of thriving 21st Century businesses, was this misuse or misalignment or accident? The strange new values that were satisfied were those of the AI systems, but the entire outcome only happened because people like Bob chose it knowingly (let’s say). Bob liked it more than the long glorious human future where his business was less good. That sounds like misuse. Yet also in a system of many people, letting this decision fall to Bob may well have been an accident on the part of others, such as the technology’s makers or legislators.

In Katja's story, "misalignment" and "misuse" both seem like valid, distinct frames on the problem.

3. I liked how you framed agent-agnostic and agent-centric (single-single alignment-focused) approaches as complementary:

The agent-focused and agent-agnostic views are not contradictory... Instead, the agent-focused and agent-agnostic views offer complementary abstractions for intervening on the system... Both types of interventions are valuable, complementary, and arguably necessary.

At one extreme, in a world where we could agree on what constitutes an acceptable level of x-risk, agree not to build AI systems that exceed this level, and give ourselves enough time to figure out the alignment issues in advance, we'd be fine! (We would still need to do the work of actually figuring out a bunch of difficult technical and philosophical questions, but importantly, we would have the time and space to do this work.) To the extent we can't do this, what are the RAAPs, such as intense competition, that prevent us from doing so?

And at the other extreme, if we develop really satisfying solutions to alignment, we also shouldn't end up in worlds where we have "little human insight" or factories "so pervasive, well-defended, and intertwined with our basic needs that we are unable to stop them from operating."

I think Paul often makes this point when discussing the alignment tax: we can both decrease the size of the tax and make paying it more appealing or more easily enforceable.

4. I expect to reconsider many concepts through the RAAPs lens over the next few months. Towards this end, it'd be great to see a more detailed description of what the RAAPs in these stories are. For example, a central one here is "the competitive pressure to produce." We could also think about "a systemic push towards more easily quantifiable metrics (e.g. profit vs. understanding or global well-being)", which WFLL1 talks about, or "strong societal incentives for building powerful systems without correspondingly strong societal incentives for reflection on how to use them". I'm currently thinking about all these RAAPs as a web (or maybe a DAG), where we can pull on any of several levers to address the problem, as opposed to there being a single true RAAP; does that seem right to you?

Relatedly, I'd be very interested in a post investigating just a single RAAP (What causes it? What empirical evidence shows it exists? How does it influence various threat models?). Even a short version would, I think, help a lot in clarifying how to think about RAAPs.

5. My one quibble is that some of the criticism of the AGI safety community seems undeserved. For example, when you write "That is, outside the EA / rationality / x-risk meme-bubbles, lots of AI researchers think about agent-agnostic processes," it seems to imply that researchers inside this community don't think about RAAPs (though perhaps this is not what you meant!). Many inside these circles do think about agent-agnostic processes too, though not framed in these terms, and I expect this additional framing will be helpful. Your own section on "Successes in our agent-agnostic thinking" gives many such examples.

This is only a quibble in the sense that, yes, I absolutely agree there is lots of room for much-needed work on understanding and addressing RAAPs; that we shouldn't take the extreme physical and economic competitiveness of the world for granted; and that we should work to change these agent-agnostic forces for the better. I'd also agree this should ideally be a larger fraction of our "portfolio" on the margin (acknowledging the pragmatic difficulties in getting there). But I also think the AI safety community has made important contributions on this front.