Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Epistemic status: Musings from the past month. Still far too vague for satisfaction.


I have been away on a retreat this past week, seeking clarity on how to move forward with living a vibrant and beneficial life that resolves problems in the world, but I’m afraid that I have come back empty-handed. I have only vague musings of a bigger picture, but no clear sense for how to take decisive action. I’ll try to share what I have succinctly, so as not to take up too much time.

Understand alignment, not intelligence

Look, our task here is to align the systems that will most influence the future with what is actually good. To do that, we should look out at the world, identify which kinds of systems are most influential, and seek to align them to the benefit of all life on the planet. Intelligence is the means by which certain very powerful systems could have a very large influence over the future, and to that end we ought to be interested in understanding intelligence. But we need not have any particular interest in understanding intelligence for its own sake. What we should be interested in understanding is the means by which any system can exert influence over the future, and the means to align such powerful systems with that which is worth protecting.

Align systems, not AI

There has been great debate about what AI might look like. Will it look like a singleton, or like a tool, or like a set of cloud services, or like a society of competing entities? One person says that a powerful singleton might be dangerous, then another person says that AI might not look much like a powerful singleton.

Yet there is a single unifying issue to resolve here, which is this: how do we build things in the world that are and remain consistently beneficial to all life? How do we construct international treaties that are aligned in this way? How do we construct financial systems that are aligned in this way? How do we construct tools that are aligned in this way? How do we construct belief-forming, observation-making, action-taking agents that are aligned in this way? These questions are connected in a deep way, not a surface-level one, because they all come down to clarifying what is good and implementing it in a tangible system.

There is a hard problem of alignment

There are many difficult problems in AI alignment, but there seems to be one problem at the center that has an entirely different character of difficulty. The hard problem, as I see it, is this: how do we set up any system in a way that is aligned with what is actually good, when any particular operationalization of what is good is certain to be wrong?

The world now looks to us

In the early days of AI safety, there was a narrative that the world was mostly not on our side, that it was our job to beat the world over the head with the hard stick of difficult truths about the dangers of advanced AI in order to wake people up to the impending destruction of life on this planet. This was a good narrative to have in the early days, and it served its purpose, but it is no longer serving us. I think that a better narrative to have now is the following.

The world is like an extremely wealthy but depressed person who realizes that their business empire is rapidly causing the destruction of life, and despite not finding the energy to make sweeping changes on their own, summons just enough clarity to make a large financial gift to a deputy who seems unusually agentic and trustworthy and ethical. That deputy -- that is, us, this community -- faces the difficult task of reforming an empire that is caught up in harmful patterns of politics and finance and prestige, so it is not exactly the case that everyone is "on their side", yet almost everyone in the empire sees that things are not going well, and in moments of clarity urges this deputy onwards, even if they soon return to participate in the very patterns that they hope the deputy will help to resolve.

We are the great hope of our civilization. Us, here, in this community. It is not that our civilization has woken up completely to the dangers of advanced AI. It is that our civilization has not woken up, yet wishes to wake up, and knows that it wishes to wake up, and has found just enough clarity to bestow significant power and resources on us in the hope that we will take up leadership.

In this subtle way, everyone is now on our side. Yet everyone is caught up in the very patterns that, at moments of clarity, they see are causing harm. Our job is to find the resolve to move forward with this difficult task, without getting caught up in the harmful patterns that exist in the world, and without losing track of the subtle way in which everyone is on our side.

This is the story. It is a way of seeing things, an ethos for carrying on with a difficult task that requires coordination with many people. It is a good way of seeing things to the extent that, if we chose to see things in this way, our actions would be beneficial to all life. It seems to me that seeing things this way would indeed be beneficial to all life because it calls us to befriend exactly that within everyone that seeks The Good, without giving even the tiniest accommodation to the patterns of behavior that are causing existential risk.

11 comments

us, this community [...] We are the great hope of our civilization. Us, here, in this community

This kind of self-congratulatory bluster helps no one. You only get credit for having good ideas and effective plans, not for being "One of us, the world-saving good guys."

It's certainly not about feeling good by identifying with a certain group. That helps no one, I agree.

But I'm sorry, the world simply is turning to this community for leadership. That is a thing that is happening in the world. There is a lot of very clear evidence. It is an extremely startling thing. It's not about anyone getting or not getting credit. It's about seeing things as they are.

But I'm sorry, the world simply is turning to this community for leadership. That is a thing that is happening in the world. There is a lot of very clear evidence.

Name three pieces of evidence?

Upvoted for specificity, but I would characterize this as "we have some degree of influence and reputation" rather than "the world is turning to us for leadership". (I guess from the "somewhat influential" in your other comment that you agree.)

Yeah, the exaggeration didn't seem like a crux for anything important.

Okay, I can see that, but as a writing tip for the future, rhetoric in the vein of "We are the great hope of our civilization" looks heavily optimized for the feeling-good-about-group-identification thing, rather than merely noticing the startling fact of being somewhat influential. And the startling fact of being somewhat influential makes it much more critical not to fall into the trap of valuing the group's brand, if the reputational pressures of needing to protect the brand make us worse at thinking.

Agreed, but there are additional considerations here. The way that we interact with the wider world is influenced by the stories we tell ourselves about our relationship with the world, so narratives about our relationship with the world affect not just our sense of whether we are doing a good job, but also the tone with which we speak to the world, the ambition of our efforts, and the emotional impact of what we hear back from the world.

If we tell ourselves stories in which the world is mostly not on our side then we will speak to the world coercively, we'll shy away from attempting big things, and we'll be gradually worn down as we face difficulties.

But if we see, correctly, I believe, that most people actually have brief moments in which they can appreciate the dangers of powerful agentic systems being developed through ham-fisted engineering methods, and that the most switched-on people in the world seem to be turning to this particular community on these issues, then we might adopt quite a different internal demeanor as we approach these problems, not because we give ourselves some particular amount of credit for our past efforts, but because we see the world as fundamentally friendly to our efforts, without underestimating the depth and reality of the problems that need to be resolved.

I think this issue of friendliness is really the most central point. So far as I can tell, it makes a huge difference to see clearly what it is in the world that is fundamentally friendly to one's efforts. Of course it's also critical not to mistake that which is not friendly to our efforts as being friendly to our efforts. But if one doesn't see that which is friendly towards us, then things just get lonely and exhausting real fast, which is doubly tragic because there is in fact something very real that really is deeply friendly towards our efforts.

That’s an inspiring narrative that rings true to me; I’m sure I will think on that framing more. Thank you.

Yet there is a single unifying issue to resolve here, which is this: how do we build things in the world that are and remain consistently beneficial to all life

I thought one of the motivations for talking about friendliness, as opposed to objective goodness, is that objective goodness might not be good for us. For instance, an ASI might side with extreme environmentalism and decide that humans need to be greatly reduced in number to give other species a break.

Yes, that is an incredibly important issue in my view. I would consider the construction of an AI that took a view of extreme environmentalism and went on to kill large numbers of humans a terrible error. In fact, I would consider the construction of an AI that would take any particular operationalization of some "objective good" through to the end of the world to be a very big error, since it seems to me that any particular operationalization of "good" leads, eventually, to something that is very obviously not good. You can go case-by-case and kind of see that each possible operationalization of "good" misses the mark pretty catastrophically, and then after a while you stop trying.

Yet we have to build things in the world somehow, and anything we build is going to operationalize its goals somehow, so how can we possibly proceed? This is why I think this issue deserves the mantle of "the hard problem of alignment".

It doesn't necessarily help to replace "goodness" with "friendliness", although I do agree that "friendliness" seems like a better pointer towards the impossibly simple kind of benevolence that we seek to create.

A second point I think is underlying your comment (correct me if I'm wrong) is that perhaps there is some objective good, but that it isn't good for us (e.g. extreme environmentalism). I think this is a very reasonable concern if we imagine that there might be some particular operationalization of objective goodness that is the one-and-only final operationalization of objective goodness. If we imagine that such an operationalization might one day be discovered by us or by an AI, then yes, it's well worth asking whether this operationalization is in fact good for us. But luckily I don't think any such final operationalization of objective goodness exists. There just is no such thing, in my view.

Our task, then, in my view, is to make sure we don't build powerful systems that behave as though there is some final operationalization of objective goodness. Yet it seems that any tangible system whatsoever is going to behave according to some kind of operationalization of terminal goals implicit in its design. So if these two claims are both true then how the heck do we proceed? This is again what I am calling the "hard problem of alignment".