Former AI safety research engineer, now AI governance researcher at OpenAI.



this text demonstrates that the standards by which we measure AI safety are standards that other systems we nevertheless depend on - e.g. other humans - do not meet.

I think we hold systems which are capable of wielding very large amounts of power (like the court system, or the government as a whole) to pretty high standards! E.g. a lot of internal transparency. And then the main question is whether you think of AIs as being in that reference class too.

I feel your story misses the thing that made the original so painful, though - that the joy of the group is supposedly only possible and conceivable because of the suffering of the child, and that the child wants out and begs for it and could be released, but is denied for the sake of the other members, as an active choice against even its most basic human rights.

Yes, I reject this part because I don't think that we live in the least convenient possible world, where cities like Omelas can only be accepted or rejected, never gradually improved.

And so I wanted to ask: could this sort of suffering still happen in a world where things aren't magic, where you can make incremental changes? And I think the answer is yes, for the reasons in the story—which I personally find much more poignant than the original.

+1, was gonna make this comment myself, but TurnTrout said it better.

Yes. I didn't intentionally post this; it seems to have been automatically crossposted from my blog (but I'm not sure why).

I'm open to deleting it, but there are already a bunch of comments; not sure what the best move is.

I'm noticing it's hard to engage with this post because... well, if I observed this in a real conversation, my main hypothesis would be that Alice has a bunch of internal conflict and guilt that she's taking out on Bob, and the conversation is not really about Bob at all. (In particular, the line "That kind of seems like a you problem, not a me problem" seems like a strong indicator of this.)

So maybe I'll just register that both Alice and Bob seem confused in a bunch of ways, and if the point of the post is "here are two different ways you can be confused" then I guess that makes sense, but if the point of the post is "okay, so why is Alice wrong?" then... well, Alice herself doesn't even seem to really know what her position is, since it's constantly shifting throughout the post, so it's hard to answer that (although Holden's "maximization is perilous" post is a good start).

Relatedly: I don't think it's an accident that the first request Alice makes of Bob (donate that money rather than getting takeout tonight) is far more optimized for signalling ingroup status than for actually doing good.

Just noting here that I broadly agree with Said's position throughout this comment thread.

It does seem like a large proportion of disagreements in this space can be explained by how hard people think alignment will be. It seems like your view is actually more pessimistic about the difficulty of alignment than Eliezer's, because he at least thinks it's possible for mechinterp to help in principle.

I think that being confident in this level of pessimism is wildly miscalibrated, and such a big disagreement that it's probably not worth discussing much further. Though I reply indirectly to your point here.

Have edited slightly to clarify that it was "leading experts" who dramatically underestimated it. I'm not really sure what else to say, though...

(COI note: I work at OpenAI. These are my personal views, though.)

My quick take on the "AI pause debate", framed in terms of two scenarios for how the AI safety community might evolve over the coming years:

  1. AI safety becomes the single community that's the most knowledgeable about cutting-edge ML systems. The smartest up-and-coming ML researchers find themselves constantly coming to AI safety spaces, because that's the place to go if you want to nerd out about the models. It feels like the early days of hacker culture. There's a constant flow of ideas and brainstorming in those spaces; the core alignment ideas are standard background knowledge for everyone there. There are hackathons where people build fun demos, and people figure out ways of using AI to augment their research. Constant interaction with the models allows people to gain really good hands-on intuitions about how they work, which they leverage into doing great research that helps us actually understand them better. When the public ends up demanding regulation, there's a large pool of competent people who are broadly reasonable about the risks, and can slot into the relevant institutions and make them work well.
  2. AI safety becomes much more similar to the environmentalist movement. It has broader reach, but alienates a lot of the most competent people in the relevant fields. ML researchers who find themselves in AI safety spaces are told they're "worse than Hitler" (which happened to a friend of mine, actually). People get deontological about AI progress: some hesitate to pay for ChatGPT because it feels like they're contributing to the problem (another true story); the dynamics around this look similar to environmentalists refusing to fly places. Others overemphasize the risks of existing models in order to whip up popular support. People are sucked into psychological doom spirals similar to how many environmentalists think about climate change: if you're not depressed then you obviously don't take it seriously enough. Just like environmentalists often block some of the most valuable work on fixing climate change (e.g. nuclear energy, geoengineering, land use reform), safety advocates block some of the most valuable work on alignment (e.g. scalable oversight, interpretability, adversarial training) due to acceleration or misuse concerns. Of course, nobody will say they want to dramatically slow down alignment research, but there will be such high barriers to researchers getting and studying the relevant models that it has similar effects. The regulations that end up being implemented are messy and full of holes, because the movement is more focused on making a big statement than figuring out the details.

Obviously I've exaggerated and caricatured these scenarios, but I think there's an important point here. One really good thing about the AI safety movement, until recently, is that the focus on the problem of technical alignment has nudged it away from the second scenario (although it wasn't particularly close to the first scenario either, because the "nerding out" was typically more about decision theory or agent foundations than ML itself). That's changed a bit lately, in part because a bunch of people seem to think that making technical progress on alignment is hopeless. I think this is just not an epistemically reasonable position to take: history is full of cases where even leading experts dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems. Either way, I do think public advocacy for strong governance measures can be valuable, but I also think that "pause AI" advocacy runs the risk of pushing us towards scenario 2. Even if you think that's a cost worth paying, I'd urge you to think about ways to get the benefits of the advocacy while reducing that cost and keeping the door open for scenario 1.

FWIW I think some of the thinking I've been doing about meta-rationality and ontological shifts feels like metaphilosophy. Would be happy to call and chat about it sometime.

I do feel pretty wary about reifying the label "metaphilosophy" though. My preference is to start with a set of interesting questions which we can maybe later cluster into a natural category, rather than starting with the abstract category and trying to populate it with questions (which feels more like what you're doing, although I could be wrong).
