Too Many Metaphors: A Case for Plain Talk in AI Safety

by David Harket
30th May 2025

8 comments, sorted by top scoring

TAG · 3mo

"Current utility functions do not optimise toward values in which humans are treated as morally valuable."

You mean "utility functions" of current AI's? How literally do you mean UF? Current AI's dont have UF's in the technical sense.

Reply
David Harket · 3mo

You're right, current AIs don't have utility functions in the strict formal sense. I was using the term loosely to refer to the optimisation objectives we train them on, like loss functions or reward signals. My point is that the current objectives do not reliably reflect human moral values. Even if today's systems aren't agents in Yudkowsky's sense, the concern still applies: as systems gain more general capabilities, optimisation toward poorly aligned goals can have harmful consequences.

Reply
TAG · 3mo

"I was using the term loosely"

Yet calling for literalism!

"My point is that the current objectives do not reliably reflect human moral values"

Assuming there is such a coherent entity. And assuming that it is still a problem when the AI is not an agent.

The historic AI doom arguments have a problem: they assume a bunch of things which aren't necessarily true. And many renditions of them for public consumption have a further problem: they gesture towards these assumptions as though they are widely accepted when they are not. The general public will reject an argument using the term "utility function" because they don't know what it is; and those knowledgeable about AI will reject it because they do. In their eyes, you are saying something false. But you need to come up with arguments that are valid before you worry about the PR issue.

Reply
David Harket · 3mo

I’d say there’s a meaningful distinction between literalism and what I’m advocating. I’m not arguing for rigid formalism or abandoning all metaphor. I’m calling for clarity, accessibility, and a prioritisation of core arguments, especially when communicating with people outside the field.

Your first critique concerns my statement that "Current utility functions do not optimise toward values in which humans are treated as morally valuable." I agree this could have been phrased more precisely, for example: "Current AI systems are not necessarily trained to pursue goals that treat humans as morally valuable." That's a fair point. I was using "utility function" loosely to refer to optimisation objectives (e.g., loss functions, reward signals), not in the strict agent-theoretic sense.

But the purpose of my post isn’t to adjudicate whether the conclusions drawn by Yudkowsky and others are true or not. I fully acknowledge that the arguments rest on assumptions and that there’s room for serious debate about their validity. I should (and probably will) think more about that (after my exams).

What I am addressing in this post is a communication issue. Even if we accept the core arguments about the risks of developing powerful misaligned AI systems, such as those based on instrumental convergence and the orthogonality thesis, I believe these risks are often communicated in ways that obscure rather than clarify. This is particularly true when metaphors become the primary framing, which can confuse people who are encountering these ideas for the first time.

So to clarify: I’m not trying to resolve the epistemic status of AI risk claims. I’m making a narrower point about how they’re presented, and how this presentation may hinder public understanding or uptake. That’s the focus of the post.

Reply
don't_wanna_be_stupid_any_more · 3mo

I agree that EY's earlier attempts at advocating for AI safety were not the best and were often counterproductive, but I think you are underestimating just how hard it is to communicate those ideas to a lay audience. For example, I tried to discuss this topic with a few friends from my university's IT faculty. They are NOT lay people and they have background knowledge, yet despite having studied this subject for over a year, I was unable to get my point across.

Talking to people is a skill that needs training. You can't expect someone, no matter how smart or knowledgeable they are, to come out of the gate swinging with max charisma; some mistakes need to be made.

EY has improved over the last year; his latest appearance with Robinson Erhardt was a major improvement.

Reply
David Harket · 3mo

I agree that no matter how smart or knowledgeable someone is, it's rare to come out of the gate with perfect communication skills. And I agree these ideas are genuinely hard to convey to non-experts.

That said, my intuition is that the risk of AGI is better communicated through distilled versions of the core arguments, like instrumental convergence and the orthogonality thesis, rather than via anthropomorphic or futuristic metaphors.

For example, I recently tried to explain AGI risk to my dad. I started with the basics: the problem of misaligned AGI, current alignment limitations, and how optimisation at scale could lead to unintended consequences. But his takeaway was essentially: “Sure, it’s risky to give powerful tools to people with different values than mine, but that’s not existential.”

I realised I hadn’t made the actual point. So I clarified: the danger isn’t just in bad actors using AI, but in AIs themselves pursuing goals that are misaligned with human values, even if no humans are malicious.

I used a simple analogy: “If you're building a highway and there’s an anthill in the way, you don’t hate the ants—you just don’t care.” That helped. It highlighted two things: (1) we already treat beings with different moral weight indifferently, and (2) AIs might do the same, not out of malice but out of indifference shaped by their goals.

My point isn't that analogies are always bad. But they work best when they support a precise concept, not when they replace it. That's why I think Yudkowsky's earlier communication sometimes backfires: it leaned too hard on the metaphor and not enough on the logic (which definitely exists; I have a high regard for his work). I'll check out the Robinson Erhardt interview, though; if it's a shift in tone, that's good to hear.

Reply
Arcturus · 3mo

His metaphors serve as useful intuition pumps for ideas that are highly abstract and far removed from anything the average layperson has thought about before. When I started reading about AI risk, many of the ideas I was introduced to, like optimization and orthogonality, were completely novel, and I had a hard time understanding them. The various metaphors, analogies, and parables that exist within the AI-risk discussion were of significant benefit to me in gaining the necessary intuitions to understand the problem.

Reply
David Harket · 3mo

Thank you for the perspective. I mostly agree with your points, but I still feel the abstract metaphors unnecessarily detract from the core arguments. While this style may help more people grasp that "this group", represented by Yudkowsky, sees AGI as an existential threat, I believe the excessive use of metaphor weakens the clarity and force of the underlying reasoning.

To draw an analogy of my own :-) it feels a bit like teaching Sunday school Christianity to adults: offering simplified narratives while obscuring the more nuanced structures behind the beliefs. It might serve as a useful entry point for some, but for many, it comes across as unrealistic and simply too far removed from their lived experience to be taken seriously. (For the record, I'm not a Christian.)

That said, I do see your point, and I may be mistaken in my intuition. My observation is anecdotal, and you've just provided a counterexample to the ones I’ve encountered.

Reply

I changed the title of the post from "Yudkowsky does not do alignment justice" to "Too Many Metaphors: A Case for Plain Talk in AI Safety". In hindsight, the original title was a bit too sensationalist and unclear. Yudkowsky has done more for the field than most. The purpose of this post is not to disregard any of his words, but rather to share observations and ideas about how to potentially communicate the field's relevance more effectively.

For the past few months, I have been immersing myself in the AI safety debate here on LessWrong as part of a research project. I was somewhat familiar with the topic of alignment, having previously read Stuart Russell and listened to a few podcasts with Paul Christiano and Eliezer Yudkowsky. Starting off, I read AGI Ruin: A List of Lethalities, which prompted me to familiarise myself with the Orthogonality Thesis and Instrumental Convergence. Yudkowsky's posts on these two key topics made immediate sense, and the risk posed by a lack of alignment in sufficiently intelligent systems seemed to follow directly from them.

At a high level, the issue can be framed simply: Current utility functions do not optimise toward values in which humans are treated as morally valuable.

From this point, assuming continued exponential growth in AI systems, one can imagine a multitude of scenarios in which a system optimises towards a goal (e.g., build a whole-brain emulator) and humanity ends up as a casualty of the optimisation process. I see this risk, and I see other current risks, such as open-weights models with alignment measures fine-tuned away being accessible to bad actors. However, I believe Yudkowsky's current communication approach may be counterproductive, especially in formats where people outside the field are being introduced to alignment.

For instance, I just listened to a debate between Stephen Wolfram and Yudkowsky in which the majority of the discussion circled around defining positions. I must say that in those four hours, Yudkowsky was able to get some major points across; however, most of the content was obscured by metaphors. Metaphors can be valuable, but when talking about AI safety, I think the most effective way of explaining the risk is to stick to the basics. In this specific podcast, a lot of anthropomorphic comparisons were made between AGI/ASI and the European settlers of America. I see the comparison: European settlers had goals that were incompatible with the Native Americans' existence (to some degree). They wanted "gold", and Native American casualties were part of the path towards gold. However, when you already have concrete knock-down arguments, so to speak, in orthogonality and instrumental convergence, you should stick to them.

For my project, I have been working with group members who have not immersed themselves in the AI safety debate to the same extent as I have, and they often use these anthropomorphic comparisons (and similar futuristic outlooks, e.g., nanobots) as grounds to dismiss the position people like Yudkowsky hold. I think this is uncharitable; however, it seems to be a reality. A goal of communicating the risks associated with developing AI systems should be to spread attention AND to convince people that this is worth their time thinking about.

To finish off, I have to say that Eliezer Yudkowsky has made invaluable contributions to the AI safety discussion, and I am grateful for his work. Without him, it would probably not be on my mind. However, these ideas should be communicated more simply, and abstract scenarios should not be the gold standard. The space of possible risk scenarios is nearly endless, and trying to represent it with abstract examples like nanobot doom or paperclip maximisers does not convince people, because they cannot emotionally attach themselves to these scenarios.

The discussion should shift towards the concrete, not the abstract, to convince newcomers.