WilliamKiely

Posts

WilliamKiely's Shortform (5 karma · 1mo · 2 comments)
Why did interest in "AI risk" and "AI safety" spike in June and July 2025? (Google Trends) [Question] (32 karma · 1mo · 4 comments)
A Cheeky Pint with Anthropic CEO Dario Amodei (10 karma · 1mo · 3 comments)
Geoffrey Hinton - Full "not inconceivable" quote (21 karma · 2y · 2 comments)
Transcript: NBC Nightly News: AI ‘race to recklessness’ w/ Tristan Harris, Aza Raskin (63 karma · 2y · 4 comments)
[Linkpost] GatesNotes: The Age of AI has begun (19 karma · 2y · 9 comments)
Can AI systems have extremely impressive outputs and also not need to be aligned because they aren't general enough or something? [Question] (6 karma · 3y · 3 comments)
DeepMind: The Podcast - Excerpts on AGI (99 karma · 3y · 12 comments)
[Expired] 20,000 Free $50 Charity Gift Cards (62 karma · 5y · 13 comments)

Comments

WilliamKiely's Shortform
WilliamKiely · 1mo

-5 agreement karma from 3 people, but I have no indication of why people disagree. The point of writing this up was to find out why people disagree, so it'd be helpful if someone offered an explanation for their view.

WilliamKiely's Shortform
WilliamKiely · 1mo

P(extinction by 2050 | doom) > 75% is way too high, right?

I commented about this here.

My views on “doom”
WilliamKiely · 1mo

In Vitalik Buterin's recent appearance on Liron Shapira's Doom Debates podcast, Vitalik stated that his p(doom) is currently about 12% and his p(extinction by 2050) is at least 9%. Since extinction by 2050 entails doom, this means Vitalik's p(extinction by 2050 | doom) > 75%, which I think is way too high.

My reading of Paul's views stated here is that Paul's p(extinction by 2050 | doom) is probably < 30%.

I referenced Paul's views in my comment on the Doom Debates YouTube video, which I'm copying here in case someone thinks my reading of Paul's views is mistaken, so they can correct me:

Vitalik's p(extinction by 2050 | doom) > 75%.

Vitalik's numbers concretely: 9+% / 12% > 75%.

I think this >75% is significantly too high.

In other words, I think Vitalik seems to be overconfident in near-term extinction conditional on AI causing doom eventually.

I'm not the only person with high p(doom) (note: my p(doom) is ~60%) who thinks that Vitalik is overconfident about this.

For example, Paul Christiano, based on his numbers from his 2023 "My views on “doom”" post on LessWrong, thinks p(extinction by 2050 | doom) < 43%, and probably < 30%. Paul's numbers:

1. "Probability that most humans die within 10 years of building powerful AI (powerful enough to make human labor obsolete): 20%"

2. "Probability that humanity has somehow irreversibly messed up our future within 10 years of building powerful AI: 46%".

So Paul's p(extinction by 2050 | doom) < 20%/46% = 43%.

Paul's p(extinction by 2050 | doom) is probably actually significantly lower than 43%, perhaps < 30% because "most humans die" does not necessarily mean extinction.

Note: Paul also assigns substantial probability mass to scenarios where less than half of humans die -- he thinks (as of his 2023 post) at least half of humans die in only 50% of takeover scenarios and in only 37% of non-takeover scenarios in which humanity has irreversibly messed up its future within 10 years of building powerful AI (doom). So I'd guess that his p(all humans die | most humans die within 10 years of building powerful AI) is < 70%, which is why I said his p(extinction by 2050 | doom) is probably < 30% (math: 70% of 43% is about 30%).

Another important factor that potentially brings Paul's conditional even lower than 30%: we might not get powerful AI before 2040. For example, if we get powerful AI in 2045, then even if it causes extinction, say, 9 years later in 2054, extinction has not occurred by 2050, even though doom and (relative to the arrival of powerful AI) near-term extinction both occur.

So Vitalik's p(extinction by 2050 | doom) > 75% is strongly in disagreement with e.g. Paul Christiano, even though Paul's p(doom) is much higher than Vitalik's.

I suspect that upon consideration of these points Vitalik would raise his unconditional p(doom) while keeping his p(extinction by 2050) roughly the same, but I'd like to actually hear a more detailed discussion with him on this.

My views, approximately:

P(extinction by 2050) = 10%

P(doom) = 60%

P(extinction by 2050 | doom) = 10%/60% = 17%, i.e. way less than 75%.
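
To make the arithmetic explicit, here's a minimal Python sketch of the conditionals above (my framing; the one assumption doing the work is that extinction by 2050 entails doom, so the conditional is just the ratio of the two unconditional probabilities):

```python
def p_extinction_given_doom(p_extinction_by_2050, p_doom):
    """P(extinction by 2050 | doom), assuming extinction by 2050 entails doom,
    so that P(extinction by 2050 AND doom) = P(extinction by 2050)."""
    return p_extinction_by_2050 / p_doom

print(p_extinction_given_doom(0.09, 0.12))  # Vitalik: 0.75, i.e. the >75% conditional
print(p_extinction_given_doom(0.20, 0.46))  # Paul, upper bound: ~0.43
print(0.70 * 0.43)                          # Paul, adjusting "most die" down to "all die": ~0.30
print(p_extinction_given_doom(0.10, 0.60))  # My numbers: ~0.17
```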

Why did interest in "AI risk" and "AI safety" spike in June and July 2025? (Google Trends)
WilliamKiely · 1mo

Great observation. If I remove "AI" from the searches, something strange is still happening in the last week of July, but it seems you're right that what's happening is not specific to AI safety / AI risk, so my interest in the phenomenon is now much reduced.

(I don't know why this is showing up as an Answer rather than just a reply to a comment.)

A Cheeky Pint with Anthropic CEO Dario Amodei
WilliamKiely · 1mo

I commented on the Substack:

John Collison: To put numbers on this, you've talked about the potential for a 10% annual economic growth powered by AI. Doesn't that mean that when we talk about AI risk, it's often harms and misuses of AI, but isn't the big AI risk that we slightly misregulated or we slowed down progress, and therefore there's just a lot of human welfare that's missed out on because you don't have enough AI?

Dario's former colleague at OpenAI, Paul Christiano, has a great 2014 blog post "On Progress and Prosperity" that does a good job explaining why I don't believe this.

In short, "It seems clear that economic, technological, and social progress are limited, and that material progress on these dimensions must stop long before human society has run its course."

"For example, if exponential growth continued at 1% of its current rate for 1% of the remaining lifetime of our sun, Robin Hanson points out each atom in our galaxy would need to be about 10140 times as valuable as modern society."

"So while further progress today increases our current quality of life, it will not increase the quality of life of our distant descendants--they will live in a world that is "saturated," where progress has run its course and has only very modest further effects."

"I think this is sufficient to respond to the original argument: we have seen progress associated with good outcomes, and we have a relatively clear understanding of the mechanism by which that has occurred. We can see pretty clearly that this particular mechanism doesn't have much effect on very long-term outcomes."

A Cheeky Pint with Anthropic CEO Dario Amodei
WilliamKiely · 1mo

Dario Amodei: "Now, I'm not at all an advocate of like, "Stop the technology. Pause the technology." I think for a number of reasons, I think it's just not possible. We have geopolitical adversaries; they're not going to not make the technology, the amount of money... I mean, if you even propose even the slightest amount of... I have, and I have many trillions of dollars of capital lined up against me for whom that's not in their interest. So, that shows the limits of what is possible and what is not."

Anthropic has a March 2023 blog post "Core Views on AI Safety: When, Why, What, and How" that says:

If we’re in a pessimistic scenario [in which "AI safety is an essentially unsolvable problem – it’s simply an empirical fact that we cannot control or dictate values to a system that’s broadly more intellectually capable than ourselves – and so we must not develop or deploy very advanced AI systems"]… Anthropic’s role will be to provide as much evidence as possible that AI safety techniques cannot prevent serious or catastrophic safety risks from advanced AI, and to sound the alarm so that the world’s institutions can channel collective effort towards preventing the development of dangerous AIs. If we’re in a “near-pessimistic” scenario, this could instead involve channeling our collective efforts towards AI safety research and halting AI progress in the meantime. Indications that we are in a pessimistic or near-pessimistic scenario may be sudden and hard to spot. We should therefore always act under the assumption that we still may be in such a scenario unless we have sufficient evidence that we are not.

So Anthropic has specifically written that we may need to halt AI progress and prevent the development of dangerous AIs, and now we have Dario saying that he is not at all an advocate of pausing the technology, and even going so far as to say that it's not possible to pause it.

In the same post, Anthropic wrote "It's worth noting that the most pessimistic scenarios might look like optimistic scenarios up until very powerful AI systems are created. Taking pessimistic scenarios seriously requires humility and caution in evaluating evidence that systems are safe."

It doesn't seem like Dario is doing what Anthropic wrote we should do: "We should therefore always act under the assumption that we still may be in such a [pessimistic] scenario unless we have sufficient evidence that we are not." We clearly don't have sufficient evidence that we are not in such a situation, especially since "the most pessimistic scenarios might look like optimistic scenarios up until very powerful AI systems are created."

Low P(x-risk) as the Bailey for Low P(doom)
WilliamKiely · 2mo

Meaningful use of "doom" seems hopeless.

I agree; it's a terrible term.

FWIW "existential catastrophe" isn't much better. I try to use it as Bostrom usually defined it, with the exception that I don't consider "the simulation gets shuts down" to be an existential catastrophe. Bostrom defined it that way, but in other forecasting domains when you're giving probabilities that some event will happen next year, you don't lower the probability based on the probability that the universe won't exist then because you're in a simulation, and so I don't think we should do that when forecasting the future of humanity either.

Then, as you allude to, there's also the problem that a future in which 99% of our potential value is permanently beyond our reach, but we go on to realize the remaining 1% of that potential, would be an unbelievably good future--yet it should probably still qualify as an existential catastrophe given the definition. People have asked before what percentage of our potential lost would qualify as "drastic" (50%, 99%, or what), and for practical purposes I usually mean something like >99.999%. That is, I don't count extremely good outcomes as existential catastrophes merely for being very suboptimal.

I think a better way to define the class of bad outcomes may be in terms of recent-history value. For example, if we say "one util" is the intrinsic value of the happiest billion human lives in 2024, then a future of Earth-originating life worth only 1000 utils or less would be extremely suboptimal. This would include all near-term extinction scenarios where humanity fails to blossom in the meantime, as well as scenarios where humanity doesn't go extinct for a long time but still fails to come anywhere close to achieving its potential (i.e. it includes many permanent disempowerment scenarios, etc.).
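
As a toy illustration of that threshold (the unit and the 1000-util cutoff are the ones stipulated above; the function name is mine and purely hypothetical):

```python
# "One util" is stipulated above as the intrinsic value of the happiest billion
# human lives in 2024; the 1000-util cutoff is also the number from the comment above.
DOOM_THRESHOLD_UTILS = 1_000

def is_bad_outcome(total_future_value_in_utils: float) -> bool:
    """Classify a future of Earth-originating life as belonging to the bad outcome
    class (near-term extinction without prior blossoming, permanent disempowerment,
    etc.) iff its total value is at most 1000 utils."""
    return total_future_value_in_utils <= DOOM_THRESHOLD_UTILS

print(is_bad_outcome(500))     # True:  e.g. extinction soon, little value realized first
print(is_bad_outcome(10**12))  # False: far short of our potential, but not in this class
```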

Low P(x-risk) as the Bailey for Low P(doom)
WilliamKiely · 2mo

In other words, a P(doom) of 20% is perfectly compatible with P(x-risk) of 90-98%.

The phrase "P(x-risk)" doesn't make sense since risks are probabilities so the phrase would mean a probability of a probability. I think what you meant by "P(x-risk)" is actually the probability of an existential risk occurring, which would be the probability of the outcome, i.e. "P(existential catastrophe)."

Assuming that, I agree with your statement, but not for the reason you say:

The reason your statement is technically true is that "P(doom)" refers specifically to existentially catastrophic outcomes caused by AI. So if you thought the probability that other things besides AI cause an existential catastrophe was 70-78%, then your p(existential catastrophe) could be 90-98% while your p(doom) was only 20%.
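
Spelled out numerically (treating AI-caused and non-AI-caused existential catastrophes as mutually exclusive outcomes, which is my simplification):

```python
p_doom = 0.20                         # P(existential catastrophe caused by AI)
p_other_x_catastrophe = (0.70, 0.78)  # P(existential catastrophe from non-AI causes)

# Treating the two as mutually exclusive outcomes, the probabilities just add.
p_x_catastrophe = tuple(round(p_doom + p, 2) for p in p_other_x_catastrophe)
print(p_x_catastrophe)  # (0.9, 0.98): P(doom) = 20% alongside P(x-catastrophe) of 90-98%
```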

But otherwise, the terribly ambiguous term "p(doom)" is (as I've always interpreted it, unless someone who uses it defines it otherwise) synonymous with "p(existential catastrophe caused by AI)".

DeepMind: The Podcast - Excerpts on AGI
WilliamKiely · 5mo

I was happy to see the progression in what David Silver is saying re what goals AGIs should have:

David Silver, April 10, 2025 (from 35:33 of DeepMind podcast episode Is Human Data Enough? With David Silver):

David Silver: And so what we need is really a way to build a system which can adapt and which can say, well, which one of these is really the important thing to optimize in this situation. And so another way to say that is, wouldn't it be great if we could have systems where, you know, a human maybe specifies what they want, but that gets translated into a set of different numbers that the system can then optimize for itself completely autonomously.

Hannah Fry: So, okay, an example then: let's say I said, okay, I want to be healthier this year. And that's kind of a bit nebulous, a bit fuzzy. But what you're saying here is that that can be translated into a series of metrics like resting heart rate or BMI or whatever it might be. And a combination of those metrics could then be used as a reward for reinforcement learning, if I understood that correctly?

Silver: Absolutely correctly.

Fry: Are we talking about one metric, though? Are we talking about a combination here?

Silver: The general idea would be that you've got one thing which the human wants, like "optimize my health." And then the system can learn for itself which rewards help you to be healthier. And so that can be like a combination of numbers that adapts over time. So it could be that it starts off saying, okay, well, you know, right now it's your resting heart rate that really matters. And then later you might get some feedback saying, hang on, you know, I really don't just care about that, I care about my anxiety level or something. And then it includes that in the mixture. And based on feedback it could actually adapt. So one way to say this is that a very small amount of human data can allow the system to generate goals for itself that enable a vast amount of learning from experience.

Fry: Because this is where the real questions of alignment come in, right? I mean, if you said, for instance, let's do a reinforcement learning algorithm that just minimizes my resting heart rate. I mean, quite quickly, zero is like a good minimization strategy, which would achieve its objective, just maybe not quite in the way that you wanted it to. I mean, obviously you really want to avoid that kind of scenario. So how do you have confidence that the metrics that you're choosing aren't creating additional problems?

Silver: One way you can do this is to leverage the same answer which has been so effective so far elsewhere in AI, which is that at that level, you can make use of some human input. If it's a human goal that we're optimizing, then we probably at that level need to measure, you know, and say, well, the human gives feedback to say, actually, you know, I'm starting to feel uncomfortable. And in fact, while I don't want to claim that we have the answers, and I think there's an enormous amount of research to get this right and make sure that this kind of thing is safe, it could actually help in certain ways in terms of this kind of safety and adaptation. There's this famous example of paving over the whole world with paperclips when a system's been asked to make as many paperclips as possible. If you have a system whose overall goal is really to, you know, support human well-being, and it gets that feedback from humans and it understands their distress signals and their happiness signals and so forth, the moment it starts to, you know, create too many paperclips and starts to cause people distress, it would adapt that combination and it would choose a different combination and start to optimize for something which isn't going to pave over the world with paperclips. We're not there yet. Yeah, but I think there are some versions of this which could actually end up not only addressing some of the alignment issues that have been faced by previous approaches to, you know, goal-focused systems, but maybe even, you know, be more adaptive and therefore safer than what we have today.
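
To make the mechanism Silver is gesturing at concrete, here's a minimal sketch of my own (not DeepMind's method; all metric names are hypothetical): the human states a high-level goal, the system optimizes a weighted combination of measurable proxies, and the weights adapt when the human signals that something is missing or uncomfortable.

```python
from dataclasses import dataclass, field

@dataclass
class AdaptiveReward:
    # Proxy metrics and their current weights (hypothetical example values).
    weights: dict = field(default_factory=lambda: {"resting_heart_rate": -1.0})

    def reward(self, metrics: dict) -> float:
        """Scalar reward: weighted sum of whichever proxies are currently tracked."""
        return sum(w * metrics.get(name, 0.0) for name, w in self.weights.items())

    def incorporate_feedback(self, metric: str, sentiment: float, lr: float = 0.1) -> None:
        """Human feedback ("I also care about my anxiety level", "this feels wrong")
        adds or re-weights a proxy rather than being baked in once at the start."""
        self.weights[metric] = self.weights.get(metric, 0.0) + lr * sentiment

# Usage: start by rewarding a lower resting heart rate, then the human signals
# that anxiety matters too, and the reward's composition shifts accordingly.
r = AdaptiveReward()
print(r.reward({"resting_heart_rate": 60}))                      # -60.0
r.incorporate_feedback("anxiety_level", sentiment=-1.0)
print(r.reward({"resting_heart_rate": 60, "anxiety_level": 3}))  # roughly -60.3
```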

Transformative VR Is Likely Coming Soon
WilliamKiely · 5mo

It's now been 2.5 years. I think this resolves negatively?
