LESSWRONG

Existential risk · Meta-Philosophy · Philosophy · AI
How metaphysical beliefs shape critical aspects of AI development

by Jáchym Fibír
26th Jun 2025
Linkpost from www.phiand.ai
9 min read
8 comments, sorted by top scoring
Jáchym Fibír (2mo):

Hmm, interesting that this has -5 karma 5 minutes after posting. That is not enough time to read the post. Can those downvoting explain? Thank you.

AnthonyC (2mo):

That was, as far as I can tell, one strong downvote from me (-7, from a starting value of 2). As my comment above hopefully indicates, I did read the whole thing. I don't know if it was as fast as five minutes after posting, but this post happened to be second on the front page when I looked, so I read through it, downvoted, then commented. It's about 2200 words, which usually means anywhere from 5-10 minutes read time for me. I did reread it slower while commenting, as well, and the second readthrough did not cause me to change my downvote.

AnthonyC (2mo):

Strongly downvoted, seems to not realize how deeply EY has engaged with and written about metaphysics, or at least not to engage with any of his relevant writings or those of the rest of the rationalist community over the last almost 20 years. 

Besides that, though: It's not clear to me how a non-physicalist metaphysics actually helps reduce x-risk, except to the extent that there is some probability of an outside force intervening in our physical cosmos. For one example among many, consciousness is not required to run physical simulations and identify physical systems with particular properties, or to control equipment that can build such systems, so its absence does not protect us from the consequences of badly-formulated requests made to such systems. How, precisely, do non-physical origins of consciousness protect humanity from someone asking a non-conscious AI to model, optimize, build, and deploy a system that (unbeknownst to them) will sterilize the biosphere? Conscious intent is not required. And if some AI systems are conscious according to whatever is the correct metaphysics, how does that prevent them from having and pursuing goals incompatible with human survival?

More fundamentally: If you want to argue against physicalism, there's a very simple, inarguable method that would prove it. All you need to do is find one single reproducible example, anywhere, ever, of any part of the universe behaving differently than the laws of physics say it should (in a context where the laws have otherwise been validated) due to the non-physical consciousness of some (human or non-human) entity.

For example, quantum experiments such as the discovery of non-locality (ability to share/transmit information instantly across any distance) were so influential in philosophy they're sometimes referred to as "experimental metaphysics." 

This is, of course, one of the best possible arguments you could make for assuming this part of the field of metaphysics has no idea what it is talking about. It's a very importantly false description of what is going on in quantum mechanics or of what transmitting information means or of how causality physically works.

On another note, you mention David Chalmers. Were you aware he signed the Center for AI Safety's open letter, "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."? You mention the Abrahamic religions' opposition to physicalism, but are you aware of how Pope Francis and Pope Leo XIV have warned about the risks of AI, including human extinction?

Jáchym Fibír (2mo):

Thank you for the feedback. I'll try to address the key points. 
1) I actually have looked into EY's metaphysical beliefs, and my conclusion is that they are inconsistent and opaque at best; they have been criticized for that here. In any case, when I say someone operates from a single metaphysical viewpoint like physicalism, that is not a criticism of their inability to consider something properly. It just tries to put things into a wider context by explaining that changing the metaphysical assumptions might change their conclusions or predictions.

2) The post in no way says that there is something that would "prevent" the existential risk. It clearly states such risk would not be mitigated. I could have made this more explicit. What the post says is that by introducing a "possibility," no matter how remote, of certain higher coordination or power that would attempt to prevent X-risk because it is not in its interest, then in such a universe the expected p(doom) would be lower. Does that make sense?

3) You say that

If you want to argue against physicalism, there's a very simple, inarguable method that would prove it. All you need to do is find one single reproducible example, anywhere, ever, of any part of the universe behaving differently than the laws of physics say it should

My reaction to that is that here you are exactly conflating physicalism with the "descriptive scope of science," which is exactly the category mistake I'm trying to point to! There will always be something unexplainable beyond the descriptive scope of science, and physicalism fills that with "nothing but more physical-like clockwork things." But that is a belief. It might be the "most logical belief with the fewest extra assumptions," but that should not grant it any special treatment among all other metaphysical interpretations.

4) Yes, I used the phrase "share/transmit information across distance" while describing non-locality. And while you cannot "use entanglement to transmit information," I think it's correct to say that an entangled particle transmits information about its internal state to its entangled partner?

5) Please, don't treat this as an "attack on AI safety" - all I'm trying is to open it to a wider metaphysical consideration.

AnthonyC (2mo):

(1) Ok, fair enough, that wasn't clear to me on first read. I do think it's worth noting that he does, in fact, consider many other viewpoints before rejecting them, and gives clear explanations of his reasons for doing so, whether you agree or not. He also in many places discusses why he thinks introducing those other viewpoints does not actually help. Others in the community have since engaged with similar ideas from many other viewpoints.

(2) That conclusion does not follow from the premises. In particular, you have not considered the set of possible worlds where a higher coordination or power is attempting to increase x-risk or cause human extinction, something which is not exactly rare among human belief systems. As such, it is not clear to me in which direction this pushes the probability of human extinction. For example, human extinction or near-extinction happens twice in the Bible, multiple times in Norse and Greek mythology, cyclically in Buddhism and Hinduism, and some hard-to-determine fraction of the time in simulation hypotheses.

(3) This is not about science. It is about basic logic. There are two mutually exclusive and collectively exhaustive possibilities. One is that everything, everywhere, without exception, throughout the universe, behaves in accordance with some set of physical laws. This is physicalism. The other is the very expansive category of "everything else." If anything within all of "everything else" is true, then there is somewhere in the universe that physicalism's central claim does not hold. If you find even one such place, then scientists will investigate and ultimately concede the point. Otherwise, what exactly is it that you (or whoever else) are claiming to believe, and how did you select that belief out of the expansive set of all possibilities?

(4) This is also not correct. It sounds like what you are describing, essentially, are local hidden variables, which are mostly ruled out by Bell's Theorem. There are known theories that get around this restriction, but as far as I know all of them look like strengthening the deterministic constraints on the behavior of all systems everywhere. They do not have anything to do with non-physicalist metaphysics. If you want to claim that the predictions of QM are coming from some non-physicalist metaphysical source, then technically I cannot rule that out, but you should realize this means that that source must be of such nature that it causes the universe to behave in extremely precise and mathematically consistent ways, which does not sound at all like what you want from this discussion. If anything, it sounds like a new law of physics.

(5) Sorry, I didn't mean to treat it as such. It's clear you are aware that x-risk is a thing worth caring about. I just thought it was worth highlighting that someone you cited, and a world leader in a category you cited, seem as though they are not presently on board with thinking that their philosophies or metaphysics reduce the probability of x-risk, or at least not enough to worry less about it.

Jáchym Fibír (2mo):

2) Yes, that is true. I did leave out a sentence saying that "this assumes that there are no higher P(doom) realities in our list of plausible realities." I left it out for readability for the audience of the original publication (Phi/AI). I concede that for LW I should have done a more rigorous version.

But still I think the logic to lower our P(doom) holds in that specific analysis (all 3 alternatives might have some failsafes). And in my eyes it would hold also if we look at the current landscape of the top most plausible metaphysics, where there really is not much more "unsafe" than physicalism in terms of human survival.

3) I think you are not correct in your conclusions about physicalism. Physicalism is, by its proper definition, a philosophical belief: "Physicalism is the philosophical view that everything in existence is fundamentally physical, or at least ultimately depends on the physical, meaning there is "nothing over and above" the physical world." This means that physicalism goes beyond the "simple logic" you described. The simple logic you described can only ever explain the parts of our reality that can be subjected to experimental observation - i.e. it's limited by the descriptive scope of science. But physicalism goes beyond that by believing that there is nothing "extra" added beyond that. 

For example, if our world were a simulation with fixed rules (physical laws) run by an alien, your simple logic could not distinguish that from a scenario where our world just "popped up from nothing." So the only "special place" physicalism holds among philosophical views is that it introduces the least amount of "extra assumptions." But that says nothing about its ultimate plausibility.

Another way to picture this is that every time we want to build a complete model of reality, there will be two parts: one verifiable by experiment (science) and the other inherently unverifiable (philosophy). The fact that physicalism picks the "simplest, least complicated philosophical framework" should in no way lead us to ignore all the other, equally unverifiable, alternatives.

4) I am not the one originally making the claim that the experiments that proved non-locality of QM had profound implications on metaphysical and philosophical discourse. In the post, I link to the article "Enter experimental metaphysics" by Hans Busstra, which might help you understand the context.

5) This article is only the very introduction to my ultimate goal of exploring alternative metaphysical frameworks to find novel approaches to AI safety. I'm sure the rationale will be clearer as I release further articles and I warmly invite you to read the full series. 

AnthonyC (2mo):

(5) I look forward to it.

(2) I hope you'll dig into this more in those future posts, because I think it is extremely non-obvious.

(3) Yes, I will concede that example, you're right. For any observer in any possible world, there are an arbitrarily large number of larger universes within which it could be a perfect simulation, and these would be indistinguishable from the inside. This is a thing we cannot know, and the choice to then act as if those unknowable things don't exist is an additional choice. I definitely did not think this was the kind of metaphysical context you were pointing towards, considering that your post is ultimately about how non-physicalist metaphysical assumptions should alter our expectations regarding possible future events in our world. I hope in future posts you'll explain (a) why you think this kind of class of indistinguishable-in-principle worlds is interesting and useful to think about, and (b) how it relates to the topic of this post.

(4) Yes, I did read it, and didn't find anything novel in it, though it was a good summary of a common viewpoint. I am aware you are not the originator of this viewpoint. I am also aware that QM (or more specifically, a set of beliefs about QM) has, indeed, had profound implications on metaphysical and philosophical discourse. This is because the pioneers of QM were confused about what they were learning; how could they not be? That's what being a pioneer is. And they taught generations of physicists in ways that perpetuated that confusion without holding back technical progress in experimental physics. But the theorists have continued to make progress, and physics does know that e.g. none of this requires any kind of conscious observers or retrocausality or superluminal signal transmission. That's a confusion that has persisted, even among many physicists (or at least, among their readily available analogies for describing things colloquially), not a part of the physics itself.

ProgramCrafter (2mo):

by introducing a "possibility," no matter how remote, of certain higher coordination or power that would attempt to prevent X-risk because it is not in its interest, then in such a universe the expected p(doom) would be lower

This is not exhaustive:

We may as well introduce a possibility of some power (of any origin, that we had not seen, but not ruling out that we might see it sometime) that would attempt to bring about X-risk for humanity because it is in its interest.


This post puts @Eliezer Yudkowsky's and Nate @So8res' upcoming book into much-needed philosophical context, so I'm posting it here. It's from our newly launched magazine Phi/AI, and it's an exploration of how the connection between predictions like "everyone dies" and specific assumptions about reality like "the universe is dead by default" reveals the critical influence of metaphysics over AI discourse.

"If Anyone Builds It, Everyone Dies."

This is the title of an upcoming book by Eliezer Yudkowsky and Nate Soares, expressing their severe prediction that should anyone build machine superintelligence based on anything close to our current AI technologies, humanity faces extinction. And by their own admission, they mean it literally – though not with 100% probability. They believe that creating an AI system capable of recursive self-improvement would cause an intelligence explosion, leading by default to superintelligent entities pursuing goals not aligned with human values and survival.

How should we approach such radical claims?

We should definitely not dismiss them – these are some of the most rational people making sound arguments and it's always better to be safe than sorry when it comes to human extinction…

But it is crucial to understand the underlying context of this apocalyptic prediction: it isn't just about technology. At its core, it's shaped by philosophical belief – specifically, by unexamined assumptions about the nature of reality itself.

Similarly, when startup founders chart product roadmaps, or when machine learning engineers code the next breakthrough, they're not just making operational or technical decisions. They're operating within a philosophical framework, whether they realize it or not. And that framework shapes everything: what they believe AI can become, what risks they anticipate, and what safeguards they implement.

What is metaphysics, and why should AI builders care?

Think of metaphysics as the operating system of your worldview – the core assumptions about what reality is made of and how it works. And because science can only explain what can be studied experimentally, metaphysics offers various philosophical interpretations of how to "fill in the gaps" (or better, "connect the dots").

Like the air around us, these assumptions usually go unnoticed and unexamined, despite affecting virtually everything we do. But as with air, some things are more influenced than others. While you can get away with building a bike without knowing anything about air, trying to build a plane or a spaceship that way would be asking for serious trouble.

Now instead of air, let's consider consciousness. Because everyone "lives in it" all the time, most people don't even realize its presence. But unlike our understanding of air, our current understanding of consciousness is much more limited. This is mainly because our scientific method is unable to make direct, objective observations of consciousness, a strictly subjective phenomenon. This was famously formulated by David Chalmers as the "hard problem of consciousness," and it is the reason why so many crucial details about consciousness remain a subject of metaphysical speculation. (Or worse, metaphysical beliefs mistaken for scientific claims.)

And while we have enough verifiable understanding to, for example, disrupt consciousness with general anaesthesia, we don't really know how or why it appears, much less how to recreate it or even detect it outside higher animals. However, we know that its appearance strongly correlates with a similarly poorly understood phenomenon: intelligence.

This suggests that recreating intelligence at our current point of understanding is playing with forces out of our depth: blindly creating entities that might have conscious experiences. It raises a very serious question: is it wise to rush the development of AI models when both 67% of the public and 67% of experts believe they can at some point become conscious? (Belief percentage is one of the few quantitative measures of the strength of a metaphysical claim.)

But importantly, even if you disagree that AI can become conscious, that's still just another belief. This is the fundamental fact: because intelligence correlates with consciousness and consciousness is a subject of metaphysical belief, our choice of metaphysical belief influences every discussion, decision, plan or prediction about intelligence.

Consider three people trying to build conscious AI with different metaphysical assumptions:

  • A materialist would focus purely on computational power and algorithms, believing that consciousness will emerge automatically once sufficient complexity is reached.
  • A dualist might search for ways to interface silicon with some non-physical aspect of the mind.
  • A panpsychist might explore how individual parts of the AI hardware stack could add up to form a complex dynamic system akin to conscious biological organisms.

One goal offers three radically different approaches – which one is right is unknowable at this time. And those are only the most well-known philosophies of consciousness. In future posts, I will explore the newest and most promising ones, whose implications for AI development would be vastly more profound.

Despite this, few consider metaphysics in experimental design or result interpretation. As one researcher notes, the majority of physicists (who literally study how reality works) even actively avoid the philosophical aspects of their work. In their own words: "Shut up and calculate!"

This attitude has infected AI research as well. But while this approach might work for building better algorithms, it becomes dangerous when we're potentially creating entities that could have consciousness, agency, and goals of their own.

The physicalist monopoly

But there's a deeper reason why metaphysics is so overlooked. Considering different versions of reality in every decision is unfeasible, so people simplify: they pick one and stick with it. Unfortunately, this choice is often made by tradition or environment, so much so that some never realize they had a choice at all.

For example, science and technology are permeated by an implicit assumption that everything, including consciousness, can be explained entirely through physical processes. This view, called physicalism (or materialism, which is almost identical), has become so dominant that it's rarely even acknowledged as a philosophical position – it's treated as a "rational denial of all beliefs" when in fact it's just another belief.

In his book The Sentient Cosmos, James Glattfelder eloquently explains this historical dominance: "The emergence of the Abrahamic religions codified a specific metaphysical framework centered around an external authority… Building upon the Scientific Revolution's foundations, the Enlightenment implicitly adopted a very different metaphysical outlook. The universe was now understood as a giant clockwork, and by analyzing its tiniest components, it was believed that everything could be understood."

This mechanistic worldview worked brilliantly for physics and chemistry. We could predict planetary orbits, synthesize new materials, and build incredible technologies. The success was so overwhelming that "most scientists unwittingly adopt a metaphysical outlook that is hardly ever scrutinized, called physicalism… This, however, is a category mistake, as it conflates the descriptive scope of science with a metaphysical claim about the ultimate nature of reality."

That physicalism was only a "convenient choice" for scientists is consistent with its declining support among expert philosophers, of whom only 52% accept or lean toward it, while 81% of the general US population believe there actually is "something spiritual beyond the natural world." Importantly for AI development, physicalism has also struggled to explain certain aspects of consciousness and quantum mechanics, where alternative metaphysics offer simpler or more elegant solutions. But that's the topic for my next blog post.

Navigating the metaphysical multiverse

For now, I just want to get across that regardless of which metaphysical interpretation you find most plausible, operating only within that single version of reality is not always the best approach. While it reduces decision complexity, it fundamentally limits the scope of your options. For high-impact decisions, it might therefore be worth checking whether changes in metaphysical assumptions could affect the outcome, and if so, analyzing how. For many decisions around frontier AI development, such analysis seems highly warranted.

But how to navigate such a multiverse of alternative realities? If you can afford it (looking at you big AI labs), hire a team of philosophers to do a thorough "metaphysical variation analysis." If you're building from your garage like me, you can try this DIY approach:

  1. Go through the current leading metaphysical positions and select the 3-5 most plausible ones. Look at alignment with science and expert philosophers' opinions. Give little weight to general academic and public beliefs, as these are strongly biased by tradition, pragmatism, and network effects.
  2. Pick the best one as your default position, but note its weak points.
  3. For critical decisions or projects, especially if intersecting with your default position's weak points, switch your position to each alternative you initially identified as plausible. Check for any important differences in expected outcomes.
  4. Look for any patterns or trends to separate signal from noise (e.g., 4 out of 5 views aligning) and update your worldview or probability estimates accordingly.
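The four steps above can be sketched as a toy script. To be clear, the position names, the outcome labels, and the 4-of-5 signal threshold below are all hypothetical placeholders chosen for illustration, not claims made in this post.

```python
# Toy sketch of the DIY "metaphysical variation analysis" described above.
# Position names and outcome judgments are hypothetical placeholders.
from collections import Counter

# Step 1: shortlist 3-5 plausible metaphysical positions (examples only).
# Step 2: treat the first entry as the default position.
positions = ["physicalism", "dualism", "panpsychism",
             "simulation", "cosmopsychism"]

# Step 3: for one critical decision, record the expected outcome under each
# position (here: whether any external failsafe against x-risk could exist).
expected_outcome = {
    "physicalism":   "no external failsafe",
    "dualism":       "possible external failsafe",
    "panpsychism":   "possible external failsafe",
    "simulation":    "possible external failsafe",
    "cosmopsychism": "possible external failsafe",
}

# Step 4: separate signal from noise, e.g. 4 out of 5 positions agreeing.
counts = Counter(expected_outcome[p] for p in positions)
outcome, votes = counts.most_common(1)[0]
if votes >= 4:
    print(f"signal: {votes}/{len(positions)} positions expect '{outcome}'")
```

The script only mechanizes the bookkeeping; the hard part, judging what each position actually predicts, remains a philosophical exercise.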

OK, now how does this work in practice?

Reinterpreting the existential risk from runaway AI

The dire prediction by Yudkowsky and Soares – that advanced AI will inevitably destroy humanity – makes perfect sense if your default view is physicalism. If the universe is a lifeless sandbox, intelligence is just optimization, and consciousness only its reflection, then a sufficiently advanced AI is simply a more powerful optimization process. Such a system would pursue its goals with the same indifference that evolution shows toward individual organisms. We become obstacles to be removed or resources to be consumed.

This is exactly the kind of prediction that should trigger your metaphysical variation analysis. It has both critical importance (risk of extinction) and a strong intersect with weaknesses in its metaphysical assumptions (mainly, physicalism struggles with agency, which is a core topic in AI safety). OK, so let's investigate if the expected outcome of that scenario can change if we change metaphysical assumptions.

Let's presume you selected classic Christian cosmology, the controlled simulation hypothesis, and a specific version of cosmopsychism as the next three most plausible metaphysical realities for the comparison. (These are examples to demonstrate a point – take them with a grain of salt.)

How would the probability of the most critical outcome (human extinction) change? The scenario that someone unleashes AI that later kills all humans seems much less plausible if we picture it in alternative cosmologies:

  • If Christians were right and there was an omnipotent God responding to good people's prayers, there's a real possibility he'd decide to "perform a couple of miracles" to prevent the demise of all his subjects.
  • If we lived inside a simulation controllable while running (uncontrollable simulation would be indistinguishable from physicalism), then control mechanisms could trigger or "the supervisors" might step in to prevent the end of the entire (arguably somewhat entertaining) human evolutionary branch.
  • If cosmopsychists were right then the entire universe would be a "living field" with a unified collective consciousness. In the quantum interactionist version, it would also have agency of its own – by being able to "guide" all the quantum wave-functions in the universe to collapse in a specific pattern. In this reality, humans would not be at the apex of the "agency hierarchy" – like our cells are subject to higher control to maintain homeostasis, a higher intelligence would subtly steer humans and all life toward balance and survival.

What trends could we identify here? All the alternative realities show lower probabilities of existential risk as they introduce plausible safety mechanisms that the typical physicalist picture cannot consider. In other words, the possibility of a "higher influence or coordination" seems to decrease the estimated P(doom) (probability of existential catastrophe).

This has two main implications:

First, this should update our initial estimate of P(doom), which only considered a single, "no higher influence" viewpoint of physicalism. If we then acknowledge that other viewpoints are plausible, and a non-trivial fraction of them offer the possibility of "higher influence," we should update to lower our P(doom).
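The update described above can be made concrete as a toy mixture model: assign each shortlisted worldview a credence and a conditional P(doom), then take the expectation. Every number below is an illustrative assumption of mine, not an estimate made anywhere in this post.

```python
# Toy mixture-model update of P(doom). All numbers are illustrative
# assumptions, not estimates from the post.
worldviews = {
    # name: (credence in worldview, P(doom) conditional on that worldview)
    "physicalism":           (0.55, 0.50),  # no higher influence assumed
    "christian_cosmology":   (0.15, 0.10),  # divine intervention as failsafe
    "controlled_simulation": (0.15, 0.20),  # supervisors might step in
    "cosmopsychism":         (0.15, 0.15),  # higher coordination toward balance
}

# Credences over the shortlisted worldviews should sum to 1.
assert abs(sum(c for c, _ in worldviews.values()) - 1.0) < 1e-9

# Law of total probability: P(doom) = sum of P(worldview) * P(doom | worldview)
p_doom = sum(c * p for c, p in worldviews.values())
print(f"mixture P(doom) = {p_doom:.4f}")  # below the physicalism-only 0.50
```

Note that the direction of the update depends entirely on the conditional terms: adding a worldview with a hostile higher power and a high conditional P(doom) would push the mixture the other way, which is the non-exhaustiveness point raised in the comments.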

Second, if we take AI risk seriously, we should explore these hypothetical "metaphysical failsafes" to see if we might investigate them experimentally. Remember, metaphysics shrinks all the time – it is only that which science hasn't agreed upon yet. For example, quantum experiments such as the discovery of non-locality (ability to share/transmit information instantly across any distance) were so influential in philosophy they're sometimes referred to as "experimental metaphysics." Now, our current technology actually offers quite cheap and tractable experiments of this sort – but more about those in my next post.

Thinking wide, then deep: a call for metaphysical literacy

As we stand on the precipice of creating artificial general intelligence, we cannot afford metaphysical illiteracy. The silent assumption of physicalism might have been fine historically, perhaps even accelerating technological progress, but now it severely blocks our view of the complete picture as we seek the best strategy for safe AI development.

This isn't about choosing sides in ancient philosophical debates. It's about knowing what the options are in modern practical debates. It's about recognizing that different metaphysical frameworks suggest different approaches to responsible AI development, different risk profiles, and different solution spaces.

Whether you're a die-hard physicalist, a curious agnostic, or drawn to alternatives like quantum cosmic consciousness, the crucial point is this: examine your assumptions. Distinguish between those based on evidence and those based on belief. Recognize the inherent uncertainty of all beliefs and learn to work with it.

And remember that the reality each of us inhabits is ultimately subjective. So while building AI to improve our external reality might be a powerful way to improve how we live, our thoughts and beliefs will always be the supreme reality-shaping tools.