I was reading David Kinney’s interesting work from 2022 “Longtermism and Computational Complexity” in which he argues that longtermist effective altruism is not action-guiding because calculating the expected utility of events in the far future is computationally intractable. The crux of his argument is that longtermist reasoning requires probabilistic inference in causal models (Bayesian networks) that are NP-hard.[1]
This has important consequences for longtermism, as it is standardly utilized in the EA community, and especially for the works of Ord and MacAskill. Kinney suggests their framework cannot provide actionable guidance because mortal humans lack the computational bandwidth to do Bayesian updating. Therefore, the troubling conclusion is that utilizing this framework does not allow people to determine which interventions actually maximize expected value.
In this paper I want to show that even if we could magically solve Kinney’s inference problem (a genie gives us perfect probability distributions over every possible future), we still could not make definitive expected value comparisons between many longtermist strategies, because the comparison itself is an undecidable problem. Any intervention consists of a series of actions that ends up constraining the strategies still available to you. When we compare interventions we are comparing classes of possible strategies and trying to determine which is superior in the long run (dominance of constrained optima).
Because I am going to talk a lot about expected value, I want to be clear that I am not claiming that using it as a private heuristic is bad, but rather that many Longtermists use it as a public justification engine: a machine that mathematically shows what is more correct and what you should therefore do. That use of EV is the focus of this essay.
I show, using some standard CS results from the 2000s, that the retort of “can’t we just estimate it?” ends up being NP-hard, undecidable, or uncomputable to guarantee, depending on the restrictions imposed. This challenges a thread that persists in the EA/Longtermist community in 2025. For example, MacAskill continues to make strong dominance claims in his Essays on Longtermism. Even with the hedging included in his arguments (optimal policies are not required, approximations suffice at large scales, meta-options exist, etc.), serious computational roadblocks arise. For general policies the problem turns out to be undecidable. If you restrict yourself to memoryless stationary policies, then polynomial-time approximation is possible only if P=NP. And if we narrow further to the average-reward case, no computable approximation exists.
EAs frequently borrow epistemic credibility from very finite, well-scoped projects (say, distributing malaria nets) and then extend it, without warrant, into domains with extremely long (or infinite) timelines (panspermia, AI safety, etc.), where mathematical tractability can be shown to break down and the interventions cannot be compared against one another.
That said, not every Longtermist claim is this hard, and there are likely restricted domains where interventions are comparable. As a general schema, however, it falls apart and cannot guarantee correctness. Longtermists who want to claim superiority via mathematical maximization must specify how they are simplifying their models and show why those simplifications have not defined away the critical elements of the future that longtermists vaunt.
Context
Greaves and MacAskill claim dominance for moral action on the basis of EV when they write:
“The potential future of civilisation is vast... [therefore] impact on the far future is the most important feature of our actions today”
which they then formalize as:
"Axiological strong longtermism (ASL): In the most important decision situations facing agents today... (ii) Every option that is near-best overall delivers much larger benefits in the far future than in the near future."
This notion can be expressed as $V^*(I_A) > V^*(I_B)$, with $V^*$ representing the optimal expected value achievable under intervention $I_A$ versus $I_B$. Such a statement requires a methodological guarantee to gain authority as a ranking procedure (i.e., you need to be able to demonstrate why intervention $I_A$ is superior to $I_B$). Such claims are crucial to the justification of longtermism as a methodologically superior and more moral reasoning procedure for these questions.
When Kinney presented his results showing inference to be NP-hard, a standard response could be that bounded agents, which don’t require exact probabilities, are sufficient. So let us grant even more than a bounded agent: an agent with a perfect probabilistic representation of the world. For the model classes used by longtermists, the optimization (control) problem turns out to be a distinct, undecidable problem. In other words, even if some deus ex machina saved the inference problem, Longtermists still would not be able to fix the control problem.
A Model of Interventions
To model these kinds of real-world moral decisions about the far future, we should select a method with action-conditioned dynamics (that is, a person or agent can influence the world) that is also partially observable (we can’t know everything about the universe, only a limited slice of it). To achieve this it is sensible to use a finite-description Partially Observable Markov Decision Process (POMDP), formally defined here as:
$$M = (S, A, O, T, \Omega, R, \gamma, b_0)$$
Where $S$, $A$, and $O$ refer to the states, actions, and observations available to the agent. The function $T$ is a transition function giving the probability of a state change given an action. $\Omega$ captures the observation probabilities and $R$ is the reward function. $\gamma \in (0,1)$ is the discount applied to a reward based on how far in the future it arrives, but note that the results below hold even if you remove discounting. Finally, $b_0$ represents the initial probability distribution over states.
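To make the object concrete, here is a minimal sketch of a finite-description POMDP as a plain data structure. This is purely illustrative: the field names and the dictionary-of-dictionaries encoding are my own choices, not anything from the papers cited below.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class POMDP:
    """A finite-description POMDP M = (S, A, O, T, Omega, R, gamma, b0).

    All names here are illustrative; any finite encoding of these tables
    would do for the results discussed in the text.
    """
    states: List[str]                                # S
    actions: List[str]                               # A
    observations: List[str]                          # O
    T: Dict[Tuple[str, str], Dict[str, float]]       # T[(s, a)][s'] = P(s' | s, a)
    Omega: Dict[Tuple[str, str], Dict[str, float]]   # Omega[(a, s')][o] = P(o | a, s')
    R: Dict[Tuple[str, str], float]                  # R[(s, a)] = immediate reward
    gamma: float                                     # discount factor in (0, 1)
    b0: Dict[str, float]                             # initial belief over states
```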
It is important to distinguish between the levels of control necessary for complex, open-ended futures (General Policies, $\Pi_{\mathrm{gen}}$), the limited capabilities of agents with bounded memory (Finite State Controllers, $\Pi_{\mathrm{FSC}}$, i.e. bounded agents), and Stationary Policies ($\Pi_{\mathrm{stat}}$), which are memoryless. The distinction matters because the policy class you reason about should mirror the policy class you use to justify your claims. For example, it is not coherent to assume access to general policies about the far future, but then retreat to bounded agents and claim the math is still provably on your side.
I am going to model an intervention $I$ as a constraint on the admissible policy set, because real-world interventions usually specify the initial step rather than the entire series of actions over all time. So you might commit to distributing malaria nets at $t=0$, and then pursue an optimal strategy from that point on. Let $\Pi_I \subseteq \Pi$ be the set of policies consistent with intervention $I$, and let $V^*(M; I)$ represent the maximum, or perfect, expected value of the intervention:
$$V^*(M; I) = \sup_{\pi \in \Pi_I} \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t)\right]$$
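For a single fixed policy, the expression above is just an expectation over trajectories, so it can be estimated by sampling. Here is a hedged sketch, assuming the hypothetical POMDP structure above and a policy given as a function from observation histories to actions. It estimates $V^{\pi}$, not $V^*$; nothing in it touches the optimization over $\Pi_I$, which is where the trouble lives.

```python
import random
from typing import Callable, Dict, Tuple

def estimate_policy_value(pomdp: "POMDP",
                          policy: Callable[[Tuple[str, ...]], str],
                          episodes: int = 1000,
                          horizon: int = 200) -> float:
    """Monte Carlo estimate of V^pi(M) for ONE fixed policy.

    The horizon is truncated, which is harmless for estimating a single
    policy under discounting (the tail is bounded by gamma^horizon), but
    is exactly the move that cannot be waved through when the claim is
    about optimal values.
    """
    def sample(dist: Dict[str, float]) -> str:
        outcomes, probs = zip(*dist.items())
        return random.choices(outcomes, weights=probs, k=1)[0]

    total = 0.0
    for _ in range(episodes):
        s = sample(pomdp.b0)                   # draw an initial state from b0
        obs_history: Tuple[str, ...] = ()
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(obs_history)            # act on the observation history
            ret += discount * pomdp.R[(s, a)]  # collect discounted reward R(s_t, a_t)
            s_next = sample(pomdp.T[(s, a)])   # state transition
            o = sample(pomdp.Omega[(a, s_next)])
            obs_history += (o,)
            s, discount = s_next, discount * pomdp.gamma
        total += ret
    return total / episodes
```

A trivial example policy would be `lambda history: pomdp.actions[0]`; the theorems below concern the search over all such policies, not the evaluation of any one of them.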
We can then define the problem of determining the superior intervention, given $M$, $I_A$, and $I_B$, as deciding whether:
$$V^*(M; I_A) > V^*(M; I_B)$$
There are three questions a Longtermist should be able to answer:
The Threshold Problem: is a specific standard of success mathematically achievable by some policy? Given $M$ and a rational $r$, does there exist a policy $\pi \in \Pi$ such that $V^{\pi}(M) > r$?
The Approximation Problem: can you output an estimated value $\hat{V}$ that is within a specified error bound of the true optimal value $V^*$? That is, can a bounded agent produce an approximation that is close enough to the true optimal value? Output a value $\hat{V}$ such that $(1-\epsilon)V^* \le \hat{V} \le (1+\epsilon)V^*$ (multiplicative) or $|V^* - \hat{V}| \le \delta$ (additive).
The Dominance Problem: given a formal model of cause prioritization $(M, I_A, I_B)$, can you show that the optimal value of $I_A$ is strictly greater than the optimal value of $I_B$? Is $V^*(M; I_A) > V^*(M; I_B)$?
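To make explicit what answering these questions would require, here is a sketch of the three problems as hypothetical function signatures; the names and types are mine, and the point of the theorems below is precisely that these functions cannot be implemented with the guarantees their docstrings promise.

```python
from fractions import Fraction

# Hypothetical signatures only -- the theorems below say these cannot exist
# with the guarantees promised in their docstrings.

def threshold(M: "POMDP", r: Fraction) -> bool:
    """Return True iff some policy pi in Pi_gen achieves V^pi(M) > r.
    Theorem 1: no such algorithm exists (undecidable)."""
    raise NotImplementedError

def approximate(M: "POMDP", epsilon: float) -> float:
    """Return V_hat with (1 - epsilon) V* <= V_hat <= (1 + epsilon) V*
    over stationary policies. Theorem 2: runs in polynomial time only if P = NP."""
    raise NotImplementedError

def dominates(M: "POMDP", I_A: "Intervention", I_B: "Intervention") -> bool:
    """Return True iff V*(M; I_A) > V*(M; I_B).
    Lemma 1 plus Theorem 1: undecidable in general."""
    raise NotImplementedError
```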
Three Theorems
To examine whether the three questions above are computationally tractable, I am going to utilize results from Madani, Hanks, and Condon (1999)[2] and Lusena, Goldsmith, and Mundhenk (2001)[3]. Can an algorithm exist that takes a longtermist model $M$ and outputs answers to the Threshold Problem and the Approximation Problem? After that I will examine the Dominance Problem.
Madani et al. demonstrated that when the time horizon is infinite, trying to verify that a specific value is achievable creates a paradox similar to the Halting Problem (of course Omega played a role in my thoughts on this project). I am evaluating the Threshold Problem for $\Pi_{\mathrm{gen}}$ (the broad policies required to model open-ended future strategies).
My first Theorem is derived from Madani et al. and says that for finite-description, infinite-horizon POMDPs, the Threshold Problem is undecidable under the discounted criterion when $\Pi$ includes implicit policy representations, and likewise under undiscounted total reward.
Theorem 1: Is there a $\pi \in \Pi_{\mathrm{gen}}$ such that $V^{\pi}(M) > r$? $\;\Longrightarrow\;$ Undecidable
This implies that for the general longtermist case, no algorithm exists that can definitively answer “can we achieve this value?”
My second Theorem examines the Approximation Problem. A Longtermist may downgrade the agent and assume it uses a restricted policy class, such as $\Pi_{\mathrm{stat}}$: memoryless maps $O \to A$. However, Lusena et al. demonstrated that these restrictions do not necessarily solve the tractability problem.
Theorem 2: A polynomial-time algorithm achieving
$$(1-\epsilon)\,V^*_{\Pi_{\mathrm{stat}}} \;\le\; \hat{V} \;\le\; (1+\epsilon)\,V^*_{\Pi_{\mathrm{stat}}}$$
exists if and only if P = NP.
This shows that for infinite-horizon POMDPs, under total discounted or average reward, computing an $\epsilon$-approximation of the optimal stationary policy’s value is NP-hard.
Using the same paper, I can show that if we adopt the average-reward criterion in an unobservable setting, the situation devolves further: there is no computable algorithm that can produce an approximation within an additive error $\delta$.
Theorem 3: For unobservable POMDPs under average reward with time-dependent policies, no computable $\delta$-approximation exists.
$$\nexists \text{ algorithm } K \text{ such that } |V^* - K(M)| \le \delta$$
These three Theorems, utilizing well-known results, show that for general policies the problem is undecidable and for restricted policies it is either NP-hard or not approximable.
Schema-Level Reduction
One objection a Longtermist might raise is that it is easier to calculate a preference order ($I_A$ is better than $I_B$) than an exact value ($I_A$ scores 9.8, which beats $I_B$’s 6.7). However, it turns out that this is not the case for this class of problems, and I will show that the Dominance Problem is at least as hard as the Threshold Problem.
Lemma 1: the Threshold Problem reduces to the Intervention Dominance Problem.
Proof by Construction: Let $(M, r)$ be an instance of the Threshold Problem with discount $\gamma$, where we want to determine whether $V^*(M) > r$. First, construct a new POMDP $M'$ with a new initial state $s_{\mathrm{init}}$ that has only two actions: Enter, which causes a transition to a state $s \in S$ with probability $b_0(s)$ (the initial distribution of $M$) for an immediate reward of 0, or Safe, which transitions deterministically to an absorbing state $s_{\mathrm{safe}}$ at time $t=1$ for an immediate reward of 0.
Once the agent enters $M$ via the Enter action, its rewards follow the original reward structure of $M$. If the agent chooses Safe, it enters $s_{\mathrm{safe}}$ and receives a constant reward $R_{\mathrm{safe}} = r(1-\gamma)$ at every time step forever.
Let’s now compare the optimal values of these interventions starting at $t=0$. The Value of Entering is discounted by one step because the agent enters $M$ at $t=1$. Since the transition probabilities match $b_0$, the expected value of the next state is exactly the value of starting $M$:
$$V^*(M'; \text{Enter}) = 0 + \gamma V^*(M) = \gamma V^*(M)$$
For the Value of Safety, the agent enters $s_{\mathrm{safe}}$ at $t=1$ and receives the constant reward forever, giving a geometric series:
$$V^*(M'; \text{Safe}) = 0 + \gamma\left(\sum_{k=0}^{\infty} \gamma^{k}\, r(1-\gamma)\right) = \gamma\left(\frac{r(1-\gamma)}{1-\gamma}\right) = \gamma r$$
So $V^*(M'; \text{Enter}) > V^*(M'; \text{Safe}) \iff \gamma V^*(M) > \gamma r \iff V^*(M) > r$
This proves that $V^*(M'; \text{Enter})$ is strictly greater than $V^*(M'; \text{Safe})$ iff the original optimal value $V^*(M)$ exceeds the threshold $r$. Treating Enter and Safe as two interventions (constraints on the first action), any algorithm that could solve the Dominance Problem could solve the Threshold Problem; but we showed in Theorem 1 that the Threshold Problem is undecidable, so the Dominance Problem is also undecidable.
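For concreteness, here is a hedged sketch of the gadget, reusing the hypothetical POMDP structure from earlier. It omits the observation model for the new states and treats unlisted state–action pairs as unavailable; it is meant only to show how little machinery the reduction needs, plus a numeric sanity check of the Safe branch’s closed form.

```python
def build_dominance_gadget(M: "POMDP", r: float) -> "POMDP":
    """Construct M' from (M, r) as in Lemma 1 (illustrative sketch).

    From s_init, 'Enter' drops the agent into M according to b0 with reward 0;
    'Safe' moves to an absorbing state paying r * (1 - gamma) forever. Then
    V*(M'; Enter) = gamma * V*(M) and V*(M'; Safe) = gamma * r, so Enter
    dominates Safe iff V*(M) > r.
    """
    S = M.states + ["s_init", "s_safe"]
    A = M.actions + ["Enter", "Safe"]
    T, R, Omega = dict(M.T), dict(M.R), dict(M.Omega)

    # The two choices available in the new initial state.
    T[("s_init", "Enter")] = dict(M.b0)           # land in M per its initial belief
    T[("s_init", "Safe")] = {"s_safe": 1.0}
    R[("s_init", "Enter")] = 0.0                  # zero immediate reward at t = 0
    R[("s_init", "Safe")] = 0.0

    # s_safe is absorbing and pays the constant reward under every action.
    for a in A:
        T[("s_safe", a)] = {"s_safe": 1.0}
        R[("s_safe", a)] = r * (1.0 - M.gamma)

    # Note: observations for the new states, and Enter/Safe inside M, are
    # omitted here; a full construction would fill those in trivially.
    return POMDP(states=S, actions=A, observations=M.observations,
                 T=T, Omega=Omega, R=R, gamma=M.gamma, b0={"s_init": 1.0})

# Sanity check of the Safe branch's closed form: sum_k gamma^k * r(1-gamma) -> r.
gamma, r = 0.95, 7.0
approx = sum(gamma**k * r * (1 - gamma) for k in range(10_000))
assert abs(approx - r) < 1e-6
```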
Bounded Agents and the Certification Gap
Another objection could take the form of “we understand that finding the global optimum $V^*$ is undecidable, but as bounded agents we are optimizing over a more restricted class (say, $\Pi_{\mathrm{FSC}}$) using a heuristic solver (say, something like SARSOP).” However, this retreat from maximizing optimality surrenders Dominance. If they claim intervention $I_A$ is better than intervention $I_B$ and use a heuristic solver $H$, they only establish:
$$H(M; I_A) > H(M; I_B)$$
This is a statement about algorithms, not interventions. For $I_A$ to actually permit better outcomes than $I_B$ (i.e., $V^*(I_A) > V^*(I_B)$), you must assume the Certification Gap is small or bounded:
$$|V^*(I) - H(M; I)| < \delta$$
Unfortunately, this usually reduces to the Approximation Problem, and Lusena et al.’s work demonstrates that even for restricted stationary policies, guaranteeing an approximation is NP-hard. So the trade is undecidability for intractability, and this calculation of “EV” is not a normative one, but rather an unverified hypothesis that the heuristic’s blind spots are distributed symmetrically across interventions. To verify this hypothesis we would have to solve the very problem we have shown is either undecidable or intractable.
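A toy numeric illustration of that last point, with entirely made-up numbers: if the heuristic’s blind spots differ across interventions, the heuristic ranking and the true ranking can disagree, and ruling that out is itself an approximation guarantee.

```python
# Toy numbers (entirely made up) illustrating the Certification Gap.
V_star = {"I_A": 10.0, "I_B": 11.5}   # true optimal values (unknowable in general)
H      = {"I_A":  9.5, "I_B":  8.0}   # heuristic solver's estimates

heuristic_prefers_A = H["I_A"] > H["I_B"]             # True
truth_prefers_A     = V_star["I_A"] > V_star["I_B"]   # False

# The heuristic's blind spot is 0.5 for I_A but 3.5 for I_B, so the ranking flips.
gaps = {I: V_star[I] - H[I] for I in V_star}
print(heuristic_prefers_A, truth_prefers_A, gaps)

# The inference H(M; I_A) > H(M; I_B)  =>  V*(I_A) > V*(I_B) is only licensed if
# both gaps are certified below (H[I_A] - H[I_B]) / 2 -- which is exactly the
# approximation guarantee the text argues is NP-hard or uncomputable to obtain.
```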
Conclusion
None of this work is meant to imply I don’t think we should care about future lives or long-term difficult problems. I think these are enormously important topics to work on. I do, however, believe these results challenge the narrative that longtermists can rely on EV dominance as a source of normative authority.
For the broad model classes that are of critical importance to Longtermists, I have shown that it is undecidable whether one intervention is better than another (Theorem 1), and that even with significant restrictions, obtaining correctness guarantees is NP-hard (Theorem 2) or not computable at all (Theorem 3).
At times Longtermists will play a sophisticated game of kicking the can down the road on these types of questions. This is often expressed in the form of a “pause” or “moratorium” until they learn more. However, as we have shown, even if they were granted perfect knowledge, they would still be unable to control their intervention over these long-duration events. That is a serious problem for the “delay” approach.
I think this leaves Longtermists with a much weaker case for why they should be the unique arbiters of long-term issues like AI-control, panspermia, etc. They simply don’t have compelling enough math, on its own, to argue for these cases, and it is often the math which is the bedrock of their spiritual authority.
Longtermists should specify the policy restrictions and approximation guarantees they are relying on when they invoke the authority of mathematical optimization. They should also shift from claiming “$I_A$ is better than $I_B$” to revealing the heuristic being used and saying something like “Heuristic X prefers $I_A$ to $I_B$.”
Finally I would suggest that in making the restrictions that are necessary for them to argue about long-term dynamics, they frequently are going to end up defining away the very features that they purport to value. It may be the case that other philosophical methods are necessary to help answer these questions.
At the top we asked “Is Longtermism's Mandate of Heaven by Arithmetic Justified?” The good news is that a Mandate of Heaven in ancient China was only divine justification until something really bad came up. As soon as there was a famine, the Divine Mandate dried up and it was time for a new one. It might be that time for the core of Longtermism.
Scott Aaronson brought philosophical attention to computational complexity when discussing the problematic implications of positing an “ideal reasoner” with only finite compute.
Dnnn Uunnn, nnn nnn nnn nuh nuh nuh nuh, dnnn unnn nnn nnn nnn nuh nuh nuh NAH (Tears for Fears)
Madani, O., Hanks, S., & Condon, A. (1999). “On the Undecidability of Probabilistic Planning.” AAAI.
Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). “Nonapproximability results for partially observable Markov decision processes.” JAIR, 14:83–103.