From Barriers to Alignment to the First Formal Corrigibility Guarantees
This post summarizes my two related papers that will appear at AAAI 2026 in January:

* Part I: Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis (selected for oral presentation)
* Part II: Core Safety Values for Provably Corrigible Agents

What these papers try to quantify...
Great post! Very much agree about the conservatism.
This is why I find it useful to do economic analyses where the variables and factors are exposed, as in my recent AI UBI analysis. Rather than assuming fixed values for those variables, one can try out a multitude of scenarios and see how the predictions change, and the derived analytic form makes clear which factors matter more than others.
For example, one thing I found was that a Scandinavian-style public ownership share of AI profits (~33%) drastically reduces the level of AI productivity required to fund a UBI. As a policy, this then seems very reasonable and attainable.
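To make the style of analysis concrete, here is a minimal toy sketch (not the model from the paper) of how one might expose the ownership share as a parameter and sweep over scenarios. The population, per-person UBI level, and the simple funding condition below are all illustrative assumptions, chosen only to show how the required AI profit falls as the public share rises.

```python
# Illustrative toy sketch only -- NOT the model from the paper.
# Assumption: a UBI is "funded" when the publicly captured share of
# AI profits covers the total annual UBI bill.

def required_ai_profit(ownership_share: float,
                       population: int = 330_000_000,  # assumed population
                       annual_ubi: float = 12_000.0    # assumed UBI per person per year, USD
                       ) -> float:
    """Total annual AI profit needed so that
    ownership_share * profit >= population * annual_ubi."""
    total_ubi_bill = population * annual_ubi
    return total_ubi_bill / ownership_share

# Scenario sweep: how the required AI profit changes with the public share.
for share in (0.05, 0.15, 0.33, 0.50):
    profit = required_ai_profit(share)
    print(f"ownership share {share:>4.0%}: "
          f"required AI profit ~${profit / 1e12:.1f}T/yr")
```

The actual analysis in the linked paper has more factors and a derived analytic form; the point of the sketch is just that with the parameters exposed, one can see directly how sensitive the conclusion is to each of them.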
The full paper with all the cited sources can be found here: https://arxiv.org/abs/2505.18687