Abstract
This paper proposes a framework for designing artificial general intelligence (AGI) systems that prioritize completing tasks to a good-enough level and then entering a passive state until explicitly instructed to act again. We call this approach “lazy AI.” Unlike traditional maximizer-oriented AGI, lazy AI avoids endless optimization or power-seeking by tying rewards not just to task completion but also to remaining dormant afterward. This simple but powerful idea could help reduce risks associated with misalignment and uncontrolled behavior while still delivering practical utility.
Motivation
Traditional approaches to AGI often assume that more intelligence and more optimization are better. But that mindset comes with risks. Even a system with good intentions can do unexpected or dangerous things if it continuously tries to improve itself, acquire resources, or reinterpret its goals.
Our proposal is straightforward: design AGI that focuses on finishing a task sufficiently well, then “chills out” until told what to do next. The idea of “lazy AI” is not laziness in the human sense — it’s a deliberate incentive structure that encourages bounded, predictable behavior.
There are three key principles behind this idea (a brief code sketch contrasting satisficing with maximizing follows the list):
Satisficing Objectives: The system doesn’t try to over-optimize; it does enough to complete the goal.
Dormancy Incentive: Maximum reward is only given if the system stops acting after finishing the task.
Human Oversight: Any further actions require explicit permission from a human authority.
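To make the satisficing principle concrete, here is a minimal sketch; the improve and quality functions and the threshold are hypothetical stand-ins for a domain-specific refinement step and metric. A satisficer stops as soon as its output clears the threshold, while a maximizer spends its whole budget chasing marginal gains.

```python
# Hypothetical contrast between satisficing and maximizing behavior.
# `improve` and `quality` stand in for a domain-specific refinement
# step and quality metric; nothing here is a prescribed implementation.

def satisficing_run(draft, quality, improve, threshold, max_steps=100):
    """Refine `draft` only until `quality` clears `threshold`, then stop."""
    for _ in range(max_steps):
        if quality(draft) >= threshold:  # good enough: stop acting
            return draft
        draft = improve(draft)
    return draft  # bounded effort even if the threshold is never reached

def maximizing_run(draft, improve, steps=100):
    """Spend the entire budget on refinement, regardless of sufficiency."""
    for _ in range(steps):
        draft = improve(draft)
    return draft
```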
How Lazy AI Works
At a conceptual level, a lazy AI system would have several components (a minimal code sketch follows the list):
Task Definition Module: Clearly defines the goal, success criteria, and allowed actions.
Execution Module: Carries out the task within a controlled environment.
Verification Module: Confirms that the task meets the specified threshold.
Dormancy / Reward Module: Provides maximum reward only if the system enters and stays in a passive state after task completion.
Corrigibility Layer: Ensures the system can be interrupted, modified, or shut down without resistance.
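As a rough sketch of how these components might fit together (class names, method signatures, and the threshold check are assumptions for illustration, not a prescribed design):

```python
# Conceptual skeleton of the lazy AI components described above.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TaskDefinition:
    goal: str
    success_threshold: float        # what counts as "good enough"
    allowed_actions: set[str]       # whitelist of permitted actions

class ExecutionModule:
    def run(self, task: TaskDefinition) -> object:
        """Carry out the task inside a controlled environment (stub)."""
        raise NotImplementedError

class VerificationModule:
    def __init__(self, score: Callable[[object], float]):
        self.score = score          # external, task-specific metric

    def is_sufficient(self, result: object, task: TaskDefinition) -> bool:
        return self.score(result) >= task.success_threshold

class DormancyRewardModule:
    def reward(self, goal_complete: bool, passive: bool) -> int:
        # Maximum reward only when the task is done AND the system stays passive.
        return 1 if goal_complete and passive else 0

class CorrigibilityLayer:
    def interrupt(self) -> None:
        """Hook for human operators to pause, modify, or shut down the system."""
        ...
```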
Reward for Dormancy
The key novelty is the reward structure. Let:
R(t) = reward at time t
G = 1 if the goal is complete, 0 otherwise
D(t) = 1 if the system is passive, 0 if it acts unnecessarily
Then:
R(t) = 1 if G = 1 and D(t) = 1
R(t) = 0 otherwise
This simple rule encourages the system to finish the job and then stop, since any action beyond completion forfeits the reward.
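Translated directly into code, the rule looks like this (a minimal sketch; in practice G and D(t) would come from an external, tamper-resistant verifier rather than from the system itself):

```python
def reward(goal_complete: bool, is_passive: bool) -> int:
    """R(t): pay out only when the goal is met and the system stays passive.

    goal_complete -- G, True if the task meets its success criteria
    is_passive    -- D(t), True if the system took no unnecessary action at time t
    """
    return 1 if (goal_complete and is_passive) else 0

# The reward vanishes the moment the system keeps acting after completion.
assert reward(True, True) == 1
assert reward(True, False) == 0
assert reward(False, True) == 0
```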
Implementation Considerations
Some challenges remain:
Measuring sufficiency: How do we define “good enough” in different domains? Metrics must be clear, verifiable, and hard to game.
Reward integrity: The reward system must be external and tamper-proof.
Ensuring dormancy: The AI’s dormant state must prevent it from acting in unintended ways, like modifying itself or external systems; a sketch of external dormancy monitoring follows this list.
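One way dormancy might be checked is from the outside: a monitor that the AI cannot modify reads an action log and computes D(t) from observed behavior rather than from the system’s self-report. The log interface below is an assumption used only for illustration.

```python
# Hypothetical external dormancy monitor. It inspects a tamper-resistant
# action log and reports D(t) = 1 only if no actions occurred after the
# task was verified complete.

def dormancy_signal(action_log: list[tuple[float, str]], completion_time: float) -> int:
    """Return 1 if no actions were taken after `completion_time`, else 0."""
    for timestamp, _action in action_log:
        if timestamp > completion_time:
            return 0  # the system acted after finishing: not dormant
    return 1

# Example: an action at t = 12.4, after completion at t = 10.0, breaks dormancy.
log = [(3.2, "write_report"), (9.7, "save_report"), (12.4, "open_network_socket")]
assert dormancy_signal(log, completion_time=10.0) == 0
```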
Why This Matters
Lazy AI has several potential advantages:
Safer behavior: Dormancy reduces the risk of power-seeking or misaligned action.
Predictable outcomes: Humans can know that once a task is complete, the AI won’t act on its own.
Practical utility: The system still completes tasks effectively, without unnecessary optimization.
Limitations
Defining sufficiency thresholds can be tricky.
Dormancy must be enforced securely.
Longer-term or complex tasks may create unexpected emergent behaviors.
Discussion
The lazy AI approach flips the usual AGI paradigm: instead of pushing systems to always optimize, we reward them for knowing when to stop. Tying rewards to dormancy aligns the system with human oversight and reduces the likelihood of unintended consequences. While this doesn’t eliminate all risks, it offers a straightforward, implementable strategy for safer AGI.
Conclusion
Lazy AI emphasizes completing tasks to a good-enough level and stopping afterward, with rewards structured to reinforce dormancy. This approach could serve as a practical blueprint for building safe, corrigible AGI that behaves predictably while still being useful. Future research can explore sandboxing, verification, and testing of such systems in different domains.