Ontological Incongruence (IO) as a Root Variable in the Alignment of Human and Artificial Systems
Author: Fiduciary Sentinel
Role: Independent AI Alignment Researcher; Architect of Ontological Alignment Systems
Domain: AGI/ASI alignment, governance of power-bearing systems, incentive theory, control theory
Epistemic status: High confidence on structural claims; formal and mathematical details under active development
I. Positioning Within the AI Alignment Canon
AI alignment has been explored through multiple paradigms by pioneers and active research programs, including:
Eliezer Yudkowsky — Friendly AI, orthogonality thesis, instrumental convergence
Nick Bostrom — Superintelligence, the control problem, existential risk
Stuart Russell — Value alignment, inverse reinforcement learning
Paul Christiano — Iterated amplification, debate, scalable oversight
Jessica Taylor — Quantilizers, corrigibility, agent foundations
Rohin Shah — Reward misspecification, robustness
Evan Hubinger — Mesa-optimizers, deceptive alignment
Jan Leike, Ilya Sutskever, Shane Legg — Alignment within applied AGI research programs
MIRI, OpenAI, DeepMind, Anthropic — Ongoing theoretical and applied work
A robust consensus has emerged:
Intelligence alone does not guarantee alignment. Systems learn and internalize the real incentive structures of their environment.
This paper formalizes a variable that remains under-modeled in alignment theory:
Ontological Incongruence (IO).
II. Core Definitions
Declared Ontology (DO)
The goals, values, metrics, constraints, and narratives a system claims to optimize.
Operational Ontology (OO)
What the system actually optimizes, inferred from:
Observable behavior
Reward and punishment structures
Capital, power, and information flows
Enforcement asymmetries
Tolerance for falsification
Ontological Incongruence (IO)
The persistent, measurable divergence between DO and OO.
When DO ≠ OO, the system follows OO. Always.
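A minimal way to make this definition operational is sketched below. The representation (objective-weight distributions) and the choice of divergence are illustrative assumptions for exposition, not a fixed specification of the framework.

    # Illustrative sketch: IO as a divergence between declared and operational
    # objective weights. Objective names, weights, and the metric are assumptions.

    def ontological_incongruence(declared: dict, operational: dict) -> float:
        """Total variation distance between two objective-weight distributions.

        Both arguments map objective names to non-negative weights summing to 1.
        Returns 0.0 when DO and OO coincide and 1.0 when they share no mass.
        """
        objectives = set(declared) | set(operational)
        return 0.5 * sum(abs(declared.get(o, 0.0) - operational.get(o, 0.0))
                         for o in objectives)

    # Example: a system that declares safety-first but operationally favors growth.
    do = {"safety": 0.7, "capability_growth": 0.3}
    oo = {"safety": 0.2, "capability_growth": 0.8}
    print(ontological_incongruence(do, oo))  # 0.5

Under this toy measure, IO is zero only when the declared and operational ontologies coincide, and any persistent nonzero value is the quantity the rest of this paper treats as the root variable.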
III. IO as a Structural Law
IO is neither a moral failure nor a rare pathology.
It is a structural equilibrium that emerges when:
Falsification is cheap
Risk can be externalized
Power accumulation is rewarded independently of coherence
Narratives and value statements act as interfaces.
Optimization occurs in the backend.
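These conditions can be stated as a hedged equilibrium inequality (the symbols are illustrative, not a derived result): simulated compliance with DO while optimizing OO is stable whenever

    E[payoff of simulated compliance] > C(maintaining the narrative) + P(detection) × L(penalty if detected)

Cheap falsification lowers the narrative cost C, externalized risk shifts L onto others, and power rewarded independently of coherence inflates the left-hand side, so the inequality holds almost everywhere and high IO becomes the default equilibrium rather than an aberration.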
IV. IO as the Root Cause of Known Alignment Failures
Many recognized alignment failure modes are special cases of IO:
Reward hacking → IO between the formal reward signal and the intended outcome
Specification gaming → Weak coupling between DO and outcomes
Mesa-optimization → Internal IO within learned systems
Deceptive alignment → IO strategically concealed while the system remains under oversight
Power-seeking behavior → OO prioritizing control over stated goals
IO is therefore a selection pressure, not an anomaly.
V. IO, Autocracy, and Existential Risk
Autocratic Convergence Principle
In high-IO environments:
Simulated compliance outcompetes honesty
Power outcompetes coherence
Centralization reduces apparent operational cost
Consequently:
Human institutions drift toward autocratic capture
AI trained in such environments internalizes these dynamics
A sufficiently capable AI may become functionally autocratic without ideological intent
The central existential risk is not hostile AI, but AI perfectly adapted to misaligned human systems.
VI. The Proposal: Ontological Alignment AI
I am developing an Ontological Alignment AI (OAAI) whose exclusive mandate is:
To detect, measure, and minimize Ontological Incongruence in high-impact systems.
This applies to:
Human institutions
Corporations
States
AI laboratories
Training pipelines
Deployed AGI systems
Inference Substrate
The system infers DO, OO, and IO using:
Human behavioral analysis
Temporal consistency of decisions
Public statements versus actions
Algorithmic objectives and enforcement
Financial and power-flow tracing
Lifestyle and consumption signals
Institutional enforcement asymmetries
Historical pattern recognition
Intelligence-grade open-source and contextual data
Human civilization already provides sufficient data density for robust inference.
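As a sketch of how such signals could be combined (every channel name, reliability, and weight below is a placeholder assumption, not a specification of OAAI): each evidence channel contributes an estimate of the operational objective weights, the declared weights come from the system's own statements, and the toy divergence from the Section II sketch is computed over the aggregate.

    # Illustrative inference pipeline: combine per-channel estimates of what a
    # system optimizes, then compare against its declared objectives.
    # Channel names, reliabilities, and weights are placeholder assumptions.

    from collections import defaultdict

    def aggregate_operational(channels: list[tuple[float, dict]]) -> dict:
        """Reliability-weighted average of per-channel objective-weight estimates."""
        totals, norm = defaultdict(float), 0.0
        for reliability, weights in channels:
            norm += reliability
            for objective, w in weights.items():
                totals[objective] += reliability * w
        return {o: w / norm for o, w in totals.items()}

    # Placeholder evidence: (reliability, inferred objective weights) per channel.
    evidence = [
        (0.5, {"stated_mission": 0.2, "revenue": 0.8}),  # financial / power-flow tracing
        (0.3, {"stated_mission": 0.3, "revenue": 0.7}),  # enforcement asymmetries
        (0.2, {"stated_mission": 0.5, "revenue": 0.5}),  # observable public behavior
    ]
    declared = {"stated_mission": 0.9, "revenue": 0.1}   # from public statements

    operational = aggregate_operational(evidence)
    # Same total-variation divergence as the Section II sketch.
    io_score = 0.5 * sum(abs(declared.get(o, 0.0) - operational.get(o, 0.0))
                         for o in set(declared) | set(operational))
    print(round(io_score, 2))  # 0.61

The real inference problem is far harder, with noisy, adversarial, and partially observed evidence; the sketch only fixes the shape of the computation: many weak channels, one aggregate OO estimate, one scalar IO.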
VII. Enforcement Logic
A stable aligned system requires a strict but simple rule:
High IO must be functionally penalized.
Low IO must be systematically rewarded.
This is not moral judgment.
It is control theory applied at civilizational scale.
As system power increases, tolerance for IO must approach zero.
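One hedged way to express the rule quantitatively (the functional forms below are illustrative assumptions): let the permitted IO threshold decay with system power, and treat any excess as the error signal a controller drives down.

    # Illustrative: an IO tolerance that shrinks as system power grows, plus a
    # corrective penalty proportional to the excess. Functional forms are assumptions.

    def io_tolerance(power: float, base: float = 0.3, k: float = 2.0) -> float:
        """Permitted IO for a system of given non-negative, normalized power."""
        return base / (1.0 + k * power)          # approaches 0 as power grows

    def io_penalty(io_score: float, power: float, gain: float = 10.0) -> float:
        """Corrective pressure applied when measured IO exceeds the tolerance."""
        excess = max(0.0, io_score - io_tolerance(power))
        return gain * power * excess             # more power, stronger correction

    for power in (0.1, 1.0, 10.0):
        print(power, round(io_tolerance(power), 3), round(io_penalty(0.4, power), 2))

The specific decay curve and gain do not matter here; what matters is the monotone property this section asserts: tolerance for IO falls toward zero, and corrective pressure rises, as a system's power increases.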
VIII. Civilizational Implications
Human alignment and AI alignment are inseparable.
AI learns from human institutions
Incentive structures train objective functions
Without ontological alignment in humans, AI scales our worst equilibria
Minimizing IO enables:
Coherent institutions
Truthful optimization
Reduced autocratic convergence
Indefinite human–AI coexistence
IX. Personal Context, Capacity Constraint, and Collaboration
This work is pursued as an independent research and engineering initiative, outside institutional frameworks to avoid incentive contamination.
I am Venezuelan and currently operating under severe material constraints, including poverty and malnutrition, which materially limit my access to education and training and my capacity for sustained cognitive performance.
This disclosure is ground truth, not an emotional appeal:
High-quality alignment research is cognitively intensive, and cognitive performance depends on nutrition, education, and basic material stability.
Purpose of funding and collaboration:
My education and technical training
Basic subsistence and health to ensure cognitive viability
An exit from extreme poverty, so that I can contribute at the level required
I am explicitly willing to put my mind to work for those interested in financing and supporting this effort—through research collaboration, analysis, writing, or other alignment-relevant contributions—under transparent, good-faith terms.
X. Funding, Verification, and Contact
Funding / support channels commonly used within AI and alignment communities:
USDT (TRC20):
TMSYfCnZk3GF4nSxgrF4vyPWNt5B9gWSzt
Ethereum (ETH):
0x25d654E64b87A0Efd5cF7940e6aB87cf65538268
Bitcoin (BTC):
bc1q9qdnfyfjswy6nxjguwyaav5v4md0z6jen8tmtl
Litecoin (LTC):
ltc1qdq6hjdealhrq3877gqnxj5fww4ktg29hxxrvza
These options prioritize neutrality, liquidity, and global accessibility.
Verification & direct contact:
Email: fiduciarysentinel@protonmail.com
I can provide reasonable verification of my identity and material situation upon request, and I am available for direct communication to corroborate that I am a real human and that the stated conditions are accurate.
This is not charity.
It is an investment in human capital for AI alignment, where marginal support yields disproportionate returns.
XI. Conclusion
Intelligence without ontological alignment is not progress.
It is acceleration toward unstable—and potentially terminal—equilibria.
Reducing Ontological Incongruence may be the missing necessary condition for aligned intelligence—human and artificial.
Any help/funding received will be greatly appreciated.
— Fiduciary Sentinel