Han Kay

Systems Engineer researching control-theoretic architectures for AI safety.

4mo

Can a Nested Viable Systems Architecture solve reward hacking?

I have been exploring this question for quite some time and recently published a preprint proposing a nested VSM architecture for this purpose. Coming from a systems/cybernetics background, I honestly did not expect the traction it received, so I gathered the courage to share it with the "hardcore alignment...

Dec 22, 2025•1

Exploring "Time-Integrated Coherence" as a Safety Budget (VSM approach)

I have been working on a control-theoretic approach to inner alignment, specifically looking at how biological systems handle safety. They rarely use hard constraints like "never do X" because those are brittle. Instead, they seem to treat coherence as a depletable resource. I am trying to map this to AI...

Dec 22, 2025•1

Proposal: A Viable System Model (VSM) Architecture for Inner Alignment

Summary: Current alignment techniques (RLHF, anti-scheming training) often rely on post-hoc reward shaping, which can be brittle under distributional shift. As a Systems Engineer, I propose a structural alternative: ConsciOS, a nested control architecture based on Stafford Beer's Viable System Model (VSM) and Active Inference. The Architecture: Instead of a...

Dec 15, 2025•1