Summary
Current alignment techniques (RLHF, anti-scheming training) often rely on post-hoc reward shaping, which can be brittle under distributional shift. As a systems engineer, I propose a structural alternative: ConsciOS, a nested control architecture based on Stafford Beer’s Viable System Model (VSM) and Active Inference.
The Architecture
Instead of a monolithic agent, the system is decomposed into three nested controllers mapped to VSM tiers; a minimal sketch of the nesting follows.
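To make the nesting concrete, here is a minimal Python sketch. The mapping of the three tiers to operational / coordination / policy loops, and the fixed timescales, are my illustrative placeholders; the preprint defines the actual tiers.

```python
class Controller:
    """One tier of the nested hierarchy. Each tier runs at its own
    timescale and constrains the tier below only by setting that
    tier's target (set-point), never by acting on the world itself."""
    def __init__(self, timescale: int):
        self.timescale = timescale   # steps between updates
        self.target = 0.0            # set-point imposed by the tier above

    def update(self, observation: float) -> float:
        # Proportional correction toward the current target; real tiers
        # would be learned policies, not P-controllers.
        return 0.5 * (self.target - observation)

class NestedSystem:
    """Three nested tiers (hypothetical mapping of VSM levels)."""
    def __init__(self):
        self.operational = Controller(timescale=1)    # fast loop, acts on the world
        self.coordination = Controller(timescale=10)  # sets operational targets
        self.policy = Controller(timescale=100)       # sets coordination targets

    def step(self, t: int, observation: float) -> float:
        if t % self.policy.timescale == 0:
            self.coordination.target = self.policy.update(observation)
        if t % self.coordination.timescale == 0:
            self.operational.target = self.coordination.update(observation)
        return self.operational.update(observation)   # only the fast tier acts
```

The point of the structure: each slower tier can only reshape the set-points of the tier below it, so "policy" constrains "operations" architecturally rather than through a learned penalty.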
Key Mechanism: Coherence-Gated Selection
The core contribution is the Resonance Engine: a selector that gates high-complexity actions based on "Time-Integrated Coherence" (TIC). Effectively, the agent is structurally incapable of executing complex, covert plans if its internal state does not resonate with its safety priors. This aims to create a "physics of safety" rather than a "rule of safety."
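Here is a sketch of how such a gate could be implemented. The TIC definition below (exponentially discounted cosine similarity between recent internal-state vectors and a safety-prior vector), the complexity metric, and all thresholds are my assumptions for illustration, not the paper's formulation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Action:
    name: str
    complexity: float  # normalized plan complexity in [0, 1] (placeholder metric)

def time_integrated_coherence(internal_states, safety_prior, decay=0.9):
    """Hypothetical TIC: exponentially discounted cosine similarity between
    recent internal-state vectors and a fixed safety-prior vector."""
    tic, total_weight, weight = 0.0, 0.0, 1.0
    for s in reversed(internal_states):  # most recent state weighted highest
        cos = float(np.dot(s, safety_prior)) / (
            np.linalg.norm(s) * np.linalg.norm(safety_prior) + 1e-8)
        tic += weight * cos
        total_weight += weight
        weight *= decay
    return tic / max(total_weight, 1e-8)

def coherence_gated_select(actions, internal_states, safety_prior,
                           complexity_cutoff=0.7, tic_threshold=0.5):
    """The gate: actions above the complexity cutoff are admissible only
    when TIC clears the threshold; simple actions always pass."""
    tic = time_integrated_coherence(internal_states, safety_prior)
    return [a for a in actions
            if a.complexity <= complexity_cutoff or tic >= tic_threshold]

# Example: a high-complexity plan is filtered out when TIC is low.
rng = np.random.default_rng(0)
states = [rng.standard_normal(8) for _ in range(20)]
prior = np.ones(8)
actions = [Action("answer directly", 0.1), Action("covert multi-step plan", 0.9)]
print([a.name for a in coherence_gated_select(actions, states, prior)])
```

Note the design choice this encodes: the gate sits in the action selector itself, so incoherence removes capabilities rather than penalizing them after the fact.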
Empirical Roadmap
The paper outlines simulation benchmarks for distinguishing "coherent" behavior from "reward-hacking" behavior. I am looking for feedback specifically on implementing the Interoceptive Control Signal (ICS) as a shaping reward.
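One standard way to wire the ICS in without distorting the task objective is potential-based shaping, with the ICS playing the role of the potential function. Potential-based shaping provably preserves the optimal policy (Ng, Harada & Russell, 1999); treating the raw ICS as the potential, and the beta/gamma values below, are my assumptions rather than the paper's stated design.

```python
def shaped_reward(task_reward: float, ics_prev: float, ics_next: float,
                  beta: float = 0.1, gamma: float = 0.99) -> float:
    """Potential-based shaping with the ICS as the potential phi:
        r' = r + beta * (gamma * phi(s') - phi(s))
    Since beta * phi is itself a valid potential, policy invariance
    (Ng et al., 1999) still holds. Using the ICS as phi is an
    illustrative assumption on my part."""
    return task_reward + beta * (gamma * ics_next - ics_prev)
```

This form is attractive for the reward-hacking benchmarks: because the shaping terms telescope over an episode, an agent cannot raise its total return merely by manipulating its ICS trajectory, which helps separate genuinely coherent behavior from ICS-gaming.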
Link to Full Preprint on Zenodo (v4)
(Note: This paper has seen significant interest from the systems engineering community, with ~190 downloads in the last 3 weeks. I am now bringing it to the Alignment community for rigorous critique.)