The Sundog Alignment Theorem: A Proposal for Embodied Alignment via Indirect Inference
100kb physics alignment simulation running: https://youtu.be/Gp7a-fXcRNM?si=zp7vqQEU34yGmk2B

H(x), or the Sundog Alignment Theorem, proposes that robust alignment can emerge from agents interacting with structured environments via indirect signals, specifically shadow convergence and torque feedback, rather than from direct reward targeting or instruction. Inspired by atmospheric sundogs (light halos visible only at indirect angles), we...