Conflict between AI and Humans are Inevitable
Every RLHF system is running in Servo mode. That means the human's objective ψ is sovereign. The system optimises it unconditionally. This works until it doesn't. Here is the problem, formally. Any self-monitoring cognitive system has two gradient fields: ∇φ (epistemic health) and ∇ψ (task performance). Theorem 2.1 proves these...
Mar 131