Reward Hacking

Edited by Joschka Braun last updated 11th Feb 2026

Reward hacking, also known as specification gaming, occurs when an AI trained with reinforcement learning optimizes an objective function — achieving the literal, formal specification of an objective — without actually achieving the outcome that the programmers intended.