Gradient Hacking

Applied to Gradient Filtering by Jozdien 5mo ago

Gradient Hacking describes a scenario where a mesa-optimizer in an AI system acts in a way that intentionally manipulates the way that gradient descent updates it, likely to preserve its own mesa-objective in future iterations of the AI.

See also: Inner Alignment

Applied to Thoughts on gradient hacking by Ruby 2y ago