Gradient Hacking

Contributors: Multicore

Gradient Hacking describes a scenario in which a mesa-optimizer within an AI system deliberately acts to influence how gradient descent updates it, most likely in order to preserve its own mesa-objective in future iterations of the system.
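
The mechanism in this definition can be made concrete with a small hand-built toy. The sketch below is not drawn from any of the tagged posts; every name in it (`loss`, `train`, `m0`, `k`) is hypothetical. It assumes a model with a task weight `w` and a separate parameter `m` standing in for a mesa-objective, and it assumes the model has been wired to corrupt a second output in proportion to how far `m` has drifted from a preferred value `m0`. Under those assumptions, plain gradient descent on the training loss pulls `m` back to `m0`. It says nothing about whether such a coupling could arise or persist in a real training run.

```python
# Toy illustration only: every name and constant here is a hypothetical
# choice made for this sketch, not part of any real system or library.

X, Y = 1.0, 2.0   # a single training example: input and task target
M0 = 1.0          # value of m that the toy mesa-optimizer "wants" to keep
K = 5.0           # strength of the deliberate self-sabotage


def loss(w, m):
    """Squared error over two outputs.

    Output 1 is the task prediction w * X, with target Y.
    Output 2 has target 0, but the model emits K * (m - M0) there,
    so any drift of m away from M0 raises the loss.
    """
    task_error = w * X - Y
    sabotage = K * (m - M0)
    return task_error ** 2 + sabotage ** 2


def grad(w, m):
    """Analytic gradient of loss with respect to (w, m)."""
    dw = 2.0 * (w * X - Y) * X
    dm = 2.0 * K ** 2 * (m - M0)
    return dw, dm


def train(w, m, steps=2000, lr=0.01):
    """Plain gradient descent on (w, m)."""
    for _ in range(steps):
        dw, dm = grad(w, m)
        w, m = w - lr * dw, m - lr * dm
    return w, m


if __name__ == "__main__":
    # Start with m perturbed away from M0; training restores it.
    w, m = train(w=0.0, m=1.3)
    print(f"w = {w:.4f}, m = {m:.4f}")
```

In this toy the task weight `w` still converges to the value that fits the data, while the gradient on `m` always points back toward `M0`, because the sabotage term makes any other value of `m` strictly worse for the loss. The coupling is written in by hand here, which is exactly the step a real gradient hacker would somehow have to implement and protect for itself.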

See also: Inner Alignment

Posts tagged Gradient Hacking
Most relevant:
Gradient hacking, by evhub (3y, 102 karma, 39 comments)
Gradient hacking is extremely difficult, by beren (12d, 141 karma, 18 comments)
Gradient Filtering, by Jozdien and janus (17d, 50 karma, 16 comments)
Some real examples of gradient hacking, by Oliver Sourbut (1y, 15 karma, 8 comments)
How does Gradient Descent Interact with Goodhart? (question), by Scott Garrabrant and evhub (4y, 68 karma, 19 comments)
Gradient Hacker Design Principles From Biology, by johnswentworth (5mo, 52 karma, 13 comments)
Thoughts on gradient hacking, by Richard_Ngo (1y, 33 karma, 12 comments)
Towards Deconfusing Gradient Hacking, by leogao (1y, 31 karma, 3 comments)
Gradient hacking: definitions and examples, by Richard_Ngo (7mo, 24 karma, 1 comment)
Approaches to gradient hacking, by adamShimi (1y, 16 karma, 8 comments)
Meta learning to gradient hack, by Quintin Pope (1y, 55 karma, 11 comments)
Gradient Hacking via Schelling Goals, by Adam Scherlis (1y, 33 karma, 4 comments)
Understanding Gradient Hacking, by peterbarnett (1y, 30 karma, 5 comments)
A Toy Model of Gradient Hacking, by Oam Patel (8mo, 30 karma, 7 comments)
Obstacles to gradient hacking, by leogao (1y, 28 karma, 11 comments)