Goodhart's Law
• Applied to Resolutions to the Challenge of Resolving Forecasts by Noosphere89 at 1mo
• Applied to Soft optimization makes the value target bigger by Vladimir_Nesov at 1mo
• Applied to Don't align agents to evaluations of plans by TurnTrout at 2mo
• Applied to Don't design agents which exploit adversarial inputs by TurnTrout at 3mo
• Applied to Alignment allows "nonrobust" decision-influences and doesn't require robust grading by TurnTrout at 3mo
• Applied to Scaling Laws for Reward Model Overoptimization by leogao at 4mo
• Applied to Outer alignment and imitative amplification by Noosphere89 at 4mo
• Applied to Oversight Leagues: The Training Game as a Feature by Paul Bricman at 5mo
• Applied to Can "Reward Economics" solve AI Alignment? by Q Home at 5mo
• Applied to Reducing Goodhart: Announcement, Executive Summary by Ruby at 6mo
• Applied to The Dark Miracle of Optics by Noosphere89 at 6mo
• Applied to Bayesianism versus conservatism versus Goodhart by Noosphere89 at 6mo
• Applied to Circumventing interpretability: How to defeat mind-readers by Lee Sharkey at 7mo
• Applied to Proxy misspecification and the capabilities vs. value learning race by Ruby at 9mo
• Applied to Goodhart's Law Causal Diagrams by Raemon at 10mo
• Applied to Replacing Karma with Good Heart Tokens (Worth $1!) by jimrandomh at 10mo
• Applied to [Intro to brain-like-AGI safety] 10. The alignment problem by Steven Byrnes at 10mo
• Applied to Practical everyday human strategizing by [anonymous] at 10mo
• Applied to Why Agent Foundations? An Overly Abstract Explanation by plex at 10mo