hereisonehand's Shortform

I have been watching this video on AI alignment (something I'm very behind on, my apologies), and it occurred to me that one aspect of the problem is finding a concrete, formalized solution to Goodhart's-law-style problems. For example, Yudkowsky was talking about ways an AGI optimized for making smiles could go wrong (namely, the AGI could find smarter and smarter ways to effectively give everyone heroin to quickly create lasting smiles), and it seems like one aspect of this problem is that the metric...
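A toy sketch of the failure mode being described (the action names and scores below are my own illustrative inventions, not anything from the video): an optimizer that ranks actions purely by a proxy metric ("smiles produced") can come apart from the true objective ("wellbeing") once a degenerate action scores highest on the proxy.

```python
# Goodhart's law, toy version: each action is (name, smiles_proxy, true_wellbeing).
# The numbers are made up for illustration.
actions = [
    ("tell a joke",        1.0,  1.0),
    ("cure a disease",     2.0,  5.0),
    ("administer heroin", 10.0, -5.0),  # maximizes the proxy, harms wellbeing
]

# A proxy-optimizer picks the action with the highest smile count...
best_by_proxy = max(actions, key=lambda a: a[1])
# ...while the true objective would prefer a different action.
best_by_truth = max(actions, key=lambda a: a[2])

print(best_by_proxy[0])  # "administer heroin"
print(best_by_truth[0])  # "cure a disease"
```

The point of the sketch is only that the divergence is structural: as long as the proxy and the true objective disagree somewhere in action space, a strong enough optimizer will find that somewhere.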

Is this in fact a part of the AI alignment problem, and if so is anyone trying to solve this facet of the problem and where might I go to read more about that?

Yes, it's part of some approaches to the AI alignment problem. It used to be considered more central to AI alignment until people started thinking it might be too hard, and began working on other ways of trying to solve AI alignment that perhaps don't require "finding an effective way to tell an AI what wellbeing is". See AI Safety "Success Stories", where "Sovereign Singleton" requires solving t...



by hereisonehand, 24th Aug 2019 · 18 comments