I have been watching this video https://www.youtube.com/watch?v=EUjc1WuyPT8 on AI alignment (something I'm very behind on, my apologies) and it occurred to me that one aspect of the problem is finding a concrete, formalized solution to Goodhart's-law-style problems. Yudkowsky, for example, talks about ways that an AGI optimized towards making smiles could go wrong (namely, the AGI could find smarter and smarter ways to effectively give everyone heroin to quickly create lasting smiles), and it seems like one aspect of this problem is that the metric being optimized (smiles) is only a proxy for what we actually care about (wellbeing), so a sufficiently strong optimizer will satisfy the metric while missing the underlying goal.
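
To make the worry concrete, here's a toy sketch (the functions `proxy_smiles` and `true_wellbeing` are invented purely for illustration): an optimizer that only ever sees the proxy keeps pushing it up, and past a certain point the proxy and the thing we actually care about come apart.

```python
# Minimal Goodhart's-law sketch (illustrative only, names are made up):
# "smiles" is the measured proxy, "wellbeing" is what we actually care about.
# The two agree for mild interventions and diverge under heavy optimization.

def proxy_smiles(x: float) -> float:
    # The proxy keeps improving with intervention strength x.
    return x

def true_wellbeing(x: float) -> float:
    # The true objective improves at first, then collapses under extreme
    # interventions (the "give everyone heroin" regime).
    return x - 0.1 * x * x

x = 0.0
for step in range(1, 51):
    x += 0.5  # the optimizer only sees the proxy, so it keeps pushing
    if step % 10 == 0:
        print(f"step {step:2d}: proxy={proxy_smiles(x):5.1f} "
              f"wellbeing={true_wellbeing(x):6.1f}")
```

Running this, the proxy score rises monotonically while the "true" objective peaks early and then goes negative, which is the divergence I'm gesturing at.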

Is this in fact a part of the AI alignment problem? If so, is anyone trying to solve this facet of the problem, and where might I go to read more about it?

Yes, it's part of some approaches to the AI alignment problem. It used to be considered more central to AI alignment until people started thinking it might be too hard, and started working on other ways of trying to solve AI alignment that perhaps don't require "finding an effective way to tell an AI what wellbeing is". See AI Safety "Success Stories", where "Sovereign Singleton" requires solving this kind of problem while some of the other success stories may not.


hereisonehand's Shortform

by hereisonehand, 24th Aug 2019, 18 comments
