I mentioned that I expect proof-level guarantees will be easy once the conceptual problems are worked out. Strong interpretability is part of that: if we know how to "see whether the AI runs a check for whether it can deceive humans", then I expect that building systems which provably don't do that won't take much extra work. So we might disagree less on that front than it first seemed.

The question of whether to model the AI as an open-ended optimizer is one I figured would come up. I don't think we need to think of it as truly open-ended in or...

AI Alignment Open Thread August 2019

by habryka · 1 min read · 4th Aug 2019 · 96 comments



