Can you elaborate on what kinds of problems you expect to arise pre vs. post discontinuity?

E.g., will we see "sinister stumbles" (IIRC this was Adam Gleave's name for half-baked treacherous turns)? I think we will, FWIW.

Or do you think the discontinuity will be more in the realm of embedded-agency-style concerns? (And how would that make things less safe, rather than just dysfunctional?)

How about mesa-optimization? (I think we already see qualitatively similar phenomena, but my idea of this doesn't emphasize the "optimization" part...)

AI Alignment Open Thread August 2019

by habryka, 4th Aug 2019

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very early-stage ideas, and have lower-key discussions.