Not sure who you have in mind as people believing this, but after searching both LW and Arbital, the closest thing I've found to a statement of the empirical claim is from Eliezer's 2012 Reply to Holden on ‘Tool AI’:

I’ve repeatedly said that the idea behind proving determinism of self-modification isn’t that this guarantees safety, but that if you prove the self-modification stable the AI might work, whereas if you try to get by with no proofs at all, doom is guaranteed.

Paul Christiano argued against this at length in Stable self-improve


AI Alignment Open Thread August 2019

by habryka · 4th Aug 2019 · 96 comments



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very-early-stage ideas, and have lower-key discussions.