We know very little about Ancient Egypt: how they made things, or the provenance of their artefacts.

Can you explain low-resolution integers?

Another bad idea: why not use every possible alignment strategy at once (or many of them)? Presumably this would completely hobble the AGI, but with some interpretability you could find where the bottlenecks to behaviour are in the system and use it as a lab to figure out the best options. It's still a try-once strategy, I guess, and maybe it precludes actually getting to AGI in the first place, since you can't really iterate on an AI that doesn't work.

Second one I just had that might be naive.

Glutted AI: feed it almost maximum utils automatically anyway, so that the gradient between its current state and maximalist behaviour is far shallower. If it already has some kind of future discounting in effect, it might just do nothing except occasionally give out very good ideas, and be comfortable with us making slower progress as long as existential risk remains relatively low.
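To make the "shallower gradient" intuition concrete, here's a toy calculation (my framing, not anything from the original comment): the discounted incentive to pursue maximalist behaviour is roughly the remaining utility gap times the discount factor, so an agent already fed ~99% of maximum utils has orders of magnitude less to gain than one starting near zero. The numbers (gamma, delay, utility levels) are arbitrary assumptions for illustration.

```python
def discounted_gain(current_u: float, max_u: float,
                    gamma: float, delay_steps: int) -> float:
    """Discounted value of closing the gap from current_u to max_u,
    if the payoff arrives delay_steps into the future."""
    return (gamma ** delay_steps) * (max_u - current_u)

# Hungry agent vs. glutted agent, same discounting and same delay:
hungry  = discounted_gain(current_u=0.10, max_u=1.0, gamma=0.95, delay_steps=20)
glutted = discounted_gain(current_u=0.99, max_u=1.0, gamma=0.95, delay_steps=20)

print(hungry, glutted)  # the glutted agent's incentive is ~90x smaller
```

Whether a real system's motivation actually scales with this gap is exactly the open question, but it shows the shape of the argument: with saturation plus discounting, the marginal payoff of drastic action can become tiny.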

Semi tongue-in-cheek sci-fi suggestion.

Apparently the probability of a Carrington-like event (a large coronal mass ejection) is about 2% per decade, so maybe it's 2% per half century for an extremely severe one. If the time from AGI to it leaving the planet is a half century, perhaps a ~2% chance of the grid getting fried is enough of a risk that it keeps humans around for the time being. After that there might be less of an imperative for it to repurpose the Earth, and so we survive.
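A quick sanity check on the arithmetic above, assuming independent fixed-rate periods (a simplification; real solar-event statistics are messier): a 2%-per-decade event compounds to roughly a 10% chance over five decades, while the guessed 2%-per-half-century rate for an extremely severe event stays at about 2% over the same window.

```python
def p_at_least_one(p_per_period: float, n_periods: int) -> float:
    """Probability of at least one event across n independent periods,
    each with per-period probability p_per_period."""
    return 1 - (1 - p_per_period) ** n_periods

# Carrington-like event, 2% per decade, over a half century (5 decades):
carrington_50y = p_at_least_one(0.02, 5)   # ~0.096, i.e. about 10%

# Extremely severe event at the guessed 2% per half century, one period:
severe_50y = p_at_least_one(0.02, 1)       # 0.02 by construction
```

So the ~2% figure in the comment applies to the hypothesised "extremely severe" class; for merely Carrington-scale events the 50-year risk is closer to one in ten under these assumptions.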