Disclaimer: I might have no idea what I'm talking about and my proposal might be an infohazard but it also might save us all; I don't know

LessWrong hosts monthly AGI safety basic questions threads, where people typically ask in the comments why their imagined solution wouldn't work, without first understanding the problem. We are essentially prompting people to act like Anti-Eliezers this way.

I can't read Eliezer's mind super well. I don't know which solutions he would approve of, but he'd probably disapprove of 99+% of them. Maybe he would feel better-assured if I trained myself to see the difficulty of the task instead of the ease.

And so, I think what would be more productive is a monthly AGI Doom Argument thread, where people make theorem-style multi-step logical arguments about what is necessary or sufficient for aligned AGI, and why attempting to acquire those desiderata is a terrible idea. My hypothesis is that this will make people more like the kind of person who would spontaneously write the List of Lethalities from scratch, without being prompted to.

Each argument should be attributed to its creator and numbered by recency of creation. Then maybe we could find some way to store, categorically sort, and rate those arguments on multiple metrics, on a wiki or something.

Here's an example:

LVSN's Doom Argument #1: 

1. In order for an AI to become aligned, it must live with humans and evolve bottom-up thinking styles analogously to humans through reproduction 

2. Living and evolving analogously to humans takes several generations 

3. We don't have several generations' time before AGI 

4. Therefore, when AGI comes it will not be aligned

Maybe this is a dumb argument. Maybe it's a bad argument. Maybe the premises are definitely wrong, or maybe we don't know whether they're wrong. Maybe it's ignoring a lot of desiderata. Maybe it's right. Whatever you have to say about the argument, it can be labeled as such. Eventually the strongest arguments for our doom will be distilled, and copy/pasting those distillations will be a lot easier than explaining to each alignment-wannabe, case by case, why their solution doesn't work.

This would create progress on simply understanding the problem. The more would-be alignment savants we can get to actually understand the problem, the better chances we have of dying with dignity.
