> "The first catastrophe mechanism seriously considered seems to have been the possibility, raised in the 1940s at Los Alamos before the first atomic bomb tests, that fission or fusion bombs might ignite the atmosphere or oceans in an unstoppable chain reaction."[1]
This is not our first rodeo. We have done risk assessments before. The best reference-class examples I could find were the bomb, vacuum decay, killer strangelets, and LHC black holes (all covered in [1]).
I had been searching for a few days and hadn't completed my search, but I decided to publish this note now that Tyler Cowen is asking too: "Which is the leading attempt to publish a canonical paper on AGI risk, in a leading science journal, refereed of course. The paper should have a formal model or calibration of some sort, working toward the conclusion of showing that the relevant risk is actually fairly high. Is there any such thing?"
The three papers people replied with were:
- Is Power-Seeking AI an Existential Risk?
- The Alignment Problem from a Deep Learning Perspective
- Unsolved Problems in ML Safety
Places I have looked so far:
- The list of references for that paper[2]
- The references for the Muehlhauser and Salamon intelligence explosion paper[3]
- The Sandberg review of singularities[4] and related papers (these are quite close to passing muster, I think)
Places I still want to look:
- Papers by Yampolskiy, e.g.[5]
- Papers by Schmidhuber mentioned in those (I haven't gotten around to these yet)
- Intelligence Explosion Microeconomics, which I haven't thoroughly reviewed; maybe it is the closest thing to fulfilling the criteria?
But if there is something concrete in, e.g., some papers by Yampolskiy and Schmidhuber, why hasn't anyone fleshed it out in more detail?
For all the time people spend working on 'solutions' to the alignment problem, there still seems to be a serious lack of 'descriptions' of the alignment problem. Maybe the idea is that if you found the latter, you would automatically have the former.