Ah that's clear, thanks! I must've overlooked the "In 2016" right at the top of the post.
Very minor thing, but I was confused for a while when you said "end of 2020": I read it as the year rather than the decade (the 2020s).
Your position makes sense. Part of it was just paraphrasing (what seems to me as) the 'consensus view' that preventing AIs from wiping us out is much more urgent / important than preventing AIs from keeping us alive in a far-from-ideal state.
This is a great guide - thank you. However, in my experience as someone completely new to the field, 100-200 hours per level is very optimistic. I've easily spent double or triple that on each of the first two levels and still haven't reached a comfortable level.
(If you only look at one, look at this:) An introduction to contemporary hazard analysis that justifies the methods far more completely than this post can.
For those who prefer not to spend 3 hours (or 1.5 hours on 2x speed) watching the video, the lecture notes are here. They seem fairly self-explanatory.
This is great, thanks!
"Because it's there" - George Mallory in 1923, when asked why he wanted to climb Everest. He died in his summit attempt the following year.
A part of me is worried that the terminology invites viewing mesa-optimisers as a description of a very specific failure mode, instead of as a language for the general worry described above.
I have been very confused about the term for a very long time, and had always thought mesa-optimisers described a very specific failure mode. This post helped me clear things up.
Are you just noting that the model won't necessarily find the global maximum, and may only reach some local maximum?
That was my takeaway as well, but I'm also somewhat confused.
This post has helped me clear up some confusions I'd had about inner misalignment for the longest time. Thank you.