By "refining pure human feedback", do you mean refining RLHF ML techniques?
I assume you still view enhancing human feedback as valuable? And also more straightforwardly just increasing the quality of the best human feedback?
Amazing! Thanks so much for making this happen so quickly.
To anyone who's trying to figure out how to get it to work on Google Podcasts, here's what worked for me (searching the name didn't, maybe this will change?):
Go to the Libsyn link. Click the RSS symbol. Copy the link. Go to Google Podcasts. Click the Library tab (bottom right). Go to Subscriptions. Click symbol that looks like adding a link in the upper right. Paste link, confirm.
Hey Paul, thanks for taking the time to write that up, that's very helpful!
Hey Rohin, thanks a lot, that's genuinely super helpful. Drawing analogies to "normal science" seems both reasonable and like it clears the picture up a lot.
I would be interested to hear opinions about what fraction of people could possibly produce useful alignment work?
Ignoring the hurdle of "knowing about AI safety at all", i.e. assuming they took some time to engage with it (e.g. they took the AGI Safety Fundamentals course). Also assume they got some good mentorship (e.g. from one of you) and then decided to commit full-time (and got funding for that). The thing I'm trying to get at is more about having the mental horsepower + epistemics + creativity + whatever other qualities are useful, or likely being able to get there after some years of training.
Also note that I mean direct useful work, not indirect meta things like outreach or being a PA to a good alignment researcher etc. (these can be super important, but I think it's productive to think of them as a distinct class). E.g. I would include being a software engineer at Anthropic, but exclude doing grocery-shopping for your favorite alignment researcher.
An answer could look like "X% of the general population" or "half the people who could get a STEM degree at Ivy League schools if they tried" or "a tenth of the people who win the Fields medal".
I think it's useful to have a sense of this for many purposes, incl. questions about community growth and the value of outreach in different contexts, as well as priors about one's own ability to contribute. Hence, I think it's worth discussing honestly, even though it can obviously be controversial (with some possible answers implying that most current AI safety people are not being useful).
That's a very detailed answer, thanks! I'll have a look at some of those tools. Currently I'm limiting my use to a particular 10-minute window per day with freedom.to + the app BlockSite. It often costs me way more than 10 minutes (checking links after, procrastinating before...) of focus though, so I might try to find an alternative.
Sorry for the tangent, but how do you recommend engaging with Twitter, without it being net bad?
Maybe I'm being stupid here. On page 42 of the write-up, it says:
In order to ensure we learned the human simulator, we would need to change the training strategy to ensure that it contains sufficiently challenging inference problems, and that doing direct translation was a cost-effective way to improve speed (i.e. that there aren’t other changes to the human simulator that would save even more time). [emphasis mine]
Shouldn't that be?
In order to ensure we learned the direct translator, ...
There are two very similar pages. This one and https://www.lesswrong.com/tag/scoring-rules/