I'm surprised by how strong the disagreement is here. Even if what we most need right now is theoretical/pre-paradigmatic work, that seems likely to change as AI develops and people reach consensus on more things; compare, e.g., the work done on optics pre-1800 to all the work done post-1800, or the work done on computer science pre-1970 vs. post-1970. Curious if people who disagree could explain more - is the disagreement about what stage the field is in/what the field needs right now in 2022, or the more general claim that most future work will be empirical?
I think saying "we" here dramatically over-indexes on personal observation. I'd bet that most overweight Americans have never eaten only untasty food for an extended period (say, longer than a month); and those who have tried it found that it sucked and stopped doing it. Only eating untasty food really sucks! For comparison, everyone knows that smoking is awful for your health, it's expensive, it leaves bad odors, and so on. And I'd bet that most smokers would find "never smoke again" easier and more pleasant (in the long run) than "never eat tasty food again". Yet the vast majority of smokers continue smoking: https://news.gallup.com/poll/156833/one-five-adults-smoke-tied-time-low.aspx
https://transformer-circuits.pub/ seems impressive to me!
There are now quite a lot of AI alignment research organizations, of widely varying quality. I'd name the two leading ones right now as Redwood and Anthropic, not MIRI (which is in something of a rut technically). Here's a big review of the different orgs by Larks:
Great post. I'm reminded of instructions from the 1944 OSS (the CIA's predecessor) sabotage manual: "When possible, refer all matters to committees, for 'further study and consideration.' Attempt to make the committee as large as possible — never less than five."
Eliezer's writeup on corrigibility has now been published (in the posts below by "Iarwain", embedded within his new story Mad Investor Chaos). Although, you might not want to look at it if you're still writing your own version and don't want to be anchored by his ideas.
Would be curious to hear more about what kinds of discussion you think are net negative - clearly some types of discussion between some people are positive.
Thanks for writing this! I think it's a great list; it's orthogonal to some other lists, which I think also include important stuff this one doesn't, but in this case orthogonality is super valuable because it makes it less likely that all the lists miss something.
This is an awesome comment; I think it would be great to make it a top-level post. There's a Facebook group called "Information Security in Effective Altruism" that might also be interested.
I hadn't seen that, great paper!