[AN #115]: AI safety research problems in the AI-GA framework

[-]David Scott Krueger (formerly: capybaralet)5yΩ330

MAIEI also has an AI Ethics newsletter I recommend for those interested in the topic.

Is this page completely unreadable for anyone else? https://imgur.com/a/yfEA72R

Yeah, sorry, we get some really mangled HTML from the RSS feed that Rohin registered, that's a bit of a pain to clean up, so we've been doing it manually for a bit. My guess is I will get around to automating it, but it's not super trivial, since the HTML we get has a lot of table-layout stuff that sometimes is relevant to the content, and sometimes isn't, and so I would have to experiment for a while to find the right sanitization rules to make everything work nicely without human intervention.

[-]Rohin Shah5y60

I forget if I mentioned this before, but all of this HTML is generated by a script with a much more structured input, which you can see here. Plausibly we should just add another output mode to the script that can be easily imported into LessWrong? (Happy to share you on the spreadsheet from which the input data comes if that would help.)

[-]habryka5y20

Yeah, that might end up being easier. I might look at the code and make a PR for a minimalist HTML template.

[-][anonymous]5y30

Yep, clicking "View this email in browser" allowed me to read it but obviously would be better to have it fixed here as well.

[-]Vaniver5y20

Currently this is fixed manually for each crosspost by converting it to draft-js and then deleting some extra stuff. I'm not sure how high a priority it is to make that automatic.

[-]Pattern5y20

Decision Points in AI Governance
...
(These actions should not have been predetermined by existing law and practice.)

Should not have been, or should not be?

[-]Rohin Shah5y20

should not be, thanks

[-]David Scott Krueger (formerly: capybaralet)5yΩ110

I actually expect that the work needed for the open-ended search paradigm will end up looking very similar to the work needed by the “AGI via deep RL” paradigm: the differences I see are differences in difficulty, not differences in what problems qualitatively need to be solved.

I'm inclined to agree. I wonder if there are any distinctive features that jump out?

[-]algon335y10

Hey Rohin, I'm writing a review on everything that' been written on corrigibility so far. Do the "the off switch game", "Active Inverse Reward Design" "should robots be obedient", "incorrigibility in CIRL" as well as your reply in the Newsletter represent CHAI's current views on the subject? If not, which papers contain them?

[-]Rohin Shah5y20

Uh, I don't speak for CHAI, and my views differ pretty significantly from e.g. Dylan's or Stuart's on several topics. (And other grad students differ even more.) But those seem like reasonable CHAI papers to look at (though I'm not sure how Active IRD relates to corrigibility). Chapter 3 of the Value Learning sequence has some of my takes on reward uncertainty, which probably includes some thoughts about corrigibility somewhere.

Human Compatible also talks about corrigibility iirc, though I think the discussion is pretty similar to the one in the off switch game?

[-]algon335y10

Active IRD doesn't have anything to do with corrigibility, I guess my mind just switched off when I was writing that. Anyway, how diverse are CHAI's views on corrigibility? Could you tell me who I should talk to? Because I've already read all the published stuff on it if I'm understanding you rightly and I want to make sure that all the perspectives no this topic are covered.

[-]Rohin Shah5y20

Hmm, I expect each grad student will have a slightly different perspective, but off the top of my head I think Michael Dennis has the most opinions on it. (Other people could include Daniel Filan and Adam Gleave.)

[-]algon335y10

Thanks. Two questions:

Do the staff and faculty have a similair diversity of opinions?

Is messaging chai-info@berkeley.edu in orde to contact your peers the right procedure here?

[-]Rohin Shah5y50

Hmm, of the faculty Stuart spends the most time thinking about AI alignment, I'm not sure how much the other faculty have thought about corrigibility -- they'll have views about the off switch game, but not about MIRI-style corrigibility.

Most of the staff doesn't work on technical research, so they probably won't have strong opinions. Exceptions: Critch and Karthika (though I don't think Karthika has engaged much with corrigibility).

Probably the best way is to find emails of individual researchers online and email them directly. I've also left a message on our Slack linking to this discussion.

LESSWRONG
LW

LESSWRONG
LW

19

[AN #115]: AI safety research problems in the AI-GA framework

19

Ω 13

19

Ω 13

SECTIONS

HIGHLIGHTS

TECHNICAL AI ALIGNMENT

PROBLEMS

FORECASTING

MISCELLANEOUS (ALIGNMENT)

AI STRATEGY AND POLICY

OTHER PROGRESS IN AI

REINFORCEMENT LEARNING

NEWS

FEEDBACK

PODCAST