When I was first introduced to AI Safety, coming from a background in psychology, I kept getting frustrated with the way people defined and used the word "intelligence". They weren't able to address my questions about cultural intelligence, social evolution, and general intelligence in a way I found rigorous enough to be convincing. I felt like professionals couldn't answer what I considered basic and relevant questions about general intelligence, which meant it took me a lot longer to take AI Safety seriously than it otherwise would have. It seems possible to me that other people have encountered AI Safety pitches and been turned off by something similar -- a communication failure because both parties approached the conversation with very different background information. I'd love to try to minimize these occurrences, so if anything similar has happened to you, could you please share:

- What is something that you feel AI Safety pitches usually don't understand about your field or background?
- Where do you commonly get stuck in conversations with people pitching AI Safety?
- What question or piece of information makes/made the conversation stop progressing and start circling?

(Cross-posted from the EA forum)


Answers

I think many people working in AI safety, and specifically those focused on alignment, basically don't understand what values are and, alarmingly to me, haven't invested much effort into figuring it out. I suspect this is motivated stopping: figuring out what values are and how they work is hard, and the problem doesn't easily avail itself to methods well known to AI researchers, so it gets ignored or pushed off as something AI will figure out for us. As a result, value is most often treated as a black box at worst, or as an abstract mathematical construct akin to preferences at best. The latter is a step beyond totally ignoring the question of what values are, but nowhere close to where I think we'll need to be to build aligned AI.

I've tried to address this in the past, but don't know how much impact I've really had. The recent work on the shard theory of values gives me hope the situation is changing.

Comments

Were the questions eventually answered to your satisfaction? If so, what/who did it? Or did you end up concluding that the AI Safety people have no idea what they're talking about when they mention "intelligence"? Or was the inferential gap just too large, and you ended up doing all the work on your own? Or did something else happen?

The inferential gap didn't get worked out through conversation; I ended up bridging it mainly by reading (Superintelligence, The Precipice, AGI Safety Fundamentals, in that order) and connecting that material with my own background. I think this was unfortunately slow, though. Some of the things that helped:

- A better understanding on my end of how ML works, such that I could see what "learning" looked like. Once I understood this, it was easier to see how my initial questions might have sounded irrelevant to someone working on AI Safety.
- A better understanding of what an AI planning multiple steps in advance (such as behaving well until a treacherous turn) might look like.
- Encountering terms like APS or TAI, which communicate the ideas without leaning on the phrase "general intelligence".

I'd mostly thank AGI Safety Fundamentals for these! I don't regret reading any of those resources, but I do think I'd have come to see AI Safety as important more quickly if someone had addressed my early questions with more understanding of my background.
