I spent the past week in Prague (natively called Praha, but known widely as Prague via the usual mechanisms by which others decide your name) at the Human Level AI multi-conference, which combined the AGI, BICA, and NeSY conferences. It also featured the Future of AI track, which was my proximate reason for attending: I spoke on the Solving the AI Race panel, thanks to my submission to GoodAI's Solving the AI Race challenge being selected as one of the finalists. I enjoyed the conference tremendously and the people I met there even more, and throughout the four days I found myself noticing things I felt were worth sharing more widely, hence this field report.

I arrived on Tuesday and stayed at the home of a fellow EA (they can out themselves in the comments if they like, but I won't do it here since I couldn't get in touch with them in time to get their permission to disclose their personage). If you didn't already know, Prague is rapidly becoming a hub for effective altruists in Europe, and having visited it's easy to see why: the city is beautiful, most people speak enough English that communication is easy for everyone, the food is delicious (there's lots of vegan food even though traditional Czech cuisine is literally meat and potatoes), and it's easily accessible to everyone on the continent. Some of the recent events held in Prague include the Human-aligned AI Summer School and the upcoming Prague AI safety camp.

Wednesday was the first day of the conference, and I honestly wasn't quite sure what to expect, but Ben Goertzel did a great job of setting the tone with his introductory keynote that made clear the focus was on exploring ideas related to building AGI. We then dove in to talk after talk for the next 4 days, each one considering an idea that might lie on the path to enabling general intelligence for machines, and we doubled-up on the third day with a Future of AI program that considered AI policy and ethics issues. I took a lot of notes, but being a safety researcher I'm hesitant to share them because that feels like a hazard that might accelerate AGI development, so instead I'll share a few high-level insights that I think summarize what I learned without saying anything with more than a very small (let's say < 5%) chance of giving someone a dangerous inspiration they wouldn't have had anyway or as soon.

  • Progress towards AGI is not amorphous; it's being made through small starts by individuals with faces and names. I knew this intellectually, but it was different to live it—shaking people's hands, swapping stories, and hearing them express their hopes and doubts that don't appear in published work because it doesn't fit the academic form. That said, I still find their work threatening, but now I know the threat on a human level.
  • Thankfully we are further away than it often feels like if you focus on the success of deep learning systems. Nearly everyone at the conference agreed that deep learning in its current form is missing some key ingredients to get to AGI. Again, I won't repeat here what those are, but the good news is that deep learning appears to not be on the path to AGI even if it can produce non-general superintelligence, and AGI requires new paradigms in machine intelligence that we either haven't discovered or don't yet know how to make work well in practice.
  • From talking to people I have a stronger model of why they work on capabilities and not safety or, as is more often the case, think safety concerns do not require extreme caution, instead necessitating only normal levels of caution as is the case in most engineering disciplines. My read from dozens of similar interactions with researchers is that this attitude reflects both a lack of deep reflection on safety and a difficulty in updating to an idea that would break the momentum of their life's work.
  • But not everyone was like this. For example, once people figured out I was a "safety guy", one person came up to me and asked what he should do because he has an idea he thinks would move AI in a dangerous direction and didn't know if he should publish anyway, keep it a secret, or do something else. I recommended that he reach out to MIRI, and I hope he will, but it exposed to me that even when a capabilities researcher notices their work is going in a dangerous direction they don't have a standard script for what to do about it.
  • On a different note, EA continues to have perception problems. I talked to two people who said things that indicated they lean EA and asked them whether they identified that way. They told me they didn't, because they associate EA with Singer-style act utilitarianism and self-imposed poverty through maximizing donated income. I let them know the EA tent is much bigger than that, but clearly EA evangelists have more work to do in this direction.

To summarize, I think the main takeaway is that we, being the sorts of persons who read LessWrong, live in a bubble where we know AGI is dangerous, and outside that bubble people still don't know or have confused ideas about how it's dangerous, even among the group of people weird enough to work on AGI instead of more academically respectable, narrow AI. That's really scary, because the people outside the bubble also include those affecting public policy and making business decisions, and they lack the perspective we share about the dangers of both narrow and general AI. Luckily, this points towards two opportunities we can work on now to mitigate the risks of AGI:

  1. Normalize thinking about AI safety. Right now it would be a major improvement if we could move the field of AI research to be on par with the way biomedical researchers think about risks for near-term, narrow applications of their research, let alone getting everyone thinking about the existential risks of AGI. I think most of this work needs to happen on a 1-to-1, human level right now, pushing individual researchers towards more safety-focus and reshaping the culture of the AI research community. For this reason I think it's extremely important that I and others make an effort to
    1. attend AI capabilities conferences,
    2. form personal relationships with AI researchers,
    3. and encourage them to take safety seriously.
  2. Establish and publicize a "sink" for dangerous AI research. When people have an idea they think is dangerous (which assumes to some extent we succeed at the previous objective, but as I mentioned it already comes up now), they need a default script for what to do. Cybersecurity and biomedical research have standard approaches, and although I don't think their approaches will work the same for AGI, we can use them as models for designing a standard. The sink should then be owned by a team seen as extremely responsible, reliable, and committed to safety above all else. I recommend FHI or MIRI (or both!) take on that role. The sub-actions of this work are to
    1. design and establish a process,
    2. find it a home,
    3. publicize its use,
    4. and continually demonstrate its effectiveness—especially to capabilities researchers who might have dangerous ideas—so it remains salient.

These interventions are more than I can take on myself, and I don't believe I have a comparative advantage to execute on them, so I need your help if you're interested, i.e. if you've been thinking about doing more with AI safety, know that there is concrete work you can do now other than technical work on alignment. For myself I've set an intention that when I attend HLAI 2020 we'll have moved at least half-way towards achieving these goals, so I'll be working on them as best I can, but we're not going to get there if I have to do it alone. If you'd like to coordinate on these objectives feel free to start in the comments below or reach out to me personally and we can talk more.

I feel like I've only just scratched the surface of my time at HLAI 2018 in this report, and I think it will take a while to process everything I learned and follow up with everyone I talked to there. But if I had to give my impression of the conference in a tweet it would be this: we've come a long way since 2014, and I'm very pleased with the progress (Superintelligence, Puerto Rico, Asilomar), but we have even further to go, so let's get to work!


outside that bubble people still don't know or have confused ideas about how it's dangerous, even among the group of people weird enough to work on AGI instead of more academically respectable, narrow AI.

I agree. I run a local AI Safety Meetup and it's frustrating to see that the ones who better understand the discussed concepts consider Safety way less interesting/important than AGI Capabilities research. I remember someone saying something like: "Ok, this Safety thing is kind of interesting, but who would be interested in working on real AGI problems?" and the other guys nodding. What they say:

  • "I'll start an AGI research lab. When I feel we're close enough to AGI I'll consider Safety."
  • "It's difficult to do significant research on Safety without knowing a lot about AI in general."

I definitely agree that we need to normalize thinking about AI safety, and I think that's been happening. In fact, I think of that as one of the major benefits of writing the Alignment newsletter, even though I started it with AI safety researchers in mind (who still remain the audience I write for, if not the audience I actually have).

I'm less convinced that we should have a process for dangerous AI research. What counts as dangerous? Certainly this makes sense for AI research that can be dangerous in the short term, such as research that has military or surveillance applications, but what would be dangerous from a long-term perspective? It shouldn't just be research that differentially benefits general AI over long-term safety, since that's almost all AI research. And even though on the current margin I would want research to differentially advance safety, it feels wrong to call other research dangerous, especially given its enormous potential for good.

it feels wrong to call other research dangerous, especially given its enormous potential for good.

I agree that calling 99.9% of AI research "dangerous" and AI Safety research "safe" is not a useful dichotomy. However, I consider AGI companies/labs and people focusing on implementing self-improving AI/code synthesis extremely dangerous. Same for any breakthrough in general AI, or things that greatly shorten the AGI timeline.

Do you mean that some AI research has positive expected utility (e.g. in medicine) and should not be called dangerous because the good it produces compensates for the increased AI risk?

Just to return for a moment to what I wrote, I don't mean to be making an assessment here of what is "dangerous", but instead to provide this service for things people themselves think are dangerous. Figuring out where to draw the line on which capabilities research is so dangerous it should not be published is a thing I have only very weak opinions on. For example, if you figured out how to make recursive self-improvement work in a way that doesn't immediately result in wild divergence and could stably produce better results over many iterations, I'd say that's dangerous, but short of that I'm not sure where you might draw the line.

Great idea about the sink. People want to publish their research, even dangerous research, to get feedback and appreciation from others, and such a sink could provide the needed feedback.

Hi Gordon!

Thanks for writing this. I am glad you enjoyed HLAI 2018.

I agree, many AI/AGI researchers partially or completely ignore AI/AGI safety. But I have noticed a trend in the past years: it's possible to "turn" these people and make them take safety more seriously.

Usually the reason for their "safety ignorance" is just insufficient insight, from not spending enough time on this topic. Once they learn more, they quickly see how things can go wrong. Of course, not everyone does.

Hope this helped.



A great way to visualize the risks of unaligned AGI is the alien life form in the film Life: https://www.youtube.com/watch?v=cuA-xqBw4jE

It starts as a seed entity, quickly adapts, learns new tricks, and gets bigger and stronger, ruthlessly oblivious to the human value system. Watch it, and instead of the alien imagine a child AGI.

Yeah, it'd be nice if researchers consulted FHI about any dangerous AI ideas. But I don't really understand how that would work. Let's say I'm a researcher at some university, I have a dangerous AI idea and contact FHI about it. What happens next?

Honestly I'm not sure. Maybe you still publish but are advised to withhold certain details? Maybe you keep it a secret? Maybe you use a woodchipper to destroy all evidence of the idea? I think much of the value would come from having both a visible and active norm of taking ideas you think are dangerous seriously and having someone to discuss dangerous ideas with so you don't end up with researchers like the ones I met who feel they are alone and have to grapple with thinking through the consequences of their research in isolation.

Additional reflections from Marek, CEO of GoodAI, along with links to additional media coverage, including one about whether or not to publish dangerous AI research.

The problem with being able to direct people to outside "safety consultants" is that it's not like there's a module that needs to be strapped on to an AI to make it friendly. The part of the AI that decides which actions are right is the entire AI - once you know the good actions you can just take them.

Safety is a feature of the AI like "turns up the link you were looking for" is a feature of web search. Or as Stuart Russell puts it, nobody talks about "building bridges that don't fall down" as a separate research area from building bridges.

So people looking to "add safety" to their AGI design might need to increase their own ability to design AGI. Does this imply that we should be putting out more educational resources, more benchmarks, more ways of thinking about the consequences of a particular AI design?

Hmm, I think the current situation is a bit more complicated. Yes, we can't just bring in a safety consultant to try to fix things up, but it's also the case that there isn't always a meaningful way to talk about safety in relation to everyone's research, because that research is so far removed from safety. To use the bridge metaphor, it would be like talking about bridge safety when you're doing research on mortar: yes, mortar has impacts on safety, but it's also pretty far removed until you put it in the context of a full system. Very few people (at least at this conference) are doing something on the order of building a bridge/AGI; instead they were focused on improvements to algorithms and architectures that they believe are on the path to figuring out how to build the thing at all.

That said, I think all of your suggested actions sound reasonable, because it seems to me now the primary issue may simply be changing the culture in AI/AGI research to have a much stronger safety focus.